RE: [PATCH net-next v9] openvswitch: enable NSH support
> > The optimization Yi refers to only affects the slow path translation.
> >
> > OVS 2.8 does not immediately trigger a recirculation after translating
> > encap(nsh,...). There is no need to do so as the flow key of the
> > resulting packet can be determined from the encap() action and its
> > properties. Translation continues with the rewritten flow key and
> > subsequent OpenFlow actions will typically set the new fields in the
> > new NSH header. The push_nsh datapath action (including all NSH header
> > fields) is only generated at the next commit, e.g. for output, cloning,
> > recirculation, encap/decap or another destructive change of the flow
> > key.
> >
> > The implementation of push_nsh in the user-space datapath does not
> > update the miniflow (key) of the packet, only the packet data and some
> > metadata. If the packet needs to be looked up again the slow path
> > triggers recirculation to re-parse the packet. There should be no need
> > for the datapath push_nsh action to try to update the flow key.
>
> Thanks Jan for the clarification. It can still work after removing that
> line; our flows didn't match it after push_nsh, it is output to a
> VXLAN-GPE port after push_nsh. I'm not sure if we can match dl_type and
> NSH fields if we don't output and don't recirculate.

No worries, a packet cannot be matched again in the datapath unless it is
recirculated. And recirculation today always implies re-parsing.

In the future we want to look into possibilities to optimize the
performance of recirculation, for example by skipping the parsing stage if
it is unnecessary. For that we may need to invalidate the flow key in the
packet metadata when the packet is modified without a corresponding update
of the key itself. But that is music of the future.

/Jan
RE: [PATCH net-next v9] openvswitch: enable NSH support
> From: Yang, Yi [mailto:yi.y.y...@intel.com]
> Sent: Friday, 29 September, 2017 08:41
> To: Pravin Shelar
> Cc: Jiri Benc; netdev@vger.kernel.org; d...@openvswitch.org;
> e...@erig.me; da...@davemloft.net; Jan Scheurich
> Subject: Re: [PATCH net-next v9] openvswitch: enable NSH support
>
> On Fri, Sep 29, 2017 at 02:28:38AM +0800, Pravin Shelar wrote:
> > On Tue, Sep 26, 2017 at 6:39 PM, Yang, Yi wrote:
> > > On Tue, Sep 26, 2017 at 06:49:14PM +0800, Jiri Benc wrote:
> > >> On Tue, 26 Sep 2017 12:55:39 +0800, Yang, Yi wrote:
> > >> > After push_nsh, the packet won't be recirculated to the flow
> > >> > pipeline, so key->eth.type must be set explicitly here. But for
> > >> > pop_nsh, the packet will be recirculated to the flow pipeline and
> > >> > reparsed, so key->eth.type will be set in the packet parse
> > >> > function; we needn't handle it in pop_nsh.
> > >>
> > >> This seems to be a very different approach than what we currently
> > >> have. Looking at the code, the requirement after "destructive"
> > >> actions such as pushing or popping headers is to recirculate.
> > >
> > > This is an optimization proposed by Jan Scheurich: recirculating
> > > after push_nsh will impact performance, while recirculating after
> > > pop_nsh is unavoidable. So also cc jan.scheur...@ericsson.com.
> > >
> > > Actually all the keys before push_nsh are still there after
> > > push_nsh, and push_nsh has updated all the NSH keys, so
> > > recirculating remains avoidable.
> >
> > We should keep the existing model for this patch. Later you can submit
> > an optimization patch with specific use cases and performance
> > improvement, so that we can evaluate code complexity and benefits.
>
> Ok, I'll remove the below line in push_nsh and send out v11, thanks.
>
> key->eth.type = htons(ETH_P_NSH);

The optimization Yi refers to only affects the slow path translation.

OVS 2.8 does not immediately trigger a recirculation after translating
encap(nsh,...). There is no need to do so as the flow key of the resulting
packet can be determined from the encap() action and its properties.
Translation continues with the rewritten flow key and subsequent OpenFlow
actions will typically set the new fields in the new NSH header. The
push_nsh datapath action (including all NSH header fields) is only
generated at the next commit, e.g. for output, cloning, recirculation,
encap/decap or another destructive change of the flow key.

The implementation of push_nsh in the user-space datapath does not update
the miniflow (key) of the packet, only the packet data and some metadata.
If the packet needs to be looked up again the slow path triggers
recirculation to re-parse the packet. There should be no need for the
datapath push_nsh action to try to update the flow key.

BR, Jan
RE: [PATCH net-next v6 3/3] openvswitch: enable NSH support
> > There is no way we can re-use the existing TLV tunnel metadata
> > infrastructure in OVS for matching and setting NSH MD2 TLV headers. We
> > will need to introduce a new (perhaps similar) scheme for modelling
> > generic TLV match registers in OVS that are assigned to protocol TLVs
> > by the controller. This is FFS.
>
> This is what I don't understand.
>
> Why can't you just reuse the space in the struct sw_flow_key where
> geneve would put in their metadata. There are 255 empty bytes at the
> beginning if you don't have other tunnel metadata anyway.
>
> If you receive packets over vxlan(gpe), tun_opts gets populated with an
> ip_tunnel_key. Couldn't you use the options space in there after the
> ip_tunnel_key to store the NSH context just for the sake of storing them
> somewhere instead of adding 16 bytes to sw_flow_key?

There is a significant conceptual difference between tunnel metadata
(copied from a popped tunnel header) and packet match fields extracted
during parsing of the packets. If we'd store them in the same space in the
sw_flow_key struct, we would be asking for trouble.

NSH is transport agnostic; it should work over Ethernet, VXLAN(GPE) and
other transport tunnels. Think about an NSH packet arriving on a Geneve
tunnel port. Any Geneve tunnel options have already been stored in the
tun_opts metadata bytes. Now the datapath parses the NSH header and
overwrites the tun_opts metadata with the NSH metadata. This would break
the OVS semantics.

I absolutely understand your concern about efficient space utilization in
the flow struct for TLV match fields, and it will be part of the design
challenge for MD2 TLV support to find a good balance between memory and
run-time efficiency. But that is FFS. For the four fixed-size MD1 headers
the decision has been to include them as additional attributes in the flow
key.

BR, Jan
RE: [PATCH net-next v6 3/3] openvswitch: enable NSH support
> >> Yes, I wrote that in my previous mail. I wonder why NSH context
> >> metadata is not in tun_metadata as well?
> >
> > tun_metadata is tunnel metadata; GENEVE needs a tunnel port, but NSH
> > is not like that, so NSH can't directly use tun_metadata. For MD type
> > 2, we would need a lot of rework on tun_metadata to make it shared
> > between GENEVE and NSH, and I don't think this can happen in the near
> > term. So tun_metadata isn't an option for this now.
>
> Sorry, I couldn't follow you. Why can't you store the context headers in
> tun_metadata exactly?

I think we are mixing things. Let me try to clarify:

1. NSH context metadata has end-to-end significance for the SFP. It must
   be part of the NSH header and cannot be transported as tunnel metadata,
   because transport tunnels (e.g. Geneve) only connect pairs of SFFs in
   the path. So we need OVS to be able to match on and set NSH context
   header fields, also for MD2 TLVs in the future.

2. OVS today has support for matching on TLV tunnel metadata after
   termination of a Geneve tunnel. This infrastructure is only usable for
   OVS tunnel ports (like Geneve) but not for matching on TLV headers of
   the NSH protocol, which is not modelled as an OVS tunnel port but
   handled in the OpenFlow pipeline (with generic encap/decap actions to
   enter/terminate an NSH SFP). This was a strategic decision by the OVS
   community two years ago.

There is no way we can re-use the existing TLV tunnel metadata
infrastructure in OVS for matching and setting NSH MD2 TLV headers. We
will need to introduce a new (perhaps similar) scheme for modelling
generic TLV match registers in OVS that are assigned to protocol TLVs by
the controller. This is FFS.

BR, Jan
RE: [PATCH net-next v6 3/3] openvswitch: enable NSH support
Hi Hannes,

Please have a look at the Google doc that sketches the overall solution to
support NSH in OVS:
https://drive.google.com/open?id=1oWMYUH8sjZJzWa72o2q9kU0N6pNE-rwZcLH3-kbbDR8

In details it is slightly outdated, but the NSH MD1 support (and all
prerequisites like PTAP and generic encap/decap) has been implemented in
OVS 2.8 (ofproto layer and the userspace datapath).

The NSH use cases are discussed in the chapter "Support for NSH". OVS is
primarily targeting the Classifier and SFF use cases. Obviously the
classifier must be able to set the MD1 context headers. The final SFF must
be able to inspect the context headers, typically to determine the L2 or
L3 forwarding context (e.g. VRF) in which to forward the decapsulated
packet on to its final destination. This information has end-to-end
significance for the SFP and is encoded by the classifier in the NSH
context headers. It cannot be put into transport tunnel options of e.g. a
Geneve tunnel connecting two SFFs along the SFP.

We are now trying to establish feature parity in the kernel datapath. If
the kernel datapath chooses not to support the MD1 fields, OVS with the
kernel datapath will not be able to address the classifier and final SFF
use cases.

BR, Jan

> -----Original Message-----
> From: Hannes Frederic Sowa [mailto:han...@stressinduktion.org]
> Sent: Tuesday, 05 September, 2017 12:30
> To: Yang, Yi
> Cc: netdev@vger.kernel.org; d...@openvswitch.org; jb...@redhat.com;
> e...@erig.me; b...@ovn.org; Jan Scheurich
> Subject: Re: [PATCH net-next v6 3/3] openvswitch: enable NSH support
>
> "Yang, Yi" writes:
>
> > I'm not sure what new action you expect to bring here. I think the
> > group action is just for this; as you said it isn't only bound to NSH,
> > so you can start a new thread to discuss this. I don't think it is in
> > scope of NSH.
>
> It is in scope of this discussion as you will provide a user space API
> that makes the NSH context fields accessible from user space in a
> certain way. If you commit to this, there is no way going back.
>
> I haven't yet grasped the idea of how those fields will be used in OVS
> besides load balancing. Even for load balancing the tunnel itself
> (vxlan-gpe + UDP source port or ipv6 flowlabel) already provides enough
> entropy to do per-flow load balancing. What else is needed? Why a
> context header for that? You just need multiple action chains and pick
> one randomly.
>
> The only protocol that I can compare that to is geneve with TLVs, but
> the TLVs are global and unique and a property of the networking
> forwarding backplane and not a property of the path inside a tenant. So
> I expect this actually to be the first case where I think that matters.
>
> Why are context labels that special that they are not part of tun_opts?
>
> Thanks,
> Hannes
RE: [PATCH net-next v7] openvswitch: enable NSH support
> From: Jiri Benc [mailto:jb...@redhat.com]
>
> > So what is the correct layout for the MASKED_SET action with nested
> > fields?
> > 1. All nested values, followed by all nested masks, or
> > 2. For each nested field, value followed by mask?
> >
> > I guess alternative 1, but just to be sure.
>
> It's 2. Alternative 1 breaks netlink assumptions, is ugly to implement
> and probably impossible to be properly validated.

OK. For outsiders this was far from obvious :-)

So, for OVS_ACTION_ATTR_SET_MASKED any nested attribute, no matter on
which nesting level, must contain the value directly followed by the mask.
If that is the principle of handling masks in Netlink APIs, then we must
adhere to it.

BR, Jan
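To illustrate the layout Jiri confirms (alternative 2: each nested field
carries its value immediately followed by its mask), here is a minimal
user-space sketch that serializes such an attribute by hand. The attribute
numbers, the helper names and the single 4-byte SPI field are made up for
illustration; real OVS code uses the kernel's nla_* helpers and the actual
OVS_KEY_ATTR_* values.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Minimal netlink attribute header (mirrors struct nlattr). */
struct nlattr_hdr {
    uint16_t nla_len;   /* header + payload, unpadded */
    uint16_t nla_type;
};

#define NLA_ALIGN(len) (((len) + 3u) & ~3u)

/* Append one attribute; returns bytes consumed (aligned). */
static size_t put_attr(uint8_t *buf, uint16_t type,
                       const void *data, uint16_t len)
{
    struct nlattr_hdr hdr = { (uint16_t)(sizeof hdr + len), type };
    memcpy(buf, &hdr, sizeof hdr);
    memcpy(buf + sizeof hdr, data, len);
    return NLA_ALIGN(sizeof hdr + len);
}

/* Alternative 2: inside the nested attribute, the field's payload is
 * its 4 value bytes immediately followed by its 4 mask bytes.
 * Attribute numbers are hypothetical. */
size_t build_masked_nsh_set(uint8_t *buf)
{
    const uint16_t ATTR_NSH = 1, ATTR_NSH_SPI = 1; /* hypothetical */
    uint8_t value_and_mask[8];
    uint32_t spi = 0x123456, mask = 0xffffff;      /* example field */

    memcpy(value_and_mask, &spi, 4);
    memcpy(value_and_mask + 4, &mask, 4);

    /* Inner attribute first, then fill in the outer (nested) header. */
    size_t inner = put_attr(buf + 4, ATTR_NSH_SPI, value_and_mask, 8);
    struct nlattr_hdr outer = { (uint16_t)(4 + inner), ATTR_NSH };
    memcpy(buf, &outer, 4);
    return 4 + inner;
}
```

The key property is that a parser can validate each field attribute in
isolation: its payload length must be exactly twice the field size, value
in the first half, mask in the second.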
RE: [PATCH net-next v7] openvswitch: enable NSH support
> On Mon, 4 Sep 2017 14:07:45 +0000, Jan Scheurich wrote:
> > Then perhaps I misunderstood your comment. I thought you didn't like
> > that the SET_MASKED action wrapped OVS_KEY_ATTR_NSH, which in itself
> > was nested. I was aiming to avoid this by lifting the two components
> > of the NSH header to the top level:
> > OVS_NSH_ATTR_BASE_HEADER --> OVS_KEY_ATTR_NSH_BASE_HEADER
> > OVS_NSH_ATTR_MD1_CONTEXT --> OVS_KEY_ATTR_NSH_MD1_CONTEXT
>
> No, this should be a nested attr.
>
> I objected to the way the value+mask combo is handled.

OK, sorry for the confusion. So what is the correct layout for the
MASKED_SET action with nested fields?

1. All nested values, followed by all nested masks, or
2. For each nested field, value followed by mask?

I guess alternative 1, but just to be sure.

Jan
RE: [PATCH net-next v7] openvswitch: enable NSH support
> > So is what you are suggesting the following?
> >
> > For matching:
> > OVS_KEY_ATTR_NSH_BASE_HEADER    mandatory
> > OVS_KEY_ATTR_NSH_MD1_CONTEXT    optional, in case MDTYPE == MD1
>
> This needs to be:
>
> OVS_KEY_ATTR_NSH
>     OVS_KEY_ATTR_NSH_BASE_HEADER
>     OVS_KEY_ATTR_NSH_MD1_CONTEXT
>
> > For setting:
> > OVS_ACTION_ATTR_SET_MASKED
> >     - OVS_KEY_ATTR_NSH_BASE_HEADER  when setting any base header fields
> > OVS_ACTION_ATTR_SET_MASKED
> >     - OVS_KEY_ATTR_NSH_MD1_CONTEXT  when setting any MD1 context data
> >       fields
>
> This needs to be:
>
> OVS_ACTION_ATTR_SET_MASKED
>     OVS_KEY_ATTR_NSH
>         OVS_KEY_ATTR_NSH_BASE_HEADER
>         OVS_KEY_ATTR_NSH_MD1_CONTEXT

Then perhaps I misunderstood your comment. I thought you didn't like that
the SET_MASKED action wrapped OVS_KEY_ATTR_NSH, which in itself was
nested. I was aiming to avoid this by lifting the two components of the
NSH header to the top level:

OVS_NSH_ATTR_BASE_HEADER --> OVS_KEY_ATTR_NSH_BASE_HEADER
OVS_NSH_ATTR_MD1_CONTEXT --> OVS_KEY_ATTR_NSH_MD1_CONTEXT

> In your example, there's no way to distinguish OVS_KEY_ATTR_ENCAP and
> OVS_KEY_ATTR_NSH_BASE_HEADER, they are both "1".

See above. The two would be separate values in the same enum, i.e.
distinct.

> > Should we consider sticking to that simple and efficient approach? As
> > far as I can see it will be generic for the foreseeable future.
>
> I'm not sure. From the kernel point of view, we want the same
> functionality offered by the openvswitch module and by a tc action.
>
> In theory, it can be the tc tool that constructs the header from the
> command line, but there's no precedent. Dunno.

Not sure I have the full picture here. Are you saying that the tc tool
uses the same Netlink API to the kernel as OVS? What would be the back-end
for tc? Also the openvswitch kernel module?

BR, Jan
RE: [PATCH net-next v7] openvswitch: enable NSH support
> On Mon, 4 Sep 2017 16:00:05 +0800, Yang, Yi wrote:
> > I think we have had a similar discussion about this. The issue is we
> > have no way to handle both MD type 1 and MD type 2 by using a common
> > flow key struct.
> >
> > So we have to do so. There is the miniflow to handle such an issue in
> > userspace, but the kernel datapath doesn't provide such a mechanism. I
> > think every newly-added key will result in the same space-wasting
> > issue for the kernel datapath, won't it?
>
> Yes. And it would be surprising if it didn't have an effect on
> performance.
>
> I don't care that much as this is a result of openvswitch module design
> decisions, but it still would be good to reach a consensus before
> implementing uAPI that might not be needed in the end.

Matching on NSH MD1 context headers is definitely required. So these must
be part of the flow.

> > OVS_ATTR_KEY_NSH is the first case to use nested attributes for the
> > set action, so there is no reference code for this. The set action has
> > two use cases, one with a mask and one without, so it is easier if a
> > nested attribute is handled as a whole; in user space,
> > nsh_key_from_nlattr is used in several places.
>
> I very much disagree. What you're doing is very poor design as can be
> seen from the code where you have to do weird gymnastics to implement
> that. Compare that to a much simpler case where each attribute carries
> its own value and mask.

I have no objections in principle against splitting the NSH base header
and MD1 context headers into independent key attributes on the uAPI if
that simplifies the code. But we need the full picture here. So is what
you are suggesting the following?

For matching:
OVS_KEY_ATTR_NSH_BASE_HEADER    mandatory
OVS_KEY_ATTR_NSH_MD1_CONTEXT    optional, in case MDTYPE == MD1

For setting:
OVS_ACTION_ATTR_SET_MASKED
    - OVS_KEY_ATTR_NSH_BASE_HEADER  when setting any base header fields
OVS_ACTION_ATTR_SET_MASKED
    - OVS_KEY_ATTR_NSH_MD1_CONTEXT  when setting any MD1 context data
      fields

For push_nsh:
OVS_ACTION_ATTR_PUSH_NSH
    - OVS_KEY_ATTR_NSH_BASE_HEADER  mandatory
    - OVS_NSH_ATTR_CONTEXT_HEADERS  optional (opaque metadata octet stream)

So push_nsh would still require nested attributes. Is that what we want
(see below)?

> > How can we do so? I see batch processing code in the user space
> > implementation, but the kernel datapath doesn't have such
> > infrastructure.
>
> You're right that there's no infrastructure. I somewhat agree that it
> might be out of scope of this patch and it can be optimized later.
> That's why I also included other suggestions to speed this up until we
> implement better parsing: namely, verify correctness once (at the time
> it is set from user space) and just expect things to be correct in the
> fast path.
>
> > how can we know the next push_nsh uses the same NSH header as the
> > previous one?
>
> We store the prepopulated header together with the action.

I agree. In our original modelling of OVS_ACTION_ATTR_PUSH_NSH in OVS 2.8
we introduced a monolithic push_nsh action struct including the NSH base
header and variable length opaque context headers. That avoids any logic
in the kernel datapath to translate a nested action attribute into an
internal format with a pre-populated header. The user space does that
work.

Should we consider sticking to that simple and efficient approach? As far
as I can see it will be generic for the foreseeable future.

> > > These checks should be done only once when the action is configured,
> > > not for each packet.
> >
> > I don't know how to implement batch processing in the kernel datapath;
> > it seems there isn't such code for reference.
>
> The checks should be done somewhere in __ovs_nla_copy_actions. Which
> seems to be done even in your patch by nsh_key_put_from_nlattr.
> (I didn't get that far during the review last week. The names of those
> nsh_(key|hdr)_*_nlattr functions are so similar that it's impossible to
> tell what they do without looking at the call sites - something you
> should also consider improving.) That makes it even more wrong: you
> appear to check everything twice, first on configuration and then for
> every packet.

Agree. Validation should only happen at action configuration time.

BR, Jan
RE: [PATCH net-next v6 3/3] openvswitch: enable NSH support
> >> Does it make sense to keep the context headers as part of the flow?
> >> What is the reasoning behind it? With mdtype 2 headers this might
> >> either not work very well or will increase sw_flow_key size causing
> >> slowdowns for all protocols.
> >
> > For userspace, the miniflow can handle such an issue, but the kernel
> > datapath doesn't provide such a mechanism, so I don't think we can
> > think of a better way to fix this.
>
> I do think that both, kernel and dpdk dp, are fine if you don't match on
> context fields, which I think is the common case.

Agree.

> As soon as a few paths start to match on contexts at different offsets I
> am not sure if miniflow will actually help. I don't know enough about
> that. Set cover is an NP-hard problem, so constructing plain simple
> flows will be an approximation somewhere anyway at some point.
>
> If the context values are in some way discrete it might make sense, but
> for load balancing purposes your ingress router might fill out one
> context field with a flow hash of the inner flow to allow ECMP or
> loadbalancing. You might want to treat that separately. Otherwise the
> different paths might always need to context[3] & 0x3 (bits as number of
> possible next hops) and put that into the flow state. Because every hop
> and every path might have different numbers of nexthops etc., I don't
> think this is easily unifiable anymore.

Yes, that would become a scaling challenge for the datapaths and also for
schemes to off-load the datapath to HW. I believe it requires discipline
on the controller side to not let the number of needed masks explode for
OVS.

> > I believe every newly-added key will result in a similar issue to the
> > one you are concerned about. Maybe it would be better to introduce the
> > miniflow in the kernel datapath, but that isn't in scope of this patch
> > series.
>
> You will see the same problem with offloading flows e.g. to network
> cards very soon. Flow explosion here will cause your 100Gbit/s NIC
> suddenly to go to software slow path.
>
> > It will be great if you can have a better proposal to fix your
> > concern.
>
> I do think that something alike miniflow might make sense, but I don't
> know enough about miniflow.

Struct miniflow in the netdev datapath is merely a compact representation
of struct flow. Struct flow is partitioned into 64-bit "stripes" and
struct miniflow only stores those 64-bit stripes that have a non-zero mask
stripe. It reduces the memory footprint for storing packet flow and
megaflow entries but does not provide any megaflow lookup performance gain
as such.

> New ACTIONS, that are not only bound to NSH, seem appealing to me at the
> moment. E.g. have a subtable match in the kernel dp or allowing for some
> OXM % modulo -> select action or neighbor alike thing would probably
> cover a lot of cases in the beginning. The submatch would probably very
> much resemble the miniflow in the dpdk dp.

I would like to discuss your ideas to optimize scalable load-sharing in
OVS. But I think we should do that in a separate mail thread to let the
immediate NSH kernel datapath work proceed. Can you provide some more
details of what you suggest? Perhaps take that to the ovs-dev mailing
list.

BR, Jan
RE: [ovs-dev] [PATCH net-next v6 3/3] openvswitch: enable NSH support
> >> >> Does it make sense to keep the context headers as part of the
> >> >> flow? What is the reasoning behind it? With mdtype 2 headers this
> >> >> might either not work very well or will increase sw_flow_key size
> >> >> causing slowdowns for all protocols.
> >> >
> >> > [Mooney, Sean K]
> >> > Having the NSH context headers in the flow is quite useful. It
> >> > would allow load-balancing on values stored in the context headers,
> >> > or other uses. I believe ODL previously used context header 4 to
> >> > store a flow id, so this could potentially be used with the
> >> > multipath action to have OVS choose between several possible next
> >> > hops in the chain.
> >>
> >> In OVS, masks are a list(!) for matching. How can this work for
> >> different paths that might require different masks? If they can't be
> >> unified you even get exact matches. Thus, for OVS the context should
> >> not be part of the flow.
> >
> > [Mooney, Sean K] I'm not sure what you mean about a list as I never
> > made reference to one. MD type 1 context headers are 4 mandatory 32
> > bit fields.
>
> The semantics of the context fields depend on the path id. Every path
> can have its own interpretation of the context fields.
>
> Thus the paths will certainly have their own masks. I hope I understood
> this part of the standard correctly.
>
> The dp only supports installing (approximated) disjoint megaflows and
> this affects performance a lot. Every new mask that gets added to the dp
> will be installed into a list which will be walked sequentially for
> packets. This will kill performance.
>
> I assume that user space slows down, too, depending on the algorithm
> they use to generate megaflows.

The primary use case for OVS in SFC/NSH is SFF. In almost all SFF use
cases OVS will not need to match on context headers. The ability of OVS to
match on context headers should not in general slow down the datapath.

When SFC controllers match on parts of the context headers (e.g. in the
final SFF or for load-balancing purposes), we will get additional megaflow
masks. This is a fact of life in OVS and nothing new in NSH. I don't think
this should prevent us from supporting valid use cases in OVS.

The slow-down of the megaflow lookup in the datapath with the number of
masks has been addressed in the user-space datapath by sorting the mask
lists according to hit frequency and maintaining one sorted list per
ingress port. There is further work in progress to improve on this with a
cuckoo hash to pick the most likely mask first. It should be possible to
apply similar optimization schemes also in the kernel datapath.

> > From an OVS context they should be treated the same as the 32 bit
> > register fields. We do not necessarily need to support those in MD
> > type 2 as all metadata is optional.

Support for matching on or updating MD2 TLV context headers is FFS in a
future OVS release. We do not foresee adding MD2 TLVs explicitly to the
flow struct.

> > You can also see that context headers 1 and 2 are still used in the
> > newer ODL netvirt sfc classifier implementation:
> > https://github.com/opendaylight/netvirt/blob/ba22f7cf19d8a827d77a3391a7f654344ade43d8/docs/specs/new-sfc-classifier.rst#pipeline-changes
> > So for ODL's existing NSH support to work with MD type one we must
> > have the ability to set the NSH context headers, and we should have
> > the ability to match on them also for load balancing between service
> > functions in a service function group.
>
> As I said, a separate action sounds much more useful.

I don't think it is a good approach in OVS to introduce custom actions for
NSH, e.g. for load sharing based on context header data. The primary tool
for load sharing in OpenFlow is the Select Group. OVS already supports an
extension to the OF 1.5 Group Mod message to specify which fields to hash
on. This can be used to hash on the relevant NSH context headers.

BR, Jan
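The sequential mask-list lookup that Hannes warns about can be sketched
roughly like this. It is a deliberately simplified toy model (a single
64-bit key, a linear array of masks, linear scan within each subtable),
not the actual OVS classifier, which hashes within each subtable and sorts
masks by hit frequency; the point is the per-mask sequential walk that
grows with every additional mask a controller introduces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MASKS 8
#define ENTRIES_PER_MASK 16

/* One megaflow subtable: all entries share the same mask. */
struct subtable {
    uint64_t mask;
    uint64_t keys[ENTRIES_PER_MASK];   /* stored already masked */
    int      actions[ENTRIES_PER_MASK];
    size_t   n;
};

struct classifier {
    struct subtable tables[MAX_MASKS]; /* walked sequentially */
    size_t n_tables;
};

/* Lookup: for each mask, mask the packet key and probe that subtable.
 * Cost is linear in the number of distinct masks, which is why an
 * explosion of masks hurts the datapath. */
static int classify(const struct classifier *cls, uint64_t pkt_key)
{
    for (size_t t = 0; t < cls->n_tables; t++) {
        const struct subtable *st = &cls->tables[t];
        uint64_t masked = pkt_key & st->mask;
        for (size_t i = 0; i < st->n; i++)
            if (st->keys[i] == masked)
                return st->actions[i];
    }
    return -1; /* miss: upcall to userspace */
}
```

Sorting the subtables by hit frequency (as the user-space datapath does)
shortens the average walk but does not change the worst case.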
RE: [ovs-dev] [PATCH net-next v6 3/3] openvswitch: enable NSH support
> > [Mooney, Sean K]
> > Having the NSH context headers in the flow is quite useful. It would
> > allow load-balancing on values stored in the context headers, or other
> > uses. I believe ODL previously used context header 4 to store a flow
> > id, so this could potentially be used with the multipath action to
> > have OVS choose between several possible next hops in the chain.
>
> In OVS, masks are a list(!) for matching. How can this work for
> different paths that might require different masks? If they can't be
> unified you even get exact matches. Thus, for OVS the context should not
> be part of the flow.

The NSH support in OVS 2.8 (for the user-space datapath only, so far)
supports matching on and manipulating the fixed-size MD1 context headers
C1-C4. They are part of the flow and there are corresponding OXM fields
defined. It is up to the SDN controller to program pipelines that match on
or set these fields. The goal was to support all relevant NSH use cases
for MD1: Classifier, SFF, and (with certain limitations) NSH proxy and SF.

We also support MD2 TLV context headers, but not yet for matching and
setting, so MD2 TLVs are not part of the flow. OVS 2.8 can add MD2 context
TLVs with the encap(nsh) action (classifier use case), can transparently
forward MD2 headers (SFF use case) and can pop an NSH header with MD2
context (final SFF use case). Support for matching and setting MD2 TLVs is
FFS and can be added in a later release.

BR, Jan
RE: [PATCH net-next v4] openvswitch: enable NSH support
> > If I understand correctly, this is a default definition that can be
> > overridden by drivers so that we still cannot rely on the Ethernet
> > payload always being 32-bit-aligned.
>
> Not by drivers, by architectures. Only architectures that can
> efficiently handle unaligned access (or on which the cost of handling
> unaligned access is lower than the cost of unaligned DMA access) set
> NET_IP_ALIGN to 0.

Thanks, got it!

> > Furthermore, there seem to be machine architectures that cannot handle
> > misaligned 32-bit access at all (not even with a performance penalty).
>
> Those leave NET_IP_ALIGN at 2.

Ditto.

> > Or why else does OVS user space code take so great pain to model
> > possible misalignment and provide/use safe access functions?
>
> I don't know how the ovs user space deals with packet allocation. In the
> kernel, the network header is aligned in a way that allows efficient
> 32-bit access.

It seems that OVS has not taken the same approach as Linux. There is no
config parameter covering the alignment characteristics of the machine
architecture. For packet buffers received from outside sources (e.g. DPDK
interfaces) OVS makes no assumptions on alignment and plays safe. For
packets allocated inside OVS, the Ethernet packet is typically stored so
that the L3 header is 32-bit aligned, making the misalignment precautions
unnecessary. But I didn't check all code paths.
RE: [PATCH net-next v4] openvswitch: enable NSH support
> > NSH can be carried over Ethernet with a 14 byte header. In that case
> > the total NSH header would typically be 16-bit aligned, so that all
> > 32-bit members would be misaligned.
>
> See NET_IP_ALIGN in include/linux/skbuff.h.

If I understand correctly, this is a default definition that can be
overridden by drivers so that we still cannot rely on the Ethernet payload
always being 32-bit-aligned.

Furthermore, there seem to be machine architectures that cannot handle
misaligned 32-bit access at all (not even with a performance penalty). Or
why else does OVS user space code take so great pain to model possible
misalignment and provide/use safe access functions? Does Linux kernel code
generally ignore this risk?

/Jan
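The safe access functions being discussed boil down to something like the
following portable sketch. The kernel uses get_unaligned_be32() and OVS
user space has its own ovs_16aligned_be32 accessors; this memcpy-based
version is an illustration of the technique, not either codebase's actual
implementation. The offsets in the usage example assume NSH directly over
a 14-byte Ethernet header, so the 32-bit path_hdr lands at offset 18, i.e.
2 mod 4.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit big-endian value from a possibly misaligned pointer.
 * memcpy compiles to a single load on architectures with cheap unaligned
 * access and to byte loads elsewhere, so this never faults on strict-
 * alignment machines. */
static uint32_t read_be32_unaligned(const void *p)
{
    uint8_t b[4];
    memcpy(b, p, 4);
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}
```

A direct `*(const uint32_t *)(pkt + 18)` at that offset would be undefined
behaviour in C and a bus error on strict-alignment hardware, which is
exactly the risk the thread is debating.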
RE: [PATCH net-next v4] openvswitch: enable NSH support
> > struct nsh_hdr {
> >     ovs_be16 ver_flags_ttl_len;
> >     uint8_t mdtype;
> >     uint8_t np;
> >     ovs_16aligned_be32 path_hdr;
> >     union {
> >         struct nsh_md1_ctx md1;
> >         struct nsh_md2_tlv md2[];
>
> I'm not that sure about this. With each member of md2 having a different
> size, you can't use md2 as an array. However, if it was declared as an
> array, it might encourage such (wrong) usage.
>
> In particular, nsh_hdr->md2[1] is always wrong.
>
> It seems better to not declare md2 as an array.

I understand your concern. But not declaring md2 as an array is wrong as
well, as there might not be any MD2 TLV context header at all. An MD2 NSH
header is perfectly valid without any TLV. So in any case the user of the
struct needs to be aware of the NSH semantics.

> > I wonder about the possible 16-bit alignment of the 32-bit fields,
> > though. How is that handled in the kernel?
>
> get_unaligned_*
>
> > Also struct nsh_md1_ctx has 32-bit members, which might not be 32-bit
> > aligned in the packet.
>
> I don't see that happening. It seems the header before md1 is 8 bytes
> and sizeof(md1) is 32 bytes? And for md2, the standard mandates that the
> md2 size is a multiple of 4 bytes, too.

NSH can be carried over Ethernet with a 14 byte header. In that case the
total NSH header would typically be 16-bit aligned, so that all 32-bit
members would be misaligned.
RE: [PATCH net-next v4] openvswitch: enable NSH support
> The second member of the union should be a variable length array [] of
> struct nsh_md2_tlv:
>
> struct nsh_hdr {
>     ovs_be16 ver_flags_ttl_len;
>     uint8_t mdtype;
>     uint8_t np;
>     ovs_16aligned_be32 path_hdr;
>     union {
>         struct nsh_md1_ctx md1;
>         struct nsh_md2_tlv md2[];
>     };
> };
>
> That was the original design before Ben removed it due to missing
> support in the Microsoft compiler. For the kernel datapath we should go
> back to that.
>
> I wonder about the possible 16-bit alignment of the 32-bit fields,
> though. How is that handled in the kernel? Also struct nsh_md1_ctx has
> 32-bit members, which might not be 32-bit aligned in the packet.

One afterthought: I think it would be good to add another member to the
union to represent unstructured context headers as used, e.g., in the
context of push_nsh:

struct nsh_hdr {
    ovs_be16 ver_flags_ttl_len;
    uint8_t mdtype;
    uint8_t np;
    ovs_16aligned_be32 path_hdr;
    union {
        uint8_t ctx_headers[];
        struct nsh_md1_ctx md1;
        struct nsh_md2_tlv md2[];
    };
};
RE: [PATCH net-next v4] openvswitch: enable NSH support
> struct nsh_hdr {
>     ovs_be16 ver_flags_ttl_len;
>     uint8_t mdtype;
>     uint8_t np;
>     ovs_16aligned_be32 path_hdr;
>     union {
>         struct nsh_md1_ctx md1;
>         struct nsh_md2_tlv md2;
>     };
> };

The second member of the union should be a variable length array [] of
struct nsh_md2_tlv:

struct nsh_hdr {
    ovs_be16 ver_flags_ttl_len;
    uint8_t mdtype;
    uint8_t np;
    ovs_16aligned_be32 path_hdr;
    union {
        struct nsh_md1_ctx md1;
        struct nsh_md2_tlv md2[];
    };
};

That was the original design before Ben removed it due to missing support
in the Microsoft compiler. For the kernel datapath we should go back to
that.

I wonder about the possible 16-bit alignment of the 32-bit fields, though.
How is that handled in the kernel? Also struct nsh_md1_ctx has 32-bit
members, which might not be 32-bit aligned in the packet.

BR, Jan
RE: [PATCH net-next v2] openvswitch: enable NSH support
> From: Jiri Benc [mailto:jb...@redhat.com]
> Sent: Monday, 14 August, 2017 12:48
>
> On Mon, 14 Aug 2017 10:35:42 +, Jan Scheurich wrote:
> > Is it worth to speculate on how a hypothetical future NSH version
> > (with a different Version value in the Base header) might look like?
>
> Absolutely. This is uAPI we're talking about and once merged, it's set
> in stone. Whatever we come up with should allow future extensibility.
>
> > If this should ever arise, we could introduce a new push_nsh_v2
> > action.
>
> Which would mean we failed with the design. We would have to maintain
> two parallel APIs forever. This is not how the design should be made.

Well, if that is the ultimate design goal, we'll have no choice. But if the
hypothetical next NSH version looks entirely different, we will have to
manage the incompatibility anyhow at the level of the nested attributes. So
the only thing we will manage to keep fixed for sure might be the empty
wrapper OVS_ACTION_ATTR_PUSH_NSH...

We decided earlier on to go for dedicated push/pop actions in the datapath
instead of a single pair of (very polymorphic) generic encap/decap datapath
actions, to make the implementation in the datapath more explicit and
straightforward. Why not follow the same philosophy also when we need to
support push/pop for incompatible versions of the same protocol?

Jan
RE: [PATCH net-next v2] openvswitch: enable NSH support
> From: Jiri Benc [mailto:jb...@redhat.com]
> Sent: Monday, 14 August, 2017 09:51
>
> On Sun, 13 Aug 2017 21:13:57 +, Jan Scheurich wrote:
> > Jiri, I am not too familiar with conventions on the OVS netlink
> > interface regarding the handling of variable length fields. What is
> > the benefit of structuring the push_nsh action into
> >
> > OVS_ACTION_ATTR_PUSH_NSH
> > +-- OVS_ACTION_ATTR_NSH_BASE_HEADER
> > +-- OVS_ACTION_ATTR_NSH_MD
> >
> > instead of grouping the base header fields and the variable length MD
> > into a single monolithic attribute OVS_ACTION_ATTR_PUSH_NSH? Is the
> > main concern a potential future need to add additional parameters to
> > the push_nsh datapath action? Are there examples for such structured
> > actions other than OVS_ACTION_ATTR_CT where the need is obvious
> > because it is very polymorphic?
>
> This is about having the design clean. What would you do if you had two
> variable length fields? Would you still put them in a single structure
> and have a length field in the structure, too? That would be wrong, we
> have length in the attribute header. I doubt you would do that. Which
> indicates that putting variable length fields to an attribute with
> anything other is wrong.
>
> Also, look at how ugly the code would be. You'd have to subtract the
> base header length from the attribute length to get the variable data
> length. That's not nice at all.
>
> Think about the netlink in the way that by default, every field should
> have its own attribute. Structures are included only for performance
> reasons where certain fields are always passed in a message and having
> them in separate attributes would be impractical and waste of space.
> Going out of your way to include the context data in the structure thus
> doesn't make sense.
>
> > BTW: The name OVS_ACTION_ATTR_NSH_BASE_HEADER is misleading because
> > in the NSH draft the term base header is used for the first 32-bit
> > word, whereas here it includes also the 32-bit Service Path header.
>
> An excellent comment. This means that it's very well possible that
> future NSH versions may not include the SP header or may have it of a
> different size. Maybe we should consider putting it into a separate
> attribute, too? Not sure it is needed, though, I don't think it's
> likely to happen.

I think the NSH RFC draft is pretty clear that the NSH header (Version 0)
will always contain the Base Header, the Service Header and a (possibly
empty) set of Context Headers. The length of the context headers is implied
by the total NSH header Length in the Base Header. The internal structure
of the context headers, implied by the "MD type" in the Base Header, is
irrelevant for the push_nsh action as it is managed by the ofproto-dpif
layer. The current NSH header format simply does not allow for a second
variable-length section in the header.

Is it worth speculating on how a hypothetical future NSH version (with a
different Version value in the Base Header) might look? If this should ever
arise, we could introduce a new push_nsh_v2 action.

In summary, I'm still not convinced that we gain any significant generality
by splitting up the OVS_ACTION_ATTR_PUSH_NSH action into nested attributes.
It will only slow down the execution of the action in the datapath (or
require some optimization in the datapath to merge the attributes into a
single internal action struct).

Note: OVS_KEY_ATTR_NSH we should split into OVS_KEY_ATTR_NSH_FIXED_HEADER
(covering the Base and Service headers) and OVS_KEY_ATTR_NSH_MD1.

Jan
RE: [PATCH net-next v2] openvswitch: enable NSH support
> From: Jiri Benc [mailto:jb...@redhat.com]
> Sent: Friday, 11 August, 2017 12:23
>
> On Fri, 11 Aug 2017 10:09:36 +, Jan Scheurich wrote:
> > Unless someone can explain to me why the datapath should understand
> > the internal structure/format of metadata in push_nsh, I would
> > strongly vote to keep the metadata as variable length octet sequence
> > in the non-structured OVS_ACTION_ATTR_PUSH_NSH
>
> Could be but it still needs to be in a different attribute and not in
> the ovs_action_push_nsh structure.
>
> Separate attributes for MD1/MD2 have the advantage of easier validation:
> with a separate MD1 type attribute, the size check is easier. With an
> unstructured MD attribute, we'd need to look into the
> OVS_ACTION_ATTR_NSH_BASE_HEADER attribute for mdtype and then validate
> the unstructured MD attribute size manually. Not a big deal, though.
> I don't have a strong opinion here.
>
> But I do have a strong opinion that MD needs to go into a separate
> attribute, whether there are separate attributes for MD1/2 or not.

Jiri, I am not too familiar with conventions on the OVS netlink interface
regarding the handling of variable length fields. What is the benefit of
structuring the push_nsh action into

OVS_ACTION_ATTR_PUSH_NSH
+-- OVS_ACTION_ATTR_NSH_BASE_HEADER
+-- OVS_ACTION_ATTR_NSH_MD

instead of grouping the base header fields and the variable length MD into
a single monolithic attribute OVS_ACTION_ATTR_PUSH_NSH? Is the main concern
a potential future need to add additional parameters to the push_nsh
datapath action? Are there examples of such structured actions other than
OVS_ACTION_ATTR_CT, where the need is obvious because it is very
polymorphic?

BR, Jan

BTW: The name OVS_ACTION_ATTR_NSH_BASE_HEADER is misleading because in the
NSH draft the term base header is used for the first 32-bit word, whereas
here it also includes the 32-bit Service Path header.
RE: [PATCH net-next v2] openvswitch: enable NSH support
> -----Original Message-----
> From: Jiri Benc [mailto:jb...@redhat.com]
> Sent: Friday, 11 August, 2017 11:45
>
> The context field does not apply to MD type 2. It looks wrong for the
> context field to be included in the netlink attribute for anything other
> than MD type 1. Perhaps it needs to be put into a separate attribute,
> too?
>
> Note that I'm talking only about the uAPI. Internally, ovs can use
> struct ovs_key_nsh that is MD type 1 only, there's no problem changing
> that later. But for the user space interface, this needs to be clean.
> This can be solved for example this way:
>
> In include/uapi/linux/openvswitch.h:
>
> struct ovs_key_nsh_base {
>     __u8 flags;
>     __u8 mdtype;
>     __u8 np;
>     __u8 pad;
>     __be32 path_hdr;
> };
>
> + one more netlink attribute carrying MD type 1 info. Will probably
> require changing OVS_KEY_ATTR_NSH to a nested attribute etc.
>
> In net/openvswitch/flow.h (or perhaps a different header would be more
> appropriate?):
>
> struct ovs_key_nsh {
>     struct ovs_key_nsh_base base;
>     __be32 context[4];
> };
>
> Plus the needed conversions between OVS_KEY_ATTR_NSH and struct
> ovs_key_nsh when interfacing between the kernel and user space.
>
> That way, we can have MD type 1 support only for now while still being
> allowed to redesign things in whatever way later.
>
> Jiri

For the OVS_KEY_ATTR_NSH I agree to move the fixed-size MD1 context headers
from nsh_base to a separate struct and use nested netlink attributes.

For the OVS_ACTION_ATTR_PUSH_NSH attribute, any metadata included is opaque
to the datapath and should be copied as is into the packet header. I doubt
that there is any benefit in modeling this with nested attributes for MD1
or MD2. This just adds complexity on the sender and receiver side and
requires updates in case other MD types are added later on.
Unless someone can explain to me why the datapath should understand the
internal structure/format of metadata in push_nsh, I would strongly vote to
keep the metadata as a variable length octet sequence in the non-structured
OVS_ACTION_ATTR_PUSH_NSH attribute.

BR, Jan
RE: [ovs-dev] [PATCH net-next] openvswitch: add NSH support
Hi all,

In OVS 2.8 we support only fixed-size NSH MD1 context data for matching and
in set/copy_field actions. OVS parses an MD2 NSH header but does not make
any TLV headers available to OpenFlow. The plan is to add support for
matching/manipulating NSH MD2 TLVs through a new infrastructure of generic
TLV match fields that can be dynamically mapped to specific protocol TLVs,
similar to the way this is done for GENEVE tunnel metadata TLVs today. But
this is work for an upcoming OVS release.

However, in the encap() and decap() NSH actions we do support the MD2
format already. The encap_nsh datapath action is agnostic of the MD format.
Any MD2 TLV metadata are provided as encap properties in the OF encap()
operation. They are translated by the ofproto layer and forwarded as an
opaque byte sequence in the encap_nsh datapath action. Conversely, the
decap_nsh() action pops any TLV metadata using the metadata length in the
NSH header.

Consequently the datapath action OVS_ACTION_ATTR_ENCAP_NSH is already
declared variable length:

odp_action_len(uint16_t type)
{
    switch ((enum ovs_action_attr) type) {
    ...
    case OVS_ACTION_ATTR_ENCAP_NSH: return ATTR_LEN_VARIABLE;
    case OVS_ACTION_ATTR_DECAP_NSH: return 0;
    ...
    }

Unfortunately, that has only partially been reflected in the rest of the
code. The action struct should have a variable length metadata[] member and
the function odp_put_encap_nsh_action() should set the action nl_attr
length dynamically. I'll provide a patch to fix that shortly.

BTW: I have no objections to renaming these datapath actions to push_nsh
and pop_nsh if that helps avoid confusion with the existing encap
attributes on the netlink interface. But we should do that quickly as it is
user-visible and affects unit test cases.
BR, Jan

> -----Original Message-----
> From: ovs-dev-boun...@openvswitch.org
> [mailto:ovs-dev-boun...@openvswitch.org] On Behalf Of Ben Pfaff
> Sent: Wednesday, 09 August, 2017 04:42
> To: Yang, Yi Y
> Cc: d...@openvswitch.org; netdev@vger.kernel.org; Jiri Benc
> ; da...@davemloft.net
> Subject: Re: [ovs-dev] [PATCH net-next] openvswitch: add NSH support
>
> To be clear, the OVS implementation is a placeholder. It will get
> replaced by whatever netdev implements, and that's OK. I didn't focus
> on making it perfect because I knew that. Instead, I just made sure it
> was good enough for an internal OVS implementation that doesn't fix any
> ABI or API. OVS can even change the user-visible action names, as long
> as we do that soon (and encap/decap versus push/pop doesn't matter to
> me).
>
> The considerations for netdev are different and more permanent.
>
> On Wed, Aug 09, 2017 at 02:05:12AM +, Yang, Yi Y wrote:
> > Hi, Jiri
> >
> > Thank you for your comments.
> >
> > __be32 c[4] is the name Ben Pfaff suggested; the original names were
> > c1, c2, c3, c4. They are context data, so c seems ok, too :-)
> >
> > OVS has merged it and has the same name; maybe the better way is to
> > add the comment /* Context data */ after it.
> >
> > For MD type 2, struct ovs_key_nsh is very difficult to cover; so far
> > we don't know how to support MD type 2 better. Geneve defined 64 keys
> > tun_metadata0-63 to handle this, and those keys are part of struct
> > flow_tnl, so the highest possibility is to reuse those keys.
> >
> > So for future MD type 2 we will have two sets of keys, one from struct
> > ovs_key_nsh and another from struct flow_tnl; this won't break the old
> > uAPI.
> >
> > "#define OVS_ENCAP_NSH_MAX_MD_LEN 16" was changed from 256 per Ben's
> > comment; Ben thinks 256 is too big given we only support MD type 1
> > now.
> > We still have ways to extend it, for example:
> >
> > struct ovs_action_encap_nsh *oaen = (struct ovs_action_encap_nsh *)
> >     malloc(sizeof(struct ovs_action_encap_nsh) + ANY_SIZE);
> >
> > nl_msg_put_unspec(actions, OVS_ACTION_ATTR_ENCAP_NSH,
> >                   oaen, sizeof(struct ovs_action_encap_nsh) + ANY_SIZE);
> >
> > In addition, we also need to consider that the OVS userspace code must
> > be consistent with this, so keeping it intact will be better; we have
> > a way to support dynamic extension when we add MD type 2 support.
> >
> > About the action names: unfortunately, the userspace data plane has
> > already named them encap_nsh & decap_nsh. Jan, what do you think about
> > Jiri's suggestion?
> >
> > But from my understanding, encap_* & decap_* are better because they
> > correspond to the generic encap & decap actions. In addition, encap
> > semantics are different from push: encap just pushes an empty header
> > with default values, and users must use set_field to set the content
> > of the header.
> >
> > Again, the OVS userspace code must be consistent with this, so keeping
> > it intact will be better because the OVS userspace code was there
> > first.
> >
> > -----Original Message-----
> > From: netdev-ow...@vger.kernel.org
> > [mailto:netdev-ow...@vger.kernel.org] On Behalf Of Jiri Benc
> > Sent: Tuesday, August 8, 2017 10:28 PM
> > To: Yang, Yi