Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API
On 12/14/2016 01:54 PM, Adrien Mazarguil wrote: >> >>> + * @param[out] error >>> + * Perform verbose error reporting if not NULL. >>> + * >>> + * @return >>> + * 0 on success, a negative errno value otherwise and rte_errno is >>> set. >>> + */ >>> +int >>> +rte_flow_query(uint8_t port_id, >>> + struct rte_flow *flow, >>> + enum rte_flow_action_type action, >>> + void *data, >>> + struct rte_flow_error *error); >>> + >>> +#ifdef __cplusplus >>> +} >>> +#endif >> >> I don't see a way to dump all the rules for a port out. I think this is >> neccessary for degbugging. You could have a look through dpif.h in OVS >> and see how dpif_flow_dump_next() is used, it might be a good reference. > > DPDK does not maintain flow rules and, depending on hardware capabilities > and level of compliance, PMDs do not necessarily do it either, > particularly > since it requires space and application probably have a better method to > store these pointers for their own needs. understood > > What you see here is only a PMD interface. Depending on applications > needs, > generic helper functions built on top of these may be added to manage flow > rules in the future. I'm thinking of the case where something goes wrong and I want to get a dump of all the flow rules from hardware, not query the rules I think I have. I don't see a way to do it or something to build a helper on top of? >>> >>> Generic helper functions would exist on top of this API and would likely >>> maintain a list of flow rules themselves. The dump in that case would be >>> entirely implemented in software. I think that recovering flow rules from HW >>> may be complicated in many cases (even without taking storage allocation and >>> rules conversion issues into account), therefore if there is really a need >>> for it, we could perhaps add a dump() function that PMDs are free to >>> implement later. >>> >> >> ok. Maybe there are some more generic stats that can be got from the >> hardware that would help debugging that would suffice, like total flow >> rule hits/misses (i.e. not on a per flow rule basis). >> >> You can get this from the software flow caches and it's widely used for >> debugging. e.g. >> >> pmd thread numa_id 0 core_id 3: >> emc hits:0 >> megaflow hits:0 >> avg. subtable lookups per hit:0.00 >> miss:0 >> > > Perhaps a rule such as the following could do the trick: > > group: 42 (or priority 42) > pattern: void > actions: count / passthru > > Assuming useful flow rules are defined with higher priorities (using lower > group ID or priority level) and provide a terminating action, this one would > count all packets that were not caught by them. > > That is one example to illustrate how "global" counters can be requested by > applications. > > Otherwise you could just make sure all rules contain mark / flag actions, in > which case mbufs would tell directly if they went through them or need > additional SW processing. > ok, sounds like there's some options at least to work with on this which is good. thanks.
Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API
Hi Kevin, On Wed, Dec 14, 2016 at 11:48:04AM +, Kevin Traynor wrote: > hi Adrien, sorry for the delay > > <...> > > > Is it expected that the application or pmd will provide locking between > these functions if required? I think it's going to have to be the app. > >>> > >>> Locking is indeed expected to be performed by applications. This API only > >>> documents places where locking would make sense if necessary and expected > >>> behavior. > >>> > >>> Like all control path APIs, this one assumes a single control thread. > >>> Applications must take the necessary precautions. > >> > >> If you look at OVS now it's quite possible that you have 2 rx queues > >> serviced by different threads, that would also install the flow rules in > >> the software flow caches - possibly that could extend to adding hardware > >> flows. There could also be another thread that is querying for stats. So > >> anything that can be done to minimise the locking would be helpful - > >> maybe query() could be atomic and not require any locking? > > > > I think we need basic functions with as few constraints as possible on PMDs > > first, this API being somewhat complex to implement on their side. That > > covers the common use case where applications have a single control thread > > or otherwise perform locking on their own. > > > > Once the basics are there for most PMDs, we may add new functions, items, > > properties and actions that provide additional constraints (timing, > > multi-threading and so on), which remain to be defined according to > > feedback. It is designed to be extended without causing ABI breakage. > > I think Sugesh and I are trying to foresee some of the issues that may > arise when integrating with something like OVS. OTOH it's > hard/impossible to say what will be needed exactly in the API right now > to make it suitable for OVS. > > So, I'm ok with the approach you are taking by exposing a basic API > but I think there should be an expectation that it may not be sufficient > for a project like OVS to integrate in and may take several > iterations/extensions - don't go anywhere! > > > > > As for query(), let's see how PMDs handle it first. A race between query() > > and create() on a given device is almost unavoidable without locking, same > > for queries that reset counters in a given flow rule. Basic parallel queries > > should not cause any harm otherwise, although this cannot be guaranteed yet. > > You still have a race if there is locking, except it is for the lock, > but it has the same effect. The downside of my suggestion is that all > the PMDs would need to guarantee they could gets stats atomically - I'm > not sure if they can or it's too restrictive. > > > > > <...> > > >> > >>> > > + > > +/** > > + * Destroy a flow rule on a given port. > > + * > > + * Failure to destroy a flow rule handle may occur when other flow > > rules > > + * depend on it, and destroying it would result in an inconsistent > > state. > > + * > > + * This function is only guaranteed to succeed if handles are > > destroyed in > > + * reverse order of their creation. > > How can the application find this information out on error? > >>> > >>> Without maintaining a list, they cannot. The specified case is the only > >>> possible guarantee. That does not mean PMDs should not do their best to > >>> destroy flow rules, only that ordering must remain consistent in case of > >>> inability to destroy one. > >>> > >>> What do you suggest? > >> > >> I think if the app cannot remove a specific rule it may want to remove > >> all rules and deal with flows in software for a time. So once the app > >> knows it fails that should be enough. > > > > OK, then since destruction may return an error already, is it fine? > > Applications may call rte_flow_flush() (not supposed to fail unless there is > > a serious issue, abort() in that case) and switch to SW fallback. > > yes, it's fine. > > > > > <...> > > > + * @param[out] error > > + * Perform verbose error reporting if not NULL. > > + * > > + * @return > > + * 0 on success, a negative errno value otherwise and rte_errno is > > set. > > + */ > > +int > > +rte_flow_query(uint8_t port_id, > > + struct rte_flow *flow, > > + enum rte_flow_action_type action, > > + void *data, > > + struct rte_flow_error *error); > > + > > +#ifdef __cplusplus > > +} > > +#endif > > I don't see a way to dump all the rules for a port out. I think this is > neccessary for degbugging. You could have a look through dpif.h in OVS > and see how dpif_flow_dump_next() is used, it might be a good reference. > >>> > >>> DPDK does not maintain flow rules and, depending on hardware capabilities > >>> and level of compliance, PMDs do not necessarily do it either, > >>> particularly > >>>
Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API
hi Adrien, sorry for the delay <...> Is it expected that the application or pmd will provide locking between these functions if required? I think it's going to have to be the app. >>> >>> Locking is indeed expected to be performed by applications. This API only >>> documents places where locking would make sense if necessary and expected >>> behavior. >>> >>> Like all control path APIs, this one assumes a single control thread. >>> Applications must take the necessary precautions. >> >> If you look at OVS now it's quite possible that you have 2 rx queues >> serviced by different threads, that would also install the flow rules in >> the software flow caches - possibly that could extend to adding hardware >> flows. There could also be another thread that is querying for stats. So >> anything that can be done to minimise the locking would be helpful - >> maybe query() could be atomic and not require any locking? > > I think we need basic functions with as few constraints as possible on PMDs > first, this API being somewhat complex to implement on their side. That > covers the common use case where applications have a single control thread > or otherwise perform locking on their own. > > Once the basics are there for most PMDs, we may add new functions, items, > properties and actions that provide additional constraints (timing, > multi-threading and so on), which remain to be defined according to > feedback. It is designed to be extended without causing ABI breakage. I think Sugesh and I are trying to foresee some of the issues that may arise when integrating with something like OVS. OTOH it's hard/impossible to say what will be needed exactly in the API right now to make it suitable for OVS. So, I'm ok with the approach you are taking by exposing a basic API but I think there should be an expectation that it may not be sufficient for a project like OVS to integrate in and may take several iterations/extensions - don't go anywhere! > > As for query(), let's see how PMDs handle it first. A race between query() > and create() on a given device is almost unavoidable without locking, same > for queries that reset counters in a given flow rule. Basic parallel queries > should not cause any harm otherwise, although this cannot be guaranteed yet. You still have a race if there is locking, except it is for the lock, but it has the same effect. The downside of my suggestion is that all the PMDs would need to guarantee they could gets stats atomically - I'm not sure if they can or it's too restrictive. > <...> >> >>> > + > +/** > + * Destroy a flow rule on a given port. > + * > + * Failure to destroy a flow rule handle may occur when other flow rules > + * depend on it, and destroying it would result in an inconsistent state. > + * > + * This function is only guaranteed to succeed if handles are destroyed > in > + * reverse order of their creation. How can the application find this information out on error? >>> >>> Without maintaining a list, they cannot. The specified case is the only >>> possible guarantee. That does not mean PMDs should not do their best to >>> destroy flow rules, only that ordering must remain consistent in case of >>> inability to destroy one. >>> >>> What do you suggest? >> >> I think if the app cannot remove a specific rule it may want to remove >> all rules and deal with flows in software for a time. So once the app >> knows it fails that should be enough. > > OK, then since destruction may return an error already, is it fine? > Applications may call rte_flow_flush() (not supposed to fail unless there is > a serious issue, abort() in that case) and switch to SW fallback. yes, it's fine. > <...> > + * @param[out] error > + * Perform verbose error reporting if not NULL. > + * > + * @return > + * 0 on success, a negative errno value otherwise and rte_errno is set. > + */ > +int > +rte_flow_query(uint8_t port_id, > +struct rte_flow *flow, > +enum rte_flow_action_type action, > +void *data, > +struct rte_flow_error *error); > + > +#ifdef __cplusplus > +} > +#endif I don't see a way to dump all the rules for a port out. I think this is neccessary for degbugging. You could have a look through dpif.h in OVS and see how dpif_flow_dump_next() is used, it might be a good reference. >>> >>> DPDK does not maintain flow rules and, depending on hardware capabilities >>> and level of compliance, PMDs do not necessarily do it either, particularly >>> since it requires space and application probably have a better method to >>> store these pointers for their own needs. >> >> understood >> >>> >>> What you see here is only a PMD interface. Depending on applications needs, >>> generic helper functions built on top of these may be added to manage flow >>> rules in the future. >> >> I'm thinking of the case where so
Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API
Hi Sugesh, On Mon, Dec 12, 2016 at 10:20:18AM +, Chandran, Sugesh wrote: > Hi Adrien, > > Regards > _Sugesh > > > -Original Message- > > From: Adrien Mazarguil [mailto:adrien.mazarg...@6wind.com] > > Sent: Friday, December 9, 2016 4:39 PM > > To: Chandran, Sugesh > > Cc: Kevin Traynor ; dev@dpdk.org; Thomas > > Monjalon ; De Lara Guarch, Pablo > > ; Olivier Matz ; > > sugesh.chand...@intel.comn > > Subject: Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API > > > > Hi Sugesh, > > > > On Fri, Dec 09, 2016 at 12:18:03PM +, Chandran, Sugesh wrote: > > [...] > > > > > Are you going to provide any control over the initialization of > > > > > NIC to define the capability matrices For eg; To operate in a L3 > > > > > router mode, > > > > software wanted to initialize the NIC port only to consider the L2 > > > > and L3 fields. > > > > > I assume the initialization is done based on the first rules that > > > > > are > > > > programmed into the NIC.? > > > > > > > > Precisely, PMDs are supposed to determine the most appropriate > > > > device mode to use in order to handle the requested rules. They may > > > > even switch to another mode if necessary assuming this does not > > > > break existing constraints. > > > > > > > > I think we've discussed an atomic (commit-based) mode of operation > > > > through separate functions as well, where the application would > > > > attempt to create a bunch of rules at once, possibly making it > > > > easier for PMDs to determine the most appropriate mode of operation > > for the device. > > > > > > > > All of these may be added later according to users feedback once the > > > > basic API has settled. > > > [Sugesh] Yes , we discussed about this before. However I feel that, it > > > make sense to provide some flexibility to the user/application to define a > > profile/mode of the device. > > > This way the complexity of determining the mode by itself will be taken > > away from PMD. > > > Looking at the P4 enablement patches in OVS, the mode definition APIs > > > can be used in conjunction > > > P4 behavioral model. > > > For eg: A P4 model for a L2 switch operate OVS as a L2 switch. Using > > > the mode definition APIs Its possible to impose the same behavioral model > > in the hardware too. > > > This way its simple, clean and very predictive though it needs to define > > > an > > additional profile_define APIs. > > > I am sorry to provide the comment at this stage, However looking at > > > the adoption of ebpf, P4 make me to think this way. > > > What do you think? > > > > What you suggest (device profile configuration) would be done by a separate > > function in any case, so as long as everyone agrees on a generic method to > > do so, no problem with extending rte_flow. By default in the meantime we'll > > have to rely on PMDs to make the right decision. > [Sugesh] I am fine with PMD is making the decision on profile/mode selection > in > Default case. However we must provide an option for the application to define > a mode > and PMD must honor with it to avoid making an invalid mode change. > > > > Do you think it has to be defined from the beginning? > [Sugesh] I feel it's going to be another big topic to decide how proposed > mode implementation will be looks like, > What should be available modes and etc. So I am OK to consider as its not > part of this flow API definition for now. > However its good to mention that in the API comments section to be aware. Do > you agree that? Will do, I'll mention it in the "future evolutions" section. -- Adrien Mazarguil 6WIND
Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API
Hi Adrien, Regards _Sugesh > -Original Message- > From: Adrien Mazarguil [mailto:adrien.mazarg...@6wind.com] > Sent: Friday, December 9, 2016 4:39 PM > To: Chandran, Sugesh > Cc: Kevin Traynor ; dev@dpdk.org; Thomas > Monjalon ; De Lara Guarch, Pablo > ; Olivier Matz ; > sugesh.chand...@intel.comn > Subject: Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API > > Hi Sugesh, > > On Fri, Dec 09, 2016 at 12:18:03PM +, Chandran, Sugesh wrote: > [...] > > > > Are you going to provide any control over the initialization of > > > > NIC to define the capability matrices For eg; To operate in a L3 > > > > router mode, > > > software wanted to initialize the NIC port only to consider the L2 > > > and L3 fields. > > > > I assume the initialization is done based on the first rules that > > > > are > > > programmed into the NIC.? > > > > > > Precisely, PMDs are supposed to determine the most appropriate > > > device mode to use in order to handle the requested rules. They may > > > even switch to another mode if necessary assuming this does not > > > break existing constraints. > > > > > > I think we've discussed an atomic (commit-based) mode of operation > > > through separate functions as well, where the application would > > > attempt to create a bunch of rules at once, possibly making it > > > easier for PMDs to determine the most appropriate mode of operation > for the device. > > > > > > All of these may be added later according to users feedback once the > > > basic API has settled. > > [Sugesh] Yes , we discussed about this before. However I feel that, it > > make sense to provide some flexibility to the user/application to define a > profile/mode of the device. > > This way the complexity of determining the mode by itself will be taken > away from PMD. > > Looking at the P4 enablement patches in OVS, the mode definition APIs > > can be used in conjunction > > P4 behavioral model. > > For eg: A P4 model for a L2 switch operate OVS as a L2 switch. Using > > the mode definition APIs Its possible to impose the same behavioral model > in the hardware too. > > This way its simple, clean and very predictive though it needs to define an > additional profile_define APIs. > > I am sorry to provide the comment at this stage, However looking at > > the adoption of ebpf, P4 make me to think this way. > > What do you think? > > What you suggest (device profile configuration) would be done by a separate > function in any case, so as long as everyone agrees on a generic method to > do so, no problem with extending rte_flow. By default in the meantime we'll > have to rely on PMDs to make the right decision. [Sugesh] I am fine with PMD is making the decision on profile/mode selection in Default case. However we must provide an option for the application to define a mode and PMD must honor with it to avoid making an invalid mode change. > > Do you think it has to be defined from the beginning? [Sugesh] I feel it's going to be another big topic to decide how proposed mode implementation will be looks like, What should be available modes and etc. So I am OK to consider as its not part of this flow API definition for now. However its good to mention that in the API comments section to be aware. Do you agree that? > > -- > Adrien Mazarguil > 6WIND
Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API
Hi Sugesh, On Fri, Dec 09, 2016 at 12:18:03PM +, Chandran, Sugesh wrote: [...] > > > Are you going to provide any control over the initialization of NIC > > > to define the capability matrices For eg; To operate in a L3 router mode, > > software wanted to initialize the NIC port only to consider the L2 and L3 > > fields. > > > I assume the initialization is done based on the first rules that are > > programmed into the NIC.? > > > > Precisely, PMDs are supposed to determine the most appropriate device > > mode to use in order to handle the requested rules. They may even switch > > to another mode if necessary assuming this does not break existing > > constraints. > > > > I think we've discussed an atomic (commit-based) mode of operation > > through separate functions as well, where the application would attempt to > > create a bunch of rules at once, possibly making it easier for PMDs to > > determine the most appropriate mode of operation for the device. > > > > All of these may be added later according to users feedback once the basic > > API has settled. > [Sugesh] Yes , we discussed about this before. However I feel that, it make > sense > to provide some flexibility to the user/application to define a profile/mode > of the device. > This way the complexity of determining the mode by itself will be taken away > from PMD. > Looking at the P4 enablement patches in OVS, the mode definition APIs can be > used in conjunction > P4 behavioral model. > For eg: A P4 model for a L2 switch operate OVS as a L2 switch. Using the mode > definition APIs > Its possible to impose the same behavioral model in the hardware too. > This way its simple, clean and very predictive though it needs to define an > additional profile_define APIs. > I am sorry to provide the comment at this stage, However looking at the > adoption of ebpf, P4 make me > to think this way. > What do you think? What you suggest (device profile configuration) would be done by a separate function in any case, so as long as everyone agrees on a generic method to do so, no problem with extending rte_flow. By default in the meantime we'll have to rely on PMDs to make the right decision. Do you think it has to be defined from the beginning? -- Adrien Mazarguil 6WIND
Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API
Hi Adrien, Thank you for your comments, Please see the reply below. Regards _Sugesh > -Original Message- > From: Adrien Mazarguil [mailto:adrien.mazarg...@6wind.com] > Sent: Thursday, December 8, 2016 3:09 PM > To: Chandran, Sugesh > Cc: Kevin Traynor ; dev@dpdk.org; Thomas > Monjalon ; De Lara Guarch, Pablo > ; Olivier Matz ; > sugesh.chand...@intel.comn > Subject: Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API > > Hi Sugesh, > > On Tue, Dec 06, 2016 at 06:11:38PM +, Chandran, Sugesh wrote: > [...] > > > >>> +int > > > >>> +rte_flow_validate(uint8_t port_id, > > > >>> + const struct rte_flow_attr *attr, > > > >>> + const struct rte_flow_item pattern[], > > > >>> + const struct rte_flow_action actions[], > > > >>> + struct rte_flow_error *error); > > > >> > > > >> Why not just use rte_flow_create() and get an error? Is it less > > > >> disruptive to do a validate and find the rule cannot be created, > > > >> than using a create directly? > > > > > > > > The rationale can be found in the original RFC, which I'll convert > > > > to actual documentation in v2. In short: > > > > > > > > - Calling rte_flow_validate() before rte_flow_create() is useless since > > > > rte_flow_create() also performs validation. > > > > > > > > - We cannot possibly express a full static set of allowed flow rules, > > > > even > > > > if we could, it usually depends on the current hardware configuration > > > > therefore would not be static. > > > > > > > > - rte_flow_validate() is thus provided as a replacement for capability > > > > flags. It can be used to determine during initialization if the > > > > underlying > > > > device can support the typical flow rules an application might want to > > > > provide later and do something useful with that information (e.g. > always > > > > use software fallback due to HW limitations). > > > > > > > > - rte_flow_validate() being a subset of rte_flow_create(), it is > essentially > > > > free to expose. > > > > > > make sense now, thanks. > > [Sugesh] : We had this discussion earlier at the design stage about > > the time taken for programming the hardware, and how to make it > > deterministic. How about having a timeout parameter as well for the > > rte_flow_* If the hardware flow insert is timed out, error out than > > waiting indefinitely, so that application have some control over The > > time to program the flow. It can be another set of APIs something > > like, rte_flow_create_timeout() > > Yes as discussed the existing API does not provide any timing constraints to > PMDs, validate() and create() may take forever to complete, although PMDs > are strongly encouraged to take as little time as possible. > > Like you suggested, this could be done through distinct API calls. The > validate() function would also have its _timeout() counterpart since the set > of possible rules could be restricted in that mode. [Sugesh] Thanks!. Looking forward to see an api set with that implementation as well in the future :). I feel it's a must from the user application point of view. > > > Are you going to provide any control over the initialization of NIC > > to define the capability matrices For eg; To operate in a L3 router mode, > software wanted to initialize the NIC port only to consider the L2 and L3 > fields. > > I assume the initialization is done based on the first rules that are > programmed into the NIC.? > > Precisely, PMDs are supposed to determine the most appropriate device > mode to use in order to handle the requested rules. They may even switch > to another mode if necessary assuming this does not break existing > constraints. > > I think we've discussed an atomic (commit-based) mode of operation > through separate functions as well, where the application would attempt to > create a bunch of rules at once, possibly making it easier for PMDs to > determine the most appropriate mode of operation for the device. > > All of these may be added later according to users feedback once the basic > API has settled. [Sugesh] Yes , we discussed about this before. However I feel that, it make sense to provide some flexibility to the user/application to define a profile/mode of the device. This way the complexity of determining the mode by itself will be taken away from PMD. Looking at the P4 enablement patches in OVS, the mode definition APIs can be used in conjunction P4 behavioral model. For eg: A P4 model for a L2 switch operate OVS as a L2 switch. Using the mode definition APIs Its possible to impose the same behavioral model in the hardware too. This way its simple, clean and very predictive though it needs to define an additional profile_define APIs. I am sorry to provide the comment at this stage, However looking at the adoption of ebpf, P4 make me to think this way. What do you think? > > -- > Adrien Mazarguil > 6WIND
Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API
On Fri, Dec 02, 2016 at 09:06:42PM +, Kevin Traynor wrote: > On 12/01/2016 08:36 AM, Adrien Mazarguil wrote: > > Hi Kevin, > > > > On Wed, Nov 30, 2016 at 05:47:17PM +, Kevin Traynor wrote: > >> Hi Adrien, > >> > >> On 11/16/2016 04:23 PM, Adrien Mazarguil wrote: > >>> This new API supersedes all the legacy filter types described in > >>> rte_eth_ctrl.h. It is slightly higher level and as a result relies more on > >>> PMDs to process and validate flow rules. > >>> > >>> Benefits: > >>> > >>> - A unified API is easier to program for, applications do not have to be > >>> written for a specific filter type which may or may not be supported by > >>> the underlying device. > >>> > >>> - The behavior of a flow rule is the same regardless of the underlying > >>> device, applications do not need to be aware of hardware quirks. > >>> > >>> - Extensible by design, API/ABI breakage should rarely occur if at all. > >>> > >>> - Documentation is self-standing, no need to look up elsewhere. > >>> > >>> Existing filter types will be deprecated and removed in the near future. > >> > >> I'd suggest to add a deprecation notice to deprecation.rst, ideally with > >> a target release. > > > > Will do, not a sure about the target release though. It seems a bit early > > since no PMD really supports this API yet. > > > > [...] > >>> diff --git a/lib/librte_ether/rte_flow.c b/lib/librte_ether/rte_flow.c > >>> new file mode 100644 > >>> index 000..064963d > >>> --- /dev/null > >>> +++ b/lib/librte_ether/rte_flow.c > >>> @@ -0,0 +1,159 @@ > >>> +/*- > >>> + * BSD LICENSE > >>> + * > >>> + * Copyright 2016 6WIND S.A. > >>> + * Copyright 2016 Mellanox. > >> > >> There's Mellanox copyright but you are the only signed-off-by - is that > >> right? > > > > Yes, I'm the primary maintainer for Mellanox PMDs and this API was designed > > on their behalf to expose several features from mlx4/mlx5 as the existing > > filter types had too many limitations. > > > > [...] > >>> +/* Get generic flow operations structure from a port. */ > >>> +const struct rte_flow_ops * > >>> +rte_flow_ops_get(uint8_t port_id, struct rte_flow_error *error) > >>> +{ > >>> + struct rte_eth_dev *dev = &rte_eth_devices[port_id]; > >>> + const struct rte_flow_ops *ops; > >>> + int code; > >>> + > >>> + if (unlikely(!rte_eth_dev_is_valid_port(port_id))) > >>> + code = ENODEV; > >>> + else if (unlikely(!dev->dev_ops->filter_ctrl || > >>> + dev->dev_ops->filter_ctrl(dev, > >>> + RTE_ETH_FILTER_GENERIC, > >>> + RTE_ETH_FILTER_GET, > >>> + &ops) || > >>> + !ops)) > >>> + code = ENOTSUP; > >>> + else > >>> + return ops; > >>> + rte_flow_error_set(error, code, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, > >>> +NULL, rte_strerror(code)); > >>> + return NULL; > >>> +} > >>> + > >> > >> Is it expected that the application or pmd will provide locking between > >> these functions if required? I think it's going to have to be the app. > > > > Locking is indeed expected to be performed by applications. This API only > > documents places where locking would make sense if necessary and expected > > behavior. > > > > Like all control path APIs, this one assumes a single control thread. > > Applications must take the necessary precautions. > > If you look at OVS now it's quite possible that you have 2 rx queues > serviced by different threads, that would also install the flow rules in > the software flow caches - possibly that could extend to adding hardware > flows. There could also be another thread that is querying for stats. So > anything that can be done to minimise the locking would be helpful - > maybe query() could be atomic and not require any locking? I think we need basic functions with as few constraints as possible on PMDs first, this API being somewhat complex to implement on their side. That covers the common use case where applications have a single control thread or otherwise perform locking on their own. Once the basics are there for most PMDs, we may add new functions, items, properties and actions that provide additional constraints (timing, multi-threading and so on), which remain to be defined according to feedback. It is designed to be extended without causing ABI breakage. As for query(), let's see how PMDs handle it first. A race between query() and create() on a given device is almost unavoidable without locking, same for queries that reset counters in a given flow rule. Basic parallel queries should not cause any harm otherwise, although this cannot be guaranteed yet. > > [...] > >>> +/** > >>> + * Flow rule attributes. > >>> + * > >>> + * Priorities are set on two levels: per group and per rule within > >>> groups. > >>> + * > >>> + * Lower values denote higher priority, the highest priority for both > >>> levels > >>> + * is
Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API
Hi Sugesh, On Tue, Dec 06, 2016 at 06:11:38PM +, Chandran, Sugesh wrote: [...] > > >>> +int > > >>> +rte_flow_validate(uint8_t port_id, > > >>> + const struct rte_flow_attr *attr, > > >>> + const struct rte_flow_item pattern[], > > >>> + const struct rte_flow_action actions[], > > >>> + struct rte_flow_error *error); > > >> > > >> Why not just use rte_flow_create() and get an error? Is it less > > >> disruptive to do a validate and find the rule cannot be created, than > > >> using a create directly? > > > > > > The rationale can be found in the original RFC, which I'll convert to > > > actual documentation in v2. In short: > > > > > > - Calling rte_flow_validate() before rte_flow_create() is useless since > > > rte_flow_create() also performs validation. > > > > > > - We cannot possibly express a full static set of allowed flow rules, even > > > if we could, it usually depends on the current hardware configuration > > > therefore would not be static. > > > > > > - rte_flow_validate() is thus provided as a replacement for capability > > > flags. It can be used to determine during initialization if the > > > underlying > > > device can support the typical flow rules an application might want to > > > provide later and do something useful with that information (e.g. always > > > use software fallback due to HW limitations). > > > > > > - rte_flow_validate() being a subset of rte_flow_create(), it is > > > essentially > > > free to expose. > > > > make sense now, thanks. > [Sugesh] : We had this discussion earlier at the design stage about the time > taken for programming the hardware, > and how to make it deterministic. How about having a timeout parameter as > well for the rte_flow_* > If the hardware flow insert is timed out, error out than waiting > indefinitely, so that application have some control over > The time to program the flow. It can be another set of APIs something like, > rte_flow_create_timeout() Yes as discussed the existing API does not provide any timing constraints to PMDs, validate() and create() may take forever to complete, although PMDs are strongly encouraged to take as little time as possible. Like you suggested, this could be done through distinct API calls. The validate() function would also have its _timeout() counterpart since the set of possible rules could be restricted in that mode. > Are you going to provide any control over the initialization of NIC to > define the capability matrices > For eg; To operate in a L3 router mode, software wanted to initialize the > NIC port only to consider the L2 and L3 fields. > I assume the initialization is done based on the first rules that are > programmed into the NIC.? Precisely, PMDs are supposed to determine the most appropriate device mode to use in order to handle the requested rules. They may even switch to another mode if necessary assuming this does not break existing constraints. I think we've discussed an atomic (commit-based) mode of operation through separate functions as well, where the application would attempt to create a bunch of rules at once, possibly making it easier for PMDs to determine the most appropriate mode of operation for the device. All of these may be added later according to users feedback once the basic API has settled. -- Adrien Mazarguil 6WIND
Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API
Hi Beilei, On Thu, Dec 08, 2016 at 09:00:05AM +, Xing, Beilei wrote: [...] > > +/** > > + * RTE_FLOW_ITEM_TYPE_ETH > > + * > > + * Matches an Ethernet header. > > + */ > > +struct rte_flow_item_eth { > > + struct ether_addr dst; /**< Destination MAC. */ > > + struct ether_addr src; /**< Source MAC. */ > > + unsigned int type; /**< EtherType. */ > Hi Adrien, > > ETHERTYPE in ether header is 2 bytes, so I think "uint16_t type" is more > appropriate here, what do you think? You're right, thanks for catching this. I'll update it in v2 (soon). -- Adrien Mazarguil 6WIND
Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API
> -Original Message- > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Adrien Mazarguil > Sent: Thursday, November 17, 2016 12:23 AM > To: dev@dpdk.org > Cc: Thomas Monjalon ; De Lara Guarch, > Pablo ; Olivier Matz > > Subject: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API > > This new API supersedes all the legacy filter types described in > rte_eth_ctrl.h. > It is slightly higher level and as a result relies more on PMDs to process and > validate flow rules. > > Benefits: > > - A unified API is easier to program for, applications do not have to be > written for a specific filter type which may or may not be supported by > the underlying device. > > - The behavior of a flow rule is the same regardless of the underlying > device, applications do not need to be aware of hardware quirks. > > - Extensible by design, API/ABI breakage should rarely occur if at all. > > - Documentation is self-standing, no need to look up elsewhere. > > Existing filter types will be deprecated and removed in the near future. > > Signed-off-by: Adrien Mazarguil > --- > MAINTAINERS| 4 + > lib/librte_ether/Makefile | 3 + > lib/librte_ether/rte_eth_ctrl.h| 1 + > lib/librte_ether/rte_ether_version.map | 10 + > lib/librte_ether/rte_flow.c| 159 + > lib/librte_ether/rte_flow.h| 947 > lib/librte_ether/rte_flow_driver.h | 177 ++ > 7 files changed, 1301 insertions(+) > > +/** > + * RTE_FLOW_ITEM_TYPE_ETH > + * > + * Matches an Ethernet header. > + */ > +struct rte_flow_item_eth { > + struct ether_addr dst; /**< Destination MAC. */ > + struct ether_addr src; /**< Source MAC. */ > + unsigned int type; /**< EtherType. */ Hi Adrien, ETHERTYPE in ether header is 2 bytes, so I think "uint16_t type" is more appropriate here, what do you think? Thanks, Beilei Xing > +}; > +
Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API
Hi Adrien, Thanks for sending out the patches, Please find few comments below, Regards _Sugesh > -Original Message- > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Kevin Traynor > Sent: Friday, December 2, 2016 9:07 PM > To: Adrien Mazarguil > Cc: dev@dpdk.org; Thomas Monjalon ; De > Lara Guarch, Pablo ; Olivier Matz > ; sugesh.chand...@intel.comn > Subject: Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API > >>>>>>Snipp > >>> + * > >>> + * Attaches a 32 bit value to packets. > >>> + * > >>> + * This value is arbitrary and application-defined. For > >>> +compatibility with > >>> + * FDIR it is returned in the hash.fdir.hi mbuf field. > >>> +PKT_RX_FDIR_ID is > >>> + * also set in ol_flags. > >>> + */ > >>> +struct rte_flow_action_mark { > >>> + uint32_t id; /**< 32 bit value to return with packets. */ }; > >> > >> One use case I thought we would be able to do for OVS is > >> classification in hardware and the unique flow id is sent with the packet > >> to > software. > >> But in OVS the ufid is 128 bits, so it means we can't and there is > >> still the miniflow extract overhead. I'm not sure if there is a > >> practical way around this. > >> > >> Sugesh (cc'd) has looked at this before and may be able to comment or > >> correct me. > > > > Yes, we settled on 32 bit because currently no known hardware > > implementation supports more than this. If that changes, another > > action with a larger type shall be provided (no ABI breakage). > > > > Also since even 64 bit would not be enough for the use case you > > mention, there is no choice but use this as an indirect value (such as > > an array or hash table index/value). > > ok, cool. I think Sugesh has other ideas anyway! [Sugesh] It should be fine with 32 bit . we can manage it in OVS accordingly. > > > > > [...] > >>> +/** > >>> + * RTE_FLOW_ACTION_TYPE_RSS > >>> + * > >>> + > >>> + * > >>> + * Terminating by default. > >>> + */ > >>> +struct rte_flow_action_vf { > >>> + uint32_t original:1; /**< Use original VF ID if possible. */ > >>> + uint32_t reserved:31; /**< Reserved, must be zero. */ > >>> + uint32_t id; /**< VF ID to redirect packets to. */ }; > > [...] > >>> +/** > >>> + * Check whether a flow rule can be created on a given port. > >>> + * > >>> + * While this function has no effect on the target device, the flow > >>> +rule is > >>> + * validated against its current configuration state and the > >>> +returned value > >>> + * should be considered valid by the caller for that state only. > >>> + * > >>> + * The returned value is guaranteed to remain valid only as long as > >>> +no > >>> + * successful calls to rte_flow_create() or rte_flow_destroy() are > >>> +made in > >>> + * the meantime and no device parameter affecting flow rules in any > >>> +way are > >>> + * modified, due to possible collisions or resource limitations > >>> +(although in > >>> + * such cases EINVAL should not be returned). > >>> + * > >>> + * @param port_id > >>> + * Port identifier of Ethernet device. > >>> + * @param[in] attr > >>> + * Flow rule attributes. > >>> + * @param[in] pattern > >>> + * Pattern specification (list terminated by the END pattern item). > >>> + * @param[in] actions > >>> + * Associated actions (list terminated by the END action). > >>> + * @param[out] error > >>> + * Perform verbose error reporting if not NULL. > >>> + * > >>> + * @return > >>> + * 0 if flow rule is valid and can be created. A negative errno value > >>> + * otherwise (rte_errno is also set), the following errors are defined: > >>> + * > >>> + * -ENOSYS: underlying device does not support this functionality. > >>> + * > >>> + * -EINVAL: unknown or invalid rule specification. > >>> + * > >>> + * -ENOTSUP: valid but unsupported rule specification (e.g. partial > >>> + * bit-masks are unsupported). > >>> + * > >>> + * -EEXIST: collision with an existing rule. > >>> + * > >>> + * -ENOMEM: not enough resources
Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API
On 12/01/2016 08:36 AM, Adrien Mazarguil wrote: > Hi Kevin, > > On Wed, Nov 30, 2016 at 05:47:17PM +, Kevin Traynor wrote: >> Hi Adrien, >> >> On 11/16/2016 04:23 PM, Adrien Mazarguil wrote: >>> This new API supersedes all the legacy filter types described in >>> rte_eth_ctrl.h. It is slightly higher level and as a result relies more on >>> PMDs to process and validate flow rules. >>> >>> Benefits: >>> >>> - A unified API is easier to program for, applications do not have to be >>> written for a specific filter type which may or may not be supported by >>> the underlying device. >>> >>> - The behavior of a flow rule is the same regardless of the underlying >>> device, applications do not need to be aware of hardware quirks. >>> >>> - Extensible by design, API/ABI breakage should rarely occur if at all. >>> >>> - Documentation is self-standing, no need to look up elsewhere. >>> >>> Existing filter types will be deprecated and removed in the near future. >> >> I'd suggest to add a deprecation notice to deprecation.rst, ideally with >> a target release. > > Will do, not a sure about the target release though. It seems a bit early > since no PMD really supports this API yet. > > [...] >>> diff --git a/lib/librte_ether/rte_flow.c b/lib/librte_ether/rte_flow.c >>> new file mode 100644 >>> index 000..064963d >>> --- /dev/null >>> +++ b/lib/librte_ether/rte_flow.c >>> @@ -0,0 +1,159 @@ >>> +/*- >>> + * BSD LICENSE >>> + * >>> + * Copyright 2016 6WIND S.A. >>> + * Copyright 2016 Mellanox. >> >> There's Mellanox copyright but you are the only signed-off-by - is that >> right? > > Yes, I'm the primary maintainer for Mellanox PMDs and this API was designed > on their behalf to expose several features from mlx4/mlx5 as the existing > filter types had too many limitations. > > [...] >>> +/* Get generic flow operations structure from a port. */ >>> +const struct rte_flow_ops * >>> +rte_flow_ops_get(uint8_t port_id, struct rte_flow_error *error) >>> +{ >>> + struct rte_eth_dev *dev = &rte_eth_devices[port_id]; >>> + const struct rte_flow_ops *ops; >>> + int code; >>> + >>> + if (unlikely(!rte_eth_dev_is_valid_port(port_id))) >>> + code = ENODEV; >>> + else if (unlikely(!dev->dev_ops->filter_ctrl || >>> + dev->dev_ops->filter_ctrl(dev, >>> + RTE_ETH_FILTER_GENERIC, >>> + RTE_ETH_FILTER_GET, >>> + &ops) || >>> + !ops)) >>> + code = ENOTSUP; >>> + else >>> + return ops; >>> + rte_flow_error_set(error, code, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, >>> + NULL, rte_strerror(code)); >>> + return NULL; >>> +} >>> + >> >> Is it expected that the application or pmd will provide locking between >> these functions if required? I think it's going to have to be the app. > > Locking is indeed expected to be performed by applications. This API only > documents places where locking would make sense if necessary and expected > behavior. > > Like all control path APIs, this one assumes a single control thread. > Applications must take the necessary precautions. If you look at OVS now it's quite possible that you have 2 rx queues serviced by different threads, that would also install the flow rules in the software flow caches - possibly that could extend to adding hardware flows. There could also be another thread that is querying for stats. So anything that can be done to minimise the locking would be helpful - maybe query() could be atomic and not require any locking? > > [...] >>> +/** >>> + * Flow rule attributes. >>> + * >>> + * Priorities are set on two levels: per group and per rule within groups. >>> + * >>> + * Lower values denote higher priority, the highest priority for both >>> levels >>> + * is 0, so that a rule with priority 0 in group 8 is always matched after >>> a >>> + * rule with priority 8 in group 0. >>> + * >>> + * Although optional, applications are encouraged to group similar rules as >>> + * much as possible to fully take advantage of hardware capabilities >>> + * (e.g. optimized matching) and work around limitations (e.g. a single >>> + * pattern type possibly allowed in a given group). >>> + * >>> + * Group and priority levels are arbitrary and up to the application, they >>> + * do not need to be contiguous nor start from 0, however the maximum >>> number >>> + * varies between devices and may be affected by existing flow rules. >>> + * >>> + * If a packet is matched by several rules of a given group for a given >>> + * priority level, the outcome is undefined. It can take any path, may be >>> + * duplicated or even cause unrecoverable errors. >> >> I get what you are trying to do here wrt supporting multiple >> pmds/hardware implementations and it's a good idea to keep it flexible. >> >> Given that the outcome is undefined, it would be nice that t
[dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API
Hi Kevin, On Wed, Nov 30, 2016 at 05:47:17PM +, Kevin Traynor wrote: > Hi Adrien, > > On 11/16/2016 04:23 PM, Adrien Mazarguil wrote: > > This new API supersedes all the legacy filter types described in > > rte_eth_ctrl.h. It is slightly higher level and as a result relies more on > > PMDs to process and validate flow rules. > > > > Benefits: > > > > - A unified API is easier to program for, applications do not have to be > > written for a specific filter type which may or may not be supported by > > the underlying device. > > > > - The behavior of a flow rule is the same regardless of the underlying > > device, applications do not need to be aware of hardware quirks. > > > > - Extensible by design, API/ABI breakage should rarely occur if at all. > > > > - Documentation is self-standing, no need to look up elsewhere. > > > > Existing filter types will be deprecated and removed in the near future. > > I'd suggest to add a deprecation notice to deprecation.rst, ideally with > a target release. Will do, not a sure about the target release though. It seems a bit early since no PMD really supports this API yet. [...] > > diff --git a/lib/librte_ether/rte_flow.c b/lib/librte_ether/rte_flow.c > > new file mode 100644 > > index 000..064963d > > --- /dev/null > > +++ b/lib/librte_ether/rte_flow.c > > @@ -0,0 +1,159 @@ > > +/*- > > + * BSD LICENSE > > + * > > + * Copyright 2016 6WIND S.A. > > + * Copyright 2016 Mellanox. > > There's Mellanox copyright but you are the only signed-off-by - is that > right? Yes, I'm the primary maintainer for Mellanox PMDs and this API was designed on their behalf to expose several features from mlx4/mlx5 as the existing filter types had too many limitations. [...] > > +/* Get generic flow operations structure from a port. */ > > +const struct rte_flow_ops * > > +rte_flow_ops_get(uint8_t port_id, struct rte_flow_error *error) > > +{ > > + struct rte_eth_dev *dev = &rte_eth_devices[port_id]; > > + const struct rte_flow_ops *ops; > > + int code; > > + > > + if (unlikely(!rte_eth_dev_is_valid_port(port_id))) > > + code = ENODEV; > > + else if (unlikely(!dev->dev_ops->filter_ctrl || > > + dev->dev_ops->filter_ctrl(dev, > > + RTE_ETH_FILTER_GENERIC, > > + RTE_ETH_FILTER_GET, > > + &ops) || > > + !ops)) > > + code = ENOTSUP; > > + else > > + return ops; > > + rte_flow_error_set(error, code, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, > > + NULL, rte_strerror(code)); > > + return NULL; > > +} > > + > > Is it expected that the application or pmd will provide locking between > these functions if required? I think it's going to have to be the app. Locking is indeed expected to be performed by applications. This API only documents places where locking would make sense if necessary and expected behavior. Like all control path APIs, this one assumes a single control thread. Applications must take the necessary precautions. [...] > > +/** > > + * Flow rule attributes. > > + * > > + * Priorities are set on two levels: per group and per rule within groups. > > + * > > + * Lower values denote higher priority, the highest priority for both > > levels > > + * is 0, so that a rule with priority 0 in group 8 is always matched after > > a > > + * rule with priority 8 in group 0. > > + * > > + * Although optional, applications are encouraged to group similar rules as > > + * much as possible to fully take advantage of hardware capabilities > > + * (e.g. optimized matching) and work around limitations (e.g. a single > > + * pattern type possibly allowed in a given group). > > + * > > + * Group and priority levels are arbitrary and up to the application, they > > + * do not need to be contiguous nor start from 0, however the maximum > > number > > + * varies between devices and may be affected by existing flow rules. > > + * > > + * If a packet is matched by several rules of a given group for a given > > + * priority level, the outcome is undefined. It can take any path, may be > > + * duplicated or even cause unrecoverable errors. > > I get what you are trying to do here wrt supporting multiple > pmds/hardware implementations and it's a good idea to keep it flexible. > > Given that the outcome is undefined, it would be nice that the > application has a way of finding the specific effects for verification > and debugging. Right, however it was deemed a bit difficult to manage in many cases hence the vagueness. For example, suppose two rules with the same group and priority, one matching any IPv4 header, the other one any UDP header: - TCPv4 packets => rule #1. - UDPv6 packets => rule #2. - UDPv4 packets => both? That last one is perhaps invalid, checking that some unspecified protocol combination does not overlap is expensive and may
[dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API
Hi Adrien, On 11/16/2016 04:23 PM, Adrien Mazarguil wrote: > This new API supersedes all the legacy filter types described in > rte_eth_ctrl.h. It is slightly higher level and as a result relies more on > PMDs to process and validate flow rules. > > Benefits: > > - A unified API is easier to program for, applications do not have to be > written for a specific filter type which may or may not be supported by > the underlying device. > > - The behavior of a flow rule is the same regardless of the underlying > device, applications do not need to be aware of hardware quirks. > > - Extensible by design, API/ABI breakage should rarely occur if at all. > > - Documentation is self-standing, no need to look up elsewhere. > > Existing filter types will be deprecated and removed in the near future. I'd suggest to add a deprecation notice to deprecation.rst, ideally with a target release. > > Signed-off-by: Adrien Mazarguil > --- > MAINTAINERS| 4 + > lib/librte_ether/Makefile | 3 + > lib/librte_ether/rte_eth_ctrl.h| 1 + > lib/librte_ether/rte_ether_version.map | 10 + > lib/librte_ether/rte_flow.c| 159 + > lib/librte_ether/rte_flow.h| 947 > lib/librte_ether/rte_flow_driver.h | 177 ++ > 7 files changed, 1301 insertions(+) > > diff --git a/MAINTAINERS b/MAINTAINERS > index d6bb8f8..3b46630 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -243,6 +243,10 @@ M: Thomas Monjalon > F: lib/librte_ether/ > F: scripts/test-null.sh > > +Generic flow API > +M: Adrien Mazarguil > +F: lib/librte_ether/rte_flow* > + > Crypto API > M: Declan Doherty > F: lib/librte_cryptodev/ > diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile > index efe1e5f..9335361 100644 > --- a/lib/librte_ether/Makefile > +++ b/lib/librte_ether/Makefile > @@ -44,6 +44,7 @@ EXPORT_MAP := rte_ether_version.map > LIBABIVER := 5 > > SRCS-y += rte_ethdev.c > +SRCS-y += rte_flow.c > > # > # Export include files > @@ -51,6 +52,8 @@ SRCS-y += rte_ethdev.c > SYMLINK-y-include += rte_ethdev.h > SYMLINK-y-include += rte_eth_ctrl.h > SYMLINK-y-include += rte_dev_info.h > +SYMLINK-y-include += rte_flow.h > +SYMLINK-y-include += rte_flow_driver.h > > # this lib depends upon: > DEPDIRS-y += lib/librte_net lib/librte_eal lib/librte_mempool > lib/librte_ring lib/librte_mbuf > diff --git a/lib/librte_ether/rte_eth_ctrl.h b/lib/librte_ether/rte_eth_ctrl.h > index fe80eb0..8386904 100644 > --- a/lib/librte_ether/rte_eth_ctrl.h > +++ b/lib/librte_ether/rte_eth_ctrl.h > @@ -99,6 +99,7 @@ enum rte_filter_type { > RTE_ETH_FILTER_FDIR, > RTE_ETH_FILTER_HASH, > RTE_ETH_FILTER_L2_TUNNEL, > + RTE_ETH_FILTER_GENERIC, > RTE_ETH_FILTER_MAX > }; > > diff --git a/lib/librte_ether/rte_ether_version.map > b/lib/librte_ether/rte_ether_version.map > index 72be66d..b5d2547 100644 > --- a/lib/librte_ether/rte_ether_version.map > +++ b/lib/librte_ether/rte_ether_version.map > @@ -147,3 +147,13 @@ DPDK_16.11 { > rte_eth_dev_pci_remove; > > } DPDK_16.07; > + > +DPDK_17.02 { > + global: > + > + rte_flow_validate; > + rte_flow_create; > + rte_flow_destroy; > + rte_flow_query; > + > +} DPDK_16.11; > diff --git a/lib/librte_ether/rte_flow.c b/lib/librte_ether/rte_flow.c > new file mode 100644 > index 000..064963d > --- /dev/null > +++ b/lib/librte_ether/rte_flow.c > @@ -0,0 +1,159 @@ > +/*- > + * BSD LICENSE > + * > + * Copyright 2016 6WIND S.A. > + * Copyright 2016 Mellanox. There's Mellanox copyright but you are the only signed-off-by - is that right? > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * > + * * Redistributions of source code must retain the above copyright > + * notice, this list of conditions and the following disclaimer. > + * * Redistributions in binary form must reproduce the above copyright > + * notice, this list of conditions and the following disclaimer in > + * the documentation and/or other materials provided with the > + * distribution. > + * * Neither the name of 6WIND S.A. nor the names of its > + * contributors may be used to endorse or promote products derived > + * from this software without specific prior written permission. > + * > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS > + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT > + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR > + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT > + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
[dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API
Hi Beilei, On Fri, Nov 18, 2016 at 06:36:31AM +, Xing, Beilei wrote: > Hi Adrien, > > > -Original Message- > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Adrien Mazarguil > > Sent: Thursday, November 17, 2016 12:23 AM > > To: dev at dpdk.org > > Cc: Thomas Monjalon ; De Lara Guarch, > > Pablo ; Olivier Matz > > > > Subject: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API > > > > This new API supersedes all the legacy filter types described in > > rte_eth_ctrl.h. > > It is slightly higher level and as a result relies more on PMDs to process > > and > > validate flow rules. > > > > Benefits: > > > > - A unified API is easier to program for, applications do not have to be > > written for a specific filter type which may or may not be supported by > > the underlying device. > > > > - The behavior of a flow rule is the same regardless of the underlying > > device, applications do not need to be aware of hardware quirks. > > > > - Extensible by design, API/ABI breakage should rarely occur if at all. > > > > - Documentation is self-standing, no need to look up elsewhere. > > > > Existing filter types will be deprecated and removed in the near future. > > > > Signed-off-by: Adrien Mazarguil > > > > + > > +/** > > + * Opaque type returned after successfully creating a flow. > > + * > > + * This handle can be used to manage and query the related flow (e.g. > > +to > > + * destroy it or retrieve counters). > > + */ > > +struct rte_flow; > > + > > As we talked before, we use attr/pattern/actions to create and destroy a flow > in PMD, > but I don't think it's easy to clone the user-provided parameters and return > the result > to the application as a rte_flow pointer. As you suggested: > /* PMD-specific code. */ > struct rte_flow { > struct rte_flow_attr attr; > struct rte_flow_item *pattern; > struct rte_flow_action *actions; > }; Just to provide some context to the community since the above snippet comes from private exchanges, I've suggested the above structure as a mean to create and remove rules in the same fashion as FDIR, by providing the rule used for creation to the destroy callback. As an opaque type, each PMD currently needs to implement its own version of struct rte_flow. The above definition may ease transition from FDIR to rte_flow for some PMDs, however they need to clone the entire application-provided rule to do so because there is no requirement for it to be kept allocated. I've implemented such a function in testpmd (port_flow_new() in commit [1]) as an example. [1] http://dpdk.org/ml/archives/dev/2016-November/050266.html However my suggestion is for PMDs to use their own HW-specific structure that only contains relevant information instead of being forced to drag large, non-native data around, missing useful context and that requires parsing every time. This is one benefit of using an opaque type in the first place, the other being ABI breakage avoidance. > Because both pattern and actions are pointers, and there're also pointers in > structure > rte_flow_item and struct rte_flow_action. We need to iterate allocation > during clone > and iterate free during destroy, then seems that the code is something ugly, > right? Well since I wrote that code, I won't easily admit it's ugly. I think PMDs should not require the duplication of generic rules actually, which are only defined as a common language between applications and PMDs. Both are free to store rules in their own preferred and efficient format internally. > I think application saves info when creating a flow rule, so why not > application provide > attr/pattern/actions info to PMD before calling PMD API? They have to do so temporarily (e.g. allocated on the stack) while calling rte_flow_create() and rte_flow_validate(), that's it. Once a rule is created, there's no requirement for applications to keep anything around. For simple applications such as testpmd, the generic format is probably enough. More complex and existing applications such as ovs-dpdk may rather choose to keep using their internal format that already fits their needs, partially duplicating this information in rte_flow_attr and rte_flow_item/rte_flow_action lists would waste memory. The conversion in this case should only be performed when creating/validating flow rules. In short, I fail to see any downside with maintaining struct rte_flow opaque to applications. Best regards, -- Adrien Mazarguil 6WIND
[dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API
Hi Adrien, > -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Adrien Mazarguil > Sent: Thursday, November 17, 2016 12:23 AM > To: dev at dpdk.org > Cc: Thomas Monjalon ; De Lara Guarch, > Pablo ; Olivier Matz > > Subject: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API > > This new API supersedes all the legacy filter types described in > rte_eth_ctrl.h. > It is slightly higher level and as a result relies more on PMDs to process and > validate flow rules. > > Benefits: > > - A unified API is easier to program for, applications do not have to be > written for a specific filter type which may or may not be supported by > the underlying device. > > - The behavior of a flow rule is the same regardless of the underlying > device, applications do not need to be aware of hardware quirks. > > - Extensible by design, API/ABI breakage should rarely occur if at all. > > - Documentation is self-standing, no need to look up elsewhere. > > Existing filter types will be deprecated and removed in the near future. > > Signed-off-by: Adrien Mazarguil > + > +/** > + * Opaque type returned after successfully creating a flow. > + * > + * This handle can be used to manage and query the related flow (e.g. > +to > + * destroy it or retrieve counters). > + */ > +struct rte_flow; > + As we talked before, we use attr/pattern/actions to create and destroy a flow in PMD, but I don't think it's easy to clone the user-provided parameters and return the result to the application as a rte_flow pointer. As you suggested: /* PMD-specific code. */ struct rte_flow { struct rte_flow_attr attr; struct rte_flow_item *pattern; struct rte_flow_action *actions; }; Because both pattern and actions are pointers, and there're also pointers in structure rte_flow_item and struct rte_flow_action. We need to iterate allocation during clone and iterate free during destroy, then seems that the code is something ugly, right? I think application saves info when creating a flow rule, so why not application provide attr/pattern/actions info to PMD before calling PMD API? Thanks, Beilei Xing
[dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API
This new API supersedes all the legacy filter types described in rte_eth_ctrl.h. It is slightly higher level and as a result relies more on PMDs to process and validate flow rules. Benefits: - A unified API is easier to program for, applications do not have to be written for a specific filter type which may or may not be supported by the underlying device. - The behavior of a flow rule is the same regardless of the underlying device, applications do not need to be aware of hardware quirks. - Extensible by design, API/ABI breakage should rarely occur if at all. - Documentation is self-standing, no need to look up elsewhere. Existing filter types will be deprecated and removed in the near future. Signed-off-by: Adrien Mazarguil --- MAINTAINERS| 4 + lib/librte_ether/Makefile | 3 + lib/librte_ether/rte_eth_ctrl.h| 1 + lib/librte_ether/rte_ether_version.map | 10 + lib/librte_ether/rte_flow.c| 159 + lib/librte_ether/rte_flow.h| 947 lib/librte_ether/rte_flow_driver.h | 177 ++ 7 files changed, 1301 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index d6bb8f8..3b46630 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -243,6 +243,10 @@ M: Thomas Monjalon F: lib/librte_ether/ F: scripts/test-null.sh +Generic flow API +M: Adrien Mazarguil +F: lib/librte_ether/rte_flow* + Crypto API M: Declan Doherty F: lib/librte_cryptodev/ diff --git a/lib/librte_ether/Makefile b/lib/librte_ether/Makefile index efe1e5f..9335361 100644 --- a/lib/librte_ether/Makefile +++ b/lib/librte_ether/Makefile @@ -44,6 +44,7 @@ EXPORT_MAP := rte_ether_version.map LIBABIVER := 5 SRCS-y += rte_ethdev.c +SRCS-y += rte_flow.c # # Export include files @@ -51,6 +52,8 @@ SRCS-y += rte_ethdev.c SYMLINK-y-include += rte_ethdev.h SYMLINK-y-include += rte_eth_ctrl.h SYMLINK-y-include += rte_dev_info.h +SYMLINK-y-include += rte_flow.h +SYMLINK-y-include += rte_flow_driver.h # this lib depends upon: DEPDIRS-y += lib/librte_net lib/librte_eal lib/librte_mempool lib/librte_ring lib/librte_mbuf diff --git a/lib/librte_ether/rte_eth_ctrl.h b/lib/librte_ether/rte_eth_ctrl.h index fe80eb0..8386904 100644 --- a/lib/librte_ether/rte_eth_ctrl.h +++ b/lib/librte_ether/rte_eth_ctrl.h @@ -99,6 +99,7 @@ enum rte_filter_type { RTE_ETH_FILTER_FDIR, RTE_ETH_FILTER_HASH, RTE_ETH_FILTER_L2_TUNNEL, + RTE_ETH_FILTER_GENERIC, RTE_ETH_FILTER_MAX }; diff --git a/lib/librte_ether/rte_ether_version.map b/lib/librte_ether/rte_ether_version.map index 72be66d..b5d2547 100644 --- a/lib/librte_ether/rte_ether_version.map +++ b/lib/librte_ether/rte_ether_version.map @@ -147,3 +147,13 @@ DPDK_16.11 { rte_eth_dev_pci_remove; } DPDK_16.07; + +DPDK_17.02 { + global: + + rte_flow_validate; + rte_flow_create; + rte_flow_destroy; + rte_flow_query; + +} DPDK_16.11; diff --git a/lib/librte_ether/rte_flow.c b/lib/librte_ether/rte_flow.c new file mode 100644 index 000..064963d --- /dev/null +++ b/lib/librte_ether/rte_flow.c @@ -0,0 +1,159 @@ +/*- + * BSD LICENSE + * + * Copyright 2016 6WIND S.A. + * Copyright 2016 Mellanox. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of 6WIND S.A. nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#include + +#include +#include +#include "rte_ethdev.h" +#include "rte_flow_driver.h" +#include "rte_flow.h" + +/* Get generic flow operations structure fr