Re: [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe
On 2/4/2016 5:12 AM, Jamal Hadi Salim wrote: > > On 16-02-03 01:48 PM, Fastabend, John R wrote: > > BTW: For the record John, I empathize with you that we need to > move. Please have patience - we are close; lets just get this resolved > in Seville. I like your patches a lot and would love to just have > your patches pushed in, but the challenges with community is being able > to reach some middle ground. We are not as bad as some of the standards > organizations. I am sure we'll get this resolved by end of next week > if not, I am %100 in agreement some form of your patches (And Amir's > need to go in and then we can refactor as needed) Agreed although I'm a bit worried we are starting to talk about a single hardware IR. This discussion has always failed in my experience. > >>> 1) "priorities" for filters and some form of "index" for actions is >>> is needed. I think index (which tends to be a 32 bit value is what >>> Amir's patches refered to as "cookie" - or at least some hardware >>> can be used to query the action with). Priorities maybe implicit in >>> the order in which they are added. And th idea of appending vs >>> exclusivity vs replace (which netlink already supports) >>> is important to worry about (TCAMS tend to assume an append mode >>> for example). >> >> The code denotes add/del/replace already. I'm not sure why a TCAM >> would assume an append mode but OK maybe that is some API you have >> the APIs I use don't have these semantics. >> > > Basically most hardware (or i should say driver implementations of > mostly TCAMS) allow you to add exactly the same filter as many times > as you want. They dont really look at what you want to filter on > and then scream "conflict". IOW, you (user) are responsible for > conflict resolution at the filter level. The driver sees this blob > and requests for some index/key from the hardware then just adds it. > You can then use this key/index to delete/replace etc. > This is what i meant by "append" mode. > However if a classifier implementation cares about filter ambiguity > resolution, then priorities are used. We need to worry about the > bigger picture. > Sure in other classifiers its used but its not needed in the set I planned to added it later. > >> For this series using cls_u32 the handle gives you everything you need >> to put entries in the right table and row. Namely the ht # and order # >> from 'tc'. > > True - but with a caveat. There are only 2^12 max tables you can > have for example and up to 2^12 filters per bucket etc. > This is a software limitation as well right? If it hasn't showed up as a limitation on the software side why would it be an issue here? Do you have more than 2^12 tables on your devices? If so I guess we can tack on another 32bits somewhere. >> Take a look at u32_change and u32_classify its the handle >> that places the filter into the list and the handle that is matched in >> classify. We should place the filters in the hardware in the same order >> that is used by u32_change. >> > > I can see some parallels, but: > The nodeid in itself is insufficent for two reasons: > You cant have more than 2^12 filters per bucket; > and the nodeid then takes two meanings: a) it is an id > b) it specifies the order in which things are looked up. > > I think you need to take the u32 address and map it to something in your > hardware. But at the same time it is important to have the abstraction > closely emulate your hardware. > IMO the hardware/interface must preserve the same ordering of filters/hash_Tables/etc. How it does that mapping should be a driver concern and it can always abort if it fails. >> Also ran a few tests and can't see how priority works in u32 maybe you >> can shed some light but as best I can tell it doesn't have any effect >> on rule execution. >> > > True. > u32 doesnt care because it will give you a nodeid if you dont specify > one. i.e conflict resolution is mapped to you not specifying exactly > the same ht:bkt:nodeid more than once. And if you will let the > kernel do it for you (as i am assumming you are saying your hardware > will) then no need. Yep. Faithfully offloading u32 here not changing anything except I do have to abort on some cases with the simpler devices. fm10k for example can model hash nodes with divisors > 1. > >>> >>> 2) I like the u32 approach where it makes sense; but sometimes it >>> doesnt make sense from a usability pov. I work with some ASICs >>> that have 10 tuples that are fixed. Yes, a user can describe a policy >>> with u32 but flower would be more usable say with flower (both >>> programmatic and cli) >> >> Sure so create a set of offload hooks for flower we don't need only >> one hardware classifier any more than we would like a single software >> classifiers. > > > Glad to hear that. > I was a little concerned that despite my love for u32 it was > going to be _the_ classifier. It doesnt fit for all offload cases > and sometimes it is because of human
Re: [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe
On 2/4/2016 3:19 PM, Pablo Neira Ayuso wrote: > On Thu, Feb 04, 2016 at 10:16:56AM +0100, Jiri Pirko wrote: >> Wed, Feb 03, 2016 at 10:27:32AM CET, john.fastab...@gmail.com wrote: >>> >>> Also by adding get_parse_graph and set_parse_graph attributes as >>> in my previous flow_api work we can build programmable devices >>> and programmatically learn when rules can or can not be loaded >>> into the hardware. Again future work. >>> >>> Any comments/feedback appreciated. Sorry if you get this twice it doesn't look like my original response made it to netdev and the laptop I replied on charger blew up. >> >> I like this being thin and elegant solution. However, ~2 years ago when I >> pushed openvswitch kernel datapath offload patchset, people were yelling >> at me that it is not generic enough solution, that tc has to be able >> to use the api (Jamal :)), nftables as well. > The other problem with OVS is if you have the capabilities to do wildcard lookups (e.g. TCAM/SRAM/etc) then offloading the exact match table in OVS is really inefficient use of the resource. You really want to load the megaflow table into hardware. I just don't think its a good scheme for what you want. > I would be glad to join this debate during NetDev 1.1 too. > great. > I think we should provide a solution that allows people uses both > tc and nftables, this would require a bit of generic infrastructure on > top of it so we don't restrict users to one single solution, in other > words, we allow the user to select its own poison. > >> Now this patch is making offload strictly tc-based and nobody seems to >> care :) I do. I think that we might try to find some generic middle >> layer. If we can build the universal model for 'tc' and 'nftable' we should unify them higher in the stack? It doesn't make sense to me for the driver folks to try and create the unified model for two subsystems if we don't think its worthwhile in software as well. > > I agree and I'll be happy to help to push this ahead. Let's try to sit > and get together to resolve this. Great. > > See you soon. >
Re: [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe
On 16-02-09 06:24 AM, Fastabend, John R wrote: On 2/4/2016 5:12 AM, Jamal Hadi Salim wrote: On 16-02-03 01:48 PM, Fastabend, John R wrote: Basically most hardware (or i should say driver implementations of mostly TCAMS) allow you to add exactly the same filter as many times as you want. They dont really look at what you want to filter on and then scream "conflict". IOW, you (user) are responsible for conflict resolution at the filter level. The driver sees this blob and requests for some index/key from the hardware then just adds it. You can then use this key/index to delete/replace etc. This is what i meant by "append" mode. However if a classifier implementation cares about filter ambiguity resolution, then priorities are used. We need to worry about the bigger picture. Sure in other classifiers its used but its not needed in the set I planned to added it later. If you leave it open for some other hardware to use we should be fine. For this series using cls_u32 the handle gives you everything you need to put entries in the right table and row. Namely the ht # and order # from 'tc'. True - but with a caveat. There are only 2^12 max tables you can have for example and up to 2^12 filters per bucket etc. This is a software limitation as well right? If it hasn't showed up as a limitation on the software side why would it be an issue here? Do you have more than 2^12 tables on your devices? If so I guess we can tack on another 32bits somewhere. That handle is used as an "Address" to the 32 bit filter. Just beware of the semantics the handle has. It hasnt shown up as a software limitation because the defaults are good enough for most people. But if you ever want to install a million rules that can be looked up at a reasonable pps rate it will become very obvious quickly. I have a sample setup in the talk tommorow which shows such an example. I think you need to take the u32 address and map it to something in your hardware. But at the same time it is important to have the abstraction closely emulate your hardware. IMO the hardware/interface must preserve the same ordering of filters/hash_Tables/etc. How it does that mapping should be a driver concern and it can always abort if it fails. Sure. Also ran a few tests and can't see how priority works in u32 maybe you can shed some light but as best I can tell it doesn't have any effect on rule execution. True. u32 doesnt care because it will give you a nodeid if you dont specify one. i.e conflict resolution is mapped to you not specifying exactly the same ht:bkt:nodeid more than once. And if you will let the kernel do it for you (as i am assumming you are saying your hardware will) then no need. Yep. Faithfully offloading u32 here not changing anything except I do have to abort on some cases with the simpler devices. fm10k for example can model hash nodes with divisors > 1. I wonder if when we get to capabilities we can do this... My issue is we can map flower onto u32 that is fine and u32 onto bpf. But we lose a lot of the power of each classifier when we do this. flower for example is nice because of its simplicity presumably this translates into faster updates, u32 is great because we get full parse graph support and hash tables, ebpf is the biggest beast of all and lets us load arbitrary functions into the device. All are nice in their own right. Did i send you my slides? ;-> cheers, jamal
Re: [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe
On Thu, Feb 04, 2016 at 10:16:56AM +0100, Jiri Pirko wrote: > Wed, Feb 03, 2016 at 10:27:32AM CET, john.fastab...@gmail.com wrote: > > > >Also by adding get_parse_graph and set_parse_graph attributes as > >in my previous flow_api work we can build programmable devices > >and programmatically learn when rules can or can not be loaded > >into the hardware. Again future work. > > > >Any comments/feedback appreciated. > > I like this being thin and elegant solution. However, ~2 years ago when I > pushed openvswitch kernel datapath offload patchset, people were yelling > at me that it is not generic enough solution, that tc has to be able > to use the api (Jamal :)), nftables as well. I would be glad to join this debate during NetDev 1.1 too. I think we should provide a solution that allows people uses both tc and nftables, this would require a bit of generic infrastructure on top of it so we don't restrict users to one single solution, in other words, we allow the user to select its own poison. > Now this patch is making offload strictly tc-based and nobody seems to > care :) I do. I think that we might try to find some generic middle layer. I agree and I'll be happy to help to push this ahead. Let's try to sit and get together to resolve this. See you soon.
Re: [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe
On 16-02-03 01:48 PM, Fastabend, John R wrote: BTW: For the record John, I empathize with you that we need to move. Please have patience - we are close; lets just get this resolved in Seville. I like your patches a lot and would love to just have your patches pushed in, but the challenges with community is being able to reach some middle ground. We are not as bad as some of the standards organizations. I am sure we'll get this resolved by end of next week if not, I am %100 in agreement some form of your patches (And Amir's need to go in and then we can refactor as needed) 1) "priorities" for filters and some form of "index" for actions is is needed. I think index (which tends to be a 32 bit value is what Amir's patches refered to as "cookie" - or at least some hardware can be used to query the action with). Priorities maybe implicit in the order in which they are added. And th idea of appending vs exclusivity vs replace (which netlink already supports) is important to worry about (TCAMS tend to assume an append mode for example). The code denotes add/del/replace already. I'm not sure why a TCAM would assume an append mode but OK maybe that is some API you have the APIs I use don't have these semantics. Basically most hardware (or i should say driver implementations of mostly TCAMS) allow you to add exactly the same filter as many times as you want. They dont really look at what you want to filter on and then scream "conflict". IOW, you (user) are responsible for conflict resolution at the filter level. The driver sees this blob and requests for some index/key from the hardware then just adds it. You can then use this key/index to delete/replace etc. This is what i meant by "append" mode. However if a classifier implementation cares about filter ambiguity resolution, then priorities are used. We need to worry about the bigger picture. For this series using cls_u32 the handle gives you everything you need to put entries in the right table and row. Namely the ht # and order # from 'tc'. True - but with a caveat. There are only 2^12 max tables you can have for example and up to 2^12 filters per bucket etc. Take a look at u32_change and u32_classify its the handle that places the filter into the list and the handle that is matched in classify. We should place the filters in the hardware in the same order that is used by u32_change. I can see some parallels, but: The nodeid in itself is insufficent for two reasons: You cant have more than 2^12 filters per bucket; and the nodeid then takes two meanings: a) it is an id b) it specifies the order in which things are looked up. I think you need to take the u32 address and map it to something in your hardware. But at the same time it is important to have the abstraction closely emulate your hardware. Also ran a few tests and can't see how priority works in u32 maybe you can shed some light but as best I can tell it doesn't have any effect on rule execution. True. u32 doesnt care because it will give you a nodeid if you dont specify one. i.e conflict resolution is mapped to you not specifying exactly the same ht:bkt:nodeid more than once. And if you will let the kernel do it for you (as i am assumming you are saying your hardware will) then no need. 2) I like the u32 approach where it makes sense; but sometimes it doesnt make sense from a usability pov. I work with some ASICs that have 10 tuples that are fixed. Yes, a user can describe a policy with u32 but flower would be more usable say with flower (both programmatic and cli) Sure so create a set of offload hooks for flower we don't need only one hardware classifier any more than we would like a single software classifiers. Glad to hear that. I was a little concerned that despite my love for u32 it was going to be _the_ classifier. It doesnt fit for all offload cases and sometimes it is because of human operators (the 10 tuple hardware classifier i mentioned earlier). BTW: Classifier in this case is very wide ranging (a regex hardware offload for example qualifies). Again I'm trying to faithfully implement what we have in software and load that into the hardware. The handle today gives ingress/egres hook. If you want an all ports hook we should add it to 'tc' software first and then push that to the hardware not create magic hardware bits. See I've drank the cool aid software first than hardware. ;-> No disagreement. It felt like a small sensible change - thats why i suggested it. 4) Why are we forsaking switchdev John? This is certainly re-usable beyond NICs and SRIOV. Sure and switchdev can use it just like they use fdb_add and friends. I just don't want to require switchdev infrastructure on things that really are not switches. I think Amir indicated he would take a try at the switchdev integration. If not I'm willing to do it but it doesn't block this series in any way imo. Ok. Makes sense. 5)What happened to being both able to hardware and/or software?
Re: [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe
Wed, Feb 03, 2016 at 10:27:32AM CET, john.fastab...@gmail.com wrote: >This extends the setup_tc framework so it can support more than just >the mqprio offload and push other classifiers and qdiscs into the >hardware. The series here targets the u32 classifier and ixgbe >driver. I worked out the u32 classifier because it is protocol >oblivious and aligns with multiple hardware devices I have access >to. I did an initial implementation on ixgbe because (a) I have one >in my box (b) its a stable driver and (c) it is relatively simple >compared to the other devices I have here but still has enough >flexibility to exercise the features of cls_u32. > >I intentionally limited the scope of this series to the basic >feature set. Specifically this uses a 'big hammer' feature bit >to do the offload or not. If the bit is set you get offloaded rules >if it is not then rules will not be offloaded. If we can agree on >this patch series there are some more patches on my queue we can >talk about to make the offload decision per rule using flags similar >to how we do l2 mac updates. Additionally the error strategy can >be improved to be hard aborting, log and continue, etc. I think >these are nice to have improvements but shouldn't block this series. > >Also by adding get_parse_graph and set_parse_graph attributes as >in my previous flow_api work we can build programmable devices >and programmatically learn when rules can or can not be loaded >into the hardware. Again future work. > >Any comments/feedback appreciated. I like this being thin and elegant solution. However, ~2 years ago when I pushed openvswitch kernel datapath offload patchset, people were yelling at me that it is not generic enough solution, that tc has to be able to use the api (Jamal :)), nftables as well. Now this patch is making offload strictly tc-based and nobody seems to care :) I do. I think that we might try to find some generic middle layer. Let's discuss this more in person next week. Thanks!
Re: [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe
On 2/3/2016 12:21 PM, John Fastabend wrote: Thanks, we will need at least a v2 to fixup some build errors with various compile flags caught by build_bot and missed by me. Hi John, You didn't mark that as RFC... but we said this direction/approach yet to be talked @ netdev next-week, so.. can you clarify? I suggest not to rush and asking pulling this, lets have the tc workshop beforehand... Please add to v2 listing of changes from V0/V1 to assist with the review. thanks, Or.
[net-next PATCH 0/7] tc offload for cls_u32 on ixgbe
This extends the setup_tc framework so it can support more than just the mqprio offload and push other classifiers and qdiscs into the hardware. The series here targets the u32 classifier and ixgbe driver. I worked out the u32 classifier because it is protocol oblivious and aligns with multiple hardware devices I have access to. I did an initial implementation on ixgbe because (a) I have one in my box (b) its a stable driver and (c) it is relatively simple compared to the other devices I have here but still has enough flexibility to exercise the features of cls_u32. I intentionally limited the scope of this series to the basic feature set. Specifically this uses a 'big hammer' feature bit to do the offload or not. If the bit is set you get offloaded rules if it is not then rules will not be offloaded. If we can agree on this patch series there are some more patches on my queue we can talk about to make the offload decision per rule using flags similar to how we do l2 mac updates. Additionally the error strategy can be improved to be hard aborting, log and continue, etc. I think these are nice to have improvements but shouldn't block this series. Also by adding get_parse_graph and set_parse_graph attributes as in my previous flow_api work we can build programmable devices and programmatically learn when rules can or can not be loaded into the hardware. Again future work. Any comments/feedback appreciated. Thanks, John --- John Fastabend (7): net: rework ndo tc op to consume additional qdisc handle parameter net: rework setup_tc ndo op to consume general tc operand net: sched: add cls_u32 offload hooks for netdevs net: add tc offload feature flag net: tc: helper functions to query action types net: ixgbe: add minimal parser details for ixgbe net: ixgbe: add support for tc_u32 offload drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c |8 + drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h |2 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |2 drivers/net/ethernet/broadcom/bnxt/bnxt.c|9 + drivers/net/ethernet/intel/fm10k/fm10k_netdev.c | 11 + drivers/net/ethernet/intel/i40e/i40e_main.c | 10 + drivers/net/ethernet/intel/ixgbe/ixgbe.h |3 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |6 - drivers/net/ethernet/intel/ixgbe/ixgbe_main.c| 206 ++ drivers/net/ethernet/intel/ixgbe/ixgbe_model.h | 112 drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 13 + drivers/net/ethernet/sfc/efx.h |3 drivers/net/ethernet/sfc/tx.c| 10 + drivers/net/ethernet/ti/netcp_core.c | 14 + include/linux/netdev_features.h |3 include/linux/netdevice.h| 24 ++- include/net/pkt_cls.h| 33 include/net/tc_act/tc_gact.h | 16 ++ net/core/ethtool.c |1 net/sched/cls_u32.c | 73 net/sched/sch_mqprio.c |8 + 21 files changed, 541 insertions(+), 26 deletions(-) create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_model.h -- Signature
Re: [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe
On Wed, Feb 03, 2016 at 01:27:32AM -0800, John Fastabend wrote: > This extends the setup_tc framework so it can support more than just > the mqprio offload and push other classifiers and qdiscs into the > hardware. The series here targets the u32 classifier and ixgbe > driver. I worked out the u32 classifier because it is protocol > oblivious and aligns with multiple hardware devices I have access > to. I did an initial implementation on ixgbe because (a) I have one > in my box (b) its a stable driver and (c) it is relatively simple > compared to the other devices I have here but still has enough > flexibility to exercise the features of cls_u32. > > I intentionally limited the scope of this series to the basic > feature set. Specifically this uses a 'big hammer' feature bit > to do the offload or not. If the bit is set you get offloaded rules > if it is not then rules will not be offloaded. If we can agree on > this patch series there are some more patches on my queue we can > talk about to make the offload decision per rule using flags similar > to how we do l2 mac updates. Additionally the error strategy can > be improved to be hard aborting, log and continue, etc. I think > these are nice to have improvements but shouldn't block this series. > > Also by adding get_parse_graph and set_parse_graph attributes as > in my previous flow_api work we can build programmable devices > and programmatically learn when rules can or can not be loaded > into the hardware. Again future work. > > Any comments/feedback appreciated. > > Thanks, > John > > --- > > John Fastabend (7): > net: rework ndo tc op to consume additional qdisc handle parameter > net: rework setup_tc ndo op to consume general tc operand > net: sched: add cls_u32 offload hooks for netdevs > net: add tc offload feature flag > net: tc: helper functions to query action types > net: ixgbe: add minimal parser details for ixgbe > net: ixgbe: add support for tc_u32 offload > Hi John, Nice work :) I will add mlx5 support, and see if can live with u32. If not - will add flower support too. Amir
Re: [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe
On 16-02-03 02:11 AM, Amir Vadai" wrote: > On Wed, Feb 03, 2016 at 01:27:32AM -0800, John Fastabend wrote: >> This extends the setup_tc framework so it can support more than just >> the mqprio offload and push other classifiers and qdiscs into the >> hardware. The series here targets the u32 classifier and ixgbe >> driver. I worked out the u32 classifier because it is protocol >> oblivious and aligns with multiple hardware devices I have access >> to. I did an initial implementation on ixgbe because (a) I have one >> in my box (b) its a stable driver and (c) it is relatively simple >> compared to the other devices I have here but still has enough >> flexibility to exercise the features of cls_u32. >> >> I intentionally limited the scope of this series to the basic >> feature set. Specifically this uses a 'big hammer' feature bit >> to do the offload or not. If the bit is set you get offloaded rules >> if it is not then rules will not be offloaded. If we can agree on >> this patch series there are some more patches on my queue we can >> talk about to make the offload decision per rule using flags similar >> to how we do l2 mac updates. Additionally the error strategy can >> be improved to be hard aborting, log and continue, etc. I think >> these are nice to have improvements but shouldn't block this series. >> >> Also by adding get_parse_graph and set_parse_graph attributes as >> in my previous flow_api work we can build programmable devices >> and programmatically learn when rules can or can not be loaded >> into the hardware. Again future work. >> >> Any comments/feedback appreciated. >> >> Thanks, >> John >> >> --- >> >> John Fastabend (7): >> net: rework ndo tc op to consume additional qdisc handle parameter >> net: rework setup_tc ndo op to consume general tc operand >> net: sched: add cls_u32 offload hooks for netdevs >> net: add tc offload feature flag >> net: tc: helper functions to query action types >> net: ixgbe: add minimal parser details for ixgbe >> net: ixgbe: add support for tc_u32 offload >> > > Hi John, > > Nice work :) Thanks, we will need at least a v2 to fixup some build errors with various compile flags caught by build_bot and missed by me. > > I will add mlx5 support, and see if can live with u32. If not - will > add flower support too. That would be great. Thanks .John > > Amir >
Re: [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe
On 16-02-03 05:31 AM, Or Gerlitz wrote: On 2/3/2016 12:21 PM, John Fastabend wrote: Thanks, we will need at least a v2 to fixup some build errors with various compile flags caught by build_bot and missed by me. Hi John, You didn't mark that as RFC... but we said this direction/approach yet to be talked @ netdev next-week, so.. can you clarify? I suggest not to rush and asking pulling this, lets have the tc workshop beforehand... Yes, the tc workshop is a good place for this. I think we can spill some of it into the switchdev workshop (which is a nice flow since that happens later). Some comments: 1) "priorities" for filters and some form of "index" for actions is is needed. I think index (which tends to be a 32 bit value is what Amir's patches refered to as "cookie" - or at least some hardware can be used to query the action with). Priorities maybe implicit in the order in which they are added. And the idea of appending vs exclusivity vs replace (which netlink already supports) is important to worry about (TCAMS tend to assume an append mode for example). 2) I like the u32 approach where it makes sense; but sometimes it doesnt make sense from a usability pov. I work with some ASICs that have 10 tuples that are fixed. Yes, a user can describe a policy with u32 but flower would be more usable say with flower (both programmatic and cli) 3) The concept of "hook address" is important to be able to express. Amir's patches seemed to miss that (and John brought it up in an email). It could be as simple as ifindex + hookid. With ifindex of 0 meaning all ports and maybe hookid of 0 meaning all hooks. Hook semantics are as mentioned by John (as it stands right now in/egress) 4) Why are we forsaking switchdev John? This is certainly re-usable beyond NICs and SRIOV. 5)What happened to being both able to hardware and/or software? Anyways, I think Seville would be a blast! Come one, come all. cheers, jamal
Re: [net-next PATCH 0/7] tc offload for cls_u32 on ixgbe
On 2/3/2016 4:21 AM, Jamal Hadi Salim wrote: > On 16-02-03 05:31 AM, Or Gerlitz wrote: >> On 2/3/2016 12:21 PM, John Fastabend wrote: >>> Thanks, we will need at least a v2 to fixup some build errors >>> with various compile flags caught by build_bot and missed by me. >> Hi John, >> >> You didn't mark that as RFC... but we said this direction/approach yet >> to be talked @ netdev next-week, so.. can you clarify? Yeah I think this set of patches is ready and it really is what we've been talking about doing for two or three conferences now. Also I don't know where "we" decided to talk about this @ netdev or how that became a prereq for patches. I think this is the correct approach and I am not seeing any contentious pieces here so why not consider it for inclusion. Do you have some issue with the approach? I don't recall any when we talked about this last time. >> >> I suggest not to rush and asking pulling this, lets have the tc workshop >> beforehand... >> rush? we've been talking about it for a year+ > > Yes, the tc workshop is a good place for this. > I think we can spill some of it into the switchdev workshop (which is a > nice flow since that happens later). > Sure but this patch is really the basic stuff we should move on to some of the more interesting pieces. I have ~30 or so patches behind this that do the fun stuff like resource allocation, capababilities, support for the divisor > 1, some new actions, etc. > Some comments: > 1) "priorities" for filters and some form of "index" for actions is > is needed. I think index (which tends to be a 32 bit value is what > Amir's patches refered to as "cookie" - or at least some hardware > can be used to query the action with). Priorities maybe implicit in > the order in which they are added. And th idea of appending vs > exclusivity vs replace (which netlink already supports) > is important to worry about (TCAMS tend to assume an append mode > for example). The code denotes add/del/replace already. I'm not sure why a TCAM would assume an append mode but OK maybe that is some API you have the APIs I use don't have these semantics. For this series using cls_u32 the handle gives you everything you need to put entries in the right table and row. Namely the ht # and order # from 'tc'. Take a look at u32_change and u32_classify its the handle that places the filter into the list and the handle that is matched in classify. We should place the filters in the hardware in the same order that is used by u32_change. Also ran a few tests and can't see how priority works in u32 maybe you can shed some light but as best I can tell it doesn't have any effect on rule execution. > > 2) I like the u32 approach where it makes sense; but sometimes it > doesnt make sense from a usability pov. I work with some ASICs > that have 10 tuples that are fixed. Yes, a user can describe a policy > with u32 but flower would be more usable say with flower (both > programmatic and cli) Sure so create a set of offload hooks for flower we don't need only one hardware classifier any more than we would like a single software classifiers. I'll send out my flower patches when I get to a real system I'm on a corporate laptop at the moment. > > 3) The concept of "hook address" is important to be able to express. > Amir's patches seemed to miss that (and John brought it up in an > email). It could be as simple as ifindex + hookid. With ifindex of > 0 meaning all ports and maybe hookid of 0 meaning all hooks. > Hook semantics are as mentioned by John (as it stands right now > in/egress) > Again I'm trying to faithfully implement what we have in software and load that into the hardware. The handle today gives ingress/egres hook. If you want an all ports hook we should add it to 'tc' software first and then push that to the hardware not create magic hardware bits. See I've drank the cool aid software first than hardware. > 4) Why are we forsaking switchdev John? > This is certainly re-usable beyond NICs and SRIOV. > Sure and switchdev can use it just like they use fdb_add and friends. I just don't want to require switchdev infrastructure on things that really are not switches. I think Amir indicated he would take a try at the switchdev integration. If not I'm willing to do it but it doesn't block this series in any way imo. > 5)What happened to being both able to hardware and/or software? Follow up patch once we get the basic infrastructure in place with the big feature flag bit. I have a patch I'm testing for this now but again I want to move in logical and somewhat minimal sets. > > Anyways, I think Seville would be a blast! Come one, come all. > I'll be there but lets be sure to follow up with this online I know folks are following this who wont be at Seville and I don't see any reason to block these patches and stop the thread for a week or more. > cheers, > jamal > >