Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED
On 04/23/15 20:59, Alexei Starovoitov wrote: On 4/23/15 3:51 PM, Jamal Hadi Salim wrote: So you are planning to add queues? If you are that is a different discussion (and the use case needs some clarity). nope. I wasn't planning to do that. Then i would say, lets just keep the naming convention. Maybe better documentation is needed. For packets being forwarded we already had egress qdiscs which had queues so it didnt seem to make sense to enqueue on ingress for such use cases. There was a use case later where multiple ingress ports had to be shared and ifb was born - which is pseudo temporary enqueueing on ingress. agree. imo ifb approach is more flexible, since it has full hierarchy of qdiscs. As you're saying above and from the old ifb logs I thought that ifb is _temporary_ and long term plan is to use ingress_queue, but looks like this is not the case. ifb represented a real use case which questioned the lack of ingress queue. But do note, that problem was solved. And in engineering we just move on unless a compelling reason makes us rethink. IMO, the original use case for ifb no longer requires it but the internets and the googles havent caught up with it yet. Refer to my netdev01 preso/paper where i talk about sharing. However, ifb addressed another issue. Instead of having per port/netdev policies we can now have per-aggregate-port-group policies. While this comes at a small cost of redirecting packets, that penalty is only paid for by people interested in defining such policies. This in itself is useful. Also not too long ago we decided that we don't want another ingress qdisc. Therefore I think it makes sense to simplify the code and boost performance. I'm not proposing to change tooling, of course. From iproute2 point of view we'll still have ingress qdisc. Right now we're pointlessly allocating memory for it and for ingress_queue, whereas we only need to call cls/act. I'm proposing to kill them and used tcf_proto in net_device instead. Right now to reach cls in critical path on ingress we do: rxq = skb-dev-ingress_queue sch = rxq-qdisc sch-enqueue sch-filter_list with a bunch of 'if' conditions and useless memory accesses in-between. If we add 'ingress_filter_list' directly to net_device, it will be just: skb-dev-ingress_filter_list which is huge performance boost. Code size will shrink as well. iproute2 and all existing tools will work as-is. Looks as win-win to me. I hear you; any extra cycle we can remove from the equation is useful. But do note: in this case, it comes down to the thin line between usability and performance. Do take a closer look at the tooling interfaces from user space on ingress qdisc. The serialization, the ops already provided for manipulating filters etc. Those are for free if you use the qdisc abstraction. If you can still keep that and get rid of the opcodes you mention above then it is win-win. My feeling is it is a lot more challenging. The fact that qdiscs dealt with these codes directly allows for specialized handling. Moving them to a generic function seems to defeat that purpose. So I am siding with Cong on this. that's not what patch 1 is doing. It is still doing specialized handling... but in light of what you said above, it looks like much bigger cleanup is needed. We'll continue arguing when I refactor this set and resubmit when net-next reopens. Sorry - i was refereing to the patch where you agregated things after the qdisc invokes a classifier. hmm. There I also didn't convert all qdiscs into single helper. Only those that have exactly the same logic after tc_classify. All other qdiscs have custom handling. No worries, it's hard to review things while traveling. Been there :) I am sorry again - will look when i get out of travel mode. cheers, jamal -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED
On 04/24/2015 05:37 AM, Cong Wang wrote: On Thu, Apr 23, 2015 at 5:59 PM, Alexei Starovoitov a...@plumgrid.com wrote: On 4/23/15 3:51 PM, Jamal Hadi Salim wrote: ... agree. imo ifb approach is more flexible, since it has full hierarchy of qdiscs. As you're saying above and from the old ifb logs I thought that ifb is _temporary_ and long term plan is to use ingress_queue, but looks like this is not the case. Also not too long ago we decided that we don't want another ingress qdisc. Therefore I think it makes sense to simplify the code and boost performance. Which performance problem did you see? Did you hit the spinlock bottleneck? That spinlock should be eliminated before you try to do more [...] Those are already the next steps for improving performance after that, by attempting to get rid of these additional indirections and shrinking code. I'm not proposing to change tooling, of course. From iproute2 point of view we'll still have ingress qdisc. Right now we're pointlessly allocating memory for it and for ingress_queue, whereas we only need to call cls/act. I'm proposing to kill them and used tcf_proto in net_device instead. Right now to reach cls in critical path on ingress we do: rxq = skb-dev-ingress_queue sch = rxq-qdisc sch-enqueue sch-filter_list with a bunch of 'if' conditions and useless memory accesses in-between. If we add 'ingress_filter_list' directly to net_device, it will be just: skb-dev-ingress_filter_list which is huge performance boost. Code size will shrink as well. iproute2 and all existing tools will work as-is. Looks as win-win to me. Nope, it breaks Qdisc abstraction, filters never attach to netdevice directly. You should stop, since you really don't understand the code. Would be great if you make netdev a better place, stop such immature comments, and keep it technical as everyone else. The point here is to try thinking outside the box, so lets see how the result in the end looks like, then we can discuss. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED
On 04/23/2015 08:12 PM, Cong Wang wrote: On Wed, Apr 22, 2015 at 7:46 PM, Alexei Starovoitov a...@plumgrid.com wrote: ... TC_ACT_QUEUED cannot be removed. Only deprecated with backwards compatibility the way this patch did it. That should have been obvious. It is at least the third time I have to repeat that: we really don't care about out-of-tree modules. Everyone MUST read Documentation/stable_api_nonsense.txt. Seriously, what are you talking about ??? $ git grep -n TC_ACT_QUEUED include/uapi/ include/uapi/linux/pkt_cls.h:105:#define TC_ACT_QUEUED 5 This is *UAPI*, no-one is even remotely talking about out-of-tree kernel modules. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED
On Thu, Apr 23, 2015 at 11:21 AM, Daniel Borkmann dan...@iogearbox.net wrote: On 04/23/2015 08:12 PM, Cong Wang wrote: On Wed, Apr 22, 2015 at 7:46 PM, Alexei Starovoitov a...@plumgrid.com wrote: ... TC_ACT_QUEUED cannot be removed. Only deprecated with backwards compatibility the way this patch did it. That should have been obvious. It is at least the third time I have to repeat that: we really don't care about out-of-tree modules. Everyone MUST read Documentation/stable_api_nonsense.txt. Seriously, what are you talking about ??? $ git grep -n TC_ACT_QUEUED include/uapi/ include/uapi/linux/pkt_cls.h:105:#define TC_ACT_QUEUED 5 This is *UAPI*, no-one is even remotely talking about out-of-tree kernel modules. I am not stupid. You should figure out we can just leave its definition by removing it, leaving the default as stolen. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED
On Thu, Apr 23, 2015 at 3:13 PM, Alexei Starovoitov a...@plumgrid.com wrote: On 4/23/15 1:45 PM, Jamal Hadi Salim wrote: 1) the _XMIT semantics are useful on the egress side because in fact we do have queues and they can be attached to qdiscs etc. The TC_ACT_XXX codes were _intentional_ since ingress works as a classifier shell. then it is worse mess than I thought :( Why call it _qdisc_ then? and have special and convoluted handling for it in qdisc_create, qdisc_graft and other places? It needs to glue into qdisc layer, unify the abstractions. Are you planning to queue things on ingress? I thought that was the whole purpose of ingress qdisc. There is no queue for ingress qdisc. why then we have dev-ingress_queue? Glue layer, qdisc ties too much with txq, which is what I want to solve. We have struct netdev_rx_queue too, but look at it, there is nothing but some sysfs stuffs needed by RPS. If queueing was never a goal, may be we should kill ingress qdisc and replace it with a simple shim that only does cls/act. The code overall will get much simpler and faster. As I said in response to your patch for skb-data, we need to align ingress with egress, rather than making more differences, this includes qdisc's too. We do have per-cpu queues for ingress, at least in theory we have a queue for ingress to use. Of course, RX is naturally different from TX, which is the root cause why we have so many differences here. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED
On 04/23/15 18:13, Alexei Starovoitov wrote: On 4/23/15 1:45 PM, Jamal Hadi Salim wrote: then it is worse mess than I thought :( Why call it _qdisc_ then? and have special and convoluted handling for it in qdisc_create, qdisc_graft and other places? Convinience for tooling and using existing abstractions is the historical. You can attach qdiscs to netdevs. You can then use all sorts of well understood tools. Are you planning to queue things on ingress? I thought that was the whole purpose of ingress qdisc. why then we have dev-ingress_queue? So you are planning to add queues? If you are that is a different discussion (and the use case needs some clarity). If queueing was never a goal, may be we should kill ingress qdisc and replace it with a simple shim that only does cls/act. The code overall will get much simpler and faster. Attaching to the device was considered simpler based on the tooling thought process. On whether we should queue on ingress - it was not considered useful if the packets were going to the host i.e we want to proceed to completion likely queueing to some socket layer further upstream. For packets being forwarded we already had egress qdiscs which had queues so it didnt seem to make sense to enqueue on ingress for such use cases. There was a use case later where multiple ingress ports had to be shared and ifb was born - which is pseudo temporary enqueueing on ingress. Feels like falling into rabbit hole. The fact that qdiscs dealt with these codes directly allows for specialized handling. Moving them to a generic function seems to defeat that purpose. So I am siding with Cong on this. that's not what patch 1 is doing. It is still doing specialized handling... but in light of what you said above, it looks like much bigger cleanup is needed. We'll continue arguing when I refactor this set and resubmit when net-next reopens. Sorry - i was refereing to the patch where you agregated things after the qdisc invokes a classifier. cheers, jamal -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED
Jamal Hadi Salim j...@mojatatu.com wrote: 2) the ACT_QUEUED vs STOLEN was supposed to have semantics of something that was stolen (eg redirection should definetely have been returning STOLEN not QUEUED); something that queues for later re-injection (with any/all metadata) was intended to use QUEUED. I believe netfilter may have followed suit and introduced similar codes (so it would be interesting to see how they use them). Hooks (Targets) don't queue themselves, i.e. NF_QUEUE tells the netfilter core that the skb is to be handed off to nf_queue machinery, while NF_STOLEN is the more obvious don't touch this skb ever again. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED
On 4/23/15 1:45 PM, Jamal Hadi Salim wrote: 1) the _XMIT semantics are useful on the egress side because in fact we do have queues and they can be attached to qdiscs etc. The TC_ACT_XXX codes were _intentional_ since ingress works as a classifier shell. then it is worse mess than I thought :( Why call it _qdisc_ then? and have special and convoluted handling for it in qdisc_create, qdisc_graft and other places? Are you planning to queue things on ingress? I thought that was the whole purpose of ingress qdisc. why then we have dev-ingress_queue? If queueing was never a goal, may be we should kill ingress qdisc and replace it with a simple shim that only does cls/act. The code overall will get much simpler and faster. Feels like falling into rabbit hole. The fact that qdiscs dealt with these codes directly allows for specialized handling. Moving them to a generic function seems to defeat that purpose. So I am siding with Cong on this. that's not what patch 1 is doing. It is still doing specialized handling... but in light of what you said above, it looks like much bigger cleanup is needed. We'll continue arguing when I refactor this set and resubmit when net-next reopens. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED
On Wed, Apr 22, 2015 at 7:46 PM, Alexei Starovoitov a...@plumgrid.com wrote: TC_ACT_QUEUED cannot be removed. Only deprecated with backwards compatibility the way this patch did it. That should have been obvious. It is at least the third time I have to repeat that: we really don't care about out-of-tree modules. Everyone MUST read Documentation/stable_api_nonsense.txt. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED
On Thu, Apr 23, 2015 at 5:59 PM, Alexei Starovoitov a...@plumgrid.com wrote: On 4/23/15 3:51 PM, Jamal Hadi Salim wrote: So you are planning to add queues? If you are that is a different discussion (and the use case needs some clarity). nope. I wasn't planning to do that. For packets being forwarded we already had egress qdiscs which had queues so it didnt seem to make sense to enqueue on ingress for such use cases. There was a use case later where multiple ingress ports had to be shared and ifb was born - which is pseudo temporary enqueueing on ingress. agree. imo ifb approach is more flexible, since it has full hierarchy of qdiscs. As you're saying above and from the old ifb logs I thought that ifb is _temporary_ and long term plan is to use ingress_queue, but looks like this is not the case. Also not too long ago we decided that we don't want another ingress qdisc. Therefore I think it makes sense to simplify the code and boost performance. Which performance problem did you see? Did you hit the spinlock bottleneck? That spinlock should be eliminated before you try to do more, and it can be replaced by RCU read lock after John's work (I am still waiting for his patch btw). I'm not proposing to change tooling, of course. From iproute2 point of view we'll still have ingress qdisc. Right now we're pointlessly allocating memory for it and for ingress_queue, whereas we only need to call cls/act. I'm proposing to kill them and used tcf_proto in net_device instead. Right now to reach cls in critical path on ingress we do: rxq = skb-dev-ingress_queue sch = rxq-qdisc sch-enqueue sch-filter_list with a bunch of 'if' conditions and useless memory accesses in-between. If we add 'ingress_filter_list' directly to net_device, it will be just: skb-dev-ingress_filter_list which is huge performance boost. Code size will shrink as well. iproute2 and all existing tools will work as-is. Looks as win-win to me. Nope, it breaks Qdisc abstraction, filters never attach to netdevice directly. You should stop, since you really don't understand the code. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED
On 4/23/15 3:51 PM, Jamal Hadi Salim wrote: So you are planning to add queues? If you are that is a different discussion (and the use case needs some clarity). nope. I wasn't planning to do that. For packets being forwarded we already had egress qdiscs which had queues so it didnt seem to make sense to enqueue on ingress for such use cases. There was a use case later where multiple ingress ports had to be shared and ifb was born - which is pseudo temporary enqueueing on ingress. agree. imo ifb approach is more flexible, since it has full hierarchy of qdiscs. As you're saying above and from the old ifb logs I thought that ifb is _temporary_ and long term plan is to use ingress_queue, but looks like this is not the case. Also not too long ago we decided that we don't want another ingress qdisc. Therefore I think it makes sense to simplify the code and boost performance. I'm not proposing to change tooling, of course. From iproute2 point of view we'll still have ingress qdisc. Right now we're pointlessly allocating memory for it and for ingress_queue, whereas we only need to call cls/act. I'm proposing to kill them and used tcf_proto in net_device instead. Right now to reach cls in critical path on ingress we do: rxq = skb-dev-ingress_queue sch = rxq-qdisc sch-enqueue sch-filter_list with a bunch of 'if' conditions and useless memory accesses in-between. If we add 'ingress_filter_list' directly to net_device, it will be just: skb-dev-ingress_filter_list which is huge performance boost. Code size will shrink as well. iproute2 and all existing tools will work as-is. Looks as win-win to me. The fact that qdiscs dealt with these codes directly allows for specialized handling. Moving them to a generic function seems to defeat that purpose. So I am siding with Cong on this. that's not what patch 1 is doing. It is still doing specialized handling... but in light of what you said above, it looks like much bigger cleanup is needed. We'll continue arguing when I refactor this set and resubmit when net-next reopens. Sorry - i was refereing to the patch where you agregated things after the qdisc invokes a classifier. hmm. There I also didn't convert all qdiscs into single helper. Only those that have exactly the same logic after tc_classify. All other qdiscs have custom handling. No worries, it's hard to review things while traveling. Been there :) -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED
On 04/23/2015 04:46 AM, Alexei Starovoitov wrote: ... The other two threads degenerated into non-technical comments. Yep. :-/ Anyway, this set was RFC to answer my main question whether I should continue with tc cleanup or stop right here. I got my answer. I think it's worth proceeding with this, it's long overdue. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED
On 4/22/15 4:39 PM, Cong Wang wrote: On Wed, Apr 22, 2015 at 3:22 PM, Alexei Starovoitov a...@plumgrid.com wrote: On 4/21/15 10:02 PM, Cong Wang wrote: On Tue, Apr 21, 2015 at 12:27 PM, Alexei Starovoitov a...@plumgrid.com wrote: TC_ACT_QUEUED was always an alias of TC_ACT_STOLEN. Get rid of redundant checks in all qdiscs. Instead do it once. The current code can be easily extended, while your code not. I don't see the need of this change. well, iproute2 doesn't use TC_ACT_QUEUED action at all and TC_ACT_STOLEN is used by mirred. All in-tree qdiscs alias them. If you're saying that some future actions together with some future qdiscs may take advantage of that, then why they didn't use it over the last 10 years? Having both that do the same thing is only confusing. I think having one value to indicate 'stolen' condition makes TC code easier to understand. Then remove it, I am all for this. ;) TC_ACT_QUEUED cannot be removed. Only deprecated with backwards compatibility the way this patch did it. That should have been obvious. The other two threads degenerated into non-technical comments. Anyway, this set was RFC to answer my main question whether I should continue with tc cleanup or stop right here. I got my answer. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED
On 4/21/15 10:02 PM, Cong Wang wrote: On Tue, Apr 21, 2015 at 12:27 PM, Alexei Starovoitov a...@plumgrid.com wrote: TC_ACT_QUEUED was always an alias of TC_ACT_STOLEN. Get rid of redundant checks in all qdiscs. Instead do it once. The current code can be easily extended, while your code not. I don't see the need of this change. well, iproute2 doesn't use TC_ACT_QUEUED action at all and TC_ACT_STOLEN is used by mirred. All in-tree qdiscs alias them. If you're saying that some future actions together with some future qdiscs may take advantage of that, then why they didn't use it over the last 10 years? Having both that do the same thing is only confusing. I think having one value to indicate 'stolen' condition makes TC code easier to understand. Jamal, what's your take on this? -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED
On Wed, Apr 22, 2015 at 3:22 PM, Alexei Starovoitov a...@plumgrid.com wrote: On 4/21/15 10:02 PM, Cong Wang wrote: On Tue, Apr 21, 2015 at 12:27 PM, Alexei Starovoitov a...@plumgrid.com wrote: TC_ACT_QUEUED was always an alias of TC_ACT_STOLEN. Get rid of redundant checks in all qdiscs. Instead do it once. The current code can be easily extended, while your code not. I don't see the need of this change. well, iproute2 doesn't use TC_ACT_QUEUED action at all and TC_ACT_STOLEN is used by mirred. All in-tree qdiscs alias them. If you're saying that some future actions together with some future qdiscs may take advantage of that, then why they didn't use it over the last 10 years? Having both that do the same thing is only confusing. I think having one value to indicate 'stolen' condition makes TC code easier to understand. Then remove it, I am all for this. ;) -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED
On 04/22/2015 07:02 AM, Cong Wang wrote: On Tue, Apr 21, 2015 at 12:27 PM, Alexei Starovoitov a...@plumgrid.com wrote: TC_ACT_QUEUED was always an alias of TC_ACT_STOLEN. Get rid of redundant checks in all qdiscs. Instead do it once. The current code can be easily extended, while your code not. So, do you plan to extend it ? The alias has been there since pre-git times, so more than 10 years now, where no-one felt the need to extended the semantics of it ... -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC 2/3] tc: deprecate TC_ACT_QUEUED
On Tue, Apr 21, 2015 at 12:27 PM, Alexei Starovoitov a...@plumgrid.com wrote: TC_ACT_QUEUED was always an alias of TC_ACT_STOLEN. Get rid of redundant checks in all qdiscs. Instead do it once. The current code can be easily extended, while your code not. I don't see the need of this change. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html