Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
On Sun, 2006-09-07 at 08:52 -0400, Jamal Hadi Salim wrote: Ok, didnt complete my thought there .. I dont like it, like i said, because it adds one more check in the first path. I was contemplating at some point even not doing the .. not doing the check for device up and just shove the packet at it. cheers, jamal PS:- warning on response still applies. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
* Jamal Hadi Salim [EMAIL PROTECTED] 2006-07-09 08:52 On Sun, 2006-09-07 at 01:46 +0200, Thomas Graf wrote: * Jamal Hadi Salim [EMAIL PROTECTED] 2006-07-08 10:14 [..] There is no easy way to detect such a deadlock. I think it reduces to the same case as eth0-eth0, i.e: It's very simple. Just use the same method and add a flag if a mirred is currently in use as Herbert did to qdisc_run(). But that would make it more complex in the sense it would require a lot more code. Another simpler approach is to check for recursion right in It's very interesting that you change from there is no easy way to another simpler approach... in just one posting :-) dev_queue_xmit() - but i did not dare to do that because i could not justify it for any other code other than loops created by the action code. i.e something along the lines of: if (q-enqueue) { if (!spin_trylock(dev-queue_lock)) { printk(yikes recursed device %s\n,dev-name); goto tx_recursion_detected; } That's not gonna work, dev-queue_lock may be held legimitately by someone else than an underlying dev_queue_xmit() call. [BTW, notice how i used code to describe my view above ;-] I'm very proud of you. Again, I was hoping that Herbert's stuff may change this view (and therefore give it more legitimacy) - but if the above can be accepted then we can forget touching mirred. You need this: if (test_and_set_bit(__LINK_STATE_ENQUEUEING, dev-state)) goto tx_recursion_detected; spin_lock(queue_lock); enqueue(...) qdisc_run() spin_unlock(queue_lock); clear_bit(__LINK_TATE_ENQUEUEING, dev-state); Unfortunately we'd still need __LINK_STATE_QDISC_RUNNING due to net_tx_action(). No Thomas - this is a valid check. As valid as mirred checking if device is up first. I dont like it, like i said, because it adds one more check in the first path. I was contemplating at some point even not doing the I'm not going to argue about this. By mirred deadlock i think you mean the deadlock that happens in dev_queue_xmit()? I meant the deadlock within mirred but the deadlock on queue_lock is happening first anyway. What is still not clear above? Frankly, I have no idea which deadlocks are intentional, what ideas you have about ingress-egress, etc because it is not documented. The list is very long. I'm not going to waste time trying to work out a patch that will then not suit the ideas in your mind and on your notes. You're maintaing this code, no? - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
On Sun, 2006-09-07 at 15:33 +0200, Thomas Graf wrote: * Jamal Hadi Salim [EMAIL PROTECTED] 2006-07-09 08:52 [..] Another simpler approach is to check for recursion right in It's very interesting that you change from there is no easy way to another simpler approach... in just one posting :-) I said there was no easy way of detecting - which is different from preventing. Both approaches i showed were things i used (as far back as last year) but did not find palatable to submit. Hopefully we dont go into a tangent on this. if (q-enqueue) { if (!spin_trylock(dev-queue_lock)) { printk(yikes recursed device %s\n,dev-name); goto tx_recursion_detected; } That's not gonna work, dev-queue_lock may be held legimitately by someone else than an underlying dev_queue_xmit() call. If there is a legitimate reason then it wont work. I cant think of one though. [BTW, notice how i used code to describe my view above ;-] I'm very proud of you. Thank you thank you Again, I was hoping that Herbert's stuff may change this view (and therefore give it more legitimacy) - but if the above can be accepted then we can forget touching mirred. You need this: if (test_and_set_bit(__LINK_STATE_ENQUEUEING, dev-state)) goto tx_recursion_detected; spin_lock(queue_lock); enqueue(...) qdisc_run() spin_unlock(queue_lock); clear_bit(__LINK_TATE_ENQUEUEING, dev-state); Unfortunately we'd still need __LINK_STATE_QDISC_RUNNING due to net_tx_action(). This is also another approach that would work. If you think its simpler go ahead and shoot a patch. By mirred deadlock i think you mean the deadlock that happens in dev_queue_xmit()? I meant the deadlock within mirred but the deadlock on queue_lock is happening first anyway. if you can stop things at the queue you wont have to worry about mirred. What is still not clear above? Frankly, I have no idea which deadlocks are intentional, what ideas you have about ingress-egress, etc because it is not documented. The list is very long. Ask one question at a time and i will answer. Some of the things we discussed have no place to be commented-on in the code. But i think what is obvious is: A-*-A is a no-no. And in some cases it is fine to let the user just fsck themselves because then they will understand it is a bad idea [1] when shit happens. OTOH, if there was a KISS way of doing it (as in the ifb case, why not). I'm not going to waste time trying to work out a patch that will then not suit the ideas in your mind and on your notes. You're maintaing this code, no? Yes, of course otherwise i wouldnt bother to comment on any patches. cheers, jamal [1] there was a long discussion a few years back when i was still in lkml on someone trying to redirect (i think) /dev/null - /dev/mem (dont hold me accountable if those are not the right devices, just absorb the moral of this instead). The consensus was it would be very difficult to do without making a lot of changes and if someone wishes to do that they deserved what they are getting. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
* Jamal Hadi Salim [EMAIL PROTECTED] 2006-07-09 10:03 On Sun, 2006-09-07 at 15:33 +0200, Thomas Graf wrote: That's not gonna work, dev-queue_lock may be held legimitately by someone else than an underlying dev_queue_xmit() call. If there is a legitimate reason then it wont work. I cant think of one though. See sch_generic.c, it's documented. A simple grep on queue_lock would have told you the same. This is also another approach that would work. If you think its simpler go ahead and shoot a patch. It's not simpler, it's correct, while your patch is wrong. A-*-A is a no-no. And in some cases it is fine to let the user just fsck themselves because then they will understand it is a bad idea [1] when shit happens. OTOH, if there was a KISS way of doing it (as in the ifb case, why not). I remind you that you started mentioning this A-*-A case while talking about tx deadlocks that were supposed to be prevented with the !from check or something along that lines. I can't really tell because you explain it differently in every posting. Yes, of course otherwise i wouldnt bother to comment on any patches. So maintain the code and fix your bugs. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
On Sun, 2006-09-07 at 16:19 +0200, Thomas Graf wrote: * Jamal Hadi Salim [EMAIL PROTECTED] 2006-07-09 10:03 On Sun, 2006-09-07 at 15:33 +0200, Thomas Graf wrote: That's not gonna work, dev-queue_lock may be held legimitately by someone else than an underlying dev_queue_xmit() call. If there is a legitimate reason then it wont work. I cant think of one though. See sch_generic.c, it's documented. A simple grep on queue_lock would have told you the same. If you mean that the device will also try to grab the qlock there, then that is fine still for the serialization. It all starts at dev_queue_xmit. This is also another approach that would work. If you think its simpler go ahead and shoot a patch. It's not simpler, it's correct, while your patch is wrong. Ok, calm down, will you? Man, you make it very hard to follow Daves Tibetan approach. Let me tell you exactly how i feel about your approach: It is unnecessarily complex. The approach i posted is not only fine, it works with only 4-5 lines of code; i have numerous tests against it over a long period of time. I was trying to be polite and follow Daves advice not to insist it has to be done my way - it almost seems impossible with you trying to nitpick on little unimportant details. A-*-A is a no-no. And in some cases it is fine to let the user just fsck themselves because then they will understand it is a bad idea [1] when shit happens. OTOH, if there was a KISS way of doing it (as in the ifb case, why not). I remind you that you started mentioning this A-*-A case while talking about tx deadlocks that were supposed to be prevented with the !from check or something along that lines. I can't really tell because you explain it differently in every posting. Because you are looking for preciseness at the micro/code level and i am trying to explain at the conceptual/macro level (I have to adjust to your mode but it is hard for me because i dont think at our level that matters). When i sense you didnt understand me the last time, I try to explain it better. The deadlock happens on transmit. If you want to be precise, it happens when dev_queue_xmit is invoked. If you want more preciseness, when the queue lock is contended. If you want more preciseness, the code sample that i showed you should help illustrate it. Yes, of course otherwise i wouldnt bother to comment on any patches. So maintain the code and fix your bugs. Ok, I dont think this is gonna work - I may have to give up on getting any resolution. Heres the suggestion again and you can still shoot a patch: - If we can avoid doing anything at mirred that is the most preferable approach. i.e get the change for free. In other words if it complicates things it is not worth it. Someone redirecting eth0 to eth0 on egress deserves whats happening to them. - Try it against Herberts patch. It may resolve something or may require an adjustment to Herberts patch so we can kill two birds with one stone i.e get it for free. I dont know. I guess i shouldnt say i dont know it is not precise, my point is you may find things when you attempt the change. The patch is in Daves 2.6.18 tree. - Iam fine with your suggestion to use a flag. The problem is not which scheme you use - it is whether the change at such a crucial path in the stack would be acceptable to other people. Anyways, I am off. cheers, jamal - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
* Jamal Hadi Salim [EMAIL PROTECTED] 2006-07-09 11:00 If you mean that the device will also try to grab the qlock there, then that is fine still for the serialization. It all starts at dev_queue_xmit. Look at where dev-queue_lock is taken, whenever a qdisc or filter is added, modified or deleted the lock is taken. Using your approach packets get dropped while such an operation is taking place. Your approach is wrong. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
* jamal [EMAIL PROTECTED] 2006-07-01 09:35 On Sat, 2006-01-07 at 13:28 +0200, Thomas Graf wrote: Please enlighten me, I may be making a wrong assumption again. 1) tc_verd is reset to 0 after dq in ri_tasklet() 2) TC_AT is set to AT_EGRESS in dev_queue_xmit() 3) TC_FROM being derived from TC_AT in tcf_mirred() when redirecting 4) TC_FROM has the value AT_EGRESS when entering ifb_xmit() Let me answer the next bit and it may clear this. It's pretty clear actually, given eth0-ifb0-ifb1 it would look like: dev_queue_xmit(eth0) tcf_mirred - ifb0 dev_queue_xmit(ifb0) tcf_mirred - ifb1 dev_queue_xmit(ifb1) ifb_xmit(ifb1) QUEUE /* Here we queue the packet for the first time and release the stack of tx locks acquired on the way. TC_FROM was never reset up here so this can't possibly prevent any tx deadlocks. However, the !from check is effective later on ... */ ri_tasklet(ifb1) dev_queue_xmit(ifb0) ifb_xmit(ifb0) /* classification disabled */ /* Drop due to !from with input_dev = ifb1 which is good as it prevents to loop the packet back to ifb1 again which I refered to earlier */ Is this how it was intented? using what i said as looping in the case of A-B-C-D-A for the case of egress since that is what you are alluding to; note that ifb0 will be entered twice: First for example the tasklet in ifb0 will emit the packet, and then it will go all the way back to xmit on ifb0. This is where the issue is. Does that clear it? I tried to stay out of A-B-A for now since that's currently broken due to mirred unless the deadlock is intentional, f.e. when setting up eth0-ifb0-eth0 like this: ip link set ifb0 up tc qdisc add dev ifb0 parent root handle 1: prio tc qdisc add dev eth0 parent root handle 1: prio tc filter add dev eth0 parent 1: protocol ip prio 10 u32 match ip protocol 1 0xff flowid 1:1 action mirred egress redirect dev ifb0 tc filter add dev ifb0 parent 1: protocol ip prio 10 u32 match ip protocol 1 0xff flowid 1:1 action mirred egress redirect dev eth0 The following deadlock will ocur: dev_queue_xmit(eth0) tcf_mirred - ifb0 /* acquire tcf_mirred-lock on eth0 */ dev_queue_xmit(ifb0) tcf_mirred - eth0 /* acquire tcf_mirred-lock on ifb0 dev_queue_xmit(eth0) tcf_mirred - ifb0 /* deadlock */ This is assuming that no tx deadlock happens of course. I did try this out just to make sure, the machine just hung. I can't see any code trying to prevent this but this is another discussion as ifb is not involved in this, I think it's purely a problem of mirred. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
On Sat, 2006-08-07 at 10:14 -0400, Jamal Hadi Salim wrote: I have a dated patch to mirred (may not apply cleanly) Sorry forgot to attach the patch. Attached for real this time;- that i believe will fix this specific one. Try to see if it also fixes this case you have. I meant i know this works for eth0-eth0 i am not sure if it will fix your specific case. I need my laptop at the moment ;- If it does it will be an ok solution but not the best (because it introduces an unnecessary check for the common case). cheers, jamal diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c index 4fcccbd..d8946b3 100644 --- a/net/sched/act_mirred.c +++ b/net/sched/act_mirred.c @@ -208,6 +208,12 @@ bad_mirred: skb2-dev = dev; skb2-input_dev = skb-dev; + if (skb2-input_dev == skb2-dev) { + if (net_ratelimit()) + printk( Mirred: Loop detected to %s\n,skb2-dev-name); + goto bad_mirred; + } + dev_queue_xmit(skb2); spin_unlock(p-lock); return p-action;
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
* Thomas Graf [EMAIL PROTECTED] 2006-07-09 01:46 __LINK_STATE_QDISC_RUNNING will prevent an eventual tx deadlock, it will not prevent the mirred deadlock. BTW, it will also not protect you from deadlocking on dev-queue_lock while enqueueing. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
* jamal [EMAIL PROTECTED] 2006-06-30 22:59 On Sat, 2006-01-07 at 01:45 +0200, Thomas Graf wrote: Fact is that the from verdict is set to a meaningful value again at dev_queue_xmit() or ing_filter() so ifb_xmit() only sees valid values. Ok, Thomas this is one of those places where we end up colliding for no good reason - your statement above is accusatory. And sometimes i say 3 things and you pick one and use that as context. The ifb does clear the field. So if you get redirected again, it will be 0. Please enlighten me, I may be making a wrong assumption again. 1) tc_verd is reset to 0 after dq in ri_tasklet() 2) TC_AT is set to AT_EGRESS in dev_queue_xmit() 3) TC_FROM being derived from TC_AT in tcf_mirred() when redirecting 4) TC_FROM has the value AT_EGRESS when entering ifb_xmit() different instances of the same device do not deadlock. If however, you have a series of devices in which A-B-C-D-A then you will have a possible deadlock. In the case of the ifb i dissallowed it because it was obvious from the testing. Correct me if I'm wrong. The skb is queued between each device. When moving skbs from rq to tq the tx lock is acquired, if not available the packet is requeued for later. Due to the queueing the tx lock of the previous device is released. Where exactly is the deadlock? - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
* jamal [EMAIL PROTECTED] 2006-06-30 22:23 I am not certain i understood then: Are we in the mode where the refcount is not needed because chances are small that a device will disappear? It seems to me after all this trouble that it may not be so bad to refcount (I guess i meant refcount the device on input to the ifb and decrement on the output). Based on two principles: a) All netdevices a packet is processed through is properly refcounted until the packet is queued for the first time. Given that iif is strictly set to the previous device, a refcnt is certainly taken up to the point where the skb is put into rq. There is one exception to this: The fact that you don't set skb-dev = ifb when reinjecting at ingress will not take a refcnt on the ifb. This sounds like a broken architecture to me but it doesn't matter as you want to prohibit ifb - ifb anyways. b) The netdevice used for xmit via dev_queue_xmit() is already protected by the initial xmit attempt. I can't see reason why to hurt performance by introducing more atomic operations in your fast path. If you have more concerns regarding these patches, feel free to fix your own architecture or propose to drop the patches, I think I've spend enough time on this. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
On Sat, 2006-01-07 at 13:28 +0200, Thomas Graf wrote: * jamal [EMAIL PROTECTED] 2006-06-30 22:59 Please enlighten me, I may be making a wrong assumption again. 1) tc_verd is reset to 0 after dq in ri_tasklet() 2) TC_AT is set to AT_EGRESS in dev_queue_xmit() 3) TC_FROM being derived from TC_AT in tcf_mirred() when redirecting 4) TC_FROM has the value AT_EGRESS when entering ifb_xmit() Let me answer the next bit and it may clear this. different instances of the same device do not deadlock. If however, you have a series of devices in which A-B-C-D-A then you will have a possible deadlock. In the case of the ifb i dissallowed it because it was obvious from the testing. Correct me if I'm wrong. The skb is queued between each device. When moving skbs from rq to tq the tx lock is acquired, if not available the packet is requeued for later. Due to the queueing the tx lock of the previous device is released. Where exactly is the deadlock? using what i said as looping in the case of A-B-C-D-A for the case of egress since that is what you are alluding to; note that ifb0 will be entered twice: First for example the tasklet in ifb0 will emit the packet, and then it will go all the way back to xmit on ifb0. This is where the issue is. Does that clear it? cheers, jamal - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
On Sat, 2006-01-07 at 13:51 +0200, Thomas Graf wrote: * jamal [EMAIL PROTECTED] 2006-06-30 22:23 I am not certain i understood then: Are we in the mode where the refcount is not needed because chances are small that a device will disappear? It seems to me after all this trouble that it may not be so bad to refcount (I guess i meant refcount the device on input to the ifb and decrement on the output). Based on two principles: a) All netdevices a packet is processed through is properly refcounted until the packet is queued for the first time. Given that iif is strictly set to the previous device, a refcnt is certainly taken up to the point where the skb is put into rq. yes, but that refcount may be lost by the time it is dequeued and retransmitted. Am i wrong thinking so? There is one exception to this: The fact that you don't set skb-dev = ifb when reinjecting at ingress will not take a refcnt on the ifb. This sounds like a broken architecture to me but it doesn't matter as you want to prohibit ifb - ifb anyways. Ok, Thomas: Please stop critiqueing the architecture non-stop. You will not convince me of anything this way. Lets just focus on this issue and then you can go back and critique all you want. b) The netdevice used for xmit via dev_queue_xmit() is already protected by the initial xmit attempt. Is it guaranteed? if yes, the patch is perfect. I can't see reason why to hurt performance by introducing more atomic operations in your fast path. well, i would be fine with this - with a caveat that nothing really has changed in ifb then, no? i.e the value of the patch in that case would be to convert input_dev to iif, correct? If yes, this is fine by me since as we have discussed the likelihood of this was small. If you have more concerns regarding these patches, feel free to fix your own architecture or propose to drop the patches, I think I've spend enough time on this. Again, you are not being helpful by throwing in side remarks like these. I am being very fair to you when you ask questions on how X works after all assumptions you make - yet i ask questions which i see as valid and you start throwing your hands off in the air. And then Patrick says i am creating endless debates; this is why we go into loops. Just answer the questions: I am trying to understand or you can choose to ignore the emails all together. cheers, jamal - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
* jamal [EMAIL PROTECTED] 2006-06-29 21:11 Heres what it would look at ingress: step 0: coming from wire via eth0, dev=eth0, input_dev=eth0 step 1: redirect to ifb0, leaving redirect dev=ifb0, input_dev=eth0 step 2: leaving ifb0, coming back to ingress side of stack dev= eth0, input_dev=ifb0 That creates a nice loop on ingress. Upon reentering the stack with skb-dev set to eth0 again we'll go through the same ingress filters as the first time and we'll hit ifb0 again over and over. Are you suggesting everyone has to insert a pass action matching input_dev in order to escape the loop when using ifb? When leaving ifb0 you want for... ... egress: skb-dev=to (eth0) skb-iif=from (ifb0) ... ingress: skb-dev=at (ifb0) skb-iif=from (eth0) Yes, this is correct. I described the flow of the first one in the earlier email and the ingress side. How can it be correct if it differs from your description above? What I described is what the patch changes it to. Looking closer at ifb it contains a race when updating skb-dev. Preempt has to be disabled when updating skb-dev before calling netif_rx() otherwise the device might disappear. [NET]: Use interface index to keep input device information Using the interface index instead of a direct reference allows a safe usage beyond the scope where an interface could disappear. The old input_dev field was incorrectly made dependant on CONFIG_NET_CLS_ACT in skb_copy(). The ifb device is fixed to set skb-dev in a manner that the device can't disappear before calling netif_rx() and the semantics are fixed so a packet reentering the stack looks like it would have been received on the ifb device. Signed-off-by: Thomas Graf [EMAIL PROTECTED] Index: net-2.6.git/include/linux/skbuff.h === --- net-2.6.git.orig/include/linux/skbuff.h +++ net-2.6.git/include/linux/skbuff.h @@ -181,7 +181,6 @@ enum { * @sk: Socket we are owned by * @tstamp: Time we arrived * @dev: Device we arrived on/are leaving by - * @input_dev: Device we arrived on * @h: Transport layer header * @nh: Network layer header * @mac: Link layer header @@ -192,6 +191,7 @@ enum { * @data_len: Data length * @mac_len: Length of link layer header * @csum: Checksum + * @iif: Device we arrived on * @local_df: allow local fragmentation * @cloned: Head may be cloned (check refcnt to be sure) * @nohdr: Payload reference only, must not modify header @@ -228,7 +228,6 @@ struct sk_buff { struct sock *sk; struct skb_timeval tstamp; struct net_device *dev; - struct net_device *input_dev; union { struct tcphdr *th; @@ -266,6 +265,7 @@ struct sk_buff { data_len, mac_len, csum; + int iif; __u32 priority; __u8local_df:1, cloned:1, Index: net-2.6.git/include/net/pkt_cls.h === --- net-2.6.git.orig/include/net/pkt_cls.h +++ net-2.6.git/include/net/pkt_cls.h @@ -352,14 +352,19 @@ tcf_change_indev(struct tcf_proto *tp, c static inline int tcf_match_indev(struct sk_buff *skb, char *indev) { + int ret = 1; + if (indev[0]) { - if (!skb-input_dev) - return 0; - if (strcmp(indev, skb-input_dev-name)) + struct net_device *dev; + + dev = dev_get_by_index(skb-iif); + if (!dev) return 0; + ret = !strcmp(indev, dev-name); + dev_put(dev); } - return 1; + return ret; } #endif /* CONFIG_NET_CLS_IND */ Index: net-2.6.git/net/core/dev.c === --- net-2.6.git.orig/net/core/dev.c +++ net-2.6.git/net/core/dev.c @@ -1715,8 +1715,8 @@ static int ing_filter(struct sk_buff *sk if (dev-qdisc_ingress) { __u32 ttl = (__u32) G_TC_RTTL(skb-tc_verd); if (MAX_RED_LOOP ttl++) { - printk(Redir loop detected Dropping packet (%s-%s)\n, - skb-input_dev-name, skb-dev-name); + printk(Redir loop detected Dropping packet (%d-%s)\n, + skb-iif, skb-dev-name); return TC_ACT_SHOT; } @@ -1749,8 +1749,8 @@ int netif_receive_skb(struct sk_buff *sk if (!skb-tstamp.off_sec) net_timestamp(skb); - if (!skb-input_dev) - skb-input_dev = skb-dev; + if (!skb-iif) + skb-iif = skb-dev-ifindex; orig_dev = skb_bond(skb); Index: net-2.6.git/net/core/skbuff.c
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
Thomas Graf a écrit : * Nicolas Dichtel [EMAIL PROTECTED] 2006-06-30 15:16 That creates a nice loop on ingress. Upon reentering the stack with skb-dev set to eth0 again we'll go through the same ingress filters as the first time and we'll hit ifb0 again over and over. Are you suggesting everyone has to insert a pass action matching input_dev in order to escape the loop when using ifb? Bit 8 of skb-tc_verd is set by IFB, so packet isn't reclassify. This bit avoid the loop. Right, my mistake. Just making classification impossible after going through an ifb device certainly seems like a perfect idea, nice! Classification has already be done on the first pass. Mirror action must be the last classifier. Nicolas - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
On Fri, 2006-30-06 at 15:08 +0200, Thomas Graf wrote: * jamal [EMAIL PROTECTED] 2006-06-29 21:11 Heres what it would look at ingress: step 0: coming from wire via eth0, dev=eth0, input_dev=eth0 step 1: redirect to ifb0, leaving redirect dev=ifb0, input_dev=eth0 step 2: leaving ifb0, coming back to ingress side of stack dev= eth0, input_dev=ifb0 That creates a nice loop on ingress. Upon reentering the stack with skb-dev set to eth0 again we'll go through the same ingress filters as the first time and we'll hit ifb0 again over and over. loops are taken care of by other metadata. Are you suggesting everyone has to insert a pass action matching input_dev in order to escape the loop when using ifb? This works, there are no loops. If you use a meta setter and changed input_dev to something that creates a loop it will still be caught when ttls expire. When leaving ifb0 you want for... ... egress: skb-dev=to (eth0) skb-iif=from (ifb0) ... ingress: skb-dev=at (ifb0) skb-iif=from (eth0) Yes, this is correct. I described the flow of the first one in the earlier email and the ingress side. How can it be correct if it differs from your description above? What I described is what the patch changes it to. Double check again: it works as described above; your change messes it. Looking closer at ifb it contains a race when updating skb-dev. Preempt has to be disabled when updating skb-dev before calling netif_rx() otherwise the device might disappear. I am going to ignore the patch until we resolve the issue of iif vs input_dev. Why dont we discuss that? cheers, jamal - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
On Fri, 2006-30-06 at 16:15 +0200, Thomas Graf wrote: * jamal [EMAIL PROTECTED] 2006-06-30 09:57 On Fri, 2006-30-06 at 15:08 +0200, Thomas Graf wrote: That creates a nice loop on ingress. Upon reentering the stack with skb-dev set to eth0 again we'll go through the same ingress filters as the first time and we'll hit ifb0 again over and over. loops are taken care of by other metadata. Not really, assuming a simple setup such as: mirred attached to eth0 redirecting to ifb0 mirred attached to ifb0 redirecting to ifb1 will result in: eth0::tcf_mirred() skb-dev = ifb0, input_dev = eth0 ifb0::tcf_mirred() skb-dev = ifb1, input_dev = ifb0 ifb1::ifb_xmit() skb-dev = ifb0, input_dev = ifb1, set ncls bit ifb0::ifb_xmit() skb-dev = ifb1, input_dev = ifb0 ifb1::ifb_xmit() skb-dev = ifb0, input_dev = ifb1 ... Oh dear... and we don't even have a ttl to catch this. Did you actually try to run this before you reached this conclusion? I am going to ignore the patch until we resolve the issue of iif vs input_dev. Why dont we discuss that? It's starting to get useless to discuss with you. sigh. ok, why dont you take a deep breath or a break because i think we are not going to make any progress this way. I know you may be feeling frustrated but i am equally if not more frustrated as well. Our conflict comes in the fact that you are trying to change the architecture in the name of fixing bugs. With all due respect, the architecture works. I have invested many many hours testing and verifying. There may be coding bugs - and those need fixing. Kill the bugs. You agreed in your last posting that the 3rd option, being that not caring about whether a device might disappear but having a way to check for it, is what we agreed on and what makes most sense, yes, i recalled that as the last discussion we had. yet you fail to see that using ifindex is exactly what reflects this descision. The choice is between using an ifindex and a pointer to the device. Why is it solvable via an ifindex but not a device pointer? I may have misunderstood you, so look at this as a fresh opportunity to enlighten me. Forget about all the discussion weve had thus far. cheers, jamal - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
* jamal [EMAIL PROTECTED] 2006-06-30 10:35 On Fri, 2006-30-06 at 16:15 +0200, Thomas Graf wrote: eth0::tcf_mirred() skb-dev = ifb0, input_dev = eth0 ifb0::tcf_mirred() skb-dev = ifb1, input_dev = ifb0 ifb1::ifb_xmit() skb-dev = ifb0, input_dev = ifb1, set ncls bit ifb0::ifb_xmit() skb-dev = ifb1, input_dev = ifb0 ifb1::ifb_xmit() skb-dev = ifb0, input_dev = ifb1 ... Oh dear... and we don't even have a ttl to catch this. Did you actually try to run this before you reached this conclusion? I did, fortunately some other bug prevents this from happening, packets are simply dropped somewhere. With all due respect, the architecture works. I have invested many many hours testing and verifying. There may be coding bugs - and those need fixing. Kill the bugs. Right, just run this tc filter add dev eth0 parent 1: protocol ip prio 10 u32 match ip tos 0 0 flowid 1:1 action mirred egress redirect dev ifb1 tc filter add dev ifb1 parent 1: protocol ip prio 10 u32 match ip tos 0 0 flowid 1:1 action mirred egress redirect dev ifb0 Anyways, I give up. Last time I've been running after you trying to fix the many bugs you leave behind. Ever noticed that whenever you add some new code it's someone else following up with tons of small bugfix patches having a hard time trying to figure out the actual intent. I'll just duplicate the code for my purpose, so much easier. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
Nicolas Dichtel wrote: Bit 8 of skb-tc_verd is set by IFB, so packet isn't reclassify. This bit avoid the loop. It would, if something would actually set it. ~/src/kernel/linux-2.6$ grep NCLS -r net/ net/core/dev.c: if (skb-tc_verd TC_NCLS) { net/core/dev.c: skb-tc_verd = CLR_TC_NCLS(skb-tc_verd); net/sched/act_api.c:if (skb-tc_verd TC_NCLS) { net/sched/act_api.c:skb-tc_verd = CLR_TC_NCLS(skb-tc_verd); net/sched/act_api.c:D2PRINTK((%p)tcf_action_exec: cleared TC_NCLS in %s out %s\n, ~/src/kernel/linux-2.6$ Am I missing something? Jamal, where is this bit supposed to be set? - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
Patrick McHardy wrote: Nicolas Dichtel wrote: Bit 8 of skb-tc_verd is set by IFB, so packet isn't reclassify. This bit avoid the loop. It would, if something would actually set it. ~/src/kernel/linux-2.6$ grep NCLS -r net/ net/core/dev.c: if (skb-tc_verd TC_NCLS) { net/core/dev.c: skb-tc_verd = CLR_TC_NCLS(skb-tc_verd); net/sched/act_api.c:if (skb-tc_verd TC_NCLS) { net/sched/act_api.c:skb-tc_verd = CLR_TC_NCLS(skb-tc_verd); net/sched/act_api.c:D2PRINTK((%p)tcf_action_exec: cleared TC_NCLS in %s out %s\n, ~/src/kernel/linux-2.6$ Am I missing something? Jamal, where is this bit supposed to be set? OK, I'm apparently missing the ability to read :) Please disregard .. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
On Fri, 2006-30-06 at 18:32 +0200, Thomas Graf wrote: * jamal [EMAIL PROTECTED] 2006-06-30 10:35 Did you actually try to run this before you reached this conclusion? I did, fortunately some other bug prevents this from happening, packets are simply dropped somewhere. It is not a bug, Thomas! I am getting a little frustrated now. The packets will be dropped because we set the at field to zero which is invalid. That is done on purpose. It is only meaningful for ifb. The challenge is much bigger than it appears. You could end up deadlocking on the tx lock. So this was the choice i had to make. With all due respect, the architecture works. I have invested many many hours testing and verifying. There may be coding bugs - and those need fixing. Kill the bugs. Right, just run this tc filter add dev eth0 parent 1: protocol ip prio 10 u32 match ip tos 0 0 flowid 1:1 action mirred egress redirect dev ifb1 tc filter add dev ifb1 parent 1: protocol ip prio 10 u32 match ip tos 0 0 flowid 1:1 action mirred egress redirect dev ifb0 Anyways, I give up. I understood you when you first posted. This is part of my testing. Try also to redirect from the same device eth0 to eth0 etc and see some more interesting things. Last time I've been running after you trying to fix the many bugs you leave behind. Ever noticed that whenever you add some new code it's someone else following up with tons of small bugfix patches having a hard time trying to figure out the actual intent. You could always ask instead of making assumptions. And dont get me wrong, I honestly do appreciate you fixing bugs. You have been great and i have always thanked you - except in situations like this where you just stress the hell out of me. Bugs will always be there. Even God has bugs - but you are not fixing bugs in this case. You are attempting to change architecture (which works just fine) in the way you think it should work - and then point to something as a bug because it doesnt work the way you think it should work. This is a problem not just with you BTW, but with Patrick as well (although he has gotten better lately). There is a huge difference for example when dealing with Herbert. My approach in situations like this, which you dont have to follow, is to ask first what the intent was then if i dont like the intent try to convince the owner that there maybe better ways. Or why they are wrong. To make my case, look at what you just did above in just the last 2 emails: You made a claim there is a bug. I asked you if you had really tested what you are pointing to (because i know i test for that). You come back and make claims the bug is elsewhere. This could go on forever typically - and infact it is throughout this thread. Didnt ask if this is perhaps the way it is supposed to work or what the intent was. It has to be a bug and by golly you are fixing it. Put yourself in my shoes. I'll just duplicate the code for my purpose, so much easier. Well, I am sorry you feel that way. I dont even remember where it is that we started. cheers, jamal - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
Thomas Graf wrote: * Thomas Graf [EMAIL PROTECTED] 2006-06-30 18:32 Anyways, I give up. Last time I've been running after you trying to fix the many bugs you leave behind. Ever noticed that whenever you add some new code it's someone else following up with tons of small bugfix patches having a hard time trying to figure out the actual intent. I'll just duplicate the code for my purpose, so much easier. There you go, leaves ifb broken as-is, at least prevents it from crashing randomly when the input_dev disappears. That would be a pity. After this patch has been ACKed, could we start over with the other bugs one at a time? I wasn't really able to gather the problems from this thread, but some of the behaviour you mentioned does seem questionable. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
Thomas Graf wrote: * Thomas Graf [EMAIL PROTECTED] 2006-06-30 18:32 Anyways, I give up. Last time I've been running after you trying to fix the many bugs you leave behind. Ever noticed that whenever you add some new code it's someone else following up with tons of small bugfix patches having a hard time trying to figure out the actual intent. I'll just duplicate the code for my purpose, so much easier. There you go, leaves ifb broken as-is, at least prevents it from crashing randomly when the input_dev disappears. [NET]: Use interface index to keep input device information Using the interface index instead of a direct reference allows a safe usage beyond the scope where an interface could disappear. The old input_dev field was incorrectly made dependant on CONFIG_NET_CLS_ACT in skb_copy(). Signed-off-by: Thomas Graf [EMAIL PROTECTED] === --- net-2.6.git.orig/drivers/net/ifb.c +++ net-2.6.git/drivers/net/ifb.c @@ -158,19 +158,23 @@ static int ifb_xmit(struct sk_buff *skb, stats-tx_packets++; stats-tx_bytes+=skb-len; - if (!from || !skb-input_dev) { + if (!from || !skb-iif) { Since iif of 0 is valid (afaik), this check is now bogus, eh? Ben -- Ben Greear [EMAIL PROTECTED] Candela Technologies Inc http://www.candelatech.com - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
On Fri, 2006-30-06 at 19:18 +0200, Patrick McHardy wrote: Thomas Graf wrote: * Thomas Graf [EMAIL PROTECTED] 2006-06-30 18:32 Anyways, I give up. Last time I've been running after you trying to fix the many bugs you leave behind. Ever noticed that whenever you add some new code it's someone else following up with tons of small bugfix patches having a hard time trying to figure out the actual intent. I'll just duplicate the code for my purpose, so much easier. There you go, leaves ifb broken as-is, at least prevents it from crashing randomly when the input_dev disappears. That would be a pity. After this patch has been ACKed, could we start over with the other bugs one at a time? I wasn't really able to gather the problems from this thread, but some of the behaviour you mentioned does seem questionable. I will summarize what the outstanding issues are, the rest of the bugs just ignore otherwise the discussion is a waste of time and may get out of control. 1) ifb references skb-input_dev 2) mirred sets the skb-input_dev which is used in #1 It is possible that when #1 happens infact input_dev is gone because no ref count is incremented. Ok, Thomas is that sufficient to discuss the crux of the matter? At one point many months ago, the logic was for now the likelihood this will happen is low but we need to cover for by at least figuring the existence of input_dev when referencing it. Thomas makes the claim, this can be achieved only by using an ifindex. And i havent been able to see how. I have a small performance problem if i just use ifindex. Using ifindex will eventually save 32 bits on the 64 bit machines. I posed the question as to which was more beneficial as a solution that hasnt been addressed. cheers, jamal - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
* jamal [EMAIL PROTECTED] 2006-06-30 13:19 On Fri, 2006-30-06 at 18:32 +0200, Thomas Graf wrote: * jamal [EMAIL PROTECTED] 2006-06-30 10:35 Did you actually try to run this before you reached this conclusion? I did, fortunately some other bug prevents this from happening, packets are simply dropped somewhere. It is not a bug, Thomas! I am getting a little frustrated now. The packets will be dropped because we set the at field to zero which is invalid. That is done on purpose. It is only meaningful for ifb. The challenge is much bigger than it appears. You could end up deadlocking on the tx lock. So this was the choice i had to make. Please explain, tc_verd is reset in the tasklet after dequeueing and set again in dev_queue_xmit(). ifb_xmit will always see a valid tc_verd at egress. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
jamal wrote: On Fri, 2006-30-06 at 19:18 +0200, Patrick McHardy wrote: Thomas Graf wrote: * Thomas Graf [EMAIL PROTECTED] 2006-06-30 18:32 Anyways, I give up. Last time I've been running after you trying to fix the many bugs you leave behind. Ever noticed that whenever you add some new code it's someone else following up with tons of small bugfix patches having a hard time trying to figure out the actual intent. I'll just duplicate the code for my purpose, so much easier. There you go, leaves ifb broken as-is, at least prevents it from crashing randomly when the input_dev disappears. That would be a pity. After this patch has been ACKed, could we start over with the other bugs one at a time? I wasn't really able to gather the problems from this thread, but some of the behaviour you mentioned does seem questionable. I will summarize what the outstanding issues are, the rest of the bugs just ignore otherwise the discussion is a waste of time and may get out of control. 1) ifb references skb-input_dev That should be fixed by this patch. 2) mirred sets the skb-input_dev which is used in #1 I think Thomas was more complaining about the values it is set to and the fact that only a single redirection is possible for each packet, no? It is possible that when #1 happens infact input_dev is gone because no ref count is incremented. Ok, Thomas is that sufficient to discuss the crux of the matter? At one point many months ago, the logic was for now the likelihood this will happen is low but we need to cover for by at least figuring the existence of input_dev when referencing it. Thomas makes the claim, this can be achieved only by using an ifindex. And i havent been able to see how. I have a small performance problem if i just use ifindex. Using ifindex will eventually save 32 bits on the 64 bit machines. I posed the question as to which was more beneficial as a solution that hasnt been addressed. Are we still talking about this? Its easy: a pointer without taking a reference can become stale, with ifindex you do the lookup right when using it, so you either get an result or you don't. It also saves atomic operations for anyone _not_ using it, which is that vast majority of users. So the patch clearly makes sense to me. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
On Fri, 2006-30-06 at 19:42 +0200, Thomas Graf wrote: * jamal [EMAIL PROTECTED] 2006-06-30 13:32 Thomas makes the claim, this can be achieved only by using an ifindex. And i havent been able to see how. I have a small performance problem if i just use ifindex. Using ifindex will eventually save 32 bits on the 64 bit machines. I posed the question as to which was more beneficial as a solution that hasnt been addressed. I'd appreciate if you'd stop sperading lies. I claimed it to be the best solution, not the only one. Everyone agrees that it's possible to use a pointer and take a reference in netif_receive_skb() while it also seems obvious to most that it's not a good idea to take two additional atomic operations in the fast path for this purpose. I thought we went past that point already - and i made it clear that the reference is _not_ taken in netif_receive_skb(). So assuming it is taken in mirred (i havent thought of where it is decremented), why would using the ifindex be better? cheers, jamal - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
On Fri, 2006-30-06 at 19:42 +0200, Patrick McHardy wrote: jamal wrote: Thomas makes the claim, this can be achieved only by using an ifindex. And i havent been able to see how. I have a small performance problem if i just use ifindex. Using ifindex will eventually save 32 bits on the 64 bit machines. I posed the question as to which was more beneficial as a solution that hasnt been addressed. Are we still talking about this? Its easy: a pointer without taking a reference can become stale, I should have been clear: reference gets taken in mirred. with ifindex you do the lookup right when using it, so you either get an result or you don't. My arguement was: dev get by index per device will involve a a) lookup + b) incrementing the refcount. if i use the dev pointer in that path then it becomes only #b in both cases, you need to increment the counter and then somewhere decrement it. cheers, jamal It also saves atomic operations for anyone _not_ using it, which is that vast majority of users. So the patch clearly makes sense to me. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
On Fri, 2006-30-06 at 19:33 +0200, Patrick McHardy wrote: jamal wrote: You are attempting to change architecture (which works just fine) in the way you think it should work - and then point to something as a bug because it doesnt work the way you think it should work. This is a problem not just with you BTW, but with Patrick as well (although he has gotten better lately). There is a huge difference for example when dealing with Herbert. My approach in situations like this, which you dont have to follow, is to ask first what the intent was then if i dont like the intent try to convince the owner that there maybe better ways. Or why they are wrong. Well, I thought I stay out of this, but since you mention me .. I think you will add value to the discussion ;- Regardless, we need to settle these kind of issues so we can work better together. A while back i said was going to bring some of these issues at the next netconf - but since the discussion is happening already, lets do it here. Maybe we should take this discussion privately? I also had the feeling it has gotten easier working with you lately, The feeling is mutual. but I can understand Thomas's pain, I had the same thoughts more than once. Your code often does have an enormous amount of bugs and whitespace and other stylistic problems and working with you can be challenging for multiple reasons, for example having to go to endless, One thing is clear in my mind at least (and i have said it several times): I am not as good at the semantics as either yourself or Thomas or Dave or Acme etc but i have tons of other things that compensate for. Probably not as good is not the best description - rather my brain cells put less respect on the stylistic commas than they do on the message. Perhaps it's dylexsia or me trying to subconsciouly maximize my time. If you push me hard, however, i will get close to do just as a good job as you;- Still, cant live without you guys! I would describe the majority of the fixes you have sent against me to fall into the stylistic category and into fixing TheLinuxWay (i.e same bug in multiple places and files). This is not to say i am not at fault, or there havent been issues which made me wonder, but we may have different metrics of what bugs i would call my mom and say Mom, I just fixed enormous amount of bugs. [As an example if you went hunting around the kernel, you will find a lot of things that dont totally conform to lindent rules. You could literally send 100s of patches]. Perhaps it is the language usage that puts us at odds at times. often entirely pointless discussions for obvious fixes, You may see it that way - I dont and so when that happens it is not by any means to engage in a meaningless debate. The discussions are typically of the same nature as i just had with Thomas (just a simple one line change in ifb which looked very obvious). The way it goes is that there is a certain assumption made, and a solution is suggested in the form of a patch. Accepting such a patch (which i once in a while dont even get to see until it is in) means a lot of the other assumptions get invalidated and is followed by tons of bug fixes. This has happened a few times to me. And once or twice i have bitched because i ended having to support some user for days with the wrong view. especially if you're already pissed by noticing 50 bugs at once. I dont get agitated by 100 patches which fix stylistic issues or real bugs - although i would have preferred to see the non-obvious first before they go in just in case or at least CCed on submission. Where i get irritated, depending on how my day is going: when i see even a single patch laded with implicit insults. It comes out to me as unnecessary arrogance and immaturity. It may not be intentional use of language, but thats how it comes out. The reason why you might be able to work better with Herbert is IMO that he usually doesn't touch what you seem to feel is your area. I think it's the maturity approach perhaps. With Herbert, it ends up being about solving the problem; there are always moments of tension and what may appear as endless discussion (look at the thread on qdisc_is_running), but in the end it doesnt turn out into a high school debate to prove one is better than the other. So maybe it is the mutual respect and maturity in the discussion. In regards to areas, you may be right; i cross more into Herberts path than he does mine. My reaction in this may be related to the way i treat other people and my expectations: When i submit patches to i would make sure i cc people who i think have stakes or better insight in that area even if they seem irrelevant. In-fact i may have even private conversations with them first if they are large patches. As an example i would make sure i cc you if i had some netfilter issues or Herbert when i submit ipsec patches and we would have a gazillion discussions which would
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
* jamal [EMAIL PROTECTED] 2006-06-30 15:34 I thought we went past that point already - and i made it clear that the reference is _not_ taken in netif_receive_skb(). So assuming it is taken in mirred (i havent thought of where it is decremented), why would using the ifindex be better? The issue exists regardless of mirred/ifb. As soon as the packet is queued for the first time we leave netif_receive_skb() and the dev reference is dropped. Therefore in order to allow functionality like tcf_match_indev() at egress we have to either take a reference or ensure that we can catch the unlikely case of the device having disappeared. I think everyone would agree to use device pointers if only mirred would acquire it. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
From: jamal [EMAIL PROTECTED] Date: Fri, 30 Jun 2006 15:40:26 -0400 My arguement was: dev get by index per device will involve a a) lookup + b) incrementing the refcount. if i use the dev pointer in that path then it becomes only #b in both cases, you need to increment the counter and then somewhere decrement it. The position is that -input_dev usage is not fast path. It may be a fast path in your mirred and ifb devices, but for the rest of the networking and %99 of users it is totally unused. We have a lot of bits of state that sit in the sk_buff but which are used by a tiny minority, yet the space consumption is eaten by everyone. Maybe when the IFB and mirred are in more widespread use we can investigate a different approach. But at this time any change that makes fringe sk_buff state objects shrink or disappear will be seriously considered. Another part of this issue is that if you use a pointer there is no clean place to clean up the input_dev reference within the scope it is actually used. The only reliable place is kfree_skb() which is far beyond the scope that this -input_dev thing is used. We have a lot of problems in some parts of the networking because object references are held for too long. One example is all the awful side effects of the way the bridge netfilter code holds onto object references for long periods of time in the netfilter call chains. And we know the refcounting is broken. And one way we know to fix the refcounting without incurring the cost upon everyone is to use an interface index and only make the input_dev users eat the atomic operation incurred by grabbing the reference. And even for the input_dev users, it isn't such a big deal, there are other costs already associated with hitting these actions and devices that use input_dev, the atomic reference grab there should be lost in the noise I think. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
* jamal [EMAIL PROTECTED] 2006-06-30 15:59 One thing is clear in my mind at least (and i have said it several times): I am not as good at the semantics as either yourself or Thomas or Dave or Acme etc but i have tons of other things that compensate for. Probably not as good is not the best description - rather my brain cells put less respect on the stylistic commas than they do on the message. Perhaps it's dylexsia or me trying to subconsciouly maximize my time. If you push me hard, however, i will get close to do just as a good job as you;- Still, cant live without you guys! Lack of stylistic commas is less of a concern, rather the lack of comments and notes where absolutely essential. You explained to me that stacking ifb devices is not supported in order to avoid tx lock deadlocks. There is not a single note or hint anywhere that would remotely imply this. When questioned about things like that you often refer to notes you have somewhere on paper, promising to get back once looked up, eventually that happens, in this thread it didn't. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
From: Thomas Graf [EMAIL PROTECTED] Date: Fri, 30 Jun 2006 22:30:34 +0200 Lack of stylistic commas is less of a concern, rather the lack of comments and notes where absolutely essential. Yes, intent is very important to document when it is not obvious. Everyone is guilty of not doing this from time to time. :) - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
On Fri, 2006-30-06 at 22:08 +0200, Thomas Graf wrote: * jamal [EMAIL PROTECTED] 2006-06-30 15:34 So assuming it is taken in mirred (i havent thought of where it is decremented), why would using the ifindex be better? The issue exists regardless of mirred/ifb. As soon as the packet is queued for the first time we leave netif_receive_skb() and the dev reference is dropped. Therefore in order to allow functionality like tcf_match_indev() at egress we have to either take a reference or ensure that we can catch the unlikely case of the device having disappeared. I think everyone would agree to use device pointers if only mirred would acquire it. Thomas, this is certainly more reasonable explanation. On Fri, 2006-30-06 at 13:17 -0700, David Miller wrote: We have a lot of bits of state that sit in the sk_buff but which are used by a tiny minority, yet the space consumption is eaten by everyone. Maybe when the IFB and mirred are in more widespread use we can investigate a different approach. Not unreasonable. So saving the bits for the majority trumps the need for slightly more cycles in mirred/ifb until they get more widespread. Another part of this issue is that if you use a pointer there is no clean place to clean up the input_dev reference within the scope it is actually used. The only reliable place is kfree_skb() which is far beyond the scope that this -input_dev thing is used. And i cant think of a good place to do it. We have a lot of problems in some parts of the networking because object references are held for too long. One example is all the awful side effects of the way the bridge netfilter code holds onto object references for long periods of time in the netfilter call chains. And we know the refcounting is broken. And one way we know to fix the refcounting without incurring the cost upon everyone is to use an interface index and only make the input_dev users eat the atomic operation incurred by grabbing the reference. And even for the input_dev users, it isn't such a big deal, there are other costs already associated with hitting these actions and devices that use input_dev, the atomic reference grab there should be lost in the noise I think. Alright guys - lets get the change in. I wish we had this specific discussion sooner. Thomas, can you post your latest incarnation so i can mow comment more peacefully? ;- cheers, jamal - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
On Fri, 2006-30-06 at 13:33 -0700, David Miller wrote: From: Thomas Graf [EMAIL PROTECTED] Date: Fri, 30 Jun 2006 22:30:34 +0200 Lack of stylistic commas is less of a concern, rather the lack of comments and notes where absolutely essential. Yes, intent is very important to document when it is not obvious. I agree - and i try hard to document but at times there's too much and a line needs to be drawn. As an example: the eth-ifb-ifb case though is a very corner case. All the IMQ types need to only redirect to one ifb; while i test it ive always seen the test as more of a boundary check than a useful setup. And Thomas: I do keep notebooks ;- I doodle about everything, they dont all fit in my knapsack anymore so sometimes when i say i will go and refer to my notes, it is real. cheers, jamal - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
* jamal [EMAIL PROTECTED] 2006-06-30 16:54 I agree - and i try hard to document but at times there's too much and a line needs to be drawn. As an example: the eth-ifb-ifb case though is a very corner case. All the IMQ types need to only redirect to one ifb; while i test it ive always seen the test as more of a boundary check than a useful setup. Maybe you should just start by explaining why it doesn't work, your original explanation doesn't make much sense. And Thomas: I do keep notebooks ;- I doodle about everything, they dont all fit in my knapsack anymore so sometimes when i say i will go and refer to my notes, it is real. It's plain arrogant to rely code understanding on information that is not available to others. It makes everyone being dependant on you, very nice. To put it in another way, if you would do your job right when writing the code, contacting or CCing you upon changes would be only of a formal matter since the code is understandable and the intent is clear or documented. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
Ok, I think found the last patch you posted, comments below (I have to run off soon): On Fri, 2006-30-06 at 19:13 +0200, Thomas Graf wrote: There you go, leaves ifb broken as-is, at least prevents it from crashing randomly when the input_dev disappears. I hope the above comment does show up in the logs ;- [NET]: Use interface index to keep input device information === --- net-2.6.git.orig/include/net/pkt_cls.h +++ net-2.6.git/include/net/pkt_cls.h @@ -352,14 +352,19 @@ tcf_change_indev(struct tcf_proto *tp, c static inline int tcf_match_indev(struct sk_buff *skb, char *indev) { + int ret = 1; + if (indev[0]) { - if (!skb-input_dev) - return 0; - if (strcmp(indev, skb-input_dev-name)) + struct net_device *dev; + + dev = dev_get_by_index(skb-iif); + if (!dev) return 0; + ret = !strcmp(indev, dev-name); + dev_put(dev); } [..] Index: net-2.6.git/drivers/net/ifb.c === --- net-2.6.git.orig/drivers/net/ifb.c +++ net-2.6.git/drivers/net/ifb.c @@ -158,19 +158,23 @@ static int ifb_xmit(struct sk_buff *skb, stats-tx_packets++; stats-tx_bytes+=skb-len; - if (!from || !skb-input_dev) { + if (!from || !skb-iif) { Can you have something similar to what you did in tcf_match_indev above? i.e grab dev by skb-iif so as to increment refcount - if it doesnt exist it becomes equivalent to !skb-input_dev and if it exists you drop the ref inside the else below after changing input_dev dev_kfree_skb(skb); stats-rx_dropped++; return ret; } else { + struct net_device *iif; /* * note we could be going * ingress - egress or * egress - ingress */ - skb-dev = skb-input_dev; - skb-input_dev = dev; + iif = __dev_get_by_index(skb-iif); + if (!iif) + goto dropped; + skb-dev = iif; + skb-iif = dev-ifindex; if (from AT_INGRESS) { skb_pull(skb, skb-dev-hard_header_len); } else { - cheers, jamal - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
* jamal [EMAIL PROTECTED] 2006-06-30 17:09 On Fri, 2006-30-06 at 19:13 +0200, Thomas Graf wrote: Index: net-2.6.git/drivers/net/ifb.c === --- net-2.6.git.orig/drivers/net/ifb.c +++ net-2.6.git/drivers/net/ifb.c @@ -158,19 +158,23 @@ static int ifb_xmit(struct sk_buff *skb, stats-tx_packets++; stats-tx_bytes+=skb-len; - if (!from || !skb-input_dev) { + if (!from || !skb-iif) { Can you have something similar to what you did in tcf_match_indev above? i.e grab dev by skb-iif so as to increment refcount - if it doesnt exist it becomes equivalent to !skb-input_dev and if it exists you drop the ref inside the else below after changing input_dev A non-existant iif is already equivalent to !iif since it jumps into the same branch. Grabing a reference is completely pointless, the netdevice represented by skb-iif is at this point until the packet gets queued covered by a reference taken in netif_rx(). - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
On Fri, 2006-30-06 at 23:10 +0200, Thomas Graf wrote: * jamal [EMAIL PROTECTED] 2006-06-30 16:54 I agree - and i try hard to document but at times there's too much and a line needs to be drawn. As an example: the eth-ifb-ifb case though is a very corner case. All the IMQ types need to only redirect to one ifb; while i test it ive always seen the test as more of a boundary check than a useful setup. Maybe you should just start by explaining why it doesn't work, your original explanation doesn't make much sense. Better to explain the reason for ifb first: ifb exists initially as a replacement for IMQ. 1) qdiscs/policies that are per device as opposed to system wide. This now allows for sharing. 2) Allows for queueing incoming traffic for shaping instead of dropping. In other wise, the main use is for multiple devices to redirect to it. Main desire is not for it to redirect to any other ifb device or eth devices. I actually tried to get it to do that, but run into issues of complexity and and came up with decision to drop instead of killing the machine. Other than that, it can redirect to any other devices - but may still not be meaningful. I have been thinking of Herberts change of qdisc_is_running and this may help actually. And Thomas: I do keep notebooks ;- I doodle about everything, they dont all fit in my knapsack anymore so sometimes when i say i will go and refer to my notes, it is real. It's plain arrogant to rely code understanding on information that is not available to others. It makes everyone being dependant on you, very nice. I try hard to document more than many many people. I give tutorials, I write papers when time allows, I spend time responding to emails. And i do it all for the love of the game. But lets look at the other side of the coin: Dont you think it is polite to ask when something is not clear instead of making assumptions? To put it in another way, if you would do your job right when writing the code, contacting or CCing you upon changes would be only of a formal matter since the code is understandable and the intent is clear or documented. I wish I could achieve that. I dont think theres any piece of code that reaches those standards in the kernel actually. And if theres a piece of code that achieves that it should be emulated. Most of the times, things that are hard are not about the code but about issues like algorithmics, embedded policies etc (I think you are saying the same). cheers, jamal - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
This is boring, I reversed everything to not change any semantics and you still complain. * jamal [EMAIL PROTECTED] 2006-06-30 17:35 On Fri, 2006-30-06 at 23:20 +0200, Thomas Graf wrote: * jamal [EMAIL PROTECTED] 2006-06-30 17:09 the ref inside the else below after changing input_dev A non-existant iif is already equivalent to !iif since it jumps into the same branch. I thought 0 was a valid iif as Ben G was pointing - if it is not, you are right. No, 1 ist the first valid ifindex, see dev_new_index(). Grabing a reference is completely pointless, the netdevice represented by skb-iif is at this point until the packet gets queued covered by a reference taken in netif_rx(). You have to consider that this could happen at both ingress and egress. Just think for a second, do you expect the device the packet will be leaving at to disappear? - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
* jamal [EMAIL PROTECTED] 2006-06-30 17:31 Better to explain the reason for ifb first: ifb exists initially as a replacement for IMQ. 1) qdiscs/policies that are per device as opposed to system wide. This now allows for sharing. 2) Allows for queueing incoming traffic for shaping instead of dropping. In other wise, the main use is for multiple devices to redirect to it. Main desire is not for it to redirect to any other ifb device or eth devices. I actually tried to get it to do that, but run into issues of complexity and and came up with decision to drop instead of killing the machine. Other than that, it can redirect to any other devices - but may still not be meaningful. Last time I'm asking. Why are packets dropped? You mentioned tc_verd is set to 0 leading to a invalid from verdict. Fact is that the from verdict is set to a meaningful value again at dev_queue_xmit() or ing_filter() so ifb_xmit() only sees valid values. tx locks of individual ifb devices are independant, why would it deadlock? Where is the packet exactly dropped? I have been thinking of Herberts change of qdisc_is_running and this may help actually. Help on what? - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
On Sat, 2006-01-07 at 01:22 +0200, Thomas Graf wrote: This is boring, I reversed everything to not change any semantics and you still complain. You reversed a bug that you had introduced. Do you want me to review this patch or not now? Be a little reasonable please. * jamal [EMAIL PROTECTED] 2006-06-30 17:35 Grabing a reference is completely pointless, the netdevice represented by skb-iif is at this point until the packet gets queued covered by a reference taken in netif_rx(). You have to consider that this could happen at both ingress and egress. Just think for a second, do you expect the device the packet will be leaving at to disappear? I am not certain i understood then: Are we in the mode where the refcount is not needed because chances are small that a device will disappear? It seems to me after all this trouble that it may not be so bad to refcount (I guess i meant refcount the device on input to the ifb and decrement on the output). As a note - and i am sure you know this: The packet comes to the ifb and gets queued. It gets dequeued at a later time and is sent off either to the ingress or egress. At any point during the enq/deq the other device being referenced could disappear. This is a lesser sin on ingress than it is on egress. cheers, jamal - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
jamal wrote: On Fri, 2006-30-06 at 19:33 +0200, Patrick McHardy wrote: Well, I thought I stay out of this, but since you mention me .. I think you will add value to the discussion ;- Regardless, we need to settle these kind of issues so we can work better together. A while back i said was going to bring some of these issues at the next netconf - but since the discussion is happening already, lets do it here. Maybe we should take this discussion privately? Whatever helps resolving this, feel free to take this private if you feel like it. I think I already managed to get along with our differences quite well, but it was a hard path. Having friends that use the same discussion tactics may have helped :) Just to clarify what I'm saying below, I like you a lot personally, this is purely a work thing. I also had the feeling it has gotten easier working with you lately, The feeling is mutual. but I can understand Thomas's pain, I had the same thoughts more than once. Your code often does have an enormous amount of bugs and whitespace and other stylistic problems and working with you can be challenging for multiple reasons, for example having to go to endless, One thing is clear in my mind at least (and i have said it several times): I am not as good at the semantics as either yourself or Thomas or Dave or Acme etc but i have tons of other things that compensate for. Probably not as good is not the best description - rather my brain cells put less respect on the stylistic commas than they do on the message. Perhaps it's dylexsia or me trying to subconsciouly maximize my time. If you push me hard, however, i will get close to do just as a good job as you;- Still, cant live without you guys! I think the things you did and which I struggeled with were right to do, but what I don't understand is why you can't do it right yourself. Software development has a lot of unpleasant parts, but ignoring them can't be the right answer. In a shared project, you should always remember that if you're not doing the work, some else will have to. I would describe the majority of the fixes you have sent against me to fall into the stylistic category and into fixing TheLinuxWay (i.e same bug in multiple places and files). This is not to say i am not at fault, or there havent been issues which made me wonder, but we may have different metrics of what bugs i would call my mom and say Mom, I just fixed enormous amount of bugs. [As an example if you went hunting around the kernel, you will find a lot of things that dont totally conform to lindent rules. You could literally send 100s of patches]. Perhaps it is the language usage that puts us at odds at times. Sorry, absolutely not. The bugs I fixed were not just stylistic things, although I still tried to take care of them where possible. It was a huge number of bad bugs, even affecting kernels that didn't even used the feature which introduced the bugs. As for TheLinuxWay (I think I made my opinion on this clear already), cut-and-paste of crappy code is not it. I'm always amazed how people can copy crappy code without even thinking about it, when I copy code I usually fix bugs in the code I copy _before_ doing so. Language might of course still contribute to our misunderstandings. often entirely pointless discussions for obvious fixes, You may see it that way - I dont and so when that happens it is not by any means to engage in a meaningless debate. The discussions are typically of the same nature as i just had with Thomas (just a simple one line change in ifb which looked very obvious). Sometimes you may be right, but despite all the lengthy discussions we went through, I can't recall a single patch of mine that didn't went in eventually. But OK, people are complaining about missing clarification of intent, we're all sometimes guilty of that. Still, starting a huge discussion about patches that clearly move in the right direction like Thomas' one which started this thread is frustrating to the submitter. Discussions about technical matters should use technical, comprehensible arguments, and nothing else. This basically means on of three things: - Bad because it does .. - Bad because this patch is better - Applied Vague expressions about not good, might be able to do better or might create problems without specification are not helpful as long as they are vague. The point is: _show_ that the submitters argument is wrong, and if it really is, that shouldn't be hard. This is especially true if its not some random guy but someone familiar with what he is changing. This thread is one of many examples. The way it goes is that there is a certain assumption made, and a solution is suggested in the form of a patch. Accepting such a patch (which i once in a while dont even get to see until it is in) means a lot of the other assumptions get invalidated and is followed by tons of bug fixes. This has happened a few times to me.
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
On Sat, 2006-01-07 at 01:45 +0200, Thomas Graf wrote: * jamal [EMAIL PROTECTED] 2006-06-30 17:31 Better to explain the reason for ifb first: ifb exists initially as a replacement for IMQ. 1) qdiscs/policies that are per device as opposed to system wide. This now allows for sharing. 2) Allows for queueing incoming traffic for shaping instead of dropping. In other wise, the main use is for multiple devices to redirect to it. Main desire is not for it to redirect to any other ifb device or eth devices. I actually tried to get it to do that, but run into issues of complexity and and came up with decision to drop instead of killing the machine. Other than that, it can redirect to any other devices - but may still not be meaningful. Last time I'm asking. Then please listen carefully - because if you read my message with a reasonable openness the answer is there. Why are packets dropped? When looping happens the only sane thing to do is to drop packets. I mentioned this was a very tricky thing to achieve in all cases since there are many behaviors and netdevices that have to be considered. As an example i ended not submitting the code for looping from egress to ingress because i felt it needed more testing. And i am certain i have missed some other device combinations. Loops could happen within ingress or egress. They could mostly happen because of mirred or ifb. And some i have discovered via testing. You mentioned tc_verd is set to 0 leading to a invalid from verdict. yes, for ifb. Fact is that the from verdict is set to a meaningful value again at dev_queue_xmit() or ing_filter() so ifb_xmit() only sees valid values. Ok, Thomas this is one of those places where we end up colliding for no good reason - your statement above is accusatory. And sometimes i say 3 things and you pick one and use that as context. The ifb does clear the field. So if you get redirected again, it will be 0. tx locks of individual ifb devices are independant, why would it deadlock? different instances of the same device do not deadlock. If however, you have a series of devices in which A-B-C-D-A then you will have a possible deadlock. In the case of the ifb i dissallowed it because it was obvious from the testing. Where is the packet exactly dropped? For the above in the ifb when they come in with from=0. I have been thinking of Herberts change of qdisc_is_running and this may help actually. Help on what? In the case of deadlocks. Now that it is impossible to contend for the txlock twice (with that patch), a second redirect will end up just enqueueing. cheers, jamal - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
* jamal [EMAIL PROTECTED] 2006-06-28 09:46 Why not use iflink? It exists, by definition, specifically as the ifindex of the down muxed netdevice (should probably add that definition in the .h). This is semantically different from a message to the stack which says this came to you from input device X (represented by input_dev). I disagree, iflink information is bogus once we start redirecting. What's special about that? It seems a perfectly fine and simple way to create namespaces for sockets without enforcing applications to bind to the correct vlan devices directly. Can you achieve the same by binding to any arbitrary netdevice? If you can then i would agree. It was your request to not update input_dev in netif_receive_skb() but rather have each netdevice handle it manually. What's currently done in ifb: skb-dev = skb-input_dev; skb-input_dev = dev; Confusing, isn't it? I really don't get input_dev/iif is supposed to be updated in ifb. If a packet is to be redirected, the mirred action shall take care of providing this information to the next netdevice. My additional request is to provide a flag to disable this for special purposes not hurting anyone. Please refer to what i said above in that you could end up redirecting many times. I have not added yet the code which allows mirred to also do ingress redirection which would make it even more interesting. The challenge is this, and if you can solve it we would be fine: - I need to access both the input_dev and dev and their metadata. I could clearly find them by their ifindices and then reference them after that. For performance reasons that is not optimal in the fast path. Nice, you forget that while redirecting the device pointed to by input_dev can disappear leaving behind an illegal reference. The purpose of this patch is to fix this. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
I noticed the other email carrying a patch. Let me respond to this email and hopefully that will address the patch. On Thu, 2006-29-06 at 10:51 +0200, Thomas Graf wrote: * jamal [EMAIL PROTECTED] 2006-06-28 09:46 [..] I disagree, iflink information is bogus once we start redirecting. After redirecting, iflink is available i.e it doesnt make any more sense and therefore it should be (ab)usable to carry other info. But the more i think about it, using skb-dev is just fine for finding the device you are looking for even when it gets redirected. I am assuming you still want to find out - in the case of a topology which constitutes more than one netdevice - the second last netdevice, correct? Can you achieve the same by binding to any arbitrary netdevice? If you can then i would agree. It was your request to not update input_dev in netif_receive_skb() but rather have each netdevice handle it manually. For good reasons of course. note: you didnt answer the question i asked, but probably the answer doesnt matter. What's currently done in ifb: skb-dev = skb-input_dev; skb-input_dev = dev; Confusing, isn't it? not at all. Let me explain the design intent further below. I really don't get input_dev/iif is supposed to be updated in ifb. If a packet is to be redirected, the mirred action shall take care of providing this information to the next netdevice. ifb is a special purpose device. Think of it as a redirector, or injector if you want a better noun. If you are going to inject packets back to the stack at arbitrary points, then you need to reflect that in the packet metadata implying where it came from. Likewise if you are going to redirect packets into arbitrary points in the stack (as mirred does), you need to do the same. I dont know if that makes sense. My additional request is to provide a flag to disable this for special purposes not hurting anyone. I am failing to see the purpose. With all due respect, you are changing the design intent, Thomas. I know your intent is noble in trying to save the 32 bits on 64 bit machines (at least thats where your patch seems to have started) but the cost:benefit ratio as i have pointed out is unreasonable. If you can meet the requirements i have specified (I am leaving them below) i will be a happy camper. The challenge is this, and if you can solve it we would be fine: - I need to access both the input_dev and dev and their metadata. I could clearly find them by their ifindices and then reference them after that. For performance reasons that is not optimal in the fast path. So just we are clear: I have strong desire to save compute rather than than saving 32 bits per skb on 64 bit machine. Nice, you forget that while redirecting the device pointed to by input_dev can disappear leaving behind an illegal reference. We have had this discussion before - that is resolvable via dev_put/hold. The purpose of this patch is to fix this. Appreciated - but can you do it without replacing the input_dev? cheers, jamal - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
* jamal [EMAIL PROTECTED] 2006-06-29 19:23 What's currently done in ifb: skb-dev = skb-input_dev; skb-input_dev = dev; Confusing, isn't it? not at all. Let me explain the design intent further below. Then let me show you what your code does: [mirred attached to filter on eth0 redirecting to ifb0] tcf_mirred():skb2-input_dev = skb-dev; (skb2-input_dev=eth0) ifb_xmit(): skb-dev = skb-input_dev; (skb-dev=eth0) skb-input_dev = dev; (skb-input_dev=ifb0) So when reentering the stack the skb looks like it would be on eth0 coming from ifb0. Is that what you wanted? Shouldn't it be skb-dev=ifb0 skb-input_dev=eth0? That's what my patch would change it to. I know your intent is noble in trying to save the 32 bits on 64 bit machines (at least thats where your patch seems to have started) but the cost:benefit ratio as i have pointed out is unreasonable. Did I ever claim this? You made this up right now. As of now it doesn't save a single bit. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
From: Thomas Graf [EMAIL PROTECTED] Date: Fri, 30 Jun 2006 01:39:33 +0200 * jamal [EMAIL PROTECTED] 2006-06-29 19:23 I know your intent is noble in trying to save the 32 bits on 64 bit machines (at least thats where your patch seems to have started) but the cost:benefit ratio as i have pointed out is unreasonable. Did I ever claim this? You made this up right now. As of now it doesn't save a single bit. Right. From what I can see Thomas's work allows to do correct reference counting on the input_dev, something which is not possible right now and is a bug. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
On Fri, 2006-30-06 at 01:39 +0200, Thomas Graf wrote: * jamal [EMAIL PROTECTED] 2006-06-29 19:23 not at all. Let me explain the design intent further below. Then let me show you what your code does: [mirred attached to filter on eth0 redirecting to ifb0] tcf_mirred():skb2-input_dev = skb-dev; (skb2-input_dev=eth0) ifb_xmit(): skb-dev = skb-input_dev; (skb-dev=eth0) skb-input_dev = dev; (skb-input_dev=ifb0) So when reentering the stack the skb looks like it would be on eth0 coming from ifb0. Is that what you wanted? ok, that looks like egress side of the stack, correct? indeed thats what i meant earlier i.e above looks right. Another way to look at it, is that on the stack, the dev part acts as a to label of the direction and the input_dev maps to the from. step 0: Packet arriving at egress of eth0 It will have skb-dev pointing to eth0 and skb-input_dev will depend on whether it is coming from the local path (in which it will be NULL) or it was being routed (in which case it will reflect the netdevice it last passed on input. Step 1: when it leaves redirect As you note it will have indev(from) = eth0 and dev(to) still ifb0 Step 2: when it leaves ifb going back to eth0 it will have input_dev(from) = ifb0 and dev(to) eth0 Of course this gets more interesting if you had say redirected to another device like lo which redirected to ifb1. Or when you try to create loops etc. And if you look at ingress, it will be slightly different; however, in both cases (ingress/egress) things are consistent in the labeling of from/to to be input_dev/dev Shouldn't it be skb-dev=ifb0 skb-input_dev=eth0? That's what my patch would change it to. yes, that would change the semantics I know your intent is noble in trying to save the 32 bits on 64 bit machines (at least thats where your patch seems to have started) but the cost:benefit ratio as i have pointed out is unreasonable. Did I ever claim this? You made this up right now. As of now it doesn't save a single bit. Ok, I apologize - i assumed it has something to do with skb diet. cheers, jamal - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
From: jamal [EMAIL PROTECTED] Date: Thu, 29 Jun 2006 20:08:19 -0400 What am i missing? on 64bit machine, does it not save 32 bits to use an ifindex as opposed to the pointer? The objects around it are pointers, which are 64-bit, and thus the 32-bit object gets padded out to 64-bits in the layout of the struct so that the next pointer member can be properly aligned. It does not change the size of sk_buff at all. Yes, it is a bug, but: dev_hold/put dont work anymore? why do you need an ifindex instead? You sure you want to do that atomic operation on every single input packet, regardless of whether egresss operations are using it or not? We should put the cost of features at the actual users, and not impose it upon everyone. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
On Thu, 2006-29-06 at 17:12 -0700, David Miller wrote: From: jamal [EMAIL PROTECTED] Date: Thu, 29 Jun 2006 20:08:19 -0400 What am i missing? on 64bit machine, does it not save 32 bits to use an ifindex as opposed to the pointer? The objects around it are pointers, which are 64-bit, and thus the 32-bit object gets padded out to 64-bits in the layout of the struct so that the next pointer member can be properly aligned. It does not change the size of sk_buff at all. I see; i take it if things were moved around that may change? Yes, it is a bug, but: dev_hold/put dont work anymore? why do you need an ifindex instead? You sure you want to do that atomic operation on every single input packet, regardless of whether egress operations are using it or not? Can you avoid doing the refcount? Note Thomas is doing dev_get_by_index (which will do the atomic ref count). For me the choice is between having the iif and: - __get device from ifindex - reference dev-something vs getting the input_dev and - reference dev-something We should put the cost of features at the actual users, and not impose it upon everyone. I didnt quiet follow, the ref count seems only needed in the redirection, no? cheers, jamal - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
David Miller wrote: I'm saying that, we don't need the refcount, just setting the skb-input_index thing, unless someone actually cares about the input device. As long as the packet hits not paths that care about the SKB input device, no atomic refcounts are taken. It's just an integer sitting there in the SKB. Also, if this lookup is to be done multiple times per skb, we could do the lookup once and grab the ref at that time and store the pointer in the skb. If you wanted to be truly horible about it..could use a 64-bit input_index and copy the pointer there (and set a flag somewhere noting it is no longer an index but a pointer) so save the extra 64-bits. Ben - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html -- Ben Greear [EMAIL PROTECTED] Candela Technologies Inc http://www.candelatech.com - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
* jamal [EMAIL PROTECTED] 2006-06-29 20:03 On Fri, 2006-30-06 at 01:39 +0200, Thomas Graf wrote: * jamal [EMAIL PROTECTED] 2006-06-29 19:23 not at all. Let me explain the design intent further below. Then let me show you what your code does: [mirred attached to filter on eth0 redirecting to ifb0] tcf_mirred():skb2-input_dev = skb-dev; (skb2-input_dev=eth0) ifb_xmit(): skb-dev = skb-input_dev; (skb-dev=eth0) skb-input_dev = dev; (skb-input_dev=ifb0) So when reentering the stack the skb looks like it would be on eth0 coming from ifb0. Is that what you wanted? ok, that looks like egress side of the stack, correct? No, that's the ingress side leaving ifb again via netif_rx() skb-dev should represent the from/at= and not to=. For egress your code is correct although impossible to guess right due to total lack of a comment where it would make sense and naming that doesn't give any implications. When leaving ifb0 you want for... ... egress: skb-dev=to (eth0) skb-iif=from (ifb0) ... ingress: skb-dev=at (ifb0) skb-iif=from (eth0) So we move the update to the tasklet and set skb-dev to skb-iif before dev_queue_xmit() and skb-dev = dev (ifb) before calling netif_rx() Does that fullfil your requirements? - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
On Thu, 2006-29-06 at 17:29 -0700, David Miller wrote: From: jamal [EMAIL PROTECTED] Date: Thu, 29 Jun 2006 20:26:20 -0400 [..] I see; i take it if things were moved around that may change? Yes. Ok, relief - so i was not totally unreasonable then ;- Can you avoid doing the refcount? Note Thomas is doing dev_get_by_index (which will do the atomic ref count). He is doing that where skb-input_device is needed, which is what we want. I am saying the same thing as well - i think. mirred touches the input_dev and therefore setting the refcount in mirred is valid - but iam unsure where to unset it. I didnt quiet follow, the ref count seems only needed in the redirection, no? I'm saying that, we don't need the refcount, just setting the skb-input_index thing, unless someone actually cares about the input device. the ifb references it; only mirred redirects to the ifb at the moment. You would need to increment in mirred, no? Why do i feel i am missing something? ;- As long as the packet hits not paths that care about the SKB input device, no atomic refcounts are taken. It's just an integer sitting there in the SKB. indeed. I think whether it becomes ifindex or pointer you need to increment the refcounter. and decrement somewhere. The challenge for me is a choice to use more cycles if you use ifindex vs less cycles with a pointer. The advantage for going with ifindex would be to save those bits(if you rearrange). The question is which is reasonable?;- cheers, jamal - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
* jamal [EMAIL PROTECTED] 2006-06-29 20:48 the ifb references it; only mirred redirects to the ifb at the moment. You would need to increment in mirred, no? Why do i feel i am missing something? ;- The point is to avoid having an atomic operation for every packet when setting iif in netif_receive_skb(). If it was only for mirred nobody would complain I guess. I think whether it becomes ifindex or pointer you need to increment the refcounter. and decrement somewhere. The challenge for me is a choice to use more cycles if you use ifindex vs less cycles with a pointer. The advantage for going with ifindex would be to save those bits(if you rearrange). The question is which is reasonable?;- The third choice is to just don't care if the interface goes away but have a chance to figure it out and just assume as if it would have never been set. The number of devices that can disappear w/o user control is very very limited and not worth an atomic operation for every single packet. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
On Fri, 2006-30-06 at 02:46 +0200, Thomas Graf wrote: * jamal [EMAIL PROTECTED] 2006-06-29 20:03 On Fri, 2006-30-06 at 01:39 +0200, Thomas Graf wrote: * jamal [EMAIL PROTECTED] 2006-06-29 19:23 not at all. Let me explain the design intent further below. Then let me show you what your code does: [mirred attached to filter on eth0 redirecting to ifb0] tcf_mirred():skb2-input_dev = skb-dev; (skb2-input_dev=eth0) ifb_xmit(): skb-dev = skb-input_dev; (skb-dev=eth0) skb-input_dev = dev; (skb-input_dev=ifb0) So when reentering the stack the skb looks like it would be on eth0 coming from ifb0. Is that what you wanted? ok, that looks like egress side of the stack, correct? No, that's the ingress side leaving ifb again via netif_rx() skb-dev should represent the from/at= and not to=. Heres what it would look at ingress: step 0: coming from wire via eth0, dev=eth0, input_dev=eth0 step 1: redirect to ifb0, leaving redirect dev=ifb0, input_dev=eth0 step 2: leaving ifb0, coming back to ingress side of stack dev= eth0, input_dev=ifb0 Again, it gets interesting when you have redirection multiple times and create loops. It will get more interesting when i submit the code for redirecting to ingress. The good news is there is very strong consistency. I dont know if at is the correct description. I visualize it as hitting the stack just at the point. when the packet first comes in both from/to point to eth0. So the semantics are the same as in egress. For egress your code is correct although impossible to guess right due to total lack of a comment where it would make sense and naming that doesn't give any implications. There are some simple comments and I have been trying very hard to explain this for a long time. Patrick was the last person who tortured me ;- Perhaps i need to write a document. When leaving ifb0 you want for... ... egress: skb-dev=to (eth0) skb-iif=from (ifb0) ... ingress: skb-dev=at (ifb0) skb-iif=from (eth0) Yes, this is correct. I described the flow of the first one in the earlier email and the ingress side. Although lets not talk about iif yet. I posed a question earlier on which scheme would be better. So we move the update to the tasklet and set skb-dev to skb-iif before dev_queue_xmit() and skb-dev = dev (ifb) before calling netif_rx() Does that fullfil your requirements? You lost me on what you are trying to achieve. Just restore the line you removed in ifb and things would be fine. Lets then settle the issue of iif vs input_dev. cheers, jamal - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
On Fri, 2006-30-06 at 02:55 +0200, Thomas Graf wrote: * jamal [EMAIL PROTECTED] 2006-06-29 20:48 The point is to avoid having an atomic operation for every packet when setting iif in netif_receive_skb(). If it was only for mirred nobody would complain I guess. I never intended to punish all users. I think i was misunderstood. The only problem is where do you decrement the refcount when you increment at mirred? In your case, I assume it is at some sort of rule destruction? I think whether it becomes ifindex or pointer you need to increment the refcounter. and decrement somewhere. The challenge for me is a choice to use more cycles if you use ifindex vs less cycles with a pointer. The advantage for going with ifindex would be to save those bits(if you rearrange). The question is which is reasonable?;- The third choice is to just don't care if the interface goes away but have a chance to figure it out and just assume as if it would have never been set. The number of devices that can disappear w/o user control is very very limited and not worth an atomic operation for every single packet. Now that you mention this - I think that option 3 is what we said last time we had this discussion ;- cheers, jamal - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
* jamal [EMAIL PROTECTED] 2006-06-27 09:07 I am reading the thread backwards, so i may miss some of the obvious. I also dont remember the exact discussion we had - but the consensus was to leave the field setting as is. Note the meta-setter (been sitting on it for too long) also set the input_dev. Could you share that piece of code so I don't have to duplicate that effort? BTW, in regards to the VLANs, what is wrong with using the netdev-iflink to figure the real device? vlan1 ifb1 vlan2 vlan3 ifb2 vlan4 dhcp daemon would bind to ifb1 and use pktinfo cmsg to figure out whether the packet was received on vlan1 or vlan2, the actual physical device is of no interest anymore. Looking at your ifb code I see that you set skb-dev based on iif and update iif unconditionally. I see a lot of problems with this, firstly iif is already updated in the mirred action, secondly iif doesn't necessarily point to the device the packet was last seen on. Could you maybe explain the policy of iif you have in mind, it doesn't seem very consistent. BTW, why does ifb have a hard header length? The resulting push and pull only slows things down. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
On Wed, 2006-28-06 at 12:18 +0200, Thomas Graf wrote: * jamal [EMAIL PROTECTED] 2006-06-27 09:07 [..] Note the meta-setter (been sitting on it for too long) also set the input_dev. Could you share that piece of code so I don't have to duplicate that effort? I will look it up and send it your way - it is not complete (I have changed machines twice since i first wrote it). Or maybe the intent of your question was to ask how was the setting of input_dev was done? BTW, in regards to the VLANs, what is wrong with using the netdev-iflink to figure the real device? vlan1 ifb1 vlan2 vlan3 ifb2 vlan4 dhcp daemon would bind to ifb1 and use pktinfo cmsg to figure out whether the packet was received on vlan1 or vlan2, the actual physical device is of no interest anymore. Let me see if i understood correctly: you have a device from both vlan1 and vlan2 both being redirected to ifb1? and vlan 3, 4= ifb2? And i take it to make it interesting vlan1,2 on eth0 and vlan3,4 ontop of eth1? note the iflink would/should reflect eth1/2. If you had bonding or bridging in between it gets more interesting. But you are saying you are no longer interested in the info that it came via eth1/2 at some point, correct? Note, in such a case: iflink rewriting will do it just fine but then you loose the original info (I think it would be good to not loose such info to know the origin). I dont know if cmsg uses iflink at all - they probably look at the ifindex. Ok, now question: why are you binding on top of ifb? we could have redirected this to ipip, or tun0 or loopback or eql etc where it may equally not have made sense to bind to. I can see something like this making sense for bonding or bridging but not as a generic answer for all netdevices. Looking at your ifb code I see that you set skb-dev based on iif and update iif unconditionally. I see a lot of problems with this, firstly iif is already updated in the mirred action, secondly iif doesn't necessarily point to the device the packet was last seen on. Could you maybe explain the policy of iif you have in mind, it doesn't seem very consistent. The concept is to use ifb as a transitionary device. Packets come into it from either the ingress side of the stack or egress and get returned on the same side/spot they were found - with ifb represented as the input device. What ifb does is independent of what mirred does (we could redirect to other devices other than ifb). Now that i pay attention to your patch i think i see the complications it is introducing (i am not done yet)- and i did warn about this in the discussion _and_ didnt we agree to leave this stuff alone? Can we drop this change please? BTW, why does ifb have a hard header length? The resulting push and pull only slows things down. Good question. I know there were a lot of oddities i had to deal with when different link layers cross over (some have smaller headers than other and yet others had none ;-) I will have to go and look at my notes test cases and get back to you; It does work as it is right now (I only notice a pull - there is no push). cheers, jamal - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
* jamal [EMAIL PROTECTED] 2006-06-28 08:22 Let me see if i understood correctly: you have a device from both vlan1 and vlan2 both being redirected to ifb1? and vlan 3, 4= ifb2? And i take it to make it interesting vlan1,2 on eth0 and vlan3,4 ontop of eth1? note the iflink would/should reflect eth1/2. If you had bonding or bridging in between it gets more interesting. But you are saying you are no longer interested in the info that it came via eth1/2 at some point, correct? Note, in such a case: iflink rewriting will do it just fine but then you loose the original info (I think it would be good to not loose such info to know the origin). I dont know if cmsg uses iflink at all - they probably look at the ifindex. It's using rt_iif which is currently derived from skb-dev even though it would make sense to use skb-iif once that is well defined. Ok, now question: why are you binding on top of ifb? we could have redirected this to ipip, or tun0 or loopback or eql etc where it may equally not have made sense to bind to. I can see something like this making sense for bonding or bridging but not as a generic answer for all netdevices. What's special about that? It seems a perfectly fine and simple way to create namespaces for sockets without enforcing applications to bind to the correct vlan devices directly. The concept is to use ifb as a transitionary device. Packets come into it from either the ingress side of the stack or egress and get returned on the same side/spot they were found - with ifb represented as the input device. What ifb does is independent of what mirred does (we could redirect to other devices other than ifb). This is clear, my question was for you to explain when you intend to update input_dev etc. By overwriting it in ifb you enforce that valueable information is lost. There is nothing wrong with updating iif to point to ifb under certain circumstances but it's not required to strictly enforce this. Having mirred do so conditionally is a lot more flexible without losing any value. Now that i pay attention to your patch i think i see the complications it is introducing (i am not done yet)- and i did warn about this in the discussion _and_ didnt we agree to leave this stuff alone? Can we drop this change please? This statement doesn't make sense, it merely transforms a device reference to a reference by interface index because the current method of handling it doesn't make sense at all since the region where input_dev stays valid without the risk of the device disappearing is very limited. It does work as it is right now (I only notice a pull - there is no push). In order to pull in ifb you need to push in mirred, quite obvious, no? Why are you enforcing users of ifb to fake an ethernet header that is of no use at all? - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
On Wed, 2006-28-06 at 15:01 +0200, Thomas Graf wrote: * jamal [EMAIL PROTECTED] 2006-06-28 08:22 [..] Note, in such a case: iflink rewriting will do it just fine but then you loose the original info (I think it would be good to not loose such info to know the origin). I dont know if cmsg uses iflink at all - they probably look at the ifindex. It's using rt_iif which is currently derived from skb-dev even though it would make sense to use skb-iif once that is well defined. Why not use iflink? It exists, by definition, specifically as the ifindex of the down muxed netdevice (should probably add that definition in the .h). This is semantically different from a message to the stack which says this came to you from input device X (represented by input_dev). Ok, now question: why are you binding on top of ifb? we could have redirected this to ipip, or tun0 or loopback or eql etc where it may equally not have made sense to bind to. I can see something like this making sense for bonding or bridging but not as a generic answer for all netdevices. What's special about that? It seems a perfectly fine and simple way to create namespaces for sockets without enforcing applications to bind to the correct vlan devices directly. Can you achieve the same by binding to any arbitrary netdevice? If you can then i would agree. This is clear, my question was for you to explain when you intend to update input_dev etc. By overwriting it in ifb you enforce that valueable information is lost. It is design intent for this field to carry information on where the packet came from. We may loop many times between devices (between ingress-egress or ingress-ingress, within the ttl limit) and each time the only meaningful thing is which was the last netdevice that was sending it. There is nothing wrong with updating iif to point to ifb under certain circumstances but it's not required to strictly enforce this. Having mirred do so conditionally is a lot more flexible without losing any value. Refer to what i said above. Now that i pay attention to your patch i think i see the complications it is introducing (i am not done yet)- and i did warn about this in the discussion _and_ didnt we agree to leave this stuff alone? Can we drop this change please? This statement doesn't make sense, it merely transforms a device reference to a reference by interface index because the current method of handling it doesn't make sense at all since the region where input_dev stays valid without the risk of the device disappearing is very limited. Please refer to what i said above in that you could end up redirecting many times. I have not added yet the code which allows mirred to also do ingress redirection which would make it even more interesting. The challenge is this, and if you can solve it we would be fine: - I need to access both the input_dev and dev and their metadata. I could clearly find them by their ifindices and then reference them after that. For performance reasons that is not optimal in the fast path. [Note, we have had this discussion before and even then we came to some consensus that it would be hard to achieve that hence my view, again, to drop this]. It does work as it is right now (I only notice a pull - there is no push). In order to pull in ifb you need to push in mirred, quite obvious, no? I think that my thinking at the time was that ifb can only receive packets via mirred and therefore the pull on ingress at ifb may have been to counter the push that occurs at mirred. Let me look at my notes and testcases when i get back home; it may make sense not to have it at ifb - the fact that it works as is tells me there is more to it than the obvious. Why are you enforcing users of ifb to fake an ethernet header that is of no use at all? This may just be an artifact of inheriting things from dummy device. The answer will be clear when i look at my notes. cheers, jamal - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
On Tue, 2006-27-06 at 12:03 +0200, Thomas Graf wrote: * Patrick McHardy [EMAIL PROTECTED] 2006-06-27 00:29 Doesn't sound too silly (actually quite nifty in my opinion), but I'm not sure I understand correctly, binding to ifb doesn't work since it changes both skb-dev and skb-input_dev. I don't understand this concern. So far the mirred action updates iif but that can be made configurable. I am reading the thread backwards, so i may miss some of the obvious. I also dont remember the exact discussion we had - but the consensus was to leave the field setting as is. Note the meta-setter (been sitting on it for too long) also set the input_dev. BTW, in regards to the VLANs, what is wrong with using the netdev-iflink to figure the real device? cheers, jamal - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
Updating iif to the VLAN device helps keeping routing namespaces defined in case packets from multiple VLANs collapse on a single device again. Signed-off-by: Thomas Graf [EMAIL PROTECTED] Index: net-2.6.git/net/8021q/vlan_dev.c === --- net-2.6.git.orig/net/8021q/vlan_dev.c +++ net-2.6.git/net/8021q/vlan_dev.c @@ -156,6 +156,8 @@ int vlan_skb_recv(struct sk_buff *skb, s return -1; } + /* Make packets look like they have been received on the VLAN device */ + skb-iif = skb-dev-ifindex; skb-dev-last_rx = jiffies; /* Bump the rx counters for the VLAN device. */ -- - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
Thomas Graf wrote: Updating iif to the VLAN device helps keeping routing namespaces defined in case packets from multiple VLANs collapse on a single device again. Signed-off-by: Thomas Graf [EMAIL PROTECTED] Index: net-2.6.git/net/8021q/vlan_dev.c === --- net-2.6.git.orig/net/8021q/vlan_dev.c +++ net-2.6.git/net/8021q/vlan_dev.c @@ -156,6 +156,8 @@ int vlan_skb_recv(struct sk_buff *skb, s return -1; } + /* Make packets look like they have been received on the VLAN device */ + skb-iif = skb-dev-ifindex; skb-dev-last_rx = jiffies; I know this was discussed before, but I can't remember the exact outcome. Why don't we just unconditionally update iif in netif_receive_skb()? - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
From: Patrick McHardy [EMAIL PROTECTED] Date: Mon, 26 Jun 2006 19:04:15 +0200 I know this was discussed before, but I can't remember the exact outcome. Why don't we just unconditionally update iif in netif_receive_skb()? Software devices might have interesting semantics that would make not setting iif desirable. Once you set iif, you can't just undo it because the information is lost. I also would really prefer to set it unconditionally in netif_receive_skb(), but Jamal's concerns in this area are real. We really need to evaluate this on a case-by-case basis. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
David Miller wrote: From: Patrick McHardy [EMAIL PROTECTED] Date: Mon, 26 Jun 2006 19:04:15 +0200 I know this was discussed before, but I can't remember the exact outcome. Why don't we just unconditionally update iif in netif_receive_skb()? Software devices might have interesting semantics that would make not setting iif desirable. Right, I remember now. Once you set iif, you can't just undo it because the information is lost. I also would really prefer to set it unconditionally in netif_receive_skb(), but Jamal's concerns in this area are real. We really need to evaluate this on a case-by-case basis. I still have a hard time imagining a case where this wouldn't fit, netfilter is perfectly happy with just using skb-dev as input device on the input path so far. I also think it will be confusing to the user to have different notions of what constitutes the input device. If anyone can think of an example where the user would really expect the input device to be something different than skb-dev I'd be really interested to hear it. BTW, this is not meant to be an objection to Thomas's patch, just me still wondering .. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
* David Miller [EMAIL PROTECTED] 2006-06-26 10:46 From: Patrick McHardy [EMAIL PROTECTED] Date: Mon, 26 Jun 2006 19:04:15 +0200 I know this was discussed before, but I can't remember the exact outcome. Why don't we just unconditionally update iif in netif_receive_skb()? Software devices might have interesting semantics that would make not setting iif desirable. Once you set iif, you can't just undo it because the information is lost. I also would really prefer to set it unconditionally in netif_receive_skb(), but Jamal's concerns in this area are real. We really need to evaluate this on a case-by-case basis. I'm playing with a FIFO device based on Jamal's previous work to replace the broken IMQ device and provide real queueing at ingress. A set of VLAN devices could be redirected to such a FIFO device and let applications bind to it in order to implement trivial bind(INADDR_ANY) namespaces based on anything that can be selected by classifiers. This example would benefit from a conditional iif update (given that the mirred action is extended to take a flag to steer this) so applications like a dhcp daemon could use a raw socket on the FIFO device and thus benefit from the bind namespace but still access the original interface the packet was received from using pktinfo cmsg. Maybe silly but the best I could come up with :-) - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
Thomas Graf wrote: * David Miller [EMAIL PROTECTED] 2006-06-26 10:46 From: Patrick McHardy [EMAIL PROTECTED] Date: Mon, 26 Jun 2006 19:04:15 +0200 I know this was discussed before, but I can't remember the exact outcome. Why don't we just unconditionally update iif in netif_receive_skb()? Software devices might have interesting semantics that would make not setting iif desirable. Once you set iif, you can't just undo it because the information is lost. I also would really prefer to set it unconditionally in netif_receive_skb(), but Jamal's concerns in this area are real. We really need to evaluate this on a case-by-case basis. I'm playing with a FIFO device based on Jamal's previous work to replace the broken IMQ device and provide real queueing at ingress. All my testing (quite a lot) in this area so far suggested that queueing at ingress doesn't work and is actually counter-productive. It gets worse with increasing signal propagation delays (signal in this case means congestion indications). I actually wish I never introduced the stupid IMQ device, it raises false hope and a lot of people seem to fall for it. A set of VLAN devices could be redirected to such a FIFO device and let applications bind to it in order to implement trivial bind(INADDR_ANY) namespaces based on anything that can be selected by classifiers. This example would benefit from a conditional iif update (given that the mirred action is extended to take a flag to steer this) so applications like a dhcp daemon could use a raw socket on the FIFO device and thus benefit from the bind namespace but still access the original interface the packet was received from using pktinfo cmsg. Maybe silly but the best I could come up with :-) Doesn't sound too silly (actually quite nifty in my opinion), but I'm not sure I understand correctly, binding to ifb doesn't work since it changes both skb-dev and skb-input_dev. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] [VLAN]: Update iif when receiving via VLAN device
Patrick McHardy wrote: All my testing (quite a lot) in this area so far suggested that queueing at ingress doesn't work and is actually counter-productive. It gets worse with increasing signal propagation delays (signal in this case means congestion indications). I actually wish I never introduced the stupid IMQ device, it raises false hope and a lot of people seem to fall for it. Small clarification: with queueing at ingress I mean queueing after the bottleneck. It might work if you introduce a virtual bottleneck, but in that case in my experience the bandwidth limit you have to use is dependant on the number of TCP flows, so usually you can't specify a limit that will always work, and its wasteful if there is a smaller number of TCP flows. The reason why you would do it, limiting or prioritizing incoming traffic on the _real_ bottleneck, is not achievable of course. Also noteworthy is that policers don't behave any better, and any other schemes I've tried (I also experienced a lot of with TCP rate control) have similar or related problems. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html