Re: Q: sock output serialization
> "jamal" == jamal <[EMAIL PROTECTED]> writes: jamal> Packets in flight? >> In the extreme case, there could still arrive up to the window >> size frames. jamal> Assuming this depends on path latency and not some bad jamal> programming Yes. Although the latter could also possible. jamal> BTW, earlier i lied: there is a way to tell if your packet jamal> will be dropped which is not very expensive: jamal> if (atomic_read(_dropping) /* packet will be jamal> dropped */ jamal> but even this is 99% accurate in SMP. Well, but better than knowing nothing about congestion state. We could at least document in the x25iface.txt kernel doc that driver authors should check this before acknowledging frames. Henner - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Q: sock output serialization
[EMAIL PROTECTED] said:
> I think its fixable to make it do the RR/RNR after bouncing it up the
> stack.

ARCnet does ACK in hardware. Packets don't hit the wire until the
destination has indicated that it's got a buffer available. You really
want to be able to reserve space on the queue before telling the chip to
accept another incoming packet - not just realise afterwards that you've
screwed up.

Strictly speaking, this fact is irrelevant to the case in question, but if
we're modifying the generic code for LAPB, we might as well think about
other protocols which require similar treatment.

--
dwmw2
Re: Q: sock output serialization
On Sun, 17 Sep 2000, Henner Eisen wrote:

> > "jamal" == jamal <[EMAIL PROTECTED]> writes:
>
> No. Just, if you do not accept a frame, you must not acknowledge it.
> Once it has been acknowledged, you must not discard it.

OK, so no problem then.

> jamal> Can you stop mid-window and claim there is
> jamal> congestion? (maybe time to dust off some books).
>
> Yes.

Again, this makes life simpler. You don't have to accept the whole window.

> Just had a look at the X.25 specs again. As far as LAPB is concerned
> (and that's what we are speaking about), it is like this:
> When your receiver is busy, you tell the other end about this by means
> of a ReceiverNotReady primitive. However, it might take some time until
> the peer receives it and reacts on this.

Packets in flight?

> In the extreme case, there could still arrive up to the window size
> frames.

Assuming this depends on path latency and not some bad programming.

> It seems that the receiver can do whatever it wants to do with frames
> received during the busy condition: either accept the frames (but delay
> acknowledgement until the busy condition is cleared) or just discard
> them. The first one seems to favor performance while the second favors
> simplicity.
>
> I guess in Linux, we should usually choose simplicity. I think even
> with the simplicity variant, we could be able to preserve performance
> if we can flow control the peer earlier. E.g. when the return value of
> your netif_rx indicates 'almost congested, but still able to accept
> frames', we could already set the busy condition but continue to
> deliver the frames arriving during our busy condition. But that's
> performance tuning and can be taken care of later (I'm even not sure
> whether this tuning will pay off).

This is doable: the 'almost congested, but still able to accept frames'
threshold is a tunable parameter via proc. Nobody is stopping you from
maintaining your own little queue in the driver to take the first option.
The complexity is added to your driver as opposed to the general system.

BTW, earlier i lied: there is a way to tell if your packet will be dropped
which is not very expensive:

	if (atomic_read(&netdev_dropping)) /* packet will be dropped */

but even this is only 99% accurate in SMP.

cheers,
jamal
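As a user-space illustration of the two ideas above (the cheap `netdev_dropping` pre-check and the driver-private "little queue"), here is a minimal sketch. Everything except the flag's name is invented for the example; the real 2.4 flag lives in the core networking code, and as jamal notes the check is inherently racy on SMP because another CPU can throttle or unthrottle the backlog right after the read.

```c
#include <assert.h>
#include <stdatomic.h>

/* Stand-in for the netdev_dropping flag from the post: nonzero while
 * the backlog is throttled and arriving packets are being dropped. */
static atomic_int netdev_dropping;

/* Cheap pre-check a driver could make before acknowledging a frame.
 * Only ~99% reliable on SMP: the flag may flip between this read and
 * the actual enqueue. */
int frame_would_be_dropped(void)
{
    return atomic_load(&netdev_dropping) != 0;
}

/* The driver-side "little queue": hold frames privately, without
 * acknowledging them, while dropping is signalled. */
#define HOLD_MAX 8
static int held_frames[HOLD_MAX];
static int nheld;

int rx_frame(int frame)
{
    if (frame_would_be_dropped()) {
        if (nheld < HOLD_MAX)
            held_frames[nheld++] = frame; /* deliver and ack later */
        return -1;  /* do not acknowledge yet */
    }
    return 0;       /* safe to pass up and acknowledge */
}
```

The point of holding the frame rather than dropping it is exactly the trade-off jamal describes: the complexity lands in the driver instead of the general system.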
Re: Q: sock output serialization
> "jamal" == jamal <[EMAIL PROTECTED]> writes: jamal> Hmm.. More complexity ;-> Does X.25 mandate you accept all jamal> the window? No. Just, if you do not accept a frame, you must not acknowledge it. Once it has been acknowledged, you must not discard it. jamal> Can you stop mid-window and claim there is jamal> congestion? (maybe time to dust off some books). Yes. Just had a look at the X.25 specs again. As far as LAPB is concerned (and that´s what we are speeking about), it is like this: When your receiver is busy, you tell the other end about this by means of a ReceiverNotReady primitive. However, it might take some time until the peer receives it and reacts on this. In the extreme case, there could still arrive up to the window size frames. It seems that the receiver can do whatever it wants to do with frames received during the busy condition: Either accept the frames (but delay acknowledgement until the busy condition is cleared) or just discard them. The first one seems to favor performance while the second favors simplicity. I guess in Linux, we should usually choose simplicity. I think even with the simpicity variant, we could be able to preserve performance if we can flow control the peer earlier. E.g. when the return value of your netif_rx indicates 'almost congested, but still able to accept frames', we could already set the busy condition but continue to deliver the frames arriving during our busy condition. But that´s performance tuning and can be taken care of later (I´m even not sure wheter this tuning will pay off). Henner - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Q: sock output serialization
On Sun, 17 Sep 2000, Henner Eisen wrote:

> Yes, a) that would make life much simpler for driver writers (but more
> difficult for you ;). If it is doable without adding overhead to the
> general path, it would be nice to provide that semantics to
> HW_FLOWCONTROLed devices.

There would be a minute overhead. But i guess let me release the patch
first, then we can continue this part of the conversation.

> However, even with a), after being HW-flow-controlled and setting the
> rx_busy condition, there could still arrive some more packets until the
> send window is full. They either need to be discarded at once or queued
> somewhere else. If we don't want to discard them, you need to accept
> packets up to the window size from a device after it has been HW flow
> controlled.

Hmm.. More complexity ;-> Does X.25 mandate you accept all the window?
Can you stop mid-window and claim there is congestion? (maybe time to
dust off some books).

cheers,
jamal
Re: Q: sock output serialization
Hi,

> "jamal" == jamal <[EMAIL PROTECTED]> writes:

>> With the current scheme, lapb first acknowledges reception of
>> the frame and after that, netif_rx() might still discard it --
>> which is evil.

jamal> This might screw things a bit. Can you defer to say first
jamal> call netif_rx() then acknowledge or is this hard-coded into
jamal> the f/ware?

This depends on the firmware; I don't know. The software lapb module could
be modified to honor a return value appropriately. But software lapb
should be moved above netif for several other reasons anyway (although
even there, honoring a return value for flow control would make sense).
Maybe it is a good idea to make the congestion return values not netif
specific, but part of a generic "return semantics for delivering packets
to upper layers". The driver maintainers will need to investigate this and
take appropriate actions depending on the firmware's capabilities.

My personal use of the X.25 stack was in DTE-DTE mode over isdn, where I
use the isdn driver's internal lapb (x75i) implementation. Unfortunately,
the interface to the isdn lower layers does not allow returning an rx_busy
condition.

>> Provided that netif_would_drop(dev) is reliable (a subsequent

jamal> I think this would make it a little more complex than
jamal> necessary; the queue state might change right after you

Yes, the scenario I had in mind (where it would have been reliable) was a
little short-sighted (see reply to Andi's message).

jamal> If you cant defer the acknowledgement until netif_rx()
jamal> returns then what we could do is instead:

jamal> 1) for devices that are registered with hardware flow
jamal> control ==> you have to register as a
jamal> CONFIG_NET_HW_FLOWCONTROL device.

jamal> a) to let them queue that last packet before they are
jamal> shut-up, the assumption is they respect the protocol and
jamal> will 'back-off' after that.

jamal> b) return BLG_CNG_WOULD_DROP
jamal> instead to the device and give it the responsibility to
jamal> free the skb or store it wherever it wants but not in the
jamal> backlog.

jamal> I personally prefer a). Reason: If we have done all the
jamal> work so far (context switch etc) and we know the device is
jamal> well behaved (meaning it is not going to send another packet
jamal> without being told things are fine) then it is probably
jamal> wiser to just let that packet get on the backlog queue.

Yes, a) would make life much simpler for driver writers (but more
difficult for you ;). If it is doable without adding overhead to the
general path, it would be nice to provide that semantics to
HW_FLOWCONTROLed devices.

However, even with a), after being HW-flow-controlled and setting the
rx_busy condition, there could still arrive some more packets until the
send window is full. They either need to be discarded at once or queued
somewhere else. If we don't want to discard them, you need to accept
packets up to the window size from a device after it has been HW flow
controlled.

Henner
Re: Q: sock output serialization
> "Andi" == Andi Kleen <[EMAIL PROTECTED]> writes: Andi> It would just be racy. You test, get a not drop and then Andi> another different interrupt would deliver another packet Andi> before you can and fill the queue. Jamal's extended Andi> netif_rx probably makes more sense, because it can be Andi> atomic. I thought if it was executed from the same single interrupt handler (and lapb also processed from that same interrupt handler) while local irq are disables, this could not happen. But for smart controllers, this does not hold, they would need to interrupt the cpu first to query the state, and than again before delivering the packet. And for dumb cards, doing the lapb processing inside irq handler is not nice, anyway. Moving lapb processing above netif_rx() would resolve this and all other problems. Henner - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] Please read the FAQ at http://www.tux.org/lkml/
Re: Q: sock output serialization
On Sat, Sep 16, 2000 at 11:39:45PM +0200, Henner Eisen wrote:
> int netif_would_drop(dev)
> {
>	return (queue->input_pkt_queue.qlen > netdev_max_backlog)
>		|| (queue->input_pkt_queue.qlen && queue->throttle);
> }
>
> would fulfil those requirements.

It would just be racy. You test, get a "no drop", and then another,
different interrupt could deliver another packet before you can and fill
the queue. Jamal's extended netif_rx probably makes more sense, because it
can be atomic.

-Andi
Re: Q: sock output serialization
> > With the current scheme, lapb first acknowledges reception of the
> > frame and after that, netif_rx() might still discard it -- which is
> > evil.
>
> This might screw things a bit. Can you defer to say first call
> netif_rx() then acknowledge or is this hard-coded into the f/ware?

I think it's fixable to make it do the RR/RNR after bouncing it up the
stack.
Re: Q: sock output serialization
Seems all the good network stuff gets discussed on l-k instead ;-<
(hint: some people are not subscribed to l-k)

On Sat, 16 Sep 2000, Henner Eisen wrote:

> What about a function to query the state of the backlog queue?
> Something like
>
>	if (netif_would_drop(dev)) {
>		kfree_skb(skb);
>		/* optionally, if supported by lapb implementation: */
>		set_lapb_rx_busy_condition();
>		return;
>	}
>	clear_lapb_rx_busy_condition(); /* if supported */
>	pass_frame_to_lapb(lapb, skb);
>
> The key point is that we need to query the backlog queue and
> discard the skb before lapb can acknowledge it. Simply discarding
> it when the backlog is known to be congested should be sufficient. It
> could however improve performance if lapb did additionally flow control
> the peer.

This should be resolved by a patch i am about to submit based on the OLS
talk. netif_rx() now returns a value which tells you the congestion level
when you give it a packet (a change from void netif_rx()):

---
/* return values:
 * BLG_CNG_NONE (no congestion)
 * BLG_CNG_LOW  (low congestion)
 * BLG_CNG_MOD  (moderate congestion)
 * BLG_CNG_HIGH (high congestion)
 * BLG_CNG_DROP (packet was dropped)
 */
---

> With the current scheme, lapb first acknowledges reception of the frame
> and after that, netif_rx() might still discard it -- which is evil.

This might screw things a bit. Can you defer to say first call netif_rx()
then acknowledge, or is this hard-coded into the f/ware?

> Provided that netif_would_drop(dev) is reliable (a subsequent netif_rx
> will reliably not drop the frame), this should make the netif_rx path
> reliable.
>
> It seems that, on 2.4.0, something like
>
>	int netif_would_drop(dev)
>	{
>		return (queue->input_pkt_queue.qlen > netdev_max_backlog)
>			|| (queue->input_pkt_queue.qlen && queue->throttle);
>	}
>
> would fulfil those requirements.
I think this would make it a little more complex than necessary; the queue
state might change right after you return from netif_would_drop() -- maybe
not, i am just hypothesizing. You can still create netif_would_drop() --
it just sounds too expensive to me, since to be really sure no packet of
yours is dropped, you have to make this call for every packet.

If you can't defer the acknowledgement until netif_rx() returns, then what
we could do instead is:

1) for devices that are registered with hardware flow control ==> you have
to register as a CONFIG_NET_HW_FLOWCONTROL device:

a) let them queue that last packet before they are shut up; the assumption
is that they respect the protocol and will 'back off' after that.

b) return BLG_CNG_WOULD_DROP instead to the device and give it the
responsibility to free the skb or store it wherever it wants, but not in
the backlog.

I personally prefer a). Reason: if we have done all the work so far
(context switch etc) and we know the device is well behaved (meaning it is
not going to send another packet without being told things are fine), then
it is probably wiser to just let that packet get on the backlog queue.

cheers,
jamal
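A driver's use of the graded netif_rx() return codes can be sketched in user space. Only the BLG_CNG_* names come from the post; the simulated backlog, the thresholds, and the function names below are all invented for illustration (in the real patch the thresholds would be the proc-tunable parameters jamal mentions).

```c
#include <assert.h>

enum {
    BLG_CNG_NONE,   /* no congestion */
    BLG_CNG_LOW,    /* low congestion */
    BLG_CNG_MOD,    /* moderate congestion */
    BLG_CNG_HIGH,   /* high congestion */
    BLG_CNG_DROP    /* packet was dropped */
};

#define MAX_BACKLOG 300   /* stand-in for netdev_max_backlog */

static int backlog_len;   /* simulated backlog queue length */

/* Simulated netif_rx(): enqueue if there is room and report the
 * congestion level atomically with the enqueue decision. */
static int netif_rx_sim(void)
{
    if (backlog_len >= MAX_BACKLOG)
        return BLG_CNG_DROP;                   /* packet was not queued */
    backlog_len++;
    if (backlog_len > MAX_BACKLOG * 3 / 4) return BLG_CNG_HIGH;
    if (backlog_len > MAX_BACKLOG / 2)     return BLG_CNG_MOD;
    if (backlog_len > MAX_BACKLOG / 4)     return BLG_CNG_LOW;
    return BLG_CNG_NONE;
}

static int lapb_rx_busy;  /* driver's RNR state toward the peer */

/* Driver rx path: only acknowledge the LAPB frame once netif_rx()
 * has actually queued it, and flow-control the peer early when the
 * backlog runs high. */
int deliver_frame(void)
{
    int c = netif_rx_sim();

    if (c == BLG_CNG_DROP)
        return -1;                      /* don't ack; peer retransmits */
    lapb_rx_busy = (c >= BLG_CNG_HIGH); /* set or clear busy condition */
    return 0;                           /* queued: safe to acknowledge */
}
```

Because the congestion level is returned by the same call that enqueues the packet, there is no test-then-deliver window, which is exactly why Andi argues this beats a separate netif_would_drop() query.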
Re: Q: sock output serialization
Hi,

> "Alan" == Alan Cox <[EMAIL PROTECTED]> writes:

>> However, for drivers which support intelligent controllers
>> (with lapb in firmware) this is not an option and the problem
>> will persist.

Alan> 'Smart hardware is broken' repeat .. ;) - but yes its an
Alan> issue there. These cards could bypass netif_rx and call
Alan> directly to the lapb top end though ?

What about a function to query the state of the backlog queue?
Something like

	if (netif_would_drop(dev)) {
		kfree_skb(skb);
		/* optionally, if supported by lapb implementation: */
		set_lapb_rx_busy_condition();
		return;
	}
	clear_lapb_rx_busy_condition(); /* if supported */
	pass_frame_to_lapb(lapb, skb);

The key point is that we need to query the backlog queue and discard the
skb before lapb can acknowledge it. Simply discarding it when the backlog
is known to be congested should be sufficient. It could however improve
performance if lapb did additionally flow control the peer.

With the current scheme, lapb first acknowledges reception of the frame
and after that, netif_rx() might still discard it -- which is evil.
Provided that netif_would_drop(dev) is reliable (a subsequent netif_rx
will reliably not drop the frame), this should make the netif_rx path
reliable.

It seems that, on 2.4.0, something like

	int netif_would_drop(struct net_device *dev)
	{
		struct softnet_data *queue = &softnet_data[smp_processor_id()];

		return (queue->input_pkt_queue.qlen > netdev_max_backlog)
			|| (queue->input_pkt_queue.qlen && queue->throttle);
	}

would fulfil those requirements.

Henner
Re: Q: sock output serialization
Hi,

> "kuznet" == kuznet <[EMAIL PROTECTED]> writes:

kuznet> Hello!

>> scheduler may re-order frames

kuznet> It cannot, provided sender holds order until
kuznet> dev_queue_xmit().

But if I set different skb->priority? ;) Well, that would be my own fault
then ..

>> or drop them.

kuznet> Yes. And if you share _single_ device both for reliable
kuznet> and unreliable services, you have to make special tricks.

Well, I think this problem will not occur. For shared service, we will use
a datalink protocol running above netif (e.g. mixed X.25 and IP over
ethernet, where X.25 runs on top of 802.2 LLC.2, which will be implemented
above netif). And for smart (firmware lapb) interfaces, which are the real
problem, we won't need to support shared service.

>> be fixed by providing a special LAPB network scheduler which
>> takes care about preserving reliable LAPB semantics.

kuznet> Yes. ATM CLIP already does this, look at atm clip.c and
kuznet> sch_atm.c to get an example.

Yes. But the above seems to be a network scheduler specialized for passing
IP down to an ATM tunnel. What I had in mind would correspond to a special
scheduler for an atm net_device (but ATM does not use the standard linux
net_device).

>> that value before calling netif_rx(). Then upper layers
>> worried about netif_rx() re-ordering can detect this and act
>> appropriately.

kuznet> etc.
kuznet> No!
kuznet> In fact, it is a mathematical fact that as soon as order is
kuznet> broken once, it is _impossible_ to restore it back. No
kuznet> valid actions are invented to do this f.e. for TCP.

Agreed.

kuznet> Though with lapb the situation is different: it cannot
kuznet> lose frames, this changes the situation.

Unfortunately, netif_rx might still lose frames, and it is concurrent
netif_rx() which re-orders the frames. Thus, we cannot take advantage of
reliable LAPB below netif_rx when packet loss and re-ordering occurred
above netif_rx().

kuznet> In any case, order must not be broken, if it is
kuznet> essential. That's answer.

I see. Apparently, IRQ affinity seems the only simple and cheap solution
to the re-ordering problem.

Henner
Re: Q: sock output serialization
Hello!

> scheduler may re-order frames

It cannot, provided the sender holds order until dev_queue_xmit().
Actually, this is true of all the schedulers, except for the cases when
reordering is allowed explicitly with special policing rules.

> or drop them.

Yes. And if you share a _single_ device both for reliable and unreliable
services, you have to make special tricks.

> be fixed by providing a special LAPB network scheduler which takes
> care about preserving reliable LAPB semantics.

Yes. ATM CLIP already does this; look at atm clip.c and sch_atm.c to get
an example.

> Maybe a general solution to the problem would be to provide a special
> skb->rx_seqno field for SMP kernels. Device drivers can maintain an
> rx counter (they usually do so anyway in struct
> net_device_stats.rx_packets) which is incremented whenever a new frame
> is received. The driver then sets skb->rx_seqno to that value before
> calling netif_rx(). Then upper layers worried about netif_rx()
> re-ordering can detect this and act appropriately.

etc.

No! In fact, it is a mathematical fact that as soon as order is broken
once, it is _impossible_ to restore it back. No valid actions are invented
to do this, f.e. for TCP. Though with lapb the situation is different: it
cannot lose frames, and this changes the situation.

In any case, order must not be broken if it is essential. That's the
answer.

Alexey
Re: Q: sock output serialization
Hi,

> "Alan" == Alan Cox <[EMAIL PROTECTED]> writes:

Alan> LAPB does not expect ever to see re-ordering. Its a point to
Alan> point wire level MAC protocol.

Yes, it was never designed for handling re-ordering, because this cannot
happen on a single wire (and for that reason it is not fair to blame LAPB
for not handling re-ordering efficiently). But it seems that re-ordering
will be handled nevertheless -- by retransmission of the not-in-sequence
frames. At least the X.25 layer 2 spec for LAPB requires:

  "Reception of out-of-sequence I frames
   When the DCE receives a valid I frame whose send state sequence number
   N(S) is incorrect, it will discard the information field of the I frame
   and transmit an REJ frame with the N(R) set to one higher than the N(S)
   of the last correctly received I frame. ...
   ... The DCE will then discard the information field of all I frames
   received until the expected I frame is correctly received"

Not very efficient, but it should work. And as long as re-ordering only
happens occasionally, efficiency should not matter. The fact that LAPB can
actually recover from re-ordering problems really shows that it is a good
design: it can even recover from errors that were considered impossible in
the design environment.

Alan> 'Smart hardware is broken' repeat .. ;) - but yes its an

;)

Alan> issue there. These cards could bypass netif_rx and call
Alan> directly to the lapb top end though ?

Yes, something like this. Maybe what's missing is a standard interface in
the kernel that allows access to a reliable datalink service.

Further, it's not just a netif_rx() issue. dev_queue_xmit() does not
provide reliable datalink semantics, either. The default network scheduler
may re-order frames or drop them. But this could be fixed by providing a
special LAPB network scheduler which takes care of preserving reliable
LAPB semantics.

Maybe a LAPB network scheduler would even be the best place to hook in the
software lapb processing: the standard network scheduler queues could
serve simultaneously as the output queue for lapb. Putting lapb anywhere
else requires a dedicated LAPB output queue.

Further, I'm wondering whether other protocols are also affected by
re-ordering problems. X.25 is not the only protocol which relies on frames
being received in sequence. Frame relay and ATM networks are also required
to deliver frames/cells in sequence, and upper layer protocols might
depend on this.

Maybe a general solution to the problem would be to provide a special
skb->rx_seqno field for SMP kernels. Device drivers can maintain an rx
counter (they usually do so anyway in struct net_device_stats.rx_packets)
which is incremented whenever a new frame is received. The driver then
sets skb->rx_seqno to that value before calling netif_rx(). Then upper
layers worried about netif_rx() re-ordering can detect this and act
appropriately. Only device drivers and protocols affected by re-ordering
would need to do so. Thus, protocols like tcp/ip, which already handle
re-ordering by themselves, won't be affected. They will only suffer from
the increased skb size by sizeof(skb->rx_seqno). Maybe a convention to
hook the rx_seqno into skb->cb could even work around this.

It's not trivial yet when different upper layer protocols receive frames
from the same device: the NET_RX_SOFTIRQ handler would need to map the
device specific sequence numbers to consecutive protocol specific sequence
numbers before calling ptype->func().

Henner
Re: Q: sock output serialization
> With the current scheme, lapb first acknowledges reception of the frame
> and after that, netif_rx() might still discard it -- which is evil.

This might screw things a bit. Can you defer it, to say first call netif_rx() and then acknowledge, or is this hard-coded into the f/ware? I think it's fixable to make it do the RR/RNR after bouncing it up the stack.
Re: Q: sock output serialization
On Sat, Sep 16, 2000 at 11:39:45PM +0200, Henner Eisen wrote:
>	int netif_would_drop(dev)
>	{
>		return (queue->input_pkt_queue.qlen > netdev_max_backlog)
>		    || ((queue->input_pkt_queue.qlen) && (queue->throttle));
>	}
>
> would fulfil those requirements.

It would just be racy. You test, get a "would not drop", and then another, different interrupt could deliver another packet before you can, and fill the queue. Jamal's extended netif_rx probably makes more sense, because it can be atomic.

-Andi
Re: Q: sock output serialization
Hi,

> "kuznet" == kuznet <[EMAIL PROTECTED]> writes:

kuznet> Hello!

>> scheduler may re-order frames

kuznet> It cannot, provided sender holds order until
kuznet> dev_queue_xmit().

But what if I set different skb->priority? ;) Well, that would be my fault then ..

>> or drop them.

kuznet> Yes. And if you share _single_ device both for reliable
kuznet> and unreliable services, you have to make special tricks.

Well, I think this problem will not occur. For shared service, we will use a datalink protocol running above netif (e.g. mixed X.25 and IP over ethernet, where X.25 runs on top of 802.2 LLC.2, which will be implemented above netif). And for smart (firmware lapb) interfaces, which are the real problem, we won't need to support shared service.

>> be fixed by providing a special LAPB network scheduler which takes
>> care about preserving reliable LAPB semantics.

kuznet> Yes. ATM CLIP already does this, look at atm/clip.c and
kuznet> sch_atm.c to get an example.

Yes. But the above seems to be a network scheduler specialized for passing IP down to an ATM tunnel. What I had in mind would correspond to a special scheduler for an atm net_device (but ATM does not use the standard linux net_device).

>> that value before calling netif_rx(). Then upper layers worried
>> about netif_rx() re-ordering can detect this and act appropriately.

kuznet> etc.

kuznet> No!

kuznet> In fact, it is mathematical fact, that as soon as order is
kuznet> broken once it is _impossible_ to restore it back. No
kuznet> valid actions are invented to do this f.e. for TCP.

Agreed.

kuznet> Though with lapb the situation is different: it cannot
kuznet> lose frames, this changes the situation.

Unfortunately, netif_rx might still lose frames, and it's concurrent netif_rx() which re-orders the frames. Thus, we cannot take advantage of reliable LAPB below netif_rx() when packet loss and re-ordering occurred above netif_rx().

kuznet> In any case, order must not be broken, if it is
kuznet> essential. That's answer.

I see.

Apparently, IRQ affinity seems to be the only simple and cheap solution to the re-ordering problem.

Henner
Re: Q: sock output serialization
Hi,

> "Alan" == Alan Cox <[EMAIL PROTECTED]> writes:

>> However, for drivers which support intelligent controllers (with
>> lapb in firmware) this is not an option and the problem will persist.

Alan> 'Smart hardware is broken' repeat .. ;) - but yes its an
Alan> issue there. These cards could bypass netif_rx and call
Alan> directly to the lapb top end though ?

What about a function to query the state of the backlog queue? Something like

	if (netif_would_drop(dev)) {
		kfree_skb(skb);
		/* optionally, if supported by lapb implementation: */
		set_lapb_rx_busy_condition();
		return;
	}
	clear_lapb_rx_busy_condition(); /* if supported */
	pass_frame_to_lapb(lapb, skb);

The key point is that we need to query the backlog queue and discard the skb before lapb can acknowledge it. Simply discarding it when the backlog is known to be congested should be sufficient. It could however improve performance if lapb did additionally flow control the peer. With the current scheme, lapb first acknowledges reception of the frame and after that, netif_rx() might still discard it -- which is evil.

Provided that netif_would_drop(dev) is reliable (a subsequent netif_rx() will reliably not drop the frame), this should make the netif_rx path reliable. It seems that, on 2.4.0, something like

	int netif_would_drop(dev)
	{
		return (queue->input_pkt_queue.qlen > netdev_max_backlog)
		    || ((queue->input_pkt_queue.qlen) && (queue->throttle));
	}

would fulfil those requirements.

Henner
Re: Q: sock output serialization
Seems all the good network stuff gets discussed on l-k instead ;-) (hint: some people are not subscribed to l-k)

On Sat, 16 Sep 2000, Henner Eisen wrote:
> What about a function to query the state of the backlog queue?
> Something like
>
>	if (netif_would_drop(dev)) {
>		kfree_skb(skb);
>		/* optionally, if supported by lapb implementation: */
>		set_lapb_rx_busy_condition();
>		return;
>	}
>	clear_lapb_rx_busy_condition(); /* if supported */
>	pass_frame_to_lapb(lapb, skb);
>
> The key point is that we need to query the backlog queue and discard
> the skb before lapb can acknowledge it. Simply discarding it when the
> backlog is known to be congested should be sufficient. It could however
> improve performance if lapb did additionally flow control the peer.

This should be resolved by a patch i am about to submit based on the OLS talk. netif_rx() now returns a value which tells you the congestion levels when you give it a packet (change from void netif_rx())

---
/* return values:
 * BLG_CNG_NONE (no congestion)
 * BLG_CNG_LOW  (low congestion)
 * BLG_CNG_MOD  (moderate congestion)
 * BLG_CNG_HIGH (high congestion)
 * BLG_CNG_DROP (packet was dropped)
 */
---

> With the current scheme, lapb first acknowledges reception of the frame
> and after that, netif_rx() might still discard it -- which is evil.

This might screw things a bit. Can you defer to say first call netif_rx() then acknowledge, or is this hard-coded into the f/ware?

> Provided that netif_would_drop(dev) is reliable (a subsequent netif_rx()
> will reliably not drop the frame), this should make the netif_rx path
> reliable. It seems that, on 2.4.0, something like
>
>	int netif_would_drop(dev)
>	{
>		return (queue->input_pkt_queue.qlen > netdev_max_backlog)
>		    || ((queue->input_pkt_queue.qlen) && (queue->throttle));
>	}
>
> would fulfil those requirements.

I think this would make it a little more complex than necessary; the queue state might change right after you return from netif_would_drop() -- maybe not, i am just hypothesizing.

** You can still create the netif_would_drop() -- it just sounds too expensive to me, since to be really sure no packet of yours is dropped, you have to make this call for every packet.

If you can't defer the acknowledgement until netif_rx() returns, then what we could do instead is:

1) for devices that are registered with hardware flow control == you have to register as a CONFIG_NET_HW_FLOWCONTROL device:

 a) let them queue that last packet before they are shut up; the assumption is they respect the protocol and will 'back off' after that.

 b) return BLG_CNG_WOULD_DROP instead to the device and give it the responsibility to free the skb or store it wherever it wants, but not in the backlog.

I personally prefer a). Reason: if we have done all the work so far (context switch etc.) and we know the device is well behaved (meaning it is not going to send another packet without being told things are fine), then it is probably wiser to just let that packet get on the backlog queue.

cheers,
jamal
Re: Q: sock output serialization
> LAPB itself should be able to recover from reordering, although it is
> not optimized for this. It will just discard any received out-of-sequence
> frame. The discarded frames will be retransmitted later (exactly like
> frames which had been discarded due to CRC errors).

LAPB does not expect ever to see re-ordering. It's a point to point wire level MAC protocol.

> For drivers using the software lapb module implementation, the right fix
> would obviously be to move the lapb processing above the network interface.

Agreed

> However, for drivers which support intelligent controllers (with lapb
> in firmware) this is not an option and the problem will persist.

'Smart hardware is broken' repeat .. ;) - but yes its an issue there. These cards could bypass netif_rx and call directly to the lapb top end though?
Re: Q: sock output serialization
Hi,

> "David" == David S Miller <[EMAIL PROTECTED]> writes:

David> It smells rotten to the core, can someone tell me
David> exactly why reordering is strictly disallowed? I do not
David> even know how other OSes can handle this properly since
David> most, if not all, use the IRQ dynamic cpu targeting
David> facilities of various machines so LAPB is by definition
David> broken there too.

LAPB itself should be able to recover from reordering, although it is not optimized for this. It will just discard any received out-of-sequence frame. The discarded frames will be retransmitted later (exactly like frames which had been discarded due to CRC errors).

The problem is the X.25 packet layer (layer 3). It assumes that the LAPB layer has already fixed any lost-frame and out-of-sequence problems and therefore does not provide an error recovery mechanism of its own. It will detect when frames are missing or out of sequence. But as it cannot recover from such errors, it will just initiate a reset procedure (discarding all currently queued frames, setting the state machine to a known state, and telling the network and the peer to also do so, before data transmission resumes. The upper layer is notified about the reset event; the task of recovering from the packet loss is left to the upper layer.)

David> I sense that usually, LAPB handles this issue at a
David> different level, in the hardware? Does LAPB specify how to
David> maintain reliable delivery and could we hook into this
David> "how" when we need to drop LAPB frames? Perhaps it is too
David> late by the time netif_rx is dealing with it.

The lapb protocol allows flow controlling the peer. So, if it were known in advance that netif_rx() would discard the frame, it could set its rx_busy condition. (The linux software lapb module however does not support this, but that problem is yet a different matter.) From looking at the netif_rx() source, it seems that CONFIG_NET_HW_FLOWCONTROL almost could provide the necessary state information for flow controlling the peer.

David> LAPB sounds like quite a broken protocol at the moment...
David> But I'm sure there are details which will emerge and clear
David> this all up.

Well, not just at the moment, it has always been like this. Thus, as we did not panic before, there is no reason to panic now. Actually, it's not the LAPB protocol itself that is broken, but the way of accessing it from the X.25 packet layer (a reliable datalink service is accessed via the unreliable dev_queue_xmit()/netif_rx() interface). I always wondered why it was done like this. Probably the possible problems were not realized during the early design stage and did not show up when testing. (The problems might be unlikely to occur in real-world scenarios. As real-world X.25 connections usually use only slow links (a few kByte/sec), it is very unlikely that the X.25 connection itself caused the NET_RX queue to overrun. It might only be triggered when the host is simultaneously flooded with other traffic from a local high speed lan network interface. Triggering SMP packet reordering problems with a slow X.25 link is probably even more unlikely.)

For drivers using the software lapb module implementation, the right fix would obviously be to move the lapb processing above the network interface. (We will need to provide a function call interface between the X.25 packet layer and the datalink layer anyway, once LLC.2 from the Linux-SNA project is merged and should be supported by X.25 as well.) However, for drivers which support intelligent controllers (with lapb in firmware) this is not an option and the problem will persist.

Henner
Re: Q: sock output serialization
> I sense that usually, LAPB handles this issue at a different
> level, in the hardware? Does LAPB specify how to maintain
> reliable delivery and could we hook into this "how" when we
> need to drop LAPB frames? Perhaps it is too late by the time
> netif_rx is dealing with it.

LAPB maintains a window of (normally 8) frames. When a frame is accepted it is acked (RR), or if there is no room it is rejected (RNR). Once it has been accepted with an RR, it can from that point onwards not get lost.

> LAPB sounds like quite a broken protocol at the moment... But I'm
> sure there are details which will emerge and clear this all up.

LAPB isn't broken, it's actually rather clever and ideal for tiny low power devices
Re: Q: sock output serialization
From: [EMAIL PROTECTED]
Date: Fri, 15 Sep 2000 21:07:38 +0400 (MSK DST)

   [ Dave, all this sounds bad. ]

Well, there are two things:

1) If exact sequencing is so important, then we can make a special netif_rx tasklet for these guys which serializes around a spinlock. Actually, even with this, how could we guarantee this still? Yes, IRQ affinity would need to force only a single CPU to receive interrupts from this LAPB device.

It smells rotten to the core; can someone tell me exactly why reordering is strictly disallowed? I do not even know how other OSes can handle this properly, since most, if not all, use the IRQ dynamic cpu targeting facilities of various machines, so LAPB is by definition broken there too.

2) Someone please show Alexey and myself how to process an input packet when out of memory and not drop any packets ;-)

I sense that usually, LAPB handles this issue at a different level, in the hardware? Does LAPB specify how to maintain reliable delivery and could we hook into this "how" when we need to drop LAPB frames? Perhaps it is too late by the time netif_rx is dealing with it.

LAPB sounds like quite a broken protocol at the moment... But I'm sure there are details which will emerge and clear this all up.

Later,
David S. Miller
[EMAIL PROTECTED]
Re: Q: sock output serialization
Hello!

> But I realized another X.25 related SMP problem -- this time
> related to input. The protocol design assumes that the transmission
> path preserves the packet ordering. It seems that with 2.4.0 SMP, the
> ordering of the packets when received from the wire is not necessarily
> the same as when delivered to the protocol's receive method. Is this true?

This is true.

> recover from such errors transparently. Unfortunately, the current design
> assumes that the LAPB layer is performed below the network interface.

I.e. on hard irq? It was not a good idea.

> Although this allows to support controllers which implement LAPB in firmware,

This is a really difficult case.

> this seems to break the assumptions made by upper layers. The upper layer
> assumes that LAPB devices provide a reliable datalink service. But the Linux
> network interfaces do not preserve such reliable semantics. (Network
> interfaces may drop frames, e.g. when the NET_RX input queue overruns,

No way to fix. I do not know how to make this.

> and on SMP packet sequencing might change),

Though this one can be fixed by restricting affinity, all this smells like you cannot use netif_rx() for such devices. netif_rx() is used for normal "unreliable" devices; it loses any sense as soon as we require some reliability...

   [ Dave, all this sounds bad. ]

Alexey
Re: Q: sock output serialization
Hi, > "kuznet" == kuznet <[EMAIL PROTECTED]> writes: >> when sk->lock.users!=0. Is there a particular reason why such >> task queue does not exist? kuznet> Because it appeared to be useless overhead. I also kuznet> believed that it will be required in tcp, but one day I kuznet> understood that all the problems of these kind kuznet> dissolved. 8) Yes, probably, this should also hold for other protocols. I need to study the protocol specs for further details. But I realized another X.25-related SMP problem -- this time related to input. The protocol design assumes that the transmission path preserves the packet ordering. It seems that with 2.4.0 SMP, the ordering of the packets when received from the wire is not necessarily the same as when delivered to the protocol's receive method. Is this true? LAPB should be able to recover from such sequence errors. But the X.25 packet layer can only detect such a problem (and reset the connection). It cannot recover from such errors transparently. Unfortunately, the current design assumes that the LAPB layer is performed below the network interface. Although this allows supporting controllers which implement LAPB in firmware, it seems to break the assumptions made by upper layers. The upper layer assumes that LAPB devices provide a reliable datalink service. But the Linux network interfaces do not preserve such reliable semantics. (Network interfaces may drop frames, e.g. when the NET_RX input queue overruns, and on SMP the packet sequencing might change.) Henner
Re: Q: sock output serialization
Hello! > timer events where the protocol specs require immediate reaction and > which need to change socket state. For such events, it might not > be obvious how to defer them when sk->lock.users != 0. After some thinking, you will understand that "timer" and "immediate" are incompatible. TCP just defers such events, look into tcp_timer.c. Yes, you are right: the problem exists and you can try to solve this e.g. by queueing special control events to the backlog. > when sk->lock.users!=0. Is there a particular reason why such task queue > does not exist? Because it appeared to be useless overhead. I also believed that it would be required in tcp, but one day I understood that all the problems of this kind dissolved. 8) Alexey
Re: Q: sock output serialization
Hi, > "kuznet" == kuznet <[EMAIL PROTECTED]> writes: >> Anyway, it seems that I can already make use the lock_sock() >> infrastructure for fixing the output serialization, even >> without making the whole protocol stack SMP-aware at once. kuznet> Actually, the last task is not a rocket science as well. Yes. It seems the most critical part is changing timer events to honor sk->lock.users and doing sock_hold/put(). There might be timer events where the protocol specs require immediate reaction and which need to change socket state. For such events, it might not be obvious how to defer them when sk->lock.users != 0. While deferring socket input is explicitly supported (by processing the sk->backlog queue in release_sock()), there is no special support for deferring non-input events. Maybe, in addition to processing the sk->backlog queue, release_sock() could also run a backlog task_queue? Such a task_queue could be used by other events to defer actions when sk->lock.users!=0. Is there a particular reason why such a task queue does not exist? Henner
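The "backlog task queue" idea proposed above can be sketched in userspace. Nothing here is a real 2.4 API -- all names are illustrative -- but the shape mirrors how release_sock() drains sk->backlog for input: events arriving while the socket is owned (lock.users != 0) are queued, then replayed when the last owner releases the lock:

```c
/* Userspace sketch of deferring non-input events until the socket
 * lock is released.  All identifiers are illustrative assumptions,
 * not kernel symbols. */

#define MAX_DEFERRED 16

struct mini_sock {
    int lock_users;                               /* models sk->lock.users */
    void (*deferred[MAX_DEFERRED])(struct mini_sock *);
    int ndeferred;
    int state;                                    /* some protocol state */
};

/* Run the event now if the socket is unowned, else defer it. */
void sock_event(struct mini_sock *sk, void (*ev)(struct mini_sock *))
{
    if (sk->lock_users == 0)
        ev(sk);
    else if (sk->ndeferred < MAX_DEFERRED)
        sk->deferred[sk->ndeferred++] = ev;
}

void mini_lock(struct mini_sock *sk) { sk->lock_users++; }

/* Models release_sock(): replay everything queued while locked. */
void mini_release(struct mini_sock *sk)
{
    sk->lock_users--;
    if (sk->lock_users == 0) {
        int i;
        for (i = 0; i < sk->ndeferred; i++)
            sk->deferred[i](sk);
        sk->ndeferred = 0;
    }
}

/* Example event: a timer expiry that changes socket state. */
void timer_expired(struct mini_sock *sk) { sk->state++; }
```

The real kernel would additionally need the queue manipulation itself to be protected by bh_lock_sock(), since the event may fire from softirq context.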
Re: Q: sock output serialization
Hello! > Yes, I see. I did not realize before that the lock_sock and the > sk->backlog framework are not two independent things. They really > seem to be designed for team work only. Did I get this right? Yes. Actually, in 2.4 lock_sock() is also a semaphore and in some cases (e.g. for stateless datagram sockets) it is used as a pure semaphore. > applied seems to make socket programming as easy as in the old cli()/sti() > days again. What??? 8)8) No, it makes it much easier. 8) By the way: 1. lock_sock() is not much younger than cli()/sti(). 2. cli()/sti() has not been used by attended parts of networking for a looong time; it was not deprecated just yesterday either. > tcp also seems to use some additional protocol-global spinlocks Of course. > (like tcp_portalloc_lock). And this is redundant, to be honest. 8) > the spinlock. In that case, being preemptable would make a very essential > difference. Yes, of course. > Anyway, it seems that I can already make use the lock_sock() infrastructure > for fixing the output serialization, even without making the whole > protocol stack SMP-aware at once. Actually, the last task is not a rocket science as well. Alexey
Re: Q: sock output serialization
Hi, > "kuznet" == kuznet <[EMAIL PROTECTED]> writes: kuznet> Hello! kuznet> In input path you have a packet. Add it to backlog and kuznet> processing will be resumed after lock is released. Compare kuznet> with tcp. >> serializing the kick. Well, maybe my solution could still be >> simplified (maybe some test_and_set/clear_bit() magic could >> achieve the same). kuznet> Being legal in principle, using non-standard serialization kuznet> primitives is seriously deprecated. It is impossible to kuznet> maintain. In your case, it is even not evident that it does kuznet> not lose events with smp. Yes, I see. I did not realize before that the lock_sock and the sk->backlog framework are not two independent things. They really seem to be designed for team work only. Did I get this right? And I realize that the lock_sock framework is superior to my approach. It does not only serialize output, it also serializes output against input, such that other problems are solved as well (the current code, even after serializing output, could still suffer from atomicity problems when the input path interrupts the output path and modifies protocol control block variables in a non-atomic manner). The lock_sock framework, properly applied, seems to make socket programming as easy as in the old cli()/sti() days again. Basically, it seems to be a 'disable interrupts for this socket' mechanism. >> - introduce a protocol-global spinlock and protect >> protocol-global critical section by spin_lock_bh() instead of >> cli() kuznet> Why? It is not required. There are no reasons to protect kuznet> protocol as whole, if sockets are protected. Well, the term 'protocol global' was misleading. I should have said 'global to the protocol family'. E.g. there are currently some cli()/sti() pairs to protect socket list and routing table manipulations. tcp also seems to use some additional protocol-global spinlocks (like tcp_portalloc_lock). >> Can NET_TX_SOFTIRQ be preempted by NET_RX_SOFTIRQ or timer? kuznet> It cannot be preempted, but it is not very essential, kuznet> because all they can run in parallel on different cpus. The reason why I was asking is that I recently got IP and PPP tunneling over X.25 working (in-kernel). In that case, protocol output processing would be done from NET_TX_SOFTIRQ context, which is only allowed to bh_lock_sock(), but not lock_sock(). As bh_lock_sock() is just spin_lock() -- and not spin_lock_bh() -- this could stall the CPU if NET_TX_SOFTIRQ were preempted by a timer or NET_RX_SOFTIRQ while holding the spinlock. In that case, being preemptable would make a very essential difference. kuznet> Alexey Thanks for the insight. I hope this is sufficient to migrate the code. Not before 2.4.0-final, however :-). Anyway, it seems that I can already make use of the lock_sock() infrastructure for fixing the output serialization, even without making the whole protocol stack SMP-aware at once. Henner
Re: Q: sock output serialization
Hello! > I guess I'd also need to call lock_sock() from sendmsg(). And before > calling x25_kick from socket input path, I'd need to verify that > sk->lock.users is zero. If sk->lock.users was !=0, I'd need some atomic > variable anyway in order to defer the kick. In input path you have a packet. Add it to backlog and processing will be resumed after lock is released. Compare with tcp. > serializing the kick. Well, maybe my solution could still be simplified > (maybe some test_and_set/clear_bit() magic could achieve the same). Being legal in principle, using non-standard serialization primitives is seriously deprecated. It is impossible to maintain. In your case, it is even not evident that it does not lose events with smp. > - introduce a protocol-global spinlock and protect protocol-global > critical section by spin_lock_bh() instead of cli() Why? It is not required. There are no reasons to protect protocol as whole, if sockets are protected. > - protect all sock proto_ops methods by lock_sock() Yes. > - when bh functions need to be protected from sk state change, they > need to acquire bh_lock_sock() And check for sk->lock.users. If it is not zero, the operation is deferred. > - before bh (timer) functions change sk state, they need to acquire > bh_lock_sock and verify that sk->lock.users!=0 Yes. > - remove the SOCKOPS_WRAPPED() macro from the proto_ops Yes. And finally announce the protocol to be SMP-aware by setting the data field of packet_type to 1. > Can NET_TX_SOFTIRQ be preempted by NET_RX_SOFTIRQ or timer? It cannot be preempted, but it is not very essential, because all they can run in parallel on different cpus. Alexey
Q: sock output serialization
Hi, Is the following fix clean or are there better solutions? There is a race condition in the Linux X.25 protocol stack. The stack has an x25_kick() function which dequeues as many skbs from sk->write_queue as the send window allows and sends them downwards. This kick function is called from send_msg() as well as (when an acknowledge arrives) from the input path of the socket code. The latter is usually called from NET_RX_SOFTIRQ and might therefore interrupt an x25_kick() executed on behalf of send_msg(). (This is a problem because it could mess up packet order, which needs to be preserved with X.25.) The fix I came up with consists of replacing the current x25_kick() by an inlined __x25_kick() and defining a new x25_kick() which wraps the old function as follows:

	atomic_inc(&sk->protinfo.x25->kick_it);
	if (atomic_read(&sk->protinfo.x25->kick_it) != 1)
		return;
	do {
		__x25_kick(sk);
	} while (!atomic_dec_and_test(&sk->protinfo.x25->kick_it));

This makes __x25_kick() single-threaded per socket; the first thread in __x25_kick() will also perform the work for possible other threads which have tried to interrupt the first thread. Is this a proper approach or are there better solutions (e.g. more SMP friendly, less overhead on certain hardware archs)? What about 2.2.x? This should also work for 2.2.x, but for 2.2.x I could also wrap __x25_kick() inside {start,stop}_bh_atomic(), I guess. Henner
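The wrapper above can be modelled in userspace with C11 atomics to show why it is single-threaded per socket: only the caller that raises the counter from zero runs __x25_kick(), and it keeps looping to cover callers that arrived in the meantime. This is an illustrative single-process model (the pending_softirq flag fakes an interrupting input path), not the actual kernel patch:

```c
/* Userspace model of the proposed x25_kick() wrapper using C11 atomics
 * in place of the kernel's atomic_t.  The counter guarantees that only
 * the first caller performs the work, and that it repeats the work once
 * for the callers it absorbed. */
#include <stdatomic.h>

static atomic_int kick_it;
static int kick_runs;        /* counts __x25_kick() invocations */
static int pending_softirq;  /* models a softirq arriving mid-kick */

static void x25_kick(void);  /* forward decl for the reentrant call */

static void __x25_kick(void)
{
    kick_runs++;
    if (pending_softirq) {   /* simulate the input path interrupting us */
        pending_softirq = 0;
        x25_kick();          /* nested call only bumps the counter */
    }
}

static void x25_kick(void)
{
    /* atomic_inc + read != 1 in the original: the previous value tells
     * us whether another thread is already inside the kick loop. */
    if (atomic_fetch_add(&kick_it, 1) != 0)
        return;              /* someone else will do the work for us */
    do {
        __x25_kick();
        /* atomic_dec_and_test in the original: stop once we have
         * covered every caller that incremented the counter. */
    } while (atomic_fetch_sub(&kick_it, 1) != 1);
}
```

In the simulated run, the kick that arrives while the first one is in progress causes exactly one extra pass through __x25_kick(), and the counter returns to zero afterwards.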