Re: [PATCH] PPPOE can kfree SKB twice (was Re: kernel panic problem. (smp, iptables?))
Hello!

> However, could we not have dev_queue_xmit behave as such (not free
> frame on failure)?

If you need to hold the original skb, you may hold its refcnt.

However, this feature inevitably results in big trouble: dev_queue_xmit() is allowed to change the skb, and you cannot assume anything about it. So reusing an skb after dev_queue_xmit() is a fatal bug, regardless of anything else.

> The reason why I'm proposing this is that ppp_generic.c assumes that
> the skb is still available after a transmission failure via pppoe. To
> support the semantics of dev_queue_xmit and ppp_generic we would have
> to always copy skb's inside pppoe_xmit. Then, if dev_queue_xmit fails
> the original is deleted.

You need not copy it. I said "clone". Nobody is allowed to touch the _data_ part of an skb, so you need not copy the data.

Alexey
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
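The difference between copying and cloning can be illustrated with a small user-space sketch in C. This is not kernel code: `toy_skb`, `toy_data`, `toy_clone`, `toy_free` and `toy_xmit` are hypothetical names, and the model only mimics the idea that a clone gets its own header while sharing a reference-counted, read-only data area with the original.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy model, not the kernel's struct sk_buff. */
struct toy_data {
    int refcnt;               /* number of headers sharing this payload */
    size_t len;
    unsigned char bytes[64];
};

struct toy_skb {
    struct toy_data *data;    /* shared payload, never modified */
    int on_list;              /* private per-header state */
};

struct toy_skb *toy_alloc(const void *payload, size_t len)
{
    struct toy_skb *skb;
    struct toy_data *d;

    assert(len <= sizeof(d->bytes));
    skb = malloc(sizeof(*skb));
    d = malloc(sizeof(*d));
    if (!skb || !d) {
        free(skb);
        free(d);
        return NULL;
    }
    d->refcnt = 1;
    d->len = len;
    memcpy(d->bytes, payload, len);
    skb->data = d;
    skb->on_list = 0;
    return skb;
}

/* New header, same payload: no data bytes are copied. */
struct toy_skb *toy_clone(const struct toy_skb *orig)
{
    struct toy_skb *c = malloc(sizeof(*c));

    if (!c)
        return NULL;
    c->data = orig->data;
    c->data->refcnt++;
    c->on_list = 0;
    return c;
}

/* Drop one header; the payload goes away with its last user. */
void toy_free(struct toy_skb *skb)
{
    if (--skb->data->refcnt == 0)
        free(skb->data);
    free(skb);
}

/* Models a transmit path that consumes its argument whether or not
 * it succeeds; the return value says nothing about ownership. */
int toy_xmit(struct toy_skb *skb, int fail)
{
    toy_free(skb);
    return fail ? -1 : 0;
}
```

In this model the caller keeps the original on its own list and submits only the clone, so a failed transmit leaves the original header and its data intact, and nothing is freed twice.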
Re: [PATCH] PPPOE can kfree SKB twice (was Re: kernel panic problem. (smp, iptables?))
Hello!

Some short comments on the patch:

> - dev_queue_xmit(skb);
> + /* The skb we are to transmit may be a copy (see above). If
> +  * this fails, then the caller is responsible for the original
> +  * skb, otherwise we must free it. Also if this fails we must
> +  * free the copy that we made.
> +  */
> +
> + if (dev_queue_xmit(skb)<0) {

dev_queue_xmit _frees_ the frame regardless of the return value. The return value is not a criterion for assuming anything.

> + if (old_skb) {
> + /* The skb we tried to send was a copy. We
> +  * have to free it (the copy) and let the
> +  * caller deal with the original one.
> +  */
> + skb_unlink(skb);

So, do you pass to dev_queue_xmit an skb which is on some list? Not a good idea. Please clone it and submit the clone if you need to hold the original on some list.

Alexey
Re: softirq in pre3 and all linux ports
Hello!

> Soft irqs should definitely not be much heavier than an irq handler,
> if they are then we have implemented them wrongly somehow.

For example, all the networking nicely fits into this class. :-)

Alexey
Re: IPv6: the same address can be added multiple times
Hello!

> 2) no significant restrictions (==this)

When a user asks to create some object, the only thing required of any reasonable interface is to return an error when the object is not added. KAME's is broken; ours is _one_ of the right ones.

Another example of a bad mistake is mine: I made some crap with creating tunnels. Adding a tunnel does not fail when such a tunnel already exists, so the user has no idea whether he created the tunnel (and should delete it) or someone else did this work.

Note that if we were able to create _duplicate_ tunnels on each new request (like IPv6 addresses), this would also be a right approach.

Alexey
Re: NETDEV_CHANGE events when __LINK_STATE_NOCARRIER is modified
Hello!

> Jeff has introduced `alloc_etherdev()' which allocates storage
> for a netdev but doesn't register it. The one quirk with this
> approach (and why it's vastly simpler than my thing)

I do not see where it is simpler. The only difference is that the name is unknown. 8)

> Not many drivers have been converted to the new interface yet.

Paaardon! It is the only place where it makes sense to say "simpler", and it was the sense of your patch! Of course, it is much "simpler" to leave all the devices in a buggy state, no doubt. 8)

As for dev_probe_lock, I again do not understand why it is not deleted. Please, shed some light.

> is a bit foggy. ISTR that the init() method was inherently
> immune to this race.

8) Imagine, I believed for years that all the devices use this method. The discovery that init_etherdev does some shit was a real catharsis. 8)

Alexey
Re: skb->truesize > sk->rcvbuf == Dropped packets
Hello!

> Hmmm... I don't see how not touching buffer values can solve his
> problem at all. His MTU is really HUGE, and in this case 300 byte
> packet eats 10k or so space in receive buffer.

Default rcvbuf is ~64K, which is enough to receive with an mtu of up to a bit less than 64K. When an application says rcvbuf=2K, it apparently does not expect to hold more than one packet.

> I doubt our buffer size tuning algorithms can cope with this.

At least TCP will tune itself nicely under the most extreme conditions. As for this case, there is no chance to tune. rcvbuf just should be large.

> Really, copy threshold in driver just must be choosen carefully.

rx copybreak has no chance of working in the case of large mtus... E.g. it does not with gigabit cards, which use 9K mtu but need to talk to the 1.5K world. Blind copybreak at 1.5K is nonsense, meaning "copy all".

Though acenic with its three rx rings is a lucky exception. 8)

Alexey
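The accounting problem in this message can be sketched in a few lines of user-space C. The sketch assumes, as the thread's subject suggests, that each queued packet is charged its full buffer size ("truesize") rather than its payload length, and that a packet is dropped once the charge would reach rcvbuf. The names `toy_rcvq`, `toy_queue_rcv` and `toy_fill` are hypothetical, and the numbers in the usage note are only the rough figures quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy receive queue: each packet is charged its buffer size
 * ("truesize"), not its payload length. Hypothetical names. */
struct toy_rcvq {
    size_t rcvbuf;      /* limit on charged memory */
    size_t rmem_alloc;  /* memory currently charged */
    size_t payload;     /* useful bytes actually queued */
};

/* Returns 1 if the packet was queued, 0 if it was dropped. */
int toy_queue_rcv(struct toy_rcvq *q, size_t truesize, size_t len)
{
    assert(len <= truesize);
    if (q->rmem_alloc + truesize >= q->rcvbuf)
        return 0;
    q->rmem_alloc += truesize;
    q->payload += len;
    return 1;
}

/* Count how many identical packets fit before the first drop. */
size_t toy_fill(struct toy_rcvq *q, size_t truesize, size_t len)
{
    size_t n = 0;

    while (toy_queue_rcv(q, truesize, len))
        n++;
    return n;
}
```

With a 65536-byte rcvbuf and 300-byte packets that each occupy 10240 bytes, this model queues six packets (1800 payload bytes) before dropping. With rcvbuf set to 2048 it queues none, which is why a tiny rcvbuf cannot work with huge receive buffers.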
Re: skb->truesize > sk->rcvbuf == Dropped packets
Hello!

> > Any suggestions on heuristics for this ?

Not to set rcvbuf to ridiculously low values.

The best variant is not to touch SO_*BUF options at all.

Alexey
Re: NETDEV_CHANGE events when __LINK_STATE_NOCARRIER is modified
Hello!

> I believe these events get sent to the cardmgr daemon and it does
> all the ifconf magic to change the device state.

Compare this also to the situation with netif_present(). After Linus said that it is called from thread context, I prepared the corresponding code for netif_present (and for carrier detection, on the assumption that it is called from thread context or softirq) BUT... this turned out not to be true. So these macros still do not assume anything about context. As a result netif_carrier* is unreliable, and netif_present is still a straight bug. Should be fixed, of course.

BTW what happened to Andrew's netdev registration patch? For some strange reason I believed it was already applied... Grrr.

Alexey
Re: IPv6: the same address can be added multiple times
Hello!

> It appears you can add _exactly_ same IPv6 address on an interface many
> times:

Yes. BTW, look here:

kuznet@dust:~ # ip -6 a ls sit0
7: sit0@NONE: <NOARP,UP> mtu 1480 qdisc noqueue
    inet6 ::127.0.0.1/96 scope host
    inet6 ::193.233.7.100/96 scope global
    inet6 ::193.233.7.100/96 scope global

I have two equal addresses inherited from one IPv4 address on two interfaces. Nothing illegal.

> FWIW, KAME stack adds the address only once(, but BSD ifconfig(8)
> doesn't show errors when you try to do it again; just doesn't add the
> second one).

8)

> It looks like a check or two in kernel is missing, or is there some
> reasoning to this behaviour?

Well, it is one of the well-defined approaches (unlike KAME's). The alternative is to implement the full set of NLM_F_* options, as used in IPv4 routing, to block undefined cases. In IPv6 the flags are hardwired to NLM_F_CREATE|NLM_F_APPEND both for addresses and routes.

Alexey
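A toy model may make the flag semantics easier to follow. The sketch below is a deliberate simplification and an assumption, not the kernel's routing or address code: `toy_table`, `toy_find` and `toy_add` are hypothetical, only three of the NLM_F_* request flags are modelled (with their numeric values copied locally so the sketch builds without kernel headers), and the policy is reduced to one rule: a request either adds an entry or returns an error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Local copies of three netlink request-flag values. */
enum {
    TOY_F_EXCL   = 0x200,   /* fail if the entry already exists */
    TOY_F_CREATE = 0x400,   /* create the entry if it does not exist */
    TOY_F_APPEND = 0x800    /* add to the end, duplicates allowed */
};

#define TOY_MAX 8

struct toy_table {
    unsigned int addr[TOY_MAX];
    size_t n;
};

static int toy_find(const struct toy_table *t, unsigned int a)
{
    size_t i;

    for (i = 0; i < t->n; i++)
        if (t->addr[i] == a)
            return 1;
    return 0;
}

/* Simplified policy (an assumption of this sketch): success always
 * means that a new entry was added by this request. */
int toy_add(struct toy_table *t, unsigned int a, int flags)
{
    if (toy_find(t, a)) {
        if ((flags & TOY_F_EXCL) || !(flags & TOY_F_APPEND))
            return -EEXIST;
    } else if (!(flags & TOY_F_CREATE)) {
        return -ENOENT;
    }
    if (t->n == TOY_MAX)
        return -ENOSPC;
    t->addr[t->n++] = a;
    return 0;
}
```

With the flags fixed to CREATE|APPEND, adding the same value twice succeeds twice and leaves two entries, which corresponds to the duplicate addresses in the listing. With CREATE|EXCL the second request fails with -EEXIST instead. In both cases the caller can tell from the return value whether it owns a new entry.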
Re: 2.4.4: Kernel crash, possibly tcp related
Hello!

> If send_head doesn't point to skb then it is before it (and it cannot
> advance under us of course because we hold the sock lock) and so in such
> case we didn't clobbered the send_head at all in skb_entail, and so we
> don't need to touch send_head in order to undo (we only need to unlink).
>
> See?

I see!

Dave, please, take Andrea's second patch (appended). It is really the cleanest one.

Alexey

--- 2.4.4aa3/net/ipv4/tcp.c.~1~	Tue May  1 10:44:57 2001
+++ 2.4.4aa3/net/ipv4/tcp.c	Tue May  1 12:00:25 2001
@@ -1183,11 +1183,8 @@
 do_fault:
 	if (skb->len==0) {
-		if (tp->send_head == skb) {
-			tp->send_head = skb->next;
-			if (tp->send_head == (struct sk_buff*)&sk->write_queue)
-				tp->send_head = NULL;
-		}
+		if (tp->send_head == skb)
+			tp->send_head = NULL;
 		__skb_unlink(skb, skb->list);
 		tcp_free_skb(sk, skb);
 	}
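The undo step in this patch can be modelled in user space. The sketch below is a toy, not the kernel's queue code: `seg`, `txq`, `entail`, `undo_entail` and `invariant_holds` are hypothetical names, and it assumes (following the quoted reasoning) that enqueueing sets the send pointer only when nothing unsent was pending. The invariant checked is the one stated in this thread: send_head is NULL, or its seq equals snd_nxt.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a TCP write queue; all names are hypothetical. */
struct seg {
    struct seg *prev, *next;
    unsigned int seq;       /* first sequence number of the segment */
    unsigned int len;       /* payload bytes queued so far */
};

struct txq {
    struct seg *head, *tail;
    struct seg *send_head;  /* first unsent segment, NULL if none */
    unsigned int snd_nxt;   /* next sequence number to send */
};

/* Append a segment. Assumption: send_head is set only when nothing
 * unsent was pending before. */
void entail(struct txq *q, struct seg *s)
{
    s->next = NULL;
    s->prev = q->tail;
    if (q->tail)
        q->tail->next = s;
    else
        q->head = s;
    q->tail = s;
    if (q->send_head == NULL)
        q->send_head = s;
}

/* Undo entail() for a tail segment that stayed empty. If send_head
 * points at it, send_head was NULL before, so NULL is restored;
 * otherwise send_head is left alone and the segment is just unlinked. */
void undo_entail(struct txq *q, struct seg *s)
{
    assert(s == q->tail && s->len == 0);
    if (q->send_head == s)
        q->send_head = NULL;
    q->tail = s->prev;
    if (s->prev)
        s->prev->next = NULL;
    else
        q->head = NULL;
    s->prev = s->next = NULL;
}

/* The invariant quoted in the thread. */
int invariant_holds(const struct txq *q)
{
    return q->send_head == NULL || q->send_head->seq == q->snd_nxt;
}
```

The model covers both cases discussed: when everything queued has been sent, the empty segment becomes send_head and undo resets it to NULL; when unsent data is pending, send_head never moves and undo only unlinks.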
Re: 2.4.4: Kernel crash, possibly tcp related
Hello!

> zero and we are running in such slow path, it is obvious the send_head
> _was_ NULL when we entered the critical section, so it's perfectly fine

It is not only not obvious, it is almost always not true. On a normally working tcp, send_head is almost never NULL; it is NULL only when the application is so slow that it is not able to saturate the pipe.

If you do not believe my word, add a printk checking this. 8)

Alexey
Re: 2.4.4: Kernel crash, possibly tcp related
Hello!

> this is the strict fix:

Andrea, you caught the problem!

The fix is not right though (it is equivalent to a straight tp->send_head=NULL, as you noticed. It also corrupts the queue in the opposite manner.) The right fix is appended.

Explanation: in do_fault we must undo the effect of enqueueing a new segment in the case the segment remained empty. tp->send_head points to the first unsent skb in the queue, and it is NULL when and only when all the skbs are already sent. (Invariant is: tp->send_head==NULL || tp->send_head->seq == tp->snd_nxt.) I crapped this case except for the case when the queue is completely empty, so that the last sent skb was accounted in packets_out twice...

Damn, what a silly mistake it was... shame.

Alexey

--- ../vger3-010426/linux/net/ipv4/tcp.c	Wed Apr 25 21:02:18 2001
+++ linux/net/ipv4/tcp.c	Tue May  1 20:38:44 2001
@@ -1185,7 +1187,7 @@
 	if (skb->len==0) {
 		if (tp->send_head == skb) {
 			tp->send_head = skb->prev;
-			if (tp->send_head == (struct sk_buff*)&sk->write_queue)
+			if (TCP_SKB_CB(skb)->seq == tp->snd_nxt)
 				tp->send_head = NULL;
 		}
 		__skb_unlink(skb, skb->list);
Re: 2.4.4: Kernel crash, possibly tcp related
Hello!

> My current theory is that tcpblast does something erratic when the
> error occurs.

It has a buffer size of 32K, so it faults at large enough chunk sizes. The erratic errno is because this applet prints errno on partial write.

The oops is apparently because I still did something wrong in do_fault. It seems you were right when you said that this place looks dubious. 8)

Alexey
Re: Bug report: tcp staled when send-q != 0, timers == 0.
Hello!

> Im my case P-MTU discovery

Sorry, I lied. Not pmtu discovery but exactly the opposite effect is important here: collapsing of small frames into larger ones. Each such merge results in loss of 1 "sack" in 2.2.

> I only wrote that it was active when got stuck. It may be idle before -
> I do not remember, but have a habit to keep connections for weeks. :)

Good. 8)

> As my experiments show, any connection, entering keepalive once,
> have lose its ability to send zero probes - forever.

Exactly.

> OK. Let us return to the "mss/mtu bug". The most mystifying thing for
> me is the dependance of the MTU threshold on the kernel version, etc.

Well, you can reinvestigate this to get more reliable results... Actually, this problem is so difficult that the study would be purely academic; there is no hope of fixing it in 2.2. It was partially repaired during 2.3 and completely resolved only in 2.4.4.

> But the question is what the minimum "reliable" MTU. There are lots of
> situations when data comes rapidly in small packets (say, monitoring logs).
> Is there a danger to lose such connections on a heavily loaded host?

There is no real danger. Bad things can happen only when the receiver does not read data for a very long time; in this case the connection times out without receiving any acks.

As for a minimum/maximum mtu... it does not exist. E.g. if the sender floods 1 byte frames in TCP_NODELAY mode and the receiver does not read them, 2.2 will fail regardless of mtu. See? Even the 40 bytes of IP+TCP headers (not counting additional overhead) guarantee that memory will be exhausted an order of magnitude earlier than the receiver can close the window.

Alexey
Re: CONFIG_PACKET_MMAP help
Hello!

> 1. for tp_frame_size, I dont want to truncate any data on ethernet, I
> need 1514 bytes, is this the best way to do it and not waste space?

Select a small snapsize (obtained from later experiments), and set PACKET_COPY_THRESH to read larger packets via recvmsg().

> 2. what is tp_block_nr for? I dont understand it, I just set it to 1
> and make tp_block_size big enough for all the frames I need, so its
> just one contiguous space, all I need is about a megabyte I think.

The kernel has problems allocating large chunks of memory. If you see problems allocating large chunks, split them into smaller ones.

> while(1) {
>    if (tp->status == 0) poll() for pollin on the socket /* is there a
>    race here? */

No. poll returns when a new frame appears.

> 4. what does the copy threshold setsockopt tuning accomplish? doesnt it always
> have to copy anyway, to the mmaped area?

See the answer to question 1. It makes sense when the size of a chunk is small enough. Small packets are copied to the ring; large ones (which are truncated) are queued to the socket to be received via recvmsg().

Alexey
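The advice on tp_block_nr (several smaller blocks instead of one large one) comes down to a little arithmetic, sketched here in C. This is an illustration only: `ring_req` is a local struct that mirrors the four fields of struct tpacket_req so the sketch builds anywhere, `ring_plan` and `round_up` are hypothetical helpers, the 16-byte slot alignment is an assumption, and the caller is assumed to pass a block size that is a multiple of the page size.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the four fields of struct tpacket_req (local copy). */
struct ring_req {
    unsigned int block_size;  /* bytes per block */
    unsigned int block_nr;    /* number of blocks */
    unsigned int frame_size;  /* bytes per frame slot */
    unsigned int frame_nr;    /* total frame slots */
};

/* Round x up to a multiple of a (a must be a power of two). */
static unsigned int round_up(unsigned int x, unsigned int a)
{
    return (x + a - 1) & ~(a - 1);
}

/* Split roughly total_bytes of ring into blocks of block_size, each
 * holding a whole number of frame slots. Returns 0 on success, -1 if
 * a block cannot hold even one frame. */
int ring_plan(struct ring_req *r, unsigned int total_bytes,
              unsigned int frame_size, unsigned int block_size)
{
    unsigned int frames_per_block;

    assert(r != NULL);
    frame_size = round_up(frame_size, 16);  /* assumed slot alignment */
    if (frame_size == 0 || block_size < frame_size)
        return -1;
    frames_per_block = block_size / frame_size;
    r->block_size = block_size;
    r->frame_size = frame_size;
    r->block_nr = (total_bytes + block_size - 1) / block_size;
    r->frame_nr = frames_per_block * r->block_nr;
    return 0;
}
```

For about a megabyte of ring with 2048-byte slots, `ring_plan(&r, 1u << 20, 2048, 16384)` yields 64 blocks of 16384 bytes and 512 frame slots, so the kernel only has to find 16K chunks rather than one contiguous megabyte. The resulting values would then be copied into a struct tpacket_req and passed to setsockopt() with PACKET_RX_RING.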
Re: Bug report: tcp staled when send-q != 0, timers == 0.
Hello!

>mtu 382 + keepalive yes -> loss
>mtu 382 + keepalive no -> ok

Well, I ignored this because it looked as full sense. Sorry. 8)

> such a picture? If the answer is "yes", I am almost satisfied. :-)

No, the answer is a strict "no". Until keepalive is triggered the first time, it cannot affect the connection in _any_ way.

... sorry, I have to run. Let's defer the further investigation.

Alexey
Re: Bug report: tcp staled when send-q != 0, timers == 0.
Hello!

> If your model does not cover such situation, pls, take it in mind. :)

Taken. Is the machine UP? The only other known dubious place is smp specific...

BTW if that cursed socket is still alive, try the experiment of filling the window on it. It must get stuck, or my theory is completely wrong.

Alexey
Re: Bug report: tcp staled when send-q != 0, timers == 0.
Hello!

> In my experiments linux simply sets mss=mtu-40 at the start of ethernet
> connections. I do not know why, but belive it's ok. How the version of
> kernel and configuration options can affect mss later?

You can figure this out yourself. In fact you measured this. With mss=1460 the problem does not exist.

The problem begins, e.g., when mss is smaller and a packet arrives on ethernet. It eats the same 1.5k of memory, but carries only ~mss bytes of tcp payload. See? We do not know this in advance, advertise a large window, do not have enough rcvbuf to get it filled, and cannot do anything but drop new packets.

ppp is more difficult. Actually, I do not know exactly how it works now. At least, ppp in 2.4 trims the skb if it has too much unused space.

Alexey
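The memory argument above can be put into numbers. The sketch below is only a worked example of that arithmetic, not kernel code: the function names are hypothetical, and the per-packet charge of 1536 bytes, the 64 KB rcvbuf and the 32 KB window used in the usage note are illustrative assumptions. Only the mss values (1460, and 342 for mtu 382) come from the thread.

```c
#include <assert.h>

/*
 * Payload bytes that fit into rcvbuf when every received packet is charged
 * charge_per_pkt bytes of buffer memory but carries only mss bytes of data.
 */
static unsigned long payload_capacity(unsigned long rcvbuf,
                                      unsigned long charge_per_pkt,
                                      unsigned long mss)
{
    assert(charge_per_pkt > 0);
    return rcvbuf / charge_per_pkt * mss;
}

/* Non-zero if an advertised window can actually be filled before rcvbuf runs out. */
static int window_can_fill(unsigned long window, unsigned long rcvbuf,
                           unsigned long charge_per_pkt, unsigned long mss)
{
    return payload_capacity(rcvbuf, charge_per_pkt, mss) >= window;
}
```

With a 65536-byte rcvbuf and 1536 bytes charged per packet, 42 packets fit. At mss=1460 that is 61320 bytes of payload, enough for a 32768-byte window; at mss=342 it is only 14364 bytes, so the advertised window can never be filled and new packets have to be dropped.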
Re: Bug report: tcp staled when send-q != 0, timers == 0.
Hello!

> At last, I tried several MTUs on 3d computer, running "right" 2.2.17, and
> could not find conditions, under which any loss of ACKs can be detected.

8)8)8) ppp is also prone to the mss/mtu bug: it allocates too large buffers and never breaks them.

The difference between kernels looks funny, but I think it is explained by the differences between mss/mtu's.

Alexey

[ I will be absent from tomorrow for some time. ]
Re: Bug report: tcp staled when send-q != 0, timers == 0.
Hello!

> > If my guess is right, you can easily put this socket to funny state
> > just catting a large file and kill -STOP'ing ssh. ssh will close window,
> > but sshd will not send zero probes.
>
> [1] I have checked your statement on 2 different machines, running 2.2.17.
> No confirmation. But this is much more funny than it simply sounds. :)

_That_ socket which was stuck must show this behaviour. To get this on a new socket you should leave the session idle for >2 hours until the first keepalive. After this it will never probe under any circumstances.

The bug was that keepalive corrupts the state of the timer, and the probe0 timer is not started after this.

> buffer is filled, and client IS NOT stopped! :))) Hence connection dies
> due to retransmission timeout on the server side.

It is a known linuxism. If the ratio connection_mss/link_mtu is less than ~1/4, or the connection is flooded with tiny packets, then after rcvbuf is full linux enters memory paranoia mode, pretending that all the packets are lost. Ugly, unpleasant, but luckily harmless under any normal circumstances. One workaround is to set rx_copybreak on ethernet drivers to 400-500.

The bug is really difficult. It is not cured even in current 2.4 (only with the zerocopy patch).

> I do not understand how connection with closed window can wait until
> first keepalive - it must do zero probes instead.

If a socket has ever sent a keepalive, it will not be able to send zero window probes after this.

> Hmm... I observed this bug on the host, which never performs more
> than 10 conn/sec and has peak loadvg ~ 0.15.

8)8)8) Probability is probability.

Alexey
Re: Bug report: tcp staled when send-q != 0, timers == 0.
Hello!

> In brief: a stale state of the tcp send queue was observed for 2.2.17
> while send-q counter and connection window sizes are not zero:

I think I pinned this down. The patch is appended.

> diagnostic, I'll try to get it. In any case, I plan to run something through
> this connection in hope to reproduce this state again.

If my guess is right, you can easily put this socket into a funny state just by catting a large file and kill -STOP'ing ssh. ssh will close the window, but sshd will not send zero probes.

Any socket with keepalives enabled enters this state after the first keepalive is sent. [ Note that it is not Butenko's problem, it is still to be discovered. 8) ]

I think you will not be able to reproduce the full problem: the socket will revive after the first received ACK. It is another bug and its probability is astronomically low.

Alexey

--- linux/net/ipv4/tcp_input.c.orig	Mon Apr  9 22:46:56 2001
+++ linux/net/ipv4/tcp_input.c	Tue Apr 10 21:23:33 2001
@@ -733,8 +733,6 @@
 		if (tp->retransmits) {
 			if (tp->packets_out == 0) {
 				tp->retransmits = 0;
-				tp->fackets_out = 0;
-				tp->retrans_out = 0;
 				tp->backoff = 0;
 				tcp_set_rto(tp);
 			} else {
@@ -781,8 +779,10 @@
 	if(sk->zapped)
 		return(1);	/* Dead, can't ack any more so why bother */
 
-	if (tp->pending == TIME_KEEPOPEN)
+	if (tp->pending == TIME_KEEPOPEN) {
 		tp->probes_out = 0;
+		tp->pending = 0;
+	}
 
 	tp->rcv_tstamp = tcp_time_stamp;
 
@@ -850,8 +850,6 @@
 		if (tp->retransmits) {
 			if (tp->packets_out == 0) {
 				tp->retransmits = 0;
-				tp->fackets_out = 0;
-				tp->retrans_out = 0;
 			}
 		} else {
 			/* We don't have a timestamp. Can only use
@@ -878,6 +876,8 @@
 		tcp_ack_packets_out(sk, tp);
 	} else {
 		tcp_clear_xmit_timer(sk, TIME_RETRANS);
+		tp->fackets_out = 0;
+		tp->retrans_out = 0;
 	}
 
 	flag &= (FLAG_DATA | FLAG_WIN_UPDATE);
--- linux/net/ipv4/tcp_output.c.orig	Mon Apr  9 22:47:06 2001
+++ linux/net/ipv4/tcp_output.c	Tue Apr 10 21:23:33 2001
@@ -546,6 +546,8 @@
 			 */
 			kfree_skb(next_skb);
 			sk->tp_pinfo.af_tcp.packets_out--;
+			if (sk->tp_pinfo.af_tcp.fackets_out)
+				sk->tp_pinfo.af_tcp.fackets_out--;
 		}
 	}
Re: [PATCH] Re: softirq buggy
Hello!

> Btw, you don't schedule the ksoftirqd thread if do_softirq() returns
> from the 'if(in_interrupt())' check.

ksoftirqd will not be switched to before the first schedule or return from syscall, when softirqs will be processed in any case. So a wakeup in this case would be a mistake.

> I assume that this is the most common case of delayed softirq
> processing:

softirqs have the same latency warranty as rt threads, so this is not a problem at all.

The _real_ problem is softirqs generated from other softirqs: the additional thread is made _not_ to speed up softirqs, but to _tame_ them (if I understood Andres's explanations correctly).

Alexey
Re: softirq buggy [Re: Serial port latency]
Hello!

> But with a huge overhead. I'd prefer to call it directly from within the
> idle functions, the overhead of schedule is IMHO too high.

+	if (current->need_resched) {
+		return 0;
+	}
+	if (softirq_active(smp_processor_id()) & softirq_mask(smp_processor_id())) {
+		do_softirq();
+		return 0;
		       ^

You return one value in both cases, and I decided it means "schedule". 8) Apparently you meant return 1 in the first case. 8)

But in this case it becomes wrong. do_softirq() can raise need_resched, and moreover irqs arrive during it. The order of the checks should be different.

BTW, what about the overhead... I suspect it is _lower_ in the case of schedule(). In the case of networking at least, where a softirq most likely wakes some socket.

Alexey
Re: TCP stack misbehaviour?
Hello!

> empty, except for occasional ACKs. The utilization of the channel is about 4%.

1. tcpdump is required.
2. The exact version of the kernel used is required too.

Alexey
Re: new queuing discipline
Hello!

> packet in the queue. No other conditions i found. But i need repeatedly test
> the top packet in the queue.
>
> How to accomplish it?

Look into sch_tbf.c for example. Hint: timer.

Alexey
Re: softirq buggy [Re: Serial port latency]
Hello!

> +	if (softirq_active(smp_processor_id()) & softirq_mask(smp_processor_id())) {
> +		do_softirq();
> +		return 0;

BTW you may delete do_softirq()... schedule() will call this.

> + *
> + * Isn't this identical to default_idle with the 'no-hlt' boot
> + * option? <[EMAIL PROTECTED]>

Seems it is not. need_resched=-1 avoids useless IPIs.

Alexey
Re: udp <-> tcp connect
Hello!

> I want to bind to non-local IP and send/receive UDP packets.

This is impossible, apparently.

> but in tcp_v4_connect:
> tmp = ip_route_connect(&rt, nexthop, sk->saddr,
> RT_TOS(sk->ip_tos)|RTO_CONN|sk->localroute, sk->bound_dev_if);
> ^^^

And this is a __terrible__ bug. RTO_CONN cannot be set here.

Alexey
Re: IP layer bug?
Hello!

> Hm. But comment in linux/skbuff.h says:

The comment is about a more difficult case: the transmit path, where cb is used both by the top level protocol and by lower layers, e.g. TCP -> IP -> device. cb is dirty from the moment of skb creation in this case.

Also, note that the second sentence in the comment is obsolete. Passing non-cloned skbs between layers is a strongly deprecated practice (I hope it is not used anywhere), and the cb of an skb entering a lower layer is the property of that layer.

The RX path is simpler: cb must be kept clean, that's all. The general rule is to minimize redundant clearing of the area.

> Why not document it somewhere, so that others will not fall into the same trap?

Indeed. 8) You got the experience, which you expect to be useful for people; it is time to prepare some note recording this. 8)

Alexey
Re: IP layer bug?
Hello!

>For now I workarounded it with filling skb->cb with zeroes before
>netif_rx(),

This is right. For other examples look into tunnels.

> but I believe it is a kludge and networking layer should be fixed instead.

No. alloc_skb() creates an skb with a clean cb. ip_rcv() and other protocol handlers do not redo this work. If a device uses cb internally, it must clear it before handing the skb to netif_rx().

Alexey
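The rule stated above (whoever dirties cb on the receive path clears it before passing the buffer up) can be shown in miniature. What follows is a user-space mock, not kernel code: `struct mock_skb` stands in for `struct sk_buff`, its 48-byte `cb` array size is an assumption about the kernel's layout, and all three function names are hypothetical. In a real driver the clearing step would be a `memset()` on `skb->cb` just before `netif_rx()`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for struct sk_buff; only the control block matters here. */
struct mock_skb {
    char cb[48];
};

/* A driver that keeps private state in cb while it owns the buffer. */
static void driver_scribble(struct mock_skb *skb)
{
    memset(skb->cb, 0xA5, 8);
}

/* What the driver must do before handing the buffer to the upper layer. */
static void clear_cb_before_rx(struct mock_skb *skb)
{
    memset(skb->cb, 0, sizeof(skb->cb));
}

/* The upper layer's expectation: every byte of cb is zero on entry. */
static int cb_is_clean(const struct mock_skb *skb)
{
    size_t i;

    for (i = 0; i < sizeof(skb->cb); i++) {
        if (skb->cb[i] != 0)
            return 0;
    }
    return 1;
}
```

The point of the design is that the clearing cost is paid once, by the only party that knows the area is dirty, rather than by every protocol handler on every packet.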
Re: rsync over ssh on 2.4.2 to 2.2.18
Hello!

> Well, since I moved the rsync to 5pm, and then back to 9pm, I haven't
> seen this problem - everything is again working as expected (touch wood)
> with 2.2.15pre13 and 2.4.0.
>
> This is odd, since it wasn't a one-off problem, but something that happened
> each and every day of a particular week. Anyway, if it starts happening
> again, I'll get a tcpdump of the session.

Well... I can reproduce this regardless of the day of week now. :-(

If I understood Andrew's mail correctly, rsync freezes when a large number of errors happen. In particular, here ssh always freezes trying to write to stdout (a pipe linked to rsync) some chunk of data (>64K), consisting of reports of the sort "stat(usr/X11R6/bin/xmbind) : No such file" (my /usr has a huge number of stray symlinks...). rsync does not read this pipe, trying to write some more data to pipe, which is pipe linked to rsync stdin. Deadlock.

Note that when doing strace you do not see this write()! write() is interrupted by strace and you see a successful select(). You can see the real place of the lockup via ps axl, or by starting strace before the lockup happens.

Andrew mentioned one more common case where a large number of errors happens: wrong permissions on some target directory. I have the impression that long ago I observed the same effect when disk space at the target was exhausted.

Why do I tell this? Well, probably something changed in the fs which you rsync, and rsync generates fewer errors.

Alexey
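The deadlock described above depends on one fact: a pipe accepts only a bounded amount of unread data, after which a blocking write() sleeps until the reader drains it. The following minimal sketch measures that bound on a fresh pipe using non-blocking writes. `pipe_capacity()` is a hypothetical helper written for this illustration, and the value it returns is system-dependent, so no particular number is assumed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Fill a fresh pipe that nobody reads, using non-blocking writes.
 * Returns the number of bytes accepted before the pipe reported that it
 * was full, or -1 on any other error.
 */
static long pipe_capacity(void)
{
    int fds[2];
    char chunk[4096] = {0};
    long total = 0;
    int flags, saved_errno = 0;

    if (pipe(fds) < 0)
        return -1;

    flags = fcntl(fds[1], F_GETFL);
    if (flags < 0 || fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    for (;;) {
        ssize_t n = write(fds[1], chunk, sizeof(chunk));
        if (n < 0) {
            if (errno == EINTR)
                continue;
            saved_errno = errno;  /* EAGAIN here means "pipe is full" */
            break;
        }
        total += n;
    }

    close(fds[0]);
    close(fds[1]);
    return (saved_errno == EAGAIN || saved_errno == EWOULDBLOCK) ? total : -1;
}
```

With blocking descriptors, the point where this helper sees EAGAIN is the point where the writer would go to sleep. Two processes that each write to the other through such a pipe, and neither reads, both end up asleep once both pipes are full.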
Re: poll() behaves differently in Linux 2.4.1 vs. Linux 2.2.14 (POLLHUP)
Hello!

> Sure, workarounds exist, but they just complicates
> things.

Working around --- what? An example of an application hitting the case is enough to make me agree completely.

But generally we are not going to match any os, or even ourselves yesterday or tomorrow, in the cases when behaviour is truly undefined and the answer is meaningless. For me any solution, from returning 0 or returning POLLHUP to killing the offending application or generating an answer using a random number generator, looks equally good, acceptable and 100% compatible in this case. 8)

Alexey
Re: poll() behaves differently in Linux 2.4.1 vs. Linux 2.2.14 (POLLHUP)
Hello!

> True, this behavior was changed from 2.2.x. We now match the behavior
> of other svr4 systems, in particular Solaris.

Damn, we did not test the behaviour on an absolutely new, clean, never connected socket... Solaris really may return 0 on it.

However, looked at from the other side, the issue seems absolutely academic and not related to practice in any way.

Alexey
Re: Feedback for fastselect and one-copy-pipe
Hello!

> freebsd-4.0 doesn't use direct transfers for PAGE_SIZE'd pipe write()s:
> it uses MINDIRECT=8192.

I see.

> (and PIPE_BUF is 512, so 4096 was possible for
> them)

8) I see. Thank you for patience. 8)

Alexey
Re: Feedback for fastselect and one-copy-pipe
Hello!

> freebsd

Very funny, the idea is borrowed from there.

As you could understand, your patch kills it. PAGE_SIZE is one of the most frequently used transfer units.

Alexey
Re: Feedback for fastselect and one-copy-pipe
Hello!

> It returns immediately on all unix platforms I tested

I see. This is the essential point. PAGE_SIZE was a really bad threshold value. Sigh and alas.

Alexey

PS BTW "all unix" is unlikely to include freebsd. 8)
Re: Feedback for fastselect and one-copy-pipe
Hello!

> * davem's patch breaks apps that assume that write(,PIPE_BUF) after
> poll(POLLOUT) never blocks, even for blocking pipes.

Pardon, but PIPE_BUF <= PAGE_SIZE still holds, so the fears are groundless.

Alexey
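The application pattern under discussion (wait for POLLOUT, then write at most PIPE_BUF bytes and expect not to block) looks roughly like the sketch below. `write_after_pollout()` is a hypothetical helper written for this illustration; whether the write can ever block after POLLOUT is exactly the kernel behaviour being debated in the thread, so the sketch only shows the caller's side of the assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for fd to become writable, then write PIPE_BUF
 * bytes of zeroes in a single call. Returns the result of write(), or -1
 * if the descriptor did not report POLLOUT in time.
 */
static long write_after_pollout(int fd, int timeout_ms)
{
    static const char buf[PIPE_BUF];  /* zero-filled payload */
    struct pollfd p;
    int ready;

    p.fd = fd;
    p.events = POLLOUT;
    p.revents = 0;

    ready = poll(&p, 1, timeout_ms);
    if (ready <= 0 || !(p.revents & POLLOUT))
        return -1;

    /* The caller's assumption: this call returns without sleeping. */
    return (long)write(fd, buf, PIPE_BUF);
}
```

On an empty pipe the call returns PIPE_BUF at once. The interesting case is a partially filled pipe, where any threshold for reporting POLLOUT has to leave at least PIPE_BUF bytes of room for the assumption to hold.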
Re: Feedback for fastselect and one-copy-pipe
Hello! * davem's patch breaks apps that assume that write(,PIPE_BUF) after poll(POLLOUT) never blocks, even for blocking pipes. Pardon, but PIPE_BUF = PAGE_SIZE yet, so that fears have no reasons. Alexey - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Feedback for fastselect and one-copy-pipe
Hello! It returns immediately on all unix platforms I tested I see. It is essential moment. PAGE_SIZE was really bad threshold value. Sigh and alas. Alexey PS BTW "all unix" is unlikely to include freebsd. 8) - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Feedback for fastselect and one-copy-pipe
Hello! freebsd Very funny, the idea is borrowed from there. As you could understand your patch kills it. PAGE_SIZE is one of the most frequently used transfer unit. Alexey - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Feedback for fastselect and one-copy-pipe
Hello!

> freebsd-4.0 doesn't use direct transfers for PAGE_SIZE'd pipe write()s: it uses MINDIRECT=8192.

I see.

> (and PIPE_BUF is 512, so 4096 was possible for them)

8) I see. Thank you for your patience. 8)

Alexey
Re: Incoming TCP TOS: A simple question, I would have thought...
Hello!

> I've scrolled through various code in net/ipv4, and I can't see how to query
> the TOS of an incoming TCP stream (or at the least, the TOS of the SYN which
> initiated the connection).

No way. Formally it is IP_RECVTOS followed by IP_PKTOPTIONS. But getting the TOS via IP_PKTOPTIONS is not implemented, only the TTL.

Alexey
Re: Inadequate documentation: sockets
Hello!

> The manual specifies the following flag to be returned by the
> kernel
>
> #define POLLHUP 0x0010 /* Hung up */
>
> Hanging up is ambiguous. Does it mean that the client is dead,
> that he closed his end of the socket, or that he shut down one or
> both directions of the data flow? The following man page clears
> this up, but I think the following information would best be
> placed in man poll.

The information is not quite correct.

POLLHUP is really ambiguous in the case of full-duplex connections with bi-directional close, such as TCP. However, the invariants are:

* POLLHUP is set when the connection is closed in both directions.
* POLLHUP implies that the descriptor is not writable and any write will cause an error and, probably, SIGPIPE.
* POLLHUP is _not_ set when the descriptor is writable (i.e. the connection is shut down in the write direction by the remote side).

The standards require that POLLHUP and POLLOUT never be reported together; however, Linux does this, which is formally a bug. It is not fixed, on the assumption that POLLHUP overrides POLLOUT.

> Finally I'm left with my original problem: how am I supposed to
> detect a close or a shutdown from the peer?

By EOF. No other way exists. POLLHUP is a local condition; only the local side can close the connection in the write direction. The exception is an abort (reset) initiated by the peer.

> by addressing me to more adequate documentation.

The UNIX98 and Austin draft pages. They are very ambiguous, though.

Alexey
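As an illustration of the "By EOF" answer, a user-space check could look roughly like the sketch below. It is only a sketch under stated assumptions: check_peer() and the enum peer_state values are hypothetical names invented for this example, and the helper consumes (discards) whatever data it reads, which a real program would not do.

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical classification of what the local side can observe. */
enum peer_state { PEER_IDLE, PEER_OPEN, PEER_EOF, PEER_ERROR };

/* Wait up to timeout_ms for input on fd and classify the result of
 * read(). A return value of 0 from read() is the EOF that signals a
 * close or write-side shutdown by the peer. */
enum peer_state check_peer(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    char buf[512];
    ssize_t n;
    int rc = poll(&pfd, 1, timeout_ms);

    if (rc < 0)
        return PEER_ERROR;      /* poll() itself failed */
    if (rc == 0)
        return PEER_IDLE;       /* timeout: nothing to read yet */

    n = read(fd, buf, sizeof buf);
    if (n > 0)
        return PEER_OPEN;       /* data arrived (discarded in this sketch) */
    if (n == 0)
        return PEER_EOF;        /* peer closed or shut down its write side */
    return PEER_ERROR;          /* e.g. a reset initiated by the peer */
}
```

The design point is that the decision is taken from the return value of read(), not from POLLHUP in revents, in line with the invariants listed above.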
Re: Another rsync over ssh hang (repeatable, with 2.4.1 on both ends)
Hello!

> this kernel was compiled with GCC 2.95.2,

This is a hint. Could you do the following things:

1. Disassemble tcp_poll() (the easiest way is to run gdb vmlinux, say x/i tcp_poll and hold Enter pressed long enough, copying the screen to a file) and send the result to me.
2. Apply the enclosed patchlet.
3. If 2 does not change anything, recompile with egcs-1.1.2.

Alexey

--- ../vger3-010223/linux/net/ipv4/tcp.c	Fri Feb 23 21:28:34 2001
+++ linux/net/ipv4/tcp.c	Sat Mar 3 18:37:22 2001
@@ -442,6 +443,8 @@
 			set_bit(SOCK_ASYNC_NOSPACE, &sk->socket->flags);
 			set_bit(SOCK_NOSPACE, &sk->socket->flags);
 
+			barrier();
+
 			/* Race breaker. If space is freed after
 			 * wspace test but before the flags are set,
 			 * IO signal will be lost.
Re: Another rsync over ssh hang (repeatable, with 2.4.1 on both ends)
Hello!

> same means its not the same bug?

It is the same, I think.

> If you still insist that it is purely a 2.2.15pre13 bug

I never said this. I said that your strace is _wrong_; how can I be sure that the tcpdump is not wrong too? You can understand this. 8)

> together to put 2.2.18 on this machine. I can't guarantee when I'll
> be able to do this though.

You planned to make a more accurate strace on Monday, if I remember correctly. Now it is not necessary; Scott's one is enough to understand that some problem exists and cannot be explained by a buggy 2.2.15.

> PS, could you please spell my name correctly?

My apologies.

Alexey
Re: Another rsync over ssh hang (repeatable, with 2.4.1 on both ends)
Hello!

> I've also reported

The report by Scott Laird is sane, unlike yours. It can be explained by a bug rather than only by a poltergeist. 8)

> Thanks for confirming that 2.2.15pre13 is not the cause.

Russel, you are warned that kernels <2.2.17 and rsync are an incompatible combination.

Alexey
Re: What is 2.4 Linux networking performance like compared to BSD?
Hello!

> They know that iMimic's polymix performance on Linux 2.2.* is half what it is on
> BSD.

What is "iMimic's polymix"? I am almost sure it is simply buggy and was not _debugged_ under Linux.

Alexey
Re: rsync over ssh on 2.4.2 to 2.2.18
Hello!

> I'll see if I can strace it from the start until it hangs tomorrow.

Please... Also, try to make a binary tcpdump.

> I was running at one point a 2.4.0-test kernel, but I didn't see these

Yes, it did not result in a full stall. Lost wakeups were recovered, e.g., by any keyboard activity. 8)

Wow! 2.2.15? rsync surely does not work with 2.2.15. The first 2.2 kernel to fix the same bugs that solaris has now was 2.2.17.

Alexey
Re: rsync over ssh on 2.4.2 to 2.2.18
Hello!

> I've seen hanging rsync over ssh more than once, while sending much data
> from an x86 running Linux (late 2.3.x) to Sparc/Solaris2.5.1

I remember this report of yours. However, recent news leads me to suspect that the cause was in Solaris after all. Actually, if you send a tcpdump of a failed session, this question can be answered.

Also REMEMBER! rsync has __no__ chance of working if one of the sides is Linux-2.2 before 2.2.17 or present-day Solaris (sunos-2.5.1 is old enough that it was probably still right, unlike subsequent kernels). These stacks have a coinciding set of fatal bugs, which prevent transmits with a closed window. To use rsync it is necessary to upgrade to >=2.2.17.

Alexey
Re: rsync over ssh on 2.4.2 to 2.2.18
Hello!

> netstat on isdn-gw shows the following:
>
> Proto Recv-Q Send-Q Local Address           Foreign Address         State
> tcp    72868      0 isdn-gw.piltdown.a:1023 pilt-gw.piltdown.at:ssh ESTABLISHED

plus

> select(4, [3], [3], NULL, NULL) = 2 (in [3], out [3])

> Maybe there is a race condition or missing wakeup in the TCP code?

Moreover, it is not even just _one_ wakeup that is missing. At least two, because wakeups for read and write are separate and you are stuck in both directions. 8)8)

Well, if it were one, I would start digging inside tcp instantly. But as soon as two of them are missing, I have to suspect wake_up itself. At least, we had such bugs there until 2.4.0.

Alexey
Re: New net features for added performance
Hello!

> > 3) Enforce correct usage of it in all the networking :-)
>
> ,) -- the tricky part.

No tricks, IP[v6] is already forced to be clever; all the rest are free to do this if they desire. And btw, the driver need not parse anything but its internal stuff, and even aligning the eth II header can be done in eth_type_trans().

Actually, it is possible now without changing anything but the driver. Fortunately, I removed the stupid tulip from my alpha, so I have no impetus to try this myself. 8)

Alexey
Re: possible bug x86 2.4.2 SMP in IP receive stack
Hello!

> Feb 23 12:42:30 rcc2 kernel: Warning: kfree_skb passed an skb still on a list (from c01f58dc).

BTW, here is a didactic example of a bug which results in similar behaviour.

Alexey

> From: [EMAIL PROTECTED] (Andrew Morton)
> Subject: Re: Failed assertion
> Date: 27 Feb 2001 04:15:01 +0300
>
> "David S. Miller" wrote:
> >
> > Ralf Baechle writes:
> > > No backtrace, the machine did continue as you'd suspect after a print.
> > > The machine is a dual CPU Origin 200 with an IOC3 NIC.
> >
> > What is your current kernel based upon, some older 2.4.x or
> > even 2.3.x variant? Or is it sync'd to current?
>
> Could this be a driver problem? This code:
>
> 	netif_rx(skb);
>
> 	ip->rx_skbs[rx_entry] = NULL;	/* Poison */
>
> 	new_skb = ioc3_alloc_skb(RX_BUF_ALLOC_SIZE, GFP_ATOMIC);
> 	if (!new_skb) {
> 		/* Ouch, drop packet and just recycle packet
> 		   to keep the ring filled. */
> 		ip->stats.rx_dropped++;
> 		new_skb = skb;
> 		goto next;
> 	}
>
> looks scary. We've passed an skb to the network stack,
> but we can continue to make it available to the device
> driver at the same time.
>
> I'd suggest a printk() in there, plus perhaps do the
> alloc_skb _before_ the netif_rx(). Don't pass the skb
> to the stack if it is to be recycled.
Re: Very high bandwith packet based interface and performance problems
Hello!

> > Yes its a SHOULD in RFC1122, but in any normal environment pretty much a
> > must and I know of no stack significantly violating it.
>
> I didn't know there was such a thing as a normal environment :)

Jokes apart, such "normal" environments are rare today.

From tcpdumps it is clear that win2000 does not ack every other mss. It can ack once per window at high load. I have seen the same behaviour from solaris. freebsd-4.x surely does not ack every second mss (this is from the source code), which is probably a bug (at least, it stops acking at all as soon as MSG_WAITALL is used. 8))

Acking every second mss is required to make slow start reasonably fast. As soon as the window is full, such acks are useless, so win2000 is fully right and, in fact, optimal.

Alexey
Re: 2.4.1 under heavy network load - more info
Hello!

> OK! I actually expected 2.4 to be somewhat selftuning.

The defaults for these numbers (X,Y,Z) are very conservative.

> Interesting you say that, I looked at the logs and I see over 5000 sockets
> used, does'nt look peaceful to me. But you are absolutely right about the
> orphans. The error about "too many orphans" must be wrong and is triggered
> by some other condition. Look at the output from the debug printk I've
> added:
>
> Feb 18 15:43:50 mcquack kernel: TCP: too many of orphaned sockets

Well, the message is not accurate. The kernel refuses to hold this particular orphan because it feels that too much memory is consumed. Change number Z and the message will disappear.

Poor orphans are the first victims, because they have nobody to take care of them but the kernel. And the kernel is a harsh parent. 8)

> I raised the numbers a little bit more. Now with 128MB RAM in the box we can
> handle a maximum of 7000 connections. No more because we start to swap too
> much.

Really? Well, it is unlikely to have anything to do with net. Your dumps show that at 6000 connections networking ate less than 10MB of memory. Probably, swapping is mistuned.

> Feb 21 10:43:41 mcquack kernel: KERNEL: assertion (tp->lost_out == 0) failed
> at tcp_input.c(1202):tcp_remove_reno_sacks

This is also debugging. Harmless.

> 2) The error about "too many orphans" is bogus?

Yes. It is a sort of disinformation. What it really means is that the accounting detected that the configured limits were exceeded.

> 3) I will get a lot of debug crap i syslog

It will disappear as soon as debugging is disabled, i.e. when the kernel enters distributions, I guess.

If I were responsible for this, I would not kill them. The more messages, the better. Otherwise you would have nothing to report and would not even notice that something is wrong. 8)

> This happened once under very heavy load (8000+ connections) and I have been
> unable to reproduce.

Probably this has nothing to do with tcp, but is explained by some vm failure, a sort of oom killer.

Alexey
Re: 2.4.1 under heavy network load - more info
Hello!

> of errors a bit but I'm not sure I fully understand the implications of
> doing so.

As long as these numbers do not exceed the total amount of RAM, this is exactly the action required in this case.

The dumps which you sent to me show nothing pathological. Actually, they were made in a period of full peace: only 47 orphans and only about 10MB of accounted memory.

> echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout

This is not essential with 2.4. In 2.4 this state does not grab any essential resources.

> echo 0 > /proc/sys/net/ipv4/tcp_timestamps
> echo 0 > /proc/sys/net/ipv4/tcp_sack

Why?

> echo "3072 3584 6144" > /proc/sys/net/ipv4/tcp_mem

If you still have problems with orphans, you should raise these numbers. The extreme settings are something like:

Z=<total amount of ram in pages>
Y=<something < Z>
X=<something < Y>
echo "$X $Y $Z" > /proc/sys/net/ipv4/tcp_mem

Set them to the maximum and, if the messages do not disappear completely, decrease them to tougher limits.

> Feb 18 15:05:44 mcquack kernel: sending pkt_too_big to self

Normal. Debugging.

> Feb 18 15:24:07 mcquack kernel: TCP: peer xx.xx.xx.xx:1084/7000 shrinks
> window 210605:1072:2106779313. Bad, what else can I say?

Debugging too.

> Feb 18 15:42:06 mcquack kernel: TCP: dbg sk->wmem_queued 5664
> tcp_orphan_count 99 tcp_memory_allocated 6145

Number Z is exceeded; newly _closed_ sockets will be aborted and the stack has entered a state of moderating its appetite. The dump which you sent to me and the further messages in the logs show that it succeeded and converged to a normal state.

> Please let me know if I can provide more debug info or test something!

Actually, the only dubious place in your original report was something about the behaviour of ssh. ssh surely cannot be affected by this effect. Could you elaborate on this? What kind of problem exactly? Maybe some tcpdump, if the problem is reproducible.

Alexey
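As a rough illustration of the "extreme settings" recipe for tcp_mem given in this message, the three numbers could be derived from the RAM size in pages as sketched below. The message leaves Y and X unspecified beyond X < Y < Z, so the 3/4 and 1/2 fractions here are arbitrary assumptions chosen only for the example, and struct tcp_mem, tcp_mem_upper_bound(), format_tcp_mem() and ram_pages() are hypothetical names, not kernel or libc interfaces (only sysconf() is a real call).

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* The three tcp_mem values, in pages: X (low), Y (pressure), Z (high). */
struct tcp_mem { long low, pressure, high; };

/* Total RAM in pages, as reported by sysconf() on Linux. */
long ram_pages(void)
{
    return sysconf(_SC_PHYS_PAGES);
}

/* Upper-bound triple: Z = all of RAM; Y and X are ASSUMED fractions
 * (3/4 and 1/2 of Z), picked only to satisfy X < Y < Z. */
struct tcp_mem tcp_mem_upper_bound(long pages)
{
    struct tcp_mem m;

    assert(pages >= 4);
    m.high = pages;              /* Z */
    m.pressure = pages / 4 * 3;  /* Y, assumed */
    m.low = pages / 2;           /* X, assumed */
    return m;
}

/* Format the triple as "X Y Z", the layout of /proc/sys/net/ipv4/tcp_mem.
 * Returns the value of snprintf(). */
int format_tcp_mem(char *out, size_t size, struct tcp_mem m)
{
    return snprintf(out, size, "%ld %ld %ld", m.low, m.pressure, m.high);
}
```

For a box with 32768 pages (128 MB with 4 KB pages) this yields "16384 24576 32768"; the string would then be written to /proc/sys/net/ipv4/tcp_mem by hand and lowered again if needed, as the message suggests.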
Re: MTU and 2.4.x kernel
Hello!

> We are implementing an IP stack.

Alan, please tell me what is wrong, and we will repair it. The implementation follows the RFCs and even relaxes their requirements in the cases where they are far from reality.

Alexey
Re: SO_SNDTIMEO: 2.4 kernel bugs
Hello!

> You are right - our sendfile() implementation is broken. I have fixed it

Thank you!

> Investigation shows that the Linux network layer is behaving oddly. It
> seems that we are writing 4096 bytes to a socket. This proceeds in 4096
> byte chunks until the send buffer on the socket is full, and a 4096 byte
> write blocks. This blocking write is eventually interrupted by the
> timeout, and the write call returns.. wait for it.. 4096! This suggests
> there was socket space after all, and the call should not have blocked.

A wakeup does not happen until _enough_ (1/3 of sndbuf) space in sndbuf is released; otherwise you would overschedule. So, as soon as write() goes to sleep, it will sleep until 1/3 is released. If it is interrupted, it uses all the released space immediately before exit. Again, to make more for in this context.

This may even be wrong and, probably, we should return instantly with -EAGAIN/-EINTR/partial count, but that is most likely suboptimal (though I have already changed this to an instant return). But this does not look essential from the caller's viewpoint, except for sendfile() of course. 8)

Alexey
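The wakeup rule described in this message (a sleeping writer is woken only once at least 1/3 of the send buffer is free) can be pictured with the toy model below. It is a user-space illustration only, not the kernel's code: writer_should_wake() and its parameters are hypothetical names, and the real check in the stack may be expressed differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the threshold: sndbuf is the send buffer size in bytes,
 * queued is how many bytes currently occupy it. The writer is woken
 * only when the free space reaches one third of sndbuf. */
bool writer_should_wake(long sndbuf, long queued)
{
    assert(sndbuf > 0 && queued >= 0 && queued <= sndbuf);
    return (sndbuf - queued) * 3 >= sndbuf;
}
```

With a 60000-byte buffer, 4000 free bytes do not wake the writer even though a 4096-byte write would nearly fit, while 20000 free bytes do. This matches the observation in the quoted report: the interrupted write found space that had been released but had not yet reached the wakeup threshold.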