Re: [PATCH] PPPOE can kfree SKB twice (was Re: kernel panic problem. (smp, iptables?))

2001-07-19 Thread kuznet

Hello!

> However, could we not have dev_queue_xmit behave as such (not free
> frame on failure)?  

If you need to hold the original skb, you may hold its refcnt.

However, this feature inevitably leads to big trouble: dev_queue_xmit()
is allowed to change the skb, and you cannot assume anything about it.
So reusing an skb after dev_queue_xmit() is a fatal bug, regardless of anything else.


> The reason why I'm proposing this is that ppp_generic.c assumes that
> the skb is still available after a transmission failure via pppoe.  To
> support the semantics of dev_queue_xmit and ppp_generic we would have
> to always copy skb's inside pppoe_xmit.  Then, if dev_queue_xmit fails
> the original is deleted.

You need not copy it. I said "clone".

Nobody is allowed to touch the _data_ part of an skb, so you need not
copy the data.
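
The clone-then-submit pattern can be sketched in userspace C. This is only a toy model under stated assumptions: struct demo_skb, struct demo_data and every demo_* function are hypothetical names invented for illustration and are not the kernel's struct sk_buff API. It shows a clone sharing the payload through a reference count, and a transmit routine that always consumes what it is given, whatever it returns.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace model; NOT the kernel's struct sk_buff. */
struct demo_data {
    int refcnt;                 /* number of headers sharing this payload */
    size_t len;
    unsigned char bytes[64];
};

struct demo_skb {
    struct demo_data *data;     /* shared payload, treated as read-only */
    int priority;               /* private header field the transmit path may change */
};

static struct demo_skb *demo_alloc(const void *payload, size_t len)
{
    struct demo_skb *skb = malloc(sizeof(*skb));
    struct demo_data *d = malloc(sizeof(*d));

    if (!skb || !d || len > sizeof(d->bytes)) {
        free(skb);
        free(d);
        return NULL;
    }
    d->refcnt = 1;
    d->len = len;
    memcpy(d->bytes, payload, len);
    skb->data = d;
    skb->priority = 0;
    return skb;
}

/* Clone: a new private header that shares the payload; no data is copied. */
static struct demo_skb *demo_clone(const struct demo_skb *orig)
{
    struct demo_skb *c = malloc(sizeof(*c));

    if (!c)
        return NULL;
    *c = *orig;
    c->data->refcnt++;
    return c;
}

static void demo_free(struct demo_skb *skb)
{
    assert(skb->data->refcnt > 0);
    if (--skb->data->refcnt == 0)
        free(skb->data);
    free(skb);
}

/* Models a transmit call that may modify the header and ALWAYS consumes
 * its argument, whether it reports success (0) or failure (-1). */
static int demo_xmit(struct demo_skb *skb, int fail)
{
    skb->priority = 7;
    demo_free(skb);
    return fail ? -1 : 0;
}
```

A caller that wants to keep the original clones it first and hands only the clone to demo_xmit(); after a failure the original header is unchanged and the payload is still alive, because the clone only dropped its own reference.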

Alexey
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH] PPPOE can kfree SKB twice (was Re: kernel panic problem. (smp, iptables?))

2001-07-19 Thread kuznet

Hello!

Some short comments on the patch:


> - dev_queue_xmit(skb);
> + /* The skb we are to transmit may be a copy (see above).  If
> +  * this fails, then the caller is responsible for the original
> +  * skb, otherwise we must free it.  Also if this fails we must
> +  * free the copy that we made.
> +  */
> +
> + if (dev_queue_xmit(skb)<0) {

dev_queue_xmit _frees_ the frame regardless of the return value.
The return value is not a criterion for assuming anything.



> + if (old_skb) {
> + /* The skb we tried to send was a copy.  We
> +  * have to free it (the copy) and let the
> +  * caller deal with the original one.
> +  */
> + skb_unlink(skb);

So you pass dev_queue_xmit an skb that is on some list?
Not a good idea. If you need to keep the original on some list,
please clone it and submit the clone.

Alexey



Re: softirq in pre3 and all linux ports

2001-06-20 Thread kuznet

Hello!

> Soft irqs should definitely not be much heavier than an irq handler,
> if they are then we have implemented them wrongly somehow.

For example, all the networking nicely fits to this class. :-)

Alexey



Re: IPv6: the same address can be added multiple times

2001-05-18 Thread kuznet

Hello!

>  2) no significant restrictions (==this)

When a user asks to create some object, the only thing required
of any reasonable interface is to return an error when the object
is not added.

KAME's one is broken, ours is _one_ of the right ones.


Another example of a bad mistake is my own: I wrote some crap for creating
tunnels. Adding a tunnel does not fail when such a tunnel already exists,
so the user has no idea whether he created the tunnel (and should
delete it) or someone else did this work. Note that if we were
able to create _duplicate_ tunnels on each new request (like IPv6 addresses),
this would also be a right approach.

Alexey



Re: NETDEV_CHANGE events when __LINK_STATE_NOCARRIER is modified

2001-05-14 Thread kuznet

Hello!

> Jeff has introduced `alloc_etherdev()' which allocates storage
> for a netdev but doesn't register it.  The one quirk with this
> approach (and why it's vastly simpler than my thing) 

I do not see where it is simpler. The only difference is that
name is unknown. 8)


> Not many drivers have been converted to the new interface yet.

Paaardon! It is the only place where it makes sense to say "simpler",
and it was the point of your patch! Of course, it is much "simpler" to leave
all the devices in a buggy state, no doubt. 8)

As for dev_probe_lock, I still do not understand why it is not deleted.
Please, shed some light.


> is a bit foggy.  ISTR that the init() method was inherently
> immune to this race.

8) Imagine, I believed that all the devices use this method for years.
The discovery that init_etherdev does some shit was real catharsis. 8)

Alexey



Re: skb->truesize > sk->rcvbuf == Dropped packets

2001-05-14 Thread kuznet

Hello!

> Hmmm... I don't see how not touching buffer values can solve his
> problem at all.  His MTU is really HUGE, and in this case 300 byte
> packet eats 10k or so space in receive buffer.

Default rcvbuf is ~64K, which is enough to receive packets up to an MTU of a bit less than 64K.
When an application says rcvbuf=2K, it apparently does not expect to hold
more than one packet.
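
The arithmetic behind this can be sketched as a small userspace simulation. It assumes, as the subject line suggests, that the receive path charges each packet's whole buffer (truesize) against rcvbuf and drops the packet when the charge would reach the limit. struct demo_sock and the demo_* functions are hypothetical names, and the exact comparison is an assumption of this sketch rather than a quote of the kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of per-socket receive accounting. */
struct demo_sock {
    size_t rcvbuf;      /* receive buffer limit in bytes */
    size_t rmem_alloc;  /* bytes currently charged to the socket */
};

/* Returns 1 if the packet is queued, 0 if it is dropped.
 * truesize is the size of the whole allocated buffer, not the payload length. */
static int demo_queue_rcv(struct demo_sock *sk, size_t truesize)
{
    assert(sk != NULL);
    if (sk->rmem_alloc + truesize >= sk->rcvbuf)
        return 0;
    sk->rmem_alloc += truesize;
    return 1;
}

/* Offer `offered` unread packets of the same truesize; count how many fit. */
static size_t demo_count_queued(size_t rcvbuf, size_t truesize, size_t offered)
{
    struct demo_sock sk = { rcvbuf, 0 };
    size_t i, queued = 0;

    for (i = 0; i < offered; i++)
        queued += (size_t)demo_queue_rcv(&sk, truesize);
    return queued;
}
```

With the figures from this thread (a 300 byte packet occupying about 10K of buffer), demo_count_queued(2048, 10240, 10) queues nothing, while demo_count_queued(65535, 10240, 10) queues six packets before dropping.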



> I doubt our buffer size tuning algorithms can cope with this.

At least TCP will tune itself nicely under the most extreme conditions.
As for this case, there is no chance to tune; rcvbuf just should be large.


> Really, copy threshold in driver just must be chosen carefully.

rx copybreak has no chance of working with large MTUs...
For example, it does not work with gigabit cards, which use a 9K MTU but need
to talk to the 1.5K world. A blind copybreak at 1.5K is nonsense,
meaning "copy all". Though acenic with its three rx rings is
a lucky exception. 8)

Alexey



Re: skb->truesize > sk->rcvbuf == Dropped packets

2001-05-13 Thread kuznet

Hello!

>  > Any suggestions on heuristics for this ? 

Not to set rcvbuf to ridiculously low values. The best variant is not
to touch SO_*BUF options at all.

Alexey



Re: NETDEV_CHANGE events when __LINK_STATE_NOCARRIER is modified

2001-05-13 Thread kuznet

Hello!

> I believe these events get sent to the cardmgr daemon and it does
> all the ifconf magic to change the device state.

Compare this also to the situation with netif_present().

After Linus said that it is called from thread context, I prepared
corresponding code for netif_present (and for carrier detection,
on the assumption that it is called from thread context or softirq)
BUT... this turned out not to be true.

So, these macros still do not assume anything about context.
As a result netif_carrier* is unreliable, and netif_present is still a straight bug.
Should be fixed, of course.

BTW what happened with Andrew's netdev registration patch?
For some strange reason I believed it was already applied... Grrr.

Alexey



Re: IPv6: the same address can be added multiple times

2001-05-13 Thread kuznet

Hello!

> It appears you can add _exactly_ same IPv6 address on an interface many
> times:

Yes. BTW, look here:

kuznet@dust:~ # ip -6 a ls sit0
7: sit0@NONE: <NOARP,UP> mtu 1480 qdisc noqueue
inet6 ::127.0.0.1/96 scope host
inet6 ::193.233.7.100/96 scope global
inet6 ::193.233.7.100/96 scope global

I have two equal addresses inherited from one IPv4 address
on two interfaces. Nothing illegal.



> FWIW, KAME stack adds the address only once(, but BSD ifconfig(8)
> doesn't show errors when you try to do it again; just doesn't add the
> second one).

8)

> It looks like a check or two in kernel is missing, or is there some
> reasoning to this behaviour?

Well, it is one of well defined approaches (unlike KAME's one).
Alternative is to implement full set of options NLM_F_* like used
in IPv4 routing to block undefined cases. In IPv6 flags are hardwired
to NLM_F_CREATE|NLM_F_APPEND both for addresses and routes.

Alexey



Re: 2.4.4: Kernel crash, possibly tcp related

2001-05-01 Thread kuznet

Hello!

> If send_head doesn't point to skb then it is before it (and it cannot
> advance under us of course because we hold the sock lock) and so in such
> case we didn't clobbered the send_head at all in skb_entail, and so we
> don't need to touch send_head in order to undo (we only need to unlink).
> 
> See?

I see! Dave, please take Andrea's second patch (appended).
It is really the cleanest one.

Alexey


--- 2.4.4aa3/net/ipv4/tcp.c.~1~ Tue May  1 10:44:57 2001
+++ 2.4.4aa3/net/ipv4/tcp.c Tue May  1 12:00:25 2001
@@ -1183,11 +1183,8 @@
 
 do_fault:
 	if (skb->len==0) {
-		if (tp->send_head == skb) {
-			tp->send_head = skb->next;
-			if (tp->send_head == (struct sk_buff*)&sk->write_queue)
-				tp->send_head = NULL;
-		}
+		if (tp->send_head == skb)
+			tp->send_head = NULL;
 		__skb_unlink(skb, skb->list);
 		tcp_free_skb(sk, skb);
 	}




Re: 2.4.4: Kernel crash, possibly tcp related

2001-05-01 Thread kuznet

Hello!

> zero and we are running in such slow path, it is obvious the send_head
> _was_ NULL when we entered the critical section, so it's perfectly fine

It is not only not obvious, it is almost always not true.
In normally working TCP send_head is almost never NULL;
it is NULL only when the application is so slow that it is not able
to saturate the pipe. If you do not believe my word, add a printk checking this. 8)

Alexey



Re: 2.4.4: Kernel crash, possibly tcp related

2001-05-01 Thread kuznet

Hello!

> this is the strict fix:

Andrea, you caught the problem!

The fix is not right though (it is equivalent to a straight
tp->send_head=NULL, as you noticed. It also corrupts the queue in
the opposite manner.) The right fix is appended.

Explanation: in do_fault we must undo the effect of enqueueing a new segment
in the case where the segment remained empty. tp->send_head points to
the first unsent skb in the queue, and it is NULL if and only if
all the skbs are already sent. (Invariant is: tp->send_head==NULL ||
tp->send_head->seq == tp->snd_nxt)
I botched this case except when the queue is completely empty,
so that the last sent skb was accounted in packets_out twice...
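
The invariant and the undo step can be modelled with a small array-based queue. This is a hypothetical userspace sketch, not the tcp.c code: struct demo_tp and the demo_* helpers are invented for illustration, indices stand in for skb pointers, and -1 stands in for NULL. The undo shown is the simple form (clear send_head only if it points at the empty segment being removed).

```c
#include <assert.h>

#define DEMO_QLEN 8

struct demo_seg {
    unsigned seq;   /* sequence number of the first byte */
    unsigned len;   /* bytes queued in this segment */
};

/* Hypothetical model of the write queue state. */
struct demo_tp {
    struct demo_seg q[DEMO_QLEN];
    int n;              /* segments on the write queue, sent ones included */
    int send_head;      /* index of the first unsent segment, or -1 */
    unsigned snd_nxt;   /* next sequence number to send */
    unsigned write_seq; /* sequence number of the next byte to be queued */
};

/* send_head == NULL || send_head->seq == snd_nxt */
static int demo_invariant(const struct demo_tp *tp)
{
    return tp->send_head < 0 || tp->q[tp->send_head].seq == tp->snd_nxt;
}

/* Enqueue a new, still empty segment at the tail. */
static int demo_entail(struct demo_tp *tp)
{
    int idx;

    if (tp->n == DEMO_QLEN)
        return -1;
    idx = tp->n++;
    tp->q[idx].seq = tp->write_seq;
    tp->q[idx].len = 0;
    if (tp->send_head < 0)
        tp->send_head = idx;
    return idx;
}

static void demo_fill(struct demo_tp *tp, int idx, unsigned len)
{
    tp->q[idx].len += len;
    tp->write_seq += len;
}

/* Transmit the segment at send_head; it stays queued until acknowledged. */
static void demo_send_one(struct demo_tp *tp)
{
    if (tp->send_head < 0)
        return;
    tp->snd_nxt += tp->q[tp->send_head].len;
    tp->send_head = (tp->send_head + 1 < tp->n) ? tp->send_head + 1 : -1;
}

/* Undo demo_entail() for a tail segment that remained empty. */
static void demo_undo_empty(struct demo_tp *tp, int idx)
{
    assert(idx == tp->n - 1 && tp->q[idx].len == 0);
    if (tp->send_head == idx)
        tp->send_head = -1;
    tp->n--;
}
```

Two cases matter. If an earlier unsent segment exists, send_head already points at it and the undo must leave it alone. If everything before the empty segment has been sent, send_head points at the empty segment itself and must become NULL even though the queue is not empty.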

Damn, what a silly mistake was it... shame.

Alexey


--- ../vger3-010426/linux/net/ipv4/tcp.c	Wed Apr 25 21:02:18 2001
+++ linux/net/ipv4/tcp.c	Tue May  1 20:38:44 2001
@@ -1185,7 +1187,7 @@
 	if (skb->len==0) {
 		if (tp->send_head == skb) {
 			tp->send_head = skb->prev;
-			if (tp->send_head == (struct sk_buff*)&sk->write_queue)
+			if (TCP_SKB_CB(skb)->seq == tp->snd_nxt)
 				tp->send_head = NULL;
 		}
 		__skb_unlink(skb, skb->list);



Re: 2.4.4: Kernel crash, possibly tcp related

2001-04-30 Thread kuznet

Hello!

> My current theory is that tcpblast does something erratic when the
> error occurs.

It has a buffer size of 32K, so it faults at large enough chunk sizes.

The erratic errno is because this applet prints errno on a partial write.

The oops is apparently because I still did something wrong in do_fault.
It seems you were right to say that this place looks dubious. 8)

Alexey



Re: Bug report: tcp staled when send-q != 0, timers == 0.

2001-04-21 Thread kuznet

Hello!

>  Im my case P-MTU discovery

Sorry, I lied. Not pmtu discovery but exactly the opposite effect
is important here: collapsing of small frames into larger ones.
Each such merge results in the loss of 1 "sack" in 2.2.

>  I only wrote that it was active when got stuck. It may be idle before -
>  I do not remember, but have a habit to keep connections for weeks. :)

Good. 8)

>  As my experiments show, any connection, entering keepalive once,
>  have lose its ability to send zero probes - forever.

Exactly.


>  OK. Let us return to the "mss/mtu bug". The most mystifying thing for
>  me is the dependance of the MTU threshold on the kernel version, etc.

Well, you can reinvestigate this to get more reliable results...

Actually, this problem is so difficult that the study would be purely
academic; there is no hope of fixing it in 2.2. It was partially
repaired during 2.3 and completely resolved only in 2.4.4.


>  But the question is what the minimum "reliable" MTU. There are lots of
>  situations when data comes rapidly in small packets (say, monitoring logs).
>  Is there a danger to lose such connections on a heavily loaded host?

There is no real danger. Bad things can happen only when the receiver does not
read data for a very long time; in this case the connection times out without
receiving any acks.

As for a minimum/maximum mtu... it does not exist. For example, if the sender floods
1 byte frames in TCP_NODELAY mode and the receiver does not read them, 2.2 will
fail regardless of mtu. See? Even the 40 bytes of IP+TCP headers (not counting
additional overhead) guarantee that memory will be exhausted an order of magnitude earlier
than the receiver can close the window.

Alexey



Re: CONFIG_PACKET_MMAP help

2001-04-20 Thread kuznet

Hello!

> 1. for tp_frame_size, I dont want to truncate any data on ethernet, I
> need 1514 bytes, is this the best way to do it and not waste space?

Select a small snapsize (determined later by experiment) and
set PACKET_COPY_THRESH to read larger packets via recvmsg().

> 2. what is tp_block_nr for?  I dont understand it, I just set it to 1
> and make tp_block_size big enough for all the frames I need, so its
> just one contiguous space, all I need is about a megabyte I think.

The kernel has problems allocating large chunks of memory.
If you see problems allocating large chunks, split them
into smaller ones.

> while(1) {
>if (tp->status == 0) poll() for pollin on the socket  /* is there a
>race here? */

No. poll returns when a new frame appears.


> 4. what does the copy threshold setsockopt tuning accomplish? doesnt it always
> have to copy anyway, to the mmaped area?

See the answer to question 1. It makes sense when the size of the chunk is small enough.
Small packets are copied to the ring; large ones (which are truncated) are queued
to the socket to be received via recvmsg().

Alexey



Re: Bug report: tcp staled when send-q != 0, timers == 0.

2001-04-11 Thread kuznet

Hello!

>mtu 382 + keepalive yes -> loss
>mtu 382 + keepalive no  -> ok

Well, I ignored this because it looked as full sense. Sorry. 8)

>  such a picture? If the answer is "yes", I am almost satisfied. :-)

No, the answer is a strict "no". Until keepalive is triggered for the
first time, it cannot affect the connection in _any_ way.


... sorry, I have to run. Let's defer the further investigation.

Alexey



Re: Bug report: tcp staled when send-q != 0, timers == 0.

2001-04-11 Thread kuznet

Hello!

>  If your model does not cover such situation, pls, take it in mind. :)

Taken.

Is the machine UP (uniprocessor)? The only other known dubious place is
SMP-specific...

BTW, if that cursed socket is still alive, try the experiment of
filling the window on it. It must get stuck, or my theory is completely wrong.

Alexey



Re: Bug report: tcp staled when send-q != 0, timers == 0.

2001-04-11 Thread kuznet

Hello!

>  In my experiments linux simply sets mss=mtu-40 at the start of ethernet
>  connections. I do not know why, but belive it's ok. How the version of
>  kernel and configuration options can affect mss later?

You can figure this out yourself. In fact, you measured it.

With mss=1460 the problem does not exist.

The problem begins, for example, when mss is smaller and a packet arrives
on ethernet. It eats the same 1.5k of memory, but carries only ~mss bytes
of tcp payload. See? We do not know this in advance, so we advertise a
large window, do not have enough rcvbuf to get it filled, and cannot do
anything but drop new packets.

ppp is more difficult. Actually, I do not know exactly how it works now.
At least, ppp in 2.4 trims the skb if it has too much unused space.

Alexey
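
The arithmetic behind this can be sketched as follows. This is an
illustration only: payload_capacity() is a hypothetical helper, the
64K rcvbuf and the flat 1536-byte cost per packet are assumed round
numbers, and 342 is the mss=mtu-40 value for the mtu 382 case
mentioned in this thread.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: how many bytes of TCP payload fit into `rcvbuf`
 * bytes of receive memory when every packet is charged `buf_cost` bytes
 * regardless of how little payload (`mss`) it carries.
 */
static size_t payload_capacity(size_t rcvbuf, size_t buf_cost, size_t mss)
{
    assert(buf_cost > 0);
    return (rcvbuf / buf_cost) * mss;   /* whole packets only */
}
```

Under these assumptions, payload_capacity(65536, 1536, 1460) is 61320
bytes, while payload_capacity(65536, 1536, 342) is only 14364 bytes: the
same receive memory backs less than a quarter of the window.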



Re: Bug report: tcp staled when send-q != 0, timers == 0.

2001-04-11 Thread kuznet

Hello!

>  At last, I tried several MTUs on 3d computer, running "right" 2.2.17, and
>  could not find conditions, under which any loss of ACKs can be detected.

8)8)8)

ppp is also prone to the mss/mtu bug: it allocates buffers that are too
large and never breaks them. The difference between kernels looks funny,
but I think it is explained by the differences between the mss/mtu values.

Alexey

[ I will be absent for some time, starting tomorrow. ]



Re: Bug report: tcp staled when send-q != 0, timers == 0.

2001-04-11 Thread kuznet

Hello!

> > If my guess is right, you can easily put this socket to funny state
> > just catting a large file and kill -STOP'ing ssh. ssh will close window,
> > but sshd will not send zero probes.
> 
>  [1] I have checked your statement on 2 different machines, running 2.2.17.
>  No confirmation. But this is much more funny than it simply sounds. :)

_That_ socket which was stuck must show this behaviour.

To get this on a new socket you should leave the session idle for >2 hours,
until the first keepalive. After this it will never probe under
any circumstances. The bug was that keepalive corrupts the state of the
timer, and the probe0 timer is not started after this.


>  buffer is filled, and client IS NOT stopped! :))) Hence connection dies
>  due to retransmission timeout on the server side.

It is a known linuxism. If the ratio connection_mss/link_mtu is less than
~1/4, or the connection is flooded with tiny packets, then after rcvbuf
is full linux enters memory paranoia mode, pretending that all the
packets are lost. Ugly, unpleasant, but luckily harmless under any
normal circumstances.

One workaround is to set rx_copybreak on the ethernet drivers to 400-500.

The bug is really difficult. It is not cured even in current 2.4
(only with zerocopy patch).


>  I do not understand how connection with closed window can wait until
>  first keepalive - it must do zero probes instead.

If a socket has ever sent a keepalive, it will not be able to send
zero-window probes after this.


>  Hmm... I observed this bug on the host, which never performs more
>  than 10 conn/sec and has peak loadvg ~ 0.15.

8)8)8) Probability is probability.

Alexey
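
A user-space model of what an rx_copybreak threshold does, as a sketch
only. struct rx_frame and rx_deliver() are hypothetical names, and
real drivers work on skbs rather than malloc()ed buffers. Frames
shorter than the threshold are copied into a right-sized buffer, so
they are charged their real length instead of the full receive buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct rx_frame {
    unsigned char *data;  /* buffer handed up the stack          */
    size_t len;           /* bytes of packet data                */
    size_t charged;       /* memory the frame occupies upstream  */
};

/* Hypothetical model of a driver's rx_copybreak policy. */
static struct rx_frame rx_deliver(unsigned char *ring_buf, size_t ring_buf_size,
                                  size_t len, size_t copybreak)
{
    /* Default: pass the full-sized receive buffer up as is. */
    struct rx_frame f = { ring_buf, len, ring_buf_size };

    assert(len <= ring_buf_size);
    if (len < copybreak) {
        unsigned char *small = malloc(len ? len : 1);
        if (small) {
            /* Small frame: copy it, keep the big buffer in the ring. */
            memcpy(small, ring_buf, len);
            f.data = small;
            f.charged = len;
        }
        /* On allocation failure, fall back to the full buffer. */
    }
    return f;
}
```

With a threshold of 450, a 342-byte frame is charged 342 bytes, while a
1460-byte frame still travels in its original 1536-byte buffer.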



Re: Bug report: tcp staled when send-q != 0, timers == 0.

2001-04-10 Thread kuznet

Hello!

>  In brief: a stale state of the tcp send queue was observed for 2.2.17
>  while send-q counter and connection window sizes are not zero: 

I think I pinned this down. The patch is appended.


>  diagnostic, I'll try to get it. In any case, I plan to run something through
>  this connection in hope to reproduce this state again.

If my guess is right, you can easily put this socket into a funny state
just by catting a large file and kill -STOP'ing ssh. ssh will close the
window, but sshd will not send zero probes. Any socket with keepalives
enabled enters this state after the first keepalive is sent.
[ Note that this is not Butenko's problem; that one is still to be discovered. 8) ]

I think you will not be able to reproduce the full problem: the socket
will revive after the first received ACK. That is another bug, and its
probability is astronomically low.

Alexey


--- linux/net/ipv4/tcp_input.c.orig Mon Apr  9 22:46:56 2001
+++ linux/net/ipv4/tcp_input.c  Tue Apr 10 21:23:33 2001
@@ -733,8 +733,6 @@
if (tp->retransmits) {
if (tp->packets_out == 0) {
tp->retransmits = 0;
-   tp->fackets_out = 0;
-   tp->retrans_out = 0;
tp->backoff = 0;
tcp_set_rto(tp);
} else {
@@ -781,8 +779,10 @@
if(sk->zapped)
return(1);  /* Dead, can't ack any more so why bother */
 
-   if (tp->pending == TIME_KEEPOPEN)
+   if (tp->pending == TIME_KEEPOPEN) {
tp->probes_out = 0;
+   tp->pending = 0;
+   }
 
tp->rcv_tstamp = tcp_time_stamp;
 
@@ -850,8 +850,6 @@
if (tp->retransmits) {
if (tp->packets_out == 0) {
tp->retransmits = 0;
-   tp->fackets_out = 0;
-   tp->retrans_out = 0;
}
} else {
/* We don't have a timestamp. Can only use
@@ -878,6 +876,8 @@
tcp_ack_packets_out(sk, tp);
} else {
tcp_clear_xmit_timer(sk, TIME_RETRANS);
+   tp->fackets_out = 0;
+   tp->retrans_out = 0;
}
 
flag &= (FLAG_DATA | FLAG_WIN_UPDATE);
--- linux/net/ipv4/tcp_output.c.orig Mon Apr  9 22:47:06 2001
+++ linux/net/ipv4/tcp_output.c Tue Apr 10 21:23:33 2001
@@ -546,6 +546,8 @@
 */
kfree_skb(next_skb);
sk->tp_pinfo.af_tcp.packets_out--;
+   if (sk->tp_pinfo.af_tcp.fackets_out)
+   sk->tp_pinfo.af_tcp.fackets_out--;
}
 }
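
A toy model of the tp->pending hunk in the patch above, as a sketch
only. The TIMER_* names stand in for the kernel's TIME_* constants, and
the rule in try_start_probe0() (probe0 is armed only while no timer is
marked pending) is an assumption about the surrounding 2.2 code that
the patch itself does not show.

```c
/* Toy stand-ins for the kernel's TIME_* timer identifiers. */
enum { TIMER_NONE = 0, TIMER_KEEPOPEN, TIMER_PROBE0 };

struct toy_tp {
    int pending;     /* which timer the socket believes is armed */
    int probes_out;
};

/* ACK processing before the patch: the keepalive marker is left behind. */
static void ack_before_patch(struct toy_tp *tp)
{
    if (tp->pending == TIMER_KEEPOPEN)
        tp->probes_out = 0;
}

/* ACK processing after the patch: the stale marker is cleared as well. */
static void ack_after_patch(struct toy_tp *tp)
{
    if (tp->pending == TIMER_KEEPOPEN) {
        tp->probes_out = 0;
        tp->pending = TIMER_NONE;
    }
}

/*
 * Assumed arming rule: the zero-window probe timer starts only when no
 * other timer is marked pending. Returns 1 if probe0 was armed.
 */
static int try_start_probe0(struct toy_tp *tp)
{
    if (tp->pending != TIMER_NONE)
        return 0;
    tp->pending = TIMER_PROBE0;
    return 1;
}
```

In this model a socket that has sent one keepalive can never arm probe0
with the old ACK handling, and can with the patched one.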
 



Re: [PATCH] Re: softirq buggy

2001-04-09 Thread kuznet

Hello!

> Btw, you don't schedule the ksoftirqd thread if do_softirq() returns
> from the 'if(in_interrupt())' check.

ksoftirqd will not be switched to before the first schedule()
or return from syscall, when softirqs will be processed in any case.
So a wakeup in this case would be a mistake.


> I assume that this is the most common case of delayed softirq
> processing:

Softirqs have the same latency guarantee as rt threads, so
this is not a problem at all.

The _real_ problem is softirqs generated from other softirqs: the
additional thread is made _not_ to speed up softirqs, but to _tame_
them (if I understood Andres's explanations correctly).

Alexey



Re: softirq buggy [Re: Serial port latency]

2001-04-08 Thread kuznet

Hello!

> But with a huge overhead. I'd prefer to call it directly from within the
> idle functions, the overhead of schedule is IMHO too high.


+   if (current->need_resched) {
+   return 0;

+   }
+   if (softirq_active(smp_processor_id()) & softirq_mask(smp_processor_id())) {
+   do_softirq();
+   return 0;
^
You return one value in both cases, and I decided it means "schedule". 8)
Apparently you meant to return 1 in the first case. 8)

But in that case it becomes wrong. do_softirq() can raise need_resched,
and moreover irqs arrive during it. The order of the checks should be
different.


BTW, about the overhead... I suspect it is _lower_ in the case
of schedule(). At least in the case of networking, where a softirq
most likely wakes some socket.

Alexey
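
One possible reading of "the order of the checks should be different",
sketched as a toy user-space model. struct toy_cpu, toy_do_softirq()
and the return convention (1 means "call schedule()") are all
assumptions of this sketch, and it does not model interrupts arriving
while softirqs run.

```c
struct toy_cpu {
    int need_resched;        /* a task wants the CPU                  */
    int softirq_pending;     /* softirq work is waiting               */
    int softirq_wakes_task;  /* running softirqs will wake some task  */
};

/* Toy softirq run: clears pending work and may raise need_resched. */
static void toy_do_softirq(struct toy_cpu *c)
{
    c->softirq_pending = 0;
    if (c->softirq_wakes_task)
        c->need_resched = 1;
}

/*
 * Run pending softirqs first and test need_resched afterwards, so that
 * a wakeup caused by the softirq itself is not missed.
 * Returns 1 if the idle loop must call schedule(), 0 if it may idle on.
 */
static int idle_must_schedule(struct toy_cpu *c)
{
    if (c->softirq_pending)
        toy_do_softirq(c);
    return c->need_resched != 0;
}
```

Testing need_resched before running the softirq would return "keep
idling" for a CPU whose softirq is about to wake a socket reader.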



Re: TCP stack misbehaviour?

2001-04-08 Thread kuznet

Hello!

> empty, except for occasional ACKs. The utilization of the channel is about 4%.

1. tcpdump is required.
2. exact version of the kernel used is required too.

Alexey



Re: new queuing discipline

2001-04-08 Thread kuznet

Hello!

> packet in the queue. No other conditions i found. But i need repeatedly test
> the top packet in the queue.
> 
> How to accomplish it?

Look into sch_tbf.c for example. Hint: timer.

Alexey
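
The timer hint can be sketched as a toy token bucket. This is not the
code from sch_tbf.c: struct toy_tbf and tbf_try_dequeue() are
hypothetical, and a real qdisc would arm a kernel timer with the
returned delay instead of handing it back to the caller.

```c
#include <assert.h>
#include <stdint.h>

struct toy_tbf {
    uint64_t tokens;  /* bytes that may be sent right now   */
    uint64_t rate;    /* refill rate in bytes per second    */
};

/*
 * Test the packet at the head of the queue.
 * Returns 0 if it may be sent now (tokens are charged); otherwise
 * returns the number of microseconds after which the head packet
 * should be tested again, i.e. the timeout to arm.
 */
static uint64_t tbf_try_dequeue(struct toy_tbf *q, uint64_t pkt_len)
{
    uint64_t missing;

    assert(q->rate > 0);
    if (q->tokens >= pkt_len) {
        q->tokens -= pkt_len;
        return 0;
    }
    missing = pkt_len - q->tokens;
    /* Round up so that the timer never fires too early. */
    return (missing * 1000000u + q->rate - 1) / q->rate;
}
```

With 500 tokens and a rate of 125000 bytes/s, a 1500-byte head packet
yields a delay of 8000 microseconds; the timer handler then retests
the same packet instead of the queue being polled repeatedly.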



Re: softirq buggy [Re: Serial port latency]

2001-04-08 Thread kuznet

Hello!

> + if (softirq_active(smp_processor_id()) & softirq_mask(smp_processor_id())) {
> + do_softirq();
> + return 0;

BTW you may delete do_softirq()... schedule() will call this.


> + *
> + * Isn't this identical to default_idle with the 'no-hlt' boot
> + * option? <[EMAIL PROTECTED]>

It seems it is not. need_resched=-1 avoids useless IPIs.

Alexey



Re: udp <-> tcp connect

2001-03-31 Thread kuznet

Hello!

> I want to bind to non-local IP and send/receive UDP packets.

This is impossible, apparently.


> but in tcp_v4_connect:
> tmp = ip_route_connect(&rt, nexthop, sk->saddr,
>   RT_TOS(sk->ip_tos)|RTO_CONN|sk->localroute, sk->bound_dev_if);
>   ^^^

And this is a __terrible__ bug. RTO_CONN cannot be set here.

Alexey



Re: IP layer bug?

2001-03-31 Thread kuznet

Hello!

> Hm. But comment in linux/skbuff.h says:

The comment is about a more difficult case: the transmit path,
where cb is used both by the top level protocol and by lower layers,
e.g. TCP -> IP -> device. cb is dirty from the moment of skb
creation in this case.

Also, note that the second sentence in the comment is obsolete.
Passing non-cloned skbs between layers is a strongly deprecated
practice (I hope it is not used in any place), and the cb of an skb
entering a lower layer is the property of that layer.

RX path is simpler: cb must be kept clean, that's all.

The general rule is to minimize redundant clearing of the area.

> Why not document it somewhere, so that others will not fall into the same trap?

Indeed. 8) You have gained experience which you expect to be useful
to people, so it is time to prepare a note recording it. 8)

Alexey



Re: IP layer bug?

2001-03-30 Thread kuznet

Hello!

>For now I workarounded it with filling skb->cb with zeroes before
>netif_rx(),

This is right. For other examples, look into the tunnels.

> but I believe it is a kludge and networking layer should be fixed instead.

No.

alloc_skb() creates an skb with a clean cb. ip_rcv() and other protocol
handlers do not redo this work. If a device uses cb internally, it must
clear it before handing the skb to netif_rx().

Alexey
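
The rule above, modelled in user space as a sketch only. struct toy_skb
and both helpers are hypothetical, and the 48-byte cb size is an
assumption of this sketch. In a real driver the memset() would sit
directly before the call to netif_rx().

```c
#include <string.h>

/* Toy stand-in for struct sk_buff: only the control buffer is modelled. */
struct toy_skb {
    char cb[48];
};

/* The device layer keeps private bookkeeping in cb while it owns the skb. */
static void toy_driver_use_cb(struct toy_skb *skb)
{
    memcpy(skb->cb, "driver-private", sizeof("driver-private"));
}

/*
 * Before the skb goes up the receive path, the device clears cb again,
 * because protocol handlers assume it is as clean as at allocation.
 */
static void toy_hand_to_stack(struct toy_skb *skb)
{
    memset(skb->cb, 0, sizeof(skb->cb));
    /* a real driver would call netif_rx(skb) here */
}
```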



Re: rsync over ssh on 2.4.2 to 2.2.18

2001-03-19 Thread kuznet

Hello!

> Well, since I moved the rsync to 5pm, and then back to 9pm, I haven't
> seen this problem - everything is again working as expected (touch wood)
> with 2.2.15pre13 and 2.4.0.
> 
> This is odd, since it wasn't a one-off problem, but something that happened
> each and every day of a particular week.  Anyway, if it starts happening
> again, I'll get a tcpdump of the session.

Well... I can reproduce this regardless of the day of the week now. :-(

If I understood Andrew's mail correctly, rsync freezes when a
large number of errors happen. In particular, here ssh always freezes
trying to write to stdout (pipe linked to rsync) some chunk of data (>64K)
consisting of reports of the sort "stat(usr/X11R6/bin/xmbind) : No such file"
(my /usr has a huge number of stray symlinks...). rsync does not read
this pipe, because it is trying to write some more data to a pipe, which
is the pipe linked to rsync stdin. Deadlock.

Note that with strace you do not see this write()! The write() is
interrupted by strace and you see a successful select(). You can see the
real place of the lockup via ps axl, or by starting strace before the
lockup happens.

Andrew mentioned one more common case in which a large number of errors
happens: wrong permissions on some target directory.

I have the impression that long ago I observed the same effect
when disk space at the target was exhausted.

Why do I tell this? Well, probably something changed on the fs
which you rsync, and rsync generates fewer errors.

Alexey
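
The mechanism behind the deadlock is that a pipe nobody reads accepts
only a bounded amount of data. A minimal sketch, using only POSIX
calls; fill_unread_pipe() is a hypothetical helper, and the write end
is made non-blocking so that the sketch reports the limit instead of
hanging the way ssh does.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Fill a pipe that nobody reads. Returns the number of bytes accepted
 * before write() would have blocked, or -1 on error.
 */
static long fill_unread_pipe(void)
{
    int fds[2];
    int flags;
    char chunk[4096] = { 0 };
    long total = 0;

    if (pipe(fds) < 0)
        return -1;
    flags = fcntl(fds[1], F_GETFL);
    if (flags < 0 || fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    for (;;) {
        ssize_t n = write(fds[1], chunk, sizeof(chunk));
        if (n < 0) {
            if (errno == EINTR)
                continue;
            /* EAGAIN: the pipe is full; a blocking writer sleeps here. */
            if (errno != EAGAIN && errno != EWOULDBLOCK)
                total = -1;
            break;
        }
        total += n;
    }
    close(fds[0]);
    close(fds[1]);
    return total;
}
```

If two processes each sit in such a blocked write() towards the other,
neither gets back to its read(), which is the lockup described above.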



Re: poll() behaves differently in Linux 2.4.1 vs. Linux 2.2.14 (POLLHUP)

2001-03-15 Thread kuznet

Hello!

> Sure, workarounds exist, but they just complicates
> things.

Working around --- what?

An example of an application hitting this case is enough to make
me agree completely.

But generally we are not going to match any os, or even ourselves of
yesterday or tomorrow, in the cases when behaviour is truly undefined
and the answer is meaningless. For me any solution, from returning 0
or returning POLLHUP to killing the offending application or generating
an answer using a random number generator, looks equally good, acceptable
and 100% compatible in this case. 8)

Alexey



Re: poll() behaves differently in Linux 2.4.1 vs. Linux 2.2.14 (POLLHUP)

2001-03-14 Thread kuznet

Hello!

> True, this behavior was changed from 2.2.x.  We now match the behavior
> of other svr4 systems, in particular Solaris.

Damn, we did not test the behaviour on an absolutely new, clean,
never-connected socket... Solaris really may return 0 on it.

However, on the other hand, the issue looks absolutely
academic and not related to practice in any way.

Alexey
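
The case under discussion can be probed with a few lines of C. This is
a sketch: poll_fresh_socket() is a hypothetical helper, and no expected
flags are stated because, as the thread says, the result differs
between systems and kernel versions.

```c
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * poll() a brand-new TCP socket that was never connected.
 * Returns the revents bits reported for it, or -1 on error.
 */
static int poll_fresh_socket(void)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = socket(AF_INET, SOCK_STREAM, 0);
    if (pfd.fd < 0)
        return -1;
    pfd.events = POLLIN | POLLOUT;
    pfd.revents = 0;

    rc = poll(&pfd, 1, 0);   /* timeout 0: report the state, do not wait */
    close(pfd.fd);
    return rc < 0 ? -1 : pfd.revents;
}
```

A caller can test the result against POLLHUP to see which of the two
behaviours the running system shows.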



Re: Feedback for fastselect and one-copy-pipe

2001-03-12 Thread kuznet

Hello!

> freebsd-4.0 doesn't use direct transfers for PAGE_SIZE'd pipe write()s:
> it uses  MINDIRECT=8192.

I see.

> (and PIPE_BUF is 512, so 4096 was possible for
> them)

8) I see.

Thank you for your patience. 8)

Alexey



Re: Feedback for fastselect and one-copy-pipe

2001-03-12 Thread kuznet

Hello!

> freebsd

Very funny, the idea is borrowed from there.

As you could understand, your patch kills it. PAGE_SIZE is one of the most
frequently used transfer units.

Alexey



Re: Feedback for fastselect and one-copy-pipe

2001-03-12 Thread kuznet

Hello!

> It returns immediately on all unix platforms I tested

I see. It is an essential point. PAGE_SIZE was a really bad threshold value.
Sigh and alas.

Alexey


PS BTW "all unix" is unlikely to include freebsd. 8)



Re: Feedback for fastselect and one-copy-pipe

2001-03-12 Thread kuznet

Hello!

> * davem's patch breaks apps that assume that write(,PIPE_BUF) after
> poll(POLLOUT) never blocks, even for blocking pipes.

Pardon, but PIPE_BUF <= PAGE_SIZE still, so these fears are groundless.

Alexey



Re: Incoming TCP TOS: A simple question, I would have thought...

2001-03-07 Thread kuznet

Hello!

> I've scrolled through various code in net/ipv4, and I can't see how to query 
> the TOS of an incoming TCP stream (or at the least, the TOS of the SYN which
> initiated the connection).

No way. Formally it is IP_RECVTOS, followed by IP_PKTOPTIONS.
But getting TOS via IP_PKTOPTIONS is not implemented, only ttl.
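
What "formally" would look like from user space can be sketched as below.
This is an illustration under assumptions, not something given in the thread:
enable_recvtos() and find_tos() are hypothetical helper names, the code is
Linux-specific, and whether a given kernel actually puts an IP_TOS entry into
the IP_PKTOPTIONS buffer is exactly the point in question above.

```c
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Ask the kernel to record the TOS byte of incoming packets. */
int enable_recvtos(int fd)
{
    int on = 1;
    return setsockopt(fd, IPPROTO_IP, IP_RECVTOS, &on, sizeof on);
}

/* Scan a control-message buffer for an IP_TOS entry.
 * Returns the TOS value (0..255), or -1 if no such entry is present. */
int find_tos(void *control, size_t controllen)
{
    struct msghdr mh;
    struct cmsghdr *c;

    memset(&mh, 0, sizeof mh);
    mh.msg_control = control;
    mh.msg_controllen = controllen;

    for (c = CMSG_FIRSTHDR(&mh); c != NULL; c = CMSG_NXTHDR(&mh, c)) {
        if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_TOS) {
            unsigned char tos;
            memcpy(&tos, CMSG_DATA(c), sizeof tos);
            return tos;
        }
    }
    return -1;
}
```

On a connected TCP socket the buffer would presumably be obtained with
getsockopt(fd, IPPROTO_IP, IP_PKTOPTIONS, buf, &len) after enable_recvtos(fd).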

Alexey



Re: Inadequate documentation: sockets

2001-03-06 Thread kuznet

Hello!

> The manual specifies the following flag to be returned by the
> kernel
> > #define POLLHUP 0x0010/* Hung up */
> 
> Hanging up is ambiguous. Does it mean that the client is dead,
> that he closed his end of the socket, or that he shut down one or
> both directions of the data flow? The following man page clears
> this up, but I think the following information would best be
> placed in man poll.

The information is not quite correct.

POLLHUP is really ambiguous in the case of full-duplex connections
with bi-directional close, of the TCP sort. However, the invariants are:

* POLLHUP is set when the connection is closed in both directions.
* POLLHUP implies that the descriptor is not writable and any write
  will cause an error and, probably, SIGPIPE.
* POLLHUP is _not_ set while the descriptor is writable (i.e. when the
  connection is shut down only in the write direction by the remote).

Standards require that POLLHUP and POLLOUT never happen
together; however, linux does this, which is formally a bug.
However, it is not fixed, assuming that POLLHUP overrides POLLOUT.


> Finally I'm left with my original problem: how am I supposed to
> detect a close or a shutdown from the peer?

By EOF. No other way exists. POLLHUP is a local condition: only
the local side can close the connection in the write direction. The
exception is an abort (reset) initiated by the peer.
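
A minimal sketch of "detect it by EOF", assuming a stream socket and a
system that supports MSG_PEEK and MSG_DONTWAIT (Linux does). The function
name peer_closed() is hypothetical; it peeks at the receive queue without
consuming data and treats a zero-length read as the peer's close.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Returns 1 if the peer has closed its side (read side at EOF),
 * 0 if the connection is still open (data pending or nothing to read),
 * -1 on error, e.g. a reset initiated by the peer. */
int peer_closed(int fd)
{
    char byte;
    ssize_t n = recv(fd, &byte, 1, MSG_PEEK | MSG_DONTWAIT);

    if (n == 0)
        return 1;               /* EOF: peer closed or shut down writing */
    if (n > 0)
        return 0;               /* data pending, not at EOF yet */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 0;               /* nothing to read, still open */
    return -1;                  /* e.g. ECONNRESET: abort by peer */
}
```

Note that pending data hides the EOF until it has been read, so the check
only reports a close once the receive queue is drained.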


> by addressing me to more adequate documentation.

UNIX98 and Austin draft pages. They are very ambiguous though.

Alexey



Re: Another rsync over ssh hang (repeatable, with 2.4.1 on both ends)

2001-03-03 Thread kuznet

Hello!

> this kernel was compiled with GCC 2.95.2,

This is a hint.

Could you do the following things:

1. Disassemble tcp_poll() (the easiest way is to run gdb on vmlinux,
   say x/i tcp_poll and hold enter pressed long enough, copying the screen
   to a file) and send the result to me.
2. Apply the enclosed patchlet.
3. If 2 does not change anything, recompile with egcs-1.1.2.

Alexey



--- ../vger3-010223/linux/net/ipv4/tcp.c	Fri Feb 23 21:28:34 2001
+++ linux/net/ipv4/tcp.c	Sat Mar  3 18:37:22 2001
@@ -442,6 +443,8 @@
 			set_bit(SOCK_ASYNC_NOSPACE, &sk->socket->flags);
 			set_bit(SOCK_NOSPACE, &sk->socket->flags);
 
+			barrier();
+
 			/* Race breaker. If space is freed after
 			 * wspace test but before the flags are set,
 			 * IO signal will be lost.
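
For readers unfamiliar with it, barrier() in the patchlet is a compiler-only
barrier. A user-space sketch of the pattern the comment describes (set the
"no space" flag, then repeat the space test) might look like the following.
The macro is the usual GCC idiom; struct fake_sock and poll_writable() are
hypothetical stand-ins for illustration, not the kernel's code.

```c
/* Compiler barrier (GCC/Clang idiom): the compiler may not reorder or
 * cache memory accesses across this point.  No CPU instruction is
 * emitted. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical stand-in for the socket state examined by tcp_poll(). */
struct fake_sock {
    int wspace;      /* free space in the send buffer */
    int min_wspace;  /* threshold for reporting "writable" */
    int nospace;     /* set when a poller found no space */
};

/* Returns 1 if writable.  The flag is set first and the space test is
 * repeated afterwards, so space freed between the first test and the
 * flag update is still noticed. */
int poll_writable(struct fake_sock *sk)
{
    if (sk->wspace >= sk->min_wspace)
        return 1;

    sk->nospace = 1;
    barrier();       /* keep the re-test after the flag store */

    return sk->wspace >= sk->min_wspace;
}
```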



Re: Another rsync over ssh hang (repeatable, with 2.4.1 on both ends)

2001-03-02 Thread kuznet

Hello!

> same means its not the same bug?

It is the same, I think.


> If you still insist that it is purely a 2.2.15pre13 bug

I never said this. I said that your strace is _wrong_, how can I be
sure that tcpdump is not wrong too? You could understand this. 8)


> together to put 2.2.18 on this machine.  I can't guarantee when I'll
> be able to do this though.

You planned to make more accurate strace on Monday, if I remember correctly.
Now it is not necessary, Scott's one is enough to understand that
some problem exists and cannot be explained by buggy 2.2.15.


> PS, could you please spell my name correctly?

I bring apologies.

Alexey



Re: Another rsync over ssh hang (repeatable, with 2.4.1 on both ends)

2001-03-02 Thread kuznet

Hello!

> I've also reported

The report by Scott Laird is sane unlike your one.
It can be explained by bug rather than only by poltergeist. 8)


> Thanks for confirming that 2.2.15pre13 is not the cause.

Russel, you are warned that kernels<2.2.17 and rsync is an incompatible
combination.

Alexey



Re: What is 2.4 Linux networking performance like compared to BSD?

2001-03-01 Thread kuznet

Hello!

> They know that iMimic's polymix performance on Linux 2.2.* is half what it is on
> BSD. 

What is "iMimic's polymix"? I am almost sure, it is simply buggy
and was not _debugged_ under linux.

Alexey



Re: rsync over ssh on 2.4.2 to 2.2.18

2001-02-28 Thread kuznet

Hello!

> I'll see if I can strace it from the start until it hangs tomorrow.

Please...

Also, try to make binary tcpdump.


> I was running at one point a 2.4.0-test kernel, but I didn't see these

Yes, it did not result in full stall. Lost wakeups were recovered
f.e. by any keyboard activity. 8)

Wow! 2.2.15? rsync surely does not work with 2.2.15. The first 2.2 fixing
the same bugs which solaris has now was 2.2.17.

Alexey



Re: rsync over ssh on 2.4.2 to 2.2.18

2001-02-28 Thread kuznet

Hello!

> I've seen hanging rsync over ssh more than once, while sending much data
> from an x86 running Linux (late 2.3.x) to Sparc/Solaris2.5.1

I remember this report of yours. However, recent news forces me to suspect
that the reason was still in Solaris. Actually, if you send a tcpdump of
the failed session, this question can be answered.


Also REMEMBER!

rsync has __no__ chance to work if one of the sides is Linux-2.2 before 2.2.17
or the Solaris of today (sunos-2.5.1 is old enough; probably it was still right,
unlike subsequent kernels). These stacks have a coinciding set of fatal bugs,
which prevent transmits with a closed window.

To use rsync it is necessary to upgrade to >=2.2.17.

Alexey



Re: rsync over ssh on 2.4.2 to 2.2.18

2001-02-27 Thread kuznet

Hello!

> netstat on isdn-gw shows the following:
> 
>   Proto Recv-Q Send-Q Local Address           Foreign Address         State
>   tcp    72868      0 isdn-gw.piltdown.a:1023 pilt-gw.piltdown.at:ssh ESTABLISHED
plus
> select(4, [3], [3], NULL, NULL) = 2 (in [3], out [3])


> Maybe there is a race condition or missing wakeup in the TCP code?

Moreover, not just _one_ wakeup is missing. At least two, because
wakeups for read and write are separate and you are stuck in both directions.
8)8)

Well, if it were one I would start to dig inside tcp instantly.
But as soon as two of them are missing, I have to suspect wake_up itself.
At least, we had such bugs there until 2.4.0.

Alexey



Re: New net features for added performance

2001-02-27 Thread kuznet

Hello!

> > 3) Enforce correct usage of it in all the networking :-)
> 
> ,) -- the tricky part.

No tricks, IP[v6] is already enforced to be clever; all the rest are free
to do this, if they desire. And btw, the driver need not parse anything
but its internal stuff, and even aligning the eth II header can be done
in eth_type_trans().

Actually, it is possible now without changing anything but the driver.
Fortunately, I removed the stupid tulip from my alpha, so that I have
no impetus to try this myself. 8)

Alexey



Re: possible bug x86 2.4.2 SMP in IP receive stack

2001-02-27 Thread kuznet

Hello!

> Feb 23 12:42:30 rcc2 kernel: Warning: kfree_skb passed an skb still on a list (from c01f58dc).

BTW, here is a didactic example of a bug which results in similar behaviour.

Alexey


> From: [EMAIL PROTECTED] (Andrew Morton)
> Subject: Re: Failed assertion
> Date: 27 Feb 2001 04:15:01 +0300
> 
> "David S. Miller" wrote:
> > 
> > Ralf Baechle writes:
> >  > No backtrace, the machine did continue as you'd suspect after a print.
> >  > The machine is a dual CPU Origin 200 with an IOC3 NIC.
> > 
> > What is your current kernel based upon, some older 2.4.x or
> > even 2.3.x variant?  Or is it sync'd to current?
> 
> Could this be a driver problem?  This code:
> 
>         netif_rx(skb);
> 
>         ip->rx_skbs[rx_entry] = NULL;   /* Poison  */
> 
>         new_skb = ioc3_alloc_skb(RX_BUF_ALLOC_SIZE, GFP_ATOMIC);
>         if (!new_skb) {
>                 /* Ouch, drop packet and just recycle packet
>                    to keep the ring filled.  */
>                 ip->stats.rx_dropped++;
>                 new_skb = skb;
>                 goto next;
>         }
> 
> looks scary.  We've passed an skb to the network stack,
> but we can continue to make it available to the device
> driver at the same time.
> 
> I'd suggest a printk() in there, plus perhaps do the
> alloc_skb _before_ the netif_rx().  Don't pass the skb
> to the stack if it is to be recycled.
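
The ordering suggested in the quoted mail (allocate the replacement first,
hand the old buffer to the stack only if that succeeded) can be modelled in
plain C as below. This is a toy model under stated assumptions: struct buf
stands in for an skb, deliver() for netif_rx(), and rx_one() with its
alloc_ok switch is a hypothetical name used only to simulate allocation
failure. It is not the ioc3 driver's code.

```c
#include <stdlib.h>

/* Toy stand-ins: 'struct buf' plays the role of an skb and deliver()
 * plays the role of netif_rx(), which takes ownership of the buffer. */
struct buf { int len; };

static int delivered;                 /* counts buffers handed off */
static void deliver(struct buf *b) { delivered++; free(b); }

/* Refill ring[slot].  The replacement is allocated *before* the old
 * buffer is handed off; if allocation fails, the old buffer stays in
 * the ring (packet dropped) and is never passed up the stack.
 * Returns 1 if the packet was delivered, 0 if it was dropped. */
int rx_one(struct buf **ring, int slot, int alloc_ok)
{
    struct buf *old = ring[slot];
    struct buf *new_buf = alloc_ok ? malloc(sizeof *new_buf) : NULL;

    if (!new_buf)
        return 0;         /* recycle: the ring still owns 'old' */

    new_buf->len = 0;
    ring[slot] = new_buf;
    deliver(old);         /* only now give up ownership of 'old' */
    return 1;
}
```

With this ordering a buffer is owned either by the ring or by the stack,
never by both at once.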



Re: Very high bandwith packet based interface and performance problems

2001-02-23 Thread kuznet

Hello!

> > Yes its a SHOULD in RFC1122, but in any normal environment pretty much a
> > must and I know of no stack significantly violating it.
> 
> I didn't know there was such a thing as a normal environment :)

Jokes apart, such "normal" environments are rare today.

From tcpdumps it is clear that win2000 does not ack every other mss.
It can ack once per window at high load. I have seen the same behaviour
from solaris. freebsd-4.x surely does not ack every second mss
(this is from the source code), which is probably a bug (at least, it stops
acking at all as soon as MSG_WAITALL is used. 8))

Acking every second mss is required to do slow start more or less
quickly. As soon as the window is full, such acks are useless, so win2000
is fully right and, in fact, optimal.

Alexey



Re: 2.4.1 under heavy network load - more info

2001-02-21 Thread kuznet

Hello!

> OK! I actually expected 2.4 to be somewhat selftuning.

Defaults for these numbers (X,Y,Z) are very conservative.


> Interesting you say that, I looked at the logs and I see over 5000 sockets
> used, does'nt look peaceful to me. But you are absolutely right about the
> orphans. The error about "too many orphans" must be wrong and is triggered
> by some other condition. Look at the output from the debug printk I've
> added:
> 
> Feb 18 15:43:50 mcquack kernel: TCP: too many of orphaned sockets

Well, the message is not accurate. It refuses to hold this particular
orphan because it feels that too much memory is consumed.
Change number Z and the message will disappear.

Poor orphans are the first victims, because they have nobody
to take care of them but the kernel. And the kernel is a harsh parent. 8)


> I raised the numbers a little bit more. Now with 128MB RAM in the box we can
> handle a maximum of 7000 connections. No more because we start to swap too
> much.

Really? Well, it is unlikely to have anything to do with net.
Your dumps show that at 6000 connections networking ate less
than 10MB of memory. Probably, swapping is mistuned.


> Feb 21 10:43:41 mcquack kernel: KERNEL: assertion (tp->lost_out == 0) failed
> at tcp_input.c(1202):tcp_remove_reno_sacks

This is also debugging. Harmless.


> 2) The error about "too many orphans" is bogus?

Yes. It is a sort of disinformation. It really means that
accounting detected that the configured limits were exceeded.


> 3) I will get a lot of debug crap i syslog

It will disappear as soon as debugging is disabled, i.e. when
the kernel enters distributions, I guess.

If I were responsible for this, I would not kill them.
The more messages, the better. Otherwise you would have
nothing to report and would not even notice that something is wrong. 8)


> This happened once under very heavy load (8000+ connections) and I have been
> unable to reproduce.

Probably this has nothing to do with tcp, but is explained by some
vm failure, something like the oom killer.

Alexey



Re: 2.4.1 under heavy network load - more info

2001-02-20 Thread kuznet

Hello!

> of errors a bit but I'm not sure I fully understand the implications of
> doing so.

As long as these numbers do not exceed the total amount of RAM, this is exactly
the action required in this case.

Dumps, which you sent to me, show nothing pathological. Actually,
they are made in some period of full peace: only 47 orphans and
only about 10MB of accounted memory.


> echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout

This is not essential with 2.4. In 2.4 this state does not grab any essential
resources.

> echo 0 > /proc/sys/net/ipv4/tcp_timestamps
> echo 0 > /proc/sys/net/ipv4/tcp_sack

Why?


> echo "3072 3584 6144" > /proc/sys/net/ipv4/tcp_mem

If you still have problems with orphans, you should raise these
numbers. Extreme settings are something like:

Z=total amount of RAM in pages
Y=something < Z
X=something < Y
echo "$X $Y $Z" > /proc/sys/net/ipv4/tcp_mem

Set them to the maximum and, if the messages do not disappear completely,
decrease them to tougher limits.
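
A minimal sketch of how such a triple could be computed, assuming Z is the total amount of RAM in pages and X < Y < Z. The function name tcp_mem_extreme, the default 4096-byte page size, and the 1/2 and 3/4 ratios are illustrative assumptions, not values from this thread. The function only prints the numbers; nothing is written to /proc.

```shell
# Sketch only: derive an "extreme" tcp_mem triple from a RAM size.
# Usage: tcp_mem_extreme TOTAL_KB [PAGE_BYTES]  -> prints "X Y Z" in pages.
# The 1/2 and 3/4 ratios are hypothetical choices satisfying X < Y < Z.
tcp_mem_extreme() {
    total_kb=$1
    page_bytes=${2:-4096}
    Z=$(( total_kb * 1024 / page_bytes ))   # total amount of RAM in pages
    Y=$(( Z * 3 / 4 ))                      # something < Z
    X=$(( Z / 2 ))                          # something < Y
    echo "$X $Y $Z"
}

# Example for a 128MB box with 4096-byte pages.
tcp_mem_extreme 131072   # prints: 16384 24576 32768
```

As root, the result could then be applied with `tcp_mem_extreme 131072 > /proc/sys/net/ipv4/tcp_mem`.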


> Feb 18 15:05:44 mcquack kernel: sending pkt_too_big to self

Normal. Debugging.


> Feb 18 15:24:07 mcquack kernel: TCP: peer xx.xx.xx.xx:1084/7000 shrinks
> window 210605:1072:2106779313. Bad, what else can I say?

Debugging too.

> Feb 18 15:42:06 mcquack kernel: TCP: dbg sk->wmem_queued 5664
> tcp_orphan_count 99 tcp_memory_allocated 6145

Number Z is exceeded: newly _closed_ sockets will be aborted, and the
stack has entered a state where it moderates its appetite.

The dump you sent me, and the further messages in the logs,
show that it succeeded and converged to a normal state.
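
To make the comparison concrete, here is a small sketch that classifies an accounted page count against a tcp_mem triple. The function name tcp_mem_state and the three labels are hypothetical. The middle "pressure" case reflects the documented meaning of the second tcp_mem value and is a simplification of what the kernel does.

```shell
# Sketch only: compare tcp_memory_allocated (in pages) with "X Y Z".
# Usage: tcp_mem_state ALLOCATED "X Y Z"
tcp_mem_state() {
    allocated=$1
    set -- $2                 # split the triple: $1=X, $2=Y, $3=Z
    if [ "$allocated" -gt "$3" ]; then
        echo "over Z"         # the case in the log line above
    elif [ "$allocated" -gt "$2" ]; then
        echo "pressure"
    else
        echo "normal"
    fi
}

# Values from the quoted log and the quoted tcp_mem setting.
tcp_mem_state 6145 "3072 3584 6144"   # prints: over Z
```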


> Please let me know if I can provide more debug info or test something!

Actually, the only dubious place in your original report was something
about the behaviour of ssh. ssh surely cannot be affected by this effect.
Could you elaborate on this? What kind of problem exactly? Maybe some
tcpdump, if the problem is reproducible.

Alexey



Re: MTU and 2.4.x kernel

2001-02-19 Thread kuznet

Hello!

> We are implementing an IP stack.

Alan, please, tell me what is wrong. And we will repair this.

The implementation follows the RFCs and even relaxes their requirements
in cases where they are far from reality.

Alexey



Re: SO_SNDTIMEO: 2.4 kernel bugs

2001-02-19 Thread kuznet

Hello!

> You are right - our sendfile() implementation is broken. I have fixed it

Thank you!


> Investigation shows that the Linux network layer is behaving oddly. It
> seems that we are writing 4096 bytes to a socket. This proceeds in 4096
> byte chunks until the send buffer on the socket is full, and a 4096 byte
> write blocks. This blocking write is eventually interrupted by the
> timeout, and the write call returns.. wait for it.. 4096! This suggests
> there was socket space after all, and the call should not have blocked.

Wakeup does not happen until _enough_ space (1/3 of sndbuf) is released
in sndbuf, otherwise you will overschedule. So, as soon as
write() goes to sleep, it will sleep waiting until 1/3 is released.
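
A toy model of this rule, assuming the wakeup condition is simply "free space is at least 1/3 of sndbuf". The function name writer_woken and the 65536-byte buffer size are illustrative assumptions, and the real kernel check is not reproduced here.

```shell
# Sketch only: would a sleeping writer be woken?
# Usage: writer_woken SNDBUF QUEUED   (both in bytes)
writer_woken() {
    sndbuf=$1
    queued=$2
    free=$(( sndbuf - queued ))
    # Wake only once at least 1/3 of the send buffer has been released.
    if [ $(( free * 3 )) -ge "$sndbuf" ]; then
        echo wake
    else
        echo sleep
    fi
}

writer_woken 65536 60000   # 5536 bytes free: room for 4096, but below 1/3
writer_woken 65536 40000   # 25536 bytes free: above 1/3
```

The first call prints "sleep" even though a 4096-byte write would fit, which is the situation described in the report: the space exists, but the writer is not woken for it.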

If it is interrupted, it uses all the released space immediately before exiting.
Again, this is to get more done in this context. This may even be wrong and,
probably, we should return instantly with -EAGAIN/-EINTR/partial count, but that
is most likely suboptimal (though I have already changed this to instant return).
But this does not look essential from the caller's viewpoint, except for
sendfile() of course. 8)

Alexey





