Re: TCP stalls in current git, possibly splice related

2007-07-19 Thread Johannes Berg
On Wed, 2007-07-18 at 12:58 +0200, Johannes Berg wrote:

> Yeah, I only need to try to remember when I last *didn't* see the
> problem, and hope that my analysis about it being on the receiver side
> is correct :)

Hm. I went back to 2.6.20 on the receiver side and still saw it. Maybe I
need to go back further?

With strace (the synergyc process) and tcpdump (on the interface) I
continually see packets coming in but the read sometimes pauses for 8
seconds... Not sure why. Very weird, but right now I don't have enough
time for investigating.

johannes


signature.asc
Description: This is a digitally signed message part


Re: TCP stalls in current git, possibly splice related

2007-07-18 Thread Johannes Berg
On Wed, 2007-07-18 at 12:49 +0200, Jens Axboe wrote:

> OK, then we can put splice off the hook at least :-)

:)

> If it's easily reproducible (and it sounds like it), then a git bisect
> might be the easiest way forward.

Yeah, I only need to try to remember when I last *didn't* see the
problem, and hope that my analysis about it being on the receiver side
is correct :)

johannes


signature.asc
Description: This is a digitally signed message part


Re: TCP stalls in current git, possibly splice related

2007-07-18 Thread Jens Axboe
On Wed, Jul 18 2007, Johannes Berg wrote:
> On Mon, 2007-07-16 at 14:02 +0200, Jens Axboe wrote:
> 
> > Yep, it's a sender thing, so upgrading the receiver will not change
> > anything.
> 
> Ok, I upgraded, but that didn't help. And in fact, I don't see how it
> could have since synergy doesn't use splice or sendfile. I should've
> thought of that right away, sorry.
> 
> It seems that packets are actually coming in during the time that my
> mouse hangs though (ran wireshark in parallel and saw no pauses in the
> timeline.) Hence, it actually seems to be on the receiver side, and
> running the synergy client under strace reveals that during the time my
> mouse hangs it's in poll() waiting for input on the tcp socket. sysrg-t
> doesn't show anything useful, it's just scheduling waiting for data.
> According to wireshark data is sent, but it never shows up at the
> application layer.

OK, then we can put splice off the hook at least :-)

If it's easily reproducible (and it sounds like it), then a git bisect
might be the easiest way forward.

-- 
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TCP stalls in current git, possibly splice related

2007-07-18 Thread Johannes Berg
On Mon, 2007-07-16 at 14:02 +0200, Jens Axboe wrote:

> Yep, it's a sender thing, so upgrading the receiver will not change
> anything.

Ok, I upgraded, but that didn't help. And in fact, I don't see how it
could have since synergy doesn't use splice or sendfile. I should've
thought of that right away, sorry.

It seems that packets are actually coming in during the time that my
mouse hangs though (ran wireshark in parallel and saw no pauses in the
timeline.) Hence, it actually seems to be on the receiver side, and
running the synergy client under strace reveals that during the time my
mouse hangs it's in poll() waiting for input on the tcp socket. sysrg-t
doesn't show anything useful, it's just scheduling waiting for data.
According to wireshark data is sent, but it never shows up at the
application layer.

johannes


signature.asc
Description: This is a digitally signed message part


Re: TCP stalls in current git, possibly splice related

2007-07-16 Thread Jens Axboe
On Mon, Jul 16 2007, Johannes Berg wrote:
> On Fri, 2007-07-13 at 13:05 +0200, Jens Axboe wrote:
> > On Fri, Jul 13 2007, Johannes Berg wrote:
> > > On Thu, 2007-07-12 at 16:12 -0400, James Morris wrote:
> > > > I'm seeing TCP connection stalls with current git, and a bisect found 
> > > > the 
> > > > following as a possible cause:
> > > 
> > > I'm also seeing stalls with synergy.
> > 
> > Please try the below.
> 
> I usually see stalls sending data from machine A to machine B. I just
> upgraded machine B to commit 8d9107e8c50e1c4ff43c91c8841805833f3ecfb9
> (Fri Jul 13 16:53:18 2007 -0700) and the tree includes this patch. This
> did not, however, fix the problem so I guess the problem is on the
> sender side. Due to wireless-dev lagging behind I there still have
> 2.6.22-something where this patch doesn't apply. I hope John will merge
> again soon and then I'll report back whether I still see the stalling
> behaviour with a current tree that includes this patch.

Yep, it's a sender thing, so upgrading the receiver will not change
anything.

-- 
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TCP stalls in current git, possibly splice related

2007-07-16 Thread Johannes Berg
On Fri, 2007-07-13 at 13:05 +0200, Jens Axboe wrote:
> On Fri, Jul 13 2007, Johannes Berg wrote:
> > On Thu, 2007-07-12 at 16:12 -0400, James Morris wrote:
> > > I'm seeing TCP connection stalls with current git, and a bisect found the 
> > > following as a possible cause:
> > 
> > I'm also seeing stalls with synergy.
> 
> Please try the below.

I usually see stalls sending data from machine A to machine B. I just
upgraded machine B to commit 8d9107e8c50e1c4ff43c91c8841805833f3ecfb9
(Fri Jul 13 16:53:18 2007 -0700) and the tree includes this patch. This
did not, however, fix the problem so I guess the problem is on the
sender side. Due to wireless-dev lagging behind I there still have
2.6.22-something where this patch doesn't apply. I hope John will merge
again soon and then I'll report back whether I still see the stalling
behaviour with a current tree that includes this patch.

johannes


signature.asc
Description: This is a digitally signed message part


Re: TCP stalls in current git, possibly splice related

2007-07-13 Thread Johannes Berg
On Fri, 2007-07-13 at 13:05 +0200, Jens Axboe wrote:
> On Fri, Jul 13 2007, Johannes Berg wrote:
> > On Thu, 2007-07-12 at 16:12 -0400, James Morris wrote:
> > > I'm seeing TCP connection stalls with current git, and a bisect found the 
> > > following as a possible cause:
> > 
> > I'm also seeing stalls with synergy.
> 
> Please try the below.

Thanks. I already had your mail in the other part of the thread marked,
I'll give it a try on Monday or Tuesday; I'm away from the setup where I
always see the problem right now.

johannes


signature.asc
Description: This is a digitally signed message part


Re: TCP stalls in current git, possibly splice related

2007-07-13 Thread James Morris
On Fri, 13 Jul 2007, Jens Axboe wrote:

> On Fri, Jul 13 2007, Johannes Berg wrote:
> > On Thu, 2007-07-12 at 16:12 -0400, James Morris wrote:
> > > I'm seeing TCP connection stalls with current git, and a bisect found the 
> > > following as a possible cause:
> > 
> > I'm also seeing stalls with synergy.
> 
> Please try the below.

Works for me.

-- 
James Morris
<[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TCP stalls in current git, possibly splice related

2007-07-13 Thread Jens Axboe
On Fri, Jul 13 2007, Johannes Berg wrote:
> On Thu, 2007-07-12 at 16:12 -0400, James Morris wrote:
> > I'm seeing TCP connection stalls with current git, and a bisect found the 
> > following as a possible cause:
> 
> I'm also seeing stalls with synergy.

Please try the below.

diff --git a/fs/splice.c b/fs/splice.c
index ed2ce99..92646aa 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -491,7 +491,7 @@ ssize_t generic_file_splice_read(struct file *in, loff_t 
*ppos,
 
ret = 0;
spliced = 0;
-   while (len) {
+   while (len && !spliced) {
ret = __generic_file_splice_read(in, ppos, pipe, len, flags);
 
if (ret < 0)
@@ -1051,15 +1051,10 @@ ssize_t splice_direct_to_actor(struct file *in, struct 
splice_desc *sd,
sd->flags &= ~SPLICE_F_NONBLOCK;
 
while (len) {
-   size_t read_len, max_read_len;
-
-   /*
-* Do at most PIPE_BUFFERS pages worth of transfer:
-*/
-   max_read_len = min(len, (size_t)(PIPE_BUFFERS*PAGE_SIZE));
+   size_t read_len;
 
-   ret = do_splice_to(in, &sd->pos, pipe, max_read_len, flags);
-   if (unlikely(ret < 0))
+   ret = do_splice_to(in, &sd->pos, pipe, len, flags);
+   if (unlikely(ret <= 0))
goto out_release;
 
read_len = ret;
@@ -1071,26 +1066,17 @@ ssize_t splice_direct_to_actor(struct file *in, struct 
splice_desc *sd,
 * could get stuck data in the internal pipe:
 */
ret = actor(pipe, sd);
-   if (unlikely(ret < 0))
+   if (unlikely(ret <= 0))
goto out_release;
 
bytes += ret;
len -= ret;
 
-   /*
-* In nonblocking mode, if we got back a short read then
-* that was due to either an IO error or due to the
-* pagecache entry not being there. In the IO error case
-* the _next_ splice attempt will produce a clean IO error
-* return value (not a short read), so in both cases it's
-* correct to break out of the loop here:
-*/
-   if ((flags & SPLICE_F_NONBLOCK) && (read_len < max_read_len))
-   break;
+   if (ret < read_len)
+   goto out_release;
}
 
pipe->nrbufs = pipe->curbuf = 0;
-
return bytes;
 
 out_release:
@@ -1152,10 +1138,12 @@ long do_splice_direct(struct file *in, loff_t *ppos, 
struct file *out,
.pos= *ppos,
.u.file = out,
};
-   size_t ret;
+   long ret;
 
ret = splice_direct_to_actor(in, &sd, direct_splice_actor);
-   *ppos = sd.pos;
+   if (ret > 0)
+   *ppos += ret;
+
return ret;
 }
 

-- 
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TCP stalls in current git, possibly splice related

2007-07-13 Thread Johannes Berg
On Thu, 2007-07-12 at 16:12 -0400, James Morris wrote:
> I'm seeing TCP connection stalls with current git, and a bisect found the 
> following as a possible cause:

I'm also seeing stalls with synergy.

johannes


signature.asc
Description: This is a digitally signed message part


Re: TCP stalls in current git, possibly splice related

2007-07-13 Thread Jens Axboe
On Fri, Jul 13 2007, Jens Axboe wrote:
> On Fri, Jul 13 2007, Jens Axboe wrote:
> > On Thu, Jul 12 2007, James Morris wrote:
> > > On Thu, 12 Jul 2007, David Miller wrote:
> > > 
> > > > From: James Morris <[EMAIL PROTECTED]>
> > > > Date: Thu, 12 Jul 2007 16:12:25 -0400 (EDT)
> > > > 
> > > > > I'm seeing TCP connection stalls with current git, and a bisect found 
> > > > > the 
> > > > > following as a possible cause:
> > > > 
> > > > To add to this James is seeing this with distcc I believe.
> > > 
> > > Correct.
> > 
> > I'll try and reproduce.
> 
> You didn't happen to get a sysrq-t backtrace of that distcc being hung,
> did you?

Does this work for you?

diff --git a/fs/splice.c b/fs/splice.c
index ed2ce99..92646aa 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -491,7 +491,7 @@ ssize_t generic_file_splice_read(struct file *in, loff_t 
*ppos,
 
ret = 0;
spliced = 0;
-   while (len) {
+   while (len && !spliced) {
ret = __generic_file_splice_read(in, ppos, pipe, len, flags);
 
if (ret < 0)
@@ -1051,15 +1051,10 @@ ssize_t splice_direct_to_actor(struct file *in, struct 
splice_desc *sd,
sd->flags &= ~SPLICE_F_NONBLOCK;
 
while (len) {
-   size_t read_len, max_read_len;
-
-   /*
-* Do at most PIPE_BUFFERS pages worth of transfer:
-*/
-   max_read_len = min(len, (size_t)(PIPE_BUFFERS*PAGE_SIZE));
+   size_t read_len;
 
-   ret = do_splice_to(in, &sd->pos, pipe, max_read_len, flags);
-   if (unlikely(ret < 0))
+   ret = do_splice_to(in, &sd->pos, pipe, len, flags);
+   if (unlikely(ret <= 0))
goto out_release;
 
read_len = ret;
@@ -1071,26 +1066,17 @@ ssize_t splice_direct_to_actor(struct file *in, struct 
splice_desc *sd,
 * could get stuck data in the internal pipe:
 */
ret = actor(pipe, sd);
-   if (unlikely(ret < 0))
+   if (unlikely(ret <= 0))
goto out_release;
 
bytes += ret;
len -= ret;
 
-   /*
-* In nonblocking mode, if we got back a short read then
-* that was due to either an IO error or due to the
-* pagecache entry not being there. In the IO error case
-* the _next_ splice attempt will produce a clean IO error
-* return value (not a short read), so in both cases it's
-* correct to break out of the loop here:
-*/
-   if ((flags & SPLICE_F_NONBLOCK) && (read_len < max_read_len))
-   break;
+   if (ret < read_len)
+   goto out_release;
}
 
pipe->nrbufs = pipe->curbuf = 0;
-
return bytes;
 
 out_release:
@@ -1152,10 +1138,12 @@ long do_splice_direct(struct file *in, loff_t *ppos, 
struct file *out,
.pos= *ppos,
.u.file = out,
};
-   size_t ret;
+   long ret;
 
ret = splice_direct_to_actor(in, &sd, direct_splice_actor);
-   *ppos = sd.pos;
+   if (ret > 0)
+   *ppos += ret;
+
return ret;
 }
 

-- 
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TCP stalls in current git, possibly splice related

2007-07-12 Thread Jens Axboe
On Fri, Jul 13 2007, Jens Axboe wrote:
> On Thu, Jul 12 2007, James Morris wrote:
> > On Thu, 12 Jul 2007, David Miller wrote:
> > 
> > > From: James Morris <[EMAIL PROTECTED]>
> > > Date: Thu, 12 Jul 2007 16:12:25 -0400 (EDT)
> > > 
> > > > I'm seeing TCP connection stalls with current git, and a bisect found 
> > > > the 
> > > > following as a possible cause:
> > > 
> > > To add to this James is seeing this with distcc I believe.
> > 
> > Correct.
> 
> I'll try and reproduce.

You didn't happen to get a sysrq-t backtrace of that distcc being hung,
did you?

-- 
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TCP stalls in current git, possibly splice related

2007-07-12 Thread Jens Axboe
On Thu, Jul 12 2007, James Morris wrote:
> On Thu, 12 Jul 2007, David Miller wrote:
> 
> > From: James Morris <[EMAIL PROTECTED]>
> > Date: Thu, 12 Jul 2007 16:12:25 -0400 (EDT)
> > 
> > > I'm seeing TCP connection stalls with current git, and a bisect found the 
> > > following as a possible cause:
> > 
> > To add to this James is seeing this with distcc I believe.
> 
> Correct.

I'll try and reproduce.

-- 
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TCP stalls in current git, possibly splice related

2007-07-12 Thread James Morris
On Thu, 12 Jul 2007, David Miller wrote:

> From: James Morris <[EMAIL PROTECTED]>
> Date: Thu, 12 Jul 2007 16:12:25 -0400 (EDT)
> 
> > I'm seeing TCP connection stalls with current git, and a bisect found the 
> > following as a possible cause:
> 
> To add to this James is seeing this with distcc I believe.

Correct.

-- 
James Morris
<[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: TCP stalls in current git, possibly splice related

2007-07-12 Thread David Miller
From: James Morris <[EMAIL PROTECTED]>
Date: Thu, 12 Jul 2007 16:12:25 -0400 (EDT)

> I'm seeing TCP connection stalls with current git, and a bisect found the 
> following as a possible cause:

To add to this James is seeing this with distcc I believe.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


TCP stalls in current git, possibly splice related

2007-07-12 Thread James Morris
I'm seeing TCP connection stalls with current git, and a bisect found the 
following as a possible cause:

534f2aaa6ab07cd71164180bc958a7dcde41db11 is first bad commit
commit 534f2aaa6ab07cd71164180bc958a7dcde41db11
Author: Jens Axboe <[EMAIL PROTECTED]>
Date:   Fri Jun 1 14:52:37 2007 +0200

sys_sendfile: switch to using ->splice_read, if available

This patch makes sendfile prefer to use ->splice_read(), if it's
available in the file_operations structure.

Signed-off-by: Jens Axboe <[EMAIL PROTECTED]>


It's a large patch, and not clear if it's the patch itself or coincidental 
to it.

I've looked at some tcpdumps, but may not be able to get back to them 
until tomorrow or the weekend, & thought it might be useful to get the 
report out now.



- James
-- 
James Morris
<[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html