Outstanding NBD requests bug? (Was: Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues)

2007-06-26 Thread Mike Snitzer

Peter, Paul,

The original message (below) didn't get any response.

Could you elaborate on the impact of this NBD issue?  Based on your
description, having requests just sit on the queue would _seem_ to be
quite serious.  Is this purely a bookkeeping issue or is there a more
subtle side-effect of this inaccurate request count?

You stated that your patch fixes this issue "not in the proper way".
If a fix is truly needed, what is the proper way?

Please advise, thanks.
Mike

On 4/29/07, Peter Zijlstra <[EMAIL PROTECTED]> wrote:

On Sun, 2007-04-29 at 21:41 +0200, Rogier Wolff wrote:
> On Tue, Apr 17, 2007 at 10:37:38PM -0700, Andrew Morton wrote:
> > Florin, can we please see /proc/meminfo as well?
> >
> > Also the result of `echo m > /proc/sysrq-trigger'
>
> Hi,
>
> It's been a while since this thread died out, but maybe I'm
> having the same problem. Networking, large part of memory is
> buffering writes.
>
> In my case I'm using NBD.
>
> Oh,
>
> /sys/block/nbd0/stat gives:
>  636   88 5353 1700  99119554   16227263156   
43  1452000 61802352
> I put some debugging stuff in nbd, and it DOES NOT KNOW about the
> 43 requests that the io scheduler claims are in flight at the
> driver

AFAIK nbd is a tad broken; the following patch used to fix it, although
not in the proper way. Hence it never got merged.

There is a race where the plug state of the device queue gets confused,
which causes requests to just sit on the queue, without further action.
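To make the failure mode concrete, here is one plausible reading of the race as an annotated timeline (editor's sketch, inferred from the patch below; the exact interleaving is not stated in the mail):

```c
/*
 * Hypothetical interleaving (editor's sketch, inferred from the patch):
 *
 *   do_nbd_request()                    block layer, elsewhere
 *   ----------------                    ----------------------
 *   spin_lock_irq(q->queue_lock);
 *   req = elv_next_request(q);
 *   spin_unlock(q->queue_lock);     <-- lock dropped for socket I/O
 *                                       new request queued; plug state
 *                                       of q changes under us
 *   spin_lock(q->queue_lock);
 *   (queue looks idle)
 *   return;
 *
 * Result: requests left pending on an unplugged queue, with no future
 * event to re-run the request function.  The patch ends with
 * blk_plug_device(q), so the unplug timer is armed and will re-invoke
 * do_nbd_request() for anything queued during the unlocked window.
 */
```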

---

Subject: nbd: request_fn fixup

Dropping the queue_lock opens up a nasty race, fix this race by
plugging the device when we're done.

Also includes a small cleanup.

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
CC: Daniel Phillips <[EMAIL PROTECTED]>
CC: Pavel Machek <[EMAIL PROTECTED]>
---
 drivers/block/nbd.c |   67 ++--
 1 file changed, 49 insertions(+), 18 deletions(-)

Index: linux-2.6/drivers/block/nbd.c
===
--- linux-2.6.orig/drivers/block/nbd.c  2006-09-07 17:20:52.0 +0200
+++ linux-2.6/drivers/block/nbd.c   2006-09-07 17:35:05.0 +0200
@@ -97,20 +97,24 @@ static const char *nbdcmd_to_ascii(int c
 }
 #endif /* NDEBUG */

-static void nbd_end_request(struct request *req)
+static void __nbd_end_request(struct request *req)
 {
int uptodate = (req->errors == 0) ? 1 : 0;
-   request_queue_t *q = req->q;
-   unsigned long flags;

dprintk(DBG_BLKDEV, "%s: request %p: %s\n", req->rq_disk->disk_name,
req, uptodate? "done": "failed");

-   spin_lock_irqsave(q->queue_lock, flags);
-   if (!end_that_request_first(req, uptodate, req->nr_sectors)) {
+   if (!end_that_request_first(req, uptodate, req->nr_sectors))
end_that_request_last(req, uptodate);
-   }
-   spin_unlock_irqrestore(q->queue_lock, flags);
+}
+
+static void nbd_end_request(struct request *req)
+{
+   request_queue_t *q = req->q;
+
+   spin_lock_irq(q->queue_lock);
+   __nbd_end_request(req);
+   spin_unlock_irq(q->queue_lock);
 }

 /*
@@ -435,10 +439,8 @@ static void do_nbd_request(request_queue
mutex_unlock(&lo->tx_lock);
printk(KERN_ERR "%s: Attempted send on closed socket\n",
   lo->disk->disk_name);
-   req->errors++;
-   nbd_end_request(req);
spin_lock_irq(q->queue_lock);
-   continue;
+   goto error_out;
}

lo->active_req = req;
@@ -463,10 +465,13 @@ static void do_nbd_request(request_queue

 error_out:
req->errors++;
-   spin_unlock(q->queue_lock);
-   nbd_end_request(req);
-   spin_lock(q->queue_lock);
+   __nbd_end_request(req);
}
+   /*
+* q->queue_lock has been dropped, this opens up a race
+* plug the device to close it.
+*/
+   blk_plug_device(q);
return;
 }



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-29 Thread Peter Zijlstra
On Sun, 2007-04-29 at 21:41 +0200, Rogier Wolff wrote:
> On Tue, Apr 17, 2007 at 10:37:38PM -0700, Andrew Morton wrote:
> > Florin, can we please see /proc/meminfo as well?
> > 
> > Also the result of `echo m > /proc/sysrq-trigger'
> 
> Hi,
> 
> It's been a while since this thread died out, but maybe I'm 
> having the same problem. Networking, large part of memory is 
> buffering writes. 
> 
> In my case I'm using NBD. 
> 
> Oh, 
> 
> /sys/block/nbd0/stat gives:
>  636   88 5353 1700  99119554   16227263156   
> 43  1452000 61802352
> I put some debugging stuff in nbd, and it DOES NOT KNOW about the
> 43 requests that the io scheduler claims are in flight at the
> driver 

AFAIK nbd is a tad broken; the following patch used to fix it, although
not in the proper way. Hence it never got merged.

There is a race where the plug state of the device queue gets confused,
which causes requests to just sit on the queue, without further action.

---

Subject: nbd: request_fn fixup

Dropping the queue_lock opens up a nasty race, fix this race by
plugging the device when we're done.

Also includes a small cleanup.

Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
CC: Daniel Phillips <[EMAIL PROTECTED]>
CC: Pavel Machek <[EMAIL PROTECTED]>
---
 drivers/block/nbd.c |   67 ++--
 1 file changed, 49 insertions(+), 18 deletions(-)

Index: linux-2.6/drivers/block/nbd.c
===
--- linux-2.6.orig/drivers/block/nbd.c  2006-09-07 17:20:52.0 +0200
+++ linux-2.6/drivers/block/nbd.c   2006-09-07 17:35:05.0 +0200
@@ -97,20 +97,24 @@ static const char *nbdcmd_to_ascii(int c
 }
 #endif /* NDEBUG */
 
-static void nbd_end_request(struct request *req)
+static void __nbd_end_request(struct request *req)
 {
int uptodate = (req->errors == 0) ? 1 : 0;
-   request_queue_t *q = req->q;
-   unsigned long flags;
 
dprintk(DBG_BLKDEV, "%s: request %p: %s\n", req->rq_disk->disk_name,
req, uptodate? "done": "failed");
 
-   spin_lock_irqsave(q->queue_lock, flags);
-   if (!end_that_request_first(req, uptodate, req->nr_sectors)) {
+   if (!end_that_request_first(req, uptodate, req->nr_sectors))
end_that_request_last(req, uptodate);
-   }
-   spin_unlock_irqrestore(q->queue_lock, flags);
+}
+
+static void nbd_end_request(struct request *req)
+{
+   request_queue_t *q = req->q;
+
+   spin_lock_irq(q->queue_lock);
+   __nbd_end_request(req);
+   spin_unlock_irq(q->queue_lock);
 }
 
 /*
@@ -435,10 +439,8 @@ static void do_nbd_request(request_queue
mutex_unlock(&lo->tx_lock);
printk(KERN_ERR "%s: Attempted send on closed socket\n",
   lo->disk->disk_name);
-   req->errors++;
-   nbd_end_request(req);
spin_lock_irq(q->queue_lock);
-   continue;
+   goto error_out;
}
 
lo->active_req = req;
@@ -463,10 +465,13 @@ static void do_nbd_request(request_queue
 
 error_out:
req->errors++;
-   spin_unlock(q->queue_lock);
-   nbd_end_request(req);
-   spin_lock(q->queue_lock);
+   __nbd_end_request(req);
}
+   /*
+* q->queue_lock has been dropped, this opens up a race
+* plug the device to close it.
+*/
+   blk_plug_device(q);
return;
 }
 



Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-29 Thread Rogier Wolff
On Tue, Apr 17, 2007 at 10:37:38PM -0700, Andrew Morton wrote:
> Florin, can we please see /proc/meminfo as well?
> 
> Also the result of `echo m > /proc/sysrq-trigger'

Hi,

It's been a while since this thread died out, but maybe I'm 
having the same problem. Networking, large part of memory is 
buffering writes. 

In my case I'm using NBD. 

Oh, 

/sys/block/nbd0/stat gives:
 636   88 5353 1700  99119554   16227263156   
43  1452000 61802352
I put some debugging stuff in nbd, and it DOES NOT KNOW about the
43 requests that the io scheduler claims are in flight at the
driver 

Those requests start a couple of seconds AFTER the whole thing
grinds to a halt.
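For reference, the "43 requests in flight" Rogier quotes is the ninth field of the stat file (field layout per the kernel's Documentation/block/stat.txt). A quick way to pull it out; the sample numbers below are made up for illustration, since the real output above is line-wrapped, and only the 43 matches the report:

```shell
# Parse a /sys/block/<dev>/stat line; field 9 is "in_flight",
# the number of requests the block layer believes the driver holds.
# Sample line is illustrative, not Rogier's actual output.
stat_line="636 88 5353 1700 9911 9554 1622 7263 43 1452000 61802352"
set -- $stat_line
echo "in_flight: $9"
```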

I switched from crashing my 512Mb-ram-workstation to my development
machine, which has only 64M of RAM. (I got the development machine
back up and running after some effort). 

My rsync (and also "sync" if I call it, or reboot without -n) also
gets stuck in D state: 

<4>[  622.364000] rsync D 0019C170 0  2456   2455 (NOTLB)
<4>[  622.364000]c04d7c80 0086 c1f61ba8 0019c170  0008 
c31048a0 0003382e 
<4>[  622.364000] c24c9908 0286 c1092740 c3e12590 c2330a50 
c2330b5c 00061a80 
<4>[  622.364000]8639a400 0078 c04d7cd0  c10801b8 c04d7c88 
c03082af c0176c48 
<4>[  622.364000] Call Trace:
<4>[  622.364000]  [<c03082af>] io_schedule+0xe/0x16
<4>[  622.364000]  [<c0176c48>] sync_buffer+0x0/0x2e
<4>[  622.364000]  [<c0176c73>] sync_buffer+0x2b/0x2e
<4>[  622.364000]  [<c03083b9>] __wait_on_bit+0x2c/0x51
<4>[  622.364000]  [<c0176c48>] sync_buffer+0x0/0x2e
<4>[  622.364000]  [<c0308451>] out_of_line_wait_on_bit+0x73/0x7b
<4>[  622.364000]  [<c012970e>] wake_bit_function+0x0/0x3c
<4>[  622.364000]  [<c012970e>] wake_bit_function+0x0/0x3c
<4>[  622.364000]  [<c0176cce>] __wait_on_buffer+0x22/0x25
<4>[  622.364000]  [<c0198cf0>] ext3_find_entry+0x1aa/0x36f
<4>[  622.364000]  [<c01a2324>] journal_dirty_metadata+0x1b6/0x1d3
<4>[  622.364000]  [<c01990e7>] ext3_lookup+0x28/0xc6
<4>[  622.364000]  [<c0161611>] real_lookup+0x53/0xc2
<4>[  622.364000]  [<c0161881>] do_lookup+0x57/0x9d
<4>[  622.364000]  [<c0162075>] __link_path_walk+0x7ae/0xb81
<4>[  622.364000]  [<c011cb77>] __do_softirq+0x57/0x83
<4>[  622.364000]  [<c0162485>] link_path_walk+0x3d/0xa0
<4>[  622.364000]  [<c015a4e7>] sys_lchown+0x3c/0x44
<4>[  622.364000]  [<c015a8c7>] get_unused_fd+0xa0/0xbc
<4>[  622.364000]  [<c0162840>] do_path_lookup+0x1b7/0x200
<4>[  622.364000]  [<c01628e1>] __path_lookup_intent_open+0x42/0x72
<4>[  622.364000]  [<c0162931>] path_lookup_open+0x20/0x25
<4>[  622.364000]  [<c0163026>] open_namei+0x8c/0x532
<4>[  622.364000]  [<c015a328>] sys_fchmodat+0xac/0xb9
<4>[  622.364000]  [<c015a71b>] do_filp_open+0x25/0x39
<4>[  622.364000]  [<c015a4e7>] sys_lchown+0x3c/0x44
<4>[  622.364000]  [<c015a8c7>] get_unused_fd+0xa0/0xbc
<4>[  622.364000]  [<c015a9d5>] do_sys_open+0x42/0xbe
<4>[  622.364000]  [<c015aa6b>] sys_open+0x1a/0x1c
<4>[  622.364000]  [<c0103dbc>] syscall_call+0x7/0xb
<4>[  622.364000]  ===

--
<6>[  871.52] SysRq : Show Memory
<6>[  871.52] Mem-info:
<4>[  871.52] DMA per-cpu:
<4>[  871.52] CPU0: Hot: hi:0, btch:   1 usd:   0   Cold: hi:0, 
btch:   1 usd:   0
<4>[  871.52] Normal per-cpu:
<4>[  871.52] CPU0: Hot: hi:6, btch:   1 usd:   0   Cold: hi:2, 
btch:   1 usd:   0
<4>[  871.52] Active:5632 inactive:6764 dirty:0 writeback:302 unstable:0
<4>[  871.52]  free:717 slab:2024 mapped:926 pagetables:135 bounce:0
<4>[  871.52] DMA free:1104kB min:252kB low:312kB high:376kB active:3600kB 
inactive:6820kB present:16256kB pages_scanned:0 all_unreclaimable? no
<4>[  871.52] lowmem_reserve[]: 0 47
<4>[  871.52] Normal free:1764kB min:760kB low:948kB high:1140kB 
active:18928kB inactive:20236kB present:48708kB pages_scanned:0 
all_unreclaimable? no
<4>[  871.52] lowmem_reserve[]: 0 0
<4>[  871.52] DMA: 118*4kB 19*8kB 2*16kB 2*32kB 0*64kB 1*128kB 1*256kB 
0*512kB 0*1024kB 0*2048kB 0*4096kB = 1104kB
<4>[  871.52] Normal: 171*4kB 23*8kB 0*16kB 0*32kB 4*64kB 1*128kB 0*256kB 
1*512kB 0*1024kB 0*2048kB 0*4096kB = 1764kB
<4>[  871.52] Swap cache: add 0, delete 0, find 0/0, race 0+0
<4>[  871.52] Free swap  = 0kB
<4>[  871.52] Total swap = 0kB
<6>[  871.52] Free swap:0kB
<6>[  871.52] 16368 pages of RAM
<6>[  871.52] 0 pages of HIGHMEM
<6>[  871.52] 1044 reserved pages
<6>[  871.52] 13456 pages shared
<6>[  871.52] 0 pages swap cached
<6>[  871.52] 0 pages dirty
<6>[  871.52] 302 pages writeback
<6>[  871.52] 926 pages mapped
<6>[  871.52] 2024 pages slab
<6>[  871.52] 135 pages pagetables

--
ozon:/home/wolff# cat /proc/meminfo 
MemTotal:61296 kB
MemFree:  2752 kB
Buffers:  2228 kB
Cached:  29968 kB
SwapCached:  0 kB
Active:  22632 kB
Inactive:27056 kB
SwapTotal:   0 kB
SwapFree:0 kB
Dirty:   0 kB
Writeback:1208 kB
AnonPages:   17512 kB
Mapped:   3704 kB
Slab: 8088 kB
SReclaimable: 3656 kB
SUnreclaim:   4432 kB
PageTables:552 kB

Re: Success! Was: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-20 Thread Florin Iucha
On Fri, Apr 20, 2007 at 09:37:30AM -0400, Trond Myklebust wrote:
> Thanks! Did you ever find out what had happened to the test that hung
> last night?

Nope.  I could not ssh into it and the machine was needed for some
windows duty before I got home ;)  I'll try again this coming week-end
and let you know if I see any problems.

Regards,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
  http://geekz.co.uk/schneierfacts/fact/163


signature.asc
Description: Digital signature


Re: Success! Was: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-20 Thread Trond Myklebust
On Fri, 2007-04-20 at 08:30 -0500, Florin Iucha wrote:
> On Thu, Apr 19, 2007 at 04:49:31PM -0500, Florin Iucha wrote:
> > On Thu, Apr 19, 2007 at 05:30:42PM -0400, Trond Myklebust wrote:
> > > > I'm far from the machine right now, so I will do some more tests
> > > > tonight, but right now, the new patchset is not good.  What is the
> > > > difference between reverting the patch you sent yesterday and your
> > > > current fifth patch?  I assume the other four are identical, right?
> > > 
> > > The only difference is the way in which we handle retries of an NFSv4
> > > request: the new patch disconnects if and only if a timeout has
> > > occurred, or the server sends us garbage.
> > 
> > I have to mention that I rebased to the head of the tree
> > (895e1fc7226e6732bc77138955b6c7dfa279f57a) before applying your
> > patches, in order to test what I expect the official tree to be.
> > 
> > Tonight I'll test this kernel once more, then go back to 21-rc7 and
> > apply your 5 patches and re-test.
> 
> It passed big-copy, and the copy run from the gnome-session while I
> did my morning light browsing, email reading, etc.
> 
> kernel:
>895e1fc7226e6732bc77138955b6c7dfa279f57a
> 
> patches:
>linux-2.6.21-001-cleanup_unstable_write.dif
>linux-2.6.21-002-defer_clearing_pg_writeback.dif
>linux-2.6.21-003-fix_desynchronised_ncommit.dif
>linux-2.6.21-004-fix_nfs_set_page_dirty.dif
>linux-2.6.21-005-fix_nfsv4_resend.dif
> 
> Regards,
> florin

Thanks! Did you ever find out what had happened to the test that hung
last night?

Cheers
  Trond


Success! Was: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-20 Thread Florin Iucha
On Thu, Apr 19, 2007 at 04:49:31PM -0500, Florin Iucha wrote:
> On Thu, Apr 19, 2007 at 05:30:42PM -0400, Trond Myklebust wrote:
> > > I'm far from the machine right now, so I will do some more tests
> > > tonight, but right now, the new patchset is not good.  What is the
> > > difference between reverting the patch you sent yesterday and your
> > > current fifth patch?  I assume the other four are identical, right?
> > 
> > The only difference is the way in which we handle retries of an NFSv4
> > request: the new patch disconnects if and only if a timeout has
> > occurred, or the server sends us garbage.
> 
> I have to mention that I rebased to the head of the tree
> (895e1fc7226e6732bc77138955b6c7dfa279f57a) before applying your
> patches, in order to test what I expect the official tree to be.
> 
> Tonight I'll test this kernel once more, then go back to 21-rc7 and
> apply your 5 patches and re-test.

It passed big-copy, and the copy run from the gnome-session while I
did my morning light browsing, email reading, etc.

kernel:
   895e1fc7226e6732bc77138955b6c7dfa279f57a

patches:
   linux-2.6.21-001-cleanup_unstable_write.dif
   linux-2.6.21-002-defer_clearing_pg_writeback.dif
   linux-2.6.21-003-fix_desynchronised_ncommit.dif
   linux-2.6.21-004-fix_nfs_set_page_dirty.dif
   linux-2.6.21-005-fix_nfsv4_resend.dif

Regards,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
  http://geekz.co.uk/schneierfacts/fact/163




Re: Success! Was: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-20 Thread Trond Myklebust
On Fri, 2007-04-20 at 08:30 -0500, Florin Iucha wrote:
 On Thu, Apr 19, 2007 at 04:49:31PM -0500, Florin Iucha wrote:
  On Thu, Apr 19, 2007 at 05:30:42PM -0400, Trond Myklebust wrote:
I'm far from the machine right now, so I will do some more tests
tonight, but right now, the new patchset is not good.  What is the
difference between reverting the patch you sent yesterday and your
current fifth patch?  I assume the other four are identical, right?
   
   The only difference is the way in which we handle retries of an NFSv4
   request: the new patch disconnects if and only if a timeout has
   occurred, or the server sends us garbage.
  
  I have to mention that I rebased to the head of the tree
  (895e1fc7226e6732bc77138955b6c7dfa279f57a) before applying your
  patches, in order to test what I expect the official tree to be.
  
  Tonight I'll test this kernel once more, then go back to 21-rc7 and
  apply your 5 patches and re-test.
 
 It passed big-copy, and the copy run from the gnome-session while I
 did my morning light browsing, email reading, etc.
 
 kernel:
895e1fc7226e6732bc77138955b6c7dfa279f57a
 
 patches:
linux-2.6.21-001-cleanup_unstable_write.dif
linux-2.6.21-002-defer_clearing_pg_writeback.dif
linux-2.6.21-003-fix_desynchronised_ncommit.dif
linux-2.6.21-004-fix_nfs_set_page_dirty.dif
linux-2.6.21-005-fix_nfsv4_resend.dif
 
 Regards,
 florin

Thanks! Did you ever find out what had happened to the test that hung
last night?

Cheers
  Trond
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Success! Was: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-20 Thread Florin Iucha
On Thu, Apr 19, 2007 at 04:49:31PM -0500, Florin Iucha wrote:
 On Thu, Apr 19, 2007 at 05:30:42PM -0400, Trond Myklebust wrote:
   I'm far from the machine right now, so I will do some more tests
   tonight, but right now, the new patchset is not good.  What is the
   difference between reverting the patch you sent yesterday and your
   current fifth patch?  I assume the other four are identical, right?
  
  The only difference is the way in which we handle retries of an NFSv4
  request: the new patch disconnects if and only if a timeout has
  occurred, or the server sends us garbage.
 
 I have to mention that I rebased to the head of the tree
 (895e1fc7226e6732bc77138955b6c7dfa279f57a) before applying your
 patches, in order to test what I expect the official tree to be.
 
 Tonight I'll test this kernel once more, then go back to 21-rc7 and
 apply your 5 patches and re-test.

It passed big-copy, and the copy run from the gnome-session while I
did my morning light browsing, email reading, etc.

kernel:
   895e1fc7226e6732bc77138955b6c7dfa279f57a

patches:
   linux-2.6.21-001-cleanup_unstable_write.dif
   linux-2.6.21-002-defer_clearing_pg_writeback.dif
   linux-2.6.21-003-fix_desynchronised_ncommit.dif
   linux-2.6.21-004-fix_nfs_set_page_dirty.dif
   linux-2.6.21-005-fix_nfsv4_resend.dif

Regards,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
  http://geekz.co.uk/schneierfacts/fact/163


signature.asc
Description: Digital signature


Re: Success! Was: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-20 Thread Florin Iucha
On Fri, Apr 20, 2007 at 09:37:30AM -0400, Trond Myklebust wrote:
 Thanks! Did you ever find out what had happened to the test that hung
 last night?

Nope.  I could not ssh into it and the machine was needed for some
windows duty before I got home ;)  I'll try again this coming week-end
and let you know if I see any problems.

Regards,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
  http://geekz.co.uk/schneierfacts/fact/163




Re: Failure! Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-19 Thread Florin Iucha
On Thu, Apr 19, 2007 at 05:30:42PM -0400, Trond Myklebust wrote:
> > I'm far from the machine right now, so I will do some more tests
> > tonight, but right now, the new patchset is not good.  What is the
> > difference between reverting the patch you sent yesterday and your
> > current fifth patch?  I assume the other four are identical, right?
> 
> The only difference is the way in which we handle retries of an NFSv4
> request: the new patch disconnects if and only if a timeout has
> occurred, or the server sends us garbage.

I have to mention that I rebased to the head of the tree
(895e1fc7226e6732bc77138955b6c7dfa279f57a) before applying your
patches, in order to test what I expect the official tree to be.

Tonight I'll test this kernel once more, then go back to 21-rc7 and
apply your 5 patches and re-test.

florin

-- 
Bruce Schneier expects the Spanish Inquisition.
  http://geekz.co.uk/schneierfacts/fact/163




Re: Failure! Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-19 Thread Trond Myklebust
On Thu, 2007-04-19 at 14:58 -0500, Florin Iucha wrote:
> On Thu, Apr 19, 2007 at 12:09:42PM -0400, Trond Myklebust wrote:
> > See
> >http://client.linux-nfs.org/Linux-2.6.x/2.6.21-rc7/
> > 
> > I'm giving the first 5 patches of that series (i.e.
> > linux-2.6.21-001-cleanup_unstable_write.dif to
> > linux-2.6.21-005-fix_nfsv4_resend.dif) an extra beating since those are
> > the ones that I feel should go into 2.6.21 final in order to fix the
> > read/write regressions that have been reported. They should be identical
> > to the patches that I posted on lkml in the past 3 days.
> > 
> > Please feel free to grab them and give them a test.
> 
> The copy completed some time ago, but now I cannot ssh into the box!
> This is a new development, as before I was always able to ssh into,
> even when the copy slowed down to a trickle.
> 
> I'm far from the machine right now, so I will do some more tests
> tonight, but right now, the new patchset is not good.  What is the
> difference between reverting the patch you sent yesterday and your
> current fifth patch?  I assume the other four are identical, right?

The only difference is the way in which we handle retries of an NFSv4
request: the new patch disconnects if and only if a timeout has
occurred, or the server sends us garbage.

Trond


Failure! Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-19 Thread Florin Iucha
On Thu, Apr 19, 2007 at 12:09:42PM -0400, Trond Myklebust wrote:
> See
>http://client.linux-nfs.org/Linux-2.6.x/2.6.21-rc7/
> 
> I'm giving the first 5 patches of that series (i.e.
> linux-2.6.21-001-cleanup_unstable_write.dif to
> linux-2.6.21-005-fix_nfsv4_resend.dif) an extra beating since those are
> the ones that I feel should go into 2.6.21 final in order to fix the
> read/write regressions that have been reported. They should be identical
> to the patches that I posted on lkml in the past 3 days.
> 
> Please feel free to grab them and give them a test.

The copy completed some time ago, but now I cannot ssh into the box!
This is a new development, as before I was always able to ssh in,
even when the copy slowed down to a trickle.

I'm far from the machine right now, so I will do some more tests
tonight, but right now, the new patchset is not good.  What is the
difference between reverting the patch you sent yesterday and your
current fifth patch?  I assume the other four are identical, right?

Regards,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
  http://geekz.co.uk/schneierfacts/fact/163




Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-19 Thread Trond Myklebust
On Thu, 2007-04-19 at 10:50 -0500, Florin Iucha wrote:
> On Thu, Apr 19, 2007 at 11:17:28AM -0400, Trond Myklebust wrote:
> > On Thu, 2007-04-19 at 11:12 -0400, Chuck Lever wrote:
> > > Perhaps instead of looking at the number of bytes sent, the logic in the 
> > > last hunk of this patch should check which queue the request is sitting 
> > > on.
> > 
> > ??? It would be a bug for the request to be sitting on _any_ queue when
> > it enters xprt_transmit().
> > 
> > Here is the patch that I'm currently testing.
> 
> Trond,
> 
> What is the set of patches that you are testing?  I'd like to give
> that a spin tonight as well.
> 
> It is possible that what makes my configuration more susceptible
> to the problem is the fact that the client significantly overpowers
> the server: Athlon x2 4200+ with 2Gb of RAM for the client vs. PIII
> 1Ghz 512 MB RAM for the server.  They both have gigabit ethernet
> and both NICs and the switch support jumbo frames.
> 
> Regards,
> florin
> 

See
   http://client.linux-nfs.org/Linux-2.6.x/2.6.21-rc7/

I'm giving the first 5 patches of that series (i.e.
linux-2.6.21-001-cleanup_unstable_write.dif to
linux-2.6.21-005-fix_nfsv4_resend.dif) an extra beating since those are
the ones that I feel should go into 2.6.21 final in order to fix the
read/write regressions that have been reported. They should be identical
to the patches that I posted on lkml in the past 3 days.

Please feel free to grab them and give them a test.

Cheers
  Trond


Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-19 Thread Florin Iucha
On Thu, Apr 19, 2007 at 11:17:28AM -0400, Trond Myklebust wrote:
> On Thu, 2007-04-19 at 11:12 -0400, Chuck Lever wrote:
> > Perhaps instead of looking at the number of bytes sent, the logic in the 
> > last hunk of this patch should check which queue the request is sitting on.
> 
> ??? It would be a bug for the request to be sitting on _any_ queue when
> it enters xprt_transmit().
> 
> Here is the patch that I'm currently testing.

Trond,

What is the set of patches that you are testing?  I'd like to give
that a spin tonight as well.

It is possible that what makes my configuration more susceptible
to the problem is the fact that the client significantly overpowers
the server: Athlon x2 4200+ with 2Gb of RAM for the client vs. PIII
1Ghz 512 MB RAM for the server.  They both have gigabit ethernet
and both NICs and the switch support jumbo frames.

Regards,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
  http://geekz.co.uk/schneierfacts/fact/163




Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-19 Thread Chuck Lever

Trond Myklebust wrote:
> On Wed, 2007-04-18 at 20:52 -0500, Florin Iucha wrote:
> > On Wed, Apr 18, 2007 at 10:11:46AM -0400, Trond Myklebust wrote:
> > > Do you have a copy of wireshark or ethereal on hand? If so, could you
> > > take a look at whether or not any NFS traffic is going between the
> > > client and server once the hang happens?
> >
> > I used the following command
> >
> >    tcpdump -w nfs-traffic -i eth0 -vv -tt dst port nfs
> >
> > to capture
> >
> >    http://iucha.net/nfs/21-rc7-nfs4/nfs-traffic.bz2
> >
> > I started the capture before starting the copy and left it to run for
> > a few minutes after the traffic slowed to a crawl.
> >
> > The iostat and vmstat are at:
> >
> >    http://iucha.net/nfs/21-rc7-nfs4/iostat
> >    http://iucha.net/nfs/21-rc7-nfs4/vmstat
> >
> > It seems that my original problem report had a big mistake!  There is
> > no hang, but at some point the write slows down to a trickle (from
> > 40,000 blocks/s to 22 blocks/s) as can be seen from the iostat log.
>
> Yeah. You only captured the outgoing traffic to the server, but already
> it looks as if there were 'interesting' things going on. In frames 29346
> to 29350, the traffic stops altogether for 5 seconds (I only see
> keepalives) then it starts up again. Ditto for frames 40477-40482
> (another 5 seconds). ...
> Then at around frame 92072, the client starts to send a bunch of RSTs.
> Aha, I'll bet that reverting the appended patch fixes the problem.
>
> The assumption Chuck makes is that if _no_ request bytes have been
> sent, yet the request is on the 'receive list', then it must be a
> resend; that is patently false in the case where the send queue just
> happens to be full.

There are other places in the RPC client where "zero bytes sent" implies
that the request has been sent.  The real problem here is that zeroing
the "bytes sent" field is overloaded.

Perhaps instead of looking at the number of bytes sent, the logic in the
last hunk of this patch should check which queue the request is sitting on.




---
commit 43d78ef2ba5bec26d0315859e8324bfc0be23766
Author: Chuck Lever <[EMAIL PROTECTED]>
Date:   Tue Feb 6 18:26:11 2007 -0500

NFS: disconnect before retrying NFSv4 requests over TCP

RFC3530 section 3.1.1 states an NFSv4 client MUST NOT send a request
twice on the same connection unless it is the NULL procedure.  Section
3.1.1 suggests that the client should disconnect and reconnect if it
wants to retry a request.

Implement this by adding an rpc_clnt flag that an ULP can use to
specify that the underlying transport should be disconnected on a
major timeout.  The NFSv4 client asserts this new flag, and requests
no retries after a minor retransmit timeout.

Note that disconnecting on a retransmit is in general not safe to do
if the RPC client does not reuse the TCP port number when reconnecting.

See http://bugzilla.linux-nfs.org/show_bug.cgi?id=6

Signed-off-by: Chuck Lever <[EMAIL PROTECTED]>

Signed-off-by: Trond Myklebust <[EMAIL PROTECTED]>

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index a3191f0..c46e94f 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -394,7 +394,8 @@ static void nfs_init_timeout_values(struct rpc_timeout *to, 
int proto,
 static int nfs_create_rpc_client(struct nfs_client *clp, int proto,
unsigned int timeo,
unsigned int retrans,
-   rpc_authflavor_t flavor)
+   rpc_authflavor_t flavor,
+   int flags)
 {
struct rpc_timeout  timeparms;
struct rpc_clnt *clnt = NULL;
@@ -407,6 +408,7 @@ static int nfs_create_rpc_client(struct nfs_client *clp, 
int proto,
.program= &nfs_program,
.version= clp->rpc_ops->version,
.authflavor = flavor,
+   .flags  = flags,
};
 
 	if (!IS_ERR(clp->cl_rpcclient))

@@ -548,7 +550,7 @@ static int nfs_init_client(struct nfs_client *clp, const 
struct nfs_mount_data *
 * - RFC 2623, sec 2.3.2
 */
error = nfs_create_rpc_client(clp, proto, data->timeo, data->retrans,
-   RPC_AUTH_UNIX);
+   RPC_AUTH_UNIX, 0);
if (error < 0)
goto error;
nfs_mark_client_ready(clp, NFS_CS_READY);
@@ -868,7 +870,8 @@ static int nfs4_init_client(struct nfs_client *clp,
/* Check NFS protocol revision and initialize RPC op vector */
clp->rpc_ops = &nfs_v4_clientops;
 
-	error = nfs_create_rpc_client(clp, proto, timeo, retrans, authflavour);

+   error = nfs_create_rpc_client(clp, proto, timeo, retrans, authflavour,
+   RPC_CLNT_CREATE_DISCRTRY);
if (error < 0)
goto error;
memcpy(clp->cl_ipaddr, ip_addr, sizeof(clp->cl_ipaddr));

Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-19 Thread Trond Myklebust
On Thu, 2007-04-19 at 11:12 -0400, Chuck Lever wrote:
> Perhaps instead of looking at the number of bytes sent, the logic in the 
> last hunk of this patch should check which queue the request is sitting on.

??? It would be a bug for the request to be sitting on _any_ queue when
it enters xprt_transmit().

Here is the patch that I'm currently testing.

Cheers
  Trond
---
From: Trond Myklebust <[EMAIL PROTECTED]>
Date: Thu, 19 Apr 2007 09:55:44 -0400
RPC: Fix the TCP resend semantics for NFSv4

Fix a regression due to the patch "NFS: disconnect before retrying NFSv4
requests over TCP"

The assumption made in xprt_transmit() that the condition
"req->rq_bytes_sent == 0 and request is on the receive list"
should imply that we're dealing with a retransmission is false.
Firstly, it may simply happen that the socket send queue was full
at the time the request was initially sent through xprt_transmit().
Secondly, doing this for each request that was retransmitted implies
that we disconnect and reconnect for _every_ request that happened to
be retransmitted irrespective of whether or not a disconnection has
already occurred.

Fix is to move this logic into the call_status request timeout handler.

Signed-off-by: Trond Myklebust <[EMAIL PROTECTED]>
---

 net/sunrpc/clnt.c |    4 ++++
 net/sunrpc/xprt.c |   10 ----------
 2 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 6d7221f..396cdbe 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1046,6 +1046,8 @@ call_status(struct rpc_task *task)
rpc_delay(task, 3*HZ);
case -ETIMEDOUT:
task->tk_action = call_timeout;
+   if (task->tk_client->cl_discrtry)
+   xprt_disconnect(task->tk_xprt);
break;
case -ECONNREFUSED:
case -ENOTCONN:
@@ -1169,6 +1171,8 @@ call_decode(struct rpc_task *task)
 out_retry:
req->rq_received = req->rq_private_buf.len = 0;
task->tk_status = 0;
+   if (task->tk_client->cl_discrtry)
+   xprt_disconnect(task->tk_xprt);
 }
 
 /*
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index ee6ffa0..456a145 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -735,16 +735,6 @@ void xprt_transmit(struct rpc_task *task)
xprt_reset_majortimeo(req);
/* Turn off autodisconnect */
del_singleshot_timer_sync(&xprt->timer);
-   } else {
-   /* If all request bytes have been sent,
-* then we must be retransmitting this one */
-   if (!req->rq_bytes_sent) {
-   if (task->tk_client->cl_discrtry) {
-   xprt_disconnect(xprt);
-   task->tk_status = -ENOTCONN;
-   return;
-   }
-   }
}
} else if (!req->rq_bytes_sent)
return;


Success! Was: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-18 Thread Florin Iucha
On Wed, Apr 18, 2007 at 10:45:13PM -0400, Trond Myklebust wrote:
> On Wed, 2007-04-18 at 20:52 -0500, Florin Iucha wrote:
> > It seems that my original problem report had a big mistake!  There is
> > no hang, but at some point the write slows down to a trickle (from
> > 40,000 blocks/s to 22 blocks/s) as can be seen from the iostat log.
> 
> Yeah. You only captured the outgoing traffic to the server, but already
> it looks as if there were 'interesting' things going on. In frames 29346
> to 29350, the traffic stops altogether for 5 seconds (I only see
> keepalives) then it starts up again. Ditto for frames 40477-40482
> (another 5 seconds). ...
> Then at around frame 92072, the client starts to send a bunch of RSTs.
> Aha I'll bet that reverting the appended patch fixes the problem.

You win!

Reverting this patch (on top of your previous 5) allowed the big copy
to complete (70GB) as well as successful log-in to gnome!

Acked-By: Florin Iucha <[EMAIL PROTECTED]>

Thanks so much for the patience with this elusive bug and stubborn
bugreporter!

Regards,
florin

> ---
> commit 43d78ef2ba5bec26d0315859e8324bfc0be23766
> Author: Chuck Lever <[EMAIL PROTECTED]>
> Date:   Tue Feb 6 18:26:11 2007 -0500
> 
> NFS: disconnect before retrying NFSv4 requests over TCP
> 
> RFC3530 section 3.1.1 states an NFSv4 client MUST NOT send a request
> twice on the same connection unless it is the NULL procedure.  Section
> 3.1.1 suggests that the client should disconnect and reconnect if it
> wants to retry a request.
> 
> Implement this by adding an rpc_clnt flag that an ULP can use to
> specify that the underlying transport should be disconnected on a
> major timeout.  The NFSv4 client asserts this new flag, and requests
> no retries after a minor retransmit timeout.
> 
> Note that disconnecting on a retransmit is in general not safe to do
> if the RPC client does not reuse the TCP port number when reconnecting.
> 
> See http://bugzilla.linux-nfs.org/show_bug.cgi?id=6
> 
> Signed-off-by: Chuck Lever <[EMAIL PROTECTED]>
> Signed-off-by: Trond Myklebust <[EMAIL PROTECTED]>
> 
> diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> index a3191f0..c46e94f 100644
> --- a/fs/nfs/client.c
> +++ b/fs/nfs/client.c
> @@ -394,7 +394,8 @@ static void nfs_init_timeout_values(struct rpc_timeout 
> *to, int proto,
>  static int nfs_create_rpc_client(struct nfs_client *clp, int proto,
>   unsigned int timeo,
>   unsigned int retrans,
> - rpc_authflavor_t flavor)
> + rpc_authflavor_t flavor,
> + int flags)
>  {
>   struct rpc_timeout  timeparms;
>   struct rpc_clnt *clnt = NULL;
> @@ -407,6 +408,7 @@ static int nfs_create_rpc_client(struct nfs_client *clp, 
> int proto,
>   .program= &nfs_program,
>   .version= clp->rpc_ops->version,
>   .authflavor = flavor,
> + .flags  = flags,
>   };
>  
>   if (!IS_ERR(clp->cl_rpcclient))
> @@ -548,7 +550,7 @@ static int nfs_init_client(struct nfs_client *clp, const 
> struct nfs_mount_data *
>* - RFC 2623, sec 2.3.2
>*/
>   error = nfs_create_rpc_client(clp, proto, data->timeo, data->retrans,
> - RPC_AUTH_UNIX);
> + RPC_AUTH_UNIX, 0);
>   if (error < 0)
>   goto error;
>   nfs_mark_client_ready(clp, NFS_CS_READY);
> @@ -868,7 +870,8 @@ static int nfs4_init_client(struct nfs_client *clp,
>   /* Check NFS protocol revision and initialize RPC op vector */
>   clp->rpc_ops = _v4_clientops;
>  
> - error = nfs_create_rpc_client(clp, proto, timeo, retrans, authflavour);
> + error = nfs_create_rpc_client(clp, proto, timeo, retrans, authflavour,
> + RPC_CLNT_CREATE_DISCRTRY);
>   if (error < 0)
>   goto error;
>   memcpy(clp->cl_ipaddr, ip_addr, sizeof(clp->cl_ipaddr));
> diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
> index a1be89d..c7a78ee 100644
> --- a/include/linux/sunrpc/clnt.h
> +++ b/include/linux/sunrpc/clnt.h
> @@ -40,6 +40,7 @@ struct rpc_clnt {
>  
>   unsigned intcl_softrtry : 1,/* soft timeouts */
>   cl_intr : 1,/* interruptible */
> + cl_discrtry : 1,/* disconnect before retry */
>   cl_autobind : 1,/* use getport() */
>   cl_oneshot  : 1,/* dispose after use */
>   cl_dead : 1;/* abandoned */
> @@ -111,6 +112,7 @@ struct rpc_create_args {
>  #define RPC_CLNT_CREATE_ONESHOT  (1UL << 3)
>  #define 

Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-18 Thread Trond Myklebust
On Wed, 2007-04-18 at 20:52 -0500, Florin Iucha wrote:
> On Wed, Apr 18, 2007 at 10:11:46AM -0400, Trond Myklebust wrote:
> > Do you have a copy of wireshark or ethereal on hand? If so, could you
> > take a look at whether or not any NFS traffic is going between the
> > client and server once the hang happens?
> 
> I used the following command 
> 
>tcpdump -w nfs-traffic -i eth0 -vv -tt dst port nfs
> 
> to capture
> 
>http://iucha.net/nfs/21-rc7-nfs4/nfs-traffic.bz2
> 
> I started the capture before starting the copy and left it to run for
> a few minutes after the traffic slowed to a crawl.
> 
> The iostat and vmstat are at:
> 
>http://iucha.net/nfs/21-rc7-nfs4/iostat
>http://iucha.net/nfs/21-rc7-nfs4/vmstat
>
> It seems that my original problem report had a big mistake!  There is
> no hang, but at some point the write slows down to a trickle (from
> 40,000 blocks/s to 22 blocks/s) as can be seen from the iostat log.

Yeah. You only captured the outgoing traffic to the server, but already
it looks as if there were 'interesting' things going on. In frames 29346
to 29350, the traffic stops altogether for 5 seconds (I only see
keepalives) then it starts up again. Ditto for frames 40477-40482
(another 5 seconds). ...
Then at around frame 92072, the client starts to send a bunch of RSTs.
Aha, I'll bet that reverting the appended patch fixes the problem.

The assumption Chuck makes, namely that if _no_ request bytes have been
sent yet the request is on the 'receive list' then it must be a resend,
is patently false in the case where the send queue just happens to be full.
A better solution would probably be to disconnect the socket following
the ETIMEDOUT handling in call_status().

Cheers
  Trond
---
commit 43d78ef2ba5bec26d0315859e8324bfc0be23766
Author: Chuck Lever <[EMAIL PROTECTED]>
Date:   Tue Feb 6 18:26:11 2007 -0500

NFS: disconnect before retrying NFSv4 requests over TCP

RFC3530 section 3.1.1 states an NFSv4 client MUST NOT send a request
twice on the same connection unless it is the NULL procedure.  Section
3.1.1 suggests that the client should disconnect and reconnect if it
wants to retry a request.

Implement this by adding an rpc_clnt flag that an ULP can use to
specify that the underlying transport should be disconnected on a
major timeout.  The NFSv4 client asserts this new flag, and requests
no retries after a minor retransmit timeout.

Note that disconnecting on a retransmit is in general not safe to do
if the RPC client does not reuse the TCP port number when reconnecting.

See http://bugzilla.linux-nfs.org/show_bug.cgi?id=6

Signed-off-by: Chuck Lever <[EMAIL PROTECTED]>
Signed-off-by: Trond Myklebust <[EMAIL PROTECTED]>

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index a3191f0..c46e94f 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -394,7 +394,8 @@ static void nfs_init_timeout_values(struct rpc_timeout *to, 
int proto,
 static int nfs_create_rpc_client(struct nfs_client *clp, int proto,
unsigned int timeo,
unsigned int retrans,
-   rpc_authflavor_t flavor)
+   rpc_authflavor_t flavor,
+   int flags)
 {
struct rpc_timeout  timeparms;
struct rpc_clnt *clnt = NULL;
@@ -407,6 +408,7 @@ static int nfs_create_rpc_client(struct nfs_client *clp, 
int proto,
.program= _program,
.version= clp->rpc_ops->version,
.authflavor = flavor,
+   .flags  = flags,
};
 
if (!IS_ERR(clp->cl_rpcclient))
@@ -548,7 +550,7 @@ static int nfs_init_client(struct nfs_client *clp, const 
struct nfs_mount_data *
 * - RFC 2623, sec 2.3.2
 */
error = nfs_create_rpc_client(clp, proto, data->timeo, data->retrans,
-   RPC_AUTH_UNIX);
+   RPC_AUTH_UNIX, 0);
if (error < 0)
goto error;
nfs_mark_client_ready(clp, NFS_CS_READY);
@@ -868,7 +870,8 @@ static int nfs4_init_client(struct nfs_client *clp,
/* Check NFS protocol revision and initialize RPC op vector */
clp->rpc_ops = _v4_clientops;
 
-   error = nfs_create_rpc_client(clp, proto, timeo, retrans, authflavour);
+   error = nfs_create_rpc_client(clp, proto, timeo, retrans, authflavour,
+   RPC_CLNT_CREATE_DISCRTRY);
if (error < 0)
goto error;
memcpy(clp->cl_ipaddr, ip_addr, sizeof(clp->cl_ipaddr));
diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
index a1be89d..c7a78ee 100644
--- a/include/linux/sunrpc/clnt.h
+++ b/include/linux/sunrpc/clnt.h
@@ -40,6 +40,7 @@ struct rpc_clnt {
 
 	unsigned int	cl_softrtry : 1,/* soft timeouts */
 			cl_intr : 1,	/* interruptible */
+			cl_discrtry : 1,/* disconnect before retry */
 			cl_autobind : 1,/* use getport() */
 			cl_oneshot  : 1,/* dispose after use */
 			cl_dead : 1;	/* abandoned */
@@ -111,6 +112,7 @@ struct rpc_create_args {
 #define RPC_CLNT_CREATE_ONESHOT	(1UL << 3)
#define 

Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-18 Thread Florin Iucha
On Wed, Apr 18, 2007 at 10:11:46AM -0400, Trond Myklebust wrote:
> Do you have a copy of wireshark or ethereal on hand? If so, could you
> take a look at whether or not any NFS traffic is going between the
> client and server once the hang happens?

I used the following command 

   tcpdump -w nfs-traffic -i eth0 -vv -tt dst port nfs

to capture

   http://iucha.net/nfs/21-rc7-nfs4/nfs-traffic.bz2

I started the capture before starting the copy and left it to run for
a few minutes after the traffic slowed to a crawl.

The iostat and vmstat are at:

   http://iucha.net/nfs/21-rc7-nfs4/iostat
   http://iucha.net/nfs/21-rc7-nfs4/vmstat
   
It seems that my original problem report had a big mistake!  There is
no hang, but at some point the write slows down to a trickle (from
40,000 blocks/s to 22 blocks/s) as can be seen from the iostat log.

Regards,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
  http://geekz.co.uk/schneierfacts/fact/163




Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-18 Thread Peter Zijlstra
On Wed, 2007-04-18 at 10:19 +0200, Peter Zijlstra wrote:
> On Tue, 2007-04-17 at 21:19 -0400, Trond Myklebust wrote:
> > I've split the issues introduced by the 2.6.21-rcX write code up into 4
> > subproblems.
> > 
> > The first patch is just a cleanup in order to ease review.
> > 
> > Patch number 2 ensures that we never release the PG_writeback flag until
> > _after_ we've either discarded the unstable request altogether, or put it
> > on the nfs_inode's commit or dirty lists.
> > 
> > Patch number 3 fixes the 'desynchronized value of nfs_i.ncommit' error. It
> > uses the PG_NEED_COMMIT flag as an indicator for whether or not the request
> > may be redirtied.
> > 
> > Patch number 4 protects the NFS '.set_page_dirty' address_space operation
> > against races with nfs_inode_add_request.
> 
> Ok, stuck them in, and my debug patch from yesterday, just in case...
> 
> However, I can't seem to run long enough to establish whether the
> problem is gone. It deadlocks between 10-30 minutes due to missing IO
> completions, whereas yesterday it took between 45-60 minutes to trigger
> the 'desynchronized value of nfs_i.ncommit' messages.
> 
> I will continue trying to get a good run,

Just got one around 80-90 minutes, no 'desynchronized value of
nfs_i.ncommit' errors.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-18 Thread Trond Myklebust
On Wed, 2007-04-18 at 09:17 -0500, Florin Iucha wrote:
> On Wed, Apr 18, 2007 at 10:11:46AM -0400, Trond Myklebust wrote:
> > On Wed, 2007-04-18 at 08:42 -0500, Florin Iucha wrote:
> > > Could the port in CLOSE_WAIT state be the culprit?  (FWIW
> > > the server has been up for 38 days and subjected to
> > > this nfs test quite a bit without showing any stress).
> > 
> > The port in CLOSE_WAIT shows that a socket was closed down recently, but
> > once the connection is re-established, the client should start sending
> > data.
> > Do you have a copy of wireshark or ethereal on hand? If so, could you
> > take a look at whether or not any NFS traffic is going between the
> > client and server once the hang happens?
> > Note that the timeout value is 60 seconds, so if you see no immediate
> > traffic, then let the ethereal/wireshark session keep running for a
> > couple more minutes.
> 
> Should I run wireshark/ethereal on the client or on the server?

On the client, please, for the moment.

Cheers
  Trond


Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-18 Thread Florin Iucha
On Wed, Apr 18, 2007 at 10:11:46AM -0400, Trond Myklebust wrote:
> On Wed, 2007-04-18 at 08:42 -0500, Florin Iucha wrote:
> > Could the port in CLOSE_WAIT state be the culprit?  (FWIW
> > the server has been up for 38 days and subjected to
> > this nfs test quite a bit without showing any stress).
> 
> The port in CLOSE_WAIT shows that a socket was closed down recently, but
> once the connection is re-established, the client should start sending
> data.
> Do you have a copy of wireshark or ethereal on hand? If so, could you
> take a look at whether or not any NFS traffic is going between the
> client and server once the hang happens?
> Note that the timeout value is 60 seconds, so if you see no immediate
> traffic, then let the ethereal/wireshark session keep running for a
> couple more minutes.

Should I run wireshark/ethereal on the client or on the server?

I'll get a trace tonight (10 PM CST) and get back to you.

Thanks,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
  http://geekz.co.uk/schneierfacts/fact/163




Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-18 Thread Florin Iucha
On Wed, Apr 18, 2007 at 08:42:25AM -0500, Florin Iucha wrote:
> On Wed, Apr 18, 2007 at 09:15:31AM -0400, Trond Myklebust wrote:
> The netstat outputs are stable (not changed in 5 minutes):
> 
>http://iucha.net/nfs/21-rc7-nfs3/netstat-server :
> 
> tcp1  0 hermes.iucha.org:nfszeus.iucha.org:799  
> CLOSE_WAIT 
> tcp0  0 hermes.iucha.org:nfszeus.iucha.org:976  
> ESTABLISHED
> 
>http://iucha.net/nfs/21-rc7-nfs3/netstat-client
> 
> Active Internet connections (w/o servers)
> Proto Recv-Q Send-Q Local Address   Foreign Address State 
>  
> tcp0  0 zeus.iucha.org:976  hermes.iucha.org:nfs
> ESTABLISHED
> tcp0  0 zeus.iucha.org:ssh  hermes.iucha.org:56880  
> ESTABLISHED
> tcp0  0 zeus.iucha.org:ssh  hermes.iucha.org:45176  
> ESTABLISHED
> 
> Could the port in CLOSE_WAIT state be the culprit?  (FWIW
> the server has been up for 38 days and subjected to
> this nfs test quite a bit without showing any stress).

The CLOSE_WAIT went away as soon as I rebooted the client.  Something
was holding it up...

Regards,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
  http://geekz.co.uk/schneierfacts/fact/163




Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-18 Thread Trond Myklebust
On Wed, 2007-04-18 at 08:42 -0500, Florin Iucha wrote:
> On Wed, Apr 18, 2007 at 09:15:31AM -0400, Trond Myklebust wrote:
> > There is only one request on the 'pending' queue. That would usually
> > indicate that the connection to the server is down. Can you check using
> > "netstat -t" whether or not there is a connection in the 'ESTABLISHED'
> > state to the server? Please also repeat the command a couple of times in
> > order to see if the socket/port number on the connection changes.
> 
> This is with your fifth patch on top of the previous four patches:
> 
>http://iucha.net/nfs/21-rc7-nfs3/big-copy
> 
> Again, it has memory, stack traces and rpc_debug.
> 
> The iostat 5 output:
> 
>http://iucha.net/nfs/21-rc7-nfs3/iostat
> 
> The netstat outputs are stable (not changed in 5 minutes):
> 
>http://iucha.net/nfs/21-rc7-nfs3/netstat-server :
> 
> tcp1  0 hermes.iucha.org:nfszeus.iucha.org:799  
> CLOSE_WAIT 
> tcp0  0 hermes.iucha.org:nfszeus.iucha.org:976  
> ESTABLISHED
> 
>http://iucha.net/nfs/21-rc7-nfs3/netstat-client
> 
> Active Internet connections (w/o servers)
> Proto Recv-Q Send-Q Local Address   Foreign Address State 
>  
> tcp0  0 zeus.iucha.org:976  hermes.iucha.org:nfs
> ESTABLISHED
> tcp0  0 zeus.iucha.org:ssh  hermes.iucha.org:56880  
> ESTABLISHED
> tcp0  0 zeus.iucha.org:ssh  hermes.iucha.org:45176  
> ESTABLISHED
> 
> Could the port in CLOSE_WAIT state be the culprit?  (FWIW
> the server has been up for 38 days and subjected to
> this nfs test quite a bit without showing any stress).

The port in CLOSE_WAIT shows that a socket was closed down recently, but
once the connection is re-established, the client should start sending
data.
Do you have a copy of wireshark or ethereal on hand? If so, could you
take a look at whether or not any NFS traffic is going between the
client and server once the hang happens?
Note that the timeout value is 60 seconds, so if you see no immediate
traffic, then let the ethereal/wireshark session keep running for a
couple more minutes.

Cheers,
  Trond


Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-18 Thread Florin Iucha
On Wed, Apr 18, 2007 at 09:15:31AM -0400, Trond Myklebust wrote:
> There is only one request on the 'pending' queue. That would usually
> indicate that the connection to the server is down. Can you check using
> "netstat -t" whether or not there is a connection in the 'ESTABLISHED'
> state to the server? Please also repeat the command a couple of times in
> order to see if the socket/port number on the connection changes.

This is with your fifth patch on top of the previous four patches:

   http://iucha.net/nfs/21-rc7-nfs3/big-copy

Again, it has memory, stack traces and rpc_debug.

The iostat 5 output:

   http://iucha.net/nfs/21-rc7-nfs3/iostat

The netstat outputs are stable (not changed in 5 minutes):

   http://iucha.net/nfs/21-rc7-nfs3/netstat-server :

tcp1  0 hermes.iucha.org:nfszeus.iucha.org:799  CLOSE_WAIT 
tcp0  0 hermes.iucha.org:nfszeus.iucha.org:976  ESTABLISHED

   http://iucha.net/nfs/21-rc7-nfs3/netstat-client

Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address   Foreign Address State  
tcp0  0 zeus.iucha.org:976  hermes.iucha.org:nfsESTABLISHED
tcp0  0 zeus.iucha.org:ssh  hermes.iucha.org:56880  ESTABLISHED
tcp0  0 zeus.iucha.org:ssh  hermes.iucha.org:45176  ESTABLISHED

Could the port in CLOSE_WAIT state be the culprit?  (FWIW
the server has been up for 38 days and subjected to
this nfs test quite a bit without showing any stress).

Regards,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
  http://geekz.co.uk/schneierfacts/fact/163




Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-18 Thread Trond Myklebust
On Wed, 2007-04-18 at 07:38 -0500, Florin Iucha wrote:
> On Tue, Apr 17, 2007 at 10:37:38PM -0700, Andrew Morton wrote:
> > Florin, can we please see /proc/meminfo as well?
> 
>http://iucha.net/nfs/21-rc7-nfs2/meminfo
> 
> > Also the result of `echo m > /proc/sysrq-trigger'
> 
>http://iucha.net/nfs/21-rc7-nfs2/big-copy
> 
> This has 'echo m > /proc/sysrq-trigger', 'echo t >
> /proc/sysrq-trigger' and 'echo 0 > /proc/sys/sunrpc/rpc_debug'.

Thanks.

So it looks as if you have a massive backlog of requests waiting in the
RPC layer to get sent. That would indeed trigger the BDI congestion
control stuff, and prevent you from sending more requests. The
interesting bit is this:

[  399.665314] -pid- proc flgs status -client- -prog- --rqstp- -timeout 
-rpcwait -action- ---ops--
[  399.665338] 40373 0001 0001-11 81007f418508 13 810078eb0de0  
  0 xprt_resend 804196bf 80440b10
[  399.665345] 40391 0001 0001-11 81007f418508 13 810078eb05c8  
  0 xprt_sending 804196bf 80440b10
[  399.665351] 40392 0001 0001-11 81007f418508 13 810078eb0128  
  0 xprt_sending 804196bf 80440b10
[  399.665358] 40393 0001 0001-11 81007f418508 13 810078eb1158  
  0 xprt_sending 804196bf 80440b10
[  399.665364] 40394 0001 0001-11 81007f418508 13 810078eb0f08  
  0 xprt_sending 804196bf 80440b10
[  399.665371] 40395 0001 0001-11 81007f418508 13 810078eb  
  0 xprt_sending 804196bf 80440b10
[  399.665377] 40396 0001 0001-11 81007f418508 13 810078eb1030  
  0 xprt_sending 804196bf 80440b10
[  399.665384] 40397 0001 0001-11 81007f418508 13 810078eb0cb8  
  0 xprt_sending 804196bf 80440b10
[  399.665390] 40398 0001 0001-11 81007f418508 13 810078eb06f0  
  0 xprt_sending 804196bf 80440b10
[  399.665397] 40399 0001 0001-11 81007f418508 13 810078eb0940  
  0 xprt_sending 804196bf 80440b10
[  399.665404] 40400 0001 0001-11 81007f418508 13 810078eb0818  
  0 xprt_sending 804196bf 80440b10
[  399.665410] 40401 0001 0001-11 81007f418508 13 810078eb0378  
  0 xprt_sending 804196bf 80440b10
[  399.665417] 40402 0001 0001-11 81007f418508 13 810078eb0250  
  0 xprt_sending 804196bf 80440b10
[  399.669252] 41086 0001 0001  0 81007f418508 13 810078eb0a68  
  15000 xprt_pending 804196bf 80440b10
[  399.669258] 41087 0001 0001-11 81007f418508 13 810078eb0b90  
  0 xprt_resend 804196bf 80440b10
[  399.669265] 41088 0001 0001-11 81007f418508 13 810078eb04a0  
  0 xprt_sending 804196bf 80440b10

There is only one request on the 'pending' queue. That would usually
indicate that the connection to the server is down. Can you check using
"netstat -t" whether or not there is a connection in the 'ESTABLISHED'
state to the server? Please also repeat the command a couple of times in
order to see if the socket/port number on the connection changes.

Cheers
  Trond


Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-18 Thread Florin Iucha
On Tue, Apr 17, 2007 at 10:37:38PM -0700, Andrew Morton wrote:
> Florin, can we please see /proc/meminfo as well?

   http://iucha.net/nfs/21-rc7-nfs2/meminfo

> Also the result of `echo m > /proc/sysrq-trigger'

   http://iucha.net/nfs/21-rc7-nfs2/big-copy

This has 'echo m > /proc/sysrq-trigger', 'echo t >
/proc/sysrq-trigger' and 'echo 0 > /proc/sys/sunrpc/rpc_debug'.

The output from the server's 'iostat 5' is at

   http://iucha.net/nfs/21-rc7-nfs2/iostat

This run, it copied 5.6G (vs yesterday's 2.5G).

Regards,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
  http://geekz.co.uk/schneierfacts/fact/163




Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-18 Thread Trond Myklebust
On Tue, 2007-04-17 at 23:07 -0500, Florin Iucha wrote:
> When 'big-copy' hangs, if I switch to a different console and run
> 'lsof', '[u]mount', or use shell completion on a network mount then that
> process goes into D state.  I cannot umount the network shares nor
> stop autofs.  I cannot do a clean reboot, I have to ssh
> in and "echo s > /proc/sysrq-trigger; echo u > /proc/sysrq-trigger;
> echo b > /proc/sysrq-trigger" .

What happens if you issue "echo 0 >/proc/sys/sunrpc/rpc_debug"?

> I am not mounting anything using CIFS, but I could give it a try.
> 
> I could transfer 75 GB without hiccup with 2.6.19 using NFS4 and CIFS,
> and with 2.6.20 using CIFS.  2.6.20 works fine under reasonably light
> load, with gnome sessions logging in and out several times a day.

How about NFSv3? I'd like to eliminate any issues with NFSv4 state.

I've also attached a little patch that I used in order to debug the list
consistency issues. Could you try it on top of the 4 I sent last night?

Cheers
  Trond


--- Begin Message ---
Adds consistency checks for nfs_page list operations

Signed-off-by: Trond Myklebust <[EMAIL PROTECTED]>
---

 fs/nfs/write.c   |8 ++--
 include/linux/nfs_page.h |3 +++
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index cadbf3c..9be626d 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -194,6 +194,7 @@ static int nfs_writepage_setup(struct nfs_open_context 
*ctx, struct page *page,
nfs_grow_file(page, offset, count);
/* Set the PG_uptodate flag? */
nfs_mark_uptodate(page, offset, count);
+   WARN_ON(test_bit(PG_NEED_COMMIT,&(req)->wb_flags));
nfs_unlock_request(req);
return 0;
 }
@@ -459,6 +460,7 @@ nfs_mark_request_commit(struct nfs_page *req)
struct inode *inode = req->wb_context->dentry->d_inode;
struct nfs_inode *nfsi = NFS_I(inode);
 
+   WARN_ON(nfs_dirty_request(req));
spin_lock(>req_lock);
nfs_list_add_request(req, >commit);
nfsi->ncommit++;
@@ -552,7 +554,7 @@ static void nfs_cancel_commit_list(struct list_head *head)
req = nfs_list_entry(head->next);
dec_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
nfs_list_remove_request(req);
-   clear_bit(PG_NEED_COMMIT, &(req)->wb_flags);
+   WARN_ON(!test_and_clear_bit(PG_NEED_COMMIT,&(req)->wb_flags));
nfs_inode_remove_request(req);
nfs_unlock_request(req);
}
@@ -1033,6 +1035,7 @@ static void nfs_writeback_done_full(struct rpc_task 
*task, void *calldata)
 
if (nfs_write_need_commit(data)) {
memcpy(>wb_verf, >verf, 
sizeof(req->wb_verf));
+   set_bit(PG_NEED_COMMIT,&(req)->wb_flags);
nfs_mark_request_commit(req);
nfs_end_page_writeback(page);
dprintk(" marked for commit\n");
@@ -1206,6 +1209,7 @@ nfs_commit_list(struct inode *inode, struct list_head 
*head, int how)
nfs_list_remove_request(req);
nfs_mark_request_commit(req);
dec_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
+   WARN_ON(!test_and_clear_bit(PG_NEED_COMMIT,&(req)->wb_flags));
nfs_clear_page_writeback(req);
}
return -ENOMEM;
@@ -1229,7 +1233,7 @@ static void nfs_commit_done(struct rpc_task *task, void 
*calldata)
while (!list_empty(>pages)) {
req = nfs_list_entry(data->pages.next);
nfs_list_remove_request(req);
-   clear_bit(PG_NEED_COMMIT, &(req)->wb_flags);
+   WARN_ON(!test_and_clear_bit(PG_NEED_COMMIT,&(req)->wb_flags));
dec_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
 
dprintk("NFS: commit (%s/%Ld [EMAIL PROTECTED])",
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index 41afab6..75c2d34 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -116,6 +116,9 @@ nfs_lock_request(struct nfs_page *req)
 static inline void
 nfs_list_add_request(struct nfs_page *req, struct list_head *head)
 {
+   BUG_ON(!list_empty(>wb_list));
+   BUG_ON(req->wb_list_head != NULL);
+
list_add_tail(>wb_list, head);
req->wb_list_head = head;
 }
--- End Message ---


Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-18 Thread OGAWA Hirofumi
Trond Myklebust <[EMAIL PROTECTED]> writes:

> I was mainly interested in feedback from Peter, Florin and Ogawa-san to
> find out if this series fixes their problems. You were unfortunate
> enough to have been on earlier Ccs, so I didn't dare trim you off. :-)

Sorry. I'm trying to reproduce that, but unfortunately I can't
reproduce it on an unpatched kernel for now.

FWIW, syslog is here.

Apr 17 03:54:55 duaron kernel: [drm] Initialized i915 1.6.0 20060119 on minor 0
Apr 17 03:58:31 duaron kernel: tun: Universal TUN/TAP device driver, 1.6
Apr 17 03:58:31 duaron kernel: tun: (C) 1999-2004 Max Krasnyansky <[EMAIL 
PROTECTED]>
Apr 17 03:58:44 duaron kernel: tun0: no IPv6 routers present
Apr 17 13:33:13 duaron kernel: NFS: desynchronized value of nfs_i.ncommit.
Apr 17 13:33:47 duaron last message repeated 16 times
Apr 17 13:34:49 duaron last message repeated 32 times
Apr 17 13:35:19 duaron last message repeated 7 times
Apr 17 13:39:39 duaron kernel: NFS: desynchronized value of nfs_i.ncommit.
Apr 17 13:40:39 duaron last message repeated 3 times
Apr 17 13:42:26 duaron kernel: NFS: desynchronized value of nfs_i.ncommit.
Apr 17 13:42:30 duaron last message repeated 3 times
Apr 18 00:58:22 duaron kernel: NFS: desynchronized value of nfs_i.ncommit.
Apr 18 00:59:09 duaron last message repeated 129 times
Apr 18 01:00:31 duaron last message repeated 24 times
Apr 18 01:02:16 duaron kernel: NFS: desynchronized value of nfs_i.ncommit.
Apr 18 01:03:25 duaron last message repeated 47 times
Apr 18 01:04:30 duaron last message repeated 16 times
Apr 18 01:05:31 duaron last message repeated 28 times
Apr 18 01:06:33 duaron last message repeated 32 times
-- 
OGAWA Hirofumi <[EMAIL PROTECTED]>


Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-18 Thread Peter Zijlstra
On Tue, 2007-04-17 at 21:19 -0400, Trond Myklebust wrote:
> I've split the issues introduced by the 2.6.21-rcX write code up into 4
> subproblems.
> 
> The first patch is just a cleanup in order to ease review.
> 
> Patch number 2 ensures that we never release the PG_writeback flag until
> _after_ we've either discarded the unstable request altogether, or put it
> on the nfs_inode's commit or dirty lists.
> 
> Patch number 3 fixes the 'desynchronized value of nfs_i.ncommit' error. It
> uses the PG_NEED_COMMIT flag as an indicator for whether or not the request
> may be redirtied.
> 
> Patch number 4 protects the NFS '.set_page_dirty' address_space operation
> against races with nfs_inode_add_request.

Ok, stuck them in, and my debug patch from yesterday, just in case...

However, I can't seem to run long enough to establish whether the
problem is gone. It deadlocks between 10-30 minutes due to missing IO
completions, whereas yesterday it took between 45-60 minutes to trigger
the 'desynchronized value of nfs_i.ncommit' messages.

I will continue trying to get a good run; however, if you have some
(perhaps experimental .22) patches you want me to try..






Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-18 Thread Trond Myklebust
On Tue, 2007-04-17 at 23:07 -0500, Florin Iucha wrote:
 When 'big-copy' hangs, if I switch to a different console and run
 'lsof', '[u]mount', or use shell completion on a network mount then that
 process goes into D state.  I cannot umount the network shares nor
 stop autofs.  I cannot do a clean reboot, I have to ssh
 in and echo s  /proc/sysrq-trigger; echo u  /proc/sysrq-trigger;
 echo b  /proc/sysrq-trigger .

What happens if you issue echo 0 /proc/sys/sunrpc/rpc_debug?

 I am not mounting anything using CIFS, but I could give it a try.
 
 I could transfer 75 GB without hiccup with 2.6.19 using NFS4 and CIFS,
 and with 2.6.20 using CIFS.  2.6.20 works fine under reasonably light
 load, with gnome sessions logging in and out several times a day.

How about NFSv3? I'd like to eliminate any issues with NFSv4 state.

I've also attached a little patch that I used in order to debug the list
consistency issues. Could you try it on top of the 4 I sent last night?

Cheers
  Trond


---BeginMessage---
Adds consistency checks for nfs_page list operations

Signed-off-by: Trond Myklebust [EMAIL PROTECTED]
---

 fs/nfs/write.c   |8 ++--
 include/linux/nfs_page.h |3 +++
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index cadbf3c..9be626d 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -194,6 +194,7 @@ static int nfs_writepage_setup(struct nfs_open_context *ctx, struct page *page,
 	nfs_grow_file(page, offset, count);
 	/* Set the PG_uptodate flag? */
 	nfs_mark_uptodate(page, offset, count);
+	WARN_ON(test_bit(PG_NEED_COMMIT, &(req)->wb_flags));
 	nfs_unlock_request(req);
 	return 0;
 }
@@ -459,6 +460,7 @@ nfs_mark_request_commit(struct nfs_page *req)
 	struct inode *inode = req->wb_context->dentry->d_inode;
 	struct nfs_inode *nfsi = NFS_I(inode);
 
+	WARN_ON(nfs_dirty_request(req));
 	spin_lock(&nfsi->req_lock);
 	nfs_list_add_request(req, &nfsi->commit);
 	nfsi->ncommit++;
@@ -552,7 +554,7 @@ static void nfs_cancel_commit_list(struct list_head *head)
 		req = nfs_list_entry(head->next);
 		dec_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
 		nfs_list_remove_request(req);
-		clear_bit(PG_NEED_COMMIT, &(req)->wb_flags);
+		WARN_ON(!test_and_clear_bit(PG_NEED_COMMIT, &(req)->wb_flags));
 		nfs_inode_remove_request(req);
 		nfs_unlock_request(req);
 	}
@@ -1033,6 +1035,7 @@ static void nfs_writeback_done_full(struct rpc_task *task, void *calldata)
 
 		if (nfs_write_need_commit(data)) {
 			memcpy(&req->wb_verf, &data->verf, sizeof(req->wb_verf));
+			set_bit(PG_NEED_COMMIT, &(req)->wb_flags);
 			nfs_mark_request_commit(req);
 			nfs_end_page_writeback(page);
 			dprintk(" marked for commit\n");
@@ -1206,6 +1209,7 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how)
 		nfs_list_remove_request(req);
 		nfs_mark_request_commit(req);
 		dec_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
+		WARN_ON(!test_and_clear_bit(PG_NEED_COMMIT, &(req)->wb_flags));
 		nfs_clear_page_writeback(req);
 	}
 	return -ENOMEM;
return -ENOMEM;
@@ -1229,7 +1233,7 @@ static void nfs_commit_done(struct rpc_task *task, void *calldata)
 	while (!list_empty(&data->pages)) {
 		req = nfs_list_entry(data->pages.next);
 		nfs_list_remove_request(req);
-		clear_bit(PG_NEED_COMMIT, &(req)->wb_flags);
+		WARN_ON(!test_and_clear_bit(PG_NEED_COMMIT, &(req)->wb_flags));
 		dec_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
 
dprintk(NFS: commit (%s/%Ld [EMAIL PROTECTED]),
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index 41afab6..75c2d34 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -116,6 +116,9 @@ nfs_lock_request(struct nfs_page *req)
 static inline void
 nfs_list_add_request(struct nfs_page *req, struct list_head *head)
 {
+	BUG_ON(!list_empty(&req->wb_list));
+	BUG_ON(req->wb_list_head != NULL);
+
 	list_add_tail(&req->wb_list, head);
 	req->wb_list_head = head;
 }
---End Message---


Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-18 Thread Florin Iucha
On Tue, Apr 17, 2007 at 10:37:38PM -0700, Andrew Morton wrote:
 Florin, can we please see /proc/meminfo as well?

   http://iucha.net/nfs/21-rc7-nfs2/meminfo

 Also the result of `echo m > /proc/sysrq-trigger'

   http://iucha.net/nfs/21-rc7-nfs2/big-copy

This has 'echo m > /proc/sysrq-trigger', 'echo t >
/proc/sysrq-trigger' and 'echo 0 > /proc/sys/sunrpc/rpc_debug'.

The output from the server's 'iostat 5' is at

   http://iucha.net/nfs/21-rc7-nfs2/iostat

This run it copied 5.6G (vs yesterday's 2.5G).

Regards,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
  http://geekz.co.uk/schneierfacts/fact/163


signature.asc
Description: Digital signature


Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-18 Thread Trond Myklebust
On Wed, 2007-04-18 at 07:38 -0500, Florin Iucha wrote:
 On Tue, Apr 17, 2007 at 10:37:38PM -0700, Andrew Morton wrote:
  Florin, can we please see /proc/meminfo as well?
 
http://iucha.net/nfs/21-rc7-nfs2/meminfo
 
  Also the result of `echo m > /proc/sysrq-trigger'
 
http://iucha.net/nfs/21-rc7-nfs2/big-copy
 
 This has 'echo m > /proc/sysrq-trigger', 'echo t >
 /proc/sysrq-trigger' and 'echo 0 > /proc/sys/sunrpc/rpc_debug'.

Thanks.

So it looks as if you have a massive backlog of requests waiting in the
RPC layer to get sent. That would indeed trigger the BDI congestion
control stuff, and prevent you from sending more requests. The
interesting bit is this:

[  399.665314] -pid- proc flgs status -client- -prog- --rqstp- -timeout 
-rpcwait -action- ---ops--
[  399.665338] 40373 0001 0001-11 81007f418508 13 810078eb0de0  
  0 xprt_resend 804196bf 80440b10
[  399.665345] 40391 0001 0001-11 81007f418508 13 810078eb05c8  
  0 xprt_sending 804196bf 80440b10
[  399.665351] 40392 0001 0001-11 81007f418508 13 810078eb0128  
  0 xprt_sending 804196bf 80440b10
[  399.665358] 40393 0001 0001-11 81007f418508 13 810078eb1158  
  0 xprt_sending 804196bf 80440b10
[  399.665364] 40394 0001 0001-11 81007f418508 13 810078eb0f08  
  0 xprt_sending 804196bf 80440b10
[  399.665371] 40395 0001 0001-11 81007f418508 13 810078eb  
  0 xprt_sending 804196bf 80440b10
[  399.665377] 40396 0001 0001-11 81007f418508 13 810078eb1030  
  0 xprt_sending 804196bf 80440b10
[  399.665384] 40397 0001 0001-11 81007f418508 13 810078eb0cb8  
  0 xprt_sending 804196bf 80440b10
[  399.665390] 40398 0001 0001-11 81007f418508 13 810078eb06f0  
  0 xprt_sending 804196bf 80440b10
[  399.665397] 40399 0001 0001-11 81007f418508 13 810078eb0940  
  0 xprt_sending 804196bf 80440b10
[  399.665404] 40400 0001 0001-11 81007f418508 13 810078eb0818  
  0 xprt_sending 804196bf 80440b10
[  399.665410] 40401 0001 0001-11 81007f418508 13 810078eb0378  
  0 xprt_sending 804196bf 80440b10
[  399.665417] 40402 0001 0001-11 81007f418508 13 810078eb0250  
  0 xprt_sending 804196bf 80440b10
[  399.669252] 41086 0001 0001  0 81007f418508 13 810078eb0a68  
  15000 xprt_pending 804196bf 80440b10
[  399.669258] 41087 0001 0001-11 81007f418508 13 810078eb0b90  
  0 xprt_resend 804196bf 80440b10
[  399.669265] 41088 0001 0001-11 81007f418508 13 810078eb04a0  
  0 xprt_sending 804196bf 80440b10

There is only one request on the 'pending' queue. That would usually
indicate that the connection to the server is down. Can you check using
netstat -t whether or not there is a connection in the 'ESTABLISHED'
state to the server? Please also repeat the command a couple of times in
order to see if the socket/port number on the connection changes.
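
[Editorial note: the check Trond describes can be scripted. A minimal
sketch, assuming the server uses the standard NFS port 2049/tcp and the
usual Linux `netstat -tn` column layout; the echoed line below is sample
data standing in for a live netstat run:]

```shell
# Extract the local address:port (field 4) of the ESTABLISHED NFS
# connection.  Repeating this every few seconds shows whether the
# client-side port changes, i.e. whether the connection is being torn
# down and re-established.  Pipe real `netstat -tn` output instead of
# the sample line when running for real.
echo 'tcp        0      0 10.0.0.2:976    10.0.0.1:2049   ESTABLISHED' |
awk '$6 == "ESTABLISHED" && $5 ~ /:2049$/ { print $4 }'
# prints: 10.0.0.2:976
```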

Cheers
  Trond


Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-18 Thread Florin Iucha
On Wed, Apr 18, 2007 at 09:15:31AM -0400, Trond Myklebust wrote:
 There is only one request on the 'pending' queue. That would usually
 indicate that the connection to the server is down. Can you check using
 netstat -t whether or not there is a connection in the 'ESTABLISHED'
 state to the server? Please also repeat the command a couple of times in
 order to see if the socket/port number on the connection changes.

This is with your fifth patch on top of the previous four patches:

   http://iucha.net/nfs/21-rc7-nfs3/big-copy

Again, it has memory, stack traces and rpc_debug.

The iostat 5 output:

   http://iucha.net/nfs/21-rc7-nfs3/iostat

The netstat outputs are stable (not changed in 5 minutes):

   http://iucha.net/nfs/21-rc7-nfs3/netstat-server :

tcp        1      0 hermes.iucha.org:nfs    zeus.iucha.org:799      CLOSE_WAIT
tcp        0      0 hermes.iucha.org:nfs    zeus.iucha.org:976      ESTABLISHED

   http://iucha.net/nfs/21-rc7-nfs3/netstat-client

Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address   Foreign Address State  
tcp        0      0 zeus.iucha.org:976      hermes.iucha.org:nfs    ESTABLISHED
tcp        0      0 zeus.iucha.org:ssh      hermes.iucha.org:56880  ESTABLISHED
tcp        0      0 zeus.iucha.org:ssh      hermes.iucha.org:45176  ESTABLISHED

Could the port in CLOSE_WAIT state be the culprit?  (FWIW
the server has been up for 38 days and subjected to
this nfs test quite a bit without showing any stress).

Regards,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
  http://geekz.co.uk/schneierfacts/fact/163




Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-18 Thread Trond Myklebust
On Wed, 2007-04-18 at 08:42 -0500, Florin Iucha wrote:
 On Wed, Apr 18, 2007 at 09:15:31AM -0400, Trond Myklebust wrote:
  There is only one request on the 'pending' queue. That would usually
  indicate that the connection to the server is down. Can you check using
  netstat -t whether or not there is a connection in the 'ESTABLISHED'
  state to the server? Please also repeat the command a couple of times in
  order to see if the socket/port number on the connection changes.
 
 This is with your fifth patch on top of the previous four patches:
 
http://iucha.net/nfs/21-rc7-nfs3/big-copy
 
 Again, it has memory, stack traces and rpc_debug.
 
 The iostat 5 output:
 
http://iucha.net/nfs/21-rc7-nfs3/iostat
 
 The netstat outputs are stable (not changed in 5 minutes):
 
http://iucha.net/nfs/21-rc7-nfs3/netstat-server :
 
 tcp1  0 hermes.iucha.org:nfszeus.iucha.org:799  
 CLOSE_WAIT 
 tcp0  0 hermes.iucha.org:nfszeus.iucha.org:976  
 ESTABLISHED
 
http://iucha.net/nfs/21-rc7-nfs3/netstat-client
 
 Active Internet connections (w/o servers)
 Proto Recv-Q Send-Q Local Address   Foreign Address State 
  
 tcp0  0 zeus.iucha.org:976  hermes.iucha.org:nfs
 ESTABLISHED
 tcp0  0 zeus.iucha.org:ssh  hermes.iucha.org:56880  
 ESTABLISHED
 tcp0  0 zeus.iucha.org:ssh  hermes.iucha.org:45176  
 ESTABLISHED
 
 Could the port in CLOSE_WAIT state be the culprit?  (FWIW
 the server has been up for 38 days and subjected to
 this nfs test quite a bit without showing any stress).

The port in CLOSE_WAIT shows that a socket was closed down recently, but
once the connection is re-established, the client should start sending
data.
Do you have a copy of wireshark or ethereal on hand? If so, could you
take a look at whether or not any NFS traffic is going between the
client and server once the hang happens?
Note that the timeout value is 60 seconds, so if you see no immediate
traffic, then let the ethereal/wireshark session keep running for a
couple more minutes.

Cheers,
  Trond


Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-18 Thread Florin Iucha
On Wed, Apr 18, 2007 at 08:42:25AM -0500, Florin Iucha wrote:
 On Wed, Apr 18, 2007 at 09:15:31AM -0400, Trond Myklebust wrote:
 The netstat outputs are stable (not changed in 5 minutes):
 
http://iucha.net/nfs/21-rc7-nfs3/netstat-server :
 
 tcp1  0 hermes.iucha.org:nfszeus.iucha.org:799  
 CLOSE_WAIT 
 tcp0  0 hermes.iucha.org:nfszeus.iucha.org:976  
 ESTABLISHED
 
http://iucha.net/nfs/21-rc7-nfs3/netstat-client
 
 Active Internet connections (w/o servers)
 Proto Recv-Q Send-Q Local Address   Foreign Address State 
  
 tcp0  0 zeus.iucha.org:976  hermes.iucha.org:nfs
 ESTABLISHED
 tcp0  0 zeus.iucha.org:ssh  hermes.iucha.org:56880  
 ESTABLISHED
 tcp0  0 zeus.iucha.org:ssh  hermes.iucha.org:45176  
 ESTABLISHED
 
 Could the port in CLOSE_WAIT state be the culprit?  (FWIW
 the server has been up for 38 days and subjected to
 this nfs test quite a bit without showing any stress).

The CLOSE_WAIT went away as soon as I rebooted the client.  Something
was holding it up...

Regards,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
  http://geekz.co.uk/schneierfacts/fact/163




Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-18 Thread Florin Iucha
On Wed, Apr 18, 2007 at 10:11:46AM -0400, Trond Myklebust wrote:
 On Wed, 2007-04-18 at 08:42 -0500, Florin Iucha wrote:
  Could the port in CLOSE_WAIT state be the culprit?  (FWIW
  the server has been up for 38 days and subjected to
  this nfs test quite a bit without showing any stress).
 
 The port in CLOSE_WAIT shows that a socket was closed down recently, but
 once the connection is re-established, the client should start sending
 data.
 Do you have a copy of wireshark or ethereal on hand? If so, could you
 take a look at whether or not any NFS traffic is going between the
 client and server once the hang happens?
 Note that the timeout value is 60 seconds, so if you see no immediate
 traffic, then let the ethereal/wireshark session keep running for a
 couple more minutes.

Should I run wireshark/ethereal on the client or on the server?

I'll get a trace tonight (10 PM CST) and get back to you.

Thanks,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
  http://geekz.co.uk/schneierfacts/fact/163




Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-18 Thread Trond Myklebust
On Wed, 2007-04-18 at 09:17 -0500, Florin Iucha wrote:
 On Wed, Apr 18, 2007 at 10:11:46AM -0400, Trond Myklebust wrote:
  On Wed, 2007-04-18 at 08:42 -0500, Florin Iucha wrote:
   Could the port in CLOSE_WAIT state be the culprit?  (FWIW
   the server has been up for 38 days and subjected to
   this nfs test quite a bit without showing any stress).
  
  The port in CLOSE_WAIT shows that a socket was closed down recently, but
  once the connection is re-established, the client should start sending
  data.
  Do you have a copy of wireshark or ethereal on hand? If so, could you
  take a look at whether or not any NFS traffic is going between the
  client and server once the hang happens?
  Note that the timeout value is 60 seconds, so if you see no immediate
  traffic, then let the ethereal/wireshark session keep running for a
  couple more minutes.
 
 Should I run wireshark/ethereal on the client or on the server?

On the client, please, for the moment.

Cheers
  Trond


Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-18 Thread Peter Zijlstra
On Wed, 2007-04-18 at 10:19 +0200, Peter Zijlstra wrote:
 On Tue, 2007-04-17 at 21:19 -0400, Trond Myklebust wrote:
  I've split the issues introduced by the 2.6.21-rcX write code up into 4
  subproblems.
  
  The first patch is just a cleanup in order to ease review.
  
  Patch number 2 ensures that we never release the PG_writeback flag until
  _after_ we've either discarded the unstable request altogether, or put it
  on the nfs_inode's commit or dirty lists.
  
  Patch number 3 fixes the 'desynchronized value of nfs_i.ncommit' error. It
  uses the PG_NEED_COMMIT flag as an indicator for whether or not the request
  may be redirtied.
  
  Patch number 4 protects the NFS '.set_page_dirty' address_space operation
  against races with nfs_inode_add_request.
 
 Ok, stuck them in, and my debug patch from yesterday, just in case...
 
 However, I can't seem to run long enough to establish whether the
 problem is gone. It deadlocks between 10-30 minutes due to missing IO
 completions, whereas yesterday it took between 45-60 minutes to trigger
 the 'desynchronized value of nfs_i.ncommit' messages.
 
 I will continue trying go get a good run,

Just got one around 80-90 minutes, no 'desynchronized value of
nfs_i.ncommit' errors.



Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-18 Thread Florin Iucha
On Wed, Apr 18, 2007 at 10:11:46AM -0400, Trond Myklebust wrote:
 Do you have a copy of wireshark or ethereal on hand? If so, could you
 take a look at whether or not any NFS traffic is going between the
 client and server once the hang happens?

I used the following command 

   tcpdump -w nfs-traffic -i eth0 -vv -tt dst port nfs

to capture

   http://iucha.net/nfs/21-rc7-nfs4/nfs-traffic.bz2

I started the capture before starting the copy and left it to run for
a few minutes after the traffic slowed to a crawl.

The iostat and vmstat are at:

   http://iucha.net/nfs/21-rc7-nfs4/iostat
   http://iucha.net/nfs/21-rc7-nfs4/vmstat
   
It seems that my original problem report had a big mistake!  There is
no hang, but at some point the write slows down to a trickle (from
40,000 blocks/s to 22 blocks/s) as can be seen from the iostat log.

Regards,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
  http://geekz.co.uk/schneierfacts/fact/163




Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-18 Thread Trond Myklebust
On Wed, 2007-04-18 at 20:52 -0500, Florin Iucha wrote:
 On Wed, Apr 18, 2007 at 10:11:46AM -0400, Trond Myklebust wrote:
  Do you have a copy of wireshark or ethereal on hand? If so, could you
  take a look at whether or not any NFS traffic is going between the
  client and server once the hang happens?
 
 I used the following command 
 
tcpdump -w nfs-traffic -i eth0 -vv -tt dst port nfs
 
 to capture
 
http://iucha.net/nfs/21-rc7-nfs4/nfs-traffic.bz2
 
 I started the capture before starting the copy and left it to run for
 a few minutes after the traffic slowed to a crawl.
 
 The iostat and vmstat are at:
 
http://iucha.net/nfs/21-rc7-nfs4/iostat
http://iucha.net/nfs/21-rc7-nfs4/vmstat

 It seems that my original problem report had a big mistake!  There is
 no hang, but at some point the write slows down to a trickle (from
 40,000 blocks/s to 22 blocks/s) as can be seen from the iostat log.

Yeah. You only captured the outgoing traffic to the server, but already
it looks as if there were 'interesting' things going on. In frames 29346
to 29350, the traffic stops altogether for 5 seconds (I only see
keepalives) then it starts up again. Ditto for frames 40477-40482
(another 5 seconds). ...
Then at around frame 92072, the client starts to send a bunch of RSTs.
Aha! I'll bet that reverting the appended patch fixes the problem.

The assumption Chuck makes, that a request with _no_ bytes sent which
is nevertheless on the 'receive list' must be a resend, is patently
false in the case where the send queue just happens to be full.
A better solution would probably be to disconnect the socket following
the ETIMEDOUT handling in call_status().
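
[Editorial note: the capture quoted earlier in the thread used `dst port
nfs`, which is why only outgoing traffic was recorded. A sketch of a
bidirectional capture; the interface and output file names are
placeholders:]

```shell
# 'port nfs' matches source OR destination port 2049, so the server's
# replies are captured too; 'dst port nfs' records only client->server
# packets.  -s 0 keeps full packets so the NFS payload can be decoded.
cmd='tcpdump -i eth0 -s 0 -tt -w nfs-both.pcap port nfs'
echo "$cmd"   # run under sudo for a few minutes around the slowdown
```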

Cheers
  Trond
---
commit 43d78ef2ba5bec26d0315859e8324bfc0be23766
Author: Chuck Lever [EMAIL PROTECTED]
Date:   Tue Feb 6 18:26:11 2007 -0500

NFS: disconnect before retrying NFSv4 requests over TCP

RFC3530 section 3.1.1 states an NFSv4 client MUST NOT send a request
twice on the same connection unless it is the NULL procedure.  Section
3.1.1 suggests that the client should disconnect and reconnect if it
wants to retry a request.

Implement this by adding an rpc_clnt flag that an ULP can use to
specify that the underlying transport should be disconnected on a
major timeout.  The NFSv4 client asserts this new flag, and requests
no retries after a minor retransmit timeout.

Note that disconnecting on a retransmit is in general not safe to do
if the RPC client does not reuse the TCP port number when reconnecting.

See http://bugzilla.linux-nfs.org/show_bug.cgi?id=6

Signed-off-by: Chuck Lever [EMAIL PROTECTED]
Signed-off-by: Trond Myklebust [EMAIL PROTECTED]

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index a3191f0..c46e94f 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -394,7 +394,8 @@ static void nfs_init_timeout_values(struct rpc_timeout *to, int proto,
 static int nfs_create_rpc_client(struct nfs_client *clp, int proto,
						unsigned int timeo,
						unsigned int retrans,
-						rpc_authflavor_t flavor)
+						rpc_authflavor_t flavor,
+						int flags)
 {
	struct rpc_timeout	timeparms;
	struct rpc_clnt		*clnt = NULL;
@@ -407,6 +408,7 @@ static int nfs_create_rpc_client(struct nfs_client *clp, int proto,
		.program	= &nfs_program,
		.version	= clp->rpc_ops->version,
		.authflavor	= flavor,
+		.flags		= flags,
	};
 
	if (!IS_ERR(clp->cl_rpcclient))
@@ -548,7 +550,7 @@ static int nfs_init_client(struct nfs_client *clp, const struct nfs_mount_data *
	 * - RFC 2623, sec 2.3.2
	 */
	error = nfs_create_rpc_client(clp, proto, data->timeo, data->retrans,
-					RPC_AUTH_UNIX);
+					RPC_AUTH_UNIX, 0);
	if (error < 0)
		goto error;
	nfs_mark_client_ready(clp, NFS_CS_READY);
@@ -868,7 +870,8 @@ static int nfs4_init_client(struct nfs_client *clp,
	/* Check NFS protocol revision and initialize RPC op vector */
	clp->rpc_ops = &nfs_v4_clientops;
 
-	error = nfs_create_rpc_client(clp, proto, timeo, retrans, authflavour);
+	error = nfs_create_rpc_client(clp, proto, timeo, retrans, authflavour,
+					RPC_CLNT_CREATE_DISCRTRY);
	if (error < 0)
		goto error;
	memcpy(clp->cl_ipaddr, ip_addr, sizeof(clp->cl_ipaddr));
diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
index a1be89d..c7a78ee 100644
--- a/include/linux/sunrpc/clnt.h
+++ b/include/linux/sunrpc/clnt.h
@@ -40,6 +40,7 @@ struct rpc_clnt {
 

Success! Was: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-18 Thread Florin Iucha
On Wed, Apr 18, 2007 at 10:45:13PM -0400, Trond Myklebust wrote:
 On Wed, 2007-04-18 at 20:52 -0500, Florin Iucha wrote:
  It seems that my original problem report had a big mistake!  There is
  no hang, but at some point the write slows down to a trickle (from
  40,000 blocks/s to 22 blocks/s) as can be seen from the iostat log.
 
 Yeah. You only captured the outgoing traffic to the server, but already
 it looks as if there were 'interesting' things going on. In frames 29346
 to 29350, the traffic stops altogether for 5 seconds (I only see
 keepalives) then it starts up again. Ditto for frames 40477-40482
 (another 5 seconds). ...
 Then at around frame 92072, the client starts to send a bunch of RSTs.
 Aha I'll bet that reverting the appended patch fixes the problem.

You win!

Reverting this patch (on top of your previous 5) allowed the big copy
to complete (70GB) as well as successful log-in to gnome!

Acked-By: Florin Iucha [EMAIL PROTECTED]

Thanks so much for the patience with this elusive bug and stubborn
bugreporter!

Regards,
florin

 ---
 commit 43d78ef2ba5bec26d0315859e8324bfc0be23766
 Author: Chuck Lever [EMAIL PROTECTED]
 Date:   Tue Feb 6 18:26:11 2007 -0500
 
 NFS: disconnect before retrying NFSv4 requests over TCP
 
 RFC3530 section 3.1.1 states an NFSv4 client MUST NOT send a request
 twice on the same connection unless it is the NULL procedure.  Section
 3.1.1 suggests that the client should disconnect and reconnect if it
 wants to retry a request.
 
 Implement this by adding an rpc_clnt flag that an ULP can use to
 specify that the underlying transport should be disconnected on a
 major timeout.  The NFSv4 client asserts this new flag, and requests
 no retries after a minor retransmit timeout.
 
 Note that disconnecting on a retransmit is in general not safe to do
 if the RPC client does not reuse the TCP port number when reconnecting.
 
 See http://bugzilla.linux-nfs.org/show_bug.cgi?id=6
 
 Signed-off-by: Chuck Lever [EMAIL PROTECTED]
 Signed-off-by: Trond Myklebust [EMAIL PROTECTED]
 
 diff --git a/fs/nfs/client.c b/fs/nfs/client.c
 index a3191f0..c46e94f 100644
 --- a/fs/nfs/client.c
 +++ b/fs/nfs/client.c
 @@ -394,7 +394,8 @@ static void nfs_init_timeout_values(struct rpc_timeout *to, int proto,
  static int nfs_create_rpc_client(struct nfs_client *clp, int proto,
						unsigned int timeo,
						unsigned int retrans,
 -						rpc_authflavor_t flavor)
 +						rpc_authflavor_t flavor,
 +						int flags)
  {
	struct rpc_timeout	timeparms;
	struct rpc_clnt		*clnt = NULL;
 @@ -407,6 +408,7 @@ static int nfs_create_rpc_client(struct nfs_client *clp, int proto,
		.program	= &nfs_program,
		.version	= clp->rpc_ops->version,
		.authflavor	= flavor,
 +		.flags		= flags,
	};
  
	if (!IS_ERR(clp->cl_rpcclient))
 @@ -548,7 +550,7 @@ static int nfs_init_client(struct nfs_client *clp, const struct nfs_mount_data *
	 * - RFC 2623, sec 2.3.2
	 */
	error = nfs_create_rpc_client(clp, proto, data->timeo, data->retrans,
 -					RPC_AUTH_UNIX);
 +					RPC_AUTH_UNIX, 0);
	if (error < 0)
		goto error;
	nfs_mark_client_ready(clp, NFS_CS_READY);
 @@ -868,7 +870,8 @@ static int nfs4_init_client(struct nfs_client *clp,
	/* Check NFS protocol revision and initialize RPC op vector */
	clp->rpc_ops = &nfs_v4_clientops;
  
 -	error = nfs_create_rpc_client(clp, proto, timeo, retrans, authflavour);
 +	error = nfs_create_rpc_client(clp, proto, timeo, retrans, authflavour,
 +					RPC_CLNT_CREATE_DISCRTRY);
	if (error < 0)
		goto error;
	memcpy(clp->cl_ipaddr, ip_addr, sizeof(clp->cl_ipaddr));
 diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
 index a1be89d..c7a78ee 100644
 --- a/include/linux/sunrpc/clnt.h
 +++ b/include/linux/sunrpc/clnt.h
 @@ -40,6 +40,7 @@ struct rpc_clnt {
  
	unsigned int		cl_softrtry : 1,/* soft timeouts */
				cl_intr     : 1,/* interruptible */
 +				cl_discrtry : 1,/* disconnect before retry */
				cl_autobind : 1,/* use getport() */
				cl_oneshot  : 1,/* dispose after use */
				cl_dead     : 1;/* abandoned */
 @@ -111,6 +112,7 @@ struct rpc_create_args {
  #define RPC_CLNT_CREATE_ONESHOT		(1UL << 3)
  #define RPC_CLNT_CREATE_NONPRIVPORT	(1UL << 4)
  #define RPC_CLNT_CREATE_NOPING		(1UL << 5)
 +#define 

Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-17 Thread Andrew Morton
On Tue, 17 Apr 2007 22:14:02 -0700 (PDT) Linus Torvalds <[EMAIL PROTECTED]> wrote:

> 
> 
> On Tue, 17 Apr 2007, Florin Iucha wrote:
> > 
> > Already did.  Traces from vanilla kernel at
> >http://iucha.net/nfs/21-rc7/big-copy
> 
> Well, there's a pdflush in io_schedule_timeout/congestion_wait, and 
> there's a nfsv4-scv in svc_recv/nfs_callback_sv, and a lot of processes 
> either just in schedule_timeout or similar "normal" waiting (pollwait 
> etc).
> 
> [ The call traces could be prettier, but sadly, even if you enable frame 
>   pointers, the x86-64 kernel is too stupid to follow them. So you kind of 
>   just have to ignore the noise) ]
> 
> The triggering process looks like it might be that "cp", it is in the 
> __wait_on_bit/sync_page/wait_on_page_bit/wait_on_page_writeback_range/ 
> filemap_fdatawait.
> 
> Is this a trace from the "big copy" hang, or from a gnome splashscreen 
> hang? It *looks* like it's a big copy. Yes/no?
> 
> Anyway, looks like the cp did a "utimes()" system call, which triggers 
> "nfs_setattr(), which in turn triggers the filemap_fdatawait() and some 
> kind of endless wait. Nothing stands out from the traces, in other words. 
> Doesn't look like a locking thing, for example - we're not stuck on some 
> inode semaphore, we're literally waiting for the page to be written out.
> 

That's the unpatched kernel.  I think the trace with Trond's patchset
is http://iucha.net/nfs/21-rc7-nfs1/big-copy

[  243.594565] cpS 0038b55df454 0  2547   2526 (NOTLB)
[  243.594570]  810078339988 0086  
000187a9
[  243.594574]  810078339908 80162c03 000a 
81007f612fe0
[  243.594579]  8054c3e0 063f 81007f6131b8 
0292
[  243.594583] Call Trace:
[  243.594587]  [] _spin_lock_irqsave+0x11/0x18
[  243.594591]  [] __mod_timer+0xa9/0xbb
[  243.594595]  [] schedule_timeout+0x8d/0xb4
[  243.594599]  [] process_timeout+0x0/0xb
[  243.594603]  [] io_schedule_timeout+0x28/0x33
[  243.594607]  [] congestion_wait_interruptible+0x86/0xa3
[  243.594611]  [] autoremove_wake_function+0x0/0x38
[  243.594615]  [] rpc_save_sigmask+0x2f/0x31
[  243.594619]  [] nfs_writepage_setup+0xc8/0x482
[  243.594624]  [] nfs_updatepage+0x101/0x143
[  243.594628]  [] nfs_commit_write+0x2e/0x41
[  243.594633]  [] generic_file_buffered_write+0x530/0x773
[  243.594639]  [] copy_user_generic_string+0x17/0x40
[  243.594643]  [] file_read_actor+0xaa/0x130
[  243.594648]  [] __generic_file_aio_write_nolock+0x38b/0x3fe
[  243.594652]  [] debug_mutex_free_waiter+0x5b/0x5f
[  243.594657]  [] generic_file_aio_write+0x64/0xc0
[  243.594662]  [] nfs_file_write+0xee/0x15a
[  243.594666]  [] do_sync_write+0xe2/0x126
[  243.594671]  [] autoremove_wake_function+0x0/0x38
[  243.594676]  [] _spin_unlock+0x9/0xb
[  243.594680]  [] vfs_write+0xae/0x137
[  243.594684]  [] sys_write+0x47/0x70
[  243.594688]  [] system_call+0x7e/0x83

At a guess, bdi_write_congested() is failing to return false.
(nfs_update_request() got inlined in nfs_writepage_setup(), even though
it was defined afterwards?  gcc got smarter)

We have a stuck pdflush, presumably waiting for dirty+writeback memory to 
subside:

[  243.594761] pdflush   D 0038b5643b16 0  2552 11 (L-TLB)
[  243.594766]  81000c617d70 0046  
000187a9
[  243.594770]  81000c617cf0 80162c03 000a 
81007e5ff510
[  243.594775]  810002f4a080 0ae5 81007e5ff6e8 
00010282
[  243.594779] Call Trace:
[  243.594783]  [] _spin_lock_irqsave+0x11/0x18
[  243.594787]  [] __mod_timer+0xa9/0xbb
[  243.594792]  [] keventd_create_kthread+0x0/0x79
[  243.594795]  [] schedule_timeout+0x8d/0xb4
[  243.594799]  [] process_timeout+0x0/0xb
[  243.594803]  [] io_schedule_timeout+0x28/0x33
[  243.594806]  [] congestion_wait+0x6b/0x87
[  243.594810]  [] autoremove_wake_function+0x0/0x38
[  243.594814]  [] writeback_inodes+0xe1/0xea
[  243.594818]  [] pdflush+0x0/0x1e3
[  243.594821]  [] wb_kupdate+0xbb/0x113
[  243.594825]  [] pdflush+0x0/0x1e3
[  243.594828]  [] pdflush+0x138/0x1e3
[  243.594831]  [] wb_kupdate+0x0/0x113
[  243.594835]  [] kthread+0xd8/0x10b
[  243.594839]  [] schedule_tail+0x45/0xa5
[  243.594843]  [] child_rip+0xa/0x12
[  243.594847]  [] keventd_create_kthread+0x0/0x79
[  243.594850]  [] worker_thread+0x0/0x14b
[  243.594854]  [] kthread+0x0/0x10b
[  243.594858]  [] child_rip+0x0/0x12

at a guess I'd say we have a ton of memory under writeback, but the completions
aren't coming back.

Florin, can we please see /proc/meminfo as well?

Also the result of `echo m > /proc/sysrq-trigger'
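
[Editorial note: the writeback backlog Andrew is asking about can also be
sampled directly from /proc/meminfo. These are standard Linux fields,
though NFS_Unstable may be absent on kernels without NFS:]

```shell
# Snapshot the dirty/writeback/unstable page counters.  Re-running this
# every few seconds while the copy stalls shows whether pages pile up
# without write completions coming back.
grep -E '^(Dirty|Writeback|NFS_Unstable):' /proc/meminfo
```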

Thanks.


Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-17 Thread Florin Iucha
On Tue, Apr 17, 2007 at 10:14:02PM -0700, Linus Torvalds wrote:
> On Tue, 17 Apr 2007, Florin Iucha wrote:
> > 
> > Already did.  Traces from vanilla kernel at
> >http://iucha.net/nfs/21-rc7/big-copy
> 
> Well, there's a pdflush in io_schedule_timeout/congestion_wait, and 
> there's a nfsv4-scv in svc_recv/nfs_callback_sv, and a lot of processes 
> either just in schedule_timeout or similar "normal" waiting (pollwait 
> etc).
> 
> [ The call traces could be prettier, but sadly, even if you enable frame 
>   pointers, the x86-64 kernel is too stupid to follow them. So you kind of 
>   just have to ignore the noise) ]
> 
> The triggering process looks like it might be that "cp", it is in the 
> __wait_on_bit/sync_page/wait_on_page_bit/wait_on_page_writeback_range/ 
> filemap_fdatawait.
> 
> Is this a trace from the "big copy" hang, or from a gnome splashscreen 
> hang? It *looks* like it's a big copy. Yes/no?

It *is* big copy.

I am monitoring the copy on the server, using "iostat 5".  It writes
for a while at a fairly constant pace (on ext3 *), then it drops to 0 in 5-10
seconds and stays there...  Once I left it for a couple of hours and
it did not pick up.

Regards,
florin

(*) XFS exhibited wild ups and downs in the transaction rate, and JFS
starts strong then loses steam and slowly settles to about half ext3's
rate.

-- 
Bruce Schneier expects the Spanish Inquisition.
  http://geekz.co.uk/schneierfacts/fact/163




Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-17 Thread Linus Torvalds


On Tue, 17 Apr 2007, Florin Iucha wrote:
> 
> Already did.  Traces from vanilla kernel at
>http://iucha.net/nfs/21-rc7/big-copy

Well, there's a pdflush in io_schedule_timeout/congestion_wait, and 
there's a nfsv4-scv in svc_recv/nfs_callback_sv, and a lot of processes 
either just in schedule_timeout or similar "normal" waiting (pollwait 
etc).

[ The call traces could be prettier, but sadly, even if you enable frame 
  pointers, the x86-64 kernel is too stupid to follow them. So you kind of 
  just have to ignore the noise. ]

The triggering process looks like it might be that "cp", it is in the 
__wait_on_bit/sync_page/wait_on_page_bit/wait_on_page_writeback_range/ 
filemap_fdatawait.

Is this a trace from the "big copy" hang, or from a gnome splashscreen 
hang? It *looks* like it's a big copy. Yes/no?

Anyway, looks like the cp did a "utimes()" system call, which triggers 
"nfs_setattr()", which in turn triggers the filemap_fdatawait() and some 
kind of endless wait. Nothing stands out from the traces, in other words. 
Doesn't look like a locking thing, for example - we're not stuck on some 
inode semaphore, we're literally waiting for the page to be written out.

Linus


Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-17 Thread Florin Iucha
On Tue, Apr 17, 2007 at 09:13:50PM -0700, Andrew Morton wrote:
> On Tue, 17 Apr 2007 23:07:30 -0500 [EMAIL PROTECTED] (Florin Iucha) wrote:
> > > > The process traces are at:
> > > > 
> > > >http://iucha.net/nfs/21-rc7-nfs1/gnome-session
> > > >http://iucha.net/nfs/21-rc7-nfs1/big-copy
> 
> please, do `echo t > /proc/sysrq-trigger' first, send us the result.

Already did.  Traces from vanilla kernel at
   http://iucha.net/nfs/21-rc7/big-copy

Regards,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
  http://geekz.co.uk/schneierfacts/fact/163




Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-17 Thread Andrew Morton
On Tue, 17 Apr 2007 23:07:30 -0500 [EMAIL PROTECTED] (Florin Iucha) wrote:

> On Tue, Apr 17, 2007 at 11:54:45PM -0400, Trond Myklebust wrote:
> > > The good news is that the Gnome session log-in progresses to the point
> > > where both top and bottom bars are painted (gray) and the bottom bar
> > > is populated with icons (2.6.21-rc7 vanilla stops after displaying the
> > > splash).  The bad news is that it stops there.
> > > 
> > > Big-copy fails as well, after 2.5G transferred.
> > > 
> > > The process traces are at:
> > > 
> > >http://iucha.net/nfs/21-rc7-nfs1/gnome-session
> > >http://iucha.net/nfs/21-rc7-nfs1/big-copy
> > > 
> > > Regards,
> > > florin
> > 
> > Could you tell us a bit more about what happens when these hangs occur?
> > Does the networking stop too, or just NFS? How about CIFS?
> 
> The networking does not stop, I can ssh into and out of the box
> without any problem.
> 
> When 'gnome-session' hangs, it does not react to any clicks on the
> terminal or browser icons.  If I switch to a virtual console then come
> back into X, the panels (or splash) are gray - the icons disappear. I
> can switch to a virtual console and give it the three finger salute and
> it reboots cleanly.
> 
> When 'big-copy' hangs, if I switch to a different console and run
> 'lsof', '[u]mount', or use shell completion on a network mount then that
> process goes into D state.  I cannot umount the network shares nor
> stop autofs.  I cannot do a clean reboot, I have to ssh
> in and "echo s > /proc/sysrq-trigger; echo u > /proc/sysrq-trigger;
> echo b > /proc/sysrq-trigger" .

please, do `echo t > /proc/sysrq-trigger' first, send us the result.



Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-17 Thread Florin Iucha
On Tue, Apr 17, 2007 at 11:54:45PM -0400, Trond Myklebust wrote:
> > The good news is that the Gnome session log-in progresses to the point
> > where both top and bottom bars are painted (gray) and the bottom bar
> > is populated with icons (2.6.21-rc7 vanilla stops after displaying the
> > splash).  The bad news is that it stops there.
> > 
> > Big-copy fails as well, after 2.5G transferred.
> > 
> > The process traces are at:
> > 
> >http://iucha.net/nfs/21-rc7-nfs1/gnome-session
> >http://iucha.net/nfs/21-rc7-nfs1/big-copy
> > 
> > Regards,
> > florin
> 
> Could you tell us a bit more about what happens when these hangs occur?
> Does the networking stop too, or just NFS? How about CIFS?

The networking does not stop, I can ssh into and out of the box
without any problem.

When 'gnome-session' hangs, it does not react to any clicks on the
terminal or browser icons.  If I switch to a virtual console then come
back into X, the panels (or splash) are gray - the icons disappear. I
can switch to a virtual console and give it the three finger salute and
it reboots cleanly.

When 'big-copy' hangs, if I switch to a different console and run
'lsof', '[u]mount', or use shell completion on a network mount then that
process goes into D state.  I cannot umount the network shares nor
stop autofs.  I cannot do a clean reboot, I have to ssh
in and "echo s > /proc/sysrq-trigger; echo u > /proc/sysrq-trigger;
echo b > /proc/sysrq-trigger" .

I am not mounting anything using CIFS, but I could give it a try.

I could transfer 75 GB without hiccup with 2.6.19 using NFS4 and CIFS,
and with 2.6.20 using CIFS.  2.6.20 works fine under reasonably light
load, with gnome sessions logging in and out several times a day.

Regards,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
  http://geekz.co.uk/schneierfacts/fact/163




Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-17 Thread Trond Myklebust
On Tue, 2007-04-17 at 22:30 -0500, Florin Iucha wrote:
> On Tue, Apr 17, 2007 at 11:06:05PM -0400, Trond Myklebust wrote:
> > > > I've split the issues introduced by the 2.6.21-rcX write code up into 4
> > > > subproblems.
> > > > 
> > > > The first patch is just a cleanup in order to ease review.
> > > > 
> > > > Patch number 2 ensures that we never release the PG_writeback flag until
> > > > _after_ we've either discarded the unstable request altogether, or put 
> > > > it
> > > > on the nfs_inode's commit or dirty lists.
> > > > 
> > > > Patch number 3 fixes the 'desynchronized value of nfs_i.ncommit' error. 
> > > > It
> > > > uses the PG_NEED_COMMIT flag as an indicator for whether or not the 
> > > > request
> > > > may be redirtied.
> > > > 
> > > > Patch number 4 protects the NFS '.set_page_dirty' address_space 
> > > > operation
> > > > against races with nfs_inode_add_request.
> > > 
> > > For 2.6.21, yes?
> > 
> > Right. A couple of nasty regressions have been sighted. This series
> > attempts to deal with them all.
> 
> The good news is that the Gnome session log-in progresses to the point
> where both top and bottom bars are painted (gray) and the bottom bar
> is populated with icons (2.6.21-rc7 vanilla stops after displaying the
> splash).  The bad news is that it stops there.
> 
> Big-copy fails as well, after 2.5G transferred.
> 
> The process traces are at:
> 
>http://iucha.net/nfs/21-rc7-nfs1/gnome-session
>http://iucha.net/nfs/21-rc7-nfs1/big-copy
> 
> Regards,
> florin

Could you tell us a bit more about what happens when these hangs occur?
Does the networking stop too, or just NFS? How about CIFS?

Cheers
  Trond


Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-17 Thread Florin Iucha
On Tue, Apr 17, 2007 at 11:06:05PM -0400, Trond Myklebust wrote:
> > > I've split the issues introduced by the 2.6.21-rcX write code up into 4
> > > subproblems.
> > > 
> > > The first patch is just a cleanup in order to ease review.
> > > 
> > > Patch number 2 ensures that we never release the PG_writeback flag until
> > > _after_ we've either discarded the unstable request altogether, or put it
> > > on the nfs_inode's commit or dirty lists.
> > > 
> > > Patch number 3 fixes the 'desynchronized value of nfs_i.ncommit' error. It
> > > uses the PG_NEED_COMMIT flag as an indicator for whether or not the 
> > > request
> > > may be redirtied.
> > > 
> > > Patch number 4 protects the NFS '.set_page_dirty' address_space operation
> > > against races with nfs_inode_add_request.
> > 
> > For 2.6.21, yes?
> 
> Right. A couple of nasty regressions have been sighted. This series
> attempts to deal with them all.

The good news is that the Gnome session log-in progresses to the point
where both top and bottom bars are painted (gray) and the bottom bar
is populated with icons (2.6.21-rc7 vanilla stops after displaying the
splash).  The bad news is that it stops there.

Big-copy fails as well, after 2.5G transferred.

The process traces are at:

   http://iucha.net/nfs/21-rc7-nfs1/gnome-session
   http://iucha.net/nfs/21-rc7-nfs1/big-copy

Regards,
florin

-- 
Bruce Schneier expects the Spanish Inquisition.
  http://geekz.co.uk/schneierfacts/fact/163




Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-17 Thread Trond Myklebust
On Tue, 2007-04-17 at 19:58 -0700, Andrew Morton wrote:
> On Tue, 17 Apr 2007 21:19:46 -0400 Trond Myklebust <[EMAIL PROTECTED]> wrote:
> 
> > 
> > I've split the issues introduced by the 2.6.21-rcX write code up into 4
> > subproblems.
> > 
> > The first patch is just a cleanup in order to ease review.
> > 
> > Patch number 2 ensures that we never release the PG_writeback flag until
> > _after_ we've either discarded the unstable request altogether, or put it
> > on the nfs_inode's commit or dirty lists.
> > 
> > Patch number 3 fixes the 'desynchronized value of nfs_i.ncommit' error. It
> > uses the PG_NEED_COMMIT flag as an indicator for whether or not the request
> > may be redirtied.
> > 
> > Patch number 4 protects the NFS '.set_page_dirty' address_space operation
> > against races with nfs_inode_add_request.
> 
> For 2.6.21, yes?

Right. A couple of nasty regressions have been sighted. This series
attempts to deal with them all.

> I diligently tried to review this code but alas, it seems that my NFS
> knowledge remains not up to the task.  Please avoid buses.
> 
> At least it compiles with all the configs I could think of ;)

I was mainly interested in feedback from Peter, Florin and Ogawa-san to
find out if this series fixes their problems. You were unfortunate
enough to have been on earlier Ccs, so I didn't dare trim you off. :-)

Cheers
  Trond


Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-17 Thread Andrew Morton
On Tue, 17 Apr 2007 21:19:46 -0400 Trond Myklebust <[EMAIL PROTECTED]> wrote:

> 
> I've split the issues introduced by the 2.6.21-rcX write code up into 4
> subproblems.
> 
> The first patch is just a cleanup in order to ease review.
> 
> Patch number 2 ensures that we never release the PG_writeback flag until
> _after_ we've either discarded the unstable request altogether, or put it
> on the nfs_inode's commit or dirty lists.
> 
> Patch number 3 fixes the 'desynchronized value of nfs_i.ncommit' error. It
> uses the PG_NEED_COMMIT flag as an indicator for whether or not the request
> may be redirtied.
> 
> Patch number 4 protects the NFS '.set_page_dirty' address_space operation
> against races with nfs_inode_add_request.

For 2.6.21, yes?

I diligently tried to review this code but alas, it seems that my NFS
knowledge remains not up to the task.  Please avoid buses.

At least it compiles with all the configs I could think of ;)


[PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-17 Thread Trond Myklebust
I've split the issues introduced by the 2.6.21-rcX write code up into 4
subproblems.

The first patch is just a cleanup in order to ease review.

Patch number 2 ensures that we never release the PG_writeback flag until
_after_ we've either discarded the unstable request altogether, or put it
on the nfs_inode's commit or dirty lists.

Patch number 3 fixes the 'desynchronized value of nfs_i.ncommit' error. It
uses the PG_NEED_COMMIT flag as an indicator for whether or not the request
may be redirtied.

Patch number 4 protects the NFS '.set_page_dirty' address_space operation
against races with nfs_inode_add_request.

Cheers
  Trond




Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-17 Thread Andrew Morton
On Tue, 17 Apr 2007 22:14:02 -0700 (PDT) Linus Torvalds <[EMAIL PROTECTED]> wrote:

> 
> 
> On Tue, 17 Apr 2007, Florin Iucha wrote:
> > 
> > Already did.  Traces from vanilla kernel at
> >http://iucha.net/nfs/21-rc7/big-copy
> 
> Well, there's a pdflush in io_schedule_timeout/congestion_wait, and 
> there's a nfsv4-scv in svc_recv/nfs_callback_sv, and a lot of processes 
> either just in schedule_timeout or similar "normal" waiting (pollwait 
> etc).
> 
> [ The call traces could be prettier, but sadly, even if you enable frame 
>   pointers, the x86-64 kernel is too stupid to follow them. So you kind of 
>   just have to ignore the noise. ]
> 
> The triggering process looks like it might be that "cp", it is in the 
> __wait_on_bit/sync_page/wait_on_page_bit/wait_on_page_writeback_range/ 
> filemap_fdatawait.
> 
> Is this a trace from the "big copy" hang, or from a gnome splashscreen 
> hang? It *looks* like it's a big copy. Yes/no?
> 
> Anyway, looks like the cp did a "utimes()" system call, which triggers 
> "nfs_setattr()", which in turn triggers the filemap_fdatawait() and some 
> kind of endless wait. Nothing stands out from the traces, in other words. 
> Doesn't look like a locking thing, for example - we're not stuck on some 
> inode semaphore, we're literally waiting for the page to be written out.

That's the unpatched kernel.  I think the trace with Trond's patchset
is http://iucha.net/nfs/21-rc7-nfs1/big-copy

[  243.594565] cp            S 0038b55df454 0  2547   2526 (NOTLB)
[  243.594570]  810078339988 0086  
000187a9
[  243.594574]  810078339908 80162c03 000a 
81007f612fe0
[  243.594579]  8054c3e0 063f 81007f6131b8 
0292
[  243.594583] Call Trace:
[  243.594587]  [80162c03] _spin_lock_irqsave+0x11/0x18
[  243.594591]  [8011c0c5] __mod_timer+0xa9/0xbb
[  243.594595]  [80161283] schedule_timeout+0x8d/0xb4
[  243.594599]  [8018b8ac] process_timeout+0x0/0xb
[  243.594603]  [80160bcf] io_schedule_timeout+0x28/0x33
[  243.594607]  [801a9400] congestion_wait_interruptible+0x86/0xa3
[  243.594611]  [8019489f] autoremove_wake_function+0x0/0x38
[  243.594615]  [80418952] rpc_save_sigmask+0x2f/0x31
[  243.594619]  [80236cfa] nfs_writepage_setup+0xc8/0x482
[  243.594624]  [80237507] nfs_updatepage+0x101/0x143
[  243.594628]  [8022da30] nfs_commit_write+0x2e/0x41
[  243.594633]  [8010fb96] generic_file_buffered_write+0x530/0x773
[  243.594639]  [8015fae7] copy_user_generic_string+0x17/0x40
[  243.594643]  [8010cb51] file_read_actor+0xaa/0x130
[  243.594648]  [80115ee8] __generic_file_aio_write_nolock+0x38b/0x3fe
[  243.594652]  [8013824c] debug_mutex_free_waiter+0x5b/0x5f
[  243.594657]  [80120d73] generic_file_aio_write+0x64/0xc0
[  243.594662]  [8022e128] nfs_file_write+0xee/0x15a
[  243.594666]  [801175c8] do_sync_write+0xe2/0x126
[  243.594671]  [8019489f] autoremove_wake_function+0x0/0x38
[  243.594676]  [80162ab5] _spin_unlock+0x9/0xb
[  243.594680]  [80116294] vfs_write+0xae/0x137
[  243.594684]  [80116bea] sys_write+0x47/0x70
[  243.594688]  [8015ad5e] system_call+0x7e/0x83

At a guess, bdi_write_congested() is failing to return false.
(nfs_update_request() got inlined in nfs_writepage_setup(), even though
it was defined afterwards?  gcc got smarter.)

We have a stuck pdflush, presumably waiting for dirty+writeback memory to 
subside:

[  243.594761] pdflush   D 0038b5643b16 0  2552 11 (L-TLB)
[  243.594766]  81000c617d70 0046  
000187a9
[  243.594770]  81000c617cf0 80162c03 000a 
81007e5ff510
[  243.594775]  810002f4a080 0ae5 81007e5ff6e8 
00010282
[  243.594779] Call Trace:
[  243.594783]  [80162c03] _spin_lock_irqsave+0x11/0x18
[  243.594787]  [8011c0c5] __mod_timer+0xa9/0xbb
[  243.594792]  [801946f2] keventd_create_kthread+0x0/0x79
[  243.594795]  [80161283] schedule_timeout+0x8d/0xb4
[  243.594799]  [8018b8ac] process_timeout+0x0/0xb
[  243.594803]  [80160bcf] io_schedule_timeout+0x28/0x33
[  243.594806]  [801a935e] congestion_wait+0x6b/0x87
[  243.594810]  [8019489f] autoremove_wake_function+0x0/0x38
[  243.594814]  [8014da11] writeback_inodes+0xe1/0xea
[  243.594818]  [80153381] pdflush+0x0/0x1e3
[  243.594821]  [801a732c] wb_kupdate+0xbb/0x113
[  243.594825]  [80153381] pdflush+0x0/0x1e3
[  243.594828]  [801534b9] pdflush+0x138/0x1e3
[  243.594831]  [801a7271] wb_kupdate+0x0/0x113
[  243.594835]  [801315d6] kthread+0xd8/0x10b
[  243.594839]  [801270d2] schedule_tail+0x45/0xa5
[  243.594843]  [8015bb78] child_rip+0xa/0x12
[  243.594847]  [801946f2]