Re: NFS-RDMA hangs: connection closed (-103)

2010-12-09 Thread Tom Tucker

On 12/8/10 9:10 AM, Spelic wrote:
Tom, have you reproduced the RDMA hangs / connection-closed bug, or the 
sparse-file-at-server-side bug that appears when NFS hits ENOSPC?


For the latter, people have already given an exhaustive explanation: 
see this other thread at 
http://fossplanet.com/f13/%5Blinux-lvm%5D-bugs-mkfs-xfs-device-mapper-xfs-dev-ram-81653/ 



The former bug, however, is still open and very interesting to us.

I'm working on the 'former' bug. The bug that I think you've run into 
concerns how RDMA transport errors are handled and how RPCs are retried 
in the event of an error. With hard mounts (which I suspect you have), 
an RPC is retried forever. With this bug, the transport never 
'recovers' after the error, so the RPC never succeeds and the 
mount is effectively hung.


Bugs were fixed in this area between 2.6.34 and the top of the tree, 
which is why you now see the less catastrophic, but still broken, behavior.
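
As an illustration only (a workaround for the symptom, not a fix for the 
transport bug), a soft mount makes the retries give up after a while and 
return EIO instead of hanging forever. The options below are standard NFS 
mount options; the address and port are simply the ones from your setup:

# hard (the default): RPCs are retried forever, so a dead transport
# hangs the mount. soft: fail the RPC after 'retrans' major timeouts.
mount -t nfs4 -o rdma,port=20049,soft,timeo=50,retrans=2 \
    10.100.0.220:/ /mnt/nfsram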


Unfortunately I can only work on this part-time, but I'll keep you 
updated on the progress.


Thanks for finding this and helping to debug,
Tom


Thanks for your help
S.


Re: NFS-RDMA hangs: connection closed (-103)

2010-12-08 Thread Spelic
Tom, have you reproduced the RDMA hangs / connection-closed bug, or the 
sparse-file-at-server-side bug that appears when NFS hits ENOSPC?


For the latter, people have already given an exhaustive explanation: 
see this other thread at 
http://fossplanet.com/f13/%5Blinux-lvm%5D-bugs-mkfs-xfs-device-mapper-xfs-dev-ram-81653/ 



The former bug, however, is still open and very interesting to us.

Thanks for your help
S.


Re: NFS-RDMA hangs: connection closed (-103)

2010-12-07 Thread Tom Tucker

Status update...

I have reproduced the bug a number of different ways. It seems to be most 
easily reproduced by simply writing more data than the filesystem has 
space for. I can do this reliably with any FS. I think the XFS bug may 
have tickled this bug somehow.
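
For the record, the reproducer boils down to something like this (paths 
as in Spelic's setup; any filesystem behind the export will do):

# On the client: fill the export until ENOSPC. With the bug present,
# the connection drops with -103 and the subsequent sync never returns.
dd if=/dev/zero of=/mnt/nfsram/zerofile bs=1M
sync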


Tom

Re: NFS-RDMA hangs: connection closed (-103)

2010-12-02 Thread Spelic

On 12/02/2010 12:59 AM, Tom Tucker wrote:

Spelic,

I have seen this problem before, but have not been able to reliably 
reproduce it. When I saw the problem, there were no transport errors 
and it appeared as if the I/O had actually completed, but that the 
waiter was not being awoken. I was not able to reliably reproduce the 
problem and was not able to determine if the problem was a latent bug 
in NFS in general or a bug in the RDMA transport in particular.


I will try your setup here, but I don't have a system like yours, so 
I'll have to settle for a smaller ramdisk. However, I have a few 
questions:


- Does the FS matter? For example, if you use ext[2-4] on the ramdisk, 
can you still reproduce it?

- As I mentioned earlier: NFS v3 vs. NFS v4?
- Ramdisk size, e.g. 2G vs. 14G?

Thanks,
Tom


Hello Tom, thanks for replying

- The FS matters to some extent: as I wrote, with ext4 it's not possible 
to reproduce the bug this way, i.e. immediately and reliably. However, 
ext4 will also hang eventually if you work on it for hours, so I had to 
switch to IPoIB for our real work; see my previous post.


- NFS3 not tried yet. I've never tried to do RDMA on NFS3... do you have a 
pointer to instructions?



- RAMDISK size: I am testing it.

OK, I confirm it's reproducible with a 1.5GB ramdisk.
Boot option: ramdisk_size=1572864
(1.5*1024**2 = 1572864)
Confirm: blockdev --getsize64 /dev/ram0 == 1610612736
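
The arithmetic checks out, keeping in mind that ramdisk_size is given in 
KiB while blockdev reports bytes:

echo $((1572864 * 1024))         # 1610612736 bytes = 1.5 GiB
blockdev --getsize64 /dev/ram0   # should match: 1610612736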

Now on the server side, mkfs and mount with defaults:
mkfs.xfs /dev/ram0
mount /dev/ram0 /mnt/ram
(This is a simplification compared to my previous email, and it's needed 
with a smaller ramdisk, otherwise mkfs.xfs will refuse to work. The bug 
is still reproducible like this.)



DOH! Another bug:
strangely, at the end of the test,
ls -lh /mnt/ram
on the server side shows a zerofile larger than 1.5GB; sometimes it's 
3GB, sometimes 2.3GB... but always larger than the ramdisk size.


# ll -h /mnt/ram
total 1.5G
drwxr-xr-x 2 root root   21 2010-12-02 12:54 ./
drwxr-xr-x 3 root root 4.0K 2010-11-29 23:51 ../
-rw-r--r-- 1 root root 2.3G 2010-12-02 12:59 zerofile
# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1 294G  4.1G  275G   2% /
devtmpfs  7.9G  184K  7.9G   1% /dev
none  7.9G 0  7.9G   0% /dev/shm
none  7.9G  100K  7.9G   1% /var/run
none  7.9G 0  7.9G   0% /var/lock
none  7.9G 0  7.9G   0% /lib/init/rw
/dev/ram0 1.5G  1.5G   20K 100% /mnt/ram

# dd if=/mnt/ram/zerofile | wc -c
4791480+0 records in
4791480+0 records out
2453237760
2453237760 bytes (2.5 GB) copied, 8.41821 s, 291 MB/s

It seems there is also an XFS bug here...
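
A quick check one could add here (standard coreutils): compare the 
file's apparent size with the space actually allocated, to see whether 
the zerofile is merely sparse rather than truly oversized:

stat -c 'apparent=%s bytes, allocated=%b blocks of %B bytes' /mnt/ram/zerofile
du -h --apparent-size /mnt/ram/zerofile   # reported size (2.3G here)
du -h /mnt/ram/zerofile                   # space actually allocated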

This might help trigger the bug; however, please note that ext4 
(with NFS-RDMA over it) also hung on us, and that was real work on HDD 
disks that were not full... after switching to IPoIB it didn't hang anymore.


On IPoIB the size problem also shows up: the final file is 2.3GB instead 
of < 1.5GB, however nothing hangs:


# echo begin; dd if=/dev/zero of=/mnt/nfsram/zerofile bs=1M ; echo 
syncing now ; time sync ; echo finished

begin
dd: writing `/mnt/nfsram/zerofile': Input/output error
2497+0 records in
2496+0 records out
2617245696 bytes (2.6 GB) copied, 10.4 s, 252 MB/s
syncing now

real    0m0.057s
user    0m0.000s
sys     0m0.000s
finished

I think I noticed the same problem with a 14GB ramdisk, where the file 
ended up being about 15GB, but at that time I thought I had made some 
computation mistake. Now, with a smaller ramdisk, it's more obvious.


Sooner or later someone should notify the XFS developers of the size bug.
For now, however, it's a good thing: the size bug might help us fix 
the RDMA bug.


Thanks for your help



Re: NFS-RDMA hangs: connection closed (-103)

2010-12-02 Thread Roland Dreier
Adding Dave Chinner to the cc list, since he's both an XFS guru and
very familiar with NFS and RDMA...

Dave, if you read below, it seems there is some strange behavior
exporting XFS with NFS/RDMA.

 - R.



Re: NFS-RDMA hangs: connection closed (-103)

2010-12-02 Thread Spelic

Hello all,
please be aware that the file-oversize bug is also reproducible 
without InfiniBand, with just NFS over Ethernet over XFS over ramdisk 
(it doesn't hang, though, so it's a different bug from the one I posted 
here on the RDMA mailing list).
I have posted another thread regarding the file-oversize bug, which 
you can read in the LVM, XFS, and LKML mailing lists; please have a look:

http://fossplanet.com/f13/%5Blinux-lvm%5D-bugs-mkfs-xfs-device-mapper-xfs-dev-ram-81653/
In particular my second post, a reply to myself 30 minutes later, explains 
that it's also reproducible with Ethernet.


Thank you


Re: NFS-RDMA hangs: connection closed (-103)

2010-12-01 Thread Tom Tucker

Hi Spelic,

Can you reproduce this with an nfsv3 mount?
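
For reference, a v3 mount over RDMA should look something like the line 
below; untested here, and I'm assuming /mnt/ram is the exported path 
(with v3 you mount the real export rather than the v4 pseudo-root):

mount -t nfs -o vers=3,rdma,port=20049 10.100.0.220:/mnt/ram /mnt/nfsram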

On 12/1/10 5:13 PM, Spelic wrote:

Hello all

First of all: I have tried to send this message to the list at least 3 
times, but it doesn't seem to get through (and I'm given no error back).
It was very long, with 2 attachments... is it because of that? What are 
the limits of this ML?

This time I will shorten it a bit and remove the attachments.

Here is my problem:
I am trying to use NFS over RDMA. It doesn't work: it hangs very soon.
I tried kernel 2.6.32 from Ubuntu 10.04, and then the most recent 
upstream 2.6.37-rc4 compiled from source. They behave basically the same 
regarding the NFS mount itself; the only difference is that 2.6.32 hangs 
the whole operating system when NFS hangs, while 2.6.37-rc4 (after NFS 
hangs) only hangs processes that run sync or list NFS directories. 
Either way the mount is hung forever; it does not recover by itself.

NFS mounts over IPoIB appear to work flawlessly; the problem is with RDMA only.

Hardware: (identical client and server machines)
07:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx 
HCA] (rev 20)

Subsystem: Mellanox Technologies MT25204 [InfiniHost III Lx HCA]
Flags: bus master, fast devsel, latency 0, IRQ 30
Memory at d880 (64-bit, non-prefetchable) [size=1M]
Memory at d800 (64-bit, prefetchable) [size=8M]
Capabilities: [40] Power Management version 2
Capabilities: [48] Vital Product Data ?
Capabilities: [90] Message Signalled Interrupts: Mask- 64bit+ 
Queue=0/5 Enable-

Capabilities: [84] MSI-X: Enable+ Mask- TabSize=32
Capabilities: [60] Express Endpoint, MSI 00
Kernel driver in use: ib_mthca
Kernel modules: ib_mthca

Mainboard = Supermicro X7DWT with embedded infiniband.

This is my test:
on the server I make a big 14GB ramdisk (exact boot option: 
ramdisk_size=14680064), format it as XFS, and mount it like this:

mkfs.xfs -f -l size=128m -d agcount=16 /dev/ram0
mount -o nobarrier,inode64,logbufs=8,logbsize=256k /dev/ram0 
/mnt/ram/
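
(The matching export line is not shown in this mail; my guess at 
/etc/exports would be something like the following, with the client 
subnet assumed. insecure is presumably needed because RDMA port 20049 is 
non-privileged, and fsid=0 because the client mounts the v4 root:)

/mnt/ram  10.100.0.0/24(rw,fsid=0,insecure,no_root_squash)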

On the client I mount like this (fstab):
10.100.0.220:/   /mnt/nfsram   nfs4
_netdev,auto,defaults,rdma,port=20049  0  0


Then on the client I run:
echo begin; dd if=/dev/zero of=/mnt/nfsram/zerofile bs=1M ; echo syncing now ; sync ; echo finished


It hangs as soon as it reaches the end of the 14GB of space, but never 
prints syncing now. The disk-full condition seems to trigger the 
hangup reliably on NFS over RDMA over XFS over ramdisk; other 
combinations (e.g. ext4) are not as reliable for triggering the bug.


However, please note that this is not an XFS problem in itself: we had 
another hangup on an ext4 filesystem over NFS over RDMA, on real disks 
doing real work, after a few hours (and it had not hit the disk-full 
situation); the XFS-on-ramdisk technique is just more reliably 
reproducible.


Note that the hangup does not happen with NFS over IPoIB (no RDMA) over 
XFS over ramdisk. It's really an RDMA-only bug.
On the other machine (2.6.32) that was doing real work on real disks, I 
am now mounting over IPoIB without RDMA, and in fact that one is still 
running reliably.


The dd process hangs like this: (/proc/pid/stack)
[810f8f75] sync_page+0x45/0x60
[810f9143] wait_on_page_bit+0x73/0x80
[810f9590] filemap_fdatawait_range+0x110/0x1a0
[810f9720] filemap_write_and_wait_range+0x70/0x80
[811766ba] vfs_fsync_range+0x5a/0xa0
[8117676c] vfs_fsync+0x1c/0x20
[a02bda1d] nfs_file_write+0xdd/0x1f0 [nfs]
[8114d4fa] do_sync_write+0xda/0x120
[8114d808] vfs_write+0xc8/0x190
[8114e061] sys_write+0x51/0x90
[8100c042] system_call_fastpath+0x16/0x1b
[] 0x

The dd process is not killable with -9; it stays alive and hung.
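
That is the classic uninterruptible-sleep (D state) symptom: SIGKILL 
cannot be delivered while the task waits inside the kernel. A standard 
way to confirm it:

ps -o pid,stat,wchan:32,cmd -C dd    # STAT 'D' = uninterruptible sleep
cat /proc/$(pgrep -nx dd)/stack      # same stack as shown above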

In the dmesg (client) you can see this line immediately, as soon as 
transfer stops (iostat -n 1) and dd hangs up:

 [ 3072.884988] rpcrdma: connection to 10.100.0.220:20049 closed (-103)

after a while you can see this in dmesg
[ 3242.890030] INFO: task dd:2140 blocked for more than 120 seconds.
[ 3242.890132] echo 0 > /proc/sys/kernel/hung_task_timeout_secs 
disables this message.
[ 3242.890239] ddD 88040a8f0398 0  2140   2113 
0x
[ 3242.890243]  88040891fb38 0082 88040891fa98 
88040891fa98
[ 3242.890248]  000139c0 88040a8f 88040a8f0398 
88040891ffd8
[ 3242.890251]  88040a8f03a0 000139c0 88040891e010 
000139c0

[ 3242.890255] Call Trace:
[ 3242.890264]  [81035509] ? default_spin_lock_flags+0x9/0x10
[ 3242.890269]  [810f8f30] ? sync_page+0x0/0x60
[ 3242.890273]  [8157b824] io_schedule+0x44/0x60
[ 3242.890276]  [810f8f75] sync_page+0x45/0x60
[ 3242.890279]  [8157c0bf] __wait_on_bit+0x5f/0x90
[ 3242.890281]  

Re: NFS-RDMA hangs: connection closed (-103)

2010-12-01 Thread Tom Tucker

Spelic,

I have seen this problem before, but have not been able to reliably 
reproduce it. When I saw the problem, there were no transport errors and 
it appeared as if the I/O had actually completed, but that the waiter was 
not being awoken. I was not able to reliably reproduce the problem and was 
not able to determine if the problem was a latent bug in NFS in general or 
a bug in the RDMA transport in particular.


I will try your setup here, but I don't have a system like yours so I'll 
have to settle for a smaller ramdisk, however, I have a few questions:


- Does the FS matter? For example, if you use ext[2-4] on the ramdisk, 
can you still reproduce it?

- As I mentioned earlier: NFS v3 vs. NFS v4?
- Ramdisk size, e.g. 2G vs. 14G?

Thanks,
Tom
