Re: NFS-RDMA hangs: connection closed (-103)
On 12/8/10 9:10 AM, Spelic wrote:
Tom, have you reproduced the RDMA hangs / connection-closed bug, or the sparse file at the server side when NFS hits ENOSPC? For the latter, people have already given an exhaustive explanation: see this other thread at http://fossplanet.com/f13/%5Blinux-lvm%5D-bugs-mkfs-xfs-device-mapper-xfs-dev-ram-81653/ The former bug is still open and very interesting for us. Thanks for your help, S.

I'm working on the 'former' bug. The bug I think you've run into involves how RDMA transport errors are handled and how RPCs are retried in the event of an error. With hard mounts (which I suspect you have), the RPC will be retried forever. In this bug, the transport never 'recovers' after the error, so the RPC never succeeds and the mount is effectively hung. There were bugs fixed in this area between 2.6.34 and the top of the tree, which is why you now see the less catastrophic, but still broken, behavior. Unfortunately I can only work on this part-time, but I'll keep you updated on the progress. Thanks for finding this and helping to debug, Tom

On 12/07/2010 05:12 PM, Tom Tucker wrote:
[snip: status update and earlier quoted thread; see the messages below]
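Tom's point above about hard mounts retrying forever can be made concrete with mount options. This is a hedged sketch, not from the thread: the server address and mount points reuse the thread's examples, and the options are the standard ones from nfs(5).

```shell
# The default is "hard": RPCs are retried forever, so a transport that
# never recovers leaves the mount (and any process stuck in a page
# wait) hung. A "soft" mount gives up after 'retrans' retries of
# 'timeo'/10 seconds each and fails the I/O with an error instead:
mount -t nfs4 -o rdma,port=20049,soft,timeo=100,retrans=3 \
    10.100.0.220:/ /mnt/nfsram
# Caveat: soft mounts can silently lose write data on timeouts, so
# this is a diagnostic aid for the hang, not a fix for the RDMA bug.
```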
Re: NFS-RDMA hangs: connection closed (-103)
Tom, have you reproduced the RDMA hangs / connection-closed bug, or the sparse file at the server side when NFS hits ENOSPC? For the latter, people have already given an exhaustive explanation: see this other thread at http://fossplanet.com/f13/%5Blinux-lvm%5D-bugs-mkfs-xfs-device-mapper-xfs-dev-ram-81653/ The former bug is still open and very interesting for us. Thanks for your help, S.

On 12/07/2010 05:12 PM, Tom Tucker wrote:
[snip: status update and earlier quoted thread; see the messages below]
Re: NFS-RDMA hangs: connection closed (-103)
Status update... I have reproduced the bug a number of different ways. It seems to be most easily reproduced by simply writing more data than the filesystem has space for. I can do this reliably with any FS. I think the XFS bug may have tickled this bug somehow.
Tom

On 12/2/10 1:09 PM, Spelic wrote:
[snip: earlier quoted thread; see the messages below]
Re: NFS-RDMA hangs: connection closed (-103)
On 12/02/2010 12:59 AM, Tom Tucker wrote:
[snip: Tom's questions, quoted in full below]

Hello Tom, thanks for replying.
- The FS matters to some extent: as I wrote, with ext4 it's not possible to reproduce the bug this way, i.e. immediately and reliably; however, ext4 will also hang eventually if you work on it for hours, so I had to switch to IPoIB for our real work; reread my previous post.
- NFSv3: not tried yet. I've never tried RDMA on NFSv3... do you have a pointer to instructions?
- RAMDISK size: I am testing it. OK, I confirm it's reproducible with a 1.5GB ramdisk.
boot option: ramdisk_size=1572864 (1.5*1024**2 = 1572864 KiB)
confirm: blockdev --getsize64 /dev/ram0 == 1610612736
Now, at the server side, mkfs and mount with defaults:
mkfs.xfs /dev/ram0
mount /dev/ram0 /mnt/ram
(This is a simplification over my previous email; it's needed with a smaller ramdisk or mkfs.xfs will refuse to work. The bug is still reproducible like this.)
DOH! Another bug: it's strange how, at the end of the test, ls -lh /mnt/ram at the server side shows a zerofile larger than 1.5GB. Sometimes it's 3GB, sometimes it's 2.3GB... but it's larger than the ramdisk size.
# ll -h /mnt/ram
total 1.5G
drwxr-xr-x 2 root root   21 2010-12-02 12:54 ./
drwxr-xr-x 3 root root 4.0K 2010-11-29 23:51 ../
-rw-r--r-- 1 root root 2.3G 2010-12-02 12:59 zerofile
# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       294G  4.1G  275G   2% /
devtmpfs        7.9G  184K  7.9G   1% /dev
none            7.9G     0  7.9G   0% /dev/shm
none            7.9G  100K  7.9G   1% /var/run
none            7.9G     0  7.9G   0% /var/lock
none            7.9G     0  7.9G   0% /lib/init/rw
/dev/ram0       1.5G  1.5G   20K 100% /mnt/ram
# dd if=/mnt/ram/zerofile | wc -c
4791480+0 records in
4791480+0 records out
2453237760
2453237760 bytes (2.5 GB) copied, 8.41821 s, 291 MB/s
It seems there is also an XFS bug here... This might help trigger the bug; however, please note that ext4 (with NFS-RDMA over it) also hung on us, and that was real work on HDD disks that were not full... after switching to IPoIB it didn't hang anymore.
On IPoIB the size problem also shows up: the final file is 2.3GB instead of 1.5GB, but nothing hangs:
# echo begin; dd if=/dev/zero of=/mnt/nfsram/zerofile bs=1M ; echo syncing now ; time sync ; echo finished
begin
dd: writing `/mnt/nfsram/zerofile': Input/output error
2497+0 records in
2496+0 records out
2617245696 bytes (2.6 GB) copied, 10.4 s, 252 MB/s
syncing now
real	0m0.057s
user	0m0.000s
sys	0m0.000s
finished
I think I noticed the same problem with a 14GB ramdisk (the file ended up being about 15GB), but at the time I thought I had made a computation mistake. Now, with a smaller ramdisk, it's more obvious. Sooner or later someone should notify the XFS developers of the size bug. However, currently it's a good thing: the size bug might help us fix the RDMA bug. Thanks for your help
-- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
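The zerofile that appears larger than the ramdisk is consistent with the sparse-file explanation given in the linked thread: the file's apparent size (i_size, what `ls -l` shows) was extended past the last block the server could actually allocate. A hedged, runnable sketch of the same effect with an ordinary sparse file (the file name and sizes are made up, not from the thread):

```shell
# Write 1 MiB of data starting at a 2 GiB offset. The apparent size
# becomes 2049 MiB while only about 1 MiB of blocks is allocated --
# the same "file bigger than the disk" shape seen in the ll/df output.
f=$(mktemp)
dd if=/dev/zero of="$f" bs=1M count=1 seek=2048 2>/dev/null
apparent=$(stat -c %s "$f")                 # i_size, as shown by ls -l
allocated=$(( $(stat -c %b "$f") * 512 ))   # blocks actually on disk
echo "apparent=$apparent allocated=$allocated"
rm -f "$f"
```

On any filesystem with sparse-file support, `allocated` stays far below `apparent`, which is why `ls -lh` and `df -h` can disagree so badly.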
Re: NFS-RDMA hangs: connection closed (-103)
Adding Dave Chinner to the cc list, since he's both an XFS guru and very familiar with NFS and RDMA... Dave, if you read below, it seems there is some strange behavior exporting XFS with NFS/RDMA. - R.

On 12/02/2010 12:59 AM, Tom Tucker wrote:
[snip: earlier quoted thread; see the messages below]
Re: NFS-RDMA hangs: connection closed (-103)
Hello all, please be aware that the file oversize bug is reproducible also without InfiniBand, with just NFS over ethernet over XFS over ramdisk (but it doesn't hang, so it's a different bug from the one I posted here on the RDMA mailing list). I have posted another thread regarding the file oversize bug, which you can read in the LVM, XFS, and LKML mailing lists; please have a look: http://fossplanet.com/f13/%5Blinux-lvm%5D-bugs-mkfs-xfs-device-mapper-xfs-dev-ram-81653/ Especially my second post, replying to myself at +30 minutes, which explains that it's reproducible also with ethernet. Thank you

On 12/02/2010 07:37 PM, Roland Dreier wrote:
[snip: earlier quoted thread; see the messages below]
Re: NFS-RDMA hangs: connection closed (-103)
Hi Spelic,
Can you reproduce this with an NFSv3 mount?

On 12/1/10 5:13 PM, Spelic wrote:
Hello all. First of all: I have tried to send this message to the list at least 3 times, but it doesn't seem to get through (and I'm given no error back). It was very long, with 2 attachments... is it because of that? What are the limits of this ML? This time I will shorten it a bit and remove the attachments.

Here is my problem: I am trying to use NFS over RDMA. It doesn't work: it hangs very soon. I tried kernel 2.6.32 from Ubuntu 10.04, and then the most recent upstream 2.6.37-rc4 compiled from source. They behave basically the same regarding the NFS mount itself; the only difference is that 2.6.32 will hang the complete operating system when NFS hangs, while 2.6.37-rc4 (after NFS hangs) will only hang processes which launch sync or list NFS directories. Either way the mount is hung forever; it does not recover by itself. IPoIB NFS mounts appear to work flawlessly; the problem is with RDMA only.

Hardware (identical client and server machines):
07:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] (rev 20)
	Subsystem: Mellanox Technologies MT25204 [InfiniHost III Lx HCA]
	Flags: bus master, fast devsel, latency 0, IRQ 30
	Memory at d880 (64-bit, non-prefetchable) [size=1M]
	Memory at d800 (64-bit, prefetchable) [size=8M]
	Capabilities: [40] Power Management version 2
	Capabilities: [48] Vital Product Data ?
	Capabilities: [90] Message Signalled Interrupts: Mask- 64bit+ Queue=0/5 Enable-
	Capabilities: [84] MSI-X: Enable+ Mask- TabSize=32
	Capabilities: [60] Express Endpoint, MSI 00
	Kernel driver in use: ib_mthca
	Kernel modules: ib_mthca
Mainboard = Supermicro X7DWT with embedded InfiniBand.

This is my test: on the server I make a big 14GB ramdisk (exact boot option: ramdisk_size=14680064), format it XFS, and mount like this:
mkfs.xfs -f -l size=128m -d agcount=16 /dev/ram0
mount -o nobarrier,inode64,logbufs=8,logbsize=256k /dev/ram0 /mnt/ram/
On the client I mount like this (fstab):
10.100.0.220:/ /mnt/nfsram nfs4 _netdev,auto,defaults,rdma,port=20049 0 0
Then on the client I run:
echo begin; dd if=/dev/zero of=/mnt/nfsram/zerofile bs=1M ; echo syncing now ; sync ; echo finished
It hangs as soon as it reaches the end of the 14GB of space, but never prints "syncing now". It seems the disk-full condition triggers the hangup reliably on NFS over RDMA over XFS over ramdisk; other combinations are not so reliable for triggering the bug (e.g. ext4). However, please note that this is not an XFS problem in itself: we had another hangup on an ext4 filesystem on NFS on RDMA on real disks, doing real work, after a few hours (and it hadn't hit the disk-full situation); the XFS-on-ramdisk technique is just more reliably reproducible. Note that the hangup does not happen on NFS over IPoIB (no RDMA) over XFS over ramdisk. It's really an RDMA-only bug. On the other machine (2.6.32) that was doing real work on real disks I am now mounting over IPoIB without RDMA, and in fact that one is still running reliably.

The dd process hangs like this (/proc/<pid>/stack):
[810f8f75] sync_page+0x45/0x60
[810f9143] wait_on_page_bit+0x73/0x80
[810f9590] filemap_fdatawait_range+0x110/0x1a0
[810f9720] filemap_write_and_wait_range+0x70/0x80
[811766ba] vfs_fsync_range+0x5a/0xa0
[8117676c] vfs_fsync+0x1c/0x20
[a02bda1d] nfs_file_write+0xdd/0x1f0 [nfs]
[8114d4fa] do_sync_write+0xda/0x120
[8114d808] vfs_write+0xc8/0x190
[8114e061] sys_write+0x51/0x90
[8100c042] system_call_fastpath+0x16/0x1b
[] 0x
The dd process is not killable with -9; it stays alive and hung.

In the client dmesg you can see this line immediately, as soon as the transfer stops (iostat -n 1) and dd hangs:
[ 3072.884988] rpcrdma: connection to 10.100.0.220:20049 closed (-103)
After a while you can see this in dmesg:
[ 3242.890030] INFO: task dd:2140 blocked for more than 120 seconds.
[ 3242.890132] echo 0 > /proc/sys/kernel/hung_task_timeout_secs disables this message.
[ 3242.890239] dd D 88040a8f0398 0 2140 2113 0x
[ 3242.890243] 88040891fb38 0082 88040891fa98 88040891fa98
[ 3242.890248] 000139c0 88040a8f 88040a8f0398 88040891ffd8
[ 3242.890251] 88040a8f03a0 000139c0 88040891e010 000139c0
[ 3242.890255] Call Trace:
[ 3242.890264] [81035509] ? default_spin_lock_flags+0x9/0x10
[ 3242.890269] [810f8f30] ? sync_page+0x0/0x60
[ 3242.890273] [8157b824] io_schedule+0x44/0x60
[ 3242.890276] [810f8f75] sync_page+0x45/0x60
[ 3242.890279] [8157c0bf] __wait_on_bit+0x5f/0x90
[ 3242.890281]
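The trigger in the report above is the out-of-space condition: dd keeps writing until the filesystem is full, and on the broken RDMA mount the final flush hangs instead of returning the error. A hedged local illustration of just the out-of-space half, not from the thread: on Linux, writes to /dev/full always fail with ENOSPC, so dd's behavior at disk-full can be observed without a ramdisk or NFS.

```shell
# /dev/full mimics "filesystem out of space": every write() to it
# fails with ENOSPC, so dd stops with an error and a nonzero exit
# status, much like the Input/output error Spelic saw over IPoIB.
err=$(dd if=/dev/zero of=/dev/full bs=1M count=1 2>&1)
status=$?
echo "dd exit status: $status"
echo "$err"
```

The interesting part of the bug is that the RDMA mount never gets this far: the error path closes the transport and the waiter is never woken.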
Re: NFS-RDMA hangs: connection closed (-103)
Spelic, I have seen this problem before, but have not been able to reliably reproduce it. When I saw the problem, there were no transport errors, and it appeared as if the I/O had actually completed but the waiter was not being awoken. I was not able to reliably reproduce the problem, and was not able to determine whether it was a latent bug in NFS in general or a bug in the RDMA transport in particular. I will try your setup here, but I don't have a system like yours, so I'll have to settle for a smaller ramdisk. However, I have a few questions:
- Does the FS matter? For example, can you use ext[2-4] on the ramdisk and still not reproduce?
- As I mentioned earlier, NFS v3 vs. NFS v4?
- RAMDISK size, i.e. 2G vs. 14G?
Thanks,
Tom

On 12/1/10 5:13 PM, Spelic wrote:
[snip: original bug report, quoted in full above]
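Since Tom asks about NFSv3 vs. NFSv4 and Spelic asks for pointers on running RDMA with NFSv3: a hedged sketch of an NFSv3-over-RDMA mount, following the kernel's nfs-rdma documentation. The server address, export path, and mount point reuse the thread's examples; whether /mnt/ram is actually exported by path on this server is an assumption.

```shell
# Server side: ensure the NFS/RDMA listener is up (20049 is the
# IANA-assigned NFS/RDMA port, the same one used in the v4 fstab line):
echo rdma 20049 > /proc/fs/nfsd/portlist

# Client side: unlike v4 (which mounts the pseudo-root "/"), v3 mounts
# the exported path directly; rdma/port match the v4 options above.
mount -t nfs -o vers=3,rdma,port=20049 10.100.0.220:/mnt/ram /mnt/nfsram
```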