Re: [ceph-users] hanging nfsd requests on an RBD to NFS gateway
@John-Paul Robinson: I've also experienced nfsd blocking when serving rbd devices (XFS file system). In my scenario I had an rbd device mapped on an OSD host and NFS-exported (lab scenario). Log entries below. Running CentOS 7 with 3.10.0-229.14.1.el7.x86_64. Next step for me is to compile 3.18.22 and test nfs and scst (iscsi / fc).

Oct 22 13:30:01 osdhost01 systemd: Started Session 14 of user root.
Oct 22 13:37:04 osdhost01 kernel: INFO: task nfsd:12672 blocked for more than 120 seconds.
Oct 22 13:37:04 osdhost01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 22 13:37:04 osdhost01 kernel: nfsd D 880627c73680 0 12672 2 0x0080
Oct 22 13:37:04 osdhost01 kernel: 880bda763b08 0046 880be73af1c0 880bda763fd8
Oct 22 13:37:04 osdhost01 kernel: 880bda763fd8 880bda763fd8 880be73af1c0 880627c73f48
Oct 22 13:37:04 osdhost01 kernel: 880c3ff98ae8 0002 811562e0 880bda763b80
Oct 22 13:37:04 osdhost01 kernel: Call Trace:
Oct 22 13:37:04 osdhost01 kernel: [] ? wait_on_page_read+0x60/0x60
Oct 22 13:37:04 osdhost01 kernel: [] io_schedule+0x9d/0x130
Oct 22 13:37:04 osdhost01 kernel: [] sleep_on_page+0xe/0x20
Oct 22 13:37:04 osdhost01 kernel: [] __wait_on_bit+0x60/0x90
Oct 22 13:37:04 osdhost01 kernel: [] wait_on_page_bit+0x86/0xb0
Oct 22 13:37:04 osdhost01 kernel: [] ? autoremove_wake_function+0x40/0x40
Oct 22 13:37:04 osdhost01 kernel: [] filemap_fdatawait_range+0x111/0x1b0
Oct 22 13:37:04 osdhost01 kernel: [] filemap_write_and_wait_range+0x3f/0x70
Oct 22 13:37:04 osdhost01 kernel: [] xfs_file_fsync+0x66/0x1f0 [xfs]
Oct 22 13:37:04 osdhost01 kernel: [] vfs_fsync_range+0x1d/0x30
Oct 22 13:37:04 osdhost01 kernel: [] nfsd_commit+0xb9/0xe0 [nfsd]
Oct 22 13:37:04 osdhost01 kernel: [] nfsd4_commit+0x57/0x60 [nfsd]
Oct 22 13:37:04 osdhost01 kernel: [] nfsd4_proc_compound+0x4d7/0x7f0 [nfsd]
Oct 22 13:37:04 osdhost01 kernel: [] nfsd_dispatch+0xbb/0x200 [nfsd]
Oct 22 13:37:04 osdhost01 kernel: [] svc_process_common+0x453/0x6f0 [sunrpc]
Oct 22 13:37:04 osdhost01 kernel: [] svc_process+0x103/0x170 [sunrpc]
Oct 22 13:37:04 osdhost01 kernel: [] nfsd+0xe7/0x150 [nfsd]
Oct 22 13:37:04 osdhost01 kernel: [] ? nfsd_destroy+0x80/0x80 [nfsd]
Oct 22 13:37:04 osdhost01 kernel: [] kthread+0xcf/0xe0
Oct 22 13:37:04 osdhost01 kernel: [] ? kthread_create_on_node+0x140/0x140
Oct 22 13:37:04 osdhost01 kernel: [] ret_from_fork+0x58/0x90
Oct 22 13:37:04 osdhost01 kernel: [] ? kthread_create_on_node+0x140/0x140
Oct 22 13:37:04 osdhost01 kernel: INFO: task kworker/u50:81:15660 blocked for more than 120 seconds.
Oct 22 13:37:04 osdhost01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 22 13:37:04 osdhost01 kernel: kworker/u50:81 D 880c3fc73680 0 15660 2 0x0080
Oct 22 13:37:04 osdhost01 kernel: Workqueue: writeback bdi_writeback_workfn (flush-252:0)
Oct 22 13:37:04 osdhost01 kernel: 88086deeb738 0046 880beb6796c0 88086deebfd8
Oct 22 13:37:04 osdhost01 kernel: 88086deebfd8 88086deebfd8 880beb6796c0 880c3fc73f48
Oct 22 13:37:04 osdhost01 kernel: 88061aec0fc0 880c1bb2dea0 88061aec0ff0 88061aec0fc0
Oct 22 13:37:04 osdhost01 kernel: Call Trace:
Oct 22 13:37:04 osdhost01 kernel: [] io_schedule+0x9d/0x130
Oct 22 13:37:04 osdhost01 kernel: [] get_request+0x1b5/0x780
Oct 22 13:37:04 osdhost01 kernel: [] ? wake_up_bit+0x30/0x30
Oct 22 13:37:04 osdhost01 kernel: [] blk_queue_bio+0xc6/0x390
Oct 22 13:37:04 osdhost01 kernel: [] generic_make_request+0xe2/0x130
Oct 22 13:37:04 osdhost01 kernel: [] submit_bio+0x71/0x150
Oct 22 13:37:04 osdhost01 kernel: [] xfs_submit_ioend_bio.isra.12+0x33/0x40 [xfs]
Oct 22 13:37:04 osdhost01 kernel: [] xfs_submit_ioend+0xef/0x130 [xfs]
Oct 22 13:37:04 osdhost01 kernel: [] xfs_vm_writepage+0x36a/0x5d0 [xfs]
Oct 22 13:37:04 osdhost01 kernel: [] __writepage+0x13/0x50
Oct 22 13:37:04 osdhost01 kernel: [] write_cache_pages+0x251/0x4d0
Oct 22 13:37:04 osdhost01 kernel: [] ? global_dirtyable_memory+0x70/0x70
Oct 22 13:37:04 osdhost01 kernel: [] generic_writepages+0x4d/0x80
Oct 22 13:37:04 osdhost01 kernel: [] xfs_vm_writepages+0x43/0x50 [xfs]
Oct 22 13:37:04 osdhost01 kernel: [] do_writepages+0x1e/0x40
Oct 22 13:37:04 osdhost01 kernel: [] __writeback_single_inode+0x40/0x220
Oct 22 13:37:04 osdhost01 kernel: [] writeback_sb_inodes+0x25e/0x420
Oct 22 13:37:04 osdhost01 kernel: [] __writeback_inodes_wb+0x9f/0xd0
Oct 22 13:37:04 osdhost01 kernel: [] wb_writeback+0x263/0x2f0
Oct 22 13:37:04 osdhost01 kernel: [] bdi_writeback_workfn+0x1cc/0x460
Oct 22 13:37:04 osdhost01 kernel: [] process_one_work+0x17b/0x470
Oct 22 13:37:04 osdhost01 kernel: [] worker_thread+0x11b/0x400
Oct 22 13:37:04 osdhost01 kernel: [] ? rescuer_thread+0x400/0x400
Oct 22 13:37:04 osdhost01 kernel: [] kthread+0xcf/0xe0
Oct 22 13:37:04 osdhost01 kernel: [] ? kthread_create_on_node+0x140/0x140
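Both traces above show tasks parked in uninterruptible sleep (state D): nfsd waiting in filemap_fdatawait_range for page writeback to finish, and the flusher worker waiting in get_request for a free slot in the rbd device's request queue. A quick way to spot such tasks before the 120-second hung-task warning fires is to scan /proc for D-state threads; a minimal sketch (the helper names are mine, not from this thread):

```python
import os

def parse_stat(stat_line):
    """Extract (comm, state) from a /proc/<pid>/stat line.

    comm may itself contain spaces or ')' (e.g. 'kworker/u50:81'),
    so split on the LAST ')' rather than on whitespace."""
    comm = stat_line[stat_line.index("(") + 1:stat_line.rindex(")")]
    state = stat_line[stat_line.rindex(")") + 2]
    return comm, state

def d_state_tasks(proc="/proc"):
    """Return (pid, comm) for every process in uninterruptible sleep."""
    hung = []
    for pid in os.listdir(proc):
        if not pid.isdigit():
            continue
        try:
            with open(os.path.join(proc, pid, "stat")) as f:
                comm, state = parse_stat(f.read())
        except OSError:
            continue  # process exited while we were scanning
        if state == "D":
            hung.append((int(pid), comm))
    return hung

if __name__ == "__main__":
    for pid, comm in sorted(d_state_tasks()):
        print(pid, comm)
```

A stuck gateway in the state described here would list every nfsd thread plus the writeback worker; a transiently busy disk only shows a task or two that disappears on the next scan.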
Re: [ceph-users] hanging nfsd requests on an RBD to NFS gateway
> On Oct 22, 2015, at 10:19 PM, John-Paul Robinson wrote:
>
> A few clarifications on our experience:
>
> * We have 200+ rbd images mounted on our RBD-NFS gateway. (There's
> nothing easier for a user to understand than "your disk is full".)

Same here, and agreed. It sounds like our situations are similar except for my issue of blocking on an apparently healthy cluster.

> * I'd expect more contention potential with a single shared RBD back
> end, but with many distinct and presumably isolated backend RBD images,
> I've always been surprised that *all* the nfsd tasks hang. This leads me
> to think it's an nfsd issue rather than an rbd issue. (I realize this
> is an rbd list, looking for shared experience. ;) )

It's definitely possible. I've experienced exactly the behavior you're seeing. My guess is that when an nfsd thread blocks and goes dark, affected clients (even if there's only one) retransmit their requests thinking there's a network issue, causing more nfsds to go dark until all the server threads are stuck (that could be hogwash, but it fits the behavior). Or perhaps there are enough individual clients writing to the affected NFS volume that they consume all the available nfsd threads (I'm not sure about your client-to-FS and nfsd thread ratios, but that is plausible in my situation). I think some testing with xfs_freeze and a non-critical NFS server and clients is called for.

I don't think this part is related to Ceph except that it happens to be providing the underlying storage. I'm fairly certain that my problem with an apparently healthy cluster blocking writes is a Ceph problem, but I haven't figured out its source.

> * I haven't seen any difference between reads and writes. Any access to
> any backing RBD store from the NFS client hangs.

All NFS clients are hung, but in my situation it's usually only 1-3 local file systems that stop accepting writes. NFS is completely unresponsive, but local and remote Samba operations on the unaffected file systems are totally happy.

I don't have a solution to the NFS issue, but I've seen it all too often. I wonder whether setting a huge number of threads and/or playing with client retransmit times would help, but I suspect this problem is just intrinsic to Linux NFS servers.

Ryan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
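Ryan's retransmit guess above is easy to put numbers on. A toy model (entirely illustrative and mine, not from the thread; note that a real Linux nfsd keeps a duplicate reply cache that may absorb some retransmits, so treat this as a cartoon of the guess, not a description of the implementation):

```python
def threads_exhausted_after(nthreads, retrans_interval, horizon):
    """Toy model of the retransmit cascade: one client's write to a
    stuck FS blocks a server thread forever; every retrans_interval
    seconds the client, seeing no reply, retransmits, and a fresh
    thread picks up the duplicate and blocks too. Returns the time
    (seconds) at which no free threads remain, or None if the horizon
    is reached first."""
    busy = 0
    t = 0
    while t <= horizon:
        busy += 1  # the original request, then each retransmit
        if busy >= nthreads:
            return t
        t += retrans_interval
    return None

# With 8 nfsd threads and a 60-second client retransmit interval,
# every thread has gone dark within 7 minutes.
print(threads_exhausted_after(8, 60, 3600))  # prints 420
```

The same arithmetic suggests why "setting a huge number of threads" only postpones the exhaustion rather than preventing it: the cascade grows by one thread per retransmit per stuck client, so more threads just buy linearly more time.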
Re: [ceph-users] hanging nfsd requests on an RBD to NFS gateway
A few clarifications on our experience:

* We have 200+ rbd images mounted on our RBD-NFS gateway. (There's nothing easier for a user to understand than "your disk is full".)

* I'd expect more contention potential with a single shared RBD back end, but with many distinct and presumably isolated backend RBD images, I've always been surprised that *all* the nfsd tasks hang. This leads me to think it's an nfsd issue rather than an rbd issue. (I realize this is an rbd list, looking for shared experience. ;) )

* I haven't seen any difference between reads and writes. Any access to any backing RBD store from the NFS client hangs.

~jpr

On 10/22/2015 06:42 PM, Ryan Tokarek wrote:
>> On Oct 22, 2015, at 3:57 PM, John-Paul Robinson wrote:
>>
>> Hi,
>>
>> Has anyone else experienced a problem with RBD-to-NFS gateways blocking
>> nfsd server requests when their ceph cluster has a placement group that
>> is not servicing I/O for some reason, e.g. too few replicas or an osd
>> with slow request warnings?
>
> We have experienced exactly that kind of problem, except that it sometimes
> happens even when ceph health reports "HEALTH_OK". This has been incredibly
> vexing for us.
>
> If the cluster is unhealthy for some reason, then I'd expect your/our
> symptoms, as writes can't be completed.
>
> I'm guessing that you have file systems with barriers turned on. Whichever
> file system has a barrier write stuck on the problem pg will cause any
> other process trying to write anywhere in that FS to block as well. This
> likely means a cascade of nfsd processes will block as they each try to
> service various client writes to that FS. Even though, theoretically, the
> rest of the "disk" (rbd) and other file systems might still be writable,
> the NFS processes will still be in uninterruptible sleep just because of
> that stuck write request (or such is my understanding).
>
> Disabling barriers on the gateway machine might postpone the problem (never
> tried it and don't want to) until you hit your vm.dirty_bytes or
> vm.dirty_ratio thresholds, but it is dangerous as you could much more easily
> lose data. You'd be better off solving the underlying issues when they happen
> (too few replicas available or overloaded osds).
>
> For us, even when the cluster reports itself as healthy, we sometimes have
> this problem. All nfsd processes block. sync blocks. echo 3 >
> /proc/sys/vm/drop_caches blocks. There is a persistent 4-8MB "Dirty" in
> /proc/meminfo. None of the osds log slow requests. Everything seems fine on
> the osds and mons. Neither CPU nor I/O load is extraordinary on the ceph
> nodes, but at least one file system on the gateway machine will stop
> accepting writes.
>
> If we just wait, the situation resolves itself in 10 to 30 minutes. A forced
> reboot of the NFS gateway "solves" the performance problem, but is annoying
> and dangerous (we unmount all of the file systems that can still be
> unmounted, but the stuck ones lead us to a sysrq-b).
>
> This is on Scientific Linux 6.7 systems with elrepo 4.1.10 kernels running
> Ceph Firefly (0.80.10) and XFS file systems exported over NFS and samba.
>
> Ryan
Re: [ceph-users] hanging nfsd requests on an RBD to NFS gateway
> On Oct 22, 2015, at 3:57 PM, John-Paul Robinson wrote:
>
> Hi,
>
> Has anyone else experienced a problem with RBD-to-NFS gateways blocking
> nfsd server requests when their ceph cluster has a placement group that
> is not servicing I/O for some reason, e.g. too few replicas or an osd
> with slow request warnings?

We have experienced exactly that kind of problem, except that it sometimes happens even when ceph health reports "HEALTH_OK". This has been incredibly vexing for us.

If the cluster is unhealthy for some reason, then I'd expect your/our symptoms, as writes can't be completed.

I'm guessing that you have file systems with barriers turned on. Whichever file system has a barrier write stuck on the problem pg will cause any other process trying to write anywhere in that FS to block as well. This likely means a cascade of nfsd processes will block as they each try to service various client writes to that FS. Even though, theoretically, the rest of the "disk" (rbd) and other file systems might still be writable, the NFS processes will still be in uninterruptible sleep just because of that stuck write request (or such is my understanding).

Disabling barriers on the gateway machine might postpone the problem (never tried it and don't want to) until you hit your vm.dirty_bytes or vm.dirty_ratio thresholds, but it is dangerous as you could much more easily lose data. You'd be better off solving the underlying issues when they happen (too few replicas available or overloaded osds).

For us, even when the cluster reports itself as healthy, we sometimes have this problem. All nfsd processes block. sync blocks. echo 3 > /proc/sys/vm/drop_caches blocks. There is a persistent 4-8MB "Dirty" in /proc/meminfo. None of the osds log slow requests. Everything seems fine on the osds and mons. Neither CPU nor I/O load is extraordinary on the ceph nodes, but at least one file system on the gateway machine will stop accepting writes.

If we just wait, the situation resolves itself in 10 to 30 minutes. A forced reboot of the NFS gateway "solves" the performance problem, but is annoying and dangerous (we unmount all of the file systems that can still be unmounted, but the stuck ones lead us to a sysrq-b).

This is on Scientific Linux 6.7 systems with elrepo 4.1.10 kernels running Ceph Firefly (0.80.10) and XFS file systems exported over NFS and samba.

Ryan
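The persistent 4-8MB "Dirty" symptom described above is easy to watch for mechanically. A minimal monitoring sketch (the poll interval and stuck-poll threshold are arbitrary choices of mine, not values from this thread):

```python
import time

def parse_meminfo(text):
    """Parse /proc/meminfo content into {field_name: value_in_kB}."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if rest:
            fields[name.strip()] = int(rest.split()[0])  # values are in kB
    return fields

def watch_dirty(poll_s=30, stuck_polls=10):
    """Warn if Dirty stays nonzero for stuck_polls consecutive polls,
    i.e. writeback appears wedged rather than merely busy."""
    stuck = 0
    while True:
        with open("/proc/meminfo") as f:
            dirty_kb = parse_meminfo(f.read())["Dirty"]
        stuck = stuck + 1 if dirty_kb > 0 else 0
        if stuck >= stuck_polls:
            print(f"Dirty stuck at {dirty_kb} kB for ~{stuck * poll_s}s")
        time.sleep(poll_s)
```

On a healthy gateway Dirty fluctuates and regularly drains to zero; a small value that sits unchanged across many polls, as in Ryan's report, is the signature worth alerting on.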
Re: [ceph-users] hanging nfsd requests on an RBD to NFS gateway
On 10/22/2015 04:03 PM, Wido den Hollander wrote:
> On 10/22/2015 10:57 PM, John-Paul Robinson wrote:
>> Hi,
>>
>> Has anyone else experienced a problem with RBD-to-NFS gateways blocking
>> nfsd server requests when their ceph cluster has a placement group that
>> is not servicing I/O for some reason, e.g. too few replicas or an osd
>> with slow request warnings?
>>
>> We have an RBD-NFS gateway that stops responding to NFS clients
>> (interaction with RBD-backed NFS shares hangs on the NFS client)
>> whenever our ceph cluster has some part of it in an I/O block
>> condition. This issue only affects the ability of the nfsd processes
>> to serve requests to the client. I can look at and access underlying
>> mounted RBD containers without issue, although they appear hung from the
>> NFS client side. The gateway node load numbers spike to a number that
>> reflects the number of nfsd processes, but the system is otherwise
>> untaxed (unlike the case of a normal high OS load; i.e. I can type and
>> run commands with normal responsiveness.)
>
> Well, that is normal I think. Certain objects become unresponsive if a
> PG is not serving I/O.
>
> With a simple 'ls' or 'df -h' you might not be touching those objects,
> so for you it seems like everything is functioning.
>
> The nfsd process however might be hung due to a blocking I/O call. That
> is completely normal and to be expected.

I agree that an nfsd process blocking on a blocked backend I/O request is expected and normal.

> That it hangs the complete NFS server might be just a side-effect of how
> nfsd was written.

Hanging all nfsd processes is the part I find unexpected. I'm just wondering if someone has experience with this or if this is a known nfsd issue.

> It might be that Ganesha works better for you:
> http://blog.widodh.nl/2014/12/nfs-ganesha-with-libcephfs-on-ubuntu-14-04/

Thanks, Ganesha looks very interesting!
Re: [ceph-users] hanging nfsd requests on an RBD to NFS gateway
On 10/22/2015 10:57 PM, John-Paul Robinson wrote:
> Hi,
>
> Has anyone else experienced a problem with RBD-to-NFS gateways blocking
> nfsd server requests when their ceph cluster has a placement group that
> is not servicing I/O for some reason, e.g. too few replicas or an osd
> with slow request warnings?
>
> We have an RBD-NFS gateway that stops responding to NFS clients
> (interaction with RBD-backed NFS shares hangs on the NFS client)
> whenever our ceph cluster has some part of it in an I/O block
> condition. This issue only affects the ability of the nfsd processes
> to serve requests to the client. I can look at and access underlying
> mounted RBD containers without issue, although they appear hung from the
> NFS client side. The gateway node load numbers spike to a number that
> reflects the number of nfsd processes, but the system is otherwise
> untaxed (unlike the case of a normal high OS load; i.e. I can type and
> run commands with normal responsiveness.)

Well, that is normal I think. Certain objects become unresponsive if a PG is not serving I/O.

With a simple 'ls' or 'df -h' you might not be touching those objects, so for you it seems like everything is functioning.

The nfsd process however might be hung due to a blocking I/O call. That is completely normal and to be expected.

That it hangs the complete NFS server might be just a side-effect of how nfsd was written.

It might be that Ganesha works better for you:
http://blog.widodh.nl/2014/12/nfs-ganesha-with-libcephfs-on-ubuntu-14-04/

> The behavior comes across like there is some nfsd global lock that an
> nfsd sets before requesting I/O from a backend device. In the case
> above, the I/O request hangs on one RBD image affected by the I/O block
> caused by the problematic pg or OSD. The nfsd request blocks on the
> ceph I/O and, because it has set a global lock, all other nfsd processes
> are prevented from servicing requests to their clients. The nfsd
> processes are now all in the wait queue, causing the load number on the
> gateway system to spike. Once the Ceph I/O issue is resolved, the nfsd
> I/O request completes and all service returns to normal. The load on
> the gateway drops to normal immediately and all NFS clients can again
> interact with the nfsd processes. Throughout this time, unaffected ceph
> objects remain available to other clients, e.g. OpenStack volumes.
>
> Our RBD-NFS gateway is running Ubuntu 12.04.5 with kernel
> 3.11.0-15-generic. The ceph version installed on this client is 0.72.2,
> though I assume only the kernel-resident RBD module matters.
>
> Any thoughts or pointers appreciated.
>
> ~jpr

--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on