We are seeing something very similar on our Ceph cluster, starting today.

We use a 16-node, 102-OSD Ceph installation as the basis for an Icehouse 
OpenStack cluster (we applied the RBD patches for live migration etc.).

On this cluster we have a big ownCloud installation (Sync & Share) that stores 
its files on three NFS servers, each mounting six 2 TB RBD volumes and exposing 
them to around 10 web server VMs (we originally started with one NFS server 
with a single 100 TB volume, but that has become unwieldy). All of the servers 
(hypervisors, Ceph storage nodes and VMs) run Ubuntu 14.04.
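
For context, each exported volume inside the NFS server VMs looks roughly like 
this (a minimal sketch; the device name, mount point and export network are 
assumptions, not our exact setup):

    # one RBD-backed volume as seen inside an NFS server VM, assuming it is
    # attached by OpenStack and shows up as a virtio disk, e.g. /dev/vdb
    mkfs.xfs /dev/vdb                      # done once, when the volume was created
    mkdir -p /srv/owncloud/vol01
    mount -o noatime /dev/vdb /srv/owncloud/vol01
    echo "/srv/owncloud/vol01 10.0.0.0/24(rw,async,no_subtree_check)" >> /etc/exports
    exportfs -ra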

Yesterday evening we added 23 OSDs to the cluster, bringing it up to 125 OSDs 
(because we had 4 OSDs that were nearing the 90% full mark). The rebalancing 
process ended this morning (after around 12 hours).
The cluster has been clean since then:

    cluster b1f3f4c8-xxxxx
     health HEALTH_OK
     monmap e2: 3 mons at 
{zhdk0009=[yyyy:xxxx::1009]:6789/0,zhdk0013=[yyyy:xxxx::1013]:6789/0,zhdk0025=[yyyy:xxxx::1025]:6789/0},
 election epoch 612, quorum 0,1,2 zhdk0009,zhdk0013,zhdk0025
     osdmap e43476: 125 osds: 125 up, 125 in
      pgmap v18928606: 3336 pgs, 17 pools, 82447 GB data, 22585 kobjects
            266 TB used, 187 TB / 454 TB avail
                3319 active+clean
                  17 active+clean+scrubbing+deep
  client io 8186 kB/s rd, 7747 kB/s wr, 2288 op/s
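
For reference, we spotted the near-full OSDs and followed the rebalance with 
nothing fancier than the standard commands (a sketch, not our actual monitoring 
setup):

    ceph health detail     # listed the OSDs that were nearing the full ratio
    ceph -w                # watched recovery/backfill progress overnight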

At midnight, we run a script that creates an RBD snapshot of all RBD volumes 
that are attached to the NFS servers (for backup purposes). Looking at our 
monitoring, around that time one of the NFS servers became unresponsive and 
took down the complete ownCloud installation (the load on the web servers 
was > 200 and they had lost some of the NFS mounts).
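
The script is essentially just a loop over “rbd snap create” (a minimal sketch 
with hypothetical pool and image names, not our exact code):

    # nightly RBD snapshots of the volumes behind the NFS servers
    POOL=volumes                          # hypothetical pool name
    DATE=$(date +%Y%m%d)
    for IMG in nfs1-vol01 nfs1-vol02 nfs1-vol03; do   # hypothetical image names
        rbd snap create "${POOL}/${IMG}@backup-${DATE}"
    done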

Rebooting the NFS server solved that problem, but the NFS kernel server kept 
crashing all day long, each time after having run for between 10 and 90 minutes.

We initially suspected a corrupt RBD volume, as it seemed that we could trigger 
the kernel crash simply by running “ls -l” on one of the volumes, but 
subsequent “xfs_repair -n” checks on those RBD volumes showed no problems.
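
(For the record, the check was the obvious read-only one, run with the volume 
unmounted; the device and mount point names are the same assumed ones as above:)

    umount /srv/owncloud/vol01
    xfs_repair -n /dev/vdb               # -n: no-modify mode, only report problems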

Suspecting a problem with the RBD kernel modules, we migrated the NFS server 
off its hypervisor and rebooted the hypervisor, but the problem persisted 
(both on the new hypervisor, and on the old one when we migrated it back).

We changed /etc/default/nfs-kernel-server to start 256 nfsd threads (even 
though the defaults had been working fine for over a year).
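
Concretely (assuming the stock Ubuntu packaging, where the nfsd thread count is 
set via RPCNFSDCOUNT):

    # /etc/default/nfs-kernel-server
    RPCNFSDCOUNT=256
    # followed by: service nfs-kernel-server restart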

Only one of our three NFS servers crashes (see below for syslog information); 
the other two have been fine.

May 23 21:44:10 drive-nfs1 kernel: [  165.264648] NFSD: Using 
/var/lib/nfs/v4recovery as the NFSv4 state recovery directory
May 23 21:44:19 drive-nfs1 kernel: [  173.880092] NFSD: starting 90-second 
grace period (net ffffffff81cdab00)
May 23 21:44:23 drive-nfs1 rpc.mountd[1724]: Version 1.2.8 starting
May 23 21:44:28 drive-nfs1 kernel: [  182.917775] ip_tables: (C) 2000-2006 
Netfilter Core Team
May 23 21:44:28 drive-nfs1 kernel: [  182.958465] nf_conntrack version 0.5.0 
(16384 buckets, 65536 max)
May 23 21:44:28 drive-nfs1 kernel: [  183.044091] ip6_tables: (C) 2000-2006 
Netfilter Core Team
May 23 21:45:10 drive-nfs1 CRON[1867]: (root) CMD (command -v debian-sa1 > 
/dev/null && debian-sa1 1 1)
May 23 21:45:17 drive-nfs1 collectd[1872]: python: Plugin loaded but not 
configured.
May 23 21:45:17 drive-nfs1 collectd[1872]: Initialization complete, entering 
read-loop.
May 23 21:47:11 drive-nfs1 kernel: [  346.392283] init: plymouth-upstart-bridge 
main process ended, respawning
May 23 21:51:26 drive-nfs1 kernel: [  600.776177] INFO: task nfsd:1696 blocked 
for more than 120 seconds.
May 23 21:51:26 drive-nfs1 kernel: [  600.778090]       Not tainted 
3.13.0-53-generic #89-Ubuntu
May 23 21:51:26 drive-nfs1 kernel: [  600.779507] "echo 0 > 
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 23 21:51:26 drive-nfs1 kernel: [  600.781504] nfsd            D 
ffff88013fd93180     0  1696      2 0x00000000
May 23 21:51:26 drive-nfs1 kernel: [  600.781508]  ffff8800b2391c50 
0000000000000046 ffff8800b22f9800 ffff8800b2391fd8
May 23 21:51:26 drive-nfs1 kernel: [  600.781511]  0000000000013180 
0000000000013180 ffff8800b22f9800 ffff880035f48240
May 23 21:51:26 drive-nfs1 kernel: [  600.781513]  ffff880035f48244 
ffff8800b22f9800 00000000ffffffff ffff880035f48248
May 23 21:51:26 drive-nfs1 kernel: [  600.781515] Call Trace:
May 23 21:51:26 drive-nfs1 kernel: [  600.781523]  [<ffffffff81727749>] 
schedule_preempt_disabled+0x29/0x70
May 23 21:51:26 drive-nfs1 kernel: [  600.781526]  [<ffffffff817295b5>] 
__mutex_lock_slowpath+0x135/0x1b0
May 23 21:51:26 drive-nfs1 kernel: [  600.781528]  [<ffffffff8172964f>] 
mutex_lock+0x1f/0x2f
May 23 21:51:26 drive-nfs1 kernel: [  600.781557]  [<ffffffffa03b1761>] 
nfsd_lookup_dentry+0xa1/0x490 [nfsd]
May 23 21:51:26 drive-nfs1 kernel: [  600.781568]  [<ffffffffa03b044b>] ? 
fh_verify+0x14b/0x5e0 [nfsd]
May 23 21:51:26 drive-nfs1 kernel: [  600.781591]  [<ffffffffa03b1bb9>] 
nfsd_lookup+0x69/0x130 [nfsd]
May 23 21:51:26 drive-nfs1 kernel: [  600.781613]  [<ffffffffa03be90a>] 
nfsd4_lookup+0x1a/0x20 [nfsd]
May 23 21:51:26 drive-nfs1 kernel: [  600.781628]  [<ffffffffa03c055a>] 
nfsd4_proc_compound+0x56a/0x7d0 [nfsd]
May 23 21:51:26 drive-nfs1 kernel: [  600.781638]  [<ffffffffa03acd3b>] 
nfsd_dispatch+0xbb/0x200 [nfsd]
May 23 21:51:26 drive-nfs1 kernel: [  600.781662]  [<ffffffffa028762d>] 
svc_process_common+0x46d/0x6d0 [sunrpc]
May 23 21:51:26 drive-nfs1 kernel: [  600.781678]  [<ffffffffa0287997>] 
svc_process+0x107/0x170 [sunrpc]
May 23 21:51:26 drive-nfs1 kernel: [  600.781687]  [<ffffffffa03ac71f>] 
nfsd+0xbf/0x130 [nfsd]
May 23 21:51:26 drive-nfs1 kernel: [  600.781696]  [<ffffffffa03ac660>] ? 
nfsd_destroy+0x80/0x80 [nfsd]
May 23 21:51:26 drive-nfs1 kernel: [  600.781702]  [<ffffffff8108b6b2>] 
kthread+0xd2/0xf0
May 23 21:51:26 drive-nfs1 kernel: [  600.781707]  [<ffffffff8108b5e0>] ? 
kthread_create_on_node+0x1c0/0x1c0
May 23 21:51:26 drive-nfs1 kernel: [  600.781712]  [<ffffffff81733868>] 
ret_from_fork+0x58/0x90
May 23 21:51:26 drive-nfs1 kernel: [  600.781717]  [<ffffffff8108b5e0>] ? 
kthread_create_on_node+0x1c0/0x1c0

Before each crash, we see the disk utilization of one or two random mounted 
RBD volumes go to 100%; there is no pattern as to which of the RBD disks 
starts to act up.
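
We see this with plain iostat inside the NFS VM (device names are whatever the 
hypervisor hands us, vdb/vdc/... in our case):

    iostat -x 2        # the %util column of one or two vdX devices pins at 100%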

We have scoured the log files of the Ceph cluster for any signs of problems but 
came up empty.
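
By “scoured” I mean nothing more sophisticated than checking the cluster state 
and grepping the daemon logs on the mons and OSD nodes (default log locations, 
ad-hoc patterns):

    ceph health detail
    grep -iE 'slow request|error|fail' /var/log/ceph/ceph-osd.*.log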

The NFS server has almost no load (compared to regular usage) as most sync 
clients are either turned off (weekend) or have given up connecting to the 
server. 

There haven't been any configuration changes on the NFS servers prior to the 
problems. The only change was the addition of the 23 OSDs.

We use ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)

Our team is completely out of ideas. We have removed the 100 TB volume from 
the NFS server (we used the downtime to migrate the last of the data off it to 
one of the smaller volumes). The NFS server has been running for 30 minutes 
now (with close to no load), but we don't really expect it to make it until 
tomorrow.

send help
Jens-Christian

-- 
SWITCH
Jens-Christian Fischer, Peta Solutions
Werdstrasse 2, P.O. Box, 8021 Zurich, Switzerland
phone +41 44 268 15 15, direct +41 44 268 15 71
jens-christian.fisc...@switch.ch
http://www.switch.ch

http://www.switch.ch/stories

On 23.05.2015, at 20:38, John-Paul Robinson (Campus) <j...@uab.edu> wrote:

> We've had an NFS gateway serving up RBD images successfully for over a 
> year. Ubuntu 12.04 and ceph 0.73, iirc.
> 
> In the past couple of weeks we have developed a problem where the nfs clients 
> hang while accessing exported rbd containers. 
> 
> We see errors on the server about nfsd hanging for 120sec etc. 
> 
> The nfs server is still able to successfully interact with the images it is 
> serving. We can export non rbd shares from the local file system and nfs 
> clients can use them just fine. 
> 
> There seems to be something weird going on with rbd and nfs kernel modules. 
> 
> Our ceph pool is in a warn state due to an osd rebalance that is continuing 
> slowly. But the fact that we continue to have good rbd image access directly 
> on the server makes me think this is not related. Also, the nfs server is 
> only a client of the pool; it doesn't participate in it.
> 
> Has anyone experienced similar issues?  
> 
> We do have a lot of images attached to the server, but the issue is there 
> even when we map only a few.
> 
> Thanks for any pointers. 
> 
> ~jpr

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
