Hi,

Since we switched to NFS (because of many small files), we have been
experiencing serious problems with Gluster's NFS daemon. About once a day, the
Gluster NFS process crashes on one of the machines and does not come back up
until I restart the Gluster daemon on that node. Sometimes the crashed node
even crashes again after the restart.

We have a ~2TB volume with 6 bricks on 5 servers, accessed by 12 NFS clients 
and one FUSE client.

The NFS log contains something like the following:

tail -n 100 /var/log/glusterfs/nfs.log
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
[...]
frame : type(0) op(0)

signal received: 11
time of crash: 2013-08-15 14:08:39
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0
/lib/x86_64-linux-gnu/libc.so.6(+0x364c0)[0x7fac361904c0]
/lib/x86_64-linux-gnu/libpthread.so.0(pthread_spin_lock+0x0)[0x7fac36523a50]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(fd_unref+0x36)[0x7fac36b96966]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.0/xlator/protocol/client.so(client_local_wipe+0x28)[0x7fac31f6a4f8]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.0/xlator/protocol/client.so(client3_3_opendir_cbk+0x19c)[0x7fac31f8353c]
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)[0x7fac36957bd5]
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0xc5)[0x7fac36957f35]
/usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_transport_notify+0x27)[0x7fac36954627]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.0/rpc-transport/socket.so(+0xa1d1)[0x7fac32e091d1]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.0/rpc-transport/socket.so(+0xa81c)[0x7fac32e0981c]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(+0x5e553)[0x7fac36bbd553]
/usr/sbin/glusterfs(main+0x3e3)[0x7fac37007883]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7fac3617b76d]
/usr/sbin/glusterfs(+0x5c79)[0x7fac37007c79]
---------

Is there anything we can do to prevent this, or at least to find the cause?
At the moment we use an ugly workaround: a cron job checks the NFS status and
restarts the server if necessary. That is not something we consider suitable
for larger deployments.
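
For reference, the cron workaround looks roughly like the sketch below. This is a minimal illustration, not our exact script. It assumes the Gluster NFS server listens on TCP port 2049 on localhost and that the daemon is restarted with a Debian/Ubuntu-style "service glusterfs-server restart". Both the port and the service name are assumptions and may differ on other setups. The function names are made up for this example.

```python
#!/usr/bin/env python3
"""Cron-driven watchdog sketch: restart Gluster if the local NFS port is dead."""
import socket
import subprocess

NFS_HOST = "127.0.0.1"
# Assumption: Gluster NFS listens on the standard NFS port on this node.
NFS_PORT = 2049
# Assumption: Debian/Ubuntu-style service name; adjust for your distribution.
RESTART_CMD = ["service", "glusterfs-server", "restart"]


def nfs_port_open(host=NFS_HOST, port=NFS_PORT, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_and_restart(probe=nfs_port_open, restart_cmd=RESTART_CMD):
    """Run restart_cmd when the probe fails; return True if a restart was issued."""
    if probe():
        return False
    subprocess.call(restart_cmd)
    return True


if __name__ == "__main__":
    check_and_restart()
```

A TCP probe only shows that something is accepting connections on the port. It does not prove that NFS requests are actually being served, so it catches the crash described above but not a hung process.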