Hmm, I need to check all the conditions, but it looks like someone has to do 
something when the fsal_obj_handle passed up to mdcache is not actually 
consumed; mdcache probably needs to call release on any obj handle it doesn't 
consume. In the fsal_helper layer used by the protocols there is an obj handle 
put-ref that disposes of obj handles, and at that layer we never consume the 
obj handle. Also note that a top-layer FSAL MUST implement ref counting, while 
an FSAL stacked under such an FSAL need not; the upper FSAL does an obj handle 
release when the lower FSAL's obj handle is no longer needed.

So I think it's a simple case of the stacking in readdir not properly managing 
the lower FSAL's obj handle lifetime...
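
Something along these lines is what I have in mind. This is just a sketch from 
memory, not against the actual tree: the callback name, its signature, the 
mdc_add_cache arguments, and whether your version uses obj_ops.release or 
obj_ops->release are all approximations.

/* Sketch only: names and signatures approximate the mdcache dirent
 * populate path, they are not copied from the tree. */
static bool mdc_populate_cb(const char *name,
                            struct fsal_obj_handle *sub_handle,
                            void *dir_state)
{
        mdcache_entry_t *parent = dir_state;    /* simplification */
        mdcache_entry_t *new_entry = NULL;
        fsal_status_t status;

        status = mdc_add_cache(parent, name, sub_handle, &new_entry);
        if (FSAL_IS_ERROR(status)) {
                /* mdcache did not consume sub_handle (e.g. ERR_FSAL_OVERFLOW
                 * once Dir_Max is exceeded), so release it right here instead
                 * of leaving it for shutdown_handles(), which for FSAL_GLUSTER
                 * runs after the glfs has already been freed. */
                sub_handle->obj_ops.release(sub_handle);
                return false;   /* stop populating the dirent cache */
        }

        return true;
}

The key point is just that whichever layer fails to take ownership of the 
sub-FSAL handle has to release it on the spot, so the gluster inode ref is 
dropped while the glfs is still alive.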

Good catch, that could be a serious memory leak: take your config, have 
thousands of directories with more than 50 entries each, and do a find on the 
export...

Frank

> -----Original Message-----
> From: Kinglong Mee [mailto:kinglong...@gmail.com]
> Sent: Wednesday, January 24, 2018 2:30 AM
> To: Frank Filz <ffilz...@mindspring.com>; nfs-ganesha-
> de...@lists.sourceforge.net
> Subject: ganesha crash when stopping with gluster handles exist
> 
> With the latest code (nfs-ganesha-2.6.0-0.2rc3.el7.centos.x86_64),
> set ganesha.conf to include:
> 
> CACHEINODE {
>     Dir_Max = 50;
>     Dir_Chunk = 0;
> }
> 
> This forces mdcache_readdir to enter mdcache_dirent_populate, but mdc_add_cache
> returns ERR_FSAL_OVERFLOW when a directory contains more than 50 files (I tested
> with 100 files).
> 
> After ERR_FSAL_OVERFLOW is returned, a handle is left in the gluster FSAL, but
> no entry is added to mdcache.
> 
> Right now, restarting nfs-ganesha crashes as follows:
> 
> #0  0x00007f75bb9ea6e7 in __inode_unref () from /lib64/libglusterfs.so.0
> #1  0x00007f75bb9eaf41 in inode_unref () from /lib64/libglusterfs.so.0
> #2  0x00007f75bbccdad6 in glfs_h_close () from /lib64/libgfapi.so.0
> #3  0x00007f75bc0eb93f in handle_release ()
>    from /usr/lib64/ganesha/libfsalgluster.so
> #4  0x00007f75c12c6faf in destroy_fsals ()
> #5  0x00007f75c12e158f in admin_thread ()
> #6  0x00007f75bf849dc5 in start_thread () from /lib64/libpthread.so.0
> #7  0x00007f75bef1b73d in clone () from /lib64/libc.so.6
> 
> For gluster, those resources (e.g., handles, inodes) belong to a glfs; the glfs
> is freed in remove_all_exports() before destroy_fsals(), so glfs_h_close() uses
> already-freed memory.
> 
> I'm not sure how to fix the problem:
> 1. Add the gluster handle (the one that hit ERR_FSAL_OVERFLOW) to an mdcache entry?
> 2. Bind gluster handles to a glfs export, and release them before freeing the glfs?
> 3. Just move shutdown_handles() before remove_all_exports()?
> 
> valgrind shows many messages like:
> ==13796== Invalid read of size 8
> ==13796==    at 0xA5E39C2: ??? (in /usr/lib64/libglusterfs.so.0.0.1)
> ==13796==    by 0xA5E5D58: inode_forget (in /usr/lib64/libglusterfs.so.0.0.1)
> ==13796==    by 0xA3A0ACD: glfs_h_close (in /usr/lib64/libgfapi.so.0.0.0)
> ==13796==    by 0x9F6B93E: ??? (in /usr/lib64/ganesha/libfsalgluster.so.4.2.0)
> ==13796==    by 0x147FAE: destroy_fsals (in /usr/bin/ganesha.nfsd)
> ==13796==    by 0x16258E: admin_thread (in /usr/bin/ganesha.nfsd)
> ==13796==    by 0x6441DC4: start_thread (in /usr/lib64/libpthread-2.17.so)
> ==13796==    by 0x6DAA73C: clone (in /usr/lib64/libc-2.17.so)
> ==13796==  Address 0x1d3b6d58 is 104 bytes inside a block of size 256 free'd
> ==13796==    at 0x4C28CDD: free (vg_replace_malloc.c:530)
> ==13796==    by 0xA5FD9BB: free_obj_list (in /usr/lib64/libglusterfs.so.0.0.1)
> ==13796==    by 0xA5FDDA7: mem_pools_fini (in /usr/lib64/libglusterfs.so.0.0.1)
> ==13796==    by 0xA38A2D6: glfs_fini (in /usr/lib64/libgfapi.so.0.0.0)
> ==13796==    by 0x9F6687A: glusterfs_free_fs (in /usr/lib64/ganesha/libfsalgluster.so.4.2.0)
> ==13796==    by 0x9F66C19: ??? (in /usr/lib64/ganesha/libfsalgluster.so.4.2.0)
> ==13796==    by 0x223E1E: ??? (in /usr/bin/ganesha.nfsd)
> ==13796==    by 0x200FFA: free_export_resources (in /usr/bin/ganesha.nfsd)
> ==13796==    by 0x212288: free_export (in /usr/bin/ganesha.nfsd)
> ==13796==    by 0x215329: remove_all_exports (in /usr/bin/ganesha.nfsd)
> ==13796==    by 0x1624B8: admin_thread (in /usr/bin/ganesha.nfsd)
> ==13796==    by 0x6441DC4: start_thread (in /usr/lib64/libpthread-2.17.so)
> ==13796==  Block was alloc'd at
> ==13796==    at 0x4C27BE3: malloc (vg_replace_malloc.c:299)
> ==13796==    by 0xA5FE3C4: mem_get (in /usr/lib64/libglusterfs.so.0.0.1)
> ==13796==    by 0xA5FE4D2: mem_get0 (in /usr/lib64/libglusterfs.so.0.0.1)
> ==13796==    by 0xA5E3BD3: ??? (in /usr/lib64/libglusterfs.so.0.0.1)
> ==13796==    by 0xA5E510A: inode_new (in /usr/lib64/libglusterfs.so.0.0.1)
> ==13796==    by 0x178B5779: ???
> ==13796==    by 0x178EA503: ???
> ==13796==    by 0x178C3E46: ???
> ==13796==    by 0xA8B8E7F: rpc_clnt_handle_reply (in /usr/lib64/libgfrpc.so.0.0.1)
> ==13796==    by 0xA8B9202: rpc_clnt_notify (in /usr/lib64/libgfrpc.so.0.0.1)
> ==13796==    by 0xA8B4F82: rpc_transport_notify (in /usr/lib64/libgfrpc.so.0.0.1)
> ==13796==    by 0x1741D595: ??? (in /usr/lib64/glusterfs/3.13.1/rpc-transport/socket.so)

