There are a couple of answers to that question:
- The core dump is from a fully patched RHEL 6 box. This is my primary box.
- The other two nodes are fully patched CentOS 6.
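
Since the abort is coming from glibc's own heap-consistency checks, one thing
that might localize the corruption is a test run of glusterd with malloc
checking turned on. A minimal sketch, assuming the stock glibc MALLOC_CHECK_
mechanism and that it's acceptable to start the daemon by hand briefly:

    # stop the managed service first
    service glusterd stop
    # MALLOC_CHECK_=3 tells glibc to print a diagnostic and abort at the
    # first heap inconsistency it notices, which usually puts the crash
    # much closer to the code that actually corrupted the heap
    MALLOC_CHECK_=3 /usr/sbin/glusterd --pid-file=/var/run/glusterd.pid

The extra checking adds allocator overhead, so this is for a diagnostic run
rather than permanent use.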
--
*Gene Liverman*
Systems Integration Architect
Information Technology Services
University of West Georgia
glive...@westga.edu
678.839.5492

ITS: Making Technology Work for You!

On Wed, Oct 7, 2015 at 11:50 AM, Atin Mukherjee <atin.mukherje...@gmail.com> wrote:

> This looks like a glibc corruption to me. Which distribution platform are
> you running Gluster on?
>
> -Atin
> Sent from one plus one
>
> On Oct 7, 2015 9:12 PM, "Gene Liverman" <glive...@westga.edu> wrote:
>
>> Both of the requested trace commands are below:
>>
>> Core was generated by `/usr/sbin/glusterd --pid-file=/var/run/glusterd.pid'.
>> Program terminated with signal 6, Aborted.
>> #0  0x0000003b91432625 in raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
>> 64        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
>>
>> (gdb) bt
>> #0  0x0000003b91432625 in raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
>> #1  0x0000003b91433e05 in abort () at abort.c:92
>> #2  0x0000003b91470537 in __libc_message (do_abort=2, fmt=0x3b915588c0 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:198
>> #3  0x0000003b91475f4e in malloc_printerr (action=3, str=0x3b9155687d "corrupted double-linked list", ptr=<value optimized out>, ar_ptr=<value optimized out>) at malloc.c:6350
>> #4  0x0000003b914763d3 in malloc_consolidate (av=0x7fee90000020) at malloc.c:5216
>> #5  0x0000003b91479c28 in _int_malloc (av=0x7fee90000020, bytes=<value optimized out>) at malloc.c:4415
>> #6  0x0000003b9147a7ed in __libc_calloc (n=<value optimized out>, elem_size=<value optimized out>) at malloc.c:4093
>> #7  0x0000003b9345c81f in __gf_calloc (nmemb=<value optimized out>, size=<value optimized out>, type=59, typestr=0x7fee9ed2d708 "gf_common_mt_rpc_trans_t") at mem-pool.c:117
>> #8  0x00007fee9ed2830b in socket_server_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0xf3eca0, poll_in=1, poll_out=<value optimized out>, poll_err=<value optimized out>) at socket.c:2622
>> #9  0x0000003b9348b0a0 in event_dispatch_epoll_handler (data=0xf408b0) at event-epoll.c:575
>> #10 event_dispatch_epoll_worker (data=0xf408b0) at event-epoll.c:678
>> #11 0x0000003b91807a51 in start_thread (arg=0x7fee9db3b700) at pthread_create.c:301
>> #12 0x0000003b914e893d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
>>
>> (gdb) t a a bt
>>
>> Thread 9 (Thread 0x7fee9e53c700 (LWP 37122)):
>> #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:183
>> #1  0x00007fee9fffcf93 in hooks_worker (args=<value optimized out>) at glusterd-hooks.c:534
>> #2  0x0000003b91807a51 in start_thread (arg=0x7fee9e53c700) at pthread_create.c:301
>> #3  0x0000003b914e893d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
>>
>> Thread 8 (Thread 0x7feea0c99700 (LWP 36996)):
>> #0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:239
>> #1  0x0000003b9346cbdb in syncenv_task (proc=0xefa8c0) at syncop.c:607
>> #2  0x0000003b93472cb0 in syncenv_processor (thdata=0xefa8c0) at syncop.c:699
>> #3  0x0000003b91807a51 in start_thread (arg=0x7feea0c99700) at pthread_create.c:301
>> #4  0x0000003b914e893d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
>>
>> Thread 7 (Thread 0x7feea209b700 (LWP 36994)):
>> #0  do_sigwait (set=<value optimized out>, sig=0x7feea209ae5c) at ../sysdeps/unix/sysv/linux/sigwait.c:65
>> #1  __sigwait (set=<value optimized out>, sig=0x7feea209ae5c) at ../sysdeps/unix/sysv/linux/sigwait.c:100
>> #2  0x0000000000405dfb in glusterfs_sigwaiter (arg=<value optimized out>) at glusterfsd.c:1989
>> #3  0x0000003b91807a51 in start_thread (arg=0x7feea209b700) at pthread_create.c:301
>> #4  0x0000003b914e893d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
>>
>> Thread 6 (Thread 0x7feea2a9c700 (LWP 36993)):
>> #0  0x0000003b9180efbd in nanosleep () at ../sysdeps/unix/syscall-template.S:82
>> #1  0x0000003b934473ea in gf_timer_proc (ctx=0xecc010) at timer.c:205
>> #2  0x0000003b91807a51 in start_thread (arg=0x7feea2a9c700) at pthread_create.c:301
>> #3  0x0000003b914e893d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
>>
>> Thread 5 (Thread 0x7feea9e04740 (LWP 36992)):
>> #0  0x0000003b918082ad in pthread_join (threadid=140662814254848, thread_return=0x0) at pthread_join.c:89
>> #1  0x0000003b9348ab4d in event_dispatch_epoll (event_pool=0xeeb5b0) at event-epoll.c:762
>> #2  0x0000000000407b24 in main (argc=2, argv=0x7fff5294adc8) at glusterfsd.c:2333
>>
>> Thread 4 (Thread 0x7feea169a700 (LWP 36995)):
>> #0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:239
>> #1  0x0000003b9346cbdb in syncenv_task (proc=0xefa500) at syncop.c:607
>> #2  0x0000003b93472cb0 in syncenv_processor (thdata=0xefa500) at syncop.c:699
>> #3  0x0000003b91807a51 in start_thread (arg=0x7feea169a700) at pthread_create.c:301
>> #4  0x0000003b914e893d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
>>
>> Thread 3 (Thread 0x7fee9d13a700 (LWP 37124)):
>> #0  0x0000003b914e8f33 in epoll_wait () at ../sysdeps/unix/syscall-template.S:82
>> #1  0x0000003b9348aed1 in event_dispatch_epoll_worker (data=0xf405b0) at event-epoll.c:668
>> #2  0x0000003b91807a51 in start_thread (arg=0x7fee9d13a700) at pthread_create.c:301
>> #3  0x0000003b914e893d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
>>
>> Thread 2 (Thread 0x7fee97fff700 (LWP 37125)):
>> #0  0x0000003b914e8f33 in epoll_wait () at ../sysdeps/unix/syscall-template.S:82
>> #1  0x0000003b9348aed1 in event_dispatch_epoll_worker (data=0xf6b4d0) at event-epoll.c:668
>> #2  0x0000003b91807a51 in start_thread (arg=0x7fee97fff700) at pthread_create.c:301
>> #3  0x0000003b914e893d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
>>
>> Thread 1 (Thread 0x7fee9db3b700 (LWP 37123)):
>> #0  0x0000003b91432625 in raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
>> #1  0x0000003b91433e05 in abort () at abort.c:92
>> #2  0x0000003b91470537 in __libc_message (do_abort=2, fmt=0x3b915588c0 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:198
>> #3  0x0000003b91475f4e in malloc_printerr (action=3, str=0x3b9155687d "corrupted double-linked list", ptr=<value optimized out>, ar_ptr=<value optimized out>) at malloc.c:6350
>> #4  0x0000003b914763d3 in malloc_consolidate (av=0x7fee90000020) at malloc.c:5216
>> #5  0x0000003b91479c28 in _int_malloc (av=0x7fee90000020, bytes=<value optimized out>) at malloc.c:4415
>> #6  0x0000003b9147a7ed in __libc_calloc (n=<value optimized out>, elem_size=<value optimized out>) at malloc.c:4093
>> #7  0x0000003b9345c81f in __gf_calloc (nmemb=<value optimized out>, size=<value optimized out>, type=59, typestr=0x7fee9ed2d708 "gf_common_mt_rpc_trans_t") at mem-pool.c:117
>> #8  0x00007fee9ed2830b in socket_server_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0xf3eca0, poll_in=1, poll_out=<value optimized out>, poll_err=<value optimized out>) at socket.c:2622
>> #9  0x0000003b9348b0a0 in event_dispatch_epoll_handler (data=0xf408b0) at event-epoll.c:575
>> #10 event_dispatch_epoll_worker (data=0xf408b0) at event-epoll.c:678
>> #11 0x0000003b91807a51 in start_thread (arg=0x7fee9db3b700) at pthread_create.c:301
>> #12 0x0000003b914e893d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
>>
>> --
>> *Gene Liverman*
>> Systems Integration Architect
>> Information Technology Services
>> University of West Georgia
>> glive...@westga.edu
>> 678.839.5492
>>
>> ITS: Making Technology Work for You!
>>
>> On Wed, Oct 7, 2015 at 12:06 AM, Atin Mukherjee <amukh...@redhat.com> wrote:
>>
>>> On 10/07/2015 09:34 AM, Atin Mukherjee wrote:
>>> >
>>> > On 10/06/2015 08:15 PM, Gene Liverman wrote:
>>> >> Sorry for the delay... the joys of multiple proverbial fires at once.
>>> >> In /var/log/messages I found this for our most recent crash:
>>> >>
>>> >> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: pending frames:
>>> >> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: patchset: git://git.gluster.com/glusterfs.git
>>> >> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: signal received: 6
>>> >> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: time of crash:
>>> >> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: 2015-10-03 04:26:21
>>> >> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: configuration details:
>>> >> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: argp 1
>>> >> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: backtrace 1
>>> >> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: dlfcn 1
>>> >> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: libpthread 1
>>> >> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: llistxattr 1
>>> >> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: setfsid 1
>>> >> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: spinlock 1
>>> >> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: epoll.h 1
>>> >> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: xattr.h 1
>>> >> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: st_atim.tv_nsec 1
>>> >> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: package-string: glusterfs 3.7.4
>>> >> Oct 3 00:26:21 eapps-gluster01 etc-glusterfs-glusterd.vol[36992]: ---------
>>> >>
>>> >> I have posted etc-glusterfs-glusterd.vol.log to http://pastebin.com/Pzq1j5J3.
>>> >> I also put the core file and an sosreport on my web server for you but
>>> >> don't want to leave them there for long, so I'd appreciate it if you'd
>>> >> let me know once you get them.
>>> >> They are at the following URLs:
>>> >> http://www.westga.edu/~gliverma/tmp-files/core.36992
>>> > Could you get the backtrace and share it with us using the following commands:
>>> >
>>> > $ gdb glusterd <core file path>
>>> > $ bt
>>> Also, "t a a bt" output in gdb might help.
>>> >
>>> >> http://www.westga.edu/~gliverma/tmp-files/sosreport-gliverman.gluster-crashing-20151006101239.tar.xz
>>> >> http://www.westga.edu/~gliverma/tmp-files/sosreport-gliverman.gluster-crashing-20151006101239.tar.xz.md5
>>> >>
>>> >> Thanks again for the help!
>>> >> *Gene Liverman*
>>> >> Systems Integration Architect
>>> >> Information Technology Services
>>> >> University of West Georgia
>>> >> glive...@westga.edu
>>> >>
>>> >> ITS: Making Technology Work for You!
>>> >>
>>> >> On Fri, Oct 2, 2015 at 11:18 AM, Gaurav Garg <gg...@redhat.com> wrote:
>>> >>
>>> >> >> Pulling those logs now but how do I generate the core file you are asking for?
>>> >>
>>> >> When there is a crash, a core file is automatically generated based on
>>> >> your *ulimit* settings. You can find the core file in your root or
>>> >> current working directory, or wherever you have set your core dump
>>> >> location. The core file gives you information about the crash, i.e.
>>> >> where exactly the crash happened. You can find the appropriate core
>>> >> file by looking at the crash time in the glusterd logs, searching for
>>> >> the "crash" keyword. You can also paste a few lines from just above
>>> >> the latest "crash" keyword in the glusterd logs.
>>> >>
>>> >> Just for your curiosity, if you are willing to look at where it
>>> >> crashed, you can debug it with: # gdb -c <location of core file> glusterd
>>> >>
>>> >> Thank you...
>>> >>
>>> >> Regards,
>>> >> Gaurav
>>> >>
>>> >> ----- Original Message -----
>>> >> From: "Gene Liverman" <glive...@westga.edu>
>>> >> To: "Gaurav Garg" <gg...@redhat.com>
>>> >> Cc: "gluster-users" <gluster-users@gluster.org>
>>> >> Sent: Friday, October 2, 2015 8:28:49 PM
>>> >> Subject: Re: [Gluster-users] glusterd crashing
>>> >>
>>> >> Pulling those logs now but how do I generate the core file you are asking for?
>>> >>
>>> >> --
>>> >> *Gene Liverman*
>>> >> Systems Integration Architect
>>> >> Information Technology Services
>>> >> University of West Georgia
>>> >> glive...@westga.edu
>>> >> 678.839.5492
>>> >>
>>> >> ITS: Making Technology Work for You!
>>> >>
>>> >> On Fri, Oct 2, 2015 at 2:25 AM, Gaurav Garg <gg...@redhat.com> wrote:
>>> >>
>>> >> > Hi Gene,
>>> >> >
>>> >> > You have pasted the glustershd log; we asked you to paste the glusterd
>>> >> > log. glusterd and glustershd are different processes, and with this
>>> >> > information we can't find out why your glusterd crashed. Could you
>>> >> > paste the *glusterd* logs
>>> >> > (/var/log/glusterfs/usr-local-etc-glusterfs-glusterd.vol.log*) in
>>> >> > pastebin (not in this mail thread) and give the link of the pastebin
>>> >> > in this mail thread? Can you also attach the core file, or paste the
>>> >> > backtrace from that core dump file?
>>> >> > It would also be great if you could give us an sosreport from the
>>> >> > node where the crash happened.
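>>> >> > If no core file is being produced at all, here is a minimal sketch
>>> >> > for enabling cores on RHEL/CentOS 6 (assuming the default
>>> >> > core_pattern and that abrt is not already intercepting core dumps
>>> >> > on your box):
>>> >> >
>>> >> > # let daemons started from init scripts dump unlimited-size cores
>>> >> > echo "DAEMON_COREFILE_LIMIT=unlimited" >> /etc/sysconfig/init
>>> >> > # write cores to a predictable path, tagged with program name and pid
>>> >> > echo "/var/tmp/core.%e.%p" > /proc/sys/kernel/core_pattern
>>> >> > # and for processes started from the current shell
>>> >> > ulimit -c unlimited
>>> >> > service glusterd restart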
>>> >> >
>>> >> > Thanx,
>>> >> >
>>> >> > ~Gaurav
>>> >> >
>>> >> > ----- Original Message -----
>>> >> > From: "Gene Liverman" <glive...@westga.edu>
>>> >> > To: "gluster-users" <gluster-users@gluster.org>
>>> >> > Sent: Friday, October 2, 2015 4:47:00 AM
>>> >> > Subject: Re: [Gluster-users] glusterd crashing
>>> >> >
>>> >> > Sorry for the delay. Here is what's installed:
>>> >> > # rpm -qa | grep gluster
>>> >> > glusterfs-geo-replication-3.7.4-2.el6.x86_64
>>> >> > glusterfs-client-xlators-3.7.4-2.el6.x86_64
>>> >> > glusterfs-3.7.4-2.el6.x86_64
>>> >> > glusterfs-libs-3.7.4-2.el6.x86_64
>>> >> > glusterfs-api-3.7.4-2.el6.x86_64
>>> >> > glusterfs-fuse-3.7.4-2.el6.x86_64
>>> >> > glusterfs-server-3.7.4-2.el6.x86_64
>>> >> > glusterfs-cli-3.7.4-2.el6.x86_64
>>> >> >
>>> >> > The cmd_history.log file is attached.
>>> >> > In gluster.log I have filtered out a bunch of lines like the one
>>> >> > below to make the log more readable. I had a node down for multiple
>>> >> > days due to maintenance, and another one went down due to a hardware
>>> >> > failure during that time too.
>>> >> > [2015-10-01 00:16:09.643631] W [MSGID: 114031] [client-rpc-fops.c:2971:client3_3_lookup_cbk] 0-gv0-client-0: remote operation failed. Path: <gfid:31f17f8c-6c96-4440-88c0-f813b3c8d364> (31f17f8c-6c96-4440-88c0-f813b3c8d364) [No such file or directory]
>>> >> >
>>> >> > I also filtered out a boatload of self-heal lines like these two:
>>> >> > [2015-10-01 15:14:14.851015] I [MSGID: 108026] [afr-self-heal-metadata.c:56:__afr_selfheal_metadata_do] 0-gv0-replicate-0: performing metadata selfheal on f78a47db-a359-430d-a655-1d217eb848c3
>>> >> > [2015-10-01 15:14:14.856392] I [MSGID: 108026] [afr-self-heal-common.c:651:afr_log_selfheal] 0-gv0-replicate-0: Completed metadata selfheal on f78a47db-a359-430d-a655-1d217eb848c3. source=0 sinks=1
>>> >> >
>>> >> > [root@eapps-gluster01 glusterfs]# cat glustershd.log | grep -v 'remote operation failed' | grep -v 'self-heal'
>>> >> > [2015-09-27 08:46:56.893125] E [rpc-clnt.c:201:call_bail] 0-glusterfs: bailing out frame type(GlusterFS Handshake) op(GETSPEC(2)) xid = 0x6 sent = 2015-09-27 08:16:51.742731. timeout = 1800 for 127.0.0.1:24007
>>> >> > [2015-09-28 12:54:17.524924] W [socket.c:588:__socket_rwv] 0-glusterfs: readv on 127.0.0.1:24007 failed (Connection reset by peer)
>>> >> > [2015-09-28 12:54:27.844374] I [glusterfsd-mgmt.c:1512:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
>>> >> > [2015-09-28 12:57:03.485027] W [socket.c:588:__socket_rwv] 0-gv0-client-2: readv on 160.10.31.227:24007 failed (Connection reset by peer)
>>> >> > [2015-09-28 12:57:05.872973] E [socket.c:2278:socket_connect_finish] 0-gv0-client-2: connection to 160.10.31.227:24007 failed (Connection refused)
>>> >> > [2015-09-28 12:57:38.490578] W [socket.c:588:__socket_rwv] 0-glusterfs: readv on 127.0.0.1:24007 failed (No data available)
>>> >> > [2015-09-28 12:57:49.054475] I [glusterfsd-mgmt.c:1512:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
>>> >> > [2015-09-28 13:01:12.062960] W [glusterfsd.c:1219:cleanup_and_exit] (-->/lib64/libpthread.so.0() [0x3c65e07a51] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd) [0x405e4d] -->/usr/sbin/glusterfs(cleanup_and_exit+0x65) [0x4059b5] ) 0-: received signum (15), shutting down
>>> >> > [2015-09-28 13:01:12.981945] I [MSGID: 100030] [glusterfsd.c:2301:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.4 (args: /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/9a9819e90404187e84e67b01614bbe10.socket --xlator-option *replicate*.node-uuid=416d712a-06fc-4b3c-a92f-8c82145626ff)
>>> >> > [2015-09-28 13:01:13.009171] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
>>> >> > [2015-09-28 13:01:13.092483] I [graph.c:269:gf_add_cmdline_options] 0-gv0-replicate-0: adding option 'node-uuid' for volume 'gv0-replicate-0' with value '416d712a-06fc-4b3c-a92f-8c82145626ff'
>>> >> > [2015-09-28 13:01:13.100856] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
>>> >> > [2015-09-28 13:01:13.103995] I [MSGID: 114020] [client.c:2118:notify] 0-gv0-client-0: parent translators are ready, attempting connect on transport
>>> >> > [2015-09-28 13:01:13.114745] I [MSGID: 114020] [client.c:2118:notify] 0-gv0-client-1: parent translators are ready, attempting connect on transport
>>> >> > [2015-09-28 13:01:13.115725] I [rpc-clnt.c:1851:rpc_clnt_reconfig] 0-gv0-client-0: changing port to 49152 (from 0)
>>> >> > [2015-09-28 13:01:13.125619] I [MSGID: 114020] [client.c:2118:notify] 0-gv0-client-2: parent translators are ready, attempting connect on transport
>>> >> > [2015-09-28 13:01:13.132316] E [socket.c:2278:socket_connect_finish] 0-gv0-client-1: connection to 160.10.31.64:24007 failed (Connection refused)
>>> >> > [2015-09-28 13:01:13.132650] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-gv0-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
>>> >> > [2015-09-28 13:01:13.133322] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-gv0-client-0: Connected to gv0-client-0, attached to remote volume '/export/sdb1/gv0'.
>>> >> > [2015-09-28 13:01:13.133365] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-gv0-client-0: Server and Client lk-version numbers are not same, reopening the fds
>>> >> > [2015-09-28 13:01:13.133782] I [MSGID: 108005] [afr-common.c:3998:afr_notify] 0-gv0-replicate-0: Subvolume 'gv0-client-0' came back up; going online.
>>> >> > [2015-09-28 13:01:13.133863] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-gv0-client-0: Server lk version = 1
>>> >> > Final graph:
>>> >> > +------------------------------------------------------------------------------+
>>> >> >   1: volume gv0-client-0
>>> >> >   2:     type protocol/client
>>> >> >   3:     option clnt-lk-version 1
>>> >> >   4:     option volfile-checksum 0
>>> >> >   5:     option volfile-key gluster/glustershd
>>> >> >   6:     option client-version 3.7.4
>>> >> >   7:     option process-uuid eapps-gluster01-65147-2015/09/28-13:01:12:970131-gv0-client-0-0-0
>>> >> >   8:     option fops-version 1298437
>>> >> >   9:     option ping-timeout 42
>>> >> >  10:     option remote-host eapps-gluster01.uwg.westga.edu
>>> >> >  11:     option remote-subvolume /export/sdb1/gv0
>>> >> >  12:     option transport-type socket
>>> >> >  13:     option username 0005f8fa-107a-4cc8-ac38-bb821c014c14
>>> >> >  14:     option password 379bae9a-6529-4564-a6f5-f5a9f7424d01
>>> >> >  15: end-volume
>>> >> >  16:
>>> >> >  17: volume gv0-client-1
>>> >> >  18:     type protocol/client
>>> >> >  19:     option ping-timeout 42
>>> >> >  20:     option remote-host eapps-gluster02.uwg.westga.edu
>>> >> >  21:     option remote-subvolume /export/sdb1/gv0
>>> >> >  22:     option transport-type socket
>>> >> >  23:     option username 0005f8fa-107a-4cc8-ac38-bb821c014c14
>>> >> >  24:     option password 379bae9a-6529-4564-a6f5-f5a9f7424d01
>>> >> >  25: end-volume
>>> >> >  26:
>>> >> >  27: volume gv0-client-2
>>> >> >  28:     type protocol/client
>>> >> >  29:     option ping-timeout 42
>>> >> >  30:     option remote-host eapps-gluster03.uwg.westga.edu
>>> >> >  31:     option remote-subvolume /export/sdb1/gv0
>>> >> >  32:     option transport-type socket
>>> >> >  33:     option username 0005f8fa-107a-4cc8-ac38-bb821c014c14
>>> >> >  34:     option password 379bae9a-6529-4564-a6f5-f5a9f7424d01
>>> >> >  35: end-volume
>>> >> >  36:
>>> >> >  37: volume gv0-replicate-0
>>> >> >  38:     type cluster/replicate
>>> >> >  39:     option node-uuid 416d712a-06fc-4b3c-a92f-8c82145626ff
>>> >> >  46:     subvolumes gv0-client-0 gv0-client-1 gv0-client-2
>>> >> >  47: end-volume
>>> >> >  48:
>>> >> >  49: volume glustershd
>>> >> >  50:     type debug/io-stats
>>> >> >  51:     subvolumes gv0-replicate-0
>>> >> >  52: end-volume
>>> >> >  53:
>>> >> > +------------------------------------------------------------------------------+
>>> >> > [2015-09-28 13:01:13.154898] E [MSGID: 114058] [client-handshake.c:1524:client_query_portmap_cbk] 0-gv0-client-2: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
>>> >> > [2015-09-28 13:01:13.155031] I [MSGID: 114018] [client.c:2042:client_rpc_notify] 0-gv0-client-2: disconnected from gv0-client-2. Client process will keep trying to connect to glusterd until brick's port is available
>>> >> > [2015-09-28 13:01:13.155080] W [MSGID: 108001] [afr-common.c:4081:afr_notify] 0-gv0-replicate-0: Client-quorum is not met
>>> >> > [2015-09-29 08:11:24.728797] I [MSGID: 100011] [glusterfsd.c:1291:reincarnate] 0-glusterfsd: Fetching the volume file from server...
>>> >> > [2015-09-29 08:11:24.763338] I [glusterfsd-mgmt.c:1512:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
>>> >> > [2015-09-29 12:50:41.915254] E [rpc-clnt.c:201:call_bail] 0-gv0-client-2: bailing out frame type(GF-DUMP) op(DUMP(1)) xid = 0xd91f sent = 2015-09-29 12:20:36.092734. timeout = 1800 for 160.10.31.227:24007
>>> >> > [2015-09-29 12:50:41.923550] W [MSGID: 114032] [client-handshake.c:1623:client_dump_version_cbk] 0-gv0-client-2: received RPC status error [Transport endpoint is not connected]
>>> >> > [2015-09-30 23:54:36.547979] W [socket.c:588:__socket_rwv] 0-glusterfs: readv on 127.0.0.1:24007 failed (No data available)
>>> >> > [2015-09-30 23:54:46.812870] E [socket.c:2278:socket_connect_finish] 0-glusterfs: connection to 127.0.0.1:24007 failed (Connection refused)
>>> >> > [2015-10-01 00:14:20.997081] I [glusterfsd-mgmt.c:1512:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing
>>> >> > [2015-10-01 00:15:36.770579] W [socket.c:588:__socket_rwv] 0-gv0-client-2: readv on 160.10.31.227:24007 failed (Connection reset by peer)
>>> >> > [2015-10-01 00:15:37.906708] E [socket.c:2278:socket_connect_finish] 0-gv0-client-2: connection to 160.10.31.227:24007 failed (Connection refused)
>>> >> > [2015-10-01 00:15:53.008130] W [glusterfsd.c:1219:cleanup_and_exit] (-->/lib64/libpthread.so.0() [0x3b91807a51] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xcd) [0x405e4d] -->/usr/sbin/glusterfs(cleanup_and_exit+0x65) [0x4059b5] ) 0-: received signum (15), shutting down
>>> >> > [2015-10-01 00:15:53.008697] I [timer.c:48:gf_timer_call_after] (-->/usr/lib64/libgfrpc.so.0(rpc_clnt_submit+0x3e2) [0x3b9480f992] -->/usr/lib64/libgfrpc.so.0(__save_frame+0x76) [0x3b9480f046] -->/usr/lib64/libglusterfs.so.0(gf_timer_call_after+0x1b1) [0x3b93447881] ) 0-timer: ctx cleanup started
>>> >> > [2015-10-01 00:15:53.994698] I [MSGID: 100030] [glusterfsd.c:2301:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.4 (args: /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/9a9819e90404187e84e67b01614bbe10.socket --xlator-option *replicate*.node-uuid=416d712a-06fc-4b3c-a92f-8c82145626ff)
>>> >> > [2015-10-01 00:15:54.020401] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
>>> >> > [2015-10-01 00:15:54.086777] I [graph.c:269:gf_add_cmdline_options] 0-gv0-replicate-0: adding option 'node-uuid' for volume 'gv0-replicate-0' with value '416d712a-06fc-4b3c-a92f-8c82145626ff'
>>> >> > [2015-10-01 00:15:54.093004] I [MSGID: 101190] [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
>>> >> > [2015-10-01 00:15:54.098144] I [MSGID: 114020] [client.c:2118:notify] 0-gv0-client-0: parent translators are ready, attempting connect on transport
>>> >> > [2015-10-01 00:15:54.107432] I [MSGID: 114020] [client.c:2118:notify] 0-gv0-client-1: parent translators are ready, attempting connect on transport
>>> >> > [2015-10-01 00:15:54.115962] I [MSGID: 114020] [client.c:2118:notify] 0-gv0-client-2: parent translators are ready, attempting connect on transport
>>> >> > [2015-10-01 00:15:54.120474] E [socket.c:2278:socket_connect_finish] 0-gv0-client-1: connection to 160.10.31.64:24007 failed (Connection refused)
>>> >> > [2015-10-01 00:15:54.120639] I [rpc-clnt.c:1851:rpc_clnt_reconfig] 0-gv0-client-0: changing port to 49152 (from 0)
>>> >> > Final graph:
>>> >> > +------------------------------------------------------------------------------+
>>> >> >   1: volume gv0-client-0
>>> >> >   2:     type protocol/client
>>> >> >   3:     option ping-timeout 42
>>> >> >   4:     option remote-host eapps-gluster01.uwg.westga.edu
>>> >> >   5:     option remote-subvolume /export/sdb1/gv0
>>> >> >   6:     option transport-type socket
>>> >> >   7:     option username 0005f8fa-107a-4cc8-ac38-bb821c014c14
>>> >> >   8:     option password 379bae9a-6529-4564-a6f5-f5a9f7424d01
>>> >> >   9: end-volume
>>> >> >  10:
>>> >> >  11: volume gv0-client-1
>>> >> >  12:     type protocol/client
>>> >> >  13:     option ping-timeout 42
>>> >> >  14:     option remote-host eapps-gluster02.uwg.westga.edu
>>> >> >  15:     option remote-subvolume /export/sdb1/gv0
>>> >> >  16:     option transport-type socket
>>> >> >  17:     option username 0005f8fa-107a-4cc8-ac38-bb821c014c14
>>> >> >  18:     option password 379bae9a-6529-4564-a6f5-f5a9f7424d01
>>> >> >  19: end-volume
>>> >> >  20:
>>> >> >  21: volume gv0-client-2
>>> >> >  22:     type protocol/client
>>> >> >  23:     option ping-timeout 42
>>> >> >  24:     option remote-host eapps-gluster03.uwg.westga.edu
>>> >> >  25:     option remote-subvolume /export/sdb1/gv0
>>> >> >  26:     option transport-type socket
>>> >> >  27:     option username 0005f8fa-107a-4cc8-ac38-bb821c014c14
>>> >> >  28:     option password 379bae9a-6529-4564-a6f5-f5a9f7424d01
>>> >> >  29: end-volume
>>> >> >  30:
>>> >> >  31: volume gv0-replicate-0
>>> >> >  32:     type cluster/replicate
>>> >> >  33:     option node-uuid 416d712a-06fc-4b3c-a92f-8c82145626ff
>>> >> >  40:     subvolumes gv0-client-0 gv0-client-1 gv0-client-2
>>> >> >  41: end-volume
>>> >> >  42:
>>> >> >  43: volume glustershd
>>> >> >  44:     type debug/io-stats
>>> >> >  45:     subvolumes gv0-replicate-0
>>> >> >  46: end-volume
>>> >> >  47:
>>> >> > +------------------------------------------------------------------------------+
>>> >> > [2015-10-01 00:15:54.135650] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-gv0-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
>>> >> > [2015-10-01 00:15:54.136223] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-gv0-client-0: Connected to gv0-client-0, attached to remote volume '/export/sdb1/gv0'.
>>> >> > [2015-10-01 00:15:54.136262] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-gv0-client-0: Server and Client lk-version numbers are not same, reopening the fds
>>> >> > [2015-10-01 00:15:54.136410] I [MSGID: 108005] [afr-common.c:3998:afr_notify] 0-gv0-replicate-0: Subvolume 'gv0-client-0' came back up; going online.
>>> >> > [2015-10-01 00:15:54.136500] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-gv0-client-0: Server lk version = 1
>>> >> > [2015-10-01 00:15:54.401702] E [MSGID: 114058] [client-handshake.c:1524:client_query_portmap_cbk] 0-gv0-client-2: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
>>> >> > [2015-10-01 00:15:54.401834] I [MSGID: 114018] [client.c:2042:client_rpc_notify] 0-gv0-client-2: disconnected from gv0-client-2. Client process will keep trying to connect to glusterd until brick's port is available
>>> >> > [2015-10-01 00:15:54.401878] W [MSGID: 108001] [afr-common.c:4081:afr_notify] 0-gv0-replicate-0: Client-quorum is not met
>>> >> > [2015-10-01 03:57:52.755426] E [socket.c:2278:socket_connect_finish] 0-gv0-client-2: connection to 160.10.31.227:24007 failed (Connection refused)
>>> >> > [2015-10-01 13:50:49.000708] E [socket.c:2278:socket_connect_finish] 0-gv0-client-2: connection to 160.10.31.227:24007 failed (Connection timed out)
>>> >> > [2015-10-01 14:36:40.481673] E [MSGID: 114058] [client-handshake.c:1524:client_query_portmap_cbk] 0-gv0-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
>>> >> > [2015-10-01 14:36:40.481833] I [MSGID: 114018] [client.c:2042:client_rpc_notify] 0-gv0-client-1: disconnected from gv0-client-1. Client process will keep trying to connect to glusterd until brick's port is available
>>> >> > [2015-10-01 14:36:41.982037] I [rpc-clnt.c:1851:rpc_clnt_reconfig] 0-gv0-client-1: changing port to 49152 (from 0)
>>> >> > [2015-10-01 14:36:41.993478] I [MSGID: 114057] [client-handshake.c:1437:select_server_supported_programs] 0-gv0-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
>>> >> > [2015-10-01 14:36:41.994568] I [MSGID: 114046] [client-handshake.c:1213:client_setvolume_cbk] 0-gv0-client-1: Connected to gv0-client-1, attached to remote volume '/export/sdb1/gv0'.
>>> >> > [2015-10-01 14:36:41.994647] I [MSGID: 114047] [client-handshake.c:1224:client_setvolume_cbk] 0-gv0-client-1: Server and Client lk-version numbers are not same, reopening the fds
>>> >> > [2015-10-01 14:36:41.994899] I [MSGID: 108002] [afr-common.c:4077:afr_notify] 0-gv0-replicate-0: Client-quorum is met
>>> >> > [2015-10-01 14:36:42.002275] I [MSGID: 114035] [client-handshake.c:193:client_set_lk_version_cbk] 0-gv0-client-1: Server lk version = 1
>>> >> >
>>> >> > Thanks,
>>> >> > Gene Liverman
>>> >> > Systems Integration Architect
>>> >> > Information Technology Services
>>> >> > University of West Georgia
>>> >> > glive...@westga.edu
>>> >> >
>>> >> > ITS: Making Technology Work for You!
>>> >> >
>>> >> > On Wed, Sep 30, 2015 at 10:54 PM, Gaurav Garg <gg...@redhat.com> wrote:
>>> >> >
>>> >> > Hi Gene,
>>> >> >
>>> >> > Could you paste or attach the core file, glusterd log file, and
>>> >> > cmd_history so we can find the actual root cause (RCA) of the crash?
>>> >> > What steps did you perform before the crash?
>>> >> >
>>> >> > >> How can I troubleshoot this?
>>> >> >
>>> >> > If you want to troubleshoot this, you can look into the glusterd log
>>> >> > file and the core file.
>>> >> >
>>> >> > Thank you..
>>> >> >
>>> >> > Regards,
>>> >> > Gaurav
>>> >> >
>>> >> > ----- Original Message -----
>>> >> > From: "Gene Liverman" <glive...@westga.edu>
>>> >> > To: gluster-users@gluster.org
>>> >> > Sent: Thursday, October 1, 2015 7:59:47 AM
>>> >> > Subject: [Gluster-users] glusterd crashing
>>> >> >
>>> >> > In the last few days I've started having issues with my glusterd
>>> >> > service crashing. When it goes down it seems to do so on all nodes in
>>> >> > my replicated volume. How can I troubleshoot this? I'm on a mix of
>>> >> > CentOS 6 and RHEL 6. Thanks!
>>> >> >
>>> >> > Gene Liverman
>>> >> > Systems Integration Architect
>>> >> > Information Technology Services
>>> >> > University of West Georgia
>>> >> > glive...@westga.edu
>>> >> >
>>> >> > Sent from Outlook on my iPhone