Kashif,
Could you also send over the client/mount log file, as Vijay suggested? Or at least the lines containing the crash backtrace.
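Something along these lines should pull out the relevant part of the log. It is only a sketch of the "grep for crash" suggestion: the helper name crash_context and the default of 30 trailing lines are my own choices, and the log path in the usage comment is a placeholder you need to replace with the log for your mount point.

```shell
#!/bin/sh
# Print every line that mentions "crash" (case-insensitive), with its line
# number, followed by the next N lines, where the backtrace usually appears.
# Usage: crash_context LOGFILE [LINES_AFTER]
crash_context() {
    logfile=$1
    after=${2:-30}   # 30 is an arbitrary default; raise it if the trace is cut off
    grep -i -n -A "$after" 'crash' "$logfile"
}

# Placeholder path, substitute your own client/mount log:
# crash_context /var/log/glusterfs/<mount-point>.log 40
```

If nothing is printed, the client log has no line containing "crash", which would itself be useful to know.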
Also, you mentioned that you straced glusterd, but when you ran gdb, you ran it against /usr/sbin/glusterfs.

On Tue, Jun 12, 2018 at 8:19 PM, Vijay Bellur <vbel...@redhat.com> wrote:

> On Tue, Jun 12, 2018 at 7:40 AM, mohammad kashif <kashif.a...@gmail.com> wrote:
>
>> Hi Milind
>>
>> The operating system is Scientific Linux 6, which is based on RHEL6. The
>> CPU arch is Intel x86_64.
>>
>> I will send you a separate email with a link to the core dump.
>
> You could also grep for crash in the client log file; the lines
> following crash would have a backtrace in most cases.
>
> HTH,
> Vijay
>
>> Thanks for your help.
>>
>> Kashif
>>
>> On Tue, Jun 12, 2018 at 3:16 PM, Milind Changire <mchan...@redhat.com> wrote:
>>
>>> Kashif,
>>> Could you share the core dump via Google Drive or something similar?
>>>
>>> Also, let me know the CPU arch and OS distribution on which you are
>>> running gluster.
>>>
>>> If you've installed the glusterfs-debuginfo package, you'll also get the
>>> source lines in the backtrace via gdb.
>>>
>>> On Tue, Jun 12, 2018 at 5:59 PM, mohammad kashif <kashif.a...@gmail.com> wrote:
>>>
>>>> Hi Milind, Vijay
>>>>
>>>> Thanks, I have some more information now, as I straced glusterd on the client:
>>>>
>>>> 138544 0.000131 mprotect(0x7f2f70785000, 4096, PROT_READ|PROT_WRITE) = 0 <0.000026>
>>>> 138544 0.000128 mprotect(0x7f2f70786000, 4096, PROT_READ|PROT_WRITE) = 0 <0.000027>
>>>> 138544 0.000126 mprotect(0x7f2f70787000, 4096, PROT_READ|PROT_WRITE) = 0 <0.000027>
>>>> 138544 0.000124 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_ACCERR, si_addr=0x7f2f7c60ef88} ---
>>>> 138544 0.000051 --- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0} ---
>>>> 138551 0.105048 +++ killed by SIGSEGV (core dumped) +++
>>>> 138550 0.000041 +++ killed by SIGSEGV (core dumped) +++
>>>> 138547 0.000008 +++ killed by SIGSEGV (core dumped) +++
>>>> 138546 0.000007 +++ killed by SIGSEGV (core dumped) +++
>>>> 138545 0.000007 +++ killed by SIGSEGV (core dumped) +++
>>>> 138544 0.000008 +++ killed by SIGSEGV (core dumped) +++
>>>> 138543 0.000007 +++ killed by SIGSEGV (core dumped) +++
>>>>
>>>> As far as I understand, gluster is somehow trying to access memory in an
>>>> inappropriate manner and the kernel sends SIGSEGV.
>>>>
>>>> I also got the core dump. I am trying gdb for the first time, so I am not
>>>> sure whether I am using it correctly:
>>>>
>>>> gdb /usr/sbin/glusterfs core.138536
>>>>
>>>> It just tells me that the program terminated with signal 11, segmentation
>>>> fault.
>>>>
>>>> The problem is not limited to one client but is happening on many clients.
>>>>
>>>> I would really appreciate any help, as the whole file system has become
>>>> unusable.
>>>>
>>>> Thanks
>>>>
>>>> Kashif
>>>>
>>>> On Tue, Jun 12, 2018 at 12:26 PM, Milind Changire <mchan...@redhat.com> wrote:
>>>>
>>>>> Kashif,
>>>>> You can change the log level by:
>>>>> $ gluster volume set <vol> diagnostics.brick-log-level TRACE
>>>>> $ gluster volume set <vol> diagnostics.client-log-level TRACE
>>>>>
>>>>> and see how things fare.
>>>>>
>>>>> If you want fewer logs you can change the log-level to DEBUG instead
>>>>> of TRACE.
>>>>>
>>>>> On Tue, Jun 12, 2018 at 3:37 PM, mohammad kashif <kashif.a...@gmail.com> wrote:
>>>>>
>>>>>> Hi Vijay
>>>>>>
>>>>>> Now it is unmounting every 30 mins!
>>>>>>
>>>>>> The server log at /var/log/glusterfs/bricks/glusteratlas-brics001-gv0.log
>>>>>> has only these lines:
>>>>>>
>>>>>> [2018-06-12 09:53:19.303102] I [MSGID: 115013] [server-helpers.c:289:do_fd_cleanup] 0-atlasglust-server: fd cleanup on /atlas/atlasdata/zgubic/hmumu/histograms/v14.3/Signal
>>>>>> [2018-06-12 09:53:19.306190] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-atlasglust-server: Shutting down connection <server-name> -2224879-2018/06/12-09:51:01:460889-atlasglust-client-0-0-0
>>>>>>
>>>>>> There is no other information. Is there any way to increase log
>>>>>> verbosity?
>>>>>>
>>>>>> On the client:
>>>>>>
>>>>>> [2018-06-12 09:51:01.744980] I [MSGID: 114057] [client-handshake.c:1478:select_server_supported_programs] 0-atlasglust-client-5: Using Program GlusterFS 3.3, Num (1298437), Version (330)
>>>>>> [2018-06-12 09:51:01.746508] I [MSGID: 114046] [client-handshake.c:1231:client_setvolume_cbk] 0-atlasglust-client-5: Connected to atlasglust-client-5, attached to remote volume '/glusteratlas/brick006/gv0'.
>>>>>> [2018-06-12 09:51:01.746543] I [MSGID: 114047] [client-handshake.c:1242:client_setvolume_cbk] 0-atlasglust-client-5: Server and Client lk-version numbers are not same, reopening the fds
>>>>>> [2018-06-12 09:51:01.746814] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-atlasglust-client-5: Server lk version = 1
>>>>>> [2018-06-12 09:51:01.748449] I [MSGID: 114057] [client-handshake.c:1478:select_server_supported_programs] 0-atlasglust-client-6: Using Program GlusterFS 3.3, Num (1298437), Version (330)
>>>>>> [2018-06-12 09:51:01.750219] I [MSGID: 114046] [client-handshake.c:1231:client_setvolume_cbk] 0-atlasglust-client-6: Connected to atlasglust-client-6, attached to remote volume '/glusteratlas/brick007/gv0'.
>>>>>> [2018-06-12 09:51:01.750261] I [MSGID: 114047] [client-handshake.c:1242:client_setvolume_cbk] 0-atlasglust-client-6: Server and Client lk-version numbers are not same, reopening the fds
>>>>>> [2018-06-12 09:51:01.750503] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-atlasglust-client-6: Server lk version = 1
>>>>>> [2018-06-12 09:51:01.752207] I [fuse-bridge.c:4205:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.14
>>>>>> [2018-06-12 09:51:01.752261] I [fuse-bridge.c:4835:fuse_graph_sync] 0-fuse: switched to graph 0
>>>>>>
>>>>>> Is there a problem with the server and client lk version?
>>>>>>
>>>>>> Thanks for your help.
>>>>>>
>>>>>> Kashif
>>>>>>
>>>>>> On Mon, Jun 11, 2018 at 11:52 PM, Vijay Bellur <vbel...@redhat.com> wrote:
>>>>>>
>>>>>>> On Mon, Jun 11, 2018 at 8:50 AM, mohammad kashif <kashif.a...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> Since I updated our gluster servers and clients to the latest
>>>>>>>> version, 3.12.9-1, gluster has been getting unmounted from the
>>>>>>>> client very regularly. It was not a problem before the update.
>>>>>>>>
>>>>>>>> It's a distributed file system with no replication. We have seven
>>>>>>>> servers totaling around 480TB of data. It's 97% full.
>>>>>>>>
>>>>>>>> I am using the following config on the server:
>>>>>>>>
>>>>>>>> gluster volume set atlasglust features.cache-invalidation on
>>>>>>>> gluster volume set atlasglust features.cache-invalidation-timeout 600
>>>>>>>> gluster volume set atlasglust performance.stat-prefetch on
>>>>>>>> gluster volume set atlasglust performance.cache-invalidation on
>>>>>>>> gluster volume set atlasglust performance.md-cache-timeout 600
>>>>>>>> gluster volume set atlasglust performance.parallel-readdir on
>>>>>>>> gluster volume set atlasglust performance.cache-size 1GB
>>>>>>>> gluster volume set atlasglust performance.client-io-threads on
>>>>>>>> gluster volume set atlasglust cluster.lookup-optimize on
>>>>>>>> gluster volume set atlasglust performance.stat-prefetch on
>>>>>>>> gluster volume set atlasglust client.event-threads 4
>>>>>>>> gluster volume set atlasglust server.event-threads 4
>>>>>>>>
>>>>>>>> Clients are mounted with these options:
>>>>>>>>
>>>>>>>> defaults,direct-io-mode=disable,attribute-timeout=600,entry-timeout=600,negative-timeout=600,fopen-keep-cache,rw,_netdev
>>>>>>>>
>>>>>>>> I can't see anything in the log file. Can someone suggest how
>>>>>>>> to troubleshoot this issue?
>>>>>>>
>>>>>>> Can you please share the log file? Checking for messages related to
>>>>>>> disconnections/crashes in the log file would be a good way to start
>>>>>>> troubleshooting the problem.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vijay
>>>>>>
>>>>>> _______________________________________________
>>>>>> Gluster-users mailing list
>>>>>> Gluster-users@gluster.org
>>>>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>>>
>>>>> --
>>>>> Milind
>>>
>>> --
>>> Milind

--
Milind