Hey, sorry. Didn't notice you had already uploaded the logs. Kaushal is looking at the issue now.
----- Original Message -----
From: "Franco Broi" <franco.b...@iongeo.com>
To: "Susant Palai" <spa...@redhat.com>
Cc: "Lalatendu Mohanty" <lmoha...@redhat.com>, "Niels de Vos" <nde...@redhat.com>, "Pranith Kumar Karampuri" <pkara...@redhat.com>, gluster-users@gluster.org, "Raghavendra Gowdappa" <rgowd...@redhat.com>, kdhan...@redhat.com, vsomy...@redhat.com, nbala...@redhat.com
Sent: Wednesday, 18 June, 2014 1:51:50 PM
Subject: Re: [Gluster-users] glusterfsd process spinning

On Wed, 2014-06-18 at 04:12 -0400, Susant Palai wrote:
> Can you figure out the failure from the log and update here?

Not sure what you mean? I figured it failed because I'd been running a
newer version and you can't go back. After 3.4 wouldn't start, I
reinstalled 3.5 and it started working again.

> ----- Original Message -----
> From: "Franco Broi" <franco.b...@iongeo.com>
> To: "Lalatendu Mohanty" <lmoha...@redhat.com>
> Cc: "Susant Palai" <spa...@redhat.com>, "Niels de Vos" <nde...@redhat.com>,
> "Pranith Kumar Karampuri" <pkara...@redhat.com>, gluster-users@gluster.org,
> "Raghavendra Gowdappa" <rgowd...@redhat.com>, kdhan...@redhat.com,
> vsomy...@redhat.com, nbala...@redhat.com
> Sent: Wednesday, 18 June, 2014 1:24:24 PM
> Subject: Re: [Gluster-users] glusterfsd process spinning
>
> On Wed, 2014-06-18 at 13:09 +0530, Lalatendu Mohanty wrote:
> > On 06/17/2014 02:25 PM, Susant Palai wrote:
> > > Hi Franco:
> > > The following patches address the ENOTEMPTY issue:
> > >
> > > 1. http://review.gluster.org/#/c/7733/
> > > 2. http://review.gluster.org/#/c/7599/
> > >
> > > I think the above patches will be available in 3.5.1, which will be a
> > > minor upgrade. (Need ack from Niels de Vos.)
> > >
> > > Hi Lala,
> > > Can you provide the steps to downgrade to 3.4 from 3.5?
> > >
> > > Thanks :)
> >
> > If you are using an RPM-based distribution, the "yum downgrade" command
> > should work, provided yum has access to both the 3.5 and 3.4 repos. I
> > have not specifically tested the downgrade scenario from 3.5 to 3.4. I
> > would suggest that you stop your volume and kill the gluster processes
> > while downgrading.
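(For reference, a minimal sketch of that downgrade sequence on an RPM-based
system. It is untested, as Lala notes; the package glob and the service
commands are assumptions for a 2014-era install.)

    # Stop the volume and all gluster processes first, and back up
    # /var/lib/glusterd in case the downgrade needs to be undone.
    gluster volume stop data2
    service glusterd stop
    pkill glusterfs; pkill glusterfsd

    # Downgrade the packages; yum needs the 3.4 repo configured too.
    yum downgrade 'glusterfs*'

    service glusterd start
    gluster volume start data2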
>
> I did try installing 3.4 but the volume wouldn't start:
>
> [2014-06-16 02:53:16.886995] I [glusterfsd.c:1910:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.4.3 (/usr/sbin/glusterd --pid-file=/var/run/glusterd.pid)
> [2014-06-16 02:53:16.889605] I [glusterd.c:961:init] 0-management: Using /var/lib/glusterd as working directory
> [2014-06-16 02:53:16.891580] I [socket.c:3480:socket_init] 0-socket.management: SSL support is NOT enabled
> [2014-06-16 02:53:16.891600] I [socket.c:3495:socket_init] 0-socket.management: using system polling thread
> [2014-06-16 02:53:16.891675] E [rpc-transport.c:253:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.4.3/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
> [2014-06-16 02:53:16.891691] W [rpc-transport.c:257:rpc_transport_load] 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine
> [2014-06-16 02:53:16.891700] W [rpcsvc.c:1389:rpcsvc_transport_create] 0-rpc-service: cannot create listener, initing the transport failed
> [2014-06-16 02:53:16.892457] I [glusterd.c:354:glusterd_check_gsync_present] 0-glusterd: geo-replication module not installed in the system
> [2014-06-16 02:53:17.087325] E [glusterd-store.c:1333:glusterd_restore_op_version] 0-management: wrong op-version (3) retreived
> [2014-06-16 02:53:17.087352] E [glusterd-store.c:2510:glusterd_restore] 0-management: Failed to restore op_version
> [2014-06-16 02:53:17.087365] E [xlator.c:390:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
> [2014-06-16 02:53:17.087375] E [graph.c:292:glusterfs_graph_init] 0-management: initializing translator failed
> [2014-06-16 02:53:17.087383] E [graph.c:479:glusterfs_graph_activate] 0-graph: init failed
> [2014-06-16 02:53:17.087534] W [glusterfsd.c:1002:cleanup_and_exit] (-->/usr/sbin/glusterd(main+0x5d2) [0x406802] (-->/usr/sbin/glusterd(glusterfs_volumes_init+0xb7) [0x4051b7] (-->/usr/sbin/glusterd(glusterfs_process_volfp+0x103) [0x4050c3]))) 0-: received signum (0), shutting down
>
> > Thanks,
> > Lala
> >
> > > ----- Original Message -----
> > > From: "Franco Broi" <franco.b...@iongeo.com>
> > > To: "Susant Palai" <spa...@redhat.com>
> > > Cc: "Pranith Kumar Karampuri" <pkara...@redhat.com>,
> > > gluster-users@gluster.org, "Raghavendra Gowdappa" <rgowd...@redhat.com>,
> > > kdhan...@redhat.com, vsomy...@redhat.com, nbala...@redhat.com
> > > Sent: Monday, 16 June, 2014 5:47:55 AM
> > > Subject: Re: [Gluster-users] glusterfsd process spinning
> > >
> > > Is it possible to downgrade to 3.4 from 3.5? I can't afford to spend any
> > > more time testing 3.5, and it doesn't seem to work as well as 3.4.
> > >
> > > Cheers,
> > >
> > > On Wed, 2014-06-04 at 01:51 -0400, Susant Palai wrote:
> > >> From the logs it seems files are present on data(21,22,23,24), which
> > >> are on nas6, while missing on data(17,18,19,20), which are on nas5
> > >> (interesting). There is an existing issue where directories do not
> > >> show up on the mount point if they are not present on the
> > >> first_up_subvol (the longest-living brick), and the current issue
> > >> looks similar. I will look at the client logs for more information.
> > >>
> > >> Susant.
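(On the downgrade failure above: glusterd 3.4.3 refuses to start because the
cluster op-version recorded on disk by 3.5 is newer than 3.4 understands. A
hedged way to confirm this, assuming the default working directory shown in
the log; the version mapping, 3.4 writing 2 and 3.5 writing 3, is an
assumption consistent with the error.)

    # glusterd restores the cluster op-version from its info file at startup.
    grep operating-version /var/lib/glusterd/glusterd.info
    # operating-version=3 -> written by 3.5; 3.4.3 bails out with
    # "wrong op-version (3) retreived"

Lowering that value by hand before starting 3.4 is sometimes suggested as a
workaround, but it is untested here; back up /var/lib/glusterd first.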
> > >>
> > >> ----- Original Message -----
> > >> From: "Franco Broi" <franco.b...@iongeo.com>
> > >> To: "Pranith Kumar Karampuri" <pkara...@redhat.com>
> > >> Cc: "Susant Palai" <spa...@redhat.com>, gluster-users@gluster.org,
> > >> "Raghavendra Gowdappa" <rgowd...@redhat.com>, kdhan...@redhat.com,
> > >> vsomy...@redhat.com, nbala...@redhat.com
> > >> Sent: Wednesday, 4 June, 2014 10:32:37 AM
> > >> Subject: Re: [Gluster-users] glusterfsd process spinning
> > >>
> > >> On Wed, 2014-06-04 at 10:19 +0530, Pranith Kumar Karampuri wrote:
> > >>> On 06/04/2014 08:07 AM, Susant Palai wrote:
> > >>>> Pranith, can you send the client and brick logs?
> > >>> I have the logs. But I believe for this issue of a directory not
> > >>> listing its entries, it would help more if we had the contents of
> > >>> that directory on all the bricks, plus their hash values in the
> > >>> xattrs.
> > >> Strange thing is, all the invisible files are on the one server (nas6);
> > >> the other seems OK. I did rm -Rf of /data2/franco/dir* and was left
> > >> with this one directory - there were many hundreds which were removed
> > >> successfully.
> > >>
> > >> I've attached listings and xattr dumps.
> > >>
> > >> Cheers,
> > >>
> > >> Volume Name: data2
> > >> Type: Distribute
> > >> Volume ID: d958423f-bd25-49f1-81f8-f12e4edc6823
> > >> Status: Started
> > >> Number of Bricks: 8
> > >> Transport-type: tcp
> > >> Bricks:
> > >> Brick1: nas5-10g:/data17/gvol
> > >> Brick2: nas5-10g:/data18/gvol
> > >> Brick3: nas5-10g:/data19/gvol
> > >> Brick4: nas5-10g:/data20/gvol
> > >> Brick5: nas6-10g:/data21/gvol
> > >> Brick6: nas6-10g:/data22/gvol
> > >> Brick7: nas6-10g:/data23/gvol
> > >> Brick8: nas6-10g:/data24/gvol
> > >> Options Reconfigured:
> > >> nfs.drc: on
> > >> cluster.min-free-disk: 5%
> > >> network.frame-timeout: 10800
> > >> nfs.export-volumes: on
> > >> nfs.disable: on
> > >> cluster.readdir-optimize: on
> > >>
> > >> Gluster process                              Port    Online  Pid
> > >> ------------------------------------------------------------------------------
> > >> Brick nas5-10g:/data17/gvol                  49152   Y       6553
> > >> Brick nas5-10g:/data18/gvol                  49153   Y       6564
> > >> Brick nas5-10g:/data19/gvol                  49154   Y       6575
> > >> Brick nas5-10g:/data20/gvol                  49155   Y       6586
> > >> Brick nas6-10g:/data21/gvol                  49160   Y       20608
> > >> Brick nas6-10g:/data22/gvol                  49161   Y       20613
> > >> Brick nas6-10g:/data23/gvol                  49162   Y       20614
> > >> Brick nas6-10g:/data24/gvol                  49163   Y       20621
> > >>
> > >> Task Status of Volume data2
> > >> ------------------------------------------------------------------------------
> > >> There are no active volume tasks
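(A hedged sketch of how listings and xattr dumps like those are typically
gathered, for anyone reproducing this. The loop and paths are assumptions
based on the brick layout above, and trusted.* xattrs are only visible to
root.)

    # Run as root on each brick server. '-e hex' also dumps
    # trusted.glusterfs.dht, the DHT layout range for the directory.
    for d in /data*/gvol/franco/dir1226/dir25; do
        ls -la "$d"
        getfattr -d -m . -e hex "$d"
    done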
> > >>
> > >>> Pranith
> > >>>> Thanks,
> > >>>> Susant~
> > >>>>
> > >>>> ----- Original Message -----
> > >>>> From: "Pranith Kumar Karampuri" <pkara...@redhat.com>
> > >>>> To: "Franco Broi" <franco.b...@iongeo.com>
> > >>>> Cc: gluster-users@gluster.org, "Raghavendra Gowdappa"
> > >>>> <rgowd...@redhat.com>, spa...@redhat.com, kdhan...@redhat.com,
> > >>>> vsomy...@redhat.com, nbala...@redhat.com
> > >>>> Sent: Wednesday, 4 June, 2014 7:53:41 AM
> > >>>> Subject: Re: [Gluster-users] glusterfsd process spinning
> > >>>>
> > >>>> Hi Franco,
> > >>>>       CC Devs who work on DHT to comment.
> > >>>>
> > >>>> Pranith
> > >>>>
> > >>>> On 06/04/2014 07:39 AM, Franco Broi wrote:
> > >>>>> On Wed, 2014-06-04 at 07:28 +0530, Pranith Kumar Karampuri wrote:
> > >>>>>> Franco,
> > >>>>>> Thanks for providing the logs. I just copied over the logs to my
> > >>>>>> machine. Most of the logs I see are related to "No such File or
> > >>>>>> Directory". I wonder what led to this. Do you have any idea?
> > >>>>> No, but I'm just looking at my 3.5 Gluster volume and it has a
> > >>>>> directory that looks empty but can't be deleted. When I look at the
> > >>>>> directories on the servers there are definitely files in there.
> > >>>>>
> > >>>>> [franco@charlie1 franco]$ rmdir /data2/franco/dir1226/dir25
> > >>>>> rmdir: failed to remove `/data2/franco/dir1226/dir25': Directory not empty
> > >>>>> [franco@charlie1 franco]$ ls -la /data2/franco/dir1226/dir25
> > >>>>> total 8
> > >>>>> drwxrwxr-x 2 franco support 60 May 21 03:58 .
> > >>>>> drwxrwxr-x 3 franco support 24 Jun  4 09:37 ..
> > >>>>>
> > >>>>> [root@nas6 ~]# ls -la /data*/gvol/franco/dir1226/dir25
> > >>>>> /data21/gvol/franco/dir1226/dir25:
> > >>>>> total 2081
> > >>>>> drwxrwxr-x 13 1348 200 13 May 21 03:58 .
> > >>>>> drwxrwxr-x  3 1348 200  3 May 21 03:58 ..
> > >>>>> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13017
> > >>>>> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13018
> > >>>>> drwxrwxr-x  2 1348 200  3 May 16 12:05 dir13020
> > >>>>> drwxrwxr-x  2 1348 200  3 May 16 12:05 dir13021
> > >>>>> drwxrwxr-x  2 1348 200  3 May 16 12:05 dir13022
> > >>>>> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13024
> > >>>>> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13027
> > >>>>> drwxrwxr-x  2 1348 200  3 May 16 12:05 dir13028
> > >>>>> drwxrwxr-x  2 1348 200  2 May 16 12:06 dir13029
> > >>>>> drwxrwxr-x  2 1348 200  2 May 16 12:06 dir13031
> > >>>>> drwxrwxr-x  2 1348 200  3 May 16 12:06 dir13032
> > >>>>>
> > >>>>> /data22/gvol/franco/dir1226/dir25:
> > >>>>> total 2084
> > >>>>> drwxrwxr-x 13 1348 200 13 May 21 03:58 .
> > >>>>> drwxrwxr-x  3 1348 200  3 May 21 03:58 ..
> > >>>>> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13017
> > >>>>> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13018
> > >>>>> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13020
> > >>>>> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13021
> > >>>>> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13022
> > >>>>> .....
> > >>>>>
> > >>>>> Maybe Gluster is losing track of the files??
> > >>>>>
> > >>>>>> Pranith
> > >>>>>>
> > >>>>>> On 06/02/2014 02:48 PM, Franco Broi wrote:
> > >>>>>>> Hi Pranith
> > >>>>>>>
> > >>>>>>> Here's a listing of the brick logs; looks very odd, especially
> > >>>>>>> the size of the log for data10.
> > >>>>>>>
> > >>>>>>> [root@nas3 bricks]# ls -ltrh
> > >>>>>>> total 2.6G
> > >>>>>>> -rw------- 1 root root 381K May 13 12:15 data12-gvol.log-20140511
> > >>>>>>> -rw------- 1 root root 430M May 13 12:15 data11-gvol.log-20140511
> > >>>>>>> -rw------- 1 root root 328K May 13 12:15 data9-gvol.log-20140511
> > >>>>>>> -rw------- 1 root root 2.0M May 13 12:15 data10-gvol.log-20140511
> > >>>>>>> -rw------- 1 root root    0 May 18 03:43 data10-gvol.log-20140525
> > >>>>>>> -rw------- 1 root root    0 May 18 03:43 data11-gvol.log-20140525
> > >>>>>>> -rw------- 1 root root    0 May 18 03:43 data12-gvol.log-20140525
> > >>>>>>> -rw------- 1 root root    0 May 18 03:43 data9-gvol.log-20140525
> > >>>>>>> -rw------- 1 root root    0 May 25 03:19 data10-gvol.log-20140601
> > >>>>>>> -rw------- 1 root root    0 May 25 03:19 data11-gvol.log-20140601
> > >>>>>>> -rw------- 1 root root    0 May 25 03:19 data9-gvol.log-20140601
> > >>>>>>> -rw------- 1 root root  98M May 26 03:04 data12-gvol.log-20140518
> > >>>>>>> -rw------- 1 root root    0 Jun  1 03:37 data10-gvol.log
> > >>>>>>> -rw------- 1 root root    0 Jun  1 03:37 data11-gvol.log
> > >>>>>>> -rw------- 1 root root    0 Jun  1 03:37 data12-gvol.log
> > >>>>>>> -rw------- 1 root root    0 Jun  1 03:37 data9-gvol.log
> > >>>>>>> -rw------- 1 root root 1.8G Jun  2 16:35 data10-gvol.log-20140518
> > >>>>>>> -rw------- 1 root root 279M Jun  2 16:35 data9-gvol.log-20140518
> > >>>>>>> -rw------- 1 root root 328K Jun  2 16:35 data12-gvol.log-20140601
> > >>>>>>> -rw------- 1 root root 8.3M Jun  2 16:35 data11-gvol.log-20140518
> > >>>>>>>
> > >>>>>>> Too big to post everything.
> > >>>>>>>
> > >>>>>>> Cheers,
> > >>>>>>>
> > >>>>>>> On Sun, 2014-06-01 at 22:00 -0400, Pranith Kumar Karampuri wrote:
> > >>>>>>>> ----- Original Message -----
> > >>>>>>>>> From: "Pranith Kumar Karampuri" <pkara...@redhat.com>
> > >>>>>>>>> To: "Franco Broi" <franco.b...@iongeo.com>
> > >>>>>>>>> Cc: gluster-users@gluster.org
> > >>>>>>>>> Sent: Monday, June 2, 2014 7:01:34 AM
> > >>>>>>>>> Subject: Re: [Gluster-users] glusterfsd process spinning
> > >>>>>>>>>
> > >>>>>>>>> ----- Original Message -----
> > >>>>>>>>>> From: "Franco Broi" <franco.b...@iongeo.com>
> > >>>>>>>>>> To: "Pranith Kumar Karampuri" <pkara...@redhat.com>
> > >>>>>>>>>> Cc: gluster-users@gluster.org
> > >>>>>>>>>> Sent: Sunday, June 1, 2014 10:53:51 AM
> > >>>>>>>>>> Subject: Re: [Gluster-users] glusterfsd process spinning
> > >>>>>>>>>>
> > >>>>>>>>>> The volume is almost completely idle now and the CPU for the
> > >>>>>>>>>> brick process has returned to normal. I've included the
> > >>>>>>>>>> profile, and I think it shows the latency for the bad brick
> > >>>>>>>>>> (data12) is unusually high, probably indicating the filesystem
> > >>>>>>>>>> is at fault after all??
> > >>>>>>>>> I am not sure we can believe the outputs now that you say the
> > >>>>>>>>> brick returned to normal. Next time it is acting up, do the
> > >>>>>>>>> same procedure and post the result.
> > >>>>>>>> On second thought, maybe it's not a bad idea to inspect the log
> > >>>>>>>> files of the bricks on nas3. Could you post them?
> > >>>>>>>>
> > >>>>>>>> Pranith
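(Rather than posting a 1.8G log, one rough way to condense a brick log to its
most frequent message types; the awk field split is an assumption based on
the standard gluster log layout of "[timestamp] LEVEL [file:line:function]
message".)

    # Count lines per severity and source location to see what is
    # flooding data10's log.
    awk -F'] ' '{print $2}' data10-gvol.log-20140518 | sort | uniq -c | sort -rn | head -20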
> > >>>>>>>>> Pranith
> > >>>>>>>>>
> > >>>>>>>>>> On Sun, 2014-06-01 at 01:01 -0400, Pranith Kumar Karampuri wrote:
> > >>>>>>>>>>> Franco,
> > >>>>>>>>>>> Could you do the following to get more information:
> > >>>>>>>>>>>
> > >>>>>>>>>>> "gluster volume profile <volname> start"
> > >>>>>>>>>>>
> > >>>>>>>>>>> Wait for some time; this will start gathering what operations
> > >>>>>>>>>>> are coming to all the bricks.
> > >>>>>>>>>>> Now execute "gluster volume profile <volname> info" >
> > >>>>>>>>>>> /file/you/should/reply/to/this/mail/with
> > >>>>>>>>>>>
> > >>>>>>>>>>> Then execute:
> > >>>>>>>>>>> gluster volume profile <volname> stop
> > >>>>>>>>>>>
> > >>>>>>>>>>> Let's see if this throws any light on the problem at hand.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Pranith
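(The same sequence, consolidated into a sketch; the volume name comes from
earlier in the thread, and the wait and output path are illustrative.)

    gluster volume profile data2 start
    sleep 600    # let the workload run so per-brick stats accumulate
    gluster volume profile data2 info > /tmp/data2-profile.txt
    gluster volume profile data2 stop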
> > >>>>>>>>>>> ----- Original Message -----
> > >>>>>>>>>>>> From: "Franco Broi" <franco.b...@iongeo.com>
> > >>>>>>>>>>>> To: gluster-users@gluster.org
> > >>>>>>>>>>>> Sent: Sunday, June 1, 2014 9:02:48 AM
> > >>>>>>>>>>>> Subject: [Gluster-users] glusterfsd process spinning
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Hi
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I've been suffering from continual problems with my gluster
> > >>>>>>>>>>>> filesystem slowing down due to what I thought was congestion
> > >>>>>>>>>>>> on a single brick, caused by the underlying filesystem
> > >>>>>>>>>>>> running slow, but I've just noticed that the glusterfsd
> > >>>>>>>>>>>> process for that particular brick is running at 100%+, even
> > >>>>>>>>>>>> when the filesystem is almost idle.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I've done a couple of straces, one of the brick and another
> > >>>>>>>>>>>> on the same server; does the high number of futex errors
> > >>>>>>>>>>>> give any clues as to what might be wrong?
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> % time     seconds  usecs/call     calls    errors syscall
> > >>>>>>>>>>>> ------ ----------- ----------- --------- --------- ----------------
> > >>>>>>>>>>>>  45.58    0.027554           0    191665     20772 futex
> > >>>>>>>>>>>>  28.26    0.017084           0    137133           readv
> > >>>>>>>>>>>>  26.04    0.015743           0     66259           epoll_wait
> > >>>>>>>>>>>>   0.13    0.000077           3        23           writev
> > >>>>>>>>>>>>   0.00    0.000000           0         1           epoll_ctl
> > >>>>>>>>>>>> ------ ----------- ----------- --------- --------- ----------------
> > >>>>>>>>>>>> 100.00    0.060458                396081     20772 total
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> % time     seconds  usecs/call     calls    errors syscall
> > >>>>>>>>>>>> ------ ----------- ----------- --------- --------- ----------------
> > >>>>>>>>>>>>  99.25    0.334020         133      2516           epoll_wait
> > >>>>>>>>>>>>   0.40    0.001347           0      4090        26 futex
> > >>>>>>>>>>>>   0.35    0.001192           0      5064           readv
> > >>>>>>>>>>>>   0.00    0.000000           0        20           writev
> > >>>>>>>>>>>> ------ ----------- ----------- --------- --------- ----------------
> > >>>>>>>>>>>> 100.00    0.336559                 11690        26 total
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Cheers,
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> _______________________________________________
> > >>>>>>>>>>>> Gluster-users mailing list
> > >>>>>>>>>>>> Gluster-users@gluster.org
> > >>>>>>>>>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users

_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users