On Wed, 2014-06-18 at 13:09 +0530, Lalatendu Mohanty wrote:
> On 06/17/2014 02:25 PM, Susant Palai wrote:
> > Hi Franco:
> > The following patches address the ENOTEMPTY issue.
> > 
> > 1. http://review.gluster.org/#/c/7733/
> > 2. http://review.gluster.org/#/c/7599/
> > 
> > I think the above patches will be available in 3.5.1, which will be a minor
> > upgrade. (Need ack from Niels de Vos.)
> > 
> > Hi Lala,
> > Can you provide the steps to downgrade to 3.4 from 3.5?
> > 
> > Thanks :)
> 
> If you are using an RPM-based distribution, the "yum downgrade" command
> should work, provided yum has access to both the 3.5 and 3.4 repos. That
> said, I have not tested the downgrade scenario from 3.5 to 3.4 myself. I
> would suggest stopping your volume and killing the gluster processes while
> downgrading.
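[Editor's note] Lala's advice (stop the volume, kill the gluster processes, then yum downgrade) plus the op-version failure Franco hits in the log below can be sketched roughly as follows. This is an untested sketch: the package names, service commands, and the `operating-version=2` value assumed for 3.4 are my assumptions, not anything confirmed in this thread.

```shell
# Hypothetical downgrade sketch for an RPM-based system -- untested; verify
# package names, service commands, and the op-version value before running.
downgrade_gluster() {
    gluster volume stop data2              # stop the volume first
    service glusterd stop                  # then the management daemon
    pkill glusterfs; pkill glusterfsd      # kill any leftover brick/client processes
    # 3.5 raises the cluster op-version; a 3.4 glusterd refuses to start with
    # it ("wrong op-version (3) retreived"), so lower it in glusterd's working
    # directory before starting the old version (assumed value: 2 for 3.4).
    sed -i 's/^operating-version=.*/operating-version=2/' /var/lib/glusterd/glusterd.info
    yum downgrade glusterfs glusterfs-server glusterfs-fuse
    service glusterd start
}

# The op-version edit can be rehearsed safely on a sample file first
# (sed without -i prints the modified file to stdout, leaving the original intact):
printf 'UUID=example\noperating-version=3\n' > /tmp/glusterd.info.sample
sed 's/^operating-version=.*/operating-version=2/' /tmp/glusterd.info.sample
```

Lowering the op-version is risky if any 3.5-only volume options were set, so a backup of /var/lib/glusterd beforehand would be prudent.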
I did try installing 3.4 but the volume wouldn't start:

[2014-06-16 02:53:16.886995] I [glusterfsd.c:1910:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.4.3 (/usr/sbin/glusterd --pid-file=/var/run/glusterd.pid)
[2014-06-16 02:53:16.889605] I [glusterd.c:961:init] 0-management: Using /var/lib/glusterd as working directory
[2014-06-16 02:53:16.891580] I [socket.c:3480:socket_init] 0-socket.management: SSL support is NOT enabled
[2014-06-16 02:53:16.891600] I [socket.c:3495:socket_init] 0-socket.management: using system polling thread
[2014-06-16 02:53:16.891675] E [rpc-transport.c:253:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.4.3/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
[2014-06-16 02:53:16.891691] W [rpc-transport.c:257:rpc_transport_load] 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine
[2014-06-16 02:53:16.891700] W [rpcsvc.c:1389:rpcsvc_transport_create] 0-rpc-service: cannot create listener, initing the transport failed
[2014-06-16 02:53:16.892457] I [glusterd.c:354:glusterd_check_gsync_present] 0-glusterd: geo-replication module not installed in the system
[2014-06-16 02:53:17.087325] E [glusterd-store.c:1333:glusterd_restore_op_version] 0-management: wrong op-version (3) retreived
[2014-06-16 02:53:17.087352] E [glusterd-store.c:2510:glusterd_restore] 0-management: Failed to restore op_version
[2014-06-16 02:53:17.087365] E [xlator.c:390:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2014-06-16 02:53:17.087375] E [graph.c:292:glusterfs_graph_init] 0-management: initializing translator failed
[2014-06-16 02:53:17.087383] E [graph.c:479:glusterfs_graph_activate] 0-graph: init failed
[2014-06-16 02:53:17.087534] W [glusterfsd.c:1002:cleanup_and_exit] (-->/usr/sbin/glusterd(main+0x5d2) [0x406802] (-->/usr/sbin/glusterd(glusterfs_volumes_init+0xb7) [0x4051b7]
(-->/usr/sbin/glusterd(glusterfs_process_volfp+0x103) [0x4050c3]))) 0-: received signum (0), shutting down

> Thanks,
> Lala
> 
> > ----- Original Message -----
> > From: "Franco Broi" <franco.b...@iongeo.com>
> > To: "Susant Palai" <spa...@redhat.com>
> > Cc: "Pranith Kumar Karampuri" <pkara...@redhat.com>, gluster-users@gluster.org, "Raghavendra Gowdappa" <rgowd...@redhat.com>, kdhan...@redhat.com, vsomy...@redhat.com, nbala...@redhat.com
> > Sent: Monday, 16 June, 2014 5:47:55 AM
> > Subject: Re: [Gluster-users] glusterfsd process spinning
> > 
> > Is it possible to downgrade to 3.4 from 3.5? I can't afford to spend any
> > more time testing 3.5, and it doesn't seem to work as well as 3.4.
> > 
> > Cheers,
> > 
> > On Wed, 2014-06-04 at 01:51 -0400, Susant Palai wrote:
> >> From the logs it seems files are present on data(21,22,23,24), which are
> >> on nas6, while missing on data(17,18,19,20), which are on nas5
> >> (interesting). There is an existing issue where directories do not show
> >> up on the mount point if they are not present on the first_up_subvol
> >> (longest-living brick), and the current issue looks similar. Will look
> >> at the client logs for more information.
> >> 
> >> Susant.
> >> 
> >> ----- Original Message -----
> >> From: "Franco Broi" <franco.b...@iongeo.com>
> >> To: "Pranith Kumar Karampuri" <pkara...@redhat.com>
> >> Cc: "Susant Palai" <spa...@redhat.com>, gluster-users@gluster.org, "Raghavendra Gowdappa" <rgowd...@redhat.com>, kdhan...@redhat.com, vsomy...@redhat.com, nbala...@redhat.com
> >> Sent: Wednesday, 4 June, 2014 10:32:37 AM
> >> Subject: Re: [Gluster-users] glusterfsd process spinning
> >> 
> >> On Wed, 2014-06-04 at 10:19 +0530, Pranith Kumar Karampuri wrote:
> >>> On 06/04/2014 08:07 AM, Susant Palai wrote:
> >>>> Pranith, can you send the client and brick logs?
> >>> I have the logs. But I believe, for this issue of the directory not
> >>> listing entries, it would help more if we had the contents of that
> >>> directory on all the bricks, plus their hash values in the xattrs.
> >> Strange thing is, all the invisible files are on the one server (nas6);
> >> the other seems ok. I did rm -Rf of /data2/franco/dir* and was left with
> >> this one directory - there were many hundreds which were removed
> >> successfully.
> >> 
> >> I've attached listings and xattr dumps.
> >> 
> >> Cheers,
> >> 
> >> Volume Name: data2
> >> Type: Distribute
> >> Volume ID: d958423f-bd25-49f1-81f8-f12e4edc6823
> >> Status: Started
> >> Number of Bricks: 8
> >> Transport-type: tcp
> >> Bricks:
> >> Brick1: nas5-10g:/data17/gvol
> >> Brick2: nas5-10g:/data18/gvol
> >> Brick3: nas5-10g:/data19/gvol
> >> Brick4: nas5-10g:/data20/gvol
> >> Brick5: nas6-10g:/data21/gvol
> >> Brick6: nas6-10g:/data22/gvol
> >> Brick7: nas6-10g:/data23/gvol
> >> Brick8: nas6-10g:/data24/gvol
> >> Options Reconfigured:
> >> nfs.drc: on
> >> cluster.min-free-disk: 5%
> >> network.frame-timeout: 10800
> >> nfs.export-volumes: on
> >> nfs.disable: on
> >> cluster.readdir-optimize: on
> >> 
> >> Gluster process                                 Port    Online  Pid
> >> ------------------------------------------------------------------------------
> >> Brick nas5-10g:/data17/gvol                     49152   Y       6553
> >> Brick nas5-10g:/data18/gvol                     49153   Y       6564
> >> Brick nas5-10g:/data19/gvol                     49154   Y       6575
> >> Brick nas5-10g:/data20/gvol                     49155   Y       6586
> >> Brick nas6-10g:/data21/gvol                     49160   Y       20608
> >> Brick nas6-10g:/data22/gvol                     49161   Y       20613
> >> Brick nas6-10g:/data23/gvol                     49162   Y       20614
> >> Brick nas6-10g:/data24/gvol                     49163   Y       20621
> >> 
> >> Task Status of Volume data2
> >> ------------------------------------------------------------------------------
> >> There are no active volume tasks
> >> 
> >>> Pranith
> >>>> Thanks,
> >>>> Susant~
> >>>> 
> >>>> ----- Original Message -----
> >>>> From: "Pranith Kumar Karampuri" <pkara...@redhat.com>
> >>>> To: "Franco Broi" <franco.b...@iongeo.com>
> >>>> Cc: gluster-users@gluster.org, "Raghavendra Gowdappa" <rgowd...@redhat.com>, spa...@redhat.com, kdhan...@redhat.com, vsomy...@redhat.com, nbala...@redhat.com
> >>>> Sent: Wednesday, 4 June, 2014 7:53:41 AM
> >>>> Subject: Re: [Gluster-users] glusterfsd process spinning
> >>>> 
> >>>> hi Franco,
> >>>> CC Devs who work on DHT to comment.
> >>>> 
> >>>> Pranith
> >>>> 
> >>>> On 06/04/2014 07:39 AM, Franco Broi wrote:
> >>>>> On Wed, 2014-06-04 at 07:28 +0530, Pranith Kumar Karampuri wrote:
> >>>>>> Franco,
> >>>>>> Thanks for providing the logs. I just copied over the logs to my
> >>>>>> machine. Most of the logs I see are related to "No such File or
> >>>>>> Directory". I wonder what led to this. Do you have any idea?
> >>>>> No, but I'm just looking at my 3.5 Gluster volume and it has a directory
> >>>>> that looks empty but can't be deleted. When I look at the directories on
> >>>>> the servers there are definitely files in there.
> >>>>> 
> >>>>> [franco@charlie1 franco]$ rmdir /data2/franco/dir1226/dir25
> >>>>> rmdir: failed to remove `/data2/franco/dir1226/dir25': Directory not empty
> >>>>> [franco@charlie1 franco]$ ls -la /data2/franco/dir1226/dir25
> >>>>> total 8
> >>>>> drwxrwxr-x 2 franco support 60 May 21 03:58 .
> >>>>> drwxrwxr-x 3 franco support 24 Jun  4 09:37 ..
> >>>>> 
> >>>>> [root@nas6 ~]# ls -la /data*/gvol/franco/dir1226/dir25
> >>>>> /data21/gvol/franco/dir1226/dir25:
> >>>>> total 2081
> >>>>> drwxrwxr-x 13 1348 200 13 May 21 03:58 .
> >>>>> drwxrwxr-x  3 1348 200  3 May 21 03:58 ..
> >>>>> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13017
> >>>>> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13018
> >>>>> drwxrwxr-x  2 1348 200  3 May 16 12:05 dir13020
> >>>>> drwxrwxr-x  2 1348 200  3 May 16 12:05 dir13021
> >>>>> drwxrwxr-x  2 1348 200  3 May 16 12:05 dir13022
> >>>>> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13024
> >>>>> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13027
> >>>>> drwxrwxr-x  2 1348 200  3 May 16 12:05 dir13028
> >>>>> drwxrwxr-x  2 1348 200  2 May 16 12:06 dir13029
> >>>>> drwxrwxr-x  2 1348 200  2 May 16 12:06 dir13031
> >>>>> drwxrwxr-x  2 1348 200  3 May 16 12:06 dir13032
> >>>>> 
> >>>>> /data22/gvol/franco/dir1226/dir25:
> >>>>> total 2084
> >>>>> drwxrwxr-x 13 1348 200 13 May 21 03:58 .
> >>>>> drwxrwxr-x  3 1348 200  3 May 21 03:58 ..
> >>>>> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13017
> >>>>> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13018
> >>>>> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13020
> >>>>> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13021
> >>>>> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13022
> >>>>> .....
> >>>>> 
> >>>>> Maybe Gluster is losing track of the files??
> >>>>> 
> >>>>>> Pranith
> >>>>>> 
> >>>>>> On 06/02/2014 02:48 PM, Franco Broi wrote:
> >>>>>>> Hi Pranith
> >>>>>>> 
> >>>>>>> Here's a listing of the brick logs, looks very odd, especially the
> >>>>>>> size of the log for data10.
> >>>>>>> 
> >>>>>>> [root@nas3 bricks]# ls -ltrh
> >>>>>>> total 2.6G
> >>>>>>> -rw------- 1 root root 381K May 13 12:15 data12-gvol.log-20140511
> >>>>>>> -rw------- 1 root root 430M May 13 12:15 data11-gvol.log-20140511
> >>>>>>> -rw------- 1 root root 328K May 13 12:15 data9-gvol.log-20140511
> >>>>>>> -rw------- 1 root root 2.0M May 13 12:15 data10-gvol.log-20140511
> >>>>>>> -rw------- 1 root root    0 May 18 03:43 data10-gvol.log-20140525
> >>>>>>> -rw------- 1 root root    0 May 18 03:43 data11-gvol.log-20140525
> >>>>>>> -rw------- 1 root root    0 May 18 03:43 data12-gvol.log-20140525
> >>>>>>> -rw------- 1 root root    0 May 18 03:43 data9-gvol.log-20140525
> >>>>>>> -rw------- 1 root root    0 May 25 03:19 data10-gvol.log-20140601
> >>>>>>> -rw------- 1 root root    0 May 25 03:19 data11-gvol.log-20140601
> >>>>>>> -rw------- 1 root root    0 May 25 03:19 data9-gvol.log-20140601
> >>>>>>> -rw------- 1 root root  98M May 26 03:04 data12-gvol.log-20140518
> >>>>>>> -rw------- 1 root root    0 Jun  1 03:37 data10-gvol.log
> >>>>>>> -rw------- 1 root root    0 Jun  1 03:37 data11-gvol.log
> >>>>>>> -rw------- 1 root root    0 Jun  1 03:37 data12-gvol.log
> >>>>>>> -rw------- 1 root root    0 Jun  1 03:37 data9-gvol.log
> >>>>>>> -rw------- 1 root root 1.8G Jun  2 16:35 data10-gvol.log-20140518
> >>>>>>> -rw------- 1 root root 279M Jun  2 16:35 data9-gvol.log-20140518
> >>>>>>> -rw------- 1 root root 328K Jun  2 16:35 data12-gvol.log-20140601
> >>>>>>> -rw------- 1 root root 8.3M Jun  2 16:35 data11-gvol.log-20140518
> >>>>>>> 
> >>>>>>> Too big to post everything.
> >>>>>>> 
> >>>>>>> Cheers,
> >>>>>>> 
> >>>>>>> On Sun, 2014-06-01 at 22:00 -0400, Pranith Kumar Karampuri wrote:
> >>>>>>>> ----- Original Message -----
> >>>>>>>>> From: "Pranith Kumar Karampuri" <pkara...@redhat.com>
> >>>>>>>>> To: "Franco Broi" <franco.b...@iongeo.com>
> >>>>>>>>> Cc: gluster-users@gluster.org
> >>>>>>>>> Sent: Monday, June 2, 2014 7:01:34 AM
> >>>>>>>>> Subject: Re: [Gluster-users] glusterfsd process spinning
> >>>>>>>>> 
> >>>>>>>>> ----- Original Message -----
> >>>>>>>>>> From: "Franco Broi" <franco.b...@iongeo.com>
> >>>>>>>>>> To: "Pranith Kumar Karampuri" <pkara...@redhat.com>
> >>>>>>>>>> Cc: gluster-users@gluster.org
> >>>>>>>>>> Sent: Sunday, June 1, 2014 10:53:51 AM
> >>>>>>>>>> Subject: Re: [Gluster-users] glusterfsd process spinning
> >>>>>>>>>> 
> >>>>>>>>>> The volume is almost completely idle now and the CPU for the brick
> >>>>>>>>>> process has returned to normal. I've included the profile, and I
> >>>>>>>>>> think it shows the latency for the bad brick (data12) is unusually
> >>>>>>>>>> high, probably indicating the filesystem is at fault after all??
> >>>>>>>>> I am not sure we can believe the outputs now that you say the brick
> >>>>>>>>> returned to normal. Next time it is acting up, do the same procedure
> >>>>>>>>> and post the result.
> >>>>>>>> On second thought, maybe it's not a bad idea to inspect the log files
> >>>>>>>> of the bricks on nas3. Could you post them?
> >>>>>>>> 
> >>>>>>>> Pranith
> >>>>>>>> 
> >>>>>>>>> Pranith
> >>>>>>>>>> On Sun, 2014-06-01 at 01:01 -0400, Pranith Kumar Karampuri wrote:
> >>>>>>>>>>> Franco,
> >>>>>>>>>>> Could you do the following to get more information:
> >>>>>>>>>>> 
> >>>>>>>>>>> gluster volume profile <volname> start
> >>>>>>>>>>> 
> >>>>>>>>>>> Wait for some time; this will start gathering what operations are
> >>>>>>>>>>> coming to all the bricks. Now execute:
> >>>>>>>>>>> 
> >>>>>>>>>>> gluster volume profile <volname> info > /file/you/should/reply/to/this/mail/with
> >>>>>>>>>>> 
> >>>>>>>>>>> Then execute:
> >>>>>>>>>>> 
> >>>>>>>>>>> gluster volume profile <volname> stop
> >>>>>>>>>>> 
> >>>>>>>>>>> Let's see if this throws any light on the problem at hand.
> >>>>>>>>>>> 
> >>>>>>>>>>> Pranith
> >>>>>>>>>>> ----- Original Message -----
> >>>>>>>>>>>> From: "Franco Broi" <franco.b...@iongeo.com>
> >>>>>>>>>>>> To: gluster-users@gluster.org
> >>>>>>>>>>>> Sent: Sunday, June 1, 2014 9:02:48 AM
> >>>>>>>>>>>> Subject: [Gluster-users] glusterfsd process spinning
> >>>>>>>>>>>> 
> >>>>>>>>>>>> Hi
> >>>>>>>>>>>> 
> >>>>>>>>>>>> I've been suffering from continual problems with my gluster
> >>>>>>>>>>>> filesystem slowing down due to what I thought was congestion on a
> >>>>>>>>>>>> single brick, caused by the underlying filesystem running slow,
> >>>>>>>>>>>> but I've just noticed that the glusterfsd process for that
> >>>>>>>>>>>> particular brick is running at 100%+, even when the filesystem is
> >>>>>>>>>>>> almost idle.
> >>>>>>>>>>>> 
> >>>>>>>>>>>> I've done a couple of straces, one of the brick and another on
> >>>>>>>>>>>> the same server; does the high number of futex errors give any
> >>>>>>>>>>>> clues as to what might be wrong?
> >>>>>>>>>>>> 
> >>>>>>>>>>>> % time     seconds  usecs/call     calls    errors syscall
> >>>>>>>>>>>> ------ ----------- ----------- --------- --------- ----------------
> >>>>>>>>>>>>  45.58    0.027554           0    191665     20772 futex
> >>>>>>>>>>>>  28.26    0.017084           0    137133           readv
> >>>>>>>>>>>>  26.04    0.015743           0     66259           epoll_wait
> >>>>>>>>>>>>   0.13    0.000077           3        23           writev
> >>>>>>>>>>>>   0.00    0.000000           0         1           epoll_ctl
> >>>>>>>>>>>> ------ ----------- ----------- --------- --------- ----------------
> >>>>>>>>>>>> 100.00    0.060458                396081  (sic)    total
> >>>>>>>>>>>> 
> >>>>>>>>>>>> % time     seconds  usecs/call     calls    errors syscall
> >>>>>>>>>>>> ------ ----------- ----------- --------- --------- ----------------
> >>>>>>>>>>>>  99.25    0.334020         133      2516           epoll_wait
> >>>>>>>>>>>>   0.40    0.001347           0      4090        26 futex
> >>>>>>>>>>>>   0.35    0.001192           0      5064           readv
> >>>>>>>>>>>>   0.00    0.000000           0        20           writev
> >>>>>>>>>>>> ------ ----------- ----------- --------- --------- ----------------
> >>>>>>>>>>>> 100.00    0.336559                 11690        26 total
> >>>>>>>>>>>> 
> >>>>>>>>>>>> Cheers,
> >>>>>>>>>>>> 
> >>>>>>>>>>>> _______________________________________________
> >>>>>>>>>>>> Gluster-users mailing list
> >>>>>>>>>>>> Gluster-users@gluster.org
> >>>>>>>>>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
> >>>>>>>>>>>> 

_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users