Hey, sorry. Didn't notice you had already uploaded the logs. Kaushal is looking at the issue now.
----- Original Message -----
From: "Franco Broi" <franco.b...@iongeo.com>
To: "Susant Palai" <spa...@redhat.com>
Cc: "Lalatendu Mohanty" <lmoha...@redhat.com>, "Niels de Vos" <nde...@redhat.com>, "Pranith Kumar Karampuri" <pkara...@redhat.com>, gluster-users@gluster.org, "Raghavendra Gowdappa" <rgowd...@redhat.com>, kdhan...@redhat.com, vsomy...@redhat.com, nbala...@redhat.com
Sent: Wednesday, 18 June, 2014 1:51:50 PM
Subject: Re: [Gluster-users] glusterfsd process spinning

On Wed, 2014-06-18 at 04:12 -0400, Susant Palai wrote:
> Can you figure out the failure from the log and update here?

Not sure what you mean? I figured it failed because I'd been running a
newer version and you can't go back. After 3.4 wouldn't start, I
reinstalled 3.5 and it started working again.

> ----- Original Message -----
> From: "Franco Broi" <franco.b...@iongeo.com>
> To: "Lalatendu Mohanty" <lmoha...@redhat.com>
> Cc: "Susant Palai" <spa...@redhat.com>, "Niels de Vos" <nde...@redhat.com>,
> "Pranith Kumar Karampuri" <pkara...@redhat.com>, gluster-users@gluster.org,
> "Raghavendra Gowdappa" <rgowd...@redhat.com>, kdhan...@redhat.com,
> vsomy...@redhat.com, nbala...@redhat.com
> Sent: Wednesday, 18 June, 2014 1:24:24 PM
> Subject: Re: [Gluster-users] glusterfsd process spinning
>
> On Wed, 2014-06-18 at 13:09 +0530, Lalatendu Mohanty wrote:
> > On 06/17/2014 02:25 PM, Susant Palai wrote:
> > > Hi Franco:
> > > The following patches address the ENOTEMPTY issue:
> > >
> > > 1. http://review.gluster.org/#/c/7733/
> > > 2. http://review.gluster.org/#/c/7599/
> > >
> > > I think the above patches will be available in 3.5.1, which will be a
> > > minor upgrade. (Need ack from Niels de Vos.)
> > >
> > > Hi Lala,
> > > Can you provide the steps to downgrade to 3.4 from 3.5?
> > >
> > > Thanks :)
> >
> > If you are using an RPM-based distribution, the "yum downgrade" command
> > should work, provided yum has access to both the 3.5 and 3.4 repos. I
> > have not specifically tested the downgrade scenario from 3.5 to 3.4. I
> > would suggest that you stop your volume and kill the gluster processes
> > while downgrading.
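(For reference, a minimal sketch of that downgrade sequence on an RPM-based
system. It is untested, as Lala notes; the package glob and the service
commands are assumptions for a 2014-era install.)

    # Stop the volume and all gluster processes first, and back up
    # /var/lib/glusterd in case the downgrade needs to be undone.
    gluster volume stop data2
    service glusterd stop
    pkill glusterfs; pkill glusterfsd

    # Downgrade the packages; yum needs the 3.4 repo configured too.
    yum downgrade 'glusterfs*'

    service glusterd start
    gluster volume start data2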
>
> I did try installing 3.4 but the volume wouldn't start:
>
> [2014-06-16 02:53:16.886995] I [glusterfsd.c:1910:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.4.3 (/usr/sbin/glusterd --pid-file=/var/run/glusterd.pid)
> [2014-06-16 02:53:16.889605] I [glusterd.c:961:init] 0-management: Using /var/lib/glusterd as working directory
> [2014-06-16 02:53:16.891580] I [socket.c:3480:socket_init] 0-socket.management: SSL support is NOT enabled
> [2014-06-16 02:53:16.891600] I [socket.c:3495:socket_init] 0-socket.management: using system polling thread
> [2014-06-16 02:53:16.891675] E [rpc-transport.c:253:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.4.3/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
> [2014-06-16 02:53:16.891691] W [rpc-transport.c:257:rpc_transport_load] 0-rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine
> [2014-06-16 02:53:16.891700] W [rpcsvc.c:1389:rpcsvc_transport_create] 0-rpc-service: cannot create listener, initing the transport failed
> [2014-06-16 02:53:16.892457] I [glusterd.c:354:glusterd_check_gsync_present] 0-glusterd: geo-replication module not installed in the system
> [2014-06-16 02:53:17.087325] E [glusterd-store.c:1333:glusterd_restore_op_version] 0-management: wrong op-version (3) retreived
> [2014-06-16 02:53:17.087352] E [glusterd-store.c:2510:glusterd_restore] 0-management: Failed to restore op_version
> [2014-06-16 02:53:17.087365] E [xlator.c:390:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
> [2014-06-16 02:53:17.087375] E [graph.c:292:glusterfs_graph_init] 0-management: initializing translator failed
> [2014-06-16 02:53:17.087383] E [graph.c:479:glusterfs_graph_activate] 0-graph: init failed
> [2014-06-16 02:53:17.087534] W [glusterfsd.c:1002:cleanup_and_exit] (-->/usr/sbin/glusterd(main+0x5d2) [0x406802] (-->/usr/sbin/glusterd(glusterfs_volumes_init+0xb7) [0x4051b7] (-->/usr/sbin/glusterd(glusterfs_process_volfp+0x103) [0x4050c3]))) 0-: received signum (0), shutting down
>
> > Thanks,
> > Lala
> >
> > > ----- Original Message -----
> > > From: "Franco Broi" <franco.b...@iongeo.com>
> > > To: "Susant Palai" <spa...@redhat.com>
> > > Cc: "Pranith Kumar Karampuri" <pkara...@redhat.com>,
> > > gluster-users@gluster.org, "Raghavendra Gowdappa" <rgowd...@redhat.com>,
> > > kdhan...@redhat.com, vsomy...@redhat.com, nbala...@redhat.com
> > > Sent: Monday, 16 June, 2014 5:47:55 AM
> > > Subject: Re: [Gluster-users] glusterfsd process spinning
> > >
> > > Is it possible to downgrade to 3.4 from 3.5? I can't afford to spend any
> > > more time testing 3.5, and it doesn't seem to work as well as 3.4.
> > >
> > > Cheers,
> > >
> > > On Wed, 2014-06-04 at 01:51 -0400, Susant Palai wrote:
> > >> From the logs it seems files are present on data(21,22,23,24), which
> > >> are on nas6, while missing on data(17,18,19,20), which are on nas5
> > >> (interesting). There is an existing issue where directories do not
> > >> show up on the mount point if they are not present on the
> > >> first_up_subvol (the longest-living brick), and the current issue
> > >> looks similar. I will look at the client logs for more information.
> > >>
> > >> Susant.
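(On the downgrade failure above: glusterd 3.4.3 refuses to start because the
cluster op-version recorded on disk by 3.5 is newer than 3.4 understands. A
hedged way to confirm this, assuming the default working directory shown in
the log; the version mapping, 3.4 writing 2 and 3.5 writing 3, is an
assumption consistent with the error.)

    # glusterd restores the cluster op-version from its info file at startup.
    grep operating-version /var/lib/glusterd/glusterd.info
    # operating-version=3 -> written by 3.5; 3.4.3 bails out with
    # "wrong op-version (3) retreived"

Lowering that value by hand before starting 3.4 is sometimes suggested as a
workaround, but it is untested here; back up /var/lib/glusterd first.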
> > >>
> > >> ----- Original Message -----
> > >> From: "Franco Broi" <franco.b...@iongeo.com>
> > >> To: "Pranith Kumar Karampuri" <pkara...@redhat.com>
> > >> Cc: "Susant Palai" <spa...@redhat.com>, gluster-users@gluster.org,
> > >> "Raghavendra Gowdappa" <rgowd...@redhat.com>, kdhan...@redhat.com,
> > >> vsomy...@redhat.com, nbala...@redhat.com
> > >> Sent: Wednesday, 4 June, 2014 10:32:37 AM
> > >> Subject: Re: [Gluster-users] glusterfsd process spinning
> > >>
> > >> On Wed, 2014-06-04 at 10:19 +0530, Pranith Kumar Karampuri wrote:
> > >>> On 06/04/2014 08:07 AM, Susant Palai wrote:
> > >>>> Pranith, can you send the client and brick logs?
> > >>> I have the logs. But I believe for this issue of a directory not
> > >>> listing its entries, it would help more if we had the contents of
> > >>> that directory on all the bricks, plus their hash values in the
> > >>> xattrs.
> > >> Strange thing is, all the invisible files are on the one server (nas6);
> > >> the other seems OK. I did rm -Rf of /data2/franco/dir* and was left
> > >> with this one directory - there were many hundreds which were removed
> > >> successfully.
> > >>
> > >> I've attached listings and xattr dumps.
> > >>
> > >> Cheers,
> > >>
> > >> Volume Name: data2
> > >> Type: Distribute
> > >> Volume ID: d958423f-bd25-49f1-81f8-f12e4edc6823
> > >> Status: Started
> > >> Number of Bricks: 8
> > >> Transport-type: tcp
> > >> Bricks:
> > >> Brick1: nas5-10g:/data17/gvol
> > >> Brick2: nas5-10g:/data18/gvol
> > >> Brick3: nas5-10g:/data19/gvol
> > >> Brick4: nas5-10g:/data20/gvol
> > >> Brick5: nas6-10g:/data21/gvol
> > >> Brick6: nas6-10g:/data22/gvol
> > >> Brick7: nas6-10g:/data23/gvol
> > >> Brick8: nas6-10g:/data24/gvol
> > >> Options Reconfigured:
> > >> nfs.drc: on
> > >> cluster.min-free-disk: 5%
> > >> network.frame-timeout: 10800
> > >> nfs.export-volumes: on
> > >> nfs.disable: on
> > >> cluster.readdir-optimize: on
> > >>
> > >> Gluster process                              Port    Online  Pid
> > >> ------------------------------------------------------------------------------
> > >> Brick nas5-10g:/data17/gvol                  49152   Y       6553
> > >> Brick nas5-10g:/data18/gvol                  49153   Y       6564
> > >> Brick nas5-10g:/data19/gvol                  49154   Y       6575
> > >> Brick nas5-10g:/data20/gvol                  49155   Y       6586
> > >> Brick nas6-10g:/data21/gvol                  49160   Y       20608
> > >> Brick nas6-10g:/data22/gvol                  49161   Y       20613
> > >> Brick nas6-10g:/data23/gvol                  49162   Y       20614
> > >> Brick nas6-10g:/data24/gvol                  49163   Y       20621
> > >>
> > >> Task Status of Volume data2
> > >> ------------------------------------------------------------------------------
> > >> There are no active volume tasks
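(A hedged sketch of how listings and xattr dumps like those are typically
gathered, for anyone reproducing this. The loop and paths are assumptions
based on the brick layout above, and trusted.* xattrs are only visible to
root.)

    # Run as root on each brick server. '-e hex' also dumps
    # trusted.glusterfs.dht, the DHT layout range for the directory.
    for d in /data*/gvol/franco/dir1226/dir25; do
        ls -la "$d"
        getfattr -d -m . -e hex "$d"
    done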
> > >>
> > >>> Pranith
> > >>>> Thanks,
> > >>>> Susant~
> > >>>>
> > >>>> ----- Original Message -----
> > >>>> From: "Pranith Kumar Karampuri" <pkara...@redhat.com>
> > >>>> To: "Franco Broi" <franco.b...@iongeo.com>
> > >>>> Cc: gluster-users@gluster.org, "Raghavendra Gowdappa"
> > >>>> <rgowd...@redhat.com>, spa...@redhat.com, kdhan...@redhat.com,
> > >>>> vsomy...@redhat.com, nbala...@redhat.com
> > >>>> Sent: Wednesday, 4 June, 2014 7:53:41 AM
> > >>>> Subject: Re: [Gluster-users] glusterfsd process spinning
> > >>>>
> > >>>> Hi Franco,
> > >>>>       CC Devs who work on DHT to comment.
> > >>>>
> > >>>> Pranith
> > >>>>
> > >>>> On 06/04/2014 07:39 AM, Franco Broi wrote:
> > >>>>> On Wed, 2014-06-04 at 07:28 +0530, Pranith Kumar Karampuri wrote:
> > >>>>>> Franco,
> > >>>>>> Thanks for providing the logs. I just copied over the logs to my
> > >>>>>> machine. Most of the logs I see are related to "No such File or
> > >>>>>> Directory". I wonder what led to this. Do you have any idea?
> > >>>>> No, but I'm just looking at my 3.5 Gluster volume and it has a
> > >>>>> directory that looks empty but can't be deleted. When I look at the
> > >>>>> directories on the servers there are definitely files in there.
> > >>>>>
> > >>>>> [franco@charlie1 franco]$ rmdir /data2/franco/dir1226/dir25
> > >>>>> rmdir: failed to remove `/data2/franco/dir1226/dir25': Directory not empty
> > >>>>> [franco@charlie1 franco]$ ls -la /data2/franco/dir1226/dir25
> > >>>>> total 8
> > >>>>> drwxrwxr-x 2 franco support 60 May 21 03:58 .
> > >>>>> drwxrwxr-x 3 franco support 24 Jun  4 09:37 ..
> > >>>>>
> > >>>>> [root@nas6 ~]# ls -la /data*/gvol/franco/dir1226/dir25
> > >>>>> /data21/gvol/franco/dir1226/dir25:
> > >>>>> total 2081
> > >>>>> drwxrwxr-x 13 1348 200 13 May 21 03:58 .
> > >>>>> drwxrwxr-x  3 1348 200  3 May 21 03:58 ..
> > >>>>> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13017
> > >>>>> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13018
> > >>>>> drwxrwxr-x  2 1348 200  3 May 16 12:05 dir13020
> > >>>>> drwxrwxr-x  2 1348 200  3 May 16 12:05 dir13021
> > >>>>> drwxrwxr-x  2 1348 200  3 May 16 12:05 dir13022
> > >>>>> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13024
> > >>>>> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13027
> > >>>>> drwxrwxr-x  2 1348 200  3 May 16 12:05 dir13028
> > >>>>> drwxrwxr-x  2 1348 200  2 May 16 12:06 dir13029
> > >>>>> drwxrwxr-x  2 1348 200  2 May 16 12:06 dir13031
> > >>>>> drwxrwxr-x  2 1348 200  3 May 16 12:06 dir13032
> > >>>>>
> > >>>>> /data22/gvol/franco/dir1226/dir25:
> > >>>>> total 2084
> > >>>>> drwxrwxr-x 13 1348 200 13 May 21 03:58 .
> > >>>>> drwxrwxr-x  3 1348 200  3 May 21 03:58 ..
> > >>>>> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13017
> > >>>>> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13018
> > >>>>> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13020
> > >>>>> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13021
> > >>>>> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13022
> > >>>>> .....
> > >>>>>
> > >>>>> Maybe Gluster is losing track of the files??
> > >>>>>
> > >>>>>> Pranith
> > >>>>>>
> > >>>>>> On 06/02/2014 02:48 PM, Franco Broi wrote:
> > >>>>>>> Hi Pranith
> > >>>>>>>
> > >>>>>>> Here's a listing of the brick logs; looks very odd, especially
> > >>>>>>> the size of the log for data10.
> > >>>>>>>
> > >>>>>>> [root@nas3 bricks]# ls -ltrh
> > >>>>>>> total 2.6G
> > >>>>>>> -rw------- 1 root root 381K May 13 12:15 data12-gvol.log-20140511
> > >>>>>>> -rw------- 1 root root 430M May 13 12:15 data11-gvol.log-20140511
> > >>>>>>> -rw------- 1 root root 328K May 13 12:15 data9-gvol.log-20140511
> > >>>>>>> -rw------- 1 root root 2.0M May 13 12:15 data10-gvol.log-20140511
> > >>>>>>> -rw------- 1 root root    0 May 18 03:43 data10-gvol.log-20140525
> > >>>>>>> -rw------- 1 root root    0 May 18 03:43 data11-gvol.log-20140525
> > >>>>>>> -rw------- 1 root root    0 May 18 03:43 data12-gvol.log-20140525
> > >>>>>>> -rw------- 1 root root    0 May 18 03:43 data9-gvol.log-20140525
> > >>>>>>> -rw------- 1 root root    0 May 25 03:19 data10-gvol.log-20140601
> > >>>>>>> -rw------- 1 root root    0 May 25 03:19 data11-gvol.log-20140601
> > >>>>>>> -rw------- 1 root root    0 May 25 03:19 data9-gvol.log-20140601
> > >>>>>>> -rw------- 1 root root  98M May 26 03:04 data12-gvol.log-20140518
> > >>>>>>> -rw------- 1 root root    0 Jun  1 03:37 data10-gvol.log
> > >>>>>>> -rw------- 1 root root    0 Jun  1 03:37 data11-gvol.log
> > >>>>>>> -rw------- 1 root root    0 Jun  1 03:37 data12-gvol.log
> > >>>>>>> -rw------- 1 root root    0 Jun  1 03:37 data9-gvol.log
> > >>>>>>> -rw------- 1 root root 1.8G Jun  2 16:35 data10-gvol.log-20140518
> > >>>>>>> -rw------- 1 root root 279M Jun  2 16:35 data9-gvol.log-20140518
> > >>>>>>> -rw------- 1 root root 328K Jun  2 16:35 data12-gvol.log-20140601
> > >>>>>>> -rw------- 1 root root 8.3M Jun  2 16:35 data11-gvol.log-20140518
> > >>>>>>>
> > >>>>>>> Too big to post everything.
> > >>>>>>>
> > >>>>>>> Cheers,
> > >>>>>>>
> > >>>>>>> On Sun, 2014-06-01 at 22:00 -0400, Pranith Kumar Karampuri wrote:
> > >>>>>>>> ----- Original Message -----
> > >>>>>>>>> From: "Pranith Kumar Karampuri" <pkara...@redhat.com>
> > >>>>>>>>> To: "Franco Broi" <franco.b...@iongeo.com>
> > >>>>>>>>> Cc: gluster-users@gluster.org
> > >>>>>>>>> Sent: Monday, June 2, 2014 7:01:34 AM
> > >>>>>>>>> Subject: Re: [Gluster-users] glusterfsd process spinning
> > >>>>>>>>>
> > >>>>>>>>> ----- Original Message -----
> > >>>>>>>>>> From: "Franco Broi" <franco.b...@iongeo.com>
> > >>>>>>>>>> To: "Pranith Kumar Karampuri" <pkara...@redhat.com>
> > >>>>>>>>>> Cc: gluster-users@gluster.org
> > >>>>>>>>>> Sent: Sunday, June 1, 2014 10:53:51 AM
> > >>>>>>>>>> Subject: Re: [Gluster-users] glusterfsd process spinning
> > >>>>>>>>>>
> > >>>>>>>>>> The volume is almost completely idle now and the CPU for the
> > >>>>>>>>>> brick process has returned to normal. I've included the
> > >>>>>>>>>> profile, and I think it shows the latency for the bad brick
> > >>>>>>>>>> (data12) is unusually high, probably indicating the filesystem
> > >>>>>>>>>> is at fault after all??
> > >>>>>>>>> I am not sure we can believe the outputs now that you say the
> > >>>>>>>>> brick returned to normal. Next time it is acting up, do the
> > >>>>>>>>> same procedure and post the result.
> > >>>>>>>> On second thought, maybe it's not a bad idea to inspect the log
> > >>>>>>>> files of the bricks on nas3. Could you post them?
> > >>>>>>>>
> > >>>>>>>> Pranith
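(Rather than posting a 1.8G log, one rough way to condense a brick log to its
most frequent message types; the awk field split is an assumption based on
the standard gluster log layout of "[timestamp] LEVEL [file:line:function]
message".)

    # Count lines per severity and source location to see what is
    # flooding data10's log.
    awk -F'] ' '{print $2}' data10-gvol.log-20140518 | sort | uniq -c | sort -rn | head -20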
> > >>>>>>>>> Pranith
> > >>>>>>>>>
> > >>>>>>>>>> On Sun, 2014-06-01 at 01:01 -0400, Pranith Kumar Karampuri wrote:
> > >>>>>>>>>>> Franco,
> > >>>>>>>>>>> Could you do the following to get more information:
> > >>>>>>>>>>>
> > >>>>>>>>>>> "gluster volume profile <volname> start"
> > >>>>>>>>>>>
> > >>>>>>>>>>> Wait for some time; this will start gathering what operations
> > >>>>>>>>>>> are coming to all the bricks.
> > >>>>>>>>>>> Now execute "gluster volume profile <volname> info" >
> > >>>>>>>>>>> /file/you/should/reply/to/this/mail/with
> > >>>>>>>>>>>
> > >>>>>>>>>>> Then execute:
> > >>>>>>>>>>> gluster volume profile <volname> stop
> > >>>>>>>>>>>
> > >>>>>>>>>>> Let's see if this throws any light on the problem at hand.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Pranith
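(The same sequence, consolidated into a sketch; the volume name comes from
earlier in the thread, and the wait and output path are illustrative.)

    gluster volume profile data2 start
    sleep 600    # let the workload run so per-brick stats accumulate
    gluster volume profile data2 info > /tmp/data2-profile.txt
    gluster volume profile data2 stop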
> > >>>>>>>>>>> ----- Original Message -----
> > >>>>>>>>>>>> From: "Franco Broi" <franco.b...@iongeo.com>
> > >>>>>>>>>>>> To: gluster-users@gluster.org
> > >>>>>>>>>>>> Sent: Sunday, June 1, 2014 9:02:48 AM
> > >>>>>>>>>>>> Subject: [Gluster-users] glusterfsd process spinning
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Hi
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I've been suffering from continual problems with my gluster
> > >>>>>>>>>>>> filesystem slowing down due to what I thought was congestion
> > >>>>>>>>>>>> on a single brick, caused by the underlying filesystem
> > >>>>>>>>>>>> running slow, but I've just noticed that the glusterfsd
> > >>>>>>>>>>>> process for that particular brick is running at 100%+, even
> > >>>>>>>>>>>> when the filesystem is almost idle.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I've done a couple of straces, one of the brick and another
> > >>>>>>>>>>>> on the same server; does the high number of futex errors
> > >>>>>>>>>>>> give any clues as to what might be wrong?
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> % time     seconds  usecs/call     calls    errors syscall
> > >>>>>>>>>>>> ------ ----------- ----------- --------- --------- ----------------
> > >>>>>>>>>>>>  45.58    0.027554           0    191665     20772 futex
> > >>>>>>>>>>>>  28.26    0.017084           0    137133           readv
> > >>>>>>>>>>>>  26.04    0.015743           0     66259           epoll_wait
> > >>>>>>>>>>>>   0.13    0.000077           3        23           writev
> > >>>>>>>>>>>>   0.00    0.000000           0         1           epoll_ctl
> > >>>>>>>>>>>> ------ ----------- ----------- --------- --------- ----------------
> > >>>>>>>>>>>> 100.00    0.060458                396081     20772 total
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> % time     seconds  usecs/call     calls    errors syscall
> > >>>>>>>>>>>> ------ ----------- ----------- --------- --------- ----------------
> > >>>>>>>>>>>>  99.25    0.334020         133      2516           epoll_wait
> > >>>>>>>>>>>>   0.40    0.001347           0      4090        26 futex
> > >>>>>>>>>>>>   0.35    0.001192           0      5064           readv
> > >>>>>>>>>>>>   0.00    0.000000           0        20           writev
> > >>>>>>>>>>>> ------ ----------- ----------- --------- --------- ----------------
> > >>>>>>>>>>>> 100.00    0.336559                 11690        26 total
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Cheers,
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> _______________________________________________
> > >>>>>>>>>>>> Gluster-users mailing list
> > >>>>>>>>>>>> Gluster-users@gluster.org
> > >>>>>>>>>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users

_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users