Hi Franco,

The following patches address the ENOTEMPTY issue:

1. http://review.gluster.org/#/c/7733/
2. http://review.gluster.org/#/c/7599/

I think the above patches will be available in 3.5.1, which will be a minor upgrade. (Need ack from Niels de Vos.)
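Once 3.5.1 is out, the running version can be confirmed before and after the minor upgrade with the standard commands. A small sketch; the rpm query assumes an RPM-based install:

    glusterfs --version                  # client / FUSE side
    glusterd --version                   # on the servers
    rpm -q glusterfs glusterfs-server    # RPM-based systems only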
Hi Lala,

Can you provide the steps to downgrade to 3.4 from 3.5?

Thanks :)

----- Original Message -----
From: "Franco Broi" <franco.b...@iongeo.com>
To: "Susant Palai" <spa...@redhat.com>
Cc: "Pranith Kumar Karampuri" <pkara...@redhat.com>, gluster-users@gluster.org,
"Raghavendra Gowdappa" <rgowd...@redhat.com>, kdhan...@redhat.com,
vsomy...@redhat.com, nbala...@redhat.com
Sent: Monday, 16 June, 2014 5:47:55 AM
Subject: Re: [Gluster-users] glusterfsd process spinning

Is it possible to downgrade to 3.4 from 3.5? I can't afford to spend any
more time testing 3.5 and it doesn't seem to work as well as 3.4.

Cheers,

On Wed, 2014-06-04 at 01:51 -0400, Susant Palai wrote:
> From the logs it seems files are present on data(21,22,23,24), which are on
> nas6, while missing on data(17,18,19,20), which are on nas5 (interesting).
> There is an existing issue where directories do not show up on the mount point
> if they are not present on the first_up_subvol (longest-living brick), and the
> current issue looks similar. Well, I will look at the client logs for more
> information.
>
> Susant.
>
> ----- Original Message -----
> From: "Franco Broi" <franco.b...@iongeo.com>
> To: "Pranith Kumar Karampuri" <pkara...@redhat.com>
> Cc: "Susant Palai" <spa...@redhat.com>, gluster-users@gluster.org,
> "Raghavendra Gowdappa" <rgowd...@redhat.com>, kdhan...@redhat.com,
> vsomy...@redhat.com, nbala...@redhat.com
> Sent: Wednesday, 4 June, 2014 10:32:37 AM
> Subject: Re: [Gluster-users] glusterfsd process spinning
>
> On Wed, 2014-06-04 at 10:19 +0530, Pranith Kumar Karampuri wrote:
> > On 06/04/2014 08:07 AM, Susant Palai wrote:
> > > Pranith, can you send the client and brick logs.
> > I have the logs. But I believe for this issue of the directory not listing
> > entries, it would help more if we have the contents of that directory on
> > all the bricks, plus their hash values in the xattrs.
>
> Strange thing is, all the invisible files are on the one server (nas6);
> the other seems ok. I did rm -Rf of /data2/franco/dir* and was left with
> this one directory - there were many hundreds which were removed
> successfully.
>
> I've attached listings and xattr dumps.
>
> Cheers,
>
> Volume Name: data2
> Type: Distribute
> Volume ID: d958423f-bd25-49f1-81f8-f12e4edc6823
> Status: Started
> Number of Bricks: 8
> Transport-type: tcp
> Bricks:
> Brick1: nas5-10g:/data17/gvol
> Brick2: nas5-10g:/data18/gvol
> Brick3: nas5-10g:/data19/gvol
> Brick4: nas5-10g:/data20/gvol
> Brick5: nas6-10g:/data21/gvol
> Brick6: nas6-10g:/data22/gvol
> Brick7: nas6-10g:/data23/gvol
> Brick8: nas6-10g:/data24/gvol
> Options Reconfigured:
> nfs.drc: on
> cluster.min-free-disk: 5%
> network.frame-timeout: 10800
> nfs.export-volumes: on
> nfs.disable: on
> cluster.readdir-optimize: on
>
> Gluster process                              Port    Online  Pid
> ------------------------------------------------------------------------------
> Brick nas5-10g:/data17/gvol                  49152   Y       6553
> Brick nas5-10g:/data18/gvol                  49153   Y       6564
> Brick nas5-10g:/data19/gvol                  49154   Y       6575
> Brick nas5-10g:/data20/gvol                  49155   Y       6586
> Brick nas6-10g:/data21/gvol                  49160   Y       20608
> Brick nas6-10g:/data22/gvol                  49161   Y       20613
> Brick nas6-10g:/data23/gvol                  49162   Y       20614
> Brick nas6-10g:/data24/gvol                  49163   Y       20621
>
> Task Status of Volume data2
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> > Pranith
> >
> > > Thanks,
> > > Susant~
> > >
> > > ----- Original Message -----
> > > From: "Pranith Kumar Karampuri" <pkara...@redhat.com>
> > > To: "Franco Broi" <franco.b...@iongeo.com>
> > > Cc: gluster-users@gluster.org, "Raghavendra Gowdappa" <rgowd...@redhat.com>,
> > > spa...@redhat.com, kdhan...@redhat.com, vsomy...@redhat.com, nbala...@redhat.com
> > > Sent: Wednesday, 4 June, 2014 7:53:41 AM
> > > Subject: Re: [Gluster-users] glusterfsd process spinning
> > >
> > > Hi Franco,
> > > CC'ing the devs who work on DHT to comment.
> > >
> > > Pranith
> > >
> > > On 06/04/2014 07:39 AM, Franco Broi wrote:
> > >> On Wed, 2014-06-04 at 07:28 +0530, Pranith Kumar Karampuri wrote:
> > >>> Franco,
> > >>> Thanks for providing the logs. I just copied over the logs to my
> > >>> machine. Most of the logs I see are related to "No such file or
> > >>> directory". I wonder what led to this. Do you have any idea?
> > >> No, but I'm just looking at my 3.5 Gluster volume and it has a directory
> > >> that looks empty but can't be deleted. When I look at the directories on
> > >> the servers there are definitely files in there.
> > >>
> > >> [franco@charlie1 franco]$ rmdir /data2/franco/dir1226/dir25
> > >> rmdir: failed to remove `/data2/franco/dir1226/dir25': Directory not empty
> > >> [franco@charlie1 franco]$ ls -la /data2/franco/dir1226/dir25
> > >> total 8
> > >> drwxrwxr-x 2 franco support 60 May 21 03:58 .
> > >> drwxrwxr-x 3 franco support 24 Jun  4 09:37 ..
> > >>
> > >> [root@nas6 ~]# ls -la /data*/gvol/franco/dir1226/dir25
> > >> /data21/gvol/franco/dir1226/dir25:
> > >> total 2081
> > >> drwxrwxr-x 13 1348 200 13 May 21 03:58 .
> > >> drwxrwxr-x  3 1348 200  3 May 21 03:58 ..
> > >> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13017
> > >> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13018
> > >> drwxrwxr-x  2 1348 200  3 May 16 12:05 dir13020
> > >> drwxrwxr-x  2 1348 200  3 May 16 12:05 dir13021
> > >> drwxrwxr-x  2 1348 200  3 May 16 12:05 dir13022
> > >> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13024
> > >> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13027
> > >> drwxrwxr-x  2 1348 200  3 May 16 12:05 dir13028
> > >> drwxrwxr-x  2 1348 200  2 May 16 12:06 dir13029
> > >> drwxrwxr-x  2 1348 200  2 May 16 12:06 dir13031
> > >> drwxrwxr-x  2 1348 200  3 May 16 12:06 dir13032
> > >>
> > >> /data22/gvol/franco/dir1226/dir25:
> > >> total 2084
> > >> drwxrwxr-x 13 1348 200 13 May 21 03:58 .
> > >> drwxrwxr-x  3 1348 200  3 May 21 03:58 ..
> > >> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13017
> > >> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13018
> > >> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13020
> > >> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13021
> > >> drwxrwxr-x  2 1348 200  2 May 16 12:05 dir13022
> > >> .....
> > >>
> > >> Maybe Gluster is losing track of the files??
> > >>
> > >>> Pranith
> > >>>
> > >>> On 06/02/2014 02:48 PM, Franco Broi wrote:
> > >>>> Hi Pranith
> > >>>>
> > >>>> Here's a listing of the brick logs, looks very odd especially the size
> > >>>> of the log for data10.
> > >>>>
> > >>>> [root@nas3 bricks]# ls -ltrh
> > >>>> total 2.6G
> > >>>> -rw------- 1 root root 381K May 13 12:15 data12-gvol.log-20140511
> > >>>> -rw------- 1 root root 430M May 13 12:15 data11-gvol.log-20140511
> > >>>> -rw------- 1 root root 328K May 13 12:15 data9-gvol.log-20140511
> > >>>> -rw------- 1 root root 2.0M May 13 12:15 data10-gvol.log-20140511
> > >>>> -rw------- 1 root root    0 May 18 03:43 data10-gvol.log-20140525
> > >>>> -rw------- 1 root root    0 May 18 03:43 data11-gvol.log-20140525
> > >>>> -rw------- 1 root root    0 May 18 03:43 data12-gvol.log-20140525
> > >>>> -rw------- 1 root root    0 May 18 03:43 data9-gvol.log-20140525
> > >>>> -rw------- 1 root root    0 May 25 03:19 data10-gvol.log-20140601
> > >>>> -rw------- 1 root root    0 May 25 03:19 data11-gvol.log-20140601
> > >>>> -rw------- 1 root root    0 May 25 03:19 data9-gvol.log-20140601
> > >>>> -rw------- 1 root root  98M May 26 03:04 data12-gvol.log-20140518
> > >>>> -rw------- 1 root root    0 Jun  1 03:37 data10-gvol.log
> > >>>> -rw------- 1 root root    0 Jun  1 03:37 data11-gvol.log
> > >>>> -rw------- 1 root root    0 Jun  1 03:37 data12-gvol.log
> > >>>> -rw------- 1 root root    0 Jun  1 03:37 data9-gvol.log
> > >>>> -rw------- 1 root root 1.8G Jun  2 16:35 data10-gvol.log-20140518
> > >>>> -rw------- 1 root root 279M Jun  2 16:35 data9-gvol.log-20140518
> > >>>> -rw------- 1 root root 328K Jun  2 16:35 data12-gvol.log-20140601
> > >>>> -rw------- 1 root root 8.3M Jun  2 16:35 data11-gvol.log-20140518
> > >>>>
> > >>>> Too big to post everything.
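For reference, the xattr dumps requested earlier in the thread can typically be gathered with getfattr run as root on each brick server. A minimal sketch, assuming the brick paths shown in the listings above (the loop itself is illustrative; adjust paths to your own bricks), with trusted.glusterfs.dht holding the DHT layout range for the directory:

    # On nas6: dump all extended attributes, hex-encoded, for the problem
    # directory on each local brick.
    for b in /data21 /data22 /data23 /data24; do
        echo "== $b =="
        getfattr -d -m . -e hex "$b/gvol/franco/dir1226/dir25"
    done

The same loop run on nas5 against /data17 through /data20 would cover the other four bricks.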
> > >>>>
> > >>>> Cheers,
> > >>>>
> > >>>> On Sun, 2014-06-01 at 22:00 -0400, Pranith Kumar Karampuri wrote:
> > >>>>> ----- Original Message -----
> > >>>>>> From: "Pranith Kumar Karampuri" <pkara...@redhat.com>
> > >>>>>> To: "Franco Broi" <franco.b...@iongeo.com>
> > >>>>>> Cc: gluster-users@gluster.org
> > >>>>>> Sent: Monday, June 2, 2014 7:01:34 AM
> > >>>>>> Subject: Re: [Gluster-users] glusterfsd process spinning
> > >>>>>>
> > >>>>>> ----- Original Message -----
> > >>>>>>> From: "Franco Broi" <franco.b...@iongeo.com>
> > >>>>>>> To: "Pranith Kumar Karampuri" <pkara...@redhat.com>
> > >>>>>>> Cc: gluster-users@gluster.org
> > >>>>>>> Sent: Sunday, June 1, 2014 10:53:51 AM
> > >>>>>>> Subject: Re: [Gluster-users] glusterfsd process spinning
> > >>>>>>>
> > >>>>>>> The volume is almost completely idle now and the CPU for the brick
> > >>>>>>> process has returned to normal. I've included the profile and I think it
> > >>>>>>> shows the latency for the bad brick (data12) is unusually high, probably
> > >>>>>>> indicating the filesystem is at fault after all??
> > >>>>>> I am not sure if we can believe the outputs now that you say the brick
> > >>>>>> returned to normal. Next time it is acting up, do the same procedure and
> > >>>>>> post the result.
> > >>>>> On second thought, maybe it's not a bad idea to inspect the log files
> > >>>>> of the bricks on nas3. Could you post them?
> > >>>>>
> > >>>>> Pranith
> > >>>>>
> > >>>>>> Pranith
> > >>>>>>> On Sun, 2014-06-01 at 01:01 -0400, Pranith Kumar Karampuri wrote:
> > >>>>>>>> Franco,
> > >>>>>>>> Could you do the following to get more information:
> > >>>>>>>>
> > >>>>>>>> "gluster volume profile <volname> start"
> > >>>>>>>>
> > >>>>>>>> Wait for some time; this will start gathering what operations are
> > >>>>>>>> coming to all the bricks.
> > >>>>>>>> Now execute "gluster volume profile <volname> info" > /file/you/should/reply/to/this/mail/with
> > >>>>>>>>
> > >>>>>>>> Then execute:
> > >>>>>>>> gluster volume profile <volname> stop
> > >>>>>>>>
> > >>>>>>>> Let's see if this throws any light on the problem at hand.
> > >>>>>>>>
> > >>>>>>>> Pranith
> > >>>>>>>> ----- Original Message -----
> > >>>>>>>>> From: "Franco Broi" <franco.b...@iongeo.com>
> > >>>>>>>>> To: gluster-users@gluster.org
> > >>>>>>>>> Sent: Sunday, June 1, 2014 9:02:48 AM
> > >>>>>>>>> Subject: [Gluster-users] glusterfsd process spinning
> > >>>>>>>>>
> > >>>>>>>>> Hi
> > >>>>>>>>>
> > >>>>>>>>> I've been suffering from continual problems with my gluster filesystem
> > >>>>>>>>> slowing down due to what I thought was congestion on a single brick
> > >>>>>>>>> caused by a problem with the underlying filesystem running slow,
> > >>>>>>>>> but I've just noticed that the glusterfsd process for that particular
> > >>>>>>>>> brick is running at 100%+, even when the filesystem is almost idle.
> > >>>>>>>>>
> > >>>>>>>>> I've done a couple of straces of the brick and another on the same
> > >>>>>>>>> server; does the high number of futex errors give any clues as to what
> > >>>>>>>>> might be wrong?
> > >>>>>>>>>
> > >>>>>>>>> % time     seconds  usecs/call     calls    errors syscall
> > >>>>>>>>> ------ ----------- ----------- --------- --------- ----------------
> > >>>>>>>>>  45.58    0.027554           0    191665     20772 futex
> > >>>>>>>>>  28.26    0.017084           0    137133           readv
> > >>>>>>>>>  26.04    0.015743           0     66259           epoll_wait
> > >>>>>>>>>   0.13    0.000077           3        23           writev
> > >>>>>>>>>   0.00    0.000000           0         1           epoll_ctl
> > >>>>>>>>> ------ ----------- ----------- --------- --------- ----------------
> > >>>>>>>>> 100.00    0.060458                395081     20772 total
> > >>>>>>>>>
> > >>>>>>>>> % time     seconds  usecs/call     calls    errors syscall
> > >>>>>>>>> ------ ----------- ----------- --------- --------- ----------------
> > >>>>>>>>>  99.25    0.334020         133      2516           epoll_wait
> > >>>>>>>>>   0.40    0.001347           0      4090        26 futex
> > >>>>>>>>>   0.35    0.001192           0      5064           readv
> > >>>>>>>>>   0.00    0.000000           0        20           writev
> > >>>>>>>>> ------ ----------- ----------- --------- --------- ----------------
> > >>>>>>>>> 100.00    0.336559                 11690        26 total
> > >>>>>>>>>
> > >>>>>>>>> Cheers,
> > >>>>>>>>>
> > >>>>>>>>> _______________________________________________
> > >>>>>>>>> Gluster-users mailing list
> > >>>>>>>>> Gluster-users@gluster.org
> > >>>>>>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
> > >>>>>>>>>
> > >

_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
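Per-syscall summaries like the two tables quoted above can be collected with strace's counting mode. A minimal sketch, assuming the suspect brick is /data12/gvol; the pgrep pattern is illustrative, and the PID can equally be read from the "gluster volume status" output shown earlier in the thread:

    # Find the glusterfsd process serving the suspect brick.
    pid=$(pgrep -f 'glusterfsd.*data12' | head -1)

    # Attach and count syscalls across all threads (-c -f); let it run for a
    # minute or so while the brick is spinning, then Ctrl-C to detach.
    # strace prints the summary table (calls, errors, time per syscall) on exit.
    strace -c -f -p "$pid"

Running it once against the busy brick and once against a healthy one on the same server gives a comparison like the pair of tables above.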