Hi,

As mentioned, the glusterfs.get_real_filename getxattr is issued when we need to check whether a file (case-insensitively) exists in a directory.
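For reference, the same lookup can be reproduced by hand from a client mount; the mount point and file name below are only illustrative:

$ getfattr -n glusterfs.get_real_filename:MyFile.txt /mnt/homegfs/some/dir

If an entry matching MyFile.txt case-insensitively exists in that directory, the value returned is its actual on-disk name. Each such call makes the brick walk the whole directory doing case-insensitive compares, which is why it gets expensive on large directories.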
You could run the following commands to get the perf details; I guess you need to have debuginfo installed for the symbols to resolve.

Record the profile:
$ perf record -a -g -p <PID of process to instrument> -o perf-glusterfsd.data

Generate the report:
$ perf report -g -i perf-glusterfsd.data

Attaching perf might slow the process down further, hence it is recommended to use it on a test setup.

Regards,
Poornima

----- Original Message -----
> From: "Raghavendra Talur" <rta...@redhat.com>
> To: "Pranith Kumar Karampuri" <pkara...@redhat.com>
> Cc: "Gluster Devel" <gluster-de...@gluster.org>, "Patrick Glomski" <patrick.glom...@corvidtec.com>, gluster-users@gluster.org, "David Robinson" <drobin...@corvidtec.com>, "Poornima Gurusiddaiah" <pguru...@redhat.com>
> Sent: Friday, January 22, 2016 8:37:50 AM
> Subject: Re: [Gluster-devel] [Gluster-users] heal hanging
>
> On Jan 22, 2016 7:27 AM, "Pranith Kumar Karampuri" < pkara...@redhat.com > wrote:
> >
> > On 01/22/2016 07:19 AM, Pranith Kumar Karampuri wrote:
> >>
> >> On 01/22/2016 07:13 AM, Glomski, Patrick wrote:
> >>>
> >>> We use the samba glusterfs virtual filesystem (the current version provided on download.gluster.org ), but no windows clients connecting directly.
> >>
> >> Hmm.. Is there a way to disable using this and check if the CPU% still increases? What getxattr of "glusterfs.get_real_filename <filename>" does is to scan the entire directory looking for strcasecmp(<filename>, <scanned-filename>). If anything matches then it will return the <scanned-filename>. But the problem is the scan is costly. So I wonder if this is the reason for the CPU spikes.
> >
> > +Raghavendra Talur, +Poornima
> >
> > Raghavendra, Poornima,
> > When are these getxattrs triggered? Did you guys see any brick CPU spikes before? I initially thought it could be because of big directory heals. But this is happening even when no self-heals are required. So I had to move away from that theory.
>
> These getxattrs are triggered when an SMB client performs a path-based operation, so it is necessary that some client was connected.
> The last fix to go into that code for 3.6 was http://review.gluster.org/#/c/10403/ .
> I am not able to determine which release of 3.6 it made it into; will update.
> We would also need the version of Samba installed, including the vfs plugin package.
> There is a for loop of strcmp involved here which does take a lot of CPU. It should only run in short bursts, though, and is expected and harmless.
> >
> > Pranith
> >
> >> Pranith
> >>>
> >>> On Thu, Jan 21, 2016 at 8:37 PM, Pranith Kumar Karampuri < pkara...@redhat.com > wrote:
> >>>>
> >>>> Do you have any windows clients? I see a lot of getxattr calls for "glusterfs.get_real_filename" which lead to full readdirs of the directories on the brick.
> >>>>
> >>>> Pranith
> >>>>
> >>>> On 01/22/2016 12:51 AM, Glomski, Patrick wrote:
> >>>>>
> >>>>> Pranith, could this kind of behavior be self-inflicted by us deleting files directly from the bricks? We have done that in the past to clean up issues where gluster wouldn't allow us to delete from the mount.
> >>>>>
> >>>>> If so, is it feasible to clean them up by running a search on the .glusterfs directories directly and removing files with a reference count of 1 that are non-zero size (or directly checking the xattrs to be sure that it's not a DHT link)?
> >>>>>
> >>>>> find /data/brick01a/homegfs/.glusterfs -type f -not -empty -links -2 -exec rm -f "{}" \;
> >>>>>
> >>>>> Is there anything I'm inherently missing with that approach that will further corrupt the system?
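(On the cleanup approach quoted just above: before deleting anything, it is worth confirming a candidate file is not a DHT link file by checking for the trusted.glusterfs.dht.linkto xattr. A rough sketch, reusing the brick path from this thread; treat it as illustrative rather than a vetted procedure:

find /data/brick01a/homegfs/.glusterfs -type f -not -empty -links -2 \
    -exec getfattr -h --absolute-names -n trusted.glusterfs.dht.linkto {} \; 2>/dev/null

Any file that prints this xattr is a DHT link file and should be left alone; only files that do not would be candidates for the rm above.)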
> >>>>>
> >>>>> On Thu, Jan 21, 2016 at 1:02 PM, Glomski, Patrick < patrick.glom...@corvidtec.com > wrote:
> >>>>>>
> >>>>>> Load spiked again: ~1200% CPU on gfs02a for glusterfsd. Crawl has been running on one of the bricks on gfs02b for 25 min or so and users cannot access the volume.
> >>>>>>
> >>>>>> I re-listed the xattrop directories as well as a 'top' entry and heal statistics. Then I restarted the gluster services on gfs02a.
> >>>>>>
> >>>>>> =================== top ===================
> >>>>>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> >>>>>> 8969 root 20 0 2815m 204m 3588 S 1181.0 0.6 591:06.93 glusterfsd
> >>>>>>
> >>>>>> =================== xattrop ===================
> >>>>>> /data/brick01a/homegfs/.glusterfs/indices/xattrop:
> >>>>>> xattrop-41f19453-91e4-437c-afa9-3b25614de210
> >>>>>> xattrop-9b815879-2f4d-402b-867c-a6d65087788c
> >>>>>>
> >>>>>> /data/brick02a/homegfs/.glusterfs/indices/xattrop:
> >>>>>> xattrop-70131855-3cfb-49af-abce-9d23f57fb393
> >>>>>> xattrop-dfb77848-a39d-4417-a725-9beca75d78c6
> >>>>>>
> >>>>>> /data/brick01b/homegfs/.glusterfs/indices/xattrop:
> >>>>>> e6e47ed9-309b-42a7-8c44-28c29b9a20f8
> >>>>>> xattrop-5c797a64-bde7-4eac-b4fc-0befc632e125
> >>>>>> xattrop-38ec65a1-00b5-4544-8a6c-bf0f531a1934
> >>>>>> xattrop-ef0980ad-f074-4163-979f-16d5ef85b0a0
> >>>>>>
> >>>>>> /data/brick02b/homegfs/.glusterfs/indices/xattrop:
> >>>>>> xattrop-7402438d-0ee7-4fcf-b9bb-b561236f99bc
> >>>>>> xattrop-8ffbf5f7-ace3-497d-944e-93ac85241413
> >>>>>>
> >>>>>> /data/brick01a/homegfs/.glusterfs/indices/xattrop:
> >>>>>> xattrop-0115acd0-caae-4dfd-b3b4-7cc42a0ff531
> >>>>>>
> >>>>>> /data/brick02a/homegfs/.glusterfs/indices/xattrop:
> >>>>>> xattrop-7e20fdb1-5224-4b9a-be06-568708526d70
> >>>>>>
> >>>>>> /data/brick01b/homegfs/.glusterfs/indices/xattrop:
> >>>>>> 8034bc06-92cd-4fa5-8aaf-09039e79d2c8
> >>>>>> c9ce22ed-6d8b-471b-a111-b39e57f0b512
> >>>>>> 94fa1d60-45ad-4341-b69c-315936b51e8d
> >>>>>> xattrop-9c04623a-64ce-4f66-8b23-dbaba49119c7
> >>>>>>
> >>>>>> /data/brick02b/homegfs/.glusterfs/indices/xattrop:
> >>>>>> xattrop-b8c8f024-d038-49a2-9a53-c54ead09111d
> >>>>>>
> >>>>>> =================== heal stats ===================
> >>>>>>
> >>>>>> homegfs [b0-gfsib01a] : Starting time of crawl : Thu Jan 21 12:36:45 2016
> >>>>>> homegfs [b0-gfsib01a] : Ending time of crawl : Thu Jan 21 12:36:45 2016
> >>>>>> homegfs [b0-gfsib01a] : Type of crawl: INDEX
> >>>>>> homegfs [b0-gfsib01a] : No. of entries healed : 0
> >>>>>> homegfs [b0-gfsib01a] : No. of entries in split-brain: 0
> >>>>>> homegfs [b0-gfsib01a] : No. of heal failed entries : 0
> >>>>>>
> >>>>>> homegfs [b1-gfsib01b] : Starting time of crawl : Thu Jan 21 12:36:19 2016
> >>>>>> homegfs [b1-gfsib01b] : Ending time of crawl : Thu Jan 21 12:36:19 2016
> >>>>>> homegfs [b1-gfsib01b] : Type of crawl: INDEX
> >>>>>> homegfs [b1-gfsib01b] : No. of entries healed : 0
> >>>>>> homegfs [b1-gfsib01b] : No. of entries in split-brain: 0
> >>>>>> homegfs [b1-gfsib01b] : No. of heal failed entries : 1
> >>>>>>
> >>>>>> homegfs [b2-gfsib01a] : Starting time of crawl : Thu Jan 21 12:36:48 2016
> >>>>>> homegfs [b2-gfsib01a] : Ending time of crawl : Thu Jan 21 12:36:48 2016
> >>>>>> homegfs [b2-gfsib01a] : Type of crawl: INDEX
> >>>>>> homegfs [b2-gfsib01a] : No. of entries healed : 0
> >>>>>> homegfs [b2-gfsib01a] : No. of entries in split-brain: 0
> >>>>>> homegfs [b2-gfsib01a] : No. of heal failed entries : 0
> >>>>>>
> >>>>>> homegfs [b3-gfsib01b] : Starting time of crawl : Thu Jan 21 12:36:47 2016
> >>>>>> homegfs [b3-gfsib01b] : Ending time of crawl : Thu Jan 21 12:36:47 2016
> >>>>>> homegfs [b3-gfsib01b] : Type of crawl: INDEX
> >>>>>> homegfs [b3-gfsib01b] : No. of entries healed : 0
> >>>>>> homegfs [b3-gfsib01b] : No. of entries in split-brain: 0
> >>>>>> homegfs [b3-gfsib01b] : No. of heal failed entries : 0
> >>>>>>
> >>>>>> homegfs [b4-gfsib02a] : Starting time of crawl : Thu Jan 21 12:36:06 2016
> >>>>>> homegfs [b4-gfsib02a] : Ending time of crawl : Thu Jan 21 12:36:06 2016
> >>>>>> homegfs [b4-gfsib02a] : Type of crawl: INDEX
> >>>>>> homegfs [b4-gfsib02a] : No. of entries healed : 0
> >>>>>> homegfs [b4-gfsib02a] : No. of entries in split-brain: 0
> >>>>>> homegfs [b4-gfsib02a] : No. of heal failed entries : 0
> >>>>>>
> >>>>>> homegfs [b5-gfsib02b] : Starting time of crawl : Thu Jan 21 12:13:40 2016
> >>>>>> homegfs [b5-gfsib02b] : *** Crawl is in progress ***
> >>>>>> homegfs [b5-gfsib02b] : Type of crawl: INDEX
> >>>>>> homegfs [b5-gfsib02b] : No. of entries healed : 0
> >>>>>> homegfs [b5-gfsib02b] : No. of entries in split-brain: 0
> >>>>>> homegfs [b5-gfsib02b] : No. of heal failed entries : 0
> >>>>>>
> >>>>>> homegfs [b6-gfsib02a] : Starting time of crawl : Thu Jan 21 12:36:58 2016
> >>>>>> homegfs [b6-gfsib02a] : Ending time of crawl : Thu Jan 21 12:36:58 2016
> >>>>>> homegfs [b6-gfsib02a] : Type of crawl: INDEX
> >>>>>> homegfs [b6-gfsib02a] : No. of entries healed : 0
> >>>>>> homegfs [b6-gfsib02a] : No. of entries in split-brain: 0
> >>>>>> homegfs [b6-gfsib02a] : No. of heal failed entries : 0
> >>>>>>
> >>>>>> homegfs [b7-gfsib02b] : Starting time of crawl : Thu Jan 21 12:36:50 2016
> >>>>>> homegfs [b7-gfsib02b] : Ending time of crawl : Thu Jan 21 12:36:50 2016
> >>>>>> homegfs [b7-gfsib02b] : Type of crawl: INDEX
> >>>>>> homegfs [b7-gfsib02b] : No. of entries healed : 0
> >>>>>> homegfs [b7-gfsib02b] : No. of entries in split-brain: 0
> >>>>>> homegfs [b7-gfsib02b] : No. of heal failed entries : 0
> >>>>>>
> >>>>>> ========================================================================================
> >>>>>> I waited a few minutes for the heals to finish and ran the heal statistics and info again. One file is in split-brain. Aside from the split-brain, the load on all systems is down now and they are behaving normally. glustershd.log is attached. What is going on???
> >>>>>>
> >>>>>> Thu Jan 21 12:53:50 EST 2016
> >>>>>>
> >>>>>> =================== homegfs ===================
> >>>>>>
> >>>>>> homegfs [b0-gfsib01a] : Starting time of crawl : Thu Jan 21 12:53:02 2016
> >>>>>> homegfs [b0-gfsib01a] : Ending time of crawl : Thu Jan 21 12:53:02 2016
> >>>>>> homegfs [b0-gfsib01a] : Type of crawl: INDEX
> >>>>>> homegfs [b0-gfsib01a] : No. of entries healed : 0
> >>>>>> homegfs [b0-gfsib01a] : No. of entries in split-brain: 0
> >>>>>> homegfs [b0-gfsib01a] : No. of heal failed entries : 0
> >>>>>>
> >>>>>> homegfs [b1-gfsib01b] : Starting time of crawl : Thu Jan 21 12:53:38 2016
> >>>>>> homegfs [b1-gfsib01b] : Ending time of crawl : Thu Jan 21 12:53:38 2016
> >>>>>> homegfs [b1-gfsib01b] : Type of crawl: INDEX
> >>>>>> homegfs [b1-gfsib01b] : No. of entries healed : 0
> >>>>>> homegfs [b1-gfsib01b] : No. of entries in split-brain: 0
> >>>>>> homegfs [b1-gfsib01b] : No. of heal failed entries : 1
> >>>>>>
> >>>>>> homegfs [b2-gfsib01a] : Starting time of crawl : Thu Jan 21 12:53:04 2016
> >>>>>> homegfs [b2-gfsib01a] : Ending time of crawl : Thu Jan 21 12:53:04 2016
> >>>>>> homegfs [b2-gfsib01a] : Type of crawl: INDEX
> >>>>>> homegfs [b2-gfsib01a] : No. of entries healed : 0
> >>>>>> homegfs [b2-gfsib01a] : No. of entries in split-brain: 0
> >>>>>> homegfs [b2-gfsib01a] : No. of heal failed entries : 0
> >>>>>>
> >>>>>> homegfs [b3-gfsib01b] : Starting time of crawl : Thu Jan 21 12:53:04 2016
> >>>>>> homegfs [b3-gfsib01b] : Ending time of crawl : Thu Jan 21 12:53:04 2016
> >>>>>> homegfs [b3-gfsib01b] : Type of crawl: INDEX
> >>>>>> homegfs [b3-gfsib01b] : No. of entries healed : 0
> >>>>>> homegfs [b3-gfsib01b] : No. of entries in split-brain: 0
> >>>>>> homegfs [b3-gfsib01b] : No. of heal failed entries : 0
> >>>>>>
> >>>>>> homegfs [b4-gfsib02a] : Starting time of crawl : Thu Jan 21 12:53:33 2016
> >>>>>> homegfs [b4-gfsib02a] : Ending time of crawl : Thu Jan 21 12:53:33 2016
> >>>>>> homegfs [b4-gfsib02a] : Type of crawl: INDEX
> >>>>>> homegfs [b4-gfsib02a] : No. of entries healed : 0
> >>>>>> homegfs [b4-gfsib02a] : No. of entries in split-brain: 0
> >>>>>> homegfs [b4-gfsib02a] : No. of heal failed entries : 1
> >>>>>>
> >>>>>> homegfs [b5-gfsib02b] : Starting time of crawl : Thu Jan 21 12:53:14 2016
> >>>>>> homegfs [b5-gfsib02b] : Ending time of crawl : Thu Jan 21 12:53:15 2016
> >>>>>> homegfs [b5-gfsib02b] : Type of crawl: INDEX
> >>>>>> homegfs [b5-gfsib02b] : No. of entries healed : 0
> >>>>>> homegfs [b5-gfsib02b] : No. of entries in split-brain: 0
> >>>>>> homegfs [b5-gfsib02b] : No. of heal failed entries : 3
> >>>>>>
> >>>>>> homegfs [b6-gfsib02a] : Starting time of crawl : Thu Jan 21 12:53:04 2016
> >>>>>> homegfs [b6-gfsib02a] : Ending time of crawl : Thu Jan 21 12:53:04 2016
> >>>>>> homegfs [b6-gfsib02a] : Type of crawl: INDEX
> >>>>>> homegfs [b6-gfsib02a] : No. of entries healed : 0
> >>>>>> homegfs [b6-gfsib02a] : No. of entries in split-brain: 0
> >>>>>> homegfs [b6-gfsib02a] : No. of heal failed entries : 0
> >>>>>>
> >>>>>> homegfs [b7-gfsib02b] : Starting time of crawl : Thu Jan 21 12:53:09 2016
> >>>>>> homegfs [b7-gfsib02b] : Ending time of crawl : Thu Jan 21 12:53:09 2016
> >>>>>> homegfs [b7-gfsib02b] : Type of crawl: INDEX
> >>>>>> homegfs [b7-gfsib02b] : No. of entries healed : 0
> >>>>>> homegfs [b7-gfsib02b] : No. of entries in split-brain: 0
> >>>>>> homegfs [b7-gfsib02b] : No. of heal failed entries : 0
> >>>>>>
> >>>>>> *** gluster bug in 'gluster volume heal homegfs statistics' ***
> >>>>>> *** Use 'gluster volume heal homegfs info' until bug is fixed ***
> >>>>>>
> >>>>>> Brick gfs01a.corvidtec.com:/data/brick01a/homegfs/
> >>>>>> Number of entries: 0
> >>>>>>
> >>>>>> Brick gfs01b.corvidtec.com:/data/brick01b/homegfs/
> >>>>>> Number of entries: 0
> >>>>>>
> >>>>>> Brick gfs01a.corvidtec.com:/data/brick02a/homegfs/
> >>>>>> Number of entries: 0
> >>>>>>
> >>>>>> Brick gfs01b.corvidtec.com:/data/brick02b/homegfs/
> >>>>>> Number of entries: 0
> >>>>>>
> >>>>>> Brick gfs02a.corvidtec.com:/data/brick01a/homegfs/
> >>>>>> /users/bangell/.gconfd - Is in split-brain
> >>>>>> Number of entries: 1
> >>>>>>
> >>>>>> Brick gfs02b.corvidtec.com:/data/brick01b/homegfs/
> >>>>>> /users/bangell/.gconfd - Is in split-brain
> >>>>>> /users/bangell/.gconfd/saved_state
> >>>>>> Number of entries: 2
> >>>>>>
> >>>>>> Brick gfs02a.corvidtec.com:/data/brick02a/homegfs/
> >>>>>> Number of entries: 0
> >>>>>>
> >>>>>> Brick gfs02b.corvidtec.com:/data/brick02b/homegfs/
> >>>>>> Number of entries: 0
> >>>>>>
> >>>>>> On Thu, Jan 21, 2016 at 11:10 AM, Pranith Kumar Karampuri < pkara...@redhat.com > wrote:
> >>>>>>>
> >>>>>>> On 01/21/2016 09:26 PM, Glomski, Patrick wrote:
> >>>>>>>>
> >>>>>>>> I should mention that the problem is not currently occurring and there are no heals (output appended). By restarting the gluster services, we can stop the crawl, which lowers the load for a while. Subsequent crawls seem to finish properly. For what it's worth, files/folders that show up in the 'volume info' output during a hung crawl don't seem to be anything out of the ordinary.
> >>>>>>>>
> >>>>>>>> Over the past four days, the typical time before the problem recurs after suppressing it in this manner is an hour. Last night when we reached out to you was the last time it happened, and the load has been low since (a relief). David believes that recursively listing the files (ls -alR or similar) from a client mount can force the issue to happen, but obviously I'd rather not unless we have some precise thing we're looking for. Let me know if you'd like me to attempt to drive the system unstable like that and what I should look for. As it's a production system, I'd rather not leave it in this state for long.
> >>>>>>>>
> >>>>>>> Will it be possible to send the glustershd and mount logs of the past 4 days? I would like to see if this is because of directory self-heal going wild (Ravi is working on a throttling feature for 3.8, which will allow putting brakes on self-heal traffic).
> >>>>>>>
> >>>>>>> Pranith
> >>>>>>>
> >>>>>>>> [root@gfs01a xattrop]# gluster volume heal homegfs info
> >>>>>>>> Brick gfs01a.corvidtec.com:/data/brick01a/homegfs/
> >>>>>>>> Number of entries: 0
> >>>>>>>>
> >>>>>>>> Brick gfs01b.corvidtec.com:/data/brick01b/homegfs/
> >>>>>>>> Number of entries: 0
> >>>>>>>>
> >>>>>>>> Brick gfs01a.corvidtec.com:/data/brick02a/homegfs/
> >>>>>>>> Number of entries: 0
> >>>>>>>>
> >>>>>>>> Brick gfs01b.corvidtec.com:/data/brick02b/homegfs/
> >>>>>>>> Number of entries: 0
> >>>>>>>>
> >>>>>>>> Brick gfs02a.corvidtec.com:/data/brick01a/homegfs/
> >>>>>>>> Number of entries: 0
> >>>>>>>>
> >>>>>>>> Brick gfs02b.corvidtec.com:/data/brick01b/homegfs/
> >>>>>>>> Number of entries: 0
> >>>>>>>>
> >>>>>>>> Brick gfs02a.corvidtec.com:/data/brick02a/homegfs/
> >>>>>>>> Number of entries: 0
> >>>>>>>>
> >>>>>>>> Brick gfs02b.corvidtec.com:/data/brick02b/homegfs/
> >>>>>>>> Number of entries: 0
> >>>>>>>>
> >>>>>>>> On Thu, Jan 21, 2016 at 10:40 AM, Pranith Kumar Karampuri < pkara...@redhat.com > wrote:
> >>>>>>>>>
> >>>>>>>>> On 01/21/2016 08:25 PM, Glomski, Patrick wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hello, Pranith. The typical behavior is that the %CPU on a glusterfsd process jumps to the number of processor cores available (800% or 1200%, depending on the pair of nodes involved) and the load average on the machine goes very high (~20). The volume's heal statistics output shows that it is crawling one of the bricks and trying to heal, but this crawl hangs and never seems to finish.
> >>>>>>>>>>
> >>>>>>>>>> The number of files in the xattrop directory varies over time, so I ran a wc -l as you requested periodically for some time, and then started including a datestamped list of the files that were in the xattrop directory on each brick to see which were persistent. All bricks had files in the xattrop folder, so all results are attached.
> >>>>>>>>>
> >>>>>>>>> Thanks, this info is helpful. I don't see a lot of files. Could you give the output of "gluster volume heal <volname> info"? Is there any directory in there which is LARGE?
> >>>>>>>>>
> >>>>>>>>> Pranith
> >>>>>>>>>
> >>>>>>>>>> Please let me know if there is anything else I can provide.
> >>>>>>>>>>
> >>>>>>>>>> Patrick
> >>>>>>>>>>
> >>>>>>>>>> On Thu, Jan 21, 2016 at 12:01 AM, Pranith Kumar Karampuri < pkara...@redhat.com > wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> hey,
> >>>>>>>>>>> Which process is consuming so much CPU? I went through the logs you gave me. I see that the following files are in gfid mismatch state:
> >>>>>>>>>>>
> >>>>>>>>>>> <066e4525-8f8b-43aa-b7a1-86bbcecc68b9/safebrowsing-backup>,
> >>>>>>>>>>> <1d48754b-b38c-403d-94e2-0f5c41d5f885/recovery.bak>,
> >>>>>>>>>>> <ddc92637-303a-4059-9c56-ab23b1bb6ae9/patch0008.cnvrg>
> >>>>>>>>>>>
> >>>>>>>>>>> Could you give me the output of "ls <brick-path>/indices/xattrop | wc -l" on all the bricks which are acting this way? This will tell us the number of pending self-heals on the system.
> >>>>>>>>>>>
> >>>>>>>>>>> Pranith
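(On a node hosting several bricks, the counts Pranith asks for above can be gathered in one go with a small loop. The brick paths below are the ones from this thread; adjust the list to whichever bricks actually live on the node:

for b in /data/brick01a /data/brick01b /data/brick02a /data/brick02b; do
    echo -n "$b: "
    ls "$b/homegfs/.glusterfs/indices/xattrop" | wc -l
done

Note that on disk the indices directory sits under the brick's .glusterfs, as the listings earlier in this thread show.)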
> >>>>>>>>>>>
> >>>>>>>>>>> On 01/20/2016 09:26 PM, David Robinson wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> resending with parsed logs...
> >>>>>>>>>>>>
> >>>>>>>>>>>>>> I am having issues with 3.6.6 where the load will spike up to 800% for one of the glusterfsd processes and the users can no longer access the system. If I reboot the node, the heal will finish normally after a few minutes and the system will be responsive, but a few hours later the issue will start again. It looks like it is hanging in a heal and spinning up the load on one of the bricks. The heal gets stuck, says it is crawling, and never returns. After a few minutes of the heal saying it is crawling, the load spikes up and the mounts become unresponsive.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Any suggestions on how to fix this? It has us stopped cold, as the users can no longer access the systems when the load spikes... Logs attached.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> System setup info is:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> [root@gfs01a ~]# gluster volume info homegfs
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Volume Name: homegfs
> >>>>>>>>>>>>>> Type: Distributed-Replicate
> >>>>>>>>>>>>>> Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071
> >>>>>>>>>>>>>> Status: Started
> >>>>>>>>>>>>>> Number of Bricks: 4 x 2 = 8
> >>>>>>>>>>>>>> Transport-type: tcp
> >>>>>>>>>>>>>> Bricks:
> >>>>>>>>>>>>>> Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs
> >>>>>>>>>>>>>> Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs
> >>>>>>>>>>>>>> Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs
> >>>>>>>>>>>>>> Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs
> >>>>>>>>>>>>>> Brick5: gfsib02a.corvidtec.com:/data/brick01a/homegfs
> >>>>>>>>>>>>>> Brick6: gfsib02b.corvidtec.com:/data/brick01b/homegfs
> >>>>>>>>>>>>>> Brick7: gfsib02a.corvidtec.com:/data/brick02a/homegfs
> >>>>>>>>>>>>>> Brick8: gfsib02b.corvidtec.com:/data/brick02b/homegfs
> >>>>>>>>>>>>>> Options Reconfigured:
> >>>>>>>>>>>>>> performance.io-thread-count: 32
> >>>>>>>>>>>>>> performance.cache-size: 128MB
> >>>>>>>>>>>>>> performance.write-behind-window-size: 128MB
> >>>>>>>>>>>>>> server.allow-insecure: on
> >>>>>>>>>>>>>> network.ping-timeout: 42
> >>>>>>>>>>>>>> storage.owner-gid: 100
> >>>>>>>>>>>>>> geo-replication.indexing: off
> >>>>>>>>>>>>>> geo-replication.ignore-pid-check: on
> >>>>>>>>>>>>>> changelog.changelog: off
> >>>>>>>>>>>>>> changelog.fsync-interval: 3
> >>>>>>>>>>>>>> changelog.rollover-time: 15
> >>>>>>>>>>>>>> server.manage-gids: on
> >>>>>>>>>>>>>> diagnostics.client-log-level: WARNING
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> [root@gfs01a ~]# rpm -qa | grep gluster
> >>>>>>>>>>>>>> gluster-nagios-common-0.1.1-0.el6.noarch
> >>>>>>>>>>>>>> glusterfs-fuse-3.6.6-1.el6.x86_64
> >>>>>>>>>>>>>> glusterfs-debuginfo-3.6.6-1.el6.x86_64
> >>>>>>>>>>>>>> glusterfs-libs-3.6.6-1.el6.x86_64
> >>>>>>>>>>>>>> glusterfs-geo-replication-3.6.6-1.el6.x86_64
> >>>>>>>>>>>>>> glusterfs-api-3.6.6-1.el6.x86_64
> >>>>>>>>>>>>>> glusterfs-devel-3.6.6-1.el6.x86_64
> >>>>>>>>>>>>>> glusterfs-api-devel-3.6.6-1.el6.x86_64
> >>>>>>>>>>>>>> glusterfs-3.6.6-1.el6.x86_64
> >>>>>>>>>>>>>> glusterfs-cli-3.6.6-1.el6.x86_64
> >>>>>>>>>>>>>> glusterfs-rdma-3.6.6-1.el6.x86_64
> >>>>>>>>>>>>>> samba-vfs-glusterfs-4.1.11-2.el6.x86_64
> >>>>>>>>>>>>>> glusterfs-server-3.6.6-1.el6.x86_64
> >>>>>>>>>>>>>> glusterfs-extra-xlators-3.6.6-1.el6.x86_64
> >>>>>>>>>>>>
> >>>>>>>>>>>> _______________________________________________
> >>>>>>>>>>>> Gluster-devel mailing list
> >>>>>>>>>>>> gluster-de...@gluster.org
> >>>>>>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-devel
> >>>>>>>>>>>
> >>>>>>>>>>> _______________________________________________
> >>>>>>>>>>> Gluster-users mailing list
> >>>>>>>>>>> Gluster-users@gluster.org
> >>>>>>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
> >>
> >> _______________________________________________
> >> Gluster-devel mailing list
> >> gluster-de...@gluster.org
> >> http://www.gluster.org/mailman/listinfo/gluster-devel
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users