Re: [Gluster-users] One client can effectively hang entire gluster array
Not a bad idea for a workaround, but that would require significant investment with our current setup. All of our compute nodes are stateless / have no disks; all storage is network storage. It's probably still not feasible if we added disks, because some simulations produce terabytes of data and we would need some kind of periodic check-and-sync mechanism. I still owe the gluster devs a test of that patch.

On Fri, Aug 19, 2016 at 3:22 PM, Steve Dainard wrote:
> As a potential solution on the compute node side, can you have users copy
> relevant data from the gluster volume to a local disk (i.e. $TMPDIR), operate
> on that disk, write output files to that disk, and then write the results
> back to persistent storage once the job is complete?
>
> There are lots of factors to consider, but this is how we operate in a
> small compute environment trying to avoid over-loading gluster storage
> nodes.
>
> On Fri, Jul 8, 2016 at 6:29 AM, Glomski, Patrick < patrick.glom...@corvidtec.com> wrote:
>
>> Hello, users and devs.
>>
>> TL;DR: One gluster client can essentially cause denial of service /
>> availability loss to the entire gluster array. There's no way to stop it and
>> almost no way to find the bad client. Probably all (at least 3.6 and 3.7)
>> versions are affected.
>>
>> We have two large replicate gluster arrays (3.6.6 and 3.7.11) that are
>> used in a high-performance computing environment. Two file access patterns
>> cause severe issues with glusterfs: some of our scientific codes write
>> hundreds of files (~400-500) simultaneously (one file or more per processor
>> core, so lots of small or large writes), and others read thousands of files
>> (2000-3000) simultaneously to grab metadata from each file (lots of small
>> reads).
>>
>> In either of these situations, one glusterfsd process on whatever peer
>> the client is currently talking to will skyrocket to *nproc* cpu usage
>> (800%, 1600%) and the storage cluster is essentially useless; all other
>> clients will eventually try to read or write data to the overloaded peer
>> and, when that happens, their connections will hang. Heals between peers
>> hang because the load on the peer is around 1.5x the number of cores or
>> more. This occurs in either gluster 3.6 or 3.7, is very repeatable, and
>> happens much too frequently.
>>
>> Even worse, there seems to be no definitive way to diagnose which client
>> is causing the issues. Getting 'volume status <> clients' doesn't help
>> because it reports the total number of bytes read/written by each client:
>> (a) the metadata in question is tiny compared to the multi-gigabyte output
>> files being dealt with, and (b) the byte count is cumulative per client,
>> and since the compute nodes are always up with the filesystems mounted,
>> the byte transfer counts are astronomical. The best solution I've come up
>> with is to blackhole-route traffic from clients one at a time (effectively
>> pushing the traffic over to the other peer), wait a few minutes for the
>> backlogged traffic to dissipate (if it's going to), see if the load on
>> glusterfsd drops, and repeat until I find the client causing the issue. I
>> would *love* any ideas on a better way to find rogue clients.
>>
>> More importantly, though, there must be some mechanism enforced to stop
>> one user from being able to render the entire filesystem unavailable for
>> all other users. In the worst case, I would even prefer a gluster volume
>> option that simply disconnects clients making over some threshold of
>> file-open requests. That's far preferable to a complete availability loss
>> reminiscent of a DDoS attack...
>>
>> Apologies for the essay, and looking forward to any help you can provide.
>>
>> Thanks,
>> Patrick
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users
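[Editorial note] Steve's stage-in / compute / stage-out suggestion can be sketched as a small job wrapper. This is a minimal sketch only: the gluster path, input file names, and the "solver" step are illustrative stand-ins (not from this thread), and a real batch script would use the scheduler-provided $TMPDIR on each node.

```shell
#!/bin/bash
# Sketch of a stage-in / compute / stage-out job wrapper. The gluster
# path, input names, and the compute step are illustrative stand-ins.
set -euo pipefail

# Persistent storage (in practice, the gluster mount); overridable for demo.
GLUSTER_DIR="${GLUSTER_DIR:-$(mktemp -d)}"
mkdir -p "$GLUSTER_DIR/inputs"
[ -e "$GLUSTER_DIR/inputs/case.dat" ] || echo "demo input" > "$GLUSTER_DIR/inputs/case.dat"

# Node-local scratch: all intermediate job I/O lands here, not on gluster.
SCRATCH="$(mktemp -d "${TMPDIR:-/tmp}/job.XXXXXX")"
trap 'rm -rf "$SCRATCH"' EXIT

# Stage in: one bulk read from gluster.
cp -a "$GLUSTER_DIR/inputs" "$SCRATCH/"

# Compute step (stand-in for the real solver): reads and writes local disk only.
mkdir -p "$SCRATCH/outputs"
md5sum "$SCRATCH"/inputs/* > "$SCRATCH/outputs/checksums.txt"

# Stage out: one bulk write back to persistent storage at job end.
mkdir -p "$GLUSTER_DIR/outputs"
cp -a "$SCRATCH/outputs/." "$GLUSTER_DIR/outputs/"
echo "results staged to $GLUSTER_DIR/outputs"
```

The point of the pattern is that gluster sees only two bulk transfers per job instead of thousands of small reads and writes.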
[Gluster-users] One client can effectively hang entire gluster array
Hello, users and devs.

TL;DR: One gluster client can essentially cause denial of service / availability loss to the entire gluster array. There's no way to stop it and almost no way to find the bad client. Probably all (at least 3.6 and 3.7) versions are affected.

We have two large replicate gluster arrays (3.6.6 and 3.7.11) that are used in a high-performance computing environment. Two file access patterns cause severe issues with glusterfs: some of our scientific codes write hundreds of files (~400-500) simultaneously (one file or more per processor core, so lots of small or large writes), and others read thousands of files (2000-3000) simultaneously to grab metadata from each file (lots of small reads).

In either of these situations, one glusterfsd process on whatever peer the client is currently talking to will skyrocket to *nproc* cpu usage (800%, 1600%) and the storage cluster is essentially useless; all other clients will eventually try to read or write data to the overloaded peer and, when that happens, their connections will hang. Heals between peers hang because the load on the peer is around 1.5x the number of cores or more. This occurs in either gluster 3.6 or 3.7, is very repeatable, and happens much too frequently.

Even worse, there seems to be no definitive way to diagnose which client is causing the issues. Getting 'volume status <> clients' doesn't help because it reports the total number of bytes read/written by each client: (a) the metadata in question is tiny compared to the multi-gigabyte output files being dealt with, and (b) the byte count is cumulative per client, and since the compute nodes are always up with the filesystems mounted, the byte transfer counts are astronomical.

The best solution I've come up with is to blackhole-route traffic from clients one at a time (effectively pushing the traffic over to the other peer), wait a few minutes for the backlogged traffic to dissipate (if it's going to), see if the load on glusterfsd drops, and repeat until I find the client causing the issue. I would *love* any ideas on a better way to find rogue clients.

More importantly, though, there must be some mechanism enforced to stop one user from being able to render the entire filesystem unavailable for all other users. In the worst case, I would even prefer a gluster volume option that simply disconnects clients making over some threshold of file-open requests. That's far preferable to a complete availability loss reminiscent of a DDoS attack...

Apologies for the essay, and looking forward to any help you can provide.

Thanks,
Patrick
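[Editorial note] The blackhole-route bisection Patrick describes can be scripted. A hedged sketch follows: the client IPs, CPU threshold, and settle time are placeholders, and because knocking clients offline is invasive the script only prints the `ip route` commands unless APPLY=1 is set.

```shell
#!/bin/bash
# Sketch of the bisection described above: blackhole one suspect client at
# a time and watch glusterfsd CPU. Client IPs, threshold, and settle time
# are illustrative placeholders. Without APPLY=1 it only prints commands.
CLIENTS=(10.0.0.11 10.0.0.12 10.0.0.13)   # suspect client IPs (placeholders)
THRESHOLD=400     # %CPU below which we consider glusterfsd recovered
SETTLE=120        # seconds to let backlogged traffic drain

run() { if [ -n "${APPLY:-}" ]; then "$@"; else echo "+ $*"; fi; }

glusterfsd_cpu() {
    # Integer %CPU of the busiest glusterfsd process (empty if none running).
    ps -C glusterfsd -o %cpu= 2>/dev/null | sort -rn | head -n1 | cut -d. -f1
}

ROGUE=""
for ip in "${CLIENTS[@]}"; do
    run ip route add blackhole "$ip"     # push this client's traffic away
    run sleep "$SETTLE"                  # give the backlog time to dissipate
    cpu="$(glusterfsd_cpu)"
    cpu="${cpu:-0}"
    echo "with $ip blackholed, glusterfsd peak cpu: ${cpu}%"
    run ip route del blackhole "$ip"     # restore the client either way
    if [ "$cpu" -lt "$THRESHOLD" ]; then
        ROGUE="$ip"
        echo "load dropped -> $ip is the likely rogue client"
        break
    fi
done
```

Run as root with APPLY=1 on the overloaded peer; each client is restored as soon as it is ruled in or out.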
Re: [Gluster-users] Multiple questions regarding monitoring of Gluster
If you're not opposed to another dependency, there is a glusterfs-nagios package (python-based) which presents the volumes in a much more useful format for monitoring: http://download.gluster.org/pub/gluster/glusterfs-nagios/1.1.0/

Patrick

On Tue, Jun 21, 2016 at 10:28 AM, Malte Schmidt wrote:
> Under which conditions does "gluster volume status $volume detail" return
> something else than a table?
>
> Typical, expected output:
>
> root@server1:~# gluster volume status vol0 detail
> Status of volume: vol0
> ------------------------------------------------------------------------------
> Brick                : Brick server1:/data/glusterfs/vol0
> TCP Port             : 49152
> RDMA Port            : 0
> Online               : Y
> Pid                  : 2942
> File System          : xfs
> Device               : /dev/mapper/glusterfs
> Mount Options        : rw,relatime,attr2,inode64,noquota
> Inode Size           : 512
> Disk Space Free      : 9.0GB
> Total Disk Space     : 20.0GB
> Inode Count          : 10485760
> Free Inodes          : 8774085
> ------------------------------------------------------------------------------
> Brick                : Brick server2:/data/glusterfs/vol0
> TCP Port             : 49152
> RDMA Port            : 0
> Online               : Y
> Pid                  : 3275
> File System          : xfs
> Device               : /dev/mapper/glusterfs
> Mount Options        : rw,relatime,attr2,inode64,noquota
> Inode Size           : 512
> Disk Space Free      : 9.0GB
> Total Disk Space     : 20.0GB
> Inode Count          : 10485760
> Free Inodes          : 8774085
>
> Are there any conditions under which that table is different? Better
> question: What is the best way of getting this data for usage in Nagios?
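[Editorial note] In the spirit of Malte's question, here is a minimal sketch of scraping that plain-text table for a Nagios-style check. The field names are taken from the output quoted above; the volume name is illustrative, and the glusterfs-nagios package is likely the more robust option.

```shell
#!/bin/bash
# Sketch: distil per-brick health from the 'gluster volume status <vol>
# detail' table shown above. Field names match the quoted output.
parse_detail() {
    awk -F' : ' '
        /^Brick /           { brick = $2 }            # new brick section
        /^Online /          { online[brick] = $2 }    # Y / N
        /^Disk Space Free / { free[brick] = $2 }      # e.g. 9.0GB
        END {
            for (b in online)
                printf "%s online=%s free=%s\n", b, online[b], free[b]
        }'
}

# In production, feed it the live table (volume name is illustrative):
gluster volume status vol0 detail 2>/dev/null | parse_detail
```

A Nagios plugin would then alert on `online=N` or on a free-space threshold.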
Re: [Gluster-users] [Gluster-devel] gluster 3.7.9 permission denied and mv errors
'Failed moves' are still a problem on our backup system. Another instance is attached with gfids, if it's helpful. In this case, the rename after explicitly removing the target location was successful.

mv the files from bkp01 --> bkp00 : 18:41:02
> /bin/mv: cannot move `./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4' to `../bkp00/./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4': File exists
> /bin/mv: cannot move `./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4' to `../bkp00/./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4': File exists
>
> Source:
> # file: data/brick01bkp/gfsbackup/bkp01/./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4
> trusted.bit-rot.version=0x0200571632ea00056a16
> trusted.gfid=0x2246740774424bb78f408d46c1f2a13e
> trusted.pgfid.583e43d6-27ff-4978-90a8-d7057385cf72=0x0001
>
> Target:
> getfattr: Removing leading '/' from absolute path names
> # file: data/brick01bkp/gfsbackup/bkp01/../bkp00/./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4
> trusted.gfid=0x556cd0eabffe4549a5ae43a188491db3
> trusted.glusterfs.dht.linkto=0x6766736261636b75702d636c69656e742d3100
> trusted.pgfid.9e21e545-b254-4d93-ba34-8dd204b41160=0x0001
> getfattr: /data/brick02bkp/gfsbackup/bkp01/./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4: No such file or directory
> getfattr: /data/brick01bkp/gfsbackup/bkp01/./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4: No such file or directory
> getfattr: /data/brick02bkp/gfsbackup/bkp01/./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4: No such file or directory
> getfattr: /data/brick03bkp/gfsbackup/bkp01/./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4: No such file or directory
> getfattr: /data/brick04bkp/gfsbackup/bkp01/./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4: No such file or directory
> getfattr: /data/brick05bkp/gfsbackup/bkp01/./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4: No such file or directory
>
> # file: data/brick02bkp/gfsbackup/bkp01/../bkp00/./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4
> trusted.bit-rot.version=0x0200569bb8d700074a42
> trusted.gfid=0x556cd0eabffe4549a5ae43a188491db3
> trusted.pgfid.9e21e545-b254-4d93-ba34-8dd204b41160=0x0001
>
> getfattr: /data/brick01bkp/gfsbackup/bkp01/../bkp00/./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4: No such file or directory
> getfattr: /data/brick02bkp/gfsbackup/bkp01/../bkp00/./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4: No such file or directory
> getfattr: /data/brick03bkp/gfsbackup/bkp01/../bkp00/./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4: No such file or directory
> getfattr: /data/brick04bkp/gfsbackup/bkp01/../bkp00/./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4: No such file or directory
> getfattr: /data/brick05bkp/gfsbackup/bkp01/../bkp00/./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4: No such file or directory
>
> stat: cannot stat `"../bkp00/./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4"': No such file or directory
> retry: renaming ./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4 -> ../bkp00/./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4
> Rename Succeeded!
> ./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4 -> ../bkp00/./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4

Thanks for any assistance,
Patrick

On Tue, May 3, 2016 at 5:02 PM, Glomski, Patrick < patrick.glom...@corvidtec.com> wrote:
> Attaching a text file with the same content that is easier to read.
>
> Patrick
>
> On Tue, May 3, 2016 at 4:59 PM, Glomski, Patrick < patrick.glom...@corvidtec.com> wrote:
>
>> Raghavendra,
>>
>> Last night the backup had four of these errors and only one of the
>> 'retried moves' succeeded. The only one to succeed in moving the file the
>> second time had target files on a different gluster peer (gfs01bkp). Not
>> sure if that is significant.
>>
>> Note that I cannot stat the target file over the FUSE mount for any of
>> these, but it exists on the bricks. Running an 'ls' on the directory
>> containing the file (via FUSE) does not fix the issue. Source and target
>> xattrs are appended for all bricks on all machines in the distributed
>> volume.
Re: [Gluster-users] [Gluster-devel] gluster 3.7.9 permission denied and mv errors
Attaching a text file with the same content that is easier to read.

Patrick

On Tue, May 3, 2016 at 4:59 PM, Glomski, Patrick < patrick.glom...@corvidtec.com> wrote:
> Raghavendra,
>
> Last night the backup had four of these errors and only one of the
> 'retried moves' succeeded. The only one to succeed in moving the file the
> second time had target files on a different gluster peer (gfs01bkp). Not
> sure if that is significant.
>
> Note that I cannot stat the target file over the FUSE mount for any of
> these, but it exists on the bricks. Running an 'ls' on the directory
> containing the file (via FUSE) does not fix the issue. Source and target
> xattrs are appended for all bricks on all machines in the distributed
> volume.
>
> Let me know if there's any other information it would be useful to gather,
> as this issue seems to recur frequently.
>
> Thanks,
> Patrick
>
>> # Move failures
>>
>> /bin/mv: cannot move `./homegfs/hpc_shared/motorsports/056-1/data_collected3' to `../bkp00/./homegfs/hpc_shared/motorsports/056-1/data_collected3': File exists
>> /bin/mv: cannot move `./homegfs/hpc_shared/motorsports/090-1/data_collected3' to `../bkp00/./homegfs/hpc_shared/motorsports/090-1/data_collected3': File exists
>> /bin/mv: cannot move `./homegfs/hpc_shared/motorsports/057-2/data_collected3' to `../bkp00/./homegfs/hpc_shared/motorsports/057-2/data_collected3': File exists
>> /bin/mv: cannot move `./homegfs/hpc_shared/motorsports/54/data_collected4' to `../bkp00/./homegfs/hpc_shared/motorsports/54/data_collected4': File exists
>>
>> /bin/mv: cannot move `./homegfs/hpc_shared/motorsports/056-1/data_collected3' to `../bkp00/./homegfs/hpc_shared/motorsports/056-1/data_collected3': File exists
>> /bin/mv: cannot move `./homegfs/hpc_shared/motorsports/090-1/data_collected3' to `../bkp00/./homegfs/hpc_shared/motorsports/090-1/data_collected3': File exists
>> /bin/mv: cannot move `./homegfs/hpc_shared/motorsports/057-2/data_collected3' to `../bkp00/./homegfs/hpc_shared/motorsports/057-2/data_collected3': File exists
>> /bin/mv: cannot move `./homegfs/hpc_shared/motorsports/54/data_collected4' to `../bkp00/./homegfs/hpc_shared/motorsports/54/data_collected4': File exists
>>
>> retry: renaming ./homegfs/hpc_shared/motorsports/056-1/data_collected3 -> ../bkp00/./homegfs/hpc_shared/motorsports/056-1/data_collected3
>>
>> source xattrs
>> gfs01bkp
>> getfattr: /data/brick01bkp/gfsbackup/bkp01/./homegfs/hpc_shared/motorsports/056-1/data_collected3: No such file or directory
>> getfattr: /data/brick02bkp/gfsbackup/bkp01/./homegfs/hpc_shared/motorsports/056-1/data_collected3: No such file or directory
>>
>> gfs02bkp
>> getfattr: /data/brick01bkp/gfsbackup/bkp01/./homegfs/hpc_shared/motorsports/056-1/data_collected3: No such file or directory
>> # file: data/brick02bkp/gfsbackup/bkp01/./homegfs/hpc_shared/motorsports/056-1/data_collected3
>> trusted.bit-rot.version=0x0200570308980001d157
>> trusted.gfid=0xe07abd8ae861442ebc0df8b20719af30
>> trusted.pgfid.1776adb6-2925-49d3-9cca-8a04c29f4c05=0x0001
>>
>> getfattr: Removing leading '/' from absolute path names
>> getfattr: /data/brick03bkp/gfsbackup/bkp01/./homegfs/hpc_shared/motorsports/056-1/data_collected3: No such file or directory
>> getfattr: /data/brick04bkp/gfsbackup/bkp01/./homegfs/hpc_shared/motorsports/056-1/data_collected3: No such file or directory
>> getfattr: /data/brick05bkp/gfsbackup/bkp01/./homegfs/hpc_shared/motorsports/056-1/data_collected3: No such file or directory
>>
>> target xattrs
>> gfs01bkp
>> getfattr: /data/brick01bkp/gfsbackup/bkp01/../bkp00/./homegfs/hpc_shared/motorsports/056-1/data_collected3: No such file or directory
>> getfattr: /data/brick02bkp/gfsbackup/bkp01/../bkp00/./homegfs/hpc_shared/motorsports/056-1/data_collected3: No such file or directory
>>
>> gfs02bkp
>> # file: data/brick01bkp/gfsbackup/bkp01/../bkp00/./homegfs/hpc_shared/motorsports/056-1/data_collected3
>> trusted.bit-rot.version=0x0200569bb8d20003ed00
>>
Re: [Gluster-users] [Gluster-devel] gluster 3.7.9 permission denied and mv errors
02bkp/gfsbackup/bkp01/../bkp00/./homegfs/hpc_shared/motorsports/54/data_collected4
> trusted.bit-rot.version=0x0200569bb8d700074a42
> trusted.gfid=0x980c12097507431d953ee458ec14ca4a
> trusted.pgfid.6a8ecf7c-5597-4725-9764-455f7e267667=0x0001
>
> getfattr: Removing leading '/' from absolute path names
>
> gfs02bkp
> getfattr: /data/brick01bkp/gfsbackup/bkp01/../bkp00/./homegfs/hpc_shared/motorsports/54/data_collected4: No such file or directory
> getfattr: /data/brick02bkp/gfsbackup/bkp01/../bkp00/./homegfs/hpc_shared/motorsports/54/data_collected4: No such file or directory
> getfattr: /data/brick03bkp/gfsbackup/bkp01/../bkp00/./homegfs/hpc_shared/motorsports/54/data_collected4: No such file or directory
> getfattr: /data/brick04bkp/gfsbackup/bkp01/../bkp00/./homegfs/hpc_shared/motorsports/54/data_collected4: No such file or directory
> getfattr: /data/brick05bkp/gfsbackup/bkp01/../bkp00/./homegfs/hpc_shared/motorsports/54/data_collected4: No such file or directory
>
> stat: cannot stat `"../bkp00/./homegfs/hpc_shared/motorsports/54/data_collected4"': No such file or directory
> Rename Succeeded!

On Fri, Apr 29, 2016 at 10:21 AM, Glomski, Patrick < patrick.glom...@corvidtec.com> wrote:
> Raghavendra,
>
> This error is occurring in a shell script moving files between directories
> on a FUSE mount when overwriting an old file with a newer file (it's a
> backup script, moving an incremental backup of a file into a 'rolling full
> backup' directory).
>
> As a temporary workaround, we parse the output of this shell script for
> move errors and handle the errors as they happen. Simply re-moving the
> files fails, so we stat the destination (to see if we can learn anything
> about the type of file that causes this behavior), delete the destination,
> and try the move again (success!). Typical output is as follows:
>
>> /bin/mv: cannot move `./homegfs/hpc_shared/motorsports/gmics/Raven/p11/149/data_collected4' to `../bkp00/./homegfs/hpc_shared/motorsports/gmics/Raven/p11/149/data_collected4': File exists
>> /bin/mv: cannot move `./homegfs/hpc_shared/motorsports/gmics/Raven/p11/149/data_collected4' to `../bkp00/./homegfs/hpc_shared/motorsports/gmics/Raven/p11/149/data_collected4': File exists
>> File: `../bkp00/./homegfs/hpc_shared/motorsports/gmics/Raven/p11/149/data_collected4'
>> Size: 1714        Blocks: 4          IO Block: 131072   regular file
>> Device: 13h/19d   Inode: 11051758947722304158   Links: 1
>> Access: (0660/-rw-rw----)  Uid: ( 628/pkeistler)  Gid: ( 2020/ gmirl)
>> Access: 2016-01-20 17:20:45.0 -0500
>> Modify: 2015-11-06 15:20:41.0 -0500
>> Change: 2016-01-27 03:35:00.434712146 -0500
>> retry: renaming ./homegfs/hpc_shared/motorsports/gmics/Raven/p11/149/data_collected4 -> ../bkp00/./homegfs/hpc_shared/motorsports/gmics/Raven/p11/149/data_collected4
>
> Not sure if that description rings any bells as to what the problem might
> be, but if not, I added some code to print out the 'getattr' for the source
> and destination file on all of the bricks (before we delete the
> destination) and will post to this thread the next time we have that issue.
>
> Thanks,
> Patrick
>
> On Fri, Apr 29, 2016 at 8:15 AM, Raghavendra G wrote:
>>
>> On Wed, Apr 13, 2016 at 10:00 PM, David F. Robinson < david.robin...@corvidtec.com> wrote:
>>
>>> I am running into two problems (possibly related?).
>>>
>>> 1) Every once in a while, when I do a 'rm -rf DIRNAME', it comes back
>>> with an error:
>>> rm: cannot remove `DIRNAME` : Directory not empty
>>>
>>> If I try the 'rm -rf' again after the error, it deletes the
>>> directory. The issue is that I have scripts that clean up directories, and
>>> they are failing unless I go through the deletes a 2nd time.
>>
>> What kind of mount are you using? Is it a FUSE or NFS mount? Recently we
>> saw a similar issue on NFS clients on RHEL6 where rm -rf used to fail with
>> ENOTEMPTY in some specific cases.
>>
>>> 2) I have different scripts to move large numbers of files (5-25k)
>>> from one directory to another. Sometimes I receive an error:
>>> /bin/mv: cannot move `xyz` to `../bkp00/xyz`: File exists
>>
>> Does ./bkp00/xyz exist on backend?
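[Editorial note] The temporary workaround Patrick describes (stat the destination, delete it, retry the move) can be wrapped in a small helper. This is a sketch only, not the actual backup script; the paths it handles are whatever the caller passes in.

```shell
#!/bin/bash
# Sketch of the workaround described above: if mv fails with "File exists"
# (which 'mv -f' should never report on a healthy mount), record the
# blocking destination's metadata, remove it, and retry once.
safe_mv() {
    local src="$1" dst="$2" err
    if ! err="$(/bin/mv -f -- "$src" "$dst" 2>&1)"; then
        case "$err" in
            *'File exists'*)
                stat -- "$dst" || true   # capture what was blocking the move
                rm -f -- "$dst"
                echo "retry: renaming $src -> $dst"
                /bin/mv -f -- "$src" "$dst" && echo "Rename Succeeded!"
                ;;
            *)  printf '%s\n' "$err" >&2   # any other failure: report and stop
                return 1
                ;;
        esac
    fi
}
```

On a healthy destination the first `mv -f` simply overwrites; the retry branch only fires on the gluster "File exists" anomaly.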
Re: [Gluster-users] [Gluster-devel] gluster 3.7.9 permission denied and mv errors
Raghavendra,

This error is occurring in a shell script moving files between directories on a FUSE mount when overwriting an old file with a newer file (it's a backup script, moving an incremental backup of a file into a 'rolling full backup' directory).

As a temporary workaround, we parse the output of this shell script for move errors and handle the errors as they happen. Simply re-moving the files fails, so we stat the destination (to see if we can learn anything about the type of file that causes this behavior), delete the destination, and try the move again (success!). Typical output is as follows:

> /bin/mv: cannot move `./homegfs/hpc_shared/motorsports/gmics/Raven/p11/149/data_collected4' to `../bkp00/./homegfs/hpc_shared/motorsports/gmics/Raven/p11/149/data_collected4': File exists
> /bin/mv: cannot move `./homegfs/hpc_shared/motorsports/gmics/Raven/p11/149/data_collected4' to `../bkp00/./homegfs/hpc_shared/motorsports/gmics/Raven/p11/149/data_collected4': File exists
> File: `../bkp00/./homegfs/hpc_shared/motorsports/gmics/Raven/p11/149/data_collected4'
> Size: 1714        Blocks: 4          IO Block: 131072   regular file
> Device: 13h/19d   Inode: 11051758947722304158   Links: 1
> Access: (0660/-rw-rw----)  Uid: ( 628/pkeistler)  Gid: ( 2020/ gmirl)
> Access: 2016-01-20 17:20:45.0 -0500
> Modify: 2015-11-06 15:20:41.0 -0500
> Change: 2016-01-27 03:35:00.434712146 -0500
> retry: renaming ./homegfs/hpc_shared/motorsports/gmics/Raven/p11/149/data_collected4 -> ../bkp00/./homegfs/hpc_shared/motorsports/gmics/Raven/p11/149/data_collected4

Not sure if that description rings any bells as to what the problem might be, but if not, I added some code to print out the 'getattr' for the source and destination file on all of the bricks (before we delete the destination) and will post to this thread the next time we have that issue.

Thanks,
Patrick

On Fri, Apr 29, 2016 at 8:15 AM, Raghavendra G wrote:
>
> On Wed, Apr 13, 2016 at 10:00 PM, David F. Robinson < david.robin...@corvidtec.com> wrote:
>
>> I am running into two problems (possibly related?).
>>
>> 1) Every once in a while, when I do a 'rm -rf DIRNAME', it comes back
>> with an error:
>> rm: cannot remove `DIRNAME` : Directory not empty
>>
>> If I try the 'rm -rf' again after the error, it deletes the
>> directory. The issue is that I have scripts that clean up directories, and
>> they are failing unless I go through the deletes a 2nd time.
>
> What kind of mount are you using? Is it a FUSE or NFS mount? Recently we
> saw a similar issue on NFS clients on RHEL6 where rm -rf used to fail with
> ENOTEMPTY in some specific cases.
>
>> 2) I have different scripts to move large numbers of files (5-25k) from
>> one directory to another. Sometimes I receive an error:
>> /bin/mv: cannot move `xyz` to `../bkp00/xyz`: File exists
>
> Does ./bkp00/xyz exist on backend? If yes, what is the value of gfid xattr
> (key: "trusted.gfid") for "xyz" and "./bkp00/xyz" on backend bricks (I need
> gfid from all the bricks) when this issue happens?
>
>> The move is done using '/bin/mv -f', so it should overwrite the file
>> if it exists. I have tested this with hundreds of files, and it works as
>> expected. However, every few days the script that moves the files will
>> have problems with 1 or 2 files during the move. This is one move problem
>> out of roughly 10,000 files that are being moved, and I cannot figure out
>> any reason for the intermittent problem.
>>
>> Setup details for my gluster configuration shown below.
>>
>> [root@gfs01bkp logs]# gluster volume info
>>
>> Volume Name: gfsbackup
>> Type: Distribute
>> Volume ID: e78d5123-d9bc-4d88-9c73-61d28abf0b41
>> Status: Started
>> Number of Bricks: 7
>> Transport-type: tcp
>> Bricks:
>> Brick1: gfsib01bkp.corvidtec.com:/data/brick01bkp/gfsbackup
>> Brick2: gfsib01bkp.corvidtec.com:/data/brick02bkp/gfsbackup
>> Brick3: gfsib02bkp.corvidtec.com:/data/brick01bkp/gfsbackup
>> Brick4: gfsib02bkp.corvidtec.com:/data/brick02bkp/gfsbackup
>> Brick5: gfsib02bkp.corvidtec.com:/data/brick03bkp/gfsbackup
>> Brick6: gfsib02bkp.corvidtec.com:/data/brick04bkp/gfsbackup
>> Brick7: gfsib02bkp.corvidtec.com:/data/brick05bkp/gfsbackup
>> Options Reconfigured:
>> nfs.disable: off
>> server.allow-insecure: on
>> storage.owner-gid: 100
>> server.manage-gids: on
>> cluster.lookup-optimize: on
>> server.event-threads: 8
>> client.event-threads: 8
>> changelog.changelog: off
>> storage.build-pgfid: on
>> performance.readdir-ahead: on
>> diagnostics.brick-log-level: WARNING
>> diagnostics.client-log-level: WARNING
>> cluster.rebal-throttle: aggressive
>> performance.cache-size: 1024MB
>> performance.write-behind-window-size: 10MB
>>
>> [root@gfs01bkp logs]# rpm -qa | grep gluster
>> glusterfs-server-3.7.9-1.el6.x86_64
>> glusterfs-debuginfo-3.7.9-1.el6.x86_64
>> glusterfs-api-3.7.9-1.el6.x86_64
>> glusterfs-resource-agents-3.7.9-1.el6.noarch
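[Editorial note] To collect the per-brick gfid xattrs Raghavendra asked for, something along these lines could be run on each storage server. The brick glob and the relative file path are illustrative (they mirror the paths in this thread); the getfattr flags are standard options from the attr package.

```shell
#!/bin/bash
# Sketch: dump trusted.* xattrs (trusted.gfid, dht linkto, ...) for one
# file on every brick, in both the source (bkp01) and target (bkp00)
# trees. Brick glob and relative path are illustrative.
REL="${REL:-homegfs/hpc_shared/motorsports/54/data_collected4}"
REPORT=""
for brick in /data/brick0?bkp/gfsbackup; do
    for f in "$brick/bkp01/$REL" "$brick/bkp00/$REL"; do
        if x="$(getfattr --absolute-names -d -m 'trusted\.' -e hex "$f" 2>/dev/null)"; then
            REPORT+="$x"$'\n'            # file present: record its xattrs
        else
            REPORT+="absent: $f"$'\n'    # file not on this brick
        fi
    done
done
printf '%s' "$REPORT"
```

Comparing the `trusted.gfid` values across bricks (and spotting `trusted.glusterfs.dht.linkto` entries) is exactly the evidence requested above.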
[Gluster-users] Gluster + Infiniband + 3.x kernel -> hard crash?
We run gluster 3.7 in a distributed replicated setup. Infiniband (tcp) links the gluster peers together, and clients use the ethernet interface. This setup is stable running CentOS 6.x with the most recent infiniband drivers provided by Mellanox; uptime was 170 days when we took it down to wipe the systems and update to CentOS 7.

When the exact same setup is loaded onto a CentOS 7 machine (minor setup differences, but basically the same; setup is handled by ansible), the peers will (seemingly randomly) experience a hard crash and need to be power-cycled. There is no output on the screen and nothing in the logs. After rebooting, the peer reconnects, heals whatever files it missed, and everything is happy again. Maximum uptime for any given peer is 20 days. Thanks to the replication, clients maintain connectivity, but from a system administration perspective it's driving me crazy!

We run other storage servers with the same infiniband and CentOS 7 setup, except that they use NFS instead of gluster. NFS shares are served through infiniband to some machines and ethernet to others. Is it possible that gluster's (and only gluster's) use of the infiniband kernel module to send tcp packets to its peers on a 3.x kernel is causing the system to have a hard crash? Pretty specific problem and it doesn't make much sense to me, but that's sure where the evidence seems to point.

Anyone running CentOS 7 gluster arrays with infiniband out there to confirm that it works fine for them? Gluster devs care to chime in with a better theory? I'd love for this random crashing to stop.

Thanks,
Patrick
Re: [Gluster-users] [Gluster-devel] heal hanging
The Samba version is 4.1.17, the build you maintain at download.gluster.org; the vfs plugin comes packaged with it:
http://download.gluster.org/pub/gluster/glusterfs/samba/EPEL.repo/epel-6/x86_64/

# smbd --version
Version 4.1.17
# rpm -qa | grep samba-vfs-glusterfs
samba-vfs-glusterfs-4.1.17-4.el6rhs.x86_64

Let us know if there's anything else we can provide,
Patrick

On Thu, Jan 21, 2016 at 10:07 PM, Raghavendra Talur wrote:
>
> On Jan 22, 2016 7:27 AM, "Pranith Kumar Karampuri" wrote:
> >
> > On 01/22/2016 07:19 AM, Pranith Kumar Karampuri wrote:
> >>
> >> On 01/22/2016 07:13 AM, Glomski, Patrick wrote:
> >>>
> >>> We use the samba glusterfs virtual filesystem (the current version provided on download.gluster.org), but no windows clients connecting directly.
> >>
> >> Hmm.. Is there a way to disable using this and check if the CPU% still increases? What getxattr of "glusterfs.get_real_filename " does is to scan the entire directory looking for strcasecmp(, ). If anything matches then it will return the . But the problem is that the scan is costly. So I wonder if this is the reason for the CPU spikes.
> >
> > +Raghavendra Talur, +Poornima
> >
> > Raghavendra, Poornima,
> >     When are these getxattrs triggered? Did you guys see any brick CPU spikes before? I initially thought it could be because of big directory heals, but this is happening even when no self-heals are required, so I had to move away from that theory.
>
> These getxattrs are triggered when a SMB client performs a path-based operation. It is necessary then that some client was connected.
>
> The last fix to go in that code for 3.6 was http://review.gluster.org/#/c/10403/. I am not able to determine which release of 3.6 it made into. Will update.
>
> Also we would need the version of Samba installed, including the vfs plugin package.
>
> There is a for loop of strcmp involved here which does take a lot of CPU. It should be for short bursts though and is expected and harmless.
>
> > Pranith
> >
> >> Pranith
> >>>
> >>> On Thu, Jan 21, 2016 at 8:37 PM, Pranith Kumar Karampuri < pkara...@redhat.com> wrote:
> >>>>
> >>>> Do you have any windows clients? I see a lot of getxattr calls for "glusterfs.get_real_filename" which lead to full readdirs of the directories on the brick.
> >>>>
> >>>> Pranith
> >>>>
> >>>> On 01/22/2016 12:51 AM, Glomski, Patrick wrote:
> >>>>>
> >>>>> Pranith, could this kind of behavior be self-inflicted by us deleting files directly from the bricks? We have done that in the past to clean up issues where gluster wouldn't allow us to delete from the mount.
> >>>>>
> >>>>> If so, is it feasible to clean them up by running a search on the .glusterfs directories directly and removing files with a reference count of 1 that are non-zero size (or directly checking the xattrs to be sure that it's not a DHT link)?
> >>>>>
> >>>>> find /data/brick01a/homegfs/.glusterfs -type f -not -empty -links -2 -exec rm -f "{}" \;
> >>>>>
> >>>>> Is there anything I'm inherently missing with that approach that will further corrupt the system?
> >>>>>
> >>>>> On Thu, Jan 21, 2016 at 1:02 PM, Glomski, Patrick < patrick.glom...@corvidtec.com> wrote:
> >>>>>>
> >>>>>> Load spiked again: ~1200% CPU on gfs02a for glusterfsd. Crawl has been running on one of the bricks on gfs02b for 25 min or so and users cannot access the volume.
> >>>>>>
> >>>>>> I re-listed the xattrop directories as well as a 'top' entry and heal statistics. Then I restarted the gluster services on gfs02a.
> >>>>>>
> >>>>>> === top ===
> >>>>>> PID  USER  PR  NI  VIRT   RES   SHR  S  %CPU   %MEM  TIME+     COMMAND
> >>>>>> 8969 root  20  0   2815m  204m  3588 S  1181.0 0.6   591:06.93 glusterfsd
> >>>>>>
> >>>>>> === xattrop ===
> >>>>>> /data/brick01a/homegfs/.glusterfs/indices/xattrop:
> >>>>>> xattrop-41f19453-91e4-437c-afa9-3b25614de210 xattrop-9b
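[Editorial note] Since whether those brick-side deletions are safe was exactly the open question in this exchange, a non-destructive variant of the find quoted above just lists the candidates instead of removing them (brick path is illustrative):

```shell
#!/bin/bash
# Dry-run variant of the find quoted above: list regular, non-empty files
# under .glusterfs with link count 1 (i.e. no remaining hard link from the
# brick namespace) WITHOUT deleting anything. Brick path is illustrative.
BRICK="${BRICK:-/data/brick01a/homegfs}"
find "$BRICK/.glusterfs" -type f -not -empty -links -2 -print 2>/dev/null
```

Reviewing that list (and the files' xattrs) before swapping `-print` back to `-exec rm -f "{}" \;` avoids deleting something the volume still references.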
Re: [Gluster-users] [Gluster-devel] heal hanging
Last entry for get_real_filename on any of the bricks was when we turned off the samba gfapi vfs plugin earlier today: /var/log/glusterfs/bricks/data-brick01a-homegfs.log:[2016-01-21 15:13:00.008239] E [server-rpc-fops.c:768:server_getxattr_cbk] 0-homegfs-server: 105: GETXATTR /wks_backup (40e582d6-b0c7-4099-ba88-9168a3c32ca6) (glusterfs.get_real_filename:desktop.ini) ==> (Permission denied) We'll get back to you with those traces when %cpu spikes again. As with most sporadic problems, as soon as you want something out of it, the issue becomes harder to reproduce. On Thu, Jan 21, 2016 at 9:21 PM, Pranith Kumar Karampuri < pkara...@redhat.com> wrote: > > > On 01/22/2016 07:25 AM, Glomski, Patrick wrote: > > Unfortunately, all samba mounts to the gluster volume through the gfapi > vfs plugin have been disabled for the last 6 hours or so and frequency of > %cpu spikes is increased. We had switched to sharing a fuse mount through > samba, but I just disabled that as well. There are no samba shares of this > volume now. The spikes now happen every thirty minutes or so. We've > resorted to just rebooting the machine with high load for the present. > > > Could you see if the logs of following type are not at all coming? > [2016-01-21 15:13:00.005736] E [server-rpc-fops.c:768:server_getxattr_cbk] > 0-homegfs-server: 110: GETXATTR /wks_backup (40e582d6-b0c7-4099-ba88-9168a3c > 32ca6) (glusterfs.get_real_filename:desktop.ini) ==> (Permission denied) > > These are operations that failed. Operations that succeed are the ones > that will scan the directory. But I don't have a way to find them other > than using tcpdumps. > > At the moment I have 2 theories: > 1) these get_real_filename calls > 2) [2016-01-21 16:10:38.017828] E [server-helpers.c:46:gid_resolve] > 0-gid-cache: getpwuid_r(494) failed > " > > Yessir they are. Normally, sssd would look to the local cache file in > /var/lib/sss/db/ first, to get any group or userid information, then go out > to the domain controller. 
I put the options that we are using on our GFS > volumes below… Thanks for your help. > > > > We had been running sssd with sssd_nss and sssd_be sub-processes on these > systems for a long time, under the GFS 3.5.2 code, and not run into the > problem that David described with the high cpu usage on sssd_nss. > > *" *That was Tom Young's email 1.5 years back when we debugged it. But > the process which was consuming lot of cpu is sssd_nss. So I am not sure if > it is same issue. Let us debug to see '1)' doesn't happen. The gstack > traces I asked for should also help. > > > Pranith > > > On Thu, Jan 21, 2016 at 8:49 PM, Pranith Kumar Karampuri < > pkara...@redhat.com> wrote: > >> >> >> On 01/22/2016 07:13 AM, Glomski, Patrick wrote: >> >> We use the samba glusterfs virtual filesystem (the current version >> provided on download.gluster.org), but no windows clients connecting >> directly. >> >> >> Hmm.. Is there a way to disable using this and check if the CPU% still >> increases? What getxattr of "glusterfs.get_real_filename " does is >> to scan the entire directory looking for strcasecmp(, >> ). If anything matches then it will return the >> . But the problem is the scan is costly. So I wonder if >> this is the reason for the CPU spikes. >> >> Pranith >> >> >> On Thu, Jan 21, 2016 at 8:37 PM, Pranith Kumar Karampuri < >> pkara...@redhat.com> wrote: >> >>> Do you have any windows clients? I see a lot of getxattr calls for >>> "glusterfs.get_real_filename" which lead to full readdirs of the >>> directories on the brick. >>> >>> Pranith >>> >>> On 01/22/2016 12:51 AM, Glomski, Patrick wrote: >>> >>> Pranith, could this kind of behavior be self-inflicted by us deleting >>> files directly from the bricks? We have done that in the past to clean up >>> an issues where gluster wouldn't allow us to delete from the mount. 
>>> >>> If so, is it feasible to clean them up by running a search on the >>> .glusterfs directories directly and removing files with a reference count >>> of 1 that are non-zero size (or directly checking the xattrs to be sure >>> that it's not a DHT link). >>> >>> find /data/brick01a/homegfs/.glusterfs -type f -not -empty -links -2 >>> -exec rm -f "{}" \; >>> >>> Is there anything I'm inherently missing with that approach that will >>> further corrupt the system? >>> >>> >>> On Thu, Jan 2
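Pranith notes above that successful get_real_filename scans do not show up in the brick logs (only failures do) and would need tcpdump to spot. A hedged capture sketch; the interface and brick port range are assumptions (3.x bricks normally bind from 49152 upward; confirm the actual ports with `gluster volume status`):

```shell
# Watch brick traffic for the costly xattr by name. -A prints packet
# payloads as ASCII so the key is grep-able. Run on a storage node as root.
tcpdump -i any -s 0 -A 'tcp portrange 49152-49251' \
    | grep --line-buffered 'glusterfs.get_real_filename'
```

To see which client is issuing the lookups, add context lines (e.g. `grep -B5`) so the tcpdump header line carrying the source IP is kept alongside each match.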
Re: [Gluster-users] [Gluster-devel] heal hanging
Unfortunately, all samba mounts to the gluster volume through the gfapi vfs plugin have been disabled for the last 6 hours or so and the frequency of %cpu spikes has increased. We had switched to sharing a fuse mount through samba, but I just disabled that as well. There are no samba shares of this volume now. The spikes now happen every thirty minutes or so. We've resorted to just rebooting the machine with high load for the present. On Thu, Jan 21, 2016 at 8:49 PM, Pranith Kumar Karampuri < pkara...@redhat.com> wrote: > > > On 01/22/2016 07:13 AM, Glomski, Patrick wrote: > > We use the samba glusterfs virtual filesystem (the current version > provided on download.gluster.org), but no windows clients connecting > directly. > > > Hmm.. Is there a way to disable using this and check if the CPU% still > increases? What getxattr of "glusterfs.get_real_filename " does is > to scan the entire directory looking for strcasecmp(, > ). If anything matches then it will return the > . But the problem is the scan is costly. So I wonder if > this is the reason for the CPU spikes. > > Pranith > > > On Thu, Jan 21, 2016 at 8:37 PM, Pranith Kumar Karampuri < > pkara...@redhat.com> wrote: > >> Do you have any windows clients? I see a lot of getxattr calls for >> "glusterfs.get_real_filename" which lead to full readdirs of the >> directories on the brick. >> >> Pranith >> >> On 01/22/2016 12:51 AM, Glomski, Patrick wrote: >> >> Pranith, could this kind of behavior be self-inflicted by us deleting >> files directly from the bricks? We have done that in the past to clean up >> an issues where gluster wouldn't allow us to delete from the mount. >> >> If so, is it feasible to clean them up by running a search on the >> .glusterfs directories directly and removing files with a reference count >> of 1 that are non-zero size (or directly checking the xattrs to be sure >> that it's not a DHT link). 
>> >> find /data/brick01a/homegfs/.glusterfs -type f -not -empty -links -2 >> -exec rm -f "{}" \; >> >> Is there anything I'm inherently missing with that approach that will >> further corrupt the system? >> >> >> On Thu, Jan 21, 2016 at 1:02 PM, Glomski, Patrick < >> patrick.glom...@corvidtec.com> wrote: >> >>> Load spiked again: ~1200%cpu on gfs02a for glusterfsd. Crawl has been >>> running on one of the bricks on gfs02b for 25 min or so and users cannot >>> access the volume. >>> >>> I re-listed the xattrop directories as well as a 'top' entry and heal >>> statistics. Then I restarted the gluster services on gfs02a. >>> >>> === top === >>> PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ >>> COMMAND >>> 8969 root 20 0 2815m 204m 3588 S 1181.0 0.6 591:06.93 >>> glusterfsd >>> >>> === xattrop === >>> /data/brick01a/homegfs/.glusterfs/indices/xattrop: >>> xattrop-41f19453-91e4-437c-afa9-3b25614de210 >>> xattrop-9b815879-2f4d-402b-867c-a6d65087788c >>> >>> /data/brick02a/homegfs/.glusterfs/indices/xattrop: >>> xattrop-70131855-3cfb-49af-abce-9d23f57fb393 >>> xattrop-dfb77848-a39d-4417-a725-9beca75d78c6 >>> >>> /data/brick01b/homegfs/.glusterfs/indices/xattrop: >>> e6e47ed9-309b-42a7-8c44-28c29b9a20f8 >>> xattrop-5c797a64-bde7-4eac-b4fc-0befc632e125 >>> xattrop-38ec65a1-00b5-4544-8a6c-bf0f531a1934 >>> xattrop-ef0980ad-f074-4163-979f-16d5ef85b0a0 >>> >>> /data/brick02b/homegfs/.glusterfs/indices/xattrop: >>> xattrop-7402438d-0ee7-4fcf-b9bb-b561236f99bc >>> xattrop-8ffbf5f7-ace3-497d-944e-93ac85241413 >>> >>> /data/brick01a/homegfs/.glusterfs/indices/xattrop: >>> xattrop-0115acd0-caae-4dfd-b3b4-7cc42a0ff531 >>> >>> /data/brick02a/homegfs/.glusterfs/indices/xattrop: >>> xattrop-7e20fdb1-5224-4b9a-be06-568708526d70 >>> >>> /data/brick01b/homegfs/.glusterfs/indices/xattrop: >>> 8034bc06-92cd-4fa5-8aaf-09039e79d2c8 >>> c9ce22ed-6d8b-471b-a111-b39e57f0b512 >>> 94fa1d60-45ad-4341-b69c-315936b51e8d >>> xattrop-9c04623a-64ce-4f66-8b23-dbaba49119c7 >>> >>> 
/data/brick02b/homegfs/.glusterfs/indices/xattrop: >>> xattrop-b8c8f024-d038-49a2-9a53-c54ead09111d >>> >>> >>> === heal stats === >>> >>> homegfs [b0-gfsib01a] : Starting time of crawl : Thu Jan 21 >>> 12:36:45 2016 >&g
Re: [Gluster-users] [Gluster-devel] heal hanging
We use the samba glusterfs virtual filesystem (the current version provided on download.gluster.org), but no windows clients connecting directly. On Thu, Jan 21, 2016 at 8:37 PM, Pranith Kumar Karampuri < pkara...@redhat.com> wrote: > Do you have any windows clients? I see a lot of getxattr calls for > "glusterfs.get_real_filename" which lead to full readdirs of the > directories on the brick. > > Pranith > > On 01/22/2016 12:51 AM, Glomski, Patrick wrote: > > Pranith, could this kind of behavior be self-inflicted by us deleting > files directly from the bricks? We have done that in the past to clean up > an issues where gluster wouldn't allow us to delete from the mount. > > If so, is it feasible to clean them up by running a search on the > .glusterfs directories directly and removing files with a reference count > of 1 that are non-zero size (or directly checking the xattrs to be sure > that it's not a DHT link). > > find /data/brick01a/homegfs/.glusterfs -type f -not -empty -links -2 -exec > rm -f "{}" \; > > Is there anything I'm inherently missing with that approach that will > further corrupt the system? > > > On Thu, Jan 21, 2016 at 1:02 PM, Glomski, Patrick < > patrick.glom...@corvidtec.com> wrote: > >> Load spiked again: ~1200%cpu on gfs02a for glusterfsd. Crawl has been >> running on one of the bricks on gfs02b for 25 min or so and users cannot >> access the volume. >> >> I re-listed the xattrop directories as well as a 'top' entry and heal >> statistics. Then I restarted the gluster services on gfs02a. 
>> >> === top === >> PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ >> COMMAND >> 8969 root 20 0 2815m 204m 3588 S 1181.0 0.6 591:06.93 >> glusterfsd >> >> === xattrop === >> /data/brick01a/homegfs/.glusterfs/indices/xattrop: >> xattrop-41f19453-91e4-437c-afa9-3b25614de210 >> xattrop-9b815879-2f4d-402b-867c-a6d65087788c >> >> /data/brick02a/homegfs/.glusterfs/indices/xattrop: >> xattrop-70131855-3cfb-49af-abce-9d23f57fb393 >> xattrop-dfb77848-a39d-4417-a725-9beca75d78c6 >> >> /data/brick01b/homegfs/.glusterfs/indices/xattrop: >> e6e47ed9-309b-42a7-8c44-28c29b9a20f8 >> xattrop-5c797a64-bde7-4eac-b4fc-0befc632e125 >> xattrop-38ec65a1-00b5-4544-8a6c-bf0f531a1934 >> xattrop-ef0980ad-f074-4163-979f-16d5ef85b0a0 >> >> /data/brick02b/homegfs/.glusterfs/indices/xattrop: >> xattrop-7402438d-0ee7-4fcf-b9bb-b561236f99bc >> xattrop-8ffbf5f7-ace3-497d-944e-93ac85241413 >> >> /data/brick01a/homegfs/.glusterfs/indices/xattrop: >> xattrop-0115acd0-caae-4dfd-b3b4-7cc42a0ff531 >> >> /data/brick02a/homegfs/.glusterfs/indices/xattrop: >> xattrop-7e20fdb1-5224-4b9a-be06-568708526d70 >> >> /data/brick01b/homegfs/.glusterfs/indices/xattrop: >> 8034bc06-92cd-4fa5-8aaf-09039e79d2c8 c9ce22ed-6d8b-471b-a111-b39e57f0b512 >> 94fa1d60-45ad-4341-b69c-315936b51e8d >> xattrop-9c04623a-64ce-4f66-8b23-dbaba49119c7 >> >> /data/brick02b/homegfs/.glusterfs/indices/xattrop: >> xattrop-b8c8f024-d038-49a2-9a53-c54ead09111d >> >> >> === heal stats === >> >> homegfs [b0-gfsib01a] : Starting time of crawl : Thu Jan 21 >> 12:36:45 2016 >> homegfs [b0-gfsib01a] : Ending time of crawl : Thu Jan 21 >> 12:36:45 2016 >> homegfs [b0-gfsib01a] : Type of crawl: INDEX >> homegfs [b0-gfsib01a] : No. of entries healed: 0 >> homegfs [b0-gfsib01a] : No. of entries in split-brain: 0 >> homegfs [b0-gfsib01a] : No. 
of heal failed entries : 0 >> >> homegfs [b1-gfsib01b] : Starting time of crawl : Thu Jan 21 >> 12:36:19 2016 >> homegfs [b1-gfsib01b] : Ending time of crawl : Thu Jan 21 >> 12:36:19 2016 >> homegfs [b1-gfsib01b] : Type of crawl: INDEX >> homegfs [b1-gfsib01b] : No. of entries healed: 0 >> homegfs [b1-gfsib01b] : No. of entries in split-brain: 0 >> homegfs [b1-gfsib01b] : No. of heal failed entries : 1 >> >> homegfs [b2-gfsib01a] : Starting time of crawl : Thu Jan 21 >> 12:36:48 2016 >> homegfs [b2-gfsib01a] : Ending time of crawl : Thu Jan 21 >> 12:36:48 2016 >> homegfs [b2-gfsib01a] : Type of crawl: INDEX >> homegfs [b2-gfsib01a] : No. of entries healed: 0 >> homegfs [b2-gfsib01a] : No. of entries in split-brain: 0 >> homegfs [b2-gfsib01a] : No. of heal failed entries : 0 >> >> homegfs [b3-gfsib01b] : Starting time of crawl : Thu Jan 21 >> 12:36:47 2016 &
Re: [Gluster-users] [Gluster-devel] heal hanging
Pranith, could this kind of behavior be self-inflicted by us deleting files directly from the bricks? We have done that in the past to clean up issues where gluster wouldn't allow us to delete from the mount. If so, is it feasible to clean them up by running a search on the .glusterfs directories directly and removing files with a reference count of 1 that are non-zero size (or directly checking the xattrs to be sure that it's not a DHT link)? find /data/brick01a/homegfs/.glusterfs -type f -not -empty -links -2 -exec rm -f "{}" \; Is there anything I'm inherently missing with that approach that will further corrupt the system? On Thu, Jan 21, 2016 at 1:02 PM, Glomski, Patrick < patrick.glom...@corvidtec.com> wrote: > Load spiked again: ~1200%cpu on gfs02a for glusterfsd. Crawl has been > running on one of the bricks on gfs02b for 25 min or so and users cannot > access the volume. > > I re-listed the xattrop directories as well as a 'top' entry and heal > statistics. Then I restarted the gluster services on gfs02a. 
> > === top === > PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ > COMMAND > 8969 root 20 0 2815m 204m 3588 S 1181.0 0.6 591:06.93 > glusterfsd > > === xattrop === > /data/brick01a/homegfs/.glusterfs/indices/xattrop: > xattrop-41f19453-91e4-437c-afa9-3b25614de210 > xattrop-9b815879-2f4d-402b-867c-a6d65087788c > > /data/brick02a/homegfs/.glusterfs/indices/xattrop: > xattrop-70131855-3cfb-49af-abce-9d23f57fb393 > xattrop-dfb77848-a39d-4417-a725-9beca75d78c6 > > /data/brick01b/homegfs/.glusterfs/indices/xattrop: > e6e47ed9-309b-42a7-8c44-28c29b9a20f8 > xattrop-5c797a64-bde7-4eac-b4fc-0befc632e125 > xattrop-38ec65a1-00b5-4544-8a6c-bf0f531a1934 > xattrop-ef0980ad-f074-4163-979f-16d5ef85b0a0 > > /data/brick02b/homegfs/.glusterfs/indices/xattrop: > xattrop-7402438d-0ee7-4fcf-b9bb-b561236f99bc > xattrop-8ffbf5f7-ace3-497d-944e-93ac85241413 > > /data/brick01a/homegfs/.glusterfs/indices/xattrop: > xattrop-0115acd0-caae-4dfd-b3b4-7cc42a0ff531 > > /data/brick02a/homegfs/.glusterfs/indices/xattrop: > xattrop-7e20fdb1-5224-4b9a-be06-568708526d70 > > /data/brick01b/homegfs/.glusterfs/indices/xattrop: > 8034bc06-92cd-4fa5-8aaf-09039e79d2c8 c9ce22ed-6d8b-471b-a111-b39e57f0b512 > 94fa1d60-45ad-4341-b69c-315936b51e8d > xattrop-9c04623a-64ce-4f66-8b23-dbaba49119c7 > > /data/brick02b/homegfs/.glusterfs/indices/xattrop: > xattrop-b8c8f024-d038-49a2-9a53-c54ead09111d > > > === heal stats === > > homegfs [b0-gfsib01a] : Starting time of crawl : Thu Jan 21 12:36:45 > 2016 > homegfs [b0-gfsib01a] : Ending time of crawl : Thu Jan 21 12:36:45 > 2016 > homegfs [b0-gfsib01a] : Type of crawl: INDEX > homegfs [b0-gfsib01a] : No. of entries healed: 0 > homegfs [b0-gfsib01a] : No. of entries in split-brain: 0 > homegfs [b0-gfsib01a] : No. of heal failed entries : 0 > > homegfs [b1-gfsib01b] : Starting time of crawl : Thu Jan 21 12:36:19 > 2016 > homegfs [b1-gfsib01b] : Ending time of crawl : Thu Jan 21 12:36:19 > 2016 > homegfs [b1-gfsib01b] : Type of crawl: INDEX > homegfs [b1-gfsib01b] : No. 
of entries healed: 0 > homegfs [b1-gfsib01b] : No. of entries in split-brain: 0 > homegfs [b1-gfsib01b] : No. of heal failed entries : 1 > > homegfs [b2-gfsib01a] : Starting time of crawl : Thu Jan 21 12:36:48 > 2016 > homegfs [b2-gfsib01a] : Ending time of crawl : Thu Jan 21 12:36:48 > 2016 > homegfs [b2-gfsib01a] : Type of crawl: INDEX > homegfs [b2-gfsib01a] : No. of entries healed: 0 > homegfs [b2-gfsib01a] : No. of entries in split-brain: 0 > homegfs [b2-gfsib01a] : No. of heal failed entries : 0 > > homegfs [b3-gfsib01b] : Starting time of crawl : Thu Jan 21 12:36:47 > 2016 > homegfs [b3-gfsib01b] : Ending time of crawl : Thu Jan 21 12:36:47 > 2016 > homegfs [b3-gfsib01b] : Type of crawl: INDEX > homegfs [b3-gfsib01b] : No. of entries healed: 0 > homegfs [b3-gfsib01b] : No. of entries in split-brain: 0 > homegfs [b3-gfsib01b] : No. of heal failed entries : 0 > > homegfs [b4-gfsib02a] : Starting time of crawl : Thu Jan 21 12:36:06 > 2016 > homegfs [b4-gfsib02a] : Ending time of crawl : Thu Jan 21 12:36:06 > 2016 > homegfs [b4-gfsib02a] : Type of crawl: INDEX > homegfs [b4-gfsib02a] : No. of entries healed: 0 > homegfs [b4-gfsib02a] : No. of entries in split-brain: 0 > homegfs [b4-gfsib02a] : No. of heal failed entries : 0 > > homegfs [b5-gfsib02b] : Starting time of crawl : Thu Jan 21 12:13:40 > 2016 > homegfs [b5-gfsib02b] :*
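Since the cleanup command discussed in this thread (`find ... -exec rm -f "{}" \;`) is destructive, a safer first step is a dry run with the same predicates and `-print` instead of the `-exec` action:

```shell
# List regular files under .glusterfs that are non-empty and have a link
# count below 2 (i.e. no remaining hardlink from the normal namespace).
# Review this output before re-running with -exec rm -f "{}" \; appended.
find /data/brick01a/homegfs/.glusterfs -type f -not -empty -links -2 -print
```

Healthy files in .glusterfs carry a second hardlink from the volume namespace, so only orphaned entries should appear; if the list looks wrong, nothing has been deleted yet.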
Re: [Gluster-users] [Gluster-devel] heal hanging
I should mention that the problem is not currently occurring and there are no heals (output appended). By restarting the gluster services, we can stop the crawl, which lowers the load for a while. Subsequent crawls seem to finish properly. For what it's worth, files/folders that show up in the 'volume heal info' output during a hung crawl don't seem to be anything out of the ordinary. Over the past four days, the typical time before the problem recurs after suppressing it in this manner is an hour. Last night when we reached out to you was the last time it happened and the load has been low since (a relief). David believes that recursively listing the files (ls -alR or similar) from a client mount can force the issue to happen, but obviously I'd rather not unless we have some precise thing we're looking for. Let me know if you'd like me to attempt to drive the system unstable like that and what I should look for. As it's a production system, I'd rather not leave it in this state for long. [root@gfs01a xattrop]# gluster volume heal homegfs info Brick gfs01a.corvidtec.com:/data/brick01a/homegfs/ Number of entries: 0 Brick gfs01b.corvidtec.com:/data/brick01b/homegfs/ Number of entries: 0 Brick gfs01a.corvidtec.com:/data/brick02a/homegfs/ Number of entries: 0 Brick gfs01b.corvidtec.com:/data/brick02b/homegfs/ Number of entries: 0 Brick gfs02a.corvidtec.com:/data/brick01a/homegfs/ Number of entries: 0 Brick gfs02b.corvidtec.com:/data/brick01b/homegfs/ Number of entries: 0 Brick gfs02a.corvidtec.com:/data/brick02a/homegfs/ Number of entries: 0 Brick gfs02b.corvidtec.com:/data/brick02b/homegfs/ Number of entries: 0 On Thu, Jan 21, 2016 at 10:40 AM, Pranith Kumar Karampuri < pkara...@redhat.com> wrote: > > > On 01/21/2016 08:25 PM, Glomski, Patrick wrote: > > Hello, Pranith. 
The typical behavior is that the %cpu on a glusterfsd > process jumps to number of processor cores available (800% or 1200%, > depending on the pair of nodes involved) and the load average on the > machine goes very high (~20). The volume's heal statistics output shows > that it is crawling one of the bricks and trying to heal, but this crawl > hangs and never seems to finish. > > > The number of files in the xattrop directory varies over time, so I ran a > wc -l as you requested periodically for some time and then started > including a datestamped list of the files that were in the xattrops > directory on each brick to see which were persistent. All bricks had files > in the xattrop folder, so all results are attached. > > Thanks this info is helpful. I don't see a lot of files. Could you give > output of "gluster volume heal info"? Is there any directory in > there which is LARGE? > > Pranith > > > Please let me know if there is anything else I can provide. > > Patrick > > > On Thu, Jan 21, 2016 at 12:01 AM, Pranith Kumar Karampuri < > pkara...@redhat.com> wrote: > >> hey, >>Which process is consuming so much cpu? I went through the logs >> you gave me. I see that the following files are in gfid mismatch state: >> >> <066e4525-8f8b-43aa-b7a1-86bbcecc68b9/safebrowsing-backup>, >> <1d48754b-b38c-403d-94e2-0f5c41d5f885/recovery.bak>, >> , >> >> Could you give me the output of "ls /indices/xattrop | wc -l" >> output on all the bricks which are acting this way? This will tell us the >> number of pending self-heals on the system. >> >> Pranith >> >> >> On 01/20/2016 09:26 PM, David Robinson wrote: >> >> resending with parsed logs... >> >> >> >> >> >> I am having issues with 3.6.6 where the load will spike up to 800% for >> one of the glusterfsd processes and the users can no longer access the >> system. If I reboot the node, the heal will finish normally after a few >> minutes and the system will be responsive, but a few hours later the issue >> will start again. 
It look like it is hanging in a heal and spinning up the >> load on one of the bricks. The heal gets stuck and says it is crawling and >> never returns. After a few minutes of the heal saying it is crawling, the >> load spikes up and the mounts become unresponsive. >> >> Any suggestions on how to fix this? It has us stopped cold as the user >> can no longer access the systems when the load spikes... Logs attached. >> >> System setup info is: >> >> [root@gfs01a ~]# gluster volume info homegfs >> >> Volume Name: homegfs >> Type: Distributed-Replicate >> Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071 >> Status: Started >> Number of Bricks: 4 x 2 = 8 >> Transport-type: tcp >> Bricks: >&
Re: [Gluster-users] [Gluster-devel] heal hanging
Hello, Pranith. The typical behavior is that the %cpu on a glusterfsd process jumps to number of processor cores available (800% or 1200%, depending on the pair of nodes involved) and the load average on the machine goes very high (~20). The volume's heal statistics output shows that it is crawling one of the bricks and trying to heal, but this crawl hangs and never seems to finish. The number of files in the xattrop directory varies over time, so I ran a wc -l as you requested periodically for some time and then started including a datestamped list of the files that were in the xattrops directory on each brick to see which were persistent. All bricks had files in the xattrop folder, so all results are attached. Please let me know if there is anything else I can provide. Patrick On Thu, Jan 21, 2016 at 12:01 AM, Pranith Kumar Karampuri < pkara...@redhat.com> wrote: > hey, >Which process is consuming so much cpu? I went through the logs you > gave me. I see that the following files are in gfid mismatch state: > > <066e4525-8f8b-43aa-b7a1-86bbcecc68b9/safebrowsing-backup>, > <1d48754b-b38c-403d-94e2-0f5c41d5f885/recovery.bak>, > , > > Could you give me the output of "ls /indices/xattrop | wc -l" > output on all the bricks which are acting this way? This will tell us the > number of pending self-heals on the system. > > Pranith > > > On 01/20/2016 09:26 PM, David Robinson wrote: > > resending with parsed logs... > > > > > > I am having issues with 3.6.6 where the load will spike up to 800% for one > of the glusterfsd processes and the users can no longer access the system. > If I reboot the node, the heal will finish normally after a few minutes and > the system will be responsive, but a few hours later the issue will start > again. It look like it is hanging in a heal and spinning up the load on > one of the bricks. The heal gets stuck and says it is crawling and never > returns. 
After a few minutes of the heal saying it is crawling, the load > spikes up and the mounts become unresponsive. > > Any suggestions on how to fix this? It has us stopped cold as the user > can no longer access the systems when the load spikes... Logs attached. > > System setup info is: > > [root@gfs01a ~]# gluster volume info homegfs > > Volume Name: homegfs > Type: Distributed-Replicate > Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071 > Status: Started > Number of Bricks: 4 x 2 = 8 > Transport-type: tcp > Bricks: > Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs > Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs > Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs > Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs > Brick5: gfsib02a.corvidtec.com:/data/brick01a/homegfs > Brick6: gfsib02b.corvidtec.com:/data/brick01b/homegfs > Brick7: gfsib02a.corvidtec.com:/data/brick02a/homegfs > Brick8: gfsib02b.corvidtec.com:/data/brick02b/homegfs > Options Reconfigured: > performance.io-thread-count: 32 > performance.cache-size: 128MB > performance.write-behind-window-size: 128MB > server.allow-insecure: on > network.ping-timeout: 42 > storage.owner-gid: 100 > geo-replication.indexing: off > geo-replication.ignore-pid-check: on > changelog.changelog: off > changelog.fsync-interval: 3 > changelog.rollover-time: 15 > server.manage-gids: on > diagnostics.client-log-level: WARNING > > [root@gfs01a ~]# rpm -qa | grep gluster > gluster-nagios-common-0.1.1-0.el6.noarch > glusterfs-fuse-3.6.6-1.el6.x86_64 > glusterfs-debuginfo-3.6.6-1.el6.x86_64 > glusterfs-libs-3.6.6-1.el6.x86_64 > glusterfs-geo-replication-3.6.6-1.el6.x86_64 > glusterfs-api-3.6.6-1.el6.x86_64 > glusterfs-devel-3.6.6-1.el6.x86_64 > glusterfs-api-devel-3.6.6-1.el6.x86_64 > glusterfs-3.6.6-1.el6.x86_64 > glusterfs-cli-3.6.6-1.el6.x86_64 > glusterfs-rdma-3.6.6-1.el6.x86_64 > samba-vfs-glusterfs-4.1.11-2.el6.x86_64 > glusterfs-server-3.6.6-1.el6.x86_64 > glusterfs-extra-xlators-3.6.6-1.el6.x86_64 > > 
> > > > ___ > Gluster-devel mailing list > Gluster-devel@gluster.org > http://www.gluster.org/mailman/listinfo/gluster-devel > > > > ___ > Gluster-users mailing list > Gluster-users@gluster.org > http://www.gluster.org/mailman/listinfo/gluster-users > Attachments: gfs01a_data_brick01a_homegfs, gfs01a_data_brick02a_homegfs, gfs01b_data_brick01b_homegfs, gfs01b_data_brick02b_homegfs, gfs02a_data_brick01a_homegfs, gfs02a_data_brick02a_homegfs, gfs02b_data_brick01b_homegfs, gfs02b_data_brick02b_homegfs (binary data)
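Pranith's request above ("ls /indices/xattrop | wc -l" on each brick) can be scripted across all local bricks; the glob below matches this cluster's brick layout and would need adjusting elsewhere:

```shell
# Print the number of pending self-heal markers per local brick.
for idx in /data/brick*/homegfs/.glusterfs/indices/xattrop; do
    echo "$idx: $(ls "$idx" | wc -l)"
done
```

Run it periodically (e.g. under `watch -n 60`) on each storage node to see whether the counts drain over time or keep growing during a hung crawl.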
[Gluster-users] glusterfsd crash due to page allocation failure
Hello, We've recently upgraded from gluster 3.6.6 to 3.7.6 and have started encountering dmesg page allocation errors (stack trace is appended). It appears that glusterfsd now sometimes fills up the cache completely and crashes with a page allocation failure. I *believe* it mainly happens when copying lots of new data to the system, running a 'find', or similar. Hosts are all Scientific Linux 6.6 and these errors occur consistently on two separate gluster pools. Has anyone else seen this issue and are there any known fixes for it via sysctl kernel parameters or other means? Please let me know of any other diagnostic information that would help. Thanks, Patrick [1458118.134697] glusterfsd: page allocation failure. order:5, mode:0x20 > [1458118.134701] Pid: 6010, comm: glusterfsd Not tainted > 2.6.32-573.3.1.el6.x86_64 #1 > [1458118.134702] Call Trace: > [1458118.134714] [] ? __alloc_pages_nodemask+0x7dc/0x950 > [1458118.134728] [] ? mlx4_ib_post_send+0x680/0x1f90 > [mlx4_ib] > [1458118.134733] [] ? kmem_getpages+0x62/0x170 > [1458118.134735] [] ? fallback_alloc+0x1ba/0x270 > [1458118.134736] [] ? cache_grow+0x2cf/0x320 > [1458118.134738] [] ? cache_alloc_node+0x99/0x160 > [1458118.134743] [] ? pskb_expand_head+0x62/0x280 > [1458118.134744] [] ? __kmalloc+0x199/0x230 > [1458118.134746] [] ? pskb_expand_head+0x62/0x280 > [1458118.134748] [] ? __pskb_pull_tail+0x2aa/0x360 > [1458118.134751] [] ? harmonize_features+0x29/0x70 > [1458118.134753] [] ? dev_hard_start_xmit+0x1c4/0x490 > [1458118.134758] [] ? sch_direct_xmit+0x15a/0x1c0 > [1458118.134759] [] ? dev_queue_xmit+0x228/0x320 > [1458118.134762] [] ? neigh_connected_output+0xbd/0x100 > [1458118.134766] [] ? ip_finish_output+0x287/0x360 > [1458118.134767] [] ? ip_output+0xb8/0xc0 > [1458118.134769] [] ? __ip_local_out+0x9f/0xb0 > [1458118.134770] [] ? ip_local_out+0x25/0x30 > [1458118.134772] [] ? ip_queue_xmit+0x190/0x420 > [1458118.134773] [] ? __alloc_pages_nodemask+0x129/0x950 > [1458118.134776] [] ? 
tcp_transmit_skb+0x4b4/0x8b0 > [1458118.134778] [] ? tcp_write_xmit+0x1da/0xa90 > [1458118.134779] [] ? __kmalloc_node+0x4d/0x60 > [1458118.134780] [] ? tcp_push_one+0x30/0x40 > [1458118.134782] [] ? tcp_sendmsg+0x9cc/0xa20 > [1458118.134786] [] ? sock_aio_write+0x19b/0x1c0 > [1458118.134788] [] ? sock_aio_write+0x0/0x1c0 > [1458118.134791] [] ? do_sync_readv_writev+0xfb/0x140 > [1458118.134797] [] ? autoremove_wake_function+0x0/0x40 > [1458118.134801] [] ? selinux_file_permission+0xbf/0x150 > [1458118.134804] [] ? security_file_permission+0x16/0x20 > [1458118.134806] [] ? do_readv_writev+0xd6/0x1f0 > [1458118.134807] [] ? vfs_writev+0x46/0x60 > [1458118.134809] [] ? sys_writev+0x51/0xd0 > [1458118.134812] [] ? __audit_syscall_exit+0x25e/0x290 > [1458118.134816] [] ? system_call_fastpath+0x16/0x1b
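The order:5 failure in the trace above means the kernel could not find 128 KiB (2^5 pages) of physically contiguous memory for an atomic network-path allocation (mode:0x20 is GFP_ATOMIC); memory fragmentation rather than total exhaustion is the usual cause. Two generic kernel-side mitigations are commonly suggested; the value below is an illustrative assumption to be sized for the host, not a tested recommendation:

```shell
# Keep a larger emergency reserve so atomic high-order allocations succeed.
sysctl -w vm.min_free_kbytes=262144
echo 'vm.min_free_kbytes = 262144' >> /etc/sysctl.conf  # persist across reboots

# One-off relief: flush clean page cache to reduce fragmentation pressure.
sync && echo 1 > /proc/sys/vm/drop_caches
```

These knobs are not gluster-specific; they only make the allocation failures less likely while the underlying allocation pattern (here apparently involving the mlx4_ib send path) is investigated.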
[Gluster-users] Gluster v3.7.3 REMOVEXATTR clogs volume logs
I am currently testing gluster v3.7.3 on Scientific Linux 7.1 and a newly created gluster volume. After transferring some files to the volume over the fuse mount, the volume log is flooded with 2.5GB of errors like the following: [2015-08-13 15:54:36.921622] W [fuse-bridge.c:1230:fuse_err_cbk] 0-glusterfs-fuse: 361669: REMOVEXATTR() /path/to/file => -1 (No data available) There are several (fixed) redhat bugs relating to similar errors: https://bugzilla.redhat.com/show_bug.cgi?id=1245966 https://bugzilla.redhat.com/show_bug.cgi?id=1188064 https://bugzilla.redhat.com/show_bug.cgi?id=1192832 - Is anyone else running 3.7 seeing similar errors? - Is there something wrong with my configuration? - If it's a problem, what other information do you need to diagnose? gluster volume info: Volume Name: testbrick Type: Distribute Volume ID: 91b0d825-5e39-4b17-a505-174b47849b40 Status: Started Number of Bricks: 1 Transport-type: tcp Bricks: Brick1: gfstest:/data/brick01/testbrick Options Reconfigured: performance.readdir-ahead: on Thanks, Patrick
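Until one of the fixes tracked in the bugs above lands, one blunt workaround is to raise the client log level so W-level messages stop being written. The option name appears in the volume configs earlier in this digest; note this hides all client-side warnings, not just the REMOVEXATTR ones:

```shell
# Suppress warning-level client log messages for this volume.
gluster volume set testbrick diagnostics.client-log-level ERROR
```

Reverting is a matter of setting the level back to the default (`gluster volume reset testbrick diagnostics.client-log-level`), so this is safe to try while diagnosing the flood.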