Re: [Gluster-users] One client can effectively hang entire gluster array

2016-08-22 Thread Glomski, Patrick
Not a bad idea for a workaround, but that would require significant
investment with our current setup. All of our compute nodes are stateless /
have no disks. All storage is network storage. It's probably still not
feasible if we added disks because some simulations produce terabytes of
data. We would need some kind of periodic check-and-sync mechanism.
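If we ever did add local scratch, the check-and-sync piece would probably
just be an rsync loop in the job script. A rough sketch (the mount point,
case directory, and solver name below are made up, not our real setup):

rsync -a /mnt/homegfs/case01/ "$TMPDIR/case01/"        # stage in from gluster
cd "$TMPDIR/case01"
./run_solver &                                         # hypothetical solver
while kill -0 $! 2>/dev/null; do
    sleep 3600                                         # hourly check-and-sync back
    rsync -a "$TMPDIR/case01/" /mnt/homegfs/case01/
done
rsync -a "$TMPDIR/case01/" /mnt/homegfs/case01/        # final sync at job end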

I still owe the gluster devs a test of that patch.

On Fri, Aug 19, 2016 at 3:22 PM, Steve Dainard  wrote:

> As a potential solution on the compute node side, can you have users copy
> relevant data from the gluster volume to a local disk (i.e., $TMPDIR), operate
> on that disk, write output files to that disk, and then write the results
> back to persistent storage once the job is complete?
>
> There are lots of factors to consider, but this is how we operate in a
> small compute environment trying to avoid over-loading gluster storage
> nodes.
>
> On Fri, Jul 8, 2016 at 6:29 AM, Glomski, Patrick <
> patrick.glom...@corvidtec.com> wrote:
>
>> Hello, users and devs.
>>
>> TL;DR: One gluster client can essentially cause denial of service /
>> availability loss to the entire gluster array. There's no way to stop it and
>> almost no way to find the bad client. Probably all (at least 3.6 and 3.7)
>> versions are affected.
>>
>> We have two large replicate gluster arrays (3.6.6 and 3.7.11) that are
>> used in a high-performance computing environment. Two file access cases
>> cause severe issues with glusterfs: Some of our scientific codes write
>> hundreds of files (~400-500) simultaneously (one file or more per processor
>> core, so lots of small or large writes) and others read thousands of files
>> (2000-3000) simultaneously to grab metadata from each file (lots of small
>> reads).
>>
>> In either of these situations, one glusterfsd process on whatever peer
>> the client is currently talking to will skyrocket to *nproc* cpu usage
>> (800%, 1600%) and the storage cluster is essentially useless; all other
>> clients will eventually try to read or write data to the overloaded peer
>> and, when that happens, their connection will hang. Heals between peers
>> hang because the load on the peer is around 1.5x the number of cores or
>> more. This occurs in either gluster 3.6 or 3.7, is very repeatable, and
>> happens much too frequently.
>>
>> Even worse, there seems to be no definitive way to diagnose which client
>> is causing the issues. Getting 'volume status <> clients' doesn't help
>> because it reports the total number of bytes read/written by each client.
>> (a) The metadata in question is tiny compared to the multi-gigabyte output
>> files being dealt with and (b) the byte-count is cumulative for the clients
>> and the compute nodes are always up with the filesystems mounted, so the
>> byte transfer counts are astronomical. The best solution I've come up with
>> is to blackhole-route traffic from clients one at a time (effectively push
>> the traffic over to the other peer), wait a few minutes for all of the
>> backlogged traffic to dissipate (if it's going to), see if the load on
>> glusterfsd drops, and repeat until I find the client causing the issue. I
>> would *love* any ideas on a better way to find rogue clients.
>>
>> More importantly, though, there must be some enforcement mechanism to stop
>> one user from being able to render the entire filesystem unavailable for
>> all other users. In the worst case, I would even prefer a gluster volume
>> option that simply disconnects clients making over some threshold of file
>> open requests. That would be far preferable to a complete availability loss
>> reminiscent of a DDoS attack...
>>
>> Apologies for the essay and looking forward to any help you can provide.
>>
>> Thanks,
>> Patrick
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users
>>
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] One client can effectively hang entire gluster array

2016-07-08 Thread Glomski, Patrick
Hello, users and devs.

TL;DR: One gluster client can essentially cause denial of service /
availability loss to the entire gluster array. There's no way to stop it and
almost no way to find the bad client. Probably all (at least 3.6 and 3.7)
versions are affected.

We have two large replicate gluster arrays (3.6.6 and 3.7.11) that are used
in a high-performance computing environment. Two file access cases cause
severe issues with glusterfs: Some of our scientific codes write hundreds
of files (~400-500) simultaneously (one file or more per processor core, so
lots of small or large writes) and others read thousands of files
(2000-3000) simultaneously to grab metadata from each file (lots of small
reads).

In either of these situations, one glusterfsd process on whatever peer the
client is currently talking to will skyrocket to *nproc* cpu usage (800%,
1600%) and the storage cluster is essentially useless; all other clients
will eventually try to read or write data to the overloaded peer and, when
that happens, their connection will hang. Heals between peers hang because
the load on the peer is around 1.5x the number of cores or more. This
occurs in either gluster 3.6 or 3.7, is very repeatable, and happens much
too frequently.

Even worse, there seems to be no definitive way to diagnose which client is
causing the issues. Getting 'volume status <> clients' doesn't help because
it reports the total number of bytes read/written by each client. (a) The
metadata in question is tiny compared to the multi-gigabyte output files
being dealt with and (b) the byte-count is cumulative for the clients and
the compute nodes are always up with the filesystems mounted, so the byte
transfer counts are astronomical. The best solution I've come up with is to
blackhole-route traffic from clients one at a time (effectively push the
traffic over to the other peer), wait a few minutes for all of the
backlogged traffic to dissipate (if it's going to), see if the load on
glusterfsd drops, and repeat until I find the client causing the issue. I
would *love* any ideas on a better way to find rogue clients.
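For reference, each iteration of that hunt looks roughly like this (the
client IP is illustrative; the candidate list comes from 'gluster volume
status homegfs clients'):

ip route add blackhole 192.0.2.17/32    # divert this client's traffic to the other peer
sleep 300                                # give the backlogged traffic time to drain
top -b -n 1 | grep glusterfsd            # did the runaway brick process calm down?
ip route del blackhole 192.0.2.17/32     # if not, restore and try the next client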

More importantly, though, there must be some enforcement mechanism to stop
one user from being able to render the entire filesystem unavailable for all
other users. In the worst case, I would even prefer a gluster volume option
that simply disconnects clients making over some threshold of file open
requests. That would be far preferable to a complete availability loss
reminiscent of a DDoS attack...

Apologies for the essay and looking forward to any help you can provide.

Thanks,
Patrick
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Multiple questions regarding monitoring of Gluster

2016-06-22 Thread Glomski, Patrick
If you're not opposed to another dependency, there is a glusterfs-nagios
package (python-based) which presents the volumes in a much more useful
format for monitoring.

http://download.gluster.org/pub/gluster/glusterfs-nagios/1.1.0/
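
If you would rather not pull in the package, a rough alternative is to parse
the CLI's XML output yourself. A minimal sketch, assuming your gluster build
supports the --xml flag and xmllint is installed (element names can differ
slightly between versions, so check your own output first):

gluster --xml volume status vol0 detail > /tmp/vol0-detail.xml
# free/total bytes per brick; adjust the XPath to the element names your version emits
xmllint --xpath '//node/sizeFree/text()' /tmp/vol0-detail.xml; echo
xmllint --xpath '//node/sizeTotal/text()' /tmp/vol0-detail.xml; echo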

Patrick

On Tue, Jun 21, 2016 at 10:28 AM, Malte Schmidt  wrote:

> Under which conditions does "gluster volume status $volume detail" return
> something other than a table?
>
> Typical, expected output:
>
> root@server1:~# gluster volume status vol0 detail
> Status of volume: vol0
>
> --
> Brick : Brick server1:/data/glusterfs/vol0
> TCP Port : 49152
> RDMA Port : 0
> Online : Y
> Pid : 2942
> File System : xfs
> Device : /dev/mapper/glusterfs
> Mount Options : rw,relatime,attr2,inode64,noquota
> Inode Size : 512
> Disk Space Free : 9.0GB
> Total Disk Space : 20.0GB
> Inode Count : 10485760
> Free Inodes : 8774085
>
> --
> Brick : Brick server2:/data/glusterfs/vol0
> TCP Port : 49152
> RDMA Port : 0
> Online : Y
> Pid : 3275
> File System : xfs
> Device : /dev/mapper/glusterfs
> Mount Options : rw,relatime,attr2,inode64,noquota
> Inode Size : 512
> Disk Space Free : 9.0GB
> Total Disk Space : 20.0GB
> Inode Count : 10485760
> Free Inodes : 8774085
>
> Are there any conditions under which that table is different? Better
> question: What is the best way of getting this data for usage in Nagios?
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [Gluster-devel] gluster 3.7.9 permission denied and mv errors

2016-05-26 Thread Glomski, Patrick
'Failed moves' are still a problem on our backup system. Another instance
is attached with gfids if it's helpful. In this case, the rename after
explicitly removing the target location was successful.

   mv the files from bkp01 --> bkp00   : 18:41:02
> /bin/mv: cannot move
> `./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4' to
> `../bkp00/./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4':
> File exists
> /bin/mv: cannot move
> `./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4' to
> `../bkp00/./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4':
> File exists
>

Source:

> # file:
> data/brick01bkp/gfsbackup/bkp01/./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4
> trusted.bit-rot.version=0x0200571632ea00056a16
> trusted.gfid=0x2246740774424bb78f408d46c1f2a13e
> trusted.pgfid.583e43d6-27ff-4978-90a8-d7057385cf72=0x0001
>

Target:

> getfattr: Removing leading '/' from absolute path names
> # file:
> data/brick01bkp/gfsbackup/bkp01/../bkp00/./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4
> trusted.gfid=0x556cd0eabffe4549a5ae43a188491db3
> trusted.glusterfs.dht.linkto=0x6766736261636b75702d636c69656e742d3100
> trusted.pgfid.9e21e545-b254-4d93-ba34-8dd204b41160=0x0001
>
getfattr:
> /data/brick02bkp/gfsbackup/bkp01/./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4:
> No such file or directory
>
getfattr:
> /data/brick01bkp/gfsbackup/bkp01/./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4:
> No such file or directory
> getfattr:
> /data/brick02bkp/gfsbackup/bkp01/./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4:
> No such file or directory
> getfattr:
> /data/brick03bkp/gfsbackup/bkp01/./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4:
> No such file or directory
> getfattr:
> /data/brick04bkp/gfsbackup/bkp01/./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4:
> No such file or directory
> getfattr:
> /data/brick05bkp/gfsbackup/bkp01/./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4:
> No such file or directory
>
> # file:
> data/brick02bkp/gfsbackup/bkp01/../bkp00/./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4
> trusted.bit-rot.version=0x0200569bb8d700074a42
> trusted.gfid=0x556cd0eabffe4549a5ae43a188491db3
> trusted.pgfid.9e21e545-b254-4d93-ba34-8dd204b41160=0x0001
>
> getfattr:
> /data/brick01bkp/gfsbackup/bkp01/../bkp00/./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4:
> No such file or directory
> getfattr:
> /data/brick02bkp/gfsbackup/bkp01/../bkp00/./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4:
> No such file or directory
> getfattr:
> /data/brick03bkp/gfsbackup/bkp01/../bkp00/./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4:
> No such file or directory
> getfattr:
> /data/brick04bkp/gfsbackup/bkp01/../bkp00/./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4:
> No such file or directory
> getfattr:
> /data/brick05bkp/gfsbackup/bkp01/../bkp00/./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4:
> No such file or directory
> stat: cannot stat
> `"../bkp00/./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4"':
> No such file or directory
> retry: renaming
> ./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4 ->
> ../bkp00/./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4
> Rename Succeeded!
> ./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4 ->
> ../bkp00/./homegfs/hpc_shared/motorsports/gmics/Raven/p9/85/data_collected4
>


Thanks for any assistance,
Patrick


On Tue, May 3, 2016 at 5:02 PM, Glomski, Patrick <
patrick.glom...@corvidtec.com> wrote:

> Attaching a text file with the same content that is easier to read.
>
> Patrick
>
> On Tue, May 3, 2016 at 4:59 PM, Glomski, Patrick <
> patrick.glom...@corvidtec.com> wrote:
>
>> Raghavendra,
>>
>> Last night the backup had four of these errors and only one of the
>> 'retried moves' succeeded. The only one to succeed in moving the file the
>> second time had target files on a different gluster peer (gfs01bkp). Not
>> sure if that is significant.
>>
>> Note that I cannot stat the target file over the FUSE mount for any of
>> these, but it exists on the bricks. Running an 'ls' on the directory
>> containing the file (via FUSE) does not fix the issue. Source and target
>> xattrs are appended for all bricks on all machines in the distributed
> volume.

Re: [Gluster-users] [Gluster-devel] gluster 3.7.9 permission denied and mv errors

2016-05-03 Thread Glomski, Patrick
Attaching a text file with the same content that is easier to read.

Patrick

On Tue, May 3, 2016 at 4:59 PM, Glomski, Patrick <
patrick.glom...@corvidtec.com> wrote:

> Raghavendra,
>
> Last night the backup had four of these errors and only one of the
> 'retried moves' succeeded. The only one to succeed in moving the file the
> second time had target files on a different gluster peer (gfs01bkp). Not
> sure if that is significant.
>
> Note that I cannot stat the target file over the FUSE mount for any of
> these, but it exists on the bricks. Running an 'ls' on the directory
> containing the file (via FUSE) does not fix the issue. Source and target
> xattrs are appended for all bricks on all machines in the distributed
> volume.
>
> Let me know if there's any other information it would be useful to gather,
> as this issue seems to recur frequently.
>
> Thanks,
> Patrick
>
> # Move failures
>>
>> /bin/mv: cannot move
>> `./homegfs/hpc_shared/motorsports/056-1/data_collected3' to
>> `../bkp00/./homegfs/hpc_shared/motorsports/056-1/data_collected3': File
>> exists
>> /bin/mv: cannot move
>> `./homegfs/hpc_shared/motorsports/090-1/data_collected3' to
>> `../bkp00/./homegfs/hpc_shared/motorsports/090-1/data_collected3': File
>> exists
>> /bin/mv: cannot move
>> `./homegfs/hpc_shared/motorsports/057-2/data_collected3' to
>> `../bkp00/./homegfs/hpc_shared/motorsports/057-2/data_collected3': File
>> exists
>> /bin/mv: cannot move
>> `./homegfs/hpc_shared/motorsports/54/data_collected4' to
>> `../bkp00/./homegfs/hpc_shared/motorsports/54/data_collected4': File exists
>>
>> /bin/mv: cannot move
>> `./homegfs/hpc_shared/motorsports/056-1/data_collected3' to
>> `../bkp00/./homegfs/hpc_shared/motorsports/056-1/data_collected3': File
>> exists
>> /bin/mv: cannot move
>> `./homegfs/hpc_shared/motorsports/090-1/data_collected3' to
>> `../bkp00/./homegfs/hpc_shared/motorsports/090-1/data_collected3': File
>> exists
>> /bin/mv: cannot move
>> `./homegfs/hpc_shared/motorsports/057-2/data_collected3' to
>> `../bkp00/./homegfs/hpc_shared/motorsports/057-2/data_collected3': File
>> exists
>> /bin/mv: cannot move
>> `./homegfs/hpc_shared/motorsports/54/data_collected4' to
>> `../bkp00/./homegfs/hpc_shared/motorsports/54/data_collected4': File exists
>>
>>
>> 
>> retry: renaming ./homegfs/hpc_shared/motorsports/056-1/data_collected3 ->
>> ../bkp00/./homegfs/hpc_shared/motorsports/056-1/data_collected3
>>
>> source xattrs
>>   gfs01bkp
>> getfattr:
>> /data/brick01bkp/gfsbackup/bkp01/./homegfs/hpc_shared/motorsports/056-1/data_collected3:
>> No such file or directory
>> getfattr:
>> /data/brick02bkp/gfsbackup/bkp01/./homegfs/hpc_shared/motorsports/056-1/data_collected3:
>> No such file or directory
>>
>>   gfs02bkp
>> getfattr:
>> /data/brick01bkp/gfsbackup/bkp01/./homegfs/hpc_shared/motorsports/056-1/data_collected3:
>> No such file or directory
>> # file:
>> data/brick02bkp/gfsbackup/bkp01/./homegfs/hpc_shared/motorsports/056-1/data_collected3
>>   trusted.bit-rot.version=0x0200570308980001d157
>>   trusted.gfid=0xe07abd8ae861442ebc0df8b20719af30
>>   trusted.pgfid.1776adb6-2925-49d3-9cca-8a04c29f4c05=0x0001
>>
>> getfattr: Removing leading '/' from absolute path names
>> getfattr:
>> /data/brick03bkp/gfsbackup/bkp01/./homegfs/hpc_shared/motorsports/056-1/data_collected3:
>> No such file or directory
>> getfattr:
>> /data/brick04bkp/gfsbackup/bkp01/./homegfs/hpc_shared/motorsports/056-1/data_collected3:
>> No such file or directory
>> getfattr:
>> /data/brick05bkp/gfsbackup/bkp01/./homegfs/hpc_shared/motorsports/056-1/data_collected3:
>> No such file or directory
>>
>> target xattrs
>>   gfs01bkp
>>getfattr:
>> /data/brick01bkp/gfsbackup/bkp01/../bkp00/./homegfs/hpc_shared/motorsports/056-1/data_collected3:
>> No such file or directory
>>getfattr:
>> /data/brick02bkp/gfsbackup/bkp01/../bkp00/./homegfs/hpc_shared/motorsports/056-1/data_collected3:
>> No such file or directory
>>
>>   gfs02bkp
>> # file:
>> data/brick01bkp/gfsbackup/bkp01/../bkp00/./homegfs/hpc_shared/motorsports/056-1/data_collected3
>>   trusted.bit-rot.version=0x0200569bb8d20003ed00
>>

Re: [Gluster-users] [Gluster-devel] gluster 3.7.9 permission denied and mv errors

2016-05-03 Thread Glomski, Patrick
02bkp/gfsbackup/bkp01/../bkp00/./homegfs/hpc_shared/motorsports/54/data_collected4
>   trusted.bit-rot.version=0x0200569bb8d700074a42
>   trusted.gfid=0x980c12097507431d953ee458ec14ca4a
>   trusted.pgfid.6a8ecf7c-5597-4725-9764-455f7e267667=0x0001
>
> getfattr: Removing leading '/' from absolute path names
>
>   gfs02bkp
> getfattr:
> /data/brick01bkp/gfsbackup/bkp01/../bkp00/./homegfs/hpc_shared/motorsports/54/data_collected4:
> No such file or directory
> getfattr:
> /data/brick02bkp/gfsbackup/bkp01/../bkp00/./homegfs/hpc_shared/motorsports/54/data_collected4:
> No such file or directory
> getfattr:
> /data/brick03bkp/gfsbackup/bkp01/../bkp00/./homegfs/hpc_shared/motorsports/54/data_collected4:
> No such file or directory
> getfattr:
> /data/brick04bkp/gfsbackup/bkp01/../bkp00/./homegfs/hpc_shared/motorsports/54/data_collected4:
> No such file or directory
> getfattr:
> /data/brick05bkp/gfsbackup/bkp01/../bkp00/./homegfs/hpc_shared/motorsports/54/data_collected4:
> No such file or directory
>
> stat: cannot stat
> `"../bkp00/./homegfs/hpc_shared/motorsports/54/data_collected4"': No such
> file or directory
> Rename Succeeded!
>
> 
>
>



On Fri, Apr 29, 2016 at 10:21 AM, Glomski, Patrick <
patrick.glom...@corvidtec.com> wrote:

> Raghavendra,
>
> This error is occurring in a shell script moving files between directories
> on a FUSE mount when overwriting an old file with a newer file (it's a
> backup script, moving an incremental backup of a file into a 'rolling full
> backup' directory).
>
> As a temporary workaround, we parse the output of this shell script for
> move errors and handle the errors as they happen. Simply re-moving the
> files fails, so we stat the destination (to see if we can learn anything
> about the type of file that causes this behavior), delete the destination,
> and try the move again (success!). Typical output is as follows:
>
> /bin/mv: cannot move 
> `./homegfs/hpc_shared/motorsports/gmics/Raven/p11/149/data_collected4'
>> to `../bkp00/./homegfs/hpc_shared/motorsports/gmics/
>> Raven/p11/149/data_collected4': File exists
>> /bin/mv: cannot move 
>> `./homegfs/hpc_shared/motorsports/gmics/Raven/p11/149/data_collected4'
>> to `../bkp00/./homegfs/hpc_shared/motorsports/gmics/
>> Raven/p11/149/data_collected4': File exists
>>   File: `../bkp00/./homegfs/hpc_shared/motorsports/gmics/
>> Raven/p11/149/data_collected4'
>>   Size: 1714        Blocks: 4          IO Block: 131072 regular file
>> Device: 13h/19d Inode: 11051758947722304158  Links: 1
>> Access: (0660/-rw-rw----)  Uid: (  628/pkeistler)   Gid: ( 2020/   gmirl)
>> Access: 2016-01-20 17:20:45.0 -0500
>> Modify: 2015-11-06 15:20:41.0 -0500
>> Change: 2016-01-27 03:35:00.434712146 -0500
>> retry: renaming 
>> ./homegfs/hpc_shared/motorsports/gmics/Raven/p11/149/data_collected4
>> -> ../bkp00/./homegfs/hpc_shared/motorsports/gmics/Raven/p11/
>> 149/data_collected4
>>
>
> Not sure if that description rings any bells as to what the problem might
> be, but if not, I added some code to print out the 'getfattr' for the source
> and destination file on all of the bricks (before we delete the
> destination) and will post to this thread the next time we have that issue.
>
> Thanks,
> Patrick
>
>
> On Fri, Apr 29, 2016 at 8:15 AM, Raghavendra G 
> wrote:
>
>>
>>
>> On Wed, Apr 13, 2016 at 10:00 PM, David F. Robinson <
>> david.robin...@corvidtec.com> wrote:
>>
>>> I am running into two problems (possibly related?).
>>>
>>> 1) Every once in a while, when I do a 'rm -rf DIRNAME', it comes back
>>> with an error:
>>> rm: cannot remove `DIRNAME` : Directory not empty
>>>
>>> If I try the 'rm -rf' again after the error, it deletes the
>>> directory.  The issue is that I have scripts that clean up directories, and
>>> they are failing unless I go through the deletes a 2nd time.
>>>
>>
>> What kind of mount are you using? Is it a FUSE or NFS mount? Recently we
>> saw a similar issue on NFS clients on RHEL6 where rm -rf used to fail with
>> ENOTEMPTY in some specific cases.
>>
>>
>>>
>>> 2) I have different scripts to move a large numbers of files (5-25k)
>>> from one directory to another.  Sometimes I receive an error:
>>> /bin/mv: cannot move `xyz` to `../bkp00/xyz`: File exists
>>>
>>
>> Does ./bkp00/xyz exist on backend? 

Re: [Gluster-users] [Gluster-devel] gluster 3.7.9 permission denied and mv errors

2016-04-29 Thread Glomski, Patrick
Raghavendra,

This error is occurring in a shell script moving files between directories
on a FUSE mount when overwriting an old file with a newer file (it's a
backup script, moving an incremental backup of a file into a 'rolling full
backup' directory).

As a temporary workaround, we parse the output of this shell script for
move errors and handle the errors as they happen. Simply re-moving the
files fails, so we stat the destination (to see if we can learn anything
about the type of file that causes this behavior), delete the destination,
and try the move again (success!). Typical output is as follows:

/bin/mv: cannot move
`./homegfs/hpc_shared/motorsports/gmics/Raven/p11/149/data_collected4'
> to `../bkp00/./homegfs/hpc_shared/motorsports/gmics/
> Raven/p11/149/data_collected4': File exists
> /bin/mv: cannot move 
> `./homegfs/hpc_shared/motorsports/gmics/Raven/p11/149/data_collected4'
> to `../bkp00/./homegfs/hpc_shared/motorsports/gmics/
> Raven/p11/149/data_collected4': File exists
>   File: `../bkp00/./homegfs/hpc_shared/motorsports/gmics/
> Raven/p11/149/data_collected4'
>   Size: 1714        Blocks: 4          IO Block: 131072 regular file
> Device: 13h/19d Inode: 11051758947722304158  Links: 1
> Access: (0660/-rw-rw----)  Uid: (  628/pkeistler)   Gid: ( 2020/   gmirl)
> Access: 2016-01-20 17:20:45.0 -0500
> Modify: 2015-11-06 15:20:41.0 -0500
> Change: 2016-01-27 03:35:00.434712146 -0500
> retry: renaming 
> ./homegfs/hpc_shared/motorsports/gmics/Raven/p11/149/data_collected4
> -> ../bkp00/./homegfs/hpc_shared/motorsports/gmics/Raven/p11/
> 149/data_collected4
>

Not sure if that description rings any bells as to what the problem might
be, but if not, I added some code to print out the 'getfattr' for the source
and destination file on all of the bricks (before we delete the
destination) and will post to this thread the next time we have that issue.
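
For reference, a stripped-down sketch of what the wrapper now does for each
failed file (the brick glob and path joining below are illustrative, not our
exact script):

# $src and $dst are the two paths /bin/mv complained about
stat "$dst"                                            # what the FUSE mount thinks is there
for b in /data/brick0?bkp; do                          # dump xattrs from every brick
    getfattr -d -m . -e hex "$b/gfsbackup/bkp01/$dst" 2>&1
done
rm -f "$dst"                                           # remove the stale destination
echo "retry: renaming $src -> $dst"
/bin/mv -f "$src" "$dst" && echo "Rename Succeeded!"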

Thanks,
Patrick


On Fri, Apr 29, 2016 at 8:15 AM, Raghavendra G 
wrote:

>
>
> On Wed, Apr 13, 2016 at 10:00 PM, David F. Robinson <
> david.robin...@corvidtec.com> wrote:
>
>> I am running into two problems (possibly related?).
>>
>> 1) Every once in a while, when I do a 'rm -rf DIRNAME', it comes back
>> with an error:
>> rm: cannot remove `DIRNAME` : Directory not empty
>>
>> If I try the 'rm -rf' again after the error, it deletes the
>> directory.  The issue is that I have scripts that clean up directories, and
>> they are failing unless I go through the deletes a 2nd time.
>>
>
> What kind of mount are you using? Is it a FUSE or NFS mount? Recently we
> saw a similar issue on NFS clients on RHEL6 where rm -rf used to fail with
> ENOTEMPTY in some specific cases.
>
>
>>
>> 2) I have different scripts to move a large numbers of files (5-25k) from
>> one directory to another.  Sometimes I receive an error:
>> /bin/mv: cannot move `xyz` to `../bkp00/xyz`: File exists
>>
>
> Does ./bkp00/xyz exist on backend? If yes, what is the value of gfid xattr
> (key: "trusted.gfid") for "xyz" and "./bkp00/xyz" on backend bricks (I need
> gfid from all the bricks) when this issue happens?
>
>
>> The move is done using '/bin/mv -f', so it should overwrite the file
>> if it exists.  I have tested this with hundreds of files, and it works as
>> expected.  However, every few days the script that moves the files will
>> have problems with 1 or 2 files during the move.  This is one move problem
>> out of roughly 10,000 files that are being moved and I cannot figure out
>> any reason for the intermittent problem.
>>
>> Setup details for my gluster configuration shown below.
>>
>> [root@gfs01bkp logs]# gluster volume info
>>
>> Volume Name: gfsbackup
>> Type: Distribute
>> Volume ID: e78d5123-d9bc-4d88-9c73-61d28abf0b41
>> Status: Started
>> Number of Bricks: 7
>> Transport-type: tcp
>> Bricks:
>> Brick1: gfsib01bkp.corvidtec.com:/data/brick01bkp/gfsbackup
>> Brick2: gfsib01bkp.corvidtec.com:/data/brick02bkp/gfsbackup
>> Brick3: gfsib02bkp.corvidtec.com:/data/brick01bkp/gfsbackup
>> Brick4: gfsib02bkp.corvidtec.com:/data/brick02bkp/gfsbackup
>> Brick5: gfsib02bkp.corvidtec.com:/data/brick03bkp/gfsbackup
>> Brick6: gfsib02bkp.corvidtec.com:/data/brick04bkp/gfsbackup
>> Brick7: gfsib02bkp.corvidtec.com:/data/brick05bkp/gfsbackup
>> Options Reconfigured:
>> nfs.disable: off
>> server.allow-insecure: on
>> storage.owner-gid: 100
>> server.manage-gids: on
>> cluster.lookup-optimize: on
>> server.event-threads: 8
>> client.event-threads: 8
>> changelog.changelog: off
>> storage.build-pgfid: on
>> performance.readdir-ahead: on
>> diagnostics.brick-log-level: WARNING
>> diagnostics.client-log-level: WARNING
>> cluster.rebal-throttle: aggressive
>> performance.cache-size: 1024MB
>> performance.write-behind-window-size: 10MB
>>
>>
>> [root@gfs01bkp logs]# rpm -qa | grep gluster
>> glusterfs-server-3.7.9-1.el6.x86_64
>> glusterfs-debuginfo-3.7.9-1.el6.x86_64
>> glusterfs-api-3.7.9-1.el6.x86_64
glusterfs-resource-agents-3.7.9-1.el6.noarch

[Gluster-users] Gluster + Infiniband + 3.x kernel -> hard crash?

2016-04-06 Thread Glomski, Patrick
We run gluster 3.7 in a distributed replicated setup. Infiniband (tcp)
links the gluster peers together and clients use the ethernet interface.

This setup is stable running CentOS 6.x and using the most recent
infiniband drivers provided by Mellanox. Uptime was 170 days when we took
it down to wipe the systems and update to CentOS 7.

When the exact same setup is loaded onto a CentOS 7 machine (minor setup
differences, but basically the same; setup is handled by ansible), the
peers will (seemingly randomly) experience a hard crash and need to be
power-cycled. There is no output on the screen and nothing in the logs.
After rebooting, the peer reconnects, heals whatever files it missed, and
everything is happy again. Maximum uptime for any given peer is 20 days.
Thanks to the replication, clients maintain connectivity, but from a system
administration perspective it's driving me crazy!

We run other storage servers with the same infiniband and CentOS7 setup
except that they use NFS instead of gluster. NFS shares are served through
infiniband to some machines and ethernet to others.

Is it possible that gluster's (and only gluster's) use of the infiniband
kernel module to send tcp packets to its peers on a 3.x kernel is causing the
system to have a hard crash? Pretty specific problem and it doesn't make
much sense to me, but that's sure where the evidence seems to point.

Is anyone out there running CentOS 7 gluster arrays with Infiniband who can
confirm that it works fine for them? Would any gluster devs care to chime in
with a better theory? I'd love for this random crashing to stop.

Thanks,
Patrick
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [Gluster-devel] heal hanging

2016-01-21 Thread Glomski, Patrick
The Samba version is 4.1.17, from the builds you maintain at
download.gluster.org. The vfs plugin comes packaged with it.

http://download.gluster.org/pub/gluster/glusterfs/samba/EPEL.repo/epel-6/x86_64/

# smbd --version
Version 4.1.17

# rpm -qa | grep samba-vfs-glusterfs
samba-vfs-glusterfs-4.1.17-4.el6rhs.x86_64

Let us know if there's anything else we can provide,

Patrick


On Thu, Jan 21, 2016 at 10:07 PM, Raghavendra Talur 
wrote:

>
> On Jan 22, 2016 7:27 AM, "Pranith Kumar Karampuri" 
> wrote:
> >
> >
> >
> > On 01/22/2016 07:19 AM, Pranith Kumar Karampuri wrote:
> >>
> >>
> >>
> >> On 01/22/2016 07:13 AM, Glomski, Patrick wrote:
> >>>
> >>> We use the samba glusterfs virtual filesystem (the current version
> provided on download.gluster.org), but no windows clients connecting
> directly.
> >>
> >>
> >> Hmm.. Is there a way to disable using this and check if the CPU% still
> increases? What getxattr of "glusterfs.get_real_filename " does is
> to scan the entire directory looking for strcasecmp(,
> ). If anything matches then it will return the
> . But the problem is the scan is costly. So I wonder if
> this is the reason for the CPU spikes.
> >
> > +Raghavendra Talur, +Poornima
> >
> > Raghavendra, Poornima,
> > When are these getxattrs triggered? Did you guys see any
> brick CPU spikes before? I initially thought it could be because of big
> directory heals. But this is happening even when no self-heals are
> required. So I had to move away from that theory.
>
> These getxattrs are triggered when a SMB client performs a path based
> operation. It is necessary then that some client was connected.
>
> The last fix to go in that code for 3.6 was
> http://review.gluster.org/#/c/10403/.
>
> I am not able to determine which release of 3.6 it made into. Will update.
>
> Also we would need version of Samba installed. Including the vfs plugin
> package.
>
> There is a for loop of strcmp involved here which does take a lot of CPU.
> It should be for short bursts though and is expected and harmless.
>
> >
> > Pranith
> >
> >>
> >> Pranith
> >>>
> >>>
> >>> On Thu, Jan 21, 2016 at 8:37 PM, Pranith Kumar Karampuri <
> pkara...@redhat.com> wrote:
> >>>>
> >>>> Do you have any windows clients? I see a lot of getxattr calls for
> "glusterfs.get_real_filename" which lead to full readdirs of the
> directories on the brick.
> >>>>
> >>>> Pranith
> >>>>
> >>>> On 01/22/2016 12:51 AM, Glomski, Patrick wrote:
> >>>>>
> >>>>> Pranith, could this kind of behavior be self-inflicted by us
> deleting files directly from the bricks? We have done that in the past to
> clean up issues where gluster wouldn't allow us to delete from the mount.
> >>>>>
> >>>>> If so, is it feasible to clean them up by running a search on the
> .glusterfs directories directly and removing files with a reference count
> of 1 that are non-zero size (or directly checking the xattrs to be sure
> that it's not a DHT link).
> >>>>>
> >>>>> find /data/brick01a/homegfs/.glusterfs -type f -not -empty -links -2
> -exec rm -f "{}" \;
> >>>>>
> >>>>> Is there anything I'm inherently missing with that approach that
> will further corrupt the system?
> >>>>>
> >>>>>
> >>>>> On Thu, Jan 21, 2016 at 1:02 PM, Glomski, Patrick <
> patrick.glom...@corvidtec.com> wrote:
> >>>>>>
> >>>>>> Load spiked again: ~1200%cpu on gfs02a for glusterfsd. Crawl has
> been running on one of the bricks on gfs02b for 25 min or so and users
> cannot access the volume.
> >>>>>>
> >>>>>> I re-listed the xattrop directories as well as a 'top' entry and
> heal statistics. Then I restarted the gluster services on gfs02a.
> >>>>>>
> >>>>>> === top ===
> >>>>>> PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+
> COMMAND
> >>>>>>  8969 root  20   0 2815m 204m 3588 S 1181.0  0.6 591:06.93
> glusterfsd
> >>>>>>
> >>>>>> === xattrop ===
> >>>>>> /data/brick01a/homegfs/.glusterfs/indices/xattrop:
> >>>>>> xattrop-41f19453-91e4-437c-afa9-3b25614de210
> xattrop-9b815879-2f4d-402b-867c-a6d65087788c

Re: [Gluster-users] [Gluster-devel] heal hanging

2016-01-21 Thread Glomski, Patrick
Last entry for get_real_filename on any of the bricks was when we turned
off the samba gfapi vfs plugin earlier today:

/var/log/glusterfs/bricks/data-brick01a-homegfs.log:[2016-01-21
15:13:00.008239] E [server-rpc-fops.c:768:server_getxattr_cbk]
0-homegfs-server: 105: GETXATTR /wks_backup
(40e582d6-b0c7-4099-ba88-9168a3c32ca6)
(glusterfs.get_real_filename:desktop.ini) ==> (Permission denied)

We'll get back to you with those traces when %cpu spikes again. As with
most sporadic problems, as soon as you want something out of it, the issue
becomes harder to reproduce.
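
For the traces, the plan is a small watcher along these lines (the threshold
and paths are arbitrary; gstack comes from the gdb package):

pid=$(pgrep glusterfsd | head -n 1)                    # watch one brick process; repeat per brick
while sleep 60; do
    cpu=$(ps -o %cpu= -p "$pid" | awk '{print int($1)}')
    if [ "${cpu:-0}" -gt 400 ]; then                   # well above normal for these bricks
        gstack "$pid" > "/tmp/glusterfsd-gstack-$(date +%s).txt"
    fi
done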


On Thu, Jan 21, 2016 at 9:21 PM, Pranith Kumar Karampuri <
pkara...@redhat.com> wrote:

>
>
> On 01/22/2016 07:25 AM, Glomski, Patrick wrote:
>
> Unfortunately, all samba mounts to the gluster volume through the gfapi
> vfs plugin have been disabled for the last 6 hours or so and the frequency
> of %cpu spikes has increased. We had switched to sharing a fuse mount through
> samba, but I just disabled that as well. There are no samba shares of this
> volume now. The spikes now happen every thirty minutes or so. We've
> resorted to just rebooting the machine with high load for the present.
>
>
> Could you see if the logs of following type are not at all coming?
> [2016-01-21 15:13:00.005736] E [server-rpc-fops.c:768:server_getxattr_cbk]
> 0-homegfs-server: 110: GETXATTR /wks_backup (40e582d6-b0c7-4099-ba88-9168a3c
> 32ca6) (glusterfs.get_real_filename:desktop.ini) ==> (Permission denied)
>
> These are operations that failed. Operations that succeed are the ones
> that will scan the directory. But I don't have a way to find them other
> than using tcpdumps.
>
> At the moment I have 2 theories:
> 1) these get_real_filename calls
> 2) [2016-01-21 16:10:38.017828] E [server-helpers.c:46:gid_resolve]
> 0-gid-cache: getpwuid_r(494) failed
> "
>
> Yessir they are.  Normally, sssd would look to the local cache file in
> /var/lib/sss/db/ first, to get any group or userid information, then go out
> to the domain controller.  I put the options that we are using on our GFS
> volumes below…  Thanks for your help.
>
>
>
> We had been running sssd with sssd_nss and sssd_be sub-processes on these
> systems for a long time, under the GFS 3.5.2 code, and not run into the
> problem that David described with the high cpu usage on sssd_nss.
>
> " That was Tom Young's email 1.5 years back when we debugged it. But
> the process which was consuming lot of cpu is sssd_nss. So I am not sure if
> it is same issue. Let us debug to see '1)' doesn't happen. The gstack
> traces I asked for should also help.
>
>
> Pranith
>
>
> On Thu, Jan 21, 2016 at 8:49 PM, Pranith Kumar Karampuri <
> pkara...@redhat.com> wrote:
>
>>
>>
>> On 01/22/2016 07:13 AM, Glomski, Patrick wrote:
>>
>> We use the samba glusterfs virtual filesystem (the current version
>> provided on download.gluster.org), but no windows clients connecting
>> directly.
>>
>>
>> Hmm.. Is there a way to disable using this and check if the CPU% still
>> increases? What getxattr of "glusterfs.get_real_filename " does is
>> to scan the entire directory looking for strcasecmp(,
>> ). If anything matches then it will return the
>> . But the problem is the scan is costly. So I wonder if
>> this is the reason for the CPU spikes.
>>
>> Pranith
>>
>>
>> On Thu, Jan 21, 2016 at 8:37 PM, Pranith Kumar Karampuri <
>> pkara...@redhat.com> wrote:
>>
>>> Do you have any windows clients? I see a lot of getxattr calls for
>>> "glusterfs.get_real_filename" which lead to full readdirs of the
>>> directories on the brick.
>>>
>>> Pranith
>>>
>>> On 01/22/2016 12:51 AM, Glomski, Patrick wrote:
>>>
>>> Pranith, could this kind of behavior be self-inflicted by us deleting
>>> files directly from the bricks? We have done that in the past to clean up
> >>> issues where gluster wouldn't allow us to delete from the mount.
>>>
>>> If so, is it feasible to clean them up by running a search on the
>>> .glusterfs directories directly and removing files with a reference count
>>> of 1 that are non-zero size (or directly checking the xattrs to be sure
>>> that it's not a DHT link).
>>>
>>> find /data/brick01a/homegfs/.glusterfs -type f -not -empty -links -2
>>> -exec rm -f "{}" \;
>>>
>>> Is there anything I'm inherently missing with that approach that will
>>> further corrupt the system?
>>>
>>>
>>> On Thu, Jan 2

Re: [Gluster-users] [Gluster-devel] heal hanging

2016-01-21 Thread Glomski, Patrick
Unfortunately, all samba mounts to the gluster volume through the gfapi vfs
plugin have been disabled for the last 6 hours or so and the frequency of
%cpu spikes has increased. We had switched to sharing a fuse mount through samba,
but I just disabled that as well. There are no samba shares of this volume
now. The spikes now happen every thirty minutes or so. We've resorted to
just rebooting the machine with high load for the present.

On Thu, Jan 21, 2016 at 8:49 PM, Pranith Kumar Karampuri <
pkara...@redhat.com> wrote:

>
>
> On 01/22/2016 07:13 AM, Glomski, Patrick wrote:
>
> We use the samba glusterfs virtual filesystem (the current version
> provided on download.gluster.org), but no windows clients connecting
> directly.
>
>
> Hmm.. Is there a way to disable using this and check if the CPU% still
> increases? What getxattr of "glusterfs.get_real_filename " does is
> to scan the entire directory looking for strcasecmp(,
> ). If anything matches then it will return the
> . But the problem is the scan is costly. So I wonder if
> this is the reason for the CPU spikes.
>
> Pranith
>
>
> On Thu, Jan 21, 2016 at 8:37 PM, Pranith Kumar Karampuri <
> pkara...@redhat.com> wrote:
>
>> Do you have any windows clients? I see a lot of getxattr calls for
>> "glusterfs.get_real_filename" which lead to full readdirs of the
>> directories on the brick.
>>
>> Pranith
>>
>> On 01/22/2016 12:51 AM, Glomski, Patrick wrote:
>>
>> Pranith, could this kind of behavior be self-inflicted by us deleting
>> files directly from the bricks? We have done that in the past to clean up
>> issues where gluster wouldn't allow us to delete from the mount.
>>
>> If so, is it feasible to clean them up by running a search on the
>> .glusterfs directories directly and removing files with a reference count
>> of 1 that are non-zero size (or directly checking the xattrs to be sure
>> that it's not a DHT link).
>>
>> find /data/brick01a/homegfs/.glusterfs -type f -not -empty -links -2
>> -exec rm -f "{}" \;
>>
>> Is there anything I'm inherently missing with that approach that will
>> further corrupt the system?
>>
>>
>> On Thu, Jan 21, 2016 at 1:02 PM, Glomski, Patrick <
>> patrick.glom...@corvidtec.com> wrote:
>>
>>> Load spiked again: ~1200%cpu on gfs02a for glusterfsd. Crawl has been
>>> running on one of the bricks on gfs02b for 25 min or so and users cannot
>>> access the volume.
>>>
>>> I re-listed the xattrop directories as well as a 'top' entry and heal
>>> statistics. Then I restarted the gluster services on gfs02a.
>>>
>>> === top ===
>>> PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+
>>> COMMAND
>>>  8969 root  20   0 2815m 204m 3588 S 1181.0  0.6 591:06.93
>>> glusterfsd
>>>
>>> === xattrop ===
>>> /data/brick01a/homegfs/.glusterfs/indices/xattrop:
>>> xattrop-41f19453-91e4-437c-afa9-3b25614de210
>>> xattrop-9b815879-2f4d-402b-867c-a6d65087788c
>>>
>>> /data/brick02a/homegfs/.glusterfs/indices/xattrop:
>>> xattrop-70131855-3cfb-49af-abce-9d23f57fb393
>>> xattrop-dfb77848-a39d-4417-a725-9beca75d78c6
>>>
>>> /data/brick01b/homegfs/.glusterfs/indices/xattrop:
>>> e6e47ed9-309b-42a7-8c44-28c29b9a20f8
>>> xattrop-5c797a64-bde7-4eac-b4fc-0befc632e125
>>> xattrop-38ec65a1-00b5-4544-8a6c-bf0f531a1934
>>> xattrop-ef0980ad-f074-4163-979f-16d5ef85b0a0
>>>
>>> /data/brick02b/homegfs/.glusterfs/indices/xattrop:
>>> xattrop-7402438d-0ee7-4fcf-b9bb-b561236f99bc
>>> xattrop-8ffbf5f7-ace3-497d-944e-93ac85241413
>>>
>>> /data/brick01a/homegfs/.glusterfs/indices/xattrop:
>>> xattrop-0115acd0-caae-4dfd-b3b4-7cc42a0ff531
>>>
>>> /data/brick02a/homegfs/.glusterfs/indices/xattrop:
>>> xattrop-7e20fdb1-5224-4b9a-be06-568708526d70
>>>
>>> /data/brick01b/homegfs/.glusterfs/indices/xattrop:
>>> 8034bc06-92cd-4fa5-8aaf-09039e79d2c8
>>> c9ce22ed-6d8b-471b-a111-b39e57f0b512
>>> 94fa1d60-45ad-4341-b69c-315936b51e8d
>>> xattrop-9c04623a-64ce-4f66-8b23-dbaba49119c7
>>>
>>> /data/brick02b/homegfs/.glusterfs/indices/xattrop:
>>> xattrop-b8c8f024-d038-49a2-9a53-c54ead09111d
>>>
>>>
>>> === heal stats ===
>>>
>>> homegfs [b0-gfsib01a] : Starting time of crawl   : Thu Jan 21
>>> 12:36:45 2016

Re: [Gluster-users] [Gluster-devel] heal hanging

2016-01-21 Thread Glomski, Patrick
We use the samba glusterfs virtual filesystem (the current version provided
on download.gluster.org), but no windows clients connecting directly.

On Thu, Jan 21, 2016 at 8:37 PM, Pranith Kumar Karampuri <
pkara...@redhat.com> wrote:

> Do you have any windows clients? I see a lot of getxattr calls for
> "glusterfs.get_real_filename" which lead to full readdirs of the
> directories on the brick.
>
> Pranith
>
> On 01/22/2016 12:51 AM, Glomski, Patrick wrote:
>
> Pranith, could this kind of behavior be self-inflicted by us deleting
> files directly from the bricks? We have done that in the past to clean up
> issues where gluster wouldn't allow us to delete from the mount.
>
> If so, is it feasible to clean them up by running a search on the
> .glusterfs directories directly and removing files with a reference count
> of 1 that are non-zero size (or directly checking the xattrs to be sure
> that it's not a DHT link).
>
> find /data/brick01a/homegfs/.glusterfs -type f -not -empty -links -2 -exec
> rm -f "{}" \;
>
> Is there anything I'm inherently missing with that approach that will
> further corrupt the system?
>
>
> On Thu, Jan 21, 2016 at 1:02 PM, Glomski, Patrick <
> patrick.glom...@corvidtec.com> wrote:
>
>> Load spiked again: ~1200%cpu on gfs02a for glusterfsd. Crawl has been
>> running on one of the bricks on gfs02b for 25 min or so and users cannot
>> access the volume.
>>
>> I re-listed the xattrop directories as well as a 'top' entry and heal
>> statistics. Then I restarted the gluster services on gfs02a.
>>
>> === top ===
>> PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+
>> COMMAND
>>  8969 root  20   0 2815m 204m 3588 S 1181.0  0.6 591:06.93
>> glusterfsd
>>
>> === xattrop ===
>> /data/brick01a/homegfs/.glusterfs/indices/xattrop:
>> xattrop-41f19453-91e4-437c-afa9-3b25614de210
>> xattrop-9b815879-2f4d-402b-867c-a6d65087788c
>>
>> /data/brick02a/homegfs/.glusterfs/indices/xattrop:
>> xattrop-70131855-3cfb-49af-abce-9d23f57fb393
>> xattrop-dfb77848-a39d-4417-a725-9beca75d78c6
>>
>> /data/brick01b/homegfs/.glusterfs/indices/xattrop:
>> e6e47ed9-309b-42a7-8c44-28c29b9a20f8
>> xattrop-5c797a64-bde7-4eac-b4fc-0befc632e125
>> xattrop-38ec65a1-00b5-4544-8a6c-bf0f531a1934
>> xattrop-ef0980ad-f074-4163-979f-16d5ef85b0a0
>>
>> /data/brick02b/homegfs/.glusterfs/indices/xattrop:
>> xattrop-7402438d-0ee7-4fcf-b9bb-b561236f99bc
>> xattrop-8ffbf5f7-ace3-497d-944e-93ac85241413
>>
>> /data/brick01a/homegfs/.glusterfs/indices/xattrop:
>> xattrop-0115acd0-caae-4dfd-b3b4-7cc42a0ff531
>>
>> /data/brick02a/homegfs/.glusterfs/indices/xattrop:
>> xattrop-7e20fdb1-5224-4b9a-be06-568708526d70
>>
>> /data/brick01b/homegfs/.glusterfs/indices/xattrop:
>> 8034bc06-92cd-4fa5-8aaf-09039e79d2c8  c9ce22ed-6d8b-471b-a111-b39e57f0b512
>> 94fa1d60-45ad-4341-b69c-315936b51e8d
>> xattrop-9c04623a-64ce-4f66-8b23-dbaba49119c7
>>
>> /data/brick02b/homegfs/.glusterfs/indices/xattrop:
>> xattrop-b8c8f024-d038-49a2-9a53-c54ead09111d
>>
>>
>> === heal stats ===
>>
>> homegfs [b0-gfsib01a] : Starting time of crawl   : Thu Jan 21
>> 12:36:45 2016
>> homegfs [b0-gfsib01a] : Ending time of crawl : Thu Jan 21
>> 12:36:45 2016
>> homegfs [b0-gfsib01a] : Type of crawl: INDEX
>> homegfs [b0-gfsib01a] : No. of entries healed: 0
>> homegfs [b0-gfsib01a] : No. of entries in split-brain: 0
>> homegfs [b0-gfsib01a] : No. of heal failed entries   : 0
>>
>> homegfs [b1-gfsib01b] : Starting time of crawl   : Thu Jan 21
>> 12:36:19 2016
>> homegfs [b1-gfsib01b] : Ending time of crawl : Thu Jan 21
>> 12:36:19 2016
>> homegfs [b1-gfsib01b] : Type of crawl: INDEX
>> homegfs [b1-gfsib01b] : No. of entries healed: 0
>> homegfs [b1-gfsib01b] : No. of entries in split-brain: 0
>> homegfs [b1-gfsib01b] : No. of heal failed entries   : 1
>>
>> homegfs [b2-gfsib01a] : Starting time of crawl   : Thu Jan 21
>> 12:36:48 2016
>> homegfs [b2-gfsib01a] : Ending time of crawl : Thu Jan 21
>> 12:36:48 2016
>> homegfs [b2-gfsib01a] : Type of crawl: INDEX
>> homegfs [b2-gfsib01a] : No. of entries healed: 0
>> homegfs [b2-gfsib01a] : No. of entries in split-brain: 0
>> homegfs [b2-gfsib01a] : No. of heal failed entries   : 0
>>
>> homegfs [b3-gfsib01b] : Starting time of crawl   : Thu Jan 21
>> 12:36:47 2016

Re: [Gluster-users] [Gluster-devel] heal hanging

2016-01-21 Thread Glomski, Patrick
Pranith, could this kind of behavior be self-inflicted by us deleting files
directly from the bricks? We have done that in the past to clean up
issues where gluster wouldn't allow us to delete from the mount.

If so, is it feasible to clean them up by running a search on the
.glusterfs directories directly and removing files with a reference count
of 1 that are non-zero size (or directly checking the xattrs to be sure
that it's not a DHT link).

find /data/brick01a/homegfs/.glusterfs -type f -not -empty -links -2 -exec
rm -f "{}" \;

Is there anything I'm inherently missing with that approach that will
further corrupt the system?
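
If we do try it, the plan is to start with a dry run that only lists the
candidates and dumps their xattrs, so anything carrying
trusted.glusterfs.dht.linkto can be excluded before deleting anything; a
sketch:

find /data/brick01a/homegfs/.glusterfs -type f -not -empty -links -2 -print0 |
    while IFS= read -r -d '' f; do
        echo "== $f"
        getfattr -d -m . -e hex "$f"    # DHT link files show trusted.glusterfs.dht.linkto
    done > /tmp/glusterfs-cleanup-candidates.txt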


On Thu, Jan 21, 2016 at 1:02 PM, Glomski, Patrick <
patrick.glom...@corvidtec.com> wrote:

> Load spiked again: ~1200%cpu on gfs02a for glusterfsd. Crawl has been
> running on one of the bricks on gfs02b for 25 min or so and users cannot
> access the volume.
>
> I re-listed the xattrop directories as well as a 'top' entry and heal
> statistics. Then I restarted the gluster services on gfs02a.
>
> === top ===
> PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+
> COMMAND
>  8969 root  20   0 2815m 204m 3588 S 1181.0  0.6 591:06.93
> glusterfsd
>
> === xattrop ===
> /data/brick01a/homegfs/.glusterfs/indices/xattrop:
> xattrop-41f19453-91e4-437c-afa9-3b25614de210
> xattrop-9b815879-2f4d-402b-867c-a6d65087788c
>
> /data/brick02a/homegfs/.glusterfs/indices/xattrop:
> xattrop-70131855-3cfb-49af-abce-9d23f57fb393
> xattrop-dfb77848-a39d-4417-a725-9beca75d78c6
>
> /data/brick01b/homegfs/.glusterfs/indices/xattrop:
> e6e47ed9-309b-42a7-8c44-28c29b9a20f8
> xattrop-5c797a64-bde7-4eac-b4fc-0befc632e125
> xattrop-38ec65a1-00b5-4544-8a6c-bf0f531a1934
> xattrop-ef0980ad-f074-4163-979f-16d5ef85b0a0
>
> /data/brick02b/homegfs/.glusterfs/indices/xattrop:
> xattrop-7402438d-0ee7-4fcf-b9bb-b561236f99bc
> xattrop-8ffbf5f7-ace3-497d-944e-93ac85241413
>
> /data/brick01a/homegfs/.glusterfs/indices/xattrop:
> xattrop-0115acd0-caae-4dfd-b3b4-7cc42a0ff531
>
> /data/brick02a/homegfs/.glusterfs/indices/xattrop:
> xattrop-7e20fdb1-5224-4b9a-be06-568708526d70
>
> /data/brick01b/homegfs/.glusterfs/indices/xattrop:
> 8034bc06-92cd-4fa5-8aaf-09039e79d2c8  c9ce22ed-6d8b-471b-a111-b39e57f0b512
> 94fa1d60-45ad-4341-b69c-315936b51e8d
> xattrop-9c04623a-64ce-4f66-8b23-dbaba49119c7
>
> /data/brick02b/homegfs/.glusterfs/indices/xattrop:
> xattrop-b8c8f024-d038-49a2-9a53-c54ead09111d
>
>
> === heal stats ===
>
> homegfs [b0-gfsib01a] : Starting time of crawl   : Thu Jan 21 12:36:45
> 2016
> homegfs [b0-gfsib01a] : Ending time of crawl : Thu Jan 21 12:36:45
> 2016
> homegfs [b0-gfsib01a] : Type of crawl: INDEX
> homegfs [b0-gfsib01a] : No. of entries healed: 0
> homegfs [b0-gfsib01a] : No. of entries in split-brain: 0
> homegfs [b0-gfsib01a] : No. of heal failed entries   : 0
>
> homegfs [b1-gfsib01b] : Starting time of crawl   : Thu Jan 21 12:36:19
> 2016
> homegfs [b1-gfsib01b] : Ending time of crawl : Thu Jan 21 12:36:19
> 2016
> homegfs [b1-gfsib01b] : Type of crawl: INDEX
> homegfs [b1-gfsib01b] : No. of entries healed: 0
> homegfs [b1-gfsib01b] : No. of entries in split-brain: 0
> homegfs [b1-gfsib01b] : No. of heal failed entries   : 1
>
> homegfs [b2-gfsib01a] : Starting time of crawl   : Thu Jan 21 12:36:48
> 2016
> homegfs [b2-gfsib01a] : Ending time of crawl : Thu Jan 21 12:36:48
> 2016
> homegfs [b2-gfsib01a] : Type of crawl: INDEX
> homegfs [b2-gfsib01a] : No. of entries healed: 0
> homegfs [b2-gfsib01a] : No. of entries in split-brain: 0
> homegfs [b2-gfsib01a] : No. of heal failed entries   : 0
>
> homegfs [b3-gfsib01b] : Starting time of crawl   : Thu Jan 21 12:36:47
> 2016
> homegfs [b3-gfsib01b] : Ending time of crawl : Thu Jan 21 12:36:47
> 2016
> homegfs [b3-gfsib01b] : Type of crawl: INDEX
> homegfs [b3-gfsib01b] : No. of entries healed: 0
> homegfs [b3-gfsib01b] : No. of entries in split-brain: 0
> homegfs [b3-gfsib01b] : No. of heal failed entries   : 0
>
> homegfs [b4-gfsib02a] : Starting time of crawl   : Thu Jan 21 12:36:06
> 2016
> homegfs [b4-gfsib02a] : Ending time of crawl : Thu Jan 21 12:36:06
> 2016
> homegfs [b4-gfsib02a] : Type of crawl: INDEX
> homegfs [b4-gfsib02a] : No. of entries healed: 0
> homegfs [b4-gfsib02a] : No. of entries in split-brain: 0
> homegfs [b4-gfsib02a] : No. of heal failed entries   : 0
>
> homegfs [b5-gfsib02b] : Starting time of crawl   : Thu Jan 21 12:13:40
> 2016
> homegfs [b5-gfsib02b] :

Re: [Gluster-users] [Gluster-devel] heal hanging

2016-01-21 Thread Glomski, Patrick
I should mention that the problem is not currently occurring and there are
no heals (output appended). By restarting the gluster services, we can stop
the crawl, which lowers the load for a while. Subsequent crawls seem to
finish properly. For what it's worth, files/folders that show up in the
'heal info' output during a hung crawl don't seem to be anything out of
the ordinary.

Over the past four days, the typical time before the problem recurs after
suppressing it in this manner is an hour. Last night when we reached out to
you was the last time it happened and the load has been low since (a
relief).  David believes that recursively listing the files (ls -alR or
similar) from a client mount can force the issue to happen, but obviously
I'd rather not unless we have some precise thing we're looking for. Let me
know if you'd like me to attempt to drive the system unstable like that and
what I should look for. As it's a production system, I'd rather not leave
it in this state for long.

[root@gfs01a xattrop]# gluster volume heal homegfs info
Brick gfs01a.corvidtec.com:/data/brick01a/homegfs/
Number of entries: 0

Brick gfs01b.corvidtec.com:/data/brick01b/homegfs/
Number of entries: 0

Brick gfs01a.corvidtec.com:/data/brick02a/homegfs/
Number of entries: 0

Brick gfs01b.corvidtec.com:/data/brick02b/homegfs/
Number of entries: 0

Brick gfs02a.corvidtec.com:/data/brick01a/homegfs/
Number of entries: 0

Brick gfs02b.corvidtec.com:/data/brick01b/homegfs/
Number of entries: 0

Brick gfs02a.corvidtec.com:/data/brick02a/homegfs/
Number of entries: 0

Brick gfs02b.corvidtec.com:/data/brick02b/homegfs/
Number of entries: 0
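
In the meantime we are just watching for the next spike with something like
the loop below (interval and log path are ours):

while sleep 300; do
    date
    gluster volume heal homegfs info | grep -A1 '^Brick'    # pending entries per brick
    top -b -n 1 | grep glusterfsd                            # catch the runaway glusterfsd
done >> /root/homegfs-heal-watch.log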




On Thu, Jan 21, 2016 at 10:40 AM, Pranith Kumar Karampuri <
pkara...@redhat.com> wrote:

>
>
> On 01/21/2016 08:25 PM, Glomski, Patrick wrote:
>
> Hello, Pranith. The typical behavior is that the %cpu on a glusterfsd
> process jumps to the number of processor cores available (800% or 1200%,
> depending on the pair of nodes involved) and the load average on the
> machine goes very high (~20). The volume's heal statistics output shows
> that it is crawling one of the bricks and trying to heal, but this crawl
> hangs and never seems to finish.
>
>
> The number of files in the xattrop directory varies over time, so I ran a
> wc -l as you requested periodically for some time and then started
> including a datestamped list of the files that were in the xattrops
> directory on each brick to see which were persistent. All bricks had files
> in the xattrop folder, so all results are attached.
>
> Thanks this info is helpful. I don't see a lot of files. Could you give
> output of "gluster volume heal  info"? Is there any directory in
> there which is LARGE?
>
> Pranith
>
>
> Please let me know if there is anything else I can provide.
>
> Patrick
>
>
> On Thu, Jan 21, 2016 at 12:01 AM, Pranith Kumar Karampuri <
> pkara...@redhat.com> wrote:
>
>> hey,
>>Which process is consuming so much cpu? I went through the logs
>> you gave me. I see that the following files are in gfid mismatch state:
>>
>> <066e4525-8f8b-43aa-b7a1-86bbcecc68b9/safebrowsing-backup>,
>> <1d48754b-b38c-403d-94e2-0f5c41d5f885/recovery.bak>,
>> ,
>>
>> Could you give me the output of "ls /indices/xattrop | wc -l"
>> output on all the bricks which are acting this way? This will tell us the
>> number of pending self-heals on the system.
>>
>> Pranith
>>
>>
>> On 01/20/2016 09:26 PM, David Robinson wrote:
>>
>> resending with parsed logs...
>>
>>
>>
>>
>>
>> I am having issues with 3.6.6 where the load will spike up to 800% for
>> one of the glusterfsd processes and the users can no longer access the
>> system.  If I reboot the node, the heal will finish normally after a few
>> minutes and the system will be responsive, but a few hours later the issue
>> will start again.  It looks like it is hanging in a heal and spinning up the
>> load on one of the bricks.  The heal gets stuck and says it is crawling and
>> never returns.  After a few minutes of the heal saying it is crawling, the
>> load spikes up and the mounts become unresponsive.
>>
>> Any suggestions on how to fix this?  It has us stopped cold as the user
>> can no longer access the systems when the load spikes... Logs attached.
>>
>> System setup info is:
>>
>> [root@gfs01a ~]# gluster volume info homegfs
>>
>> Volume Name: homegfs
>> Type: Distributed-Replicate
>> Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071
>> Status: Started
>> Number of Bricks: 4 x 2 = 8
>> Transport-type: tcp
>> Bricks:

Re: [Gluster-users] [Gluster-devel] heal hanging

2016-01-21 Thread Glomski, Patrick
Hello, Pranith. The typical behavior is that the %cpu on a glusterfsd
process jumps to the number of processor cores available (800% or 1200%,
depending on the pair of nodes involved) and the load average on the
machine goes very high (~20). The volume's heal statistics output shows
that it is crawling one of the bricks and trying to heal, but this crawl
hangs and never seems to finish.

The number of files in the xattrop directory varies over time, so I ran a
wc -l as you requested periodically for some time and then started
including a datestamped list of the files that were in the xattrops
directory on each brick to see which were persistent. All bricks had files
in the xattrop folder, so all results are attached.
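
The collection itself was just a crude loop of roughly this form on each
server (brick glob abbreviated):

while sleep 600; do
    date
    for d in /data/brick0*/homegfs/.glusterfs/indices/xattrop; do
        echo "$d: $(ls "$d" | wc -l) entries"
        ls "$d"                                  # keep the names to see which persist
    done
done >> /root/xattrop-watch.log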

Please let me know if there is anything else I can provide.

Patrick


On Thu, Jan 21, 2016 at 12:01 AM, Pranith Kumar Karampuri <
pkara...@redhat.com> wrote:

> hey,
>Which process is consuming so much cpu? I went through the logs you
> gave me. I see that the following files are in gfid mismatch state:
>
> <066e4525-8f8b-43aa-b7a1-86bbcecc68b9/safebrowsing-backup>,
> <1d48754b-b38c-403d-94e2-0f5c41d5f885/recovery.bak>,
> ,
>
> Could you give me the output of "ls /indices/xattrop | wc -l"
> output on all the bricks which are acting this way? This will tell us the
> number of pending self-heals on the system.
>
> Pranith
>
>
> On 01/20/2016 09:26 PM, David Robinson wrote:
>
> resending with parsed logs...
>
>
>
>
>
> I am having issues with 3.6.6 where the load will spike up to 800% for one
> of the glusterfsd processes and the users can no longer access the system.
> If I reboot the node, the heal will finish normally after a few minutes and
> the system will be responsive, but a few hours later the issue will start
> again.  It looks like it is hanging in a heal and spinning up the load on
> one of the bricks.  The heal gets stuck and says it is crawling and never
> returns.  After a few minutes of the heal saying it is crawling, the load
> spikes up and the mounts become unresponsive.
>
> Any suggestions on how to fix this?  It has us stopped cold as the user
> can no longer access the systems when the load spikes... Logs attached.
>
> System setup info is:
>
> [root@gfs01a ~]# gluster volume info homegfs
>
> Volume Name: homegfs
> Type: Distributed-Replicate
> Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071
> Status: Started
> Number of Bricks: 4 x 2 = 8
> Transport-type: tcp
> Bricks:
> Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs
> Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs
> Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs
> Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs
> Brick5: gfsib02a.corvidtec.com:/data/brick01a/homegfs
> Brick6: gfsib02b.corvidtec.com:/data/brick01b/homegfs
> Brick7: gfsib02a.corvidtec.com:/data/brick02a/homegfs
> Brick8: gfsib02b.corvidtec.com:/data/brick02b/homegfs
> Options Reconfigured:
> performance.io-thread-count: 32
> performance.cache-size: 128MB
> performance.write-behind-window-size: 128MB
> server.allow-insecure: on
> network.ping-timeout: 42
> storage.owner-gid: 100
> geo-replication.indexing: off
> geo-replication.ignore-pid-check: on
> changelog.changelog: off
> changelog.fsync-interval: 3
> changelog.rollover-time: 15
> server.manage-gids: on
> diagnostics.client-log-level: WARNING
>
> [root@gfs01a ~]# rpm -qa | grep gluster
> gluster-nagios-common-0.1.1-0.el6.noarch
> glusterfs-fuse-3.6.6-1.el6.x86_64
> glusterfs-debuginfo-3.6.6-1.el6.x86_64
> glusterfs-libs-3.6.6-1.el6.x86_64
> glusterfs-geo-replication-3.6.6-1.el6.x86_64
> glusterfs-api-3.6.6-1.el6.x86_64
> glusterfs-devel-3.6.6-1.el6.x86_64
> glusterfs-api-devel-3.6.6-1.el6.x86_64
> glusterfs-3.6.6-1.el6.x86_64
> glusterfs-cli-3.6.6-1.el6.x86_64
> glusterfs-rdma-3.6.6-1.el6.x86_64
> samba-vfs-glusterfs-4.1.11-2.el6.x86_64
> glusterfs-server-3.6.6-1.el6.x86_64
> glusterfs-extra-xlators-3.6.6-1.el6.x86_64
>
>
>
>
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>


gfs01a_data_brick01a_homegfs
Description: Binary data


gfs01a_data_brick02a_homegfs
Description: Binary data


gfs01b_data_brick01b_homegfs
Description: Binary data


gfs01b_data_brick02b_homegfs
Description: Binary data


gfs02a_data_brick01a_homegfs
Description: Binary data


gfs02a_data_brick02a_homegfs
Description: Binary data


gfs02b_data_brick01b_homegfs
Description: Binary data


gfs02b_data_brick02b_homegfs
Description: Binary data
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] glusterfsd crash due to page allocation failure

2015-12-21 Thread Glomski, Patrick
Hello,

We've recently upgraded from gluster 3.6.6 to 3.7.6 and have started
encountering dmesg page allocation errors (stack trace is appended).

It appears that glusterfsd now sometimes fills up the cache completely and
crashes with a page allocation failure. I *believe* it mainly happens when
copying lots of new data to the system, running a 'find', or similar. Hosts
are all Scientific Linux 6.6 and these errors occur consistently on two
separate gluster pools.

Has anyone else seen this issue and are there any known fixes for it via
sysctl kernel parameters or other means?
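
The only thing we are considering trying on our side, purely as a guess, is
raising the kernel's free-memory reserve, since order:5 here means a 128 KB
contiguous allocation:

sysctl vm.min_free_kbytes                 # note the current value first
sysctl -w vm.min_free_kbytes=262144       # speculative, not a confirmed fix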

Please let me know of any other diagnostic information that would help.

Thanks,
Patrick


[1458118.134697] glusterfsd: page allocation failure. order:5, mode:0x20
> [1458118.134701] Pid: 6010, comm: glusterfsd Not tainted
> 2.6.32-573.3.1.el6.x86_64 #1
> [1458118.134702] Call Trace:
> [1458118.134714]  [] ? __alloc_pages_nodemask+0x7dc/0x950
> [1458118.134728]  [] ? mlx4_ib_post_send+0x680/0x1f90
> [mlx4_ib]
> [1458118.134733]  [] ? kmem_getpages+0x62/0x170
> [1458118.134735]  [] ? fallback_alloc+0x1ba/0x270
> [1458118.134736]  [] ? cache_grow+0x2cf/0x320
> [1458118.134738]  [] ? cache_alloc_node+0x99/0x160
> [1458118.134743]  [] ? pskb_expand_head+0x62/0x280
> [1458118.134744]  [] ? __kmalloc+0x199/0x230
> [1458118.134746]  [] ? pskb_expand_head+0x62/0x280
> [1458118.134748]  [] ? __pskb_pull_tail+0x2aa/0x360
> [1458118.134751]  [] ? harmonize_features+0x29/0x70
> [1458118.134753]  [] ? dev_hard_start_xmit+0x1c4/0x490
> [1458118.134758]  [] ? sch_direct_xmit+0x15a/0x1c0
> [1458118.134759]  [] ? dev_queue_xmit+0x228/0x320
> [1458118.134762]  [] ? neigh_connected_output+0xbd/0x100
> [1458118.134766]  [] ? ip_finish_output+0x287/0x360
> [1458118.134767]  [] ? ip_output+0xb8/0xc0
> [1458118.134769]  [] ? __ip_local_out+0x9f/0xb0
> [1458118.134770]  [] ? ip_local_out+0x25/0x30
> [1458118.134772]  [] ? ip_queue_xmit+0x190/0x420
> [1458118.134773]  [] ? __alloc_pages_nodemask+0x129/0x950
> [1458118.134776]  [] ? tcp_transmit_skb+0x4b4/0x8b0
> [1458118.134778]  [] ? tcp_write_xmit+0x1da/0xa90
> [1458118.134779]  [] ? __kmalloc_node+0x4d/0x60
> [1458118.134780]  [] ? tcp_push_one+0x30/0x40
> [1458118.134782]  [] ? tcp_sendmsg+0x9cc/0xa20
> [1458118.134786]  [] ? sock_aio_write+0x19b/0x1c0
> [1458118.134788]  [] ? sock_aio_write+0x0/0x1c0
> [1458118.134791]  [] ? do_sync_readv_writev+0xfb/0x140
> [1458118.134797]  [] ? autoremove_wake_function+0x0/0x40
> [1458118.134801]  [] ? selinux_file_permission+0xbf/0x150
> [1458118.134804]  [] ? security_file_permission+0x16/0x20
> [1458118.134806]  [] ? do_readv_writev+0xd6/0x1f0
> [1458118.134807]  [] ? vfs_writev+0x46/0x60
> [1458118.134809]  [] ? sys_writev+0x51/0xd0
> [1458118.134812]  [] ? __audit_syscall_exit+0x25e/0x290
> [1458118.134816]  [] ? system_call_fastpath+0x16/0x1b
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Gluster v3.7.3 REMOVEXATTR clogs volume logs

2015-08-13 Thread Glomski, Patrick
I am currently testing gluster v3.7.3 on Scientific Linux 7.1 and a newly
created gluster volume. After transferring some files to the volume over
the fuse mount, the volume log is flooded with 2.5GB of errors like the
following:

[2015-08-13 15:54:36.921622] W [fuse-bridge.c:1230:fuse_err_cbk]
0-glusterfs-fuse: 361669: REMOVEXATTR() /path/to/file => -1 (No data
available)

There are several (fixed) redhat bugs relating to similar errors:
https://bugzilla.redhat.com/show_bug.cgi?id=1245966
https://bugzilla.redhat.com/show_bug.cgi?id=1188064
https://bugzilla.redhat.com/show_bug.cgi?id=1192832

- Is anyone else running 3.7 seeing similar errors?
- Is there something wrong with my configuration?
- If it's a problem, what other information do you need to diagnose?
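
In the meantime, the only way we see to keep these from flooding the log is
to raise the client log level above WARNING (untested whether these
particular messages honor it):

gluster volume set testbrick diagnostics.client-log-level ERROR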

gluster volume info:

Volume Name: testbrick
Type: Distribute
Volume ID: 91b0d825-5e39-4b17-a505-174b47849b40
Status: Started
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: gfstest:/data/brick01/testbrick
Options Reconfigured:
performance.readdir-ahead: on

Thanks,
Patrick
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users