Re: [Gluster-users] Unable to create new files or folders using samba and vfs_glusterfs

2019-01-17 Thread Matt Waymack
I've been using these for a few weeks now without any issues, thank you!

-Original Message-
From: gluster-users-boun...@gluster.org  On 
Behalf Of Matt Waymack
Sent: Thursday, December 27, 2018 10:56 AM
To: Diego Remolina 
Cc: gluster-users@gluster.org List 
Subject: Re: [Gluster-users] Unable to create new files or folders using samba 
and vfs_glusterfs

OK, I'm back from the holiday and updated using the following packages:
libsmbclient-4.8.3-4.el7.0.1.x86_64.rpm  
libwbclient-4.8.3-4.el7.0.1.x86_64.rpm   
samba-4.8.3-4.el7.0.1.x86_64.rpm 
samba-client-4.8.3-4.el7.0.1.x86_64.rpm   
samba-client-libs-4.8.3-4.el7.0.1.x86_64.rpm  
samba-common-4.8.3-4.el7.0.1.noarch.rpm   
samba-common-libs-4.8.3-4.el7.0.1.x86_64.rpm
samba-common-tools-4.8.3-4.el7.0.1.x86_64.rpm
samba-libs-4.8.3-4.el7.0.1.x86_64.rpm
samba-vfs-glusterfs-4.8.3-4.el7.0.1.x86_64.rpm

First impressions are good!  We're able to create files/folders.  I'll keep you 
updated with stability.

Thank you!
-Original Message-
From: Diego Remolina 
Sent: Thursday, December 20, 2018 1:36 PM
To: Matt Waymack 
Cc: gluster-users@gluster.org List 
Subject: Re: [Gluster-users] Unable to create new files or folders using samba 
and vfs_glusterfs

Hi Matt,

The update is slightly different, has the .1 at the end:

Fast-track -> samba-4.8.3-4.el7.0.1.x86_64.rpm

vs general -> samba-4.8.3-4.el7.x86_64

I think these are built, but not pushed to the fasttrack repo until they get 
feedback that the packages are good. So you may need to use wget to download 
them and update your packages with these for the test.
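
For reference, a rough sketch of pulling the test RPMs with wget and updating 
from them (the download directory is arbitrary; the URL is the buildlogs link 
quoted further down in this thread):

    # grab every RPM from the test-build directory
    mkdir /tmp/samba-test && cd /tmp/samba-test
    wget -r -np -nd -A '*.rpm' \
        https://buildlogs.centos.org/c7-fasttrack.x86_64/samba/20181214164659/4.8.3-4.el7.0.1.x86_64/
    # update the installed samba packages to the downloaded builds
    yum update ./*.rpm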

Diego

On Thu, Dec 20, 2018 at 1:06 PM Matt Waymack  wrote:
>
> Hi all,
>
>
>
> I’m looking to update Samba from fasttrack, but I still only see 4.8.3 and yum 
> is not wanting to update.  The test build is also showing 4.8.3.
>
>
>
> Thank you!
>
>
>
>
>
> From: gluster-users-boun...@gluster.org 
>  On Behalf Of Matt Waymack
> Sent: Sunday, December 16, 2018 1:55 PM
> To: Diego Remolina 
> Cc: gluster-users@gluster.org List 
> Subject: Re: [Gluster-users] Unable to create new files or folders 
> using samba and vfs_glusterfs
>
>
>
> Hi all, sorry for the delayed response.
>
>
>
> I can test this out and will report back.  It may be as late as Tuesday 
> before I can test the build.
>
>
>
> Thank you!
>
>
>
> On Dec 15, 2018 7:46 AM, Diego Remolina  wrote:
>
> Matt,
>
>
>
> Can you test the updated samba packages that the CentOS team has built for 
> FasTrack?
>
>
>
> A NOTE has been added to this issue.
>
> --
>  (0033351) pgreco (developer) - 2018-12-15 13:43
>  https://bugs.centos.org/view.php?id=15586#c33351
> --
> @dijur...@gmail.com
> Here's the link for the test build
> https://buildlogs.centos.org/c7-fasttrack.x86_64/samba/20181214164659/
> 4.8.3-4.el7.0.1.x86_64/
> .
> Please let us know how it goes. Thanks for testing!
> Pablo.
> --
>
> Diego
>
>
>
>
> On Fri, Dec 14, 2018 at 12:52 AM Anoop C S  wrote:
> >
> > On Thu, 2018-12-13 at 15:31 +, Matt Waymack wrote:
> > > Hi all,
> > >
> > > I’m having an issue on Windows clients accessing shares via smb 
> > > when using vfs_glusterfs.  They are unable to create any file or 
> > > folders at the root of the share and get the error “The file is 
> > > too large for the destination file system.”  When I change from 
> > > vfs_glusterfs to just using a filesystem path to the same 
> > > location, it works fine (except for the performance hit).  All my 
> > > searches have led to bug 1619108, and that seems to be the 
> > > symptom, but there doesn’t appear to be any clear resolution.
> >
> > You figured out the right bug and following is the upstream Samba bug:
> >
> > https://bugzilla.samba.org/show_bug.cgi?id=13585
> >
> > Unfortunately it is only available with v4.8.6 and higher. If 
> > required I can patch it up and provide a build.
> >
> > > I’m on the latest version of samba available on CentOS 7 (4.8.3) 
> > > and I’m on the latest available glusterfs 4.1 (4.1.6).  Is there 
> > > something simple I’m missing to get this going?
> > >
> > > Thank you!
> > > ___
> > > Gluster-users mailing list
> > > Gluster-users@gluster.org
> > > https://lists.gluster.org/mailman/listinfo/gluster-users
> >
> > ___
> > Gluster-users mailing list
> > Gluster-users@gluster.org
> > https://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Input/output error on FUSE log

2019-01-09 Thread Matt Waymack
Has anyone got any other ideas of where to look?  This is only affecting FUSE 
clients.  SMB clients are unaffected by this problem.

Thanks!

From: gluster-users-boun...@gluster.org  On 
Behalf Of Matt Waymack
Sent: Monday, January 7, 2019 1:19 PM
To: Raghavendra Gowdappa 
Cc: gluster-users@gluster.org List 
Subject: Re: [Gluster-users] Input/output error on FUSE log

Attached are the logs from when a failure occurred with diagnostics set to 
trace.

Thank you!

From: Raghavendra Gowdappa mailto:rgowd...@redhat.com>>
Sent: Saturday, January 5, 2019 8:32 PM
To: Matt Waymack mailto:mwaym...@nsgdv.com>>
Cc: gluster-users@gluster.org<mailto:gluster-users@gluster.org> List 
mailto:gluster-users@gluster.org>>
Subject: Re: [Gluster-users] Input/output error on FUSE log



On Sun, Jan 6, 2019 at 7:58 AM Raghavendra Gowdappa 
mailto:rgowd...@redhat.com>> wrote:


On Sun, Jan 6, 2019 at 4:19 AM Matt Waymack 
mailto:mwaym...@nsgdv.com>> wrote:

Hi all,



I'm having a problem writing to our volume.  When writing files larger than 
about 2GB, I get an intermittent issue where the write will fail and return 
Input/Output error.  This is also shown in the FUSE log of the client (this is 
affecting all clients).  A snip of a client log is below:

[2019-01-05 22:39:44.581371] W [fuse-bridge.c:2474:fuse_writev_cbk] 
0-glusterfs-fuse: 51040978: WRITE => -1 
gfid=82a0b5c4-7ef3-43c2-ad86-41e16673d7c2 fd=0x7f949839a368 (Input/output error)

[2019-01-05 22:39:44.598392] W [fuse-bridge.c:1441:fuse_err_cbk] 
0-glusterfs-fuse: 51040979: FLUSH() ERR => -1 (Input/output error)

[2019-01-05 22:39:47.420920] W [fuse-bridge.c:2474:fuse_writev_cbk] 
0-glusterfs-fuse: 51041266: WRITE => -1 
gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949809b7f8 (Input/output error)

[2019-01-05 22:39:47.433377] W [fuse-bridge.c:1441:fuse_err_cbk] 
0-glusterfs-fuse: 51041267: FLUSH() ERR => -1 (Input/output error)

[2019-01-05 22:39:50.441531] W [fuse-bridge.c:2474:fuse_writev_cbk] 
0-glusterfs-fuse: 51041548: WRITE => -1 
gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949839a368 (Input/output error)

[2019-01-05 22:39:50.451914] W [fuse-bridge.c:1441:fuse_err_cbk] 
0-glusterfs-fuse: 51041549: FLUSH() ERR => -1 (Input/output error)

The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search] 0-gv1-dht: 
no subvolume for hash (value) = 1311504267" repeated 1721 times between 
[2019-01-05 22:39:33.906241] and [2019-01-05 22:39:44.598371]

The message "E [MSGID: 101046] [dht-common.c:1502:dht_lookup_dir_cbk] 
0-gv1-dht: dict is null" repeated 1714 times between [2019-01-05 
22:39:33.925981] and [2019-01-05 22:39:50.451862]

The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search] 0-gv1-dht: 
no subvolume for hash (value) = 1137142622" repeated 1707 times between 
[2019-01-05 22:39:39.636552] and [2019-01-05 22:39:50.451895]

This looks to be a DHT issue. Some questions:
* Are all subvolumes of DHT up and client is connected to them? Particularly 
the subvolume which contains the file in question.
* Can you get all extended attributes of parent directory of the file from all 
bricks?
* set diagnostics.client-log-level to TRACE, capture these errors again and 
attach the client log file.
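
For reference, a rough sketch of gathering what is asked above (the volume name 
gv1 and brick paths come from the volume info later in this thread; the parent 
directory is a placeholder):

    # check that all bricks are up and which clients are connected to them
    gluster volume status gv1
    gluster volume status gv1 clients

    # on each brick node, dump the extended attributes of the parent directory
    getfattr -d -m . -e hex /exp/b1/gv1/<parent-directory>

    # raise the client log level, reproduce the failure, then lower it again
    gluster volume set gv1 diagnostics.client-log-level TRACE
    gluster volume set gv1 diagnostics.client-log-level INFO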

I spoke a bit too early. dht_writev doesn't search the hashed subvolume as it's 
already been looked up in lookup. So, these messages look to be a different 
issue - not a writev failure.


This is intermittent for most files, but eventually if a file is large enough 
it will not write.  The workflow is SFTP to the client, which then writes to the 
volume over FUSE.  When files get to a certain point, we can no longer write to 
them.  The file sizes are different as well, so it's not like they all get to 
the same size and just stop either.  I've ruled out a free space issue, our 
files at their largest are only a few hundred GB and we have tens of terabytes 
free on each brick.  We are also sharding at 1GB.

I'm not sure where to go from here as the error seems vague and I can only see 
it on the client log.  I'm not seeing these errors on the nodes themselves.  
This is also seen if I mount the volume via FUSE on any of the nodes as well 
and it is only reflected in the FUSE log.

Here is the volume info:
Volume Name: gv1
Type: Distributed-Replicate
Volume ID: 1472cc78-e2a0-4c3f-9571-dab840239b3c
Status: Started
Snapshot Count: 0
Number of Bricks: 8 x (2 + 1) = 24
Transport-type: tcp
Bricks:
Brick1: tpc-glus4:/exp/b1/gv1
Brick2: tpc-glus2:/exp/b1/gv1
Brick3: tpc-arbiter1:/exp/b1/gv1 (arbiter)
Brick4: tpc-glus2:/exp/b2/gv1
Brick5: tpc-glus4:/exp/b2/gv1
Brick6: tpc-arbiter1:/exp/b2/gv1 (arbiter)
Brick7: tpc-glus4:/exp/b3/gv1
Brick8: tpc-glus2:/exp/b3/gv1
Brick9: tpc-arbiter1:/exp/b3/gv1 (arbiter)
Brick10: tpc-glus4:/exp/b4/gv1
Brick11: tpc-glus2:/exp/b4/gv1
Brick12: tpc-arbiter1:/exp/b4/gv1 (arbiter)
Brick13: tpc-glus1:/exp/b5/gv1
Brick14: tpc-glus3:/ex

Re: [Gluster-users] [External] Re: Input/output error on FUSE log

2019-01-07 Thread Matt Waymack
Yep, first unmounted/remounted, then rebooted clients.  Stopped/started the 
volumes, and rebooted all nodes.

From: Davide Obbi 
Sent: Monday, January 7, 2019 12:47 PM
To: Matt Waymack 
Cc: Raghavendra Gowdappa ; gluster-users@gluster.org List 

Subject: Re: [External] Re: [Gluster-users] Input/output error on FUSE log

I guess you already tried unmounting, stopping/starting, and mounting?

On Mon, Jan 7, 2019 at 7:44 PM Matt Waymack 
mailto:mwaym...@nsgdv.com>> wrote:
Yes, all volumes use sharding.

From: Davide Obbi mailto:davide.o...@booking.com>>
Sent: Monday, January 7, 2019 12:43 PM
To: Matt Waymack mailto:mwaym...@nsgdv.com>>
Cc: Raghavendra Gowdappa mailto:rgowd...@redhat.com>>; 
gluster-users@gluster.org<mailto:gluster-users@gluster.org> List 
mailto:gluster-users@gluster.org>>
Subject: Re: [External] Re: [Gluster-users] Input/output error on FUSE log

are all the volumes being configured with sharding?

On Mon, Jan 7, 2019 at 5:35 PM Matt Waymack 
mailto:mwaym...@nsgdv.com>> wrote:
I think that I can rule out network as I have multiple volumes on the same 
nodes and not all volumes are affected.  Additionally, access via SMB using 
samba-vfs-glusterfs is not affected, even on the same volumes.   This is 
seemingly only affecting the FUSE clients.

From: Davide Obbi mailto:davide.o...@booking.com>>
Sent: Sunday, January 6, 2019 12:26 PM
To: Raghavendra Gowdappa mailto:rgowd...@redhat.com>>
Cc: Matt Waymack mailto:mwaym...@nsgdv.com>>; 
gluster-users@gluster.org<mailto:gluster-users@gluster.org> List 
mailto:gluster-users@gluster.org>>
Subject: Re: [External] Re: [Gluster-users] Input/output error on FUSE log

Hi,

I would start with some checks: "(Input/output error)" seems to be returned by 
the operating system; this happens, for instance, when trying to access a file 
system on a device that is not available, so I would check the network 
connectivity between the clients and servers, and server to server, during the 
reported time.

Regards
Davide

On Sun, Jan 6, 2019 at 3:32 AM Raghavendra Gowdappa 
mailto:rgowd...@redhat.com>> wrote:


On Sun, Jan 6, 2019 at 7:58 AM Raghavendra Gowdappa 
mailto:rgowd...@redhat.com>> wrote:


On Sun, Jan 6, 2019 at 4:19 AM Matt Waymack 
mailto:mwaym...@nsgdv.com>> wrote:

Hi all,



I'm having a problem writing to our volume.  When writing files larger than 
about 2GB, I get an intermittent issue where the write will fail and return 
Input/Output error.  This is also shown in the FUSE log of the client (this is 
affecting all clients).  A snip of a client log is below:

[2019-01-05 22:39:44.581371] W [fuse-bridge.c:2474:fuse_writev_cbk] 
0-glusterfs-fuse: 51040978: WRITE => -1 
gfid=82a0b5c4-7ef3-43c2-ad86-41e16673d7c2 fd=0x7f949839a368 (Input/output error)

[2019-01-05 22:39:44.598392] W [fuse-bridge.c:1441:fuse_err_cbk] 
0-glusterfs-fuse: 51040979: FLUSH() ERR => -1 (Input/output error)

[2019-01-05 22:39:47.420920] W [fuse-bridge.c:2474:fuse_writev_cbk] 
0-glusterfs-fuse: 51041266: WRITE => -1 
gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949809b7f8 (Input/output error)

[2019-01-05 22:39:47.433377] W [fuse-bridge.c:1441:fuse_err_cbk] 
0-glusterfs-fuse: 51041267: FLUSH() ERR => -1 (Input/output error)

[2019-01-05 22:39:50.441531] W [fuse-bridge.c:2474:fuse_writev_cbk] 
0-glusterfs-fuse: 51041548: WRITE => -1 
gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949839a368 (Input/output error)

[2019-01-05 22:39:50.451914] W [fuse-bridge.c:1441:fuse_err_cbk] 
0-glusterfs-fuse: 51041549: FLUSH() ERR => -1 (Input/output error)

The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search] 0-gv1-dht: 
no subvolume for hash (value) = 1311504267" repeated 1721 times between 
[2019-01-05 22:39:33.906241] and [2019-01-05 22:39:44.598371]

The message "E [MSGID: 101046] [dht-common.c:1502:dht_lookup_dir_cbk] 
0-gv1-dht: dict is null" repeated 1714 times between [2019-01-05 
22:39:33.925981] and [2019-01-05 22:39:50.451862]

The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search] 0-gv1-dht: 
no subvolume for hash (value) = 1137142622" repeated 1707 times between 
[2019-01-05 22:39:39.636552] and [2019-01-05 22:39:50.451895]

This looks to be a DHT issue. Some questions:
* Are all subvolumes of DHT up and client is connected to them? Particularly 
the subvolume which contains the file in question.
* Can you get all extended attributes of parent directory of the file from all 
bricks?
* set diagnostics.client-log-level to TRACE, capture these errors again and 
attach the client log file.

I spoke a bit too early. dht_writev doesn't search the hashed subvolume as it's 
already been looked up in lookup. So, these messages look to be a different 
issue - not a writev failure.


This is intermittent for most files, but eventually if a file is large enough 
it will not write.  The workflow is SFTP to the client which then writ

Re: [Gluster-users] [External] Re: Input/output error on FUSE log

2019-01-07 Thread Matt Waymack
Yes, all volumes use sharding.

From: Davide Obbi 
Sent: Monday, January 7, 2019 12:43 PM
To: Matt Waymack 
Cc: Raghavendra Gowdappa ; gluster-users@gluster.org List 

Subject: Re: [External] Re: [Gluster-users] Input/output error on FUSE log

are all the volumes being configured with sharding?

On Mon, Jan 7, 2019 at 5:35 PM Matt Waymack 
mailto:mwaym...@nsgdv.com>> wrote:
I think that I can rule out network as I have multiple volumes on the same 
nodes and not all volumes are affected.  Additionally, access via SMB using 
samba-vfs-glusterfs is not affected, even on the same volumes.   This is 
seemingly only affecting the FUSE clients.

From: Davide Obbi mailto:davide.o...@booking.com>>
Sent: Sunday, January 6, 2019 12:26 PM
To: Raghavendra Gowdappa mailto:rgowd...@redhat.com>>
Cc: Matt Waymack mailto:mwaym...@nsgdv.com>>; 
gluster-users@gluster.org<mailto:gluster-users@gluster.org> List 
mailto:gluster-users@gluster.org>>
Subject: Re: [External] Re: [Gluster-users] Input/output error on FUSE log

Hi,

I would start with some checks: "(Input/output error)" seems to be returned by 
the operating system; this happens, for instance, when trying to access a file 
system on a device that is not available, so I would check the network 
connectivity between the clients and servers, and server to server, during the 
reported time.

Regards
Davide

On Sun, Jan 6, 2019 at 3:32 AM Raghavendra Gowdappa 
mailto:rgowd...@redhat.com>> wrote:


On Sun, Jan 6, 2019 at 7:58 AM Raghavendra Gowdappa 
mailto:rgowd...@redhat.com>> wrote:


On Sun, Jan 6, 2019 at 4:19 AM Matt Waymack 
mailto:mwaym...@nsgdv.com>> wrote:

Hi all,



I'm having a problem writing to our volume.  When writing files larger than 
about 2GB, I get an intermittent issue where the write will fail and return 
Input/Output error.  This is also shown in the FUSE log of the client (this is 
affecting all clients).  A snip of a client log is below:

[2019-01-05 22:39:44.581371] W [fuse-bridge.c:2474:fuse_writev_cbk] 
0-glusterfs-fuse: 51040978: WRITE => -1 
gfid=82a0b5c4-7ef3-43c2-ad86-41e16673d7c2 fd=0x7f949839a368 (Input/output error)

[2019-01-05 22:39:44.598392] W [fuse-bridge.c:1441:fuse_err_cbk] 
0-glusterfs-fuse: 51040979: FLUSH() ERR => -1 (Input/output error)

[2019-01-05 22:39:47.420920] W [fuse-bridge.c:2474:fuse_writev_cbk] 
0-glusterfs-fuse: 51041266: WRITE => -1 
gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949809b7f8 (Input/output error)

[2019-01-05 22:39:47.433377] W [fuse-bridge.c:1441:fuse_err_cbk] 
0-glusterfs-fuse: 51041267: FLUSH() ERR => -1 (Input/output error)

[2019-01-05 22:39:50.441531] W [fuse-bridge.c:2474:fuse_writev_cbk] 
0-glusterfs-fuse: 51041548: WRITE => -1 
gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949839a368 (Input/output error)

[2019-01-05 22:39:50.451914] W [fuse-bridge.c:1441:fuse_err_cbk] 
0-glusterfs-fuse: 51041549: FLUSH() ERR => -1 (Input/output error)

The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search] 0-gv1-dht: 
no subvolume for hash (value) = 1311504267" repeated 1721 times between 
[2019-01-05 22:39:33.906241] and [2019-01-05 22:39:44.598371]

The message "E [MSGID: 101046] [dht-common.c:1502:dht_lookup_dir_cbk] 
0-gv1-dht: dict is null" repeated 1714 times between [2019-01-05 
22:39:33.925981] and [2019-01-05 22:39:50.451862]

The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search] 0-gv1-dht: 
no subvolume for hash (value) = 1137142622" repeated 1707 times between 
[2019-01-05 22:39:39.636552] and [2019-01-05 22:39:50.451895]

This looks to be a DHT issue. Some questions:
* Are all subvolumes of DHT up and client is connected to them? Particularly 
the subvolume which contains the file in question.
* Can you get all extended attributes of parent directory of the file from all 
bricks?
* set diagnostics.client-log-level to TRACE, capture these errors again and 
attach the client log file.

I spoke a bit too early. dht_writev doesn't search the hashed subvolume as it's 
already been looked up in lookup. So, these messages look to be a different 
issue - not a writev failure.


This is intermittent for most files, but eventually if a file is large enough 
it will not write.  The workflow is SFTP to the client, which then writes to the 
volume over FUSE.  When files get to a certain point, we can no longer write to 
them.  The file sizes are different as well, so it's not like they all get to 
the same size and just stop either.  I've ruled out a free space issue, our 
files at their largest are only a few hundred GB and we have tens of terabytes 
free on each brick.  We are also sharding at 1GB.

I'm not sure where to go from here as the error seems vague and I can only see 
it on the client log.  I'm not seeing these errors on the nodes themselves.  
This is also seen if I mount the volume via FUSE on any of the nodes as well 
and it is only ref

Re: [Gluster-users] [External] Re: Input/output error on FUSE log

2019-01-07 Thread Matt Waymack
I think that I can rule out network as I have multiple volumes on the same 
nodes and not all volumes are affected.  Additionally, access via SMB using 
samba-vfs-glusterfs is not affected, even on the same volumes.   This is 
seemingly only affecting the FUSE clients.

From: Davide Obbi 
Sent: Sunday, January 6, 2019 12:26 PM
To: Raghavendra Gowdappa 
Cc: Matt Waymack ; gluster-users@gluster.org List 

Subject: Re: [External] Re: [Gluster-users] Input/output error on FUSE log

Hi,

I would start with some checks: "(Input/output error)" seems to be returned by 
the operating system; this happens, for instance, when trying to access a file 
system on a device that is not available, so I would check the network 
connectivity between the clients and servers, and server to server, during the 
reported time.

Regards
Davide

On Sun, Jan 6, 2019 at 3:32 AM Raghavendra Gowdappa 
mailto:rgowd...@redhat.com>> wrote:


On Sun, Jan 6, 2019 at 7:58 AM Raghavendra Gowdappa 
mailto:rgowd...@redhat.com>> wrote:


On Sun, Jan 6, 2019 at 4:19 AM Matt Waymack 
mailto:mwaym...@nsgdv.com>> wrote:

Hi all,



I'm having a problem writing to our volume.  When writing files larger than 
about 2GB, I get an intermittent issue where the write will fail and return 
Input/Output error.  This is also shown in the FUSE log of the client (this is 
affecting all clients).  A snip of a client log is below:

[2019-01-05 22:39:44.581371] W [fuse-bridge.c:2474:fuse_writev_cbk] 
0-glusterfs-fuse: 51040978: WRITE => -1 
gfid=82a0b5c4-7ef3-43c2-ad86-41e16673d7c2 fd=0x7f949839a368 (Input/output error)

[2019-01-05 22:39:44.598392] W [fuse-bridge.c:1441:fuse_err_cbk] 
0-glusterfs-fuse: 51040979: FLUSH() ERR => -1 (Input/output error)

[2019-01-05 22:39:47.420920] W [fuse-bridge.c:2474:fuse_writev_cbk] 
0-glusterfs-fuse: 51041266: WRITE => -1 
gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949809b7f8 (Input/output error)

[2019-01-05 22:39:47.433377] W [fuse-bridge.c:1441:fuse_err_cbk] 
0-glusterfs-fuse: 51041267: FLUSH() ERR => -1 (Input/output error)

[2019-01-05 22:39:50.441531] W [fuse-bridge.c:2474:fuse_writev_cbk] 
0-glusterfs-fuse: 51041548: WRITE => -1 
gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949839a368 (Input/output error)

[2019-01-05 22:39:50.451914] W [fuse-bridge.c:1441:fuse_err_cbk] 
0-glusterfs-fuse: 51041549: FLUSH() ERR => -1 (Input/output error)

The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search] 0-gv1-dht: 
no subvolume for hash (value) = 1311504267" repeated 1721 times between 
[2019-01-05 22:39:33.906241] and [2019-01-05 22:39:44.598371]

The message "E [MSGID: 101046] [dht-common.c:1502:dht_lookup_dir_cbk] 
0-gv1-dht: dict is null" repeated 1714 times between [2019-01-05 
22:39:33.925981] and [2019-01-05 22:39:50.451862]

The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search] 0-gv1-dht: 
no subvolume for hash (value) = 1137142622" repeated 1707 times between 
[2019-01-05 22:39:39.636552] and [2019-01-05 22:39:50.451895]

This looks to be a DHT issue. Some questions:
* Are all subvolumes of DHT up and client is connected to them? Particularly 
the subvolume which contains the file in question.
* Can you get all extended attributes of parent directory of the file from all 
bricks?
* set diagnostics.client-log-level to TRACE, capture these errors again and 
attach the client log file.

I spoke a bit too early. dht_writev doesn't search the hashed subvolume as it's 
already been looked up in lookup. So, these messages look to be a different 
issue - not a writev failure.


This is intermittent for most files, but eventually if a file is large enough 
it will not write.  The workflow is SFTP to the client, which then writes to the 
volume over FUSE.  When files get to a certain point, we can no longer write to 
them.  The file sizes are different as well, so it's not like they all get to 
the same size and just stop either.  I've ruled out a free space issue, our 
files at their largest are only a few hundred GB and we have tens of terabytes 
free on each brick.  We are also sharding at 1GB.

I'm not sure where to go from here as the error seems vague and I can only see 
it on the client log.  I'm not seeing these errors on the nodes themselves.  
This is also seen if I mount the volume via FUSE on any of the nodes as well 
and it is only reflected in the FUSE log.

Here is the volume info:
Volume Name: gv1
Type: Distributed-Replicate
Volume ID: 1472cc78-e2a0-4c3f-9571-dab840239b3c
Status: Started
Snapshot Count: 0
Number of Bricks: 8 x (2 + 1) = 24
Transport-type: tcp
Bricks:
Brick1: tpc-glus4:/exp/b1/gv1
Brick2: tpc-glus2:/exp/b1/gv1
Brick3: tpc-arbiter1:/exp/b1/gv1 (arbiter)
Brick4: tpc-glus2:/exp/b2/gv1
Brick5: tpc-glus4:/exp/b2/gv1
Brick6: tpc-arbiter1:/exp/b2/gv1 (arbiter)
Brick7: tpc-glus4:/exp/b3/gv1
Brick8: tpc-glus2:/exp/b3/gv1
Brick9: tpc-arbiter1:/exp/b3/gv1 (arbiter)
Brick10: tpc-glus4:/exp

[Gluster-users] Input/output error on FUSE log

2019-01-05 Thread Matt Waymack
Hi all,


I'm having a problem writing to our volume.  When writing files larger than 
about 2GB, I get an intermittent issue where the write will fail and return 
Input/Output error.  This is also shown in the FUSE log of the client (this is 
affecting all clients).  A snip of a client log is below:

[2019-01-05 22:39:44.581371] W [fuse-bridge.c:2474:fuse_writev_cbk] 
0-glusterfs-fuse: 51040978: WRITE => -1 
gfid=82a0b5c4-7ef3-43c2-ad86-41e16673d7c2 fd=0x7f949839a368 (Input/output error)

[2019-01-05 22:39:44.598392] W [fuse-bridge.c:1441:fuse_err_cbk] 
0-glusterfs-fuse: 51040979: FLUSH() ERR => -1 (Input/output error)

[2019-01-05 22:39:47.420920] W [fuse-bridge.c:2474:fuse_writev_cbk] 
0-glusterfs-fuse: 51041266: WRITE => -1 
gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949809b7f8 (Input/output error)

[2019-01-05 22:39:47.433377] W [fuse-bridge.c:1441:fuse_err_cbk] 
0-glusterfs-fuse: 51041267: FLUSH() ERR => -1 (Input/output error)

[2019-01-05 22:39:50.441531] W [fuse-bridge.c:2474:fuse_writev_cbk] 
0-glusterfs-fuse: 51041548: WRITE => -1 
gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949839a368 (Input/output error)

[2019-01-05 22:39:50.451914] W [fuse-bridge.c:1441:fuse_err_cbk] 
0-glusterfs-fuse: 51041549: FLUSH() ERR => -1 (Input/output error)

The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search] 0-gv1-dht: 
no subvolume for hash (value) = 1311504267" repeated 1721 times between 
[2019-01-05 22:39:33.906241] and [2019-01-05 22:39:44.598371]

The message "E [MSGID: 101046] [dht-common.c:1502:dht_lookup_dir_cbk] 
0-gv1-dht: dict is null" repeated 1714 times between [2019-01-05 
22:39:33.925981] and [2019-01-05 22:39:50.451862]

The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search] 0-gv1-dht: 
no subvolume for hash (value) = 1137142622" repeated 1707 times between 
[2019-01-05 22:39:39.636552] and [2019-01-05 22:39:50.451895]

This is intermittent for most files, but eventually if a file is large enough 
it will not write.  The workflow is SFTP to the client, which then writes to the 
volume over FUSE.  When files get to a certain point, we can no longer write to 
them.  The file sizes are different as well, so it's not like they all get to 
the same size and just stop either.  I've ruled out a free space issue, our 
files at their largest are only a few hundred GB and we have tens of terabytes 
free on each brick.  We are also sharding at 1GB.

I'm not sure where to go from here as the error seems vague and I can only see 
it on the client log.  I'm not seeing these errors on the nodes themselves.  
This is also seen if I mount the volume via FUSE on any of the nodes as well 
and it is only reflected in the FUSE log.

Here is the volume info:
Volume Name: gv1
Type: Distributed-Replicate
Volume ID: 1472cc78-e2a0-4c3f-9571-dab840239b3c
Status: Started
Snapshot Count: 0
Number of Bricks: 8 x (2 + 1) = 24
Transport-type: tcp
Bricks:
Brick1: tpc-glus4:/exp/b1/gv1
Brick2: tpc-glus2:/exp/b1/gv1
Brick3: tpc-arbiter1:/exp/b1/gv1 (arbiter)
Brick4: tpc-glus2:/exp/b2/gv1
Brick5: tpc-glus4:/exp/b2/gv1
Brick6: tpc-arbiter1:/exp/b2/gv1 (arbiter)
Brick7: tpc-glus4:/exp/b3/gv1
Brick8: tpc-glus2:/exp/b3/gv1
Brick9: tpc-arbiter1:/exp/b3/gv1 (arbiter)
Brick10: tpc-glus4:/exp/b4/gv1
Brick11: tpc-glus2:/exp/b4/gv1
Brick12: tpc-arbiter1:/exp/b4/gv1 (arbiter)
Brick13: tpc-glus1:/exp/b5/gv1
Brick14: tpc-glus3:/exp/b5/gv1
Brick15: tpc-arbiter2:/exp/b5/gv1 (arbiter)
Brick16: tpc-glus1:/exp/b6/gv1
Brick17: tpc-glus3:/exp/b6/gv1
Brick18: tpc-arbiter2:/exp/b6/gv1 (arbiter)
Brick19: tpc-glus1:/exp/b7/gv1
Brick20: tpc-glus3:/exp/b7/gv1
Brick21: tpc-arbiter2:/exp/b7/gv1 (arbiter)
Brick22: tpc-glus1:/exp/b8/gv1
Brick23: tpc-glus3:/exp/b8/gv1
Brick24: tpc-arbiter2:/exp/b8/gv1 (arbiter)
Options Reconfigured:
performance.cache-samba-metadata: on
performance.cache-invalidation: off
features.shard-block-size: 1000MB
features.shard: on
transport.address-family: inet
nfs.disable: on
cluster.lookup-optimize: on

I'm a bit stumped on this, any help is appreciated.  Thank you!
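
As a quick sanity check against the symptoms above, the shard size and per-brick 
free space can be confirmed directly; a rough sketch (run the df commands on 
each node that hosts bricks):

    # confirm the effective shard block size on the volume
    gluster volume get gv1 features.shard-block-size

    # check free space and free inodes on each brick filesystem
    df -h /exp/b*/gv1
    df -i /exp/b*/gv1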

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Unable to create new files or folders using samba and vfs_glusterfs

2018-12-27 Thread Matt Waymack
OK, I'm back from the holiday and updated using the following packages:
libsmbclient-4.8.3-4.el7.0.1.x86_64.rpm  
libwbclient-4.8.3-4.el7.0.1.x86_64.rpm   
samba-4.8.3-4.el7.0.1.x86_64.rpm 
samba-client-4.8.3-4.el7.0.1.x86_64.rpm   
samba-client-libs-4.8.3-4.el7.0.1.x86_64.rpm  
samba-common-4.8.3-4.el7.0.1.noarch.rpm   
samba-common-libs-4.8.3-4.el7.0.1.x86_64.rpm
samba-common-tools-4.8.3-4.el7.0.1.x86_64.rpm
samba-libs-4.8.3-4.el7.0.1.x86_64.rpm
samba-vfs-glusterfs-4.8.3-4.el7.0.1.x86_64.rpm

First impressions are good!  We're able to create files/folders.  I'll keep you 
updated with stability.

Thank you!
-Original Message-
From: Diego Remolina  
Sent: Thursday, December 20, 2018 1:36 PM
To: Matt Waymack 
Cc: gluster-users@gluster.org List 
Subject: Re: [Gluster-users] Unable to create new files or folders using samba 
and vfs_glusterfs

Hi Matt,

The update is slightly different, has the .1 at the end:

Fast-track -> samba-4.8.3-4.el7.0.1.x86_64.rpm

vs general -> samba-4.8.3-4.el7.x86_64

I think these are built, but not pushed to the fasttrack repo until they get 
feedback that the packages are good. So you may need to use wget to download 
them and update your packages with these for the test.

Diego

On Thu, Dec 20, 2018 at 1:06 PM Matt Waymack  wrote:
>
> Hi all,
>
>
>
> I’m looking to update Samba from fasttrack, but I still only see 4.8.3 and yum 
> is not wanting to update.  The test build is also showing 4.8.3.
>
>
>
> Thank you!
>
>
>
>
>
> From: gluster-users-boun...@gluster.org 
>  On Behalf Of Matt Waymack
> Sent: Sunday, December 16, 2018 1:55 PM
> To: Diego Remolina 
> Cc: gluster-users@gluster.org List 
> Subject: Re: [Gluster-users] Unable to create new files or folders 
> using samba and vfs_glusterfs
>
>
>
> Hi all, sorry for the delayed response.
>
>
>
> I can test this out and will report back.  It may be as late as Tuesday 
> before I can test the build.
>
>
>
> Thank you!
>
>
>
> On Dec 15, 2018 7:46 AM, Diego Remolina  wrote:
>
> Matt,
>
>
>
> Can you test the updated samba packages that the CentOS team has built for 
> FasTrack?
>
>
>
> A NOTE has been added to this issue.
>
> --
>  (0033351) pgreco (developer) - 2018-12-15 13:43
>  https://bugs.centos.org/view.php?id=15586#c33351
> --
> @dijur...@gmail.com
> Here's the link for the test build
> https://buildlogs.centos.org/c7-fasttrack.x86_64/samba/20181214164659/
> 4.8.3-4.el7.0.1.x86_64/
> .
> Please let us know how it goes. Thanks for testing!
> Pablo.
> --
>
> Diego
>
>
>
>
> On Fri, Dec 14, 2018 at 12:52 AM Anoop C S  wrote:
> >
> > On Thu, 2018-12-13 at 15:31 +, Matt Waymack wrote:
> > > Hi all,
> > >
> > > I’m having an issue on Windows clients accessing shares via smb 
> > > when using vfs_glusterfs.  They are unable to create any file or 
> > > folders at the root of the share and get the error “The file is 
> > > too large for the destination file system.”  When I change from 
> > > vfs_glusterfs to just using a filesystem path to the same 
> > > location, it works fine (except for the performance hit).  All my 
> > > searches have led to bug 1619108, and that seems to be the 
> > > symptom, but there doesn’t appear to be any clear resolution.
> >
> > You figured out the right bug and following is the upstream Samba bug:
> >
> > https://bugzilla.samba.org/show_bug.cgi?id=13585
> >
> > Unfortunately it is only available with v4.8.6 and higher. If 
> > required I can patch it up and provide a build.
> >
> > > I’m on the latest version of samba available on CentOS 7 (4.8.3) 
> > > and I’m on the latest available glusterfs 4.1 (4.1.6).  Is there 
> > > something simple I’m missing to get this going?
> > >
> > > Thank you!
> > > ___
> > > Gluster-users mailing list
> > > Gluster-users@gluster.org
> > > https://lists.gluster.org/mailman/listinfo/gluster-users
> >
> > ___
> > Gluster-users mailing list
> > Gluster-users@gluster.org
> > https://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Unable to create new files or folders using samba and vfs_glusterfs

2018-12-20 Thread Matt Waymack
Hi all,

I'm looking to update Samba from fasttrack, but I still only see 4.8.3 and yum 
is not wanting to update.  The test build is also showing 4.8.3.

Thank you!


From: gluster-users-boun...@gluster.org  On 
Behalf Of Matt Waymack
Sent: Sunday, December 16, 2018 1:55 PM
To: Diego Remolina 
Cc: gluster-users@gluster.org List 
Subject: Re: [Gluster-users] Unable to create new files or folders using samba 
and vfs_glusterfs

Hi all, sorry for the delayed response.

I can test this out and will report back.  It may be as late as Tuesday before 
I can test the build.

Thank you!

On Dec 15, 2018 7:46 AM, Diego Remolina 
mailto:dijur...@gmail.com>> wrote:
Matt,

Can you test the updated samba packages that the CentOS team has built for 
FasTrack?

A NOTE has been added to this issue.

--
 (0033351) pgreco (developer) - 2018-12-15 13:43
 https://bugs.centos.org/view.php?id=15586#c33351
--
@dijur...@gmail.com<mailto:dijur...@gmail.com>
Here's the link for the test build
https://buildlogs.centos.org/c7-fasttrack.x86_64/samba/20181214164659/4.8.3-4.el7.0.1.x86_64/
.
Please let us know how it goes. Thanks for testing!
Pablo.
--
Diego



On Fri, Dec 14, 2018 at 12:52 AM Anoop C S 
mailto:anoo...@cryptolab.net>> wrote:
>
> On Thu, 2018-12-13 at 15:31 +, Matt Waymack wrote:
> > Hi all,
> >
> > I'm having an issue on Windows clients accessing shares via smb when
> > using vfs_glusterfs.  They are unable to create any file or folders
> > at the root of the share and get the error "The file is too large for
> > the destination file system."  When I change from vfs_glusterfs to
> > just using a filesystem path to the same location, it works fine
> > (except for the performance hit).  All my searches have led to bug
> > 1619108, and that seems to be the symptom, but there doesn't appear
> > to be any clear resolution.
>
> You figured out the right bug and following is the upstream Samba bug:
>
> https://bugzilla.samba.org/show_bug.cgi?id=13585
>
> Unfortunately it is only available with v4.8.6 and higher. If required
> I can patch it up and provide a build.
>
> > I'm on the latest version of samba available on CentOS 7 (4.8.3) and
> > I'm on the latest available glusterfs 4.1 (4.1.6).  Is there
> > something simple I'm missing to get this going?
> >
> > Thank you!
> > ___
> > Gluster-users mailing list
> > Gluster-users@gluster.org<mailto:Gluster-users@gluster.org>
> > https://lists.gluster.org/mailman/listinfo/gluster-users
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org<mailto:Gluster-users@gluster.org>
> https://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Unable to create new files or folders using samba and vfs_glusterfs

2018-12-16 Thread Matt Waymack
Hi all, sorry for the delayed response.

I can test this out and will report back.  It may be as late as Tuesday before 
I can test the build.

Thank you!

On Dec 15, 2018 7:46 AM, Diego Remolina  wrote:
Matt,

Can you test the updated samba packages that the CentOS team has built for 
FasTrack?

A NOTE has been added to this issue.

--
 (0033351) pgreco (developer) - 2018-12-15 13:43
 https://bugs.centos.org/view.php?id=15586#c33351
--
@dijur...@gmail.com<mailto:dijur...@gmail.com>
Here's the link for the test build
https://buildlogs.centos.org/c7-fasttrack.x86_64/samba/20181214164659/4.8.3-4.el7.0.1.x86_64/
.
Please let us know how it goes. Thanks for testing!
Pablo.
--

Diego



On Fri, Dec 14, 2018 at 12:52 AM Anoop C S 
mailto:anoo...@cryptolab.net>> wrote:
>
> On Thu, 2018-12-13 at 15:31 +, Matt Waymack wrote:
> > Hi all,
> >
> > I'm having an issue on Windows clients accessing shares via smb when
> > using vfs_glusterfs.  They are unable to create any file or folders
> > at the root of the share and get the error "The file is too large for
> > the destination file system."  When I change from vfs_glusterfs to
> > just using a filesystem path to the same location, it works fine
> > (except for the performance hit).  All my searches have led to bug
> > 1619108, and that seems to be the symptom, but there doesn't appear
> > to be any clear resolution.
>
> You figured out the right bug and following is the upstream Samba bug:
>
> https://bugzilla.samba.org/show_bug.cgi?id=13585
>
> Unfortunately it is only available with v4.8.6 and higher. If required
> I can patch it up and provide a build.
>
> > I'm on the latest version of samba available on CentOS 7 (4.8.3) and
> > I'm on the latest available glusterfs 4.1 (4.1.6).  Is there
> > something simple I'm missing to get this going?
> >
> > Thank you!
> > ___
> > Gluster-users mailing list
> > Gluster-users@gluster.org<mailto:Gluster-users@gluster.org>
> > https://lists.gluster.org/mailman/listinfo/gluster-users
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org<mailto:Gluster-users@gluster.org>
> https://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Unable to create new files or folders using samba and vfs_glusterfs

2018-12-13 Thread Matt Waymack
Hi all,

I'm having an issue on Windows clients accessing shares via smb when using 
vfs_glusterfs.  They are unable to create any file or folders at the root of 
the share and get the error "The file is too large for the destination file 
system."  When I change from vfs_glusterfs to just using a filesystem path to 
the same location, it works fine (except for the performance hit).  All my 
searches have led to bug 1619108, and that seems to be the symptom, but there 
doesn't appear to be any clear resolution.  I'm on the latest version of samba 
available on CentOS 7 (4.8.3) and I'm on the latest available glusterfs 4.1 
(4.1.6).  Is there something simple I'm missing to get this going?

Thank you!
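
For context, a vfs_glusterfs share definition typically looks roughly like the 
sketch below (the share and volume names are placeholders, not taken from this 
report):

    [gvshare]
        path = /
        vfs objects = glusterfs
        glusterfs:volume = gv0
        glusterfs:volfile_server = localhost
        glusterfs:logfile = /var/log/samba/glusterfs-gv0.log
        kernel share modes = no
        read only = no

Dropping "vfs objects = glusterfs" and pointing path at a FUSE mount of the same 
volume is the fallback described above, at the cost of the extra FUSE hop.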
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] How to make sure self-heal backlog is empty ?

2017-12-19 Thread Matt Waymack
Mine also has a list of files that seemingly never heal.  They are usually 
isolated on my arbiter bricks, but not always.  I would also like to find an 
answer for this behavior.

-Original Message-
From: gluster-users-boun...@gluster.org 
[mailto:gluster-users-boun...@gluster.org] On Behalf Of Hoggins!
Sent: Tuesday, December 19, 2017 12:26 PM
To: gluster-users 
Subject: [Gluster-users] How to make sure self-heal backlog is empty ?

Hello list,

I'm not sure what to look for here, and not sure if what I'm seeing is the actual 
"backlog" (which we need to make sure is empty while performing a rolling 
upgrade before going to the next node). How can I tell, while reading this, if 
it's okay to reboot / upgrade my next node in the pool?
Here is what I do for checking:

for i in `gluster volume list`; do gluster volume heal $i info; done

And here is what I get :

Brick ngluster-1.network.hoggins.fr:/export/brick/clem
Status: Connected
Number of entries: 0

Brick ngluster-2.network.hoggins.fr:/export/brick/clem
Status: Connected
Number of entries: 0

Brick ngluster-3.network.hoggins.fr:/export/brick/clem
Status: Connected
Number of entries: 0

Brick ngluster-1.network.hoggins.fr:/export/brick/mailer
Status: Connected
Number of entries: 0

Brick ngluster-2.network.hoggins.fr:/export/brick/mailer
Status: Connected
Number of entries: 0

Brick ngluster-3.network.hoggins.fr:/export/brick/mailer

Status: Connected
Number of entries: 1

Brick ngluster-1.network.hoggins.fr:/export/brick/rom
Status: Connected
Number of entries: 0

Brick ngluster-2.network.hoggins.fr:/export/brick/rom
Status: Connected
Number of entries: 0

Brick ngluster-3.network.hoggins.fr:/export/brick/rom

Status: Connected
Number of entries: 1

Brick ngluster-1.network.hoggins.fr:/export/brick/thedude
Status: Connected
Number of entries: 0

Brick ngluster-2.network.hoggins.fr:/export/brick/thedude

Status: Connected
Number of entries: 1

Brick ngluster-3.network.hoggins.fr:/export/brick/thedude
Status: Connected
Number of entries: 0

Brick ngluster-1.network.hoggins.fr:/export/brick/web
Status: Connected
Number of entries: 0

Brick ngluster-2.network.hoggins.fr:/export/brick/web



Status: Connected
Number of entries: 3

Brick ngluster-3.network.hoggins.fr:/export/brick/web











Status: Connected
Number of entries: 11


Should I be worrying with this never ending ?

    Thank you,

        Hoggins!

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Production Volume will not start

2017-12-18 Thread Matt Waymack
Hi, thank you for the reply.  Ultimately the volume did eventually start after 
about 1.5 hours from the volume start command.  Could it have something to do 
with the number of files on the volume?
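
As a rough sketch, while a start request appears hung like this one did, the 
state of the daemons and bricks can be watched from any node (volume name gv0 as 
in the logs below; nothing here is specific to this cluster):

    # is glusterd itself healthy, and are all peers connected?
    systemctl status glusterd
    gluster peer status

    # which bricks have come online so far?
    gluster volume status gv0

    # follow the management log while the bricks start
    tail -f /var/log/glusterfs/glusterd.log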

From: Atin Mukherjee [mailto:amukh...@redhat.com]
Sent: Monday, December 18, 2017 1:26 AM
To: Matt Waymack 
Cc: gluster-users 
Subject: Re: [Gluster-users] Production Volume will not start



On Sat, Dec 16, 2017 at 12:45 AM, Matt Waymack 
mailto:mwaym...@nsgdv.com>> wrote:

Hi all,



I have an issue where our volume will not start from any node.  When attempting 
to start the volume it will eventually return:

Error: Request timed out



For some time after that, the volume is locked and we either have to wait or 
restart Gluster services.  In the glusterd.log, it shows the following:



[2017-12-15 18:00:12.423478] I [glusterd-utils.c:5926:glusterd_brick_start] 
0-management: starting a fresh brick process for brick /exp/b1/gv0

[2017-12-15 18:03:12.673885] I 
[glusterd-locks.c:729:gd_mgmt_v3_unlock_timer_cbk] 0-management: In 
gd_mgmt_v3_unlock_timer_cbk

[2017-12-15 18:06:34.304868] I [MSGID: 106499] 
[glusterd-handler.c:4303:__glusterd_handle_status_volume] 0-management: 
Received status volume req for volume gv0

[2017-12-15 18:06:34.306603] E [MSGID: 106301] 
[glusterd-syncop.c:1353:gd_stage_op_phase] 0-management: Staging of operation 
'Volume Status' failed on localhost : Volume gv0 is not started

[2017-12-15 18:11:39.412700] I [glusterd-utils.c:5926:glusterd_brick_start] 
0-management: starting a fresh brick process for brick /exp/b2/gv0

[2017-12-15 18:11:42.405966] I [MSGID: 106143] 
[glusterd-pmap.c:280:pmap_registry_bind] 0-pmap: adding brick /exp/b2/gv0 on 
port 49153

[2017-12-15 18:11:42.406415] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 
0-management: setting frame-timeout to 600

[2017-12-15 18:11:42.406669] I [glusterd-utils.c:5926:glusterd_brick_start] 
0-management: starting a fresh brick process for brick /exp/b3/gv0

[2017-12-15 18:14:39.737192] I 
[glusterd-locks.c:729:gd_mgmt_v3_unlock_timer_cbk] 0-management: In 
gd_mgmt_v3_unlock_timer_cbk

[2017-12-15 18:35:20.856849] I [MSGID: 106143] 
[glusterd-pmap.c:280:pmap_registry_bind] 0-pmap: adding brick /exp/b1/gv0 on 
port 49152

[2017-12-15 18:35:20.857508] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 
0-management: setting frame-timeout to 600

[2017-12-15 18:35:20.858277] I [glusterd-utils.c:5926:glusterd_brick_start] 
0-management: starting a fresh brick process for brick /exp/b4/gv0

[2017-12-15 18:46:07.953995] I [MSGID: 106143] 
[glusterd-pmap.c:280:pmap_registry_bind] 0-pmap: adding brick /exp/b3/gv0 on 
port 49154

[2017-12-15 18:46:07.954432] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 
0-management: setting frame-timeout to 600

[2017-12-15 18:46:07.971355] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 
0-snapd: setting frame-timeout to 600

[2017-12-15 18:46:07.989392] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 
0-nfs: setting frame-timeout to 600

[2017-12-15 18:46:07.989543] I [MSGID: 106132] 
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped

[2017-12-15 18:46:07.989562] I [MSGID: 106568] 
[glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: nfs service is stopped

[2017-12-15 18:46:07.989575] I [MSGID: 106600] 
[glusterd-nfs-svc.c:82:glusterd_nfssvc_manager] 0-management: nfs/server.so 
xlator is not installed

[2017-12-15 18:46:07.989601] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 
0-glustershd: setting frame-timeout to 600

[2017-12-15 18:46:08.003011] I [MSGID: 106132] 
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: glustershd already 
stopped

[2017-12-15 18:46:08.003039] I [MSGID: 106568] 
[glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: glustershd service is 
stopped

[2017-12-15 18:46:08.003079] I [MSGID: 106567] 
[glusterd-svc-mgmt.c:197:glusterd_svc_start] 0-management: Starting glustershd 
service

[2017-12-15 18:46:09.005173] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 
0-quotad: setting frame-timeout to 600

[2017-12-15 18:46:09.005569] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 
0-bitd: setting frame-timeout to 600

[2017-12-15 18:46:09.005673] I [MSGID: 106132] 
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped

[2017-12-15 18:46:09.005689] I [MSGID: 106568] 
[glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: bitd service is 
stopped

[2017-12-15 18:46:09.005712] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 
0-scrub: setting frame-timeout to 600

[2017-12-15 18:46:09.005892] I [MSGID: 106132] 
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already stopped

[2017-12-15 18:46:09.005912] I [MSGID: 106568] 
[glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: scrub service is 
stopped

[2017-12-15 18:46:09.026559] I [socket.c:3672:socket_submit_reply] 
0-socket.management: not connected (priv->connected = -1)

[2017-12-15 18:46:09.026568] E [rpcsvc.c:1364:rpcsvc_sub

[Gluster-users] Production Volume will not start

2017-12-15 Thread Matt Waymack
Hi all,

I have an issue where our volume will not start from any node.  When attempting 
to start the volume it will eventually return:

Error: Request timed out

For some time after that, the volume is locked and we either have to wait or 
restart Gluster services.  In the glusterd.log, it shows the following:

[2017-12-15 18:00:12.423478] I [glusterd-utils.c:5926:glusterd_brick_start] 
0-management: starting a fresh brick process for brick /exp/b1/gv0
[2017-12-15 18:03:12.673885] I 
[glusterd-locks.c:729:gd_mgmt_v3_unlock_timer_cbk] 0-management: In 
gd_mgmt_v3_unlock_timer_cbk
[2017-12-15 18:06:34.304868] I [MSGID: 106499] 
[glusterd-handler.c:4303:__glusterd_handle_status_volume] 0-management: 
Received status volume req for volume gv0
[2017-12-15 18:06:34.306603] E [MSGID: 106301] 
[glusterd-syncop.c:1353:gd_stage_op_phase] 0-management: Staging of operation 
'Volume Status' failed on localhost : Volume gv0 is not started
[2017-12-15 18:11:39.412700] I [glusterd-utils.c:5926:glusterd_brick_start] 
0-management: starting a fresh brick process for brick /exp/b2/gv0
[2017-12-15 18:11:42.405966] I [MSGID: 106143] 
[glusterd-pmap.c:280:pmap_registry_bind] 0-pmap: adding brick /exp/b2/gv0 on 
port 49153
[2017-12-15 18:11:42.406415] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 
0-management: setting frame-timeout to 600
[2017-12-15 18:11:42.406669] I [glusterd-utils.c:5926:glusterd_brick_start] 
0-management: starting a fresh brick process for brick /exp/b3/gv0
[2017-12-15 18:14:39.737192] I 
[glusterd-locks.c:729:gd_mgmt_v3_unlock_timer_cbk] 0-management: In 
gd_mgmt_v3_unlock_timer_cbk
[2017-12-15 18:35:20.856849] I [MSGID: 106143] 
[glusterd-pmap.c:280:pmap_registry_bind] 0-pmap: adding brick /exp/b1/gv0 on 
port 49152
[2017-12-15 18:35:20.857508] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 
0-management: setting frame-timeout to 600
[2017-12-15 18:35:20.858277] I [glusterd-utils.c:5926:glusterd_brick_start] 
0-management: starting a fresh brick process for brick /exp/b4/gv0
[2017-12-15 18:46:07.953995] I [MSGID: 106143] 
[glusterd-pmap.c:280:pmap_registry_bind] 0-pmap: adding brick /exp/b3/gv0 on 
port 49154
[2017-12-15 18:46:07.954432] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 
0-management: setting frame-timeout to 600
[2017-12-15 18:46:07.971355] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 
0-snapd: setting frame-timeout to 600
[2017-12-15 18:46:07.989392] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 
0-nfs: setting frame-timeout to 600
[2017-12-15 18:46:07.989543] I [MSGID: 106132] 
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped
[2017-12-15 18:46:07.989562] I [MSGID: 106568] 
[glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: nfs service is stopped
[2017-12-15 18:46:07.989575] I [MSGID: 106600] 
[glusterd-nfs-svc.c:82:glusterd_nfssvc_manager] 0-management: nfs/server.so 
xlator is not installed
[2017-12-15 18:46:07.989601] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 
0-glustershd: setting frame-timeout to 600
[2017-12-15 18:46:08.003011] I [MSGID: 106132] 
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: glustershd already 
stopped
[2017-12-15 18:46:08.003039] I [MSGID: 106568] 
[glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: glustershd service is 
stopped
[2017-12-15 18:46:08.003079] I [MSGID: 106567] 
[glusterd-svc-mgmt.c:197:glusterd_svc_start] 0-management: Starting glustershd 
service
[2017-12-15 18:46:09.005173] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 
0-quotad: setting frame-timeout to 600
[2017-12-15 18:46:09.005569] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 
0-bitd: setting frame-timeout to 600
[2017-12-15 18:46:09.005673] I [MSGID: 106132] 
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped
[2017-12-15 18:46:09.005689] I [MSGID: 106568] 
[glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: bitd service is 
stopped
[2017-12-15 18:46:09.005712] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 
0-scrub: setting frame-timeout to 600
[2017-12-15 18:46:09.005892] I [MSGID: 106132] 
[glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already stopped
[2017-12-15 18:46:09.005912] I [MSGID: 106568] 
[glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: scrub service is 
stopped
[2017-12-15 18:46:09.026559] I [socket.c:3672:socket_submit_reply] 
0-socket.management: not connected (priv->connected = -1)
[2017-12-15 18:46:09.026568] E [rpcsvc.c:1364:rpcsvc_submit_generic] 
0-rpc-service: failed to submit message (XID: 0x2, Program: GlusterD svc cli, 
ProgVers: 2, Proc: 27) to rpc-transport (socket.management)
[2017-12-15 18:46:09.026582] E [MSGID: 106430] 
[glusterd-utils.c:568:glusterd_submit_reply] 0-glusterd: Reply submission failed
[2017-12-15 18:56:17.962251] E [rpc-clnt.c:185:call_bail] 0-management: bailing 
out frame type(glusterd mgmt v3) op(--(4)) xid = 0x14 sent = 2017-12-15 
18:46:09.005976. timeout = 600 for 10.17.100.208:24007
[2017-12-15 18:56:17.962324] E [MSGID: 106116

Re: [Gluster-users] gfid entries in volume heal info that do not heal

2017-10-23 Thread Matt Waymack
In my case I was able to delete the hard links in the .glusterfs folders of the 
bricks and it seems to have done the trick, thanks!

From: Karthik Subrahmanya [mailto:ksubr...@redhat.com]
Sent: Monday, October 23, 2017 1:52 AM
To: Jim Kinney ; Matt Waymack 
Cc: gluster-users 
Subject: Re: [Gluster-users] gfid entries in volume heal info that do not heal

Hi Jim & Matt,
Can you also check for the link count in the stat output of those hardlink 
entries in the .glusterfs folder on the bricks.
If the link count is 1 on all the bricks for those entries, then they are 
orphaned entries and you can delete those hardlinks.
To be on the safer side have a backup before deleting any of the entries.
Regards,
Karthik
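
A minimal sketch of the link-count check described above (the brick path and 
gfid are examples from elsewhere in this thread; gfids.txt is a hypothetical 
file holding one bare gfid per line, as reported by heal info):

    # %h prints the hard link count, %n the file name; run this on every brick
    stat -c '%h %n' /exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2

    # check a whole list of gfids in one go
    while read -r gfid; do
        stat -c '%h %n' "/exp/b1/gv0/.glusterfs/${gfid:0:2}/${gfid:2:2}/${gfid}"
    done < gfids.txt

Entries that report a link count of 1 on every brick are the orphan candidates; 
as noted above, take a backup before deleting any of them.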

On Fri, Oct 20, 2017 at 3:18 AM, Jim Kinney 
mailto:jim.kin...@gmail.com>> wrote:
I've been following this particular thread as I have a similar issue (RAID6 
array failed out with 3 dead drives at once while a 12 TB load was being copied 
into one mounted space - what a mess)

I have >700K GFID entries that have no path data:
Example:
getfattr -d -e hex -m . .glusterfs/00/00/a5ef-5af7-401b-84b5-ff2a51c10421
# file: .glusterfs/00/00/a5ef-5af7-401b-84b5-ff2a51c10421
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.bit-rot.version=0x020059b1b316000270e7
trusted.gfid=0xa5ef5af7401b84b5ff2a51c10421

[root@bmidata1 brick]# getfattr -d -n 
trusted.glusterfs.pathinfo -e hex -m . 
.glusterfs/00/00/a5ef-5af7-401b-84b5-ff2a51c10421
.glusterfs/00/00/a5ef-5af7-401b-84b5-ff2a51c10421: 
trusted.glusterfs.pathinfo: No such attribute

I had to totally rebuild the dead RAID array and did a copy from the live one 
before activating gluster on the rebuilt system. I accidentally copied over the 
.glusterfs folder from the working side
(replica 2 only for now - adding arbiter node as soon as I can get this one 
cleaned up).

I've run the methods from 
"http://docs.gluster.org/en/latest/Troubleshooting/gfid-to-path/" with no 
results using random GFIDs. A full systematic run using the script from method 3 
crashes with a "too many nested links" error (or something similar).
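
For reference, the aux-gfid-mount approach from that troubleshooting page 
resolves a gfid to a path through a special client mount; a rough sketch (the 
server, volume name, mount point, and gfid are all placeholders):

    # mount the volume with the auxiliary gfid access option
    mount -t glusterfs -o aux-gfid-mount <server>:/<volname> /mnt/gfid-resolve

    # ask the virtual .gfid tree for the real path behind a given gfid
    getfattr -n trusted.glusterfs.pathinfo -e text /mnt/gfid-resolve/.gfid/<gfid>

This only helps for gfids that still have path data, which may be why it comes 
back empty for the orphaned entries described here.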

When I run gluster volume heal volname info, I get 700K+ GFIDs. Oh. gluster 
3.8.4 on Centos 7.3

Should I just remove the contents of the .glusterfs folder on both and restart 
gluster and run a ls/stat on every file?


When I run a heal, it no longer has a decreasing number of files to heal so 
that's an improvement over the last 2-3 weeks :-)

On Tue, 2017-10-17 at 14:34 +, Matt Waymack wrote:

Attached is the heal log for the volume as well as the shd log.







Run these commands on all the bricks of the replica pair to get the attrs set 
on the backend.







[root@tpc-cent-glus1-081017 ~]# getfattr -d -e hex -m . 
/exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2

getfattr: Removing leading '/' from absolute path names

# file: exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2

security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000

trusted.afr.dirty=0x

trusted.afr.gv0-client-2=0x0001

trusted.gfid=0x108694dbc0394b7cbd3dad6a15d811a2

trusted.gfid2path.9a2f5ada22eb9c45=0x38633262623330322d323466332d346463622d393630322d3839356136396461363131662f435f564f4c2d623030312d693637342d63642d63772e6d6435



[root@tpc-cent-glus2-081017 ~]# getfattr -d -e hex -m . 
/exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2

getfattr: Removing leading '/' from absolute path names

# file: exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2

security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000

trusted.afr.dirty=0x

trusted.afr.gv0-client-2=0x0001

trusted.gfid=0x108694dbc0394b7cbd3dad6a15d811a2

trusted.gfid2path.9a2f5ada22eb9c45=0x38633262623330322d323466332d346463622d393630322d3839356136396461363131662f435f564f4c2d623030312d693637342d63642d63772e6d6435



[root@tpc-arbiter1-100617 ~]# getfattr -d -e hex -m . 
/exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2

getfattr: /exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2: No 
such file or directory





[root@tpc-cent-glus1-081017 ~]# getfattr -d -e hex -m . 
/exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3

getfattr: Removing leading '/' from absolute path names

# file: exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3

security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000

trusted.afr.dirty=0x

trusted.afr.gv0-client-11=0x0001

trusted.gfid=0xe0c56bf78bfe46cabde1e46b92d33df3

trusted.gfid2path.be3ba24c3ef95ff2=0x63323366353834652d353566652d343033382d393131622d386637306365633461613666

Re: [Gluster-users] gfid entries in volume heal info that do not heal

2017-10-18 Thread Matt Waymack
It looks like these entries don't have a corresponding file path; they exist 
only in .glusterfs and appear to be orphaned:

[root@tpc-cent-glus2-081017 ~]# find /exp/b4/gv0 -samefile 
/exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3
/exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3

[root@tpc-cent-glus2-081017 ~]# find /exp/b4/gv0 -samefile 
/exp/b4/gv0/.glusterfs/6f/0a/6f0a0549-8669-46de-8823-d6677fdca8e3
/exp/b4/gv0/.glusterfs/6f/0a/6f0a0549-8669-46de-8823-d6677fdca8e3

[root@tpc-cent-glus1-081017 ~]# find /exp/b1/gv0 -samefile 
/exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
/exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2

[root@tpc-cent-glus1-081017 ~]# find /exp/b4/gv0 -samefile 
/exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3
/exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3

Occasionally I would get these gfid entries in the heal info output but would 
just run tree against the volume.  After that the files would trigger 
self-heal, and the gfid entries would be updated with their volume paths.  That 
does not seem to be the case here so I feel that all of these entries are 
orphaned.
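
As a side note, the trusted.gfid2path.* values in the getfattr outputs elsewhere in this thread appear to be hex-encoded <parent-gfid>/<basename> strings, so they can be decoded to see what the entry was last named. For example, taking the value shown for the b1 gfid (decoded result shown as a comment):

# decode a trusted.gfid2path value from hex to ASCII
echo 38633262623330322d323466332d346463622d393630322d3839356136396461363131662f435f564f4c2d623030312d693637342d63642d63772e6d6435 | xxd -r -p; echo
# -> 8c2bb302-24f3-4dcb-9602-895a69da611f/C_VOL-b001-i674-cd-cw.md5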

From: Karthik Subrahmanya [mailto:ksubr...@redhat.com] 
Sent: Wednesday, October 18, 2017 4:34 AM
To: Matt Waymack 
Cc: gluster-users 
Subject: Re: [Gluster-users] gfid entries in volume heal info that do not heal

Hey Matt,
From the xattr output, it looks like the files are not present on the arbiter brick and need healing, but the parent directory does not have the pending markers set for those entries.
The workaround is to do a lookup from the mount on each file which needs heal; that will create the entry on the arbiter brick, and you can then run volume heal to do the healing.
Follow these steps to resolve the issue (first try them on one file and check whether it gets healed; if it does, repeat for the remaining files):
1. Get the file path for the gfids you got from the heal info output:
    find <brick-path> -samefile <brick-path>/.glusterfs/<first two chars of gfid>/<next two chars of gfid>/<gfid>
2. Do ls/stat on the file from mount.
3. Run volume heal.
4. Check the heal info output to see whether the file got healed.
If one file gets healed, then do step 1 & 2 for the rest of the files and do 
step 3 & 4 once at the end.
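
For example, using one of the gfids from your earlier output, and assuming /exp/b1/gv0 is the brick and /mnt/gv0 is a client mount of the volume (adjust paths to match your setup):

# 1. find the named path for the gfid on the brick
find /exp/b1/gv0 -samefile /exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
# 2. stat that path through the client mount
stat /mnt/gv0/<path returned by find, relative to the brick root>
# 3. trigger the heal
gluster volume heal gv0
# 4. check whether the entry is gone
gluster volume heal gv0 info
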
Let me know if that resolves the issue.

Thanks & Regards,
Karthik

On Tue, Oct 17, 2017 at 8:04 PM, Matt Waymack <mwaym...@nsgdv.com> wrote:
Attached is the heal log for the volume as well as the shd log.

>> Run these commands on all the bricks of the replica pair to get the attrs 
>> set on the backend.

[root@tpc-cent-glus1-081017 ~]# getfattr -d -e hex -m . 
/exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
getfattr: Removing leading '/' from absolute path names
# file: exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x
trusted.afr.gv0-client-2=0x0001
trusted.gfid=0x108694dbc0394b7cbd3dad6a15d811a2
trusted.gfid2path.9a2f5ada22eb9c45=0x38633262623330322d323466332d346463622d393630322d3839356136396461363131662f435f564f4c2d623030312d693637342d63642d63772e6d6435

[root@tpc-cent-glus2-081017 ~]# getfattr -d -e hex -m . 
/exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
getfattr: Removing leading '/' from absolute path names
# file: exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x
trusted.afr.gv0-client-2=0x0001
trusted.gfid=0x108694dbc0394b7cbd3dad6a15d811a2
trusted.gfid2path.9a2f5ada22eb9c45=0x38633262623330322d323466332d346463622d393630322d3839356136396461363131662f435f564f4c2d623030312d693637342d63642d63772e6d6435

[root@tpc-arbiter1-100617 ~]# getfattr -d -e hex -m . 
/exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
getfattr: /exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2: No 
such file or directory


[root@tpc-cent-glus1-081017 ~]# getfattr -d -e hex -m . 
/exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3
getfattr: Removing leading '/' from absolute path names
# file: exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x
trusted.afr.gv0-client-11=0x0001
trusted.gfid=0xe0c56bf78bfe46cabde1e46b92d33df3
trusted.gfid2path.be3ba24c3ef95ff2=0x63323366353834652d353566652d343033382d393131622d3866373063656334616136662f435f564f4c2d623030332d69313331342d63642d636d2d63722e6d6435

[root@tpc-cent-glus2-081017 ~]# getfattr -d -e hex -m . 
/exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3
getfattr: Removing leading '/' from absolute path names

Re: [Gluster-users] gfid entries in volume heal info that do not heal

2017-10-17 Thread Matt Waymack
Attached is the heal log for the volume as well as the shd log. 

>> Run these commands on all the bricks of the replica pair to get the attrs 
>> set on the backend.

[root@tpc-cent-glus1-081017 ~]# getfattr -d -e hex -m . 
/exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
getfattr: Removing leading '/' from absolute path names
# file: exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x
trusted.afr.gv0-client-2=0x0001
trusted.gfid=0x108694dbc0394b7cbd3dad6a15d811a2
trusted.gfid2path.9a2f5ada22eb9c45=0x38633262623330322d323466332d346463622d393630322d3839356136396461363131662f435f564f4c2d623030312d693637342d63642d63772e6d6435

[root@tpc-cent-glus2-081017 ~]# getfattr -d -e hex -m . 
/exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
getfattr: Removing leading '/' from absolute path names
# file: exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x
trusted.afr.gv0-client-2=0x0001
trusted.gfid=0x108694dbc0394b7cbd3dad6a15d811a2
trusted.gfid2path.9a2f5ada22eb9c45=0x38633262623330322d323466332d346463622d393630322d3839356136396461363131662f435f564f4c2d623030312d693637342d63642d63772e6d6435

[root@tpc-arbiter1-100617 ~]# getfattr -d -e hex -m . 
/exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
getfattr: /exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2: No 
such file or directory


[root@tpc-cent-glus1-081017 ~]# getfattr -d -e hex -m . 
/exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3
getfattr: Removing leading '/' from absolute path names
# file: exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x
trusted.afr.gv0-client-11=0x0001
trusted.gfid=0xe0c56bf78bfe46cabde1e46b92d33df3
trusted.gfid2path.be3ba24c3ef95ff2=0x63323366353834652d353566652d343033382d393131622d3866373063656334616136662f435f564f4c2d623030332d69313331342d63642d636d2d63722e6d6435

[root@tpc-cent-glus2-081017 ~]# getfattr -d -e hex -m . 
/exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3
getfattr: Removing leading '/' from absolute path names
# file: exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x
trusted.afr.gv0-client-11=0x0001
trusted.gfid=0xe0c56bf78bfe46cabde1e46b92d33df3
trusted.gfid2path.be3ba24c3ef95ff2=0x63323366353834652d353566652d343033382d393131622d3866373063656334616136662f435f564f4c2d623030332d69313331342d63642d636d2d63722e6d6435

[root@tpc-arbiter1-100617 ~]# getfattr -d -e hex -m . 
/exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3
getfattr: /exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3: No 
such file or directory

>> And the output of "gluster volume heal  info split-brain"

[root@tpc-cent-glus1-081017 ~]# gluster volume heal gv0 info split-brain
Brick tpc-cent-glus1-081017:/exp/b1/gv0
Status: Connected
Number of entries in split-brain: 0

Brick tpc-cent-glus2-081017:/exp/b1/gv0
Status: Connected
Number of entries in split-brain: 0

Brick tpc-arbiter1-100617:/exp/b1/gv0
Status: Connected
Number of entries in split-brain: 0

Brick tpc-cent-glus1-081017:/exp/b2/gv0
Status: Connected
Number of entries in split-brain: 0

Brick tpc-cent-glus2-081017:/exp/b2/gv0
Status: Connected
Number of entries in split-brain: 0

Brick tpc-arbiter1-100617:/exp/b2/gv0
Status: Connected
Number of entries in split-brain: 0

Brick tpc-cent-glus1-081017:/exp/b3/gv0
Status: Connected
Number of entries in split-brain: 0

Brick tpc-cent-glus2-081017:/exp/b3/gv0
Status: Connected
Number of entries in split-brain: 0

Brick tpc-arbiter1-100617:/exp/b3/gv0
Status: Connected
Number of entries in split-brain: 0

Brick tpc-cent-glus1-081017:/exp/b4/gv0
Status: Connected
Number of entries in split-brain: 0

Brick tpc-cent-glus2-081017:/exp/b4/gv0
Status: Connected
Number of entries in split-brain: 0

Brick tpc-arbiter1-100617:/exp/b4/gv0
Status: Connected
Number of entries in split-brain: 0

-Matt

From: Karthik Subrahmanya [mailto:ksubr...@redhat.com] 
Sent: Tuesday, October 17, 2017 1:26 AM
To: Matt Waymack 
Cc: gluster-users 
Subject: Re: [Gluster-users] gfid entries in volume heal info that do not heal

Hi Matt,

Run these commands on all the bricks of the replica pair to get the attrs set 
on the backend.

On the bricks of the first replica set:
getfattr -d -e hex -m . <brick-path>/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2

Re: [Gluster-users] gfid entries in volume heal info that do not heal

2017-10-16 Thread Matt Waymack
OK, so here's my output of the volume info and the heal info. I have not yet tracked down the physical location of these files (any tips for finding them would be appreciated), but I definitely just want them gone. I forgot to mention earlier that the cluster is running 3.12 and was upgraded from 3.10; these files were likely stuck like this when it was on 3.10.
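
One way to check where a given gfid lives on a brick is to list all hard links of its .glusterfs entry; if only the .glusterfs path comes back, there is no named file for that gfid on that brick (brick path and gfid below are placeholders):

# list every hard link of the gfid file on this brick
find <brick-path> -samefile <brick-path>/.glusterfs/<first two chars of gfid>/<next two chars of gfid>/<gfid>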

[root@tpc-cent-glus1-081017 ~]# gluster volume info gv0

Volume Name: gv0
Type: Distributed-Replicate
Volume ID: 8f07894d-e3ab-4a65-bda1-9d9dd46db007
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x (2 + 1) = 12
Transport-type: tcp
Bricks:
Brick1: tpc-cent-glus1-081017:/exp/b1/gv0
Brick2: tpc-cent-glus2-081017:/exp/b1/gv0
Brick3: tpc-arbiter1-100617:/exp/b1/gv0 (arbiter)
Brick4: tpc-cent-glus1-081017:/exp/b2/gv0
Brick5: tpc-cent-glus2-081017:/exp/b2/gv0
Brick6: tpc-arbiter1-100617:/exp/b2/gv0 (arbiter)
Brick7: tpc-cent-glus1-081017:/exp/b3/gv0
Brick8: tpc-cent-glus2-081017:/exp/b3/gv0
Brick9: tpc-arbiter1-100617:/exp/b3/gv0 (arbiter)
Brick10: tpc-cent-glus1-081017:/exp/b4/gv0
Brick11: tpc-cent-glus2-081017:/exp/b4/gv0
Brick12: tpc-arbiter1-100617:/exp/b4/gv0 (arbiter)
Options Reconfigured:
nfs.disable: on
transport.address-family: inet

[root@tpc-cent-glus1-081017 ~]# gluster volume heal gv0 info
Brick tpc-cent-glus1-081017:/exp/b1/gv0
<gfid entries not preserved in the archive>
Status: Connected
Number of entries: 118

Brick tpc-cent-glus2-081017:/exp/b1/gv0
<gfid entries not preserved in the archive>
Status: Connected
Number of entries: 118

Brick tpc-arbiter1-100617:/exp/b1/gv0
Status: Connected
Number of entries: 0

Brick tpc-cent-glus1-081017:/exp/b2/gv0
Status: Connected
Number of entries: 0

Brick tpc-cent-glus2-081017:/exp/b2/gv0
Status: Connected
Number of entries: 0

Brick tpc-arbiter1-100617:/exp/b2/gv0
Status: Connected
Number of entries: 0

Brick tpc-cent-glus1-081017:/exp/b3/gv0
Status: Connected
Number of entries: 0

Brick tpc-cent-glus2-081017:/exp/b3/gv0
Status: Connected
Number of entries: 0

Brick tpc-arbiter1-100617:/exp/b3/gv0
Status: Connected
Number of entries: 0

Brick tpc-cent-glus1-081017:/exp/b4/gv0
<gfid entries not preserved in the archive>
Status: Connected
Number of entries: 24

Brick tpc-cent-glus2-081017:/exp/b4/gv0
<gfid entries not preserved in the archive>
Status: Connected
Number of entries: 24

Brick tpc-arbiter1-100617:/exp/b4/gv0
Status: Connected
Number of entries: 0

Thank you for your help!

From: Karthik Subrahmanya [mailto:ksubr...@redhat.com]
Sent: Monday, October 16, 2017 10:27 AM
To: Matt Waymack 
Cc: gluster-users 
Subject: Re: [Gluster-users] gfid entries in volume heal info that do not heal

Hi Matt,

The files might be in split-brain. Could you please send the outputs of these?
gluster volume info <volname>
gluster volume heal <volname> info
And also the getfattr output of the files which are in the heal info output, from all the bricks of that replica pair:
getfattr -d -e hex -m . <file-path-on-brick>

Thanks & Regards
Karthik

On 16-Oct-2017 8:16 PM, "Matt Waymack" <mwaym...@nsgdv.com> wrote:
Hi all,

I have a volume where the output of volume heal info shows several gfid entries 
to be healed, but they’ve been there for weeks and have not healed.  Any normal 
file that shows up on the heal info does get healed as expected, but these gfid 
entries do not.  Is there any way to remove these orphaned entries from the 
volume so they are no longer stuck in the heal process?

Thank you!

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] gfid entries in volume heal info that do not heal

2017-10-16 Thread Matt Waymack
Hi all,

I have a volume where the output of volume heal info shows several gfid entries 
to be healed, but they've been there for weeks and have not healed.  Any normal 
file that shows up on the heal info does get healed as expected, but these gfid 
entries do not.  Is there any way to remove these orphaned entries from the 
volume so they are no longer stuck in the heal process?
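
One sanity check before deleting anything by hand: for regular files, the entries under .glusterfs are hard links to the named files on the brick, so a link count of 1 means there is no named path left for that gfid on that brick (path below is a placeholder):

# print link count and name for a gfid entry on the brick
stat -c '%h %n' <brick-path>/.glusterfs/<first two chars of gfid>/<next two chars of gfid>/<gfid>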

Thank you!
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users