Re: [Gluster-users] Unable to create new files or folders using samba and vfs_glusterfs
I've been using these for a few weeks now without any issues, thank you!

-----Original Message-----
From: gluster-users-boun...@gluster.org On Behalf Of Matt Waymack
Sent: Thursday, December 27, 2018 10:56 AM
To: Diego Remolina
Cc: gluster-users@gluster.org List
Subject: Re: [Gluster-users] Unable to create new files or folders using samba and vfs_glusterfs

OK, I'm back from the holiday and updated using the following packages:

libsmbclient-4.8.3-4.el7.0.1.x86_64.rpm
libwbclient-4.8.3-4.el7.0.1.x86_64.rpm
samba-4.8.3-4.el7.0.1.x86_64.rpm
samba-client-4.8.3-4.el7.0.1.x86_64.rpm
samba-client-libs-4.8.3-4.el7.0.1.x86_64.rpm
samba-common-4.8.3-4.el7.0.1.noarch.rpm
samba-common-libs-4.8.3-4.el7.0.1.x86_64.rpm
samba-common-tools-4.8.3-4.el7.0.1.x86_64.rpm
samba-libs-4.8.3-4.el7.0.1.x86_64.rpm
samba-vfs-glusterfs-4.8.3-4.el7.0.1.x86_64.rpm

First impressions are good! We're able to create files/folders. I'll keep you updated on stability. Thank you!

-----Original Message-----
From: Diego Remolina
Sent: Thursday, December 20, 2018 1:36 PM
To: Matt Waymack
Cc: gluster-users@gluster.org List
Subject: Re: [Gluster-users] Unable to create new files or folders using samba and vfs_glusterfs

Hi Matt,

The update is slightly different; it has the .1 at the end:

Fast-track -> samba-4.8.3-4.el7.0.1.x86_64.rpm
vs general -> samba-4.8.3-4.el7.x86_64

I think these are built, but not pushed to the fasttrack repo until they get feedback that the packages are good. So you may need to use wget to download them and update your packages with these for the test.

Diego

On Thu, Dec 20, 2018 at 1:06 PM Matt Waymack wrote:
>
> Hi all,
>
> I'm looking to update Samba from fasttrack, but I still only see 4.8.3 and yum is not wanting to update. The test build is also showing 4.8.3.
>
> Thank you!
>
> From: gluster-users-boun...@gluster.org On Behalf Of Matt Waymack
> Sent: Sunday, December 16, 2018 1:55 PM
> To: Diego Remolina
> Cc: gluster-users@gluster.org List
> Subject: Re: [Gluster-users] Unable to create new files or folders using samba and vfs_glusterfs
>
> Hi all, sorry for the delayed response.
>
> I can test this out and will report back. It may be as late as Tuesday before I can test the build.
>
> Thank you!
>
> On Dec 15, 2018 7:46 AM, Diego Remolina wrote:
>
> Matt,
>
> Can you test the updated samba packages that the CentOS team has built for FasTrack?
>
> A NOTE has been added to this issue.
> --
> (0033351) pgreco (developer) - 2018-12-15 13:43
> https://bugs.centos.org/view.php?id=15586#c33351
> --
> @dijur...@gmail.com
> Here's the link for the test build:
> https://buildlogs.centos.org/c7-fasttrack.x86_64/samba/20181214164659/4.8.3-4.el7.0.1.x86_64/
> Please let us know how it goes. Thanks for testing!
> Pablo.
> --
>
> Diego
>
> On Fri, Dec 14, 2018 at 12:52 AM Anoop C S wrote:
> >
> > On Thu, 2018-12-13 at 15:31 +, Matt Waymack wrote:
> > > Hi all,
> > >
> > > I'm having an issue on Windows clients accessing shares via smb when using vfs_glusterfs. They are unable to create any files or folders at the root of the share and get the error "The file is too large for the destination file system." When I change from vfs_glusterfs to just using a filesystem path to the same location, it works fine (except for the performance hit). All my searches have led to bug 1619108, and that seems to be the symptom, but there doesn't appear to be any clear resolution.
> >
> > You figured out the right bug and following is the upstream Samba bug:
> >
> > https://bugzilla.samba.org/show_bug.cgi?id=13585
> >
> > Unfortunately the fix is only available with v4.8.6 and higher. If required I can patch it up and provide a build.
> >
> > > I'm on the latest version of samba available on CentOS 7 (4.8.3) and I'm on the latest available glusterfs 4.1 (4.1.6). Is there something simple I'm missing to get this going?
> > >
> > > Thank you!
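For anyone else testing before the fasttrack push, a minimal sketch of the manual update Diego describes might look like the following (the URL is the test-build link from Pablo's note; run on each Samba node, and treat the exact steps as an assumption rather than the official procedure):

# pull every RPM from the test-build directory into the current directory
wget -r -np -nd -A '*.rpm' \
  https://buildlogs.centos.org/c7-fasttrack.x86_64/samba/20181214164659/4.8.3-4.el7.0.1.x86_64/

# update the installed packages from the local files, then restart smbd
yum update ./*.rpm
systemctl restart smb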
Re: [Gluster-users] Input/output error on FUSE log
Does anyone have any other ideas on where to look? This is only affecting FUSE clients; SMB clients are unaffected by this problem. Thanks!

From: gluster-users-boun...@gluster.org On Behalf Of Matt Waymack
Sent: Monday, January 7, 2019 1:19 PM
To: Raghavendra Gowdappa
Cc: gluster-users@gluster.org List
Subject: Re: [Gluster-users] Input/output error on FUSE log

Attached are the logs from when a failure occurred with diagnostics set to trace. Thank you!

From: Raghavendra Gowdappa <rgowd...@redhat.com>
Sent: Saturday, January 5, 2019 8:32 PM
To: Matt Waymack <mwaym...@nsgdv.com>
Cc: gluster-users@gluster.org List
Subject: Re: [Gluster-users] Input/output error on FUSE log

On Sun, Jan 6, 2019 at 7:58 AM Raghavendra Gowdappa <rgowd...@redhat.com> wrote:

On Sun, Jan 6, 2019 at 4:19 AM Matt Waymack <mwaym...@nsgdv.com> wrote:

Hi all, I'm having a problem writing to our volume. When writing files larger than about 2GB, I get an intermittent issue where the write will fail and return Input/output error. This is also shown in the FUSE log of the client (this is affecting all clients). A snip of a client log is below:

[2019-01-05 22:39:44.581371] W [fuse-bridge.c:2474:fuse_writev_cbk] 0-glusterfs-fuse: 51040978: WRITE => -1 gfid=82a0b5c4-7ef3-43c2-ad86-41e16673d7c2 fd=0x7f949839a368 (Input/output error)
[2019-01-05 22:39:44.598392] W [fuse-bridge.c:1441:fuse_err_cbk] 0-glusterfs-fuse: 51040979: FLUSH() ERR => -1 (Input/output error)
[2019-01-05 22:39:47.420920] W [fuse-bridge.c:2474:fuse_writev_cbk] 0-glusterfs-fuse: 51041266: WRITE => -1 gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949809b7f8 (Input/output error)
[2019-01-05 22:39:47.433377] W [fuse-bridge.c:1441:fuse_err_cbk] 0-glusterfs-fuse: 51041267: FLUSH() ERR => -1 (Input/output error)
[2019-01-05 22:39:50.441531] W [fuse-bridge.c:2474:fuse_writev_cbk] 0-glusterfs-fuse: 51041548: WRITE => -1 gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949839a368 (Input/output error)
[2019-01-05 22:39:50.451914] W [fuse-bridge.c:1441:fuse_err_cbk] 0-glusterfs-fuse: 51041549: FLUSH() ERR => -1 (Input/output error)
The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search] 0-gv1-dht: no subvolume for hash (value) = 1311504267" repeated 1721 times between [2019-01-05 22:39:33.906241] and [2019-01-05 22:39:44.598371]
The message "E [MSGID: 101046] [dht-common.c:1502:dht_lookup_dir_cbk] 0-gv1-dht: dict is null" repeated 1714 times between [2019-01-05 22:39:33.925981] and [2019-01-05 22:39:50.451862]
The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search] 0-gv1-dht: no subvolume for hash (value) = 1137142622" repeated 1707 times between [2019-01-05 22:39:39.636552] and [2019-01-05 22:39:50.451895]

This looks to be a DHT issue. Some questions:

* Are all subvolumes of DHT up, and is the client connected to them? Particularly the subvolume which contains the file in question.
* Can you get all extended attributes of the parent directory of the file from all bricks?
* Set diagnostics.client-log-level to TRACE, capture these errors again and attach the client log file.

I spoke a bit early. dht_writev doesn't search the hashed subvolume as it has already been looked up in lookup. So these msgs look to be of a different issue - not the writev failure.

This is intermittent for most files, but eventually if a file is large enough it will not write. The workflow is SFTP to the client, which then writes to the volume over FUSE. When files get to a certain point, we can no longer write to them. The file sizes are different as well, so it's not like they all get to the same size and just stop either. I've ruled out a free space issue; our files at their largest are only a few hundred GB and we have tens of terabytes free on each brick. We are also sharding at 1GB. I'm not sure where to go from here as the error seems vague and I can only see it in the client log. I'm not seeing these errors on the nodes themselves. This is also seen if I mount the volume via FUSE on any of the nodes, and it is only reflected in the FUSE log. Here is the volume info:

Volume Name: gv1
Type: Distributed-Replicate
Volume ID: 1472cc78-e2a0-4c3f-9571-dab840239b3c
Status: Started
Snapshot Count: 0
Number of Bricks: 8 x (2 + 1) = 24
Transport-type: tcp
Bricks:
Brick1: tpc-glus4:/exp/b1/gv1
Brick2: tpc-glus2:/exp/b1/gv1
Brick3: tpc-arbiter1:/exp/b1/gv1 (arbiter)
Brick4: tpc-glus2:/exp/b2/gv1
Brick5: tpc-glus4:/exp/b2/gv1
Brick6: tpc-arbiter1:/exp/b2/gv1 (arbiter)
Brick7: tpc-glus4:/exp/b3/gv1
Brick8: tpc-glus2:/exp/b3/gv1
Brick9: tpc-arbiter1:/exp/b3/gv1 (arbiter)
Brick10: tpc-glus4:/exp/b4/gv1
Brick11: tpc-glus2:/exp/b4/gv1
Brick12: tpc-arbiter1:/exp/b4/gv1 (arbiter)
Brick13: tpc-glus1:/exp/b5/gv1
Brick14: tpc-glus3:/ex
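For reference, Raghavendra's TRACE suggestion above can be applied and reverted with volume options like these (assuming the volume name gv1 from this thread; the client log file name is derived from the mount point, so the path below is an example):

# raise client-side logging to TRACE, reproduce one failing write, then revert
gluster volume set gv1 diagnostics.client-log-level TRACE
# ... reproduce the failure from a FUSE client; the client log will be
# something like /var/log/glusterfs/mnt-gv1.log ...
gluster volume set gv1 diagnostics.client-log-level INFO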
Re: [Gluster-users] [External] Re: Input/output error on FUSE log
Yep, first unmounted/remounted, then rebooted clients. Stopped/started the volumes, and rebooted all nodes.

From: Davide Obbi
Sent: Monday, January 7, 2019 12:47 PM
To: Matt Waymack
Cc: Raghavendra Gowdappa; gluster-users@gluster.org List
Subject: Re: [External] Re: [Gluster-users] Input/output error on FUSE log

I guess you already tried unmounting, stop/start, and mounting?

On Mon, Jan 7, 2019 at 7:44 PM Matt Waymack <mwaym...@nsgdv.com> wrote:

Yes, all volumes use sharding.

From: Davide Obbi <davide.o...@booking.com>
Sent: Monday, January 7, 2019 12:43 PM
To: Matt Waymack <mwaym...@nsgdv.com>
Cc: Raghavendra Gowdappa <rgowd...@redhat.com>; gluster-users@gluster.org List
Subject: Re: [External] Re: [Gluster-users] Input/output error on FUSE log

Are all the volumes configured with sharding?

On Mon, Jan 7, 2019 at 5:35 PM Matt Waymack <mwaym...@nsgdv.com> wrote:

I think that I can rule out the network, as I have multiple volumes on the same nodes and not all volumes are affected. Additionally, access via SMB using samba-vfs-glusterfs is not affected, even on the same volumes. This is seemingly only affecting the FUSE clients.

From: Davide Obbi <davide.o...@booking.com>
Sent: Sunday, January 6, 2019 12:26 PM
To: Raghavendra Gowdappa <rgowd...@redhat.com>
Cc: Matt Waymack <mwaym...@nsgdv.com>; gluster-users@gluster.org List
Subject: Re: [External] Re: [Gluster-users] Input/output error on FUSE log

Hi,

I would start doing some checks. "(Input/output error)" seems to be returned by the operating system; this happens, for instance, when trying to access a file system on a device that is not available. So I would check the network connectivity between the clients and servers, and server to server, during the reported time.

Regards,
Davide

On Sun, Jan 6, 2019 at 3:32 AM Raghavendra Gowdappa <rgowd...@redhat.com> wrote:

On Sun, Jan 6, 2019 at 7:58 AM Raghavendra Gowdappa <rgowd...@redhat.com> wrote:

On Sun, Jan 6, 2019 at 4:19 AM Matt Waymack <mwaym...@nsgdv.com> wrote:

Hi all, I'm having a problem writing to our volume. When writing files larger than about 2GB, I get an intermittent issue where the write will fail and return Input/output error. This is also shown in the FUSE log of the client (this is affecting all clients). A snip of a client log is below:

[2019-01-05 22:39:44.581371] W [fuse-bridge.c:2474:fuse_writev_cbk] 0-glusterfs-fuse: 51040978: WRITE => -1 gfid=82a0b5c4-7ef3-43c2-ad86-41e16673d7c2 fd=0x7f949839a368 (Input/output error)
[2019-01-05 22:39:44.598392] W [fuse-bridge.c:1441:fuse_err_cbk] 0-glusterfs-fuse: 51040979: FLUSH() ERR => -1 (Input/output error)
[2019-01-05 22:39:47.420920] W [fuse-bridge.c:2474:fuse_writev_cbk] 0-glusterfs-fuse: 51041266: WRITE => -1 gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949809b7f8 (Input/output error)
[2019-01-05 22:39:47.433377] W [fuse-bridge.c:1441:fuse_err_cbk] 0-glusterfs-fuse: 51041267: FLUSH() ERR => -1 (Input/output error)
[2019-01-05 22:39:50.441531] W [fuse-bridge.c:2474:fuse_writev_cbk] 0-glusterfs-fuse: 51041548: WRITE => -1 gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949839a368 (Input/output error)
[2019-01-05 22:39:50.451914] W [fuse-bridge.c:1441:fuse_err_cbk] 0-glusterfs-fuse: 51041549: FLUSH() ERR => -1 (Input/output error)
The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search] 0-gv1-dht: no subvolume for hash (value) = 1311504267" repeated 1721 times between [2019-01-05 22:39:33.906241] and [2019-01-05 22:39:44.598371]
The message "E [MSGID: 101046] [dht-common.c:1502:dht_lookup_dir_cbk] 0-gv1-dht: dict is null" repeated 1714 times between [2019-01-05 22:39:33.925981] and [2019-01-05 22:39:50.451862]
The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search] 0-gv1-dht: no subvolume for hash (value) = 1137142622" repeated 1707 times between [2019-01-05 22:39:39.636552] and [2019-01-05 22:39:50.451895]

This looks to be a DHT issue. Some questions:

* Are all subvolumes of DHT up, and is the client connected to them? Particularly the subvolume which contains the file in question.
* Can you get all extended attributes of the parent directory of the file from all bricks?
* Set diagnostics.client-log-level to TRACE, capture these errors again and attach the client log file.

I spoke a bit early. dht_writev doesn't search the hashed subvolume as it has already been looked up in lookup. So these msgs look to be of a different issue - not the writev failure.

This is intermittent for most files, but eventually if a file is large enough it will not write. The workflow is SFTP to the client, which then writ
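As a hedge on Raghavendra's first question, one way to confirm every DHT subvolume is up and that the client holds connections to all bricks might be the following (volume name and client log path are taken from this thread; adjust to your mount):

# server side: every brick should show Online "Y"
gluster volume status gv1

# client side: look for disconnect/reconnect chatter around the failure window
grep -E 'disconnected|Connected to' /var/log/glusterfs/mnt-gv1.log | tail -n 50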
Re: [Gluster-users] [External] Re: Input/output error on FUSE log
Yes, all volumes use sharding.

From: Davide Obbi
Sent: Monday, January 7, 2019 12:43 PM
To: Matt Waymack
Cc: Raghavendra Gowdappa; gluster-users@gluster.org List
Subject: Re: [External] Re: [Gluster-users] Input/output error on FUSE log

Are all the volumes configured with sharding?

On Mon, Jan 7, 2019 at 5:35 PM Matt Waymack <mwaym...@nsgdv.com> wrote:

I think that I can rule out the network, as I have multiple volumes on the same nodes and not all volumes are affected. Additionally, access via SMB using samba-vfs-glusterfs is not affected, even on the same volumes. This is seemingly only affecting the FUSE clients.

From: Davide Obbi <davide.o...@booking.com>
Sent: Sunday, January 6, 2019 12:26 PM
To: Raghavendra Gowdappa <rgowd...@redhat.com>
Cc: Matt Waymack <mwaym...@nsgdv.com>; gluster-users@gluster.org List
Subject: Re: [External] Re: [Gluster-users] Input/output error on FUSE log

Hi,

I would start doing some checks. "(Input/output error)" seems to be returned by the operating system; this happens, for instance, when trying to access a file system on a device that is not available. So I would check the network connectivity between the clients and servers, and server to server, during the reported time.

Regards,
Davide

On Sun, Jan 6, 2019 at 3:32 AM Raghavendra Gowdappa <rgowd...@redhat.com> wrote:

On Sun, Jan 6, 2019 at 7:58 AM Raghavendra Gowdappa <rgowd...@redhat.com> wrote:

On Sun, Jan 6, 2019 at 4:19 AM Matt Waymack <mwaym...@nsgdv.com> wrote:

Hi all, I'm having a problem writing to our volume. When writing files larger than about 2GB, I get an intermittent issue where the write will fail and return Input/output error. This is also shown in the FUSE log of the client (this is affecting all clients). A snip of a client log is below:

[2019-01-05 22:39:44.581371] W [fuse-bridge.c:2474:fuse_writev_cbk] 0-glusterfs-fuse: 51040978: WRITE => -1 gfid=82a0b5c4-7ef3-43c2-ad86-41e16673d7c2 fd=0x7f949839a368 (Input/output error)
[2019-01-05 22:39:44.598392] W [fuse-bridge.c:1441:fuse_err_cbk] 0-glusterfs-fuse: 51040979: FLUSH() ERR => -1 (Input/output error)
[2019-01-05 22:39:47.420920] W [fuse-bridge.c:2474:fuse_writev_cbk] 0-glusterfs-fuse: 51041266: WRITE => -1 gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949809b7f8 (Input/output error)
[2019-01-05 22:39:47.433377] W [fuse-bridge.c:1441:fuse_err_cbk] 0-glusterfs-fuse: 51041267: FLUSH() ERR => -1 (Input/output error)
[2019-01-05 22:39:50.441531] W [fuse-bridge.c:2474:fuse_writev_cbk] 0-glusterfs-fuse: 51041548: WRITE => -1 gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949839a368 (Input/output error)
[2019-01-05 22:39:50.451914] W [fuse-bridge.c:1441:fuse_err_cbk] 0-glusterfs-fuse: 51041549: FLUSH() ERR => -1 (Input/output error)
The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search] 0-gv1-dht: no subvolume for hash (value) = 1311504267" repeated 1721 times between [2019-01-05 22:39:33.906241] and [2019-01-05 22:39:44.598371]
The message "E [MSGID: 101046] [dht-common.c:1502:dht_lookup_dir_cbk] 0-gv1-dht: dict is null" repeated 1714 times between [2019-01-05 22:39:33.925981] and [2019-01-05 22:39:50.451862]
The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search] 0-gv1-dht: no subvolume for hash (value) = 1137142622" repeated 1707 times between [2019-01-05 22:39:39.636552] and [2019-01-05 22:39:50.451895]

This looks to be a DHT issue. Some questions:

* Are all subvolumes of DHT up, and is the client connected to them? Particularly the subvolume which contains the file in question.
* Can you get all extended attributes of the parent directory of the file from all bricks?
* Set diagnostics.client-log-level to TRACE, capture these errors again and attach the client log file.

I spoke a bit early. dht_writev doesn't search the hashed subvolume as it has already been looked up in lookup. So these msgs look to be of a different issue - not the writev failure.

This is intermittent for most files, but eventually if a file is large enough it will not write. The workflow is SFTP to the client, which then writes to the volume over FUSE. When files get to a certain point, we can no longer write to them. The file sizes are different as well, so it's not like they all get to the same size and just stop either. I've ruled out a free space issue; our files at their largest are only a few hundred GB and we have tens of terabytes free on each brick. We are also sharding at 1GB. I'm not sure where to go from here as the error seems vague and I can only see it in the client log. I'm not seeing these errors on the nodes themselves. This is also seen if I mount the volume via FUSE on any of the nodes as well and it is only ref
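Raghavendra's second ask (parent-directory xattrs from every brick) could be scripted along these lines, reusing the getfattr form that appears later in this digest; the parent-directory path below is a placeholder:

# run on each server, once per brick root that backs gv1
for b in /exp/b1/gv1 /exp/b2/gv1 /exp/b3/gv1 /exp/b4/gv1; do
  echo "== $b =="
  getfattr -d -m . -e hex "$b/path/to/parent-directory"
done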
Re: [Gluster-users] [External] Re: Input/output error on FUSE log
I think that I can rule out the network, as I have multiple volumes on the same nodes and not all volumes are affected. Additionally, access via SMB using samba-vfs-glusterfs is not affected, even on the same volumes. This is seemingly only affecting the FUSE clients.

From: Davide Obbi
Sent: Sunday, January 6, 2019 12:26 PM
To: Raghavendra Gowdappa
Cc: Matt Waymack; gluster-users@gluster.org List
Subject: Re: [External] Re: [Gluster-users] Input/output error on FUSE log

Hi,

I would start doing some checks. "(Input/output error)" seems to be returned by the operating system; this happens, for instance, when trying to access a file system on a device that is not available. So I would check the network connectivity between the clients and servers, and server to server, during the reported time.

Regards,
Davide

On Sun, Jan 6, 2019 at 3:32 AM Raghavendra Gowdappa <rgowd...@redhat.com> wrote:

On Sun, Jan 6, 2019 at 7:58 AM Raghavendra Gowdappa <rgowd...@redhat.com> wrote:

On Sun, Jan 6, 2019 at 4:19 AM Matt Waymack <mwaym...@nsgdv.com> wrote:

Hi all, I'm having a problem writing to our volume. When writing files larger than about 2GB, I get an intermittent issue where the write will fail and return Input/output error. This is also shown in the FUSE log of the client (this is affecting all clients). A snip of a client log is below:

[2019-01-05 22:39:44.581371] W [fuse-bridge.c:2474:fuse_writev_cbk] 0-glusterfs-fuse: 51040978: WRITE => -1 gfid=82a0b5c4-7ef3-43c2-ad86-41e16673d7c2 fd=0x7f949839a368 (Input/output error)
[2019-01-05 22:39:44.598392] W [fuse-bridge.c:1441:fuse_err_cbk] 0-glusterfs-fuse: 51040979: FLUSH() ERR => -1 (Input/output error)
[2019-01-05 22:39:47.420920] W [fuse-bridge.c:2474:fuse_writev_cbk] 0-glusterfs-fuse: 51041266: WRITE => -1 gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949809b7f8 (Input/output error)
[2019-01-05 22:39:47.433377] W [fuse-bridge.c:1441:fuse_err_cbk] 0-glusterfs-fuse: 51041267: FLUSH() ERR => -1 (Input/output error)
[2019-01-05 22:39:50.441531] W [fuse-bridge.c:2474:fuse_writev_cbk] 0-glusterfs-fuse: 51041548: WRITE => -1 gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949839a368 (Input/output error)
[2019-01-05 22:39:50.451914] W [fuse-bridge.c:1441:fuse_err_cbk] 0-glusterfs-fuse: 51041549: FLUSH() ERR => -1 (Input/output error)
The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search] 0-gv1-dht: no subvolume for hash (value) = 1311504267" repeated 1721 times between [2019-01-05 22:39:33.906241] and [2019-01-05 22:39:44.598371]
The message "E [MSGID: 101046] [dht-common.c:1502:dht_lookup_dir_cbk] 0-gv1-dht: dict is null" repeated 1714 times between [2019-01-05 22:39:33.925981] and [2019-01-05 22:39:50.451862]
The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search] 0-gv1-dht: no subvolume for hash (value) = 1137142622" repeated 1707 times between [2019-01-05 22:39:39.636552] and [2019-01-05 22:39:50.451895]

This looks to be a DHT issue. Some questions:

* Are all subvolumes of DHT up, and is the client connected to them? Particularly the subvolume which contains the file in question.
* Can you get all extended attributes of the parent directory of the file from all bricks?
* Set diagnostics.client-log-level to TRACE, capture these errors again and attach the client log file.

I spoke a bit early. dht_writev doesn't search the hashed subvolume as it has already been looked up in lookup. So these msgs look to be of a different issue - not the writev failure.

This is intermittent for most files, but eventually if a file is large enough it will not write. The workflow is SFTP to the client, which then writes to the volume over FUSE. When files get to a certain point, we can no longer write to them. The file sizes are different as well, so it's not like they all get to the same size and just stop either. I've ruled out a free space issue; our files at their largest are only a few hundred GB and we have tens of terabytes free on each brick. We are also sharding at 1GB. I'm not sure where to go from here as the error seems vague and I can only see it in the client log. I'm not seeing these errors on the nodes themselves. This is also seen if I mount the volume via FUSE on any of the nodes as well and it is only reflected in the FUSE log. Here is the volume info:

Volume Name: gv1
Type: Distributed-Replicate
Volume ID: 1472cc78-e2a0-4c3f-9571-dab840239b3c
Status: Started
Snapshot Count: 0
Number of Bricks: 8 x (2 + 1) = 24
Transport-type: tcp
Bricks:
Brick1: tpc-glus4:/exp/b1/gv1
Brick2: tpc-glus2:/exp/b1/gv1
Brick3: tpc-arbiter1:/exp/b1/gv1 (arbiter)
Brick4: tpc-glus2:/exp/b2/gv1
Brick5: tpc-glus4:/exp/b2/gv1
Brick6: tpc-arbiter1:/exp/b2/gv1 (arbiter)
Brick7: tpc-glus4:/exp/b3/gv1
Brick8: tpc-glus2:/exp/b3/gv1
Brick9: tpc-arbiter1:/exp/b3/gv1 (arbiter)
Brick10: tpc-glus4:/exp
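Davide's sharding question can be answered across all volumes with a small loop like this (a sketch; `gluster volume get` is available on the release lines discussed in this thread):

for v in $(gluster volume list); do
  echo "== $v =="
  gluster volume get "$v" features.shard
  gluster volume get "$v" features.shard-block-size
done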
[Gluster-users] Input/output error on FUSE log
Hi all,

I'm having a problem writing to our volume. When writing files larger than about 2GB, I get an intermittent issue where the write will fail and return Input/output error. This is also shown in the FUSE log of the client (this is affecting all clients). A snip of a client log is below:

[2019-01-05 22:39:44.581371] W [fuse-bridge.c:2474:fuse_writev_cbk] 0-glusterfs-fuse: 51040978: WRITE => -1 gfid=82a0b5c4-7ef3-43c2-ad86-41e16673d7c2 fd=0x7f949839a368 (Input/output error)
[2019-01-05 22:39:44.598392] W [fuse-bridge.c:1441:fuse_err_cbk] 0-glusterfs-fuse: 51040979: FLUSH() ERR => -1 (Input/output error)
[2019-01-05 22:39:47.420920] W [fuse-bridge.c:2474:fuse_writev_cbk] 0-glusterfs-fuse: 51041266: WRITE => -1 gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949809b7f8 (Input/output error)
[2019-01-05 22:39:47.433377] W [fuse-bridge.c:1441:fuse_err_cbk] 0-glusterfs-fuse: 51041267: FLUSH() ERR => -1 (Input/output error)
[2019-01-05 22:39:50.441531] W [fuse-bridge.c:2474:fuse_writev_cbk] 0-glusterfs-fuse: 51041548: WRITE => -1 gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949839a368 (Input/output error)
[2019-01-05 22:39:50.451914] W [fuse-bridge.c:1441:fuse_err_cbk] 0-glusterfs-fuse: 51041549: FLUSH() ERR => -1 (Input/output error)
The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search] 0-gv1-dht: no subvolume for hash (value) = 1311504267" repeated 1721 times between [2019-01-05 22:39:33.906241] and [2019-01-05 22:39:44.598371]
The message "E [MSGID: 101046] [dht-common.c:1502:dht_lookup_dir_cbk] 0-gv1-dht: dict is null" repeated 1714 times between [2019-01-05 22:39:33.925981] and [2019-01-05 22:39:50.451862]
The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search] 0-gv1-dht: no subvolume for hash (value) = 1137142622" repeated 1707 times between [2019-01-05 22:39:39.636552] and [2019-01-05 22:39:50.451895]

This is intermittent for most files, but eventually if a file is large enough it will not write. The workflow is SFTP to the client, which then writes to the volume over FUSE. When files get to a certain point, we can no longer write to them. The file sizes are different as well, so it's not like they all get to the same size and just stop either. I've ruled out a free space issue; our files at their largest are only a few hundred GB and we have tens of terabytes free on each brick. We are also sharding at 1GB. I'm not sure where to go from here as the error seems vague and I can only see it in the client log. I'm not seeing these errors on the nodes themselves. This is also seen if I mount the volume via FUSE on any of the nodes as well, and it is only reflected in the FUSE log.

Here is the volume info:

Volume Name: gv1
Type: Distributed-Replicate
Volume ID: 1472cc78-e2a0-4c3f-9571-dab840239b3c
Status: Started
Snapshot Count: 0
Number of Bricks: 8 x (2 + 1) = 24
Transport-type: tcp
Bricks:
Brick1: tpc-glus4:/exp/b1/gv1
Brick2: tpc-glus2:/exp/b1/gv1
Brick3: tpc-arbiter1:/exp/b1/gv1 (arbiter)
Brick4: tpc-glus2:/exp/b2/gv1
Brick5: tpc-glus4:/exp/b2/gv1
Brick6: tpc-arbiter1:/exp/b2/gv1 (arbiter)
Brick7: tpc-glus4:/exp/b3/gv1
Brick8: tpc-glus2:/exp/b3/gv1
Brick9: tpc-arbiter1:/exp/b3/gv1 (arbiter)
Brick10: tpc-glus4:/exp/b4/gv1
Brick11: tpc-glus2:/exp/b4/gv1
Brick12: tpc-arbiter1:/exp/b4/gv1 (arbiter)
Brick13: tpc-glus1:/exp/b5/gv1
Brick14: tpc-glus3:/exp/b5/gv1
Brick15: tpc-arbiter2:/exp/b5/gv1 (arbiter)
Brick16: tpc-glus1:/exp/b6/gv1
Brick17: tpc-glus3:/exp/b6/gv1
Brick18: tpc-arbiter2:/exp/b6/gv1 (arbiter)
Brick19: tpc-glus1:/exp/b7/gv1
Brick20: tpc-glus3:/exp/b7/gv1
Brick21: tpc-arbiter2:/exp/b7/gv1 (arbiter)
Brick22: tpc-glus1:/exp/b8/gv1
Brick23: tpc-glus3:/exp/b8/gv1
Brick24: tpc-arbiter2:/exp/b8/gv1 (arbiter)
Options Reconfigured:
performance.cache-samba-metadata: on
performance.cache-invalidation: off
features.shard-block-size: 1000MB
features.shard: on
transport.address-family: inet
nfs.disable: on
cluster.lookup-optimize: on

I'm a bit stumped on this; any help is appreciated. Thank you!
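A hypothetical way to reproduce this on demand (file name and mount point below are placeholders) is to push a single large sequential write through a FUSE mount while tailing the client log:

# write ~3 GB through the FUSE mount; the failures above start past ~2GB
dd if=/dev/zero of=/mnt/gv1/iotest.bin bs=1M count=3000 conv=fsync

# in another shell, watch for the WRITE => -1 / FLUSH() ERR entries
tail -f /var/log/glusterfs/mnt-gv1.log | grep -E 'WRITE|FLUSH'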
Re: [Gluster-users] Unable to create new files or folders using samba and vfs_glusterfs
OK, I'm back from the holiday and updated using the following packages:

libsmbclient-4.8.3-4.el7.0.1.x86_64.rpm
libwbclient-4.8.3-4.el7.0.1.x86_64.rpm
samba-4.8.3-4.el7.0.1.x86_64.rpm
samba-client-4.8.3-4.el7.0.1.x86_64.rpm
samba-client-libs-4.8.3-4.el7.0.1.x86_64.rpm
samba-common-4.8.3-4.el7.0.1.noarch.rpm
samba-common-libs-4.8.3-4.el7.0.1.x86_64.rpm
samba-common-tools-4.8.3-4.el7.0.1.x86_64.rpm
samba-libs-4.8.3-4.el7.0.1.x86_64.rpm
samba-vfs-glusterfs-4.8.3-4.el7.0.1.x86_64.rpm

First impressions are good! We're able to create files/folders. I'll keep you updated on stability. Thank you!

-----Original Message-----
From: Diego Remolina
Sent: Thursday, December 20, 2018 1:36 PM
To: Matt Waymack
Cc: gluster-users@gluster.org List
Subject: Re: [Gluster-users] Unable to create new files or folders using samba and vfs_glusterfs

Hi Matt,

The update is slightly different; it has the .1 at the end:

Fast-track -> samba-4.8.3-4.el7.0.1.x86_64.rpm
vs general -> samba-4.8.3-4.el7.x86_64

I think these are built, but not pushed to the fasttrack repo until they get feedback that the packages are good. So you may need to use wget to download them and update your packages with these for the test.

Diego

On Thu, Dec 20, 2018 at 1:06 PM Matt Waymack wrote:
>
> Hi all,
>
> I'm looking to update Samba from fasttrack, but I still only see 4.8.3 and yum is not wanting to update. The test build is also showing 4.8.3.
>
> Thank you!
>
> From: gluster-users-boun...@gluster.org On Behalf Of Matt Waymack
> Sent: Sunday, December 16, 2018 1:55 PM
> To: Diego Remolina
> Cc: gluster-users@gluster.org List
> Subject: Re: [Gluster-users] Unable to create new files or folders using samba and vfs_glusterfs
>
> Hi all, sorry for the delayed response.
>
> I can test this out and will report back. It may be as late as Tuesday before I can test the build.
>
> Thank you!
>
> On Dec 15, 2018 7:46 AM, Diego Remolina wrote:
>
> Matt,
>
> Can you test the updated samba packages that the CentOS team has built for FasTrack?
>
> A NOTE has been added to this issue.
> --
> (0033351) pgreco (developer) - 2018-12-15 13:43
> https://bugs.centos.org/view.php?id=15586#c33351
> --
> @dijur...@gmail.com
> Here's the link for the test build:
> https://buildlogs.centos.org/c7-fasttrack.x86_64/samba/20181214164659/4.8.3-4.el7.0.1.x86_64/
> Please let us know how it goes. Thanks for testing!
> Pablo.
> --
>
> Diego
>
> On Fri, Dec 14, 2018 at 12:52 AM Anoop C S wrote:
> >
> > On Thu, 2018-12-13 at 15:31 +, Matt Waymack wrote:
> > > Hi all,
> > >
> > > I'm having an issue on Windows clients accessing shares via smb when using vfs_glusterfs. They are unable to create any files or folders at the root of the share and get the error "The file is too large for the destination file system." When I change from vfs_glusterfs to just using a filesystem path to the same location, it works fine (except for the performance hit). All my searches have led to bug 1619108, and that seems to be the symptom, but there doesn't appear to be any clear resolution.
> >
> > You figured out the right bug and following is the upstream Samba bug:
> >
> > https://bugzilla.samba.org/show_bug.cgi?id=13585
> >
> > Unfortunately the fix is only available with v4.8.6 and higher. If required I can patch it up and provide a build.
> >
> > > I'm on the latest version of samba available on CentOS 7 (4.8.3) and I'm on the latest available glusterfs 4.1 (4.1.6). Is there something simple I'm missing to get this going?
> > >
> > > Thank you!
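After the manual update, a quick check to confirm that the .1 fasttrack builds actually landed might be:

# all samba packages should now report 4.8.3-4.el7.0.1
rpm -qa 'samba*' libsmbclient libwbclient
smbd --version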
Re: [Gluster-users] Unable to create new files or folders using samba and vfs_glusterfs
Hi all,

I'm looking to update Samba from fasttrack, but I still only see 4.8.3 and yum is not wanting to update. The test build is also showing 4.8.3.

Thank you!

From: gluster-users-boun...@gluster.org On Behalf Of Matt Waymack
Sent: Sunday, December 16, 2018 1:55 PM
To: Diego Remolina
Cc: gluster-users@gluster.org List
Subject: Re: [Gluster-users] Unable to create new files or folders using samba and vfs_glusterfs

Hi all, sorry for the delayed response.

I can test this out and will report back. It may be as late as Tuesday before I can test the build.

Thank you!

On Dec 15, 2018 7:46 AM, Diego Remolina <dijur...@gmail.com> wrote:

Matt,

Can you test the updated samba packages that the CentOS team has built for FasTrack?

A NOTE has been added to this issue.
--
(0033351) pgreco (developer) - 2018-12-15 13:43
https://bugs.centos.org/view.php?id=15586#c33351
--
@dijur...@gmail.com
Here's the link for the test build:
https://buildlogs.centos.org/c7-fasttrack.x86_64/samba/20181214164659/4.8.3-4.el7.0.1.x86_64/
Please let us know how it goes. Thanks for testing!
Pablo.
--

Diego

On Fri, Dec 14, 2018 at 12:52 AM Anoop C S <anoo...@cryptolab.net> wrote:
>
> On Thu, 2018-12-13 at 15:31 +, Matt Waymack wrote:
> > Hi all,
> >
> > I'm having an issue on Windows clients accessing shares via smb when using vfs_glusterfs. They are unable to create any files or folders at the root of the share and get the error "The file is too large for the destination file system." When I change from vfs_glusterfs to just using a filesystem path to the same location, it works fine (except for the performance hit). All my searches have led to bug 1619108, and that seems to be the symptom, but there doesn't appear to be any clear resolution.
>
> You figured out the right bug and following is the upstream Samba bug:
>
> https://bugzilla.samba.org/show_bug.cgi?id=13585
>
> Unfortunately the fix is only available with v4.8.6 and higher. If required I can patch it up and provide a build.
>
> > I'm on the latest version of samba available on CentOS 7 (4.8.3) and I'm on the latest available glusterfs 4.1 (4.1.6). Is there something simple I'm missing to get this going?
> >
> > Thank you!
Re: [Gluster-users] Unable to create new files or folders using samba and vfs_glusterfs
Hi all, sorry for the delayed response.

I can test this out and will report back. It may be as late as Tuesday before I can test the build.

Thank you!

On Dec 15, 2018 7:46 AM, Diego Remolina wrote:

Matt,

Can you test the updated samba packages that the CentOS team has built for FasTrack?

A NOTE has been added to this issue.
--
(0033351) pgreco (developer) - 2018-12-15 13:43
https://bugs.centos.org/view.php?id=15586#c33351
--
@dijur...@gmail.com
Here's the link for the test build:
https://buildlogs.centos.org/c7-fasttrack.x86_64/samba/20181214164659/4.8.3-4.el7.0.1.x86_64/
Please let us know how it goes. Thanks for testing!
Pablo.
--

Diego

On Fri, Dec 14, 2018 at 12:52 AM Anoop C S <anoo...@cryptolab.net> wrote:
>
> On Thu, 2018-12-13 at 15:31 +, Matt Waymack wrote:
> > Hi all,
> >
> > I'm having an issue on Windows clients accessing shares via smb when using vfs_glusterfs. They are unable to create any files or folders at the root of the share and get the error "The file is too large for the destination file system." When I change from vfs_glusterfs to just using a filesystem path to the same location, it works fine (except for the performance hit). All my searches have led to bug 1619108, and that seems to be the symptom, but there doesn't appear to be any clear resolution.
>
> You figured out the right bug and following is the upstream Samba bug:
>
> https://bugzilla.samba.org/show_bug.cgi?id=13585
>
> Unfortunately the fix is only available with v4.8.6 and higher. If required I can patch it up and provide a build.
>
> > I'm on the latest version of samba available on CentOS 7 (4.8.3) and I'm on the latest available glusterfs 4.1 (4.1.6). Is there something simple I'm missing to get this going?
> >
> > Thank you!
[Gluster-users] Unable to create new files or folders using samba and vfs_glusterfs
Hi all,

I'm having an issue on Windows clients accessing shares via smb when using vfs_glusterfs. They are unable to create any files or folders at the root of the share and get the error "The file is too large for the destination file system." When I change from vfs_glusterfs to just using a filesystem path to the same location, it works fine (except for the performance hit). All my searches have led to bug 1619108, and that seems to be the symptom, but there doesn't appear to be any clear resolution.

I'm on the latest version of samba available on CentOS 7 (4.8.3) and I'm on the latest available glusterfs 4.1 (4.1.6). Is there something simple I'm missing to get this going?

Thank you!
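For context, a vfs_glusterfs share definition typically looks roughly like the sketch below; the share name, volume name, and log path are placeholders, and `kernel share modes = no` is the setting commonly required with this VFS module:

[gv1share]
    path = /
    read only = no
    vfs objects = glusterfs
    glusterfs:volume = gv1
    glusterfs:logfile = /var/log/samba/glusterfs-gv1.%M.log
    glusterfs:loglevel = 7
    kernel share modes = no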
Re: [Gluster-users] How to make sure self-heal backlog is empty ?
Mine also has a list of files that seemingly never heal. They are usually isolated on my arbiter bricks, but not always. I would also like to find an answer for this behavior.

-----Original Message-----
From: gluster-users-boun...@gluster.org [mailto:gluster-users-boun...@gluster.org] On Behalf Of Hoggins!
Sent: Tuesday, December 19, 2017 12:26 PM
To: gluster-users
Subject: [Gluster-users] How to make sure self-heal backlog is empty ?

Hello list,

I'm not sure what to look for here, and not sure if what I'm seeing is the actual "backlog" (which we need to make sure is empty while performing a rolling upgrade, before going to the next node). How can I tell, while reading this, if it's okay to reboot / upgrade my next node in the pool?

Here is what I do for checking:

for i in `gluster volume list`; do gluster volume heal $i info; done

And here is what I get:

Brick ngluster-1.network.hoggins.fr:/export/brick/clem
Status: Connected
Number of entries: 0

Brick ngluster-2.network.hoggins.fr:/export/brick/clem
Status: Connected
Number of entries: 0

Brick ngluster-3.network.hoggins.fr:/export/brick/clem
Status: Connected
Number of entries: 0

Brick ngluster-1.network.hoggins.fr:/export/brick/mailer
Status: Connected
Number of entries: 0

Brick ngluster-2.network.hoggins.fr:/export/brick/mailer
Status: Connected
Number of entries: 0

Brick ngluster-3.network.hoggins.fr:/export/brick/mailer
Status: Connected
Number of entries: 1

Brick ngluster-1.network.hoggins.fr:/export/brick/rom
Status: Connected
Number of entries: 0

Brick ngluster-2.network.hoggins.fr:/export/brick/rom
Status: Connected
Number of entries: 0

Brick ngluster-3.network.hoggins.fr:/export/brick/rom
Status: Connected
Number of entries: 1

Brick ngluster-1.network.hoggins.fr:/export/brick/thedude
Status: Connected
Number of entries: 0

Brick ngluster-2.network.hoggins.fr:/export/brick/thedude
Status: Connected
Number of entries: 1

Brick ngluster-3.network.hoggins.fr:/export/brick/thedude
Status: Connected
Number of entries: 0

Brick ngluster-1.network.hoggins.fr:/export/brick/web
Status: Connected
Number of entries: 0

Brick ngluster-2.network.hoggins.fr:/export/brick/web
Status: Connected
Number of entries: 3

Brick ngluster-3.network.hoggins.fr:/export/brick/web
Status: Connected
Number of entries: 11

Should I be worried about this never ending?

Thank you,

Hoggins!
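A compact variant of Hoggins' check that only surfaces the bricks with a non-zero backlog might look like this (a sketch; the grep context flag pulls in the Brick and Status lines above each match):

for i in $(gluster volume list); do
  echo "== $i =="
  gluster volume heal "$i" info | grep -B2 -E 'Number of entries: [1-9]'
done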
Re: [Gluster-users] Production Volume will not start
Hi, thank you for the reply. Ultimately the volume did eventually start, about 1.5 hours after the volume start command. Could it have something to do with the number of files on the volume?

From: Atin Mukherjee [mailto:amukh...@redhat.com]
Sent: Monday, December 18, 2017 1:26 AM
To: Matt Waymack
Cc: gluster-users
Subject: Re: [Gluster-users] Production Volume will not start

On Sat, Dec 16, 2017 at 12:45 AM, Matt Waymack <mwaym...@nsgdv.com> wrote:

Hi all,

I have an issue where our volume will not start from any node. When attempting to start the volume it will eventually return:

Error: Request timed out

For some time after that, the volume is locked and we either have to wait or restart Gluster services. In the glusterd.log, it shows the following:

[2017-12-15 18:00:12.423478] I [glusterd-utils.c:5926:glusterd_brick_start] 0-management: starting a fresh brick process for brick /exp/b1/gv0
[2017-12-15 18:03:12.673885] I [glusterd-locks.c:729:gd_mgmt_v3_unlock_timer_cbk] 0-management: In gd_mgmt_v3_unlock_timer_cbk
[2017-12-15 18:06:34.304868] I [MSGID: 106499] [glusterd-handler.c:4303:__glusterd_handle_status_volume] 0-management: Received status volume req for volume gv0
[2017-12-15 18:06:34.306603] E [MSGID: 106301] [glusterd-syncop.c:1353:gd_stage_op_phase] 0-management: Staging of operation 'Volume Status' failed on localhost : Volume gv0 is not started
[2017-12-15 18:11:39.412700] I [glusterd-utils.c:5926:glusterd_brick_start] 0-management: starting a fresh brick process for brick /exp/b2/gv0
[2017-12-15 18:11:42.405966] I [MSGID: 106143] [glusterd-pmap.c:280:pmap_registry_bind] 0-pmap: adding brick /exp/b2/gv0 on port 49153
[2017-12-15 18:11:42.406415] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2017-12-15 18:11:42.406669] I [glusterd-utils.c:5926:glusterd_brick_start] 0-management: starting a fresh brick process for brick /exp/b3/gv0
[2017-12-15 18:14:39.737192] I [glusterd-locks.c:729:gd_mgmt_v3_unlock_timer_cbk] 0-management: In gd_mgmt_v3_unlock_timer_cbk
[2017-12-15 18:35:20.856849] I [MSGID: 106143] [glusterd-pmap.c:280:pmap_registry_bind] 0-pmap: adding brick /exp/b1/gv0 on port 49152
[2017-12-15 18:35:20.857508] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2017-12-15 18:35:20.858277] I [glusterd-utils.c:5926:glusterd_brick_start] 0-management: starting a fresh brick process for brick /exp/b4/gv0
[2017-12-15 18:46:07.953995] I [MSGID: 106143] [glusterd-pmap.c:280:pmap_registry_bind] 0-pmap: adding brick /exp/b3/gv0 on port 49154
[2017-12-15 18:46:07.954432] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2017-12-15 18:46:07.971355] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-snapd: setting frame-timeout to 600
[2017-12-15 18:46:07.989392] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-nfs: setting frame-timeout to 600
[2017-12-15 18:46:07.989543] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped
[2017-12-15 18:46:07.989562] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: nfs service is stopped
[2017-12-15 18:46:07.989575] I [MSGID: 106600] [glusterd-nfs-svc.c:82:glusterd_nfssvc_manager] 0-management: nfs/server.so xlator is not installed
[2017-12-15 18:46:07.989601] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-glustershd: setting frame-timeout to 600
[2017-12-15 18:46:08.003011] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: glustershd already stopped
[2017-12-15 18:46:08.003039] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: glustershd service is stopped
[2017-12-15 18:46:08.003079] I [MSGID: 106567] [glusterd-svc-mgmt.c:197:glusterd_svc_start] 0-management: Starting glustershd service
[2017-12-15 18:46:09.005173] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-quotad: setting frame-timeout to 600
[2017-12-15 18:46:09.005569] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-bitd: setting frame-timeout to 600
[2017-12-15 18:46:09.005673] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped
[2017-12-15 18:46:09.005689] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: bitd service is stopped
[2017-12-15 18:46:09.005712] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-scrub: setting frame-timeout to 600
[2017-12-15 18:46:09.005892] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already stopped
[2017-12-15 18:46:09.005912] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: scrub service is stopped
[2017-12-15 18:46:09.026559] I [socket.c:3672:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2017-12-15 18:46:09.026568] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x2, Program: GlusterD svc cli, ProgVers: 2, Proc: 27) to rpc-transport (socket.management)
[2017-12-15 18:46:09.026582] E [MSGID: 106430] [glusterd-utils.c:568:glusterd_submit_reply] 0-glusterd: Reply submission failed
[2017-12-15 18:56:17.962251] E [rpc-clnt.c:185:call_bail] 0-management: bailing out frame type(glusterd mgmt v3) op(--(4)) xid = 0x14 sent = 2017-12-15 18:46:09.005976. timeout = 600 for 10.17.100.208:24007
[2017-12-15 18:56:17.962324] E [MSGID: 106116
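While a slow `volume start` like this is in flight, a hypothetical way to watch the brick processes come online one by one is something like:

# repeat every few seconds; each brick should eventually show Online "Y"
watch -n 5 'gluster volume status gv0; ps -C glusterfsd -o pid,etime,args'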
[Gluster-users] Production Volume will not start
Hi all,

I have an issue where our volume will not start from any node. When attempting to start the volume it will eventually return:

Error: Request timed out

For some time after that, the volume is locked and we either have to wait or restart Gluster services. In the glusterd.log, it shows the following:

[2017-12-15 18:00:12.423478] I [glusterd-utils.c:5926:glusterd_brick_start] 0-management: starting a fresh brick process for brick /exp/b1/gv0
[2017-12-15 18:03:12.673885] I [glusterd-locks.c:729:gd_mgmt_v3_unlock_timer_cbk] 0-management: In gd_mgmt_v3_unlock_timer_cbk
[2017-12-15 18:06:34.304868] I [MSGID: 106499] [glusterd-handler.c:4303:__glusterd_handle_status_volume] 0-management: Received status volume req for volume gv0
[2017-12-15 18:06:34.306603] E [MSGID: 106301] [glusterd-syncop.c:1353:gd_stage_op_phase] 0-management: Staging of operation 'Volume Status' failed on localhost : Volume gv0 is not started
[2017-12-15 18:11:39.412700] I [glusterd-utils.c:5926:glusterd_brick_start] 0-management: starting a fresh brick process for brick /exp/b2/gv0
[2017-12-15 18:11:42.405966] I [MSGID: 106143] [glusterd-pmap.c:280:pmap_registry_bind] 0-pmap: adding brick /exp/b2/gv0 on port 49153
[2017-12-15 18:11:42.406415] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2017-12-15 18:11:42.406669] I [glusterd-utils.c:5926:glusterd_brick_start] 0-management: starting a fresh brick process for brick /exp/b3/gv0
[2017-12-15 18:14:39.737192] I [glusterd-locks.c:729:gd_mgmt_v3_unlock_timer_cbk] 0-management: In gd_mgmt_v3_unlock_timer_cbk
[2017-12-15 18:35:20.856849] I [MSGID: 106143] [glusterd-pmap.c:280:pmap_registry_bind] 0-pmap: adding brick /exp/b1/gv0 on port 49152
[2017-12-15 18:35:20.857508] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2017-12-15 18:35:20.858277] I [glusterd-utils.c:5926:glusterd_brick_start] 0-management: starting a fresh brick process for brick /exp/b4/gv0
[2017-12-15 18:46:07.953995] I [MSGID: 106143] [glusterd-pmap.c:280:pmap_registry_bind] 0-pmap: adding brick /exp/b3/gv0 on port 49154
[2017-12-15 18:46:07.954432] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-management: setting frame-timeout to 600
[2017-12-15 18:46:07.971355] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-snapd: setting frame-timeout to 600
[2017-12-15 18:46:07.989392] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-nfs: setting frame-timeout to 600
[2017-12-15 18:46:07.989543] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: nfs already stopped
[2017-12-15 18:46:07.989562] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: nfs service is stopped
[2017-12-15 18:46:07.989575] I [MSGID: 106600] [glusterd-nfs-svc.c:82:glusterd_nfssvc_manager] 0-management: nfs/server.so xlator is not installed
[2017-12-15 18:46:07.989601] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-glustershd: setting frame-timeout to 600
[2017-12-15 18:46:08.003011] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: glustershd already stopped
[2017-12-15 18:46:08.003039] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: glustershd service is stopped
[2017-12-15 18:46:08.003079] I [MSGID: 106567] [glusterd-svc-mgmt.c:197:glusterd_svc_start] 0-management: Starting glustershd service
[2017-12-15 18:46:09.005173] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-quotad: setting frame-timeout to 600
[2017-12-15 18:46:09.005569] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-bitd: setting frame-timeout to 600
[2017-12-15 18:46:09.005673] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: bitd already stopped
[2017-12-15 18:46:09.005689] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: bitd service is stopped
[2017-12-15 18:46:09.005712] I [rpc-clnt.c:1044:rpc_clnt_connection_init] 0-scrub: setting frame-timeout to 600
[2017-12-15 18:46:09.005892] I [MSGID: 106132] [glusterd-proc-mgmt.c:83:glusterd_proc_stop] 0-management: scrub already stopped
[2017-12-15 18:46:09.005912] I [MSGID: 106568] [glusterd-svc-mgmt.c:229:glusterd_svc_stop] 0-management: scrub service is stopped
[2017-12-15 18:46:09.026559] I [socket.c:3672:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2017-12-15 18:46:09.026568] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x2, Program: GlusterD svc cli, ProgVers: 2, Proc: 27) to rpc-transport (socket.management)
[2017-12-15 18:46:09.026582] E [MSGID: 106430] [glusterd-utils.c:568:glusterd_submit_reply] 0-glusterd: Reply submission failed
[2017-12-15 18:56:17.962251] E [rpc-clnt.c:185:call_bail] 0-management: bailing out frame type(glusterd mgmt v3) op(--(4)) xid = 0x14 sent = 2017-12-15 18:46:09.005976. timeout = 600 for 10.17.100.208:24007
[2017-12-15 18:56:17.962324] E [MSGID: 106116
Re: [Gluster-users] gfid entries in volume heal info that do not heal
In my case I was able to delete the hard links in the .glusterfs folders of the bricks and it seems to have done the trick, thanks!

From: Karthik Subrahmanya [mailto:ksubr...@redhat.com]
Sent: Monday, October 23, 2017 1:52 AM
To: Jim Kinney; Matt Waymack
Cc: gluster-users
Subject: Re: [Gluster-users] gfid entries in volume heal info that do not heal

Hi Jim & Matt,

Can you also check the link count in the stat output of those hardlink entries in the .glusterfs folder on the bricks? If the link count is 1 on all the bricks for those entries, then they are orphaned entries and you can delete those hardlinks. To be on the safer side, have a backup before deleting any of the entries.

Regards,
Karthik

On Fri, Oct 20, 2017 at 3:18 AM, Jim Kinney <jim.kin...@gmail.com> wrote:

I've been following this particular thread as I have a similar issue (RAID6 array failed out with 3 dead drives at once while a 12 TB load was being copied into one mounted space - what a mess). I have >700K GFID entries that have no path data. Example:

getfattr -d -e hex -m . .glusterfs/00/00/a5ef-5af7-401b-84b5-ff2a51c10421
# file: .glusterfs/00/00/a5ef-5af7-401b-84b5-ff2a51c10421
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.bit-rot.version=0x020059b1b316000270e7
trusted.gfid=0xa5ef5af7401b84b5ff2a51c10421

[root@bmidata1 brick]# getfattr -d -n trusted.glusterfs.pathinfo -e hex -m . .glusterfs/00/00/a5ef-5af7-401b-84b5-ff2a51c10421
.glusterfs/00/00/a5ef-5af7-401b-84b5-ff2a51c10421: trusted.glusterfs.pathinfo: No such attribute

I had to totally rebuild the dead RAID array and did a copy from the live one before activating gluster on the rebuilt system. I accidentally copied over the .glusterfs folder from the working side (replica 2 only for now - adding an arbiter node as soon as I can get this one cleaned up).

I've run the methods from "http://docs.gluster.org/en/latest/Troubleshooting/gfid-to-path/" with no results using random GFIDs. A full systemic run using the script from method 3 crashes with a "too many nested links" error (or something similar). When I run gluster volume heal volname info, I get 700K+ GFIDs. This is gluster 3.8.4 on CentOS 7.3.

Should I just remove the contents of the .glusterfs folder on both, restart gluster, and run an ls/stat on every file? When I run a heal, it no longer has a decreasing number of files to heal, so that's an improvement over the last 2-3 weeks :-)

On Tue, 2017-10-17 at 14:34 +, Matt Waymack wrote:

Attached is the heal log for the volume as well as the shd log.

Run these commands on all the bricks of the replica pair to get the attrs set on the backend.

[root@tpc-cent-glus1-081017 ~]# getfattr -d -e hex -m . /exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
getfattr: Removing leading '/' from absolute path names
# file: exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x
trusted.afr.gv0-client-2=0x0001
trusted.gfid=0x108694dbc0394b7cbd3dad6a15d811a2
trusted.gfid2path.9a2f5ada22eb9c45=0x38633262623330322d323466332d346463622d393630322d3839356136396461363131662f435f564f4c2d623030312d693637342d63642d63772e6d6435

[root@tpc-cent-glus2-081017 ~]# getfattr -d -e hex -m . /exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
getfattr: Removing leading '/' from absolute path names
# file: exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x
trusted.afr.gv0-client-2=0x0001
trusted.gfid=0x108694dbc0394b7cbd3dad6a15d811a2
trusted.gfid2path.9a2f5ada22eb9c45=0x38633262623330322d323466332d346463622d393630322d3839356136396461363131662f435f564f4c2d623030312d693637342d63642d63772e6d6435

[root@tpc-arbiter1-100617 ~]# getfattr -d -e hex -m . /exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
getfattr: /exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2: No such file or directory

[root@tpc-cent-glus1-081017 ~]# getfattr -d -e hex -m . /exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3
getfattr: Removing leading '/' from absolute path names
# file: exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x
trusted.afr.gv0-client-11=0x0001
trusted.gfid=0xe0c56bf78bfe46cabde1e46b92d33df3
trusted.gfid2path.be3ba24c3ef95ff2=0x63323366353834652d353566652d343033382d393131622d386637306365633461613666
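Karthik's link-count check can be scripted along these lines (a sketch using one gfid path from this thread; run the same on every brick of the replica, and verify before deleting anything):

# %h is the hard-link count; 1 on all bricks suggests an orphaned entry
stat -c '%h %n' /exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2

# or sweep a brick for candidate orphans (regular files with no second link);
# this may also match internal housekeeping files, so review the output first
find /exp/b1/gv0/.glusterfs -type f -links 1 -print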
Re: [Gluster-users] gfid entries in volume heal info that do not heal
It looks like these entries don't have a corresponding file path; they exist only in .glusterfs and appear to be orphaned:

[root@tpc-cent-glus2-081017 ~]# find /exp/b4/gv0 -samefile /exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3
/exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3

[root@tpc-cent-glus2-081017 ~]# find /exp/b4/gv0 -samefile /exp/b4/gv0/.glusterfs/6f/0a/6f0a0549-8669-46de-8823-d6677fdca8e3
/exp/b4/gv0/.glusterfs/6f/0a/6f0a0549-8669-46de-8823-d6677fdca8e3

[root@tpc-cent-glus1-081017 ~]# find /exp/b1/gv0 -samefile /exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
/exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2

[root@tpc-cent-glus1-081017 ~]# find /exp/b4/gv0 -samefile /exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3
/exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3

Occasionally I would get these gfid entries in the heal info output but would just run tree against the volume; after that the files would trigger self-heal and the gfid entries would be updated with their volume paths. That does not seem to be the case here, so I believe all of these entries are orphaned.

From: Karthik Subrahmanya [mailto:ksubr...@redhat.com]
Sent: Wednesday, October 18, 2017 4:34 AM
To: Matt Waymack
Cc: gluster-users
Subject: Re: [Gluster-users] gfid entries in volume heal info that do not heal

Hey Matt,

From the xattr output, it looks like the files are not present on the arbiter brick and need healing, but the parent does not have the pending markers set for those entries. The workaround is to do a lookup from the mount on each file that needs heal; that creates the entry on the arbiter brick, and the volume heal can then do the healing. Follow these steps to resolve the issue (first try this on one file and check whether it gets healed; if it does, repeat for the remaining files):

1. Get the file path for the gfids you got from the heal info output:
   find <brick-path> -samefile <brick-path>/.glusterfs/<first 2 characters of gfid>/<next 2 characters of gfid>/<full gfid>
2. Do an ls/stat on the file from the mount.
3. Run volume heal.
4. Check the heal info output to see whether the file got healed.

If one file gets healed, then do steps 1 & 2 for the rest of the files, and steps 3 & 4 once at the end. Let me know if that resolves the issue.

Thanks & Regards,
Karthik

On Tue, Oct 17, 2017 at 8:04 PM, Matt Waymack <mwaym...@nsgdv.com> wrote:
> Attached is the heal log for the volume as well as the shd log.
>
> >> Run these commands on all the bricks of the replica pair to get the attrs set on the backend.
>
> [root@tpc-cent-glus1-081017 ~]# getfattr -d -e hex -m . /exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
> getfattr: Removing leading '/' from absolute path names
> # file: exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.afr.dirty=0x
> trusted.afr.gv0-client-2=0x0001
> trusted.gfid=0x108694dbc0394b7cbd3dad6a15d811a2
> trusted.gfid2path.9a2f5ada22eb9c45=0x38633262623330322d323466332d346463622d393630322d3839356136396461363131662f435f564f4c2d623030312d693637342d63642d63772e6d6435
>
> [root@tpc-cent-glus2-081017 ~]# getfattr -d -e hex -m . /exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
> getfattr: Removing leading '/' from absolute path names
> # file: exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.afr.dirty=0x
> trusted.afr.gv0-client-2=0x0001
> trusted.gfid=0x108694dbc0394b7cbd3dad6a15d811a2
> trusted.gfid2path.9a2f5ada22eb9c45=0x38633262623330322d323466332d346463622d393630322d3839356136396461363131662f435f564f4c2d623030312d693637342d63642d63772e6d6435
>
> [root@tpc-arbiter1-100617 ~]# getfattr -d -e hex -m . /exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
> getfattr: /exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2: No such file or directory
>
> [root@tpc-cent-glus1-081017 ~]# getfattr -d -e hex -m . /exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3
> getfattr: Removing leading '/' from absolute path names
> # file: exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.afr.dirty=0x
> trusted.afr.gv0-client-11=0x0001
> trusted.gfid=0xe0c56bf78bfe46cabde1e46b92d33df3
> trusted.gfid2path.be3ba24c3ef95ff2=0x63323366353834652d353566652d343033382d393131622d3866373063656334616136662f435f564f4c2d623030332d69313331342d63642d636d2d63722e6d6435
>
> [root@tpc-cent-glus2-081017 ~]# getfattr -d -e hex -m . /exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3
> getfattr: Removing leading '/' from absolute path names
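For what it's worth, Karthik's four steps can be strung together for a single gfid. A rough sketch, using the volume name and a brick path from this thread; the /mnt/gv0 mount point is made up, so substitute wherever gv0 is actually FUSE-mounted:

#!/bin/bash
# sketch: trigger heal for one gfid listed by "gluster volume heal gv0 info"
GFID="108694db-c039-4b7c-bd3d-ad6a15d811a2"
BRICK="/exp/b1/gv0"    # brick that holds the entry
MOUNT="/mnt/gv0"       # hypothetical FUSE mount point, substitute your own
BACKEND="$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"

# step 1: resolve the gfid to a real path via its hard link,
# skipping the .glusterfs entry itself
REALPATH=$(find "$BRICK" -samefile "$BACKEND" | grep -v '/\.glusterfs/' | head -n 1)
if [ -z "$REALPATH" ]; then
    echo "no path outside .glusterfs, entry looks orphaned" >&2
    exit 1
fi

# step 2: lookup from the mount so the entry gets created on the arbiter
stat "$MOUNT/${REALPATH#$BRICK/}" > /dev/null

# steps 3 and 4: kick off a heal, then re-check
gluster volume heal gv0
gluster volume heal gv0 info

If find never returns anything outside .glusterfs, the entry has no parent link at all; a one-pass check along those lines is "find /exp/b4/gv0/.glusterfs -type f -links 1", since a healthy regular file on a brick should always have at least two hard links (its real path plus its .glusterfs entry). Just ignore hits outside the two-character gfid directories, as housekeeping files such as health_check and the indices live under .glusterfs too.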
Re: [Gluster-users] gfid entries in volume heal info that do not heal
Attached is the heal log for the volume as well as the shd log.

>> Run these commands on all the bricks of the replica pair to get the attrs set on the backend.

[root@tpc-cent-glus1-081017 ~]# getfattr -d -e hex -m . /exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
getfattr: Removing leading '/' from absolute path names
# file: exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x
trusted.afr.gv0-client-2=0x0001
trusted.gfid=0x108694dbc0394b7cbd3dad6a15d811a2
trusted.gfid2path.9a2f5ada22eb9c45=0x38633262623330322d323466332d346463622d393630322d3839356136396461363131662f435f564f4c2d623030312d693637342d63642d63772e6d6435

[root@tpc-cent-glus2-081017 ~]# getfattr -d -e hex -m . /exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
getfattr: Removing leading '/' from absolute path names
# file: exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x
trusted.afr.gv0-client-2=0x0001
trusted.gfid=0x108694dbc0394b7cbd3dad6a15d811a2
trusted.gfid2path.9a2f5ada22eb9c45=0x38633262623330322d323466332d346463622d393630322d3839356136396461363131662f435f564f4c2d623030312d693637342d63642d63772e6d6435

[root@tpc-arbiter1-100617 ~]# getfattr -d -e hex -m . /exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
getfattr: /exp/b1/gv0/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2: No such file or directory

[root@tpc-cent-glus1-081017 ~]# getfattr -d -e hex -m . /exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3
getfattr: Removing leading '/' from absolute path names
# file: exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x
trusted.afr.gv0-client-11=0x0001
trusted.gfid=0xe0c56bf78bfe46cabde1e46b92d33df3
trusted.gfid2path.be3ba24c3ef95ff2=0x63323366353834652d353566652d343033382d393131622d3866373063656334616136662f435f564f4c2d623030332d69313331342d63642d636d2d63722e6d6435

[root@tpc-cent-glus2-081017 ~]# getfattr -d -e hex -m . /exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3
getfattr: Removing leading '/' from absolute path names
# file: exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x
trusted.afr.gv0-client-11=0x0001
trusted.gfid=0xe0c56bf78bfe46cabde1e46b92d33df3
trusted.gfid2path.be3ba24c3ef95ff2=0x63323366353834652d353566652d343033382d393131622d3866373063656334616136662f435f564f4c2d623030332d69313331342d63642d636d2d63722e6d6435

[root@tpc-arbiter1-100617 ~]# getfattr -d -e hex -m . /exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3
getfattr: /exp/b4/gv0/.glusterfs/e0/c5/e0c56bf7-8bfe-46ca-bde1-e46b92d33df3: No such file or directory

>> And the output of "gluster volume heal <volname> info split-brain"

[root@tpc-cent-glus1-081017 ~]# gluster volume heal gv0 info split-brain
Brick tpc-cent-glus1-081017:/exp/b1/gv0
Status: Connected
Number of entries in split-brain: 0

Brick tpc-cent-glus2-081017:/exp/b1/gv0
Status: Connected
Number of entries in split-brain: 0

Brick tpc-arbiter1-100617:/exp/b1/gv0
Status: Connected
Number of entries in split-brain: 0

Brick tpc-cent-glus1-081017:/exp/b2/gv0
Status: Connected
Number of entries in split-brain: 0

Brick tpc-cent-glus2-081017:/exp/b2/gv0
Status: Connected
Number of entries in split-brain: 0

Brick tpc-arbiter1-100617:/exp/b2/gv0
Status: Connected
Number of entries in split-brain: 0

Brick tpc-cent-glus1-081017:/exp/b3/gv0
Status: Connected
Number of entries in split-brain: 0

Brick tpc-cent-glus2-081017:/exp/b3/gv0
Status: Connected
Number of entries in split-brain: 0

Brick tpc-arbiter1-100617:/exp/b3/gv0
Status: Connected
Number of entries in split-brain: 0

Brick tpc-cent-glus1-081017:/exp/b4/gv0
Status: Connected
Number of entries in split-brain: 0

Brick tpc-cent-glus2-081017:/exp/b4/gv0
Status: Connected
Number of entries in split-brain: 0

Brick tpc-arbiter1-100617:/exp/b4/gv0
Status: Connected
Number of entries in split-brain: 0

-Matt

From: Karthik Subrahmanya [mailto:ksubr...@redhat.com]
Sent: Tuesday, October 17, 2017 1:26 AM
To: Matt Waymack
Cc: gluster-users
Subject: Re: [Gluster-users] gfid entries in volume heal info that do not heal

Hi Matt,

Run these commands on all the bricks of the replica pair to get the attrs set on the backend.

On the bricks of the first replica set:
getfattr -d -e hex -m . <brick-path>/.glusterfs/10/86/108694db-c039-4b7c-bd3d-ad6a15d811a2
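Since the same getfattr has to be repeated on every node, a small loop can save some typing. A sketch to run on each storage node in turn; the /exp/b?/gv0 glob matches the brick layout in this thread:

GFID=108694db-c039-4b7c-bd3d-ad6a15d811a2
for B in /exp/b?/gv0; do
    echo "== $B =="
    # 2>&1 keeps the "No such file or directory" answers in the listing too
    getfattr -d -e hex -m . "$B/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID" 2>&1
done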
Re: [Gluster-users] gfid entries in volume heal info that do not heal
OK, so here's my output of the volume info and the heal info. I have not yet tracked down the physical location of these files (any tips on finding them would be appreciated), but I'm definitely just wanting them gone. I forgot to mention earlier that the cluster is running 3.12 and was upgraded from 3.10; these files were likely stuck like this when it was on 3.10.

[root@tpc-cent-glus1-081017 ~]# gluster volume info gv0

Volume Name: gv0
Type: Distributed-Replicate
Volume ID: 8f07894d-e3ab-4a65-bda1-9d9dd46db007
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x (2 + 1) = 12
Transport-type: tcp
Bricks:
Brick1: tpc-cent-glus1-081017:/exp/b1/gv0
Brick2: tpc-cent-glus2-081017:/exp/b1/gv0
Brick3: tpc-arbiter1-100617:/exp/b1/gv0 (arbiter)
Brick4: tpc-cent-glus1-081017:/exp/b2/gv0
Brick5: tpc-cent-glus2-081017:/exp/b2/gv0
Brick6: tpc-arbiter1-100617:/exp/b2/gv0 (arbiter)
Brick7: tpc-cent-glus1-081017:/exp/b3/gv0
Brick8: tpc-cent-glus2-081017:/exp/b3/gv0
Brick9: tpc-arbiter1-100617:/exp/b3/gv0 (arbiter)
Brick10: tpc-cent-glus1-081017:/exp/b4/gv0
Brick11: tpc-cent-glus2-081017:/exp/b4/gv0
Brick12: tpc-arbiter1-100617:/exp/b4/gv0 (arbiter)
Options Reconfigured:
nfs.disable: on
transport.address-family: inet

[root@tpc-cent-glus1-081017 ~]# gluster volume heal gv0 info
Brick tpc-cent-glus1-081017:/exp/b1/gv0
Status: Connected
Number of entries: 118

Brick tpc-cent-glus2-081017:/exp/b1/gv0
Status: Connected
Number of entries: 118

Brick tpc-arbiter1-100617:/exp/b1/gv0
Status: Connected
Number of entries: 0

Brick tpc-cent-glus1-081017:/exp/b2/gv0
Status: Connected
Number of entries: 0

Brick tpc-cent-glus2-081017:/exp/b2/gv0
Status: Connected
Number of entries: 0

Brick tpc-arbiter1-100617:/exp/b2/gv0
Status: Connected
Number of entries: 0

Brick tpc-cent-glus1-081017:/exp/b3/gv0
Status: Connected
Number of entries: 0

Brick tpc-cent-glus2-081017:/exp/b3/gv0
Status: Connected
Number of entries: 0

Brick tpc-arbiter1-100617:/exp/b3/gv0
Status: Connected
Number of entries: 0

Brick tpc-cent-glus1-081017:/exp/b4/gv0
Status: Connected
Number of entries: 24

Brick tpc-cent-glus2-081017:/exp/b4/gv0
Status: Connected
Number of entries: 24

Brick tpc-arbiter1-100617:/exp/b4/gv0
Status: Connected
Number of entries: 0

Thank you for your help!

From: Karthik Subrahmanya [mailto:ksubr...@redhat.com]
Sent: Monday, October 16, 2017 10:27 AM
To: Matt Waymack
Cc: gluster-users
Subject: Re: [Gluster-users] gfid entries in volume heal info that do not heal

Hi Matt,

The files might be in split-brain. Could you please send the outputs of these?

gluster volume info <volname>
gluster volume heal <volname> info

And also the getfattr output of the files which are in the heal info output, from all the bricks of that replica pair:

getfattr -d -e hex -m . <file-path-on-brick>

Thanks & Regards,
Karthik

On 16-Oct-2017 8:16 PM, "Matt Waymack" <mwaym...@nsgdv.com> wrote:
> Hi all,
>
> I have a volume where the output of volume heal info shows several gfid entries to be healed, but they've been there for weeks and have not healed. Any normal file that shows up in the heal info does get healed as expected, but these gfid entries do not. Is there any way to remove these orphaned entries from the volume so they are no longer stuck in the heal process?
>
> Thank you!
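As for tracking the entries down: assuming the path-less ones print as <gfid:...> lines, which is how heal info shows entries it cannot resolve to a path, something like this sketch pulls out the bare gfids to feed into the find/stat steps given earlier in the thread:

gluster volume heal gv0 info | sed -n 's/^<gfid:\(.*\)>$/\1/p' | sort -u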
[Gluster-users] gfid entries in volume heal info that do not heal
Hi all,

I have a volume where the output of volume heal info shows several gfid entries to be healed, but they've been there for weeks and have not healed. Any normal file that shows up in the heal info does get healed as expected, but these gfid entries do not. Is there any way to remove these orphaned entries from the volume so they are no longer stuck in the heal process?

Thank you!

_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users
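(A closing note, not something confirmed in this thread: when the lookup-and-heal routine does not apply because an entry has no real path at all, the remaining option is cleaning the stale entry off the bricks by hand. A sketch only; it bypasses gluster entirely, so use it strictly for entries verified orphaned, meaning a link count of 1 and nothing returned by find -samefile outside .glusterfs, and repeat it on every brick that has the entry:

BRICK=/exp/b4/gv0
GFID=e0c56bf7-8bfe-46ca-bde1-e46b92d33df3
ENTRY="$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"

# refuse to touch anything that still has a second hard link
[ "$(stat -c %h "$ENTRY")" -eq 1 ] && rm -i "$ENTRY"

A follow-up "gluster volume heal gv0" and another look at the heal info output should confirm whether the entry cleared.)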