Re: [Gluster-users] Gluster Heal Issue

2020-02-04 Thread Karthik Subrahmanya
Hi Chris,

By looking at the data provided (I hope the other entry is also a file and
not the parent of the file for which the stat & getfattr outputs are
provided), it seems like the parent(s) of these entries are missing the
entry-pending markers on the good bricks, which are necessary to recreate
these files on the bad node. Can you try the following steps and let us
know whether you have any luck with this?

- Find the actual path of each file on one of the bricks where it exists,
using the below command (a worked example follows this list):
find <brick-path> -samefile <brick-path>/.glusterfs/<first 2 chars of gfid>/<next 2 chars of gfid>/<full gfid>
- Run a lookup on the files from a client mount point
- Run gluster volume heal <volname>
- Check the heal info to see whether these files get healed or not
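
For instance, for the gfid from your mails, the sequence might look
roughly like this (the client mount point /mnt/ssd_storage is only a
placeholder; substitute your actual mount and the path returned by find):

# 1. Resolve the gfid to its real path on a brick that has the file
[root@node01:~] # find /gluster_bricks/ssd_storage/ssd_storage \
      -samefile /gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6 \
      -not -path '*/.glusterfs/*'
# 2. Trigger a lookup on that path through a client mount
[root@node01:~] # stat /mnt/ssd_storage/<path-from-step-1>
# 3. Kick off the heal and re-check the pending entries
[root@node01:~] # gluster volume heal ssd_storage
[root@node01:~] # gluster volume heal ssd_storage info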

Regards,
Karthik

On Sat, Feb 1, 2020 at 2:25 PM Christian Reiss 
wrote:

> Hey folks,
>
> in our production setup with 3 nodes (HCI) we took one host down
> (maintenance, stop gluster, poweroff via ssh/ovirt engine). Once it was
> back up the gluster had 2k healing entries that went down to 2 in a
> matter of 10 minutes.
>
> Those two give me a headache:
>
> [root@node03:~] # gluster vol heal ssd_storage info
> Brick node01:/gluster_bricks/ssd_storage/ssd_storage
> <gfid:a121e4fb-0984-4e41-94d7-8f0c4f87f4b6>
> <gfid:...>
> Status: Connected
> Number of entries: 2
>
> Brick node02:/gluster_bricks/ssd_storage/ssd_storage
> Status: Connected
> Number of entries: 0
>
> Brick node03:/gluster_bricks/ssd_storage/ssd_storage
> <gfid:a121e4fb-0984-4e41-94d7-8f0c4f87f4b6>
> <gfid:...>
> Status: Connected
> Number of entries: 2
>
> No paths, only gfid. We took down node2, so it does not have the file:
>
> [root@node01:~] # md5sum
>
> /gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6
> 75c4941683b7eabc223fc9d5f022a77c
>
> /gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6
>
> [root@node02:~] # md5sum
>
> /gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6
> md5sum:
> /gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6:
>
> No such file or directory
>
> [root@node03:~] # md5sum
>
> /gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6
> 75c4941683b7eabc223fc9d5f022a77c
>
> /gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6
>
> The other two files are md5-identical.
>
> These flags are identical, too:
>
> [root@node01:~] # getfattr -d -m . -e hex
>
> /gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6
> getfattr: Removing leading '/' from absolute path names
> # file:
>
> gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6
>
> security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
> trusted.afr.dirty=0x000000000000000000000000
> trusted.afr.ssd_storage-client-1=0x0000004f0000000100000000
> trusted.gfid=0xa121e4fb09844e4194d78f0c4f87f4b6
>
> trusted.gfid2path.d4cf876a215b173f=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f38366461303238392d663734662d343230302d393238342d3637386537626437363139352e31323030
>
> trusted.glusterfs.mdata=0x01000000005e349b1e000000001139aa2a000000005e349b1e000000001139aa2a000000005e34994900000000304a5eb2
>
> [root@node03:~] # getfattr -d -m . -e hex
>
> /gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6
> getfattr: Removing leading '/' from absolute path names
> # file:
>
> gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6
>
> security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
> trusted.afr.dirty=0x000000000000000000000000
> trusted.afr.ssd_storage-client-1=0x0000004f0000000100000000
> trusted.gfid=0xa121e4fb09844e4194d78f0c4f87f4b6
>
> trusted.gfid2path.d4cf876a215b173f=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f38366461303238392d663734662d343230302d393238342d3637386537626437363139352e31323030
>
> trusted.glusterfs.mdata=0x01000000005e349b1e000000001139aa2a000000005e349b1e000000001139aa2a000000005e34994900000000304a5eb2
>
>
> The only thing I can see is the different change times, really:
>
> [root@node01:~] # stat
>
> /gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6
>File:
>
> ‘/gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6’
>Size: 67108864   Blocks: 54576  IO Block: 4096   regular file
> Device: fd09h/64777d  Inode: 16152829909  Links: 2
> Access: (0660/-rw-rw----)  Uid: (0/root)   Gid: (0/root)
> Context: system_u:object_r:glusterd_brick_t:s0
> Access: 2020-01-31 22:16:57.812620635 +0100
> Modify: 2020-02-01 07:19:24.183045141 +0100
> Change: 2020-02-01 07:19:24.186045203 +0100
>   Birth: -
>
> [root@node03:~] # stat
>
> /gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6
>File:
>
> ‘/gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6’
>Size: 67108864   Blocks: 54576  IO Block: 4096   regular file
> Device: fd09h/64777d  Inode: 16154259424  Links: 2
> Access: (0660/-rw-rw----)  Uid: (0/root)   Gid: (0/root)
> Context: system_u:object_r:glusterd_brick_t:s0
> Access: 2020-01-31 22:16:57.811800217 +0100
> Modify: 2020-02-01 07:19:24.180939487 +0100
> Change: 2020-02-01 07:19:24.184939586 +0100
>   Birth: -

[Gluster-users] Gluster Heal Issue

2020-02-01 Thread Christian Reiss

Hey folks,

in our production setup with 3 nodes (HCI) we took one host down 
(maintenance, stop gluster, poweroff via ssh/ovirt engine). Once it was 
back up the gluster had 2k healing entries that went down to 2 in a 
matter of 10 minutes.


Those two give me a headache:

[root@node03:~] # gluster vol heal ssd_storage info
Brick node01:/gluster_bricks/ssd_storage/ssd_storage
<gfid:a121e4fb-0984-4e41-94d7-8f0c4f87f4b6>
<gfid:...>
Status: Connected
Number of entries: 2

Brick node02:/gluster_bricks/ssd_storage/ssd_storage
Status: Connected
Number of entries: 0

Brick node03:/gluster_bricks/ssd_storage/ssd_storage
<gfid:a121e4fb-0984-4e41-94d7-8f0c4f87f4b6>
<gfid:...>
Status: Connected
Number of entries: 2

No paths, only gfid. We took down node2, so it does not have the file:

[root@node01:~] # md5sum 
/gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6
75c4941683b7eabc223fc9d5f022a77c 
/gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6


[root@node02:~] # md5sum 
/gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6
md5sum: 
/gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6: 
No such file or directory


[root@node03:~] # md5sum 
/gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6
75c4941683b7eabc223fc9d5f022a77c 
/gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6


The other two files are md5-identical.

These flags are identical, too:

[root@node01:~] # getfattr -d -m . -e hex 
/gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6

getfattr: Removing leading '/' from absolute path names
# file: 
gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6

security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.ssd_storage-client-1=0x0000004f0000000100000000
trusted.gfid=0xa121e4fb09844e4194d78f0c4f87f4b6
trusted.gfid2path.d4cf876a215b173f=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f38366461303238392d663734662d343230302d393238342d3637386537626437363139352e31323030
trusted.glusterfs.mdata=0x01000000005e349b1e000000001139aa2a000000005e349b1e000000001139aa2a000000005e34994900000000304a5eb2

[root@node03:~] # getfattr -d -m . -e hex 
/gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6

getfattr: Removing leading '/' from absolute path names
# file: 
gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6

security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.ssd_storage-client-1=0x0000004f0000000100000000
trusted.gfid=0xa121e4fb09844e4194d78f0c4f87f4b6
trusted.gfid2path.d4cf876a215b173f=0x62653331383633382d653861302d346336642d393737642d3761393337616138343830362f38366461303238392d663734662d343230302d393238342d3637386537626437363139352e31323030
trusted.glusterfs.mdata=0x01000000005e349b1e000000001139aa2a000000005e349b1e000000001139aa2a000000005e34994900000000304a5eb2
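
For reading these values: trusted.afr.<volname>-client-<N> packs three 
big-endian 32-bit counters (data / metadata / entry pending operations), 
and trusted.gfid2path is hex-encoded ASCII of <parent-gfid>/<basename>. 
Assuming the zero-padded reading of the values above, a quick shell 
unpack looks like this:

# Split the afr pending counters; client-1 should map to the node02
# brick, the one that was down:
[root@node01:~] # v=0000004f0000000100000000
[root@node01:~] # echo "data=0x${v:0:8} metadata=0x${v:8:8} entry=0x${v:16:8}"
data=0x0000004f metadata=0x00000001 entry=0x00000000

# Decode the gfid2path payload back to <parent-gfid>/<basename>:
[root@node01:~] # echo 62653331383633382d653861302d346336642d393737642d3761393337616138343830362f38366461303238392d663734662d343230302d393238342d3637386537626437363139352e31323030 | xxd -r -p; echo
be318638-e8a0-4c6d-977d-7a937aa84806/86da0289-f74f-4200-9284-678e7bd76195.1200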


The only thing I can see is the different change times, really:

[root@node01:~] # stat 
/gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6
  File: 
‘/gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6’

  Size: 67108864   Blocks: 54576  IO Block: 4096   regular file
Device: fd09h/64777d  Inode: 16152829909  Links: 2
Access: (0660/-rw-rw----)  Uid: (0/root)   Gid: (0/root)
Context: system_u:object_r:glusterd_brick_t:s0
Access: 2020-01-31 22:16:57.812620635 +0100
Modify: 2020-02-01 07:19:24.183045141 +0100
Change: 2020-02-01 07:19:24.186045203 +0100
 Birth: -

[root@node03:~] # stat 
/gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6
  File: 
‘/gluster_bricks/ssd_storage/ssd_storage/.glusterfs/a1/21/a121e4fb-0984-4e41-94d7-8f0c4f87f4b6’

  Size: 67108864   Blocks: 54576  IO Block: 4096   regular file
Device: fd09h/64777d  Inode: 16154259424  Links: 2
Access: (0660/-rw-rw----)  Uid: (0/root)   Gid: (0/root)
Context: system_u:object_r:glusterd_brick_t:s0
Access: 2020-01-31 22:16:57.811800217 +0100
Modify: 2020-02-01 07:19:24.180939487 +0100
Change: 2020-02-01 07:19:24.184939586 +0100
 Birth: -



Now, I don't dare simply proceed without some advice.
Anyone got a clue on how to resolve this issue? File #2 is identical to 
this one, from a problem point of view.


Have a great weekend!
-Chris.

--
with kind regards,
mit freundlichen Gruessen,

Christian Reiss



Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/441850968

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@gluster.org