Re: [Gluster-users] Does replace-brick migrate data?

2019-06-09 Thread Alan Orth
Dear Nithya,

A small update: shortly after I sent the message above with xattrs of one
"missing" directory, the directory and several others magically appeared on
the FUSE mount point (woohoo!). This was several days into the rebalance
process (and about 9 million files scanned!). Now I'm hopeful that Gluster
has done the right thing and fixed some more of these issues. I'll wait
until the rebalance is done and then assess its work. I will let you know
if I have any more questions.

Regards,

On Sat, Jun 8, 2019 at 11:25 AM Alan Orth  wrote:

> Thank you, Nithya.
>
> The "missing" directory is indeed present on all bricks. I enabled
> client-log-level DEBUG on the volume and then noticed the following in the
> FUSE mount log when doing a `stat` on the "missing" directory on the FUSE
> mount:
>
> [2019-06-08 08:03:30.240738] D [MSGID: 0]
> [dht-common.c:3454:dht_do_fresh_lookup] 0-homes-dht: Calling fresh lookup
> for /aorth/data on homes-replicate-2
> [2019-06-08 08:03:30.241138] D [MSGID: 0]
> [dht-common.c:3013:dht_lookup_cbk] 0-homes-dht: fresh_lookup returned for
> /aorth/data with op_ret 0
> [2019-06-08 08:03:30.241610] D [MSGID: 0]
> [dht-common.c:1354:dht_lookup_dir_cbk] 0-homes-dht: Internal xattr
> trusted.glusterfs.dht.mds is not present  on path /aorth/data gfid is
> fb87699f-ebf3-4098-977d-85c3a70b849c
> [2019-06-08 08:06:18.880961] D [MSGID: 0]
> [dht-common.c:1559:dht_revalidate_cbk] 0-homes-dht: revalidate lookup of
> /aorth/data returned with op_ret 0
> [2019-06-08 08:06:18.880963] D [MSGID: 0]
> [dht-common.c:1651:dht_revalidate_cbk] 0-homes-dht: internal xattr
> trusted.glusterfs.dht.mds is not present on path /aorth/data gfid is
> fb87699f-ebf3-4098-977d-85c3a70b849c
> [2019-06-08 08:06:18.880996] D [MSGID: 0]
> [dht-common.c:914:dht_common_mark_mdsxattr] 0-homes-dht: internal xattr
> trusted.glusterfs.dht.mds is present on subvol on path /aorth/data gfid is
> fb87699f-ebf3-4098-977d-85c3a70b849c
>
> One message says the trusted.glusterfs.dht.mds xattr is not present, then
> the next says it is present. Is that relevant? I looked at the xattrs of
> that directory on all the bricks and it does seem to be inconsistent (also
> the modification times on the directory are different):
>
> [root@wingu0 ~]# getfattr -d -m. -e hex /mnt/gluster/homes/aorth/data
> getfattr: Removing leading '/' from absolute path names
> # file: mnt/gluster/homes/aorth/data
>
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.afr.dirty=0x
> trusted.afr.homes-client-3=0x00020002
> trusted.afr.homes-client-5=0x
> trusted.gfid=0xfb87699febf34098977d85c3a70b849c
> trusted.glusterfs.dht=0xe7c11ff2b6dd59ef
>
> [root@wingu3 ~]# getfattr -d -m. -e hex /mnt/gluster/homes/aorth/data
> getfattr: Removing leading '/' from absolute path names
> # file: mnt/gluster/homes/aorth/data
>
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.afr.homes-client-0=0x
> trusted.afr.homes-client-1=0x
> trusted.gfid=0xfb87699febf34098977d85c3a70b849c
> trusted.glusterfs.dht=0xe7c11ff249251e2d
> trusted.glusterfs.dht.mds=0x
>
> [root@wingu4 ~]# getfattr -d -m. -e hex /mnt/gluster/homes/aorth/data
> getfattr: Removing leading '/' from absolute path names
> # file: mnt/gluster/homes/aorth/data
>
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.afr.homes-client-0=0x
> trusted.afr.homes-client-1=0x
> trusted.gfid=0xfb87699febf34098977d85c3a70b849c
> trusted.glusterfs.dht=0xe7c11ff249251e2d
> trusted.glusterfs.dht.mds=0x
>
> [root@wingu05 ~]# getfattr -d -m. -e hex
> /data/glusterfs/sdb/homes/aorth/data
> getfattr: Removing leading '/' from absolute path names
> # file: data/glusterfs/sdb/homes/aorth/data
>
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.afr.homes-client-2=0x
> trusted.gfid=0xfb87699febf34098977d85c3a70b849c
> trusted.glusterfs.dht=0xe7c11ff249251e2eb6dd59ee
>
> [root@wingu05 ~]# getfattr -d -m. -e hex
> /data/glusterfs/sdc/homes/aorth/data
> getfattr: Removing leading '/' from absolute path names
> # file: data/glusterfs/sdc/homes/aorth/data
>
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.gfid=0xfb87699febf34098977d85c3a70b849c
> trusted.glusterfs.dht=0xe7c11ff2b6dd59ef
>
> [root@wingu06 ~]# getfattr -d -m. -e hex
> /data/glusterfs/sdb/homes/aorth/data
> getfattr: Removing leading '/' from absolute path names
> # file: data/glusterfs/sdb/homes/aorth/data
>
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.gfid=0xfb87699febf34098977d85c3a70b849c
> trusted.glusterfs.dht=0xe7c11f

Re: [Gluster-users] Does replace-brick migrate data?

2019-06-08 Thread Alan Orth
Thank you, Nithya.

The "missing" directory is indeed present on all bricks. I enabled
client-log-level DEBUG on the volume and then noticed the following in the
FUSE mount log when doing a `stat` on the "missing" directory on the FUSE
mount:

[2019-06-08 08:03:30.240738] D [MSGID: 0]
[dht-common.c:3454:dht_do_fresh_lookup] 0-homes-dht: Calling fresh lookup
for /aorth/data on homes-replicate-2
[2019-06-08 08:03:30.241138] D [MSGID: 0]
[dht-common.c:3013:dht_lookup_cbk] 0-homes-dht: fresh_lookup returned for
/aorth/data with op_ret 0
[2019-06-08 08:03:30.241610] D [MSGID: 0]
[dht-common.c:1354:dht_lookup_dir_cbk] 0-homes-dht: Internal xattr
trusted.glusterfs.dht.mds is not present  on path /aorth/data gfid is
fb87699f-ebf3-4098-977d-85c3a70b849c
[2019-06-08 08:06:18.880961] D [MSGID: 0]
[dht-common.c:1559:dht_revalidate_cbk] 0-homes-dht: revalidate lookup of
/aorth/data returned with op_ret 0
[2019-06-08 08:06:18.880963] D [MSGID: 0]
[dht-common.c:1651:dht_revalidate_cbk] 0-homes-dht: internal xattr
trusted.glusterfs.dht.mds is not present on path /aorth/data gfid is
fb87699f-ebf3-4098-977d-85c3a70b849c
[2019-06-08 08:06:18.880996] D [MSGID: 0]
[dht-common.c:914:dht_common_mark_mdsxattr] 0-homes-dht: internal xattr
trusted.glusterfs.dht.mds is present on subvol on path /aorth/data gfid is
fb87699f-ebf3-4098-977d-85c3a70b849c

One message says the trusted.glusterfs.dht.mds xattr is not present, then
the next says it is present. Is that relevant? I looked at the xattrs of
that directory on all the bricks and it does seem to be inconsistent (also
the modification times on the directory are different):

[root@wingu0 ~]# getfattr -d -m. -e hex /mnt/gluster/homes/aorth/data
getfattr: Removing leading '/' from absolute path names
# file: mnt/gluster/homes/aorth/data
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.dirty=0x
trusted.afr.homes-client-3=0x00020002
trusted.afr.homes-client-5=0x
trusted.gfid=0xfb87699febf34098977d85c3a70b849c
trusted.glusterfs.dht=0xe7c11ff2b6dd59ef

[root@wingu3 ~]# getfattr -d -m. -e hex /mnt/gluster/homes/aorth/data
getfattr: Removing leading '/' from absolute path names
# file: mnt/gluster/homes/aorth/data
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.homes-client-0=0x
trusted.afr.homes-client-1=0x
trusted.gfid=0xfb87699febf34098977d85c3a70b849c
trusted.glusterfs.dht=0xe7c11ff249251e2d
trusted.glusterfs.dht.mds=0x

[root@wingu4 ~]# getfattr -d -m. -e hex /mnt/gluster/homes/aorth/data
getfattr: Removing leading '/' from absolute path names
# file: mnt/gluster/homes/aorth/data
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.homes-client-0=0x
trusted.afr.homes-client-1=0x
trusted.gfid=0xfb87699febf34098977d85c3a70b849c
trusted.glusterfs.dht=0xe7c11ff249251e2d
trusted.glusterfs.dht.mds=0x

[root@wingu05 ~]# getfattr -d -m. -e hex
/data/glusterfs/sdb/homes/aorth/data
getfattr: Removing leading '/' from absolute path names
# file: data/glusterfs/sdb/homes/aorth/data
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.homes-client-2=0x
trusted.gfid=0xfb87699febf34098977d85c3a70b849c
trusted.glusterfs.dht=0xe7c11ff249251e2eb6dd59ee

[root@wingu05 ~]# getfattr -d -m. -e hex
/data/glusterfs/sdc/homes/aorth/data
getfattr: Removing leading '/' from absolute path names
# file: data/glusterfs/sdc/homes/aorth/data
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.gfid=0xfb87699febf34098977d85c3a70b849c
trusted.glusterfs.dht=0xe7c11ff2b6dd59ef

[root@wingu06 ~]# getfattr -d -m. -e hex
/data/glusterfs/sdb/homes/aorth/data
getfattr: Removing leading '/' from absolute path names
# file: data/glusterfs/sdb/homes/aorth/data
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.gfid=0xfb87699febf34098977d85c3a70b849c
trusted.glusterfs.dht=0xe7c11ff249251e2eb6dd59ee

This is a replica 2 volume on Gluster 5.6.
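
For reference, the DEBUG client logging mentioned above is the
diagnostics.client-log-level volume option; it can be toggled roughly like
this (volume name assumed):

gluster volume set homes diagnostics.client-log-level DEBUG
# ...and back to the default once done debugging:
gluster volume reset homes diagnostics.client-log-level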

Thank you,

On Sat, Jun 8, 2019 at 5:28 AM Nithya Balachandran 
wrote:

>
>
> On Sat, 8 Jun 2019 at 01:29, Alan Orth  wrote:
>
>> Dear Ravi,
>>
>> In the last week I have completed a fix-layout and a full INDEX heal on
>> this volume. Now I've started a rebalance and I see a few terabytes of data
>> going around on different bricks since yesterday, which I'm sure is good.
>>
>> While I wait for the rebalance to finish, I'm wondering if you know what
>> would cause directories to be missing from the FUSE mount point? If I list
>> the directories explicitly I can see their contents, but they do not appear
>> in their parent directories' listing. In the

Re: [Gluster-users] Does replace-brick migrate data?

2019-06-07 Thread Nithya Balachandran
On Sat, 8 Jun 2019 at 01:29, Alan Orth  wrote:

> Dear Ravi,
>
> In the last week I have completed a fix-layout and a full INDEX heal on
> this volume. Now I've started a rebalance and I see a few terabytes of data
> going around on different bricks since yesterday, which I'm sure is good.
>
> While I wait for the rebalance to finish, I'm wondering if you know what
> would cause directories to be missing from the FUSE mount point? If I list
> the directories explicitly I can see their contents, but they do not appear
> in their parent directories' listing. In the case of duplicated files it is
> always because the files are not on the correct bricks (according to the
> Dynamo/Elastic Hash algorithm), and I can fix it by copying the file to the
> correct brick(s) and removing it from the others (along with their
> .glusterfs hard links). So what could cause directories to be missing?
>
Hi Alan,

The directories that don't show up in the parent directory listing are
probably missing because they do not exist on the hashed subvol. Please check the
backend bricks to see if they are missing on any of them.
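
A minimal sketch of that check, assuming passwordless ssh to the brick hosts
and using the homes brick list quoted elsewhere in this thread (plus the old
wingu0 brick); adjust the paths to your layout:

DIR=aorth/data
for b in wingu4:/mnt/gluster/homes wingu3:/mnt/gluster/homes \
         wingu06:/data/glusterfs/sdb/homes wingu05:/data/glusterfs/sdb/homes \
         wingu05:/data/glusterfs/sdc/homes wingu06:/data/glusterfs/sdc/homes \
         wingu0:/mnt/gluster/homes; do
  host=${b%%:*}; brick=${b#*:}
  # test for the directory directly on the backend brick
  ssh "$host" "test -d $brick/$DIR && echo 'present: $b' || echo 'MISSING: $b'"
done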

Regards,
Nithya

Thank you,
>
> Thank you,
>
> On Wed, Jun 5, 2019 at 1:08 AM Alan Orth  wrote:
>
>> Hi Ravi,
>>
>> You're right that I had mentioned using rsync to copy the brick content
>> to a new host, but in the end I actually decided not to bring it up on a
>> new brick. Instead I added the original brick back into the volume. So the
>> xattrs and symlinks to .glusterfs on the original brick are fine. I think
>> the problem probably lies with a remove-brick that got interrupted. A few
>> weeks ago during the maintenance I had tried to remove a brick and then
>> after twenty minutes and no obvious progress I stopped it—after that the
>> bricks were still part of the volume.
>>
>> In the last few days I have run a fix-layout that took 26 hours and
>> finished successfully. Then I started a full index heal and it has healed
>> about 3.3 million files in a few days and I see a clear increase of network
>> traffic from old brick host to new brick host over that time. Once the full
>> index heal completes I will try to do a rebalance.
>>
>> Thank you,
>>
>>
>> On Mon, Jun 3, 2019 at 7:40 PM Ravishankar N 
>> wrote:
>>
>>>
>>> On 01/06/19 9:37 PM, Alan Orth wrote:
>>>
>>> Dear Ravi,
>>>
>>> The .glusterfs hardlinks/symlinks should be fine. I'm not sure how I
>>> could verify them for six bricks and millions of files, though... :\
>>>
>>> Hi Alan,
>>>
>>> The reason I asked this is because you had mentioned in one of your
>>> earlier emails that when you moved content from the old brick to the new
>>> one, you had skipped the .glusterfs directory. So I was assuming that when
>>> you added back this new brick to the cluster, it might have been missing
> the .glusterfs entries. If that is the case, one way to verify could be to
>>> check using a script if all files on the brick have a link-count of at
>>> least 2 and all dirs have valid symlinks inside .glusterfs pointing to
>>> themselves.
>>>
>>>
>>> I had a small success in fixing some issues with duplicated files on the
>>> FUSE mount point yesterday. I read quite a bit about the elastic hashing
>>> algorithm that determines which files get placed on which bricks based on
>>> the hash of their filename and the trusted.glusterfs.dht xattr on brick
>>> directories (thanks to Joe Julian's blog post and Python script for showing
>>> how it works¹). With that knowledge I looked closer at one of the files
>>> that was appearing as duplicated on the FUSE mount and found that it was
>>> also duplicated on more than `replica 2` bricks. For this particular file I
>>> found two "real" files and several zero-size files with
>>> trusted.glusterfs.dht.linkto xattrs. Neither of the "real" files were on
>>> the correct brick as far as the DHT layout is concerned, so I copied one of
>>> them to the correct brick, deleted the others and their hard links, and did
>>> a `stat` on the file from the FUSE mount point and it fixed itself. Yay!
>>>
>>> Could this have been caused by a replace-brick that got interrupted and
>>> didn't finish re-labeling the xattrs?
>>>
>>> No, replace-brick only initiates AFR self-heal, which just copies the
>>> contents from the other brick(s) of the *same* replica pair into the
>>> replaced brick.  The link-to files are created by DHT when you rename a
>>> file from the client. If the new name hashes to a different  brick, DHT
>>> does not move the entire file there. It instead creates the link-to file
>>> (the one with the dht.linkto xattrs) on the hashed subvol. The value of
> this xattr points to the brick where the actual data resides (`getfattr -e
>>> text` to see it for yourself).  Perhaps you had attempted a rebalance or
>>> remove-brick earlier and interrupted that?
>>>
>>> Should I be thinking of some heuristics to identify and fix these issues
>>> with a script (incorrect brick placement), or is this something a fix
>>> layout or repeated vo

Re: [Gluster-users] Does replace-brick migrate data?

2019-06-07 Thread Alan Orth
Dear Ravi,

In the last week I have completed a fix-layout and a full INDEX heal on
this volume. Now I've started a rebalance and I see a few terabytes of data
going around on different bricks since yesterday, which I'm sure is good.

While I wait for the rebalance to finish, I'm wondering if you know what
would cause directories to be missing from the FUSE mount point? If I list
the directories explicitly I can see their contents, but they do not appear
in their parent directories' listing. In the case of duplicated files it is
always because the files are not on the correct bricks (according to the
Dynamo/Elastic Hash algorithm), and I can fix it by copying the file to the
correct brick(s) and removing it from the others (along with their
.glusterfs hard links). So what could cause directories to be missing?

Thank you,

Thank you,

On Wed, Jun 5, 2019 at 1:08 AM Alan Orth  wrote:

> Hi Ravi,
>
> You're right that I had mentioned using rsync to copy the brick content to
> a new host, but in the end I actually decided not to bring it up on a new
> brick. Instead I added the original brick back into the volume. So the
> xattrs and symlinks to .glusterfs on the original brick are fine. I think
> the problem probably lies with a remove-brick that got interrupted. A few
> weeks ago during the maintenance I had tried to remove a brick and then
> after twenty minutes and no obvious progress I stopped it—after that the
> bricks were still part of the volume.
>
> In the last few days I have run a fix-layout that took 26 hours and
> finished successfully. Then I started a full index heal and it has healed
> about 3.3 million files in a few days and I see a clear increase of network
> traffic from old brick host to new brick host over that time. Once the full
> index heal completes I will try to do a rebalance.
>
> Thank you,
>
>
> On Mon, Jun 3, 2019 at 7:40 PM Ravishankar N 
> wrote:
>
>>
>> On 01/06/19 9:37 PM, Alan Orth wrote:
>>
>> Dear Ravi,
>>
>> The .glusterfs hardlinks/symlinks should be fine. I'm not sure how I
>> could verify them for six bricks and millions of files, though... :\
>>
>> Hi Alan,
>>
>> The reason I asked this is because you had mentioned in one of your
>> earlier emails that when you moved content from the old brick to the new
>> one, you had skipped the .glusterfs directory. So I was assuming that when
>> you added back this new brick to the cluster, it might have been missing
>> the .glusterfs entries. If that is the case, one way to verify could be to
>> check using a script if all files on the brick have a link-count of at
>> least 2 and all dirs have valid symlinks inside .glusterfs pointing to
>> themselves.
>>
>>
>> I had a small success in fixing some issues with duplicated files on the
>> FUSE mount point yesterday. I read quite a bit about the elastic hashing
>> algorithm that determines which files get placed on which bricks based on
>> the hash of their filename and the trusted.glusterfs.dht xattr on brick
>> directories (thanks to Joe Julian's blog post and Python script for showing
>> how it works¹). With that knowledge I looked closer at one of the files
>> that was appearing as duplicated on the FUSE mount and found that it was
>> also duplicated on more than `replica 2` bricks. For this particular file I
>> found two "real" files and several zero-size files with
>> trusted.glusterfs.dht.linkto xattrs. Neither of the "real" files were on
>> the correct brick as far as the DHT layout is concerned, so I copied one of
>> them to the correct brick, deleted the others and their hard links, and did
>> a `stat` on the file from the FUSE mount point and it fixed itself. Yay!
>>
>> Could this have been caused by a replace-brick that got interrupted and
>> didn't finish re-labeling the xattrs?
>>
>> No, replace-brick only initiates AFR self-heal, which just copies the
>> contents from the other brick(s) of the *same* replica pair into the
>> replaced brick.  The link-to files are created by DHT when you rename a
>> file from the client. If the new name hashes to a different  brick, DHT
>> does not move the entire file there. It instead creates the link-to file
>> (the one with the dht.linkto xattrs) on the hashed subvol. The value of
>> this xattr points to the brick where the actual data resides (`getfattr -e
>> text` to see it for yourself).  Perhaps you had attempted a rebalance or
>> remove-brick earlier and interrupted that?
>>
>> Should I be thinking of some heuristics to identify and fix these issues
>> with a script (incorrect brick placement), or is this something a fix
>> layout or repeated volume heals can fix? I've already completed a whole
>> heal on this particular volume this week and it did heal about 1,000,000
>> files (mostly data and metadata, but about 20,000 entry heals as well).
>>
>> Maybe you should let the AFR self-heals complete first and then attempt a
>> full rebalance to take care of the dht link-to files. But  if the files are
>> in millions, it could t

Re: [Gluster-users] Does replace-brick migrate data?

2019-06-04 Thread Alan Orth
Hi Ravi,

You're right that I had mentioned using rsync to copy the brick content to
a new host, but in the end I actually decided not to bring it up on a new
brick. Instead I added the original brick back into the volume. So the
xattrs and symlinks to .glusterfs on the original brick are fine. I think
the problem probably lies with a remove-brick that got interrupted. A few
weeks ago during the maintenance I had tried to remove a brick and then
after twenty minutes and no obvious progress I stopped it—after that the
bricks were still part of the volume.
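
For the record, the interrupted operation corresponds to something like the
following (the brick path here is only illustrative):

gluster volume remove-brick homes wingu0:/mnt/gluster/homes start
gluster volume remove-brick homes wingu0:/mnt/gluster/homes status
gluster volume remove-brick homes wingu0:/mnt/gluster/homes stop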

In the last few days I have run a fix-layout that took 26 hours and
finished successfully. Then I started a full index heal and it has healed
about 3.3 million files in a few days and I see a clear increase of network
traffic from old brick host to new brick host over that time. Once the full
index heal completes I will try to do a rebalance.
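
Roughly, the command sequence I'm following is (volume name assumed):

gluster volume rebalance homes fix-layout start
gluster volume rebalance homes status     # fix-layout progress shows up here
gluster volume heal homes full
gluster volume heal homes info            # watch the index heal drain
gluster volume rebalance homes start
gluster volume rebalance homes status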

Thank you,


On Mon, Jun 3, 2019 at 7:40 PM Ravishankar N  wrote:

>
> On 01/06/19 9:37 PM, Alan Orth wrote:
>
> Dear Ravi,
>
> The .glusterfs hardlinks/symlinks should be fine. I'm not sure how I could
> verify them for six bricks and millions of files, though... :\
>
> Hi Alan,
>
> The reason I asked this is because you had mentioned in one of your
> earlier emails that when you moved content from the old brick to the new
> one, you had skipped the .glusterfs directory. So I was assuming that when
> you added back this new brick to the cluster, it might have been missing
> the .glusterfs entries. If that is the case, one way to verify could be to
> check using a script if all files on the brick have a link-count of at
> least 2 and all dirs have valid symlinks inside .glusterfs pointing to
> themselves.
>
>
> I had a small success in fixing some issues with duplicated files on the
> FUSE mount point yesterday. I read quite a bit about the elastic hashing
> algorithm that determines which files get placed on which bricks based on
> the hash of their filename and the trusted.glusterfs.dht xattr on brick
> directories (thanks to Joe Julian's blog post and Python script for showing
> how it works¹). With that knowledge I looked closer at one of the files
> that was appearing as duplicated on the FUSE mount and found that it was
> also duplicated on more than `replica 2` bricks. For this particular file I
> found two "real" files and several zero-size files with
> trusted.glusterfs.dht.linkto xattrs. Neither of the "real" files were on
> the correct brick as far as the DHT layout is concerned, so I copied one of
> them to the correct brick, deleted the others and their hard links, and did
> a `stat` on the file from the FUSE mount point and it fixed itself. Yay!
>
> Could this have been caused by a replace-brick that got interrupted and
> didn't finish re-labeling the xattrs?
>
> No, replace-brick only initiates AFR self-heal, which just copies the
> contents from the other brick(s) of the *same* replica pair into the
> replaced brick.  The link-to files are created by DHT when you rename a
> file from the client. If the new name hashes to a different  brick, DHT
> does not move the entire file there. It instead creates the link-to file
> (the one with the dht.linkto xattrs) on the hashed subvol. The value of
> this xattr points to the brick where the actual data resides (`getfattr -e
> text` to see it for yourself).  Perhaps you had attempted a rebalance or
> remove-brick earlier and interrupted that?
>
> Should I be thinking of some heuristics to identify and fix these issues
> with a script (incorrect brick placement), or is this something a fix
> layout or repeated volume heals can fix? I've already completed a whole
> heal on this particular volume this week and it did heal about 1,000,000
> files (mostly data and metadata, but about 20,000 entry heals as well).
>
> Maybe you should let the AFR self-heals complete first and then attempt a
> full rebalance to take care of the dht link-to files. But  if the files are
> in millions, it could take quite some time to complete.
> Regards,
> Ravi
>
> Thanks for your support,
>
> ¹ https://joejulian.name/post/dht-misses-are-expensive/
>
> On Fri, May 31, 2019 at 7:57 AM Ravishankar N 
> wrote:
>
>>
>> On 31/05/19 3:20 AM, Alan Orth wrote:
>>
>> Dear Ravi,
>>
>> I spent a bit of time inspecting the xattrs on some files and directories
>> on a few bricks for this volume and it looks a bit messy. Even if I could
>> make sense of it for a few and potentially heal them manually, there are
>> millions of files and directories in total so that's definitely not a
>> scalable solution. After a few missteps with `replace-brick ... commit
>> force` in the last week—one of which on a brick that was dead/offline—as
>> well as some premature `remove-brick` commands, I'm unsure how to
>> proceed and I'm getting demotivated. It's scary how quickly things get out
>> of hand in distributed systems...
>>
>> Hi Alan,
>> The one good thing about gluster is that the data is always avail

Re: [Gluster-users] Does replace-brick migrate data?

2019-06-03 Thread Ravishankar N


On 01/06/19 9:37 PM, Alan Orth wrote:

Dear Ravi,

The .glusterfs hardlinks/symlinks should be fine. I'm not sure how I 
could verify them for six bricks and millions of files, though... :\


Hi Alan,

The reason I asked this is because you had mentioned in one of your 
earlier emails that when you moved content from the old brick to the new 
one, you had skipped the .glusterfs directory. So I was assuming that 
when you added back this new brick to the cluster, it might have been 
missing the .glusterfs entries. If that is the case, one way to verify 
could be to check using a script if all files on the brick have a 
link-count of at least 2 and all dirs have valid symlinks inside 
.glusterfs pointing to themselves.
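
A rough sketch of such a script, run against one brick at a time (the brick
path is only an example, and this is a starting point rather than a vetted
tool):

#!/bin/bash
BRICK=/data/glusterfs/sdb/homes   # brick being checked (example path)

# 1. Regular files should have a hard link under .glusterfs (link count >= 2).
find "$BRICK" -path "$BRICK/.glusterfs" -prune -o -type f -links 1 -print

# 2. Every directory should have a gfid symlink inside .glusterfs.
find "$BRICK" -path "$BRICK/.glusterfs" -prune -o -type d -print |
while read -r d; do
  gfid=$(getfattr -n trusted.gfid -e hex "$d" 2>/dev/null |
         awk -F= '/trusted.gfid/ {print substr($2,3)}')
  [ -z "$gfid" ] && continue
  link="$BRICK/.glusterfs/${gfid:0:2}/${gfid:2:2}/${gfid:0:8}-${gfid:8:4}-${gfid:12:4}-${gfid:16:4}-${gfid:20:12}"
  [ -h "$link" ] || echo "no gfid symlink for directory: $d"
done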




I had a small success in fixing some issues with duplicated files on 
the FUSE mount point yesterday. I read quite a bit about the elastic 
hashing algorithm that determines which files get placed on which 
bricks based on the hash of their filename and the 
trusted.glusterfs.dht xattr on brick directories (thanks to Joe 
Julian's blog post and Python script for showing how it works¹). With 
that knowledge I looked closer at one of the files that was appearing 
as duplicated on the FUSE mount and found that it was also duplicated 
on more than `replica 2` bricks. For this particular file I found two 
"real" files and several zero-size files with 
trusted.glusterfs.dht.linkto xattrs. Neither of the "real" files were 
on the correct brick as far as the DHT layout is concerned, so I 
copied one of them to the correct brick, deleted the others and their 
hard links, and did a `stat` on the file from the FUSE mount point and 
it fixed itself. Yay!


Could this have been caused by a replace-brick that got interrupted 
and didn't finish re-labeling the xattrs?
No, replace-brick only initiates AFR self-heal, which just copies the 
contents from the other brick(s) of the *same* replica pair into the 
replaced brick.  The link-to files are created by DHT when you rename a 
file from the client. If the new name hashes to a different  brick, DHT 
does not move the entire file there. It instead creates the link-to file 
(the one with the dht.linkto xattrs) on the hashed subvol. The value of 
this xattr points to the brick where the actual data resides (`getfattr 
-e text` to see it for yourself).  Perhaps you had attempted a rebalance 
or remove-brick earlier and interrupted that?
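
In case it helps, the link-to files are normally zero-byte files with just the
sticky bit set, so something like this (brick path again an example) lists
them along with the subvolume each one points to:

BRICK=/mnt/gluster/apps
find "$BRICK" -path "$BRICK/.glusterfs" -prune -o \
     -type f -size 0 -perm -1000 -print |
while read -r f; do
  getfattr -n trusted.glusterfs.dht.linkto -e text "$f" 2>/dev/null
done
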
Should I be thinking of some heuristics to identify and fix these 
issues with a script (incorrect brick placement), or is this something 
a fix layout or repeated volume heals can fix? I've already completed 
a whole heal on this particular volume this week and it did heal about 
1,000,000 files (mostly data and metadata, but about 20,000 entry 
heals as well).


Maybe you should let the AFR self-heals complete first and then attempt 
a full rebalance to take care of the dht link-to files. But  if the 
files are in millions, it could take quite some time to complete.


Regards,
Ravi

Thanks for your support,

¹ https://joejulian.name/post/dht-misses-are-expensive/

On Fri, May 31, 2019 at 7:57 AM Ravishankar N wrote:



On 31/05/19 3:20 AM, Alan Orth wrote:

Dear Ravi,

I spent a bit of time inspecting the xattrs on some files and
directories on a few bricks for this volume and it looks a bit
messy. Even if I could make sense of it for a few and potentially
heal them manually, there are millions of files and directories
in total so that's definitely not a scalable solution. After a
few missteps with `replace-brick ... commit force` in the last
week—one of which on a brick that was dead/offline—as well as
some premature `remove-brick` commands, I'm unsure how to
proceed and I'm getting demotivated. It's scary how quickly
things get out of hand in distributed systems...

Hi Alan,
The one good thing about gluster is that the data is always
available directly on the backend bricks even if your volume has
inconsistencies at the gluster level. So theoretically, if your
cluster is FUBAR, you could just create a new volume and copy all
data onto it via its mount from the old volume's bricks.


I had hoped that bringing the old brick back up would help, but
by the time I added it again a few days had passed and all the
brick-id's had changed due to the replace/remove brick commands,
not to mention that the trusted.afr.$volume-client-xx values were
now probably pointing to the wrong bricks (?).

Anyways, a few hours ago I started a full heal on the volume and
I see that there is a sustained 100MiB/sec of network traffic
going from the old brick's host to the new one. The completed
heals reported in the logs look promising too:

Old brick host:

# grep '2019-05-30' /var/log/glusterfs/glustershd.log | grep -o
-E 'Completed (data|metadata|entry) selfheal' | sort | uniq -c
 28

Re: [Gluster-users] Does replace-brick migrate data?

2019-06-01 Thread Alan Orth
Dear Ravi,

The .glusterfs hardlinks/symlinks should be fine. I'm not sure how I could
verify them for six bricks and millions of files, though... :\

I had a small success in fixing some issues with duplicated files on the
FUSE mount point yesterday. I read quite a bit about the elastic hashing
algorithm that determines which files get placed on which bricks based on
the hash of their filename and the trusted.glusterfs.dht xattr on brick
directories (thanks to Joe Julian's blog post and Python script for showing
how it works¹). With that knowledge I looked closer at one of the files
that was appearing as duplicated on the FUSE mount and found that it was
also duplicated on more than `replica 2` bricks. For this particular file I
found two "real" files and several zero-size files with
trusted.glusterfs.dht.linkto xattrs. Neither of the "real" files were on
the correct brick as far as the DHT layout is concerned, so I copied one of
them to the correct brick, deleted the others and their hard links, and did
a `stat` on the file from the FUSE mount point and it fixed itself. Yay!
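
For anyone who wants to repeat this kind of check, a rough sketch, assuming
passwordless ssh and using the apps volume's brick list and the
licenseserver.cfg example from elsewhere in this thread:

F=clcgenomics/clclicsrv/licenseserver.cfg
for b in wingu3:/mnt/gluster/apps wingu4:/mnt/gluster/apps \
         wingu05:/data/glusterfs/sdb/apps wingu06:/data/glusterfs/sdb/apps \
         wingu0:/mnt/gluster/apps wingu05:/data/glusterfs/sdc/apps; do
  host=${b%%:*}; brick=${b#*:}
  echo "== $b"
  # size 0 plus a trusted.glusterfs.dht.linkto xattr marks a link-to copy
  ssh "$host" "stat -c %s $brick/$F 2>/dev/null; getfattr -d -m. -e hex $brick/$F 2>/dev/null"
done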

Could this have been caused by a replace-brick that got interrupted and
didn't finish re-labeling the xattrs? Should I be thinking of some
heuristics to identify and fix these issues with a script (incorrect brick
placement), or is this something a fix layout or repeated volume heals can
fix? I've already completed a whole heal on this particular volume this
week and it did heal about 1,000,000 files (mostly data and metadata, but
about 20,000 entry heals as well).

Thanks for your support,

¹ https://joejulian.name/post/dht-misses-are-expensive/

On Fri, May 31, 2019 at 7:57 AM Ravishankar N 
wrote:

>
> On 31/05/19 3:20 AM, Alan Orth wrote:
>
> Dear Ravi,
>
> I spent a bit of time inspecting the xattrs on some files and directories
> on a few bricks for this volume and it looks a bit messy. Even if I could
> make sense of it for a few and potentially heal them manually, there are
> millions of files and directories in total so that's definitely not a
> scalable solution. After a few missteps with `replace-brick ... commit
> force` in the last week—one of which on a brick that was dead/offline—as
> well as some premature `remove-brick` commands, I'm unsure how to
> proceed and I'm getting demotivated. It's scary how quickly things get out
> of hand in distributed systems...
>
> Hi Alan,
> The one good thing about gluster is that the data is always available
> directly on the backend bricks even if your volume has inconsistencies at
> the gluster level. So theoretically, if your cluster is FUBAR, you could
> just create a new volume and copy all data onto it via its mount from the
> old volume's bricks.
>
>
> I had hoped that bringing the old brick back up would help, but by the
> time I added it again a few days had passed and all the brick-id's had
> changed due to the replace/remove brick commands, not to mention that the
> trusted.afr.$volume-client-xx values were now probably pointing to the
> wrong bricks (?).
>
> Anyways, a few hours ago I started a full heal on the volume and I see
> that there is a sustained 100MiB/sec of network traffic going from the old
> brick's host to the new one. The completed heals reported in the logs look
> promising too:
>
> Old brick host:
>
> # grep '2019-05-30' /var/log/glusterfs/glustershd.log | grep -o -E
> 'Completed (data|metadata|entry) selfheal' | sort | uniq -c
>  281614 Completed data selfheal
>  84 Completed entry selfheal
>  299648 Completed metadata selfheal
>
> New brick host:
>
> # grep '2019-05-30' /var/log/glusterfs/glustershd.log | grep -o -E
> 'Completed (data|metadata|entry) selfheal' | sort | uniq -c
>  198256 Completed data selfheal
>   16829 Completed entry selfheal
>  229664 Completed metadata selfheal
>
> So that's good I guess, though I have no idea how long it will take or if
> it will fix the "missing files" issue on the FUSE mount. I've increased
> cluster.shd-max-threads to 8 to hopefully speed up the heal process.
>
> The afr xattrs should not cause files to disappear from mount. If the
> xattr names do not match what each AFR subvol expects (for eg. in a replica
> 2 volume, trusted.afr.*-client-{0,1} for 1st subvol, client-{2,3} for 2nd
> subvol and so on) for its children, then it won't heal the data, that is
> all. But in your case I see some inconsistencies like one brick having the
> actual file (licenseserver.cfg) and the other having a linkto file (the
> one with the dht.linkto xattr) *in the same replica pair*.
>
>
> I'd be happy for any advice or pointers,
>
> Did you check if the .glusterfs hardlinks/symlinks exist and are in order
> for all bricks?
>
> -Ravi
>
>
> On Wed, May 29, 2019 at 5:20 PM Alan Orth  wrote:
>
>> Dear Ravi,
>>
>> Thank you for the link to the blog post series—it is very informative and
>> current! If I understand your blog post correctly then I think the answer
>> to your previous question about pending AFRs is: no, ther

Re: [Gluster-users] Does replace-brick migrate data?

2019-05-30 Thread Ravishankar N


On 31/05/19 3:20 AM, Alan Orth wrote:

Dear Ravi,

I spent a bit of time inspecting the xattrs on some files and 
directories on a few bricks for this volume and it looks a bit messy. 
Even if I could make sense of it for a few and potentially heal them 
manually, there are millions of files and directories in total so 
that's definitely not a scalable solution. After a few missteps with 
`replace-brick ... commit force` in the last week—one of which on a 
brick that was dead/offline—as well as some premature `remove-brick` 
commands, I'm unsure how to proceed and I'm getting demotivated. 
It's scary how quickly things get out of hand in distributed systems...

Hi Alan,
The one good thing about gluster is that the data is always available 
directly on the backend bricks even if your volume has inconsistencies at 
the gluster level. So theoretically, if your cluster is FUBAR, you could 
just create a new volume and copy all data onto it via its mount from 
the old volume's bricks.


I had hoped that bringing the old brick back up would help, but by the 
time I added it again a few days had passed and all the brick-id's had 
changed due to the replace/remove brick commands, not to mention that 
the trusted.afr.$volume-client-xx values were now probably pointing to 
the wrong bricks (?).


Anyways, a few hours ago I started a full heal on the volume and I see 
that there is a sustained 100MiB/sec of network traffic going from the 
old brick's host to the new one. The completed heals reported in the 
logs look promising too:


Old brick host:

# grep '2019-05-30' /var/log/glusterfs/glustershd.log | grep -o -E 
'Completed (data|metadata|entry) selfheal' | sort | uniq -c

 281614 Completed data selfheal
     84 Completed entry selfheal
 299648 Completed metadata selfheal

New brick host:

# grep '2019-05-30' /var/log/glusterfs/glustershd.log | grep -o -E 
'Completed (data|metadata|entry) selfheal' | sort | uniq -c

 198256 Completed data selfheal
  16829 Completed entry selfheal
 229664 Completed metadata selfheal

So that's good I guess, though I have no idea how long it will take or 
if it will fix the "missing files" issue on the FUSE mount. I've 
increased cluster.shd-max-threads to 8 to hopefully speed up the heal 
process.
The afr xattrs should not cause files to disappear from mount. If the 
xattr names do not match what each AFR subvol expects (for eg. in a 
replica 2 volume, trusted.afr.*-client-{0,1} for 1st subvol, 
client-{2,3} for 2nd subvol and so on - ) for its children then it won't 
heal the data, that is all. But in your case I see some inconsistencies 
like one brick having the actual file (licenseserver.cfg) and the other 
having a linkto file (the one with the dht.linkto xattr) *in the same 
replica pair*.


I'd be happy for any advice or pointers,


Did you check if the .glusterfs hardlinks/symlinks exist and are in 
order for all bricks?


-Ravi



On Wed, May 29, 2019 at 5:20 PM Alan Orth wrote:


Dear Ravi,

Thank you for the link to the blog post series—it is very
informative and current! If I understand your blog post correctly
then I think the answer to your previous question about pending
AFRs is: no, there are no pending AFRs. I have identified one file
that is a good test case to try to understand what happened after
I issued the `gluster volume replace-brick ... commit force` a few
days ago and then added the same original brick back to the volume
later. This is the current state of the replica 2
distribute/replicate volume:

[root@wingu0 ~]# gluster volume info apps

Volume Name: apps
Type: Distributed-Replicate
Volume ID: f118d2da-79df-4ee1-919d-53884cd34eda
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: wingu3:/mnt/gluster/apps
Brick2: wingu4:/mnt/gluster/apps
Brick3: wingu05:/data/glusterfs/sdb/apps
Brick4: wingu06:/data/glusterfs/sdb/apps
Brick5: wingu0:/mnt/gluster/apps
Brick6: wingu05:/data/glusterfs/sdc/apps
Options Reconfigured:
diagnostics.client-log-level: DEBUG
storage.health-check-interval: 10
nfs.disable: on

I checked the xattrs of one file that is missing from the volume's
FUSE mount (though I can read it if I access its full path
explicitly), but is present in several of the volume's bricks
(some with full size, others empty):

[root@wingu0 ~]# getfattr -d -m. -e hex
/mnt/gluster/apps/clcgenomics/clclicsrv/licenseserver.cfg

getfattr: Removing leading '/' from absolute path names
# file: mnt/gluster/apps/clcgenomics/clclicsrv/licenseserver.cfg

security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.apps-client-3=0x
trusted.afr.apps-client-5=0x
trusted.afr.dirty=0x
trusted.bit-rot.version=0x02

Re: [Gluster-users] Does replace-brick migrate data?

2019-05-30 Thread Alan Orth
Dear Ravi,

I spent a bit of time inspecting the xattrs on some files and directories
on a few bricks for this volume and it looks a bit messy. Even if I could
make sense of it for a few and potentially heal them manually, there are
millions of files and directories in total so that's definitely not a
scalable solution. After a few missteps with `replace-brick ... commit
force` in the last week—one of which on a brick that was dead/offline—as
well as some premature `remove-brick` commands, I'm unsure how to
proceed and I'm getting demotivated. It's scary how quickly things get out
of hand in distributed systems...

I had hoped that bringing the old brick back up would help, but by the time
I added it again a few days had passed and all the brick-id's had changed
due to the replace/remove brick commands, not to mention that the
trusted.afr.$volume-client-xx values were now probably pointing to the
wrong bricks (?).

Anyways, a few hours ago I started a full heal on the volume and I see that
there is a sustained 100MiB/sec of network traffic going from the old
brick's host to the new one. The completed heals reported in the logs look
promising too:

Old brick host:

# grep '2019-05-30' /var/log/glusterfs/glustershd.log | grep -o -E
'Completed (data|metadata|entry) selfheal' | sort | uniq -c
 281614 Completed data selfheal
 84 Completed entry selfheal
 299648 Completed metadata selfheal

New brick host:

# grep '2019-05-30' /var/log/glusterfs/glustershd.log | grep -o -E
'Completed (data|metadata|entry) selfheal' | sort | uniq -c
 198256 Completed data selfheal
  16829 Completed entry selfheal
 229664 Completed metadata selfheal

So that's good I guess, though I have no idea how long it will take or if
it will fix the "missing files" issue on the FUSE mount. I've increased
cluster.shd-max-threads to 8 to hopefully speed up the heal process.
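
(For reference, that tunable was changed with something along these lines;
volume name assumed.)

gluster volume set homes cluster.shd-max-threads 8
gluster volume get homes cluster.shd-max-threads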

I'd be happy for any advice or pointers,

On Wed, May 29, 2019 at 5:20 PM Alan Orth  wrote:

> Dear Ravi,
>
> Thank you for the link to the blog post series—it is very informative and
> current! If I understand your blog post correctly then I think the answer
> to your previous question about pending AFRs is: no, there are no pending
> AFRs. I have identified one file that is a good test case to try to
> understand what happened after I issued the `gluster volume replace-brick
> ... commit force` a few days ago and then added the same original brick
> back to the volume later. This is the current state of the replica 2
> distribute/replicate volume:
>
> [root@wingu0 ~]# gluster volume info apps
>
> Volume Name: apps
> Type: Distributed-Replicate
> Volume ID: f118d2da-79df-4ee1-919d-53884cd34eda
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 3 x 2 = 6
> Transport-type: tcp
> Bricks:
> Brick1: wingu3:/mnt/gluster/apps
> Brick2: wingu4:/mnt/gluster/apps
> Brick3: wingu05:/data/glusterfs/sdb/apps
> Brick4: wingu06:/data/glusterfs/sdb/apps
> Brick5: wingu0:/mnt/gluster/apps
> Brick6: wingu05:/data/glusterfs/sdc/apps
> Options Reconfigured:
> diagnostics.client-log-level: DEBUG
> storage.health-check-interval: 10
> nfs.disable: on
>
> I checked the xattrs of one file that is missing from the volume's FUSE
> mount (though I can read it if I access its full path explicitly), but is
> present in several of the volume's bricks (some with full size, others
> empty):
>
> [root@wingu0 ~]# getfattr -d -m. -e hex
> /mnt/gluster/apps/clcgenomics/clclicsrv/licenseserver.cfg
>
> getfattr: Removing leading '/' from absolute path names
> # file: mnt/gluster/apps/clcgenomics/clclicsrv/licenseserver.cfg
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.afr.apps-client-3=0x
> trusted.afr.apps-client-5=0x
> trusted.afr.dirty=0x
> trusted.bit-rot.version=0x0200585a396f00046e15
> trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd
>
> [root@wingu05 ~]# getfattr -d -m. -e hex 
> /data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg
> getfattr: Removing leading '/' from absolute path names
> # file: data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd
> trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667
> trusted.glusterfs.dht.linkto=0x617070732d7265706c69636174652d3200
>
> [root@wingu05 ~]# getfattr -d -m. -e hex 
> /data/glusterfs/sdc/apps/clcgenomics/clclicsrv/licenseserver.cfg
> getfattr: Removing leading '/' from absolute path names
> # file: data/glusterfs/sdc/apps/clcgenomics/clclicsrv/licenseserver.cfg
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd
> trusted.gfid2path.82586deefbc539c3=0x34666437323861612d3564

Re: [Gluster-users] Does replace-brick migrate data?

2019-05-29 Thread Alan Orth
Dear Ravi,

Thank you for the link to the blog post series—it is very informative and
current! If I understand your blog post correctly then I think the answer
to your previous question about pending AFRs is: no, there are no pending
AFRs. I have identified one file that is a good test case to try to
understand what happened after I issued the `gluster volume replace-brick
... commit force` a few days ago and then added the same original brick
back to the volume later. This is the current state of the replica 2
distribute/replicate volume:

[root@wingu0 ~]# gluster volume info apps

Volume Name: apps
Type: Distributed-Replicate
Volume ID: f118d2da-79df-4ee1-919d-53884cd34eda
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: wingu3:/mnt/gluster/apps
Brick2: wingu4:/mnt/gluster/apps
Brick3: wingu05:/data/glusterfs/sdb/apps
Brick4: wingu06:/data/glusterfs/sdb/apps
Brick5: wingu0:/mnt/gluster/apps
Brick6: wingu05:/data/glusterfs/sdc/apps
Options Reconfigured:
diagnostics.client-log-level: DEBUG
storage.health-check-interval: 10
nfs.disable: on

I checked the xattrs of one file that is missing from the volume's FUSE
mount (though I can read it if I access its full path explicitly), but is
present in several of the volume's bricks (some with full size, others
empty):

[root@wingu0 ~]# getfattr -d -m. -e hex
/mnt/gluster/apps/clcgenomics/clclicsrv/licenseserver.cfg

getfattr: Removing leading '/' from absolute path names
# file: mnt/gluster/apps/clcgenomics/clclicsrv/licenseserver.cfg
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.apps-client-3=0x
trusted.afr.apps-client-5=0x
trusted.afr.dirty=0x
trusted.bit-rot.version=0x0200585a396f00046e15
trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd

[root@wingu05 ~]# getfattr -d -m. -e hex
/data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg
getfattr: Removing leading '/' from absolute path names
# file: data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd
trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667
trusted.glusterfs.dht.linkto=0x617070732d7265706c69636174652d3200

[root@wingu05 ~]# getfattr -d -m. -e hex
/data/glusterfs/sdc/apps/clcgenomics/clclicsrv/licenseserver.cfg
getfattr: Removing leading '/' from absolute path names
# file: data/glusterfs/sdc/apps/clcgenomics/clclicsrv/licenseserver.cfg
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd
trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667

[root@wingu06 ~]# getfattr -d -m. -e hex
/data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg
getfattr: Removing leading '/' from absolute path names
# file: data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd
trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667
trusted.glusterfs.dht.linkto=0x617070732d7265706c69636174652d3200

According to the trusted.afr.apps-client-xx xattrs this particular file
should be on bricks with id "apps-client-3" and "apps-client-5". It took me
a few hours to realize that the brick-id values are recorded in the
volume's volfiles in /var/lib/glusterd/vols/apps/bricks. After comparing
those brick-id values with a volfile backup from before the replace-brick,
I realized that the files are simply on the wrong brick now as far as
Gluster is concerned. This particular file is now on the brick for
"apps-client-4". As an experiment I copied this one file to the two bricks
listed in the xattrs and I was then able to see the file from the FUSE
mount (yay!).

Other than replacing the brick, removing it, and then adding the old brick
on the original server back, there has been no change in the data this
entire time. Can I change the brick IDs in the volfiles so they reflect
where the data actually is? Or perhaps script something to reset all the
xattrs on the files/directories to point to the correct bricks?
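
For anyone following along, here is a hedged sketch for printing the current
brick-id mapping; it assumes each file under /var/lib/glusterd/vols/apps/bricks/
contains hostname=, path= and brick-id= lines, which is what glusterd writes
on the systems I have seen:

for f in /var/lib/glusterd/vols/apps/bricks/*; do
  awk -F= '/^(hostname|path|brick-id)=/ { v[$1] = $2 }
           END { printf "%s -> %s:%s\n", v["brick-id"], v["hostname"], v["path"] }' "$f"
done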

Thank you for any help or pointers,

On Wed, May 29, 2019 at 7:24 AM Ravishankar N 
wrote:

>
> On 29/05/19 9:50 AM, Ravishankar N wrote:
>
>
> On 29/05/19 3:59 AM, Alan Orth wrote:
>
> Dear Ravishankar,
>
> I'm not sure if Brick4 had pending AFRs because I don't know what that
> means and it's been a few days so I am not sure I would be able to find
> that information.
>
> When you find some time, have a

Re: [Gluster-users] Does replace-brick migrate data?

2019-05-28 Thread Ravishankar N


On 29/05/19 9:50 AM, Ravishankar N wrote:



On 29/05/19 3:59 AM, Alan Orth wrote:

Dear Ravishankar,

I'm not sure if Brick4 had pending AFRs because I don't know what 
that means and it's been a few days so I am not sure I would be able 
to find that information.
When you find some time, have a look at a blog series 
I wrote about AFR; I've tried to explain what one needs to know to 
debug replication related issues in it.


Made a typo error. The URL for the blog is https://wp.me/peiBB-6b

-Ravi



Anyways, after wasting a few days rsyncing the old brick to a new 
host I decided to just try to add the old brick back into the volume 
instead of bringing it up on the new host. I created a new brick 
directory on the old host, moved the old brick's contents into that 
new directory (minus the .glusterfs directory), added the new brick 
to the volume, and then did Vlad's find/stat trick¹ from the brick to 
the FUSE mount point.


The interesting problem I have now is that some files don't appear in 
the FUSE mount's directory listings, but I can actually list them 
directly and even read them. What could cause that?
Not sure, too many variables in the hacks that you did to take a 
guess. You can check if the contents of the .glusterfs folder are in 
order on the new brick (example hardlink for files and symlinks for 
directories are present etc.) .

Regards,
Ravi


Thanks,

¹ 
https://lists.gluster.org/pipermail/gluster-users/2018-February/033584.html


On Fri, May 24, 2019 at 4:59 PM Ravishankar N wrote:



On 23/05/19 2:40 AM, Alan Orth wrote:

Dear list,

I seem to have gotten into a tricky situation. Today I brought
up a shiny new server with new disk arrays and attempted to
replace one brick of a replica 2 distribute/replicate volume on
an older server using the `replace-brick` command:

# gluster volume replace-brick homes wingu0:/mnt/gluster/homes
wingu06:/data/glusterfs/sdb/homes commit force

The command was successful and I see the new brick in the output
of `gluster volume info`. The problem is that Gluster doesn't
seem to be migrating the data,


`replace-brick` definitely must heal (not migrate) the data. In
your case, data must have been healed from Brick-4 to the
replaced Brick-3. Are there any errors in the self-heal daemon
logs of Brick-4's node? Does Brick-4 have pending AFR xattrs
blaming Brick-3? The doc is a bit out of date. replace-brick
command internally does all the setfattr steps that are mentioned
in the doc.

-Ravi



and now the original brick that I replaced is no longer part of
the volume (and a few terabytes of data are just sitting on the
old brick):

# gluster volume info homes | grep -E "Brick[0-9]:"
Brick1: wingu4:/mnt/gluster/homes
Brick2: wingu3:/mnt/gluster/homes
Brick3: wingu06:/data/glusterfs/sdb/homes
Brick4: wingu05:/data/glusterfs/sdb/homes
Brick5: wingu05:/data/glusterfs/sdc/homes
Brick6: wingu06:/data/glusterfs/sdc/homes

I see the Gluster docs have a more complicated procedure for
replacing bricks that involves getfattr/setfattr¹. How can I
tell Gluster about the old brick? I see that I have a backup of
the old volfile thanks to yum's rpmsave function if that helps.

We are using Gluster 5.6 on CentOS 7. Thank you for any advice
you can give.

¹

https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-faulty-brick

-- 
Alan Orth

alan.o...@gmail.com 
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch
"In heaven all the interesting people are missing." ―Friedrich
Nietzsche

___
Gluster-users mailing list
Gluster-users@gluster.org  
https://lists.gluster.org/mailman/listinfo/gluster-users




--
Alan Orth
alan.o...@gmail.com 
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch
"In heaven all the interesting people are missing." ―Friedrich Nietzsche


___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Does replace-brick migrate data?

2019-05-28 Thread Ravishankar N


On 29/05/19 3:59 AM, Alan Orth wrote:

Dear Ravishankar,

I'm not sure if Brick4 had pending AFRs because I don't know what that 
means and it's been a few days so I am not sure I would be able to 
find that information.
When you find some time, have a look at a blog series I 
wrote about AFR; I've tried to explain what one needs to know to debug 
replication related issues in it.


Anyways, after wasting a few days rsyncing the old brick to a new host 
I decided to just try to add the old brick back into the volume 
instead of bringing it up on the new host. I created a new brick 
directory on the old host, moved the old brick's contents into that 
new directory (minus the .glusterfs directory), added the new brick to 
the volume, and then did Vlad's find/stat trick¹ from the brick to the 
FUSE mount point.


The interesting problem I have now is that some files don't appear in 
the FUSE mount's directory listings, but I can actually list them 
directly and even read them. What could cause that?
Not sure, too many variables in the hacks that you did to take a guess. 
You can check if the contents of the .glusterfs folder are in order on 
the new brick (example hardlink for files and symlinks for directories 
are present etc.) .

Regards,
Ravi


Thanks,

¹ 
https://lists.gluster.org/pipermail/gluster-users/2018-February/033584.html


On Fri, May 24, 2019 at 4:59 PM Ravishankar N wrote:



On 23/05/19 2:40 AM, Alan Orth wrote:

Dear list,

I seem to have gotten into a tricky situation. Today I brought up
a shiny new server with new disk arrays and attempted to replace
one brick of a replica 2 distribute/replicate volume on an older
server using the `replace-brick` command:

# gluster volume replace-brick homes wingu0:/mnt/gluster/homes
wingu06:/data/glusterfs/sdb/homes commit force

The command was successful and I see the new brick in the output
of `gluster volume info`. The problem is that Gluster doesn't
seem to be migrating the data,


`replace-brick` definitely must heal (not migrate) the data. In
your case, data must have been healed from Brick-4 to the replaced
Brick-3. Are there any errors in the self-heal daemon logs of
Brick-4's node? Does Brick-4 have pending AFR xattrs blaming
Brick-3? The doc is a bit out of date. replace-brick command
internally does all the setfattr steps that are mentioned in the doc.

-Ravi



and now the original brick that I replaced is no longer part of
the volume (and a few terabytes of data are just sitting on the
old brick):

# gluster volume info homes | grep -E "Brick[0-9]:"
Brick1: wingu4:/mnt/gluster/homes
Brick2: wingu3:/mnt/gluster/homes
Brick3: wingu06:/data/glusterfs/sdb/homes
Brick4: wingu05:/data/glusterfs/sdb/homes
Brick5: wingu05:/data/glusterfs/sdc/homes
Brick6: wingu06:/data/glusterfs/sdc/homes

I see the Gluster docs have a more complicated procedure for
replacing bricks that involves getfattr/setfattr¹. How can I tell
Gluster about the old brick? I see that I have a backup of the
old volfile thanks to yum's rpmsave function if that helps.

We are using Gluster 5.6 on CentOS 7. Thank you for any advice
you can give.

¹

https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-faulty-brick

-- 
Alan Orth

alan.o...@gmail.com 
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch
"In heaven all the interesting people are missing." ―Friedrich
Nietzsche

___
Gluster-users mailing list
Gluster-users@gluster.org  
https://lists.gluster.org/mailman/listinfo/gluster-users




--
Alan Orth
alan.o...@gmail.com 
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch
"In heaven all the interesting people are missing." ―Friedrich Nietzsche
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Does replace-brick migrate data?

2019-05-28 Thread Alan Orth
Dear Ravishankar,

I'm not sure if Brick4 had pending AFRs because I don't know what that
means and it's been a few days so I am not sure I would be able to find
that information.

Anyways, after wasting a few days rsyncing the old brick to a new host I
decided to just try to add the old brick back into the volume instead of
bringing it up on the new host. I created a new brick directory on the old
host, moved the old brick's contents into that new directory (minus the
.glusterfs directory), added the new brick to the volume, and then did
Vlad's find/stat trick¹ from the brick to the FUSE mount point.
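
The trick boils down to something like the sketch below: walk the brick
(skipping .glusterfs) and stat the same relative paths through the FUSE mount
so the client does a fresh lookup on each entry. The brick and mount paths are
placeholders.

BRICK=/mnt/gluster/homes   # brick directory (placeholder)
MOUNT=/mnt/homes           # FUSE mount of the volume (placeholder)
find "$BRICK" -path "$BRICK/.glusterfs" -prune -o -print |
while read -r p; do
  stat "$MOUNT${p#$BRICK}" > /dev/null 2>&1
done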

The interesting problem I have now is that some files don't appear in the
FUSE mount's directory listings, but I can actually list them directly and
even read them. What could cause that?

Thanks,

¹
https://lists.gluster.org/pipermail/gluster-users/2018-February/033584.html

On Fri, May 24, 2019 at 4:59 PM Ravishankar N 
wrote:

>
> On 23/05/19 2:40 AM, Alan Orth wrote:
>
> Dear list,
>
> I seem to have gotten into a tricky situation. Today I brought up a shiny
> new server with new disk arrays and attempted to replace one brick of a
> replica 2 distribute/replicate volume on an older server using the
> `replace-brick` command:
>
> # gluster volume replace-brick homes wingu0:/mnt/gluster/homes
> wingu06:/data/glusterfs/sdb/homes commit force
>
> The command was successful and I see the new brick in the output of
> `gluster volume info`. The problem is that Gluster doesn't seem to be
> migrating the data,
>
> `replace-brick` definitely must heal (not migrate) the data. In your case,
> data must have been healed from Brick-4 to the replaced Brick-3. Are there
> any errors in the self-heal daemon logs of Brick-4's node? Does Brick-4
> have pending AFR xattrs blaming Brick-3? The doc is a bit out of date.
> replace-brick command internally does all the setfattr steps that are
> mentioned in the doc.
>
> -Ravi
>
>
> and now the original brick that I replaced is no longer part of the volume
> (and a few terabytes of data are just sitting on the old brick):
>
> # gluster volume info homes | grep -E "Brick[0-9]:"
> Brick1: wingu4:/mnt/gluster/homes
> Brick2: wingu3:/mnt/gluster/homes
> Brick3: wingu06:/data/glusterfs/sdb/homes
> Brick4: wingu05:/data/glusterfs/sdb/homes
> Brick5: wingu05:/data/glusterfs/sdc/homes
> Brick6: wingu06:/data/glusterfs/sdc/homes
>
> I see the Gluster docs have a more complicated procedure for replacing
> bricks that involves getfattr/setfattr¹. How can I tell Gluster about the
> old brick? I see that I have a backup of the old volfile thanks to yum's
> rpmsave function if that helps.
>
> We are using Gluster 5.6 on CentOS 7. Thank you for any advice you can
> give.
>
> ¹
> https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-faulty-brick
>
> --
> Alan Orth
> alan.o...@gmail.com
> https://picturingjordan.com
> https://englishbulgaria.net
> https://mjanja.ch
> "In heaven all the interesting people are missing." ―Friedrich Nietzsche
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>
>

-- 
Alan Orth
alan.o...@gmail.com
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch
"In heaven all the interesting people are missing." ―Friedrich Nietzsche
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Does replace-brick migrate data?

2019-05-24 Thread Ravishankar N


On 23/05/19 2:40 AM, Alan Orth wrote:

Dear list,

I seem to have gotten into a tricky situation. Today I brought up a 
shiny new server with new disk arrays and attempted to replace one 
brick of a replica 2 distribute/replicate volume on an older server 
using the `replace-brick` command:


# gluster volume replace-brick homes wingu0:/mnt/gluster/homes 
wingu06:/data/glusterfs/sdb/homes commit force


The command was successful and I see the new brick in the output of 
`gluster volume info`. The problem is that Gluster doesn't seem to be 
migrating the data,


`replace-brick` definitely must heal (not migrate) the data. In your 
case, data must have been healed from Brick-4 to the replaced Brick-3. 
Are there any errors in the self-heal daemon logs of Brick-4's node? 
Does Brick-4 have pending AFR xattrs blaming Brick-3? The doc is a bit 
out of date. replace-brick command internally does all the setfattr 
steps that are mentioned in the doc.
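
For reference, pending AFR xattrs can be inspected with getfattr on Brick-4's
files and directories (non-zero trusted.afr.homes-client-* values blame the
other brick of the pair), or summarized from the client side; the brick path
below is Brick4 from the volume info quoted in this message:

getfattr -d -m trusted.afr -e hex /data/glusterfs/sdb/homes
gluster volume heal homes info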


-Ravi


and now the original brick that I replaced is no longer part of the 
volume (and a few terabytes of data are just sitting on the old brick):


# gluster volume info homes | grep -E "Brick[0-9]:"
Brick1: wingu4:/mnt/gluster/homes
Brick2: wingu3:/mnt/gluster/homes
Brick3: wingu06:/data/glusterfs/sdb/homes
Brick4: wingu05:/data/glusterfs/sdb/homes
Brick5: wingu05:/data/glusterfs/sdc/homes
Brick6: wingu06:/data/glusterfs/sdc/homes

I see the Gluster docs have a more complicated procedure for replacing 
bricks that involves getfattr/setfattr¹. How can I tell Gluster about 
the old brick? I see that I have a backup of the old volfile thanks to 
yum's rpmsave function if that helps.


We are using Gluster 5.6 on CentOS 7. Thank you for any advice you can 
give.


¹ 
https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-faulty-brick


--
Alan Orth
alan.o...@gmail.com 
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch
"In heaven all the interesting people are missing." ―Friedrich Nietzsche

___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users