Re: [Gluster-users] Distributed re-balance issue

2017-05-25 Thread Nithya Balachandran
On 24 May 2017 at 22:54, Mahdi Adnan  wrote:

> Well, yes and no: when I start the rebalance and check its status, it
> just tells me it completed the rebalance, but it really did not move any
> data and the volume is not evenly distributed.
>
> Right now brick6 is full, and brick5 is going to be full in a few hours or so.
>

An update on this: on further analysis, it looked like the rebalance was
actually happening and files were being migrated. However, as the files are
large (2TB) and the rebalance status does not update the Rebalanced or Size
columns until a file's migration is complete, it looked like nothing was
happening.
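
A minimal way to confirm this while it is happening, sketched here assuming
the default rebalance log location (/var/log/glusterfs/<volname>-rebalance.log)
and the ctvvols volume name from later in this thread:

    # poll the per-node rebalance counters; with 2TB files they can sit at 0
    # for a long time even though a migration is in progress
    watch -n 30 'gluster volume rebalance ctvvols status'

    # the rebalance log shows which directories' contents are being migrated,
    # even while the status counters are still at zero
    grep 'migrate data called' /var/log/glusterfs/ctvvols-rebalance.log | tail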


> --
>
> Respectfully
> *Mahdi A. Mahdi*
>
> --
> *From:* Nithya Balachandran 
> *Sent:* Wednesday, May 24, 2017 8:16:53 PM
> *To:* Mahdi Adnan
> *Cc:* Mohammed Rafi K C; gluster-users@gluster.org
>
> *Subject:* Re: [Gluster-users] Distributed re-balance issue
>
>
>
> On 24 May 2017 at 22:45, Nithya Balachandran  wrote:
>
>>
>>
>> On 24 May 2017 at 21:55, Mahdi Adnan  wrote:
>>
>>> Hi,
>>>
>>>
>>> Thank you for your response.
>>>
>>> I have around 15 files; each is a 2TB qcow.
>>>
>>> One brick reached 96%, so I removed it with "brick remove" and waited
>>> until it went down to around 40%, then stopped the removal process with
>>> "brick remove stop".
>>>
>>> The issue is that brick1 drained its data to brick6 only, and when brick6
>>> reached around 90% I did the same thing as before and it drained the data
>>> back to brick1 only.
>>>
>>> Now brick6 has reached 99% and I have only a few gigabytes left, which
>>> will fill up in the next half hour or so.
>>>
>>> Attached are the logs for all 6 bricks.
>>>
>> Hi,
>>
>> Just to clarify, did you run a rebalance (gluster volume rebalance
>> <volname> start) or did you only run a remove-brick?
>>
> On re-reading your original email, I see you did run a rebalance. Did it
> complete? Also, which bricks are full at the moment?
>
>
>>
>> --
>>>
>>> Respectfully
>>> *Mahdi A. Mahdi*
>>>
>>> --
>>> *From:* Nithya Balachandran 
>>> *Sent:* Wednesday, May 24, 2017 6:45:10 PM
>>> *To:* Mohammed Rafi K C
>>> *Cc:* Mahdi Adnan; gluster-users@gluster.org
>>> *Subject:* Re: [Gluster-users] Distributed re-balance issue
>>>
>>>
>>>
>>> On 24 May 2017 at 20:02, Mohammed Rafi K C  wrote:
>>>
>>>>
>>>>
>>>> On 05/23/2017 08:53 PM, Mahdi Adnan wrote:
>>>>
>>>> Hi,
>>>>
>>>>
>>>> I have a distributed volume with 6 bricks, each with 5TB, and it's
>>>> hosting large qcow2 VM disks (I know it's not reliable, but it's not
>>>> important data).
>>>>
>>>> I started with 5 bricks and then added another one and started the
>>>> rebalance process. Everything went well, but now I'm looking at the
>>>> bricks' free space and I found one brick is at around 82% while the
>>>> others range from 20% to 60%.
>>>>
>>>> The brick with the highest utilization is hosting more qcow2 disks than
>>>> the other bricks, and whenever I start a rebalance it just completes in
>>>> 0 seconds without moving any data.
>>>>
>>>>
>>>> What is the average file size in the cluster? And roughly how many
>>>> files are there?
>>>>
>>>>
>>>> What will happen when the brick becomes full?
>>>>
>>>> Once a brick's usage goes beyond 90%, new files won't be created on that
>>>> brick, but existing files can still grow.
>>>>
>>>>
>>>> Can I move data manually from one brick to another?
>>>>
>>>>
>>>> Nope. It is not recommended; even though gluster will try to find the
>>>> file, it may break.
>>>>
>>>>
>>>> Why is rebalance not distributing data evenly across all bricks?
>>>>
>>>>
>>>> Rebalance works based on the layout, so we need to see how the layouts
>>>> are distributed. If one of your bricks has a higher capacity, it will
>>>> have a larger layout range.
>>>>
>>>>
>>>
>>>
>>>> That is correct. As Rafi said, the layout matters here. Can you please
>>>> send across the rebalance logs from all 6 nodes?
>>>>
>>>>

Re: [Gluster-users] Distributed re-balance issue

2017-05-24 Thread Mahdi Adnan
Well, yes and no: when I start the rebalance and check its status, it just
tells me it completed the rebalance, but it really did not move any data and
the volume is not evenly distributed.

Right now brick6 is full, and brick5 is going to be full in a few hours or so.
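
(A quick way to see per-brick usage across all six nodes, assuming the
ctvvols volume name and brick path shown later in this thread:)

    # free space, total size and inode counts for every brick in the volume
    gluster volume status ctvvols detail

    # or on each node directly, against the brick filesystem
    df -h /vols/ctvvols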


--

Respectfully
Mahdi A. Mahdi


From: Nithya Balachandran 
Sent: Wednesday, May 24, 2017 8:16:53 PM
To: Mahdi Adnan
Cc: Mohammed Rafi K C; gluster-users@gluster.org
Subject: Re: [Gluster-users] Distributed re-balance issue



On 24 May 2017 at 22:45, Nithya Balachandran <nbala...@redhat.com> wrote:


On 24 May 2017 at 21:55, Mahdi Adnan <mahdi.ad...@outlook.com> wrote:

Hi,


Thank you for your response.

I have around 15 files; each is a 2TB qcow.

One brick reached 96%, so I removed it with "brick remove" and waited until it
went down to around 40%, then stopped the removal process with "brick remove
stop".

The issue is that brick1 drained its data to brick6 only, and when brick6 reached
around 90% I did the same thing as before and it drained the data back to brick1
only.

Now brick6 has reached 99% and I have only a few gigabytes left, which will fill
up in the next half hour or so.

Attached are the logs for all 6 bricks.

Hi,

Just to clarify, did you run a rebalance (gluster volume rebalance <volname>
start) or did you only run a remove-brick?

On re-reading your original email, I see you did run a rebalance. Did it
complete? Also, which bricks are full at the moment?


--

Respectfully
Mahdi A. Mahdi


From: Nithya Balachandran <nbala...@redhat.com>
Sent: Wednesday, May 24, 2017 6:45:10 PM
To: Mohammed Rafi K C
Cc: Mahdi Adnan; gluster-users@gluster.org
Subject: Re: [Gluster-users] Distributed re-balance issue



On 24 May 2017 at 20:02, Mohammed Rafi K C <rkavu...@redhat.com> wrote:


On 05/23/2017 08:53 PM, Mahdi Adnan wrote:

Hi,


I have a distributed volume with 6 bricks, each with 5TB, and it's hosting large
qcow2 VM disks (I know it's not reliable, but it's not important data).

I started with 5 bricks and then added another one and started the rebalance
process. Everything went well, but now I'm looking at the bricks' free space and
I found one brick is at around 82% while the others range from 20% to 60%.

The brick with the highest utilization is hosting more qcow2 disks than the other
bricks, and whenever I start a rebalance it just completes in 0 seconds without
moving any data.

What is the average file size in the cluster? And roughly how many files
are there?



What will happen when the brick becomes full?

Once a brick's usage goes beyond 90%, new files won't be created on that brick,
but existing files can still grow.



Can I move data manually from one brick to another?

Nope. It is not recommended; even though gluster will try to find the file, it
may break.



Why is rebalance not distributing data evenly across all bricks?

Rebalance works based on the layout, so we need to see how the layouts are
distributed. If one of your bricks has a higher capacity, it will have a larger
layout range.




That is correct. As Rafi said, the layout matters here. Can you please send
across the rebalance logs from all 6 nodes?


Nodes running CentOS 7.3

Gluster 3.8.11


Volume info;

Volume Name: ctvvols
Type: Distribute
Volume ID: 1ecea912-510f-4079-b437-7398e9caa0eb
Status: Started
Snapshot Count: 0
Number of Bricks: 6
Transport-type: tcp
Bricks:
Brick1: ctv01:/vols/ctvvols
Brick2: ctv02:/vols/ctvvols
Brick3: ctv03:/vols/ctvvols
Brick4: ctv04:/vols/ctvvols
Brick5: ctv05:/vols/ctvvols
Brick6: ctv06:/vols/ctvvols
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
performance.low-prio-threads: 32
network.remote-dio: enable
cluster.eager-lock: enable
cluster.quorum-type: none
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 1
features.shard: off
user.cifs: off
network.ping-timeout: 10
storage.owner-uid: 36
storage.owner-gid: 36



rebalance log:


[2017-05-23 14:45:12.637671] I [dht-rebalance.c:2866:gf_defrag_process_dir] 
0-ctvvols-dht: Migration operation on dir 
/31e0b341-4eeb-4b71-b280-840eba7d6940/images/690c728d-a83e-4c79-ac7d-1f3f17edf7f0
 took 0.00 secs
[2017-05-23 14:45:12.640043] I [MSGID: 109081] [dht-common.c:4202:dht_setxattr] 
0-ctvvols-dht: fixing the layout of 
/31e0b341-4eeb-4b71-b280-840eba7d6940/images/091402ba-dc90-4206-848a-d73e85a1cc35
[2017-05-23 14:45:12.641516] I [dht-rebalance.c:2652:gf_defrag_process_dir] 
0-ctvvols-dht: migrate data called on 
/31e0b341-4eeb-4b71-b280-840eba7d6940/images/091402ba-dc90-4206-848a-d73e85a1cc35
[2017-05-23 14:45:12.642421] I [dht-rebalance.c:2866:gf_defrag_process_dir

Re: [Gluster-users] Distributed re-balance issue

2017-05-24 Thread Nithya Balachandran
On 24 May 2017 at 22:45, Nithya Balachandran  wrote:

>
>
> On 24 May 2017 at 21:55, Mahdi Adnan  wrote:
>
>> Hi,
>>
>>
>> Thank you for your response.
>>
>> I have around 15 files; each is a 2TB qcow.
>>
>> One brick reached 96%, so I removed it with "brick remove" and waited
>> until it went down to around 40%, then stopped the removal process with
>> "brick remove stop".
>>
>> The issue is that brick1 drained its data to brick6 only, and when brick6
>> reached around 90% I did the same thing as before and it drained the data
>> back to brick1 only.
>>
>> Now brick6 has reached 99% and I have only a few gigabytes left, which
>> will fill up in the next half hour or so.
>>
>> Attached are the logs for all 6 bricks.
>>
> Hi,
>
> Just to clarify, did you run a rebalance (gluster volume rebalance
> <volname> start) or did you only run a remove-brick?
>
> On re-reading your original email, I see you did run a rebalance. Did it
> complete? Also, which bricks are full at the moment?


>
> --
>>
>> Respectfully
>> *Mahdi A. Mahdi*
>>
>> --------------
>> *From:* Nithya Balachandran 
>> *Sent:* Wednesday, May 24, 2017 6:45:10 PM
>> *To:* Mohammed Rafi K C
>> *Cc:* Mahdi Adnan; gluster-users@gluster.org
>> *Subject:* Re: [Gluster-users] Distributed re-balance issue
>>
>>
>>
>> On 24 May 2017 at 20:02, Mohammed Rafi K C  wrote:
>>
>>>
>>>
>>> On 05/23/2017 08:53 PM, Mahdi Adnan wrote:
>>>
>>> Hi,
>>>
>>>
>>> I have a distributed volume with 6 bricks, each with 5TB, and it's
>>> hosting large qcow2 VM disks (I know it's not reliable, but it's not
>>> important data).
>>>
>>> I started with 5 bricks and then added another one and started the
>>> rebalance process. Everything went well, but now I'm looking at the
>>> bricks' free space and I found one brick is at around 82% while the
>>> others range from 20% to 60%.
>>>
>>> The brick with the highest utilization is hosting more qcow2 disks than
>>> the other bricks, and whenever I start a rebalance it just completes in
>>> 0 seconds without moving any data.
>>>
>>>
>>> What is the average file size in the cluster? And roughly how many
>>> files are there?
>>>
>>>
>>> What will happen when the brick becomes full?
>>>
>>> Once a brick's usage goes beyond 90%, new files won't be created on that
>>> brick, but existing files can still grow.
>>>
>>>
>>> Can I move data manually from one brick to another?
>>>
>>>
>>> Nope. It is not recommended; even though gluster will try to find the
>>> file, it may break.
>>>
>>>
>>> Why is rebalance not distributing data evenly across all bricks?
>>>
>>>
>>> Rebalance works based on the layout, so we need to see how the layouts
>>> are distributed. If one of your bricks has a higher capacity, it will
>>> have a larger layout range.
>>>
>>>
>>
>>
>>> That is correct. As Rafi said, the layout matters here. Can you please
>>> send across the rebalance logs from all 6 nodes?
>>>
>>>
>> Nodes running CentOS 7.3
>>>
>>> Gluster 3.8.11
>>>
>>>
>>> Volume info;
>>> Volume Name: ctvvols
>>> Type: Distribute
>>> Volume ID: 1ecea912-510f-4079-b437-7398e9caa0eb
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 6
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: ctv01:/vols/ctvvols
>>> Brick2: ctv02:/vols/ctvvols
>>> Brick3: ctv03:/vols/ctvvols
>>> Brick4: ctv04:/vols/ctvvols
>>> Brick5: ctv05:/vols/ctvvols
>>> Brick6: ctv06:/vols/ctvvols
>>> Options Reconfigured:
>>> nfs.disable: on
>>> performance.readdir-ahead: on
>>> transport.address-family: inet
>>> performance.quick-read: off
>>> performance.read-ahead: off
>>> performance.io-cache: off
>>> performance.stat-prefetch: off
>>> performance.low-prio-threads: 32
>>> network.remote-dio: enable
>>> cluster.eager-lock: enable
>>> cluster.quorum-type: none
>>> cluster.server-quorum-type: server
>>> cluster.data-self-heal-algorithm: full
>>> cluster.locking-scheme: granular
>>> cluster.shd-max-threads: 8
>>> cluster.shd-wait-qlength: 1
>>> features.shard: off
>>> user.cifs: off

Re: [Gluster-users] Distributed re-balance issue

2017-05-24 Thread Nithya Balachandran
On 24 May 2017 at 21:55, Mahdi Adnan  wrote:

> Hi,
>
>
> Thank you for your response.
>
> I have around 15 files; each is a 2TB qcow.
>
> One brick reached 96%, so I removed it with "brick remove" and waited until
> it went down to around 40%, then stopped the removal process with "brick
> remove stop".
>
> The issue is that brick1 drained its data to brick6 only, and when brick6
> reached around 90% I did the same thing as before and it drained the data
> back to brick1 only.
>
> Now brick6 has reached 99% and I have only a few gigabytes left, which will
> fill up in the next half hour or so.
>
> Attached are the logs for all 6 bricks.
>
Hi,

Just to clarify, did you run a rebalance (gluster volume rebalance <volname>
start) or did you only run a remove-brick?
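
(For reference, a sketch of the two command forms in question; <volname> and
the brick spec are placeholders, not the exact commands that were run here:)

    # rebalance: fix the layout and migrate data across all bricks
    gluster volume rebalance <volname> start
    gluster volume rebalance <volname> status

    # remove-brick: drain one brick's data onto the remaining bricks
    gluster volume remove-brick <volname> <host>:<brick-path> start
    gluster volume remove-brick <volname> <host>:<brick-path> status
    gluster volume remove-brick <volname> <host>:<brick-path> stop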


-- 
>
> Respectfully
> *Mahdi A. Mahdi*
>
> --
> *From:* Nithya Balachandran 
> *Sent:* Wednesday, May 24, 2017 6:45:10 PM
> *To:* Mohammed Rafi K C
> *Cc:* Mahdi Adnan; gluster-users@gluster.org
> *Subject:* Re: [Gluster-users] Distributed re-balance issue
>
>
>
> On 24 May 2017 at 20:02, Mohammed Rafi K C  wrote:
>
>>
>>
>> On 05/23/2017 08:53 PM, Mahdi Adnan wrote:
>>
>> Hi,
>>
>>
>> I have a distributed volume with 6 bricks, each with 5TB, and it's hosting
>> large qcow2 VM disks (I know it's not reliable, but it's not important data).
>>
>> I started with 5 bricks and then added another one and started the
>> rebalance process. Everything went well, but now I'm looking at the bricks'
>> free space and I found one brick is at around 82% while the others range
>> from 20% to 60%.
>>
>> The brick with the highest utilization is hosting more qcow2 disks than the
>> other bricks, and whenever I start a rebalance it just completes in 0
>> seconds without moving any data.
>>
>>
>> What is the average file size in the cluster? And roughly how many files
>> are there?
>>
>>
>> What will happen when the brick becomes full?
>>
>> Once a brick's usage goes beyond 90%, new files won't be created on that
>> brick, but existing files can still grow.
>>
>>
>> Can I move data manually from one brick to another?
>>
>>
>> Nope. It is not recommended; even though gluster will try to find the file,
>> it may break.
>>
>>
>> Why is rebalance not distributing data evenly across all bricks?
>>
>>
>> Rebalance works based on the layout, so we need to see how the layouts are
>> distributed. If one of your bricks has a higher capacity, it will have a
>> larger layout range.
>>
>>
>
>
>> That is correct. As Rafi said, the layout matters here. Can you please
>> send across the rebalance logs from all 6 nodes?
>>
>>
> Nodes running CentOS 7.3
>>
>> Gluster 3.8.11
>>
>>
>> Volume info;
>> Volume Name: ctvvols
>> Type: Distribute
>> Volume ID: 1ecea912-510f-4079-b437-7398e9caa0eb
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 6
>> Transport-type: tcp
>> Bricks:
>> Brick1: ctv01:/vols/ctvvols
>> Brick2: ctv02:/vols/ctvvols
>> Brick3: ctv03:/vols/ctvvols
>> Brick4: ctv04:/vols/ctvvols
>> Brick5: ctv05:/vols/ctvvols
>> Brick6: ctv06:/vols/ctvvols
>> Options Reconfigured:
>> nfs.disable: on
>> performance.readdir-ahead: on
>> transport.address-family: inet
>> performance.quick-read: off
>> performance.read-ahead: off
>> performance.io-cache: off
>> performance.stat-prefetch: off
>> performance.low-prio-threads: 32
>> network.remote-dio: enable
>> cluster.eager-lock: enable
>> cluster.quorum-type: none
>> cluster.server-quorum-type: server
>> cluster.data-self-heal-algorithm: full
>> cluster.locking-scheme: granular
>> cluster.shd-max-threads: 8
>> cluster.shd-wait-qlength: 1
>> features.shard: off
>> user.cifs: off
>> network.ping-timeout: 10
>> storage.owner-uid: 36
>> storage.owner-gid: 36
>>
>>
>> rebalance log:
>>
>>
>> [2017-05-23 14:45:12.637671] I [dht-rebalance.c:2866:gf_defrag_process_dir]
>> 0-ctvvols-dht: Migration operation on dir /31e0b341-4eeb-4b71-b280-840eb
>> a7d6940/images/690c728d-a83e-4c79-ac7d-1f3f17edf7f0 took 0.00 secs
>> [2017-05-23 14:45:12.640043] I [MSGID: 109081]
>> [dht-common.c:4202:dht_setxattr] 0-ctvvols-dht: fixing the layout of
>> /31e0b341-4eeb-4b71-b280-840eba7d6940/images/091402ba-dc90-
>> 4206-848a-d73e85a1cc35
>> [2017-05-23 14:45:12.641516] I [dht-rebalance.c:2652:gf_defrag_process_dir]
>> 0-ctvvols-dht: migra

Re: [Gluster-users] Distributed re-balance issue

2017-05-24 Thread Nithya Balachandran
On 24 May 2017 at 20:02, Mohammed Rafi K C  wrote:

>
>
> On 05/23/2017 08:53 PM, Mahdi Adnan wrote:
>
> Hi,
>
>
> I have a distributed volume with 6 bricks, each with 5TB, and it's hosting
> large qcow2 VM disks (I know it's not reliable, but it's not important data).
>
> I started with 5 bricks and then added another one and started the rebalance
> process. Everything went well, but now I'm looking at the bricks' free space
> and I found one brick is at around 82% while the others range from 20% to 60%.
>
> The brick with the highest utilization is hosting more qcow2 disks than the
> other bricks, and whenever I start a rebalance it just completes in 0 seconds
> without moving any data.
>
>
> What is the average file size in the cluster? And roughly how many files
> are there?
>
>
> What will happen when the brick becomes full?
>
> Once a brick's usage goes beyond 90%, new files won't be created on that
> brick, but existing files can still grow.
>
>
> Can I move data manually from one brick to another?
>
>
> Nope. It is not recommended; even though gluster will try to find the file,
> it may break.
>
>
> Why is rebalance not distributing data evenly across all bricks?
>
>
> Rebalance works based on the layout, so we need to see how the layouts are
> distributed. If one of your bricks has a higher capacity, it will have a
> larger layout range.
>
>


> That is correct. As Rafi said, the layout matters here. Can you please
> send across the rebalance logs from all 6 nodes?
>
>
Nodes running CentOS 7.3
>
> Gluster 3.8.11
>
>
> Volume info;
> Volume Name: ctvvols
> Type: Distribute
> Volume ID: 1ecea912-510f-4079-b437-7398e9caa0eb
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 6
> Transport-type: tcp
> Bricks:
> Brick1: ctv01:/vols/ctvvols
> Brick2: ctv02:/vols/ctvvols
> Brick3: ctv03:/vols/ctvvols
> Brick4: ctv04:/vols/ctvvols
> Brick5: ctv05:/vols/ctvvols
> Brick6: ctv06:/vols/ctvvols
> Options Reconfigured:
> nfs.disable: on
> performance.readdir-ahead: on
> transport.address-family: inet
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: off
> performance.low-prio-threads: 32
> network.remote-dio: enable
> cluster.eager-lock: enable
> cluster.quorum-type: none
> cluster.server-quorum-type: server
> cluster.data-self-heal-algorithm: full
> cluster.locking-scheme: granular
> cluster.shd-max-threads: 8
> cluster.shd-wait-qlength: 1
> features.shard: off
> user.cifs: off
> network.ping-timeout: 10
> storage.owner-uid: 36
> storage.owner-gid: 36
>
>
> rebalance log:
>
>
> [2017-05-23 14:45:12.637671] I [dht-rebalance.c:2866:gf_defrag_process_dir]
> 0-ctvvols-dht: Migration operation on dir /31e0b341-4eeb-4b71-b280-
> 840eba7d6940/images/690c728d-a83e-4c79-ac7d-1f3f17edf7f0 took 0.00 secs
> [2017-05-23 14:45:12.640043] I [MSGID: 109081] 
> [dht-common.c:4202:dht_setxattr]
> 0-ctvvols-dht: fixing the layout of /31e0b341-4eeb-4b71-b280-
> 840eba7d6940/images/091402ba-dc90-4206-848a-d73e85a1cc35
> [2017-05-23 14:45:12.641516] I [dht-rebalance.c:2652:gf_defrag_process_dir]
> 0-ctvvols-dht: migrate data called on /31e0b341-4eeb-4b71-b280-
> 840eba7d6940/images/091402ba-dc90-4206-848a-d73e85a1cc35
> [2017-05-23 14:45:12.642421] I [dht-rebalance.c:2866:gf_defrag_process_dir]
> 0-ctvvols-dht: Migration operation on dir /31e0b341-4eeb-4b71-b280-
> 840eba7d6940/images/091402ba-dc90-4206-848a-d73e85a1cc35 took 0.00 secs
> [2017-05-23 14:45:12.645610] I [MSGID: 109081] 
> [dht-common.c:4202:dht_setxattr]
> 0-ctvvols-dht: fixing the layout of /31e0b341-4eeb-4b71-b280-
> 840eba7d6940/images/be1e2276-d38f-4d90-abf5-de757dd04078
> [2017-05-23 14:45:12.647034] I [dht-rebalance.c:2652:gf_defrag_process_dir]
> 0-ctvvols-dht: migrate data called on /31e0b341-4eeb-4b71-b280-
> 840eba7d6940/images/be1e2276-d38f-4d90-abf5-de757dd04078
> [2017-05-23 14:45:12.647589] I [dht-rebalance.c:2866:gf_defrag_process_dir]
> 0-ctvvols-dht: Migration operation on dir /31e0b341-4eeb-4b71-b280-
> 840eba7d6940/images/be1e2276-d38f-4d90-abf5-de757dd04078 took 0.00 secs
> [2017-05-23 14:45:12.653291] I [dht-rebalance.c:3838:gf_defrag_start_crawl]
> 0-DHT: crawling file-system completed
> [2017-05-23 14:45:12.653323] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 23
> [2017-05-23 14:45:12.653508] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 24
> [2017-05-23 14:45:12.653536] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 25
> [2017-05-23 14:45:12.653556] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 26
> [2017-05-23 14:45:12.653580] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 27
> [2017-05-23 14:45:12.653603] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 28
> [2017-05-23 14:45:12.653623] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DH

Re: [Gluster-users] Distributed re-balance issue

2017-05-24 Thread Mohammed Rafi K C


On 05/23/2017 08:53 PM, Mahdi Adnan wrote:
>
> Hi,
>
>
> I have a distributed volume with 6 bricks, each with 5TB, and it's
> hosting large qcow2 VM disks (I know it's not reliable, but it's
> not important data).
>
> I started with 5 bricks and then added another one and started the
> rebalance process. Everything went well, but now I'm looking at the
> bricks' free space and I found one brick is at around 82% while the
> others range from 20% to 60%.
>
> The brick with the highest utilization is hosting more qcow2 disks
> than the other bricks, and whenever I start a rebalance it just
> completes in 0 seconds without moving any data.
>

What is the average file size in the cluster? And roughly how many files
are there?
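
(A rough way to gather those numbers directly on a brick; the brick path is
taken from the volume info below, and skipping the internal .glusterfs tree
is my assumption about what is being asked:)

    # number of regular files stored on this brick
    find /vols/ctvvols -type f ! -path '*/.glusterfs/*' | wc -l

    # total data on the brick; divide by the count above for a rough average size
    du -sh --exclude=.glusterfs /vols/ctvvols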


> What will happen when the brick becomes full?
>
Once a brick's usage goes beyond 90%, new files won't be created on that
brick, but existing files can still grow.
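
(If this threshold is the cluster.min-free-disk DHT option, which I believe
but have not verified on 3.8.11, it can be checked and tuned with the usual
volume get/set commands; treat the option name and value as assumptions:)

    # show the current reserve below which DHT avoids creating new files on a brick
    gluster volume get ctvvols cluster.min-free-disk

    # reserve 20% instead, so new files stop landing on bricks that are over 80% used
    gluster volume set ctvvols cluster.min-free-disk 20%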


> Can I move data manually from one brick to another?
>

Nope. It is not recommended; even though gluster will try to find the
file, it may break.


> Why is rebalance not distributing data evenly across all bricks?
>

Rebalance works based on the layout, so we need to see how the layouts are
distributed. If one of your bricks has a higher capacity, it will have a
larger layout range.
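
(The layout itself can be inspected on the bricks: DHT stores each directory's
hash range in the trusted.glusterfs.dht extended attribute, so comparing it
across the six bricks shows whether one brick holds a disproportionate range.
The directory below is just an example path taken from the rebalance log in
this thread:)

    # run on every node, against the same directory on each brick
    getfattr -n trusted.glusterfs.dht -e hex \
        /vols/ctvvols/31e0b341-4eeb-4b71-b280-840eba7d6940/images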

>
> Nodes running CentOS 7.3
>
> Gluster 3.8.11
>
>
> Volume info;
>
> Volume Name: ctvvols
> Type: Distribute
> Volume ID: 1ecea912-510f-4079-b437-7398e9caa0eb
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 6
> Transport-type: tcp
> Bricks:
> Brick1: ctv01:/vols/ctvvols
> Brick2: ctv02:/vols/ctvvols
> Brick3: ctv03:/vols/ctvvols
> Brick4: ctv04:/vols/ctvvols
> Brick5: ctv05:/vols/ctvvols
> Brick6: ctv06:/vols/ctvvols
> Options Reconfigured:
> nfs.disable: on
> performance.readdir-ahead: on
> transport.address-family: inet
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: off
> performance.low-prio-threads: 32
> network.remote-dio: enable
> cluster.eager-lock: enable
> cluster.quorum-type: none
> cluster.server-quorum-type: server
> cluster.data-self-heal-algorithm: full
> cluster.locking-scheme: granular
> cluster.shd-max-threads: 8
> cluster.shd-wait-qlength: 1
> features.shard: off
> user.cifs: off
> network.ping-timeout: 10
> storage.owner-uid: 36
> storage.owner-gid: 36
>
>
> rebalance log:
>
>
> [2017-05-23 14:45:12.637671] I
> [dht-rebalance.c:2866:gf_defrag_process_dir] 0-ctvvols-dht: Migration
> operation on dir
> /31e0b341-4eeb-4b71-b280-840eba7d6940/images/690c728d-a83e-4c79-ac7d-1f3f17edf7f0
> took 0.00 secs
> [2017-05-23 14:45:12.640043] I [MSGID: 109081]
> [dht-common.c:4202:dht_setxattr] 0-ctvvols-dht: fixing the layout of
> /31e0b341-4eeb-4b71-b280-840eba7d6940/images/091402ba-dc90-4206-848a-d73e85a1cc35
> [2017-05-23 14:45:12.641516] I
> [dht-rebalance.c:2652:gf_defrag_process_dir] 0-ctvvols-dht: migrate
> data called on
> /31e0b341-4eeb-4b71-b280-840eba7d6940/images/091402ba-dc90-4206-848a-d73e85a1cc35
> [2017-05-23 14:45:12.642421] I
> [dht-rebalance.c:2866:gf_defrag_process_dir] 0-ctvvols-dht: Migration
> operation on dir
> /31e0b341-4eeb-4b71-b280-840eba7d6940/images/091402ba-dc90-4206-848a-d73e85a1cc35
> took 0.00 secs
> [2017-05-23 14:45:12.645610] I [MSGID: 109081]
> [dht-common.c:4202:dht_setxattr] 0-ctvvols-dht: fixing the layout of
> /31e0b341-4eeb-4b71-b280-840eba7d6940/images/be1e2276-d38f-4d90-abf5-de757dd04078
> [2017-05-23 14:45:12.647034] I
> [dht-rebalance.c:2652:gf_defrag_process_dir] 0-ctvvols-dht: migrate
> data called on
> /31e0b341-4eeb-4b71-b280-840eba7d6940/images/be1e2276-d38f-4d90-abf5-de757dd04078
> [2017-05-23 14:45:12.647589] I
> [dht-rebalance.c:2866:gf_defrag_process_dir] 0-ctvvols-dht: Migration
> operation on dir
> /31e0b341-4eeb-4b71-b280-840eba7d6940/images/be1e2276-d38f-4d90-abf5-de757dd04078
> took 0.00 secs
> [2017-05-23 14:45:12.653291] I
> [dht-rebalance.c:3838:gf_defrag_start_crawl] 0-DHT: crawling
> file-system completed
> [2017-05-23 14:45:12.653323] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 23
> [2017-05-23 14:45:12.653508] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 24
> [2017-05-23 14:45:12.653536] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 25
> [2017-05-23 14:45:12.653556] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 26
> [2017-05-23 14:45:12.653580] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 27
> [2017-05-23 14:45:12.653603] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 28
> [2017-05-23 14:45:12.653623] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 29
> [2017-05-23 14:45:12.653638] I [dht-rebalance.c:2246:gf_defrag_task]
> 0-DHT: Thread wokeup. defrag->current_thread_count: 30
> [2017-05-23 14:45:12.653659]