Re: [Gluster-devel] Gluster 5.10 rebalance stuck

2022-11-16 Thread Shreyansh Shah
Hi Gluster Devs,
Any leads on the above? We are still stuck at the moment.

On Mon, Nov 7, 2022 at 2:13 PM Strahil Nikolov 
wrote:

> Hi Dev list,
>
> How can I find the details about the rebalance_status/status IDs? Is it
> actually normal that some systems are in '4' and others in '3'?
>
> Is it safe to forcefully start a new rebalance?
>
> Best Regards,
> Strahil Nikolov
>
> On Mon, Nov 7, 2022 at 9:15, Shreyansh Shah
>  wrote:
> Hi Strahil,
> Adding the info below:
>
> --
> Node IP = 10.132.0.19
> rebalance_status=1
> status=4
> rebalance_op=19
> rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
> rebalanced-files=27054
> size=7104425578505
> scanned=72141
> failures=10
> skipped=19611
> run-time=92805.00
> --
> Node IP = 10.132.0.20
> rebalance_status=1
> status=4
> rebalance_op=19
> rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
> rebalanced-files=23945
> size=7126809216060
> scanned=71208
> failures=7
> skipped=18834
> run-time=94029.00
> --
> Node IP = 10.132.1.12
> rebalance_status=1
> status=4
> rebalance_op=19
> rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
> rebalanced-files=12533
> size=12945021256
> scanned=40398
> failures=14
> skipped=1194
> run-time=92201.00
> --
> Node IP = 10.132.1.13
> rebalance_status=1
> status=3
> rebalance_op=19
> rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
> rebalanced-files=41483
> size=8845076025598
> scanned=179920
> failures=25
> skipped=62373
> run-time=130017.00
> --
> Node IP = 10.132.1.14
> rebalance_status=1
> status=3
> rebalance_op=19
> rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
> rebalanced-files=43603
> size=7834691799355
> scanned=204140
> failures=2878
> skipped=87761
> run-time=130016.00
> --
> Node IP = 10.132.1.15
> rebalance_status=1
> status=4
> rebalance_op=19
> rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
> rebalanced-files=29968
> size=6389568855140
> scanned=69320
> failures=7
> skipped=17999
> run-time=93654.00
> --
> Node IP = 10.132.1.16
> rebalance_status=1
> status=4
> rebalance_op=19
> rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
> rebalanced-files=23226
> size=5899338197718
> scanned=56169
> failures=7
> skipped=12659
> run-time=94030.00
> --
> Node IP = 10.132.1.17
> rebalance_status=1
> status=4
> rebalance_op=19
> rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
> rebalanced-files=17538
> size=6247281008602
> scanned=50038
> failures=8
> skipped=11335
> run-time=92203.00
> --
> Node IP = 10.132.1.18
> rebalance_status=1
> status=4
> rebalance_op=19
> rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
> rebalanced-files=20394
> size=6395008466977
> scanned=50060
> failures=7
> skipped=13784
> run-time=92103.00
> --
> Node IP = 10.132.1.19
> rebalance_status=1
> status=1
> rebalance_op=19
> rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
> rebalanced-files=0
> size=0
> scanned=0
> failures=0
> skipped=0
> run-time=0.00
> --
> Node IP = 10.132.1.20
> rebalance_status=1
> status=3
> rebalance_op=19
> rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
> rebalanced-files=0
> size=0
> scanned=24
> failures=0
> skipped=2
> run-time=1514.00
>
> On Thu, Nov 3, 2022 at 10:10 PM Strahil Nikolov 
> wrote:
>
> And the other servers?
>
> On Thu, Nov 3, 2022 at 16:21, Shreyansh Shah
>  wrote:
> Hi Strahil,
> Thank you for your reply. node_state.info contains the data below:
>
> root@gluster-11:/usr/var/lib/glusterd/vols/data# cat node_state.info
> rebalance_status=1
> status=3
> rebalance_op=19
> rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
> rebalanced-files=0
> size=0
> scanned=24
> failures=0
> skipped=2
> run-time=1514.00
>
>
>
>
> On Thu, Nov 3, 2022 at 4:00 PM Strahil Nikolov 
> wrote:
>
> I would check the details in /var/lib/glusterd/vols//node_state.info
>
> Best Regards,
> Strahil Nikolov
>
> On Wed, Nov 2, 2022 at 9:06, Shreyansh Shah
>  wrote:
> Hi,
> I would really appreciate it if someone could help with the above issue. We
> are stuck: we cannot run a rebalance because of this, and the unbalanced data
> keeps us from getting peak performance out of the setup.
> Adding the gluster volume info (without the bricks) below. Please let me know
> if any other details/logs are needed.
>
> Volume Name: data
> Type: Distribute
> Volume ID: 75410231-bb25-4f14-bcde-caf18fce1d31
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 41
> Transport-type: tcp
> Options Reconfigured:
> server.event-threads: 4
> network.ping-timeout: 90
> client.keepalive-time: 60
> server.keepalive-time: 60
> storage.health-check-interval: 60
> performance.client-io-threads: on
> 

Re: [Gluster-devel] Gluster 5.10 rebalance stuck

2022-11-07 Thread Strahil Nikolov
Hi Dev list,
How can I find the details about the rebalance_status/status IDs? Is it
actually normal that some systems are in '4' and others in '3'?
Is it safe to forcefully start a new rebalance?
Best Regards,
Strahil Nikolov
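For reference, and strictly as an assumption: the numeric codes in node_state.info look like they come from the gf_defrag_status_t enum in the glusterfs sources, which would make 0 = not started, 1 = started, 2 = stopped, 3 = complete and 4 = failed. That mapping is unverified against the 5.10 tree. A minimal shell sketch to decode the field on one node under that assumption:

# Minimal sketch, assuming the 0-4 status mapping above and the default
# glusterd path (one node in this thread shows a /usr/var/lib prefix instead).
s=$(awk -F= '/^status=/{print $2}' /var/lib/glusterd/vols/data/node_state.info)
case "$s" in
  0) echo "not started" ;;
  1) echo "started" ;;
  2) echo "stopped" ;;
  3) echo "complete" ;;
  4) echo "failed" ;;
  *) echo "unknown ($s)" ;;
esac

If that mapping holds, most nodes in the dump quoted below ended in 'complete' or 'failed', while 10.132.1.19 still shows 'started' with zero activity; that reading needs confirmation from the dev side.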
 
On Mon, Nov 7, 2022 at 9:15, Shreyansh Shah wrote:
Hi Strahil,
Adding the info below:

--
Node IP = 10.132.0.19
rebalance_status=1
status=4
rebalance_op=19
rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
rebalanced-files=27054
size=7104425578505
scanned=72141
failures=10
skipped=19611
run-time=92805.00
--
Node IP = 10.132.0.20
rebalance_status=1
status=4
rebalance_op=19
rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
rebalanced-files=23945
size=7126809216060
scanned=71208
failures=7
skipped=18834
run-time=94029.00
--
Node IP = 10.132.1.12
rebalance_status=1
status=4
rebalance_op=19
rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
rebalanced-files=12533
size=12945021256
scanned=40398
failures=14
skipped=1194
run-time=92201.00
--
Node IP = 10.132.1.13
rebalance_status=1
status=3
rebalance_op=19
rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
rebalanced-files=41483
size=8845076025598
scanned=179920
failures=25
skipped=62373
run-time=130017.00
--
Node IP = 10.132.1.14
rebalance_status=1
status=3
rebalance_op=19
rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
rebalanced-files=43603
size=7834691799355
scanned=204140
failures=2878
skipped=87761
run-time=130016.00
--
Node IP = 10.132.1.15
rebalance_status=1
status=4
rebalance_op=19
rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
rebalanced-files=29968
size=6389568855140
scanned=69320
failures=7
skipped=17999
run-time=93654.00
--
Node IP = 10.132.1.16
rebalance_status=1
status=4
rebalance_op=19
rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
rebalanced-files=23226
size=5899338197718
scanned=56169
failures=7
skipped=12659
run-time=94030.00
--
Node IP = 10.132.1.17
rebalance_status=1
status=4
rebalance_op=19
rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
rebalanced-files=17538
size=6247281008602
scanned=50038
failures=8
skipped=11335
run-time=92203.00
--
Node IP = 10.132.1.18
rebalance_status=1
status=4
rebalance_op=19
rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
rebalanced-files=20394
size=6395008466977
scanned=50060
failures=7
skipped=13784
run-time=92103.00
--
Node IP = 10.132.1.19
rebalance_status=1
status=1
rebalance_op=19
rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
rebalanced-files=0
size=0
scanned=0
failures=0
skipped=0
run-time=0.00
--
Node IP = 10.132.1.20
rebalance_status=1
status=3
rebalance_op=19
rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
rebalanced-files=0
size=0
scanned=24
failures=0
skipped=2
run-time=1514.00

On Thu, Nov 3, 2022 at 10:10 PM Strahil Nikolov  wrote:

And the other servers?

On Thu, Nov 3, 2022 at 16:21, Shreyansh Shah wrote:
Hi Strahil,
Thank you for your reply. node_state.info contains the data below:


root@gluster-11:/usr/var/lib/glusterd/vols/data# cat node_state.info
rebalance_status=1
status=3
rebalance_op=19
rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
rebalanced-files=0
size=0
scanned=24
failures=0
skipped=2
run-time=1514.00



On Thu, Nov 3, 2022 at 4:00 PM Strahil Nikolov  wrote:

I would check the details in /var/lib/glusterd/vols//node_state.info
Best Regards,
Strahil Nikolov
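If it helps to compare nodes side by side, here is a small sketch for pulling that file from every peer. The host list is only an example built from the IPs in this thread, and the path may differ per node (one node here shows a /usr/var/lib prefix), so adjust both:

# Example only: collect node_state.info from each peer over SSH for comparison.
# Replace the host list and the path with the real ones for your cluster.
for h in 10.132.0.19 10.132.0.20 10.132.1.12; do
    echo "== $h =="
    ssh "$h" cat /var/lib/glusterd/vols/data/node_state.info
done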
 
 
On Wed, Nov 2, 2022 at 9:06, Shreyansh Shah wrote:
Hi,
I would really appreciate it if someone could help with the above issue. We
are stuck: we cannot run a rebalance because of this, and the unbalanced data
keeps us from getting peak performance out of the setup.
Adding the gluster volume info (without the bricks) below. Please let me know
if any other details/logs are needed.


Volume Name: data
Type: Distribute
Volume ID: 75410231-bb25-4f14-bcde-caf18fce1d31
Status: Started
Snapshot Count: 0
Number of Bricks: 41
Transport-type: tcp
Options Reconfigured:
server.event-threads: 4
network.ping-timeout: 90
client.keepalive-time: 60
server.keepalive-time: 60
storage.health-check-interval: 60
performance.client-io-threads: on
nfs.disable: on
transport.address-family: inet
performance.cache-size: 8GB
performance.cache-refresh-timeout: 60
cluster.min-free-disk: 3%
client.event-threads: 4
performance.io-thread-count: 16
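
Alongside the on-disk node_state.info files, the same state can be queried from the CLI. A quick sketch, using the volume name 'data' from the info above:

# CLI-side view of the same rebalance, for the 'data' volume shown above;
# these complement the node_state.info files discussed earlier in the thread.
gluster volume rebalance data status
gluster volume status data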


On Fri, Oct 28, 2022 at 11:40 AM Shreyansh Shah  
wrote:

Hi,
We are running a glusterfs 5.10 server volume. We recently added a few new
bricks and started a rebalance operation. After a couple of days the rebalance
operation was simply stuck, with one of the peers showing In-Progress with no
file being