Re: [Gluster-devel] Gluster 5.10 rebalance stuck
Hi Gluster Devs,

Any leads on the above? We are kinda stuck at the moment.

On Mon, Nov 7, 2022 at 2:13 PM Strahil Nikolov wrote:

Hi Dev list,

How can I find the details about the rebalance_status/status ids? Is it actually normal that some systems are in '4' while others are in '3'?

Is it safe to forcefully start a new rebalance?

Best Regards,
Strahil Nikolov

On Mon, Nov 7, 2022 at 9:15, Shreyansh Shah wrote:

Hi Strahil,
Adding the info below. All nodes report rebalance_status=1, rebalance_op=19, and rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f; the remaining fields are:

Node IP        status  rebalanced-files  size            scanned  failures  skipped  run-time
10.132.0.19    4       27054             7104425578505   72141    10        19611    92805.00
10.132.0.20    4       23945             7126809216060   71208    7         18834    94029.00
10.132.1.12    4       12533             12945021256     40398    14        1194     92201.00
10.132.1.13    3       41483             8845076025598   179920   25        62373    130017.00
10.132.1.14    3       43603             7834691799355   204140   2878      87761    130016.00
10.132.1.15    4       29968             6389568855140   69320    7         17999    93654.00
10.132.1.16    4       23226             5899338197718   56169    7         12659    94030.00
10.132.1.17    4       17538             6247281008602   50038    8         11335    92203.00
10.132.1.18    4       20394             6395008466977   50060    7         13784    92103.00
10.132.1.19    1       0                 0               0        0         0        0.00
10.132.1.20    3       0                 0               24       0         2        1514.00

On Thu, Nov 3, 2022 at 10:10 PM Strahil Nikolov wrote:

And the other servers?

On Thu, Nov 3, 2022 at 16:21, Shreyansh Shah wrote:

Hi Strahil,
Thank you for your reply. node_state.info has the below data:

root@gluster-11:/usr/var/lib/glusterd/vols/data# cat node_state.info
rebalance_status=1
status=3
rebalance_op=19
rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
rebalanced-files=0
size=0
scanned=24
failures=0
skipped=2
run-time=1514.00

On Thu, Nov 3, 2022 at 4:00 PM Strahil Nikolov wrote:

I would check the details in /var/lib/glusterd/vols//node_state.info

Best Regards,
Strahil Nikolov

On Wed, Nov 2, 2022 at 9:06, Shreyansh Shah wrote:

Hi,
I would really appreciate it if someone would be able to help on the above issue. We are stuck as we cannot run rebalance due to this, and thus are not able to extract peak performance from the setup due to unbalanced data. Adding gluster info (without the bricks) below. Please let me know if any other details/logs are needed.

Volume Name: data
Type: Distribute
Volume ID: 75410231-bb25-4f14-bcde-caf18fce1d31
Status: Started
Snapshot Count: 0
Number of Bricks: 41
Transport-type: tcp
Options Reconfigured:
server.event-threads: 4
network.ping-timeout: 90
client.keepalive-time: 60
server.keepalive-time: 60
storage.health-check-interval: 60
performance.client-io-threads: on
nfs.disable: on
transport.address-family: inet
performance.cache-size: 8GB
performance.cache-refresh-timeout: 60
cluster.min-free-disk: 3%
client.event-threads: 4
performance.io-thread-count: 16
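[Editor's note] The per-node fields above are exactly what each node's node_state.info contains, so they are easy to decode programmatically. Below is a minimal sketch of a parser for that file, with the status-id names taken from the ordering of the gf_defrag_status_t enum in the glusterfs sources (0 = not started, 1 = started, 2 = stopped, 3 = complete, 4 = failed); treat that mapping as an assumption to verify against the exact 5.10 source tree before relying on it.

```python
# Parse a glusterd node_state.info file and decode the rebalance status id.
# The id -> name mapping follows the gf_defrag_status_t enum order as found
# in the glusterfs sources; verify it against your exact release (assumption).
DEFRAG_STATUS = {
    0: "not started",
    1: "started (in progress)",
    2: "stopped",
    3: "complete",
    4: "failed",
}

def parse_node_state(text: str) -> dict:
    """Turn 'key=value' lines into a dict, keeping values as strings."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" in line:
            key, _, value = line.partition("=")
            info[key] = value
    return info

# Sample taken verbatim from gluster-11's node_state.info in this thread.
sample = """\
rebalance_status=1
status=3
rebalance_op=19
rebalance-id=39a89b51-2549-4348-aa47-0db321c3a32f
rebalanced-files=0
size=0
scanned=24
failures=0
skipped=2
run-time=1514.00
"""

state = parse_node_state(sample)
print(DEFRAG_STATUS.get(int(state["status"]), "unknown"))  # prints "complete"
```

Under that assumed mapping, status=3 vs. status=4 on different nodes would mean some nodes finished their share of the rebalance while others failed, which is a real divergence rather than cosmetic noise.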
On Fri, Oct 28, 2022 at 11:40 AM Shreyansh Shah wrote:

Hi,
We are running a glusterfs 5.10 server volume. Recently we added a few new bricks and started a rebalance operation. After a couple of days the rebalance operation was just stuck, with one of the peers showing In-Progress with no file being
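[Editor's note] The stuck peer is easy to spot by grouping the nodes by their reported status id. A small sketch using the figures from this thread, with the id names assumed from the gf_defrag_status_t enum ordering in the glusterfs sources (1 = started, 3 = complete, 4 = failed; verify against the exact 5.10 tree):

```python
# Group the per-node rebalance status ids reported in this thread.
# The id -> name mapping is an assumption based on the gf_defrag_status_t
# enum order in the glusterfs sources; confirm it for your release.
STATUS_NAME = {1: "started", 3: "complete", 4: "failed"}

# Status ids copied from the per-node data posted on Nov 7.
node_status = {
    "10.132.0.19": 4, "10.132.0.20": 4, "10.132.1.12": 4,
    "10.132.1.13": 3, "10.132.1.14": 3, "10.132.1.15": 4,
    "10.132.1.16": 4, "10.132.1.17": 4, "10.132.1.18": 4,
    "10.132.1.19": 1, "10.132.1.20": 3,
}

by_status: dict[str, list[str]] = {}
for ip, sid in node_status.items():
    by_status.setdefault(STATUS_NAME.get(sid, f"unknown({sid})"), []).append(ip)

for name, ips in sorted(by_status.items()):
    print(f"{name}: {len(ips)} node(s) -> {', '.join(sorted(ips))}")
```

Under that assumed mapping, 10.132.1.19 is the only node still in "started", which lines up with the single peer stuck showing In-Progress in the original Oct 28 report.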