> Hi,
>
> Rebalance will abort itself if it cannot reach any of the nodes. Are all
> the bricks still up and reachable?
>
> Regards,
> Nithya
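For reference, the standard read-only checks for brick and peer health are the two commands below; "tank" is the volume name used throughout this thread:

# gluster volume status tank
# gluster peer status

The first lists each brick's port, PID and Online state; the second shows whether the other node is still reported as "Peer in Cluster (Connected)".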
Yes the bricks appear to be fine. I restarted the rebalance and the process
is moving along again:

# gluster vol rebalance tank status
         Node  Rebalanced-files     size   scanned  failures  skipped       status  run time in h:m:s
    ---------       -----------  -------  --------  --------  -------  -----------  -----------------
    localhost            226973   14.9TB   1572952         0        0  in progress           44:26:48
      serverB                 0   0Bytes    631667         0        0    completed            37:2:14
volume rebalance: tank: success

# df -hP |grep data
/dev/mapper/gluster_vg-gluster_lv1_data   60T   24T   36T  40% /gluster_bricks/data1
/dev/mapper/gluster_vg-gluster_lv2_data   60T   24T   36T  40% /gluster_bricks/data2
/dev/mapper/gluster_vg-gluster_lv3_data   60T   17T   43T  29% /gluster_bricks/data3
/dev/mapper/gluster_vg-gluster_lv4_data   60T   17T   43T  29% /gluster_bricks/data4
/dev/mapper/gluster_vg-gluster_lv5_data   60T   19T   41T  31% /gluster_bricks/data5
/dev/mapper/gluster_vg-gluster_lv6_data   60T   19T   41T  31% /gluster_bricks/data6

Thanks,

HB
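For anyone following a long-running rebalance like this one, a simple way to watch progress is to poll the same status command; the 60-second interval below is just an example:

# watch -n 60 'gluster vol rebalance tank status'

The per-node rebalance log (normally /var/log/glusterfs/tank-rebalance.log) is also worth tailing for warnings while the migration runs.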
>>
>> # gluster vol rebalance tank status
>>          Node  Rebalanced-files     size   scanned  failures  skipped     status  run time in h:m:s
>>     ---------       -----------  -------  --------  --------  -------  ---------  -----------------
>>     localhost           1348706   57.8TB   2234439         9        6     failed           190:24:3
>>       serverB                 0   0Bytes         7         0        0  completed           63:47:55
>> volume rebalance: tank: success
>>
>> # gluster vol status tank
>> Status of volume: tank
>> Gluster process                              TCP Port  RDMA Port  Online  Pid
>> ------------------------------------------------------------------------------
>> Brick serverA:/gluster_bricks/data1          49162     0          Y       20318
>> Brick serverB:/gluster_bricks/data1          49166     0          Y       3432
>> Brick serverA:/gluster_bricks/data2          49163     0          Y       20323
>> Brick serverB:/gluster_bricks/data2          49167     0          Y       3435
>> Brick serverA:/gluster_bricks/data3          49164     0          Y       4625
>> Brick serverA:/gluster_bricks/data4          49165     0          Y       4644
>> Brick serverA:/gluster_bricks/data5          49166     0          Y       5088
>> Brick serverA:/gluster_bricks/data6          49167     0          Y       5128
>> Brick serverB:/gluster_bricks/data3          49168     0          Y       22314
>> Brick serverB:/gluster_bricks/data4          49169     0          Y       22345
>> Brick serverB:/gluster_bricks/data5          49170     0          Y       22889
>> Brick serverB:/gluster_bricks/data6          49171     0          Y       22932
>> Self-heal Daemon on localhost                N/A       N/A        Y       6202
>> Self-heal Daemon on serverB                  N/A       N/A        Y       22981
>>
>> Task Status of Volume tank
>> ------------------------------------------------------------------------------
>> Task                 : Rebalance
>> ID                   : eec64343-8e0d-4523-ad05-5678f9eb9eb2
>> Status               : failed
>>
>> # df -hP |grep data
>> /dev/mapper/gluster_vg-gluster_lv1_data   60T   31T   29T  52% /gluster_bricks/data1
>> /dev/mapper/gluster_vg-gluster_lv2_data   60T   31T   29T  51% /gluster_bricks/data2
>> /dev/mapper/gluster_vg-gluster_lv3_data   60T   15T   46T  24% /gluster_bricks/data3
>> /dev/mapper/gluster_vg-gluster_lv4_data   60T   15T   46T  24% /gluster_bricks/data4
>> /dev/mapper/gluster_vg-gluster_lv5_data   60T   15T   45T  25% /gluster_bricks/data5
>> /dev/mapper/gluster_vg-gluster_lv6_data   60T   15T   45T  25% /gluster_bricks/data6
>>
>> The rebalance log on serverA shows a disconnect from serverB:
>>
>> [2019-09-08 15:41:44.285591] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-tank-client-10: server <serverB>:49170 has not responded in the last 42 seconds, disconnecting.
>> [2019-09-08 15:41:44.285739] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-tank-client-10: disconnected from tank-client-10. Client process will keep trying to connect to glusterd until brick's port is available
>> [2019-09-08 15:41:44.286023] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7ff986e8b132] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7ff986c5299e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7ff986c52aae] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7ff986c54220] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2b0)[0x7ff986c54ce0] ))))) 0-tank-client-10: forced unwinding frame type(GlusterFS 3.3) op(FXATTROP(34)) called at 2019-09-08 15:40:44.040333 (xid=0x7f8cfac)
>>
>> Does this type of failure cause data corruption? What is the best course
>> of action at this point?
>>
>> Thanks,
>>
>> HB
>>
>> On Wed, Sep 11, 2019 at 11:58 PM Strahil <hunter86...@yahoo.com> wrote:
>>
>>> Hi Nithya,
>>>
>>> Thanks for the detailed explanation.
>>> It makes sense.
>>>
>>> Best Regards,
>>> Strahil Nikolov
>>>
>>> On Sep 12, 2019 08:18, Nithya Balachandran <nbala...@redhat.com> wrote:
>>>
>>> On Wed, 11 Sep 2019 at 09:47, Strahil <hunter86...@yahoo.com> wrote:
>>>
>>> Hi Nithya,
>>>
>>> I was just reminded of your previous e-mail, which left me with the
>>> impression that old volumes need that. This is the one I mean:
>>>
>>> > It looks like this is a replicate volume. If that is the case then yes,
>>> > you are running an old version of Gluster for which this was the
>>> > default behaviour.
>>> >
>>> > Regards,
>>> > Nithya
>>>
>>> Hi Strahil,
>>>
>>> I'm providing a little more detail here which I hope will explain things.
>>> Rebalance has always been a volume-wide operation - a *rebalance start*
>>> operation starts rebalance processes on all nodes of the volume. However,
>>> the processes behave differently. In earlier releases, all nodes would
>>> crawl the bricks and update the directory layouts, but only one node in
>>> each replica/disperse set would actually migrate files, so the rebalance
>>> status would only show one node doing any "work" (scanning, rebalancing,
>>> etc.). That one node processes all the files in its replica sets.
>>> Rerunning rebalance on other nodes makes no difference, as it will always
>>> be the same node that ends up migrating files.
>>> So, for instance, for a replicate volume with server1:/brick1,
>>> server2:/brick2 and server3:/brick3 in that order, only the rebalance
>>> process on server1 would migrate files. In newer releases, all 3 nodes
>>> would migrate files.
>>>
>>> The rebalance status does not capture the directory operations of fixing
>>> layouts, which is why it looks like the other nodes are not doing anything.
>>>
>>> Hope this helps.
>>>
>>> Regards,
>>> Nithya
>>>
>>> Best Regards,
>>> Strahil Nikolov
>>>
>>> On Sep 9, 2019 06:36, Nithya Balachandran <nbala...@redhat.com> wrote:
>>>
>>> On Sat, 7 Sep 2019 at 00:03, Strahil Nikolov <hunter86...@yahoo.com> wrote:
>>>
>>> As it was mentioned, you might have to run rebalance on the other node -
>>> but it is better to wait until this node is finished.
>>>
>>> Hi Strahil,
>>>
>>> Rebalance does not need to be run on the other node - the operation is a
>>> volume-wide one. Only a single node per replica set would migrate files
>>> in the version used in this case.
>>>
>>> Regards,
>>> Nithya
>>>
>>> Best Regards,
>>> Strahil Nikolov
>>>
>>> On Friday, 6 September 2019 at 15:29:20 GMT+3, Herb Burnswell <
>>> herbert.burnsw...@gmail.com>
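A closing note on the "has not responded in the last 42 seconds" message quoted above: 42 seconds is Gluster's default network.ping-timeout, so the disconnect simply means the brick on serverB stopped answering RPC pings for that long. The current value can be inspected, and raised if the link between the nodes is known to stall, with the commands below (60 is just an example value):

# gluster volume get tank network.ping-timeout
# gluster volume set tank network.ping-timeout 60

Raising the timeout only papers over short network stalls; the underlying connectivity problem between serverA and serverB still needs to be tracked down.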