If I were you, I would follow these steps: stop the rebalance and fix the cluster health first. Bring up the down server, replace server4:brick4 with a new disk, format it and make sure the brick is started, then start a full heal. A full heal will not start unless all bricks are up. Then you can continue with the rebalance.
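Roughly, something like the sketch below (I am assuming the volume name "home" and the brick mount point /data/glusterfs/home/brick4 from your status output; the device name and the xfs filesystem are only placeholders for whatever your bricks actually use):

  # from any server: stop the running rebalance first
  gluster volume rebalance home stop

  # on server4, after swapping the failed disk
  mkfs.xfs /dev/sdX                             # placeholder device and filesystem, adjust to your setup
  mount /dev/sdX /data/glusterfs/home/brick4    # remount the brick at its old path

  # bring server2 back online as well, then from any server:
  gluster volume start home force               # respawns the brick process on the empty brick
  gluster volume status home                    # every brick should now show Online "Y"
  gluster volume heal home full
  gluster volume heal home info                 # monitor healing progress
  gluster volume rebalance home start           # resume rebalancing once the heal is done

Since the brick path does not change, reset-brick is not needed; "start ... force" is enough to bring the new empty brick back into the volume.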
On Fri, Feb 2, 2018 at 1:27 PM, Alessandro Ipe <alessandro....@meteo.be> wrote:
> Hi,
>
>
> I simplified the config in my first email, but I actually have 2x4 servers in
> replicate-distribute with each 4 bricks for 6 of them and 2 bricks for the
> remaining 2. Full healing will just take ages... for a just single brick to
> resync !
>
>> gluster v status home
> volume status home
> Status of volume: home
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick server1:/data/glusterfs/home/brick1   49157     0          Y       5003
> Brick server1:/data/glusterfs/home/brick2   49153     0          Y       5023
> Brick server1:/data/glusterfs/home/brick3   49154     0          Y       5004
> Brick server1:/data/glusterfs/home/brick4   49155     0          Y       5011
> Brick server3:/data/glusterfs/home/brick1   49152     0          Y       5422
> Brick server4:/data/glusterfs/home/brick1   49152     0          Y       5019
> Brick server3:/data/glusterfs/home/brick2   49153     0          Y       5429
> Brick server4:/data/glusterfs/home/brick2   49153     0          Y       5033
> Brick server3:/data/glusterfs/home/brick3   49154     0          Y       5437
> Brick server4:/data/glusterfs/home/brick3   49154     0          Y       5026
> Brick server3:/data/glusterfs/home/brick4   49155     0          Y       5444
> Brick server4:/data/glusterfs/home/brick4   N/A       N/A        N       N/A
> Brick server5:/data/glusterfs/home/brick1   49152     0          Y       5275
> Brick server6:/data/glusterfs/home/brick1   49152     0          Y       5786
> Brick server5:/data/glusterfs/home/brick2   49153     0          Y       5276
> Brick server6:/data/glusterfs/home/brick2   49153     0          Y       5792
> Brick server5:/data/glusterfs/home/brick3   49154     0          Y       5282
> Brick server6:/data/glusterfs/home/brick3   49154     0          Y       5794
> Brick server5:/data/glusterfs/home/brick4   49155     0          Y       5293
> Brick server6:/data/glusterfs/home/brick4   49155     0          Y       5806
> Brick server7:/data/glusterfs/home/brick1   49156     0          Y       22339
> Brick server8:/data/glusterfs/home/brick1   49153     0          Y       17992
> Brick server7:/data/glusterfs/home/brick2   49157     0          Y       22347
> Brick server8:/data/glusterfs/home/brick2   49154     0          Y       18546
> NFS Server on localhost                     2049      0          Y       683
> Self-heal Daemon on localhost               N/A       N/A        Y       693
> NFS Server on server8                       2049      0          Y       18553
> Self-heal Daemon on server8                 N/A       N/A        Y       18566
> NFS Server on server5                       2049      0          Y       23115
> Self-heal Daemon on server5                 N/A       N/A        Y       23121
> NFS Server on server7                       2049      0          Y       4201
> Self-heal Daemon on server7                 N/A       N/A        Y       4210
> NFS Server on server3                       2049      0          Y       5460
> Self-heal Daemon on server3                 N/A       N/A        Y       5469
> NFS Server on server6                       2049      0          Y       22709
> Self-heal Daemon on server6                 N/A       N/A        Y       22718
> NFS Server on server4                       2049      0          Y       6044
> Self-heal Daemon on server4                 N/A       N/A        Y       6243
>
> server 2 is currently powered off as we are waiting a replacement RAID
> controller, as well as for
> server4:/data/glusterfs/home/brick4
>
> And as I said, there is a rebalance in progress
>> gluster rebalance home status
> Node        Rebalanced-files      size      scanned   failures    skipped        status   run time in h:m:s
> ---------   ----------------   -------   ----------   --------   --------   -----------   -----------------
> localhost              42083    23.3GB      1568065       1359     303734   in progress            16:49:31
> server5                35698    23.8GB      1027934          0     240748   in progress            16:49:23
> server4                35096    23.4GB       899491          0     229064   in progress            16:49:18
> server3                27031    18.0GB       701759          8     182592   in progress            16:49:27
> server8                    0    0Bytes       327602          0        805   in progress            16:49:18
> server6                35672    23.9GB      1028469          0     240810   in progress            16:49:17
> server7                    1   45Bytes           53          0          0     completed             0:03:53
> Estimated time left for rebalance to complete : 359739:51:24
> volume rebalance: home: success
>
>
> Thanks,
>
>
> A.
>
>
> On Thursday, 1 February 2018 18:57:17 CET Serkan Çoban wrote:
>> What is server4? You just mentioned server1 and server2 previously.
>> Can you post the output of gluster v status volname
>>
>> On Thu, Feb 1, 2018 at 8:13 PM, Alessandro Ipe <alessandro....@meteo.be> wrote:
>> > Hi,
>> >
>> >
>> > Thanks. However "gluster v heal volname full" returned the following error
>> > message
>> > Commit failed on server4. Please check log file for details.
>> >
>> > I have checked the log files in /var/log/glusterfs on server4 (by grepping
>> > heal), but did not get any match. What should I be looking for and in which
>> > log file, please ?
>> >
>> > Note that there is currently a rebalance process running on the volume.
>> >
>> >
>> > Many thanks,
>> >
>> >
>> > A.
>> >
>> > On Thursday, 1 February 2018 17:32:19 CET Serkan Çoban wrote:
>> >> You do not need to reset brick if brick path does not change. Replace
>> >> the brick format and mount, then gluster v start volname force.
>> >> To start self heal just run gluster v heal volname full.
>> >>
>> >> On Thu, Feb 1, 2018 at 6:39 PM, Alessandro Ipe <alessandro....@meteo.be> wrote:
>> >> > Hi,
>> >> >
>> >> >
>> >> > My volume home is configured in replicate mode (version 3.12.4) with the
>> >> > bricks
>> >> > server1:/data/gluster/brick1
>> >> > server2:/data/gluster/brick1
>> >> >
>> >> > server2:/data/gluster/brick1 was corrupted, so I killed gluster daemon for
>> >> > that brick on server2, umounted it, reformated it, remounted it and did a
>> >> >> gluster volume reset-brick home server2:/data/gluster/brick1
>> >> >> server2:/data/gluster/brick1 commit force
>> >> >
>> >> > I was expecting that the self-heal daemon would start copying data from
>> >> > server1:/data/gluster/brick1 (about 7.4 TB) to the empty
>> >> > server2:/data/gluster/brick1, which it only did for directories, but not
>> >> > for files.
>> >> >
>> >> > For the moment, I launched on the fuse mount point
>> >> >> find . | xargs stat
>> >> > but crawling the whole volume (100 TB) to trigger self-healing of a single
>> >> > brick of 7.4 TB is unefficient.
>> >> >
>> >> > Is there any trick to only self-heal a single brick, either by setting
>> >> > some attributes to its top directory, for example ?
>> >> >
>> >> >
>> >> > Many thanks,
>> >> >
>> >> >
>> >> > Alessandro
>> >> >
>> >> >
>> >> > _______________________________________________
>> >> > Gluster-users mailing list
>> >> > Gluster-users@gluster.org
>> >> > http://lists.gluster.org/mailman/listinfo/gluster-users
>> >
>> > --
>> >
>> > Dr. Ir. Alessandro Ipe
>> > Department of Observations      Tel. +32 2 373 06 31
>> > Remote Sensing from Space
>> > Royal Meteorological Institute
>> > Avenue Circulaire 3             Email:
>> > B-1180 Brussels   Belgium       alessandro....@meteo.be
>> > Web: http://gerb.oma.be
>
>
> --
>
> Dr. Ir. Alessandro Ipe
> Department of Observations      Tel. +32 2 373 06 31
> Remote Sensing from Space
> Royal Meteorological Institute
> Avenue Circulaire 3             Email:
> B-1180 Brussels   Belgium       alessandro....@meteo.be
> Web: http://gerb.oma.be
>
>
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users