If I were you I would follow these steps: stop the rebalance and fix
the cluster health first.
Bring up the down server, replace server4:brick4 with a new disk,
format and mount it, make sure the brick is started, then start a full heal.
Without all bricks up, the full heal will not start. Then you can
continue with the rebalance.
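
For reference, a rough command sequence (just a sketch, assuming the
volume name "home" from your status output; adapt paths and ordering to
your setup):

  # stop the running rebalance first
  gluster volume rebalance home stop

  # after replacing, formatting and mounting the new disk behind
  # server4's brick4, force-start the volume so the brick comes back up
  gluster volume start home force

  # once server2 is back and all bricks show Online "Y" in
  # "gluster volume status home", trigger the full heal
  gluster volume heal home full

  # when "gluster volume heal home info" shows no pending entries,
  # resume the rebalance
  gluster volume rebalance home start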


On Fri, Feb 2, 2018 at 1:27 PM, Alessandro Ipe <alessandro....@meteo.be> wrote:
> Hi,
>
>
> I simplified the config in my first email, but I actually have 2x4 servers
> in replicate-distribute, with 4 bricks on 6 of them and 2 bricks on the
> remaining 2. A full heal will just take ages... for just a single brick
> to resync!
>
>> gluster v status home
> Status of volume: home
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick server1:/data/glusterfs/home/brick1  49157     0          Y       5003
> Brick server1:/data/glusterfs/home/brick2  49153     0          Y       5023
> Brick server1:/data/glusterfs/home/brick3  49154     0          Y       5004
> Brick server1:/data/glusterfs/home/brick4  49155     0          Y       5011
> Brick server3:/data/glusterfs/home/brick1  49152     0          Y       5422
> Brick server4:/data/glusterfs/home/brick1  49152     0          Y       5019
> Brick server3:/data/glusterfs/home/brick2  49153     0          Y       5429
> Brick server4:/data/glusterfs/home/brick2  49153     0          Y       5033
> Brick server3:/data/glusterfs/home/brick3  49154     0          Y       5437
> Brick server4:/data/glusterfs/home/brick3  49154     0          Y       5026
> Brick server3:/data/glusterfs/home/brick4  49155     0          Y       5444
> Brick server4:/data/glusterfs/home/brick4  N/A       N/A        N       N/A
> Brick server5:/data/glusterfs/home/brick1  49152     0          Y       5275
> Brick server6:/data/glusterfs/home/brick1  49152     0          Y       5786
> Brick server5:/data/glusterfs/home/brick2  49153     0          Y       5276
> Brick server6:/data/glusterfs/home/brick2  49153     0          Y       5792
> Brick server5:/data/glusterfs/home/brick3  49154     0          Y       5282
> Brick server6:/data/glusterfs/home/brick3  49154     0          Y       5794
> Brick server5:/data/glusterfs/home/brick4  49155     0          Y       5293
> Brick server6:/data/glusterfs/home/brick4  49155     0          Y       5806
> Brick server7:/data/glusterfs/home/brick1  49156     0          Y       22339
> Brick server8:/data/glusterfs/home/brick1  49153     0          Y       17992
> Brick server7:/data/glusterfs/home/brick2  49157     0          Y       22347
> Brick server8:/data/glusterfs/home/brick2  49154     0          Y       18546
> NFS Server on localhost                     2049      0          Y       683
> Self-heal Daemon on localhost               N/A       N/A        Y       693
> NFS Server on server8                      2049      0          Y       18553
> Self-heal Daemon on server8                N/A       N/A        Y       18566
> NFS Server on server5                      2049      0          Y       23115
> Self-heal Daemon on server5                N/A       N/A        Y       23121
> NFS Server on server7                      2049      0          Y       4201
> Self-heal Daemon on server7                N/A       N/A        Y       4210
> NFS Server on server3                      2049      0          Y       5460
> Self-heal Daemon on server3                N/A       N/A        Y       5469
> NFS Server on server6                      2049      0          Y       22709
> Self-heal Daemon on server6                N/A       N/A        Y       22718
> NFS Server on server4                      2049      0          Y       6044
> Self-heal Daemon on server4                N/A       N/A        Y       6243
>
> server2 is currently powered off as we are waiting for a replacement RAID
> controller, and the same goes for
> server4:/data/glusterfs/home/brick4.
>
> And as I said, there is a rebalance in progress
>> gluster rebalance home status
> Node        Rebalanced-files      size      scanned   failures    skipped        status   run time in h:m:s
> ---------   ----------------   -------   ----------   --------   --------   -----------   -----------------
> localhost              42083    23.3GB      1568065       1359     303734   in progress            16:49:31
> server5                35698    23.8GB      1027934          0     240748   in progress            16:49:23
> server4                35096    23.4GB       899491          0     229064   in progress            16:49:18
> server3                27031    18.0GB       701759          8     182592   in progress            16:49:27
> server8                    0    0Bytes       327602          0        805   in progress            16:49:18
> server6                35672    23.9GB      1028469          0     240810   in progress            16:49:17
> server7                    1   45Bytes           53          0          0     completed             0:03:53
> Estimated time left for rebalance to complete :   359739:51:24
> volume rebalance: home: success
>
>
> Thanks,
>
>
> A.
>
>
>
> On Thursday, 1 February 2018 18:57:17 CET Serkan Çoban wrote:
>> What is server4? You just mentioned server1 and server2 previously.
>> Can you post the output of gluster v status volname?
>>
>> On Thu, Feb 1, 2018 at 8:13 PM, Alessandro Ipe <alessandro....@meteo.be> 
>> wrote:
>> > Hi,
>> >
>> >
>> > Thanks. However, "gluster v heal volname full" returned the following
>> > error message:
>> > Commit failed on server4. Please check log file for details.
>> >
>> > I have checked the log files in /var/log/glusterfs on server4 (by
>> > grepping for "heal"), but did not get any match. What should I be
>> > looking for, and in which log file, please?
>> >
>> > Note that there is currently a rebalance process running on the volume.
>> >
>> >
>> > Many thanks,
>> >
>> >
>> > A.
>> >
>> > On Thursday, 1 February 2018 17:32:19 CET Serkan Çoban wrote:
>> >> You do not need to reset-brick if the brick path does not change.
>> >> Replace the brick, format and mount it, then run
>> >> gluster v start volname force.
>> >> To start a self-heal, just run gluster v heal volname full.
>> >>
>> >> On Thu, Feb 1, 2018 at 6:39 PM, Alessandro Ipe <alessandro....@meteo.be>
>> >
>> > wrote:
>> >> > Hi,
>> >> >
>> >> >
>> >> > My volume home is configured in replicate mode (version 3.12.4) with
>> >> > the bricks server1:/data/gluster/brick1 and
>> >> > server2:/data/gluster/brick1.
>> >> >
>> >> > server2:/data/gluster/brick1 was corrupted, so I killed the gluster
>> >> > daemon for that brick on server2, unmounted it, reformatted it,
>> >> > remounted it and did a
>> >> >
>> >> >> gluster volume reset-brick home server2:/data/gluster/brick1
>> >> >> server2:/data/gluster/brick1 commit force
>> >> >
>> >> > I was expecting that the self-heal daemon would start copying data from
>> >> > server1:/data/gluster/brick1 (about 7.4 TB) to the empty
>> >> > server2:/data/gluster/brick1, which it only did for directories, but
>> >> > not
>> >> > for files.
>> >> >
>> >> > For the moment, I launched the following on the fuse mount point
>> >> >
>> >> >> find . | xargs stat
>> >> >
>> >> > but crawling the whole volume (100 TB) to trigger self-healing of a
>> >> > single 7.4 TB brick is inefficient.
>> >> >
>> >> > Is there any trick to self-heal only a single brick, for example by
>> >> > setting some attributes on its top directory?
>> >> >
>> >> >
>> >> > Many thanks,
>> >> >
>> >> >
>> >> > Alessandro
>> >> >
>> >> >
>> >> > _______________________________________________
>> >> > Gluster-users mailing list
>> >> > Gluster-users@gluster.org
>> >> > http://lists.gluster.org/mailman/listinfo/gluster-users
>> >
>> > --
>> >
>> >  Dr. Ir. Alessandro Ipe
>> >  Department of Observations             Tel. +32 2 373 06 31
>> >  Remote Sensing from Space
>> >  Royal Meteorological Institute
>> >  Avenue Circulaire 3                    Email:
>> >  B-1180 Brussels        Belgium         alessandro....@meteo.be
>> >  Web: http://gerb.oma.be
>
>
> --
>
>  Dr. Ir. Alessandro Ipe
>  Department of Observations             Tel. +32 2 373 06 31
>  Remote Sensing from Space
>  Royal Meteorological Institute
>  Avenue Circulaire 3                    Email:
>  B-1180 Brussels        Belgium         alessandro....@meteo.be
>  Web: http://gerb.oma.be
>
>
>
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users
