On Fri, Feb 9, 2018 at 11:46 AM, Karthik Subrahmanya <ksubr...@redhat.com> wrote:
> Hey,
>
> Has the heal completed, and do you still have some entries pending heal?
> If yes, can you provide the following information to debug the issue:
> 1. Which version of gluster you are running
> 2. Output of gluster volume heal <volname> info summary or gluster volume
> heal <volname> info
> 3. getfattr -d -e hex -m . <filepath-on-brick> output of any one file
> which is pending heal, from all the bricks
>
> Regards,
> Karthik
>
> On Thu, Feb 8, 2018 at 12:48 PM, Seva Gluschenko <g...@webkontrol.ru> wrote:
>
>> Hi folks,
>>
>> I'm having trouble moving an arbiter brick to another server because of
>> I/O load issues. My setup is as follows:
>>
>> # gluster volume info
>>
>> Volume Name: myvol
>> Type: Distributed-Replicate
>> Volume ID: 43ba517a-ac09-461e-99da-a197759a7dc8
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 3 x (2 + 1) = 9
>> Transport-type: tcp
>> Bricks:
>> Brick1: gv0:/data/glusterfs
>> Brick2: gv1:/data/glusterfs
>> Brick3: gv4:/data/gv01-arbiter (arbiter)
>> Brick4: gv2:/data/glusterfs
>> Brick5: gv3:/data/glusterfs
>> Brick6: gv1:/data/gv23-arbiter (arbiter)
>> Brick7: gv4:/data/glusterfs
>> Brick8: gv5:/data/glusterfs
>> Brick9: pluto:/var/gv45-arbiter (arbiter)
>> Options Reconfigured:
>> nfs.disable: on
>> transport.address-family: inet
>> storage.owner-gid: 1000
>> storage.owner-uid: 1000
>> cluster.self-heal-daemon: enable
>>
>> The gv23-arbiter is the brick that was recently moved from another
>> server (chronos) using the following command:
>>
>> # gluster volume replace-brick myvol chronos:/mnt/gv23-arbiter
>> gv1:/data/gv23-arbiter commit force
>> volume replace-brick: success: replace-brick commit force operation
>> successful
>>
>> This is not the first time I have moved an arbiter brick, and the
>> heal-count was zero for all the bricks before the change, so I didn't
>> expect much trouble. What probably went wrong is that I then forced
>> chronos out of the cluster with the gluster peer detach command. Ever
>> since then, over the course of the last 3 days, I have been seeing this:
>>
>> # gluster volume heal myvol statistics heal-count
>> Gathering count of entries to be healed on volume myvol has been
>> successful
>>
>> Brick gv0:/data/glusterfs
>> Number of entries: 0
>>
>> Brick gv1:/data/glusterfs
>> Number of entries: 0
>>
>> Brick gv4:/data/gv01-arbiter
>> Number of entries: 0
>>
>> Brick gv2:/data/glusterfs
>> Number of entries: 64999
>>
>> Brick gv3:/data/glusterfs
>> Number of entries: 64999
>>
>> Brick gv1:/data/gv23-arbiter
>> Number of entries: 0
>>
>> Brick gv4:/data/glusterfs
>> Number of entries: 0
>>
>> Brick gv5:/data/glusterfs
>> Number of entries: 0
>>
>> Brick pluto:/var/gv45-arbiter
>> Number of entries: 0
>>
>> According to /var/log/glusterfs/glustershd.log, self-healing is in
>> progress, so it might be worth just sitting and waiting, but I'm
>> wondering why this heal-count of 64999 persists (a limitation of the
>> counter? The gv2 and gv3 bricks in fact contain roughly 30 million
>> files), and I am bothered by the following output:
>>
>> # gluster volume heal myvol info heal-failed
>> Gathering list of heal failed entries on volume myvol has been
>> unsuccessful on bricks that are down. Please check if all brick
>> processes are running.
>>
>> I attached the chronos server back to the cluster, with no noticeable
>> effect. Any comments and suggestions would be much appreciated.
>>
>> --
>> Best Regards,
>>
>> Seva Gluschenko
>> CTO @ http://webkontrol.ru
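For reference, the diagnostics Karthik requests above can be collected roughly as follows. This is only a sketch: the volume name is taken from this thread, but the file path is a hypothetical placeholder for one of the entries reported as pending heal, and getfattr must be run against the file's path on the brick backend of each server in the replica set, not on a client mount:

# gluster --version
# gluster volume heal myvol info summary
# getfattr -d -e hex -m . /data/glusterfs/<path-to-a-file-pending-heal>

Since the heal-failed query above complains about bricks being down, it is also worth confirming that every brick process and self-heal daemon shows as online before deciding to wait the heal out:

# gluster volume status myvol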
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users