Hi,

I have a volume (currently not mounted by any other clients) that complains about an "unsynced" entry.

Gluster 3.3.1, set up with replica 2 (on Gluster machines p01 and p02, to keep the names short)

# gluster volume heal RedhawkHome info
Gathering Heal info on volume RedhawkHome has been successful

Brick mualglup01:/mnt/gluster/RedhawkHome
Number of entries: 1
<gfid:9ed83644-cae6-4d16-a5b7-7ccb48c41695>

Brick mualglup02:/mnt/gluster/RedhawkHome
Number of entries: 1
<gfid:9ed83644-cae6-4d16-a5b7-7ccb48c41695>

### Usually, the output gives me the path of the file,
### but this time only spits out the gfid

I walked the entire file system and found that the corresponding file with the gfid is:
./home/zhouq_shared/T2483spectra/January172010/t2483_17jan2010_s1221e1635_1459

I confirmed that the gfid is the same on both p01, p02 Gluster machines for that file.
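In hindsight, walking the whole tree probably wasn't necessary: as I understand it, for regular files GlusterFS keeps a hard link to each file under the brick's .glusterfs directory, named by gfid (the "Links: 2" in the stat output below would be that hard link), so the gfid can be mapped back to a path by inode. A sketch of what I mean (the brick path is from my setup; the find step assumes you run it on the brick itself):

```shell
GFID="9ed83644-cae6-4d16-a5b7-7ccb48c41695"

# The gfid hard link lives at:
#   <brick>/.glusterfs/<first 2 hex chars>/<next 2 hex chars>/<full gfid>
GFID_PATH=".glusterfs/${GFID:0:2}/${GFID:2:2}/${GFID}"
echo "$GFID_PATH"

# On the brick, the inode of that link leads straight to the real path:
#   INODE=$(stat -c %i "/mnt/gluster/RedhawkHome/$GFID_PATH")
#   find /mnt/gluster/RedhawkHome -inum "$INODE" -not -path '*/.glusterfs/*'
```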

### At both p01 and p02, I have exactly matching xattrs:

# getfattr -d -e hex -m . home/zhouq_shared/T2483spectra/January172010/t2483_17jan2010_s1221e1635_1459
# file: home/zhouq_shared/T2483spectra/January172010/t2483_17jan2010_s1221e1635_1459
trusted.afr.RedhawkHome-client-0=0x000000030000000000000000 (non zero)
trusted.afr.RedhawkHome-client-1=0x000000030000000000000000 (non zero)
trusted.gfid=0x9ed83644cae64d16a5b77ccb48c41695
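If I'm reading the AFR changelog format correctly (this is my own interpretation, so please correct me if I'm wrong), each trusted.afr value packs three big-endian 32-bit counters: pending data, metadata, and entry operations. With non-zero counts for both client-0 and client-1 on both bricks, each brick accuses the other, which I suspect is why self-heal finds "no active sinks". Decoding the value above:

```shell
# Split the 12-byte xattr value into its three 32-bit counters.
XATTR="0x000000030000000000000000"
HEX="${XATTR#0x}"
DATA=$((16#${HEX:0:8}))    # pending data operations
META=$((16#${HEX:8:8}))    # pending metadata operations
ENTRY=$((16#${HEX:16:8}))  # pending entry operations
echo "data=$DATA metadata=$META entry=$ENTRY"
# → data=3 metadata=0 entry=0
```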

Self heal is failing with the following log entry (repeating many times each time self heal runs):

[2013-01-04 14:47:05.203072] I [afr-self-heal-data.c:712:afr_sh_data_fix] 0-RedhawkHome-replicate-0: no active sinks for performing self-heal on file <gfid:9ed83644-cae6-4d16-a5b7-7ccb48c41695>

At p01,

# stat home/zhouq_shared/T2483spectra/January172010/t2483_17jan2010_s1221e1635_1459
  File: `home/zhouq_shared/T2483spectra/January172010/t2483_17jan2010_s1221e1635_1459'
  Size: 34881536        Blocks: 68128      IO Block: 4096   regular file
Device: fd02h/64770d    Inode: 47190622    Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2012-12-18 10:26:35.276898896 -0500
Modify: 2012-12-18 10:26:36.581912761 -0500
Change: 2013-01-04 19:18:34.495935037 -0500


At p02,
# stat home/zhouq_shared/T2483spectra/January172010/t2483_17jan2010_s1221e1635_1459
  File: `home/zhouq_shared/T2483spectra/January172010/t2483_17jan2010_s1221e1635_1459'
  Size: 34881536        Blocks: 68128      IO Block: 4096   regular file
Device: fd02h/64770d    Inode: 328602346   Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2012-12-18 10:26:35.275947590 -0500
Modify: 2012-12-18 10:26:36.584973268 -0500
Change: 2013-01-04 19:18:34.498314380 -0500

The file's md5sum matches exactly on p01 and p02.

I don't recall both Gluster machines being down at the same time (though that doesn't mean it didn't happen). This is a non-production volume, so it could be the result of my overly aggressive testing. I also don't recall the client "cp" process that produced this file reporting any errors (again, that doesn't mean it didn't).

What's the best way to recover from this error?

I assume the worst-case scenario is to mount the volume from a client and delete the file (that is, I lose this file).
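Before resorting to deletion, one thing I'm considering trying (based only on my reading of how the changelog xattrs work, so please correct me if this is a bad idea): since both copies are byte-identical, zero out the trusted.afr changelogs on one brick so that self-heal treats the other brick as the source. Roughly, on ONE brick only:

```shell
# UNTESTED sketch -- file and volume names are from above.
F="home/zhouq_shared/T2483spectra/January172010/t2483_17jan2010_s1221e1635_1459"
setfattr -n trusted.afr.RedhawkHome-client-0 -v 0x000000000000000000000000 "$F"
setfattr -n trusted.afr.RedhawkHome-client-1 -v 0x000000000000000000000000 "$F"
# ...then stat the file from a client mount (or re-run
# `gluster volume heal RedhawkHome`) to trigger self-heal.
```

Would that be a sane recovery, or is there a better supported path?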

Thanks,
Robin
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
