Hi,

I have a volume (currently not mounted by any other clients) that complains about an "unsynced" entry.

Gluster 3.3.1, set up with replica 2 (on Gluster machines p01 and p02, to keep the names short)

# gluster volume heal RedhawkHome info
Gathering Heal info on volume RedhawkHome has been successful

Brick mualglup01:/mnt/gluster/RedhawkHome
Number of entries: 1
<gfid:9ed83644-cae6-4d16-a5b7-7ccb48c41695>

Brick mualglup02:/mnt/gluster/RedhawkHome
Number of entries: 1
<gfid:9ed83644-cae6-4d16-a5b7-7ccb48c41695>

### Usually, the output gives me the path of the file,
### but this time only spits out the gfid

I walked the entire file system and found that the corresponding file with the gfid is:
./home/zhouq_shared/T2483spectra/January172010/t2483_17jan2010_s1221e1635_1459

I confirmed that the gfid is the same on both p01, p02 Gluster machines for that file.
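In hindsight, walking the whole tree probably wasn't necessary: as I understand it, for regular files GlusterFS keeps a hard link to each file under the brick's .glusterfs directory, named by gfid (the "Links: 2" in the stat output below would be that hard link), so the gfid can be mapped back to a path by inode. A sketch of what I mean (the brick path is from my setup; the find step assumes you run it on the brick itself):

```shell
GFID="9ed83644-cae6-4d16-a5b7-7ccb48c41695"

# The gfid hard link lives at:
#   <brick>/.glusterfs/<first 2 hex chars>/<next 2 hex chars>/<full gfid>
GFID_PATH=".glusterfs/${GFID:0:2}/${GFID:2:2}/${GFID}"
echo "$GFID_PATH"

# On the brick, the inode of that link leads straight to the real path:
#   INODE=$(stat -c %i "/mnt/gluster/RedhawkHome/$GFID_PATH")
#   find /mnt/gluster/RedhawkHome -inum "$INODE" -not -path '*/.glusterfs/*'
```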

### At both p01 and p02, I have exactly matching xattrs:

# getfattr -d -e hex -m . home/zhouq_shared/T2483spectra/January172010/t2483_17jan2010_s1221e1635_1459
# file: home/zhouq_shared/T2483spectra/January172010/t2483_17jan2010_s1221e1635_1459
trusted.afr.RedhawkHome-client-0=0x000000030000000000000000 (non zero)
trusted.afr.RedhawkHome-client-1=0x000000030000000000000000 (non zero)
trusted.gfid=0x9ed83644cae64d16a5b77ccb48c41695
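If I'm reading the AFR changelog format correctly (this is my own interpretation, so please correct me if I'm wrong), each trusted.afr value packs three big-endian 32-bit counters: pending data, metadata, and entry operations. With non-zero counts for both client-0 and client-1 on both bricks, each brick accuses the other, which I suspect is why self-heal finds "no active sinks". Decoding the value above:

```shell
# Split the 12-byte xattr value into its three 32-bit counters.
XATTR="0x000000030000000000000000"
HEX="${XATTR#0x}"
DATA=$((16#${HEX:0:8}))    # pending data operations
META=$((16#${HEX:8:8}))    # pending metadata operations
ENTRY=$((16#${HEX:16:8}))  # pending entry operations
echo "data=$DATA metadata=$META entry=$ENTRY"
# → data=3 metadata=0 entry=0
```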

Self heal is failing with the following log entry (repeating many times each time self heal runs):

[2013-01-04 14:47:05.203072] I [afr-self-heal-data.c:712:afr_sh_data_fix] 0-RedhawkHome-replicate-0: no active sinks for performing self-heal on file <gfid:9ed83644-cae6-4d16-a5b7-7ccb48c41695>

At p01,

# stat home/zhouq_shared/T2483spectra/January172010/t2483_17jan2010_s1221e1635_1459
  File: `home/zhouq_shared/T2483spectra/January172010/t2483_17jan2010_s1221e1635_1459'
  Size: 34881536        Blocks: 68128      IO Block: 4096   regular file
Device: fd02h/64770d    Inode: 47190622    Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2012-12-18 10:26:35.276898896 -0500
Modify: 2012-12-18 10:26:36.581912761 -0500
Change: 2013-01-04 19:18:34.495935037 -0500


At p02,
# stat home/zhouq_shared/T2483spectra/January172010/t2483_17jan2010_s1221e1635_1459
  File: `home/zhouq_shared/T2483spectra/January172010/t2483_17jan2010_s1221e1635_1459'
  Size: 34881536        Blocks: 68128      IO Block: 4096   regular file
Device: fd02h/64770d    Inode: 328602346   Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2012-12-18 10:26:35.275947590 -0500
Modify: 2012-12-18 10:26:36.584973268 -0500
Change: 2013-01-04 19:18:34.498314380 -0500

The file's md5sum matches exactly on p01 and p02.

I don't recall both Gluster machines being down at the same time (though that doesn't mean it didn't happen). This is a non-production volume, so it could be the result of my overly aggressive testing. I also don't recall the client "cp" process that produced this file reporting any errors (again, that doesn't mean it didn't).

What's the best way to recover from this error?

I assume the worst-case scenario is to mount the volume from a client and delete the file (that is, I lose this file).
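Before resorting to deletion, one thing I'm considering trying (based only on my reading of how the changelog xattrs work, so please correct me if this is a bad idea): since both copies are byte-identical, zero out the trusted.afr changelogs on one brick so that self-heal treats the other brick as the source. Roughly, on ONE brick only:

```shell
# UNTESTED sketch -- file and volume names are from above.
F="home/zhouq_shared/T2483spectra/January172010/t2483_17jan2010_s1221e1635_1459"
setfattr -n trusted.afr.RedhawkHome-client-0 -v 0x000000000000000000000000 "$F"
setfattr -n trusted.afr.RedhawkHome-client-1 -v 0x000000000000000000000000 "$F"
# ...then stat the file from a client mount (or re-run
# `gluster volume heal RedhawkHome`) to trigger self-heal.
```

Would that be a sane recovery, or is there a better supported path?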

Thanks,
Robin
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
