So - I'm still trying to chase this problem down to its ultimate conclusion.

The directories are all still visible to the users, but scanning for attributes 
of 0sAAAAAAAAAAAAAAAA still yields matches on the set of GlusterFS servers:

http://pastebin.com/mxvFnFj4
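
(For anyone who wants to run the same scan: it's nothing fancy - roughly the sketch 
below, using my server names and brick paths; adjust for your own layout.)

 # list every file/dir whose xattr dump still contains the all-zero value
 for h in jc1letgfs14 jc1letgfs15 jc1letgfs17 jc1letgfs18; do
     echo "== $h =="
     ssh $h 'getfattr -R -d -m - /export/read-only/g*/online_archive 2>/dev/null' |
         egrep '^# file:|0sAAAAAAAAAAAAAAAA$'
 done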

I tried running this command, but as you can see it wasn't happy, even though 
the syntax matches the usage message it prints:

root@jc1letgfs17:~# gluster volume rebalance pfs-ro1 fix-layout start
Usage: volume rebalance <VOLNAME> [fix-layout|migrate-data] {start|stop|status}

I suspect this is a bug related to the "-" in my volume name. I'll test to 
confirm and file a bug report when I get a chance.
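
(The test I have in mind is roughly the sketch below - two throwaway single-brick 
volumes, one name with a dash and one without, then fix-layout against each. The 
/export/scratch brick paths are hypothetical.)

 # hypothetical scratch bricks - the only variable being tested is the "-" in the name
 gluster volume create testvol jc1letgfs17-pfs1:/export/scratch/testvol
 gluster volume create test-vol jc1letgfs17-pfs1:/export/scratch/test-vol
 gluster volume start testvol
 gluster volume start test-vol
 gluster volume rebalance testvol fix-layout start
 gluster volume rebalance test-vol fix-layout start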

So I just did the standard rebalance command:
 gluster volume rebalance pfs-ro1 start

and it trundled along for a while, but then one time when I checked its status, 
it had failed:
 date; gluster volume rebalance pfs-ro1 status
 Thu May 26 09:02:00 EDT 2011
 rebalance failed
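
(For anyone following along at home, a simple poll loop makes the repeated status 
checks less tedious - just a sketch:)

 while true; do date; gluster volume rebalance pfs-ro1 status; sleep 60; done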

I re-ran it FOUR times, getting a little farther with each attempt; it 
eventually completed the layout fix and then started the actual file-migration 
part of the rebalance:
 Thu May 26 12:22:25 EDT 2011
 rebalance step 1: layout fix in progress: fixed layout 779
 Thu May 26 12:23:25 EDT 2011
 rebalance step 2: data migration in progress: rebalanced 71 files of size 136518704 (total files scanned 57702)

Now scanning for attributes of 0sAAAAAAAAAAAAAAAA yields fewer results, but some 
are still present:

http://pastebin.com/x4wYq8ic

As a possible sanity check, I did this command on my Read-Write GlusterFS 
storage servers (2 boxes, Distributed-Replicate), and got no "bad" attributes:
 jc1ladmin1:~/projects/gluster  loop_check ' getfattr -dm - /export/read-only/g*' jc1letgfs{13,16} | egrep "jc1letgfs|0sAAAAAAAAAAAAAAAA$|file:" | less
 getfattr: /export/read-only/g*: No such file or directory
 getfattr: /export/read-only/g*: No such file or directory
 jc1letgfs13
 jc1letgfs16
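
(loop_check, for the curious, is just a small local wrapper script of mine - 
roughly equivalent to the sketch below:)

 #!/bin/bash
 # run the quoted command on each host named on the command line,
 # printing the host name alongside its output
 cmd="$1"; shift
 for h in "$@"; do
     echo "$h"
     ssh "$h" "$cmd"
 done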

One difference between these two storage server groups: the Read-Only group of 4 
servers has its backend file systems formatted as XFS, while the Read-Write 
group of 2 uses ext4.
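
(If anyone wants to compare against their own setup, the backend FS type is easy 
to confirm with df -T on the brick mount points, e.g. for the read-only group - 
with the equivalent brick paths on the read-write pair:)

 loop_check 'df -T /export/read-only/g01' jc1letgfs{14,15,17,18}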

Suggestions, critiques, etc. gratefully solicited.

James Burnash
Unix Engineer.

-----Original Message-----
From: gluster-users-boun...@gluster.org 
[mailto:gluster-users-boun...@gluster.org] On Behalf Of Burnash, James
Sent: Monday, May 23, 2011 11:17 AM
To: 'Mohit Anchlia'
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] Files present on the backend but have become 
invisible from clients

So - following up on this in the hope it will help others.

According to the help offered (thanks!), the problem is that the attributes on 
the directories that cannot be seen by the clients are incorrect - which is to 
say they are all 0sAAAAAAAAAAAAAAAA.

Here is the pastebin URL: http://pastebin.com/yz5PWjKV
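
(For the record, the check is just dumping all extended attributes on one of the 
affected directories; hex output makes the zeroed values easier to spot than the 
base64 form:)

 getfattr -d -m - -e hex /export/read-only/g01/online_archive/2011/01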

In desperation, I did a:

gluster volume stop pfs-ro1

and then

gluster volume start pfs-ro1

... and now the directories and their contents are visible to the clients.

I'm in the process of scanning the attributes of those directories again - I'll 
post another list on pastebin when I get them.

James Burnash

-----Original Message-----
From: gluster-users-boun...@gluster.org 
[mailto:gluster-users-boun...@gluster.org] On Behalf Of Burnash, James
Sent: Thursday, May 19, 2011 3:47 PM
To: 'Mohit Anchlia'
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] Files present on the backend but have become 
invisible from clients

From the client, I can't see files in any directories under the path 
/pfs2/online_archive/2011/*.

root@jc1lnxsamm100:/pfs2/test# ls -l /pfs2/online_archive/2011
total 212
drwxr-xr-x 22 statarb arb  4096 Jan 31 09:18 01
drwxr-xr-x 21 dataops arb 77824 Feb 28 09:18 02
drwxr-xr-x 25 dataops arb  4096 Mar 31 18:15 03
drwxr-xr-x 22 dataops arb 77824 May  4 11:42 04
drwxr-xr-x 15 dataops arb  4096 May 18 21:10 05
drwxr-xr-x  2 dataops arb   114 Dec 30 10:10 06
drwxr-xr-x  2 dataops arb   114 Dec 30 10:10 07
drwxr-xr-x  2 dataops arb   114 Dec 30 10:10 08
drwxr-xr-x  2 dataops arb   114 Dec 30 10:10 09
drwxr-xr-x  2 dataops arb   114 Dec 30 10:10 10
drwxr-xr-x  2 dataops arb   114 Dec 30 10:10 11
drwxr-xr-x  2 dataops arb   114 Dec 30 10:10 12

root@jc1lnxsamm100:/pfs2/test# ls -l /pfs2/online_archive/2011/*
/pfs2/online_archive/2011/01:
total 0

/pfs2/online_archive/2011/02:
total 0

/pfs2/online_archive/2011/03:
total 0

/pfs2/online_archive/2011/04:
total 0

/pfs2/online_archive/2011/05:
total 0

/pfs2/online_archive/2011/06:
total 0

/pfs2/online_archive/2011/07:
total 0

/pfs2/online_archive/2011/08:
total 0

/pfs2/online_archive/2011/09:
total 0

/pfs2/online_archive/2011/10:
total 0

/pfs2/online_archive/2011/11:
total 0

/pfs2/online_archive/2011/12:
total 0

-----Original Message-----
From: Mohit Anchlia [mailto:mohitanch...@gmail.com]
Sent: Thursday, May 19, 2011 3:44 PM
To: Burnash, James
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] Files present on the backend but have become 
invisible from clients

As in do you see all the files in those dirs unlike others?

On Thu, May 19, 2011 at 12:42 PM, Burnash, James <jburn...@knight.com> wrote:
> "Good ones" in what way?
>
> Permissions on the backend storage are here:
>
> http://pastebin.com/EiMvbgdh
>
> -----Original Message-----
> From: Mohit Anchlia [mailto:mohitanch...@gmail.com]
> Sent: Thursday, May 19, 2011 3:09 PM
> To: Burnash, James
> Cc: gluster-users@gluster.org
> Subject: Re: [Gluster-users] Files present on the backend but have
> become invisible from clients
>
> It looks like a bug. You are missing xattrs. Can you confirm if all dirs that 
> have "0sAAAAAAAAAAAAAAAA" in your pastebin are good ones?
>
> On Thu, May 19, 2011 at 11:51 AM, Burnash, James <jburn...@knight.com> wrote:
>> Hi Mohit.
>>
>> Answers inline below:
>>
>> -----Original Message-----
>> From: Mohit Anchlia [mailto:mohitanch...@gmail.com]
>> Sent: Thursday, May 19, 2011 1:17 PM
>> To: Burnash, James
>> Cc: gluster-users@gluster.org
>> Subject: Re: [Gluster-users] Files present on the backend but have
>> become invisible from clients
>>
>> Can you post the output of  getfattr -dm - <file|dir> for all parent dirs.
>>        http://pastebin.com/EVfRsSrD
>>
>>  and for one of the files from the server?
>>
>> # getfattr -dm - /export/read-only/g01/online_archive/2011/01/05/20110105.SN.grep.gz
>> getfattr: Removing leading '/' from absolute path names
>> # file: export/read-only/g01/online_archive/2011/01/05/20110105.SN.grep.gz
>> trusted.afr.pfs-ro1-client-0=0sAAAAAAAAAAAAAAAA
>> trusted.afr.pfs-ro1-client-1=0sAAAAAAAAAAAAAAAA
>> trusted.gfid=0sjyq/BEwuRhaVbF7qdo0lqA==
>>
>> Thank you sir!
>>
>> James
>>
>>
>> On Thu, May 19, 2011 at 8:15 AM, Burnash, James <jburn...@knight.com> wrote:
>>> Hello folks. A new conundrum to make sure that my life with
>>> GlusterFS doesn't become boring :-)
>>>
>>> Configuration at end of this message:
>>>
>>> On client - directory appears to be empty:
>>> # ls -l /pfs2/online_archive/2011/01
>>> total 0
>>>
>>> fgrep -C 2 inode /var/log/glusterfs/pfs2.log | tail -10
>>> [2011-05-18 14:40:11.665045] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
>>> [2011-05-18 14:43:47.810045] E [rpc-clnt.c:199:call_bail] 0-pfs-ro1-client-1: bailing out frame type(GlusterFS 3.1) op(INODELK(29)) xid = 0x130824x sent = 2011-05-18 14:13:45.978987. timeout = 1800
>>> [2011-05-18 14:53:12.311323] E [afr-common.c:110:afr_set_split_brain] 0-pfs-ro1-replicate-0: invalid argument: inode
>>> [2011-05-18 15:00:32.240373] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
>>> [2011-05-18 15:10:12.282848] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
>>> --
>>> [2011-05-19 10:10:25.967246] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
>>> [2011-05-19 10:20:18.551953] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
>>> [2011-05-19 10:29:34.834256] E [afr-common.c:110:afr_set_split_brain] 0-pfs-ro1-replicate-0: invalid argument: inode
>>> [2011-05-19 10:30:06.898152] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
>>> [2011-05-19 10:32:05.258799] E [afr-self-heal-metadata.c:524:afr_sh_metadata_fix] 0-pfs-ro1-replicate-6: Unable to self-heal permissions/ownership of '/' (possible split-brain). Please fix the file on all backend volumes
>>>
>>>
>>> On server - directory is populated:
>>> loop_check ' ls -l /export/read-only/g*/online_archive/2011/01' jc1letgfs{14,15,17,18} | less
>>> jc1letgfs14
>>> /export/read-only/g01/online_archive/2011/01:
>>> total 80
>>> drwxrwxrwt 3    403 1009 4096 May  4 10:35 03
>>> drwxrwxrwt 3 107421 1009 4096 May  7 12:18 04
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:35 05
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:36 06
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:36 07
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:41 10
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:37 11
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:43 12
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:43 13
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:44 14
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:46 18
>>> drwxrwxrwt 3 107421 1009 4096 Apr 14 14:11 19
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:43 20
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:49 21
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:45 24
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:47 25
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:52 26
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:49 27
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:50 28
>>> drwxrwxrwt 3 107421 1009 4096 May  4 10:56 31
>>>
>>> (and shows on every brick the same)
>>>
>>> And from the server logs:
>>> root@jc1letgfs17:/var/log/glusterfs# fgrep '2011-05-19 10:39:30'
>>> bricks/export-read-only-g*.log
>>> [2011-05-19 10:39:30.306661] E [posix.c:438:posix_lookup]
>>> 0-pfs-ro1-posix: lstat on /online_archive/2011/01/21 failed: No data
>>> available
>>> [2011-05-19 10:39:30.307754] E [posix.c:438:posix_lookup]
>>> 0-pfs-ro1-posix: lstat on /online_archive/2011/01/21 failed: No data
>>> available
>>> [2011-05-19 10:39:30.308230] E [posix.c:438:posix_lookup]
>>> 0-pfs-ro1-posix: lstat on /online_archive/2011/01/21 failed: No data
>>> available
>>> [2011-05-19 10:39:30.322342] E [posix.c:438:posix_lookup]
>>> 0-pfs-ro1-posix: lstat on /online_archive/2011/01/21 failed: No data
>>> available
>>> [2011-05-19 10:39:30.421298] E [posix.c:438:posix_lookup]
>>> 0-pfs-ro1-posix: lstat on /online_archive/2011/01/21 failed: No data
>>> available
>>>
>>> The only two things that jump out so far are:
>>>  - the permissions on the directories under /export/read-only/g01/online_archive/2011/01 are 1777 (drwxrwxrwt), whereas the directories under /export/read-only/g01/online_archive/2010/01 are just 755;
>>>  - the lstat "No data available" errors only seem to appear on the problem directories.
>>>
>>>  Any hints or suggestions would be greatly appreciated. Thanks,
>>> James
>>>
>>>
>>> Config:
>>> All on Gluster 3.1.3
>>> Servers:
>>> 4 CentOS 5.5 (ProLiant DL370 G6 servers, Intel Xeon 3200 MHz), Each
>>> with:
>>> Single P812 Smart Array Controller,
>>> Single MDS600 with 70 2TB SATA drives configured as RAID 50
>>> 48 GB RAM
>>>
>>> Clients:
>>> 185 CentOS 5.2 (mostly DL360 G6).
>>> /pfs2 is the mount point for a Distributed-Replicate volume across 4 servers.
>>>
>>> Volume Name: pfs-ro1
>>> Type: Distributed-Replicate
>>> Status: Started
>>> Number of Bricks: 20 x 2 = 40
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: jc1letgfs17-pfs1:/export/read-only/g01
>>> Brick2: jc1letgfs18-pfs1:/export/read-only/g01
>>> Brick3: jc1letgfs17-pfs1:/export/read-only/g02
>>> Brick4: jc1letgfs18-pfs1:/export/read-only/g02
>>> Brick5: jc1letgfs17-pfs1:/export/read-only/g03
>>> Brick6: jc1letgfs18-pfs1:/export/read-only/g03
>>> Brick7: jc1letgfs17-pfs1:/export/read-only/g04
>>> Brick8: jc1letgfs18-pfs1:/export/read-only/g04
>>> Brick9: jc1letgfs17-pfs1:/export/read-only/g05
>>> Brick10: jc1letgfs18-pfs1:/export/read-only/g05
>>> Brick11: jc1letgfs17-pfs1:/export/read-only/g06
>>> Brick12: jc1letgfs18-pfs1:/export/read-only/g06
>>> Brick13: jc1letgfs17-pfs1:/export/read-only/g07
>>> Brick14: jc1letgfs18-pfs1:/export/read-only/g07
>>> Brick15: jc1letgfs17-pfs1:/export/read-only/g08
>>> Brick16: jc1letgfs18-pfs1:/export/read-only/g08
>>> Brick17: jc1letgfs17-pfs1:/export/read-only/g09
>>> Brick18: jc1letgfs18-pfs1:/export/read-only/g09
>>> Brick19: jc1letgfs17-pfs1:/export/read-only/g10
>>> Brick20: jc1letgfs18-pfs1:/export/read-only/g10
>>> Brick21: jc1letgfs14-pfs1:/export/read-only/g01
>>> Brick22: jc1letgfs15-pfs1:/export/read-only/g01
>>> Brick23: jc1letgfs14-pfs1:/export/read-only/g02
>>> Brick24: jc1letgfs15-pfs1:/export/read-only/g02
>>> Brick25: jc1letgfs14-pfs1:/export/read-only/g03
>>> Brick26: jc1letgfs15-pfs1:/export/read-only/g03
>>> Brick27: jc1letgfs14-pfs1:/export/read-only/g04
>>> Brick28: jc1letgfs15-pfs1:/export/read-only/g04
>>> Brick29: jc1letgfs14-pfs1:/export/read-only/g05
>>> Brick30: jc1letgfs15-pfs1:/export/read-only/g05
>>> Brick31: jc1letgfs14-pfs1:/export/read-only/g06
>>> Brick32: jc1letgfs15-pfs1:/export/read-only/g06
>>> Brick33: jc1letgfs14-pfs1:/export/read-only/g07
>>> Brick34: jc1letgfs15-pfs1:/export/read-only/g07
>>> Brick35: jc1letgfs14-pfs1:/export/read-only/g08
>>> Brick36: jc1letgfs15-pfs1:/export/read-only/g08
>>> Brick37: jc1letgfs14-pfs1:/export/read-only/g09
>>> Brick38: jc1letgfs15-pfs1:/export/read-only/g09
>>> Brick39: jc1letgfs14-pfs1:/export/read-only/g10
>>> Brick40: jc1letgfs15-pfs1:/export/read-only/g10
>>> Options Reconfigured:
>>> diagnostics.brick-log-level: ERROR
>>> cluster.metadata-change-log: on
>>> diagnostics.client-log-level: ERROR
>>> performance.stat-prefetch: on
>>> performance.cache-size: 2GB
>>> network.ping-timeout: 10
>>>
>>>
>>>
>>
>