On 07/14/2011 04:35 PM, John Mark Walker wrote:
Joe - thanks for taking the time to write this up. It sounds like the
issue this is designed to fix is related to the GFID mismatch issue
that we released a preventive fix for today.

Hi John

  We need to get this into testing soon.


The sanity checks could be useful, though. Does today's release
change anything with respect to your tools?

Don't know yet. If we can get access to more of the internal monitoring, we should be able to craft better tools for this. Our plan is to eventually build something like a parallel scan and "fix" tool that follows best practices for the fix. Not quite an fsck (that's at a different level), but a real sanity checker.

I'd imagine it would be useful for the CloudFS group too ...


-John Mark
Gluster Community Guy


________________________________________
From: gluster-users-boun...@gluster.org [gluster-users-boun...@gluster.org] on behalf of Joe Landman [land...@scalableinformatics.com]
Sent: Wednesday, July 13, 2011 9:15 PM
To: gluster-users
Subject: [Gluster-users] Tools for the admin

Hi folks

We have run into a number of problems with missing files (among
other things).  So I went hunting for the files.  Along the way, I
came up with some very simple sanity checks and tools for helping to
correct situations.  They will not work on striped data ... sorry.

Sanity check #1: conservation of number of files

The sum of the number of files on your backing stores (excluding
links and directories) should equal (with possible minor variance due
to gluster internals) the sum of the number of files (excluding links
and directories) in your gluster volumes.

If you have say, 6 bricks, each with nearly 1M files, and a dht
volume built from those bricks, you really ... REALLY ... shouldn't
have only 1.8M files in your volume.  If you do, then some files are
missing from the volume (really).  You can tell what these files are,
as they have no xattr.  Yeah.  Really.
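
To make the arithmetic of this check concrete, here is a minimal sketch. It is not part of the tools described in this post; the function names and the 1% tolerance are assumptions chosen for illustration, and the counts are round numbers rather than measurements. It only applies to a plain dht volume, as above.

```python
def missing_file_estimate(brick_file_counts, volume_file_count):
    """How many regular files the bricks hold beyond what the volume shows."""
    return sum(brick_file_counts) - volume_file_count


def looks_sane(brick_file_counts, volume_file_count, tolerance=0.01):
    """True if the volume count is within `tolerance` (a fraction, assumed
    here to be 1%) of the total across all bricks."""
    total = sum(brick_file_counts)
    if total == 0:
        return volume_file_count == 0
    return abs(total - volume_file_count) <= tolerance * total


# Illustrative numbers only: 6 bricks of about 1M files each,
# with 1.8M files visible in the volume.
bricks = [1_000_000] * 6
print(missing_file_estimate(bricks, 1_800_000))
print(looks_sane(bricks, 1_800_000))
```

With these illustrative inputs the estimate is 4.2M files unaccounted for, which is far outside any "minor variance".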

How can you enumerate what you have?

Simple.  Meet file_accounting.pl (available at
http://download.scalableinformatics.com/gluster/utils/).

This handy utility will tell you important things about your file
system.

[root@jr4-1 temp]# /data/tiburon/install/scan/file_accounting.pl --bspath=/data/brick-sdc2/dht/
Number of entries: 944604
Number of links  : 6711
Number of dir    : 102825
Number of files  : 834794

--bspath is the "backing store path", where the files reside.  It
works just as well on your gluster volume, which allows you to
inspect your sanity with appropriate sums.
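
For readers who want to see what this kind of accounting involves, here is a rough sketch. It is not the source of file_accounting.pl, and exactly how that tool classifies entries is an assumption on my part; the function name `account` is made up for the example. It walks a tree without following symlinks and counts entries by type.

```python
import os
import stat
from collections import Counter


def account(path):
    """Walk `path` without following symlinks and count entries by type.

    Entries that are neither links, directories, nor regular files
    (sockets, devices, ...) are counted only under "entries".
    """
    counts = Counter(entries=0, links=0, dirs=0, files=0)
    stack = [path]
    while stack:
        current = stack.pop()
        with os.scandir(current) as it:
            for entry in it:
                counts["entries"] += 1
                mode = entry.stat(follow_symlinks=False).st_mode
                if stat.S_ISLNK(mode):
                    counts["links"] += 1
                elif stat.S_ISDIR(mode):
                    counts["dirs"] += 1
                    stack.append(entry.path)  # descend into real directories only
                elif stat.S_ISREG(mode):
                    counts["files"] += 1
    return counts
```

Running it once per backing store and once on the mounted volume, then comparing the "files" totals, gives the conservation check described earlier.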

So you need to copy these files into the volume, and move them out of
the way first, before copying them in.

Which leads to tool #2 and #3.  First, you need to scan your backing
store file system for the files.

Tool #2: scan_gluster.pl

/data/tiburon/install/scan/scan_gluster.pl
--bspath=/data/brick-sdc2/dht
/data/brick-sdc2/temp/sdc2.data

will grab lots of nice info about each file, including the
attributes. You can now grep the sdc2.data file for 'attr=,' and
what you get back are the things in your file system that gluster
knows about only to varying degrees.  The files among them, yeah,
those are the missing files.  The ones I was trying to find.

If you have a user who notes that files occasionally go missing,
yeah, this can help you find them if they exist on the backing store.
Which they probably do.
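
The underlying idea (files on the backing store that carry no xattr) can be sketched directly. This is not scan_gluster.pl; it is a minimal illustration that assumes Linux, where Python provides `os.listxattr`, and it assumes you run it as root, since attributes in the `trusted.` namespace are not visible to ordinary users. The function name and the `list_attrs` parameter are hypothetical, added so the logic can be exercised without a real brick.

```python
import os


def files_without_xattrs(root, list_attrs=None):
    """Yield regular files under `root` that carry no extended attributes.

    `list_attrs` is a hypothetical hook: any callable taking a path and
    returning a list of attribute names. It defaults to os.listxattr.
    """
    if list_attrs is None:
        list_attrs = os.listxattr  # Linux only
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            if not list_attrs(full):
                yield full
```

Any attribute at all counts as "known" here. On a system that attaches other attributes to every file, you would need to filter by attribute name instead.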

The next tool is dangerous.  So far all we have done is scan the
backing store.  Now we are going to make changes.  No, don't worry,
it's actually ... almost ... safe.  We do a file move to another
location (preferably on the same device/mount point in the backing
store), then a copy into the gluster volume (yes, you need to mount it
on your brick nodes). The danger is in modifying a gluster file system
backend.  Don't do this.  Ever.  Unless 3/4 of your files go
missing.

And, by the way, we have a handy dandy --md5 switch on there, if you
want the scan to take forever.

Tool #3: data_mover.pl

This will do the dirty work.  It parses the output of scan_gluster.pl
and makes changes.  There is a --dryrun option for those who want to
try it, and a -T number option to specify the number of changes to
make, which lets you try it (hence the T ... for TRY) on some number
of files.  It will preserve ownership and permission mask
(ohhh ahhh ... shiny!).  The --tmp option happily sets your temporary
directory. Verbose and debug should be obvious.

nohup ./data_mover.pl --data sdd2-nomd5.data --debug --verbose \
    --tmp `pwd`/tmp -T 2000000 >> out 2>&1 &
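
The move-aside-then-copy-in sequence described above can be sketched roughly as follows. This is not the code of data_mover.pl; every name here (`move_then_copy`, `holding_root`, `limit`, `dry_run`) is hypothetical and only mirrors the ideas of --tmp, -T and --dryrun. Like the real tool, anything along these lines modifies the backing store and can destroy data, so treat it purely as an illustration.

```python
import os
import shutil


def move_then_copy(rel_paths, brick_root, holding_root, volume_root,
                   limit=None, dry_run=False):
    """Move each file out of the brick into a holding area, then copy it
    into the mounted volume, keeping the permission mask and (where the
    caller is allowed to) ownership. Returns the relative paths handled."""
    done = []
    for rel in rel_paths:
        if limit is not None and len(done) >= limit:
            break
        src = os.path.join(brick_root, rel)
        held = os.path.join(holding_root, rel)
        dst = os.path.join(volume_root, rel)
        if dry_run:
            print(f"would move {src} -> {held}, then copy to {dst}")
            done.append(rel)
            continue
        st = os.lstat(src)
        os.makedirs(os.path.dirname(held), exist_ok=True)
        shutil.move(src, held)       # step 1: get it out of the way
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(held, dst)      # step 2: copy in through the mount
        try:
            os.chown(dst, st.st_uid, st.st_gid)
        except PermissionError:
            pass  # not root: ownership stays as created
        done.append(rel)
    return done
```

Keeping the holding area on the same device as the brick makes the first step a rename rather than a full copy, and the moved file remains available if the copy into the volume needs to be reversed.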


Note:  all of these tools currently use /opt/scalable/bin/perl as
the interpreter.  This is because our Perl build (5.12.3) includes
all the bits we need to make this work.  If you want to use them, you
are welcome to change /opt/scalable/bin/perl to /usr/bin/perl, and
then you will have to install a few modules:

cpan Getopt::Lucid File::ExtAttr

If you have an issue with either, please let me know.

We can turn these into binaries if someone needs them.  Source is at

http://download.scalableinformatics.com/gluster/utils/

Let me know (offline) if you run into problems if you decide to give
them a try.  Note, they are GPL2 (no license tag on them), no
warranty, and data_mover.pl will MOST DEFINITELY DESTROY DATA.  We
aren't liable for any damages if you use it.  Caveat Emptor.  Let the
admin beware. Did I mention that data_mover.pl WILL DESTROY YOUR
DATA?   I am not sure if I did.  So here it is again.  data_mover.pl
WILL DESTROY YOUR DATA.

Don't use these unless you have a backup.  Especially data_mover.pl.
Because IT WILL MOST DEFINITELY DESTROY YOUR DATA.  Might even bite
your dog, egg your house, and do all sorts of other nastiness.  It
will increase entropy in the universe.

But if you are staring at the rear end of 3.8M missing files,
wondering WTF, mebbe ... that data lossage thing doesn't sound so
bad.  Especially if you can reverse it.

So feel free to look them over.  I plan to hone them over time, and
refine them.  Add documentation even.  If they prove useful enough
for people to use, please let me know.  And they are GPL v2.

The site is http://download.scalableinformatics.com/gluster/utils/ .

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: land...@scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users


--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: land...@scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
