I think it's repaired.  After using Phil's method, I got a file for
which pvfs2-db-display displayed all the content, so I started the
server and got:
[S 04/05 10:45] PVFS2 Server on node pvfs2-io-0-2 version 2.8.2 starting...
[E 04/05 10:45] Warning: got invalid handle or key size in dbpf_dspace_iterate_handles().
[E 04/05 10:45] Warning: skipping entry.
[S 04/05 10:45] PVFS2 Server ready.

I believe this means recovery is as complete as possible, and that
there's an entry that's missing now.  Is this correct?  Is it ready to
go back into production (once I update versions of db and pvfs2)?
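
As a rough cross-check of a startup log like the one above, here is a
minimal Python sketch (an illustration only, not a PVFS2 tool) that
counts entries per severity tag and looks for the "PVFS2 Server ready"
line.  It assumes the "[S|E MM/DD HH:MM] message" layout seen in the
log and treats lines without that prefix as wrapped continuations; the
names LINE_RE and summarize are hypothetical.

```python
import re
from collections import Counter

# Assumed layout, taken from the log above: "[E 04/05 10:45] message"
LINE_RE = re.compile(
    r"^\[(?P<sev>[A-Z]) (?P<date>\d{2}/\d{2}) (?P<time>\d{2}:\d{2})\] (?P<msg>.*)$"
)

def summarize(lines):
    """Return (entries per severity tag, whether the server reported ready)."""
    counts = Counter()
    ready = False
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # wrapped continuation of the previous entry
        counts[m.group("sev")] += 1
        if m.group("msg").startswith("PVFS2 Server ready"):
            ready = True
    return dict(counts), ready

# The startup log from this message, as wrapped by the mail client.
sample = [
    "[S 04/05 10:45] PVFS2 Server on node pvfs2-io-0-2 version 2.8.2 starting...",
    "[E 04/05 10:45] Warning: got invalid handle or key size in",
    "dbpf_dspace_iterate_handles().",
    "[E 04/05 10:45] Warning: skipping entry.",
    "[S 04/05 10:45] PVFS2 Server ready.",
]

counts, ready = summarize(sample)
print(counts, ready)
```

A count of "E" entries alongside ready == True only says that the
server came up despite the warnings; it does not say which entry was
skipped or what data it referred to.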

--Jim


On Wed, Apr 4, 2012 at 1:18 PM, Elaine Quarles <[email protected]> wrote:
> Try "make develtools".
>
> -- Elaine
>
> -----Original Message-----
> From: Jim Kusznir [mailto:[email protected]]
> Sent: Wednesday, April 04, 2012 3:45 PM
> To: Elaine Quarles
> Subject: Re: [Pvfs2-users] Help: pvfs2-server won't start, errors detected
>
> I patched everything and ran configure and make, but it didn't build
> pvfs2-db-display.  The .c file is present.  I haven't found the magic make
> command to cause that to be built either...Suggestions?
>
> --Jim
>
> On Wed, Apr 4, 2012 at 11:35 AM, Elaine Quarles <[email protected]> wrote:
>> Sorry for the delay. Attached are db-display.tar. If you expand this
>> from the top level directory of your source tree it will create the
>> src/apps/devel directory. Makefile.in.patch will patch your
>> Makefile.in with the logic necessary to build pvfs2-db-display. Please
>> note that it is necessary to run the configure script to update your
>> Makefile.
>>
>> Please send the results of running this utility so we can determine
>> whether it is necessary to try continuous forward reading through the
>> database, skipping error records or whether we will have to also read
>> from the end of the database backwards.
>>
>> Thanks,
>> Elaine
>>
>> -----Original Message-----
>> From: Jim Kusznir [mailto:[email protected]]
>> Sent: Wednesday, April 04, 2012 1:56 PM
>> To: Elaine Quarles
>> Cc: Becky Ligon
>> Subject: Re: [Pvfs2-users] Help: pvfs2-server won't start, errors
>> detected
>>
>> Any updates?  My entire cluster is still offline due to this problem,
>> and my users are starting to look for their pitchforks....
>>
>> Thanks!
>> --Jim
>>
>> On Tue, Apr 3, 2012 at 8:47 AM, Elaine Quarles <[email protected]> wrote:
>>> Jim,
>>>
>>> Could you please check whether your pvfs 2.8.2 distribution contains
>>> src/apps/devel/pvfs2-db-display.c? If so you can build it by running
>>> "make develtools". If your distribution does not contain this file
>>> let me know and I will send a patch.
>>>
>>> If you already have the utility, please redirect the output and send
>>> it so we can see what it has to say about the state of the database
>>> and determine the next step from there.
>>>
>>> Here is the command-line format.
>>> Usage:          ./pvfs2-db-display --dbpath <path> --hexdir <hexdir>
>>> Example:        ./pvfs2-db-display --dbpath /tmp/pvfs2-space --hexdir 4e3f77a5
>>>
>>> Options:
>>>        --verbose               Enable verbose output
>>>        --help                  This message.
>>>        --dbpath <path>         The path of the server's StorageSpace. The path
>>>                                should contain collections.db and
>>>                                storage_attributes.db
>>>        --hexdir <dir>          The directory in dbpath that contains
>>>                                collection_attributes.db, dataspace_attributes.db
>>>                                and keyval.db
>>>
>>> Thanks,
>>> Elaine
>>>
>>> -----Original Message-----
>>> From: Jim Kusznir [mailto:[email protected]]
>>> Sent: Monday, April 02, 2012 5:57 PM
>>> To: [email protected]
>>> Cc: [email protected]; [email protected];
>>> [email protected]
>>> Subject: Re: [Pvfs2-users] Help: pvfs2-server won't start, errors
>>> detected
>>>
>>> If this is the recommended method for recovery, then let's do it.
>>>
>>> Just one more question on how pvfs2 runs: is the metadata contained
>>> on each server different, or should they all be identical copies?  It
>>> just occurred to me that my understanding of the metadata was that
>>> all three metadata servers were redundant...  Or is this a
>>> "different metadata" db?
>>>
>>> --Jim
>>>
>>> On Mon, Apr 2, 2012 at 1:15 PM, Becky Ligon <[email protected]> wrote:
>>>> Jim:
>>>>
>>>> We have a program called pvfs2-db-display that reads directly
>>>> through the Berkeley DB.  We don't know for sure, but we might be
>>>> able to use whatever information it will give to recover what we
>>>> can.  The program reads from the database from logical top to
>>>> bottom.  We can also change it to read from logical bottom to top.
>>>> In this way, we MAY be able to recover the good data that is still
>>>> there above and below the corrupted area.  We've never done this but
>>>> we are willing to give it a try.
>>>>
>>>> Let us know if you'd like to try this!
>>>>
>>>> Becky
>>>> --
>>>> Becky Ligon
>>>> HPC Admin Staff
>>>> PVFS/OrangeFS Developer
>>>> Clemson University/Omnibond.com OrangeFS Support
>>>> 864-650-4065
>>>>
>>>>> Your solution sounds like what I am trying to do; I'd prefer to
>>>>> install db4 into /opt.
>>>>>
>>>>> If I can get your spec file or srpm, I'd greatly appreciate it!
>>>>>
>>>>> --Jim
>>>>>
>>>>> On Mon, Apr 2, 2012 at 11:19 AM, Becky Ligon <[email protected]> wrote:
>>>>>> Jim:
>>>>>>
>>>>>> We downloaded the software from the Oracle site and created an rpm
>>>>>> from that.  We are running CentOS 5 on our production servers with
>>>>>> kernel=2.6.18-238.9.1.el5 and have been running a version of db4
>>>>>> for at least the past 3 years.  So, you should be able to create
>>>>>> the rpm.  I can send you the rpm that we are using but it is
>>>>>> tailored to our environment; we install db4 in /opt/db4, because
>>>>>> other items depend on the installed version.
>>>>>>
>>>>>> Becky
>>>>>>
>>>>>>
>>>>>> On Mon, Apr 2, 2012 at 1:37 PM, Jim Kusznir <[email protected]> wrote:
>>>>>>>
>>>>>>> I've been trying to build a db4 rpm on my centos box, but it
>>>>>>> appears it has dependencies that require an OS upgrade...how did
>>>>>>> you get anything newer than the stock db4 installed on centos5?
>>>>>>>
>>>>>>> --Jim
>>>>>>>
>>>>>>> On Sat, Mar 31, 2012 at 3:07 PM, Becky Ligon <[email protected]>
>>>>>>> wrote:
>>>>>>> > Jim:
>>>>>>> >
>>>>>>> > I understand your situation.  Here at Clemson University, we
>>>>>>> > went through the same situation a couple of years ago.  Now, we
>>>>>>> > back up the metadata databases.  We don't have the space to back
>>>>>>> > up our data either!
>>>>>>> >
>>>>>>> > Under no circumstances should you run pvfs2-fsck.  If you run
>>>>>>> > this command in the destructive mode, then we won't be able to
>>>>>>> > help at all.  If you're willing, Omnibond MAY be able to write
>>>>>>> > some utilities that will help you recover most of the data.  You
>>>>>>> > will have to speak to Boyd Wilson ([email protected]) and
>>>>>>> > work something out.
>>>>>>> >
>>>>>>> > Becky Ligon
>>>>>>> >
>>>>>>> >
>>>>>>> > On Fri, Mar 30, 2012 at 5:55 PM, Jim Kusznir <[email protected]> wrote:
>>>>>>> >>
>>>>>>> >> I made no changes to my environment; it was up and running
>>>>>>> >> just fine.  I ran db_recover, and it immediately returned, with
>>>>>>> >> no apparent sign of doing anything but creating a log.000000001
>>>>>>> >> file.
>>>>>>> >>
>>>>>>> >> I have the centos DB installed, db4-4.3.29-10.el5
>>>>>>> >>
>>>>>>> >> I have no backups; this is my high performance filesystem of
>>>>>>> >> 99TB; it is the largest disk we have and therefore we have no
>>>>>>> >> means of backing it up.  We don't have anything big enough to
>>>>>>> >> hold that much data.
>>>>>>> >>
>>>>>>> >> Is there any hope?  Can we just identify and delete the files
>>>>>>> >> that have the db damage on them?  (Note that I don't even have
>>>>>>> >> anywhere to back up this data to temporarily if we do get it
>>>>>>> >> running, so I'd need to "fix in place".)
>>>>>>> >>
>>>>>>> >> thanks!
>>>>>>> >> --Jim
>>>>>>> >>
>>>>>>> >> --Jim
>>>>>>> >>
>>>>>>> >> On Fri, Mar 30, 2012 at 2:44 PM, Becky Ligon
>>>>>>> >> <[email protected]>
>>>>>>> >> wrote:
>>>>>>> >> > Jim:
>>>>>>> >> >
>>>>>>> >> > If you haven't made any recent changes to your pvfs
>>>>>>> >> > environment or Berkeley DB installation, then it looks like
>>>>>>> >> > you have a corrupted metadata database.
>>>>>>> >> > There is no way to easily recover.  Sometimes, the Berkeley
>>>>>>> >> > DB command "db_recover" might work, but PVFS doesn't have
>>>>>>> >> > transactions turned on, so normally it doesn't work.  It's
>>>>>>> >> > worth a try, just to be sure.
>>>>>>> >> >
>>>>>>> >> > Do you have any recent backups of the databases?  If so,
>>>>>>> >> > then you will need to use a set of backups that were created
>>>>>>> >> > around the same time, so the databases will be somewhat
>>>>>>> >> > consistent with each other.
>>>>>>> >> >
>>>>>>> >> > Which version of Berkeley DB are you using?  We have had
>>>>>>> >> > corruption issues with older versions of it.  We strongly
>>>>>>> >> > recommend 4.8 or higher.  There are some known problems with
>>>>>>> >> > threads in the older versions.
>>>>>>> >> >
>>>>>>> >> > Becky Ligon
>>>>>>> >> >
>>>>>>> >> > On Fri, Mar 30, 2012 at 3:28 PM, Jim Kusznir
>>>>>>> >> > <[email protected]>
>>>>>>> >> > wrote:
>>>>>>> >> >>
>>>>>>> >> >> Hi all:
>>>>>>> >> >>
>>>>>>> >> >> I got some notices from my users about "weirdness with pvfs2"
>>>>>>> >> >> this morning, and went and investigated.  Eventually, I
>>>>>>> >> >> found the following on one of my 3 servers:
>>>>>>> >> >>
>>>>>>> >> >> [S 03/30 12:22] PVFS2 Server on node pvfs2-io-0-2 version 2.8.2 starting...
>>>>>>> >> >> [E 03/30 12:23] Warning: got invalid handle or key size in dbpf_dspace_iterate_handles().
>>>>>>> >> >> [E 03/30 12:23] Warning: skipping entry.
>>>>>>> >> >> [E 03/30 12:23] c_get failed on iteration 3044
>>>>>>> >> >> [E 03/30 12:23] dbpf_dspace_iterate_handles_op_svc: Invalid argument
>>>>>>> >> >> [E 03/30 12:23] Error adding handle range 1431655768-2147483649,3579139414-4294967295 to filesystem pvfs2-fs
>>>>>>> >> >> [E 03/30 12:23] Error: Could not initialize server interfaces; aborting.
>>>>>>> >> >> [E 03/30 12:23] Error: Could not initialize server; aborting.
>>>>>>> >> >>
>>>>>>> >> >> ------------
>>>>>>> >> >> pvfs2-fs.conf:
>>>>>>> >> >> -----------
>>>>>>> >> >>
>>>>>>> >> >> <Defaults>
>>>>>>> >> >>        UnexpectedRequests 50
>>>>>>> >> >>        EventLogging none
>>>>>>> >> >>        LogStamp datetime
>>>>>>> >> >>        BMIModules bmi_tcp
>>>>>>> >> >>        FlowModules flowproto_multiqueue
>>>>>>> >> >>        PerfUpdateInterval 1000
>>>>>>> >> >>        ServerJobBMITimeoutSecs 30
>>>>>>> >> >>        ServerJobFlowTimeoutSecs 30
>>>>>>> >> >>        ClientJobBMITimeoutSecs 300
>>>>>>> >> >>        ClientJobFlowTimeoutSecs 300
>>>>>>> >> >>        ClientRetryLimit 5
>>>>>>> >> >>        ClientRetryDelayMilliSecs 2000
>>>>>>> >> >>        StorageSpace /mnt/pvfs2
>>>>>>> >> >>        LogFile /var/log/pvfs2-server.log
>>>>>>> >> >> </Defaults>
>>>>>>> >> >>
>>>>>>> >> >> <Aliases>
>>>>>>> >> >>        Alias pvfs2-io-0-0 tcp://pvfs2-io-0-0:3334
>>>>>>> >> >>        Alias pvfs2-io-0-1 tcp://pvfs2-io-0-1:3334
>>>>>>> >> >>        Alias pvfs2-io-0-2 tcp://pvfs2-io-0-2:3334
>>>>>>> >> >> </Aliases>
>>>>>>> >> >>
>>>>>>> >> >> <Filesystem>
>>>>>>> >> >>        Name pvfs2-fs
>>>>>>> >> >>        ID 62659950
>>>>>>> >> >>        RootHandle 1048576
>>>>>>> >> >>        <MetaHandleRanges>
>>>>>>> >> >>                Range pvfs2-io-0-0 4-715827885
>>>>>>> >> >>                Range pvfs2-io-0-1 715827886-1431655767
>>>>>>> >> >>                Range pvfs2-io-0-2 1431655768-2147483649
>>>>>>> >> >>        </MetaHandleRanges>
>>>>>>> >> >>        <DataHandleRanges>
>>>>>>> >> >>                Range pvfs2-io-0-0 2147483650-2863311531
>>>>>>> >> >>                Range pvfs2-io-0-1 2863311532-3579139413
>>>>>>> >> >>                Range pvfs2-io-0-2 3579139414-4294967295
>>>>>>> >> >>        </DataHandleRanges>
>>>>>>> >> >>        <StorageHints>
>>>>>>> >> >>                TroveSyncMeta yes
>>>>>>> >> >>                TroveSyncData no
>>>>>>> >> >>        </StorageHints>
>>>>>>> >> >> </Filesystem>
>>>>>>> >> >> -------------
>>>>>>> >> >> Any suggestions for recovery?
>>>>>>> >> >>
>>>>>>> >> >> Thanks!
>>>>>>> >> >> --Jim
>>>>>>> >> >> _______________________________________________
>>>>>>> >> >> Pvfs2-users mailing list
>>>>>>> >> >> [email protected]
>>>>>>> >> >> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>>>>>>> >> >
>>>>>>> >> >
>>>>>>> >> >
>>>>>>> >> >
>>>>>>> >> > --
>>>>>>> >> > Becky Ligon
>>>>>>> >> > OrangeFS Support and Development
>>>>>>> >> > Omnibond Systems
>>>>>>> >> > Anderson, South Carolina
>>>>>>> >> >
>>>>>>> >> >
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > --
>>>>>>> > Becky Ligon
>>>>>>> > OrangeFS Support and Development
>>>>>>> > Omnibond Systems
>>>>>>> > Anderson, South Carolina
>>>>>>> >
>>>>>>> >
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Becky Ligon
>>>>>> OrangeFS Support and Development
>>>>>> Omnibond Systems
>>>>>> Anderson, South Carolina
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>

_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
