I think it's repaired. After using Phil's method, I got a file whose content pvfs2-display displayed in full, so I started the server and got:

[S 04/05 10:45] PVFS2 Server on node pvfs2-io-0-2 version 2.8.2 starting...
[E 04/05 10:45] Warning: got invalid handle or key size in dbpf_dspace_iterate_handles().
[E 04/05 10:45] Warning: skipping entry.
[S 04/05 10:45] PVFS2 Server ready.
I believe this means recovery is as complete as possible, and that there's an entry that's now missing. Is this correct? Is it ready to go back into production (once I update the versions of db and pvfs2)?

--Jim

On Wed, Apr 4, 2012 at 1:18 PM, Elaine Quarles <[email protected]> wrote:
> Try "make develtools".
>
> -- Elaine
>
> -----Original Message-----
> From: Jim Kusznir [mailto:[email protected]]
> Sent: Wednesday, April 04, 2012 3:45 PM
> To: Elaine Quarles
> Subject: Re: [Pvfs2-users] Help: pvfs2-server won't start, errors detected
>
> I patched everything and ran configure and make, but it didn't build
> pvfs2-db-display. The .c file is present. I haven't found the magic make
> command to cause that to be built either... Suggestions?
>
> --Jim
>
> On Wed, Apr 4, 2012 at 11:35 AM, Elaine Quarles <[email protected]> wrote:
>> Sorry for the delay. Attached are db-display.tar. If you expand this
>> from the top level directory of your source tree it will create the
>> src/apps/devel directory. Makefile.in.patch will patch your
>> Makefile.in with the logic necessary to build pvfs2-db-display. Please
>> note that it is necessary to run the configure script to update your
>> Makefile.
>>
>> Please send the results of running this utility so we can determine
>> whether it is necessary to try continuous forward reading through the
>> database, skipping error records, or whether we will have to also read
>> from the end of the database backwards.
>>
>> Thanks,
>> Elaine
>>
>> -----Original Message-----
>> From: Jim Kusznir [mailto:[email protected]]
>> Sent: Wednesday, April 04, 2012 1:56 PM
>> To: Elaine Quarles
>> Cc: Becky Ligon
>> Subject: Re: [Pvfs2-users] Help: pvfs2-server won't start, errors detected
>>
>> Any updates? My entire cluster is still offline due to this problem,
>> and my users are starting to look for their pitchforks....
>>
>> Thanks!
>> --Jim
>>
>> On Tue, Apr 3, 2012 at 8:47 AM, Elaine Quarles <[email protected]> wrote:
>>> Jim,
>>>
>>> Could you please check whether your pvfs 2.8.2 distribution contains
>>> src/apps/devel/pvfs2-db-display.c? If so you can build it by running
>>> "make develtools". If your distribution does not contain this file
>>> let me know and I will send a patch.
>>>
>>> If you already have the utility, please redirect the output and send
>>> it so we can see what it has to say about the state of the database
>>> and determine the next step from there.
>>>
>>> Here is the command-line format.
>>> Usage: ./pvfs2-db-display --dbpath <path> --hexdir <hexdir>
>>> Example: ./pvfs2-db-display --dbpath /tmp/pvfs2-space --hexdir 4e3f77a5
>>>
>>> Options:
>>>   --verbose        Enable verbose output
>>>   --help           This message.
>>>   --dbpath <path>  The path of the server's StorageSpace. The path
>>>                    should contain collections.db and
>>>                    storage_attributes.db
>>>   --hexdir <dir>   The directory in dbpath that contains
>>>                    collection_attributes.db, dataspace_attrbutes.db
>>>                    and keyval.db
>>>
>>> Thanks,
>>> Elaine
>>>
>>> -----Original Message-----
>>> From: Jim Kusznir [mailto:[email protected]]
>>> Sent: Monday, April 02, 2012 5:57 PM
>>> To: [email protected]
>>> Cc: [email protected]; [email protected]; [email protected]
>>> Subject: Re: [Pvfs2-users] Help: pvfs2-server won't start, errors detected
>>>
>>> If this is the recommended method for recovery, then let's do it.
>>>
>>> Just one more question on how pvfs2 runs: is the metadata contained
>>> on each server different, or should they all be identical copies? It
>>> just occurred to me that my understanding of the metadata was that
>>> all three metadata servers were redundant..... Or is this a
>>> "different metadata" db?
>>>
>>> --Jim
>>>
>>> On Mon, Apr 2, 2012 at 1:15 PM, Becky Ligon <[email protected]> wrote:
>>>> Jim:
>>>>
>>>> We have a program called pvfs2-db-display that reads directly
>>>> through the Berkeley DB. We don't know for sure, but we might be
>>>> able to use whatever information it will give to recover what we
>>>> can. The program reads from the database from logical top to
>>>> bottom. We can also change it to read from logical bottom to top.
>>>> In this way, we MAY be able to recover the good data that is still
>>>> there above and below the corrupted area. We've never done this but
>>>> we are willing to give it a try.
>>>>
>>>> Let us know if you'd like to try this!
>>>>
>>>> Becky
>>>> --
>>>> Becky Ligon
>>>> HPC Admin Staff
>>>> PVFS/OrangeFS Developer
>>>> Clemson University/Omnibond.com OrangeFS Support
>>>> 864-650-4065
>>>>
>>>>> Your solution sounds like what I am trying to do; I'd prefer to
>>>>> install db4 into /opt.
>>>>>
>>>>> If I can get your spec file or srpm, I'd greatly appreciate it!
>>>>>
>>>>> --Jim
>>>>>
>>>>> On Mon, Apr 2, 2012 at 11:19 AM, Becky Ligon <[email protected]> wrote:
>>>>>> Jim:
>>>>>>
>>>>>> We downloaded the software from the Oracle site and created an rpm
>>>>>> from that. We are running Centos5 on our production servers with
>>>>>> kernel=2.6.18-238.9.1.el5 and have been running a version of db4
>>>>>> for at least the past 3 years. So, you should be able to create
>>>>>> the rpm. I can send you the rpm that we are using but it is
>>>>>> tailored to our environment; we install db4 in /opt/db4, because
>>>>>> other items depend on the installed version.
>>>>>>
>>>>>> Becky
>>>>>>
>>>>>>
>>>>>> On Mon, Apr 2, 2012 at 1:37 PM, Jim Kusznir <[email protected]> wrote:
>>>>>>>
>>>>>>> I've been trying to build a db4 rpm on my centos box, but it
>>>>>>> appears it has dependencies that require an OS upgrade... How did
>>>>>>> you get anything newer than the stock db4 installed on centos5?
>>>>>>>
>>>>>>> --Jim
>>>>>>>
>>>>>>> On Sat, Mar 31, 2012 at 3:07 PM, Becky Ligon <[email protected]> wrote:
>>>>>>> > Jim:
>>>>>>> >
>>>>>>> > I understand your situation. Here at Clemson University, we
>>>>>>> > went through the same situation a couple of years ago. Now, we
>>>>>>> > back up the metadata databases. We don't have the space to
>>>>>>> > back up our data either!
>>>>>>> >
>>>>>>> > Under no circumstances should you run pvfs2-fsck. If you run
>>>>>>> > this command in the destructive mode, then we won't be able to
>>>>>>> > help at all. If you're willing, Omnibond MAY be able to write
>>>>>>> > some utilities that will help you recover most of the data. You
>>>>>>> > will have to speak to Boyd Wilson ([email protected]) and work
>>>>>>> > something out.
>>>>>>> >
>>>>>>> > Becky Ligon
>>>>>>> >
>>>>>>> >
>>>>>>> > On Fri, Mar 30, 2012 at 5:55 PM, Jim Kusznir <[email protected]> wrote:
>>>>>>> >>
>>>>>>> >> I made no changes to my environment; it was up and running
>>>>>>> >> just fine. I ran db_recover, and it immediately returned, with
>>>>>>> >> no apparent sign of doing anything but creating a
>>>>>>> >> log.000000001 file.
>>>>>>> >>
>>>>>>> >> I have the centos DB installed, db4-4.3.29-10.el5
>>>>>>> >>
>>>>>>> >> I have no backups; this is my high performance filesystem of
>>>>>>> >> 99TB; it is the largest disk we have, and we therefore have no
>>>>>>> >> means of backing it up. We don't have anything big enough to
>>>>>>> >> hold that much data.
>>>>>>> >>
>>>>>>> >> Is there any hope? Can we just identify and delete the files
>>>>>>> >> that have the db damage on them? (Note that I don't even have
>>>>>>> >> anywhere to back up this data to temporarily if we do get it
>>>>>>> >> running, so I'd need to "fix in place".)
>>>>>>> >>
>>>>>>> >> thanks!
>>>>>>> >> --Jim
>>>>>>> >>
>>>>>>> >> On Fri, Mar 30, 2012 at 2:44 PM, Becky Ligon <[email protected]> wrote:
>>>>>>> >> > Jim:
>>>>>>> >> >
>>>>>>> >> > If you haven't made any recent changes to your pvfs
>>>>>>> >> > environment or Berkeley DB installation, then it looks like
>>>>>>> >> > you have a corrupted metadata database.
>>>>>>> >> > There is no way to easily recover. Sometimes, the Berkeley
>>>>>>> >> > DB command "db_recover" might work, but PVFS doesn't have
>>>>>>> >> > transactions turned on, so normally it doesn't work. It's
>>>>>>> >> > worth a try, just to be sure.
>>>>>>> >> >
>>>>>>> >> > Do you have any recent backups of the databases? If so,
>>>>>>> >> > then you will need to use a set of backups that were created
>>>>>>> >> > around the same time, so the databases will be somewhat
>>>>>>> >> > consistent with each other.
>>>>>>> >> >
>>>>>>> >> > Which version of Berkeley DB are you using? We have had
>>>>>>> >> > corruption issues with older versions of it. We strongly
>>>>>>> >> > recommend 4.8 or higher. There are some known problems with
>>>>>>> >> > threads in the older versions.
>>>>>>> >> >
>>>>>>> >> > Becky Ligon
>>>>>>> >> >
>>>>>>> >> > On Fri, Mar 30, 2012 at 3:28 PM, Jim Kusznir <[email protected]> wrote:
>>>>>>> >> >>
>>>>>>> >> >> Hi all:
>>>>>>> >> >>
>>>>>>> >> >> I got some notices from my users about "weirdness with pvfs2"
>>>>>>> >> >> this morning, and went and investigated. Eventually, I
>>>>>>> >> >> found the following on one of my 3 servers:
>>>>>>> >> >>
>>>>>>> >> >> [S 03/30 12:22] PVFS2 Server on node pvfs2-io-0-2 version 2.8.2 starting...
>>>>>>> >> >> [E 03/30 12:23] Warning: got invalid handle or key size in dbpf_dspace_iterate_handles().
>>>>>>> >> >> [E 03/30 12:23] Warning: skipping entry.
>>>>>>> >> >> [E 03/30 12:23] c_get failed on iteration 3044
>>>>>>> >> >> [E 03/30 12:23] dbpf_dspace_iterate_handles_op_svc: Invalid argument
>>>>>>> >> >> [E 03/30 12:23] Error adding handle range 1431655768-2147483649,3579139414-4294967295 to filesystem pvfs2-fs
>>>>>>> >> >> [E 03/30 12:23] Error: Could not initialize server interfaces; aborting.
>>>>>>> >> >> [E 03/30 12:23] Error: Could not initialize server; aborting.
>>>>>>> >> >>
>>>>>>> >> >> ------------
>>>>>>> >> >> pvfs2-fs.conf:
>>>>>>> >> >> -----------
>>>>>>> >> >>
>>>>>>> >> >> <Defaults>
>>>>>>> >> >>     UnexpectedRequests 50
>>>>>>> >> >>     EventLogging none
>>>>>>> >> >>     LogStamp datetime
>>>>>>> >> >>     BMIModules bmi_tcp
>>>>>>> >> >>     FlowModules flowproto_multiqueue
>>>>>>> >> >>     PerfUpdateInterval 1000
>>>>>>> >> >>     ServerJobBMITimeoutSecs 30
>>>>>>> >> >>     ServerJobFlowTimeoutSecs 30
>>>>>>> >> >>     ClientJobBMITimeoutSecs 300
>>>>>>> >> >>     ClientJobFlowTimeoutSecs 300
>>>>>>> >> >>     ClientRetryLimit 5
>>>>>>> >> >>     ClientRetryDelayMilliSecs 2000
>>>>>>> >> >>     StorageSpace /mnt/pvfs2
>>>>>>> >> >>     LogFile /var/log/pvfs2-server.log
>>>>>>> >> >> </Defaults>
>>>>>>> >> >>
>>>>>>> >> >> <Aliases>
>>>>>>> >> >>     Alias pvfs2-io-0-0 tcp://pvfs2-io-0-0:3334
>>>>>>> >> >>     Alias pvfs2-io-0-1 tcp://pvfs2-io-0-1:3334
>>>>>>> >> >>     Alias pvfs2-io-0-2 tcp://pvfs2-io-0-2:3334
>>>>>>> >> >> </Aliases>
>>>>>>> >> >>
>>>>>>> >> >> <Filesystem>
>>>>>>> >> >>     Name pvfs2-fs
>>>>>>> >> >>     ID 62659950
>>>>>>> >> >>     RootHandle 1048576
>>>>>>> >> >>     <MetaHandleRanges>
>>>>>>> >> >>         Range pvfs2-io-0-0 4-715827885
>>>>>>> >> >>         Range pvfs2-io-0-1 715827886-1431655767
>>>>>>> >> >>         Range pvfs2-io-0-2 1431655768-2147483649
>>>>>>> >> >>     </MetaHandleRanges>
>>>>>>> >> >>     <DataHandleRanges>
>>>>>>> >> >>         Range pvfs2-io-0-0 2147483650-2863311531
>>>>>>> >> >>         Range pvfs2-io-0-1 2863311532-3579139413
>>>>>>> >> >>         Range pvfs2-io-0-2 3579139414-4294967295
>>>>>>> >> >>     </DataHandleRanges>
>>>>>>> >> >>     <StorageHints>
>>>>>>> >> >>         TroveSyncMeta yes
>>>>>>> >> >>         TroveSyncData no
>>>>>>> >> >>     </StorageHints>
>>>>>>> >> >> </Filesystem>
>>>>>>> >> >> -------------
>>>>>>> >> >> Any suggestions for recovery?
>>>>>>> >> >>
>>>>>>> >> >> Thanks!
>>>>>>> >> >> --Jim
>>>>>>> >> >> _______________________________________________
>>>>>>> >> >> Pvfs2-users mailing list
>>>>>>> >> >> [email protected]
>>>>>>> >> >> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>>>>>>> >> >
>>>>>>> >> > --
>>>>>>> >> > Becky Ligon
>>>>>>> >> > OrangeFS Support and Development
>>>>>>> >> > Omnibond Systems
>>>>>>> >> > Anderson, South Carolina
>>>>>>> >
>>>>>>> > --
>>>>>>> > Becky Ligon
>>>>>>> > OrangeFS Support and Development
>>>>>>> > Omnibond Systems
>>>>>>> > Anderson, South Carolina
>>>>>>
>>>>>> --
>>>>>> Becky Ligon
>>>>>> OrangeFS Support and Development
>>>>>> Omnibond Systems
>>>>>> Anderson, South Carolina

_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
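
For anyone matching the original error against the configuration: the handle range in the "Error adding handle range" message (1431655768-2147483649,3579139414-4294967295) appears to be the meta range followed by the data range that pvfs2-fs.conf assigns to pvfs2-io-0-2. The Python below is a minimal sketch for checking that by hand. It is not part of PVFS2, and the function names (parse_ranges, ranges_for, is_contiguous) are made up for illustration. It only reads the Range lines as quoted in this thread.

```python
import re

# Range lines copied from the pvfs2-fs.conf quoted in this thread.
CONF = """
<MetaHandleRanges>
Range pvfs2-io-0-0 4-715827885
Range pvfs2-io-0-1 715827886-1431655767
Range pvfs2-io-0-2 1431655768-2147483649
</MetaHandleRanges>
<DataHandleRanges>
Range pvfs2-io-0-0 2147483650-2863311531
Range pvfs2-io-0-1 2863311532-3579139413
Range pvfs2-io-0-2 3579139414-4294967295
</DataHandleRanges>
"""

# Matches "Range <alias> <start>-<end>" on a line of its own.
RANGE_RE = re.compile(r"^\s*Range\s+(\S+)\s+(\d+)-(\d+)\s*$", re.MULTILINE)


def parse_ranges(conf_text):
    """Return (alias, start, end) tuples in the order they appear."""
    return [(m.group(1), int(m.group(2)), int(m.group(3)))
            for m in RANGE_RE.finditer(conf_text)]


def ranges_for(alias, ranges):
    """Format one server's ranges the way the server log prints them."""
    return ",".join(f"{start}-{end}" for a, start, end in ranges if a == alias)


def is_contiguous(ranges):
    """True if the ranges, sorted by start, leave no gaps and do not overlap."""
    ordered = sorted(ranges, key=lambda r: r[1])
    return all(prev[2] + 1 == cur[1] for prev, cur in zip(ordered, ordered[1:]))


ranges = parse_ranges(CONF)
print(ranges_for("pvfs2-io-0-2", ranges))
print(is_contiguous(ranges))
```

If the first printed string matches the range in the server log, the failure was on the handles owned by pvfs2-io-0-2 itself, which is consistent with only that server's database being damaged.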
