I noticed a similar thread where someone ran an fsck and recovered. I tried an fsck with no luck. I also ran db_verify on all of the .db files, and it didn't report any problems. Below is the debug output of the server:

[D 06/29 15:29] Passing tcp://oss004-4:3337 as BMI listen address.
[D 06/29 15:29] BMI_tcp_initialize: Initializing TCP/IP module.
[D 06/29 15:29] BMI_tcp_initialize: TCP/IP module successfully initialized.
[D 06/29 15:29] Server using shm key hint: 373672738
[D 06/29 15:29] [BMI CONTROL]: BMI_set_info: set_info: 0 option: 11
[D 06/29 15:29] Default socket buffers send:16384 receive:87380
[D 06/29 15:29] Setting socket buffer size for send:0 receive:0
[D 06/29 15:29] Reread socket buffers send:16384 receive:87380
[D 06/29 15:29] [BMI CONTROL]: BMI_set_info: set_info: 0 option: 12
[D 06/29 15:29] Default socket buffers send:16384 receive:87380
[D 06/29 15:29] Setting socket buffer size for send:0 receive:0
[D 06/29 15:29] Reread socket buffers send:16384 receive:87380
[D 06/29 15:29] dbpf_thread_initialize: initialized
[D 06/29 15:29] [SYNC_COALESCE]: dbpf_sync_context_init for context 0 called
[D 06/29 15:29] dbpf_collection_lookup of coll: pvfs2-fs
[D 06/29 15:29] dbpf using default db cache size.
[D 06/29 15:29] dbpf using shm key: 1020239961
[D 06/29 15:29] collection lookup: version is 0.1.4
[D 06/29 15:29] [SYNC_COALESCE]: dbpf_sync_context_init for context 1 called
[D 06/29 15:29] dbpf collection 373672578 - Setting handle timeout to 360000000 microseconds
[D 06/29 15:29] - set handle re-use timeout to 360 seconds (ret=0)
[D 06/29 15:29] dbpf collection 373672578 - Setting cache keywords of attribute cache to dh,
[D 06/29 15:29] Setting dbpf_attr_cache keywords to: dh,
[D 06/29 15:29] dbpf collection 373672578 - Setting cache size of attribute cache to 511
[D 06/29 15:29] dbpf collection 373672578 - Setting maximum elements of attribute cache to 1024
[D 06/29 15:29] dbpf collection 373672578 - Initialize collection attr. cache
[D 06/29 15:29] There are 1 cacheable keywords registered
[D 06/29 15:29] dbpf_attr_cache_initialize: initialized
[D 06/29 15:29] dbpf collection 373672578 - Setting collection handle ranges to 4323455642275676148-4611686018427387890,8935141660703064036-9223372036854775778
[D 06/29 15:29] op_queue add: 0x9f96380
[D 06/29 15:29] dbpf_thread_function started
[D 06/29 15:29] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE (DSPACE_ITERATE_HANDLES)
[D 06/29 15:29] handle_new_connection: Assigning socket 11 to new method addr.
[D 06/29 15:29] tcp_do_work_recv: Reading header for new op.
[D 06/29 15:29] tcp_do_work_recv: Received new message; mode: 2.
[D 06/29 15:29] tcp_do_work_recv: tag: 5865658
[D 06/29 15:29] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE (DSPACE_ITERATE_HANDLES) (ret: 1)
[D 06/29 15:29] op_queue add: 0x9f96380
[D 06/29 15:29] handle_new_connection: Assigning socket 12 to new method addr.
[D 06/29 15:29] op_queue add: 0x9f9da50
[D 06/29 15:29] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE (DSPACE_ITERATE_HANDLES)
[D 06/29 15:29] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE (DSPACE_ITERATE_HANDLES) (ret: 1)
[D 06/29 15:29] op_queue add: 0x9f9da50
[D 06/29 15:29] op_queue add: 0x9fa63d0
[D 06/29 15:29] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE (DSPACE_ITERATE_HANDLES)
[D 06/29 15:29] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE (DSPACE_ITERATE_HANDLES) (ret: 1)
[D 06/29 15:29] op_queue add: 0x9fa63d0
[D 06/29 15:29] op_queue add: 0x9fad360
[D 06/29 15:29] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE (DSPACE_ITERATE_HANDLES)
[D 06/29 15:29] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE (DSPACE_ITERATE_HANDLES) (ret: 1)
[D 06/29 15:29] op_queue add: 0x9fad360
[D 06/29 15:29] op_queue add: 0x9fb0bf0
[D 06/29 15:29] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE (DSPACE_ITERATE_HANDLES)
[D 06/29 15:29] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE (DSPACE_ITERATE_HANDLES) (ret: 1)
[D 06/29 15:29] op_queue add: 0x9fb0bf0
[D 06/29 15:29] op_queue add: 0x9fb2f90
[D 06/29 15:29] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE (DSPACE_ITERATE_HANDLES)
[D 06/29 15:29] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE (DSPACE_ITERATE_HANDLES) (ret: 1)
[D 06/29 15:29] op_queue add: 0x9fb2f90
[D 06/29 15:29] op_queue add: 0x9fb5ab0
[D 06/29 15:29] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE (DSPACE_ITERATE_HANDLES)
[D 06/29 15:29] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE (DSPACE_ITERATE_HANDLES) (ret: 1)
[D 06/29 15:29] op_queue add: 0x9fb5ab0
[D 06/29 15:29] op_queue add: 0x9fc7a30
[D 06/29 15:29] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE (DSPACE_ITERATE_HANDLES)
[D 06/29 15:29] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE (DSPACE_ITERATE_HANDLES) (ret: 1)
[D 06/29 15:29] op_queue add: 0x9fc7a30
[D 06/29 15:29] op_queue add: 0x9fca500
[D 06/29 15:29] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE (DSPACE_ITERATE_HANDLES)
[D 06/29 15:29] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE (DSPACE_ITERATE_HANDLES) (ret: 1)
[D 06/29 15:29] op_queue add: 0x9fca500
[D 06/29 15:29] op_queue add: 0x9fca690
[D 06/29 15:29] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE (DSPACE_ITERATE_HANDLES)
[D 06/29 15:29] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE (DSPACE_ITERATE_HANDLES) (ret: 1)
[D 06/29 15:29] op_queue add: 0x9fca690
[D 06/29 15:29] op_queue add: 0x9fe1980
[D 06/29 15:29] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE (DSPACE_ITERATE_HANDLES)
[D 06/29 15:29] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE (DSPACE_ITERATE_HANDLES) (ret: 1)
[D 06/29 15:29] op_queue add: 0x9fe1980
[D 06/29 15:29] op_queue add: 0x9fe2330
[D 06/29 15:29] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE (DSPACE_ITERATE_HANDLES)
[E 06/29 15:29] dbpf_dspace_iterate_handles_op_svc: Invalid argument
[D 06/29 15:29] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE (DSPACE_ITERATE_HANDLES) (ret: -1073742095)
[D 06/29 15:29] op_queue add: 0x9fe2330
[D 06/29 15:29] trove_dspace_iterate_handles failed
[E 06/29 15:29] Error adding handle range 4323455642275676148-4611686018427387890,8935141660703064036-9223372036854775778 to filesystem pvfs2-fs
[E 06/29 15:29] Error: Could not initialize server interfaces; aborting.
[E 06/29 15:29] Error: Could not initialize server; aborting.
[D 06/29 15:29] *** server shutdown in progress ***

-Randy

From: Randall Martin <[email protected]>
Date: Mon, 29 Jun 2009 14:05:33 -0400
To: <[email protected]>
Subject: [Pvfs2-users] PVFS server won't start

One of our PVFS servers crashed and now it won't start back up. It had been working since June 2 until today's crash. Any ideas on how to fix it? I was running the 2.8.1 released version, but I also tried the HEAD version with no change in symptoms.

From the server log:

[D 06/29 13:49] PVFS2 Server version 2.8.1pre1-2009-06-26-182521 starting.
[E 06/29 13:49] dbpf_dspace_iterate_handles_op_svc: Invalid argument
[E 06/29 13:49] Error adding handle range 4323455642275676148-4611686018427387890,8935141660703064036-9223372036854775778 to filesystem pvfs2-fs
[E 06/29 13:49] Error: Could not initialize server interfaces; aborting.
[E 06/29 13:49] Error: Could not initialize server; aborting.

My config file:

<Defaults>
    UnexpectedRequests 50
    EventLogging none
    EnableTracing no
    LogStamp datetime
    BMIModules bmi_tcp
    FlowModules flowproto_multiqueue
    PerfUpdateInterval 1000
    ServerJobBMITimeoutSecs 30
    ServerJobFlowTimeoutSecs 30
    ClientJobBMITimeoutSecs 300
    ClientJobFlowTimeoutSecs 300
    ClientRetryLimit 60
    ClientRetryDelayMilliSecs 10000
    PrecreateBatchSize 512
    PrecreateLowThreshold 256
</Defaults>

<Aliases>
    Alias oss001-1 tcp://oss001-1:3334
    Alias oss001-2 tcp://oss001-2:3335
    Alias oss001-3 tcp://oss001-3:3336
    Alias oss001-4 tcp://oss001-4:3337
    Alias oss002-1 tcp://oss002-1:3334
    Alias oss002-2 tcp://oss002-2:3335
    Alias oss002-3 tcp://oss002-3:3336
    Alias oss002-4 tcp://oss002-4:3337
    Alias oss003-1 tcp://oss003-1:3334
    Alias oss003-2 tcp://oss003-2:3335
    Alias oss003-3 tcp://oss003-3:3336
    Alias oss003-4 tcp://oss003-4:3337
    Alias oss004-1 tcp://oss004-1:3334
    Alias oss004-2 tcp://oss004-2:3335
    Alias oss004-3 tcp://oss004-3:3336
    Alias oss004-4 tcp://oss004-4:3337
</Aliases>

<ServerOptions>
    Server oss001-1
    StorageSpace /ost1
    LogFile /var/log/pvfs2-server.oss001-1.log
</ServerOptions>

<ServerOptions>
    Server oss001-2
    StorageSpace /ost2
    LogFile /var/log/pvfs2-server.oss001-2.log
</ServerOptions>

<ServerOptions>
    Server oss001-3
    StorageSpace /ost3
    LogFile /var/log/pvfs2-server.oss001-3.log
</ServerOptions>

<ServerOptions>
    Server oss001-4
    StorageSpace /ost4
    LogFile /var/log/pvfs2-server.oss001-4.log
</ServerOptions>

<ServerOptions>
    Server oss002-1
    StorageSpace /ost5
    LogFile /var/log/pvfs2-server.oss002-1.log
</ServerOptions>

<ServerOptions>
    Server oss002-2
    StorageSpace /ost6
    LogFile /var/log/pvfs2-server.oss002-2.log
</ServerOptions>

<ServerOptions>
    Server oss002-3
    StorageSpace /ost7
    LogFile /var/log/pvfs2-server.oss002-3.log
</ServerOptions>

<ServerOptions>
    Server oss002-4
    StorageSpace /ost8
    LogFile /var/log/pvfs2-server.oss002-4.log
</ServerOptions>

<ServerOptions>
    Server oss003-1
    StorageSpace /ost9
    LogFile /var/log/pvfs2-server.oss003-1.log
</ServerOptions>

<ServerOptions>
    Server oss003-2
    StorageSpace /ost10
    LogFile /var/log/pvfs2-server.oss003-2.log
</ServerOptions>

<ServerOptions>
    Server oss003-3
    StorageSpace /ost11
    LogFile /var/log/pvfs2-server.oss003-3.log
</ServerOptions>

<ServerOptions>
    Server oss003-4
    StorageSpace /ost12
    LogFile /var/log/pvfs2-server.oss003-4.log
</ServerOptions>

<ServerOptions>
    Server oss004-1
    StorageSpace /ost13
    LogFile /var/log/pvfs2-server.oss004-1.log
</ServerOptions>

<ServerOptions>
    Server oss004-2
    StorageSpace /ost14
    LogFile /var/log/pvfs2-server.oss004-2.log
</ServerOptions>

<ServerOptions>
    Server oss004-3
    StorageSpace /ost15
    LogFile /var/log/pvfs2-server.oss004-3.log
</ServerOptions>

<ServerOptions>
    Server oss004-4
    StorageSpace /ost16
    LogFile /var/log/pvfs2-server.oss004-4.log
</ServerOptions>

<Filesystem>
    Name pvfs2-fs
    ID 373672578
    RootHandle 1048576
    FileStuffing yes
    <MetaHandleRanges>
        Range oss001-1 3-288230376151711745
        Range oss001-2 288230376151711746-576460752303423488
        Range oss001-3 576460752303423489-864691128455135231
        Range oss001-4 864691128455135232-1152921504606846974
        Range oss002-1 1152921504606846975-1441151880758558717
        Range oss002-2 1441151880758558718-1729382256910270460
        Range oss002-3 1729382256910270461-2017612633061982203
        Range oss002-4 2017612633061982204-2305843009213693946
        Range oss003-1 2305843009213693947-2594073385365405689
        Range oss003-2 2594073385365405690-2882303761517117432
        Range oss003-3 2882303761517117433-3170534137668829175
        Range oss003-4 3170534137668829176-3458764513820540918
        Range oss004-1 3458764513820540919-3746994889972252661
        Range oss004-2 3746994889972252662-4035225266123964404
        Range oss004-3 4035225266123964405-4323455642275676147
        Range oss004-4 4323455642275676148-4611686018427387890
    </MetaHandleRanges>
    <DataHandleRanges>
        Range oss001-1 4611686018427387891-4899916394579099633
        Range oss001-2 4899916394579099634-5188146770730811376
        Range oss001-3 5188146770730811377-5476377146882523119
        Range oss001-4 5476377146882523120-5764607523034234862
        Range oss002-1 5764607523034234863-6052837899185946605
        Range oss002-2 6052837899185946606-6341068275337658348
        Range oss002-3 6341068275337658349-6629298651489370091
        Range oss002-4 6629298651489370092-6917529027641081834
        Range oss003-1 6917529027641081835-7205759403792793577
        Range oss003-2 7205759403792793578-7493989779944505320
        Range oss003-3 7493989779944505321-7782220156096217063
        Range oss003-4 7782220156096217064-8070450532247928806
        Range oss004-1 8070450532247928807-8358680908399640549
        Range oss004-2 8358680908399640550-8646911284551352292
        Range oss004-3 8646911284551352293-8935141660703064035
        Range oss004-4 8935141660703064036-9223372036854775778
    </DataHandleRanges>
    <StorageHints>
        TroveSyncMeta no
        TroveSyncData no
        TroveMethod alt-aio
    </StorageHints>
    <Distribution>
        Name simple_stripe
        Param strip_size
        Value 1048576
    </Distribution>
</Filesystem>

Thanks,
Randy

_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
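
Since the failing step is "Error adding handle range", one quick thing to rule out is a malformed range in the config itself. Below is a minimal sketch of such a check, not a PVFS2 tool: the helper names (parse_ranges, check_ranges) are made up for illustration, and it only looks for inverted or overlapping "Range <alias> <start>-<end>" entries. The sample text is the oss004-3 and oss004-4 entries copied from the config in this thread. It says nothing about the on-disk databases, which may well be the real problem here.

```python
import re

# Hypothetical helper, not part of PVFS2: matches "Range <alias> <start>-<end>".
RANGE_RE = re.compile(r"Range\s+(\S+)\s+(\d+)-(\d+)")

def parse_ranges(text):
    """Return (alias, start, end) tuples for every Range entry found in text."""
    return [(a, int(s), int(e)) for a, s, e in RANGE_RE.findall(text)]

def check_ranges(ranges):
    """Return a list of problems: inverted ranges, or overlaps between neighbours."""
    problems = []
    ordered = sorted(ranges, key=lambda r: r[1])  # sort by start handle
    for alias, start, end in ordered:
        if start > end:
            problems.append(f"{alias}: start {start} > end {end}")
    for (a1, _, e1), (a2, s2, _) in zip(ordered, ordered[1:]):
        if s2 <= e1:
            problems.append(f"{a1} overlaps {a2}")
    return problems

# Entries for oss004-3 and oss004-4, copied from the config in this thread.
sample = """
Range oss004-3 4035225266123964405-4323455642275676147
Range oss004-4 4323455642275676148-4611686018427387890
Range oss004-3 8646911284551352293-8935141660703064035
Range oss004-4 8935141660703064036-9223372036854775778
"""

ranges = parse_ranges(sample)
problems = check_ranges(ranges)
print(problems or "no overlapping or inverted ranges")
```

To check a whole file, pass its contents to parse_ranges instead of the sample string. Meta and data ranges can be checked together because they share one handle space in this config.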
