Hi Randy,
It looks like maybe one of the entries in the attributes db is damaged,
but the error output doesn't give much detail.
Any chance you could apply the attached patch and show the log output
again? It doesn't change the behavior other than to print out more
error messages in the cases that (I think) you are hitting.
thanks,
-Phil
Randall Martin wrote:
I noticed a similar thread where someone ran an fsck and recovered. I
tried an fsck with no luck. I ran db_verify on all of the .db files and
it didn’t report any problems. Below is the debug output of the server:
[D 06/29 15:29] Passing tcp://oss004-4:3337 as BMI listen address.
[D 06/29 15:29] BMI_tcp_initialize: Initializing TCP/IP module.
[D 06/29 15:29] BMI_tcp_initialize: TCP/IP module successfully initialized.
[D 06/29 15:29] Server using shm key hint: 373672738
[D 06/29 15:29] [BMI CONTROL]: BMI_set_info: set_info: 0 option: 11
[D 06/29 15:29] Default socket buffers send:16384 receive:87380
[D 06/29 15:29] Setting socket buffer size for send:0 receive:0
[D 06/29 15:29] Reread socket buffers send:16384 receive:87380
[D 06/29 15:29] [BMI CONTROL]: BMI_set_info: set_info: 0 option: 12
[D 06/29 15:29] Default socket buffers send:16384 receive:87380
[D 06/29 15:29] Setting socket buffer size for send:0 receive:0
[D 06/29 15:29] Reread socket buffers send:16384 receive:87380
[D 06/29 15:29] dbpf_thread_initialize: initialized
[D 06/29 15:29] [SYNC_COALESCE]: dbpf_sync_context_init for context 0 called
[D 06/29 15:29] dbpf_collection_lookup of coll: pvfs2-fs
[D 06/29 15:29] dbpf using default db cache size.
[D 06/29 15:29] dbpf using shm key: 1020239961
[D 06/29 15:29] collection lookup: version is 0.1.4
[D 06/29 15:29] [SYNC_COALESCE]: dbpf_sync_context_init for context 1 called
[D 06/29 15:29] dbpf collection 373672578 - Setting handle timeout to
360000000 microseconds
[D 06/29 15:29] - set handle re-use timeout to 360 seconds (ret=0)
[D 06/29 15:29] dbpf collection 373672578 - Setting cache keywords of
attribute cache to dh,
[D 06/29 15:29] Setting dbpf_attr_cache keywords to:
dh,
[D 06/29 15:29] dbpf collection 373672578 - Setting cache size of
attribute cache to 511
[D 06/29 15:29] dbpf collection 373672578 - Setting maximum elements of
attribute cache to 1024
[D 06/29 15:29] dbpf collection 373672578 - Initialize collection attr.
cache
[D 06/29 15:29] There are 1 cacheable keywords registered
[D 06/29 15:29] dbpf_attr_cache_initialize: initialized
[D 06/29 15:29] dbpf collection 373672578 - Setting collection handle
ranges to
4323455642275676148-4611686018427387890,8935141660703064036-9223372036854775778
[D 06/29 15:29] op_queue add: 0x9f96380
[D 06/29 15:29] dbpf_thread_function started
[D 06/29 15:29] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE
(DSPACE_ITERATE_HANDLES)
[D 06/29 15:29] handle_new_connection: Assigning socket 11 to new method
addr.
[D 06/29 15:29] tcp_do_work_recv: Reading header for new op.
[D 06/29 15:29] tcp_do_work_recv: Received new message; mode: 2.
[D 06/29 15:29] tcp_do_work_recv: tag: 5865658
[D 06/29 15:29] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE
(DSPACE_ITERATE_HANDLES) (ret: 1)
[D 06/29 15:29] op_queue add: 0x9f96380
[D 06/29 15:29] handle_new_connection: Assigning socket 12 to new method
addr.
[D 06/29 15:29] op_queue add: 0x9f9da50
[D 06/29 15:29] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE
(DSPACE_ITERATE_HANDLES)
[D 06/29 15:29] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE
(DSPACE_ITERATE_HANDLES) (ret: 1)
[D 06/29 15:29] op_queue add: 0x9f9da50
[D 06/29 15:29] op_queue add: 0x9fa63d0
[D 06/29 15:29] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE
(DSPACE_ITERATE_HANDLES)
[D 06/29 15:29] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE
(DSPACE_ITERATE_HANDLES) (ret: 1)
[D 06/29 15:29] op_queue add: 0x9fa63d0
[D 06/29 15:29] op_queue add: 0x9fad360
[D 06/29 15:29] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE
(DSPACE_ITERATE_HANDLES)
[D 06/29 15:29] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE
(DSPACE_ITERATE_HANDLES) (ret: 1)
[D 06/29 15:29] op_queue add: 0x9fad360
[D 06/29 15:29] op_queue add: 0x9fb0bf0
[D 06/29 15:29] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE
(DSPACE_ITERATE_HANDLES)
[D 06/29 15:29] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE
(DSPACE_ITERATE_HANDLES) (ret: 1)
[D 06/29 15:29] op_queue add: 0x9fb0bf0
[D 06/29 15:29] op_queue add: 0x9fb2f90
[D 06/29 15:29] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE
(DSPACE_ITERATE_HANDLES)
[D 06/29 15:29] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE
(DSPACE_ITERATE_HANDLES) (ret: 1)
[D 06/29 15:29] op_queue add: 0x9fb2f90
[D 06/29 15:29] op_queue add: 0x9fb5ab0
[D 06/29 15:29] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE
(DSPACE_ITERATE_HANDLES)
[D 06/29 15:29] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE
(DSPACE_ITERATE_HANDLES) (ret: 1)
[D 06/29 15:29] op_queue add: 0x9fb5ab0
[D 06/29 15:29] op_queue add: 0x9fc7a30
[D 06/29 15:29] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE
(DSPACE_ITERATE_HANDLES)
[D 06/29 15:29] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE
(DSPACE_ITERATE_HANDLES) (ret: 1)
[D 06/29 15:29] op_queue add: 0x9fc7a30
[D 06/29 15:29] op_queue add: 0x9fca500
[D 06/29 15:29] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE
(DSPACE_ITERATE_HANDLES)
[D 06/29 15:29] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE
(DSPACE_ITERATE_HANDLES) (ret: 1)
[D 06/29 15:29] op_queue add: 0x9fca500
[D 06/29 15:29] op_queue add: 0x9fca690
[D 06/29 15:29] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE
(DSPACE_ITERATE_HANDLES)
[D 06/29 15:29] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE
(DSPACE_ITERATE_HANDLES) (ret: 1)
[D 06/29 15:29] op_queue add: 0x9fca690
[D 06/29 15:29] op_queue add: 0x9fe1980
[D 06/29 15:29] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE
(DSPACE_ITERATE_HANDLES)
[D 06/29 15:29] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE
(DSPACE_ITERATE_HANDLES) (ret: 1)
[D 06/29 15:29] op_queue add: 0x9fe1980
[D 06/29 15:29] op_queue add: 0x9fe2330
[D 06/29 15:29] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE
(DSPACE_ITERATE_HANDLES)
[E 06/29 15:29] dbpf_dspace_iterate_handles_op_svc: Invalid argument
[D 06/29 15:29] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE
(DSPACE_ITERATE_HANDLES) (ret: -1073742095)
[D 06/29 15:29] op_queue add: 0x9fe2330
[D 06/29 15:29] trove_dspace_iterate_handles failed
[E 06/29 15:29] Error adding handle range
4323455642275676148-4611686018427387890,8935141660703064036-9223372036854775778
to filesystem pvfs2-fs
[E 06/29 15:29] Error: Could not initialize server interfaces; aborting.
[E 06/29 15:29] Error: Could not initialize server; aborting.
[D 06/29 15:29] *** server shutdown in progress ***
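The return value in the failing DSPACE_ITERATE_HANDLES line, -1073742095, is easier to read once it is split into bit fields. The following is a minimal sketch in C. `split_ret` and `struct ret_fields` are hypothetical names used only for illustration; they are not PVFS functions. The arithmetic is certain, but the meaning attached to each field in the note below is an assumption drawn from the "Invalid argument" text in the log, not checked against the PVFS headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper for reading the log; not part of PVFS. */
struct ret_fields {
    uint32_t magnitude; /* absolute value of the return code */
    int bit30_set;      /* 1 if bit 30 of the magnitude is set */
    uint32_t rest;      /* magnitude with bit 30 cleared */
};

/* Split a negative return code into bit 30 and the remaining low bits. */
static struct ret_fields split_ret(int32_t ret)
{
    struct ret_fields f;

    assert(ret < 0); /* only negative (error) codes are handled here */

    /* Widen before negating so that INT32_MIN cannot overflow. */
    f.magnitude = (uint32_t)(-(int64_t)ret);
    f.bit30_set = (f.magnitude & 0x40000000u) != 0;
    f.rest = f.magnitude & ~0x40000000u;
    return f;
}
```

For -1073742095 the magnitude is 0x4000010F: bit 30 is set and the remainder is 0x10F (271), which is 0x100 | 15. If bit 30 flags an error, 0x100 marks the Trove layer, and 15 is the "Invalid argument" code (all three are assumptions), the value agrees with the "Invalid argument" message printed just before it.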
-Randy
------------------------------------------------------------------------
From: Randall Martin <[email protected]>
Date: Mon, 29 Jun 2009 14:05:33 -0400
To: <[email protected]>
Subject: [Pvfs2-users] PVFS server won't start
One of our PVFS servers crashed and now it won’t start back up. It had
been working from June 2 until today’s crash. Any ideas on how
to fix it? I was running the released 2.8.1 version, but I also tried
the HEAD version with no change in symptoms.
From the server log:
[D 06/29 13:49] PVFS2 Server version 2.8.1pre1-2009-06-26-182521 starting.
[E 06/29 13:49] dbpf_dspace_iterate_handles_op_svc: Invalid argument
[E 06/29 13:49] Error adding handle range
4323455642275676148-4611686018427387890,8935141660703064036-9223372036854775778
to filesystem pvfs2-fs
[E 06/29 13:49] Error: Could not initialize server interfaces; aborting.
[E 06/29 13:49] Error: Could not initialize server; aborting.
My config file:
<Defaults>
UnexpectedRequests 50
EventLogging none
EnableTracing no
LogStamp datetime
BMIModules bmi_tcp
FlowModules flowproto_multiqueue
PerfUpdateInterval 1000
ServerJobBMITimeoutSecs 30
ServerJobFlowTimeoutSecs 30
ClientJobBMITimeoutSecs 300
ClientJobFlowTimeoutSecs 300
ClientRetryLimit 60
ClientRetryDelayMilliSecs 10000
PrecreateBatchSize 512
PrecreateLowThreshold 256
</Defaults>
<Aliases>
Alias oss001-1 tcp://oss001-1:3334
Alias oss001-2 tcp://oss001-2:3335
Alias oss001-3 tcp://oss001-3:3336
Alias oss001-4 tcp://oss001-4:3337
Alias oss002-1 tcp://oss002-1:3334
Alias oss002-2 tcp://oss002-2:3335
Alias oss002-3 tcp://oss002-3:3336
Alias oss002-4 tcp://oss002-4:3337
Alias oss003-1 tcp://oss003-1:3334
Alias oss003-2 tcp://oss003-2:3335
Alias oss003-3 tcp://oss003-3:3336
Alias oss003-4 tcp://oss003-4:3337
Alias oss004-1 tcp://oss004-1:3334
Alias oss004-2 tcp://oss004-2:3335
Alias oss004-3 tcp://oss004-3:3336
Alias oss004-4 tcp://oss004-4:3337
</Aliases>
<ServerOptions>
Server oss001-1
StorageSpace /ost1
LogFile /var/log/pvfs2-server.oss001-1.log
</ServerOptions>
<ServerOptions>
Server oss001-2
StorageSpace /ost2
LogFile /var/log/pvfs2-server.oss001-2.log
</ServerOptions>
<ServerOptions>
Server oss001-3
StorageSpace /ost3
LogFile /var/log/pvfs2-server.oss001-3.log
</ServerOptions>
<ServerOptions>
Server oss001-4
StorageSpace /ost4
LogFile /var/log/pvfs2-server.oss001-4.log
</ServerOptions>
<ServerOptions>
Server oss002-1
StorageSpace /ost5
LogFile /var/log/pvfs2-server.oss002-1.log
</ServerOptions>
<ServerOptions>
Server oss002-2
StorageSpace /ost6
LogFile /var/log/pvfs2-server.oss002-2.log
</ServerOptions>
<ServerOptions>
Server oss002-3
StorageSpace /ost7
LogFile /var/log/pvfs2-server.oss002-3.log
</ServerOptions>
<ServerOptions>
Server oss002-4
StorageSpace /ost8
LogFile /var/log/pvfs2-server.oss002-4.log
</ServerOptions>
<ServerOptions>
Server oss003-1
StorageSpace /ost9
LogFile /var/log/pvfs2-server.oss003-1.log
</ServerOptions>
<ServerOptions>
Server oss003-2
StorageSpace /ost10
LogFile /var/log/pvfs2-server.oss003-2.log
</ServerOptions>
<ServerOptions>
Server oss003-3
StorageSpace /ost11
LogFile /var/log/pvfs2-server.oss003-3.log
</ServerOptions>
<ServerOptions>
Server oss003-4
StorageSpace /ost12
LogFile /var/log/pvfs2-server.oss003-4.log
</ServerOptions>
<ServerOptions>
Server oss004-1
StorageSpace /ost13
LogFile /var/log/pvfs2-server.oss004-1.log
</ServerOptions>
<ServerOptions>
Server oss004-2
StorageSpace /ost14
LogFile /var/log/pvfs2-server.oss004-2.log
</ServerOptions>
<ServerOptions>
Server oss004-3
StorageSpace /ost15
LogFile /var/log/pvfs2-server.oss004-3.log
</ServerOptions>
<ServerOptions>
Server oss004-4
StorageSpace /ost16
LogFile /var/log/pvfs2-server.oss004-4.log
</ServerOptions>
<Filesystem>
Name pvfs2-fs
ID 373672578
RootHandle 1048576
FileStuffing yes
<MetaHandleRanges>
Range oss001-1 3-288230376151711745
Range oss001-2 288230376151711746-576460752303423488
Range oss001-3 576460752303423489-864691128455135231
Range oss001-4 864691128455135232-1152921504606846974
Range oss002-1 1152921504606846975-1441151880758558717
Range oss002-2 1441151880758558718-1729382256910270460
Range oss002-3 1729382256910270461-2017612633061982203
Range oss002-4 2017612633061982204-2305843009213693946
Range oss003-1 2305843009213693947-2594073385365405689
Range oss003-2 2594073385365405690-2882303761517117432
Range oss003-3 2882303761517117433-3170534137668829175
Range oss003-4 3170534137668829176-3458764513820540918
Range oss004-1 3458764513820540919-3746994889972252661
Range oss004-2 3746994889972252662-4035225266123964404
Range oss004-3 4035225266123964405-4323455642275676147
Range oss004-4 4323455642275676148-4611686018427387890
</MetaHandleRanges>
<DataHandleRanges>
Range oss001-1 4611686018427387891-4899916394579099633
Range oss001-2 4899916394579099634-5188146770730811376
Range oss001-3 5188146770730811377-5476377146882523119
Range oss001-4 5476377146882523120-5764607523034234862
Range oss002-1 5764607523034234863-6052837899185946605
Range oss002-2 6052837899185946606-6341068275337658348
Range oss002-3 6341068275337658349-6629298651489370091
Range oss002-4 6629298651489370092-6917529027641081834
Range oss003-1 6917529027641081835-7205759403792793577
Range oss003-2 7205759403792793578-7493989779944505320
Range oss003-3 7493989779944505321-7782220156096217063
Range oss003-4 7782220156096217064-8070450532247928806
Range oss004-1 8070450532247928807-8358680908399640549
Range oss004-2 8358680908399640550-8646911284551352292
Range oss004-3 8646911284551352293-8935141660703064035
Range oss004-4 8935141660703064036-9223372036854775778
</DataHandleRanges>
<StorageHints>
TroveSyncMeta no
TroveSyncData no
TroveMethod alt-aio
</StorageHints>
<Distribution>
Name simple_stripe
Param strip_size
Value 1048576
</Distribution>
</Filesystem>
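The handle range named in the error message is the pair of oss004-4 entries from MetaHandleRanges and DataHandleRanges above, joined by a comma. When a specific handle number turns up during debugging, it can help to check whether it lies inside that range string. The following is a minimal sketch in C, assuming the string is a comma-separated list of decimal "low-high" extents as printed in the log. `handle_in_ranges` is a hypothetical helper, not a PVFS API.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not part of PVFS.
 * Returns 1 if handle lies inside any "low-high" extent of the
 * comma-separated range string, 0 if it lies in none of them, and -1 if
 * the string cannot be parsed. Parsing stops at the first match, so text
 * after a matching extent is not validated. */
static int handle_in_ranges(const char *ranges, uint64_t handle)
{
    const char *p = ranges;

    assert(ranges != NULL);

    while (*p != '\0') {
        char *end;
        unsigned long long lo, hi;

        /* Require a digit so strtoull cannot accept a sign or spaces. */
        if (!isdigit((unsigned char)*p))
            return -1;
        errno = 0;
        lo = strtoull(p, &end, 10);
        if (errno != 0 || *end != '-')
            return -1;

        p = end + 1;
        if (!isdigit((unsigned char)*p))
            return -1;
        errno = 0;
        hi = strtoull(p, &end, 10);
        if (errno != 0 || lo > hi)
            return -1;

        if (handle >= lo && handle <= hi)
            return 1;

        if (*end == ',')
            p = end + 1;   /* move on to the next extent */
        else if (*end == '\0')
            p = end;       /* end of string, loop terminates */
        else
            return -1;
    }
    return 0;
}
```

With the oss004-4 range string from the log, the RootHandle 1048576 is reported as outside the range (it belongs to the oss001-1 metadata range), while 4323455642275676148 is reported as inside.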
Thanks,
Randy
------------------------------------------------------------------------
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
Index: src/io/trove/trove-dbpf/dbpf-dspace.c
===================================================================
RCS file: /projects/cvsroot/pvfs2-1/src/io/trove/trove-dbpf/dbpf-dspace.c,v
retrieving revision 1.163
diff -a -u -p -r1.163 dbpf-dspace.c
--- src/io/trove/trove-dbpf/dbpf-dspace.c 30 Jan 2009 15:41:08 -0000 1.163
+++ src/io/trove/trove-dbpf/dbpf-dspace.c 29 Jun 2009 20:39:39 -0000
@@ -886,6 +886,12 @@ static int dbpf_dspace_iterate_handles_o
if(sizeof_handle != sizeof(TROVE_handle) ||
sizeof_attr != sizeof(attr))
{
+ gossip_err("Error: got handle size %zd when expecting %zu\n",
+ sizeof_handle, sizeof(TROVE_handle));
+ gossip_err("Error: got attr size %zd when expecting %zu\n",
+ sizeof_attr, sizeof(attr));
+ if(sizeof_handle == sizeof(TROVE_handle))
+ gossip_err("Error iterating on handle %llu\n", llu(*(TROVE_handle *)tmp_handle));
/* something is wrong with the result */
ret = -TROVE_EINVAL;
goto return_error;
@@ -916,6 +922,12 @@ static int dbpf_dspace_iterate_handles_o
if(sizeof_handle != sizeof(TROVE_handle) ||
sizeof_attr != sizeof(attr))
{
+ gossip_err("Error: got handle size %zd when expecting %zu\n",
+ sizeof_handle, sizeof(TROVE_handle));
+ gossip_err("Error: got attr size %zd when expecting %zu\n",
+ sizeof_attr, sizeof(attr));
+ if(sizeof_handle == sizeof(TROVE_handle))
+ gossip_err("Error iterating on handle %llu\n", llu(*(TROVE_handle *)tmp_handle));
/* something is wrong with the result */
ret = -TROVE_EINVAL;
goto return_error;
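
The check that the patch instruments can be tried outside the server. The following is a minimal standalone sketch in C of the same idea: compare the sizes of a key/value pair read from a database against the expected sizes, and only interpret the key as a handle when its size is right. `check_record_sizes` is a hypothetical name, `snprintf` stands in for `gossip_err`, and a 64-bit unsigned handle is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the size check in the patch; not PVFS code.
 * Returns 0 and clears msg if both sizes match. Otherwise writes a
 * diagnostic into msg and returns -1. The key is decoded as a 64-bit
 * handle only when its size is the expected one. */
static int check_record_sizes(size_t got_key, size_t want_key,
                              size_t got_val, size_t want_val,
                              const void *key, char *msg, size_t msglen)
{
    int n;

    assert(msg != NULL && msglen > 0);

    if (got_key == want_key && got_val == want_val) {
        msg[0] = '\0';
        return 0;
    }

    /* size_t values are printed with %zu. */
    n = snprintf(msg, msglen,
                 "got key size %zu when expecting %zu; "
                 "got value size %zu when expecting %zu",
                 got_key, want_key, got_val, want_val);

    if (got_key == want_key && want_key == sizeof(uint64_t) &&
        key != NULL && n >= 0 && (size_t)n < msglen) {
        uint64_t handle;

        /* memcpy avoids assuming that the buffer is suitably aligned. */
        memcpy(&handle, key, sizeof(handle));
        snprintf(msg + n, msglen - (size_t)n, "; key %llu",
                 (unsigned long long)handle);
    }
    return -1;
}
```

A record with a correct key size but a wrong value size yields a message that ends with the key value, which is the piece of information the patch adds to the server log.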