Thanks! I have applied the patch and replaced the old logs with new ones. Just use the previous links:

http://grid.ucy.ac.cy/file/pvfs_logwn140.grid.ucy.ac.cy
http://grid.ucy.ac.cy/file/pvfs_logwn141.grid.ucy.ac.cy
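In case it is useful to anyone else hitting this, the rough sequence I used to rebuild with the patch and regenerate the logs is sketched below. The patch filename and source/install paths are placeholders for my local setup; the config and log paths are the ones from my pvfs2-fs.conf, and EventLogging is already set to "verbose" as Phil suggested earlier in the thread.

    # Rebuild the patched server (patch filename is a placeholder for the
    # one Phil attached; adjust source tree and install prefix as needed).
    cd pvfs-2.8.1
    patch -p1 < trove-create-list-logging.patch
    make && make install

    # On each server node: stop the daemon, clear the old log, restart, so
    # the new log contains only the failing scenario. The storage space was
    # already created earlier with "pvfs2-server -f", so it is not recreated.
    killall pvfs2-server                  # servers log "got signal 15" and exit
    rm -f /tmp/pvfs2-server.log
    /usr/local/sbin/pvfs2-server /etc/pvfs2-fs.conf
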
thanks a lot for your help,

On Mon, Apr 6, 2009 at 8:41 PM, Phil Carns <[email protected]> wrote:
> Thanks for posting the logs. It looks like the create_list function
> within Trove actually generated the EINVAL error, but there aren't enough
> log messages in that path to know why.
>
> Any chance you could apply the patch attached to this email and retry this
> scenario (with verbose logging)? I'm hoping for some extra output after the
> line that looks like this:
>
> (0x8d4f020) batch_create (prelude sm) state: perm_check (status = 0)
>
> thanks,
> -Phil
>
> Asterios Katsifodimos wrote:
>
>> Yes, both of them, because both are now metadata servers. When I had one
>> metadata and one IO server, the metadata server did not produce the errors
>> until the IO server came up. From the moment the IO server comes up, the
>> metadata server goes crazy...
>>
>> I have uploaded the log files here:
>> http://grid.ucy.ac.cy/file/pvfs_logwn140.grid.ucy.ac.cy
>> http://grid.ucy.ac.cy/file/pvfs_logwn141.grid.ucy.ac.cy
>>
>> have a look!
>>
>> thanks!
>>
>> On Mon, Apr 6, 2009 at 7:00 PM, Phil Carns <[email protected]> wrote:
>>
>> Ok. Could you try "verbose" now as the log level? It is close to
>> the "all" level but should only print information while the server
>> is busy.
>>
>> Are both wn140 and wn141 showing the same batch create errors, or
>> just one of them?
>>
>> thanks,
>> -Phil
>>
>> Asterios Katsifodimos wrote:
>>
>> Hello Phil,
>>
>> Thanks for your answer. Yes, I delete the storage directory every time
>> I make a new configuration, and I run the pvfs2-server -f command
>> before starting the daemons.
>>
>> The only thing that I get from the servers is the batch_create error,
>> the server startup messages, and the "PVFS2 server got signal 15
>> (server_status_flag: 507903)" message. Do you want me to try another
>> log level?
>>
>> Also, this is how the server is configured:
>> ***** Displaying PVFS Configuration Information *****
>> ------------------------------------------------------
>> PVFS2 configured to build karma gui              : no
>> PVFS2 configured to perform coverage analysis    : no
>> PVFS2 configured for aio threaded callbacks      : yes
>> PVFS2 configured to use FUSE                     : no
>> PVFS2 configured for the 2.6.x kernel module     : no
>> PVFS2 configured for the 2.4.x kernel module     : no
>> PVFS2 configured for using the mmap-ra-cache     : no
>> PVFS2 will use workaround for redhat 2.4 kernels : no
>> PVFS2 will use workaround for buggy NPTL         : no
>> PVFS2 server will be built                       : yes
>>
>> PVFS2 version string: 2.8.1
>>
>> thanks again,
>>
>> On Mon, Apr 6, 2009 at 5:21 PM, Phil Carns <[email protected]> wrote:
>>
>> Hello,
>>
>> I'm not sure what would cause that "Invalid argument" error.
>>
>> Could you try the following steps:
>>
>> - kill both servers
>> - modify your configuration files to set "EventLogging" to "none"
>> - delete your old log files (or move them to another directory)
>> - start the servers
>>
>> You can then send us the complete contents of both log files and we
>> can go from there. The "all" level is a little hard to interpret
>> because it generates a lot of information even when servers are idle.
>>
>> Also, when you went from one server to two, did you delete your old
>> storage space (/pvfs) and start over, or are you trying to keep that
>> data and add servers to it?
>>
>> thanks!
>> -Phil
>>
>> Asterios Katsifodimos wrote:
>>
>> Hello all,
>>
>> I have been trying to install PVFS 2.8.1 on Ubuntu Server, CentOS 4, and
>> Scientific Linux 4. I can compile it and run it in a "single host"
>> configuration without any problems.
>>
>> However, when I add more nodes to the configuration (always using the
>> pvfs2-genconfig defaults) I have the following problem:
>>
>> *On the metadata node I get these messages:*
>> [E 04/02 20:16] batch_create request got: Invalid argument
>> [E 04/02 20:16] batch_create request got: Invalid argument
>> [E 04/02 20:16] batch_create request got: Invalid argument
>> [E 04/02 20:16] batch_create request got: Invalid argument
>>
>> *On the IO nodes I get:*
>> [r...@wn140 ~]# tail -50 /tmp/pvfs2-server.log
>> [D 04/02 23:53] BMI_testcontext completing: 18446744072456767880
>> [D 04/02 23:53] [SM Entering]: (0x88f8b00) msgpairarray_sm:complete (status: 1)
>> [D 04/02 23:53] [SM frame get]: (0x88f8b00) op-id: 37 index: 0 base-frm: 1
>> [D 04/02 23:53] msgpairarray_complete: sm 0x88f8b00 status_user_tag 1 msgarray_count 1
>> [D 04/02 23:53] msgpairarray: 1 operations remain
>> [D 04/02 23:53] [SM Exiting]: (0x88f8b00) msgpairarray_sm:complete (error code: -1073742006), (action: DEFERRED)
>> [D 04/02 23:53] [SM Entering]: (0x88f8b00) msgpairarray_sm:complete (status: 0)
>> [D 04/02 23:53] [SM frame get]: (0x88f8b00) op-id: 37 index: 0 base-frm: 1
>> [D 04/02 23:53] msgpairarray_complete: sm 0x88f8b00 status_user_tag 0 msgarray_count 1
>> [D 04/02 23:53] msgpairarray: all operations complete
>> [D 04/02 23:53] [SM Exiting]: (0x88f8b00) msgpairarray_sm:complete (error code: 190), (action: COMPLETE)
>> [D 04/02 23:53] [SM Entering]: (0x88f8b00) msgpairarray_sm:completion_fn (status: 0)
>> [D 04/02 23:53] [SM frame get]: (0x88f8b00) op-id: 37 index: 0 base-frm: 1
>> [D 04/02 23:53] (0x88f8b00) msgpairarray state: completion_fn
>> [E 04/02 23:53] Warning: msgpair failed to tcp://wn141:3334, will retry: Connection refused
>> [D 04/02 23:53] *** msgpairarray_completion_fn: msgpair 0 failed, retry 1
>> [D 04/02 23:53] *** msgpairarray_completion_fn: msgpair retrying after delay.
>> [D 04/02 23:53] [SM Exiting]: (0x88f8b00) msgpairarray_sm:completion_fn (error code: 191), (action: COMPLETE)
>> [D 04/02 23:53] [SM Entering]: (0x88f8b00) msgpairarray_sm:post_retry (status: 0)
>> [D 04/02 23:53] [SM frame get]: (0x88f8b00) op-id: 37 index: 0 base-frm: 1
>> [D 04/02 23:53] msgpairarray_post_retry: sm 0x88f8b00, wait 2000 ms
>> [D 04/02 23:53] [SM Exiting]: (0x88f8b00) msgpairarray_sm:post_retry (error code: 0), (action: DEFERRED)
>> [D 04/02 23:53] [SM Entering]: (0x89476c0) perf_update_sm:do_work (status: 0)
>> [P 04/02 23:53] Start times (hr:min:sec): 23:53:11.330 23:53:10.310 23:53:09.287 23:53:08.268 23:53:07.245 23:53:06.225
>> [P 04/02 23:53] Intervals (hr:min:sec)  : 00:00:01.026 00:00:01.020 00:00:01.023 00:00:01.019 00:00:01.023 00:00:01.020
>> [P 04/02 23:53] -------------------------------------------------------------------------------------------------------------
>> [P 04/02 23:53] bytes read          : 0 0 0 0 0 0
>> [P 04/02 23:53] bytes written       : 0 0 0 0 0 0
>> [P 04/02 23:53] metadata reads      : 0 0 0 0 0 0
>> [P 04/02 23:53] metadata writes     : 0 0 0 0 0 0
>> [P 04/02 23:53] metadata dspace ops : 0 0 0 0 0 0
>> [P 04/02 23:53] metadata keyval ops : 1 1 1 1 1 1
>> [P 04/02 23:53] request scheduler   : 0 0 0 0 0 0
>> [D 04/02 23:53] [SM Exiting]: (0x89476c0) perf_update_sm:do_work (error code: 0), (action: DEFERRED)
>> [D 04/02 23:53] [SM Entering]: (0x8948810) job_timer_sm:do_work (status: 0)
>> [D 04/02 23:53] [SM Exiting]: (0x8948810) job_timer_sm:do_work (error code: 0), (action: DEFERRED)
>> [D 04/02 23:53] [SM Entering]: (0x89476c0) perf_update_sm:do_work (status: 0)
>> [P 04/02 23:53] Start times (hr:min:sec): 23:53:12.356 23:53:11.330 23:53:10.310 23:53:09.287 23:53:08.268 23:53:07.245
>> [P 04/02 23:53] Intervals (hr:min:sec)  : 00:00:01.020 00:00:01.026 00:00:01.020 00:00:01.023 00:00:01.019 00:00:01.023
>> [P 04/02 23:53] -------------------------------------------------------------------------------------------------------------
>> [P 04/02 23:53] bytes read          : 0 0 0 0 0 0
>> [P 04/02 23:53] bytes written       : 0 0 0 0 0 0
>> [P 04/02 23:53] metadata reads      : 0 0 0 0 0 0
>> [P 04/02 23:53] metadata writes     : 0 0 0 0 0 0
>> [P 04/02 23:53] metadata dspace ops : 0 0 0 0 0 0
>> [P 04/02 23:53] metadata keyval ops : 1 1 1 1 1 1
>> [P 04/02 23:53] request scheduler   : 0 0 0 0 0 0
>> [D 04/02 23:53] [SM Exiting]: (0x89476c0) perf_update_sm:do_work (error code: 0), (action: DEFERRED)
>> [D 04/02 23:53] [SM Entering]: (0x8948810) job_timer_sm:do_work (status: 0)
>> [D 04/02 23:53] [SM Exiting]: (0x8948810) job_timer_sm:do_work (error code: 0), (action: DEFERRED)
>>
>> The metadata node keeps asking the IO nodes for something that they cannot
>> provide correctly, so it complains, and neither the IO nodes nor the
>> metadata node work.
>>
>> I have set these services up many times. I have tested this using
>> Berkeley DB 4.2 and 4.3 on Red Hat systems (CentOS, Scientific Linux)
>> and on Ubuntu Server.
>>
>> I have also tried PVFS version 2.6.3 and I get the same problem.
>>
>> *My config files look like:*
>> [r...@wn140 ~]# more /etc/pvfs2-fs.conf
>> <Defaults>
>>     UnexpectedRequests 50
>>     EventLogging all
>>     EnableTracing no
>>     LogStamp datetime
>>     BMIModules bmi_tcp
>>     FlowModules flowproto_multiqueue
>>     PerfUpdateInterval 1000
>>     ServerJobBMITimeoutSecs 30
>>     ServerJobFlowTimeoutSecs 30
>>     ClientJobBMITimeoutSecs 300
>>     ClientJobFlowTimeoutSecs 300
>>     ClientRetryLimit 5
>>     ClientRetryDelayMilliSecs 2000
>>     PrecreateBatchSize 512
>>     PrecreateLowThreshold 256
>>
>>     StorageSpace /pvfs
>>     LogFile /tmp/pvfs2-server.log
>> </Defaults>
>>
>> <Aliases>
>>     Alias wn140 tcp://wn140:3334
>>     Alias wn141 tcp://wn141:3334
>> </Aliases>
>>
>> <Filesystem>
>>     Name pvfs2-fs
>>     ID 320870944
>>     RootHandle 1048576
>>     FileStuffing yes
>>     <MetaHandleRanges>
>>         Range wn140 3-2305843009213693953
>>         Range wn141 2305843009213693954-4611686018427387904
>>     </MetaHandleRanges>
>>     <DataHandleRanges>
>>         Range wn140 4611686018427387905-6917529027641081855
>>         Range wn141 6917529027641081856-9223372036854775806
>>     </DataHandleRanges>
>>     <StorageHints>
>>         TroveSyncMeta yes
>>         TroveSyncData no
>>         TroveMethod alt-aio
>>     </StorageHints>
>> </Filesystem>
>>
>> My setup consists of two nodes that are both IO and metadata nodes. I have
>> also tried a 4-node setup with 2 IO and 2 MD nodes, with the same result.
>>
>> Any suggestions?
>>
>> thank you in advance,
>> --
>> Asterios Katsifodimos
>> High Performance Computing systems Lab
>> Department of Computer Science, University of Cyprus
>> http://www.asteriosk.gr
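One small thing I checked while putting this together, since batch_create fails with EINVAL: that the handle ranges pvfs2-genconfig produced (quoted above) are contiguous and non-overlapping. This is just a quick bash arithmetic sanity check with the numbers copied straight from the config, nothing PVFS-specific, and it only rules out an obvious range gap/overlap, not other causes:

    # End of each range vs. start of the next; each difference should be exactly 1.
    meta_wn140_end=2305843009213693953
    meta_wn141_start=2305843009213693954
    meta_wn141_end=4611686018427387904
    data_wn140_start=4611686018427387905
    data_wn140_end=6917529027641081855
    data_wn141_start=6917529027641081856

    echo $(( meta_wn141_start - meta_wn140_end ))   # 1
    echo $(( data_wn140_start - meta_wn141_end ))   # 1
    echo $(( data_wn141_start - data_wn140_end ))   # 1

The differences all come out to 1 here, so the ranges themselves look consistent.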
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
