Thanks!
I have applied the patch.

I have replaced the old logs with the new ones; they are at the same links as before:
http://grid.ucy.ac.cy/file/pvfs_logwn140.grid.ucy.ac.cy
http://grid.ucy.ac.cy/file/pvfs_logwn141.grid.ucy.ac.cy

thanks a lot for your help,
On Mon, Apr 6, 2009 at 8:41 PM, Phil Carns <[email protected]> wrote:

> Thanks for posting the logs.  It looks like the create_list function in
> within Trove actually generated the EINVAL error, but there aren't enough
> log messages in that path to know why.
>
> Any chance you could apply the patch attached to this email and retry this
> scenario (with verbose logging)?  I'm hoping for some extra output after the
> line that looks like this:
>
> (0x8d4f020) batch_create (prelude sm) state: perm_check (status = 0)
>
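> In case it is useful, the apply-and-rebuild steps would look roughly like
> this (the patch file name below is just a placeholder for the attachment,
> and the -p level may need adjusting for how the patch was generated):
>
>     cd pvfs-2.8.1
>     patch -p1 < batch-create-debug.patch   # placeholder name for the attachment
>     make && make install
>     # then restart both servers with EventLogging set to verbose
>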
> thanks,
> -Phil
>
>
> Asterios Katsifodimos wrote:
>
>> Yes, both of them, because both are now metadata servers. When I had one
>> metadata server and one I/O server, the metadata server did not produce
>> the errors until the I/O server came up. From the moment the I/O server
>> comes up, the metadata server goes crazy...
>>
>> I have uploaded the log files here:
>> http://grid.ucy.ac.cy/file/pvfs_logwn140.grid.ucy.ac.cy
>> http://grid.ucy.ac.cy/file/pvfs_logwn141.grid.ucy.ac.cy
>>
>> have a look!
>>
>> thanks!
>> On Mon, Apr 6, 2009 at 7:00 PM, Phil Carns <[email protected]> wrote:
>>
>>    Ok.  Could you try "verbose" now as the log level?  It is close to
>>    the "all" level but should only print information while the server
>>    is busy.
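>>
>>    That is a one-line change in the <Defaults> section of the config
>>    file, something like:
>>
>>        EventLogging verbose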
>>
>>    Are both wn140 and wn141 showing the same batch create errors, or
>>    just one of them?
>>
>>
>>    thanks,
>>    -Phil
>>
>>    Asterios Katsifodimos wrote:
>>
>>        Hello Phil,
>>
>>        Thanks for your answer.
>>        Yes, I delete the storage directory every time I make a new
>>        configuration, and I run the pvfs2-server -f command before
>>        starting the daemons.
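>>        For reference, on each server that is roughly (paths as in my
>>        config below):
>>
>>            rm -rf /pvfs
>>            pvfs2-server -f /etc/pvfs2-fs.conf   # recreate the storage space
>>            pvfs2-server /etc/pvfs2-fs.conf      # start the daemon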
>>
>>        The only output I get from the servers is the batch_create errors,
>>        the server-startup messages, and the "PVFS2 server got signal 15
>>        (server_status_flag: 507903)" message. Do you want me to try
>>        another log level?
>>
>>        Also, this is how the server is configured:
>>        ***** Displaying PVFS Configuration Information *****
>>        ------------------------------------------------------
>>        PVFS2 configured to build karma gui               :  no
>>        PVFS2 configured to perform coverage analysis     :  no
>>        PVFS2 configured for aio threaded callbacks       : yes
>>        PVFS2 configured to use FUSE                      :  no
>>        PVFS2 configured for the 2.6.x kernel module      :  no
>>        PVFS2 configured for the 2.4.x kernel module      :  no
>>        PVFS2 configured for using the mmap-ra-cache      :  no
>>        PVFS2 will use workaround for redhat 2.4 kernels  :  no
>>        PVFS2 will use workaround for buggy NPTL          :  no
>>        PVFS2 server will be built                        : yes
>>
>>        PVFS2 version string: 2.8.1
>>
>>
>>        thanks again,
>>        On Mon, Apr 6, 2009 at 5:21 PM, Phil Carns <[email protected]> wrote:
>>
>>           Hello,
>>
>>           I'm not sure what would cause that "Invalid argument" error.
>>
>>           Could you try the following steps:
>>
>>           - kill both servers
>>           - modify your configuration files to set "EventLogging" to "none"
>>           - delete your old log files (or move them to another directory)
>>           - start the servers
>>
>>           You can then send us the complete contents of both log files
>>           and we can go from there.  The "all" level is a little hard to
>>           interpret because it generates a lot of information even when
>>           servers are idle.
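>>
>>           Something along these lines on each server (log and config
>>           paths assumed to match your setup):
>>
>>               killall pvfs2-server
>>               # edit /etc/pvfs2-fs.conf: EventLogging none
>>               mkdir -p /tmp/old-pvfs-logs
>>               mv /tmp/pvfs2-server.log /tmp/old-pvfs-logs/
>>               pvfs2-server /etc/pvfs2-fs.conf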
>>
>>           Also, when you went from one server to two, did you delete
>>           your old storage space (/pvfs) and start over, or are you
>>           trying to keep that data and add servers to it?
>>
>>           thanks!
>>           -Phil
>>
>>           Asterios Katsifodimos wrote:
>>
>>               Hello all,
>>
>>               I have been trying to install PVFS 2.8.1 on Ubuntu Server,
>>               CentOS 4, and Scientific Linux 4. I can compile it and run
>>               it in a "single host" configuration without any problems.
>>
>>               However, when I add more nodes to the configuration (always
>>               using the pvfs2-genconfig defaults) I have the following
>>               problem:
>>
>>               *On the metadata node I get these messages:*
>>               [E 04/02 20:16] batch_create request got: Invalid argument
>>               [E 04/02 20:16] batch_create request got: Invalid argument
>>               [E 04/02 20:16] batch_create request got: Invalid argument
>>               [E 04/02 20:16] batch_create request got: Invalid argument
>>
>>
>>               *On the I/O nodes I get:*
>>               [r...@wn140 ~]# tail -50 /tmp/pvfs2-server.log
>>               [D 04/02 23:53] BMI_testcontext completing: 18446744072456767880
>>               [D 04/02 23:53] [SM Entering]: (0x88f8b00) msgpairarray_sm:complete (status: 1)
>>               [D 04/02 23:53] [SM frame get]: (0x88f8b00) op-id: 37 index: 0 base-frm: 1
>>               [D 04/02 23:53] msgpairarray_complete: sm 0x88f8b00 status_user_tag 1 msgarray_count 1
>>               [D 04/02 23:53]   msgpairarray: 1 operations remain
>>               [D 04/02 23:53] [SM Exiting]: (0x88f8b00) msgpairarray_sm:complete (error code: -1073742006), (action: DEFERRED)
>>               [D 04/02 23:53] [SM Entering]: (0x88f8b00) msgpairarray_sm:complete (status: 0)
>>               [D 04/02 23:53] [SM frame get]: (0x88f8b00) op-id: 37 index: 0 base-frm: 1
>>               [D 04/02 23:53] msgpairarray_complete: sm 0x88f8b00 status_user_tag 0 msgarray_count 1
>>               [D 04/02 23:53]   msgpairarray: all operations complete
>>               [D 04/02 23:53] [SM Exiting]: (0x88f8b00) msgpairarray_sm:complete (error code: 190), (action: COMPLETE)
>>               [D 04/02 23:53] [SM Entering]: (0x88f8b00) msgpairarray_sm:completion_fn (status: 0)
>>               [D 04/02 23:53] [SM frame get]: (0x88f8b00) op-id: 37 index: 0 base-frm: 1
>>               [D 04/02 23:53] (0x88f8b00) msgpairarray state: completion_fn
>>               [E 04/02 23:53] Warning: msgpair failed to tcp://wn141:3334, will retry: Connection refused
>>               [D 04/02 23:53] *** msgpairarray_completion_fn: msgpair 0 failed, retry 1
>>               [D 04/02 23:53] *** msgpairarray_completion_fn: msgpair retrying after delay.
>>               [D 04/02 23:53] [SM Exiting]: (0x88f8b00) msgpairarray_sm:completion_fn (error code: 191), (action: COMPLETE)
>>               [D 04/02 23:53] [SM Entering]: (0x88f8b00) msgpairarray_sm:post_retry (status: 0)
>>               [D 04/02 23:53] [SM frame get]: (0x88f8b00) op-id: 37 index: 0 base-frm: 1
>>               [D 04/02 23:53] msgpairarray_post_retry: sm 0x88f8b00, wait 2000 ms
>>               [D 04/02 23:53] [SM Exiting]: (0x88f8b00) msgpairarray_sm:post_retry (error code: 0), (action: DEFERRED)
>>               [D 04/02 23:53] [SM Entering]: (0x89476c0) perf_update_sm:do_work (status: 0)
>>               [P 04/02 23:53] Start times (hr:min:sec):  23:53:11.330  23:53:10.310  23:53:09.287  23:53:08.268  23:53:07.245  23:53:06.225
>>               [P 04/02 23:53] Intervals (hr:min:sec)  :  00:00:01.026  00:00:01.020  00:00:01.023  00:00:01.019  00:00:01.023  00:00:01.020
>>               [P 04/02 23:53] ---------------------------------------------------------------
>>               [P 04/02 23:53] bytes read              :  0  0  0  0  0  0
>>               [P 04/02 23:53] bytes written           :  0  0  0  0  0  0
>>               [P 04/02 23:53] metadata reads          :  0  0  0  0  0  0
>>               [P 04/02 23:53] metadata writes         :  0  0  0  0  0  0
>>               [P 04/02 23:53] metadata dspace ops     :  0  0  0  0  0  0
>>               [P 04/02 23:53] metadata keyval ops     :  1  1  1  1  1  1
>>               [P 04/02 23:53] request scheduler       :  0  0  0  0  0  0
>>               [D 04/02 23:53] [SM Exiting]: (0x89476c0) perf_update_sm:do_work (error code: 0), (action: DEFERRED)
>>               [D 04/02 23:53] [SM Entering]: (0x8948810) job_timer_sm:do_work (status: 0)
>>               [D 04/02 23:53] [SM Exiting]: (0x8948810) job_timer_sm:do_work (error code: 0), (action: DEFERRED)
>>               [D 04/02 23:53] [SM Entering]: (0x89476c0) perf_update_sm:do_work (status: 0)
>>               [P 04/02 23:53] Start times (hr:min:sec):  23:53:12.356  23:53:11.330  23:53:10.310  23:53:09.287  23:53:08.268  23:53:07.245
>>               [P 04/02 23:53] Intervals (hr:min:sec)  :  00:00:01.020  00:00:01.026  00:00:01.020  00:00:01.023  00:00:01.019  00:00:01.023
>>               [P 04/02 23:53] ---------------------------------------------------------------
>>               [P 04/02 23:53] bytes read              :  0  0  0  0  0  0
>>               [P 04/02 23:53] bytes written           :  0  0  0  0  0  0
>>               [P 04/02 23:53] metadata reads          :  0  0  0  0  0  0
>>               [P 04/02 23:53] metadata writes         :  0  0  0  0  0  0
>>               [P 04/02 23:53] metadata dspace ops     :  0  0  0  0  0  0
>>               [P 04/02 23:53] metadata keyval ops     :  1  1  1  1  1  1
>>               [P 04/02 23:53] request scheduler       :  0  0  0  0  0  0
>>               [D 04/02 23:53] [SM Exiting]: (0x89476c0) perf_update_sm:do_work (error code: 0), (action: DEFERRED)
>>               [D 04/02 23:53] [SM Entering]: (0x8948810) job_timer_sm:do_work (status: 0)
>>               [D 04/02 23:53] [SM Exiting]: (0x8948810) job_timer_sm:do_work (error code: 0), (action: DEFERRED)
>>
>>
>>               The metadata node keeps asking the I/O nodes for something
>>               that they cannot provide correctly, so it complains. This
>>               leaves both the I/O nodes and the metadata node unable to
>>               work.
>>
>>               I have installed these services many times. I have tested
>>               this using Berkeley DB 4.2 and 4.3 on Red Hat systems
>>               (CentOS, Scientific Linux) and on Ubuntu Server.
>>
>>               I have also tried PVFS version 2.6.3 and I get the same
>>               problem.
>>
>>               *My config files look like:*
>>               [r...@wn140 ~]# more /etc/pvfs2-fs.conf
>>               <Defaults>
>>                  UnexpectedRequests 50
>>                  EventLogging all
>>                  EnableTracing no
>>                  LogStamp datetime
>>                  BMIModules bmi_tcp
>>                  FlowModules flowproto_multiqueue
>>                  PerfUpdateInterval 1000
>>                  ServerJobBMITimeoutSecs 30
>>                  ServerJobFlowTimeoutSecs 30
>>                  ClientJobBMITimeoutSecs 300
>>                  ClientJobFlowTimeoutSecs 300
>>                  ClientRetryLimit 5
>>                  ClientRetryDelayMilliSecs 2000
>>                  PrecreateBatchSize 512
>>                  PrecreateLowThreshold 256
>>
>>                  StorageSpace /pvfs
>>                  LogFile /tmp/pvfs2-server.log
>>               </Defaults>
>>
>>               <Aliases>
>>                  Alias wn140 tcp://wn140:3334
>>                  Alias wn141 tcp://wn141:3334
>>               </Aliases>
>>
>>               <Filesystem>
>>                  Name pvfs2-fs
>>                  ID 320870944
>>                  RootHandle 1048576
>>                  FileStuffing yes
>>                  <MetaHandleRanges>
>>                      Range wn140 3-2305843009213693953
>>                      Range wn141 2305843009213693954-4611686018427387904
>>                  </MetaHandleRanges>
>>                  <DataHandleRanges>
>>                      Range wn140 4611686018427387905-6917529027641081855
>>                      Range wn141 6917529027641081856-9223372036854775806
>>                  </DataHandleRanges>
>>                  <StorageHints>
>>                      TroveSyncMeta yes
>>                      TroveSyncData no
>>                      TroveMethod alt-aio
>>                  </StorageHints>
>>               </Filesystem>
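>>
>>               This is essentially what pvfs2-genconfig writes out when I
>>               accept the defaults, roughly:
>>
>>                   pvfs2-genconfig /etc/pvfs2-fs.conf
>>                   # prompts for protocol, port, and the lists of metadata
>>                   # and I/O servers; I list wn140,wn141 for both roles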
>>
>>
>>               My setup consists of two nodes that are both I/O and
>>               metadata nodes. I have also tried a 4-node setup with 2 I/O
>>               and 2 metadata nodes, with the same result.
>>
>>               Any suggestions?
>>
>>               thank you in advance,
>>               --
>>               Asterios Katsifodimos
>>               High Performance Computing systems Lab
>>               Department of Computer Science, University of Cyprus
>>               http://www.asteriosk.gr
>>
>
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
