That didn't show what I expected at all. It must have hit a safety check on the request parameters. Could you try applying the attached patch as well?
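
In case it helps, applying it should just be something along these lines (the patch file name below is only a placeholder for the attachment, and I'm assuming it is a cvs diff like the last one, so it applies with -p0 from the top of the source tree):

    cd pvfs-2.8.1                            # top of your PVFS2 source tree
    patch -p0 < batch-create-debug2.patch    # placeholder name for the attachment
    make && make install                     # rebuild/reinstall however you normally do

Then restart the servers and re-run the same scenario with verbose logging.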

What kind of systems are these? Are the two servers different architectures by any chance?

thanks,
-Phil

Asterios Katsifodimos wrote:
Thanks!
I have applied the patch.

I have replaced the old logs with the new ones. Just use the previous links.
http://grid.ucy.ac.cy/file/pvfs_logwn140.grid.ucy.ac.cy
http://grid.ucy.ac.cy/file/pvfs_logwn141.grid.ucy.ac.cy

thanks a lot for your help,
On Mon, Apr 6, 2009 at 8:41 PM, Phil Carns <[email protected]> wrote:

    Thanks for posting the logs.  It looks like the create_list function
    within Trove actually generated the EINVAL error, but there aren't
    enough log messages in that path to know why.

    Any chance you could apply the patch attached to this email and
    retry this scenario (with verbose logging)?  I'm hoping for some
    extra output after the line that looks like this:

    (0x8d4f020) batch_create (prelude sm) state: perm_check (status = 0)


    thanks,
    -Phil


    Asterios Katsifodimos wrote:

        Yes, both of them, because both are now metadata servers. When I
        had one metadata server and one I/O server, the metadata server did
        not produce the errors until the I/O server came up. From the moment
        the I/O server comes up, the metadata server goes crazy...

        I have uploaded the log files here:
        http://grid.ucy.ac.cy/file/pvfs_logwn140.grid.ucy.ac.cy
        http://grid.ucy.ac.cy/file/pvfs_logwn141.grid.ucy.ac.cy

        have a look!

        thanks!
        On Mon, Apr 6, 2009 at 7:00 PM, Phil Carns <[email protected]> wrote:

           Ok.  Could you try "verbose" now as the log level?  It is close
           to the "all" level but should only print information while the
           server is busy.
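
           For reference, that is just the EventLogging line in the
           <Defaults> section of the config file quoted further down; one
           quick way to flip it (assuming GNU sed) before restarting both
           servers:

               sed -i 's/EventLogging .*/EventLogging verbose/' /etc/pvfs2-fs.conf
               # then restart pvfs2-server on wn140 and wn141 as usual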

           Are both wn140 and wn141 showing the same batch create errors, or
           just one of them?


           thanks,
           -Phil

           Asterios Katsifodimos wrote:

               Hello Phil,

               Thanks for your answer.
               Yes, I delete the storage dir every time I make a new
               configuration, and I run the pvfs2-server -f command before
               starting the daemons.
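
               Concretely, on each node I do roughly the following (the start
               command here is an approximation of the usual invocation
               rather than my exact command line):

                   rm -rf /pvfs                        # wipe the old storage space
                   pvfs2-server -f /etc/pvfs2-fs.conf  # recreate the storage space
                   pvfs2-server /etc/pvfs2-fs.conf     # start the daemon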

               The only things I get from the servers are the batch_create
               errors, the server startup messages, and the "PVFS2 server got
               signal 15 (server_status_flag: 507903)" message. Do you want
               me to try another log level?

               Also, this is how the server is configured:
               ***** Displaying PVFS Configuration Information *****
               ------------------------------------------------------
               PVFS2 configured to build karma gui               :  no
               PVFS2 configured to perform coverage analysis     :  no
               PVFS2 configured for aio threaded callbacks       : yes
               PVFS2 configured to use FUSE                      :  no
               PVFS2 configured for the 2.6.x kernel module      :  no
               PVFS2 configured for the 2.4.x kernel module      :  no
               PVFS2 configured for using the mmap-ra-cache      :  no
               PVFS2 will use workaround for redhat 2.4 kernels  :  no
               PVFS2 will use workaround for buggy NPTL          :  no
               PVFS2 server will be built                        : yes

               PVFS2 version string: 2.8.1


               thanks again,
               On Mon, Apr 6, 2009 at 5:21 PM, Phil Carns <[email protected]> wrote:

                  Hello,

                  I'm not sure what would cause that "Invalid argument"
        error.

                   Could you try the following steps (rough commands below):

                   - kill both servers
                   - modify your configuration files to set "EventLogging"
                     to "none"
                   - delete your old log files (or move them to another
                     directory)
                   - start the servers
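
                   In shell terms, roughly this on each server (the log and
                   config paths are taken from the config you posted; the
                   last line is from memory, so use whatever command you
                   normally launch the servers with):

                       killall pvfs2-server
                       sed -i 's/EventLogging .*/EventLogging none/' /etc/pvfs2-fs.conf
                       mv /tmp/pvfs2-server.log /tmp/pvfs2-server.log.old
                       pvfs2-server /etc/pvfs2-fs.conf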

                   You can then send us the complete contents of both log
                   files and we can go from there.  The "all" level is a
                   little hard to interpret because it generates a lot of
                   information even when servers are idle.

                   Also, when you went from one server to two, did you
                   delete your old storage space (/pvfs) and start over, or
                   are you trying to keep that data and add servers to it?

                  thanks!
                  -Phil

                  Asterios Katsifodimos wrote:

                      Hello all,

                       I have been trying to install PVFS 2.8.1 on Ubuntu
                       Server, CentOS 4, and Scientific Linux 4. I can compile
                       it and run it in a "single host" configuration without
                       any problems.

                       However, when I add more nodes to the configuration
                       (always using the pvfs2-genconfig defaults) I have the
                       following problem:

                      *On the metadata node I get these messages:*
                       [E 04/02 20:16] batch_create request got: Invalid argument
                       [E 04/02 20:16] batch_create request got: Invalid argument
                       [E 04/02 20:16] batch_create request got: Invalid argument
                       [E 04/02 20:16] batch_create request got: Invalid argument


                      *In the IO nodes I get:*
                      [r...@wn140 ~]# tail -50 /tmp/pvfs2-server.log
                       [D 04/02 23:53] BMI_testcontext completing: 18446744072456767880
                       [D 04/02 23:53] [SM Entering]: (0x88f8b00) msgpairarray_sm:complete (status: 1)
                       [D 04/02 23:53] [SM frame get]: (0x88f8b00) op-id: 37 index: 0 base-frm: 1
                       [D 04/02 23:53] msgpairarray_complete: sm 0x88f8b00 status_user_tag 1 msgarray_count 1
                       [D 04/02 23:53]   msgpairarray: 1 operations remain
                       [D 04/02 23:53] [SM Exiting]: (0x88f8b00) msgpairarray_sm:complete (error code: -1073742006), (action: DEFERRED)
                       [D 04/02 23:53] [SM Entering]: (0x88f8b00) msgpairarray_sm:complete (status: 0)
                       [D 04/02 23:53] [SM frame get]: (0x88f8b00) op-id: 37 index: 0 base-frm: 1
                       [D 04/02 23:53] msgpairarray_complete: sm 0x88f8b00 status_user_tag 0 msgarray_count 1
                       [D 04/02 23:53]   msgpairarray: all operations complete
                       [D 04/02 23:53] [SM Exiting]: (0x88f8b00) msgpairarray_sm:complete (error code: 190), (action: COMPLETE)
                       [D 04/02 23:53] [SM Entering]: (0x88f8b00) msgpairarray_sm:completion_fn (status: 0)
                       [D 04/02 23:53] [SM frame get]: (0x88f8b00) op-id: 37 index: 0 base-frm: 1
                       [D 04/02 23:53] (0x88f8b00) msgpairarray state: completion_fn
                       [E 04/02 23:53] Warning: msgpair failed to tcp://wn141:3334, will retry: Connection refused
                       [D 04/02 23:53] *** msgpairarray_completion_fn: msgpair 0 failed, retry 1
                       [D 04/02 23:53] *** msgpairarray_completion_fn: msgpair retrying after delay.
                       [D 04/02 23:53] [SM Exiting]: (0x88f8b00) msgpairarray_sm:completion_fn (error code: 191), (action: COMPLETE)
                       [D 04/02 23:53] [SM Entering]: (0x88f8b00) msgpairarray_sm:post_retry (status: 0)
                       [D 04/02 23:53] [SM frame get]: (0x88f8b00) op-id: 37 index: 0 base-frm: 1
                       [D 04/02 23:53] msgpairarray_post_retry: sm 0x88f8b00, wait 2000 ms
                       [D 04/02 23:53] [SM Exiting]: (0x88f8b00) msgpairarray_sm:post_retry (error code: 0), (action: DEFERRED)
                       [D 04/02 23:53] [SM Entering]: (0x89476c0) perf_update_sm:do_work (status: 0)
                       [P 04/02 23:53] Start times (hr:min:sec):  23:53:11.330  23:53:10.310  23:53:09.287  23:53:08.268  23:53:07.245  23:53:06.225
                       [P 04/02 23:53] Intervals (hr:min:sec)  :  00:00:01.026  00:00:01.020  00:00:01.023  00:00:01.019  00:00:01.023  00:00:01.020
                       [P 04/02 23:53] -------------------------------------------------------------------------------------------------------------
                       [P 04/02 23:53] bytes read          : 0 0 0 0 0 0
                       [P 04/02 23:53] bytes written       : 0 0 0 0 0 0
                       [P 04/02 23:53] metadata reads      : 0 0 0 0 0 0
                       [P 04/02 23:53] metadata writes     : 0 0 0 0 0 0
                       [P 04/02 23:53] metadata dspace ops : 0 0 0 0 0 0
                       [P 04/02 23:53] metadata keyval ops : 1 1 1 1 1 1
                       [P 04/02 23:53] request scheduler   : 0 0 0 0 0 0
                       [D 04/02 23:53] [SM Exiting]: (0x89476c0) perf_update_sm:do_work (error code: 0), (action: DEFERRED)
                       [D 04/02 23:53] [SM Entering]: (0x8948810) job_timer_sm:do_work (status: 0)
                       [D 04/02 23:53] [SM Exiting]: (0x8948810) job_timer_sm:do_work (error code: 0), (action: DEFERRED)
                       [D 04/02 23:53] [SM Entering]: (0x89476c0) perf_update_sm:do_work (status: 0)
                       [P 04/02 23:53] Start times (hr:min:sec):  23:53:12.356  23:53:11.330  23:53:10.310  23:53:09.287  23:53:08.268  23:53:07.245
                       [P 04/02 23:53] Intervals (hr:min:sec)  :  00:00:01.020  00:00:01.026  00:00:01.020  00:00:01.023  00:00:01.019  00:00:01.023
                       [P 04/02 23:53] -------------------------------------------------------------------------------------------------------------
                       [P 04/02 23:53] bytes read          : 0 0 0 0 0 0
                       [P 04/02 23:53] bytes written       : 0 0 0 0 0 0
                       [P 04/02 23:53] metadata reads      : 0 0 0 0 0 0
                       [P 04/02 23:53] metadata writes     : 0 0 0 0 0 0
                       [P 04/02 23:53] metadata dspace ops : 0 0 0 0 0 0
                       [P 04/02 23:53] metadata keyval ops : 1 1 1 1 1 1
                       [P 04/02 23:53] request scheduler   : 0 0 0 0 0 0
                       [D 04/02 23:53] [SM Exiting]: (0x89476c0) perf_update_sm:do_work (error code: 0), (action: DEFERRED)
                       [D 04/02 23:53] [SM Entering]: (0x8948810) job_timer_sm:do_work (status: 0)
                       [D 04/02 23:53] [SM Exiting]: (0x8948810) job_timer_sm:do_work (error code: 0), (action: DEFERRED)


                       The metadata node keeps asking the I/O nodes for
                       something that they cannot provide correctly, so it
                       complains, and this keeps both the I/O nodes and the
                       metadata node from working.

                       I have installed these services many times. I have
                       tested this using Berkeley DB 4.2 and 4.3 on Red Hat
                       systems (CentOS, Scientific Linux) and on Ubuntu Server.

                       I have also tried PVFS version 2.6.3 and I get the
                       same problem.

                      *My config files look like:*
                      [r...@wn140 ~]# more /etc/pvfs2-fs.conf
                      <Defaults>
                         UnexpectedRequests 50
                         EventLogging all
                         EnableTracing no
                         LogStamp datetime
                         BMIModules bmi_tcp
                         FlowModules flowproto_multiqueue
                         PerfUpdateInterval 1000
                         ServerJobBMITimeoutSecs 30
                         ServerJobFlowTimeoutSecs 30
                         ClientJobBMITimeoutSecs 300
                         ClientJobFlowTimeoutSecs 300
                         ClientRetryLimit 5
                         ClientRetryDelayMilliSecs 2000
                         PrecreateBatchSize 512
                         PrecreateLowThreshold 256

                         StorageSpace /pvfs
                         LogFile /tmp/pvfs2-server.log
                      </Defaults>

                      <Aliases>
                         Alias wn140 tcp://wn140:3334
                         Alias wn141 tcp://wn141:3334
                      </Aliases>

                      <Filesystem>
                         Name pvfs2-fs
                         ID 320870944
                         RootHandle 1048576
                         FileStuffing yes
                         <MetaHandleRanges>
                             Range wn140 3-2305843009213693953
                             Range wn141
        2305843009213693954-4611686018427387904
                         </MetaHandleRanges>
                         <DataHandleRanges>
                             Range wn140
        4611686018427387905-6917529027641081855
                             Range wn141
        6917529027641081856-9223372036854775806
                         </DataHandleRanges>
                         <StorageHints>
                             TroveSyncMeta yes
                             TroveSyncData no
                             TroveMethod alt-aio
                         </StorageHints>
                      </Filesystem>


                       My setup consists of two nodes that are both I/O and
                       metadata nodes. I have also tried a 4-node setup with
                       2 I/O and 2 MD nodes, with the same result.

                      Any suggestions?

                      thank you in advance,
                      --
                      Asterios Katsifodimos
                      High Performance Computing systems Lab
                      Department of Computer Science, University of Cyprus
                       http://www.asteriosk.gr


Index: src/server/batch-create.sm
===================================================================
RCS file: /projects/cvsroot/pvfs2-1/src/server/batch-create.sm,v
retrieving revision 1.3
diff -a -u -p -r1.3 batch-create.sm
--- src/server/batch-create.sm	20 Nov 2008 01:17:10 -0000	1.3
+++ src/server/batch-create.sm	6 Apr 2009 18:57:26 -0000
@@ -67,6 +67,8 @@ static int batch_create_create(
     int ret = -1;
     job_id_t i;
 
+    gossip_debug(GOSSIP_SERVER_DEBUG, "batch_create.object_count: %d\n", s_op->req->u.batch_create.object_count);
+
     if(s_op->req->u.batch_create.object_count < 1)
     {
         js_p->error_code = -PVFS_EINVAL;
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
