Juan:

The conf file looks good.  Can you send me your server log files?

Becky

On Saturday, May 16, 2015, Juan PC <[email protected]> wrote:

> It is attached.
>
> I do not know if this is important, but one thing that I have seen with
> this configuration file is that if I run the second server just after
> running the first server, everything seems to work. However, if I wait
> for a few seconds, the error message of the root directory appears in
> the first server. Then, when I launch the second server, I get the
> avalanche of batch_create error messages. This avalanche seems to stop
> when it has generated around 1 GB of data. However, because of the
> problem with the root directory, the file system does not work.
>
> I have checked if waiting for a few seconds between server executions is
> an issue in OrangeFS 2.8.7 and it is not.
>
> Regards,
>
>         Juan
>
> On 16/05/15 at 17:59, Becky Ligon wrote:
> > Can you send me your orangefs-server.conf file?
> >
> > NOTE:  Do not use native IB with this version.  We have a known issue
> > with distributed directories and IB that we are currently working on.
> >
> > Becky
> >
> > On Sat, May 16, 2015 at 11:43 AM, <[email protected]> wrote:
> >
> >     No, only TCP over Ethernet. We have IB NICs, but I have not compiled
> >     OrangeFS with support for them.
> >
> >            Juan
> >
> >
> >     Quoting "Becky Ligon" <[email protected]>:
> >
> >         Are you using native IB?
> >
> >         Becky
> >
> >
> >             On May 15, 2015, at 5:39 PM, Juan PC <[email protected]> wrote:
> >
> >             Hi,
> >
> >             Well, your configuration can probably avoid the problem with
> >             the benchmark, which I cannot run because the creation of
> >             the OrangeFS file system fails.
> >
> >             The batch_create error is still there because it appears
> >             as soon as I launch the servers. The creation of the root
> >             directory fails too, as I have mentioned. I think this is
> >             the relevant part of the log messages regarding the problem
> >             with the root directory:
> >
> >             [D 05/15/2015 21:08:37] server_post_unexpected_recv
> >             [D 05/15/2015 21:08:37] server_op_state_get_machine 999
> >             [D 05/15/2015 21:08:37] Initialization completed successfully.
> >             [D 05/15/2015 21:08:37] server_state_machine_alloc_noreq 27
> >             [D 05/15/2015 21:08:37] server_op_state_get_machine 27
> >             [D 05/15/2015 21:08:37] server_state_machine_start_noreq 0x1d6fa10
> >             [D 05/15/2015 21:08:37] *** Trove KeyVal Read of /dda
> >             [D 05/15/2015 21:08:37] op_queue add: 0x1d71100
> >             [D 05/15/2015 21:08:37] [DBPF THREAD]: [KEYVAL -1]: -7
> >             [D 05/15/2015 21:08:37] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE (KEYVAL_READ)
> >             [D 05/15/2015 21:08:37] warning: keyval read error on handle 1048576 and key= /dda (BDB0073 DB_NOTFOUND: No matching key/data pair found)
> >             [D 05/15/2015 21:08:37] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE (KEYVAL_READ) (ret: -1073742082)
> >             [D 05/15/2015 21:08:37] op_queue add: 0x1d71100
> >             [D 05/15/2015 21:08:37] server_state_machine_alloc_noreq 46
> >             [D 05/15/2015 21:08:37] server_op_state_get_machine 46
> >             [D 05/15/2015 21:08:37] server_state_machine_start_noreq 0x1d70f80
> >             [D 05/15/2015 21:08:37] mgmt-create-root-dir: Init dist-dir-attr for dir meta handle 1048576 with tree_height=1, num_servers=2, bitmap_size=1, split_size=100, server_no=0 and branch_level=1
> >             [D 05/15/2015 21:08:37] mgmt-create-root-dir: Init dist_dir_bitmap as:
> >             [D 05/15/2015 21:08:37]  i=0 : 00 00 00 03
> >             [D 05/15/2015 21:08:37]
> >             [D 05/15/2015 21:08:37] creating 1 local dirdata files
> >             [D 05/15/2015 21:08:37] creating 1 remote dirdata files
> >             [D 05/15/2015 21:08:37] job_precreate_pool_get_handles: requesting 1 handles of type 16
> >             [E 05/15/2015 21:08:37] Warning: unable to create root dir due to error: Invalid argument
> >             [E 05/15/2015 21:08:37]          Your FS may be in an inconsistent state
> >             [D 05/15/2015 21:08:37] server_state_machine_complete_noreq: 0x1d70f80
> >             [D 05/15/2015 21:08:37] server_state_machine_terminate 0x1d70f80
> >             [E 05/15/2015 21:08:43] PVFS2 server got signal 15 (server_status_flag: 4177919)
> >             [D 05/15/2015 21:08:43] server_state_machine_terminate 0x1d2e970
> >
> >             Hope this helps.
> >
> >             Regards,
> >
> >                Juan
> >
> >
> >                 On 15/05/15 at 22:13, Becky Ligon wrote:
> >                 Juan:
> >
> >                 You may have hit upon another problem that we've
> >                 encountered where the
> >                 splitting of directories goes into a race condition.
> >                 Try this:
> >
> >                 1.  In your orangefs-server.conf file, set
> >                 DistrDirServersInitial 1 and
> >                 DistrDirServersMax 1 in your multi-server configuration
> >                 installation.
> >
> >                 2.  Delete your data and metadata areas and recreate.
> >                 Start your servers.
> >
> >                 3.  Run your tests.
> >
> >                 See if this helps!
> >
> >                 NOTE:  We are working on a fix for this problem right
> >                 now but don't have
> >                 a working solution just yet.
> >
> >                 Becky
> >
> >                 On Fri, May 15, 2015 at 3:38 PM, Juan PC <[email protected]> wrote:
> >
> >                    Hi Becky,
> >
> >                    Thank you for your response :-)
> >
> >                    The problem is that the log file grows at a rate of
> >                    around 2 MiB per second (EventLogging is set to none!)
> >                    and, more importantly, a simple pvfs2-ls does not work.
> >                    The latter is probably due to an error message that I
> >                    get after starting the server that stores the root
> >                    file system:
> >
> >                    [E 05/15/2015 18:38:08] Warning: unable to create root dir due to error: Resource temporarily unavailable
> >                    [E 05/15/2015 18:38:08]          Your FS may be in an inconsistent state
> >
> >                    although the batch_create errors appear afterwards,
> >                    when a second server is started.
> >
> >                    I have spent a lot of time trying different compilation
> >                    options, configurations, and db versions, checking that
> >                    I run the right executables, that they use the same
> >                    file system configuration file, etc., and the result is
> >                    always the same. Well, to be honest, I was able to
> >                    activate the file system once (I do not know how), but
> >                    it started failing when I tried to create a few
> >                    thousand files per directory (benchmark
> >                    hpcs-io_1.2.0-rc1, scenarios 9-12).
> >
> >                    My feeling is that, with two servers, the problematic
> >                    server (the one meant to store the root directory) does
> >                    not communicate correctly with the second server. There
> >                    is no firewall, SELinux is disabled, etc.
> >
> >                    Some final remarks:
> >                    - Security is always the default one; I have not used
> >                      either the --enable-security-key or the
> >                      --enable-security-cert option.
> >                    - The same steps with OrangeFS 2.8.7 give no problem at
> >                      all.
> >
> >                    So I guess that I must be doing something terribly
> >                    wrong, but I do not know what :-(
> >
> >                    If I can do something (for instance, running the
> >                    servers with EventLogging set to verbose), just let me
> >                    know.
> >
> >                    Regards,
> >
> >                            Juan
> >
> >                        On 15/05/15 at 20:12, Becky Ligon wrote:
> >                     This is normal for 2.9.1 and okay to get the
> >                     messages you are seeing.
> >                     batch_create comes into play when a server needs to
> >                     gather more handles
> >                     (like inodes) from another server.  The "Resource
> >                     temporarily
> >                     unavailable" is generated when the capability
> >                     associated with this
> >                     request has timed out.  So, the calling server
> >                     regenerates the
> >                     capability and resends the batch_create request.
> >
> >                     The OFS development team is changing when these
> >                     capabilities get
> >                     generated for batch_create requests to alleviate
> >                     this problem.  For now,
> >                     you can ignore these messages.
> >
> >                     Sorry for the inconvenience.
> >
> >                     Becky
> >
> >
> >
> >                     On Fri, May 15, 2015 at 11:48 AM, Juan PC <[email protected]> wrote:
> >
> >                        Dear Becky,
> >
> >                        I am trying to use orangefs-2.9.1, but every time I
> >                        run the servers I get the message of the subject in
> >                        one of the servers, and its log file grows very
> >                        quickly. The last reference that I have seen about
> >                        this problem is
> >                        http://www.beowulf-underground.org/pipermail/pvfs2-users/2015-April/004432.html.
> >
> >                        I have used option --disable-capcache of configure,
> >                        but same result. Do you know if this issue has been
> >                        already fixed or if there is a workaround?
> >
> >                        Best regards,
> >
> >                                Juan
> >
> >
> >
> >
> >
> >
> >
> >
>


_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
