Hi Becky,

When I tried to set up an OrangeFS cluster with 4 and 8 nodes, the
batch_create error message appeared again. I then realized that the
clocks on some of my nodes were wrong (with a maximum difference of two
and a half hours between nodes). After synchronizing the clocks, the
batch_create problem seems to be gone. Does this make sense? I mean, can
a wrong time on some servers cause the problem? I do not remember seeing
any recommendation or warning about node times in the OrangeFS
documentation.
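
To make the question concrete, here is a minimal sketch of how clock skew
could produce this symptom. It assumes (this is a guess, not something
confirmed in this thread or in the OrangeFS documentation) that a capability
carries an absolute expiry time taken from the issuing server's clock and
that the receiving server checks it against its own clock. The names
Capability, issue, is_expired and survives_skew, and the 600-second
lifetime, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    """Hypothetical stand-in for a capability with an absolute expiry time."""
    expires_at: float  # seconds since the epoch, according to the issuer's clock


def issue(issuer_clock: float, lifetime: float) -> Capability:
    """Issue a capability valid for `lifetime` seconds on the issuer's clock."""
    return Capability(expires_at=issuer_clock + lifetime)


def is_expired(cap: Capability, receiver_clock: float) -> bool:
    """The receiver compares the expiry time against its own local clock."""
    return receiver_clock >= cap.expires_at


def survives_skew(skew: float, lifetime: float) -> bool:
    """True if a freshly issued capability is still valid on a receiver
    whose clock runs `skew` seconds ahead of the issuer's clock."""
    issuer_now = 1_431_723_517.0  # arbitrary instant, only the difference matters
    cap = issue(issuer_clock=issuer_now, lifetime=lifetime)
    return not is_expired(cap, receiver_clock=issuer_now + skew)


LIFETIME = 600.0  # hypothetical lifetime in seconds

print(survives_skew(0.0, LIFETIME))         # True: synchronized clocks
print(survives_skew(2.5 * 3600, LIFETIME))  # False: two and a half hours of skew
```

Under these assumptions a regenerated capability would be rejected just
like the first one, which would fit an endless stream of retries, but
again this is only a model of the behaviour, not the actual implementation.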

Regards,

        Juan

On 16/05/15 at 22:59, Becky Ligon wrote:
> Juan:
> 
> The conf file looks good.  Can you send me your server log files?
> 
> Becky
> 
> On Saturday, May 16, 2015, Juan PC <[email protected]> wrote:
> 
>     It is attached.
> 
>     I do not know if this is important, but one thing that I have seen with
>     this configuration file is that if I run the second server just after
>     running the first server, everything seems to work. However, if I wait
>     for a few seconds, the error message of the root directory appears in
>     the first server. Then, when I launch the second server, I get the
>     avalanche of batch_create error messages. This avalanche seems to stop
>     when it has generated around 1 GB of data. However, because of the
>     problem with the root directory, the file system does not work.
> 
>     I have checked if waiting for a few seconds between server executions is
>     an issue in OrangeFS 2.8.7 and it is not.
> 
>     Regards,
> 
>             Juan
> 
>     On 16/05/15 at 17:59, Becky Ligon wrote:
>     > Can you send me your orangefs-server.conf file?
>     >
>     > NOTE:  do not use native IB with this version.  we have a known issue
>     > with distributed directories and IB that we are currently working on.
>     >
>     > Becky
>     >
>     > On Sat, May 16, 2015 at 11:43 AM, <[email protected]> wrote:
>     >
>     >     No, only TCP over Ethernet. We have IB NICs, but I have not compiled
>     >     OrangeFS with support for them.
>     >
>     >            Juan
>     >
>     >
>     >     Quoting "Becky Ligon" <[email protected]>:
>     >
>     >         Are you using native IB?
>     >
>     >         Becky
>     >
>     >         Sent from my iPhone
>     >
>     >             On May 15, 2015, at 5:39 PM, Juan PC <[email protected]> wrote:
>     >
>     >             Hi,
>     >
>     >             Well, your configuration can probably avoid the problem with the
>     >             benchmark, which I cannot run because the creation of the
>     >             OrangeFS file system fails.
>     >
>     >             The batch_create error is still there because it appears just when I
>     >             launch the servers. The creation of the root directory fails too, as I
>     >             have mentioned. I think this is the relevant part of the log messages
>     >             regarding the problem with the root directory:
>     >
>     >             [D 05/15/2015 21:08:37] server_post_unexpected_recv
>     >             [D 05/15/2015 21:08:37] server_op_state_get_machine 999
>     >             [D 05/15/2015 21:08:37] Initialization completed successfully.
>     >             [D 05/15/2015 21:08:37] server_state_machine_alloc_noreq 27
>     >             [D 05/15/2015 21:08:37] server_op_state_get_machine 27
>     >             [D 05/15/2015 21:08:37] server_state_machine_start_noreq 0x1d6fa10
>     >             [D 05/15/2015 21:08:37] *** Trove KeyVal Read of /dda
>     >             [D 05/15/2015 21:08:37] op_queue add: 0x1d71100
>     >             [D 05/15/2015 21:08:37] [DBPF THREAD]: [KEYVAL -1]: -7
>     >             [D 05/15/2015 21:08:37] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE (KEYVAL_READ)
>     >             [D 05/15/2015 21:08:37] warning: keyval read error on handle 1048576 and key= /dda (BDB0073 DB_NOTFOUND: No matching key/data pair found)
>     >             [D 05/15/2015 21:08:37] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE (KEYVAL_READ) (ret: -1073742082)
>     >             [D 05/15/2015 21:08:37] op_queue add: 0x1d71100
>     >             [D 05/15/2015 21:08:37] server_state_machine_alloc_noreq 46
>     >             [D 05/15/2015 21:08:37] server_op_state_get_machine 46
>     >             [D 05/15/2015 21:08:37] server_state_machine_start_noreq 0x1d70f80
>     >             [D 05/15/2015 21:08:37] mgmt-create-root-dir: Init dist-dir-attr for dir meta handle 1048576 with tree_height=1, num_servers=2, bitmap_size=1, split_size=100, server_no=0 and branch_level=1
>     >             [D 05/15/2015 21:08:37] mgmt-create-root-dir: Init dist_dir_bitmap as:
>     >             [D 05/15/2015 21:08:37]  i=0 : 00 00 00 03
>     >             [D 05/15/2015 21:08:37]
>     >             [D 05/15/2015 21:08:37] creating 1 local dirdata files
>     >             [D 05/15/2015 21:08:37] creating 1 remote dirdata files
>     >             [D 05/15/2015 21:08:37] job_precreate_pool_get_handles: requesting 1 handles of type 16
>     >             [E 05/15/2015 21:08:37] Warning: unable to create root dir due to error: Invalid argument
>     >             [E 05/15/2015 21:08:37]          Your FS may be in an inconsistent state
>     >             [D 05/15/2015 21:08:37] server_state_machine_complete_noreq: 0x1d70f80
>     >             [D 05/15/2015 21:08:37] server_state_machine_terminate 0x1d70f80
>     >             [E 05/15/2015 21:08:43] PVFS2 server got signal 15 (server_status_flag: 4177919)
>     >             [D 05/15/2015 21:08:43] server_state_machine_terminate 0x1d2e970
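
For a log that grows as quickly as this one, a quick count of entries per
severity letter can help separate debug noise from real errors. The sketch
below assumes that every entry starts with a bracketed header of the form
[LEVEL MM/DD/YYYY HH:MM:SS], as in the excerpt above; the function name
count_by_level is hypothetical, and any line that does not start with such
a header (for example a wrapped continuation) is simply skipped.

```python
import re
from collections import Counter

# Header as seen in the excerpt: "[D 05/15/2015 21:08:37] message"
ENTRY = re.compile(r"^\[([A-Z]) (\d{2}/\d{2}/\d{4}) (\d{2}:\d{2}:\d{2})\]")


def count_by_level(lines):
    """Count log entries per severity letter (e.g. 'D', 'E')."""
    counts = Counter()
    for line in lines:
        # Drop e-mail quote markers and indentation before matching.
        text = line.lstrip("> \t")
        match = ENTRY.match(text)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "[D 05/15/2015 21:08:37] creating 1 local dirdata files",
    "[E 05/15/2015 21:08:37] Warning: unable to create root dir due to error: Invalid argument",
    ">     >   [E 05/15/2015 21:08:43] PVFS2 server got signal 15",
    "(server_status_flag: 4177919)",  # continuation line, not counted
]

for level, count in sorted(count_by_level(sample).items()):
    print(level, count)
```

To run it against a real file, one could pass an open file object, for
example count_by_level(open(path)), where path is wherever the server
writes its log.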
>     >
>     >             Hope this helps.
>     >
>     >             Regards,
>     >
>     >                Juan
>     >
>     >
>     >                 On 15/05/15 at 22:13, Becky Ligon wrote:
>     >                 Juan:
>     >
>     >                 You may have hit upon another problem that we've encountered where the
>     >                 splitting of directories goes into a race condition. Try this:
>     >
>     >                 1.  In your orangefs-server.conf file, set DistrDirServersInitial 1 and
>     >                 DistrDirServersMax 1 in your multi-server configuration installation.
>     >
>     >                 2.  Delete your data and metadata areas and recreate. Start your servers.
>     >
>     >                 3.  Run your tests.
>     >
>     >                 See if this helps!
>     >
>     >                 NOTE:  We are working on a fix for this problem right now but don't have
>     >                 a working solution just yet.
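
Step 1 of the workaround above could be scripted along these lines. This is
only a sketch, not an OrangeFS tool: it assumes both directives are already
present in orangefs-server.conf as plain "Name value" lines, and the helper
name set_directive and the sample excerpt are hypothetical.

```python
import re


def set_directive(conf_text: str, name: str, value: str) -> str:
    """Set `name` to `value` on every line of the form 'name <old value>',
    keeping the original indentation. Raises if the directive is absent."""
    pattern = re.compile(
        rf"^([ \t]*){re.escape(name)}[ \t]+.*$", re.MULTILINE
    )
    new_text, count = pattern.subn(
        lambda m: f"{m.group(1)}{name} {value}", conf_text
    )
    if count == 0:
        raise ValueError(f"{name} not found in configuration")
    return new_text


# Hypothetical excerpt; a real configuration file contains many more lines.
sample = (
    "    DistrDirServersInitial 2\n"
    "    DistrDirServersMax 2\n"
)

updated = set_directive(sample, "DistrDirServersInitial", "1")
updated = set_directive(updated, "DistrDirServersMax", "1")
print(updated, end="")
```

The same edited file has to be used by every server, and step 2 (deleting
and recreating the data and metadata areas) still has to be done by hand.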
>     >
>     >                 Becky
>     >
>     >                 On Fri, May 15, 2015 at 3:38 PM, Juan PC <[email protected]> wrote:
>     >
>     >                    Hi Becky,
>     >
>     >                    Thank you for your response :-)
>     >
>     >                    The problem is that the log file grows at a rate of around 2 MiB per
>     >                    second (EventLogging is set to none!) and, more importantly, a simple
>     >                    pvfs2-ls does not work. The latter is probably due to an error message
>     >                    that I get after starting the server that stores the root file system:
>     >
>     >                    [E 05/15/2015 18:38:08] Warning: unable to create root dir due to error: Resource temporarily unavailable
>     >                    [E 05/15/2015 18:38:08]          Your FS may be in an inconsistent state
>     >
>     >                    although the batch_create errors appear later, when a second server
>     >                    is run.
>     >
>     >                    I have spent a lot of time trying different compilation options,
>     >                    configurations, and db versions, checking that I run the right executables,
>     >                    that they use the same file system configuration file, etc., and the
>     >                    result is always the same. Well, to be honest, I was able to activate
>     >                    the file system once (I do not know how), but it started failing when I
>     >                    tried to create a few thousand files per directory (benchmark
>     >                    hpcs-io_1.2.0-rc1, scenarios 9-12).
>     >
>     >                    My feeling is that, with two servers, the problematic server (the one
>     >                    meant to store the root directory) does not communicate correctly with
>     >                    the second server. There is no firewall, SELinux is disabled, etc.
>     >
>     >                    Some final remarks:
>     >                    - Security is always the default one; I have not used either the
>     >                    --enable-security-key or the --enable-security-cert option.
>     >                    - The same steps with OrangeFS 2.8.7 cause no problem at all.
>     >
>     >                    So I guess that I must be doing something terribly wrong, but I do not
>     >                    know what :-(
>     >
>     >                    If I can do something (for instance, running the servers with
>     >                    EventLogging set to verbose), just let me know.
>     >
>     >                    Regards,
>     >
>     >                            Juan
>     >
>     >                        On 15/05/15 at 20:12, Becky Ligon wrote:
>     >                     This is normal for 2.9.1 and okay to get the messages you are seeing.
>     >                     batch_create comes into play when a server needs to gather more handles
>     >                     (like inodes) from another server.  The "Resource temporarily
>     >                     unavailable" is generated when the capability associated with this
>     >                     request has timed out.  So, the calling server regenerates the
>     >                     capability and resends the batch_create request.
>     >
>     >                     The OFS development team is changing when these capabilities get
>     >                     generated for batch_create requests to alleviate this problem.  For now,
>     >                     you can ignore these messages.
>     >
>     >                     Sorry for the inconvenience.
>     >
>     >                     Becky
>     >
>     >
>     >
>     >                     On Fri, May 15, 2015 at 11:48 AM, Juan PC <[email protected]> wrote:
>     >
>     >                        Dear Becky,
>     >
>     >                        I am trying to use orangefs-2.9.1, but every time I run the
>     >                        servers I get the message of the subject in one of the servers,
>     >                        and its log file grows very quickly. The last reference that I
>     >                        have seen about this problem is
>     >                        http://www.beowulf-underground.org/pipermail/pvfs2-users/2015-April/004432.html.
>     >
>     >                        I have used the --disable-capcache option of configure, but with
>     >                        the same result. Do you know if this issue has already been fixed
>     >                        or if there is a workaround?
>     >
>     >                        Best regards,
>     >
>     >                                Juan
>     >
>     >
>     >
>     >
>     >
>     >
>     >
>     >
> 
> 
> 
> -- 
> Sent from Gmail Mobile


-- 
D. Juan Piernas Cánovas
Departamento de Ingeniería y Tecnología de Computadores
Facultad de Informática. Universidad de Murcia
Campus de Espinardo - 30080 Murcia (SPAIN)
Tel.: +34868887657    Fax: +34868884151
email: [email protected]
PGP public key:
http://pgp.rediris.es:11371/pks/lookup?search=piernas%40ditec.um.es&op=index

*** Please send me your documents in text, HTML, PDF or
PostScript format :-) ***
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
