Hi,

Your configuration can probably avoid the problem with the benchmark,
but I cannot run it because the creation of the OrangeFS file system fails.

The batch_create error is still there because it appears as soon as I
launch the servers. The creation of the root directory fails too, as I
mentioned before. I think this is the relevant part of the log messages
regarding the problem with the root directory:

[D 05/15/2015 21:08:37] server_post_unexpected_recv
[D 05/15/2015 21:08:37] server_op_state_get_machine 999
[D 05/15/2015 21:08:37] Initialization completed successfully.
[D 05/15/2015 21:08:37] server_state_machine_alloc_noreq 27
[D 05/15/2015 21:08:37] server_op_state_get_machine 27
[D 05/15/2015 21:08:37] server_state_machine_start_noreq 0x1d6fa10
[D 05/15/2015 21:08:37] *** Trove KeyVal Read of /dda
[D 05/15/2015 21:08:37] op_queue add: 0x1d71100
[D 05/15/2015 21:08:37] [DBPF THREAD]: [KEYVAL -1]: -7
[D 05/15/2015 21:08:37] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE
(KEYVAL_READ)
[D 05/15/2015 21:08:37] warning: keyval read error on handle 1048576 and
key= /dda (BDB0073 DB_NOTFOUND: No matching key/data pair found)
[D 05/15/2015 21:08:37] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE
(KEYVAL_READ) (ret: -1073742082)
[D 05/15/2015 21:08:37] op_queue add: 0x1d71100
[D 05/15/2015 21:08:37] server_state_machine_alloc_noreq 46
[D 05/15/2015 21:08:37] server_op_state_get_machine 46
[D 05/15/2015 21:08:37] server_state_machine_start_noreq 0x1d70f80
[D 05/15/2015 21:08:37] mgmt-create-root-dir: Init dist-dir-attr for dir
meta handle 1048576 with tree_height=1, num_servers=2, bitmap_size=1,
split_size=100, server_no=0 and branch_level=1
[D 05/15/2015 21:08:37] mgmt-create-root-dir: Init dist_dir_bitmap as:
[D 05/15/2015 21:08:37]  i=0 : 00 00 00 03
[D 05/15/2015 21:08:37]
[D 05/15/2015 21:08:37] creating 1 local dirdata files
[D 05/15/2015 21:08:37] creating 1 remote dirdata files
[D 05/15/2015 21:08:37] job_precreate_pool_get_handles: requesting 1
handles of type 16
[E 05/15/2015 21:08:37] Warning: unable to create root dir due to error:
Invalid argument
[E 05/15/2015 21:08:37]          Your FS may be in an inconsistent state
[D 05/15/2015 21:08:37] server_state_machine_complete_noreq: 0x1d70f80
[D 05/15/2015 21:08:37] server_state_machine_terminate 0x1d70f80
[E 05/15/2015 21:08:43] PVFS2 server got signal 15 (server_status_flag:
4177919)
[D 05/15/2015 21:08:43] server_state_machine_terminate 0x1d2e970
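
Since the log grows quickly, a small filter can help isolate the error
entries. The following is only a minimal sketch in Python, assuming every
entry starts with a header of the form "[E 05/15/2015 21:08:37]" as in the
excerpt above, and that a line without such a header continues the previous
entry. The names iter_entries and error_entries are hypothetical helpers and
are not part of OrangeFS.

```python
import re

# Assumed header format, inferred only from the excerpt above:
# "[E 05/15/2015 21:08:37] message", where the first letter is the level.
ENTRY_RE = re.compile(
    r"^\[([A-Z]) (\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\]\s?(.*)$"
)


def iter_entries(lines):
    """Yield (level, timestamp, message), joining wrapped continuation lines."""
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        match = ENTRY_RE.match(line)
        if match:
            # A new header closes the entry collected so far.
            if current is not None:
                yield tuple(current)
            current = [match.group(1), match.group(2), match.group(3).strip()]
        elif current is not None and line.strip():
            # No header: treat the line as a continuation of the last entry
            # (the wrapping may come from the mail client, not the server).
            current[2] = (current[2] + " " + line.strip()).strip()
    if current is not None:
        yield tuple(current)


def error_entries(lines):
    """Return only the entries logged at level E."""
    return [entry for entry in iter_entries(lines) if entry[0] == "E"]
```

Passing an open log file to error_entries (a file object iterates line by
line) would return only the "[E ...]" entries, such as the "unable to create
root dir" warning shown above.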

Hope this helps.

Regards,

        Juan


On 15/05/15 at 22:13, Becky Ligon wrote:
> Juan:
> 
> You may have hit upon another problem that we've encountered where the
> splitting of directories goes into a race condition.  Try this:
> 
> 1.  In your orangefs-server.conf file, set DistrDirServersInitial 1 and
> DistrDirServersMax 1 in your multi-server configuration installation.
> 
> 2.  Delete your data and metadata areas and recreate.  Start your servers.
> 
> 3.  Run your tests.
> 
> See if this helps!
> 
> NOTE:  We are working on a fix for this problem right now but don't have
> a working solution just yet.
> 
> Becky
> 
> On Fri, May 15, 2015 at 3:38 PM, Juan PC <[email protected]> wrote:
> 
>     Hi Becky,
> 
>     Thank you for your response :-)
> 
>     The problem is that the log file grows at a rate of around 2 MiB per
>     second (EventLogging is set to none!) and, more importantly, a simple
>     pvfs2-ls does not work. The latter is probably due to an error message
>     that I get after starting the server that stores the root file system:
> 
>     [E 05/15/2015 18:38:08] Warning: unable to create root dir due to error:
>     Resource temporarily unavailable
>     [E 05/15/2015 18:38:08]          Your FS may be in an inconsistent state
> 
>     although the batch_create errors appear afterwards, when a second
>     server is started.
> 
>     I have spent a lot of time trying different compilation options,
>     configurations, db versions, checking that I run the right executables,
>     that they use the same filesystem configuration file, etc., and the
>     result is always the same. Well, to be honest, I was able to activate
>     the file system once (I do not know how), but it started failing when I
>     tried to create a few thousand files per directory (benchmark
>     hpcs-io_1.2.0-rc1, scenarios 9-12).
> 
>     My feeling is that, with two servers, the problematic server (the one
>     aimed at storing the root directory) does not communicate correctly with
>     the second server. There is no firewall, SELinux is disabled, etc.
> 
>     Some final remarks:
>     - Security is always the default one; I have not used either the
>     --enable-security-key or the --enable-security-cert option.
>     - Same steps with OrangeFS 2.8.7 and no problem at all.
> 
>     So I guess that I must be doing something terribly wrong, but I do not
>     know what :-(
> 
>     If I can do something (for instance, running the servers with
>     EventLogging set to verbose), just let me know.
> 
>     Regards,
> 
>             Juan
> 
>     On 15/05/15 at 20:12, Becky Ligon wrote:
>     > This is normal for 2.9.1 and okay to get the messages you are seeing.
>     > batch_create comes into play when a server needs to gather more handles
>     > (like inodes) from another server.  The "Resource temporarily
>     > unavailable" is generated when the capability associated with this
>     > request has timed out.  So, the calling server regenerates the
>     > capability and resends the batch_create request.
>     >
>     > The OFS development team is changing when these capabilities get
>     > generated for batch_create requests to alleviate this problem.  For now,
>     > you can ignore these messages.
>     >
>     > Sorry for the inconvenience.
>     >
>     > Becky
>     >
>     >
>     >
>     > On Fri, May 15, 2015 at 11:48 AM, Juan PC <[email protected]> wrote:
>     >
>     >     Dear Becky,
>     >
>     >     I am trying to use orangefs-2.9.1, but every time I run the
>     >     servers I get the message of the subject in one of the servers,
>     >     and its log file grows very quickly. The last reference that I
>     >     have seen about this problem is
>     >     http://www.beowulf-underground.org/pipermail/pvfs2-users/2015-April/004432.html.
>     >     I have used option --disable-capcache of configure, but same
>     >     result. Do you know if this issue has been already fixed or if
>     >     there is a workaround?
>     >
>     >     Best regards,
>     >
>     >             Juan
>     >
>     >