No, only TCP over Ethernet. We have IB NICs, but I have not compiled OrangeFS with support for them.
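
(For what it is worth, if we ever rebuild with IB support, I would expect
it to be enabled at configure time with something like the lines below;
the --with-openib flag and the OFED path are my assumption from the
configure help, I have not verified them on this version.)

    # hypothetical rebuild with the InfiniBand BMI module enabled
    ./configure --prefix=/opt/orangefs --with-openib=/usr   # flag and path assumed
    make && make install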

       Juan

Quoting "Becky Ligon" <[email protected]>:

Are you using native IB?

Becky

Sent from my iPhone

On May 15, 2015, at 5:39 PM, Juan PC <[email protected]> wrote:

Hi,

Well, your configuration can probably avoid the problem with the
benchmark, but I cannot run it because the creation of the OrangeFS file
system fails.

The batch_create error is still there: it appears as soon as I launch
the servers. The creation of the root directory fails too, as I have
mentioned. I think this is the relevant part of the log messages
regarding the problem with the root directory:

[D 05/15/2015 21:08:37] server_post_unexpected_recv
[D 05/15/2015 21:08:37] server_op_state_get_machine 999
[D 05/15/2015 21:08:37] Initialization completed successfully.
[D 05/15/2015 21:08:37] server_state_machine_alloc_noreq 27
[D 05/15/2015 21:08:37] server_op_state_get_machine 27
[D 05/15/2015 21:08:37] server_state_machine_start_noreq 0x1d6fa10
[D 05/15/2015 21:08:37] *** Trove KeyVal Read of /dda
[D 05/15/2015 21:08:37] op_queue add: 0x1d71100
[D 05/15/2015 21:08:37] [DBPF THREAD]: [KEYVAL -1]: -7
[D 05/15/2015 21:08:37] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE
(KEYVAL_READ)
[D 05/15/2015 21:08:37] warning: keyval read error on handle 1048576 and
key= /dda (BDB0073 DB_NOTFOUND: No matching key/data pair found)
[D 05/15/2015 21:08:37] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE
(KEYVAL_READ) (ret: -1073742082)
[D 05/15/2015 21:08:37] op_queue add: 0x1d71100
[D 05/15/2015 21:08:37] server_state_machine_alloc_noreq 46
[D 05/15/2015 21:08:37] server_op_state_get_machine 46
[D 05/15/2015 21:08:37] server_state_machine_start_noreq 0x1d70f80
[D 05/15/2015 21:08:37] mgmt-create-root-dir: Init dist-dir-attr for dir
meta handle 1048576 with tree_height=1, num_servers=2, bitmap_size=1,
split_size=100, server_no=0 and branch_level=1
[D 05/15/2015 21:08:37] mgmt-create-root-dir: Init dist_dir_bitmap as:
[D 05/15/2015 21:08:37]  i=0 : 00 00 00 03
[D 05/15/2015 21:08:37]
[D 05/15/2015 21:08:37] creating 1 local dirdata files
[D 05/15/2015 21:08:37] creating 1 remote dirdata files
[D 05/15/2015 21:08:37] job_precreate_pool_get_handles: requesting 1
handles of type 16
[E 05/15/2015 21:08:37] Warning: unable to create root dir due to error:
Invalid argument
[E 05/15/2015 21:08:37]          Your FS may be in an inconsistent state
[D 05/15/2015 21:08:37] server_state_machine_complete_noreq: 0x1d70f80
[D 05/15/2015 21:08:37] server_state_machine_terminate 0x1d70f80
[E 05/15/2015 21:08:43] PVFS2 server got signal 15 (server_status_flag:
4177919)
[D 05/15/2015 21:08:43] server_state_machine_terminate 0x1d2e970

Hope this helps.

Regards,

   Juan


On 15/05/15 at 22:13, Becky Ligon wrote:
Juan:

You may have hit upon another problem that we've encountered, where the
splitting of directories runs into a race condition.  Try this:

1.  In your orangefs-server.conf file (in your multi-server installation),
set DistrDirServersInitial 1 and DistrDirServersMax 1 (see the sketch
after this list).

2.  Delete your data and metadata areas and recreate.  Start your servers.

3.  Run your tests.
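
A rough sketch of what steps 1 and 2 look like (the paths and the section
that holds these options are placeholders from a generic install; I
believe they are per-<FileSystem> options, so match them to your own conf):

    # orangefs-server.conf: keep distributed directories on one server
    <FileSystem>
        ...
        DistrDirServersInitial 1
        DistrDirServersMax 1
        ...
    </FileSystem>

    # on every server: remove the data and metadata storage directories
    # named in your conf, then recreate the storage space and restart
    rm -rf /path/to/data-storage /path/to/metadata-storage   # placeholders
    pvfs2-server /etc/orangefs-server.conf -f    # recreate storage space
    pvfs2-server /etc/orangefs-server.conf       # start the server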

See if this helps!

NOTE:  We are working on a fix for this problem right now but don't have
a working solution just yet.

Becky

On Fri, May 15, 2015 at 3:38 PM, Juan PC <[email protected]> wrote:

   Hi Becky,

   Thank you for your response :-)

    The problem is that the log file grows at a rate of around 2 MiB per
    second (EventLogging is set to none!) and, more importantly, a simple
    pvfs2-ls does not work. The latter is probably due to an error message
    that I get after starting the server that stores the root file system:

   [E 05/15/2015 18:38:08] Warning: unable to create root dir due to error:
   Resource temporarily unavailable
   [E 05/15/2015 18:38:08]          Your FS may be in an inconsistent state

    although the batch_create errors appear later, when a second server
    is started.

    I have spent a lot of time trying different compilation options,
    configurations, db versions, checking that I run the right executables,
    that they use the same filesystem configuration file, etc., and the
    result is always the same. Well, to be honest, I was able to bring up
    the file system once (I do not know how), but it started failing when I
    tried to create a few thousand files per directory (benchmark
    hpcs-io_1.2.0-rc1, scenarios 9-12).

    My feeling is that, with two servers, the problematic server (the one
    that stores the root directory) does not communicate correctly with
    the second server. There is no firewall, SELinux is disabled, etc.

    Some final remarks:
    - Security is the default; I have used neither the --enable-security-key
    nor the --enable-security-cert option.
    - The same steps with OrangeFS 2.8.7 give no problem at all.

    So I guess I must be doing something terribly wrong, but I do not
    know what :-(

    If I can do something (for instance, running the servers with
    EventLogging set to verbose), just let me know.
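
    In case it helps, this is the exact change I would make; I believe
    EventLogging belongs in the <Defaults> section of orangefs-server.conf
    (the section placement is my assumption), with the servers restarted
    afterwards:

        # orangefs-server.conf
        <Defaults>
            ...
            EventLogging verbose    # currently set to none
            ...
        </Defaults>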

   Regards,

           Juan

    On 15/05/15 at 20:12, Becky Ligon wrote:
This is normal for 2.9.1, and it is okay to get the messages you are seeing.
batch_create comes into play when a server needs to gather more handles
(like inodes) from another server.  The "Resource temporarily
unavailable" is generated when the capability associated with this
request has timed out.  So, the calling server regenerates the
capability and resends the batch_create request.

The OFS development team is changing when these capabilities get
generated for batch_create requests to alleviate this problem.  For now,
you can ignore these messages.

Sorry for the inconvenience.

Becky



On Fri, May 15, 2015 at 11:48 AM, Juan PC <[email protected]> wrote:

   Dear Becky,

    I am trying to use orangefs-2.9.1, but every time I run the servers
    I get the message in the subject line on one of the servers, and its
    log file grows very quickly. The last reference that I have seen
    about this problem is
    http://www.beowulf-underground.org/pipermail/pvfs2-users/2015-April/004432.html.
    I have used the --disable-capcache option of configure, but with the
    same result. Do you know if this issue has already been fixed or if
    there is a workaround?
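
    For completeness, this is roughly how I configure the build (the
    install prefix is just a placeholder for the real one; the
    --disable-capcache flag is the option mentioned above):

        ./configure --prefix=/opt/orangefs --disable-capcache
        make && make install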

   Best regards,

           Juan





