Juan: The conf file looks good. Can you send me your server log files?
Becky

On Saturday, May 16, 2015, Juan PC <[email protected]> wrote:

> It is attached.
>
> I do not know if this is important, but one thing that I have seen with
> this configuration file is that if I run the second server just after
> running the first server, everything seems to work. However, if I wait
> for a few seconds, the error message about the root directory appears
> in the first server. Then, when I launch the second server, I get the
> avalanche of batch_create error messages. This avalanche seems to stop
> when it has generated around 1 GB of data. However, because of the
> problem with the root directory, the file system does not work.
>
> I have checked whether waiting for a few seconds between server
> executions is an issue in OrangeFS 2.8.7, and it is not.
>
> Regards,
>
> Juan
>
> On 16/05/15 at 17:59, Becky Ligon wrote:
> > Can you send me your orangefs-server.conf file?
> >
> > NOTE: Do not use native IB with this version. We have a known issue
> > with distributed directories and IB that we are currently working on.
> >
> > Becky
> >
> > On Sat, May 16, 2015 at 11:43 AM, <[email protected]> wrote:
> >
> > No, only TCP over Ethernet. We have IB NICs, but I have not compiled
> > OrangeFS with support for them.
> >
> > Juan
> >
> > Quoting "Becky Ligon" <[email protected]>:
> >
> > Are you using native IB?
> >
> > Becky
> >
> > On May 15, 2015, at 5:39 PM, Juan PC <[email protected]> wrote:
> >
> > Hi,
> >
> > Well, your configuration can probably avoid the problem with the
> > benchmark, which I cannot run because the creation of the OrangeFS
> > file system fails.
> >
> > The batch_create error is still there because it appears as soon as
> > I launch the servers. The creation of the root directory fails too,
> > as I have mentioned. I think this is the relevant part of the log
> > messages regarding the problem with the root directory:
> >
> > [D 05/15/2015 21:08:37] server_post_unexpected_recv
> > [D 05/15/2015 21:08:37] server_op_state_get_machine 999
> > [D 05/15/2015 21:08:37] Initialization completed successfully.
> > [D 05/15/2015 21:08:37] server_state_machine_alloc_noreq 27
> > [D 05/15/2015 21:08:37] server_op_state_get_machine 27
> > [D 05/15/2015 21:08:37] server_state_machine_start_noreq 0x1d6fa10
> > [D 05/15/2015 21:08:37] *** Trove KeyVal Read of /dda
> > [D 05/15/2015 21:08:37] op_queue add: 0x1d71100
> > [D 05/15/2015 21:08:37] [DBPF THREAD]: [KEYVAL -1]: -7
> > [D 05/15/2015 21:08:37] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE (KEYVAL_READ)
> > [D 05/15/2015 21:08:37] warning: keyval read error on handle 1048576 and key= /dda (BDB0073 DB_NOTFOUND: No matching key/data pair found)
> > [D 05/15/2015 21:08:37] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE (KEYVAL_READ) (ret: -1073742082)
> > [D 05/15/2015 21:08:37] op_queue add: 0x1d71100
> > [D 05/15/2015 21:08:37] server_state_machine_alloc_noreq 46
> > [D 05/15/2015 21:08:37] server_op_state_get_machine 46
> > [D 05/15/2015 21:08:37] server_state_machine_start_noreq 0x1d70f80
> > [D 05/15/2015 21:08:37] mgmt-create-root-dir: Init dist-dir-attr for dir meta handle 1048576 with tree_height=1, num_servers=2, bitmap_size=1, split_size=100, server_no=0 and branch_level=1
> > [D 05/15/2015 21:08:37] mgmt-create-root-dir: Init dist_dir_bitmap as:
> > [D 05/15/2015 21:08:37] i=0 : 00 00 00 03
> > [D 05/15/2015 21:08:37]
> > [D 05/15/2015 21:08:37] creating 1 local dirdata files
> > [D 05/15/2015 21:08:37] creating 1 remote dirdata files
> > [D 05/15/2015 21:08:37] job_precreate_pool_get_handles: requesting 1 handles of type 16
> > [E 05/15/2015 21:08:37] Warning: unable to create root dir due to error: Invalid argument
> > [E 05/15/2015 21:08:37] Your FS may be in an inconsistent state
> > [D 05/15/2015 21:08:37] server_state_machine_complete_noreq: 0x1d70f80
> > [D 05/15/2015 21:08:37] server_state_machine_terminate 0x1d70f80
> > [E 05/15/2015 21:08:43] PVFS2 server got signal 15 (server_status_flag: 4177919)
> > [D 05/15/2015 21:08:43] server_state_machine_terminate 0x1d2e970
> >
> > Hope this helps.
> >
> > Regards,
> >
> > Juan
> >
> > On 15/05/15 at 22:13, Becky Ligon wrote:
> >
> > Juan:
> >
> > You may have hit upon another problem that we've encountered where
> > the splitting of directories goes into a race condition. Try this:
> >
> > 1. In your orangefs-server.conf file, set DistrDirServersInitial 1
> >    and DistrDirServersMax 1 in your multi-server configuration
> >    installation.
> >
> > 2. Delete your data and metadata areas and recreate. Start your
> >    servers.
> >
> > 3. Run your tests.
> >
> > See if this helps!
> >
> > NOTE: We are working on a fix for this problem right now but don't
> > have a working solution just yet.
> >
> > Becky
> >
> > On Fri, May 15, 2015 at 3:38 PM, Juan PC <[email protected]> wrote:
> >
> > Hi Becky,
> >
> > Thank you for your response :-)
> >
> > The problem is that the log file grows at a rate of around 2 MiB per
> > second (EventLogging is set to none!) and, more importantly, a
> > simple pvfs2-ls does not work. The latter is probably due to an
> > error message that I get after starting the server that stores the
> > root file system:
> >
> > [E 05/15/2015 18:38:08] Warning: unable to create root dir due to error: Resource temporarily unavailable
> > [E 05/15/2015 18:38:08] Your FS may be in an inconsistent state
> >
> > although the batch_create errors appear later, when a second server
> > is started.
> >
> > I have spent a lot of time trying different compilation options,
> > configurations, db versions, checking that I run the right
> > executables, that they use the same filesystem configuration file,
> > etc., and the result is always the same. Well, to be honest, I was
> > able to activate the file system once (I do not know how), but it
> > started failing when I tried to create a few thousand files per
> > directory (benchmark hpcs-io_1.2.0-rc1, scenarios 9-12).
> >
> > My feeling is that, with two servers, the problematic server (the
> > one meant to store the root directory) does not communicate
> > correctly with the second server. There is no firewall, SELinux is
> > disabled, etc.
> >
> > Some final remarks:
> > - Security is always the default; I have not used either the
> >   --enable-security-key or the --enable-security-cert option.
> > - The same steps with OrangeFS 2.8.7 give no problem at all.
> >
> > So I guess that I must be doing something terribly wrong, but I do
> > not know what :-(
> >
> > If I can do something (for instance, running the servers with
> > EventLogging set to verbose), just let me know.
> >
> > Regards,
> >
> > Juan
> >
> > On 15/05/15 at 20:12, Becky Ligon wrote:
> >
> > This is normal for 2.9.1 and it is okay to get the messages you are
> > seeing. batch_create comes into play when a server needs to gather
> > more handles (like inodes) from another server. The "Resource
> > temporarily unavailable" is generated when the capability associated
> > with this request has timed out. So, the calling server regenerates
> > the capability and resends the batch_create request.
> >
> > The OFS development team is changing when these capabilities get
> > generated for batch_create requests to alleviate this problem. For
> > now, you can ignore these messages.
> >
> > Sorry for the inconvenience.
> >
> > Becky
> >
> > On Fri, May 15, 2015 at 11:48 AM, Juan PC <[email protected]> wrote:
> >
> > Dear Becky,
> >
> > I am trying to use orangefs-2.9.1, but every time I run the servers
> > I get the message of the subject in one of the servers, and its log
> > file grows very quickly. The last reference that I have seen about
> > this problem is
> > http://www.beowulf-underground.org/pipermail/pvfs2-users/2015-April/004432.html.
> >
> > I have used the --disable-capcache option of configure, but with the
> > same result. Do you know if this issue has already been fixed or if
> > there is a workaround?
> >
> > Best regards,
> >
> > Juan
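
Editor's note: the server log quoted in this thread grows quickly, and the few [E ...] lines are easy to lose among the [D ...] debug lines. The following is a minimal sketch, not an OrangeFS tool, for tallying messages by severity. It assumes every log line has the "[LEVEL MM/DD/YYYY HH:MM:SS] message" layout seen in the excerpt above; the names LOG_LINE and summarize are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt quoted in the thread:
#   [E 05/15/2015 21:08:37] Your FS may be in an inconsistent state
LOG_LINE = re.compile(
    r"^\[(?P<level>[A-Z]) "
    r"(?P<date>\d{2}/\d{2}/\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2})\]\s?"
    r"(?P<msg>.*)$"
)


def summarize(lines, level="E"):
    """Count how often each distinct message appears at the given level."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match and match.group("level") == level:
            counts[match.group("msg")] += 1
    return counts


# A few lines copied from the thread, used here as sample input.
sample = [
    "[D 05/15/2015 21:08:37] creating 1 local dirdata files",
    "[E 05/15/2015 21:08:37] Warning: unable to create root dir due to error: Invalid argument",
    "[E 05/15/2015 21:08:37] Your FS may be in an inconsistent state",
    "[E 05/15/2015 21:08:43] PVFS2 server got signal 15 (server_status_flag: 4177919)",
]

errors = summarize(sample)
for message, count in errors.most_common():
    print(count, message)
```

To run it on a real log, pass an open file object instead of the sample list, for example summarize(open(path)) with path pointing at the server's log file. Because the function reads line by line, it should cope with a log of the size described above without loading it into memory.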
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
