Can you send me your orangefs-server.conf file?

NOTE: Do not use native IB with this version. We have a known issue with distributed directories and IB that we are currently working on.

Becky

On Sat, May 16, 2015 at 11:43 AM, <[email protected]> wrote:

> No, only TCP over Ethernet. We have IB NICs, but I have not compiled
> OrangeFS with support for them.
>
> Juan
>
> Quoting "Becky Ligon" <[email protected]>:
>
>> Are you using native IB?
>>
>> Becky
>>
>> Sent from my iPhone
>>
>> On May 15, 2015, at 5:39 PM, Juan PC <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> Well, your configuration can probably avoid the problem with the
>>> benchmark, which I cannot run because the creation of the OrangeFS
>>> file system fails.
>>>
>>> The batch_create error is still there because it appears as soon as I
>>> launch the servers. The creation of the root directory fails too, as I
>>> have mentioned. I think this is the relevant part of the log messages
>>> regarding the problem with the root directory:
>>>
>>> [D 05/15/2015 21:08:37] server_post_unexpected_recv
>>> [D 05/15/2015 21:08:37] server_op_state_get_machine 999
>>> [D 05/15/2015 21:08:37] Initialization completed successfully.
>>> [D 05/15/2015 21:08:37] server_state_machine_alloc_noreq 27
>>> [D 05/15/2015 21:08:37] server_op_state_get_machine 27
>>> [D 05/15/2015 21:08:37] server_state_machine_start_noreq 0x1d6fa10
>>> [D 05/15/2015 21:08:37] *** Trove KeyVal Read of /dda
>>> [D 05/15/2015 21:08:37] op_queue add: 0x1d71100
>>> [D 05/15/2015 21:08:37] [DBPF THREAD]: [KEYVAL -1]: -7
>>> [D 05/15/2015 21:08:37] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE (KEYVAL_READ)
>>> [D 05/15/2015 21:08:37] warning: keyval read error on handle 1048576 and key= /dda (BDB0073 DB_NOTFOUND: No matching key/data pair found)
>>> [D 05/15/2015 21:08:37] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE (KEYVAL_READ) (ret: -1073742082)
>>> [D 05/15/2015 21:08:37] op_queue add: 0x1d71100
>>> [D 05/15/2015 21:08:37] server_state_machine_alloc_noreq 46
>>> [D 05/15/2015 21:08:37] server_op_state_get_machine 46
>>> [D 05/15/2015 21:08:37] server_state_machine_start_noreq 0x1d70f80
>>> [D 05/15/2015 21:08:37] mgmt-create-root-dir: Init dist-dir-attr for dir meta handle 1048576 with tree_height=1, num_servers=2, bitmap_size=1, split_size=100, server_no=0 and branch_level=1
>>> [D 05/15/2015 21:08:37] mgmt-create-root-dir: Init dist_dir_bitmap as:
>>> [D 05/15/2015 21:08:37] i=0 : 00 00 00 03
>>> [D 05/15/2015 21:08:37]
>>> [D 05/15/2015 21:08:37] creating 1 local dirdata files
>>> [D 05/15/2015 21:08:37] creating 1 remote dirdata files
>>> [D 05/15/2015 21:08:37] job_precreate_pool_get_handles: requesting 1 handles of type 16
>>> [E 05/15/2015 21:08:37] Warning: unable to create root dir due to error: Invalid argument
>>> [E 05/15/2015 21:08:37] Your FS may be in an inconsistent state
>>> [D 05/15/2015 21:08:37] server_state_machine_complete_noreq: 0x1d70f80
>>> [D 05/15/2015 21:08:37] server_state_machine_terminate 0x1d70f80
>>> [E 05/15/2015 21:08:43] PVFS2 server got signal 15 (server_status_flag: 4177919)
>>> [D 05/15/2015 21:08:43] server_state_machine_terminate 0x1d2e970
>>>
>>> Hope this helps.
>>>
>>> Regards,
>>>
>>> Juan
>>>
>>> On 15/05/15 at 22:13, Becky Ligon wrote:
>>>> Juan:
>>>>
>>>> You may have hit upon another problem that we've encountered, where the
>>>> splitting of directories hits a race condition. Try this:
>>>>
>>>> 1. In the orangefs-server.conf file of your multi-server installation,
>>>>    set DistrDirServersInitial 1 and DistrDirServersMax 1.
>>>>
>>>> 2. Delete your data and metadata areas and recreate them. Start your
>>>>    servers.
>>>>
>>>> 3. Run your tests.
>>>>
>>>> See if this helps!
>>>>
>>>> NOTE: We are working on a fix for this problem right now but don't have
>>>> a working solution just yet.
>>>>
>>>> Becky
>>>>
>>>> On Fri, May 15, 2015 at 3:38 PM, Juan PC <[email protected]> wrote:
>>>>
>>>> Hi Becky,
>>>>
>>>> Thank you for your response :-)
>>>>
>>>> The problem is that the log file grows at a rate of around 2 MiB per
>>>> second (EventLogging is set to none!) and, more importantly, a simple
>>>> pvfs2-ls does not work. The latter is probably due to an error message
>>>> that I get after starting the server that stores the root file system:
>>>>
>>>> [E 05/15/2015 18:38:08] Warning: unable to create root dir due to error: Resource temporarily unavailable
>>>> [E 05/15/2015 18:38:08] Your FS may be in an inconsistent state
>>>>
>>>> although the batch_create errors appear later, when a second server
>>>> is started.
>>>>
>>>> I have spent a lot of time trying different compilation options,
>>>> configurations, db versions, checking that I run the right executables,
>>>> that they use the same filesystem configuration file, etc., and the
>>>> result is always the same. Well, to be honest, I was able to activate
>>>> the file system once (I do not know how), but it started failing when I
>>>> tried to create a few thousand files per directory (benchmark
>>>> hpcs-io_1.2.0-rc1, scenarios 9-12).
>>>>
>>>> My feeling is that, with two servers, the problematic server (the one
>>>> meant to store the root directory) does not communicate correctly with
>>>> the second server. There is no firewall, SELinux is disabled, etc.
>>>>
>>>> Some final remarks:
>>>> - Security is always the default; I have used neither the
>>>>   --enable-security-key nor the --enable-security-cert option.
>>>> - The same steps with OrangeFS 2.8.7 cause no problem at all.
>>>>
>>>> So I guess that I must be doing something terribly wrong, but I do not
>>>> know what :-(
>>>>
>>>> If I can do something (for instance, running the servers with
>>>> EventLogging set to verbose), just let me know.
>>>>
>>>> Regards,
>>>>
>>>> Juan
>>>>
>>>> On 15/05/15 at 20:12, Becky Ligon wrote:
>>>>> This is normal for 2.9.1, and it is okay to get the messages you are
>>>>> seeing. batch_create comes into play when a server needs to gather more
>>>>> handles (like inodes) from another server. The "Resource temporarily
>>>>> unavailable" is generated when the capability associated with this
>>>>> request has timed out. So, the calling server regenerates the
>>>>> capability and resends the batch_create request.
>>>>>
>>>>> The OFS development team is changing when these capabilities get
>>>>> generated for batch_create requests to alleviate this problem. For now,
>>>>> you can ignore these messages.
>>>>>
>>>>> Sorry for the inconvenience.
>>>>>
>>>>> Becky
>>>>>
>>>>> On Fri, May 15, 2015 at 11:48 AM, Juan PC <[email protected]> wrote:
>>>>>
>>>>> Dear Becky,
>>>>>
>>>>> I am trying to use orangefs-2.9.1, but every time I run the servers I
>>>>> get the message in the subject line on one of the servers, and its log
>>>>> file grows very quickly. The last reference that I have seen about this
>>>>> problem is
>>>>> http://www.beowulf-underground.org/pipermail/pvfs2-users/2015-April/004432.html
>>>>>
>>>>> I have used configure's --disable-capcache option, with the same
>>>>> result. Do you know if this issue has already been fixed or if there is
>>>>> a workaround?
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Juan
>
> ----------------------------------------------------------------
> This message was sent using IMP, the Internet Messaging Program.
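
Editorial note: the workaround quoted in the thread (setting DistrDirServersInitial 1 and DistrDirServersMax 1 in orangefs-server.conf) can be checked before the storage areas are recreated and the servers restarted. The following is only a minimal Python sketch and not part of OrangeFS. The function names are hypothetical, and it assumes that each directive sits on its own line as `Name value` and that `#` starts a comment. It makes no assumption about which section of the file holds the directives.

```python
import re

# Directive names taken from the workaround quoted in the thread.
DIRECTIVES = ("DistrDirServersInitial", "DistrDirServersMax")


def find_distr_dir_settings(conf_text):
    """Return {directive: int value} for each known directive in conf_text.

    Assumption: one directive per line, written as "Name <integer>",
    with '#' starting a comment.
    """
    found = {}
    for line in conf_text.splitlines():
        stripped = line.split("#", 1)[0].strip()  # drop trailing comments
        match = re.fullmatch(r"(\w+)\s+(\d+)", stripped)
        if match and match.group(1) in DIRECTIVES:
            found[match.group(1)] = int(match.group(2))
    return found


def workaround_applied(conf_text):
    """True only if both directives are present and both are set to 1."""
    settings = find_distr_dir_settings(conf_text)
    return all(settings.get(name) == 1 for name in DIRECTIVES)
```

To use it, read the configuration file into a string (for example with `open(path).read()`) and pass it to `workaround_applied`. A result of `False` means that at least one of the two directives is missing or not set to 1.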
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
