Can you send me your orangefs-server.conf file?

NOTE:  Do not use native IB with this version.  We have a known issue with
distributed directories and IB that we are currently working on.

Becky

On Sat, May 16, 2015 at 11:43 AM, <[email protected]> wrote:

> No, only TCP over Ethernet. We have IB NICs, but I have not compiled
> OrangeFS with support for them.
>
>        Juan
>
>
> Quoting "Becky Ligon" <[email protected]>:
>
>  Are you using native IB?
>>
>> Becky
>>
>>  On May 15, 2015, at 5:39 PM, Juan PC <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> Well, your configuration can probably avoid the problem with the
>>> benchmark, but I cannot run it because the creation of the OrangeFS
>>> file system fails.
>>>
>>> The batch_create error is still there; it appears as soon as I launch
>>> the servers. The creation of the root directory fails too, as I
>>> mentioned. I think this is the relevant part of the log messages
>>> regarding the problem with the root directory:
>>>
>>> [D 05/15/2015 21:08:37] server_post_unexpected_recv
>>> [D 05/15/2015 21:08:37] server_op_state_get_machine 999
>>> [D 05/15/2015 21:08:37] Initialization completed successfully.
>>> [D 05/15/2015 21:08:37] server_state_machine_alloc_noreq 27
>>> [D 05/15/2015 21:08:37] server_op_state_get_machine 27
>>> [D 05/15/2015 21:08:37] server_state_machine_start_noreq 0x1d6fa10
>>> [D 05/15/2015 21:08:37] *** Trove KeyVal Read of /dda
>>> [D 05/15/2015 21:08:37] op_queue add: 0x1d71100
>>> [D 05/15/2015 21:08:37] [DBPF THREAD]: [KEYVAL -1]: -7
>>> [D 05/15/2015 21:08:37] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE
>>> (KEYVAL_READ)
>>> [D 05/15/2015 21:08:37] warning: keyval read error on handle 1048576 and
>>> key= /dda (BDB0073 DB_NOTFOUND: No matching key/data pair found)
>>> [D 05/15/2015 21:08:37] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE
>>> (KEYVAL_READ) (ret: -1073742082)
>>> [D 05/15/2015 21:08:37] op_queue add: 0x1d71100
>>> [D 05/15/2015 21:08:37] server_state_machine_alloc_noreq 46
>>> [D 05/15/2015 21:08:37] server_op_state_get_machine 46
>>> [D 05/15/2015 21:08:37] server_state_machine_start_noreq 0x1d70f80
>>> [D 05/15/2015 21:08:37] mgmt-create-root-dir: Init dist-dir-attr for dir
>>> meta handle 1048576 with tree_height=1, num_servers=2, bitmap_size=1,
>>> split_size=100, server_no=0 and branch_level=1
>>> [D 05/15/2015 21:08:37] mgmt-create-root-dir: Init dist_dir_bitmap as:
>>> [D 05/15/2015 21:08:37]  i=0 : 00 00 00 03
>>> [D 05/15/2015 21:08:37]
>>> [D 05/15/2015 21:08:37] creating 1 local dirdata files
>>> [D 05/15/2015 21:08:37] creating 1 remote dirdata files
>>> [D 05/15/2015 21:08:37] job_precreate_pool_get_handles: requesting 1
>>> handles of type 16
>>> [E 05/15/2015 21:08:37] Warning: unable to create root dir due to error:
>>> Invalid argument
>>> [E 05/15/2015 21:08:37]          Your FS may be in an inconsistent state
>>> [D 05/15/2015 21:08:37] server_state_machine_complete_noreq: 0x1d70f80
>>> [D 05/15/2015 21:08:37] server_state_machine_terminate 0x1d70f80
>>> [E 05/15/2015 21:08:43] PVFS2 server got signal 15 (server_status_flag:
>>> 4177919)
>>> [D 05/15/2015 21:08:43] server_state_machine_terminate 0x1d2e970
>>>
>>> Hope this helps.
>>>
>>> Regards,
>>>
>>>    Juan
>>>
>>>
>>>  On 15/05/15 at 22:13, Becky Ligon wrote:
>>>> Juan:
>>>>
>>>> You may have hit another problem that we've encountered, where the
>>>> splitting of directories runs into a race condition.  Try this:
>>>>
>>>> 1.  In your orangefs-server.conf file, set DistrDirServersInitial 1 and
>>>> DistrDirServersMax 1 in your multi-server configuration installation.
>>>>
>>>> 2.  Delete your data and metadata areas and recreate them.  Start your
>>>> servers.
>>>>
>>>> 3.  Run your tests.
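>>>>
>>>> As a rough sketch of step 1 (an assumption, not copied from a working
>>>> setup): the two options are assumed here to sit in the <FileSystem>
>>>> section of orangefs-server.conf, as in files generated by
>>>> pvfs2-genconfig. All other settings are elided and stay unchanged.
>>>> With both values at 1, directories should presumably stay on a single
>>>> server and never be split:
>>>>
>>>>     <FileSystem>
>>>>         ...
>>>>         DistrDirServersInitial 1
>>>>         DistrDirServersMax 1
>>>>         ...
>>>>     </FileSystem>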
>>>>
>>>> See if this helps!
>>>>
>>>> NOTE:  We are working on a fix for this problem right now but don't have
>>>> a working solution just yet.
>>>>
>>>> Becky
>>>>
>>>> On Fri, May 15, 2015 at 3:38 PM, Juan PC <[email protected]> wrote:
>>>>
>>>>    Hi Becky,
>>>>
>>>>    Thank you for your response :-)
>>>>
>>>>    The problem is that the log file grows at a rate of around 2 MiB per
>>>>    second (EventLogging is set to none!) and, more importantly, a simple
>>>>    pvfs2-ls does not work. The latter is probably due to an error message
>>>>    that I get after starting the server that stores the root file system:
>>>>
>>>>    [E 05/15/2015 18:38:08] Warning: unable to create root dir due to error:
>>>>    Resource temporarily unavailable
>>>>    [E 05/15/2015 18:38:08]          Your FS may be in an inconsistent state
>>>>
>>>>    although the batch_create errors appear later, when a second server
>>>>    is started.
>>>>
>>>>    I have spent a lot of time trying different compilation options,
>>>>    configurations, and db versions, checking that I run the right
>>>>    executables, that they use the same file system configuration file,
>>>>    etc., and the result is always the same. Well, to be honest, I was
>>>>    able to activate the file system once (I do not know how), but it
>>>>    started failing when I tried to create a few thousand files per
>>>>    directory (benchmark hpcs-io_1.2.0-rc1, scenarios 9-12).
>>>>
>>>>    My feeling is that, with two servers, the problematic server (the one
>>>>    meant to store the root directory) does not communicate correctly
>>>>    with the second server. There is no firewall, SELinux is disabled, etc.
>>>>
>>>>    Some final remarks:
>>>>    - Security is always the default one; I have not used either the
>>>>    --enable-security-key or the --enable-security-cert option.
>>>>    - The same steps with OrangeFS 2.8.7 cause no problems at all.
>>>>
>>>>    So I guess that I must be doing something terribly wrong, but I do
>>>>    not know what :-(
>>>>
>>>>    If I can do something (for instance, running the servers with
>>>>    EventLogging set to verbose), just let me know.
>>>>
>>>>    Regards,
>>>>
>>>>            Juan
>>>>
>>>>     On 15/05/15 at 20:12, Becky Ligon wrote:
>>>>> This is normal for 2.9.1, and the messages you are seeing are okay.
>>>>> batch_create comes into play when a server needs to gather more handles
>>>>> (like inodes) from another server.  The "Resource temporarily
>>>>> unavailable" is generated when the capability associated with this
>>>>> request has timed out.  So, the calling server regenerates the
>>>>> capability and resends the batch_create request.
>>>>>
>>>>> The OFS development team is changing when these capabilities get
>>>>> generated for batch_create requests to alleviate this problem.  For
>>>>> now, you can ignore these messages.
>>>>>
>>>>> Sorry for the inconvenience.
>>>>>
>>>>> Becky
>>>>>
>>>>>
>>>>>
>>>>> On Fri, May 15, 2015 at 11:48 AM, Juan PC <[email protected]> wrote:
>>>>>
>>>>>    Dear Becky,
>>>>>
>>>>>    I am trying to use orangefs-2.9.1, but every time I run the servers
>>>>>    I get the message in the subject line in one of the servers, and its
>>>>>    log file grows very quickly. The last reference that I have seen
>>>>>    about this problem is
>>>>>    http://www.beowulf-underground.org/pipermail/pvfs2-users/2015-April/004432.html
>>>>>
>>>>>    I have used the --disable-capcache option of configure, but got the
>>>>>    same result. Do you know if this issue has already been fixed or if
>>>>>    there is a workaround?
>>>>>
>>>>>    Best regards,
>>>>>
>>>>>            Juan
>>>>>
>>>>>
>>>>>
>>
>
>
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
