It is attached.

I do not know if this is important, but one thing that I have seen with
this configuration file is that if I run the second server just after
running the first server, everything seems to work. However, if I wait
for a few seconds, the root directory error message appears in
the first server. Then, when I launch the second server, I get the
avalanche of batch_create error messages. This avalanche seems to stop
once it has generated around 1 GB of data. However, because of the
problem with the root directory, the file system does not work.

I have checked if waiting for a few seconds between server executions is
an issue in OrangeFS 2.8.7 and it is not.

Regards,

        Juan

On 16/05/15 at 17:59, Becky Ligon wrote:
> Can you send me your orangefs-server.conf file?  
> 
> NOTE:  Do not use native IB with this version.  We have a known issue
> with distributed directories and IB that we are currently working on.
> 
> Becky
> 
> On Sat, May 16, 2015 at 11:43 AM, <[email protected]> wrote:
> 
>     No, only TCP over Ethernet. We have IB NICs, but I have not compiled
>     OrangeFS with support for them.
> 
>            Juan
> 
> 
>     Quoting "Becky Ligon" <[email protected]>:
> 
>         Are you using native IB?
> 
>         Becky
> 
>             On May 15, 2015, at 5:39 PM, Juan PC <[email protected]> wrote:
> 
>             Hi,
> 
>             Well, your configuration can probably avoid the problem with the
>             benchmark, which I cannot run because the creation of the
>             OrangeFS file system fails.
>
>             The batch_create error is still there because it appears
>             as soon as I launch the servers. The creation of the root
>             directory fails too, as I have mentioned. I think this is the
>             relevant part of the log messages regarding the problem with
>             the root directory:
> 
>             [D 05/15/2015 21:08:37] server_post_unexpected_recv
>             [D 05/15/2015 21:08:37] server_op_state_get_machine 999
>             [D 05/15/2015 21:08:37] Initialization completed successfully.
>             [D 05/15/2015 21:08:37] server_state_machine_alloc_noreq 27
>             [D 05/15/2015 21:08:37] server_op_state_get_machine 27
>             [D 05/15/2015 21:08:37] server_state_machine_start_noreq 0x1d6fa10
>             [D 05/15/2015 21:08:37] *** Trove KeyVal Read of /dda
>             [D 05/15/2015 21:08:37] op_queue add: 0x1d71100
>             [D 05/15/2015 21:08:37] [DBPF THREAD]: [KEYVAL -1]: -7
>             [D 05/15/2015 21:08:37] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE (KEYVAL_READ)
>             [D 05/15/2015 21:08:37] warning: keyval read error on handle 1048576 and key= /dda (BDB0073 DB_NOTFOUND: No matching key/data pair found)
>             [D 05/15/2015 21:08:37] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE (KEYVAL_READ) (ret: -1073742082)
>             [D 05/15/2015 21:08:37] op_queue add: 0x1d71100
>             [D 05/15/2015 21:08:37] server_state_machine_alloc_noreq 46
>             [D 05/15/2015 21:08:37] server_op_state_get_machine 46
>             [D 05/15/2015 21:08:37] server_state_machine_start_noreq 0x1d70f80
>             [D 05/15/2015 21:08:37] mgmt-create-root-dir: Init dist-dir-attr for dir meta handle 1048576 with tree_height=1, num_servers=2, bitmap_size=1, split_size=100, server_no=0 and branch_level=1
>             [D 05/15/2015 21:08:37] mgmt-create-root-dir: Init dist_dir_bitmap as:
>             [D 05/15/2015 21:08:37]  i=0 : 00 00 00 03
>             [D 05/15/2015 21:08:37]
>             [D 05/15/2015 21:08:37] creating 1 local dirdata files
>             [D 05/15/2015 21:08:37] creating 1 remote dirdata files
>             [D 05/15/2015 21:08:37] job_precreate_pool_get_handles: requesting 1 handles of type 16
>             [E 05/15/2015 21:08:37] Warning: unable to create root dir due to error: Invalid argument
>             [E 05/15/2015 21:08:37]          Your FS may be in an inconsistent state
>             [D 05/15/2015 21:08:37] server_state_machine_complete_noreq: 0x1d70f80
>             [D 05/15/2015 21:08:37] server_state_machine_terminate 0x1d70f80
>             [E 05/15/2015 21:08:43] PVFS2 server got signal 15 (server_status_flag: 4177919)
>             [D 05/15/2015 21:08:43] server_state_machine_terminate 0x1d2e970
> 
>             Hope this helps.
> 
>             Regards,
> 
>                Juan
> 
> 
>                 On 15/05/15 at 22:13, Becky Ligon wrote:
>                 Juan:
> 
>                 You may have hit upon another problem that we've
>                 encountered, where the splitting of directories runs
>                 into a race condition.  Try this:
>
>                 1.  In your orangefs-server.conf file, set
>                 DistrDirServersInitial 1 and DistrDirServersMax 1 in
>                 your multi-server installation.
>
>                 2.  Delete your data and metadata areas and recreate
>                 them.  Start your servers.
> 
>                 3.  Run your tests.
> 
>                 See if this helps!
> 
>                 NOTE:  We are working on a fix for this problem right
>                 now but don't have a working solution just yet.
> 
>                 Becky
> 
>                 On Fri, May 15, 2015 at 3:38 PM, Juan PC
>                 <[email protected]> wrote:
> 
>                    Hi Becky,
> 
>                    Thank you for your response :-)
> 
>                    The problem is that the log file grows at a rate of
>                    around 2 MiB per second (EventLogging is set to none!)
>                    and, more importantly, a simple pvfs2-ls does not work.
>                    The latter is probably due to an error message that I
>                    get after starting the server that stores the root
>                    file system:
>
>                    [E 05/15/2015 18:38:08] Warning: unable to create root dir due to error: Resource temporarily unavailable
>                    [E 05/15/2015 18:38:08]          Your FS may be in an inconsistent state
>
>                    although the batch_create errors appear later, when
>                    a second server is run.
> 
>                    I have spent a lot of time trying different compilation
>                    options, configurations, db versions, checking that I
>                    run the right executables, that they use the same
>                    filesystem configuration file, etc., and the result is
>                    always the same. Well, to be honest, I was able to
>                    activate the file system once (I do not know how), but
>                    it started failing when I tried to create a few
>                    thousand files per directory (benchmark
>                    hpcs-io_1.2.0-rc1, scenarios 9-12).
> 
>                    My feeling is that, with two servers, the problematic
>                    server (the one aimed at storing the root directory)
>                    does not communicate correctly with the second server.
>                    There is no firewall, SELinux is disabled, etc.
> 
>                    Some final remarks:
>                    - Security is always the default one; I have not used
>                      either the --enable-security-key or the
>                      --enable-security-cert option.
>                    - Same steps with OrangeFS 2.8.7 and no problem at all.
> 
>                    So I guess that I must be doing something terribly
>                    wrong, but I do not know what :-(
> 
>                    If I can do something (for instance, running the
>                    servers with EventLogging set to verbose), just let me
>                    know.
> 
>                    Regards,
> 
>                            Juan
> 
>                        On 15/05/15 at 20:12, Becky Ligon wrote:
>                     It is normal for 2.9.1, and okay, to get the messages
>                     you are seeing.  batch_create comes into play when a
>                     server needs to gather more handles (like inodes) from
>                     another server.  The "Resource temporarily
>                     unavailable" is generated when the capability
>                     associated with this request has timed out.  So, the
>                     calling server regenerates the capability and resends
>                     the batch_create request.
>
>                     The OFS development team is changing when these
>                     capabilities get generated for batch_create requests
>                     to alleviate this problem.  For now, you can ignore
>                     these messages.
> 
>                     Sorry for the inconvenience.
> 
>                     Becky
> 
> 
> 
>                     On Fri, May 15, 2015 at 11:48 AM, Juan PC
>                     <[email protected]> wrote:
> 
>                        Dear Becky,
> 
>                        I am trying to use orangefs-2.9.1, but every time I
>                        run the servers I get the message of the subject in
>                        one of the servers, and its log file grows very
>                        quickly. The last reference that I have seen about
>                        this problem is
>                        http://www.beowulf-underground.org/pipermail/pvfs2-users/2015-April/004432.html.
>
>                        I have used option --disable-capcache of configure,
>                        but with the same result. Do you know if this issue
>                        has already been fixed or if there is a workaround?
> 
>                        Best regards,
> 
>                                Juan
> 
> 
> 
> 
> 
> 
> 
<Defaults>
        UnexpectedRequests 50
        EventLogging none
        EnableTracing no
        LogStamp datetime
        BMIModules bmi_tcp
        FlowModules flowproto_multiqueue
        PerfUpdateInterval 1000
        ServerJobBMITimeoutSecs 30
        ServerJobFlowTimeoutSecs 30
        ClientJobBMITimeoutSecs 300
        ClientJobFlowTimeoutSecs 300
        ClientRetryLimit 5
        ClientRetryDelayMilliSecs 2000
        PrecreateBatchSize 0,32,512,32,32,32,0
        PrecreateLowThreshold 0,16,256,16,16,16,0

        DataStorageSpace /media/mds/orangefs/data
        MetadataStorageSpace /media/mds/orangefs/meta

        LogFile /media/mds/orangefs/orangefs-server.log
</Defaults>

<Aliases>
        Alias computo10 tcp://computo10:3334
        Alias computo11 tcp://computo11:3334
</Aliases>

<Filesystem>
        Name orangefs
        ID 739520197
        RootHandle 1048576
        FileStuffing yes
        DistrDirServersInitial 1
        DistrDirServersMax 1
        DistrDirSplitSize 100
        <MetaHandleRanges>
                Range computo10 3-2305843009213693953
                Range computo11 2305843009213693954-4611686018427387904
        </MetaHandleRanges>
        <DataHandleRanges>
                Range computo10 4611686018427387905-6917529027641081855
                Range computo11 6917529027641081856-9223372036854775806
        </DataHandleRanges>
        <StorageHints>
                TroveSyncMeta yes
                TroveSyncData no
                TroveMethod alt-aio
        </StorageHints>
</Filesystem>
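
A quick sanity check that can be run on a configuration file like the one above: the sketch below parses the MetaHandleRanges and DataHandleRanges sections, reports overlapping handle ranges, and finds which alias owns the RootHandle. It is only a minimal illustration and not part of OrangeFS; the function names (parse_ranges, find_overlaps, root_handle_owner) are hypothetical, and nothing in this thread establishes that handle ranges are related to the root directory error.

```python
import re

# Excerpt of the <Filesystem> section from the attached configuration file.
SAMPLE = """
<Filesystem>
        RootHandle 1048576
        <MetaHandleRanges>
                Range computo10 3-2305843009213693953
                Range computo11 2305843009213693954-4611686018427387904
        </MetaHandleRanges>
        <DataHandleRanges>
                Range computo10 4611686018427387905-6917529027641081855
                Range computo11 6917529027641081856-9223372036854775806
        </DataHandleRanges>
</Filesystem>
"""

SECTIONS = "MetaHandleRanges|DataHandleRanges"


def parse_ranges(conf_text):
    """Return {section name: [(alias, low, high), ...]} for both range sections."""
    ranges = {}
    section = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        opened = re.fullmatch(rf"<({SECTIONS})>", line)
        if opened:
            section = opened.group(1)
            ranges[section] = []
        elif re.fullmatch(rf"</({SECTIONS})>", line):
            section = None
        elif section:
            m = re.fullmatch(r"Range\s+(\S+)\s+(\d+)-(\d+)", line)
            if m:
                ranges[section].append((m.group(1), int(m.group(2)), int(m.group(3))))
    return ranges


def find_overlaps(ranges):
    """Return neighbouring ranges (sorted by start) that overlap; empty if none do."""
    flat = sorted((lo, hi, alias) for entries in ranges.values()
                  for alias, lo, hi in entries)
    return [(a, b) for a, b in zip(flat, flat[1:]) if b[0] <= a[1]]


def root_handle_owner(conf_text, ranges):
    """Return the alias whose metadata range contains RootHandle, or None."""
    m = re.search(r"^\s*RootHandle\s+(\d+)\s*$", conf_text, re.MULTILINE)
    if not m:
        return None
    root = int(m.group(1))
    for alias, lo, hi in ranges.get("MetaHandleRanges", []):
        if lo <= root <= hi:
            return alias
    return None


if __name__ == "__main__":
    parsed = parse_ranges(SAMPLE)
    print("overlaps:", find_overlaps(parsed))
    print("root handle owner:", root_handle_owner(SAMPLE, parsed))
```

For the attached file this reports no overlapping ranges and places RootHandle 1048576 in the metadata range of computo10.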
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
