It is attached.
I do not know if this is important, but one thing I have noticed with
this configuration file is that if I start the second server right after
the first one, everything seems to work. However, if I wait a few
seconds, the root directory error message appears on the first server.
Then, when I launch the second server, I get the avalanche of
batch_create error messages. This avalanche seems to stop after it has
generated around 1 GB of data. However, because of the problem with the
root directory, the file system does not work.
I have checked whether waiting a few seconds between starting the servers
causes a problem in OrangeFS 2.8.7, and it does not.
Regards,
Juan
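One way to put numbers on the behaviour described above (how many batch_create messages pile up, and when the root-directory warning first appears) is to scan the server log. The following is a minimal, hypothetical helper, not an OrangeFS tool. It assumes the `[L MM/DD/YYYY HH:MM:SS] message` line shape seen in the log excerpts quoted in this thread (the configuration uses `LogStamp datetime`), matches messages by plain substring, and skips any line that does not fit that shape.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed line shape, taken from the excerpts quoted in this thread:
#   [E 05/15/2015 21:08:37] message text
LINE_RE = re.compile(r"^\[([A-Z]) (\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\] ?(.*)$")


def triage(lines):
    """Summarize server log lines.

    Returns (count of matched lines per level letter,
             number of lines mentioning batch_create,
             datetime of the first "unable to create root dir" warning, or None).
    """
    levels = Counter()
    batch_create = 0
    root_dir_failed_at = None
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m is None:
            continue  # wrapped continuation lines and other noise are ignored
        level, stamp, msg = m.groups()
        levels[level] += 1
        if "batch_create" in msg:
            batch_create += 1
        if root_dir_failed_at is None and "unable to create root dir" in msg:
            root_dir_failed_at = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S")
    return levels, batch_create, root_dir_failed_at
```

To use it, pass an open file object for the `LogFile` path from the configuration (here `/media/mds/orangefs/orangefs-server.log`), for example `with open(path, errors="replace") as f: print(triage(f))`. Reading line by line keeps memory flat even when the log has grown to around 1 GB.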
On 16/05/15 at 17:59, Becky Ligon wrote:
> Can you send me your orangefs-server.conf file?
>
> NOTE: Do not use native IB with this version. We have a known issue
> with distributed directories and IB that we are currently working on.
>
> Becky
>
> On Sat, May 16, 2015 at 11:43 AM, <[email protected]> wrote:
>
> No, only TCP over Ethernet. We have IB NICs, but I have not compiled
> OrangeFS with support for them.
>
> Juan
>
>
> Quoting "Becky Ligon" <[email protected]>:
>
> Are you using native IB?
>
> Becky
>
> On May 15, 2015, at 5:39 PM, Juan PC <[email protected]> wrote:
>
> Hi,
>
> Well, your configuration may avoid the problem with the
> benchmark, but I cannot run the benchmark because creating the
> OrangeFS file system fails.
>
> The batch_create error is still there, because it appears as soon as I
> launch the servers. Creating the root directory also fails, as I
> mentioned. I think this is the relevant part of the log messages
> regarding the problem with the root directory:
>
> [D 05/15/2015 21:08:37] server_post_unexpected_recv
> [D 05/15/2015 21:08:37] server_op_state_get_machine 999
> [D 05/15/2015 21:08:37] Initialization completed successfully.
> [D 05/15/2015 21:08:37] server_state_machine_alloc_noreq 27
> [D 05/15/2015 21:08:37] server_op_state_get_machine 27
> [D 05/15/2015 21:08:37] server_state_machine_start_noreq 0x1d6fa10
> [D 05/15/2015 21:08:37] *** Trove KeyVal Read of /dda
> [D 05/15/2015 21:08:37] op_queue add: 0x1d71100
> [D 05/15/2015 21:08:37] [DBPF THREAD]: [KEYVAL -1]: -7
> [D 05/15/2015 21:08:37] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE (KEYVAL_READ)
> [D 05/15/2015 21:08:37] warning: keyval read error on handle 1048576 and key= /dda (BDB0073 DB_NOTFOUND: No matching key/data pair found)
> [D 05/15/2015 21:08:37] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE (KEYVAL_READ) (ret: -1073742082)
> [D 05/15/2015 21:08:37] op_queue add: 0x1d71100
> [D 05/15/2015 21:08:37] server_state_machine_alloc_noreq 46
> [D 05/15/2015 21:08:37] server_op_state_get_machine 46
> [D 05/15/2015 21:08:37] server_state_machine_start_noreq 0x1d70f80
> [D 05/15/2015 21:08:37] mgmt-create-root-dir: Init dist-dir-attr for dir meta handle 1048576 with tree_height=1, num_servers=2, bitmap_size=1, split_size=100, server_no=0 and branch_level=1
> [D 05/15/2015 21:08:37] mgmt-create-root-dir: Init dist_dir_bitmap as:
> [D 05/15/2015 21:08:37] i=0 : 00 00 00 03
> [D 05/15/2015 21:08:37]
> [D 05/15/2015 21:08:37] creating 1 local dirdata files
> [D 05/15/2015 21:08:37] creating 1 remote dirdata files
> [D 05/15/2015 21:08:37] job_precreate_pool_get_handles: requesting 1 handles of type 16
> [E 05/15/2015 21:08:37] Warning: unable to create root dir due to error: Invalid argument
> [E 05/15/2015 21:08:37] Your FS may be in an inconsistent state
> [D 05/15/2015 21:08:37] server_state_machine_complete_noreq: 0x1d70f80
> [D 05/15/2015 21:08:37] server_state_machine_terminate 0x1d70f80
> [E 05/15/2015 21:08:43] PVFS2 server got signal 15 (server_status_flag: 4177919)
> [D 05/15/2015 21:08:43] server_state_machine_terminate 0x1d2e970
>
> Hope this helps.
>
> Regards,
>
> Juan
>
>
> On 15/05/15 at 22:13, Becky Ligon wrote:
> Juan:
>
> You may have hit another problem that we have encountered, where the
> splitting of directories runs into a race condition. Try this:
>
> 1. In your orangefs-server.conf file, set DistrDirServersInitial 1 and
> DistrDirServersMax 1 in the configuration for your multi-server
> installation.
>
> 2. Delete your data and metadata storage areas and recreate them.
> Start your servers.
>
> 3. Run your tests.
>
> See if this helps!
>
> NOTE: We are working on a fix for this problem right now, but we don't
> have a working solution just yet.
>
> Becky
>
> On Fri, May 15, 2015 at 3:38 PM, Juan PC <[email protected]> wrote:
>
> Hi Becky,
>
> Thank you for your response :-)
>
> The problem is that the log file grows at a rate of around 2 MiB per
> second (EventLogging is set to none!) and, more importantly, a simple
> pvfs2-ls does not work. The latter is probably due to an error message
> that I get after starting the server that stores the root file system:
>
> [E 05/15/2015 18:38:08] Warning: unable to create root dir due to error: Resource temporarily unavailable
> [E 05/15/2015 18:38:08] Your FS may be in an inconsistent state
>
> although the batch_create errors appear later, when a second server
> is started.
>
> I have spent a lot of time trying different compilation options,
> configurations, and db versions, checking that I run the right
> executables, that they use the same file system configuration file,
> etc., and the results are always the same. Well, to be honest, I was
> able to activate the file system once (I do not know how), but it
> started failing when I tried to create a few thousand files per
> directory (benchmark hpcs-io_1.2.0-rc1, scenarios 9-12).
>
> My feeling is that, with two servers, the problematic server (the one
> meant to store the root directory) does not communicate correctly with
> the second server. There is no firewall, SELinux is disabled, etc.
>
> Some final remarks:
> - Security is always the default; I have not used either the
> --enable-security-key or the --enable-security-cert option.
> - The same steps work with OrangeFS 2.8.7 with no problem at all.
>
> So I guess I must be doing something terribly wrong, but I do not
> know what :-(
>
> If I can do anything (for instance, running the servers with
> EventLogging set to verbose), just let me know.
>
> Regards,
>
> Juan
>
> On 15/05/15 at 20:12, Becky Ligon wrote:
> This is normal for 2.9.1, and it is okay to get the messages you are
> seeing. batch_create comes into play when a server needs to gather more
> handles (like inodes) from another server. The "Resource temporarily
> unavailable" error is generated when the capability associated with
> this request has timed out. The calling server then regenerates the
> capability and resends the batch_create request.
>
> The OFS development team is changing when these capabilities get
> generated for batch_create requests to alleviate this problem. For now,
> you can ignore these messages.
>
> Sorry for the inconvenience.
>
> Becky
>
>
>
> On Fri, May 15, 2015 at 11:48 AM, Juan PC <[email protected]> wrote:
>
> Dear Becky,
>
> I am trying to use orangefs-2.9.1, but every time I run the servers I
> get the message in the subject line on one of the servers, and its log
> file grows very quickly. The last reference that I have seen about this
> problem is
>
> http://www.beowulf-underground.org/pipermail/pvfs2-users/2015-April/004432.html.
>
> I have used the --disable-capcache option of configure, but the result
> is the same. Do you know if this issue has already been fixed or if
> there is a workaround?
>
> Best regards,
>
> Juan
>
>
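Becky's explanation of the "Resource temporarily unavailable" messages describes a retry loop: a batch_create request carries a capability, the peer rejects it once that capability has timed out, and the calling server regenerates the capability and resends. The following is a toy model of that loop, not OrangeFS code: `Capability`, `HandleServer`, `request_handles`, the expiry check, and the retry limit are all hypothetical. It only assumes that the rejection surfaces as `EAGAIN`, whose `strerror` text on Linux is "Resource temporarily unavailable".

```python
import errno
from dataclasses import dataclass


@dataclass
class Capability:
    """Hypothetical stand-in for a capability; only its expiry time is modelled."""
    expires_at: float


class HandleServer:
    """Hypothetical peer that hands out batches of handles (models batch_create)."""

    def __init__(self, next_handle: int):
        self.next_handle = next_handle

    def batch_create(self, cap: Capability, count: int, now: float):
        # An expired capability is rejected with EAGAIN
        # ("Resource temporarily unavailable" on Linux).
        if now >= cap.expires_at:
            return errno.EAGAIN, []
        handles = list(range(self.next_handle, self.next_handle + count))
        self.next_handle += count
        return 0, handles


def request_handles(peer, cap, count, now, lifetime, max_retries=5):
    """Send batch_create; on EAGAIN regenerate the capability and resend.

    Returns (handles, number of attempts). Raises OSError on any other
    error, or once max_retries resends have all failed.
    """
    attempts = 0
    while True:
        attempts += 1
        err, handles = peer.batch_create(cap, count, now)
        if err == 0:
            return handles, attempts
        if err != errno.EAGAIN or attempts > max_retries:
            raise OSError(err, "batch_create failed")
        # Regenerate: a fresh capability valid for `lifetime` seconds from now.
        cap = Capability(expires_at=now + lifetime)
```

For example, a request made with a stale capability fails once with `EAGAIN` and succeeds on the resend, so each such exchange leaves one error in the log. The retry cap is a design choice of this sketch: with `lifetime=0.0` every regenerated capability is already expired, and the loop gives up instead of spinning. The thread does not say whether or how the real server bounds its resends.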
<Defaults>
UnexpectedRequests 50
EventLogging none
EnableTracing no
LogStamp datetime
BMIModules bmi_tcp
FlowModules flowproto_multiqueue
PerfUpdateInterval 1000
ServerJobBMITimeoutSecs 30
ServerJobFlowTimeoutSecs 30
ClientJobBMITimeoutSecs 300
ClientJobFlowTimeoutSecs 300
ClientRetryLimit 5
ClientRetryDelayMilliSecs 2000
PrecreateBatchSize 0,32,512,32,32,32,0
PrecreateLowThreshold 0,16,256,16,16,16,0
DataStorageSpace /media/mds/orangefs/data
MetadataStorageSpace /media/mds/orangefs/meta
LogFile /media/mds/orangefs/orangefs-server.log
</Defaults>
<Aliases>
Alias computo10 tcp://computo10:3334
Alias computo11 tcp://computo11:3334
</Aliases>
<Filesystem>
Name orangefs
ID 739520197
RootHandle 1048576
FileStuffing yes
DistrDirServersInitial 1
DistrDirServersMax 1
DistrDirSplitSize 100
<MetaHandleRanges>
Range computo10 3-2305843009213693953
Range computo11 2305843009213693954-4611686018427387904
</MetaHandleRanges>
<DataHandleRanges>
Range computo10 4611686018427387905-6917529027641081855
Range computo11 6917529027641081856-9223372036854775806
</DataHandleRanges>
<StorageHints>
TroveSyncMeta yes
TroveSyncData no
TroveMethod alt-aio
</StorageHints>
</Filesystem>
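Because the root-directory failure involves handle 1048576 (the RootHandle above), it can be worth confirming that the handle layout in this file is self-consistent. A minimal sketch, assuming one directive per line as in the configuration above: it reads only `RootHandle` and the `Range` lines inside `<MetaHandleRanges>` and `<DataHandleRanges>`, checks that no two ranges overlap, and reports which alias owns the root handle. It is a hypothetical helper, not an OrangeFS tool, and it says nothing about whether the servers can actually reach each other.

```python
import re

# Only the handle-related lines of the configuration above are reproduced here.
CONF = """\
<Filesystem>
RootHandle 1048576
<MetaHandleRanges>
Range computo10 3-2305843009213693953
Range computo11 2305843009213693954-4611686018427387904
</MetaHandleRanges>
<DataHandleRanges>
Range computo10 4611686018427387905-6917529027641081855
Range computo11 6917529027641081856-9223372036854775806
</DataHandleRanges>
</Filesystem>
"""

RANGE_RE = re.compile(r"^Range\s+(\S+)\s+(\d+)-(\d+)$")


def parse_handles(text):
    """Return (root_handle, [(section, alias, lo, hi), ...])."""
    root, section, ranges = None, None, []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("RootHandle"):
            root = int(line.split()[1])
        elif line in ("<MetaHandleRanges>", "<DataHandleRanges>"):
            section = line.strip("<>")
        elif line.startswith("</"):
            section = None
        elif section:
            m = RANGE_RE.match(line)
            if m:
                ranges.append((section, m.group(1), int(m.group(2)), int(m.group(3))))
    return root, ranges


def check_handles(root, ranges):
    """Return (aliases whose meta range contains root, list of problems)."""
    if root is None:
        return [], ["RootHandle missing"]
    problems = []
    ordered = sorted(ranges, key=lambda r: r[2])
    for a, b in zip(ordered, ordered[1:]):
        if b[2] <= a[3]:  # next range starts before the previous one ends
            problems.append(f"overlap: {a[1]} {a[0]} / {b[1]} {b[0]}")
    owners = [alias for sec, alias, lo, hi in ranges
              if sec == "MetaHandleRanges" and lo <= root <= hi]
    if len(owners) != 1:
        problems.append(f"RootHandle {root} owned by {owners}")
    return owners, problems


if __name__ == "__main__":
    print(check_handles(*parse_handles(CONF)))
```

For the ranges in this file it reports `computo10` as the only owner of the root handle and finds no overlaps, which is consistent with `server_no=0` in the `mgmt-create-root-dir` log line quoted earlier.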
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users