Hi Becky,
When I tried to set up an OrangeFS cluster with 4 and 8 nodes, the
batch_create error message appeared again. Then I realized that some of
my nodes had the wrong time (with a maximum difference of two and a half
hours between nodes). After synchronizing the clocks, the batch_create
problem seems to be gone. Does this make sense? I mean, can a wrong time
on some servers cause the problem? I do not remember seeing any
recommendation or warning about node times in the OrangeFS
documentation.
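
To illustrate why unsynchronized clocks could matter here: further down
the thread, "Resource temporarily unavailable" is described as the error
generated when the capability attached to a batch_create request has
timed out. The Python sketch below is only a toy model of such an expiry
check, not OrangeFS code. The Capability class, the is_expired and
max_skew helpers, the 600-second lifetime and the node names are all
assumptions made for illustration; the only figure taken from this
thread is the 2.5-hour difference between nodes.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class Capability:
    """Hypothetical capability: valid for `lifetime` seconds after issue."""
    issued_at: float  # seconds, read from the ISSUING node's clock
    lifetime: float   # seconds (assumed value, not an OrangeFS default)

    @property
    def expires_at(self) -> float:
        return self.issued_at + self.lifetime


def is_expired(cap: Capability, receiver_now: float) -> bool:
    """The receiving node compares the expiry against its OWN clock."""
    return receiver_now >= cap.expires_at


def max_skew(clocks: Mapping[str, float]) -> float:
    """Largest difference, in seconds, between any two node clocks."""
    values = list(clocks.values())
    return max(values) - min(values)


# Both nodes read their clock at the same real instant,
# but node2 runs two and a half hours ahead of node1.
true_now = 1_000_000.0
clocks = {"node1": true_now, "node2": true_now + 2.5 * 3600}

# node1 issues a capability and immediately sends it to node2.
cap = Capability(issued_at=clocks["node1"], lifetime=600.0)

skew = max_skew(clocks)                               # 9000.0 seconds
expired_on_node2 = is_expired(cap, clocks["node2"])   # True: rejected on arrival
expired_if_synced = is_expired(cap, clocks["node1"])  # False: still valid
```

In this model, any skew larger than the capability lifetime makes every
freshly issued capability look expired to the other node, so regenerating
and resending it would never succeed. Whether OrangeFS 2.9.1 behaves
exactly like this is an assumption on my side.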
Regards,
Juan
On 16/05/15 at 22:59, Becky Ligon wrote:
> Juan:
>
> The conf file looks good. Can you send me your server log files?
>
> Becky
>
> On Saturday, May 16, 2015, Juan PC <[email protected]> wrote:
>
> It is attached.
>
> I do not know if this is important, but one thing that I have seen with
> this configuration file is that if I run the second server just after
> running the first server, everything seems to work. However, if I wait
> for a few seconds, the error message of the root directory appears in
> the first server. Then, when I launch the second server, I get the
> avalanche of batch_create error messages. This avalanche seems to stop
> when it has generated around 1 GB of data. However, because of the
> problem with the root directory, the file system does not work.
>
> I have checked if waiting for a few seconds between server executions is
> an issue in OrangeFS 2.8.7 and it is not.
>
> Regards,
>
> Juan
>
> On 16/05/15 at 17:59, Becky Ligon wrote:
> > Can you send me your orangefs-server.conf file?
> >
> > NOTE: Do not use native IB with this version. We have a known issue
> > with distributed directories and IB that we are currently working on.
> >
> > Becky
> >
> > On Sat, May 16, 2015 at 11:43 AM, <[email protected]> wrote:
> >
> > No, only TCP over Ethernet. We have IB NICs, but I have not compiled
> > OrangeFS with support for them.
> >
> > Juan
> >
> >
> > Quoting "Becky Ligon" <[email protected]>:
> >
> > Are you using native IB?
> >
> > Becky
> >
> > On May 15, 2015, at 5:39 PM, Juan PC <[email protected]> wrote:
> >
> > Hi,
> >
> > Well, your configuration can probably avoid the problem with the
> > benchmark, which I cannot run because the creation of the OrangeFS
> > file system fails.
> >
> > The batch_create error is still there because it appears just when I
> > launch the servers. The creation of the root directory fails too, as I
> > have mentioned. I think this is the relevant part of the log messages
> > regarding the problem with the root directory:
> >
> > [D 05/15/2015 21:08:37] server_post_unexpected_recv
> > [D 05/15/2015 21:08:37] server_op_state_get_machine 999
> > [D 05/15/2015 21:08:37] Initialization completed successfully.
> > [D 05/15/2015 21:08:37] server_state_machine_alloc_noreq 27
> > [D 05/15/2015 21:08:37] server_op_state_get_machine 27
> > [D 05/15/2015 21:08:37] server_state_machine_start_noreq 0x1d6fa10
> > [D 05/15/2015 21:08:37] *** Trove KeyVal Read of /dda
> > [D 05/15/2015 21:08:37] op_queue add: 0x1d71100
> > [D 05/15/2015 21:08:37] [DBPF THREAD]: [KEYVAL -1]: -7
> > [D 05/15/2015 21:08:37] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE (KEYVAL_READ)
> > [D 05/15/2015 21:08:37] warning: keyval read error on handle 1048576 and key= /dda (BDB0073 DB_NOTFOUND: No matching key/data pair found)
> > [D 05/15/2015 21:08:37] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE (KEYVAL_READ) (ret: -1073742082)
> > [D 05/15/2015 21:08:37] op_queue add: 0x1d71100
> > [D 05/15/2015 21:08:37] server_state_machine_alloc_noreq 46
> > [D 05/15/2015 21:08:37] server_op_state_get_machine 46
> > [D 05/15/2015 21:08:37] server_state_machine_start_noreq 0x1d70f80
> > [D 05/15/2015 21:08:37] mgmt-create-root-dir: Init dist-dir-attr for dir meta handle 1048576 with tree_height=1, num_servers=2, bitmap_size=1, split_size=100, server_no=0 and branch_level=1
> > [D 05/15/2015 21:08:37] mgmt-create-root-dir: Init dist_dir_bitmap as:
> > [D 05/15/2015 21:08:37] i=0 : 00 00 00 03
> > [D 05/15/2015 21:08:37]
> > [D 05/15/2015 21:08:37] creating 1 local dirdata files
> > [D 05/15/2015 21:08:37] creating 1 remote dirdata files
> > [D 05/15/2015 21:08:37] job_precreate_pool_get_handles: requesting 1 handles of type 16
> > [E 05/15/2015 21:08:37] Warning: unable to create root dir due to error: Invalid argument
> > [E 05/15/2015 21:08:37] Your FS may be in an inconsistent state
> > [D 05/15/2015 21:08:37] server_state_machine_complete_noreq: 0x1d70f80
> > [D 05/15/2015 21:08:37] server_state_machine_terminate 0x1d70f80
> > [E 05/15/2015 21:08:43] PVFS2 server got signal 15 (server_status_flag: 4177919)
> > [D 05/15/2015 21:08:43] server_state_machine_terminate 0x1d2e970
> >
> > Hope this helps.
> >
> > Regards,
> >
> > Juan
> >
> >
> > On 15/05/15 at 22:13, Becky Ligon wrote:
> > Juan:
> >
> > You may have hit upon another problem that we've encountered where the
> > splitting of directories goes into a race condition. Try this:
> >
> > 1. In your orangefs-server.conf file, set DistrDirServersInitial 1 and
> >    DistrDirServersMax 1 in your multi-server configuration installation.
> >
> > 2. Delete your data and metadata areas and recreate. Start your servers.
> >
> > 3. Run your tests.
> >
> > See if this helps!
> >
> > NOTE: We are working on a fix for this problem right now but don't have
> > a working solution just yet.
> >
> > Becky
> >
> > On Fri, May 15, 2015 at 3:38 PM, Juan PC <[email protected]> wrote:
> >
> > Hi Becky,
> >
> > Thank you for your response :-)
> >
> > The problem is that the log file grows at a rate of around 2 MiB per
> > second (EventLogging is set to none!) and, more importantly, a simple
> > pvfs2-ls does not work. The latter is probably due to an error message
> > that I get after starting the server that stores the root file system:
> >
> > [E 05/15/2015 18:38:08] Warning: unable to create root dir due to error: Resource temporarily unavailable
> > [E 05/15/2015 18:38:08] Your FS may be in an inconsistent state
> >
> > although the batch_create errors appear later, when a second server
> > is run.
> >
> > I have spent a lot of time trying different compilation options,
> > configurations, db versions, checking that I run the right executables,
> > that they use the same filesystem configuration file, etc., and the
> > result is always the same. Well, to be honest, I was able to activate
> > the file system once (I do not know how), but it started failing when I
> > tried to create a few thousand files per directory (benchmark
> > hpcs-io_1.2.0-rc1, scenarios 9-12).
> >
> > My feeling is that, with two servers, the problematic server (the one
> > meant to store the root directory) does not communicate correctly with
> > the second server. There is no firewall, SELinux is disabled, etc.
> >
> > Some final remarks:
> > - Security is always the default one; I have not used either the
> >   --enable-security-key or the --enable-security-cert option.
> > - Same steps with OrangeFS 2.8.7 and no problem at all.
> >
> > So I guess that I must be doing something terribly wrong, but I do not
> > know what :-(
> >
> > If I can do something (for instance, running the servers with
> > EventLogging set to verbose), just let me know.
> >
> > Regards,
> >
> > Juan
> >
> > On 15/05/15 at 20:12, Becky Ligon wrote:
> > This is normal for 2.9.1, and it is okay to get the messages you are
> > seeing. batch_create comes into play when a server needs to gather more
> > handles (like inodes) from another server. The "Resource temporarily
> > unavailable" error is generated when the capability associated with this
> > request has timed out. So, the calling server regenerates the
> > capability and resends the batch_create request.
> >
> > The OFS development team is changing when these capabilities get
> > generated for batch_create requests to alleviate this problem. For now,
> > you can ignore these messages.
> >
> > Sorry for the inconvenience.
> >
> > Becky
> >
> >
> >
> > On Fri, May 15, 2015 at 11:48 AM, Juan PC <[email protected]> wrote:
> >
> > Dear Becky,
> >
> > I am trying to use orangefs-2.9.1, but every time I run the servers I
> > get the message in the subject line on one of the servers, and its log
> > file grows very quickly. The last reference that I have seen about this
> > problem is
> > http://www.beowulf-underground.org/pipermail/pvfs2-users/2015-April/004432.html.
> >
> > I have used the --disable-capcache option of configure, but got the
> > same result. Do you know if this issue has already been fixed or if
> > there is a workaround?
> >
> > Best regards,
> >
> > Juan
> --
> Sent from Gmail Mobile
--
D. Juan Piernas Cánovas
Departamento de Ingeniería y Tecnología de Computadores
Facultad de Informática. Universidad de Murcia
Campus de Espinardo - 30080 Murcia (SPAIN)
Tel.: +34868887657 Fax: +34868884151
email: [email protected]
PGP public key:
http://pgp.rediris.es:11371/pks/lookup?search=piernas%40ditec.um.es&op=index
*** Please send me your documents in text, HTML, PDF or
PostScript format :-) ***
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users