Great! Thank you :-)
Juan
On 24/05/15 at 17:51, Becky Ligon wrote:
> Juan:
>
> We have also been able to recreate your problem with startup and
> creating the root directory information. We are working now to put a
> fix in place.
>
> Becky
>
> On Fri, May 22, 2015 at 7:20 PM, Boyd Wilson <[email protected]> wrote:
>
> The default mode still uses PKI to some degree, but all of the expensive
> signing operations are done without keys or with minimal keys. Time
> drift may still affect it, though (I will have to check with the
> developers who are more familiar with that code).
>
> -b
>
> On Fri, May 22, 2015 at 7:09 PM Juan PC <[email protected]> wrote:
>
> Good to know :-). However, I use the default security mode (the old one,
> I think).
>
> Regards,
>
> Juan
>
> On 23/05/15 at 00:52, Boyd Wilson wrote:
> > The new capability-based security uses PKI, so it is time dependent, and
> > time drift could cause problems. As far as I can tell we have not
> > documented this, so we need to do so.
> >
> > -b
> >
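Editorial sketch of the point above: a time-limited capability can look expired (or not yet valid) to a server whose clock has drifted. This is a minimal illustration under stated assumptions, not OrangeFS's actual check; the `Capability` class, the `is_valid` function, and the 600-second timeout are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Capability:
    """Hypothetical time-limited capability (not the OrangeFS structure)."""
    issued_at: float  # issuer's clock, seconds since the epoch
    timeout: float    # validity window in seconds (assumed value below)


def is_valid(cap: Capability, verifier_now: float) -> bool:
    """Accept the capability only while the verifier's own clock is inside the window."""
    return cap.issued_at <= verifier_now < cap.issued_at + cap.timeout


issuer_now = 1_000_000.0
cap = Capability(issued_at=issuer_now, timeout=600.0)

# Verifier clock in sync: one second after issue.
in_sync = is_valid(cap, issuer_now + 1.0)
# Verifier clock two and a half hours ahead of, or behind, the issuer.
skewed_ahead = is_valid(cap, issuer_now + 2.5 * 3600)
skewed_behind = is_valid(cap, issuer_now - 2.5 * 3600)

print(in_sync, skewed_ahead, skewed_behind)
```

With a drift larger than the validity window, every capability is rejected no matter how quickly it is presented, which would be consistent with errors that disappear once the clocks are synchronized.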
> > On Fri, May 22, 2015 at 6:49 PM Juan PC <[email protected]> wrote:
> >
> > Hi Becky,
> >
> > When I tried to set up an OrangeFS cluster with 4 and 8 nodes, the
> > batch_create error message appeared again. Then I realized that some
> > of my nodes had the wrong time (with a maximum difference of two and a
> > half hours between nodes). After synchronizing the clocks, the
> > batch_create problem seems to be gone. Does this make sense? I mean, can
> > a wrong time on some servers cause the problem? I do not remember seeing
> > any recommendation or warning about node times in the OrangeFS
> > documentation.
> >
> > Regards,
> >
> > Juan
> >
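Editorial sketch: one way to quantify the kind of time difference described above, assuming the current time of each node has already been collected as epoch seconds. The node names and timestamps are made up for illustration, and `max_clock_skew` is a hypothetical helper.

```python
def max_clock_skew(node_times):
    """Return (skew_seconds, earliest_node, latest_node) for a {node: epoch_seconds} mapping."""
    earliest = min(node_times, key=node_times.get)
    latest = max(node_times, key=node_times.get)
    return node_times[latest] - node_times[earliest], earliest, latest


# Hypothetical samples: node3 is two and a half hours ahead of node1.
samples = {
    "node1": 1432300000.0,
    "node2": 1432300002.0,
    "node3": 1432309000.0,
}

skew, slow, fast = max_clock_skew(samples)
print(f"max skew {skew:.0f} s between {slow} and {fast}")
```

Samples gathered one node after another include the collection delay, so small values are not meaningful; a difference of hours is.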
> > On 16/05/15 at 22:59, Becky Ligon wrote:
> > > Juan:
> > >
> > > The conf file looks good. Can you send me your server log files?
> > >
> > > Becky
> > >
> > > On Saturday, May 16, 2015, Juan PC <[email protected]> wrote:
> > >
> > > It is attached.
> > >
> > > I do not know if this is important, but one thing that I have seen with
> > > this configuration file is that if I start the second server just after
> > > starting the first server, everything seems to work. However, if I wait
> > > for a few seconds, the root directory error message appears in
> > > the first server. Then, when I launch the second server, I get the
> > > avalanche of batch_create error messages. This avalanche seems to stop
> > > once it has generated around 1 GB of data. However, because of the
> > > problem with the root directory, the file system does not work.
> > >
> > > I have checked whether waiting for a few seconds between server
> > > launches is an issue in OrangeFS 2.8.7, and it is not.
> > >
> > > Regards,
> > >
> > > Juan
> > >
> > > On 16/05/15 at 17:59, Becky Ligon wrote:
> > > > Can you send me your orangefs-server.conf file?
> > > >
> > > > NOTE: do not use native IB with this version. We have a known issue
> > > > with distributed directories and IB that we are currently working on.
> > > >
> > > > Becky
> > > >
> > > > On Sat, May 16, 2015 at 11:43 AM, <[email protected]> wrote:
> > > >
> > > > No, only TCP over Ethernet. We have IB NICs, but I have not compiled
> > > > OrangeFS with support for them.
> > > >
> > > > Juan
> > > >
> > > >
> > > > Quoting "Becky Ligon" <[email protected]>:
> > > >
> > > > Are you using native IB?
> > > >
> > > > Becky
> > > >
> > > >
> > > > On May 15, 2015, at 5:39 PM, Juan PC <[email protected]> wrote:
> > > >
> > > > Hi,
> > > >
> > > > Well, your configuration can probably avoid the problem with the
> > > > benchmark, which I cannot run because the creation of the
> > > > OrangeFS file system fails.
> > > >
> > > > The batch_create error is still there because it appears as soon as I
> > > > launch the servers. The creation of the root directory fails too, as I
> > > > have mentioned. I think this is the relevant part of the log messages
> > > > regarding the problem with the root directory:
> > > >
> > > > [D 05/15/2015 21:08:37] server_post_unexpected_recv
> > > > [D 05/15/2015 21:08:37] server_op_state_get_machine 999
> > > > [D 05/15/2015 21:08:37] Initialization completed successfully.
> > > > [D 05/15/2015 21:08:37] server_state_machine_alloc_noreq 27
> > > > [D 05/15/2015 21:08:37] server_op_state_get_machine 27
> > > > [D 05/15/2015 21:08:37] server_state_machine_start_noreq 0x1d6fa10
> > > > [D 05/15/2015 21:08:37] *** Trove KeyVal Read of /dda
> > > > [D 05/15/2015 21:08:37] op_queue add: 0x1d71100
> > > > [D 05/15/2015 21:08:37] [DBPF THREAD]: [KEYVAL -1]: -7
> > > > [D 05/15/2015 21:08:37] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE (KEYVAL_READ)
> > > > [D 05/15/2015 21:08:37] warning: keyval read error on handle 1048576 and key= /dda (BDB0073 DB_NOTFOUND: No matching key/data pair found)
> > > > [D 05/15/2015 21:08:37] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE (KEYVAL_READ) (ret: -1073742082)
> > > > [D 05/15/2015 21:08:37] op_queue add: 0x1d71100
> > > > [D 05/15/2015 21:08:37] server_state_machine_alloc_noreq 46
> > > > [D 05/15/2015 21:08:37] server_op_state_get_machine 46
> > > > [D 05/15/2015 21:08:37] server_state_machine_start_noreq 0x1d70f80
> > > > [D 05/15/2015 21:08:37] mgmt-create-root-dir: Init dist-dir-attr for dir meta handle 1048576 with tree_height=1, num_servers=2, bitmap_size=1, split_size=100, server_no=0 and branch_level=1
> > > > [D 05/15/2015 21:08:37] mgmt-create-root-dir: Init dist_dir_bitmap as:
> > > > [D 05/15/2015 21:08:37] i=0 : 00 00 00 03
> > > > [D 05/15/2015 21:08:37]
> > > > [D 05/15/2015 21:08:37] creating 1 local dirdata files
> > > > [D 05/15/2015 21:08:37] creating 1 remote dirdata files
> > > > [D 05/15/2015 21:08:37] job_precreate_pool_get_handles: requesting 1 handles of type 16
> > > > [E 05/15/2015 21:08:37] Warning: unable to create root dir due to error: Invalid argument
> > > > [E 05/15/2015 21:08:37] Your FS may be in an inconsistent state
> > > > [D 05/15/2015 21:08:37] server_state_machine_complete_noreq: 0x1d70f80
> > > > [D 05/15/2015 21:08:37] server_state_machine_terminate 0x1d70f80
> > > > [E 05/15/2015 21:08:43] PVFS2 server got signal 15 (server_status_flag: 4177919)
> > > > [D 05/15/2015 21:08:43] server_state_machine_terminate 0x1d2e970
> > > >
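Editorial sketch: when the server log grows quickly, it can help to pull out only the error entries. The snippet below assumes each entry starts with a bracketed level letter and timestamp, as in the excerpt above; the regular expression and the `error_entries` helper are illustrative and not part of OrangeFS.

```python
import re

# Matches entries such as "[E 05/15/2015 21:08:37] message" (format taken from the excerpt).
LOG_LINE = re.compile(
    r"^\[(?P<level>[A-Z]) (?P<stamp>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\]\s*(?P<msg>.*)$"
)


def error_entries(lines):
    """Return (timestamp, message) pairs for entries logged at level E."""
    found = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == "E":
            found.append((match.group("stamp"), match.group("msg")))
    return found


sample = [
    "[D 05/15/2015 21:08:37] creating 1 remote dirdata files",
    "[E 05/15/2015 21:08:37] Warning: unable to create root dir due to error: Invalid argument",
    "[E 05/15/2015 21:08:37] Your FS may be in an inconsistent state",
    "[D 05/15/2015 21:08:37]",
]

errors = error_entries(sample)
for stamp, msg in errors:
    print(stamp, msg)
```

On a real log file, the list of lines can be replaced by an open file object so that the file is read line by line.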
> > > > Hope this helps.
> > > >
> > > > Regards,
> > > >
> > > > Juan
> > > >
> > > >
> > > > On 15/05/15 at 22:13, Becky Ligon wrote:
> > > > Juan:
> > > >
> > > > You may have hit upon another problem that we've encountered, where
> > > > the splitting of directories goes into a race condition. Try this:
> > > >
> > > > 1. In the orangefs-server.conf file of your multi-server
> > > > installation, set DistrDirServersInitial 1 and DistrDirServersMax 1.
> > > >
> > > > 2. Delete your data and metadata areas and recreate them. Start your
> > > > servers.
> > > >
> > > > 3. Run your tests.
> > > >
> > > > See if this helps!
> > > >
> > > > NOTE: We are working on a fix for this problem right now but don't
> > > > have a working solution just yet.
> > > >
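Editorial sketch: a quick way to confirm that both settings from step 1 are present in a configuration file before restarting the servers. It assumes a simple "Name Value" line layout with "#" comments; the thread does not say which section of orangefs-server.conf the options belong in, so the sample is a bare fragment, and `read_settings` is a hypothetical helper.

```python
def read_settings(conf_text, names):
    """Return {name: value} for 'Name Value' lines whose name is in `names`."""
    found = {}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] in names:
            found[parts[0]] = parts[1].strip()
    return found


# Bare fragment for illustration only; the surrounding sections are omitted.
sample_conf = """
DistrDirServersInitial 1
DistrDirServersMax 1
"""

wanted = {"DistrDirServersInitial", "DistrDirServersMax"}
settings = read_settings(sample_conf, wanted)
missing = wanted - settings.keys()
print(settings, "missing:", sorted(missing))
```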
> > > > Becky
> > > >
> > > > On Fri, May 15, 2015 at 3:38 PM, Juan PC <[email protected]> wrote:
> > > >
> > > > Hi Becky,
> > > >
> > > > Thank you for your response :-)
> > > >
> > > > The problem is that the log file grows at a rate of around 2 MiB per
> > > > second (EventLogging is set to none!) and, more importantly, a simple
> > > > pvfs2-ls does not work. The latter is probably due to an error message
> > > > that I get after starting the server that stores the root file system:
> > > >
> > > > [E 05/15/2015 18:38:08] Warning: unable to create root dir due to error: Resource temporarily unavailable
> > > > [E 05/15/2015 18:38:08] Your FS may be in an inconsistent state
> > > >
> > > > although the batch_create errors appear later, when a second server
> > > > is run.
> > > >
> > > > I have spent a lot of time trying different compilation options,
> > > > configurations, and db versions, checking that I run the right
> > > > executables, that they use the same file system configuration file,
> > > > etc., and the result is always the same. Well, to be honest, I was
> > > > able to activate the file system once (I do not know how), but it
> > > > started failing when I tried to create a few thousand files per
> > > > directory (benchmark hpcs-io_1.2.0-rc1, scenarios 9-12).
> > > >
> > > > My feeling is that, with two servers, the problematic server (the one
> > > > meant to store the root directory) does not communicate correctly with
> > > > the second server. There is no firewall, SELinux is disabled, etc.
> > > >
> > > > Some final remarks:
> > > > - Security is always the default one; I have not used either the
> > > > --enable-security-key or the --enable-security-cert option.
> > > > - Same steps with OrangeFS 2.8.7 and no problem at all.
> > > >
> > > > So I guess that I must be doing something terribly wrong, but I do not
> > > > know what :-(
> > > >
> > > > If I can do something (for instance, running the servers with
> > > > EventLogging set to verbose), just let me know.
> > > >
> > > > Regards,
> > > >
> > > > Juan
> > > >
> > > > On 15/05/15 at 20:12, Becky Ligon wrote:
> > > > This is normal for 2.9.1, and it is okay to get the messages you are
> > > > seeing. batch_create comes into play when a server needs to gather more
> > > > handles (like inodes) from another server. The "Resource temporarily
> > > > unavailable" error is generated when the capability associated with
> > > > this request has timed out. So, the calling server regenerates the
> > > > capability and resends the batch_create request.
> > > >
> > > > The OFS development team is changing when these capabilities get
> > > > generated for batch_create requests to alleviate this problem. For now,
> > > > you can ignore these messages.
> > > >
> > > > Sorry for the inconvenience.
> > > >
> > > > Becky
> > > >
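Editorial sketch of the regenerate-and-resend pattern described above. It is a toy model under stated assumptions, not the OrangeFS implementation: `CapabilityExpired`, `batch_create_with_retry`, the fake capability and the fake handles are all hypothetical names, and the attempt limit is an added safeguard.

```python
class CapabilityExpired(Exception):
    """Stand-in for the 'Resource temporarily unavailable' condition."""


def batch_create_with_retry(new_capability, send_request, max_attempts=3):
    """Send a request with a fresh capability, regenerating it after each expiry."""
    for attempt in range(1, max_attempts + 1):
        capability = new_capability()  # a new capability for every attempt
        try:
            return send_request(capability), attempt
        except CapabilityExpired:
            continue  # capability timed out: regenerate and resend
    raise CapabilityExpired(f"gave up after {max_attempts} attempts")


# Toy peer: rejects the first capability, accepts the second.
issued = {"count": 0}


def fake_capability():
    issued["count"] += 1
    return {"serial": issued["count"]}


def fake_send(capability):
    if capability["serial"] == 1:
        raise CapabilityExpired("capability timed out")
    return ["handle-1", "handle-2"]


handles, attempts = batch_create_with_retry(fake_capability, fake_send)
print(handles, attempts)
```

If every capability is rejected, for example because the peer's clock is far off, a loop without an attempt limit would resend indefinitely and log an error each time.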
> > > >
> > > >
> > > > On Fri, May 15, 2015 at 11:48 AM, Juan PC <[email protected]> wrote:
> > > >
> > > > Dear Becky,
> > > >
> > > > I am trying to use orangefs-2.9.1, but every time I run the servers I
> > > > get the message in the subject line in one of the servers, and its log
> > > > file grows very quickly. The last reference that I have seen about this
> > > > problem is
> > > > http://www.beowulf-underground.org/pipermail/pvfs2-users/2015-April/004432.html.
> > > >
> > > > I have used the --disable-capcache option of configure, but got the
> > > > same result. Do you know if this issue has already been fixed or if
> > > > there is a workaround?
> > > >
> > > > Best regards,
> > > >
> > > > Juan
> > --
> > D. Juan Piernas Cánovas
> > Departamento de Ingeniería y Tecnología de Computadores
> > Facultad de Informática. Universidad de Murcia
> > Campus de Espinardo - 30080 Murcia (SPAIN)
> > Tel.: +34868887657 Fax: +34868884151
> > email: [email protected]
> > PGP public key:
> > http://pgp.rediris.es:11371/pks/lookup?search=piernas%40ditec.um.es&op=index
> >
> > *** Please send me your documents in text, HTML, PDF or
> > PostScript format :-) ***
> > _______________________________________________
> > Pvfs2-users mailing list
> > [email protected]
> > http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
> >
>
>
>
>
--
D. Juan Piernas Cánovas
Departamento de Ingeniería y Tecnología de Computadores
Facultad de Informática. Universidad de Murcia
Campus de Espinardo - 30080 Murcia (SPAIN)
Tel.: +34868887657 Fax: +34868884151
email: [email protected]
PGP public key:
http://pgp.rediris.es:11371/pks/lookup?search=piernas%40ditec.um.es&op=index
*** Please send me your documents in text, HTML, PDF or
PostScript format :-) ***
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users