default mode still uses PKI to some degree, but all of the expensive signing operations are done with no keys or with minimal keys. Time drift may still affect it, though; I will have to check with the developers who are more familiar with that code.
-b

On Fri, May 22, 2015 at 7:09 PM Juan PC <[email protected]> wrote:

> Good to know :-). However, I use the default mode security (the old one,
> I think).
>
> Regards,
>
> Juan
>
> On 23/05/15 at 00:52, Boyd Wilson wrote:
> > The new capability-based security uses PKI, so it is time dependent,
> > and time drift could cause problems. As far as I can tell we have not
> > documented this, so we need to do so.
> >
> > -b
> >
> > On Fri, May 22, 2015 at 6:49 PM Juan PC <[email protected]> wrote:
> > > Hi Becky,
> > >
> > > When I tried to set up an OrangeFS cluster with 4 and 8 nodes, the
> > > batch_create error message appeared again. Then I realized that
> > > some of my nodes had the wrong time (with a maximum difference of
> > > two and a half hours between nodes). After synchronizing the times,
> > > the batch_create problem seems to be gone. Does this make sense? I
> > > mean, can a wrong time on some servers cause the problem? I do not
> > > remember seeing any recommendation or warning about node times in
> > > the OrangeFS documentation.
> > >
> > > Regards,
> > >
> > > Juan
> > >
> > > On 16/05/15 at 22:59, Becky Ligon wrote:
> > > > Juan:
> > > >
> > > > The conf file looks good. Can you send me your server log files?
> > > >
> > > > Becky
> > > >
> > > > On Saturday, May 16, 2015, Juan PC <[email protected]> wrote:
> > > > > It is attached.
> > > > >
> > > > > I do not know if this is important, but one thing that I have
> > > > > seen with this configuration file is that if I run the second
> > > > > server just after running the first server, everything seems to
> > > > > work. However, if I wait for a few seconds, the error message
> > > > > about the root directory appears in the first server. Then,
> > > > > when I launch the second server, I get the avalanche of
> > > > > batch_create error messages. This avalanche seems to stop when
> > > > > it has generated around 1 GB of data. However, because of the
> > > > > problem with the root directory, the file system does not work.
> > > > >
> > > > > I have checked whether waiting for a few seconds between server
> > > > > executions is an issue in OrangeFS 2.8.7, and it is not.
> > > > >
> > > > > Regards,
> > > > >
> > > > > Juan
> > > > >
> > > > > On 16/05/15 at 17:59, Becky Ligon wrote:
> > > > > > Can you send me your orangefs-server.conf file?
> > > > > >
> > > > > > NOTE: do not use native IB with this version. We have a known
> > > > > > issue with distributed directories and IB that we are
> > > > > > currently working on.
> > > > > >
> > > > > > Becky
> > > > > >
> > > > > > On Sat, May 16, 2015 at 11:43 AM, <[email protected]> wrote:
> > > > > > > No, only TCP over Ethernet. We have IB NICs, but I have not
> > > > > > > compiled OrangeFS with support for them.
> > > > > > >
> > > > > > > Juan
> > > > > > >
> > > > > > > Quoting "Becky Ligon" <[email protected]>:
> > > > > > > > Are you using native IB?
> > > > > > > >
> > > > > > > > Becky
> > > > > > > >
> > > > > > > > Sent from my iPhone
> > > > > > > >
> > > > > > > > On May 15, 2015, at 5:39 PM, Juan PC <[email protected]> wrote:
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > Well, your configuration can probably avoid the problem
> > > > > > > > > with the benchmark, which I cannot run because the
> > > > > > > > > creation of the OrangeFS file system fails.
> > > > > > > > >
> > > > > > > > > The batch_create error is still there because it
> > > > > > > > > appears as soon as I launch the servers. The creation
> > > > > > > > > of the root directory fails too, as I have mentioned. I
> > > > > > > > > think this is the relevant part of the log messages
> > > > > > > > > regarding the problem with the root directory:
> > > > > > > > >
> > > > > > > > > [D 05/15/2015 21:08:37] server_post_unexpected_recv
> > > > > > > > > [D 05/15/2015 21:08:37] server_op_state_get_machine 999
> > > > > > > > > [D 05/15/2015 21:08:37] Initialization completed successfully.
> > > > > > > > > [D 05/15/2015 21:08:37] server_state_machine_alloc_noreq 27
> > > > > > > > > [D 05/15/2015 21:08:37] server_op_state_get_machine 27
> > > > > > > > > [D 05/15/2015 21:08:37] server_state_machine_start_noreq 0x1d6fa10
> > > > > > > > > [D 05/15/2015 21:08:37] *** Trove KeyVal Read of /dda
> > > > > > > > > [D 05/15/2015 21:08:37] op_queue add: 0x1d71100
> > > > > > > > > [D 05/15/2015 21:08:37] [DBPF THREAD]: [KEYVAL -1]: -7
> > > > > > > > > [D 05/15/2015 21:08:37] [DBPF THREAD]: STARTING TROVE SERVICE ROUTINE (KEYVAL_READ)
> > > > > > > > > [D 05/15/2015 21:08:37] warning: keyval read error on handle 1048576 and key= /dda (BDB0073 DB_NOTFOUND: No matching key/data pair found)
> > > > > > > > > [D 05/15/2015 21:08:37] [DBPF THREAD]: FINISHED TROVE SERVICE ROUTINE (KEYVAL_READ) (ret: -1073742082)
> > > > > > > > > [D 05/15/2015 21:08:37] op_queue add: 0x1d71100
> > > > > > > > > [D 05/15/2015 21:08:37] server_state_machine_alloc_noreq 46
> > > > > > > > > [D 05/15/2015 21:08:37] server_op_state_get_machine 46
> > > > > > > > > [D 05/15/2015 21:08:37] server_state_machine_start_noreq 0x1d70f80
> > > > > > > > > [D 05/15/2015 21:08:37] mgmt-create-root-dir: Init dist-dir-attr for dir meta handle 1048576 with tree_height=1, num_servers=2, bitmap_size=1, split_size=100, server_no=0 and branch_level=1
> > > > > > > > > [D 05/15/2015 21:08:37] mgmt-create-root-dir: Init dist_dir_bitmap as:
> > > > > > > > > [D 05/15/2015 21:08:37] i=0 : 00 00 00 03
> > > > > > > > > [D 05/15/2015 21:08:37]
> > > > > > > > > [D 05/15/2015 21:08:37] creating 1 local dirdata files
> > > > > > > > > [D 05/15/2015 21:08:37] creating 1 remote dirdata files
> > > > > > > > > [D 05/15/2015 21:08:37] job_precreate_pool_get_handles: requesting 1 handles of type 16
> > > > > > > > > [E 05/15/2015 21:08:37] Warning: unable to create root dir due to error: Invalid argument
> > > > > > > > > [E 05/15/2015 21:08:37] Your FS may be in an inconsistent state
> > > > > > > > > [D 05/15/2015 21:08:37] server_state_machine_complete_noreq: 0x1d70f80
> > > > > > > > > [D 05/15/2015 21:08:37] server_state_machine_terminate 0x1d70f80
> > > > > > > > > [E 05/15/2015 21:08:43] PVFS2 server got signal 15 (server_status_flag: 4177919)
> > > > > > > > > [D 05/15/2015 21:08:43] server_state_machine_terminate 0x1d2e970
> > > > > > > > >
> > > > > > > > > Hope this helps.
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > >
> > > > > > > > > Juan
> > > > > > > > >
> > > > > > > > > On 15/05/15 at 22:13, Becky Ligon wrote:
> > > > > > > > > > Juan:
> > > > > > > > > >
> > > > > > > > > > You may have hit upon another problem that we've
> > > > > > > > > > encountered where the splitting of directories goes
> > > > > > > > > > into a race condition. Try this:
> > > > > > > > > >
> > > > > > > > > > 1. In your orangefs-server.conf file, set
> > > > > > > > > >    DistrDirServersInitial 1 and DistrDirServersMax 1
> > > > > > > > > >    in your multi-server configuration installation.
> > > > > > > > > >
> > > > > > > > > > 2. Delete your data and metadata areas and recreate.
> > > > > > > > > >    Start your servers.
> > > > > > > > > >
> > > > > > > > > > 3. Run your tests.
> > > > > > > > > >
> > > > > > > > > > See if this helps!
> > > > > > > > > >
> > > > > > > > > > NOTE: We are working on a fix for this problem right
> > > > > > > > > > now but don't have a working solution just yet.
> > > > > > > > > >
> > > > > > > > > > Becky
> > > > > > > > > >
> > > > > > > > > > On Fri, May 15, 2015 at 3:38 PM, Juan PC <[email protected]> wrote:
> > > > > > > > > > > Hi Becky,
> > > > > > > > > > >
> > > > > > > > > > > Thank you for your response :-)
> > > > > > > > > > >
> > > > > > > > > > > The problem is that the log file grows at a rate of
> > > > > > > > > > > around 2 MiB per second (EventLogging is set to
> > > > > > > > > > > none!) and, more importantly, a simple pvfs2-ls
> > > > > > > > > > > does not work. The latter is probably due to an
> > > > > > > > > > > error message that I get after starting the server
> > > > > > > > > > > that stores the root file system:
> > > > > > > > > > >
> > > > > > > > > > > [E 05/15/2015 18:38:08] Warning: unable to create root dir due to error: Resource temporarily unavailable
> > > > > > > > > > > [E 05/15/2015 18:38:08] Your FS may be in an inconsistent state
> > > > > > > > > > >
> > > > > > > > > > > although the batch_create errors appear afterwards,
> > > > > > > > > > > when a second server is run.
> > > > > > > > > > >
> > > > > > > > > > > I have spent a lot of time trying different
> > > > > > > > > > > compilation options, configurations, db versions,
> > > > > > > > > > > checking that I run the right executables, that
> > > > > > > > > > > they use the same filesystem configuration file,
> > > > > > > > > > > etc., and the result is always the same. Well, to
> > > > > > > > > > > be honest, I was able to activate the file system
> > > > > > > > > > > once (I do not know how), but it started failing
> > > > > > > > > > > when I tried to create a few thousand files per
> > > > > > > > > > > directory (benchmark hpcs-io_1.2.0-rc1, scenarios
> > > > > > > > > > > 9-12).
> > > > > > > > > > >
> > > > > > > > > > > My feeling is that, with two servers, the
> > > > > > > > > > > problematic server (the one meant to store the root
> > > > > > > > > > > directory) does not communicate correctly with the
> > > > > > > > > > > second server. There is no firewall, SELinux is
> > > > > > > > > > > disabled, etc.
> > > > > > > > > > >
> > > > > > > > > > > Some final remarks:
> > > > > > > > > > > - Security is always the default one; I have not
> > > > > > > > > > >   used either the --enable-security-key or the
> > > > > > > > > > >   --enable-security-cert option.
> > > > > > > > > > > - Same steps with OrangeFS 2.8.7 and no problem at
> > > > > > > > > > >   all.
> > > > > > > > > > >
> > > > > > > > > > > So I guess that I must be doing something terribly
> > > > > > > > > > > wrong, but I do not know what :-(
> > > > > > > > > > >
> > > > > > > > > > > If I can do something (for instance, running the
> > > > > > > > > > > servers with EventLogging set to verbose), just let
> > > > > > > > > > > me know.
> > > > > > > > > > >
> > > > > > > > > > > Regards,
> > > > > > > > > > >
> > > > > > > > > > > Juan
> > > > > > > > > > >
> > > > > > > > > > > On 15/05/15 at 20:12, Becky Ligon wrote:
> > > > > > > > > > > > This is normal for 2.9.1 and okay to get the
> > > > > > > > > > > > messages you are seeing. batch_create comes into
> > > > > > > > > > > > play when a server needs to gather more handles
> > > > > > > > > > > > (like inodes) from another server. The "Resource
> > > > > > > > > > > > temporarily unavailable" is generated when the
> > > > > > > > > > > > capability associated with this request has timed
> > > > > > > > > > > > out. So, the calling server regenerates the
> > > > > > > > > > > > capability and resends the batch_create request.
> > > > > > > > > > > >
> > > > > > > > > > > > The OFS development team is changing when these
> > > > > > > > > > > > capabilities get generated for batch_create
> > > > > > > > > > > > requests to alleviate this problem. For now, you
> > > > > > > > > > > > can ignore these messages.
> > > > > > > > > > > >
> > > > > > > > > > > > Sorry for the inconvenience.
> > > > > > > > > > > >
> > > > > > > > > > > > Becky
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, May 15, 2015 at 11:48 AM, Juan PC <[email protected]> wrote:
> > > > > > > > > > > > > Dear Becky,
> > > > > > > > > > > > >
> > > > > > > > > > > > > I am trying to use orangefs-2.9.1, but every
> > > > > > > > > > > > > time I run the servers I get the message of the
> > > > > > > > > > > > > subject in one of the servers, and its log file
> > > > > > > > > > > > > grows very quickly. The last reference that I
> > > > > > > > > > > > > have seen about this problem is
> > > > > > > > > > > > > http://www.beowulf-underground.org/pipermail/pvfs2-users/2015-April/004432.html .
> > > > > > > > > > > > > I have used option --disable-capcache of
> > > > > > > > > > > > > configure, but same result. Do you know if this
> > > > > > > > > > > > > issue has already been fixed or if there is a
> > > > > > > > > > > > > workaround?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best regards,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Juan
> > > > > > >
> > > > > > > ----------------------------------------------------------------
> > > > > > > This message was sent using IMP, the Internet Messaging Program.
> > > >
> > > > --
> > > > Sent from Gmail Mobile
> > >
> > > --
> > > D. Juan Piernas Cánovas
> > > Departamento de Ingeniería y Tecnología de Computadores
> > > Facultad de Informática. Universidad de Murcia
> > > Campus de Espinardo - 30080 Murcia (SPAIN)
> > > Tel.: +34868887657 Fax: +34868884151
> > > email: [email protected]
> > > PGP public key:
> > > http://pgp.rediris.es:11371/pks/lookup?search=piernas%40ditec.um.es&op=index
> > >
> > > *** Please send me your documents in text, HTML, PDF or
> > > PostScript format :-) ***
> > > _______________________________________________
> > > Pvfs2-users mailing list
> > > [email protected]
> > > http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
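
For readers following this thread: the time-drift effect discussed above can be illustrated with a small Python sketch. This is not OrangeFS code. The Capability class, the issue, is_valid and demo functions, and the 600-second lifetime are all hypothetical names and values chosen for illustration only. The sketch assumes a capability carries an expiry timestamp taken from the issuing node's clock and that the receiving node compares it against its own clock, which is one plausible reading of "time dependent" in the messages above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    """Hypothetical time-limited token; not the real OrangeFS structure."""
    handle: int
    expires_at: float  # seconds, measured on the issuing node's clock


def issue(handle: int, issuer_clock: float, lifetime: float) -> Capability:
    # The issuer stamps the expiry using its own notion of "now".
    return Capability(handle=handle, expires_at=issuer_clock + lifetime)


def is_valid(cap: Capability, verifier_clock: float) -> bool:
    # The verifier compares the expiry against its own clock, so any
    # offset between the two clocks shifts the effective lifetime.
    return verifier_clock < cap.expires_at


def demo(skew_seconds: float, lifetime: float = 600.0) -> bool:
    """Return True if a freshly issued capability is accepted by a
    verifier whose clock is skew_seconds ahead of the issuer's clock.
    The 600 s lifetime is an assumed value, not an OrangeFS default."""
    issuer_now = 1_000_000.0  # arbitrary reference time
    cap = issue(1048576, issuer_now, lifetime)
    return is_valid(cap, issuer_now + skew_seconds)


if __name__ == "__main__":
    print("clocks in sync:         ", demo(0.0))
    print("verifier 2.5 h ahead:   ", demo(2.5 * 3600))
    print("verifier 2.5 h behind:  ", demo(-2.5 * 3600))
```

Under these assumptions, a verifier whose clock runs two and a half hours ahead rejects a capability the moment it is issued, while a verifier running behind accepts it for far longer than intended. Keeping all server clocks synchronized (for example with NTP) avoids both cases.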
