Jones de Andrade wrote:
Hi Justin.
Well, sorry to bother you again. Good news and bad news.
The good news: I found a strange "workaround" for my problems here. For
some reason, the perl script resets the path, environment and
everything else when it runs, so the variables I set in the script I
was using were simply lost. The workaround was to put them in the
.tcshrc file instead and log in again.
Neither perl nor gmxtest.pl "update" the environment variables. That
will be determined by the shell from which you call your script (and/or
your queueing system), and what your script does. See my earlier reply
for how to set the PATH correctly.
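For example, in the shell you launch the tests from (a sketch for tcsh, with placeholder install paths):
*****************
# tcsh: put the MPI and GROMACS bin directories on the PATH of this
# shell; gmxtest.pl then simply inherits it (paths are placeholders)
setenv PATH /opt/openmpi/bin:/opt/gromacs-4.0.4/bin:${PATH}
./gmxtest.pl -np 4 all
*****************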
The problem is that this is not practical. I'm trying out a lot of
different MPI and library builds, and having to edit that file and then
log out/log in or source it every time is not practical at all. Is there
any other way, so that the perl script keeps the variables it has when
it is called, instead of reinitializing them all?
If you have a normal system, the environment variables from your shell
are propagated to your script after being modified by your .tcshrc, and
then passed to the perl script. You may get some ideas here
http://hell.org.ua/Docs/oreilly/perl2/prog/ch19_02.htm
If you haven't got a normal system, finding out how it works is not
really a problem for the GROMACS mailing list :-)
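One practical pattern, instead of editing .tcshrc for every combination: keep a small environment file per MPI/library build and source it in the shell before running the tests (a sketch; the file name and paths are made up):
*****************
# env-openmpi.csh -- hypothetical per-build environment (tcsh syntax)
setenv MPI_HOME /opt/openmpi-1.3
setenv PATH ${MPI_HOME}/bin:${PATH}
setenv LD_LIBRARY_PATH ${MPI_HOME}/lib

# then, in the shell that will run the tests:
#   source env-openmpi.csh
#   ./gmxtest.pl -np 4 all
*****************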
Mark
Second, here comes the real bad news: lots of errors.
Without MPI, in single precision, 4 complex and 16 kernel tests fail.
Without MPI, but in double precision, "just" the 16 kernel tests fail.
With MPI, in single precision, 1 simple, 9 complex and 16 kernel tests
fail!
And with MPI in double precision, 1 simple, 7 complex and 16 kernel
tests fail. :P
Edit: just received your message. Well, it seems I made a mistake in my
script, but since at least part of the tests worked, it means that the
MPI, at least, is not misconfigured.
I will look deeper into the errors above and report back later.
Thanks a lot,
Jones
On Mon, May 11, 2009 at 9:41 PM, Jones de Andrade <johanne...@gmail.com> wrote:
Hi Justin.
Thanks a lot for that. It helped, but not enough yet. :( It just brought
the 4.0.4 tests into the same "range of errors" that I'm getting with
3.3.3. :P
Using Open MPI, it just complains that it can't find orted. That
would mean the paths are not set, BUT they are. :P If I
just run orted from the command line without any arguments:
*****************
gmxtest404 196% orted
[palpatine:28366] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in
file runtime/orte_init.c at line 125
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
orte_ess_base_select failed
--> Returned value Not found (-13) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[palpatine:28366] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in
file orted/orted_main.c at line 323
*****************
So, the shell IS finding the file. But when it runs from the
script instead (I was already suspecting something in the
"if-else-end" stack), all MPI tests fail with the following message
in the mdrun.out file:
**********************
orted: Command not found.
--------------------------------------------------------------------------
A daemon (pid 27972) died unexpectedly with status 1 while attempting
to launch so we are aborting.
There may be more information reported by the environment (see above).
This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to
have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
mpirun: clean termination accomplished
**********************
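If I read the hint in that message right, the workaround would be something like this (placeholder paths and binary name; I'd double-check the flags against "mpirun --help" for this Open MPI version):
**********************
# tell mpirun where Open MPI lives, so it can find orted itself
mpirun --prefix /opt/openmpi -np 4 mdrun_mpi -s topol.tpr

# or forward the caller's PATH and LD_LIBRARY_PATH to the daemons
mpirun -x PATH -x LD_LIBRARY_PATH -np 4 mdrun_mpi -s topol.tpr
**********************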
Still, what is going on? The next thing I'm thinking of doing is
executing the full command line from one of the tests directly, to see
whether it works... :( :P
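Something like this, run from inside one of the failing test directories (the directory and binary names below are just placeholders for whatever my build produced):
**********************
cd complex/argon         # hypothetical example test directory
which orted              # confirm the daemon is visible from here
grompp -maxwarn 10       # build the .tpr, no parallel flag needed
mpirun -np 4 mdrun_mpi -s topol.tpr
**********************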
Now I'm absolutely lost. Any ideas, please?
Thanks a lot,
Jones
On Mon, May 11, 2009 at 9:07 PM, Justin A. Lemkul <jalem...@vt.edu> wrote:
Justin A. Lemkul wrote:
Jones de Andrade wrote:
Hi Justin
This has been discussed several times on the list. The -np flag is
no longer necessary with grompp. You don't get an mdrun.out because
the .tpr file is likely never created, since grompp fails.
Yes, I know that, and that is what I would have expected. But what
I'm running is the gmxtest.pl script. Even in the 4.0.4 version, it
explicitly states that I must use "-np N" on its own command line to
make parallel runs work:
************
gmxtest.pl
Usage: ./gmxtest.pl [ -np N ] [ -verbose ] [ -double ] [ simple | complex | kernel | pdb2gmx | all ]
or:    ./gmxtest.pl clean | refclean | dist
************
I would expect the script to use it only for mdrun and not for
grompp, but it seems to use it for both. What is really strange is
that the testbed itself does work. So, does gmxtest.pl have a bug in
4.0.4? Or how should I tell gmxtest.pl to run the tests on a growing
number of cores?
Ah, sorry for the misread :) There is a simple fix that
you can apply to the gmxtest.pl script:
% diff gmxtest.pl gmxtest_orig.pl
161c161
< system("$grompp -maxwarn 10 $ndx > grompp.out 2>&1");
---
> system("$grompp -maxwarn 10 $ndx $par > grompp.out 2>&1");
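With that change, grompp no longer receives the parallel flag, and the tests can then be run as the usage line advertises, e.g.:
************
./gmxtest.pl -np 4 complex
./gmxtest.pl -np 4 -double all
************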
-Justin
Version 3.3.3, on the other hand, already failed in so many
different places that I'm really wondering IF I'll make it
available on the new cluster. :P
What messages are you getting from 3.3.3? I thought you said the
3.3.x series worked fine.
I'll log in and try to get a reproducible error here. ;) As soon as
I have one, I'll post back in this thread.
Thanks a lot again,
Jones
--
========================================
Justin A. Lemkul
Ph.D. Candidate
ICTAS Doctoral Scholar
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at]vt.edu | (540) 231-9080
http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin
========================================