Wondering if people have seen this one before:
I'm working on an mpiblast installation on a 40 node cluster, and I'm
encountering an intermittent error related to shared directories.
The reason that I'm asking the mpiblast community about it is that
all other jobs (including large parallel and batch jobs) work fine.
There are enough pieces involved that isolating the problem is
proving to be something of a challenge.
* MPIBlast is integrated with an SGE parallel environment.
* All shared directories are served via NFS, re-exported
from an Apple Xsan.
* All blast target files are cached to local directories on the nodes.
* User authentication is via LDAP.
The Problem: Intermittent job failure due to inability to access the
startup directory. When I resubmit the same job, it will sometimes
work and other times not. Errors look like this:
Can't start from current directory: No such file or directory
sh: -c: line 1: unexpected EOF while looking for matching `''
sh: -c: line 4: syntax error: unexpected end of file
This persists even if I insert "sleep" or a wait loop such as
"while (!(-e /the/appropriate/directory)) {sleep;}" in my mpiblast
submission script.
In fact, if I check whether the directory exists and pause the script
there, I can log into the node where the master task is running,
examine the directory, etc.
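
For anyone trying to reproduce this, the existence check described above
could be sketched roughly as follows. This is a minimal POSIX sh sketch,
not the actual submission script: the function name wait_for_dir, the
60-second default timeout, and the log format are all assumptions made for
illustration. It also changes into the directory after the test passes,
since an existence test can succeed while the shell's own working
directory is still unusable.

```shell
#!/bin/sh
# Hypothetical helper (not part of mpiblast or SGE): wait for a shared
# directory to become visible on this node, then cd into it.
#   $1 = directory to wait for
#   $2 = timeout in seconds (assumed default: 60)
wait_for_dir() {
    dir=$1
    timeout=${2:-60}
    waited=0
    while [ ! -d "$dir" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            # Record which node gave up, to help spot a bad host.
            echo "$(uname -n): $dir still missing after ${waited}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    # The -d test passing does not guarantee the cwd is usable,
    # so change into the directory explicitly and report the result.
    cd "$dir" || return 1
    echo "$(uname -n): now in $(pwd) after ${waited}s"
}
```

Calling something like "wait_for_dir /the/appropriate/directory 120 || exit 1"
near the top of the job script would at least leave a per-node record on
STDERR of where the wait timed out.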
It is, however, intermittent. Sometimes jobs work fine.
Occasionally, I will have a job work, but I get these in the STDERR:
shell-init: could not get current directory: getcwd: cannot access
parent directories: No such file or directory
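
Since that message complains about parent directories rather than the
directory itself, one diagnostic that might help is to walk up from the
startup directory and report which ancestor, if any, is not visible on the
node at that moment. The following is only a sketch under that assumption;
check_parents is a hypothetical helper name, and it tests nothing beyond
what "test -d" and "test -x" can see.

```shell
#!/bin/sh
# Hypothetical diagnostic: print the status of a path and each of its
# ancestors up to the root. Returns non-zero if any level is missing
# or not searchable by the current user.
check_parents() {
    p=$1
    status=0
    while :; do
        if [ -d "$p" ] && [ -x "$p" ]; then
            printf '%-8s%s\n' ok "$p"
        else
            printf '%-8s%s\n' MISSING "$p"
            status=1
        fi
        # Stop at the root (or at "." when given a relative path).
        if [ "$p" = "/" ] || [ "$p" = "." ]; then
            break
        fi
        p=$(dirname "$p")
    done
    return $status
}
```

Running "check_parents /the/appropriate/directory" on the node where the
master task lands, at the point where the job fails, would show whether the
whole path or only one intermediate mount point has gone away.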
Any advice would be appreciated. I'll also be asking the SGE and
Xsan user lists for help.
-Chris Dwan
_______________________________________________
Mpiblast-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/mpiblast-users