Wondering if people have seen this one before:

I'm working on an mpiBLAST installation on a 40-node cluster, and I'm encountering an intermittent error related to shared directories. I'm asking the mpiBLAST community because all other jobs (including large parallel and batch jobs) work fine. There are enough pieces involved that isolating the problem is proving to be a challenge.

* mpiBLAST is integrated with an SGE parallel environment.
* All shared directories are served via NFS, re-exported from an Apple Xsan.
* All blast target files are cached to local directories on the nodes.
* User authentication is via LDAP.

The problem: jobs fail intermittently because the startup directory cannot be accessed. When I resubmit the same job, it sometimes works and sometimes does not. Errors look like this:

Can't start from current directory: No such file or directory
sh: -c: line 1: unexpected EOF while looking for matching `''
sh: -c: line 4: syntax error: unexpected end of file

This persists even if I insert "sleep" or "while (!(-e /the/appropriate/directory)) {sleep;}" in my mpiblast submission script. In fact, if I check whether the directory exists and pause the script there, I can log into the node where the master task is running and examine the directory myself.
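
For reference, the existence check described above could be written with a timeout and per-host logging, so that it can be run on every node in the job and not only where the master task starts. This is a minimal diagnostic sketch, not a fix: the function names `wait_for_directory` and `describe_cwd` are hypothetical, and the timeout and interval values are arbitrary assumptions.

```python
import os
import socket
import sys
import time


def wait_for_directory(path, timeout=60.0, interval=2.0):
    """Poll until `path` is a listable directory or `timeout` seconds pass.

    Returns True if the directory became usable, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # os.listdir forces a real read of the directory, which is a
            # slightly stronger check than os.path.isdir alone.
            if os.path.isdir(path):
                os.listdir(path)
                return True
        except OSError as exc:
            # Log which host saw the failure, to help spot node-specific issues.
            print(f"{socket.gethostname()}: {path}: {exc}", file=sys.stderr)
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def describe_cwd():
    """Return the current working directory, or the error getcwd raised."""
    try:
        return os.getcwd()
    except OSError as exc:
        return f"getcwd failed: {exc}"
```

Calling `wait_for_directory("/the/appropriate/directory")` and printing `describe_cwd()` alongside the hostname at the start of each task would show whether the directory is missing on some nodes only, or whether it is visible everywhere and the job still fails.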

It is, however, intermittent. Sometimes jobs work fine. Occasionally a job will complete, but I get messages like this on STDERR:

shell-init: could not get current directory: getcwd: cannot access parent directories: No such file or directory

Any advice would be appreciated. I'll also be asking the SGE and Xsan user lists for help.

-Chris Dwan


_______________________________________________
Mpiblast-users mailing list
https://lists.sourceforge.net/lists/listinfo/mpiblast-users