Hi Johann,

I checked my old workflow and found that I had to make some changes to get it working. The jobSubmitOptions setting that works for me right now is "-ac SGE_CELL=hoffman2,SGE_ROOT=/u/systems/UGE8.0.1 -l h_data=1G". Since different clusters have different SGE installations, you need to find the right values for your cluster. But it looks like "-ac" is now needed for SGE_CELL and SGE_ROOT.
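
To make the difference concrete, here is a minimal Java sketch of how the option string ends up in the submit command. The command layout ("cd <dir>; <binPath>/qsub <options> <script>") is only inferred from the error log quoted further down in this thread, not taken from the Kepler source. The class name, the method name and the working directory "/home/user/work" are made up for illustration, and the "-ac" string is simply the setting above adapted to the paths from that log.

```java
public class QsubCommandSketch {

    /**
     * Assemble a submit command in the layout seen in the error log:
     * "cd <workDir>; <binPath>/qsub <options> <workDir>/<script>".
     * Assumption: the options string is inserted verbatim after "qsub".
     */
    public static String buildSubmitCommand(String workDir, String binPath,
                                            String submitOptions, String script) {
        return "cd " + workDir + "; "
                + binPath + "/qsub " + submitOptions + " "
                + workDir + "/" + script;
    }

    public static void main(String[] args) {
        String workDir = "/home/user/work";          // hypothetical, elided in the log
        String binPath = "/opt/sge/bin/lx24-amd64";  // qsub location from the log
        String script = "sgeTestscript.sh";

        // Options without "-ac": the assignment is passed as a bare argument.
        System.out.println(buildSubmitCommand(workDir, binPath,
                "SGE_ROOT=/opt/sge,SGE_CELL=default", script));

        // Options with "-ac": the assignment becomes the value of an option.
        System.out.println(buildSubmitCommand(workDir, binPath,
                "-ac SGE_CELL=default,SGE_ROOT=/opt/sge", script));
    }
}
```

In the first printed command the assignment is a bare argument right after qsub; in the second it is the value of the "-ac" option. Whether that alone is sufficient may depend on the cluster setup.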

Best wishes

Sincerely yours

Jianwu WANG, Ph.D.
[email protected]
http://users.sdsc.edu/~jianwu/

Assistant Director for Research
Workflows for Data Science (WorDS) Center of Excellence
San Diego Supercomputer Center (SDSC)
University of California, San Diego (UCSD)

On 2/13/15 10:49 AM, Hoeftberger, Johann wrote:
Hello Jianwu,

thank you for your interesting answer, I hadn't noticed those
settings myself. Good hint!

I followed your advice and it got rid of the "qsub: unknown command"
error. But although I set SGE_ROOT and SGE_CELL in the jobSubmitOptions
parameter of the JobSubmitter actor, I still can't run my workflow and
always get the following error/exception:

ERROR (org.kepler.actor.job.JobSubmitter:fire:226)
org.kepler.job.JobException: Error at job submission.
Command:cd [...]/SGE_Testscripts/[...]_Feb13_093517EST_0;
/opt/sge/bin/lx24-amd64/qsub  SGE_ROOT=/opt/sge,SGE_CELL=default
[...]/SGE_Testscripts/[...]_Feb13_093517EST_0/sgeTestscript.sh
Stdout:

Stderr:

Unable to initialize environment because of error: Please set the
environment variable SGE_ROOT.
Exiting.

org.kepler.job.JobException: Error at job submission.
Command:cd [...]/SGE_Testscripts/[...]_Feb13_093517EST_0;
/opt/sge/bin/lx24-amd64/qsub  SGE_ROOT=/opt/sge,SGE_CELL=default
[...]/SGE_Testscripts/[...]_Feb13_093517EST_0/sgeTestscript.sh
Stdout:

Stderr:

Unable to initialize environment because of error: Please set the
environment variable SGE_ROOT.
Exiting.

        at org.kepler.job.JobManager.submit(JobManager.java:307)
        at org.kepler.job.Job.submit(Job.java:375)
        at org.kepler.actor.job.JobSubmitter.fire(JobSubmitter.java:217)
        at ptolemy.actor.process.ProcessThread._iterateActor(ProcessThread.java:335)
        at ptolemy.actor.process.ProcessThread.run(ProcessThread.java:212)


I tried different ways of setting SGE_ROOT: alone, together with
SGE_CELL, in parentheses, separated by commas, separated by semicolons.
All had the same outcome: Kepler "thinks" SGE_ROOT isn't set.

Do you have any further ideas on how I could solve my issue?

Best regards,
Johann


On 02/06/2015 05:20 PM, Jianwu Wang wrote:
Hi Johann,

      Can you try adding the SGE_CELL and SGE_ROOT settings to the 'job
submit options' parameter of the GenericJobLauncher actor or the
'jobSubmitOptions' parameter of the JobSubmitter actor? An example is
"SGE_CELL=hoffman2,SGE_ROOT=/u/systems/SGE6.1u3". You might also need to
set the path to your qsub (such as
"/u/systems/SGE6.2u4/bin/lx26-amd64") in the 'binary path' parameter of
the GenericJobLauncher actor or the 'binPath' parameter of the JobManager
actor. I remember I ran into a similar problem before and this did the
trick.

Best wishes

Sincerely yours

Jianwu WANG, Ph.D.
[email protected]
http://users.sdsc.edu/~jianwu/

Assistant Director for Research
Workflows for Data Science (WorDS) Center of Excellence
San Diego Supercomputer Center (SDSC)
University of California, San Diego (UCSD)

On 2/6/15 3:31 PM, Hoeftberger, Johann wrote:
Hello,

I am trying to create my own simple Kepler test workflow on my local PC
and run it via SSH on an SGE cluster.
For that I took the Kepler demo workflow
"Job_Submission_Using_JobManager", configured it for my local setup,
and tried to run it.
I know that the connection to the cluster, the login with the given
credentials and the settings for the working directory work well. (The
directories and files for the cluster jobs that should be executed get
created.)

The execution of the qsub command on the cluster doesn't work. At first
I got the exception "qsub: unknown command". To locate the error, I then
hardcoded the full path of the qsub command on my SGE cluster in the
implementation, by changing the initialization of the private String
_sgeSubmitCmd in JobSupportSGE.java. After that I got the next
exception: "Unable to initialize environment because of error: Please
set the environment variable SGE_ROOT".

SGE_ROOT is properly set on the SGE cluster. I also tried setting it on
my local PC (afterwards I restarted Eclipse, where my Kepler instance is
running) and in Eclipse under Run - Run Configuration - Java
Application - Environment. None of these attempts solved the problem; I
still get the same exception about the uninitialized environment.

In SshExec.java I found the following documentation for
public int executeCmd(String command, OutputStream streamOut,
OutputStream streamErr, String thirdPartyTarget):

/**
      * Execute a command on the remote machine and expect a password/passphrase
      * question from the command. The stream <i>streamOut</i> should be provided
      * to get the output and errors merged. <i>streamErr</i> is not used in this
      * method (it will be empty string finally).
      *
      * @return exit code of command if execution succeeded,
      * @throws ExecTimeoutException
      *             if the command failed because of timeout
      * @throws SshException
      *             if an error occurs for the ssh connection during the command
      *             execution Note: in this method, the SSH Channel is forcing a
      *             pseudo-terminal allocation {see setPty(true)} to allow remote
      *             commands to read something from their stdin (i.e. from us
      *             here), thus, (1) remote environment is not set from
      *             .bashrc/.cshrc and (2) stdout and stderr come back merged in
      *             one stream.
      */

so I guess my problem is caused by an uninitialized or wrongly
initialized (pseudo-)terminal.
It seems to me that the environment variables set on the systems
involved (SGE cluster, local PC) are not read, and that the Kepler
implementation itself doesn't set proper values for the needed variables.
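
The Javadoc note above says the remote environment is not set from .bashrc/.cshrc. One generic shell technique for that situation is to pass the variables on the command line itself with the standard "env" utility. The Java sketch below only builds such a command string. Whether Kepler's job actors accept a prefixed command like this is not confirmed anywhere in this thread, and the class and method names are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RemoteEnvPrefix {

    /**
     * Prefix a command with "env NAME=value ..." so the command carries its
     * own environment instead of relying on shell startup files.
     * Assumption: values contain no spaces or shell metacharacters, so no
     * quoting is applied here.
     */
    public static String withEnv(Map<String, String> vars, String command) {
        StringBuilder sb = new StringBuilder("env");
        for (Map.Entry<String, String> e : vars.entrySet()) {
            sb.append(' ').append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.append(' ').append(command).toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the variables in insertion order.
        Map<String, String> vars = new LinkedHashMap<>();
        vars.put("SGE_ROOT", "/opt/sge");   // values taken from the error log
        vars.put("SGE_CELL", "default");

        System.out.println(withEnv(vars,
                "/opt/sge/bin/lx24-amd64/qsub sgeTestscript.sh"));
    }
}
```

A command built this way sets SGE_ROOT and SGE_CELL for the qsub process itself, independent of what the remote login shell has loaded.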

I haven't found a description or a configuration option for my issue
yet, so I think it is some kind of implementation flaw.

Can somebody give me a hint on how to solve this issue? I would like to
run Kepler locally on my PC but execute parts of my workflow remotely on
my SGE cluster.


Kind regards,
Johann Hoeftberger







_______________________________________________
Kepler-dev mailing list
[email protected]
http://lists.nceas.ucsb.edu/kepler/mailman/listinfo/kepler-dev
