Hi Esteban,
OK, that makes more sense. You had mentioned "globusrun", not
"globusrun-ws", in your original email; globusrun-ws is correct. Also,
while the 'sge.pm' file is used for job submission in GT4/WS, its
"poll()" function is not used. In WS, the scheduler event generator
process reads SGE's reporting file for all changes to job state
information. You should see the process running via ps as:
path-to-globus/globus/libexec/globus-scheduler-event-generator -s sge -t 1213124041
where '-t xxxxxxxxxx' is the timestamp of when it was last stopped. There's
a similar one running for 'fork' as well.
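For example, a small shell sketch that lists the running event generators and decodes each '-t' value into a readable UTC date. The function names seg_last_stop and list_segs are hypothetical (not part of Globus), and GNU date's "-d @SECONDS" form is assumed.

```shell
# Hypothetical helpers, not part of Globus. Assumes GNU date ("-d @N").

# Print the -t value from one SEG command line, followed by that
# instant as a UTC date. Returns 1 if no "-t <digits>" is present.
seg_last_stop() {
    ts=$(printf '%s\n' "$1" |
        sed -n 's/.*-t[[:space:]]\{1,\}\([0-9]\{1,\}\).*/\1/p')
    [ -n "$ts" ] || return 1
    printf '%s %s\n' "$ts" "$(TZ=UTC date -d "@$ts" '+%Y-%m-%d %H:%M:%S')"
}

# Decode the timestamp of every running SEG (expect one for sge and one
# for fork). The "[g]" pattern keeps grep from matching its own process.
list_segs() {
    ps -eo args | grep '[g]lobus-scheduler-event-generator' |
        while read -r line; do
            seg_last_stop "$line"
        done
}
```

For the example command line above, seg_last_stop prints "1213124041 2008-06-10 18:54:01", i.e. the point in the reporting file from which the SEG resumes scanning.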
So, based on your symptoms, it seems to me that either the SGE job ID
isn't being registered correctly in the jobmanager, or the
scheduler-event-generator is having problems processing the reporting
file. I've not built directly from the LeSC distribution before. I've
been reviewing it recently and may try to replicate your problem this week.
A couple of things you might try:
- make sure that the SGE data is getting into the reporting file, e.g.
run tail on the reporting file while you do normal SGE qsubs, and watch
the job appear and cycle through its stages.
- run the globus-scheduler-event-generator by hand (cut & paste from
your ps output as above), but with a debugging flag set in your shell
environment: export SEG_SGE_DEBUG=15. For this to work you probably
need to have built the 'dbg' libs. This will produce debug output as the
scheduler-event-generator scans the reporting file.
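As a rough sketch of the first suggestion, the hypothetical helpers below pull the job_log events for one job number out of the reporting file. They assume the reporting(5) layout in which each line begins "time:record_type:" and a job_log record continues "event_time:event:job_number:...", and that the file lives at $SGE_ROOT/$SGE_CELL/common/reporting; check both against your SGE version.

```shell
# Hypothetical helpers. Assumed job_log layout (colon-separated):
#   time:job_log:event_time:event:job_number:task_number:...
# so field 4 is the event name and field 5 is the job number.
reporting_file() {
    printf '%s\n' "${SGE_ROOT:?set SGE_ROOT}/${SGE_CELL:-default}/common/reporting"
}

# Print the event name of every job_log record for one job number.
# Usage: job_events JOBID [REPORTING_FILE]
job_events() {
    awk -F: -v id="$1" '$2 == "job_log" && $5 == id { print $4 }' \
        "${2:-$(reporting_file)}"
}

# Follow the file live, printing "time event" for each new record.
# fflush() keeps output flowing through the pipe as records arrive.
watch_job() {
    tail -n 0 -f "${2:-$(reporting_file)}" |
        awk -F: -v id="$1" '$2 == "job_log" && $5 == id { print $1, $4; fflush() }'
}
```

For instance, run watch_job 42 in one terminal while submitting with qsub in another (42 stands in for whatever job number qsub reports). If no job_log lines ever show up, the problem is on the SGE side rather than in the event generator.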
thanks, Jeff
Esteban Freire wrote:
Hi Jeff,
Thanks for answering me. I think I have misunderstood something. Yes,
I'm interested in deploying GT4 + SGE, and I have actually
compiled/installed Globus-4.0.7. I mentioned "turning on SGE's
reporting file", the globusrun-ws command and the poll function (sge.pm
file) because I'm following the Globus + SGE integration from
http://www.lesc.ic.ac.uk/projects/SGE-GT4.html. Should I use
another command to submit the jobs instead of globusrun-ws, or should I
follow a different Globus + SGE integration tutorial?
I would appreciate your help because I'm a bit lost right
now.
Thanks,
Esteban
Jeff Porter wrote:
Hi Esteban,
I am a bit confused. You mention turning on SGE's reporting file,
which is indeed needed for GT4 (WS GRAM), but then you discuss running
with "globusrun" and looking at the "poll()" function in the sge.pm
file. Both of those are part of the GT2 (pre-WS) framework. From our
other discussions, you were interested in deploying GT4. Is that what
you're trying to do, i.e. run with globusrun-ws?
Thanks, Jeff
Esteban Freire wrote:
Hello all,
I'm trying to integrate SGE (GE 6.1u3) with Globus
(globus-4.0.7), but I still have the same old problem that I had in
previous attempts. I'm using the Globus + SGE integration provided
by LeSC, http://www.lesc.ic.ac.uk/projects/SGE-GT4.html
I can submit jobs with Fork correctly, and I can submit jobs with
*qsub* correctly too. I have also enabled *reporting_params
reporting=true*, and the reporting file is accessible to globus.
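One way to double-check that setting is sketched below: a hypothetical reporting_ok filter that reads "qconf -sconf" output and succeeds only if reporting_params contains both reporting=true and joblog=true. Requiring joblog=true is an assumption here, on the basis that job state changes are written as job_log records, which SGE only emits when joblog is enabled. The filter also joins lines that qconf may wrap with a trailing backslash.

```shell
# Hypothetical check, not part of SGE or Globus. Reads "qconf -sconf"
# output on stdin; exits 0 only if reporting_params has both
# reporting=true and joblog=true.
reporting_ok() {
    awk '
        # Join continuation lines that end with a backslash.
        /\\$/ { buf = buf substr($0, 1, length($0) - 1) " "; next }
        {
            buf = buf $0
            n = split(buf, f)
            if (n > 0 && f[1] == "reporting_params")
                for (i = 2; i <= n; i++) {
                    split(f[i], kv, "=")
                    p[kv[1]] = tolower(kv[2])
                }
            buf = ""
        }
        END { exit !(p["reporting"] == "true" && p["joblog"] == "true") }
    '
}
```

Usage: qconf -sconf | reporting_ok && echo "reporting and joblog enabled"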
I attach to this e-mail the outputs that I consider most
important. I submit the job to SGE with the *globusrun* command; the job
starts executing under SGE correctly and finishes successfully
(according to SGE). The files *.stdout and *.stderr are generated
correctly in the user's home directory, and the *.stdout file contains the
correct output for the job, but for some reason the jobManager doesn't see
that the job has finished, and it remains in *Current job state:
Unsubmitted* and never finishes until I press [ctrl + c].
I have been looking at
*/usr/local/globus-4.0.7/lib/perl/Globus/GRAM/JobManager/sge.pm*,
in particular the *sub poll* function, which checks whether the job has
finished using the command qstat -j. When debugging, I see that it never
runs the qstat: it executes the qsub correctly and gets the jobID, but
then it stops at some step I can't identify and never calls the poll function.
On the other hand, we have configured 'sge_qstat' so that it isn't
necessary to run qstat -u '*' to see all the running/queued
jobs; therefore, the difference from previous versions of
SGE is minimal.
[EMAIL PROTECTED] ~]$ cat /usr/local/sge/pro/default/common/sge_qstat
-u *
I would appreciate any help, and comments are welcome.
Thanks in advance,
Esteban