Hi Charles,

Thanks for the reply. I know it must be hard to tell how the hosts are set up. However, you are the only one helping, so I very much appreciate it.

1. globus-job-run with jobmanager-fork or jobmanager-condor does not work.

[EMAIL PROTECTED] ~]$ globus-job-run grid2.ramscommunity.org/jobmanager-fork /bin/hostname
GRAM Job submission failed because data transfer to the server failed (error code 10)

[EMAIL PROTECTED] ~]$ globus-job-run grid2.ramscommunity.org/jobmanager-condor /bin/hostname
GRAM Job submission failed because data transfer to the server failed (error code 10)

As I said, the env has this:

$ env | grep GLOBUS
GLOBUS_PATH=/usr/local/globus
GLOBUS_LOCATION=/usr/local/globus
GLOBUS_TCP_PORT_RANGE=40000,41000

2. I saved a GRAM job submit script some time ago.

[EMAIL PROTECTED] ~]$ cat scheduler_condor_submit_script
#
# description file for condor submission
#
Universe = vanilla
Notification = Never
Executable = /bin/hostname
Requirements = OpSys == "LINUX" && Arch == "INTEL"
Environment = X509_USER_PROXY=/home/yoichi/.globus/job/grid2.ramscommunity.org/21429.1224637416/x509_up;GLOBUS_LOCATION=/usr/local/globus;GLOBUS_GRAM_JOB_CONTACT=https://grid2.ramscommunity.org:40001/21429/1224637416/;GLOBUS_GRAM_MYJOB_CONTACT=URLx-nexus://grid2.ramscommunity.org:40002/;HOME=/home/yoichi;LOGNAME=yoichi;LD_LIBRARY_PATH=
Arguments =
InitialDir = /home/yoichi
Input = /dev/null
Log = /usr/local/globus/var/globus-condor.log
log_xml = True
#Extra attributes specified by client
Output = /home/yoichi/.globus/job/grid2.ramscommunity.org/21429.1224637416/stdout
Error = /home/yoichi/.globus/job/grid2.ramscommunity.org/21429.1224637416/stderr
queue 1

3. When I submit this manually, it works (although the Globus connections mentioned there are gone; I am not sure how Condor and Globus talk to each other).

$ condor_submit scheduler_condor_submit_script
Submitting job(s)
ERROR: Can't open "/home/yoichi/.globus/job/grid2.ramscommunity.org/21429.1224637416/stdout" with flags 01101 (No such file or directory)

Manually creating the directory:

[EMAIL PROTECTED] ~]$ mkdir -p /home/yoichi/.globus/job/grid2.ramscommunity.org/21429.1224637416/
[EMAIL PROTECTED] ~]$ touch /home/yoichi/.globus/job/grid2.ramscommunity.org/21429.1224637416/stdout

[EMAIL PROTECTED] ~]$ condor_submit scheduler_condor_submit_script
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 5.

I get the results, and condor_submit also created the stderr file there correctly, without an error.

After 5 min or so, I got the results:

$ cat /home/yoichi/.globus/job/grid2.ramscommunity.org/21429.1224637416/stdout
grid4.ramscommunity.org

4. Condor is already set up with LOWPORT and HIGHPORT and, as far as I can tell, all the ports are within this range and the firewall is fully open for this range.

5. Although globus-job-run reports the errors, the job was submitted and ran on the Condor pool OK.

The only things that fail are reporting the job ID back to me when the job is first submitted (I suspect that it cannot write to /dev/stdout) and writing out the results when they are returned by Condor. The results have been received by Globus, but it seems that it cannot write them to /home/yoichi/.globus/job/grid2.ramscommunity.org/21429.1224637416/stdout

6. The only thing I can think of is that DNS reverse lookup does not work in this subnet. The university allows us to set up DNS forward lookup in this subnet with the different sub-domain names we have purchased, but has not delegated the reverse lookup to us. They do it themselves, but only for the formal university DNS names, which are different from the subnet DNS names we are using. At present, the reverse lookups for these hosts are not set.

However, the DNS/IP pairs are defined in /etc/hosts on all these hosts, and most things work by referring to these files, e.g. Kerberos, NFS and LDAP. All tests PASS. Also, all tests for Condor are OK.
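
The forward/reverse lookup suspicion in point 6 could be checked on each host with a few lines of Python. This is only a sketch under assumptions: the function name check_forward_reverse is made up for illustration, and the standard socket calls it uses go through the system resolver, which on a typical Linux host consults /etc/hosts as well as DNS. A result of "consistent" here therefore does not prove that DNS alone would answer the same way.

```python
import socket


def check_forward_reverse(hostname,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Resolve hostname to an IP, then resolve that IP back to a name.

    Returns (ip, reverse_name, consistent). reverse_name is None when the
    reverse lookup fails. The forward/reverse parameters exist only so the
    lookups can be replaced in tests; they are not part of any Globus API.
    """
    ip = forward(hostname)
    try:
        # gethostbyaddr returns (primary_name, alias_list, ip_list).
        name, aliases, _ = reverse(ip)
    except (socket.herror, socket.gaierror):
        return ip, None, False

    # Compare case-insensitively and ignore a trailing dot.
    wanted = hostname.rstrip(".").lower()
    known = {n.rstrip(".").lower() for n in [name, *aliases]}
    return ip, name, wanted in known
```

Calling check_forward_reverse("grid2.ramscommunity.org") and check_forward_reverse("grid4.ramscommunity.org") from both the submit host and the gatekeeper would show whether each side maps the other's address back to the name used in the job contact.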

Pegasus works fine as well, except that it cannot get the final results back. That is why I started testing with globus-job-run, to see whether the fundamental things are working with Globus or not.
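
The claim in point 4 that all ports are within the range can be checked mechanically for the Globus side. The sketch below assumes the "min,max" form of GLOBUS_TCP_PORT_RANGE shown in the env output above, and the helper names parse_port_range and ports_outside_range are hypothetical. It only inspects the ports that appear in contact URLs such as GLOBUS_GRAM_JOB_CONTACT; it says nothing about Condor's own LOWPORT/HIGHPORT settings.

```python
from urllib.parse import urlsplit


def parse_port_range(value):
    """Parse a 'min,max' string such as '40000,41000' into (low, high)."""
    low, high = (int(part) for part in value.split(","))
    if low > high:
        raise ValueError(f"invalid port range: {value!r}")
    return low, high


def ports_outside_range(contacts, port_range):
    """Return (contact, port) pairs whose port is missing or out of range."""
    low, high = parse_port_range(port_range)
    outside = []
    for contact in contacts:
        port = urlsplit(contact).port  # None when the URL carries no port
        if port is None or not (low <= port <= high):
            outside.append((contact, port))
    return outside
```

With the two contact strings from the saved submit script (ports 40001 and 40002) and the range 40000,41000, ports_outside_range returns an empty list, which matches the observation that the contacts themselves stay inside the configured range.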

I have exhausted all the options I can think of for what may be wrong. As long as I had some left, I could go and fix them and see if that helped. :-(

Thanks,
Yoichi

--------------------------------------------------------------------------
Yoichi Takayama, PhD
Senior Research Fellow
RAMP Project
MELCOE (Macquarie E-Learning Centre of Excellence)
MACQUARIE UNIVERSITY

Phone: +61 (0)2 9850 9073
Fax: +61 (0)2 9850 6527
www.mq.edu.au
www.melcoe.mq.edu.au/projects/RAMP/
--------------------------------------------------------------------------
MACQUARIE UNIVERSITY: CRICOS Provider No 00002J

This message is intended for the addressee named and may contain confidential information. If you are not the intended recipient, please delete it and notify the sender. Views expressed in this message are those of the individual sender, and are not necessarily the views of Macquarie E-Learning Centre Of Excellence (MELCOE) or Macquarie University.

On 28/10/2008, at 1:31 AM, Charles Bacon wrote:

Yes, I meant the globus-job-run user having it set in the environment.

There are so many moving parts in your pegasus setup, I'm getting a little lost. If I were debugging this, here's what I would do:

1) Work with globus-job-run and the fork jobmanager until it was working.
2) Then, work with the globus-job-run and the condor jobmanager until it was working
3) If that's not working, I would edit $GLOBUS_LOCATION/lib/perl/Globus/GRAM/JobManager/condor.pm on the gatekeeper node to have it save a copy of the condor script it was submitting, then:
3a) condor_submit that directly on the gatekeeper node to make sure that works. If that doesn't work, the condor pool itself needs to be fixed
3b) If that *does* work already, then I'm confused, but probably suspect that condor needs some lowport/highport settings to work in your firewall environment
4) Only once all that was working would I bother trying to use condor-G to submit to the gatekeeper, then debugging that if it doesn't work
5) As far as I know, GridFTP has nothing to do with your setup, and you can stop testing it. condor-G only uses GridFTP when submitting to GRAM4, as far as I know.

Charles

On Oct 24, 2008, at 8:50 PM, Yoichi Takayama wrote:

Hi

How to define GLOBUS_TCP_PORT_RANGE for "client"

What do you mean by defining the GLOBUS_TCP_PORT_RANGE for the client?

Do you mean Condor-G? Where should I define it? It is defined for GridFTP. Is it enough?

Or, do you mean the end user's environment? The end user has GLOBUS_TCP_PORT_RANGE=40000,41000 defined in his/her env.

$ env | grep GLOBUS
GLOBUS_PATH=/usr/local/globus
GLOBUS_LOCATION=/usr/local/globus
GLOBUS_TCP_PORT_RANGE=40000,41000

Or, do you mean the linux condor user?

The condor user is defined as a normal user rather than a system user, so it gets the env, too. Is that enough?

Or, is that a globus linux user on the submit node???
It is also defined as a normal user rather than a system user, so it also gets the env.

Thanks,
Yoichi
