Dear all,

I could not check this earlier because the machines are heavily used. I have come to the conclusion that something is wrong with GT 4.0.5; please correct me if I am wrong.

We have two types of servers: romeo (IA-64, Globus 4.0.2, Java 1.4.2) and hector (x86-64, Globus 4.0.4, Java 1.6). Both run fine with the old versions of GT. Both have the same entries in their sudoers file:
#Globus GRAM entries
globus ALL=(ALL,!root) NOPASSWD: /opt/globus/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile /opt/globus/libexec/globus-job-manager-script.pl *
globus ALL=(ALL,!root) NOPASSWD: /opt/globus/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile /opt/globus/libexec/globus-gram-local-proxy-tool *

GLOBUS_LOCATION is /opt/globus, which is a symlink. sudo had no problems with the symlink in the older versions.

Now I have compiled and installed GT 4.0.5 into a different directory and changed the symlink to point to the new location. Everything (gsissh, GridFTP, and RFT) works except job submission. Here is what you see on the client side:

[EMAIL PROTECTED]:~> globusrun-ws -submit -S -F https://romeo.urz.tu-dresden.de:8443/wsrf/services/ManagedJobFactoryService -s -c /bin/date
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:2cc1e60a-70be-11dc-b9e1-080069149999
Termination time: 10/03/2007 08:05 GMT
Current job state: Failed
Destroying job...Done.
Cleaning up any delegated credentials...Done.
globusrun-ws: Job failed: Error code: 200
Sudo is misconfigured to run the globus-job-manager-script.pl script for user zimd0022.
[EMAIL PROTECTED]:~>

*The entries in romeo's container log are below:*

2007-10-02 10:05:04,977 DEBUG exec.StateMachine [RunQueueThread_5,runScript:2987] running script submit
2007-10-02 10:05:04,998 DEBUG exec.JobManagerScript [Thread-18,run:208] Executing command: /usr/bin/sudo -H -u zimd0022 -S /opt/globus-4.0.5/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile /opt/globus-4.0.5/libexec/globus-job-manager-script.pl -m fork -f /opt/globus-4.0.5/tmp/gram_job_mgr52269.tmp -c submit
2007-10-02 10:05:05,160 DEBUG exec.JobManagerScript [Thread-18,run:225] first line: null
2007-10-02 10:05:05,161 DEBUG exec.JobManagerScript [Thread-18,run:335] failure message: Sudo is misconfigured to run the globus-job-manager-script.pl script for user zimd0022.
2007-10-02 10:05:05,162 DEBUG exec.JobManagerScript [Thread-18,setDone:345] script is done, setting done flag
2007-10-02 10:05:05,163 DEBUG exec.StateMachine [RunQueueThread_5,processSubmitState:1168] Done waiting for submit script
2007-10-02 10:05:05,164 DEBUG exec.StateMachine [RunQueueThread_5,processSubmitState:1176] script return code: 200
2007-10-02 10:05:05,165 DEBUG exec.StateMachine [RunQueueThread_5,processSubmitState:1181] script return code means error!
2007-10-02 10:05:05,177 DEBUG exec.StateMachine [RunQueueThread_5,createFaultFromErrorCode:3131] Creating fault from error code 200
2007-10-02 10:05:05,177 WARN exec.StateMachine [RunQueueThread_5,createFaultFromErrorCode:3270] Unhandled fault code 200
2007-10-02 10:05:05,178 DEBUG exec.StateMachine [RunQueueThread_5,createFaultFromErrorCode:3271] Offending Script Command: submit
2007-10-02 10:05:05,184 DEBUG utils.FaultUtils [RunQueueThread_5,createFault:422] Script Command: submit
2007-10-02 10:05:05,196 DEBUG utils.FaultUtils [RunQueueThread_5,makeFault:460] Fault Class: class org.globus.exec.generated.FaultType
2007-10-02 10:05:05,196 DEBUG utils.FaultUtils [RunQueueThread_5,makeFault:461] Resource Key: {http://www.globus.org/namespaces/2004/10/gram/job}ResourceID=2e3cb5a0-70be-11dc-b1ea-b1743b772918
2007-10-02 10:05:05,196 DEBUG utils.FaultUtils [RunQueueThread_5,makeFault:462] Description: Error code: 200
2007-10-02 10:05:05,197 DEBUG utils.FaultUtils [RunQueueThread_5,makeFault:463] Cause: java.lang.Exception: Sudo is misconfigured to run the globus-job-manager-script.pl script for user zimd0022.
2007-10-02 10:05:05,197 DEBUG utils.FaultUtils [RunQueueThread_5,makeFault:464] State when failure occurred Unsubmitted
2007-10-02 10:05:05,197 DEBUG utils.FaultUtils [RunQueueThread_5,makeFault:466] Script Command: submit
2007-10-02 10:05:05,198 DEBUG utils.FaultUtils [RunQueueThread_5,makeFault:467] GT2 Error Code: 200
2007-10-02 10:05:05,225 DEBUG utils.FaultUtils [RunQueueThread_5,makeFault:514] setting fault cause
2007-10-02 10:05:05,227 DEBUG utils.FaultUtils [RunQueueThread_5,makeFault:519] Script Command: submit

*So I took the sudo command, executed it directly, and gave the globus password:*

[EMAIL PROTECTED]:~> /usr/bin/sudo -H -u zimd0022 -S /opt/globus-4.0.5/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile /opt/globus-4.0.5/libexec/globus-job-manager-script.pl -m fork -f /opt/globus-4.0.5/tmp/gram_job_mgr32745.tmp -c submit
Password:
Sorry, try again.
Password:
Sorry, try again.
Password:
Sorry, try again.
/usr/bin/sudo: 3 incorrect password attempts
[EMAIL PROTECTED]:~>
[EMAIL PROTECTED]:~> ls -altrh /opt/
total 24K
...........
drwxr-xr-x 16 globus globus 4.0K 2007-09-27 15:38 globus-4.0.5
drwxr-xr-x 16 globus globus 4.0K 2007-10-01 11:47 globus-4.0.2
lrwxrwxrwx 1 root root 12 2007-10-01 11:59 globus -> globus-4.0.5
drwxr-xr-x 17 root root 4.0K 2007-10-01 11:59 .
[EMAIL PROTECTED]:~>

*Now on hector:*

[EMAIL PROTECTED]:~> globusrun-ws -submit -S -F https://hector.zih.tu-dresden.de:8443/wsrf/services/ManagedJobFactoryService -s -c /bin/date
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:a9a0a49a-70be-11dc-bf1f-080069149999
Termination time: 10/03/2007 08:08 GMT
Current job state: Failed
Destroying job...Done.
Cleaning up any delegated credentials...Done.
globusrun-ws: Job failed: Error code: 201
Script stderr: zimd0022's password:
[EMAIL PROTECTED]:~>

*The entries in the container log:*

2007-10-02 10:08:32,890 DEBUG exec.StateMachine [RunQueueThread_13,runScript:2987] running script submit
2007-10-02 10:08:32,890 DEBUG exec.JobManagerScript [Thread-18,run:208] Executing command: /usr/bin/sudo -H -u zimd0022 -S /opt/globus-4.0.5/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile /opt/globus-4.0.5/libexec/globus-job-manager-script.pl -m fork -f /opt/globus-4.0.5/tmp/gram_job_mgr63230.tmp -c submit
2007-10-02 10:08:32,983 DEBUG exec.JobManagerScript [Thread-18,run:225] first line: null
2007-10-02 10:08:32,984 DEBUG exec.JobManagerScript [Thread-18,run:335] failure message: Script stderr: zimd0022's password:
2007-10-02 10:08:32,984 DEBUG exec.JobManagerScript [Thread-18,setDone:345] script is done, setting done flag
2007-10-02 10:08:32,985 DEBUG exec.StateMachine [RunQueueThread_13,processSubmitState:1168] Done waiting for submit script
2007-10-02 10:08:32,986 DEBUG exec.StateMachine [RunQueueThread_13,processSubmitState:1176] script return code: 201
2007-10-02 10:08:32,986 DEBUG exec.StateMachine [RunQueueThread_13,processSubmitState:1181] script return code means error!
2007-10-02 10:08:32,986 DEBUG exec.StateMachine [RunQueueThread_13,createFaultFromErrorCode:3131] Creating fault from error code 201
2007-10-02 10:08:32,986 WARN exec.StateMachine [RunQueueThread_13,createFaultFromErrorCode:3270] Unhandled fault code 201
2007-10-02 10:08:32,987 DEBUG exec.StateMachine [RunQueueThread_13,createFaultFromErrorCode:3271] Offending Script Command: submit
2007-10-02 10:08:32,991 DEBUG utils.FaultUtils [RunQueueThread_13,createFault:422] Script Command: submit
2007-10-02 10:08:32,997 DEBUG utils.FaultUtils [RunQueueThread_13,makeFault:460] Fault Class: class org.globus.exec.generated.FaultType
2007-10-02 10:08:32,998 DEBUG utils.FaultUtils [RunQueueThread_13,makeFault:461] Resource Key: {http://www.globus.org/namespaces/2004/10/gram/job}ResourceID=aaadb440-70be-11dc-bd72-842811339d49
2007-10-02 10:08:32,998 DEBUG utils.FaultUtils [RunQueueThread_13,makeFault:462] Description: Error code: 201
2007-10-02 10:08:32,998 DEBUG utils.FaultUtils [RunQueueThread_13,makeFault:463] Cause: java.lang.Exception: Script stderr: zimd0022's password:
2007-10-02 10:08:32,998 DEBUG utils.FaultUtils [RunQueueThread_13,makeFault:464] State when failure occurred Unsubmitted
2007-10-02 10:08:32,998 DEBUG utils.FaultUtils [RunQueueThread_13,makeFault:466] Script Command: submit
2007-10-02 10:08:32,999 DEBUG utils.FaultUtils [RunQueueThread_13,makeFault:467] GT2 Error Code: 201
2007-10-02 10:08:33,006 DEBUG utils.FaultUtils [RunQueueThread_13,makeFault:514] setting fault cause
2007-10-02 10:08:33,007 DEBUG utils.FaultUtils [RunQueueThread_13,makeFault:519] Script Command: submit

*Result of executing the sudo command directly:*

[EMAIL PROTECTED]:~> /usr/bin/sudo -H -u zimd0022 -S /opt/globus-4.0.5/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile /opt/globus-4.0.5/libexec/globus-job-manager-script.pl -m fork -f /opt/globus-4.0.5/tmp/gram_job_mgr63230.tmp -c submit
Password:
Sorry, try again.
Password:
Sorry, try again.
Password:
Sorry, try again.
/usr/bin/sudo: 3 incorrect password attempts
[EMAIL PROTECTED]:~>
[EMAIL PROTECTED]:~> ls -altrh /opt/
total 16K
...............
drwxr-xr-x 16 globus globus 4.0K 2007-09-20 01:43 globus-4.0.4
lrwxrwxrwx 1 root root 12 2007-10-02 09:49 globus -> globus-4.0.5
drwxr-xr-x 16 globus globus 4.0K 2007-10-02 09:50 globus-4.0.5
[EMAIL PROTECTED]:~>

So I pointed the globus symlink back to globus-4.0.4:

hector:/opt # rm globus
hector:/opt # ln -s globus-4.0.4 globus
hector:/opt # ls -altrh
total 16K
.......
drwxr-xr-x 16 globus globus 4.0K 2007-09-20 01:43 globus-4.0.4
drwxr-xr-x 16 globus globus 4.0K 2007-10-02 09:50 globus-4.0.5
lrwxrwxrwx 1 root root 12 2007-10-02 10:15 globus -> globus-4.0.4
hector:/opt #

*At the client:*

[EMAIL PROTECTED]:~> globusrun-ws -submit -S -F https://hector.zih.tu-dresden.de:8443/wsrf/services/ManagedJobFactoryService -s -c /bin/date
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:d39adff8-70bf-11dc-a4b2-080069149999
Termination time: 10/03/2007 08:16 GMT
Current job state: Active
Current job state: CleanUp-Hold
Tue Oct 2 10:16:52 CEST 2007
Current job state: CleanUp
Current job state: Done
Destroying job...Done.
Cleaning up any delegated credentials...Done.
[EMAIL PROTECTED]:~>

*Container log:*

2007-10-02 10:16:52,525 DEBUG exec.StateMachine [RunQueueThread_0,runScript:2883] running script submit
2007-10-02 10:16:52,525 DEBUG exec.JobManagerScript [Thread-14,run:208] Executing command: /usr/bin/sudo -H -u zimd0022 -S /opt/globus/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile /opt/globus/libexec/globus-job-manager-script.pl -m fork -f /opt/globus/tmp/gram_job_mgr18847.tmp -c submit
2007-10-02 10:16:52,686 DEBUG exec.JobManagerScript [Thread-14,run:225] first line: GRAM_SCRIPT_JOB_ID:d4e0e07e-70bf-11dc-aba6-00151714069c:1045
2007-10-02 10:16:52,686 DEBUG exec.JobManagerScript [Thread-14,run:228] Read line: GRAM_SCRIPT_JOB_ID:d4e0e07e-70bf-11dc-aba6-00151714069c:1045
2007-10-02 10:16:52,686 DEBUG exec.JobManagerScript [Thread-14,run:240] Received local job ID d4e0e07e-70bf-11dc-aba6-00151714069c:1045
2007-10-02 10:16:52,686 DEBUG exec.JobManagerScript [Thread-14,run:228] Read line: GRAM_SCRIPT_JOB_STATE:2
2007-10-02 10:16:52,687 DEBUG exec.JobManagerScript [Thread-14,run:335] failure message: null
2007-10-02 10:16:52,688 DEBUG exec.JobManagerScript [Thread-14,setDone:345] script is done, setting done flag
2007-10-02 10:16:52,688 DEBUG exec.StateMachine [RunQueueThread_0,processSubmitState:1105] Done waiting for submit script
2007-10-02 10:16:52,688 DEBUG exec.StateMachine [RunQueueThread_0,processSubmitState:1129] script return code: 0
2007-10-02 10:16:52,689 DEBUG exec.StateMachine [RunQueueThread_0,processSubmitState:1161] script returned job state: Active
2007-10-02 10:16:52,690 DEBUG ManagedJobResourceImpl.d39adff8-70bf-11dc-a4b2-080069149999 [RunQueueThread_0,getResourceDatum:217] getting resource datum localJobId
2007-10-02 10:16:52,690 DEBUG ManagedJobResourceImpl.d39adff8-70bf-11dc-a4b2-080069149999 [RunQueueThread_0,getResourceDatum:223] Obtaining lock on resourceData
2007-10-02 10:16:52,690 DEBUG ManagedJobResourceImpl.d39adff8-70bf-11dc-a4b2-080069149999 [RunQueueThread_0,getResourceDatum:226] Obtained lock on resourceData
2007-10-02 10:16:52,690 DEBUG ManagedJobResourceImpl.d39adff8-70bf-11dc-a4b2-080069149999 [RunQueueThread_0,getResourceDatum:266] Releasing lock on resourceData

The only difference I can see is that with GT 4.0.4 the symlink is not resolved in the command the container executes (/opt/globus/... rather than /opt/globus-4.0.5/...), and in that case sudo works perfectly, as you can see.

Sorry for the long mail. Does anyone have any tips for me?

Cheers,
Samatha

-- 
Samatha Kottha
Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH)
Technische Universität Dresden
Tel: (+49) 351 463-38776
Fax: (+49) 351 463-38245
Room 1019
Noethnitzer Straße 46
01187 Dresden
Germany
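To make the difference between the two logged commands concrete, here is a minimal Python sketch. It is only a simplified model and rests on an assumption: that sudo compares the argument text of an invocation against the argument part of the sudoers entry as a shell-style glob. Real sudo applies more rules than this, and the helper name args_match is made up for illustration. The pattern and the two argument lists are copied from the sudoers entry and the container logs quoted in this mail.

```python
from fnmatch import fnmatchcase

# Argument part of the sudoers entry from this mail (written with the symlink path).
SUDOERS_ARGS = (
    "-g /etc/grid-security/grid-mapfile "
    "/opt/globus/libexec/globus-job-manager-script.pl *"
)


def args_match(pattern, argv):
    """Hypothetical helper: join the arguments with spaces and glob-match them.

    This is a simplified stand-in for sudo's argument check, not its real logic.
    """
    return fnmatchcase(" ".join(argv), pattern)


# Arguments logged by the GT 4.0.4 container (symlink path kept as /opt/globus).
gt404_args = [
    "-g", "/etc/grid-security/grid-mapfile",
    "/opt/globus/libexec/globus-job-manager-script.pl",
    "-m", "fork", "-f", "/opt/globus/tmp/gram_job_mgr18847.tmp", "-c", "submit",
]

# Arguments logged by the GT 4.0.5 container (symlink resolved to /opt/globus-4.0.5).
gt405_args = [
    "-g", "/etc/grid-security/grid-mapfile",
    "/opt/globus-4.0.5/libexec/globus-job-manager-script.pl",
    "-m", "fork", "-f", "/opt/globus-4.0.5/tmp/gram_job_mgr63230.tmp", "-c", "submit",
]

print("GT 4.0.4 arguments match the entry:", args_match(SUDOERS_ARGS, gt404_args))
print("GT 4.0.5 arguments match the entry:", args_match(SUDOERS_ARGS, gt405_args))
```

Under this model the GT 4.0.4 invocation matches the existing entry and the GT 4.0.5 invocation does not, which would be consistent with sudo falling back to a password prompt. Whether this is what actually happens on romeo and hector is not confirmed by the logs.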
