The error looks like the connection to the server is failing when RFT tries to obtain a session for
the delete. The server either never gets started or times out. Can you send the configuration for
the GridFTP servers in question?
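
If it helps narrow things down, the stack trace shows the client hitting EOF while reading the initial reply on the control channel, so one rough check is to connect to the control port by hand and see whether a greeting ever arrives. Below is a minimal Python sketch of that idea. The function name `read_banner` and the timeout value are made up for illustration; port 2811 is simply the one that appears in the gsiftp URL in the RFT log. It only tests the plain TCP greeting and does not attempt GSI authentication.

```python
import socket


def read_banner(host, port=2811, timeout=10.0):
    """Connect to an FTP-style control port and return the first reply line.

    Raises EOFError if the server closes the connection before sending
    anything, which corresponds to the java.io.EOFException the client
    reports while reading the initial reply. A server that accepts the
    connection but stays silent surfaces as a socket timeout instead.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.settimeout(timeout)
        data = b""
        # Read until a full line has arrived or the peer closes the socket.
        while b"\n" not in data:
            chunk = sock.recv(4096)
            if not chunk:
                if not data:
                    raise EOFError("server closed the connection before sending a banner")
                break
            data += chunk
        first_line = data.split(b"\n", 1)[0]
        return first_line.decode("ascii", errors="replace").rstrip("\r")


if __name__ == "__main__":
    # Host and port taken from the gsiftp URL in the RFT debug log.
    print(read_banner("lysine.umiacs.umd.edu", 2811))
```

A healthy server should answer with a line starting with 220. An immediate EOFError would suggest the server process is not being launched properly for the incoming connection, and a timeout would suggest it starts but never gets as far as the greeting.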
Adam Bazinet wrote:
Hmm,
Well I just turned on logging to /var/log/gridftp.log and guess what, it
works now, I'm not getting the error I was before! I doubt turning on
logging had anything to do with it though; perhaps it was just a mysterious
transient problem, because it has been a few hours since I last tried. If
it surfaces again I'll let you know.
thanks,
Adam
On Wed, Sep 24, 2008 at 5:07 PM, John Bresnahan <[EMAIL PROTECTED]> wrote:
Can you send the GridFTP server logs as well?
Adam Bazinet wrote:
Hi,
I'm getting a strange error with every job I submit to Condor through my GT
4.2.0 installation. The job submits and runs fine, but fails during the
fileCleanUp stage. Here's one look at the error:
[EMAIL PROTECTED]:/export/grid_files/171727100.24523977945405806> globusrun-ws -status -j jobEPR.txt
Current job state: Failed
globusrun-ws: Job failed: Staging error for RSL element fileCleanUp.
Connection creation error [Caused by: java.io.EOFException]
The relevant snippet from the job description is here:
<fileCleanUp>
<deletion>
<file>file:///${GLOBUS_SCRATCH_DIR}/171727100.24523977945405806/</file>
</deletion>
</fileCleanUp>
I can assure you there is nothing special about the directory in question.
In fact, submissions to our custom BOINC job manager (with the same
fileCleanUp block) in the same container work just fine. We also have
another identical 4.2.0 installation on another host that submits to Condor
just fine. However, I can't seem to get it to work in this container. One
difference is that this is a RHEL5 host, and the other host I just mentioned
is running RHEL4.
I turned on RFT debugging and I can narrow down the error to this attempt:
2008-09-24T13:47:12.125-04:00 ERROR cache.ConnectionManager [Thread-32,createNewConnection:345] Can't create connection: java.io.EOFException
2008-09-24T13:47:12.127-04:00 ERROR service.TransferWork [Thread-32,run:413] Transient transfer error
Connection creation error [Caused by: java.io.EOFException]
Connection creation error. Caused by java.io.EOFException
    at org.globus.ftp.vanilla.Reply.<init>(Reply.java:78)
    at org.globus.ftp.vanilla.FTPControlChannel.read(FTPControlChannel.java:342)
    at org.globus.ftp.vanilla.FTPControlChannel.readInitialReplies(FTPControlChannel.java:225)
    at org.globus.ftp.vanilla.FTPControlChannel.open(FTPControlChannel.java:214)
    at org.globus.ftp.GridFTPClient.<init>(GridFTPClient.java:74)
    at org.globus.transfer.reliable.service.cache.SingleConnectionImpl.<init>(SingleConnectionImpl.java:66)
    at org.globus.transfer.reliable.service.cache.ConnectionManager.createNewConnection(ConnectionManager.java:327)
    at org.globus.transfer.reliable.service.cache.ConnectionManager.getConnection(ConnectionManager.java:190)
    at org.globus.transfer.reliable.service.cache.ConnectionManager.getConnection(ConnectionManager.java:127)
    at org.globus.transfer.reliable.service.client.DeleteClient.<init>(DeleteClient.java:43)
    at org.globus.transfer.reliable.service.client.ClientFactory.createDeleteClient(ClientFactory.java:61)
    at org.globus.transfer.reliable.service.TransferWork.run(TransferWork.java:347)
    at EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:595)
2008-09-24T13:47:12.136-04:00 DEBUG service.TransferWork [Thread-32,setFault:219] setting transient fault
2008-09-24T13:47:12.136-04:00 DEBUG service.TransferWork [Thread-32,processStates:246] [Request 62, Transfer 250] processing state for transfer of gsiftp://lysine.umiacs.umd.edu:2811/fs/mikehomes/gt4admin/.globus/scratch/171727100.24523977945405806/ -> null
I guess a transfer to 'null' in RFT really means delete the directory.
However, it is consistently failing with this strange EOFException. To me
the fact that it only occurs when submitting to Condor is really strange;
I've already reinstalled the entire gt4-gram-condor unit but there was no
change. I'll attach the container log with RFT/GRAM debug turned on.
thanks,
Adam