Hmm,

Well, I just turned on logging to /var/log/gridftp.log and guess what: it
works now, and I'm not getting the error I was before!  I doubt turning on
logging had anything to do with it, though; perhaps it was just a
transient problem, since it has been a few hours since I last tried.  If
it surfaces again I'll let you know.

thanks,
Adam



On Wed, Sep 24, 2008 at 5:07 PM, John Bresnahan <[EMAIL PROTECTED]> wrote:

> Can you send the GridFTP server logs as well?
>
>
> Adam Bazinet wrote:
>
>> Hi,
>>
>> I'm getting a strange error with every job I submit to Condor through my
>> GT 4.2.0 installation.  Job submits and runs fine, but fails during the
>> fileCleanUp stage.  Here's one look at the error:
>>
>> [EMAIL PROTECTED]:/export/grid_files/171727100.24523977945405806> globusrun-ws -status -j jobEPR.txt
>> Current job state: Failed
>> globusrun-ws: Job failed: Staging error for RSL element fileCleanUp.
>> Connection creation error [Caused by: java.io.EOFException]
>>
>> The relevant snippet from the job description is here:
>>
>> <fileCleanUp>
>> <deletion>
>> <file>file:///${GLOBUS_SCRATCH_DIR}/171727100.24523977945405806/</file>
>> </deletion>
>> </fileCleanUp>
>>
>> I can assure you there is nothing special about the directory in question.
>> Submissions to our custom BOINC job manager (with the same fileCleanUp
>> block) in the same container work just fine.  In fact, we have another
>> identical 4.2.0 installation on another host that submits to Condor just
>> fine.  However, I can't seem to get it to work in this container.  One
>> difference is that this is a RHEL5 host, and the other host I just
>> mentioned is running RHEL4.
>>
>> I turned on RFT debugging and I can narrow down the error to this attempt:
>>
>> 2008-09-24T13:47:12.125-04:00 ERROR cache.ConnectionManager [Thread-32,createNewConnection:345] Can't create connection: java.io.EOFException
>> 2008-09-24T13:47:12.127-04:00 ERROR service.TransferWork [Thread-32,run:413] Transient transfer error
>> Connection creation error [Caused by: java.io.EOFException]
>> Connection creation error. Caused by java.io.EOFException
>>    at org.globus.ftp.vanilla.Reply.<init>(Reply.java:78)
>>    at org.globus.ftp.vanilla.FTPControlChannel.read(FTPControlChannel.java:342)
>>    at org.globus.ftp.vanilla.FTPControlChannel.readInitialReplies(FTPControlChannel.java:225)
>>    at org.globus.ftp.vanilla.FTPControlChannel.open(FTPControlChannel.java:214)
>>    at org.globus.ftp.GridFTPClient.<init>(GridFTPClient.java:74)
>>    at org.globus.transfer.reliable.service.cache.SingleConnectionImpl.<init>(SingleConnectionImpl.java:66)
>>    at org.globus.transfer.reliable.service.cache.ConnectionManager.createNewConnection(ConnectionManager.java:327)
>>    at org.globus.transfer.reliable.service.cache.ConnectionManager.getConnection(ConnectionManager.java:190)
>>    at org.globus.transfer.reliable.service.cache.ConnectionManager.getConnection(ConnectionManager.java:127)
>>    at org.globus.transfer.reliable.service.client.DeleteClient.<init>(DeleteClient.java:43)
>>    at org.globus.transfer.reliable.service.client.ClientFactory.createDeleteClient(ClientFactory.java:61)
>>    at org.globus.transfer.reliable.service.TransferWork.run(TransferWork.java:347)
>>    at EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run(Unknown Source)
>>    at java.lang.Thread.run(Thread.java:595)
>> 2008-09-24T13:47:12.136-04:00 DEBUG service.TransferWork [Thread-32,setFault:219] setting transient fault
>> 2008-09-24T13:47:12.136-04:00 DEBUG service.TransferWork [Thread-32,processStates:246] [Request 62, Transfer 250] processing state for transfer of gsiftp://lysine.umiacs.umd.edu:2811/fs/mikehomes/gt4admin/.globus/scratch/171727100.24523977945405806/ ->  null
>>
>> I guess a transfer to 'null' in RFT really means delete the directory.
>> However, it is consistently failing with this strange EOFException.  To me
>> the fact that it only occurs when submitting to Condor is really strange;
>> I've already reinstalled the entire gt4-gram-condor unit but there was no
>> change.  I'll attach the container log with RFT/GRAM debug turned on.
>>
>> thanks,
>> Adam
>>
>
>
