Honestly, this sounds like a use case for RLS, the Replica Location Service. It lets you keep logical file names and a map from each logical name to where the file is physically instantiated. You would query RLS to find out whether a particular node already has a copy of your file. If it doesn't, you can stage it in.
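
To make the logical-to-physical mapping idea concrete, here is a toy in-memory sketch in Python. It is not the RLS client API; the class, method names, and example URLs are all hypothetical and only illustrate the kind of lookup you would do before deciding whether to stage a file in.

```python
from collections import defaultdict
from urllib.parse import urlparse


class ReplicaCatalog:
    """Toy stand-in for a replica catalog (hypothetical, not the RLS API)."""

    def __init__(self):
        # logical file name -> set of physical URLs holding a copy
        self._replicas = defaultdict(set)

    def register(self, logical_name, physical_url):
        """Record that physical_url holds a copy of logical_name."""
        self._replicas[logical_name].add(physical_url)

    def lookup(self, logical_name):
        """Return all known physical URLs for logical_name (may be empty)."""
        return set(self._replicas.get(logical_name, ()))

    def has_copy_on(self, logical_name, host):
        """True if some registered replica of logical_name lives on host."""
        return any(
            urlparse(url).hostname == host
            for url in self.lookup(logical_name)
        )


catalog = ReplicaCatalog()
# Example host and path are made up for illustration.
catalog.register("input.dat", "gsiftp://node1.example.org/scratch/cache/input.dat")

if not catalog.has_copy_on("input.dat", "node2.example.org"):
    print("no replica on node2.example.org; stage the file in")
```

In a real deployment the catalog would be the RLS server itself, and the stage-in step would register the new replica after the transfer completes.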

Regarding the creation of symlinks, I don't think RFT/GridFTP do that. You could use the Fork jobmanager to submit a symlink job if your compute server allows fork submissions.


Charles

On Jan 7, 2008, at 3:26 PM, Adam Bazinet wrote:

Hi,

I recently posted this message on rft-user, but seeing as that list doesn't get much traffic, I hope no one minds if I try again here.

I'm trying to implement a file caching scheme with GRAM/RFT/GridFTP, such that when GRAM jobs are submitted to remote resources, input files end up in a cache of sorts on the remote resource. The goal is to cut down on unnecessary file duplication between submitted jobs.

The structure of the cache might look like this:

${GLOBUS_SCRATCH_DIR}/cache/md5sum_file1/md5sum
${GLOBUS_SCRATCH_DIR}/cache/md5sum_file1/foo -> ${GLOBUS_SCRATCH_DIR}/cache/md5sum_file1/md5sum
${GLOBUS_SCRATCH_DIR}/cache/md5sum_file1/bar -> ${GLOBUS_SCRATCH_DIR}/cache/md5sum_file1/md5sum
${GLOBUS_SCRATCH_DIR}/cache/md5sum_file1/baz -> ${GLOBUS_SCRATCH_DIR}/cache/md5sum_file1/md5sum

${GLOBUS_SCRATCH_DIR}/cache/md5sum_file2/md5sum
${GLOBUS_SCRATCH_DIR}/cache/md5sum_file2/foo -> ${GLOBUS_SCRATCH_DIR}/cache/md5sum_file2/md5sum
${GLOBUS_SCRATCH_DIR}/cache/md5sum_file2/bar -> ${GLOBUS_SCRATCH_DIR}/cache/md5sum_file2/md5sum
${GLOBUS_SCRATCH_DIR}/cache/md5sum_file2/baz -> ${GLOBUS_SCRATCH_DIR}/cache/md5sum_file2/md5sum

In this example, you can imagine that md5sum, md5sum_file1, and md5sum_file2 are all actual md5sums.

There are really only three cases that need to be handled:

If file exists in the cache with the same name
   do nothing
If file exists in the cache with a different name
   create a symlink in the cache
Else
   upload file and create symlink
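
The three cases above can be sketched in Python against a local filesystem. This is only an illustration of the decision logic, not something RFT or GridFTP provides: the "upload" is a plain local copy standing in for a remote transfer, the function names are hypothetical, and the layout assumes one reading of the structure shown earlier (a directory named by the file's MD5 sum, holding the content under that same sum plus one symlink per file name).

```python
import hashlib
import os
import shutil


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at path, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_file(source, cache_root, name=None):
    """Ensure source is cached under name; return (action, link_path).

    Hypothetical helper. action is "hit", "linked", or "uploaded".
    """
    name = name or os.path.basename(source)
    checksum = md5_of(source)
    entry_dir = os.path.join(cache_root, checksum)
    blob = os.path.join(entry_dir, checksum)  # the actual file content
    link = os.path.join(entry_dir, name)      # per-name symlink to the blob

    # Case 1: same content is already cached under the same name.
    if os.path.lexists(link):
        return "hit", link

    # Case 2: same content is cached, but under a different name.
    if os.path.exists(blob):
        os.symlink(blob, link)
        return "linked", link

    # Case 3: content is not cached; copy it in (stand-in for a remote
    # upload) and create the symlink.
    os.makedirs(entry_dir, exist_ok=True)
    shutil.copyfile(source, blob)
    os.symlink(blob, link)
    return "uploaded", link
```

Calling cache_file twice with the same file and name should report "uploaded" and then "hit"; calling it again with a different name should report "linked". On a remote resource the existence checks and the symlink creation would have to be carried out by whatever mechanism the resource allows, which is the open question here.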

Ideally I would like this to happen automatically when a job is submitted -- I never used the old GASS cache, but from some Googling perhaps what I'm proposing is similar. Let's say before I construct my RSL file, I need to figure out whether or not a file exists on the remote resource. Is there a way to do this with pure RFT/GridFTP?

Furthermore, is there a way to cause a symlink to be created on a remote resource using RFT/GridFTP?

I thought I'd start with this list before I submit a message to either the GRAM or GridFTP lists.

Thanks,
Adam

