Honestly, this sounds like a use case for RLS, the Replica Location
Service. It lets you define logical file names and keeps a map from
those logical names to where the files are physically instantiated. In
that case you would query RLS to find out whether a particular node
already has a copy of your file. If it doesn't, you could stage it in.
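To make the lookup idea concrete, here is a toy in-memory stand-in for such a catalog. It is only a sketch and not the RLS client API: the class ReplicaCatalog, its methods, and the example host names and gsiftp URLs are all hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


class ReplicaCatalog:
    """Toy map from logical file names to physical locations (not RLS itself)."""

    def __init__(self):
        # logical name -> set of physical URLs where a copy lives
        self._replicas = defaultdict(set)

    def register(self, logical_name, physical_url):
        """Record that a copy of `logical_name` exists at `physical_url`."""
        self._replicas[logical_name].add(physical_url)

    def lookup(self, logical_name):
        """Return all known physical URLs for `logical_name`, sorted."""
        return sorted(self._replicas.get(logical_name, ()))

    def has_replica_on(self, logical_name, host):
        """True if some registered copy of `logical_name` lives on `host`."""
        return any(urlparse(url).hostname == host
                   for url in self._replicas.get(logical_name, ()))


catalog = ReplicaCatalog()
catalog.register("input.dat", "gsiftp://node1.example.org/scratch/cache/input.dat")

if not catalog.has_replica_on("input.dat", "node2.example.org"):
    # This is the point at which you would stage the file in to node2
    # and then register the new copy.
    catalog.register("input.dat", "gsiftp://node2.example.org/scratch/cache/input.dat")
```

A real deployment would query the RLS server instead of a local dictionary, but the decision is the same: look up the logical name, and stage the file in only when the target node has no copy.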
Regarding the creation of symlinks, I don't think RFT/GridFTP do
that. You could use the Fork jobmanager to submit a symlink job if
your compute server allows fork submissions.
Charles
On Jan 7, 2008, at 3:26 PM, Adam Bazinet wrote:
Hi,
I recently posted this message on rft-user, but seeing as that list
doesn't get much traffic, I hope no one minds if I try again here.
I'm trying to implement a file caching scheme with GRAM/RFT/
GridFTP, such that when GRAM jobs are submitted to remote
resources, input files end up in a cache of sorts on the remote
resource. This is all in an effort to cut down on unnecessary file
duplication between jobs that are submitted.
The structure of the cache might look like this:
${GLOBUS_SCRATCH_DIR}/cache/md5sum_file1/md5sum
${GLOBUS_SCRATCH_DIR}/cache/md5sum_file1/foo -> ${GLOBUS_SCRATCH_DIR}/cache/md5sum_file1/md5sum
${GLOBUS_SCRATCH_DIR}/cache/md5sum_file1/bar -> ${GLOBUS_SCRATCH_DIR}/cache/md5sum_file1/md5sum
${GLOBUS_SCRATCH_DIR}/cache/md5sum_file1/baz -> ${GLOBUS_SCRATCH_DIR}/cache/md5sum_file1/md5sum
${GLOBUS_SCRATCH_DIR}/cache/md5sum_file2/md5sum
${GLOBUS_SCRATCH_DIR}/cache/md5sum_file2/foo -> ${GLOBUS_SCRATCH_DIR}/cache/md5sum_file2/md5sum
${GLOBUS_SCRATCH_DIR}/cache/md5sum_file2/bar -> ${GLOBUS_SCRATCH_DIR}/cache/md5sum_file2/md5sum
${GLOBUS_SCRATCH_DIR}/cache/md5sum_file2/baz -> ${GLOBUS_SCRATCH_DIR}/cache/md5sum_file2/md5sum
In this example, you can imagine that md5sum, md5sum_file1, and
md5sum_file2 are all actual md5sums.
There are really only three cases that need to be handled:
If file exists in the cache with the same name
    do nothing
If file exists in the cache with a different name
    create a symlink in the cache
Else
    upload file and create symlink
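The three cases above can be sketched against a local directory as follows. This is only an illustration of the decision logic, not something RFT or GridFTP provides: the function names md5_of and cache_file are made up, the "upload" is simulated with a plain file copy, and the layout assumed is the one shown earlier (one directory per checksum, holding the data file plus one symlink per name).

```python
import hashlib
import os
import shutil


def md5_of(path):
    """Return the hex MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_file(cache_root, src_path, name=None):
    """Make `src_path` available in the cache under `name`.

    Returns "hit", "linked", or "uploaded" to say which case applied.
    """
    name = name or os.path.basename(src_path)
    checksum = md5_of(src_path)
    # Absolute paths, so the symlink target stays valid from anywhere.
    entry_dir = os.path.join(os.path.abspath(cache_root), checksum)
    blob = os.path.join(entry_dir, checksum)
    link = os.path.join(entry_dir, name)

    if os.path.lexists(link):
        return "hit"          # same content, same name: do nothing
    if os.path.exists(blob):
        os.symlink(blob, link)
        return "linked"       # same content, different name: add a symlink
    os.makedirs(entry_dir, exist_ok=True)
    shutil.copyfile(src_path, blob)   # stand-in for the real transfer
    os.symlink(blob, link)
    return "uploaded"         # content not cached yet: copy and link
```

Calling cache_file("cache", "foo") on an empty cache would take the upload path, a second identical call would be a hit, and cache_file("cache", "foo", name="bar") would only add a symlink. On a remote resource the existence checks and the symlink creation would have to be performed by something running on that resource.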
Ideally I would like this to happen automatically when a job is
submitted. I never used the old GASS cache, but from some Googling
it sounds as though what I'm proposing may be similar. Suppose that,
before I construct my RSL file, I need to determine whether a file
already exists on the remote resource. Is there a way to do this
with pure RFT/GridFTP?
Furthermore, is there a way to cause a symlink to be created on a
remote resource using RFT/GridFTP?
I thought I'd start with this list before I submit a message to
either the GRAM or GridFTP lists.
Thanks,
Adam