Re: [gt-user] determining if a file exists (on a remote resource) and creating symlinks (on a remote resource)

Charles Bacon Tue, 22 Jan 2008 10:18:33 -0800

On Jan 21, 2008, at 3:17 PM, Adam Bazinet wrote:

I'm basically going ahead with your suggestion and plan on using RLS/DRS to achieve the kind of file caching that I want. However, I'm struggling with how I should use these tools to meet our needs based solely on the documentation and the limited amount I've been able to play around with them.

I'm hoping to get Rob Schuler to chime in with some suggestions, as I'm not a particularly active RLS/DRS user myself.

As a prerequisite to replicating files with DRS, do they already need to have LFN entries in an RLS somewhere? In other words, once the RLS is bootstrapped with at least one LFN, perhaps it is possible to use DRS to replicate it. But is it also possible to have DRS transfer files/make entries in an RLS de novo, from scratch?

I don't know, but I suspect that the assumption is that there is no algorithmic way to convert your PFN into an LFN. The LFN conceptually holds metadata regarding the file. It's an interesting idea, though, that you could provide some command/script that you could run against a file and extract an LFN. That would work if there's enough metadata in the file itself to regenerate the LFN, but I think it was probably developed under the assumption that that's not possible.
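To make the "script that extracts an LFN" idea concrete, here is a minimal sketch of one case where the file does carry enough information: using a digest of its contents as the logical name. This is only an assumption for illustration; the lfn:// naming scheme, the namespace argument, and the function name content_lfn are all hypothetical, and nothing here talks to RLS or DRS.

```python
import hashlib


def content_lfn(path, namespace="example"):
    """Derive a logical file name from the file's bytes (hypothetical scheme)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so large input files are not loaded into memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    # Identical contents always yield the same name, regardless of the PFN.
    return f"lfn://{namespace}/sha256/{digest.hexdigest()}"
```

Two physical copies with identical bytes would map to the same name under this scheme, which is what a catalog lookup needs; it says nothing about metadata that is not in the file itself.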

So, one simple question is: do I need an RLS installation on each remote resource, or can I get away with the single RLS installation on the Grid server keeping track of the locations of files on a modest (~10) number of these resources?

I believe you can get away with a single RLS server, and can add more later if you find that you need them.

The main reason I'm interested in this is because we are running into the situation where we are staging in files needed by jobs (or job batches) over and over again to the same resource, and sometimes duplicate files get staged unnecessarily, wasting bandwidth and disk space. Here's a more technical question, though. What if two different jobs need the same logical file, but need it named two different things?

This is a place where you may find yourself writing a little wrapper around the RLS/DRS functionality. I think you could use DRS to get the first copy/name at a location. Subsequently if you search the catalog and find that there's an LFN->PFN mapping that suits you in terms of host but not in filename you could add a symlink and make another entry in the catalog to show that LFN->PFN map. Part of the difficulty here is the co-scheduling of storage resources and compute resources. If you're using DRS, you have to get the file there before your job hits the queue, while if you use GRAM's staging you don't get the benefits of DRS. So it's not clear to me whether you're better off with DRS + a small script, or RLS + GRAM + a larger script, where the larger script does some of the things DRS would have done for you.

Either way, I suspect you're going to want the client to run some RLS/DRS queries/transfers before the job is submitted at all.
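As a rough sketch of the "small script" idea above: look up the LFN, and if a copy already exists on the right host under a different filename, add a symlink and record the extra LFN->PFN mapping. Everything here is an assumption for illustration. The catalog is a plain Python dict standing in for RLS (no real RLS client calls are used), PFNs are assumed to look like gsiftp://host/path, the name ensure_name is hypothetical, and the symlink is created with os.symlink on the assumption that the script runs on the resource that holds the file.

```python
import os
from urllib.parse import urlparse


def pfns_on_host(catalog, lfn, host):
    """Return the PFNs registered for lfn that live on the given host."""
    return [p for p in catalog.get(lfn, set()) if urlparse(p).hostname == host]


def ensure_name(catalog, lfn, host, wanted_path):
    """Make lfn available at wanted_path on host without a second transfer.

    Returns the PFN to use, or None if no copy is on the host yet, in which
    case the caller still has to transfer the file (e.g. via DRS).
    """
    wanted_pfn = f"gsiftp://{host}{wanted_path}"
    local = pfns_on_host(catalog, lfn, host)
    if wanted_pfn in local:
        return wanted_pfn          # already present under the wanted name
    if not local:
        return None                # nothing on this host: a transfer is needed
    # A copy exists on this host under another name: link to it.
    existing_path = urlparse(sorted(local)[0]).path
    if not os.path.lexists(wanted_path):
        os.symlink(existing_path, wanted_path)
    # Record the new mapping so later lookups find this name as well.
    catalog.setdefault(lfn, set()).add(wanted_pfn)
    return wanted_pfn
```

A client could call something like this for each input before submitting the job, and fall back to a transfer only when it returns None.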



Charles
