RE: [gt-user] determining if a file exists (on a remote resource) and creating symlinks (on a remote resource) [DRS]

Robert Schuler Tue, 22 Jan 2008 11:52:52 -0800

Hi, Adam,

|> From: "Adam Bazinet" <[EMAIL PROTECTED]>
|> Subject: Re: [gt-user] determining if a file exists (on a remote
|> resource) and creating symlinks (on a remote resource)
|>
|> Hi,
|>
|> I'm basically going ahead with your suggestion and plan on using
|> RLS/DRS to achieve the kind of file caching that I want.  However,
|> I'm struggling with how I should use these tools to meet our needs
|> based solely on the documentation and the limited amount I've been
|> able to play around with them.
|>
|> For example, DRS seems promising as a tool that can query an RLS
|> installation (and in saying RLS I'm subsuming the LRC/RLI pairing
|> for simplicity), actually perform file transfers via RFT, and
|> record the new locations of the files in the RLS.  However, I don't
|> exactly understand the format of the request file as shown:
|>
|> testrun-1      gsiftp://myhost:9001/sandbox/files/testrun-1
|> testrun-2      gsiftp://myhost:9001/sandbox/files/testrun-2
|> testrun-3      gsiftp://myhost:9001/sandbox/files/testrun-3
|> testrun-4      gsiftp://myhost:9001/sandbox/files/testrun-4
|> testrun-5      gsiftp://myhost:9001/sandbox/files/testrun-5
|>
|> As a prerequisite to replicating files with DRS, do they already
|> need to have LFN entries in an RLS somewhere?  In other words, once


That's right. The first column of the request file is the LFN, which
must already be registered in the RLS.
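
To make the format concrete, here is a rough Python sketch that reads a
request file like the one quoted above and reports which LFNs still need
to be registered before the request can work. It is only an illustration:
the function names are made up, the second column is simply treated as an
opaque URL, and the set of registered LFNs is assumed to come from your
own RLS query rather than from any real RLS call shown here.

```python
from typing import Iterable, List, Set, Tuple


def parse_request_file(text: str) -> List[Tuple[str, str]]:
    """Split each non-empty line into (LFN, URL) on whitespace."""
    pairs = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(
                f"line {lineno}: expected 2 columns, got {len(fields)}"
            )
        pairs.append((fields[0], fields[1]))
    return pairs


def unregistered(pairs: Iterable[Tuple[str, str]],
                 registered_lfns: Set[str]) -> List[str]:
    """Return the LFNs from the request that are not yet registered.

    registered_lfns is an assumption: a set you have built yourself
    by querying your RLS beforehand.
    """
    return [lfn for lfn, _url in pairs if lfn not in registered_lfns]


SAMPLE = """\
testrun-1      gsiftp://myhost:9001/sandbox/files/testrun-1
testrun-2      gsiftp://myhost:9001/sandbox/files/testrun-2
"""

if __name__ == "__main__":
    request = parse_request_file(SAMPLE)
    # Pretend only testrun-1 has been registered so far.
    print(unregistered(request, {"testrun-1"}))
```

Any LFN this reports would have to be registered in the RLS first.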

|> the RLS is bootstrapped with at least one LFN, perhaps it is
|> possible to use DRS to replicate it.  But is it also possible to
|> have DRS transfer files and make entries in an RLS de novo, from scratch?

No, though that's something we could consider putting on our
development roadmap.

|>
|> Otherwise, it seems like I would have to do all of the RFT
|> transfers myself, either as part of the GRAM job submission or
|> separately, and do all of the RLS updating myself too, which is
|> probably why DRS was created.
|>
|> I've read about how flexible the LRC/RLI configuration can be, and
|> I'm aware that there's probably no single "best" way to set it up.
|> However, what would be the simplest possible configuration that
|> would work for me?  I already have a properly functioning LRC/RLI
|> on a machine let's call the "Grid server".  GRAM jobs are submitted
|> from this machine to one of many other Grid resources (other Globus
|> installations).  It's on these remote resources too that I want
|> some kind of file cache, i.e., some location into which job input
|> files can be staged without unnecessary file duplication.
|> Therefore, I need a way of knowing which files are on that remote
|> resource, where they exist, and so on, which it seems RLS can
|> provide.  So, one simple question is: do I need an RLS installation
|> on each remote resource, or can I get away with the single RLS
|> installation on the Grid server keeping track of the locations of
|> files on a modest (~10) number of these resources?


Your scenario sounds appropriate for RLS, and yes, it's reasonable to
start off with a single RLS installation. If you write your client
software (or use DRS) to query the RLI first and then the LRCs that
the RLI returns, you can easily extend your configuration later by
setting up additional LRCs and having them send their index updates to
the RLI.
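
The "RLI first, then LRC" lookup order can be sketched with a small
in-memory model in Python. This is not the RLS client API: ToyLRC, ToyRLI
and find_replicas are hypothetical names, and the host names are invented.
The point is only that client code written this way does not change when
a second LRC is added later.

```python
from typing import Dict, List, Set


class ToyLRC:
    """Stand-in for a local replica catalog: LFN -> set of PFNs."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.mappings: Dict[str, Set[str]] = {}

    def add(self, lfn: str, pfn: str) -> None:
        self.mappings.setdefault(lfn, set()).add(pfn)

    def lookup(self, lfn: str) -> List[str]:
        return sorted(self.mappings.get(lfn, set()))


class ToyRLI:
    """Stand-in for the index: LFN -> names of LRCs that know it."""

    def __init__(self) -> None:
        self.index: Dict[str, Set[str]] = {}

    def update_from(self, lrc: ToyLRC) -> None:
        # Models an LRC sending its index update to the RLI.
        for lfn in lrc.mappings:
            self.index.setdefault(lfn, set()).add(lrc.name)

    def lookup(self, lfn: str) -> List[str]:
        return sorted(self.index.get(lfn, set()))


def find_replicas(lfn: str, rli: ToyRLI,
                  lrcs: Dict[str, ToyLRC]) -> List[str]:
    """Ask the RLI which LRCs hold the LFN, then ask each of those."""
    pfns: List[str] = []
    for lrc_name in rli.lookup(lfn):
        pfns.extend(lrcs[lrc_name].lookup(lfn))
    return pfns


if __name__ == "__main__":
    grid = ToyLRC("gridserver")  # hypothetical single LRC to start with
    grid.add("testrun-1", "gsiftp://resource1/cache/testrun-1")
    rli = ToyRLI()
    rli.update_from(grid)
    print(find_replicas("testrun-1", rli, {"gridserver": grid}))
```

With a single installation the RLI always points back at the one LRC,
so the extra hop costs little and keeps the door open for more LRCs.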

|>
|> The main reason I'm interested in this is because we are running
|> into the situation where we are staging in files needed by jobs (or
|> job batches) over and over again to the same resource, and
|> sometimes duplicate files get staged unnecessarily, wasting
|> bandwidth and disk space.  Here's a more technical question,
|> though.  What if two different jobs need the same logical file, but
|> need it named two different things?  If I were setting it up by
|> hand, I could make one file a symlink to the other to avoid wasting
|> disk space.  The analogue of this in RLS-speak would be one LFN,
|> two PFNs on the same filesystem -- and ideally where one PFN is a
|> symlink to the other (or better yet!  both PFNs are symlinks to a
|> third PFN, the actual instantiation, so to speak, of the LFN).  But
|> I don't see any way to do this with RFT/RLS/DRS.  So even though
|> the system would know that on a remote filesystem the file resource
|> it needs already exists, it would have to stage in the file again
|> (or make a copy -- either way, it's another PFN) and waste disk
|> space in the process.  It's a situation that hopefully won't happen
|> very often, but I'd like to have a better solution for it.

That's an interesting scenario. The RLS allows many-to-many mappings:
multiple LFNs can point to multiple PFNs, or to just one. So you could
register different LFNs for the same PFN. That might be another way of
getting the symlink-like naming you want.
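
As a minimal illustration of that idea, the Python sketch below keeps a
toy many-to-many catalog in memory and registers two LFNs against one
PFN, so a job asking for either name finds the file already cached. The
ToyCatalog class, the LFN names and the resource1 URL are all
hypothetical; a real setup would create these mappings in the RLS itself.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


class ToyCatalog:
    """In-memory many-to-many LFN <-> PFN mapping (not the RLS API)."""

    def __init__(self) -> None:
        self._pfns_by_lfn: DefaultDict[str, Set[str]] = defaultdict(set)
        self._lfns_by_pfn: DefaultDict[str, Set[str]] = defaultdict(set)

    def register(self, lfn: str, pfn: str) -> None:
        self._pfns_by_lfn[lfn].add(pfn)
        self._lfns_by_pfn[pfn].add(lfn)

    def pfns(self, lfn: str) -> List[str]:
        return sorted(self._pfns_by_lfn.get(lfn, ()))

    def lfns(self, pfn: str) -> List[str]:
        return sorted(self._lfns_by_pfn.get(pfn, ()))

    def is_cached_on(self, lfn: str, url_prefix: str) -> bool:
        """True if some PFN for this LFN lives under url_prefix."""
        return any(p.startswith(url_prefix) for p in self.pfns(lfn))


if __name__ == "__main__":
    catalog = ToyCatalog()
    shared = "gsiftp://resource1/cache/data.bin"  # one physical copy
    catalog.register("input-for-job-a", shared)
    catalog.register("input-for-job-b", shared)   # second name, same file
    print(catalog.lfns(shared))
    print(catalog.is_cached_on("input-for-job-b", "gsiftp://resource1/"))
```

Note that this only gives you two names in the catalog; the file on
disk still has a single path, so each job must be pointed at that path.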


Rob
