Take a look at "Best Practices for Shareable Metadata": http://webservices.itcs.umich.edu/mediawiki/oaibp/index.php/ShareableMetadataPublic
There is a specific section on "Linking from a Record to a Resource and Other Linking Issues".

Regards,
Tom

> -----Original Message-----
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
> Joe Hourcle
> Sent: Monday, February 27, 2012 10:43 AM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] "Repositories", OAI-PMH and web crawling
>
> On Feb 27, 2012, at 10:51 AM, Godmar Back wrote:
>
> > On Mon, Feb 27, 2012 at 8:31 AM, Diane Hillmann
> > <metadata.ma...@gmail.com> wrote:
> >
> >> On Mon, Feb 27, 2012 at 5:25 AM, Owen Stephens
> >> <o...@ostephens.com> wrote:
> >>
> >>> This issue is certainly not unique to VT - we've come across this as
> >>> part of our project. While the OAI-PMH record may point at the PDF,
> >>> it can also point to an intermediary page. This seems to be standard
> >>> practice in some instances - I think because there is a desire, or
> >>> even a requirement, that a user should see the intermediary page
> >>> (which may contain rights information etc.) before viewing the
> >>> full-text item. There may also be an issue where multiple files
> >>> exist for the same item - maybe several data files and a PDF of the
> >>> thesis attached to the same metadata record - as the metadata via
> >>> OAI-PMH may not describe each asset.
> >>
> >> This has been an issue since the early days of OAI-PMH, and many
> >> large providers provide such intermediate pages (arxiv.org, for
> >> instance). The other issue driving providers towards intermediate
> >> pages is that they allow providers to continue to derive statistics
> >> from usage of their materials, which direct access URIs and multiple
> >> web caches don't. For providers dependent on external funding, this
> >> is a biggie.
> >
> > Why do you place direct access URIs and multiple web caches into the
> > same category? I follow your argument re: usage statistics for web
> > caches, but as long as the item remains hosted in the repository,
> > direct access URIs should still be counted (provided proper
> > cache-control headers are sent). Perhaps it would require server-side
> > statistics rather than client-based GA.
>
> I'd agree -- if you can't get good statistics from direct linking,
> something's wrong with the methods you're using to collect usage
> information. Google Analytics and similar tools might produce pretty
> reports, but they're really meant for tracking web sites and won't work
> when someone has JavaScript turned off, has specifically blacklisted
> the analytics server, or on anything that's not HTML.
>
> You *really* need to analyze the server logs directly, as you can't be
> sure that all access is going to go through the intermediate 'landing
> pages' or that it'd be tracked even if they did.
>
> ...
>
> I admit, the stuff I'm serving is a little different than most people
> on this list, but we also have the issue that the collections are so
> large that we don't want people retrieving the files unless they really
> need them. We serve multiple TB per day -- I'd rather a person figure
> out if they want a file *before* they retrieve it, rather than download
> a few GB of data and find out it won't serve their purposes.
>
> It might not help our 'look how much we serve!' metrics to justify our
> funding, but it helps keep our costs down, and I personally believe it
> helps with good will in our designated community, as they don't spend a
> day (or more) downloading only to find it's not what they thought. (And
> it fits in with Ranganathan's 4th law better than saving them from an
> extra click.)
>
> -Joe
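The server-log approach Joe describes can be sketched in a few lines. This is a minimal illustration, not anyone's production code: it assumes Apache/nginx "combined" format access logs, and the `/etds/` path prefix and the sample log lines are hypothetical. It counts successful direct GETs per item (including `206` partial-content responses) while skipping obvious crawlers, which is roughly the counting that client-side analytics misses.

```python
import re
from collections import Counter

# Combined log format:
#   IP - - [date] "METHOD /path HTTP/1.x" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

BOT_HINTS = ("bot", "crawler", "spider")  # crude user-agent filter


def count_downloads(lines, prefix="/etds/"):
    """Count successful GETs of items under `prefix`, skipping obvious bots."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # line didn't parse; skip it
        if m.group("method") != "GET":
            continue
        if m.group("status") not in ("200", "206"):  # 206 = partial content
            continue
        if any(h in m.group("agent").lower() for h in BOT_HINTS):
            continue
        if m.group("path").startswith(prefix):
            counts[m.group("path")] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical log lines: one real reader, one crawler.
    sample = [
        '1.2.3.4 - - [27/Feb/2012:10:43:00 -0500] '
        '"GET /etds/thesis123.pdf HTTP/1.1" 200 1048576 "-" "Mozilla/5.0"',
        '5.6.7.8 - - [27/Feb/2012:10:44:00 -0500] '
        '"GET /etds/thesis123.pdf HTTP/1.1" 200 1048576 "-" "Googlebot/2.1"',
    ]
    print(count_downloads(sample))
```

A real deployment would need a better bot list and deduplication of repeat requests, but even this much captures direct-URI traffic that never touches a landing page.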