Re: RDF Update Feeds + URI time travel on HTTP-level

Herbert Van de Sompel Mon, 23 Nov 2009 20:04:26 -0800

On Nov 23, 2009, at 4:59 PM, Erik Hetzner wrote:

At Mon, 23 Nov 2009 00:40:33 -0500,
Mark Baker wrote:

On Sun, Nov 22, 2009 at 11:59 PM, Peter Ansell <ansell.pe...@gmail.com> wrote:

It should be up to resource creators to determine when the nature of a
resource changes across time. A web architecture that requires every
single edit to have a different identifier is a large hassle and
likely won't catch on if people find that they can work fine with a
system that evolves constantly using semi-constant identifiers, rather
than through a series of mandatory time-based checkpoints.


You seem to have read more into my argument than was there, and
created a strawman; I agree with the above.

My claim is simply that all HTTP requests, no matter the headers, are
requests upon the current state of the resource identified by the
Request-URI, and therefore, a request for a representation of the
state of "Resource X at time T" needs to be directed at the URI for
"Resource X at time T", not "Resource X".


I think this is a very compelling argument.

Actually, I don't think it is. The issue was also brought up (in a significantly more tentative manner) in Pete Johnston's blog entry on eFoundations (http://efoundations.typepad.com/efoundations/2009/11/memento-and-negotiating-on-time.html). Tomorrow, we will post a response that will try to show that the "current state" issue is, as far as we can see, not quite as "written in stone" in the specs that matter in this case (i.e. Architecture of the World Wide Web and RFC 2616) as suggested above. Both are interestingly vague about this.


On the other hand, there is nothing I can see that prevents one URI
from representing another URI as it changes through time. This is
already the case with, e.g.,
<http://web.archive.org/web/*/http://example.org>, which represents
the URI <http://example.org> at all times. So this URI could, perhaps,
be a target for X-Accept-Datetime headers.
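To make the suggestion concrete, here is a minimal client-side sketch (not part of either proposal) that builds a GET aimed at such an "all times" URI and carries the desired datetime in an X-Accept-Datetime header. It assumes the header value is an HTTP-date in RFC 1123 format; the function name datetime_request is hypothetical.

```python
import urllib.request
from datetime import datetime, timezone
from email.utils import format_datetime


def datetime_request(target_uri: str, when: datetime) -> urllib.request.Request:
    """Build (but do not send) a GET asking target_uri for its state at `when`.

    Hypothetical helper; the HTTP-date value format is an assumption.
    """
    req = urllib.request.Request(target_uri, method="GET")
    # format_datetime(..., usegmt=True) only accepts a UTC datetime, so convert first.
    stamp = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    req.add_header("X-Accept-Datetime", stamp)
    return req


req = datetime_request(
    "http://web.archive.org/web/*/http://example.org",
    datetime(2009, 10, 12, tzinfo=timezone.utc),
)
```

Passing req to urllib.request.urlopen would send it and, by default, follow any redirect the server answers with; note that urllib stores header names via str.capitalize(), so the header is read back as req.get_header("X-accept-datetime").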

That is actually what we do in Memento (see our paper http://arxiv.org/abs/0911.1112), and we recognize two cases here:

(1) If the web server does not keep track of its own archival versions, then we must rely on archival versions that are stored elsewhere, i.e. in Web Archives. In this case, the original server that receives the request can redirect the client to a resource like the one you mention above, i.e. a resource that stands for archived versions of another resource. Note that this redirect is a simple redirect like the ones that happen all the time on the Web. It is not a redirect that is part of a datetime content negotiation flow, but rather a redirect that occurs because the server has detected an X-Accept-Datetime header. Now, we don't want to overload the existing <http://web.archive.org/web/*/http://example.org> as you suggest, but rather choose to introduce a special-purpose resource that we call a TimeGate, <http://web.archive.org/web/timegate/http://example.org>. And we indeed introduce this resource as a target for datetime content negotiation.

(2) If the web server does keep track of its own archival versions (think CMS), then it can handle requests for old versions "locally", as it has all the information that is required to do so. In this case, we could also introduce a special-purpose, distinct TimeGate on this server, and have the original resource redirect to it. That would make this case in essence the same as (1) above. This, however, seemed like a bit of overkill, and we felt that the original resource and the TimeGate could coincide, meaning that datetime content negotiation occurs directly against the original resource. That is, the URI that represents the resource as it evolves over time is the URI of the resource itself. It stands for past and present versions. The present version is delivered (200 OK) from that URI itself (business as usual); archived versions are delivered from other resources via content negotiation (302 with a Location different from the original URI).
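The two cases can be sketched as a single handler for the original resource. This is an illustration under stated assumptions, not Memento code: the function handle_original and the constant EXTERNAL_TIMEGATE are hypothetical, the header value is assumed to be an HTTP-date, and the version-selection rule (latest version not after the requested datetime, else the earliest one) is an assumption made for the sketch.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Dict, Optional, Tuple

# Hypothetical external TimeGate prefix, modelled on the example URI in case (1).
EXTERNAL_TIMEGATE = "http://web.archive.org/web/timegate/"

Response = Tuple[int, Dict[str, str]]


def handle_original(uri: str, headers: Dict[str, str],
                    versions: Optional[Dict[datetime, str]] = None) -> Response:
    """Response of the original resource `uri` to a GET carrying `headers`.

    `versions` maps archival datetimes to archived-version URIs when the server
    keeps its own archive (case 2); None or empty means it keeps none (case 1).
    """
    # Header names are case-insensitive in HTTP.
    accept = next((v for k, v in headers.items()
                   if k.lower() == "x-accept-datetime"), None)
    if accept is None:
        return 200, {}  # business as usual: current state from the original URI
    if not versions:
        # Case (1): a plain redirect to an external TimeGate that negotiates.
        return 302, {"Location": EXTERNAL_TIMEGATE + uri}
    # Case (2): the original resource acts as its own TimeGate.
    wanted = parsedate_to_datetime(accept)
    earlier = [t for t in versions if t <= wanted]
    # Assumed rule: latest version not after `wanted`, else the earliest one.
    chosen = max(earlier) if earlier else min(versions)
    return 302, {"Location": versions[chosen]}
```

In both branches the archived representation is served from a URI other than the original one; the original URI only answers 200 for the present state or 302 for the past.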

In both (1) and (2) the original resource plays a role in the framework, either because it redirects to an external TimeGate that performs the datetime content negotiation, or because it performs the datetime content negotiation itself. And we actually think it is quite essential that this original resource is involved. It is the URI of the original resource by which the resource has been known as it evolved over time. It makes sense to be able to use that URI to try and get to its past versions. And by "get", I don't mean search for it, but rather use the network to get there. After all, we all go by the same name irrespective of the day you talk to us. Or we have the same Linked Data URI irrespective of the day it is dereferenced. Why would we suddenly need a new URI when we want to see what the LoD description for any of us was, say, a year ago? Why must we prevent this same URI from helping us get to prior versions?


There is something else that I find problematic about the Memento
proposal. Archival versions of a web page are too important to hide
inside HTTP headers.

To take the canonical example, if I am viewing
<http://oakland.example.org/weather>, I don’t want the fact that I am
viewing historical weather information to be hidden in the request
headers.

It is not. The _request_ for prior versions is in a request header. The response will come from a URI different from <http://oakland.example.org/weather>, e.g. <http://oakland.example.org/20091012/weather> or <http://web.archive.org/web/20091012/http://oakland.example.org/weather>, and there will be a response header provided by the server that delivers this response (X-Archive-Interval) that informs the client unambiguously that the response _is_ an archived version. This info can be leveraged by the client to give the archived version the position of first-class citizen it deserves.
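On the client side, the check described above could look roughly like the sketch below. It only tests for the presence of X-Archive-Interval, because the structure of its value is not specified in this thread; the helper names is_archived and describe are hypothetical.

```python
from typing import Mapping


def is_archived(response_headers: Mapping[str, str]) -> bool:
    """True if the server flagged the response as an archived version.

    Only the presence of X-Archive-Interval is checked (its value format is
    not assumed here); header names are compared case-insensitively.
    """
    return any(name.lower() == "x-archive-interval" for name in response_headers)


def describe(final_uri: str, response_headers: Mapping[str, str]) -> str:
    """Hypothetical label a client could show next to the delivered page."""
    if is_archived(response_headers):
        return f"Archived version: {final_uri}"
    return f"Current version: {final_uri}"
```

A browser or Linked Data client could use such a flag to render archived versions differently from current ones, rather than relying on the user to remember what they asked for.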

Furthermore, if I am viewing resource X as it appeared at time T1, I
should *not* be able to copy that URI and send it to a friend, or use
it as a reference in a document, only to have them see the URI as it
appears at time T2.

You will not. You would copy the URI <http://oakland.example.org/20091012/weather> or <http://web.archive.org/web/20091012/http://oakland.example.org/weather>. I think the misconception in this discussion is that the archived version is _delivered_ by the original URI. It is not. The archived version is _requested_ via the original URI, and it is _delivered_ by a resource at another URI, as is the case with all content negotiation.
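The request/delivery distinction can be illustrated with a small sketch of how a client might decide which URI to copy or cite: the one that actually delivered the representation, i.e. the redirect target rather than the original URI. The helper citable_uri is hypothetical and only looks at a single response, not a whole redirect chain.

```python
from typing import Dict
from urllib.parse import urljoin

REDIRECT_STATUSES = (301, 302, 303, 307, 308)


def citable_uri(request_uri: str, status: int, headers: Dict[str, str]) -> str:
    """URI to copy or cite after one request/response exchange.

    On a redirect, the representation is delivered from Location, so that is
    the URI to share; otherwise the requested URI delivered it itself.
    """
    if status in REDIRECT_STATUSES and "Location" in headers:
        # Location may be relative; resolve it against the requested URI.
        return urljoin(request_uri, headers["Location"])
    return request_uri
```

With the negotiation flow described above, a datetime request to http://oakland.example.org/weather answered by a 302 yields the archived URI, while an ordinary request answered with 200 OK keeps the original URI.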

I think that those of us in the web archiving community [1] would very
much appreciate a serious look by the web architecture community into
the problem of web archiving. The problem of representing and
resolving the tuple <URI, time> is a question which has not yet been
adequately dealt with.
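As a baseline for what resolving <URI, time> means in practice, the sketch below shows the simplest mapping: encoding the tuple into a Wayback-style URI, following the pattern of the example URIs quoted in this thread (an 8-digit date between the archive prefix and the original URI). The prefix and day granularity are assumptions taken from those examples; the archive behind such a URI would still have to choose which capture to return.

```python
from datetime import datetime

# Assumed archive prefix, taken from the example URIs in this thread.
ARCHIVE_PREFIX = "http://web.archive.org/web/"


def archive_uri(uri: str, when: datetime) -> str:
    """Encode the tuple <uri, when> as a Wayback-style URI (day granularity)."""
    return f"{ARCHIVE_PREFIX}{when.strftime('%Y%m%d')}/{uri}"
```

Such a mapping is tied to one archive's URI conventions, which is part of why a protocol-level answer to the <URI, time> question is being discussed.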

I hope that with Memento we have provided a significant contribution towards addressing that question. I think our paper at http://arxiv.org/abs/0911.1112 describes the proposed solution in quite some detail, and addresses quite a few of the concerns raised in the discussion on this list so far. And, as indicated before, there are also the slides in case there is not enough time to read the paper (http://www.slideshare.net/hvdsomp/memento-time-travel-for-the-web).


Greetings

Herbert Van de Sompel


best,
Erik Hetzner

1. Those unfamiliar with web archives are encouraged to visit
<http://web.archive.org/>, <http://www.archive-it.org/>,
<http://www.vefsafn.is/>, <http://webarchives.cdlib.org/>, ...
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3


==
Herbert Van de Sompel
Digital Library Research & Prototyping
Los Alamos National Laboratory, Research Library
http://public.lanl.gov/herbertv/
tel. +1 505 667 1267
