On Nov 23, 2009, at 4:59 PM, Erik Hetzner wrote:
At Mon, 23 Nov 2009 00:40:33 -0500,
Mark Baker wrote:
On Sun, Nov 22, 2009 at 11:59 PM, Peter Ansell <ansell.pe...@gmail.com> wrote:
It should be up to resource creators to determine when the nature of a
resource changes across time. A web architecture that requires every
single edit to have a different identifier is a large hassle and
likely won't catch on if people find that they can work fine with a
system that evolves constantly using semi-constant identifiers, rather
than through a series of mandatory time-based checkpoints.
You seem to have read more into my argument than was there, and
created a strawman; I agree with the above.
My claim is simply that all HTTP requests, no matter the headers, are
requests upon the current state of the resource identified by the
Request-URI, and therefore, a request for a representation of the
state of "Resource X at time T" needs to be directed at the URI for
"Resource X at time T", not "Resource X".
I think this is a very compelling argument.
Actually, I don't think it is. The issue was also raised (in a
significantly more tentative manner) in Pete Johnston's blog entry on
eFoundations (http://efoundations.typepad.com/efoundations/2009/11/memento-and-negotiating-on-time.html).
Tomorrow, we will post a response that will try to show that the
"current state" issue is, as far as we can see, not quite as "written
in stone" as suggested above in the specs that matter in this case,
i.e. Architecture of the World Wide Web and RFC 2616. Both are
interestingly vague about this.
On the other hand, there is nothing I can see that prevents one URI
from representing another URI as it changes through time. This is
already the case with, e.g.,
<http://web.archive.org/web/*/http://example.org>, which represents
the URI <http://example.org> at all times. So this URI could, perhaps,
be a target for X-Accept-Datetime headers.
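The two archive URI forms that appear in this thread (the "all times"
form with `*`, and a dated capture such as `20091012`) can be sketched
as plain string templates. The helper names below are hypothetical;
the sketch only mirrors the patterns quoted here and assumes nothing
else about what the archive accepts.

```python
ARCHIVE_BASE = "http://web.archive.org/web/"


def all_times_uri(original: str) -> str:
    """URI standing for `original` at all times (the '*' form)."""
    return ARCHIVE_BASE + "*/" + original


def capture_uri(original: str, stamp: str) -> str:
    """URI of a dated capture; `stamp` is e.g. '20091012' as in this thread."""
    return ARCHIVE_BASE + stamp + "/" + original
```

For example, `all_times_uri("http://example.org")` yields
`http://web.archive.org/web/*/http://example.org`.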
That is actually what we do in Memento (see our paper
http://arxiv.org/abs/0911.1112), and we recognize two cases here:
(1) If the web server does not keep track of its own archival
versions, then we must rely on archival versions that are stored
elsewhere, i.e. in Web Archives. In this case, the original server that
receives the request can redirect the client to a resource like the
one you mention above, i.e. a resource that stands for archived
versions of another resource. Note that this redirect is a simple
redirect like the ones that happen all the time on the Web. It is not
part of a datetime content negotiation flow; rather, it occurs because
the server has detected an X-Accept-Datetime header. Now, we don't
want to overload the existing
<http://web.archive.org/web/*/http://example.org> as you suggest, but
rather choose to introduce a special-purpose resource that we call a
TimeGate <http://web.archive.org/web/timegate/http://example.org>. And
we indeed introduce this resource as a target for datetime content
negotiation.
(2) If the web server does keep track of its own archival versions
(think CMS), then it can handle requests for old versions "locally", as
it has all the information required to do so. In this case, we could
also introduce a special-purpose, distinct TimeGate on this server,
and have the original resource redirect to it. That would make this
case in essence the same as (1) above. This, however, seemed like a
bit of overkill, and we felt that the original resource and the
TimeGate could coincide, meaning that datetime content negotiation
occurs directly against the original resource. The URI that
represents the resource as it evolves over time is then the URI of the
resource itself: it stands for past and present versions. The present
version is delivered (200 OK) from that URI itself (business as
usual); archived versions are delivered from other resources via
content negotiation (302 with a Location different from the original
URI).
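Case (2), where the original resource acts as its own TimeGate, might
look like the sketch below. Several things are assumptions rather than
part of the proposal as described here: the X-Accept-Datetime value is
taken to be an HTTP-date, the archive is a hypothetical in-memory
table (the 20091101 URI is invented for illustration), and the
selection rule is "latest capture at or before the requested time,
else the earliest capture".

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Dict, Mapping, Tuple

# Hypothetical CMS archive: capture time -> URI of that archived version.
ARCHIVE: Dict[datetime, str] = {
    datetime(2009, 10, 12, tzinfo=timezone.utc): "http://oakland.example.org/20091012/weather",
    datetime(2009, 11, 1, tzinfo=timezone.utc): "http://oakland.example.org/20091101/weather",
}


def negotiate(headers: Mapping[str, str]) -> Tuple[int, Dict[str, str]]:
    """The original resource performing datetime negotiation itself."""
    lowered = {name.lower(): value for name, value in headers.items()}
    requested = lowered.get("x-accept-datetime")
    if requested is None:
        # Present version: 200 OK from the original URI (business as usual).
        return 200, {}
    try:
        when = parsedate_to_datetime(requested)
    except (TypeError, ValueError):
        return 400, {}  # unparseable datetime
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    # Assumed rule: latest capture at or before `when`, else the earliest.
    earlier = [t for t in ARCHIVE if t <= when]
    chosen = max(earlier) if earlier else min(ARCHIVE)
    # Archived version: 302 to a URI different from the original.
    return 302, {"Location": ARCHIVE[chosen]}
```

A request carrying `X-Accept-Datetime: Thu, 15 Oct 2009 12:00:00 GMT`
would be redirected to the 20091012 version.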
In both (1) and (2) the original resource plays a role in the
framework, either because it redirects to an external TimeGate that
performs the datetime content negotiation, or because it performs the
datetime content negotiation itself. And we think it is quite
essential that this original resource is involved. It is the URI of
the original resource by which the resource has been known as it
evolved over time. It makes sense to be able to use that URI to try to
get to its past versions. And by "get", I don't mean search for them,
but rather use the network to get there. After all, we all go by the
same name irrespective of the day you talk to us. Or we have the same
Linked Data URI irrespective of the day it is dereferenced. Why would
we suddenly need a new URI when we want to see what the Linked Data
description for any of us was, say, a year ago? Why should that same
URI not help us get to prior versions?
There is something else that I find problematic about the Memento
proposal. Archival versions of a web page are too important to hide
inside HTTP headers.
To take the canonical example, if I am viewing
<http://oakland.example.org/weather>, I don’t want the fact that I am
viewing historical weather information to be hidden in the request
headers.
It is not. The _request_ for prior versions is in a request header.
The response will come from a URI different from
<http://oakland.example.org/weather>, e.g.
<http://oakland.example.org/20091012/weather> or
<http://web.archive.org/web/20091012/http://oakland.example.org/weather>,
and the server that delivers this response will provide a response
header (X-Archive-Interval) that informs the client unambiguously that
the response _is_ an archived version. The client can leverage this
information to give the archived version the first-class status it
deserves.
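On the client side, surfacing an archived version could be as simple
as checking for the X-Archive-Interval response header and labelling
the page accordingly. This sketch checks only for the header's
presence, because the value syntax is not given here; `is_archived`
and `describe` are hypothetical names.

```python
from typing import Mapping


def is_archived(response_headers: Mapping[str, str]) -> bool:
    """True if the server marked the response as an archived version."""
    # Header names are case-insensitive; the value syntax is not assumed.
    return any(name.lower() == "x-archive-interval" for name in response_headers)


def describe(final_uri: str, response_headers: Mapping[str, str]) -> str:
    """Label a client might display so the archived version is not hidden."""
    if is_archived(response_headers):
        return "Archived version: " + final_uri
    return "Current version: " + final_uri
```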
Furthermore, if I am viewing resource X as it appeared at time T1, I
should *not* be able to copy that URI and send it to a friend, or use
it as a reference in a document, only to have them see the resource as
it appears at time T2.
You will not. You would copy the URI
<http://oakland.example.org/20091012/weather> or
<http://web.archive.org/web/20091012/http://oakland.example.org/weather>.
I think the misconception in this discussion is that the archived
version is _delivered_ by the original URI. It is not. The archived
version is _requested_ via the original URI, and it is _delivered_ by
a resource at another URI. As is the case with all content negotiation.
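The distinction between where a version is _requested_ and where it is
_delivered_ can be simulated in-process. In the sketch below, a
hypothetical in-memory table of URI handlers stands in for the network,
and the client follows redirects while re-sending its request headers.
The URI it ends on is the one that delivered the archived version, and
therefore the one a user would copy. The X-Archive-Interval value is a
placeholder, and the exact-case header lookup is a simplification.

```python
from typing import Callable, Dict, Mapping, Tuple

Response = Tuple[int, Dict[str, str]]

ORIGINAL = "http://oakland.example.org/weather"
ARCHIVED = "http://oakland.example.org/20091012/weather"


def original_resource(headers: Mapping[str, str]) -> Response:
    # Requests for a prior version arrive here, at the original URI ...
    if "X-Accept-Datetime" in headers:  # exact-case lookup keeps the sketch short
        return 302, {"Location": ARCHIVED}
    return 200, {}


def archived_resource(headers: Mapping[str, str]) -> Response:
    # ... but the archived version is delivered from a different URI.
    # Placeholder value: the header's real syntax is not assumed here.
    return 200, {"X-Archive-Interval": "placeholder"}


# Hypothetical in-memory "web" mapping URIs to handlers.
WEB: Dict[str, Callable[[Mapping[str, str]], Response]] = {
    ORIGINAL: original_resource,
    ARCHIVED: archived_resource,
}


def fetch(uri: str, headers: Mapping[str, str], max_hops: int = 5) -> Tuple[str, Response]:
    """Follow redirects, re-sending the same request headers on each hop."""
    for _ in range(max_hops):
        status, resp_headers = WEB[uri](headers)
        if status in (301, 302, 303, 307, 308) and "Location" in resp_headers:
            uri = resp_headers["Location"]
            continue
        # `uri` is now the URI that actually delivered the response.
        return uri, (status, resp_headers)
    raise RuntimeError("too many redirects")
```

Calling `fetch(ORIGINAL, {"X-Accept-Datetime": "Mon, 12 Oct 2009 12:00:00 GMT"})`
ends at the 20091012 URI, while the same call without the header ends
at the original URI.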
I think that those of us in the web archiving community [1] would very
much appreciate a serious look by the web architecture community into
the problem of web archiving. The question of representing and
resolving the tuple <URI, time> has not yet been adequately dealt
with.
I hope that with Memento we have made a significant contribution
towards addressing that question. I think our paper at
http://arxiv.org/abs/0911.1112 describes the proposed solution in
considerable detail, and addresses many of the concerns raised on this
list so far. And, as indicated before, there are also slides, in case
there is not enough time to read the paper
(http://www.slideshare.net/hvdsomp/memento-time-travel-for-the-web).
Greetings
Herbert Van de Sompel
best,
Erik Hetzner
1. Those unfamiliar with web archives are encouraged to visit
<http://web.archive.org/>, <http://www.archive-it.org/>,
<http://www.vefsafn.is/>, <http://webarchives.cdlib.org/>, ...
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3
==
Herbert Van de Sompel
Digital Library Research & Prototyping
Los Alamos National Laboratory, Research Library
http://public.lanl.gov/herbertv/
tel. +1 505 667 1267