Re: Change Proposal for HttpRange-14

Jeni Tennison Sat, 24 Mar 2012 04:20:05 -0700

Hi Hugh,

On 24 Mar 2012, at 10:02, Hugh Glaser wrote:
> Please can you clarify something for me?
> (I am not very good at reading these formal documents - a bear of little 
> brain, perhaps.)


I will try my best.

> Am I right in thinking that, under your Change Proposal, the following sort 
> of thing becomes possible (I hope I am getting it right).
> Taking a site such as myexperiment.org (but it could very easily be the 
> eprints software, BBC, or even dbpedia.)
> See http://www.myexperiment.org/workflows/16
> A huge barrier to adoption of LD for them was that their users would be 
> exposed to the intricacies of the different URIs, and in particular that if 
> myexperiment.org moved over to using LD URIs completely, users would not be 
> able to cut and paste them from the address bar, etc.
> Great confusion would ensue, especially as their workflows already offered 
> XML in addition to the HTML.

Right.

> This was a Bad Thing for them - their users were only just coming to terms 
> with all this online workflow stuff, and could easily get spooked.
> They nearly didn't do it, but because many of their technology providers were 
> Linked Data people, it went ahead (a few years ago now).
> The current outcome is what you see at the bottom of the workflow page - a 
> panel offering the different URIs, with a link to a page describing the 
> Linked Data world (to Chemists), which they are expected to understand.
> (Hash URIs might have been a bit better, but introduced a different mechanism 
> from the XML.)

Yep.

> As a result of your Change Proposal, it would have been acceptable (*if they 
> wanted*), to simply add RDF as a Content Negotiation option, and deliver an 
> RDF document with 200, in response to -H Accept:application/rdf+xml 
> http://www.myexperiment.org/workflows/16, just as they did for XML, I think.
> And this would enable them to use http://www.myexperiment.org/workflows/16 as 
> the anchor throughout the site (as they do) and have the same URI in the 
> address bar, and in fact have http://www.myexperiment.org/workflows/16 as the 
> only thing users see.
> Is that right?

Yes. They could have used http://www.myexperiment.org/workflows/16 throughout 
the site, and had it respond with a 200, using conneg to return either HTML or 
RDF as required. Nobody would have needed linked data expertise to work out that 
referring to the workflow meant copying its URI from the box at the bottom of 
the HTML page rather than from the location bar at the top, where you usually 
copy URIs from: the URI in the location bar would simply have been the one to use.

They could also (as they do now) have had separate URIs for the individual 
formats, like:

  http://www.myexperiment.org/workflows/16.html
  http://www.myexperiment.org/workflows/16.rdf
  http://www.myexperiment.org/workflows/16.xml

They could have included within the RDF that you got from 
http://www.myexperiment.org/workflows/16 statements of the form:

  @prefix wdrs: <http://www.w3.org/2007/05/powder-s#> .

  <http://www.myexperiment.org/workflows/16>
    wdrs:describedby <http://www.myexperiment.org/workflows/16.html> ;
    wdrs:describedby <http://www.myexperiment.org/workflows/16.rdf> ;
    wdrs:describedby <http://www.myexperiment.org/workflows/16.xml> .

This would have enabled them to make separate statements about the licensing 
and provenance of the information held in those documents. If they didn't want 
to make those kinds of statements or enable those formats to be individually 
addressable, they could have just supported the 
http://www.myexperiment.org/workflows/16 URL and used conneg.
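Because the format-specific URIs follow a regular pattern, statements like these could be generated rather than written by hand. A minimal sketch in Python, assuming each document URI is the workflow URI plus a file extension (as in the examples above) and that `wdrs:` is the POWDER-S namespace; the function name is hypothetical:

```python
# Generate wdrs:describedby statements (Turtle) linking a resource URI to
# format-specific document URIs. The "URI + .ext" pattern is an assumption
# taken from the myexperiment.org examples.
WDRS = "http://www.w3.org/2007/05/powder-s#"


def describedby_turtle(resource_uri, extensions=("html", "rdf", "xml")):
    """Return a Turtle snippet with one describedby statement per format."""
    objects = [f"<{resource_uri}.{ext}>" for ext in extensions]
    lines = [f"@prefix wdrs: <{WDRS}> .", "", f"<{resource_uri}>"]
    # Repeat the predicate for each object, separated by ';' and closed by '.'.
    statements = [f"  wdrs:describedby {obj}" for obj in objects]
    lines.append(" ;\n".join(statements) + " .")
    return "\n".join(lines)


print(describedby_turtle("http://www.myexperiment.org/workflows/16"))
```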

> Apropos Doing It Wrong:
> It is interesting to note that I see myexperiment.org have made the practical 
> decision to 303 to the RDF from 
> curl -i -L -H Accept:application/rdf+xml 
> http://www.myexperiment.org/workflows/16.html
> which suggests that they are already subverting things to get round some sort 
> of problem.

It looks as though it's:

  http://www.myexperiment.org/workflows/16.html
  -> 301 -> http://www.myexperiment.org/workflows/16
  -> 303 -> http://www.myexperiment.org/workflows/16.rdf
  -> 200

Technically, I think that per http://www.w3.org/2001/tag/doc/uddp/#idp439264 
this should mean you can infer that http://www.myexperiment.org/workflows/16.html 
is sameAs http://www.myexperiment.org/workflows/16, but I'm not 100% sure what's 
intended (I think this needs spelling out).
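The chain above can be replayed without touching the network. A minimal sketch in Python: the response table only restates the hops observed when asking for RDF, the function names are hypothetical, and `aliases_via_301` encodes just the tentative reading that a 301 joins two names for the same thing, which, as noted, the draft doesn't spell out:

```python
# Replay a redirect chain from an in-memory table, in the spirit of curl -L.
# The table restates the hops seen for Accept: application/rdf+xml.
BASE = "http://www.myexperiment.org/workflows/"
RESPONSES = {
    BASE + "16.html": (301, BASE + "16"),
    BASE + "16": (303, BASE + "16.rdf"),
    BASE + "16.rdf": (200, None),
}
REDIRECT_CODES = {301, 302, 303, 307, 308}


def follow(uri, responses=RESPONSES, max_hops=10):
    """Return [(uri, status), ...] up to and including the final response."""
    hops = []
    for _ in range(max_hops):
        status, location = responses[uri]
        hops.append((uri, status))
        if status not in REDIRECT_CODES:
            return hops
        uri = location
    raise RuntimeError("too many redirects")


def aliases_via_301(hops):
    """URI pairs joined by a 301 hop: candidates for a sameAs inference."""
    return [(hops[i][0], hops[i + 1][0])
            for i in range(len(hops) - 1) if hops[i][1] == 301]


chain = follow(BASE + "16.html")
for uri, status in chain:
    print(status, uri)
print(aliases_via_301(chain))
```

The 303 hop is deliberately not treated as an alias: it points to a different document rather than to another name for the same thing.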

> Few sites I can find (apart from dbpedia) actually return 406 when you ask 
> the HTML URI for RDF: they usually return the HTML.
> It is a foolish agent that relies on RDF coming back from a 200 OK when it 
> has asked for application/rdf+xml.

Yes.
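Put another way, a client that asked for RDF should check what it actually got. A minimal sketch in Python; the set of media types treated as RDF is an assumption, and the function names are hypothetical:

```python
# Decide whether a response to an Accept: application/rdf+xml request really
# carries RDF, instead of trusting the 200 status alone.
RDF_MEDIA_TYPES = {
    "application/rdf+xml",
    "text/turtle",
    "application/n-triples",
    "application/ld+json",
}


def media_type(content_type):
    """Drop parameters such as '; charset=utf-8' and normalise case."""
    if not content_type:
        return ""
    return content_type.split(";", 1)[0].strip().lower()


def is_rdf_response(status, content_type, accepted=RDF_MEDIA_TYPES):
    """True only for a 200 whose Content-Type is one of the accepted types."""
    return status == 200 and media_type(content_type) in accepted
```

Under this check, the HTML that many sites return with a 200 is rejected, and so is a 406.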

> Apropos Risk.
> You say there is no risk.
> Is this a risk?:
> There may be a serious increase in the number of URIs for current sites.
> 
> Taking Freebase as another example.
> (In fact any of these sites that have worked hard to conform to the current 
> regime will have a decision to make.)
> Presently, if I
> curl -i -L -H Accept:application/rdf+xml 
> http://www.freebase.com/view/en/engelbert_humperdinck
> it gives me back HTML.
> What will it do in future?
> I know this Change Proposal is not proposing that they need to change, but 
> will they?
> They already have http://rdf.freebase.com/ns/en.engelbert_humperdinck (and 
> http://rdf.freebase.com/ns/m.047vj6 and another longer one).
> Effectively http://www.freebase.com/view/en/engelbert_humperdinck becomes yet 
> another URI that people can use, since it would return RDF (as myexperiment).
> Obviously I am viewing this a bit from the sameAs.org viewpoint.
> I know that the resource in the RDF document will (should) never be the HTML 
> URI, but people can and possibly will start passing around the HTML URI as if 
> it was the "proper" URI, and so a sensible sameAs service would have it as a 
> way of looking up the "proper" URIs.
> In fact I have sometimes toyed with the idea of allowing look up by HTML URL 
> on sameAs.org (giving back only the "real" Linked Data URIs) - it is what a 
> user expects from such a query, after all.
> (I hope all that makes sense.)

I guess I don't quite see the distinction that you're making between "HTML 
URIs" and "proper" URIs. Perhaps that's because I've become too immersed in the 
world where RDF data is embedded within HTML pages. I think that where 
Jonathan's document says [1]:

  A "URI documentation carrier" for a URI is a representation that carries 
  URI documentation that bears on the meaning of that URI. Applying the 
  adjective "nominal" is a technicality that signifies that being a URI 
  documentation carrier for the URI is expected according to this 
  specification, but that it might not actually be one (for example, the 
  representation might be empty, or it might contain information, but not 
  information that helps to document the URI, perhaps as the result of a 
  mistake).

what he's trying to tease out is the fact that you might not get any data back 
about a particular URI when you request that URI, but what you do get back is 
still its (empty) documentation. The URI doesn't become meaningless just 
because you get nothing back; it doesn't mean others can't make statements 
about it.

So in my view we're already living in a world where those "HTML URLs" exist and 
are meaningful and a sameAs service could be making statements about them.
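A sameAs service can make HTML URLs useful as lookup keys, as Hugh suggests, while keeping them out of its sameAs bundles if it prefers. A minimal sketch in Python that keeps the two kinds of statement apart: sameAs bundles held in a union-find structure, and a separate index recording which bundle a page URL documents. The class and method names are hypothetical, and the Freebase URIs in the usage lines are illustrative, not data from sameAs.org:

```python
# A hypothetical sameAs-style service: URIs asserted to be owl:sameAs are
# grouped with union-find; page URLs are indexed separately so that looking
# one up returns the bundle it documents without claiming sameAs.


class SameAsService:
    def __init__(self):
        self.parent = {}  # union-find forest over URIs
        self.pages = {}   # page URL -> a URI in the bundle it documents

    def _find(self, uri):
        self.parent.setdefault(uri, uri)
        while self.parent[uri] != uri:
            self.parent[uri] = self.parent[self.parent[uri]]  # path halving
            uri = self.parent[uri]
        return uri

    def add_same_as(self, a, b):
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def add_page(self, page_url, uri):
        self._find(uri)  # make sure uri is known to the union-find
        self.pages[page_url] = uri

    def bundle(self, uri):
        """All URIs in the same sameAs bundle as uri, sorted."""
        root = self._find(uri)
        return sorted(u for u in list(self.parent) if self._find(u) == root)

    def lookup(self, uri):
        """Bundle for uri, or for the URI that a known page URL documents."""
        return self.bundle(self.pages.get(uri, uri))


svc = SameAsService()
svc.add_same_as("http://rdf.freebase.com/ns/en.engelbert_humperdinck",
                "http://rdf.freebase.com/ns/m.047vj6")
svc.add_page("http://www.freebase.com/view/en/engelbert_humperdinck",
             "http://rdf.freebase.com/ns/en.engelbert_humperdinck")
print(svc.lookup("http://www.freebase.com/view/en/engelbert_humperdinck"))
```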

Sorry, I'm probably missing something.

Where well-behaved sites will have to make a decision is whether to continue to 
use a 303 or switch to using a 200 and including a 'describedby' relationship. 
For example, we at legislation.gov.uk might be seriously tempted to switch to 
returning 200s from /id/ URIs. Currently, every request for an /id/ URI places a 
load on our origin server because the CDN can't cache the 303 response, so we 
try to avoid using them in links on our site even where we could (and really 
should). Consequently people referring to legislation don't use the /id/ URIs 
when what they are referring to is the legislation item, not a particular 
version of it. If we switched to a 200, we wouldn't have to avoid those URIs, 
which would in turn help us embed RDFa in our pages, because instead of having 
a reference in a footnote contain something like:

  <a rel="leg:references" 
     resource="/id/ukpga/1985/67/section/6"
     href="/ukpga/1985/67/section/6">1985 c. 67 s. 6</a>

we could just use:

  <a rel="leg:references" 
     href="/id/ukpga/1985/67/section/6">1985 c. 67 s. 6</a>

but none of this increases the number of URIs that we're using, it just makes 
us switch to referring to legislation items using the URIs that we'd designed 
to be used to refer to legislation items.
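The two options for an /id/ URI can be compared as a request handler. This is a minimal sketch in Python, not legislation.gov.uk's code: the mapping from an /id/ path to its document path is inferred from the RDFa example above, and the function names, page body and cache lifetime are hypothetical. The Link header is one way to express the 'describedby' relationship; it could equally be carried in the returned data.

```python
# Contrast a 303 redirect with a 200 that carries a describedby link, for
# /id/ URIs such as /id/ukpga/1985/67/section/6.


def document_for(id_path):
    """Assumed mapping: /id/ukpga/... -> /ukpga/... (the document)."""
    if not id_path.startswith("/id/"):
        raise ValueError("not an /id/ path: " + id_path)
    return id_path[len("/id"):]


def render(doc_path):
    """Stand-in for the real page renderer (hypothetical)."""
    return f"<html><body>{doc_path}</body></html>".encode("utf-8")


def respond(id_path, use_303):
    """Return (status, headers, body) for a GET on an /id/ URI."""
    doc = document_for(id_path)
    if use_303:
        # RFC 2616 says a 303 response MUST NOT be cached, so a CDN passes
        # every request for the /id/ URI through to the origin server.
        return 303, {"Location": doc}, b""
    # One cacheable 200 response; the link says which document describes it.
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        "Link": f'<{doc}>; rel="describedby"',
        "Cache-Control": "public, max-age=3600",
    }
    return 200, headers, render(doc)
```

With the 200 option the response can sit in the CDN like any other page, which is what would make it safe to link to the /id/ URIs directly.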

Cheers,

Jeni

[1] http://www.w3.org/2001/tag/doc/uddp/#carriage
-- 
Jeni Tennison
http://www.jenitennison.com
