Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.

2014-07-29 Thread Mark Fallu
Hi Michael,

You asked:

How can URIs from sparql endpoints or OAI-PMH contribute to page rank?


If party A:
- produces a system that uses 303-based Cool URIs to identify its content
  and, in addition to web pages, exposes that content to the world via a
  SPARQL endpoint or OAI-PMH,

and party B:
- harvests that information via the SPARQL endpoint or OAI-PMH and produces
  a public representation of the content that links back to party A,

then, if the link back is the Cool URI that resolves to a page via a 303
redirect and content negotiation, web spiders etc. will not be able to
follow that inbound link.

This means that some of the advantage of being machine-harvestable is
lost. Sure, your content is indexed, but the authority that comes from
other people and systems citing and reusing your content is greatly
diluted.

Cheers,

Mark


On Sat, Jul 19, 2014 at 1:52 AM, Michael Brunnbauer bru...@netestate.de
wrote:


 Hello Mark,

 I cannot remember this important topic coming up earlier - which is a bit
 disturbing.

 The problem would be mitigated by people using the URI they see for
 linking.

 Why not use the HTML URLs in the HTML pages for internal page rank flow?

 How can URIs from sparql endpoints or OAI-PMH contribute to page rank?

 A real problem would be RDFa where href also sets the object of a triple.

 Regards,

 Michael Brunnbauer

 On Fri, Jul 18, 2014 at 10:05:17PM +1000, Mark Fallu wrote:
  If the links we present to the outside world for harvesting (e.g. via a
  SPARQL endpoint, OAI-PMH or an OpenSocial widget) are the canonical
  individual URIs, clients will be able to get to the display URL, but the
  Google page rank that would normally flow from these external links will
  not.



 
  The specification of a 303 redirect describes it as:
  http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
 
   The response to the request can be found under a different URI and
   SHOULD be retrieved using a GET method on that resource. This method
   exists primarily to allow the output of a POST-activated script to
   redirect the user agent to a selected resource. *The new URI is not a
   substitute reference for the originally requested resource*. The 303
   response MUST NOT be cached, but the response to the second
   (redirected) request might be cacheable.
 
   The different URI SHOULD be given by the Location field in the
   response. Unless the request method was HEAD, the entity of the
   response SHOULD contain a short hypertext note with a hyperlink to the
   new URI(s).
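 
  To make the quoted requirements concrete, here is a minimal sketch of a
  303 response as a plain (status, headers, body) tuple. The function name
  see_other and the example.org URL are assumptions made for illustration;
  they do not come from the thread or from any particular server.

```python
from html import escape


def see_other(location: str) -> tuple[int, dict[str, str], str]:
    """Build a 303 See Other response as (status, headers, body)."""
    # The quoted text says the different URI SHOULD be given in Location.
    headers = {
        "Location": location,
        "Content-Type": "text/html; charset=utf-8",
    }
    # It also says the entity SHOULD carry a short hypertext note linking
    # to the new URI, so the body is a single escaped hyperlink.
    href = escape(location, quote=True)
    body = f'<p>See <a href="{href}">{href}</a></p>'
    return 303, headers, body


# Hypothetical display URL, used only for illustration.
status, headers, body = see_other("http://example.org/display/n123")
print(status, headers["Location"])
```

  Nothing in this response tells a client that the target replaces the
  requested URI, which is the point the highlighted sentence makes.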
 
 
  Google correctly implements the specification and does not assign the
  page rank of the individual URI to the display URL, as it is *not a
  substitute reference for the originally requested resource*.
 
  The same is true of internal links: a home page with high page rank will
  not pass page rank on to display URLs if the pathway to those URLs is via
  individual URI links.
 
  I am not sure what the solution is here as it seems the realms of SEO and
  the conventions of the web they are built on are not a good fit for
  semantic web best practice.
 
  The most minimal compromise I can think of is to move away from the use
  of a 303 redirect to a redirect that conserves the flow of Google page
  rank:
 
  - 302 Found is the recommended replacement for 303 for clients that do
    not support HTTP 1.1, and it does allow a certain amount of Google
    page rank to flow.
  - 301 Moved Permanently is a poor fit for the Cool URI pattern, but
    passes on the full page rank of the links.
  - Rewriting all URIs to the URL would also work, but would break the
    Cool URI pattern.
 
  The pragmatist in me feels that if we are going to make a change for the
  purposes of SEO, it might as well be the one with best return, i.e. 301
  redirect.
 
  Note: Indexing is not the problem here, content is indexed.  The issue
  relates to page rank not flowing through a 303 redirect.
 
  I have tested and can confirm that 303 redirects are an issue for a
  number of reasons:
 
  - Page rank does not flow through a 303 redirect.
  - Page rank cannot be assigned from a URL to a URI with a rel=canonical
    tag if the URI does a 303 redirect (preventing aggregation of page
    rank from external links to the URL).
  - URI and URL are indexed separately.
  - RDFa schema.org representations of URIs do not translate to URLs
    (i.e. a representation described at URL A, talking about URI B, does
    not get connected to the representation described at URL B).
  - URL parameters are not passed by a 303 redirect.
  - Google Analytics tracking is affected, e.g. traversing the site is
    seen as a series of direct page visits.
 
  Essentially, as far as search engines are concerned, every URL and URI
  is an island, with no connections between them. At best a URL can
  express a rel=canonical back to its corresponding URI, but no page rank
  will flow through links.
 
  Any guidance you can provide would be appreciated.
 
  --
 

Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.

2014-07-23 Thread John Walker
Hi Mark

Seems to me that there is something in common with Single Page Application
(SPA) design. Often you see that a static HTML copy (snapshot) of the site is
made for SEO purposes, because search bots generally did not execute
JavaScript and so did not 'see' the dynamic page.

Often the pages in the app get a fragment identifier so they can be bookmarked.

As the bot sees the static page, which link should be shown in search results? 
Presumably you want the (human) user to be taken to the SPA page, not the 
static HTML.

I've seen this done with server side redirects and redirects in JavaScript. Not 
sure what HTTP response code is used.

If my memory serves correctly, some browsers also did not include the fragment
part in bookmarks, so often you would see a 'bookmark this page' button that
would create a bookmark for the 'canonical' URL. That can be very confusing for
end users though.

How many times have we all seen a DBpedia /page URI used instead of the
/resource URI? We should not expect most regular users to understand this.

I get the feeling there must be a simple and elegant solution.

Regards,

John


Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.

2014-07-23 Thread Michael Smethurst
Hello

(Pretty sure I've made this comment before so please forgive any signs of
premature senility)

I think this may be an unfortunate side effect of the conflation of the
303 (I can't send that) pattern with the content negotiation (what
flavour would you like) pattern

Lots of linked data applications (like DBpedia) seem to couple the two
things together. So you have an individual URI which, when you attempt to
dereference it, does a 303 *and* conneg in one step to the display URI:
/resource -> 303+conneg -> /data
or
/resource -> 303+conneg -> /page
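
As a rough illustration of that coupled pattern, the sketch below picks the
redirect target from the Accept header in the same step that issues the 303.
The function name coupled_redirect and the naive substring matching are
assumptions for illustration only; this is not DBpedia's actual code.

```python
def coupled_redirect(path: str, accept: str) -> tuple[int, dict[str, str]]:
    """303 and content negotiation in a single step (the coupled pattern)."""
    name = path.removeprefix("/resource/")
    # Naive conneg: a real server would parse q-values and wildcards;
    # this only checks whether an RDF media type is mentioned at all.
    wants_rdf = "application/rdf+xml" in accept or "text/turtle" in accept
    target = f"/data/{name}" if wants_rdf else f"/page/{name}"
    # Vary signals that the redirect target depends on the Accept header.
    return 303, {"Location": target, "Vary": "Accept"}


print(coupled_redirect("/resource/Linked_data", "text/html"))
print(coupled_redirect("/resource/Linked_data", "text/turtle"))
```

Every dereference of the thing URI costs a redirect here, and the redirect
itself varies by client, so there is no single stable document URI to link to.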


Many other linked data sites seem to have followed this pattern but it
does seem, to my eyes, broken.

At the BBC we have 3 flavours of uri. I'm not sure if these are the
appropriate / best labels but:
- the non-information resource uri. The uri that refers to the real world
physical / metaphysical thing
- the generic information resource uri that identifies the document but
not any specific representation of the document
- the representation uri (the html or json or rdf-xml etc)

We tend to use hashes rather than slashes like
http://www.bbc.co.uk/programmes/b006mw1h#programme


But pretending we use slashes for a minute...

If you requested:
http://www.bbc.co.uk/programmes/b006mw1h/thing


You'd get a 303 redirect to the generic document / information resource
uri:
http://www.bbc.co.uk/programmes/b006mw1h


Which would then conneg to the appropriate representation which would
still be served from:
http://www.bbc.co.uk/programmes/b006mw1h

With a Content-Location header of
http://www.bbc.co.uk/programmes/b006mw1h.rdf

For example

Whilst the RDF refers to the non-information resource URI when making
assertions about the thing, this URI is not used elsewhere. All links in
the HTML point to the generic document URI, not to the non-information
resource URI.

So crawlers like google just follow links from information resource to
information resource and never have to encounter 303s
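
The separated pattern described above can be sketched as two independent
steps. This is only an illustration under stated assumptions: the respond
function, the REPRESENTATIONS table and the .html/.json suffixes are
hypothetical (only the .rdf suffix appears in this message), and the Accept
handling is deliberately naive.

```python
# Hypothetical media type -> suffix table; only ".rdf" comes from the thread.
REPRESENTATIONS = {
    "text/html": ".html",
    "application/rdf+xml": ".rdf",
    "application/json": ".json",
}


def respond(path: str, accept: str = "text/html") -> tuple[int, dict[str, str]]:
    """Return (status, headers) with the 303 and conneg steps kept apart."""
    if path.endswith("/thing"):
        # Step 1: the thing URI only ever 303s to the generic document URI,
        # regardless of what the client asked for.
        return 303, {"Location": path[: -len("/thing")]}
    # Step 2: the generic document URI negotiates and answers 200 in place,
    # naming the chosen representation in Content-Location.
    # Naive matching: no q-values, no wildcards such as */*.
    for media_type, suffix in REPRESENTATIONS.items():
        if media_type in accept:
            return 200, {
                "Content-Type": media_type,
                "Content-Location": path + suffix,
                "Vary": "Accept",
            }
    return 406, {}


print(respond("/programmes/b006mw1h/thing"))
print(respond("/programmes/b006mw1h", "application/rdf+xml"))
```

Because HTML links point at the generic document URI, a crawler only ever
exercises step 2 and never sees the 303.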

Picking up a conneg penalty for every request isn't without problems
(particularly given CDN serving) but picking up a 303 penalty for every
request would be madness and not something we'd ever have been able to
implement

I do think the dbpedia conflation of 303 with conneg is an unhelpful
anti-pattern that people shouldn't be encouraged to follow. The conneg
part is just REST; semantics add the 303 onto that but they're not doing
the same thing

Separating 303 from conneg still gives you thing vs document separation,
still maintains cool uris and doesn't kill your servers

And we've never had a problem with seo

Hth
michael





Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.

2014-07-23 Thread Bill Roberts
Hi Michael

We've tended to use slash URIs where possible, because we have found it more
convenient when doing URI dereferencing from a triple-store backed site - in
which case we essentially do a DESCRIBE on the relevant URI.
(So we do 303ing for non-information resources, though in practice in a lot of 
our applications, the great majority of content is statistical data, which we 
treat as information resources and respond with 200).
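
A minimal sketch of the "DESCRIBE on the relevant URI" step, assuming the
query is built as a string before being sent to the store. The function name
describe_query and the example.org URI are hypothetical and not taken from
any real deployment.

```python
def describe_query(uri: str) -> str:
    """Build a SPARQL DESCRIBE query for a single resource URI."""
    # Reject characters that are not allowed inside a SPARQL IRI reference,
    # so the URI cannot break out of the angle brackets.
    forbidden = set('<>"{}|^`\\')
    if any(ch in forbidden or ord(ch) <= 0x20 for ch in uri):
        raise ValueError(f"not a safe IRI: {uri!r}")
    return f"DESCRIBE <{uri}>"


# Hypothetical slash URI, used only for illustration.
print(describe_query("http://example.org/id/area/E01"))
```

With slash URIs the requested URI can be passed to such a query directly,
which is the convenience being described.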

How do you organise your data and generation of URI dereferencing responses 
with hash based URIs?  I can see a variety of ways to do it, but I'd be 
interested to know what you have found most efficient/convenient at the BBC - 
essentially dealing with the fact that the server doesn't know about what comes 
after the #
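
To illustrate the point about the server not seeing what comes after the #,
here is a small sketch using Python's standard urllib.parse.urldefrag on a
hash URI mentioned earlier in this thread. The variable names are
illustrative.

```python
from urllib.parse import urldefrag

uri = "http://www.bbc.co.uk/programmes/b006mw1h#programme"
document_url, fragment = urldefrag(uri)

# Only document_url goes into the HTTP request; the fragment stays with
# the client, which resolves it against the returned document.
print(document_url)  # http://www.bbc.co.uk/programmes/b006mw1h
print(fragment)      # programme
```

So a hash-URI server has to return a description of the whole document and
leave it to the client to find the statements about the fragment identifier.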


Thanks

Bill


Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.

2014-07-23 Thread Michael Smethurst
Hi Bill

Bit of a difficult question to answer because the reality is probably
still quite disjointed. Various parts of bbc.co.uk:
- serve linked data
- store data as rdf (in a triple store)
- consume (to some extent) linked data

But nowhere are all those things true in one place. So /programmes
publishes linked data but the backend is a relational database, whereas
things like sport / olympics are stored as linked data but don't publish

So the 2 parts aren't really coupled

I do half remember lots of conversations about hashes v slashes for
/programmes and /music but the sites are designed to be quite granular
(one thing per uri; one uri per thing) so we weren't really dealing with
lots of things in a document

The linked data platform (our triple store) does use # uris like:


Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.

2014-07-23 Thread Michael Smethurst
Oops, dropped laptop :-/

Continues

The linked data platform (our triple store) does use # uris like:
http://www.bbc.co.uk/things/794274f1-d7ea-4ad2-9b36-c46ed55da9bd#id


But I'm not best placed to know about the interfaces and queries onto this
and why they chose hashes and not slashes. I'll ask around unless those
people are already on this list...

Not much help
Sorry
michael


Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.

2014-07-23 Thread john.walker
Hi Michael,

Hope the laptop is ok :)

So I can think of your 'slash' NIR URI as something similar to a URN:
http://www.bbc.co.uk/programmes/b006mw1h/thing

It doesn't do much on its own and *just* acts as an identifier.
Using HTTP it can be resolved to a URL via the 303, kind of similar to a URN
resolver.

Could you explain what you mean by conneg penalty?

I've set up an application working with 303s and, although I don't consider
myself mad, it does add an extra request to every click the user does.
Getting the 303 response takes 20 - 25 ms on average, so it's not a big issue in
this case (internal company usage).

Interestingly enough, I just checked a random shortened link off Twitter and it
went through no fewer than 5 HTTP 301/302 redirects (500 ms in total) before
getting the HTML.
Taking that into consideration, a single 303 is not too bad!
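
One way to measure this kind of redirect overhead is to follow the chain by
hand and add up the time spent on each hop. The sketch below assumes a
caller-supplied fetch function that performs one request without following
redirects; trace_redirects, fetch and the fake table are hypothetical names,
and the timings in the table are made up for illustration.

```python
from typing import Callable, Optional

# fetch(url) returns (status_code, location_or_None, elapsed_ms) for ONE
# request, without following redirects itself.
Fetch = Callable[[str], tuple[int, Optional[str], float]]


def trace_redirects(url: str, fetch: Fetch, max_hops: int = 10):
    """Follow 3xx responses by hand, returning (final_url, hops, redirect_ms)."""
    hops, total_ms = 0, 0.0
    while hops < max_hops:
        status, location, elapsed_ms = fetch(url)
        if status not in (301, 302, 303, 307, 308) or location is None:
            return url, hops, total_ms
        # Count only the time spent on redirect responses.
        hops += 1
        total_ms += elapsed_ms
        url = location
    raise RuntimeError("too many redirects")


# Stand-in for a real HTTP client: one 303 costing 22 ms, then a 200.
fake = {
    "/id/1": (303, "/doc/1", 22.0),
    "/doc/1": (200, None, 80.0),
}
print(trace_redirects("/id/1", fake.__getitem__))
```

Swapping the fake table for a real client call would give per-site numbers
comparable to the ones quoted above.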

Regards,

John Walker



Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.

2014-07-23 Thread Kingsley Idehen

On 7/23/14 10:50 AM, john.walker wrote:

Hi Michael,
Hope the laptop is ok :)
So I can think of your 'slash' NIR URI as something similar to a URN:
http://www.bbc.co.uk/programmes/b006mw1h/thing
It doesn't do much on its own and *just* acts as an identifier.
Using HTTP it can be resolved to a URL via the 303, kind of similar to 
a URN resolver.

Could you explain what you mean by conneg penalty?
I've set up an application working with 303s and, although I don't 
consider myself mad, it does add an extra request to every click the 
user does.
Getting the 303 response takes 20 - 25 ms on average, so it's not a 
big issue in this case (internal company usage).
Interestingly enough I just checked a random shortened link off 
Twitter and it went through no less than 5 HTTP 301/302 redirects (500 
ms in total) before getting the HTML.

Taking that into consideration a single 303 is not too bad!
Regards,

John Walker


See also the output of our variant of Vapour, which illustrates entity 
denotation and connotation via HTTP URIs [1].


Basically, SEO should be targeting the entity denoted by the URI 
http://dbpedia.org/page/Linked_data since that URI denotes a Document. 
The document is comprised of RDF content where the format is negotiable.


Links:

[1] http://bit.ly/entity-denotation-and-connotaton -- Vapour 
deconstruction of HTTP URIs that denote and connote entities of 
different types .


[2] http://lists.w3.org/Archives/Public/public-lod/2014Jul/0085.html -- 
related thread on this forum.


--
Regards,

Kingsley Idehen 
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this






Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.

2014-07-23 Thread john.walker
Hi Kingsley,

In the case that Michael describes, could one reasonably expect that if the BBC
were to embed the following triples as RDFa in the HTML served on the URL
http://www.bbc.co.uk/programmes/b006mw1h, then a webcrawler would understand to
go directly to the webpages about the seasons and episodes?

## start Turtle

@prefix schema: <http://schema.org/> .

<http://www.bbc.co.uk/programmes/b006mw1h> a schema:WebPage ;
  schema:about <http://www.bbc.co.uk/programmes/b006mw1h/thing> .

<http://www.bbc.co.uk/programmes/b006mw1h/thing> a schema:TVSeries ;
  schema:name "Gardeners' World" ;
  schema:season <http://www.bbc.co.uk/programmes/p00fx55j/thing> ,
    <http://www.bbc.co.uk/programmes/p00fx5b7/thing> ;
  schema:episode <http://www.bbc.co.uk/programmes/b049fnfd/thing> .

<http://www.bbc.co.uk/programmes/p00fx55j> a schema:WebPage ;
  schema:about <http://www.bbc.co.uk/programmes/p00fx55j/thing> .

<http://www.bbc.co.uk/programmes/p00fx5b7> a schema:WebPage ;
  schema:about <http://www.bbc.co.uk/programmes/p00fx5b7/thing> .

<http://www.bbc.co.uk/programmes/b049fnfd> a schema:WebPage ;
  schema:about <http://www.bbc.co.uk/programmes/b049fnfd/thing> .

## end Turtle

I guess, as Michael mentions, having the webpages as the href targets in the
HTML effectively shortcuts that indirect relation.
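
To make that indirect relation concrete, here is a small Python sketch of the
hop a crawler would have to make. It is only an assumption about how such a
crawler might work, not a description of any real one. The triples are copied
from the Turtle above (type and name triples left out), and pages_to_crawl is
a hypothetical helper.

```python
SCHEMA = "http://schema.org/"
BASE = "http://www.bbc.co.uk/programmes/"

# (subject, predicate, object) triples taken from the Turtle snippet.
TRIPLES = [
    (BASE + "b006mw1h", SCHEMA + "about", BASE + "b006mw1h/thing"),
    (BASE + "b006mw1h/thing", SCHEMA + "season", BASE + "p00fx55j/thing"),
    (BASE + "b006mw1h/thing", SCHEMA + "season", BASE + "p00fx5b7/thing"),
    (BASE + "b006mw1h/thing", SCHEMA + "episode", BASE + "b049fnfd/thing"),
    (BASE + "p00fx55j", SCHEMA + "about", BASE + "p00fx55j/thing"),
    (BASE + "p00fx5b7", SCHEMA + "about", BASE + "p00fx5b7/thing"),
    (BASE + "b049fnfd", SCHEMA + "about", BASE + "b049fnfd/thing"),
]


def pages_to_crawl(start_page, triples):
    """Pages about the things related to whatever start_page is about."""
    about = SCHEMA + "about"
    # Step 1: the things the starting page says it is about.
    topics = {o for s, p, o in triples if s == start_page and p == about}
    # Step 2: the things those topics point at (seasons, episodes, ...).
    related = {o for s, p, o in triples if s in topics and p != about}
    # Step 3: the pages that declare themselves to be about those things.
    return sorted(s for s, p, o in triples if p == about and o in related)
```

Called with the b006mw1h page, this reaches the three season and episode
pages without dereferencing any /thing URI, but only by walking three
relations that a plain href to the page would collapse into one.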

Cheers,
John




Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.

2014-07-23 Thread Kingsley Idehen

On 7/23/14 12:17 PM, john.walker wrote:

Hi Kingsley,
In the case that Michael describes, could one reasonably expect that 
if the BBC were to embed the following triples as RDFa in the HTML 
served on the URL http://www.bbc.co.uk/programmes/b006mw1h, then a 
webcrawler would understand to go directly to the webpages about the 
seasons and episodes?


Absolutely!!

They could even use the following within <head/>:

1. <link rel="{relation-or-predicate-identifier}" 
href="{relation-object}" ../> -- indicating that the doc in question is 
the subject of a relation denoted by {relation-or-predicate-identifier}


2. <link rev="{relation-or-predicate-identifier}" 
href="{relation-subject}" ../> -- indicating that the doc in question is 
the object of a relation denoted by {relation-or-predicate-identifier}.


They could even up the ante for smart HTTP user agents by replicating the 
relations above using Link: response headers.
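
A minimal Python sketch of what replicating such a relation in a Link:
response header could look like. The link_header helper is hypothetical, and
using schema:about as the relation is an assumption borrowed from the Turtle
in this thread, not something the BBC serves.

```python
def link_header(target, rel=None, rev=None):
    """Format one Link: header value relating the served doc to target."""
    params = []
    if rel:
        # The served document is the subject of the relation.
        params.append('rel="%s"' % rel)
    if rev:
        # The served document is the object of the relation.
        params.append('rev="%s"' % rev)
    return "<%s>; %s" % (target, "; ".join(params))


# Example value for the page http://www.bbc.co.uk/programmes/b006mw1h
EXAMPLE = link_header(
    "http://www.bbc.co.uk/programmes/b006mw1h#programme",
    rel="http://schema.org/about",
)
```

Note that RFC 5988 deprecates the rev parameter, so stating the inverse
relation with rel is likely the safer choice for new deployments.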


And to make your example live, I am tweaking your Turtle Snippet (aka. 
Nanotation) which will produce some interesting results. Note, my only 
change is a Nanotation parser hint i.e., ## Turtle Start ## and ## 
Turtle End ##  :-)


## Turtle Start ##
@prefix schema: <http://schema.org/> .
<http://www.bbc.co.uk/programmes/b006mw1h> a schema:WebPage ;
  schema:about <http://www.bbc.co.uk/programmes/b006mw1h/thing> .
<http://www.bbc.co.uk/programmes/b006mw1h/thing> a schema:TVSeries ;
  schema:name "Gardeners' World" ;
  schema:season <http://www.bbc.co.uk/programmes/p00fx55j/thing> ,
    <http://www.bbc.co.uk/programmes/p00fx5b7/thing> ;
  schema:episode <http://www.bbc.co.uk/programmes/b049fnfd/thing> .
<http://www.bbc.co.uk/programmes/p00fx55j> a schema:WebPage ;
  schema:about <http://www.bbc.co.uk/programmes/p00fx55j/thing> .
<http://www.bbc.co.uk/programmes/p00fx5b7> a schema:WebPage ;
  schema:about <http://www.bbc.co.uk/programmes/p00fx5b7/thing> .
<http://www.bbc.co.uk/programmes/b049fnfd> a schema:WebPage ;
  schema:about <http://www.bbc.co.uk/programmes/b049fnfd/thing> .
## Turtle End ##



Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.

2014-07-23 Thread Michael Smethurst


On 23/07/2014 15:50, john.walker john.wal...@semaku.com wrote:

Hi Michael,

Hiya

 
  
Hope the laptop is ok :)

Survived another drop

 
  
So I can think of your 'slash' NIR URI as something similar to a URN:
http://www.bbc.co.uk/programmes/b006mw1h/thing
  
It doesn't do much on it's own and *just* acts as an identifier.


(Ignoring the fact we actually use hashes, if we did use slashes, then)
yes. It's just an identifier for a real world thing. The rdf and rdfa
use it to make assertions:

<a typeof="po:Brand" about="/programmes/b006mw1h#programme"
href="/programmes/b006mw1h" title="Gardeners' World">

but @href links don't travel through them


 
Using HTTP it can be resolved to a URL via the 303, kind of similar to a
URN resolver.

Guess so yes, a urn that doesn't need a urn resolver cos it's an http
uri

  
Could you explain what you mean by conneg penalty?

Every time a normal user clicks a link on the bits of bbc.co.uk that
support linked data, they click to the generic document resource uri
which then does the conneg bit to serve an appropriate representation. So
it's extra work at the server end but mostly cacheable. Except it's a bit
tricky with CDNs

 
  
I've set up an application working with 303s and, although I don't
consider myself mad, it does add an extra request to every click the user
does.

Guess the madness quotient would depend on how much traffic you have to
cope with. For the BBC to add an additional request for every request for
a doctor who page would have been madness


Getting the 303 response takes 20 - 25 ms on average, so it's not a big
issue in this case (internal company usage).

For internal usage it's all probably fine. But I still think it's a
pattern that shouldn't be generally encouraged. On a high traffic website
it's just more requests that aren't really adding anything. I think if
we'd suggested the dbpedia style pattern at the BBC we'd never have gotten
permission to serve linked data

  
Interestingly enough I just checked a random shortened link off Twitter
and it went through no less than 5 HTTP 301/302 redirects (500 ms in
total) before getting the HTML.

Yeah, it's a shambles init :-/

Taking that into consideration a single 303 is not too bad!

In comparison to link shortener madness it's not that mad. But it's a
redirect your servers have to handle and link shorteners are someone
else's problem. Kinda

michael
 
  
Regards, 

John Walker 



 On July 23, 2014 at 3:55 PM Michael Smethurst
michael.smethu...@bbc.co.uk wrote:

 
 
 Oops, dropped laptop :-/
 
 Continues 
 
 On 23/07/2014 14:50, Michael Smethurst michael.smethu...@bbc.co.uk
 wrote: 
 
 Hi Bill 
  
 Bit of a difficult question to answer because the reality is probably
 still quite disjointed. Various parts of bbc.co.uk:
 - serve linked data
 - store data as rdf (in a triple store)
 - consume (to some extent) linked data
  
 But nowhere are all those things true in one place. So /programmes
 publishes linked data but the backend is a relational database, whereas
 things like sport / olympics are stored as linked data but don't publish

 So the 2 parts aren't really coupled
  
 I do half remember lots of conversations about hashes v slashes for
 /programmes and /music but the sites are designed to be quite granular
 (one thing per uri; one uri per thing) so we weren't really dealing with
 lots of things in a document
  
 The linked data platform (our triple store) does use # uris like:
 http://www.bbc.co.uk/things/794274f1-d7ea-4ad2-9b36-c46ed55da9bd#id
 
 
 But I'm not best placed to know about the interfaces and queries onto this
 and why they chose hashes and not slashes. I'll ask around unless those
 people are already on this list...
 
 Not much help 
 Sorry 
 michael 
  
 On 23/07/2014 14:19, Bill Roberts b...@swirrl.com wrote:
  
 Hi Michael 
  
 We've tended to use slash URIs where possible, because we have found it more
 convenient when doing URI dereferencing from a triple-store backed site -
 in which case we essentially do a DESCRIBE on the relevant URI.
 (So we do 303ing for non-information resources, though in practice in a
 lot of our applications, the great majority of content is statistical
 data, which we treat as information resources and respond with 200).
  
 How do you organise your data and generation of URI dereferencing
 responses with hash based URIs? I can see a variety of ways to do it,
 but I'd be interested to know what you have found most
 efficient/convenient at the BBC - essentially dealing with the fact that
 the server doesn't know about what comes after the #
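
The point about the fragment can be shown in a few lines of Python. This is
only an illustrative sketch: document_for is a hypothetical helper, and the
URI is the example from the linked data platform quoted earlier in this
thread.

```python
from urllib.parse import urldefrag


def document_for(uri):
    """Split a hash URI into the part sent to the server and the fragment."""
    url, fragment = urldefrag(uri)
    # Only `url` goes out in the HTTP request; `fragment` stays with the
    # client, which must find the matching subject in the returned data.
    return url, fragment
```

So for a hash URI the server can only ever describe the whole document, and
any per-thing selection has to happen on the client side.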
  
  
 Thanks 
  
 Bill 
  
 On 23 Jul 2014, at 13:52, Michael Smethurst
michael.smethu...@bbc.co.uk
 wrote: 
  
  Hello 
  
  (Pretty sure I've made this comment before so please forgive any signs
  of premature senility)

  I think this may be an unfortunate side effect of the conflation of the
  303 ("I can't send that") pattern with the content negotiation ("what
  flavour would you like") pattern
 

Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.

2014-07-23 Thread Kingsley Idehen

On 7/23/14 2:05 PM, Michael Smethurst wrote:

For internal usage it's all probably fine. But I still think it's a
pattern that shouldn't be generally encouraged.


It's a horses for courses matter :-)

If you choose to use hashless HTTP URIs for entity denotation, 
you have to make the extra investment required (via 303 heuristics) for 
entity disambiguation [1].


Note, there are changes to HTTP that also reduce some of the confusion 
in this realm. For instance, the use of Content-Location: response headers 
to aid disambiguation [2].


Links:

[1] http://bit.ly/WAJGCp -- HTTP URI denotation in a single slide

[2] https://twitter.com/kidehen/status/476039386425868288 -- HTTP changes

[3] https://twitter.com/ereteog/status/487935205240766464/photo/1 -- 
nice picture, but it would be even clearer if it had a hash based HTTP URI 
denoting the zebra, re. denoting on the Web what exists.




Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.

2014-07-23 Thread Michael Smethurst
Hi Kingsley

Very definitely starting to feel like deja vu...

On 23/07/2014 20:18, Kingsley Idehen kide...@openlinksw.com wrote:

On 7/23/14 2:05 PM, Michael Smethurst wrote:
 For internal usage it's all probably fine. But I still think it's a
 pattern that shouldn't be generally encouraged.

Its a horses for courses matter :-)

If you choose to use hashless HTTP URIs in regards to entity denotation,
you have to make the extra investment required (via 303 heuristics) for
entity disambiguation [1].

My only point is: if you don't conflate "I can't send that" (303) with
"what flavour would you like" (conneg) you don't have to invest in more
servers


Note, there are changes to HTTP that also reduce some of the confusion
in this realm. For instance the use Content-Location: response headers
to aid disambiguation [2].

We do use content location for the (information) resource / representation
split but that's REST not 303 semantics

michael







Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.

2014-07-23 Thread Kingsley Idehen

On 7/23/14 3:40 PM, Michael Smethurst wrote:

Hi Kingsley

Very definitely starting to feel like deja vu...

On 23/07/2014 20:18, Kingsley Idehen kide...@openlinksw.com wrote:


On 7/23/14 2:05 PM, Michael Smethurst wrote:

For internal usage it's all probably fine. But I still think it's a
pattern that shouldn't be generally encouraged.


Its a horses for courses matter:-)

If you choose to use hashless HTTP URIs in regards to entity denotation,
you have to make the extra investment required (via 303 heuristics) for
entity disambiguation [1].

My only point is: if you don't conflate I can't send that (303) with
what flavour would you like (conneg) you don't have to invest in more
servers



Note, there are changes to HTTP that also reduce some of the confusion
in this realm. For instance the use Content-Location: response headers
to aid disambiguation [2].

We do use content location for the (information) resource / representation
split but that's REST not 303 semantics

michael


There is only one kind of relation semantics in play here, and it's the 
semantics of denotation and connotation [1][2]. HTTP URIs denote things. 
HTTP URLs denote documents comprised of connotation-bearing content.


With regard to the current BBC programmes URIs, if you incorporate RDFa, 
<link/>, or Link: based relations, disambiguation without 303s or 
content negotiation is possible. RDF user agents (for example) will be 
able to make sense of the relations that collectively describe 
documents about programmes and actual programmes.


Links:

[1] http://bit.ly/what-does-this-bbc-programmes-uri-denote -- Vapour 
using RDF semantics to discern what 
http://www.bbc.co.uk/programmes/b006mw1h denotes and connotes


[2] http://bit.ly/what-does-this-bbc-programmes-doc-url-denote -- ditto 
but targeting http://www.bbc.co.uk/programmes/b006mw1h.rdf .




Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.

2014-07-23 Thread Michael Smethurst


On 23/07/2014 21:49, Kingsley Idehen kide...@openlinksw.com wrote:

On 7/23/14 3:40 PM, Michael Smethurst wrote:
 Hi Kingsley

 Very definitely starting to feel like deja vu...

 On 23/07/2014 20:18, Kingsley Idehen kide...@openlinksw.com wrote:

 On 7/23/14 2:05 PM, Michael Smethurst wrote:
 For internal usage it's all probably fine. But I still think it's a
 pattern that shouldn't be generally encouraged.
 
 Its a horses for courses matter:-)
 
 If you choose to use hashless HTTP URIs in regards to entity
denotation,
 you have to make the extra investment required (via 303 heuristics)
for
 entity disambiguation [1].
 My only point is: if you don't conflate I can't send that (303) with
 what flavour would you like (conneg) you don't have to invest in more
 servers

 
 Note, there are changes to HTTP that also reduce some of the confusion
 in this realm. For instance the use Content-Location: response
headers
 to aid disambiguation [2].
 We do use content location for the (information) resource /
representation
 split but that's REST not 303 semantics

 michael

There is only one kind of relation semantics in play here, and its the
semantics of denotation and connotation [1][2].

Tho derrida didn't have to pay for servers :-/

 HTTP URIs denote things.

Which can't be served (303)

 
HTTP URLs denote documents comprised of connotation bearing content.

Which can be served in assorted representations (conneg (+ content
location))

Think the last time we had this conversation we broke the twitter scroll
bar and agreed to disagree. Or at worst misunderstand :-)

michael







Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.

2014-07-23 Thread john.walker
And with a few additions to reflect the _actual_ published data (using hash
URIs) and what Michael described with conneg:
 
## Turtle Start ##

@prefix schema:  <http://schema.org/> .
@prefix dcterms: <http://purl.org/dc/terms/> .

<http://www.bbc.co.uk/programmes/b006mw1h> a schema:CreativeWork ;
  schema:about <http://www.bbc.co.uk/programmes/b006mw1h#programme> .

<http://www.bbc.co.uk/programmes/b006mw1h#programme> a schema:TVSeries ;
  schema:name "Gardeners' World" ;
  schema:season <http://www.bbc.co.uk/programmes/p00fx55j#programme> ,
    <http://www.bbc.co.uk/programmes/p00fx5b7#programme> ;
  schema:episode <http://www.bbc.co.uk/programmes/b049fnfd#programme> .

<http://www.bbc.co.uk/programmes/p00fx55j> a schema:CreativeWork ;
  schema:about <http://www.bbc.co.uk/programmes/p00fx55j#programme> ;
  dcterms:hasFormat <http://www.bbc.co.uk/programmes/p00fx55j.html> ,
    <http://www.bbc.co.uk/programmes/p00fx55j.rdf> .

<http://www.bbc.co.uk/programmes/p00fx5b7> a schema:CreativeWork ;
  schema:about <http://www.bbc.co.uk/programmes/p00fx5b7#programme> ;
  dcterms:hasFormat <http://www.bbc.co.uk/programmes/p00fx5b7.html> ,
    <http://www.bbc.co.uk/programmes/p00fx5b7.rdf> .

<http://www.bbc.co.uk/programmes/b049fnfd> a schema:CreativeWork ;
  schema:about <http://www.bbc.co.uk/programmes/b049fnfd#programme> ;
  dcterms:hasFormat <http://www.bbc.co.uk/programmes/b049fnfd.html> ,
    <http://www.bbc.co.uk/programmes/b049fnfd.rdf> .

<http://www.bbc.co.uk/programmes/p00fx55j.html> a schema:WebPage ;
  schema:about <http://www.bbc.co.uk/programmes/p00fx55j#programme> .

<http://www.bbc.co.uk/programmes/p00fx5b7.html> a schema:WebPage ;
  schema:about <http://www.bbc.co.uk/programmes/p00fx5b7#programme> .

<http://www.bbc.co.uk/programmes/b049fnfd.html> a schema:WebPage ;
  schema:about <http://www.bbc.co.uk/programmes/b049fnfd#programme> .

<http://www.bbc.co.uk/programmes/p00fx55j.rdf> a schema:DataDownload ;
  schema:about <http://www.bbc.co.uk/programmes/p00fx55j#programme> ;
  schema:encodesCreativeWork <http://www.bbc.co.uk/programmes/p00fx55j> .

<http://www.bbc.co.uk/programmes/p00fx5b7.rdf> a schema:DataDownload ;
  schema:about <http://www.bbc.co.uk/programmes/p00fx5b7#programme> ;
  schema:encodesCreativeWork <http://www.bbc.co.uk/programmes/p00fx5b7> .

<http://www.bbc.co.uk/programmes/b049fnfd.rdf> a schema:DataDownload ;
  schema:about <http://www.bbc.co.uk/programmes/b049fnfd#programme> ;
  schema:encodesCreativeWork <http://www.bbc.co.uk/programmes/b049fnfd> .

## Turtle End ##
 
 




Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.

2014-07-23 Thread Kingsley Idehen

On 7/23/14 5:15 PM, Michael Smethurst wrote:


On 23/07/2014 21:49, Kingsley Idehen kide...@openlinksw.com wrote:


On 7/23/14 3:40 PM, Michael Smethurst wrote:

Hi Kingsley

Very definitely starting to feel like deja vu...

On 23/07/2014 20:18, Kingsley Idehen kide...@openlinksw.com wrote:


On 7/23/14 2:05 PM, Michael Smethurst wrote:

For internal usage it's all probably fine. But I still think it's a
pattern that shouldn't be generally encouraged.

Its a horses for courses matter:-)

If you choose to use hashless HTTP URIs in regards to entity denotation,
you have to make the extra investment required (via 303 heuristics) for
entity disambiguation [1].

My only point is: if you don't conflate I can't send that (303) with
what flavour would you like (conneg) you don't have to invest in more
servers


Note, there are changes to HTTP that also reduce some of the confusion
in this realm. For instance the use Content-Location: response headers
to aid disambiguation [2].

We do use content location for the (information) resource / representation
split but that's REST not 303 semantics

michael

There is only one kind of relation semantics in play here, and its the
semantics of denotation and connotation [1][2].

Tho derrida didn't have to pay for servers :-/


HTTP URIs denote things.

Which can't be served (303)


No, HTTP URIs simply denote things (entities). That has nothing to do with 
whether they can be served.





HTTP URLs denote documents comprised of connotation bearing content.

Which can be served in assorted representations (conneg (+ content
location))


No, HTTP URLs are a kind of HTTP URI that denote Web Documents. Put 
differently, HTTP URLs are for all intents and purposes a colloquialism 
for HTTP URIs that focuses on Web Documents, a particular entity type 
i.e., entities that are instances of the Classes denoted by the URIs: 
http://xmlns.com/foaf/0.1/Document, 
http://purl.org/ontology/bibo/Document, 
http://purl.org/dc/terms/BibliographicResource etc.


The very same analogy applies to WebIDs which are HTTP URIs that denote 
Agents i.e., entities that are instances of the Class denoted by the 
URI: http://xmlns.com/foaf/0.1/Agent .


Think the last time we had this conversation we broke the twitter scroll
bar and agreed to disagree. Or at worst misunderstand :-)


Long discussions aren't necessarily bad, they can also unravel insights 
that are sometimes overlooked :-)


BTW -- one can also deconstruct this issue a different way, starting 
with HTTP URI/URLs that denote Documents. It goes something like this:


1. You have a RDF document (comprised of RDF/XML content) denoted by the 
HTTP URI/URL http://www.bbc.co.uk/programmes/b006mw1h.rdf
2. The document above describes an entity denoted by the HTTP URI 
http://www.bbc.co.uk/programmes/b006mw1h#programme .


We arrive at the same place (as illustrated by the Vapour links I shared).

My only issue with the BBC programmes URIs right now is that 
http://www.bbc.co.uk/programmes/b006mw1h doesn't make its association 
with http://www.bbc.co.uk/programmes/b006mw1h#programme discoverable 
to RDF user agents. That's where Microdata, RDFa, <link/>, Link: come 
into play i.e., they provide vehicles for exposing the missing relation 
(association, connection, relationship property/predicate etc.).


Also note:

curl -IH "Accept: application/rdf+xml" http://www.bbc.co.uk/programmes/b006mw1h

HTTP/1.1 200 OK
Server: Apache
Content-Type: text/html; charset=utf-8

curl -IH "Accept: application/rdf+xml" http://www.bbc.co.uk/programmes/b006mw1h.rdf

HTTP/1.1 200 OK
Server: Apache
Content-Type: application/rdf+xml

Which reinforces my point re. the missing relation to aid RDF user agents. 
Simply tacking .rdf onto the end of URLs is way too brittle, when a 
relation (describes, describedby etc.) would do much better via RDFa, 
Microdata, <link/>, Link: etc.
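
As a sketch of how an RDF user agent could pick up such a relation from the
HTML instead of guessing at a .rdf suffix, here is a small Python example
using only the standard library. The SAMPLE markup is hypothetical (it is
not what the BBC page currently serves), and the choice of rel="alternate"
with an RDF/XML media type is an assumption about which relation a
publisher might expose.

```python
from html.parser import HTMLParser

# Hypothetical markup: a page advertising its RDF/XML representation.
SAMPLE = (
    '<html><head><title>Gardeners\' World</title>'
    '<link rel="alternate" type="application/rdf+xml" '
    'href="/programmes/b006mw1h.rdf"/>'
    '</head><body></body></html>'
)


class LinkFinder(HTMLParser):
    """Collect (rel, type, href) for every <link> element in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "link":
            a = dict(attrs)
            self.links.append((a.get("rel"), a.get("type"), a.get("href")))


def rdf_alternates(html):
    """Return hrefs of <link> elements that point at RDF/XML alternates."""
    finder = LinkFinder()
    finder.feed(html)
    return [href for rel, media_type, href in finder.links
            if rel == "alternate" and media_type == "application/rdf+xml"]
```

The agent then follows an explicit, typed relation, so the publisher stays
free to change how representation URLs are spelled.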





Kingsley


michael

In regards, to the current BBC programmes URIs, if you incorporate RDFa,
link/, or Link: based relations, disambiguation without 303's or
content negotiation is possible. RDF user agents (for example) will be
able to make sense of the relations that that collective describe
documents about programmes and actual programmes.

Links:

[1] http://bit.ly/what-does-this-bbc-programmes-uri-denote -- Vapour
using RDF semantics discern what
http://www.bbc.co.uk/programmes/b006mw1h denotes and connotes

[2] http://bit.ly/what-does-this-bbc-programmes-doc-url-denote -- ditto
but targeting http://www.bbc.co.uk/programmes/b006mw1h.rdf .

--
Regards,

Kingsley Idehen 
Founder  CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this










Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.

2014-07-23 Thread Kingsley Idehen

On 7/23/14 5:59 PM, john.walker wrote:

And with a few additions to reflect the _actual_ published data (using hash
URIs) and what Michael described with conneg:
  
## Turtle Start ##
  
@prefix schema: <http://schema.org/> .
@prefix dcterms: <http://purl.org/dc/terms/> .

<http://www.bbc.co.uk/programmes/b006mw1h> a schema:CreativeWork ;
   schema:about <http://www.bbc.co.uk/programmes/b006mw1h/thing> .

<http://www.bbc.co.uk/programmes/b006mw1h#programme> a schema:TVSeries ;
   schema:name "Gardeners' World" ;
   schema:season <http://www.bbc.co.uk/programmes/p00fx55j#programme> ,
      <http://www.bbc.co.uk/programmes/p00fx5b7#programme> ;
   schema:episode <http://www.bbc.co.uk/programmes/b049fnfd#programme> .

<http://www.bbc.co.uk/programmes/p00fx55j> a schema:CreativeWork ;
   schema:about <http://www.bbc.co.uk/programmes/p00fx55j#programme> ;
   dcterms:hasFormat <http://www.bbc.co.uk/programmes/p00fx55j.html> ,
      <http://www.bbc.co.uk/programmes/p00fx55j.rdf> .

<http://www.bbc.co.uk/programmes/p00fx5b7> a schema:CreativeWork ;
   schema:about <http://www.bbc.co.uk/programmes/p00fx5b7#programme> ;
   dcterms:hasFormat <http://www.bbc.co.uk/programmes/p00fx5b7.html> ,
      <http://www.bbc.co.uk/programmes/p00fx5b7.rdf> .

<http://www.bbc.co.uk/programmes/b049fnfd> a schema:CreativeWork ;
   schema:about <http://www.bbc.co.uk/programmes/b049fnfd#programme> ;
   dcterms:hasFormat <http://www.bbc.co.uk/programmes/b049fnfd.html> ,
      <http://www.bbc.co.uk/programmes/b049fnfd.rdf> .

<http://www.bbc.co.uk/programmes/p00fx55j.html> a schema:WebPage ;
   schema:about <http://www.bbc.co.uk/programmes/p00fx55j#programme> .

<http://www.bbc.co.uk/programmes/p00fx5b7.html> a schema:WebPage ;
   schema:about <http://www.bbc.co.uk/programmes/p00fx5b7#programme> .

<http://www.bbc.co.uk/programmes/b049fnfd.html> a schema:WebPage ;
   schema:about <http://www.bbc.co.uk/programmes/b049fnfd#programme> .

<http://www.bbc.co.uk/programmes/p00fx55j.rdf> a schema:DataDownload ;
   schema:about <http://www.bbc.co.uk/programmes/p00fx55j#programme> ;
   schema:encodesCreativeWork <http://www.bbc.co.uk/programmes/p00fx55j> .

<http://www.bbc.co.uk/programmes/p00fx5b7.rdf> a schema:DataDownload ;
   schema:about <http://www.bbc.co.uk/programmes/p00fx5b7#programme> ;
   schema:encodesCreativeWork <http://www.bbc.co.uk/programmes/p00fx5b7> .

<http://www.bbc.co.uk/programmes/b049fnfd.rdf> a schema:DataDownload ;
   schema:about <http://www.bbc.co.uk/programmes/b049fnfd#programme> ;
   schema:encodesCreativeWork <http://www.bbc.co.uk/programmes/b049fnfd> .
  
## Turtle End ##


Yep!!

And that enables an RDF agent to produce output such as:

[1] 
http://linkeddata.uriburner.com/about/html/http/lists.w3.org/Archives/Public/public-lod/2014Jul/0121.html 
-- basic document description
[2] http://bit.ly/cool-uris-303-entity-ranking-fyn  -- deeper 
follow-your-nose oriented document description
[3] http://bit.ly/statements-made-by-john-walker-in-lod-list-post -- 
statements discerned and then reified, via the nanotations (micro 
annotations) in your post :-)



Kingsley
  
  


On July 23, 2014 at 6:53 PM Kingsley Idehen kide...@openlinksw.com wrote:

   On 7/23/14 12:17 PM, john.walker wrote:

  Hi Kingsley,

   In the case that Michael describes, could one reasonably expect that if

the BBC were to embed the following triples as RDFa in the HTML served on the
URL http://www.bbc.co.uk/programmes/b006mw1h, then a webcrawler would
understand to go directly to the webpages about the seasons and episodes?

 Absolutely!!

   They could even use the following within





Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.

2014-07-18 Thread Mark Fallu
I am attempting to understand how the CoolURI 303 redirect pattern for
the semantic web (http://www.w3.org/TR/cooluris/) can be implemented
without negative impact on search engines.

This pattern appears to allow site content to be indexed, but prevents page
rank from flowing through internal links due to the use of a 303 redirect.

For example in Griffith's Research-Hub: http://research-hub.griffith.edu.au

A GET request to the URI of Howard Wiseman:
http://research-hub.griffith.edu.au/individual/n33a4e2d3057476efaff5ce1884564a8f

will resolve to different URLs based on content negotiation.

For RDF:
wget --header "Accept: application/rdf+xml" http://research-hub.griffith.edu.au/individual/n33a4e2d3057476efaff5ce1884564a8f

results in a 303 See Other redirect to the RDF version of the entity:
http://research-hub.griffith.edu.au/rdf/n33a4e2d3057476efaff5ce1884564a8f/n33a4e2d3057476efaff5ce1884564a8f.rdf

For HTML:
wget --header "Accept: text/html" http://research-hub.griffith.edu.au/individual/n33a4e2d3057476efaff5ce1884564a8f

results in a 303 See Other redirect to the HTML version of the entity
(our old friend the display version):
http://research-hub.griffith.edu.au/display/n33a4e2d3057476efaff5ce1884564a8f

Note: There will never be an HTML page at
http://research-hub.griffith.edu.au/individual/n33a4e2d3057476efaff5ce1884564a8f
just an HTTP response.

Links will be presented as the individual URI and then redirect to the
display URL.
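
The negotiation described above can be sketched as a small function. This is only an illustration under stated assumptions: the function name and the simplified Accept handling (a substring check that ignores q-values) are not taken from the Research-Hub implementation; only the URL layout comes from the examples above.

```python
BASE = "http://research-hub.griffith.edu.au"


def negotiate(entity_id, accept):
    """Return (status, Location) for a GET on /individual/<entity_id>.

    Simplified: a real server would parse the Accept header properly,
    including q-values and wildcards.
    """
    if "application/rdf+xml" in accept:
        location = f"{BASE}/rdf/{entity_id}/{entity_id}.rdf"
    else:
        # Anything else falls back to the HTML display page.
        location = f"{BASE}/display/{entity_id}"
    # The individual URI itself never returns a document, only a 303.
    return 303, location


entity = "n33a4e2d3057476efaff5ce1884564a8f"
print(negotiate(entity, "application/rdf+xml"))
print(negotiate(entity, "text/html"))
```

Either way the client receives a 303 and must issue a second GET against the Location it is given.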

All good so far - this is a perfectly functional example of the Cool URI
specification at work.  Unfortunately it results in a few issues in
practice.

If the links we present to the outside world for harvesting (e.g. via SPARQL
endpoint, OAI-PMH or OpenSocial widget) are the canonical individual
URIs, clients will be able to get to the display URL, but the Google page
rank that would normally flow from these external links will not.

The specification of a 303 redirect describes it as:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html

 The response to the request can be found under a different URI and SHOULD
 be retrieved using a GET method on that resource. This method exists
 primarily to allow the output of a POST-activated script to redirect the
 user agent to a selected resource. *The new URI is not a substitute
 reference for the originally requested resource*. The 303 response MUST
 NOT be cached, but the response to the second (redirected) request might be
 cacheable.



The different URI SHOULD be given by the Location field in the response.
 Unless the request method was HEAD, the entity of the response SHOULD
 contain a short hypertext note with a hyperlink to the new URI(s).


Google correctly implements the specification and does not assign the page
rank of the individual URI to the display URL as it is *not a
substitute reference for the originally requested resource.*

The same is true of internal links: a high page rank home page will not
pass page rank on to display URLs if the pathway to those URLs is via
individual URI links.

I am not sure what the solution is here as it seems the realms of SEO and
the conventions of the web they are built on are not a good fit for
semantic web best practice.

The most minimal compromise I can think of is to move away from the use of
a 303 redirect to a redirect that conserves the flow of google page rank.

   - 302 Found redirect is the recommended replacement for 303 for
   clients that do not support HTTP 1.1, and it does allow a certain amount of
   Google page rank to flow.
   - 301 Moved Permanently is a poor fit for the Cool URI pattern, but
   passes on the full page rank of the links.
   - rewriting all URIs to the URL would also work, but would break the
   Cool URI pattern.
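
To see where each kind of redirect sits on a link path, a small audit helper can trace the hops and label them with their standard reason phrases. This is a sketch over a simulated response table, not a crawler: the RESPONSES mapping and the function names are assumptions made for illustration, and it says nothing about how any search engine actually weighs each status.

```python
from http import HTTPStatus

INDIVIDUAL = ("http://research-hub.griffith.edu.au/individual/"
              "n33a4e2d3057476efaff5ce1884564a8f")
DISPLAY = ("http://research-hub.griffith.edu.au/display/"
           "n33a4e2d3057476efaff5ce1884564a8f")

# Simulated server behaviour: URL -> (status code, Location or None).
RESPONSES = {
    INDIVIDUAL: (303, DISPLAY),
    DISPLAY: (200, None),
}


def trace(url, responses, limit=5):
    """Follow Location headers and return the (url, status) hops visited."""
    hops = []
    for _ in range(limit):
        # URLs missing from the table are treated as plain 200 pages.
        status, location = responses.get(url, (200, None))
        hops.append((url, status))
        if location is None:
            break
        url = location
    return hops


def describe(hops):
    """Render each hop as 'status phrase url' for a quick report."""
    return [f"{status} {HTTPStatus(status).phrase} {url}"
            for url, status in hops]


for line in describe(trace(INDIVIDUAL, RESPONSES)):
    print(line)
```

Swapping the 303 in the table for 302 or 301 models the alternatives listed above and makes it easy to compare link paths before changing a live server.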

The pragmatist in me feels that if we are going to make a change for the
purposes of SEO, it might as well be the one with best return, i.e. 301
redirect.

Note: Indexing is not the problem here, content is indexed.  The issue
relates to page rank not flowing through a 303 redirect.

I have tested and can confirm that 303 redirects are an issue for a number
of reasons:

   - page rank does not flow through a 303 redirect
   - page rank cannot be assigned from a URL to a URI with a rel=canonical
   tag if the URI does a 303 redirect (preventing aggregation of page rank from
   external links to the URL)
   - URI and URL are indexed separately
   - RDFa schema.org representations of URIs do not translate to URLs (i.e. a
   representation described at URL A, talking about URI B, does not get
   connected to the representation described at URL B)
   - URL parameters are not passed by a 303 redirect
   - impact on the functionality of Google Analytics tracking, e.g. traversing the
   site is seen as a series of direct page visits

Essentially - as far as search engines are concerned - every URL and URI is
an island, with no connections between them.  At best a URL can express a
rel=canonical back to its corresponding URI, but no page rank will flow through
links.
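
On the point about URL parameters: the server that issues the 303 chooses the Location value, so it could carry the query string across itself. A minimal sketch, assuming that control; the function name and the utm_source parameter are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def redirect_target(request_url, target_url):
    """Copy the query string of the requested URI onto the redirect target."""
    query = urlsplit(request_url).query
    scheme, netloc, path, _, fragment = urlsplit(target_url)
    # Any query already present on the target is replaced by the request's.
    return urlunsplit((scheme, netloc, path, query, fragment))


requested = ("http://research-hub.griffith.edu.au/individual/"
             "n33a4e2d3057476efaff5ce1884564a8f?utm_source=newsletter")
display = ("http://research-hub.griffith.edu.au/display/"
           "n33a4e2d3057476efaff5ce1884564a8f")
print(redirect_target(requested, display))
```

This only addresses the parameter-passing item; it does not change how a 303 is treated for ranking purposes.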


Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.

2014-07-18 Thread Michael Brunnbauer

Hello Mark,

I cannot remember this important topic coming up earlier - which is a bit
disturbing.

The problem would be mitigated by people using the URI they see for linking.

Why not use the HTML URLs in the HTML pages for internal page rank flow?

How can URIs from sparql endpoints or OAI-PMH contribute to page rank?

A real problem would be RDFa where href also sets the object of a triple.
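
One possible way around the href conflict, sketched under the assumption that the page uses RDFa 1.1, where the resource attribute takes precedence over href when setting the object of a triple: keep the crawlable display URL in href and put the entity URI in resource. The helper name and the schema:author property are hypothetical choices for illustration.

```python
from html import escape


def rdfa_link(prop, entity_uri, display_url, label):
    """Build an anchor whose href is the HTML page and whose RDFa object
    is the entity URI (carried by the resource attribute)."""
    return (f'<a property="{escape(prop)}" '
            f'resource="{escape(entity_uri)}" '
            f'href="{escape(display_url)}">{escape(label)}</a>')


snippet = rdfa_link(
    "schema:author",
    "http://research-hub.griffith.edu.au/individual/"
    "n33a4e2d3057476efaff5ce1884564a8f",
    "http://research-hub.griffith.edu.au/display/"
    "n33a4e2d3057476efaff5ce1884564a8f",
    "Howard Wiseman",
)
print(snippet)
```

A browser or web spider follows href to the display page, while an RDFa processor should read the individual URI as the object.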

Regards,

Michael Brunnbauer

On Fri, Jul 18, 2014 at 10:05:17PM +1000, Mark Fallu wrote:
 If the links we present to the outside world for harvesting (e.g. via SPARQL
 endpoint, OAI-PMH or OpenSocial widget) are the canonical individual
 URIs, clients will be able to get to the display URL, but the Google page
 rank that would normally flow from these external links will not.



 
 
 Any guidance you can provide would be appreciated.
 
 -- 
 
 o-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 | Mark Fallu
 | Manager, Research Data (Acting)
 | Office for Research
 | Bray Centre (N54) 0.10E
 | Griffith University, Nathan Campus
 | Queensland 4111 AUSTRALIA
 |
 | E-mail: m.fa...@griffith.edu.au
 | Mobile:  04177 69778
 | Phone:  +61 (07) 373 52069
 o-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

-- 
++  Michael Brunnbauer
++  netEstate GmbH
++  Geisenhausener Straße 11a
++  81379 München
++  Tel +49 89 32 19 77 80
++  Fax +49 89 32 19 77 89 
++  E-Mail bru...@netestate.de
++  http://www.netestate.de/
++
++  Sitz: München, HRB Nr.142452 (Handelsregister B München)
++  USt-IdNr. DE221033342
++  Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer
++  Prokurist: Dipl. Kfm. (Univ.) Markus Hendel




Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.

2014-07-18 Thread Melvin Carvalho
On 18 July 2014 14:05, Mark Fallu m.fa...@griffith.edu.au wrote:

 I am attempting to understand how the CoolURI 303 redirect pattern for
 the semantic web (http://www.w3.org/TR/cooluris/) can be implemented
 without negative impact on search engines.


Just a quick question:

Is there any reason you want to use 303s?

I personally consider it an anti-pattern.




Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.

2014-07-18 Thread Gannon Dick

 On 18 July 2014 14:05, Mark Fallu m.fa...@griffith.edu.au wrote:

 I am attempting to understand how the CoolURI 303 redirect pattern for
 the semantic web (http://www.w3.org/TR/cooluris/) can be implemented
 without negative impact on search engines.

 Just a quick question:

 Is there any reason you want to use 303s?

 I personally consider it an anti-pattern.


Thank you, Melvin. I think so too.

short version: anti-pattern

long version:
Eastern Australia is 13 hours ahead of the Central United States so ...
On Saturday night in Dallas there is no semantic difference between praying 
Australians and liquored-up cowboys.  Bug or a feature? No, anti-pattern.





Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.

2014-07-18 Thread Paul Houle
Frankly I don't care about PageRank, and these days I don't know if
Google does.  These days Google gets direct sampling of user behavior
through Chrome and Google Analytics, and this sort of data is
probably much more valuable than the link graph since they know about
things like time-on-page, query chains, and so on.

If anything, PageRank, or what people imagine about PageRank, has
been harmful to the web because it's created a situation where people
just don't make links to other web sites anymore.  It started with
high-profile sites (e.g. Engadget) that just wanted to be greedy and
not give any PageRank to their competition.  Then you saw people using
the NOFOLLOW attribute because they thought that this too was a way to
be greedy.

Ten years ago I got a lot of emails from people that amounted to "I
will pay you $X if you make a link on page Y to page Z with anchor
text T."  You'd also find SEO firms that would ask for $X a month to
generate Y links to your site.

Recently Google made some changes and they seem to be punishing people
who have inappropriate links, so now people get emails like "Would you
please remove the link from page X to page Y", and the new thing is
that SEO firms now want you to pay them $X to remove Y links to your
site.

I think it is all a lot of bull and I make whatever links I like and
figure that Google is going to do whatever it is they are going to do.


Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.

2014-07-18 Thread Mark Fallu
That is a fair point - but I would still suggest that it is important for 
search engines to be able to meaningfully interpret:
- internal links
- RDFa representations that span multiple pages.

Cheers,

Mark


 On 19 Jul 2014, at 3:02 am, Paul Houle ontolo...@gmail.com wrote:
 
 Frankly I don't care about PageRank,  and these days I don't know if
 Google does.  These days Google gets direct sampling of user behavior
 through Chrome and Google Analytics,  and this sort of data is
 probably much more valuable than the link graph since they know about
 things like time-on-page,  query chains,  and things like that.
 