Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.
Hi Michael,

You asked: "How can URIs from SPARQL endpoints or OAI-PMH contribute to page rank?"

If party A:
- produces a system that uses 303-based Cool URIs to describe their content and, in addition to web pages, exposes it to the world via a SPARQL endpoint or OAI-PMH,
and party B:
- harvests information via the SPARQL endpoint or OAI-PMH and produces a public representation of that content that links back to party A,
then, if the link back is the Cool URI that resolves to a page via a 303 redirect and content negotiation, web spiders etc. will not be able to follow that inbound link.

This means that some of the advantage of being machine-harvestable is lost. Sure, your content is indexed, but the authority that comes from other people/systems citing and reusing your content is greatly diluted.

Cheers,
Mark

On Sat, Jul 19, 2014 at 1:52 AM, Michael Brunnbauer bru...@netestate.de wrote:
Hello Mark, I cannot remember this important topic coming up earlier, which is a bit disturbing. The problem would be mitigated by people using the URI they see for linking. Why not use the HTML URLs in the HTML pages for internal page rank flow? How can URIs from SPARQL endpoints or OAI-PMH contribute to page rank? A real problem would be RDFa, where href also sets the object of a triple. Regards, Michael Brunnbauer

On Fri, Jul 18, 2014 at 10:05:17PM +1000, Mark Fallu wrote:
If the links we present to the outside world for harvesting, e.g. via a SPARQL endpoint, OAI-PMH or an OpenSocial widget, are the canonical individual URIs, clients will be able to get to the display URL, but the Google PageRank that would normally flow from these external links will not.

The specification of a 303 redirect (http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html) describes it as: The response to the request can be found under a different URI and SHOULD be retrieved using a GET method on that resource.
This method exists primarily to allow the output of a POST-activated script to redirect the user agent to a selected resource. *The new URI is not a substitute reference for the originally requested resource*. The 303 response MUST NOT be cached, but the response to the second (redirected) request might be cacheable. The different URI SHOULD be given by the Location field in the response. Unless the request method was HEAD, the entity of the response SHOULD contain a short hypertext note with a hyperlink to the new URI(s).

Google correctly implements the specification and does not assign the PageRank of the individual URI to the display URL, as it is *not a substitute reference for the originally requested resource.* The same is true of internal links: a high-PageRank home page will not pass PageRank on to display URLs if the pathway to those URLs is via individual URI links.

I am not sure what the solution is here, as it seems the realm of SEO, and the conventions of the web it is built on, are not a good fit for semantic web best practice. The most minimal compromise I can think of is to move away from the 303 redirect to a redirect that conserves the flow of Google PageRank:
- 302 Found is the recommended replacement for 303 for clients that do not support HTTP/1.1, and it does allow a certain amount of PageRank to flow.
- 301 Moved Permanently is a poor fit for the Cool URI pattern, but passes on the full PageRank of the links.
- Rewriting all URIs to the display URLs would also work, but would break the Cool URI pattern.

The pragmatist in me feels that if we are going to make a change for the purposes of SEO, it might as well be the one with the best return, i.e. a 301 redirect.

Note: indexing is not the problem here; the content is indexed. The issue is PageRank not flowing through a 303 redirect.
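The three options differ only in the status code; the Location header is the same in each case. A minimal sketch, assuming a hypothetical DBpedia-style layout where /resource/<name> is the Cool URI and /page/<name> the display page, of building each response along the lines of the RFC 2616 text quoted above. It says nothing about how any search engine weighs the different codes; that is the empirical question in this thread.

```python
from __future__ import annotations

from html import escape

# Reason phrases for the three redirect options discussed above.
REASON = {301: "Moved Permanently", 302: "Found", 303: "See Other"}


def redirect_response(thing_uri: str, status: int = 303,
                      method: str = "GET") -> tuple[int, str, dict[str, str], str]:
    """Build a redirect from a Cool URI to its display page.

    Hypothetical URI layout: /resource/<name> identifies the thing and
    /page/<name> is its HTML display page.
    """
    if status not in REASON:
        raise ValueError(f"unsupported redirect status: {status}")
    page_url = thing_uri.replace("/resource/", "/page/", 1)
    headers = {"Location": page_url}
    if status == 303:
        # RFC 2616 section 10.3.4: "The 303 response MUST NOT be cached".
        headers["Cache-Control"] = "no-store"
    # Unless the method is HEAD, include a short hypertext note with a link.
    body = "" if method == "HEAD" else (
        f'<p>See <a href="{escape(page_url)}">{escape(page_url)}</a></p>')
    return status, REASON[status], headers, body
```

Switching from 303 to 301 in this sketch changes one argument; the Cool URI itself stays the same, which is why the trade-off Mark describes is purely about how clients and crawlers interpret the code.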
I have tested and can confirm that 303 redirects are an issue for a number of reasons:
- PageRank does not flow through a 303 redirect.
- PageRank cannot be assigned from a URL to a URI with a rel=canonical tag if the URI does a 303 redirect (preventing aggregation of PageRank from external links to the URL).
- The URI and URL are indexed separately.
- RDFa schema.org representations of URIs do not translate to URLs (i.e. a representation described at URL A, talking about URI B, does not get connected to the representation described at URL B).
- URL parameters are not passed on by a 303 redirect.
- Google Analytics tracking is affected, e.g. traversing the site is seen as a series of direct page visits.

Essentially, as far as search engines are concerned, every URL and URI is an island, with no connections between them. At best a URL can express a rel=canonical back to its corresponding URI; no PageRank will flow through links.

Any guidance you can provide would be appreciated.
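On the URL-parameter point: a 303 carries only what the server writes into its Location header, so whether query parameters survive depends on how that header is built. A minimal sketch, assuming the server already knows the display-page path for each URI (the example.org URLs and utm_* parameters are hypothetical), of copying the query string across. It addresses only the parameter loss, not the PageRank issues in the other bullets.

```python
from urllib.parse import urlsplit, urlunsplit


def location_preserving_query(requested_url: str, target_path: str) -> str:
    """Build a redirect Location that keeps the request's query string.

    `target_path` is the path of the display page; scheme, host and query
    string are copied from the original request, and any fragment is dropped.
    """
    parts = urlsplit(requested_url)
    return urlunsplit((parts.scheme, parts.netloc, target_path, parts.query, ""))
```

For example, a request for http://example.org/resource/X?utm_source=feed would be redirected to http://example.org/page/X?utm_source=feed rather than to the bare page URL.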
Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.
Hi Mark,

Seems to me that there is something in common with Single Page Application (SPA) design. Often you see that a static HTML copy (snapshot) of the site is made for SEO purposes, because search bots generally did not execute JavaScript and so did not 'see' the dynamic page. Often the pages in the app get a fragment identifier so they can be bookmarked. As the bot sees the static page, which link should be shown in search results? Presumably you want the (human) user to be taken to the SPA page, not the static HTML. I've seen this done with server-side redirects and with redirects in JavaScript; I'm not sure what HTTP response code is used.

If my memory serves correctly, some browsers also did not include the fragment part in bookmarks, so often you would see a 'bookmark this page' button that would create a bookmark for the 'canonical' URL. That can be very confusing for end users, though. How many times have we all seen a DBpedia /page URI used instead of the /resource URI? We should not expect most regular users to understand this.

I get the feeling there must be a simple and elegant solution.

Regards,
John
Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.
Hello,

(Pretty sure I've made this comment before, so please forgive any signs of premature senility.)

I think this may be an unfortunate side effect of the conflation of the 303 (I can't send that) pattern with the content negotiation (what flavour would you like) pattern. Lots of linked data applications (like DBpedia) seem to couple the two things together, so you have an individual URI which, when you attempt to dereference it, does a 303 *and* conneg in one step to the display URI:

/resource 303+conneg /data
or
/resource 303+conneg /page

Many other linked data sites seem to have followed this pattern, but it does seem, to my eyes, broken.

At the BBC we have 3 flavours of URI. I'm not sure if these are the appropriate / best labels, but:
- the non-information resource URI: the URI that refers to the real-world physical / metaphysical thing
- the generic information resource URI, which identifies the document but not any specific representation of the document
- the representation URI (the HTML or JSON or RDF/XML etc.)

We tend to use hashes rather than slashes, like http://www.bbc.co.uk/programmes/b006mw1h#programme

But pretending we use slashes for a minute... If you requested:
http://www.bbc.co.uk/programmes/b006mw1h/thing
you'd get a 303 redirect to the generic document / information resource URI:
http://www.bbc.co.uk/programmes/b006mw1h
which would then conneg to the appropriate representation, which would still be served from:
http://www.bbc.co.uk/programmes/b006mw1h
with a Content-Location header of, for example:
http://www.bbc.co.uk/programmes/b006mw1h.rdf

Whilst the RDF refers to the non-information resource URI when making assertions about the thing, this URI is not used elsewhere.
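The slash-URI walk-through above can be sketched as a tiny router. This is an illustration under assumptions, not the BBC's implementation (which, as noted, uses hash URIs): the /thing suffix, the extension table and the naive Accept handling are all hypothetical. What it shows is that the 303 and the conneg happen at different URIs, so a page that only links to /programmes/b006mw1h never exposes a crawler to a redirect.

```python
from __future__ import annotations

# Hypothetical file extensions for each representation of a document.
EXTENSIONS = {
    "text/html": ".html",
    "application/rdf+xml": ".rdf",
    "application/json": ".json",
}


def handle(path: str, accept: str) -> tuple[int, dict[str, str]]:
    """Route a request using the slash-URI variant of the three-URI scheme."""
    if path.endswith("/thing"):
        # Non-information resource: never served directly, only 303'd
        # to the generic document URI.
        return 303, {"Location": path[: -len("/thing")]}
    # Generic document URI: served in place (200) after conneg.
    # Naive negotiation: exact match on a single media type, else HTML.
    media_type = accept if accept in EXTENSIONS else "text/html"
    return 200, {
        "Content-Type": media_type,
        # Names the specific representation that was sent.
        "Content-Location": path + EXTENSIONS[media_type],
        # Tells caches and CDNs that the response depends on Accept.
        "Vary": "Accept",
    }
```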
All links in the HTML point to the generic document URI, not to the non-information resource URI, so crawlers like Google just follow links from information resource to information resource and never have to encounter 303s.

Picking up a conneg penalty for every request isn't without problems (particularly given CDN serving), but picking up a 303 penalty for every request would be madness and not something we'd ever have been able to implement.

I do think the DBpedia conflation of 303 with conneg is an unhelpful anti-pattern that people shouldn't be encouraged to follow. The conneg part is just REST; semantics add the 303 onto that, but they're not doing the same thing. Separating 303 from conneg still gives you thing vs document separation, still maintains Cool URIs and doesn't kill your servers. And we've never had a problem with SEO.

HTH,
michael
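For the conneg half, which Michael notes "is just REST", here is a sketch of Accept-header negotiation with q-values, following the rules for the Accept header in RFC 2616 section 14.1. Simplifications, stated as assumptions: media-type parameters other than q are ignored, and ties go to the server's preference order.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media range, q) pairs."""
    ranges = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        media_range, *params = [p.strip() for p in item.split(";")]
        q = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        ranges.append((media_range.lower(), q))
    return ranges


def specificity(media_range: str) -> int:
    """*/* < type/* < type/subtype."""
    if media_range == "*/*":
        return 0
    return 1 if media_range.endswith("/*") else 2


def matches(media_range: str, media_type: str) -> bool:
    if media_range == "*/*":
        return True
    if media_range.endswith("/*"):
        return media_type.split("/")[0] == media_range[:-2]
    return media_range == media_type


def negotiate(accept: str, available: List[str]) -> Optional[str]:
    """Pick the available type with the highest q; ties go to server order.

    Returns None when nothing is acceptable (a server would answer 406).
    """
    # A missing or empty Accept header means any type is acceptable.
    ranges = parse_accept(accept) or [("*/*", 1.0)]
    best, best_q = None, 0.0
    for media_type in available:
        matching = [(specificity(r), q) for r, q in ranges if matches(r, media_type)]
        # The most specific matching range decides the q value.
        q = max(matching)[1] if matching else 0.0
        if q > best_q:
            best, best_q = media_type, q
    return best
```

Because the chosen representation depends on the request's Accept header, a CDN has to cache one copy per variant, which is one concrete form of the conneg penalty mentioned above.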
Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.
Hi Michael,

We've tended to use slash URIs where possible, because we have found it more convenient when doing URI dereferencing from a triple-store-backed site, in which case we essentially do a DESCRIBE on the relevant URI. (So we do 303ing for non-information resources, though in practice in a lot of our applications the great majority of content is statistical data, which we treat as information resources and respond to with a 200.)

How do you organise your data and generation of URI dereferencing responses with hash-based URIs? I can see a variety of ways to do it, but I'd be interested to know what you have found most efficient/convenient at the BBC, essentially dealing with the fact that the server doesn't know about what comes after the #.

Thanks,
Bill
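Bill's point that the server never sees the part after # can be made concrete. A sketch, assuming a SPARQL 1.1 store and trusted IRIs (no escaping is attempted): with slash URIs the requested URL is itself the resource to DESCRIBE; with hash URIs the server only receives the document URL and has to describe everything "inside" it. The STRSTARTS filter used here is just one of the possible approaches Bill alludes to, and on a large store it would likely be slow without a suitable index.

```python
from urllib.parse import urldefrag


def request_target(uri: str) -> str:
    """The URL an HTTP client actually requests: everything before '#'."""
    return urldefrag(uri).url


def describe_query(request_url: str, hash_uris: bool) -> str:
    """Build a SPARQL DESCRIBE for the resource(s) behind a requested URL."""
    if not hash_uris:
        # Slash URI: the requested URL is itself the resource's identifier.
        return f"DESCRIBE <{request_url}>"
    # Hash URIs: the server only sees the document URL, so describe every
    # subject whose IRI starts with "<document URL>#".
    prefix = request_url + "#"
    return ("DESCRIBE ?s WHERE { ?s ?p ?o . "
            f'FILTER(STRSTARTS(STR(?s), "{prefix}")) }}')
```

For http://www.bbc.co.uk/programmes/b006mw1h#programme the server receives a request for http://www.bbc.co.uk/programmes/b006mw1h and, with hash_uris=True, would describe every subject under that document, including the #programme resource.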
Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.
Hi Bill,

Bit of a difficult question to answer, because the reality is probably still quite disjointed. Various parts of bbc.co.uk:
- serve linked data
- store data as RDF (in a triple store)
- consume (to some extent) linked data
But nowhere are all those things true in one place. So /programmes publishes linked data but the backend is a relational database, whereas things like sport / olympics are stored as linked data but don't publish it. So the two parts aren't really coupled.

I do half remember lots of conversations about hashes v slashes for /programmes and /music, but the sites are designed to be quite granular (one thing per URI; one URI per thing), so we weren't really dealing with lots of things in a document.

The linked data platform (our triple store) does use # URIs like:
Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.
Oops, dropped laptop :-/ Continues...

On 23/07/2014 14:50, Michael Smethurst michael.smethu...@bbc.co.uk wrote:
[...] The linked data platform (our triple store) does use # URIs like:
http://www.bbc.co.uk/things/794274f1-d7ea-4ad2-9b36-c46ed55da9bd#id

But I'm not best placed to know about the interfaces and queries onto this, and why they chose hashes and not slashes. I'll ask around, unless those people are already on this list...

Not much help, sorry.
michael
Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.
Hi Michael,

Hope the laptop is OK :)

So I can think of your 'slash' NIR URI as something similar to a URN:
http://www.bbc.co.uk/programmes/b006mw1h/thing
It doesn't do much on its own and *just* acts as an identifier. Using HTTP it can be resolved to a URL via the 303, kind of similar to a URN resolver.

Could you explain what you mean by conneg penalty? I've set up an application working with 303s and, although I don't consider myself mad, it does add an extra request to every click the user makes. Getting the 303 response takes 20-25 ms on average, so it's not a big issue in this case (internal company usage). Interestingly enough, I just checked a random shortened link off Twitter and it went through no fewer than 5 HTTP 301/302 redirects (500 ms in total) before getting the HTML. Taking that into consideration, a single 303 is not too bad!

Regards,
John Walker
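John's numbers (one 303 at 20-25 ms versus five 301/302 hops in about 500 ms, roughly 100 ms per hop) come from following a redirect chain hop by hop. A minimal sketch of such a follower, assuming a hypothetical `fetch(url)` callable that returns `(status, Location or None)` without following redirects itself; wrapping each call with `time.perf_counter()` would give per-hop latency.

```python
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical interface: fetch(url) -> (status, Location header or None).
Fetch = Callable[[str], Tuple[int, Optional[str]]]
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def follow_redirects(url: str, fetch: Fetch,
                     max_hops: int = 10) -> Tuple[str, int, List[Tuple[int, str, str]]]:
    """Follow a redirect chain, recording each (status, from, to) hop."""
    hops: List[Tuple[int, str, str]] = []
    seen = {url}
    while True:
        status, location = fetch(url)
        if status not in REDIRECT_STATUSES or location is None:
            return url, status, hops
        if len(hops) >= max_hops:
            raise RuntimeError(f"more than {max_hops} redirects")
        target = urljoin(url, location)  # Location may be relative
        hops.append((status, url, target))
        if target in seen:
            raise RuntimeError(f"redirect loop at {target}")
        seen.add(target)
        url = target
```

For a Cool URI with a single 303 the hop list has one entry; for the shortened Twitter link John describes it would have five.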
Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.
On 7/23/14 10:50 AM, john.walker wrote:
Hi Michael, Hope the laptop is OK :) So I can think of your 'slash' NIR URI as something similar to a URN: http://www.bbc.co.uk/programmes/b006mw1h/thing It doesn't do much on its own and *just* acts as an identifier. Using HTTP it can be resolved to a URL via the 303, kind of similar to a URN resolver. [...]

See also the output of our variant of Vapour, which illustrates entity denotation and connotation via HTTP URIs [1].

Basically, SEO should be targeting the entity denoted by the URI http://dbpedia.org/page/Linked_data, since that URI denotes a document. The document consists of RDF content whose format is negotiable.

Links:
[1] http://bit.ly/entity-denotation-and-connotaton -- Vapour deconstruction of HTTP URIs that denote and connote entities of different types.
[2] http://lists.w3.org/Archives/Public/public-lod/2014Jul/0085.html -- related thread on this forum.

--
Regards,
Kingsley Idehen
Founder & CEO, OpenLink Software
Company Web: http://www.openlinksw.com
Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.
Hi Kingsley,

In the case that Michael describes, could one reasonably expect that if the BBC were to embed the following triples as RDFa in the HTML served on the URL http://www.bbc.co.uk/programmes/b006mw1h, then a webcrawler would understand to go directly to the webpages about the seasons and episodes?

## start Turtle
@prefix schema: <http://schema.org/> .

<http://www.bbc.co.uk/programmes/b006mw1h> a schema:WebPage ;
    schema:about <http://www.bbc.co.uk/programmes/b006mw1h/thing> .

<http://www.bbc.co.uk/programmes/b006mw1h/thing> a schema:TVSeries ;
    schema:name "Gardeners' World" ;
    schema:season <http://www.bbc.co.uk/programmes/p00fx55j/thing> ,
        <http://www.bbc.co.uk/programmes/p00fx5b7/thing> ;
    schema:episode <http://www.bbc.co.uk/programmes/b049fnfd/thing> .

<http://www.bbc.co.uk/programmes/p00fx55j> a schema:WebPage ;
    schema:about <http://www.bbc.co.uk/programmes/p00fx55j/thing> .

<http://www.bbc.co.uk/programmes/p00fx5b7> a schema:WebPage ;
    schema:about <http://www.bbc.co.uk/programmes/p00fx5b7/thing> .

<http://www.bbc.co.uk/programmes/b049fnfd> a schema:WebPage ;
    schema:about <http://www.bbc.co.uk/programmes/b049fnfd/thing> .
## end Turtle

I guess, as Michael mentions, having the webpages as the href targets in the HTML effectively shortcuts that indirect relation.

Cheers,
John
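A minimal sketch of the indirection John describes, assuming the Turtle above has already been parsed into (subject, predicate, object) tuples (no RDF library is used, and the helper names `pages_about` and `crawl_targets` are hypothetical). It shows how a consumer that understands `schema:about` could go from the `/thing` URIs straight to the web pages for the seasons and episodes, without dereferencing any 303ing URI. Whether a particular search engine crawler actually behaves this way is not something the sketch establishes.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCHEMA = "http://schema.org/"
BASE = "http://www.bbc.co.uk/programmes/"

# The triples from John's Turtle snippet (schema:name omitted), as plain tuples.
TRIPLES = [
    (BASE + "b006mw1h", RDF_TYPE, SCHEMA + "WebPage"),
    (BASE + "b006mw1h", SCHEMA + "about", BASE + "b006mw1h/thing"),
    (BASE + "b006mw1h/thing", RDF_TYPE, SCHEMA + "TVSeries"),
    (BASE + "b006mw1h/thing", SCHEMA + "season", BASE + "p00fx55j/thing"),
    (BASE + "b006mw1h/thing", SCHEMA + "season", BASE + "p00fx5b7/thing"),
    (BASE + "b006mw1h/thing", SCHEMA + "episode", BASE + "b049fnfd/thing"),
] + [
    triple
    for page in ("p00fx55j", "p00fx5b7", "b049fnfd")
    for triple in (
        (BASE + page, RDF_TYPE, SCHEMA + "WebPage"),
        (BASE + page, SCHEMA + "about", BASE + page + "/thing"),
    )
]


def pages_about(triples):
    """Index: entity URI -> list of schema:WebPage URIs that are schema:about it."""
    web_pages = {s for s, p, o in triples if p == RDF_TYPE and o == SCHEMA + "WebPage"}
    index = defaultdict(list)
    for s, p, o in triples:
        if p == SCHEMA + "about" and s in web_pages:
            index[o].append(s)
    return index


def crawl_targets(triples, entity):
    """Web pages for the seasons and episodes of `entity`, in triple order."""
    index = pages_about(triples)
    parts = [o for s, p, o in triples
             if s == entity and p in (SCHEMA + "season", SCHEMA + "episode")]
    return [page for part in parts for page in index.get(part, [])]


print(crawl_targets(TRIPLES, BASE + "b006mw1h/thing"))
```

The lookup runs "backwards" over `schema:about`, from thing to page, which is the step a plain `href` to the page lets a crawler skip entirely.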
Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.
On 7/23/14 12:17 PM, john.walker wrote:
> Hi Kingsley,
>
> In the case that Michael describes, could one reasonably expect that if the BBC were to embed the following triples as RDFa in the HTML served on the URL http://www.bbc.co.uk/programmes/b006mw1h, then a webcrawler would understand to go directly to the webpages about the seasons and episodes?

Absolutely!! They could even use the following within <head/>:

1. <link rel="{relation-or-predicate-identifier}" href="{relation-object}" ... /> -- indicating that the doc in question is the subject of a relation denoted by {relation-or-predicate-identifier}
2. <link rev="{relation-or-predicate-identifier}" href="{relation-subject}" ... /> -- indicating that the doc in question is the object of a relation denoted by {relation-or-predicate-identifier}.

To up the ante for smart HTTP user agents, replicate the relations above using Link: response headers.

And to make your example live, I am tweaking your Turtle snippet (aka. Nanotation), which will produce some interesting results. Note, my only change is a Nanotation parser hint, i.e., ## Turtle Start ## and ## Turtle End ## :-)

## Turtle Start ##
@prefix schema: <http://schema.org/> .

<http://www.bbc.co.uk/programmes/b006mw1h> a schema:WebPage ;
    schema:about <http://www.bbc.co.uk/programmes/b006mw1h/thing> .

<http://www.bbc.co.uk/programmes/b006mw1h/thing> a schema:TVSeries ;
    schema:name "Gardeners' World" ;
    schema:season <http://www.bbc.co.uk/programmes/p00fx55j/thing> ,
        <http://www.bbc.co.uk/programmes/p00fx5b7/thing> ;
    schema:episode <http://www.bbc.co.uk/programmes/b049fnfd/thing> .

<http://www.bbc.co.uk/programmes/p00fx55j> a schema:WebPage ;
    schema:about <http://www.bbc.co.uk/programmes/p00fx55j/thing> .

<http://www.bbc.co.uk/programmes/p00fx5b7> a schema:WebPage ;
    schema:about <http://www.bbc.co.uk/programmes/p00fx5b7/thing> .

<http://www.bbc.co.uk/programmes/b049fnfd> a schema:WebPage ;
    schema:about <http://www.bbc.co.uk/programmes/b049fnfd/thing> .
## Turtle End ##
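Kingsley's two `<link/>` forms, and the suggestion to replicate them as `Link:` response headers, can be sketched as follows. This is a minimal illustration, assuming relations are supplied as hypothetical (direction, predicate, target) tuples; the function names are invented for the example. Note that RFC 8288 deprecates the `rev` parameter in `Link:` headers, so the reverse direction may not be honoured by every HTTP user agent.

```python
from html import escape


def link_elements(relations):
    """Render (direction, predicate, target) tuples as <link/> elements for <head/>.

    direction "rel": this document is the subject and `target` the object.
    direction "rev": this document is the object and `target` the subject.
    """
    elements = []
    for direction, predicate, target in relations:
        if direction not in ("rel", "rev"):
            raise ValueError(f"unknown direction: {direction!r}")
        elements.append(
            f'<link {direction}="{escape(predicate)}" href="{escape(target)}" />'
        )
    return elements


def link_header(relations):
    """Replicate the same relations as a single HTTP Link: header value."""
    return ", ".join(
        f'<{target}>; {direction}="{predicate}"'
        for direction, predicate, target in relations
    )


relations = [
    ("rel", "http://schema.org/about", "http://www.bbc.co.uk/programmes/b006mw1h#programme"),
]
for element in link_elements(relations):
    print(element)
print("Link:", link_header(relations))
```

Emitting both from the same tuples keeps the in-document and in-header relations from drifting apart.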
Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.
On 23/07/2014 15:50, john.walker john.wal...@semaku.com wrote:
> Hi Michael,

Hiya

> Hope the laptop is ok :)

Survived another drop

> So I can think of your 'slash' NIR URI as something similar to a URN:
> http://www.bbc.co.uk/programmes/b006mw1h/thing
> It doesn't do much on its own and *just* acts as an identifier.

(Ignoring the fact we actually use hashes, if we did use slashes, then) yes. It's just an identifier for a real-world thing. The RDF and RDFa use it to make assertions:

<a typeof="po:Brand" about="/programmes/b006mw1h#programme" href="/programmes/b006mw1h" title="Gardeners' World">

but @href links don't travel through them.

> Using HTTP it can be resolved to a URL via the 303, kind of similar to a URN resolver.

Guess so, yes: a URN that doesn't need a URN resolver cos it's an HTTP URI.

> Could you explain what you mean by conneg penalty?

Every time a normal user clicks a link on the bits of bbc.co.uk that support linked data, they click to the generic document resource URI, which then does the conneg bit to serve an appropriate representation. So it's extra work at the server end, but mostly cacheable. Except it's a bit tricky with CDNs.

> I've set up an application working with 303s and, although I don't consider myself mad, it does add an extra request to every click the user does.

Guess the madness quotient would depend on how much traffic you have to cope with. For the BBC to add an additional request for every request for a Doctor Who page would have been madness.

> Getting the 303 response takes 20-25 ms on average, so it's not a big issue in this case (internal company usage).

For internal usage it's all probably fine. But I still think it's a pattern that shouldn't be generally encouraged. On a high-traffic website it's just more requests that aren't really adding anything. I think if we'd suggested the DBpedia-style pattern at the BBC we'd never have gotten permission to serve linked data.

> Interestingly enough I just checked a random shortened link off Twitter and it went through no fewer than 5 HTTP 301/302 redirects (500 ms in total) before getting the HTML.

Yeah, it's a shambles init :-/

> Taking that into consideration a single 303 is not too bad!

In comparison to link-shortener madness it's not that mad. But it's a redirect your servers have to handle, and link shorteners are someone else's problem. Kinda.

michael

> Regards,
> John Walker
>
> On July 23, 2014 at 3:55 PM Michael Smethurst michael.smethu...@bbc.co.uk wrote:
>> Oops, dropped laptop :-/
>> Continues
>>
>> On 23/07/2014 14:50, Michael Smethurst michael.smethu...@bbc.co.uk wrote:
>>> Hi Bill
>>>
>>> Bit of a difficult question to answer, because the reality is probably still quite disjointed. Various parts of bbc.co.uk:
>>> - serve linked data
>>> - store data as RDF (in a triple store)
>>> - consume (to some extent) linked data
>>>
>>> But nowhere are all those things true in one place. So /programmes publishes linked data but the backend is a relational database, whereas things like sport / olympics are stored as linked data but don't publish it. So the two parts aren't really coupled.
>>>
>>> I do half remember lots of conversations about hashes v slashes for /programmes and /music, but the sites are designed to be quite granular (one thing per URI; one URI per thing), so we weren't really dealing with lots of things in a document.
>>>
>>> The linked data platform (our triple store) does use # URIs like:
>>> http://www.bbc.co.uk/things/794274f1-d7ea-4ad2-9b36-c46ed55da9bd#id
>>> But I'm not best placed to know about the interfaces and queries onto this, and why they chose hashes and not slashes. I'll ask around unless those people are already on this list...
>>>
>>> Not much help, sorry
>>>
>>> michael
>>>
>>> On 23/07/2014 14:19, Bill Roberts b...@swirrl.com wrote:
>>>> Hi Michael
>>>>
>>>> We've tended to use slash URIs where possible, because we have found it more convenient when doing URI dereferencing from a triple-store backed site, in which case we essentially do a DESCRIBE on the relevant URI. (So we do 303ing for non-information resources, though in practice in a lot of our applications the great majority of content is statistical data, which we treat as information resources and respond with 200.)
>>>>
>>>> How do you organise your data and generation of URI dereferencing responses with hash-based URIs? I can see a variety of ways to do it, but I'd be interested to know what you have found most efficient/convenient at the BBC, essentially dealing with the fact that the server doesn't know about what comes after the #
>>>>
>>>> Thanks
>>>> Bill
>>>>
>>>> On 23 Jul 2014, at 13:52, Michael Smethurst michael.smethu...@bbc.co.uk wrote:
>>>>> Hello
>>>>>
>>>>> (Pretty sure I've made this comment before so please forgive any signs of premature senility)
>>>>>
>>>>> I think this may be an unfortunate side effect of the conflation of the 303 ("I can't send that") pattern with the content negotiation ("what flavour would you like") pattern
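Bill's question turns on the fact that the fragment of a hash URI is never sent to the server. A minimal sketch, using the standard library's `urllib.parse.urldefrag`, of what the server actually receives. It also shows one possible way a triple-store-backed site could answer such a request: describe the requested document URL itself plus every hash URI that shares it. That strategy and the helper names are hypothetical, not what the BBC or Swirrl actually do.

```python
from urllib.parse import urldefrag


def request_target(uri):
    """The URL an HTTP client actually requests for `uri`.

    The fragment is stripped client-side and never sent, so the server cannot
    tell which hash URI the client started from.
    """
    url, _fragment = urldefrag(uri)
    return url


def describe_query(requested_url):
    """Hypothetical SPARQL DESCRIBE a server could run for `requested_url`:
    the URL itself (slash pattern) plus any `requested_url#...` hash URIs.

    No escaping is done here; a real server would have to validate the URL
    before interpolating it into a query.
    """
    return (
        "DESCRIBE ?s WHERE { ?s ?p ?o . "
        f'FILTER(?s = <{requested_url}> || STRSTARTS(STR(?s), "{requested_url}#")) }}'
    )


hash_uri = "http://www.bbc.co.uk/things/794274f1-d7ea-4ad2-9b36-c46ed55da9bd#id"
print(request_target(hash_uri))
print(describe_query(request_target(hash_uri)))
```

With slash URIs the same `DESCRIBE` matches exactly one resource. With hash URIs the server has to assume the client may want any of the `#...` resources in the document, which is one reason hash-based designs tend to keep documents small.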
Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.
On 7/23/14 2:05 PM, Michael Smethurst wrote:
> For internal usage it's all probably fine. But I still think it's a pattern that shouldn't be generally encouraged.

It's a horses-for-courses matter :-)

If you choose to use hashless HTTP URIs for entity denotation, you have to make the extra investment required (via 303 heuristics) for entity disambiguation [1].

Note, there are changes to HTTP that also reduce some of the confusion in this realm: for instance, the use of Content-Location: response headers to aid disambiguation [2].

Links:
[1] http://bit.ly/WAJGCp -- HTTP URI denotation in a single slide
[2] https://twitter.com/kidehen/status/476039386425868288 -- HTTP changes
[3] https://twitter.com/ereteog/status/487935205240766464/photo/1 -- nice picture, but it would be even clearer if it had a hash-based HTTP URI denoting the zebra, re. denoting, on the Web, what exists.
Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.
Hi Kingsley

Very definitely starting to feel like deja vu...

On 23/07/2014 20:18, Kingsley Idehen kide...@openlinksw.com wrote:
> On 7/23/14 2:05 PM, Michael Smethurst wrote:
>> For internal usage it's all probably fine. But I still think it's a pattern that shouldn't be generally encouraged.
> It's a horses-for-courses matter :-)
>
> If you choose to use hashless HTTP URIs for entity denotation, you have to make the extra investment required (via 303 heuristics) for entity disambiguation [1].

My only point is: if you don't conflate "I can't send that" (303) with "what flavour would you like" (conneg), you don't have to invest in more servers.

> Note, there are changes to HTTP that also reduce some of the confusion in this realm: for instance, the use of Content-Location: response headers to aid disambiguation [2].

We do use Content-Location for the (information) resource / representation split, but that's REST, not 303 semantics.

michael
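Michael's distinction can be sketched as two separate handlers: a 303 only for a URI that denotes a thing, and plain content negotiation (200 plus `Content-Location`) for the document. This is a minimal sketch, assuming the `.html`/`.rdf` suffixes that appear in the BBC URLs elsewhere in this thread. The `Response` class, the handler names and the simplified first-match `Accept` handling (q-values ignored) are hypothetical, not the BBC's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    status: int
    headers: dict = field(default_factory=dict)


# Assumed mapping from media type to representation suffix.
SUFFIXES = {"text/html": ".html", "application/rdf+xml": ".rdf"}


def negotiate(accept):
    """Simplified conneg: the first supported media type listed wins; q-values ignored."""
    for part in accept.split(","):
        media_type = part.split(";")[0].strip().lower()
        if media_type in SUFFIXES:
            return media_type
    return "text/html"


def handle_document(path, accept):
    """Information resource: serve it directly (200) and name the concrete
    representation in Content-Location. No extra round trip."""
    media_type = negotiate(accept)
    return Response(200, {
        "Content-Type": media_type,
        "Content-Location": path + SUFFIXES[media_type],
        "Vary": "Accept",
    })


def handle_thing(path):
    """Slash URI for a non-information resource ("I can't send that"):
    303 to the document about it, with no content negotiation involved."""
    return Response(303, {"Location": path.removesuffix("/thing")})
```

With hash URIs, as the BBC actually uses, `handle_thing` is never reached: the client strips `#programme` itself and asks for the document directly, so every click costs one request. (`str.removesuffix` needs Python 3.9 or later.)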
Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.
On 7/23/14 3:40 PM, Michael Smethurst wrote:
> Hi Kingsley
>
> Very definitely starting to feel like deja vu...
>
> On 23/07/2014 20:18, Kingsley Idehen kide...@openlinksw.com wrote:
>> On 7/23/14 2:05 PM, Michael Smethurst wrote:
>>> For internal usage it's all probably fine. But I still think it's a pattern that shouldn't be generally encouraged.
>> It's a horses-for-courses matter :-)
>>
>> If you choose to use hashless HTTP URIs for entity denotation, you have to make the extra investment required (via 303 heuristics) for entity disambiguation [1].
> My only point is: if you don't conflate "I can't send that" (303) with "what flavour would you like" (conneg), you don't have to invest in more servers.
>> Note, there are changes to HTTP that also reduce some of the confusion in this realm: for instance, the use of Content-Location: response headers to aid disambiguation [2].
> We do use Content-Location for the (information) resource / representation split, but that's REST, not 303 semantics.
>
> michael

There is only one kind of relation semantics in play here, and it's the semantics of denotation and connotation [1][2].

HTTP URIs denote things.

HTTP URLs denote documents comprised of connotation-bearing content.

With regard to the current BBC programmes URIs, if you incorporate RDFa, <link/>, or Link: based relations, disambiguation without 303s or content negotiation is possible. RDF user agents (for example) will be able to make sense of the relations that collectively describe documents about programmes and actual programmes.

Links:
[1] http://bit.ly/what-does-this-bbc-programmes-uri-denote -- Vapour using RDF semantics to discern what http://www.bbc.co.uk/programmes/b006mw1h denotes and connotes
[2] http://bit.ly/what-does-this-bbc-programmes-doc-url-denote -- ditto, but targeting http://www.bbc.co.uk/programmes/b006mw1h.rdf
Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.
On 23/07/2014 21:49, Kingsley Idehen kide...@openlinksw.com wrote:
> There is only one kind of relation semantics in play here, and it's the semantics of denotation and connotation [1][2].

Tho Derrida didn't have to pay for servers :-/

> HTTP URIs denote things.

Which can't be served (303)

> HTTP URLs denote documents comprised of connotation-bearing content.

Which can be served in assorted representations (conneg (+ Content-Location))

Think the last time we had this conversation we broke the Twitter scroll bar and agreed to disagree. Or at worst misunderstand :-)

michael
Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.
And with a few additions to reflect the _actual_ published data (using hash URIs) and what Michael described with conneg:

## Turtle Start ##
@prefix schema: <http://schema.org/> .
@prefix dcterms: <http://purl.org/dc/terms/> .

<http://www.bbc.co.uk/programmes/b006mw1h> a schema:CreativeWork ;
    schema:about <http://www.bbc.co.uk/programmes/b006mw1h/thing> .

<http://www.bbc.co.uk/programmes/b006mw1h#programme> a schema:TVSeries ;
    schema:name "Gardeners' World" ;
    schema:season <http://www.bbc.co.uk/programmes/p00fx55j#programme> ,
        <http://www.bbc.co.uk/programmes/p00fx5b7#programme> ;
    schema:episode <http://www.bbc.co.uk/programmes/b049fnfd#programme> .

<http://www.bbc.co.uk/programmes/p00fx55j> a schema:CreativeWork ;
    schema:about <http://www.bbc.co.uk/programmes/p00fx55j#programme> ;
    dcterms:hasFormat <http://www.bbc.co.uk/programmes/p00fx55j.html> ,
        <http://www.bbc.co.uk/programmes/p00fx55j.rdf> .

<http://www.bbc.co.uk/programmes/p00fx5b7> a schema:CreativeWork ;
    schema:about <http://www.bbc.co.uk/programmes/p00fx5b7#programme> ;
    dcterms:hasFormat <http://www.bbc.co.uk/programmes/p00fx5b7.html> ,
        <http://www.bbc.co.uk/programmes/p00fx5b7.rdf> .

<http://www.bbc.co.uk/programmes/b049fnfd> a schema:CreativeWork ;
    schema:about <http://www.bbc.co.uk/programmes/b049fnfd#programme> ;
    dcterms:hasFormat <http://www.bbc.co.uk/programmes/b049fnfd.html> ,
        <http://www.bbc.co.uk/programmes/b049fnfd.rdf> .

<http://www.bbc.co.uk/programmes/p00fx55j.html> a schema:WebPage ;
    schema:about <http://www.bbc.co.uk/programmes/p00fx55j#programme> .

<http://www.bbc.co.uk/programmes/p00fx5b7.html> a schema:WebPage ;
    schema:about <http://www.bbc.co.uk/programmes/p00fx5b7#programme> .

<http://www.bbc.co.uk/programmes/b049fnfd.html> a schema:WebPage ;
    schema:about <http://www.bbc.co.uk/programmes/b049fnfd#programme> .

<http://www.bbc.co.uk/programmes/p00fx55j.rdf> a schema:DataDownload ;
    schema:about <http://www.bbc.co.uk/programmes/p00fx55j#programme> ;
    schema:encodesCreativeWork <http://www.bbc.co.uk/programmes/p00fx55j> .

<http://www.bbc.co.uk/programmes/p00fx5b7.rdf> a schema:DataDownload ;
    schema:about <http://www.bbc.co.uk/programmes/p00fx5b7#programme> ;
    schema:encodesCreativeWork <http://www.bbc.co.uk/programmes/p00fx5b7> .

<http://www.bbc.co.uk/programmes/b049fnfd.rdf> a schema:DataDownload ;
    schema:about <http://www.bbc.co.uk/programmes/b049fnfd#programme> ;
    schema:encodesCreativeWork <http://www.bbc.co.uk/programmes/b049fnfd> .
## Turtle End ##
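John's `dcterms:hasFormat` triples list the alternative representations of each document. A minimal sketch of choosing among them from an `Accept` header with q-values, assuming the media type can be inferred from the `.html`/`.rdf` suffix (an assumption, not something the published data states). `parse_accept` and `pick_format` are hypothetical helpers, and media-range wildcards other than `*/*` (such as `text/*`) are not handled.

```python
# Assumed mapping from the file suffixes in the hasFormat URLs to media types.
SUFFIX_TYPES = {".html": "text/html", ".rdf": "application/rdf+xml"}


def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, highest q first."""
    ranges = []
    for position, part in enumerate(header.split(",")):
        pieces = [piece.strip() for piece in part.split(";")]
        media_type = pieces[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media_type, q, position))
    # Highest q first; ties keep the order in which they appeared in the header.
    ranges.sort(key=lambda r: (-r[1], r[2]))
    return [(media_type, q) for media_type, q, _ in ranges]


def pick_format(formats, accept):
    """Pick the hasFormat URL whose media type best matches `accept`, or None."""
    by_type = {}
    for url in formats:
        for suffix, media_type in SUFFIX_TYPES.items():
            if url.endswith(suffix):
                by_type[media_type] = url
    for media_type, q in parse_accept(accept):
        if q <= 0:
            continue
        if media_type in by_type:
            return by_type[media_type]
        if media_type == "*/*" and by_type:
            return next(iter(by_type.values()))
    return None
```

Returning `None` leaves the "nothing acceptable" decision (406 or a default representation) to the caller.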
Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.
On 7/23/14 5:15 PM, Michael Smethurst wrote:
> On 23/07/2014 21:49, Kingsley Idehen kide...@openlinksw.com wrote:
>> There is only one kind of relation semantics in play here, and it's the semantics of denotation and connotation [1][2].
> Tho Derrida didn't have to pay for servers :-/
>> HTTP URIs denote things.
> Which can't be served (303)

No, HTTP URIs simply denote things (entities). It has nothing to do with being served, etc.

>> HTTP URLs denote documents comprised of connotation-bearing content.
> Which can be served in assorted representations (conneg (+ Content-Location))

No, HTTP URLs are a kind of HTTP URI that denote Web Documents. Put differently, "HTTP URL" is, for all intents and purposes, a colloquialism for HTTP URIs that focuses on Web Documents, a particular entity type, i.e., entities that are instances of the classes denoted by the URIs http://xmlns.com/foaf/0.1/Document, http://purl.org/ontology/bibo/Document, http://purl.org/dc/terms/BibliographicResource, etc.

The very same analogy applies to WebIDs, which are HTTP URIs that denote Agents, i.e., entities that are instances of the class denoted by the URI http://xmlns.com/foaf/0.1/Agent.

> Think the last time we had this conversation we broke the Twitter scroll bar and agreed to disagree. Or at worst misunderstand :-)

Long discussions aren't necessarily bad; they can also unravel insights that are sometimes overlooked :-)

BTW, one can also deconstruct this issue a different way, starting with HTTP URIs/URLs that denote Documents. It goes something like this:

1. You have an RDF document (comprised of RDF/XML content) denoted by the HTTP URI/URL http://www.bbc.co.uk/programmes/b006mw1h.rdf
2. The document above describes an entity denoted by the HTTP URI http://www.bbc.co.uk/programmes/b006mw1h#programme.

We arrive at the same place (as illustrated by the Vapour links I shared).

My only issue with the BBC programmes URIs right now is that http://www.bbc.co.uk/programmes/b006mw1h doesn't make its association with http://www.bbc.co.uk/programmes/b006mw1h#programme discoverable to RDF user agents. That's where Microdata, RDFa, <link/> and Link: come into play, i.e., they provide vehicles for exposing the missing relation (association, connection, relationship property/predicate, etc.).

Also note:

curl -IH "Accept: application/rdf+xml" http://www.bbc.co.uk/programmes/b006mw1h
HTTP/1.1 200 OK
Server: Apache
Content-Type: text/html; charset=utf-8

curl -IH "Accept: application/rdf+xml" http://www.bbc.co.uk/programmes/b006mw1h.rdf
HTTP/1.1 200 OK
Server: Apache
Content-Type: application/rdf+xml

Which reinforces my point re. the missing relation to aid RDF user agents. Simply tacking .rdf onto the end of URLs is way too brittle, when a relation (describes, describedby, etc.) would do much better via RDFa, Microdata, <link/>, Link:, etc.

Kingsley
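Kingsley's alternative to tacking `.rdf` onto URLs is to advertise the relation explicitly. A minimal client-side sketch, assuming the server sends a `Link:` header using the IANA-registered `describedby` relation. The curl output above shows no such header, so the header value below (including its second, `foaf:primaryTopic` link) is hypothetical. The parser is deliberately simplified: it assumes no commas or semicolons inside quoted parameter values.

```python
import re

# One link-value: <target> followed by zero or more ;-separated parameters.
LINK_VALUE = re.compile(r"<([^>]*)>\s*((?:;\s*[^;,]+)*)")


def parse_link_header(value):
    """Parse a Link: header value into (target, {param: value}) pairs."""
    links = []
    for match in LINK_VALUE.finditer(value):
        target, raw_params = match.groups()
        params = {}
        for param in raw_params.split(";"):
            name, _, val = param.strip().partition("=")
            if name:
                params[name.strip().lower()] = val.strip().strip('"')
        links.append((target, params))
    return links


def described_by(value):
    """Targets whose rel includes "describedby" (rel may hold several tokens)."""
    return [
        target
        for target, params in parse_link_header(value)
        if "describedby" in params.get("rel", "").lower().split()
    ]


header = (
    '<http://www.bbc.co.uk/programmes/b006mw1h.rdf>; rel="describedby"; '
    'type="application/rdf+xml", '
    '<http://www.bbc.co.uk/programmes/b006mw1h#programme>; '
    'rel="http://xmlns.com/foaf/0.1/primaryTopic"'
)
print(described_by(header))
```

An RDF user agent following this approach never needs to know the server's URL-naming convention; it only needs to recognise the relation.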
Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.
On 7/23/14 5:59 PM, john.walker wrote: And with a few additions to reflect the _actual_ published data (using hash URIs) and what Michael described with conneg: ## Turtle Start ## @prefix schema: http://schema.org/ http://schema.org/ . @prefix dcterms: http://purl.org/dc/terms/. http://www.bbc.co.uk/programmes/b006mw1h a schema:CreativeWork ; schema:about http://www.bbc.co.uk/programmes/b006mw1h/thing . http://www.bbc.co.uk/programmes/b006mw1h#programme a schema:TVSeries ; schema:name Gardeners' World ; schema:season http://www.bbc.co.uk/programmes/p00fx55j#programme , http://www.bbc.co.uk/programmes/p00fx5b7#programme ; schema:episode http://www.bbc.co.uk/programmes/b049fnfd#programme . http://www.bbc.co.uk/programmes/p00fx55j a schema:CreativeWork ; schema:about http://www.bbc.co.uk/programmes/p00fx55j#programme ; dcterms:hasFormat http://www.bbc.co.uk/programmes/p00fx55j.html , http://www.bbc.co.uk/programmes/p00fx55j.rdf . http://www.bbc.co.uk/programmes/p00fx5b7 a schema:CreativeWork ; schema:about http://www.bbc.co.uk/programmes/p00fx5b7#programme ; dcterms:hasFormat http://www.bbc.co.uk/programmes/p00fx5b7.html , http://www.bbc.co.uk/programmes/p00fx5b7.rdf . http://www.bbc.co.uk/programmes/b049fnfd a schema:CreativeWork ; schema:about http://www.bbc.co.uk/programmes/b049fnfd#programme ; dcterms:hasFormat http://www.bbc.co.uk/programmes/b049fnfd.html , http://www.bbc.co.uk/programmes/b049fnfd.rdf . http://www.bbc.co.uk/programmes/p00fx55j.html a schema:WebPage ; schema:about http://www.bbc.co.uk/programmes/p00fx55j#programme . http://www.bbc.co.uk/programmes/p00fx5b7.html a schema:WebPage ; schema:about http://www.bbc.co.uk/programmes/p00fx5b7#programme . http://www.bbc.co.uk/programmes/b049fnfd.html a schema:WebPage ; schema:about http://www.bbc.co.uk/programmes/b049fnfd#programme . 
http://www.bbc.co.uk/programmes/p00fx55j.rdf a schema:DataDownload ; schema:about http://www.bbc.co.uk/programmes/p00fx55j#programme ; schema:encodesCreativeWork http://www.bbc.co.uk/programmes/p00fx55j . http://www.bbc.co.uk/programmes/p00fx5b7.rdf a schema:DataDownload ; schema:about http://www.bbc.co.uk/programmes/p00fx5b7#programme ; schema:encodesCreativeWork http://www.bbc.co.uk/programmes/p00fx5b7 . http://www.bbc.co.uk/programmes/b049fnfd.rdf a schema:DataDownload ; schema:about http://www.bbc.co.uk/programmes/b049fnfd#programme ; schema:encodesCreativeWork http://www.bbc.co.uk/programmes/b049fnfd . ## Turtle End ## Yep!! And that enables an RDF agent produce output such as: [1] http://linkeddata.uriburner.com/about/html/http/lists.w3.org/Archives/Public/public-lod/2014Jul/0121.html -- basic document description [2] http://bit.ly/cool-uris-303-entity-ranking-fyn -- deeper follow-your-nose oriented document description [3] http://bit.ly/statements-made-by-john-walker-in-lod-list-post -- statements discerned and then reified, via the nanotations (micro annotations) in your post :-) Kingsley On July 23, 2014 at 6:53 PM Kingsley Idehen kide...@openlinksw.com wrote: On 7/23/14 12:17 PM, john.walker wrote: Hi Kingsley, In the case that Michael describes, could one reasonably expect that if the BBC were to embed the following triples as RDFa in the HTML served on the URL http://www.bbc.co.uk/programmes/b006mw1h, then a webcrawler would understand to go directly to the webpages about the seasons and episodes? Absolutely!! 
They could even use the following within -- Regards, Kingsley Idehen Founder CEO OpenLink Software Company Web: http://www.openlinksw.com Personal Weblog 1: http://kidehen.blogspot.com Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen Twitter Profile: https://twitter.com/kidehen Google+ Profile: https://plus.google.com/+KingsleyIdehen/about LinkedIn Profile: http://www.linkedin.com/in/kidehen Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this
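john.walker's Turtle links each programme page URI to its HTML and RDF renderings with dcterms:hasFormat. One common way to expose the same relations to a crawler that reads only HTML is a set of <link rel="alternate"> elements in the page head. The Python sketch below assumes that approach; the HAS_FORMAT dictionary is a hypothetical hand copy of one hasFormat statement, and media types are inferred from file extensions (also an assumption). It is an illustration, not necessarily what is proposed in the thread.

```python
from html import escape
from posixpath import splitext
from typing import Dict, List
from urllib.parse import urlparse

# Assumption: media types inferred from the file extensions used in the Turtle.
MEDIA_TYPES = {".html": "text/html", ".rdf": "application/rdf+xml"}

# Hypothetical input: a page URI and its dcterms:hasFormat objects, copied by hand.
HAS_FORMAT: Dict[str, List[str]] = {
    "http://www.bbc.co.uk/programmes/p00fx55j": [
        "http://www.bbc.co.uk/programmes/p00fx55j.html",
        "http://www.bbc.co.uk/programmes/p00fx55j.rdf",
    ],
}


def alternate_links(page_uri: str, has_format: Dict[str, List[str]]) -> str:
    """Render one <link rel="alternate"> element per known format of page_uri."""
    elements = []
    for href in has_format.get(page_uri, []):
        extension = splitext(urlparse(href).path)[1]
        media_type = MEDIA_TYPES.get(extension)
        if media_type is None:
            continue  # skip formats whose media type cannot be inferred
        elements.append(
            f'<link rel="alternate" type="{escape(media_type)}" href="{escape(href)}">'
        )
    return "\n".join(elements)


if __name__ == "__main__":
    print(alternate_links("http://www.bbc.co.uk/programmes/p00fx55j", HAS_FORMAT))
```

Generating the elements from the same data that produces the Turtle keeps the HTML and RDF descriptions from drifting apart.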
Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.
I am attempting to understand how the CoolURI 303 redirect pattern for the Semantic Web (http://www.w3.org/TR/cooluris/) can be implemented without a negative impact on search engines. This pattern appears to allow site content to be indexed, but prevents PageRank from flowing through internal links because of the 303 redirect.

For example, in Griffith's Research Hub (http://research-hub.griffith.edu.au), a GET request to the URI of Howard Wiseman:

http://research-hub.griffith.edu.au/individual/n33a4e2d3057476efaff5ce1884564a8f

will resolve to different URLs based on content negotiation.

For RDF:

wget --header "Accept: application/rdf+xml" http://research-hub.griffith.edu.au/individual/n33a4e2d3057476efaff5ce1884564a8f

results in a 303 See Other redirect to the RDF version of the entity:

http://research-hub.griffith.edu.au/rdf/n33a4e2d3057476efaff5ce1884564a8f/n33a4e2d3057476efaff5ce1884564a8f.rdf

For HTML:

wget --header "Accept: text/html" http://research-hub.griffith.edu.au/individual/n33a4e2d3057476efaff5ce1884564a8f

results in a 303 See Other redirect to the HTML version of the entity (our old friend, the display version):

http://research-hub.griffith.edu.au/display/n33a4e2d3057476efaff5ce1884564a8f

Note: There will never be an HTML page at http://research-hub.griffith.edu.au/individual/n33a4e2d3057476efaff5ce1884564a8f, just an HTTP response. Links are presented as the individual URI and then redirect to the display URL.

All good so far: this is a perfectly functional example of the Cool URIs specification at work. Unfortunately, it causes a few issues in practice.

If the links we present to the outside world for harvesting (e.g. via a SPARQL endpoint, OAI-PMH or an OpenSocial widget) are the canonical individual URIs, clients will be able to get to the display URL, but the Google PageRank that would normally flow from these external links will not.
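The content negotiation described above, where one /individual/ URI answers with a 303 pointing at either the /rdf/ or the /display/ URL, can be sketched as a pure function. This is a minimal Python sketch, assuming a simplified Accept-header parser (q-values only, no specificity rules); the function names are hypothetical and the paths only mirror the Research Hub URLs quoted in the post, not its actual code.

```python
from typing import List

BASE = "http://research-hub.griffith.edu.au"  # host taken from the example above

RDF_TYPES = {"application/rdf+xml"}
HTML_TYPES = {"text/html", "application/xhtml+xml", "text/*", "*/*"}


def ranked_media_types(accept: str) -> List[str]:
    """Return the media types in an Accept header, highest q-value first.

    Types with q=0 are dropped; ties keep their original order."""
    ranked = []
    for position, part in enumerate(accept.split(",")):
        pieces = [p.strip() for p in part.split(";")]
        media_type = pieces[0].lower()
        q = 1.0
        for param in pieces[1:]:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if media_type and q > 0:
            ranked.append((-q, position, media_type))
    ranked.sort()
    return [media_type for _, _, media_type in ranked]


def see_other_target(entity_id: str, accept: str) -> str:
    """Pick the Location header for a 303 from /individual/{entity_id}."""
    rdf_url = f"{BASE}/rdf/{entity_id}/{entity_id}.rdf"
    display_url = f"{BASE}/display/{entity_id}"
    for media_type in ranked_media_types(accept or "*/*"):
        if media_type in RDF_TYPES:
            return rdf_url
        if media_type in HTML_TYPES:
            return display_url
    return display_url  # assumption: fall back to the human-readable page, not a 406


if __name__ == "__main__":
    eid = "n33a4e2d3057476efaff5ce1884564a8f"
    print(see_other_target(eid, "application/rdf+xml"))
    print(see_other_target(eid, "text/html"))
```

Falling back to the display page for unknown types is a design choice; a stricter server could answer 406 Not Acceptable instead.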
The specification of a 303 redirect (http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html) describes it as:

The response to the request can be found under a different URI and SHOULD be retrieved using a GET method on that resource. This method exists primarily to allow the output of a POST-activated script to redirect the user agent to a selected resource. *The new URI is not a substitute reference for the originally requested resource*. The 303 response MUST NOT be cached, but the response to the second (redirected) request might be cacheable. The different URI SHOULD be given by the Location field in the response. Unless the request method was HEAD, the entity of the response SHOULD contain a short hypertext note with a hyperlink to the new URI(s).

Google correctly implements the specification and does not assign the PageRank of the individual URI to the display URL, as it is *not a substitute reference for the originally requested resource.* The same is true of internal links: a high-PageRank home page will not pass PageRank on to display URLs if the pathway to those URLs is via individual URI links.

I am not sure what the solution is here, as the realms of SEO, and the web conventions they are built on, seem a poor fit for Semantic Web best practice. The most minimal compromise I can think of is to move away from the 303 redirect to a redirect that conserves the flow of Google PageRank:

- 302 Found is the recommended substitute for 303 for clients that do not support HTTP/1.1, and it does allow a certain amount of Google PageRank to flow.
- 301 Moved Permanently is a poor fit for the Cool URI pattern, but passes on the full PageRank of the links.
- Rewriting all URIs as the URL would also work, but would break the Cool URI pattern.

The pragmatist in me feels that if we are going to make a change for the purposes of SEO, it might as well be the one with the best return, i.e. a 301 redirect.
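To see what each of the redirect options listed above looks like on the wire, a minimal local server helps. This sketch uses only Python's standard library; the handler, the REDIRECT_STATUS switch and the path layout are assumptions modelled on the Research Hub URLs, not its implementation. It shows only the HTTP exchange and says nothing about how a search engine weights each status code.

```python
import http.client
import threading
from html import escape
from http.server import BaseHTTPRequestHandler, HTTPServer

REDIRECT_STATUS = 303  # assumption: set to 302 or 301 to trial the alternatives


class CoolUriHandler(BaseHTTPRequestHandler):
    """Answers GET /individual/<id> with a redirect chosen by the Accept header."""

    def do_GET(self):
        prefix = "/individual/"
        if not self.path.startswith(prefix):
            self.send_error(404)
            return
        entity_id = self.path[len(prefix):]
        if "application/rdf+xml" in self.headers.get("Accept", ""):
            path = f"/rdf/{entity_id}/{entity_id}.rdf"
        else:
            path = f"/display/{entity_id}"
        location = f"http://{self.headers.get('Host', 'localhost')}{path}"
        # RFC 2616: the body SHOULD carry a short hypertext note linking to the new URI.
        safe = escape(location)
        body = f'<p>See <a href="{safe}">{safe}</a></p>'.encode("utf-8")
        self.send_response(REDIRECT_STATUS)
        self.send_header("Location", location)
        self.send_header("Vary", "Accept")  # the target depends on content negotiation
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this sketch


def start_server():
    """Run the handler on an ephemeral local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), CoolUriHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(port, path, accept):
    """Send one GET without following redirects; return (status, Location)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path, headers={"Accept": accept})
        response = conn.getresponse()
        response.read()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

For example, `probe(server.server_address[1], "/individual/abc", "text/html")` on a server from `start_server()` returns the status and Location the handler produced; http.client never follows redirects itself, so each hop stays visible.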
Note: Indexing is not the problem here; content is indexed. The issue is PageRank not flowing through a 303 redirect.

I have tested and can confirm that 303 redirects are an issue for a number of reasons:

- PageRank does not flow through a 303 redirect.
- PageRank cannot be assigned from a URL to a URI with a rel=canonical tag if the URI does a 303 redirect (preventing aggregation of PageRank from external links to the URL).
- The URI and URL are indexed separately.
- RDFa schema.org representations of URIs do not translate to the URL (i.e. a representation described at URL A, talking about URI B, does not get connected to the representation described at URL B).
- URL parameters are not passed on by a 303 redirect.
- Google Analytics tracking is affected, e.g. traversing the site is seen as a series of direct page visits.

Essentially, as far as search engines are concerned, every URL and URI is an island, with no connections between them. At best a URL can express a rel=canonical back to its corresponding URI; no PageRank will flow through links.
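The rel=canonical point above suggests a simple audit: find each page's canonical target and flag the ones known to answer with a redirect. This Python sketch uses the standard html.parser module; the `pages` and `status_of` inputs are hypothetical data you would collect yourself (for example with an HTTP client that does not follow redirects). It only reports the pattern and does not model how a search engine treats it.

```python
from html.parser import HTMLParser
from typing import Dict, Optional


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def canonical_of(html: str) -> Optional[str]:
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical


def redirecting_canonicals(pages: Dict[str, str], status_of: Dict[str, int]) -> Dict[str, str]:
    """Return {page URL: canonical target} for targets observed to answer with a 3xx.

    pages maps a page URL to its HTML; status_of maps a URL to the status code
    a non-following HTTP client saw for it (both are hypothetical inputs)."""
    flagged = {}
    for url, html in pages.items():
        target = canonical_of(html)
        if target is not None and 300 <= status_of.get(target, 200) < 400:
            flagged[url] = target
    return flagged
```

Unknown targets default to 200 here, so only redirects you have actually observed are flagged.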
Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.
Hello Mark,

I cannot remember this important topic coming up earlier, which is a bit disturbing.

The problem would be mitigated by people using the URI they see for linking. Why not use the HTML URLs in the HTML pages for internal PageRank flow? How can URIs from SPARQL endpoints or OAI-PMH contribute to PageRank?

A real problem would be RDFa, where href also sets the object of a triple.

Regards,
Michael Brunnbauer

On Fri, Jul 18, 2014 at 10:05:17PM +1000, Mark Fallu wrote: If the links we present to the outside world for harvesting eg. via sparql endpoint, OAI-PMH or open social widget etc is the canonical individual URI, clients will be able to get to the display url, but the google page rank that would normally flow from these external links will not. [...] Any guidance you can provide would be appreciated.
-- o-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- | Mark Fallu | Manager, Research Data (Acting) | Office for Research | Bray Centre (N54) 0.10E | Griffith University, Nathan Campus | Queensland 4111 AUSTRALIA | | E-mail: m.fa...@griffith.edu.au | Mobile: 04177 69778 | Phone: +61 (07) 373 52069 o-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- -- ++ Michael Brunnbauer ++ netEstate GmbH ++ Geisenhausener Straße 11a ++ 81379 München ++ Tel +49 89 32 19 77 80 ++ Fax +49 89 32 19 77 89 ++ E-Mail bru...@netestate.de ++ http://www.netestate.de/ ++ ++ Sitz: München, HRB Nr.142452 (Handelsregister B München) ++ USt-IdNr. DE221033342 ++ Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer ++ Prokurist: Dipl. Kfm. (Univ.) Markus Hendel
Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.
On 18 July 2014 14:05, Mark Fallu m.fa...@griffith.edu.au wrote: I am attempting to understand how the the CoolURI 303 redirect pattern for the semantic web (http://www.w3.org/TR/cooluris/) can be implemented without negative impact on search engines.

Just a quick question: Is there any reason you want to use 303s? I personally consider it an anti-pattern.
Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.
On 18 July 2014 14:05, Mark Fallu m.fa...@griffith.edu.au wrote: I am attempting to understand how the the CoolURI 303 redirect pattern for the semantic web (http://www.w3.org/TR/cooluris/) can be implemented without negative impact on search engines. Just a quick question: Is there any reason you want to use 303s? I personally consider it an anti-pattern.

Thank you, Melvin. I think so too.

Short version: anti-pattern.

Long version: Eastern Australia is 13 hours ahead of the Central United States, so ... on Saturday night in Dallas there is no semantic difference between praying Australians and liquored-up cowboys. Bug or feature? No, anti-pattern.
Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.
Frankly, I don't care about PageRank, and these days I don't know if Google does. Google now gets direct samples of user behavior through Chrome and Google Analytics, and this sort of data is probably much more valuable than the link graph, since they know about things like time-on-page and query chains.

If anything, PageRank, or what people imagine about PageRank, has been harmful to the web because it has created a situation where people just don't make links to other web sites anymore. It started with high-profile sites (e.g. Engadget) that just wanted to be greedy and not give any PageRank to their competition. Then you saw people using rel="nofollow" because they thought that this too was a way to be greedy.

Ten years ago I got a lot of emails from people that amounted to "I will pay you $X if you make a link on page Y to page Z with anchor text T." You'd also find SEO firms that would ask for $X a month to generate Y links to your site. Recently Google made some changes and they seem to be punishing people who have inappropriate links, so now people get emails like "Would you please remove the link from page X to page Y?", and the new thing is that SEO firms now want you to pay them $X to remove Y links to your site.

I think it is all a lot of bull, and I make whatever links I like and figure that Google is going to do whatever it is they are going to do.

On Fri, Jul 18, 2014 at 8:05 AM, Mark Fallu m.fa...@griffith.edu.au wrote: I am attempting to understand how the the CoolURI 303 redirect pattern for the semantic web (http://www.w3.org/TR/cooluris/) can be implemented without negative impact on search engines. This pattern appears to allow site content to be indexed, but prevents page rank from flowing through internal links due to the use of a 303 redirect.
Re: Linked Data and Semantic Web CoolURIs, 303 redirects and Page Rank.
That is a fair point, but I would still suggest that it is important for search engines to be able to meaningfully interpret:

- internal links
- RDFa representations that span multiple pages.

Cheers, Mark

On 19 Jul 2014, at 3:02 am, Paul Houle ontolo...@gmail.com wrote: Frankly I don't care about PageRank, and these days I don't know if Google does.