The following issue has been SUBMITTED.
======================================================================
http://bugs.librdf.org/mantis/view.php?id=402
======================================================================
Reported By:                normang
Assigned To:
======================================================================
Project:                    Raptor RDF Syntax Library
Issue ID:                   402
Category:                   api
Reproducibility:            always
Severity:                   major
Priority:                   normal
Status:                     new
Syntax Name:                RDFa
======================================================================
Date Submitted:             2010-11-30 20:20
Last Modified:              2010-11-30 20:20
======================================================================
Summary:                    Parser does not respect Content-Location header when parsing RDFa from web
Description:
RFC 2616 section 14.14 says: "The value of Content-Location also defines the base URI for the entity" (this is the second of only two mentions of "base URI" in the document).
HTML 4 section 12.4.1 <http://www.w3.org/TR/1999/REC-html401-19991224/struct/links.html#h-12.4.1> says that "The base URI is given by meta data discovered during a protocol interaction, such as an HTTP header (see [RFC2616])" (this puts it in priority below the <base> element, and above the document's URI).

I've listed this as a 'major' bug, because of my prejudices about standards conformance (but I'm getting therapy), but I won't be offended (!) if you class it instead as 'minor'....

Steps to Reproduce:
1. Configure a web document to include RDFa, and to send a content-location header on retrieval. For example (not a persistent URI):

% curl -i http://text.nxg.me.uk/temp/test.html
HTTP/1.1 200 OK
Date: Tue, 30 Nov 2010 20:16:49 GMT
Server: Apache/1.3.41
content-location: http://text.nxg.me.uk/elsewhere/foo.html
Last-Modified: Tue, 30 Nov 2010 19:41:44 GMT
ETag: "3c1a097-182-4cf55378"
Content-Length: 386
Connection: close
Content-Type: text/html; charset=utf-8

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
  "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dcterms="http://purl.org/dc/terms/">
<head>
<title>Test document</title>
</head>
<body>
<div about='wibble.html'>
<h1 property='dcterms:title'>Test number one</h1>
</div>
</body>
</html>

2. Use rapper to parse this

% rapper --version
1.9.0
% rapper -irdfa -oturtle http://text.nxg.me.uk/temp/test.html
rapper: Parsing URI http://text.nxg.me.uk/temp/test.html with parser rdfa
rapper: Serializing with serializer turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <http://www.w3.org/1999/xhtml> .
@prefix dcterms: <http://purl.org/dc/terms/> .

<http://text.nxg.me.uk/temp/wibble.html>
    dcterms:title "Test number one" .

rapper: Parsing returned 1 triple
%

Because of the Content-Location header, the about='wibble.html' should have been resolved relative to <http://text.nxg.me.uk/elsewhere/foo.html>, so that the subject of the single RDFa statement should have been <http://text.nxg.me.uk/elsewhere/wibble.html>.

Additional Information:
It gets slightly more complicated with HTML5. For the definition of the base URI, HTML5 defers to the XML Base specification (sect. 2.6.1, step 4 <http://www.w3.org/TR/html5/urls.html#document-base-url>). The XML Base specification <http://www.w3.org/TR/xmlbase/> sect. 4.1 defers to RFC 3986. Section 5.1.2 of that says that "If no base URI is embedded, the base URI is defined by the representation's retrieval context", and goes on to give an example involving MIME. It's not completely clear what this means, but I believe it is most naturally interpreted as referring to a mechanism like that in sect. 14.14 of RFC 2616, meaning that Content-Location still trumps retrieval-URI (and is trumped by xml:base).

It's not a knock-down case, but all the above does strongly suggest to me that the RFC 2616 intention for the Content-Location header is clear: downstream processors should regard the Content-Location header's URI as the effective base URI for the document, irrespective of the URI it was actually retrieved from. And raptor doesn't.
======================================================================

Issue History
Date Modified    Username       Field                    Change
======================================================================
2010-11-30 20:20 normang        New Issue
======================================================================
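
For illustration only, the base-URI precedence argued for in this report can be sketched in a few lines of Python. This is not raptor code: the helper names effective_base_uri and resolve are hypothetical, and the ordering (embedded base, then Content-Location, then retrieval URI) is the reading of RFC 2616 and HTML 4 given in the report, not something raptor is known to implement. The only library call used is urljoin from the Python standard library.

```python
from urllib.parse import urljoin


def effective_base_uri(retrieval_uri, content_location=None, embedded_base=None):
    """Choose a base URI using the precedence described in the report.

    Hypothetical helper: an embedded base (<base> or xml:base) wins,
    then the Content-Location header, then the retrieval URI.
    """
    base = retrieval_uri
    if content_location:
        # RFC 2616 sect. 14.14: a relative Content-Location is
        # interpreted relative to the Request-URI.
        base = urljoin(base, content_location)
    if embedded_base:
        # Assumption: a relative embedded base is resolved against
        # the base established so far.
        base = urljoin(base, embedded_base)
    return base


def resolve(reference, retrieval_uri, content_location=None, embedded_base=None):
    """Resolve a relative reference such as about='wibble.html'."""
    base = effective_base_uri(retrieval_uri, content_location, embedded_base)
    return urljoin(base, reference)


if __name__ == "__main__":
    retrieved = "http://text.nxg.me.uk/temp/test.html"
    header = "http://text.nxg.me.uk/elsewhere/foo.html"
    # With the header honoured (the behaviour the report asks for):
    print(resolve("wibble.html", retrieved, content_location=header))
    # With the header ignored (the behaviour rapper 1.9.0 shows):
    print(resolve("wibble.html", retrieved))
```

With the URIs from the steps above, the first call yields <http://text.nxg.me.uk/elsewhere/wibble.html> and the second yields <http://text.nxg.me.uk/temp/wibble.html>, which is the subject rapper currently emits.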
