Pascal Christoph created JENA-457:
-------------------------------------

             Summary: ntriples: Object-URIs should be %-encoded
                 Key: JENA-457
                 URL: https://issues.apache.org/jira/browse/JENA-457
             Project: Apache Jena
          Issue Type: Improvement
          Components: ARQ, Jena, RDF API
    Affects Versions: ARQ 2.9.3
         Environment: everywhere
            Reporter: Pascal Christoph
            Priority: Minor


N-Triples serialization is pure ASCII for now [1], so IRIs (see RFC 3987) 
cannot be written directly, because non-ASCII UTF-8 characters are not allowed. 
Serializing a Model to N-Triples escapes non-ASCII characters with '\u' 
sequences. In most cases the resulting URIs do not resolve, e.g. on DBpedia. 
There are three possible notations:

1. http://de.dbpedia.org/resource/T\u00FCr
2. http://de.dbpedia.org/resource/T%fcr
3. http://de.dbpedia.org/resource/Tür

Notation 1 doesn't resolve and notation 3 is not ASCII, whereas notation 2 
(percent-encoded octets) meets both requirements. So I would like notation 2 
to be used for encoding object URIs in ASCII N-Triples serialization. See also 
https://answers.semanticweb.com/questions/18508/best-way-to-encode-uri-refsiris-for-n-triples
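To make the difference between notations 1 and 2 concrete, here is a minimal 
sketch using only the JDK. The class and method names are hypothetical and not 
part of Jena. It assumes the octets are taken from the UTF-8 encoding, as 
RFC 3987 prescribes for mapping IRIs to URIs; under that assumption "ü" becomes 
%C3%BC, whereas the %fc shown in notation 2 is the ISO-8859-1 octet.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Jena: contrasts notations 1 and 2.
final class IriNotations {

    // Notation 1: backslash-u escaping of every non-ASCII code point,
    // as the ASCII N-Triples writer does today.
    static String unicodeEscape(String iri) {
        StringBuilder sb = new StringBuilder();
        iri.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                sb.append((char) cp);
            } else if (cp <= 0xFFFF) {
                sb.append(String.format("\\u%04X", cp));
            } else {
                sb.append(String.format("\\U%08X", cp));
            }
        });
        return sb.toString();
    }

    // Notation 2: percent-encode the UTF-8 octets of non-ASCII characters.
    // ASCII characters (including '%', '#', '?') are left untouched.
    static String percentEncode(String iri) {
        StringBuilder sb = new StringBuilder();
        for (byte b : iri.getBytes(StandardCharsets.UTF_8)) {
            int octet = b & 0xFF;
            if (octet < 0x80) {
                sb.append((char) octet);
            } else {
                sb.append(String.format("%%%02X", octet));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Notation 3, written with a Java escape to stay encoding-neutral.
        String iri = "http://de.dbpedia.org/resource/T\u00FCr";
        System.out.println(unicodeEscape(iri));
        System.out.println(percentEncode(iri));
    }
}
```

Both outputs are pure ASCII, but only the second is a URI that an HTTP client 
can dereference without first undoing the escaping.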

One could use Jena to serialize as Turtle and convert that Turtle file to 
N-Triples with rapper. But rapper converts the Unicode escape sequences in 
literals to UTF-8 while leaving the URIs untouched (wisely, since they are 
identifiers). So this does not help.

The writer responsible for this serialization is obtained like this:

 RDFWriter fasterWriter = model.getWriter("N-TRIPLE");

It should be safe to apply a patch like this in NTripleWriter.java:

private static void writeURIString(String s, PrintWriter writer) {
    writer.print(org.apache.commons.httpclient.util.URIUtil.encodeQuery(s) ) ;
}
(not tested)
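
As a possible alternative that avoids the commons-httpclient dependency, the 
same idea can be sketched with java.net.URI, whose toASCIIString() method 
percent-encodes non-ASCII characters as UTF-8 octets. The wrapper class name 
and the fallback behaviour below are assumptions for illustration, not existing 
Jena code.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical stand-in for the method in NTripleWriter.java.
final class AsciiUriWriter {

    static void writeURIString(String s, PrintWriter writer) {
        String ascii;
        try {
            // Encodes only non-ASCII characters; existing ASCII syntax is kept.
            ascii = new URI(s).toASCIIString();
        } catch (URISyntaxException e) {
            // Assumption: a real patch would fall back to the existing
            // escaping here; this sketch just passes the string through.
            ascii = s;
        }
        writer.print(ascii);
    }

    public static void main(String[] args) {
        StringWriter out = new StringWriter();
        PrintWriter writer = new PrintWriter(out);
        writeURIString("http://de.dbpedia.org/resource/T\u00FCr", writer);
        writer.flush();
        System.out.println(out);
    }
}
```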

What do you think?
-o

[1] See a month-old W3C note that proposes using UTF-8 instead of ASCII: 
http://www.w3.org/TR/2013/NOTE-n-triples-20130409/#n-triple-changes
