[jira] [Commented] (JENA-688) Update N-Triples and N-Quad Parsers to UTF-8

Rob Vesse (JIRA) Mon, 28 Apr 2014 09:33:38 -0700

    [ 
https://issues.apache.org/jira/browse/JENA-688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983160#comment-13983160
 ]


Rob Vesse commented on JENA-688:
--------------------------------

The main reason is that I might want to interact with older tools that know 
nothing about UTF-8 encoded NTriples and will choke and die if they see 
anything outside of the ASCII range.  Bear in mind that NTriples as a UTF-8 
encoding was only officially standardised in late February 2014, so any tool 
released before then either won't support it at all or may only partially 
support it.  Also, since the prior version of NTriples, from the RDF Test Cases 
circa 2004, was specified as an ASCII encoding, there will be plenty of tools 
out there that only accept and produce ASCII NTriples.

RIOT already allows for configuring NTriples output to use ASCII instead of 
UTF-8 so we should also permit this same functionality for input.
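
For illustration, ASCII NTriples carries characters outside the ASCII range 
as backslash escapes rather than raw bytes.  The following is a minimal 
sketch of that escaping step using only the Java standard library.  It is 
not RIOT's implementation: the class and method names are hypothetical, and 
it assumes quotes, backslashes and control characters have already been 
escaped elsewhere.

```java
final class AsciiEscapeSketch {
    /**
     * Replaces every code point above U+007F with a 4-digit (BMP) or
     * 8-digit (supplementary) hexadecimal backslash escape.
     * ASCII characters are copied through unchanged.
     */
    static String escapeNonAscii(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp); // step over surrogate pairs as one unit
            if (cp <= 0x7F) {
                out.append((char) cp);
            } else if (cp <= 0xFFFF) {
                out.append(String.format("\\u%04X", cp));
            } else {
                out.append(String.format("\\U%08X", cp));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical example triple with one non-ASCII character in the literal.
        String line = "<http://example/s> <http://example/p> \"caf\u00E9\" .";
        System.out.println(escapeNonAscii(line));
    }
}
```

An ASCII-only consumer can read the escaped form, and a UTF-8 aware parser 
should still decode the escapes back to the original characters.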

What if you want to validate that some NTriples will be backwards compatible 
with an older tool?  As it stands your only option would be to read in as UTF-8 
and write out as ASCII, potentially wasting lots of IO effort if the data is 
already ASCII NTriples.
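
As a rough sketch of what a cheap check could look like outside the parser, 
the following scans a stream for the first byte outside the ASCII range 
without parsing or re-serialising anything.  It uses only the Java standard 
library, the class and method names are hypothetical, and it only checks the 
byte range, not whether the content is well-formed NTriples.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class AsciiCheckSketch {
    /**
     * Returns the zero-based offset of the first byte above 0x7F,
     * or -1 if every byte in the stream is in the ASCII range.
     */
    static long firstNonAsciiOffset(InputStream in) throws IOException {
        byte[] buf = new byte[8192];
        long offset = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                // Java bytes are signed, so a negative value means 0x80 or above.
                if (buf[i] < 0) {
                    return offset + i;
                }
            }
            offset += n;
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample data: the literal contains one non-ASCII character.
        byte[] data = "<http://example/s> <http://example/p> \"caf\u00E9\" .\n"
                .getBytes(StandardCharsets.UTF_8);
        long pos = firstNonAsciiOffset(new ByteArrayInputStream(data));
        System.out.println(pos == -1 ? "pure ASCII" : "non-ASCII byte at offset " + pos);
    }
}
```

A parser-level ASCII mode would go further and report the offending line and 
column, but a byte scan like this is enough to decide whether a file is safe 
to hand to an ASCII-only tool.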

Note that we use the existing ASCII output functionality extensively at Cray 
because we have a massively parallel bulk loader that only reads ASCII NTriples.

> Update N-Triples and N-Quad Parsers to UTF-8
> --------------------------------------------
>
>                 Key: JENA-688
>                 URL: https://issues.apache.org/jira/browse/JENA-688
>             Project: Apache Jena
>          Issue Type: Bug
>            Reporter: Stephen Allen
>            Assignee: Rob Vesse
>         Attachments: JENA-688.patch
>
>
> RDF 1.1 defines the encoding for N-Triples (and presumably also N-Quads) as 
> UTF-8 instead of the previous ASCII.
> This patch updates the Riot parsers to use UTF-8 instead of ASCII.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
