[
https://issues.apache.org/jira/browse/JENA-688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983160#comment-13983160
]
Rob Vesse commented on JENA-688:
--------------------------------
The main reason is that I might want to interact with older tools that know
nothing about UTF-8 encoded NTriples and will choke and die if they see
anything outside of the ASCII range. Bear in mind that NTriples as a UTF-8
encoding was only officially standardised in late February 2014, so any tool
released before then either won't support it at all or may only partially
support it. Also, since the prior version of NTriples from the RDF Test Cases
circa 2004 was specified as an ASCII encoding, there will be plenty of tools out
there that only accept and produce ASCII NTriples.
RIOT already allows for configuring NTriples output to use ASCII instead of
UTF-8 so we should also permit this same functionality for input.
What if you want to validate that some NTriples will be backwards compatible
with an older tool? As it stands, your only option would be to read in as UTF-8
and write out as ASCII, potentially wasting lots of IO effort if the data is
already ASCII NTriples.
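To make the validation case concrete, here is a minimal sketch of a byte-level
pre-check that uses only the Java standard library. It is not RIOT code, and the
class and method names are hypothetical. It only reports whether every byte is
in the ASCII range; it does not check that the data is syntactically valid
NTriples, which is why an ASCII mode in the parser itself would still be useful.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Jena/RIOT.
final class AsciiNTriplesCheck {

    /** Returns the index of the first non-ASCII byte in buf[0..len), or -1 if there is none. */
    static int indexOfNonAscii(byte[] buf, int len) {
        for (int i = 0; i < len; i++) {
            // Java bytes are signed, so any byte with the high bit set is negative.
            if (buf[i] < 0) {
                return i;
            }
        }
        return -1;
    }

    /** Returns the stream offset of the first non-ASCII byte, or -1 if the stream is pure ASCII. */
    static long firstNonAsciiOffset(InputStream in) throws IOException {
        byte[] buf = new byte[8192];
        long offset = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            int i = indexOfNonAscii(buf, n);
            if (i >= 0) {
                return offset + i;
            }
            offset += n;
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java AsciiNTriplesCheck data.nt
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            long pos = firstNonAsciiOffset(in);
            System.out.println(pos < 0
                    ? "ASCII only"
                    : "non-ASCII byte at offset " + pos);
        }
    }
}
```

This reads the data once and writes nothing, but it is only a stopgap: a tool
that needs the triples as well would still have to parse the file separately.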
Note that we use the existing ASCII output functionality extensively at Cray
because we have a massively parallel bulk loader that only reads ASCII NTriples.
> Update N-Triples and N-Quad Parsers to UTF-8
> --------------------------------------------
>
> Key: JENA-688
> URL: https://issues.apache.org/jira/browse/JENA-688
> Project: Apache Jena
> Issue Type: Bug
> Reporter: Stephen Allen
> Assignee: Rob Vesse
> Attachments: JENA-688.patch
>
>
> RDF 1.1 defines the encoding for N-Triples (and presumably also N-Quads) as
> UTF-8 instead of the previous ASCII.
> This patch updates the Riot parsers to use UTF-8 instead of ASCII.