Andy Seaborne created JENA-2186:
-----------------------------------

             Summary: Write U+FFFD as Unicode escape
                 Key: JENA-2186
                 URL: https://issues.apache.org/jira/browse/JENA-2186
             Project: Apache Jena
          Issue Type: Improvement
    Affects Versions: Jena 4.2.0
            Reporter: Andy Seaborne
             Fix For: Jena 4.3.0


U+FFFD (the Unicode replacement character) arises when there is an encoding 
mismatch between the input bytes and UTF-8 (see the [Wikipedia 
article|https://en.wikipedia.org/wiki/Specials_(Unicode_block)#Replacement_character]).

The tokenizer for Turtle, N-Triples, etc. raises a warning when a literal U+FFFD 
is encountered, to notify users/applications of potential problems.

The tokenizer does not warn if the character is written intentionally in the 
input stream as {{\uFFFD}} (6 characters).

The writer should use this Unicode escape form, so that character U+FFFD is 
written and then read in again without a warning.
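
The idea can be sketched as follows. This is an illustrative, language-neutral sketch in Python and not Jena's writer code: the names {{REPLACEMENT_CHAR}} and {{write_string_literal}} are hypothetical, and only a small subset of the string escapes is handled. The point it shows is that the raw character never reaches the output, only the 6-character escape does.

```python
# Hypothetical sketch, not Jena's implementation.
REPLACEMENT_CHAR = "\ufffd"  # the raw U+FFFD character


def write_string_literal(lexical_form: str) -> str:
    """Return a double-quoted string literal for lexical_form.

    A raw U+FFFD is emitted as the 6-character escape \\uFFFD, so the
    output never contains the raw replacement character.
    """
    escaped = (
        lexical_form
        # Backslashes first, so that escapes added below are not doubled.
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
        # Last: write U+FFFD in escape form instead of as a raw character.
        .replace(REPLACEMENT_CHAR, "\\uFFFD")
    )
    return '"' + escaped + '"'


if __name__ == "__main__":
    out = write_string_literal("caf\ufffd")
    print(out)
    print(REPLACEMENT_CHAR in out)
```

Because the U+FFFD replacement is applied after backslash escaping, the backslash in the emitted escape is not doubled, and a reader that decodes {{\uFFFD}} gets the original character back.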

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
