Andy Seaborne created JENA-2186:
-----------------------------------
Summary: Write U+FFFD as Unicode escape
Key: JENA-2186
URL: https://issues.apache.org/jira/browse/JENA-2186
Project: Apache Jena
Issue Type: Improvement
Affects Versions: Jena 4.2.0
Reporter: Andy Seaborne
Fix For: Jena 4.3.0
U+FFFD (Unicode replacement character) arises when there is an encoding
mismatch between the input bytes and UTF-8 (see the [Wikipedia
article|https://en.wikipedia.org/wiki/Specials_(Unicode_block)#Replacement_character]).
The tokenizer for Turtle, N-Triples, etc. raises a warning when it encounters
a literal U+FFFD, to notify users and applications of a potential problem.
The tokenizer does not warn if the character is written intentionally in the
input stream as the escape {{\uFFFD}} (6 characters).
The writer should use this Unicode escape form so that character U+FFFD is
written out and read back in without a warning.
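
A minimal sketch of the intended writer behaviour, for illustration only. The
class and method names ({{ReplacementCharEscape}}, {{escapeReplacementChar}})
are hypothetical and are not Jena's actual writer code. The sketch handles
only U+FFFD; a real writer would also escape quotes, backslashes, and control
characters.

```java
public class ReplacementCharEscape {

    /**
     * Returns a copy of the lexical form in which every U+FFFD character is
     * replaced by its six-character Unicode escape (backslash, 'u', F, F, F, D).
     * All other characters are copied unchanged.
     */
    static String escapeReplacementChar(String lexical) {
        StringBuilder sb = new StringBuilder(lexical.length());
        for (int i = 0; i < lexical.length(); i++) {
            char ch = lexical.charAt(i);
            if (ch == 0xFFFD) {
                // Emit the escape sequence instead of the raw character.
                sb.append("\\uFFFD");
            } else {
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Build a string containing a raw replacement character.
        String input = "bad" + (char) 0xFFFD + "byte";
        // Prints the escaped form: bad, then the six-character escape, then byte.
        System.out.println(escapeReplacementChar(input));
    }
}
```

Because the output contains the escape rather than the raw character, parsing
it again yields the same U+FFFD character, and the tokenizer has no raw
replacement character to warn about.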
--
This message was sent by Atlassian Jira
(v8.3.4#803005)