afs opened a new pull request, #2769:
URL: https://github.com/apache/jena/pull/2769
GitHub issue resolved #2766
Pull request Description:
Because Java's bytes-to-string conversion uses the JDK decoder, Jena can't
tell the difference between multibyte characters translated to surrogates
(legal) and surrogates actually present in the UTF-8 input (illegal - UTF-8
does not allow surrogates).
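
To illustrate the distinction, here is a minimal, standalone sketch using only the JDK. It is not Jena code: the class name `SurrogateDemo` and its helpers are hypothetical and exist only for this example. It decodes a well-formed 4-byte sequence (U+1F600), which becomes a surrogate pair in a Java `String`, and a 3-byte sequence that encodes the surrogate U+D800 directly, which is ill-formed UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Jena.
public class SurrogateDemo {
    // U+1F600 as well-formed 4-byte UTF-8 (legal).
    static final byte[] LEGAL = {(byte) 0xF0, (byte) 0x9F, (byte) 0x98, (byte) 0x80};
    // U+D800, a high surrogate, encoded directly as 3 bytes (illegal in UTF-8).
    static final byte[] ILLEGAL = {(byte) 0xED, (byte) 0xA0, (byte) 0x80};

    // Lenient JDK conversion: malformed input is substituted, not reported.
    static String lenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Strict JDK decoding: returns false if the decoder reports malformed input.
    static boolean isWellFormed(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String legal = lenient(LEGAL);
        // The legal supplementary character arrives as two chars (a surrogate pair).
        System.out.println(legal.length() + " " + Character.isHighSurrogate(legal.charAt(0)));

        String illegal = lenient(ILLEGAL);
        // The illegal bytes are replaced, so the original surrogate is no longer visible.
        System.out.println(illegal.indexOf('\uFFFD') >= 0);

        System.out.println(isWellFormed(LEGAL) + " " + isWellFormed(ILLEGAL));
    }
}
```

With the lenient path, the ill-formed bytes come back as the replacement character U+FFFD. Code that only sees the resulting `String` therefore cannot recover what was in the byte stream. The strict decoder does flag the input, but whether that suits Jena's parsing path is an open question this sketch does not address.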
The test changes are bug fixes. The tests were detecting warnings on the
replacement character, but that case is explicitly handled further up, where
it is allowed under the control of a flag.
A deeper fix might be possible, but it would involve our own UTF-8 decoder
and would need careful assessment of the performance impact.
----
- [x] Commits have been squashed to remove intermediate development commit
messages.
- [x] Key commit messages start with the issue number (GH-xxxx)
By submitting this pull request, I acknowledge that I am making a
contribution to the Apache Software Foundation under the terms and conditions
of the [Contributor's
Agreement](https://www.apache.org/licenses/contributor-agreements.html).
----
See the [Apache Jena "Contributing"
guide](https://github.com/apache/jena/blob/main/CONTRIBUTING.md).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]