[ https://issues.apache.org/jira/browse/JENA-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432177#comment-17432177 ]
Holger Knublauch commented on JENA-2179:
----------------------------------------
BTW the same seems to happen using RDF Delta:
{code:java}
[line: 1276, col: 437] Unicode replacement character U+FFFD.
org.apache.jena.riot.RiotParseException: [line: 1276, col: 428] Unicode replacement character U+FFFD in string
    at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerRiotParseException.warning(ErrorHandlerFactory.java:367)
    at org.apache.jena.riot.tokens.TokenizerText.warning(TokenizerText.java:1332)
    at org.apache.jena.riot.tokens.TokenizerText.readString(TokenizerText.java:768)
    at org.apache.jena.riot.tokens.TokenizerText.parseToken(TokenizerText.java:238)
    at org.apache.jena.riot.tokens.TokenizerText.hasNext(TokenizerText.java:89)
    at org.seaborne.patch.text.RDFPatchReaderText.nextToken(RDFPatchReaderText.java:243)
    at org.seaborne.patch.text.RDFPatchReaderText.nextNode(RDFPatchReaderText.java:254)
    at org.seaborne.patch.text.RDFPatchReaderText.doOneLine(RDFPatchReaderText.java:104)
    at org.seaborne.patch.text.RDFPatchReaderText.apply1(RDFPatchReaderText.java:72)
    at org.seaborne.patch.text.RDFPatchReaderText.read(RDFPatchReaderText.java:49)
    at org.seaborne.patch.text.RDFPatchReaderText.apply(RDFPatchReaderText.java:59)
    at org.seaborne.delta.client.DeltaLinkHTTP.lambda$fetchCommon$8(DeltaLinkHTTP.java:211)
    at org.seaborne.delta.client.DeltaLinkHTTP.retry(DeltaLinkHTTP.java:125)
    at org.seaborne.delta.client.DeltaLinkHTTP.fetchCommon(DeltaLinkHTTP.java:204)
    at org.seaborne.delta.client.DeltaLinkHTTP.fetch(DeltaLinkHTTP.java:184)
    at org.topbraidlive.edg.backup.BackupUtils.getPatch(BackupUtils.java:368)
{code}
> TDB throws Unicode Replacement Character exception while fetching data
> ----------------------------------------------------------------------
>
> Key: JENA-2179
> URL: https://issues.apache.org/jira/browse/JENA-2179
> Project: Apache Jena
> Issue Type: Bug
> Components: TDB
> Affects Versions: Jena 4.2.0
> Reporter: Holger Knublauch
> Assignee: Andy Seaborne
> Priority: Major
> Fix For: Jena 4.3.0
>
> Attachments: TBS4190_Test.java
>
>
> This seems to have been introduced by JENA-2120
> (https://issues.apache.org/jira/browse/JENA-2120).
> With TDB databases that contain the replacement character in a literal, the
> warnings are raised as exceptions. We have seen this:
> {code:java}
> WARN [http-nio-8083-exec-10] g.e.SimpleDataFetcherExceptionHandler - Exception while fetching data (/resources[0]/turtleSourceCode) : [line: 1, col: 318] Unicode replacement character U+FFFD in string
> org.apache.jena.riot.RiotParseException: [line: 1, col: 318] Unicode replacement character U+FFFD in string
>     at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerRiotParseException.warning(ErrorHandlerFactory.java:367) ~[jena-arq-4.2.0.jar:4.2.0]
>     at org.apache.jena.riot.tokens.TokenizerText.warning(TokenizerText.java:1332) ~[jena-arq-4.2.0.jar:4.2.0]
>     at org.apache.jena.riot.tokens.TokenizerText.readString(TokenizerText.java:768) ~[jena-arq-4.2.0.jar:4.2.0]
>     at org.apache.jena.riot.tokens.TokenizerText.parseToken(TokenizerText.java:238) ~[jena-arq-4.2.0.jar:4.2.0]
>     at org.apache.jena.riot.tokens.TokenizerText.hasNext(TokenizerText.java:89) ~[jena-arq-4.2.0.jar:4.2.0]
>     at org.apache.jena.tdb.store.nodetable.NodecSSE.decode(NodecSSE.java:119) ~[jena-tdb-4.2.0.jar:4.2.0]
>     at org.apache.jena.tdb.lib.NodeLib.decode(NodeLib.java:118) ~[jena-tdb-4.2.0.jar:4.2.0]
> {code}
> TDB seems to use the fallback error handler, which throws an exception
> instead of just writing the warning to the log.
> Richard says he believes a fix would be to change NodecSSE.createTokenizer():
> {code:java}
> return TokenizerText.create()
>         .fromString(string)
>         .errorHandler(ErrorHandlerFactory.errorHandlerDetailed())
>         .build();
> {code}
> Is there any known workaround in 4.2.0? We cannot even query those triples
> from the affected TDB databases at the moment.
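
To illustrate the behaviour described in the report, here is a minimal, self-contained sketch of how the choice of error handler decides whether a U+FFFD warning is merely recorded or aborts decoding. The WarningHandler interface, the two handler classes, and decode() are hypothetical stand-ins written for this illustration only; they are not Jena's ErrorHandler API or the TDB node decoder, and the sketch makes no claim about which handler Jena should use.

```java
import java.util.ArrayList;
import java.util.List;

public class WarningHandlerSketch {

    /** Hypothetical stand-in for a parser error handler (not Jena's ErrorHandler). */
    interface WarningHandler {
        void warning(String message, long line, long col);
    }

    /** Escalates every warning to an exception, so decoding stops at the first one. */
    static final class ThrowingHandler implements WarningHandler {
        @Override
        public void warning(String message, long line, long col) {
            throw new IllegalStateException(format(message, line, col));
        }
    }

    /** Records each warning and lets decoding continue. */
    static final class CollectingHandler implements WarningHandler {
        final List<String> warnings = new ArrayList<>();

        @Override
        public void warning(String message, long line, long col) {
            warnings.add(format(message, line, col));
        }
    }

    /** Formats a message in the "[line: L, col: C] text" style seen in the log. */
    static String format(String message, long line, long col) {
        return "[line: " + line + ", col: " + col + "] " + message;
    }

    /**
     * Hypothetical decoder: reports each U+FFFD in the input to the handler
     * (1-based column, single line) and returns the input unchanged.
     */
    static String decode(String s, WarningHandler handler) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '\uFFFD') {
                handler.warning("Unicode replacement character U+FFFD in string", 1, i + 1);
            }
        }
        return s;
    }

    public static void main(String[] args) {
        String literal = "caf\uFFFD";

        // Lenient handler: the value is still returned, the warning is kept for logging.
        CollectingHandler lenient = new CollectingHandler();
        String value = decode(literal, lenient);
        System.out.println(value.length() + " chars decoded, warnings: " + lenient.warnings);

        // Throwing handler: the same input makes the whole decode fail.
        try {
            decode(literal, new ThrowingHandler());
        } catch (IllegalStateException e) {
            System.out.println("decode aborted: " + e.getMessage());
        }
    }
}
```

With the throwing handler, a single replacement character in a stored literal is enough to make the value unreadable, which matches the symptom that the affected triples cannot be queried at all.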