[https://issues.apache.org/jira/browse/JENA-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17429177#comment-17429177]
Andy Seaborne commented on JENA-2179:
-------------------------------------
Please test!
> TDB throws Unicode Replacement Character exception while fetching data
> ----------------------------------------------------------------------
>
> Key: JENA-2179
> URL: https://issues.apache.org/jira/browse/JENA-2179
> Project: Apache Jena
> Issue Type: Bug
> Components: TDB
> Affects Versions: Jena 4.2.0
> Reporter: Holger Knublauch
> Assignee: Andy Seaborne
> Priority: Major
> Attachments: TBS4190_Test.java
>
>
> This seems to have been introduced with
> https://issues.apache.org/jira/browse/JENA-2120
> With TDB databases that contain the Unicode replacement character (U+FFFD)
> in a literal, the tokenizer's warnings are thrown as exceptions. We have seen this:
> {code:java}
> WARN [http-nio-8083-exec-10] g.e.SimpleDataFetcherExceptionHandler - Exception while fetching data (/resources[0]/turtleSourceCode) : [line: 1, col: 318] Unicode replacement character U+FFFD in string
> org.apache.jena.riot.RiotParseException: [line: 1, col: 318] Unicode replacement character U+FFFD in string
>   at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerRiotParseException.warning(ErrorHandlerFactory.java:367) ~[jena-arq-4.2.0.jar:4.2.0]
>   at org.apache.jena.riot.tokens.TokenizerText.warning(TokenizerText.java:1332) ~[jena-arq-4.2.0.jar:4.2.0]
>   at org.apache.jena.riot.tokens.TokenizerText.readString(TokenizerText.java:768) ~[jena-arq-4.2.0.jar:4.2.0]
>   at org.apache.jena.riot.tokens.TokenizerText.parseToken(TokenizerText.java:238) ~[jena-arq-4.2.0.jar:4.2.0]
>   at org.apache.jena.riot.tokens.TokenizerText.hasNext(TokenizerText.java:89) ~[jena-arq-4.2.0.jar:4.2.0]
>   at org.apache.jena.tdb.store.nodetable.NodecSSE.decode(NodecSSE.java:119) ~[jena-tdb-4.2.0.jar:4.2.0]
>   at org.apache.jena.tdb.lib.NodeLib.decode(NodeLib.java:118) ~[jena-tdb-4.2.0.jar:4.2.0]
> {code}
> TDB seems to use the fallback error handler, which throws an exception
> instead of just logging the warning.
> Richard believes a fix would be to change NodecSSE.createTokenizer():
> {code:java}
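> // Build the tokenizer with an explicit error handler so that warnings
> // (such as U+FFFD in a string) are reported rather than thrown.
> return TokenizerText.create()
>         .fromString(string)
>         .errorHandler(ErrorHandlerFactory.errorHandlerDetailed())
>         .build();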
> {code}
> Is there any known workaround in 4.2.0? At the moment we cannot even query
> those triples from the affected TDB databases.
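To make the reported behaviour concrete, here is a minimal, self-contained Java sketch. All names in it (ErrorHandler, ThrowingHandler, LoggingHandler, decodeLiteral) are hypothetical and are not Jena's API. It only models how the choice of error handler decides whether a U+FFFD warning aborts decoding of a stored literal (as seen in the stack trace) or is merely recorded while decoding continues (the behaviour the report asks for).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the JENA-2179 situation; not Jena code.
public class WarningPolicyDemo {

    /** Hypothetical callback a tokenizer uses to report problems. */
    interface ErrorHandler {
        void warning(String message, long line, long col);
    }

    /** Like the reported fallback: every warning becomes an exception. */
    static final class ThrowingHandler implements ErrorHandler {
        @Override
        public void warning(String message, long line, long col) {
            throw new IllegalStateException("[line: " + line + ", col: " + col + "] " + message);
        }
    }

    /** Like the requested behaviour: warnings are recorded and decoding continues. */
    static final class LoggingHandler implements ErrorHandler {
        final List<String> warnings = new ArrayList<>();

        @Override
        public void warning(String message, long line, long col) {
            warnings.add("[line: " + line + ", col: " + col + "] " + message);
        }
    }

    /**
     * Hypothetical stand-in for decoding a stored string literal: reports each
     * U+FFFD through the handler and returns the text unchanged.
     */
    static String decodeLiteral(String lexical, ErrorHandler handler) {
        for (int i = 0; i < lexical.length(); i++) {
            if (lexical.charAt(i) == '\uFFFD') {
                // Columns are 1-based, matching the style of the log message.
                handler.warning("Unicode replacement character U+FFFD in string", 1, i + 1);
            }
        }
        return lexical;
    }

    public static void main(String[] args) {
        String stored = "abc\uFFFDdef";

        LoggingHandler logging = new LoggingHandler();
        System.out.println(decodeLiteral(stored, logging).length()); // 7
        System.out.println(logging.warnings);

        try {
            decodeLiteral(stored, new ThrowingHandler());
        } catch (IllegalStateException e) {
            System.out.println("decode failed: " + e.getMessage());
        }
    }
}
```

With the logging handler the literal is still returned and can be queried; with the throwing handler the same data makes every read fail, which matches the symptom described above.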