Holger Knublauch created JENA-2179:
--------------------------------------

Summary: TDB throws Unicode Replacement Character exception while fetching data
Key: JENA-2179
URL: https://issues.apache.org/jira/browse/JENA-2179
Project: Apache Jena
Issue Type: Bug
Components: TDB
Affects Versions: Jena 4.2.0
Reporter: Holger Knublauch

This seems to have been introduced with https://issues.apache.org/jira/browse/JENA-2120.

With TDB databases that contain the replacement character in a literal, the warnings are reported as exceptions. We have seen this:

{code:java}
WARN [http-nio-8083-exec-10] g.e.SimpleDataFetcherExceptionHandler - Exception while fetching data (/resources[0]/turtleSourceCode) : [line: 1, col: 318] Unicode replacement character U+FFFD in string
org.apache.jena.riot.RiotParseException: [line: 1, col: 318] Unicode replacement character U+FFFD in string
    at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerRiotParseException.warning(ErrorHandlerFactory.java:367) ~[jena-arq-4.2.0.jar:4.2.0]
    at org.apache.jena.riot.tokens.TokenizerText.warning(TokenizerText.java:1332) ~[jena-arq-4.2.0.jar:4.2.0]
    at org.apache.jena.riot.tokens.TokenizerText.readString(TokenizerText.java:768) ~[jena-arq-4.2.0.jar:4.2.0]
    at org.apache.jena.riot.tokens.TokenizerText.parseToken(TokenizerText.java:238) ~[jena-arq-4.2.0.jar:4.2.0]
    at org.apache.jena.riot.tokens.TokenizerText.hasNext(TokenizerText.java:89) ~[jena-arq-4.2.0.jar:4.2.0]
    at org.apache.jena.tdb.store.nodetable.NodecSSE.decode(NodecSSE.java:119) ~[jena-tdb-4.2.0.jar:4.2.0]
    at org.apache.jena.tdb.lib.NodeLib.decode(NodeLib.java:118) ~[jena-tdb-4.2.0.jar:4.2.0]
{code}

TDB seems to use the fallback error handler, which throws an exception instead of just printing the warning to the log.

Richard says he believes a fix would be to change NodecSSE.createTokenizer():

{code:java}
return TokenizerText.create()
        .fromString(string)
        .errorHandler(ErrorHandlerFactory.errorHandlerDetailed())
        .build();
{code}

Is there any known workaround in 4.2.0? We cannot even query those triples from the offending TDBs at the moment.
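
To illustrate the difference between the two behaviours described above, here is a minimal, self-contained sketch. It deliberately does not use Jena: `WarningHandler`, `StrictHandler`, `LoggingHandler` and `decode` are hypothetical stand-ins invented for this example, not the classes from the stack trace, and the real tokenizer is far more involved. The only point is that the same warning either aborts decoding or is recorded, depending on which handler is plugged in.

```java
import java.util.ArrayList;
import java.util.List;

public class ErrorHandlerSketch {

    /** Hypothetical stand-in for a parser error handler (not Jena's interface). */
    interface WarningHandler {
        void warning(String message, int line, int col);
    }

    /** Treats every warning as fatal, similar in effect to the handler in the stack trace. */
    static final class StrictHandler implements WarningHandler {
        @Override
        public void warning(String message, int line, int col) {
            throw new IllegalStateException(format(message, line, col));
        }
    }

    /** Records the warning and lets decoding continue. */
    static final class LoggingHandler implements WarningHandler {
        final List<String> logged = new ArrayList<>();

        @Override
        public void warning(String message, int line, int col) {
            logged.add(format(message, line, col));
        }
    }

    static String format(String message, int line, int col) {
        return "[line: " + line + ", col: " + col + "] " + message;
    }

    /** Toy decoder: scans the stored string and reports each U+FFFD to the handler. */
    static String decode(String stored, WarningHandler handler) {
        for (int i = 0; i < stored.length(); i++) {
            if (stored.charAt(i) == '\uFFFD') {
                handler.warning("Unicode replacement character U+FFFD in string", 1, i + 1);
            }
        }
        return stored;
    }

    public static void main(String[] args) {
        String literal = "caf\uFFFD";

        // Lenient handler: the value is still returned and the warning is kept.
        LoggingHandler lenient = new LoggingHandler();
        decode(literal, lenient);
        System.out.println("logged: " + lenient.logged);

        // Strict handler: the same input makes the value unreadable.
        try {
            decode(literal, new StrictHandler());
        } catch (IllegalStateException e) {
            System.out.println("strict handler threw: " + e.getMessage());
        }
    }
}
```

In this model, swapping the handler passed to `decode` is the analogue of the proposed change to the tokenizer's error handler: the stored literal stays readable and the warning still surfaces.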