[ https://issues.apache.org/jira/browse/JENA-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17429177#comment-17429177 ]
Andy Seaborne commented on JENA-2179:
-------------------------------------

Please test!

> TDB throws Unicode Replacement Character exception while fetching data
> ----------------------------------------------------------------------
>
>                 Key: JENA-2179
>                 URL: https://issues.apache.org/jira/browse/JENA-2179
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: TDB
>    Affects Versions: Jena 4.2.0
>            Reporter: Holger Knublauch
>            Assignee: Andy Seaborne
>            Priority: Major
>         Attachments: TBS4190_Test.java
>
>
> This seems to have been introduced with
> https://issues.apache.org/jira/browse/JENA-2120
> With TDB databases that contain the replacement character in a literal, the
> warnings are reported as exceptions. We have seen this:
> {code:java}
> WARN [http-nio-8083-exec-10] g.e.SimpleDataFetcherExceptionHandler - Exception while fetching data (/resources[0]/turtleSourceCode) : [line: 1, col: 318] Unicode replacement character U+FFFD in string
> org.apache.jena.riot.RiotParseException: [line: 1, col: 318] Unicode replacement character U+FFFD in string
>     at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerRiotParseException.warning(ErrorHandlerFactory.java:367) ~[jena-arq-4.2.0.jar:4.2.0]
>     at org.apache.jena.riot.tokens.TokenizerText.warning(TokenizerText.java:1332) ~[jena-arq-4.2.0.jar:4.2.0]
>     at org.apache.jena.riot.tokens.TokenizerText.readString(TokenizerText.java:768) ~[jena-arq-4.2.0.jar:4.2.0]
>     at org.apache.jena.riot.tokens.TokenizerText.parseToken(TokenizerText.java:238) ~[jena-arq-4.2.0.jar:4.2.0]
>     at org.apache.jena.riot.tokens.TokenizerText.hasNext(TokenizerText.java:89) ~[jena-arq-4.2.0.jar:4.2.0]
>     at org.apache.jena.tdb.store.nodetable.NodecSSE.decode(NodecSSE.java:119) ~[jena-tdb-4.2.0.jar:4.2.0]
>     at org.apache.jena.tdb.lib.NodeLib.decode(NodeLib.java:118) ~[jena-tdb-4.2.0.jar:4.2.0]
> {code}
> TDB seems to use the fallback error handler, which throws an exception
> instead of just writing the warning to the log.
> Richard says he believes a fix would be to change NodecSSE.createTokenizer():
> {code:java}
> return TokenizerText.create()
>         .fromString(string)
>         .errorHandler(ErrorHandlerFactory.errorHandlerDetailed())
>         .build();
> {code}
> Is there any known workaround in 4.2.0? We cannot even query those triples
> from the offending TDBs at the moment.
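
For illustration only, here is a minimal self-contained sketch of the behaviour the report describes: the same stored literal decodes without trouble when warnings are merely recorded, but cannot be read at all when the handler turns a warning into an exception. It does not use Jena. `WarningHandler`, `ThrowingHandler`, `CollectingHandler` and `decode` are hypothetical stand-ins invented for this sketch, and the toy decoder only assumes that an occurrence of U+FFFD is reported as a warning.

```java
import java.util.ArrayList;
import java.util.List;

public class ReplacementCharDemo {

    /** Hypothetical stand-in for a parser error handler; this is NOT Jena's ErrorHandler. */
    interface WarningHandler {
        void warning(String message, int line, int col);
    }

    /** Strict policy: any warning aborts decoding with an exception. */
    static final class ThrowingHandler implements WarningHandler {
        @Override
        public void warning(String message, int line, int col) {
            throw new IllegalStateException("[line: " + line + ", col: " + col + "] " + message);
        }
    }

    /** Lenient policy: the warning is recorded and decoding continues. */
    static final class CollectingHandler implements WarningHandler {
        final List<String> warnings = new ArrayList<>();

        @Override
        public void warning(String message, int line, int col) {
            warnings.add("[line: " + line + ", col: " + col + "] " + message);
        }
    }

    /** Toy decoder (an assumption of this sketch): returns the input unchanged and reports each U+FFFD. */
    static String decode(String stored, WarningHandler handler) {
        for (int i = 0; i < stored.length(); i++) {
            if (stored.charAt(i) == '\uFFFD') {
                handler.warning("Unicode replacement character U+FFFD in string", 1, i + 1);
            }
        }
        return stored;
    }

    public static void main(String[] args) {
        String literal = "caf\uFFFD";

        // Lenient handler: the literal is still readable, with one recorded warning.
        CollectingHandler lenient = new CollectingHandler();
        String decoded = decode(literal, lenient);
        System.out.println(decoded.length() + " chars, " + lenient.warnings.size() + " warning(s)");

        // Strict handler: the same literal can no longer be read.
        try {
            decode(literal, new ThrowingHandler());
        } catch (IllegalStateException e) {
            System.out.println("strict handler threw: " + e.getMessage());
        }
    }
}
```

The sketch only shows why the choice of handler decides whether already-stored data stays readable. Which Jena handler logs warnings and which throws on them should be checked against the Jena source for the version in use.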