[ https://issues.apache.org/jira/browse/JENA-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425466#comment-17425466 ]
Andy Seaborne commented on JENA-2179:
-------------------------------------

See JENA-2118.

The ITIL glossaries are probably corrupt in some way, either because they went through a bad character conversion at some time in their life or because of something around the issue that JENA-2118 addressed. The generation of U+FFFD is often related to ISO-8859 / UTF-8 confusion. Best to fix the files.

I don't believe we have got to the root of the issue in this ticket yet. It does not happen consistently for me.

BTW We are still waiting on test cases from TopQuadrant about writers.

> TDB throws Unicode Replacement Character exception while fetching data
> ----------------------------------------------------------------------
>
>                 Key: JENA-2179
>                 URL: https://issues.apache.org/jira/browse/JENA-2179
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: TDB
>    Affects Versions: Jena 4.2.0
>            Reporter: Holger Knublauch
>            Priority: Major
>
> This seems to have been introduced with
> https://issues.apache.org/jira/browse/JENA-2120
> With TDB databases that contain the replacement character in a literal, the
> warnings are reported as Exceptions.
> We have seen this:
> {code:java}
> WARN [http-nio-8083-exec-10] g.e.SimpleDataFetcherExceptionHandler - Exception while fetching data (/resources[0]/turtleSourceCode) : [line: 1, col: 318] Unicode replacement character U+FFFD in string
> org.apache.jena.riot.RiotParseException: [line: 1, col: 318] Unicode replacement character U+FFFD in string
>     at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerRiotParseException.warning(ErrorHandlerFactory.java:367) ~[jena-arq-4.2.0.jar:4.2.0]
>     at org.apache.jena.riot.tokens.TokenizerText.warning(TokenizerText.java:1332) ~[jena-arq-4.2.0.jar:4.2.0]
>     at org.apache.jena.riot.tokens.TokenizerText.readString(TokenizerText.java:768) ~[jena-arq-4.2.0.jar:4.2.0]
>     at org.apache.jena.riot.tokens.TokenizerText.parseToken(TokenizerText.java:238) ~[jena-arq-4.2.0.jar:4.2.0]
>     at org.apache.jena.riot.tokens.TokenizerText.hasNext(TokenizerText.java:89) ~[jena-arq-4.2.0.jar:4.2.0]
>     at org.apache.jena.tdb.store.nodetable.NodecSSE.decode(NodecSSE.java:119) ~[jena-tdb-4.2.0.jar:4.2.0]
>     at org.apache.jena.tdb.lib.NodeLib.decode(NodeLib.java:118) ~[jena-tdb-4.2.0.jar:4.2.0]
> {code}
> TDB seems to use the fallback error handler, causing an exception to be thrown
> instead of just printing the warning (to the log).
> Richard says he believes a fix would be to change NodecSSE.createTokenizer():
> {code:java}
> return TokenizerText.create()
>         .fromString(string)
>         .errorHandler(ErrorHandlerFactory.errorHandlerDetailed())
>         .build();
> {code}
> Is there any known work-around in 4.2.0? We cannot even query those triples
> from the offending TDBs at the moment.
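
For illustration only, here is a minimal JDK-only sketch of the ISO-8859 / UTF-8 confusion mentioned in the comment above. It shows how bytes written as ISO-8859-1 turn into U+FFFD when they are later decoded as UTF-8, and how a strict decoder can flag such input before it is loaded. The class and method names (ReplacementCharDemo, decodeLenient, isValidUtf8) and the sample string are assumptions made for this example. They are not part of Jena or TDB, and this is not a claim about how the ITIL files were actually produced.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; none of these names come from Jena.
public class ReplacementCharDemo {
    static final char REPLACEMENT = '\uFFFD';

    // Lenient decoding: String(byte[], Charset) silently substitutes
    // U+FFFD for byte sequences that are malformed in UTF-8.
    static String decodeLenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Strict decoding: report malformed input instead of replacing it,
    // so a file in the wrong encoding can be rejected up front.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed sample text: e-acute is the single byte 0xE9 in ISO-8859-1,
        // which is not a valid UTF-8 sequence on its own.
        byte[] latin1 = "caf\u00E9 au lait".getBytes(StandardCharsets.ISO_8859_1);

        String decoded = decodeLenient(latin1);
        System.out.println("contains U+FFFD: " + (decoded.indexOf(REPLACEMENT) >= 0));
        System.out.println("valid UTF-8:     " + isValidUtf8(latin1));
    }
}
```

A strict check of this kind could be run over source files before loading them, which fits the "best to fix the files" advice. It does not address the TDB error-handler behaviour reported in the issue.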