Holger Knublauch created JENA-2179:
--------------------------------------
Summary: TDB throws Unicode Replacement Character exception while fetching data
Key: JENA-2179
URL: https://issues.apache.org/jira/browse/JENA-2179
Project: Apache Jena
Issue Type: Bug
Components: TDB
Affects Versions: Jena 4.2.0
Reporter: Holger Knublauch
This seems to have been introduced by https://issues.apache.org/jira/browse/JENA-2120.
With TDB databases that contain the replacement character (U+FFFD) in a literal, the tokenizer warnings are reported as exceptions. We have seen this:
{code:java}
WARN [http-nio-8083-exec-10] g.e.SimpleDataFetcherExceptionHandler - Exception while fetching data (/resources[0]/turtleSourceCode) : [line: 1, col: 318] Unicode replacement character U+FFFD in string
org.apache.jena.riot.RiotParseException: [line: 1, col: 318] Unicode replacement character U+FFFD in string
    at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerRiotParseException.warning(ErrorHandlerFactory.java:367) ~[jena-arq-4.2.0.jar:4.2.0]
    at org.apache.jena.riot.tokens.TokenizerText.warning(TokenizerText.java:1332) ~[jena-arq-4.2.0.jar:4.2.0]
    at org.apache.jena.riot.tokens.TokenizerText.readString(TokenizerText.java:768) ~[jena-arq-4.2.0.jar:4.2.0]
    at org.apache.jena.riot.tokens.TokenizerText.parseToken(TokenizerText.java:238) ~[jena-arq-4.2.0.jar:4.2.0]
    at org.apache.jena.riot.tokens.TokenizerText.hasNext(TokenizerText.java:89) ~[jena-arq-4.2.0.jar:4.2.0]
    at org.apache.jena.tdb.store.nodetable.NodecSSE.decode(NodecSSE.java:119) ~[jena-tdb-4.2.0.jar:4.2.0]
    at org.apache.jena.tdb.lib.NodeLib.decode(NodeLib.java:118) ~[jena-tdb-4.2.0.jar:4.2.0]
{code}
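For context, one common way a literal ends up containing U+FFFD (not necessarily how it happened in our data) is that text was decoded from bytes that are not valid UTF-8 by a lenient decoder, which substitutes the replacement character instead of failing. The self-contained sketch below shows Java's `new String(byte[], StandardCharsets.UTF_8)` doing exactly that, plus a trivial check for affected lexical forms. The class and method names are illustrative only and are not part of Jena.

{code:java}
import java.nio.charset.StandardCharsets;

public class ReplacementCharDemo {
    // U+FFFD, the Unicode replacement character.
    static final char REPLACEMENT = '\uFFFD';

    // Illustrative helper: decoding invalid UTF-8 with the String constructor
    // substitutes U+FFFD for each malformed sequence rather than throwing.
    static String decodeLenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Illustrative helper: true if a lexical form contains U+FFFD, i.e. the
    // kind of string that triggers the tokenizer warning in the trace.
    static boolean containsReplacementChar(String lexicalForm) {
        return lexicalForm.indexOf(REPLACEMENT) >= 0;
    }

    public static void main(String[] args) {
        // 0xFF can never appear in well-formed UTF-8.
        byte[] bytes = {0x41, (byte) 0xFF, 0x42};
        String s = decodeLenient(bytes);
        System.out.println(containsReplacementChar(s)); // true
    }
}
{code}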
TDB seems to use the fallback error handler, which causes an exception to be thrown instead of just logging the warning.
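The difference between the two behaviours can be sketched with a simplified, hypothetical model. `Handler`, `ThrowingHandler`, `LoggingHandler` and `readString` are illustrative stand-ins, not Jena's `ErrorHandler` or `TokenizerText` API. With a handler that escalates warnings, the read of the string aborts (as in the stack trace); with a handler that only records the warning, the string content is still returned.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class ErrorHandlerModel {
    // Hypothetical, simplified stand-in for an RDF parser error handler.
    interface Handler {
        void warning(String message, long line, long col);
    }

    // Escalates every warning to an exception, like the handler in the trace.
    static final class ThrowingHandler implements Handler {
        public void warning(String message, long line, long col) {
            throw new IllegalStateException("[line: " + line + ", col: " + col + "] " + message);
        }
    }

    // Records each warning (a real handler would log it) and lets reading continue.
    static final class LoggingHandler implements Handler {
        final List<String> log = new ArrayList<>();

        public void warning(String message, long line, long col) {
            log.add("[line: " + line + ", col: " + col + "] " + message);
        }
    }

    // Hypothetical mini tokenizer step: returns the string unchanged, but
    // reports a warning for every U+FFFD it contains.
    static String readString(String s, Handler handler) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '\uFFFD') {
                handler.warning("Unicode replacement character U+FFFD in string", 1, i + 1);
            }
        }
        return s;
    }

    public static void main(String[] args) {
        String lexical = "caf\uFFFD";
        LoggingHandler logging = new LoggingHandler();
        System.out.println(readString(lexical, logging).length()); // 4
        System.out.println(logging.log.size()); // 1
        try {
            readString(lexical, new ThrowingHandler());
        } catch (IllegalStateException e) {
            System.out.println("Read aborted: " + e.getMessage());
        }
    }
}
{code}

In this model the data itself is identical in both runs; only the handler decides whether a warning is fatal, which is the distinction the report is about.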
Richard believes a fix would be to change NodecSSE.createTokenizer() to:
{code:java}
return TokenizerText.create()
.fromString(string)
.errorHandler(ErrorHandlerFactory.errorHandlerDetailed())
.build();
{code}
Is there any known workaround in 4.2.0? At the moment we cannot even query those triples from the affected TDB databases.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)