[ 
https://issues.apache.org/jira/browse/JENA-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17429177#comment-17429177
 ] 

Andy Seaborne commented on JENA-2179:
-------------------------------------

Please test!

> TDB throws Unicode Replacement Character exception while fetching data
> ----------------------------------------------------------------------
>
>                 Key: JENA-2179
>                 URL: https://issues.apache.org/jira/browse/JENA-2179
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: TDB
>    Affects Versions: Jena 4.2.0
>            Reporter: Holger Knublauch
>            Assignee: Andy Seaborne
>            Priority: Major
>         Attachments: TBS4190_Test.java
>
>
> This seems to have been introduced with 
> https://issues.apache.org/jira/browse/JENA-2120
> With TDB databases that contain the replacement character in a literal, the 
> warnings are reported as Exceptions. We have seen this:
> {code:java}
> WARN  [http-nio-8083-exec-10] g.e.SimpleDataFetcherExceptionHandler - 
> Exception while fetching data (/resources[0]/turtleSourceCode) : [line: 1, 
> col: 318] Unicode replacement character U+FFFD in string
> org.apache.jena.riot.RiotParseException: [line: 1, col: 318] Unicode 
> replacement character U+FFFD in string
>       at 
> org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerRiotParseException.warning(ErrorHandlerFactory.java:367)
>  ~[jena-arq-4.2.0.jar:4.2.0]
>       at 
> org.apache.jena.riot.tokens.TokenizerText.warning(TokenizerText.java:1332) 
> ~[jena-arq-4.2.0.jar:4.2.0]
>       at 
> org.apache.jena.riot.tokens.TokenizerText.readString(TokenizerText.java:768) 
> ~[jena-arq-4.2.0.jar:4.2.0]
>       at 
> org.apache.jena.riot.tokens.TokenizerText.parseToken(TokenizerText.java:238) 
> ~[jena-arq-4.2.0.jar:4.2.0]
>       at 
> org.apache.jena.riot.tokens.TokenizerText.hasNext(TokenizerText.java:89) 
> ~[jena-arq-4.2.0.jar:4.2.0]
>       at 
> org.apache.jena.tdb.store.nodetable.NodecSSE.decode(NodecSSE.java:119) 
> ~[jena-tdb-4.2.0.jar:4.2.0]
>       at org.apache.jena.tdb.lib.NodeLib.decode(NodeLib.java:118) 
> ~[jena-tdb-4.2.0.jar:4.2.0]
> {code}
> TDB seems to use the fallback error handler causing an exception to be thrown 
> instead of just printing the warning (to the log).
> Richard says he believes a fix would be to change NodecSEE.createTokenizer():
> {code:java}
> return TokenizerText.create()
>     .fromString(string)
>     .errorHandler(ErrorHandlerFactory.errorHandlerDetailed())
>     .build();
> {code}
> Is there any known work-around in 4.2.0? We cannot even query those triples 
> from the offending TDBs at the moment.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to