[ 
https://issues.apache.org/jira/browse/JENA-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17427518#comment-17427518
 ] 

Andy Seaborne commented on JENA-2179:
-------------------------------------

It is passed by the Turtle parser - that isn't what is happening here.

There was never an intention to change anything; it's just an unforeseen 
consequence.

The reason the warning was put in is the other cause of U+FFFD - when there is 
an encoding error. This matters because it warns about the issue as early as 
possible. There is no U+FFFD in the input data.
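
To illustrate the encoding-error case, here is a minimal sketch using only the 
JDK. It is not Jena code, and the class and method names are hypothetical. It 
assumes the bytes were written as ISO-8859-1 and then read as UTF-8: lenient 
decoding silently puts U+FFFD into the string, while strict decoding reports 
the problem at once.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Jena.
class ReplacementCharDemo {
    static final char REPLACEMENT = '\uFFFD';

    // Lenient decoding: malformed byte sequences are silently replaced by U+FFFD.
    static String decodeLenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Strict decoding: malformed input raises CharacterCodingException instead.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    // True if the string contains the Unicode replacement character.
    static boolean containsReplacementChar(String s) {
        return s.indexOf(REPLACEMENT) >= 0;
    }

    public static void main(String[] args) {
        // "cafe" with an e-acute, encoded as ISO-8859-1: the last byte (0xE9)
        // is not a valid UTF-8 sequence on its own.
        byte[] latin1 = "caf\u00E9".getBytes(StandardCharsets.ISO_8859_1);

        String lenient = decodeLenient(latin1);
        System.out.println("lenient decode contains U+FFFD: "
                + containsReplacementChar(lenient));

        try {
            decodeStrict(latin1);
            System.out.println("strict decode succeeded");
        } catch (CharacterCodingException e) {
            System.out.println("strict decode failed: " + e);
        }
    }
}
```

In the lenient case the original bytes held no U+FFFD; the decoder introduced 
it, which is why a warning at parse time is often the first visible sign of an 
encoding problem.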


> TDB throws Unicode Replacement Character exception while fetching data
> ----------------------------------------------------------------------
>
>                 Key: JENA-2179
>                 URL: https://issues.apache.org/jira/browse/JENA-2179
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: TDB
>    Affects Versions: Jena 4.2.0
>            Reporter: Holger Knublauch
>            Priority: Major
>         Attachments: TBS4190_Test.java
>
>
> This seems to have been introduced with 
> https://issues.apache.org/jira/browse/JENA-2120
> With TDB databases that contain the replacement character in a literal, the 
> warnings are reported as Exceptions. We have seen this:
> {code:java}
> WARN  [http-nio-8083-exec-10] g.e.SimpleDataFetcherExceptionHandler - 
> Exception while fetching data (/resources[0]/turtleSourceCode) : [line: 1, 
> col: 318] Unicode replacement character U+FFFD in string
> org.apache.jena.riot.RiotParseException: [line: 1, col: 318] Unicode 
> replacement character U+FFFD in string
>       at 
> org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerRiotParseException.warning(ErrorHandlerFactory.java:367)
>  ~[jena-arq-4.2.0.jar:4.2.0]
>       at 
> org.apache.jena.riot.tokens.TokenizerText.warning(TokenizerText.java:1332) 
> ~[jena-arq-4.2.0.jar:4.2.0]
>       at 
> org.apache.jena.riot.tokens.TokenizerText.readString(TokenizerText.java:768) 
> ~[jena-arq-4.2.0.jar:4.2.0]
>       at 
> org.apache.jena.riot.tokens.TokenizerText.parseToken(TokenizerText.java:238) 
> ~[jena-arq-4.2.0.jar:4.2.0]
>       at 
> org.apache.jena.riot.tokens.TokenizerText.hasNext(TokenizerText.java:89) 
> ~[jena-arq-4.2.0.jar:4.2.0]
>       at 
> org.apache.jena.tdb.store.nodetable.NodecSSE.decode(NodecSSE.java:119) 
> ~[jena-tdb-4.2.0.jar:4.2.0]
>       at org.apache.jena.tdb.lib.NodeLib.decode(NodeLib.java:118) 
> ~[jena-tdb-4.2.0.jar:4.2.0]
> {code}
> TDB seems to use the fallback error handler causing an exception to be thrown 
> instead of just printing the warning (to the log).
> Richard says he believes a fix would be to change NodecSSE.createTokenizer():
> {code:java}
> return TokenizerText.create()
>     .fromString(string)
>     .errorHandler(ErrorHandlerFactory.errorHandlerDetailed())
>     .build();
> {code}
> Is there any known work-around in 4.2.0? We cannot even query those triples 
> from the offending TDBs at the moment.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
