[ 
https://issues.apache.org/jira/browse/JENA-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17432177#comment-17432177
 ] 

Holger Knublauch commented on JENA-2179:
----------------------------------------

BTW the same seems to happen using RDF Delta:

{code:java}
[line: 1276, col: 437] Unicode replacement character U+FFFD.

org.apache.jena.riot.RiotParseException: [line: 1276, col: 428] Unicode replacement character U+FFFD in string
at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerRiotParseException.warning(ErrorHandlerFactory.java:367)
at org.apache.jena.riot.tokens.TokenizerText.warning(TokenizerText.java:1332)
at org.apache.jena.riot.tokens.TokenizerText.readString(TokenizerText.java:768)
at org.apache.jena.riot.tokens.TokenizerText.parseToken(TokenizerText.java:238)
at org.apache.jena.riot.tokens.TokenizerText.hasNext(TokenizerText.java:89)
at org.seaborne.patch.text.RDFPatchReaderText.nextToken(RDFPatchReaderText.java:243)
at org.seaborne.patch.text.RDFPatchReaderText.nextNode(RDFPatchReaderText.java:254)
at org.seaborne.patch.text.RDFPatchReaderText.doOneLine(RDFPatchReaderText.java:104)
at org.seaborne.patch.text.RDFPatchReaderText.apply1(RDFPatchReaderText.java:72)
at org.seaborne.patch.text.RDFPatchReaderText.read(RDFPatchReaderText.java:49)
at org.seaborne.patch.text.RDFPatchReaderText.apply(RDFPatchReaderText.java:59)
at org.seaborne.delta.client.DeltaLinkHTTP.lambda$fetchCommon$8(DeltaLinkHTTP.java:211)
at org.seaborne.delta.client.DeltaLinkHTTP.retry(DeltaLinkHTTP.java:125)
at org.seaborne.delta.client.DeltaLinkHTTP.fetchCommon(DeltaLinkHTTP.java:204)
at org.seaborne.delta.client.DeltaLinkHTTP.fetch(DeltaLinkHTTP.java:184)
at org.topbraidlive.edg.backup.BackupUtils.getPatch(BackupUtils.java:368)
{code}


> TDB throws Unicode Replacement Character exception while fetching data
> ----------------------------------------------------------------------
>
>                 Key: JENA-2179
>                 URL: https://issues.apache.org/jira/browse/JENA-2179
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: TDB
>    Affects Versions: Jena 4.2.0
>            Reporter: Holger Knublauch
>            Assignee: Andy Seaborne
>            Priority: Major
>             Fix For: Jena 4.3.0
>
>         Attachments: TBS4190_Test.java
>
>
> This seems to have been introduced with JENA-2120 
> (https://issues.apache.org/jira/browse/JENA-2120).
> With TDB databases that contain the replacement character in a literal, the 
> warnings are raised as exceptions. We have seen this:
> {code:java}
> WARN  [http-nio-8083-exec-10] g.e.SimpleDataFetcherExceptionHandler - Exception while fetching data (/resources[0]/turtleSourceCode) : [line: 1, col: 318] Unicode replacement character U+FFFD in string
> org.apache.jena.riot.RiotParseException: [line: 1, col: 318] Unicode replacement character U+FFFD in string
>       at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerRiotParseException.warning(ErrorHandlerFactory.java:367) ~[jena-arq-4.2.0.jar:4.2.0]
>       at org.apache.jena.riot.tokens.TokenizerText.warning(TokenizerText.java:1332) ~[jena-arq-4.2.0.jar:4.2.0]
>       at org.apache.jena.riot.tokens.TokenizerText.readString(TokenizerText.java:768) ~[jena-arq-4.2.0.jar:4.2.0]
>       at org.apache.jena.riot.tokens.TokenizerText.parseToken(TokenizerText.java:238) ~[jena-arq-4.2.0.jar:4.2.0]
>       at org.apache.jena.riot.tokens.TokenizerText.hasNext(TokenizerText.java:89) ~[jena-arq-4.2.0.jar:4.2.0]
>       at org.apache.jena.tdb.store.nodetable.NodecSSE.decode(NodecSSE.java:119) ~[jena-tdb-4.2.0.jar:4.2.0]
>       at org.apache.jena.tdb.lib.NodeLib.decode(NodeLib.java:118) ~[jena-tdb-4.2.0.jar:4.2.0]
> {code}
> TDB seems to use the fallback error handler, which throws an exception 
> instead of just logging the warning.
> Richard believes a fix would be to change NodecSSE.createTokenizer():
> {code:java}
> return TokenizerText.create()
>     .fromString(string)
>     .errorHandler(ErrorHandlerFactory.errorHandlerDetailed())
>     .build();
> {code}
> Is there any known workaround in 4.2.0? At the moment we cannot even query 
> those triples from the affected TDB databases.
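
The difference between the two handler behaviours described in the issue can be sketched in plain Java, without Jena on the classpath. All names below (WarningHandler, ThrowingHandler, CollectingHandler, scan) are hypothetical and are not Jena's API. The sketch only illustrates, under that assumption, why a handler that escalates warnings makes a string containing U+FFFD unreadable, while a lenient handler lets the read complete.

```java
import java.util.ArrayList;
import java.util.List;

final class ReplacementCharDemo {

    /** Hypothetical stand-in for a parser error handler; not Jena's ErrorHandler. */
    interface WarningHandler {
        void warning(String message, int col);
    }

    /** Strict behaviour: every warning is escalated to an exception. */
    static final class ThrowingHandler implements WarningHandler {
        @Override
        public void warning(String message, int col) {
            throw new IllegalStateException("[col: " + col + "] " + message);
        }
    }

    /** Lenient behaviour: warnings are recorded and scanning continues. */
    static final class CollectingHandler implements WarningHandler {
        final List<String> warnings = new ArrayList<>();

        @Override
        public void warning(String message, int col) {
            warnings.add("[col: " + col + "] " + message);
        }
    }

    /**
     * Scans a string, reporting each U+FFFD to the handler with a 1-based column.
     * Returns the number of chars scanned if the handler never throws.
     */
    static int scan(String literal, WarningHandler handler) {
        int scanned = 0;
        for (int i = 0; i < literal.length(); i++) {
            if (literal.charAt(i) == '\uFFFD') {
                handler.warning("Unicode replacement character U+FFFD in string", i + 1);
            }
            scanned++;
        }
        return scanned;
    }

    public static void main(String[] args) {
        String literal = "caf\uFFFD";

        // Lenient handler: the whole string is read and the warning is kept.
        CollectingHandler lenient = new CollectingHandler();
        scan(literal, lenient);
        System.out.println("lenient: " + lenient.warnings);

        // Strict handler: the same input aborts the read.
        try {
            scan(literal, new ThrowingHandler());
        } catch (IllegalStateException e) {
            System.out.println("strict: " + e.getMessage());
        }
    }
}
```

With the strict handler, the caller never gets the string back, which matches the symptom reported here: the stored literal cannot be read at all, even though the underlying condition is only a warning.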



