[ https://issues.apache.org/jira/browse/JENA-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16631709#comment-16631709 ]
Bernhard Stiftner commented on JENA-1553:
-----------------------------------------

Experienced the same problem with Jena 3.8.0. TDB node tables got corrupted at some point under a combined, concurrent read/write workload, consequently leading to various exceptions being thrown in/around NodeLib.decode. Among the incarnations of the same problem were...

Different kinds of RiotParseExceptions when attempting to access corrupted TDB node tables:

org.apache.jena.riot.RiotParseException: [line: 1, col: 1 ] Failed to find a prefix name or keyword: ^@(0;0x0000)
    at org.apache.jena.riot.tokens.TokenizerText$ErrorHandlerTokenizer.error(TokenizerText.java:65)
    at org.apache.jena.riot.tokens.TokenizerText.error(TokenizerText.java:1244)
    at org.apache.jena.riot.tokens.TokenizerText.readPrefixedNameOrKeyword(TokenizerText.java:536)
    at org.apache.jena.riot.tokens.TokenizerText.parseToken(TokenizerText.java:445)
    at org.apache.jena.riot.tokens.TokenizerText.hasNext(TokenizerText.java:99)
    at org.apache.jena.tdb.store.nodetable.NodecSSE.decode(NodecSSE.java:127)
    at org.apache.jena.tdb.lib.NodeLib.decode(NodeLib.java:110)

org.apache.jena.riot.RiotParseException: [line: 1, col: 3 ] Malformed double: 2e
    at org.apache.jena.riot.tokens.TokenizerText$ErrorHandlerTokenizer.error(TokenizerText.java:65)
    at org.apache.jena.riot.tokens.TokenizerText.error(TokenizerText.java:1244)
    at org.apache.jena.riot.tokens.TokenizerText.exponent(TokenizerText.java:1011)
    at org.apache.jena.riot.tokens.TokenizerText.readNumber(TokenizerText.java:916)
    at org.apache.jena.riot.tokens.TokenizerText.parseToken(TokenizerText.java:421)
    at org.apache.jena.riot.tokens.TokenizerText.hasNext(TokenizerText.java:99)
    at org.apache.jena.tdb.store.nodetable.NodecSSE.decode(NodecSSE.java:127)
    at org.apache.jena.tdb.lib.NodeLib.decode(NodeLib.java:110)

Or a TDBException like this one:

org.apache.jena.tdb.TDBException: Not a node: if/stmt/6da980f15dedf35826cf3a4354525ded8efde37b>
    at org.apache.jena.tdb.store.nodetable.NodecSSE.decode(NodecSSE.java:133)
    at org.apache.jena.tdb.lib.NodeLib.decode(NodeLib.java:110)

And I also got "Illegal UTF-8" errors, just as in the stack trace of the original report:

org.apache.jena.atlas.RuntimeIOException: java.io.IOException: Illegal UTF-8: 0xFFFFFF97
    at org.apache.jena.atlas.io.IO.exception(IO.java:254)
    at org.apache.jena.atlas.io.BlockUTF8.exception(BlockUTF8.java:275)
    at org.apache.jena.atlas.io.BlockUTF8.toCharsBuffer(BlockUTF8.java:150)
    at org.apache.jena.atlas.io.BlockUTF8.toChars(BlockUTF8.java:73)
    at org.apache.jena.atlas.io.BlockUTF8.toString(BlockUTF8.java:95)
    at org.apache.jena.tdb.store.nodetable.NodecSSE.decode(NodecSSE.java:101)
    at org.apache.jena.tdb.lib.NodeLib.decode(NodeLib.java:110)

All of those errors disappeared after patching Jena (we're using our own fork of 3.8.0) with the proposed fix for JENA-1581 (upcoming Jena 3.9.0) and completely rebuilding the TDB stores. Existing data is probably corrupted and cannot be recovered, but so far I believe that the fix for JENA-1581 prevents TDB corruption from happening in the first place.

> Can't Backup data - java.io.IOException: Illegal UTF-8: 0xFFFFFFB1
> ------------------------------------------------------------------
>
> Key: JENA-1553
> URL: https://issues.apache.org/jira/browse/JENA-1553
> Project: Apache Jena
> Issue Type: Bug
> Components: Jena
> Environment: Ubuntu 16.04 running Docker. Running stain/jena-fuseki from the official Docker Hub.
> Reporter: Brian Mullen
> Priority: Major
>
> Attempting to backup through Fuseki, TDB 500M+ triples, breaking with error:
>
> {code:java}
> [2018-06-01 13:25:46] Log4jLoggerAdapter WARN Exception in backup
> org.apache.jena.atlas.RuntimeIOException: java.io.IOException: Illegal UTF-8: 0xFFFFFFB1
>     at org.apache.jena.atlas.io.IO.exception(IO.java:233)
>     at org.apache.jena.atlas.io.BlockUTF8.exception(BlockUTF8.java:275)
>     at org.apache.jena.atlas.io.BlockUTF8.toCharsBuffer(BlockUTF8.java:150)
>     at org.apache.jena.atlas.io.BlockUTF8.toChars(BlockUTF8.java:73)
>     at org.apache.jena.atlas.io.BlockUTF8.toString(BlockUTF8.java:95)
>     at org.apache.jena.tdb.store.nodetable.NodecSSE.decode(NodecSSE.java:101)
>     at org.apache.jena.tdb.lib.NodeLib.decode(NodeLib.java:105)
>     at org.apache.jena.tdb.lib.NodeLib.fetchDecode(NodeLib.java:81)
>     at org.apache.jena.tdb.store.nodetable.NodeTableNative.readNodeFromTable(NodeTableNative.java:186)
>     at org.apache.jena.tdb.store.nodetable.NodeTableNative._retrieveNodeByNodeId(NodeTableNative.java:111)
>     at org.apache.jena.tdb.store.nodetable.NodeTableNative.getNodeForNodeId(NodeTableNative.java:70)
>     at org.apache.jena.tdb.store.nodetable.NodeTableCache._retrieveNodeByNodeId(NodeTableCache.java:128)
>     at org.apache.jena.tdb.store.nodetable.NodeTableCache.getNodeForNodeId(NodeTableCache.java:82)
>     at org.apache.jena.tdb.store.nodetable.NodeTableWrapper.getNodeForNodeId(NodeTableWrapper.java:50)
>     at org.apache.jena.tdb.store.nodetable.NodeTableInline.getNodeForNodeId(NodeTableInline.java:67)
>     at org.apache.jena.tdb.lib.TupleLib.triple(TupleLib.java:107)
>     at org.apache.jena.tdb.lib.TupleLib.triple(TupleLib.java:84)
>     at org.apache.jena.tdb.lib.TupleLib.lambda$convertToTriples$2(TupleLib.java:54)
>     at org.apache.jena.atlas.iterator.Iter$2.next(Iter.java:270)
>     at org.apache.jena.atlas.iterator.Iter$2.next(Iter.java:270)
>     at org.apache.jena.atlas.iterator.Iter.next(Iter.java:891)
>     at org.apache.jena.riot.system.StreamOps.sendQuadsToStream(StreamOps.java:140)
>     at org.apache.jena.riot.writer.NQuadsWriter.write$(NQuadsWriter.java:62)
>     at org.apache.jena.riot.writer.NQuadsWriter.write(NQuadsWriter.java:45)
>     at org.apache.jena.riot.writer.NQuadsWriter.write(NQuadsWriter.java:91)
>     at org.apache.jena.riot.RDFWriter.write$(RDFWriter.java:208)
>     at org.apache.jena.riot.RDFWriter.output(RDFWriter.java:165)
>     at org.apache.jena.riot.RDFWriter.output(RDFWriter.java:112)
>     at org.apache.jena.riot.RDFWriterBuilder.output(RDFWriterBuilder.java:149)
>     at org.apache.jena.riot.RDFDataMgr.write$(RDFDataMgr.java:1269)
>     at org.apache.jena.riot.RDFDataMgr.write(RDFDataMgr.java:1162)
>     at org.apache.jena.riot.RDFDataMgr.write(RDFDataMgr.java:1153)
>     at org.apache.jena.fuseki.mgt.Backup.backup(Backup.java:115)
>     at org.apache.jena.fuseki.mgt.Backup.backup(Backup.java:75)
>     at org.apache.jena.fuseki.mgt.ActionBackup$BackupTask.run(ActionBackup.java:58)
>     at org.apache.jena.fuseki.async.AsyncPool.lambda$submit$0(AsyncPool.java:55)
>     at org.apache.jena.fuseki.async.AsyncTask.call(AsyncTask.java:100)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Illegal UTF-8: 0xFFFFFFB1
>     ... 40 more
> [2018-06-01 13:25:46] Log4jLoggerAdapter INFO Backup(/fuseki/backups/PDE_PROD_2018-06-01_13-24-00):2
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)