[ https://issues.apache.org/jira/browse/JENA-225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andy Seaborne closed JENA-225.
------------------------------

> TDB datasets can be corrupted by performing certain operations within a
> transaction
> ------------------------------------------------------------------------------------
>
>                 Key: JENA-225
>                 URL: https://issues.apache.org/jira/browse/JENA-225
>             Project: Apache Jena
>          Issue Type: Bug
>    Affects Versions: TDB 0.9.0
>         Environment: jena-tdb-0.9.0-incubating
>            Reporter: Sam Tunnicliffe
>            Assignee: Andy Seaborne
>             Fix For: TDB 0.9.1
>
>         Attachments: JENA-225-v1.patch, ReportBadUnicode1.java
>
>
> In a web application, we read some triples in an HTTP POST, using a LangTurtle
> instance and a tokenizer obtained from TokenizerFactory.makeTokenizerUTF8.
> We then write the parsed Triples back out (to temporary storage) using
> OutputLangUtils.write. At some later time, these Triples are re-read, again
> using a Tokenizer from TokenizerFactory.makeTokenizerUTF8, before being
> inserted into a TDB dataset.
> We have found that the input data can contain character strings which pass
> through the various parsers/serializers but which cause TDB's transaction
> layer to fail in such a way that recovery from the journal is ineffective.
> Eliminating transactions from the code path allows the database to be
> updated successfully.
> The stack trace from TDB looks like this:
>
> org.openjena.riot.RiotParseException: [line: 1, col: 2 ] Broken token: Hello
>     at org.openjena.riot.tokens.TokenizerText.exception(TokenizerText.java:1209)
>     at org.openjena.riot.tokens.TokenizerText.readString(TokenizerText.java:620)
>     at org.openjena.riot.tokens.TokenizerText.parseToken(TokenizerText.java:248)
>     at org.openjena.riot.tokens.TokenizerText.hasNext(TokenizerText.java:112)
>     at com.hp.hpl.jena.tdb.nodetable.NodecSSE.decode(NodecSSE.java:105)
>     at com.hp.hpl.jena.tdb.lib.NodeLib.decode(NodeLib.java:93)
>     at com.hp.hpl.jena.tdb.nodetable.NodeTableNative$2.convert(NodeTableNative.java:234)
>     at com.hp.hpl.jena.tdb.nodetable.NodeTableNative$2.convert(NodeTableNative.java:228)
>     at org.openjena.atlas.iterator.Iter$4.next(Iter.java:301)
>     at com.hp.hpl.jena.tdb.transaction.NodeTableTrans.append(NodeTableTrans.java:188)
>     at com.hp.hpl.jena.tdb.transaction.NodeTableTrans.writeNodeJournal(NodeTableTrans.java:306)
>     at com.hp.hpl.jena.tdb.transaction.NodeTableTrans.commitPrepare(NodeTableTrans.java:266)
>     at com.hp.hpl.jena.tdb.transaction.Transaction.prepare(Transaction.java:131)
>     at com.hp.hpl.jena.tdb.transaction.Transaction.commit(Transaction.java:112)
>     at com.hp.hpl.jena.tdb.transaction.DatasetGraphTxn.commit(DatasetGraphTxn.java:40)
>     at com.hp.hpl.jena.tdb.transaction.DatasetGraphTransaction._commit(DatasetGraphTransaction.java:106)
>     at com.hp.hpl.jena.tdb.migrate.DatasetGraphTrackActive.commit(DatasetGraphTrackActive.java:60)
>     at com.hp.hpl.jena.sparql.core.DatasetImpl.commit(DatasetImpl.java:143)
>
> At least part of the issue seems to stem from NodecSSE (I know this isn't an
> actual Unicode escape, but it's derived from the user input we've received):
>
> String s = "Hello \uDAE0 World";
> Node literal = Node.createLiteral(s);
> ByteBuffer bb = NodeLib.encode(literal);
> NodeLib.decode(bb);

-- This message is automatically generated by JIRA.
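[Editorial note: the string in the snippet above contains a lone high surrogate (U+DAE0) with no trailing low surrogate, so it cannot be encoded as well-formed UTF-8. That is why it survives the parser/serializer round trip but breaks on the byte-level path. A minimal sketch, independent of Jena, showing how the JDK's own UTF-8 encoder rejects such a string; the `SurrogateCheck` class and `isWellFormed` method are hypothetical names, not part of any Jena API:]

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reports whether a String contains only well-formed
// UTF-16, i.e. no unpaired surrogates, by asking the UTF-8 encoder to
// REPORT (throw) rather than silently replace malformed input.
public class SurrogateCheck {
    public static boolean isWellFormed(String s) {
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            enc.encode(CharBuffer.wrap(s));
            return true;
        } catch (CharacterCodingException e) {
            // A lone surrogate such as \uDAE0 lands here as MalformedInputException.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("Hello World"));        // true
        System.out.println(isWellFormed("Hello \uDAE0 World")); // false
    }
}
```

Validating input this way at the HTTP boundary would reject the bad string before it ever reaches the node table, independently of whatever fix lands in TDB itself.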
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira