[
https://issues.apache.org/jira/browse/JENA-225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235827#comment-13235827
]
Hudson commented on JENA-225:
-----------------------------
Integrated in Jena_ARQ #510 (See [https://builds.apache.org/job/Jena_ARQ/510/])
Partial fix for JENA-225.
This does not fix the problem completely for TDB because strings are (still)
not round-trip-safe. (Revision 1303934)
Result = SUCCESS
andy :
Files :
* /incubator/jena/Jena2/ARQ/trunk/src/main/java/org/openjena/atlas/lib/Chars.java
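As background for the "not round-trip-safe" remark above, a minimal standalone sketch
(plain JDK calls only, no Jena code; class and variable names are illustrative) of why a
string containing an unpaired surrogate cannot survive a UTF-8 encode/decode cycle:

import java.nio.charset.Charset;

public class SurrogateRoundTrip {
    public static void main(String[] args) {
        Charset utf8 = Charset.forName("UTF-8");

        // \uDAE0 is an unpaired high surrogate: legal in a Java String,
        // but it has no valid UTF-8 encoding on its own.
        String original = "Hello \uDAE0 World";

        // Encoding silently replaces the unpaired surrogate (the default
        // encoder action is REPLACE), so information is lost at this point.
        byte[] bytes = original.getBytes(utf8);
        String decoded = new String(bytes, utf8);

        System.out.println(original.equals(decoded));   // false: the round trip is lossy
    }
}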
> TDB datasets can be corrupted by performing certain operations within a
> transaction
> ------------------------------------------------------------------------------------
>
> Key: JENA-225
> URL: https://issues.apache.org/jira/browse/JENA-225
> Project: Apache Jena
> Issue Type: Bug
> Affects Versions: TDB 0.9.0
> Environment: jena-tdb-0.9.0-incubating
> Reporter: Sam Tunnicliffe
> Attachments: JENA-225-v1.patch, ReportBadUnicode1.java
>
>
> In a web application, we read some triples in an HTTP POST, using a LangTurtle
> instance and a tokenizer obtained from TokenizerFactory.makeTokenizerUTF8.
> We then write the parsed Triples back out (to temporary storage) using
> OutputLangUtils.write. At some later time, these Triples are re-read, again
> using a Tokenizer from TokenizerFactory.makeTokenizerUTF8, before being
> inserted into a TDB dataset.
> We have found it possible for the input data to contain character strings
> which pass through the various parsers/serializers but which cause TDB's
> transaction layer to fail in such a way as to make recovery from journals
> ineffective.
> Eliminating transactions from the code path enables the database to be
> updated successfully.
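> A minimal sketch of the failing transactional path described above (illustrative only,
> assuming the TDB 0.9.0 transaction API; the dataset location and the subject/predicate
> URIs are placeholders, not from our application):
>
> import com.hp.hpl.jena.query.Dataset;
> import com.hp.hpl.jena.query.ReadWrite;
> import com.hp.hpl.jena.rdf.model.Model;
> import com.hp.hpl.jena.tdb.TDBFactory;
>
> public class Jena225TxnSketch {
>     public static void main(String[] args) {
>         Dataset ds = TDBFactory.createDataset("/tmp/jena225-tdb");   // placeholder location
>         ds.begin(ReadWrite.WRITE);
>         try {
>             Model m = ds.getDefaultModel();
>             m.add(m.createResource("http://example/s"),              // placeholder subject
>                   m.createProperty("http://example/p"),              // placeholder predicate
>                   m.createLiteral("Hello \uDAE0 World"));            // string with an unpaired surrogate
>             ds.commit();   // fails during commit/prepare (NodeTableTrans), as in the stack trace below
>         } finally {
>             ds.end();
>         }
>     }
> }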
> The stacktrace from TDB looks like this:
> org.openjena.riot.RiotParseException: [line: 1, col: 2 ] Broken token: Hello
> at org.openjena.riot.tokens.TokenizerText.exception(TokenizerText.java:1209)
> at org.openjena.riot.tokens.TokenizerText.readString(TokenizerText.java:620)
> at org.openjena.riot.tokens.TokenizerText.parseToken(TokenizerText.java:248)
> at org.openjena.riot.tokens.TokenizerText.hasNext(TokenizerText.java:112)
> at com.hp.hpl.jena.tdb.nodetable.NodecSSE.decode(NodecSSE.java:105)
> at com.hp.hpl.jena.tdb.lib.NodeLib.decode(NodeLib.java:93)
> at com.hp.hpl.jena.tdb.nodetable.NodeTableNative$2.convert(NodeTableNative.java:234)
> at com.hp.hpl.jena.tdb.nodetable.NodeTableNative$2.convert(NodeTableNative.java:228)
> at org.openjena.atlas.iterator.Iter$4.next(Iter.java:301)
> at com.hp.hpl.jena.tdb.transaction.NodeTableTrans.append(NodeTableTrans.java:188)
> at com.hp.hpl.jena.tdb.transaction.NodeTableTrans.writeNodeJournal(NodeTableTrans.java:306)
> at com.hp.hpl.jena.tdb.transaction.NodeTableTrans.commitPrepare(NodeTableTrans.java:266)
> at com.hp.hpl.jena.tdb.transaction.Transaction.prepare(Transaction.java:131)
> at com.hp.hpl.jena.tdb.transaction.Transaction.commit(Transaction.java:112)
> at com.hp.hpl.jena.tdb.transaction.DatasetGraphTxn.commit(DatasetGraphTxn.java:40)
> at com.hp.hpl.jena.tdb.transaction.DatasetGraphTransaction._commit(DatasetGraphTransaction.java:106)
> at com.hp.hpl.jena.tdb.migrate.DatasetGraphTrackActive.commit(DatasetGraphTrackActive.java:60)
> at com.hp.hpl.jena.sparql.core.DatasetImpl.commit(DatasetImpl.java:143)
> At least part of the issue seems to stem from NodecSSE (I know this isn't
> actual Unicode escaping, but it's derived from the user input we've received).
> String s = "Hello \uDAE0 World";
> Node literal = Node.createLiteral(s);
> ByteBuffer bb = NodeLib.encode(literal);
> NodeLib.decode(bb);
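> (Note: \uDAE0 is an unpaired high surrogate. It is a legal char value in a Java
> String, but it has no valid UTF-8 encoding on its own, so it cannot survive an
> encode/decode round trip that goes through UTF-8.)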
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira