[ https://issues.apache.org/jira/browse/JENA-1746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927816#comment-16927816 ]

ASF subversion and git services commented on JENA-1746:
-------------------------------------------------------

Commit c1a84039080a29dd20a02a79c0793724564f7c11 in jena's branch refs/heads/master from Andy Seaborne
[ https://gitbox.apache.org/repos/asf?p=jena.git;h=c1a8403 ]

Merge pull request #602 from afs/jena1746-tdb2-abort

JENA-1746: TDB2 abort

> TDB2 rollback method clashes with nodetable cache
> -------------------------------------------------
>
>                 Key: JENA-1746
>                 URL: https://issues.apache.org/jira/browse/JENA-1746
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: TDB2
>    Affects Versions: Jena 3.11.0, Jena 3.12.0
>         Environment: Linux 3.16.0-9-amd64 #1 SMP Debian 3.16.68-2
> (2019-06-17) x86_64 GNU/Linux
> java version "1.8.0_05"
> Java(TM) SE Runtime Environment (build 1.8.0_05-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 25.5-b02, mixed mode)
>            Reporter: Miklós Győrfi
>            Priority: Critical
>         Attachments: jena-test.tgz
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> *Issue:* Inserting triples, rolling back the TDB2 dataset, and then loading
> nodes again, including some nodes with the same content as before, produces
> inconsistent results: some nodes disappear and some are replaced by others.
> Worse, it unrecoverably *corrupts* the database files: accessing triples
> afterwards may throw a RiotThriftException.
> **org.apache.jena.riot.thrift.RiotThriftException: No conversion to a
> Node: <RDF_Term >
>
> *Reproduction*: Add some quads to a non-empty dataset, roll the transaction
> back, then add the same triples again in a different order, using anonymous
> (blank) nodes and URI nodes together. This procedure does not trigger the
> issue every time, but the probability is high.
>
> *Cause*: My investigation shows that the culprit is the {{NodeTableCache}}.
> It caches the node-to-node-ID mapping of the backing table
> ({{NodeTableTRDF}}), but the cache does not react to the rollback (abort)
> operation. During rollback, the backing table invalidates the node IDs.
> A node ID is closely tied to the position of the node data in the node data
> file, so new inserts can reuse the invalidated node IDs, or IDs close to
> them, for other nodes. The nodes that remain in the cache but were never
> written then overlap with the new ones, so reading them back causes Thrift
> errors or, later, missing nodes in the index. The data of the cached nodes
> disappears once they fall out of the cache or the dataset is reopened.
>
> *Possible fix:* None of the NodeTables registers for or reacts to the
> rollback; only the backing file and index are restored. The best solution is
> _giving these components a way to react to the restoration_. The cache could
> then evict all cached data, or track the changes made in each transaction
> and evict only those. In any case, evicting all caches on rollback is easy
> to justify.
> TransactionCoordinator has collections for shutdownHooks and for
> transactionsComponents. This is a good pattern to follow: add another
> collection for notification interfaces and call them back on transactional
> events. NodeTableCache (and other objects) can then listen to these events
> and evict the cache if necessary.
> Another possibility is to add a callback to NodeTable that reacts to the
> invalidation and propagates it back up the NodeTable hierarchy.
> A simpler fix is to propagate the thread-safe storage "version" down to the
> NodeTables, check it in the cache, and evict when it has changed.
>
> *Workaround:* Bypassing the cache (setting nodeToIdCacheSize and
> idToNodeCacheSize to -1 in StoreParams) works for now, but it hurts
> performance.
>

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
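
The failure described under *Cause* and the evict-on-abort idea from *Possible fix* can be illustrated with a small, self-contained Java sketch. This is a toy model, not Jena code: the classes BackingTable and CachedTable and the method scenario are hypothetical names invented for illustration, and the assumption that a node ID equals the node's position in an append-only list is a simplification of the position-based IDs the report describes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class StaleCacheSketch {

    /** Hypothetical append-only store: a node's ID is its position in the list. */
    static class BackingTable {
        private final List<String> nodes = new ArrayList<>();
        private int committedSize = 0;

        int allocate(String node) {
            nodes.add(node);
            return nodes.size() - 1;
        }

        String read(int id) {
            return nodes.get(id);
        }

        void commit() {
            committedSize = nodes.size();
        }

        /** Abort truncates to the committed length, so later inserts reuse the IDs. */
        void abort() {
            nodes.subList(committedSize, nodes.size()).clear();
        }
    }

    /** Hypothetical node-to-ID cache in front of the backing table. */
    static class CachedTable {
        private final BackingTable base;
        private final Map<String, Integer> nodeToId = new HashMap<>();
        private final boolean evictOnAbort;

        CachedTable(BackingTable base, boolean evictOnAbort) {
            this.base = base;
            this.evictOnAbort = evictOnAbort;
        }

        int getOrAllocate(String node) {
            // A cache hit never touches the backing table.
            return nodeToId.computeIfAbsent(node, base::allocate);
        }

        String read(int id) {
            return base.read(id);
        }

        void commit() {
            base.commit();
        }

        void abort() {
            base.abort();
            if (evictOnAbort) {
                nodeToId.clear(); // the crude but safe reaction: drop everything
            }
        }
    }

    /** Inserts "a", aborts, inserts "c" then "a" again; returns what "a"'s ID reads back as. */
    static String scenario(boolean evictOnAbort) {
        CachedTable table = new CachedTable(new BackingTable(), evictOnAbort);
        table.getOrAllocate("root");          // ID 0, makes the store non-empty
        table.commit();

        table.getOrAllocate("a");             // ID 1, cached
        table.abort();                        // backing store forgets ID 1

        table.getOrAllocate("c");             // new node, reuses ID 1
        int idOfA = table.getOrAllocate("a"); // stale cache still answers 1
        return table.read(idOfA);
    }

    public static void main(String[] args) {
        System.out.println("cache ignores abort: a reads back as " + scenario(false)); // c
        System.out.println("cache evicts on abort: a reads back as " + scenario(true)); // a
    }
}
```

With the cache left untouched on abort, the stale entry for "a" points at the slot now occupied by "c", which mirrors the "some nodes are replaced" symptom. Clearing the cache on abort avoids this at the cost of a cold cache after every rollback; tracking per-transaction entries, as the report also suggests, would evict less.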