[
https://issues.apache.org/jira/browse/JENA-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16989698#comment-16989698
]
Andy Seaborne edited comment on JENA-1785 at 12/6/19 12:27 PM:
---------------------------------------------------------------
Here is my understanding of the problem: this is mainly to make sure I
understand the details here!
----
The NodeTable Cache should reflect the node table. The NodeTable is append-only
and it does not matter if nodes are added to it by later transactions; what is
in the data is determined by the triples/quads, not by presence in the node
table.
Inside a write-transaction things are different because the W-txn may abort.
Nodes created should not be visible outside the W-txn until it commits
(JENA-1746). The fact that nodes are cleared from storage after an abort was
the problem in JENA-1746.
The underlying disk storage, a TransBinaryDataFile, reflects transactions. In a
W-txn, there is the visible part (what all still-running read-transactions see)
and the additional nodes of the W-txn.
NodeTableCache has buffering caches that attempt to reflect this.
These are ThreadBufferingCache instances, each a pair of normal LRU caches: one
"local" for the W-txn nodes and one "base" for the globally visible cache of
the node table.
The same structure is used for the not-present cache.
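To make the local/base split concrete, here is a minimal sketch of a two-level
cache. It is an illustration only: the class TwoLevelCache and its methods are
hypothetical names, not the actual ThreadBufferingCache API, and it ignores
threading entirely.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical two-level cache; NOT Jena's actual ThreadBufferingCache. */
class TwoLevelCache<K, V> {
    /** Minimal LRU map: evicts the least recently accessed entry past capacity. */
    static <A, B> Map<A, B> lru(final int capacity) {
        return new LinkedHashMap<A, B>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<A, B> eldest) {
                return size() > capacity;
            }
        };
    }

    private final Map<K, V> base;      // globally visible entries
    private final int localCapacity;
    private Map<K, V> local = null;    // non-null only inside a write transaction

    TwoLevelCache(int baseCapacity, int localCapacity) {
        this.base = lru(baseCapacity);
        this.localCapacity = localCapacity;
    }

    void beginWrite() { local = lru(localCapacity); }

    /** Inside a W-txn, new entries are buffered in "local" only. */
    void put(K key, V value) {
        (local != null ? local : base).put(key, value);
    }

    /** Check "local" first, then fall back to "base". */
    V get(K key) {
        if (local != null) {
            V v = local.get(key);
            if (v != null) return v;
        }
        return base.get(key);
    }

    /** On commit, whatever is still in "local" becomes globally visible. */
    void commit() {
        if (local != null) base.putAll(local);
        local = null;
    }

    /** On abort, the buffered entries are simply dropped. */
    void abort() { local = null; }
}
```

Note that commit() can only publish what is still in the local LRU; anything
evicted during the W-txn never reaches the base cache this way.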
Problem:
A node is initially recorded as "not present". It is now in the
base-not-present cache.
A W-txn adds this node, putting it in the node/nodeId caches as well as writing
through to the TransBinaryDataFile. It is removed from the not-present-local
cache if in there. Lookups are correct up to this point because the
node/nodeId caches are checked before the not-present cache.
However, problems arise if the node falls out of the local node/nodeId cache.
Now, a lookup will not find it in a local cache, and will get the wrong answer
from the base caches because the node is still in the base-not-present cache.
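The sequence above can be modelled in a few lines. This is a simplified sketch
under stated assumptions, not the real NodeTableCache code: the class
NotPresentBugSketch, its fields, and the String/Long stand-ins for nodes and
node ids are all hypothetical, and a plain map stands in for the
TransBinaryDataFile.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of the lookup order described above; names are illustrative. */
class NotPresentBugSketch {
    final Map<String, Long> storage = new HashMap<>();     // stand-in for the node table file
    final Set<String> baseNotPresent = new HashSet<>();    // base "not present" cache
    final Map<String, Long> localNode2Id;                  // W-txn local LRU cache
    private long nextId = 0;

    NotPresentBugSketch(final int localCapacity) {
        localNode2Id = new LinkedHashMap<String, Long>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Long> eldest) {
                return size() > localCapacity;
            }
        };
    }

    /** Lookup order: local node cache, then base not-present, then storage. */
    Long lookup(String node) {
        Long id = localNode2Id.get(node);
        if (id != null) return id;
        if (baseNotPresent.contains(node)) return null;    // trusted without consulting storage
        id = storage.get(node);
        if (id == null) baseNotPresent.add(node);          // remember the miss
        return id;
    }

    /** W-txn add: write through to storage and cache locally; base caches untouched. */
    long add(String node) {
        long id = nextId++;
        storage.put(node, id);
        localNode2Id.put(node, id);
        return id;
    }

    public static void main(String[] args) {
        NotPresentBugSketch s = new NotPresentBugSketch(1); // tiny local cache to force eviction
        s.lookup("x");   // miss: "x" goes into base-not-present
        s.add("x");      // W-txn creates "x"
        s.lookup("x");   // found via the local cache
        s.add("y");      // evicts "x" from the local cache
        System.out.println("in storage: " + s.storage.containsKey("x")
                + ", lookup: " + s.lookup("x"));
    }
}
```

With a local capacity of 1, the final lookup of "x" falls through to the stale
base-not-present entry even though the node is in storage.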
> A newly created node can remain invisible after commit
> ------------------------------------------------------
>
> Key: JENA-1785
> URL: https://issues.apache.org/jira/browse/JENA-1785
> Project: Apache Jena
> Issue Type: Bug
> Components: TDB2
> Affects Versions: Jena 3.13.0, Jena 3.13.1
> Reporter: Pavel Mikhailovskii
> Assignee: Andy Seaborne
> Priority: Critical
> Attachments: TestVisibilityOfChanges.java
>
> Time Spent: 1h
> Remaining Estimate: 0h
>
> A node once marked as non-present (_NodeTableCache.nonPresent_) can remain
> invisible even after it's created and the transaction is committed. That
> might happen because there's no guarantee that *all* newly created nodes will
> be eventually added to the "base" version _ThreadBufferingCache.baseCache_ of
> the _node2id_Cache_ (as the _localCache_ has limited capacity) or removed
> from the "base" version of the _nonPresent_ cache (even if they were, there
> would still be a chance of re-adding them by some read transaction).
> The simplest fix is to get rid of the _nonPresent_ cache which seems to be of
> limited use anyway. A more sophisticated fix would involve keeping track of
> all newly allocated nodes and their removal from the base version of
> _nonPresent_ cache on transaction commit.
> To reproduce: see the attached test.
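
As a rough illustration of the "more sophisticated fix" suggested in the
description, the sketch below tracks nodes allocated in a W-txn and removes them
from the base not-present set on commit. The class NotPresentTracker and its
method names are hypothetical, and this is not necessarily the fix that was
applied in Jena. As the description notes, a concurrent read transaction could
still re-add an entry, which this sketch does not address.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical commit hook; illustrative only. */
class NotPresentTracker {
    final Set<String> baseNotPresent = new HashSet<>();         // base "not present" cache
    private final Set<String> allocatedInTxn = new HashSet<>(); // nodes created by the W-txn

    /** Record every node the W-txn allocates, independent of any LRU eviction. */
    void noteAllocated(String node) { allocatedInTxn.add(node); }

    /** On commit, invalidate stale "not present" entries for the new nodes. */
    void commit() {
        baseNotPresent.removeAll(allocatedInTxn);
        allocatedInTxn.clear();
    }

    /** On abort, the nodes never became visible, so the base cache stays as is. */
    void abort() { allocatedInTxn.clear(); }
}
```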
--
This message was sent by Atlassian Jira
(v8.3.4#803005)