[
https://issues.apache.org/jira/browse/JCR-926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511387
]
Thomas Mueller commented on JCR-926:
------------------------------------
The garbage collection implementation has a few disadvantages; for example, you
can't stop and restart it. I suggest getting the nodes directly from the
persistence manager. To do this, I suggest adding a new method to
AbstractBundlePersistenceManager:
protected synchronized NodeIdIterator getAllNodeIds(NodeId startWith, int maxCount)
startWith can be null, in which case there is no lower limit.
If maxCount is 0 then all node ids are returned.
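For clarity, here is the same proposal with the semantics above written out as
Javadoc (a sketch only, signature without a body; NodeIdIterator is assumed to
be a typed iterator over NodeId):

    /**
     * Return the ids of all node bundles stored by this persistence manager.
     *
     * @param startWith only ids that come after this id in the iteration
     *        order are returned; null means no lower limit
     * @param maxCount the maximum number of ids to return; 0 means no limit
     * @return an iterator over the matching node ids
     */
    protected synchronized NodeIdIterator getAllNodeIds(NodeId startWith, int maxCount)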
This way you can stop and restart it:
Day 1: getAllNodeIds(null, 1000);
... iterate through all node ids
... remember the last node id of this batch, for example 0x12345678
Day 2: getAllNodeIds(0x12345678, 1000);
... iterate through all node ids
... remember the last node id of this batch,...
New nodes are not a problem, as the modified date of a node is updated in this
case.
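A rough sketch of this restartable scan, written as if it ran where the
(protected) method is visible. NodeIdIterator is assumed to expose
nextNodeId() like the other typed iterators; loadLastId() and storeLastId()
are hypothetical helpers that keep the resume point between runs (for
example in a small file):

    void scanNextBatch(AbstractBundlePersistenceManager pm) {
        NodeId lastId = loadLastId();                        // null on the very first run
        NodeIdIterator it = pm.getAllNodeIds(lastId, 1000);  // next batch of 1000 node ids
        while (it.hasNext()) {
            NodeId id = it.nextNodeId();
            // ... process the node, for example mark its binary values as still referenced
            lastId = id;
        }
        storeLastId(lastId);                                 // resume from here on the next run
    }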
AbstractBundlePersistenceManager.getAllNodeIds could later be used for other
features as well (for example, fast repository cloning or backup).
Thomas
> Global data store for binaries
> ------------------------------
>
> Key: JCR-926
> URL: https://issues.apache.org/jira/browse/JCR-926
> Project: Jackrabbit
> Issue Type: New Feature
> Components: core
> Reporter: Jukka Zitting
> Attachments: dataStore.patch, DataStore.patch, DataStore2.patch,
> dataStore3.patch, dataStore4.zip, internalValue.patch, ReadWhileSaveTest.patch
>
>
> There are three main problems with the way Jackrabbit currently handles large
> binary values:
> 1) Persisting a large binary value blocks access to the persistence layer for
> extended amounts of time (see JCR-314)
> 2) At least two copies of binary streams are made when saving them through
> the JCR API: one in the transient space, and one when persisting the value
> 3) Versioning and copy operations on nodes or subtrees that contain large
> binary values can quickly end up consuming excessive amounts of storage space.
> To solve these issues (and to get other nice benefits), I propose that we
> implement a global "data store" concept in the repository. A data store is an
> append-only set of binary values that uses short identifiers to identify and
> access the stored binary values. The data store would trivially fit the
> requirements of transient space and transaction handling due to the
> append-only nature. An explicit mark-and-sweep garbage collection process
> could be added to avoid concerns about storing garbage values.
> See the recent NGP value record discussion, especially [1], for more
> background on this idea.
> [1]
> http://mail-archives.apache.org/mod_mbox/jackrabbit-dev/200705.mbox/[EMAIL
> PROTECTED]
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.