[ https://issues.apache.org/jira/browse/OAK-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550155#comment-14550155 ]
Julian Sedding commented on OAK-2882:
-------------------------------------

+1

The patch works as expected. Improving the heap usage would of course be a plus, but it is not crucial for the repository size I am working with.

I ran a few tests copying 500k nodes (2/3 assets, 1/3 websites) containing a total of 59k binaries in the datastore. Times refer only to the copy phase, not the commit-hook phase.

{noformat}
datastore-list.txt missing:   35sec (same as without the LengthCachingDataStore wrapper)
datastore-list.txt full:      15sec
datastore-list.txt half-full: 25sec (counted with 'wc -l' and trimmed with 'tail')
{noformat}

I can also confirm the issue Shashank mentioned. After the half-full test run, the datastore-list.txt file contained only the other half, i.e. all newly added mappings, but not the old ones.

> Support migration without access to DataStore
> ---------------------------------------------
>
>                 Key: OAK-2882
>                 URL: https://issues.apache.org/jira/browse/OAK-2882
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: upgrade
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>              Labels: docs-impacting
>             Fix For: 1.3.0, 1.0.15
>
>         Attachments: OAK-2882-v2.patch, OAK-2882.patch,
>                      build_datastore_list.sh
>
>
> Migration currently requires access to the DataStore, as it is configured as
> part of repository.xml. However, during the complete migration the actual
> binary content in the DataStore is not accessed; the migration logic only
> makes use of
> * DataIdentifier = id of the files
> * Length = as it gets encoded as part of the blobId (OAK-1667)
> It would be faster and beneficial to allow migration without actual access to
> the DataStore. This would have two benefits:
> # Allows one to test out migration on a local setup by just copying the TarPM
> files. For example, one could zip only the following files to get going with
> repository startup, if we can somehow avoid direct access to the DataStore:
> {noformat}
> crx-quickstart# tar -zcvf repo-2.tar.gz repository \
>     --exclude=repository/repository/datastore \
>     --exclude=repository/repository/index \
>     --exclude=repository/workspaces/crx.default/index \
>     --exclude=repository/tarJournal
> {noformat}
> # Provides faster (repeatable) migration, as access to the DataStore can be
> avoided, which in cases like S3 might be slow, provided we solve how to
> obtain the length.
> *Proposal*
> Have a DataStore implementation which can be provided with a mapping file
> having entries for blobId and length. This file would be used to answer
> queries regarding the length and existence of a blob, and would thus avoid
> actual access to the DataStore.
> Going further, this DataStore can be configured with a delegate which can be
> used as a fallback in case the required details are not present in the
> precomputed data set (maybe due to a change in content after that data was
> computed).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
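
For readers unfamiliar with the idea, the proposal quoted above (answer length and existence queries from a precomputed mapping, fall back to a delegate on a miss) can be illustrated with a minimal sketch. This is not the LengthCachingDataStore from the attached patch: the class name `MappingBackedLengths`, the `blobId|length` line format, and the use of a plain `Function` as the delegate are all assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;
import java.util.function.Function;

// Hypothetical sketch, not the Oak implementation.
class MappingBackedLengths {
    // blobId -> length, loaded from the precomputed mapping
    private final Map<String, Long> lengths = new HashMap<>();
    // Fallback used when a blobId is missing from the mapping (may be null)
    private final Function<String, OptionalLong> delegate;

    // Assumed line format: "<blobId>|<length>"; blank or malformed lines are skipped.
    MappingBackedLengths(Iterable<String> mappingLines,
                         Function<String, OptionalLong> delegate) {
        this.delegate = delegate;
        for (String raw : mappingLines) {
            String line = raw.trim();
            int sep = line.indexOf('|');
            if (line.isEmpty() || sep < 0) {
                continue;
            }
            String blobId = line.substring(0, sep).trim();
            long length = Long.parseLong(line.substring(sep + 1).trim());
            lengths.put(blobId, length);
        }
    }

    // Answers from the mapping first; only consults the delegate on a miss.
    OptionalLong length(String blobId) {
        Long cached = lengths.get(blobId);
        if (cached != null) {
            return OptionalLong.of(cached);
        }
        return delegate == null ? OptionalLong.empty() : delegate.apply(blobId);
    }

    // A blob "exists" if either the mapping or the delegate knows its length.
    boolean exists(String blobId) {
        return length(blobId).isPresent();
    }
}
```

In such a sketch the mapping lines could come from something like `java.nio.file.Files.readAllLines(...)`, and the delegate would wrap the real (possibly slow, e.g. S3-backed) DataStore so that it is only touched for entries absent from the precomputed file.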