[ https://issues.apache.org/jira/browse/OAK-2808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16020559#comment-16020559 ]
Vikas Saurabh edited comment on OAK-2808 at 5/24/17 1:10 PM:
-------------------------------------------------------------

I've done some preliminary work here - https://github.com/catholicon/jackrabbit-oak/commits/OAK-2808-active-lucene-binary-deletion.

[~chetanm], the initial state that you reviewed earlier is at https://github.com/catholicon/jackrabbit-oak/commits/OAK-2808-active-lucene-binary-deletion-take1. Apart from the executor service, I think I've incorporated all the changes you mentioned.

Things that need improvement:
* -Use executor for file flush and subsequently some changes if we can flush more quickly- Updated the branch to use an executor for async flush
* missing tests and javadocs
* refactor/cleanup/renames???

From a feature POV, the following pieces are still missing:
* API to get the oldest safe timestamp from a checkpoint (OAK-6227)
* Patch up the purge-blobs call to some scheduled task

[~tmueller], can you please take a peek for an early review of the direction? The current state is a little rough and needs cleanup, but it should give you an idea of how I'm implementing it.

PS: While the commits I've broken this into are mostly distinct, their usage seems too cohesive to be split into separate issues. That said, I'm fairly OK either way with respect to multiple issues/sub-tasks.

> Active deletion of 'deleted' Lucene index files from DataStore without
> relying on full scale Blob GC
> ----------------------------------------------------------------------------------------------------
>
>                 Key: OAK-2808
>                 URL: https://issues.apache.org/jira/browse/OAK-2808
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene
>            Reporter: Chetan Mehrotra
>            Assignee: Thomas Mueller
>              Labels: datastore, performance
>             Fix For: 1.8
>
>         Attachments: copyonread-stats.png, OAK-2808-1.patch
>
>
> With Lucene index files now stored in the DataStore, our usage pattern
> of the DataStore has changed between JR2 and Oak.
> With JR2 the writes were mostly application based, i.e. if an application
> stores a pdf/image file then that would be stored in the DataStore. JR2 by
> default would not write anything to the DataStore. Further, in deployments
> where a large amount of binary content is present, systems tend to
> share the DataStore to avoid duplication of storage. In such cases
> running Blob GC is a non-trivial task as it involves a manual step and
> coordination across multiple deployments. Because of this, systems tend to
> run GC less frequently.
> Now with Oak, apart from the application, the Oak system itself *actively*
> uses the DataStore to store the index files for Lucene, and there the
> churn might be much higher, i.e. the frequency of creation and deletion of
> index files is a lot higher.
> This would accelerate the rate of garbage
> generation and thus put a lot more pressure on the DataStore storage
> requirements.
> Discussion thread http://markmail.org/thread/iybd3eq2bh372zrl

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)