[ 
https://issues.apache.org/jira/browse/OAK-2808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16005906#comment-16005906
 ] 

Chetan Mehrotra edited comment on OAK-2808 at 5/11/17 5:34 AM:
---------------------------------------------------------------

bq. store blob ids as Strings instead of as blob ids

That's the issue with storing the blobIds in the repository. The proposed approach 
based on a local file should work fine.
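
To illustrate, a minimal sketch of what such local-file tracking could look like. This is only a sketch under my assumptions; the class and method names (DeletedIndexBlobTracker, deleteFromDataStore) are placeholders, not actual Oak APIs:

{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical sketch only - names are placeholders, not actual Oak APIs.
public class DeletedIndexBlobTracker {
    private final Path blobIdFile;

    public DeletedIndexBlobTracker(Path blobIdFile) {
        this.blobIdFile = blobIdFile;
    }

    // Called when a Lucene index file node is deleted; records the blob id
    // in a local file instead of in the repository.
    public void track(String blobId) throws IOException {
        Files.write(blobIdFile, (blobId + "\n").getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Called later to actively delete the tracked blobs from the DataStore
    // without waiting for a full blob GC run.
    public void purge() throws IOException {
        List<String> ids = Files.readAllLines(blobIdFile, StandardCharsets.UTF_8);
        for (String id : ids) {
            deleteFromDataStore(id);
        }
        Files.deleteIfExists(blobIdFile);
    }

    private void deleteFromDataStore(String blobId) {
        // placeholder for the actual blob store deletion call
    }
}
{code}

The key point is that the blob ids never enter the repository content, so revision handling and datastore GC semantics are untouched.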

{quote}
changes to the datastore GC are needed to collect some blobs
 changed to the datastore GC are needed to retain other blobs
{quote}

No change to the current datastore GC is needed. You only need the last GC 
details if you want to support rolling back to revisions that are older than 
the last valid checkpoint. Note that such a rollback is not supported per the API. 

bq.  can not support rollback to an old revision

You can roll back to the last checkpointed revision. We never supported the notion 
that rollback to revisions prior to the oldest checkpoint state can work properly. 
Such a rollback could only possibly work for the segment store and would not work for 
DocumentNodeStore, so I am not sure we would want to account for that.

bq. "only persist the index (to the repository) every x minutes"

The problem there is the complexity introduced in the index logic to ensure that the 
index remains consistent with the repository state in case of an unclean shutdown, the 
leader node going away in a cluster setup, etc. Let's take the following timeline of events:

# T1 - Last valid checkpoint = cp-t1. Indexes in the repo are up to date till this time
# T2 - Next indexing cycle happens
#* checkpoint cp-t1 is released
#* last valid checkpoint is cp-t2
#* the local index on cluster node N1 is up to date till cp-t2
#* the index stored in the repo is up to date till cp-t1
# T3 - Cluster node N1 dies and N2 becomes leader
#* it does indexing from [cp-t2, cp-t3] and
#* also updates the indexes present in the repo to cp-t3

In the above sequence the indexes would not have the data from [cp-t1, cp-t2], as 
that data was stored locally on N1, which has now gone away.
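
As a compact sketch of that ordering problem (again, all names here are placeholders and not actual Oak APIs), the unsafe step is releasing cp-t1 before the index data for cp-t2 has reached the repository:

{code:java}
// Hypothetical sketch of the timeline above - names are placeholders,
// not actual Oak APIs.
public class PeriodicPersistCycle {

    void indexingCycleOnLeaderN1() {
        String cpT1 = "cp-t1";                 // last valid checkpoint; repo index is at cp-t1
        String cpT2 = createCheckpoint();      // new checkpoint cp-t2

        indexLocally(cpT1, cpT2);              // local index on N1 is now at cp-t2
        releaseCheckpoint(cpT1);               // cp-t1 is gone; repo index still at cp-t1

        // The repo copy is only uploaded every x minutes. If N1 dies before the
        // next upload, the new leader sees cp-t2 as the last valid checkpoint but
        // the repo index only contains data till cp-t1, so the changes in
        // [cp-t1, cp-t2] are never indexed.
    }

    private String createCheckpoint() { return "cp-t2"; }
    private void indexLocally(String from, String to) { /* diff and index */ }
    private void releaseCheckpoint(String cp) { /* release */ }
}
{code}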

The current approach ensures that a checkpoint is only released, and a new one stored, 
once all index data is safely stored in the repository, so even with the loss of 
Oak nodes in the cluster the index data remains safe. To handle that we would 
need to:

# Ensure that the leader remains stable in the topology, i.e. does not change frequently, so that the locally stored index data does get added to the repo
# Change the indexing logic to track 2 checkpoints and reindex from the older checkpoint for the case where the leader node died (a rough sketch follows this list)
# Change the current index implementations to account for this 2-phase index update
# Note that NRT indexes do not index binary content and rely on the persistent index to answer queries on binary content, so index results would remain stale with respect to binary content search depending on the frequency of such index data uploads
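
A minimal sketch of the two-checkpoint idea from point 2 above, assuming a hypothetical indexer that records both the last repository-synced checkpoint and the last locally indexed checkpoint; none of these names are actual Oak APIs:

{code:java}
// Hypothetical sketch of tracking two checkpoints so that a new leader can
// recover - names are placeholders, not actual Oak APIs.
public class TwoCheckpointIndexer {
    private String repoCheckpoint;   // last checkpoint whose index data is in the repository
    private String localCheckpoint;  // last checkpoint indexed into the local copy

    void indexingCycle(String newCheckpoint) {
        indexLocally(localCheckpoint, newCheckpoint);
        localCheckpoint = newCheckpoint;

        if (timeToPersist()) {
            uploadLocalIndexToRepo();
            // Only once the index data is safely in the repository is it safe
            // to move the repo checkpoint forward and release the older one.
            releaseCheckpoint(repoCheckpoint);
            repoCheckpoint = localCheckpoint;
        }
    }

    void onLeaderChange() {
        // A new leader has no local index data, so it must fall back to the
        // last repo-synced checkpoint and reindex from there.
        localCheckpoint = repoCheckpoint;
        downloadIndexFromRepo();
    }

    private void indexLocally(String from, String to) { /* diff and index */ }
    private boolean timeToPersist() { return false; }
    private void uploadLocalIndexToRepo() { }
    private void releaseCheckpoint(String cp) { }
    private void downloadIndexFromRepo() { }
}
{code}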

We can try doing that, but we need to be careful that we never leave the indexes 
in an inconsistent state and that we cover all failure scenarios. The current indexing is 
transactional/atomic and has been made reliable over a period of time with respect to 
checkpoint handling. 

Also, even in that case active deletion would help: even with a reduced persistence 
frequency the garbage would still be there, and full blob GC cannot be run very 
frequently. So active deletion would complement such an approach. 




> Active deletion of 'deleted' Lucene index files from DataStore without 
> relying on full scale Blob GC
> ----------------------------------------------------------------------------------------------------
>
>                 Key: OAK-2808
>                 URL: https://issues.apache.org/jira/browse/OAK-2808
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene
>            Reporter: Chetan Mehrotra
>            Assignee: Thomas Mueller
>              Labels: datastore, performance
>             Fix For: 1.8
>
>         Attachments: copyonread-stats.png, OAK-2808-1.patch
>
>
> With the storing of Lucene index files within the DataStore, our usage pattern
> of the DataStore has changed between JR2 and Oak.
> With JR2 the writes were mostly application driven, i.e. if the application
> stores a pdf/image file then that would be stored in the DataStore. JR2 by
> default would not write its own data to the DataStore. Further, in deployments
> where a large amount of binary content is present, systems tend to
> share the DataStore to avoid duplication of storage. In such cases
> running Blob GC is a non-trivial task as it involves a manual step and
> coordination across multiple deployments. Due to this, systems tend to
> reduce the frequency of GC.
> Now with Oak, apart from the application, the Oak system itself *actively*
> uses the DataStore to store the index files for Lucene, and there the
> churn might be much higher, i.e. the frequency of creation and deletion of
> index files is a lot higher. This accelerates the rate of garbage
> generation and thus puts a lot more pressure on the DataStore storage
> requirements.
> Discussion thread http://markmail.org/thread/iybd3eq2bh372zrl


