[ 
https://issues.apache.org/jira/browse/OAK-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14986971#comment-14986971
 ] 

Tomek Rękawek edited comment on OAK-3559 at 12/2/15 1:27 PM:
-------------------------------------------------------------

h4. New bulk update method

The patch adds a new {{createOrUpdate(Collection<T> collection, List<UpdateOp> 
updateOps)}} method to the {{DocumentStore}} interface. The MongoDB 
implementation uses the Bulk API. The RDB and Memory document stores have been 
extended with a naive implementation that iterates over {{updateOps}}. The 
Mongo implementation works as follows:

1. For each {{UpdateOp}}, try to read the corresponding document from the 
cache. Add the documents found to {{oldDocs}}.
2. Prepare a list of all {{UpdateOps}} whose documents were not found in the 
cache and read those documents in one {{find()}} call. Add the results to 
{{oldDocs}}.
3. Prepare a bulk update. For each remaining {{UpdateOp}}, add the following 
operation:
    * Find the document with the same id and the same {{mod_count}} as in 
{{oldDocs}}.
    * Apply the changes from the {{UpdateOp}}.

4. Execute the bulk update.

If another process modifies the target documents between steps 2 and 3, their 
{{mod_count}} will have been increased as well, and the bulk update will fail 
for the concurrently modified documents. The method then removes the failed 
documents from {{oldDocs}} and restarts the process from step 2. It stops 
after the third iteration.
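
The steps above can be sketched with an in-memory stand-in for the Mongo collection. This is only an illustration of the optimistic {{mod_count}} check and the bounded retry, not the code from the patch: the {{BulkUpdateSketch}} and {{Doc}} classes, the string-valued updates and the two maps are all assumptions made for the example, and the "bulk" step is simulated with a loop.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; none of these names come from the patch.
class BulkUpdateSketch {

    /** Stand-in for a stored document: a value plus a modification counter. */
    static final class Doc {
        final String value;
        final long modCount;

        Doc(String value, long modCount) {
            this.value = value;
            this.modCount = modCount;
        }
    }

    static final int MAX_ITERATIONS = 3;

    /** Simulated backend (stands in for the Mongo collection). */
    final Map<String, Doc> store = new HashMap<>();
    /** Simulated document cache; entries may be stale. */
    final Map<String, Doc> cache = new HashMap<>();

    /**
     * Applies id -> newValue updates and returns the ids that still could
     * not be updated after MAX_ITERATIONS attempts.
     */
    List<String> bulkUpdate(Map<String, String> updates) {
        Map<String, Doc> oldDocs = new HashMap<>();
        // Step 1: take whatever the cache already has.
        for (String id : updates.keySet()) {
            Doc cached = cache.get(id);
            if (cached != null) {
                oldDocs.put(id, cached);
            }
        }
        List<String> pending = new ArrayList<>(updates.keySet());
        for (int i = 0; i < MAX_ITERATIONS && !pending.isEmpty(); i++) {
            // Step 2: read the missing documents in one pass (stand-in for find()).
            for (String id : pending) {
                if (!oldDocs.containsKey(id) && store.containsKey(id)) {
                    oldDocs.put(id, store.get(id));
                }
            }
            // Steps 3 and 4: update only where the stored modCount still matches.
            List<String> failed = new ArrayList<>();
            for (String id : pending) {
                Doc expected = oldDocs.get(id);
                Doc current = store.get(id);
                boolean matches = (expected == null)
                        ? current == null
                        : current != null && current.modCount == expected.modCount;
                if (matches) {
                    long next = (current == null) ? 1 : current.modCount + 1;
                    Doc updated = new Doc(updates.get(id), next);
                    store.put(id, updated);
                    cache.put(id, updated);
                } else {
                    // Concurrently modified (or stale cache entry): forget it and re-read.
                    failed.add(id);
                    oldDocs.remove(id);
                }
            }
            pending = failed;
        }
        return pending;
    }
}
```

In this sketch a stale cache entry plays the role of the concurrent modification: the first pass fails for that document, it is dropped from {{oldDocs}}, and the next pass re-reads it from the store before retrying.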

h4. Changes in the Commit class

The new method is used in {{Commit#applyToDocumentStore}}. If it fails (e.g. 
there have been more than 3 unsuccessful retries in the Mongo implementation), 
the commit falls back to the classic approach, applying one update after 
another.
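
A minimal sketch of this fallback pattern follows. It is not the actual {{Commit}} code: the {{BulkWithFallbackSketch}} class and its {{apply}} method are hypothetical, and the sketch assumes the bulk step either applies everything or reports failure without applying anything. A real implementation would have to track which operations the bulk step already applied.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical illustration; none of these names come from the patch.
class BulkWithFallbackSketch {

    /**
     * Tries the bulk path first. If it reports failure, applies every
     * operation individually, in order. Returns true if the bulk path was used.
     * Assumption: a failed bulk attempt has applied nothing.
     */
    static <T> boolean apply(List<T> ops, Predicate<List<T>> bulk, Consumer<T> single) {
        if (bulk.test(ops)) {
            return true;
        }
        for (T op : ops) {
            single.accept(op);
        }
        return false;
    }
}
```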

h4. Changes in the CommitQueue and ConflictException

Introducing bulk updates means that we may have conflicts in many revisions at 
the same time. For this reason the {{ConflictException}} now contains a list of 
revisions rather than a single revision. To resolve conflicts in the 
{{DocumentNodeStoreBranch#merge0}} method, {{CommitQueue#suspendUntil()}} has 
been extended as well: it now accepts a list of revisions and suspends 
execution until all of them are visible.
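
The idea of waiting for a whole set of revisions can be sketched with a plain Java monitor. This is an assumption-laden illustration rather than the {{CommitQueue}} implementation: the {{SuspendUntilSketch}} class, the use of {{long}} values as revisions and the timeout parameter are all made up for the example.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration; none of these names come from the patch.
class SuspendUntilSketch {

    private final Set<Long> visible = new HashSet<>();

    /** Marks a revision as visible and wakes up all waiting threads. */
    synchronized void markVisible(long revision) {
        visible.add(revision);
        notifyAll();
    }

    /**
     * Blocks until every given revision is visible or the timeout elapses.
     * Returns true only if all revisions became visible in time.
     */
    synchronized boolean suspendUntilAll(Collection<Long> revisions, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!visible.containsAll(revisions)) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false;
            }
            try {
                // remaining is at least 1 ms here, so this never waits forever.
                wait(remaining);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

The loop re-checks the full set after every wake-up, so the caller resumes only once the last of its conflicting revisions has become visible.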


was (Author: tomek.rekawek):
The pull request has been created here:
https://github.com/apache/jackrabbit-oak/pull/43

The patch can be downloaded from:
https://patch-diff.githubusercontent.com/raw/apache/jackrabbit-oak/pull/43.diff


> Bulk document updates in MongoDocumentStore
> -------------------------------------------
>
>                 Key: OAK-3559
>                 URL: https://issues.apache.org/jira/browse/OAK-3559
>             Project: Jackrabbit Oak
>          Issue Type: Sub-task
>          Components: mongomk
>            Reporter: Tomek Rękawek
>             Fix For: 1.4
>
>         Attachments: OAK-3559.patch
>
>
> Using the MongoDB [Bulk 
> API|https://docs.mongodb.org/manual/reference/method/Bulk/#Bulk] implement 
> the [batch version of createOrUpdate method|OAK-3662].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
