[jira] [Commented] (OAK-3559) Bulk document updates in MongoDocumentStore

2016-01-28 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15120995#comment-15120995
 ] 

Marcel Reutegger commented on OAK-3559:
---

The patch looks good. I made some minor modifications:

* Changed update() to updateOne() in MongoDocumentStore.sendBulkUpdate()
* Added argument to PERFLOG.end() call in createOrUpdate()
* Added more tests

One of the new tests in MultiDocumentStoreTest revealed issues with the
RDBDocumentStore implementation. The test concurrentBatchUpdate() fails
either with:

{noformat}
java.lang.AssertionError: 
org.apache.jackrabbit.oak.plugins.document.DocumentStoreException: 
org.h2.jdbc.JdbcBatchUpdateException: Timeout trying to lock table ; SQL 
statement:
insert into dstest_NODES(ID, MODIFIED, HASBINARY, DELETEDONCE, MODCOUNT, 
CMODCOUNT, DSIZE, DATA, BDATA) values (?, ?, ?, ?, ?, ?, ?, ?, ?) [50200-185]
{noformat}

or with this one:

{noformat}
java.lang.AssertionError: 
org.apache.jackrabbit.oak.plugins.document.DocumentStoreException: 
org.h2.jdbc.JdbcBatchUpdateException: Unique index or primary key violation: 
"PRIMARY_KEY_1 ON PUBLIC.DSTEST_NODES(ID) VALUES ('1:/node-40', 118)"; SQL 
statement:
insert into dstest_NODES(ID, MODIFIED, HASBINARY, DELETEDONCE, MODCOUNT, 
CMODCOUNT, DSIZE, DATA, BDATA) values (?, ?, ?, ?, ?, ?, ?, ?, ?) [23505-185]
{noformat}

For now I only enabled the test for the MongoMK fixture. I will create
an issue for the second type of failures with RDB. The first type
of failures with the timeout is already tracked with OAK-3924. 


> Bulk document updates in MongoDocumentStore
> ---
>
> Key: OAK-3559
> URL: https://issues.apache.org/jira/browse/OAK-3559
> Project: Jackrabbit Oak
>  Issue Type: Sub-task
>  Components: mongomk
>Reporter: Tomek Rękawek
>Assignee: Marcel Reutegger
>Priority: Blocker
> Fix For: 1.4, 1.3.15
>
> Attachments: OAK-3559.patch
>
>
> Using the MongoDB [Bulk 
> API|https://docs.mongodb.org/manual/reference/method/Bulk/#Bulk] implement 
> the [batch version of createOrUpdate method|OAK-3662].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-3559) Bulk document updates in MongoDocumentStore

2016-01-27 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15118971#comment-15118971
 ] 

Marcel Reutegger commented on OAK-3559:
---

The patch also contains a fix unrelated to this issue. With your patch 
{{MongoDocumentStore.getIfCached()}} checks if the document is 
{{NodeDocument.NULL}}. I created a separate issue to fix it: OAK-3932.

> Bulk document updates in MongoDocumentStore
> ---
>
> Key: OAK-3559
> URL: https://issues.apache.org/jira/browse/OAK-3559
> Project: Jackrabbit Oak
>  Issue Type: Sub-task
>  Components: mongomk
>Reporter: Tomek Rękawek
>Assignee: Marcel Reutegger
> Fix For: 1.4
>
> Attachments: OAK-3559.patch
>
>
> Using the MongoDB [Bulk 
> API|https://docs.mongodb.org/manual/reference/method/Bulk/#Bulk] implement 
> the [batch version of createOrUpdate method|OAK-3662].





[jira] [Commented] (OAK-3559) Bulk document updates in MongoDocumentStore

2016-01-26 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15117109#comment-15117109
 ] 

Tomek Rękawek commented on OAK-3559:


The concurrent unit tests have been moved to OAK-3924 (as they were breaking 
the RDB implementation).



[jira] [Commented] (OAK-3559) Bulk document updates in MongoDocumentStore

2016-01-19 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106860#comment-15106860
 ] 

Tomek Rękawek commented on OAK-3559:


[~mreutegg], do you have any other suggestions?



[jira] [Commented] (OAK-3559) Bulk document updates in MongoDocumentStore

2016-01-04 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15081202#comment-15081202
 ] 

Tomek Rękawek commented on OAK-3559:


I updated the patch with the following changes:

1. Lock keys are ordered before being acquired - the implementation was moved 
to the NodeDocumentLocks.
2. The multi-lock acquire was moved to the bulkUpdate(), so the lock scope is 
smaller (only the currently processed batch).
3. Added an extra condition (modCount does not exist) for new documents to 
avoid the race condition described in the previous comment.
4. Created two concurrent test cases in BulkCreateOrUpdateTest.
5. Extracted the size()>2 check to a separate {{if}} and added a comment.
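
To illustrate point 1, here is a minimal sketch of ordered lock acquisition. It is not the code from the patch: the class {{OrderedKeyLocks}} and its methods are hypothetical names, and plain {{ReentrantLock}} instances stand in for whatever NodeDocumentLocks uses internally. The idea is that if every caller locks its keys in the same sorted order, two concurrent callers cannot each hold a lock the other one is waiting for.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical helper for illustration; not the actual NodeDocumentLocks class. */
class OrderedKeyLocks {
    // One lock per document key, created lazily.
    private final Map<String, Lock> locks = new ConcurrentHashMap<>();

    /** Returns the keys in the order in which they will be locked. */
    static List<String> lockOrder(Collection<String> keys) {
        // TreeSet sorts the keys and drops duplicates.
        return new ArrayList<>(new TreeSet<>(keys));
    }

    /** Acquires one lock per key in sorted order and returns the locks held. */
    List<Lock> acquireAll(Collection<String> keys) {
        List<Lock> held = new ArrayList<>();
        for (String key : lockOrder(keys)) {
            Lock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
            lock.lock();
            held.add(lock);
        }
        return held;
    }

    /** Releases the locks in reverse acquisition order. */
    static void releaseAll(List<Lock> held) {
        for (int i = held.size() - 1; i >= 0; i--) {
            held.get(i).unlock();
        }
    }
}
```

A caller would wrap the batch in {{acquireAll()}} and release in a {{finally}} block, which keeps the lock scope limited to the batch currently being processed.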



[jira] [Commented] (OAK-3559) Bulk document updates in MongoDocumentStore

2016-01-04 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15080939#comment-15080939
 ] 

Tomek Rękawek commented on OAK-3559:


1. Deadlock in the createOrUpdate()
True. Maybe ordering the locks would help here?

2. size() > 2
If we have 2 or fewer updates, we can apply them sequentially, as 
bulkUpdate() will send two Mongo requests anyway (find+update).

3. Race in bulkUpdate()
Maybe we can fix it by adding the following condition to the BulkWriteOperation 
in sendBulkUpdate():
{code}
if (oldDoc == null) {
    query.not().exists(Document.MOD_COUNT);
} else {
    query.and(Document.MOD_COUNT).is(oldDoc.getModCount());
}
{code}
So we won't update the document if it already has some MOD_COUNT set (and 
therefore exists).
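
The effect of that condition can be sketched without MongoDB. The class below is purely illustrative: {{ModCountGuard}} and {{conditionalUpsert()}} are made-up names, and a {{ConcurrentHashMap}} stands in for the collection. A missing expected modCount means "create only if the document is still absent"; otherwise the update is applied only if the stored modCount is unchanged.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in for the conditional bulk operation. */
class ModCountGuard {
    // id -> current modCount; an absent key means the document does not exist.
    private final Map<String, Long> modCounts = new ConcurrentHashMap<>();

    /**
     * Applies an update only if the stored modCount equals expectedModCount.
     * A null expectation means "the document must not exist yet".
     * Returns true if the update was applied.
     */
    boolean conditionalUpsert(String id, Long expectedModCount) {
        if (expectedModCount == null) {
            // Create only if nobody created the document in the meantime.
            return modCounts.putIfAbsent(id, 1L) == null;
        }
        // Update only if the modCount is still the one we read earlier.
        return modCounts.replace(id, expectedModCount, expectedModCount + 1);
    }
}
```

With this guard, a concurrent create between the read and the write shows up as a failed operation instead of being silently overwritten.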



[jira] [Commented] (OAK-3559) Bulk document updates in MongoDocumentStore

2016-01-04 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15080915#comment-15080915
 ] 

Marcel Reutegger commented on OAK-3559:
---

I think the current patch is prone to deadlocks. Currently MongoDocumentStore 
only ever acquires one lock at a time and releases it e.g. after a call to 
MongoDB is made. With the patch, a sequence of locks is acquired. Depending on 
the sequence of UpdateOps two concurrent calls to the new createOrUpdate() 
method may deadlock.

In the same method there is also this for loop:

{noformat}
for (int i = 0; i < 3 && operationsToCover.size() > 2; i++) 
{noformat}

Why does it check for {{operationsToCover.size() > 2}}?

I also think there is a race condition in bulkUpdate(). findDocuments() will 
return null for documents that do not exist and sendBulkUpdate() will 
unconditionally upsert such documents. I think a concurrent create between 
findDocuments() and sendBulkUpdate() will go undetected.



[jira] [Commented] (OAK-3559) Bulk document updates

2015-11-09 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996258#comment-14996258
 ] 

Tomek Rękawek commented on OAK-3559:


{quote}
>The original test didn't work on the delayed network, so I modified it to 
>create 1000 nodes, rather than 10 000.

Can you please explain why it didn't work?{quote}

Well, it would work, but it would take a long time. It takes about 45 seconds 
for the sequential code to create 1000 nodes on the delayed network, so a 
single iteration with 10 000 nodes would take about 8 minutes. I wanted to 
have at least a few iterations during the 5-minute test.

{quote}
>It seems that the network latency is the deciding factor for the sequential 
>approach

Hmm, that's strange. In my tests for OAK-3554 I was able to reproduce the 
calculated average journal flush wait time of 16ms with the default MongoDB 
journalCommitInterval. I would have expected to see these 16ms added to the 
20ms latency.{quote}
That's indeed strange. I compared the sequential CreateManyChildNodesTest on 
non-journaled and journaled mongo (without latency in both cases):
{noformat}
### latency: 0ms, sequential (SNAPSHOT) ###
                   C     min     10%     50%     90%     max       N
no journal         1     395     406     450     530    1130     299
journal            1     813     813     876    1046    1046       4
{noformat}
So, according to the timing results the journaled version takes only about 2x 
longer, but on the other hand it managed just 4 iterations (rather than 299). 
I'll look into the benchmark code. It isn't related to the bulk update, though.

> Bulk document updates
> -
>
> Key: OAK-3559
> URL: https://issues.apache.org/jira/browse/OAK-3559
> Project: Jackrabbit Oak
>  Issue Type: Sub-task
>  Components: core, documentmk, mongomk
>Reporter: Tomek Rękawek
> Fix For: 1.4
>
> Attachments: OAK-3559.patch
>
>
> The {{DocumentStore#createOrUpdate(Collection, UpdateOp)}} method is invoked 
> in a loop in the {{Commit#applyToDocumentStore()}}, once for each changed 
> node. Investigate if it's possible to implement a batch version of the 
> createOrUpdate method, using the MongoDB [Bulk 
> API|https://docs.mongodb.org/manual/reference/method/Bulk/#Bulk]. It should 
> return all documents before they are modified, so the Commit class can 
> discover conflicts (if there are any).





[jira] [Commented] (OAK-3559) Bulk document updates

2015-11-09 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996200#comment-14996200
 ] 

Marcel Reutegger commented on OAK-3559:
---

bq. The original test didn't work on the delayed network, so I modified it to 
create 1000 nodes, rather than 10 000.

Can you please explain why it didn't work?

bq. It seems that the network latency is the deciding factor for the sequential 
approach

Hmm, that's strange. In my tests for OAK-3554 I was able to reproduce the 
calculated average journal flush wait time of 16ms with the default MongoDB 
journalCommitInterval. I would have expected to see these 16ms added to the 
20ms latency.



[jira] [Commented] (OAK-3559) Bulk document updates

2015-11-06 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993474#comment-14993474
 ] 

Tomek Rękawek commented on OAK-3559:


I set the journal to true and got the following results:

{noformat}
### latency: 20ms, bulk size: 400, journal: true ###

                              C     min     10%     50%     90%     max       N
bulk (OAK-3559)               1     731     770     805     880    1195     215
sequential (SNAPSHOT)         1   43933   43933   45287   47194   47194       3
{noformat}
So, the performance gain in bulk vs sequential is smaller after enabling 
journal (a modest 55x with journal rather than 70x with ack).

It seems that the network latency is the deciding factor for the sequential 
approach, either with or without journal. In case of the bulk approach, 
probably writing a big, batched update into journal takes some time.



[jira] [Commented] (OAK-3559) Bulk document updates

2015-11-05 Thread Chetan Mehrotra (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993222#comment-14993222
 ] 

Chetan Mehrotra commented on OAK-3559:
--

[~tomek.rekawek] Can you also get numbers with an increased write concern (see 
OAK-3554)? Hopefully the benefits of the bulk update would be much higher.



[jira] [Commented] (OAK-3559) Bulk document updates

2015-11-05 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14991689#comment-14991689
 ] 

Tomek Rękawek commented on OAK-3559:


I moved the fallback to the DS.

I also ran some benchmark tests with the sequential approach (current 
SNAPSHOT) and the bulk API (this branch). In the second series I used the 
[Network Link 
Conditioner|https://developer.apple.com/library/ios/documentation/NetworkingInternetWeb/Conceptual/NetworkingOverview/WhyNetworkingIsHard/WhyNetworkingIsHard.html]
 to simulate high latency between different regions (300 ms).

{noformat}
Warmup:   10s
Runtime: 300s

# CreateManyChildNodesTest         C     min     10%     50%     90%     max       N
# Latency   0ms
Sequential (SNAPSHOT)              1    3393    3440    3530    3906    7078      30
Bulk (OAK-3559)                    1    2893    2970    3046    3346    6649      39

# Latency 300ms
Sequential (SNAPSHOT)              1    3496    3527    3642    4026    6152      31
Bulk (OAK-3559)                    1    3027    3058    3117    3291    6467      39
{noformat}

So, the bulk approach is about 14% faster in this test.



[jira] [Commented] (OAK-3559) Bulk document updates

2015-11-04 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14989631#comment-14989631
 ] 

Marcel Reutegger commented on OAK-3559:
---

In my view it would make the DS contract simpler and also simplify the Commit 
class, which is already complex. So, my preference is indeed a DS which does 
its best to create or update the documents as requested. I think it should only 
fail with an exception when something goes wrong communicating with MongoDB.

> Bulk document updates
> -
>
> Key: OAK-3559
> URL: https://issues.apache.org/jira/browse/OAK-3559
> Project: Jackrabbit Oak
>  Issue Type: Sub-task
>  Components: core, documentmk, mongomk
>Reporter: Tomek Rękawek
> Fix For: 1.4
>
>
> The {{DocumentStore#createOrUpdate(Collection, UpdateOp)}} method is invoked 
> in a loop in the {{Commit#applyToDocumentStore()}}, once for each changed 
> node. Investigate if it's possible to implement a batch version of the 
> createOrUpdate method, using the MongoDB [Bulk 
> API|https://docs.mongodb.org/manual/reference/method/Bulk/#Bulk]. It should 
> return all documents before they are modified, so the Commit class can 
> discover conflicts (if there are any).





[jira] [Commented] (OAK-3559) Bulk document updates

2015-11-04 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14989528#comment-14989528
 ] 

Tomek Rękawek commented on OAK-3559:


I will extract the CommitQueue and ConflictException changes into OAK-3586.

Regarding the fallback, having it in the Commit class allows us to check for 
conflicts after each single update. After moving the fallback to the 
DocumentStore, it'll try to apply all changes, so we lose the opportunity to 
break the loop if one of the changes causes a conflict. At least that was my 
initial idea; I won't insist on keeping it this way. Do you still think we 
should move the fallback to the DS?



[jira] [Commented] (OAK-3559) Bulk document updates

2015-11-04 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14989511#comment-14989511
 ] 

Marcel Reutegger commented on OAK-3559:
---

Thanks for the pull request.

bq. If some other process modifies the target documents between points 2 and 3, 
the mod_count will be increased as well and the bulk update will fail for the 
concurrently modified docs. The method will then remove the failed documents 
from the oldDocs and restart the process from point 2. It will stop after 3rd 
iteration.

Would it be possible to implement the fallback in the DocumentStore instead of 
the Commit class or is there a specific reason why you did it that way?

bq. Changes in the CommitQueue and ConflictException

This looks useful in general. Why don't we move this into a separate issue that 
can be resolved beforehand?



[jira] [Commented] (OAK-3559) Bulk document updates

2015-11-03 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14986971#comment-14986971
 ] 

Tomek Rękawek commented on OAK-3559:


The pull request has been created here:
https://github.com/apache/jackrabbit-oak/pull/43

The patch can be downloaded from:
https://patch-diff.githubusercontent.com/raw/apache/jackrabbit-oak/pull/43.diff

h4. New bulk update method

The patch adds a new {{createOrUpdate(Collection<T> collection, 
List<UpdateOp> updateOps)}} method to the {{DocumentStore}} interface. The 
MongoDB implementation uses the Bulk API. The RDB and Memory document stores 
have been extended with a naive implementation that iterates over 
{{updateOps}}. The Mongo implementation works as follows:

1. For each {{UpdateOp}}, try to read the corresponding document from the 
cache and add it to {{oldDocs}}.
2. Prepare a list of all {{UpdateOps}} whose documents were not found in the 
cache and read those documents in one {{find()}} call. Add the results to 
{{oldDocs}}.
3. Prepare a bulk update. For each remaining {{UpdateOp}} add the following 
operation:
* Find the document with the same id and the same {{mod_count}} as in 
{{oldDocs}}.
* Apply the changes from the {{UpdateOp}}.
4. Execute the bulk update.

If some other process modifies the target documents between points 2 and 3, the 
{{mod_count}} will be increased as well and the bulk update will fail for the 
concurrently modified docs. The method will then remove the failed documents 
from {{oldDocs}} and restart the process from point 2. It will stop after the 
3rd iteration.
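
The read, conditional write and retry cycle can be sketched in memory as follows. This is an illustration under stated assumptions, not the code from the patch: {{BulkRetrySketch}} is a made-up class, a {{ConcurrentHashMap}} of id to modCount stands in for MongoDB, the cache lookup from point 1 is left out, and documents that were not found are created only if they are still absent (an extra guard that this description does not mention).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory model of the bulk update retry loop. */
class BulkRetrySketch {
    static final int MAX_ATTEMPTS = 3;

    // id -> modCount; stands in for the MongoDB collection.
    final Map<String, Long> store = new ConcurrentHashMap<>();

    /** Point 2: read the current modCount of every id (absent ids are skipped). */
    Map<String, Long> findDocuments(List<String> ids) {
        Map<String, Long> oldDocs = new HashMap<>();
        for (String id : ids) {
            Long modCount = store.get(id);
            if (modCount != null) {
                oldDocs.put(id, modCount);
            }
        }
        return oldDocs;
    }

    /** Points 3 and 4: apply each update only if the modCount is unchanged. */
    List<String> sendBulkUpdate(List<String> ids, Map<String, Long> oldDocs) {
        List<String> failed = new ArrayList<>();
        for (String id : ids) {
            Long expected = oldDocs.get(id);
            boolean applied = expected == null
                    ? store.putIfAbsent(id, 1L) == null
                    : store.replace(id, expected, expected + 1);
            if (!applied) {
                failed.add(id);
            }
        }
        return failed;
    }

    /** Retries failed ids with fresh reads; returns ids still unapplied at the end. */
    List<String> createOrUpdate(List<String> ids) {
        List<String> remaining = new ArrayList<>(ids);
        for (int i = 0; i < MAX_ATTEMPTS && !remaining.isEmpty(); i++) {
            remaining = sendBulkUpdate(remaining, findDocuments(remaining));
        }
        return remaining;
    }
}
```

Whatever {{createOrUpdate()}} returns is the set of operations that could not be applied within three attempts; the caller has to handle those some other way.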

h4. Changes in the Commit class

The new method is used in {{Commit#applyToDocumentStore}}. If it fails (e.g. 
there have been more than 3 unsuccessful retries in the Mongo implementation), 
there is a fallback to the classic approach of applying one update after 
another.
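
The fallback can be pictured roughly like this. The helper below is hypothetical and generic: {{BulkWithFallback}} is not a class from the patch, the bulk and single-update calls are passed in as functions, and a {{RuntimeException}} is assumed to signal that the bulk call gave up. It also assumes that re-applying an operation one by one is safe after a failed bulk attempt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper showing "try bulk first, then one by one". */
class BulkWithFallback {
    /**
     * Tries the bulk function first. If it throws, applies the operations
     * sequentially with the single-update function instead.
     */
    static <U, D> List<D> apply(List<U> ops,
                                Function<List<U>, List<D>> bulk,
                                Function<U, D> single) {
        try {
            return bulk.apply(ops);
        } catch (RuntimeException e) {
            // Classic approach: one update after another.
            List<D> results = new ArrayList<>();
            for (U op : ops) {
                results.add(single.apply(op));
            }
            return results;
        }
    }
}
```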

h4. Changes in the CommitQueue and ConflictException

Introducing bulk updates means that we may have conflicts in many revisions at 
the same time. That's why the {{ConflictException}} now contains a list of 
revisions rather than a single revision. In order to resolve conflicts in the 
{{DocumentNodeStoreBranch#merge0}} method, {{CommitQueue#suspendUntil()}} has 
been extended as well. It now accepts a list of revisions and suspends 
execution until all of them are visible.
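
As a rough picture of "suspend until all of them are visible", here is a small monitor-based sketch. It does not reflect the real {{CommitQueue}} code: {{RevisionGate}}, {{markVisible()}} and {{suspendUntilAll()}} are hypothetical names, revisions are plain strings, and a timeout is added so the example cannot block forever.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical gate that blocks callers until a set of revisions is visible. */
class RevisionGate {
    private final Set<String> visible = new HashSet<>();

    /** Marks a revision as visible and wakes up all waiting threads. */
    synchronized void markVisible(String revision) {
        visible.add(revision);
        notifyAll();
    }

    /**
     * Blocks until every given revision is visible or the timeout elapses.
     * Returns true if all revisions became visible in time.
     */
    synchronized boolean suspendUntilAll(Collection<String> revisions, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!visible.containsAll(revisions)) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false;
            }
            try {
                wait(remaining);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```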

> Bulk document updates
> -
>
> Key: OAK-3559
> URL: https://issues.apache.org/jira/browse/OAK-3559
> Project: Jackrabbit Oak
>  Issue Type: Sub-task
>  Components: core, mongomk
>Reporter: Tomek Rękawek
> Fix For: 1.4
>
>
> The {{DocumentStore#createOrUpdate(Collection, UpdateOp)}} method is invoked 
> in a loop in the {{Commit#applyToDocumentStore()}}, once for each changed 
> node. Investigate if it's possible to implement a batch version of the 
> createOrUpdate method, using the MongoDB [Bulk 
> API|https://docs.mongodb.org/manual/reference/method/Bulk/#Bulk]. It should 
> return all documents before they are modified, so the Commit class can 
> discover conflicts (if there are any).


