[jira] [Commented] (OAK-3637) Bulk document updates in RDBDocumentStore

2016-01-14 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15097885#comment-15097885
 ] 

Tomek Rękawek commented on OAK-3637:


Agreed on the complexity. However, I think the performance gain is worth it.

I updated the patch. BTW, it seems that the MemoryDocumentStore doesn't update 
_modCount at all. I fixed it in the same patch, but it could be extracted to a 
separate issue as well.

> Bulk document updates in RDBDocumentStore
> -
>
> Key: OAK-3637
> URL: https://issues.apache.org/jira/browse/OAK-3637
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: rdbmk
>Affects Versions: 1.3.13, 1.2.10
>Reporter: Tomek Rękawek
>Assignee: Julian Reschke
> Fix For: 1.3.14, 1.2.11
>
> Attachments: OAK-3637-same-document-bug.patch, OAK-3637.comments.txt, 
> OAK-3637.patch, OAK-3637.patch
>
>
> Implement the [batch createOrUpdate|OAK-3662] in the RDBDocumentStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (OAK-3637) Bulk document updates in RDBDocumentStore

2016-01-14 Thread Julian Reschke (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15097911#comment-15097911
 ] 

Julian Reschke commented on OAK-3637:
-

We don't need a separate fixture. We can just require that if _modCount is 
present, it behaves the way it's supposed to...
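A minimal sketch of such a conditional check, assuming the test reads a document's `_modCount` before and after an update (null when the store does not maintain it) and that "behaves the way it's supposed to" means the value increases. The class and method names are hypothetical, not Oak's actual test code.

```java
/**
 * Hypothetical test helper: _modCount is optional, so it is only verified
 * when the store actually maintains it. No separate fixture is needed.
 */
public final class OptionalModCountCheck {

    /**
     * @param before _modCount read before the update, or null if absent
     * @param after  _modCount read after the update, or null if absent
     * @return true if the values are acceptable under the optional-_modCount rule
     */
    static boolean modCountOk(Long before, Long after) {
        if (before == null && after == null) {
            return true; // store does not maintain _modCount: nothing to verify
        }
        if (before == null || after == null) {
            return false; // present in one read but not the other: inconsistent
        }
        // assumption: a maintained _modCount must increase on every update
        return after > before;
    }

    public static void main(String[] args) {
        System.out.println(modCountOk(null, null)); // true: field not maintained
        System.out.println(modCountOk(1L, 2L));     // true: incremented
        System.out.println(modCountOk(2L, 2L));     // false: not incremented
    }
}
```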



[jira] [Commented] (OAK-3637) Bulk document updates in RDBDocumentStore

2016-01-14 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15097934#comment-15097934
 ] 

Tomek Rękawek commented on OAK-3637:


(y)



[jira] [Commented] (OAK-3637) Bulk document updates in RDBDocumentStore

2016-01-14 Thread Marcel Reutegger (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15097897#comment-15097897
 ] 

Marcel Reutegger commented on OAK-3637:
---

The _modCount is optional and a DocumentStore implementation is not required to 
maintain it. The MongoDocumentStore and RDBDocumentStore implementations use it 
to invalidate their caches. The MemoryDocumentStore obviously does not have to 
do this.
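To illustrate the idea only: a cache can compare the `_modCount` it stored with the one currently in the backend, and drop its entry when they differ. This is a hypothetical sketch, not the actual MongoDocumentStore or RDBDocumentStore cache logic; it assumes the backend `_modCount` can be read cheaply without the document body.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of _modCount-based cache invalidation. */
public final class ModCountCache {

    /** Minimal cached document: id, _modCount and an opaque payload. */
    static final class Doc {
        final String id;
        final long modCount;
        final String data;

        Doc(String id, long modCount, String data) {
            this.id = id;
            this.modCount = modCount;
            this.data = data;
        }
    }

    private final Map<String, Doc> cache = new HashMap<>();

    void put(Doc doc) {
        cache.put(doc.id, doc);
    }

    Doc get(String id) {
        return cache.get(id);
    }

    /**
     * Compares the cached _modCount with the one currently in the backend.
     * Drops the entry if they differ (or nothing is cached) and returns true
     * when the caller has to re-read the document.
     */
    boolean invalidateIfChanged(String id, long backendModCount) {
        Doc cached = cache.get(id);
        if (cached != null && cached.modCount == backendModCount) {
            return false; // unchanged since it was cached: keep it
        }
        cache.remove(id);
        return true;
    }

    public static void main(String[] args) {
        ModCountCache c = new ModCountCache();
        c.put(new Doc("1:/a", 1, "v1"));
        System.out.println(c.invalidateIfChanged("1:/a", 1)); // false: still valid
        System.out.println(c.invalidateIfChanged("1:/a", 2)); // true: entry dropped
        System.out.println(c.get("1:/a"));                    // null
    }
}
```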



[jira] [Commented] (OAK-3637) Bulk document updates in RDBDocumentStore

2016-01-14 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15097904#comment-15097904
 ] 

Tomek Rękawek commented on OAK-3637:


[~mreutegg], thanks for the clarification. I updated the patch once more: the 
MemoryDocumentStore remains unchanged, and there's an extra fixture condition in 
the test case.



[jira] [Commented] (OAK-3637) Bulk document updates in RDBDocumentStore

2016-01-14 Thread Julian Reschke (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15097943#comment-15097943
 ] 

Julian Reschke commented on OAK-3637:
-

Thanks. Will take it from here.



[jira] [Commented] (OAK-3637) Bulk document updates in RDBDocumentStore

2016-01-14 Thread Julian Reschke (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098088#comment-15098088
 ] 

Julian Reschke commented on OAK-3637:
-

Current changes in trunk: http://svn.apache.org/r1724598 
http://svn.apache.org/r1723008

> Bulk document updates in RDBDocumentStore
> -
>
> Key: OAK-3637
> URL: https://issues.apache.org/jira/browse/OAK-3637
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: rdbmk
>Affects Versions: 1.3.13, 1.2.10
>Reporter: Tomek Rękawek
>Assignee: Julian Reschke
> Fix For: 1.3.14
>
> Attachments: OAK-3637-same-document-bug.patch, OAK-3637.comments.txt, 
> OAK-3637.patch, OAK-3637.patch
>
>
> Implement the [batch createOrUpdate|OAK-3662] in the RDBDocumentStore.





[jira] [Commented] (OAK-3637) Bulk document updates in RDBDocumentStore

2016-01-13 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15096281#comment-15096281
 ] 

Tomek Rękawek commented on OAK-3637:


[~reschke], I've noticed that the bulk createOrUpdate() method doesn't work 
correctly if there are multiple updates modifying the same document. I prepared 
an appropriate test and a fix in OAK-3637-same-document-bug.patch. Could you 
take a look at this?
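One way to avoid this class of bug, sketched under stated assumptions and not necessarily what the attached patch does: split the batch into rounds so that no round touches the same document twice, then execute the rounds in order. `Update` is a hypothetical stand-in for Oak's `UpdateOp`.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical: split a batch into rounds with unique document IDs per round. */
public final class SameDocumentBatches {

    /** Minimal stand-in for an update operation: target id plus a description. */
    static final class Update {
        final String id;
        final String change;

        Update(String id, String change) {
            this.id = id;
            this.change = change;
        }

        @Override
        public String toString() {
            return id + ":" + change;
        }
    }

    /**
     * The n-th update for a given id lands in round n, so updates to the same
     * document keep their relative order across rounds.
     */
    static List<List<Update>> rounds(List<Update> batch) {
        List<List<Update>> rounds = new ArrayList<>();
        List<Set<String>> idsPerRound = new ArrayList<>();
        for (Update u : batch) {
            int r = 0;
            // first round that does not touch this document yet
            while (r < rounds.size() && idsPerRound.get(r).contains(u.id)) {
                r++;
            }
            if (r == rounds.size()) {
                rounds.add(new ArrayList<>());
                idsPerRound.add(new HashSet<>());
            }
            rounds.get(r).add(u);
            idsPerRound.get(r).add(u.id);
        }
        return rounds;
    }

    public static void main(String[] args) {
        List<Update> batch = List.of(
                new Update("a", "set x"), new Update("b", "set y"), new Update("a", "set z"));
        System.out.println(rounds(batch)); // [[a:set x, b:set y], [a:set z]]
    }
}
```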



[jira] [Commented] (OAK-3637) Bulk document updates in RDBDocumentStore

2016-01-13 Thread Julian Reschke (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15096330#comment-15096330
 ] 

Julian Reschke commented on OAK-3637:
-

OK, will have a look.



[jira] [Commented] (OAK-3637) Bulk document updates in RDBDocumentStore

2015-12-18 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15063759#comment-15063759
 ] 

Tomek Rękawek commented on OAK-3637:


[~reschke], thanks for the detailed review!

{quote}
RDBDocumentStore#createOrUpdate()
JRE: the for loop is hard to understand (where do the constants and the 
conditions come from?): maybe this should be a while loop
{quote}
Refactored to while loop and added comments.

{quote}
RDBDocumentStoreJDBC#MAX_LIST_LENGTH
JRE: that sounds a bit like something that needs to be fixed separately anyway 
(it would hit us already if somebody set CHUNKSIZE to a large value, right?); 
potentially in RDBBlobStore as well (we should address this as a separate problem 
to be fixed in 1.2 and 1.0 right away)
{quote}
Created OAK-3807.
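The concern behind a list-length limit is that some databases cap the number of expressions in an IN list (Oracle, for instance, rejects more than 1000 with ORA-01795). A minimal sketch of splitting IDs into chunks that respect such a limit; the class, the methods and the NODES table name are illustrative assumptions, not the actual fix for OAK-3807.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: keep "ID in (...)" lists below a per-statement limit. */
public final class InListChunks {

    /** Splits ids into consecutive chunks of at most maxListLength elements. */
    static List<List<String>> chunks(List<String> ids, int maxListLength) {
        if (maxListLength < 1) {
            throw new IllegalArgumentException("maxListLength must be positive");
        }
        List<List<String>> result = new ArrayList<>();
        for (int from = 0; from < ids.size(); from += maxListLength) {
            int to = Math.min(from + maxListLength, ids.size());
            result.add(new ArrayList<>(ids.subList(from, to)));
        }
        return result;
    }

    /** Parameterized query for one chunk of n ids; the table name is a placeholder. */
    static String selectSql(String table, int n) {
        return "select ID from " + table + " where ID in ("
                + String.join(", ", Collections.nCopies(n, "?")) + ")";
    }

    public static void main(String[] args) {
        List<String> ids = List.of("0:/", "1:/a", "1:/b", "1:/c", "1:/d");
        // one statement per chunk, each with at most 2 placeholders
        for (List<String> chunk : chunks(ids, 2)) {
            System.out.println(selectSql("NODES", chunk.size()) + " " + chunk);
        }
    }
}
```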

{quote}
RDBDocumentStoreJDBC#update
JRE: are we sure that the sequence of statements implemented here will actually 
do the right thing when done concurrently with the transaction isolation that 
we have?
{quote}
There are two stages: update and insert. In the first one we include the 
modcount as a condition, so as long as single updates are done atomically, we 
won't modify anything we don't expect.
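A minimal in-memory model of that first stage, assuming a statement of the form `update NODES set ... where ID = ? and MODCOUNT = ?`: a row is only changed if its modcount still equals the value the caller saw, so a concurrent change turns the conditional update into a no-op instead of being overwritten. `ConcurrentHashMap.computeIfPresent`, which runs atomically per key, stands in for the atomic single-row update; table and column names are assumptions.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical model of a modcount-conditional single-row update. */
public final class ConditionalUpdate {

    /** Immutable row: modcount plus an opaque payload. */
    static final class Row {
        final long modCount;
        final String data;

        Row(long modCount, String data) {
            this.modCount = modCount;
            this.data = data;
        }
    }

    /** Stands in for the database table, keyed by document ID. */
    final ConcurrentHashMap<String, Row> table = new ConcurrentHashMap<>();

    /**
     * Mirrors "update ... set DATA = ?, MODCOUNT = MODCOUNT + 1
     * where ID = ? and MODCOUNT = ?": returns true only if the row existed
     * and still had the expected modcount.
     */
    boolean update(String id, long expectedModCount, String newData) {
        boolean[] applied = {false};
        table.computeIfPresent(id, (key, row) -> {
            if (row.modCount != expectedModCount) {
                return row; // changed by someone else: leave the row untouched
            }
            applied[0] = true;
            return new Row(row.modCount + 1, newData);
        });
        return applied[0];
    }

    public static void main(String[] args) {
        ConditionalUpdate t = new ConditionalUpdate();
        t.table.put("1:/a", new Row(1, "old"));
        System.out.println(t.update("1:/a", 1, "new"));   // true: modcount matched
        System.out.println(t.update("1:/a", 1, "stale")); // false: modcount is now 2
    }
}
```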

The insert stage is more complex: (1) first we look for documents that have 
already been created (so we won't try to insert them) and then (2) we call the 
actual insert. There's a chance that a document is inserted concurrently between 
(1) and (2). In this case the operation will throw an exception and, according 
to the createOrUpdate javadoc, the caller has to find out which documents have 
been updated. It's the same race condition as in 
{{RDBDocumentStore#internalCreateOrUpdate}}.
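The race can be modelled deterministically in memory. In this hypothetical sketch a `HashMap` stands in for the table and an `IllegalStateException` for the duplicate-key `SQLException`; step (1) filters out existing IDs, step (2) inserts the rest, and the `betweenSteps` hook plays the role of another writer inserting the same document inside the window.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of the two-step insert stage and its race window. */
public final class InsertStage {

    final Map<String, String> table = new HashMap<>();

    /**
     * Step (1) drops ids that already exist; step (2) inserts the rest.
     * Returns the inserted ids, or throws if a duplicate appeared in between.
     */
    List<String> insertMissing(Map<String, String> remaining, Runnable betweenSteps) {
        List<String> toInsert = new ArrayList<>();
        for (String id : remaining.keySet()) {
            if (!table.containsKey(id)) { // (1) look for already created documents
                toInsert.add(id);
            }
        }
        betweenSteps.run();               // race window between (1) and (2)
        for (String id : toInsert) {      // (2) the actual insert
            if (table.putIfAbsent(id, remaining.get(id)) != null) {
                // models a duplicate-key error; the caller must then find out
                // which documents were actually written
                throw new IllegalStateException("duplicate key: " + id);
            }
        }
        return toInsert;
    }

    public static void main(String[] args) {
        InsertStage s = new InsertStage();
        s.table.put("1:/a", "existing");
        Map<String, String> remaining = new LinkedHashMap<>();
        remaining.put("1:/a", "x"); // already exists: skipped in step (1)
        remaining.put("1:/b", "y"); // new document
        System.out.println(s.insertMissing(remaining, () -> { })); // [1:/b]
        try {
            s.insertMissing(Map.of("1:/c", "z"), () -> s.table.put("1:/c", "concurrent"));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // duplicate key: 1:/c
        }
    }
}
```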

{quote}
RDBDocumentStoreJDBC#update
JRE: ok, so we drop the documents which have been modified from the update? why 
is this ok? what's the contract here?
{quote}

At this point we have already tried to update the documents. Now it's time for 
the 2nd stage: insert. We have a list of remaining documents. These might be 
documents which haven't been updated because of conflicts, or completely new 
documents which should be added. We're interested only in the latter, which is 
why we remove documents with existing IDs from {{remainingDocuments}}. Please 
note that these documents won't be added to the {{successfulUpdates}} list, 
which contains all updates that have been applied.

So, the contract is that we try to update and insert as many documents as we 
can. The successful updates are returned in a list and the failed ones are not; 
it's the caller's responsibility to take care of the failed ones. I've added a 
javadoc stating this.
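From the caller's side, that contract could be honoured roughly as follows. The sketch assumes each requested update is identified by a unique key and models the bulk call and a single-document fallback as plain functions; neither signature is Oak's actual `DocumentStore` API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical caller-side handling of "only successful updates are returned". */
public final class BulkCaller {

    /**
     * @param keys   unique keys of all requested updates, in order
     * @param bulk   bulk call returning the keys it successfully applied
     * @param single single-document fallback, returning true on success
     * @return keys that still failed after the fallback
     */
    static List<String> apply(List<String> keys,
                              Function<List<String>, List<String>> bulk,
                              Predicate<String> single) {
        Set<String> done = new HashSet<>(bulk.apply(keys));
        List<String> stillFailed = new ArrayList<>();
        for (String key : keys) {
            // anything not reported as applied is retried individually
            if (!done.contains(key) && !single.test(key)) {
                stillFailed.add(key);
            }
        }
        return stillFailed;
    }

    public static void main(String[] args) {
        List<String> failed = apply(
                List.of("1:/a", "1:/b", "1:/c"),
                keys -> List.of("1:/a"),      // bulk call applied only the first update
                key -> key.equals("1:/b"));   // fallback succeeds for 1:/b only
        System.out.println(failed); // [1:/c]
    }
}
```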

Regarding the other comments, I fixed all of them as suggested.

> Bulk document updates in RDBDocumentStore
> -
>
> Key: OAK-3637
> URL: https://issues.apache.org/jira/browse/OAK-3637
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: rdbmk
>Reporter: Tomek Rękawek
> Fix For: 1.4
>
> Attachments: OAK-3637.comments.txt, OAK-3637.patch
>
>
> Implement the [batch createOrUpdate|OAK-3662] in the RDBDocumentStore.





[jira] [Commented] (OAK-3637) Bulk document updates in RDBDocumentStore

2015-12-17 Thread Julian Reschke (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15062157#comment-15062157
 ] 

Julian Reschke commented on OAK-3637:
-

[~tomek.rekawek]: any chance you could update your patch to apply cleanly to 
trunk again?

> Bulk document updates in RDBDocumentStore
> -
>
> Key: OAK-3637
> URL: https://issues.apache.org/jira/browse/OAK-3637
> Project: Jackrabbit Oak
>  Issue Type: Technical task
>  Components: rdbmk
>Reporter: Tomek Rękawek
> Fix For: 1.4
>
> Attachments: OAK-3637.patch
>
>
> Implement the [batch createOrUpdate|OAK-3662] in the RDBDocumentStore.





[jira] [Commented] (OAK-3637) Bulk document updates in RDBDocumentStore

2015-12-03 Thread Julian Reschke (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037556#comment-15037556
 ] 

Julian Reschke commented on OAK-3637:
-

One more thing we should be careful with: creating too many different 
PreparedStatements, because this might cause problems with pooling them.

In general, we should try to limit the number of statements we need, or 
alternatively mark those that are unlikely to be reused as non-poolable.
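One common way to keep the number of distinct statements small (a general technique, not necessarily what the patch does) is to round the number of IN-list placeholders up to a few fixed sizes and pad the parameters by repeating the last ID, which leaves the matched rows unchanged. With power-of-two buckets, lists of 1 to 1024 IDs map to 11 distinct SQL strings. Names are hypothetical; note that padding must stay below any database limit on list length.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical: pad IN-list parameters to a few fixed sizes to limit distinct SQL strings. */
public final class BucketedInList {

    /** Rounds n up to the next power of two (1 <= n <= 2^30). */
    static int bucket(int n) {
        if (n < 1 || n > (1 << 30)) {
            throw new IllegalArgumentException("n out of range: " + n);
        }
        int b = Integer.highestOneBit(n);
        return b == n ? n : b << 1;
    }

    /**
     * Pads ids to the bucket size by repeating the last id:
     * "ID in (a, b, c, c)" matches the same rows as "ID in (a, b, c)".
     */
    static List<String> padded(List<String> ids) {
        if (ids.isEmpty()) {
            throw new IllegalArgumentException("ids must not be empty");
        }
        List<String> out = new ArrayList<>(ids);
        String last = ids.get(ids.size() - 1);
        int target = bucket(ids.size());
        while (out.size() < target) {
            out.add(last);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(padded(List.of("1:/a", "1:/b", "1:/c"))); // [1:/a, 1:/b, 1:/c, 1:/c]
    }
}
```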



[jira] [Commented] (OAK-3637) Bulk document updates in RDBDocumentStore

2015-12-03 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037566#comment-15037566
 ] 

Tomek Rękawek commented on OAK-3637:


Good point. In the new methods I marked statements with a variable number of 
parameters as non-poolable.
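In JDBC this is expressed with `Statement.setPoolable(false)`, which is only a hint to the driver or statement pool. A minimal sketch, assuming a query whose SQL text depends on the number of IDs; the table and column names are placeholders, and only `Connection.prepareStatement` and `setPoolable` are standard JDBC API.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;

/** Hypothetical helper: variable-arity statements are marked non-poolable. */
public final class NonPoolableStatements {

    /** SQL text varies with paramCount, so every distinct size is a distinct statement. */
    static String idQuery(String table, int paramCount) {
        if (paramCount < 1) {
            throw new IllegalArgumentException("paramCount must be positive");
        }
        return "select ID, MODCOUNT from " + table + " where ID in ("
                + String.join(", ", Collections.nCopies(paramCount, "?")) + ")";
    }

    /** Prepares the variable-arity query and asks the pool not to cache it. */
    static PreparedStatement prepare(Connection connection, String table, int paramCount)
            throws SQLException {
        PreparedStatement stmt = connection.prepareStatement(idQuery(table, paramCount));
        stmt.setPoolable(false); // a hint only: drivers and pools may ignore it
        return stmt;
    }

    public static void main(String[] args) {
        System.out.println(idQuery("NODES", 3)); // select ID, MODCOUNT from NODES where ID in (?, ?, ?)
    }
}
```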



[jira] [Commented] (OAK-3637) Bulk document updates in RDBDocumentStore

2015-12-02 Thread Julian Reschke (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15036018#comment-15036018
 ] 

Julian Reschke commented on OAK-3637:
-

Tried the patches without changing the test...:
{noformat}
# CreateManyChildNodesTest     C     min     10%     50%     90%     max       N
Oak-RDB                        1   19490   19490   19675   21248   21248       7
{noformat}
{noformat}
# CreateManyChildNodesTest     C     min     10%     50%     90%     max       N
Oak-RDB                        1    5593    5672    5793    5878    6223      30
{noformat}

So yes, a nice win!




[jira] [Commented] (OAK-3637) Bulk document updates in RDBDocumentStore

2015-12-02 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035958#comment-15035958
 ] 

Tomek Rękawek commented on OAK-3637:


Thanks for the comment. I removed the changes related to ResultSets outside the 
new bulk methods.



[jira] [Commented] (OAK-3637) Bulk document updates in RDBDocumentStore

2015-12-02 Thread Julian Reschke (JIRA)

[ 
https://issues.apache.org/jira/browse/OAK-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035941#comment-15035941
 ] 

Julian Reschke commented on OAK-3637:
-

That's a lot to review. It would be helpful if this patch only contained stuff 
that's actually related to the change. For instance, I see quite a few changes 
that move the ResultSet out of the try block so it can be closed in "finally". 
Unless I'm missing something, that's pointless, as closing the Statement object 
already closes its ResultSet.



[jira] [Commented] (OAK-3637) Bulk document updates in RDBDocumentStore

2015-11-26 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15028383#comment-15028383
 ] 

Tomek Rękawek commented on OAK-3637:


[~julian.resc...@gmx.de], could you review the patch? It can't be merged yet 
(it depends on OAK-3586 and OAK-3662), but I think we can polish it in the 
meantime.



[jira] [Commented] (OAK-3637) Bulk document updates in RDBDocumentStore

2015-11-23 Thread JIRA

[ 
https://issues.apache.org/jira/browse/OAK-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022202#comment-15022202
 ] 

Tomek Rękawek commented on OAK-3637:


I've run the {{CreateManyChildNodesTest}} benchmark using MySQL. The number 
of children created in one iteration is set to 100 (I wanted to have at least a 
few iterations during a 5-minute test). Results are as follows:

{noformat}
### latency: 0ms, bulk size: 100 ###

                        C     min     10%     50%     90%     max       N
bulk (OAK-3637)         1      87      92      98     118     450    1577
sequential (SNAPSHOT)   1     368     375     397     461     862     470

### latency: 20ms, bulk size: 100 ###

                        C     min     10%     50%     90%     max       N
bulk (OAK-3637)         1    7726    7754    7872    7947    7973      22
sequential (SNAPSHOT)   1   42639   42639   42686   42769   42769       5
{noformat}
