[ 
https://issues.apache.org/jira/browse/OAK-2626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483081#comment-14483081
 ] 

Julian Sedding commented on OAK-2626:
-------------------------------------

The time taken for an upgrade can be split into the time taken to *copy the 
data* and the time taken to *execute the commit hooks*.

This optimization relies on comparing blobs by reference. It is designed to 
avoid file-system access where possible, i.e. calculating a reference is 
reduced to a string operation.

For the initial step of *copying the data*, the attached patch is sufficient, 
as the anonymous {{AbstractBlob}} subclass in {{JackrabbitNodeState}} 
implements {{equals}} with a reference comparison before falling back to 
{{AbstractBlob#equals()}}.
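The reference-first {{equals}} described above can be illustrated with a minimal Java sketch. {{SketchBlob}} and {{CountingBlob}} are hypothetical classes; their {{getReference()}} and {{getNewStream()}} methods are modeled loosely on Oak's {{Blob}} interface, but this is not Oak's actual {{AbstractBlob}} or {{JackrabbitNodeState}} code. The sketch assumes that equal references imply equal content (reasonable when both sides share the same FileDataStore), while a missing or different reference proves nothing, so it falls back to comparing the streams.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical, simplified stand-in for a blob (NOT the real Oak AbstractBlob). */
abstract class SketchBlob {
    /** Reference computed as a pure string operation, or null if unavailable. */
    abstract String getReference();

    /** Opens the binary content: the expensive path the optimization avoids. */
    abstract InputStream getNewStream();

    /** Byte-by-byte comparison, standing in for the slow AbstractBlob#equals() path. */
    static boolean contentEquals(SketchBlob a, SketchBlob b) {
        try (InputStream x = a.getNewStream(); InputStream y = b.getNewStream()) {
            int cx;
            int cy;
            do {
                cx = x.read();
                cy = y.read();
                if (cx != cy) {
                    return false; // differing byte or differing length
                }
            } while (cx != -1);
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof SketchBlob)) {
            return false;
        }
        SketchBlob that = (SketchBlob) other;
        String ref = getReference();
        // Fast path: identical references are assumed to mean identical content.
        if (ref != null && ref.equals(that.getReference())) {
            return true;
        }
        // Slow path: references missing or different, so compare the content.
        return contentEquals(this, that);
    }

    @Override
    public int hashCode() {
        // Constant hash is consistent with content-based equals (legal, but weak).
        return 0;
    }
}

/** Hypothetical in-memory blob that counts how often its stream is opened. */
final class CountingBlob extends SketchBlob {
    private final String reference;
    private final byte[] data;
    int streamsOpened = 0;

    CountingBlob(String reference, byte[] data) {
        this.reference = reference;
        this.data = data.clone();
    }

    @Override
    String getReference() {
        return reference;
    }

    @Override
    InputStream getNewStream() {
        streamsOpened++;
        return new ByteArrayInputStream(data);
    }
}
```

Two blobs with the same reference compare equal without opening a single stream; only when the references do not match is the content read.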

In order to benefit from the optimization during the *execution of commit 
hooks*, the patch from OAK-2627, which adds a reference comparison to 
{{AbstractBlob}} itself, needs to be applied as well. This is because when the 
commit hooks are executed, the compared {{NodeState}}s are all of the same type 
(e.g. SegmentNodeState or DocumentNodeState), so the blob implementation from 
{{JackrabbitNodeState}} is not involved.

To activate the optimization, {{ReferenceOptimizedBlobStore}} needs to be used 
(as a drop-in replacement) instead of {{DataStoreBlobStore}}.

!incremental-upgrade-no-changes.png!

The graph shows four scenarios run with TarMK + FDS (500k nodes copied from an 
AEM instance; ~2/3 are digital assets, ~1/3 are websites). In each scenario, 
the source repository is copied a second time without any changes.

# copy: No optimizations. Essentially, the entire repository is copied again 
(34 sec) and then compared for the commit-hooks. No NodeStates are shared, so a 
full repository traversal is done for the comparison (63 sec).
# copy + binary-optimization: Optimized blob comparison by reference. Again, 
the entire repository is copied (34 sec) and then compared for the 
commit-hooks. A full repository traversal is done for the comparison, but blob 
comparison is optimized (14 sec).
# recursive-copy: Content is copied recursively (43 sec). All properties are 
compared during copy and set only if changed (see OAK-2619). Since there are no 
changes, no time is required to execute commit-hooks (0 sec).
# recursive-copy + blob-optimization: As above, but the recursive copy benefits 
from optimized binary comparison (15 sec). Again, no changes were made, hence 
commit hooks require no time (0 sec).
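The recursive copy used in scenarios 3 and 4 (see OAK-2619) can be sketched as follows. {{SketchNode}} and {{RecursiveCopy}} are hypothetical in-memory stand-ins for Oak's node states and builders, not the code from the patch; the sketch only shows the idea of writing a property, or creating a child node, when the target differs from the source, and counting those writes. Removing content that was deleted from the source is deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical in-memory node (NOT Oak's NodeState/NodeBuilder). */
final class SketchNode {
    final Map<String, Object> properties = new LinkedHashMap<>();
    final Map<String, SketchNode> children = new LinkedHashMap<>();
}

final class RecursiveCopy {
    /** Copies source into target, writing only values that differ; returns the number of writes. */
    static int copy(SketchNode source, SketchNode target) {
        int changes = 0;
        for (Map.Entry<String, Object> p : source.properties.entrySet()) {
            Object existing = target.properties.get(p.getKey());
            // Objects.equals delegates to the value's equals(), e.g. a blob's equals().
            if (!Objects.equals(existing, p.getValue())) {
                target.properties.put(p.getKey(), p.getValue());
                changes++;
            }
        }
        for (Map.Entry<String, SketchNode> c : source.children.entrySet()) {
            SketchNode child = target.children.get(c.getKey());
            if (child == null) {
                // Child missing in the target: create it (counts as a write).
                child = new SketchNode();
                target.children.put(c.getKey(), child);
                changes++;
            }
            changes += copy(c.getValue(), child);
        }
        return changes;
    }
}
```

On a second run over unchanged content every comparison succeeds, so nothing is written and the commit hooks have no changes to process. Since {{Objects.equals}} delegates to the value's {{equals}}, blob-valued properties benefit directly from a reference comparison in {{equals}}.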

For reference, the first run for all four scenarios is very uniform: 33-36 sec 
for copy and 19 sec for the commit hooks (comparing against EmptyNodeState is 
fast), i.e. a total of 52-57 sec.
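Adding up the copy and commit-hook times from the list gives an end-to-end figure for each second run, which is easier to set against the 52-57 sec first run. The following Java snippet does only that arithmetic on the numbers reported above; {{UpgradeTimings}} is a hypothetical name, and no new measurements are involved.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class UpgradeTimings {
    /** Second-run timings in seconds as reported above: {copy, commit hooks}. */
    static Map<String, int[]> secondRun() {
        Map<String, int[]> runs = new LinkedHashMap<>();
        runs.put("copy", new int[] {34, 63});
        runs.put("copy + binary-optimization", new int[] {34, 14});
        runs.put("recursive-copy", new int[] {43, 0});
        runs.put("recursive-copy + blob-optimization", new int[] {15, 0});
        return runs;
    }

    /** End-to-end time: copy plus commit hooks. */
    static int total(int[] timing) {
        return timing[0] + timing[1];
    }

    public static void main(String[] args) {
        Map<String, int[]> runs = secondRun();
        int baseline = total(runs.get("copy"));
        for (Map.Entry<String, int[]> e : runs.entrySet()) {
            int t = total(e.getValue());
            // Speed-up relative to the unoptimized "copy" scenario.
            System.out.printf("%-36s %3d sec (%.1fx vs. copy)%n",
                    e.getKey(), t, (double) baseline / t);
        }
    }
}
```

The totals come to 97, 48, 43 and 15 sec respectively: without any optimization a repeated upgrade takes longer than the first run, while recursive copy combined with blob optimization brings it down to 15 sec.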

> Optimize binary comparison for merge during upgrade 
> ----------------------------------------------------
>
>                 Key: OAK-2626
>                 URL: https://issues.apache.org/jira/browse/OAK-2626
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: upgrade
>    Affects Versions: 1.1.7
>            Reporter: Julian Sedding
>            Priority: Minor
>         Attachments: OAK-2626.patch, incremental-upgrade-no-changes.png
>
>
> In OAK-2619 I propose to support repeated upgrades into the same NodeStore.
> This issue does not optimize the first run, but any subsequent run benefits 
> from the proposed changes.
> One use-case for this feature is to import all content several days before 
> the upgrade and then copy only the delta on the day of the upgrade.
> Assuming that both the source and target repositories use the same 
> FileDataStore, binaries could be efficiently compared by their references.


