[ 
https://issues.apache.org/jira/browse/OAK-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15123537#comment-15123537
 ] 

Alex Parvulescu commented on OAK-2480:
--------------------------------------

Thanks for the in-depth analysis! I have rewritten the backup code to not use
checkpoints and instead diff the current backup state against the current
repository state. One of the reasons was that the original code ignored
checkpoints entirely, which would mean a full reindex after a restore, not too good.
This means (as already noted in the description) that the backup is not as
efficient as it could be: it needs a full traversal of the content to figure
out what changed, which is not ideal. This happens because the fast diff
mechanism is based on record ids and stops working as soon as content leaves
the current context and moves to a different store. Were this a content-hash
based storage, the diff would have been a lot more efficient (we had the same
issue with the cold standby sync).
As it stands now, the backup doesn't try to compete on speed with an OS-level
copy. Instead it tries to compress the content (think compaction to a
different location) and to do so incrementally, even though the gains of the
incremental approach are still to be determined. Also, all the compaction
optimization flags will apply to the backup.
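The record-id limitation can be made concrete with a toy model in Java. The classes `ToyNode` and `DiffSketch` are hypothetical and are not Oak's segment API. Each node carries a store-local record id and a content hash, which is computed once when the node is written. `diff` counts the nodes it has to visit.

Within one store, equal record ids let whole subtrees be skipped. Once the "before" tree lives in a different store, record ids never match and every node is visited. A content hash, as a content-addressed store would provide, still skips the unchanged subtrees. This is a sketch of the idea under those assumptions, not how Oak implements its diff.

```java
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Toy model only: hypothetical classes, NOT Oak's SegmentNodeState API.
final class ToyNode {
    final String storeId;  // store that wrote this record
    final int recordId;    // record id, only meaningful within storeId
    final String value;
    final Map<String, ToyNode> children;
    final int hash;        // content hash, computed once at write time from value + child hashes

    ToyNode(String storeId, int recordId, String value, Map<String, ToyNode> children) {
        this.storeId = storeId;
        this.recordId = recordId;
        this.value = value;
        this.children = new TreeMap<>(children); // sorted keys => deterministic hash
        int h = Objects.hashCode(value);
        for (Map.Entry<String, ToyNode> e : this.children.entrySet()) {
            h = 31 * h + e.getKey().hashCode();
            h = 31 * h + e.getValue().hash;
        }
        this.hash = h;
    }

    static ToyNode leaf(String storeId, int recordId, String value) {
        return new ToyNode(storeId, recordId, value, Map.of());
    }
}

public class DiffSketch {
    /** Number of nodes visited by a diff; skipped subtrees cost nothing. */
    static int diff(ToyNode before, ToyNode after, boolean useContentHash) {
        // Record-id fast path: only valid when both nodes come from the same store.
        if (before.storeId.equals(after.storeId) && before.recordId == after.recordId) {
            return 0;
        }
        // Fast path a content-addressed store would allow, even across stores.
        // (A real store would use a strong hash; an int hash can collide.)
        if (useContentHash && before.hash == after.hash) {
            return 0;
        }
        int visited = 1;
        for (Map.Entry<String, ToyNode> e : after.children.entrySet()) {
            ToyNode b = before.children.get(e.getKey());
            visited += (b == null) ? size(e.getValue()) : diff(b, e.getValue(), useContentHash);
        }
        return visited; // removed children are ignored for brevity
    }

    static int size(ToyNode n) {
        int s = 1;
        for (ToyNode c : n.children.values()) s += size(c);
        return s;
    }

    /** Returns {same-store record-id diff, cross-store record-id diff, cross-store hash diff}. */
    static int[] demo() {
        ToyNode a = ToyNode.leaf("repo", 2, "A");
        ToyNode b = ToyNode.leaf("repo", 3, "B");
        ToyNode repoV1 = new ToyNode("repo", 1, "r",
                Map.of("a", a, "b", b, "c", ToyNode.leaf("repo", 4, "C")));
        // The backup holds the same content, but every record was rewritten into another store.
        ToyNode backup = new ToyNode("backup", 1, "r", Map.of(
                "a", ToyNode.leaf("backup", 2, "A"),
                "b", ToyNode.leaf("backup", 3, "B"),
                "c", ToyNode.leaf("backup", 4, "C")));
        // New repository head: only "c" changed; "a" and "b" are shared records.
        ToyNode repoV2 = new ToyNode("repo", 5, "r",
                Map.of("a", a, "b", b, "c", ToyNode.leaf("repo", 6, "C2")));
        return new int[] {
            diff(repoV1, repoV2, false),  // root + "c"
            diff(backup, repoV2, false),  // every node: record ids never match across stores
            diff(backup, repoV2, true)    // root + "c": hashes match for "a" and "b"
        };
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("same store, record ids:  " + r[0]);
        System.out.println("cross store, record ids: " + r[1]);
        System.out.println("cross store, hashes:     " + r[2]);
    }
}
```

For this four-node tree the three diffs visit 2, 4 and 2 nodes respectively. Only the cross-store record-id diff walks the whole tree.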

Pushed the fix in http://svn.apache.org/viewvc?rev=1727595&view=rev; if it
looks good, I'll mark the issue as fixed soon.

> Incremental (FileStore)Backup copies the entire source instead of just the 
> delta
> --------------------------------------------------------------------------------
>
>                 Key: OAK-2480
>                 URL: https://issues.apache.org/jira/browse/OAK-2480
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: run
>    Affects Versions: 1.1.5
>            Reporter: Stefan Egli
>            Assignee: Alex Parvulescu
>             Fix For: 1.4
>
>         Attachments: IncrementalBackupTest.java, 
> oak-2480.incremental.partial.patch
>
>
> Running the FileStoreBackup (in oak-run) repeatedly should amount to an
> incremental backup. This implies the expectation that the incremental backup
> is very resource-friendly, i.e. that it only adds the delta that changed
> since the last backup. Instead, what can be seen at the moment is that it
> copies the entire source store again on each 'incremental' backup.
> Tested with the latest trunk snapshot.
> I suspect the problem is as follows: on the first backup, FileStoreBackup
> creates a checkpoint in the source store and records it as a property
> "checkpoint" on the backup root node, next to the actual backup, which is
> stored in '/root'.
> On subsequent incremental runs, the backup retrieves this "checkpoint"
> property from the backup and uses it as the base for the compactor's diff.
> Now the problem seems to be that Compactor.compact calls process(), which
> does a writer.writeNode(before), where before is the checkpoint in the origin
> store but writer is a writer of the backup store. SegmentWriter.writeNode()
> then fails to find the 'before' segment, and thus traverses the entire tree
> and copies it from the origin to the backup.
> So the problem seems to be that the code assumes it will find this
> 'checkpoint-before' in the backup, which is not the case.
> So a solution would be to diff not between the checkpoint and the current
> origin-head, but between the backup-head and the origin-head instead.
> Apparently this was not the intention, though, as it would mean reading
> through the entire backup to do the diff, which would be inefficient...
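The write side of the problem described in the issue can be modelled the same way. The sketch below is hypothetical: `Rec`, `BackupWriter` and `BackupSketch` are illustrative names, not Oak's `SegmentWriter` or `Compactor`. A writer bound to the backup store reuses records it already owns and copies everything else recursively.

Handing this writer a source-store checkpoint copies the whole tree again, which matches the behaviour reported. Rebuilding the source head on top of the backup head writes only the changed records. To find them, though, it has to compare content across both trees, which is the trade-off noted at the end of the description.

```java
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical toy records, NOT Oak's SegmentWriter/Compactor API.
final class Rec {
    final String storeId;
    final int recordId;
    final String value;
    final Map<String, Rec> children;

    Rec(String storeId, int recordId, String value, Map<String, Rec> children) {
        this.storeId = storeId;
        this.recordId = recordId;
        this.value = value;
        this.children = new TreeMap<>(children);
    }
}

// A writer bound to one store, loosely mirroring "a writer of the backup store".
final class BackupWriter {
    final String storeId;
    int nextId = 1;
    int written = 0; // records written so far

    BackupWriter(String storeId) { this.storeId = storeId; }

    // Records already in this store are reused; anything else is copied recursively.
    Rec writeNode(Rec node) {
        if (node.storeId.equals(storeId)) {
            return node;
        }
        Map<String, Rec> kids = new TreeMap<>();
        for (Map.Entry<String, Rec> e : node.children.entrySet()) {
            kids.put(e.getKey(), writeNode(e.getValue()));
        }
        written++;
        return new Rec(storeId, nextId++, node.value, kids);
    }

    // Rebuilds `after` on top of `base` (already in this store), writing only
    // subtrees whose content differs. Content must be compared by reading both trees.
    Rec compact(Rec base, Rec after) {
        if (base != null && sameContent(base, after)) {
            return base;
        }
        Map<String, Rec> kids = new TreeMap<>();
        for (Map.Entry<String, Rec> e : after.children.entrySet()) {
            Rec childBase = (base == null) ? null : base.children.get(e.getKey());
            kids.put(e.getKey(), compact(childBase, e.getValue()));
        }
        written++;
        return new Rec(storeId, nextId++, after.value, kids);
    }

    static boolean sameContent(Rec x, Rec y) {
        // The record-id shortcut only works when both records live in the same store.
        if (x.storeId.equals(y.storeId) && x.recordId == y.recordId) {
            return true;
        }
        if (!Objects.equals(x.value, y.value) || !x.children.keySet().equals(y.children.keySet())) {
            return false;
        }
        for (Map.Entry<String, Rec> e : x.children.entrySet()) {
            if (!sameContent(e.getValue(), y.children.get(e.getKey()))) {
                return false;
            }
        }
        return true;
    }
}

public class BackupSketch {
    /** Returns records written by {first backup, run via source checkpoint, run via backup head}. */
    static int[] demo() {
        Rec a = new Rec("source", 2, "A", Map.of());
        Rec b = new Rec("source", 3, "B", Map.of());
        Rec checkpoint = new Rec("source", 1, "r",
                Map.of("a", a, "b", b, "c", new Rec("source", 4, "C", Map.of())));
        Rec head = new Rec("source", 5, "r",
                Map.of("a", a, "b", b, "c", new Rec("source", 6, "C2", Map.of())));

        BackupWriter writer = new BackupWriter("backup");
        Rec backupHead = writer.writeNode(checkpoint);
        int first = writer.written;

        // As reported: the checkpoint is a source-store record, so the backup
        // writer cannot reuse it and copies the whole tree again.
        int mark = writer.written;
        writer.writeNode(checkpoint);
        int viaCheckpoint = writer.written - mark;

        // Proposed alternative: diff the backup head against the source head.
        mark = writer.written;
        writer.compact(backupHead, head);
        int viaBackupHead = writer.written - mark;

        return new int[] {first, viaCheckpoint, viaBackupHead};
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("first backup:    " + r[0] + " records");
        System.out.println("via checkpoint:  " + r[1] + " records");
        System.out.println("via backup head: " + r[2] + " records");
    }
}
```

For the three-leaf tree in `demo`, the first backup writes 4 records and the checkpoint-based run writes 4 again. The backup-head run writes 2: the root and the changed leaf.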



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
