Re: Checkpoints and copying NodeStore instances (aka RepositorySidegrade)

Alex Parvulescu Wed, 05 Aug 2015 10:58:57 -0700

Hi,

see inline


On Wed, Aug 5, 2015 at 5:45 PM, Julian Sedding <jsedd...@gmail.com> wrote:

> Hi Alex
>
> Thanks for your comments.
>
> On Wed, Aug 5, 2015 at 3:48 PM, Alex Parvulescu <alex.parvule...@gmail.com> wrote:
> > Hi,
> >
> > Just a few clarifications on the error you see
> >
> >> My interpretation is that the AsyncIndexUpdate is trying to retrieve
> >> the previous checkpoint as stored in /:async/async. Of course this
> >> checkpoint is not present in the copied NodeStore and thus cannot be
> >> retrieved.
> >
> > The error comes from DocumentMk trying to parse the reference checkpoint
> > value. Basically, what fails here is 'Revision.fromString' receiving a
> > checkpoint value it cannot parse, because the value comes from the
> > SegmentMk. The quick fix is to manually remove the properties on the
> > "/:async" hidden node. This will indeed trigger a full reindex, but will
> > help you get past this issue.
>
> Agreed. In this case parsing the revision is the first thing that
> fails. When copying a DocumentNodeStore (DNS) to a SegmentNodeStore (SNS),
> a similar situation would arise, because no snapshot with the provided ID
> exists.
>
>
[alex] Not really: the SegmentMk will not fail (no
IllegalArgumentException), but will simply log a warning that the checkpoint
doesn't exist and perform a full reindex. So in this regard it is a bit more
lenient :)
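The difference between the two behaviours can be illustrated with a minimal plain-Java sketch. It deliberately avoids the Oak API: the map standing in for the hidden `/:async` node, the class and method names, and the simplified revision pattern (`r<hex>-<hex>-<hex>`) are all assumptions for illustration. `retrieveStrict` mirrors what the stack trace shows (a UUID-style SegmentMk checkpoint rejected with an `IllegalArgumentException`), `retrieveLenient` mirrors the warn-and-reindex behaviour described above, and `clearAsyncState` is the quick fix of removing the properties from `/:async`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical model for illustration only; this is NOT the Oak API.
// A Map<String, String> stands in for the properties of the hidden "/:async" node.
final class CheckpointLookup {

    // Assumed, simplified DocumentMk revision format, e.g. "r14ef0b1c8b2-0-1".
    // The real Revision.fromString accepts more than this pattern.
    private static final Pattern REVISION =
            Pattern.compile("r[0-9a-f]+-[0-9a-f]+-[0-9a-f]+");

    // Strict: a reference that does not parse as a revision is rejected outright,
    // like the SegmentMk UUID 91f7e218-6cf5-4a44-a324-f094c29898e6 in the trace.
    static String retrieveStrict(String checkpoint) {
        if (!REVISION.matcher(checkpoint).matches()) {
            throw new IllegalArgumentException(checkpoint);
        }
        return checkpoint;
    }

    // Lenient: an unknown checkpoint only produces a warning and an empty result,
    // which the caller treats as "no previous state, run a full reindex".
    static Optional<String> retrieveLenient(String checkpoint,
                                            Map<String, String> knownCheckpoints) {
        String state = knownCheckpoints.get(checkpoint);
        if (state == null) {
            System.err.println("WARN: checkpoint " + checkpoint
                    + " not found, falling back to a full reindex");
        }
        return Optional.ofNullable(state);
    }

    // Quick fix: remove all properties of the "/:async" stand-in, so the next
    // async run finds no previous checkpoint and reindexes from scratch.
    static void clearAsyncState(Map<String, String> asyncProperties) {
        asyncProperties.clear();
    }
}
```

For example, `retrieveStrict("91f7e218-6cf5-4a44-a324-f094c29898e6")` throws, while `retrieveLenient` with the same ID and an empty checkpoint map returns `Optional.empty()` after printing the warning.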



> >
> >> IMHO it would be desirable to (optionally) copy the checkpoints as
> >> well. In the case of AsyncIndexUpdate, having the checkpoint can save
> >> a full re-index.
> >
> > This is very tricky, as the two representations of checkpoints in
> > SegmentMk and DocumentMk are quite different. I would strongly suggest
> > going for the reindex; after all, you'd only migrate once, so you can
> > prepare for this lengthy process.
>
> I'm experimenting with the following approach:
> * retrieve the first checkpoint and copy the NodeState tree at that
> revision (available via CheckpointMBean impls)
> * after copying the tree, merge and create a checkpoint (expiration
> time can be calculated)
> * rinse and repeat until the head revision is reached
>
> My aim is to shorten the critical path for migrating one NodeStore
> (incl. JR2) to another. Indexing (especially async indexing) takes up a
> big part of the time, so if I can move it out of the critical path,
> it can save a lot of downtime.
>
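Julian's checkpoint-by-checkpoint approach could be sketched as follows. This is a hypothetical in-memory model, not oak-upgrade code: trees are flat maps from `path/property` to value, the `target-cp-N` ID scheme is made up, and the source expiry time is simply carried over as one possible way of "calculating" it. What the sketch shows is that the copied `/:async/async` property still names a source checkpoint, so it has to be mapped to the matching target checkpoint, or dropped (which means a full reindex).

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the copy loop; NOT the Oak API.
// A "tree" is a flat map from "path/property" to value.
final class CheckpointCopy {

    record Checkpoint(String id, long expiresAt, Map<String, String> tree) {}

    static final String ASYNC_REF = "/:async/async";

    static final class TargetStore {
        final Map<String, String> head = new HashMap<>();
        final Map<String, Checkpoint> checkpoints = new LinkedHashMap<>();
        private int counter = 0;

        // Snapshot the current head under a new, target-specific (made-up) id.
        String checkpoint(long expiresAt) {
            String id = "target-cp-" + (++counter);
            checkpoints.put(id, new Checkpoint(id, expiresAt, new HashMap<>(head)));
            return id;
        }
    }

    // Copies each source checkpoint in order (oldest first), then the head,
    // and returns the mapping from source to target checkpoint ids.
    static Map<String, String> copy(List<Checkpoint> sourceCheckpoints,
                                    Map<String, String> sourceHead,
                                    TargetStore target) {
        Map<String, String> idMap = new HashMap<>();
        for (Checkpoint cp : sourceCheckpoints) {
            target.head.clear();
            target.head.putAll(cp.tree());   // copy the tree at that checkpoint
            idMap.put(cp.id(), target.checkpoint(cp.expiresAt()));
        }
        target.head.clear();
        target.head.putAll(sourceHead);      // finally reach the head revision
        // The copied reference still names a source checkpoint: map it to the
        // corresponding target checkpoint, or drop it (forcing a full reindex).
        String ref = target.head.get(ASYNC_REF);
        if (ref != null) {
            String mapped = idMap.get(ref);
            if (mapped != null) {
                target.head.put(ASYNC_REF, mapped);
            } else {
                target.head.remove(ASYNC_REF);
            }
        }
        return idMap;
    }
}
```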

[alex] Interesting approach. I would reduce this to only the 'current'
indexed checkpoint (the async reference). You'd migrate that over first
as the head state and create a checkpoint based on it (let's call it 'c0').
Then diff and apply the SegmentMk head state on top of this. Update the
async property to point to c0, and you might be good.
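The single-checkpoint variant with 'c0' might look like this under the same kind of hypothetical in-memory model (flat maps instead of NodeStores; class and method names are illustrative, not Oak API). The diff step removes entries missing from the source head and writes only entries that were added or changed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of the "c0" variant; NOT the Oak API.
final class SingleCheckpointMigration {

    static final String ASYNC_REF = "/:async/async";

    static final class Target {
        final Map<String, String> head = new HashMap<>();
        final Map<String, Map<String, String>> checkpoints = new HashMap<>();

        // Snapshot the head; the first checkpoint gets the id "c0".
        String checkpoint() {
            String id = "c" + checkpoints.size();
            checkpoints.put(id, new HashMap<>(head));
            return id;
        }
    }

    static String migrate(Map<String, String> indexedState, // source tree at the async checkpoint
                          Map<String, String> sourceHead,
                          Target target) {
        // 1. Migrate the currently indexed checkpoint state as the head.
        target.head.putAll(indexedState);
        // 2. Create a checkpoint on the target: this is c0.
        String c0 = target.checkpoint();
        // 3. Diff and apply the source head: remove entries the source head no
        //    longer has, write only entries that were added or changed.
        target.head.keySet().removeIf(k -> !sourceHead.containsKey(k));
        for (Map.Entry<String, String> e : sourceHead.entrySet()) {
            if (!e.getValue().equals(target.head.get(e.getKey()))) {
                target.head.put(e.getKey(), e.getValue());
            }
        }
        // 4. Point the async reference at c0 instead of the source checkpoint id,
        //    which does not exist on the target.
        target.head.put(ASYNC_REF, c0);
        return c0;
    }
}
```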



>
> My current approach for a migration from JR2 to MongoMK is to:
> * copy JR2 to TarMK (TarMK is a lot faster for creating indexes etc.
> than MongoMK)
> * repeat the JR2 to TarMK copy every week or every 24h using incremental
> copy. This saves on CommitHook execution time; in theory it can
> reduce the time for one run to a single full repository traversal.
> * finally, on the day the systems are switched over, run a
> last JR2 to TarMK copy and then a TarMK to MongoMK copy. This is the
> critical path.
>

[alex] Always going through the SegmentMk seems a bit convoluted. Why not
do the migration once, then apply the diffs on top of MongoMk directly
(AFAIK we have support for incremental updates now)? Are the 24h diffs so
big that going to MongoMk directly would be unusable/unacceptable? (I'd
like to see this backed by some numbers.)
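One possible starting point for such numbers is sizing the daily delta. The sketch below is hypothetical (not an oak-upgrade API; snapshots are again flat maps from `path/property` to value, with non-null values assumed): it counts added, changed and removed entries between two snapshots, and the total is only a rough proxy for how much an incremental copy would have to write, to be compared against measured copy times.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; NOT an oak-upgrade API. Compares two snapshots and
// counts what an incremental copy would have to write. Values are assumed non-null.
final class DeltaSize {

    record Counts(int added, int changed, int removed) {
        int total() {
            return added + changed + removed;
        }
    }

    static Counts diff(Map<String, String> before, Map<String, String> after) {
        int added = 0, changed = 0, removed = 0;
        for (Map.Entry<String, String> e : after.entrySet()) {
            String old = before.get(e.getKey());
            if (old == null) {
                added++;                       // new in "after"
            } else if (!old.equals(e.getValue())) {
                changed++;                     // present in both, different value
            }
        }
        for (String key : before.keySet()) {
            if (!after.containsKey(key)) {
                removed++;                     // gone in "after"
            }
        }
        return new Counts(added, changed, removed);
    }
}
```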


hope this helps,
alex




> Due to the above, copying at least the checkpoint of the async index
> will likely speed up the critical path. Of course measuring execution
> times will provide the definitive answer to this question.
>
> Regards
> Julian
>
> >
> > best,
> > alex
> >
> >
> > On Wed, Aug 5, 2015 at 3:35 PM, Julian Sedding <jsedd...@gmail.com> wrote:
> >
> >> Hi all
> >>
> >> I am working on a scenario, where I need to copy a SegmentNodeStore
> >> (TarMK) to a DocumentNodeStore (MongoDB).
> >>
> >> It is pretty straightforward to simply copy the NodeStore via the
> >> API. No problems here.
> >>
> >> In a recent experiment I successfully copied the NodeStore, but got an
> >> exception in the logs (stacktrace below the email).
> >>
> >> My interpretation is that the AsyncIndexUpdate is trying to retrieve
> >> the previous checkpoint as stored in /:async/async. Of course this
> >> checkpoint is not present in the copied NodeStore and thus cannot be
> >> retrieved.
> >>
> >> IMHO it would be desirable to (optionally) copy the checkpoints as
> >> well. In the case of AsyncIndexUpdate, having the checkpoint can save
> >> a full re-index.
> >>
> >> The question that remains is how the internal state of
> >> AsyncIndexUpdate should be modified:
> >> * implementing the logic in oak-upgrade would be pragmatic, but
> >> distributes knowledge about AsyncIndexUpdate implementation details to
> >> different modules
> >> * having a CommitHook/Editor in oak-core that can be used in
> >> oak-upgrade might be cleaner, but would only get used in oak-upgrade
> >>
> >> Other ideas and opinions regarding this feature are more than welcome!
> >>
> >> Regards
> >> Julian
> >>
> >>
> >> 05.08.2015 00:03:19.133 *ERROR* [pool-6-thread-2]
> >> org.apache.sling.commons.scheduler.impl.QuartzScheduler Exception
> >> during job execution of
> >> org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate@471e4b4b :
> >> 91f7e218-6cf5-4a44-a324-f094c29898e6
> >> java.lang.IllegalArgumentException: 91f7e218-6cf5-4a44-a324-f094c29898e6
> >>         at org.apache.jackrabbit.oak.plugins.document.Revision.fromString(Revision.java:236)
> >>         at org.apache.jackrabbit.oak.plugins.document.DocumentNodeStore.retrieve(DocumentNodeStore.java:1570)
> >>         at org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate.run(AsyncIndexUpdate.java:279)
> >>         at org.apache.sling.commons.scheduler.impl.QuartzJobExecutor.execute(QuartzJobExecutor.java:105)
> >>         at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
> >>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >>         at java.lang.Thread.run(Thread.java:745)
> >>
>
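For the second option in the original mail (a CommitHook/Editor in oak-core that oak-upgrade can use), a reusable hook that rewrites the async checkpoint reference might be shaped like this. The interface is a stand-in loosely modelled on Oak's `CommitHook`, which receives the before and after states of a commit; the names, the ID mapping and the flat-map state representation are assumptions. Keeping the rewrite in one hook would keep knowledge of the `/:async` layout next to AsyncIndexUpdate rather than spreading it into oak-upgrade.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a commit hook; NOT Oak's CommitHook/NodeState API.
interface SimpleCommitHook {
    Map<String, String> processCommit(Map<String, String> before, Map<String, String> after);
}

// Rewrites "/:async/async" from a source checkpoint id to the matching target id.
final class AsyncReferenceRewriteHook implements SimpleCommitHook {

    static final String ASYNC_REF = "/:async/async";

    private final Map<String, String> sourceToTarget;

    AsyncReferenceRewriteHook(Map<String, String> sourceToTarget) {
        this.sourceToTarget = Map.copyOf(sourceToTarget);
    }

    @Override
    public Map<String, String> processCommit(Map<String, String> before,
                                             Map<String, String> after) {
        String ref = after.get(ASYNC_REF);
        if (ref == null || ref.equals(before.get(ASYNC_REF))) {
            return after;  // reference absent or unchanged by this commit
        }
        Map<String, String> result = new HashMap<>(after);  // never mutate the input
        String mapped = sourceToTarget.get(ref);
        if (mapped != null) {
            result.put(ASYNC_REF, mapped);  // known checkpoint: keep incremental indexing
        } else {
            result.remove(ASYNC_REF);       // unknown checkpoint: fall back to a full reindex
        }
        return result;
    }
}
```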
