Hi Christoph,
Christoph Kiehl wrote:
We've got very big indexes/workspaces on our production servers, with 3,000,000 to 8,000,000 nodes, and they are still growing because of version creation and the addition of new nodes.
When the VM in which Jackrabbit lives crashes during a write operation, Jackrabbit nicely applies the redo log on restart, which completes quite quickly, but then starts its consistency check. This check takes from 30 minutes to 2 hours depending on the repository. During this time our application is offline, which we would of course like to avoid ;) Our system uses an Oracle bundle persistence manager, which probably doesn't make things better.
I had a quick glance at the consistency check code and it seems there is nothing that could be substantially optimized there. I thought it might be possible to check only those index segments that were used while replaying the redo log, but given the way the consistency check works, this is not possible.
I think the only way to speed up startup is to avoid the errors the check looks for in the first place. Since the redo log mechanism seems quite good, I'm not sure if those errors (MissingAncestor, MultipleEntries, NodeDeleted, UnknownParent) can still occur. Could you maybe elaborate on the situations where you expect those errors to arise?
IIRC the consistency check was introduced in Jackrabbit first and the redo log mechanism later, which makes the consistency check somewhat superfluous.
For now I'm thinking about disabling consistency checks by default altogether and running them in a maintenance window at night (something like the configuration sketch below). Unfortunately this might be a bit dangerous if parts of the application rely on certain nodes being found by queries :/
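What I have in mind would look roughly like this in the SearchIndex section of workspace.xml. This is only a sketch; I'm assuming the enableConsistencyCheck and forceConsistencyCheck parameters of the Lucene-based SearchIndex, so the exact parameter names and defaults should be verified against the Jackrabbit version in use:

  <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
    <param name="path" value="${wsp.home}/index"/>
    <!-- do not run the index consistency check on startup -->
    <param name="enableConsistencyCheck" value="false"/>
    <param name="forceConsistencyCheck" value="false"/>
  </SearchIndex>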
WDYT?
I agree with you. We could introduce a third configuration value for forceConsistencyCheck (in addition to 'true' and 'false'): 'disabled'. That would then be the default in the next released version of Jackrabbit.
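Configuration-wise that would boil down to something like the following. This is purely a sketch of the proposal; the 'disabled' value does not exist in any released version yet:

  <param name="forceConsistencyCheck" value="disabled"/>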
WDYT?
regards
marcel