Marcel,
I'm replying to the list rather than Jira, since this is OT wrt JCR-169.

So, if you have, for example, 50 x 200MB of Lucene index segments and want them to be accessible in a clustered environment, would Jackrabbit be a good place to put those segments?

The big killer for Lucene is the ability to seek efficiently on the central blob (I think), but presumably choosing the right binary storage strategy gives you that partially for free?
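
To make the seek point concrete, here is a rough sketch of one possible storage strategy: split each segment file into fixed-size chunks, so that a seek becomes a lookup of the right chunk plus an offset within it. Everything here is an assumption for illustration. ChunkedBlob and its methods are hypothetical names, the chunks live in memory rather than in repository binary properties, and nothing below is Jackrabbit or Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: one large segment file stored as fixed-size chunks. */
final class ChunkedBlob {
    private final int chunkSize;
    private final long length;
    // In a repository each chunk would be a separate binary value (assumption);
    // here the chunks are simply held in memory.
    private final List<byte[]> chunks = new ArrayList<>();

    ChunkedBlob(byte[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
        this.length = data.length;
        for (int start = 0; start < data.length; start += chunkSize) {
            int end = Math.min(start + chunkSize, data.length);
            chunks.add(Arrays.copyOfRange(data, start, end));
        }
    }

    long length() {
        return length;
    }

    /** Index of the chunk that holds the byte at the given absolute position. */
    int chunkIndex(long position) {
        return (int) (position / chunkSize);
    }

    /**
     * Copies up to len bytes starting at the absolute position into dest.
     * Returns the number of bytes copied, or -1 if position is at or past the end.
     * Only the chunks overlapping the requested range are touched.
     */
    int read(long position, byte[] dest, int off, int len) {
        if (position < 0) {
            throw new IllegalArgumentException("position must not be negative");
        }
        if (position >= length) {
            return -1;
        }
        int copied = 0;
        while (copied < len && position < length) {
            byte[] chunk = chunks.get(chunkIndex(position));
            int within = (int) (position % chunkSize);
            int n = Math.min(len - copied, chunk.length - within);
            System.arraycopy(chunk, within, dest, off + copied, n);
            copied += n;
            position += n;
        }
        return copied;
    }
}
```

With a layout like this, a read at an arbitrary offset only has to fetch the chunks it overlaps instead of streaming the whole 200MB blob. The right chunk size would be a tuning question.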

If this is the case, I could replace my slightly odd segment distribution mechanism with Jackrabbit.


Last question:
Is JCR-169 being actively worked on?
Is there an area where another pair of hands would help? I would like to be able to deploy Jackrabbit in a cluster.

Ian


Marcel Reutegger (JIRA) wrote:
[ http://issues.apache.org/jira/browse/JCR-169?page=comments#action_12432083 ]

Marcel Reutegger commented on JCR-169:
--------------------------------------

Ian, thanks a lot for your comments.

Here are my current thoughts on clustering the search index in Jackrabbit:

I think the preferred approach is to put the index into the repository itself.
See: http://article.gmane.org/gmane.comp.apache.jackrabbit.devel/8530 and
following messages.
This would also allow us to distribute index updates to cluster nodes using the
repository internal observation mechanism, e.g. the update of a deleted
documents file or new index segments.

I found the best indexing strategy was to have local copies of segments, stored 
centrally as masters.

I agree. Specifically, the design of Lucene, where index files are only created
but never modified, supports this approach very nicely.
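
As a rough illustration of why write-once files make this easy, a cluster node could bring its local copy up to date by comparing file names alone. This is only a sketch under the assumption that a file name is never reused for different contents; any file that is rewritten in place would need a version or checksum check as well. SegmentSync and plan are hypothetical names, not part of Jackrabbit or Lucene.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: what a node must do to match the master segment set. */
final class SegmentSync {
    /** Files present on the master but missing locally. */
    final Set<String> toCopy;
    /** Files present locally that the master no longer has. */
    final Set<String> toDelete;

    private SegmentSync(Set<String> toCopy, Set<String> toDelete) {
        this.toCopy = Collections.unmodifiableSet(toCopy);
        this.toDelete = Collections.unmodifiableSet(toDelete);
    }

    /**
     * Compares file names only. This is sufficient as long as files are
     * created once and never modified afterwards (the assumption stated above).
     */
    static SegmentSync plan(Set<String> masterFiles, Set<String> localFiles) {
        Set<String> toCopy = new TreeSet<>(masterFiles);
        toCopy.removeAll(localFiles);

        Set<String> toDelete = new TreeSet<>(localFiles);
        toDelete.removeAll(masterFiles);

        return new SegmentSync(toCopy, toDelete);
    }
}
```

A node would call plan with the master's file listing and its own, fetch everything in toCopy, and drop everything in toDelete once no open reader still uses those files.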

In the search application, speed of update of segments is not that critical;
you probably have a different requirement in JCR.

JCR is more restrictive in that respect, at least if we want to be compliant 
with the specification. As soon as a node is created in the workspace it must 
be searchable using a query. For most real life systems this is not a hard 
requirement though. E.g. when a document is added to a repository, it usually 
doesn't matter if it is retrievable by query only after a couple of seconds and 
not right away.


Make Jackrabbit clusterable
---------------------------

                Key: JCR-169
                URL: http://issues.apache.org/jira/browse/JCR-169
            Project: Jackrabbit
         Issue Type: New Feature
         Components: core
           Reporter: Marcel Reutegger
           Priority: Minor

This jira issue discusses the technical implications on the current design of 
Jackrabbit to introduce clustering.
Particularly the following areas require thorough investigation:
- SharedItemStateManager and its cache
    - cache integrity
    - cache design: look aside, write through?
    - hook for distributed cache, interface?
    - isolation level
    - transaction integrity within Jackrabbit, interaction with transient layer
- VirtualItemStateProvider
    - same strategy as SharedItemStateManager?
- Search index
    - single or per cluster node index?
- Observation
Please state more areas if needed.

