Re: [jr3] Search index in content

Bertrand Delacretaz Thu, 18 Feb 2010 01:49:15 -0800

Hi,

On Thu, Feb 18, 2010 at 8:39 AM, Thomas Müller <[email protected]> wrote:
> The property/value indexes are for property values, node names, paths,
> node references, and so on. ....
> ...Currently, we use Apache Lucene for this, but I wouldn't. I
> would keep those indexes within the repository.
>
> ...The fulltext index is (potentially) slow, specially fulltext
> extraction. Therefore, fulltext index should be done asynchronously if
> it takes too long....


I love this idea of separating the two kinds of indexes. Having the
fulltext "eventually indexed" might be good enough, provided there are
interfaces to find out about the status of the indexing queue.
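
To make "eventually indexed" a bit more concrete, here is a minimal
sketch of what such a queue with a status interface could look like.
All names (EventualFulltextIndex, submit, pendingCount, isUpToDate,
indexNext) are made up for illustration and are not Jackrabbit or
Lucene APIs; the "index" is just an in-memory term map.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical sketch: none of these names are Jackrabbit APIs.
class EventualFulltextIndex {
    // Items waiting for indexing, stored as {path, text} pairs.
    private final Queue<String[]> pending = new ArrayDeque<>();
    // Toy inverted index: lower-cased term -> paths containing it.
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Called at save time: only enqueues, so the save itself stays fast.
    synchronized void submit(String path, String text) {
        pending.add(new String[] {path, text});
    }

    // Status interface: how many items are still waiting to be indexed.
    synchronized int pendingCount() {
        return pending.size();
    }

    // Status interface: true once the queue has been drained.
    synchronized boolean isUpToDate() {
        return pending.isEmpty();
    }

    // Meant to be called from a background worker; indexes one queued item.
    // Returns false when there was nothing to do.
    synchronized boolean indexNext() {
        String[] item = pending.poll();
        if (item == null) {
            return false;
        }
        for (String term : item[1].toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new HashSet<>()).add(item[0]);
            }
        }
        return true;
    }

    // Returns a copy of the paths indexed so far for the given term.
    synchronized Set<String> search(String term) {
        Set<String> hits =
            postings.getOrDefault(term.toLowerCase(), Collections.emptySet());
        return new HashSet<>(hits);
    }
}
```

A client that needs fresh results could check isUpToDate() (or
pendingCount()) before trusting a fulltext query, while everything
else simply accepts slightly stale results.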

I am involved in the IKS project (http://iks-project.eu/) where we
envision new types of indexing/search for content-based applications,
and from this perspective it might make sense to be able to add more
indexing methods.

So as we're dreaming aloud, my ideal view of Jackrabbit
indexing/search would be:

1. The "structural index" (your first type) is managed by Jackrabbit,
doesn't require configuration, behaves like a database index
(synchronous, transactional, stored in repository, etc.)

2. The "standard fulltext index" uses Lucene. Large items are queued
for eventual indexing, indexing can be delegated to a separate cluster,
and the index is configurable as to what to index and what not, can be
disabled, etc. Ideally stored in repository.

3. Additional "custom external indexes" can be configured. They work
like the Lucene index but use external components (à la Solr, for
example, or RESTful indexing engines). I'm not sure how the JCR query
syntax can address those; that's a different problem.
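
A rough sketch of how these three kinds of indexes could hang
together, under heavy assumptions: every name here (ContentIndex,
StructuralIndex, RecordingIndex, SketchRepository) is hypothetical and
not part of Jackrabbit, and the optional indexes are notified inline
where a real system would presumably queue the work.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// All names below are hypothetical; none of them are Jackrabbit APIs.

// Contract for the optional, configurable indexes (types 2 and 3).
interface ContentIndex {
    String name();
    void nodeSaved(String path, Map<String, String> properties);
}

// Type 1: always present, no configuration, updated synchronously on save.
class StructuralIndex {
    private final Map<String, Set<String>> pathsByPropertyValue = new HashMap<>();

    void update(String path, Map<String, String> properties) {
        for (Map.Entry<String, String> e : properties.entrySet()) {
            String key = e.getKey() + "=" + e.getValue();
            pathsByPropertyValue.computeIfAbsent(key, k -> new TreeSet<>()).add(path);
        }
    }

    Set<String> find(String property, String value) {
        return pathsByPropertyValue.getOrDefault(property + "=" + value, new TreeSet<>());
    }
}

// Stand-in for a Lucene-backed or external (Solr-like) index: it only
// records which paths it was asked to index.
class RecordingIndex implements ContentIndex {
    private final String name;
    private final List<String> seen = new ArrayList<>();

    RecordingIndex(String name) {
        this.name = name;
    }

    public String name() {
        return name;
    }

    public void nodeSaved(String path, Map<String, String> properties) {
        seen.add(path);
    }

    List<String> seen() {
        return seen;
    }
}

class SketchRepository {
    private final StructuralIndex structural = new StructuralIndex();
    private final List<ContentIndex> optionalIndexes = new ArrayList<>();

    // Optional indexes are added by configuration; none is required.
    void register(ContentIndex index) {
        optionalIndexes.add(index);
    }

    StructuralIndex structural() {
        return structural;
    }

    void save(String path, Map<String, String> properties) {
        // The structural index is updated as part of the save itself.
        structural.update(path, properties);
        // Optional indexes are only notified; a real system would queue here.
        for (ContentIndex index : optionalIndexes) {
            index.nodeSaved(path, properties);
        }
    }
}
```

With this shape, disabling fulltext indexing just means not
registering that index, and an external engine is one more
ContentIndex implementation. How queries get routed to those extra
indexes is left open here as well.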

-Bertrand
