Hi,

On Thu, Feb 18, 2010 at 8:39 AM, Thomas Müller <thomas.muel...@day.com> wrote:
> The property/value indexes are for property values, node names, paths,
> node references, and so on. ....
> ...Currently, we use Apache Lucene for this, but I wouldn't. I
> would keep those indexes within the repository.
>
> ...The fulltext index is (potentially) slow, specially fulltext
> extraction. Therefore, fulltext index should be done asynchronously if
> it takes too long....
I love this idea of separating the two kinds of indexes. Having the fulltext index be "eventually indexed" might be good enough, provided there are interfaces to find out the status of the indexing queue.

I am involved in the IKS project (http://iks-project.eu/), where we envision new types of indexing/search for content-based applications, and from that perspective it might make sense to be able to add more indexing methods.

So, as we're dreaming aloud, my ideal view of Jackrabbit indexing/search would be:

1. The "structural index" (your first type) is managed by Jackrabbit, doesn't require configuration, and behaves like a database index (synchronous, transactional, stored in the repository, etc.).

2. The "standard fulltext index" uses Lucene. Large items are queued for eventual indexing, the indexing work can be delegated to a separate cluster, it is configurable as to what to index and what not, it can be disabled, etc. Ideally it is stored in the repository.

3. Additional "custom external indexes" can be configured. They work like the Lucene index but use external components (à la Solr, for example, or RESTful indexing engines). I'm not sure how the JCR query syntax could address those, but that's a different problem.

-Bertrand
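To make points 1 and 2 a bit more concrete, here is a minimal sketch in Java. It assumes a hypothetical `TwoTierIndexer` class whose names and methods are illustrative only, not Jackrabbit API. The structural index is updated in the same call as the save, while text is only queued for fulltext indexing. To keep the example deterministic, `drain()` stands in for a background worker, and `pendingFulltextItems()` plays the role of the queue-status interface. Tokenizing here is deliberately naive (lower-cased, split on non-word characters) and is not Lucene.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a two-tier indexer; not a Jackrabbit API.
class TwoTierIndexer {
    // Tier 1: structural index (property value -> node paths), updated synchronously on save.
    private final Map<String, Set<String>> structural = new HashMap<>();
    // Tier 2: naive fulltext index (lower-cased term -> node paths), filled in later.
    private final Map<String, Set<String>> fulltext = new HashMap<>();
    // Pending fulltext work: {node path, text still to be tokenized}.
    private final Deque<String[]> queue = new ArrayDeque<>();

    // Called on save: the structural entry is visible immediately, the text is only queued.
    void save(String path, String propertyValue, String text) {
        structural.computeIfAbsent(propertyValue, k -> new HashSet<>()).add(path);
        queue.addLast(new String[] {path, text});
    }

    // Stands in for a background worker: indexes at most maxItems queued entries.
    int drain(int maxItems) {
        int done = 0;
        while (done < maxItems && !queue.isEmpty()) {
            String[] item = queue.removeFirst();
            for (String term : item[1].toLowerCase().split("\\W+")) {
                if (!term.isEmpty()) {
                    fulltext.computeIfAbsent(term, k -> new HashSet<>()).add(item[0]);
                }
            }
            done++;
        }
        return done;
    }

    // Status interface: how many items are still waiting for fulltext indexing.
    int pendingFulltextItems() {
        return queue.size();
    }

    Set<String> findByValue(String value) {
        return structural.getOrDefault(value, Set.of());
    }

    Set<String> findByTerm(String term) {
        return fulltext.getOrDefault(term.toLowerCase(), Set.of());
    }

    public static void main(String[] args) {
        TwoTierIndexer idx = new TwoTierIndexer();
        idx.save("/docs/a", "report", "Quarterly Report for Jackrabbit");
        // Structural lookup works right away; the fulltext entry does not exist yet.
        System.out.println(idx.findByValue("report"));    // [/docs/a]
        System.out.println(idx.findByTerm("jackrabbit")); // []
        System.out.println(idx.pendingFulltextItems());   // 1
        idx.drain(10);
        System.out.println(idx.findByTerm("jackrabbit")); // [/docs/a]
        System.out.println(idx.pendingFulltextItems());   // 0
    }
}
```

In a real repository the drain step would run on its own thread or on a separate indexing node, and large binaries would go through text extraction first. The split shown here is only meant to illustrate the "eventually indexed" contract that a queue-status call exposes.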