Re: [jr3] Search index in content

Thomas Müller Wed, 17 Feb 2010 23:40:30 -0800

Hi,

For me, there are two kinds of indexes: the property/value indexes,
and the fulltext index.


The property/value indexes are for property values, node names, paths,
node references, and so on. Such indexes (or "indices") are relatively
small and fast. In relational databases, those are the secondary
indexes (non-primary-key indexes). Updates to these indexes should be done
synchronously as part of the transaction (maybe even in the transient
space). Currently we use Apache Lucene for this, but I would not do that
in jr3; I would keep those indexes within the repository itself.
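To make the synchronous case concrete, here is a minimal Java sketch. It assumes the property/value index is a plain in-memory map owned by the repository, and all class and method names are hypothetical, not Jackrabbit API. Staged writes sit in a stand-in for the transient space, and commit() applies the content change and the index change under one lock, so a reader never sees one without the other.

```java
import java.util.*;

// Hypothetical sketch (not Jackrabbit API): a property/value index kept inside
// the repository and updated as part of the same commit as the content.
class PropertyIndexSketch {
    // Committed content: node path -> (property name -> value)
    private final Map<String, Map<String, String>> nodes = new HashMap<>();
    // Committed secondary index: [property name, value] -> node paths
    private final Map<List<String>, Set<String>> index = new HashMap<>();

    Transaction begin() {
        return new Transaction();
    }

    // Readers take the same lock as commit, so content and index never diverge for them
    synchronized Set<String> find(String name, String value) {
        return new TreeSet<>(index.getOrDefault(List.of(name, value), Set.of()));
    }

    class Transaction {
        // Stand-in for the transient space: staged writes, invisible to find() until commit
        private final Map<String, Map<String, String>> pending = new LinkedHashMap<>();

        void setProperty(String path, String name, String value) {
            pending.computeIfAbsent(path, p -> new LinkedHashMap<>()).put(name, value);
        }

        // Content and index are updated together, under one lock
        void commit() {
            synchronized (PropertyIndexSketch.this) {
                for (Map.Entry<String, Map<String, String>> node : pending.entrySet()) {
                    String path = node.getKey();
                    Map<String, String> props = nodes.computeIfAbsent(path, p -> new HashMap<>());
                    for (Map.Entry<String, String> prop : node.getValue().entrySet()) {
                        String old = props.put(prop.getKey(), prop.getValue());
                        if (old != null) {
                            // Drop the stale index entry for the previous value
                            Set<String> stale = index.get(List.of(prop.getKey(), old));
                            stale.remove(path);
                            if (stale.isEmpty()) {
                                index.remove(List.of(prop.getKey(), old));
                            }
                        }
                        index.computeIfAbsent(List.of(prop.getKey(), prop.getValue()), k -> new HashSet<>())
                             .add(path);
                    }
                }
                pending.clear();
            }
        }

        // Discarding the transient space leaves both content and index untouched
        void rollback() {
            pending.clear();
        }
    }
}
```

Because these indexes are small and cheap to update, doing the work inside commit() costs little, and queries by property value are always consistent with the committed content.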

The fulltext index is (potentially) slow, especially text extraction.
Therefore, fulltext indexing should be done asynchronously if it takes
too long. Also, in a clustered environment, at least text extraction
should only be done on one cluster node. I would still use Apache Tika
and Apache Lucene for this.
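For the asynchronous side, a minimal Java sketch, assuming a single background worker thread: the committing thread only enqueues the binary and returns, and the slow extraction runs later. The extractor function stands in for Apache Tika and the term map stands in for Apache Lucene; neither library's API is used, and all names are hypothetical. In a cluster, this worker would run on just one designated node, which the sketch does not model.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.function.Function;

// Hypothetical sketch: fulltext indexing decoupled from the committing thread.
class AsyncFulltextIndexer implements AutoCloseable {
    // Stand-in for a text extractor such as Apache Tika (not Tika's API)
    private final Function<byte[], String> extractor;
    // Stand-in for a Lucene index: lower-cased term -> node paths
    private final Map<String, Set<String>> postings = new ConcurrentHashMap<>();
    // A single worker, so each binary is extracted once, in submission order
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    AsyncFulltextIndexer(Function<byte[], String> extractor) {
        this.extractor = extractor;
    }

    // Called after the transaction has committed; returns immediately
    Future<?> enqueue(String path, byte[] binary) {
        return worker.submit(() -> {
            String text = extractor.apply(binary); // potentially slow
            for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!term.isEmpty()) {
                    postings.computeIfAbsent(term, t -> ConcurrentHashMap.newKeySet()).add(path);
                }
            }
        });
    }

    // Results may lag behind the latest commits until the queue drains
    Set<String> search(String term) {
        return Set.copyOf(postings.getOrDefault(term.toLowerCase(Locale.ROOT), Set.of()));
    }

    @Override
    public void close() throws InterruptedException {
        worker.shutdown();
        worker.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

The trade-off is that search() can briefly miss content that is already committed; in exchange, a slow extraction never holds up the transaction.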

Regards,
Thomas
