Re: Using Lucene partly as DB and 'joining' search results.

Antony Bowesman Mon, 14 Apr 2008 21:00:11 -0700

Thanks all for the suggestions - there was also another thread "Lucene index on relational data" which had crossover here.

That's an interesting idea about using ParallelReader for the changeable index. I had thought to just have a triplet indexed as 'owner:mailId:label' in each Document and have multiple Documents for the same mailId, e.g. if each recipient adds labels for the same mail, or if multiple labels are added by one recipient. I would then have to make a join using mailId against the core. However, if I use ParallelReader, I could have a single Document with multiple fields and, by using stored fields, 'modify' that Document. But what happens to the doc ID when the delete+add occurs, and how do I ensure it stays the same?
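To make the triplet idea concrete, here is a rough sketch of the join in plain Java. It only models the logic and uses no Lucene calls: the label index is a list of 'owner:mailId:label' strings, and the core search result is a list of mailIds. The class and method names are made up for illustration, and it assumes owner and mailId never contain a ':'.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Plain-Java model of the 'owner:mailId:label' triplet join; not Lucene API.
class LabelJoinSketch {

    // Step 1: collect the mailIds that one owner has tagged with one label.
    // Each entry stands for one small Document in the hypothetical label index.
    static Set<String> mailIdsFor(List<String> labelIndex, String owner, String label) {
        Set<String> ids = new LinkedHashSet<>();
        for (String triplet : labelIndex) {
            // Assumption: owner and mailId contain no ':' so a 3-way split is safe.
            String[] parts = triplet.split(":", 3);
            if (parts.length == 3 && parts[0].equals(owner) && parts[2].equals(label)) {
                ids.add(parts[1]);
            }
        }
        return ids;
    }

    // Step 2: the join. Keep only the core hits whose mailId carries the label,
    // preserving the order in which the core search returned them.
    static List<String> join(List<String> coreHits, Set<String> labelled) {
        List<String> out = new ArrayList<>();
        for (String mailId : coreHits) {
            if (labelled.contains(mailId)) {
                out.add(mailId);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> labelIndex = List.of("alice:m1:urgent", "bob:m1:todo", "alice:m2:urgent");
        Set<String> labelled = mailIdsFor(labelIndex, "alice", "urgent");
        System.out.println(join(List.of("m2", "m3", "m1"), labelled));
    }
}
```

In a real setup the second step would more likely be a Filter over the core index than a loop over hits, but the shape of the join is the same.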

I'm on 2.3.1. I seem to recall a discussion on this in another thread, but cannot find it.


Antony



Chris Hostetter wrote:

: The archive is read only apart from bulk deletes, but one of the requirements
: is for users to be able to label their own mail.  Given that a Lucene Document
: cannot be updated, I have thought about having a separate Lucene index that
: has just the 3 terms (or some combination of) userId + mailId + label.
: That of course would mean joining searches from the main mail data index and
: the label index.
tangential to the existing followups about ways to use Filters efficiently to get some of the behavior, take a look at ParallelReader ... your use case sounds like it might be perfect for it: one really large main dataset that changes fairly infrequently, and what changes do occur are mainly about adding new records; plus a small "parallel" set of fields about each record in the main set which do change fairly frequently.
you build up an index for the main data, and then you periodically build up a second index with the docs in the exact same order as the main index.
additions to the main index don't need to block on rebuilding the secondary index. deletes do (since you need to delete from both indexes in parallel to keep the ids in sync) ... but that's ok since you said you only need occasional bulk deletes (you could process them as an initial step of your recurring rebuild of the smaller index).
-Hoss
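
As a rough illustration of the "exact same order" requirement described above, the following plain-Java sketch models the two indexes as two lists whose positions play the role of doc ids. It uses no Lucene calls; the class, the method names, and the map of labels keyed by mailId are all hypothetical, and removing a list element at once is a simplification of how a real index handles deletes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java model of two position-aligned indexes; not Lucene API.
class ParallelOrderSketch {

    // Rebuild the small index so that entry i describes the same mail as
    // entry i of the main index. Mails without labels get an empty entry,
    // so that no position is skipped.
    static List<String> rebuildParallel(List<String> mainMailIds, Map<String, String> labelsByMailId) {
        List<String> parallel = new ArrayList<>();
        for (String mailId : mainMailIds) {
            parallel.add(labelsByMailId.getOrDefault(mailId, ""));
        }
        return parallel;
    }

    // Read the "combined document" at one position from both lists.
    static String combined(List<String> main, List<String> parallel, int docId) {
        return main.get(docId) + " -> " + parallel.get(docId);
    }

    // A delete must hit the same position in both lists, otherwise every
    // later position in one list would describe a different mail.
    static void deleteFromBoth(List<String> main, List<String> parallel, int docId) {
        main.remove(docId);
        parallel.remove(docId);
    }

    public static void main(String[] args) {
        List<String> main = new ArrayList<>(List.of("m1", "m2", "m3"));
        List<String> parallel = rebuildParallel(main, Map.of("m1", "urgent", "m3", "later"));
        deleteFromBoth(main, parallel, 0);
        System.out.println(combined(main, parallel, 1));
    }
}
```

The point of the model is only that the small list can be thrown away and rebuilt from the label data at any time, as long as it is rebuilt in the main list's order.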


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



