This is great feedback on the new Collector API, Uwe.  Thanks!

It's awesome that you no longer have to warm your searchers... but be
careful when a large segment merge commits.

Did you hit any snags/problems/etc. that we should fix before releasing 2.9?

Mike

On Sun, Apr 26, 2009 at 9:54 AM, Uwe Schindler <u...@thetaphi.de> wrote:
> Some status update:
>
>> > George, did you mean LUCENE-1516 below?  (LUCENE-1313 is a further
>> > improvement to near real-time search that's still being iterated on).
>> >
>> > In general I would say 2.9 seems to be in rather active development
>> still
>> > ;)
>> >
>> > I too would love to hear about production/beta use of 2.9.  George
>> > maybe you should re-ask on java-user?
>>
>> Here! I updated www.pangaea.de to Lucene trunk today (because of an
>> incomplete hashCode in TrieRangeQuery)... Works perfectly, but I do not
>> use the realtime parts. The same was true ten days ago, no problems :-)
>>
>> Currently I am rewriting parts of my code to use Collector, moving away
>> from HitCollector (no score is needed, so the optimizations apply)! The
>> reopen() and sorting are fine: almost no time is spent on sorted searches
>> after reopening indexes every 20 minutes with just a few new, small
>> segments containing changed documents. No extra warming is needed.
>
> I rewrote my collectors now to use the new API. Even though the number of
> methods to override in the new Collector is 3 instead of 1, the code got
> shorter (because the collect method can now throw IOException, great!!!).
> What is also perfect is the way a FieldCache is used: just retrieve the
> FieldCache array (e.g. via getInts()) in the setNextReader() method, then
> index into that array with the docid in the collect() method. Now I am
> able to, e.g., retrieve cached values even after an index reopen without
> warming (same with sorting). In the past you had to use a cache array for
> the whole index. The docBase is not used in my code, as I access the
> per-segment index readers directly. So users now have both possibilities:
> use the supplied reader, or use the docBase as an index offset into the
> searcher/main reader. Really cool!
>
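The per-segment pattern Uwe describes can be sketched roughly as follows. This is a minimal illustration, not code from the original post: the class and field names are hypothetical stand-ins so it compiles without Lucene on the classpath. In real Lucene 2.9 code the class would extend org.apache.lucene.search.Collector and setNextReader() would fetch the array via FieldCache.DEFAULT.getInts(reader, "id") for each segment reader it is handed.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a per-segment collector: one FieldCache-style int[] per
// segment, indexed by segment-relative docid in collect().
public class CachedIntCollector {
    private int[] segmentValues;  // stands in for the FieldCache array of one segment
    private final List<Integer> collected = new ArrayList<Integer>();

    // Called once per segment; real code would receive an IndexReader plus
    // docBase and look up the FieldCache array for that reader.
    public void setNextReader(int[] fieldCacheForSegment) {
        this.segmentValues = fieldCacheForSegment;
    }

    // Called for each matching doc with a segment-relative docid. No Scorer
    // is consulted, so no score is ever computed -- the optimization the
    // thread mentions.
    public void collect(int doc) {
        collected.add(segmentValues[doc]);
    }

    public List<Integer> getCollected() {
        return collected;
    }
}
```

Because the cache arrays are held per segment, a reopen that adds a few small segments only needs caches for those new segments, which is why no warming is required.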
> The overhead of score calculation can be skipped entirely if it is not
> needed, also cool!
>
> One of my collectors is used to retrieve the database ids (integers) for
> building up an SQL "IN (...)" clause from the field cache, based on the
> collected hits. In the past this was very complicated, because FieldCache
> was slow after reopening, and fetching stored fields (the ids) in the
> inner search loop is also very slow. Now it's just 10 lines of code and no
> score is involved.
>
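The clause-building step described above could look roughly like this in plain Java (the class and method names are illustrative, not from the original post; it assumes the id list gathered by the collector is non-empty, which a production version would have to handle):

```java
import java.util.List;

// Turns a list of collected database ids into an SQL "IN (...)" fragment.
public final class SqlInClause {
    public static String build(String column, List<Integer> ids) {
        StringBuilder sb = new StringBuilder(column).append(" IN (");
        for (int i = 0; i < ids.size(); i++) {
            if (i > 0) {
                sb.append(',');  // comma-separate all ids after the first
            }
            sb.append(ids.get(i));
        }
        return sb.append(')').toString();
    }
}
```

Since the ids come straight out of the field cache arrays rather than stored fields, no document is loaded in the inner search loop.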
> The new code is working now in production at PANGAEA.
>
>> Another change still to be done here is to replace Field.Store.COMPRESS
>> with manually compressed binary stored fields, but that is only to get
>> rid of the deprecation warnings. It cannot be done without a complete
>> reindex.
>>
>> Uwe
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-dev-h...@lucene.apache.org
