This is great feedback on the new Collector API, Uwe. Thanks! It's awesome that you no longer have to warm your searchers... but be careful when a large segment merge commits.
Did you hit any snags/problems/etc. that we should fix before releasing 2.9?

Mike

On Sun, Apr 26, 2009 at 9:54 AM, Uwe Schindler <u...@thetaphi.de> wrote:
> Some status update:
>
>> > George, did you mean LUCENE-1516 below? (LUCENE-1313 is a further
>> > improvement to near real-time search that's still being iterated on.)
>> >
>> > In general I would say 2.9 seems to be in rather active development still ;)
>> >
>> > I too would love to hear about production/beta use of 2.9. George,
>> > maybe you should re-ask on java-user?
>>
>> Here! I updated www.pangaea.de to Lucene trunk today (because of the
>> incomplete hashcode in TrieRangeQuery)... Works perfectly, but I do not
>> use the realtime parts. And the same 10 days before, no problems :-)
>>
>> Currently I am rewriting parts of my code from HitCollector to Collector
>> (without score, so optimizations kick in)! The reopen() and sorting are
>> fine; almost no time is consumed for sorted searches after reopening the
>> indexes every 20 minutes with just some new and small segments containing
>> changed documents. No extra warming is needed.
>
> I rewrote my collectors now to use the new API. Even though the number of
> methods to override in the new Collector is 3 instead of 1, the code got
> shorter (because the collect methods can now throw IOExceptions, great!!!).
> What is also perfect is the way FieldCache is used: just retrieve the
> FieldCache array (e.g. getInts()) in the setNextReader() method and use the
> value array in the collect() method with the docid as index. Now I am able
> to, e.g., retrieve cached values even after an index reopen without warming
> (same with sort). In the past you had to use a cache array for the whole
> index. The docBase is not used in my code, as I directly access the index
> readers. So users now have both possibilities: use the supplied reader or
> use the docBase as an index offset into the searcher/main reader. Really cool!
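The per-segment pattern described above can be sketched without Lucene on the classpath. This is a hedged, self-contained illustration only: SimpleCollector, IdCollector, and the int[] arrays are hypothetical stand-ins for the real 2.9 Collector contract and for the array FieldCache.getInts() would return per segment; none of this is Uwe's actual code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the 2.9 Collector contract (sketch only). */
abstract class SimpleCollector {
    // In real Lucene, setNextReader(IndexReader, int docBase) is called once
    // per segment; here the segment's cached field values are passed directly.
    abstract void setNextReader(int[] segmentFieldCache, int docBase);
    // doc is segment-relative, so it indexes straight into the per-segment array.
    abstract void collect(int doc);
}

/** Gathers cached field values (e.g. database ids) without touching scores. */
class IdCollector extends SimpleCollector {
    private int[] ids;      // per-segment cached values (like FieldCache.getInts())
    private int docBase;    // offset of this segment within the main reader
    final List<Integer> collectedIds = new ArrayList<Integer>();
    final List<Integer> globalDocids = new ArrayList<Integer>();

    void setNextReader(int[] segmentFieldCache, int docBase) {
        this.ids = segmentFieldCache;
        this.docBase = docBase;
    }

    void collect(int doc) {
        collectedIds.add(ids[doc]);       // segment-relative index, no warming needed
        globalDocids.add(docBase + doc);  // or map back to the searcher/main reader
    }
}

public class CollectorSketch {
    public static void main(String[] args) {
        // Two "segments" with their cached id values.
        IdCollector c = new IdCollector();
        c.setNextReader(new int[]{100, 101, 102}, 0);
        c.collect(1);                              // global docid 1, id 101
        c.setNextReader(new int[]{200, 201}, 3);   // second segment, docBase 3
        c.collect(0);                              // global docid 3, id 200
        System.out.println(c.collectedIds);        // [101, 200]
        System.out.println(c.globalDocids);        // [1, 3]
    }
}
```

The key point the sketch mirrors: after a reopen(), unchanged segments keep their per-segment cache arrays, which is why no extra warming is needed.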
>
> The overhead of score calculation can be left out if not needed, also cool!
>
> One of my collectors is used to retrieve the database ids (integers) for
> building up a SQL "IN (...)" clause from the field cache, based on the
> collected hits. In the past this was very complicated, because FieldCache
> was slow after reopening and getting stored fields (the ids) is also very
> slow (inner search loop). Now it's just 10 lines of code and no score is
> involved.
>
> The new code is now working in production at PANGAEA.
>
>> Another change to be done here is to remove Field.Store.COMPRESS and
>> replace it with manually compressed binary stored fields, but this is
>> only to get rid of the deprecation warnings. And it cannot be done
>> without complete reindexing.
>>
>> Uwe
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-dev-h...@lucene.apache.org
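The "IN (...)" building mentioned in the thread could look roughly like this. A hedged sketch only: InClauseBuilder and buildInClause are assumed names, not the actual PANGAEA code, and it simply joins the ids a collector gathered.

```java
import java.util.Arrays;
import java.util.List;

public class InClauseBuilder {
    /** Builds a SQL "IN (...)" fragment from collected integer ids. */
    public static String buildInClause(List<Integer> ids) {
        StringBuilder sb = new StringBuilder("IN (");
        for (int i = 0; i < ids.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(ids.get(i));  // integers only, so no quoting/escaping needed
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        // ids as a collector might have gathered them from the field cache
        System.out.println(buildInClause(Arrays.asList(101, 200, 305)));
        // IN (101,200,305)
    }
}
```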