Hi Mike,

> This is great feedback on the new Collector API, Uwe. Thanks!
Likewise.

> It's awesome that you no longer have to warm your searchers... but be
> careful when a large segment merge commits.

I know this, but in our case (e.g. building an SQL IN list, collecting
measurement parameters from the documents) the warming is not really needed.
It would only be a problem if the index were updated very often (it is
updated every 20 minutes) and the whole field cache had to be reloaded
(which takes 3-5 seconds on our machine). So a large merge that costs 1-2
seconds of cache reloading is no problem (the users have the same delay with
sorted results). If our index gets bigger, I will add warming to my
search/cache implementation after reopening; for that it would be nice to
have the list of reopened segments (I think there was an issue about it, or
is there an implementation?). In our case, most of the time is spent
afterwards in the query against the SQL data warehouse, so one additional
second for building the SQL query is not much.

> Did you hit any snags/problems/etc. that we should fix before releasing
> 2.9?

Until now I have not seen any further problems. What I had seen before is
already implemented in Lucene, thanks to our active issue communication and
all these issues :-) I am still waiting for the step of moving trie (and
also the new automaton regex query) to core, and for the modularization
(hopefully before 2.9, so we do not create new APIs that change or get
deprecated later).

Uwe

> Mike
>
> On Sun, Apr 26, 2009 at 9:54 AM, Uwe Schindler <u...@thetaphi.de> wrote:
> > Some status update:
> >
> >> > George, did you mean LUCENE-1516 below? (LUCENE-1313 is a further
> >> > improvement to near real-time search that's still being iterated on.)
> >> >
> >> > In general I would say 2.9 seems to be in rather active development
> >> > still ;)
> >> >
> >> > I too would love to hear about production/beta use of 2.9. George,
> >> > maybe you should re-ask on java-user?
> >>
> >> Here! I updated www.pangaea.de to Lucene trunk today (because of the
> >> incomplete hashcode in TrieRangeQuery)...
> >> Works perfect, but I do not use the realtime parts. And the same 10
> >> days before: no problems :-)
> >>
> >> Currently I am rewriting parts of my code to use Collector, to get away
> >> from HitCollector (without score, hence the optimizations)! The
> >> reopen() and sorting are fine; almost no time is consumed for sorted
> >> searches after reopening indexes every 20 minutes with just some new
> >> and small segments with changed documents. No extra warming is needed.
> >
> > I rewrote my collectors now to use the new API. Even though the number
> > of methods to override in the new Collector is 3 instead of 1, the code
> > got shorter (because the collect methods can now throw IOExceptions,
> > great!!!). What is also perfect is the way a FieldCache is used: just
> > retrieve the FieldCache array (e.g. getInts()) in the setNextReader()
> > method and use that value array in the collect() method with the docid
> > as index. Now I am able to e.g. retrieve cached values even after an
> > index reopen without warming (same with sort). In the past you had to
> > use a cache array for the whole index. The docBase is not used in my
> > code, as I directly access the segment readers. So users now have both
> > possibilities: use the supplied reader, or use the docBase as an index
> > offset into the searcher/main reader. Really cool!
> >
> > The overhead of score calculation can be left out if not needed, also
> > cool!
> >
> > One of my collectors is used to retrieve the database ids (integers)
> > for building up an SQL "IN (...)" clause from the field cache, based on
> > the collected hits. In the past this was very complicated, because
> > FieldCache was slow after reopening and getting stored fields (the ids)
> > was also very slow (in the inner search loop). Now it's just 10 lines
> > of code and no score is involved.
> >
> > The new code is working now in production at PANGAEA.
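[Editorial note: the per-segment pattern described above can be sketched in plain Java. This is an illustrative stand-in, not the actual PANGAEA code and not the real Lucene 2.9 API: the per-segment field cache is modeled as a bare int[] (standing in for what FieldCache.DEFAULT.getInts(reader, field) returns), and all names (IdCollector, setNextSegment, toInClause) are hypothetical.]

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of the Lucene 2.9 Collector pattern described above:
// per segment, grab the cached field-value array once, then index into it
// with the segment-local docid in collect() -- no scoring involved.
public class InListSketch {

    static class IdCollector {
        private int[] segmentIds;                     // stands in for the FieldCache array
        private final List<Integer> hits = new ArrayList<Integer>();

        // Called once per segment, like Collector.setNextReader(reader, docBase).
        void setNextSegment(int[] cachedIds) {
            this.segmentIds = cachedIds;
        }

        // Called for each matching doc with a segment-local docid.
        void collect(int doc) {
            hits.add(segmentIds[doc]);
        }

        // Builds the SQL "IN (...)" clause from the collected database ids.
        String toInClause(String column) {
            StringBuilder sb = new StringBuilder(column).append(" IN (");
            for (int i = 0; i < hits.size(); i++) {
                if (i > 0) sb.append(", ");
                sb.append(hits.get(i));
            }
            return sb.append(")").toString();
        }
    }

    public static void main(String[] args) {
        IdCollector c = new IdCollector();
        c.setNextSegment(new int[] {101, 102, 103});  // segment 1 cache
        c.collect(0);
        c.collect(2);
        c.setNextSegment(new int[] {204, 205});       // segment 2 cache
        c.collect(1);
        System.out.println(c.toInClause("id"));       // prints: id IN (101, 103, 205)
    }
}
```

Because each segment keeps its own cache array, a reopen only builds caches for the new small segments, which is why no warming is needed.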
> >
> >> Another change still to be done here is to drop Field.Store.COMPRESS
> >> and replace it by manually compressed binary stored fields, but this
> >> is only to get rid of the deprecation warnings. However, this cannot
> >> be done without complete reindexing.
> >>
> >> Uwe

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
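[Editorial note: the "manually compressed binary stored fields" replacement for the deprecated Field.Store.COMPRESS can be sketched with java.util.zip, the same machinery Lucene's CompressionTools builds on. Storing the resulting byte[] as a binary stored field (e.g. via a Field constructor taking byte[]) is assumed and not shown; the class and method names here are illustrative.]

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Sketch: compress a field value before storing it as a binary field, and
// decompress it again after retrieval -- replacing Field.Store.COMPRESS.
public class CompressSketch {

    // Deflate the raw field bytes; the result would be stored in the index.
    public static byte[] compress(byte[] value) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(value);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    // Inflate the stored bytes back to the original field value.
    public static byte[] decompress(byte[] stored) throws Exception {
        Inflater inflater = new Inflater();
        inflater.setInput(stored);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!inflater.finished()) {
            out.write(buf, 0, inflater.inflate(buf));
        }
        inflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] original = "stored field value".getBytes("UTF-8");
        byte[] roundTrip = decompress(compress(original));
        System.out.println(new String(roundTrip, "UTF-8"));  // prints: stored field value
    }
}
```

As noted above, old indexes cannot be migrated in place: documents compressed by the deprecated mechanism must be reindexed through this path.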