On Mon, Dec 27, 2010 at 5:08 AM, Li Li <fancye...@gmail.com> wrote:

> I integrated pfor codec into lucene 2.9.3 and the search time
> comparison is as follows:
>
>                            single term   and query   or query
> VINT in lucene 2.9.3              11.2        36.5       38.6
> PFor in lucene 2.9.3               8.7        27.6       33.4
> VINT in lucene 4 branch           10.6        26.5       35.4
> PFor in lucene 4 branch            8.1        22.5       30.7
>
> My test terms are high frequency terms because we are interested in
> "bad case"

I agree it's the bad cases we should focus on in general. If a super
fast query gets somewhat slower it's "relatively harmless" (just a
"capacity" question for high volume sites), but if the bad queries get
slower it's awful (requires faster cutover to a sharded architecture),
until we fix Lucene to run a single search concurrently (which we
badly need to do).

> It seems lucene 4 branch's implementation of and query (conjunction
> query) is well optimized that even for VINT codec, it's faster than
> PFor in lucene 2.9.3. Could anyone tell me what optimization is done?
> is store docIDs and freqs separately making it faster? or anything
> else?

Actually vInt on the bulkpostings branch stores freq/doc together.
I.e., the format is the same as 2.9.x's format.

I think it could be the fact that the AND query does block reads (64
doc/freqs at once) instead of doc-at-once? I.e., because of this, the
query is effectively scanning the next block of 64 docs instead of
skipping to them? Our skipping impl is unfortunately rather costly, so
if a skip will not skip that many docs it's better to scan.

> Another question, Is there anyone interested in integrating pfor
> codec into lucene 2.9.3 as me (we have to use lucene 2.9 and solr
> 1.4). And how do I contribute this patch?

Realistically I don't think we can commit this to 2.9.x -- that branch
is purely bug fixes at this point.

Still, it's possible others could make use of such a patch, so if it's
not too much work you may as well post it? It can lead to
improvements on the bulk postings branch too :) The more patches the
merrier!

You only use PFor for the very high freq terms in 2.9.x, right? I've
wondered if we should do the same on bulkpostings... the problem is
that for e.g. range queries, which visit all docs for all terms
between X and Y, you want the bulk decode even for low freq terms...

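To make the scan-vs-skip point concrete, here is a minimal sketch in
Python (for brevity; this is not the bulkpostings code). It assumes
postings are plain sorted docID lists and that "decoding a block" just
means looking at the next 64 entries. The name intersect_block_scan and
its parameters are hypothetical. No skip list is consulted: the loop
steps over whole blocks that end before the target and scans linearly
inside the block that may contain it.

```python
BLOCK_SIZE = 64  # block size mentioned above; the rest is illustrative


def intersect_block_scan(lead, other, block_size=BLOCK_SIZE):
    """AND of two sorted docID lists, scanning `other` block by block.

    Assumption: both lists are sorted and duplicate-free. A real codec
    would decode each block from compressed bytes; here a "block" is
    simply a slice of `other` of at most `block_size` entries.
    """
    hits = []
    start = 0  # index of the first entry of the current block
    pos = 0    # scan cursor inside the current block
    for target in lead:
        # Step over whole blocks whose last docID is still below target.
        while start < len(other):
            end = min(start + block_size, len(other))
            if other[end - 1] >= target:
                break
            start = end
            pos = start
        if start >= len(other):
            break  # `other` is exhausted, no further matches possible
        # Linear scan within the block; the block's last docID is
        # >= target, so this loop stops before running off the block.
        while other[pos] < target:
            pos += 1
        if other[pos] == target:
            hits.append(target)
    return hits


if __name__ == "__main__":
    every_third = list(range(0, 1000, 3))
    every_fifth = list(range(0, 1000, 5))
    print(intersect_block_scan(every_third, every_fifth)[:5])
```

With two high freq terms most targets land in the current or the next
block, which is the case where scanning is likely cheaper than a
costly skip.
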
Mike