On Mon, Dec 27, 2010 at 5:08 AM, Li Li <fancye...@gmail.com> wrote:
> I integrated the PFor codec into lucene 2.9.3 and the search time
> comparison is as follows:
>                            single term   and query   or query
> VINT in lucene 2.9.3       11.2          36.5        38.6
> PFor in lucene 2.9.3       8.7           27.6        33.4
> VINT in lucene 4 branch    10.6          26.5        35.4
> PFor in lucene 4 branch    8.1           22.5        30.7
>
> My test terms are high frequency terms because we are interested in the "bad case"

I agree it's the bad cases we should focus on in general.  If a super
fast query gets somewhat slower it's "relatively harmless" (just a
"capacity" question for high volume sites) but if the bad queries get
slower it's awful (requires faster cutover to sharded architecture),
until we fix Lucene to run a single search concurrently (which we
badly need to do).

> It seems the lucene 4 branch's implementation of the and query (conjunction
> query) is so well optimized that even with the VINT codec it's faster than
> PFor in lucene 2.9.3. Could anyone tell me what optimization was done?
> Is storing docIDs and freqs separately what makes it faster, or is it
> something else?

Actually vInt on the bulkpostings branch stores freq/doc together, i.e.
the format is the same as 2.9.x's format.  I think it could be the
fact that the AND query does block reads (64 docs/freqs at once) instead
of doc-at-once?  I.e., because of this, the query is effectively scanning
the next block of 64 docs instead of skipping to them?  Our skipping
impl is unfortunately rather costly, so if a skip will not skip that many
docs it's better to scan.
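
To make the scan-versus-skip trade-off concrete, here is a minimal
Python sketch. It is not Lucene code: `BlockCursor`, `intersect`, and
the use of `bisect` as a stand-in for skip data are all assumptions
made for illustration. Only the block size of 64 comes from the
discussion above. The cursor scans when the target falls inside the
current block and falls back to a skip only when the target lies
beyond it.

```python
from bisect import bisect_left

BLOCK = 64  # block size mentioned in the thread


class BlockCursor:
    """Hypothetical cursor over a sorted doc ID list, read in blocks of BLOCK."""

    def __init__(self, docs):
        self.docs = docs
        self.pos = 0
        self.scans = 0  # advances answered by scanning the current block
        self.skips = 0  # advances that had to jump past the current block

    def doc(self):
        return self.docs[self.pos] if self.pos < len(self.docs) else None

    def advance(self, target):
        """Move to the first doc >= target and return it (None if exhausted)."""
        if self.pos >= len(self.docs):
            return None
        block_end = min((self.pos // BLOCK + 1) * BLOCK, len(self.docs))
        if self.docs[block_end - 1] >= target:
            # Target is inside the current block: scan linearly.
            self.scans += 1
            while self.docs[self.pos] < target:
                self.pos += 1
        else:
            # Target is beyond this block: jump (bisect stands in for skip data).
            self.skips += 1
            self.pos = bisect_left(self.docs, target, block_end)
        return self.doc()


def intersect(a, b):
    """AND query: return doc IDs present in both sorted lists, plus the cursors."""
    ca, cb = BlockCursor(a), BlockCursor(b)
    hits = []
    da, db = ca.doc(), cb.doc()
    while da is not None and db is not None:
        if da == db:
            hits.append(da)
            da = ca.advance(da + 1)
            db = cb.advance(da) if da is not None else None
        elif da < db:
            da = ca.advance(db)
        else:
            db = cb.advance(da)
    return hits, ca, cb
```

With two dense lists (high frequency terms), nearly every advance is
answered by a scan and skips happen only at block boundaries; with one
sparse list, the dense side needs real skips. The `scans` and `skips`
counters on the returned cursors show the split for a given input.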

> Another question: is anyone else interested in integrating the PFor
> codec into lucene 2.9.3, as I am (we have to use lucene 2.9 and solr
> 1.4)? And how do I contribute this patch?

Realistically I don't think we can commit this to 2.9.x -- that branch
is purely bug fixes at this point.

Still it's possible others could make use of such a patch so if it's
not too much work you may as well post it?  It can lead to
improvements on the bulk postings branch too :)  The more patches the
merrier!

You only use PFor for the very high freq terms in 2.9.x, right?  I've
wondered if we should do the same on bulkpostings... the problem is that
for e.g. range queries, which visit all docs for all terms between X and
Y, you want the bulk decode even for low freq terms...
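
As a rough illustration of the "block codec only for high freq terms"
idea, here is a Python sketch under stated assumptions. The threshold
value and all function names are hypothetical, and the fixed-width
packing is a simplified stand-in for PFor (it has no exception
handling), so this shows the selection logic rather than either
branch's actual format. The vInt routine follows the usual scheme of
7 data bits per byte with the high bit as a continuation flag.

```python
HIGH_FREQ_THRESHOLD = 128  # hypothetical cutoff, not taken from the thread


def vint_encode(values):
    """Encode non-negative ints as variable-length bytes, low 7 bits first."""
    out = bytearray()
    for v in values:
        while v >= 0x80:
            out.append((v & 0x7F) | 0x80)  # high bit set: more bytes follow
            v >>= 7
        out.append(v)
    return bytes(out)


def vint_decode(data):
    values, cur, shift = [], 0, 0
    for byte in data:
        cur |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(cur)
            cur, shift = 0, 0
    return values


def packed_encode(values):
    """Pack every value at one fixed bit width (simplified, no exceptions)."""
    width = max(1, max(values).bit_length())
    acc = 0
    for i, v in enumerate(values):
        acc |= v << (i * width)
    nbytes = (len(values) * width + 7) // 8
    return bytes([width]) + acc.to_bytes(nbytes, "little")


def packed_decode(data, count):
    width = data[0]
    acc = int.from_bytes(data[1:], "little")
    mask = (1 << width) - 1
    return [(acc >> (i * width)) & mask for i in range(count)]


def encode_postings(doc_ids, threshold=HIGH_FREQ_THRESHOLD):
    """Delta-encode a non-empty sorted doc ID list, picking a codec by doc freq."""
    deltas = [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]
    if len(doc_ids) >= threshold:
        return "packed", packed_encode(deltas)
    return "vint", vint_encode(deltas)


def decode_postings(kind, data, count):
    deltas = packed_decode(data, count) if kind == "packed" else vint_decode(data)
    docs, total = [], 0
    for d in deltas:
        total += d  # undo the delta encoding
        docs.append(total)
    return docs
```

A term with three postings would take the vInt path here and a term
with a thousand postings the packed path. The range query concern
above still applies: a query that visits many low freq terms would
spend all of its time on the vInt path.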

Mike
