Cool. My patch is much more limited/simple that this, i just changed the query 
evaluation. There are probably some other places that could benefit an 
approximate count but for the query evaluation, i think not calling count at 
all is even simpler and probably hard to beat in term of raw perfs. 

Thanks,

-----Original Message-----
From: K. John Wu [mailto:[email protected]] 
Sent: Wednesday, March 21, 2012 6:56 PM
To: FastBit Users
Cc: Dominique Prunier
Subject: Re: [FastBit-users] PATCH: preliminary patch to limit the number of 
calls to bitvector::cnt() during query evaluation

Hi, Dominique,

Since you brought this up, I have just checked in the stuff I am
working on (SVN revision 498).  The main idea is to implement a
cheaper version bitvector::cnt, called bitvector::sloppyCount, and
replace most of calls to the former with calls to the later.  At this
point, it seems that I have gone a little too far, because there are
some test cases producing incorrect answers.  I am working on
something at the moment, will come back to this probably tomorrow
afternoon.

John




On 3/21/12 3:30 PM, Dominique Prunier wrote:
> Hi John,
> 
>  
> 
> As promised, here is a first, partial, draft of the query evaluation
> changes aimed to limit the number of calls to bitvector::cnt() (which
> ends up being very costly with complex expressions). What i changed so
> far:
> 
> ·         I changed the sementic of query::doEvaluate to return <0 in
> case of an error, 0 if there is no match and >=0 if there is a
> potential match. Ultimately, this change would affect query::doScan
> too. I don’t think any code is relying on the returned value of
> query::doEvaluate and query::doScan to be the exact estimate.
> 
> ·         I changed some of the the term evaluation in
>  query::doEvaluate (i basically changed those involved in my
> performance test and identified the other ones to do)
> 
> ·         I changed category::patternSearch to skip the decompression
> of individual vectors, skip the compression of the final result and
> return a very rough estimate instead of the exact count of matches
> (same sementic as query::doEvaluate).
> 
>  
> 
> What would be remaining has been identified with TODO comments. It is
> mostly all the other terms and query::doScan. Since the previous spec
> is compatible with the new one, the patch is working is already usable
> with any query.
> 
>  
> 
> I stepped back at changing the sementic of the index::evaluate method
> since those explicitely specified that they are returning an exact
> count in their spec (as opposed to query::doEvaluate, query::doScan
> and category::patternSearch that only talk about negative results
> being errors). In my test case, it doesn’t seems to change the
> performance anyway.
> 
>  
> 
> With this patch, my 509789 test queries execution dropped from ~810 s
> to ~150 s (more than 5x faster), probably at the expense of some
> memory usage (i’m thinking about the compression of
> category::patternSearch, but shouldn’t be dramatic).
> 
>  
> 
> Thanks,
> 
>  
> 
> */Dominique Prunier/**//*
> 
>  APG Lead Developper
> 
> Logo-W4N-100dpi
> 
>  4388, rue Saint-Denis
> 
>  Bureau 309
> 
>  Montreal (Quebec)  H2J 2L1
> 
>  Tel. +1 514-842-6767  x310
> 
>  Fax +1 514-842-3989
> 
>  [email protected] <mailto:[email protected]>
> 
>  www.watch4net.com <http://www.watch4net.com/>
> 
> / /
> 
> /This message is for the designated recipient only and may contain
> privileged, proprietary, or otherwise private information. If you have
> received it in error, please notify the sender immediately and delete
> the original. Any other use of this electronic mail by you is prohibited.
> 
> //Ce message est pour le récipiendaire désigné seulement et peut
> contenir des informations privilégiées, propriétaires ou autrement
> privées. Si vous l'avez reçu par erreur, S.V.P. avisez l'expéditeur
> immédiatement et effacez l'original. Toute autre utilisation de ce
> courrier électronique par vous est prohibée.///
> 
>  
> 
> 
> 
> _______________________________________________
> FastBit-users mailing list
> [email protected]
> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users

Reply via email to