Hi, Dominique, Since you brought this up, I have just checked in the stuff I am working on (SVN revision 498). The main idea is to implement a cheaper version bitvector::cnt, called bitvector::sloppyCount, and replace most of calls to the former with calls to the later. At this point, it seems that I have gone a little too far, because there are some test cases producing incorrect answers. I am working on something at the moment, will come back to this probably tomorrow afternoon.
John On 3/21/12 3:30 PM, Dominique Prunier wrote: > Hi John, > > > > As promised, here is a first, partial, draft of the query evaluation > changes aimed to limit the number of calls to bitvector::cnt() (which > ends up being very costly with complex expressions). What i changed so > far: > > · I changed the sementic of query::doEvaluate to return <0 in > case of an error, 0 if there is no match and >=0 if there is a > potential match. Ultimately, this change would affect query::doScan > too. I don’t think any code is relying on the returned value of > query::doEvaluate and query::doScan to be the exact estimate. > > · I changed some of the the term evaluation in > query::doEvaluate (i basically changed those involved in my > performance test and identified the other ones to do) > > · I changed category::patternSearch to skip the decompression > of individual vectors, skip the compression of the final result and > return a very rough estimate instead of the exact count of matches > (same sementic as query::doEvaluate). > > > > What would be remaining has been identified with TODO comments. It is > mostly all the other terms and query::doScan. Since the previous spec > is compatible with the new one, the patch is working is already usable > with any query. > > > > I stepped back at changing the sementic of the index::evaluate method > since those explicitely specified that they are returning an exact > count in their spec (as opposed to query::doEvaluate, query::doScan > and category::patternSearch that only talk about negative results > being errors). In my test case, it doesn’t seems to change the > performance anyway. > > > > With this patch, my 509789 test queries execution dropped from ~810 s > to ~150 s (more than 5x faster), probably at the expense of some > memory usage (i’m thinking about the compression of > category::patternSearch, but shouldn’t be dramatic). > > > > Thanks, > > > > */Dominique Prunier/**//* > > APG Lead Developper > > Logo-W4N-100dpi > > 4388, rue Saint-Denis > > Bureau 309 > > Montreal (Quebec) H2J 2L1 > > Tel. +1 514-842-6767 x310 > > Fax +1 514-842-3989 > > [email protected] <mailto:[email protected]> > > www.watch4net.com <http://www.watch4net.com/> > > / / > > /This message is for the designated recipient only and may contain > privileged, proprietary, or otherwise private information. If you have > received it in error, please notify the sender immediately and delete > the original. Any other use of this electronic mail by you is prohibited. > > //Ce message est pour le récipiendaire désigné seulement et peut > contenir des informations privilégiées, propriétaires ou autrement > privées. Si vous l'avez reçu par erreur, S.V.P. avisez l'expéditeur > immédiatement et effacez l'original. Toute autre utilisation de ce > courrier électronique par vous est prohibée./// > > > > > > _______________________________________________ > FastBit-users mailing list > [email protected] > https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users _______________________________________________ FastBit-users mailing list [email protected] https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
