Hi John,

As promised, here is a first, partial draft of the query evaluation changes 
aimed at limiting the number of calls to bitvector::cnt() (which ends up being 
very costly with complex expressions). What I have changed so far:

·         I changed the semantics of query::doEvaluate to return <0 in case of 
an error, 0 if there is no match and >0 if there is a potential match. 
Ultimately, this change would affect query::doScan too. I don't think any code 
relies on the value returned by query::doEvaluate or query::doScan being 
an exact count.

·         I changed some of the term evaluations in query::doEvaluate (I 
basically changed those involved in my performance test and marked the 
other ones that remain to be done).

·         I changed category::patternSearch to skip the decompression of 
individual vectors, skip the compression of the final result and return a very 
rough estimate instead of the exact count of matches (same semantics as 
query::doEvaluate).
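
To make the return convention above concrete, here is a minimal sketch. 
ToyBits, any() and evaluateSketch are hypothetical names used only for 
illustration; they are not the FastBit classes and not the code in the patch. 
It only shows why a cheap "is anything set" test can stand in for an exact 
count when callers look at nothing but the sign of the result.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a bitmap; NOT the FastBit bitvector class.
struct ToyBits {
    std::vector<std::uint64_t> words;

    // Exact population count: has to visit every word.
    std::size_t cnt() const {
        std::size_t n = 0;
        for (std::uint64_t w : words) {
            for (; w != 0; w &= w - 1) ++n;  // clear the lowest set bit
        }
        return n;
    }

    // Cheap test: stops at the first non-zero word.
    bool any() const {
        for (std::uint64_t w : words) {
            if (w != 0) return true;
        }
        return false;
    }
};

// Assumed convention: <0 on error, 0 for no match,
// >0 for a potential match (the value is not an exact count).
int evaluateSketch(const ToyBits* hits) {
    if (hits == nullptr) return -1;  // error
    return hits->any() ? 1 : 0;      // never calls cnt()
}
```

A caller that needs the exact number of hits would still call cnt() once on 
the final result, rather than after every intermediate term.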

What remains has been identified with TODO comments. It is mostly 
all the other terms and query::doScan. Since the previous spec is compatible 
with the new one, the patch already works and is usable with any query.

I held back from changing the semantics of the index::evaluate methods since 
their spec explicitly states that they return an exact count (as opposed to 
query::doEvaluate, query::doScan and category::patternSearch, whose specs 
only say that negative results are errors). In my test case, it doesn't seem 
to change the performance anyway.

With this patch, the execution time of my 509789 test queries dropped from 
~810 s to ~150 s (more than 5x faster), probably at the expense of some memory 
usage (I'm thinking of the skipped compression in category::patternSearch, but 
it shouldn't be dramatic).

Thanks,

Dominique Prunier
 APG Lead Developer
 4388, rue Saint-Denis
 Bureau 309
 Montreal (Quebec)  H2J 2L1
 Tel. +1 514-842-6767  x310
 Fax +1 514-842-3989
 [email protected]
 www.watch4net.com

Attachment: avoid_calls_to_bitvector_cnt_in_query_eval.patch

_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users