Hi John,

As promised, here is a first, partial draft of the query evaluation changes aimed at limiting the number of calls to bitvector::cnt() (which ends up being very costly with complex expressions).

What I have changed so far:

· I changed the semantics of query::doEvaluate to return <0 in case of an error, 0 if there is no match, and >0 if there is a potential match. Ultimately, this change would affect query::doScan too. I don't think any code relies on the value returned by query::doEvaluate and query::doScan being an exact count.
· I changed some of the term evaluations in query::doEvaluate (basically those involved in my performance test; I identified the other ones still to do).
· I changed category::patternSearch to skip the decompression of individual vectors, skip the compression of the final result, and return a very rough estimate instead of the exact count of matches (same semantics as query::doEvaluate).

What remains is marked with TODO comments. It is mostly all the other terms and query::doScan. Since a return value that follows the previous spec also satisfies the new one, the patch already works and is usable with any query.

I held back from changing the semantics of the index::evaluate methods, since their spec explicitly states that they return an exact count (as opposed to query::doEvaluate, query::doScan and category::patternSearch, whose specs only say that negative results are errors). In my test case, it doesn't seem to change the performance anyway.

With this patch, the execution time of my 509789 test queries dropped from ~810 s to ~150 s (more than 5x faster), probably at the expense of some memory usage (I'm thinking of the compression skipped in category::patternSearch, but it shouldn't be dramatic).

Thanks,

Dominique Prunier
APG Lead Developer
4388, rue Saint-Denis, Bureau 309
Montreal (Quebec) H2J 2L1
Tel. +1 514-842-6767 x310
Fax +1 514-842-3989
[email protected]
www.watch4net.com
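P.S. To illustrate the return-value convention described for query::doEvaluate (<0 for an error, 0 for no match, a positive value for a potential match), here is a minimal sketch. It is not taken from the patch: Bits, evaluateRelaxed and describe are hypothetical names made up for this example, and Bits is a plain stand-in, not the FastBit bitvector class. It only shows why a caller that branches on the sign of the result never needs the exact count.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for an uncompressed bitmap (NOT the FastBit class).
struct Bits {
    std::vector<std::uint64_t> words;

    // Exact population count: has to visit every word (the costly call).
    long cnt() const {
        long n = 0;
        for (std::uint64_t w : words)
            for (; w != 0; w &= w - 1) ++n;  // clear lowest set bit
        return n;
    }

    // Cheap emptiness test: stops at the first non-zero word.
    bool any() const {
        for (std::uint64_t w : words)
            if (w != 0) return true;
        return false;
    }
};

// Hypothetical evaluator following the relaxed contract:
//   < 0 : error, 0 : no match, > 0 : potential match (not an exact count).
long evaluateRelaxed(const Bits* hits) {
    if (hits == nullptr) return -1;   // error
    return hits->any() ? 1 : 0;       // never calls cnt()
}

// A caller that only looks at the sign of the result.
std::string describe(long rc) {
    if (rc < 0) return "error";
    if (rc == 0) return "no match";
    return "potential match";
}
```

An evaluator that still returns the exact count (the old behaviour) also satisfies this contract, which is why terms that have not been converted yet keep working.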
avoid_calls_to_bitvector_cnt_in_query_eval.patch
_______________________________________________ FastBit-users mailing list [email protected] https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
