Here is the first version of my patch to switch SQL LIKE from case-insensitive to case-sensitive matching and to optimize this use case for CATEGORY columns.
In a nutshell, what changed is:
· We extract the longest constant prefix from the pattern (handling the escape char too)
· Instead of testing every value in the dictionary, we binary search for the range of values that start with that prefix (which sometimes even allows us to skip pattern matching entirely, if no valid range can be found)
· We test every value in the range

On a large dictionary (~130k entries), I've commonly seen it be one or two orders of magnitude faster (in my example, a simple query with a single LIKE predicate drops from ~10ms to ~0.4ms).

What I'd like to change/refactor (I'm really a newbie in C++):
· Remove the prefix extraction and pattern matching code from dictionary and replace the added method patternSearch with something like findRange. I believe that matching and pattern handling code doesn't belong in the dictionary. I'd rather move it back to the category class or somewhere similar.
· Having to use a C++ string object to rebuild the longest constant prefix bugs me (suggestions?). I'm also thinking of having a version that doesn't support escaping, but that would force me to change strMatch a bit more.
· To closely match the previous behavior, you can't match an empty pattern (even the empty string doesn't match). Maybe that would be worth changing.

As always John, feel free to include this in the main branch. I'm waiting for suggestions to make it more efficient, cleaner, ...

Thanks,
Dominique Prunier
APG Lead Developer
www.watch4net.com
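
P.S. For anyone reading without the attachment, here is a minimal sketch of the prefix + binary search idea. It is not the code in cs_like.patch: constantPrefix and this findRange are hypothetical names, and the sketch assumes the dictionary is a vector of strings sorted in plain byte order, with '%' and '_' as wildcards and backslash as the default escape char.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not from the patch): longest constant prefix of a
// LIKE pattern, i.e. everything before the first unescaped '%' or '_'.
// An escaped character contributes itself to the prefix; a trailing escape
// char with nothing after it is kept literally.
std::string constantPrefix(const std::string& pattern, char esc = '\\') {
    std::string prefix;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        const char c = pattern[i];
        if (c == esc && i + 1 < pattern.size()) {
            prefix += pattern[++i];   // escaped char is taken literally
        } else if (c == '%' || c == '_') {
            break;                    // first wildcard ends the prefix
        } else {
            prefix += c;
        }
    }
    return prefix;
}

// Hypothetical helper: half-open index range [first, second) of the entries
// in a sorted dictionary that start with `prefix`. An empty range means no
// entry can match, so pattern matching can be skipped altogether.
std::pair<std::size_t, std::size_t>
findRange(const std::vector<std::string>& dict, const std::string& prefix) {
    // First entry that is not less than the prefix.
    auto lo = std::lower_bound(dict.begin(), dict.end(), prefix);
    // From there on, entries starting with the prefix are contiguous and
    // come first, so the end of the run can be binary searched as well.
    auto hi = std::partition_point(lo, dict.end(),
        [&prefix](const std::string& s) {
            return s.compare(0, prefix.size(), prefix) == 0;
        });
    return {static_cast<std::size_t>(lo - dict.begin()),
            static_cast<std::size_t>(hi - dict.begin())};
}
```

Only the entries inside the returned range still need the full pattern match; with an empty prefix (a pattern starting with a wildcard) the range is the whole dictionary, so nothing is gained in that case.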
cs_like.patch
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
