Hi, Dominique, I have run through my usual set of tests and did not find any problem with your patch. It is now in SVN 482. Please give it a try when you get the chance.
Thanks.

John

On 3/9/12 10:17 AM, Dominique Prunier wrote:
> Quick update to my patch:
>
> · Changed dictionary::patternMatch to make it work with CI too (and I
>   think, for efficiency reasons, I have to keep all this here)
> · Moved the STR_MATCH_* constants from util.cpp to util.h and use them
>   in dictionary::patternMatch
> · Removed the CS/CI ifdef from category.cpp
>
> I did more testing, and on my set of ~90,000 test queries, the
> execution time dropped from ~515 seconds to ~20 seconds.
>
> Thanks,
>
> *From:* [email protected] [mailto:[email protected]]
> *On Behalf Of* Dominique Prunier
> *Sent:* Thursday, March 08, 2012 2:39 PM
> *To:* FastBit Users
> *Subject:* [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
> match LIKE patterns case-sensitively and perform specific optimizations
>
> Here is the first version of my patch to switch SQL LIKE from case
> insensitive to case sensitive and optimize this use case with CATEGORY
> columns.
>
> In a nutshell, what changed is:
>
> · We extract the longest constant prefix from the pattern (handling
>   the escape char too)
> · Instead of testing every value in the dictionary, we binary search
>   the range of values to test (which sometimes even allows us to skip
>   pattern matching entirely, if no valid range can be found)
> · We test every value in the range
>
> On a large dictionary (~130k entries), I've commonly seen it be one or
> two orders of magnitude faster (in my example, a simple query with a
> single LIKE predicate drops from ~10ms to ~0.4ms).
>
> What I'd like to change/refactor (I'm really a newbie in C++):
>
> · Remove the prefix extraction and pattern matching code from
>   dictionary and replace the added method patternSearch by something
>   like findRange. I believe that matching and pattern handling code
>   doesn't belong in the dictionary. I'd rather move this back to the
>   category class or something.
> · Having to use a C++ string object to rebuild the longest constant
>   prefix bugs me (suggestions?). I'm also thinking of having a version
>   that doesn't support escaping, but it would force me to change
>   strMatch a bit more
> · To closely match the previous behavior, you can't match an empty
>   pattern (even the empty string doesn't match); maybe that would be
>   worth changing
>
> As always John, feel free to include this into the main branch. I'm
> waiting for suggestions to make it more efficient, cleaner, ...
>
> Thanks,
>
> Dominique Prunier
> APG Lead Developer
>
> _______________________________________________
> FastBit-users mailing list
> [email protected]
> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
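
For readers following the thread, the three steps described in the patch (extract the constant prefix, binary search the matching range, test only the values in that range) can be sketched roughly as below. This is a minimal illustration and not the code from the patch: the names constantPrefix, likeMatch and findMatches are hypothetical, the dictionary is assumed to be a sorted std::vector<std::string>, and the wildcard syntax is assumed to be '%' and '_' with a backslash escape. Matching is case-sensitive.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: longest literal prefix of a LIKE pattern.
// Assumes '%' and '_' are wildcards and '\\' escapes the next character.
std::string constantPrefix(const std::string& pat) {
    std::string prefix;
    for (std::size_t i = 0; i < pat.size(); ++i) {
        char c = pat[i];
        if (c == '%' || c == '_') break;     // first wildcard ends the prefix
        if (c == '\\') {
            if (i + 1 >= pat.size()) break;  // dangling escape: stop here
            c = pat[++i];                    // escaped char is taken literally
        }
        prefix.push_back(c);
    }
    return prefix;
}

// Hypothetical case-sensitive LIKE matcher over C strings.
bool likeMatch(const char* s, const char* p) {
    for (; *p != '\0'; ++s, ++p) {
        if (*p == '%') {
            while (*p == '%') ++p;           // collapse runs of '%'
            if (*p == '\0') return true;     // trailing '%' matches the rest
            for (;; ++s) {                   // try every possible suffix of s
                if (likeMatch(s, p)) return true;
                if (*s == '\0') return false;
            }
        }
        if (*s == '\0') return false;        // pattern left, input exhausted
        if (*p == '_') continue;             // '_' matches any one character
        if (*p == '\\' && p[1] != '\0') ++p; // skip the escape character
        if (*p != *s) return false;
    }
    return *s == '\0';
}

// Returns the dictionary entries matching the pattern. Only entries that
// share the constant prefix are tested against the full pattern.
std::vector<std::string> findMatches(const std::vector<std::string>& dict,
                                     const std::string& pattern) {
    assert(std::is_sorted(dict.begin(), dict.end())); // debug-only check
    const std::string prefix = constantPrefix(pattern);
    // First entry that is not less than the prefix.
    auto lo = std::lower_bound(dict.begin(), dict.end(), prefix);
    // First entry at or after lo that no longer starts with the prefix.
    auto hi = std::partition_point(lo, dict.end(),
        [&prefix](const std::string& s) {
            return s.compare(0, prefix.size(), prefix) == 0;
        });
    std::vector<std::string> out;
    for (auto it = lo; it != hi; ++it) {
        if (likeMatch(it->c_str(), pattern.c_str())) out.push_back(*it);
    }
    return out;
}
```

With a pattern such as "app%", only the entries between the two binary-search bounds are passed to the matcher; when the range is empty, no pattern matching happens at all. A pattern that starts with a wildcard has an empty prefix, so the sketch falls back to scanning the whole dictionary.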
