Hey John,

I just noticed a small typo in util.h: the macro is called FASTBOT_... I don't think that was intended, but it has the nice side effect of disabling the new code by default, thus preserving the current behavior (case insensitive). Should we actually keep it in util.h now that it is documented in INSTALL?
https://codeforge.lbl.gov/plugins/scmsvn/viewcvs.php/trunk/src/util.h?root=fastbit&r1=483&r2=482&pathrev=483

Thanks,

________________________________________
From: K. John Wu [[email protected]]
Sent: March-09-12 10:47 PM
To: Dominique Prunier
Cc: FastBit Users
Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to match LIKE patterns case-sensitively and perform specific optimizations

Hi, Dominique,

I would like to add the FASTBIT_ prefix to the macro CS_PATTERN_MATCH to avoid possible collisions when FastBit is used with other packages. Hope you don't mind.

John

On 3/9/12 4:03 PM, K. John Wu wrote:
> Hi, Dominique,
>
> I have run through my usual set of tests and did not find any problem
> with your patch. It is now in SVN 482. Please give it a try when you
> get the chance.
>
> Thanks.
>
> John
>
> On 3/9/12 10:17 AM, Dominique Prunier wrote:
>> Quick update to my patch:
>>
>> · Changed dictionary::patternMatch to make it work with CI too
>>   (and I think, for efficiency reasons, I have to keep all this here)
>>
>> · Moved the STR_MATCH_* constants from util.cpp to util.h and
>>   use them in dictionary::patternMatch
>>
>> · Removed the CS/CI ifdef from category.cpp
>>
>> I did more testing, and on my set of ~90 000 test queries, the
>> execution time dropped from ~515 seconds to ~20 seconds.
>>
>> Thanks,
>>
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Dominique Prunier
>> Sent: Thursday, March 08, 2012 2:39 PM
>> To: FastBit Users
>> Subject: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>> match LIKE patterns case-sensitively and perform specific optimizations
>>
>> Here is the first version of my patch to switch SQL LIKE from case
>> insensitive to case sensitive and optimize this use case with CATEGORY
>> columns.
>>
>> In a nutshell, what changed is:
>>
>> · We extract the longest constant prefix from the pattern
>>   (handling the escape char too)
>>
>> · Instead of testing every value in the dictionary, we binary
>>   search the range of values to test (which sometimes even allows us
>>   to skip pattern matching entirely if no valid range can be found)
>>
>> · We test every value in the range
>>
>> On a large dictionary (~130k entries), I've commonly found it can be
>> one or two orders of magnitude faster (in my example, a simple query
>> with a single LIKE predicate drops from ~10ms to ~0.4ms).
>>
>> What I'd like to change/refactor (I'm really a newbie in C++):
>>
>> · Remove the prefix extraction and pattern matching code from
>>   dictionary and replace the added method patternSearch with something
>>   like findRange. I believe that matching and pattern handling code
>>   doesn't belong in the dictionary. I'd rather move this back to the
>>   category class or something.
>>
>> · Having to use a C++ string object to rebuild the longest
>>   constant prefix bugs me (suggestions?). I'm also thinking of having a
>>   version that doesn't support escaping, but it would force me to change
>>   strMatch a bit more
>>
>> · To closely match the previous behavior, you can't match an
>>   empty pattern (even the empty string doesn't match); maybe that would
>>   be worth changing
>>
>> As always John, feel free to include this into the main branch. I'm
>> waiting for suggestions to make it more efficient, cleaner, ...
>>
>> Thanks,
>>
>> Dominique Prunier
>> APG Lead Developper
>> 4388, rue Saint-Denis
>> Bureau 309
>> Montreal (Quebec) H2J 2L1
>> Tel. +1 514-842-6767 x310
>> Fax +1 514-842-3989
>> [email protected] <mailto:[email protected]>
>> www.watch4net.com <http://www.watch4net.com/>
>>
>> _______________________________________________
>> FastBit-users mailing list
>> [email protected]
>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
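
For reference, the approach described in the quoted patch summary (extract the constant prefix, binary search the dictionary for the matching range, then test each value in that range) could be sketched roughly as follows. This is not the FastBit code: constantPrefix, prefixRange, likeMatch and likeSearch are hypothetical names, the dictionary is assumed to be a sorted std::vector<std::string>, and '\' is assumed to be the escape character. Unlike the behavior described above, an empty pattern here matches the empty string.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: longest literal prefix of a LIKE pattern, stopping at
// the first unescaped wildcard ('%' or '_').
std::string constantPrefix(const std::string& pattern, char escape = '\\') {
    std::string prefix;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        const char c = pattern[i];
        if (c == escape && i + 1 < pattern.size()) {
            prefix += pattern[++i];  // escaped character is taken literally
        } else if (c == '%' || c == '_') {
            break;                   // first wildcard ends the constant prefix
        } else {
            prefix += c;
        }
    }
    return prefix;
}

// Hypothetical helper: half-open index range [first, last) of the entries of
// a sorted dictionary that start with `prefix`.
std::pair<std::size_t, std::size_t>
prefixRange(const std::vector<std::string>& dict, const std::string& prefix) {
    auto first = std::lower_bound(dict.begin(), dict.end(), prefix);
    // Entries sharing the prefix are contiguous from `first`, so a second
    // binary search finds the first entry that no longer starts with it.
    auto last = std::partition_point(first, dict.end(),
        [&prefix](const std::string& s) {
            return s.compare(0, prefix.size(), prefix) == 0;
        });
    return {static_cast<std::size_t>(first - dict.begin()),
            static_cast<std::size_t>(last - dict.begin())};
}

// Minimal case-sensitive LIKE matcher: '%' matches any run of characters,
// '_' matches exactly one. Simple recursion, not tuned for speed.
bool likeMatch(const char* s, const char* p, char escape = '\\') {
    if (*p == '\0') return *s == '\0';
    if (*p == '%') {
        // '%' matches nothing, or swallows one more character of s.
        return likeMatch(s, p + 1, escape) ||
               (*s != '\0' && likeMatch(s + 1, p, escape));
    }
    if (*s == '\0') return false;
    if (*p == '_') return likeMatch(s + 1, p + 1, escape);
    if (*p == escape && p[1] != '\0') ++p;  // next character is literal
    return *s == *p && likeMatch(s + 1, p + 1, escape);
}

// Returns the indices of the dictionary entries matching `pattern`, testing
// only the entries inside the prefix range.
std::vector<std::size_t> likeSearch(const std::vector<std::string>& dict,
                                    const std::string& pattern) {
    std::vector<std::size_t> hits;
    const auto range = prefixRange(dict, constantPrefix(pattern));
    for (std::size_t i = range.first; i < range.second; ++i) {
        if (likeMatch(dict[i].c_str(), pattern.c_str())) hits.push_back(i);
    }
    return hits;
}
```

When the pattern starts with a wildcard, the constant prefix is empty and the range covers the whole dictionary, so the sketch degrades to testing every value. When no entry shares the prefix, the range is empty and no pattern matching is done at all.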
