Hey John,

The fix, as checked out at revision 484, breaks the binary search of the pattern prefix:

-    int32_t b = 0;
-    int32_t e = key_.size() - 1;
+    uint32_t b = 0;
+    uint32_t e = key_.size() - 1;

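To illustrate the failure mode, here is a minimal sketch with hypothetical names (it is not the FastBit source, and `lastNotGreater` and `unsignedStepBelowZero` are made up for this example). A binary search with inclusive bounds can legitimately finish with its upper index at -1 when the target sorts before every key. With a signed index the loop simply exits; with `uint32_t` the same decrement wraps to 4294967295, the loop condition stays true, and the next probe lands far outside the array.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a sorted key array; not the FastBit dictionary.
// Returns the index of the last key that is <= target, or -1 if there is none.
int32_t lastNotGreater(const std::vector<std::string>& keys,
                       const std::string& target) {
    int32_t b = 0;
    int32_t e = static_cast<int32_t>(keys.size()) - 1;  // -1 for an empty array
    while (b <= e) {
        int32_t m = b + (e - b) / 2;
        assert(m >= 0 && static_cast<std::size_t>(m) < keys.size());
        if (keys[m] <= target) {
            b = m + 1;
        } else {
            e = m - 1;  // reaches -1 when target sorts before keys[0]
        }
    }
    return e;
}

// The same "step below zero" performed on an unsigned index: the result is
// well defined, but it is 2^32 - 1 rather than -1, so "b <= e" stays true.
uint32_t unsignedStepBelowZero() {
    uint32_t e = 0;
    e = e - 1;
    return e;
}
```

Calling `lastNotGreater` with a target that precedes every key returns -1, which is exactly the state an unsigned index cannot represent. Note that an empty key array hits the same wraparound before the loop even starts, since `size() - 1` is then also below zero.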
Since the loop can terminate with one of the indices equal to -1, this now fails with a segfault. I'm troubleshooting another segfault in the bitvector right now (could it be related to the change in r479?).

Thanks,

-----Original Message-----
From: K. John Wu [mailto:[email protected]]
Sent: Saturday, March 10, 2012 2:24 PM
To: Dominique Prunier
Cc: FastBit Users
Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to match LIKE patterns case-sensitively and perform specific optimizations

Just checked in the modification to allow users to define FASTBIT_CS_PATTERN_MATCH to 0 to disable case-sensitive matches. The new SVN revision is 484. Also looked through other macros to make sure they are used consistently.

John

On 3/10/12 9:20 AM, Dominique Prunier wrote:
> Hey John,
>
> I just noticed a small typo in util.h: the macro is called FASTBOT_... I
> don't think it was intended, but it has the nice side effect of disabling the
> new code by default, thus preserving the current behavior (case insensitive).
> Should we actually keep it in util.h now that it is documented in INSTALL?
>
> https://codeforge.lbl.gov/plugins/scmsvn/viewcvs.php/trunk/src/util.h?root=fastbit&r1=483&r2=482&pathrev=483
>
> Thanks,
> ________________________________________
> From: K. John Wu [[email protected]]
> Sent: March-09-12 10:47 PM
> To: Dominique Prunier
> Cc: FastBit Users
> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
> match LIKE patterns case-sensitively and perform specific optimizations
>
> Hi, Dominique,
>
> I would like to add the FASTBIT_ prefix to the macro CS_PATTERN_MATCH to
> avoid possible collisions when FastBit is used with other packages.
> Hope you don't mind.
>
> John
>
> On 3/9/12 4:03 PM, K. John Wu wrote:
>> Hi, Dominique,
>>
>> I have run through my usual set of tests and did not find any problem
>> with your patch. It is now in SVN 482. Please give it a try when you
>> get the chance.
>>
>> Thanks.
>>
>> John
>>
>> On 3/9/12 10:17 AM, Dominique Prunier wrote:
>>> Quick update to my patch:
>>>
>>> · Changed dictionary::patternMatch to make it work with CI too
>>>   (and I think, for efficiency reasons, I have to keep all this here)
>>>
>>> · Moved the STR_MATCH_* constants from util.cpp to util.h and
>>>   use them in dictionary::patternMatch
>>>
>>> · Removed the CS/CI ifdef from category.cpp
>>>
>>> I did more testing, and on my set of ~90 000 test queries, the
>>> execution time dropped from ~515 seconds to ~20 seconds.
>>>
>>> Thanks,
>>>
>>> From: [email protected]
>>> [mailto:[email protected]] On Behalf Of Dominique
>>> Prunier
>>> Sent: Thursday, March 08, 2012 2:39 PM
>>> To: FastBit Users
>>> Subject: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>
>>> Here is the first version of my patch to switch SQL LIKE from case
>>> insensitive to case sensitive and optimize this use case with CATEGORY
>>> columns.
>>>
>>> In a nutshell, what changed is:
>>>
>>> · We extract the longest constant prefix from the pattern
>>>   (handling the escape char too)
>>>
>>> · Instead of testing every value in the dictionary, we binary
>>>   search the range of values to test (which sometimes even allows us
>>>   to skip pattern matching entirely if no valid range can be found)
>>>
>>> · We test every value in the range
>>>
>>> On a large dictionary (~130k entries), I’ve commonly seen it be one or
>>> two orders of magnitude faster (in my example, a simple query with a
>>> single LIKE predicate drops from ~10ms to ~0.4ms).
>>>
>>> What I’d like to change/refactor (I’m really a newbie in C++):
>>>
>>> · Remove the prefix extraction and pattern matching code from
>>>   dictionary and replace the added method patternSearch by something
>>>   like findRange. I believe that matching and pattern handling code
>>>   doesn’t belong to the dictionary.
>>>   I’d rather move this back to the category class or something.
>>>
>>> · Having to use a C++ string object to rebuild the longest
>>>   constant prefix bugs me (suggestions?). I’m also thinking of having a
>>>   version that doesn’t support escaping, but it would force me to change
>>>   strMatch a bit more
>>>
>>> · To closely match the previous behavior, you can’t match an
>>>   empty pattern (even the empty string doesn’t match); maybe that would
>>>   be worth changing
>>>
>>> As always John, feel free to include this into the main branch. I’m
>>> waiting for suggestions to make it more efficient, cleaner, ...
>>>
>>> Thanks,
>>>
>>> Dominique Prunier
>>> APG Lead Developer
>>> 4388, rue Saint-Denis
>>> Bureau 309
>>> Montreal (Quebec) H2J 2L1
>>> Tel. +1 514-842-6767 x310
>>> Fax +1 514-842-3989
>>> [email protected] <mailto:[email protected]>
>>> www.watch4net.com <http://www.watch4net.com/>
>>>
>>> This message is for the designated recipient only and may contain
>>> privileged, proprietary, or otherwise private information. If you have
>>> received it in error, please notify the sender immediately and delete
>>> the original. Any other use of this electronic mail by you is prohibited.
>>>
>>> _______________________________________________
>>> FastBit-users mailing list
>>> [email protected]
>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
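The three steps described in the original patch message (extract the constant prefix, binary search the candidate range, test only that range) can be sketched as follows. This is an illustration under stated assumptions, not the code from the patch: the wildcard and escape characters are assumed SQL LIKE conventions that may differ from FastBit's STR_MATCH_* constants, the dictionary is modeled as a sorted `std::vector<std::string>`, and `constantPrefix` and `prefixRange` are hypothetical names. The final per-value pattern match is left to whatever matcher is already in use.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Assumed wildcard and escape characters (SQL LIKE conventions); the real
// FastBit STR_MATCH_* constants may differ.
const char ANY_STRING = '%';
const char ANY_CHAR = '_';
const char ESCAPE = '\\';

// Longest literal prefix of a LIKE pattern, with escape sequences resolved.
std::string constantPrefix(const std::string& pattern) {
    std::string prefix;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        char c = pattern[i];
        if (c == ANY_STRING || c == ANY_CHAR) break;  // first wildcard ends it
        if (c == ESCAPE) {
            if (i + 1 >= pattern.size()) break;       // dangling escape: stop
            c = pattern[++i];                         // take next char literally
        }
        prefix.push_back(c);
    }
    return prefix;
}

// Half-open index range [first, last) of the keys that start with prefix.
// An empty prefix selects the whole dictionary; first == last means no
// key can match, so pattern matching can be skipped altogether.
std::pair<std::size_t, std::size_t>
prefixRange(const std::vector<std::string>& sortedKeys,
            const std::string& prefix) {
    assert(std::is_sorted(sortedKeys.begin(), sortedKeys.end()));
    // Keys sharing the prefix are contiguous and begin at the first key
    // that is not less than the prefix itself.
    auto first = std::lower_bound(sortedKeys.begin(), sortedKeys.end(), prefix);
    auto last = std::partition_point(first, sortedKeys.end(),
        [&prefix](const std::string& key) {
            return key.compare(0, prefix.size(), prefix) == 0;
        });
    return {static_cast<std::size_t>(first - sortedKeys.begin()),
            static_cast<std::size_t>(last - sortedKeys.begin())};
}
```

For the pattern `ban%` over a dictionary holding apple, apricot, banana, band, bandit and cherry, only the three entries at indices 2 to 4 would need to go through the pattern matcher. One design note: letting `std::lower_bound` and `std::partition_point` manage the bounds avoids hand-written index arithmetic, so the question of a signed versus unsigned index reaching -1 does not arise in this version.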
