Hi, Dominique,

I thought I had checked the index types. If you happen to have the stack trace for the read operation, please let me know. Otherwise, it might take me a while to figure out a good way to reproduce the problem.
John

On 3/12/12 9:30 AM, Dominique Prunier wrote:
> Ok, figured out the other segfault. The index has to be regenerated with the
> change from relic to direkte. My guess is that it was reading something
> invalid. Is there a missing check in the index read method?
>
> Thanks,
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Dominique Prunier
> Sent: Monday, March 12, 2012 11:45 AM
> To: K. John Wu
> Cc: FastBit Users
> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
> match LIKE patterns case-sensitively and perform specific optimizations
>
> Hey John,
>
> The fix, as checked out in revision 484, breaks the binary search of the
> pattern prefix:
> - int32_t b = 0;
> - int32_t e = key_.size() - 1;
> + uint32_t b = 0;
> + uint32_t e = key_.size() - 1;
>
> Since the stop condition of the loop can be that one of the indices is -1,
> this now fails with a segfault.
>
> I'm troubleshooting another segfault in the bitvector right now (could it be
> related to the change in r 479?)
>
> Thanks,
>
> -----Original Message-----
> From: K. John Wu [mailto:[email protected]]
> Sent: Saturday, March 10, 2012 2:24 PM
> To: Dominique Prunier
> Cc: FastBit Users
> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
> match LIKE patterns case-sensitively and perform specific optimizations
>
> Just checked in the modification to allow users to define
> FASTBIT_CS_PATTERN_MATCH to 0 to disable case sensitive matches. The
> new SVN revision is 484.
>
> Also looked through other macros to make sure they are used consistently.
>
> John
>
>
> On 3/10/12 9:20 AM, Dominique Prunier wrote:
>> Hey John,
>>
>> I just noticed a small typo in util.h: the macro is called FASTBOT_... I
>> don't think it was intended, but it has the nice side effect of disabling
>> the new code by default, thus preserving the current behavior (case
>> insensitive).
>> Should we actually keep it in util.h now that it is documented in INSTALL?
>>
>> https://codeforge.lbl.gov/plugins/scmsvn/viewcvs.php/trunk/src/util.h?root=fastbit&r1=483&r2=482&pathrev=483
>>
>> Thanks,
>> ________________________________________
>> From: K. John Wu [[email protected]]
>> Sent: March-09-12 10:47 PM
>> To: Dominique Prunier
>> Cc: FastBit Users
>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>> match LIKE patterns case-sensitively and perform specific optimizations
>>
>> Hi, Dominique,
>>
>> I would like to add the FASTBIT_ prefix to the macro CS_PATTERN_MATCH to
>> avoid possible collisions when FastBit is used with other packages.
>> Hope you don't mind.
>>
>> John
>>
>>
>> On 3/9/12 4:03 PM, K. John Wu wrote:
>>> Hi, Dominique,
>>>
>>> I have run through my usual set of tests and did not find any problem
>>> with your patch. It is now in SVN 482. Please give it a try when you
>>> get the chance.
>>>
>>> Thanks.
>>>
>>> John
>>>
>>>
>>> On 3/9/12 10:17 AM, Dominique Prunier wrote:
>>>> Quick update to my patch:
>>>>
>>>> ·  Changed dictionary::patternMatch to make it work with CI too
>>>>    (and I think, for efficiency reasons, I have to keep all this here)
>>>>
>>>> ·  Moved the STR_MATCH_* constants from util.cpp to util.h and
>>>>    use them in dictionary::patternMatch
>>>>
>>>> ·  Removed the CS/CI ifdef from category.cpp
>>>>
>>>> I did more testing, and on my set of ~90 000 test queries, the
>>>> execution time dropped from ~515 seconds to ~20 seconds.
>>>>
>>>> Thanks,
>>>>
>>>> From: [email protected]
>>>> [mailto:[email protected]] On Behalf Of Dominique Prunier
>>>> Sent: Thursday, March 08, 2012 2:39 PM
>>>> To: FastBit Users
>>>> Subject: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>>
>>>> Here is the first version of my patch to switch SQL LIKE from case
>>>> insensitive to case sensitive and optimize this use case with CATEGORY
>>>> columns.
>>>>
>>>> In a nutshell, what changed is:
>>>>
>>>> ·  We extract the longest (handling the escape char too)
>>>>    constant prefix from the pattern
>>>>
>>>> ·  Instead of testing every value in the dictionary, we binary
>>>>    search the range of values to test (which sometimes even allows us
>>>>    to skip pattern matching if no valid range can be found)
>>>>
>>>> ·  We test every value in the range
>>>>
>>>> On a large dictionary (~130k entries), I’ve commonly seen it be one or
>>>> two orders of magnitude faster (in my example, a simple query with a
>>>> single LIKE predicate drops from ~10ms to ~0.4ms).
>>>>
>>>> What I’d like to change/refactor (I’m really a newbie in C++):
>>>>
>>>> ·  Remove the prefix extraction and pattern matching code from
>>>>    dictionary and replace the added method patternSearch by something
>>>>    like findRange. I believe that matching and pattern handling code
>>>>    doesn’t belong in the dictionary. I’d rather move this back to the
>>>>    category class or something.
>>>>
>>>> ·  Having to use a C++ string object to rebuild the longest
>>>>    constant prefix bugs me (suggestions?).
>>>>    I’m also thinking of having a version that doesn’t support escaping,
>>>>    but it would force me to change strMatch a bit more
>>>>
>>>> ·  To closely match the previous behavior, you can’t match an
>>>>    empty pattern (even the empty string doesn’t match); maybe that
>>>>    would be worth changing
>>>>
>>>> As always John, feel free to include this into the main branch. I’m
>>>> waiting for suggestions to make it more efficient, cleaner, ...
>>>>
>>>> Thanks,
>>>>
>>>> Dominique Prunier
>>>> APG Lead Developer
>>>> 4388, rue Saint-Denis
>>>> Bureau 309
>>>> Montreal (Quebec) H2J 2L1
>>>> Tel. +1 514-842-6767 x310
>>>> Fax +1 514-842-3989
>>>> [email protected] <mailto:[email protected]>
>>>> www.watch4net.com <http://www.watch4net.com/>
>>>>
>>>> This message is for the designated recipient only and may contain
>>>> privileged, proprietary, or otherwise private information. If you have
>>>> received it in error, please notify the sender immediately and delete
>>>> the original. Any other use of this electronic mail by you is prohibited.
>>>>
>>>> _______________________________________________
>>>> FastBit-users mailing list
>>>> [email protected]
>>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
