Yes, you are absolutely right. It should be if (read(...) < 0 || ...)
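
For illustration, here is a minimal sketch of why the two conditions behave so differently. It assumes, as the thread suggests but does not spell out, that the read routine returns a negative value on failure and that the branch guarded by the condition is the one that rebuilds the index. The names readIndex, rebuildBuggy and rebuildFixed are hypothetical and are not FastBit functions; the second clause of the corrected condition is left out because it is elided above.

```cpp
#include <string>

// Hypothetical stand-in for an index read routine (not FastBit's API):
// returns a negative value on failure and a non-negative value on success.
int readIndex(const std::string& path) {
    return path.empty() ? -1 : 0;
}

// Condition as quoted in the thread: it is true whenever the read SUCCEEDS,
// so a rebuild guarded by it would run on every start.
bool rebuildBuggy(const std::string& path) {
    return 0 <= readIndex(path);
}

// Corrected form (first clause only): rebuild only when the read fails.
bool rebuildFixed(const std::string& path) {
    return readIndex(path) < 0;
}
```

With a readable index file the first predicate asks for a rebuild and the second does not, which would match the "always regenerating" symptom reported below.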
This problem is corrected in SVN 489. Let me know if you find something else.

John

On 3/13/12 2:55 PM, Dominique Prunier wrote:
> Hmm, seems like it is related to if (0 <=
> static_cast<ibis::direkte*>(idx)->read(idxf.c_str()) on line 185 of
> category.cpp. Shouldn't it be 0!=read(..) instead of 0<=read(..) ?
>
> Thanks,
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Dominique Prunier
> Sent: Tuesday, March 13, 2012 5:16 PM
> To: K. John Wu
> Cc: FastBit Users
> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
> match LIKE patterns case-sensitively and perform specific optimizations
>
> Hey John,
>
> No segfault anymore, but now it seems that it is always
> regenerating the index :/
>
> Thanks,
>
> -----Original Message-----
> From: K. John Wu [mailto:[email protected]]
> Sent: Tuesday, March 13, 2012 4:44 PM
> To: Dominique Prunier
> Cc: FastBit Users
> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
> match LIKE patterns case-sensitively and perform specific optimizations
>
> Hi, Dominique,
>
> Just checked in SVN version 488. Please give it a try when you get
> the chance. Thanks.
>
> John
>
>
> On 3/13/12 11:58 AM, Dominique Prunier wrote:
>> Cool. I'll test it right after you commit it.
>> While we're fixing this, I think there is a memory leak in void
>> ibis::category::prepareMembers(). The ibis::fileManager::storage *st (on
>> line 182) is not freed by index, direkte or category (index just nullifies
>> the pointer). Not sure who should free it.
>>
>> Thanks,
>>
>> -----Original Message-----
>> From: K. John Wu [mailto:[email protected]]
>> Sent: Tuesday, March 13, 2012 2:53 PM
>> To: Dominique Prunier
>> Cc: FastBit Users
>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>> match LIKE patterns case-sensitively and perform specific optimizations
>>
>> Hi, Dominique,
>>
>> I think I know where the problem is -- during the process of
>> recreating the new index, the old file was not cleaned up properly. I
>> have implemented a fix and am doing some testing on it. Will check in
>> the code as soon as I am comfortable that I have not broken anything
>> with the new changes.
>>
>> John
>>
>>
>> On 3/13/12 11:48 AM, Dominique Prunier wrote:
>>> By the way, I checked index creation and it doesn't exhibit the issue. The
>>> only way to reproduce is to use an old indexed partition (category columns)
>>> and run revision 487 on it. It seems that something bad happens during
>>> the conversion and makes the cleanup crash.
>>>
>>> -----Original Message-----
>>> From: [email protected]
>>> [mailto:[email protected]] On Behalf Of Dominique Prunier
>>> Sent: Tuesday, March 13, 2012 12:42 PM
>>> To: FastBit Users; K. John Wu
>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>
>>> John,
>>>
>>> Seems like the segfault appears in the cleaning methods of the file manager:
>>>
>>> ==5046== Warning: set address range perms: large range [0x6f3f030, 0x1f5df050) (noaccess)
>>> ==5046== Invalid read of size 4
>>> ==5046==    at 0x50C8BE0: ibis::util::sharedInt32::operator()() const (util.h:901)
>>> ==5046==    by 0x50C8E37: ibis::fileManager::storage::inUse() const (fileManager.h:259)
>>> ==5046==    by 0x5669911: ibis::fileManager::unload(unsigned long) (fileManager.cpp:1259)
>>> ==5046==    by 0x5664C68: ibis::fileManager::clear() (fileManager.cpp:408)
>>> ==5046==    by 0x5665B98: ibis::fileManager::~fileManager() (fileManager.cpp:654)
>>> ==5046==    by 0x5C253B0: __run_exit_handlers (in /lib64/libc-2.13.so)
>>> ==5046==    by 0x5C25404: exit (in /lib64/libc-2.13.so)
>>> ==5046==    by 0x5C0F0A3: (below main) (in /lib64/libc-2.13.so)
>>> ==5046==  Address 0x6f3aec4 is not stack'd, malloc'd or (recently) free'd
>>> ...
>>> ==5046== Invalid read of size 4
>>> ==5046==    at 0x52E9396: ibis::fileManager::storage::pastUse() const (fileManager.h:261)
>>> ==5046==    by 0x56699FF: ibis::fileManager::unload(unsigned long) (fileManager.cpp:1266)
>>> ==5046==    by 0x5664C68: ibis::fileManager::clear() (fileManager.cpp:408)
>>> ==5046==    by 0x5665B98: ibis::fileManager::~fileManager() (fileManager.cpp:654)
>>> ==5046==    by 0x5C253B0: __run_exit_handlers (in /lib64/libc-2.13.so)
>>> ==5046==    by 0x5C25404: exit (in /lib64/libc-2.13.so)
>>> ==5046==    by 0x5C0F0A3: (below main) (in /lib64/libc-2.13.so)
>>> ==5046==  Address 0x211c9570 is not stack'd, malloc'd or (recently) free'd
>>> ...
>>> ==5046== Invalid read of size 8 >>> ==5046== at 0x5665068: ibis::fileManager::clear() (fileManager.cpp:444) >>> ==5046== by 0x5665B98: ibis::fileManager::~fileManager() >>> (fileManager.cpp:654) >>> ==5046== by 0x5C253B0: __run_exit_handlers (in /lib64/libc-2.13.so) >>> ==5046== by 0x5C25404: exit (in /lib64/libc-2.13.so) >>> ==5046== by 0x5C0F0A3: (below main) (in /lib64/libc-2.13.so) >>> ==5046== Address 0x1c is not stack'd, malloc'd or (recently) free'd... >>> ==5046== >>> ==5046== >>> ==5046== Process terminating with default action of signal 11 (SIGSEGV) >>> ==5046== Access not within mapped region at address 0x1C >>> ==5046== at 0x5665068: ibis::fileManager::clear() (fileManager.cpp:444) >>> ==5046== by 0x5665B98: ibis::fileManager::~fileManager() >>> (fileManager.cpp:654) >>> ==5046== by 0x5C253B0: __run_exit_handlers (in /lib64/libc-2.13.so) >>> ==5046== by 0x5C25404: exit (in /lib64/libc-2.13.so) >>> ==5046== by 0x5C0F0A3: (below main) (in /lib64/libc-2.13.so) >>> ==5046== If you believe this happened as a result of a stack >>> ==5046== overflow in your program's main thread (unlikely but >>> ==5046== possible), you can try to increase the size of the >>> ==5046== main thread stack using the --main-stacksize= flag. >>> ==5046== The main thread stack size used in this run was 8388608. >>> >>> The second execution of the same program works fine, so it has to be >>> related to index creation/recreation. >>> >>> Thanks, >>> >>> -----Original Message----- >>> From: [email protected] >>> [mailto:[email protected]] On Behalf Of Dominique Prunier >>> Sent: Tuesday, March 13, 2012 10:51 AM >>> To: K. John Wu >>> Cc: FastBit Users >>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to >>> match LIKE patterns case-sensitively and perform specific optimizations >>> >>> Hey John, >>> >>> It seems to work just fine now, it seamlessly recreated indexes on my old >>> partition. 
>>> However, i'm having a segfault at the end of the first execution (the one >>> that converted the index). >>> I'll investigate this and tell you what i find. >>> >>> Thanks, >>> >>> -----Original Message----- >>> From: K. John Wu [mailto:[email protected]] >>> Sent: Monday, March 12, 2012 6:39 PM >>> To: Dominique Prunier >>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to >>> match LIKE patterns case-sensitively and perform specific optimizations >>> >>> Just added checking to make sure the index type to the functions >>> caused your problem (a read function that directly works with >>> ibis::fileManager::storage). It should now automatically override all >>> the relics with direktes. I am testing the code now. The source code >>> is SVN 487. >>> >>> John >>> >>> >>> On 3/12/12 1:49 PM, Dominique Prunier wrote: >>>> No problem. Do we want to do something about the migration from relic to >>>> direkte ? >>>> >>>> -----Original Message----- >>>> From: K. John Wu [mailto:[email protected]] >>>> Sent: Monday, March 12, 2012 2:21 PM >>>> To: Dominique Prunier >>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to >>>> match LIKE patterns case-sensitively and perform specific optimizations >>>> >>>> Thanks for the confirmation. >>>> >>>> John >>>> >>>> >>>> On 3/12/12 11:07 AM, Dominique Prunier wrote: >>>>> It seems to work just fine for me in r486. >>>>> >>>>> -----Original Message----- >>>>> From: K. John Wu [mailto:[email protected]] >>>>> Sent: Monday, March 12, 2012 1:51 PM >>>>> To: Dominique Prunier >>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to >>>>> match LIKE patterns case-sensitively and perform specific optimizations >>>>> >>>>> If you have verified the answers are the same as before, then we don't >>>>> have a off-by-1 problem. At this point, I have not done that. Let me >>>>> know if have. 
>>>>> >>>>> John >>>>> >>>>> >>>>> On 3/12/12 10:48 AM, Dominique Prunier wrote: >>>>>> Hmm, the original patch was working fairly well. I tried a couple of >>>>>> limits (range empty, start and/or ends at first value and/or last >>>>>> value). I didn't noticed any other change. Are you talking about this or >>>>>> the segfault in direkte ? >>>>>> >>>>>> -----Original Message----- >>>>>> From: K. John Wu [mailto:[email protected]] >>>>>> Sent: Monday, March 12, 2012 1:41 PM >>>>>> To: Dominique Prunier >>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to >>>>>> match LIKE patterns case-sensitively and perform specific optimizations >>>>>> >>>>>> Just hid release 1.2.9, there might be an off-by-1 problem as well. >>>>>> Need to dig deeper.. >>>>>> >>>>>> John >>>>>> >>>>>> >>>>>> On 3/12/12 10:13 AM, Dominique Prunier wrote: >>>>>>> Yep, but i bet you changed them for a reason, maybe a compile warning >>>>>>> or something (they were int32_t in my patch). Array_t indexes are >>>>>>> definitely uints_32. Reverting them to int32_t works but maybe it would >>>>>>> worth thinking about it. >>>>>>> >>>>>>> About the segfault, i captured an example of valgrind error (which >>>>>>> ultimately leads to a segfault). As you can see, it is during query >>>>>>> evaluation, not when reading the index. 
>>>>>>> >>>>>>> ==5672== Invalid read of size 4 >>>>>>> ==5672== at 0x5414F60: ibis::bitvector::or_d1(ibis::bitvector >>>>>>> const&) (bitvector.cpp:2934) >>>>>>> ==5672== by 0x541C622: ibis::bitvector::operator|=(ibis::bitvector >>>>>>> const&) (bitvector.cpp:1272) >>>>>>> ==5672== by 0x52DA4A4: ibis::index::sumBins(unsigned int, unsigned >>>>>>> int, ibis::bitvector&) const (index.cpp:6183) >>>>>>> ==5672== by 0x55D097D: >>>>>>> ibis::direkte::evaluate(ibis::qContinuousRange const&, >>>>>>> ibis::bitvector&) const (idirekte.cpp:1071) >>>>>>> ==5672== by 0x5517FB3: ibis::category::stringSearch(char const*, >>>>>>> ibis::bitvector&) const (category.cpp:390) >>>>>>> ==5672== by 0x5276C41: ibis::query::doEvaluate(ibis::qExpr const*, >>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3948) >>>>>>> ==5672== by 0x52770E7: ibis::query::doEvaluate(ibis::qExpr const*, >>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3779) >>>>>>> ==5672== by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, >>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776) >>>>>>> ==5672== by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, >>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776) >>>>>>> ==5672== by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, >>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776) >>>>>>> ==5672== by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, >>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776) >>>>>>> ==5672== by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, >>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776) >>>>>>> ==5672== Address 0x9416f20 is 0 bytes after a block of size 354,720 >>>>>>> alloc'd >>>>>>> ==5672== at 0x4C28C6D: malloc (in >>>>>>> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) >>>>>>> ==5672== by 0x5445F24: ibis::fileManager::storage::storage(unsigned >>>>>>> long) 
(fileManager.cpp:1718) >>>>>>> ==5672== by 0x54466D9: ibis::fileManager::storage::enlarge(unsigned >>>>>>> long) (fileManager.cpp:1977) >>>>>>> ==5672== by 0x51A73B1: ibis::array_t<unsigned int>::resize(unsigned >>>>>>> long) (array_t.cpp:1412) >>>>>>> ==5672== by 0x54119FA: ibis::bitvector::decompress() >>>>>>> (bitvector.cpp:364) >>>>>>> ==5672== by 0x52DA45B: ibis::index::sumBins(unsigned int, unsigned >>>>>>> int, ibis::bitvector&) const (index.cpp:6180) >>>>>>> ==5672== by 0x55D097D: >>>>>>> ibis::direkte::evaluate(ibis::qContinuousRange const&, >>>>>>> ibis::bitvector&) const (idirekte.cpp:1071) >>>>>>> ==5672== by 0x5517FB3: ibis::category::stringSearch(char const*, >>>>>>> ibis::bitvector&) const (category.cpp:390) >>>>>>> ==5672== by 0x5276C41: ibis::query::doEvaluate(ibis::qExpr const*, >>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3948) >>>>>>> ==5672== by 0x52770E7: ibis::query::doEvaluate(ibis::qExpr const*, >>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3779) >>>>>>> ==5672== by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, >>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776) >>>>>>> ==5672== by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, >>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776) >>>>>>> >>>>>>> I'm not able to reproduce on a simplistic use case. Not sure exactly >>>>>>> what triggers this. Here again, this is not dramatic since i just have >>>>>>> to regenerate my indexes but i'm wondering if there were a way to catch >>>>>>> this (i'm thinking about people upgrading from a version prior to >>>>>>> 1.2.9). >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: K. 
John Wu [mailto:[email protected]] >>>>>>> Sent: Monday, March 12, 2012 1:02 PM >>>>>>> To: Dominique Prunier >>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added >>>>>>> to match LIKE patterns case-sensitively and perform specific >>>>>>> optimizations >>>>>>> >>>>>>> Hi, Dominique, >>>>>>> >>>>>>> Let me just confirm that the two lines where the change from int32_t >>>>>>> to uint32_t should be reversed are line 458 and 459 of dictionary.cpp, >>>>>>> right? >>>>>>> >>>>>>> John >>>>>>> >>>>>>> >>>>>>> On 3/12/12 9:52 AM, Dominique Prunier wrote: >>>>>>>> Hey John, >>>>>>>> >>>>>>>> The problem is that it doesn't actually fail when reading the index. >>>>>>>> The index is read but during the evaluation, i have segfaults, bogus >>>>>>>> results or valgrind errors. Once i regenerated the indexes for my >>>>>>>> category column, everything worked liked a charm. >>>>>>> >>>>>>>> >>>>>>>> It was also misleading because of the other issue (unsigned ints that >>>>>>>> should have been signed ints) that segfaulted too. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: K. John Wu [mailto:[email protected]] >>>>>>>> Sent: Monday, March 12, 2012 12:43 PM >>>>>>>> To: Dominique Prunier >>>>>>>> Cc: FastBit Users >>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added >>>>>>>> to match LIKE patterns case-sensitively and perform specific >>>>>>>> optimizations >>>>>>>> >>>>>>>> Hi, Dominique, >>>>>>>> >>>>>>>> I thought that I have checked index types. If you happen to know the >>>>>>>> stack trace for the reading operation, let me know. Otherwise, it >>>>>>>> might take me a while to figure out a good way to reproduce the >>>>>>>> problem.. >>>>>>>> >>>>>>>> John >>>>>>>> >>>>>>>> >>>>>>>> On 3/12/12 9:30 AM, Dominique Prunier wrote: >>>>>>>>> Ok, figured out the other segfault. The index have to be regenerated >>>>>>>>> with the change from relic to direkte. 
My guess is that it was >>>>>>>>> reading something invalid. Is there a missing check in the index read >>>>>>>>> method ? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: [email protected] >>>>>>>>> [mailto:[email protected]] On Behalf Of Dominique >>>>>>>>> Prunier >>>>>>>>> Sent: Monday, March 12, 2012 11:45 AM >>>>>>>>> To: K. John Wu >>>>>>>>> Cc: FastBit Users >>>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added >>>>>>>>> to match LIKE patterns case-sensitively and perform specific >>>>>>>>> optimizations >>>>>>>>> >>>>>>>>> Hey John, >>>>>>>>> >>>>>>>>> The fix, as checked out in the revision 484 breaks the binary search >>>>>>>>> of the pattern prefix: >>>>>>>>> - int32_t b = 0; >>>>>>>>> - int32_t e = key_.size() - 1; >>>>>>>>> + uint32_t b = 0; >>>>>>>>> + uint32_t e = key_.size() - 1; >>>>>>>>> >>>>>>>>> Since the stop condition of the loop can be that one of the index is >>>>>>>>> -1, this now fails with a segfault. >>>>>>>>> >>>>>>>>> I'm troubleshooting another segfault in the bitvector right now >>>>>>>>> (could it be related to the change in r 479 ?) >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: K. John Wu [mailto:[email protected]] >>>>>>>>> Sent: Saturday, March 10, 2012 2:24 PM >>>>>>>>> To: Dominique Prunier >>>>>>>>> Cc: FastBit Users >>>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added >>>>>>>>> to match LIKE patterns case-sensitively and perform specific >>>>>>>>> optimizations >>>>>>>>> >>>>>>>>> Just checked in the modification to allow users to define >>>>>>>>> FASTBIT_CS_PATTERN_MATCH to 0 to disable case sensitive matches. The >>>>>>>>> new SVN revision is 484. >>>>>>>>> >>>>>>>>> Also looked through other macros to make sure they are used >>>>>>>>> consistently. 
>>>>>>>>> >>>>>>>>> John >>>>>>>>> >>>>>>>>> >>>>>>>>> On 3/10/12 9:20 AM, Dominique Prunier wrote: >>>>>>>>>> Hey John, >>>>>>>>>> >>>>>>>>>> I just noticed a small typo in utils.h, the macro is called >>>>>>>>>> FASTBOT_... I don't think it was expected but it has the nice side >>>>>>>>>> effect of disabling new code by default thus preserving current >>>>>>>>>> behavior (case insensitive). Should we actually keep it in util.h >>>>>>>>>> now that it is documented in INSTALL ? >>>>>>>>>> >>>>>>>>>> https://codeforge.lbl.gov/plugins/scmsvn/viewcvs.php/trunk/src/util.h?root=fastbit&r1=483&r2=482&pathrev=483 >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> ________________________________________ >>>>>>>>>> From: K. John Wu [[email protected]] >>>>>>>>>> Sent: March-09-12 10:47 PM >>>>>>>>>> To: Dominique Prunier >>>>>>>>>> Cc: FastBit Users >>>>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define >>>>>>>>>> added to match LIKE patterns case-sensitively and perform specific >>>>>>>>>> optimizations >>>>>>>>>> >>>>>>>>>> Hi, Dominique, >>>>>>>>>> >>>>>>>>>> I would like to add FASTBIT_ prefix to the macro CS_PATTERN_MATCH to >>>>>>>>>> avoid possible collision when FastBit is used with other package. >>>>>>>>>> Hope you don't mind. >>>>>>>>>> >>>>>>>>>> John >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 3/9/12 4:03 PM, K. John Wu wrote: >>>>>>>>>>> Hi, Dominique, >>>>>>>>>>> >>>>>>>>>>> I have run through my usual set of tests and did not find any >>>>>>>>>>> problem >>>>>>>>>>> with your patch. It is now in SVN 482. Please give it a try when >>>>>>>>>>> you >>>>>>>>>>> get the chance. >>>>>>>>>>> >>>>>>>>>>> Thanks. 
>>>>>>>>>>>
>>>>>>>>>>> John
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 3/9/12 10:17 AM, Dominique Prunier wrote:
>>>>>>>>>>>> Quick update to my patch:
>>>>>>>>>>>>
>>>>>>>>>>>> · Changed dictionary::patternMatch to make it work with CI too
>>>>>>>>>>>>   (and I think, for efficiency reasons, I have to keep all this here)
>>>>>>>>>>>> · Moved the STR_MATCH_* constants from util.cpp to util.h and
>>>>>>>>>>>>   use them in dictionary::patternMatch
>>>>>>>>>>>> · Removed the CS/CI ifdef from category.cpp
>>>>>>>>>>>>
>>>>>>>>>>>> I did more testing, and on my set of ~90 000 test queries, the
>>>>>>>>>>>> execution time dropped from ~515 seconds to ~20 seconds.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> From: [email protected]
>>>>>>>>>>>> [mailto:[email protected]] On Behalf Of Dominique Prunier
>>>>>>>>>>>> Sent: Thursday, March 08, 2012 2:39 PM
>>>>>>>>>>>> To: FastBit Users
>>>>>>>>>>>> Subject: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>>>>>>>>>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>>>>>>>>>>
>>>>>>>>>>>> Here is the first version of my patch to switch SQL LIKE from
>>>>>>>>>>>> case-insensitive to case-sensitive matching and to optimize this use
>>>>>>>>>>>> case for CATEGORY columns.
>>>>>>>>>>>>
>>>>>>>>>>>> In a nutshell, what changed is:
>>>>>>>>>>>>
>>>>>>>>>>>> · We extract the longest constant prefix from the pattern
>>>>>>>>>>>>   (handling the escape char too)
>>>>>>>>>>>> · Instead of testing every value in the dictionary, we binary
>>>>>>>>>>>>   search the range of values to test (which sometimes even allows
>>>>>>>>>>>>   skipping pattern matching entirely if no valid range can be found)
>>>>>>>>>>>> · We test every value in the range
>>>>>>>>>>>>
>>>>>>>>>>>> On a large dictionary (~130k entries), I've commonly seen it be one
>>>>>>>>>>>> or two orders of magnitude faster (in my example, a simple query with
>>>>>>>>>>>> a single LIKE predicate drops from ~10ms to ~0.4ms).
>>>>>>>>>>>>
>>>>>>>>>>>> What I'd like to change/refactor (I'm really a newbie in C++):
>>>>>>>>>>>>
>>>>>>>>>>>> · Remove the prefix extraction and pattern matching code from
>>>>>>>>>>>>   dictionary and replace the added method patternSearch by something
>>>>>>>>>>>>   like findRange. I believe that matching and pattern handling code
>>>>>>>>>>>>   doesn't belong in the dictionary. I'd rather move this back to the
>>>>>>>>>>>>   category class or something.
>>>>>>>>>>>> · Having to use a C++ string object to rebuild the longest
>>>>>>>>>>>>   constant prefix bugs me (suggestions?). I'm also thinking of having
>>>>>>>>>>>>   a version that doesn't support escaping, but it would force me to
>>>>>>>>>>>>   change strMatch a bit more
>>>>>>>>>>>> · To closely match the previous behavior, you can't match an
>>>>>>>>>>>>   empty pattern (even the empty string doesn't match); maybe that
>>>>>>>>>>>>   would be worth changing
>>>>>>>>>>>>
>>>>>>>>>>>> As always John, feel free to include this into the main branch. I'm
>>>>>>>>>>>> waiting for suggestions to make it more efficient, cleaner, ...
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Dominique Prunier
>>>>>>>>>>>> APG Lead Developer
>>>>>>>>>>>> 4388, rue Saint-Denis
>>>>>>>>>>>> Bureau 309
>>>>>>>>>>>> Montreal (Quebec) H2J 2L1
>>>>>>>>>>>> Tel. +1 514-842-6767 x310
>>>>>>>>>>>> Fax +1 514-842-3989
>>>>>>>>>>>> [email protected]
>>>>>>>>>>>> www.watch4net.com <http://www.watch4net.com/>
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> FastBit-users mailing list
>>>>>>>>>>>> [email protected]
>>>>>>>>>>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users

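
For readers following the thread, the prefix range search described in the original patch message, together with the int32_t versus uint32_t point raised afterwards, can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the patch: prefixRange is a hypothetical helper, the dictionary is assumed to be a sorted std::vector of strings, and escape handling and the final pattern test on each candidate are left out.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not FastBit's API): returns the half-open range
// [first, last) of entries in a sorted key list that start with `prefix`.
std::pair<int32_t, int32_t> prefixRange(const std::vector<std::string>& keys,
                                        const std::string& prefix) {
    const int32_t n = static_cast<int32_t>(keys.size());

    // Lower bound: first key whose leading characters are >= prefix.
    int32_t b = 0;
    int32_t e = n - 1;  // -1 for an empty list, hence a signed type
    while (b <= e) {
        const int32_t m = b + (e - b) / 2;
        if (keys[m].compare(0, prefix.size(), prefix) < 0)
            b = m + 1;
        else
            e = m - 1;  // becomes -1 when the range starts at index 0
    }
    const int32_t first = b;

    // Upper bound: first key whose leading characters are > prefix.
    e = n - 1;
    while (b <= e) {
        const int32_t m = b + (e - b) / 2;
        if (keys[m].compare(0, prefix.size(), prefix) <= 0)
            b = m + 1;
        else
            e = m - 1;
    }
    return {first, b};
}
```

With unsigned indices, the assignment e = m - 1 at m == 0 would wrap around to a very large value instead of -1, so the loop condition would stay true and the next probe would read out of bounds. That would be consistent with the segfault reported after the change in revision 484. Only the keys inside the returned range still need to be tested against the full pattern.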