I'm building a nice test suite, but it is Java-based, so I'm not sure it would be of much help to you...
-----Original Message-----
From: K. John Wu [mailto:[email protected]]
Sent: Thursday, March 15, 2012 1:31 AM
To: Dominique Prunier
Cc: FastBit Users
Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to match LIKE patterns case-sensitively and perform specific optimizations

Hi, Dominique,

Thanks for the fix. I have run through my tests. Clearly, I don't have a test case that can exercise this feature right now. I will have to think about how to get a few in there.

Let me know if you find anything else.

John

On 3/14/12 6:45 PM, Dominique Prunier wrote:
> Hey John,
>
> I just noticed that the pattern search was badly broken. I have no idea how
> I didn't catch that earlier: it was not using the proper index value. If
> I'm not mistaken, this should fix it:
>
> diff --git a/src/category.cpp b/src/category.cpp
> index ef3dc77..61039f9 100644
> --- a/src/category.cpp
> +++ b/src/category.cpp
> @@ -618,7 +618,7 @@ long ibis::category::patternSearch(const char *pat) const
>  {
>      std::auto_ptr< ibis::array_t<uint32_t> > tmp(new ibis::array_t<uint32_t>);
>      dic.patternSearch(pat, *tmp);
>      for (uint32_t j = 0; j < tmp->size(); ++ j) {
> -        const ibis::bitvector *bv = rlc->getBitvector(j);
> +        const ibis::bitvector *bv = rlc->getBitvector((*tmp)[j]);
>          if (bv != 0)
>              est += bv->cnt();
>      }
> @@ -658,7 +658,7 @@ long ibis::category::patternSearch(const char *pat,
>      std::auto_ptr< ibis::array_t<uint32_t> > tmp(new ibis::array_t<uint32_t>);
>      dic.patternSearch(pat, *tmp);
>      for (uint32_t j = 0; j < tmp->size(); ++ j) {
> -        const ibis::bitvector *bv = rlc->getBitvector(j);
> +        const ibis::bitvector *bv = rlc->getBitvector((*tmp)[j]);
>          if (bv != 0) {
>              ++ cnt;
>              est += bv->cnt();
>
> I was probably testing against an older version of the library.
>
> Thanks,
>
> -----Original Message-----
> From: K. John Wu [mailto:[email protected]]
> Sent: Wednesday, March 14, 2012 9:18 PM
> To: Dominique Prunier
> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
> match LIKE patterns case-sensitively and perform specific optimizations
>
> Hi, Dominique,
>
> I have just checked in a couple of minor changes. It would be good to
> put out a stable version to replace the broken version that was taken
> down.
>
> Please let me know if you find anything that still needs attention.
>
> Thanks.
>
> John
>
>
> On 3/14/12 7:14 AM, Dominique Prunier wrote:
>> Seems to work for me. I'll do further testing; I'd like to isolate a stable
>> version sometime this week.
>>
>> Thanks,
>>
>> -----Original Message-----
>> From: K. John Wu [mailto:[email protected]]
>> Sent: Tuesday, March 13, 2012 7:23 PM
>> To: Dominique Prunier
>> Cc: FastBit Users
>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>> match LIKE patterns case-sensitively and perform specific optimizations
>>
>> Yes, you are absolutely right. It should be
>>
>> if (read(...) < 0 || ...)
>>
>> This problem is corrected in SVN 489. Let me know if you find
>> something else.
>>
>> John
>>
>>
>> On 3/13/12 2:55 PM, Dominique Prunier wrote:
>>> Hmm, seems like it is related to if (0 <=
>>> static_cast<ibis::direkte*>(idx)->read(idxf.c_str()) on line 185 of
>>> category.cpp. Shouldn't it be 0!=read(..) instead of 0<=read(..) ?
>>>
>>> Thanks,
>>>
>>> -----Original Message-----
>>> From: [email protected]
>>> [mailto:[email protected]] On Behalf Of Dominique Prunier
>>> Sent: Tuesday, March 13, 2012 5:16 PM
>>> To: K. John Wu
>>> Cc: FastBit Users
>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>
>>> Hey John,
>>>
>>> No segfault anymore, but now it seems that it is regenerating the
>>> index every time :/
>>>
>>> Thanks,
>>>
>>> -----Original Message-----
>>> From: K. John Wu [mailto:[email protected]]
>>> Sent: Tuesday, March 13, 2012 4:44 PM
>>> To: Dominique Prunier
>>> Cc: FastBit Users
>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>
>>> Hi, Dominique,
>>>
>>> Just checked in SVN version 488. Please give it a try when you get
>>> the chance. Thanks.
>>>
>>> John
>>>
>>>
>>> On 3/13/12 11:58 AM, Dominique Prunier wrote:
>>>> Cool. I'll test it right after you commit it.
>>>> While we're fixing this, I think there is a memory leak in void
>>>> ibis::category::prepareMembers(). The ibis::fileManager::storage *st (on
>>>> line 182) is not freed by index, direkte or category (index just nullifies
>>>> the pointer). Not sure who should free it.
>>>>
>>>> Thanks,
>>>>
>>>> -----Original Message-----
>>>> From: K. John Wu [mailto:[email protected]]
>>>> Sent: Tuesday, March 13, 2012 2:53 PM
>>>> To: Dominique Prunier
>>>> Cc: FastBit Users
>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>>
>>>> Hi, Dominique,
>>>>
>>>> I think I know where the problem is: during the process of
>>>> recreating the new index, the old file was not cleaned up properly. I
>>>> have implemented a fix and am doing some testing on it. Will check in
>>>> the code as soon as I am comfortable that I have not broken anything
>>>> with the new changes.
>>>> >>>> John >>>> >>>> >>>> On 3/13/12 11:48 AM, Dominique Prunier wrote: >>>>> By the way, i checked index creation and it doesn't exhibit the issue. >>>>> The only way to reproduce is to use an old indexed partition (category >>>>> columns) and run the revision 487 on it. It seems that something bad >>>>> happens during the conversion and make the cleanup crash. >>>>> >>>>> -----Original Message----- >>>>> From: [email protected] >>>>> [mailto:[email protected]] On Behalf Of Dominique >>>>> Prunier >>>>> Sent: Tuesday, March 13, 2012 12:42 PM >>>>> To: FastBit Users; K. John Wu >>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to >>>>> match LIKE patterns case-sensitively and perform specific optimizations >>>>> >>>>> John, >>>>> >>>>> Seems like the segfault appears in the cleaning methods of the file >>>>> manager: >>>>> >>>>> ==5046== Warning: set address range perms: large range [0x6f3f030, >>>>> 0x1f5df050) (noaccess) >>>>> ==5046== Invalid read of size 4 >>>>> ==5046== at 0x50C8BE0: ibis::util::sharedInt32::operator()() const >>>>> (util.h:901) >>>>> ==5046== by 0x50C8E37: ibis::fileManager::storage::inUse() const >>>>> (fileManager.h:259) >>>>> ==5046== by 0x5669911: ibis::fileManager::unload(unsigned long) >>>>> (fileManager.cpp:1259) >>>>> ==5046== by 0x5664C68: ibis::fileManager::clear() (fileManager.cpp:408) >>>>> ==5046== by 0x5665B98: ibis::fileManager::~fileManager() >>>>> (fileManager.cpp:654) >>>>> ==5046== by 0x5C253B0: __run_exit_handlers (in /lib64/libc-2.13.so) >>>>> ==5046== by 0x5C25404: exit (in /lib64/libc-2.13.so) >>>>> ==5046== by 0x5C0F0A3: (below main) (in /lib64/libc-2.13.so) >>>>> ==5046== Address 0x6f3aec4 is not stack'd, malloc'd or (recently) free'd >>>>> ... 
>>>>> ==5046== Invalid read of size 4 >>>>> ==5046== at 0x52E9396: ibis::fileManager::storage::pastUse() const >>>>> (fileManager.h:261) >>>>> ==5046== by 0x56699FF: ibis::fileManager::unload(unsigned long) >>>>> (fileManager.cpp:1266) >>>>> ==5046== by 0x5664C68: ibis::fileManager::clear() (fileManager.cpp:408) >>>>> ==5046== by 0x5665B98: ibis::fileManager::~fileManager() >>>>> (fileManager.cpp:654) >>>>> ==5046== by 0x5C253B0: __run_exit_handlers (in /lib64/libc-2.13.so) >>>>> ==5046== by 0x5C25404: exit (in /lib64/libc-2.13.so) >>>>> ==5046== by 0x5C0F0A3: (below main) (in /lib64/libc-2.13.so) >>>>> ==5046== Address 0x211c9570 is not stack'd, malloc'd or (recently) free'd >>>>> ... >>>>> ==5046== Invalid read of size 8 >>>>> ==5046== at 0x5665068: ibis::fileManager::clear() (fileManager.cpp:444) >>>>> ==5046== by 0x5665B98: ibis::fileManager::~fileManager() >>>>> (fileManager.cpp:654) >>>>> ==5046== by 0x5C253B0: __run_exit_handlers (in /lib64/libc-2.13.so) >>>>> ==5046== by 0x5C25404: exit (in /lib64/libc-2.13.so) >>>>> ==5046== by 0x5C0F0A3: (below main) (in /lib64/libc-2.13.so) >>>>> ==5046== Address 0x1c is not stack'd, malloc'd or (recently) free'd... >>>>> ==5046== >>>>> ==5046== >>>>> ==5046== Process terminating with default action of signal 11 (SIGSEGV) >>>>> ==5046== Access not within mapped region at address 0x1C >>>>> ==5046== at 0x5665068: ibis::fileManager::clear() (fileManager.cpp:444) >>>>> ==5046== by 0x5665B98: ibis::fileManager::~fileManager() >>>>> (fileManager.cpp:654) >>>>> ==5046== by 0x5C253B0: __run_exit_handlers (in /lib64/libc-2.13.so) >>>>> ==5046== by 0x5C25404: exit (in /lib64/libc-2.13.so) >>>>> ==5046== by 0x5C0F0A3: (below main) (in /lib64/libc-2.13.so) >>>>> ==5046== If you believe this happened as a result of a stack >>>>> ==5046== overflow in your program's main thread (unlikely but >>>>> ==5046== possible), you can try to increase the size of the >>>>> ==5046== main thread stack using the --main-stacksize= flag. 
>>>>> ==5046== The main thread stack size used in this run was 8388608.
>>>>>
>>>>> The second execution of the same program works fine, so it has to be
>>>>> related to index creation/recreation.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -----Original Message-----
>>>>> From: [email protected]
>>>>> [mailto:[email protected]] On Behalf Of Dominique
>>>>> Prunier
>>>>> Sent: Tuesday, March 13, 2012 10:51 AM
>>>>> To: K. John Wu
>>>>> Cc: FastBit Users
>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>>>
>>>>> Hey John,
>>>>>
>>>>> It seems to work just fine now; it seamlessly recreated indexes on my old
>>>>> partition.
>>>>> However, I'm getting a segfault at the end of the first execution (the one
>>>>> that converted the index).
>>>>> I'll investigate this and tell you what I find.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -----Original Message-----
>>>>> From: K. John Wu [mailto:[email protected]]
>>>>> Sent: Monday, March 12, 2012 6:39 PM
>>>>> To: Dominique Prunier
>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>>>
>>>>> Just added a check of the index type to the function that
>>>>> caused your problem (a read function that works directly with
>>>>> ibis::fileManager::storage). It should now automatically override all
>>>>> the relics with direktes. I am testing the code now. The source code
>>>>> is SVN 487.
>>>>>
>>>>> John
>>>>>
>>>>>
>>>>> On 3/12/12 1:49 PM, Dominique Prunier wrote:
>>>>>> No problem. Do we want to do something about the migration from relic to
>>>>>> direkte?
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: K. John Wu [mailto:[email protected]]
>>>>>> Sent: Monday, March 12, 2012 2:21 PM
>>>>>> To: Dominique Prunier
>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>>>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>>>>
>>>>>> Thanks for the confirmation.
>>>>>>
>>>>>> John
>>>>>>
>>>>>>
>>>>>> On 3/12/12 11:07 AM, Dominique Prunier wrote:
>>>>>>> It seems to work just fine for me in r486.
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: K. John Wu [mailto:[email protected]]
>>>>>>> Sent: Monday, March 12, 2012 1:51 PM
>>>>>>> To: Dominique Prunier
>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added
>>>>>>> to match LIKE patterns case-sensitively and perform specific
>>>>>>> optimizations
>>>>>>>
>>>>>>> If you have verified the answers are the same as before, then we don't
>>>>>>> have an off-by-1 problem. At this point, I have not done that. Let me
>>>>>>> know if you have.
>>>>>>>
>>>>>>> John
>>>>>>>
>>>>>>>
>>>>>>> On 3/12/12 10:48 AM, Dominique Prunier wrote:
>>>>>>>> Hmm, the original patch was working fairly well. I tried a couple of
>>>>>>>> limits (range empty, start and/or end at first value and/or last
>>>>>>>> value). I didn't notice any other change. Are you talking about this
>>>>>>>> or the segfault in direkte?
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: K. John Wu [mailto:[email protected]]
>>>>>>>> Sent: Monday, March 12, 2012 1:41 PM
>>>>>>>> To: Dominique Prunier
>>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added
>>>>>>>> to match LIKE patterns case-sensitively and perform specific
>>>>>>>> optimizations
>>>>>>>>
>>>>>>>> Just hid release 1.2.9; there might be an off-by-1 problem as well.
>>>>>>>> Need to dig deeper.
>>>>>>>> >>>>>>>> John >>>>>>>> >>>>>>>> >>>>>>>> On 3/12/12 10:13 AM, Dominique Prunier wrote: >>>>>>>>> Yep, but i bet you changed them for a reason, maybe a compile warning >>>>>>>>> or something (they were int32_t in my patch). Array_t indexes are >>>>>>>>> definitely uints_32. Reverting them to int32_t works but maybe it >>>>>>>>> would worth thinking about it. >>>>>>>>> >>>>>>>>> About the segfault, i captured an example of valgrind error (which >>>>>>>>> ultimately leads to a segfault). As you can see, it is during query >>>>>>>>> evaluation, not when reading the index. >>>>>>>>> >>>>>>>>> ==5672== Invalid read of size 4 >>>>>>>>> ==5672== at 0x5414F60: ibis::bitvector::or_d1(ibis::bitvector >>>>>>>>> const&) (bitvector.cpp:2934) >>>>>>>>> ==5672== by 0x541C622: ibis::bitvector::operator|=(ibis::bitvector >>>>>>>>> const&) (bitvector.cpp:1272) >>>>>>>>> ==5672== by 0x52DA4A4: ibis::index::sumBins(unsigned int, unsigned >>>>>>>>> int, ibis::bitvector&) const (index.cpp:6183) >>>>>>>>> ==5672== by 0x55D097D: >>>>>>>>> ibis::direkte::evaluate(ibis::qContinuousRange const&, >>>>>>>>> ibis::bitvector&) const (idirekte.cpp:1071) >>>>>>>>> ==5672== by 0x5517FB3: ibis::category::stringSearch(char const*, >>>>>>>>> ibis::bitvector&) const (category.cpp:390) >>>>>>>>> ==5672== by 0x5276C41: ibis::query::doEvaluate(ibis::qExpr const*, >>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3948) >>>>>>>>> ==5672== by 0x52770E7: ibis::query::doEvaluate(ibis::qExpr const*, >>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3779) >>>>>>>>> ==5672== by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, >>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776) >>>>>>>>> ==5672== by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, >>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776) >>>>>>>>> ==5672== by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, >>>>>>>>> ibis::bitvector const&, 
ibis::bitvector&) const (query.cpp:3776) >>>>>>>>> ==5672== by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, >>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776) >>>>>>>>> ==5672== by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, >>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776) >>>>>>>>> ==5672== Address 0x9416f20 is 0 bytes after a block of size 354,720 >>>>>>>>> alloc'd >>>>>>>>> ==5672== at 0x4C28C6D: malloc (in >>>>>>>>> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so) >>>>>>>>> ==5672== by 0x5445F24: >>>>>>>>> ibis::fileManager::storage::storage(unsigned long) >>>>>>>>> (fileManager.cpp:1718) >>>>>>>>> ==5672== by 0x54466D9: >>>>>>>>> ibis::fileManager::storage::enlarge(unsigned long) >>>>>>>>> (fileManager.cpp:1977) >>>>>>>>> ==5672== by 0x51A73B1: ibis::array_t<unsigned >>>>>>>>> int>::resize(unsigned long) (array_t.cpp:1412) >>>>>>>>> ==5672== by 0x54119FA: ibis::bitvector::decompress() >>>>>>>>> (bitvector.cpp:364) >>>>>>>>> ==5672== by 0x52DA45B: ibis::index::sumBins(unsigned int, unsigned >>>>>>>>> int, ibis::bitvector&) const (index.cpp:6180) >>>>>>>>> ==5672== by 0x55D097D: >>>>>>>>> ibis::direkte::evaluate(ibis::qContinuousRange const&, >>>>>>>>> ibis::bitvector&) const (idirekte.cpp:1071) >>>>>>>>> ==5672== by 0x5517FB3: ibis::category::stringSearch(char const*, >>>>>>>>> ibis::bitvector&) const (category.cpp:390) >>>>>>>>> ==5672== by 0x5276C41: ibis::query::doEvaluate(ibis::qExpr const*, >>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3948) >>>>>>>>> ==5672== by 0x52770E7: ibis::query::doEvaluate(ibis::qExpr const*, >>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3779) >>>>>>>>> ==5672== by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, >>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776) >>>>>>>>> ==5672== by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, >>>>>>>>> ibis::bitvector const&, 
ibis::bitvector&) const (query.cpp:3776)
>>>>>>>>>
>>>>>>>>> I'm not able to reproduce it on a simplistic use case. Not sure exactly
>>>>>>>>> what triggers this. Here again, this is not dramatic since I just
>>>>>>>>> have to regenerate my indexes, but I'm wondering if there is a way
>>>>>>>>> to catch this (I'm thinking about people upgrading from a version
>>>>>>>>> prior to 1.2.9).
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: K. John Wu [mailto:[email protected]]
>>>>>>>>> Sent: Monday, March 12, 2012 1:02 PM
>>>>>>>>> To: Dominique Prunier
>>>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added
>>>>>>>>> to match LIKE patterns case-sensitively and perform specific
>>>>>>>>> optimizations
>>>>>>>>>
>>>>>>>>> Hi, Dominique,
>>>>>>>>>
>>>>>>>>> Let me just confirm that the two lines where the change from int32_t
>>>>>>>>> to uint32_t should be reversed are lines 458 and 459 of dictionary.cpp,
>>>>>>>>> right?
>>>>>>>>>
>>>>>>>>> John
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 3/12/12 9:52 AM, Dominique Prunier wrote:
>>>>>>>>>> Hey John,
>>>>>>>>>>
>>>>>>>>>> The problem is that it doesn't actually fail when reading the index.
>>>>>>>>>> The index is read, but during the evaluation I get segfaults, bogus
>>>>>>>>>> results or valgrind errors. Once I regenerated the indexes for my
>>>>>>>>>> category column, everything worked like a charm.
>>>>>>>>>>
>>>>>>>>>> It was also misleading because of the other issue (unsigned ints
>>>>>>>>>> that should have been signed ints) that segfaulted too.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: K. John Wu [mailto:[email protected]]
>>>>>>>>>> Sent: Monday, March 12, 2012 12:43 PM
>>>>>>>>>> To: Dominique Prunier
>>>>>>>>>> Cc: FastBit Users
>>>>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define
>>>>>>>>>> added to match LIKE patterns case-sensitively and perform specific
>>>>>>>>>> optimizations
>>>>>>>>>>
>>>>>>>>>> Hi, Dominique,
>>>>>>>>>>
>>>>>>>>>> I thought that I had checked index types. If you happen to know the
>>>>>>>>>> stack trace for the reading operation, let me know. Otherwise, it
>>>>>>>>>> might take me a while to figure out a good way to reproduce the
>>>>>>>>>> problem.
>>>>>>>>>>
>>>>>>>>>> John
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 3/12/12 9:30 AM, Dominique Prunier wrote:
>>>>>>>>>>> Ok, figured out the other segfault. The indexes have to be
>>>>>>>>>>> regenerated with the change from relic to direkte. My guess is that
>>>>>>>>>>> it was reading something invalid. Is there a missing check in the
>>>>>>>>>>> index read method?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: [email protected]
>>>>>>>>>>> [mailto:[email protected]] On Behalf Of
>>>>>>>>>>> Dominique Prunier
>>>>>>>>>>> Sent: Monday, March 12, 2012 11:45 AM
>>>>>>>>>>> To: K. John Wu
>>>>>>>>>>> Cc: FastBit Users
>>>>>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define
>>>>>>>>>>> added to match LIKE patterns case-sensitively and perform specific
>>>>>>>>>>> optimizations
>>>>>>>>>>>
>>>>>>>>>>> Hey John,
>>>>>>>>>>>
>>>>>>>>>>> The fix, as checked in at revision 484, breaks the binary
>>>>>>>>>>> search of the pattern prefix:
>>>>>>>>>>> - int32_t b = 0;
>>>>>>>>>>> - int32_t e = key_.size() - 1;
>>>>>>>>>>> + uint32_t b = 0;
>>>>>>>>>>> + uint32_t e = key_.size() - 1;
>>>>>>>>>>>
>>>>>>>>>>> Since the stop condition of the loop can be that one of the indexes
>>>>>>>>>>> is -1, this now fails with a segfault.
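The signed/unsigned problem described just above can be illustrated in isolation. The following is only a sketch under stated assumptions, not the code from dictionary.cpp: `lastNotGreater` and `unsignedUpperBound` are hypothetical helpers, and a plain `std::vector<std::string>` stands in for the dictionary's key array. It shows a binary search whose upper bound may legitimately end at -1, and what the same initialisation yields once the bound is unsigned.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not FastBit code): position of the last key that is
// <= target in a sorted vector, or -1 when no such key exists.
// The bounds are signed on purpose, because e may step down to -1.
int32_t lastNotGreater(const std::vector<std::string>& keys,
                       const std::string& target) {
    int32_t b = 0;
    int32_t e = static_cast<int32_t>(keys.size()) - 1;
    while (b <= e) {
        const int32_t m = b + (e - b) / 2;
        if (keys[m] <= target) {
            b = m + 1;   // answer is at m or to its right
        } else {
            e = m - 1;   // may become -1 when every key is > target
        }
    }
    return e;
}

// The same "size - 1" initialisation with an unsigned type: for an empty
// container the result wraps around instead of becoming -1.
uint32_t unsignedUpperBound(std::size_t n) {
    return static_cast<uint32_t>(n) - 1;
}
```

With unsigned bounds, a loop condition such as `b <= e` can no longer observe "e went below zero", so the search keeps indexing with a huge value, which is consistent with the segfault reported here.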
>>>>>>>>>>>
>>>>>>>>>>> I'm troubleshooting another segfault in the bitvector right now
>>>>>>>>>>> (could it be related to the change in r479?)
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: K. John Wu [mailto:[email protected]]
>>>>>>>>>>> Sent: Saturday, March 10, 2012 2:24 PM
>>>>>>>>>>> To: Dominique Prunier
>>>>>>>>>>> Cc: FastBit Users
>>>>>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define
>>>>>>>>>>> added to match LIKE patterns case-sensitively and perform specific
>>>>>>>>>>> optimizations
>>>>>>>>>>>
>>>>>>>>>>> Just checked in the modification to allow users to define
>>>>>>>>>>> FASTBIT_CS_PATTERN_MATCH to 0 to disable case-sensitive matches.
>>>>>>>>>>> The new SVN revision is 484.
>>>>>>>>>>>
>>>>>>>>>>> Also looked through other macros to make sure they are used
>>>>>>>>>>> consistently.
>>>>>>>>>>>
>>>>>>>>>>> John
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 3/10/12 9:20 AM, Dominique Prunier wrote:
>>>>>>>>>>>> Hey John,
>>>>>>>>>>>>
>>>>>>>>>>>> I just noticed a small typo in util.h: the macro is called
>>>>>>>>>>>> FASTBOT_... I don't think it was intended, but it has the nice side
>>>>>>>>>>>> effect of disabling the new code by default, thus preserving current
>>>>>>>>>>>> behavior (case insensitive). Should we actually keep it in util.h
>>>>>>>>>>>> now that it is documented in INSTALL?
>>>>>>>>>>>>
>>>>>>>>>>>> https://codeforge.lbl.gov/plugins/scmsvn/viewcvs.php/trunk/src/util.h?root=fastbit&r1=483&r2=482&pathrev=483
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>> From: K. John Wu [[email protected]]
>>>>>>>>>>>> Sent: March-09-12 10:47 PM
>>>>>>>>>>>> To: Dominique Prunier
>>>>>>>>>>>> Cc: FastBit Users
>>>>>>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define
>>>>>>>>>>>> added to match LIKE patterns case-sensitively and perform specific
>>>>>>>>>>>> optimizations
>>>>>>>>>>>>
>>>>>>>>>>>> Hi, Dominique,
>>>>>>>>>>>>
>>>>>>>>>>>> I would like to add the FASTBIT_ prefix to the macro CS_PATTERN_MATCH
>>>>>>>>>>>> to avoid possible collisions when FastBit is used with other packages.
>>>>>>>>>>>> Hope you don't mind.
>>>>>>>>>>>>
>>>>>>>>>>>> John
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 3/9/12 4:03 PM, K. John Wu wrote:
>>>>>>>>>>>>> Hi, Dominique,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have run through my usual set of tests and did not find any
>>>>>>>>>>>>> problem with your patch. It is now in SVN 482. Please give it a
>>>>>>>>>>>>> try when you get the chance.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>
>>>>>>>>>>>>> John
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 3/9/12 10:17 AM, Dominique Prunier wrote:
>>>>>>>>>>>>>> Quick update to my patch:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> · Changed dictionary::patternMatch to make it work with CI too
>>>>>>>>>>>>>> (and I think, for efficiency reasons, I have to keep all this
>>>>>>>>>>>>>> here)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> · Moved the STR_MATCH_* constants from util.cpp to util.h and
>>>>>>>>>>>>>> use them in dictionary::patternMatch
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> · Removed the CS/CI ifdef from category.cpp
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I did more testing, and on my set of ~90 000 test queries, the
>>>>>>>>>>>>>> execution time dropped from ~515 seconds to ~20 seconds.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> From: [email protected]
>>>>>>>>>>>>>> [mailto:[email protected]] On Behalf Of
>>>>>>>>>>>>>> Dominique Prunier
>>>>>>>>>>>>>> Sent: Thursday, March 08, 2012 2:39 PM
>>>>>>>>>>>>>> To: FastBit Users
>>>>>>>>>>>>>> Subject: [FastBit-users] PATCH: new CS_PATTERN_MATCH define
>>>>>>>>>>>>>> added to match LIKE patterns case-sensitively and perform
>>>>>>>>>>>>>> specific optimizations
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here is the first version of my patch to switch SQL LIKE from
>>>>>>>>>>>>>> case insensitive to case sensitive and optimize this use case
>>>>>>>>>>>>>> with CATEGORY columns.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In a nutshell, what changed is:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> · We extract the longest constant prefix from the pattern
>>>>>>>>>>>>>> (handling the escape char too)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> · Instead of testing every value in the dictionary, we binary
>>>>>>>>>>>>>> search the range of values to test (which sometimes even allows
>>>>>>>>>>>>>> skipping pattern matching entirely if no valid range can be found)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> · We test every value in the range
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On a large dictionary (~130k entries), I’ve commonly seen it be
>>>>>>>>>>>>>> one or two orders of magnitude faster (in my example, a simple
>>>>>>>>>>>>>> query with a single LIKE predicate drops from ~10ms to ~0.4ms).
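The three steps listed above (prefix extraction, binary search for the candidate range, matching only inside that range) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patch itself: all four function names are hypothetical, the dictionary is modelled as a sorted `std::vector<std::string>`, and the pattern syntax is assumed to be `%` and `_` as wildcards with a backslash as the escape character. It does not try to reproduce FastBit's behavior for edge cases such as the empty pattern.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Step 1 (hypothetical helper): longest literal prefix of the pattern,
// stopping at the first unescaped wildcard and unescaping as it goes.
std::string constantPrefix(const std::string& pat) {
    std::string prefix;
    for (std::size_t i = 0; i < pat.size(); ++i) {
        char c = pat[i];
        if (c == '%' || c == '_') break;
        if (c == '\\' && i + 1 < pat.size()) c = pat[++i];
        prefix.push_back(c);
    }
    return prefix;
}

// Step 2 (hypothetical helper): half-open range [first, second) of sorted
// keys that start with prefix. Keys sharing a prefix are contiguous.
std::pair<std::size_t, std::size_t>
prefixRange(const std::vector<std::string>& keys, const std::string& prefix) {
    auto lo = std::lower_bound(keys.begin(), keys.end(), prefix);
    auto hi = std::partition_point(lo, keys.end(),
        [&prefix](const std::string& k) {
            return k.compare(0, prefix.size(), prefix) == 0;
        });
    return std::make_pair(static_cast<std::size_t>(lo - keys.begin()),
                          static_cast<std::size_t>(hi - keys.begin()));
}

// Simple case-sensitive LIKE matcher, a stand-in for the library's own.
bool likeMatch(const char* s, const char* p) {
    for (; *p != '\0'; ++s, ++p) {
        if (*p == '%') {
            // Let '%' absorb zero or more characters.
            for (const char* t = s; ; ++t) {
                if (likeMatch(t, p + 1)) return true;
                if (*t == '\0') return false;
            }
        }
        if (*s == '\0') return false;
        if (*p == '_') continue;                  // any single character
        if (*p == '\\' && p[1] != '\0') ++p;      // escaped literal
        if (*s != *p) return false;
    }
    return *s == '\0';
}

// Step 3: run the matcher only on the candidate range.
std::vector<std::size_t> patternSearch(const std::vector<std::string>& keys,
                                       const std::string& pat) {
    std::vector<std::size_t> hits;
    const std::pair<std::size_t, std::size_t> range =
        prefixRange(keys, constantPrefix(pat));
    for (std::size_t i = range.first; i < range.second; ++i) {
        if (likeMatch(keys[i].c_str(), pat.c_str())) hits.push_back(i);
    }
    return hits;
}
```

The range lookup only works when the matcher compares characters the same way the dictionary is sorted, which is presumably why the optimization is tied to case-sensitive matching. A pattern that starts with a wildcard has an empty prefix, so the range degenerates to the whole dictionary and nothing is gained.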
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What I’d like to change/refactor (I’m really a newbie in C++):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> · Remove the prefix extraction and pattern matching code from
>>>>>>>>>>>>>> dictionary and replace the added method patternSearch by
>>>>>>>>>>>>>> something like findRange. I believe that matching and pattern
>>>>>>>>>>>>>> handling code doesn’t belong in the dictionary. I’d rather move
>>>>>>>>>>>>>> this back to the category class or something.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> · Having to use a C++ string object to rebuild the longest
>>>>>>>>>>>>>> constant prefix bugs me (suggestions?). I’m also thinking of
>>>>>>>>>>>>>> having a version that doesn’t support escaping, but it would
>>>>>>>>>>>>>> force me to change strMatch a bit more
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> · To closely match the previous behavior, you can’t match an
>>>>>>>>>>>>>> empty pattern (even the empty string doesn’t match); maybe that
>>>>>>>>>>>>>> would be worth changing
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As always John, feel free to include this in the main branch.
>>>>>>>>>>>>>> I’m waiting for suggestions to make it more efficient, cleaner, ...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Dominique Prunier
>>>>>>>>>>>>>> APG Lead Developper
>>>>>>>>>>>>>> 4388, rue Saint-Denis
>>>>>>>>>>>>>> Bureau 309
>>>>>>>>>>>>>> Montreal (Quebec) H2J 2L1
>>>>>>>>>>>>>> Tel. +1 514-842-6767 x310
>>>>>>>>>>>>>> Fax +1 514-842-3989
>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>> www.watch4net.com
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This message is for the designated recipient only and may contain
>>>>>>>>>>>>>> privileged, proprietary, or otherwise private information. If you
>>>>>>>>>>>>>> have received it in error, please notify the sender immediately
>>>>>>>>>>>>>> and delete the original. Any other use of this electronic mail by
>>>>>>>>>>>>>> you is prohibited.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> FastBit-users mailing list
>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
