Hey John,

The problem is that right now my test suite mainly tests the import and not so much the queries (I have maybe 10-20 test cases that actually evaluate queries). It was basically modeled after our existing test suite for MySQL/Oracle and is supposed to prove that the imported data is equivalent to our source.
My next step is to build a micro testing framework to test queries. I have a very simplistic example which only uses CATEGORY and LONG (the only two types that we're using), but it is not convenient enough to use. Rest assured that I'll post to the group if I have something good enough to be shared :)

Regarding the bug I found yesterday, surprisingly enough, fixing it changed the performance quite significantly. I'm still not sure why (the number of selected bitmaps was the same), but I'll try to investigate a bit further.

Thanks,

-----Original Message-----
From: K. John Wu [mailto:[email protected]]
Sent: Thursday, March 15, 2012 11:30 AM
To: Dominique Prunier
Cc: FastBit Users
Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to match LIKE patterns case-sensitively and perform specific optimizations

Hi, Dominique,

Would love to see your test suite if you are able to make it public. We can either try to use it as is or try to convert your program into C++.

John

On 3/15/12 7:09 AM, Dominique Prunier wrote:
> I'm building a nice test suite, even though it is Java based, not sure it
> would be of much help to you...
>
> -----Original Message-----
> From: K. John Wu [mailto:[email protected]]
> Sent: Thursday, March 15, 2012 1:31 AM
> To: Dominique Prunier
> Cc: FastBit Users
> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
> match LIKE patterns case-sensitively and perform specific optimizations
>
> Hi, Dominique,
>
> Thanks for the fix. I have run through my tests. Clearly, I don't
> have a test case that can exercise this feature right now. Will have
> to think about how to get a few in there.
>
> Let me know if you find anything else.
>
> John
>
>
> On 3/14/12 6:45 PM, Dominique Prunier wrote:
>> Hey John,
>>
>> I just noticed that the pattern search was badly broken... I have no idea
>> how I haven't caught that earlier, it was not using the proper index value.
>> If I'm not mistaken, this should fix it:
>>
>> diff --git a/src/category.cpp b/src/category.cpp
>> index ef3dc77..61039f9 100644
>> --- a/src/category.cpp
>> +++ b/src/category.cpp
>> @@ -618,7 +618,7 @@ long ibis::category::patternSearch(const char *pat) const {
>>      std::auto_ptr< ibis::array_t<uint32_t> > tmp(new ibis::array_t<uint32_t>);
>>      dic.patternSearch(pat, *tmp);
>>      for (uint32_t j = 0; j < tmp->size(); ++ j) {
>> -        const ibis::bitvector *bv = rlc->getBitvector(j);
>> +        const ibis::bitvector *bv = rlc->getBitvector((*tmp)[j]);
>>          if (bv != 0)
>>              est += bv->cnt();
>>      }
>> @@ -658,7 +658,7 @@ long ibis::category::patternSearch(const char *pat,
>>      std::auto_ptr< ibis::array_t<uint32_t> > tmp(new ibis::array_t<uint32_t>);
>>      dic.patternSearch(pat, *tmp);
>>      for (uint32_t j = 0; j < tmp->size(); ++ j) {
>> -        const ibis::bitvector *bv = rlc->getBitvector(j);
>> +        const ibis::bitvector *bv = rlc->getBitvector((*tmp)[j]);
>>          if (bv != 0) {
>>              ++ cnt;
>>              est += bv->cnt();
>>
>> I was probably testing against an older version of the library.
>>
>> Thanks,
>>
>> -----Original Message-----
>> From: K. John Wu [mailto:[email protected]]
>> Sent: Wednesday, March 14, 2012 9:18 PM
>> To: Dominique Prunier
>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>> match LIKE patterns case-sensitively and perform specific optimizations
>>
>> Hi, Dominique,
>>
>> I have just checked in a couple of minor changes. Would be good to
>> put out a stable version to replace the broken version that was taken
>> down.
>>
>> Please let me know if you find anything that still needs attention.
>>
>> Thanks.
>>
>> John
>>
>>
>> On 3/14/12 7:14 AM, Dominique Prunier wrote:
>>> Seems to work for me. I'll do further testing, I'd like to isolate a
>>> stable version sometime this week.
>>>
>>> Thanks,
>>>
>>> -----Original Message-----
>>> From: K. John Wu [mailto:[email protected]]
>>> Sent: Tuesday, March 13, 2012 7:23 PM
>>> To: Dominique Prunier
>>> Cc: FastBit Users
>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>
>>> Yes, you are absolutely right. It should be
>>>
>>> if (read(...) < 0 || ...)
>>>
>>> This problem is corrected in SVN 489. Let me know if you find
>>> something else.
>>>
>>> John
>>>
>>>
>>> On 3/13/12 2:55 PM, Dominique Prunier wrote:
>>>> Hmm, seems like it is related to if (0 <=
>>>> static_cast<ibis::direkte*>(idx)->read(idxf.c_str()) on line 185 of
>>>> category.cpp. Shouldn't it be 0!=read(..) instead of 0<=read(..) ?
>>>>
>>>> Thanks,
>>>>
>>>> -----Original Message-----
>>>> From: [email protected]
>>>> [mailto:[email protected]] On Behalf Of Dominique Prunier
>>>> Sent: Tuesday, March 13, 2012 5:16 PM
>>>> To: K. John Wu
>>>> Cc: FastBit Users
>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>>
>>>> Hey John,
>>>>
>>>> No segfault anymore, but it now seems that it is always
>>>> regenerating the index :/
>>>>
>>>> Thanks,
>>>>
>>>> -----Original Message-----
>>>> From: K. John Wu [mailto:[email protected]]
>>>> Sent: Tuesday, March 13, 2012 4:44 PM
>>>> To: Dominique Prunier
>>>> Cc: FastBit Users
>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>>
>>>> Hi, Dominique,
>>>>
>>>> Just checked in SVN version 488. Please give it a try when you get
>>>> the chance. Thanks.
>>>>
>>>> John
>>>>
>>>>
>>>> On 3/13/12 11:58 AM, Dominique Prunier wrote:
>>>>> Cool. I'll test it right after you commit it.
>>>>> While we're at fixing this, I think there is a memory leak in void
>>>>> ibis::category::prepareMembers().
>>>>> The ibis::fileManager::storage *st (on
>>>>> line 182) is not freed by index, direkte or category (index just nullifies
>>>>> the pointer). Not sure who should free it.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -----Original Message-----
>>>>> From: K. John Wu [mailto:[email protected]]
>>>>> Sent: Tuesday, March 13, 2012 2:53 PM
>>>>> To: Dominique Prunier
>>>>> Cc: FastBit Users
>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>>>
>>>>> Hi, Dominique,
>>>>>
>>>>> I think I know where the problem is -- during the process of
>>>>> recreating the new index, the old file was not cleaned up properly. I
>>>>> have implemented a fix and am doing some testing on it. Will check in
>>>>> the code as soon as I am comfortable that I have not broken anything
>>>>> with the new changes.
>>>>>
>>>>> John
>>>>>
>>>>>
>>>>> On 3/13/12 11:48 AM, Dominique Prunier wrote:
>>>>>> By the way, I checked index creation and it doesn't exhibit the issue.
>>>>>> The only way to reproduce is to use an old indexed partition (category
>>>>>> columns) and run the revision 487 on it. It seems that something bad
>>>>>> happens during the conversion and makes the cleanup crash.
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: [email protected]
>>>>>> [mailto:[email protected]] On Behalf Of Dominique Prunier
>>>>>> Sent: Tuesday, March 13, 2012 12:42 PM
>>>>>> To: FastBit Users; K. John Wu
>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>>>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>>>>
>>>>>> John,
>>>>>>
>>>>>> Seems like the segfault appears in the cleaning methods of the file
>>>>>> manager:
>>>>>>
>>>>>> ==5046== Warning: set address range perms: large range [0x6f3f030, 0x1f5df050) (noaccess)
>>>>>> ==5046== Invalid read of size 4
>>>>>> ==5046==    at 0x50C8BE0: ibis::util::sharedInt32::operator()() const (util.h:901)
>>>>>> ==5046==    by 0x50C8E37: ibis::fileManager::storage::inUse() const (fileManager.h:259)
>>>>>> ==5046==    by 0x5669911: ibis::fileManager::unload(unsigned long) (fileManager.cpp:1259)
>>>>>> ==5046==    by 0x5664C68: ibis::fileManager::clear() (fileManager.cpp:408)
>>>>>> ==5046==    by 0x5665B98: ibis::fileManager::~fileManager() (fileManager.cpp:654)
>>>>>> ==5046==    by 0x5C253B0: __run_exit_handlers (in /lib64/libc-2.13.so)
>>>>>> ==5046==    by 0x5C25404: exit (in /lib64/libc-2.13.so)
>>>>>> ==5046==    by 0x5C0F0A3: (below main) (in /lib64/libc-2.13.so)
>>>>>> ==5046==  Address 0x6f3aec4 is not stack'd, malloc'd or (recently) free'd
>>>>>> ...
>>>>>> ==5046== Invalid read of size 4
>>>>>> ==5046==    at 0x52E9396: ibis::fileManager::storage::pastUse() const (fileManager.h:261)
>>>>>> ==5046==    by 0x56699FF: ibis::fileManager::unload(unsigned long) (fileManager.cpp:1266)
>>>>>> ==5046==    by 0x5664C68: ibis::fileManager::clear() (fileManager.cpp:408)
>>>>>> ==5046==    by 0x5665B98: ibis::fileManager::~fileManager() (fileManager.cpp:654)
>>>>>> ==5046==    by 0x5C253B0: __run_exit_handlers (in /lib64/libc-2.13.so)
>>>>>> ==5046==    by 0x5C25404: exit (in /lib64/libc-2.13.so)
>>>>>> ==5046==    by 0x5C0F0A3: (below main) (in /lib64/libc-2.13.so)
>>>>>> ==5046==  Address 0x211c9570 is not stack'd, malloc'd or (recently) free'd
>>>>>> ...
>>>>>> ==5046== Invalid read of size 8
>>>>>> ==5046==    at 0x5665068: ibis::fileManager::clear() (fileManager.cpp:444)
>>>>>> ==5046==    by 0x5665B98: ibis::fileManager::~fileManager() (fileManager.cpp:654)
>>>>>> ==5046==    by 0x5C253B0: __run_exit_handlers (in /lib64/libc-2.13.so)
>>>>>> ==5046==    by 0x5C25404: exit (in /lib64/libc-2.13.so)
>>>>>> ==5046==    by 0x5C0F0A3: (below main) (in /lib64/libc-2.13.so)
>>>>>> ==5046==  Address 0x1c is not stack'd, malloc'd or (recently) free'd...
>>>>>> ==5046==
>>>>>> ==5046==
>>>>>> ==5046== Process terminating with default action of signal 11 (SIGSEGV)
>>>>>> ==5046==  Access not within mapped region at address 0x1C
>>>>>> ==5046==    at 0x5665068: ibis::fileManager::clear() (fileManager.cpp:444)
>>>>>> ==5046==    by 0x5665B98: ibis::fileManager::~fileManager() (fileManager.cpp:654)
>>>>>> ==5046==    by 0x5C253B0: __run_exit_handlers (in /lib64/libc-2.13.so)
>>>>>> ==5046==    by 0x5C25404: exit (in /lib64/libc-2.13.so)
>>>>>> ==5046==    by 0x5C0F0A3: (below main) (in /lib64/libc-2.13.so)
>>>>>> ==5046==  If you believe this happened as a result of a stack
>>>>>> ==5046==  overflow in your program's main thread (unlikely but
>>>>>> ==5046==  possible), you can try to increase the size of the
>>>>>> ==5046==  main thread stack using the --main-stacksize= flag.
>>>>>> ==5046==  The main thread stack size used in this run was 8388608.
>>>>>>
>>>>>> The second execution of the same program works fine, so it has to be
>>>>>> related to index creation/recreation.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: [email protected]
>>>>>> [mailto:[email protected]] On Behalf Of Dominique Prunier
>>>>>> Sent: Tuesday, March 13, 2012 10:51 AM
>>>>>> To: K. John Wu
>>>>>> Cc: FastBit Users
>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>>>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>>>>
>>>>>> Hey John,
>>>>>>
>>>>>> It seems to work just fine now, it seamlessly recreated indexes on my
>>>>>> old partition.
>>>>>> However, I'm having a segfault at the end of the first execution (the
>>>>>> one that converted the index).
>>>>>> I'll investigate this and tell you what I find.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: K. John Wu [mailto:[email protected]]
>>>>>> Sent: Monday, March 12, 2012 6:39 PM
>>>>>> To: Dominique Prunier
>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>>>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>>>>
>>>>>> Just added index type checking to the functions that
>>>>>> caused your problem (a read function that directly works with
>>>>>> ibis::fileManager::storage). It should now automatically override all
>>>>>> the relics with direktes. I am testing the code now. The source code
>>>>>> is SVN 487.
>>>>>>
>>>>>> John
>>>>>>
>>>>>>
>>>>>> On 3/12/12 1:49 PM, Dominique Prunier wrote:
>>>>>>> No problem. Do we want to do something about the migration from relic
>>>>>>> to direkte ?
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: K. John Wu [mailto:[email protected]]
>>>>>>> Sent: Monday, March 12, 2012 2:21 PM
>>>>>>> To: Dominique Prunier
>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added
>>>>>>> to match LIKE patterns case-sensitively and perform specific
>>>>>>> optimizations
>>>>>>>
>>>>>>> Thanks for the confirmation.
>>>>>>>
>>>>>>> John
>>>>>>>
>>>>>>>
>>>>>>> On 3/12/12 11:07 AM, Dominique Prunier wrote:
>>>>>>>> It seems to work just fine for me in r486.
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: K. John Wu [mailto:[email protected]]
>>>>>>>> Sent: Monday, March 12, 2012 1:51 PM
>>>>>>>> To: Dominique Prunier
>>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added
>>>>>>>> to match LIKE patterns case-sensitively and perform specific
>>>>>>>> optimizations
>>>>>>>>
>>>>>>>> If you have verified the answers are the same as before, then we don't
>>>>>>>> have an off-by-1 problem. At this point, I have not done that. Let me
>>>>>>>> know if you have.
>>>>>>>>
>>>>>>>> John
>>>>>>>>
>>>>>>>>
>>>>>>>> On 3/12/12 10:48 AM, Dominique Prunier wrote:
>>>>>>>>> Hmm, the original patch was working fairly well. I tried a couple of
>>>>>>>>> limits (range empty, start and/or ends at first value and/or last
>>>>>>>>> value). I didn't notice any other change. Are you talking about this
>>>>>>>>> or the segfault in direkte ?
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: K. John Wu [mailto:[email protected]]
>>>>>>>>> Sent: Monday, March 12, 2012 1:41 PM
>>>>>>>>> To: Dominique Prunier
>>>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added
>>>>>>>>> to match LIKE patterns case-sensitively and perform specific
>>>>>>>>> optimizations
>>>>>>>>>
>>>>>>>>> Just hid release 1.2.9, there might be an off-by-1 problem as well.
>>>>>>>>> Need to dig deeper.
>>>>>>>>>
>>>>>>>>> John
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 3/12/12 10:13 AM, Dominique Prunier wrote:
>>>>>>>>>> Yep, but I bet you changed them for a reason, maybe a compile
>>>>>>>>>> warning or something (they were int32_t in my patch). Array_t
>>>>>>>>>> indexes are definitely uint32_t. Reverting them to int32_t works but
>>>>>>>>>> maybe it would be worth thinking about it.
>>>>>>>>>>
>>>>>>>>>> About the segfault, I captured an example of valgrind error (which
>>>>>>>>>> ultimately leads to a segfault). As you can see, it is during query
>>>>>>>>>> evaluation, not when reading the index.
>>>>>>>>>>
>>>>>>>>>> ==5672== Invalid read of size 4
>>>>>>>>>> ==5672==    at 0x5414F60: ibis::bitvector::or_d1(ibis::bitvector const&) (bitvector.cpp:2934)
>>>>>>>>>> ==5672==    by 0x541C622: ibis::bitvector::operator|=(ibis::bitvector const&) (bitvector.cpp:1272)
>>>>>>>>>> ==5672==    by 0x52DA4A4: ibis::index::sumBins(unsigned int, unsigned int, ibis::bitvector&) const (index.cpp:6183)
>>>>>>>>>> ==5672==    by 0x55D097D: ibis::direkte::evaluate(ibis::qContinuousRange const&, ibis::bitvector&) const (idirekte.cpp:1071)
>>>>>>>>>> ==5672==    by 0x5517FB3: ibis::category::stringSearch(char const*, ibis::bitvector&) const (category.cpp:390)
>>>>>>>>>> ==5672==    by 0x5276C41: ibis::query::doEvaluate(ibis::qExpr const*, ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3948)
>>>>>>>>>> ==5672==    by 0x52770E7: ibis::query::doEvaluate(ibis::qExpr const*, ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3779)
>>>>>>>>>> ==5672==    by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>>>>>>>> ==5672==    by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>>>>>>>> ==5672==    by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>>>>>>>> ==5672==    by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>>>>>>>> ==5672==    by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>>>>>>>> ==5672==  Address 0x9416f20 is 0 bytes after a block of size 354,720 alloc'd
>>>>>>>>>> ==5672==    at 0x4C28C6D: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>>>>>>>>>> ==5672==    by 0x5445F24: ibis::fileManager::storage::storage(unsigned long) (fileManager.cpp:1718)
>>>>>>>>>> ==5672==    by 0x54466D9: ibis::fileManager::storage::enlarge(unsigned long) (fileManager.cpp:1977)
>>>>>>>>>> ==5672==    by 0x51A73B1: ibis::array_t<unsigned int>::resize(unsigned long) (array_t.cpp:1412)
>>>>>>>>>> ==5672==    by 0x54119FA: ibis::bitvector::decompress() (bitvector.cpp:364)
>>>>>>>>>> ==5672==    by 0x52DA45B: ibis::index::sumBins(unsigned int, unsigned int, ibis::bitvector&) const (index.cpp:6180)
>>>>>>>>>> ==5672==    by 0x55D097D: ibis::direkte::evaluate(ibis::qContinuousRange const&, ibis::bitvector&) const (idirekte.cpp:1071)
>>>>>>>>>> ==5672==    by 0x5517FB3: ibis::category::stringSearch(char const*, ibis::bitvector&) const (category.cpp:390)
>>>>>>>>>> ==5672==    by 0x5276C41: ibis::query::doEvaluate(ibis::qExpr const*, ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3948)
>>>>>>>>>> ==5672==    by 0x52770E7: ibis::query::doEvaluate(ibis::qExpr const*, ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3779)
>>>>>>>>>> ==5672==    by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>>>>>>>> ==5672==    by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>>>>>>>>
>>>>>>>>>> I'm not able to reproduce on a simplistic use case. Not sure exactly
>>>>>>>>>> what triggers this. Here again, this is not dramatic since I just
>>>>>>>>>> have to regenerate my indexes, but I'm wondering if there is a way
>>>>>>>>>> to catch this (I'm thinking about people upgrading from a version
>>>>>>>>>> prior to 1.2.9).
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: K. John Wu [mailto:[email protected]]
>>>>>>>>>> Sent: Monday, March 12, 2012 1:02 PM
>>>>>>>>>> To: Dominique Prunier
>>>>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define
>>>>>>>>>> added to match LIKE patterns case-sensitively and perform specific
>>>>>>>>>> optimizations
>>>>>>>>>>
>>>>>>>>>> Hi, Dominique,
>>>>>>>>>>
>>>>>>>>>> Let me just confirm that the two lines where the change from int32_t
>>>>>>>>>> to uint32_t should be reversed are lines 458 and 459 of
>>>>>>>>>> dictionary.cpp, right?
>>>>>>>>>>
>>>>>>>>>> John
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 3/12/12 9:52 AM, Dominique Prunier wrote:
>>>>>>>>>>> Hey John,
>>>>>>>>>>>
>>>>>>>>>>> The problem is that it doesn't actually fail when reading the
>>>>>>>>>>> index. The index is read but during the evaluation, I have
>>>>>>>>>>> segfaults, bogus results or valgrind errors. Once I regenerated the
>>>>>>>>>>> indexes for my category column, everything worked like a charm.
>>>>>>>>>>>
>>>>>>>>>>> It was also misleading because of the other issue (unsigned ints
>>>>>>>>>>> that should have been signed ints) that segfaulted too.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: K. John Wu [mailto:[email protected]]
>>>>>>>>>>> Sent: Monday, March 12, 2012 12:43 PM
>>>>>>>>>>> To: Dominique Prunier
>>>>>>>>>>> Cc: FastBit Users
>>>>>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define
>>>>>>>>>>> added to match LIKE patterns case-sensitively and perform specific
>>>>>>>>>>> optimizations
>>>>>>>>>>>
>>>>>>>>>>> Hi, Dominique,
>>>>>>>>>>>
>>>>>>>>>>> I thought that I had checked index types. If you happen to know
>>>>>>>>>>> the stack trace for the reading operation, let me know. Otherwise, it
>>>>>>>>>>> might take me a while to figure out a good way to reproduce the
>>>>>>>>>>> problem.
>>>>>>>>>>>
>>>>>>>>>>> John
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 3/12/12 9:30 AM, Dominique Prunier wrote:
>>>>>>>>>>>> Ok, figured out the other segfault. The index has to be
>>>>>>>>>>>> regenerated with the change from relic to direkte. My guess is
>>>>>>>>>>>> that it was reading something invalid. Is there a missing check in
>>>>>>>>>>>> the index read method ?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: [email protected]
>>>>>>>>>>>> [mailto:[email protected]] On Behalf Of Dominique Prunier
>>>>>>>>>>>> Sent: Monday, March 12, 2012 11:45 AM
>>>>>>>>>>>> To: K. John Wu
>>>>>>>>>>>> Cc: FastBit Users
>>>>>>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define
>>>>>>>>>>>> added to match LIKE patterns case-sensitively and perform specific
>>>>>>>>>>>> optimizations
>>>>>>>>>>>>
>>>>>>>>>>>> Hey John,
>>>>>>>>>>>>
>>>>>>>>>>>> The fix, as checked in at revision 484, breaks the binary
>>>>>>>>>>>> search of the pattern prefix:
>>>>>>>>>>>> - int32_t b = 0;
>>>>>>>>>>>> - int32_t e = key_.size() - 1;
>>>>>>>>>>>> + uint32_t b = 0;
>>>>>>>>>>>> + uint32_t e = key_.size() - 1;
>>>>>>>>>>>>
>>>>>>>>>>>> Since the stop condition of the loop can be that one of the indexes
>>>>>>>>>>>> is -1, this now fails with a segfault.
>>>>>>>>>>>>
>>>>>>>>>>>> I'm troubleshooting another segfault in the bitvector right now
>>>>>>>>>>>> (could it be related to the change in r 479 ?)
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: K. John Wu [mailto:[email protected]]
>>>>>>>>>>>> Sent: Saturday, March 10, 2012 2:24 PM
>>>>>>>>>>>> To: Dominique Prunier
>>>>>>>>>>>> Cc: FastBit Users
>>>>>>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define
>>>>>>>>>>>> added to match LIKE patterns case-sensitively and perform specific
>>>>>>>>>>>> optimizations
>>>>>>>>>>>>
>>>>>>>>>>>> Just checked in the modification to allow users to define
>>>>>>>>>>>> FASTBIT_CS_PATTERN_MATCH to 0 to disable case sensitive matches.
>>>>>>>>>>>> The new SVN revision is 484.
>>>>>>>>>>>>
>>>>>>>>>>>> Also looked through other macros to make sure they are used
>>>>>>>>>>>> consistently.
>>>>>>>>>>>>
>>>>>>>>>>>> John
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 3/10/12 9:20 AM, Dominique Prunier wrote:
>>>>>>>>>>>>> Hey John,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I just noticed a small typo in util.h, the macro is called
>>>>>>>>>>>>> FASTBOT_... I don't think it was expected but it has the nice
>>>>>>>>>>>>> side effect of disabling new code by default thus preserving
>>>>>>>>>>>>> current behavior (case insensitive). Should we actually keep it
>>>>>>>>>>>>> in util.h now that it is documented in INSTALL ?
>>>>>>>>>>>>>
>>>>>>>>>>>>> https://codeforge.lbl.gov/plugins/scmsvn/viewcvs.php/trunk/src/util.h?root=fastbit&r1=483&r2=482&pathrev=483
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>>> From: K. John Wu [[email protected]]
>>>>>>>>>>>>> Sent: March-09-12 10:47 PM
>>>>>>>>>>>>> To: Dominique Prunier
>>>>>>>>>>>>> Cc: FastBit Users
>>>>>>>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define
>>>>>>>>>>>>> added to match LIKE patterns case-sensitively and perform
>>>>>>>>>>>>> specific optimizations
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi, Dominique,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would like to add the FASTBIT_ prefix to the macro CS_PATTERN_MATCH
>>>>>>>>>>>>> to avoid possible collisions when FastBit is used with other packages.
>>>>>>>>>>>>> Hope you don't mind.
>>>>>>>>>>>>>
>>>>>>>>>>>>> John
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 3/9/12 4:03 PM, K. John Wu wrote:
>>>>>>>>>>>>>> Hi, Dominique,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have run through my usual set of tests and did not find any
>>>>>>>>>>>>>> problem with your patch. It is now in SVN 482. Please give it a try
>>>>>>>>>>>>>> when you get the chance.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 3/9/12 10:17 AM, Dominique Prunier wrote:
>>>>>>>>>>>>>>> Quick update to my patch:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> · Changed dictionary::patternMatch to make it work with CI too
>>>>>>>>>>>>>>>   (and i think for efficiency reasons, i have to keep all this here)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> · Moved the STR_MATCH_* constants from util.cpp to util.h and
>>>>>>>>>>>>>>>   use them in dictionary::patternMatch
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> · Removed the CS/CI ifdef from category.cpp
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I did more testing, and on my set of ~90 000 test queries, the
>>>>>>>>>>>>>>> execution time dropped from ~515 seconds to ~20 seconds.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> From: [email protected]
>>>>>>>>>>>>>>> [mailto:[email protected]] On Behalf Of Dominique Prunier
>>>>>>>>>>>>>>> Sent: Thursday, March 08, 2012 2:39 PM
>>>>>>>>>>>>>>> To: FastBit Users
>>>>>>>>>>>>>>> Subject: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>>>>>>>>>>>>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Here is the first version of my patch to switch SQL LIKE from case
>>>>>>>>>>>>>>> insensitive to case sensitive and optimize this use case with CATEGORY
>>>>>>>>>>>>>>> columns.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In a nutshell, what changed is:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> · We extract the longest (handling the escape char too)
>>>>>>>>>>>>>>>   constant prefix from the pattern
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> · Instead of testing every value in the dictionary, we binary
>>>>>>>>>>>>>>>   search the range of values to search (which sometimes even allows us
>>>>>>>>>>>>>>>   to skip pattern matching if no valid range can be found)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> · We test every value in the range
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On a large dictionary (~130k entries), I've commonly seen it be one or
>>>>>>>>>>>>>>> two orders of magnitude faster (in my example, a simple query with a
>>>>>>>>>>>>>>> single LIKE predicate drops from ~10ms to ~0.4ms).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What I'd like to change/refactor (I'm really a newbie in C++):
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> · Remove the prefix extraction and pattern matching code from
>>>>>>>>>>>>>>>   dictionary and replace the added method patternSearch by something
>>>>>>>>>>>>>>>   like findRange. I believe that matching and pattern handling code
>>>>>>>>>>>>>>>   doesn't belong to the dictionary. I'd rather move this back to the
>>>>>>>>>>>>>>>   category class or something.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> · Having to use a C++ string object to rebuild the longest
>>>>>>>>>>>>>>>   constant prefix bugs me (suggestions ?). I'm also thinking of having a
>>>>>>>>>>>>>>>   version that doesn't support escaping, but it would force me to change
>>>>>>>>>>>>>>>   strMatch a bit more
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> · To closely match the previous behavior, you can't match an
>>>>>>>>>>>>>>>   empty pattern (even the empty string doesn't match), maybe that would
>>>>>>>>>>>>>>>   be worth changing
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> As always John, feel free to include this into the main branch. I'm
>>>>>>>>>>>>>>> waiting for suggestions to make it more efficient, cleaner, ...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Dominique Prunier
>>>>>>>>>>>>>>> APG Lead Developper
>>>>>>>>>>>>>>> 4388, rue Saint-Denis
>>>>>>>>>>>>>>> Bureau 309
>>>>>>>>>>>>>>> Montreal (Quebec) H2J 2L1
>>>>>>>>>>>>>>> Tel. +1 514-842-6767 x310
>>>>>>>>>>>>>>> Fax +1 514-842-3989
>>>>>>>>>>>>>>> [email protected] <mailto:[email protected]>
>>>>>>>>>>>>>>> www.watch4net.com <http://www.watch4net.com/>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This message is for the designated recipient only and may contain
>>>>>>>>>>>>>>> privileged, proprietary, or otherwise private information. If you have
>>>>>>>>>>>>>>> received it in error, please notify the sender immediately and delete
>>>>>>>>>>>>>>> the original. Any other use of this electronic mail by you is
>>>>>>>>>>>>>>> prohibited.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>> FastBit-users mailing list
>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
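
The prefix-range idea from the original patch message (extract the constant prefix of the LIKE pattern, then binary search the sorted dictionary for the range of candidate values) can be sketched roughly as below. This is not FastBit code: `prefixRange` and `keys` are hypothetical names chosen for illustration, and the sketch assumes the dictionary keys are already sorted byte-wise (case-sensitive). It returns a half-open range, so no index ever needs to reach -1, which sidesteps the signed/unsigned problem discussed for revision 484.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, not part of FastBit: returns the half-open range
// [first, last) of positions in the sorted vector `keys` whose entries
// start with `prefix`. An empty range (first == last) means that no key
// can match, so pattern matching can be skipped entirely.
std::pair<std::size_t, std::size_t>
prefixRange(const std::vector<std::string>& keys, const std::string& prefix) {
    // Assumption: keys are sorted with the default std::string ordering.
    assert(std::is_sorted(keys.begin(), keys.end()));

    // First key that is not less than the prefix.
    auto lo = std::lower_bound(keys.begin(), keys.end(), prefix);

    // Keys sharing the prefix form one contiguous block starting at lo;
    // find the first key after that block with a second binary search.
    auto hi = std::partition_point(lo, keys.end(),
        [&prefix](const std::string& k) {
            return k.compare(0, prefix.size(), prefix) == 0;
        });

    return { static_cast<std::size_t>(lo - keys.begin()),
             static_cast<std::size_t>(hi - keys.begin()) };
}
```

Only the entries inside the returned range would then need to be tested against the full pattern, which is where the reported speedup on large dictionaries would come from.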
