Hey John,
I just noticed that the pattern search was badly broken... I have no idea how I
didn't catch that earlier: it was not using the proper index value. If I'm
not mistaken, this should fix it:
diff --git a/src/category.cpp b/src/category.cpp
index ef3dc77..61039f9 100644
--- a/src/category.cpp
+++ b/src/category.cpp
@@ -618,7 +618,7 @@ long ibis::category::patternSearch(const char *pat) const {
std::auto_ptr< ibis::array_t<uint32_t> > tmp(new ibis::array_t<uint32_t>);
dic.patternSearch(pat, *tmp);
for (uint32_t j = 0; j < tmp->size(); ++ j) {
- const ibis::bitvector *bv = rlc->getBitvector(j);
+ const ibis::bitvector *bv = rlc->getBitvector((*tmp)[j]);
if (bv != 0)
est += bv->cnt();
}
@@ -658,7 +658,7 @@ long ibis::category::patternSearch(const char *pat,
std::auto_ptr< ibis::array_t<uint32_t> > tmp(new ibis::array_t<uint32_t>);
dic.patternSearch(pat, *tmp);
for (uint32_t j = 0; j < tmp->size(); ++ j) {
- const ibis::bitvector *bv = rlc->getBitvector(j);
+ const ibis::bitvector *bv = rlc->getBitvector((*tmp)[j]);
if (bv != 0) {
++ cnt;
est += bv->cnt();
I was probably testing against an older version of the library.
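To make the bug concrete, here is a minimal sketch of what went wrong. It is not the actual FastBit code: `Counts`, `estimateBuggy` and `estimateFixed` are hypothetical names, and a plain vector of per-code row counts stands in for the bitvectors returned by rlc->getBitvector(). The point is only that the array filled by the dictionary search holds dictionary codes, so the loop counter must be used to read a code, not as the code itself.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the index: counts[k] is the number of rows
// whose dictionary code is k (the real code asks a bitvector for cnt()).
using Counts = std::vector<uint32_t>;

// Buggy variant: treats the loop counter j as if it were the dictionary code,
// so it always sums the first matches.size() entries of the index.
uint64_t estimateBuggy(const Counts& counts,
                       const std::vector<uint32_t>& matches) {
    uint64_t est = 0;
    for (uint32_t j = 0; j < matches.size(); ++j) {
        if (j < counts.size())
            est += counts[j];
    }
    return est;
}

// Fixed variant: reads the dictionary code stored at position j and uses
// that code to look up the count.
uint64_t estimateFixed(const Counts& counts,
                       const std::vector<uint32_t>& matches) {
    uint64_t est = 0;
    for (uint32_t j = 0; j < matches.size(); ++j) {
        const uint32_t code = matches[j];
        if (code < counts.size())
            est += counts[code];
    }
    return est;
}
```

With counts {10, 20, 30, 40} and matching codes {2, 3}, the buggy variant adds up the entries for codes 0 and 1, while the fixed variant adds up the entries for codes 2 and 3.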
Thanks,
-----Original Message-----
From: K. John Wu [mailto:[email protected]]
Sent: Wednesday, March 14, 2012 9:18 PM
To: Dominique Prunier
Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to match
LIKE patterns case-sensitively and perform specific optimizations
Hi, Dominique,
I have just checked in a couple of minor changes. Would be good to
put out a stable version to replace the broken version that was taken
down.
Please let me know if you find anything that still needs attention.
Thanks.
John
On 3/14/12 7:14 AM, Dominique Prunier wrote:
> Seems to work for me. I'll do further testing; I'd like to isolate a stable
> version sometime this week.
>
> Thanks,
>
> -----Original Message-----
> From: K. John Wu [mailto:[email protected]]
> Sent: Tuesday, March 13, 2012 7:23 PM
> To: Dominique Prunier
> Cc: FastBit Users
> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
> match LIKE patterns case-sensitively and perform specific optimizations
>
> Yes, you are absolutely right. It should be
>
> if (read(...) < 0 || ...)
>
> This problem is corrected in SVN 489. Let me know if you find
> something else.
>
> John
>
>
> On 3/13/12 2:55 PM, Dominique Prunier wrote:
>> Hmm, it seems to be related to the test
>> if (0 <= static_cast<ibis::direkte*>(idx)->read(idxf.c_str()))
>> on line 185 of category.cpp. Shouldn't it be 0 != read(..) instead of
>> 0 <= read(..)?
>>
>> Thanks,
>>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Dominique Prunier
>> Sent: Tuesday, March 13, 2012 5:16 PM
>> To: K. John Wu
>> Cc: FastBit Users
>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>> match LIKE patterns case-sensitively and perform specific optimizations
>>
>> Hey John,
>>
>> No segfault anymore, but now it seems to be regenerating the index
>> every time :/
>>
>> Thanks,
>>
>> -----Original Message-----
>> From: K. John Wu [mailto:[email protected]]
>> Sent: Tuesday, March 13, 2012 4:44 PM
>> To: Dominique Prunier
>> Cc: FastBit Users
>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>> match LIKE patterns case-sensitively and perform specific optimizations
>>
>> Hi, Dominique,
>>
>> Just checked in SVN version 488. Please give it a try when you get
>> the chance. Thanks.
>>
>> John
>>
>>
>> On 3/13/12 11:58 AM, Dominique Prunier wrote:
>>> Cool. I'll test it right after you commit it.
>>> While we're fixing this, I think there is a memory leak in void
>>> ibis::category::prepareMembers(). The ibis::fileManager::storage *st (on
>>> line 182) is not freed by index, direkte or category (index just nullifies
>>> the pointer). Not sure who should free it.
>>>
>>> Thanks,
>>>
>>> -----Original Message-----
>>> From: K. John Wu [mailto:[email protected]]
>>> Sent: Tuesday, March 13, 2012 2:53 PM
>>> To: Dominique Prunier
>>> Cc: FastBit Users
>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>
>>> Hi, Dominique,
>>>
>>> I think I know where the problem is -- during the process of
>>> recreating the new index, the old file was not cleaned up properly. I
>>> have implemented a fix and am doing some testing on it. Will check in
>>> the code as soon as I am comfortable that I have not broken anything
>>> with the new changes.
>>>
>>> John
>>>
>>>
>>> On 3/13/12 11:48 AM, Dominique Prunier wrote:
>>>> By the way, I checked index creation and it doesn't exhibit the issue. The
>>>> only way to reproduce it is to take an old indexed partition (category
>>>> columns) and run revision 487 on it. It seems that something bad
>>>> happens during the conversion and makes the cleanup crash.
>>>>
>>>> -----Original Message-----
>>>> From: [email protected]
>>>> [mailto:[email protected]] On Behalf Of Dominique
>>>> Prunier
>>>> Sent: Tuesday, March 13, 2012 12:42 PM
>>>> To: FastBit Users; K. John Wu
>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>>
>>>> John,
>>>>
>>>> Seems like the segfault appears in the cleaning methods of the file
>>>> manager:
>>>>
>>>> ==5046== Warning: set address range perms: large range [0x6f3f030,
>>>> 0x1f5df050) (noaccess)
>>>> ==5046== Invalid read of size 4
>>>> ==5046== at 0x50C8BE0: ibis::util::sharedInt32::operator()() const
>>>> (util.h:901)
>>>> ==5046== by 0x50C8E37: ibis::fileManager::storage::inUse() const
>>>> (fileManager.h:259)
>>>> ==5046== by 0x5669911: ibis::fileManager::unload(unsigned long)
>>>> (fileManager.cpp:1259)
>>>> ==5046== by 0x5664C68: ibis::fileManager::clear() (fileManager.cpp:408)
>>>> ==5046== by 0x5665B98: ibis::fileManager::~fileManager()
>>>> (fileManager.cpp:654)
>>>> ==5046== by 0x5C253B0: __run_exit_handlers (in /lib64/libc-2.13.so)
>>>> ==5046== by 0x5C25404: exit (in /lib64/libc-2.13.so)
>>>> ==5046== by 0x5C0F0A3: (below main) (in /lib64/libc-2.13.so)
>>>> ==5046== Address 0x6f3aec4 is not stack'd, malloc'd or (recently) free'd
>>>> ...
>>>> ==5046== Invalid read of size 4
>>>> ==5046== at 0x52E9396: ibis::fileManager::storage::pastUse() const
>>>> (fileManager.h:261)
>>>> ==5046== by 0x56699FF: ibis::fileManager::unload(unsigned long)
>>>> (fileManager.cpp:1266)
>>>> ==5046== by 0x5664C68: ibis::fileManager::clear() (fileManager.cpp:408)
>>>> ==5046== by 0x5665B98: ibis::fileManager::~fileManager()
>>>> (fileManager.cpp:654)
>>>> ==5046== by 0x5C253B0: __run_exit_handlers (in /lib64/libc-2.13.so)
>>>> ==5046== by 0x5C25404: exit (in /lib64/libc-2.13.so)
>>>> ==5046== by 0x5C0F0A3: (below main) (in /lib64/libc-2.13.so)
>>>> ==5046== Address 0x211c9570 is not stack'd, malloc'd or (recently) free'd
>>>> ...
>>>> ==5046== Invalid read of size 8
>>>> ==5046== at 0x5665068: ibis::fileManager::clear() (fileManager.cpp:444)
>>>> ==5046== by 0x5665B98: ibis::fileManager::~fileManager()
>>>> (fileManager.cpp:654)
>>>> ==5046== by 0x5C253B0: __run_exit_handlers (in /lib64/libc-2.13.so)
>>>> ==5046== by 0x5C25404: exit (in /lib64/libc-2.13.so)
>>>> ==5046== by 0x5C0F0A3: (below main) (in /lib64/libc-2.13.so)
>>>> ==5046== Address 0x1c is not stack'd, malloc'd or (recently) free'd...
>>>> ==5046==
>>>> ==5046==
>>>> ==5046== Process terminating with default action of signal 11 (SIGSEGV)
>>>> ==5046== Access not within mapped region at address 0x1C
>>>> ==5046== at 0x5665068: ibis::fileManager::clear() (fileManager.cpp:444)
>>>> ==5046== by 0x5665B98: ibis::fileManager::~fileManager()
>>>> (fileManager.cpp:654)
>>>> ==5046== by 0x5C253B0: __run_exit_handlers (in /lib64/libc-2.13.so)
>>>> ==5046== by 0x5C25404: exit (in /lib64/libc-2.13.so)
>>>> ==5046== by 0x5C0F0A3: (below main) (in /lib64/libc-2.13.so)
>>>> ==5046== If you believe this happened as a result of a stack
>>>> ==5046== overflow in your program's main thread (unlikely but
>>>> ==5046== possible), you can try to increase the size of the
>>>> ==5046== main thread stack using the --main-stacksize= flag.
>>>> ==5046== The main thread stack size used in this run was 8388608.
>>>>
>>>> The second execution of the same program works fine, so it has to be
>>>> related to index creation/recreation.
>>>>
>>>> Thanks,
>>>>
>>>> -----Original Message-----
>>>> From: [email protected]
>>>> [mailto:[email protected]] On Behalf Of Dominique
>>>> Prunier
>>>> Sent: Tuesday, March 13, 2012 10:51 AM
>>>> To: K. John Wu
>>>> Cc: FastBit Users
>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>>
>>>> Hey John,
>>>>
>>>> It seems to work just fine now; it seamlessly recreated the indexes on my
>>>> old partition.
>>>> However, I'm getting a segfault at the end of the first execution (the one
>>>> that converted the index).
>>>> I'll investigate this and tell you what I find.
>>>>
>>>> Thanks,
>>>>
>>>> -----Original Message-----
>>>> From: K. John Wu [mailto:[email protected]]
>>>> Sent: Monday, March 12, 2012 6:39 PM
>>>> To: Dominique Prunier
>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>>
>>>> Just added index-type checks to the functions that caused your problem
>>>> (the read functions that work directly with
>>>> ibis::fileManager::storage). It should now automatically replace all
>>>> the relics with direktes. I am testing the code now. The source code
>>>> is SVN 487.
>>>>
>>>> John
>>>>
>>>>
>>>> On 3/12/12 1:49 PM, Dominique Prunier wrote:
>>>>> No problem. Do we want to do something about the migration from relic to
>>>>> direkte ?
>>>>>
>>>>> -----Original Message-----
>>>>> From: K. John Wu [mailto:[email protected]]
>>>>> Sent: Monday, March 12, 2012 2:21 PM
>>>>> To: Dominique Prunier
>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>>>
>>>>> Thanks for the confirmation.
>>>>>
>>>>> John
>>>>>
>>>>>
>>>>> On 3/12/12 11:07 AM, Dominique Prunier wrote:
>>>>>> It seems to work just fine for me in r486.
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: K. John Wu [mailto:[email protected]]
>>>>>> Sent: Monday, March 12, 2012 1:51 PM
>>>>>> To: Dominique Prunier
>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>>>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>>>>
>>>>>> If you have verified the answers are the same as before, then we don't
>>>>>> have an off-by-1 problem. At this point, I have not done that. Let me
>>>>>> know if you have.
>>>>>>
>>>>>> John
>>>>>>
>>>>>>
>>>>>> On 3/12/12 10:48 AM, Dominique Prunier wrote:
>>>>>>> Hmm, the original patch was working fairly well. I tried a couple of
>>>>>>> limits (range empty, start and/or end at the first value and/or last
>>>>>>> value). I didn't notice any other change. Are you talking about this
>>>>>>> or the segfault in direkte?
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: K. John Wu [mailto:[email protected]]
>>>>>>> Sent: Monday, March 12, 2012 1:41 PM
>>>>>>> To: Dominique Prunier
>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added
>>>>>>> to match LIKE patterns case-sensitively and perform specific
>>>>>>> optimizations
>>>>>>>
>>>>>>> Just hid release 1.2.9; there might be an off-by-1 problem as well.
>>>>>>> Need to dig deeper.
>>>>>>>
>>>>>>> John
>>>>>>>
>>>>>>>
>>>>>>> On 3/12/12 10:13 AM, Dominique Prunier wrote:
>>>>>>>> Yep, but I bet you changed them for a reason, maybe a compile warning
>>>>>>>> or something (they were int32_t in my patch). array_t indexes are
>>>>>>>> definitely uint32_t. Reverting them to int32_t works, but it may be
>>>>>>>> worth thinking about.
>>>>>>>>
>>>>>>>> About the segfault, I captured an example of a valgrind error (which
>>>>>>>> ultimately leads to a segfault). As you can see, it happens during query
>>>>>>>> evaluation, not when reading the index.
>>>>>>>>
>>>>>>>> ==5672== Invalid read of size 4
>>>>>>>> ==5672== at 0x5414F60: ibis::bitvector::or_d1(ibis::bitvector
>>>>>>>> const&) (bitvector.cpp:2934)
>>>>>>>> ==5672== by 0x541C622: ibis::bitvector::operator|=(ibis::bitvector
>>>>>>>> const&) (bitvector.cpp:1272)
>>>>>>>> ==5672== by 0x52DA4A4: ibis::index::sumBins(unsigned int, unsigned
>>>>>>>> int, ibis::bitvector&) const (index.cpp:6183)
>>>>>>>> ==5672== by 0x55D097D:
>>>>>>>> ibis::direkte::evaluate(ibis::qContinuousRange const&,
>>>>>>>> ibis::bitvector&) const (idirekte.cpp:1071)
>>>>>>>> ==5672== by 0x5517FB3: ibis::category::stringSearch(char const*,
>>>>>>>> ibis::bitvector&) const (category.cpp:390)
>>>>>>>> ==5672== by 0x5276C41: ibis::query::doEvaluate(ibis::qExpr const*,
>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3948)
>>>>>>>> ==5672== by 0x52770E7: ibis::query::doEvaluate(ibis::qExpr const*,
>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3779)
>>>>>>>> ==5672== by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*,
>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>>>>>> ==5672== by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*,
>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>>>>>> ==5672== by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*,
>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>>>>>> ==5672== by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*,
>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>>>>>> ==5672== by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*,
>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>>>>>> ==5672== Address 0x9416f20 is 0 bytes after a block of size 354,720
>>>>>>>> alloc'd
>>>>>>>> ==5672== at 0x4C28C6D: malloc (in
>>>>>>>> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>>>>>>>> ==5672== by 0x5445F24: ibis::fileManager::storage::storage(unsigned
>>>>>>>> long) (fileManager.cpp:1718)
>>>>>>>> ==5672== by 0x54466D9: ibis::fileManager::storage::enlarge(unsigned
>>>>>>>> long) (fileManager.cpp:1977)
>>>>>>>> ==5672== by 0x51A73B1: ibis::array_t<unsigned int>::resize(unsigned
>>>>>>>> long) (array_t.cpp:1412)
>>>>>>>> ==5672== by 0x54119FA: ibis::bitvector::decompress()
>>>>>>>> (bitvector.cpp:364)
>>>>>>>> ==5672== by 0x52DA45B: ibis::index::sumBins(unsigned int, unsigned
>>>>>>>> int, ibis::bitvector&) const (index.cpp:6180)
>>>>>>>> ==5672== by 0x55D097D:
>>>>>>>> ibis::direkte::evaluate(ibis::qContinuousRange const&,
>>>>>>>> ibis::bitvector&) const (idirekte.cpp:1071)
>>>>>>>> ==5672== by 0x5517FB3: ibis::category::stringSearch(char const*,
>>>>>>>> ibis::bitvector&) const (category.cpp:390)
>>>>>>>> ==5672== by 0x5276C41: ibis::query::doEvaluate(ibis::qExpr const*,
>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3948)
>>>>>>>> ==5672== by 0x52770E7: ibis::query::doEvaluate(ibis::qExpr const*,
>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3779)
>>>>>>>> ==5672== by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*,
>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>>>>>> ==5672== by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*,
>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>>>>>>
>>>>>>>> I'm not able to reproduce it on a simple use case, and I'm not sure
>>>>>>>> exactly what triggers it. Again, this is not a big problem since I just
>>>>>>>> have to regenerate my indexes, but I'm wondering whether there is a way
>>>>>>>> to catch this (I'm thinking about people upgrading from a version prior
>>>>>>>> to 1.2.9).
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: K. John Wu [mailto:[email protected]]
>>>>>>>> Sent: Monday, March 12, 2012 1:02 PM
>>>>>>>> To: Dominique Prunier
>>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added
>>>>>>>> to match LIKE patterns case-sensitively and perform specific
>>>>>>>> optimizations
>>>>>>>>
>>>>>>>> Hi, Dominique,
>>>>>>>>
>>>>>>>> Let me just confirm that the two lines where the change from int32_t
>>>>>>>> to uint32_t should be reversed are lines 458 and 459 of dictionary.cpp,
>>>>>>>> right?
>>>>>>>>
>>>>>>>> John
>>>>>>>>
>>>>>>>>
>>>>>>>> On 3/12/12 9:52 AM, Dominique Prunier wrote:
>>>>>>>>> Hey John,
>>>>>>>>>
>>>>>>>>> The problem is that it doesn't actually fail when reading the index.
>>>>>>>>> The index is read, but during the evaluation I get segfaults, bogus
>>>>>>>>> results or valgrind errors. Once I regenerated the indexes for my
>>>>>>>>> category column, everything worked like a charm.
>>>>>>>>>
>>>>>>>>> It was also misleading because of the other issue (unsigned ints that
>>>>>>>>> should have been signed ints), which segfaulted too.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: K. John Wu [mailto:[email protected]]
>>>>>>>>> Sent: Monday, March 12, 2012 12:43 PM
>>>>>>>>> To: Dominique Prunier
>>>>>>>>> Cc: FastBit Users
>>>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added
>>>>>>>>> to match LIKE patterns case-sensitively and perform specific
>>>>>>>>> optimizations
>>>>>>>>>
>>>>>>>>> Hi, Dominique,
>>>>>>>>>
>>>>>>>>> I thought that I had checked index types. If you happen to know the
>>>>>>>>> stack trace for the reading operation, let me know. Otherwise, it
>>>>>>>>> might take me a while to figure out a good way to reproduce the
>>>>>>>>> problem.
>>>>>>>>>
>>>>>>>>> John
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 3/12/12 9:30 AM, Dominique Prunier wrote:
>>>>>>>>>> Ok, figured out the other segfault. The indexes have to be regenerated
>>>>>>>>>> because of the change from relic to direkte. My guess is that it was
>>>>>>>>>> reading something invalid. Is there a missing check in the index
>>>>>>>>>> read method?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: [email protected]
>>>>>>>>>> [mailto:[email protected]] On Behalf Of Dominique
>>>>>>>>>> Prunier
>>>>>>>>>> Sent: Monday, March 12, 2012 11:45 AM
>>>>>>>>>> To: K. John Wu
>>>>>>>>>> Cc: FastBit Users
>>>>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define
>>>>>>>>>> added to match LIKE patterns case-sensitively and perform specific
>>>>>>>>>> optimizations
>>>>>>>>>>
>>>>>>>>>> Hey John,
>>>>>>>>>>
>>>>>>>>>> The fix, as of revision 484, breaks the binary search
>>>>>>>>>> of the pattern prefix:
>>>>>>>>>> - int32_t b = 0;
>>>>>>>>>> - int32_t e = key_.size() - 1;
>>>>>>>>>> + uint32_t b = 0;
>>>>>>>>>> + uint32_t e = key_.size() - 1;
>>>>>>>>>>
>>>>>>>>>> Since the loop can stop when one of the indexes reaches -1, which a
>>>>>>>>>> uint32_t cannot represent (it wraps around to a huge value), this now
>>>>>>>>>> fails with a segfault.
>>>>>>>>>>
>>>>>>>>>> I'm troubleshooting another segfault in the bitvector right now
>>>>>>>>>> (could it be related to the change in r479?)
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: K. John Wu [mailto:[email protected]]
>>>>>>>>>> Sent: Saturday, March 10, 2012 2:24 PM
>>>>>>>>>> To: Dominique Prunier
>>>>>>>>>> Cc: FastBit Users
>>>>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define
>>>>>>>>>> added to match LIKE patterns case-sensitively and perform specific
>>>>>>>>>> optimizations
>>>>>>>>>>
>>>>>>>>>> Just checked in the modification to allow users to define
>>>>>>>>>> FASTBIT_CS_PATTERN_MATCH to 0 to disable case sensitive matches. The
>>>>>>>>>> new SVN revision is 484.
>>>>>>>>>>
>>>>>>>>>> Also looked through other macros to make sure they are used
>>>>>>>>>> consistently.
>>>>>>>>>>
>>>>>>>>>> John
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 3/10/12 9:20 AM, Dominique Prunier wrote:
>>>>>>>>>>> Hey John,
>>>>>>>>>>>
>>>>>>>>>>> I just noticed a small typo in util.h: the macro is called
>>>>>>>>>>> FASTBOT_... I don't think that was intended, but it has the nice side
>>>>>>>>>>> effect of disabling the new code by default, thus preserving the
>>>>>>>>>>> current behavior (case insensitive). Should we actually keep it in
>>>>>>>>>>> util.h now that it is documented in INSTALL?
>>>>>>>>>>>
>>>>>>>>>>> https://codeforge.lbl.gov/plugins/scmsvn/viewcvs.php/trunk/src/util.h?root=fastbit&r1=483&r2=482&pathrev=483
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> ________________________________________
>>>>>>>>>>> From: K. John Wu [[email protected]]
>>>>>>>>>>> Sent: March-09-12 10:47 PM
>>>>>>>>>>> To: Dominique Prunier
>>>>>>>>>>> Cc: FastBit Users
>>>>>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define
>>>>>>>>>>> added to match LIKE patterns case-sensitively and perform specific
>>>>>>>>>>> optimizations
>>>>>>>>>>>
>>>>>>>>>>> Hi, Dominique,
>>>>>>>>>>>
>>>>>>>>>>> I would like to add the FASTBIT_ prefix to the macro CS_PATTERN_MATCH to
>>>>>>>>>>> avoid possible collisions when FastBit is used with other packages.
>>>>>>>>>>> Hope you don't mind.
>>>>>>>>>>>
>>>>>>>>>>> John
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 3/9/12 4:03 PM, K. John Wu wrote:
>>>>>>>>>>>> Hi, Dominique,
>>>>>>>>>>>>
>>>>>>>>>>>> I have run through my usual set of tests and did not find any problem
>>>>>>>>>>>> with your patch. It is now in SVN 482. Please give it a try when you
>>>>>>>>>>>> get the chance.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>
>>>>>>>>>>>> John
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 3/9/12 10:17 AM, Dominique Prunier wrote:
>>>>>>>>>>>>> Quick update to my patch:
>>>>>>>>>>>>>
>>>>>>>>>>>>> - Changed dictionary::patternMatch to make it work with CI too
>>>>>>>>>>>>>   (and I think, for efficiency reasons, I have to keep all this here)
>>>>>>>>>>>>> - Moved the STR_MATCH_* constants from util.cpp to util.h and
>>>>>>>>>>>>>   use them in dictionary::patternMatch
>>>>>>>>>>>>> - Removed the CS/CI ifdef from category.cpp
>>>>>>>>>>>>>
>>>>>>>>>>>>> I did more testing, and on my set of ~90 000 test queries, the
>>>>>>>>>>>>> execution time dropped from ~515 seconds to ~20 seconds.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> From: [email protected]
>>>>>>>>>>>>> [mailto:[email protected]] On Behalf Of Dominique
>>>>>>>>>>>>> Prunier
>>>>>>>>>>>>> Sent: Thursday, March 08, 2012 2:39 PM
>>>>>>>>>>>>> To: FastBit Users
>>>>>>>>>>>>> Subject: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>>>>>>>>>>>> match LIKE patterns case-sensitively and perform specific
>>>>>>>>>>>>> optimizations
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is the first version of my patch to switch SQL LIKE from case
>>>>>>>>>>>>> insensitive to case sensitive and optimize this use case with
>>>>>>>>>>>>> CATEGORY columns.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> In a nutshell, what changed is:
>>>>>>>>>>>>>
>>>>>>>>>>>>> - We extract the longest constant prefix from the pattern (handling
>>>>>>>>>>>>>   the escape char too)
>>>>>>>>>>>>> - Instead of testing every value in the dictionary, we binary search
>>>>>>>>>>>>>   the range of values to test (which sometimes even allows us to skip
>>>>>>>>>>>>>   pattern matching entirely if no valid range can be found)
>>>>>>>>>>>>> - We test every value in the range
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On a large dictionary (~130k entries), I've commonly seen it be one or
>>>>>>>>>>>>> two orders of magnitude faster (in my example, a simple query with a
>>>>>>>>>>>>> single LIKE predicate drops from ~10ms to ~0.4ms).
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> What I'd like to change/refactor (I'm really a newbie in C++):
>>>>>>>>>>>>>
>>>>>>>>>>>>> - Remove the prefix extraction and pattern matching code from
>>>>>>>>>>>>>   dictionary and replace the added method patternSearch with something
>>>>>>>>>>>>>   like findRange. I believe that matching and pattern handling code
>>>>>>>>>>>>>   doesn't belong in the dictionary. I'd rather move this back to the
>>>>>>>>>>>>>   category class or something.
>>>>>>>>>>>>> - Having to use a C++ string object to rebuild the longest constant
>>>>>>>>>>>>>   prefix bugs me (suggestions?). I'm also thinking of having a version
>>>>>>>>>>>>>   that doesn't support escaping, but it would force me to change
>>>>>>>>>>>>>   strMatch a bit more
>>>>>>>>>>>>> - To closely match the previous behavior, you can't match an empty
>>>>>>>>>>>>>   pattern (even the empty string doesn't match); maybe that would be
>>>>>>>>>>>>>   worth changing
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> As always John, feel free to include this into the main branch. I'm
>>>>>>>>>>>>> waiting for suggestions to make it more efficient, cleaner, ...
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Dominique Prunier
>>>>>>>>>>>>>
>>>>>>>>>>>>> APG Lead Developper
>>>>>>>>>>>>>
>>>>>>>>>>>>> 4388, rue Saint-Denis
>>>>>>>>>>>>>
>>>>>>>>>>>>> Bureau 309
>>>>>>>>>>>>>
>>>>>>>>>>>>> Montreal (Quebec) H2J 2L1
>>>>>>>>>>>>>
>>>>>>>>>>>>> Tel. +1 514-842-6767 x310
>>>>>>>>>>>>>
>>>>>>>>>>>>> Fax +1 514-842-3989
>>>>>>>>>>>>>
>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>
>>>>>>>>>>>>> www.watch4net.com
>>>>>>>>>>>>>
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> FastBit-users mailing list
>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users