Yes, you are absolutely right.  It should be

if (read(...) < 0 || ...)

This problem is corrected in SVN 489.  Let me know if you find
anything else.
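
For readers following the thread, here is a minimal sketch of why the sign of the comparison matters. It assumes the reader returns 0 on success and a negative value on failure, which is what the corrected test above implies. The names readIndexFile, needsRebuildBuggy and needsRebuildFixed are hypothetical and are not part of FastBit.

```cpp
#include <string>

// Hypothetical stand-in for an index reader. Assumed convention for this
// sketch only: 0 means success, a negative value means failure.
int readIndexFile(const std::string& path) {
    return path.empty() ? -1 : 0;
}

// Buggy test: true whenever the read SUCCEEDS, so a perfectly good index
// would be thrown away and rebuilt on every run.
bool needsRebuildBuggy(const std::string& path) {
    return 0 <= readIndexFile(path);
}

// Corrected test: rebuild only when the read reports an error.
bool needsRebuildFixed(const std::string& path) {
    return readIndexFile(path) < 0;
}
```

With this convention, the buggy test asks for a rebuild exactly when the existing index was read successfully, which matches the "regenerates every time" symptom reported below.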

John


On 3/13/12 2:55 PM, Dominique Prunier wrote:
> Hmm, it seems to be related to
>     if (0 <= static_cast<ibis::direkte*>(idx)->read(idxf.c_str())
> on line 185 of category.cpp. Shouldn't it be 0 != read(..) instead of
> 0 <= read(..)?
> 
> Thanks,
> 
> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Dominique Prunier
> Sent: Tuesday, March 13, 2012 5:16 PM
> To: K. John Wu
> Cc: FastBit Users
> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to 
> match LIKE patterns case-sensitively and perform specific optimizations
> 
> Hey John,
> 
> No segfault anymore, but now it seems to regenerate the index every time :/
> 
> Thanks,
> 
> -----Original Message-----
> From: K. John Wu [mailto:[email protected]]
> Sent: Tuesday, March 13, 2012 4:44 PM
> To: Dominique Prunier
> Cc: FastBit Users
> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to 
> match LIKE patterns case-sensitively and perform specific optimizations
> 
> Hi, Dominique,
> 
> Just checked in SVN version 488.  Please give it a try when you get
> the chance.  Thanks.
> 
> John
> 
> 
> On 3/13/12 11:58 AM, Dominique Prunier wrote:
>> Cool. I'll test it right after you commit it.
>> While we're fixing this, I think there is a memory leak in void 
>> ibis::category::prepareMembers(). The ibis::fileManager::storage *st (on 
>> line 182) is not freed by index, direkte or category (index just nullifies 
>> the pointer). I'm not sure who should free it.
>>
>> Thanks,
>>
>> -----Original Message-----
>> From: K. John Wu [mailto:[email protected]]
>> Sent: Tuesday, March 13, 2012 2:53 PM
>> To: Dominique Prunier
>> Cc: FastBit Users
>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to 
>> match LIKE patterns case-sensitively and perform specific optimizations
>>
>> Hi, Dominique,
>>
>> I think I know where the problem is -- during the process of
>> recreating the new index, the old file was not cleaned up properly.  I
>> have implemented a fix and am doing some testing on it.  Will check in
>> the code as soon as I am comfortable that I have not broken anything
>> with the new changes..
>>
>> John
>>
>>
>> On 3/13/12 11:48 AM, Dominique Prunier wrote:
>>> By the way, I checked index creation and it doesn't exhibit the issue. The 
>>> only way to reproduce it is to use an old indexed partition (category 
>>> columns) and run revision 487 on it. It seems that something bad happens 
>>> during the conversion and makes the cleanup crash.
>>>
>>> -----Original Message-----
>>> From: [email protected] 
>>> [mailto:[email protected]] On Behalf Of Dominique Prunier
>>> Sent: Tuesday, March 13, 2012 12:42 PM
>>> To: FastBit Users; K. John Wu
>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to 
>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>
>>> John,
>>>
>>> Seems like the segfault appears in the cleaning methods of the file manager:
>>>
>>> ==5046== Warning: set address range perms: large range [0x6f3f030, 
>>> 0x1f5df050) (noaccess)
>>> ==5046== Invalid read of size 4
>>> ==5046==    at 0x50C8BE0: ibis::util::sharedInt32::operator()() const 
>>> (util.h:901)
>>> ==5046==    by 0x50C8E37: ibis::fileManager::storage::inUse() const 
>>> (fileManager.h:259)
>>> ==5046==    by 0x5669911: ibis::fileManager::unload(unsigned long) 
>>> (fileManager.cpp:1259)
>>> ==5046==    by 0x5664C68: ibis::fileManager::clear() (fileManager.cpp:408)
>>> ==5046==    by 0x5665B98: ibis::fileManager::~fileManager() 
>>> (fileManager.cpp:654)
>>> ==5046==    by 0x5C253B0: __run_exit_handlers (in /lib64/libc-2.13.so)
>>> ==5046==    by 0x5C25404: exit (in /lib64/libc-2.13.so)
>>> ==5046==    by 0x5C0F0A3: (below main) (in /lib64/libc-2.13.so)
>>> ==5046==  Address 0x6f3aec4 is not stack'd, malloc'd or (recently) free'd
>>> ...
>>> ==5046== Invalid read of size 4
>>> ==5046==    at 0x52E9396: ibis::fileManager::storage::pastUse() const 
>>> (fileManager.h:261)
>>> ==5046==    by 0x56699FF: ibis::fileManager::unload(unsigned long) 
>>> (fileManager.cpp:1266)
>>> ==5046==    by 0x5664C68: ibis::fileManager::clear() (fileManager.cpp:408)
>>> ==5046==    by 0x5665B98: ibis::fileManager::~fileManager() 
>>> (fileManager.cpp:654)
>>> ==5046==    by 0x5C253B0: __run_exit_handlers (in /lib64/libc-2.13.so)
>>> ==5046==    by 0x5C25404: exit (in /lib64/libc-2.13.so)
>>> ==5046==    by 0x5C0F0A3: (below main) (in /lib64/libc-2.13.so)
>>> ==5046==  Address 0x211c9570 is not stack'd, malloc'd or (recently) free'd
>>> ...
>>> ==5046== Invalid read of size 8
>>> ==5046==    at 0x5665068: ibis::fileManager::clear() (fileManager.cpp:444)
>>> ==5046==    by 0x5665B98: ibis::fileManager::~fileManager() 
>>> (fileManager.cpp:654)
>>> ==5046==    by 0x5C253B0: __run_exit_handlers (in /lib64/libc-2.13.so)
>>> ==5046==    by 0x5C25404: exit (in /lib64/libc-2.13.so)
>>> ==5046==    by 0x5C0F0A3: (below main) (in /lib64/libc-2.13.so)
>>> ==5046==  Address 0x1c is not stack'd, malloc'd or (recently) free'd...
>>> ==5046==
>>> ==5046==
>>> ==5046== Process terminating with default action of signal 11 (SIGSEGV)
>>> ==5046==  Access not within mapped region at address 0x1C
>>> ==5046==    at 0x5665068: ibis::fileManager::clear() (fileManager.cpp:444)
>>> ==5046==    by 0x5665B98: ibis::fileManager::~fileManager() 
>>> (fileManager.cpp:654)
>>> ==5046==    by 0x5C253B0: __run_exit_handlers (in /lib64/libc-2.13.so)
>>> ==5046==    by 0x5C25404: exit (in /lib64/libc-2.13.so)
>>> ==5046==    by 0x5C0F0A3: (below main) (in /lib64/libc-2.13.so)
>>> ==5046==  If you believe this happened as a result of a stack
>>> ==5046==  overflow in your program's main thread (unlikely but
>>> ==5046==  possible), you can try to increase the size of the
>>> ==5046==  main thread stack using the --main-stacksize= flag.
>>> ==5046==  The main thread stack size used in this run was 8388608.
>>>
>>> The second execution of the same program works fine, so it has to be 
>>> related to index creation/recreation.
>>>
>>> Thanks,
>>>
>>> -----Original Message-----
>>> From: [email protected] 
>>> [mailto:[email protected]] On Behalf Of Dominique Prunier
>>> Sent: Tuesday, March 13, 2012 10:51 AM
>>> To: K. John Wu
>>> Cc: FastBit Users
>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to 
>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>
>>> Hey John,
>>>
>>> It seems to work just fine now, it seamlessly recreated indexes on my old 
>>> partition.
>>> However, i'm having a segfault at the end of the first execution (the one 
>>> that converted the index).
>>> I'll investigate this and tell you what i find.
>>>
>>> Thanks,
>>>
>>> -----Original Message-----
>>> From: K. John Wu [mailto:[email protected]]
>>> Sent: Monday, March 12, 2012 6:39 PM
>>> To: Dominique Prunier
>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to 
>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>
>>> Just added a check of the index type to the function that caused your
>>> problem (a read function that works directly with
>>> ibis::fileManager::storage).  It should now automatically override all
>>> the relics with direktes.  I am testing the code now.  The source code
>>> is SVN 487.
>>>
>>> John
>>>
>>>
>>> On 3/12/12 1:49 PM, Dominique Prunier wrote:
>>>> No problem. Do we want to do something about the migration from relic to 
>>>> direkte ?
>>>>
>>>> -----Original Message-----
>>>> From: K. John Wu [mailto:[email protected]]
>>>> Sent: Monday, March 12, 2012 2:21 PM
>>>> To: Dominique Prunier
>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to 
>>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>>
>>>> Thanks for the confirmation.
>>>>
>>>> John
>>>>
>>>>
>>>> On 3/12/12 11:07 AM, Dominique Prunier wrote:
>>>>> It seems to work just fine for me in r486.
>>>>>
>>>>> -----Original Message-----
>>>>> From: K. John Wu [mailto:[email protected]]
>>>>> Sent: Monday, March 12, 2012 1:51 PM
>>>>> To: Dominique Prunier
>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to 
>>>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>>>
>>>>> If you have verified the answers are the same as before, then we don't
>>>>> have an off-by-1 problem.  At this point, I have not done that.  Let me
>>>>> know if you have.
>>>>>
>>>>> John
>>>>>
>>>>>
>>>>> On 3/12/12 10:48 AM, Dominique Prunier wrote:
>>>>>> Hmm, the original patch was working fairly well. I tried a couple of 
>>>>>> limits (range empty, start and/or end at first value and/or last 
>>>>>> value). I didn't notice any other change. Are you talking about this or 
>>>>>> the segfault in direkte ?
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: K. John Wu [mailto:[email protected]]
>>>>>> Sent: Monday, March 12, 2012 1:41 PM
>>>>>> To: Dominique Prunier
>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to 
>>>>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>>>>
>>>>>> Just hid release 1.2.9, there might be an off-by-1 problem as well.
>>>>>> Need to dig deeper..
>>>>>>
>>>>>> John
>>>>>>
>>>>>>
>>>>>> On 3/12/12 10:13 AM, Dominique Prunier wrote:
>>>>>>> Yep, but I bet you changed them for a reason, maybe a compile warning 
>>>>>>> or something (they were int32_t in my patch). array_t indexes are 
>>>>>>> definitely uint32_t. Reverting them to int32_t works, but it may be 
>>>>>>> worth thinking about.
>>>>>>>
>>>>>>> About the segfault, i captured an example of valgrind error (which 
>>>>>>> ultimately leads to a segfault). As you can see, it is during query 
>>>>>>> evaluation, not when reading the index.
>>>>>>>
>>>>>>> ==5672== Invalid read of size 4
>>>>>>> ==5672==    at 0x5414F60: ibis::bitvector::or_d1(ibis::bitvector 
>>>>>>> const&) (bitvector.cpp:2934)
>>>>>>> ==5672==    by 0x541C622: ibis::bitvector::operator|=(ibis::bitvector 
>>>>>>> const&) (bitvector.cpp:1272)
>>>>>>> ==5672==    by 0x52DA4A4: ibis::index::sumBins(unsigned int, unsigned 
>>>>>>> int, ibis::bitvector&) const (index.cpp:6183)
>>>>>>> ==5672==    by 0x55D097D: 
>>>>>>> ibis::direkte::evaluate(ibis::qContinuousRange const&, 
>>>>>>> ibis::bitvector&) const (idirekte.cpp:1071)
>>>>>>> ==5672==    by 0x5517FB3: ibis::category::stringSearch(char const*, 
>>>>>>> ibis::bitvector&) const (category.cpp:390)
>>>>>>> ==5672==    by 0x5276C41: ibis::query::doEvaluate(ibis::qExpr const*, 
>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3948)
>>>>>>> ==5672==    by 0x52770E7: ibis::query::doEvaluate(ibis::qExpr const*, 
>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3779)
>>>>>>> ==5672==    by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, 
>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>>>>> ==5672==    by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, 
>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>>>>> ==5672==    by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, 
>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>>>>> ==5672==    by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, 
>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>>>>> ==5672==    by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, 
>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>>>>> ==5672==  Address 0x9416f20 is 0 bytes after a block of size 354,720 
>>>>>>> alloc'd
>>>>>>> ==5672==    at 0x4C28C6D: malloc (in 
>>>>>>> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>>>>>>> ==5672==    by 0x5445F24: ibis::fileManager::storage::storage(unsigned 
>>>>>>> long) (fileManager.cpp:1718)
>>>>>>> ==5672==    by 0x54466D9: ibis::fileManager::storage::enlarge(unsigned 
>>>>>>> long) (fileManager.cpp:1977)
>>>>>>> ==5672==    by 0x51A73B1: ibis::array_t<unsigned int>::resize(unsigned 
>>>>>>> long) (array_t.cpp:1412)
>>>>>>> ==5672==    by 0x54119FA: ibis::bitvector::decompress() 
>>>>>>> (bitvector.cpp:364)
>>>>>>> ==5672==    by 0x52DA45B: ibis::index::sumBins(unsigned int, unsigned 
>>>>>>> int, ibis::bitvector&) const (index.cpp:6180)
>>>>>>> ==5672==    by 0x55D097D: 
>>>>>>> ibis::direkte::evaluate(ibis::qContinuousRange const&, 
>>>>>>> ibis::bitvector&) const (idirekte.cpp:1071)
>>>>>>> ==5672==    by 0x5517FB3: ibis::category::stringSearch(char const*, 
>>>>>>> ibis::bitvector&) const (category.cpp:390)
>>>>>>> ==5672==    by 0x5276C41: ibis::query::doEvaluate(ibis::qExpr const*, 
>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3948)
>>>>>>> ==5672==    by 0x52770E7: ibis::query::doEvaluate(ibis::qExpr const*, 
>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3779)
>>>>>>> ==5672==    by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, 
>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>>>>> ==5672==    by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, 
>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>>>>>
>>>>>>> I'm not able to reproduce it on a simple use case, and I'm not sure 
>>>>>>> exactly what triggers it. Again, this is not dramatic since I just 
>>>>>>> have to regenerate my indexes, but I'm wondering whether there is a 
>>>>>>> way to catch this (I'm thinking about people upgrading from a version 
>>>>>>> prior to 1.2.9).
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: K. John Wu [mailto:[email protected]]
>>>>>>> Sent: Monday, March 12, 2012 1:02 PM
>>>>>>> To: Dominique Prunier
>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added 
>>>>>>> to match LIKE patterns case-sensitively and perform specific 
>>>>>>> optimizations
>>>>>>>
>>>>>>> Hi, Dominique,
>>>>>>>
>>>>>>> Let me just confirm that the two lines where the change from int32_t
>>>>>>> to uint32_t should be reversed are line 458 and 459 of dictionary.cpp,
>>>>>>> right?
>>>>>>>
>>>>>>> John
>>>>>>>
>>>>>>>
>>>>>>> On 3/12/12 9:52 AM, Dominique Prunier wrote:
>>>>>>>> Hey John,
>>>>>>>>
>>>>>>>> The problem is that it doesn't actually fail when reading the index. 
>>>>>>>> The index is read, but during the evaluation I get segfaults, bogus 
>>>>>>>> results or valgrind errors. Once I regenerated the indexes for my 
>>>>>>>> category column, everything worked like a charm.
>>>>>>>>
>>>>>>>> It was also misleading because of the other issue (unsigned ints that 
>>>>>>>> should have been signed ints) that segfaulted too.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: K. John Wu [mailto:[email protected]]
>>>>>>>> Sent: Monday, March 12, 2012 12:43 PM
>>>>>>>> To: Dominique Prunier
>>>>>>>> Cc: FastBit Users
>>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added 
>>>>>>>> to match LIKE patterns case-sensitively and perform specific 
>>>>>>>> optimizations
>>>>>>>>
>>>>>>>> Hi, Dominique,
>>>>>>>>
>>>>>>>> I thought that I had checked index types.  If you happen to know the
>>>>>>>> stack trace for the reading operation, let me know.  Otherwise, it
>>>>>>>> might take me a while to figure out a good way to reproduce the
>>>>>>>> problem.
>>>>>>>>
>>>>>>>> John
>>>>>>>>
>>>>>>>>
>>>>>>>> On 3/12/12 9:30 AM, Dominique Prunier wrote:
>>>>>>>>> Ok, figured out the other segfault. The indexes have to be 
>>>>>>>>> regenerated after the change from relic to direkte. My guess is that 
>>>>>>>>> it was reading something invalid. Is there a missing check in the 
>>>>>>>>> index read method ?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: [email protected] 
>>>>>>>>> [mailto:[email protected]] On Behalf Of Dominique 
>>>>>>>>> Prunier
>>>>>>>>> Sent: Monday, March 12, 2012 11:45 AM
>>>>>>>>> To: K. John Wu
>>>>>>>>> Cc: FastBit Users
>>>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added 
>>>>>>>>> to match LIKE patterns case-sensitively and perform specific 
>>>>>>>>> optimizations
>>>>>>>>>
>>>>>>>>> Hey John,
>>>>>>>>>
>>>>>>>>> The fix, as checked out from revision 484, breaks the binary search 
>>>>>>>>> of the pattern prefix:
>>>>>>>>> -       int32_t b = 0;
>>>>>>>>> -       int32_t e = key_.size() - 1;
>>>>>>>>> +       uint32_t b = 0;
>>>>>>>>> +       uint32_t e = key_.size() - 1;
>>>>>>>>>
>>>>>>>>> Since the loop can stop when one of the indexes reaches -1, which 
>>>>>>>>> uint32_t cannot represent, this now fails with a segfault.
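
The failure mode described here can be sketched in isolation. The code below is not the dictionary.cpp search; findSigned and unsignedStepBelowZero are hypothetical names used only to show why the bounds must be signed when the loop relies on an index dropping to -1.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Binary search over sorted keys with SIGNED bounds.
// Returns the position of target, or -1 if it is absent.
int32_t findSigned(const std::vector<std::string>& keys,
                   const std::string& target) {
    int32_t b = 0;
    int32_t e = static_cast<int32_t>(keys.size()) - 1;  // -1 for an empty list
    while (b <= e) {
        int32_t m = b + (e - b) / 2;
        const std::string& key = keys[static_cast<std::size_t>(m)];
        if (key == target) return m;
        if (key < target) {
            b = m + 1;
        } else {
            e = m - 1;  // may become -1, which ends the loop cleanly
        }
    }
    return -1;
}

// The same "step below zero" with an unsigned bound: the value wraps to
// the largest uint32_t, so a test such as b <= e stays true and the next
// access lands far outside the array.
uint32_t unsignedStepBelowZero() {
    uint32_t e = 0;
    e = e - 1;
    return e;
}
```

Keeping the bounds signed (with an explicit cast where the container wants an unsigned index) is one way to preserve the -1 stop condition; rewriting the loop over a half-open range would be another.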
>>>>>>>>>
>>>>>>>>> I'm troubleshooting another segfault in the bitvector right now 
>>>>>>>>> (could it be related to the change in r 479 ?)
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: K. John Wu [mailto:[email protected]]
>>>>>>>>> Sent: Saturday, March 10, 2012 2:24 PM
>>>>>>>>> To: Dominique Prunier
>>>>>>>>> Cc: FastBit Users
>>>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added 
>>>>>>>>> to match LIKE patterns case-sensitively and perform specific 
>>>>>>>>> optimizations
>>>>>>>>>
>>>>>>>>> Just checked in the modification to allow users to define
>>>>>>>>> FASTBIT_CS_PATTERN_MATCH to 0 to disable case sensitive matches.  The
>>>>>>>>> new SVN revision is 484.
>>>>>>>>>
>>>>>>>>> Also looked through other macros to make sure they are used 
>>>>>>>>> consistently.
>>>>>>>>>
>>>>>>>>> John
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 3/10/12 9:20 AM, Dominique Prunier wrote:
>>>>>>>>>> Hey John,
>>>>>>>>>>
>>>>>>>>>> I just noticed a small typo in util.h: the macro is called 
>>>>>>>>>> FASTBOT_... I don't think it was intended, but it has the nice side 
>>>>>>>>>> effect of disabling the new code by default, thus preserving the 
>>>>>>>>>> current behavior (case insensitive). Should we actually keep it in 
>>>>>>>>>> util.h now that it is documented in INSTALL ?
>>>>>>>>>>
>>>>>>>>>> https://codeforge.lbl.gov/plugins/scmsvn/viewcvs.php/trunk/src/util.h?root=fastbit&r1=483&r2=482&pathrev=483
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> ________________________________________
>>>>>>>>>> From: K. John Wu [[email protected]]
>>>>>>>>>> Sent: March-09-12 10:47 PM
>>>>>>>>>> To: Dominique Prunier
>>>>>>>>>> Cc: FastBit Users
>>>>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define 
>>>>>>>>>> added to match LIKE patterns case-sensitively and perform specific 
>>>>>>>>>> optimizations
>>>>>>>>>>
>>>>>>>>>> Hi, Dominique,
>>>>>>>>>>
>>>>>>>>>> I would like to add the FASTBIT_ prefix to the macro CS_PATTERN_MATCH to
>>>>>>>>>> avoid possible collisions when FastBit is used with other packages.
>>>>>>>>>> Hope you don't mind.
>>>>>>>>>>
>>>>>>>>>> John
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 3/9/12 4:03 PM, K. John Wu wrote:
>>>>>>>>>>> Hi, Dominique,
>>>>>>>>>>>
>>>>>>>>>>> I have run through my usual set of tests and did not find any
>>>>>>>>>>> problem with your patch.  It is now in SVN 482.  Please give it a
>>>>>>>>>>> try when you get the chance.
>>>>>>>>>>>
>>>>>>>>>>> Thanks.
>>>>>>>>>>>
>>>>>>>>>>> John
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 3/9/12 10:17 AM, Dominique Prunier wrote:
>>>>>>>>>>>> Quick update to my patch:
>>>>>>>>>>>>
>>>>>>>>>>>> · Changed dictionary::patternMatch to make it work with CI too
>>>>>>>>>>>>   (and i think for efficiency reasons, i have to keep all this here)
>>>>>>>>>>>>
>>>>>>>>>>>> · Moved the STR_MATCH_* constants from util.cpp to util.h and
>>>>>>>>>>>>   use them in dictionary::patternMatch
>>>>>>>>>>>>
>>>>>>>>>>>> · Removed the CS/CI ifdef from category.cpp
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I did more testing, and on my set of ~90 000 test queries, the
>>>>>>>>>>>> execution time dropped from ~515 seconds to ~20 seconds.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> From: [email protected]
>>>>>>>>>>>> [mailto:[email protected]] On Behalf Of Dominique Prunier
>>>>>>>>>>>> Sent: Thursday, March 08, 2012 2:39 PM
>>>>>>>>>>>> To: FastBit Users
>>>>>>>>>>>> Subject: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>>>>>>>>>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Here is the first version of my patch to switch SQL LIKE from case
>>>>>>>>>>>> insensitive to case sensitive and optimize this use case with
>>>>>>>>>>>> CATEGORY columns.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> In a nutshell, what changed is:
>>>>>>>>>>>>
>>>>>>>>>>>> · We extract the longest constant prefix from the pattern
>>>>>>>>>>>>   (handling the escape char too)
>>>>>>>>>>>>
>>>>>>>>>>>> · Instead of testing every value in the dictionary, we binary
>>>>>>>>>>>>   search for the range of values to test (which sometimes even
>>>>>>>>>>>>   allows skipping pattern matching if no valid range can be found)
>>>>>>>>>>>>
>>>>>>>>>>>> · We test every value in the range
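
The three steps above can be sketched roughly as follows. This is not the patch itself: it uses std::lower_bound and std::partition_point on a sorted std::vector in place of the dictionary's own search, and it assumes '%' and '_' are the wildcards and '\\' is the escape character. The names constantPrefix and prefixRange are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Longest constant prefix of a LIKE pattern. Assumed conventions for this
// sketch: '%' and '_' are wildcards, '\\' escapes the next character.
std::string constantPrefix(const std::string& pattern) {
    std::string prefix;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        char c = pattern[i];
        if (c == '%' || c == '_') break;         // first wildcard ends the prefix
        if (c == '\\') {
            if (i + 1 >= pattern.size()) break;  // dangling escape: stop here
            c = pattern[++i];                    // keep the escaped character
        }
        prefix += c;
    }
    return prefix;
}

// Half-open range [first, last) of positions in sortedKeys whose keys start
// with prefix. sortedKeys must be sorted with the default string ordering.
std::pair<std::size_t, std::size_t>
prefixRange(const std::vector<std::string>& sortedKeys,
            const std::string& prefix) {
    auto first = std::lower_bound(sortedKeys.begin(), sortedKeys.end(), prefix);
    // From 'first' on, keys that start with the prefix come before all
    // keys that do not, so a second binary search finds the end.
    auto last = std::partition_point(first, sortedKeys.end(),
        [&prefix](const std::string& key) {
            return key.compare(0, prefix.size(), prefix) == 0;
        });
    return {static_cast<std::size_t>(first - sortedKeys.begin()),
            static_cast<std::size_t>(last - sortedKeys.begin())};
}
```

An empty range means no key can match, so the full pattern match can be skipped; otherwise only the keys inside the range need to be tested against the whole pattern. An empty prefix (a pattern starting with a wildcard) yields the entire dictionary, which falls back to the old behavior.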
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On a large dictionary (~130k entries), I've found it is commonly one
>>>>>>>>>>>> or two orders of magnitude faster (in my example, a simple query
>>>>>>>>>>>> with a single LIKE predicate drops from ~10ms to ~0.4ms).
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> What I'd like to change/refactor (I'm really a newbie in C++):
>>>>>>>>>>>>
>>>>>>>>>>>> · Remove the prefix extraction and pattern matching code from
>>>>>>>>>>>>   dictionary and replace the added method patternSearch by
>>>>>>>>>>>>   something like findRange. I believe that matching and pattern
>>>>>>>>>>>>   handling code doesn't belong in the dictionary. I'd rather move
>>>>>>>>>>>>   this back to the category class or something.
>>>>>>>>>>>>
>>>>>>>>>>>> · Having to use a C++ string object to rebuild the longest
>>>>>>>>>>>>   constant prefix bugs me (suggestions ?). I'm also thinking of
>>>>>>>>>>>>   having a version that doesn't support escaping, but it would
>>>>>>>>>>>>   force me to change strMatch a bit more
>>>>>>>>>>>>
>>>>>>>>>>>> · To closely match the previous behavior, you can't match an
>>>>>>>>>>>>   empty pattern (even the empty string doesn't match); maybe that
>>>>>>>>>>>>   would be worth changing
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> As always John, feel free to include this into the main branch. I’m
>>>>>>>>>>>> waiting for suggestions to make it more efficient, cleaner, ...
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Dominique Prunier
>>>>>>>>>>>> APG Lead Developer
>>>>>>>>>>>> 4388, rue Saint-Denis
>>>>>>>>>>>> Bureau 309
>>>>>>>>>>>> Montreal (Quebec)  H2J 2L1
>>>>>>>>>>>> Tel. +1 514-842-6767  x310
>>>>>>>>>>>> Fax +1 514-842-3989
>>>>>>>>>>>> [email protected] <mailto:[email protected]>
>>>>>>>>>>>> www.watch4net.com <http://www.watch4net.com/>
>>>>>>>>>>>>
>>>>>>>>>>>> This message is for the designated recipient only and may contain
>>>>>>>>>>>> privileged, proprietary, or otherwise private information. If you
>>>>>>>>>>>> have received it in error, please notify the sender immediately and
>>>>>>>>>>>> delete the original. Any other use of this electronic mail by you is
>>>>>>>>>>>> prohibited.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> FastBit-users mailing list
>>>>>>>>>>>> [email protected]
>>>>>>>>>>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users