I'm building a nice test suite, but since it is Java-based I'm not sure it would 
be much help to you...

-----Original Message-----
From: K. John Wu [mailto:[email protected]]
Sent: Thursday, March 15, 2012 1:31 AM
To: Dominique Prunier
Cc: FastBit Users
Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to match 
LIKE patterns case-sensitively and perform specific optimizations

Hi, Dominique,

Thanks for the fix.  I have run through my tests.  Clearly, I don't
have a test case that can exercise this feature right now.  Will have
to think about how to get a few in there..

Let me know if you find anything else.

John


On 3/14/12 6:45 PM, Dominique Prunier wrote:
> Hey John,
>
> I just noticed that the pattern search was badly broken... I have no idea how 
> I didn't catch that earlier: it was not using the proper index value. If 
> I'm not mistaken, this should fix it:
>
> diff --git a/src/category.cpp b/src/category.cpp
> index ef3dc77..61039f9 100644
> --- a/src/category.cpp
> +++ b/src/category.cpp
> @@ -618,7 +618,7 @@ long ibis::category::patternSearch(const char *pat) const {
>      std::auto_ptr< ibis::array_t<uint32_t> > tmp(new ibis::array_t<uint32_t>);
>      dic.patternSearch(pat, *tmp);
>      for (uint32_t j = 0; j < tmp->size(); ++ j) {
> -       const ibis::bitvector *bv = rlc->getBitvector(j);
> +       const ibis::bitvector *bv = rlc->getBitvector((*tmp)[j]);
>         if (bv != 0)
>             est += bv->cnt();
>      }
> @@ -658,7 +658,7 @@ long ibis::category::patternSearch(const char *pat,
>      std::auto_ptr< ibis::array_t<uint32_t> > tmp(new ibis::array_t<uint32_t>);
>      dic.patternSearch(pat, *tmp);
>      for (uint32_t j = 0; j < tmp->size(); ++ j) {
> -       const ibis::bitvector *bv = rlc->getBitvector(j);
> +       const ibis::bitvector *bv = rlc->getBitvector((*tmp)[j]);
>         if (bv != 0) {
>             ++ cnt;
>             est += bv->cnt();
>
> I was probably testing against an older version of the library.
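
The one-line change in the diff above can be illustrated outside FastBit. This is only a minimal sketch under stated assumptions, not FastBit code: `countMatches`, `countMatchesBuggy` and `HitsPerCode` are hypothetical stand-ins for `ibis::category::patternSearch`, and plain `std::vector` containers replace `ibis::array_t` and the per-value bitvectors. It assumes the dictionary search returns a list of dictionary codes and that each code selects one bitmap.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the per-value bitmaps: hitsPerCode[c] is the
// number of rows whose category value has dictionary code c.
using HitsPerCode = std::vector<std::uint64_t>;

// Buggy variant: treats the loop counter j as if it were a dictionary code.
std::uint64_t countMatchesBuggy(const std::vector<std::uint32_t>& matched,
                                const HitsPerCode& hitsPerCode) {
    std::uint64_t est = 0;
    for (std::size_t j = 0; j < matched.size(); ++j) {
        if (j < hitsPerCode.size())
            est += hitsPerCode[j];        // wrong: j is a position in `matched`
    }
    return est;
}

// Fixed variant: looks up the bitmap that belongs to the matched code.
std::uint64_t countMatches(const std::vector<std::uint32_t>& matched,
                           const HitsPerCode& hitsPerCode) {
    std::uint64_t est = 0;
    for (std::size_t j = 0; j < matched.size(); ++j) {
        const std::uint32_t code = matched[j];
        if (code < hitsPerCode.size())
            est += hitsPerCode[code];     // right: index by the code itself
    }
    return est;
}
```

The two variants agree only when the matched codes happen to be 0, 1, 2, ..., which may be why a small test can miss the problem.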
>
> Thanks,
>
> -----Original Message-----
> From: K. John Wu [mailto:[email protected]]
> Sent: Wednesday, March 14, 2012 9:18 PM
> To: Dominique Prunier
> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to 
> match LIKE patterns case-sensitively and perform specific optimizations
>
> Hi, Dominique,
>
> I have just checked in a couple of minor changes.  Would be good to
> put out a stable version to replace the broken version that was taken
> down.
>
> Please let me know if you find anything that still needs attention.
>
> Thanks.
>
> John
>
>
> On 3/14/12 7:14 AM, Dominique Prunier wrote:
>> Seems to work for me. I'll do further testing; I'd like to isolate a stable 
>> version sometime this week.
>>
>> Thanks,
>>
>> -----Original Message-----
>> From: K. John Wu [mailto:[email protected]]
>> Sent: Tuesday, March 13, 2012 7:23 PM
>> To: Dominique Prunier
>> Cc: FastBit Users
>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to 
>> match LIKE patterns case-sensitively and perform specific optimizations
>>
>> Yes, you are absolutely right.  It should be
>>
>> if (read(...) < 0 || ...)
>>
>> This problem is corrected in SVN 489.  Let me know if you find
>> something else..
>>
>> John
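
The effect of the comparison discussed above can be shown with a small sketch. It assumes, without confirmation from the thread, that the read call returns a negative value on failure and zero or a positive value on success, and that the `if` guards the rebuild path. The function names are hypothetical, and the remaining conditions elided as `...` in the message are deliberately left out.

```cpp
// Original test, "0 <= read(...)": true whenever the read succeeded, so a
// perfectly usable index would be treated as unusable and regenerated on
// every run.
bool mustRebuildBuggy(int readStatus) {
    return 0 <= readStatus;
}

// Corrected test, in the spirit of "read(...) < 0": rebuild only when the
// read reported a failure.
bool mustRebuild(int readStatus) {
    return readStatus < 0;
}
```

Under the same assumption, the suggested `0 != read(..)` would behave like the corrected test for 0 and for negative values, but would also trigger a rebuild if the call ever returned a positive success code.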
>>
>>
>> On 3/13/12 2:55 PM, Dominique Prunier wrote:
>>> Hmm, seems like it is related to        if (0 <= 
>>> static_cast<ibis::direkte*>(idx)->read(idxf.c_str()) on line 185 of 
>>> category.cpp. Shouldn't it be 0!=read(..) instead of 0<=read(..) ?
>>>
>>> Thanks,
>>>
>>> -----Original Message-----
>>> From: [email protected] 
>>> [mailto:[email protected]] On Behalf Of Dominique Prunier
>>> Sent: Tuesday, March 13, 2012 5:16 PM
>>> To: K. John Wu
>>> Cc: FastBit Users
>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to 
>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>
>>> Hey John,
>>>
>>> No segfault anymore, but now it seems to be regenerating the index 
>>> every time :/
>>>
>>> Thanks,
>>>
>>> -----Original Message-----
>>> From: K. John Wu [mailto:[email protected]]
>>> Sent: Tuesday, March 13, 2012 4:44 PM
>>> To: Dominique Prunier
>>> Cc: FastBit Users
>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to 
>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>
>>> Hi, Dominique,
>>>
>>> Just checked in SVN version 488.  Please give it a try when you get
>>> the chance.  Thanks.
>>>
>>> John
>>>
>>>
>>> On 3/13/12 11:58 AM, Dominique Prunier wrote:
>>>> Cool. I'll test it right after you commit it.
>>>> While we're fixing this, I think there is a memory leak in void 
>>>> ibis::category::prepareMembers(). The ibis::fileManager::storage *st (on 
>>>> line 182) is not freed by index, direkte or category (index just nullifies 
>>>> the pointer). Not sure who should free it.
>>>>
>>>> Thanks,
>>>>
>>>> -----Original Message-----
>>>> From: K. John Wu [mailto:[email protected]]
>>>> Sent: Tuesday, March 13, 2012 2:53 PM
>>>> To: Dominique Prunier
>>>> Cc: FastBit Users
>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to 
>>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>>
>>>> Hi, Dominique,
>>>>
>>>> I think I know where the problem is -- during the process of
>>>> recreating the new index, the old file was not cleaned up properly.  I
>>>> have implemented a fix and am doing some testing on it.  Will check in
>>>> the code as soon as I am comfortable that I have not broken anything
>>>> with the new changes..
>>>>
>>>> John
>>>>
>>>>
>>>> On 3/13/12 11:48 AM, Dominique Prunier wrote:
>>>>> By the way, I checked index creation and it doesn't exhibit the issue. 
>>>>> The only way to reproduce it is to use an old indexed partition (category 
>>>>> columns) and run revision 487 on it. It seems that something bad 
>>>>> happens during the conversion and makes the cleanup crash.
>>>>>
>>>>> -----Original Message-----
>>>>> From: [email protected] 
>>>>> [mailto:[email protected]] On Behalf Of Dominique 
>>>>> Prunier
>>>>> Sent: Tuesday, March 13, 2012 12:42 PM
>>>>> To: FastBit Users; K. John Wu
>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to 
>>>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>>>
>>>>> John,
>>>>>
>>>>> Seems like the segfault appears in the cleaning methods of the file 
>>>>> manager:
>>>>>
>>>>> ==5046== Warning: set address range perms: large range [0x6f3f030, 
>>>>> 0x1f5df050) (noaccess)
>>>>> ==5046== Invalid read of size 4
>>>>> ==5046==    at 0x50C8BE0: ibis::util::sharedInt32::operator()() const 
>>>>> (util.h:901)
>>>>> ==5046==    by 0x50C8E37: ibis::fileManager::storage::inUse() const 
>>>>> (fileManager.h:259)
>>>>> ==5046==    by 0x5669911: ibis::fileManager::unload(unsigned long) 
>>>>> (fileManager.cpp:1259)
>>>>> ==5046==    by 0x5664C68: ibis::fileManager::clear() (fileManager.cpp:408)
>>>>> ==5046==    by 0x5665B98: ibis::fileManager::~fileManager() 
>>>>> (fileManager.cpp:654)
>>>>> ==5046==    by 0x5C253B0: __run_exit_handlers (in /lib64/libc-2.13.so)
>>>>> ==5046==    by 0x5C25404: exit (in /lib64/libc-2.13.so)
>>>>> ==5046==    by 0x5C0F0A3: (below main) (in /lib64/libc-2.13.so)
>>>>> ==5046==  Address 0x6f3aec4 is not stack'd, malloc'd or (recently) free'd
>>>>> ...
>>>>> ==5046== Invalid read of size 4
>>>>> ==5046==    at 0x52E9396: ibis::fileManager::storage::pastUse() const 
>>>>> (fileManager.h:261)
>>>>> ==5046==    by 0x56699FF: ibis::fileManager::unload(unsigned long) 
>>>>> (fileManager.cpp:1266)
>>>>> ==5046==    by 0x5664C68: ibis::fileManager::clear() (fileManager.cpp:408)
>>>>> ==5046==    by 0x5665B98: ibis::fileManager::~fileManager() 
>>>>> (fileManager.cpp:654)
>>>>> ==5046==    by 0x5C253B0: __run_exit_handlers (in /lib64/libc-2.13.so)
>>>>> ==5046==    by 0x5C25404: exit (in /lib64/libc-2.13.so)
>>>>> ==5046==    by 0x5C0F0A3: (below main) (in /lib64/libc-2.13.so)
>>>>> ==5046==  Address 0x211c9570 is not stack'd, malloc'd or (recently) free'd
>>>>> ...
>>>>> ==5046== Invalid read of size 8
>>>>> ==5046==    at 0x5665068: ibis::fileManager::clear() (fileManager.cpp:444)
>>>>> ==5046==    by 0x5665B98: ibis::fileManager::~fileManager() 
>>>>> (fileManager.cpp:654)
>>>>> ==5046==    by 0x5C253B0: __run_exit_handlers (in /lib64/libc-2.13.so)
>>>>> ==5046==    by 0x5C25404: exit (in /lib64/libc-2.13.so)
>>>>> ==5046==    by 0x5C0F0A3: (below main) (in /lib64/libc-2.13.so)
>>>>> ==5046==  Address 0x1c is not stack'd, malloc'd or (recently) free'd...
>>>>> ==5046==
>>>>> ==5046==
>>>>> ==5046== Process terminating with default action of signal 11 (SIGSEGV)
>>>>> ==5046==  Access not within mapped region at address 0x1C
>>>>> ==5046==    at 0x5665068: ibis::fileManager::clear() (fileManager.cpp:444)
>>>>> ==5046==    by 0x5665B98: ibis::fileManager::~fileManager() 
>>>>> (fileManager.cpp:654)
>>>>> ==5046==    by 0x5C253B0: __run_exit_handlers (in /lib64/libc-2.13.so)
>>>>> ==5046==    by 0x5C25404: exit (in /lib64/libc-2.13.so)
>>>>> ==5046==    by 0x5C0F0A3: (below main) (in /lib64/libc-2.13.so)
>>>>> ==5046==  If you believe this happened as a result of a stack
>>>>> ==5046==  overflow in your program's main thread (unlikely but
>>>>> ==5046==  possible), you can try to increase the size of the
>>>>> ==5046==  main thread stack using the --main-stacksize= flag.
>>>>> ==5046==  The main thread stack size used in this run was 8388608.
>>>>>
>>>>> The second execution of the same program works fine, so it has to be 
>>>>> related to index creation/recreation.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -----Original Message-----
>>>>> From: [email protected] 
>>>>> [mailto:[email protected]] On Behalf Of Dominique 
>>>>> Prunier
>>>>> Sent: Tuesday, March 13, 2012 10:51 AM
>>>>> To: K. John Wu
>>>>> Cc: FastBit Users
>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to 
>>>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>>>
>>>>> Hey John,
>>>>>
>>>>> It seems to work just fine now, it seamlessly recreated indexes on my old 
>>>>> partition.
>>>>> However, i'm having a segfault at the end of the first execution (the one 
>>>>> that converted the index).
>>>>> I'll investigate this and tell you what i find.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -----Original Message-----
>>>>> From: K. John Wu [mailto:[email protected]]
>>>>> Sent: Monday, March 12, 2012 6:39 PM
>>>>> To: Dominique Prunier
>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to 
>>>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>>>
>>>>> Just added index-type checking to the functions that
>>>>> caused your problem (a read function that directly works with
>>>>> ibis::fileManager::storage).  It should now automatically override all
>>>>> the relics with direktes.  I am testing the code now.  The source code
>>>>> is SVN 487.
>>>>>
>>>>> John
>>>>>
>>>>>
>>>>> On 3/12/12 1:49 PM, Dominique Prunier wrote:
>>>>>> No problem. Do we want to do something about the migration from relic to 
>>>>>> direkte ?
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: K. John Wu [mailto:[email protected]]
>>>>>> Sent: Monday, March 12, 2012 2:21 PM
>>>>>> To: Dominique Prunier
>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to 
>>>>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>>>>
>>>>>> Thanks for the confirmation.
>>>>>>
>>>>>> John
>>>>>>
>>>>>>
>>>>>> On 3/12/12 11:07 AM, Dominique Prunier wrote:
>>>>>>> It seems to work just fine for me in r486.
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: K. John Wu [mailto:[email protected]]
>>>>>>> Sent: Monday, March 12, 2012 1:51 PM
>>>>>>> To: Dominique Prunier
>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added 
>>>>>>> to match LIKE patterns case-sensitively and perform specific 
>>>>>>> optimizations
>>>>>>>
>>>>>>> If you have verified the answers are the same as before, then we don't
>>>>>>> have an off-by-1 problem.  At this point, I have not done that.  Let me
>>>>>>> know if you have.
>>>>>>>
>>>>>>> John
>>>>>>>
>>>>>>>
>>>>>>> On 3/12/12 10:48 AM, Dominique Prunier wrote:
>>>>>>>> Hmm, the original patch was working fairly well. I tried a couple of 
>>>>>>>> limits (range empty, start and/or end at first value and/or last 
>>>>>>>> value). I didn't notice any other change. Are you talking about this 
>>>>>>>> or the segfault in direkte ?
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: K. John Wu [mailto:[email protected]]
>>>>>>>> Sent: Monday, March 12, 2012 1:41 PM
>>>>>>>> To: Dominique Prunier
>>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added 
>>>>>>>> to match LIKE patterns case-sensitively and perform specific 
>>>>>>>> optimizations
>>>>>>>>
>>>>>>>> Just hid release 1.2.9, there might be an off-by-1 problem as well.
>>>>>>>> Need to dig deeper..
>>>>>>>>
>>>>>>>> John
>>>>>>>>
>>>>>>>>
>>>>>>>> On 3/12/12 10:13 AM, Dominique Prunier wrote:
>>>>>>>>> Yep, but I bet you changed them for a reason, maybe a compile warning 
>>>>>>>>> or something (they were int32_t in my patch). array_t indexes are 
>>>>>>>>> definitely uint32_t. Reverting them to int32_t works but maybe it 
>>>>>>>>> would be worth thinking about it.
>>>>>>>>>
>>>>>>>>> About the segfault, i captured an example of valgrind error (which 
>>>>>>>>> ultimately leads to a segfault). As you can see, it is during query 
>>>>>>>>> evaluation, not when reading the index.
>>>>>>>>>
>>>>>>>>> ==5672== Invalid read of size 4
>>>>>>>>> ==5672==    at 0x5414F60: ibis::bitvector::or_d1(ibis::bitvector 
>>>>>>>>> const&) (bitvector.cpp:2934)
>>>>>>>>> ==5672==    by 0x541C622: ibis::bitvector::operator|=(ibis::bitvector 
>>>>>>>>> const&) (bitvector.cpp:1272)
>>>>>>>>> ==5672==    by 0x52DA4A4: ibis::index::sumBins(unsigned int, unsigned 
>>>>>>>>> int, ibis::bitvector&) const (index.cpp:6183)
>>>>>>>>> ==5672==    by 0x55D097D: 
>>>>>>>>> ibis::direkte::evaluate(ibis::qContinuousRange const&, 
>>>>>>>>> ibis::bitvector&) const (idirekte.cpp:1071)
>>>>>>>>> ==5672==    by 0x5517FB3: ibis::category::stringSearch(char const*, 
>>>>>>>>> ibis::bitvector&) const (category.cpp:390)
>>>>>>>>> ==5672==    by 0x5276C41: ibis::query::doEvaluate(ibis::qExpr const*, 
>>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3948)
>>>>>>>>> ==5672==    by 0x52770E7: ibis::query::doEvaluate(ibis::qExpr const*, 
>>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3779)
>>>>>>>>> ==5672==    by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, 
>>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>>>>>>> ==5672==    by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, 
>>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>>>>>>> ==5672==    by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, 
>>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>>>>>>> ==5672==    by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, 
>>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>>>>>>> ==5672==    by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, 
>>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>>>>>>> ==5672==  Address 0x9416f20 is 0 bytes after a block of size 354,720 
>>>>>>>>> alloc'd
>>>>>>>>> ==5672==    at 0x4C28C6D: malloc (in 
>>>>>>>>> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>>>>>>>>> ==5672==    by 0x5445F24: 
>>>>>>>>> ibis::fileManager::storage::storage(unsigned long) 
>>>>>>>>> (fileManager.cpp:1718)
>>>>>>>>> ==5672==    by 0x54466D9: 
>>>>>>>>> ibis::fileManager::storage::enlarge(unsigned long) 
>>>>>>>>> (fileManager.cpp:1977)
>>>>>>>>> ==5672==    by 0x51A73B1: ibis::array_t<unsigned 
>>>>>>>>> int>::resize(unsigned long) (array_t.cpp:1412)
>>>>>>>>> ==5672==    by 0x54119FA: ibis::bitvector::decompress() 
>>>>>>>>> (bitvector.cpp:364)
>>>>>>>>> ==5672==    by 0x52DA45B: ibis::index::sumBins(unsigned int, unsigned 
>>>>>>>>> int, ibis::bitvector&) const (index.cpp:6180)
>>>>>>>>> ==5672==    by 0x55D097D: 
>>>>>>>>> ibis::direkte::evaluate(ibis::qContinuousRange const&, 
>>>>>>>>> ibis::bitvector&) const (idirekte.cpp:1071)
>>>>>>>>> ==5672==    by 0x5517FB3: ibis::category::stringSearch(char const*, 
>>>>>>>>> ibis::bitvector&) const (category.cpp:390)
>>>>>>>>> ==5672==    by 0x5276C41: ibis::query::doEvaluate(ibis::qExpr const*, 
>>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3948)
>>>>>>>>> ==5672==    by 0x52770E7: ibis::query::doEvaluate(ibis::qExpr const*, 
>>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3779)
>>>>>>>>> ==5672==    by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, 
>>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>>>>>>> ==5672==    by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, 
>>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>>>>>>>
>>>>>>>>> I'm not able to reproduce on a simplistic use case. Not sure exactly 
>>>>>>>>> what triggers this. Here again, this is not dramatic since i just 
>>>>>>>>> have to regenerate my indexes but i'm wondering if there were a way 
>>>>>>>>> to catch this (i'm thinking about people upgrading from a version 
>>>>>>>>> prior to 1.2.9).
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: K. John Wu [mailto:[email protected]]
>>>>>>>>> Sent: Monday, March 12, 2012 1:02 PM
>>>>>>>>> To: Dominique Prunier
>>>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added 
>>>>>>>>> to match LIKE patterns case-sensitively and perform specific 
>>>>>>>>> optimizations
>>>>>>>>>
>>>>>>>>> Hi, Dominique,
>>>>>>>>>
>>>>>>>>> Let me just confirm that the two lines where the change from int32_t
>>>>>>>>> to uint32_t should be reversed are line 458 and 459 of dictionary.cpp,
>>>>>>>>> right?
>>>>>>>>>
>>>>>>>>> John
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 3/12/12 9:52 AM, Dominique Prunier wrote:
>>>>>>>>>> Hey John,
>>>>>>>>>>
>>>>>>>>>> The problem is that it doesn't actually fail when reading the index. 
>>>>>>>>>> The index is read but during the evaluation, I have segfaults, bogus 
>>>>>>>>>> results or valgrind errors. Once I regenerated the indexes for my 
>>>>>>>>>> category column, everything worked like a charm.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> It was also misleading because of the other issue (unsigned ints 
>>>>>>>>>> that should have been signed ints) that segfaulted too.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: K. John Wu [mailto:[email protected]]
>>>>>>>>>> Sent: Monday, March 12, 2012 12:43 PM
>>>>>>>>>> To: Dominique Prunier
>>>>>>>>>> Cc: FastBit Users
>>>>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define 
>>>>>>>>>> added to match LIKE patterns case-sensitively and perform specific 
>>>>>>>>>> optimizations
>>>>>>>>>>
>>>>>>>>>> Hi, Dominique,
>>>>>>>>>>
>>>>>>>>>> I thought that I have checked index types.  If you happen to know the
>>>>>>>>>> stack trace for the reading operation, let me know.  Otherwise, it
>>>>>>>>>> might take me a while to figure out a good way to reproduce the 
>>>>>>>>>> problem..
>>>>>>>>>>
>>>>>>>>>> John
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 3/12/12 9:30 AM, Dominique Prunier wrote:
>>>>>>>>>>> Ok, figured out the other segfault. The indexes have to be 
>>>>>>>>>>> regenerated with the change from relic to direkte. My guess is that 
>>>>>>>>>>> it was reading something invalid. Is there a missing check in the 
>>>>>>>>>>> index read method ?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: [email protected] 
>>>>>>>>>>> [mailto:[email protected]] On Behalf Of 
>>>>>>>>>>> Dominique Prunier
>>>>>>>>>>> Sent: Monday, March 12, 2012 11:45 AM
>>>>>>>>>>> To: K. John Wu
>>>>>>>>>>> Cc: FastBit Users
>>>>>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define 
>>>>>>>>>>> added to match LIKE patterns case-sensitively and perform specific 
>>>>>>>>>>> optimizations
>>>>>>>>>>>
>>>>>>>>>>> Hey John,
>>>>>>>>>>>
>>>>>>>>>>> The fix, as checked out in the revision 484 breaks the binary 
>>>>>>>>>>> search of the pattern prefix:
>>>>>>>>>>> -       int32_t b = 0;
>>>>>>>>>>> -       int32_t e = key_.size() - 1;
>>>>>>>>>>> +       uint32_t b = 0;
>>>>>>>>>>> +       uint32_t e = key_.size() - 1;
>>>>>>>>>>>
>>>>>>>>>>> Since the stop condition of the loop can be that one of the indexes 
>>>>>>>>>>> is -1, this now fails with a segfault.
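
Why the signed type matters can be seen in a generic binary search with inclusive bounds. This is a sketch of the general technique, not the actual loop in dictionary.cpp; `findKey` and `unsignedEndAfterStep` are hypothetical names, and the key list is modeled as a sorted `std::vector<std::string>`.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Returns the position of `target` in the sorted vector `keys`, or -1 if it
// is absent. The bounds [b, e] are inclusive and the loop stops when e < b,
// so e must be able to hold -1.
std::int32_t findKey(const std::vector<std::string>& keys,
                     const std::string& target) {
    std::int32_t b = 0;
    std::int32_t e = static_cast<std::int32_t>(keys.size()) - 1;  // -1 if empty
    while (b <= e) {
        const std::int32_t m = b + (e - b) / 2;
        const int cmp = target.compare(keys[m]);
        if (cmp == 0)
            return m;
        if (cmp < 0)
            e = m - 1;      // becomes -1 when m == 0, which ends the loop
        else
            b = m + 1;
    }
    return -1;
}

// What the same "e = m - 1" step yields with an unsigned type when m == 0:
// the value wraps around instead of becoming -1, so "b <= e" stays true and
// the next probe reads far outside the array.
std::uint32_t unsignedEndAfterStep() {
    std::uint32_t m = 0;
    return m - 1;
}
```

Keeping the bounds unsigned would also be possible with a half-open range `[b, e)`, where neither bound ever needs to go below zero.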
>>>>>>>>>>>
>>>>>>>>>>> I'm troubleshooting another segfault in the bitvector right now 
>>>>>>>>>>> (could it be related to the change in r 479 ?)
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: K. John Wu [mailto:[email protected]]
>>>>>>>>>>> Sent: Saturday, March 10, 2012 2:24 PM
>>>>>>>>>>> To: Dominique Prunier
>>>>>>>>>>> Cc: FastBit Users
>>>>>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define 
>>>>>>>>>>> added to match LIKE patterns case-sensitively and perform specific 
>>>>>>>>>>> optimizations
>>>>>>>>>>>
>>>>>>>>>>> Just checked in the modification to allow users to define
>>>>>>>>>>> FASTBIT_CS_PATTERN_MATCH to 0 to disable case sensitive matches.  
>>>>>>>>>>> The
>>>>>>>>>>> new SVN revision is 484.
>>>>>>>>>>>
>>>>>>>>>>> Also looked through other macros to make sure they are used 
>>>>>>>>>>> consistently.
>>>>>>>>>>>
>>>>>>>>>>> John
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 3/10/12 9:20 AM, Dominique Prunier wrote:
>>>>>>>>>>>> Hey John,
>>>>>>>>>>>>
>>>>>>>>>>>> I just noticed a small typo in util.h, the macro is called 
>>>>>>>>>>>> FASTBOT_... I don't think it was intended but it has the nice side 
>>>>>>>>>>>> effect of disabling new code by default thus preserving current 
>>>>>>>>>>>> behavior (case insensitive). Should we actually keep it in util.h 
>>>>>>>>>>>> now that it is documented in INSTALL ?
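
One way such a typo could silently disable the new code is sketched below. This is a guess at the mechanism, not the contents of util.h: it assumes the header sets a default under the misspelled name while the rest of the code tests the correctly spelled macro with `#if`. The function `caseSensitiveLikeEnabled` is hypothetical.

```cpp
// Hypothetical header fragment: the default is set under a misspelled name,
// so the correctly spelled macro stays undefined.
#ifndef FASTBOT_CS_PATTERN_MATCH
#define FASTBOT_CS_PATTERN_MATCH 1
#endif

// In a #if expression an undefined identifier evaluates to 0, so the
// case-sensitive branch is compiled out unless the build defines
// FASTBIT_CS_PATTERN_MATCH to a non-zero value itself.
bool caseSensitiveLikeEnabled() {
#if FASTBIT_CS_PATTERN_MATCH + 0 != 0
    return true;
#else
    return false;
#endif
}
```

With this arrangement, compiling with `-DFASTBIT_CS_PATTERN_MATCH=1` turns the branch on, and leaving the macro undefined or defining it to 0 keeps the case-insensitive behavior.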
>>>>>>>>>>>>
>>>>>>>>>>>> https://codeforge.lbl.gov/plugins/scmsvn/viewcvs.php/trunk/src/util.h?root=fastbit&r1=483&r2=482&pathrev=483
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>> From: K. John Wu [[email protected]]
>>>>>>>>>>>> Sent: March-09-12 10:47 PM
>>>>>>>>>>>> To: Dominique Prunier
>>>>>>>>>>>> Cc: FastBit Users
>>>>>>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define 
>>>>>>>>>>>> added to match LIKE patterns case-sensitively and perform specific 
>>>>>>>>>>>> optimizations
>>>>>>>>>>>>
>>>>>>>>>>>> Hi, Dominique,
>>>>>>>>>>>>
>>>>>>>>>>>> I would like to add FASTBIT_ prefix to the macro CS_PATTERN_MATCH to
>>>>>>>>>>>> avoid possible collisions when FastBit is used with other packages.
>>>>>>>>>>>> Hope you don't mind.
>>>>>>>>>>>>
>>>>>>>>>>>> John
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 3/9/12 4:03 PM, K. John Wu wrote:
>>>>>>>>>>>>> Hi, Dominique,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have run through my usual set of tests and did not find any 
>>>>>>>>>>>>> problem
>>>>>>>>>>>>> with your patch.  It is now in SVN 482.  Please give it a try 
>>>>>>>>>>>>> when you
>>>>>>>>>>>>> get the chance.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>
>>>>>>>>>>>>> John
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 3/9/12 10:17 AM, Dominique Prunier wrote:
>>>>>>>>>>>>>> Quick update to my patch:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ·         Changed dictionary::patternMatch to make it work with 
>>>>>>>>>>>>>> CI too
>>>>>>>>>>>>>> (and i think for efficiency reasons, i have to keep all this 
>>>>>>>>>>>>>> here)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ·         Moved the STR_MATCH_* constants from util.cpp to 
>>>>>>>>>>>>>> util.h and
>>>>>>>>>>>>>> use them in dictionary::patternMatch
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ·         Removed the CS/CI ifdef from category.cpp
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I did more testing, and on my set of ~90 000 test queries, the
>>>>>>>>>>>>>> execution time dropped from ~515 seconds to ~20 seconds.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> *From:*[email protected]
>>>>>>>>>>>>>> [mailto:[email protected]] *On Behalf Of 
>>>>>>>>>>>>>> *Dominique
>>>>>>>>>>>>>> Prunier
>>>>>>>>>>>>>> *Sent:* Thursday, March 08, 2012 2:39 PM
>>>>>>>>>>>>>> *To:* FastBit Users
>>>>>>>>>>>>>> *Subject:* [FastBit-users] PATCH: new CS_PATTERN_MATCH define 
>>>>>>>>>>>>>> added to
>>>>>>>>>>>>>> match LIKE patterns case-sensitively and perform specific 
>>>>>>>>>>>>>> optimizations
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here is the first version of my patch to switch SQL LIKE from case
>>>>>>>>>>>>>> insensitive to case sensitive and optimize this use case with CATEGORY
>>>>>>>>>>>>>> columns.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In a nutshell, what changed is:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ·         We extract the longest (handling the escape char too)
>>>>>>>>>>>>>> constant prefix from the pattern
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ·         Instead of testing every value in the dictionary, we binary
>>>>>>>>>>>>>> search for the range of values to test (which sometimes even allows us
>>>>>>>>>>>>>> to skip pattern matching entirely if no valid range can be found)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ·         We test every value in the range
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On a large dictionary (~130k entries), I’ve commonly seen it be one or
>>>>>>>>>>>>>> two orders of magnitude faster (in my example, a simple query with a
>>>>>>>>>>>>>> single LIKE predicate drops from ~10ms to ~0.4ms).
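
The prefix extraction and range search described in the list above can be sketched in isolation. This is a minimal illustration, not the patch itself: `constantPrefix` and `prefixRange` are hypothetical names, the dictionary is modeled as a sorted `std::vector<std::string>`, and the metacharacters (`%`, `_`, and backslash as the escape character) are assumptions rather than values taken from FastBit's `STR_MATCH_*` constants.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Longest literal prefix of a LIKE pattern. Assumed metacharacters:
// '%' (any sequence), '_' (any single character), '\\' (escape).
std::string constantPrefix(const std::string& pat) {
    std::string prefix;
    for (std::size_t i = 0; i < pat.size(); ++i) {
        const char c = pat[i];
        if (c == '%' || c == '_')
            break;                        // first wildcard ends the prefix
        if (c == '\\') {
            if (i + 1 >= pat.size())
                break;                    // dangling escape: stop here
            prefix += pat[++i];           // escaped character is literal
        } else {
            prefix += c;
        }
    }
    return prefix;
}

// Half-open range [first, last) of positions in the sorted key list whose
// keys start with `prefix`. Comparison is case sensitive, which is what
// makes a binary search on the sorted keys valid.
std::pair<std::size_t, std::size_t>
prefixRange(const std::vector<std::string>& sortedKeys,
            const std::string& prefix) {
    const auto lo =
        std::lower_bound(sortedKeys.begin(), sortedKeys.end(), prefix);
    // From lo onward, keys that start with the prefix form one contiguous run.
    const auto hi = std::partition_point(
        lo, sortedKeys.end(), [&prefix](const std::string& key) {
            return key.compare(0, prefix.size(), prefix) == 0;
        });
    return std::make_pair(static_cast<std::size_t>(lo - sortedKeys.begin()),
                          static_cast<std::size_t>(hi - sortedKeys.begin()));
}
```

Only the keys inside the returned range would then need the full pattern match. An empty range means no key can match, and an empty prefix (a pattern that starts with a wildcard) degrades to scanning the whole dictionary.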
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What i’d like to change/refactor (i’m really a newbie in c++):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ·         Remove the prefix extraction and pattern matching code 
>>>>>>>>>>>>>> from
>>>>>>>>>>>>>> dictionary and replace the added method patternSearch by 
>>>>>>>>>>>>>> something
>>>>>>>>>>>>>> like findRange. I believe that matching and pattern handling code
>>>>>>>>>>>>>> doesn’t belong to the dictionary. I’d rather move this back to 
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>> category class or something.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ·         Having to use a c++ string object to rebuild the longest
>>>>>>>>>>>>>> constant prefix bugs me (suggestions ?). I’m also thinking of having a
>>>>>>>>>>>>>> version that doesn’t support escaping, but it would force me to change
>>>>>>>>>>>>>> strMatch a bit more
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ·         To closely match the previous behavior, you can’t match an
>>>>>>>>>>>>>> empty pattern (even the empty string doesn’t match), maybe that would
>>>>>>>>>>>>>> be worth changing
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As always John, feel free to include this into the main branch. 
>>>>>>>>>>>>>> I’m
>>>>>>>>>>>>>> waiting for suggestions to make it more efficient, cleaner, ...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> */Dominique Prunier/**//*
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  APG Lead Developper
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  4388, rue Saint-Denis
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  Bureau 309
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  Montreal (Quebec)  H2J 2L1
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  Tel. +1 514-842-6767  x310
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  Fax +1 514-842-3989
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  [email protected] 
>>>>>>>>>>>>>> <mailto:[email protected]>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  www.watch4net.com <http://www.watch4net.com/>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> FastBit-users mailing list
>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users