Hi, Dominique,
Thanks for the fix. I have run through my tests. Clearly, I don't
have a test case that exercises this feature right now. I will have
to think about how to get a few in there.
Let me know if you find anything else.
John
On 3/14/12 6:45 PM, Dominique Prunier wrote:
> Hey John,
>
> I just noticed that the pattern search was badly broken... I have no idea how
> I didn't catch that earlier; it was not using the proper index value. If
> I'm not mistaken, this should fix it:
>
> diff --git a/src/category.cpp b/src/category.cpp
> index ef3dc77..61039f9 100644
> --- a/src/category.cpp
> +++ b/src/category.cpp
> @@ -618,7 +618,7 @@ long ibis::category::patternSearch(const char *pat) const
> {
> std::auto_ptr< ibis::array_t<uint32_t> > tmp(new
> ibis::array_t<uint32_t>);
> dic.patternSearch(pat, *tmp);
> for (uint32_t j = 0; j < tmp->size(); ++ j) {
> - const ibis::bitvector *bv = rlc->getBitvector(j);
> + const ibis::bitvector *bv = rlc->getBitvector((*tmp)[j]);
> if (bv != 0)
> est += bv->cnt();
> }
> @@ -658,7 +658,7 @@ long ibis::category::patternSearch(const char *pat,
> std::auto_ptr< ibis::array_t<uint32_t> > tmp(new
> ibis::array_t<uint32_t>);
> dic.patternSearch(pat, *tmp);
> for (uint32_t j = 0; j < tmp->size(); ++ j) {
> - const ibis::bitvector *bv = rlc->getBitvector(j);
> + const ibis::bitvector *bv = rlc->getBitvector((*tmp)[j]);
> if (bv != 0) {
> ++ cnt;
> est += bv->cnt();
>
> I was probably testing against an older version of the library.
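
As a minimal sketch of what the two hunks above change (this is not FastBit
code: the std::vector containers and the countMatches names are hypothetical
stand-ins for the dictionary search result and the per-value bitvectors), the
corrected loop looks up each matched dictionary code instead of the loop
counter:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the per-value bitmaps of a category index:
// counts[k] is the number of rows whose value has dictionary code k.
long countMatches(const std::vector<uint32_t>& matchedCodes,
                  const std::vector<long>& counts) {
    long est = 0;
    for (std::size_t j = 0; j < matchedCodes.size(); ++j) {
        // Use the dictionary code stored at position j, not j itself.
        const uint32_t code = matchedCodes[j];
        assert(code < counts.size());
        est += counts[code];
    }
    return est;
}

// The pre-fix behaviour, kept only for comparison: it treats the loop
// counter as a dictionary code, so it always sums the first few entries.
long countMatchesBuggy(const std::vector<uint32_t>& matchedCodes,
                       const std::vector<long>& counts) {
    long est = 0;
    for (std::size_t j = 0; j < matchedCodes.size() && j < counts.size(); ++j)
        est += counts[j];
    return est;
}
```

With counts {10, 20, 30, 40, 50} and matched codes {3, 4}, the corrected loop
sums 90 while the old one sums the first two entries, 30.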
>
> Thanks,
>
> -----Original Message-----
> From: K. John Wu [mailto:[email protected]]
> Sent: Wednesday, March 14, 2012 9:18 PM
> To: Dominique Prunier
> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
> match LIKE patterns case-sensitively and perform specific optimizations
>
> Hi, Dominique,
>
> I have just checked in a couple of minor changes. Would be good to
> put out a stable version to replace the broken version that was taken
> down.
>
> Please let me know if you find anything that still needs attention.
>
> Thanks.
>
> John
>
>
> On 3/14/12 7:14 AM, Dominique Prunier wrote:
>> Seems to work for me. I'll do further testing; I'd like to isolate a stable
>> version sometime this week.
>>
>> Thanks,
>>
>> -----Original Message-----
>> From: K. John Wu [mailto:[email protected]]
>> Sent: Tuesday, March 13, 2012 7:23 PM
>> To: Dominique Prunier
>> Cc: FastBit Users
>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>> match LIKE patterns case-sensitively and perform specific optimizations
>>
>> Yes, you are absolutely right. It should be
>>
>> if (read(...) < 0 || ...)
>>
>> This problem is corrected in SVN 489. Let me know if you find
>> something else..
>>
>> John
>>
>>
>> On 3/13/12 2:55 PM, Dominique Prunier wrote:
>>> Hmm, seems like it is related to if (0 <=
>>> static_cast<ibis::direkte*>(idx)->read(idxf.c_str()) on line 185 of
>>> category.cpp. Shouldn't it be 0!=read(..) instead of 0<=read(..) ?
>>>
>>> Thanks,
>>>
>>> -----Original Message-----
>>> From: [email protected]
>>> [mailto:[email protected]] On Behalf Of Dominique Prunier
>>> Sent: Tuesday, March 13, 2012 5:16 PM
>>> To: K. John Wu
>>> Cc: FastBit Users
>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>
>>> Hey John,
>>>
>>> No segfault anymore, but now it seems to regenerate the index
>>> every time :/
>>>
>>> Thanks,
>>>
>>> -----Original Message-----
>>> From: K. John Wu [mailto:[email protected]]
>>> Sent: Tuesday, March 13, 2012 4:44 PM
>>> To: Dominique Prunier
>>> Cc: FastBit Users
>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>
>>> Hi, Dominique,
>>>
>>> Just checked in SVN version 488. Please give it a try when you get
>>> the chance. Thanks.
>>>
>>> John
>>>
>>>
>>> On 3/13/12 11:58 AM, Dominique Prunier wrote:
>>>> Cool. I'll test it right after you commit it.
>>>> While we're fixing this, I think there is a memory leak in void
>>>> ibis::category::prepareMembers(). The ibis::fileManager::storage *st (on
>>>> line 182) is not freed by index, direkte or category (index just nullifies
>>>> the pointer). Not sure who should free it.
>>>>
>>>> Thanks,
>>>>
>>>> -----Original Message-----
>>>> From: K. John Wu [mailto:[email protected]]
>>>> Sent: Tuesday, March 13, 2012 2:53 PM
>>>> To: Dominique Prunier
>>>> Cc: FastBit Users
>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>>
>>>> Hi, Dominique,
>>>>
>>>> I think I know where the problem is -- during the process of
>>>> recreating the new index, the old file was not cleaned up properly. I
>>>> have implemented a fix and am doing some testing on it. Will check in
>>>> the code as soon as I am comfortable that I have not broken anything
>>>> with the new changes..
>>>>
>>>> John
>>>>
>>>>
>>>> On 3/13/12 11:48 AM, Dominique Prunier wrote:
>>>>> By the way, I checked index creation and it doesn't exhibit the issue.
>>>>> The only way to reproduce it is to use an old indexed partition (category
>>>>> columns) and run revision 487 on it. It seems that something bad
>>>>> happens during the conversion and makes the cleanup crash.
>>>>>
>>>>> -----Original Message-----
>>>>> From: [email protected]
>>>>> [mailto:[email protected]] On Behalf Of Dominique
>>>>> Prunier
>>>>> Sent: Tuesday, March 13, 2012 12:42 PM
>>>>> To: FastBit Users; K. John Wu
>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>>>
>>>>> John,
>>>>>
>>>>> Seems like the segfault appears in the cleaning methods of the file
>>>>> manager:
>>>>>
>>>>> ==5046== Warning: set address range perms: large range [0x6f3f030,
>>>>> 0x1f5df050) (noaccess)
>>>>> ==5046== Invalid read of size 4
>>>>> ==5046== at 0x50C8BE0: ibis::util::sharedInt32::operator()() const
>>>>> (util.h:901)
>>>>> ==5046== by 0x50C8E37: ibis::fileManager::storage::inUse() const
>>>>> (fileManager.h:259)
>>>>> ==5046== by 0x5669911: ibis::fileManager::unload(unsigned long)
>>>>> (fileManager.cpp:1259)
>>>>> ==5046== by 0x5664C68: ibis::fileManager::clear() (fileManager.cpp:408)
>>>>> ==5046== by 0x5665B98: ibis::fileManager::~fileManager()
>>>>> (fileManager.cpp:654)
>>>>> ==5046== by 0x5C253B0: __run_exit_handlers (in /lib64/libc-2.13.so)
>>>>> ==5046== by 0x5C25404: exit (in /lib64/libc-2.13.so)
>>>>> ==5046== by 0x5C0F0A3: (below main) (in /lib64/libc-2.13.so)
>>>>> ==5046== Address 0x6f3aec4 is not stack'd, malloc'd or (recently) free'd
>>>>> ...
>>>>> ==5046== Invalid read of size 4
>>>>> ==5046== at 0x52E9396: ibis::fileManager::storage::pastUse() const
>>>>> (fileManager.h:261)
>>>>> ==5046== by 0x56699FF: ibis::fileManager::unload(unsigned long)
>>>>> (fileManager.cpp:1266)
>>>>> ==5046== by 0x5664C68: ibis::fileManager::clear() (fileManager.cpp:408)
>>>>> ==5046== by 0x5665B98: ibis::fileManager::~fileManager()
>>>>> (fileManager.cpp:654)
>>>>> ==5046== by 0x5C253B0: __run_exit_handlers (in /lib64/libc-2.13.so)
>>>>> ==5046== by 0x5C25404: exit (in /lib64/libc-2.13.so)
>>>>> ==5046== by 0x5C0F0A3: (below main) (in /lib64/libc-2.13.so)
>>>>> ==5046== Address 0x211c9570 is not stack'd, malloc'd or (recently) free'd
>>>>> ...
>>>>> ==5046== Invalid read of size 8
>>>>> ==5046== at 0x5665068: ibis::fileManager::clear() (fileManager.cpp:444)
>>>>> ==5046== by 0x5665B98: ibis::fileManager::~fileManager()
>>>>> (fileManager.cpp:654)
>>>>> ==5046== by 0x5C253B0: __run_exit_handlers (in /lib64/libc-2.13.so)
>>>>> ==5046== by 0x5C25404: exit (in /lib64/libc-2.13.so)
>>>>> ==5046== by 0x5C0F0A3: (below main) (in /lib64/libc-2.13.so)
>>>>> ==5046== Address 0x1c is not stack'd, malloc'd or (recently) free'd...
>>>>> ==5046==
>>>>> ==5046==
>>>>> ==5046== Process terminating with default action of signal 11 (SIGSEGV)
>>>>> ==5046== Access not within mapped region at address 0x1C
>>>>> ==5046== at 0x5665068: ibis::fileManager::clear() (fileManager.cpp:444)
>>>>> ==5046== by 0x5665B98: ibis::fileManager::~fileManager()
>>>>> (fileManager.cpp:654)
>>>>> ==5046== by 0x5C253B0: __run_exit_handlers (in /lib64/libc-2.13.so)
>>>>> ==5046== by 0x5C25404: exit (in /lib64/libc-2.13.so)
>>>>> ==5046== by 0x5C0F0A3: (below main) (in /lib64/libc-2.13.so)
>>>>> ==5046== If you believe this happened as a result of a stack
>>>>> ==5046== overflow in your program's main thread (unlikely but
>>>>> ==5046== possible), you can try to increase the size of the
>>>>> ==5046== main thread stack using the --main-stacksize= flag.
>>>>> ==5046== The main thread stack size used in this run was 8388608.
>>>>>
>>>>> The second execution of the same program works fine, so it has to be
>>>>> related to index creation/recreation.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -----Original Message-----
>>>>> From: [email protected]
>>>>> [mailto:[email protected]] On Behalf Of Dominique
>>>>> Prunier
>>>>> Sent: Tuesday, March 13, 2012 10:51 AM
>>>>> To: K. John Wu
>>>>> Cc: FastBit Users
>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>>>
>>>>> Hey John,
>>>>>
>>>>> It seems to work just fine now, it seamlessly recreated indexes on my old
>>>>> partition.
>>>>> However, i'm having a segfault at the end of the first execution (the one
>>>>> that converted the index).
>>>>> I'll investigate this and tell you what i find.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -----Original Message-----
>>>>> From: K. John Wu [mailto:[email protected]]
>>>>> Sent: Monday, March 12, 2012 6:39 PM
>>>>> To: Dominique Prunier
>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>>>
>>>>> Just added index type checking to the functions that caused
>>>>> your problem (a read function that works directly with
>>>>> ibis::fileManager::storage). It should now automatically override all
>>>>> the relics with direktes. I am testing the code now. The source code
>>>>> is SVN 487.
>>>>>
>>>>> John
>>>>>
>>>>>
>>>>> On 3/12/12 1:49 PM, Dominique Prunier wrote:
>>>>>> No problem. Do we want to do something about the migration from relic to
>>>>>> direkte ?
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: K. John Wu [mailto:[email protected]]
>>>>>> Sent: Monday, March 12, 2012 2:21 PM
>>>>>> To: Dominique Prunier
>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>>>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>>>>
>>>>>> Thanks for the confirmation.
>>>>>>
>>>>>> John
>>>>>>
>>>>>>
>>>>>> On 3/12/12 11:07 AM, Dominique Prunier wrote:
>>>>>>> It seems to work just fine for me in r486.
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: K. John Wu [mailto:[email protected]]
>>>>>>> Sent: Monday, March 12, 2012 1:51 PM
>>>>>>> To: Dominique Prunier
>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added
>>>>>>> to match LIKE patterns case-sensitively and perform specific
>>>>>>> optimizations
>>>>>>>
>>>>>>> If you have verified the answers are the same as before, then we don't
>>>>>>> have an off-by-1 problem. At this point, I have not done that. Let me
>>>>>>> know if you have.
>>>>>>>
>>>>>>> John
>>>>>>>
>>>>>>>
>>>>>>> On 3/12/12 10:48 AM, Dominique Prunier wrote:
>>>>>>>> Hmm, the original patch was working fairly well. I tried a couple of
>>>>>>>> limit cases (empty range, range starting and/or ending at the first
>>>>>>>> and/or last value). I didn't notice any other change. Are you talking
>>>>>>>> about this or the segfault in direkte?
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: K. John Wu [mailto:[email protected]]
>>>>>>>> Sent: Monday, March 12, 2012 1:41 PM
>>>>>>>> To: Dominique Prunier
>>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added
>>>>>>>> to match LIKE patterns case-sensitively and perform specific
>>>>>>>> optimizations
>>>>>>>>
>>>>>>>> Just hid release 1.2.9, there might be an off-by-1 problem as well.
>>>>>>>> Need to dig deeper..
>>>>>>>>
>>>>>>>> John
>>>>>>>>
>>>>>>>>
>>>>>>>> On 3/12/12 10:13 AM, Dominique Prunier wrote:
>>>>>>>>> Yep, but I bet you changed them for a reason, maybe a compile warning
>>>>>>>>> or something (they were int32_t in my patch). array_t indexes are
>>>>>>>>> definitely uint32_t. Reverting them to int32_t works, but maybe it
>>>>>>>>> would be worth thinking about it.
>>>>>>>>>
>>>>>>>>> About the segfault, i captured an example of valgrind error (which
>>>>>>>>> ultimately leads to a segfault). As you can see, it is during query
>>>>>>>>> evaluation, not when reading the index.
>>>>>>>>>
>>>>>>>>> ==5672== Invalid read of size 4
>>>>>>>>> ==5672== at 0x5414F60: ibis::bitvector::or_d1(ibis::bitvector
>>>>>>>>> const&) (bitvector.cpp:2934)
>>>>>>>>> ==5672== by 0x541C622: ibis::bitvector::operator|=(ibis::bitvector
>>>>>>>>> const&) (bitvector.cpp:1272)
>>>>>>>>> ==5672== by 0x52DA4A4: ibis::index::sumBins(unsigned int, unsigned
>>>>>>>>> int, ibis::bitvector&) const (index.cpp:6183)
>>>>>>>>> ==5672== by 0x55D097D:
>>>>>>>>> ibis::direkte::evaluate(ibis::qContinuousRange const&,
>>>>>>>>> ibis::bitvector&) const (idirekte.cpp:1071)
>>>>>>>>> ==5672== by 0x5517FB3: ibis::category::stringSearch(char const*,
>>>>>>>>> ibis::bitvector&) const (category.cpp:390)
>>>>>>>>> ==5672== by 0x5276C41: ibis::query::doEvaluate(ibis::qExpr const*,
>>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3948)
>>>>>>>>> ==5672== by 0x52770E7: ibis::query::doEvaluate(ibis::qExpr const*,
>>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3779)
>>>>>>>>> ==5672== by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*,
>>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>>>>>>> ==5672== by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*,
>>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>>>>>>> ==5672== by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*,
>>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>>>>>>> ==5672== by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*,
>>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>>>>>>> ==5672== by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*,
>>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>>>>>>> ==5672== Address 0x9416f20 is 0 bytes after a block of size 354,720
>>>>>>>>> alloc'd
>>>>>>>>> ==5672== at 0x4C28C6D: malloc (in
>>>>>>>>> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>>>>>>>>> ==5672== by 0x5445F24:
>>>>>>>>> ibis::fileManager::storage::storage(unsigned long)
>>>>>>>>> (fileManager.cpp:1718)
>>>>>>>>> ==5672== by 0x54466D9:
>>>>>>>>> ibis::fileManager::storage::enlarge(unsigned long)
>>>>>>>>> (fileManager.cpp:1977)
>>>>>>>>> ==5672== by 0x51A73B1: ibis::array_t<unsigned
>>>>>>>>> int>::resize(unsigned long) (array_t.cpp:1412)
>>>>>>>>> ==5672== by 0x54119FA: ibis::bitvector::decompress()
>>>>>>>>> (bitvector.cpp:364)
>>>>>>>>> ==5672== by 0x52DA45B: ibis::index::sumBins(unsigned int, unsigned
>>>>>>>>> int, ibis::bitvector&) const (index.cpp:6180)
>>>>>>>>> ==5672== by 0x55D097D:
>>>>>>>>> ibis::direkte::evaluate(ibis::qContinuousRange const&,
>>>>>>>>> ibis::bitvector&) const (idirekte.cpp:1071)
>>>>>>>>> ==5672== by 0x5517FB3: ibis::category::stringSearch(char const*,
>>>>>>>>> ibis::bitvector&) const (category.cpp:390)
>>>>>>>>> ==5672== by 0x5276C41: ibis::query::doEvaluate(ibis::qExpr const*,
>>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3948)
>>>>>>>>> ==5672== by 0x52770E7: ibis::query::doEvaluate(ibis::qExpr const*,
>>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3779)
>>>>>>>>> ==5672== by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*,
>>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>>>>>>> ==5672== by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*,
>>>>>>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>>>>>>>
>>>>>>>>> I'm not able to reproduce on a simplistic use case. Not sure exactly
>>>>>>>>> what triggers this. Here again, this is not dramatic since i just
>>>>>>>>> have to regenerate my indexes but i'm wondering if there were a way
>>>>>>>>> to catch this (i'm thinking about people upgrading from a version
>>>>>>>>> prior to 1.2.9).
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: K. John Wu [mailto:[email protected]]
>>>>>>>>> Sent: Monday, March 12, 2012 1:02 PM
>>>>>>>>> To: Dominique Prunier
>>>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added
>>>>>>>>> to match LIKE patterns case-sensitively and perform specific
>>>>>>>>> optimizations
>>>>>>>>>
>>>>>>>>> Hi, Dominique,
>>>>>>>>>
>>>>>>>>> Let me just confirm that the two lines where the change from int32_t
>>>>>>>>> to uint32_t should be reversed are line 458 and 459 of dictionary.cpp,
>>>>>>>>> right?
>>>>>>>>>
>>>>>>>>> John
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 3/12/12 9:52 AM, Dominique Prunier wrote:
>>>>>>>>>> Hey John,
>>>>>>>>>>
>>>>>>>>>> The problem is that it doesn't actually fail when reading the index.
>>>>>>>>>> The index is read, but during the evaluation I get segfaults, bogus
>>>>>>>>>> results or valgrind errors. Once I regenerated the indexes for my
>>>>>>>>>> category column, everything worked like a charm.
>>>>>>>>>>
>>>>>>>>>> It was also misleading because of the other issue (unsigned ints
>>>>>>>>>> that should have been signed ints) that segfaulted too.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: K. John Wu [mailto:[email protected]]
>>>>>>>>>> Sent: Monday, March 12, 2012 12:43 PM
>>>>>>>>>> To: Dominique Prunier
>>>>>>>>>> Cc: FastBit Users
>>>>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define
>>>>>>>>>> added to match LIKE patterns case-sensitively and perform specific
>>>>>>>>>> optimizations
>>>>>>>>>>
>>>>>>>>>> Hi, Dominique,
>>>>>>>>>>
>>>>>>>>>> I thought that I have checked index types. If you happen to know the
>>>>>>>>>> stack trace for the reading operation, let me know. Otherwise, it
>>>>>>>>>> might take me a while to figure out a good way to reproduce the
>>>>>>>>>> problem..
>>>>>>>>>>
>>>>>>>>>> John
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 3/12/12 9:30 AM, Dominique Prunier wrote:
>>>>>>>>>>> Ok, figured out the other segfault. The indexes have to be
>>>>>>>>>>> regenerated after the change from relic to direkte. My guess is that
>>>>>>>>>>> it was reading something invalid. Is there a missing check in the
>>>>>>>>>>> index read method?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: [email protected]
>>>>>>>>>>> [mailto:[email protected]] On Behalf Of
>>>>>>>>>>> Dominique Prunier
>>>>>>>>>>> Sent: Monday, March 12, 2012 11:45 AM
>>>>>>>>>>> To: K. John Wu
>>>>>>>>>>> Cc: FastBit Users
>>>>>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define
>>>>>>>>>>> added to match LIKE patterns case-sensitively and perform specific
>>>>>>>>>>> optimizations
>>>>>>>>>>>
>>>>>>>>>>> Hey John,
>>>>>>>>>>>
>>>>>>>>>>> The fix, as of revision 484, breaks the binary search of the
>>>>>>>>>>> pattern prefix:
>>>>>>>>>>> - int32_t b = 0;
>>>>>>>>>>> - int32_t e = key_.size() - 1;
>>>>>>>>>>> + uint32_t b = 0;
>>>>>>>>>>> + uint32_t e = key_.size() - 1;
>>>>>>>>>>>
>>>>>>>>>>> Since the loop can stop with one of the indexes at -1, this
>>>>>>>>>>> now fails with a segfault.
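
The failure mode can be sketched in isolation. The function below is a
hypothetical inclusive-bounds binary search, not the code from dictionary.cpp;
it only shows why the bounds need to be signed when the loop is allowed to
stop with an index of -1:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Inclusive-bounds binary search over sorted keys. Returns the position of
// `target`, or -1 if absent. The bounds are signed on purpose: when the
// target sorts before keys[0], `e = m - 1` yields -1 and the loop ends.
int32_t findKey(const std::vector<std::string>& keys,
                const std::string& target) {
    assert(keys.size() <= static_cast<std::size_t>(INT32_MAX));
    int32_t b = 0;
    int32_t e = static_cast<int32_t>(keys.size()) - 1;  // -1 for empty input
    while (b <= e) {
        const int32_t m = b + (e - b) / 2;
        const int cmp = target.compare(keys[m]);
        if (cmp == 0) return m;
        if (cmp < 0) e = m - 1;  // may become -1, which ends the loop
        else         b = m + 1;
    }
    return -1;
}

// What the same step does with an unsigned bound: 0 - 1 wraps around to the
// largest uint32_t, so `b <= e` stays true and the next read is out of bounds.
uint32_t unsignedStepBelowZero() {
    uint32_t e = 0;
    e = e - 1;
    return e;
}
```

An alternative that keeps unsigned indexes is a half-open range [b, e), where
the upper bound is set to m rather than m - 1 and so never goes below zero.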
>>>>>>>>>>>
>>>>>>>>>>> I'm troubleshooting another segfault in the bitvector right now
>>>>>>>>>>> (could it be related to the change in r 479 ?)
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: K. John Wu [mailto:[email protected]]
>>>>>>>>>>> Sent: Saturday, March 10, 2012 2:24 PM
>>>>>>>>>>> To: Dominique Prunier
>>>>>>>>>>> Cc: FastBit Users
>>>>>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define
>>>>>>>>>>> added to match LIKE patterns case-sensitively and perform specific
>>>>>>>>>>> optimizations
>>>>>>>>>>>
>>>>>>>>>>> Just checked in the modification to allow users to define
>>>>>>>>>>> FASTBIT_CS_PATTERN_MATCH to 0 to disable case sensitive matches.
>>>>>>>>>>> The new SVN revision is 484.
>>>>>>>>>>>
>>>>>>>>>>> Also looked through other macros to make sure they are used
>>>>>>>>>>> consistently.
>>>>>>>>>>>
>>>>>>>>>>> John
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 3/10/12 9:20 AM, Dominique Prunier wrote:
>>>>>>>>>>>> Hey John,
>>>>>>>>>>>>
>>>>>>>>>>>> I just noticed a small typo in util.h: the macro is called
>>>>>>>>>>>> FASTBOT_... I don't think it was intended, but it has the nice side
>>>>>>>>>>>> effect of disabling the new code by default, thus preserving the
>>>>>>>>>>>> current behavior (case insensitive). Should we actually keep it in
>>>>>>>>>>>> util.h now that it is documented in INSTALL?
>>>>>>>>>>>>
>>>>>>>>>>>> https://codeforge.lbl.gov/plugins/scmsvn/viewcvs.php/trunk/src/util.h?root=fastbit&r1=483&r2=482&pathrev=483
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>> From: K. John Wu [[email protected]]
>>>>>>>>>>>> Sent: March-09-12 10:47 PM
>>>>>>>>>>>> To: Dominique Prunier
>>>>>>>>>>>> Cc: FastBit Users
>>>>>>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define
>>>>>>>>>>>> added to match LIKE patterns case-sensitively and perform specific
>>>>>>>>>>>> optimizations
>>>>>>>>>>>>
>>>>>>>>>>>> Hi, Dominique,
>>>>>>>>>>>>
>>>>>>>>>>>> I would like to add the FASTBIT_ prefix to the macro CS_PATTERN_MATCH
>>>>>>>>>>>> to avoid possible collisions when FastBit is used with other packages.
>>>>>>>>>>>> Hope you don't mind.
>>>>>>>>>>>>
>>>>>>>>>>>> John
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 3/9/12 4:03 PM, K. John Wu wrote:
>>>>>>>>>>>>> Hi, Dominique,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have run through my usual set of tests and did not find any
>>>>>>>>>>>>> problem with your patch. It is now in SVN 482. Please give it a try
>>>>>>>>>>>>> when you get the chance.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>
>>>>>>>>>>>>> John
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 3/9/12 10:17 AM, Dominique Prunier wrote:
>>>>>>>>>>>>>> Quick update to my patch:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> · Changed dictionary::patternMatch to make it work with CI too
>>>>>>>>>>>>>>   (and I think, for efficiency reasons, I have to keep all this here)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> · Moved the STR_MATCH_* constants from util.cpp to util.h and
>>>>>>>>>>>>>>   use them in dictionary::patternMatch
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> · Removed the CS/CI ifdef from category.cpp
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I did more testing, and on my set of ~90 000 test queries, the
>>>>>>>>>>>>>> execution time dropped from ~515 seconds to ~20 seconds.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> From: [email protected]
>>>>>>>>>>>>>> [mailto:[email protected]] On Behalf Of Dominique Prunier
>>>>>>>>>>>>>> Sent: Thursday, March 08, 2012 2:39 PM
>>>>>>>>>>>>>> To: FastBit Users
>>>>>>>>>>>>>> Subject: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>>>>>>>>>>>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here is the first version of my patch to switch SQL LIKE from case
>>>>>>>>>>>>>> insensitive to case sensitive and optimize this use case with
>>>>>>>>>>>>>> CATEGORY columns.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In a nutshell, what changed is:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> · We extract the longest constant prefix from the pattern
>>>>>>>>>>>>>>   (handling the escape char too)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> · Instead of testing every value in the dictionary, we binary
>>>>>>>>>>>>>>   search the range of values to test (which sometimes even allows
>>>>>>>>>>>>>>   us to skip pattern matching if no valid range can be found)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> · We test every value in the range
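
The three steps above can be sketched as follows. This is an illustration
under stated assumptions, not the patch itself: it assumes '%' and '_' as the
wildcards, a backslash as the escape character, and a byte-wise sorted
std::vector<std::string> standing in for the dictionary; constantPrefix and
prefixRange are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Longest literal prefix of a LIKE pattern. Assumes '%' and '_' are the
// wildcards and '\\' escapes the next character (assumptions of this sketch).
std::string constantPrefix(const std::string& pat) {
    std::string prefix;
    for (std::size_t i = 0; i < pat.size(); ++i) {
        if (pat[i] == '%' || pat[i] == '_') break;
        if (pat[i] == '\\') {
            if (i + 1 >= pat.size()) break;  // dangling escape: stop here
            ++i;                             // take the escaped char literally
        }
        prefix += pat[i];
    }
    return prefix;
}

// Half-open range [first, last) of sorted keys that start with `prefix`.
std::pair<std::size_t, std::size_t>
prefixRange(const std::vector<std::string>& keys, const std::string& prefix) {
    assert(std::is_sorted(keys.begin(), keys.end()));
    // First key that is not less than the prefix (binary search).
    auto lo = std::lower_bound(keys.begin(), keys.end(), prefix);
    // Keys sharing the prefix are contiguous; find the first one that does not.
    auto hi = std::partition_point(lo, keys.end(), [&](const std::string& k) {
        return k.compare(0, prefix.size(), prefix) == 0;
    });
    return {static_cast<std::size_t>(lo - keys.begin()),
            static_cast<std::size_t>(hi - keys.begin())};
}
```

Only the keys in the returned range then need the full pattern match; an
empty range means the pattern cannot match any dictionary entry.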
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On a large dictionary (~130k entries), I've commonly seen it be one
>>>>>>>>>>>>>> or two orders of magnitude faster (in my example, a simple query
>>>>>>>>>>>>>> with a single LIKE predicate drops from ~10ms to ~0.4ms).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What I'd like to change/refactor (I'm really a newbie in C++):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> · Remove the prefix extraction and pattern matching code from
>>>>>>>>>>>>>>   dictionary and replace the added method patternSearch with
>>>>>>>>>>>>>>   something like findRange. I believe that matching and pattern
>>>>>>>>>>>>>>   handling code doesn't belong in the dictionary. I'd rather move
>>>>>>>>>>>>>>   this back to the category class or something.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> · Having to use a C++ string object to rebuild the longest
>>>>>>>>>>>>>>   constant prefix bugs me (suggestions?). I'm also thinking of
>>>>>>>>>>>>>>   having a version that doesn't support escaping, but it would
>>>>>>>>>>>>>>   force me to change strMatch a bit more
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> · To closely match the previous behavior, you can't match an
>>>>>>>>>>>>>>   empty pattern (even the empty string doesn't match); maybe that
>>>>>>>>>>>>>>   would be worth changing
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As always John, feel free to include this into the main branch.
>>>>>>>>>>>>>> I'm waiting for suggestions to make it more efficient, cleaner, ...
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Dominique Prunier
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> APG Lead Developer
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 4388, rue Saint-Denis
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Bureau 309
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Montreal (Quebec) H2J 2L1
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Tel. +1 514-842-6767 x310
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Fax +1 514-842-3989
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>> <mailto:[email protected]>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> www.watch4net.com <http://www.watch4net.com/>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> FastBit-users mailing list
>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users