Hey John,

It seems to work just fine now; it seamlessly recreated the indexes on my old 
partition.
However, I'm getting a segfault at the end of the first execution (the one that 
converted the index).
I'll investigate this and tell you what I find.

Thanks,

-----Original Message-----
From: K. John Wu [mailto:[email protected]]
Sent: Monday, March 12, 2012 6:39 PM
To: Dominique Prunier
Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to match 
LIKE patterns case-sensitively and perform specific optimizations

Just added a check of the index type to the function that caused
your problem (a read function that works directly with
ibis::fileManager::storage).  It should now automatically override all
the relics with direktes.  I am testing the code now.  The source code
is SVN 487.
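
A minimal sketch of the kind of check described above, not FastBit's actual code: before interpreting stored bytes as a given index type, compare a type tag against the expected one and request a rebuild on mismatch. The names IndexType, LoadResult and loadIfTypeMatches, the tag values, and the one-byte tag layout are all hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical type tags; the real on-disk values are not given in this thread.
enum class IndexType : std::uint8_t { Relic = 1, Direkte = 2 };

enum class LoadResult { Loaded, NeedsRebuild };

// Assumes (for illustration only) that the first stored byte is the type tag.
// A mismatch is reported instead of reading the bytes with the wrong layout.
LoadResult loadIfTypeMatches(const std::vector<std::uint8_t>& bytes,
                             IndexType expected) {
    if (bytes.empty() ||
        bytes[0] != static_cast<std::uint8_t>(expected)) {
        return LoadResult::NeedsRebuild;  // caller regenerates the index
    }
    // ... the remaining bytes would be parsed here ...
    return LoadResult::Loaded;
}
```

With a guard of this shape, an old relic file handed to a direkte reader would trigger regeneration rather than a bad read during query evaluation.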

John


On 3/12/12 1:49 PM, Dominique Prunier wrote:
> No problem. Do we want to do something about the migration from relic to 
> direkte ?
>
> -----Original Message-----
> From: K. John Wu [mailto:[email protected]]
> Sent: Monday, March 12, 2012 2:21 PM
> To: Dominique Prunier
> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to 
> match LIKE patterns case-sensitively and perform specific optimizations
>
> Thanks for the confirmation.
>
> John
>
>
> On 3/12/12 11:07 AM, Dominique Prunier wrote:
>> It seems to work just fine for me in r486.
>>
>> -----Original Message-----
>> From: K. John Wu [mailto:[email protected]]
>> Sent: Monday, March 12, 2012 1:51 PM
>> To: Dominique Prunier
>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to 
>> match LIKE patterns case-sensitively and perform specific optimizations
>>
>> If you have verified that the answers are the same as before, then we don't
>> have an off-by-1 problem.  At this point, I have not done that.  Let me
>> know if you have.
>>
>> John
>>
>>
>> On 3/12/12 10:48 AM, Dominique Prunier wrote:
>>> Hmm, the original patch was working fairly well. I tried a couple of limits 
>>> (range empty, start and/or end at the first value and/or last value). I didn't 
>>> notice any other change. Are you talking about this or the segfault in 
>>> direkte ?
>>>
>>> -----Original Message-----
>>> From: K. John Wu [mailto:[email protected]]
>>> Sent: Monday, March 12, 2012 1:41 PM
>>> To: Dominique Prunier
>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to 
>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>
>>> Just hid release 1.2.9, there might be an off-by-1 problem as well.
>>> Need to dig deeper..
>>>
>>> John
>>>
>>>
>>> On 3/12/12 10:13 AM, Dominique Prunier wrote:
>>>> Yep, but I bet you changed them for a reason, maybe a compile warning or 
>>>> something (they were int32_t in my patch). array_t indexes are definitely 
>>>> uint32_t. Reverting them to int32_t works, but it may be worth 
>>>> thinking about.
>>>>
>>>> About the segfault, i captured an example of valgrind error (which 
>>>> ultimately leads to a segfault). As you can see, it is during query 
>>>> evaluation, not when reading the index.
>>>>
>>>> ==5672== Invalid read of size 4
>>>> ==5672==    at 0x5414F60: ibis::bitvector::or_d1(ibis::bitvector const&) 
>>>> (bitvector.cpp:2934)
>>>> ==5672==    by 0x541C622: ibis::bitvector::operator|=(ibis::bitvector 
>>>> const&) (bitvector.cpp:1272)
>>>> ==5672==    by 0x52DA4A4: ibis::index::sumBins(unsigned int, unsigned int, 
>>>> ibis::bitvector&) const (index.cpp:6183)
>>>> ==5672==    by 0x55D097D: ibis::direkte::evaluate(ibis::qContinuousRange 
>>>> const&, ibis::bitvector&) const (idirekte.cpp:1071)
>>>> ==5672==    by 0x5517FB3: ibis::category::stringSearch(char const*, 
>>>> ibis::bitvector&) const (category.cpp:390)
>>>> ==5672==    by 0x5276C41: ibis::query::doEvaluate(ibis::qExpr const*, 
>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3948)
>>>> ==5672==    by 0x52770E7: ibis::query::doEvaluate(ibis::qExpr const*, 
>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3779)
>>>> ==5672==    by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, 
>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>> ==5672==    by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, 
>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>> ==5672==    by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, 
>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>> ==5672==    by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, 
>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>> ==5672==    by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, 
>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>> ==5672==  Address 0x9416f20 is 0 bytes after a block of size 354,720 
>>>> alloc'd
>>>> ==5672==    at 0x4C28C6D: malloc (in 
>>>> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>>>> ==5672==    by 0x5445F24: ibis::fileManager::storage::storage(unsigned 
>>>> long) (fileManager.cpp:1718)
>>>> ==5672==    by 0x54466D9: ibis::fileManager::storage::enlarge(unsigned 
>>>> long) (fileManager.cpp:1977)
>>>> ==5672==    by 0x51A73B1: ibis::array_t<unsigned int>::resize(unsigned 
>>>> long) (array_t.cpp:1412)
>>>> ==5672==    by 0x54119FA: ibis::bitvector::decompress() (bitvector.cpp:364)
>>>> ==5672==    by 0x52DA45B: ibis::index::sumBins(unsigned int, unsigned int, 
>>>> ibis::bitvector&) const (index.cpp:6180)
>>>> ==5672==    by 0x55D097D: ibis::direkte::evaluate(ibis::qContinuousRange 
>>>> const&, ibis::bitvector&) const (idirekte.cpp:1071)
>>>> ==5672==    by 0x5517FB3: ibis::category::stringSearch(char const*, 
>>>> ibis::bitvector&) const (category.cpp:390)
>>>> ==5672==    by 0x5276C41: ibis::query::doEvaluate(ibis::qExpr const*, 
>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3948)
>>>> ==5672==    by 0x52770E7: ibis::query::doEvaluate(ibis::qExpr const*, 
>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3779)
>>>> ==5672==    by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, 
>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>> ==5672==    by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, 
>>>> ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>>
>>>> I'm not able to reproduce it on a simple use case, and I'm not sure exactly 
>>>> what triggers this. Again, this is not dramatic since I just have to 
>>>> regenerate my indexes, but I'm wondering whether there is a way to catch 
>>>> this (I'm thinking about people upgrading from a version prior to 1.2.9).
>>>>
>>>> Thanks,
>>>>
>>>> -----Original Message-----
>>>> From: K. John Wu [mailto:[email protected]]
>>>> Sent: Monday, March 12, 2012 1:02 PM
>>>> To: Dominique Prunier
>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to 
>>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>>
>>>> Hi, Dominique,
>>>>
>>>> Let me just confirm that the two lines where the change from int32_t
>>>> to uint32_t should be reversed are lines 458 and 459 of dictionary.cpp,
>>>> right?
>>>>
>>>> John
>>>>
>>>>
>>>> On 3/12/12 9:52 AM, Dominique Prunier wrote:
>>>>> Hey John,
>>>>>
>>>>> The problem is that it doesn't actually fail when reading the index. The 
>>>>> index is read, but during evaluation I get segfaults, bogus results, 
>>>>> or valgrind errors. Once I regenerated the indexes for my category 
>>>>> column, everything worked like a charm.
>>>>>
>>>>> It was also misleading because of the other issue (unsigned ints that 
>>>>> should have been signed ints) that segfaulted too.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -----Original Message-----
>>>>> From: K. John Wu [mailto:[email protected]]
>>>>> Sent: Monday, March 12, 2012 12:43 PM
>>>>> To: Dominique Prunier
>>>>> Cc: FastBit Users
>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to 
>>>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>>>
>>>>> Hi, Dominique,
>>>>>
>>>>> I thought that I had checked index types.  If you happen to know the
>>>>> stack trace for the reading operation, let me know.  Otherwise, it
>>>>> might take me a while to figure out a good way to reproduce the problem.
>>>>>
>>>>> John
>>>>>
>>>>>
>>>>> On 3/12/12 9:30 AM, Dominique Prunier wrote:
>>>>>> Ok, figured out the other segfault. The indexes have to be regenerated 
>>>>>> after the change from relic to direkte. My guess is that it was reading 
>>>>>> something invalid. Is there a missing check in the index read method ?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: [email protected] 
>>>>>> [mailto:[email protected]] On Behalf Of Dominique 
>>>>>> Prunier
>>>>>> Sent: Monday, March 12, 2012 11:45 AM
>>>>>> To: K. John Wu
>>>>>> Cc: FastBit Users
>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to 
>>>>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>>>>
>>>>>> Hey John,
>>>>>>
>>>>>> The fix, as checked in at revision 484, breaks the binary search of 
>>>>>> the pattern prefix:
>>>>>> -       int32_t b = 0;
>>>>>> -       int32_t e = key_.size() - 1;
>>>>>> +       uint32_t b = 0;
>>>>>> +       uint32_t e = key_.size() - 1;
>>>>>>
>>>>>> Since the loop's stop condition can be that one of the indexes reaches 
>>>>>> -1, this now fails with a segfault.
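
A minimal sketch of why the signed type matters here, assuming a conventional closed-interval binary search (the actual loop in dictionary.cpp is not shown in this thread; lowerBoundSigned and decrementUnsigned are illustrative names). With int32_t, the upper index can legitimately become -1 and end the loop. With uint32_t, the same decrement wraps around to the largest 32-bit value, so the loop keeps going and reads out of bounds.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Index of the first key >= target in a sorted vector, or keys.size() if none.
// `e` may reach -1 (for example when target sorts before every key), which is
// exactly what terminates the loop.
std::int32_t lowerBoundSigned(const std::vector<std::string>& keys,
                              const std::string& target) {
    std::int32_t b = 0;
    std::int32_t e = static_cast<std::int32_t>(keys.size()) - 1;
    while (b <= e) {
        const std::int32_t m = b + (e - b) / 2;
        if (keys[static_cast<std::size_t>(m)] < target) {
            b = m + 1;
        } else {
            e = m - 1;  // becomes -1 when m == 0
        }
    }
    return b;
}

// The same decrement on an unsigned index wraps instead of going negative,
// so a `b <= e` test stays true and the search runs past the array.
std::uint32_t decrementUnsigned(std::uint32_t e) {
    return e - 1;
}
```

An unsigned variant is possible, but it typically needs a half-open interval [b, e) so that no index ever has to represent -1.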
>>>>>>
>>>>>> I'm troubleshooting another segfault in the bitvector right now (could 
>>>>>> it be related to the change in r 479 ?)
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: K. John Wu [mailto:[email protected]]
>>>>>> Sent: Saturday, March 10, 2012 2:24 PM
>>>>>> To: Dominique Prunier
>>>>>> Cc: FastBit Users
>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to 
>>>>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>>>>
>>>>>> Just checked in the modification to allow users to define
>>>>>> FASTBIT_CS_PATTERN_MATCH to 0 to disable case sensitive matches.  The
>>>>>> new SVN revision is 484.
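
A minimal sketch of how such a compile-time switch can work, assuming the macro defaults to 1 when the user does not define it (the actual default and its placement in util.h are not confirmed here; equalForLike is a hypothetical helper, not a FastBit function).

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Assumed default for this sketch: case-sensitive unless the build
// passes -DFASTBIT_CS_PATTERN_MATCH=0.
#ifndef FASTBIT_CS_PATTERN_MATCH
#define FASTBIT_CS_PATTERN_MATCH 1
#endif

// Compares two strings according to the compile-time setting.
bool equalForLike(const std::string& a, const std::string& b) {
#if FASTBIT_CS_PATTERN_MATCH
    return a == b;
#else
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i]))) {
            return false;
        }
    }
    return true;
#endif
}
```

Because an undefined identifier evaluates to 0 inside `#if`, a misspelled macro name silently selects the `#else` branch, which is how a typo in the name can disable the new code without any compile error.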
>>>>>>
>>>>>> Also looked through other macros to make sure they are used consistently.
>>>>>>
>>>>>> John
>>>>>>
>>>>>>
>>>>>> On 3/10/12 9:20 AM, Dominique Prunier wrote:
>>>>>>> Hey John,
>>>>>>>
>>>>>>> I just noticed a small typo in util.h: the macro is called FASTBOT_... 
>>>>>>> I don't think it was intended, but it has the nice side effect of 
>>>>>>> disabling the new code by default, thus preserving the current 
>>>>>>> behavior (case insensitive). Should we actually keep it in util.h now 
>>>>>>> that it is documented in INSTALL ?
>>>>>>>
>>>>>>> https://codeforge.lbl.gov/plugins/scmsvn/viewcvs.php/trunk/src/util.h?root=fastbit&r1=483&r2=482&pathrev=483
>>>>>>>
>>>>>>> Thanks,
>>>>>>> ________________________________________
>>>>>>> From: K. John Wu [[email protected]]
>>>>>>> Sent: March-09-12 10:47 PM
>>>>>>> To: Dominique Prunier
>>>>>>> Cc: FastBit Users
>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added 
>>>>>>> to match LIKE patterns case-sensitively and perform specific 
>>>>>>> optimizations
>>>>>>>
>>>>>>> Hi, Dominique,
>>>>>>>
>>>>>>> I would like to add FASTBIT_ prefix to the macro CS_PATTERN_MATCH to
>>>>>>> avoid possible collisions when FastBit is used with other packages.
>>>>>>> Hope you don't mind.
>>>>>>>
>>>>>>> John
>>>>>>>
>>>>>>>
>>>>>>> On 3/9/12 4:03 PM, K. John Wu wrote:
>>>>>>>> Hi, Dominique,
>>>>>>>>
>>>>>>>> I have run through my usual set of tests and did not find any problem
>>>>>>>> with your patch.  It is now in SVN 482.  Please give it a try when you
>>>>>>>> get the chance.
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> John
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 3/9/12 10:17 AM, Dominique Prunier wrote:
>>>>>>>>> Quick update to my patch:
>>>>>>>>>
>>>>>>>>> ·         Changed dictionary::patternMatch to make it work with CI too
>>>>>>>>> (and i think for efficiency reasons, i have to keep all this here)
>>>>>>>>>
>>>>>>>>> ·         Moved the STR_MATCH_* constants from util.cpp to util.h and
>>>>>>>>> use them in dictionary::patternMatch
>>>>>>>>>
>>>>>>>>> ·         Removed the CS/CI ifdef from category.cpp
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I did more testing, and on my set of ~90 000 test queries, the
>>>>>>>>> execution time dropped from ~515 seconds to ~20 seconds.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> From: [email protected]
>>>>>>>>> [mailto:[email protected]] On Behalf Of Dominique
>>>>>>>>> Prunier
>>>>>>>>> Sent: Thursday, March 08, 2012 2:39 PM
>>>>>>>>> To: FastBit Users
>>>>>>>>> Subject: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>>>>>>>> match LIKE patterns case-sensitively and perform specific 
>>>>>>>>> optimizations
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Here is the first version of my patch to switch SQL LIKE from case
>>>>>>>>> insensitive to case sensitive and optimize this use case for CATEGORY
>>>>>>>>> columns.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> In a nutshell, what changed is:
>>>>>>>>>
>>>>>>>>> ·         We extract the longest (handling the escape char too)
>>>>>>>>> constant prefix from the pattern
>>>>>>>>>
>>>>>>>>> ·         Instead of testing every value in the dictionary, we binary
>>>>>>>>> search the range of values to search (which sometimes even allows us to
>>>>>>>>> skip pattern matching if no valid range can be found)
>>>>>>>>>
>>>>>>>>> ·         We test every value in the range
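
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the patch: it assumes '%' and '_' are the wildcards, a backslash is the escape character, and the dictionary is a sorted vector of strings. constantPrefix and prefixRange are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Longest literal prefix of a LIKE pattern. Stops at the first unescaped
// wildcard; an escaped character is copied as a literal.
std::string constantPrefix(const std::string& pattern) {
    std::string prefix;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        char c = pattern[i];
        if (c == '%' || c == '_') break;
        if (c == '\\' && i + 1 < pattern.size()) c = pattern[++i];
        prefix += c;
    }
    return prefix;
}

// Half-open range [first, last) of entries in the sorted `keys` that start
// with `prefix`. An empty range means no entry can match the pattern.
std::pair<std::size_t, std::size_t> prefixRange(
    const std::vector<std::string>& keys, const std::string& prefix) {
    auto first = std::lower_bound(keys.begin(), keys.end(), prefix);
    // In sorted order, entries sharing the prefix are contiguous from `first`.
    auto last = std::partition_point(
        first, keys.end(), [&prefix](const std::string& k) {
            return k.compare(0, prefix.size(), prefix) == 0;
        });
    return std::make_pair(static_cast<std::size_t>(first - keys.begin()),
                          static_cast<std::size_t>(last - keys.begin()));
}
```

Only the entries inside the returned range would then be run through the full pattern matcher. When the range is empty, matching can be skipped entirely.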
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On a large dictionary (~130k entries), I’ve commonly seen it be one or
>>>>>>>>> two orders of magnitude faster (in my example, a simple query with a
>>>>>>>>> single LIKE predicate drops from ~10ms to ~0.4ms).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> What i’d like to change/refactor (i’m really a newbie in c++):
>>>>>>>>>
>>>>>>>>> ·         Remove the prefix extraction and pattern matching code from
>>>>>>>>> dictionary and replace the added method patternSearch by something
>>>>>>>>> like findRange. I believe that matching and pattern handling code
>>>>>>>>> doesn’t belong to the dictionary. I’d rather move this back to the
>>>>>>>>> category class or something.
>>>>>>>>>
>>>>>>>>> ·         Having to use a c++ string object to rebuild the longest
>>>>>>>>> constant prefix bugs me (suggestions ?). I’m also thinking of having a
>>>>>>>>> version that doesn’t support escaping, but it would force me to change
>>>>>>>>> strMatch a bit more
>>>>>>>>>
>>>>>>>>> ·         To closely match the previous behavior, you can’t match an
>>>>>>>>> empty pattern (even the empty string doesn’t match), maybe that would
>>>>>>>>> be worth changing
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> As always John, feel free to include this into the main branch. I’m
>>>>>>>>> waiting for suggestions to make it more efficient, cleaner, ...
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Dominique Prunier
>>>>>>>>>
>>>>>>>>>  APG Lead Developper
>>>>>>>>>
>>>>>>>>>  4388, rue Saint-Denis
>>>>>>>>>
>>>>>>>>>  Bureau 309
>>>>>>>>>
>>>>>>>>>  Montreal (Quebec)  H2J 2L1
>>>>>>>>>
>>>>>>>>>  Tel. +1 514-842-6767  x310
>>>>>>>>>
>>>>>>>>>  Fax +1 514-842-3989
>>>>>>>>>
>>>>>>>>>  [email protected] 
>>>>>>>>> <mailto:[email protected]>
>>>>>>>>>
>>>>>>>>>  www.watch4net.com <http://www.watch4net.com/>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> FastBit-users mailing list
>>>>>>>>> [email protected]
>>>>>>>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
