Hey John,

It seems to work just fine now; it seamlessly recreated the indexes on my old partition. However, I'm getting a segfault at the end of the first execution (the one that converted the index). I'll investigate this and tell you what I find.

Thanks,

-----Original Message-----
From: K. John Wu [mailto:[email protected]]
Sent: Monday, March 12, 2012 6:39 PM
To: Dominique Prunier
Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to match LIKE patterns case-sensitively and perform specific optimizations

Just added index type checking to the function that caused your problem (a read function that works directly with ibis::fileManager::storage). It should now automatically override all the relics with direktes. I am testing the code now. The source code is SVN 487.

John

On 3/12/12 1:49 PM, Dominique Prunier wrote:
> No problem. Do we want to do something about the migration from relic to
> direkte?
>
> -----Original Message-----
> From: K. John Wu [mailto:[email protected]]
> Sent: Monday, March 12, 2012 2:21 PM
> To: Dominique Prunier
> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
> match LIKE patterns case-sensitively and perform specific optimizations
>
> Thanks for the confirmation.
>
> John
>
>
> On 3/12/12 11:07 AM, Dominique Prunier wrote:
>> It seems to work just fine for me in r486.
>>
>> -----Original Message-----
>> From: K. John Wu [mailto:[email protected]]
>> Sent: Monday, March 12, 2012 1:51 PM
>> To: Dominique Prunier
>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>> match LIKE patterns case-sensitively and perform specific optimizations
>>
>> If you have verified the answers are the same as before, then we don't
>> have an off-by-1 problem. At this point, I have not done that. Let me
>> know if you have.
>>
>> John
>>
>>
>> On 3/12/12 10:48 AM, Dominique Prunier wrote:
>>> Hmm, the original patch was working fairly well. I tried a couple of limits
>>> (range empty, start and/or end at first value and/or last value). I didn't
>>> notice any other change. Are you talking about this or the segfault in
>>> direkte?
>>>
>>> -----Original Message-----
>>> From: K. John Wu [mailto:[email protected]]
>>> Sent: Monday, March 12, 2012 1:41 PM
>>> To: Dominique Prunier
>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>
>>> Just hid release 1.2.9, there might be an off-by-1 problem as well.
>>> Need to dig deeper.
>>>
>>> John
>>>
>>>
>>> On 3/12/12 10:13 AM, Dominique Prunier wrote:
>>>> Yep, but I bet you changed them for a reason, maybe a compile warning or
>>>> something (they were int32_t in my patch). array_t indexes are definitely
>>>> uint32_t. Reverting them to int32_t works, but maybe it would be worth
>>>> thinking about.
>>>>
>>>> About the segfault, I captured an example of a valgrind error (which
>>>> ultimately leads to a segfault). As you can see, it is during query
>>>> evaluation, not when reading the index.
>>>>
>>>> ==5672== Invalid read of size 4
>>>> ==5672==    at 0x5414F60: ibis::bitvector::or_d1(ibis::bitvector const&) (bitvector.cpp:2934)
>>>> ==5672==    by 0x541C622: ibis::bitvector::operator|=(ibis::bitvector const&) (bitvector.cpp:1272)
>>>> ==5672==    by 0x52DA4A4: ibis::index::sumBins(unsigned int, unsigned int, ibis::bitvector&) const (index.cpp:6183)
>>>> ==5672==    by 0x55D097D: ibis::direkte::evaluate(ibis::qContinuousRange const&, ibis::bitvector&) const (idirekte.cpp:1071)
>>>> ==5672==    by 0x5517FB3: ibis::category::stringSearch(char const*, ibis::bitvector&) const (category.cpp:390)
>>>> ==5672==    by 0x5276C41: ibis::query::doEvaluate(ibis::qExpr const*, ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3948)
>>>> ==5672==    by 0x52770E7: ibis::query::doEvaluate(ibis::qExpr const*, ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3779)
>>>> ==5672==    by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>> ==5672==    by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>> ==5672==    by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>> ==5672==    by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>> ==5672==    by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>> ==5672==  Address 0x9416f20 is 0 bytes after a block of size 354,720 alloc'd
>>>> ==5672==    at 0x4C28C6D: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>>>> ==5672==    by 0x5445F24: ibis::fileManager::storage::storage(unsigned long) (fileManager.cpp:1718)
>>>> ==5672==    by 0x54466D9: ibis::fileManager::storage::enlarge(unsigned long) (fileManager.cpp:1977)
>>>> ==5672==    by 0x51A73B1: ibis::array_t<unsigned int>::resize(unsigned long) (array_t.cpp:1412)
>>>> ==5672==    by 0x54119FA: ibis::bitvector::decompress() (bitvector.cpp:364)
>>>> ==5672==    by 0x52DA45B: ibis::index::sumBins(unsigned int, unsigned int, ibis::bitvector&) const (index.cpp:6180)
>>>> ==5672==    by 0x55D097D: ibis::direkte::evaluate(ibis::qContinuousRange const&, ibis::bitvector&) const (idirekte.cpp:1071)
>>>> ==5672==    by 0x5517FB3: ibis::category::stringSearch(char const*, ibis::bitvector&) const (category.cpp:390)
>>>> ==5672==    by 0x5276C41: ibis::query::doEvaluate(ibis::qExpr const*, ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3948)
>>>> ==5672==    by 0x52770E7: ibis::query::doEvaluate(ibis::qExpr const*, ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3779)
>>>> ==5672==    by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>> ==5672==    by 0x52770B2: ibis::query::doEvaluate(ibis::qExpr const*, ibis::bitvector const&, ibis::bitvector&) const (query.cpp:3776)
>>>>
>>>> I'm not able to reproduce it on a simplistic use case. Not sure exactly what
>>>> triggers this. Here again, this is not dramatic since I just have to
>>>> regenerate my indexes, but I'm wondering if there is a way to catch this
>>>> (I'm thinking about people upgrading from a version prior to 1.2.9).
>>>>
>>>> Thanks,
>>>>
>>>> -----Original Message-----
>>>> From: K. John Wu [mailto:[email protected]]
>>>> Sent: Monday, March 12, 2012 1:02 PM
>>>> To: Dominique Prunier
>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>>
>>>> Hi, Dominique,
>>>>
>>>> Let me just confirm that the two lines where the change from int32_t
>>>> to uint32_t should be reversed are lines 458 and 459 of dictionary.cpp,
>>>> right?
>>>>
>>>> John
>>>>
>>>>
>>>> On 3/12/12 9:52 AM, Dominique Prunier wrote:
>>>>> Hey John,
>>>>>
>>>>> The problem is that it doesn't actually fail when reading the index. The
>>>>> index is read, but during the evaluation I get segfaults, bogus results
>>>>> or valgrind errors. Once I regenerated the indexes for my category
>>>>> column, everything worked like a charm.
>>>>>
>>>>> It was also misleading because of the other issue (unsigned ints that
>>>>> should have been signed ints) that segfaulted too.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -----Original Message-----
>>>>> From: K. John Wu [mailto:[email protected]]
>>>>> Sent: Monday, March 12, 2012 12:43 PM
>>>>> To: Dominique Prunier
>>>>> Cc: FastBit Users
>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>>>
>>>>> Hi, Dominique,
>>>>>
>>>>> I thought that I had checked index types. If you happen to know the
>>>>> stack trace for the reading operation, let me know. Otherwise, it
>>>>> might take me a while to figure out a good way to reproduce the problem.
>>>>>
>>>>> John
>>>>>
>>>>>
>>>>> On 3/12/12 9:30 AM, Dominique Prunier wrote:
>>>>>> OK, figured out the other segfault. The indexes have to be regenerated
>>>>>> after the change from relic to direkte. My guess is that it was reading
>>>>>> something invalid. Is there a missing check in the index read method?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: [email protected]
>>>>>> [mailto:[email protected]] On Behalf Of Dominique
>>>>>> Prunier
>>>>>> Sent: Monday, March 12, 2012 11:45 AM
>>>>>> To: K. John Wu
>>>>>> Cc: FastBit Users
>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>>>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>>>>
>>>>>> Hey John,
>>>>>>
>>>>>> The fix, as checked out in revision 484, breaks the binary search of
>>>>>> the pattern prefix:
>>>>>> - int32_t b = 0;
>>>>>> - int32_t e = key_.size() - 1;
>>>>>> + uint32_t b = 0;
>>>>>> + uint32_t e = key_.size() - 1;
>>>>>>
>>>>>> Since the stop condition of the loop can be that one of the indexes is -1,
>>>>>> this now fails with a segfault.
>>>>>>
>>>>>> I'm troubleshooting another segfault in the bitvector right now (could
>>>>>> it be related to the change in r479?)
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: K. John Wu [mailto:[email protected]]
>>>>>> Sent: Saturday, March 10, 2012 2:24 PM
>>>>>> To: Dominique Prunier
>>>>>> Cc: FastBit Users
>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>>>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>>>>
>>>>>> Just checked in the modification to allow users to define
>>>>>> FASTBIT_CS_PATTERN_MATCH to 0 to disable case-sensitive matches. The
>>>>>> new SVN revision is 484.
>>>>>>
>>>>>> Also looked through other macros to make sure they are used consistently.
>>>>>>
>>>>>> John
>>>>>>
>>>>>>
>>>>>> On 3/10/12 9:20 AM, Dominique Prunier wrote:
>>>>>>> Hey John,
>>>>>>>
>>>>>>> I just noticed a small typo in util.h: the macro is called FASTBOT_...
>>>>>>> I don't think it was intended, but it has the nice side effect of
>>>>>>> disabling the new code by default, thus preserving the current behavior
>>>>>>> (case insensitive). Should we actually keep it in util.h now that it is
>>>>>>> documented in INSTALL?
>>>>>>>
>>>>>>> https://codeforge.lbl.gov/plugins/scmsvn/viewcvs.php/trunk/src/util.h?root=fastbit&r1=483&r2=482&pathrev=483
>>>>>>>
>>>>>>> Thanks,
>>>>>>> ________________________________________
>>>>>>> From: K. John Wu [[email protected]]
>>>>>>> Sent: March-09-12 10:47 PM
>>>>>>> To: Dominique Prunier
>>>>>>> Cc: FastBit Users
>>>>>>> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added
>>>>>>> to match LIKE patterns case-sensitively and perform specific
>>>>>>> optimizations
>>>>>>>
>>>>>>> Hi, Dominique,
>>>>>>>
>>>>>>> I would like to add a FASTBIT_ prefix to the macro CS_PATTERN_MATCH to
>>>>>>> avoid possible collisions when FastBit is used with other packages.
>>>>>>> Hope you don't mind.
>>>>>>>
>>>>>>> John
>>>>>>>
>>>>>>>
>>>>>>> On 3/9/12 4:03 PM, K. John Wu wrote:
>>>>>>>> Hi, Dominique,
>>>>>>>>
>>>>>>>> I have run through my usual set of tests and did not find any problem
>>>>>>>> with your patch. It is now in SVN 482. Please give it a try when you
>>>>>>>> get the chance.
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> John
>>>>>>>>
>>>>>>>>
>>>>>>>> On 3/9/12 10:17 AM, Dominique Prunier wrote:
>>>>>>>>> Quick update to my patch:
>>>>>>>>>
>>>>>>>>> · Changed dictionary::patternMatch to make it work with CI too
>>>>>>>>> (and I think, for efficiency reasons, I have to keep all this here)
>>>>>>>>>
>>>>>>>>> · Moved the STR_MATCH_* constants from util.cpp to util.h and
>>>>>>>>> use them in dictionary::patternMatch
>>>>>>>>>
>>>>>>>>> · Removed the CS/CI ifdef from category.cpp
>>>>>>>>>
>>>>>>>>> I did more testing, and on my set of ~90 000 test queries, the
>>>>>>>>> execution time dropped from ~515 seconds to ~20 seconds.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> From: [email protected]
>>>>>>>>> [mailto:[email protected]] On Behalf Of Dominique
>>>>>>>>> Prunier
>>>>>>>>> Sent: Thursday, March 08, 2012 2:39 PM
>>>>>>>>> To: FastBit Users
>>>>>>>>> Subject: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>>>>>>>> match LIKE patterns case-sensitively and perform specific
>>>>>>>>> optimizations
>>>>>>>>>
>>>>>>>>> Here is the first version of my patch to switch SQL LIKE from case
>>>>>>>>> insensitive to case sensitive and optimize this use case with CATEGORY
>>>>>>>>> columns.
>>>>>>>>>
>>>>>>>>> In a nutshell, what changed is:
>>>>>>>>>
>>>>>>>>> · We extract the longest constant prefix from the pattern
>>>>>>>>> (handling the escape char too)
>>>>>>>>>
>>>>>>>>> · Instead of testing every value in the dictionary, we binary
>>>>>>>>> search the range of values to test (which sometimes even allows
>>>>>>>>> skipping pattern matching entirely if no valid range can be found)
>>>>>>>>>
>>>>>>>>> · We test every value in the range
>>>>>>>>>
>>>>>>>>> On a large dictionary (~130k entries), I’ve commonly seen it be one or
>>>>>>>>> two orders of magnitude faster (in my example, a simple query with a
>>>>>>>>> single LIKE predicate drops from ~10ms to ~0.4ms).
>>>>>>>>>
>>>>>>>>> What I’d like to change/refactor (I’m really a newbie in C++):
>>>>>>>>>
>>>>>>>>> · Remove the prefix extraction and pattern matching code from
>>>>>>>>> dictionary and replace the added method patternSearch by something
>>>>>>>>> like findRange. I believe that matching and pattern handling code
>>>>>>>>> doesn’t belong in the dictionary. I’d rather move this back to the
>>>>>>>>> category class or something.
>>>>>>>>>
>>>>>>>>> · Having to use a C++ string object to rebuild the longest
>>>>>>>>> constant prefix bugs me (suggestions?). I’m also thinking of having a
>>>>>>>>> version that doesn’t support escaping, but it would force me to change
>>>>>>>>> strMatch a bit more
>>>>>>>>>
>>>>>>>>> · To closely match the previous behavior, you can’t match an
>>>>>>>>> empty pattern (even the empty string doesn’t match); maybe that would
>>>>>>>>> be worth changing
>>>>>>>>>
>>>>>>>>> As always John, feel free to include this into the main branch. I’m
>>>>>>>>> waiting for suggestions to make it more efficient, cleaner, ...
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Dominique Prunier
>>>>>>>>> APG Lead Developer
>>>>>>>>> 4388, rue Saint-Denis
>>>>>>>>> Bureau 309
>>>>>>>>> Montreal (Quebec) H2J 2L1
>>>>>>>>> Tel. +1 514-842-6767 x310
>>>>>>>>> Fax +1 514-842-3989
>>>>>>>>> [email protected]
>>>>>>>>> www.watch4net.com <http://www.watch4net.com/>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> FastBit-users mailing list
>>>>>>>>> [email protected]
>>>>>>>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
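
For anyone following the thread, here is a minimal C++ sketch of the prefix range search described in the original patch mail, and of why the bounds of the binary search probably need to stay signed, as reported for r484. This is not FastBit's code: prefixRange and the plain std::vector of keys are hypothetical stand-ins for the dictionary's sorted key list, and extracting the constant prefix from a LIKE pattern (including escape handling) is assumed to have happened already.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of FastBit): returns the half-open range
// [first, last) of positions in a sorted key list whose entries start with
// `prefix`. Only those entries need the full pattern match afterwards.
std::pair<int32_t, int32_t>
prefixRange(const std::vector<std::string>& keys, const std::string& prefix) {
    assert(std::is_sorted(keys.begin(), keys.end()));
    const int32_t n = static_cast<int32_t>(keys.size());

    // Compare the first prefix.size() characters of a key with the prefix.
    auto cmp = [&](int32_t i) {
        return keys[i].compare(0, prefix.size(), prefix);
    };

    // Signed bounds on purpose: `e` becomes -1 when the search moves left of
    // position 0 (or when the list is empty), and that ends the loop.
    int32_t b = 0;
    int32_t e = n - 1;
    while (b <= e) {                 // find the first key that is >= prefix
        int32_t m = b + (e - b) / 2;
        if (cmp(m) < 0) b = m + 1;
        else            e = m - 1;   // reaches -1 when m == 0
    }
    const int32_t first = b;

    e = n - 1;
    while (b <= e) {                 // find the first key that is > prefix
        int32_t m = b + (e - b) / 2;
        if (cmp(m) <= 0) b = m + 1;
        else             e = m - 1;
    }
    return std::make_pair(first, b);
}
```

With the keys {"apple", "apricot", "banana", "band", "bandit", "cat"}, prefixRange(keys, "ban") gives [2, 5), so only three entries would go through the full pattern match, and a prefix that matches nothing gives an empty range. If b and e were declared as uint32_t, the step e = m - 1 with m == 0 would wrap around to 4294967295 instead of -1, the loop would keep going, and keys[m] would read far out of bounds.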
