Hi John,

I just tried the new code. It seems to save a few percent (1-5%, depending on how much pattern matching there is in the query), but we're still far from the initial results.

Thanks,

-----Original Message-----
From: K. John Wu [mailto:[email protected]]
Sent: Friday, March 16, 2012 2:06 PM
To: Dominique Prunier
Cc: FastBit Users
Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to match LIKE patterns case-sensitively and perform specific optimizations

Hi, Dominique,

The function ibis::category::fillIndex now sorts the dictionary before actually creating the index. The source code is at SVN revision 492. You could give it a try and see if it helps.

John

On 3/16/12 6:34 AM, Dominique Prunier wrote:
> Hey John,
>
> I tried the sort. It ends up speeding up some cases but slowing down some
> others (I guess when there are many values, you are paying for the sort). In
> the end, on my large query test set, it ended up being slower, so I reverted it.
>
> Like I said earlier, this exact same set is now 6 times slower. I think it is
> related to the fact that my bit vectors are much more complex (and more
> distributed) and more likely to match something (since the previous results
> were pure garbage), but I'm not sure why it makes such a difference (I'd
> assume the first N ones would be as dispersed as any set of N vectors). The
> hot spot is ibis::bitvector::do_cnt, and I have the impression that it is
> called quite often with a test like cnt() != 0 or > 0, which could probably be
> simplified. I'll take a look at that if I have time today.
>
> Thanks,
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Dominique Prunier
> Sent: Thursday, March 15, 2012 5:08 PM
> To: K. John Wu
> Cc: FastBit Users
> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
> match LIKE patterns case-sensitively and perform specific optimizations
>
> Hi John,
>
> About the perf, I'm trying to understand what could justify such a difference
> (my ~90k queries test jumped from 20s to 120s while we should be handling the
> same number of bit vectors). So far, it seems that ~70% of the time is spent
> in ibis::bitvector::do_cnt(). I'll try the sort trick to see how it affects
> performance. I tend to be careful with profiler results and favor your
> knowledge of the codebase.
>
> Thanks,
>
> -----Original Message-----
> From: K. John Wu [mailto:[email protected]]
> Sent: Thursday, March 15, 2012 4:56 PM
> To: Dominique Prunier
> Cc: FastBit Users
> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
> match LIKE patterns case-sensitively and perform specific optimizations
>
> Hi, Dominique,
>
> Regarding the test cases, whenever you are ready to share is fine.
>
> Regarding the performance, my guess is that the bit vectors are more
> spread out in the new corrected approach. When the bit vectors are
> spread out, FastBit needs to read more bytes in order to get them out.
> If this is indeed the case, the thing to do might be to sort the values
> before returning them from ibis::dictionary::patternSearch with
> something like
>
> std::sort(matches.begin(), matches.end());
>
> John
>
>
> On 3/15/12 12:24 PM, Dominique Prunier wrote:
>> Hey John,
>>
>> The problem is that right now, my test suite mainly tests the
>> import and not so much the queries (I have maybe 10-20 test cases
>> that actually evaluate queries). It has been basically modeled
>> after our existing test suite for MySQL/Oracle and is supposed to
>> prove that the imported data is equivalent to our source.
>>
>> My next step is to do a micro testing framework to test queries. I
>> have a very simplistic example which only uses CATEGORY and LONG
>> (the only two types that we're using), but it is not convenient
>> enough to use.
>>
>> Be sure that I'll post in the group if I have something good enough
>> to be shared :)
>>
>> Regarding the bug I found yesterday, surprisingly enough, fixing it
>> changed the performance quite significantly. I'm still not sure why
>> (the number of selected bitmaps was the same), but I'll try to
>> investigate a bit further.
>>
>> Thanks,
>>

_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
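
To illustrate the remark in the thread that a test like cnt() != 0 could probably be simplified, here is a minimal sketch of the idea. It assumes a plain, uncompressed bitmap stored as 64-bit words; the PlainBits type and its set, count and any members are hypothetical names made up for this example and are not part of FastBit, whose ibis::bitvector is compressed and works differently internally. The point is only that a full count has to visit every word, while an emptiness test can return at the first nonzero word.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a bit vector: plain, uncompressed 64-bit words.
// This is NOT FastBit's ibis::bitvector, which is compressed.
struct PlainBits {
    std::vector<std::uint64_t> words;

    // Allocate enough zeroed words to hold nbits bits.
    explicit PlainBits(std::size_t nbits) : words((nbits + 63) / 64, 0) {}

    // Set the bit at position pos (must lie inside the allocated words).
    void set(std::size_t pos) {
        assert(pos / 64 < words.size());
        words[pos / 64] |= (std::uint64_t{1} << (pos % 64));
    }

    // Full population count: always visits every word.
    std::size_t count() const {
        std::size_t total = 0;
        for (std::uint64_t w : words) {
            // Each step clears the lowest set bit of w.
            for (; w != 0; w &= w - 1) ++total;
        }
        return total;
    }

    // Emptiness test: returns at the first nonzero word,
    // so it never does more work than count() and often far less.
    bool any() const {
        for (std::uint64_t w : words) {
            if (w != 0) return true;
        }
        return false;
    }
};
```

With this sketch, a caller that only needs to know whether anything matched would write bits.any() instead of bits.count() != 0. Whether the same shortcut pays off inside FastBit depends on how do_cnt is actually used there, which the thread leaves open.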
