Hi John,

I just tried the new code. It seems to save a few percent (1-5%, depending on 
how much pattern matching there is in the query), but we're still far from the 
initial results.

Thanks,

-----Original Message-----
From: K. John Wu [mailto:[email protected]] 
Sent: Friday, March 16, 2012 2:06 PM
To: Dominique Prunier
Cc: FastBit Users
Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to match 
LIKE patterns case-sensitively and perform specific optimizations

Hi, Dominique,

The function ibis::category::fillIndex now sorts the dictionary before
actually creating the index.  The source code is SVN 492.  You could
give it a try and see if it helps.

John


On 3/16/12 6:34 AM, Dominique Prunier wrote:
> Hey John,
> 
> I tried the sort. It ends up speeding up some cases but slowing down 
> others (I guess when there are many values, you are paying for the sort). In 
> the end, on my large query test set, it was slower overall, so I reverted it.
> 
> Like I said earlier, this exact same set is now 6 times slower. I think it is 
> related to the fact that my bit vectors are much more complex (and more 
> distributed) and more likely to match something (since the previous results 
> were pure garbage), but I'm not sure why it makes such a difference (I'd 
> assume the first N vectors would be as dispersed as any set of N vectors). The 
> hot spot is ibis::bitvector::do_cnt, and I have the impression that it is 
> called quite often with a test like cnt() != 0 or > 0, which could probably be 
> simplified. I'll take a look at that if I have time today.
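
The difference between a full count and a plain emptiness test, as raised above for cnt() != 0, can be sketched as follows. This is only an illustration under stated assumptions: `Run` and `RunBitmap` are hypothetical names for a simplified run-length bitmap, not ibis::bitvector or its actual encoding, and nothing here reflects how do_cnt is really implemented.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified run-length bitmap (NOT FastBit's ibis::bitvector).
struct Run {
    bool value;            // bit value repeated across the whole run
    std::uint32_t length;  // number of bits in the run
};

struct RunBitmap {
    std::vector<Run> runs;

    // Full population count: has to visit every run.
    std::uint64_t count() const {
        std::uint64_t total = 0;
        for (const Run& r : runs) {
            if (r.value) total += r.length;
        }
        return total;
    }

    // Emptiness test: returns at the first non-empty run of ones,
    // so it never scans further than necessary.
    bool any() const {
        for (const Run& r : runs) {
            if (r.value && r.length > 0) return true;
        }
        return false;
    }
};
```

In this sketch, a caller that only needs to know whether anything matched would call any() instead of comparing count() to zero. Whether the same shortcut applies inside FastBit depends on how its bit vectors cache their counts, which this example does not model.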
> 
> Thanks,
> 
> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Dominique Prunier
> Sent: Thursday, March 15, 2012 5:08 PM
> To: K. John Wu
> Cc: FastBit Users
> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to 
> match LIKE patterns case-sensitively and perform specific optimizations
> 
> Hi John,
> 
> About the perf, I'm trying to understand what could justify such a difference 
> (my test of ~90k queries jumped from 20s to 120s, while we should be handling 
> the same number of bit vectors). So far, it seems that ~70% of the time is 
> spent in ibis::bitvector::do_cnt(). I'll try the sort trick to see how it 
> affects performance. I tend to be careful with profiler results and favor 
> your knowledge of the codebase.
> 
> Thanks,
> 
> -----Original Message-----
> From: K. John Wu [mailto:[email protected]] 
> Sent: Thursday, March 15, 2012 4:56 PM
> To: Dominique Prunier
> Cc: FastBit Users
> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to 
> match LIKE patterns case-sensitively and perform specific optimizations
> 
> Hi, Dominique,
> 
> Regarding the test cases, whenever you are ready to share is fine.
> 
> Regarding the performance, my guess is that the bit vectors are more
> spread out in the new corrected approach.  When the bit vectors are
> spread out, FastBit needs to read more bytes in order to get them out.
> If this is indeed the case, the thing to do might be to sort the values
> before returning them from ibis::dictionary::patternSearch with
> something like
> 
> std::sort(matches.begin(), matches.end());
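
To illustrate the suggestion above, here is a minimal sketch of sorting matched dictionary codes before they are used to fetch bitmaps. It does not use FastBit's API: `sortedMatches` and the `std::unordered_map` dictionary are hypothetical stand-ins for ibis::dictionary::patternSearch and its internal storage, and a plain substring test stands in for LIKE pattern matching.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a pattern search over a dictionary that maps
// each string value to its integer code. Returns the codes of all entries
// containing `needle`, in ascending order.
std::vector<std::uint32_t> sortedMatches(
        const std::unordered_map<std::string, std::uint32_t>& dict,
        const std::string& needle) {
    std::vector<std::uint32_t> matches;
    // Iteration follows hash order, so codes arrive in no particular order.
    for (const auto& entry : dict) {
        if (entry.first.find(needle) != std::string::npos) {
            matches.push_back(entry.second);
        }
    }
    // Sort so that the corresponding bitmaps can be visited in code order.
    std::sort(matches.begin(), matches.end());
    return matches;
}
```

Because the hypothetical dictionary iterates in hash order, the matches come out unordered until the sort, which is the situation the std::sort call is meant to address. The cost of the sort grows with the number of matches, so it may not pay off for every query.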
> 
> John
> 
> 
> On 3/15/12 12:24 PM, Dominique Prunier wrote:
>> Hey John,
>>
>> The problem is that right now, my test suite mainly tests the
>> import and not so much the queries (I have maybe 10-20 test cases
>> that actually evaluate queries). It has been basically modeled
>> after our existing test suite for MySQL/Oracle and is supposed to
>> prove that the imported data is equivalent to our source.
>>
>> My next step is to do a micro testing framework to test queries. I
>> have a very simplistic example which only uses CATEGORY and LONG
>> (the only two types that we're using), but it is not convenient
>> enough to use.
>>
>> Rest assured that I'll post to the group if I have something good
>> enough to be shared :)
>>
>> Regarding the bug I found yesterday, surprisingly enough, fixing it
>> changed the performance quite significantly. I'm still not sure why
>> (the number of selected bitmaps was the same), but I'll try to
>> investigate a bit further.
>>
>> Thanks,
>>
> _______________________________________________
> FastBit-users mailing list
> [email protected]
> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users