Hey John,

I tried the sort. It speeds up some cases but slows down others (I guess 
when there are many values, you end up paying for the sort). In the end, on my 
large query test set, it was slower overall, so I reverted it.

Like I said earlier, this exact same set is now 6 times slower. I think it is 
related to the fact that my bit vectors are now much more complex (and more 
spread out) and more likely to match something (since the previous results 
were pure garbage), but I'm not sure why it makes such a difference (I'd assume 
the first N ones would be as dispersed as any other set of N vectors). The hot 
spot is ibis::bitvector::do_cnt, and I have the impression that it is called 
quite often in tests like cnt() != 0 or cnt() > 0, which could probably be 
simplified. I'll take a look at that if I have time today.
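A test like cnt() != 0 only needs to know whether any bit is set, so it can stop at the first non-zero word instead of counting every bit. The sketch below illustrates the idea on a hypothetical, uncompressed `WordBitvector` type (an assumption for illustration; FastBit's real ibis::bitvector is WAH-compressed and its internals are not shown here). The `any()` name is likewise hypothetical.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, uncompressed stand-in for a bit vector.
// FastBit's ibis::bitvector is WAH-compressed; compression is ignored here.
struct WordBitvector {
    std::vector<std::uint64_t> words;

    // Full population count: always visits every word.
    std::size_t cnt() const {
        std::size_t total = 0;
        for (std::uint64_t w : words) {
            total += std::bitset<64>(w).count();
        }
        return total;
    }

    // Early-exit emptiness test: returns as soon as a non-zero word is seen,
    // so a vector with a set bit near the front costs almost nothing.
    bool any() const {
        for (std::uint64_t w : words) {
            if (w != 0) return true;
        }
        return false;
    }
};
```

A call site written as `if (bv.cnt() > 0)` would become `if (bv.any())`. The worst case (an all-zero vector) still scans everything, but matching vectors, which the new results produce more often, can exit early.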

Thanks,

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Dominique Prunier
Sent: Thursday, March 15, 2012 5:08 PM
To: K. John Wu
Cc: FastBit Users
Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to match 
LIKE patterns case-sensitively and perform specific optimizations

Hi John,

About the perf, I'm trying to understand what could explain such a difference 
(my ~90k-query test jumped from 20 s to 120 s, while we should be handling the 
same number of bit vectors). So far, it seems that ~70% of the time is spent in 
ibis::bitvector::do_cnt(). I'll try the sort trick to see how it affects 
performance. I tend to be careful with profiler results and defer to your 
knowledge of the codebase.

Thanks,

-----Original Message-----
From: K. John Wu [mailto:[email protected]] 
Sent: Thursday, March 15, 2012 4:56 PM
To: Dominique Prunier
Cc: FastBit Users
Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to match 
LIKE patterns case-sensitively and perform specific optimizations

Hi, Dominique,

Regarding the test cases, whenever you are ready to share is fine.

Regarding the performance, my guess is that the bit vectors are more
spread out in the new, corrected approach.  When the bit vectors are
spread out, FastBit needs to read more bytes in order to get them out.
If this is indeed the case, the thing to do might be to sort the values
before returning them from ibis::dictionary::patternSearch with
something like

std::sort(matches.begin(), matches.end());
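The reasoning behind sorting can be sketched as follows, assuming (hypothetically) that the bitmap for dictionary code i is stored back to back with its neighbours, occupying bytes [offsets[i], offsets[i + 1]) of one index file. Once the matched codes are sorted, bitmaps for adjacent codes become adjacent byte ranges and can be fetched in fewer, larger reads. `planReads`, `ByteRange` and the offsets layout are illustrative assumptions, not FastBit's actual API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Byte range [begin, end) to read from the (hypothetical) index file.
using ByteRange = std::pair<std::uint64_t, std::uint64_t>;

// Assumption: bitmap for code i occupies bytes [offsets[i], offsets[i + 1]).
std::vector<ByteRange> planReads(std::vector<std::uint32_t> codes,
                                 const std::vector<std::uint64_t>& offsets) {
    // Sort and drop duplicates so neighbouring codes end up next to each other.
    std::sort(codes.begin(), codes.end());
    codes.erase(std::unique(codes.begin(), codes.end()), codes.end());

    std::vector<ByteRange> reads;
    for (std::uint32_t code : codes) {
        assert(static_cast<std::size_t>(code) + 1 < offsets.size());
        const std::uint64_t begin = offsets[code];
        const std::uint64_t end = offsets[code + 1];
        if (!reads.empty() && reads.back().second == begin) {
            reads.back().second = end;       // contiguous: extend the previous read
        } else {
            reads.emplace_back(begin, end);  // gap: start a new read
        }
    }
    return reads;
}
```

For example, matches {5, 1, 2, 6} collapse into two reads (codes 1-2 and 5-6) instead of four. The sort itself costs O(k log k) for k matches, so when many values match it may not pay for itself.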

John


On 3/15/12 12:24 PM, Dominique Prunier wrote:
> Hey John,
> 
> The problem is that right now, my test suite mainly tests the
> import and not so much the queries (I have maybe 10-20 test cases
> that actually evaluate queries). It was basically modeled
> after our existing test suite for MySQL/Oracle and is meant to
> prove that the imported data is equivalent to our source.
> 
> My next step is to build a small framework for testing queries. I
> have a very simplistic example which only uses CATEGORY and LONG
> (the only two types that we're using), but it is not yet convenient
> enough to use.
> 
> Rest assured that I'll post to the group if I have something good
> enough to share :)
> 
> Regarding the bug I found yesterday, surprisingly enough, fixing it
> changed the performance quite significantly. I'm still not sure why
> (the number of selected bitmaps was the same), but I'll try to
> investigate a bit further.
> 
> Thanks,
> 
_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users