Hi, Dominique,

I have run through my usual set of tests and did not find any problem
with your patch.  It is now in SVN 482.  Please give it a try when you
get the chance.

Thanks.

John



On 3/9/12 10:17 AM, Dominique Prunier wrote:
> Quick update to my patch:
> 
> · Changed dictionary::patternMatch so that it also works for
> case-insensitive (CI) matching (I think that, for efficiency reasons,
> all of this code has to stay here)
> 
> · Moved the STR_MATCH_* constants from util.cpp to util.h and used
> them in dictionary::patternMatch
> 
> · Removed the CS/CI (case-sensitive/case-insensitive) #ifdef from
> category.cpp
> 
> I did more testing: on my set of ~90,000 test queries, the
> execution time dropped from ~515 seconds to ~20 seconds.
> 
> Thanks,
> 
> *From:* [email protected]
> [mailto:[email protected]] *On Behalf Of* Dominique
> Prunier
> *Sent:* Thursday, March 08, 2012 2:39 PM
> *To:* FastBit Users
> *Subject:* [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
> match LIKE patterns case-sensitively and perform specific optimizations
> 
> Here is the first version of my patch to switch SQL LIKE from
> case-insensitive to case-sensitive matching and to optimize this use
> case for CATEGORY columns.
> 
> In a nutshell, what changed is:
> 
> · We extract the longest constant prefix from the pattern (handling
> the escape character too)
> 
> · Instead of testing every value in the dictionary, we binary-search
> for the range of values that start with this prefix (if no such range
> exists, pattern matching can be skipped entirely)
> 
> · We test every value in that range against the full pattern
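The three steps above can be sketched in standalone C++17. This is an illustration under stated assumptions, not the code from the patch: the dictionary values are assumed to be a std::vector<std::string> sorted with std::string's byte-wise ordering, a backslash is assumed to be the escape character, and constantPrefix, findRange, likeMatch and likeSearch are hypothetical names (likeMatch is a simple backtracking matcher, not FastBit's strMatch).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Assumed escape character; '%' (any run) and '_' (any one char) are the LIKE wildcards.
constexpr char kEscape = '\\';

// Step 1 (hypothetical helper): longest constant prefix of a LIKE pattern, escapes removed.
std::string constantPrefix(std::string_view pattern) {
    std::string prefix;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        char c = pattern[i];
        if (c == '%' || c == '_') break;                   // first unescaped wildcard ends it
        if (c == kEscape && i + 1 < pattern.size()) c = pattern[++i];  // escaped char is literal
        prefix += c;
    }
    return prefix;
}

// Step 2 (hypothetical helper): half-open index range [lo, hi) of the values in `sorted`
// that start with `prefix`, found with two binary searches.
std::pair<std::size_t, std::size_t>
findRange(const std::vector<std::string>& sorted, const std::string& prefix) {
    auto lo = std::lower_bound(sorted.begin(), sorted.end(), prefix);
    // First value whose leading prefix.size() characters compare greater than the prefix.
    auto hi = std::upper_bound(lo, sorted.end(), prefix,
        [](const std::string& p, const std::string& v) {
            return v.compare(0, p.size(), p) > 0;
        });
    return {static_cast<std::size_t>(lo - sorted.begin()),
            static_cast<std::size_t>(hi - sorted.begin())};
}

// Step 3 (hypothetical helper): case-sensitive backtracking LIKE matcher.
bool likeMatch(std::string_view s, std::string_view p) {
    while (!p.empty()) {
        char c = p.front();
        if (c == '%') {
            p.remove_prefix(1);
            for (std::size_t k = 0; k <= s.size(); ++k)    // let '%' absorb 0..n chars
                if (likeMatch(s.substr(k), p)) return true;
            return false;
        }
        if (s.empty()) return false;
        if (c != '_') {                                     // '_' matches any single char
            if (c == kEscape && p.size() > 1) {             // escaped char compared literally
                p.remove_prefix(1);
                c = p.front();
            }
            if (s.front() != c) return false;
        }
        s.remove_prefix(1);
        p.remove_prefix(1);
    }
    return s.empty();
}

// Whole pipeline: narrow to the prefix range, then run the matcher only inside it.
std::vector<std::string> likeSearch(const std::vector<std::string>& sorted,
                                    std::string_view pattern) {
    const auto [lo, hi] = findRange(sorted, constantPrefix(pattern));
    std::vector<std::string> hits;
    for (std::size_t k = lo; k < hi; ++k)                 // empty range: no matching at all
        if (likeMatch(sorted[k], pattern)) hits.push_back(sorted[k]);
    return hits;
}
```

For example, with the sorted values {"apple", "applesauce", "apricot", "banana", "band", "bandana", "can"}, the pattern "ban%" narrows the work to indices [3, 6) before any matching happens, and "z%" yields an empty range, so likeMatch is never called. With this particular matcher an empty pattern matches only the empty string (the usual SQL semantics).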
> 
> On a large dictionary (~130k entries), I've commonly seen it be one or
> two orders of magnitude faster (in my example, a simple query with a
> single LIKE predicate drops from ~10 ms to ~0.4 ms).
> 
> What I'd like to change/refactor (I'm really a newbie in C++):
> 
> · Remove the prefix extraction and pattern matching code from the
> dictionary, and replace the newly added method patternSearch with
> something like findRange. I believe that matching and pattern handling
> code doesn't belong in the dictionary; I'd rather move it back to the
> category class or somewhere similar.
> 
> · Having to use a C++ string object to rebuild the longest constant
> prefix bugs me (suggestions?). I'm also thinking of having a version
> that doesn't support escaping, but it would force me to change strMatch
> a bit more.
> 
> · To closely match the previous behavior, an empty pattern matches
> nothing (not even the empty string); maybe that would be worth changing.
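On the second point above (rebuilding the prefix in a std::string), one possible approach is to hand out a view into the pattern and copy only when an escape character actually has to be removed. This is a hedged sketch, not FastBit code: it assumes C++17's std::string_view is available (with an older compiler a const char* plus a length does the same job), a backslash as the escape character, and the names rawPrefix and prefixView are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Variant without escaping (hypothetical): the prefix is everything before the first
// wildcard, returned as a view into the pattern, so nothing is allocated or copied.
std::string_view rawPrefix(std::string_view pattern) {
    return pattern.substr(0, pattern.find_first_of("%_"));  // npos => whole pattern
}

// Variant with backslash escaping (assumed escape char): zero-copy in the common case;
// `storage` is written only when an escape occurs before the first wildcard.
std::string_view prefixView(std::string_view pattern, std::string& storage) {
    const std::size_t stop = pattern.find_first_of("%_\\");
    if (stop == std::string_view::npos || pattern[stop] != '\\')
        return pattern.substr(0, stop);            // no escape involved: view into pattern
    storage.assign(pattern.substr(0, stop));       // copy the escape-free head once
    for (std::size_t i = stop; i < pattern.size(); ++i) {
        char c = pattern[i];
        if (c == '%' || c == '_') break;           // first unescaped wildcard ends the prefix
        if (c == '\\' && i + 1 < pattern.size()) c = pattern[++i];  // keep escaped char
        storage += c;
    }
    return storage;                                // view into caller-owned storage
}
```

The caller owns `storage`, so the returned view stays valid as long as both the pattern and that string are alive; the no-escape variant needs no extra buffer at all.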
> 
> As always, John, feel free to include this in the main branch. I'm
> open to suggestions to make it more efficient, cleaner, ...
> 
> Thanks,
> 
> Dominique Prunier
> APG Lead Developer
> 4388, rue Saint-Denis
> Bureau 309
> Montreal (Quebec)  H2J 2L1
> Tel. +1 514-842-6767  x310
> Fax +1 514-842-3989
> [email protected] <mailto:[email protected]>
> www.watch4net.com <http://www.watch4net.com/>
> 
> This message is for the designated recipient only and may contain
> privileged, proprietary, or otherwise private information. If you have
> received it in error, please notify the sender immediately and delete
> the original. Any other use of this electronic mail by you is prohibited.
> 
> _______________________________________________
> FastBit-users mailing list
> [email protected]
> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users