Hi, Dominique,

Thanks for catching this typo.  The thing to do might be to change how
FASTBIT_CS_PATTERN_MATCH is tested so that a user can disable it if they
want.  This can be done by changing

#ifdef FASTBIT_CS_PATTERN_MATCH

to

#if FASTBIT_CS_PATTERN_MATCH+0 > 0
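
As a rough illustration of why the "+0 > 0" form lets a user switch the
feature off, here is a minimal sketch.  Only the macro name comes from
this thread; the constants (kUndefined, kEmpty, kZero, kZeroWithIfdef,
kOne) are hypothetical names made up for the demonstration, and this is
not the actual code in util.h.

```cpp
// Hypothetical demonstration; only the macro name FASTBIT_CS_PATTERN_MATCH
// is taken from the thread.  Each section re-evaluates the same #if test
// under a different definition of the macro.

// 1. Macro not defined: the preprocessor replaces the unknown identifier
//    with 0, so the test reads "0+0 > 0".
#undef FASTBIT_CS_PATTERN_MATCH
#if FASTBIT_CS_PATTERN_MATCH+0 > 0
constexpr bool kUndefined = true;
#else
constexpr bool kUndefined = false;
#endif

// 2. Defined with no value: the test reads "+0 > 0".
#define FASTBIT_CS_PATTERN_MATCH
#if FASTBIT_CS_PATTERN_MATCH+0 > 0
constexpr bool kEmpty = true;
#else
constexpr bool kEmpty = false;
#endif
#undef FASTBIT_CS_PATTERN_MATCH

// 3. Defined as 0: the new test is false, but a plain #ifdef is still true.
#define FASTBIT_CS_PATTERN_MATCH 0
#if FASTBIT_CS_PATTERN_MATCH+0 > 0
constexpr bool kZero = true;
#else
constexpr bool kZero = false;
#endif
#ifdef FASTBIT_CS_PATTERN_MATCH
constexpr bool kZeroWithIfdef = true;
#else
constexpr bool kZeroWithIfdef = false;
#endif
#undef FASTBIT_CS_PATTERN_MATCH

// 4. Defined as 1 (what -DFASTBIT_CS_PATTERN_MATCH does with GCC and Clang).
#define FASTBIT_CS_PATTERN_MATCH 1
#if FASTBIT_CS_PATTERN_MATCH+0 > 0
constexpr bool kOne = true;
#else
constexpr bool kOne = false;
#endif
#undef FASTBIT_CS_PATTERN_MATCH

// Only an explicit positive value enables the guarded code.
static_assert(!kUndefined && !kEmpty && !kZero && kOne,
              "only a positive value enables the feature");
static_assert(kZeroWithIfdef, "#ifdef cannot be switched off with =0");
```

With this form, building with -DFASTBIT_CS_PATTERN_MATCH=0 turns the
feature off, whereas a plain #ifdef treats any definition, including 0,
as on.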

I would like to keep case-sensitive matching as the default because it
is more efficient and it does not cause too much trouble for a user to
match the cases.  I will see whether any other user speaks up about
this.

John


On 3/10/12 9:20 AM, Dominique Prunier wrote:
> Hey John,
> 
> I just noticed a small typo in util.h: the macro is called FASTBOT_... I
> don't think it was intended, but it has the nice side effect of disabling the new
> code by default, thus preserving the current behavior (case insensitive). Should
> we actually keep it in util.h now that it is documented in INSTALL?
> 
> https://codeforge.lbl.gov/plugins/scmsvn/viewcvs.php/trunk/src/util.h?root=fastbit&r1=483&r2=482&pathrev=483
> 
> Thanks,
> ________________________________________
> From: K. John Wu [[email protected]]
> Sent: March-09-12 10:47 PM
> To: Dominique Prunier
> Cc: FastBit Users
> Subject: Re: [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to 
> match LIKE patterns case-sensitively and perform specific optimizations
> 
> Hi, Dominique,
> 
> I would like to add the FASTBIT_ prefix to the macro CS_PATTERN_MATCH to
> avoid possible collisions when FastBit is used with other packages.
> Hope you don't mind.
> 
> John
> 
> 
> On 3/9/12 4:03 PM, K. John Wu wrote:
>> Hi, Dominique,
>>
>> I have run through my usual set of tests and did not find any problem
>> with your patch.  It is now in SVN 482.  Please give it a try when you
>> get the chance.
>>
>> Thanks.
>>
>> John
>>
>>
>>
>> On 3/9/12 10:17 AM, Dominique Prunier wrote:
>>> Quick update to my patch:
>>>
>>> ·  Changed dictionary::patternMatch to make it work with CI
>>> (case-insensitive) matching too (and I think that, for efficiency
>>> reasons, I have to keep all of this here)
>>>
>>> ·  Moved the STR_MATCH_* constants from util.cpp to util.h and
>>> use them in dictionary::patternMatch
>>>
>>> ·  Removed the CS/CI ifdef from category.cpp
>>>
>>>
>>>
>>> I did more testing, and on my set of ~90 000 test queries, the
>>> execution time dropped from ~515 seconds to ~20 seconds.
>>>
>>>
>>>
>>> Thanks,
>>>
>>>
>>>
>>> *From:*[email protected]
>>> [mailto:[email protected]] *On Behalf Of *Dominique
>>> Prunier
>>> *Sent:* Thursday, March 08, 2012 2:39 PM
>>> *To:* FastBit Users
>>> *Subject:* [FastBit-users] PATCH: new CS_PATTERN_MATCH define added to
>>> match LIKE patterns case-sensitively and perform specific optimizations
>>>
>>>
>>>
>>> Here is the first version of my patch to switch SQL LIKE from case
>>> insensitive to case sensitive and optimize this use case with CATEGORY
>>> columns.
>>>
>>>
>>>
>>> In a nutshell, what changed is:
>>>
>>> ·  We extract the longest constant prefix from the pattern (handling
>>> the escape character too)
>>>
>>> ·  Instead of testing every value in the dictionary, we binary search
>>> for the range of values that can match (which sometimes even allows us
>>> to skip pattern matching altogether, if no valid range can be found)
>>>
>>> ·  We test every value in that range
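
The three steps above can be sketched roughly as follows.  This is not
the code from the patch: the function names (literalPrefix, prefixRange,
likeMatch, matchLike) are hypothetical, the dictionary is assumed to be
a sorted std::vector of strings, and the pattern syntax is assumed to be
'%' and '_' wildcards with a backslash escape.  The sketch also does not
reproduce FastBit's special handling of empty patterns.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Step 1 (hypothetical helper): longest literal prefix of a LIKE pattern,
// stopping at the first unescaped wildcard.
std::string literalPrefix(const std::string& pattern) {
    std::string prefix;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        char c = pattern[i];
        if (c == '%' || c == '_') break;
        if (c == '\\') {
            if (i + 1 >= pattern.size()) break;  // dangling escape: stop here
            c = pattern[++i];                    // escaped char is a literal
        }
        prefix.push_back(c);
    }
    return prefix;
}

// Step 2 (hypothetical helper): half-open index range [first, last) of the
// entries in a sorted dictionary that start with the given prefix.
std::pair<std::size_t, std::size_t>
prefixRange(const std::vector<std::string>& sorted, const std::string& prefix) {
    assert(std::is_sorted(sorted.begin(), sorted.end()));
    auto lo = std::lower_bound(sorted.begin(), sorted.end(), prefix);
    // Entries sharing the prefix are contiguous and begin at lo, so a
    // second binary search finds where they end.
    auto hi = std::partition_point(lo, sorted.end(),
        [&](const std::string& s) {
            return s.compare(0, prefix.size(), prefix) == 0;
        });
    return {static_cast<std::size_t>(lo - sorted.begin()),
            static_cast<std::size_t>(hi - sorted.begin())};
}

// Simple case-sensitive matcher: '%' = any run, '_' = any one character.
bool likeMatch(const char* s, const char* p) {
    while (*p != '\0') {
        if (*p == '%') {
            ++p;
            for (;; ++s) {  // try every possible split point
                if (likeMatch(s, p)) return true;
                if (*s == '\0') return false;
            }
        }
        if (*s == '\0') return false;
        if (*p == '_') { ++p; ++s; continue; }
        if (*p == '\\' && p[1] != '\0') ++p;  // compare the escaped char
        if (*p != *s) return false;
        ++p; ++s;
    }
    return *s == '\0';
}

// Step 3: run the full matcher only on the entries inside the range.
std::vector<std::size_t> matchLike(const std::vector<std::string>& sorted,
                                   const std::string& pattern) {
    std::vector<std::size_t> hits;
    const auto range = prefixRange(sorted, literalPrefix(pattern));
    for (std::size_t i = range.first; i < range.second; ++i)
        if (likeMatch(sorted[i].c_str(), pattern.c_str())) hits.push_back(i);
    return hits;
}
```

Calling matchLike(dict, "ban%") on a sorted dictionary only runs the
matcher on the entries that start with "ban".  A pattern that begins
with a wildcard has an empty prefix, so the range is the whole
dictionary and nothing is saved in that case.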
>>>
>>>
>>>
>>> On a large dictionary (~130k entries), I have commonly seen it be one or
>>> two orders of magnitude faster (in my example, a simple query with a
>>> single LIKE predicate drops from ~10ms to ~0.4ms).
>>>
>>>
>>>
>>> What I’d like to change/refactor (I’m really a newbie in C++):
>>>
>>> ·  Remove the prefix extraction and pattern matching code from
>>> dictionary and replace the added method patternSearch with something
>>> like findRange. I believe that matching and pattern handling code
>>> doesn’t belong in the dictionary. I’d rather move it back to the
>>> category class or something similar.
>>>
>>> ·  Having to use a C++ string object to rebuild the longest
>>> constant prefix bugs me (suggestions?). I’m also thinking of having a
>>> version that doesn’t support escaping, but it would force me to change
>>> strMatch a bit more
>>>
>>> ·  To closely match the previous behavior, you can’t match an
>>> empty pattern (even the empty string doesn’t match); maybe that would
>>> be worth changing
>>>
>>>
>>>
>>> As always, John, feel free to include this in the main branch. I’m
>>> waiting for suggestions to make it more efficient, cleaner, ...
>>>
>>>
>>>
>>> Thanks,
>>>
>>>
>>>
>>> Dominique Prunier
>>>
>>>  APG Lead Developper
>>>
>>>  4388, rue Saint-Denis
>>>
>>>  Bureau 309
>>>
>>>  Montreal (Quebec)  H2J 2L1
>>>
>>>  Tel. +1 514-842-6767  x310
>>>
>>>  Fax +1 514-842-3989
>>>
>>>  [email protected] <mailto:[email protected]>
>>>
>>>  www.watch4net.com <http://www.watch4net.com/>
>>>
>>> This message is for the designated recipient only and may contain
>>> privileged, proprietary, or otherwise private information. If you have
>>> received it in error, please notify the sender immediately and delete
>>> the original. Any other use of this electronic mail by you is prohibited.
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> FastBit-users mailing list
>>> [email protected]
>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users