Here is the first version of my patch to switch SQL LIKE from case-insensitive 
to case-sensitive matching and to optimize that case for CATEGORY columns.

In a nutshell, what changed is:

·         We extract the longest constant prefix from the pattern (taking the 
escape character into account)

·         Instead of testing every value in the dictionary, we binary-search 
the dictionary for the range of values that start with that prefix (when no 
such range exists, pattern matching can be skipped entirely)

·         We then run the full pattern match on every value in that range
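The three steps above can be sketched roughly as follows. This is a standalone 
C++ illustration, not the code in cs_like.patch: it assumes the dictionary is a 
std::vector<std::string> sorted with operator<, that `esc` (e.g. a backslash) 
makes the next pattern character literal, and extractPrefix, likeMatch and 
likeSearch are hypothetical names unrelated to FastBit's strMatch or the 
patch's patternSearch.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Step 1: longest constant prefix of a LIKE pattern. Copies characters until
// the first unescaped '%' or '_'; an escaped character is copied literally.
// A trailing escape character with nothing after it is kept as a literal.
std::string extractPrefix(const std::string& pattern, char esc) {
    std::string prefix;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        const char c = pattern[i];
        if (c == esc && i + 1 < pattern.size()) {
            prefix += pattern[++i];  // escaped character is literal
        } else if (c == '%' || c == '_') {
            break;                   // first wildcard ends the prefix
        } else {
            prefix += c;
        }
    }
    return prefix;
}

// Case-sensitive LIKE matcher: '%' matches any run of characters, '_' exactly
// one character, and `esc` makes the next pattern character literal.
bool likeMatch(const char* s, const char* p, char esc) {
    for (; *p != '\0'; ++p) {
        if (*p == esc && p[1] != '\0') {
            ++p;                         // compare the escaped char literally
            if (*s != *p) return false;  // also fails at the end of s
            ++s;
            continue;
        }
        if (*p == '%') {
            // Try every possible length for the run matched by '%'.
            for (const char* t = s;; ++t) {
                if (likeMatch(t, p + 1, esc)) return true;
                if (*t == '\0') return false;
            }
        }
        if (*s == '\0' || (*p != '_' && *s != *p)) return false;
        ++s;
    }
    return *s == '\0';
}

// Steps 2 and 3: binary-search the sorted dictionary for the block of entries
// that start with the prefix, then run the full matcher only on that block.
// Returns the dictionary indices of the matching entries.
std::vector<std::size_t> likeSearch(const std::vector<std::string>& dict,
                                    const std::string& pattern, char esc) {
    std::vector<std::size_t> hits;
    const std::string prefix = extractPrefix(pattern, esc);
    // First entry >= prefix; entries sharing the prefix are contiguous from here.
    auto lo = std::lower_bound(dict.begin(), dict.end(), prefix);
    auto hi = std::partition_point(lo, dict.end(), [&](const std::string& s) {
        return s.compare(0, prefix.size(), prefix) == 0;
    });
    // If lo == hi, nothing can match and the loop below never runs.
    for (auto it = lo; it != hi; ++it) {
        if (likeMatch(it->c_str(), pattern.c_str(), esc)) {
            hits.push_back(static_cast<std::size_t>(it - dict.begin()));
        }
    }
    return hits;
}
```

For example, with the sorted dictionary {"a_b", "abc", "apple", "apply"}, the 
pattern app% yields the prefix "app", the binary search narrows the candidates 
to "apple" and "apply", and only those two go through likeMatch. A pattern 
such as Apple% gives an empty range, so no matching is done at all. The 
recursive handling of '%' keeps the sketch short but can backtrack heavily on 
patterns with many '%'; an iterative matcher would avoid that.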

On a large dictionary (~130k entries), I've commonly seen it run one or two 
orders of magnitude faster (in my example, a simple query with a single LIKE 
predicate drops from ~10 ms to ~0.4 ms).

What I'd like to change/refactor (I'm really a newbie in C++):

·         Remove the prefix extraction and pattern matching code from the 
dictionary and replace the newly added method patternSearch with something like 
findRange. I believe the matching and pattern-handling code doesn't belong in 
the dictionary; I'd rather move it back into the category class or somewhere 
similar.

·         Having to use a C++ std::string object to rebuild the longest 
constant prefix bugs me (suggestions?). I'm also thinking of having a version 
that doesn't support escaping, but that would force me to change strMatch a bit 
more.

·         To closely match the previous behavior, an empty pattern matches 
nothing (not even the empty string); that might be worth changing.
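For the first two points, one possible split (a sketch under assumptions, not 
a tested design for FastBit) keeps the dictionary ignorant of LIKE: it only 
answers "which index range starts with these bytes?", while the category side 
computes the prefix and runs the matcher. In a variant without escaping, the 
constant prefix is simply everything before the first '%' or '_', so it can be 
passed as a pointer plus a length and no std::string has to be built. 
findRange and literalPrefixLength are hypothetical names, and the dictionary is 
assumed here to be a std::vector<const char*> sorted in strcmp order.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <utility>
#include <vector>

// Dictionary side (hypothetical): knows nothing about LIKE. Given entries
// sorted in strcmp order, return the half-open index range [first, last) of
// entries whose first `len` bytes equal the first `len` bytes of `prefix`.
std::pair<std::size_t, std::size_t>
findRange(const std::vector<const char*>& sorted, const char* prefix,
          std::size_t len) {
    // Entries that compare below the prefix (on `len` bytes) come first.
    auto first = std::partition_point(sorted.begin(), sorted.end(),
        [&](const char* s) { return std::strncmp(s, prefix, len) < 0; });
    // Entries sharing the prefix form one contiguous block right after them.
    auto last = std::partition_point(first, sorted.end(),
        [&](const char* s) { return std::strncmp(s, prefix, len) == 0; });
    return {static_cast<std::size_t>(first - sorted.begin()),
            static_cast<std::size_t>(last - sorted.begin())};
}

// Category side (hypothetical), for a LIKE variant without escaping: the
// constant prefix is everything before the first '%' or '_', so it can be
// used in place as (pattern, length) without building a std::string.
std::size_t literalPrefixLength(const char* pattern) {
    return std::strcspn(pattern, "%_");
}
```

The caller would then test only the entries first..last-1 against the full 
pattern. With escaping enabled, the prefix characters are no longer contiguous 
in the pattern, so some copy (or a comparison that skips escape characters) is 
still needed in that path. Note that an empty pattern gives a zero-length 
prefix and therefore the whole dictionary as the range, which leaves the 
empty-pattern rule entirely to the matcher.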

As always, John, feel free to include this in the main branch. I'm looking 
forward to suggestions to make it more efficient, cleaner, ...

Thanks,

Dominique Prunier
 APG Lead Developer
 4388, rue Saint-Denis
 Bureau 309
 Montreal (Quebec)  H2J 2L1
 Tel. +1 514-842-6767  x310
 Fax +1 514-842-3989
 [email protected]
 www.watch4net.com

This message is for the designated recipient only and may contain privileged, 
proprietary, or otherwise private information. If you have received it in 
error, please notify the sender immediately and delete the original. Any other 
use of this electronic mail by you is prohibited.

Attachment: cs_like.patch
Description: cs_like.patch

_______________________________________________
FastBit-users mailing list
[email protected]
https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
