"SuperIndex" for string serching
--------------------------------

                 Key: CORE-4687
                 URL: http://tracker.firebirdsql.org/browse/CORE-4687
             Project: Firebird Core
          Issue Type: New Feature
          Components: Engine
            Reporter: Karol Bieniaszewski


This is big task but can optimize search operation on big tables  and 
answer quality will be 100% of searched records for query "like '%something%'" 

case studies:

last year some customer with very big Firebird database near 1 bilion of 
records in biggest table ask for optimising search on field VarChar(127)
They used full text search engine and it work ok as fulltext but as you know 
how it work then answer quality is not 100%.
Benefit was that it answer in few seconds instead of many seconds if we do 
search like 
UPPER(SMS_BODY COLLATE PXW_PLK) FROM 1) LIKE 'SOMETHING%'
full table scan 

FullText indexing words and you can ask like "starting with" but not 
"containing"
It find record with text like "case studies" if you search for "studi" 
but not find if you search for "ies"

because in this case there was only one field and short VarChar(127)
then i decided to rethink all and i got concept

create expression indexes like 

CREATE ASCENDING INDEX IXAE_SMS__BODY1 ON SMS COMPUTED 
BY(SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 1));
CREATE ASCENDING INDEX IXAE_SMS__BODY2 ON SMS COMPUTED 
BY(SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 2));
CREATE ASCENDING INDEX IXAE_SMS__BODY3 ON SMS COMPUTED 
BY(SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 3));
CREATE ASCENDING INDEX IXAE_SMS__BODY4 ON SMS COMPUTED 
BY(SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 4));
CREATE ASCENDING INDEX IXAE_SMS__BODY5 ON SMS COMPUTED 
BY(SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 5));
CREATE ASCENDING INDEX IXAE_SMS__BODY6 ON SMS COMPUTED 
BY(SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 6));
CREATE ASCENDING INDEX IXAE_SMS__BODY7 ON SMS COMPUTED 
BY(SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 7));
CREATE ASCENDING INDEX IXAE_SMS__BODY8 ON SMS COMPUTED 
BY(SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 8));
CREATE ASCENDING INDEX IXAE_SMS__BODY9 ON SMS COMPUTED 
BY(SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 9));
... more

and when we search then we do

SELECT 
...
from
...
WHERE
 (
 SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 1) LIKE 'SOMETHING%' OR 
 SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 2) LIKE 'SOMETHING%' OR 
 SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 3) LIKE 'SOMETHING%' OR 
 SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 4) LIKE 'SOMETHING%' OR 
 SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 5) LIKE 'SOMETHING%' OR 
 SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 6) LIKE 'SOMETHING%' OR 
 SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 7) LIKE 'SOMETHING%' OR 
 SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 8) LIKE 'SOMETHING%' OR 
 SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 9) LIKE 'SOMETHING%' OR 
 SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 10) LIKE 'SOMETHING%' OR 
 SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 11) LIKE 'SOMETHING%' OR 
 ..
 ) 
 
and this query answer in few milliseconds with 100% of records :)
Of course it is storage expensive and any change to table is also expensive 
because of many indexes
But benefit is much more bigger for this customer and he was very surprised 
when he saw answer in few ms - and got 100% quality answer. 

Will be good to see build in Firebird creation of something what i call 
SuperIndex
This can be created poorly internally with many virtual indexes like above - 
and visible outside as single index
but mayby someone have concept how to store index info optimal without 
redundant information (i suppose that this can be patented then :)

and then we can 
CREATE SuperIndex IXAS_SMS__BODY ON SMS(BODY);

and then we can do simple search:

SELECT 
...
from
...
WHERE
SMS_BODY LIKE '%SOMETHING%'

and then engine do for us work like above

I do not know whether sufficient human resources are in FB team
and of course if the same concept will be of interest



 
 

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://tracker.firebirdsql.org/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

------------------------------------------------------------------------------
Dive into the World of Parallel Programming. The Go Parallel Website,
sponsored by Intel and developed in partnership with Slashdot Media, is your
hub for all things parallel software development, from weekly thought
leadership blogs to news, videos, case studies, tutorials and more. Take a
look and join the conversation now. http://goparallel.sourceforge.net/
Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel

Reply via email to