"SuperIndex" for string serching --------------------------------
Key: CORE-4687 URL: http://tracker.firebirdsql.org/browse/CORE-4687 Project: Firebird Core Issue Type: New Feature Components: Engine Reporter: Karol Bieniaszewski This is big task but can optimize search operation on big tables and answer quality will be 100% of searched records for query "like '%something%'" case studies: last year some customer with very big Firebird database near 1 bilion of records in biggest table ask for optimising search on field VarChar(127) They used full text search engine and it work ok as fulltext but as you know how it work then answer quality is not 100%. Benefit was that it answer in few seconds instead of many seconds if we do search like UPPER(SMS_BODY COLLATE PXW_PLK) FROM 1) LIKE 'SOMETHING%' full table scan FullText indexing words and you can ask like "starting with" but not "containing" It find record with text like "case studies" if you search for "studi" but not find if you search for "ies" because in this case there was only one field and short VarChar(127) then i decided to rethink all and i got concept create expression indexes like CREATE ASCENDING INDEX IXAE_SMS__BODY1 ON SMS COMPUTED BY(SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 1)); CREATE ASCENDING INDEX IXAE_SMS__BODY2 ON SMS COMPUTED BY(SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 2)); CREATE ASCENDING INDEX IXAE_SMS__BODY3 ON SMS COMPUTED BY(SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 3)); CREATE ASCENDING INDEX IXAE_SMS__BODY4 ON SMS COMPUTED BY(SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 4)); CREATE ASCENDING INDEX IXAE_SMS__BODY5 ON SMS COMPUTED BY(SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 5)); CREATE ASCENDING INDEX IXAE_SMS__BODY6 ON SMS COMPUTED BY(SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 6)); CREATE ASCENDING INDEX IXAE_SMS__BODY7 ON SMS COMPUTED BY(SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 7)); CREATE ASCENDING INDEX IXAE_SMS__BODY8 ON SMS COMPUTED BY(SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 8)); CREATE ASCENDING INDEX IXAE_SMS__BODY9 ON SMS COMPUTED BY(SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 9)); ... more and when we search then we do SELECT ... from ... WHERE ( SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 1) LIKE 'SOMETHING%' OR SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 2) LIKE 'SOMETHING%' OR SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 3) LIKE 'SOMETHING%' OR SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 4) LIKE 'SOMETHING%' OR SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 5) LIKE 'SOMETHING%' OR SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 6) LIKE 'SOMETHING%' OR SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 7) LIKE 'SOMETHING%' OR SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 8) LIKE 'SOMETHING%' OR SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 9) LIKE 'SOMETHING%' OR SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 10) LIKE 'SOMETHING%' OR SUBSTRING(UPPER(SMS_BODY COLLATE PXW_PLK) FROM 11) LIKE 'SOMETHING%' OR .. ) and this query answer in few milliseconds with 100% of records :) Of course it is storage expensive and any change to table is also expensive because of many indexes But benefit is much more bigger for this customer and he was very surprised when he saw answer in few ms - and got 100% quality answer. Will be good to see build in Firebird creation of something what i call SuperIndex This can be created poorly internally with many virtual indexes like above - and visible outside as single index but mayby someone have concept how to store index info optimal without redundant information (i suppose that this can be patented then :) and then we can CREATE SuperIndex IXAS_SMS__BODY ON SMS(BODY); and then we can do simple search: SELECT ... from ... WHERE SMS_BODY LIKE '%SOMETHING%' and then engine do for us work like above I do not know whether sufficient human resources are in FB team and of course if the same concept will be of interest -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://tracker.firebirdsql.org/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira ------------------------------------------------------------------------------ Dive into the World of Parallel Programming. The Go Parallel Website, sponsored by Intel and developed in partnership with Slashdot Media, is your hub for all things parallel software development, from weekly thought leadership blogs to news, videos, case studies, tutorials and more. Take a look and join the conversation now. http://goparallel.sourceforge.net/ Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel