SIMILAR TO is slower than LIKE for hundred times on trivial pattern matching with varchar datatype. ---------------------------------------------------------------------------------------------------
Key: CORE-5664 URL: http://tracker.firebirdsql.org/browse/CORE-5664 Project: Firebird Core Issue Type: Bug Components: Engine Affects Versions: 4.0 Alpha 1, 3.0.2, 2.5.7 Reporter: Pavel Zotov 1) SIMILAR TO does not use opportunity to stop search immediately when pattern has '%' as last character. 2) SIMILAR TO slower than LIKE with ratio about several hundred times (on the same text data; not blobs but long varchar datatype is used below). Script-1: test of performance when pattern string FINISHES with percent sign ( s like '%QWERTY' == vs== s similar to '%QWERTY' ) ======== set list on; set bail on; set term ^; execute block returns(n_count int, elap_ms_using_like int, elap_ms_using_similar_to int) as declare i int = 0; declare t0 timestamp; declare t1 timestamp; declare s varchar(32761); begin s = lpad('', 32755, uuid_to_char(gen_uuid())) || 'QWERTY'; n_count = 100; t0 = cast('now' as timestamp); while (i < n_count) do begin i = i + iif( s like '%QWERTY', 1, 1); end t1 = cast('now' as timestamp); elap_ms_using_like = datediff(millisecond from t0 to t1); i = 0; while (i < n_count) do begin i = i + iif( s similar to '%QWERTY', 1, 1); end elap_ms_using_similar_to = datediff(millisecond from t1 to cast('now' as timestamp)); suspend; end^ set term ;^ Output: ====== N_COUNT 100 ELAP_MS_USING_LIKE 15 ELAP_MS_USING_SIMILAR_TO 10875 Script-2: test performance when pattern STARTS with alpha-num and we have case equal to 'starting with' ( s like 'QWERTY%' == vs == s similar to 'QWERTY%' ) ======= set list on; set bail on; set term ^; execute block returns(n_count int, elap_ms_using_like int, elap_ms_using_similar_to int) as declare i int = 0; declare t0 timestamp; declare t1 timestamp; declare s varchar(32761); begin s = 'QWERTY' || lpad('', 32755, uuid_to_char(gen_uuid())) ; n_count = 100; t0 = cast('now' as timestamp); while (i < n_count) do begin i = i + iif( s like 'QWERTY%', 1, 1); end t1 = cast('now' as timestamp); elap_ms_using_like = datediff(millisecond from t0 to t1); i = 0; while (i < n_count) do begin i = i + iif( s similar to 'QWERTY%', 1, 1); end elap_ms_using_similar_to = datediff(millisecond from t1 to cast('now' as timestamp)); suspend; end^ set term ;^ Output: ====== N_COUNT 100 ELAP_MS_USING_LIKE 31 ELAP_MS_USING_SIMILAR_TO 10969 So, I have following questions: 1) is it possible to improve speed of SIMILAR TO per se ? (yes, i do remember about CORE-3858 but it was created more than 5 yeasago and it seems to me that there is no any progress) 2) why both operators (LIKE and SIMILAR TO) can't guess to scan string in reverse order when we ask to find matching for '%QWERTY' and pattern length << than length of source (long) string ? Look at results when number of iterations was increased from 100 to 10000, only for LIKE: 1) s LIKE 'QWERTY%': N_COUNT 10000 ELAP_MS_USING_LIKE 141 2) s LIKE '%QWERTY': N_COUNT 10000 ELAP_MS_USING_LIKE 3312 -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://tracker.firebirdsql.org/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira ------------------------------------------------------------------------------ Check out the vibrant tech community on one of the world's most engaging tech sites, Slashdot.org! http://sdm.link/slashdot Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel