SIMILAR TO is slower than LIKE for hundred times on trivial pattern matching
with varchar datatype.
---------------------------------------------------------------------------------------------------
Key: CORE-5664
URL: http://tracker.firebirdsql.org/browse/CORE-5664
Project: Firebird Core
Issue Type: Bug
Components: Engine
Affects Versions: 4.0 Alpha 1, 3.0.2, 2.5.7
Reporter: Pavel Zotov
1) SIMILAR TO does not use opportunity to stop search immediately when pattern
has '%' as last character.
2) SIMILAR TO slower than LIKE with ratio about several hundred times (on the
same text data; not blobs but long varchar datatype is used below).
Script-1: test of performance when pattern string FINISHES with percent sign (
s like '%QWERTY' == vs== s similar to '%QWERTY' )
========
set list on;
set bail on;
set term ^;
execute block returns(n_count int, elap_ms_using_like int,
elap_ms_using_similar_to int) as
declare i int = 0;
declare t0 timestamp;
declare t1 timestamp;
declare s varchar(32761);
begin
s = lpad('', 32755, uuid_to_char(gen_uuid())) || 'QWERTY';
n_count = 100;
t0 = cast('now' as timestamp);
while (i < n_count) do
begin
i = i + iif( s like '%QWERTY', 1, 1);
end
t1 = cast('now' as timestamp);
elap_ms_using_like = datediff(millisecond from t0 to t1);
i = 0;
while (i < n_count) do
begin
i = i + iif( s similar to '%QWERTY', 1, 1);
end
elap_ms_using_similar_to = datediff(millisecond from t1 to cast('now' as
timestamp));
suspend;
end^
set term ;^
Output:
======
N_COUNT 100
ELAP_MS_USING_LIKE 15
ELAP_MS_USING_SIMILAR_TO 10875
Script-2: test performance when pattern STARTS with alpha-num and we have case
equal to 'starting with' ( s like 'QWERTY%' == vs == s similar to 'QWERTY%' )
=======
set list on;
set bail on;
set term ^;
execute block returns(n_count int, elap_ms_using_like int,
elap_ms_using_similar_to int) as
declare i int = 0;
declare t0 timestamp;
declare t1 timestamp;
declare s varchar(32761);
begin
s = 'QWERTY' || lpad('', 32755, uuid_to_char(gen_uuid())) ;
n_count = 100;
t0 = cast('now' as timestamp);
while (i < n_count) do
begin
i = i + iif( s like 'QWERTY%', 1, 1);
end
t1 = cast('now' as timestamp);
elap_ms_using_like = datediff(millisecond from t0 to t1);
i = 0;
while (i < n_count) do
begin
i = i + iif( s similar to 'QWERTY%', 1, 1);
end
elap_ms_using_similar_to = datediff(millisecond from t1 to cast('now' as
timestamp));
suspend;
end^
set term ;^
Output:
======
N_COUNT 100
ELAP_MS_USING_LIKE 31
ELAP_MS_USING_SIMILAR_TO 10969
So, I have following questions:
1) is it possible to improve speed of SIMILAR TO per se ? (yes, i do remember
about CORE-3858 but it was created more than 5 yeasago and it seems to me that
there is no any progress)
2) why both operators (LIKE and SIMILAR TO) can't guess to scan string in
reverse order when we ask to find matching for '%QWERTY' and pattern length <<
than length of source (long) string ?
Look at results when number of iterations was increased from 100 to 10000, only
for LIKE:
1) s LIKE 'QWERTY%':
N_COUNT 10000
ELAP_MS_USING_LIKE 141
2) s LIKE '%QWERTY':
N_COUNT 10000
ELAP_MS_USING_LIKE 3312
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://tracker.firebirdsql.org/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
Firebird-Devel mailing list, web interface at
https://lists.sourceforge.net/lists/listinfo/firebird-devel