The behavior of this function is surprising to me.
select substring_similarity('dog' , 'hotdogpound') ;
substring_similarity
----------------------
0.25
Substring search was designed to find a similar word within a string:
contrib_regression=# select substring_similarity('dog' , 'hot dogpound') ;
substring_similarity
----------------------
0.75
contrib_regression=# select substring_similarity('dog' , 'hot dog pound') ;
substring_similarity
----------------------
1
Hmm, this behavior looks too much like magic to me. I mean, a substring
is a substring -- why are we treating the space as a special character
here?
Because it isn't a regex or a plain substring search. Ever since it was first
implemented, pg_trgm has worked on the words in a string:
contrib_regression=# select similarity('block hole', 'hole black');
similarity
------------
0.571429
contrib_regression=# select similarity('block hole', 'black hole');
similarity
------------
0.571429
It ignores the spaces between words and the order of the words.
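To make the arithmetic behind 0.571429 concrete, here is a rough Python sketch of trigram-set similarity. It is not the pg_trgm C code: the names trigrams() and similarity() are only illustrative, and it assumes that words are runs of ASCII letters and digits, that each word is lowercased and padded with two leading spaces and one trailing space, and that the score is shared trigrams divided by all distinct trigrams.

```python
import re

def trigrams(text):
    """Set of trigrams over all words; each word is padded as "  word "."""
    grams = set()
    # Assumption: a word is a run of ASCII letters/digits (a simplification).
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams

def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

# Word order does not matter because the trigrams are collected into a set.
print(round(similarity('block hole', 'hole black'), 6))  # 0.571429
print(round(similarity('block hole', 'black hole'), 6))  # 0.571429
```

Both calls share 8 trigrams out of 14 distinct ones (the three that "block" and "black" have in common plus the five from "hole"), which gives 8/14.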
I agree that substring_similarity is a confusing name, but what it actually
does is search the second argument for the word most similar to the first
argument and return their similarity.
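For illustration only, here is a brute-force Python sketch that reproduces the three results quoted above (0.25, 0.75 and 1). It is not the patch's implementation. It assumes the function scores the first argument's trigram set against every continuous run of trigrams in the second argument and keeps the best score, which is how the pg_trgm documentation describes word_similarity. The names ordered_trigrams() and best_extent_similarity() are made up, and words are assumed to be runs of ASCII letters and digits.

```python
import re

def ordered_trigrams(text):
    """Trigrams in order of appearance; each word is padded as "  word "."""
    grams = []
    # Assumption: a word is a run of ASCII letters/digits (a simplification).
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        grams.extend(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams

def best_extent_similarity(needle, haystack):
    """Best trigram-set similarity between needle and any continuous
    run of the haystack's trigram sequence (quadratic brute force)."""
    target = set(ordered_trigrams(needle))
    seq = ordered_trigrams(haystack)
    best = 0.0
    for start in range(len(seq)):
        extent = set()
        for end in range(start, len(seq)):
            extent.add(seq[end])
            best = max(best, len(target & extent) / len(target | extent))
    return best

print(best_extent_similarity('dog', 'hotdogpound'))    # 0.25
print(best_extent_similarity('dog', 'hot dogpound'))   # 0.75
print(best_extent_similarity('dog', 'hot dog pound'))  # 1.0
```

The space matters because of the padding: "dog" yields the trigrams "  d", " do", "dog" and "og ". Inside "hotdogpound" only "dog" can match (1 of 4). In "hot dogpound" the word boundary in front of "dog" adds "  d" and " do" (3 of 4), and in "hot dog pound" all four match.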
--
Teodor Sigaev E-mail: teo...@sigaev.ru
WWW: http://www.sigaev.ru/
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)