On 03.09.22 06:30, Nathan Bossart wrote:
> On Fri, Sep 02, 2022 at 10:06:54PM -0400, Tom Lane wrote:
>> I think the distance limit of 5 is too loose though.  I see that
>> it accommodates examples like "passfile" for "password", which
>> seems great at first glance; but it also allows fundamentally
>> silly suggestions like "user" for "server" or "host" for "foo".
>> We'd need something smarter than Levenshtein if we want to offer
>> "passfile" for "password" without looking stupid on a whole lot
>> of other cases --- those words seem close, but they are close
>> semantically not textually.
>
> Yeah, it's really only useful for simple misspellings, but IMO even that is
> rather handy.
>
> I noticed that the parse_relation.c stuff excludes matches where more than
> half the characters are different, so I added that here and lowered the
> distance limit to 4.  This seems to prevent the silly suggestions (e.g.,
> "host" for "foo") while retaining the more believable ones (e.g.,
> "passfile" for "password"), at least for the small set of examples covered
> in the tests.

This code is compact enough and the hints it produces are reasonable, so I think we could go with it.
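
For anyone following along, here is a rough standalone sketch of the heuristic as described above: plain Levenshtein distance, a cap of 4, and rejection when more than half the characters differ. This is not the patch's code. The names levenshtein(), closest_option(), MAX_DISTANCE and MAX_LEN are made up for illustration, and measuring "half" against the length of the user's input is my assumption.

```c
#include <stddef.h>
#include <string.h>

#define MAX_DISTANCE 4   /* cap discussed in this thread */
#define MAX_LEN      64  /* hypothetical bound, only to keep the sketch simple */

/* Plain Levenshtein distance; returns -1 if either string exceeds MAX_LEN. */
static int
levenshtein(const char *a, const char *b)
{
    size_t la = strlen(a);
    size_t lb = strlen(b);
    int    prev[MAX_LEN + 1];
    int    cur[MAX_LEN + 1];

    if (la > MAX_LEN || lb > MAX_LEN)
        return -1;

    /* Distance from the empty prefix of a to each prefix of b. */
    for (size_t j = 0; j <= lb; j++)
        prev[j] = (int) j;

    for (size_t i = 1; i <= la; i++)
    {
        cur[0] = (int) i;
        for (size_t j = 1; j <= lb; j++)
        {
            int sub = prev[j - 1] + (a[i - 1] != b[j - 1]);
            int del = prev[j] + 1;
            int ins = cur[j - 1] + 1;
            int best = sub < del ? sub : del;

            cur[j] = best < ins ? best : ins;
        }
        memcpy(prev, cur, (lb + 1) * sizeof(int));
    }
    return prev[lb];
}

/*
 * Return the closest valid option for "input", or NULL if nothing is close
 * enough.  A candidate is rejected when its distance exceeds MAX_DISTANCE or
 * when more than half of the input's characters would have to change
 * (assumption: "half" is taken relative to the input's length).
 */
static const char *
closest_option(const char *input, const char *const *options, size_t noptions)
{
    const char *best = NULL;
    int         best_d = -1;
    int         half = (int) (strlen(input) / 2);

    for (size_t i = 0; i < noptions; i++)
    {
        int d = levenshtein(input, options[i]);

        if (d < 0 || d > MAX_DISTANCE || d > half)
            continue;
        if (best == NULL || d < best_d)
        {
            best = options[i];
            best_d = d;
        }
    }
    return best;
}
```

With the option list {"host", "user", "password", "dbname"}, this sketch offers "password" for both "pasword" and "passfile", and offers nothing for "foo" or "server", which matches the behavior described above.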

I notice that for column misspellings, the hint is phrased "Perhaps you meant X." whereas here we have "Did you mean X?". Let's make that uniform.


