On 06/17/2014 02:36 PM, Tom Lane wrote:
> Josh Berkus <j...@agliodbs.com> writes:
>> (2) If there are multiple columns with the same Levenshtein distance,
>> which one do you suggest? The current code picks a random one, which
>> I'm OK with. The other option would be to list all of the columns.
>
> I objected to that upthread. I don't think that picking a random one is
> sane at all. Listing them all might be OK (I notice that that seems to be
> what both bash and git do).
>
> Another issue is whether to print only those having exactly the minimum
> observed Levenshtein distance, or to print everything less than some
> cutoff. The former approach seems to me to be placing a great deal of
> faith in something that's only a heuristic.

Well, that depends on what the cutoff is. If it's high, like 0.5, that could be a LOT of columns. For example, I plan to test this feature with a 3-table join that has a combined 300 columns. I can completely imagine coming up with a string which is within 0.5 or even 0.3 of 40 column names.

So if we want to list everything below a cutoff, we'd need to make that cutoff fairly narrow, like 0.2. But that means we'd miss a lot of potential matches on short column names.

I really think we're overthinking this: it is just a HINT, and we can improve it in future PostgreSQL versions, and most of our users will ignore it anyway because they'll be using a client which doesn't display HINTs.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
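
For illustration only, the two strategies discussed above (list every column tied for the minimum distance, or list everything under a cutoff) can be sketched in a few lines of Python. This is not the code from the patch under discussion. The function names are hypothetical, and the normalization (edit distance divided by the length of the longer string) is an assumption made here so that cutoffs such as 0.2 or 0.5 have a concrete meaning; the thread does not say how the fraction is computed.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where insert, delete, and substitute each cost 1."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def normalized(a: str, b: str) -> float:
    """ASSUMPTION: distance scaled by the longer string's length (0.0 to 1.0)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def closest_matches(typo: str, columns: list[str]) -> list[str]:
    """Strategy 1: every column tied for the minimum observed distance."""
    if not columns:
        return []
    scored = [(normalized(typo, c), c) for c in columns]
    best = min(d for d, _ in scored)
    return [c for d, c in scored if d == best]


def matches_under_cutoff(typo: str, columns: list[str], cutoff: float) -> list[str]:
    """Strategy 2: every column whose normalized distance is within the cutoff."""
    return [c for c in columns if normalized(typo, c) <= cutoff]


columns = ["name", "nmae", "id"]
print(closest_matches("col", ["col1", "col2", "other"]))  # both tied columns
print(matches_under_cutoff("nam", columns, 0.2))  # narrow cutoff
print(matches_under_cutoff("nam", columns, 0.5))  # wide cutoff
```

Under this assumed normalization, "nam" is one edit away from the four-character column "name", which is a normalized distance of 0.25. A cutoff of 0.2 therefore rejects it, which is the short-column-name problem described above, while a cutoff of 0.5 also admits "nmae".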