* on the Mon, Dec 29, 2014 at 05:01:44PM -0700, David G Johnston wrote: >> CREATE UNIQUE INDEX hostnames_hostname_key ON hostnames >> (lower(punycode_encode(hostname))); >> >> That would prevent adding more than one representation for the same >> hostname >> to the column. > > Except two different hostname can resolve to the same > punycode_encode(hostname) value
That's exactly what I'm taking advantage of... > so the unique index won't work. I think you misunderstand what I'm suggesing. I'm suggesting storing the unicode version in the database, and then creating a unique index on the result of the punycode function to prevent more than one representation of the same hostname being inserted. > It was also mentioned that using the Perl encoding function was > non-performant; which is why caching the data into a memoization table has > value. It was non-performant because it involved unnecessarily starting a Perl interpreter. Not because the algorithm its self is slow. > WHERE lower(punycode_encode(hostname)) = > > lower(punycode_encode('any-representation')) > > > > I'm not for knowing the rules of punycode but I'm not seeing what value > lower() provides here... Case insensitive matching. So that "EXAMPLE.COM" = "example.com" -- Mike Cardwell https://grepular.com https://emailprivacytester.com OpenPGP Key 35BC AF1D 3AA2 1F84 3DC3 B0CF 70A5 F512 0018 461F XMPP OTR Key 8924 B06A 7917 AAF3 DBB1 BF1B 295C 3C78 3EF1 46B4
signature.asc
Description: Digital signature