On Mon, Dec 29, 2014 at 4:51 PM, Mike Cardwell [via PostgreSQL] < [email protected]> wrote:
> * on the Mon, Dec 29, 2014 at 04:23:22PM -0700, David G Johnston wrote: > > >>> WHERE hostname='nißan.com <http://nissan.com>' > >>> > >> > >> _IF_ Postgres had a punycode function, then you could use: > >> WHERE punycode(hostname) = punycode('nißan.com <http://nissan.com>') > > > > If the OP wraps what he is doing up into a function that is what you end > up > > getting: a memoized punycode function. > > > > http://en.wikipedia.org/wiki/Memoization > > > > It has to be defined as volatile but basically write the function to > check > > for the provided input on the indexed table and if it doesn't exist the > > function will calculate the punycode value and store it onto the table > > before returning the punycode value to the caller. > I'm not sure all that is necessary. It could be quite a simple function, > like the lower() function. So what I would do is this: > > CREATE UNIQUE INDEX hostnames_hostname_key ON hostnames > (lower(punycode_encode(hostname))); > > That would prevent adding more than one representation for the same > hostname > to the column. Except two different hostname can resolve to the same punycode_encode(hostname) value so the unique index won't work. It was also mentioned that using the Perl encoding function was non-performant; which is why caching the data into a memoization table has value. WHERE lower(punycode_encode(hostname)) = > lower(punycode_encode('any-representation')) > I'm not for knowing the rules of punycode but I'm not seeing what value lower() provides here... > > There doesn't need to be any extra table storage for the punycode encoded > version. > > David J. -- View this message in context: http://postgresql.nabble.com/Hostnames-IDNs-Punycode-and-Unicode-Case-Folding-tp5832350p5832368.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
