Re: Non-Roman characters in TLDs and domain names

Sidney Markowitz Tue, 03 Nov 2009 18:51:37 -0800

Warren Togami wrote, On 4/11/09 3:27 PM:

It seems clear that we will need to flatten/encode any URI domain to punycode for URIBL lookups.

I agree with that -- if something has non-ASCII characters then punycode is the canonical form to use to look it up.
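As a rough illustration of what "flatten to punycode" could look like, here is a minimal sketch in Python. It assumes the IDNA 2003 rules implemented by Python's built-in "idna" codec are acceptable for lookups; the function name to_lookup_form is hypothetical and is not part of any existing URIBL code.

```python
def to_lookup_form(domain: str) -> str:
    """Return an ASCII (punycode) form of a domain for a blocklist lookup.

    Hypothetical helper for illustration only. It relies on Python's
    built-in "idna" codec, which implements IDNA 2003 (RFC 3490).
    """
    # Normalise: drop surrounding whitespace and a trailing root dot,
    # and lowercase so that ASCII names compare consistently.
    domain = domain.strip().rstrip(".").lower()

    # Pure-ASCII names need no encoding.
    if domain.isascii():
        return domain

    # The codec encodes label by label, adding the "xn--" prefix
    # to each label that contains non-ASCII characters.
    return domain.encode("idna").decode("ascii")


# Example usage:
#   to_lookup_form("bücher.example")  -> "xn--bcher-kva.example"
#   to_lookup_form("Example.COM")     -> "example.com"
```

The codec raises UnicodeError for labels it cannot encode (for example, an empty or overlong label), so a caller would probably want to catch that and skip the lookup.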

The unclear part is if we will need to decode URI's prior to punycode encoding. I suspect we will be forced to decode.

I'm not sure exactly what you mean, but the big issue that I see is how to determine that a string is a URL (where it starts and where it stops) that needs to be encoded to punycode. Is that what you are talking about? The rule of thumb that I used when working on code to extract URLs from plain text is that if some common MUA hot-links it, then we want to treat it as a URL. Perhaps the answer is to wait until MUAs support these URLs and then follow that rule of thumb.


 -- sidney
