Re: encodings.idna.ToASCII( unicodeStr ) != unicodeStr.encode( 'idna' )

Gabriel Genellina Wed, 23 Jun 2010 00:56:12 -0700

On Tue, 22 Jun 2010 11:02:58 -0300, <[email protected]> wrote:

Python 2.6.4 (Win32): Does anyone have an explanation for the
following?

encodings.idna.ToASCII( unicodeStr ) != unicodeStr.encode( 'idna' )

Given that other processes may have to use the output of these
methods, what is the recommended technique?

Demonstration:

>>> import encodings.idna
>>> name = u'junk\xfc\xfd.txt'
>>> name
u'junk\xfc\xfd.txt'
>>> encodings.idna.ToASCII( name )
'xn--junk.txt-95ak'
>>> name.encode( 'idna' )
'xn--junk-3rag.txt'
>>> encodings.idna.ToUnicode( encodings.idna.ToASCII( name ) )
u'junk\xfc\xfd.txt'
>>> name.encode( 'idna' ).decode( 'idna' )
u'junk\xfc\xfd.txt'

IDNA is *specifically* designed to operate on domain names, not arbitrary text (IDNA = Internationalizing Domain Names in Applications). Even the encoding/decoding part alone (punycode) is specifically tailored for use in domain names. Do not use it for any other purpose.

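That said, it seems that encodings.idna.ToUnicode/ToASCII work on individual 'labels' only (a domain name is composed of several labels separated by '.'), whereas encode/decode('idna') takes the whole name, splits it, and processes each label (following RFC 3490, I presume). That would explain your output: ToASCII treated all of u'junk\xfc\xfd.txt' as a single label, while the codec encoded only the u'junk\xfc\xfd' label and left 'txt' untouched.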
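
To illustrate the per-label behaviour described above, here is a minimal sketch. The helper name encode_domain_per_label is made up for this example and is not part of the standard library. It splits only on the ASCII '.', which is a simplification: the real codec also recognizes a few other Unicode dot characters and handles a trailing dot. The u'' and b'' literals are used so the snippet should run on both Python 2.6+ and Python 3.3+.

```python
import encodings.idna


def encode_domain_per_label(domain):
    """Hypothetical helper: apply ToASCII to each dot-separated label.

    Simplification: splits on ASCII '.' only. An empty label (for
    example from a trailing dot) makes ToASCII raise UnicodeError.
    """
    labels = domain.split(u'.')
    return b'.'.join(encodings.idna.ToASCII(label) for label in labels)


name = u'junk\xfc\xfd.txt'

per_label = encode_domain_per_label(name)   # label-by-label, like the codec
via_codec = name.encode('idna')             # whole-name codec
whole_name = encodings.idna.ToASCII(name)   # whole name passed as ONE label

print(per_label == via_codec)
print(whole_name == via_codec)
```

If the presumption above is right, the first comparison should hold and the second should not, matching the session in the original question. For anything other processes have to consume as a domain name, the whole-name codec (encode('idna') / decode('idna')) looks like the safer choice.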


--
Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list
