On Tue, 22 Jun 2010 11:02:58 -0300, wrote:
Python 2.6.4 (Win32): Does anyone have an explanation for the
following?
encodings.idna.ToASCII( unicodeStr ) != unicodeStr.encode( 'idna' )
Given that other processes may have to use the output of these
methods, what is the recommended technique?
Demonstration:
>>> import encodings.idna
>>> name = u'junk\xfc\xfd.txt'
>>> name
u'junk\xfc\xfd.txt'
>>> encodings.idna.ToASCII( name )
'xn--junk.txt-95ak'
>>> name.encode( 'idna' )
'xn--junk-3rag.txt'
>>> encodings.idna.ToUnicode( encodings.idna.ToASCII( name ) )
u'junk\xfc\xfd.txt'
>>> name.encode( 'idna' ).decode( 'idna' )
u'junk\xfc\xfd.txt'
IDNA is *specifically* designed to operate on domain names, not
arbitrary text (IDNA = Internationalizing Domain Names in Applications).
Even the encoding/decoding part alone (Punycode) is specifically tailored
for use in domain names. Do not use it for any other purpose.
That said, it seems that encodings.idna.ToUnicode/ToASCII work on
individual 'labels' only (a domain name is composed of several labels
separated by '.'), whereas encode/decode('idna') takes the whole name,
splits it, and processes each label (following RFC 3490, I presume).
That would explain your results: ToASCII treats 'junk\xfc\xfd.txt' as a
single label, while the codec encodes 'junk\xfc\xfd' and 'txt' separately.
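
If that reading is right, applying ToASCII to each label and joining the
results with '.' should match what the 'idna' codec produces for the whole
name. A minimal sketch to check it; encode_per_label is a made-up helper
name, not part of the standard library, and it only splits on the ASCII
full stop, so it ignores details the codec handles (such as a trailing dot):

```python
import encodings.idna


def encode_per_label(name):
    """Apply ToASCII to each dot-separated label and rejoin the results.

    Hypothetical helper for illustration only; it does not handle empty
    labels (e.g. a trailing dot) or non-ASCII dot characters.
    """
    labels = name.split(u'.')
    return b'.'.join(encodings.idna.ToASCII(label) for label in labels)


name = u'junk\xfc\xfd.txt'
per_label = encode_per_label(name)
whole_name = name.encode('idna')

# If the explanation above holds, both routes give the same result.
print(per_label == whole_name)
```

For real domain names, name.encode('idna') is the simpler route, since it
does the splitting for you.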
--
Gabriel Genellina