On Tue, 22 Jun 2010 11:02:58 -0300, <pyt...@bdurham.com> wrote:

Python 2.6.4 (Win32): Does anyone have an explanation for the
following?

encodings.idna.ToASCII( unicodeStr ) != unicodeStr.encode( 'idna' )

Given that other processes may have to use the output of these
methods, what is the recommended technique?

Demonstration:

>>> import encodings.idna
>>> name = u'junk\xfc\xfd.txt'
>>> name
u'junk\xfc\xfd.txt'
>>> encodings.idna.ToASCII( name )
'xn--junk.txt-95ak'
>>> name.encode( 'idna' )
'xn--junk-3rag.txt'
>>> encodings.idna.ToUnicode( encodings.idna.ToASCII( name ) )
u'junk\xfc\xfd.txt'
>>> name.encode( 'idna' ).decode( 'idna' )
u'junk\xfc\xfd.txt'

IDNA is *specifically* designed to operate on domain names, not arbitrary text (IDNA = Internationalizing Domain Names in Applications). Even the encoding/decoding part alone (Punycode) is specifically tailored for use in domain names. Do not use it for any other purpose.

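That said, it seems that encodings.idna.ToUnicode/ToASCII work on individual 'labels' only (a domain name is composed of several labels separated by '.'), whereas encode/decode('idna') takes the whole name, splits it into labels, and processes each label separately (following RFC 3490, I presume). That would explain the difference: ToASCII( name ) treats all of u'junk\xfc\xfd.txt', dot included, as a single label.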
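
To illustrate the point about labels, here is a minimal sketch that applies ToASCII to each label separately and compares the result with the codec. The helper name encode_labelwise is my own invention, not part of the standard library, and it only splits on the plain '.' character; the real codec does more (for instance, it accepts other dot-like separators and a trailing dot). The snippet is written so that it also runs on Python 3, where the results are bytes rather than str.

```python
import encodings.idna


def encode_labelwise(domain):
    # Hypothetical helper (not a stdlib function): split on '.' and
    # run ToASCII on each label, roughly what the 'idna' codec does.
    # Assumption: no empty labels (e.g. no trailing dot), since
    # ToASCII rejects an empty label.
    labels = domain.split(u'.')
    return b'.'.join(encodings.idna.ToASCII(label) for label in labels)


name = u'junk\xfc\xfd.txt'

# Whole name passed as if it were one label: the dot gets punycoded too.
whole_as_one_label = encodings.idna.ToASCII(name)

# Label by label: only 'junk\xfc\xfd' needs the 'xn--' form.
per_label = encode_labelwise(name)

# The codec, which does the splitting itself.
via_codec = name.encode('idna')
```

Here per_label and via_codec should agree with each other, while whole_as_one_label is the odd one out, matching the two different strings shown in the demonstration. If other processes have to consume the output, name.encode( 'idna' ) is probably the safer choice for a complete domain name.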

--
Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list
