Re: encodings.idna.ToASCII( unicodeStr ) != unicodeStr.encode( 'idna' )

2010-06-23 Thread Gabriel Genellina

On Tue, 22 Jun 2010 11:02:58 -0300,  wrote:


Python 2.6.4 (Win32): Anyone have any explanation for the
following?

encodings.idna.ToASCII( unicodeStr ) != unicodeStr.encode( 'idna' )

Given that other processes may have to use the output of these
methods, what is the recommended technique?

Demonstration:

>>> import encodings.idna
>>> name = u'junk\xfc\xfd.txt'
>>> name
u'junk\xfc\xfd.txt'
>>> encodings.idna.ToASCII( name )
'xn--junk.txt-95ak'
>>> name.encode( 'idna' )
'xn--junk-3rag.txt'
>>> encodings.idna.ToUnicode( encodings.idna.ToASCII( name ) )
u'junk\xfc\xfd.txt'
>>> name.encode( 'idna' ).decode( 'idna' )
u'junk\xfc\xfd.txt'


IDNA is *specifically* designed to operate on domain names, not
arbitrary text (IDNA = Internationalizing Domain Names in Applications).
Even the encoding/decoding part alone (punycode) is specifically tailored
for use in domain names. Do not use it for any other purpose.


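That said, it seems that encodings.idna.ToUnicode/ToASCII work on
individual 'labels' only (a domain name is composed of several labels
separated by '.'), whereas encode/decode('idna') takes the whole name,
splits it on the dots, and processes each label separately (following
RFC 3490, I presume). That would explain your output: ToASCII( name )
treats 'junk\xfc\xfd.txt' as one label and gives 'xn--junk.txt-95ak',
while name.encode( 'idna' ) encodes 'junk\xfc\xfd' and 'txt' separately
and gives 'xn--junk-3rag.txt'.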
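
A minimal sketch of that per-label behaviour, to check the explanation
rather than to replace the codec. The helper name encode_per_label is
made up for illustration, and it splits only on the ASCII '.', which I
assume is enough for this example (the real codec does more work than
that). It is written for Python 3, where ToASCII returns bytes; the u''
literals are kept only to match the session above.

```python
import encodings.idna

def encode_per_label(name):
    # Hypothetical helper: split the name into labels on '.', run
    # ToASCII on each label, and join the encoded labels again.
    labels = name.split(u'.')
    return b'.'.join(encodings.idna.ToASCII(label) for label in labels)

name = u'junk\xfc\xfd.txt'

per_label = encode_per_label(name)   # ToASCII applied label by label
whole = name.encode('idna')          # codec applied to the whole name

# Compare the hand-rolled per-label result with the codec's result.
print(per_label == whole)
```

If the two results compare equal, the difference in the original
demonstration comes purely from passing a multi-label name to a
function that expects a single label.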


--
Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list


encodings.idna.ToASCII( unicodeStr ) != unicodeStr.encode( 'idna' )

2010-06-22 Thread python
Python 2.6.4 (Win32): Anyone have any explanation for the
following?

encodings.idna.ToASCII( unicodeStr ) != unicodeStr.encode( 'idna' )

Given that other processes may have to use the output of these
methods, what is the recommended technique?

Demonstration:

>>> import encodings.idna
>>> name = u'junk\xfc\xfd.txt'
>>> name
u'junk\xfc\xfd.txt'
>>> encodings.idna.ToASCII( name )
'xn--junk.txt-95ak'
>>> name.encode( 'idna' )
'xn--junk-3rag.txt'
>>> encodings.idna.ToUnicode( encodings.idna.ToASCII( name ) )
u'junk\xfc\xfd.txt'
>>> name.encode( 'idna' ).decode( 'idna' )
u'junk\xfc\xfd.txt'

The good news is that the encodings.idna functions and the string
'idna' codec each round-trip correctly when paired with their own
inverse (ToASCII with ToUnicode, encode with decode).

The bad news is that the encodings.idna functions and the equivalent
string codec calls give different results, so they can't be intermixed.

Malcolm