Steve Holden wrote:
> John Nagle wrote:
>
>> Here's a strange little bug.  "socket.getaddrinfo" blows up
>> if given a bad domain name containing ".." in Unicode.  The
>> same string in ASCII produces the correct "gaierror" exception.
>>
>> Actually, this deserves a documentation mention.  The "socket" module,
>> given a Unicode string, calls the International Domain Name parser,
>> "idna.py", which has a whole error system of its own.  The IDNA
>> documentation says that "Furthermore, the socket module transparently
>> converts Unicode host names to ACE, so that applications need not be
>> concerned about converting host names themselves when they pass them
>> to the socket module."
>> However, that's not quite true; the IDNA rules say that syntax errors must
>> be treated as errors, so you have to be prepared for IDNA exceptions.
>> They are all "UnicodeError" exceptions.
>>
>> It's worth a mention in the documentation for "socket".
>>
>> John Nagle
>>
>> D:\>/python25/python.exe
>> Python 2.5 (r25:51908, Sep 19 2006, 09:52:17) [MSC v.1310 32 bit (Intel)] on win32
>> Type "help", "copyright", "credits" or "license" for more information.
>> >>> ss = 'www.gallery84..com'
>> >>> uss = unicode(ss)
>> >>> import socket
>> >>> socket.getaddrinfo(ss,"http")
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> socket.gaierror: (11001, 'getaddrinfo failed')
>> >>> socket.getaddrinfo(uss,"http")
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File "D:\python25\lib\encodings\idna.py", line 164, in encode
>>     result.append(ToASCII(label))
>>   File "D:\python25\lib\encodings\idna.py", line 73, in ToASCII
>>     raise UnicodeError("label empty or too long")
>> UnicodeError: label empty or too long
>> >>>
>>
> I took a look at the documentation but couldn't see where to add what,
> given that the documentation for socket already says:
>
> """All errors raise exceptions. The normal exceptions for invalid
> argument types and out-of-memory conditions can be raised; errors
> related to socket or address semantics raise the error socket.error.
> """.
>
> Do we really need to specifically mention Unicode errors?

It says "errors related to socket or address semantics raise the error
'socket.error'", so, yes.  The error really has nothing to do with Unicode;
it's that a different parser is used when a domain name is in Unicode.
It really shouldn't be a "Unicode error" at all.  When Python goes to
Unicode by default, this is likely to break some existing code.

Python's IDNA support is good, but not entirely invisible.  The socket
module documentation should mention IDNA support.  It's not clear, for
example, when you call "getnameinfo()", whether you get back the name in
Unicode or in Punycode.

				John Nagle
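
To illustrate the point about being prepared for IDNA exceptions, here is a minimal sketch of a lookup wrapper that treats a host-name syntax error and a resolver failure the same way. It assumes Python 3, where every str host name goes through the "idna" codec; the names HostLookupError, to_ace and lookup are hypothetical and not part of the socket module.

```python
import socket


class HostLookupError(Exception):
    """Hypothetical wrapper so callers have a single exception to catch."""


def to_ace(host):
    """Return the ASCII-compatible (Punycode) form of a host name.

    The "idna" codec rejects empty or over-long labels with UnicodeError
    (or a subclass); that is re-raised here as HostLookupError.
    """
    try:
        return host.encode("idna").decode("ascii")
    except UnicodeError as exc:
        raise HostLookupError("bad host name %r: %s" % (host, exc)) from exc


def lookup(host, service="http"):
    """Resolve host, reporting syntax and resolver failures the same way."""
    ace = to_ace(host)  # may raise HostLookupError before any DNS query
    try:
        return socket.getaddrinfo(ace, service)
    except socket.gaierror as exc:
        raise HostLookupError("lookup failed for %r: %s" % (host, exc)) from exc
```

With this, lookup('www.gallery84..com') fails with HostLookupError because of the empty label, without the caller having to know that the underlying exception was a UnicodeError rather than a gaierror. Doing the conversion explicitly in to_ace also makes it obvious which form of the name is handed to the resolver.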