I'm trying to open
http://пример.испытание
with
urllib2.urlopen(s1)
in Python 2.7 on Windows 7. This produces a Unicode exception:
>>> s1
u'http://\u043f\u0440\u0438\u043c\u0435\u0440.\u0438\u0441\u043f\u044b\u0442\u0430\u043d\u0438\u0435'
>>> fd = urllib2.urlopen(s1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\python27\lib\urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\python27\lib\urllib2.py", line 394, in open
    response = self._open(req, data)
  File "C:\python27\lib\urllib2.py", line 412, in _open
    '_open', req)
  File "C:\python27\lib\urllib2.py", line 372, in _call_chain
    result = func(*args)
  File "C:\python27\lib\urllib2.py", line 1199, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "C:\python27\lib\urllib2.py", line 1168, in do_open
    h.request(req.get_method(), req.get_selector(), req.data, headers)
  File "C:\python27\lib\httplib.py", line 955, in request
    self._send_request(method, url, body, headers)
  File "C:\python27\lib\httplib.py", line 988, in _send_request
    self.putheader(hdr, value)
  File "C:\python27\lib\httplib.py", line 935, in putheader
    hdr = '%s: %s' % (header, '\r\n\t'.join([str(v) for v in values]))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)
>>>
The HTTP library is trying to encode a header value built from the URL
as ASCII. Why isn't "urllib2" handling that?
What does "urllib2" want: percent escapes? Punycode?
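
For reference, here is a minimal sketch of the Punycode option, assuming
what "urllib2" wants is an all-ASCII URL whose host name has been run
through Python's built-in "idna" codec. The helper name to_ascii_url is
hypothetical, not part of any library, and it only touches the host name.
A non-ASCII path or query would presumably need percent escapes instead,
which this sketch does not attempt.

```python
# -*- coding: utf-8 -*-
try:
    from urlparse import urlsplit, urlunsplit      # Python 2
except ImportError:
    from urllib.parse import urlsplit, urlunsplit  # Python 3


def to_ascii_url(url):
    """Return url with its host name IDNA-encoded (hypothetical helper).

    Assumption: only the host contains non-ASCII characters. Any
    user:password@ part of the URL is dropped for brevity.
    """
    parts = urlsplit(url)
    # The built-in 'idna' codec converts each label to its xn-- form.
    host = parts.hostname.encode('idna').decode('ascii')
    if parts.port is not None:
        host = '%s:%d' % (host, parts.port)
    return urlunsplit((parts.scheme, host, parts.path,
                       parts.query, parts.fragment))


s1 = (u'http://\u043f\u0440\u0438\u043c\u0435\u0440.'
      u'\u0438\u0441\u043f\u044b\u0442\u0430\u043d\u0438\u0435')
ascii_url = to_ascii_url(s1)
print(ascii_url)
```

The result contains only ASCII characters, so building the request
headers should no longer hit the 'ascii' codec error. Whether the server
then answers is a separate matter.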
John Nagle