On 17/01/2014 15:27, Steven D'Aprano wrote:
..........

# -*- coding: utf-8 -*-
def func(a):
      """
      >>> print(func(u'aaa\u020b'))
      aaaȋ
      """
      return a

There seems to be some mojibake in your post, which confuses issues.

You refer to \u020b, which is LATIN SMALL LETTER I WITH INVERTED BREVE.
At least, that's what it ought to be. But in your post, it shows up as
the two character mojibake, ╚ followed by ï (BOX DRAWINGS DOUBLE UP AND
RIGHT followed by LATIN SMALL LETTER I WITH DIAERESIS). It appears that
your posting software somehow got confused and inserted the two
characters which you would have got using cp-437 while claiming that they
are UTF-8. (Your post is correctly labelled as UTF-8.)

I'm confident that the problem isn't with my newsreader, Pan, because it
is pretty damn good at getting encodings right, but also because your
post shows the same mojibake in the email archive:

https://mail.python.org/pipermail/python-list/2014-January/664771.html

To clarify: you tried to show \u020B as a literal. As a literal, it ought
to be the single character ȋ which is a lower case I with curved accent on
top. The UTF-8 of that character is b'\xc8\x8b', which in the cp-437 code
page is two characters ╚ ï.
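The round trip described above can be checked in a few lines. This is only a minimal sketch: the variable names are made up for illustration, and it assumes the posting software did nothing more than reinterpret the UTF-8 bytes as cp-437.

```python
# -*- coding: utf-8 -*-
import unicodedata

# U+020B, LATIN SMALL LETTER I WITH INVERTED BREVE
ch = u'\u020b'

# Its UTF-8 encoding is two bytes.
utf8_bytes = ch.encode('utf-8')

# Misreading those two bytes as cp-437 yields two unrelated characters.
mojibake = utf8_bytes.decode('cp437')
names = [unicodedata.name(c) for c in mojibake]

# Under the assumption above, the damage is reversible.
repaired = mojibake.encode('cp437').decode('utf-8')

print(repr(utf8_bytes))
print(names)
print(repaired == ch)
```

The names printed for the two mojibake characters should match the ones quoted above, and the last line reports whether re-encoding as cp-437 and decoding as UTF-8 recovers the original character.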

When I edit the file in vim with utf8 encoding I do see your ȋ as the literal. However, as you note, I'm on Windows, and no amount of cajoling will get it to work reasonably, so my printouts are broken. So on Windows

(py27) C:\code\hg-repos>python -c"print(u'aaa\u020b')"
aaaȋ

on my Linux box

$ python2 -c"print(u'aaa\u020b')"
aaaȋ

$ python2 tdt1.py
/usr/lib/python2.7/doctest.py:1531: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if got == want:
/usr/lib/python2.7/doctest.py:1551: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if got == want:
**********************************************************************
File "tdt1.py", line 4, in __main__.func
Failed example:
    print(func(u'aaa\u020b'))
Expected:
    aaaȋ
Got:
    aaaȋ
**********************************************************************
1 items had failures:
   1 of   1 in __main__.func
***Test Failed*** 1 failures.
robin@everest ~/tmp:
$ cat tdt1.py
# -*- coding: utf-8 -*-
def func(a):
    """
    >>> print(func(u'aaa\u020b'))
    aaaȋ
    """
    return a
def _doctest():
    import doctest
    doctest.testmod()

if __name__ == "__main__":
    _doctest()
robin@everest ~/tmp:

so the error persists with or without copying errors.
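One possible reading of the UnicodeWarning, offered as an assumption rather than something confirmed from the doctest source: the captured output is a unicode string, while the expected text taken from the plain (byte string) docstring is UTF-8 bytes, so `got == want` compares unicode with non-ASCII bytes. A minimal sketch of that mismatch, with made-up variable names:

```python
# -*- coding: utf-8 -*-
# Assumption: "got" is unicode, "want" is the UTF-8 byte form of the same text.
got = u'aaa\u020b'
want_bytes = got.encode('utf-8')

# Comparing the two directly never succeeds; Python 2 also emits the
# UnicodeWarning seen above, Python 3 simply treats them as unequal.
same_raw = (want_bytes == got)

# Decoding the expected text first makes the comparison meaningful.
same_decoded = (want_bytes.decode('utf-8') == got)

print(same_raw)
print(same_decoded)
```

If this reading is right, the two strings look identical when displayed (as in the Expected/Got lines above) yet never compare equal until both sides are the same type.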

Note that on my PuTTY terminal I don't see the character properly (I see the unknown-glyph square box), but it copies OK.
--
Robin Becker
