New issue 2441: Unicode surrogate codepoints in string literals combined when
using unittest module
https://bitbucket.org/pypy/pypy/issues/2441/unicode-surrogate-codepoints-in-string
byllyfish:
When running unit tests under PyPy3, I sometimes see unicode literals
containing surrogate pairs combined into a single non-BMP character.
Here is `test_surrogate.py`:
```python
import unittest

class TestSurrogate(unittest.TestCase):
    def test_surrogate(self):
        s = '\ud800\udc00'
        if len(s) != 2:
            raise ValueError(s.encode('raw-unicode-escape'))
```
The first time I run it, it works fine.
```bash
$ ~/pypy3-v5.5.0-osx64/bin/pypy3 -m unittest test_surrogate
.
----------------------------------------------------------------------
Ran 1 test in 0.000s
OK
```
When I run the test a second time, it fails. The surrogate pair has been
replaced.
```bash
$ ~/pypy3-v5.5.0-osx64/bin/pypy3 -m unittest test_surrogate
E
======================================================================
ERROR: test_surrogate (test_surrogate.TestSurrogate)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "./test_surrogate.py", line 7, in test_surrogate
    raise ValueError(s.encode('raw-unicode-escape'))
ValueError: b'\\U00010000'
----------------------------------------------------------------------
Ran 1 test in 0.010s
FAILED (errors=1)
```
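For reference, the value in the error message is exactly what you get by treating the two code points as a UTF-16 surrogate pair. The sketch below uses the standard UTF-16 combining formula; `combine_surrogates` is a hypothetical helper written for illustration only, and I am not claiming this is what PyPy does internally.

```python
def combine_surrogates(high, low):
    """Combine a UTF-16 high/low surrogate pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    # Standard UTF-16 decoding: 10 bits from each surrogate, offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

s = '\ud800\udc00'
combined = chr(combine_surrogates(ord(s[0]), ord(s[1])))

# The literal has two code points; the combined form has one.
print(len(s), len(combined))
print(combined.encode('raw-unicode-escape'))
```

The combined character encodes to the same `b'\\U00010000'` seen in the failing run.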
The failures continue until I touch the file. After that, the test will succeed
the first time, then fail subsequently. If I touch the file and run pypy3 with
-B (don't write .py[co] files on import), all the test runs succeed.
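Since `-B` avoids the failure, my guess is that the cached bytecode is involved, but that is an assumption I have not confirmed. A minimal sketch to check it without unittest: compile the literal, round-trip the code object through `marshal` (the serialization used for cached bytecode), and compare string lengths. The filename `<surrogate-sketch>` and the variable names are made up for this example.

```python
import marshal

# Same literal as in the test, compiled from source text.
source = r"s = '\ud800\udc00'"
code = compile(source, "<surrogate-sketch>", "exec")

# Round-trip the code object through marshal, as a stand-in for a cached file.
restored = marshal.loads(marshal.dumps(code))

direct, cached = {}, {}
exec(code, direct)      # literal straight from the compiler
exec(restored, cached)  # literal after serialization and reload

print(len(direct["s"]), len(cached["s"]))
```

If the two lengths differ, the round trip is where the surrogates get merged; if both are 2, the problem lies elsewhere.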
N.B. The problem does NOT occur under normal conditions. I've only seen it
using unittest.
```python
# This small program always works fine!
s = '\ud800\udc00'
if len(s) != 2:
    raise ValueError(s.encode('raw-unicode-escape'))
```
I am running on Mac OS X 10.11.6. Please let me know if you can reproduce this.
```bash
$ ~/pypy3-v5.5.0-osx64/bin/pypy3 --version
Python 3.3.5 (619c0d5af0e5, Oct 08 2016, 22:08:19)
[PyPy 5.5.0-alpha0 with GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)]
```