New submission from Alexey Izbyshev <izbys...@ispras.ru>:

The C implementation of datetime calls PyUnicode_AsUTF8AndSize() in
wrap_strftime(), so it rejects format strings containing surrogate code
points (U+D800 through U+DFFF), which cannot be encoded in UTF-8. The
pure-Python implementation has no such restriction:

>>> import sys
>>> sys.modules['_datetime'] = None # block C implementation
>>> from datetime import time
>>> time().strftime('\ud800')
'\ud800'
>>> del sys.modules['datetime']
>>> del sys.modules['_datetime']
>>> from datetime import time
>>> time().strftime('\ud800')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 0: surrogates not allowed
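
For illustration only, the underlying encoding limitation can be reproduced
without datetime at all. The sketch below shows that a strict UTF-8 encode of
a lone surrogate fails, while the 'surrogatepass' error handler round-trips
it. The helper name encodes_strictly is made up for this example, and this is
not a claim about how the C code should be fixed.

```python
# A lone surrogate, as in the strftime() example above.
fmt = "\ud800"


def encodes_strictly(s):
    """Return True if s can be encoded with the default (strict) UTF-8 codec."""
    try:
        s.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


# Strict UTF-8 refuses surrogates; this is the same restriction that
# PyUnicode_AsUTF8AndSize() runs into.
strict_ok = encodes_strictly(fmt)

# The 'surrogatepass' error handler encodes the surrogate anyway and
# decodes it back unchanged.
raw = fmt.encode("utf-8", "surrogatepass")
round_tripped = raw.decode("utf-8", "surrogatepass")

print(strict_ok, raw, round_tripped == fmt)
```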

----------
components: Extension Modules
messages: 323963
nosy: belopolsky, izbyshev, pitrou, serhiy.storchaka, taleinat
priority: normal
severity: normal
status: open
title: Different behavior of C and Python impls of datetime.strftime with non-UTF-8-encodable strings
type: behavior
versions: Python 3.6, Python 3.7, Python 3.8

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue34481>
_______________________________________