Inada Naoki <songofaca...@gmail.com> added the comment:

I am still not sure we should add a new API only for avoiding the cache.
* PyUnicode_AsUTF8String : when we need a bytes object, or want to avoid the cache.
* PyUnicode_AsUTF8AndSize : when we need a C string and the cache is acceptable.

(A minimal usage sketch of these two calls follows at the end of this comment.)

With PR-18327, PyUnicode_AsUTF8AndSize becomes more than 10% faster than the master branch, and about the same speed as PyUnicode_AsUTF8String.

## vs master

$ ./python -m pyperf timeit --compare-to=../cpython/python --python-names master:patched -s 'from _testcapi import unicode_bench_asutf8 as b' -- 'b(1000, "hello", "こんにちは")'
master: ..................... 96.6 us +- 3.3 us
patched: ..................... 83.3 us +- 0.3 us

Mean +- std dev: [master] 96.6 us +- 3.3 us -> [patched] 83.3 us +- 0.3 us: 1.16x faster (-14%)

## vs AsUTF8String

$ ./python -m pyperf timeit -s 'from _testcapi import unicode_bench_asutf8 as b' -- 'b(1000, "hello", "こんにちは")'
.....................
Mean +- std dev: 83.2 us +- 0.2 us

$ ./python -m pyperf timeit -s 'from _testcapi import unicode_bench_asutf8string as b' -- 'b(1000, "hello", "こんにちは")'
.....................
Mean +- std dev: 81.9 us +- 2.1 us

## vs AsUTF8String (ASCII)

If we cannot accept the cache, PyUnicode_AsUTF8String is slower than PyUnicode_AsUTF8 when the unicode object is an ASCII string (for a compact ASCII string the UTF-8 data is the string's own data, so PyUnicode_AsUTF8 needs no extra allocation, while PyUnicode_AsUTF8String must create a new bytes object). PyUnicode_GetUTF8Buffer helps only in this case.

$ ./python -m pyperf timeit -s 'from _testcapi import unicode_bench_asutf8 as b' -- 'b(1000, "hello", "world")'
.....................
Mean +- std dev: 37.5 us +- 1.7 us

$ ./python -m pyperf timeit -s 'from _testcapi import unicode_bench_asutf8string as b' -- 'b(1000, "hello", "world")'
.....................
Mean +- std dev: 46.4 us +- 1.6 us
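To make the comparison concrete, here is the minimal usage sketch mentioned above. It is not taken from PR-18327 or from the _testcapi benchmark helpers; the function names are hypothetical, and only the PyUnicode_*/PyBytes_* calls are real CPython C API.

/* Sketch: the two usage patterns being compared (hypothetical functions). */
#define PY_SSIZE_T_CLEAN
#include <Python.h>

/* Cache is acceptable: borrow the UTF-8 buffer cached on the unicode
   object.  The pointer stays valid while `obj` is alive and must not be
   freed by the caller. */
static int
consume_utf8_cached(PyObject *obj)
{
    Py_ssize_t size;
    const char *utf8 = PyUnicode_AsUTF8AndSize(obj, &size);
    if (utf8 == NULL) {
        return -1;                  /* exception already set */
    }
    /* ... use utf8 / size here ... */
    return 0;
}

/* Cache should be avoided: encode into a fresh bytes object whose
   lifetime the caller controls; nothing extra stays attached to `obj`. */
static int
consume_utf8_uncached(PyObject *obj)
{
    PyObject *bytes = PyUnicode_AsUTF8String(obj);
    if (bytes == NULL) {
        return -1;                  /* exception already set */
    }
    const char *utf8 = PyBytes_AS_STRING(bytes);
    Py_ssize_t size = PyBytes_GET_SIZE(bytes);
    /* ... use utf8 / size here ... */
    (void)utf8;
    (void)size;
    Py_DECREF(bytes);               /* releasing the bytes releases the buffer */
    return 0;
}

The difference is only who owns the buffer: PyUnicode_AsUTF8AndSize leaves the encoded copy attached to the unicode object (the cache this issue is about), while PyUnicode_AsUTF8String hands the caller a bytes object to release when done.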