------
Neil Hodgson:
"The counter-problem is that a French document that needs to include
one mathematical symbol (or emoji) outside Latin-1 will double in size
as a Python string."
Serious developers/typographers/users know that you cannot compose
a text in French with Latin-1: it lacks "œ", "Œ" and "Ÿ". This is now
also the case with German (Germany), whose capital sharp s "ẞ"
(U+1E9E) is not in Latin-1 either.
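The claim is easy to check by trying to encode a few sample words as Latin-1. This is a small sketch; the helper name `fits_latin1` and the sample words are illustrative choices, not anything from the original thread.

```python
def fits_latin1(text: str) -> bool:
    """Return True if every character of `text` can be encoded as Latin-1."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        # At least one code point lies above U+00FF.
        return False
    return True

# "é" is in Latin-1; "œ" (U+0153) and "ẞ" (U+1E9E) are not.
for word in ("café", "cœur", "STRAẞE"):
    print(word, fits_latin1(word))
```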
---
Neil's comment is correct:
>>> import sys
>>> sys.getsizeof('a' * 1000 + 'z')
1026
>>> sys.getsizeof('a' * 1000 + '€')
2040
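The jump in size follows from PEP 393: CPython stores every code point of a string with the same width, chosen from the widest code point present (1 byte up to U+00FF, 2 bytes up to U+FFFF, otherwise 4). The sketch below restates that rule; `pep393_width` is a hypothetical helper that computes the width from the code points and does not read CPython's internal state. The exact `sys.getsizeof` numbers depend on the Python version and platform, so none are shown.

```python
import sys

def pep393_width(s: str) -> int:
    """Bytes per code point that PEP 393 would use to store `s`.

    The widest code point decides the width for the whole string.
    """
    widest = max(map(ord, s), default=0)
    if widest <= 0xFF:
        return 1  # Latin-1 range
    if widest <= 0xFFFF:
        return 2  # rest of the Basic Multilingual Plane
    return 4      # astral planes, e.g. most emoji

# One non-Latin-1 character widens all 1001 code points.
for s in ("a" * 1000 + "z", "a" * 1000 + "\u20ac", "a" * 1000 + "\U0001F600"):
    print(pep393_width(s), sys.getsizeof(s))
```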
This is not really the problem. "Serious users" may notice,
sooner or later, that Python and Unicode are walking in
opposite directions (technically and in spirit).
>>> import timeit
>>> timeit.repeat("'a' * 1000 + 'ẞ'")
[1.1088995672090292, 1.0842266613261913, 1.1010779011941594]
>>> timeit.repeat("'a' * 1000 + 'z'")
[0.6362570846925735, 0.6159128762502917, 0.6200501673623791]
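To reproduce this kind of comparison on a recent interpreter, it may help to keep the operands in `setup` variables: newer CPython versions can fold a constant expression such as `'a' * 1000 + 'z'` at compile time, in which case the statement no longer measures the concatenation at all. The sketch below assumes that setup; `time_concat` and its default parameters are illustrative, and the timings will vary by machine and version, so no figures are given.

```python
import timeit

def time_concat(suffix: str, n: int = 1000,
                number: int = 10_000, repeat: int = 5) -> float:
    """Best-of-`repeat` time for evaluating 'a' * n + suffix `number` times.

    The operands are plain names defined in `setup`, so the compiler
    cannot fold the expression into a constant ahead of time.
    """
    setup = f"base = 'a'; n = {n}; suffix = {suffix!r}"
    times = timeit.repeat("base * n + suffix", setup=setup,
                          number=number, repeat=repeat)
    return min(times)  # the minimum is the least noisy estimate

# ASCII suffix vs. capital sharp s (U+1E9E), which forces 2-byte storage.
for suffix in ("z", "\u1e9e"):
    print(f"{suffix!r}: {time_concat(suffix):.4f} s")
```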
(Just an opinion)
jmf
--
http://mail.python.org/mailman/listinfo/python-list