On 14/03/2013 00:55, Chris Angelico wrote:
On Thu, Mar 14, 2013 at 11:52 AM, MRAB <pyt...@mrabarnett.plus.com> wrote:
On 13/03/2013 23:43, Chris Angelico wrote:

On Thu, Mar 14, 2013 at 3:49 AM, rusi <rustompm...@gmail.com> wrote:

On Mar 13, 3:59 pm, Chris Angelico <ros...@gmail.com> wrote:

On Wed, Mar 13, 2013 at 9:11 PM, rusi <rustompm...@gmail.com> wrote:
> Uhhh..
> Making the subject line useful for all readers

I should have read this one before replying in the other thread.

jmf, I'd like to see evidence that there has been a performance
regression compared against a wide build of Python 3.2. You still have
never answered this fundamental point: that the narrow builds of Python
are *BUGGY* in the same way that JavaScript/ECMAScript is. And believe
you me, the utterly unnecessary hassles I have had to deal with when
permitting user-provided .js code to script my engine have wasted more
dev hours than you would believe; there are a lot of stupid edge cases
to deal with.
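
A quick sketch of the breakage being described, assuming a wide/3.3
interpreter to run it on:

    s = '\U0001F600'                # one astral (non-BMP) codepoint
    print(len(s))                   # 1 on a wide / PEP 393 build
    # A narrow build stores this as a UTF-16 surrogate pair, so len(),
    # indexing and slicing see two code units instead of one character:
    print(len(s.encode('utf-16-le')) // 2)   # 2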


This assumes that there are only three choices:
- a narrow build that is buggy (surrogate pairs for astral characters)
- a wide build that is 4-fold space-inefficient for a wide variety of
common (ASCII) use cases
- a flexible string engine that accepts a small tradeoff, favouring
space efficiency over time efficiency

There is a fourth choice: a narrow build that chooses to be partial
rather than buggy, i.e. when an astral character is encountered, an
exception is raised rather than trying to fudge it into a 16-bit
representation.
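
A rough sketch of that fourth choice (hypothetical helper, not
anything any Python build actually does):

    def reject_astral(s):
        # Hypothetical: refuse any string containing a non-BMP codepoint
        # instead of fudging it into surrogate pairs.
        if any(ord(c) > 0xFFFF for c in s):
            raise ValueError('astral character not supported in narrow build')
        return s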


As a simple factual matter, narrow builds of Python 3.2 don't do that.
So it doesn't factor into my original statement. But if you're talking
about a proposal for 3.4, then sure, that's a theoretical possibility.
It wouldn't be "buggy" in the sense of "string indexing/slicing
unexpectedly does the wrong thing", but it would still be incomplete
Unicode support, and I don't think people would appreciate it. Much
better to have graceful degradation: if there are non-BMP characters
in the string, then instead of throwing an exception, it just makes
the string wider.
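
That widening is easy to see with sys.getsizeof on 3.3 (exact byte
counts vary by platform; the per-codepoint width is what matters):

    import sys
    # Same length in codepoints, increasingly wide storage under PEP 393.
    print(sys.getsizeof('aaaa'))            # 1 byte per codepoint (all <= U+00FF)
    print(sys.getsizeof('aaa\u0100'))       # 2 bytes per codepoint (BMP)
    print(sys.getsizeof('aaa\U0001F600'))   # 4 bytes per codepoint (astral)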

[snip]
Do you mean that instead of switching between 1/2/4 bytes per codepoint
it would switch between 2/4 bytes per codepoint?

That's my point. We already have the better version. :)

If a later version of Python switched between 2/4 bytes per codepoint,
how much difference would it make in terms of memory and speed compared
to Python 3.2 (fixed width) and Python 3.3 (3 widths)?

The vast majority of the time, 2 bytes per codepoint is sufficient, but
would that result in less switching between widths and therefore higher
performance, or would the use of more memory (2 bytes when 1 byte would
do) offset that?

(And I'm talking about significant differences here.)
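
One crude way to put numbers on the memory side of that question
(just a sketch; exact sizes vary by build):

    import sys
    ascii_text = 'x' * 10000
    widened = ascii_text + '\u0100'    # one BMP char forces 2 bytes/codepoint
    print(sys.getsizeof(ascii_text))   # about 10 KB of character data on 3.3
    print(sys.getsizeof(widened))      # about 20 KB: what a 2/4-byte-only
                                       # scheme would pay for plain ASCII too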
--
http://mail.python.org/mailman/listinfo/python-list
