On 3/2/2013 10:08 AM, Nick Coghlan wrote:
On Sat, Mar 2, 2013 at 1:24 AM, Stefan Bucur <stefan.bu...@gmail.com> wrote:
Hi,

I'm working on an automated bug finding tool that I'm trying to apply to the
Python interpreter code (version 2.7.3). Because of early prototype
limitations, I needed to disable string interning in stringobject.c. More
precisely, I modified PyString_FromStringAndSize and PyString_FromString so
that they no longer check for the null and single-char cases, and instead
create a new string every time (I can send the patch if needed).

However, after applying this modification, when running "make test" I get a
segfault in the test___all__ test case.

Before digging deeper into the issue, I wanted to ask here if there are any
implicit assumptions about string identity and interning throughout the
interpreter implementation. For instance, are two single-char strings having
the same content supposed to be identical objects?

I'm assuming that it's either this, or some refcount bug in the interpreter
that manifests only when certain strings are no longer interned and thus
have a higher chance to get low refcount values.

In theory, interning is supposed to be a pure optimisation, but it
wouldn't surprise me if there are cases that assume the described
strings are always interned (especially the null string case). Our
test suite would never detect such bugs, as we never disable the
interning.
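
To illustrate the kind of hidden assumption meant here, the following is a minimal Python-level sketch, not the actual failing code. In the interpreter itself, such an assumption would presumably take the form of a pointer comparison in C. The names FreshStr, EMPTY, is_empty_by_identity and is_empty_by_value are hypothetical, and the str subclass is only a stand-in for an empty string created with interning disabled.

```python
class FreshStr(str):
    """Hypothetical stand-in for a string created with interning disabled."""


EMPTY = ""


def is_empty_by_identity(s):
    # Fragile: correct only if every empty string is the one shared object.
    return s is EMPTY


def is_empty_by_value(s):
    # Robust: depends on the contents, not on which object holds them.
    return len(s) == 0


fresh = FreshStr("")  # equal to "", but a distinct object
print(is_empty_by_identity(fresh), is_empty_by_value(fresh))
```

The identity check misclassifies the fresh empty string while the value check does not, which is the pattern a build without interning could expose.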

Since it required patching functions rather than a configuration switch, it literally seems not to be a supported option. If so, I would not consider it a bug for CPython to use the assumption of interning to run faster, and I don't think it should be slowed down if that would be necessary to remove the assumption. (This is all assuming that the problem is not just a ref count bug.)

Stefan's question was about 2.7. I am just curious: does 3.3 still intern (some) unicode chars? Did the 256 interned bytes of 2.x carry over to 3.x?
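
One way to check this empirically on a given 3.x build might be a small identity probe like the sketch below. It assumes nothing beyond the standard library. The function name probe_identity is made up for illustration, and object identity of equal strings is an implementation detail rather than a language guarantee, so results may differ between versions and implementations.

```python
import sys


def probe_identity():
    """Report whether equal small values built at runtime are the same object."""
    # Values are built at runtime so that the compiler cannot simply
    # merge equal constants within one code object.
    empty_a, empty_b = "".join([]), str()
    char_a, char_b = chr(97), "abc"[0]
    byte_a, byte_b = bytes([97]), b"abc"[0:1]
    return {
        "empty str": empty_a is empty_b,
        "single-char str": char_a is char_b,
        "single-byte bytes": byte_a is byte_b,
    }


if __name__ == "__main__":
    print(sys.implementation.name, sys.version_info[:3])
    for label, same in probe_identity().items():
        print(label, "-> same object:", same)
```

A True entry means the two equal values were the same object on that build. A False entry would only show that this particular construction path does not share objects.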

Whether or not we're interested in fixing such bugs would depend on
the size of the patches needed to address them. From our point of
view, such bugs are purely theoretical (as the assumption is always
valid in an unpatched CPython build), so if the problem is too hard to
diagnose or fix, we're more likely to declare that interning of at
least those kinds of string values is required for correctness when
creating modified versions of CPython.

--
Terry Jan Reedy
