[issue1943] improved allocation of PyUnicode objects

Marc-Andre Lemburg Sun, 10 Jan 2010 13:59:32 -0800

Marc-Andre Lemburg <[email protected]> added the comment:

Adam Olsen wrote:
> 
> Adam Olsen <[email protected]> added the comment:
> 
> Points against the subclassing argument:
> 
> * We have a null-termination invariant.  For byte strings this was part of 
> the public API, and I'm not sure that's changed for unicode strings; aren't 
> you arguing that we should maximize how much of our implementation is a 
> public API?  This prevents lazy slicing.


Buffers of the base Unicode type end with a terminating null Py_UNICODE,
but AFAIK nothing relies on it. We could probably remove
that overallocation at some point.

There's no such thing as a null-termination invariant for Unicode.

> * subclassing unicode so you can change the meaning of the fields (ie 
> allocating your own buffer) is a gross hack.  It relies far too much on fine 
> details of the implementation and is fragile (what if you miss the dummy byte 
> needed by fastsearch?)  Most of the possible options could be, if they 
> function correctly, applied directly to the basetype as a patch, so it's moot.

Actually, Unicode objects were designed to be subclassable right
from the start, and so was the ability to adjust the buffer to point,
e.g., into some other already allocated string. I removed this feature
from Fredrik's type implementation with the intent to re-add it later on
as a subclass.

See uniref, the prototype implementation of such a subclass that I've
written to show how easy it is to add a subclass which can be used
to slice large Unicode objects without having to allocate new
buffers all the time.
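
To illustrate the general idea only: the following is a minimal conceptual
sketch in Python, not the uniref prototype and not its C implementation.
The class name UniRef and its fields are hypothetical. It assumes a view
that stores a reference to the base string plus an offset and a length,
so that slicing the view creates another small view instead of copying
characters.

```python
class UniRef:
    """Hypothetical read-only view onto part of a base string.

    Conceptual sketch only; the real uniref prototype is a C-level
    subclass and may work differently.
    """

    __slots__ = ("_base", "_start", "_length")

    def __init__(self, base, start=0, stop=None):
        # Clamp the requested range to the bounds of the base string.
        start, stop, _ = slice(start, stop).indices(len(base))
        self._base = base          # reference keeps the base string alive
        self._start = start
        self._length = max(0, stop - start)

    def __len__(self):
        return self._length

    def __getitem__(self, index):
        if isinstance(index, slice):
            start, stop, step = index.indices(self._length)
            if step != 1:
                # A strided slice cannot be described by offset + length,
                # so fall back to copying.
                return str(self)[index]
            # Contiguous slice: new view on the same base, no copy.
            return UniRef(self._base,
                          self._start + start,
                          self._start + max(start, stop))
        if index < 0:
            index += self._length
        if not 0 <= index < self._length:
            raise IndexError("UniRef index out of range")
        return self._base[self._start + index]

    def __str__(self):
        # Characters are copied only when a real str is requested.
        return self._base[self._start:self._start + self._length]


big = "abcdefghij" * 1000
view = UniRef(big, 3, 8)   # refers to big[3:8] without copying it
sub = view[1:4]            # another view on the same base string
```

Note that such a view ends wherever its length says it ends; the character
following it in the base string is simply the next character of the base,
which is why this approach cannot coexist with a null-termination
requirement on every object's buffer.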

BTW, I'm not aware of any changes made to PyUnicodeObject by the
fastsearch implementation. Could you point me to them?

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue1943>
_______________________________________