Marc-Andre Lemburg <m...@egenix.com> added the comment:

Adam Olsen wrote:
>
> Adam Olsen <rha...@gmail.com> added the comment:
>
> Points against the subclassing argument:
>
> * We have a null-termination invariant. For byte strings this was part of
> the public API, and I'm not sure that's changed for unicode strings; aren't
> you arguing that we should maximize how much of our implementation is a
> public API? This prevents lazy slicing.

Base type Unicode buffers end with a null Py_UNICODE terminator, but this
is not used anywhere, AFAIK. We could probably remove that overallocation
at some point. There's no such thing as a null-termination invariant
for Unicode.

> * subclassing unicode so you can change the meaning of the fields (ie
> allocating your own buffer) is a gross hack. It relies far too much on fine
> details of the implementation and is fragile (what if you miss the dummy byte
> needed by fastsearch?) Most of the possible options could be, if they
> function correctly, applied directly to the basetype as a patch, so it's moot.

Actually, Unicode objects were designed to be subclassable right from
the start, and so was adjusting the buffer to point e.g. into some
other already allocated string. I removed this feature from Fredrik's
type implementation with the intent to re-add it later on as a subclass.

See the prototype implementation of such a subclass, uniref, that I've
written to show how easy it is to add a subclass which can be used to
slice large Unicode objects without having to reallocate new buffers
all the time.

BTW, I'm not aware of any changes to the PyUnicodeObject made by some
fastsearch implementation. Could you point me to this?

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue1943>
_______________________________________
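
For readers unfamiliar with the idea of slicing without reallocating, here is a minimal pure-Python sketch of the concept. It is not the uniref prototype mentioned above, and it does not touch PyUnicodeObject; the class name StrView and its fields (base, start, stop) are hypothetical names chosen for illustration. A view keeps a reference to the original string plus two offsets, so taking a slice copies no character data. Note that the characters a view covers are not followed by a terminator of their own, which is why the null-termination question matters for this kind of design.

```python
class StrView:
    """Hypothetical read-only view onto part of a str (illustration only)."""

    __slots__ = ("base", "start", "stop")

    def __init__(self, base, start=0, stop=None):
        if stop is None:
            stop = len(base)
        self.base = base      # the original string, shared and never copied
        self.start = start    # offset of the first character of the view
        self.stop = stop      # offset one past the last character

    def __len__(self):
        return self.stop - self.start

    def __getitem__(self, index):
        if isinstance(index, slice):
            start, stop, step = index.indices(len(self))
            if step != 1:
                # Strided slices cannot be expressed as offsets; copy instead.
                return str(self)[index]
            stop = max(stop, start)
            # A contiguous slice is just a new pair of offsets into base.
            return StrView(self.base, self.start + start, self.start + stop)
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("StrView index out of range")
        return self.base[self.start + index]

    def __str__(self):
        # Materialize a real str only when one is actually needed.
        return self.base[self.start:self.stop]


big = "x" * 1000 + "hello world"
view = StrView(big)[1000:1005]   # no character data is copied here
inner = view[1:3]                # slicing a view yields another view
```

Both view and inner still refer to big through their base attribute; a copy is made only when str() is called on them.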