Antoine Pitrou wrote:
> In all honesty, I admit I am annoyed by all the problems with the buffer API /
> memoryview object, many of which are caused by its utterly bizarre design (and
> the fact that the design team went missing in action after imposing such a
> bizarre and complex design on us), and I'm reluctant to add yet another level
> of byzantine complexity in order to solve those problems. It explains I may
> sound a bit angry at times :-)
>
> If we really need to change things a lot to make them work, we should re-work
> the buffer API from the ground up, make the Py_buffer struct a true PyObject
> (that is, a true variable-length object so as to solve the shape and strides
> allocation issue) and merge it with the current memoryview implementation. It
> would make things both more simpler and more flexible.

I don't see anything wrong with the PEP 3118 protocol. It does exactly
what it is designed to do: allow the number crunching crowd to share
large datasets between different libraries without copying things around
in memory. Yes, the protocol is complicated, but that is because it is
trying to handle a complicated problem.

The memoryview implementation, on the other hand, is pretty broken. I do
have a theory on how it ended up in such an unusable state, but I'm not
particularly inclined to share it - this kind of thing can happen
sometimes, and the important question now is how we fix it.

As I see it, memoryview is actually trying to do two things, but the
design for supporting the second of them doesn't appear to have been
adequately thought through in the current implementation.

The first use of a memoryview object is merely to allow access to the
Py_buffer of a data store. This is pretty simple, and aside from
currently getting len() wrong when itemsize > 1, memoryview isn't
terrible at it. If we left memoryview at that, it *would* just be a
simple wrapper around a Py_buffer struct, and its implementation
wouldn't be difficult at all.

Where it gets a bit more complicated is if we want to support slices
(rather than just indexing) on memoryview objects. When you do that, the
memoryview is no longer a simple wrapper around the Py_buffer of the
underlying data store, because it isn't exposing the whole data store
any more - it is only exposing part of it.

Requesting access to only part of a data buffer is NOT part of the PEP
3118 API, and it doesn't need to be: it can be part of a separate object
that adapts from the underlying data store to the desired subview. The
object that is meant to be performing at least simple 1-dimensional
cases of that adaptation is memoryview (or more to the point, memoryview
slices), but it currently *sucks* at this because it relies too heavily
on the info in the Py_buffer that it got from the underlying object.
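
To make the whole-store versus subview distinction concrete, here is a
minimal sketch. It assumes a recent CPython 3 release in which memoryview
slices track their own length; that is an assumption about later
versions, not a description of the implementation being discussed in
this thread. The names store, whole and part are purely illustrative.

```python
from array import array

# Underlying data store: 10 items, each wider than one byte.
store = array('i', range(10))

whole = memoryview(store)   # view of the whole data store
part = whole[2:6]           # subview exposing only items 2..5

# len() counts items rather than bytes, so it does not depend on itemsize.
assert len(whole) == 10
assert whole.nbytes == 10 * whole.itemsize

# The slice reports its own shape[0], not that of the underlying store.
assert part.shape == (4,)
assert len(part) == 4
assert part.tolist() == [2, 3, 4, 5]

# The subview still shares memory with the store: nothing was copied.
part[0] = 99
assert store[2] == 99
```

The point of the sketch is that the numbers describing part have to be
kept by the view itself, since the Py_buffer obtained from the underlying
object knows nothing about the slice.
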
That Py_buffer describes the *whole* data store, but a memoryview slice
may only be exposing part of it - so while the info in the Py_buffer is
accurate for the underlying object, it is *not* accurate for the
memoryview itself.

Fixing that for the 1-dimensional case shouldn't actually be all that
difficult - the memoryview just needs to maintain its own shape[0] entry
that reflects the number of items in the view rather than the number in
the underlying object.

The multi-dimensional cases get pretty tricky though, since they will
almost always end up dealing with non-contiguous data. The PEP 3118
protocol is up to handling the task, but the implementation of the index
mapping to handle these multi-dimensional cases is highly non-trivial,
and probably best left to third party libraries like numpy.

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia