I'm attaching my latest extended buffer-protocol PEP that is trying to
get the array interface into Python. Basically, it is a translation of
the numpy header files into something as simple as possible that can
still be used to describe a complicated block of memory to another user.
My purpose is to get feedback and criticism from this community before
presenting it to the larger Python community.
-Travis
PEP: <unassigned>
Title: Extending the buffer protocol to include the array interface
Version: $Revision: $
Last-Modified: $Date: $
Author: Travis Oliphant <[EMAIL PROTECTED]>
Status: Draft
Type: Standards Track
Created: 28-Aug-2006
Python-Version: 2.6
Abstract
This PEP proposes extending the tp_as_buffer structure to include
function pointers that incorporate information about the intended
shape and data-format of the provided buffer. In essence this will
place an array interface directly into Python.
Rationale
Several extensions to Python utilize the buffer protocol to share
the location of a data-buffer that is really an N-dimensional
array. However, there is no standard way to exchange the
additional N-dimensional array information so that the data-buffer
is interpreted correctly. The NumPy project introduced an array
interface (http://numpy.scipy.org/array_interface.shtml) through a
set of attributes on the object itself. While this approach
works, it requires attribute lookups which can be expensive when
sharing many small arrays.
One of the key reasons that users often request to place something
like NumPy into the standard library is so that it can be used as
a standard for other packages that deal with arrays. This PEP
provides a mechanism for extending the buffer protocol (which
already allows data sharing) to add the additional information
needed to understand the data. This should be of benefit to all
third-party modules that want to share memory through the buffer
protocol such as GUI toolkits, PIL, PyGame, CVXOPT, PyVoxel,
PyMedia, audio libraries, video libraries etc.
Proposal
Add bf_getarrayview and bf_relarrayview function pointers to the
buffer protocol to allow objects to share a view on a memory
pointer including information about accessing it as an
N-dimensional array. Add the TP_HAS_ARRAY_BUFFER flag to types
that define this extended buffer protocol.
Also, a few additional C-API calls should perhaps be added to Python
to facilitate creating new PyArrayViewObjects.
Specification
static PyObject* bf_getarrayview (PyObject *obj)
This function must return a new reference to a PyArrayViewObject
which contains the details of the array information exposed by the
object. If failure occurs, then NULL is returned and an exception
is set.
static int bf_relarrayview(PyObject *obj)
If not NULL, this will be called when the object returned by
bf_getarrayview is destroyed so that the underlying object can be
aware when acquired "views" are released.
The object that defines bf_getarrayview should not re-allocate memory
(re-size itself) while views are extant. On success, 0 is returned;
on failure, -1 is returned and an error condition is set.
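The no-resize rule can be illustrated with a toy model in plain C. This is
only a sketch: ToyExporter and the toy_* functions are hypothetical names
invented here, they do not use the Python C-API, and they merely mimic the
get/release pairing and the 0 / -1 return convention described above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an exporting object; not a real Python type. */
typedef struct {
    void *data;       /* memory block being shared */
    size_t size;      /* size of the block in bytes */
    int view_count;   /* number of views currently outstanding */
} ToyExporter;

/* Plays the role of bf_getarrayview: hand out the pointer, count the view. */
void *toy_get_view(ToyExporter *obj)
{
    assert(obj != NULL);
    obj->view_count++;
    return obj->data;
}

/* Plays the role of bf_relarrayview: 0 on success, -1 on failure. */
int toy_release_view(ToyExporter *obj)
{
    assert(obj != NULL);
    if (obj->view_count <= 0)
        return -1;            /* no view is outstanding */
    obj->view_count--;
    return 0;
}

/* Resizing is refused while any view is extant: 0 on success, -1 otherwise. */
int toy_resize(ToyExporter *obj, size_t new_size)
{
    void *p;
    assert(obj != NULL);
    if (obj->view_count > 0 || new_size == 0)
        return -1;
    p = realloc(obj->data, new_size);
    if (p == NULL)
        return -1;
    obj->data = p;
    obj->size = new_size;
    return 0;
}
```

In this model a call to toy_resize fails for as long as a view obtained from
toy_get_view has not been handed back through toy_release_view.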
The PyArrayViewObject has the structure
typedef struct {
    PyObject_HEAD
    void *data;           /* pointer to the beginning of data */
    int nd;               /* the number of dimensions */
    Py_ssize_t *shape;    /* c-array of size nd giving shape */
    Py_ssize_t *strides;  /* SEE BELOW */
    PyObject *base;       /* the object this is a "view" of */
    PyObject *format;     /* SEE BELOW */
    int flags;            /* SEE BELOW */
} PyArrayViewObject;
strides -- a c-array of size nd providing the striding information
which is the number of bytes to skip to get to the next element
in that dimension.
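As a sketch of how a consumer might use the strides, the byte offset of an
element is the sum of index[i] * strides[i] over all dimensions. The function
name item_byte_offset is hypothetical, and ptrdiff_t stands in for Py_ssize_t
so that the example builds without Python.h.

```c
#include <assert.h>
#include <stddef.h>

/* ptrdiff_t is an assumed stand-in for Py_ssize_t (no Python.h needed). */
ptrdiff_t item_byte_offset(int nd, const ptrdiff_t *strides,
                           const ptrdiff_t *index)
{
    ptrdiff_t offset = 0;
    int i;
    assert(nd >= 0);
    /* Each index contributes index[i] * strides[i] bytes. */
    for (i = 0; i < nd; i++)
        offset += index[i] * strides[i];
    return offset;
}
```

For a 3x4 array of 8-byte items stored in C order, the strides would be
{32, 8}, so element (2, 1) sits 2*32 + 1*8 = 72 bytes past the data pointer.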
format -- a Python data-format object (PyDataFormatObject) which
contains information about how each item in the array
should be interpreted.
flags -- an integer of flags. PYARR_WRITEABLE is the only flag
that must be set appropriately by types.
Other flags: PYARR_ALIGNED, PYARR_C_CONTIGUOUS,
PYARR_F_CONTIGUOUS, and PYARR_NOTSWAPPED can all be determined
from the rest of the PyArrayViewObject using the UpdateFlags
C-API.
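The contiguity flags are derivable because they depend only on shape, strides
and the item size. The following is a minimal sketch of such a derivation, not
the proposed UpdateFlags implementation: the function name and the numeric
flag values are assumptions made for illustration, ptrdiff_t stands in for
Py_ssize_t, and zero-length dimensions get no special handling.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values; the PEP names the flags but assigns no numbers. */
#define SKETCH_C_CONTIGUOUS 0x1
#define SKETCH_F_CONTIGUOUS 0x2

int contiguity_flags(int nd, const ptrdiff_t *shape,
                     const ptrdiff_t *strides, ptrdiff_t itemsize)
{
    int flags = SKETCH_C_CONTIGUOUS | SKETCH_F_CONTIGUOUS;
    ptrdiff_t expected;
    int i;
    assert(nd >= 0 && itemsize > 0);

    /* C order: the last dimension steps by itemsize, earlier ones by more. */
    expected = itemsize;
    for (i = nd - 1; i >= 0; i--) {
        /* A dimension of length 1 is never stepped, so its stride is moot. */
        if (shape[i] != 1 && strides[i] != expected)
            flags &= ~SKETCH_C_CONTIGUOUS;
        expected *= shape[i];
    }

    /* Fortran order: the first dimension steps by itemsize. */
    expected = itemsize;
    for (i = 0; i < nd; i++) {
        if (shape[i] != 1 && strides[i] != expected)
            flags &= ~SKETCH_F_CONTIGUOUS;
        expected *= shape[i];
    }
    return flags;
}
```

A one-dimensional packed array satisfies both tests at once, which is why the
two flags are tracked independently rather than as a single either/or value.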
The PyDataFormatObject has the structure
typedef struct {
    PyObject_HEAD
    Pysimpleformat primitive;  /* basic primitive type */
    int flags;                 /* byte-order, isaligned */
    int itemsize;              /* SEE BELOW */
    int alignment;             /* SEE BELOW */
    PyObject *extended;        /* SEE BELOW */
} PyDataFormatObject;
enum Pysimpleformat {
    PY_BIT='1', PY_BOOL='?', PY_BYTE='b', PY_SHORT='h', PY_INT='i',
    PY_LONG='l', PY_LONGLONG='q', PY_UBYTE='B', PY_USHORT='H', PY_UINT='I',
    PY_ULONG='L', PY_ULONGLONG='Q', PY_FLOAT='f', PY_DOUBLE='d',
    PY_LONGDOUBLE='g', PY_CFLOAT='F', PY_CDOUBLE='D', PY_CLONGDOUBLE='G',
    PY_OBJECT='O', PY_CHAR='c', PY_UCS2='u', PY_UCS4='w', PY_FUNCPTR='X',
    PY_VOIDPTR='V'
};
Each of these simple formats has a special character code which can be
used to identify this primitive in a nested Python list.
flags -- flags for the data-format object. Specified masks are
PY_NATIVEORDER
PY_BIGENDIAN
PY_LITTLEENDIAN
PY_IGNORE
itemsize -- the total size represented by this data-format in bytes unless
the primitive is PY_BIT, in which case it is the size in bits.
For data-formats that are simple 1-d arrays of the underlying
primitive, this total size can represent more than one primitive
(with extended still NULL).
alignment -- For the primitive types this is
offsetof(struct {char c; type v;}, v)
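The offsetof expression works because the compiler must pad after the leading
char until the second member is suitably aligned. A small sketch of the idea
follows; the struct and function names are hypothetical, and named helper
structs are used instead of defining the struct inside offsetof, since some
compilers warn about the latter form.

```c
#include <assert.h>
#include <stddef.h>

/* Probe structs: a char followed by the type whose alignment is wanted. */
struct align_probe_short  { char c; short v; };
struct align_probe_int    { char c; int v; };
struct align_probe_double { char c; double v; };

size_t alignment_of_short(void)
{
    size_t a = offsetof(struct align_probe_short, v);
    assert(a > 0);  /* v can never share offset 0 with c */
    return a;
}

size_t alignment_of_int(void)
{
    size_t a = offsetof(struct align_probe_int, v);
    assert(a > 0);
    return a;
}

size_t alignment_of_double(void)
{
    size_t a = offsetof(struct align_probe_double, v);
    assert(a > 0);
    return a;
}
```

The values returned are platform dependent, so none are listed here; the only
portable expectation is that each is at least 1 and no larger than the size of
the type being probed.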
extended -- NULL if this is a primitive data-type or no additional
information is available.
If primitive is PY_FUNCPTR, then this can be a tuple with >=1 element:
(args, {dim0, dim1, dim2, ...}).
args -- A list (of at least length 2) of data-format objects
specifying the input argument formats with the last
argument specifying the output argument data-format
(use None for void inputs and/or outputs).
For other primitives, this can be a tuple with >=2 elements:
(names, fields, {dim0, dim1, dim2, ...})
Use None for both names and fields if they should be ignored.
names -- An ordered list of string or unicode objects giving the names
of the fields for a structure data-format.
fields -- a Python dictionary with ordered-keys given by the list
in names. Each entry in the dictionary is a 3-tuple containing
(data-format-object, offset, meta-information) where
meta-information is Py_None if there is no meta-information.
Offset is given in bytes from the start of the record or in
bits if PY_BIT is the primitive.
Any additional entries in the extended tuple (dim0,
dim1, etc.) are interpreted as integers which specify
that this data-format is an array of the given shape
of the fundamental data-format specified by the
remainder of the DataFormat Object. The dimensions
are specified so that the last-index is always assumed
to vary the fastest (C-order).
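To make the C-order convention concrete, here is a sketch that derives the
byte step of each dimension and the total size of a data-format that is an
array of a fundamental item. The function name c_order_strides is
hypothetical, and ptrdiff_t again stands in for Py_ssize_t.

```c
#include <assert.h>
#include <stddef.h>

/* Fill strides_out for a C-ordered (dim0, dim1, ...) array of items that are
   itemsize bytes each, and return the total size in bytes. */
ptrdiff_t c_order_strides(int nd, const ptrdiff_t *dims,
                          ptrdiff_t itemsize, ptrdiff_t *strides_out)
{
    ptrdiff_t step = itemsize;
    int i;
    assert(nd >= 0 && itemsize > 0);
    /* Walk from the last index (fastest varying) back to the first. */
    for (i = nd - 1; i >= 0; i--) {
        strides_out[i] = step;
        step *= dims[i];
    }
    return step;
}
```

For dims (2, 3, 4) of a 4-byte fundamental item this gives steps of 48, 16
and 4 bytes and a total of 96 bytes.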
The constructor of a PyArrayViewObject allocates the memory for shape and
strides, and the destructor frees that memory.
The constructor of a PyDataFormatObject allocates the objects it needs for
fields, names, and shape.
C-API
void PyArrayView_UpdateFlags(PyObject *view, int flags)
/* update the flags on the array view object provided */
PyDataFormatObject *Py_NewSimpleFormat(Pysimpleformat primitive)
/* return a new primitive data-format object */
PyDataFormatObject *Py_DataFormatFromCType(PyObject *ctype)
/* return a new data-format object from a ctype */
int Py_GetPrimitiveSize(Pysimpleformat primitive)
/* return the size (in bytes) of the provided primitive */
PyDataFormatObject *Py_AlignDataFormat(PyObject *format)
/* take a data-format object and construct an aligned data-format
object where all fields are aligned on appropriate boundaries
for the compiler */
Discussion
The information provided in the array view object is patterned
after the way a multi-dimensional array is defined in NumPy -- including
the data-format object which allows a variety of descriptions of memory
depending on the need.
Reference Implementation
Supplied when the PEP is accepted.
Copyright
This document is placed in the public domain.