The array.array type is excellent for storing a large number of "native" elements, such as integers, chars, and doubles, without involving the heavy machinery of numpy. It's both blazingly fast and reasonably efficient with memory. The one thing missing from the array module is the ability to access array values directly from C.

This might seem superfluous, as it's perfectly possible to manipulate array contents from Python/C using PyObject_CallMethod and friends. The problem is that it requires the native values to be marshalled to Python objects, only to be immediately converted back to native values by the array code. This can be a problem when, for example, a numeric array needs to be filled with contents, such as in this hypothetical example:

/* error checking and refcounting subtleties omitted for brevity */
PyObject *load_data(Source *src)
{
  PyObject *array_type = get_array_type();
  PyObject *array = PyObject_CallFunction(array_type, "c", 'd');
  PyObject *append = PyObject_GetAttrString(array, "append");
  while (!source_done(src)) {
    double num = source_next(src);
    PyObject *f = PyFloat_FromDouble(num);
    PyObject *ret = PyObject_CallFunctionObjArgs(append, f, NULL);
    if (!ret)
      return NULL;
    Py_DECREF(ret);
    Py_DECREF(f);
  }
  Py_DECREF(array_type);
  return array;
}

The inner loop must convert each C double to a Python float, only for the array to immediately extract the double back out of the float and store it in the underlying array of C doubles. This may seem like a nitpick, but it turns out that more than half of this function's runtime is spent creating and deleting those short-lived floating-point objects.

Float creation is already well-optimized, so opportunities for speedup lie elsewhere. The array object exposes a writable buffer, which can be used to store values directly. For test purposes I created a faster "append" specialized for doubles, defined like this:

int array_append(PyObject *array, PyObject *appendfun, double val)
{
  PyObject *ret;
  double *buf;
  Py_ssize_t bufsize;
  static PyObject *zero;
  if (!zero)
    zero = PyFloat_FromDouble(0);

  // append dummy zero value, created only once
  ret = PyObject_CallFunctionObjArgs(appendfun, zero, NULL);
  if (!ret)
    return -1;
  Py_DECREF(ret);

  // append the element directly at the end of the C buffer
  if (PyObject_AsWriteBuffer(array, (void **) &buf, &bufsize) < 0)
    return -1;
  buf[bufsize / sizeof(double) - 1] = val;
  return 0;
}

This hack actually speeds up array creation by a significant percentage (30-40% in my case, and that's for code that was producing the values by parsing a large text file).
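For what it's worth, the same dummy-append-then-overwrite trick can be sketched from pure Python using memoryview (a rough analogue of the C code above, not the code itself; the helper name is made up):

```python
import array

def append_via_buffer(arr, val):
    """Mimic the C hack: append a placeholder, then write through the buffer."""
    arr.append(0.0)           # grow the array by one slot
    with memoryview(arr) as mv:
        mv[-1] = val          # store directly into the last slot
    # the memoryview must be released before the array can be resized again

a = array.array('d')
for x in (1.5, 2.5, 3.5):
    append_via_buffer(a, x)
```

From Python this is of course slower than a plain append, but it shows the same mechanics: grow first, then write through the exposed buffer.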

It turns out that an even faster method of creating an array is by using the fromstring() method. fromstring() requires an actual string, not a buffer, so in C++ I created an std::vector<double> with a contiguous array of doubles, passed that array to PyString_FromStringAndSize, and called array.fromstring with the resulting string. Despite all the unnecessary copying, the result was much faster than either of the previous versions.
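The bulk path can be exercised from Python as well; here struct.pack stands in for the C++ vector's contiguous storage (an illustrative sketch, not code from the post; frombytes() is the Python 3 name of fromstring()):

```python
import array
import struct

values = [0.5 * i for i in range(8)]
# pack the doubles into one contiguous byte string, much like the
# std::vector<double> handed to PyString_FromStringAndSize
raw = struct.pack("%dd" % len(values), *values)

arr = array.array('d')
arr.frombytes(raw)  # one bulk memcpy instead of per-element appends
```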


Would it be possible for the array module to define a C interface for the most frequent operations on array objects, such as appending an item, and getting/setting an item? Failing that, could we at least make fromstring() accept an arbitrary read buffer, not just an actual string?
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev