Re: [Cython] Cython syntax to pre-allocate lists for performance

2013-03-07 Thread Stefan Behnel
Yury V. Zaytsev, 07.03.2013 12:16:
 Is there any syntax that I can use to do something like this in Cython:
 
 py_object_ = PyList_New(123); ?

Note that Python has an algorithm for shrinking a list on appending, so
this might not be sufficient for your use case.


 If not, do you think that this can be added in one way or another?
 
 Unfortunately, I can't think of a non-disruptive way of doing it. For
 instance, if this
 
 [None] * N
 
 is given a completely new meaning, like making an empty list (of NULLs),
 instead of making a real list of Nones, it will certainly break Python
 code. Besides, it would probably still be faster than no pre-allocation,
 but slower than an empty list with pre-allocation...
 
 Maybe
 
 [NULL] * N ?

What do you need it for?

Won't list comprehensions work for you? They could potentially be adapted
to presize the list.

And why won't [None]*N help you out? It should be pretty cheap.

Stefan



Re: [Cython] Cython syntax to pre-allocate lists for performance

2013-03-07 Thread Zaur Shibzukhov
2013/3/7 Stefan Behnel stefan...@behnel.de

 Yury V. Zaytsev, 07.03.2013 12:16:
  Is there any syntax that I can use to do something like this in Cython:
 
  py_object_ = PyList_New(123); ?

 Note that Python has an algorithm for shrinking a list on appending, so
 this might not be sufficient for your use case.


  If not, do you think that this can be added in one way or another?
 
  Unfortunately, I can't think of a non-disruptive way of doing it. For
  instance, if this
 
  [None] * N
 
  is given a completely new meaning, like making an empty list (of NULLs),
  instead of making a real list of Nones, it will certainly break Python
  code. Besides, it would probably still be faster than no pre-allocation,
  but slower than an empty list with pre-allocation...
 
  Maybe
 
  [NULL] * N ?

 What do you need it for?

 Won't list comprehensions work for you? They could potentially be adapted
 to presize the list.


I guess the problem is to construct a new (even empty) list with memory
pre-allocated for exactly N elements.

N*[NULL] changes semantics, because there can't be a list with N elements
filled with NULL.
N*[None] is more expensive for further assignments because of the Py_DECREFs.

I suppose that N*[] could do the trick. It could be optimized so that N*[]
evaluates to an empty list, but with memory preallocated for exactly N
elements. Could it be?

Zaur Shibzukhov


Re: [Cython] Cython syntax to pre-allocate lists for performance

2013-03-07 Thread Robert Bradshaw
On Thu, Mar 7, 2013 at 6:48 AM, Stefan Behnel stefan...@behnel.de wrote:
 Zaur Shibzukhov, 07.03.2013 15:39:
 2013/3/7 Zaur Shibzukhov

 I guess the problem is to construct a new (even empty) list with memory
 pre-allocated for exactly N elements.

 N*[NULL] changes semantics, because there can't be a list with N elements
 filled with NULL.
 N*[None] is more expensive for further assignments because of the Py_DECREFs.

 I suppose that N*[] could do the trick.

 That looks wrong to me.


 It could be optimized so that N*[]
 evaluates to an empty list, but with memory preallocated for exactly N
 elements. Could it be?

 Cython already optimizes PyList_Append very well. Therefore the scenario
 where one first creates an empty list with memory preallocated for exactly
 N elements, and then evaluates the elements and adds them using plain
 list.append, could be optimized very well in Cython too. As a result the
 constructed list will hold memory for only N elements. This avoids wasting
 memory when one needs to build many, many lists of relatively big size.

 I prefer not adding any new syntax as long as we are not sure we can't fix
 it by making list comprehensions smarter. I tried this a while ago and have
 some initial code lying around somewhere in my patch queue. Didn't have the
 time to make it usable, though, also because Cython didn't have its own
 append() for list comprehensions at the time. It does now, as you noted.

There are several cases where we can get the size of the result list
upfront, so we can certainly do better here now. I'm also -1 on adding
special syntax for populating a list with NULL values; if you really
want to do this (and I doubt it really matters in most cases), calling
PyList_New is the syntax to use.
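
A minimal Cython sketch of that approach, assuming the cpython.list and
cpython.ref cimports that ship with Cython; the helper name and the fill
value are purely illustrative. The key detail is that PyList_SET_ITEM
steals a reference, so each stored object has to be increfed by hand:

from cpython.list cimport PyList_New, PyList_SET_ITEM
from cpython.ref cimport Py_INCREF

cdef list fill_list(Py_ssize_t n, object value):
    # Sketch only: build a list of length n by writing directly into the
    # NULL slots created by PyList_New(), instead of pre-filling with None.
    cdef Py_ssize_t i
    cdef list result = PyList_New(n)
    for i in range(n):
        Py_INCREF(value)               # PyList_SET_ITEM steals this reference
        PyList_SET_ITEM(result, i, value)
    return result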

- Robert


Re: [Cython] Cython syntax to pre-allocate lists for performance

2013-03-07 Thread Yury V. Zaytsev
Hi Stefan,

On Thu, 2013-03-07 at 12:21 +0100, Stefan Behnel wrote:

 Note that Python has an algorithm for shrinking a list on appending, so
 this might not be sufficient for your use case. What do you need it for?

W00t! I didn't know about that.

I'm wrapping C++ code that has to transmit large lists of objects to
Python, where these objects are stored in something vector-like that
shouldn't be exposed directly. In the past, they did something like

obj = PyList_New(a.size());
for (a.begin(); a.end(); ++a) PyList_SetItem(obj, ...)

I figured I could translate it into a while loop like

obj = []
while (it != a.end()):
    obj.append(it)
    inc(it)

but then I'm not using the information about the size of a that I
already have, and for huge lists this tends to be quite slow.

I think this must be quite a common use case for bindings...

 And why won't [None]*N help you out? It should be pretty cheap.

It probably will, at least a bit. It just struck me that if I'm going to
do something along the lines of

idx = 0
obj = [None] * a.size()
while (it != a.end()):
    obj[idx] = it
    idx += 1
    inc(it)

I could also squeeze out the last bits of performance by avoiding the
creation of the Nones and the subsequent population of the list with them.

If you say I have to use PyList_New directly, oh well... It's just that
since I'm rewriting the bindings in Cython anyway, I'm also trying
to avoid using the Python C API directly as much as possible.
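
As a concrete illustration of the pre-sized variant that stays away from the
C API, here is a minimal Cython sketch; the vector[int] element type, the
function name and the C++ iteration details are assumptions, not part of the
original mail:

# distutils: language = c++
from libcpp.vector cimport vector
from cython.operator cimport dereference as deref, preincrement as inc

cdef list vector_to_list(vector[int]& a):
    # Sketch only: pre-size the list with None placeholders, then overwrite
    # them by index while walking the C++ iterator.
    cdef Py_ssize_t idx = 0
    cdef vector[int].iterator it = a.begin()
    cdef list obj = [None] * a.size()
    while it != a.end():
        obj[idx] = deref(it)
        idx += 1
        inc(it)
    return obj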

 Won't list comprehensions work for you? They could potentially be adapted
 to presize the list.

I guess not.

-- 
Sincerely yours,
Yury V. Zaytsev




Re: [Cython] Cython syntax to pre-allocate lists for performance

2013-03-07 Thread Robert Bradshaw
On Thu, Mar 7, 2013 at 10:26 AM, Yury V. Zaytsev y...@shurup.com wrote:
 Hi Stefan,

 On Thu, 2013-03-07 at 12:21 +0100, Stefan Behnel wrote:

 Note that Python has an algorithm for shrinking a list on appending, so
 this might not be sufficient for your use case. What do you need it for?

 W00t! I didn't know about that.

 I'm wrapping C++ code that has to transmit large lists of objects to
 Python, where these objects are stored in something vector-like that
 shouldn't be exposed directly. In the past, they did something like

 obj = PyList_New(a.size());
 for (a.begin(); a.end(); ++a) PyList_SetItem(obj, ...)

 I figured I could translate it into a while loop like

 obj = []
 while (it != a.end()):
     obj.append(it)
     inc(it)

 but then I'm not using the information about the size of a that I
 already have, and for huge lists this tends to be quite slow.

 I think this must be quite a common use case for bindings...

 And why won't [None]*N help you out? It should be pretty cheap.

 It probably will, at least a bit. It just struck me that if I'm going to
 do something along the lines of

 idx = 0
 obj = [None] * a.size()
 while (it != a.end()):
     obj[idx] = it
     idx += 1
     inc(it)

 I could also squeeze out the last bits of performance by avoiding the
 creation of the Nones and the subsequent population of the list with them.

 If you say I have to use PyList_New directly, oh well... It's just that
 since I'm rewriting the bindings in Cython anyway, I'm also trying
 to avoid using the Python C API directly as much as possible.

I would time the two approaches to see if it really matters.

 Won't list comprehensions work for you? They could potentially be adapted
 to presize the list.

 I guess not.

[o for o in a]

is nice and clean. If a has a size() method (a common STL pattern), we
could optimistically call that to do the pre-allocation.
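
For completeness, a self-contained sketch of that comprehension form,
assuming a hypothetical std::vector[int] argument; recent Cython (0.17+)
can iterate C++ STL containers directly, so no manual iterator handling is
needed:

# distutils: language = c++
from libcpp.vector cimport vector

cdef list to_list(vector[int]& a):
    # Sketch only: Cython iterates the container itself; the size() hint
    # mentioned above would be an internal compiler optimization.
    return [o for o in a]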

I don't know exactly what your use case is, but you might consider
simply exposing a list-like wrapper supporting __getitem__ and
iteration, rather than eagerly converting the entire thing to a list.
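
A rough sketch of what such a lazy wrapper could look like; the class name,
the vector[int] element type and the by-value storage are assumptions for
illustration only:

# distutils: language = c++
from libcpp.vector cimport vector

cdef class VectorView:
    # Sketch only: expose the C++ data through __len__/__getitem__ instead
    # of copying it into a list up front. Python's old sequence iteration
    # protocol falls back to __getitem__, so "for x in view" works as-is.
    cdef vector[int] _data        # real code might hold a pointer into C++ instead

    def __len__(self):
        return self._data.size()

    def __getitem__(self, Py_ssize_t i):
        if i < 0:
            i += <Py_ssize_t>self._data.size()
        if i < 0 or i >= <Py_ssize_t>self._data.size():
            raise IndexError(i)
        return self._data[i]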

- Robert


Re: [Cython] Cython syntax to pre-allocate lists for performance

2013-03-07 Thread Robert Bradshaw
On Thu, Mar 7, 2013 at 3:19 PM, Greg Ewing greg.ew...@canterbury.ac.nz wrote:
 Nikita Nemkin wrote:

 Sorry, accidental early send. Previous mail continued...

 [None] * N makes an extra pass over the list to assign None to each item
 (and also increfs None N times).


 Maybe this could be optimised by adding N to the reference
 count instead of incrementing it N times?

I'd be surprised if the C compiler doesn't.

http://hg.python.org/cpython/file/1d4849f9e37d/Objects/listobject.c#l515


Re: [Cython] Add support for list/tuple slicing

2013-03-07 Thread Zaur Shibzukhov
2013/3/7 Zaur Shibzukhov szp...@gmail.com:
 Currently Cython generates a generic PySequence_GetSlice/SetSlice call
 for list/tuple slicing.
 We could replace that with a native call to Py{List|Tuple}_GetSlice and
 PyList_SetSlice for lists/tuples.

There is an updated change that uses the utility code
__Pyx_Py{List|Tuple}_GetSlice, because Py{List|Tuple}_GetSlice doesn't
support negative indices. In CPython that job is done by the {list|tuple}slice
function via the type object's slot ({list|tuple}_subscript), but that slot
handles both indices and slice objects, which adds overhead. That's the
reason why PySequence_GetSlice is slower: it creates a slice object and
falls through to {list|tuple}_subscript. Therefore I added the utility code.

Here is utility code:

/// PyList_GetSlice.proto ///

static PyObject* __Pyx_PyList_GetSlice(
    PyObject* lst, Py_ssize_t start, Py_ssize_t stop);

/// PyList_GetSlice ///

PyObject* __Pyx_PyList_GetSlice(
    PyObject* lst, Py_ssize_t start, Py_ssize_t stop) {
    Py_ssize_t i, length;
    PyListObject* np;
    PyObject **src, **dest;
    PyObject *v;

    length = PyList_GET_SIZE(lst);

    if (start < 0) {
        start += length;
        if (start < 0)
            start = 0;
    }

    if (stop < 0)
        stop += length;
    else if (stop > length)
        stop = length;

    length = stop - start;
    if (length <= 0)
        return PyList_New(0);

    np = (PyListObject*) PyList_New(length);
    if (np == NULL)
        return NULL;

    src = ((PyListObject*)lst)->ob_item + start;
    dest = np->ob_item;
    for (i = 0; i < length; i++) {
        v = src[i];
        Py_INCREF(v);
        dest[i] = v;
    }
    return (PyObject*)np;
}

/// PyTuple_GetSlice.proto ///

static PyObject* __Pyx_PyTuple_GetSlice(
    PyObject* ob, Py_ssize_t start, Py_ssize_t stop);

/// PyTuple_GetSlice ///

PyObject* __Pyx_PyTuple_GetSlice(
    PyObject* ob, Py_ssize_t start, Py_ssize_t stop) {
    Py_ssize_t i, length;
    PyTupleObject* np;
    PyObject **src, **dest;
    PyObject *v;

    length = PyTuple_GET_SIZE(ob);

    if (start < 0) {
        start += length;
        if (start < 0)
            start = 0;
    }

    if (stop < 0)
        stop += length;
    else if (stop > length)
        stop = length;

    length = stop - start;
    if (length <= 0)
        return PyTuple_New(0);  /* an empty slice of a tuple is itself a tuple */

    np = (PyTupleObject *) PyTuple_New(length);
    if (np == NULL)
        return NULL;

    src = ((PyTupleObject*)ob)->ob_item + start;
    dest = np->ob_item;
    for (i = 0; i < length; i++) {
        v = src[i];
        Py_INCREF(v);
        dest[i] = v;
    }
    return (PyObject*)np;
}

Here is testing code:

list_slice.pyx
-

from cpython.sequence cimport PySequence_GetSlice

cdef extern from "list_tuple_slices.h":
    inline object __Pyx_PyList_GetSlice(object ob, int start, int stop)
    inline object __Pyx_PyTuple_GetSlice(object ob, int start, int stop)


cdef list lst = list(range(10))
cdef list lst2 = list(range(7))

def get_slice1(list lst):
    cdef int i
    cdef list res = []

    for i in range(20):
        res.append(PySequence_GetSlice(lst, 2, 8))

    return res

def get_slice2(list lst):
    cdef int i
    cdef list res = []

    for i in range(20):
        res.append(__Pyx_PyList_GetSlice(lst, 2, 8))

    return res

def test_get_slice1():
    get_slice1(lst)

def test_get_slice2():
    get_slice2(lst)

tuple_slicing.pyx
---

from cpython.sequence cimport PySequence_GetSlice

cdef extern from "list_tuple_slices.h":
    inline object __Pyx_PyList_GetSlice(object lst, int start, int stop)
    inline object __Pyx_PyTuple_GetSlice(object ob, int start, int stop)

cdef tuple lst = tuple(range(10))

def get_slice1(tuple lst):
    cdef int i
    cdef list res = []

    for i in range(20):
        res.append(PySequence_GetSlice(lst, 2, 8))

    return res

def get_slice2(tuple lst):
    cdef int i
    cdef list res = []

    for i in range(20):
        res.append(__Pyx_PyTuple_GetSlice(lst, 2, 8))

    return res


def test_get_slice1():
    get_slice1(lst)

def test_get_slice2():
    get_slice2(lst)

Here are timings:

for list

(py33) zbook:mytests $ python -m timeit -n 100 -r 5 -v -s "from mytests.list_slice import test_get_slice1" "test_get_slice1()"
raw times: 10.2 10.3 10.4 10.1 10.2
100 loops, best of 5: 101 msec per loop
(py33) zbook:mytests $ python -m timeit -n 100 -r 5 -v -s "from mytests.list_slice import test_get_slice1" "test_get_slice1()"
raw times: 10.3 10.3 10.2 10.3 10.2
100 loops, best of 5: 102 msec per loop
(py33) zbook:mytests $ python -m timeit -n 100 -r 5 -v -s "from mytests.list_slice import test_get_slice2" "test_get_slice2()"
raw times: 8.16 8.19 8.17 8.2 8.16
100 loops, best of 5: 81.6 msec per loop
(py33) zbook:mytests $ python -m timeit -n 100 -r 5 -v -s "from mytests.list_slice import test_get_slice2" "test_get_slice2()"
raw times: 8.1 8.05