Re: [Cython] Cython syntax to pre-allocate lists for performance
2013/3/7 Stefan Behnel stefan...@behnel.de

Yury V. Zaytsev, 07.03.2013 12:16:

Is there any syntax that I can use to do something like this in Cython: py_object_ = PyList_New(123); ?

Note that Python has an algorithm for shrinking a list on appending, so this might not be sufficient for your use case.

If not, do you think that this can be added in one way or another? Unfortunately, I can't think of a non-disruptive way of doing it. For instance, if [None] * N is given a completely new meaning, like "make an empty list (of NULLs)" instead of making a real list of Nones, it will certainly break Python code. Besides, it would probably still be faster than no pre-allocation, but slower than an empty list with pre-allocation... Maybe [NULL] * N ?

What do you need it for? Won't list comprehensions work for you? They could potentially be adapted to presize the list.

I guess the problem is to construct a new (even empty) list with memory pre-allocated for exactly N elements.

N*[NULL] changes semantics, because there can't be a list with N elements that is filled with NULL.
N*[None] is more expensive for further assignments because of the Py_DECREFs.

I suppose that N*[] could do the trick. It could be optimized so that N*[] is equal to an empty list, but with memory preallocated for exactly N elements. Could it be?

Zaur Shibzukhov
___
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel
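To make the trade-off in this thread concrete, here is a minimal pure-Python sketch of the two options that are available today without new syntax: growing a list with append, and pre-sizing it with N*[None] followed by index assignment. The function names are made up for illustration; the proposed N*[] or [NULL] * N forms are not valid here and are not shown.

```python
def fill_by_append(n):
    """Grow the list one element at a time; the list may reallocate as it grows."""
    result = []
    for i in range(n):
        result.append(i * i)
    return result


def fill_preallocated(n):
    """Pre-size the list with None placeholders, then overwrite each slot."""
    result = [None] * n  # a real list holding n references to None
    for i in range(n):
        # Each store releases the reference to None that the slot held,
        # which is the Py_DECREF cost mentioned in the thread.
        result[i] = i * i
    return result


if __name__ == "__main__":
    print(fill_by_append(5) == fill_preallocated(5))
```

Both functions return the same list; they differ only in how the storage is obtained, which is the part the thread wants to optimize.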
Re: [Cython] Add support for list/tuple slicing
2013/3/7 Zaur Shibzukhov szp...@gmail.com:

Currently Cython generates a general PySequence_GetSlice/SetSlice call for slicing of lists/tuples. We could replace that with native calls to Py{List|Tuple}_GetSlice and PyList_SetSlice for lists/tuples.

Here is an updated change that uses utility code __Pyx_Py{List|Tuple}_GetSlice, because Py{List|Tuple}_GetSlice doesn't support negative indices. In CPython that job is done by the {list|tuple}slice function, called from the type object's slot ({list|tuple}_subscript), but the slot handles both indices and slice objects, which adds overhead. That's the reason why PySequence_GetSlice is slower: it creates a slice object and falls back to {list|tuple}_subscript. Therefore I added utility code.

Here is the utility code:

/// PyList_GetSlice.proto ///

static PyObject* __Pyx_PyList_GetSlice(
        PyObject* lst, Py_ssize_t start, Py_ssize_t stop);

/// PyList_GetSlice ///

PyObject* __Pyx_PyList_GetSlice(
        PyObject* lst, Py_ssize_t start, Py_ssize_t stop) {
    Py_ssize_t i, length;
    PyListObject* np;
    PyObject **src, **dest;
    PyObject *v;

    length = PyList_GET_SIZE(lst);

    /* normalize negative indices and clamp to the list bounds */
    if (start < 0) {
        start += length;
        if (start < 0)
            start = 0;
    }

    if (stop < 0)
        stop += length;
    else if (stop > length)
        stop = length;

    length = stop - start;
    if (length <= 0)
        return PyList_New(0);

    np = (PyListObject*) PyList_New(length);
    if (np == NULL)
        return NULL;

    src = ((PyListObject*)lst)->ob_item + start;
    dest = np->ob_item;
    for (i = 0; i < length; i++) {
        v = src[i];
        Py_INCREF(v);
        dest[i] = v;
    }

    return (PyObject*)np;
}

/// PyTuple_GetSlice.proto ///

static PyObject* __Pyx_PyTuple_GetSlice(
        PyObject* ob, Py_ssize_t start, Py_ssize_t stop);

/// PyTuple_GetSlice ///

PyObject* __Pyx_PyTuple_GetSlice(
        PyObject* ob, Py_ssize_t start, Py_ssize_t stop) {
    Py_ssize_t i, length;
    PyTupleObject* np;
    PyObject **src, **dest;
    PyObject *v;

    length = PyTuple_GET_SIZE(ob);

    if (start < 0) {
        start += length;
        if (start < 0)
            start = 0;
    }

    if (stop < 0)
        stop += length;
    else if (stop > length)
        stop = length;

    length = stop - start;
    if (length <= 0)
        /* an empty tuple slice must be a tuple, not a list */
        return PyTuple_New(0);

    np = (PyTupleObject *) PyTuple_New(length);
    if (np == NULL)
        return NULL;

    src = ((PyTupleObject*)ob)->ob_item + start;
    dest = np->ob_item;
    for (i = 0; i < length; i++) {
        v = src[i];
        Py_INCREF(v);
        dest[i] = v;
    }

    return (PyObject*)np;
}

Here is the testing code:

list_slice.pyx
--------------

from cpython.sequence cimport PySequence_GetSlice

cdef extern from "list_tuple_slices.h":
    inline object __Pyx_PyList_GetSlice(object ob, int start, int stop)
    inline object __Pyx_PyTuple_GetSlice(object ob, int start, int stop)

cdef list lst = list(range(10))
cdef list lst2 = list(range(7))

def get_slice1(list lst):
    cdef int i
    cdef list res = []
    for i in range(20):
        res.append(PySequence_GetSlice(lst, 2, 8))
    return res

def get_slice2(list lst):
    cdef int i
    cdef list res = []
    for i in range(20):
        res.append(__Pyx_PyList_GetSlice(lst, 2, 8))
    return res

def test_get_slice1():
    get_slice1(lst)

def test_get_slice2():
    get_slice2(lst)

tuple_slicing.pyx
-----------------

from cpython.sequence cimport PySequence_GetSlice

cdef extern from "list_tuple_slices.h":
    inline object __Pyx_PyList_GetSlice(object lst, int start, int stop)
    inline object __Pyx_PyTuple_GetSlice(object ob, int start, int stop)

cdef tuple lst = tuple(range(10))

def get_slice1(tuple lst):
    cdef int i
    cdef list res = []
    for i in range(20):
        res.append(PySequence_GetSlice(lst, 2, 8))
    return res

def get_slice2(tuple lst):
    cdef int i
    cdef list res = []
    for i in range(20):
        res.append(__Pyx_PyTuple_GetSlice(lst, 2, 8))
    return res

def test_get_slice1():
    get_slice1(lst)

def test_get_slice2():
    get_slice2(lst)

Here are the timings for list:

(py33) zbook:mytests $ python -m timeit -n 100 -r 5 -v -s "from mytests.list_slice import test_get_slice1" "test_get_slice1()"
raw times: 10.2 10.3 10.4 10.1 10.2
100 loops, best of 5: 101 msec per loop

(py33) zbook:mytests $ python -m timeit -n 100 -r 5 -v -s "from mytests.list_slice import test_get_slice1" "test_get_slice1()"
raw times: 10.3 10.3 10.2 10.3 10.2
100 loops, best of 5: 102 msec per loop

(py33) zbook:mytests $ python -m timeit -n 100 -r 5 -v -s "from mytests.list_slice import test_get_slice2" "test_get_slice2()"
raw times: 8.16 8.19 8.17 8.2 8.16
100 loops, best of 5: 81.6 msec per loop

(py33) zbook:mytests $ python -m timeit -n 100 -r 5 -v -s "from mytests.list_slice import test_get_slice2" "test_get_slice2()"
raw times: 8.1 8.05
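The index handling in the utility code above can be checked against Python's own slicing. The following is a rough pure-Python model of the same clamping steps, assuming the comparison operators lost in the archive were `<`, `>` and `<=`; the helper name `get_slice_model` is hypothetical and is not part of Cython.

```python
def get_slice_model(seq, start, stop):
    """Model of the start/stop normalization used by the proposed helpers."""
    length = len(seq)

    # A negative start counts from the end and is clamped at 0.
    if start < 0:
        start += length
        if start < 0:
            start = 0

    # A negative stop counts from the end; a too-large stop is clamped.
    if stop < 0:
        stop += length
    elif stop > length:
        stop = length

    count = stop - start
    if count <= 0:
        return type(seq)()  # empty result of the same type (list or tuple)

    # Copy element by element, as the C loop does.
    return type(seq)(seq[i] for i in range(start, stop))


if __name__ == "__main__":
    data = list(range(10))
    print(get_slice_model(data, 2, 8), get_slice_model(tuple(data), -3, 100))
```

For step-free slices the model should agree with `seq[start:stop]`, including for out-of-range and negative arguments, which is a quick way to sanity-check the C version's branches.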
[Cython] nonecheck and as_none_safe_node method
In ExprNodes.py there are several places where the method `as_none_safe_node` is applied in order to wrap a node in a NoneCheckNode. I think it would be reasonable to apply it mostly only in cases when nonecheck=True.

Here are possible changes in ExprNodes.py: https://github.com/intellimath/cython/commit/bd041680b78067007ad6b9894a2f2c18514e397c

Zaur Shibzukhov
Re: [Cython] nonecheck and as_none_safe_node method
2013/3/5 Zaur Shibzukhov szp...@gmail.com

2013/3/5 Zaur Shibzukhov szp...@gmail.com

In ExprNodes.py there are several places where the method `as_none_safe_node` is applied in order to wrap a node in a NoneCheckNode. I think it would be reasonable to apply it mostly only in cases when nonecheck=True.

Here are possible changes in ExprNodes.py: https://github.com/intellimath/cython/commit/bd041680b78067007ad6b9894a2f2c18514e397c

This change would prevent generation of None checks for objects (lists, tuples, unicode) when nonecheck=True.

Sorry... when nonecheck=False.

Any ideas?

Zaur Shibzukhov

--
Best regards, Z. M. Shibzukhov
Re: [Cython] To Add datetime.pxd to cython.cpython
2013/3/3 Zaur Shibzukhov szp...@gmail.com:

2013/3/2 Stefan Behnel stefan...@behnel.de:

Hi, the last pull request looks good to me now. https://github.com/cython/cython/pull/189 Any more comments on it?

As was suggested earlier, I added an `import_datetime` inline function to initialize the PyDateTime C API instead of using the non-native C macros from datetime.h directly. Now you call `import_datetime()` first, in the same way as `import_array()` is called for `numpy`. This approach looks natural in the light of experience with numpy.

I made some performance comparisons. Here is an example for dates. Here is the test code:

# test_date.pyx

from cpython.datetime cimport import_datetime, date_new, date

import_datetime()

from datetime import date as pydate

def test_date1():
    cdef list lst = []
    for year in range(1000, 2001):
        for month in range(1,13):
            for day in range(1, 20):
                d = pydate(year, month, day)
                lst.append(d)
    return lst

def test_date2():
    cdef list lst = []
    for year in range(1000, 2001):
        for month in range(1,13):
            for day in range(1, 20):
                d = date(year, month, day)
                lst.append(d)
    return lst

def test_date3():
    cdef list lst = []
    cdef int year, month, day
    for year in range(1000, 2001):
        for month in range(1,13):
            for day in range(1, 20):
                d = date_new(year, month, day)
                lst.append(d)
    return lst

def test1():
    l = test_date1()
    return l

def test2():
    l = test_date2()
    return l

def test3():
    l = test_date3()
    return l

Here are the timings:

(py32)zbook:mytests $ python -m timeit -n 50 -r 5 -s "from mytests.test_date import test1" "test1()"
50 loops, best of 5: 83.2 msec per loop

(py32)zbook:mytests $ python -m timeit -n 50 -r 5 -s "from mytests.test_date import test2" "test2()"
50 loops, best of 5: 74.7 msec per loop

(py32)zbook:mytests $ python -m timeit -n 50 -r 5 -s "from mytests.test_date import test3" "test3()"
50 loops, best of 5: 20.9 msec per loop

OSX 10.6.8 64 bit, python 3.2

Shibzukhov Zaur
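The command-line timings above can also be driven from Python with the standard `timeit` module. This sketch only covers the pure-Python baseline (the loop shape of test_date1, using `datetime.date`); it does not need the compiled module, and the names `build_dates` and `best_msec_per_loop` are invented for the illustration.

```python
import timeit
from datetime import date


def build_dates():
    """Same loop shape as test_date1: 1001 years x 12 months x 19 days."""
    dates = []
    for year in range(1000, 2001):
        for month in range(1, 13):
            for day in range(1, 20):
                dates.append(date(year, month, day))
    return dates


def best_msec_per_loop(func, number=50, repeat=5):
    """Mirror `python -m timeit -n NUMBER -r REPEAT`: best total / number."""
    totals = timeit.repeat(func, number=number, repeat=repeat)
    return min(totals) / number * 1000.0


if __name__ == "__main__":
    print("%.1f msec per loop" % best_msec_per_loop(build_dates))
```

Absolute numbers will differ from the ones quoted in the thread, since they depend on the machine and the Python version; only the relative ordering of the three variants is meaningful.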
Re: [Cython] To Add datetime.pxd to cython.cpython
2013/3/3 Zaur Shibzukhov szp...@gmail.com:

More accurate test...
# coding: utf-8

from cpython.datetime cimport import_datetime, date_new, date

import_datetime()

from datetime import date as pydate

def test_date1():
    cdef list lst = []
    cdef int year, month, day
    for year in range(1000, 2001):
        for month in range(1,13):
            for day in range(1, 20):
                d = pydate(year, month, day)
                lst.append(d)
    return lst

def test_date2():
    cdef list lst = []
    cdef int year, month, day
    for year in range(1000, 2001):
        for month in range(1,13):
            for day in range(1, 20):
                d = date(year, month, day)
                lst.append(d)
    return lst

def test_date3():
    cdef list lst = []
    cdef int year, month, day
    for year in range(1000, 2001):
        for month in range(1,13):
            for day in range(1, 20):
                d = date_new(year, month, day)
                lst.append(d)
    return lst

def test1():
    l = test_date1()
    return l

def test2():
    l = test_date2()
    return l

def test3():
    l = test_date3()
    return l

Timings:

(py32)zbook:mytests $ python -m timeit -n 50 -r 5 -s "from mytests.test_date import test1" "test1()"
50 loops, best of 5: 83.3 msec per loop

(py32)zbook:mytests $ python -m timeit -n 50 -r 5 -s "from mytests.test_date import test2" "test2()"
50 loops, best of 5: 74.6 msec per loop

(py32)zbook:mytests $ python -m timeit -n 50 -r 5 -s "from mytests.test_date import test3" "test3()"
50 loops, best of 5: 20.8 msec per loop

Yet another performance comparison, for `time`:

# coding: utf-8

from cpython.datetime cimport import_datetime, time_new, time

import_datetime()

from datetime import time as pytime

def test_time1():
    cdef list lst = []
    cdef int hour, minute, second, microsecond
    for hour in range(0, 24):
        for minute in range(0,60):
            for second in range(0, 60):
                for microsecond in range(0, 10, 5):
                    d = pytime(hour, minute, second, microsecond)
                    lst.append(d)
    return lst

def test_time2():
    cdef list lst = []
    cdef int hour, minute, second, microsecond
    for hour in range(0, 24):
        for minute in range(0,60):
            for second in range(0, 60):
                for microsecond in range(0, 10, 5):
                    d = time(hour, minute, second, microsecond)
                    lst.append(d)
    return lst

def test_time3():
    cdef list lst = []
    cdef int hour, minute, second, microsecond
    for hour in range(0, 24):
        for minute in range(0,60):
            for second in range(0, 60):
                for microsecond in range(0
Re: [Cython] About IndexNode and unicode[index]
2013/3/2 Stefan Behnel stefan...@behnel.de:

I think you could even pass in two flags, one for wraparound and one for boundscheck, and then just evaluate them appropriately in the existing if tests above. That should allow both features to be supported independently in a fast way.

https://github.com/scoder/cython/commit/cc4f7daec3b1f19b5acaed7766e2b6f86902ad94

Does it make sense to include the following directives at the beginning of the tests (the ones that test indexing of lists, tuples and unicode)

#cython: boundscheck=True
#cython: wraparound=True

as the default mode for testing?

--
Best regards, Z. M. Shibzukhov
Re: [Cython] About IndexNode and unicode[index]
I think you could even pass in two flags, one for wraparound and one for boundscheck, and then just evaluate them appropriately in the existing if tests above. That should allow both features to be supported independently in a fast way.

Interesting. Can C compilers in optimization mode eliminate unused evaluation paths in nested if statements with constant conditional expressions?

They'd be worthless if they didn't do that. (Even Cython does it, BTW.)

Then it can simplify writing utility code in order to support different optimization flags in other cases too.
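The two-flag idea can be sketched in plain Python. This is only a model of the control flow, with a made-up function name `get_item`; in the generated C code the flags would be compile-time constants, so an optimizing compiler can drop whichever branch is disabled.

```python
def get_item(seq, index, wraparound=True, boundscheck=True):
    """Index into seq, with each feature controlled by its own flag."""
    length = len(seq)

    # Feature 1: negative indices count from the end.
    if wraparound and index < 0:
        index += length

    # Feature 2: reject indices outside [0, length).
    if boundscheck and not (0 <= index < length):
        raise IndexError("index out of range")

    # Note: Python itself still checks here; the C version would
    # read the item directly once both tests have been passed or skipped.
    return seq[index]


if __name__ == "__main__":
    print(get_item([10, 20, 30], -1))
```

Because each `if` tests exactly one flag, the four combinations of wraparound and boundscheck need no separate code paths, which is what makes the approach attractive for utility code.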