> time calculations are a valuable optimization, but I think inlining
> (which one can explicitly request in the C output) and loop unrolling
> are well handled by GCC and are probably best handled at this
> level for most things (for now at least).
Yes, I explained my rationale for that badly. The reason I wanted to do
loop unrolling is that I think it would look very bad if the C code were
littered with extra for-loops for every NumPy lookup; see

http://wiki.cython.org/enhancements/operators/ambitious

which I have updated and made clearer.

About inlining: as long as it doesn't affect whether GCC caches the
stride calculations, I'm fine without it.

Details:

    ctypedef class numpy ...
        def __getitem__(self, index):
            return (<int*>self.data)[self.strides[0] // 4 * index[0] +
                                     self.strides[1] // 4 * index[1]]

The above is the code as it would look in the parse tree *after*
compile-time optimization, with the knowledge that the type is int and
the number of dimensions is 2. Note that the loop has been unrolled,
resulting in that list of + terms. Also note that those stride
calculations must either be cached by GCC or must also happen...hmm...
I really should do some experiments, I guess...

Dag Sverre
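To make the unrolling concrete, here is a minimal plain-Python sketch (not Cython, and not the proposed implementation) that compares a generic per-dimension loop with the unrolled two-dimensional form of the same offset calculation. The names offset_generic, offset_unrolled_2d and ITEMSIZE are made up for illustration, and ITEMSIZE = 4 mirrors the hard-coded "// 4" in the snippet, i.e. it assumes a 4-byte int.

```python
ITEMSIZE = 4  # assumption: sizeof(int) == 4, as in the "// 4" of the snippet


def offset_generic(strides, index):
    """Element offset computed with a loop over all dimensions."""
    offset = 0
    for stride, i in zip(strides, index):
        offset += stride // ITEMSIZE * i
    return offset


def offset_unrolled_2d(strides, index):
    """Same calculation with the loop unrolled for exactly two dimensions."""
    return strides[0] // ITEMSIZE * index[0] + strides[1] // ITEMSIZE * index[1]


# A 3x4 C-contiguous array of 4-byte ints, modelled as a flat list.
rows, cols = 3, 4
data = list(range(rows * cols))
strides = (cols * ITEMSIZE, ITEMSIZE)  # byte strides, as NumPy reports them

# Both forms agree on every index and address the expected element.
for r in range(rows):
    for c in range(cols):
        assert offset_generic(strides, (r, c)) == offset_unrolled_2d(strides, (r, c))
        assert data[offset_unrolled_2d(strides, (r, c))] == r * cols + c
```

Note that the "stride // ITEMSIZE" terms do not depend on the index, which is why it matters whether they are recomputed on every lookup or hoisted out once. This sketch only shows the arithmetic and says nothing about what GCC actually does with it.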
