> time calculations are a valuable optimization, but I think inlining
> (which one can explicitly request in the C output) and loop unrolling
> are well handled by GCC and are probably best handled at this
> level for most things (for now at least).

Yes, I explained my rationale for that badly.

I wanted loop unrolling because I think the C code would look very bad
if it were littered with an extra for-loop for every NumPy lookup; see:

http://wiki.cython.org/enhancements/operators/ambitious

which I have updated and made clearer. As for inlining, as long as it
doesn't affect whether GCC caches the stride calculations, I'm fine
without it.

Details:

ctypedef class numpy ...
  def __getitem__(self, index):
    return (<int*>self.data)[self.strides[0] // 4 * index[0] +
                             self.strides[1] // 4 * index[1]]

The above is the code as it would look in the parse tree *after*
compile-time optimization, given the knowledge that the type is int and
the number of dimensions is 2. Note that the loop has been unrolled,
resulting in that list of + terms. Also note that those stride
calculations must either be cached by GCC or must also happen...hmm...
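
A minimal sketch of the equivalence, written in plain Python rather than
Cython so that it can be run directly. It is only an illustration under
stated assumptions: the function names lookup_loop and lookup_unrolled_2d
are hypothetical, a flat array from the standard array module stands in
for the NumPy data buffer, and the item size is read from the array
instead of being hard-coded as 4.

```python
from array import array


def lookup_loop(data, strides, itemsize, index):
    """Generic N-dimensional lookup: one loop iteration per dimension."""
    offset = 0
    for stride, i in zip(strides, index):
        # Strides are in bytes, so convert each to an element count.
        offset += stride // itemsize * i
    return data[offset]


def lookup_unrolled_2d(data, strides, itemsize, index):
    """The same lookup with the loop unrolled for exactly two dimensions."""
    return data[strides[0] // itemsize * index[0] +
                strides[1] // itemsize * index[1]]


# A 3x4 row-major array of C ints stored in a flat buffer (assumed layout).
rows, cols = 3, 4
data = array('i', range(rows * cols))
itemsize = data.itemsize
strides = (cols * itemsize, itemsize)  # byte strides, as NumPy reports them

value_loop = lookup_loop(data, strides, itemsize, (2, 1))
value_unrolled = lookup_unrolled_2d(data, strides, itemsize, (2, 1))
```

Both calls read the same element. The unrolled form is what the
compile-time optimization would emit once the number of dimensions is
known, and it leaves no per-lookup loop behind in the generated code.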

I really should do some experiments I guess...

Dag Sverre

_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
