Hi Pauli

2009/7/9 Pauli Virtanen <pav...@iki.fi>:
> Unfortunately, improving the performance using the above scheme
> comes at the cost of some slightly murky heuristics.  I didn't
> manage to come up with an optimal decision rule, so they are
> partly empirical. There is one parameter tuning the cross-over
> between minimizing stride and avoiding small dimensions. (This is
> more or less straightforward.)  Another empirical decision is
> required in choosing whether to use the usual reduction loop,
> which is better in some cases, or the blocked loop. How to make
> this latter choice is not so clear to me.

I know very little about cache optimality, so excuse the triviality of
this question: Is it possible to design this loop optimally from the
outset (taking into account certain parameters that can be measured at
build time), or is it the kind of thing that can only be discovered by
empirical tuning at compile time?
ATNumPy... scary :-)
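
To make the "tuning" option concrete, here is a minimal sketch of what
empirical selection could look like. It is pure Python and only
illustrates the idea, not NumPy's actual C reduction loop or Pauli's
patch. The function names (reduce_rows_plain, reduce_rows_blocked,
pick_block) and the candidate block sizes are hypothetical, made up for
this example.

```python
import timeit


def reduce_rows_plain(data, nrows, ncols):
    """Usual reduction: out[j] = sum over i of data[i * ncols + j]."""
    out = [0.0] * ncols
    for i in range(nrows):
        base = i * ncols
        for j in range(ncols):
            out[j] += data[base + j]
    return out


def reduce_rows_blocked(data, nrows, ncols, block):
    """Same reduction, processed in column blocks of width `block`."""
    out = [0.0] * ncols
    for j0 in range(0, ncols, block):
        j1 = min(j0 + block, ncols)
        for i in range(nrows):
            base = i * ncols
            for j in range(j0, j1):
                out[j] += data[base + j]
    return out


def pick_block(data, nrows, ncols, candidates, repeat=3):
    """Time each candidate block size and return the fastest one.

    This is the empirical (ATLAS-like) route: no cache model, just
    measurement on the machine at hand.
    """
    best, best_time = None, float("inf")
    for block in candidates:
        elapsed = min(
            timeit.repeat(
                lambda: reduce_rows_blocked(data, nrows, ncols, block),
                number=1,
                repeat=repeat,
            )
        )
        if elapsed < best_time:
            best, best_time = block, elapsed
    return best
```

Calling pick_block(data, nrows, ncols, [1, 4, 16, 64]) returns whichever
of those widths ran fastest on the current machine. The answer can
differ between machines and between runs, which is exactly what makes
this approach harder to reason about than a fixed decision rule.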

Cheers
Stéfan
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion