Hi Pauli

2009/7/9 Pauli Virtanen <pav...@iki.fi>:
> Unfortunately, improving the performance using the above scheme
> comes at the cost of some slightly murky heuristics. I didn't
> manage to come up with an optimal decision rule, so they are
> partly empirical. There is one parameter tuning the cross-over
> between minimizing stride and avoiding small dimensions. (This is
> more or less straightforward.) Another empirical decision is
> required in choosing whether to use the usual reduction loop,
> which is better in some cases, or the blocked loop. How to make
> this latter choice is not so clear to me.
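To make the "cross-over between minimizing stride and avoiding small dimensions" concrete, here is a minimal sketch of what such a single-parameter rule could look like. It is not Pauli's actual heuristic. The function name `choose_inner_axis` and the threshold `min_inner_len` are hypothetical, and the default of 16 is an arbitrary placeholder for the empirically tuned value. Strides are in bytes, as NumPy reports them (e.g. `(24, 8)` for a C-contiguous float64 array of shape `(1000, 3)`).

```python
def choose_inner_axis(shape, strides, min_inner_len=16):
    """Hypothetical rule for picking the axis to iterate innermost.

    Prefer the axis with the smallest absolute byte stride (best memory
    locality), but skip axes shorter than min_inner_len, because a very
    short inner loop spends most of its time on loop overhead.
    min_inner_len stands in for the tunable cross-over parameter; the
    default value is arbitrary, not a measured optimum.
    """
    axes = list(range(len(shape)))
    # Only consider axes long enough to amortize inner-loop overhead.
    long_axes = [ax for ax in axes if shape[ax] >= min_inner_len]
    # If every axis is short, fall back to considering all of them.
    candidates = long_axes if long_axes else axes
    return min(candidates, key=lambda ax: abs(strides[ax]))


# Axis 1 has the smallest stride but only 3 elements, so axis 0 wins.
print(choose_inner_axis((1000, 3), (24, 8)))    # 0
# Both axes are long enough, so the smallest stride (axis 1) wins.
print(choose_inner_axis((1000, 64), (512, 8)))  # 1
```

The single threshold is what makes the rule "murky": a fixed cut-off cannot capture how the cost of a short inner loop interacts with cache behaviour on a particular machine, which is why its value ends up being chosen empirically.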
I know very little about cache optimality, so excuse the triviality of
this question: Is it possible to design this loop optimally (taking
into account certain build-time measurable parameters), or is it the
kind of thing that can only be discovered by tuning at compile-time?
ATNumPy... scary :-)

Cheers
Stéfan
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion