> The swap could be done without temporaries, but I assume you're also
> trying to match the look of the pseudocode?

It would be interesting to see how fast the code can get without 
significantly altering its look, or alternatively how much one would have 
to change to achieve speedups.

I profiled the code for a 500 x 500 random matrix and, IIRC, the swaps took
about 0.5% of the execution time, so I'm not too concerned with those
particular lines.
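
For reference, here is a minimal sketch of the two swap styles under discussion. It assumes the code is Python and that the matrix is stored as a list of rows; neither is stated above, and the function names are hypothetical.

```python
def swap_with_temp(rows, i, j):
    """Swap rows i and j using an explicit temporary, as pseudocode often does."""
    tmp = rows[i]
    rows[i] = rows[j]
    rows[j] = tmp


def swap_tuple(rows, i, j):
    """Swap rows i and j via tuple assignment, with no named temporary."""
    rows[i], rows[j] = rows[j], rows[i]


# Hypothetical 3 x 2 matrix stored as a list of rows.
rows = [[1, 2], [3, 4], [5, 6]]

swap_with_temp(rows, 0, 2)
after_first_swap = [r[:] for r in rows]  # snapshot after the first swap

swap_tuple(rows, 0, 2)  # swapping back restores the original order
```

Both versions only rebind references to the row lists, so neither copies row data. That fits the observation that the swaps are a negligible share of the run time.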
