This is part of a convolution filter. The kernel weights sum to 16, so after
dividing by 16 the result stays in the same range as the input (uint8):
let value: int32 = r0[col0].int32 + r0[col1].int32 * 2 + r0[col2].int32 +
                   r1[col0].int32 * 2 + r1[col1].int32 * 4 + r1[col2].int32 * 2 +
                   r2[col0].int32 + r2[col1].int32 * 2 + r2[col2].int32
w1[col1] = (value / 16.0).uint8
I am looking for ways to improve its performance without getting into SIMD
(which I have never used, by the way). I suspect that all those type conversions
are killing the performance I should be getting.
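
For reference, an integer-only version of the same kernel could be sketched as below. It keeps the int32 accumulation but replaces the float division by 16.0 with a right shift by 4, which should give the same truncated result for non-negative sums. The proc name `smooth3x3Row`, its parameters, and the use of `c-1`/`c`/`c+1` in place of `col0`/`col1`/`col2` are assumptions made for illustration, not part of the original code, and whether this is actually faster would have to be measured.

```nim
# Hypothetical helper: the name and signature are illustrative, not from the post.
proc smooth3x3Row(r0, r1, r2: openArray[uint8]; dst: var openArray[uint8]) =
  ## Applies the 1-2-1 / 2-4-2 / 1-2-1 kernel to the interior columns of one row.
  for c in 1 ..< r1.len - 1:
    let sum = r0[c-1].int32 + r0[c].int32 * 2 + r0[c+1].int32 +
              r1[c-1].int32 * 2 + r1[c].int32 * 4 + r1[c+1].int32 * 2 +
              r2[c-1].int32 + r2[c].int32 * 2 + r2[c+1].int32
    # The weights sum to 16, so sum <= 255 * 16 = 4080 and (sum shr 4) <= 255.
    dst[c] = uint8(sum shr 4)
```

The border columns are left untouched here; how they should be handled depends on the rest of the filter.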