From: Siarhei Siamashka <siarhei.siamas...@nokia.com>

Software prefetch significantly improves bilinear scaling performance and pushes it up to memory bandwidth limit.

Benchmark on ARM Cortex-A8:

Microbenchmark (scaling 2000x2000 image with scale factor close to 1x):
 before: op=1, src=20028888, dst=20028888, speed=44.27 MPix/s
 after:  op=1, src=20028888, dst=20028888, speed=68.21 MPix/s

performance of nearest scaling for comparison:
         op=1, src=20028888, dst=20028888, speed=74.70 MPix/s
---
 pixman/pixman-arm-neon-asm.S |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/pixman/pixman-arm-neon-asm.S b/pixman/pixman-arm-neon-asm.S
index c168e10..1331abf 100644
--- a/pixman/pixman-arm-neon-asm.S
+++ b/pixman/pixman-arm-neon-asm.S
@@ -2504,8 +2504,11 @@ fname:
     vld1.32   {d16}, [TMP1]
     vld1.32   {d17}, [TMP2]
     vmull.u8  q9, d16, d28
+    add       TMP1, X, UX, asl #5 /* prefetch 32 pixels ahead (8 iterations) */
     vmlal.u8  q9, d17, d29
+    pld       [TOP, TMP1, asr #14]
     vshr.u16  q15, q12, #8
+    pld       [BOTTOM, TMP1, asr #14]
     vadd.u16  q12, q12, q13
     vshll.u16 q2, d6, #8
     vmlsl.u16 q2, d6, d30
-- 
1.7.3.4
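
Note on the address arithmetic: the three added instructions can be read as the C sketch below. It is an illustration only and not part of the patch. It assumes that X and UX are 16.16 fixed-point source coordinates and that source pixels are 4 bytes wide (as the 8888 formats in the benchmark suggest). The names fixed_16_16 and prefetch_byte_offset are hypothetical and do not exist in pixman.

```c
#include <stdint.h>

/* Assumption: 16.16 fixed point, i.e. 1.0 is represented as 0x10000. */
typedef int32_t fixed_16_16;

/*
 * Byte offset (relative to the TOP or BOTTOM scanline pointer) that the
 * added pld instructions would touch, under the assumptions above.
 *
 *   add TMP1, X, UX, asl #5     ->  ahead = x + ux * 32
 *   pld [TOP, TMP1, asr #14]    ->  offset = ahead >> 14
 *
 * Shifting right by 14 instead of 16 folds in the multiplication by
 * 4 bytes per pixel. The two lowest bits of the result still carry
 * fractional bits, which should not matter for a prefetch hint.
 *
 * Note: right-shifting a negative signed value is implementation-defined
 * in C, so this sketch is only meant for non-negative coordinates.
 */
static int32_t prefetch_byte_offset(fixed_16_16 x, fixed_16_16 ux)
{
    fixed_16_16 ahead = x + ux * 32;  /* position 32 steps of ux ahead */
    return ahead >> 14;               /* integer pixel index * 4 bytes */
}
```

With x = 0 and ux = 0x10000 (scale factor 1x) the offset is 128 bytes, which is 32 pixels ahead, matching the comment in the patch. With a scale step of 0.5 (ux = 0x8000) the same 32 steps cover only 16 source pixels, so the prefetch distance in the source image shrinks accordingly.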