On Tuesday 22 February 2011 23:23:48 you wrote:
> From: Siarhei Siamashka <siarhei.siamas...@nokia.com>
>
> Initial NEON optimization for bilinear scaling. Can be probably
> improved more.
>
> Benchmark on ARM Cortex-A8:
> Microbenchmark (scaling 2000x2000 image with scale factor close to 1x):
>  before: op=1, src=20028888, dst=20028888, speed=10.72 MPix/s
>  after:  op=1, src=20028888, dst=20028888, speed=44.27 MPix/s

And indeed, just adding prefetch to the bilinear scaling code provides
roughly 1.5x better performance on top of that. I'll try to make a separate
patch adding prefetch after testing how well it performs for different
scale factors.

It's interesting that prefetch did not help in the nearest scaling case,
probably because the LSU (load/store unit) was already overloaded with
handling many scattered memory accesses (or maybe because I did something
wrong that time). Bilinear scaling, however, also has a number crunching
part, so the prefetched data can arrive while the arithmetic is running.
That improves memory bandwidth utilization and provides a nice performance
boost.

--
Best regards,
Siarhei Siamashka
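
For illustration, here is a minimal plain C sketch of the idea, not the actual NEON code. It scales one a8r8g8b8 scanline with bilinear interpolation and issues a software prefetch for the source pixels that will be needed a few iterations later. The function names, the `PREFETCH_DISTANCE` value and the 16.16 fixed-point layout are assumptions made up for this example; `__builtin_prefetch` is the GCC/Clang builtin and falls back to a no-op elsewhere.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tuning knob: how many destination pixels ahead to prefetch. */
#define PREFETCH_DISTANCE 16

#if defined(__GNUC__)
#define PREFETCH_READ(p) __builtin_prefetch((p), 0, 0)
#else
#define PREFETCH_READ(p) ((void)(p))
#endif

/* Interpolate one 8-bit channel between a and b; w is in 0..256. */
static inline uint32_t lerp8(uint32_t a, uint32_t b, uint32_t w)
{
    return (a * (256 - w) + b * w) >> 8;
}

/* Blend four a8r8g8b8 pixels channel by channel; wx, wy are in 0..256. */
static uint32_t bilinear_pixel(uint32_t tl, uint32_t tr,
                               uint32_t bl, uint32_t br,
                               uint32_t wx, uint32_t wy)
{
    uint32_t result = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t t = lerp8((tl >> shift) & 0xff, (tr >> shift) & 0xff, wx);
        uint32_t b = lerp8((bl >> shift) & 0xff, (br >> shift) & 0xff, wx);
        result |= lerp8(t, b, wy) << shift;
    }
    return result;
}

/*
 * Scale one scanline from two source rows (top, bottom).
 * x and ux are the 16.16 fixed-point start coordinate and step.
 * The caller must keep every sampled pixel pair inside the source row.
 */
static void scale_scanline_bilinear(uint32_t *dst,
                                    const uint32_t *top,
                                    const uint32_t *bottom,
                                    uint32_t src_width, int width,
                                    uint32_t wy, uint32_t x, uint32_t ux)
{
    for (int i = 0; i < width; i++) {
        uint32_t xi = x >> 16;          /* integer source position */
        uint32_t wx = (x >> 8) & 0xff;  /* top 8 bits of the fraction */
        assert(xi + 1 < src_width);

        /* Ask for the pixels needed PREFETCH_DISTANCE iterations from now.
         * The index is clamped so the address stays inside the row. */
        uint32_t pf = (x + ux * PREFETCH_DISTANCE) >> 16;
        if (pf >= src_width)
            pf = src_width - 1;
        PREFETCH_READ(top + pf);
        PREFETCH_READ(bottom + pf);

        /* The arithmetic below overlaps with the outstanding prefetches. */
        dst[i] = bilinear_pixel(top[xi], top[xi + 1],
                                bottom[xi], bottom[xi + 1], wx, wy);
        x += ux;
    }
}
```

The prefetch only pays off when there is enough arithmetic per pixel to hide the memory latency, which is the difference between the bilinear and nearest cases described above. The right distance depends on the CPU and on the scale factor, so it would need to be measured rather than guessed.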