On Tue, Feb 28, 2012 at 8:32 PM, Matt Turner <matts...@gmail.com> wrote:
> On Tue, Feb 28, 2012 at 1:20 PM, Lukic, Nemanja <nlu...@mips.com> wrote:
>> Good point.
>> Only problem there is that address on which we are storing might not be
>> 4-byte aligned (since we are doing memset on array of uint16_t).
>> But *dest can be aligned (with simple check) before the main loop, and then
>> instead of 16 x sh, we can use 8 x sw.
>> I will do that, and resubmit the patch.
>
> Ah, right. Co-alignment of src and dest makes this more complicated
> for blt.
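
For reference, the fill idea quoted above (align dest with a simple check,
then store words instead of halfwords) might look roughly like this in C.
This is only a sketch under my own assumptions: 'fill16_aligned' is a
hypothetical name, not a pixman function, and the real code would be MIPS
assembly with an unrolled loop of 'sw' instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of pixman): fill n 16-bit pixels. */
void fill16_aligned(uint16_t *dest, uint16_t value, size_t n)
{
    /* A uint16_t array is expected to be at least 2-byte aligned. */
    assert(((uintptr_t)dest & 1) == 0);

    /* Simple check before the main loop: if dest is not 4-byte
     * aligned, store one halfword so that it becomes aligned. */
    if (n > 0 && ((uintptr_t)dest & 3) != 0) {
        *dest++ = value;
        n--;
    }

    /* Both halves are identical, so endianness does not matter. */
    uint32_t pair = ((uint32_t)value << 16) | value;

    /* Main loop: one 32-bit store per two pixels. memcpy keeps this
     * free of strict-aliasing problems; compilers typically turn it
     * into a single word store. */
    while (n >= 2) {
        memcpy(dest, &pair, sizeof pair);
        dest += 2;
        n -= 2;
    }

    /* At most one trailing halfword is left. */
    if (n > 0)
        *dest = value;
}
```

That covers fill, where only dest has to be aligned. The blt case is the
harder one, because src and dest need to be co-aligned.
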

Looks like this is already done for blt. It uses the 'pixman_mips_fast_memcpy'
function, which appears to have more elaborate optimizations than the rest of
the MIPS assembly code (4-byte aligned writes and also better use of prefetch).
The only nitpick is that it works with byte granularity and has a bit of
extra overhead for 16-bit and 32-bit data. But developing special memcpy16
and memcpy32 variants just for this might not be worth the effort.

> For fill, it's pretty simple though.

BTW, there are some benchmarks for fill operations in lowlevel-blt-bench:
"src_n_8888" and "src_n_0565".

-- 
Best regards,
Siarhei Siamashka
_______________________________________________
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman