On Tue, Feb 28, 2012 at 8:32 PM, Matt Turner <matts...@gmail.com> wrote:
> On Tue, Feb 28, 2012 at 1:20 PM, Lukic, Nemanja <nlu...@mips.com> wrote:
>> Good point.
>> The only problem is that the address we are storing to might not be
>> 4-byte aligned (since we are doing a memset on an array of uint16_t).
>> But *dest can be aligned (with a simple check) before the main loop, and
>> then instead of 16 x sh, we can use 8 x sw.
>> I will do that, and resubmit the patch.
>
> Ah, right. Co-alignment of src and dest makes this more complicated
> for blt.

Looks like this is already done for blt. It uses the
'pixman_mips_fast_memcpy' function, which appears to have more
elaborate optimizations than the rest of the MIPS assembly code (4-byte
aligned writes and also better use of prefetch). The only nitpick is
that it works with byte granularity, so it has a bit of extra overhead
for 16-bit and 32-bit data. But developing special memcpy16 and
memcpy32 variants just for this might not be worth the effort.

> For fill, it's pretty simple though.

BTW, there are some benchmarks for fill operations in
lowlevel-blt-bench - "src_n_8888" and "src_n_0565".
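
For illustration, the "align *dest first, then use sw instead of sh" idea
for a 16-bit fill could look roughly like the C sketch below. This is only
a sketch under assumptions: 'fill16_aligned' is a hypothetical name, not an
existing pixman function, and the real MIPS code would be hand-written
assembly with an unrolled main loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of pixman): fill 'count' 16-bit pixels. */
static void fill16_aligned(uint16_t *dest, uint16_t value, size_t count)
{
    /* Assumption: dest is at least 2-byte aligned, as any uint16_t array is. */
    assert(((uintptr_t)dest & 1) == 0);

    /* Head: one halfword store (sh) if dest is not yet 4-byte aligned. */
    if (count > 0 && ((uintptr_t)dest & 3) != 0) {
        *dest++ = value;
        count--;
    }

    /* Both halves hold the same pixel, so byte order does not matter. */
    uint32_t pair = ((uint32_t)value << 16) | value;

    /* Main loop: each word store (sw) writes two pixels at once. */
    while (count >= 2) {
        memcpy(dest, &pair, sizeof pair); /* aligned 4-byte store */
        dest += 2;
        count -= 2;
    }

    /* Tail: at most one leftover pixel. */
    if (count > 0)
        *dest = value;
}
```

The head step costs at most one extra halfword store, after which every
store in the main loop is 4-byte aligned.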

-- 
Best regards,
Siarhei Siamashka
_______________________________________________
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman
