> Thu, 23 Sep 2010 01:42:43 +0400, message from PS <[email protected]>:
> > > > The question I have is simple: the function
> > > > pixman_composite_over_n_8888_asm_##cpu takes height as the second (AFAIK)
> > > > parameter, it has some logic to handle height and line stride. Is there a
> > > > function that does the same but operates on single lines? That function
> > > > wouldn't need to know dst_stride and height, all I'd need to pass is
> > > > the dst_line, width, and color.
> > >
> > > Just pass in a height of 1. The stride will then be irrelevant.
> > >
> > > In any case, I think you may be overcomplicating things. All the
> > > things that Pixman is designed to do are accessible from the
> > > definitions in <pixman.h>.
> > >
> > > - Jonathan Morton
> > >
> > Of course, I pass 1 for height :) but then there is almost no performance
> > improvement from that neon-asm code. A regular C function (inlined
> > while(bits<bits_end){ ...} loop) is only around 10-15% slower.
> > It took me about 5 minutes to create
> > pixman_composite_line_over_n_8888_asm_sse2(int32_t width, uint32_t *dst_line,
> > uint32_t src) from sse2_composite_over_n_8888, but it's unclear
> > how I could write a similar function for arm-neon.
> I think I didn't search enough (too lazy to scroll a few more lines down
> from .macro generate_composite_function).
> The next macro in pixman-arm-neon-asm.h is what I was asking about, seems that
> it does what I really need (probably these pixman combine functions deal with
> cases like mine):
> /*
> * A simplified variant of function generation template for a single
> * scanline processing (for implementing pixman combine functions)
> */
> .macro generate_composite_function_single_scanline
> ...
> .
I used arm neon code from pixman as a reference to see how I could optimize my
code. The most useful function in my case would be
pixman_composite_scanline_over_n_8888 (could be generated with
generate_composite_function_single_scanline).
I reviewed asm code generated with it and made some changes to adapt it to my
needs. I think similar changes could be useful in pixman.
The first change I made is to accept only a pointer and a pixel count, and I
made a separate function to set the color (q0 and q1 registers).
The NEON-specific change I made is:
q0 contains 0x11aaaaaa (where aa is 0xff-alpha),
q1 contains 16-bit rgb values of the src color premultiplied by alpha as
follows: (color * alpha + 0x80). q1 is set to contain four 16-bit values: 0 for
alpha, and the other three for the rgb values.
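A minimal scalar sketch in C of what such a color-setting function might compute, assuming the reading above: one 8-bit multiplier (0xff - alpha) and three 16-bit addends (color * alpha + 0x80). The names fill_constants and set_fill_color are hypothetical, and the alpha lane (the 0x11 byte in q0 and the 0 in q1) is left out of the model.

```c
#include <stdint.h>

/* Hypothetical container for the constants that would be loaded into q0/q1. */
typedef struct {
    uint8_t  inv_alpha;  /* 0xff - alpha: per-channel multiplier (q0) */
    uint16_t term[3];    /* color * alpha + 0x80 for r, g, b (q1) */
} fill_constants;

/* r, g, b are non-premultiplied; the multiplication by alpha is done once here,
 * so the per-pixel loop only needs a multiply and an add. */
static fill_constants set_fill_color(uint8_t r, uint8_t g, uint8_t b, uint8_t alpha)
{
    fill_constants fc;
    fc.inv_alpha = (uint8_t)(0xff - alpha);
    /* Largest possible value is 255 * 255 + 0x80 = 65153, which fits in 16 bits. */
    fc.term[0] = (uint16_t)(r * alpha + 0x80);
    fc.term[1] = (uint16_t)(g * alpha + 0x80);
    fc.term[2] = (uint16_t)(b * alpha + 0x80);
    return fc;
}
```

In a real NEON version these values would be broadcast across the lanes of q0 and q1 once per fill, not once per scanline.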
Here's what pixman produces now:
vuzp.8 d4, d5
vuzp.8 d6, d7
vuzp.8 d5, d7
vuzp.8 d4, d6
vmvn d24, d3
vmull.u8 q8, d24, d4
vmull.u8 q9, d24, d5
vmull.u8 q10, d24, d6
vmull.u8 q11, d24, d7
vrshr.u16 q14, q8, #8
vrshr.u16 q15, q9, #8
vrshr.u16 q12, q10, #8
vrshr.u16 q13, q11, #8
vraddhn.i16 d28, q14, q8
vraddhn.i16 d29, q15, q9
vraddhn.i16 d30, q12, q10
vraddhn.i16 d31, q13, q11
vqadd.u8 q14, q0, q14
vqadd.u8 q15, q1, q15
vzip.8 d28, d30
vzip.8 d29, d31
vzip.8 d30, d31
vzip.8 d28, d29
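For reference, the arithmetic of the listing above can be modeled per channel in scalar C. This is only a sketch of my reading of the instructions: vmvn/vmull form dst * (255 - alpha), vrshr plus vraddhn perform a rounded division by 255, and vqadd adds the premultiplied source with saturation. The function names are hypothetical.

```c
#include <stdint.h>

/* Rounded x * a / 255, modeling vmull.u8 + vrshr.u16 #8 + vraddhn.i16. */
static uint8_t mul_div_255(uint8_t x, uint8_t a)
{
    uint32_t t = (uint32_t)x * a;        /* vmull.u8: 8x8 -> 16-bit product */
    uint32_t r = (t + 128) >> 8;         /* vrshr.u16 #8: rounding shift right */
    return (uint8_t)((t + r + 128) >> 8); /* vraddhn.i16: rounding add, keep high byte */
}

/* Saturating 8-bit add, modeling vqadd.u8. */
static uint8_t sat_add_u8(uint8_t a, uint8_t b)
{
    uint32_t s = (uint32_t)a + b;
    return (uint8_t)(s > 255 ? 255 : s);
}

/* One channel of OVER with a premultiplied source value. */
static uint8_t over_channel(uint8_t src_premul, uint8_t src_alpha, uint8_t dst)
{
    return sat_add_u8(src_premul, mul_div_255(dst, (uint8_t)(255 - src_alpha)));
}
```

The vuzp/vzip instructions have no counterpart in this model: they only convert between packed pixels and per-channel planes.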
I do this kind of operation instead:
vld1.32 {q2, q3}, [r0, :128]
vmull.u8 q4, d4, d0
vmull.u8 q5, d5, d0
vmull.u8 q6, d6, d0
vmull.u8 q7, d7, d0
vaddhn.i16 d16, q4, q1 //<--- should not be vraddhn
vaddhn.i16 d17, q5, q1
vaddhn.i16 d18, q6, q1
vaddhn.i16 d19, q7, q1
vst1.32 {q8, q9}, [r0, :128]!
This way it uses fewer NEON registers (no q12-q15 as in the vrshr step), doesn't
need to interleave/deinterleave color components, and has far fewer instructions.
It also produces results closer to what floating point alpha blending would
produce. The only possible drawback is that if the src color weren't properly
premultiplied, the color values would overflow. Since I added a separate function
to set the fill color, I do all premultiplication in that function and I know
that the colors will never overflow.
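The overflow argument can be checked with a scalar C model of one channel of the vmull.u8 / vaddhn.i16 pair. This is a sketch under the assumption that the multiplier is 0xff - alpha and the addend is color * alpha + 0x80 with an 8-bit color; blend_channel and worst_case_sum are hypothetical names.

```c
#include <stdint.h>

/* One channel of vmull.u8 followed by vaddhn.i16. */
static uint8_t blend_channel(uint8_t dst, uint8_t inv_alpha, uint16_t term)
{
    uint16_t product = (uint16_t)(dst * inv_alpha); /* vmull.u8: at most 65025 */
    uint16_t sum = (uint16_t)(product + term);      /* 16-bit add, wraps modulo 65536 */
    return (uint8_t)(sum >> 8);                     /* vaddhn keeps the high byte, no rounding */
}

/* Largest possible 16-bit sum for a given alpha when both dst and color are 255. */
static uint32_t worst_case_sum(uint8_t alpha)
{
    return 255u * (255u - alpha) + 255u * alpha + 0x80u;
}
```

Under these assumptions the worst-case sum is 255 * 255 + 0x80 = 65153 for every alpha, so the 16-bit add cannot wrap. Passing an addend that was not derived from the same alpha breaks this bound, and the result wraps silently, since vaddhn does not saturate. Note also that in this model the final step divides by 256 rather than 255.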
I copied simplified code here, without all the interleaving, for better
readability. Since the NEON code in pixman is heavily templated (asm macros), I
wasn't able to add my change as is to that code. Hopefully my change could be
used in pixman as well.