On Tuesday 01 May 2007 13:36, Kalle Vahlman wrote:
> 2007/5/1, Siarhei Siamashka <[EMAIL PROTECTED]>:
> > OK, thanks. It may take some time though. I'm still using old scratchbox
> > with mistral SDK here (did not have enough free time to upgrade yet).
> > Until I clean up my scratchbox mess, I can only provide some patch
> > without testing, if anybody courageous can try to build it :)
>
> Given that I fear not the perils of building a X server with
> nonstandard options[1], I shall be more than happy to conduct such
> adventurous acts :)
>
> And unless Mr. Kulve has objections, the results could be installed
> from a repository as well.
>
> [1]
> http://syslog.movial.fi/archives/47-Shadows-for-everyone-well,-not-really.html

OK, here is an untested patch for xserver that adds ARMv6-optimized YUV420
color format conversion. In theory it should compile and work (I have not
tried to build xserver myself, though). If it refuses to compile, fixing the
patch should not be too difficult. In the worst case, only video playback may
be broken. But if everything works as expected, video output performance
should become a lot better.

Video output performance can be tested with mplayer using the -benchmark
option: the 'VO:' stat shows how much time was spent on video output, and the
'VC:' stat shows how much time was spent on video decoding. The built-in video
player should also become faster. I don't know if that improvement can be
'scientifically' benchmarked, but it should drop fewer frames during
high-resolution video playback.

If any of you can build the xserver package with this patch, please put it up
for download somewhere or send it directly to me. Thanks.

diff -u -r -N xorg-server-1.1.99.3/hw/kdrive/omap/Makefile.am xorg-server-1.1.99.3.yuv420patch/hw/kdrive/omap/Makefile.am
--- xorg-server-1.1.99.3/hw/kdrive/omap/Makefile.am 2007-03-05 16:17:32.000000000 +0200
+++ xorg-server-1.1.99.3.yuv420patch/hw/kdrive/omap/Makefile.am 2007-05-01 15:04:43.000000000 +0300
@@ -1,5 +1,5 @@
 if XV
-XV_SRCS = omap_video.c
+XV_SRCS = omap_video.c omap_colorconv.S omap_colorconv.h
 endif
 
 if DEBUG
@@ -34,4 +34,4 @@
         $(TSLIB_FLAG) \
         $(DYNSYMS)
 
-EXTRA_DIST = omap_video.c
+EXTRA_DIST = omap_video.c omap_colorconv.S omap_colorconv.h
diff -u -r -N xorg-server-1.1.99.3/hw/kdrive/omap/omap_colorconv.h xorg-server-1.1.99.3.yuv420patch/hw/kdrive/omap/omap_colorconv.h
--- xorg-server-1.1.99.3/hw/kdrive/omap/omap_colorconv.h 1970-01-01 03:00:00.000000000 +0300
+++ xorg-server-1.1.99.3.yuv420patch/hw/kdrive/omap/omap_colorconv.h 2007-05-01 15:06:13.000000000 +0300
@@ -0,0 +1,45 @@
+/*
+ * Copyright © 2007 Siarhei Siamashka
+ *
+ * Permission to use, copy, modify, distribute and sell this software and its
+ * documentation for any purpose is hereby granted without fee, provided that
+ * the above copyright notice appear in all copies and that both that
+ * copyright notice and this permission notice appear in supporting
+ * documentation, and that the names of the authors and/or copyright holders
+ * not be used in advertising or publicity pertaining to distribution of the
+ * software without specific, written prior permission. The authors and
+ * copyright holders make no representations about the suitability of this
+ * software for any purpose. It is provided "as is" without any express
+ * or implied warranty.
+ *
+ * THE AUTHORS AND COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO
+ * THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
+ * FITNESS, IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR
+ * ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER
+ * RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF
+ * CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
+ * CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ *
+ * Author: Siarhei Siamashka <[EMAIL PROTECTED]>
+ */
+
+/*
+ * ARMv6 assembly optimized color format conversion functions
+ * (planar YV12 to some custom YUV420 format used by graphics chip in Nokia N800)
+ */
+
+#ifndef _OMAP_COLORCONV_H_
+#define _OMAP_COLORCONV_H_
+
+#include <stdint.h>
+
+/**
+ * Convert a line of pixels from YV12 to YUV420 color format
+ * @param dst - destination buffer for YUV420 pixel data, it should be at least 16-bit aligned
+ * @param src_y - pointer to Y plane, it should be 16-bit aligned
+ * @param src_c - pointer to chroma plane (U for even lines, V for odd lines)
+ * @param w - number of pixels to convert (should be multiple of 4)
+ */
+void yv12_to_yuv420_line_armv6(uint16_t *dst, const uint16_t *src_y, const uint8_t *src_c, int w);
+
+#endif
diff -u -r -N xorg-server-1.1.99.3/hw/kdrive/omap/omap_colorconv.S xorg-server-1.1.99.3.yuv420patch/hw/kdrive/omap/omap_colorconv.S
--- xorg-server-1.1.99.3/hw/kdrive/omap/omap_colorconv.S 1970-01-01 03:00:00.000000000 +0300
+++ xorg-server-1.1.99.3.yuv420patch/hw/kdrive/omap/omap_colorconv.S 2007-05-01 15:06:36.000000000 +0300
@@ -0,0 +1,244 @@
+/*
+ * Copyright © 2007 Siarhei Siamashka
+ *
+ * Permission to use, copy, modify, distribute and sell this software and its
+ * documentation for any purpose is hereby granted without fee, provided that
+ * the above copyright notice appear in all copies and that both that
+ * copyright notice and this permission notice appear in supporting
+ * documentation, and that the names of the authors and/or copyright holders
+ * not be used in advertising or publicity pertaining to distribution of the
+ * software without specific, written prior permission. The authors and
+ * copyright holders make no representations about the suitability of this
+ * software for any purpose. It is provided "as is" without any express
+ * or implied warranty.
+ *
+ * THE AUTHORS AND COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO
+ * THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
+ * FITNESS, IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR
+ * ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER
+ * RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF
+ * CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
+ * CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ *
+ * Author: Siarhei Siamashka <[EMAIL PROTECTED]>
+ */
+
+/*
+ * ARMv6 assembly optimized color format conversion functions
+ * (planar YV12 to some custom YUV420 format used by graphics chip in Nokia N800)
+ */
+    .text
+
+.macro YUV420_function_template function_name, USE_PLD, USE_ARMV6
+
+    .align
+    .global \function_name
+    .func \function_name
+\function_name:
+
+#define DST r0
+#define SRC_Y r1
+#define SRC_U r2
+#define WIDTH r3
+#define TMP1 r10
+#define TMP2 r11
+#define TMP3 lr
+
+/* Read information about 4 pixels, convert them to YUV420 and store into 6 bytes using 16-bit writes */
+.macro CONVERT_4_PIXELS_MACROBLOCK
+    ldrb r4, [SRC_Y], #1
+    ldrb TMP1, [SRC_U], #1
+    ldrb r5, [SRC_U], #1
+    ldrb TMP2, [SRC_Y], #1
+    ldrb r6, [SRC_Y, #1]
+    ldrb TMP3, [SRC_Y], #2
+    add r4, r4, TMP1, lsl #8
+    add r5, r5, TMP2, lsl #8
+    add r6, r6, TMP3, lsl #8
+    strh r4, [DST], #2
+    strh r5, [DST], #2
+    strh r6, [DST], #2
+.endm
+
+.if \USE_ARMV6
+
+.macro CONVERT_8_PIXELS_MACROBLOCK_1 DST_REG1, DST_REG2, FLAG1, FLAG2, PLD_FLAG
+.if \FLAG1 == 0
+    ldrb \DST_REG1, [SRC_U], #1
+    ldrh TMP1, [SRC_Y], #2
+    ldrb TMP2, [SRC_U], #1
+.endif
+.if \FLAG2 == 1
+    ldrh \DST_REG2, [SRC_Y], #2
+.endif
+.if \PLD_FLAG == 1
+    pld [SRC_Y, #48]
+.endif
+    add \DST_REG1, \DST_REG1, TMP1, lsl #8
+    add \DST_REG1, \DST_REG1, TMP2, lsl #24
+.if \FLAG2 == 1
+    ldrb TMP1, [SRC_U], #1
+    ldrb TMP2, [SRC_Y], #1
+.endif
+    rev16 \DST_REG1, \DST_REG1
+.endm
+
+.macro CONVERT_8_PIXELS_MACROBLOCK_2 DST_REG1, DST_REG2, FLAG1, FLAG2, DUMMY1
+.if \FLAG1 == 0
+    ldrh \DST_REG1, [SRC_Y], #2
+    ldrb TMP1, [SRC_U], #1
+    ldrb TMP2, [SRC_Y], #1
+.endif
+.if \FLAG2 == 1
+    ldrb \DST_REG2, [SRC_Y], #1
+.endif
+    add \DST_REG1, \DST_REG1, TMP1, lsl #16
+    add \DST_REG1, \DST_REG1, TMP2, lsl #24
+.if \FLAG2 == 1
+    ldrb TMP1, [SRC_U], #1
+    ldrh TMP2, [SRC_Y], #2
+.endif
+    rev16 \DST_REG1, \DST_REG1
+.endm
+
+.macro CONVERT_8_PIXELS_MACROBLOCK_3 DST_REG1, DST_REG2, FLAG1, FLAG2, DUMMY1
+.if \FLAG1 == 0
+    ldrb \DST_REG1, [SRC_Y], #1
+    ldrb TMP1, [SRC_U], #1
+    ldrh TMP2, [SRC_Y], #2
+.endif
+.if \FLAG2 == 1
+    ldrb \DST_REG2, [SRC_U], #1
+.endif
+    add \DST_REG1, \DST_REG1, TMP1, lsl #8
+    add \DST_REG1, \DST_REG1, TMP2, lsl #16
+.if \FLAG2 == 1
+    ldrh TMP1, [SRC_Y], #2
+    ldrb TMP2, [SRC_U], #1
+.endif
+    rev16 \DST_REG1, \DST_REG1
+.endm
+
+.else
+
+/* Prepare the first 32-bit output value for 8 pixels macroblock */
+.macro CONVERT_8_PIXELS_MACROBLOCK_1 DST_REG, DUMMY1, DUMMY2, DUMMY3, PLD_FLAG
+    ldrb \DST_REG, [SRC_Y], #1
+    ldrb TMP1, [SRC_U], #1
+    ldrb TMP2, [SRC_U], #1
+    ldrb TMP3, [SRC_Y], #1
+.if \USE_PLD && (\PLD_FLAG == 1)
+    pld [SRC_Y, #48]
+.endif
+    add \DST_REG, \DST_REG, TMP1, lsl #8
+    add \DST_REG, \DST_REG, TMP2, lsl #16
+    add \DST_REG, \DST_REG, TMP3, lsl #24
+.endm
+
+/* Prepare the second 32-bit output value for 8 pixels macroblock */
+.macro CONVERT_8_PIXELS_MACROBLOCK_2 DST_REG, DUMMY1, DUMMY2, DUMMY3, DUMMY4
+    ldrb \DST_REG, [SRC_Y, #1]
+    ldrb TMP1, [SRC_Y], #2
+    ldrb TMP2, [SRC_Y], #1
+    ldrb TMP3, [SRC_U], #1
+    add \DST_REG, \DST_REG, TMP1, lsl #8
+    add \DST_REG, \DST_REG, TMP2, lsl #16
+    add \DST_REG, \DST_REG, TMP3, lsl #24
+.endm
+
+/* Prepare the third 32-bit output value for 8 pixels macroblock */
+.macro CONVERT_8_PIXELS_MACROBLOCK_3 DST_REG, DUMMY1, DUMMY2, DUMMY3, DUMMY4
+    ldrb \DST_REG, [SRC_U], #1
+    ldrb TMP1, [SRC_Y], #1
+    ldrb TMP2, [SRC_Y, #1]
+    ldrb TMP3, [SRC_Y], #2
+    add \DST_REG, \DST_REG, TMP1, lsl #8
+    add \DST_REG, \DST_REG, TMP2, lsl #16
+    add \DST_REG, \DST_REG, TMP3, lsl #24
+.endm
+
+.endif
+
+.if \USE_PLD
+    pld [SRC_Y]
+.endif
+    stmfd sp!, {r4-r8, r10-r11, lr}
+
+    /* Destination buffer should be at least 16-bit aligned, image width should be multiple of 4 */
+    bic DST, #1
+    bic WIDTH, #3
+
+    /* Ensure 32-bit alignment of the destination buffer */
+    tst DST, #2
+    beq 1f
+    subs WIDTH, #4
+    blt 6f
+    CONVERT_4_PIXELS_MACROBLOCK
+1:
+    subs WIDTH, #32
+    blt 3f
+2:  /* Convert 32 pixels per loop iteration */
+    CONVERT_8_PIXELS_MACROBLOCK_1 r4, r6, 0, 1, 1 /* Also do cache preload for SRC_Y */
+    CONVERT_8_PIXELS_MACROBLOCK_2 r6, r7, 1, 1, 0
+    CONVERT_8_PIXELS_MACROBLOCK_3 r7, r8, 1, 1, 0
+    CONVERT_8_PIXELS_MACROBLOCK_1 r8, r5, 1, 1, 0
+    stmia DST!, {r4, r6, r7, r8}
+
+    subs WIDTH, #32
+
+    CONVERT_8_PIXELS_MACROBLOCK_2 r5, r6, 1, 1, 0
+    CONVERT_8_PIXELS_MACROBLOCK_3 r6, r7, 1, 1, 0
+    CONVERT_8_PIXELS_MACROBLOCK_1 r7, r8, 1, 1, 0
+    CONVERT_8_PIXELS_MACROBLOCK_2 r8, r4, 1, 1, 0
+    stmia DST!, {r5, r6, r7, r8}
+.if \USE_PLD
+    /* Do cache preload for SRC_U */
+    pld [SRC_U, #48]
+.endif
+    CONVERT_8_PIXELS_MACROBLOCK_3 r4, r6, 1, 1, 0
+    CONVERT_8_PIXELS_MACROBLOCK_1 r6, r7, 1, 1, 0
+    CONVERT_8_PIXELS_MACROBLOCK_2 r7, r8, 1, 1, 0
+    CONVERT_8_PIXELS_MACROBLOCK_3 r8, r4, 1, 0, 0
+    stmia DST!, {r4, r6, r7, r8}
+
+    bge 2b
+3:
+    adds WIDTH, WIDTH, #32
+    ble 6f
+
+    subs WIDTH, WIDTH, #8
+    blt 5f
+4:  /* Convert remaining pixels processing them 8 per iteration */
+    CONVERT_8_PIXELS_MACROBLOCK_1 r4, r5, 0, 1, 0
+    CONVERT_8_PIXELS_MACROBLOCK_2 r5, r6, 1, 1, 0
+    CONVERT_8_PIXELS_MACROBLOCK_3 r6, r7, 1, 0, 0
+    stmia DST!, {r4-r6}
+    subs WIDTH, WIDTH, #8
+    bge 4b
+5:  /* Convert the last 4 pixels if needed */
+    adds WIDTH, WIDTH, #8
+    ble 6f
+    CONVERT_4_PIXELS_MACROBLOCK
+    subs WIDTH, #4
+    bgt 4b
+6:  /* Restore all registers and return */
+    ldmfd sp!, {r4-r8, r10-r11, pc}
+
+.purgem CONVERT_4_PIXELS_MACROBLOCK
+.purgem CONVERT_8_PIXELS_MACROBLOCK_1
+.purgem CONVERT_8_PIXELS_MACROBLOCK_2
+.purgem CONVERT_8_PIXELS_MACROBLOCK_3
+
+#undef DST
+#undef SRC_Y
+#undef SRC_U
+#undef WIDTH
+#undef TMP1
+#undef TMP2
+#undef TMP3
+
+    .endfunc
+
+.endm
+
+YUV420_function_template yv12_to_yuv420_line_armv6, 1, 1
diff -u -r -N xorg-server-1.1.99.3/hw/kdrive/omap/omap_video.c xorg-server-1.1.99.3.yuv420patch/hw/kdrive/omap/omap_video.c
--- xorg-server-1.1.99.3/hw/kdrive/omap/omap_video.c 2007-03-07 11:34:46.000000000 +0200
+++ xorg-server-1.1.99.3.yuv420patch/hw/kdrive/omap/omap_video.c 2007-05-01 15:03:45.000000000 +0300
@@ -39,6 +39,8 @@
 #include <X11/extensions/Xv.h>
 #include "fourcc.h"
 
+#include "omap_colorconv.h"
+
 #define MAKE_ATOM(a) MakeAtom(a, sizeof(a) - 1, TRUE)
 
 #ifndef max
@@ -466,8 +468,6 @@
                       int h, int w, int id)
 {
     CARD8 *srcy, *srcu, *srcv, *dst;
-    CARD16 *d1;
-    CARD32 *d2;
     int i, j;
 
     if ((randr & RR_Rotate_All) != RR_Rotate_0) {
@@ -491,33 +491,9 @@
         srcu = tmp;
     }
 
-    w >>= 2;
+    w &= ~3;
     for (i = 0; i < h; i++) {
-        CARD32 *sy = (CARD32 *) srcy;
-        CARD16 *sc;
-
-        sc = (CARD16 *) ((i & 1) ? srcv : srcu);
-        d1 = (CARD16 *) dst;
-
-        for (j = 0; j < w; j++) {
-            if (((unsigned long) d1) & 3) {
-                /* Luma 1, chroma 1. */
-                *d1++ = (*sy & 0x000000ff) | ((*sc & 0x00ff) << 8);
-                /* Chroma 2, luma 2. */
-                *d1++ = ((*sc & 0xff00) >> 8) | (*sy & 0x0000ff00);
-            }
-            else {
-                d2 = (CARD32 *) d1;
-                /* Luma 1, chroma 1, chroma 2, luma 2. */
-                *d2++ = (*sy & 0x000000ff) | (*sc << 8) |
-                        ((*sy & 0x0000ff00) << 16);
-                d1 = (CARD16 *) d2;
-            }
-            /* Luma 4, luma 3. */
-            *d1++ = ((*sy & 0xff000000) >> 24) | ((*sy & 0x00ff0000) >> 8);
-            sy++;
-            sc++;
-        }
+        yv12_to_yuv420_line_armv6((uint16_t *)dst, (uint16_t *)srcy, (uint8_t *)((i & 1) ? srcv : srcu), w);
 
         dst += dstPitch;
         srcy += srcPitch;
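
For anyone who builds the patch and wants to check the assembly output, here is a plain C sketch of the byte layout the per-line routine is expected to produce. It is inferred from the C loop that the patch removes and from CONVERT_4_PIXELS_MACROBLOCK: every 4 pixels become 6 bytes in the order Y0, C0, C1, Y1, Y3, Y2. The names yv12_to_yuv420_line_ref and yuv420_line_bytes are hypothetical helpers and not part of the patch. The sketch writes bytes one at a time, so it has no alignment requirements and does not depend on byte order, unlike the removed code, which relied on little-endian 16/32-bit stores.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not in the patch): number of output bytes for a
 * line of w pixels. The width is rounded down to a multiple of 4, and
 * each group of 4 pixels occupies 6 bytes (12 bits per pixel). */
static size_t yuv420_line_bytes(int w)
{
    if (w < 4)
        return 0;
    return ((size_t)(w & ~3) / 4) * 6;
}

/* Hypothetical reference implementation (not in the patch) of what
 * yv12_to_yuv420_line_armv6 is assumed to compute for one line.
 * src_c is the U plane row for even lines and the V plane row for odd
 * lines, as described in omap_colorconv.h. */
static void yv12_to_yuv420_line_ref(uint8_t *dst, const uint8_t *src_y,
                                    const uint8_t *src_c, int w)
{
    int i;

    for (i = 0; i + 4 <= w; i += 4) {
        dst[0] = src_y[0];  /* luma 1   */
        dst[1] = src_c[0];  /* chroma 1 */
        dst[2] = src_c[1];  /* chroma 2 */
        dst[3] = src_y[1];  /* luma 2   */
        dst[4] = src_y[3];  /* luma 4   */
        dst[5] = src_y[2];  /* luma 3   */
        dst   += 6;
        src_y += 4;
        src_c += 2;
    }
}
```

One possible use is to run both this function and the assembly routine on the same input line on the device and compare the two output buffers byte by byte, which would catch ordering mistakes in the ARMv6 macros without needing to judge the picture by eye.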
_______________________________________________ maemo-developers mailing list maemo-developers@maemo.org https://maemo.org/mailman/listinfo/maemo-developers