On Tuesday 01 May 2007 13:36, Kalle Vahlman wrote:
> 2007/5/1, Siarhei Siamashka <[EMAIL PROTECTED]>:
> > OK, thanks. It may take some time though. I'm still using old scratchbox
> > with mistral SDK here (did not have enough free time to upgrade yet).
> > Until I clean up my scratchbox mess, I can only provide some patch
> > without testing, if anybody courageous can try to build it :)
>
> Given that I fear not the perils of building a X server with
> nonstandard options[1], I shall be more than happy to conduct such
> adventurous acts :)
>
> And unless Mr. Kulve has objections, the results could be installed
> from a repository as well.
>
> [1] http://syslog.movial.fi/archives/47-Shadows-for-everyone-well,-not-really.html

OK, here is an untested patch for xserver that adds ARMv6-optimized
YUV420 color format conversion. In theory it should compile and work
(I have not tried to build xserver myself, though). If it refuses to
compile, fixing the patch should not be too difficult.
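
For anyone who wants to check the output, below is a minimal portable C
sketch of what one converted line is expected to look like. It is only a
reference based on my reading of the C loop that the patch removes from
omap_video.c (6 output bytes per 4 pixels, in the order Y0, C0, C1, Y1, Y3,
Y2). The name yv12_to_yuv420_line_ref is made up for illustration and is not
part of the patch.

```c
#include <stdint.h>

/*
 * Hypothetical reference for one line of the conversion (not in the patch).
 * Every 4 luma pixels plus 2 chroma samples become 6 output bytes:
 *   Y0, C0, C1, Y1, Y3, Y2
 * which is the layout written by the old C loop in omap_video.c on a
 * little-endian CPU.
 */
static void yv12_to_yuv420_line_ref(uint8_t *dst, const uint8_t *src_y,
                                    const uint8_t *src_c, int w)
{
    int x;

    w &= ~3; /* width is rounded down to a multiple of 4, as in the patch */
    for (x = 0; x < w; x += 4) {
        dst[0] = src_y[0]; /* luma 1 */
        dst[1] = src_c[0]; /* chroma 1 */
        dst[2] = src_c[1]; /* chroma 2 */
        dst[3] = src_y[1]; /* luma 2 */
        dst[4] = src_y[3]; /* luma 4 */
        dst[5] = src_y[2]; /* luma 3 */
        dst   += 6;
        src_y += 4;
        src_c += 2;
    }
}
```

Running this and yv12_to_yuv420_line_armv6 on the same input line and
comparing the two destination buffers byte by byte should be enough to catch
a mistake in the assembly.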

In the worst case, only video playback may be broken. But if everything works
as expected, video output performance should become a lot better.

Video output performance can be tested with mplayer using the -benchmark
option: the 'VO:' stat shows how much time was spent on video output, and the
'VC:' stat shows how much time was spent on video decoding.

The built-in video player should also become faster. I don't know if this
improvement can be 'scientifically' benchmarked, but it should drop fewer
frames when playing high-resolution video.

If any of you can build the xserver package with this patch, please put it up
for download somewhere or send it directly to me.

Thanks.
diff -u -r -N xorg-server-1.1.99.3/hw/kdrive/omap/Makefile.am xorg-server-1.1.99.3.yuv420patch/hw/kdrive/omap/Makefile.am
--- xorg-server-1.1.99.3/hw/kdrive/omap/Makefile.am	2007-03-05 16:17:32.000000000 +0200
+++ xorg-server-1.1.99.3.yuv420patch/hw/kdrive/omap/Makefile.am	2007-05-01 15:04:43.000000000 +0300
@@ -1,5 +1,5 @@
 if XV
-XV_SRCS = omap_video.c
+XV_SRCS = omap_video.c omap_colorconv.S omap_colorconv.h
 endif
 
 if DEBUG
@@ -34,4 +34,4 @@
 	$(TSLIB_FLAG)		\
 	$(DYNSYMS)
 
-EXTRA_DIST = omap_video.c
+EXTRA_DIST = omap_video.c omap_colorconv.S omap_colorconv.h
diff -u -r -N xorg-server-1.1.99.3/hw/kdrive/omap/omap_colorconv.h xorg-server-1.1.99.3.yuv420patch/hw/kdrive/omap/omap_colorconv.h
--- xorg-server-1.1.99.3/hw/kdrive/omap/omap_colorconv.h	1970-01-01 03:00:00.000000000 +0300
+++ xorg-server-1.1.99.3.yuv420patch/hw/kdrive/omap/omap_colorconv.h	2007-05-01 15:06:13.000000000 +0300
@@ -0,0 +1,45 @@
+/*
+ * Copyright © 2007 Siarhei Siamashka
+ *
+ * Permission to use, copy, modify, distribute and sell this software and its
+ * documentation for any purpose is hereby granted without fee, provided that
+ * the above copyright notice appear in all copies and that both that
+ * copyright notice and this permission notice appear in supporting
+ * documentation, and that the names of the authors and/or copyright holders
+ * not be used in advertising or publicity pertaining to distribution of the
+ * software without specific, written prior permission.  The authors and
+ * copyright holders make no representations about the suitability of this
+ * software for any purpose.  It is provided "as is" without any express
+ * or implied warranty.
+ *
+ * THE AUTHORS AND COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO
+ * THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
+ * FITNESS, IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR
+ * ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER
+ * RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF
+ * CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
+ * CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ *
+ * Author: Siarhei Siamashka <[EMAIL PROTECTED]>
+ */
+
+/*
+ * ARMv6 assembly optimized color format conversion functions
+ * (planar YV12 to some custom YUV420 format used by graphics chip in Nokia N800)
+ */
+
+#ifndef _OMAP_COLORCONV_H_
+#define _OMAP_COLORCONV_H_
+
+#include <stdint.h>
+
+/**
+ * Convert a line of pixels from YV12 to YUV420 color format
+ * @param dst   - destination buffer for YUV420 pixel data, it should be at least 16-bit aligned
+ * @param src_y - pointer to Y plane, it should be 16-bit aligned
+ * @param src_c - pointer to chroma plane (U for even lines, V for odd lines)
+ * @param w     - number of pixels to convert (should be multiple of 4)
+ */
+void yv12_to_yuv420_line_armv6(uint16_t *dst, const uint16_t *src_y, const uint8_t *src_c, int w);
+
+#endif
diff -u -r -N xorg-server-1.1.99.3/hw/kdrive/omap/omap_colorconv.S xorg-server-1.1.99.3.yuv420patch/hw/kdrive/omap/omap_colorconv.S
--- xorg-server-1.1.99.3/hw/kdrive/omap/omap_colorconv.S	1970-01-01 03:00:00.000000000 +0300
+++ xorg-server-1.1.99.3.yuv420patch/hw/kdrive/omap/omap_colorconv.S	2007-05-01 15:06:36.000000000 +0300
@@ -0,0 +1,244 @@
+/*
+ * Copyright © 2007 Siarhei Siamashka
+ *
+ * Permission to use, copy, modify, distribute and sell this software and its
+ * documentation for any purpose is hereby granted without fee, provided that
+ * the above copyright notice appear in all copies and that both that
+ * copyright notice and this permission notice appear in supporting
+ * documentation, and that the names of the authors and/or copyright holders
+ * not be used in advertising or publicity pertaining to distribution of the
+ * software without specific, written prior permission.  The authors and
+ * copyright holders make no representations about the suitability of this
+ * software for any purpose.  It is provided "as is" without any express
+ * or implied warranty.
+ *
+ * THE AUTHORS AND COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO
+ * THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
+ * FITNESS, IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR
+ * ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER
+ * RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF
+ * CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
+ * CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ *
+ * Author: Siarhei Siamashka <[EMAIL PROTECTED]>
+ */
+
+/*
+ * ARMv6 assembly optimized color format conversion functions
+ * (planar YV12 to some custom YUV420 format used by graphics chip in Nokia N800)
+ */
+        .text
+
+.macro YUV420_function_template function_name, USE_PLD, USE_ARMV6
+
+        .align
+        .global \function_name
+        .func \function_name
+\function_name:
+
+#define DST     r0
+#define SRC_Y   r1
+#define SRC_U   r2
+#define WIDTH   r3
+#define TMP1    r10
+#define TMP2    r11
+#define TMP3    lr
+
+/* Read information about 4 pixels, convert them to YUV420 and store into 6 bytes using 16-bit writes */
+.macro  CONVERT_4_PIXELS_MACROBLOCK
+        ldrb    r4, [SRC_Y], #1
+        ldrb    TMP1, [SRC_U], #1
+        ldrb    r5, [SRC_U], #1
+        ldrb    TMP2, [SRC_Y], #1
+        ldrb    r6, [SRC_Y, #1]
+        ldrb    TMP3, [SRC_Y], #2
+        add     r4, r4, TMP1, lsl #8
+        add     r5, r5, TMP2, lsl #8
+        add     r6, r6, TMP3, lsl #8
+        strh    r4, [DST], #2
+        strh    r5, [DST], #2
+        strh    r6, [DST], #2
+.endm
+
+.if \USE_ARMV6
+
+.macro  CONVERT_8_PIXELS_MACROBLOCK_1 DST_REG1, DST_REG2, FLAG1, FLAG2, PLD_FLAG
+.if \FLAG1 == 0
+        ldrb    \DST_REG1, [SRC_U], #1
+        ldrh    TMP1, [SRC_Y], #2
+        ldrb    TMP2, [SRC_U], #1
+.endif
+.if \FLAG2 == 1
+        ldrh    \DST_REG2, [SRC_Y], #2
+.endif
+.if \PLD_FLAG == 1
+        pld     [SRC_Y, #48]
+.endif
+        add     \DST_REG1, \DST_REG1, TMP1, lsl #8
+        add     \DST_REG1, \DST_REG1, TMP2, lsl #24
+.if \FLAG2 == 1
+        ldrb    TMP1, [SRC_U], #1
+        ldrb    TMP2, [SRC_Y], #1
+.endif
+        rev16   \DST_REG1, \DST_REG1
+.endm
+
+.macro  CONVERT_8_PIXELS_MACROBLOCK_2 DST_REG1, DST_REG2, FLAG1, FLAG2, DUMMY1
+.if \FLAG1 == 0
+        ldrh    \DST_REG1, [SRC_Y], #2
+        ldrb    TMP1, [SRC_U], #1
+        ldrb    TMP2, [SRC_Y], #1
+.endif
+.if \FLAG2 == 1
+        ldrb    \DST_REG2, [SRC_Y], #1
+.endif
+        add     \DST_REG1, \DST_REG1, TMP1, lsl #16
+        add     \DST_REG1, \DST_REG1, TMP2, lsl #24
+.if \FLAG2 == 1
+        ldrb    TMP1, [SRC_U], #1
+        ldrh    TMP2, [SRC_Y], #2
+.endif
+        rev16   \DST_REG1, \DST_REG1
+.endm
+
+.macro  CONVERT_8_PIXELS_MACROBLOCK_3 DST_REG1, DST_REG2, FLAG1, FLAG2, DUMMY1
+.if \FLAG1 == 0
+        ldrb    \DST_REG1, [SRC_Y], #1
+        ldrb    TMP1, [SRC_U], #1
+        ldrh    TMP2, [SRC_Y], #2
+.endif
+.if \FLAG2 == 1
+        ldrb    \DST_REG2, [SRC_U], #1
+.endif
+        add     \DST_REG1, \DST_REG1, TMP1, lsl #8
+        add     \DST_REG1, \DST_REG1, TMP2, lsl #16
+.if \FLAG2 == 1
+        ldrh    TMP1, [SRC_Y], #2
+        ldrb    TMP2, [SRC_U], #1
+.endif
+        rev16   \DST_REG1, \DST_REG1
+.endm
+
+.else
+
+/* Prepare the first 32-bit output value for 8 pixels macroblock */
+.macro  CONVERT_8_PIXELS_MACROBLOCK_1 DST_REG, DUMMY1, DUMMY2, DUMMY3, PLD_FLAG
+        ldrb    \DST_REG, [SRC_Y], #1
+        ldrb    TMP1, [SRC_U], #1
+        ldrb    TMP2, [SRC_U], #1
+        ldrb    TMP3, [SRC_Y], #1
+.if \USE_PLD && (\PLD_FLAG == 1)
+        pld     [SRC_Y, #48]
+.endif
+        add     \DST_REG, \DST_REG, TMP1, lsl #8
+        add     \DST_REG, \DST_REG, TMP2, lsl #16
+        add     \DST_REG, \DST_REG, TMP3, lsl #24
+.endm
+
+/* Prepare the second 32-bit output value for 8 pixels macroblock */
+.macro  CONVERT_8_PIXELS_MACROBLOCK_2 DST_REG, DUMMY1, DUMMY2, DUMMY3, DUMMY4
+        ldrb    \DST_REG, [SRC_Y, #1]
+        ldrb    TMP1, [SRC_Y], #2
+        ldrb    TMP2, [SRC_Y], #1
+        ldrb    TMP3, [SRC_U], #1
+        add     \DST_REG, \DST_REG, TMP1, lsl #8
+        add     \DST_REG, \DST_REG, TMP2, lsl #16
+        add     \DST_REG, \DST_REG, TMP3, lsl #24
+.endm
+
+/* Prepare the third 32-bit output value for 8 pixels macroblock */
+.macro  CONVERT_8_PIXELS_MACROBLOCK_3 DST_REG, DUMMY1, DUMMY2, DUMMY3, DUMMY4
+        ldrb    \DST_REG, [SRC_U], #1
+        ldrb    TMP1, [SRC_Y], #1
+        ldrb    TMP2, [SRC_Y, #1]
+        ldrb    TMP3, [SRC_Y], #2
+        add     \DST_REG, \DST_REG, TMP1, lsl #8
+        add     \DST_REG, \DST_REG, TMP2, lsl #16
+        add     \DST_REG, \DST_REG, TMP3, lsl #24
+.endm
+
+.endif
+
+.if \USE_PLD
+        pld     [SRC_Y]
+.endif
+        stmfd   sp!, {r4-r8, r10-r11, lr}
+
+        /* Destination buffer should be at least 16-bit aligned, image width should be multiple of 4 */
+        bic     DST, #1
+        bic     WIDTH, #3
+
+        /* Ensure 32-bit alignment of the destination buffer */
+        tst     DST, #2
+        beq     1f
+        subs    WIDTH, #4
+        blt     6f
+        CONVERT_4_PIXELS_MACROBLOCK
+1:
+        subs    WIDTH, #32
+        blt     3f
+2:      /* Convert 32 pixels per loop iteration */
+        CONVERT_8_PIXELS_MACROBLOCK_1 r4, r6, 0, 1, 1 /* Also do cache preload for SRC_Y */
+        CONVERT_8_PIXELS_MACROBLOCK_2 r6, r7, 1, 1, 0
+        CONVERT_8_PIXELS_MACROBLOCK_3 r7, r8, 1, 1, 0
+        CONVERT_8_PIXELS_MACROBLOCK_1 r8, r5, 1, 1, 0
+        stmia   DST!, {r4, r6, r7, r8}
+
+        subs    WIDTH, #32
+
+        CONVERT_8_PIXELS_MACROBLOCK_2 r5, r6, 1, 1, 0
+        CONVERT_8_PIXELS_MACROBLOCK_3 r6, r7, 1, 1, 0
+        CONVERT_8_PIXELS_MACROBLOCK_1 r7, r8, 1, 1, 0
+        CONVERT_8_PIXELS_MACROBLOCK_2 r8, r4, 1, 1, 0
+        stmia   DST!, {r5, r6, r7, r8}
+.if \USE_PLD
+         /* Do cache preload for SRC_U */
+        pld     [SRC_U, #48]
+.endif
+        CONVERT_8_PIXELS_MACROBLOCK_3 r4, r6, 1, 1, 0
+        CONVERT_8_PIXELS_MACROBLOCK_1 r6, r7, 1, 1, 0
+        CONVERT_8_PIXELS_MACROBLOCK_2 r7, r8, 1, 1, 0
+        CONVERT_8_PIXELS_MACROBLOCK_3 r8, r4, 1, 0, 0
+        stmia   DST!, {r4, r6, r7, r8}
+
+        bge     2b
+3:
+        adds    WIDTH, WIDTH, #32
+        ble     6f
+
+        subs    WIDTH, WIDTH, #8
+        blt     5f
+4:      /* Convert remaining pixels processing them 8 per iteration */
+        CONVERT_8_PIXELS_MACROBLOCK_1 r4, r5, 0, 1, 0
+        CONVERT_8_PIXELS_MACROBLOCK_2 r5, r6, 1, 1, 0
+        CONVERT_8_PIXELS_MACROBLOCK_3 r6, r7, 1, 0, 0
+        stmia   DST!, {r4-r6}
+        subs    WIDTH, WIDTH, #8
+        bge     4b
+5:      /* Convert the last 4 pixels if needed */
+        adds    WIDTH, WIDTH, #8
+        ble     6f
+        CONVERT_4_PIXELS_MACROBLOCK
+        subs    WIDTH, #4
+        bgt     4b
+6:      /* Restore all registers and return */
+        ldmfd  sp!, {r4-r8, r10-r11, pc}
+
+.purgem CONVERT_4_PIXELS_MACROBLOCK
+.purgem CONVERT_8_PIXELS_MACROBLOCK_1
+.purgem CONVERT_8_PIXELS_MACROBLOCK_2
+.purgem CONVERT_8_PIXELS_MACROBLOCK_3
+
+#undef  DST
+#undef  SRC_Y
+#undef  SRC_U
+#undef  WIDTH
+#undef  TMP1
+#undef  TMP2
+#undef  TMP3
+
+        .endfunc
+
+.endm
+
+YUV420_function_template yv12_to_yuv420_line_armv6, 1, 1
diff -u -r -N xorg-server-1.1.99.3/hw/kdrive/omap/omap_video.c xorg-server-1.1.99.3.yuv420patch/hw/kdrive/omap/omap_video.c
--- xorg-server-1.1.99.3/hw/kdrive/omap/omap_video.c	2007-03-07 11:34:46.000000000 +0200
+++ xorg-server-1.1.99.3.yuv420patch/hw/kdrive/omap/omap_video.c	2007-05-01 15:03:45.000000000 +0300
@@ -39,6 +39,8 @@
 #include <X11/extensions/Xv.h>
 #include "fourcc.h"
 
+#include "omap_colorconv.h"
+
 #define MAKE_ATOM(a) MakeAtom(a, sizeof(a) - 1, TRUE)
 
 #ifndef max
@@ -466,8 +468,6 @@
                          int h, int w, int id)
 {
     CARD8 *srcy, *srcu, *srcv, *dst;
-    CARD16 *d1;
-    CARD32 *d2;
     int i, j;
 
     if ((randr & RR_Rotate_All) != RR_Rotate_0) {
@@ -491,33 +491,9 @@
         srcu = tmp;
     }
 
-    w >>= 2;
+    w &= ~3;
     for (i = 0; i < h; i++) {
-        CARD32 *sy = (CARD32 *) srcy;
-        CARD16 *sc;
-
-        sc = (CARD16 *) ((i & 1) ? srcv : srcu);
-        d1 = (CARD16 *) dst;
-
-        for (j = 0; j < w; j++) {
-            if (((unsigned long) d1) & 3) {
-                /* Luma 1, chroma 1. */
-                *d1++ = (*sy & 0x000000ff) | ((*sc & 0x00ff) << 8);
-                /* Chroma 2, luma 2. */
-                *d1++ = ((*sc & 0xff00) >> 8) | (*sy & 0x0000ff00);
-            }
-            else {
-                d2 = (CARD32 *) d1;
-                /* Luma 1, chroma 1, chroma 2, luma 2. */
-                *d2++ = (*sy & 0x000000ff) | (*sc << 8) |
-                        ((*sy & 0x0000ff00) << 16);
-                d1 = (CARD16 *) d2;
-            }
-            /* Luma 4, luma 3. */
-            *d1++ = ((*sy & 0xff000000) >> 24) | ((*sy & 0x00ff0000) >> 8);
-            sy++;
-            sc++;
-        }
+        yv12_to_yuv420_line_armv6((uint16_t *)dst, (uint16_t *)srcy, (uint8_t *)((i & 1) ? srcv : srcu), w);
 
         dst += dstPitch;
         srcy += srcPitch;