Re: [Pixman] [PATCH 5/5] test: Add a new benchmarker targeting affine operations

Pekka Paalanen Tue, 17 Mar 2015 04:13:21 -0700

On Tue,  3 Mar 2015 15:24:20 +0000
Ben Avison <bavi...@riscosopen.org> wrote:


> ---
>  test/Makefile.sources |    1 +
>  test/affine-bench.c   |  330 
> +++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 331 insertions(+), 0 deletions(-)
>  create mode 100644 test/affine-bench.c
> 
> diff --git a/test/Makefile.sources b/test/Makefile.sources
> index c20c34b..8b0e855 100644
> --- a/test/Makefile.sources
> +++ b/test/Makefile.sources
> @@ -37,6 +37,7 @@ OTHERPROGRAMS =                 \
>       radial-perf-test        \
>          check-formats           \
>       scaling-bench           \
> +     affine-bench            \
>       $(NULL)

affine-bench should be added to .gitignore too.

>  
>  # Utility functions
> diff --git a/test/affine-bench.c b/test/affine-bench.c
> new file mode 100644
> index 0000000..55478d6
> --- /dev/null
> +++ b/test/affine-bench.c
> @@ -0,0 +1,330 @@
> +/*
> + * Copyright © 2014 RISC OS Open Ltd
> + *
> + * Permission to use, copy, modify, distribute, and sell this software and 
> its
> + * documentation for any purpose is hereby granted without fee, provided that
> + * the above copyright notice appear in all copies and that both that
> + * copyright notice and this permission notice appear in supporting
> + * documentation, and that the name of the copyright holders not be used in
> + * advertising or publicity pertaining to distribution of the software 
> without
> + * specific, written prior permission.  The copyright holders make no
> + * representations about the suitability of this software for any purpose.  
> It
> + * is provided "as is" without express or implied warranty.
> + *
> + * THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS
> + * SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
> + * FITNESS, IN NO EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY
> + * SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN
> + * AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING
> + * OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS
> + * SOFTWARE.
> + *
> + * Author:  Ben Avison (bavi...@riscosopen.org)
> + */
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <ctype.h>
> +#include <stdint.h>
> +#include "utils.h"
> +
> +#ifdef HAVE_GETTIMEOFDAY
> +#include <sys/time.h>
> +#else
> +#include <time.h>
> +#endif
> +
> +#define WIDTH  1920
> +#define HEIGHT 1080
> +#define L2CACHE_SIZE (128 * 1024)

Hi,

I see lowlevel-blt-bench.c also uses L2CACHE_SIZE, but where does it
come from? If it's really an arch/platform-independent constant, maybe
some test header could have it?

> +
> +typedef struct
> +{
> +    pixman_fixed_48_16_t        x1;
> +    pixman_fixed_48_16_t        y1;
> +    pixman_fixed_48_16_t        x2;
> +    pixman_fixed_48_16_t        y2;
> +} box_48_16_t;

Looks like Pixman style is using separate statements for struct
definition and typedefs.

> +
> +static pixman_bool_t
> +compute_transformed_extents (pixman_transform_t *transform,
> +                             const pixman_box32_t *extents,
> +                             box_48_16_t *transformed)

CODING_STYLE is asking for alignment here for the arg names.

> +{
> +    pixman_fixed_48_16_t tx1, ty1, tx2, ty2;
> +    pixman_fixed_t x1, y1, x2, y2;
> +    int i;
> +
> +    x1 = pixman_int_to_fixed (extents->x1) + pixman_fixed_1 / 2;
> +    y1 = pixman_int_to_fixed (extents->y1) + pixman_fixed_1 / 2;
> +    x2 = pixman_int_to_fixed (extents->x2) - pixman_fixed_1 / 2;
> +    y2 = pixman_int_to_fixed (extents->y2) - pixman_fixed_1 / 2;
> +
> +    if (!transform)
> +    {
> +        transformed->x1 = x1;
> +        transformed->y1 = y1;
> +        transformed->x2 = x2;
> +        transformed->y2 = y2;

If there is no transform, why not return the original extents?
These have been reduced by a half unit in all four directions.

> +
> +        return TRUE;
> +    }
> +
> +    tx1 = ty1 = INT64_MAX;
> +    tx2 = ty2 = INT64_MIN;
> +
> +    for (i = 0; i < 4; ++i)
> +    {
> +        pixman_fixed_48_16_t tx, ty;
> +        pixman_vector_t v;
> +
> +        v.vector[0] = (i & 0x01)? x1 : x2;
> +        v.vector[1] = (i & 0x02)? y1 : y2;
> +        v.vector[2] = pixman_fixed_1;
> +
> +        if (!pixman_transform_point (transform, &v))
> +            return FALSE;
> +
> +        tx = (pixman_fixed_48_16_t)v.vector[0];
> +        ty = (pixman_fixed_48_16_t)v.vector[1];

This works only for affine. This is called affine-bench, so that is
natural, yet might be nice to add an assert or something to guarantee
no-one will copy this code to something else that uses projective
transforms.

Or, maybe having "affine" as part of the function name?

> +
> +        if (tx < tx1)
> +            tx1 = tx;
> +        if (ty < ty1)
> +            ty1 = ty;
> +        if (tx > tx2)
> +            tx2 = tx;
> +        if (ty > ty2)
> +            ty2 = ty;
> +    }
> +
> +    transformed->x1 = tx1;
> +    transformed->y1 = ty1;
> +    transformed->x2 = tx2;
> +    transformed->y2 = ty2;
> +
> +    return TRUE;
> +}
> +
> +static void
> +create_image (uint32_t width, uint32_t height, pixman_format_code_t format, 
> pixman_filter_t filter, const pixman_transform_t *t, uint32_t **bits, 
> pixman_image_t **image)

Too long line, needs breaking up.

> +{
> +    uint32_t stride = (width * PIXMAN_FORMAT_BPP(format) + 31) / 32 * 4;

Add empty line.

> +    *bits = aligned_malloc (4096, stride * height);

Is there a #define to get the 4096 from? I assume it's page size? What
if the platform has big pages?

> +    memset (*bits, 0xCC, stride * height);
> +    *image = pixman_image_create_bits (format, width, height, *bits, stride);
> +    pixman_image_set_repeat (*image, PIXMAN_REPEAT_NORMAL);
> +    pixman_image_set_filter (*image, filter, NULL, 0);
> +}
> +
> +static void
> +flush_cache (void)
> +{
> +    static const char clean_space[L2CACHE_SIZE];
> +    volatile const char *x = clean_space;
> +    const char *clean_end = clean_space + sizeof clean_space;
> +    while (x < clean_end)
> +    {
> +        (void) *x;

Are you sure this actually does the read? That the compiler is not
allowed to discard the read? That's something I'm never sure of.

> +        x += 32;

Why 32? Is this CACHELINE_LENGTH from lowlevel-blt-bench.c?
Maybe there should be a header with the shared bench definitions?

> +    }
> +}
> +
> +/* Obtain current time in microseconds modulo 2^32 */
> +uint32_t
> +gettimei (void)
> +{
> +#ifdef HAVE_GETTIMEOFDAY
> +    struct timeval tv;
> +
> +    gettimeofday (&tv, NULL);
> +    return tv.tv_sec * 1000000 + tv.tv_usec;
> +#else
> +    return (double) clock () / (double) CLOCKS_PER_SEC;

This returns seconds; copy-pasta from utils.c?

It might also be worth considering adding support for clock_gettime,
because that allows you to use CLOCK_MONOTONIC or even
CLOCK_MONOTOMIC_RAW. Though, maybe that's not necessary since below you
use a 5 second run time.

> +#endif
> +}
> +
> +static void
> +pixman_image_composite_wrapper (pixman_composite_info_t *info)
> +{
> +    pixman_image_composite (info->op,
> +                            info->src_image, info->mask_image, 
> info->dest_image,
> +                            info->src_x, info->src_y,
> +                            info->mask_x, info->mask_y,
> +                            info->dest_x, info->dest_y,
> +                            info->width, info->height);
> +}
> +
> +static void
> +pixman_image_composite_empty (pixman_composite_info_t *info)
> +{
> +    pixman_image_composite (info->op,
> +                            info->src_image, info->mask_image, 
> info->dest_image,
> +                            info->src_x, info->src_y,
> +                            info->mask_x, info->mask_y,
> +                            info->dest_x, info->dest_y,
> +                            1, 1);
> +}
> +
> +static void
> +bench (pixman_op_t         op,
> +       pixman_transform_t *t,
> +       pixman_image_t     *src_image,
> +       pixman_image_t     *mask_image,
> +       pixman_image_t     *dest_image,
> +       int32_t             src_x,
> +       int32_t             src_y,
> +       uint32_t            max_n,
> +       uint32_t            max_time,
> +       uint32_t           *ret_n,
> +       uint32_t           *ret_time,
> +       void              (*func) (pixman_composite_info_t *info))
> +{
> +    uint32_t n = 0;
> +    uint32_t t0 = gettimei ();
> +    uint32_t t1;
> +    uint32_t x = 0;

Empty line.

> +    do
> +    {
> +        pixman_composite_info_t info;

Empty line.

> +        if (++x >= 64)
> +            x = 0;

Empty line.

> +        info.op = op;
> +        info.src_image = src_image;
> +        info.mask_image = mask_image;
> +        info.dest_image = dest_image;
> +        info.src_x = 0;
> +        info.src_y = 0;
> +        info.mask_x = 0;
> +        info.mask_y = 0;
> +        info.dest_x = 63 - x;
> +        info.dest_y = 0;
> +        info.width = WIDTH;
> +        info.height = HEIGHT;

Empty line.

> +        t->matrix[0][2] = pixman_int_to_fixed (src_x + x);
> +        t->matrix[1][2] = pixman_int_to_fixed (src_y);
> +        pixman_image_set_transform (src_image, t);

Empty line.

Hmm, what's the idea here with the transform?

> +        if (mask_image)
> +            pixman_image_set_transform (mask_image, t);

Empty line.

> +        func (&info);
> +        t1 = gettimei ();
> +    }
> +    while (++n < max_n && (t1 - t0) < max_time);

Empty line.

> +    if (ret_n)
> +        *ret_n = n;

Empty line.

> +    *ret_time = t1 - t0;
> +}
> +
> +int
> +main (int argc, char *argv[])
> +{
> +    pixman_filter_t      filter      = PIXMAN_FILTER_NEAREST;
> +    pixman_transform_t   t;
> +    pixman_op_t          op          = PIXMAN_OP_SRC;
> +    pixman_format_code_t src_format  = PIXMAN_a8r8g8b8;
> +    pixman_format_code_t mask_format = 0;
> +    pixman_format_code_t dest_format = PIXMAN_a8r8g8b8;
> +
> +    uint32_t *src, *mask, *dest;
> +    pixman_image_t *src_image, *mask_image = NULL, *dest_image;
> +    pixman_box32_t dest_box = { 0, 0, WIDTH, HEIGHT };
> +    box_48_16_t transformed = { 0 };
> +    int32_t xmin, ymin, xmax, ymax;
> +    int32_t src_x, src_y;
> +
> +    pixman_transform_init_identity (&t);
> +
> +    ++argv;
> +    if (*argv && (*argv)[0] == '-' && (*argv)[1] == 'n')
> +    {
> +        filter = PIXMAN_FILTER_NEAREST;
> +        ++argv;
> +        --argc;
> +    }
> +    if (*argv && (*argv)[0] == '-' && (*argv)[1] == 'b')
> +    {
> +        filter = PIXMAN_FILTER_BILINEAR;
> +        ++argv;
> +        --argc;
> +    }
> +    if (argc == 1)
> +    {
> +        printf ("Usage: affine-bench [-n] [-b] axx [axy] [ayx] [ayy] 
> [combine type]\n");
> +        printf ("                    [src format] [mask format] [dest 
> format]\n");
> +        printf ("  -n : nearest scaling (default)\n");
> +        printf ("  -b : bilinear scaling\n");
> +        printf ("  axx : x_out:x_in factor\n");
> +        printf ("  axy : x_out:y_in factor (default 0)\n");
> +        printf ("  ayx : y_out:x_in factor (default 0)\n");
> +        printf ("  ayy : y_out:y_in factor (default 1)\n");
> +        printf ("  combine type : src, over, in etc (default src)\n");
> +        printf ("  src format : a8r8g8b8, r5g6b6 etc (default a8r8g8b8)\n");
> +        printf ("  mask format : as for src format, but no mask used if 
> omitted\n");
> +        printf ("  dest format : as for src format (default a8r8g8b8)\n");
> +        return EXIT_FAILURE;
> +    }
> +    else
> +    {
> +        t.matrix[0][0] = pixman_double_to_fixed (strtod (*argv, NULL));
> +        if (*++argv)
> +        {
> +            t.matrix[0][1] = pixman_double_to_fixed (strtod (*argv, NULL));
> +            if (*++argv)
> +            {
> +                t.matrix[1][0] = pixman_double_to_fixed (strtod (*argv, 
> NULL));
> +                if (*++argv)
> +                {
> +                    t.matrix[1][1] = pixman_double_to_fixed (strtod (*argv, 
> NULL));
> +                    if (*++argv)
> +                    {
> +                        op = operator_from_string (*argv);
> +                        if (*++argv)
> +                        {
> +                            src_format = format_from_string (*argv);
> +                            ++argv;
> +                            if (argv[0] && argv[1])
> +                            {
> +                                mask_format = format_from_string (*argv);
> +                                ++argv;
> +                            }
> +                            if (*argv)
> +                                dest_format = format_from_string (*argv);
> +                        }
> +                    }
> +                }
> +            }
> +        }

All this nesting should be refactored away by using a helper function
with an early return for each else-branch.

I don't see invalid arguments detected at all?

Should we be able to give, say, src type without giving all the
matrix elements? That doesn't work atm.

This is getting complicated enough that maybe using getopt would be
good here. When you have lots of optional positional arguments, it gets
really hard to detect what is what.

> +    }
> +
> +    compute_transformed_extents (&t, &dest_box, &transformed);
> +    xmin = pixman_fixed_to_int (transformed.x1 - 8 * pixman_fixed_e - 
> pixman_fixed_1 / 2);
> +    ymin = pixman_fixed_to_int (transformed.y1 - 8 * pixman_fixed_e - 
> pixman_fixed_1 / 2);
> +    xmax = pixman_fixed_to_int (transformed.x2 + 8 * pixman_fixed_e + 
> pixman_fixed_1 / 2);
> +    ymax = pixman_fixed_to_int (transformed.y2 + 8 * pixman_fixed_e + 
> pixman_fixed_1 / 2);
> +    src_x = -xmin;
> +    src_y = -ymin;

Oh!

I was wondering what is going on here with the 8*e, grepped for
pixman_fixed_e, and noticed that this code comes from pixman.c. I think
you should copy also the comments, not only code.

And now I see also compute_transformed_extents is copied from pixman.c.
It still doesn't invalidate my questions - why does the original
function do what it does? Why is no transform producing different
results from identity transform (AFAIU)?

> +
> +    create_image (xmax - xmin + 64, ymax - ymin + 1, src_format, filter, &t, 
> &src, &src_image);

Empty line.

Ok, so you compute the source image size from the transformed extents
size, use src_x,y to position it to avoid translation increasing memory
allocation, and have a 64 pixel margin... is the margin supposed to be
the same on all sides or only on the right? What is that 64 about,
anyway? A minimal size in case the transform would make the source too
small?

A minimal size of 64x1 - any particular reason for 64?

> +    if (mask_format)
> +    {
> +        create_image (xmax - xmin + 64, ymax - ymin + 1, mask_format, 
> filter, &t, &mask, &mask_image);
> +        if ((PIXMAN_FORMAT_R(mask_format) || PIXMAN_FORMAT_G(mask_format) || 
> PIXMAN_FORMAT_B(mask_format)))
> +            pixman_image_set_component_alpha (mask_image, 1);
> +    }

Empty line.

> +    create_image (WIDTH + 64, HEIGHT, dest_format, filter, &t, &dest, 
> &dest_image);
> +
> +    {
> +        uint32_t n;  /* number of iterations in at least 5 seconds */
> +        uint32_t t1; /* time taken to do n iterations, microseconds */
> +        uint32_t t2; /* calling overhead for n iterations, microseconds */

Empty line.

> +        flush_cache ();

I wonder if flush_cache here has any meaning... bench is doing multiple
rounds, so flush_cache affects only the first round. Is that intended?

> +        bench (op, &t, src_image, mask_image, dest_image, src_x, src_y, 
> UINT32_MAX, 5000000, &n, &t1, pixman_image_composite_wrapper);
> +        bench (op, &t, src_image, mask_image, dest_image, src_x, src_y, n, 
> UINT32_MAX, NULL, &t2, pixman_image_composite_empty);
> +        /* The result indicates the output rate in megapixels/second */
> +        printf ("%6.2f\n", (double) n * WIDTH * HEIGHT / (t1 - t2));
> +    }

I find it a little odd to have a block like this. Maybe other code
should be moved into functions instead? Or this code?

> +
> +    return EXIT_SUCCESS;
> +}

Did you check how many rounds a Raspberry Pi 1 can do? Does it do more
than two in 5 seconds with an affine transform that is nearly identity
but not quite?


There are lots of small details to improve, but the idea seems fine to
me. What I would add is better checking for whether you are actually
running what you think you are: abort on invalid arguments, and maybe a
verbose mode for checking the arguments were interpreted correctly,
also printing the transformed extents. I don't trust myself that much
when writing commands so sanity checking would be cool.


Thanks,
pq
_______________________________________________
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman

Re: [Pixman] [PATCH 5/5] test: Add a new benchmarker targeting affine operations

Reply via email to