[ANNOUNCE] pixman release 0.43.4 now available

2024-02-29 Thread Matt Turner
A new pixman release 0.43.4 is now available.

tar.gz:
https://cairographics.org/releases/pixman-0.43.4.tar.gz
https://www.x.org/releases/individual/lib/pixman-0.43.4.tar.gz

tar.xz:
https://www.x.org/releases/individual/lib/pixman-0.43.4.tar.xz

Hashes:
SHA256: a0624db90180c7ddb79fc7a9151093dc37c646d8c38d3f232f767cf64b85a226  pixman-0.43.4.tar.gz
SHA256: 48d8539f35488d694a2fef3ce17394d1153ed4e71c05d1e621904d574be5df19  pixman-0.43.4.tar.xz
SHA512: 08802916648bab51fd804fc3fd823ac2c6e3d622578a534052b657491c38165696d5929d03639c52c4f29d8850d676a909f0299d1a4c76a07df18a34a896e43d  pixman-0.43.4.tar.gz
SHA512: b40fb05bd58dc78f4e4e9b19c86991ab0611b708657c9a7fb42bfe82d57820a0fde01a34b00a0848a41da6c3fb90c2213942a70f435a0e9467631695d3bc7e36  pixman-0.43.4.tar.xz

PGP signature:
https://cairographics.org/releases/pixman-0.43.4.tar.gz.sha512.asc

Git:
https://gitlab.freedesktop.org/pixman/pixman.git
tag: pixman-0.43.4

Log:

Gayathri Berli (1):
  Revert the changes to fix the problem in big-endian architectures

Heiko Lewin (1):
  Allow to build pixman on clang/arm32

Makoto Kato (1):
  pixman-arm: Fix build on clang/arm32

Matt Turner (5):
  pixman-x86: Use cpuid.h header
  pixman-x86: Move #include "cpuid.h" inside conditionals
  Revert "Allow to build pixman on clang/arm32"
  pixman-arm: Use unified syntax
  Pre-release version bump to 0.43.4

Simon Ser (1):
  Post-release version bump to 0.43.3





Re: [Pixman] [ANNOUNCE] pixman release 0.42.2 now available

2022-11-03 Thread Matt Turner
On Wed, Nov 2, 2022 at 1:37 PM Matt Turner wrote:
>
> A new pixman release 0.42.2 is now available. This is a stable release
> in the 0.42 series.
>
> This version contains a fix for a heap overflow. A CVE has been
> requested, and I'll reply to this email with the number when it is
> allocated.

This has been assigned CVE-2022-44638.


[Pixman] [ANNOUNCE] pixman release 0.42.2 now available

2022-11-02 Thread Matt Turner
A new pixman release 0.42.2 is now available. This is a stable release
in the 0.42 series.

This version contains a fix for a heap overflow. A CVE has been
requested, and I'll reply to this email with the number when it is
allocated. 

See
https://gitlab.freedesktop.org/pixman/pixman/-/commit/a1f88e842e0216a5b4df1ab023caebe33c101395
and https://gitlab.freedesktop.org/pixman/pixman/-/issues/63 for more
information.

Thanks to Maddie Stone and Google's Project Zero for discovering this
issue, providing a proof-of-concept, and a great analysis.

tar.gz:
https://cairographics.org/releases/pixman-0.42.2.tar.gz
https://www.x.org/releases/individual/lib/pixman-0.42.2.tar.gz

tar.xz:
https://www.x.org/releases/individual/lib/pixman-0.42.2.tar.xz

Hashes:
SHA256: ea1480efada2fd948bc75366f7c349e1c96d3297d09a3fe62626e38e234a625e  pixman-0.42.2.tar.gz
SHA256: 5747d2ec498ad0f1594878cc897ef5eb6c29e91c53b899f7f71b506785fc1376  pixman-0.42.2.tar.xz
SHA512: 0a4e327aef89c25f8cb474fbd01de834fd2a1b13fdf7db11ab72072082e45881cd16060673b59d02054b1711ae69c6e2395f6ae9214225ee7153939efcd2fa5d  pixman-0.42.2.tar.gz
SHA512: 3476e2676e66756b1af61b1e532cd80c985c191fb7956eb01702b419726cce99e79163b7f287f74f66414680e7396d13c3fee525cd663f12b6ac4877070ff4e8  pixman-0.42.2.tar.xz

GPG signature:
https://cairographics.org/releases/pixman-0.42.2.tar.gz.sha512.asc
(signed by [ultimate] Matt Turner)

Git:
https://gitlab.freedesktop.org/pixman/pixman.git
tag: pixman-0.42.2

Log:
Matt Turner (4):
  build: Add a64-neon-test.S to EXTRA_DIST
  Revert "Fix signed-unsigned semantics in reduce_32"
  Avoid integer overflow leading to out-of-bounds write
  Pre-release version bump to 0.42.2

Simon Ser (3):
  Post-release version bump to 0.42.1
  meson: override pixman-1 dependency
  meson: explicitly set C standard to gnu99

Thomas Klausner (2):
  configure.ac: avoid unportable test(1) operator
  Makefile.am: increase shell portability




Re: [Pixman] Performance regression with pixman 0.40

2021-06-15 Thread Matt Turner
Cc'ing the patch author, since I don't think he's subscribed.

On Fri, Jun 4, 2021 at 12:15 AM  wrote:
>
> Hi,
>
> We are developing a graphics framework called EGT dedicated to Microchip
> parts:
> https://github.com/linux4sam/egt
>
> We are using Cairo, and so Pixman, for the drawing part. While updating our
> distribution, we noticed a performance decrease in our benchmark suite; in
> the worst case our fps drops from 200 to 60.
>
> We have identified the move from Pixman 0.38.4 to 0.40 as the cause. I did a
> bisect to find which commit impacts us and it's this one:
>
> commit 6fe0131394fb029d2fccaee6b8edcb108840ad8a (refs/bisect/bad)
> Author: Federico Mena Quintero 
> Date:   Wed Mar 18 18:49:30 2020 -0600
>
> Initialize temporary buffers in general_composite_rect()
>
> Otherwise, Valgrind shows things like "conditional jump or move
> depends on uninitialised values" errors much later in calling code.
> For example, see https://gitlab.gnome.org/GNOME/librsvg/issues/572
>
> Fixes https://gitlab.freedesktop.org/pixman/pixman/issues/9
>
> diff --git a/pixman/pixman-general.c b/pixman/pixman-general.c
> index 7d74f98..7e5a0d0 100644
> --- a/pixman/pixman-general.c
> +++ b/pixman/pixman-general.c
> @@ -165,6 +165,12 @@ general_composite_rect  (pixman_implementation_t *imp,
>
>         if (!scanline_buffer)
>             return;
> +
> +       memset (scanline_buffer, 0, width * Bpp * 3 + 15 * 3);
> +    }
> +    else
> +    {
> +       memset (stack_scanline_buffer, 0, sizeof (stack_scanline_buffer));
>      }
>
>      src_buffer = ALIGN (scanline_buffer);
>
>
> I don't know which drawing paths are impacted by this change; I can dig
> further if needed. We have two benchmarks with a small performance decrease
> on all our devices (armv5 and armv7), and one benchmark with a huge
> performance decrease on our armv5 device. That benchmark draws circles with
> alpha blending. Other benchmarks, which draw squares, squares with alpha
> blending, and circles, are not impacted.
>
> An extra memset in the path could certainly explain the performance
> decrease.
>
> Should we accept the new scores as the valid ones, or can we find an
> alternative?
>
> Thanks
>
> Regards,
> Ludovic
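
One thing the quoted diff shows is that the heap path clears only the bytes
needed for the current width, while the stack path clears the whole stack
buffer on every call. The sketch below illustrates that difference in
isolation. It is not pixman's code: the function name and the
STACK_SCRATCH_BYTES value are hypothetical, and whether this accounts for the
reported regression has not been measured here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical size; the real stack buffer size is not given in the thread. */
#define STACK_SCRATCH_BYTES 4096

/* Returns how many bytes are cleared for one call of the given width
 * (0 if the heap allocation fails). */
size_t
scratch_bytes_cleared (int width, int bpp)
{
    uint8_t stack_scratch[STACK_SCRATCH_BYTES];
    uint8_t *scratch = stack_scratch;
    size_t needed = (size_t) width * (size_t) bpp * 3 + 15 * 3;
    size_t cleared;

    if (needed > sizeof (stack_scratch))
    {
        scratch = malloc (needed);
        if (!scratch)
            return 0;
        /* Heap path: the cost scales with the requested width. */
        memset (scratch, 0, needed);
        cleared = needed;
    }
    else
    {
        /* Stack path: the whole buffer is cleared, however small the width. */
        memset (stack_scratch, 0, sizeof (stack_scratch));
        cleared = sizeof (stack_scratch);
    }

    /* ... a real implementation would composite scanlines here ... */

    if (scratch != stack_scratch)
        free (scratch);
    return cleared;
}
```

With these assumed sizes, a 1-pixel-wide call and a 16-pixel-wide call both
clear the full 4096 bytes, so workloads made of many narrow spans pay the
fixed cost each time.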
___
Pixman mailing list
Pixman@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] Prevent empty top-level declaration

2020-04-26 Thread Matt Turner
On Sun, Nov 17, 2019 at 4:48 PM Michael Forney wrote:
>
> The expansion of PIXMAN_DEFINE_THREAD_LOCAL(...) may end in a
> function definition, so the following semicolon is considered an
> empty top-level declaration, which is not allowed in ISO C.
> ---
>  pixman/pixman-compiler.h   | 6 +++---
>  pixman/pixman-implementation.c | 2 +-
>  2 files changed, 4 insertions(+), 4 deletions(-)
>

Thanks! Committed, and sorry for losing track of the patch.
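
For readers unfamiliar with the issue, here is a minimal sketch of the
pattern the patch description refers to. DEFINE_GETTER and the names it
generates are hypothetical stand-ins chosen for illustration; they are not
pixman's PIXMAN_DEFINE_THREAD_LOCAL.

```c
#include <assert.h>

/* Hypothetical macro whose expansion ends in a function definition. */
#define DEFINE_GETTER(type, name, value) \
    type name##_get (void)               \
    {                                    \
        return (value);                  \
    }

/* No trailing semicolon after the invocation.  Writing
 * "DEFINE_GETTER (int, answer, 42);" would leave a stray ';' at file
 * scope, which the ISO C grammar does not allow; compilers typically
 * accept it as an extension and warn only in pedantic mode. */
DEFINE_GETTER (int, answer, 42)

/* Small self-check so the sketch does something observable. */
void
check_getter (void)
{
    assert (answer_get () == 42);
}
```

The design choice is simply to make the macro, not the call site, responsible
for whatever terminator its expansion needs.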


[Pixman] [ANNOUNCE] pixman release 0.40.0 now available

2020-04-19 Thread Matt Turner
A new pixman release 0.40.0 is now available. This is a stable release.

tar.gz:
https://cairographics.org/releases/pixman-0.40.0.tar.gz
https://www.x.org/releases/individual/lib/pixman-0.40.0.tar.gz

tar.xz:
https://www.x.org/releases/individual/lib/pixman-0.40.0.tar.xz

Hashes:
SHA256: 6d200dec3740d9ec4ec8d1180e25779c00bc749f94278c8b9021f5534db223fc  pixman-0.40.0.tar.gz
SHA256: da8ed9fe2d1c5ef8ce5d1207992db959226bd4e37e3f88acf908fd9a71e2704e  pixman-0.40.0.tar.xz
SHA512: 063776e132f5d59a6d3f94497da41d6fc1c7dca0d269149c78247f0e0d7f520a25208d908cf5e421d1564889a91da44267b12d61c0bd7934cd54261729a7de5f  pixman-0.40.0.tar.gz
SHA512: 8a60edb113d68791b41bd90b761ff7b3934260cb3dada3234c9351416f61394e4157353bc4d61b8f6c2c619de470f6feefffb4935bfcf79d291ece6285de7270  pixman-0.40.0.tar.xz

GPG signature:
https://cairographics.org/releases/pixman-0.40.0.tar.gz.sha512.asc
(signed by [ultimate] Matt Turner)

Git:
https://gitlab.freedesktop.org/pixman/pixman.git
tag: pixman-0.40.0

Log:
Adam Jackson (17):
  test: Fix undefined left shift in affine-test
  test: Fix undefined left shift in pixel_checker_init
  pixman: Fix undefined left shift in pixel_contract_from_float
  pixman-access: Fix various undefined left shifts
  pixman-combine: Fix various undefined left shifts
  pixman-image: Fix undefined left shift
  pixman-gradient-walker: Fix undefined left shift
  pixman-sse2: Fix an undefined left shift
  pixman-fast-path: Fix various undefined left shifts
  pixman-bits-image: Fix various undefined left shifts
  pixman-bits-image: Fix left shift of a negative number
  pixman-matrix: Fix left shift of a negative number
  test: Fix unrepresentable subtraction in stress-test
  pixman-mmx: Fix undefined left-shifts
  pixman-mmx: Fix undefined unaligned loads
  pixman-sse2: Fix undefined unaligned loads
  fast-path: Fix some sketchy pointer arithmetic

Antonio Ospite (1):
  pixman-compiler.h: fix building tests with MinGW

Basile Clement (6):
  Fix bilinear filter computation in wide pipeline
  Implement basic dithering for the wide pipeline, v3
  test: Check the dithering path in tolerance-test
  demos: Add a dithering demo
  Ordered dithering with blue noise, v2
  Don't use GNU extension for binary numbers

Christoph Reiter (3):
  meson: define SIZEOF_LONG  and use -Wundef
  meson: allow building a static library
  meson: fix TLS support under mingw

Chun-wei Fan (11):
  meson.build: Fix MMX, SSE2 and SSSE3 checks on MSVC
  meson.build: Disable OpenMP on MSVC builds
  build: Don't assume PThreads if threading support is found
  meson.build: Improve libpng search on MSVC
  pixman/pixman-version.h.in: Add a PIXMAN_API macro
  pixman/pixman.h: Mark public APIs with PIXMAN_API
  pixman-[compiler|private].h: Export symbols for tests
  pixman/meson.build: Define PIXMAN_API on MSVC-style compilers
  test/solid-test.c: Include stdint.h
  demos: Define _USE_MATH_DEFINES on MSVC-style compilers
  thread-test.c: Use Windows Threading API on Windows

Dylan Baker (1):
  meson: don't use link_with for library()

Fan Jinke (1):
  add Hygon Dhyana support to enable X86_MMX_EXTENSIONS feature

Federico Mena Quintero (1):
  Initialize temporary buffers in general_composite_rect()

Ghabry (1):
  Enabled armv6 SIMD for 3DS (devkitARM) and arm neon SIMD for PS Vita (vit

Jonathan Kew (2):
  Explicitly cast byte to uint32_t before left-shifting.
  Avoid undefined behavior (left-shifting negative value) in pixman_int_to_

Khem Raj (1):
  test/utils: Check for FE_INVALID definition before use

Mathieu Duponchelle (2):
  meson: finish porting over mmx and ssse2 flags for sun and msvc
  meson: add missing function check (getisax)
    
Matt Turner (7):
  Post-release version bump to 0.38.5
  lowlevel-blt-bench: Remove unused variable
  loongson: Avoid C90 mixing-code-and-decls warning
  Distribute the blue-noise files
  Build xz tarballs instead of bzip2
  Move from MD5/SHA1 to SHA256/SHA512 digests
  Pre-release version

Re: [Pixman] [PATCH 1/2] configure.ac: Use '-mloongson-mmi' for Loongson MMI.

2020-04-05 Thread Matt Turner
On Thu, Mar 26, 2020 at 5:57 AM Shiyou Yin wrote:
>
> It's recommended to use '-mloongson-mmi' for MMI.
> ---
>  configure.ac | 2 +-
>  meson.build  | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/configure.ac b/configure.ac
> index 1ca3974..fd7df47 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -273,7 +273,7 @@ dnl ===
>  dnl Check for Loongson Multimedia Instructions
>
>  if test "x$LS_CFLAGS" = "x" ; then
> -LS_CFLAGS="-march=loongson2f"
> +LS_CFLAGS="-mloongson-mmi"
>  fi
>
>  have_loongson_mmi=no
> diff --git a/meson.build b/meson.build
> index 15d3409..a45c969 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -51,7 +51,7 @@ endforeach
>
>  use_loongson_mmi = get_option('loongson-mmi')
>  have_loongson_mmi = false
> -loongson_mmi_flags = ['-march=loongson2f']
> +loongson_mmi_flags = ['-mloongson-mmi']
>  if not use_loongson_mmi.disabled()
>if host_machine.cpu_family() == 'mips64' and cc.compiles('''
>#ifndef __mips_loongson_vector_rev
> --

Thanks very much. This looks good to me. My only (minor) concern is
that the -mloongson-mmi flag is only available since GCC 9, but likely
any users would need to change -march=loongson2f to -march=loongson3a
anyway, and they can easily change -mloongson-mmi back to -march=...
if needed.

I'll just double check that the test suite passes with this patch on
my Yeeloong and then commit it. (And sorry for my delayed response.)


Re: [Pixman] [PATCH v2] build: improve control logic for enabling MMI.

2020-03-08 Thread Matt Turner
Thank you for the patch!

On Fri, Mar 6, 2020 at 3:28 AM Shiyou Yin wrote:
>
> From: Yin Shiyou 

Should be yinshiyou-hf@loongson*.cn*?

>
> 1. Replace LS_CFLAGS with MMI_CFLAGS to express its intention more accurately.
>LS_CFLAGS is still available, but it is not recommended.

I'm not aware of any reasons why LS_CFLAGS needs to stay for
compatibility. Do we know of any distros that set it to override the
-march=... value?

> 2. Improve the control logic for enabling MMI.
>
> Three essential conditions for enabling MMI:
> 1) the user has not specified --disable-loongson-mmi.
> 2) MMI options have been specified by MMI_CFLAGS, CC or the compiler's
>    default setting.
> 3) the compiler supports these MMI options.
> ---
>  configure.ac | 69

We should also update meson.build. I expect/hope that the autotools
build system will go away sometime in the future.

I'm not sure I entirely understand the patch. I understand that the
objective is to make it possible to easily build pixman for Loongson3A
and use the pixman-mmx.c optimizations.

I think it's currently possible to build pixman on mips without
specifying -march=loongson* in CFLAGS and it will enable the
pixman-mmx.c paths and choose them at runtime. Is part of the goal to
keep that working? If so, could we just use the -mloongson-mmi flag to
compile pixman-mmx.c?

Or does that flag mean the Loongson3A variants of the instructions?
What happens if you compile with -march=loongson2f -mloongson-mmi?
Does GCC generate instructions compatible with 2F or 3A?


Re: [Pixman] [PATCH v2 2/3] build: use '-mloongson-mmi' for Loongson MMI.

2020-03-08 Thread Matt Turner
On Sat, Feb 22, 2020 at 6:34 AM YunQiang Su wrote:
>
> On Sat, Feb 22, 2020 at 9:26 PM Shiyou Yin wrote:
> >
> > >-Original Message-
> > >From: Adam Jackson [mailto:a...@redhat.com]
> > >Sent: Friday, February 21, 2020 11:33 PM
> > >To: Yin Shiyou; pixman@lists.freedesktop.org
> > >Subject: Re: [Pixman] [PATCH v2 2/3] build: use '-mloongson-mmi' for 
> > >Loongson MMI.
> > >
> > >On Thu, 2020-02-20 at 22:23 +0800, Yin Shiyou wrote:
> > >> It's suggested to use '-mloongson-mmi' to enable MMI.
> > >> To keep compatible with old processor, '-mloongson-mmi' will be
> > >> setted for Loongson-3A only.
> > >
> > >The pattern we've used for other CPUs is to build support for as many
> > >ISA extensions as possible, unless they are explicitly disabled.
> > >Distributions tend to want to set their own minimum ISA levels, and if
> > >they wanted to assert -mloongson-mmi they would already have added it
> > >to CFLAGS globally.
> > >
> > >Do you have any performance data for this change?
> > >
> > >If setting -mloongson-mmi means the compiler can do useful
> > >autovectorization, then that's probably true for other arches too (eg
> > >amd64 vs avx2), and we should support this kind of thing more
> > >generically. But as it stands I don't think this patch is a good idea.
> > >
> > First, let me introduce the history of '-march=loongson2f' and
> > '-mloongson-mmi'.
> > MMI has been supported by Loongson processors since Loongson2f.
>
> Yes. That's why we should be very careful when we code, especially
> when we work on a base part of an OS, like pixman.
> One historical mistake will cause pain for everyone.
>
> An example is time_t on 32-bit systems.
>
> > Unfortunately, compiler support for the MMI extension was not standardized.
> > GCC used '-march=loongson2f' for Loongson2f at first, but starting with
> > Loongson-3A the opcodes of the MMI instructions changed, and
> > '-march=loongson3a' replaced it.
>
> That is the reason some of Loongson's extensions make upstream unhappy.
> You always need to be very careful when you design a CPU.
> It is like treading on thin ice. No zuo no die.
>
> > Since last year, the compile option for MMI instructions has been
> > standardized, just like -mmsa for MIPS MSA. (MMI, LSX and LASX are Loongson
> > SIMD extensions.)
> > -mloongson-mmi  for MMI (-march=loongson3a still works, but -mloongson-mmi
> >                 is recommended for new processors except Loongson2f.)
> > -mloongson-sx   for LSX
> > -mloongson-asx  for LASX
>
> That is good news.
>
> >
> > Second, back to this patch itself.
> > I ran into a problem when compiling pixman on my Loongson3a with gcc: MMI
> > can't be enabled.
> > The configure check fails with: "linking mips:loongson_2f module with
> > previous mips:gs464 modules"
> > It can be solved by setting LS_CFLAGS="-mloongson-mmi" when configuring.
> > So I submitted this patch in the hope that LS_CFLAGS no longer needs to be
> > set explicitly.
> > As far as I know, this won't have much impact on performance.
>
> This is not about performance. You made a bad design; that is the burden of
> history.

If you're referring to using -march=loongson2f in configure.ac, then I
should point out that that was my choice, and I don't really know what
other options I had -- or even have today. As far as I know,
-march=loongson* was, until recently, the only way to enable the SIMD
instructions, and worse, if I recall correctly Loongson 2E and 2F are
not entirely binary compatible themselves!

The only stable Loongson system I've ever had is a Yeeloong -- 2F, so
it's what I chose. Like I said in another email, I even tried building
pixman-mmx.c multiple times with different -march=... values, linking
them all into libpixman, and choosing which to execute at runtime, but
binutils does not allow linking object files that are compiled with
different -march=... values on mips for reasons I do not know.


Re: [Pixman] [PATCH v2 2/3] build: use '-mloongson-mmi' for Loongson MMI.

2020-03-08 Thread Matt Turner
On Sat, Feb 22, 2020 at 5:26 AM Shiyou Yin wrote:
>
> >-Original Message-
> >From: Adam Jackson [mailto:a...@redhat.com]
> >Sent: Friday, February 21, 2020 11:33 PM
> >To: Yin Shiyou; pixman@lists.freedesktop.org
> >Subject: Re: [Pixman] [PATCH v2 2/3] build: use '-mloongson-mmi' for 
> >Loongson MMI.
> >
> >On Thu, 2020-02-20 at 22:23 +0800, Yin Shiyou wrote:
> >> It's suggested to use '-mloongson-mmi' to enable MMI.
> >> To keep compatible with old processor, '-mloongson-mmi' will be
> >> setted for Loongson-3A only.
> >
> >The pattern we've used for other CPUs is to build support for as many
> >ISA extensions as possible, unless they are explicitly disabled.
> >Distributions tend to want to set their own minimum ISA levels, and if
> >they wanted to assert -mloongson-mmi they would already have added it
> >to CFLAGS globally.
> >
> >Do you have any performance data for this change?
> >
> >If setting -mloongson-mmi means the compiler can do useful
> >autovectorization, then that's probably true for other arches too (eg
> >amd64 vs avx2), and we should support this kind of thing more
> >generically. But as it stands I don't think this patch is a good idea.
> >
> First, let me introduce the history of '-march=loongson2f' and
> '-mloongson-mmi'.
> MMI has been supported by Loongson processors since Loongson2f.
> Unfortunately, compiler support for the MMI extension was not standardized.
> GCC used '-march=loongson2f' for Loongson2f at first, but starting with
> Loongson-3A the opcodes of the MMI instructions changed, and
> '-march=loongson3a' replaced it.
> Since last year, the compile option for MMI instructions has been
> standardized, just like -mmsa for MIPS MSA. (MMI, LSX and LASX are Loongson
> SIMD extensions.)
> -mloongson-mmi  for MMI (-march=loongson3a still works, but -mloongson-mmi
>                 is recommended for new processors except Loongson2f.)
> -mloongson-sx   for LSX
> -mloongson-asx  for LASX
>
> Second, back to this patch itself.
> I ran into a problem when compiling pixman on my Loongson3a with gcc: MMI
> can't be enabled.
> The configure check fails with: "linking mips:loongson_2f module with
> previous mips:gs464 modules"

Do you know why this is?

Obviously we can and do build MMX, SSE2, SSSE3 paths and choose to
execute them at runtime.

Why does binutils not allow combining object files that are compiled
with mixed -march=... values on mips? I cannot find the branch now,
but I tried once to make pixman build pixman-mmx.c with three
different -march=... values (2e, 2f, 3a) and choose which to execute
at runtime, but binutils would not allow the files to be linked into
the same binary.
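
For context, the runtime selection mentioned above generally follows a
function-pointer pattern like the sketch below. All names here are
hypothetical and the "SIMD" variant is only a stand-in that forwards to the
generic code; this is not how pixman's implementation chain is actually
structured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*add_fn_t) (uint8_t *dst, const uint8_t *src, size_t n);

/* Portable fallback: saturating byte-wise add. */
void
add_u8_generic (uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        unsigned sum = (unsigned) dst[i] + src[i];
        dst[i] = (uint8_t) (sum > 255 ? 255 : sum);
    }
}

/* Stand-in for an ISA-specific version.  In a real build this would sit
 * in its own source file compiled with extra flags; here it forwards to
 * the generic code so the sketch stays portable. */
void
add_u8_simd_standin (uint8_t *dst, const uint8_t *src, size_t n)
{
    add_u8_generic (dst, src, n);
}

/* Hypothetical selector: the caller passes in the result of a CPU
 * feature probe made once at start-up. */
add_fn_t
choose_add_u8 (int cpu_has_simd)
{
    return cpu_has_simd ? add_u8_simd_standin : add_u8_generic;
}
```

The pattern depends on every variant being linkable into one binary, which
is exactly what the mixed -march=... object files described above prevent on
mips.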


Re: [Pixman] [PATCH] pixman-combine: Fix wrong value of RB_MASK_PLUS_ONE.

2020-02-20 Thread Matt Turner
On Thu, Feb 20, 2020 at 6:35 AM Shiyou Yin wrote:
> Will this patch be merged?

Yes, pushed. Thanks!


Re: [Pixman] [PATCH] pixman-combine: Fix wrong value of RB_MASK_PLUS_ONE.

2020-02-08 Thread Matt Turner
On Mon, Feb 3, 2020 at 1:56 AM Yin Shiyou wrote:
>
> ---
>  pixman/pixman-combine32.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/pixman/pixman-combine32.h b/pixman/pixman-combine32.h
> index cdd56a6..59bb247 100644
> --- a/pixman/pixman-combine32.h
> +++ b/pixman/pixman-combine32.h
> @@ -12,7 +12,7 @@
>  #define RB_MASK 0xff00ff
>  #define AG_MASK 0xff00ff00
>  #define RB_ONE_HALF 0x800080
> -#define RB_MASK_PLUS_ONE 0x1100
> +#define RB_MASK_PLUS_ONE 0x1000100


Thanks. The patch looks correct, but obviously nothing in the test
suite is failing. How did you discover this? Does this patch fix
something for you?
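
As a sanity check on the corrected constant, the sketch below confirms that
0x1000100 is RB_MASK with one added to each 8-bit channel, and shows one way
such a constant can drive a saturating add on two packed channels. The
helper names are hypothetical, and the function is an illustration written
for this note rather than a copy of pixman's macros.

```c
#include <assert.h>
#include <stdint.h>

#define RB_MASK          0xff00ffu
#define RB_MASK_PLUS_ONE 0x1000100u   /* value from the patch */
#define RB_ONE_PER_LANE  0x10001u     /* hypothetical helper: 1 in each 16-bit lane */

/* RB_MASK plus one in each lane; expected to equal RB_MASK_PLUS_ONE. */
uint32_t
rb_mask_plus_one (void)
{
    return RB_MASK + RB_ONE_PER_LANE;
}

/* Saturating add of two values that each hold 8-bit channels in
 * bits 0-7 and bits 16-23. */
uint32_t
rb_add_saturate (uint32_t x, uint32_t y)
{
    uint32_t t = x + y;                       /* a lane may carry into bit 8 or 24 */
    uint32_t carry = (t >> 8) & RB_MASK;      /* 0 or 1 per lane */

    /* Per lane this ORs in 0x100 (no carry, masked off below) or
     * 0xff (carry, which saturates the channel). */
    t |= RB_MASK_PLUS_ONE - carry;
    return t & RB_MASK;
}
```

For example, adding 0x20 and 0x01 to channels holding 0xf0 and 0xf0
saturates the first channel to 0xff and leaves the second at 0xf1.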


Re: [Pixman] Optimize Graphic Routines for s390x in Pixman - Queries

2020-02-08 Thread Matt Turner
On Sat, Jan 25, 2020 at 4:57 AM Naveen Naidu wrote:
>
> Hello Everyone,
>
> I am Naveen, a senior-year Computer Science undergraduate from India. I am
> planning to apply for the Open Mainframe Project Internship
> (https://github.com/openmainframeproject-internship/resources) program, one
> of whose proposed projects is to optimize graphics routines for s390x in
> pixman.
>
> The description of the project is as follows:
>
>> With the introduction of VirtIO GPU hardware (virtual graphic adapter for 
>> KVM-based virtual machines) for the s390x platform it makes sense to provide 
>> optimized routines in the pixman library also for the s390x architecture.
>
>
> From what I gather from the description, s390x supports vector (i.e. SIMD)
> instructions, and since these instructions speed up processing, the project
> asks us to write an implementation of pixman that uses the vector
> instructions for s390x.
>
> I have also been going through the implementation for Power VMX SIMD, which
> was created to use the vector instructions on PowerPC. But I must confess
> that I am a little lost.
>
> It would be really kind of you all if you could guide me on what I would
> need to learn or do to be able to implement the project. I took a course on
> computer graphics as an undergraduate, so I do understand the fundamentals.
> But I would really like to know the right sequence of steps for the project
> so that I can get a better understanding of it.
>
> Thank you very much for your time,
> Naveen

Welcome :)

Here are some snippets of an email I sent to someone else interested in
contributing optimizations to pixman:

Background information for the operations pixman implements:
http://ssp.impulsetrain.com/porterduff.html (written by the author of Pixman)
https://en.wikipedia.org/wiki/Alpha_compositing
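
As a concrete reference point, the Porter-Duff OVER operator described in
those links can be sketched for a single pixel as below. This assumes
premultiplied 32-bit ARGB pixels (alpha in the top byte); the function names
are hypothetical and this scalar loop is only an illustration, not pixman's
implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two 8-bit values treated as fractions of 255, with rounding. */
uint8_t
mul_un8 (uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t) a * b + 0x80;
    return (uint8_t) ((t + (t >> 8)) >> 8);
}

/* OVER for premultiplied pixels: result = src + dst * (1 - src_alpha),
 * applied to each of the four 8-bit channels. */
uint32_t
over_pixel (uint32_t src, uint32_t dst)
{
    uint8_t inv_alpha = (uint8_t) (255 - (src >> 24));
    uint32_t out = 0;

    for (int shift = 0; shift < 32; shift += 8)
    {
        uint32_t s = (src >> shift) & 0xff;
        uint32_t d = (dst >> shift) & 0xff;
        uint32_t c = s + mul_un8 ((uint8_t) d, inv_alpha);

        if (c > 255)
            c = 255;            /* guard against non-premultiplied input */
        out |= c << shift;
    }
    return out;
}
```

An opaque source replaces the destination, a fully transparent source leaves
it unchanged, and SIMD code paths compute the same per-channel arithmetic on
several pixels at once.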

`lowlevel-blt-bench` lives in pixman's test/ directory. It's a small
self-contained benchmark. Run with

   ./test/lowlevel-blt-bench all
   ./test/lowlevel-blt-bench over__

etc. The -b (bilinear) and -n (nearest) options are useful as well.
Firefox traces will show lots of usage of bilinear and nearest scaling
functions.

There's an environment variable named PIXMAN_DISABLE=... which is very
useful for getting side-by-side performance comparisons of MMX vs SSE2
vs AVX2. (For S390, since it doesn't have any optimized paths yet, it
may not be particularly useful.) It works for both lowlevel-blt-bench
and cairo-perf-trace.

Cairo
https://cgit.freedesktop.org/cairo/
https://cgit.freedesktop.org/cairo-traces/

`cairo-perf-trace` lives in cairo's perf directory. Run with

   CAIRO_TEST_TARGET=image16,image ./perf/cairo-perf-trace ~/path/to/trace

The trace files in cairo-traces are .lzma files which will have to be
decompressed. Decompress with lzma -dk trace.lzma or alternatively run
make in cairo-traces to uncompress them all. Pass the uncompressed
file to cairo-perf-trace. The arguments to CAIRO_TEST_TARGET specify
what backend Cairo should use. 'image' corresponds to 32-bit visuals,
and 'image16' is 16-bit visuals.

Here are a couple of my blog posts about some work I did on pixman.
Maybe you can find something valuable in them.
https://mattst88.com/blog/2012/05/17/Optimizing_pixman_for_Loongson:_Process_and_Results/
https://mattst88.com/blog/2012/07/06/My_time_optimizing_graphics_performance_on_the_OLPC_XO_1,75_laptop/


I would look at the pixman-sse2.c file for examples of what pixman
optimizations look like. That may be a better starting point than the
POWER optimizations. I have a small branch here
(https://cgit.freedesktop.org/~mattst88/pixman/log/?h=avx2) that
demonstrates adding a set of optimizations for a new instruction set.
I expect it would be helpful to look over.

Thanks,
Matt


Re: [Pixman] [PATCH] [dither] Don't use GNU extension for binary numbers

2019-06-10 Thread Matt Turner
Thanks. Pushed.

Re: [Pixman] Dithering patches, v2

2019-05-13 Thread Matt Turner
On Sat, May 11, 2019 at 7:42 AM Bryce Harrington wrote:
>
> On Tue, May 07, 2019 at 09:52:39AM -0700, Matt Turner wrote:
> > On Sun, May 5, 2019 at 11:50 AM Bryce Harrington wrote:
> > >
> > > On Mon, Apr 22, 2019 at 09:26:48AM -0700, Matt Turner wrote:
> > > > On Fri, Apr 19, 2019 at 4:52 PM Bryce Harrington wrote:
> > > > > Inkscape would love to see Basile's dithering patches included.  Our
> > > > > testing shows that they make a huge quality difference for our users;
> > > > > this solves a critical need.
> > > > >
> > > > > Mc and I have done some preliminary investigation into how to plumb this
> > > > > into Cairo, and would love to hear your review of Basile's approach to
> > > > > the problem.
> > > >
> > > > I don't feel like I'm experienced enough with that side of pixman to
> > > > offer meaningful comments. I've Cc'd Søren in the hopes that he
> > > > remains interested enough in the project to review the patches that
> > > > Basile says implement the approach Søren described.
> > >
> > > I totally understand, I'd feel the same.  But I think this is an
> > > important patch, so how can we move forward with it?
> >
> > If you're happy with the patches, I'd say let's commit them.
>
> Works for me, would you prefer me to commit them, or will you be
> committing them yourself?

I'd prefer you commit them since they're for Inkscape.
___
Pixman mailing list
Pixman@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/pixman

Re: [Pixman] Dithering patches, v2

2019-05-07 Thread Matt Turner
On Sun, May 5, 2019 at 11:50 AM Bryce Harrington wrote:
>
> On Mon, Apr 22, 2019 at 09:26:48AM -0700, Matt Turner wrote:
> > On Fri, Apr 19, 2019 at 4:52 PM Bryce Harrington wrote:
> > > Inkscape would love to see Basile's dithering patches included.  Our
> > > testing shows that they make a huge quality difference for our users;
> > > this solves a critical need.
> > >
> > > Mc and I have done some preliminary investigation into how to plumb this
> > > into Cairo, and would love to hear your review of Basile's approach to
> > > the problem.
> >
> > I don't feel like I'm experienced enough with that side of pixman to
> > offer meaningful comments. I've Cc'd Søren in the hopes that he
> > remains interested enough in the project to review the patches that
> > Basile says implement the approach Søren described.
>
> I totally understand, I'd feel the same.  But I think this is an
> important patch, so how can we move forward with it?

If you're happy with the patches, I'd say let's commit them.

Re: [Pixman] Dithering patches, v2

2019-04-22 Thread Matt Turner
On Fri, Apr 19, 2019 at 4:52 PM Bryce Harrington wrote:
> Inkscape would love to see Basile's dithering patches included.  Our
> testing shows that they make a huge quality difference for our users;
> this solves a critical need.
>
> Mc and I have done some preliminary investigation into how to plumb this
> into Cairo, and would love to hear your review of Basile's approach to
> the problem.

I don't feel like I'm experienced enough with that side of pixman to
offer meaningful comments. I've Cc'd Søren in the hopes that he
remains interested enough in the project to review the patches that
Basile says implement the approach Søren described.

Re: [Pixman] [PATCH 2/2] AVX2 implementation of OVER, ROVER, ADD, ROUT operators.

2019-04-16 Thread Matt Turner
On Thu, Mar 28, 2019 at 10:41 PM Matt Turner wrote:
>
> On Wed, Mar 27, 2019 at 1:06 PM Matt Turner wrote:
> >
> > Thank you. I'll run some benchmarks on my KBL system to confirm and
> > then commit them.
> >
> > I'm planning to do a 0.40 release soon with some Meson fixes and other
> > small things. Seems like these patches will be good to include to make
> > the release have a new feature :)
>
> Or maybe not.
>
> I benchmarked cairo-traces. The only thing that improved measurably
> was poppler. I thought, well, at least we improved that and then
> remembering my patch that also improved it I applied it, only to
> realize that you incorporated my patch into your work without
> mentioning it.
>
> And so your poppler improvements are in fact from my patch, now
> modified and silently combined into this one. That's really bad form.

Review processes undertaken indicate that Raghu wrote this code
independently of me. My apologies for suggesting otherwise.

[Pixman] [ANNOUNCE] pixman release 0.38.4 now available

2019-04-10 Thread Matt Turner

A new pixman release 0.38.4 is now available. This is a stable release in
the 0.38 series.

tar.gz:
https://cairographics.org/releases/pixman-0.38.4.tar.gz
https://www.x.org/releases/individual/lib/pixman-0.38.4.tar.gz

tar.bz2:
https://www.x.org/releases/individual/lib/pixman-0.38.4.tar.bz2

Hashes:
MD5:  267a7af290f93f643a1bc74490d9fdd1  pixman-0.38.4.tar.gz
MD5:  16a350a8a40116ddf67632a1d2623711  pixman-0.38.4.tar.bz2
SHA1: 8594e0a31c1802ae0c155d6b502c0953aa862baa  pixman-0.38.4.tar.gz
SHA1: 87e1abc91ac4e5dfcc275f744f1d0ec3277ee7cd  pixman-0.38.4.tar.bz2

GPG signature:
https://cairographics.org/releases/pixman-0.38.4.tar.gz.sha1.asc
(signed by [ultimate] Matt Turner)

Git:
https://gitlab.freedesktop.org/pixman/pixman.git
tag: pixman-0.38.4

Log:
Matt Turner (4):
  Post-release version bump to 0.38.3
  Makefile.am: Update download links
  Makefile.am: Ship Meson assembly test files in the tarball
  Pre-release version bump to 0.38.4



[Pixman] [ANNOUNCE] pixman release 0.38.2 now available

2019-04-07 Thread Matt Turner


A new pixman release 0.38.2 is now available. This is a stable release in the
0.38 series.

This release mostly contains fixes for the Meson build system.

tar.gz:
https://cairographics.org/releases/pixman-0.38.2.tar.gz
https://www.x.org/releases/individual/lib/pixman-0.38.2.tar.gz

tar.bz2:
https://www.x.org/releases/individual/lib/pixman-0.38.2.tar.bz2

Hashes:
MD5:  e216abae705641038ca782c6d6fd4204  pixman-0.38.2.tar.gz
MD5:  dfdbebf2ce6c2ff0891247c55f928d97  pixman-0.38.2.tar.bz2
SHA1: c2abaea13ff9f12f31592859604047d8b1fa082a  pixman-0.38.2.tar.gz
SHA1: ce40833fe4337aa6329ac5694d9ff342338219c1  pixman-0.38.2.tar.bz2

GPG signature:
http://cairographics.org/releases/pixman-0.38.2.tar.gz.sha1.asc
(signed by [ultimate] Matt Turner)

Git:
https://gitlab.freedesktop.org/pixman/pixman.git
tag: pixman-0.38.2

Log:
Dylan Baker (6):
  meson: work around meson issue #5115
  meson: fix typo which breaks loongson checks
  meson: fix copy-n-paste error for arm simd assembly
  meson: Add proper include paths for the loongson check
  meson: simplify and fix mmx library compilation
  meson: store ARM SIMD and NEON tests as text files

Matt Turner (2):
  meson: Correct copy-and-paste mistake
  Pre-release version bump to 0.38.2

Niveditha Rau (1):
  void function should not return a value

Simon Richter (2):
  Windows: Show compiler invocation
  Windows: Support building with SHELL=cmd.exe



Re: [Pixman] [PATCH 2/2] AVX2 implementation of OVER, ROVER, ADD, ROUT operators.

2019-03-28 Thread Matt Turner
On Wed, Mar 27, 2019 at 1:06 PM Matt Turner  wrote:
>
> Thank you. I'll run some benchmarks on my KBL system to confirm and
> then commit them.
>
> I'm planning to do a 0.40 release soon with some Meson fixes and other
> small things. Seems like these patches will be good to include to make
> the release have a new feature :)

Or maybe not.

I benchmarked cairo-traces. The only thing that improved measurably
was poppler. I thought, well, at least we improved that and then
remembering my patch that also improved it I applied it, only to
realize that you incorporated my patch into your work without
mentioning it.

And so your poppler improvements are in fact from my patch, now
modified and silently combined into this one. That's really bad form.

From a technical perspective, I think we're back where we started:
with an AVX2 implementation of over__ that does not provide a
meaningful improvement in any cairo-trace and me doubting whether it's
worth pursuing this project any further. To be honest, at this point I
would prefer that you not continue this project.

Re: [Pixman] [PATCH] void function should not return a value

2019-03-27 Thread Matt Turner
Thanks. Merged.

Re: [Pixman] [PATCH 1/2] Windows: Show compiler invocation

2019-03-27 Thread Matt Turner
Thanks. Merged both.

Re: [Pixman] [PATCH 2/2] AVX2 implementation of OVER, ROVER, ADD, ROUT operators.

2019-03-27 Thread Matt Turner
Thank you. I'll run some benchmarks on my KBL system to confirm and
then commit them.

I'm planning to do a 0.40 release soon with some Meson fixes and other
small things. Seems like these patches will be good to include to make
the release have a new feature :)

[Pixman] [PATCH] avx2: Add fast path for over_reverse_n_8888

2019-01-21 Thread Matt Turner
lowlevel-blt-bench, over_reverse_n_8888, 100 iterations:

           Before              After
        Mean  StdDev       Mean  StdDev   Confidence   Change
L1    2372.6    2.50     4387.6    8.00      100.00%   +84.9%
L2    2490.3    5.29     4326.5   20.79      100.00%   +73.7%
M     2418.3   10.43     3718.0   38.55      100.00%   +53.7%
HT    1555.8   13.35     2112.9   23.85      100.00%   +35.8%
VT    1120.1    9.58     1403.7   15.43      100.00%   +25.3%
R      958.5   17.66     1176.9   20.87      100.00%   +22.8%
RT     407.3    6.79      450.1    7.22      100.00%   +10.5%

At most 18 outliers rejected per test per set.

cairo-perf-trace with trimmed traces, 30 iterations:

               Before            After
            Mean  StdDev     Mean  StdDev   Confidence   Change
poppler    0.516   0.003    0.478   0.002     100.000%    +8.1%

Cairo perf reports the running time, but the change is computed for
operations per second instead (inverse of running time).
---
 pixman/pixman-avx2.c | 94 
 1 file changed, 94 insertions(+)

diff --git a/pixman/pixman-avx2.c b/pixman/pixman-avx2.c
index faef552..6a67515 100644
--- a/pixman/pixman-avx2.c
+++ b/pixman/pixman-avx2.c
@@ -28,6 +28,18 @@ negate_2x256 (__m256i  data_lo,
 *neg_hi = _mm256_xor_si256 (data_hi, MASK_00FF_AVX2);
 }
 
+static force_inline __m256i
+unpack_32_1x256 (uint32_t data)
+{
+return _mm256_unpacklo_epi8 (_mm256_broadcastd_epi32 (_mm_cvtsi32_si128 (data)), _mm256_setzero_si256 ());
+}
+
+static force_inline __m256i
+expand_pixel_32_1x256 (uint32_t data)
+{
+return _mm256_shuffle_epi32 (unpack_32_1x256 (data), _MM_SHUFFLE (1, 0, 1, 0));
+}
+
 static force_inline __m256i
 pack_2x256_256 (__m256i lo, __m256i hi)
 {
@@ -100,6 +112,13 @@ save_256_aligned (__m256i* dst,
 _mm256_store_si256 (dst, data);
 }
 
+static force_inline void
+save_256_unaligned (__m256i* dst,
+   __m256i  data)
+{
+_mm256_storeu_si256 (dst, data);
+}
+
 static force_inline int
 is_opaque_256 (__m256i x)
 {
@@ -429,12 +448,87 @@ avx2_composite_over__ (pixman_implementation_t *imp,
src += src_stride;
 }
 }
+
+static void
+avx2_composite_over_reverse_n_8888 (pixman_implementation_t *imp,
+   pixman_composite_info_t *info)
+{
+PIXMAN_COMPOSITE_ARGS (info);
+uint32_t src;
+uint32_t*dst_line, *dst;
+__m256i ymm_src;
+__m256i ymm_dst, ymm_dst_lo, ymm_dst_hi;
+__m256i ymm_dsta_hi, ymm_dsta_lo;
+int dst_stride;
+int32_t w;
+
+src = _pixman_image_get_solid (imp, src_image, dest_image->bits.format);
+
+if (src == 0)
+   return;
+
+PIXMAN_IMAGE_GET_LINE (
+   dest_image, dest_x, dest_y, uint32_t, dst_stride, dst_line, 1);
+
+ymm_src = expand_pixel_32_1x256 (src);
+
+while (height--)
+{
+   dst = dst_line;
+
+   dst_line += dst_stride;
+   w = width;
+
+   while (w >= 8)
+   {
+   __m256i tmp_lo, tmp_hi;
+
+   ymm_dst = load_256_unaligned ((__m256i*)dst);
+
+   unpack_256_2x256 (ymm_dst, &ymm_dst_lo, &ymm_dst_hi);
+   expand_alpha_2x256 (ymm_dst_lo, ymm_dst_hi, &ymm_dsta_lo, &ymm_dsta_hi);
+
+   tmp_lo = ymm_src;
+   tmp_hi = ymm_src;
+
+   over_2x256 (&ymm_dst_lo, &ymm_dst_hi,
+   &ymm_dsta_lo, &ymm_dsta_hi,
+   &tmp_lo, &tmp_hi);
+
+   save_256_unaligned (
+   (__m256i*)dst, pack_2x256_256 (tmp_lo, tmp_hi));
+
+   w -= 8;
+   dst += 8;
+   }
+
+   while (w)
+   {
+   __m128i vd;
+
+   vd = unpack_32_1x128 (*dst);
+
+   *dst = pack_1x128_32 (over_1x128 (vd, expand_alpha_1x128 (vd),
+ _mm256_castsi256_si128 (ymm_src)));
+   w--;
+   dst++;
+   }
+
+}
+
+}
+
 static const pixman_fast_path_t avx2_fast_paths[] =
 {
 PIXMAN_STD_FAST_PATH (OVER, a8r8g8b8, null, a8r8g8b8, avx2_composite_over__),
 PIXMAN_STD_FAST_PATH (OVER, a8r8g8b8, null, x8r8g8b8, avx2_composite_over__),
 PIXMAN_STD_FAST_PATH (OVER, a8b8g8r8, null, a8b8g8r8, avx2_composite_over__),
 PIXMAN_STD_FAST_PATH (OVER, a8b8g8r8, null, x8b8g8r8, avx2_composite_over__),
+
+/* PIXMAN_OP_OVER_REVERSE */
+PIXMAN_STD_FAST_PATH (OVER_REVERSE, solid, null, a8r8g8b8, avx2_composite_over_reverse_n_8888),
+PIXMAN_STD_FAST_PATH (OVER_REVERSE, solid, null, a8b8g8r8, avx2_composite_over_reverse_n_8888),
+
 { PIXMAN_OP_NONE },
 };
 
-- 
2.19.2
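
A note on the "Change" column of the cairo-perf-trace table in the commit message: the figure is derived from running times but expressed as a change in operations per second. A minimal sketch of that conversion, assuming throughput is simply the inverse of running time (the helper name `throughput_change_percent` is made up for illustration and is not part of pixman or cairo):

```c
/* Relative change in throughput (operations per second), given the
 * running time before and after a patch. Throughput is taken to be
 * 1 / time, so
 *   change = (1 / after) / (1 / before) - 1 = before / after - 1.
 * Hypothetical helper for illustration only. */
double
throughput_change_percent (double time_before, double time_after)
{
    return (time_before / time_after - 1.0) * 100.0;
}
```

Feeding in the rounded poppler means from the table (0.516 and 0.478) gives a little under +8%, close to the reported +8.1%; the small gap is presumably because the table shows rounded means.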



Re: [Pixman] [PATCH 3/3] Rev2 of patch: AVX2 versions of OVER and ROVER operators.

2019-01-21 Thread Matt Turner
On Wed, Jan 16, 2019 at 4:57 PM Raghuveer Devulapalli
 wrote:
>
> From: raghuveer devulapalli 
>
> These were found to be up to 1.8 times faster (depending on the array
> size) than the corresponding SSE2 version. The AVX2 and SSE2 were
> benchmarked on an Intel(R) Core(TM) i5-6260U CPU @ 1.80GHz. The AVX2 and
> SSE versions were benchmarked by measuring how many TSC cycles each of
> the avx2_combine_over_u and sse2_combine_over_u functions took to run
> for various array sizes. For the purpose of benchmarking, turbo was
> disabled and intel_pstate governor was set to performance to avoid
> variance in CPU frequencies across multiple runs.
>
> | Array size | #cycles SSE2 | #cycles AVX2 |
> |------------|--------------|--------------|
> | 400        | 53966        | 32800        |
> | 800        | 107595       | 62595        |
> | 1600       | 214810       | 122482       |
> | 3200       | 429748       | 241971       |
> | 6400       | 859070       | 481076       |
>
> Also ran lowlevel-blt-bench for OVER__ operation and that
> also shows a 1.55x-1.79x improvement over SSE2. Here are the details:
>
> AVX2: OVER__ =  L1:2136.35  L2:2109.46  M:1751.99 ( 60.90%)
> SSE2: OVER__ =  L1:1188.91  L2:1190.63  M:1128.32 ( 40.31%)
>
> The AVX2 implementation uses the SSE2 version for manipulating pixels
> that are not 32 byte aligned. The helper functions from pixman-sse2.h
> are re-used for this purpose.

I still cannot measure any performance improvement with cairo-traces.
If we're not improving performance in any real world application, then
I don't think it's worth adding a significant amount of code.

As I told you in person and in private mail, I suspect that you're
more likely to see real performance improvements in operations that
are more compute-heavy, like bilinear filtering. You could probably
use AVX2's gather instructions in the bilinear code as well. Filling
out the avx2_iters array would also be a good place to start, since
those functions execute when we do not have a specific fast-path for
an operation (which will be the case for AVX2).
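
To make "compute-heavy" concrete, here is a rough scalar sketch of bilinear interpolation for a single 8-bit channel. It is a generic textbook formulation with assumed 8-bit fixed-point weights, not pixman's actual bilinear code (whose weight precision and data layout may differ); the function name is made up for this example.

```c
#include <stdint.h>

/* Bilinear interpolation of one 8-bit channel from a 2x2 neighbourhood.
 * wx and wy are fixed-point weights in [0, 256]: 0 selects the left/top
 * sample, 256 the right/bottom one. Illustrative only. */
uint8_t
bilinear_channel (uint8_t tl, uint8_t tr, uint8_t bl, uint8_t br,
                  unsigned wx, unsigned wy)
{
    /* Horizontal blends of the top and bottom rows, each scaled by 256. */
    unsigned top    = tl * (256 - wx) + tr * wx;
    unsigned bottom = bl * (256 - wx) + br * wx;

    /* Vertical blend, scaled by 65536; add half before shifting to round. */
    return (uint8_t) ((top * (256 - wy) + bottom * wy + 32768) >> 16);
}
```

That is six multiplies per channel per output pixel, compared with a single multiply per channel for OVER, which is why wider vectors have more room to help here.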

I sense that you want to check this off your todo list and move on. If
that's the case, we can include the avx2_composite_over_reverse_n_8888
function I wrote (and will send as a separate patch) to confirm that
using AVX2 is capable of giving a performance improvement in some
cairo traces.
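
For readers unfamiliar with the operator, a scalar sketch of what OVER_REVERSE computes for one pixel may help. It assumes premultiplied a8r8g8b8 pixels and the usual definition dest = dest + src * (1 - dest_alpha); the function names are chosen for this example and are not pixman's.

```c
#include <stdint.h>

/* x * a / 255, rounded to nearest, for 8-bit x and a. */
static uint8_t
mul_un8 (uint8_t x, uint8_t a)
{
    uint32_t t = (uint32_t) x * a + 0x80;

    return (uint8_t) ((t + (t >> 8)) >> 8);
}

/* Saturating 8-bit add. */
static uint8_t
add_un8 (uint8_t x, uint8_t y)
{
    uint32_t t = (uint32_t) x + y;

    return (uint8_t) (t > 255 ? 255 : t);
}

/* OVER_REVERSE for one premultiplied a8r8g8b8 pixel:
 *   dest = dest + src * (1 - dest_alpha)
 * applied to each of the four 8-bit channels. */
uint32_t
over_reverse_pixel (uint32_t src, uint32_t dest)
{
    uint8_t  inv_da = (uint8_t) (255 - (dest >> 24));
    uint32_t result = 0;
    int      shift;

    for (shift = 0; shift < 32; shift += 8)
    {
        uint8_t s = (uint8_t) (src >> shift);
        uint8_t d = (uint8_t) (dest >> shift);

        result |= (uint32_t) add_un8 (d, mul_un8 (s, inv_da)) << shift;
    }

    return result;
}
```

An opaque destination is left untouched and a fully transparent one takes the source value. With a solid source, the source operand is constant across the whole row, so a vectorized version only needs to load and unpack the destination.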

> ---
>  pixman/pixman-avx2.c | 431 ++-
>  1 file changed, 430 insertions(+), 1 deletion(-)
>
> diff --git a/pixman/pixman-avx2.c b/pixman/pixman-avx2.c
> index d860d67..faef552 100644
> --- a/pixman/pixman-avx2.c
> +++ b/pixman/pixman-avx2.c
> @@ -6,13 +6,439 @@
>  #include "pixman-private.h"
>  #include "pixman-combine32.h"
>  #include "pixman-inlines.h"
> +#include "pixman-sse2.h"
>
> +#define MASK_0080_AVX2 _mm256_set1_epi16(0x0080)
> +#define MASK_00FF_AVX2 _mm256_set1_epi16(0x00ff)
> +#define MASK_0101_AVX2 _mm256_set1_epi16(0x0101)
> +
> +static force_inline __m256i
> +load_256_aligned (__m256i* src)
> +{
> +return _mm256_load_si256(src);
> +}
> +
> +static force_inline void
> +negate_2x256 (__m256i  data_lo,
> + __m256i  data_hi,
> + __m256i* neg_lo,
> + __m256i* neg_hi)
> +{
> +*neg_lo = _mm256_xor_si256 (data_lo, MASK_00FF_AVX2);
> +*neg_hi = _mm256_xor_si256 (data_hi, MASK_00FF_AVX2);
> +}
> +
> +static force_inline __m256i
> +pack_2x256_256 (__m256i lo, __m256i hi)
> +{
> +return _mm256_packus_epi16 (lo, hi);
> +}
> +

Stray space

> +static force_inline void
> +pix_multiply_2x256 (__m256i* data_lo,
> +   __m256i* data_hi,
> +   __m256i* alpha_lo,
> +   __m256i* alpha_hi,
> +   __m256i* ret_lo,
> +   __m256i* ret_hi)
> +{
> +__m256i lo, hi;
> +
> +lo = _mm256_mullo_epi16 (*data_lo, *alpha_lo);
> +hi = _mm256_mullo_epi16 (*data_hi, *alpha_hi);
> +lo = _mm256_adds_epu16 (lo, MASK_0080_AVX2);
> +hi = _mm256_adds_epu16 (hi, MASK_0080_AVX2);
> +*ret_lo = _mm256_mulhi_epu16 (lo, MASK_0101_AVX2);
> +*ret_hi = _mm256_mulhi_epu16 (hi, MASK_0101_AVX2);
> +}
> +

Stray space
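
As an aside, the arithmetic in pix_multiply_2x256 quoted above is the usual divide-by-255 trick. A scalar model of one 16-bit lane, written under the assumption that each lane holds an 8-bit value zero-extended to 16 bits, together with an exhaustive check against x * a / 255 rounded to nearest (both function names are invented for this sketch):

```c
#include <stdint.h>

/* Scalar model of one 16-bit lane of pix_multiply_2x256. */
uint8_t
pix_multiply_lane (uint8_t x, uint8_t a)
{
    /* _mm256_mullo_epi16: 255 * 255 = 65025 still fits in 16 bits. */
    uint16_t t = (uint16_t) (x * a);

    /* _mm256_adds_epu16 with 0x0080: at most 65153, so it never saturates. */
    t = (uint16_t) (t + 0x0080);

    /* _mm256_mulhi_epu16 with 0x0101: high 16 bits of the 32-bit product. */
    return (uint8_t) (((uint32_t) t * 0x0101) >> 16);
}

/* Returns 1 if the lane model equals round (x * a / 255) for every pair
 * of 8-bit inputs, 0 otherwise. */
int
pix_multiply_matches_rounding (void)
{
    unsigned x, a;

    for (x = 0; x < 256; x++)
    {
        for (a = 0; a < 256; a++)
        {
            /* Round-to-nearest division by 255 in integer arithmetic. */
            unsigned expected = (2 * x * a + 255) / 510;

            if (pix_multiply_lane ((uint8_t) x, (uint8_t) a) != expected)
                return 0;
        }
    }

    return 1;
}
```

The multiply by 0x0101 followed by taking the high half is equivalent to (t + (t >> 8)) >> 8, which is the form commonly seen in scalar code.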

> +static force_inline void
> +over_2x256 (__m256i* src_lo,
> +   __m256i* src_hi,
> +   __m256i* alpha_lo,
> +   __m256i* alpha_hi,
> +   __m256i* dst_lo,
> +   __m256i* dst_hi)
> +{
> +__m256i t1, t2;
> +
> +negate_2x256 (*alpha_lo, *alpha_hi, &t1, &t2);
> +
> +pix_multiply_2x256 (dst_lo, dst_hi, &t1, &t2, dst_lo, dst_hi);
> +
> +*dst_lo = _mm256_adds_epu8 (*src_lo, *dst_lo);
> +*dst_hi = _mm256_adds_epu8 (*src_hi, *dst_hi);
> +}
> +
> +static force_inline void
> +expand_alpha_2x256 (__m256i  data_lo,
> +   __m256i  data_hi,
> +   __m256i* alpha_lo,
> +   __m256i* alpha_hi)
> +{
> +__m256i lo, hi;
> +
> +lo = _mm256_shufflelo_epi16 (data_lo, _MM_SHUFFLE (3, 3, 3, 3));
> +   

Re: [Pixman] [PATCH 2/3] Moving helper functions in pixman-sse2.c to pixman-sse2.h.

2019-01-21 Thread Matt Turner
On Wed, Jan 16, 2019 at 4:57 PM Raghuveer Devulapalli
 wrote:
>
> From: raghuveer devulapalli 
>
> These helper functions will be reused in pixman-avx2.c implementations in
> the future.
> ---
>  pixman/pixman-sse2.c | 504 +--
>  pixman/pixman-sse2.h | 502 ++
>  2 files changed, 503 insertions(+), 503 deletions(-)
>  create mode 100644 pixman/pixman-sse2.h
>
> diff --git a/pixman/pixman-sse2.c b/pixman/pixman-sse2.c
> index 8955103..8dea0c2 100644
> --- a/pixman/pixman-sse2.c
> +++ b/pixman/pixman-sse2.c
> @@ -32,509 +32,7 @@
>
>  /* PSHUFD is slow on a lot of old processors, and new processors have SSSE3 */
>  #define PSHUFD_IS_FAST 0
> -
> -#include <xmmintrin.h> /* for _mm_shuffle_pi16 and _MM_SHUFFLE */
> -#include <emmintrin.h> /* for SSE2 intrinsics */
> -#include "pixman-private.h"
> -#include "pixman-combine32.h"
> -#include "pixman-inlines.h"
> -
> -static __m128i mask_0080;
> -static __m128i mask_00ff;
> -static __m128i mask_0101;
> -static __m128i mask_;
> -static __m128i mask_ff00;
> -static __m128i mask_alpha;
> -
> -static __m128i mask_565_r;
> -static __m128i mask_565_g1, mask_565_g2;
> -static __m128i mask_565_b;
> -static __m128i mask_red;
> -static __m128i mask_green;
> -static __m128i mask_blue;
> -
> -static __m128i mask_565_fix_rb;
> -static __m128i mask_565_fix_g;
> -
> -static __m128i mask_565_rb;
> -static __m128i mask_565_pack_multiplier;
> -

These are moving to pixman-sse2.h to be used by the code below, which
is to be used by the AVX2 code. But they're initialized in
_pixman_implementation_create_sse2(), which means if you used
PIXMAN_DISABLE=sse2 the AVX2 paths would fail.

I suspect these constants do need to be prefixed with "sse2_", and in
_pixman_x86_get_implementations() you should disable avx2 if
PIXMAN_DISABLE=sse2.
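
Roughly what I have in mind for the gating, as a standalone sketch. It assumes the disable list is a space-separated string in the style of PIXMAN_DISABLE; `is_disabled` and `should_use_avx2` are hypothetical names for illustration, not existing pixman functions.

```c
#include <string.h>

/* Returns 1 if `name` appears as a whole space-separated word in `list`
 * (for example the value of a PIXMAN_DISABLE-style variable such as
 * "sse2 ssse3"). A NULL or empty list disables nothing. */
int
is_disabled (const char *list, const char *name)
{
    size_t      len = strlen (name);
    const char *p = list;

    while (p && *p)
    {
        const char *end = strchr (p, ' ');
        size_t      word_len = end ? (size_t) (end - p) : strlen (p);

        if (word_len == len && strncmp (p, name, len) == 0)
            return 1;

        p = end ? end + 1 : NULL;
    }

    return 0;
}

/* AVX2 is only usable when SSE2 is enabled too, because the AVX2 code in
 * this series relies on masks that the SSE2 implementation initializes. */
int
should_use_avx2 (const char *disable_list, int cpu_has_avx2)
{
    return cpu_has_avx2 &&
           !is_disabled (disable_list, "avx2") &&
           !is_disabled (disable_list, "sse2");
}
```

The whole-word comparison matters: disabling "ssse3" must not be mistaken for disabling "sse2".
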

> -static force_inline __m128i
> -unpack_32_1x128 (uint32_t data)
> -{
> -return _mm_unpacklo_epi8 (_mm_cvtsi32_si128 (data), _mm_setzero_si128 ());
> -}
> -
> -static force_inline void
> -unpack_128_2x128 (__m128i data, __m128i* data_lo, __m128i* data_hi)
> -{
> -*data_lo = _mm_unpacklo_epi8 (data, _mm_setzero_si128 ());
> -*data_hi = _mm_unpackhi_epi8 (data, _mm_setzero_si128 ());
> -}
> -
> -static force_inline __m128i
> -unpack_565_to_ (__m128i lo)
> -{
> -__m128i r, g, b, rb, t;
> -
> -r = _mm_and_si128 (_mm_slli_epi32 (lo, 8), mask_red);
> -g = _mm_and_si128 (_mm_slli_epi32 (lo, 5), mask_green);
> -b = _mm_and_si128 (_mm_slli_epi32 (lo, 3), mask_blue);
> -
> -rb = _mm_or_si128 (r, b);
> -t  = _mm_and_si128 (rb, mask_565_fix_rb);
> -t  = _mm_srli_epi32 (t, 5);
> -rb = _mm_or_si128 (rb, t);
> -
> -t  = _mm_and_si128 (g, mask_565_fix_g);
> -t  = _mm_srli_epi32 (t, 6);
> -g  = _mm_or_si128 (g, t);
> -
> -return _mm_or_si128 (rb, g);
> -}
> -
> -static force_inline void
> -unpack_565_128_4x128 (__m128i  data,
> -  __m128i* data0,
> -  __m128i* data1,
> -  __m128i* data2,
> -  __m128i* data3)
> -{
> -__m128i lo, hi;
> -
> -lo = _mm_unpacklo_epi16 (data, _mm_setzero_si128 ());
> -hi = _mm_unpackhi_epi16 (data, _mm_setzero_si128 ());
> -
> -lo = unpack_565_to_ (lo);
> -hi = unpack_565_to_ (hi);
> -
> -unpack_128_2x128 (lo, data0, data1);
> -unpack_128_2x128 (hi, data2, data3);
> -}
> -
> -static force_inline uint16_t
> -pack_565_32_16 (uint32_t pixel)
> -{
> -return (uint16_t) (((pixel >> 8) & 0xf800) |
> -  ((pixel >> 5) & 0x07e0) |
> -  ((pixel >> 3) & 0x001f));
> -}
> -
> -static force_inline __m128i
> -pack_2x128_128 (__m128i lo, __m128i hi)
> -{
> -return _mm_packus_epi16 (lo, hi);
> -}
> -
> -static force_inline __m128i
> -pack_565_2packedx128_128 (__m128i lo, __m128i hi)
> -{
> -__m128i rb0 = _mm_and_si128 (lo, mask_565_rb);
> -__m128i rb1 = _mm_and_si128 (hi, mask_565_rb);
> -
> -__m128i t0 = _mm_madd_epi16 (rb0, mask_565_pack_multiplier);
> -__m128i t1 = _mm_madd_epi16 (rb1, mask_565_pack_multiplier);
> -
> -__m128i g0 = _mm_and_si128 (lo, mask_green);
> -__m128i g1 = _mm_and_si128 (hi, mask_green);
> -
> -t0 = _mm_or_si128 (t0, g0);
> -t1 = _mm_or_si128 (t1, g1);
> -
> -/* Simulates _mm_packus_epi32 */
> -t0 = _mm_slli_epi32 (t0, 16 - 5);
> -t1 = _mm_slli_epi32 (t1, 16 - 5);
> -t0 = _mm_srai_epi32 (t0, 16);
> -t1 = _mm_srai_epi32 (t1, 16);
> -return _mm_packs_epi32 (t0, t1);
> -}
> -
> -static force_inline __m128i
> -pack_565_2x128_128 (__m128i lo, __m128i hi)
> -{
> -__m128i data;
> -__m128i r, g1, g2, b;
> -
> -data = pack_2x128_128 (lo, hi);
> -
> -r  = _mm_and_si128 (data, mask_565_r);
> -g1 = _mm_and_si128 (_mm_slli_epi32 (data, 3), mask_565_g1);
> -g2 = _mm_and_si128 (_mm_srli_epi32 (data, 5), mask_565_g2);
> -b  = _mm_and_si128 (_mm_srli_epi32 (data, 3), mask_565_b);
> -
>

Re: [Pixman] [PATCH 2/3] Moving helper functions in pixman-sse2.c to pixman-sse2.h.

2019-01-21 Thread Matt Turner
On Thu, Jan 17, 2019 at 12:27 AM Chris Wilson  wrote:
>
> Quoting Raghuveer Devulapalli (2019-01-17 00:59:59)
> > From: raghuveer devulapalli 
> >
> > These helper functions will be reused in pixman-avx2.c implementations in
> > the future.
>
> Are we ever going to run into a naming conflict in the future? Is it
> worth prefixing all the inlines with sse2_? Probably makes sense so that
> we can see the instruction set used when mixing later.
> -Chris

The SSE2 intrinsics will actually be compiled into VEX-prefixed (AVX)
instructions operating on xmm registers when -mavx2 is used.

I can't think of a reason the lack of a prefix would cause any
confusion for the ones that already have "128" in the name. For
unpack_565_to_, etc, maybe it would be best to add a _128 suffix.
We have functions (e.g., pack_565_2x128_128) that look like that
already.


Re: [Pixman] Pixman release?

2018-11-15 Thread Matt Turner
On Thu, Nov 15, 2018 at 1:32 AM Maarten Lankhorst
 wrote:
>
> Hey,
>
> To get the floating point support in hands of users, I want to make a pixman
> release. Since pixman appears to be in a bit of a limbo, is there anyone
> still being a maintainer, or should I simply make a release according to the
> guidelines at RELEASING?

That's fine with me. Just make sure you're in the cairo group on
annarchy or else the tarball upload will fail, IIRC.


Re: [Pixman] [PATCH] mmx: compile on MIPS for Loongson-3A MMI optimizations

2018-09-19 Thread Matt Turner
On Tue, Sep 18, 2018 at 2:34 AM  wrote:
>
> From: Xianju Diao 
>
> make check:
> When I enable USE_OPENMP, 'glyph-test' and 'cover-test' fail on
> Loongson-3A3000. Neither test passes even without the optimized
> code, so this may be a multi-core synchronization bug in the CPU.
> I will continue to debug this problem. For now, with an OpenMP
> critical section, 'glyph-test' and 'cover-test' pass.
>
> benchmark:
> Running cairo-perf-trace benchmark on Loongson-3A.
>                            image                 image16
> gvim                       5.425 -> 5.069        5.531 -> 5.236
> poppler-reseau             2.149 -> 2.13         2.152 -> 2.139
> swfdec-giant-steps-full   18.672 -> 8.215       33.167 -> 18.28
> swfdec-giant-steps         7.014 -> 2.455       12.48  -> 5.982
> xfce4-terminal-al         13.695 -> 5.241       15.703 -> 5.859
> gnome-system-monitor      12.783 -> 7.058       12.780 -> 7.104
> grads-heat-map             0.482 -> 0.486        0.516 -> 0.514
> firefox-talos-svg        141.138 -> 134.621    152.495 -> 159.069
> firefox-talos-gfx         23.119 -> 14.437      24.870 -> 15.161
> firefox-world-map         32.018 -> 27.139      33.817 -> 28.085
> firefox-periodic-table    12.305 -> 12.443      12.876 -> 12.913
> evolution                  7.071 -> 3.564        8.550 -> 3.784
> firefox-planet-gnome      77.926 -> 67.526      81.554 -> 65.840
> ocitysmap                  4.934 -> 1.702        4.937 -> 1.701
> ---

Thanks for the patch. I will review it when I have time (I'm preparing
for a trip at the moment).

I have a Loongson3 system that I have found to be unstable. I assume
it is due to the hardware bugs that must be worked around in gcc and
binutils. I have patched both of them with the patches I found in
https://github.com/loongson-community/binutils-gdb etc, but I still
have instability. I would appreciate it very much if you could offer
some suggestions or help in improving the stability of my system.

Looks like there are a couple of different things happening in this
patch. We should try to split them up. One patch could be making the
assembly memcpy implementation usable on mips64. A separate patch
would add new functions to pixman-mmx.c.

A few quick comments inline.

>  configure.ac|7 +-
>  pixman/Makefile.am  |4 +-
>  pixman/loongson-mmintrin.h  |   46 ++
>  pixman/pixman-combine32.h   |6 +
>  pixman/pixman-mips-dspr2-asm.h  |2 +-
>  pixman/pixman-mips-memcpy-asm.S |  324 +---
>  pixman/pixman-mmx.c | 1088 ++-
>  pixman/pixman-private.h |   32 +-
>  pixman/pixman-solid-fill.c  |   49 +-
>  pixman/pixman-utils.c   |   65 ++-
>  test/Makefile.am|2 +-
>  test/utils.c|8 +

This diff stat doesn't correspond to this patch.

>  12 files changed, 1418 insertions(+), 215 deletions(-)
>
> diff --git a/configure.ac b/configure.ac
> index e833e45..3e3dde5 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -154,9 +154,9 @@ AC_CHECK_DECL([__amd64], [AMD64_ABI="yes"], [AMD64_ABI="no"])
>  # has set CFLAGS.
>  if test $SUNCC = yes &&\
> test "x$test_CFLAGS" = "x" &&   \
> -   test "$CFLAGS" = "-g"
> +   test "$CFLAGS" = "-g -mabi=n64"
>  then
> -  CFLAGS="-O -g"
> +  CFLAGS="-O -g -mabi=n64"

This isn't acceptable.

>  fi
>
>  #
> @@ -183,6 +183,7 @@ AC_SUBST(LT_VERSION_INFO)
>  # Check for dependencies
>
>  PIXMAN_CHECK_CFLAG([-Wall])
> +PIXMAN_CHECK_CFLAG([-mabi=n64])
>  PIXMAN_CHECK_CFLAG([-Wdeclaration-after-statement])
>  PIXMAN_CHECK_CFLAG([-Wno-unused-local-typedefs])
>  PIXMAN_CHECK_CFLAG([-fno-strict-aliasing])
> @@ -273,7 +274,7 @@ dnl 
> ===
>  dnl Check for Loongson Multimedia Instructions
>
>  if test "x$LS_CFLAGS" = "x" ; then
> -LS_CFLAGS="-march=loongson2f"
> +LS_CFLAGS="-march=loongson3a"

Also not acceptable. I see that recent gcc and binutils have gotten
new options for enabling MMI separately from -march=loongson*. Maybe
we could use those if available.

I'm not sure there is currently a good solution. Let me think about it.

>  fi
>
>  have_loongson_mmi=no
> diff --git a/pixman/Makefile.am b/pixman/Makefile.am
> index 581b6f6..e3a080c 100644
> --- a/pixman/Makefile.am
> +++ b/pixman/Makefile.am
> @@ -122,7 +122,7 @@ libpixman_mips_dspr2_la_SOURCES = \
>  pixman-mips-dspr2.h \
>  pixman-mips-dspr2-asm.S \
>  pixman-mips-dspr2-asm.h \
> -pixman-mips-memcpy-asm.S
> +#pixman-mips-memcpy-asm.S

Can't do this.

>  libpixman_1_la_LIBADD += libpixman-mips-dspr2.la
>
>  ASM_CFLAGS_mips_dsp

Re: [Pixman] [PATCH] Adding AVX2 implementation of the OVER and REVERSE-OVER operator

2018-08-30 Thread Matt Turner
On Wed, Aug 29, 2018 at 12:09 PM Matt Turner  wrote:
> Trailing whitespace. There's a lot throughout this patch. I'm not
> going to point them out individually.

I just looked up how to configure git to alert you to bad whitespace:

git config core.whitespace indent-with-non-tab,space-before-tab,trailing-space

Give that a try.


Re: [Pixman] [PATCH] Adding AVX2 implementation of the OVER and REVERSE-OVER operator

2018-08-30 Thread Matt Turner
On Wed, Aug 29, 2018 at 12:09 PM Matt Turner  wrote:
>
> On Wed, Aug 22, 2018 at 10:03 AM raghuveer devulapalli
>  wrote:
> >
> > The AVX2 implementation of OVER and REVERSE OVER operator was
> > found to be up to 2.2 times faster (depending on the array size) than
> > the corresponding SSE2 version. The AVX2 and SSE2 were benchmarked
> > on an Intel(R) Core(TM) i5-6260U CPU @ 1.80GHz
> >
> > Moving the helper functions in pixman-sse2.c to pixman-sse2.h. The AVX2
> > implementation uses the SSE2 version for manipulating pixels that are not
> > 32 byte aligned and hence, it made sense to separate the SSE2 helper
> > functions into a separate file to be included in the AVX2 file rather
> > than duplicate code.
>
> Let's please move the helpers into pixman-sse2.h in a separate commit
> from the one that adds AVX2 code paths.
>
> We typically have more substantial benchmarks in the commit message.

I ran all of the cairo traces in the benchmarks directory and couldn't
measure any difference. You'll have to describe your benchmarking.


Re: [Pixman] [PATCH] Adding AVX2 implementation of the OVER and REVERSE-OVER operator

2018-08-29 Thread Matt Turner
On Wed, Aug 22, 2018 at 10:03 AM raghuveer devulapalli
 wrote:
>
> The AVX2 implementation of OVER and REVERSE OVER operator was
> found to be up to 2.2 times faster (depending on the array size) than
> the corresponding SSE2 version. The AVX2 and SSE2 were benchmarked
> on an Intel(R) Core(TM) i5-6260U CPU @ 1.80GHz
>
> Moving the helper functions in pixman-sse2.c to pixman-sse2.h. The AVX2
> implementation uses the SSE2 version for manipulating pixels that are not
> 32 byte aligned and hence, it made sense to separate the SSE2 helper
> functions into a separate file to be included in the AVX2 file rather
> than duplicate code.

Let's please move the helpers into pixman-sse2.h in a separate commit
from the one that adds AVX2 code paths.

We typically have more substantial benchmarks in the commit message.

Let me run some cairo traces and see what I come up with.

Also, what about the problems of AVX2 turbo?

https://mobile.twitter.com/rygorous/status/992170573819138048
https://gist.github.com/rygorous/32bc3ea8301dba09358fd2c64e02d774

It doesn't seem like we are doing anything related to it in these patches.

> ---
>  pixman/pixman-avx2.c | 401 
>  pixman/pixman-sse2.c | 504 +--
>  pixman/pixman-sse2.h | 502 ++
>  3 files changed, 904 insertions(+), 503 deletions(-)
>  create mode 100644 pixman/pixman-sse2.h
>
> diff --git a/pixman/pixman-avx2.c b/pixman/pixman-avx2.c
> index d860d67..60b1b2b 100644
> --- a/pixman/pixman-avx2.c
> +++ b/pixman/pixman-avx2.c
> @@ -6,6 +6,404 @@
>  #include "pixman-private.h"
>  #include "pixman-combine32.h"
>  #include "pixman-inlines.h"
> +#include "pixman-sse2.h"
> +
> +#define MASK_0080_AVX2 _mm256_set1_epi16(0x0080)
> +#define MASK_00FF_AVX2 _mm256_set1_epi16(0x00ff)
> +#define MASK_0101_AVX2 _mm256_set1_epi16(0x0101)
> +
> +static force_inline __m256i

Trailing whitespace. There's a lot throughout this patch. I'm not
going to point them out individually.

> +load_256_aligned (__m256i* src)
> +{
> +return _mm256_load_si256(src);
> +}
> +
> +static force_inline void
> +negate_2x256 (__m256i  data_lo,
> + __m256i  data_hi,
> + __m256i* neg_lo,
> + __m256i* neg_hi)
> +{
> +*neg_lo = _mm256_xor_si256 (data_lo, MASK_00FF_AVX2);
> +*neg_hi = _mm256_xor_si256 (data_hi, MASK_00FF_AVX2);
> +}
> +
> +static force_inline __m256i
> +pack_2x256_256 (__m256i lo, __m256i hi)
> +{
> +return _mm256_packus_epi16 (lo, hi);
> +}
> +
> +static force_inline void
> +pix_multiply_2x256 (__m256i* data_lo,
> +   __m256i* data_hi,
> +   __m256i* alpha_lo,
> +   __m256i* alpha_hi,
> +   __m256i* ret_lo,
> +   __m256i* ret_hi)
> +{
> +__m256i lo, hi;
> +
> +lo = _mm256_mullo_epi16 (*data_lo, *alpha_lo);
> +hi = _mm256_mullo_epi16 (*data_hi, *alpha_hi);
> +lo = _mm256_adds_epu16 (lo, MASK_0080_AVX2);
> +hi = _mm256_adds_epu16 (hi, MASK_0080_AVX2);
> +*ret_lo = _mm256_mulhi_epu16 (lo, MASK_0101_AVX2);
> +*ret_hi = _mm256_mulhi_epu16 (hi, MASK_0101_AVX2);
> +}
> +
> +static force_inline void
> +over_2x256 (__m256i* src_lo,
> +   __m256i* src_hi,
> +   __m256i* alpha_lo,
> +   __m256i* alpha_hi,
> +   __m256i* dst_lo,
> +   __m256i* dst_hi)
> +{
> +__m256i t1, t2;
> +
> +negate_2x256 (*alpha_lo, *alpha_hi, &t1, &t2);
> +
> +pix_multiply_2x256 (dst_lo, dst_hi, &t1, &t2, dst_lo, dst_hi);
> +
> +*dst_lo = _mm256_adds_epu8 (*src_lo, *dst_lo);
> +*dst_hi = _mm256_adds_epu8 (*src_hi, *dst_hi);
> +}
> +
> +static force_inline void
> +expand_alpha_2x256 (__m256i  data_lo,
> +   __m256i  data_hi,
> +   __m256i* alpha_lo,
> +   __m256i* alpha_hi)
> +{
> +__m256i lo, hi;
> +
> +lo = _mm256_shufflelo_epi16 (data_lo, _MM_SHUFFLE (3, 3, 3, 3));
> +hi = _mm256_shufflelo_epi16 (data_hi, _MM_SHUFFLE (3, 3, 3, 3));
> +
> +*alpha_lo = _mm256_shufflehi_epi16 (lo, _MM_SHUFFLE (3, 3, 3, 3));
> +*alpha_hi = _mm256_shufflehi_epi16 (hi, _MM_SHUFFLE (3, 3, 3, 3));
> +}
> +
> +static force_inline  void
> +unpack_256_2x256 (__m256i data, __m256i* data_lo, __m256i* data_hi)
> +{
> +*data_lo = _mm256_unpacklo_epi8 (data, _mm256_setzero_si256 ());
> +*data_hi = _mm256_unpackhi_epi8 (data, _mm256_setzero_si256 ());
> +}
> +
> +/* save 4 pixels on a 16-byte boundary aligned address */
> +static force_inline void
> +save_256_aligned (__m256i* dst,
> + __m256i  data)
> +{
> +_mm256_store_si256 (dst, data);
> +}
> +
> +static force_inline int
> +is_opaque_256 (__m256i x)
> +{
> +__m256i ffs = _mm256_cmpeq_epi8 (x, x);
> +
> +return (_mm256_movemask_epi8
> +   (_mm256_cmpeq_epi8 (x, ffs)) & 0x) == 0x;
> +}
> +
> +static force_inline int
> +is_zero_256 (__m256i x

Re: [Pixman] [PATCH] Adding infrastructure to permit future AVX2 implementations

2018-08-29 Thread Matt Turner
Thank you for the patches! Some comments inline.

On Wed, Aug 22, 2018 at 10:03 AM raghuveer devulapalli
 wrote:
>
> ---
>  configure.ac| 44 
>  pixman/Makefile.am  | 12 
>  pixman/pixman-avx2.c| 32 
>  pixman/pixman-private.h |  5 +
>  pixman/pixman-x86.c | 15 +--
>  5 files changed, 106 insertions(+), 2 deletions(-)
>  create mode 100644 pixman/pixman-avx2.c
>
> diff --git a/configure.ac b/configure.ac
> index e833e45..27f4305 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -503,6 +503,48 @@ fi
>  AM_CONDITIONAL(USE_SSSE3, test $have_ssse3_intrinsics = yes)
>
>  dnl 
> ===
> +dnl Check for AVX2

Trailing whitespace

> +
> +if test "x$AVX2_CFLAGS" = "x" ; then
> +AVX2_CFLAGS="-mavx2 -Winline"
> +fi
> +
> +have_avx2_intrinsics=no
> +AC_MSG_CHECKING(whether to use AVX2 intrinsics)
> +xserver_save_CFLAGS=$CFLAGS
> +CFLAGS="$AVX2_CFLAGS $CFLAGS"
> +
> +AC_COMPILE_IFELSE([AC_LANG_SOURCE([[
> +#include <immintrin.h>
> +int param;
> +int main () {
> +__m256i a = _mm256_set1_epi32 (param), b = _mm256_set1_epi32 (param + 1), c;
> +c = _mm256_maddubs_epi16 (a, b);
> +return _mm256_cvtsi256_si32(c);
> +}]])], have_avx2_intrinsics=yes)
> +CFLAGS=$xserver_save_CFLAGS
> +
> +AC_ARG_ENABLE(avx2,
> +   [AC_HELP_STRING([--disable-avx2],
> +   [disable AVX2 fast paths])],
> +   [enable_avx2=$enableval], [enable_avx2=auto])
> +
> +if test $enable_avx2 = no ; then
> +   have_avx2_intrinsics=disabled
> +fi
> +
> +if test $have_avx2_intrinsics = yes ; then
> +   AC_DEFINE(USE_AVX2, 1, [use AVX2 compiler intrinsics])
> +fi
> +
> +AC_MSG_RESULT($have_avx2_intrinsics)
> +if test $enable_avx2 = yes && test $have_avx2_intrinsics = no ; then
> +   AC_MSG_ERROR([AVX2 intrinsics not detected])
> +fi
> +
> +AM_CONDITIONAL(USE_AVX2, test $have_avx2_intrinsics = yes)
> +
> +dnl 
> ===
>  dnl Other special flags needed when building code using MMX or SSE 
> instructions
>  case $host_os in
> solaris*)
> @@ -538,6 +580,8 @@ AC_SUBST(MMX_LDFLAGS)
>  AC_SUBST(SSE2_CFLAGS)
>  AC_SUBST(SSE2_LDFLAGS)
>  AC_SUBST(SSSE3_CFLAGS)
> +AC_SUBST(AVX2_CFLAGS)
> +AC_SUBST(AVX2_LDFLAGS)
>
>  dnl 
> ===
>  dnl Check for VMX/Altivec
> diff --git a/pixman/Makefile.am b/pixman/Makefile.am
> index 581b6f6..7204621 100644
> --- a/pixman/Makefile.am
> +++ b/pixman/Makefile.am
> @@ -64,6 +64,18 @@ libpixman_1_la_LIBADD += libpixman-ssse3.la
>  ASM_CFLAGS_ssse3=$(SSSE3_CFLAGS)
>  endif
>
> +# avx2 code
> +if USE_AVX2
> +noinst_LTLIBRARIES += libpixman-avx2.la
> +libpixman_avx2_la_SOURCES = \
> +   pixman-avx2.c
> +libpixman_avx2_la_CFLAGS = $(AVX2_CFLAGS)
> +libpixman_1_la_LDFLAGS += $(AVX2_LDFLAGS)
> +libpixman_1_la_LIBADD += libpixman-avx2.la
> +
> +ASM_CFLAGS_avx2=$(AVX2_CFLAGS)
> +endif
> +
>  # arm simd code
>  if USE_ARM_SIMD
>  noinst_LTLIBRARIES += libpixman-arm-simd.la
> diff --git a/pixman/pixman-avx2.c b/pixman/pixman-avx2.c
> new file mode 100644
> index 000..d860d67
> --- /dev/null
> +++ b/pixman/pixman-avx2.c
> @@ -0,0 +1,32 @@
> +#ifdef HAVE_CONFIG_H
> +#include <config.h>
> +#endif
> +
> +#include <immintrin.h> /* for AVX2 intrinsics */
> +#include "pixman-private.h"
> +#include "pixman-combine32.h"
> +#include "pixman-inlines.h"
> +
> +static const pixman_fast_path_t avx2_fast_paths[] =
> +{
> +{ PIXMAN_OP_NONE },
> +};
> +
> +static const pixman_iter_info_t avx2_iters[] =

Trailing whitespace

> +{
> +{ PIXMAN_null },
> +};
> +
> +#if defined(__GNUC__) && !defined(__x86_64__) && !defined(__amd64__)
> +__attribute__((__force_align_arg_pointer__))
> +#endif
> +pixman_implementation_t *
> +_pixman_implementation_create_avx2 (pixman_implementation_t *fallback)
> +{
> +pixman_implementation_t *imp = _pixman_implementation_create (fallback, avx2_fast_paths);
> +
> +/* Set up function pointers */
> +imp->iter_info = avx2_iters;
> +
> +return imp;
> +}
> diff --git a/pixman/pixman-private.h b/pixman/pixman-private.h
> index 73a5414..b6b15df 100644
> --- a/pixman/pixman-private.h
> +++ b/pixman/pixman-private.h
> @@ -597,6 +597,11 @@ pixman_implementation_t *
>  _pixman_implementation_create_ssse3 (pixman_implementation_t *fallback);
>  #endif
>
> +#ifdef USE_AVX2
> +pixman_implementation_t *
> +_pixman_implementation_create_avx2 (pixman_implementation_t *fallback);
> +#endif
> +
>  #ifdef USE_ARM_SIMD
>  pixman_implementation_t *
>  _pixman_implementation_create_arm_simd (pixman_implementation_t *fallback);
> diff --git a/pixman/pixman-x86.c b/pixman/pixman-x86.c
> index 05297c4..687c83b 100644
> --- a/pixman/pixman-x86.c
> +++ b/pixman/pixman-x86.c

At the top of this file there is a preprocessor check:

#if defined(USE_X86_MMX) || defined (USE_SSE2) || defined (USE_SSSE3)
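
For readers following the patch: the new pixman-avx2.c registers a fast
path table that contains only its terminating sentinel, so every
operation falls through to the fallback implementation. A minimal sketch
of that delegation pattern, not pixman's actual lookup code; impl_t,
fast_path_t, lookup and the string operation names are all hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*composite_fn_t) (int operand);

typedef struct
{
    const char     *op;     /* NULL terminates the table (the sentinel) */
    composite_fn_t  func;
} fast_path_t;

typedef struct impl impl_t;
struct impl
{
    const impl_t      *fallback;    /* next, more generic implementation */
    const fast_path_t *fast_paths;
};

/* Walk the chain from the most specific implementation to the most
 * generic one and return the first matching fast path, or NULL. */
static composite_fn_t
lookup (const impl_t *imp, const char *op)
{
    assert (op != NULL);

    for (; imp != NULL; imp = imp->fallback)
    {
        const fast_path_t *p;

        for (p = imp->fast_paths; p->op != NULL; p++)
        {
            if (strcmp (p->op, op) == 0)
                return p->func;
        }
    }

    return NULL;
}

static int
generic_over (int x)
{
    return x + 1;
}

static const fast_path_t generic_paths[] =
{
    { "over", generic_over },
    { NULL, NULL },
};

/* Sentinel only, like the empty table in the patch. */
static const fast_path_t avx2_like_paths[] =
{
    { NULL, NULL },
};

static const impl_t generic_impl   = { NULL, generic_paths };
static const impl_t avx2_like_impl = { &generic_impl, avx2_like_paths };
```

Looking up "over" starting at avx2_like_impl skips the empty table and
returns generic_over from the fallback, which is why an infrastructure-only
patch of this shape should not change behaviour.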

Re: [Pixman] [Patch 1/1] Clang compile failure due to use of __builtin_shuffle

2018-08-14 Thread Matt Turner
On Tue, Aug 7, 2018 at 2:50 AM StormByte  wrote:
>
> While playing with Clang and compiling a Gentoo system with it, I realized 
> that pixman is not compiling because of the use of __builtin_shuffle which 
> according to LLVM mailing list, should not be used directly [1].
>
> As such, I investigated a bit, and made a patch for making it compile 
> compatible with Clang that I attach here in the hope that it is reviewed.
> Thanks,
> David C. Manuelda
> [1]: http://lists.llvm.org/pipermail/cfe-dev/2017-August/055142.html

Thanks. This has already been reported as
https://bugs.gentoo.org/646360 and I committed a patch two months ago
to fix it -- see
https://gitlab.freedesktop.org/pixman/pixman/commit/bd2b49185b28c5024597a5e530af9fc25de3193a

The next version of pixman will include the patch.
___
Pixman mailing list
Pixman@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] Pushing unreviewed patches to the pixman git repository

2018-06-06 Thread Matt Turner
On Tue, Jun 5, 2018 at 6:06 PM, Siarhei Siamashka
 wrote:
> Hello,
>
> I noticed that some people with commit access started pushing patches
> to the pixman git repository without giving the pixman mailing list
> subscribers any reasonable chance to review them:
>
> https://cgit.freedesktop.org/pixman/commit/?id=8b95e0e460baa499e54c19d29bf761d34c25badc
> https://cgit.freedesktop.org/pixman/commit/?id=bd2b49185b28c5024597a5e530af9fc25de3193a
>
> Yes, these fixes were trivial. But still it would be more polite to
> actually post patches to the mailing list, collect some reviews and
> then *wait* at least several days before pushing them to the repository
> (unless the issue is really urgent). Not everyone constantly monitors
> the mailing list and is able to provide an instant response.

I hope you don't consider those two patches to be similar cases.

One was committed without going to the mailing list by someone with
one patch in pixman every 5 years.

The other was sent to the mailing list by a person with plenty of
pixman contributions and reviewed by two people. In Mesa we wait 24
hours, for the reasons you describe. Looks like it was close to 24
hours in this case.

I'm happy to wait more than 24 hours in the future -- that's no
problem. I'm just taking issue with the suggestion that the two cited
examples are somehow the same.


Re: [Pixman] [PATCH] test: Adjust for clang's removal of __builtin_shuffle

2018-06-04 Thread Matt Turner
On Mon, Jun 4, 2018 at 10:37 AM, Adam Jackson  wrote:
> On Mon, 2018-06-04 at 10:04 -0700, Matt Turner wrote:
>
>> #ifdef HAVE_GCC_VECTOR_EXTENSIONS
>> -const uint8x16 bswap_shufflemask =
>> +# if __has_builtin(__builtin_shufflevector)
>> +randdata.vb =
>> +__builtin_shufflevector (randdata.vb, randdata.vb,
>> +  3,  2,  1,  0,  7,  6 , 5,  4,
>> + 11, 10,  9,  8, 15, 14, 13, 12);
>> +# else
>> +static const uint8x16 bswap_shufflemask =
>   ^^^
>
> Seems superfluous, though I guess it doesn't change semantics. With or
> without that bit:

Oh, I think I added that when I was trying to consolidate the
constants between the two paths. I'll remove that.

> Reviewed-by: Adam Jackson 
>
> I think we're starting to be well overdue for an 0.36 release, but I'd
> like to take the opportunity to suggest moving to fdo's gitlab as we do
> that. I already have a copy imported personally and have CI working:
>
> https://gitlab.freedesktop.org/ajax/pixman/-/jobs/986

Agreed.

I would like to make 0.36 pass the test suite with clang, so if you
have any time or interest I'd appreciate a second set of eyes. I've
filed https://bugs.freedesktop.org/show_bug.cgi?id=106818 so we can
track it.

I guess it's possible it's a clang bug.

I also need to take some time to look into the Loongson3 patch. If
you're not in a particular hurry, it would be nice to get that in.


[Pixman] [PATCH] test: Adjust for clang's removal of __builtin_shuffle

2018-06-04 Thread Matt Turner
From: Vladimir Smirnov 

__builtin_shuffle was removed in clang 5.0.

Build log says:
test/utils-prng.c:207:27: error: use of unknown builtin '__builtin_shuffle' 
[-Wimplicit-function-declaration]
randdata.vb = __builtin_shuffle (randdata.vb, bswap_shufflemask);
  ^
test/utils-prng.c:207:25: error: assigning to 'uint8x16' (vector of 16 
'uint8_t' values) from incompatible type 'int'
randdata.vb = __builtin_shuffle (randdata.vb, bswap_shufflemask);
^ ~~
2 errors generated

Link to original discussion:
http://lists.llvm.org/pipermail/cfe-dev/2017-August/055140.html

It's possible to build pixman if the attached patch is applied.
Basically, the patch adds a check for __builtin_shuffle support and, if
there is none, falls back to the clang-specific __builtin_shufflevector,
which does the same thing but has a different API.

Bugzilla: https://bugs.gentoo.org/646360
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104886
Tested-by: Philip Chimento 
Reviewed-by: Matt Turner 
---
I turned https://bugs.freedesktop.org/show_bug.cgi?id=104886#c2 into a
Tested-by tag for Philip.

I also reversed the order of the preprocessor conditions in order to
simplify it a bit (the !defined(__clang__) looked like a problem waiting
to happen).

Unfortunately combiner-test, gradient-crash-test, and stress-test fail
when built with clang for unrelated reasons.

 test/utils-prng.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/test/utils-prng.c b/test/utils-prng.c
index c27b5be..0cf53dd 100644
--- a/test/utils-prng.c
+++ b/test/utils-prng.c
@@ -199,12 +199,25 @@ randmemset_internal (prng_t  *prng,
 }
 else
 {
+
+#ifndef __has_builtin
+#define __has_builtin(x) 0
+#endif
+
 #ifdef HAVE_GCC_VECTOR_EXTENSIONS
-const uint8x16 bswap_shufflemask =
+# if __has_builtin(__builtin_shufflevector)
+randdata.vb =
+__builtin_shufflevector (randdata.vb, randdata.vb,
+  3,  2,  1,  0,  7,  6 , 5,  4,
+ 11, 10,  9,  8, 15, 14, 13, 12);
+# else
+static const uint8x16 bswap_shufflemask =
 {
 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12
 };
 randdata.vb = __builtin_shuffle (randdata.vb, bswap_shufflemask);
+# endif
+
 store_rand_128_data (buf, &randdata, aligned);
 buf += 16;
 #else
-- 
2.16.1
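
For reference, the two builtins can be compared side by side outside of
pixman. This is a minimal sketch, assuming a compiler with GCC-style
vector extensions; bswap32_lanes and bswap32_x4 are hypothetical helper
names, and only the index order is taken from the patch above:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Compilers without __has_builtin simply take the #else branch. */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

typedef uint8_t uint8x16 __attribute__ ((vector_size (16)));

/* Reverse the byte order inside each of the four 32-bit lanes. */
static uint8x16
bswap32_lanes (uint8x16 v)
{
#if __has_builtin(__builtin_shufflevector)
    /* Two input vectors, indices passed as constant arguments. */
    return __builtin_shufflevector (v, v,
                                     3,  2,  1,  0,  7,  6,  5,  4,
                                    11, 10,  9,  8, 15, 14, 13, 12);
#else
    /* One input vector, indices passed as a mask vector. */
    const uint8x16 mask =
    {
        3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12
    };
    return __builtin_shuffle (v, mask);
#endif
}

/* Byte-array wrapper so callers do not need the vector type. */
static void
bswap32_x4 (const uint8_t in[16], uint8_t out[16])
{
    uint8x16 v;

    assert (in != NULL && out != NULL);
    memcpy (&v, in, sizeof v);
    v = bswap32_lanes (v);
    memcpy (out, &v, sizeof v);
}
```

The API difference is visible in the two branches: __builtin_shufflevector
takes the indices as sixteen integer constants, while __builtin_shuffle
takes them as a second vector.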



Re: [Pixman] [PATCH] vmx: Fix vector loads on ppc64le

2018-05-10 Thread Matt Turner
Tested-by: Matt Turner 


Re: [Pixman] Pixman not building on MacOS X 10.11

2015-11-18 Thread Matt Turner
On Wed, Nov 18, 2015 at 8:35 PM, Siarhei Siamashka
 wrote:
> On Wed, 18 Nov 2015 14:22:09 -0800
> Matt Turner  wrote:
>
>> On Sun, Oct 11, 2015 at 10:34 AM, Andrea Canciani  wrote:
>> > On Sun, Oct 11, 2015 at 5:30 AM, Siarhei Siamashka
>> >  wrote:
>> >>
>> >> On Sun, 11 Oct 2015 04:53:08 +0300
>> >> Siarhei Siamashka  wrote:
>> >>
>> >> > On Sat, 10 Oct 2015 16:03:53 -0700
>> >> > Jeremy Huddleston Sequoia  wrote:
>> >> >
>> >> > > > On Oct 10, 2015, at 13:48, Andrea Canciani 
>> >> > > > wrote:
>> >> > > > The attached hack gets the code to compile on modern clang, but I
>> >> > > > believe first of all we should improve the configure.ac detection
>> >> > > > code
>> >> > > > so that pixman can actually build both on old and on new clang
>> >> > > > versions (possibly with mmx disabled, if the asm constraints we need
>> >> > > > are not implemented).
>> >> >
>> >> > This workaround looks reasonable to me. We should probably just drop
>> >> > the whole "ifdef __OPTIMIZE__" part in
>> >> >
>> >> > http://cgit.freedesktop.org/pixman/tree/pixman/pixman-mmx.c?id=pixman-0.32.8#n92
>> >> >
>> >> > I don't quite like the fact that this way of returning results from
>> >> > a macro is a GNU C specific extension. But as you said, the configure
>> >> > test can be updated to better match the code and also check if the
>> >> > compiler supports this particular construct.
>> >> >
>> >> > Could you please submit the final variant of your patch in a
>> >> > "git format-patch" format with a commit message and your
>> >> > Signed-off-by tag?
>> >>
>> >> After looking at this issue a bit more, I realized that we are
>> >> about to add a second layer of workarounds on top of the existing
>> >> old workarounds :-)
>> >
>> >
>> > The attached patch should fix the issue with only minor changes.
>> > It keeps the workarounds :( but somewhat it simplifies them :)
>> > I followed your suggestion of checking&using block expressions.
>> > Given that the _mm_shuffle_pi16() function is always used in a "return"
>> > statement, if needed we could avoid the usage of block expressions by
>> > defining a macro "_return_mm_shuffle_pi16()" (which would return the result
>> > of the operation instead of making it available as an expression) both for
>> > the xmmintrin branch and for the hand-coded one.
>> >
>> >> The original problem is that certain compilers (just GCC?) did not
>> >> support some intrinsics when compiling MMX code (_mm_movemask_pi8,
>> >> _mm_mulhi_pu16, _mm_shuffle_pi16) and we got the following code:
>> >>
>> >> http://cgit.freedesktop.org/pixman/tree/pixman/pixman-mmx.c?id=pixman-0.32.8#n66
>> >>
>> >> In fact, these instructions were not available as part of the original
>> >> MMX, but only got introduced later with AMD Extended 3DNow! and Intel
>> >> SSE1. This is mentioned in the commit messages:
>> >>
>> >> http://cgit.freedesktop.org/pixman/commit/?id=84221f4c1687b8ea14e9cbdc78b2ba7258e62c9e
>> >>
>> >> http://cgit.freedesktop.org/pixman/commit/?id=14208344964f341a7b4a704b05cf4804c23792e9
>> >>
>> >> These extra instructions are unofficially known as MMX2. But GCC does
>> >> not have a separate option for "-mmmx2". Instead the GCC manual says
>> >> that these intrinsics are available when either "-msse" or a
>> >> combination of "-m3dnow -march=athlon" is used:
>> >>
>> >> https://gcc.gnu.org/onlinedocs/gcc-5.2.0/gcc/x86-Built-in-Functions.html#x86-Built-in-Functions
>> >>
>> >>
>> >> Now I wonder if the comment "We have to compile with -msse to use
>> >> xmmintrin.h" is still valid. I tried to tweak the following ifdef to
>> >> use the part of code, which includes <xmmintrin.h> and then it compiled
>> >> fine for me with CFLAGS="-O2 -m32" using recent versions of GCC and
>> >> Clang:
>> >>
>> >> http://cgit.freedesktop.org/pixman/tree/pixman/pixman-mmx.c?id=pixman-0.32.8#n63
>> >>
>> >> I believe that this might be someho

Re: [Pixman] Pixman not building on MacOS X 10.11

2015-11-18 Thread Matt Turner
On Sun, Oct 11, 2015 at 10:34 AM, Andrea Canciani  wrote:
> On Sun, Oct 11, 2015 at 5:30 AM, Siarhei Siamashka
>  wrote:
>>
>> On Sun, 11 Oct 2015 04:53:08 +0300
>> Siarhei Siamashka  wrote:
>>
>> > On Sat, 10 Oct 2015 16:03:53 -0700
>> > Jeremy Huddleston Sequoia  wrote:
>> >
>> > > > On Oct 10, 2015, at 13:48, Andrea Canciani 
>> > > > wrote:
>> > > > The attached hack gets the code to compile on modern clang, but I
>> > > > believe first of all we should improve the configure.ac detection
>> > > > code
>> > > > so that pixman can actually build both on old and on new clang
>> > > > versions (possibly with mmx disabled, if the asm constraints we need
>> > > > are not implemented).
>> >
>> > This workaround looks reasonable to me. We should probably just drop
>> > the whole "ifdef __OPTIMIZE__" part in
>> >
>> > http://cgit.freedesktop.org/pixman/tree/pixman/pixman-mmx.c?id=pixman-0.32.8#n92
>> >
>> > I don't quite like the fact that this way of returning results from
>> > a macro is a GNU C specific extension. But as you said, the configure
>> > test can be updated to better match the code and also check if the
>> > compiler supports this particular construct.
>> >
>> > Could you please submit the final variant of your patch in a
>> > "git format-patch" format with a commit message and your
>> > Signed-off-by tag?
>>
>> After looking at this issue a bit more, I realized that we are
>> about to add a second layer of workarounds on top of the existing
>> old workarounds :-)
>
>
> The attached patch should fix the issue with only minor changes.
> It keeps the workarounds :( but somewhat it simplifies them :)
> I followed your suggestion of checking&using block expressions.
> Given that the _mm_shuffle_pi16() function is always used in a "return"
> statement, if needed we could avoid the usage of block expressions by
> defining a macro "_return_mm_shuffle_pi16()" (which would return the result
> of the operation instead of making it available as an expression) both for
> the xmmintrin branch and for the hand-coded one.
>
>> The original problem is that certain compilers (just GCC?) did not
>> support some intrinsics when compiling MMX code (_mm_movemask_pi8,
>> _mm_mulhi_pu16, _mm_shuffle_pi16) and we got the following code:
>>
>> http://cgit.freedesktop.org/pixman/tree/pixman/pixman-mmx.c?id=pixman-0.32.8#n66
>>
>> In fact, these instructions were not available as part of the original
>> MMX, but only got introduced later with AMD Extended 3DNow! and Intel
>> SSE1. This is mentioned in the commit messages:
>>
>> http://cgit.freedesktop.org/pixman/commit/?id=84221f4c1687b8ea14e9cbdc78b2ba7258e62c9e
>>
>> http://cgit.freedesktop.org/pixman/commit/?id=14208344964f341a7b4a704b05cf4804c23792e9
>>
>> These extra instructions are unofficially known as MMX2. But GCC does
>> not have a separate option for "-mmmx2". Instead the GCC manual says
>> that these intrinsics are available when either "-msse" or a
>> combination of "-m3dnow -march=athlon" is used:
>>
>> https://gcc.gnu.org/onlinedocs/gcc-5.2.0/gcc/x86-Built-in-Functions.html#x86-Built-in-Functions
>>
>>
>> Now I wonder if the comment "We have to compile with -msse to use
>> xmmintrin.h" is still valid. I tried to tweak the following ifdef to
>> use the part of code, which includes <xmmintrin.h> and then it compiled
>> fine for me with CFLAGS="-O2 -m32" using recent versions of GCC and
>> Clang:
>>
>> http://cgit.freedesktop.org/pixman/tree/pixman/pixman-mmx.c?id=pixman-0.32.8#n63
>>
>> I believe that this might be somehow related to the new __ALL_ISA__
>> define, which had been mentioned in 2013:
>> https://gcc.gnu.org/ml/gcc-patches/2013-04/txts5M0c0uU9y.txt
>>
>> So what about just dropping this ugly stuff and adding a configure
>> check, which would verify if the MMX code can include <xmmintrin.h>?
>
>
> I would love getting rid of the workarounds, but I'm somewhat worried about
> the possibility of regressions.
> If you believe it is a valid option, we might definitely try to pursue it.
>
> What is the best way forward?

I've now reverted my commit and pushed yours.

Thanks.
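
For context, the two macro styles discussed in the quoted text can be
sketched as follows. This is only an illustration: shuffle16_emul is a
hypothetical plain-C stand-in for pshufw, and the macro names are made
up. The first macro relies on the GNU statement-expression extension,
the second is the "_return_" variant and is plain C:

```c
#include <assert.h>
#include <stdint.h>

/* Pick four 16-bit lanes of `v`; destination lane i reads the source
 * lane named by bits [2i+1:2i] of `sel`. */
static inline uint64_t
shuffle16_emul (uint64_t v, unsigned sel)
{
    uint64_t r = 0;
    int i;

    assert (sel <= 0xff);
    for (i = 0; i < 4; i++)
    {
        unsigned src = (sel >> (2 * i)) & 3;

        r |= ((v >> (16 * src)) & 0xffff) << (16 * i);
    }

    return r;
}

/* GNU C only: the ({ ... }) block yields the value of its last
 * expression, so the macro can be used like a function call. */
#define SHUFFLE16_EXPR(A, N)                        \
    ({                                              \
        uint64_t ret_ = shuffle16_emul ((A), (N));  \
        ret_;                                       \
    })

/* Plain C: the macro performs the return itself, so it is only usable
 * as the last statement of a function. */
#define RETURN_SHUFFLE16(A, N)                      \
    do { return shuffle16_emul ((A), (N)); } while (0)

static uint64_t
expand_alpha_expr (uint64_t pixel)
{
    return SHUFFLE16_EXPR (pixel, 0xff);
}

static uint64_t
expand_alpha_return (uint64_t pixel)
{
    RETURN_SHUFFLE16 (pixel, 0xff);
}
```

Both functions broadcast the top 16-bit lane to all four lanes. The
trade-off is the one raised in the thread: the expression form is more
flexible but compiler-specific, while the return form is portable but
only fits call sites that immediately return the result.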


Re: [Pixman] [PATCH] mmx: Use MMX2 intrinsics from xmmintrin.h directly.

2015-11-18 Thread Matt Turner
On Sun, Oct 25, 2015 at 1:13 PM, Matt Turner  wrote:
> On Sun, Oct 11, 2015 at 8:59 PM, Matt Turner  wrote:
>> We had lots of hacks to handle the inability to include xmmintrin.h
>> without compiling with -msse (lest SSE instructions be used in
>> pixman-mmx.c). Some recent version of gcc relaxed this restriction.
>>
>> Change configure.ac to test that xmmintrin.h can be included and that we
>> can use some intrinsics from it, and remove the work-around code from
>> pixman-mmx.c.
>>
>> Evidently allows gcc 4.9.3 to optimize better as well:
>>
>>    text    data     bss     dec     hex filename
>>  657078   30848     680  688606   a81de libpixman-1.so.0.33.3 before
>>  656710   30848     680  688238   a806e libpixman-1.so.0.33.3 after
>>
>> Signed-off-by: Matt Turner 
>> ---
>
> Ugh. This is apparently not sufficient...
>
> https://bugs.gentoo.org/show_bug.cgi?id=564024
>
> GCC allows you to *include* xmmintrin.h without enabling SSE, but it
> still doesn't allow you to use any of the functions:
>
> conftest.c: In function ‘main’:
> /usr/lib/gcc/x86_64-pc-linux-gnu/5.1.0/include/xmmintrin.h:1124:1:
> error: inlining failed in call to always_inline ‘_mm_mulhi_pu16’:
> target specific option mismatch
>  _mm_mulhi_pu16 (__m64 __A, __m64 __B)
>  ^
> conftest.c:12:7: error: called from here
>  w = _mm_mulhi_pu16(w, w);
>
> I'm not sure what to do except to revert.
>
> The MMX but no SSE case is important, at least it was in the past
> because of OLPC's XO-1.
>
> Suggestions besides reverting this?

I've now reverted this commit and committed Andrea's fix for clang.


Re: [Pixman] [PATCH] mmx: Use MMX2 intrinsics from xmmintrin.h directly.

2015-11-03 Thread Matt Turner
On Sun, Oct 25, 2015 at 7:12 PM, Søren Sandmann
 wrote:
> On Sun, Oct 25, 2015 at 8:10 PM, Siarhei Siamashka
>  wrote:
>
>>
>> Or we could simply do nothing and finally retire MMX support on x86.
>> If OLPC XO-1 users still do exist, they can always contact us.
>
>
> This is probably the way forward. Except for XO-1, MMX hasn't really done
> anything useful on
> x86 for a long time, but it has been an endless source of compiler headaches
> and maintenance
> issues.

I agree that it has caused a huge number of compiler headaches. I
suppose I'd be okay with disabling it by default, but like I said to
Siarhei I would like to keep it working on x86 because that's a much
easier way to test and prototype code than using slow iwMMXt/loongson
systems. Though, I do fear that if we disable it by default it'll just
get close to zero testing.
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] mmx: Use MMX2 intrinsics from xmmintrin.h directly.

2015-11-03 Thread Matt Turner
On Sun, Oct 25, 2015 at 5:10 PM, Siarhei Siamashka
 wrote:
> On Sun, 25 Oct 2015 13:13:09 -0700
> Matt Turner  wrote:
>
>> On Sun, Oct 11, 2015 at 8:59 PM, Matt Turner  wrote:
>> > We had lots of hacks to handle the inability to include xmmintrin.h
>> > without compiling with -msse (lest SSE instructions be used in
>> > pixman-mmx.c). Some recent version of gcc relaxed this restriction.
>> >
>> > Change configure.ac to test that xmmintrin.h can be included and that we
>> > can use some intrinsics from it, and remove the work-around code from
>> > pixman-mmx.c.
>> >
>> > Evidently allows gcc 4.9.3 to optimize better as well:
>> >
>> >    text    data     bss     dec     hex filename
>> >  657078   30848     680  688606   a81de libpixman-1.so.0.33.3 before
>> >  656710   30848     680  688238   a806e libpixman-1.so.0.33.3 after
>> >
>> > Signed-off-by: Matt Turner 
>> > ---
>>
>> Ugh. This is apparently not sufficient...
>>
>> https://bugs.gentoo.org/show_bug.cgi?id=564024
>>
>> GCC allows you to *include* xmmintrin.h without enabling SSE, but it
>> still doesn't allow you to use any of the functions:
>>
>> conftest.c: In function ‘main’:
>> /usr/lib/gcc/x86_64-pc-linux-gnu/5.1.0/include/xmmintrin.h:1124:1:
>> error: inlining failed in call to always_inline ‘_mm_mulhi_pu16’:
>> target specific option mismatch
>>  _mm_mulhi_pu16 (__m64 __A, __m64 __B)
>>  ^
>> conftest.c:12:7: error: called from here
>>  w = _mm_mulhi_pu16(w, w);
>
> Oh, looks like the restriction used to be relaxed for a while, but then
> GCC 4.9 started to be strict again:
> https://bugzilla.redhat.com/show_bug.cgi?id=1092991#c1
>
>> I'm not sure what to do except to revert.
>
> The real problem is that GCC does not provide a separate option for
> MMX2 (a common subset of 3DNOW and SSE). We usually solve compiler
> problems by reporting bugs to compiler developers. This particular
> case had not been handled according to the usual rule, and now
> we have a nice practical demonstration of the consequences ;-)
>
> BTW, we can still report a bug to GCC. Better late than never.

Yeah, I suppose. The disappointing thing is that Google says an
-m3dnowext flag existed at one point...

>> The MMX but no SSE case is important, at least it was in the past
>> because of OLPC's XO-1.
>
> I'm not sure how many OLPC XO-1 laptops might be still remaining in
> real use in the hands of real people:
> http://www.olpcnews.com/about_olpc_news/goodbye_one_laptop_per_child.html
>
>> Suggestions besides reverting this?
>
> Because OLPC XO-1 is using the AMD Geode processor, we could probably
> treat the code in pixman-mmx.c as 3dnow optimizations on x86 hardware?

The problem is that -m3dnow isn't sufficient. The instructions we want
to use are a subset of SSE that AMD implemented in the Athlon. We need
an -m3dnowext flag.

We can't pass -march=athlon in MMX_CFLAGS either, since the user is
likely to have specified a -march= value of their own.

> Another option is to start using assembly instead of intrinsics.
> Unless a miracle happens and somebody decides to pay for this job,
> we definitely don't have resources to do a high quality assembly
> implementation for MMX/MMX2. But we still can take the assembly
> output of GCC and tweak it a bit. This is ugly and not very
> maintainable though. Been there, done that with ARMv6.

Not interested.

> Or we could simply do nothing and finally retire MMX support on x86.
> If OLPC XO-1 users still do exist, they can always contact us.

I don't care so much about XO-1, but I do want to retain the ability
to test the MMX code on x86. iwMMXt/loongson systems are slow, and
most development can be done on a fast desktop this way.


Re: [Pixman] [PATCH] mmx: Use MMX2 intrinsics from xmmintrin.h directly.

2015-10-25 Thread Matt Turner
On Sun, Oct 11, 2015 at 8:59 PM, Matt Turner  wrote:
> We had lots of hacks to handle the inability to include xmmintrin.h
> without compiling with -msse (lest SSE instructions be used in
> pixman-mmx.c). Some recent version of gcc relaxed this restriction.
>
> Change configure.ac to test that xmmintrin.h can be included and that we
> can use some intrinsics from it, and remove the work-around code from
> pixman-mmx.c.
>
> Evidently allows gcc 4.9.3 to optimize better as well:
>
>    text    data     bss     dec     hex filename
>  657078   30848     680  688606   a81de libpixman-1.so.0.33.3 before
>  656710   30848     680  688238   a806e libpixman-1.so.0.33.3 after
>
> Signed-off-by: Matt Turner 
> ---

Ugh. This is apparently not sufficient...

https://bugs.gentoo.org/show_bug.cgi?id=564024

GCC allows you to *include* xmmintrin.h without enabling SSE, but it
still doesn't allow you to use any of the functions:

conftest.c: In function ‘main’:
/usr/lib/gcc/x86_64-pc-linux-gnu/5.1.0/include/xmmintrin.h:1124:1:
error: inlining failed in call to always_inline ‘_mm_mulhi_pu16’:
target specific option mismatch
 _mm_mulhi_pu16 (__m64 __A, __m64 __B)
 ^
conftest.c:12:7: error: called from here
 w = _mm_mulhi_pu16(w, w);

I'm not sure what to do except to revert.

The MMX but no SSE case is important, at least it was in the past
because of OLPC's XO-1.

Suggestions besides reverting this?


[Pixman] [PATCH] mmx: Use MMX2 intrinsics from xmmintrin.h directly.

2015-10-11 Thread Matt Turner
We had lots of hacks to handle the inability to include xmmintrin.h
without compiling with -msse (lest SSE instructions be used in
pixman-mmx.c). Some recent version of gcc relaxed this restriction.

Change configure.ac to test that xmmintrin.h can be included and that we
can use some intrinsics from it, and remove the work-around code from
pixman-mmx.c.

Evidently allows gcc 4.9.3 to optimize better as well:

   text    data     bss     dec     hex filename
 657078   30848     680  688606   a81de libpixman-1.so.0.33.3 before
 656710   30848     680  688238   a806e libpixman-1.so.0.33.3 after

Signed-off-by: Matt Turner 
---
Looks like _MM_SHUFFLE isn't defined by ARM's mmintrin.h.

 configure.ac| 15 -
 pixman/pixman-mmx.c | 64 -
 2 files changed, 8 insertions(+), 71 deletions(-)

diff --git a/configure.ac b/configure.ac
index 424bfd3..b04cc69 100644
--- a/configure.ac
+++ b/configure.ac
@@ -347,21 +347,14 @@ AC_COMPILE_IFELSE([AC_LANG_SOURCE([[
 #error "Need GCC >= 3.4 for MMX intrinsics"
 #endif
 #include <mmintrin.h>
+#include <xmmintrin.h>
 int main () {
 __m64 v = _mm_cvtsi32_si64 (1);
 __m64 w;
 
-/* Some versions of clang will choke on K */
-asm ("pshufw %2, %1, %0\n\t"
-: "=y" (w)
-: "y" (v), "K" (5)
-);
-
-/* Some versions of clang will choke on this */
-asm ("pmulhuw %1, %0\n\t"
-   : "+y" (w)
-   : "y" (v)
-);
+/* Test some intrinsics from xmmintrin.h */
+w = _mm_shuffle_pi16(v, 5);
+w = _mm_mulhi_pu16(w, w);
 
 return _mm_cvtsi64_si32 (v);
 }]])], have_mmx_intrinsics=yes)
diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c
index 05c48a4..88c3a39 100644
--- a/pixman/pixman-mmx.c
+++ b/pixman/pixman-mmx.c
@@ -40,6 +40,9 @@
 #else
 #include <mmintrin.h>
 #endif
+#ifdef USE_X86_MMX
+#include <xmmintrin.h>
+#endif
 #include "pixman-private.h"
 #include "pixman-combine32.h"
 #include "pixman-inlines.h"
@@ -59,66 +62,7 @@ _mm_empty (void)
 }
 #endif
 
-#ifdef USE_X86_MMX
-# if (defined(__SUNPRO_C) || defined(_MSC_VER) || defined(_WIN64))
-#  include <xmmintrin.h>
-# else
-/* We have to compile with -msse to use xmmintrin.h, but that causes SSE
- * instructions to be generated that we don't want. Just duplicate the
- * functions we want to use.  */
-extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_movemask_pi8 (__m64 __A)
-{
-int ret;
-
-asm ("pmovmskb %1, %0\n\t"
-   : "=r" (ret)
-   : "y" (__A)
-);
-
-return ret;
-}
-
-extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mulhi_pu16 (__m64 __A, __m64 __B)
-{
-asm ("pmulhuw %1, %0\n\t"
-   : "+y" (__A)
-   : "y" (__B)
-);
-return __A;
-}
-
-#  ifdef __OPTIMIZE__
-extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_shuffle_pi16 (__m64 __A, int8_t const __N)
-{
-__m64 ret;
-
-asm ("pshufw %2, %1, %0\n\t"
-   : "=y" (ret)
-   : "y" (__A), "K" (__N)
-);
-
-return ret;
-}
-#  else
-#   define _mm_shuffle_pi16(A, N)  \
-({ \
-   __m64 ret;  \
-   \
-   asm ("pshufw %2, %1, %0\n\t"\
-: "=y" (ret)   \
-: "y" (A), "K" ((const int8_t)N)   \
-   );  \
-   \
-   ret;\
-})
-#  endif
-# endif
-#endif
-
-#ifndef _MSC_VER
+#ifndef _MM_SHUFFLE
 #define _MM_SHUFFLE(fp3,fp2,fp1,fp0) \
  (((fp3) << 6) | ((fp2) << 4) | ((fp1) << 2) | (fp0))
 #endif
-- 
2.4.9
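
As a reading aid for the patch above: _MM_SHUFFLE packs four 2-bit lane
selectors into one immediate, and pmulhuw (_mm_mulhi_pu16) keeps the
high half of each unsigned 16x16-bit product. A minimal sketch in plain
C, with no MMX required; selector_for_lane and mulhi_u16_emul are
hypothetical helper names, and the macro is the same fallback definition
the patch keeps:

```c
#include <assert.h>
#include <stdint.h>

#ifndef _MM_SHUFFLE
#define _MM_SHUFFLE(fp3,fp2,fp1,fp0) \
    (((fp3) << 6) | ((fp2) << 4) | ((fp1) << 2) | (fp0))
#endif

/* Which source lane (0..3) does destination lane `i` read from? */
static unsigned
selector_for_lane (unsigned imm, unsigned i)
{
    assert (i < 4);
    return (imm >> (2 * i)) & 3;
}

/* Scalar model of one pmulhuw lane: the upper 16 bits of the
 * 32-bit unsigned product. */
static uint16_t
mulhi_u16_emul (uint16_t a, uint16_t b)
{
    return (uint16_t) (((uint32_t) a * (uint32_t) b) >> 16);
}
```

For example, _MM_SHUFFLE (3, 3, 3, 3) evaluates to 0xff, which selects
lane 3 for every destination lane, and _MM_SHUFFLE (0, 1, 2, 3)
evaluates to 0x1b, which reverses the lane order.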



Re: [Pixman] [PATCH] mmx: Use MMX2 intrinsics from xmmintrin.h directly.

2015-10-11 Thread Matt Turner
On Sun, Oct 11, 2015 at 8:41 PM, Siarhei Siamashka
 wrote:
> On Sun, 11 Oct 2015 14:55:28 -0700
> Matt Turner  wrote:
>
> Hello,
>
> Thanks. The patch looks good. In fact, it also allows the MMX code to
> be compiled with the Intel Compiler now (previously it was disabled by
> the configure check). A few minor things need to be fixed though. See
> the comments below.
>
>> We had lots of hacks to handle the inability to include xmmintrin.h
>> without compiling with -msse (lest SSE instructions be used in
>
> "lest" -> "lets" ?

Nope, I mean "lest" (means "otherwise something bad would happen")

>> pixman-mmx.c). Some recent version of gcc relaxed this restriction.
>>
>> Change configure.ac to test that xmmintrin.h can be included and that we
>> can use some intrinsics from it, and remove the work-around code from
>> pixman-mmx.c.
>>
>> Evidently allows gcc to optimize better as well:
>>
>>    text    data     bss     dec     hex filename
>>  657078   30848     680  688606   a81de libpixman-1.so.0.33.3 before
>>  656710   30848     680  688238   a806e libpixman-1.so.0.33.3 after
>
> It is always a good idea to mention the exact version of gcc in the
> commit message. For example, it could help if somebody happens to be
> reading this commit message a few years in the future.

Sure, will do.

> As for being able to optimize better. Yes, the "asm" blocks are
> treated by the compiler as opaque boxes (with just the input/output
> interface specified by constraints). The optimizer has difficulties
> generating efficient code if it has to deal with these bubbles. So
> it is a good idea to use intrinsics instead of single-instruction
> "asm" statements.
>
> Also I'm not completely sure, but now we probably prefer (require?) the
> "Signed-off-by" tags in commit messages.

Will do.

>> ---
>>  configure.ac| 15 --
>>  pixman/pixman-mmx.c | 60 +
>>  2 files changed, 5 insertions(+), 70 deletions(-)
>
> Nice stats :-)
>
>>
>> diff --git a/configure.ac b/configure.ac
>> index 424bfd3..b04cc69 100644
>> --- a/configure.ac
>> +++ b/configure.ac
>> @@ -347,21 +347,14 @@ AC_COMPILE_IFELSE([AC_LANG_SOURCE([[
>>  #error "Need GCC >= 3.4 for MMX intrinsics"
>>  #endif
>>  #include <mmintrin.h>
>> +#include <xmmintrin.h>
>
> We still would want to have this under the USE_X86_MMX ifdef check.
> Otherwise crosscompiling for ARM fails:
>
> $ ./configure --host=arm-linux-gnueabihf --disable-libpng --disable-gtk
> $ make
>
> pixman-mmx.c:42:23: fatal error: xmmintrin.h: No such file or directory
>  #include <xmmintrin.h>
>                        ^

Heh, can't believe I forgot about that since I added the iwMMXt support. :)

>>  int main () {
>>  __m64 v = _mm_cvtsi32_si64 (1);
>>  __m64 w;
>>
>> -/* Some versions of clang will choke on K */
>> -asm ("pshufw %2, %1, %0\n\t"
>> -: "=y" (w)
>> -: "y" (v), "K" (5)
>> -);
>> -
>> -/* Some versions of clang will choke on this */
>> -asm ("pmulhuw %1, %0\n\t"
>> - : "+y" (w)
>> - : "y" (v)
>> -);
>> +/* Test some intrinsics from xmmintrin.h */
>> +w = _mm_shuffle_pi16(v, 5);
>> +w = _mm_mulhi_pu16(w, w);
>>
>>  return _mm_cvtsi64_si32 (v);
>>  }]])], have_mmx_intrinsics=yes)
>> diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c
>> index 05c48a4..6bcdee2 100644
>> --- a/pixman/pixman-mmx.c
>> +++ b/pixman/pixman-mmx.c
>> @@ -39,6 +39,7 @@
>>  #include <loongson-mmintrin.h>
>>  #else
>>  #include <mmintrin.h>
>> +#include <xmmintrin.h>
>>  #endif
>>  #include "pixman-private.h"
>>  #include "pixman-combine32.h"
>> @@ -59,65 +60,6 @@ _mm_empty (void)
>>  }
>>  #endif
>>
>> -#ifdef USE_X86_MMX
>> -# if (defined(__SUNPRO_C) || defined(_MSC_VER) || defined(_WIN64))
>> -#  include <xmmintrin.h>
>> -# else
>> -/* We have to compile with -msse to use xmmintrin.h, but that causes SSE
>> - * instructions to be generated that we don't want. Just duplicate the
>> - * functions we want to use.  */
>> -extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>> -_mm_movemask_pi8 (__m64 __A)
>> -{
>> -int ret;
>> -

[Pixman] [PATCH] mmx: Use MMX2 intrinsics from xmmintrin.h directly.

2015-10-11 Thread Matt Turner
We had lots of hacks to handle the inability to include xmmintrin.h
without compiling with -msse (lest SSE instructions be used in
pixman-mmx.c). Some recent version of gcc relaxed this restriction.

Change configure.ac to test that xmmintrin.h can be included and that we
can use some intrinsics from it, and remove the work-around code from
pixman-mmx.c.

Evidently allows gcc to optimize better as well:

   text    data     bss     dec     hex filename
 657078   30848     680  688606   a81de libpixman-1.so.0.33.3 before
 656710   30848     680  688238   a806e libpixman-1.so.0.33.3 after
---
 configure.ac| 15 --
 pixman/pixman-mmx.c | 60 +
 2 files changed, 5 insertions(+), 70 deletions(-)

diff --git a/configure.ac b/configure.ac
index 424bfd3..b04cc69 100644
--- a/configure.ac
+++ b/configure.ac
@@ -347,21 +347,14 @@ AC_COMPILE_IFELSE([AC_LANG_SOURCE([[
 #error "Need GCC >= 3.4 for MMX intrinsics"
 #endif
 #include <mmintrin.h>
+#include <xmmintrin.h>
 int main () {
 __m64 v = _mm_cvtsi32_si64 (1);
 __m64 w;
 
-/* Some versions of clang will choke on K */
-asm ("pshufw %2, %1, %0\n\t"
-: "=y" (w)
-: "y" (v), "K" (5)
-);
-
-/* Some versions of clang will choke on this */
-asm ("pmulhuw %1, %0\n\t"
-   : "+y" (w)
-   : "y" (v)
-);
+/* Test some intrinsics from xmmintrin.h */
+w = _mm_shuffle_pi16(v, 5);
+w = _mm_mulhi_pu16(w, w);
 
 return _mm_cvtsi64_si32 (v);
 }]])], have_mmx_intrinsics=yes)
diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c
index 05c48a4..6bcdee2 100644
--- a/pixman/pixman-mmx.c
+++ b/pixman/pixman-mmx.c
@@ -39,6 +39,7 @@
 #include <loongson-mmintrin.h>
 #else
 #include <mmintrin.h>
+#include <xmmintrin.h>
 #endif
 #include "pixman-private.h"
 #include "pixman-combine32.h"
@@ -59,65 +60,6 @@ _mm_empty (void)
 }
 #endif
 
-#ifdef USE_X86_MMX
-# if (defined(__SUNPRO_C) || defined(_MSC_VER) || defined(_WIN64))
-#  include <xmmintrin.h>
-# else
-/* We have to compile with -msse to use xmmintrin.h, but that causes SSE
- * instructions to be generated that we don't want. Just duplicate the
- * functions we want to use.  */
-extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_movemask_pi8 (__m64 __A)
-{
-int ret;
-
-asm ("pmovmskb %1, %0\n\t"
-   : "=r" (ret)
-   : "y" (__A)
-);
-
-return ret;
-}
-
-extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_mulhi_pu16 (__m64 __A, __m64 __B)
-{
-asm ("pmulhuw %1, %0\n\t"
-   : "+y" (__A)
-   : "y" (__B)
-);
-return __A;
-}
-
-#  ifdef __OPTIMIZE__
-extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_shuffle_pi16 (__m64 __A, int8_t const __N)
-{
-__m64 ret;
-
-asm ("pshufw %2, %1, %0\n\t"
-   : "=y" (ret)
-   : "y" (__A), "K" (__N)
-);
-
-return ret;
-}
-#  else
-#   define _mm_shuffle_pi16(A, N)  \
-({ \
-   __m64 ret;  \
-   \
-   asm ("pshufw %2, %1, %0\n\t"\
-: "=y" (ret)   \
-: "y" (A), "K" ((const int8_t)N)   \
-   );  \
-   \
-   ret;\
-})
-#  endif
-# endif
-#endif
-
 #ifndef _MSC_VER
 #define _MM_SHUFFLE(fp3,fp2,fp1,fp0) \
  (((fp3) << 6) | ((fp2) << 4) | ((fp1) << 2) | (fp0))
-- 
2.4.9
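
For readers who do not have the MMX2 instructions memorised, here is a rough, portable sketch of what the two intrinsics exercised by the configure test compute, and of how a _MM_SHUFFLE-style immediate is built. The names SHUFFLE_IMM, shuffle_pi16_model and mulhi_pu16_model are made up for this illustration and are not part of pixman or of xmmintrin.h; the code models the lane arithmetic only and says nothing about how the compiler schedules the real instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit layout as the _MM_SHUFFLE macro in the patch: four 2-bit
 * lane selectors packed into one 8-bit immediate. Renamed here so it
 * cannot clash with a real intrinsics header. */
#define SHUFFLE_IMM(fp3, fp2, fp1, fp0) \
    (((fp3) << 6) | ((fp2) << 4) | ((fp1) << 2) | (fp0))

/* Hypothetical scalar model of pshufw / _mm_shuffle_pi16:
 * output lane i is the input lane selected by bits [2i+1:2i] of imm. */
static void
shuffle_pi16_model (uint16_t dst[4], const uint16_t src[4], unsigned imm)
{
    uint16_t tmp[4];
    int i;

    assert (imm <= 0xff);

    /* Go through a temporary so dst and src may be the same array. */
    for (i = 0; i < 4; i++)
        tmp[i] = src[(imm >> (2 * i)) & 3];
    for (i = 0; i < 4; i++)
        dst[i] = tmp[i];
}

/* Hypothetical scalar model of pmulhuw / _mm_mulhi_pu16:
 * the high 16 bits of each unsigned 16x16 -> 32 bit product. */
static void
mulhi_pu16_model (uint16_t dst[4], const uint16_t a[4], const uint16_t b[4])
{
    int i;

    for (i = 0; i < 4; i++)
        dst[i] = (uint16_t) (((uint32_t) a[i] * (uint32_t) b[i]) >> 16);
}
```

With this model, the immediate 5 used in the configure test selects lanes 1, 1, 0, 0 (lowest output lane first), and SHUFFLE_IMM (3, 2, 1, 0) is the identity shuffle.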



Re: [Pixman] [PATCH 1/4] vmx: optimize scaled_nearest_scanline_vmx_8888_8888_OVER

2015-09-06 Thread Matt Turner
On Sun, Sep 6, 2015 at 8:27 AM, Oded Gabbay  wrote:
> reference memcpy speed = 24847.3MB/s (6211.8MP/s for 32bpp fills)
>
>         Before  After   Change
>
> L1      182.05  210.22  +15.47%
> L2      180.6   208.92  +15.68%
> M       180.52  208.22  +15.34%

There's no variation between L1, L2, and M -- as a follow on, it might
be interesting to experiment with unrolling the loop a bit. Looked
like the other patches in this series show the same behavior.


Re: [Pixman] [PATCH 1/4] pixman-fast-path: Add over_n_8888 fast path (disabled)

2015-08-22 Thread Matt Turner
On Thu, Aug 20, 2015 at 6:58 AM, Pekka Paalanen  wrote:
> From: Ben Avison 
>
> This is a C fast path, useful for reference or for platforms that don't
> have their own fast path for this operation.
>
> This new fast path is initially disabled by putting the entries in the
> lookup table after the sentinel. The compiler cannot tell the new code
> is not used, so it cannot eliminate the code. Also the lookup table size
> will include the new fast path. When the follow-up patch then enables
> the new fast path, the binary layout (alignments, size, etc.) will stay
> the same compared to the disabled case.
>
> Keeping the binary layout identical is important for benchmarking on
> Raspberry Pi 1. The addresses at which functions are loaded will have a
> significant impact on benchmark results, causing unexpected performance
> changes. Keeping all function addresses the same across the patch
> enabling a new fast path improves the reliability of benchmarks.
>
> Benchmark results are included in the patch enabling this fast path.
>
> [Pekka: disabled the fast path, commit message]
> Signed-off-by: Pekka Paalanen 

I don't care strongly, but I might just squash 1+2, 3+4 together and
make a mention in the commit message of exactly what the benchmark
numbers are comparing.
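
As an aside, the "entries after the sentinel" trick described in the quoted commit message can be sketched in a few lines of C. This is only an illustration under assumed names (fast_path_t, lookup and the two dummy functions are invented here); pixman's real table type and lookup code are more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a fast path table entry. */
typedef int (*path_func_t) (int);

typedef struct
{
    const char *op;     /* NULL marks the sentinel */
    path_func_t func;
} fast_path_t;

static int generic_over (int x) { return x + 1; }
static int new_over (int x)     { return x + 2; }

static const fast_path_t fast_paths[] =
{
    { "over_generic", generic_over },
    { NULL, NULL },               /* sentinel: lookup stops here */
    { "over_new", new_over },     /* disabled: still compiled and stored,
                                   * but never reached by lookup () */
};

/* Walk the table up to (not past) the sentinel. */
static path_func_t
lookup (const char *op)
{
    const fast_path_t *p;

    assert (op != NULL);

    for (p = fast_paths; p->op != NULL; p++)
    {
        if (strcmp (p->op, op) == 0)
            return p->func;
    }

    return NULL;
}
```

Moving the "over_new" entry above the sentinel enables it without changing the size of the table or the set of functions the compiler has to emit, which is the property the commit message relies on for stable benchmark numbers.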


Re: [Pixman] Is Pixman being maintained at all?

2015-07-16 Thread Matt Turner
On Thu, Jul 16, 2015 at 6:38 AM, Oded Gabbay  wrote:
> Hi Matt, Siarhei
>
> As you probably already know, my name is Oded Gabbay and I'm working
> at Red Hat Desktop graphics team, and my current focus is on ppc64le.
> During the last couple of months, I've been working on adding support
> for ppc64le to pixman (fixing vmx fast-paths and adding new
> implementation). Some of the patches have been upstreamed (by Pekka
> Paalanen) and some are in the process of review (by Siarhei and
> others).

Welcome!

> From reading the above email thread, and from talking to Soren, Pekka
> and others, I understand you may need some help, in terms of time and
> resources, for actively maintaining pixman (last release was 1 year
> ago).

To be clear, I don't really consider myself a pixman maintainer -- I
was actually kind of surprised to see that I was the #2 committer.

I work full-time on Mesa, and the work I did on pixman was for an old
contract job at OLPC and then some stuff for fun (Loongson MIPS).

> If that is indeed the case, I would like to offer my help to make
> regular releases for Pixman, both for upstream and Fedora, as well as
> do bug-triage and code reviews.
>
> I believe I have the available time to do it as I'm working 100% on
> graphics in Red Hat. In addition, as a Red Hat employee, I have the
> resources to build/test pixman on multitude of architectures, and use
> Fedora build system as well. Moreover, I'm already the maintainer of a
> fairly large kernel gpu driver (amdkfd - upstream since February), so
> I have maintainer experience.
>
> So far I have already helped Pekka clean the patchwork site he set up
> and I will continue to make sure it is updated. In addition, I got
> packager role for pixman in Fedora, so I'm able to release new
> packages.
>
> Waiting for your response. Feel free to express your opinions - I
> already have thick skin from the kernel work ;)

That all sounds fantastic. I'd personally be very happy to see pixman
maintained by another Red Hatter :)

I'm of course around to help out when I have time.

Thanks,
Matt


Re: [Pixman] [PATCH] test: Add cover-test

2015-05-26 Thread Matt Turner
On Tue, May 26, 2015 at 3:58 PM, Ben Avison  wrote:
> This test aims to verify both numerical correctness and the honouring of
> array bounds for scaled plots (both nearest-neighbour and bilinear) at or
> close to the boundary conditions for applicability of "cover" type fast paths
> and iter fetch routines.
>
> It has a secondary purpose: by setting the env var EXACT (to any value) it
> will only test plots that are exactly on the boundary condition. This makes
> it possible to ensure that "cover" routines are being used to the maximum,
> although this requires the use of a debugger or code instrumentation to
> verify.
> ---
> Note that this must be pushed after Pekka's fence-image patches.
>
>  test/Makefile.sources |    1 +
>  test/cover-test.c     |  376 +
>  2 files changed, 377 insertions(+), 0 deletions(-)
>  create mode 100644 test/cover-test.c
>
> diff --git a/test/Makefile.sources b/test/Makefile.sources
> index 14a3710..5b901db 100644
> --- a/test/Makefile.sources
> +++ b/test/Makefile.sources
> @@ -26,6 +26,7 @@ TESTPROGRAMS =  \
> glyph-test\
> solid-test\
> stress-test   \
> +   cover-test\

Remember to add cover-test to .gitignore.


Re: [Pixman] Is Pixman being maintained at all?

2015-04-07 Thread Matt Turner
On Thu, Apr 2, 2015 at 2:26 AM, Pekka Paalanen  wrote:
> On Wed, 1 Apr 2015 18:46:10 -0700
> Matt Turner  wrote:
>
>> On Mon, Mar 30, 2015 at 10:58 AM, Bill Spitzak  wrote:
>> > On 03/30/2015 10:25 AM, Matt Turner wrote:
>> >
>> >> Do you just need someone to push them?
>> >>
>> >> I'm not capable of reviewing these.
>> >>
>> >> Since Søren isn't really maintaining pixman anymore I'm not really
>> >> sure how to proceed.
>> >
>> >
>> > Is this true?
>>
>> I don't see anyone but Pekka reviewing patches and there hasn't been a
>> release in 15 months, so yeah.
>>
>> > I think something needs to be done about this as all new work on X and
>> > Cairo is depending on pixman.
>>
>> I mean, sure.
>>
>> > I have had an outstanding patch set for 8 months now. Søren responded to an
>> > earlier version and I tried to address it but have not heard anything
>> > since. This is very frustrating as I would like to work on this but I'm not
>> > going to do it if it is useless.
>>
>> As far as I know, Søren isn't working at Red Hat any more, so I don't
>> think you can expect him to continue maintaining pixman.
>
> Ok.
>
> Søren, Matt, Siarhei,
>
> how can we get the Pixman maintenance communitized? Maybe a la
> libdrm, because no-one has the resources to become a dedicated
> maintainer?

Seems fine to me, though I don't really feel like a pixman maintainer. :)

> What does it take to get push and release authorization, in the
> political sense that Pixman quality would not degrade and the
> current/old maintainers would approve?
> What kind of review policies should be enforced?

Søren told me back in December on IRC "Feel free to do a release".

I'm happy to have people commit to pixman who have a track record of
contributions to other X.Org projects.

> What development guidelines should there be? Should it be strictly no
> new API/ABI nor features, only performance work and new platform
> support like the latest new ARM?

I'm not aware of any backwards-incompatible changes to pixman, at
least in a really long time. Keeping that policy in place seems like a
good idea.

New APIs do happen. I think that's probably fine.

> If there is one person contributing arch or cpu-specific optimizations
> in assembly that no-one is willing to review apart from the scope of
> code changes and style, should we trust that one person and just land
> his work if he shows the performance numbers are good?

I might be a bit biased in my answer, since I have some patches to the
MMX code in my tree that I don't expect anyone to review, but yeah I
think we should mostly trust the author (obviously depends on the
author's credibility).

> I mean, I'm a newbie here. I don't want to hijack this project and push
> it only to my own directions, also because I cannot become a dedicated
> maintainer, nor promise to review anyone else's stuff. But, there are
> patches I'd like to see landed. I could work on them with Ben, but if
> there is no-one "upstream" to tell us what goes and what doesn't, we
> are left to our own judgement. Would you trust my and Ben's judgement
> so that I could land Ben's patches and make Pixman releases?

I don't think you're hijacking at all. I think this conversation
needed to happen sooner or later, though I do wish Søren or Siarhei
could spend a little time on it.

> You probably don't have a good understanding about how I work and what
> kind of a developer I am, nor have that kind of trust in me. That is
> fine. We need time to build that trust through discussion and patches.
> But it's hard to have a discussion if no-one can reply. I also
> understand that because I will not promise to be a maintainer, there is
> less incentive in educating me. It is quite likely that I hang around
> here for a while and then wander off when my needs are filled.

I haven't worked with you, but I'm familiar with your contributions.
I'd trust you to commit to pixman.

But I don't think I could really educate anyone except in the MMX and SSE2 code.

> The same goes for everyone, I believe.
>
> What could we do to let Pixman go forward?
>
> I suppose a project in a similar state would just get forked by some
> new people, who will then drive it with their own goals. Except here
> that doesn't work, because the fork would soon fall into the same state
> as the original project, except the world would just be more
> fragmented. Couldn't we as well just loosen up on the m

Re: [Pixman] Is Pixman being maintained at all?

2015-04-01 Thread Matt Turner
On Mon, Mar 30, 2015 at 10:58 AM, Bill Spitzak  wrote:
> On 03/30/2015 10:25 AM, Matt Turner wrote:
>
>> Do you just need someone to push them?
>>
>> I'm not capable of reviewing these.
>>
>> Since Søren isn't really maintaining pixman anymore I'm not really
>> sure how to proceed.
>
>
> Is this true?

I don't see anyone but Pekka reviewing patches and there hasn't been a
release in 15 months, so yeah.

> I think something needs to be done about this as all new work on X and Cairo
> is depending on pixman.

I mean, sure.

> I have had an outstanding patch set for 8 months now. Søren responded to an
> earlier version and I tried to address it but have not heard anything since.
> This is very frustrating as I would like to work on this but I'm not going
> to do it if it is useless.

As far as I know, Søren isn't working at Red Hat any more, so I don't
think you can expect him to continue maintaining pixman.

> If nothing is going to change in pixman I think Cairo is going to have to
> fork it and make a local copy. This is going to remove the ability for Cairo
> to use X remote rendering (since X will still be using the old pixman),
> though it is unclear if any serious software is using this mode any more.

Sounds ridiculous.

Get a Cairo developer to review and commit your pixman changes? I don't know.


Re: [Pixman] [PATCH 1/5] armv6: Fix typo in preload macro

2015-04-01 Thread Matt Turner
Pushed.


Re: [Pixman] [PATCH 1/5] armv6: Fix typo in preload macro

2015-03-30 Thread Matt Turner
On Mon, Mar 30, 2015 at 3:41 AM, Pekka Paalanen  wrote:
> On Mon, 16 Mar 2015 13:56:53 +0200
> Pekka Paalanen  wrote:
>
>> On Tue,  3 Mar 2015 15:24:16 +
>> Ben Avison  wrote:
>>
>> > Missing "lsl" meant that in cases with a 32-bit source and/or mask, and an
>> > 8-bit destination, the code would not assemble.
>> > ---
>> >  pixman/pixman-arm-simd-asm.h |4 ++--
>> >  1 files changed, 2 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/pixman/pixman-arm-simd-asm.h b/pixman/pixman-arm-simd-asm.h
>> > index 8de060a..da153c3 100644
>> > --- a/pixman/pixman-arm-simd-asm.h
>> > +++ b/pixman/pixman-arm-simd-asm.h
>> > @@ -211,8 +211,8 @@
>> >  PF  add,SCRATCH, base, WK0, lsl #bpp_shift-dst_bpp_shift
>> >  PF  and,SCRATCH, SCRATCH, #31
>> >  PF  rsb,SCRATCH, SCRATCH, WK0, lsl #bpp_shift-dst_bpp_shift
>> > -PF  sub,    SCRATCH, SCRATCH, #1    /* so now ranges are -16..-1 / 0..31 / 32..63 */
>> > -PF  movs,   SCRATCH, SCRATCH, #32-6 /* so this sets NC   /  nc   /   Nc   */
>> > +PF  sub,    SCRATCH, SCRATCH, #1    /* so now ranges are -16..-1 / 0..31 / 32..63 */
>> > +PF  movs,   SCRATCH, SCRATCH, lsl #32-6 /* so this sets NC   /  nc   /   Nc   */
>> >  PF  bcs,61f
>> >  PF  bpl,60f
>> >  PF  pld,[ptr, #32*(prefetch_distance+2)]
>>
>> Hi,
>>
>> this one patch looks like it is independent from the series.
>>
>> On Sun, 05 Oct 2014 21:03:42 +0200
>> soren.sandm...@gmail.com (Søren Sandmann) wrote:
>>
>> > ==
>> > 0001:   Typo in preload Looks good
>> > ==
>> >
>> > Looks good
>>
>> Seems like Søren already agreed. Could this one be pushed alone?
>
> Ping?
>
> I know there are lots of patches from Ben in the queue, but this small
> series is the one to be landed first.

Do you just need someone to push them?

I'm not capable of reviewing these.

Since Søren isn't really maintaining pixman anymore I'm not really
sure how to proceed.


Re: [Pixman] [PATCH 2/3] armv7: Faster fill operations

2015-03-04 Thread Matt Turner
On Wed, Mar 4, 2015 at 5:56 PM, Ben Avison  wrote:
> This eliminates a number of branches over blocks of code that are either
> empty or can be trivially combined with a separate code block at the start
> and end of each scanline. This has a surprisingly big effect, at least on
> Cortex-A7, for src_n_8:
>
>         Before           After
>         Mean     StdDev  Mean     StdDev  Confidence  Change
> L1      1570.4   133.1   1639.6   110.7   100.0%      +4.4%
> L2      1042.6   19.9    1086.6   23.4    100.0%      +4.2%
> M       1030.8   7.2     1036.8   3.2     100.0%      +0.6%
> HT      287.4    3.5     303.3    2.9     100.0%      +5.5%
> VT      262.0    2.6     263.3    2.6     99.9%       +0.5%
> R       206.5    2.4     209.9    2.4     100.0%      +1.7%
> RT      56.5     1.0     59.2     0.5     100.0%      +4.7%
> ---

What do you use to generate this?

I'd certainly like to use it.
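
The tool Ben used is not named in this message, so the following is only a guess at the arithmetic behind the Mean and Change columns, not his script. The function names mean and percent_change are invented for this sketch, and the StdDev and Confidence columns are deliberately left out because the thread does not say how they were derived.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n benchmark samples (e.g. one value per run). */
static double
mean (const double *samples, size_t n)
{
    double sum = 0.0;
    size_t i;

    assert (samples != NULL && n > 0);

    for (i = 0; i < n; i++)
        sum += samples[i];

    return sum / (double) n;
}

/* Relative change of the "after" mean against the "before" mean,
 * expressed in percent. */
static double
percent_change (double before, double after)
{
    assert (before != 0.0);

    return (after - before) / before * 100.0;
}
```

Applied to the L1 row quoted above, percent_change (1570.4, 1639.6) comes out at roughly 4.4, which matches the +4.4% in the table.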


Re: [Pixman] Unable to build master on Raspberry PI

2014-12-03 Thread Matt Turner
On Wed, Dec 3, 2014 at 9:18 AM, Andrea Giammarchi wrote:
> Thank you very much Siarhei, I am still building something huge and had no
> way to double check but at least I can confirm the gcc is 4.9.2.
>
> I will try to --disable-arm-iwmmxt when it shows armv6l as uname -m and let
> you know if that fixed.
>
> Do you think it should be enabled in the future or it's needed to let pixman
> properly work?

iwMMXt is a SIMD instruction set that the Raspberry Pi's CPU doesn't
support, so it's not useful for your use case.


Re: [Pixman] [PATCH 0/2] mmx nearest scaling paths

2014-09-26 Thread Matt Turner
On Tue, Sep 23, 2014 at 12:24 PM, Søren Sandmann wrote:
>
>> IIRC, we have already discussed it before. Maybe we should just disable
>> MMX support for x86 and use it only for MIPS Loongson and ARM IWMMXT?

I don't really see the benefit. The bugs we've had have all been
trivially fixed.

I'm concerned that if we disable the MMX code on x86 that over time we
might not notice a bug and it'll become harder to debug. But I suppose
you had to disable SSE2 to find those bugs anyway..

> I'd be in favor of that. For a long time the only real use case for MMX/x86
> has been the XO 1 laptops, and I really doubt that they are getting updated
> pixman libraries any more.
>
> Søren

Cc'ing Daniel Drake, who should know.


Re: [Pixman] [PATCH 0/2] mmx nearest scaling paths

2014-09-21 Thread Matt Turner
On Sun, Sep 21, 2014 at 8:45 PM, Siarhei Siamashka wrote:
> On Fri,  5 Sep 2014 00:26:21 -0700
> Matt Turner  wrote:
>
>> Here are a couple of nearest scaling MMX paths I wrote a long time ago
>> for Loongson and other things using the MMX code.
>>
>> I've got a few more patches for the MMX code that I'll send out as I
>> benchmark them.
>>
>> I don't really expect any reviews, so barring objections I'll plan to
>> commit them in a few days.
>
> Thanks for the patches. However the 32-bit x86 platform appears to be
> a never ending source of MMX troubles:
> http://lists.freedesktop.org/archives/pixman/2014-September/003422.html
>
> IIRC, we have already discussed it before. Maybe we should just disable
> MMX support for x86 and use it only for MIPS Loongson and ARM IWMMXT?
> It does not look like https://gcc.gnu.org/PR47759 is going to be ever
> fixed.

Ah, crap. Thanks for fixing that. I should have run the test suite on
x86 of course.


Re: [Pixman] [PATCH] mmx: Fix _mm_empty problems for over_8888_8888/over_8888_n_8888

2014-09-21 Thread Matt Turner
Reviewed-by: Matt Turner 


[Pixman] [PATCH 0/2] mmx nearest scaling paths

2014-09-05 Thread Matt Turner
Here are a couple of nearest scaling MMX paths I wrote a long time ago
for Loongson and other things using the MMX code.

I've got a few more patches for the MMX code that I'll send out as I
benchmark them.

I don't really expect any reviews, so barring objections I'll plan to
commit them in a few days.



[Pixman] [PATCH 2/2] mmx: Add nearest over_8888_8888

2014-09-05 Thread Matt Turner
lowlevel-blt-bench -n, over_8888_8888, 15 iterations on Loongson 2f:

           Before          After
          Mean StdDev     Mean StdDev   Change
    L1    15.8   0.02     24.0   0.06   +52.0%
    L2    14.8   0.15     23.3   0.13   +56.9%
    M     10.3   0.01     13.8   0.03   +33.6%
    HT    10.0   0.02     14.5   0.05   +44.7%
    VT     9.7   0.02     13.5   0.04   +39.2%
    R      9.1   0.01     12.2   0.04   +34.4%
    RT     7.1   0.06      8.9   0.09   +25.2%
---
 pixman/pixman-mmx.c | 57 +
 1 file changed, 57 insertions(+)

diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c
index 63f4cdf..c7fd503 100644
--- a/pixman/pixman-mmx.c
+++ b/pixman/pixman-mmx.c
@@ -3556,6 +3556,46 @@ mmx_composite_over_reverse_n_8888 (pixman_implementation_t *imp,
 }
 
 static force_inline void
+scaled_nearest_scanline_mmx_8888_8888_OVER (uint32_t*       pd,
+                                            const uint32_t* ps,
+                                            int32_t         w,
+                                            pixman_fixed_t  vx,
+                                            pixman_fixed_t  unit_x,
+                                            pixman_fixed_t  src_width_fixed,
+                                            pixman_bool_t   fully_transparent_src)
+{
+    if (fully_transparent_src)
+        return;
+
+    while (w)
+    {
+        __m64 d = load (pd);
+        __m64 s = load (ps + pixman_fixed_to_int (vx));
+        vx += unit_x;
+        while (vx >= 0)
+            vx -= src_width_fixed;
+
+        store (pd, core_combine_over_u_pixel_mmx (s, d));
+        pd++;
+
+        w--;
+    }
+}
+
+FAST_NEAREST_MAINLOOP (mmx_8888_8888_cover_OVER,
+                       scaled_nearest_scanline_mmx_8888_8888_OVER,
+                       uint32_t, uint32_t, COVER)
+FAST_NEAREST_MAINLOOP (mmx_8888_8888_none_OVER,
+                       scaled_nearest_scanline_mmx_8888_8888_OVER,
+                       uint32_t, uint32_t, NONE)
+FAST_NEAREST_MAINLOOP (mmx_8888_8888_pad_OVER,
+                       scaled_nearest_scanline_mmx_8888_8888_OVER,
+                       uint32_t, uint32_t, PAD)
+FAST_NEAREST_MAINLOOP (mmx_8888_8888_normal_OVER,
+                       scaled_nearest_scanline_mmx_8888_8888_OVER,
+                       uint32_t, uint32_t, NORMAL)
+
+static force_inline void
 scaled_nearest_scanline_mmx_8888_n_8888_OVER (const uint32_t * mask,
                                               uint32_t *       dst,
                                               const uint32_t * src,
@@ -4048,6 +4088,23 @@ static const pixman_fast_path_t mmx_fast_paths[] =
     PIXMAN_STD_FAST_PATH    (IN,       a8,       null,     a8,       mmx_composite_in_8_8              ),
     PIXMAN_STD_FAST_PATH    (IN,       solid,    a8,       a8,       mmx_composite_in_n_8_8            ),
 
+    SIMPLE_NEAREST_FAST_PATH_COVER  (OVER,   a8r8g8b8, x8r8g8b8, mmx_8888_8888 ),
+    SIMPLE_NEAREST_FAST_PATH_COVER  (OVER,   a8b8g8r8, x8b8g8r8, mmx_8888_8888 ),
+    SIMPLE_NEAREST_FAST_PATH_COVER  (OVER,   a8r8g8b8, a8r8g8b8, mmx_8888_8888 ),
+    SIMPLE_NEAREST_FAST_PATH_COVER  (OVER,   a8b8g8r8, a8b8g8r8, mmx_8888_8888 ),
+    SIMPLE_NEAREST_FAST_PATH_NONE   (OVER,   a8r8g8b8, x8r8g8b8, mmx_8888_8888 ),
+    SIMPLE_NEAREST_FAST_PATH_NONE   (OVER,   a8b8g8r8, x8b8g8r8, mmx_8888_8888 ),
+    SIMPLE_NEAREST_FAST_PATH_NONE   (OVER,   a8r8g8b8, a8r8g8b8, mmx_8888_8888 ),
+    SIMPLE_NEAREST_FAST_PATH_NONE   (OVER,   a8b8g8r8, a8b8g8r8, mmx_8888_8888 ),
+    SIMPLE_NEAREST_FAST_PATH_PAD    (OVER,   a8r8g8b8, x8r8g8b8, mmx_8888_8888 ),
+    SIMPLE_NEAREST_FAST_PATH_PAD    (OVER,   a8b8g8r8, x8b8g8r8, mmx_8888_8888 ),
+    SIMPLE_NEAREST_FAST_PATH_PAD    (OVER,   a8r8g8b8, a8r8g8b8, mmx_8888_8888 ),
+    SIMPLE_NEAREST_FAST_PATH_PAD    (OVER,   a8b8g8r8, a8b8g8r8, mmx_8888_8888 ),
+    SIMPLE_NEAREST_FAST_PATH_NORMAL (OVER,   a8r8g8b8, x8r8g8b8, mmx_8888_8888 ),
+    SIMPLE_NEAREST_FAST_PATH_NORMAL (OVER,   a8b8g8r8, x8b8g8r8, mmx_8888_8888 ),
+    SIMPLE_NEAREST_FAST_PATH_NORMAL (OVER,   a8r8g8b8, a8r8g8b8, mmx_8888_8888 ),
+    SIMPLE_NEAREST_FAST_PATH_NORMAL (OVER,   a8b8g8r8, a8b8g8r8, mmx_8888_8888 ),
+
     SIMPLE_NEAREST_SOLID_MASK_FAST_PATH (OVER, a8r8g8b8, a8r8g8b8, mmx_8888_n_8888 ),
     SIMPLE_NEAREST_SOLID_MASK_FAST_PATH (OVER, a8b8g8r8, a8b8g8r8, mmx_8888_n_8888 ),
     SIMPLE_NEAREST_SOLID_MASK_FAST_PATH (OVER, a8r8g8b8, x8r8g8b8, mmx_8888_n_8888 ),
-- 
1.8.5.5
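
The scanline functions in this series step through the source row in 16.16 fixed point. The patch uses a biased coordinate (vx is kept negative and wrapped with "while (vx >= 0)"), which depends on how the FAST_NEAREST_MAINLOOP macros set up the pointers. The sketch below shows the same idea in a simpler, unbiased form so that the stepping and wrap-around are easier to follow. It copies pixels instead of compositing them, and fixed_16_16 and nearest_scanline_repeat are names invented for this illustration, not pixman API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for pixman_fixed_t: 16 integer bits, 16 fraction bits. */
typedef int32_t fixed_16_16;

/* Nearest-neighbour fetch of w pixels from a source row of src_width
 * pixels, repeating the row when the coordinate runs off the right edge
 * (roughly what the NORMAL repeat mode does). */
static void
nearest_scanline_repeat (uint32_t       *dst,
                         const uint32_t *src,
                         int             src_width,
                         int             w,
                         fixed_16_16     vx,
                         fixed_16_16     unit_x)
{
    fixed_16_16 src_width_fixed;

    assert (src_width > 0 && src_width < 32768);
    src_width_fixed = (fixed_16_16) src_width << 16;
    assert (vx >= 0 && vx < src_width_fixed && unit_x >= 0);

    while (w-- > 0)
    {
        /* Integer part of the coordinate picks the source pixel. */
        *dst++ = src[vx >> 16];

        /* Advance by one destination pixel and wrap back into the row. */
        vx += unit_x;
        while (vx >= src_width_fixed)
            vx -= src_width_fixed;
    }
}
```

For example, with a four-pixel source row, vx = 0 and unit_x = 0x18000 (1.5 source pixels per destination pixel), the sampled source indices are 0, 1, 3, 0, 2, 3.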


[Pixman] [PATCH 1/2] mmx: Add nearest over_8888_n_8888

2014-09-05 Thread Matt Turner
lowlevel-blt-bench -n, over_8888_n_8888, 15 iterations on Loongson 2f:

           Before          After
          Mean StdDev     Mean StdDev   Change
    L1     9.7   0.01     19.2   0.02   +98.2%
    L2     9.6   0.11     19.2   0.16   +99.5%
    M      7.3   0.02     12.5   0.01   +72.0%
    HT     6.6   0.01     13.4   0.02  +103.2%
    VT     6.4   0.01     12.6   0.03   +96.1%
    R      6.3   0.01     11.2   0.01   +76.5%
    RT     4.4   0.01      8.1   0.03   +82.6%
---
 pixman/pixman-mmx.c | 62 +
 1 file changed, 62 insertions(+)

diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c
index f9a92ce..63f4cdf 100644
--- a/pixman/pixman-mmx.c
+++ b/pixman/pixman-mmx.c
@@ -3555,6 +3555,59 @@ mmx_composite_over_reverse_n_8888 (pixman_implementation_t *imp,
 _mm_empty ();
 }
 
+static force_inline void
+scaled_nearest_scanline_mmx_8888_n_8888_OVER (const uint32_t * mask,
+                                              uint32_t *       dst,
+                                              const uint32_t * src,
+                                              int32_t          w,
+                                              pixman_fixed_t   vx,
+                                              pixman_fixed_t   unit_x,
+                                              pixman_fixed_t   src_width_fixed,
+                                              pixman_bool_t    zero_src)
+{
+    __m64 mm_mask;
+
+    if (zero_src || (*mask >> 24) == 0)
+        return;
+
+    mm_mask = expand_alpha (load (mask));
+
+    while (w)
+    {
+        uint32_t s = *(src + pixman_fixed_to_int (vx));
+        vx += unit_x;
+        while (vx >= 0)
+            vx -= src_width_fixed;
+
+        if (s)
+        {
+            __m64 ms = load (&s);
+            __m64 alpha = expand_alpha (ms);
+            __m64 dest  = load (dst);
+
+            store (dst, (in_over (ms, alpha, mm_mask, dest)));
+        }
+
+        dst++;
+        w--;
+    }
+
+    _mm_empty ();
+}
+
+FAST_NEAREST_MAINLOOP_COMMON (mmx_8888_n_8888_cover_OVER,
+                              scaled_nearest_scanline_mmx_8888_n_8888_OVER,
+                              uint32_t, uint32_t, uint32_t, COVER, TRUE, TRUE)
+FAST_NEAREST_MAINLOOP_COMMON (mmx_8888_n_8888_pad_OVER,
+                              scaled_nearest_scanline_mmx_8888_n_8888_OVER,
+                              uint32_t, uint32_t, uint32_t, PAD, TRUE, TRUE)
+FAST_NEAREST_MAINLOOP_COMMON (mmx_8888_n_8888_none_OVER,
+                              scaled_nearest_scanline_mmx_8888_n_8888_OVER,
+                              uint32_t, uint32_t, uint32_t, NONE, TRUE, TRUE)
+FAST_NEAREST_MAINLOOP_COMMON (mmx_8888_n_8888_normal_OVER,
+                              scaled_nearest_scanline_mmx_8888_n_8888_OVER,
+                              uint32_t, uint32_t, uint32_t, NORMAL, TRUE, TRUE)
+
 #define BSHIFT ((1 << BILINEAR_INTERPOLATION_BITS))
 #define BMSK (BSHIFT - 1)
 
@@ -3995,6 +4048,15 @@ static const pixman_fast_path_t mmx_fast_paths[] =
     PIXMAN_STD_FAST_PATH    (IN,       a8,       null,     a8,       mmx_composite_in_8_8              ),
     PIXMAN_STD_FAST_PATH    (IN,       solid,    a8,       a8,       mmx_composite_in_n_8_8            ),
 
+    SIMPLE_NEAREST_SOLID_MASK_FAST_PATH (OVER, a8r8g8b8, a8r8g8b8, mmx_8888_n_8888 ),
+    SIMPLE_NEAREST_SOLID_MASK_FAST_PATH (OVER, a8b8g8r8, a8b8g8r8, mmx_8888_n_8888 ),
+    SIMPLE_NEAREST_SOLID_MASK_FAST_PATH (OVER, a8r8g8b8, x8r8g8b8, mmx_8888_n_8888 ),
+    SIMPLE_NEAREST_SOLID_MASK_FAST_PATH (OVER, a8b8g8r8, x8b8g8r8, mmx_8888_n_8888 ),
+    SIMPLE_NEAREST_SOLID_MASK_FAST_PATH_NORMAL (OVER, a8r8g8b8, a8r8g8b8, mmx_8888_n_8888 ),
+    SIMPLE_NEAREST_SOLID_MASK_FAST_PATH_NORMAL (OVER, a8b8g8r8, a8b8g8r8, mmx_8888_n_8888 ),
+    SIMPLE_NEAREST_SOLID_MASK_FAST_PATH_NORMAL (OVER, a8r8g8b8, x8r8g8b8, mmx_8888_n_8888 ),
+    SIMPLE_NEAREST_SOLID_MASK_FAST_PATH_NORMAL (OVER, a8b8g8r8, x8b8g8r8, mmx_8888_n_8888 ),
+
     SIMPLE_BILINEAR_FAST_PATH (SRC, a8r8g8b8,  a8r8g8b8, mmx_8888_8888 ),
     SIMPLE_BILINEAR_FAST_PATH (SRC, a8r8g8b8,  x8r8g8b8, mmx_8888_8888 ),
     SIMPLE_BILINEAR_FAST_PATH (SRC, x8r8g8b8,  x8r8g8b8, mmx_8888_8888 ),
-- 
1.8.5.5



[Pixman] [ANNOUNCE] pixman release 0.32.4 now available

2013-11-17 Thread Matt Turner
A new pixman release 0.32.4 is now available. This is a stable release in the
0.32 series.

tar.gz:
http://cairographics.org/releases/pixman-0.32.4.tar.gz
http://xorg.freedesktop.org/archive/individual/lib/pixman-0.32.4.tar.gz

tar.bz2:
http://xorg.freedesktop.org/archive/individual/lib/pixman-0.32.4.tar.bz2

Hashes:
MD5:  eba449138b972fbf4547a8c152fea162  pixman-0.32.4.tar.gz
MD5:  cdb566504fe9daf6728c7b03cc7ea228  pixman-0.32.4.tar.bz2
SHA1: 54be89b3453109be0930400e5b13c35c9e9d5e3a  pixman-0.32.4.tar.gz
SHA1: e2708db16595412e5aaf21a66b6f18b7223eb6c3  pixman-0.32.4.tar.bz2

GPG signature:
http://cairographics.org/releases/pixman-0.32.4.tar.gz.sha1.asc
(signed by Matt Turner)

Git:
git://git.freedesktop.org/git/pixman
tag: pixman-0.32.4

Log:
Jakub Bogusz (1):
  Fix the SSSE3 CPUID detection.

Matt Turner (1):
  Pre-release version bump to 0.32.4

Søren Sandmann (2):
  Post-release version bump to 0.32.3
  test/utils.c: Make the stack unaligned only on 32 bit Windows




Re: [Pixman] [PATCH] test/utils.c: Make the stack unaligned only on 32 bit Windows

2013-11-16 Thread Matt Turner
On Sat, Nov 16, 2013 at 4:27 PM, Søren Sandmann  wrote:
> The call_test_function() contains some assembly that deliberately
> causes the stack to be aligned to 32 bits rather than 128 bits on
> x86-32. The intention is to catch bugs that surface when pixman is
> called from code that only uses a 32 bit alignment.
>
> However, recent versions of GCC apparently make the assumption (either
> accidentally or deliberately) that the incoming stack is aligned
> to 128 bits, where older versions only seemed to make this assumption
> when compiling with -msse2. This causes the vector code in the PRNG to
> now segfault when called from call_test_function() on x86-32.
>
> This patch fixes that by only making the stack unaligned on 32 bit
> Windows, where it would definitely be incorrect for GCC to assume that
> the incoming stack is aligned to 128 bits.
> ---
>  test/utils.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/test/utils.c b/test/utils.c
> index 281f6b4..f3c1b31 100644
> --- a/test/utils.c
> +++ b/test/utils.c
> @@ -648,7 +648,7 @@ call_test_function (uint32_t(*test_function)(int 
> testnum, int verbose),
>  {
>  uint32_t retval;
>
> -#if defined (__GNUC__) && (defined (__i386) || defined (__i386__))
> +#if __GNUC__ && defined (_WIN32) && (defined (__i386) || defined (__i386__))
>  __asm__ (
>     /* Deliberately avoid aligning the stack to 16 bytes */
> "pushl  %1\n\t"
> --
> 1.8.3.1

Tested-by: Matt Turner 

Could we do a 0.32.4 release with this and the SSSE3 detection fix?


Re: [Pixman] [PATCH] fix SSSE3 detection in pixman 0.32.x

2013-11-12 Thread Matt Turner
On Tue, Nov 12, 2013 at 8:57 AM, Jakub Bogusz  wrote:
> The attached patch fixes SSSE3 detection, so that some routines
> (including tests) don't crash on older chips having APIC (bit 9 in
> cpuid info EDX) but no SSSE3 (bit 9 in cpuid info ECX).
>
> (note: I'm not subscribed to the list)

Thanks for the patch. It is indeed correct.

I've committed the patch. In the future please send a properly
formatted (git send-email) patch that I can git-am directly.

Thanks,
Matt
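
The mix-up described in the quoted report comes down to which CPUID leaf 1
output register is tested: both flags live at bit 9, one in EDX and one in
ECX. A minimal sketch in C, assuming the register values have already been
read; the macro and function names are hypothetical and are not taken from
pixman-x86.c:

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 1 feature bits. Both flags sit at bit 9, but in
 * different output registers. */
#define CPUID1_EDX_APIC  (UINT32_C (1) << 9)
#define CPUID1_ECX_SSSE3 (UINT32_C (1) << 9)

/* Hypothetical helper: the correct test looks at ECX. */
static bool
has_ssse3 (uint32_t ecx, uint32_t edx)
{
    (void) edx;
    return (ecx & CPUID1_ECX_SSSE3) != 0;
}

/* Hypothetical helper: the mistaken test looks at EDX, so it really
 * reports the APIC flag. */
static bool
has_ssse3_wrong_register (uint32_t ecx, uint32_t edx)
{
    (void) ecx;
    return (edx & CPUID1_EDX_APIC) != 0;
}
```

On a chip with APIC but without SSSE3, the second helper returns true and
the first returns false, which matches the crash scenario in the report.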


[Pixman] Test suite failures on 32-bit x86?

2013-11-12 Thread Matt Turner
Both the 0.32.2 release and a git checkout show segfaults in the test suite
when built with

CC="gcc -m32" ./autogen.sh && make check

PASS: prng-test
PASS: a1-trap-test
PASS: region-translate-test
PASS: pdf-op-test
PASS: region-test
PASS: fetch-test
../test-driver: line 95:  3312 Segmentation fault  "$@" > $log_file 2>&1
FAIL: rotate-test
PASS: oob-test
PASS: infinite-loop
PASS: combiner-test
PASS: pixel-test
PASS: trap-crasher
PASS: alpha-loop
PASS: thread-test
PASS: scaling-helpers-test
PASS: scaling-crash-test
../test-driver: line 95:  3571 Segmentation fault  "$@" > $log_file 2>&1
FAIL: matrix-test
PASS: gradient-crash-test
../test-driver: line 95:  3637 Segmentation fault  "$@" > $log_file 2>&1
FAIL: blitters-test
../test-driver: line 95:  3659 Segmentation fault  "$@" > $log_file 2>&1
FAIL: glyph-test
../test-driver: line 95:  3681 Segmentation fault  "$@" > $log_file 2>&1
FAIL: scaling-test
../test-driver: line 95:  3703 Segmentation fault  "$@" > $log_file 2>&1
FAIL: affine-test
PASS: alphamap
PASS: composite-traps-test
PASS: region-contains-test
PASS: stress-test
PASS: composite

Manually running the tests shows that they all crash in
prng_rand_128_r (utils-prng.h:138):

> uint32x4 e = x->a - ((x->b << 27) + (x->b >> (32 - 27)));

which is code inside an #ifdef GCC_VECTOR_EXTENSIONS_SUPPORTED block.
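
For readers unfamiliar with the construct: the quoted line rotates each
32-bit lane of x->b left by 27 bits and subtracts the result from x->a. A
small sketch in C, with a scalar form next to a GCC vector-extension form;
the type and function names are made up for illustration and are not the
ones in utils-prng.h:

```c
#include <stdint.h>

/* Scalar form: a - rotl32 (b, 27). */
static uint32_t
step_scalar (uint32_t a, uint32_t b)
{
    return a - ((b << 27) + (b >> (32 - 27)));
}

/* The same operation on four lanes at once, using GCC's vector
 * extensions (clang accepts them as well). */
typedef uint32_t u32x4 __attribute__ ((vector_size (16)));

static u32x4
step_vector (u32x4 a, u32x4 b)
{
    return a - ((b << 27) + (b >> (32 - 27)));
}
```

Values of a 16-byte vector type are 16-byte aligned by default, so the
loads and stores the compiler generates for them may assume that alignment.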

I realize this may be a gcc bug, so I tested with 4.8.1 and 4.7.2 and
got the same results. Testing with 4.6.3 leads to only a single
failure, in matrix-test (with a different backtrace, so probably a
different problem).

Do we need some kind of configure check to make sure that our use of
gcc's vector extensions is actually going to work?


Re: [Pixman] Latest GIT source for 'pixman-sse2.c'

2013-10-06 Thread Matt Turner
On Sun, Oct 6, 2013 at 1:50 AM, John Emmas  wrote:
> On 05/10/2013 19:32, John Emmas wrote:
>>
>> On 5 Oct 2013, at 19:00, Siarhei Siamashka wrote:
>>
>>> Andrea Canciani has already investigated the problem and submitted the
>>> fixes here:
>>>
>>>
>>> http://lists.freedesktop.org/archives/pixman/2013-September/002954.html
>>>
>> Many thanks for the super fast response guys.  I'm at a different PC now
>> but I'll apply that patch tomorrow.
>
>
> I applied that patch this morning and sure enough, it does fix the problem.
> Thanks to Andrea for noticing it.
>
> BTW...  while reading the patch I noticed that, quite by accident, the
> source file 'pixman-mmx.c' had somehow gotten excluded from my MSVC build
> project, so I took the opportunity to add it.  Although the build still
> succeeds, I see several warnings of this form while building
> 'pixman-mmx.c':-
>
>   pixman-mmx.c(586) : warning C4799: function 'whatever' has no EMMS
> instruction
>
> I don't know if that means anything bad but I thought it wouldn't do any
> harm to flag it up.  Here's a list of the affected functions:-
>
>   function 'expand_4xpacked565' has no EMMS instruction
>   function 'is_opaque' has no EMMS instruction
>   function 'is_equal' has no EMMS instruction
>   function 'to_uint64' has no EMMS instruction
>   function 'expand_4x565' has no EMMS instruction
>   function 'is_zero' has no EMMS instruction
>   function 'store' has no EMMS instruction

These are all inline functions, so _mm_empty() isn't required.

>   function 'fast_composite_scaled_bilinear_mmx_8888_8_8888_none_OVER' has no EMMS instruction
>   function 'fast_composite_scaled_bilinear_mmx_8888_8_8888_pad_OVER' has no EMMS instruction

This has _mm_empty().


Re: [Pixman] [PATCH 11/12] MIPS: runtime detection extended

2013-09-27 Thread Matt Turner
On Thu, Sep 19, 2013 at 3:00 PM, Søren Sandmann  wrote:
> I assume there is a good reason for those spaces in front of the
> keywords, but they definitely set off my "wrong formatting" detector,
> especially because there is no space in front of "Loongson" and because
> Loongson is capitalized while the others are not.

Not sure about the spaces, but the kernel sources display "ICT
Loongson-2" or "Loongson 1B" (the latter of which I don't believe has
MMI).

> Do we know if the loongson MMI instruction set shows up in /proc/cpuinfo
> in newer kernel versions?

I checked the upstream kernel and I don't see any evidence that it does.


Re: [Pixman] [PATCH 1/2] Add empty SSSE3 implementation

2013-09-05 Thread Matt Turner
On Thu, Aug 29, 2013 at 10:02 AM, Søren Sandmann Pedersen
 wrote:
> This commit adds a new, empty SSSE3 implementation and the associated
> build system support.
>
> configure.ac:   detect whether the compiler understands SSSE3
> intrinsics and set up the required CFLAGS
>
> Makefile.am:Add libpixman-ssse3.la
>
> pixman-x86.c:   Add X86_SSSE3 feature flag and detect it in
> detect_cpu_features().
>
> pixman-ssse3.c: New file with an empty SSSE3 implementation
> ---
>  configure.ac|   46 +++
>  pixman/Makefile.am  |   12 +++
>  pixman/pixman-private.h |5 
>  pixman/pixman-ssse3.c   |   50 
> +++
>  pixman/pixman-x86.c |   15 -
>  5 files changed, 126 insertions(+), 2 deletions(-)
>  create mode 100644 pixman/pixman-ssse3.c
>
> diff --git a/configure.ac b/configure.ac
> index 5b9512c..ff97bfb 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -437,6 +437,50 @@ fi
>  AM_CONDITIONAL(USE_SSE2, test $have_sse2_intrinsics = yes)
>
>  dnl 
> ===
> +dnl Check for SSSE3
> +
> +if test "x$SSSE3_CFLAGS" = "x" ; then
> +SSSE3_CFLAGS="-mssse3 -Winline"
> +fi
> +
> +have_ssse3_intrinsics=no
> +AC_MSG_CHECKING(whether to use SSSE3 intrinsics)
> +xserver_save_CFLAGS=$CFLAGS
> +CFLAGS="$SSSE3_CFLAGS $CFLAGS"
> +
> +AC_COMPILE_IFELSE([AC_LANG_SOURCE([[
> +#include <mmintrin.h>
> +#include <xmmintrin.h>
> +#include <emmintrin.h>
> +#include <tmmintrin.h>
> +int main () {
> +__m128i a = _mm_set1_epi32 (0), b = _mm_set1_epi32 (0), c;
> +c = _mm_maddubs_epi16 (a, b);
> +return 0;
> +}]])], have_ssse3_intrinsics=yes)
> +CFLAGS=$xserver_save_CFLAGS
> +
> +AC_ARG_ENABLE(ssse3,
> +   [AC_HELP_STRING([--disable-ssse3],
> +   [disable SSSE3 fast paths])],
> +   [enable_ssse3=$enableval], [enable_ssse3=auto])
> +
> +if test $enable_ssse3 = no ; then
> +   have_ssse3_intrinsics=disabled
> +fi
> +
> +if test $have_ssse3_intrinsics = yes ; then
> +   AC_DEFINE(USE_SSSE3, 1, [use SSSE3 compiler intrinsics])
> +fi
> +
> +AC_MSG_RESULT($have_ssse3_intrinsics)
> +if test $enable_ssse3 = yes && test $have_ssse3_intrinsics = no ; then
> +   AC_MSG_ERROR([SSSE3 intrinsics not detected])
> +fi
> +
> +AM_CONDITIONAL(USE_SSSE3, test $have_ssse3_intrinsics = yes)
> +
> +dnl 
> ===
>  dnl Other special flags needed when building code using MMX or SSE 
> instructions
>  case $host_os in
> solaris*)
> @@ -471,6 +515,8 @@ AC_SUBST(MMX_CFLAGS)
>  AC_SUBST(MMX_LDFLAGS)
>  AC_SUBST(SSE2_CFLAGS)
>  AC_SUBST(SSE2_LDFLAGS)
> +AC_SUBST(SSSE3_CFLAGS)
> +AC_SUBST(SSSE3_LDFLAGS)

No need for SSSE3_LDFLAGS. Remove it?

>  dnl 
> ===
>  dnl Check for VMX/Altivec
> diff --git a/pixman/Makefile.am b/pixman/Makefile.am
> index b9ea754..b376d9a 100644
> --- a/pixman/Makefile.am
> +++ b/pixman/Makefile.am
> @@ -52,6 +52,18 @@ libpixman_1_la_LIBADD += libpixman-sse2.la
>  ASM_CFLAGS_sse2=$(SSE2_CFLAGS)
>  endif
>
> +# ssse3 code
> +if USE_SSSE3
> +noinst_LTLIBRARIES += libpixman-ssse3.la
> +libpixman_ssse3_la_SOURCES = \
> +   pixman-ssse3.c
> +libpixman_ssse3_la_CFLAGS = $(SSSE3_CFLAGS)
> +libpixman_1_la_LDFLAGS += $(SSSE3_LDFLAGS)
> +libpixman_1_la_LIBADD += libpixman-ssse3.la
> +
> +ASM_CFLAGS_ssse3=$(SSSE3_CFLAGS)
> +endif
> +
>  # arm simd code
>  if USE_ARM_SIMD
>  noinst_LTLIBRARIES += libpixman-arm-simd.la
> diff --git a/pixman/pixman-private.h b/pixman/pixman-private.h
> index 0afabad..732f3d1 100644
> --- a/pixman/pixman-private.h
> +++ b/pixman/pixman-private.h
> @@ -593,6 +593,11 @@ pixman_implementation_t *
>  _pixman_implementation_create_sse2 (pixman_implementation_t *fallback);
>  #endif
>
> +#ifdef USE_SSSE3
> +pixman_implementation_t *
> +_pixman_implementation_create_ssse3 (pixman_implementation_t *fallback);
> +#endif
> +
>  #ifdef USE_ARM_SIMD
>  pixman_implementation_t *
>  _pixman_implementation_create_arm_simd (pixman_implementation_t *fallback);
> diff --git a/pixman/pixman-ssse3.c b/pixman/pixman-ssse3.c
> new file mode 100644
> index 0000000..19d71e7
> --- /dev/null
> +++ b/pixman/pixman-ssse3.c
> @@ -0,0 +1,50 @@
> +/*
> + * Copyright © 2013 Soren Sandmann Pedersen
> + * Copyright © 2013 Red Hat, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall 

Re: [Pixman] [PATCH] Drop support for 8-bit precision in bilinear filtering

2013-09-05 Thread Matt Turner
On Wed, Sep 4, 2013 at 7:49 PM, Søren Sandmann  wrote:
> From: Søren Sandmann Pedersen 
>
> The default has been 7-bit for a while now, and the quality
> improvement with 8-bit precision is not enough to justify keeping the
> code around as a compile-time option.
> ---

I'm fine with this change, but here is a strange data point, given that
the purpose of 7-bit was to be able to use _mm_madd_pi16 or similar:

I noticed that for commit 9aa8e3a2 that I actually had a slight loss
of performance on Loongson (1.8%) but a gain of 8.3% on ARM/iwMMXt.
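
One way to see why 7 bits suit a signed 16-bit multiply-add such as
_mm_madd_pi16: after the first interpolation pass an 8-bit channel weighted
by 7-bit factors is at most 255 * 128 = 32640, which still fits a signed
16-bit input, whereas 8-bit weights could reach 255 * 256 = 65280. The
sketch below is plain scalar C with hypothetical names, and it assumes the
vertical pass runs first; it is not pixman's implementation:

```c
#include <stdint.h>

#define BILINEAR_BITS  7                     /* weight precision from the thread */
#define BILINEAR_RANGE (1 << BILINEAR_BITS)

/* Hypothetical scalar helper: interpolate one 8-bit channel from its
 * four neighbours. wx and wy are in [0, BILINEAR_RANGE). */
static uint8_t
bilinear_channel (uint8_t tl, uint8_t tr, uint8_t bl, uint8_t br,
                  int wx, int wy)
{
    /* Vertical pass: at most 255 * 128 = 32640, so it fits int16_t. */
    int16_t left  = (int16_t) (tl * (BILINEAR_RANGE - wy) + bl * wy);
    int16_t right = (int16_t) (tr * (BILINEAR_RANGE - wy) + br * wy);

    /* Horizontal pass: signed 16x16 -> 32 multiplies, then one add,
     * which is the shape of a packed multiply-add on one pair. */
    int32_t sum = (int32_t) left * (BILINEAR_RANGE - wx)
                + (int32_t) right * wx;

    return (uint8_t) (sum >> (2 * BILINEAR_BITS));
}
```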


[Pixman] [PATCH] Use AC_LINK_IFELSE to check if the Loongson MMI code can link

2013-05-15 Thread Matt Turner
From: Markos Chandras 

The Loongson code is compiled with -march=loongson2f to enable the MMI
instructions, but binutils refuses to link object code compiled with
different -march settings, leading to link failures later in the
compile. This avoids that problem by checking if we can link code
compiled for Loongson.

Signed-off-by: Markos Chandras 
---
 configure.ac | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure.ac b/configure.ac
index c43a0d2..221179f 100644
--- a/configure.ac
+++ b/configure.ac
@@ -279,7 +279,7 @@ AC_MSG_CHECKING(whether to use Loongson MMI assembler)
 
 xserver_save_CFLAGS=$CFLAGS
 CFLAGS=" $LS_CFLAGS $CFLAGS -I$srcdir"
-AC_COMPILE_IFELSE([AC_LANG_SOURCE([[
+AC_LINK_IFELSE([AC_LANG_SOURCE([[
 #ifndef __mips_loongson_vector_rev
 #error "Loongson Multimedia Instructions are only available on Loongson"
 #endif
-- 
1.8.1.5



[Pixman] [PATCH] mmx: Document implementation(s) of pix_multiply().

2013-05-15 Thread Matt Turner
---
I look at that function and can never remember what it does or how it
manages to do it.

 pixman/pixman-mmx.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c
index 14790c0..746ecd6 100644
--- a/pixman/pixman-mmx.c
+++ b/pixman/pixman-mmx.c
@@ -301,6 +301,29 @@ negate (__m64 mask)
 return _mm_xor_si64 (mask, MC (4x00ff));
 }
 
+/* Computes the product of two unsigned fixed-point 8-bit values from 0 to 1
+ * and maps its result to the same range.
+ *
+ * Jim Blinn gives multiple ways to compute this in "Jim Blinn's Corner:
+ * Notation, Notation, Notation", the first of which is
+ *
+ *   prod(a, b) = (a * b + 128) / 255.
+ *
+ * By approximating the division by 255 as 257/65536 it can be replaced by a
+ * multiply and a right shift. This is the implementation that we use in
+ * pix_multiply(), but we _mm_mulhi_pu16() by 257 (part of SSE1 or Extended
+ * 3DNow!, and unavailable at the time of the book's publication) to perform
+ * the multiplication and right shift in a single operation.
+ *
+ *   prod(a, b) = ((a * b + 128) * 257) >> 16.
+ *
+ * A third way (how pix_multiply() was implemented prior to 14208344) exists
+ * also that performs the multiplication by 257 with adds and shifts.
+ *
+ * Where temp = a * b + 128
+ *
+ *   prod(a, b) = (temp + (temp >> 8)) >> 8.
+ */
 static force_inline __m64
 pix_multiply (__m64 a, __m64 b)
 {
-- 
1.8.1.5
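
The two shift-based forms in the comment above can be checked against a
rounded division in scalar C. This is only a sketch: the function names are
hypothetical, the reference helper is an exactly rounded a * b / 255 added
here for comparison, and the real pix_multiply() works on packed 16-bit
lanes rather than single values.

```c
#include <stdint.h>

/* Reference, not from the patch: a * b / 255 rounded to nearest,
 * using integer arithmetic only. */
static uint8_t
mul_un8_reference (uint8_t a, uint8_t b)
{
    return (uint8_t) ((2u * a * b + 255u) / 510u);
}

/* Multiply by 257 and keep the high 16 bits, which is what a 16-bit
 * "multiply high" does per lane. t never exceeds 65153, so it fits
 * an unsigned 16-bit lane. */
static uint8_t
mul_un8_mulhi (uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t) a * b + 128;
    return (uint8_t) ((t * 257) >> 16);
}

/* The same multiplication by 257 spelled with an add and shifts. */
static uint8_t
mul_un8_shift (uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t) a * b + 128;
    return (uint8_t) ((t + (t >> 8)) >> 8);
}
```

Looping over all 65536 input pairs is a cheap way to confirm that the three
helpers agree.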



Re: [Pixman] [ANNOUNCE] pixman major release 0.30.0 now available

2013-05-11 Thread Matt Turner
On Wed, May 8, 2013 at 4:56 PM, Søren Sandmann  wrote:
> Ben Avison (8):
>   Fix to lowlevel-blt-bench

In case this saves someone else some time: this commit changes
lowlevel-blt-bench results significantly. Comparisons of benchmark
results taken before and after this commit cannot be made.


Re: [Pixman] As per : Please report to pixman@lists.freedesktop.org

2013-04-09 Thread Matt Turner
On Tue, Apr 9, 2013 at 2:39 PM, David Lisle  wrote:
> Thanks for responding, the problem remains a mystery but the overall project
> now is operational. I appreciate that you took time.

I really meant that there certainly must have been more error output
that wasn't in your email. This would lead to the actual problem.


Re: [Pixman] As per : Please report to pixman@lists.freedesktop.org

2013-04-08 Thread Matt Turner
On Mon, Apr 8, 2013 at 11:13 AM, David Lisle  wrote:
> ===
> make[2]: *** [check-TESTS] Error 1
> make[2]: Leaving directory `/usr/src/pixman-0.28.2/test'
> make[1]: *** [check-am] Error 2
> make[1]: Leaving directory `/usr/src/pixman-0.28.2/test'
> make: *** [check-recursive] Error 1
> ==
>
> The test failed.

Which test?

> I am using Slackware 2.6.37.6-smp
> KDE SC Version 4.5.5(KDE 4.5.5)
>
> Compiles as root, added other programs that were dependencies i.e. wv-1.2.4
> prior to configuration and make. Make gave no error messages or warnings.

Seems doubtful.

> This program did not correctly pass the tests, therefore installation is
> held in abeyance until it does.
>
> There is insufficient information for me to solve this problem.

Us too.


Re: [Pixman] [PATCH 2/4] Added fast path for "pad" type repeats

2013-02-05 Thread Matt Turner
On Tue, Feb 5, 2013 at 4:39 PM, Ben Avison  wrote:
> diff --git a/test/Makefile.sources b/test/Makefile.sources
> index e323a8e..bcbca37 100644
> --- a/test/Makefile.sources
> +++ b/test/Makefile.sources
> @@ -1,6 +1,7 @@
>  # Tests (sorted by expected completion time)
>  TESTPROGRAMS = \
> prng-test   \
> +   repeat-test \

Update .gitignore for the new test.


Re: [Pixman] 0.29.2

2013-01-27 Thread Matt Turner
On Sun, Jan 27, 2013 at 11:43 AM, Siarhei Siamashka
 wrote:
> Still, I'm not very happy about the code duplication. We already have
> similar iterators (fetch only, no writeback) in "pixman-mmx.c":
>
> 
> http://cgit.freedesktop.org/pixman/tree/pixman/pixman-mmx.c?id=pixman-0.28.2#n3904
>
> Ideally, a lot of this code can be reused in different backends. The
> only unique parts are just the fetch/store functions themselves.

I'm not sure I understand totally. Is the suggestion to add writeback
iterators, thereby allowing the removal of src_x888_0565?


Re: [Pixman] [PATCH] build: Support building Loongson code for 2e, 2f, 3a

2013-01-26 Thread Matt Turner
Some preemptive explanations:

On Sat, Jan 26, 2013 at 6:54 PM, Matt Turner  wrote:
> diff --git a/pixman/pixman-mips.c b/pixman/pixman-mips.c
> index 3048813..77bef5c 100644
> --- a/pixman/pixman-mips.c
> +++ b/pixman/pixman-mips.c
> @@ -27,6 +27,10 @@
>
>  #if defined(USE_MIPS_DSPR2) || defined(USE_LOONGSON_MMI)
>
> +#ifdef DLOPEN_LOONGSON_MMI
> +#include <dlfcn.h>
> +#endif
> +
>  #include 
>  #include 
>
> @@ -69,10 +73,64 @@ pixman_implementation_t *
>  _pixman_mips_get_implementations (pixman_implementation_t *imp)
>  {
>  #ifdef USE_LOONGSON_MMI
> +void *mmi_handle = NULL;

mmi_handle is outside of DLOPEN_LOONGSON_MMI so that I don't have to
do funny things to the if-statements below. In the !dlopen case, I
expect gcc to recognize that it's always NULL and optimize it
completely out.

> +#ifdef DLOPEN_LOONGSON_MMI
> +pixman_implementation_t *(*_pixman_implementation_create_mmx) 
> (pixman_implementation_t *);
> +#endif
>  /* I really don't know if some Loongson CPUs don't have MMI. */
> -if (!_pixman_disabled ("loongson-mmi") && have_feature ("Loongson"))
> +#ifdef HAVE_LOONGSON2E_MMI
> +if (!mmi_handle && !_pixman_disabled ("loongson-mmi")
> +   && have_feature ("Loongson") && have_feature ("-2e"))
> +{
> +#ifdef DLOPEN_LOONGSON_MMI
> +   mmi_handle = dlopen("libpixman-1-loongson2e-mmi.so", RTLD_LAZY | 
> RTLD_LOCAL);
> +#else
> +   imp = _pixman_implementation_create_mmx (imp);
> +#endif
> +}
> +#endif
> +#ifdef HAVE_LOONGSON2F_MMI
> +if (!mmi_handle && !_pixman_disabled ("loongson-mmi")
> +   && have_feature ("Loongson") && have_feature ("-2f"))
> +{
> +#ifdef DLOPEN_LOONGSON_MMI
> +   mmi_handle = dlopen("libpixman-1-loongson2f-mmi.so", RTLD_LAZY | 
> RTLD_LOCAL);
> +#else
> +   imp = _pixman_implementation_create_mmx (imp);
> +#endif
> +}
> +#endif
> +#ifdef HAVE_LOONGSON3A_MMI
> +if (!mmi_handle && !_pixman_disabled ("loongson-mmi")
> +   && have_feature ("Loongson-3A"))
> +{
> +#ifdef DLOPEN_LOONGSON_MMI
> +   mmi_handle = dlopen("libpixman-1-loongson3a-mmi.so", RTLD_LAZY | 
> RTLD_LOCAL);
> +#else
> imp = _pixman_implementation_create_mmx (imp);
>  #endif
> +}
> +#endif
> +
> +#ifdef DLOPEN_LOONGSON_MMI
> +if (mmi_handle)
> +{
> +   _pixman_implementation_create_mmx = dlsym(mmi_handle, 
> "_pixman_implementation_create_mmx");
> +   if (_pixman_implementation_create_mmx)
> +   {
> +   imp = _pixman_implementation_create_mmx (imp);
> +   }
> +   else
> +   {
> +   puts(dlerror());
> +   }
> +}
> +else
> +{
> +   puts(dlerror());
> +}
> +#endif
> +#endif

I don't ever dlclose() the handle. I expect that it will be live for
the rest of process execution. I think there are other cases of
"leaks" like this in pixman already.
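
The selection order in the quoted hunk can be summarized in a few lines of
C. This is a simplified sketch, not the patch itself: pick_mmi_library() is
a hypothetical helper, and it assumes that have_feature() amounts to a
substring match on the CPU model string. Only the library names and the
search strings come from the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: map a CPU model string to the shared object the
 * patch would dlopen(), or NULL if no Loongson MMI variant matches.
 * The checks run in the same order as in the patch: 2e, 2f, then 3A. */
static const char *
pick_mmi_library (const char *cpu_model)
{
    if (!strstr (cpu_model, "Loongson"))
        return NULL;

    if (strstr (cpu_model, "-2e"))
        return "libpixman-1-loongson2e-mmi.so";
    if (strstr (cpu_model, "-2f"))
        return "libpixman-1-loongson2f-mmi.so";
    if (strstr (cpu_model, "Loongson-3A"))
        return "libpixman-1-loongson3a-mmi.so";

    return NULL;
}
```

In the patch the chosen name is passed to dlopen(), and dlsym() then looks
up _pixman_implementation_create_mmx in the returned handle.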

>  #ifdef USE_MIPS_DSPR2
>  if (!_pixman_disabled ("mips-dspr2"))


[Pixman] [PATCH] build: Support building Loongson code for 2e, 2f, 3a

2013-01-26 Thread Matt Turner
Since binutils refuses to link objects that are compiled with different
-march flags, pixman-mmx.c is compiled with varying -march flags into
separate shared objects, which are dlopened at runtime.

AC_LINK_IFELSE is used to confirm that linking works, since for example
an object built with -march=loongson2e cannot be linked with libc.so
built with -march=loongson2f. I expect binary distributions' libcs to
be built with generic flags, and in that case all three Loongson -march
values can be built.

If libc is built with a particular -march=loongson* flag, the linking
test will fail and only the -march value matching the C library will be
built.

If only one -march value is built, avoid dlopen and simply build the
code into libpixman-1 like before.

Unfortunately, two internal pixman symbols are needed by pixman-mmx.c:
_pixman_image_get_solid
_pixman_implementation_create

They are annotated with PIXMAN_EXPORT, but only in the dlopen case.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=51451
---
An alternative would be to move the code that creates the implementation
record from pixman-mmx.c to pixman-mips.c and use dlsym to get the
function pointers for the fast paths table. This seemed like a lot more
work for the benefit of not exposing two private symbols on a platform
almost no one cares about, and only on binary distributions at that.

 configure.ac   | 135 +++--
 pixman/Makefile.am |  60 --
 pixman/pixman-image.c  |   3 +
 pixman/pixman-implementation.c |   3 +
 pixman/pixman-mips.c   |  60 +-
 pixman/pixman-mmx.c|   3 +
 pixman/pixman-private.h|   6 ++
 7 files changed, 244 insertions(+), 26 deletions(-)

diff --git a/configure.ac b/configure.ac
index 515e312..bf10344 100644
--- a/configure.ac
+++ b/configure.ac
@@ -270,21 +270,26 @@ PIXMAN_CHECK_CFLAG([-xldscope=hidden], [dnl
 dnl ===
 dnl Check for Loongson Multimedia Instructions
 
-if test "x$LS_CFLAGS" = "x" ; then
-LS_CFLAGS="-march=loongson2f"
+if test "x$LS2E_CFLAGS" = "x" ; then
+LS2E_CFLAGS="-march=loongson2e"
+fi
+if test "x$LS2F_CFLAGS" = "x" ; then
+LS2F_CFLAGS="-march=loongson2f"
+fi
+if test "x$LS3A_CFLAGS" = "x" ; then
+LS3A_CFLAGS="-march=loongson3a"
 fi
 
-have_loongson_mmi=no
 AC_MSG_CHECKING(whether to use Loongson MMI assembler)
 
-xserver_save_CFLAGS=$CFLAGS
-CFLAGS=" $LS_CFLAGS $CFLAGS -I$srcdir"
-AC_COMPILE_IFELSE([AC_LANG_SOURCE([[
+save_CFLAGS=$CFLAGS
+CFLAGS=" $CFLAGS $LS2E_CFLAGS -I$srcdir"
+AC_LINK_IFELSE([AC_LANG_SOURCE([[
 #ifndef __mips_loongson_vector_rev
 #error "Loongson Multimedia Instructions are only available on Loongson"
 #endif
 #if defined(__GNUC__) && (__GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 
4))
-#error "Need GCC >= 4.4 for Loongson MMI compilation"
+#error "Need GCC >= 4.4 for Loongson 2e/f MMI compilation"
 #endif
 #include "pixman/loongson-mmintrin.h"
 int main () {
@@ -295,30 +300,120 @@ int main () {
 int b = 4;
 __m64 c = _mm_srli_pi16 (a.v, b);
 return 0;
-}]])], have_loongson_mmi=yes)
-CFLAGS=$xserver_save_CFLAGS
+}]])], have_loongson2e_mmi=yes)
+CFLAGS=$save_CFLAGS
+
+save_CFLAGS=$CFLAGS
+CFLAGS=" $CFLAGS $LS2F_CFLAGS -I$srcdir"
+AC_LINK_IFELSE([AC_LANG_SOURCE([[
+#ifndef __mips_loongson_vector_rev
+#error "Loongson Multimedia Instructions are only available on Loongson"
+#endif
+#if defined(__GNUC__) && (__GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 
4))
+#error "Need GCC >= 4.4 for Loongson 2e/f MMI compilation"
+#endif
+#include "pixman/loongson-mmintrin.h"
+int main () {
+union {
+__m64 v;
+char c[8];
+} a = { .c = {1, 2, 3, 4, 5, 6, 7, 8} };
+int b = 4;
+__m64 c = _mm_srli_pi16 (a.v, b);
+return 0;
+}]])], have_loongson2f_mmi=yes)
+CFLAGS=$save_CFLAGS
+
+save_CFLAGS=$CFLAGS
+CFLAGS=" $CFLAGS $LS3A_CFLAGS -I$srcdir"
+AC_LINK_IFELSE([AC_LANG_SOURCE([[
+#ifndef __mips_loongson_vector_rev
+#error "Loongson Multimedia Instructions are only available on Loongson"
+#endif
+#if defined(__GNUC__) && (__GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 
6))
+#error "Need GCC >= 4.6 for Loongson 3A MMI compilation"
+#endif
+#include "pixman/loongson-mmintrin.h"
+int main () {
+union {
+__m64 v;
+char c[8];
+} a = { .c = {1, 2, 3, 4, 5, 6, 7, 8} };
+int b = 4;
+__m64 c = _mm_srli_pi16 (a.v, b);
+return 0;
+}]])], have_loongson3a_mmi=yes)
+CFLAGS=$save_CFLAGS
 
 AC_ARG_ENABLE(loongson-mmi,
[AC_HELP_STRING([--disable-loongson-mmi],
[disable Loongson MMI fast paths])],
[enable_loongson_mmi=$enableval], [enable_loongson_mmi=auto])
-
-if test $enable_loongson_mmi = no ; then
-   have_loongson_mmi=disabled
-fi
-
-if test $have_loongson_mmi = yes ; then
+AC_ARG_ENABLE(loongson2e-mmi,
+   [AC_HELP_STRING([--disable-loongson2e-mmi],
+   [do 

Re: [Pixman] [PATCH] sse2: Implement simple bilinear scaling for x8r8g8b8 to a8r8g8b8

2013-01-23 Thread Matt Turner
On Wed, Jan 23, 2013 at 6:37 AM, Chris Wilson  wrote:
> Improves firefon-tron on a IVB i7-3720qm: 68.6s to 45.2s.
>
> Signed-off-by: Chris Wilson 
> ---
>  pixman/pixman-sse2.c |   63 
> ++
>  1 file changed, 63 insertions(+)
>
> diff --git a/pixman/pixman-sse2.c b/pixman/pixman-sse2.c
> index fc873cc..bc3e2f1 100644
> --- a/pixman/pixman-sse2.c
> +++ b/pixman/pixman-sse2.c
> @@ -5679,6 +5679,67 @@ FAST_BILINEAR_MAINLOOP_COMMON 
> (sse2___normal_SRC,
>NORMAL, FLAG_NONE)
>
>  static force_inline void
> +scaled_bilinear_scanline_sse2_0888__SRC (uint32_t *   dst,

Maybe some funny whitespace before dst? Or maybe just a spaces vs tabs issue.

Anyway, Reviewed-by: Matt Turner 

> +const uint32_t * mask,
> +const uint32_t * src_top,
> +const uint32_t * src_bottom,
> +int32_t  w,
> +int  wt,
> +int  wb,
> +pixman_fixed_t   vx,
> +pixman_fixed_t   unit_x,
> +pixman_fixed_t   max_vx,
> +pixman_bool_tzero_src)
> +{
> +BILINEAR_DECLARE_VARIABLES;
> +uint32_t pix1, pix2, pix3, pix4;
> +
> +while ((w -= 4) >= 0)
> +{
> +   BILINEAR_INTERPOLATE_ONE_PIXEL (pix1);
> +   BILINEAR_INTERPOLATE_ONE_PIXEL (pix2);
> +   BILINEAR_INTERPOLATE_ONE_PIXEL (pix3);
> +   BILINEAR_INTERPOLATE_ONE_PIXEL (pix4);
> +   *dst++ = pix1 | 0xff000000;
> +   *dst++ = pix2 | 0xff000000;
> +   *dst++ = pix3 | 0xff000000;
> +   *dst++ = pix4 | 0xff000000;
> +}
> +
> +if (w & 2)
> +{
> +   BILINEAR_INTERPOLATE_ONE_PIXEL (pix1);
> +   BILINEAR_INTERPOLATE_ONE_PIXEL (pix2);
> +   *dst++ = pix1 | 0xff000000;
> +   *dst++ = pix2 | 0xff000000;
> +}
> +
> +if (w & 1)
> +{
> +   BILINEAR_INTERPOLATE_ONE_PIXEL (pix1);
> +   *dst = pix1 | 0xff000000;
> +}
> +
> +}
> +
> +FAST_BILINEAR_MAINLOOP_COMMON (sse2_0888__cover_SRC,
> +  scaled_bilinear_scanline_sse2_0888__SRC,
> +  uint32_t, uint32_t, uint32_t,
> +  COVER, FLAG_NONE)
> +FAST_BILINEAR_MAINLOOP_COMMON (sse2_0888__pad_SRC,
> +  scaled_bilinear_scanline_sse2_0888__SRC,
> +  uint32_t, uint32_t, uint32_t,
> +  PAD, FLAG_NONE)
> +FAST_BILINEAR_MAINLOOP_COMMON (sse2_0888__none_SRC,
> +  scaled_bilinear_scanline_sse2_0888__SRC,
> +  uint32_t, uint32_t, uint32_t,
> +  NONE, FLAG_NONE)
> +FAST_BILINEAR_MAINLOOP_COMMON (sse2_0888__normal_SRC,
> +  scaled_bilinear_scanline_sse2_0888__SRC,
> +  uint32_t, uint32_t, uint32_t,
> +  NORMAL, FLAG_NONE)
> +
> +static force_inline void
>  scaled_bilinear_scanline_sse2___OVER (uint32_t *   dst,
>   const uint32_t * mask,
>   const uint32_t * src_top,
> @@ -6185,6 +6246,8 @@ static const pixman_fast_path_t sse2_fast_paths[] =
>  SIMPLE_BILINEAR_FAST_PATH (SRC, a8b8g8r8, a8b8g8r8, sse2__),
>  SIMPLE_BILINEAR_FAST_PATH (SRC, a8b8g8r8, x8b8g8r8, sse2__),
>  SIMPLE_BILINEAR_FAST_PATH (SRC, x8b8g8r8, x8b8g8r8, sse2__),
> +SIMPLE_BILINEAR_FAST_PATH (SRC, x8r8g8b8, a8r8g8b8, sse2_0888_),
> +SIMPLE_BILINEAR_FAST_PATH (SRC, x8b8g8r8, a8b8g8r8, sse2_0888_),
>
>  SIMPLE_BILINEAR_FAST_PATH (OVER, a8r8g8b8, x8r8g8b8, sse2__),
>  SIMPLE_BILINEAR_FAST_PATH (OVER, a8b8g8r8, x8b8g8r8, sse2__),
> --
> 1.7.10.4
>


[Pixman] [PATCH] Add new demos and tests to .gitignore

2013-01-18 Thread Matt Turner
---
 .gitignore | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/.gitignore b/.gitignore
index 2d089fc..648699b 100644
--- a/.gitignore
+++ b/.gitignore
@@ -31,12 +31,15 @@ demos/checkerboard
 demos/clip-in
 demos/clip-test
 demos/composite-test
+demos/conical-test
 demos/convolution-test
 demos/gradient-test
 demos/quad2quad
 demos/radial-test
+demos/scale
 demos/screen-test
 demos/srgb-test
+demos/srgb-trap-test
 demos/trap-test
 demos/tri-test
 pixman/pixman-srgb.c
@@ -49,6 +52,7 @@ test/alpha-test
 test/blitters-test
 test/clip-in
 test/clip-test
+test/combiner-test
 test/composite
 test/composite-test
 test/composite-traps-test
@@ -57,13 +61,16 @@ test/fetch-test
 test/glyph-test
 test/gradient-crash-test
 test/gradient-test
+test/infinite-loop
 test/lowlevel-blt-bench
 test/oob-test
 test/pdf-op-test
+test/prng-test
 test/region-contains-test
 test/region-test
 test/region-translate
 test/region-translate-test
+test/rotate-test
 test/scaling-crash-test
 test/scaling-helpers-test
 test/scaling-test
-- 
1.7.12.4



Re: [Pixman] 0.29.2

2013-01-18 Thread Matt Turner
On Fri, Jan 18, 2013 at 4:15 PM, Søren Sandmann  wrote:
> Hi,
>
> It's about time to get a 0.29.2 development snapshot out, but there are
> some outstanding patches

I'd like to get my triple build loongson patch in, but haven't gotten
any testers yet. I'll set up a chroot this weekend to test it.

Matt


[Pixman] [PATCH] Convert INCLUDES to AM_CPPFLAGS

2013-01-18 Thread Matt Turner
INCLUDES has been deprecated starting with automake 1.13. Replace all
occurrences with the recommended AM_CPPFLAGS.
---
 demos/Makefile.am  | 2 +-
 pixman/Makefile.am | 2 +-
 test/Makefile.am   | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/demos/Makefile.am b/demos/Makefile.am
index f324f5f..fca2710 100644
--- a/demos/Makefile.am
+++ b/demos/Makefile.am
@@ -4,7 +4,7 @@ AM_CFLAGS = $(OPENMP_CFLAGS)
 AM_LDFLAGS = $(OPENMP_CFLAGS)
 
 LDADD = $(top_builddir)/pixman/libpixman-1.la -lm $(GTK_LIBS) $(PNG_LIBS)
-INCLUDES = -I$(top_srcdir)/pixman -I$(top_builddir)/pixman $(GTK_CFLAGS) $(PNG_CFLAGS)
+AM_CPPFLAGS = -I$(top_srcdir)/pixman -I$(top_builddir)/pixman $(GTK_CFLAGS) $(PNG_CFLAGS)
 
 GTK_UTILS = gtk-utils.c gtk-utils.h ../test/utils.c ../test/utils.h
 
diff --git a/pixman/Makefile.am b/pixman/Makefile.am
index 270d65e..d4b7bb3 100644
--- a/pixman/Makefile.am
+++ b/pixman/Makefile.am
@@ -91,7 +91,7 @@ noinst_LTLIBRARIES += libpixman-iwmmxt.la
 libpixman_1_la_LIBADD += libpixman-iwmmxt.la
 
 libpixman_iwmmxt_la-pixman-mmx.lo: pixman-mmx.c
-   $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(CFLAGS) $(IWMMXT_CFLAGS) -MT libpixman_iwmmxt_la-pixman-mmx.lo -MD -MP -MF $(DEPDIR)/libpixman_iwmmxt_la-pixman-mmx.Tpo -c -o libpixman_iwmmxt_la-pixman-mmx.lo `test -f 'pixman-mmx.c' || echo '$(srcdir)/'`pixman-mmx.c
+   $(AM_V_CC)$(LIBTOOL) $(AM_V_lt) --tag=CC $(AM_LIBTOOLFLAGS) $(LIBTOOLFLAGS) --mode=compile $(CC) $(DEFS) $(DEFAULT_INCLUDES) $(AM_CPPFLAGS) $(AM_CPPFLAGS) $(CPPFLAGS) $(CFLAGS) $(IWMMXT_CFLAGS) -MT libpixman_iwmmxt_la-pixman-mmx.lo -MD -MP -MF $(DEPDIR)/libpixman_iwmmxt_la-pixman-mmx.Tpo -c -o libpixman_iwmmxt_la-pixman-mmx.lo `test -f 'pixman-mmx.c' || echo '$(srcdir)/'`pixman-mmx.c
    $(AM_V_at)$(am__mv) $(DEPDIR)/libpixman_iwmmxt_la-pixman-mmx.Tpo $(DEPDIR)/libpixman_iwmmxt_la-pixman-mmx.Plo
 
 libpixman_iwmmxt_la_DEPENDENCIES = $(am__DEPENDENCIES_1)
diff --git a/test/Makefile.am b/test/Makefile.am
index eeb3679..ca87f4e 100644
--- a/test/Makefile.am
+++ b/test/Makefile.am
@@ -3,7 +3,7 @@ include $(top_srcdir)/test/Makefile.sources
 AM_CFLAGS = $(OPENMP_CFLAGS)
 AM_LDFLAGS = $(OPENMP_CFLAGS) $(TESTPROGS_EXTRA_LDFLAGS)
 LDADD = libutils.la $(top_builddir)/pixman/libpixman-1.la -lm  $(PNG_LIBS)
-INCLUDES = -I$(top_srcdir)/pixman -I$(top_builddir)/pixman $(PNG_CFLAGS)
+AM_CPPFLAGS = -I$(top_srcdir)/pixman -I$(top_builddir)/pixman $(PNG_CFLAGS)
 
 libutils_la_SOURCES = $(libutils_sources) $(libutils_headers)
 
-- 
1.7.12.4



[Pixman] [PATCH] Add new demos and tests to .gitignore

2013-01-18 Thread Matt Turner
---
 .gitignore | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/.gitignore b/.gitignore
index a4d9f99..dcb3f8e 100644
--- a/.gitignore
+++ b/.gitignore
@@ -37,6 +37,7 @@ demos/quad2quad
 demos/radial-test
 demos/screen-test
 demos/srgb-test
+demos/srgb-trap-test
 demos/trap-test
 demos/tri-test
 pixman/pixman-combine32.c
@@ -61,6 +62,7 @@ test/fetch-test
 test/glyph-test
 test/gradient-crash-test
 test/gradient-test
+test/infinite-loop
 test/lowlevel-blt-bench
 test/oob-test
 test/pdf-op-test
@@ -68,6 +70,7 @@ test/region-contains-test
 test/region-test
 test/region-translate
 test/region-translate-test
+test/rotate-test
 test/scaling-crash-test
 test/scaling-helpers-test
 test/scaling-test
-- 
1.7.12.4



Re: [Pixman] [PATCH] build: Support building Loongson code for 2e, 2f, 3a

2013-01-17 Thread Matt Turner
On Sun, Jan 6, 2013 at 7:46 PM, Cyril Brulebois  wrote:
> Hello Matt,
>
> Matt Turner  (06/01/2013):
>> On Sat, Sep 15, 2012 at 11:59 PM, Matt Turner  wrote:
>> > pixman/Makefile.am contains a hack that allows pixman-mmx.c to
>> > be compiled with different overriding CFLAGS, since automake
>> > seriously doesn't have a way to do this. Seriously stupid.
>> >
>> > It works by defining a new rule and recursively calling make
>> > with modified CFLAGS set.
>> >
>> > Note the difference between the USE_LOONGSON* and HAVE_LOONGSON*
>> > preprocessor macros.
>> >
>> > Cc: Cyril Brulebois 
>> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=51451
>> > ---
>>
>> Cyril,
>>
>> I've updated the patch so that it builds .so files for each
>> architecture against which pixman links and attached it to the bug
>> report. Please give it a test. I cannot test it, as my system is
>> compiled with -march=loongson2f and therefore I cannot even link code
>> compiled with -march=loongson2e with my C library.
>
> thanks; unfortunately I'm busy working on the Debian Installer right
> now and pixman is a bit further down my todo list. Adding debian-mips@
> to Cc, hoping somebody there will be able to perform some tests/share
> some insight.
>
> Mraw,
> KiBi.

Any testers, debian-mips@?


Re: [Pixman] [PATCH] sse2: Add fast paths for bilinear source with a solid mask

2013-01-08 Thread Matt Turner
On Tue, Jan 8, 2013 at 12:55 PM, Chris Wilson  wrote:
> Based on the existing sse2_8888_n_8888 nearest scaling routines.
>
> fishbowl on an i5-2500: 60.9s -> 56.9s
>
> Signed-off-by: Chris Wilson 
> ---

Looks good to me. Reviewed-by: Matt Turner 


Re: [Pixman] [PATCH] build: Support building Loongson code for 2e, 2f, 3a

2013-01-06 Thread Matt Turner
On Sat, Sep 15, 2012 at 11:59 PM, Matt Turner  wrote:
> pixman/Makefile.am contains a hack that allows pixman-mmx.c to
> be compiled with different overriding CFLAGS, since automake
> seriously doesn't have a way to do this. Seriously stupid.
>
> It works by defining a new rule and recursively calling make
> with modified CFLAGS set.
>
> Note the difference between the USE_LOONGSON* and HAVE_LOONGSON*
> preprocessor macros.
>
> Cc: Cyril Brulebois 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=51451
> ---

Cyril,

I've updated the patch so that it builds .so files for each
architecture against which pixman links and attached it to the bug
report. Please give it a test. I cannot test it, as my system is
compiled with -march=loongson2f and therefore I cannot even link code
compiled with -march=loongson2e with my C library.


Re: [Pixman] [PATCH] Fix build with automake-1.13

2013-01-03 Thread Matt Turner
On Wed, Jan 2, 2013 at 8:38 PM, Marko Lindqvist  wrote:
> Automake-1.13 has removed long obsolete AM_CONFIG_HEADER macro (
> http://lists.gnu.org/archive/html/automake/2012-12/msg00038.html )
> and autoreconf errors out upon seeing it.
>
> Attached patch replaces obsolete AM_CONFIG_HEADER with now proper
> AC_CONFIG_HEADERS.
>
> I'm not subscribed to the mailing list.

Thanks, I tried to apply this, but git won't let me push... will try
to get this worked out.

In the future, please use git format-patch and git send-email. To
apply your patch, I had to

patch -p1 < ...
git commit --author="Marko Lindqvist " -a


It's a lot nicer to just be able to type git am :)

Matt


Re: [Pixman] [PATCH] sse2: Add a fast path for add_n_8888

2013-01-02 Thread Matt Turner
On Wed, Jan 2, 2013 at 3:01 AM, Chris Wilson  wrote:
> This path is being exercised by inplace compositing of trapezoids, for
> instance as used in the firefox-asteroids cairo-trace.
>
> core2 @ 2.66GHz,
>
> reference memcpy speed = 4898.2MB/s (1224.6MP/s for 32bpp fills)
>
> before: add_n_8888 = L1:   4.36  L2:   4.27  M:  1.61 (  0.13%)  HT:
> 1.65  VT:  1.63  R:  1.63  RT:  1.59 (  21Kops/s)
>
> after:  add_n_8888 = L1:2969.09  L2:3926.11  M:603.30 ( 49.27%)  HT:524.69
> VT:401.01  R:407.59  RT:210.34 ( 804Kops/s)
>
> Signed-off-by: Chris Wilson 
> ---
>  pixman/pixman-sse2.c |   63 
> ++
>  1 file changed, 63 insertions(+)
>
> diff --git a/pixman/pixman-sse2.c b/pixman/pixman-sse2.c
> index 665eead..73eee68 100644
> --- a/pixman/pixman-sse2.c
> +++ b/pixman/pixman-sse2.c
> @@ -4519,9 +4519,70 @@ sse2_composite_add_8888_8888 (pixman_implementation_t *imp,
>
> sse2_combine_add_u (imp, op, dst, src, NULL, width);
>  }
> +}
> +
> +static void
> +sse2_composite_add_n_8888 (pixman_implementation_t *imp,
> +  pixman_composite_info_t *info)
> +{
> +PIXMAN_COMPOSITE_ARGS (info);
> +uint32_t *dst_line, *dst, src;
> +int dst_stride;
> +
> +__m128i xmm_src;
> +
> +PIXMAN_IMAGE_GET_LINE (dest_image, dest_x, dest_y, uint32_t, dst_stride, 
> dst_line, 1);
> +
> +src = _pixman_image_get_solid (imp, src_image, dest_image->bits.format);
> +if (src == 0)
> +   return;
> +
> +if (src == ~0)
> +{
> +   pixman_fill (dest_image->bits.bits, dest_image->bits.rowstride, 32,
> +dest_x, dest_y, width, height, ~0);
> +
> +   return;
> +}
> +
> +xmm_src = _mm_set_epi32 (src, src, src, src);
> +while (height--)
> +{
> +   int w = width;
> +   uint32_t d;
>
> +   dst = dst_line;
> +   dst_line += dst_stride;
> +
> +   while (w && (unsigned long)dst & 15)

Use uintptr_t instead. The rest of the patch looks good to me.
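
A minimal sketch of the uintptr_t point, not taken from the patch: `unsigned long` is only 32 bits wide on 64-bit Windows, so casting a pointer to it can drop the upper bits, whereas `uintptr_t` from `<stdint.h>` is meant to hold a converted pointer. The helper name `pixels_until_aligned` is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how many 32-bit pixels must be handled one at a
 * time before dst reaches a 16-byte boundary (or w runs out). */
static int
pixels_until_aligned (const uint32_t *dst, int w)
{
    int n = 0;

    assert (w >= 0);

    /* uintptr_t is wide enough for a pointer on every ABI that has it */
    while (w && ((uintptr_t)dst & 15))
    {
        dst++;
        w--;
        n++;
    }

    return n;
}
```

For a pointer that sits 4 bytes past a 16-byte boundary this returns 3, which is the number of single-pixel iterations the unaligned head loop in the patch would run.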

> +   {
> +   d = *dst;
> +   *dst++ =
> +   _mm_cvtsi128_si32 ( _mm_adds_epu8 (xmm_src, _mm_cvtsi32_si128 
> (d)));
> +   w--;
> +   }
> +
> +   while (w >= 4)
> +   {
> +   save_128_aligned
> +   ((__m128i*)dst,
> +_mm_adds_epu8 (xmm_src, load_128_aligned ((__m128i*)dst)));
> +
> +   dst += 4;
> +   w -= 4;
> +   }
> +
> +   while (w--)
> +   {
> +   d = *dst;
> +   *dst++ =
> +   _mm_cvtsi128_si32 (_mm_adds_epu8 (xmm_src,
> + _mm_cvtsi32_si128 (d)));
> +   }
> +}
>  }
>
> +
>  static pixman_bool_t
>  pixman_blt_sse2 (uint32_t *src_bits,
>   uint32_t *dst_bits,
> @@ -5814,6 +5875,8 @@ static const pixman_fast_path_t sse2_fast_paths[] =
>  PIXMAN_STD_FAST_PATH (ADD, a8b8g8r8, null, a8b8g8r8, sse2_composite_add_8888_8888),
>  PIXMAN_STD_FAST_PATH (ADD, solid, a8, a8, sse2_composite_add_n_8_8),
>  PIXMAN_STD_FAST_PATH (ADD, solid, null, a8, sse2_composite_add_n_8),
> +PIXMAN_STD_FAST_PATH (ADD, solid, null, x8r8g8b8, sse2_composite_add_n_8888),
> +PIXMAN_STD_FAST_PATH (ADD, solid, null, a8r8g8b8, sse2_composite_add_n_8888),
>
>  /* PIXMAN_OP_SRC */
>  PIXMAN_STD_FAST_PATH (SRC, solid, a8, a8r8g8b8, sse2_composite_src_n_8_8888),
> --
> 1.7.10.4
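
For readers following the quoted patch, here is a scalar sketch of what the new fast path appears to compute: a per-channel saturating ADD of a solid colour onto each destination pixel, with the same two shortcuts (a zero source is a no-op, an all-ones source is a plain fill). The function names are hypothetical and this is a reference model under those assumptions, not pixman code.

```c
#include <assert.h>
#include <stdint.h>

/* Per-channel saturating add of two packed 8-bit-per-channel pixels;
 * the scalar counterpart of what _mm_adds_epu8 does for each byte. */
static uint32_t
add_saturate_8888 (uint32_t d, uint32_t s)
{
    uint32_t result = 0;
    int shift;

    for (shift = 0; shift < 32; shift += 8)
    {
        uint32_t sum = ((d >> shift) & 0xff) + ((s >> shift) & 0xff);

        if (sum > 0xff)
            sum = 0xff;

        result |= sum << shift;
    }

    return result;
}

/* Hypothetical reference routine: ADD a solid colour onto one scanline. */
static void
add_solid_scanline (uint32_t *dst, int width, uint32_t src)
{
    int i;

    assert (width >= 0);

    if (src == 0)
        return;                 /* adding zero leaves dst unchanged */

    for (i = 0; i < width; i++)
    {
        /* an all-ones source saturates every channel, so just store it */
        dst[i] = (src == 0xffffffff) ? src : add_saturate_8888 (dst[i], src);
    }
}
```

A model like this can serve as the expected result when comparing against the vectorised loop on random input.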


Re: [Pixman] [PATCH] sse2: Add a fast path for add_n_8888

2013-01-02 Thread Matt Turner
On Wed, Jan 2, 2013 at 3:01 AM, Chris Wilson  wrote:
> This path is being exercised by inplace compositing of trapezoids, for
> instance as used in the firefox-asteroids cairo-trace.

cairo-perf-trace numbers from firefox-asteroids would be cool to have
in the commit message.


Re: [Pixman] [cairo] issue with blend modes in pixman

2012-12-31 Thread Matt Turner
On Mon, Dec 31, 2012 at 1:05 PM, Rik Cabanier  wrote:
> Looking at the formulas, I can see what's wrong but I don't know who to
> contact.

These mailing lists are perfect.


Re: [Pixman] [PATCH] Always use xmmintrin.h for 64 bit Windows

2012-11-16 Thread Matt Turner
On Tue, Nov 13, 2012 at 10:44 AM, Stefan Weil  wrote:
> MinGW-w64 uses the GNU compiler and does not define _MSC_VER.
> Nevertheless, it provides xmmintrin.h and must be handled
> here like the MS compiler. Otherwise compilation fails due to
> conflicting declarations.
>
> Signed-off-by: Stefan Weil 
> ---
>  pixman/pixman-mmx.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c
> index c2ae4ea..aef468a 100644
> --- a/pixman/pixman-mmx.c
> +++ b/pixman/pixman-mmx.c
> @@ -62,7 +62,7 @@ _mm_empty (void)
>  #endif
>
>  #ifdef USE_X86_MMX
> -# if (defined(__SUNPRO_C) || defined(_MSC_VER))
> +# if (defined(__SUNPRO_C) || defined(_MSC_VER) || defined(_WIN64))
>  #  include <xmmintrin.h>
>  # else
>  /* We have to compile with -msse to use xmmintrin.h, but that causes SSE
> --
> 1.7.10.4

If you're compiling for Win64, you have SSE2. Why even compile the MMX code?


Re: [Pixman] Questionable numbers from lowlevel-blt-bench

2012-10-01 Thread Matt Turner
On Mon, Oct 1, 2012 at 1:17 AM, Jonathan Morton
 wrote:
> On Sun, 30 Sep 2012 15:05:18 -0700, Matt Turner 
> wrote:
>> In doing performance work, I've noticed some weird results from
>> lowlevel-blt-bench. Often it has seemed that the RT results determined
>> the Kops/s almost entirely. I came across an instance of this today
>> which was particularly striking:
>>
>> Before:
>> add_8888_8888 =  L1:  47.01  L2:  36.84  M: 18.96 ( 33.14%)  HT: 35.94
>>  VT: 33.82  R: 30.64  RT: 19.36 ( 181Kops/s)
>>
>> After:
>> add_8888_8888 =  L1: 230.78  L2: 200.86  M: 90.48 (159.44%)  HT: 48.41
>>  VT: 45.46  R: 42.78  RT: 19.22 ( 181Kops/s)
>>
>> L1/L2/M numbers are improved by ~5x. HT, VT, and R numbers are
>> improved by ~1.35x. RT doesn't change... neither does Kops/s.
>>
>> What's going on here, and can we make the composite result more sensible?
>
> The figures in brackets are derived directly from one or more of the
> other figures.  In this case, the Kops/s number is derived directly
> from the RT number, which should explain why they correlate.

Ahh. At least I (and I'm pretty sure others too) thought that the Kops
number was supposed to be a composite of HT, VT, RT, and R. This
explains it then.

> The percentage figure, meanwhile, represents a percentage of memory
> bandwidth used by this blitter (under the M test), the peak bandwidth
> being derived from an earlier measurement.  (You're seeing more than
> 100%, which suggests that the earlier measurement is not optimal.)

Indeed. I'm prefetching in the modified function.

> The RT figure is meant to measure, as directly as possible, the per-call
> overhead which does not depend on the number of pixels involved.
> Accordingly, it is not expected to change significantly when doing
> pixel-related optimisations.

Right, makes sense.

Thanks!
Matt


[Pixman] Questionable numbers from lowlevel-blt-bench

2012-09-30 Thread Matt Turner
Hi Jonathan,

In doing performance work, I've noticed some weird results from
lowlevel-blt-bench. Often it has seemed that the RT results determined
the Kops/s almost entirely. I came across an instance of this today
which was particularly striking:

Before:
add_8888_8888 =  L1:  47.01  L2:  36.84  M: 18.96 ( 33.14%)  HT: 35.94
 VT: 33.82  R: 30.64  RT: 19.36 ( 181Kops/s)

After:
add_8888_8888 =  L1: 230.78  L2: 200.86  M: 90.48 (159.44%)  HT: 48.41
 VT: 45.46  R: 42.78  RT: 19.22 ( 181Kops/s)

L1/L2/M numbers are improved by ~5x. HT, VT, and R numbers are
improved by ~1.35x. RT doesn't change... neither does Kops/s.
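
As a quick check on those ratios, a small sketch in C that divides each "after" figure by its "before" figure. The `speedup` helper and the `columns` table are illustrative only and are not part of lowlevel-blt-bench.

```c
#include <assert.h>
#include <stdio.h>

/* Ratio of an "after" throughput figure to the matching "before" one. */
static double
speedup (double before, double after)
{
    assert (before > 0.0);
    return after / before;
}

/* Before/after figures copied from the two result lines above. */
static const struct
{
    const char *name;
    double      before;
    double      after;
} columns[] =
{
    { "L1", 47.01, 230.78 },
    { "L2", 36.84, 200.86 },
    { "M",  18.96,  90.48 },
    { "HT", 35.94,  48.41 },
    { "VT", 33.82,  45.46 },
    { "R",  30.64,  42.78 },
    { "RT", 19.36,  19.22 },
};

/* Print one "name: ratio" line per column. */
static void
print_speedups (void)
{
    size_t i;

    for (i = 0; i < sizeof (columns) / sizeof (columns[0]); i++)
    {
        printf ("%-2s: %.2fx\n", columns[i].name,
                speedup (columns[i].before, columns[i].after));
    }
}
```

With these figures the L1/L2/M ratios come out between roughly 4.8x and 5.5x, HT/VT/R between roughly 1.34x and 1.40x, and RT at about 0.99x.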

What's going on here, and can we make the composite result more sensible?

Thanks,
Matt


[Pixman] [PATCH 2/2] sse2: use add_8888_8888 for x8* formats

2012-09-30 Thread Matt Turner
---
 pixman/pixman-sse2.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/pixman/pixman-sse2.c b/pixman/pixman-sse2.c
index efed310..82e242a 100644
--- a/pixman/pixman-sse2.c
+++ b/pixman/pixman-sse2.c
@@ -5845,7 +5845,11 @@ static const pixman_fast_path_t sse2_fast_paths[] =
 PIXMAN_STD_FAST_PATH_CA (ADD, solid, a8r8g8b8, a8r8g8b8, sse2_composite_add_n_8888_8888_ca),
 PIXMAN_STD_FAST_PATH (ADD, a8, null, a8, sse2_composite_add_8_8),
 PIXMAN_STD_FAST_PATH (ADD, a8r8g8b8, null, a8r8g8b8, sse2_composite_add_8888_8888),
+PIXMAN_STD_FAST_PATH (ADD, a8r8g8b8, null, x8r8g8b8, sse2_composite_add_8888_8888),
+PIXMAN_STD_FAST_PATH (ADD, x8r8g8b8, null, x8r8g8b8, sse2_composite_add_8888_8888),
 PIXMAN_STD_FAST_PATH (ADD, a8b8g8r8, null, a8b8g8r8, sse2_composite_add_8888_8888),
+PIXMAN_STD_FAST_PATH (ADD, a8b8g8r8, null, x8b8g8r8, sse2_composite_add_8888_8888),
+PIXMAN_STD_FAST_PATH (ADD, x8b8g8r8, null, x8b8g8r8, sse2_composite_add_8888_8888),
 PIXMAN_STD_FAST_PATH (ADD, solid, a8, a8, sse2_composite_add_n_8_8),
 PIXMAN_STD_FAST_PATH (ADD, solid, null, a8, sse2_composite_add_n_8),
 
-- 
1.7.8.6



[Pixman] [PATCH 1/2] mmx: use add_8888_8888 for x8* formats

2012-09-30 Thread Matt Turner
---
 pixman/pixman-mmx.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/pixman/pixman-mmx.c b/pixman/pixman-mmx.c
index fccba9d..2771a38 100644
--- a/pixman/pixman-mmx.c
+++ b/pixman/pixman-mmx.c
@@ -4000,7 +4000,11 @@ static const pixman_fast_path_t mmx_fast_paths[] =
 PIXMAN_STD_FAST_PATH(ADD,  r5g6b5,   null, r5g6b5,   mmx_composite_add_0565_0565   ),
 PIXMAN_STD_FAST_PATH(ADD,  b5g6r5,   null, b5g6r5,   mmx_composite_add_0565_0565   ),
 PIXMAN_STD_FAST_PATH(ADD,  a8r8g8b8, null, a8r8g8b8, mmx_composite_add_8888_8888   ),
+PIXMAN_STD_FAST_PATH(ADD,  a8r8g8b8, null, x8r8g8b8, mmx_composite_add_8888_8888   ),
+PIXMAN_STD_FAST_PATH(ADD,  x8r8g8b8, null, x8r8g8b8, mmx_composite_add_8888_8888   ),
 PIXMAN_STD_FAST_PATH(ADD,  a8b8g8r8, null, a8b8g8r8, mmx_composite_add_8888_8888   ),
+PIXMAN_STD_FAST_PATH(ADD,  a8b8g8r8, null, x8b8g8r8, mmx_composite_add_8888_8888   ),
+PIXMAN_STD_FAST_PATH(ADD,  x8b8g8r8, null, x8b8g8r8, mmx_composite_add_8888_8888   ),
 PIXMAN_STD_FAST_PATH(ADD,  a8,       null, a8,       mmx_composite_add_8_8         ),
 PIXMAN_STD_FAST_PATH(ADD,  solid,    a8,   a8,       mmx_composite_add_n_8_8       ),
 
-- 
1.7.8.6



[Pixman] [test PATCH] Use _mm_maddubs_epi16 in BILINEAR_INTERPOLATE_ONE_PIXEL

2012-09-29 Thread Matt Turner
Siarhei, can you measure any performance improvement with this? I
can't... :(
---
 pixman/pixman-sse2.c |8 +++-
 1 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/pixman/pixman-sse2.c b/pixman/pixman-sse2.c
index efed310..4fbc045 100644
--- a/pixman/pixman-sse2.c
+++ b/pixman/pixman-sse2.c
@@ -32,6 +32,7 @@
 
 #include <xmmintrin.h> /* for _mm_shuffle_pi16 and _MM_SHUFFLE */
 #include <emmintrin.h> /* for SSE2 intrinsics */
+#include <tmmintrin.h> /* for SSSE3 intrinsics */
 #include "pixman-private.h"
 #include "pixman-combine32.h"
 #include "pixman-inlines.h"
@@ -5414,7 +5415,7 @@ FAST_NEAREST_MAINLOOP_COMMON (sse2_8888_n_8888_normal_OVER,
 
 #define BILINEAR_INTERPOLATE_ONE_PIXEL(pix) \
 do { \
-__m128i xmm_wh, xmm_lo, xmm_hi, a; \
+__m128i xmm_wh, a; \
 /* fetch 2x2 pixel block into sse2 registers */ \
 __m128i tltr = _mm_loadl_epi64 ( \
    (__m128i *)&src_top[pixman_fixed_to_int (vx)]); \
@@ -5443,10 +5444,7 @@ do { \
    _mm_srli_epi16 (xmm_x, 16 - BILINEAR_INTERPOLATION_BITS))); \
    xmm_x = _mm_add_epi16 (xmm_x, xmm_ux); \
    /* horizontal interpolation */ \
-   xmm_lo = _mm_mullo_epi16 (a, xmm_wh); \
-   xmm_hi = _mm_mulhi_epu16 (a, xmm_wh); \
-   a = _mm_add_epi32 (_mm_unpacklo_epi16 (xmm_lo, xmm_hi), \
-  _mm_unpackhi_epi16 (xmm_lo, xmm_hi)); \
+   a = _mm_maddubs_epi16 (a, xmm_wh); \
 } \
 /* shift and pack the result */ \
 a = _mm_srli_epi32 (a, BILINEAR_INTERPOLATION_BITS * 2); \
-- 
1.7.8.6
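
For reference when reading this test patch, a scalar model of what `_mm_maddubs_epi16` (PMADDUBSW) computes, written from the instruction's documented behaviour: the first operand is read as unsigned bytes, the second as signed bytes, and each adjacent pair of products is summed into one 16-bit lane with signed saturation. The removed `_mm_mullo_epi16`/`_mm_mulhi_epu16` sequence instead multiplies 16-bit lanes into 32-bit results, so whether the two are interchangeable depends on how `a` and `xmm_wh` are laid out, which this sketch does not check. The function names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One 16-bit lane: unsigned bytes a0, a1 times signed bytes b0, b1,
 * products summed with signed 16-bit saturation. */
static int16_t
maddubs_lane (uint8_t a0, uint8_t a1, int8_t b0, int8_t b1)
{
    int32_t sum = (int32_t)a0 * b0 + (int32_t)a1 * b1;

    if (sum > INT16_MAX)
        sum = INT16_MAX;
    if (sum < INT16_MIN)
        sum = INT16_MIN;

    return (int16_t)sum;
}

/* Whole 128-bit operation: 16 bytes per operand in, 8 16-bit lanes out. */
static void
maddubs_model (const uint8_t a[16], const int8_t b[16], int16_t out[8])
{
    int i;

    assert (a != NULL && b != NULL && out != NULL);

    for (i = 0; i < 8; i++)
        out[i] = maddubs_lane (a[2 * i], a[2 * i + 1], b[2 * i], b[2 * i + 1]);
}
```

Note the saturation: two maximal products (255 * 127 twice) do not fit in a signed 16-bit lane and clamp to 32767, so weights have to stay small enough for the sum to fit.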



Re: [Pixman] [PATCH 05/10] pixman-utils.c, pixman-private.h: Add floating point conversion routines

2012-09-26 Thread Matt Turner
On Wed, Sep 26, 2012 at 1:43 PM, Søren Sandmann  wrote:
> From: Søren Sandmann Pedersen 
>
> A new struct argb_t containing a floating point pixel is added to
> pixman-private.h, and conversion routines are added to pixman-utils.c
> to convert normalized integers to and from that struct.
>
> New functions:
>
>   - pixman_expand_to_float()
> Expands a buffer of integer pixels to a buffer of argb_t pixels
>
>   - pixman_contract_from_float()
> Converts a buffer of argb_t pixels to a buffer of integer pixels
>
>   - pixman_float_to_unorm()
> Converts a floating point number to an unsigned normalized integer
>
>   - pixman_unorm_to_float()
> Converts an unsigned normalized integer to a floating point number
> ---
>  pixman/pixman-private.h |   35 +++
>  pixman/pixman-utils.c   |  107 
> +++
>  2 files changed, 142 insertions(+), 0 deletions(-)
>
> diff --git a/pixman/pixman-private.h b/pixman/pixman-private.h
> index c82316f..91f35ed 100644
> --- a/pixman/pixman-private.h
> +++ b/pixman/pixman-private.h
> @@ -45,6 +45,16 @@ typedef struct radial_gradient radial_gradient_t;
>  typedef struct bits_image bits_image_t;
>  typedef struct circle circle_t;
>
> +typedef struct argb_t argb_t;
> +
> +struct argb_t
> +{
> +float a;
> +float r;
> +float g;
> +float b;
> +};
> +
>  typedef void (*fetch_scanline_t) (pixman_image_t *image,
>   int x,
>   int y,
> @@ -792,12 +802,34 @@ pixman_expand (uint64_t *   dst,
> const uint32_t * src,
> pixman_format_code_t format,
> int  width);
> +void
> +pixman_expand_to_float (argb_t   *dst,
> +   const uint32_t   *src,
> +   pixman_format_code_t  format,
> +   int   width);
>
>  void
>  pixman_contract (uint32_t *  dst,
>   const uint64_t *src,
>   int width);
>
> +void
> +pixman_contract_from_float (uint32_t *dst,
> +   const argb_t *src,
> +   int   width);
> +
> +pixman_bool_t
> +_pixman_lookup_composite_function (pixman_implementation_t *toplevel,
> +  pixman_op_t  op,
> +  pixman_format_code_t src_format,
> +  uint32_t src_flags,
> +  pixman_format_code_t mask_format,
> +  uint32_t mask_flags,
> +  pixman_format_code_t dest_format,
> +  uint32_t dest_flags,
> +  pixman_implementation_t**out_imp,
> +  pixman_composite_func_t *out_func);
> +
>  /* Region Helpers */
>  pixman_bool_t
>  pixman_region32_copy_from_region16 (pixman_region32_t *dst,
> @@ -957,6 +989,9 @@ unorm_to_unorm (uint32_t val, int from_bits, int to_bits)
>  return result;
>  }
>
> +uint16_t pixman_float_to_unorm (float f, int n_bits);
> +float pixman_unorm_to_float (uint16_t u, int n_bits);
> +
>  /*
>   * Various debugging code
>   */
> diff --git a/pixman/pixman-utils.c b/pixman/pixman-utils.c
> index e4a9730..4f9db29 100644
> --- a/pixman/pixman-utils.c
> +++ b/pixman/pixman-utils.c
> @@ -162,6 +162,113 @@ pixman_expand (uint64_t *   dst,
>  }
>  }
>
> +static force_inline uint16_t
> +float_to_unorm (float f, int n_bits)
> +{
> +uint32_t u;
> +
> +if (f > 1.0)
> +   f = 1.0;
> +if (f < 0.0)
> +   f = 0.0;
> +
> +u = f * (1 << n_bits);
> +u -= (u >> n_bits);
> +
> +return u;
> +}
> +
> +static force_inline float
> +unorm_to_float (uint16_t u, int n_bits)
> +{
> +uint32_t m = ((1 << n_bits) - 1);
> +
> +return (u & m) * (1.f / (float)m);
> +}
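
A standalone sketch for trying the two helpers quoted above outside pixman. The function bodies are copied from the patch; the only changes are assumptions made so that it compiles on its own: plain `inline` replaces pixman's `force_inline`, and `check_roundtrip_8` is an added, hypothetical test helper.

```c
#include <assert.h>
#include <stdint.h>

static inline uint16_t
float_to_unorm (float f, int n_bits)
{
    uint32_t u;

    /* clamp to [0, 1] before scaling */
    if (f > 1.0)
        f = 1.0;
    if (f < 0.0)
        f = 0.0;

    /* scale by 2^n, then pull 2^n itself back down to 2^n - 1 */
    u = f * (1 << n_bits);
    u -= (u >> n_bits);

    return u;
}

static inline float
unorm_to_float (uint16_t u, int n_bits)
{
    uint32_t m = ((1 << n_bits) - 1);

    return (u & m) * (1.f / (float)m);
}

/* Hypothetical helper: every 8-bit value should survive the trip
 * unorm -> float -> unorm unchanged. */
static void
check_roundtrip_8 (void)
{
    int u;

    for (u = 0; u < 256; u++)
        assert (float_to_unorm (unorm_to_float ((uint16_t)u, 8), 8) == u);
}
```

The `u -= (u >> n_bits)` step only has an effect when the input is 1.0: the scaled value is then exactly 2^n, one more than the largest representable value, and the subtraction brings it back to 2^n - 1.
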
> +
> +/*
> + * This function expands images from a8r8g8b8 to argb_t.  To preserve
> + * precision, it needs to know from which source format the a8r8g8b8 pixels
> + * originally came.
> + *
> + * For example, if the source was PIXMAN_x1r5g5b5 and the red component
> + * contained bits 12345, then the 8-bit value is 12345123.  To correctly
> + * expand this to floating point, it should be 12345 / 31.0 and not
> + * 12345123 / 255.0.
> + */
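
The precision argument in that comment can be seen with a few lines of C. This is a sketch with made-up helper names, assuming the 5-bit channel was widened to 8 bits by replicating its top bits, as the comment describes.

```c
#include <assert.h>
#include <stdint.h>

/* Widen a 5-bit channel to 8 bits by replicating its top three bits
 * (bits 12345 become 12345123). */
static uint8_t
expand_5_to_8 (uint8_t v5)
{
    assert (v5 < 32);
    return (uint8_t)((v5 << 3) | (v5 >> 2));
}

/* Convert directly from the original 5-bit value. */
static float
float_from_5 (uint8_t v5)
{
    assert (v5 < 32);
    return v5 / 31.0f;
}

/* Convert from the widened 8-bit value instead. */
static float
float_from_8 (uint8_t v8)
{
    return v8 / 255.0f;
}
```

For the 5-bit value 1, the widened byte is 8, and 8 / 255 (about 0.0314) differs from 1 / 31 (about 0.0323). The endpoints 0 and 31 agree in both paths, so the difference only shows up for intermediate values.
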
> +void
> +pixman_expand_to_float (argb_t   *dst,
> +   const uint32_t   *src,
> +   pixman_format_code_t  format,
> +   int   width)
> +{
> +int a_size, r_size, g_size, b_size;
> +int a_shift, r_shift, g_shift, b_shift;
> +int i;
> +
> +if (!PIXMAN_FORMAT_VIS (format))
> +   format = PIXMAN_a8r8g8b8;
> +
> +/*
> + * Determine the
