Re: [Pixman] 'AV_386_SSSE3' undeclared when building pixman0.34.1 with Solaris10-u2

2016-04-04 Thread Yike Zhang
Thanks Alan, I would try with that option. Hopefully we would not came across 
any functions actually invoke SSSE3 supported feature.


Best Regards
Yike


-Original Message-
From: Alan Coopersmith [mailto:alan.coopersm...@oracle.com] 
Sent: Tuesday, April 05, 2016 10:01 AM
To: Yike Zhang; pixman@lists.freedesktop.org
Subject: Re: [Pixman] 'AV_386_SSSE3' undeclared when building pixman0.34.1 with 
Solaris10-u2

On 04/ 4/16 06:51 PM, Yike Zhang wrote:
> Hi Alan,
>
> Thanks so much for your promptly response. Based on your valuable input, I 
> find the following RN
>
> https://urldefense.proofpoint.com/v2/url?u=http-3A__www.c0t0d0s0.org_archives_4974-2DWhats-2Dnew-2Din-2DSolaris-2D10-2D1008-2Dalso-2Dknown-2Das-2DUpdate-2D6.html=BQICaQ=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs=omJQL_siy7L8qYdXDzXVtIRGnyB5mkyk91I8ZKguO3s=iVXmzHLk6_vYCC59etKsOrnE_1Cgd8nTpSiw0K-MzEo=zyMHfos2JwwkowxbJERvbcXEVfLP55YVq0Gc2fCUnOk=
>  
> and
> x86: Kernel Support for Intel SSSE3, SSE4.1, SSE4.2, and AMD SSE4A
>
> Am I finding the correct path?
>
> It is a little bit embarrassing to say that the build system we kept for 
> Solaris is still based on 10-u2, company-wide unfortunately. So to adding a 
> fresh new 10-u6 build env might bring much effort.
>
> So may I know if any other simpler solution if we have to stick to 10-u2 (CC 
> flag?)

I've not tried it, but I can't see any reason why "cc -DAV_386_SSSE3=0x40"
wouldn't work for that particular bit of code.   If you need to actually call
other functions added in that support, you'd probably need something more 
though.

-- 
-Alan Coopersmith-  alan.coopersm...@oracle.com
 Oracle Solaris Engineering - 
https://urldefense.proofpoint.com/v2/url?u=http-3A__blogs.oracle.com_alanc=BQICaQ=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs=omJQL_siy7L8qYdXDzXVtIRGnyB5mkyk91I8ZKguO3s=iVXmzHLk6_vYCC59etKsOrnE_1Cgd8nTpSiw0K-MzEo=vwBypDbJqpvvNfIUouiI9ETsZ5tU5L8cwGqk9DDvKAw=
 
___
Pixman mailing list
Pixman@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] 'AV_386_SSSE3' undeclared when building pixman0.34.1 with Solaris10-u2

2016-04-04 Thread Alan Coopersmith

On 04/ 4/16 06:51 PM, Yike Zhang wrote:

Hi Alan,

Thanks so much for your promptly response. Based on your valuable input, I find 
the following RN

http://www.c0t0d0s0.org/archives/4974-Whats-new-in-Solaris-10-1008-also-known-as-Update-6.html
and
x86: Kernel Support for Intel SSSE3, SSE4.1, SSE4.2, and AMD SSE4A

Am I finding the correct path?

It is a little bit embarrassing to say that the build system we kept for 
Solaris is still based on 10-u2, company-wide unfortunately. So to adding a 
fresh new 10-u6 build env might bring much effort.

So may I know if any other simpler solution if we have to stick to 10-u2 (CC 
flag?)


I've not tried it, but I can't see any reason why "cc -DAV_386_SSSE3=0x40"
wouldn't work for that particular bit of code.   If you need to actually call
other functions added in that support, you'd probably need something more 
though.

--
-Alan Coopersmith-  alan.coopersm...@oracle.com
 Oracle Solaris Engineering - http://blogs.oracle.com/alanc
___
Pixman mailing list
Pixman@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] 'AV_386_SSSE3' undeclared when building pixman0.34.1 with Solaris10-u2

2016-04-04 Thread Yike Zhang
Hi Alan,

Thanks so much for your promptly response. Based on your valuable input, I find 
the following RN

http://www.c0t0d0s0.org/archives/4974-Whats-new-in-Solaris-10-1008-also-known-as-Update-6.html
and
x86: Kernel Support for Intel SSSE3, SSE4.1, SSE4.2, and AMD SSE4A

Am I finding the correct path?

It is a little bit embarrassing to say that the build system we kept for 
Solaris is still based on 10-u2, company-wide unfortunately. So to adding a 
fresh new 10-u6 build env might bring much effort. 

So may I know if any other simpler solution if we have to stick to 10-u2 (CC 
flag?)

Br
Yike







-Original Message-
From: Alan Coopersmith [mailto:alan.coopersm...@oracle.com] 
Sent: Tuesday, April 05, 2016 9:20 AM
To: Yike Zhang; pixman@lists.freedesktop.org
Subject: Re: [Pixman] 'AV_386_SSSE3' undeclared when building pixman0.34.1 with 
Solaris10-u2

On 04/ 4/16 05:56 PM, Yike Zhang wrote:
> Hello there,
>
> May I know if anyone hit the similar issue and guide me how to work around it?
>
> Double check and found that AV_386_SSSE3 is not defined in the 
> solaris10-u2 header file,

According to our bug tracker, it was added in Solaris 10 Update 6 (Solaris 10 
8/08).

Can you not update to an OS that's only 8 years old instead of sticking to the
10 year old release?

-- 
-Alan Coopersmith-  alan.coopersm...@oracle.com
 Oracle Solaris Engineering - 
https://urldefense.proofpoint.com/v2/url?u=http-3A__blogs.oracle.com_alanc=BQICaQ=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs=omJQL_siy7L8qYdXDzXVtIRGnyB5mkyk91I8ZKguO3s=YeedvpoCAygWcs3GI-ZwR-isL7L_B2xxVyqXlpuwAPE=p1_0V6QH6eSsjojymYMCIAsGxw-f3Wo0Ne3AN11_lPU=
 
___
Pixman mailing list
Pixman@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] 'AV_386_SSSE3' undeclared when building pixman0.34.1 with Solaris10-u2

2016-04-04 Thread Alan Coopersmith

On 04/ 4/16 05:56 PM, Yike Zhang wrote:

Hello there,

May I know if anyone hit the similar issue and guide me how to work around it?

Double check and found that AV_386_SSSE3 is not defined in the solaris10-u2
header file,


According to our bug tracker, it was added in Solaris 10 Update 6 (Solaris 10
8/08).

Can you not update to an OS that's only 8 years old instead of sticking to the
10 year old release?

--
-Alan Coopersmith-  alan.coopersm...@oracle.com
 Oracle Solaris Engineering - http://blogs.oracle.com/alanc
___
Pixman mailing list
Pixman@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/pixman


[Pixman] 'AV_386_SSSE3' undeclared when building pixman0.34.1 with Solaris10-u2

2016-04-04 Thread Yike Zhang
Hello there,

May I know if anyone hit the similar issue and guide me how to work around it?

I am trying to build and pixman library for Solaris10, but get the following 
error

2016-03-30 18:15:32 gobuilds.Compile : 18:15:32.994 make   INFO   
CC pixman-x86.lo
2016-03-30 18:15:33 gobuilds.Compile : 18:15:33.023 make   INFO 
pixman-x86.c: In function 'detect_cpu_features':
2016-03-30 18:15:33 gobuilds.Compile : 18:15:33.023 make   INFO 
pixman-x86.c:68: error: 'AV_386_SSSE3' undeclared (first use in this function)
2016-03-30 18:15:33 gobuilds.Compile : 18:15:33.023 make   INFO 
pixman-x86.c:68: error: (Each undeclared identifier is reported only once
2016-03-30 18:15:33 gobuilds.Compile : 18:15:33.023 make   INFO 
pixman-x86.c:68: error: for each function it appears in.)
2016-03-30 18:15:33 gobuilds.Compile : 18:15:33.025 make   INFO 
make[2]: *** [pixman-x86.lo] Error 1

Double check and found that AV_386_SSSE3 is not defined in the solaris10-u2 
header file,
...

#define AV_386_FPU  0x1 /* x87-style floating point */
#define AV_386_TSC  0x2 /* rdtsc insn */
#define AV_386_CX8  0x4 /* cmpxchg8b insn */
#define AV_386_SEP  0x8 /* sysenter and sysexit */
#define AV_386_AMD_SYSC 0x00010 /* AMD's syscall and sysret */
#define AV_386_CMOV 0x00020 /* conditional move insns */
#define AV_386_MMX  0x00040 /* MMX insns */
#define AV_386_AMD_MMX  0x00080 /* AMD's MMX insns */
#define AV_386_AMD_3DNow0x00100 /* AMD's 3Dnow! insns */
#define AV_386_AMD_3DNowx   0x00200 /* AMD's 3Dnow! extended insns */
#define AV_386_FXSR 0x00400 /* fxsave and fxrstor */
#define AV_386_SSE  0x00800 /* SSE insns and regs */
#define AV_386_SSE2 0x01000 /* SSE2 insns and regs */
#define AV_386_PAUSE0x02000 /* use pause insn (in spin loops) */
#define AV_386_SSE3 0x04000 /* SSE3 insns and regs */
#define AV_386_MON  0x08000 /* monitor/mwait insns */
#define AV_386_CX16 0x1 /* cmpxchg16b insn */...

However, I could find related definition in the header file for Solaris11, see

cat ./solaris11_/usr/include/sys/auxv_386.h |grep AV_386_SS*E3
#define AV_386_SSE3 0x04000 /* SSE3 insns and regs */
#define AV_386_SSSE30x40 /* Intel SSSE3 insns */


May I know do I have to manipulate the 10u2 header file by including related 
definition, or I need to update the pixman source code like

diff --git a/pixman/pixman-x86.c b/pixman/pixman-x86.c
index 05297c4..32b4018 100644
--- a/pixman/pixman-x86.c
+++ b/pixman/pixman-x86.c
@@ -65,8 +65,10 @@ detect_cpu_features (void)
features |= X86_SSE;
if (result & AV_386_SSE2)
features |= X86_SSE2;
+#ifdef USE_SSSE3
if (result & AV_386_SSSE3)
features |= X86_SSSE3;
+#endif
 }

Thanks
Yike



___
Pixman mailing list
Pixman@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH v14 12/22] pixman-filter: fix subsample_bits == 0

2016-04-04 Thread Bill Spitzak
I believe there may have been a rebasing error. Will look into it.

I certainly intended this to change behavior when the filter size is odd,
not when the number of subsamples is odd. Not sure if this is truly rebase
screwing up, or I just mistyped an attempt to fix a rebase error.


On Mon, Apr 4, 2016 at 12:21 PM, Søren Sandmann 
wrote:

> No, it changes behavior when the number of *phases* is odd, which it is in
> the example I gave.
>
>
> Søren
>
> On Mon, Apr 4, 2016 at 2:51 PM, Bill Spitzak  wrote:
>
>> That's why this only changes the behavior when the number of samples is
>> odd.
>>
>>
>> On Sun, Apr 3, 2016 at 11:17 AM, Søren Sandmann > > wrote:
>>
>>> I don't believe this patch is correct.
>>>
>>> As an example, consider an image with an identity transformation and a
>>> cubic filter (which has width 4) with subsample_bits = 0. The current code
>>> will compute a matrix
>>>
>>> [ cubic(-2), cubic(-1), cubic(0), cubic(1) ]
>>>
>>> Now suppose we are filtering a pixel located at x = 17.5. The code in
>>> bits_image_fetch_pixel_separable_convolution() will eventually end up
>>> convolving with the pixels at locations 15.5, 16.5, 17.5 and 18.5, which is
>>> the correct result since cubic(-2) = 0 [1].
>>>
>>> With your code, the matrix computed would be
>>>
>>> [ cubic(-1.5), cubic(-0.5), cubic(0.5), cubic(1) ]
>>>
>>> which would not be correct no matter what: With an identity
>>> transformation, the pixel at 17.5 should be multiplied with cubic(0).
>>>
>>> There is a detailed explanation of these issues in the file
>>> pixman/rounding.txt.
>>>
>>>
>>> Søren
>>>
>>> [1] You might consider optimizing this case away and generate a matrix
>>> with width 3, but since this would only work with subsample_bits=0, it's
>>> not worth the special-casing.
>>>
>>>
>>> On Sun, Mar 6, 2016 at 8:06 PM,  wrote:
>>>
 From: Bill Spitzak 

 The position of only one subsample was wrong as ceil() was done on an
 integer.
 Use a different function for all odd numbers of subsamples that gets
 this right.

 Signed-off-by: Bill Spitzak 
 ---
  pixman/pixman-filter.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

 diff --git a/pixman/pixman-filter.c b/pixman/pixman-filter.c
 index ac89dda..f9ad45f 100644
 --- a/pixman/pixman-filter.c
 +++ b/pixman/pixman-filter.c
 @@ -252,7 +252,10 @@ create_1d_filter (int  width,
  * and sample positions.
  */

 -   pos = ceil (frac - width / 2.0 - 0.5) + 0.5 - frac;
 +   if (n_phases & 1)
 +   pos = frac - width / 2.0;
 +   else
 +   pos = ceil (frac - width / 2.0 - 0.5) + 0.5 - frac;

 total = 0;
 for (x = 0; x < width; ++x, ++pos)
 --
 1.9.1

 ___
 Pixman mailing list
 Pixman@lists.freedesktop.org
 https://lists.freedesktop.org/mailman/listinfo/pixman

>>>
>>>
>>
>
___
Pixman mailing list
Pixman@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH v14 12/22] pixman-filter: fix subsample_bits == 0

2016-04-04 Thread Søren Sandmann
No, it changes behavior when the number of *phases* is odd, which it is in
the example I gave.


Søren

On Mon, Apr 4, 2016 at 2:51 PM, Bill Spitzak  wrote:

> That's why this only changes the behavior when the number of samples is
> odd.
>
>
> On Sun, Apr 3, 2016 at 11:17 AM, Søren Sandmann 
> wrote:
>
>> I don't believe this patch is correct.
>>
>> As an example, consider an image with an identity transformation and a
>> cubic filter (which has width 4) with subsample_bits = 0. The current code
>> will compute a matrix
>>
>> [ cubic(-2), cubic(-1), cubic(0), cubic(1) ]
>>
>> Now suppose we are filtering a pixel located at x = 17.5. The code in
>> bits_image_fetch_pixel_separable_convolution() will eventually end up
>> convolving with the pixels at locations 15.5, 16.5, 17.5 and 18.5, which is
>> the correct result since cubic(-2) = 0 [1].
>>
>> With your code, the matrix computed would be
>>
>> [ cubic(-1.5), cubic(-0.5), cubic(0.5), cubic(1) ]
>>
>> which would not be correct no matter what: With an identity
>> transformation, the pixel at 17.5 should be multiplied with cubic(0).
>>
>> There is a detailed explanation of these issues in the file
>> pixman/rounding.txt.
>>
>>
>> Søren
>>
>> [1] You might consider optimizing this case away and generate a matrix
>> with width 3, but since this would only work with subsample_bits=0, it's
>> not worth the special-casing.
>>
>>
>> On Sun, Mar 6, 2016 at 8:06 PM,  wrote:
>>
>>> From: Bill Spitzak 
>>>
>>> The position of only one subsample was wrong as ceil() was done on an
>>> integer.
>>> Use a different function for all odd numbers of subsamples that gets
>>> this right.
>>>
>>> Signed-off-by: Bill Spitzak 
>>> ---
>>>  pixman/pixman-filter.c | 5 -
>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/pixman/pixman-filter.c b/pixman/pixman-filter.c
>>> index ac89dda..f9ad45f 100644
>>> --- a/pixman/pixman-filter.c
>>> +++ b/pixman/pixman-filter.c
>>> @@ -252,7 +252,10 @@ create_1d_filter (int  width,
>>>  * and sample positions.
>>>  */
>>>
>>> -   pos = ceil (frac - width / 2.0 - 0.5) + 0.5 - frac;
>>> +   if (n_phases & 1)
>>> +   pos = frac - width / 2.0;
>>> +   else
>>> +   pos = ceil (frac - width / 2.0 - 0.5) + 0.5 - frac;
>>>
>>> total = 0;
>>> for (x = 0; x < width; ++x, ++pos)
>>> --
>>> 1.9.1
>>>
>>> ___
>>> Pixman mailing list
>>> Pixman@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/pixman
>>>
>>
>>
>
___
Pixman mailing list
Pixman@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] Add support for aarch64 neon optimization

2016-04-04 Thread Ben Avison

On Sat, 02 Apr 2016 13:30:58 +0100, Mizuki Asakura  wrote:

This patch only contains STD_FAST_PATH codes, not scaling (nearest,
bilinear) codes.


Hi Mizuki,

It looks like you have used an automated process to convert the AArch32
NEON code to AArch64. Will you be able to repeat that process for other
code, or at least assist others to repeat your steps?

The reason I ask is that I have a large number of outstanding patches to
the ARM NEON support. The process of getting them merged into the
FreeDesktop git repository has been very slow because there aren't many
people on this list with the time and ability to review them, however my
versions are in many cases up to twice the speed of the FreeDesktop
versions, and it would be a shame if AArch64 couldn't benefit from them.
If your AArch64 conversion is a one-time thing, it will make make it
extremely difficult to merge my changes in.


After completing optimization this patch, scaling related codes should be done.


One of my aims was to implement missing "iter" routines so as to accelerate
scaled plots for a much wider combination of pixels formats and Porter-Duff
combiner rules than the existing limited selection of fast paths could
cover. If you look towards the end of my patch series here:

https://github.com/bavison/pixman/commits/arm-neon-release1

you'll see that I discovered that I was actually outperforming Pixman's
existing bilinear plotters so consistently that I'm advocating removing
them entirely, with the additional advantage that it simplifies the code
base a lot. So you might want to consider whether it's worth bothering
converting those to AArch64 in the first place.

I would maybe go so far as to suggest that you try converting all the iters
first and only add fast paths if you find they do better than the iters.
One of the drawbacks of using iters is that the prefetch code can't be as
sophisticated - it can't easily be prefetching the start of the next row
while it is still working on the end of the current one. But since hardware
prefetchers are better now and conditional execution is hard in AArch64,
this will be less of a drawback with AArch64 CPUs.

I'll also repeat what has been said, that it's very neat the way the
existing prefetch code sneaks calculations into pipeline stalls, but it was
only ever really ideal for Cortex-A8. With Cortex-A7 (despite the number,
actually a much more recent 32-bit core) I noted that it was impossible to
schedule such complex prefetch code without adding to the cycle count, at
least when the images were already in the cache.

Ben
___
Pixman mailing list
Pixman@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH v14 08/22] pixman-filter: rename "scale" to "size" when it is 1/scale

2016-04-04 Thread Bill Spitzak
Okay that sounds pretty good. Also 'r' is better than 'i' since 'i' can be
mis-read as "input" or "image". 'r' I think is mostly going to be read as
"reverse" which actually makes sense.

Sorry to be a pain about this but I really find it confusing to use the
same term for different numbers.

On Sun, Apr 3, 2016 at 9:29 AM, Søren Sandmann 
wrote:

> I guess I can live with 'rscale' (for reciprocal scale).
>
>
> Søren
>
> On Thu, Mar 24, 2016 at 9:30 PM, Bill Spitzak  wrote:
>
>> Would using "iscale" (for inverse scale) work? Another is "tscale"
>> because it is the scale from the transform matrix.
>>
>> I really would like to use two different names for these variables as it
>> is really confusing using the same one.
>>
>>
>> On Fri, Mar 18, 2016 at 8:24 AM, Bill Spitzak  wrote:
>>
>>>
>>>
>>> On Thu, Mar 17, 2016 at 10:06 PM, Søren Sandmann <
>>> soren.sandm...@gmail.com> wrote:
>>>
 I suppose it's a little illogical that scale_x and scale_y really are
 the reciprocal values of how much the source image should be scaled. I
 don't remember exactly what I was thinking, but it might have something
 like "this allows you to just pass t[0][0] and t[1][1] if you have a pure
 scaling, which avoids a division". There is also kind of precedence in the
 Pixman API since the transformation given as in the dst->source direction.

 I don't really like "size" either though. It's not really the size of
 the filter that we are specifying; it just happens to be proportional to 
 it.

 If it comes down to "size" and "scale", I prefer "scale".

>>>
>>> I tried "width" but it is not actually the filter width. How about "dx"
>>> and "dy" since it is almost always the derivative of the x and y in the
>>> input (or du,dv or dtx dtx)
>>>


 Søren


 On Sun, Mar 6, 2016 at 8:06 PM,  wrote:

> From: Bill Spitzak 
>
> This is to remove some confusion when reading the code. "scale" gets
> larger
> as the picture gets larger, while "size" (ie the size of the filter)
> gets
> smaller.
>
> v14: Removed changes to integral function
>
> Signed-off-by: Bill Spitzak 
> Reviewed-by: Oded Gabbay 
> ---
>  pixman/pixman-filter.c | 18 +-
>  pixman/pixman.h|  6 +++---
>  2 files changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/pixman/pixman-filter.c b/pixman/pixman-filter.c
> index a29116a..c03a7f6 100644
> --- a/pixman/pixman-filter.c
> +++ b/pixman/pixman-filter.c
> @@ -221,7 +221,7 @@ static void
>  create_1d_filter (int  width,
>   pixman_kernel_t  reconstruct,
>   pixman_kernel_t  sample,
> - double   scale,
> + double   size,
>   int  n_phases,
>   pixman_fixed_t *p)
>  {
> @@ -251,8 +251,8 @@ create_1d_filter (int  width,
> double pos = x + 0.5 - frac;
> double rlow = - filters[reconstruct].width / 2.0;
> double rhigh = rlow + filters[reconstruct].width;
> -   double slow = pos - scale * filters[sample].width / 2.0;
> -   double shigh = slow + scale * filters[sample].width;
> +   double slow = pos - size * filters[sample].width / 2.0;
> +   double shigh = slow + size * filters[sample].width;
> double c = 0.0;
> double ilow, ihigh;
>
> @@ -262,7 +262,7 @@ create_1d_filter (int  width,
> ihigh = MIN (shigh, rhigh);
>
> c = integral (reconstruct, ilow,
> - sample, 1.0 / scale, ilow - pos,
> + sample, 1.0 / size, ilow - pos,
>   ihigh - ilow);
> }
>
> @@ -335,12 +335,12 @@ filter_width (pixman_kernel_t reconstruct,
> pixman_kernel_t sample, double size)
>  }
>
>  /* Create the parameter list for a SEPARABLE_CONVOLUTION filter
> - * with the given kernels and scale parameters
> + * with the given kernels and size parameters
>   */
>  PIXMAN_EXPORT pixman_fixed_t *
>  pixman_filter_create_separable_convolution (int *n_values,
> -   pixman_fixed_t   scale_x,
> -   pixman_fixed_t   scale_y,
> +   pixman_fixed_t   size_x,
> +   pixman_fixed_t   size_y,
> pixman_kernel_t
> reconstruct_x,
> 

Re: [Pixman] [PATCH v14 12/22] pixman-filter: fix subsample_bits == 0

2016-04-04 Thread Bill Spitzak
That's why this only changes the behavior when the number of samples is odd.


On Sun, Apr 3, 2016 at 11:17 AM, Søren Sandmann 
wrote:

> I don't believe this patch is correct.
>
> As an example, consider an image with an identity transformation and a
> cubic filter (which has width 4) with subsample_bits = 0. The current code
> will compute a matrix
>
> [ cubic(-2), cubic(-1), cubic(0), cubic(1) ]
>
> Now suppose we are filtering a pixel located at x = 17.5. The code in
> bits_image_fetch_pixel_separable_convolution() will eventually end up
> convolving with the pixels at locations 15.5, 16.5, 17.5 and 18.5, which is
> the correct result since cubic(-2) = 0 [1].
>
> With your code, the matrix computed would be
>
> [ cubic(-1.5), cubic(-0.5), cubic(0.5), cubic(1) ]
>
> which would not be correct no matter what: With an identity
> transformation, the pixel at 17.5 should be multiplied with cubic(0).
>
> There is a detailed explanation of these issues in the file
> pixman/rounding.txt.
>
>
> Søren
>
> [1] You might consider optimizing this case away and generate a matrix
> with width 3, but since this would only work with subsample_bits=0, it's
> not worth the special-casing.
>
>
> On Sun, Mar 6, 2016 at 8:06 PM,  wrote:
>
>> From: Bill Spitzak 
>>
>> The position of only one subsample was wrong as ceil() was done on an
>> integer.
>> Use a different function for all odd numbers of subsamples that gets this
>> right.
>>
>> Signed-off-by: Bill Spitzak 
>> ---
>>  pixman/pixman-filter.c | 5 -
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/pixman/pixman-filter.c b/pixman/pixman-filter.c
>> index ac89dda..f9ad45f 100644
>> --- a/pixman/pixman-filter.c
>> +++ b/pixman/pixman-filter.c
>> @@ -252,7 +252,10 @@ create_1d_filter (int  width,
>>  * and sample positions.
>>  */
>>
>> -   pos = ceil (frac - width / 2.0 - 0.5) + 0.5 - frac;
>> +   if (n_phases & 1)
>> +   pos = frac - width / 2.0;
>> +   else
>> +   pos = ceil (frac - width / 2.0 - 0.5) + 0.5 - frac;
>>
>> total = 0;
>> for (x = 0; x < width; ++x, ++pos)
>> --
>> 1.9.1
>>
>> ___
>> Pixman mailing list
>> Pixman@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/pixman
>>
>
>
___
Pixman mailing list
Pixman@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] Add support for aarch64 neon optimization

2016-04-04 Thread Mizuki Asakura
> But it is much more interesting to compare the performance of this
> patch with the existing 32-bit ARM NEON code. You can build the
> 32-bit tests programs (including lowlevel-blt-bench) using a 32-bit
> ARM crosscompiler either on your desktop PC or on the ARM board:
>
>   ./configure --host=arm-linux-gnueabihf --enable-static-testprogs \
>   --disable-libpng --disable-gtk
>   make

Ok. I succeed to install cross-build toolchain in my Linux PC and
create armeabihf version of pixman.

And attached the benchmark result to the bug ticket. Please find:
https://bugs.freedesktop.org/attachment.cgi?id=122695

At least, I have to catch up with this result on aarch64...

On 4 April 2016 at 19:28, Mizuki Asakura  wrote:
>> OK, thanks for this information. What I mean is that this should be
>> preferably a separate patch.
>
> I've sent it on the separated patch.
>
> On 3 April 2016 at 20:22, Siarhei Siamashka  
> wrote:
>> On Sat, 2 Apr 2016 20:28:44 +0900
>> Mizuki Asakura  wrote:
>>
>>> Thanks for your reply.
>>>
>>> > The main concern is whether we can get support for all these
>>> > assembly syntax flavours, calling conventions and minor instructions
>>> > differences, while keeping the code maintainable.
>>>
>>> In the aarch64 point of view, I believe we should only support ARM's 
>>> original
>>> syntax and calling convensions. It would be most portable.
>>
>> The ILP32 is an alternative ABI, which is being endorsed or even
>> developed by ARM:
>> http://infocenter.arm.com/help/topic/com.arm.doc.dai0490a/index.html
>>
>> When/if somebody tries to use pixman in an AArch64 ILP32 system, this
>> may cause some difficulties/inconveniences for us.
>>
>>> In iOS case, they may be able to create some converter script that only
>>> translate instruction name to their own since the architcture is same
>>> (as opposed aarch32-neon vs aarch64-neon is (almost) same, but different).
>>
>> Yes. The converters exists, but AFAIK the pixman assembly code is
>> way too difficult for them to parse. Again, the problem is technically
>> solvable, just nobody has invested efforts into solving it yet. Either
>> by contributing the needed code or by sponsoring this work.
>>
>> I mentioned this only because the unified 32-bit/64-bit ARM assembly
>> preprocessor could also potentially take care of this iOS issue as a
>> side effect.
>>
>>> # How about SSE2 on OS X ? Is it same syntax and calling convension ?
>>
>> I'm not familiar with OS X myself. But the SSE2 code in pixman is
>> currently using intrinsics, so this makes everything quite simple
>> as far as portability is concerned.
>>
>> If we were to use hand tuned assembly optimizations for x86, then we
>> would have to take care of the 32-bit Linux/Windows, 64-bit Linux with
>> x32 ABI, 64-bit Linux and 64-bit Windows cases separately. Windows
>> arrived late to the x86-64 scene, but they still decided to invent
>> their own incompatible calling conventions. All of this is fine for
>> intrinsics, but may clearly cause problems for assembly implementations.
>>
>> Hand written assembly has a relatively high maintenance cost. But it
>> pays off if we want to have good performance.
>>
>> I still see a large potential for improving pixman performance on
>> x86 (replace intrinsics with hand written assembly and make a better
>> use of SSSE3, AVX, etc.). But somebody has to shoulder the expenses
>> for this to happen. BTW, I would be happy to take this task if
>> somebody is willing to pay :-)
>>
>>> >  /* stride is always multiple of 32bit units in pixman */
>>> > -uint32_t byte_stride = stride * sizeof(uint32_t);
>>> > +int32_t byte_stride = stride * sizeof(uint32_t);
>>>
>>> It caused horribly crashes in aarch64. Why it works on aarch32 ? :)
>>> (32-bit integer overflow may help them...)
>>
>> OK, thanks for this information. What I mean is that this should be
>> preferably a separate patch.
>>
>>> > I'm currently investigating the possibility of maybe "inventing" a
>>> > common unified syntax for some preprocessor script to take care of
>>> > both 32-bit and 64-bit ARM code. But we still can't be sure if this
>>> > approach is practical (or even doable) and whether it can solve more
>>> > problems than introduce new ones...
>>>
>>> At first, I've also tried to create a "converter script" from
>>> pixman-arm-neon-asm*.S to
>>> aarch64 neon codes just translating instruction, register names and so on.
>>
>> I see. It is nice to know that we basically came up with the same
>> solution at least as the starting point :-)
>>
>>> But I've found that this script won't work on pixman's code because of:
>>>
>>> * conflicting register names: both Qn and Dn are used in same place
>>> * highly optimized for aarch32-neon register structure. assumes d29,
>>> d30 is low / high of q15
>>
>> It's all mostly just a syntax sugar. The Dn registers still do exist
>> in a somewhat hidden way. For example:

Re: [Pixman] [PATCH] Add support for aarch64 neon optimization

2016-04-04 Thread Mizuki Asakura
> OK, thanks for this information. What I mean is that this should be
> preferably a separate patch.

I've sent it on the separated patch.

On 3 April 2016 at 20:22, Siarhei Siamashka  wrote:
> On Sat, 2 Apr 2016 20:28:44 +0900
> Mizuki Asakura  wrote:
>
>> Thanks for your reply.
>>
>> > The main concern is whether we can get support for all these
>> > assembly syntax flavours, calling conventions and minor instructions
>> > differences, while keeping the code maintainable.
>>
>> In the aarch64 point of view, I believe we should only support ARM's original
>> syntax and calling convensions. It would be most portable.
>
> The ILP32 is an alternative ABI, which is being endorsed or even
> developed by ARM:
> http://infocenter.arm.com/help/topic/com.arm.doc.dai0490a/index.html
>
> When/if somebody tries to use pixman in an AArch64 ILP32 system, this
> may cause some difficulties/inconveniences for us.
>
>> In iOS case, they may be able to create some converter script that only
>> translate instruction name to their own since the architcture is same
>> (as opposed aarch32-neon vs aarch64-neon is (almost) same, but different).
>
> Yes. The converters exists, but AFAIK the pixman assembly code is
> way too difficult for them to parse. Again, the problem is technically
> solvable, just nobody has invested efforts into solving it yet. Either
> by contributing the needed code or by sponsoring this work.
>
> I mentioned this only because the unified 32-bit/64-bit ARM assembly
> preprocessor could also potentially take care of this iOS issue as a
> side effect.
>
>> # How about SSE2 on OS X ? Is it same syntax and calling convension ?
>
> I'm not familiar with OS X myself. But the SSE2 code in pixman is
> currently using intrinsics, so this makes everything quite simple
> as far as portability is concerned.
>
> If we were to use hand tuned assembly optimizations for x86, then we
> would have to take care of the 32-bit Linux/Windows, 64-bit Linux with
> x32 ABI, 64-bit Linux and 64-bit Windows cases separately. Windows
> arrived late to the x86-64 scene, but they still decided to invent
> their own incompatible calling conventions. All of this is fine for
> intrinsics, but may clearly cause problems for assembly implementations.
>
> Hand written assembly has a relatively high maintenance cost. But it
> pays off if we want to have good performance.
>
> I still see a large potential for improving pixman performance on
> x86 (replace intrinsics with hand written assembly and make a better
> use of SSSE3, AVX, etc.). But somebody has to shoulder the expenses
> for this to happen. BTW, I would be happy to take this task if
> somebody is willing to pay :-)
>
>> >  /* stride is always multiple of 32bit units in pixman */
>> > -uint32_t byte_stride = stride * sizeof(uint32_t);
>> > +int32_t byte_stride = stride * sizeof(uint32_t);
>>
>> It caused horribly crashes in aarch64. Why it works on aarch32 ? :)
>> (32-bit integer overflow may help them...)
>
> OK, thanks for this information. What I mean is that this should be
> preferably a separate patch.
>
>> > I'm currently investigating the possibility of maybe "inventing" a
>> > common unified syntax for some preprocessor script to take care of
>> > both 32-bit and 64-bit ARM code. But we still can't be sure if this
>> > approach is practical (or even doable) and whether it can solve more
>> > problems than introduce new ones...
>>
>> At first, I've also tried to create a "converter script" from
>> pixman-arm-neon-asm*.S to
>> aarch64 neon codes just translating instruction, register names and so on.
>
> I see. It is nice to know that we basically came up with the same
> solution at least as the starting point :-)
>
>> But I've found that this script won't work on pixman's code because of:
>>
>> * conflicting register names: both Qn and Dn are used in same place
>> * highly optimized for aarch32-neon register structure. assumes d29,
>> d30 is low / high of q15
>
> It's all mostly just a syntax sugar. The Dn registers still do exist
> in a somewhat hidden way. For example:
>
> vshrn.u32   d0,q15,#1
> vshrn.u32   d1,q15,#1
>
> directly translates into
>
> shrnv0.4h, v15.4s, #1
> shrn2   v0.8h, v15.4s, #1
>
> The selection of SHRN or SHRN2 instruction variant decides whether
> the lower or upper half of the 128-bit destination register is updated.
>
> The 64-bit source operands (Dn registers) are handled in the very
> same way:
>
> vmull.u16   q1,d0,d0
> vmull.u16   q2,d1,d1
>
> directly translates into
>
> umull   v1.4s, v0.4h, v0.4h
> umull2  v2.4s, v0.8h, v0.8h
>
> There are of course some limitations. Mixing odd and even
> numbered d-registers as the source operands is not supported
> in AArch64. This is a bit inconvenient, but the AArch64 code
> still needs to take this into account either way regardless
> of the syntax.
>
> I'm 

[Pixman] [PATCH] Fixed invalid casting in arm_neon_fill

2016-04-04 Thread Mizuki Asakura
stride con be negative, but byte-stride is declared as uint32_t.
It may cause integer overflow.

Signed-off-by: Mizuki Asakura 
---
 pixman/pixman-arm-neon.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pixman/pixman-arm-neon.c b/pixman/pixman-arm-neon.c
index be761c9..fcb61bc
--- a/pixman/pixman-arm-neon.c
+++ b/pixman/pixman-arm-neon.c
@@ -194,7 +194,7 @@ arm_neon_fill (pixman_implementation_t *imp,
uint32_t _xor)
 {
 /* stride is always multiple of 32bit units in pixman */
-uint32_t byte_stride = stride * sizeof(uint32_t);
+int32_t byte_stride = stride * sizeof(uint32_t);

 switch (bpp)
 {
-- 
2.7.4
___
Pixman mailing list
Pixman@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/pixman