Re: [Pixman] 'AV_386_SSSE3' undeclared when building pixman0.34.1 with Solaris10-u2
Thanks Alan, I would try with that option. Hopefully we would not came across any functions actually invoke SSSE3 supported feature. Best Regards Yike -Original Message- From: Alan Coopersmith [mailto:alan.coopersm...@oracle.com] Sent: Tuesday, April 05, 2016 10:01 AM To: Yike Zhang; pixman@lists.freedesktop.org Subject: Re: [Pixman] 'AV_386_SSSE3' undeclared when building pixman0.34.1 with Solaris10-u2 On 04/ 4/16 06:51 PM, Yike Zhang wrote: > Hi Alan, > > Thanks so much for your promptly response. Based on your valuable input, I > find the following RN > > https://urldefense.proofpoint.com/v2/url?u=http-3A__www.c0t0d0s0.org_archives_4974-2DWhats-2Dnew-2Din-2DSolaris-2D10-2D1008-2Dalso-2Dknown-2Das-2DUpdate-2D6.html=BQICaQ=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs=omJQL_siy7L8qYdXDzXVtIRGnyB5mkyk91I8ZKguO3s=iVXmzHLk6_vYCC59etKsOrnE_1Cgd8nTpSiw0K-MzEo=zyMHfos2JwwkowxbJERvbcXEVfLP55YVq0Gc2fCUnOk= > > and > x86: Kernel Support for Intel SSSE3, SSE4.1, SSE4.2, and AMD SSE4A > > Am I finding the correct path? > > It is a little bit embarrassing to say that the build system we kept for > Solaris is still based on 10-u2, company-wide unfortunately. So to adding a > fresh new 10-u6 build env might bring much effort. > > So may I know if any other simpler solution if we have to stick to 10-u2 (CC > flag?) I've not tried it, but I can't see any reason why "cc -DAV_386_SSSE3=0x40" wouldn't work for that particular bit of code. If you need to actually call other functions added in that support, you'd probably need something more though. -- -Alan Coopersmith- alan.coopersm...@oracle.com Oracle Solaris Engineering - https://urldefense.proofpoint.com/v2/url?u=http-3A__blogs.oracle.com_alanc=BQICaQ=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs=omJQL_siy7L8qYdXDzXVtIRGnyB5mkyk91I8ZKguO3s=iVXmzHLk6_vYCC59etKsOrnE_1Cgd8nTpSiw0K-MzEo=vwBypDbJqpvvNfIUouiI9ETsZ5tU5L8cwGqk9DDvKAw= ___ Pixman mailing list Pixman@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] 'AV_386_SSSE3' undeclared when building pixman0.34.1 with Solaris10-u2
On 04/ 4/16 06:51 PM, Yike Zhang wrote: Hi Alan, Thanks so much for your promptly response. Based on your valuable input, I find the following RN http://www.c0t0d0s0.org/archives/4974-Whats-new-in-Solaris-10-1008-also-known-as-Update-6.html and x86: Kernel Support for Intel SSSE3, SSE4.1, SSE4.2, and AMD SSE4A Am I finding the correct path? It is a little bit embarrassing to say that the build system we kept for Solaris is still based on 10-u2, company-wide unfortunately. So to adding a fresh new 10-u6 build env might bring much effort. So may I know if any other simpler solution if we have to stick to 10-u2 (CC flag?) I've not tried it, but I can't see any reason why "cc -DAV_386_SSSE3=0x40" wouldn't work for that particular bit of code. If you need to actually call other functions added in that support, you'd probably need something more though. -- -Alan Coopersmith- alan.coopersm...@oracle.com Oracle Solaris Engineering - http://blogs.oracle.com/alanc ___ Pixman mailing list Pixman@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] 'AV_386_SSSE3' undeclared when building pixman0.34.1 with Solaris10-u2
Hi Alan, Thanks so much for your promptly response. Based on your valuable input, I find the following RN http://www.c0t0d0s0.org/archives/4974-Whats-new-in-Solaris-10-1008-also-known-as-Update-6.html and x86: Kernel Support for Intel SSSE3, SSE4.1, SSE4.2, and AMD SSE4A Am I finding the correct path? It is a little bit embarrassing to say that the build system we kept for Solaris is still based on 10-u2, company-wide unfortunately. So to adding a fresh new 10-u6 build env might bring much effort. So may I know if any other simpler solution if we have to stick to 10-u2 (CC flag?) Br Yike -Original Message- From: Alan Coopersmith [mailto:alan.coopersm...@oracle.com] Sent: Tuesday, April 05, 2016 9:20 AM To: Yike Zhang; pixman@lists.freedesktop.org Subject: Re: [Pixman] 'AV_386_SSSE3' undeclared when building pixman0.34.1 with Solaris10-u2 On 04/ 4/16 05:56 PM, Yike Zhang wrote: > Hello there, > > May I know if anyone hit the similar issue and guide me how to work around it? > > Double check and found that AV_386_SSSE3 is not defined in the > solaris10-u2 header file, According to our bug tracker, it was added in Solaris 10 Update 6 (Solaris 10 8/08). Can you not update to an OS that's only 8 years old instead of sticking to the 10 year old release? -- -Alan Coopersmith- alan.coopersm...@oracle.com Oracle Solaris Engineering - https://urldefense.proofpoint.com/v2/url?u=http-3A__blogs.oracle.com_alanc=BQICaQ=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs=omJQL_siy7L8qYdXDzXVtIRGnyB5mkyk91I8ZKguO3s=YeedvpoCAygWcs3GI-ZwR-isL7L_B2xxVyqXlpuwAPE=p1_0V6QH6eSsjojymYMCIAsGxw-f3Wo0Ne3AN11_lPU= ___ Pixman mailing list Pixman@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] 'AV_386_SSSE3' undeclared when building pixman0.34.1 with Solaris10-u2
On 04/ 4/16 05:56 PM, Yike Zhang wrote: Hello there, May I know if anyone hit the similar issue and guide me how to work around it? Double check and found that AV_386_SSSE3 is not defined in the solaris10-u2 header file, According to our bug tracker, it was added in Solaris 10 Update 6 (Solaris 10 8/08). Can you not update to an OS that's only 8 years old instead of sticking to the 10 year old release? -- -Alan Coopersmith- alan.coopersm...@oracle.com Oracle Solaris Engineering - http://blogs.oracle.com/alanc ___ Pixman mailing list Pixman@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] 'AV_386_SSSE3' undeclared when building pixman0.34.1 with Solaris10-u2
Hello there, May I know if anyone hit the similar issue and guide me how to work around it? I am trying to build and pixman library for Solaris10, but get the following error 2016-03-30 18:15:32 gobuilds.Compile : 18:15:32.994 make INFO CC pixman-x86.lo 2016-03-30 18:15:33 gobuilds.Compile : 18:15:33.023 make INFO pixman-x86.c: In function 'detect_cpu_features': 2016-03-30 18:15:33 gobuilds.Compile : 18:15:33.023 make INFO pixman-x86.c:68: error: 'AV_386_SSSE3' undeclared (first use in this function) 2016-03-30 18:15:33 gobuilds.Compile : 18:15:33.023 make INFO pixman-x86.c:68: error: (Each undeclared identifier is reported only once 2016-03-30 18:15:33 gobuilds.Compile : 18:15:33.023 make INFO pixman-x86.c:68: error: for each function it appears in.) 2016-03-30 18:15:33 gobuilds.Compile : 18:15:33.025 make INFO make[2]: *** [pixman-x86.lo] Error 1 Double check and found that AV_386_SSSE3 is not defined in the solaris10-u2 header file, ... #define AV_386_FPU 0x1 /* x87-style floating point */ #define AV_386_TSC 0x2 /* rdtsc insn */ #define AV_386_CX8 0x4 /* cmpxchg8b insn */ #define AV_386_SEP 0x8 /* sysenter and sysexit */ #define AV_386_AMD_SYSC 0x00010 /* AMD's syscall and sysret */ #define AV_386_CMOV 0x00020 /* conditional move insns */ #define AV_386_MMX 0x00040 /* MMX insns */ #define AV_386_AMD_MMX 0x00080 /* AMD's MMX insns */ #define AV_386_AMD_3DNow0x00100 /* AMD's 3Dnow! insns */ #define AV_386_AMD_3DNowx 0x00200 /* AMD's 3Dnow! extended insns */ #define AV_386_FXSR 0x00400 /* fxsave and fxrstor */ #define AV_386_SSE 0x00800 /* SSE insns and regs */ #define AV_386_SSE2 0x01000 /* SSE2 insns and regs */ #define AV_386_PAUSE0x02000 /* use pause insn (in spin loops) */ #define AV_386_SSE3 0x04000 /* SSE3 insns and regs */ #define AV_386_MON 0x08000 /* monitor/mwait insns */ #define AV_386_CX16 0x1 /* cmpxchg16b insn */... However, I could find related definition in the header file for Solaris11, see cat ./solaris11_/usr/include/sys/auxv_386.h |grep AV_386_SS*E3 #define AV_386_SSE3 0x04000 /* SSE3 insns and regs */ #define AV_386_SSSE30x40 /* Intel SSSE3 insns */ May I know do I have to manipulate the 10u2 header file by including related definition, or I need to update the pixman source code like diff --git a/pixman/pixman-x86.c b/pixman/pixman-x86.c index 05297c4..32b4018 100644 --- a/pixman/pixman-x86.c +++ b/pixman/pixman-x86.c @@ -65,8 +65,10 @@ detect_cpu_features (void) features |= X86_SSE; if (result & AV_386_SSE2) features |= X86_SSE2; +#ifdef USE_SSSE3 if (result & AV_386_SSSE3) features |= X86_SSSE3; +#endif } Thanks Yike ___ Pixman mailing list Pixman@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH v14 12/22] pixman-filter: fix subsample_bits == 0
I believe there may have been a rebasing error. Will look into it. I certainly intended this to change behavior when the filter size is odd, not when the number of subsamples is odd. Not sure if this is truly rebase screwing up, or I just mistyped an attempt to fix a rebase error. On Mon, Apr 4, 2016 at 12:21 PM, Søren Sandmannwrote: > No, it changes behavior when the number of *phases* is odd, which it is in > the example I gave. > > > Søren > > On Mon, Apr 4, 2016 at 2:51 PM, Bill Spitzak wrote: > >> That's why this only changes the behavior when the number of samples is >> odd. >> >> >> On Sun, Apr 3, 2016 at 11:17 AM, Søren Sandmann > > wrote: >> >>> I don't believe this patch is correct. >>> >>> As an example, consider an image with an identity transformation and a >>> cubic filter (which has width 4) with subsample_bits = 0. The current code >>> will compute a matrix >>> >>> [ cubic(-2), cubic(-1), cubic(0), cubic(1) ] >>> >>> Now suppose we are filtering a pixel located at x = 17.5. The code in >>> bits_image_fetch_pixel_separable_convolution() will eventually end up >>> convolving with the pixels at locations 15.5, 16.5, 17.5 and 18.5, which is >>> the correct result since cubic(-2) = 0 [1]. >>> >>> With your code, the matrix computed would be >>> >>> [ cubic(-1.5), cubic(-0.5), cubic(0.5), cubic(1) ] >>> >>> which would not be correct no matter what: With an identity >>> transformation, the pixel at 17.5 should be multiplied with cubic(0). >>> >>> There is a detailed explanation of these issues in the file >>> pixman/rounding.txt. >>> >>> >>> Søren >>> >>> [1] You might consider optimizing this case away and generate a matrix >>> with width 3, but since this would only work with subsample_bits=0, it's >>> not worth the special-casing. >>> >>> >>> On Sun, Mar 6, 2016 at 8:06 PM, wrote: >>> From: Bill Spitzak The position of only one subsample was wrong as ceil() was done on an integer. Use a different function for all odd numbers of subsamples that gets this right. Signed-off-by: Bill Spitzak --- pixman/pixman-filter.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/pixman/pixman-filter.c b/pixman/pixman-filter.c index ac89dda..f9ad45f 100644 --- a/pixman/pixman-filter.c +++ b/pixman/pixman-filter.c @@ -252,7 +252,10 @@ create_1d_filter (int width, * and sample positions. */ - pos = ceil (frac - width / 2.0 - 0.5) + 0.5 - frac; + if (n_phases & 1) + pos = frac - width / 2.0; + else + pos = ceil (frac - width / 2.0 - 0.5) + 0.5 - frac; total = 0; for (x = 0; x < width; ++x, ++pos) -- 1.9.1 ___ Pixman mailing list Pixman@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/pixman >>> >>> >> > ___ Pixman mailing list Pixman@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH v14 12/22] pixman-filter: fix subsample_bits == 0
No, it changes behavior when the number of *phases* is odd, which it is in the example I gave. Søren On Mon, Apr 4, 2016 at 2:51 PM, Bill Spitzakwrote: > That's why this only changes the behavior when the number of samples is > odd. > > > On Sun, Apr 3, 2016 at 11:17 AM, Søren Sandmann > wrote: > >> I don't believe this patch is correct. >> >> As an example, consider an image with an identity transformation and a >> cubic filter (which has width 4) with subsample_bits = 0. The current code >> will compute a matrix >> >> [ cubic(-2), cubic(-1), cubic(0), cubic(1) ] >> >> Now suppose we are filtering a pixel located at x = 17.5. The code in >> bits_image_fetch_pixel_separable_convolution() will eventually end up >> convolving with the pixels at locations 15.5, 16.5, 17.5 and 18.5, which is >> the correct result since cubic(-2) = 0 [1]. >> >> With your code, the matrix computed would be >> >> [ cubic(-1.5), cubic(-0.5), cubic(0.5), cubic(1) ] >> >> which would not be correct no matter what: With an identity >> transformation, the pixel at 17.5 should be multiplied with cubic(0). >> >> There is a detailed explanation of these issues in the file >> pixman/rounding.txt. >> >> >> Søren >> >> [1] You might consider optimizing this case away and generate a matrix >> with width 3, but since this would only work with subsample_bits=0, it's >> not worth the special-casing. >> >> >> On Sun, Mar 6, 2016 at 8:06 PM, wrote: >> >>> From: Bill Spitzak >>> >>> The position of only one subsample was wrong as ceil() was done on an >>> integer. >>> Use a different function for all odd numbers of subsamples that gets >>> this right. >>> >>> Signed-off-by: Bill Spitzak >>> --- >>> pixman/pixman-filter.c | 5 - >>> 1 file changed, 4 insertions(+), 1 deletion(-) >>> >>> diff --git a/pixman/pixman-filter.c b/pixman/pixman-filter.c >>> index ac89dda..f9ad45f 100644 >>> --- a/pixman/pixman-filter.c >>> +++ b/pixman/pixman-filter.c >>> @@ -252,7 +252,10 @@ create_1d_filter (int width, >>> * and sample positions. >>> */ >>> >>> - pos = ceil (frac - width / 2.0 - 0.5) + 0.5 - frac; >>> + if (n_phases & 1) >>> + pos = frac - width / 2.0; >>> + else >>> + pos = ceil (frac - width / 2.0 - 0.5) + 0.5 - frac; >>> >>> total = 0; >>> for (x = 0; x < width; ++x, ++pos) >>> -- >>> 1.9.1 >>> >>> ___ >>> Pixman mailing list >>> Pixman@lists.freedesktop.org >>> https://lists.freedesktop.org/mailman/listinfo/pixman >>> >> >> > ___ Pixman mailing list Pixman@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] Add support for aarch64 neon optimization
On Sat, 02 Apr 2016 13:30:58 +0100, Mizuki Asakurawrote: This patch only contains STD_FAST_PATH codes, not scaling (nearest, bilinear) codes. Hi Mizuki, It looks like you have used an automated process to convert the AArch32 NEON code to AArch64. Will you be able to repeat that process for other code, or at least assist others to repeat your steps? The reason I ask is that I have a large number of outstanding patches to the ARM NEON support. The process of getting them merged into the FreeDesktop git repository has been very slow because there aren't many people on this list with the time and ability to review them, however my versions are in many cases up to twice the speed of the FreeDesktop versions, and it would be a shame if AArch64 couldn't benefit from them. If your AArch64 conversion is a one-time thing, it will make make it extremely difficult to merge my changes in. After completing optimization this patch, scaling related codes should be done. One of my aims was to implement missing "iter" routines so as to accelerate scaled plots for a much wider combination of pixels formats and Porter-Duff combiner rules than the existing limited selection of fast paths could cover. If you look towards the end of my patch series here: https://github.com/bavison/pixman/commits/arm-neon-release1 you'll see that I discovered that I was actually outperforming Pixman's existing bilinear plotters so consistently that I'm advocating removing them entirely, with the additional advantage that it simplifies the code base a lot. So you might want to consider whether it's worth bothering converting those to AArch64 in the first place. I would maybe go so far as to suggest that you try converting all the iters first and only add fast paths if you find they do better than the iters. One of the drawbacks of using iters is that the prefetch code can't be as sophisticated - it can't easily be prefetching the start of the next row while it is still working on the end of the current one. But since hardware prefetchers are better now and conditional execution is hard in AArch64, this will be less of a drawback with AArch64 CPUs. I'll also repeat what has been said, that it's very neat the way the existing prefetch code sneaks calculations into pipeline stalls, but it was only ever really ideal for Cortex-A8. With Cortex-A7 (despite the number, actually a much more recent 32-bit core) I noted that it was impossible to schedule such complex prefetch code without adding to the cycle count, at least when the images were already in the cache. Ben ___ Pixman mailing list Pixman@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH v14 08/22] pixman-filter: rename "scale" to "size" when it is 1/scale
Okay that sounds pretty good. Also 'r' is better than 'i' since 'i' can be mis-read as "input" or "image". 'r' I think is mostly going to be read as "reverse" which actually makes sense. Sorry to be a pain about this but I really find it confusing to use the same term for different numbers. On Sun, Apr 3, 2016 at 9:29 AM, Søren Sandmannwrote: > I guess I can live with 'rscale' (for reciprocal scale). > > > Søren > > On Thu, Mar 24, 2016 at 9:30 PM, Bill Spitzak wrote: > >> Would using "iscale" (for inverse scale) work? Another is "tscale" >> because it is the scale from the transform matrix. >> >> I really would like to use two different names for these variables as it >> is really confusing using the same one. >> >> >> On Fri, Mar 18, 2016 at 8:24 AM, Bill Spitzak wrote: >> >>> >>> >>> On Thu, Mar 17, 2016 at 10:06 PM, Søren Sandmann < >>> soren.sandm...@gmail.com> wrote: >>> I suppose it's a little illogical that scale_x and scale_y really are the reciprocal values of how much the source image should be scaled. I don't remember exactly what I was thinking, but it might have something like "this allows you to just pass t[0][0] and t[1][1] if you have a pure scaling, which avoids a division". There is also kind of precedence in the Pixman API since the transformation given as in the dst->source direction. I don't really like "size" either though. It's not really the size of the filter that we are specifying; it just happens to be proportional to it. If it comes down to "size" and "scale", I prefer "scale". >>> >>> I tried "width" but it is not actually the filter width. How about "dx" >>> and "dy" since it is almost always the derivative of the x and y in the >>> input (or du,dv or dtx dtx) >>> Søren On Sun, Mar 6, 2016 at 8:06 PM, wrote: > From: Bill Spitzak > > This is to remove some confusion when reading the code. "scale" gets > larger > as the picture gets larger, while "size" (ie the size of the filter) > gets > smaller. > > v14: Removed changes to integral function > > Signed-off-by: Bill Spitzak > Reviewed-by: Oded Gabbay > --- > pixman/pixman-filter.c | 18 +- > pixman/pixman.h| 6 +++--- > 2 files changed, 12 insertions(+), 12 deletions(-) > > diff --git a/pixman/pixman-filter.c b/pixman/pixman-filter.c > index a29116a..c03a7f6 100644 > --- a/pixman/pixman-filter.c > +++ b/pixman/pixman-filter.c > @@ -221,7 +221,7 @@ static void > create_1d_filter (int width, > pixman_kernel_t reconstruct, > pixman_kernel_t sample, > - double scale, > + double size, > int n_phases, > pixman_fixed_t *p) > { > @@ -251,8 +251,8 @@ create_1d_filter (int width, > double pos = x + 0.5 - frac; > double rlow = - filters[reconstruct].width / 2.0; > double rhigh = rlow + filters[reconstruct].width; > - double slow = pos - scale * filters[sample].width / 2.0; > - double shigh = slow + scale * filters[sample].width; > + double slow = pos - size * filters[sample].width / 2.0; > + double shigh = slow + size * filters[sample].width; > double c = 0.0; > double ilow, ihigh; > > @@ -262,7 +262,7 @@ create_1d_filter (int width, > ihigh = MIN (shigh, rhigh); > > c = integral (reconstruct, ilow, > - sample, 1.0 / scale, ilow - pos, > + sample, 1.0 / size, ilow - pos, > ihigh - ilow); > } > > @@ -335,12 +335,12 @@ filter_width (pixman_kernel_t reconstruct, > pixman_kernel_t sample, double size) > } > > /* Create the parameter list for a SEPARABLE_CONVOLUTION filter > - * with the given kernels and scale parameters > + * with the given kernels and size parameters > */ > PIXMAN_EXPORT pixman_fixed_t * > pixman_filter_create_separable_convolution (int *n_values, > - pixman_fixed_t scale_x, > - pixman_fixed_t scale_y, > + pixman_fixed_t size_x, > + pixman_fixed_t size_y, > pixman_kernel_t > reconstruct_x, >
Re: [Pixman] [PATCH v14 12/22] pixman-filter: fix subsample_bits == 0
That's why this only changes the behavior when the number of samples is odd. On Sun, Apr 3, 2016 at 11:17 AM, Søren Sandmannwrote: > I don't believe this patch is correct. > > As an example, consider an image with an identity transformation and a > cubic filter (which has width 4) with subsample_bits = 0. The current code > will compute a matrix > > [ cubic(-2), cubic(-1), cubic(0), cubic(1) ] > > Now suppose we are filtering a pixel located at x = 17.5. The code in > bits_image_fetch_pixel_separable_convolution() will eventually end up > convolving with the pixels at locations 15.5, 16.5, 17.5 and 18.5, which is > the correct result since cubic(-2) = 0 [1]. > > With your code, the matrix computed would be > > [ cubic(-1.5), cubic(-0.5), cubic(0.5), cubic(1) ] > > which would not be correct no matter what: With an identity > transformation, the pixel at 17.5 should be multiplied with cubic(0). > > There is a detailed explanation of these issues in the file > pixman/rounding.txt. > > > Søren > > [1] You might consider optimizing this case away and generate a matrix > with width 3, but since this would only work with subsample_bits=0, it's > not worth the special-casing. > > > On Sun, Mar 6, 2016 at 8:06 PM, wrote: > >> From: Bill Spitzak >> >> The position of only one subsample was wrong as ceil() was done on an >> integer. >> Use a different function for all odd numbers of subsamples that gets this >> right. >> >> Signed-off-by: Bill Spitzak >> --- >> pixman/pixman-filter.c | 5 - >> 1 file changed, 4 insertions(+), 1 deletion(-) >> >> diff --git a/pixman/pixman-filter.c b/pixman/pixman-filter.c >> index ac89dda..f9ad45f 100644 >> --- a/pixman/pixman-filter.c >> +++ b/pixman/pixman-filter.c >> @@ -252,7 +252,10 @@ create_1d_filter (int width, >> * and sample positions. >> */ >> >> - pos = ceil (frac - width / 2.0 - 0.5) + 0.5 - frac; >> + if (n_phases & 1) >> + pos = frac - width / 2.0; >> + else >> + pos = ceil (frac - width / 2.0 - 0.5) + 0.5 - frac; >> >> total = 0; >> for (x = 0; x < width; ++x, ++pos) >> -- >> 1.9.1 >> >> ___ >> Pixman mailing list >> Pixman@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/pixman >> > > ___ Pixman mailing list Pixman@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] Add support for aarch64 neon optimization
> But it is much more interesting to compare the performance of this > patch with the existing 32-bit ARM NEON code. You can build the > 32-bit tests programs (including lowlevel-blt-bench) using a 32-bit > ARM crosscompiler either on your desktop PC or on the ARM board: > > ./configure --host=arm-linux-gnueabihf --enable-static-testprogs \ > --disable-libpng --disable-gtk > make Ok. I succeed to install cross-build toolchain in my Linux PC and create armeabihf version of pixman. And attached the benchmark result to the bug ticket. Please find: https://bugs.freedesktop.org/attachment.cgi?id=122695 At least, I have to catch up with this result on aarch64... On 4 April 2016 at 19:28, Mizuki Asakurawrote: >> OK, thanks for this information. What I mean is that this should be >> preferably a separate patch. > > I've sent it on the separated patch. > > On 3 April 2016 at 20:22, Siarhei Siamashka > wrote: >> On Sat, 2 Apr 2016 20:28:44 +0900 >> Mizuki Asakura wrote: >> >>> Thanks for your reply. >>> >>> > The main concern is whether we can get support for all these >>> > assembly syntax flavours, calling conventions and minor instructions >>> > differences, while keeping the code maintainable. >>> >>> In the aarch64 point of view, I believe we should only support ARM's >>> original >>> syntax and calling convensions. It would be most portable. >> >> The ILP32 is an alternative ABI, which is being endorsed or even >> developed by ARM: >> http://infocenter.arm.com/help/topic/com.arm.doc.dai0490a/index.html >> >> When/if somebody tries to use pixman in an AArch64 ILP32 system, this >> may cause some difficulties/inconveniences for us. >> >>> In iOS case, they may be able to create some converter script that only >>> translate instruction name to their own since the architcture is same >>> (as opposed aarch32-neon vs aarch64-neon is (almost) same, but different). >> >> Yes. The converters exists, but AFAIK the pixman assembly code is >> way too difficult for them to parse. Again, the problem is technically >> solvable, just nobody has invested efforts into solving it yet. Either >> by contributing the needed code or by sponsoring this work. >> >> I mentioned this only because the unified 32-bit/64-bit ARM assembly >> preprocessor could also potentially take care of this iOS issue as a >> side effect. >> >>> # How about SSE2 on OS X ? Is it same syntax and calling convension ? >> >> I'm not familiar with OS X myself. But the SSE2 code in pixman is >> currently using intrinsics, so this makes everything quite simple >> as far as portability is concerned. >> >> If we were to use hand tuned assembly optimizations for x86, then we >> would have to take care of the 32-bit Linux/Windows, 64-bit Linux with >> x32 ABI, 64-bit Linux and 64-bit Windows cases separately. Windows >> arrived late to the x86-64 scene, but they still decided to invent >> their own incompatible calling conventions. All of this is fine for >> intrinsics, but may clearly cause problems for assembly implementations. >> >> Hand written assembly has a relatively high maintenance cost. But it >> pays off if we want to have good performance. >> >> I still see a large potential for improving pixman performance on >> x86 (replace intrinsics with hand written assembly and make a better >> use of SSSE3, AVX, etc.). But somebody has to shoulder the expenses >> for this to happen. BTW, I would be happy to take this task if >> somebody is willing to pay :-) >> >>> > /* stride is always multiple of 32bit units in pixman */ >>> > -uint32_t byte_stride = stride * sizeof(uint32_t); >>> > +int32_t byte_stride = stride * sizeof(uint32_t); >>> >>> It caused horribly crashes in aarch64. Why it works on aarch32 ? :) >>> (32-bit integer overflow may help them...) >> >> OK, thanks for this information. What I mean is that this should be >> preferably a separate patch. >> >>> > I'm currently investigating the possibility of maybe "inventing" a >>> > common unified syntax for some preprocessor script to take care of >>> > both 32-bit and 64-bit ARM code. But we still can't be sure if this >>> > approach is practical (or even doable) and whether it can solve more >>> > problems than introduce new ones... >>> >>> At first, I've also tried to create a "converter script" from >>> pixman-arm-neon-asm*.S to >>> aarch64 neon codes just translating instruction, register names and so on. >> >> I see. It is nice to know that we basically came up with the same >> solution at least as the starting point :-) >> >>> But I've found that this script won't work on pixman's code because of: >>> >>> * conflicting register names: both Qn and Dn are used in same place >>> * highly optimized for aarch32-neon register structure. assumes d29, >>> d30 is low / high of q15 >> >> It's all mostly just a syntax sugar. The Dn registers still do exist >> in a somewhat hidden way. For example:
Re: [Pixman] [PATCH] Add support for aarch64 neon optimization
> OK, thanks for this information. What I mean is that this should be > preferably a separate patch. I've sent it on the separated patch. On 3 April 2016 at 20:22, Siarhei Siamashkawrote: > On Sat, 2 Apr 2016 20:28:44 +0900 > Mizuki Asakura wrote: > >> Thanks for your reply. >> >> > The main concern is whether we can get support for all these >> > assembly syntax flavours, calling conventions and minor instructions >> > differences, while keeping the code maintainable. >> >> In the aarch64 point of view, I believe we should only support ARM's original >> syntax and calling convensions. It would be most portable. > > The ILP32 is an alternative ABI, which is being endorsed or even > developed by ARM: > http://infocenter.arm.com/help/topic/com.arm.doc.dai0490a/index.html > > When/if somebody tries to use pixman in an AArch64 ILP32 system, this > may cause some difficulties/inconveniences for us. > >> In iOS case, they may be able to create some converter script that only >> translate instruction name to their own since the architcture is same >> (as opposed aarch32-neon vs aarch64-neon is (almost) same, but different). > > Yes. The converters exists, but AFAIK the pixman assembly code is > way too difficult for them to parse. Again, the problem is technically > solvable, just nobody has invested efforts into solving it yet. Either > by contributing the needed code or by sponsoring this work. > > I mentioned this only because the unified 32-bit/64-bit ARM assembly > preprocessor could also potentially take care of this iOS issue as a > side effect. > >> # How about SSE2 on OS X ? Is it same syntax and calling convension ? > > I'm not familiar with OS X myself. But the SSE2 code in pixman is > currently using intrinsics, so this makes everything quite simple > as far as portability is concerned. > > If we were to use hand tuned assembly optimizations for x86, then we > would have to take care of the 32-bit Linux/Windows, 64-bit Linux with > x32 ABI, 64-bit Linux and 64-bit Windows cases separately. Windows > arrived late to the x86-64 scene, but they still decided to invent > their own incompatible calling conventions. All of this is fine for > intrinsics, but may clearly cause problems for assembly implementations. > > Hand written assembly has a relatively high maintenance cost. But it > pays off if we want to have good performance. > > I still see a large potential for improving pixman performance on > x86 (replace intrinsics with hand written assembly and make a better > use of SSSE3, AVX, etc.). But somebody has to shoulder the expenses > for this to happen. BTW, I would be happy to take this task if > somebody is willing to pay :-) > >> > /* stride is always multiple of 32bit units in pixman */ >> > -uint32_t byte_stride = stride * sizeof(uint32_t); >> > +int32_t byte_stride = stride * sizeof(uint32_t); >> >> It caused horribly crashes in aarch64. Why it works on aarch32 ? :) >> (32-bit integer overflow may help them...) > > OK, thanks for this information. What I mean is that this should be > preferably a separate patch. > >> > I'm currently investigating the possibility of maybe "inventing" a >> > common unified syntax for some preprocessor script to take care of >> > both 32-bit and 64-bit ARM code. But we still can't be sure if this >> > approach is practical (or even doable) and whether it can solve more >> > problems than introduce new ones... >> >> At first, I've also tried to create a "converter script" from >> pixman-arm-neon-asm*.S to >> aarch64 neon codes just translating instruction, register names and so on. > > I see. It is nice to know that we basically came up with the same > solution at least as the starting point :-) > >> But I've found that this script won't work on pixman's code because of: >> >> * conflicting register names: both Qn and Dn are used in same place >> * highly optimized for aarch32-neon register structure. assumes d29, >> d30 is low / high of q15 > > It's all mostly just a syntax sugar. The Dn registers still do exist > in a somewhat hidden way. For example: > > vshrn.u32 d0,q15,#1 > vshrn.u32 d1,q15,#1 > > directly translates into > > shrnv0.4h, v15.4s, #1 > shrn2 v0.8h, v15.4s, #1 > > The selection of SHRN or SHRN2 instruction variant decides whether > the lower or upper half of the 128-bit destination register is updated. > > The 64-bit source operands (Dn registers) are handled in the very > same way: > > vmull.u16 q1,d0,d0 > vmull.u16 q2,d1,d1 > > directly translates into > > umull v1.4s, v0.4h, v0.4h > umull2 v2.4s, v0.8h, v0.8h > > There are of course some limitations. Mixing odd and even > numbered d-registers as the source operands is not supported > in AArch64. This is a bit inconvenient, but the AArch64 code > still needs to take this into account either way regardless > of the syntax. > > I'm
[Pixman] [PATCH] Fixed invalid casting in arm_neon_fill
stride con be negative, but byte-stride is declared as uint32_t. It may cause integer overflow. Signed-off-by: Mizuki Asakura--- pixman/pixman-arm-neon.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/pixman/pixman-arm-neon.c b/pixman/pixman-arm-neon.c index be761c9..fcb61bc --- a/pixman/pixman-arm-neon.c +++ b/pixman/pixman-arm-neon.c @@ -194,7 +194,7 @@ arm_neon_fill (pixman_implementation_t *imp, uint32_t _xor) { /* stride is always multiple of 32bit units in pixman */ -uint32_t byte_stride = stride * sizeof(uint32_t); +int32_t byte_stride = stride * sizeof(uint32_t); switch (bpp) { -- 2.7.4 ___ Pixman mailing list Pixman@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/pixman