Hi, Siarhei Siamashka:
Q: "What problems do you have without "merge" mechanism?"
A: Of course there is no correctness issue without "merge".
Currently, sse2_fast_paths/mmx_fast_paths/c_fast_paths... are mutually 
exclusive, although the feature checks form a delegate chain. Once the delegate 
chain is formed, only one fast path table takes effect. Is my understanding 
correct?
So after ssse3_fast_paths is added, only the SSSE3 table will take effect once 
SSSE3 is detected via CPUID. Of course we can keep just the 2 new entries with 
SSSE3 asm optimizations, but the issue is that we would lose the optimizations 
that already exist in the current SSE2 table. So, without merging or fully 
duplicating the SSE2 file, performance will suffer.

Sure, we are fine with the "correctness first, then performance in the next 
wave" philosophy. Is a new SSSE3 file with only 2 fast path entries OK?

As for simplifying CPU detection, I think the same "correctness first, then 
performance in the next wave" philosophy can be followed. Full CPU detection 
for 64 bit can be added first as a baseline, and those who can test on win32 
and Solaris can help us shorten it.

To wrap up, the new patch will:
1) keep most of the 64-bit CPU detection as in the current patch (the GNU C 
part can drop some edx checks)
2) add a new SSSE3 file with only 2 fast path entries for the newly added asm 
optimizations
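
To make point 1) concrete, here is a rough sketch of what dropping the edx 
checks could look like. It is not the code from the patch: the FEATURE_* flags 
and both function names are made up for illustration. The bit positions are 
the documented CPUID leaf 1 ones (EDX bit 23 = MMX, EDX bit 26 = SSE2, ECX 
bit 9 = SSSE3), and the 64-bit shortcut relies on MMX and SSE2 being part of 
the x86-64 baseline.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>   /* GCC/clang helper providing __get_cpuid() */
#endif

/* Hypothetical feature flags, for illustration only. */
enum {
    FEATURE_MMX   = 1 << 0,
    FEATURE_SSE2  = 1 << 1,
    FEATURE_SSSE3 = 1 << 2
};

/* Decode the ECX/EDX values returned by CPUID leaf 1.
 * When is_64bit is non-zero the EDX checks are skipped, because every
 * x86-64 CPU is required to support MMX and SSE2. */
static unsigned decode_cpuid_leaf1(uint32_t ecx, uint32_t edx, int is_64bit)
{
    unsigned features = 0;

    if (is_64bit) {
        features |= FEATURE_MMX | FEATURE_SSE2;
    } else {
        if (edx & (1u << 23)) features |= FEATURE_MMX;
        if (edx & (1u << 26)) features |= FEATURE_SSE2;
    }
    if (ecx & (1u << 9)) features |= FEATURE_SSSE3;

    return features;
}

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* GNU C only: query the running CPU. Returns 0 if leaf 1 is unavailable. */
static unsigned detect_features_gnuc(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
#if defined(__x86_64__)
    return decode_cpuid_leaf1(ecx, edx, 1);
#else
    return decode_cpuid_leaf1(ecx, edx, 0);
#endif
}
#endif
```

The MSVC and HAVE_GETISAX paths are deliberately left out here, since those 
are the ones that still need testing on win32 and Solaris.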

Comments?

Samuel


-----Original Message-----
From: Siarhei Siamashka [mailto:siarhei.siamas...@gmail.com] 
Sent: Friday, August 27, 2010 10:57 PM
To: Xu, Samuel
Cc: pixman@lists.freedesktop.org; Ma, Ling; Liu, Xinyun
Subject: Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

On Friday 27 August 2010 15:00:49 Xu, Samuel wrote:
> Hi, Siarhei Siamashka:
>       Thanks for quick response!
>       For 64 bit detect_cpu_features(), if ignore HAVE_GETISAX and _MSC_VER,
>       it is ok for us to simplify it as your example in next update.

If you can ensure MSVC compatibility and make it work with your optimizations, 
then it would be really great. But if it is totally untested, I don't feel 
comfortable about having it just blindly replicated from 32 to 64 bits with the 
hope that it will work.

It's just my opinion, the others may disagree. And the others may also try to 
test your patch on win32 or solaris systems, providing a lot more useful 
feedback than me.

> For pixman-ssse3.c, maybe we have 2 options:
>  1) duplicate 6562 lines from pixman-sse2.c to new pixman-
>     ssse3.c in 1st patch (of course to replace 2 entries with newly added
>     SSSE3 asm optimization), and then add "merge" mechanism in later patch.

No, there is no need to duplicate anything.

>  2) firstly add "merge" mechanism patch, and the added new pixman-ssse3.c in
>     later patch, which might be shorter (111 lines) Does it mean 
>     1) option is preferred?

What problems do you have without "merge" mechanism? The pixman-sse2.c works 
fine without it, and it does properly fallback to MMX code if SSE2 does not 
support some operations. Similarly, SSSE3 can fallback to SSE2 in the very same 
way.
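
A minimal sketch of the fallback idea, to show why no table duplication is 
needed. The structs, the operation names and lookup() are simplified, 
hypothetical stand-ins, not pixman's real data structures: each implementation 
only lists the operations it accelerates and points to a delegate that is 
consulted when its own table has no match.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified fast path entry: just an operation name. */
typedef struct fast_path {
    const char *op;                         /* NULL terminates the table */
} fast_path_t;

/* Hypothetical implementation object with a link to its fallback. */
typedef struct implementation {
    const char *name;
    const fast_path_t *fast_paths;
    const struct implementation *delegate;  /* next backend, or NULL */
} implementation_t;

/* Illustrative tables: the SSSE3 one holds only its 2 new entries. */
static const fast_path_t ssse3_paths[] = {
    { "src_x888_8888" }, { "src_x888_0565" }, { NULL }
};
static const fast_path_t sse2_paths[] = {
    { "src_x888_8888" }, { "over_8888_8888" }, { "add_8888_8888" }, { NULL }
};
static const fast_path_t general_paths[] = {
    { "over_8888_8888" }, { "in_8_8" }, { NULL }
};

static const implementation_t general_impl = { "general", general_paths, NULL };
static const implementation_t sse2_impl    = { "sse2", sse2_paths, &general_impl };
static const implementation_t ssse3_impl   = { "ssse3", ssse3_paths, &sse2_impl };

/* Walk the delegate chain from the most specialised backend downwards and
 * return the name of the first implementation that lists the operation,
 * or NULL if no implementation in the chain does. */
static const char *lookup(const implementation_t *imp, const char *op)
{
    for (; imp != NULL; imp = imp->delegate) {
        const fast_path_t *p;

        for (p = imp->fast_paths; p->op != NULL; ++p) {
            if (strcmp(p->op, op) == 0)
                return imp->name;
        }
    }
    return NULL;
}
```

With this arrangement, lookup(&ssse3_impl, "over_8888_8888") falls through to 
the SSE2 table, so the existing SSE2 optimizations stay reachable even though 
the SSSE3 table has only two entries.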

--
Best regards,
Siarhei Siamashka
_______________________________________________
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman
