Hi Siarhei Siamashka: We'd like to provide a new patch with following enhancement soon: 1) Add 64 bit asm code specifically for 64 bit, which will co-exist with 32 bit version 2) CPUID dynamic check in pixman-cpu.c and pixman-access.c 3) Makefile fixing 4) Stack executable fixing in both 32/64 asm files. 5) follow your comments to make asm code shorter Any missing?
Two more questions: ----"Maybe the intention was good, but the end result just fails to build on my 64-bit Intel Atom netbook. And pixman "make check" dies with "Illegal instruction" on a 32-bit Intel Pentium-M laptop. So it's a clear indication that you did something wrong and the patch is unacceptable as is." Q1: Could you past the 64-bit ATOM build failure? Is it caused by ./configure to generate make file, or build error, or link error? Q2: Could you show us how to do "make check"? Samuel -----Original Message----- From: Ma, Ling Sent: Tuesday, August 17, 2010 3:22 PM To: Siarhei Siamashka; Xu, Samuel Cc: pixman@lists.freedesktop.org; Liu, Xinyun Subject: RE: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8 Hi Siarhei Siamashka > I did not question the fact that PALIGNR instruction must be useful > for something. There must have been some reason why it was introduced > :) But real benchmark numbers are always interesting, like the ones > you provided for > SSSE3 against original C code comparison. Just to be sure that these > more than 1000 lines of code are really justified (at least for this > particular use). We attached data report about movups + movaps Vs movaps + palignr. The report shows movaps + palignr is better because crossing-cacheline will cause big latency on Atom, Core2. > As I already mentioned, the code in 'pixman-access-ssse3.S' is highly > redundant just for 'fetch_scanline_x8r8g8b8' implementation. Due to > processing 32-bit data, there are only 4 possible cases of relative > source/destination alignment, but the whole set of 16 alignment cases > is implemented. In addition to other benefits (like improved sources > readability), reducing code size is good for saving space in the CPU > instruction cache and this typically improves performance. Original patch can tolerate any type of offset from src and dest. Yes, I agree if offset of src and dest must be 4 bytes aligned, Your suggestion will reduce code size significantly. Other questions, Samuel or Xinyun will provide further answers. Best Regards Ling > > 2) For #ifdefs vs. dynamically checking SSSE3, the patch exactly > > mimics "SSE2" support in currently pixman code(#ifdefs), which > > checks the build system when makefile generating and applies > > USE_SSSE3 into later building. > > Maybe the intention was good, but the end result just fails to build > on my 64-bit Intel Atom netbook. And pixman "make check" dies with > "Illegal instruction" on a 32-bit Intel Pentium-M laptop. So it's a > clear indication that you did something wrong and the patch is unacceptable > as is. > > > Of course prefix of CPUID can achieve better compatibility. It is ok > > for us to add this kind of CPUID prefix. I will appreciate directly > > comments on shape of this patch, using SSSE3, e.g. some code sample > > of expected pixman-cpu.c and > pixman-access.c. > > SSSE3 is not much different from MMX and SSE2 and can be added in a > similar way to pixman-cpu.c > > Regarding SIMD optimized fetchers. Pixman has none at the moment, so > there is no code to be used as a reference to see how it should be > done "right" (and that's partially the reason why bug#20709 is still > unresolved). > There was an older discussion about SIMD fetchers: > http://comments.gmane.org/gmane.comp.lib.cairo/18342 > > > 3) For executable stack caused by assemble file, it should be able > > to solved by adding ".section .note.GNU-stack .previous", according > > https://www.redhat.com/archives/fedora-devel-list/2005-March/msg00460. > > html . In fact, same solution already exist in current > > pixman-arm-neon-asm.S inside pixman code. > > Yes sure. I did not say that it can't be solved :) It's just better to > address this particular problem in the next revision of ssse3 patch. > > -- > Best regards, > Siarhei Siamashka _______________________________________________ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman