http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57571
Bug ID: 57571
Summary: linux kernel function memcpy() execute with low efficiency on Intel Ivybridge platform
Product: gcc
Version: 4.7.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: yiyi8761 at gmail dot com

OS type: OpenSuse 12.3 or SUSE 11 SP2
CPU type: Intel Ivybridge i7-3612QE or Intel Ivybridge i7-3615QE
GCC Ver: 4.7.2 (OpenSuse 12.3) or 4.3.4 (SUSE 11 SP2)

GCC 4.7.2 Configured with: ../configure --prefix=/usr --infodir=/usr/share/info --mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64 --enable-languages=c,c++,objc,fortran,obj-c++,java,ada --enable-checking=release --with-gxx-include-dir=/usr/include/c++/4.7 --enable-ssp --disable-libssp --disable-libitm --disable-plugin --with-bugurl=http://bugs.opensuse.org/ --with-pkgversion='SUSE Linux' --disable-libgcj --disable-libmudflap --with-slibdir=/lib64 --with-system-zlib --enable-__cxa_atexit --enable-libstdcxx-allocator=new --disable-libstdcxx-pch --enable-version-specific-runtime-libs --enable-linker-build-id --program-suffix=-4.7 --enable-linux-futex --without-system-libunwind --with-arch-32=i586 --with-tune=generic --build=x86_64-suse-linux

GCC 4.3.4 Configured with: ../configure --prefix=/usr --infodir=/usr/share/info --mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64 --enable-languages=c,c++,objc,fortran,obj-c++,java,ada --enable-checking=release --with-gxx-include-dir=/usr/include/c++/4.7 --enable-ssp --disable-libssp --disable-libitm --disable-plugin --with-bugurl=http://bugs.opensuse.org/ --with-pkgversion='SUSE Linux' --disable-libgcj --disable-libmudflap --with-slibdir=/lib64 --with-system-zlib --enable-__cxa_atexit --enable-libstdcxx-allocator=new --disable-libstdcxx-pch --enable-version-specific-runtime-libs --enable-linker-build-id --program-suffix=-4.7 --enable-linux-futex --without-system-libunwind --with-arch-32=i586 --with-tune=generic --build=x86_64-suse-linux

Description:

1. With the configurations above, the memcpy() used by the Linux kernel performs very poorly. Viewing memcpy() as disassembled code in gdb shows the following:

    (gdb) set disassembly-flavor intel
    (gdb) x/20i 0xffffffff812ca220
    0xffffffff812ca220: mov rax,rdi
    0xffffffff812ca223: mov rcx,rdx
    0xffffffff812ca226: rep movs BYTE PTR es:[rdi],BYTE PTR ds:[rsi]
    0xffffffff812ca228: ret
    0xffffffff812ca229: add eax,DWORD PTR [rbx+0x48f307e2]
    0xffffffff812ca22f: movs DWORD PTR es:[rdi],DWORD PTR ds:[rsi]
    0xffffffff812ca230: mov ecx,edx
    0xffffffff812ca232: rep movs BYTE PTR es:[rdi],BYTE PTR ds:[rsi]
    0xffffffff812ca234: ret

2. However, with the same OS (same GCC version and configuration) on the Intel Arrandale platform (i7 CPU L620), gdb shows the following disassembly for memcpy():

    (gdb) set disassembly-flavor intel
    (gdb) x/20i 0xffffffff81250e80
    0xffffffff81250e80: mov rax,rdi
    0xffffffff81250e83: mov ecx,edx
    0xffffffff81250e85: shr ecx,0x3
    0xffffffff81250e88: and edx,0x7
    0xffffffff81250e8b: rep movs QWORD PTR es:[rdi],QWORD PTR ds:[rsi]
    0xffffffff81250e8e: mov ecx,edx
    0xffffffff81250e90: rep movs BYTE PTR es:[rdi],BYTE PTR ds:[rsi]
    0xffffffff81250e92: ret

3. So, judging by these instruction sequences, memcpy() on the i7 L620 should be about eight times as efficient as on the Intel Ivy Bridge platform when the copy length is greater than 8 bytes, because it moves eight bytes per rep iteration instead of one.

4. I have already raised this with Intel and Novell; their engineers said the issue may be related to the compiler.
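
For readers comparing the two listings, the following is a minimal C sketch of the two copy strategies, not the kernel's actual implementation. The function names (copy_bytes, copy_qwords_then_tail, rep_iterations_byte, rep_iterations_qword, copy_variants_agree) are hypothetical and introduced here only for illustration. It uses size_t for the length, whereas the second listing uses 32-bit registers. The iteration counters model only the number of rep iterations behind the "eight times" estimate in item 3; they say nothing about measured throughput, which depends on the microarchitecture.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of the first listing: one byte per iteration
 * (mov rcx,rdx; rep movs BYTE). */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}

/* Hypothetical sketch of the second listing: n >> 3 eight-byte moves
 * (shr ecx,0x3; rep movs QWORD), then n & 7 single bytes
 * (and edx,0x7; rep movs BYTE). */
void *copy_qwords_then_tail(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t qwords = n >> 3;
    size_t tail = n & 7;

    while (qwords--) {
        uint64_t w;
        memcpy(&w, s, sizeof w);   /* alignment-safe 8-byte load */
        memcpy(d, &w, sizeof w);   /* alignment-safe 8-byte store */
        d += sizeof w;
        s += sizeof w;
    }
    while (tail--)
        *d++ = *s++;
    return dst;
}

/* Number of rep iterations each strategy needs for n bytes. */
size_t rep_iterations_byte(size_t n)  { return n; }
size_t rep_iterations_qword(size_t n) { return (n >> 3) + (n & 7); }

/* Returns 1 if both strategies produce identical results for every
 * length from 0 to 64 bytes, 0 otherwise. */
int copy_variants_agree(void)
{
    unsigned char src[64], a[64], b[64];
    size_t i, n;

    for (i = 0; i < sizeof src; i++)
        src[i] = (unsigned char)(i * 7 + 1);

    for (n = 0; n <= sizeof src; n++) {
        memset(a, 0, sizeof a);
        memset(b, 0, sizeof b);
        copy_bytes(a, src, n);
        copy_qwords_then_tail(b, src, n);
        if (memcmp(a, b, sizeof a) != 0 || memcmp(a, src, n) != 0)
            return 0;
    }
    return 1;
}
```

Under this model a 4096-byte copy takes 4096 iterations with the byte strategy and 512 with the qword strategy, which is where a factor of eight comes from. Whether that ratio shows up in wall-clock time would have to be measured on the hardware in question.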