On Fri, Jul 10, 2015 at 11:37:18AM +0200, Uros Bizjak wrote:
Hello!
As I wrote at
[PATCH, libcpp]: Use asm flag outputs in search_line_sse42 main loop
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg113610.html
I won't repeat the reasons here; in summary, the current sse4.2 code is
redundant, as it has the same performance as the sse2 one.
This patch improves sse2 performance by around 10% relative to the
sse4.2 code by using a better header.
Have you tried the new SSE4.2 implementation (the one with asm flags)
with an unrolled loop?
Also, the SSE4.2 implementation looks shorter, and therefore more
I-cache friendly, so I wouldn't really call it redundant if the two are
roughly the same speed.
Jakub