------- Comment #17 from eyal at geomage dot com  2008-02-08 08:58 -------
> Using malloc instead of new does generate better code and improves performance
> slightly for me, admittedly not as much as we would like; the kernel becomes:
> (using only -O3 -S -m64 -maltivec)
> .L29:
>         lvx 13,7,9
>         lvx 12,3,9
>         vperm 1,10,13,7
>         vperm 11,9,12,8
>         lvx 0,29,9
>         vor 10,13,13
>         vor 9,12,12
>         vaddfp 1,1,11
>         vaddfp 0,0,1
>         stvx 0,29,9
>         addi 9,9,16
>         bdnz .L29
> which is as good as the vectorizer can get, iinm: peeling the loop to
> align the store (and the load from the same address), treating the other
> two loads as potentially unaligned.
> To further optimize this loop we would probably want to overlap the store
> with subsequent loads using -fmodulo-sched; perhaps the new export-ddg can
> help with that.

With malloc I was able to get about a 20% speedup in one case.
I was expecting something like a 2-4x speedup when vectorization is
enabled.
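
For reference, here is a minimal sketch of the kind of loop the quoted kernel
appears to implement. It is reconstructed from the assembly only and is not
the actual test case from this report: the names add3 and run_add3, the array
names, and the fill values are all hypothetical. The grouping of the additions
follows the two vaddfp instructions (the two realigned loads are added first,
then the result is added to the value loaded from the store address).

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical kernel: one array is both loaded and stored (a), the other
// two (b, c) are only loaded. In the quoted assembly the store address is
// the one aligned by peeling; b and c are realigned with vperm.
void add3(float *a, const float *b, const float *c, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        a[i] = a[i] + (b[i] + c[i]);
}

// Hypothetical driver: allocates the arrays with malloc rather than new,
// fills them with small exactly representable values, runs the kernel and
// returns the sum of the results. Returns -1.0f if an allocation fails.
float run_add3(std::size_t n)
{
    float *a = static_cast<float *>(std::malloc(n * sizeof(float)));
    float *b = static_cast<float *>(std::malloc(n * sizeof(float)));
    float *c = static_cast<float *>(std::malloc(n * sizeof(float)));
    if (!a || !b || !c) {
        std::free(a);
        std::free(b);
        std::free(c);
        return -1.0f;
    }

    for (std::size_t i = 0; i < n; ++i) {
        a[i] = 1.0f;
        b[i] = 2.0f;
        c[i] = 3.0f;
    }

    add3(a, b, c, n);

    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        sum += a[i];

    std::free(a);
    std::free(b);
    std::free(c);
    return sum;
}
```

Each iteration of such a loop does three loads and one store for only two
additions, so it may well be limited by memory traffic rather than by
arithmetic; that is a guess on my part, not something I have measured.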


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117
