------- Comment #17 from eyal at geomage dot com  2008-02-08 08:58 -------
> Using malloc instead of new does generate better code and improves performance
> slightly for me, admittedly not as much as we would like; the kernel becomes:
> (using only -O3 -S -m64 -maltivec)
> .L29:
>         lvx 13,7,9
>         lvx 12,3,9
>         vperm 1,10,13,7
>         vperm 11,9,12,8
>         lvx 0,29,9
>         vor 10,13,13
>         vor 9,12,12
>         vaddfp 1,1,11
>         vaddfp 0,0,1
>         stvx 0,29,9
>         addi 9,9,16
>         bdnz .L29
> which is as good as the vectorizer can get, iinm: peeling the loop to align the
> store (and the load from the same address), treating the other two loads as
> potentially unaligned.
> To further optimize this loop we would probably want to overlap the store with
> subsequent loads using -fmodulo-sched; perhaps the new export-ddg can help with
> that.
I was able to get about a 20% speedup in one case with malloc. I was expecting something like 2-4 times faster when vectorization is enabled.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35117