Re: Vectorization: Loop peeling with misaligned support.

Ondřej Bílka Sat, 16 Nov 2013 03:46:50 -0800

On Sat, Nov 16, 2013 at 11:37:36AM +0100, Richard Biener wrote:
> "Ondřej Bílka" <nel...@seznam.cz> wrote:
> >On Fri, Nov 15, 2013 at 09:17:14AM -0800, Hendrik Greving wrote:
> 
> IIRC what can still be seen is store-buffer related slowdowns when you have a 
> big unaligned store load in your loop.  Thus aligning stores still pays back 
> last time I measured this.


Then send your benchmark. What I did is a loop that stores 512 bytes. Unaligned 
stores there are faster than aligned ones, so tell me when aligning stores pays 
for itself. Note that when counting what fills the store buffer you must also 
take into account the extra stores needed to make the loop aligned.
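
For readers who want to repeat this kind of measurement, here is a minimal
sketch of a 512-byte store loop timed at a chosen misalignment. It is not the
benchmark referred to above; the names BLOCK, at_offset and time_stores, the
16-byte boundary, and the use of clock() are all assumptions made for
illustration. The compiler is free to vectorize or peel the inner loop
itself, so the numbers only mean something together with the exact flags used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

enum { BLOCK = 512 };  /* bytes stored per pass, as in the 512-byte test */

/* Return a pointer into buf that lies `offset` bytes past a 16-byte
 * boundary. buf must have at least BLOCK + 32 bytes so that the block
 * still fits after the adjustment. (Hypothetical helper.) */
unsigned char *at_offset(unsigned char *buf, size_t offset)
{
    assert(offset < 16);
    uintptr_t addr = (uintptr_t)buf;
    uintptr_t aligned = (addr + 15) & ~(uintptr_t)15;
    return buf + (aligned - addr) + offset;
}

/* Store BLOCK bytes at dst, `reps` times, and return the elapsed CPU
 * time in seconds as reported by clock(). */
double time_stores(unsigned char *dst, long reps)
{
    volatile unsigned char sink = 0;
    clock_t start = clock();
    for (long r = 0; r < reps; r++) {
        for (size_t i = 0; i < BLOCK; i++)
            dst[i] = (unsigned char)r;
        sink = dst[r % BLOCK];  /* read back so the stores stay observable */
    }
    clock_t end = clock();
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_stores(at_offset(buf, 0), reps) and time_stores(at_offset(buf, 1),
reps) on the same buffer gives one aligned and one misaligned timing to
compare; a large reps value is needed because clock() has coarse resolution.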

Also, what do you do with loops that contain no store? If I modify the test to

int set(int *p, int *q){
  int i;
  int sum = 0;
  for (i=0; i < 128; i++)
     sum += 42 * p[i];
  return sum;
}

then it still peels the loop to align the loads.
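
To make "aligning" concrete, the following is a hand-written sketch of what
peeling for alignment amounts to on a load-only reduction like this one. It is
an illustration only, not the code gcc emits; the function name sum_peeled,
the 16-byte boundary and the block size of four ints are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Sum 42 * p[i] over n ints, structured the way a peeled loop is:
 * scalar prologue, aligned main loop, scalar epilogue. */
int sum_peeled(const int *p, int n)
{
    assert(n >= 0);
    int i = 0;
    int sum = 0;

    /* Prologue: scalar iterations until p + i sits on a 16-byte boundary.
     * These are the extra iterations that peeling adds. */
    while (i < n && ((uintptr_t)(p + i) & 15) != 0) {
        sum += 42 * p[i];
        i++;
    }

    /* Main loop: blocks of four ints starting at an aligned address,
     * which is where a vectorizer could use aligned vector loads. */
    for (; i + 4 <= n; i += 4) {
        sum += 42 * p[i];
        sum += 42 * p[i + 1];
        sum += 42 * p[i + 2];
        sum += 42 * p[i + 3];
    }

    /* Epilogue: leftover elements that do not fill a whole block. */
    for (; i < n; i++)
        sum += 42 * p[i];

    return sum;
}
```

The prologue and epilogue are pure overhead in code size and branches; with no
store in the loop, the only thing they buy is aligned loads in the main loop.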

There may be a threshold beyond which aligning the buffer makes sense; then
you need to show that the loop spends most of its time on sizes above that
threshold.

Also, do you have data on how common store-buffer slowdowns are? Without
knowing that, you risk making a few loops faster at the expense of the
majority, which could well slow the whole application down. It would not
surprise me, as these loops may run mostly on L1 cache data (an assumption
at about the same level as assuming that the increased code size fits into
the instruction cache).


Actually, these questions could be answered by a test: first compile
SPEC2006 with vanilla gcc -O3, and then with a gcc that contains a patch to
use unaligned loads. The results will then tell whether peeling is also good
in practice or not.
