http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57796

Yuri Rumyantsev <ysrumyan at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ysrumyan at gmail dot com

--- Comment #2 from Yuri Rumyantsev <ysrumyan at gmail dot com> ---
This issue is not related to avx2 tuning but rather to the estimation of
vectorization profitability. Note that avx does not support "gathers", and so
the following innermost loop
                for (i=rowR; i<rowRp1; i++)
                    sum += x[ col[i] ] * val[i];
is not vectorized. I did a simple experiment and found that the iteration count
for it is 5 or 10 (for -large input), which does not look profitable for avx2
vectorization, i.e. the scalar version should execute faster. If
we slightly change this loop to
        int n = row[r+1] - row[r];
        int *col1 = col + row[r];
        double *val1 = val + row[r];
                for (i=0; i<n; i++)
                    sum += x[ col1[i] ] * val1[i];
i.e. set the lower bound to zero, the performance drop for avx2 disappears:

with avx
Sparse matmult  Mflops:  2135.59    (N=1000, nz=5000)
with avx2
Sparse matmult  Mflops:  2309.64    (N=1000, nz=5000)
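
For reference, a minimal sketch of the two loop forms side by side, assuming
CSR-style arrays (row, col, val) and double-precision values. The function
names and the tiny 3x3 matrix are hypothetical and not taken from the
benchmark source. Both col and val are offset by row[r] in the rebased form,
since equivalence with the original loop requires reading the same elements
from each array.

```c
/* Original form: the inner loop runs from row[r] to row[r+1]. */
static void spmv_orig(int nrows, const int *row, const int *col,
                      const double *val, const double *x, double *y)
{
    for (int r = 0; r < nrows; r++) {
        double sum = 0.0;
        int rowR = row[r];
        int rowRp1 = row[r + 1];
        for (int i = rowR; i < rowRp1; i++)
            sum += x[col[i]] * val[i];
        y[r] = sum;
    }
}

/* Rebased form: the inner loop runs from 0 to n, with col and val
   both offset by row[r] so that the same elements are read. */
static void spmv_rebased(int nrows, const int *row, const int *col,
                         const double *val, const double *x, double *y)
{
    for (int r = 0; r < nrows; r++) {
        double sum = 0.0;
        int n = row[r + 1] - row[r];
        const int *col1 = col + row[r];
        const double *val1 = val + row[r];
        for (int i = 0; i < n; i++)
            sum += x[col1[i]] * val1[i];
        y[r] = sum;
    }
}

/* Returns 1 if both forms agree on a small hypothetical matrix, else 0.
   Matrix (CSR):
     [1 0 2]
     [0 3 0]
     [4 0 5] */
static int spmv_forms_agree(void)
{
    const int row[] = {0, 2, 3, 5};
    const int col[] = {0, 2, 1, 0, 2};
    const double val[] = {1.0, 2.0, 3.0, 4.0, 5.0};
    const double x[] = {1.0, 1.0, 1.0};
    double y1[3], y2[3];

    spmv_orig(3, row, col, val, x, y1);
    spmv_rebased(3, row, col, val, x, y2);

    /* Both forms add the same products in the same order,
       so exact comparison is safe here. */
    for (int r = 0; r < 3; r++)
        if (y1[r] != y2[r])
            return 0;
    return 1;
}
```

The two functions differ only in how the inner loop bounds are expressed, so
any difference in generated code comes from how the compiler treats the loop,
not from the arithmetic.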
