On Tue, Apr 23, 2013 at 11:47 AM, Andres Freund <and...@2ndquadrant.com> wrote:
> On 2013-04-23 00:17:28 -0700, Jeff Davis wrote:
>> + # important optimization flags for checksum.c
>> + ifeq ($(GCC),yes)
>> + checksum.o: CFLAGS += -msse4.1 -funroll-loops -ftree-vectorize
>> + endif
>
> I am pretty sure we can't do those unconditionally:
> - -funroll-loops and -ftree-vectorize weren't always part of gcc afair,
>   so we would need a configure check for those

-funroll-loops has been available since at least GCC 2.95.
-ftree-vectorize is GCC 4.0+. From what I read in the ICC documentation,
-axSSE4.1 should generate both a plain and an accelerated version and
choose between them with a runtime check. I don't know if ICC vectorizes
the specific loop in the patch, but I would expect it to, given that
Intel's vectorization has generally been better than GCC's and the loop
is about as simple as it gets. I don't know the relevant options for
other compilers.

> - SSE4.1 looks like a total no-go, its not available everywhere. We
>   *can* add runtime detection of that with gcc fairly easily and
>   one-time if we wan't to go there (later?) using 'ifunc's, but that
>   needs a fair amount of infrastructure work.
> - We can rely on SSE1/2 on amd64, but I think thats automatically
>   enabled there.

This is why I initially went for the lower-strength 16-bit checksum
calculation: requiring only SSE2 would have made supporting the
vectorized version on amd64 trivial. By now my feeling is that it's not
prudent to compromise on quality to save some infrastructure
complexity. If we set a hypothetical VECTORIZATION_FLAGS variable at
configure time, the performance is still there for those who need it
and can afford CPU-specific builds.

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
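
P.S. To make the VECTORIZATION_FLAGS idea a bit more concrete, this is
roughly what I have in mind. It is an untested sketch, not a patch: the
@VECTORIZATION_FLAGS@ substitution and the idea of leaving it empty
unless the builder opts in are assumptions on my part, and configure
would still have to be taught to set it.

```make
# Hypothetical: configure substitutes this. It would be empty by
# default, or something like "-msse4.1 -funroll-loops -ftree-vectorize"
# when the builder asks for a CPU-specific build.
VECTORIZATION_FLAGS = @VECTORIZATION_FLAGS@

# Apply the extra flags to the checksum code only, as in the quoted
# patch, but without hardcoding them for every GCC build.
checksum.o: CFLAGS += $(VECTORIZATION_FLAGS)
```

With an empty default, a stock build behaves as it does today, and only
those who set the variable take on the SSE4.1 requirement.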