On May 03 20:03:43, h...@stare.cz wrote:
> > > > > > > On Apr 26 20:46:51, b...@comstyle.com wrote:
> > > > > > > > Implement SSE2 lrint() and lrintf() on amd64.
> > > > > > >
> > > > > > > I don't think this is worth the added complexity:
> > > > > > > seven more patches to have a different lrint()?
> > > > > > > Does it make the resampling noticeably better/faster?

BTW, this is what libm/arch/amd64/s_lrint.S says:

ENTRY(lrint)
	RETGUARD_SETUP(lrint, r11)
	cvtsd2si %xmm0, %rax
	RETGUARD_CHECK(lrint, r11)
	ret
END(lrint)

So isn't that already used anyway?
If so, what's the point of replacing lrint
with _mm_cvtsd_si32(_mm_load_sd(&x)) ?

	Jan

> > > > https://github.com/libsndfile/libsndfile/pull/663
> > > > -> https://quick-bench.com/q/OabKT-gEOZ8CYDriy1JEwq1lEsg
> > > > where there's a huge difference in clang builds.
> > >
> > > Sorry, I don't understand at all how this concerns
> > > the OpenBSD port of libsamplerate: the Benchmark does not
> > > mention an OS or an architecture, so what is this being run on?
> > >
> > > Anyway, just running it (Run Benchmark) gives the result
> > > of cpu_time of 722.537 for BM_d2les_array (using lrint)
> > > and cpu_time of 0 for BM_d2les_array_sse2 (using psf_lrint),
> > > reporting a speedup ratio of 200,000,000.
> > >
> > > That's not an example of what I have in mind: a simple application
> > > of libsamplerate, sped up by the usage of the new SSE2 lrint
>
> > OK, here is a test that's a modified version of what Stuart linked,
> > testing the performance of the lrint() itself (code below).
>
> A better test below, lrint()ing a random sequence.
> The SSE version is slower on every SSE2 machine I tried.
> Is that the case for you too?
>
> 	Jan
>
>
> #include <immintrin.h>
> #include <math.h>
> #include <stdlib.h>	/* calloc(), arc4random_buf() */
>
> static inline int
> psf_lrint(double const x)
> {
> 	return _mm_cvtsd_si32(_mm_load_sd(&x));
> }
>
> static void
> d2l(const double *src, long *dst, size_t len)
> {
> 	for (size_t i = 0; i < len; i++)
> 		dst[i] = lrint(src[i]);
> }
>
> static void
> d2l_sse(const double *src, long *dst, size_t len)
> {
> 	for (size_t i = 0; i < len; i++)
> 		dst[i] = psf_lrint(src[i]);
> }
>
> int
> main()
> {
> 	size_t i, len = 1000 * 1000 * 100;
> 	double *src = NULL;
> 	long *dst = NULL;
>
> 	src = calloc(len, sizeof(double));
> 	dst = calloc(len, sizeof(long));
>
> 	arc4random_buf(src, len * sizeof(double));
> 	d2l_sse(src, dst, len);
> 	/*d2l(src, dst, len);*/
>
> 	return 0;
> }
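
A side note on the two conversions being compared, as a minimal sketch and not something taken from the thread: the quoted s_lrint.S uses cvtsd2si with a 64-bit destination (%rax), while _mm_cvtsd_si32() is the 32-bit form of the same instruction. Assuming amd64 with SSE2 and the default rounding mode (round to nearest even), the two should agree for values that fit in an int and differ for larger ones. The names lrint64_sse2() and same_result() are hypothetical helpers introduced here for illustration; lrint64_sse2() only stands in for the libm routine, on the assumption that the quoted assembly is what gets used.

```c
#include <emmintrin.h>	/* SSE2: _mm_load_sd, _mm_cvtsd_si32, _mm_cvtsd_si64 */

/* Same wrapper as in the quoted test: 32-bit cvtsd2si. */
static inline int
psf_lrint(double const x)
{
	return _mm_cvtsd_si32(_mm_load_sd(&x));
}

/*
 * Hypothetical stand-in for the libm lrint() shown in s_lrint.S:
 * 64-bit cvtsd2si (amd64 only).
 */
static inline long long
lrint64_sse2(double const x)
{
	return _mm_cvtsd_si64(_mm_load_sd(&x));
}

/*
 * Hypothetical helper: returns 1 if both conversions yield the same
 * integer for x, 0 otherwise.
 */
static int
same_result(double x)
{
	return (long long)psf_lrint(x) == lrint64_sse2(x);
}
```

With this, same_result(2.5) holds (both give 2 under round to nearest even), while same_result(1e12) does not: the 64-bit form returns 1000000000000 and the 32-bit form returns the "integer indefinite" value 0x80000000. So the two are interchangeable only when the input is known to fit in 32 bits.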