On May 03 20:03:43, h...@stare.cz wrote:
> > > > > > > On Apr 26 20:46:51, b...@comstyle.com wrote:
> > > > > > > > Implement SSE2 lrint() and lrintf() on amd64.
> > > > > > > 
> > > > > > > I don't think this is worth the added complexity:
> > > > > > > seven more patches to have a different lrint()?
> > > > > > > Does it make the resampling noticeably better/faster?

BTW, this is what libm/arch/amd64/s_lrint.S says:

ENTRY(lrint)
        RETGUARD_SETUP(lrint, r11)
        cvtsd2si %xmm0, %rax
        RETGUARD_CHECK(lrint, r11)
        ret
END(lrint)

So isn't that already used anyway?
If so, what's the point of replacing lrint
with _mm_cvtsd_si32(_mm_load_sd(&x)) ?

        Jan



> > > > https://github.com/libsndfile/libsndfile/pull/663
> > > > -> https://quick-bench.com/q/OabKT-gEOZ8CYDriy1JEwq1lEsg
> > > > where there's a huge difference in clang builds.
> > > 
> > > Sorry, I don't understand at all how this concerns
> > > the OpenBSD port of libsamplerate: the Benchmark does not
> > > mention an OS or an architecture, so what is this being run on?
> > > 
> > > Anyway, just running it (Run Benchmark) gives the result
> > > of cpu_time of 722.537 for BM_d2les_array (using lrint)
> > > and cpu_time of 0 for BM_d2les_array_sse2 (using psf_lrint),
> > > reporting a speedup ratio of 200,000,000.
> > > 
> > > That's not an example of what I have in mind: a simple application
> > > of libsamplerate, sped up by the usage of the new SSE2 lrint
> 
> > OK, here is a test that's a modified version of what Stuart linked,
> > testing the performance of the lrint() itself (code below).
> 
> A better test below, lrint()ing a random sequence.
> The SSE version is slower on every SSE2 machine I tried.
> Is that the case for you too?
> 
>       Jan
> 
> 
> #include <immintrin.h>
> #include <math.h>
> #include <stdlib.h>	/* calloc(), arc4random_buf() */
> 
> static inline int 
> psf_lrint(double const x)
> {
>       return _mm_cvtsd_si32(_mm_load_sd(&x));
> }
> 
> static void
> d2l(const double *src, long *dst, size_t len)
> {
>       for (size_t i = 0; i < len; i++)
>               dst[i] = lrint(src[i]);
> }
> 
> static void
> d2l_sse(const double *src, long *dst, size_t len)
> {
>       for (size_t i = 0; i < len; i++)
>               dst[i] = psf_lrint(src[i]);
> }
> 
> int
> main()
> {
>       size_t i, len = 1000 * 1000 * 100;
>       double *src = NULL;
>       long *dst = NULL;
> 
>       src = calloc(len, sizeof(double));
>       dst = calloc(len, sizeof(long));
> 
>       arc4random_buf(src, len * sizeof(double));
>       d2l_sse(src, dst, len);
>       /*d2l(src, dst, len);*/
> 
>       return 0;
> }
