[Issue 3738] MinstdRand0 and MinstdRand very poor performance

d-bugmail Sun, 24 Jan 2010 12:05:17 -0800

http://d.puremagic.com/issues/show_bug.cgi?id=3738




--- Comment #5 from Witold Baryluk <bary...@smp.if.uj.edu.pl> 2010-01-24 
12:03:20 PST ---
All my tests pases on your version.
10_000_000_000 (more than all possibilities) passes (for MinstdRand0 and
MinstdRand, it is possible that for other a it will fail, but i tested and done
some calculation.  and it should work for all a,c<m.
I also tested versions with slightly changed constants (like 30 instad of 31,
or int.max-1, int.max-2, int.max-3, int.max+1 in multiple comibinations an
allone), and all them fail.


Shouldn't const uint y by immucatable y, just like in first static if?


I think that our pretty optimised line, can be further optimalised: (1 branch
sub)

-             _x = (y >= int.max) ? ((y + 1) & int.max) : y;
+             _x = (y >= int.max) ? (y - int.max) : y;

:)

This gives:
MinstdRand0:
  standard with tricks (1 branch sub): 15.25
  inline with tricks (1 branch sub): 13.53 s
MinstdRand:
  standard with tricks (1 branch sub): 15.26
  inline with tricks (1 branch sub): 13.52 s

It doesn't give much difference (maybe 2%). But i thinkg it is better to have
one substraction, than one add and one binary and.



I found few interesting materials about this m=2^^31-1 operation.
One is http://www.firstpr.com.au/dsp/rand31/

I tested this Carta's versions: But they are only relevant for a=16807 (out
Park-Miller's StdminRand0 generator), so pretty limited:

Carta1 (original):
      uint lo = 16807*(_x & 0xFFFFu);
      immutable uint hi = 16807*(_x >> 16);
      lo += (hi & 0x7FFF) << 16;
      lo += (hi >> 15);
      _x = (lo > int.max) ? (lo - int.max) : lo;

Carta3:
      uint lo = 16807*(_x & 0xFFFFu);
      immutable uint hi = 16807*(_x >> 16);
      lo += ((hi & 0x7FFF) << 16) + (hi >> 15);
      _x = (lo > int.max) ? (lo - int.max) : lo;

Carta2: 
        immutable uint
           hi = 16807*(_x >> 16),
           lo 16807*(_x & 0xFFFFu) + ((hi & 0x7FFF) << 16) + (hi >> 15);
           _x = (lo > int.max) ? (lo - int.max) : lo;

Last lines can also be changed to branchles:
           _x = (lo > int.max) ? (lo - int.max) : lo;

Timings:
Carta code1: 22.30 s
Carta code1 branchless: 18.42
Carta code3: 21.84 s
Carta code3 branchless: 18.19 s
Carta code2: 23.84 s
Carta code2 branchless: 19.84 s

So it is slower in all cases, and it is only for speciall cases (c=0, a <
2^^15). Of course answers are still correct.


Performing uint*uint+uint -> ulong is just faster than their tricks, and more
general, and can be reused in many other places.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------

[Issue 3738] MinstdRand0 and MinstdRand very poor performance

Reply via email to