http://d.puremagic.com/issues/show_bug.cgi?id=3738
--- Comment #5 from Witold Baryluk <bary...@smp.if.uj.edu.pl> 2010-01-24 12:03:20 PST --- All my tests pases on your version. 10_000_000_000 (more than all possibilities) passes (for MinstdRand0 and MinstdRand, it is possible that for other a it will fail, but i tested and done some calculation. and it should work for all a,c<m. I also tested versions with slightly changed constants (like 30 instad of 31, or int.max-1, int.max-2, int.max-3, int.max+1 in multiple comibinations an allone), and all them fail. Shouldn't const uint y by immucatable y, just like in first static if? I think that our pretty optimised line, can be further optimalised: (1 branch sub) - _x = (y >= int.max) ? ((y + 1) & int.max) : y; + _x = (y >= int.max) ? (y - int.max) : y; :) This gives: MinstdRand0: standard with tricks (1 branch sub): 15.25 inline with tricks (1 branch sub): 13.53 s MinstdRand: standard with tricks (1 branch sub): 15.26 inline with tricks (1 branch sub): 13.52 s It doesn't give much difference (maybe 2%). But i thinkg it is better to have one substraction, than one add and one binary and. I found few interesting materials about this m=2^^31-1 operation. One is http://www.firstpr.com.au/dsp/rand31/ I tested this Carta's versions: But they are only relevant for a=16807 (out Park-Miller's StdminRand0 generator), so pretty limited: Carta1 (original): uint lo = 16807*(_x & 0xFFFFu); immutable uint hi = 16807*(_x >> 16); lo += (hi & 0x7FFF) << 16; lo += (hi >> 15); _x = (lo > int.max) ? (lo - int.max) : lo; Carta3: uint lo = 16807*(_x & 0xFFFFu); immutable uint hi = 16807*(_x >> 16); lo += ((hi & 0x7FFF) << 16) + (hi >> 15); _x = (lo > int.max) ? (lo - int.max) : lo; Carta2: immutable uint hi = 16807*(_x >> 16), lo 16807*(_x & 0xFFFFu) + ((hi & 0x7FFF) << 16) + (hi >> 15); _x = (lo > int.max) ? (lo - int.max) : lo; Last lines can also be changed to branchles: _x = (lo > int.max) ? (lo - int.max) : lo; Timings: Carta code1: 22.30 s Carta code1 branchless: 18.42 Carta code3: 21.84 s Carta code3 branchless: 18.19 s Carta code2: 23.84 s Carta code2 branchless: 19.84 s So it is slower in all cases, and it is only for speciall cases (c=0, a < 2^^15). Of course answers are still correct. Performing uint*uint+uint -> ulong is just faster than their tricks, and more general, and can be reused in many other places. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------