Re: [Bench!][Mir] +54%..+185% performance boost for Mersenne Twister.

2017-01-08 Thread Joseph Rushton Wakeling via Digitalmars-d
On Sunday, 8 January 2017 at 13:16:29 UTC, Joseph Rushton 
Wakeling wrote:
I'm asking for eyes on the problem because reducing it to a 
minimal example appears non-trivial, while the bug itself looks 
serious beyond its effect on this PR.


I underestimated myself :-P  Minimal example is as follows:

/
struct Inner
{
uint value = void;  // <=== removing this `void` initializer
// removes the problem
}

struct Outer
{
Inner inner = defaultInner();

static Inner defaultInner()
{
if (!__ctfe) assert(false);
Inner inn;
inn.value = 23;
return inn;
}
}

void main()
{
import std.stdio : writeln;
Outer outer;
outer.inner.writeln;  // outputs `Inner(0)` with dmd
  // but should be `Inner(23)`
  // ldc gets it right ;-)
}
/

So, the bug comes down to priority of default initialization.  
I'll follow an issue later today and fix the PR accordingly.


Re: [Bench!][Mir] +54%..+185% performance boost for Mersenne Twister.

2017-01-08 Thread Joseph Rushton Wakeling via Digitalmars-d
On Sunday, 8 January 2017 at 02:51:51 UTC, Andrei Alexandrescu 
wrote:
This indicates a compiler bug in dmd itself, so the best 
outcome for this library work would be a reduced compiler bug + 
a simple library workaround. -- Andrei


Yes, that much is clear -- sorry if I wasn't clear enough myself. 
 I'm asking for eyes on the problem because reducing it to a 
minimal example appears non-trivial, while the bug itself looks 
serious beyond its effect on this PR.


For what it's worth:

  * it appears to be a backend issue, not frontend (using dmd 
with the same

frontend as the latest ldc doesn't make a difference);

  * creating an `enum` manifest constant from the CTFE function 
works fine,
but assigning to the runtime struct field from that enum 
produces the

same mostly-zeros result;

  * it definitely doesn't appear to be something as trivial as 
just default
initializing a struct instance (or a struct instance 
including a static

array of more than a certain size) being the problem.

I will keep trying to reduce it to a minimal example, but 
CTFE-related backend bugs are not something I have a lot of 
experience with tracking down, hence the request for help.


Would filing an issue still be helpful at this stage, even 
without a minimal example?


Re: [Bench!][Mir] +54%..+185% performance boost for Mersenne Twister.

2017-01-07 Thread Andrei Alexandrescu via Digitalmars-d

On 1/7/17 6:16 PM, Joseph Rushton Wakeling wrote:


The method I developed works fine with LDC, but fails with DMD: the
internal state of the generator winds up as all zeros, except for the
`State.z` parameter which mysteriously ends up at the correct value.
This would suggest that somehow the generated state is being overwritten
_after_ being correctly generated, rather than not correctly generated
in the first place.

Any chance that someone with more CTFE-fu than myself could take a
glance and advise what might be wrong (or how to work around it)?


This indicates a compiler bug in dmd itself, so the best outcome for 
this library work would be a reduced compiler bug + a simple library 
workaround. -- Andrei


Re: [Bench!][Mir] +54%..+185% performance boost for Mersenne Twister.

2017-01-07 Thread Joseph Rushton Wakeling via Digitalmars-d
On Saturday, 26 November 2016 at 20:13:36 UTC, Andrei 
Alexandrescu wrote:

On 11/26/16 11:31 AM, Ilya Yaroshenko wrote:

Hey,

32-bit Mt19937 random number Generator is default in Phobos.
It is default in Mir too, except that 64-bit targets use 
64-bit Mt19937 instead.


Congrats! Also thanks for using the Boost license which would 
allow backporting the improvements to Phobos. Who'd be up for 
it?


PR is now on Phobos, and in the process of looking at this 
together, Ilya and I resolved a number of issues both in this 
code and in mir.random:

https://github.com/dlang/phobos/pull/5011

There is unfortunately one serious blocker to this PR.  In order 
to allow it to be a drop-in replacement for the previous Phobos 
implementation, I had to find a means to default-initialize its 
internal state at compile time so that it would match what would 
be there if it was explicitly seeded with the default seed.


The method I developed works fine with LDC, but fails with DMD: 
the internal state of the generator winds up as all zeros, except 
for the `State.z` parameter which mysteriously ends up at the 
correct value.  This would suggest that somehow the generated 
state is being overwritten _after_ being correctly generated, 
rather than not correctly generated in the first place.


Any chance that someone with more CTFE-fu than myself could take 
a glance and advise what might be wrong (or how to work around 
it)?


Re: [Bench!][Mir] +54%..+185% performance boost for Mersenne Twister.

2016-12-14 Thread Ilya Yaroshenko via Digitalmars-d
On Wednesday, 14 December 2016 at 00:18:06 UTC, Joseph Rushton 
Wakeling wrote:
Cool.  Last question: IIUC you use the private `_z` parameter 
as a cache for the most recent `data[index]` value (and AFAICT 
that's the only use it has).  Is there a good reason for doing 
this, rather than just setting `z = data[index]` inside 
`opCall` (where `z` is the local parameter inside the method)?


Because memory access locality. z is always hot.


Re: [Bench!][Mir] +54%..+185% performance boost for Mersenne Twister.

2016-12-14 Thread Joseph Rushton Wakeling via Digitalmars-d
On Saturday, 26 November 2016 at 16:31:40 UTC, Ilya Yaroshenko 
wrote:
1. Improve RNG generation performance by making code more 
friendly for CPU pipelining. Tempering (finalization) 
operations was mixed with internal payload update operations.


A note on this.  The `opCall` (or, in the range version, 
`popFront`) of Ilya's implementation mixes together two 
superficially independent actions:


  (1) calculating the current random variate from the current 
index

  of the internal state array;

  (2) updating the current index of the internal state array, and
  moving to the next entry.

It's straightforward to split out these two procedures into two 
separate methods (or at least two clearly separated sequences 
within the `opCall`), but doing so results in a notable 
performance hit (on my machine, something in the order of 1 GB/s 
less random bits).


Intertwining these steps in this way is therefore a very smart 
optimization (although TBH it feels a little worrying that it's 
necessary).


Re: [Bench!][Mir] +54%..+185% performance boost for Mersenne Twister.

2016-12-13 Thread Joseph Rushton Wakeling via Digitalmars-d
On Tuesday, 13 December 2016 at 23:18:26 UTC, Ilya Yaroshenko 
wrote:
  * @Ilya, is this implementation your own design, or do you 
have a reference

for the rationale behind this revised implementation of MT?


My own.


Congratulations, then -- I think this is a very interesting 
rewrite of the algorithm and I would encourage you to write up a 
paper on your choice of optimizations, if that is of interest to 
you.


  * Is there a particular reason why the `index` variable is a 
`Uint` (i.e.
the word type of the generator) rather than a `size_t`?  I 
presume there's
a motivation, since the casting back and forth in `opCall` 
would otherwise

seem inconvenient.


If I remember correctly it is used with Uint, so I use the same 
type.


Not anywhere that I can find.  `n`, the template parameter from 
which `this.index` is initialized, is a `size_t`, while the value 
used inside `opCall` is a `sizediff_t`.


TBH I assumed it was related to the bit-packing of the struct, 
rather than any question of usage.


  * Is there a motivation for the reverse iteration through 
the RNG state array?


Yes, it composes add/sub with cmp operations.


Cool.  Last question: IIUC you use the private `_z` parameter as 
a cache for the most recent `data[index]` value (and AFAICT 
that's the only use it has).  Is there a good reason for doing 
this, rather than just setting `z = data[index]` inside `opCall` 
(where `z` is the local parameter inside the method)?


I ask because it doesn't seem to make any practical difference to 
the benchmarking outcomes, and you have to look up `data[index]` 
anyway when you write it, so it's not obvious to me that it 
really saves anything in practice.  (I confess I haven't looked 
at the assembly, so I might be missing something.)


Thanks for the answers, and apologies for my slow follow-up on 
some of the things we discussed earlier related to mir.random -- 
it's been a bit of a busy patch ... :-\


Re: [Bench!][Mir] +54%..+185% performance boost for Mersenne Twister.

2016-12-13 Thread Ilya Yaroshenko via Digitalmars-d


If I remember correctly it is used with Using, so I use the 
same type.


Using -> Uint
Sorry, it is phone keyboard


Re: [Bench!][Mir] +54%..+185% performance boost for Mersenne Twister.

2016-12-13 Thread Ilya Yaroshenko via Digitalmars-d
On Tuesday, 13 December 2016 at 18:15:25 UTC, Joseph Rushton 
Wakeling wrote:
On Saturday, 26 November 2016 at 20:13:36 UTC, Andrei 
Alexandrescu wrote:
Congrats! Also thanks for using the Boost license which would 
allow backporting the improvements to Phobos. Who'd be up for 
it?


I've finally found a moment to look into this (I'm at home 
recovering from a seasonal virus, which ironically puts some 
time and mental space in my hands).


It makes for an interesting comparison.  The traditional 
Mersenne Twister algorithm generates N random words in one go, 
and then iterates over them, before generating another N random 
words, and so on.  (Word == unsigned integer type.)


By contrast Ilya's implementation just updates the single next 
entry in the static array that stores the state of the engine.  
This would seem to work out cheaper on average: perhaps because 
the update-a-variate code stays hotter, perhaps because it 
involves less branching and looping.  It certainly makes for a 
simpler, easier to follow update.


However, it might be a good idea to benchmark this in a context 
where there are a lot of random variates being generated, but 
where other things are also happening that might render the 
update code less "hot" than a straightforward `foreach` loop 
benchmark.  (Maybe try it out with a Monte Carlo simulation 
...?)
 The tradeoff between many cheap updates and one expensive one, 
versus all updates being a little more expensive but less on 
average, might not be viable in some practical use-cases.


The Phobos implementation also includes a check on whether the 
generator is properly seeded, which is performed in every 
`popFront()`, which has a reasonably significant impact.  
Ilya's implementation can get away with not performing that 
check because it uses `@disable this();` to remove the 
possibility of initializing the generator without it being 
properly seeded.  The impact of that check can possibly be 
lessened by making it a boolean comparison rather than the 
existing `size_t` equality comparison.


  -- action item here: could we deprecate `this()` in Phobos as 
a precursor to removing
 the opportunity to instantiate a generator without 
properly initializing it?


I'm going to try to put together a range-based version to see 
if this also makes any difference.  I'll post some benchmarks 
of my own once that's done, and if all looks good I'll try to 
put a Phobos PR together.


A few questions -- not blockers, but nice to understand:

  * @Ilya, is this implementation your own design, or do you 
have a reference

for the rationale behind this revised implementation of MT?


My own.

  * Is there a particular reason why the `index` variable is a 
`Uint` (i.e.
the word type of the generator) rather than a `size_t`?  I 
presume there's
a motivation, since the casting back and forth in `opCall` 
would otherwise

seem inconvenient.


If I remember correctly it is used with Using, so I use the same 
type.


  * Is there a motivation for the reverse iteration through the 
RNG state array?


Yes, it composes add/sub with cmp operations.

Ilya


Re: [Bench!][Mir] +54%..+185% performance boost for Mersenne Twister.

2016-12-13 Thread Joseph Rushton Wakeling via Digitalmars-d
On Tuesday, 13 December 2016 at 18:15:25 UTC, Joseph Rushton 
Wakeling wrote:
I'm going to try to put together a range-based version to see 
if this also makes any difference.  I'll post some benchmarks 
of my own once that's done, and if all looks good I'll try to 
put a Phobos PR together.


Benchmark is here:
https://github.com/WebDrake/mersenne-twister-range

Overall these benchmarks seem to have reasonably wide variance.  
I note that the original benchmark's attempt to make the code 
"hot" had quite a strong impact in terms of the ordering of 
individual benchmarks; I've tried to work around that by doing 
the "make it hot" loop immediately before the benchmark.


This is still just a plain old foreach, so it may be worth 
exploring some benchmarks where other stuff is done with the 
random variates.  As a first step I've added some benchmarks 
using `uniform`.


Sample output from 2 different runs of these benchmarks:

-
Running ./mir-mt-range
opCall Mir 32-bit Mt19937: 6.68058 Gb/s
check sum = 429472035921730457

range Mir 32-bit Mt19937: 6.639 Gb/s
check sum = 429472031909629874

opCall Mir 64-bit Mt19937: 14.0351 Gb/s
check sum = 5606740663277085587

range Mir 64-bit Mt19937: 13.8229 Gb/s
check sum = 3636092631268073456

Phobos 32-bit Mt19937: 4.22164 Gb/s
check sum = 429472035921730457

Phobos 64-bit Mt19937: wrong initialization and tempering

range Mir 32-bit Mt19937: 95.4198 U(0.0, 1.0) variates/s

range Mir 64-bit Mt19937: 87.184 U(0.0, 1.0) variates/s

Phobos 32-bit Mt19937: 80.8407 U(0.0, 1.0) variates/s



Running ./mir-mt-range
opCall Mir 32-bit Mt19937: 6.36183 Gb/s
check sum = 429472035921730457

range Mir 32-bit Mt19937: 6.54397 Gb/s
check sum = 429472031909629874

opCall Mir 64-bit Mt19937: 13.7339 Gb/s
check sum = 5606740663277085587

range Mir 64-bit Mt19937: 13.8229 Gb/s
check sum = 3636092631268073456

Phobos 32-bit Mt19937: 4.18848 Gb/s
check sum = 429472035921730457

Phobos 64-bit Mt19937: wrong initialization and tempering

range Mir 32-bit Mt19937: 95.2381 U(0.0, 1.0) variates/s

range Mir 64-bit Mt19937: 97.0874 U(0.0, 1.0) variates/s

Phobos 32-bit Mt19937: 88.1057 U(0.0, 1.0) variates/s



Re: [Bench!][Mir] +54%..+185% performance boost for Mersenne Twister.

2016-12-13 Thread Joseph Rushton Wakeling via Digitalmars-d
On Saturday, 26 November 2016 at 20:13:36 UTC, Andrei 
Alexandrescu wrote:
Congrats! Also thanks for using the Boost license which would 
allow backporting the improvements to Phobos. Who'd be up for 
it?


I've finally found a moment to look into this (I'm at home 
recovering from a seasonal virus, which ironically puts some time 
and mental space in my hands).


It makes for an interesting comparison.  The traditional Mersenne 
Twister algorithm generates N random words in one go, and then 
iterates over them, before generating another N random words, and 
so on.  (Word == unsigned integer type.)


By contrast Ilya's implementation just updates the single next 
entry in the static array that stores the state of the engine.  
This would seem to work out cheaper on average: perhaps because 
the update-a-variate code stays hotter, perhaps because it 
involves less branching and looping.  It certainly makes for a 
simpler, easier to follow update.


However, it might be a good idea to benchmark this in a context 
where there are a lot of random variates being generated, but 
where other things are also happening that might render the 
update code less "hot" than a straightforward `foreach` loop 
benchmark.  (Maybe try it out with a Monte Carlo simulation ...?) 
 The tradeoff between many cheap updates and one expensive one, 
versus all updates being a little more expensive but less on 
average, might not be viable in some practical use-cases.


The Phobos implementation also includes a check on whether the 
generator is properly seeded, which is performed in every 
`popFront()`, which has a reasonably significant impact.  Ilya's 
implementation can get away with not performing that check 
because it uses `@disable this();` to remove the possibility of 
initializing the generator without it being properly seeded.  The 
impact of that check can possibly be lessened by making it a 
boolean comparison rather than the existing `size_t` equality 
comparison.


  -- action item here: could we deprecate `this()` in Phobos as a 
precursor to removing
 the opportunity to instantiate a generator without properly 
initializing it?


I'm going to try to put together a range-based version to see if 
this also makes any difference.  I'll post some benchmarks of my 
own once that's done, and if all looks good I'll try to put a 
Phobos PR together.


A few questions -- not blockers, but nice to understand:

  * @Ilya, is this implementation your own design, or do you have 
a reference

for the rationale behind this revised implementation of MT?

  * Is there a particular reason why the `index` variable is a 
`Uint` (i.e.
the word type of the generator) rather than a `size_t`?  I 
presume there's
a motivation, since the casting back and forth in `opCall` 
would otherwise

seem inconvenient.

  * Is there a motivation for the reverse iteration through the 
RNG state array?


Re: [Bench!][Mir] +54%..+185% performance boost for Mersenne Twister.

2016-11-29 Thread Ilya Yaroshenko via Digitalmars-d

On Tuesday, 29 November 2016 at 16:54:55 UTC, Nordlöw wrote:
On Saturday, 26 November 2016 at 16:31:40 UTC, Ilya Yaroshenko 
wrote:

Bench results:

mir.random 32-bit Mt19937:
6.80851 Gb/s


Does Gb mean Gigabytes or Gigabits?


Gigabits


Re: [Bench!][Mir] +54%..+185% performance boost for Mersenne Twister.

2016-11-29 Thread Nordlöw via Digitalmars-d
On Saturday, 26 November 2016 at 16:31:40 UTC, Ilya Yaroshenko 
wrote:

Bench results:

mir.random 32-bit Mt19937:
6.80851 Gb/s


Does Gb mean Gigabytes or Gigabits?


Re: [Bench!][Mir] +54%..+185% performance boost for Mersenne Twister.

2016-11-28 Thread Steven Schveighoffer via Digitalmars-d

On 11/27/16 8:58 AM, Guillaume Piolat wrote:

On Sunday, 27 November 2016 at 13:35:48 UTC, Andrei Alexandrescu wrote:



A useful intermediate step is to have these "[shared] static this" ctor
call a function instead, so that programs without druntime can call them
too.


That would be progress.



Same story for core.cpuid which is initialized by a shared static
constructor.
https://github.com/dlang/druntime/blob/master/src/core/cpuid.d

It seems the "shared static this" is allowed to break immutable but a
function wouldn't. So maybe the variables here should be private
__gshared?.


How we do a similar thing for core.time:

https://github.com/dlang/druntime/blob/master/src/core/time.d#L2388

This function is called from the druntime startup (before any static ctors).

Not sure if it helps or not.

Note that we can cheat the type system here because we know that nothing 
else will access the data before it's initialized. There is also a check 
in the function to ensure it's only done once.


-Steve


Re: [Bench!][Mir] +54%..+185% performance boost for Mersenne Twister.

2016-11-28 Thread Jacob Carlborg via Digitalmars-d

On 2016-11-27 14:28, Guillaume Piolat wrote:


Without druntime, global ctor/dtor and TLS can't be used too.


Isn't TLS initialized by the OS?

--
/Jacob Carlborg


Re: [Bench!][Mir] +54%..+185% performance boost for Mersenne Twister.

2016-11-27 Thread jmh530 via Digitalmars-d
On Sunday, 27 November 2016 at 06:46:20 UTC, Ilya Yaroshenko 
wrote:


LAPACK API will be a part of Mir GLAS
https://github.com/libmir/mir-glas --Ilya


Great. I might have asked before, apologies if so.


Re: [Bench!][Mir] +54%..+185% performance boost for Mersenne Twister.

2016-11-27 Thread Ilya Yaroshenko via Digitalmars-d
On Sunday, 27 November 2016 at 13:58:29 UTC, Guillaume Piolat 
wrote:
On Sunday, 27 November 2016 at 13:35:48 UTC, Andrei 
Alexandrescu wrote:


A useful intermediate step is to have these "[shared] static 
this" ctor
call a function instead, so that programs without druntime 
can call them

too.


That would be progress.


Andrei


Same story for core.cpuid which is initialized by a shared 
static constructor.

https://github.com/dlang/druntime/blob/master/src/core/cpuid.d

It seems the "shared static this" is allowed to break immutable 
but a function wouldn't. So maybe the variables here should be 
private __gshared?.


They are in Mir CPUID, but methods are not pure.

https://github.com/libmir/mir-cpuid/blob/master/source/cpuid/x86_any.d


Re: [Bench!][Mir] +54%..+185% performance boost for Mersenne Twister.

2016-11-27 Thread Guillaume Piolat via Digitalmars-d
On Sunday, 27 November 2016 at 13:35:48 UTC, Andrei Alexandrescu 
wrote:


A useful intermediate step is to have these "[shared] static 
this" ctor
call a function instead, so that programs without druntime can 
call them

too.


That would be progress.


Andrei


Same story for core.cpuid which is initialized by a shared static 
constructor.

https://github.com/dlang/druntime/blob/master/src/core/cpuid.d

It seems the "shared static this" is allowed to break immutable 
but a function wouldn't. So maybe the variables here should be 
private __gshared?.


Re: [Bench!][Mir] +54%..+185% performance boost for Mersenne Twister.

2016-11-27 Thread Andrei Alexandrescu via Digitalmars-d

On 11/27/2016 08:28 AM, Guillaume Piolat wrote:

On Saturday, 26 November 2016 at 20:13:36 UTC, Andrei Alexandrescu wrote:


Congrats! Also thanks for using the Boost license which would allow
backporting the improvements to Phobos. Who'd be up for it?

Also I'm thinking of removing std.random's dependency on druntime,
e.g. by removing the uses of enforce. Thoughts?

Andrei


Without druntime, global ctor/dtor and TLS can't be used too.
Besides nothrow @nogc, std.random relies on a global ctor for creating
the default RNG, which is used liberally.


I see, so TLS is a problem.


A useful intermediate step is to have these "[shared] static this" ctor
call a function instead, so that programs without druntime can call them
too.


That would be progress.


Andrei



Re: [Bench!][Mir] +54%..+185% performance boost for Mersenne Twister.

2016-11-27 Thread Guillaume Piolat via Digitalmars-d
On Saturday, 26 November 2016 at 20:13:36 UTC, Andrei 
Alexandrescu wrote:


Congrats! Also thanks for using the Boost license which would 
allow backporting the improvements to Phobos. Who'd be up for 
it?


Also I'm thinking of removing std.random's dependency on 
druntime, e.g. by removing the uses of enforce. Thoughts?


Andrei


Without druntime, global ctor/dtor and TLS can't be used too.
Besides nothrow @nogc, std.random relies on a global ctor for 
creating the default RNG, which is used liberally.
A useful intermediate step is to have these "[shared] static 
this" ctor call a function instead, so that programs without 
druntime can call them too.


Re: [Bench!][Mir] +54%..+185% performance boost for Mersenne Twister.

2016-11-26 Thread Ilya Yaroshenko via Digitalmars-d

On Sunday, 27 November 2016 at 02:28:06 UTC, jmh530 wrote:
On Saturday, 26 November 2016 at 16:31:40 UTC, Ilya Yaroshenko 
wrote:


https://github.com/libmir :

 - mir - Generic numeric library
 - dcv - D Computer Vision library
 - mir-glas - Linear Algebra Subprograms (written in D)
 - mir-random - random numbers generators, includes 
non-uniform distributions.

 - mir-cpuid - CPU identification

Future:
 - mir-runtime - lightweight always inlined `nothrow @nogc` 
Dlang Runtime

 - mir-neural - neural network library [WIP]
 - mir-fft - Fast and multidimensional FFT
 - mir-svm - support vector machines

Cheers,
Ilya & Mir Team


How does a mir-lapack fit in to the plan going forward?


LAPACK API will be a part of Mir GLAS
https://github.com/libmir/mir-glas --Ilya


Re: [Bench!][Mir] +54%..+185% performance boost for Mersenne Twister.

2016-11-26 Thread jmh530 via Digitalmars-d
On Saturday, 26 November 2016 at 16:31:40 UTC, Ilya Yaroshenko 
wrote:


https://github.com/libmir :

 - mir - Generic numeric library
 - dcv - D Computer Vision library
 - mir-glas - Linear Algebra Subprograms (written in D)
 - mir-random - random numbers generators, includes non-uniform 
distributions.

 - mir-cpuid - CPU identification

Future:
 - mir-runtime - lightweight always inlined `nothrow @nogc` 
Dlang Runtime

 - mir-neural - neural network library [WIP]
 - mir-fft - Fast and multidimensional FFT
 - mir-svm - support vector machines

Cheers,
Ilya & Mir Team


How does a mir-lapack fit in to the plan going forward?


Re: [Bench!][Mir] +54%..+185% performance boost for Mersenne Twister.

2016-11-26 Thread Joseph Rushton Wakeling via Digitalmars-d
On Saturday, 26 November 2016 at 20:13:36 UTC, Andrei 
Alexandrescu wrote:
Also I'm thinking of removing std.random's dependency on 
druntime, e.g. by removing the uses of enforce. Thoughts?


There's no strong reason for those checks to be done via 
`enforce` except for a design decision that user input should be 
treated as external input.  The user is in complete control of 
those parameters, there's no reason they can't be `assert` checks.




Re: [Bench!][Mir] +54%..+185% performance boost for Mersenne Twister.

2016-11-26 Thread Ilya Yaroshenko via Digitalmars-d
On Saturday, 26 November 2016 at 20:13:36 UTC, Andrei 
Alexandrescu wrote:

On 11/26/16 11:31 AM, Ilya Yaroshenko wrote:

Hey,

32-bit Mt19937 random number Generator is default in Phobos.
It is default in Mir too, except that 64-bit targets use 
64-bit Mt19937

instead.

The last Mir Random beta improves performance for Mt19937.

The goal was to:


Congrats! Also thanks for using the Boost license which would 
allow backporting the improvements to Phobos. Who'd be up for 
it?


Yep, I am using Phobos parts and we can do backports. In other 
hand it is requires efforts (I have experience with ndslice, and 
I will be happy to remove ndslice from Mir after LDC 1.2.0 
release).


Also I'm thinking of removing std.random's dependency on 
druntime, e.g. by removing the uses of enforce. Thoughts?


Andrei


Another one DRuntime dependency is core time.

BTW, I would be happy to have core.time + std.datetime as unified 
nothrow @nogc library without DRuntime dependency (ahah, it is 
BetterC propaganda!)


Ilya



Re: [Bench!][Mir] +54%..+185% performance boost for Mersenne Twister.

2016-11-26 Thread Andrei Alexandrescu via Digitalmars-d

On 11/26/16 11:31 AM, Ilya Yaroshenko wrote:

Hey,

32-bit Mt19937 random number Generator is default in Phobos.
It is default in Mir too, except that 64-bit targets use 64-bit Mt19937
instead.

The last Mir Random beta improves performance for Mt19937.

The goal was to:


Congrats! Also thanks for using the Boost license which would allow 
backporting the improvements to Phobos. Who'd be up for it?


Also I'm thinking of removing std.random's dependency on druntime, e.g. 
by removing the uses of enforce. Thoughts?


Andrei