On Saturday, 26 November 2016 at 16:31:40 UTC, Ilya Yaroshenko wrote:
> 1. Improve RNG generation performance by making code more friendly for CPU pipelining. Tempering (finalization) operations was mixed with internal payload update operations.

A note on this. The `opCall` (or, in the range version, `popFront`) of Ilya's implementation mixes together two superficially independent actions:

  (1) calculating the current random variate from the entry at the
      current index of the internal state array;

  (2) updating that entry of the internal state array, and moving
      the index on to the next entry.

It's straightforward to split these two procedures into two separate methods (or at least two clearly separated sequences within the `opCall`), but doing so results in a notable performance hit (on my machine, something on the order of 1 GB/s less throughput of random bits).

Intertwining these steps in this way is therefore a very smart optimization (although TBH it feels a little worrying that it's necessary).
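
To make the distinction concrete, here is a minimal sketch of the two arrangements for the 32-bit MT19937 parameters. This is not Ilya's actual code: the names `MiniMT`, `temperCurrent`, `advance`, `splitNext` and `fusedNext` are made up for illustration, and the indexing uses `%` for simplicity, where a tuned implementation would avoid it. The sketch only shows that both orderings compute the same sequence. It makes no attempt to reproduce the speed difference, which will depend on compiler and hardware.

```d
import std.stdio : writeln;

struct MiniMT
{
    enum size_t n = 624, m = 397;
    enum uint upperMask = 0x8000_0000u, lowerMask = 0x7FFF_FFFFu;

    uint[n] state;
    size_t index;

    this(uint seed)
    {
        state[0] = seed;
        foreach (i; 1 .. n)
            state[i] = 1_812_433_253u * (state[i - 1] ^ (state[i - 1] >> 30))
                     + cast(uint) i;
        index = 0;
        // Prime the state: one full pass, so every entry is ready to be tempered.
        foreach (_; 0 .. n)
            advance();
    }

    // Action (1): compute the variate from the entry at the current index.
    uint temperCurrent() const
    {
        uint z = state[index];
        z ^= z >> 11;
        z ^= (z << 7) & 0x9D2C_5680u;
        z ^= (z << 15) & 0xEFC6_0000u;
        z ^= z >> 18;
        return z;
    }

    // Action (2): update the entry at the current index, move to the next one.
    void advance()
    {
        immutable next = (index + 1) % n;
        immutable conj = (index + m) % n;
        immutable uint y = (state[index] & upperMask) | (state[next] & lowerMask);
        state[index] = state[conj] ^ (y >> 1) ^ ((y & 1u) != 0 ? 0x9908_B0DFu : 0u);
        index = next;
    }

    // Split arrangement: the two actions run strictly one after the other.
    uint splitNext()
    {
        immutable r = temperCurrent();
        advance();
        return r;
    }

    // Fused arrangement: the same operations, interleaved line by line.
    uint fusedNext()
    {
        immutable next = (index + 1) % n;
        immutable conj = (index + m) % n;
        uint z = state[index];                                         // (1)
        immutable uint y = (z & upperMask) | (state[next] & lowerMask); // (2)
        z ^= z >> 11;                                                  // (1)
        uint e = state[conj] ^ (y >> 1);                               // (2)
        z ^= (z << 7) & 0x9D2C_5680u;                                  // (1)
        e ^= (y & 1u) != 0 ? 0x9908_B0DFu : 0u;                        // (2)
        z ^= (z << 15) & 0xEFC6_0000u;                                 // (1)
        state[index] = e;                                              // (2)
        z ^= z >> 18;                                                  // (1)
        index = next;                                                  // (2)
        return z;
    }
}

void main()
{
    auto a = MiniMT(5489);
    auto b = MiniMT(5489);
    foreach (_; 0 .. 10_000)
        assert(a.splitNext() == b.fusedNext());
    writeln("split and fused variants agree");
}
```

The two chains (the `z` lines and the `y`/`e` lines) don't depend on each other after the initial load, which is presumably what gives the CPU room to overlap them when they are written interleaved.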
