On Monday, 13 February 2017 at 00:56:37 UTC, Nestor wrote:
> On Sunday, 12 February 2017 at 05:54:34 UTC, Era Scarecrow wrote:
>> Ran some more tests.
>
> Wow!
> Thanks for the interest and effort.
Certainly. But the bulk of the answer comes down to this: the 2-level versions I've already provided are probably the fastest you're going to get. We could certainly test using shorts or bytes instead, but the results will most likely only get slower.
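For anyone following along, here is a minimal sketch of what a "2-level" table could look like, assuming it means a precomputed table that consumes two digits per lookup (my reading here, not necessarily the exact code posted earlier in the thread). It uses the standard order-10 Damm table; `PAIR`, `damm_interim` and `damm_interim_pairs` are hypothetical names. Python won't show the speed difference; this only illustrates the table layout and that both versions agree.

```python
# Standard order-10 Damm operation table (zero diagonal).
DAMM = [
    [0, 3, 1, 7, 5, 9, 8, 6, 4, 2],
    [7, 0, 9, 2, 1, 5, 4, 8, 6, 3],
    [4, 2, 0, 6, 8, 7, 1, 3, 5, 9],
    [1, 7, 5, 0, 9, 8, 3, 4, 2, 6],
    [6, 1, 2, 3, 0, 4, 5, 9, 7, 8],
    [3, 6, 7, 4, 2, 0, 9, 5, 8, 1],
    [5, 8, 6, 9, 7, 2, 0, 1, 3, 4],
    [8, 9, 4, 5, 3, 6, 2, 0, 1, 7],
    [9, 4, 3, 8, 6, 1, 7, 2, 0, 5],
    [2, 5, 8, 1, 4, 3, 6, 7, 9, 0],
]

# Hypothetical 2-level table: PAIR[i][a * 10 + b] == DAMM[DAMM[i][a]][b],
# so a single lookup consumes two digits (10 rows x 100 columns).
PAIR = [[DAMM[DAMM[i][a]][b] for a in range(10) for b in range(10)]
        for i in range(10)]


def damm_interim(digits: str, start: int = 0) -> int:
    """Plain Damm: one table lookup per digit."""
    interim = start
    for ch in digits:
        interim = DAMM[interim][int(ch)]
    return interim


def damm_interim_pairs(digits: str, start: int = 0) -> int:
    """Same result as damm_interim, but two digits per lookup via PAIR."""
    interim = start
    n = len(digits)
    i = 0
    while i + 1 < n:
        interim = PAIR[interim][int(digits[i]) * 10 + int(digits[i + 1])]
        i += 2
    if i < n:  # odd length: finish with one single-digit lookup
        interim = DAMM[interim][int(digits[i])]
    return interim


print(damm_interim("572"))         # check digit for "572" → 4
print(damm_interim_pairs("5724"))  # 0 means the code is valid → 0
```

Because the table has a zero diagonal, the final interim digit is itself the check digit, and appending it drives the result back to 0.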
Note that my tests were run only on my x86 system. It would be better to also test on other architectures such as PPC and ARM, and on other operating systems such as Linux, to see how they perform and possibly tweak them as appropriate.
Still, we did find that the Damm algorithm can be optimized with some success; the gain just isn't going to be large.
Hmmm... A thought does come to mind: parallelizing the code. However, that would probably require 11 instances to get a 2x speedup (calculating the second half once for each of the 10 possible carry-over values, calculating the first half, then choosing which of the 10 second-half results to use based on the first half's output). That only really works if you have a ton of cores and the input is REALLY REALLY large, like a megabyte or so. The Damm code is mostly used to append a check digit to the end of a code such as a UPC or other barcode for error detection, so inputs longer than 32 digits are unlikely in real applications.
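A rough sketch of that split-and-select idea, assuming the standard order-10 Damm table and a simple split at the midpoint. The names are hypothetical, and the 11 runs are done one after another here rather than on separate cores, so it only demonstrates that the selected result matches the sequential one, not any speedup.

```python
# Standard order-10 Damm operation table (zero diagonal).
DAMM = [
    [0, 3, 1, 7, 5, 9, 8, 6, 4, 2],
    [7, 0, 9, 2, 1, 5, 4, 8, 6, 3],
    [4, 2, 0, 6, 8, 7, 1, 3, 5, 9],
    [1, 7, 5, 0, 9, 8, 3, 4, 2, 6],
    [6, 1, 2, 3, 0, 4, 5, 9, 7, 8],
    [3, 6, 7, 4, 2, 0, 9, 5, 8, 1],
    [5, 8, 6, 9, 7, 2, 0, 1, 3, 4],
    [8, 9, 4, 5, 3, 6, 2, 0, 1, 7],
    [9, 4, 3, 8, 6, 1, 7, 2, 0, 5],
    [2, 5, 8, 1, 4, 3, 6, 7, 9, 0],
]


def damm_interim(digits: str, start: int = 0) -> int:
    """Run the Damm chain over digits, starting from interim digit `start`."""
    interim = start
    for ch in digits:
        interim = DAMM[interim][int(ch)]
    return interim


def damm_split_select(digits: str) -> int:
    """Hypothetical split-and-select: same result as damm_interim(digits)."""
    mid = len(digits) // 2
    first, second = digits[:mid], digits[mid:]
    # These 11 evaluations don't depend on each other, so each could run
    # on its own core; here they simply run sequentially.
    first_result = damm_interim(first, 0)
    candidates = [damm_interim(second, carry) for carry in range(10)]
    # The first half's output is the carry-in for the second half,
    # so it picks which precomputed candidate is the real answer.
    return candidates[first_result]


print(damm_split_select("572"))  # → 4, same as the sequential check digit
```

The 10 candidate runs are the price of not knowing the second half's starting digit in advance, which is why this only pays off when the input is huge and cores are plentiful.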
But at this point I'm rambling.