A straight translation of Daniel Bernstein's C code to Go shows that the Sieve of Eratosthenes (SoE) eratspeed runs at about the same speed as the Sieve of Atkin (SoA) primespeed for the same range with the same optimizations, with John Barham's multi-threading for primespeed disabled (set numprocs = 1 in sieve.go). However, the Go primespeed is slower with array bounds checks enabled, due to the complexity of the array operations in its prime-square-free culling. For both I eliminated the "two" shifting look-up table, as it caused an extra bounds check, and tuned the inner loops in exactly the same way to minimize tight-loop time.
That alone shows that the SoA is inferior for page-segmented sieving to 1,000,000,000 (one billion): it performs only about 258,000,000 operations where SoE eratspeed performs about 405,000,000, meaning the SoA operations are individually more complex, which is obvious. However, Bernstein crippled eratspeed with only 2/3/5 wheel factorization, where it is easily possible to use a 2/3/5/7 wheel plus pre-culling of the 11/13/17 primes, reducing the count to about 263,000,000 operations — about 35% fewer. The Go FasterEratspeed that implements this maximum wheel factorization is posted on the Go Playground at: https://play.golang.org/p/8Bo04bnT5F and runs about 33% faster than the previous standard wheel-factorized version, just as the ratio of operations suggests it should; also, because it minimizes array accesses, it matters little whether array bounds checking is enabled or not. This is almost as fast as the highly optimized C++ primesieve (http://primesieve.org/) run single-threaded on my cache-access-speed-bottlenecked machine, but on faster machines it will be slower than primesieve, as much as twice as slow on high-end Intel desktop CPUs.

Actually, this implementation isn't maximally factorized: in some languages one can also pre-cull by 19, but in Go the overhead of creating and handling the larger pattern array roughly cancels the few-percent gain. So this implementation comes very close to C/C++ speeds and is faster than C#/Java. I learned much from the exercise, and found that very slight changes in the formulation of the tight inner loop(s) can slow it down by at least a factor of two, sometimes almost three. The best optimizations aren't always obvious, and it is time-consuming to try the different variations to find the fastest code.
Sometimes adding instructions actually reduces execution time, presumably because the compiler changes its register allocation in a way that causes fewer stalls or bottlenecks. For this particular implementation and Go version 1.7beta1, the key loop is optimized quite effectively in spite of the array bounds check, and better still without it.

That should be enough to show that the SoA is just an interesting intellectual exercise: for page-segmented sieves such as these it has more complexity and is thus slower than the maximally wheel-factorized SoE, and it does not sustain its theoretical O(n) asymptotic complexity, as with increasing range there is an ever-increasing cost in code overhead, whereas the SoE comes very close to maintaining its theoretical O(n log log n) performance.

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.