On Wednesday, 10 October 2018 at 20:43:01 UTC, Kagamin wrote:
On Wednesday, 10 October 2018 at 16:15:56 UTC, Jabari Zakiya wrote:
https://gist.github.com/jzakiya/6c7e1868bd749a6b1add62e3e3b2341e

As i understand, main thread preallocates global memory and tracks it, and other threads don't track it?

Here's an abreviated elementary explanation of the algorithm and implementation. You will need to read (download) my paper "The Segmented Sieve of Zakiya (SSoZ)" because I will make reference to things in it, to keep this as short as possible.

https://www.scribd.com/doc/228155369/The-Segmented-Sieve-of-Zakiya-SSoZ

To really understand the math and theory of the algorithm requires primarily just understanding Table 3 on page 3 of the paper, as it encapsulates everything. You can read the paper to understand the language of the algorithm used in the code.

Twinprimes are 2 (odd) primes that differ by only 2 eg. 3-5, 5-7, 11-13, 29-31.

Table 3 shows all the residues values (prime candidates|pcs) and residues groups (resgroups|columns) to find all the primes upto 541 using P5 (modpg = 30, rescnt = 8).

For a given value N, it will be represented with a PG table of some number of resgroups, with max size I call Kmax (the regroups value N residues in).

Using P5, I only need to sieve primes along the residue tracks (restracks) that can produce twinprimes, here 11-13, 17-19, 29-31. Thus I create 3 byte arrays, one for each twinpair, and use the lower 2 bits to represent the upper and lower twinprime restracks.

Then for each twinpair (here 3) I run 3 thread which perform the SSoZ on the entire Kmax length of resgroups in parallel. At the end I accumulate the results and print them out. This, in a nutshell, is what the algorithm does. The paper gives you enough to understand the fundamental nature of the algorithm, though I've learned so much more than in 2014. :)

The larger the PG the more twinpair restracks (see page 14) there are to use. For larger numbers you want to use the largest PG possible that 'fits' into the hardware cpu. All my development|testing was done on laptops using Intel I5|I7 cpus with 4 or 8 threads. I'm really interested how it performs on other platforms (ARM, AMD, PowerPC, etc).


The main proc "twinprimes_ssoz" manages program flow and set as follows:

1) accepts the input number (an integer) in "val" from the cli

2) sets "num" to be first odd number < "val" if "val" even

3) calls "selectPG" with "num" to select optimum Prime Generator (PG) parameters

4) compute various working parameters per selected PG and number value (see refs)

5) compute global values for number of primes, and their values, <= sqrt(num) for
selected PG with proc "sozpg"

6) Set initial twinprime count for selected PG

7) Then with proc "segsieve" allocate one thread to perform SSoZ (segmented sieve of zakiya) on each twinprime pair residues (determined by selected PG), and count number of twinprmes
computed in each thread.

8) Then determine true total twinprime count and last twinprime <= "num"

It also is timing different intervals and prints out diagnostics and final output.


The proc "twins_sieve" is the primary function that manages and performs the SSoZ for a given twinprim pair parameters in each thread. The more threads the faster the process goes.

I'll provide more info later. I have to run now. I wanted to get this out now while I
was at my laptop, and online.

Reply via email to