I don't know if this is currently possible, but it would be nice to have.

It seems I could, in theory, speed up the execution of this code:
    
    
    proc segsieve(Kmax: uint, KB: int) = # for Kn resgroups|bytes in segment
      let Ks = KB                        # make default seg size immutable
      parallel:                          # perform SSoZ in parallel
        for r in 0..rescnt-1:            # for each residue track number 'r'
          let nextp_row  = r * pcnt      # set the 'nextp' table row address
          let seg_row = r * Ks           # set the 'seg' memory row address
          spawn residue_sieve(nextp_row, seg_row, Kmax, Ks, r) # do sieve for row 'r'
      sync()                             # wait for all row threads to finish
      for i in 0..rescnt-1:              # update 'primecnt' with the count of
        primecnt += cnts[i]              # segment primes for each 'seg' row
    

Here `sync()` makes the code that follows wait until all the threads have 
finished executing. In theory, overall execution could be sped up by having 
each thread asynchronously put its `cnts` value into a thread queue (FIFO), 
from which the values are extracted and added to `primecnt`. Since the number 
of `cnt` values is known (`rescnt` of them), `primecnt` can be updated as the 
values become available, until `rescnt` of them have been added. Is this 
possible now? Could it be faster?
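
To make the idea concrete, here is a minimal sketch of what I have in mind, assuming the built-in `Channel[T]` type and plain `createThread` can stand in for `spawn`. The names `results` and `residueSieve`, the value of `rescnt`, and the fake per-track counts are placeholders for illustration only, not my real sieve code.

```nim
# Sketch only: compile with threads enabled (nim c --threads:on sketch.nim).
const rescnt = 8                      # hypothetical number of residue tracks

var results: Channel[uint]            # FIFO carrying one prime count per track

proc residueSieve(r: int) {.thread.} =
  # Placeholder for the real sieve: pretend track r found r + 1 primes,
  # then push that count onto the queue instead of writing cnts[r].
  results.send(uint(r + 1))

proc segsieve(): uint =
  var workers: array[rescnt, Thread[int]]
  results.open()
  for r in 0 ..< rescnt:              # start one thread per residue track
    createThread(workers[r], residueSieve, r)
  for received in 1 .. rescnt:        # exactly rescnt counts are expected
    result += results.recv()          # blocks until the next count arrives
  joinThreads(workers)                # every count is in; clean up threads
  results.close()

echo segsieve()
```

The main thread adds each count as soon as it arrives, in whatever order the threads finish, rather than waiting for all of them and then looping over `cnts`. Since the final summation is only `rescnt` additions, I would guess any gain is small unless the threads finish at very different times, but I have not measured it.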
