At 12:52 AM 9/6/2006, Jason wrote: >| | |****************| | | |****************| >|c| | block i after | propagate |c| | block i+1 after| >|a| | itwiddle, IFFT | carries |a| | itwiddle, IFFT | >|r|--> | and IDWT | for ea --> |r|--> | and IDWT | --> ... >|r| | weights | row |r| | weights | >|y| |****************| |y| |****************| > >The point is that carries can be propagated while each block is being >processed, until you finish the last block. Then all the carries have >to "carriage return" to the next row, the last carry wraps around to >the first row like before, and the carry vector propagates 10 elements >into the first block
This is the scheme prime95 currently uses - and as you noted it introduces a data dependence. In pass 2, the blocks are truly independent and I can dole a data block to each thread as needed. Very easy and efficient. In pass 1, with the data dependence introduced by carry propagation, I can see two choices (as an example say there are 1024 data blocks and 8 threads): 1) Give each thread a contiguous set of data blocks to crunch through. Thread 1 gets blocks 0 - 127, thread 2 gets 128 - 255, etc. Two major disadvantages: a) if one CPU core is busy with something else then 7 threads finish and 6 then sit idle waiting for the 8th thread to finish. b) more memory for carry buffers and more overhead propagating final carries. 2) Have the 8 threads start on data blocks 0-7. Put a lock around the carry code such that it insures that data blocks are processed in order. The theory is that all 8 threads race to the carry code, but then the lock "spaces them apart" such that thread 1 does carries for block 0 and starts data block 9, then thread 2 starts carries for data block 1 and then starts data block 10. The major negative for this method is that there is a limit to the number of threads that can be kept busy. If the code is spending 3 units of time in the FFT code and 1 unit of time in the carry code, then only 4 threads can be kept busy this way. I lean toward the second solution. Does anyone have any real-world experiences or alternative suggestions? _______________________________________________ Prime mailing list [email protected] http://hogranch.com/mailman/listinfo/prime
