On Friday, March 11, 2016 at 9:01:38 PM UTC-5, Brian Adkins wrote:
> On Friday, March 11, 2016 at 7:42:22 PM UTC-5, Brian Adkins wrote:
> > I coded up a sequential and parallel version of N-Queens, then did a ton of 
> > benchmark runs of 13-Queens to compare the time. For each configuration 
> > (sequential or parallel w/ M workers), I ran the programs 6 times, threw 
> > out the high two & low two and averaged the middle two numbers.
> > 
> > The spreadsheet with timings is here:
> > 
> > https://docs.google.com/spreadsheets/d/1LFwdZbBveaARY_AquGXY9jgaSJlOA03NQCV9TeYQ-l8/edit?usp=sharing
> > 
> > The code is here:
> > 
> > https://gist.github.com/lojic/aef0aec491d3dc9cb40b
> > 
> > I didn't spend any time refining/optimizing, so it's fairly crude, but 
> > informative nonetheless.
> > 
> > The executive summary of timings for the parallel version:
> > 
> > # Places    Time
> > 1   34.9
> > 2   19.7
> > 3   13.8
> > 4   12.3
> > 5   11.9
> > 6   12.9
> > 7   12.1
> > 8   12.2
> > 
> > The sequential version took 31.3 seconds.
> > 
> > The basic idea for the parallel version is to place the first 13 starting 
> > positions in a queue that the place workers will pull from and produce a 
> > set of solutions with that starting position. Both the parallel and 
> > sequential versions collect all 73,712 solutions in a list and compute the 
> > length of it. I hardcoded the number of solutions as a quick & dirty way of 
> > determining completion in the parallel version just to allow me to get the 
> > timings easily.
> > 
> > It was great to finally write some Racket code that got all my cores heated 
> > up :)
> > 
> > Brian
> 
> Interesting. My first parallel version had the workers write each solutions 
> (list of N queen positions) to the parent as they were computed. I just 
> changed the program to collect all the solutions for a given prefix (closer 
> to the way the sequential version works) and send that list to the parent.
> 
> The result is a modest 4% speeedup. I had assumed that creating the extra 
> intermediate lists would be slower.
> 
> I thought I read something about the implementation of Places that allows 
> data to be moved efficiently from one Place to another via virtual memory 
> pages - maybe something like that is going on. Regardless, I'm happy that the 
> more natural algorithm (to me anyway) is also more efficient. I guess the 
> overhead of many place-channel-put/get calls is more expensive than the 
> allocation & garbage collection of the intermediate lists.
> 
> The second parallel version referred to above is here: 
> https://gist.github.com/lojic/957828ce7c67c0376a23

I cleaned up the code a bit (removed hardcoded completion checks, etc.) and 
moved the code to a git repo:

https://github.com/lojic/LearningRacket/tree/master/miscellaneous

I was hoping to get more than a 3x speedup, but my final results show a 2.8x 
speedup on 4 cores. I suppose the amount of work per solution isn't large 
enough for a better speedup.

In the N=13 case that takes 11 seconds, there are 6,700 solutions per second 
being sent across a place-channel with each solution being a list of 13 structs 
of 2 elements.

Still, 2.8x is appreciated :)

-- 
You received this message because you are subscribed to the Google Groups 
"Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to racket-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to