So, the important question: are we going to enter it? It looks like we'd get access to a machine with a number of "cores" containing Nehalems and Tesla cards. The problem here will be optimising the distribution of the work across these units, given the very different natures of the processors. This is a problem we'll have to look at anyway when parallelising MPIR/bsdnt or whatever, so perhaps it is worth doing just as an exercise?
As you probably know, parallel execution differs from serial execution in that you generally do this: serial work -> fork -> (in parallel) do work -> join (wait) -> serial work. There are a number of issues:

* Creating jobs of roughly equal duration. Before the parallel section is considered complete, all the threads have to finish, so you have to parcel up your tasks carefully. Straight #pragma omp directives are great for lazy programmers like me doing basic SIMD stuff, but you might find that splitting a job into two threads, letting t1 run while doing two parallel jobs in t2, is a faster way of dividing the work than creating jobs u1...u4 and letting them all run, if that makes sense. On a larger scale this might be affected by how many cores you have available when you initiate the job.

* Making the scheduling itself efficient enough that it doesn't take longer than the parallel region it sets up. There's no point in attempting to solve an NP-complete problem with a scheduler running in O(n!) time...

What do we think? I was intending to start with OpenMP and see if that on its own gives us a speed increase, but...?

Antony

On 04/20/2010 03:25 PM, Bill Hart wrote:
> See:
>
> http://pasco2010.imag.fr/contest.html
>
> Bill.

-- 
You received this message because you are subscribed to the Google Groups "mpir-devel" group.
To post to this group, send email to mpir-de...@googlegroups.com.
To unsubscribe from this group, send email to mpir-devel+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/mpir-devel?hl=en.