So, the important question - are we going to enter it?

It looks like we'd get access to a computer with a number of "cores"
containing Nehalems and Tesla cards. The problem here will be optimising
the distribution of the work across these units, given the very
different natures of the processors. It's a problem we'll have to face
anyway when parallelising MPIR/bsdnt or whatever, so perhaps it is worth
doing just as an exercise?

As you probably know, parallelising differs from straight serial
execution in that you generally do this: serial work -> fork -> (in
parallel) do work -> join (wait) -> serial work.
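
To make that concrete, here's a minimal sketch of the fork/join shape in
OpenMP, assuming gcc with -fopenmp (nothing MPIR-specific, just the
pattern):

  #include <stdio.h>
  #include <omp.h>

  int main(void)
  {
      printf("serial work\n");              /* serial section */

  #pragma omp parallel                      /* fork */
      {
          /* every thread in the team does its share here */
          printf("hello from thread %d\n", omp_get_thread_num());
      }                                     /* implicit join (wait) */

      printf("serial work again\n");        /* back to one thread */
      return 0;
  }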

There are a number of issues:

* Creating jobs of roughly equal duration. The parallel section isn't
considered complete until all the threads have finished, so you have to
parcel up your tasks carefully. Straight #pragma omp directives are
great for lazy programmers like me doing basic SIMD-style stuff, but you
might find that splitting a job into two threads, letting t1 run one
piece while t2 does two further jobs in parallel, is a faster way of
dividing the work than creating four equal jobs u1...u4 and letting them
all run, if that makes sense (see the task sketch after this list). On a
larger scale this might be affected by how many cores you have when you
initiate the job.
* Making the process of doing the above efficient enough that it doesn't
take longer than the parallel region itself. There's no point in
attempting to solve an NP-complete problem with a scheduler running in
O(n!)...
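
On the first point, here's a rough sketch of that two-thread split using
OpenMP 3.0 tasks (so gcc 4.4 or later; big_job/small_job are made-up
stand-ins, not anything in MPIR or bsdnt):

  #include <stdio.h>
  #include <omp.h>

  /* made-up stand-ins for real work units in, say, a big multiplication */
  void big_job(void)    { /* the expensive half */ }
  void small_job(int i) { (void)i; /* one of two cheaper pieces */ }

  int main(void)
  {
  #pragma omp parallel
  #pragma omp single              /* one thread creates the tasks */
      {
  #pragma omp task                /* "t1": one large job */
          big_job();

  #pragma omp task                /* "t2": itself spawns two smaller jobs */
          {
  #pragma omp task
              small_job(0);
  #pragma omp task
              small_job(1);
  #pragma omp taskwait            /* t2 waits for its two children */
          }

  #pragma omp taskwait            /* join before serial work resumes */
      }
      printf("all jobs done\n");
      return 0;
  }

Whether that beats four equal tasks depends entirely on how evenly the
work actually divides, which is exactly the scheduling question above.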

What do we think?

I was intending to start with OpenMP and see if that on its own gives us
a speed increase, but...?

Antony

On 04/20/2010 03:25 PM, Bill Hart wrote:
> See:
> 
> http://pasco2010.imag.fr/contest.html
> 
> Bill.
> 
