Rob Giltrap wrote:
> Calling all performance experts! Here's your chance to become part of
> mathematics history by contributing to the finding of one of the rarest
> things known to man... Mersenne Primes (there are currently only 44
> known to exist). Refer: http://en.wikipedia.org/wiki/Mersenne_prime
>
> How does this relate to OpenSolaris? Well, OpenSolaris is involved in
> this year's Google Summer of Code (GSoC)...
>
> Refer: http://www.opensolaris.org/os/project/summerofcode/students/
>
> One of the projects is the FFT project, which is basically about
> leveraging OpenSolaris for its incredible scalability, cross-platform
> support and, most importantly, its observability and performance tools
> as a showcase for its usefulness in highly threaded / HPC computing.
>
> To date we have taken what was a single-threaded C application,
> 'Mlucas', added coarse-grained parallel code (using OpenMP) to the
> majority of the program (there are a few minor components still being
> worked on) and compiled it using the latest Sun Studio 12 C compiler.
> This has resulted in significant scalability, as shown below...
>
> SPARC64 VI (2.13 GHz x 8 CPU (16 core))
>   - 1 thread  = 106 secs
>   - 2 threads =  55 secs (x 1.93)
>   - 4 threads =  31 secs (x 3.42)
>   - 8 threads =  18 secs (x 5.89)
>
> We are still a couple of days away from getting the application to be
> able to usefully employ 16 threads; however, assuming even modest
> improvement beyond 8 threads, it will result in a new world record for
> the verification of Mersenne Primes. Currently, at 8 threads on the
> SPARC64 VI, it would take over 7.5 days to do the calculations for just
> one exponent.
>
> There are only 5 weeks left of the GSoC program and we now need to look
> at further tuning the software, particularly for OpenSolaris on the
> SPARC & x64 architectures.
>
> What I require is for those on this list with performance experience to
> provide a little help in removing any system bottlenecks that might be
> there. This is actually not a large task as the application is only
> bound by CPU and real memory. There is no network activity, a tiny
> amount of disk I/O occurs once every 30+ secs, the application uses less
> than 100MB of real memory and is pretty much all integer calculations.
>
> So I only need help with CPU/memory (particularly cache) performance.
> Basic checks have been made for CPU utilization, trapstat & context
> switching issues etc., but we really need to bring a bit more knowledge
> into the team. Ideally what I would like is for someone just to take 1-2
> hours and give it the once-over in terms of: are there any obvious wins
> that we have missed? That may include pointing us to the right DTrace
> scripts to use for ongoing analysis.
>
> The Mlucas application is actually VERY simple in terms of processing
> and should be easy to analyze. There are four steps:
>   - transform
>   - square
>   - inverse transform
>   - carry
>
> These steps currently (at 8 threads) collectively take 0.018 secs to
> complete and then they run again and again (about 36 million times).
> Thus ANY performance improvement makes a big difference, as anything
> * 36 million equals something significant.
>
> So my request is.... if you have the skills we are after, please
> contact me and I can provide more information (an initial inquiry will
> not be treated as your guarantee of further commitment!). We can provide
> you with whatever you need, and if you are in the Bay Area all the
> better, as we could arrange for a physical meet with the main developer.
> Our two main developers are VERY smart cookies; we are just a little
> short on OpenSolaris observability/performance expertise.
>
> One benefit to OpenSolaris as a whole is that this tuning will be
> included in the case study I'm writing up showcasing OpenSolaris as a
> great environment in which to do highly threaded / HPC type development.
> It should also result in OpenSolaris being the infrastructure that is
> used to verify the EFF prizewinning 10+ million digit 45th Mersenne
> Prime (when it is found), resulting in additional media attention for
> OpenSolaris, especially within the math and HPC sciences arenas.
>
> Many thanks,
>
> Rob Giltrap
> OpenSolaris - Google Summer of Code Mentor.
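For readers unfamiliar with the four steps listed in the quoted message (transform, square, inverse transform, carry), here is a rough sketch of how they fit together. It assumes the loop being described is the Lucas-Lehmer test, the standard primality test for Mersenne numbers, where each iteration squares a large number modulo 2^p - 1. This is not Mlucas code: the function names, the 8-bit digit size and the plain recursive FFT are illustrative assumptions, and the modular reduction is done explicitly at the end of each squaring rather than inside the transform.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + twiddle * odd[k]
        out[k + n // 2] = even[k] - twiddle * odd[k]
    return out


def square_mod_mersenne(x, p, digit_bits=8):
    """Square x modulo 2**p - 1 using the four steps (illustrative only)."""
    mask = (1 << digit_bits) - 1

    # Split x into small digits, least significant first.
    digits = []
    while x:
        digits.append(x & mask)
        x >>= digit_bits
    if not digits:
        digits = [0]

    # Zero-pad to a power of two large enough to hold the full product.
    n = 1
    while n < 2 * len(digits):
        n *= 2
    padded = digits + [0] * (n - len(digits))

    spectrum = fft(padded)                      # 1. transform
    squared = [v * v for v in spectrum]         # 2. square (pointwise)
    raw = fft(squared, inverse=True)            # 3. inverse transform
    coeffs = [round(v.real / n) for v in raw]   #    normalize, round to integers

    # 4. carry: push overflow from each digit into the next one.
    result, carry = 0, 0
    for i, c in enumerate(coeffs):
        total = c + carry
        result |= (total & mask) << (digit_bits * i)
        carry = total >> digit_bits
    result += carry << (digit_bits * n)

    # Explicit reduction (an assumption of this sketch, not the Mlucas method).
    return result % ((1 << p) - 1)


def lucas_lehmer(p):
    """Return True if 2**p - 1 is prime, for an odd prime exponent p."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (square_mod_mersenne(s, p) - 2) % m
    return s == 0
```

The loop in lucas_lehmer runs p - 2 times, which is why an exponent in the tens of millions means tens of millions of passes through the four steps. Using the figures in the message, 0.018 secs per pass times about 36 million passes is roughly 648,000 secs, or 7.5 days.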
Hi Rob -

Have you taken a pass at this w/ DProfile (in the latest Sun Studio)?
You should look for cache/TLB hot spots; this may show false sharing,
excessive conflicts, etc.

- Bart

--
Bart Smaalders                  Solaris Kernel Performance
[EMAIL PROTECTED]               http://blogs.sun.com/barts
_______________________________________________
perf-discuss mailing list
[email protected]
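As background on the false sharing mentioned above: it happens when threads write to different variables that sit in the same cache line, so each write invalidates the line for the other threads. The sketch below only models memory layout. It reports which per-thread slots of a contiguous array end up on the same line, assuming a 64-byte line and a line-aligned array. Both are assumptions for illustration, not measured values for the SPARC64 VI, and the function names are hypothetical.

```python
from collections import defaultdict

LINE_BYTES = 64  # assumed cache-line size; check the real value for your CPU


def line_owners(num_threads, slot_bytes, line_bytes=LINE_BYTES):
    """Map each cache-line index to the thread ids whose slot touches it.

    Assumes thread t owns bytes [t * slot_bytes, (t + 1) * slot_bytes)
    of an array that starts on a cache-line boundary.
    """
    owners = defaultdict(list)
    for tid in range(num_threads):
        first = tid * slot_bytes
        last = first + slot_bytes - 1
        for line in range(first // line_bytes, last // line_bytes + 1):
            owners[line].append(tid)
    return dict(owners)


def falsely_shared(num_threads, slot_bytes, line_bytes=LINE_BYTES):
    """Return only the cache lines touched by more than one thread."""
    owners = line_owners(num_threads, slot_bytes, line_bytes)
    return {line: tids for line, tids in owners.items() if len(tids) > 1}
```

With 8 threads and 8-byte slots (for example, one counter per thread), falsely_shared(8, 8) shows all eight threads on line 0. Padding each slot to the assumed line size, falsely_shared(8, 64), returns an empty result. A profiler is still needed to confirm whether this is actually happening in a given program.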
