[petsc-dev] [GPU] Performance on Fermi
On Fri, 27 Aug 2010 16:06:30 -0500, Keita Teranishi <keita at cray.com> wrote:
> Barry,
> The CPU timing I reported was after recompiling the code (I removed the PETSC_USE_DEBUG and GDB macros from petscconf.h).

Unless you were manually overriding the compiler flags, it still wasn't optimized. Please just reconfigure a new PETSC_ARCH with --with-debugging=0. It's as easy as:

  foo-dbg/conf/reconfigure-foo-dbg.py --with-debugging=0 PETSC_ARCH=foo-opt
  make PETSC_ARCH=foo-opt

Jed
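Jed's two-command recipe can be sketched as a small shell sequence; the arch names foo-dbg and foo-opt are placeholders, and the reconfigure script path assumes the layout configure used at the time (a conf/ directory inside the existing PETSC_ARCH):

```shell
# Keep two builds side by side: a debug arch for development and an
# optimized arch for timing. Arch names here are placeholders.
cd "$PETSC_DIR"

# Re-run the saved configure options from the debug build, but with
# debugging disabled, writing the result into a fresh PETSC_ARCH.
./foo-dbg/conf/reconfigure-foo-dbg.py --with-debugging=0 PETSC_ARCH=foo-opt

# Build the optimized libraries; the debug build is left untouched.
make PETSC_ARCH=foo-opt

# From now on, profile with PETSC_ARCH=foo-opt and debug with foo-dbg.
```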
[petsc-dev] [GPU] Performance on Fermi
On Fri, 27 Aug 2010 16:18:43 -0500, Keita Teranishi <keita at cray.com> wrote:
> Yes, I replaced all the compiler flags by -O3.

petsc-maint doesn't come to me, but if the snippet that Barry quoted was from your log_summary, then PETSC_USE_DEBUG was definitely defined when plog.c was compiled. It's really much easier to have two separate builds and always use the optimized one when profiling.

Jed
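A quick way to verify which way a build was compiled, before trusting its log_summary numbers, is to look for the macro Jed mentions in the generated header (a sketch, assuming the usual $PETSC_DIR/$PETSC_ARCH layout for petscconf.h):

```shell
# If PETSC_USE_DEBUG is defined in petscconf.h, the build is a debug
# build and its timings should not be used for performance comparisons.
if grep -q 'PETSC_USE_DEBUG' "$PETSC_DIR/$PETSC_ARCH/include/petscconf.h"; then
  echo "debug build: do not profile with this PETSC_ARCH"
else
  echo "optimized build: suitable for timing"
fi
```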
[petsc-dev] [GPU] Performance on Fermi
On Fri, 27 Aug 2010 16:34:45 -0500, Keita Teranishi <keita at cray.com> wrote:
> Jed,
> I usually manually edit petscconf.h and petscvariables to change the installation configuration for Cray XT/XE. The problem is that PETSc's configure script picks up the wrong variables and #define macros because the OS and library settings on the login node differ from those on the compute node. This particular case is just a mistake in the configure script (and it's not a big deal to fix), but it would be great if you have any ideas for avoiding picking up the wrong settings.

If it's behaving incorrectly when you configure --with-batch, it is a configure bug, so please submit the full error.

Jed
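For cross-compile machines like the Cray XT/XE, where binaries configured on the login node must run on different compute nodes, configure has the batch mode Jed refers to. A rough sketch follows; cc/CC/ftn are the standard Cray compiler wrappers, and the exact follow-up step (submitting the generated test executable and running the resulting reconfigure script) varies by PETSc version and site:

```shell
# Configure on the login node without running test executables locally;
# --with-batch defers the runtime checks to the compute nodes.
./configure PETSC_ARCH=cray-xt-opt \
  --with-batch \
  --with-cc=cc --with-cxx=CC --with-fc=ftn \
  --with-debugging=0

# configure then produces a small test program to be submitted through
# the batch system; running it on a compute node generates the script
# that completes the configuration with the compute-node settings.
```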
[petsc-dev] [GPU] Performance on Fermi
Keita,

I'd just like to echo what Barry says. I probably build petsc-dev on Jaguar more than any other person, and I generally don't have to manually edit any files generated by configure.py. When I do, I either find and fix the problem in BuildSystem, or work with the petsc-maint folks to fix it. If you report problems to petsc-maint, we can work to ensure that you don't have to make these manual edits.

Best regards,
Richard

On 8/27/2010 8:00 PM, Barry Smith wrote:
> On Aug 27, 2010, at 4:34 PM, Keita Teranishi wrote:
>> Jed,
>> I usually manually edit petscconf.h and petscvariables to change the installation configuration for Cray XT/XE. The problem is that PETSc's configure script picks up the wrong variables and #define macros because the OS and library settings on the login node differ from those on the compute node.
>
> Keita,
> We would prefer that you complain to petsc-maint at mcs.anl.gov so that we can fix configure problems and not have anyone editing the generated files:
> 1) so that it works for all users, not just those that know how to edit those files (we cannot fix problems we don't know about);
> 2) editing those files repeatedly is fragile, and it is easy to make a slight mistake that's hard to track down.
> Barry

--
Richard Tran Mills, Ph.D.            | E-mail: rmills at climate.ornl.gov
Computational Scientist              | Phone: (865) 241-3198
Computational Earth Sciences Group   | Fax: (865) 574-0405
Oak Ridge National Laboratory        | http://climate.ornl.gov/~rmills
[petsc-dev] [GPU] Performance on Fermi
On Fri, Aug 27, 2010 at 7:19 PM, Keita Teranishi <keita at cray.com> wrote:
> Barry,
> Yes. It improves the performance dramatically, but the execution time for KSPSolve stays the same.
> MatMult: 5.2 GFlops

I will note that to put the matvec on the GPU you will also need -mat_type aijcuda.

Matt

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener
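Putting Barry's and Matt's flags together, the single-process benchmark quoted in this thread might be run roughly as follows. This is a sketch for the petsc-dev of that era, not a tested command line; ex2's -m/-n grid options and the CUDA type names are as given in the thread:

```shell
# Single-process run of ex2 on a 512x512 grid with CG + Jacobi,
# with both the vectors (-vec_type cuda) and the matrix
# (-mat_type aijcuda) living on the GPU.
cd "$PETSC_DIR"/src/ksp/ksp/examples/tutorials
make ex2
./ex2 -m 512 -n 512 \
  -ksp_type cg -pc_type jacobi \
  -vec_type cuda -mat_type aijcuda \
  -log_summary
```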
[petsc-dev] [GPU] Performance on Fermi
PETSc-dev folks,

Please prepend all messages to petsc-dev that involve GPUs with [GPU] so they can be easily filtered.

Keita,

To run src/ksp/ksp/examples/tutorials/ex2.c with CUDA you need the flag -vec_type cuda. Note also that this example is fine for simple ONE-processor tests, but it should not be used for parallel testing because it does not do a proper parallel partitioning for performance.

Barry

On Aug 27, 2010, at 2:04 PM, Keita Teranishi wrote:
> Hi,
> I ran ex2.c with a matrix from a 512x512 grid. I set CG and Jacobi for the solver and preconditioner. GCC 4.4.4 and CUDA 3.1 were used to compile the code. BLAS and LAPACK are not optimized.
>
> MatMult:  Fermi: 1142 MFlops   1 core Istanbul: 420 MFlops
> KSPSolve: Fermi: 1.5 sec       1 core Istanbul: 1.7 sec
>
> Keita Teranishi
> Scientific Library Group
> Cray, Inc.
> keita at cray.com
>
> -----Original Message-----
> From: Satish Balay
> Sent: Friday, August 27, 2010 1:49 PM
> Subject: Re: [petsc-dev] Problem with petsc-dev
>
> On Fri, 27 Aug 2010, Satish Balay wrote:
>> There was a problem with tarball creation for the past few days. Will try to respin manually today - and update you.
>
> the petsc-dev tarball is now updated on the website.
>
> Satish
[petsc-dev] [GPU] Performance on Fermi
Barry,

The CPU version takes another digit: it is 1.6 sec on Fermi and 17 sec on 1 CPU core.

Thanks,
Keita Teranishi
Scientific Library Group
Cray, Inc.
keita at cray.com
[petsc-dev] [GPU] Performance on Fermi
      ##########################################################
      #                                                        #
      #                       WARNING!!!                       #
      #                                                        #
      #   This code was compiled with a debugging option.      #
      #   To get timing results run ./configure                #
      #   using --with-debugging=no; the performance will      #
      #   be generally two or three times faster.              #
      #                                                        #
      ##########################################################

You need to build the code with ./configure --with-debugging=0 to make a fair comparison. This will speed up the CPU version.

Barry

On Aug 27, 2010, at 2:22 PM, Keita Teranishi wrote:
> Barry,
> The CPU version takes another digit: it is 1.6 sec on Fermi and 17 sec on 1 CPU core.
[petsc-dev] [GPU] Performance on Fermi
Yes, I replaced all the compiler flags by -O3.

Keita Teranishi
Scientific Library Group
Cray, Inc.
keita at cray.com
[petsc-dev] [GPU] Performance on Fermi
Jed,

I usually manually edit petscconf.h and petscvariables to change the installation configuration for Cray XT/XE. The problem is that PETSc's configure script picks up the wrong variables and #define macros because the OS and library settings on the login node differ from those on the compute node. This particular case is just a mistake in the configure script (and it's not a big deal to fix), but it would be great if you have any ideas for avoiding picking up the wrong settings.

Thanks,
Keita Teranishi
Scientific Library Group
Cray, Inc.
keita at cray.com
[petsc-dev] [GPU] Performance on Fermi
On Aug 27, 2010, at 4:34 PM, Keita Teranishi wrote:
> Jed,
> I usually manually edit petscconf.h and petscvariables to change the installation configuration for Cray XT/XE. The problem is that PETSc's configure script picks up the wrong variables and #define macros because the OS and library settings on the login node differ from those on the compute node.

Keita,

We would prefer that you complain to petsc-maint at mcs.anl.gov so that we can fix configure problems and not have anyone editing the generated files:

1) so that it works for all users, not just those that know how to edit those files (we cannot fix problems we don't know about);
2) editing those files repeatedly is fragile, and it is easy to make a slight mistake that's hard to track down.

Barry