[petsc-dev] [GPU] Performance on Fermi

2010-08-28 Thread Jed Brown
On Fri, 27 Aug 2010 16:06:30 -0500, Keita Teranishi keita at cray.com wrote:
 Barry,
 
 The CPU timing I reported was after recompiling the code (I removed 
 PETSC_USE_DEBUG and GDB macros from petscconf.h).  

Unless you were manually overriding compiler flags, it still wasn't
optimized.  Please just reconfigure a new PETSC_ARCH --with-debugging=0.
It's as easy as

  foo-dbg/conf/reconfigure-foo-dbg.py --with-debugging=0 PETSC_ARCH=foo-opt
  make PETSC_ARCH=foo-opt
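A quick way to confirm the new arch really is debug-free (a sketch, assuming the usual petsc-dev layout where the generated petscconf.h lives under the arch directory):

  grep PETSC_USE_DEBUG foo-opt/include/petscconf.h   # should print nothing for an optimized build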

Jed



[petsc-dev] [GPU] Performance on Fermi

2010-08-28 Thread Jed Brown
On Fri, 27 Aug 2010 16:18:43 -0500, Keita Teranishi keita at cray.com wrote:
 Yes, I replaced all the compiler flags by -O3.

petsc-maint doesn't come to me, but if the snippet that Barry quoted was
from your log_summary, then PETSC_USE_DEBUG was definitely defined when
plog.c was compiled.  It's really much easier to have two separate
builds and always use the optimized one when profiling.
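A minimal sketch of that two-build workflow, reusing the hypothetical arch names foo-dbg and foo-opt from the other message in this thread (assumes PETSC_DIR is set and you are in the ex2 directory):

  # day-to-day debugging runs against the debug build
  make PETSC_ARCH=foo-dbg ex2 && ./ex2 -m 512 -n 512

  # timing runs: rebuild against the optimized build and collect the profile
  make PETSC_ARCH=foo-opt ex2 && ./ex2 -m 512 -n 512 -log_summary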

Jed



[petsc-dev] [GPU] Performance on Fermi

2010-08-28 Thread Jed Brown
On Fri, 27 Aug 2010 16:34:45 -0500, Keita Teranishi keita at cray.com wrote:
 Jed,
 
 I usually manually edit petscconf.h and petscvariables to change the
 installation configurations for Cray XT/XE.  The problem is configure
 script of PETSc picks up wrong variables and #define macros because
 the OS and library setting on the login node is different from the
 compute node.
 
 This particular case is just a mistake in configure script (and it's
 not a big deal to fix), but it will be great if you have any ideas to
 avoid picking up wrong settings.

If it's behaving incorrectly when you configure --with-batch, it is a
configure bug, so please submit the full error.
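For reference, a rough sketch of the --with-batch flow on a Cray XT/XE login node (the compiler wrappers, the aprun invocation, and the generated script name are illustrative, not exact):

  ./configure --with-batch --with-debugging=0 --with-cc=cc --with-fc=ftn
  # configure stops and produces a small conftest binary; run it on a compute node
  aprun -n 1 ./conftest
  # then finish the configuration on the login node with the script conftest generates
  ./reconfigure.py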

Jed



[petsc-dev] [GPU] Performance on Fermi

2010-08-28 Thread Richard Tran Mills
Keita,

I'd just like to echo what Barry says.  I probably build petsc-dev on Jaguar 
more than any other person, and I generally don't have to manually edit any 
files generated by configure.py.  When I do, I either find and fix the problem 
in BuildSystem, or work with the petsc-maint folks to fix it.  If you will 
report problems to petsc-maint, we can work to ensure that you don't have to 
do these manual edits.

Best regards,
Richard

On 8/27/2010 8:00 PM, Barry Smith wrote:

 On Aug 27, 2010, at 4:34 PM, Keita Teranishi wrote:

 Jed,

 I usually manually edit petscconf.h and petscvariables to change the 
 installation configurations for Cray XT/XE.   The problem is configure 
 script of PETSc picks up wrong variables and #define macros because the OS 
 and library setting on the login node is different from the compute node.

 Keita,

   We would prefer that you complain to petsc-maint at mcs.anl.gov so that 
 we can fix configure problems and not have anyone editing the generated files.

 Barry

 1) so that it works for all users not just those that know how to edit 
 those files. We cannot fix problems we don't know about

  2) editing those files repeatedly is fragile and it is easy to make a 
 slight mistake that's hard to track down.



 This particular case is just a mistake in configure script (and it's not a 
 big deal to fix), but it will be great if you have any ideas to avoid 
 picking up wrong settings.

 Thanks,
 
   Keita Teranishi
   Scientific Library Group
   Cray, Inc.
   keita at cray.com
 


 -Original Message-
 From: Jed Brown [mailto:five9a2 at gmail.com] On Behalf Of Jed Brown
 Sent: Friday, August 27, 2010 4:29 PM
 To: Keita Teranishi; For users of the development version of PETSc
 Subject: RE: [petsc-dev] [GPU] Performance on Fermi

 On Fri, 27 Aug 2010 16:18:43 -0500, Keita Teranishi keita at cray.com
 wrote:
 Yes, I replaced all the compiler flags by -O3.

 petsc-maint doesn't come to me, but if the snippet that Barry quoted was
 from your log_summary, then PETSC_USE_DEBUG was definitely defined when
 plog.c was compiled.  It's really much easier to have two separate
 builds and always use the optimized one when profiling.

 Jed


-- 
Richard Tran Mills, Ph.D.|   E-mail: rmills at climate.ornl.gov
Computational Scientist  |   Phone:  (865) 241-3198
Computational Earth Sciences Group   |   Fax:(865) 574-0405
Oak Ridge National Laboratory|   http://climate.ornl.gov/~rmills



[petsc-dev] [GPU] Performance on Fermi

2010-08-28 Thread Matthew Knepley
On Fri, Aug 27, 2010 at 7:19 PM, Keita Teranishi keita at cray.com wrote:

 Barry,

 Yes. It improves the performance dramatically, but the execution time for
 KSPSolve stays the same.

 MatMult 5.2 Gflops


I will note that to put the matvec on the GPU you will also need -mat_type
aijcuda.
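An illustrative command line combining the options mentioned in this thread (a sketch, not a verified invocation):

  ./ex2 -m 512 -n 512 -ksp_type cg -pc_type jacobi -vec_type cuda -mat_type aijcuda -log_summary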

   Matt


 Thanks,

 
  Keita Teranishi
  Scientific Library Group
  Cray, Inc.
  keita at cray.com
 


 -Original Message-
 From: petsc-dev-bounces at mcs.anl.gov [mailto:petsc-dev-bounces at 
 mcs.anl.gov]
 On Behalf Of Barry Smith
 Sent: Friday, August 27, 2010 2:15 PM
 To: For users of the development version of PETSc
 Subject: [petsc-dev] [GPU] Performance on Fermi


   PETSc-dev folks,

  Please prepend all messages to petsc-dev that involve GPUs with [GPU]
 so they can be easily filtered.

Keita,

  To run src/ksp/ksp/examples/tutorials/ex2.c with CUDA you need the
 flag -vec_type cuda

  Note also that this example is fine for simple ONE processor tests but
 should not be used for parallel testing because it does not do a proper
 parallel partitioning for performance

Barry

 On Aug 27, 2010, at 2:04 PM, Keita Teranishi wrote:

  Hi,
 
  I ran ex2.c with a matrix from a 512x512 grid.
  I set CG and Jacobi for the solver and preconditioner.
  GCC-4.4.4 and CUDA-3.1 are used to compile the code.
  BLAS and LAPACK are not optimized.
 
  MatMult
  Fermi:1142 MFlops
  1 core Istanbul:  420 MFlops
 
  KSPSolve:
  Fermi:1.5 Sec
  1 core Istanbul:  1.7 Sec
 
 
  
   Keita Teranishi
   Scientific Library Group
   Cray, Inc.
   keita at cray.com
  
 
 
  -Original Message-
  From: petsc-dev-bounces at mcs.anl.gov [mailto:
 petsc-dev-bounces at mcs.anl.gov] On Behalf Of Satish Balay
  Sent: Friday, August 27, 2010 1:49 PM
  To: For users of the development version of PETSc
  Subject: Re: [petsc-dev] Problem with petsc-dev
 
  On Fri, 27 Aug 2010, Satish Balay wrote:
 
  There was a problem with tarball creation for the past few days. Will
  try to respin manually today - and update you.
 
  the petsc-dev tarball is now updated on the website..
 
  Satish




-- 
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments
lead.
-- Norbert Wiener


[petsc-dev] [GPU] Performance on Fermi

2010-08-27 Thread Barry Smith

   PETSc-dev folks,

  Please prepend all messages to petsc-dev that involve GPUs with [GPU] so 
they can be easily filtered.

Keita,

  To run src/ksp/ksp/examples/tutorials/ex2.c with CUDA you need the flag 
-vec_type cuda

  Note also that this example is fine for simple ONE processor tests but 
should not be used for parallel testing because it does not do a proper 
parallel partitioning for performance
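Concretely, for the 512x512 CG/Jacobi run described below, the GPU test might look like this (a sketch; -m and -n set the grid size, the remaining options follow this thread):

  cd src/ksp/ksp/examples/tutorials
  make ex2
  ./ex2 -m 512 -n 512 -ksp_type cg -pc_type jacobi -vec_type cuda -log_summary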

Barry

On Aug 27, 2010, at 2:04 PM, Keita Teranishi wrote:

 Hi,
 
 I ran ex2.c with a matrix from a 512x512 grid.
 I set CG and Jacobi for the solver and preconditioner. 
 GCC-4.4.4 and CUDA-3.1 are used to compile the code.
 BLAS and LAPACK are not optimized.
 
 MatMult
 Fermi:1142 MFlops
 1 core Istanbul:  420 MFlops
 
 KSPSolve:
 Fermi:1.5 Sec
 1 core Istanbul:  1.7 Sec
 
 
 
  Keita Teranishi
  Scientific Library Group
  Cray, Inc.
  keita at cray.com
 
 
 
 -Original Message-
 From: petsc-dev-bounces at mcs.anl.gov [mailto:petsc-dev-bounces at 
 mcs.anl.gov] On Behalf Of Satish Balay
 Sent: Friday, August 27, 2010 1:49 PM
 To: For users of the development version of PETSc
 Subject: Re: [petsc-dev] Problem with petsc-dev
 
 On Fri, 27 Aug 2010, Satish Balay wrote:
 
 There was a problem with tarball creation for the past few days. Will
 try to respin manually today - and update you.
 
 the petsc-dev tarball is now updated on the website..
 
 Satish




[petsc-dev] [GPU] Performance on Fermi

2010-08-27 Thread Keita Teranishi
Barry,

The CPU version takes another digit: it is 1.6 sec on Fermi and 17 sec on 1 CPU core.

Thanks,

 Keita Teranishi
 Scientific Library Group
 Cray, Inc.
 keita at cray.com



-Original Message-
From: petsc-dev-bounces at mcs.anl.gov [mailto:petsc-dev-boun...@mcs.anl.gov] 
On Behalf Of Keita Teranishi
Sent: Friday, August 27, 2010 2:20 PM
To: For users of the development version of PETSc
Subject: Re: [petsc-dev] [GPU] Performance on Fermi

Barry,

Yes. It improves the performance dramatically, but the execution time for 
KSPSolve stays the same.

MatMult 5.2 Gflops

Thanks,


 Keita Teranishi
 Scientific Library Group
 Cray, Inc.
 keita at cray.com



-Original Message-
From: petsc-dev-bounces at mcs.anl.gov [mailto:petsc-dev-boun...@mcs.anl.gov] 
On Behalf Of Barry Smith
Sent: Friday, August 27, 2010 2:15 PM
To: For users of the development version of PETSc
Subject: [petsc-dev] [GPU] Performance on Fermi


   PETSc-dev folks,

  Please prepend all messages to petsc-dev that involve GPUs with [GPU] so 
they can be easily filtered.

Keita,

  To run src/ksp/ksp/examples/tutorials/ex2.c with CUDA you need the flag 
-vec_type cuda

  Note also that this example is fine for simple ONE processor tests but 
should not be used for parallel testing because it does not do a proper 
parallel partitioning for performance

Barry

On Aug 27, 2010, at 2:04 PM, Keita Teranishi wrote:

 Hi,
 
 I ran ex2.c with a matrix from a 512x512 grid.
 I set CG and Jacobi for the solver and preconditioner. 
 GCC-4.4.4 and CUDA-3.1 are used to compile the code.
 BLAS and LAPACK are not optimized.
 
 MatMult
 Fermi:1142 MFlops
 1 core Istanbul:  420 MFlops
 
 KSPSolve:
 Fermi:1.5 Sec
 1 core Istanbul:  1.7 Sec
 
 
 
  Keita Teranishi
  Scientific Library Group
  Cray, Inc.
  keita at cray.com
 
 
 
 -Original Message-
 From: petsc-dev-bounces at mcs.anl.gov 
 [mailto:petsc-dev-bounces at mcs.anl.gov] On Behalf Of Satish Balay
 Sent: Friday, August 27, 2010 1:49 PM
 To: For users of the development version of PETSc
 Subject: Re: [petsc-dev] Problem with petsc-dev
 
 On Fri, 27 Aug 2010, Satish Balay wrote:
 
 There was a problem with tarball creation for the past few days. Will 
 try to respin manually today - and update you.
 
 the petsc-dev tarball is now updated on the website..
 
 Satish




[petsc-dev] [GPU] Performance on Fermi

2010-08-27 Thread Barry Smith

  ########################################################
  #                                                      #
  #                      WARNING!!!                      #
  #                                                      #
  #   This code was compiled with a debugging option,    #
  #   To get timing results run ./configure              #
  #   using --with-debugging=no, the performance will    #
  #   be generally two or three times faster.            #
  #                                                      #
  ########################################################


  You need to build the code with ./configure --with-debugging=0 to make a fair 
comparison. This will speed up the CPU version.
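A hedged example of such a build (the PETSC_ARCH name is arbitrary, and the -O3 flags mirror the flags mentioned elsewhere in the thread):

  ./configure PETSC_ARCH=arch-opt --with-debugging=0 \
      COPTFLAGS='-O3' CXXOPTFLAGS='-O3' FOPTFLAGS='-O3'
  make PETSC_ARCH=arch-opt all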

   Barry


On Aug 27, 2010, at 2:22 PM, Keita Teranishi wrote:

 Barry,
 
 The CPU version takes another digit: it is 1.6 sec on Fermi and 17 sec on 1 CPU core.
 
 Thanks,
 
  Keita Teranishi
  Scientific Library Group
  Cray, Inc.
  keita at cray.com
 
 
 
 -Original Message-
 From: petsc-dev-bounces at mcs.anl.gov [mailto:petsc-dev-bounces at 
 mcs.anl.gov] On Behalf Of Keita Teranishi
 Sent: Friday, August 27, 2010 2:20 PM
 To: For users of the development version of PETSc
 Subject: Re: [petsc-dev] [GPU] Performance on Fermi
 
 Barry,
 
 Yes. It improves the performance dramatically, but the execution time for 
 KSPSolve stays the same.
 
 MatMult 5.2 Gflops
 
 Thanks,
 
 
  Keita Teranishi
  Scientific Library Group
  Cray, Inc.
  keita at cray.com
 
 
 
 -Original Message-
 From: petsc-dev-bounces at mcs.anl.gov [mailto:petsc-dev-bounces at 
 mcs.anl.gov] On Behalf Of Barry Smith
 Sent: Friday, August 27, 2010 2:15 PM
 To: For users of the development version of PETSc
 Subject: [petsc-dev] [GPU] Performance on Fermi
 
 
   PETSc-dev folks,
 
  Please prepend all messages to petsc-dev that involve GPUs with [GPU] so 
 they can be easily filtered.
 
Keita,
 
  To run src/ksp/ksp/examples/tutorials/ex2.c with CUDA you need the flag 
 -vec_type cuda
 
  Note also that this example is fine for simple ONE processor tests but 
 should not be used for parallel testing because it does not do a proper 
 parallel partitioning for performance
 
Barry
 
 On Aug 27, 2010, at 2:04 PM, Keita Teranishi wrote:
 
 Hi,
 
 I ran ex2.c with a matrix from a 512x512 grid.
 I set CG and Jacobi for the solver and preconditioner. 
 GCC-4.4.4 and CUDA-3.1 are used to compile the code.
 BLAS and LAPACK are not optimized.
 
 MatMult
 Fermi:   1142 MFlops
 1 core Istanbul: 420 MFlops
 
 KSPSolve:
 Fermi:   1.5 Sec
 1 core Istanbul: 1.7 Sec
 
 
 
 Keita Teranishi
 Scientific Library Group
 Cray, Inc.
 keita at cray.com
 
 
 
 -Original Message-
 From: petsc-dev-bounces at mcs.anl.gov 
 [mailto:petsc-dev-bounces at mcs.anl.gov] On Behalf Of Satish Balay
 Sent: Friday, August 27, 2010 1:49 PM
 To: For users of the development version of PETSc
 Subject: Re: [petsc-dev] Problem with petsc-dev
 
 On Fri, 27 Aug 2010, Satish Balay wrote:
 
 There was a problem with tarball creation for the past few days. Will 
 try to respin manually today - and update you.
 
 the petsc-dev tarball is now updated on the website..
 
 Satish
 




[petsc-dev] [GPU] Performance on Fermi

2010-08-27 Thread Keita Teranishi
Yes, I replaced all the compiler flags by -O3.


 Keita Teranishi
 Scientific Library Group
 Cray, Inc.
 keita at cray.com



-Original Message-
From: Jed Brown [mailto:five...@gmail.com] On Behalf Of Jed Brown
Sent: Friday, August 27, 2010 4:16 PM
To: Keita Teranishi; For users of the development version of PETSc
Subject: Re: [petsc-dev] [GPU] Performance on Fermi

On Fri, 27 Aug 2010 16:06:30 -0500, Keita Teranishi keita at cray.com wrote:
 Barry,
 
 The CPU timing I reported was after recompiling the code (I removed 
 PETSC_USE_DEBUG and GDB macros from petscconf.h).  

Unless you were manually overriding compiler flags, it still wasn't
optimized.  Please just reconfigure a new PETSC_ARCH --with-debugging=0.
It's as easy as

  foo-dbg/conf/reconfigure-foo-dbg.py --with-debugging=0 PETSC_ARCH=foo-opt
  make PETSC_ARCH=foo-opt

Jed



[petsc-dev] [GPU] Performance on Fermi

2010-08-27 Thread Keita Teranishi
Jed,

I usually edit petscconf.h and petscvariables by hand to change the installation 
configuration for Cray XT/XE. The problem is that the PETSc configure script 
picks up the wrong variables and #define macros because the OS and library 
settings on the login node are different from those on the compute nodes.

This particular case is just a mistake in the configure script (and it is not a 
big deal to fix), but it would be great if you have any ideas on how to avoid 
picking up the wrong settings.

Thanks,

 Keita Teranishi
 Scientific Library Group
 Cray, Inc.
 keita at cray.com



-Original Message-
From: Jed Brown [mailto:five...@gmail.com] On Behalf Of Jed Brown
Sent: Friday, August 27, 2010 4:29 PM
To: Keita Teranishi; For users of the development version of PETSc
Subject: RE: [petsc-dev] [GPU] Performance on Fermi

On Fri, 27 Aug 2010 16:18:43 -0500, Keita Teranishi keita at cray.com wrote:
 Yes, I replaced all the compiler flags by -O3.

petsc-maint doesn't come to me, but if the snippet that Barry quoted was
from your log_summary, then PETSC_USE_DEBUG was definitely defined when
plog.c was compiled.  It's really much easier to have two separate
builds and always use the optimized one when profiling.

Jed



[petsc-dev] [GPU] Performance on Fermi

2010-08-27 Thread Barry Smith

On Aug 27, 2010, at 4:34 PM, Keita Teranishi wrote:

 Jed,
 
 I usually manually edit petscconf.h and petscvariables to change the 
 installation configurations for Cray XT/XE.   The problem is configure script 
 of PETSc picks up wrong variables and #define macros because the OS and 
 library setting on the login node is different from the compute node. 

   Keita,

 We would prefer that you complain to petsc-maint at mcs.anl.gov so that we 
can fix configure problems and not have anyone editing the generated files.

   Barry

   1) so that it works for all users, not just those who know how to edit those 
files. We cannot fix problems we don't know about.

   2) editing those files repeatedly is fragile, and it is easy to make a 
slight mistake that's hard to track down.


 
 This particular case is just a mistake in configure script (and it's not a 
 big deal to fix), but it will be great if you have any ideas to avoid picking 
 up wrong settings.  
 
 Thanks,
 
  Keita Teranishi
  Scientific Library Group
  Cray, Inc.
  keita at cray.com
 
 
 
 -Original Message-
 From: Jed Brown [mailto:five9a2 at gmail.com] On Behalf Of Jed Brown
 Sent: Friday, August 27, 2010 4:29 PM
 To: Keita Teranishi; For users of the development version of PETSc
 Subject: RE: [petsc-dev] [GPU] Performance on Fermi
 
 On Fri, 27 Aug 2010 16:18:43 -0500, Keita Teranishi keita at cray.com wrote:
 Yes, I replaced all the compiler flags by -O3.
 
 petsc-maint doesn't come to me, but if the snippet that Barry quoted was
 from your log_summary, then PETSC_USE_DEBUG was definitely defined when
 plog.c was compiled.  It's really much easier to have two separate
 builds and always use the optimized one when profiling.
 
 Jed