Re: [petsc-dev] OpenMP

2021-11-07 Thread Mark Adams
Ah, great.
I guessed right (that was the page I was looking at, but I still don't see
this):
#if defined(_OPENMP) && _OPENMP >= 201811
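
For reference, a minimal stand-alone C sketch of that kind of compile-time guard
(the messages are made up; 201811 is the date code the spec assigns to OpenMP 5.0):

#include <stdio.h>

int main(void)
{
#if defined(_OPENMP) && _OPENMP >= 201811   /* OpenMP 5.0 or later */
  printf("OpenMP >= 5.0 available, _OPENMP = %d\n", _OPENMP);
#elif defined(_OPENMP)
  printf("older OpenMP, _OPENMP = %d\n", _OPENMP);
#else
  printf("compiled without OpenMP\n");
#endif
  return 0;
}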

On Sat, Nov 6, 2021 at 8:47 PM Junchao Zhang 
wrote:

>
>
> On Sat, Nov 6, 2021 at 3:51 PM Mark Adams  wrote:
>
>> Yea, that is a bit inscrutable, but I see mumps is the main/only user of
>> this:
>>
>> /* if using PETSc OpenMP support, we only call MUMPS on master ranks.
>> Before/after the call, we change/restore CPUs the master ranks can run on */
>>
>> And I see _OPENMP is a macro for the release date (mm) of the OMP
>> version. It's not clear what the v5.0 is (
>> https://www.openmp.org/specifications/)
>>
> {200505,"2.5"},{200805,"3.0"},{201107,"3.1"},{201307,"4.0"},{201511,"4.5"},{201811,"5.0"},{202011,"5.1"}
>
> On Sat, Nov 6, 2021 at 4:27 PM Junchao Zhang 
>> wrote:
>>
>>>
>>>
>>> On Sat, Nov 6, 2021 at 5:51 AM Mark Adams  wrote:
>>>
 Two questions on OMP:

 * Can I test for the version of OMP? I want >= 5 and I see this, which
 looks promising:
 include/petscsys.h:#elif defined(_OPENMP) && *_OPENMP >= 201307* &&
 !defined(_WIN32)

 * What is the difference between HAVE_OPENMP and
 HAVE_OPENMP_SUPPORT.

 # this is different from HAVE_OPENMP. HAVE_OPENMP_SUPPORT checks if we
>>> have facilities to support
>>> # running PETSc in flat-MPI mode and third party libraries in MPI+OpenMP
>>> hybrid mode
>>> if self.mpi.found and self.mpi.support_mpi3_shm and self.pthread.found
>>> and self.hwloc.found:
>>> # Apple pthread does not provide this functionality
>>> if self.function.check('pthread_barrier_init', libraries = 'pthread'):
>>> self.addDefine('HAVE_OPENMP_SUPPORT', 1)
>>>
>>>
 Thanks,
 Mark

>>>


Re: [petsc-dev] OpenMP

2021-11-06 Thread Junchao Zhang
On Sat, Nov 6, 2021 at 3:51 PM Mark Adams  wrote:

> Yea, that is a bit inscrutable, but I see mumps is the main/only user of
> this:
>
> /* if using PETSc OpenMP support, we only call MUMPS on master ranks.
> Before/after the call, we change/restore CPUs the master ranks can run on */
>
> And I see _OPENMP is a macro for the release date (mm) of the OMP
> version. It's not clear what the v5.0 is (
> https://www.openmp.org/specifications/)
>
{200505,"2.5"},{200805,"3.0"},{201107,"3.1"},{201307,"4.0"},{201511,"4.5"},{201811,"5.0"},{202011,"5.1"}
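
A small stand-alone C sketch of that date-code-to-version mapping, built only
from the table above (the helper name is made up):

#include <stdio.h>

/* Map the _OPENMP date code (yyyymm) to the spec version string. */
static const char *omp_version_string(int date)
{
  switch (date) {
    case 200505: return "2.5";
    case 200805: return "3.0";
    case 201107: return "3.1";
    case 201307: return "4.0";
    case 201511: return "4.5";
    case 201811: return "5.0";
    case 202011: return "5.1";
    default:     return "unknown";
  }
}

int main(void)
{
#if defined(_OPENMP)
  printf("_OPENMP = %d -> OpenMP %s\n", _OPENMP, omp_version_string(_OPENMP));
#else
  printf("compiled without OpenMP\n");
#endif
  return 0;
}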

On Sat, Nov 6, 2021 at 4:27 PM Junchao Zhang 
> wrote:
>
>>
>>
>> On Sat, Nov 6, 2021 at 5:51 AM Mark Adams  wrote:
>>
>>> Two questions on OMP:
>>>
>>> * Can I test for the version of OMP? I want >= 5 and I see this, which
>>> looks promising:
>>> include/petscsys.h:#elif defined(_OPENMP) && *_OPENMP >= 201307* &&
>>> !defined(_WIN32)
>>>
>>> * What is the difference between HAVE_OPENMP and
>>> HAVE_OPENMP_SUPPORT.
>>>
>>> # this is different from HAVE_OPENMP. HAVE_OPENMP_SUPPORT checks if we
>> have facilities to support
>> # running PETSc in flat-MPI mode and third party libraries in MPI+OpenMP
>> hybrid mode
>> if self.mpi.found and self.mpi.support_mpi3_shm and self.pthread.found
>> and self.hwloc.found:
>> # Apple pthread does not provide this functionality
>> if self.function.check('pthread_barrier_init', libraries = 'pthread'):
>> self.addDefine('HAVE_OPENMP_SUPPORT', 1)
>>
>>
>>> Thanks,
>>> Mark
>>>
>>


Re: [petsc-dev] OpenMP

2021-11-06 Thread Mark Adams
Yea, that is a bit inscrutable, but I see MUMPS is the main/only user of
this:

/* if using PETSc OpenMP support, we only call MUMPS on master ranks.
Before/after the call, we change/restore CPUs the master ranks can run on */
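
A rough C sketch of the general pattern that comment describes, not the actual
PETSc/MUMPS code (the real code also changes which CPUs the master ranks may
run on; here only the OpenMP thread count is adjusted, and the helper name is
made up):

#include <omp.h>

/* Only the "master" rank of each node calls the threaded library; the OpenMP
   thread count is raised for the call and restored afterwards so the rest of
   the run stays flat-MPI. */
static void call_threaded_library_on_master(int is_master, int nthreads)
{
  if (!is_master) return;              /* non-master ranks wait elsewhere    */
  int saved = omp_get_max_threads();   /* remember the current setting       */
  omp_set_num_threads(nthreads);       /* let the library use the idle cores */
  /* ... invoke the OpenMP-enabled third-party solver here ... */
  omp_set_num_threads(saved);          /* restore flat-MPI behaviour         */
}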

And I see _OPENMP is a macro holding the release date (yyyymm) of the OpenMP
version. It's not clear which date corresponds to v5.0 (
https://www.openmp.org/specifications/)

On Sat, Nov 6, 2021 at 4:27 PM Junchao Zhang 
wrote:

>
>
> On Sat, Nov 6, 2021 at 5:51 AM Mark Adams  wrote:
>
>> Two questions on OMP:
>>
>> * Can I test for the version of OMP? I want >= 5 and I see this, which
>> looks promising:
>> include/petscsys.h:#elif defined(_OPENMP) && *_OPENMP >= 201307* &&
>> !defined(_WIN32)
>>
>> * What is the difference between HAVE_OPENMP and
>> HAVE_OPENMP_SUPPORT.
>>
>> # this is different from HAVE_OPENMP. HAVE_OPENMP_SUPPORT checks if we
> have facilities to support
> # running PETSc in flat-MPI mode and third party libraries in MPI+OpenMP
> hybrid mode
> if self.mpi.found and self.mpi.support_mpi3_shm and self.pthread.found and
> self.hwloc.found:
> # Apple pthread does not provide this functionality
> if self.function.check('pthread_barrier_init', libraries = 'pthread'):
> self.addDefine('HAVE_OPENMP_SUPPORT', 1)
>
>
>> Thanks,
>> Mark
>>
>


Re: [petsc-dev] OpenMP

2021-11-06 Thread Junchao Zhang
On Sat, Nov 6, 2021 at 5:51 AM Mark Adams  wrote:

> Two questions on OMP:
>
> * Can I test for the version of OMP? I want >= 5 and I see this, which
> looks promising:
> include/petscsys.h:#elif defined(_OPENMP) && *_OPENMP >= 201307* &&
> !defined(_WIN32)
>
> * What is the difference between HAVE_OPENMP and
> HAVE_OPENMP_SUPPORT.
>
# this is different from HAVE_OPENMP. HAVE_OPENMP_SUPPORT checks if we have
# facilities to support running PETSc in flat-MPI mode and third party
# libraries in MPI+OpenMP hybrid mode
if self.mpi.found and self.mpi.support_mpi3_shm and self.pthread.found and self.hwloc.found:
  # Apple pthread does not provide this functionality
  if self.function.check('pthread_barrier_init', libraries = 'pthread'):
    self.addDefine('HAVE_OPENMP_SUPPORT', 1)
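
In the generated petscconf.h these configure defines carry the usual PETSC_
prefix, so code can test them at compile time; a minimal sketch, assuming a
PETSc build whose petscconf.h is on the include path (function name and
messages are made up):

#include <petscconf.h>   /* configure-generated defines */
#include <stdio.h>

void report_openmp_config(void)
{
#if defined(PETSC_HAVE_OPENMP)
  printf("PETSc was configured with OpenMP (--with-openmp)\n");
#endif
#if defined(PETSC_HAVE_OPENMP_SUPPORT)
  printf("MPI-3 shared memory, pthread barriers and hwloc all found:\n"
         "flat-MPI PETSc can drive MPI+OpenMP third-party libraries (e.g. MUMPS)\n");
#endif
}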


> Thanks,
> Mark
>


[petsc-dev] OpenMP

2021-11-06 Thread Mark Adams
Two questions on OMP:

* Can I test for the version of OMP? I want >= 5 and I see this, which
looks promising:
include/petscsys.h:#elif defined(_OPENMP) && *_OPENMP >= 201307* &&
!defined(_WIN32)

* What is the difference between HAVE_OPENMP and
HAVE_OPENMP_SUPPORT?

Thanks,
Mark


Re: [petsc-dev] OpenMP and web page

2021-01-15 Thread Barry Smith

  I agree.  We should just make sure it "disappears" with the transition to the 
Sphinx docs.

  Barry


> On Jan 15, 2021, at 10:32 AM, jacob@gmail.com wrote:
> 
> Alternatively, I simply don’t include it in the MR to port the website to 
> sphinx.
>  
> Best regards,
>  
> Jacob Faibussowitsch
> (Jacob Fai - booss - oh - vitch)
> Cell: (312) 694-3391
>  
> From: petsc-dev  On Behalf Of Mark Adams
> Sent: Friday, January 15, 2021 10:20
> To: For users of the development version of PETSc 
> Subject: [petsc-dev] OpenMP and web page
>  
> I am experimenting with launching asynchronous GPU solves in 
> PCFieldsplit/additive using OpenMP and I found this web page, which I think 
> is obsolete and should be removed: 
> https://www.mcs.anl.gov/petsc/miscellaneous/petscthreads.html
>  
> Mark



Re: [petsc-dev] OpenMP and web page

2021-01-15 Thread jacob.fai
Alternatively, I simply don’t include it in the MR to port the website to 
sphinx.

 

Best regards,

 

Jacob Faibussowitsch

(Jacob Fai - booss - oh - vitch)

Cell: (312) 694-3391

 

From: petsc-dev  On Behalf Of Mark Adams
Sent: Friday, January 15, 2021 10:20
To: For users of the development version of PETSc 
Subject: [petsc-dev] OpenMP and web page

 

I am experimenting with launching asynchronous GPU solves in 
PCFieldsplit/additive using OpenMP and I found this web page, which I think is 
obsolete and should be removed: 
https://www.mcs.anl.gov/petsc/miscellaneous/petscthreads.html

 

Mark



[petsc-dev] OpenMP and web page

2021-01-15 Thread Mark Adams
I am experimenting with launching asynchronous GPU solves in
PCFieldsplit/additive using OpenMP and I found this web page, which I think
is obsolete and should be removed:
https://www.mcs.anl.gov/petsc/miscellaneous/petscthreads.html

Mark


Re: [petsc-dev] OpenMP for GPU course

2019-03-19 Thread Smith, Barry F. via petsc-dev


  


> On Mar 19, 2019, at 12:22 PM, Matthew Knepley  wrote:
> 
> On Tue, Mar 19, 2019 at 1:17 PM Jed Brown  wrote:
> Matthew Knepley  writes:
> 
> > Are you saying that using OpenMP 4.5 "offload" is something you would do?
> 
> For applications that make sense for GPUs, yes.
> 
> Would you say this workshop is oriented toward applications that make sense 
> for GPUs, or
> are they trying to sell this to everyone regardless of suitability?

The BNL website is currently off-line, at least from my location.

> 
>   Matt
> 
> -- 
> What most experimenters take for granted before they begin their experiments 
> is infinitely more interesting than any results to which their experiments 
> lead.
> -- Norbert Wiener
> 
> https://www.cse.buffalo.edu/~knepley/



Re: [petsc-dev] OpenMP for GPU course

2019-03-19 Thread Jed Brown via petsc-dev
"Smith, Barry F."  writes:

>> On Mar 19, 2019, at 12:17 PM, Jed Brown  wrote:
>> 
>> Matthew Knepley  writes:
>> 
>>> Are you saying that using OpenMP 4.5 "offload" is something you would do?
>> 
>> For applications that make sense for GPUs, yes.
>
>   Oh, so you mean for AI ;)

We really missed the boat by not rebranding as PETAi ("intelligence is
the easy part") and registering pet.ai.


Re: [petsc-dev] OpenMP for GPU course

2019-03-19 Thread Smith, Barry F. via petsc-dev



> On Mar 19, 2019, at 12:17 PM, Jed Brown  wrote:
> 
> Matthew Knepley  writes:
> 
>> Are you saying that using OpenMP 4.5 "offload" is something you would do?
> 
> For applications that make sense for GPUs, yes.

  Oh, so you mean for AI ;)





Re: [petsc-dev] OpenMP for GPU course

2019-03-19 Thread Smith, Barry F. via petsc-dev


> On Mar 19, 2019, at 12:12 PM, Matthew Knepley  wrote:
> 
> On Tue, Mar 19, 2019 at 12:31 PM Jed Brown via petsc-dev 
>  wrote:
> These are well-organized events that puts application teams together
> with compiler developers and the like.  I served as a mentor for the
> Boulder hackathon last year (joining a NASA team that included PETSc
> user Gaetan Kenway) and learned a lot.  It's highly recommended to
> prepare for the event by focusing the code you want to work on to
> something concrete and easily modifiable, and to have a test suite that
> can be easily run to evaluate correctness and performance.  In our case,
> Michael Barad had done an exemplary job in that preparation.
> 
> Are you saying that using OpenMP 4.5 "offload" is something you would do?

   It may be the only choice ;(.  In addition, if it does perform poorly it 
would be good to have concrete evidence that it performs poorly.

> 
>   Matt
>  
> "Smith, Barry F. via petsc-dev"  writes:
> 
> >I got this off an ECP mailing list.
> >
> >
> > OpenMP Brookathon 2019
> > April 29 – May 2, 2019
> > URL:  https://www.bnl.gov/ompbrookathon2019/
> > The Computational Science Initiative at Brookhaven National Laboratory 
> > (BNL) is organizing in conjunction with Oak Ridge National Laboratory 
> > (ORNL) and IBM, the "OpenMP Brookathon 2019". This event is sponsored by 
> > ECP and driven by the ECP SOLLVE Project. The goal of this hackathon is to 
> > port, optimize and evolve applications towards the latest OpenMP versions 
> > (4.5+). In practical terms, this event will enable application teams and 
> > developers to accelerate their code with the use of GPUs, as well as 
> > exploiting the latest OpenMP functionality to program (IBM Power9) 
> > multi-core platforms. Prospective user groups of large hybrid CPU-GPU 
> > systems will send teams of at least 3 developers along with either (1) a 
> > scalable application that could benefit from GPU accelerators, or (2) an 
> > application running on accelerators that has already written OpenMP and 
> > needs optimization or (3) applications that have OpenACC in their codes and 
> > need assistance to convert them to OpenMP 4.5 offload. There will be 
> > intensive mentoring during this 4-day hands-on event. Programming 
> > experience with OpenMP 4.5 offload or CUDA is not a requirement. We will 
> > hold training events / tutorials covering the set of background topics 
> > required. In the weeks preceding the hackathon, you will have a chance to 
> > attend training to prepare you for the event. Prior GPU experience is not 
> > required!
> 
> 
> -- 
> What most experimenters take for granted before they begin their experiments 
> is infinitely more interesting than any results to which their experiments 
> lead.
> -- Norbert Wiener
> 
> https://www.cse.buffalo.edu/~knepley/



Re: [petsc-dev] OpenMP for GPU course

2019-03-19 Thread Jed Brown via petsc-dev
Matthew Knepley  writes:

> Are you saying that using OpenMP 4.5 "offload" is something you would do?

For applications that make sense for GPUs, yes.


Re: [petsc-dev] OpenMP for GPU course

2019-03-19 Thread Jed Brown via petsc-dev
These are well-organized events that put application teams together
with compiler developers and the like.  I served as a mentor for the
Boulder hackathon last year (joining a NASA team that included PETSc
user Gaetan Kenway) and learned a lot.  It's highly recommended to
prepare for the event by focusing the code you want to work on to
something concrete and easily modifiable, and to have a test suite that
can be easily run to evaluate correctness and performance.  In our case,
Michael Barad had done an exemplary job in that preparation.

"Smith, Barry F. via petsc-dev"  writes:

>I got this off an ECP mailing list.
>
>
> OpenMP Brookathon 2019
> April 29 – May 2, 2019
> URL:  https://www.bnl.gov/ompbrookathon2019/
> The Computational Science Initiative at Brookhaven National Laboratory (BNL) 
> is organizing in conjunction with Oak Ridge National Laboratory (ORNL) and 
> IBM, the "OpenMP Brookathon 2019". This event is sponsored by ECP and driven 
> by the ECP SOLLVE Project. The goal of this hackathon is to port, optimize 
> and evolve applications towards the latest OpenMP versions (4.5+). In 
> practical terms, this event will enable application teams and developers to 
> accelerate their code with the use of GPUs, as well as exploiting the latest 
> OpenMP functionality to program (IBM Power9) multi-core platforms. 
> Prospective user groups of large hybrid CPU-GPU systems will send teams of at 
> least 3 developers along with either (1) a scalable application that could 
> benefit from GPU accelerators, or (2) an application running on accelerators 
> that has already written OpenMP and needs optimization or (3) applications 
> that have OpenACC in their codes and need assistance to convert them to 
> OpenMP 4.5 offload. There will be intensive mentoring during this 4-day 
> hands-on event. Programming experience with OpenMP 4.5 offload or CUDA is not 
> a requirement. We will hold training events / tutorials covering the set of 
> background topics required. In the weeks preceding the hackathon, you will 
> have a chance to attend training to prepare you for the event. Prior GPU 
> experience is not required!


[petsc-dev] OpenMP for GPU course

2019-03-19 Thread Smith, Barry F. via petsc-dev

   I got this off an ECP mailing list.


OpenMP Brookathon 2019
April 29 – May 2, 2019
URL:  https://www.bnl.gov/ompbrookathon2019/
The Computational Science Initiative at Brookhaven National Laboratory (BNL) is 
organizing in conjunction with Oak Ridge National Laboratory (ORNL) and IBM, 
the "OpenMP Brookathon 2019". This event is sponsored by ECP and driven by the 
ECP SOLLVE Project. The goal of this hackathon is to port, optimize and evolve 
applications towards the latest OpenMP versions (4.5+). In practical terms, 
this event will enable application teams and developers to accelerate their 
code with the use of GPUs, as well as exploiting the latest OpenMP 
functionality to program (IBM Power9) multi-core platforms. Prospective user 
groups of large hybrid CPU-GPU systems will send teams of at least 3 developers 
along with either (1) a scalable application that could benefit from GPU 
accelerators, or (2) an application running on accelerators that has already 
written OpenMP and needs optimization or (3) applications that have OpenACC in 
their codes and need assistance to convert them to OpenMP 4.5 offload. There 
will be intensive mentoring during this 4-day hands-on event. Programming 
experience with OpenMP 4.5 offload or CUDA is not a requirement. We will hold 
training events / tutorials covering the set of background topics required. In 
the weeks preceding the hackathon, you will have a chance to attend training to 
prepare you for the event. Prior GPU experience is not required!



Re: [petsc-dev] openmp

2013-11-06 Thread Svetlana Tkachenko
Thanks. Could you please send an example makefile for this small example as 
src/ts/examples/tutorials/makefile is huge and I don't know which part of it 
manages the -I -L stuff.

  Svetlana

On Thu, 7 Nov 2013, at 1:19, Barry Smith wrote:
 
Your makefile does not indicate the location of mpif.h to your FORTRAN 
 compiler, which in the case of --with-mpi=0 is in ${PETSC_DIR}/include/mpiuni
 
Note that if you simply copy a makefile from PETSc, say 
 src/ts/examples/tutorials/makefile and modify that slightly for you code you 
 don’t need to manage all the -I -L stuff yourself, our makefiles take care of 
 it and are portable for different MPIs etc.
 
 Barry
 
 
 
 On Nov 6, 2013, at 12:25 AM, Svetlana Tkachenko 
 svetlana.tkache...@fastmail.fm wrote:
 
  I have configured petsc-dev (downloaded it today) with these options, and a 
  small example. It appears to fail to compile without MPI with the error 
  message:
  
  ./configure --with-cc=gcc --with-fc=gfortran  --download-f-blas-lapack 
  --with-openmp --with-mpi=0
  
  ~/dev/test/petsc $ echo $LD_LIBRARY_PATH
  /home/username/petsc/linux-amd64/lib:/opt/openmpi/lib
  ~/dev/test/petsc $ cat solver.f
 subroutine solver()
  #include finclude/petscsys.h
  
   PetscErrorCode ierr
   print *, Entered petsc.
  
   ! Init PETSc
   call PetscInitialize(PETSC_NULL_CHARACTER,ierr)
   CHKERRQ(ierr)
   print *, Init done.
  
   ! Finalise PETSc
   call PetscFinalize(ierr)
   CHKERRQ(ierr)
   print *, Finalized.
  end
  ~/dev/test/petsc $ cat myexample.f
program myexample
  
call solver
end
  ~/dev/test/petsc $ cat makefile
  include ${PETSC_DIR}/conf/variables
  
  myexample: myexample.o solver.o ; gfortran -o myexample myexample.o 
  solver.o -lpetsc -L${PETSC_DIR}/${PETSC_ARCH}/lib -fopenmp
  solver.o: ; gfortran -c -cpp -I${PETSC_DIR}/include -O0 solver.f -lpetsc 
  -I${PETSC_DIR}/${PETSC_ARCH}/include -L${PETSC_DIR}/${PETSC_ARCH}/lib 
  -lpetsc -fopenmp
  myexample.o: ; gfortran -c -cpp -I${PETSC_DIR}/include -O0 myexample.f 
  -lpetsc -I${PETSC_DIR}/${PETSC_ARCH}/include 
  -L${PETSC_DIR}/${PETSC_ARCH}/lib -lpetsc -fopenmp
  ~/dev/test/petsc $ make
  gfortran -c -cpp -I/home/username/petsc/include -O0 myexample.f -lpetsc 
  -I/home/username/petsc/linux-amd64/include 
  -L/home/username/petsc/linux-amd64/lib -lpetsc -fopenmp
  gfortran -c -cpp -I/home/username/petsc/include -O0 solver.f -lpetsc 
  -I/home/username/petsc/linux-amd64/include 
  -L/home/username/petsc/linux-amd64/lib -lpetsc -fopenmp
  In file included from solver.f:3:
  /home/username/petsc/include/finclude/petscsys.h:10: error: mpif.h: No 
  such file or directory
  /home/username/petsc/include/finclude/petscsys.h:163.29:
 Included at solver.f:3:
  
   parameter(MPIU_SCALAR = MPI_DOUBLE_PRECISION)
  1
  Error: Parameter 'mpi_double_precision' at (1) has not been declared or is 
  a variable, which does not reduce to a constant expression
  /home/username/petsc/include/finclude/petscsys.h:171.30:
 Included at solver.f:3:
  
   parameter(MPIU_INTEGER = MPI_INTEGER)
   1
  Error: Parameter 'mpi_integer' at (1) has not been declared or is a 
  variable, which does not reduce to a constant expression
  make: *** [solver.o] Error 1
  ~/dev/test/petsc $ 
 


Re: [petsc-dev] openmp

2013-11-06 Thread Matthew Knepley
On Wed, Nov 6, 2013 at 4:47 PM, Svetlana Tkachenko 
svetlana.tkache...@fastmail.fm wrote:

 Thanks. Could you please send an example makefile for this small example
 as src/ts/examples/tutorials/makefile is huge and I don't know which part
 of it manages the -I -L stuff.


prog: prog.o
  ${CLINKER} -o prog prog.o ${PETSC_LIB}

include ${PETSC_DIR}/conf/variables
include ${PETSC_DIR}/conf/rules

Matt


   Svetlana

 On Thu, 7 Nov 2013, at 1:19, Barry Smith wrote:
 
 Your makefile does not indicate the location of mpif.h to your
 FORTRAN compiler which in the case of —with-mpi=0 is in
 ${PETSC}/include/mpiuni
 
 Note that if you simply copy a makefile from PETSc, say
 src/ts/examples/tutorials/makefile and modify that slightly for you code
 you don’t need to manage all the -I -L stuff yourself, our makefiles take
 care of it and are portable for different MPIs etc.
 
  Barry
 
 
 
  On Nov 6, 2013, at 12:25 AM, Svetlana Tkachenko 
 svetlana.tkache...@fastmail.fm wrote:
 
   I have configured petsc-dev (downloaded it today) with these options,
 and a small example. It appears to fail to compile without MPI with the
 error message:
  
   ./configure --with-cc=gcc --with-fc=gfortran  --download-f-blas-lapack
 --with-openmp --with-mpi=0
  
   ~/dev/test/petsc $ echo $LD_LIBRARY_PATH
   /home/username/petsc/linux-amd64/lib:/opt/openmpi/lib
   ~/dev/test/petsc $ cat solver.f
  subroutine solver()
   #include finclude/petscsys.h
  
PetscErrorCode ierr
print *, Entered petsc.
  
! Init PETSc
call PetscInitialize(PETSC_NULL_CHARACTER,ierr)
CHKERRQ(ierr)
print *, Init done.
  
! Finalise PETSc
call PetscFinalize(ierr)
CHKERRQ(ierr)
print *, Finalized.
   end
   ~/dev/test/petsc $ cat myexample.f
 program myexample
  
 call solver
 end
   ~/dev/test/petsc $ cat makefile
   include ${PETSC_DIR}/conf/variables
  
   myexample: myexample.o solver.o ; gfortran -o myexample myexample.o
 solver.o -lpetsc -L${PETSC_DIR}/${PETSC_ARCH}/lib -fopenmp
   solver.o: ; gfortran -c -cpp -I${PETSC_DIR}/include -O0 solver.f
 -lpetsc -I${PETSC_DIR}/${PETSC_ARCH}/include
 -L${PETSC_DIR}/${PETSC_ARCH}/lib -lpetsc -fopenmp
   myexample.o: ; gfortran -c -cpp -I${PETSC_DIR}/include -O0 myexample.f
 -lpetsc -I${PETSC_DIR}/${PETSC_ARCH}/include
 -L${PETSC_DIR}/${PETSC_ARCH}/lib -lpetsc -fopenmp
   ~/dev/test/petsc $ make
   gfortran -c -cpp -I/home/username/petsc/include -O0 myexample.f
 -lpetsc -I/home/username/petsc/linux-amd64/include
 -L/home/username/petsc/linux-amd64/lib -lpetsc -fopenmp
   gfortran -c -cpp -I/home/username/petsc/include -O0 solver.f -lpetsc
 -I/home/username/petsc/linux-amd64/include
 -L/home/username/petsc/linux-amd64/lib -lpetsc -fopenmp
   In file included from solver.f:3:
   /home/username/petsc/include/finclude/petscsys.h:10: error: mpif.h:
 No such file or directory
   /home/username/petsc/include/finclude/petscsys.h:163.29:
  Included at solver.f:3:
  
parameter(MPIU_SCALAR = MPI_DOUBLE_PRECISION)
   1
   Error: Parameter 'mpi_double_precision' at (1) has not been declared
 or is a variable, which does not reduce to a constant expression
   /home/username/petsc/include/finclude/petscsys.h:171.30:
  Included at solver.f:3:
  
parameter(MPIU_INTEGER = MPI_INTEGER)
1
   Error: Parameter 'mpi_integer' at (1) has not been declared or is a
 variable, which does not reduce to a constant expression
   make: *** [solver.o] Error 1
   ~/dev/test/petsc $
 




-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener


Re: [petsc-dev] openmp

2013-11-06 Thread Jed Brown
Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes:

 Thanks. Could you please send an example makefile for this small
 example as src/ts/examples/tutorials/makefile is huge and I don't know
 which part of it manages the -I -L stuff.

There are examples in the user's manual.  I would start with this:

ALL: ex2

include ${PETSC_DIR}/conf/variables
include ${PETSC_DIR}/conf/rules
ex2: ex2.o chkopts
${CLINKER} -o $@ $< ${PETSC_LIB}




Re: [petsc-dev] openmp

2013-11-06 Thread Svetlana Tkachenko
On Thu, 7 Nov 2013, at 9:22, Jed Brown wrote:
 Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes:
 
  Thanks. Could you please send an example makefile for this small
  example as src/ts/examples/tutorials/makefile is huge and I don't know
  which part of it manages the -I -L stuff.
 
 There are examples in the user's manual.  I would start with this:
 
 ALL: ex2
 
 include ${PETSC_DIR}/conf/variables
 include ${PETSC_DIR}/conf/rules
 ex2: ex2.o chkopts
   ${CLINKER} -o $@ $ ${PETSC_LIB}

On Thu, 7 Nov 2013, at 9:21, Matthew Knepley wrote:
 prog: prog.o
   ${CLINKER} -o prog prog.o ${PETSC_LIB}
 
 include ${PETSC_DIR}/conf/variables
 include ${PETSC_DIR}/conf/rules
 
 Matt

Right. I have spent half an hour now trying everything like a headless chicken 
to figure out how to link, and it did not work. I would appreciate it if you 
could come up with something that links, not just runs a single program.


Re: [petsc-dev] openmp

2013-11-06 Thread Jed Brown
Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes:
 Right. I have spent half of an hour now trying to imagine what to do
 to link trying everything like a headless chicken and it did not
 work. 

You always have to send the error message.

 I would appreciate if you could come up with something that links, not
 just runs a single program.

The makefiles we suggested link a program of the same name as the source
file.  Do you have multiple source files?  You only have to edit the one
line and run make.

program: several.o object.o files.o
${CLINKER} -o $@ $^ ${PETSC_LIB}

include ${PETSC_DIR}/conf/variables
include ${PETSC_DIR}/conf/rules




Re: [petsc-dev] openmp

2013-11-06 Thread Svetlana Tkachenko


On Thu, 7 Nov 2013, at 10:22, Jed Brown wrote:
 Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes:
  Right. I have spent half of an hour now trying to imagine what to do
  to link trying everything like a headless chicken and it did not
  work. 
 
 You always have to send the error message.
 
  I would appreciate if you could come up with something that links, not
  just runs a single program.
 
 The makefiles we suggested link a program of the same name as the source
 file.  Do you have multiple source files?  You only have to edit the one
 line and run make.
 
 program: several.o object.o files.o
   ${CLINKER} -o $@ $^ ${PETSC_LIB}
 
 include ${PETSC_DIR}/conf/variables
 include ${PETSC_DIR}/conf/rules

~/dev/test/petsc $ cat makefile
myexample: myexample.o solver.o
${CLINKER} -o $@ $^ ${PETSC_LIB}

include ${PETSC_DIR}/conf/variables
include ${PETSC_DIR}/conf/rules
~/dev/test/petsc $ cat solver.f
  subroutine solver()
#include <finclude/petscsys.h>
PetscErrorCode ierr
print *, 'Entered petsc.'

  ! Init PETSc
  call PetscInitialize(PETSC_NULL_CHARACTER,ierr)
  CHKERRQ(ierr)
  print *, 'Init done.'

  ! Finalise PETSc
  call PetscFinalize(ierr)
  CHKERRQ(ierr)
  print *, 'Finalized.'
  end
~/dev/test/petsc $ make
gfortran -c  -fPIC -Wall -Wno-unused-variable -g  -fopenmp  -o solver.o solver.f
Warning: solver.f:2: Illegal preprocessor directive
solver.f:3.12:

PetscErrorCode ierr
1
Error: Unclassifiable statement at (1)
solver.f:8.14:

  CHKERRQ(ierr)
  1
Error: Unclassifiable statement at (1)
solver.f:13.14:

  CHKERRQ(ierr)
  1
Error: Unclassifiable statement at (1)
make: [solver.o] Error 1 (ignored)
gcc -fopenmp -fopenmp   -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing 
-Wno-unknown-pragmas -g3 -fno-inline -O0 -fopenmp  -o myexample myexample.o 
solver.o -Wl,-rpath,/home/username/petsc/linux-amd64/lib 
-L/home/username/petsc/linux-amd64/lib  -lpetsc 
-Wl,-rpath,/home/username/petsc/linux-amd64/lib -lflapack -lfblas -lX11 
-lpthread -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 
-L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -lgfortran -lm -lm -lstdc++ -lstdc++ 
-ldl -lgcc_s -ldl
gcc: solver.o: No such file or directory
make: *** [myexample] Error 1

~/dev/test/petsc $ 


Re: [petsc-dev] openmp

2013-11-06 Thread Jed Brown
Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes:

 On Thu, 7 Nov 2013, at 10:22, Jed Brown wrote:
 Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes:
  Right. I have spent half of an hour now trying to imagine what to do
  to link trying everything like a headless chicken and it did not
  work. 
 
 You always have to send the error message.
 
  I would appreciate if you could come up with something that links, not
  just runs a single program.
 
 The makefiles we suggested link a program of the same name as the source
 file.  Do you have multiple source files?  You only have to edit the one
 line and run make.
 
 program: several.o object.o files.o
  ${CLINKER} -o $@ $^ ${PETSC_LIB}
 
 include ${PETSC_DIR}/conf/variables
 include ${PETSC_DIR}/conf/rules

 ~/dev/test/petsc $ cat makefile
 myexample: myexample.o solver.o
 ${CLINKER} -o $@ $^ ${PETSC_LIB}

 include ${PETSC_DIR}/conf/variables
 include ${PETSC_DIR}/conf/rules
 ~/dev/test/petsc $ cat solver.f
   subroutine solver()
 #include finclude/petscsys.h

Name your source file solver.F so that the Fortran compiler preprocesses
it.  (Or add the option -cpp, but that is more confusing and less
portable, so rename the source file.)




Re: [petsc-dev] openmp

2013-11-06 Thread Svetlana Tkachenko
On Thu, 7 Nov 2013, at 12:34, Jed Brown wrote:
 Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes:
 
  On Thu, 7 Nov 2013, at 10:22, Jed Brown wrote:
  Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes:
   Right. I have spent half of an hour now trying to imagine what to do
   to link trying everything like a headless chicken and it did not
   work. 
  
  You always have to send the error message.
  
   I would appreciate if you could come up with something that links, not
   just runs a single program.
  
  The makefiles we suggested link a program of the same name as the source
  file.  Do you have multiple source files?  You only have to edit the one
  line and run make.
  
  program: several.o object.o files.o
 ${CLINKER} -o $@ $^ ${PETSC_LIB}
  
  include ${PETSC_DIR}/conf/variables
  include ${PETSC_DIR}/conf/rules
 
  ~/dev/test/petsc $ cat makefile
  myexample: myexample.o solver.o
  ${CLINKER} -o $@ $^ ${PETSC_LIB}
 
  include ${PETSC_DIR}/conf/variables
  include ${PETSC_DIR}/conf/rules
  ~/dev/test/petsc $ cat solver.f
subroutine solver()
  #include finclude/petscsys.h
 
 Name your source file solver.F so that the Fortran compiler preprocesses
 it.  (Or add the option -cpp, but that is more confusing and less
 portable, so rename the source file.)

~/dev/test/petsc $ mv solver.f solver.F
~/dev/test/petsc $ make
gfortran -c  -fPIC -Wall -Wno-unused-variable -g  -fopenmp  
-I/home/username/petsc/include -I/home/username/petsc/linux-amd64/include 
-I/home/username/petsc/include/mpiuni-o solver.o solver.F
solver.F:8.46:

  if (ierr .ne. 0) call MPI_Abort(PETSC_COMM_WORLD,ierr,ierr
  1
Error: Missing ')' in statement at or before (1)
solver.F:8.72:

  if (ierr .ne. 0) call MPI_Abort(PETSC_COMM_WORLD,ierr,ierr
1
Warning: Line truncated at (1)
solver.F:13.46:

  if (ierr .ne. 0) call MPI_Abort(PETSC_COMM_WORLD,ierr,ierr
  1
Error: Missing ')' in statement at or before (1)
solver.F:13.72:

  if (ierr .ne. 0) call MPI_Abort(PETSC_COMM_WORLD,ierr,ierr
1
Warning: Line truncated at (1)
make: [solver.o] Error 1 (ignored)
gcc -fopenmp -fopenmp   -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing 
-Wno-unknown-pragmas -g3 -fno-inline -O0 -fopenmp  -o myexample myexample.o 
solver.o -Wl,-rpath,/home/username/petsc/linux-amd64/lib 
-L/home/username/petsc/linux-amd64/lib  -lpetsc 
-Wl,-rpath,/home/username/petsc/linux-amd64/lib -lflapack -lfblas -lX11 
-lpthread -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 
-L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -lgfortran -lm -lm -lstdc++ -lstdc++ 
-ldl -lgcc_s -ldl
gcc: solver.o: No such file or directory
make: *** [myexample] Error 1
~/dev/test/petsc $ 


Re: [petsc-dev] openmp

2013-11-06 Thread Jed Brown
Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes:

 ~/dev/test/petsc $ mv solver.f solver.F
 ~/dev/test/petsc $ make
 gfortran -c  -fPIC -Wall -Wno-unused-variable -g  -fopenmp  
 -I/home/username/petsc/include -I/home/username/petsc/linux-amd64/include 
 -I/home/username/petsc/include/mpiuni-o solver.o solver.F
 solver.F:8.46:

   if (ierr .ne. 0) call MPI_Abort(PETSC_COMM_WORLD,ierr,ierr
   1
 Error: Missing ')' in statement at or before (1)

You indented so far that the expanded macro spilled over the
72-character line length needed to fit on a punch card in the 1950s.  If
you would like to modernize your Fortran dialect beyond the constraints
of punch cards, you could consider naming your file .F90 or adding the
option -ffree-form, perhaps also with -ffree-line-length-none.




Re: [petsc-dev] openmp

2013-11-06 Thread Svetlana Tkachenko


On Thu, 7 Nov 2013, at 12:56, Jed Brown wrote:
 Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes:
 
  ~/dev/test/petsc $ mv solver.f solver.F
  ~/dev/test/petsc $ make
  gfortran -c  -fPIC -Wall -Wno-unused-variable -g  -fopenmp  
  -I/home/username/petsc/include 
  -I/home/username/petsc/linux-amd64/include 
  -I/home/username/petsc/include/mpiuni-o solver.o solver.F
  solver.F:8.46:
 
if (ierr .ne. 0) call MPI_Abort(PETSC_COMM_WORLD,ierr,ierr
1
  Error: Missing ')' in statement at or before (1)
 
 You indented so far that the expanded macro spilled over the
 72-character line length needed to fit on a punch card in the 1950s.  If
 you would like to modernize your Fortran dialect beyond the constraints
 of punch cards, you could consider naming your file .F90 or adding the
 option -ffree-form, perhaps also with -ffree-line-length-none.

Thanks! I need to apologize, as I didn't mention it in the previous message: in 
my solver.F, no, I did not exceed the line length, and I don't even know what it 
is referring to. That line is not mine.

  Svetlana


Re: [petsc-dev] openmp

2013-11-06 Thread Jed Brown
Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes:
 Thanks! I need to apologize as I didn't mention it in the previous
 message: in my solver.F no I did not exceed the line length and I
 don't know what it is referring to even. That line is not mine.

It is the expanded error checking macro.  Fortran does not provide a
concise way to do error checking except to use the C preprocessor, but
the line length requirements apply to the _expanded_ macro.  Fortran is
a perpetual nuisance, but once you learn about all the potholes, barbed
wire, broken glass, and polonium in the food, it's possible to get by.




Re: [petsc-dev] openmp

2013-11-06 Thread Svetlana Tkachenko
On Thu, 7 Nov 2013, at 13:51, Jed Brown wrote:
 Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes:
 
  On Thu, 7 Nov 2013, at 12:56, Jed Brown wrote:
  Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes:
  
   ~/dev/test/petsc $ mv solver.f solver.F
   ~/dev/test/petsc $ make
   gfortran -c  -fPIC -Wall -Wno-unused-variable -g  -fopenmp  
   -I/home/username/petsc/include 
   -I/home/username/petsc/linux-amd64/include 
   -I/home/username/petsc/include/mpiuni-o solver.o solver.F
   solver.F:8.46:
  
 if (ierr .ne. 0) call MPI_Abort(PETSC_COMM_WORLD,ierr,ierr
 1
   Error: Missing ')' in statement at or before (1)
  
  You indented so far that the expanded macro spilled over the
  72-character line length needed to fit on a punch card in the 1950s.  If
  you would like to modernize your Fortran dialect beyond the constraints
  of punch cards, you could consider naming your file .F90 or adding the
  option -ffree-form, perhaps also with -ffree-line-length-none.
 
  Thank you for the advice.
  I've named both files .F90 and now that's what it's got.
 
  ~/dev/test/petsc $ ls
  makefile  myexample.F90  solver.F90
  ~/dev/test/petsc $ make
  gfortran -c  -fPIC -Wall -Wno-unused-variable -g  -fopenmp  
  -I/home/username/petsc/include 
  -I/home/username/petsc/linux-amd64/include 
  -I/home/username/petsc/include/mpiuni-o myexample.o myexample.F90
  gfortran -c  -fPIC -Wall -Wno-unused-variable -g  -fopenmp  
  -I/home/username/petsc/include 
  -I/home/username/petsc/linux-amd64/include 
  -I/home/username/petsc/include/mpiuni-o solver.o solver.F90
  gcc -fopenmp -fopenmp   -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing 
  -Wno-unknown-pragmas -g3 -fno-inline -O0 -fopenmp  -o myexample myexample.o 
  solver.o -Wl,-rpath,/home/username/petsc/linux-amd64/lib 
  -L/home/username/petsc/linux-amd64/lib  -lpetsc 
  -Wl,-rpath,/home/username/petsc/linux-amd64/lib -lflapack -lfblas -lX11 
  -lpthread -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 
  -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -lgfortran -lm -lm -lstdc++ 
  -lstdc++ -ldl -lgcc_s -ldl
  /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64/crt1.o: In 
  function `_start':
  (.text+0x20): undefined reference to `main'
 
 What do you expect?  Your file only contains a subroutine, no program.

What do you mean? (I don't think the program name has to be 'main'.).

~/dev/test/petsc $ cat myexample.F90
   program myexample

   call solver
   end
~/dev/test/petsc $


Re: [petsc-dev] openmp

2013-11-06 Thread Jed Brown
Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes:
 What do you mean? (I don't think the program name has to be 'main'.).

No, it doesn't.  The name is meaningless in Fortran, but you need to use
the keyword program.

 ~/dev/test/petsc $ cat myexample.F90
program myexample

call solver
end
 ~/dev/test/petsc $

Add myexample.o to the makefile so it gets compiled.




Re: [petsc-dev] openmp

2013-11-06 Thread Svetlana Tkachenko


On Thu, 7 Nov 2013, at 13:59, Jed Brown wrote:
 Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes:
  What do you mean? (I don't think the program name has to be 'main'.).
 
 No, it doesn't.  The name is meaningless in Fortran, but you need to use
 the keyword program.
 
  ~/dev/test/petsc $ cat myexample.F90
 program myexample
 
 call solver
 end
  ~/dev/test/petsc $
 
 Add myexample.o to the makefile so it gets compiled.

Already did, please, see:

~/dev/test/petsc $ cat makefile
myexample: myexample.o solver.o
${CLINKER} -o $@ $^ ${PETSC_LIB}

include ${PETSC_DIR}/conf/variables
include ${PETSC_DIR}/conf/rules
~/dev/test/petsc $


Re: [petsc-dev] openmp

2013-11-06 Thread Jed Brown
Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes:

 On Thu, 7 Nov 2013, at 13:59, Jed Brown wrote:
 Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes:
  What do you mean? (I don't think the program name has to be 'main'.).
 
 No, it doesn't.  The name is meaningless in Fortran, but you need to use
 the keyword program.
 
  ~/dev/test/petsc $ cat myexample.F90
 program myexample
 
 call solver
 end
  ~/dev/test/petsc $
 
 Add myexample.o to the makefile so it gets compiled.

 Already did, please, see:

 ~/dev/test/petsc $ cat makefile
 myexample: myexample.o solver.o
 ${CLINKER} -o $@ $^ ${PETSC_LIB}

 include ${PETSC_DIR}/conf/variables
 include ${PETSC_DIR}/conf/rules
 ~/dev/test/petsc $

Run make clean; make.  If you get the same error, check

$ nm myexample.o
 U _gfortran_set_args
 U _gfortran_set_options
 U _GLOBAL_OFFSET_TABLE_
 T main
 r options.0.1881
 U solver_





Re: [petsc-dev] openmp

2013-11-06 Thread Svetlana Tkachenko


On Thu, 7 Nov 2013, at 14:15, Jed Brown wrote:
 Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes:
 
  On Thu, 7 Nov 2013, at 13:59, Jed Brown wrote:
  Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes:
   What do you mean? (I don't think the program name has to be 'main'.).
  
  No, it doesn't.  The name is meaningless in Fortran, but you need to use
  the keyword program.
  
   ~/dev/test/petsc $ cat myexample.F90
  program myexample
  
  call solver
  end
   ~/dev/test/petsc $
  
  Add myexample.o to the makefile so it gets compiled.
 
  Already did, please, see:
 
  ~/dev/test/petsc $ cat makefile
  myexample: myexample.o solver.o
  ${CLINKER} -o $@ $^ ${PETSC_LIB}
 
  include ${PETSC_DIR}/conf/variables
  include ${PETSC_DIR}/conf/rules
  ~/dev/test/petsc $
 
 Run make clean; make.  If you get the same error, check
 
 $ nm myexample.o
  U _gfortran_set_args
  U _gfortran_set_options
  U _GLOBAL_OFFSET_TABLE_
  T main
  r options.0.1881
  U solver_
 

~/dev/test/petsc $ make clean
~/dev/test/petsc $ make
gfortran -c  -fPIC -Wall -Wno-unused-variable -g  -fopenmp  
-I/home/username/petsc/include -I/home/username/petsc/linux-amd64/include 
-I/home/username/petsc/include/mpiuni-o myexample.o myexample.F90
gfortran -c  -fPIC -Wall -Wno-unused-variable -g  -fopenmp  
-I/home/username/petsc/include -I/home/username/petsc/linux-amd64/include 
-I/home/username/petsc/include/mpiuni-o solver.o solver.F90
gcc -fopenmp -fopenmp   -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing 
-Wno-unknown-pragmas -g3 -fno-inline -O0 -fopenmp  -o myexample myexample.o 
solver.o -Wl,-rpath,/home/username/petsc/linux-amd64/lib 
-L/home/username/petsc/linux-amd64/lib  -lpetsc 
-Wl,-rpath,/home/username/petsc/linux-amd64/lib -lflapack -lfblas -lX11 
-lpthread -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 
-L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -lgfortran -lm -lm -lstdc++ -lstdc++ 
-ldl -lgcc_s -ldl
/usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64/crt1.o: In function 
`_start':
(.text+0x20): undefined reference to `main'
collect2: ld returned 1 exit status
make: *** [myexample] Error 1
~/dev/test/petsc $ nm myexample.o
 T MAIN__
 U _GLOBAL_OFFSET_TABLE_
 U _gfortran_set_options
 r options.0.1516
 U solver_
~/dev/test/petsc $



Re: [petsc-dev] openmp

2013-11-06 Thread Svetlana Tkachenko


On Thu, 7 Nov 2013, at 14:22, Svetlana Tkachenko wrote:
 
 
 On Thu, 7 Nov 2013, at 14:15, Jed Brown wrote:
  Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes:
  
   On Thu, 7 Nov 2013, at 13:59, Jed Brown wrote:
   Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes:
What do you mean? (I don't think the program name has to be 'main'.).
   
   No, it doesn't.  The name is meaningless in Fortran, but you need to use
   the keyword program.
   
~/dev/test/petsc $ cat myexample.F90
   program myexample
   
   call solver
   end
~/dev/test/petsc $
   
   Add myexample.o to the makefile so it gets compiled.
  
   Already did, please, see:
  
   ~/dev/test/petsc $ cat makefile
   myexample: myexample.o solver.o
   ${CLINKER} -o $@ $^ ${PETSC_LIB}
  
   include ${PETSC_DIR}/conf/variables
   include ${PETSC_DIR}/conf/rules
   ~/dev/test/petsc $
  
  Run make clean; make.  If you get the same error, check
  
  $ nm myexample.o
   U _gfortran_set_args
   U _gfortran_set_options
   U _GLOBAL_OFFSET_TABLE_
   T main
   r options.0.1881
   U solver_
  
 
 ~/dev/test/petsc $ make clean
 ~/dev/test/petsc $ make
 gfortran -c  -fPIC -Wall -Wno-unused-variable -g  -fopenmp  
 -I/home/username/petsc/include -I/home/username/petsc/linux-amd64/include 
 -I/home/username/petsc/include/mpiuni-o myexample.o myexample.F90
 gfortran -c  -fPIC -Wall -Wno-unused-variable -g  -fopenmp  
 -I/home/username/petsc/include -I/home/username/petsc/linux-amd64/include 
 -I/home/username/petsc/include/mpiuni-o solver.o solver.F90
 gcc -fopenmp -fopenmp   -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing 
 -Wno-unknown-pragmas -g3 -fno-inline -O0 -fopenmp  -o myexample myexample.o 
 solver.o -Wl,-rpath,/home/username/petsc/linux-amd64/lib 
 -L/home/username/petsc/linux-amd64/lib  -lpetsc 
 -Wl,-rpath,/home/username/petsc/linux-amd64/lib -lflapack -lfblas -lX11 
 -lpthread -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 
 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -lgfortran -lm -lm -lstdc++ -lstdc++ 
 -ldl -lgcc_s -ldl
 /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64/crt1.o: In function 
 `_start':
 (.text+0x20): undefined reference to `main'
 collect2: ld returned 1 exit status
 make: *** [myexample] Error 1
 ~/dev/test/petsc $ nm myexample.o
  T MAIN__
  U _GLOBAL_OFFSET_TABLE_
  U _gfortran_set_options
  r options.0.1516
  U solver_
 ~/dev/test/petsc $
 

Using FLINKER instead of CLINKER in the makefile gets it to compile without 
errors. However, the previous behaviour persists (?).

~/dev/test/petsc $ make
gfortran -c  -fPIC -Wall -Wno-unused-variable -g  -fopenmp  
-I/home/username/petsc/include -I/home/username/petsc/linux-amd64/include 
-I/home/username/petsc/include/mpiuni-o myexample.o myexample.F90
gfortran -c  -fPIC -Wall -Wno-unused-variable -g  -fopenmp  
-I/home/username/petsc/include -I/home/username/petsc/linux-amd64/include 
-I/home/username/petsc/include/mpiuni-o solver.o solver.F90
gfortran -fopenmp -fopenmp   -fPIC -Wall -Wno-unused-variable -g  -fopenmp  -o 
myexample myexample.o solver.o 
-Wl,-rpath,/home/username/petsc/linux-amd64/lib 
-L/home/username/petsc/linux-amd64/lib  -lpetsc 
-Wl,-rpath,/home/username/petsc/linux-amd64/lib -lflapack -lfblas -lX11 
-lpthread -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 
-L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -lgfortran -lm -lm -lstdc++ -lstdc++ 
-ldl -lgcc_s -ldl
~/dev/test/petsc $ ./myexample
 Entered petsc.
 Init done.
WARNING! There are options you set that were not used!
WARNING! could be spelling mistake, etc!
Option left: name:-threadcomm_nthreads value: 8
Option left: name:-threadcomm_type value: openmp
 Finalized.
~/dev/test/petsc $


Re: [petsc-dev] openmp

2013-11-06 Thread Svetlana Tkachenko


On Thu, 7 Nov 2013, at 14:24, Svetlana Tkachenko wrote:
 
 
 On Thu, 7 Nov 2013, at 14:22, Svetlana Tkachenko wrote:
  
  
  On Thu, 7 Nov 2013, at 14:15, Jed Brown wrote:
   Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes:
   
On Thu, 7 Nov 2013, at 13:59, Jed Brown wrote:
Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes:
 What do you mean? (I don't think the program name has to be 'main'.).

No, it doesn't.  The name is meaningless in Fortran, but you need to 
use
the keyword program.

 ~/dev/test/petsc $ cat myexample.F90
program myexample

call solver
end
 ~/dev/test/petsc $

Add myexample.o to the makefile so it gets compiled.
   
Already did, please, see:
   
~/dev/test/petsc $ cat makefile
myexample: myexample.o solver.o
${CLINKER} -o $@ $^ ${PETSC_LIB}
   
include ${PETSC_DIR}/conf/variables
include ${PETSC_DIR}/conf/rules
~/dev/test/petsc $
   
   Run make clean; make.  If you get the same error, check
   
   $ nm myexample.o
U _gfortran_set_args
U _gfortran_set_options
U _GLOBAL_OFFSET_TABLE_
    T main
    r options.0.1881
U solver_
   
  
  ~/dev/test/petsc $ make clean
  ~/dev/test/petsc $ make
  gfortran -c  -fPIC -Wall -Wno-unused-variable -g  -fopenmp  
  -I/home/username/petsc/include 
  -I/home/username/petsc/linux-amd64/include 
  -I/home/username/petsc/include/mpiuni-o myexample.o myexample.F90
  gfortran -c  -fPIC -Wall -Wno-unused-variable -g  -fopenmp  
  -I/home/username/petsc/include 
  -I/home/username/petsc/linux-amd64/include 
  -I/home/username/petsc/include/mpiuni-o solver.o solver.F90
  gcc -fopenmp -fopenmp   -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing 
  -Wno-unknown-pragmas -g3 -fno-inline -O0 -fopenmp  -o myexample myexample.o 
  solver.o -Wl,-rpath,/home/username/petsc/linux-amd64/lib 
  -L/home/username/petsc/linux-amd64/lib  -lpetsc 
  -Wl,-rpath,/home/username/petsc/linux-amd64/lib -lflapack -lfblas -lX11 
  -lpthread -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 
  -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -lgfortran -lm -lm -lstdc++ 
  -lstdc++ -ldl -lgcc_s -ldl
  /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64/crt1.o: In 
  function `_start':
  (.text+0x20): undefined reference to `main'
  collect2: ld returned 1 exit status
  make: *** [myexample] Error 1
  ~/dev/test/petsc $ nm myexample.o
   T MAIN__
   U _GLOBAL_OFFSET_TABLE_
   U _gfortran_set_options
   r options.0.1516
   U solver_
  ~/dev/test/petsc $
  
 
 Using FLINKER instead of CLINKER in the makefile gets it to compile without 
 errors. However, the previous behaviour persists (?).
 
 ~/dev/test/petsc $ make
 gfortran -c  -fPIC -Wall -Wno-unused-variable -g  -fopenmp  
 -I/home/username/petsc/include -I/home/username/petsc/linux-amd64/include 
 -I/home/username/petsc/include/mpiuni-o myexample.o myexample.F90
 gfortran -c  -fPIC -Wall -Wno-unused-variable -g  -fopenmp  
 -I/home/username/petsc/include -I/home/username/petsc/linux-amd64/include 
 -I/home/username/petsc/include/mpiuni-o solver.o solver.F90
 gfortran -fopenmp -fopenmp   -fPIC -Wall -Wno-unused-variable -g  -fopenmp  
 -o myexample myexample.o solver.o 
 -Wl,-rpath,/home/username/petsc/linux-amd64/lib 
 -L/home/username/petsc/linux-amd64/lib  -lpetsc 
 -Wl,-rpath,/home/username/petsc/linux-amd64/lib -lflapack -lfblas -lX11 
 -lpthread -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 
 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -lgfortran -lm -lm -lstdc++ -lstdc++ 
 -ldl -lgcc_s -ldl
 ~/dev/test/petsc $ ./myexample
  Entered petsc.
  Init done.
 WARNING! There are options you set that were not used!
 WARNING! could be spelling mistake, etc!
 Option left: name:-threadcomm_nthreads value: 8
 Option left: name:-threadcomm_type value: openmp
  Finalized.
 ~/dev/test/petsc $

Correction:
Not previous behaviour, rather, new behaviour...


Re: [petsc-dev] openmp

2013-11-06 Thread Jed Brown
Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes:
 Using FLINKER instead of CLINKER in the makefile gets it to compile without 
 errors. However, the previous behaviour persists (?).

 ~/dev/test/petsc $ make
 gfortran -c  -fPIC -Wall -Wno-unused-variable -g  -fopenmp  
 -I/home/username/petsc/include -I/home/username/petsc/linux-amd64/include 
 -I/home/username/petsc/include/mpiuni-o myexample.o myexample.F90
 gfortran -c  -fPIC -Wall -Wno-unused-variable -g  -fopenmp  
 -I/home/username/petsc/include -I/home/username/petsc/linux-amd64/include 
 -I/home/username/petsc/include/mpiuni-o solver.o solver.F90
 gfortran -fopenmp -fopenmp   -fPIC -Wall -Wno-unused-variable -g  -fopenmp  
 -o myexample myexample.o solver.o 
 -Wl,-rpath,/home/username/petsc/linux-amd64/lib 
 -L/home/username/petsc/linux-amd64/lib  -lpetsc 
 -Wl,-rpath,/home/username/petsc/linux-amd64/lib -lflapack -lfblas -lX11 
 -lpthread -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 
 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -lgfortran -lm -lm -lstdc++ -lstdc++ 
 -ldl -lgcc_s -ldl
 ~/dev/test/petsc $ ./myexample
  Entered petsc.
  Init done.
 WARNING! There are options you set that were not used!
 WARNING! could be spelling mistake, etc!
 Option left: name:-threadcomm_nthreads value: 8
 Option left: name:-threadcomm_type value: openmp
  Finalized.
 ~/dev/test/petsc $

Did you add these options to .petscrc?  Run with -log_summary and send
the output so we can see what configuration you are using.  At the top
of this thread, it looks like you did not configure --with-threadcomm.
I recommend using --with-threadcomm --with-pthreadclasses --with-openmp.




Re: [petsc-dev] openmp

2013-11-06 Thread Svetlana Tkachenko
On Thu, 7 Nov 2013, at 12:56, Jed Brown wrote:
 Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes:
 
  ~/dev/test/petsc $ mv solver.f solver.F
  ~/dev/test/petsc $ make
  gfortran -c  -fPIC -Wall -Wno-unused-variable -g  -fopenmp  
  -I/home/username/petsc/include 
  -I/home/username/petsc/linux-amd64/include 
  -I/home/username/petsc/include/mpiuni-o solver.o solver.F
  solver.F:8.46:
 
if (ierr .ne. 0) call MPI_Abort(PETSC_COMM_WORLD,ierr,ierr
1
  Error: Missing ')' in statement at or before (1)
 
 You indented so far that the expanded macro spilled over the
 72-character line length needed to fit on a punch card in the 1950s.  If
 you would like to modernize your Fortran dialect beyond the constraints
 of punch cards, you could consider naming your file .F90 or adding the
 option -ffree-form, perhaps also with -ffree-line-length-none.


For the bigger (non-test) project, I would really make use of the --free-whatever 
options you gave. But simply writing

prog: prog.o foo.o
  ${FLINKER} -ffree-form -o $@ $^ ${PETSC_LIB}

and looking at the 'make' output shows that it ignored my lines entirely and 
keeps trying to compile without -ffree-form.


Re: [petsc-dev] openmp

2013-11-06 Thread Jed Brown
Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes:
 For the bigger (non-test) project, I would really make use of
 --free-whatever options you gave. But simply writing prog: prog.o
 foo.o ${FLINKER} -ffree-form -o $@ $^ ${PETSC_LIB}, and looking at
 'make' output, shows that it ignored my lines entirely and keeps
 trying to compile without -ffree-form.

These are *compiler* options, not *linker* options.  Put it in FFLAGS instead.

make FFLAGS=-ffree-form





[petsc-dev] openmp

2013-11-05 Thread Svetlana Tkachenko
I have configured petsc-dev (downloaded it today) with these options, and a 
small example. It appears to fail to compile without MPI with the error message:

./configure --with-cc=gcc --with-fc=gfortran  --download-f-blas-lapack 
--with-openmp --with-mpi=0

~/dev/test/petsc $ echo $LD_LIBRARY_PATH
/home/username/petsc/linux-amd64/lib:/opt/openmpi/lib
~/dev/test/petsc $ cat solver.f
subroutine solver()
#include <finclude/petscsys.h>

  PetscErrorCode ierr
  print *, 'Entered petsc.'

  ! Init PETSc
  call PetscInitialize(PETSC_NULL_CHARACTER,ierr)
  CHKERRQ(ierr)
  print *, 'Init done.'

  ! Finalise PETSc
  call PetscFinalize(ierr)
  CHKERRQ(ierr)
  print *, 'Finalized.'
 end
~/dev/test/petsc $ cat myexample.f
   program myexample

   call solver
   end
~/dev/test/petsc $ cat makefile
include ${PETSC_DIR}/conf/variables

myexample: myexample.o solver.o ; gfortran -o myexample myexample.o solver.o 
-lpetsc -L${PETSC_DIR}/${PETSC_ARCH}/lib -fopenmp
solver.o: ; gfortran -c -cpp -I${PETSC_DIR}/include -O0 solver.f -lpetsc 
-I${PETSC_DIR}/${PETSC_ARCH}/include -L${PETSC_DIR}/${PETSC_ARCH}/lib -lpetsc 
-fopenmp
myexample.o: ; gfortran -c -cpp -I${PETSC_DIR}/include -O0 myexample.f -lpetsc 
-I${PETSC_DIR}/${PETSC_ARCH}/include -L${PETSC_DIR}/${PETSC_ARCH}/lib -lpetsc 
-fopenmp
~/dev/test/petsc $ make
gfortran -c -cpp -I/home/username/petsc/include -O0 myexample.f -lpetsc 
-I/home/username/petsc/linux-amd64/include 
-L/home/username/petsc/linux-amd64/lib -lpetsc -fopenmp
gfortran -c -cpp -I/home/username/petsc/include -O0 solver.f -lpetsc 
-I/home/username/petsc/linux-amd64/include 
-L/home/username/petsc/linux-amd64/lib -lpetsc -fopenmp
In file included from solver.f:3:
/home/username/petsc/include/finclude/petscsys.h:10: error: mpif.h: No such 
file or directory
/home/username/petsc/include/finclude/petscsys.h:163.29:
Included at solver.f:3:

  parameter(MPIU_SCALAR = MPI_DOUBLE_PRECISION)
 1
Error: Parameter 'mpi_double_precision' at (1) has not been declared or is a 
variable, which does not reduce to a constant expression
/home/username/petsc/include/finclude/petscsys.h:171.30:
Included at solver.f:3:

  parameter(MPIU_INTEGER = MPI_INTEGER)
  1
Error: Parameter 'mpi_integer' at (1) has not been declared or is a variable, 
which does not reduce to a constant expression
make: *** [solver.o] Error 1
~/dev/test/petsc $ 


[petsc-dev] OpenMP in PETSc when calling from Fortran?

2013-03-06 Thread Åsmund Ervik
Hi again,

On 01. mars 2013 20:06, Jed Brown wrote:
 
 Matrix and vector operations are probably running in parallel, but probably
 not the operations that are taking time. Always send -log_summary if you
 have a performance question.
 

I don't think they are running in parallel. When I analyze my code in
Intel Vtune Amplifier, the only routines running in parallel are my own
OpenMP ones. Indeed, if I comment out my OpenMP pragmas and recompile my
code, it never uses more than one thread.

-log_summary is shown below; this is using -pc_type lu -ksp_type bcgs.
The fastest PC for my cases is usually BoomerAMG from HYPRE, so i used
LU instead here in order to limit the test to PETSc only. The summary
agrees with Vtune that MatLUFactorNumeric is the most time-consuming
routine; in general it seems that the PC is always the most time-consuming.

Any advice on how to get OpenMP working?

Regards,
Åsmund



-- PETSc Performance
Summary: --

./run on a arch-linux2-c-opt named vsl161 with 1 processor, by asmunder
Wed Mar  6 10:14:55 2013
Using Petsc Development HG revision:
58cc6199509f1642f637843f1ca468283bf5ced9  HG Date: Wed Jan 30 00:39:35
2013 -0600

 Max   Max/MinAvg  Total
Time (sec):   4.446e+02  1.0   4.446e+02
Objects:  2.017e+03  1.0   2.017e+03
Flops:3.919e+11  1.0   3.919e+11  3.919e+11
Flops/sec:8.815e+08  1.0   8.815e+08  8.815e+08
MPI Messages: 0.000e+00  0.0   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00  0.0   0.000e+00  0.000e+00
MPI Reductions:   2.818e+03  1.0

Flop counting convention: 1 flop = 1 real number operation of type
(multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N
-- 2N flops
and VecAXPY() for complex vectors of length
N -- 8N flops

Summary of Stages:   - Time --  - Flops -  --- Messages
---  -- Message Lengths --  -- Reductions --
Avg %Total Avg %Total   counts
%Total Avg %Total   counts   %Total
 0:  Main Stage: 4.4460e+02 100.0%  3.9191e+11 100.0%  0.000e+00
0.0%  0.000e+000.0%  2.817e+03 100.0%


See the 'Profiling' chapter of the users' manual for details on
interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush()
and PetscLogStagePop().
  %T - percent time in this phase %f - percent flops in this
phase
  %M - percent messages in this phase %L - percent message
lengths in this phase
  %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time
over all processors)

EventCount  Time (sec) Flops
 --- Global ---  --- Stage ---   Total
   Max Ratio  Max Ratio   Max  Ratio  Mess   Avg len
Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s


--- Event Stage 0: Main Stage

VecDot   802 1.0 9.2811e-02 1.0 1.96e+08 1.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0  2117
VecDotNorm2  401 1.0 7.1333e-02 1.0 1.96e+08 1.0 0.0e+00 0.0e+00
4.0e+02  0  0  0  0 14   0  0  0  0 14  2755
VecNorm 1203 1.0 7.8265e-02 1.0 2.95e+08 1.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0  3766
VecCopy  802 1.0 1.1754e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0 0
VecSet  1211 1.0 9.9961e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0 0
VecAXPY  401 1.0 4.5847e-02 1.0 9.82e+07 1.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0  2143
VecAXPBYCZ   802 1.0 1.3489e-01 1.0 3.93e+08 1.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0  2913
VecWAXPY 802 1.0 1.2292e-01 1.0 1.96e+08 1.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0  1599
VecAssemblyBegin 802 1.0 2.4509e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0 0
VecAssemblyEnd   802 1.0 6.7234e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0 0
MatMult 

[petsc-dev] OpenMP in PETSc when calling from Fortran?

2013-03-06 Thread Barry Smith

   I don't see any options for turning on the threads here?

  #PETSc Option Table entries:
-ksp_type bcgs
-log_summary
-pc_type lu
#End of PETSc Option Table entries

  From http://www.mcs.anl.gov/petsc/features/threads.html

- The three important run-time options for using threads are:
  - -threadcomm_nthreads <nthreads>: Sets the number of threads
  - -threadcomm_affinities <list_of_affinities>: Sets the core 
affinities of threads
  - -threadcomm_type <nothread,pthread,openmp>: Threading model 
(OpenMP, pthread, nothread)
- Run with -help to see the available options with threads.
- A few tutorial examples are located at 
$PETSC_DIR/src/sys/threadcomm/examples/tutorials

  Also LU is a direct solver that is not threaded so using threads for this 
exact run will not help (much) at all. The threads will only show useful speed 
up for iterative methods.

   Barry

  As time goes by we hope to have more extensive support for threads in more 
routines, but things like factorization and solve are difficult, so outside help 
would be very useful.

On Mar 6, 2013, at 3:39 AM, Åsmund Ervik Asmund.Ervik at sintef.no wrote:

 Hi again,
 
 On 01. mars 2013 20:06, Jed Brown wrote:
 
 Matrix and vector operations are probably running in parallel, but probably
 not the operations that are taking time. Always send -log_summary if you
 have a performance question.
 
 
 I don't think they are running in parallel. When I analyze my code in
 Intel Vtune Amplifier, the only routines running in parallel are my own
 OpenMP ones. Indeed, if I comment out my OpenMP pragmas and recompile my
 code, it never uses more than one thread.
 
 -log_summary is shown below; this is using -pc_type lu -ksp_type bcgs.
 The fastest PC for my cases is usually BoomerAMG from HYPRE, so I used
 LU instead here in order to limit the test to PETSc only. The summary
 agrees with Vtune that MatLUFactorNumeric is the most time-consuming
 routine; in general it seems that the PC is always the most time-consuming.
 
 Any advice on how to get OpenMP working?
 
 Regards,
 Åsmund
 
 
 
 [-log_summary output quoted in full; identical to the previous message]

[petsc-dev] OpenMP in PETSc when calling from Fortran?

2013-03-01 Thread Jed Brown
On Fri, Mar 1, 2013 at 3:26 AM, Åsmund Ervik asmund.ervik at ntnu.no wrote:

 Thanks for clarifying this. I am already using OpenMP pragmas in non-PETSc
 routines in my code, and using petsc-dev. Are you saying that I should also
 somehow use OpenMP pragmas around the calls to KSPSolve etc.?

 Suppose that my program is usually run like this:
 ./run -pc_type gamg -ksp_type bcgs
 with other values left to their defaults, and I want to make it run in
 parallel:
 ./run -pc_type gamg -ksp_type bcgs -threadcomm_type openmp
 -threadcomm_nthreads 8

 When I do this, the PC and KSP still run in serial as far as I can tell,
 and the program does not execute faster. What am I missing here?


Matrix and vector operations are probably running in parallel, but probably
not the operations that are taking time. Always send -log_summary if you
have a performance question.


 In case it is of interest, the matrix from my Poisson equation has in the
 range of 0.4 - 1 million nonzero elements, on average 5 per row.



[petsc-dev] OpenMP in PETSc when calling from Fortran?

2013-03-01 Thread Åsmund Ervik
Hi Barry,

On 28. feb. 2013 17:38, Barry Smith wrote:

  2) You should not need petscthreadcomm.h in Fortran. Simply use OpenMP 
 pragmas in your portion of the code.
 

Thanks for clarifying this. I am already using OpenMP pragmas in 
non-PETSc routines in my code, and using petsc-dev. Are you saying that 
I should also somehow use OpenMP pragmas around the calls to KSPSolve etc.?

Suppose that my program is usually run like this:
./run -pc_type gamg -ksp_type bcgs
with other values left to their defaults, and I want to make it run in 
parallel:
./run -pc_type gamg -ksp_type bcgs -threadcomm_type openmp 
-threadcomm_nthreads 8

When I do this, the PC and KSP still run in serial as far as I can tell, 
and the program does not execute faster. What am I missing here?

In case it is of interest, the matrix from my Poisson equation has in 
the range of 0.4 - 1 million nonzero elements, on average 5 per row.

Regards,
Åsmund


[petsc-dev] OpenMP compiler options

2012-05-29 Thread Jed Brown
The OpenMP flags do not definitively identify that OpenMP is used. In
particular, IBM XL interprets Cray's option "-h omp" as being equivalent to
"-soname omp", then silently ignores the OpenMP pragmas. We can perhaps
fix this instance by moving "-qsmp" up in the list, but we may eventually
need to move it to compilerOptions.py.

  def configureLibrary(self):
    ''' Checks for -fopenmp compiler flag'''
    ''' Needs to check if OpenMP actually exists and works '''
    self.setCompilers.pushLanguage('C')
    #
    for flag in ["-fopenmp", # Gnu
                 "-h omp",   # Cray
                 "-mp",      # Portland Group
                 "-Qopenmp", # Intel windows
                 "-openmp",  # Intel
                 "",         # Empty, if compiler automatically accepts openmp
                 "-xopenmp", # Sun
                 "+Oopenmp", # HP
                 "-qsmp",    # IBM XL C/c++
                 "/openmp"   # Microsoft Visual Studio
                 ]:
      if self.setCompilers.checkCompilerFlag(flag):
        ompflag = flag
        break


[petsc-dev] OpenMP compiler options

2012-05-29 Thread Matthew Knepley
On Tue, May 29, 2012 at 3:52 PM, Jed Brown jedbrown at mcs.anl.gov wrote:

 The OpenMP flags do not definitively identify that OpenMP is used. In
 particular, IBM XL interprets Cray's option -h omp as being equivalent to
 -soname omp, then silently ignores the Open MP pragmas. We can perhaps
 fix this instance by moving -qsmp up in the list, but we may eventually
 need to move it to compilerOptions.py.


Move it up, and add it to the comment. And people think OpenMP is the easy
way?

   Matt


   def configureLibrary(self):
     ''' Checks for -fopenmp compiler flag'''
     ''' Needs to check if OpenMP actually exists and works '''
     self.setCompilers.pushLanguage('C')
     #
     for flag in ["-fopenmp", # Gnu
                  "-h omp",   # Cray
                  "-mp",      # Portland Group
                  "-Qopenmp", # Intel windows
                  "-openmp",  # Intel
                  "",         # Empty, if compiler automatically accepts openmp
                  "-xopenmp", # Sun
                  "+Oopenmp", # HP
                  "-qsmp",    # IBM XL C/c++
                  "/openmp"   # Microsoft Visual Studio
                  ]:
       if self.setCompilers.checkCompilerFlag(flag):
         ompflag = flag
         break




-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener


[petsc-dev] OpenMP compiler options

2012-05-29 Thread Satish Balay
Perhaps the 'empty' flag check should be at the beginning of the list..

[we do have other places in configure where we fix the order in which
flags are checked - due to similar conflicts between compilers]

Satish

On Tue, 29 May 2012, Jed Brown wrote:

 The OpenMP flags do not definitively identify that OpenMP is used. In
 particular, IBM XL interprets Cray's option -h omp as being equivalent to
 -soname omp, then silently ignores the Open MP pragmas. We can perhaps
 fix this instance by moving -qsmp up in the list, but we may eventually
 need to move it to compilerOptions.py.
 
   def configureLibrary(self):
     ''' Checks for -fopenmp compiler flag'''
     ''' Needs to check if OpenMP actually exists and works '''
     self.setCompilers.pushLanguage('C')
     #
     for flag in ["-fopenmp", # Gnu
                  "-h omp",   # Cray
                  "-mp",      # Portland Group
                  "-Qopenmp", # Intel windows
                  "-openmp",  # Intel
                  "",         # Empty, if compiler automatically accepts openmp
                  "-xopenmp", # Sun
                  "+Oopenmp", # HP
                  "-qsmp",    # IBM XL C/c++
                  "/openmp"   # Microsoft Visual Studio
                  ]:
       if self.setCompilers.checkCompilerFlag(flag):
         ompflag = flag
         break
 




[petsc-dev] OpenMP compiler options

2012-05-29 Thread Jed Brown
On Tue, May 29, 2012 at 10:56 AM, Satish Balay balay at mcs.anl.gov wrote:

 Perhaps the 'empty' flag check should be at the beginning of the list..


No, because most compilers ignore the pragmas when no options are given. I
do not know of any portable way, short of running code, to determine
whether a compiler used the OpenMP pragmas or ignored them.
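
For illustration, a rough run-the-code check along those lines (a sketch,
not PETSc's actual configure test): if the pragmas were honoured, the
parallel region below reports more than one thread when several are
requested; if they were silently ignored, the count stays at 1.

#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

int main(void)
{
  int nthreads = 1;
#ifdef _OPENMP
#pragma omp parallel
  {
#pragma omp master
    nthreads = omp_get_num_threads();
  }
  printf("_OPENMP defined, parallel region used %d thread(s)\n", nthreads);
#else
  printf("OpenMP pragmas were ignored (single thread)\n");
#endif
  return 0;
}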



 [we do have other places in configure where we fix the order in which
 flags are checked - due to similar conflicts between compilers]

 Satish

 On Tue, 29 May 2012, Jed Brown wrote:

  The OpenMP flags do not definitively identify that OpenMP is used. In
  particular, IBM XL interprets Cray's option -h omp as being equivalent
 to
  -soname omp, then silently ignores the Open MP pragmas. We can perhaps
  fix this instance by moving -qsmp up in the list, but we may eventually
  need to move it to compilerOptions.py.
 
  def configureLibrary(self):
    ''' Checks for -fopenmp compiler flag'''
    ''' Needs to check if OpenMP actually exists and works '''
    self.setCompilers.pushLanguage('C')
    #
    for flag in ["-fopenmp", # Gnu
                 "-h omp",   # Cray
                 "-mp",      # Portland Group
                 "-Qopenmp", # Intel windows
                 "-openmp",  # Intel
                 "",         # Empty, if compiler automatically accepts openmp
                 "-xopenmp", # Sun
                 "+Oopenmp", # HP
                 "-qsmp",    # IBM XL C/c++
                 "/openmp"   # Microsoft Visual Studio
                 ]:
      if self.setCompilers.checkCompilerFlag(flag):
        ompflag = flag
        break
 




[petsc-dev] OpenMP Support

2012-05-10 Thread Gerard Gorman
Hi Dave

I should just say that we still have not finished a code review with the
petsc-dev team so any faults should be assumed to be ours rather than
theirs!

Michele just put a preprint of our paper covering the preliminary work
on arXiv, which you might found useful:
http://arxiv.org/abs/1205.2005

As we have not yet merged with the trunk, you'd need to pull a branch
from:
https://bitbucket.org/wence/petsc-dev-omp
and configure using --with-openmp

A key issue when running is to set thread/core affinity. Unfortunately
there is no general way of doing this - it depends on your compiler. But
likwid-pin can make your life a little easier:
http://code.google.com/p/likwid/wiki/LikwidPin

Cheers
Gerard


Dave Nystrom emailed the following on 10/05/12 05:55:
 Hi Gerard,

 Thanks for the info.  Is there any documentation on how to use the petsc
 OpenMP support?  I would be interested in trying it out.

 Thanks,

 Dave

 Gerard Gorman writes:
   Hi Dave
   
   OpenMP support exists for vec and mat (only AIJ so far). There is a big
   difference in performance depending on available memory bandwidth and
   the compiler OpenMP implementation. In application codes (such as
   Fluidity which is our main target code for this work) there are other
   significant costs such as matrix assembly. So in general you have to
   consider how easy it will be to thread the other computationally
   expensive sections of your code, otherwise the overall speed-up of your
   application will be modest.
   
   Cheers
   Gerard

   Dave Nystrom emailed the following on 09/05/12 04:29:
Is the pthreads support further along than the OpenMP support?  I have 
 not
tried the pthreads support yet.  Does either the pthreads support or the
OpenMP support implement the matvec or do they just do vector type
operations?
   
Jed Brown writes:
  On Tue, May 8, 2012 at 9:23 PM, Dave Nystrom dnystrom1 at 
 comcast.net wrote:
  
   I see that petsc-dev now has some OpenMP support.  Would a serial, 
 non-mpi
   code that uses petsc-dev be able to gain much performance 
 improvement from
   it
   now for the case of doing sparse linear solve with cg and jacobi
   preconditioning?
  
  
  The kernels are being transitioned to use the threadcomm, which 
 enables
  OpenMP and other threading models.
  
  We anticipate that pthreads will provide the best performance because
  operations can be less synchronous than with OpenMP (for which a 
 parallel
   region implies barrier semantics). But if other parts of an 
 application
  are using OpenMP, it would be preferable for PETSc to also use OpenMP 
 so
  that it can share the same thread pool. The same applies to TBB.
   




[petsc-dev] OpenMP Support

2012-05-09 Thread Gerard Gorman
Hi Dave

OpenMP support exists for vec and mat (only AIJ so far). There is a big
difference in performance depending on available memory bandwidth and
the compiler OpenMP implementation. In application codes (such as
Fluidity which is our main target code for this work) there are other
significant costs such as matrix assembly. So in general you have to
consider how easy it will be to thread the other computationally
expensive sections of your code, otherwise the overall speed-up of your
application will be modest.

Cheers
Gerard
 
Dave Nystrom emailed the following on 09/05/12 04:29:
 Is the pthreads support further along than the OpenMP support?  I have not
 tried the pthreads support yet.  Does either the pthreads support or the
 OpenMP support implement the matvec or do they just do vector type
 operations?

 Jed Brown writes:
   On Tue, May 8, 2012 at 9:23 PM, Dave Nystrom dnystrom1 at comcast.net 
 wrote:
   
I see that petsc-dev now has some OpenMP support.  Would a serial, 
 non-mpi
code that uses petsc-dev be able to gain much performance improvement 
 from
it
now for the case of doing sparse linear solve with cg and jacobi
preconditioning?
   
   
   The kernels are being transitioned to use the threadcomm, which enables
   OpenMP and other threading models.
   
   We anticipate that pthreads will provide the best performance because
   operations can be less synchronous than with OpenMP (for which a parallel
region implies barrier semantics). But if other parts of an application
   are using OpenMP, it would be preferable for PETSc to also use OpenMP so
   that it can share the same thread pool. The same applies to TBB.




[petsc-dev] OpenMP Support

2012-05-09 Thread Dave Nystrom
Hi Gerard,

Thanks for the info.  Is there any documentation on how to use the petsc
OpenMP support?  I would be interested in trying it out.

Thanks,

Dave

Gerard Gorman writes:
  Hi Dave
  
  OpenMP support exists for vec and mat (only AIJ so far). There is a big
  difference in performance depending on available memory bandwidth and
  the compiler OpenMP implementation. In application codes (such as
  Fluidity which is our main target code for this work) there are other
  significant costs such as matrix assembly. So in general you have to
  consider how easy it will be to thread the other computationally
  expensive sections of your code, otherwise the overall speed-up of your
  application will be modest.
  
  Cheers
  Gerard
   
  Dave Nystrom emailed the following on 09/05/12 04:29:
   Is the pthreads support further along than the OpenMP support?  I have not
   tried the pthreads support yet.  Does either the pthreads support or the
   OpenMP support implement the matvec or do they just do vector type
   operations?
  
   Jed Brown writes:
 On Tue, May 8, 2012 at 9:23 PM, Dave Nystrom dnystrom1 at comcast.net 
   wrote:
 
  I see that petsc-dev now has some OpenMP support.  Would a serial, 
   non-mpi
  code that uses petsc-dev be able to gain much performance improvement 
   from
  it
  now for the case of doing sparse linear solve with cg and jacobi
  preconditioning?
 
 
 The kernels are being transitioned to use the threadcomm, which enables
 OpenMP and other threading models.
 
 We anticipate that pthreads will provide the best performance because
 operations can be less synchronous than with OpenMP (for which a 
   parallel
  region implies barrier semantics). But if other parts of an application
 are using OpenMP, it would be preferable for PETSc to also use OpenMP so
 that it can share the same thread pool. The same applies to TBB.
  



[petsc-dev] OpenMP Support

2012-05-08 Thread Dave Nystrom
I see that petsc-dev now has some OpenMP support.  Would a serial, non-mpi
code that uses petsc-dev be able to gain much performance improvement from it
now for the case of doing sparse linear solve with cg and jacobi
preconditioning?

Thanks,

Dave



[petsc-dev] OpenMP Support

2012-05-08 Thread Jed Brown
On Tue, May 8, 2012 at 9:23 PM, Dave Nystrom dnystrom1 at comcast.net wrote:

 I see that petsc-dev now has some OpenMP support.  Would a serial, non-mpi
 code that uses petsc-dev be able to gain much performance improvement from
 it
 now for the case of doing sparse linear solve with cg and jacobi
 preconditioning?


The kernels are being transitioned to use the threadcomm, which enables
OpenMP and other threading models.

We anticipate that pthreads will provide the best performance because
operations can be less synchronous than with OpenMP (for which a parallel
 region implies barrier semantics). But if other parts of an application
are using OpenMP, it would be preferable for PETSc to also use OpenMP so
that it can share the same thread pool. The same applies to TBB.
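
A small sketch of the barrier point above (made-up kernel and names, not
PETSc code): the two loops are independent, yet with OpenMP no thread can
start the second loop until every thread has left the first parallel region,
whereas a pthread worker pool could let each thread move on as soon as its
own share of the first loop is done.

static void two_kernels(int n, double a, double b,
                        const double *x, const double *w,
                        double *y, double *z)
{
  int i;
#pragma omp parallel for
  for (i = 0; i < n; i++) y[i] = a*x[i];
  /* implicit barrier: every thread synchronises here */
#pragma omp parallel for
  for (i = 0; i < n; i++) z[i] = b*w[i];
}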


[petsc-dev] OpenMP Support

2012-05-08 Thread Dave Nystrom
Is the pthreads support further along than the OpenMP support?  I have not
tried the pthreads support yet.  Does either the pthreads support or the
OpenMP support implement the matvec or do they just do vector type
operations?

Jed Brown writes:
  On Tue, May 8, 2012 at 9:23 PM, Dave Nystrom dnystrom1 at comcast.net 
  wrote:
  
   I see that petsc-dev now has some OpenMP support.  Would a serial, non-mpi
   code that uses petsc-dev be able to gain much performance improvement from
   it
   now for the case of doing sparse linear solve with cg and jacobi
   preconditioning?
  
  
  The kernels are being transitioned to use the threadcomm, which enables
  OpenMP and other threading models.
  
  We anticipate that pthreads will provide the best performance because
  operations can be less synchronous than with OpenMP (for which a parallel
   region implies barrier semantics). But if other parts of an application
  are using OpenMP, it would be preferable for PETSc to also use OpenMP so
  that it can share the same thread pool. The same applies to TBB.



[petsc-dev] OpenMP Support

2012-05-08 Thread Jed Brown
On Tue, May 8, 2012 at 10:29 PM, Dave Nystrom Dave.Nystrom at tachyonlogic.com
 wrote:

 Is the pthreads support further along than the OpenMP support?  I have not
 tried the pthreads support yet.  Does either the pthreads support or the
 OpenMP support implement the matvec or do they just do vector type
 operations?


The pthreads stuff has been available for a while, including some matrix
kernels, but it's being transformed now, so it will get better.


[petsc-dev] OpenMP/Vec

2012-02-28 Thread Lawrence Mitchell
Dear all,

I've been doing some of the programming associated with the work 
Gerard's describing, so here's an attempt at a response with a 
summary of where we'd go next.

On 28/02/12 04:39, Jed Brown wrote:
 On Mon, Feb 27, 2012 at 16:31, Gerard Gorman g.gorman at imperial.ac.uk wrote:

 I had a quick go at trying to get some sensible benchmarks for this but
 I was getting too much system noise. I am particularly interested in
 seeing if the overhead goes to zero if num_threads(1) is used.

 What timing method did you use? I did not see overhead going to zero
 when num_threads goes to 1 when using GCC compilers, but Intel seems to
 do fairly well.

It's possible that if one asks the gcc folk nicely enough that the 
overhead might go to zero.  Alternately, if it turns out that overheads 
are unavoidable, and we've gone down the macro-ification route anyway 
(see below) we could explicitly make a runtime decision based on 
numthreads in PETSc.  So that:

PetscOMPParallelFor(..., numthreads)
for ( ... )
 ;

would turn into
if ( numthreads == 1 )
 for ( ... )
 ;
else
#pragma omp parallel for
 for ( ... )
 ;

The branch should hopefully be cheaper than the thread sync overhead.
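
A concrete sketch of that branch (hypothetical kernel and names, not code
from the branch): when only one thread is requested we skip the OpenMP
region, and with it the scheduling/synchronisation cost.

static void scale_array(double *x, int n, double a, int numthreads)
{
  int i;
  if (numthreads == 1) {          /* cheap branch, no parallel region */
    for (i = 0; i < n; i++) x[i] *= a;
  } else {
#pragma omp parallel for num_threads(numthreads)
    for (i = 0; i < n; i++) x[i] *= a;
  }
}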

 I'm surprised by this. I'm not aware of any compiler that doesn't have
 OpenMP support - and if you do not actually enable OpenMP, compilers
 generally just ignore the pragma. Do you know of any compiler that does
 not have OpenMP support which will complain?


 Sean points out that omp.h might not be available, but that misses the
 point. As far as I know, recent mainstream compilers have enough sense
 to at least ignore these directives, but I'm sure there are still cases
 where it would be an issue. More importantly, #pragma was a misfeature
 that should never be used now that _Pragma() exists. The latter is
 better not just because it can be turned off, but because it can be
 manipulated using macros and can be explicitly compiled out.

 This may not be flexible enough. You frequently want to have a parallel
 region, and then have multiple omp for's within that one region.


 PetscPragmaOMPObject(obj, parallel)
 {
 PetscPragmaOMP(whatever you normally write for this loop)
 for () { }
 ...
 and so on
 }

So this is easy to do as you say, the exact best way to do it will 
probably become clear when writing the code.

As far as attaching the threading information to the object goes, it 
would be nice to do so staying within the sequential implementation 
(rather than copying all the code as is done for pthreads right now). 
I'd envisage something like this:

PetscPragmaOMPObject(vec, parallel)
{
PetscPragmaOMP(for...)
for ( i = PetscVecLowerBound(vec); i < PetscVecUpperBound(vec); i++ )
...;
}

Where one has:

#if defined(PETSC_HAVE_OPENMP)
#define PetscVecLowerBound(vec) compute_lower_bound
#define PetscVecUpperBound(vec) compute_upper_bound
#else
#define PetscVecLowerBound(vec) 0
#define PetscVecUpperBound(vec) vec->map->n
#endif

Or is this too ugly for words?

Computing the lower and upper bounds could either be done for every 
loop, or, if it's only a property of the vector length, we could stash 
it in some extra slots at creation time.  For a first cut, computation 
of the upper and lower bounds will do exactly what a static schedule 
does (i.e. chunking by size/nthread).
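
As an illustration, compute_lower_bound/compute_upper_bound could expand to
something like the following (a sketch assuming the caller is inside the
enclosing parallel region; the helper names are made up):

#include <omp.h>

static int vec_chunk_lower(int n)
{
  int nt = omp_get_num_threads(), t = omp_get_thread_num();
  int c = n/nt, r = n%nt;
  return t*c + (t < r ? t : r);     /* first r threads get one extra entry */
}

static int vec_chunk_upper(int n)
{
  int nt = omp_get_num_threads(), t = omp_get_thread_num();
  int c = n/nt, r = n%nt;
  return vec_chunk_lower(n) + c + (t < r ? 1 : 0);
}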



 I think what you describe is close to Fig 3 of this paper written by
 your neighbours:
 http://greg.bronevetsky.com/papers/2008IWOMP.pdf
 However, before making the implementation more complex, it would be good
 to benchmark the current approach and use a tool like likwid to measure
 the NUMA traffic so we can get a good handle on the costs.


 Sure.

 Well this is where the implementation details get richer and there are
 many options - they also become less portable. For example, what does
 all this mean for the sparc64 processors which are UMA.


 Delay to runtime, use an ignorant partition for UMA. (Blue Gene/Q is
 also essentially uniform.) But note that even with uniform memory, cache
 still makes it somewhat hierarchical.

 Not to mention
 Intel MIC which also supports OpenMP. I guess I am cautious about
 getting too bogged down with very invasive optimisations until we have
 benchmarked the basic approach which in a wide range of use cases will
 achieve good thread/page locality as illustrated previously.


 I guess I'm just interested in exposing enough semantic information to
 be able to schedule a few different ways using run-time (or, if
 absolutely necessary, configure-time) options. I don't want to have to
 revisit individual loops.

So to take a single, representative, kernel we move from something like:

PetscErrorCode VecConjugate_Seq(Vec xin)
{
   PetscScalar    *x;
   PetscInt       n = xin->map->n;
   PetscInt   i;
   PetscErrorCode ierr;

   

[petsc-dev] OpenMP/Vec

2012-02-27 Thread Jed Brown
On Sun, Feb 26, 2012 at 22:54, recrusader recrusader at gmail.com wrote:

 For multithread support in PETSc, my question is whether KSP and/or PC
 work when Vec and Mat use multithread mode.


Yes


[petsc-dev] OpenMP/Vec

2012-02-27 Thread Gerard Gorman
Jed Brown emailed the following on 27/02/12 00:39:
 On Sun, Feb 26, 2012 at 04:07, Gerard Gorman g.gorman at imperial.ac.uk wrote:

  Did you post a repository yet? I'd like to have a look at the code.

 It's on Barry's favourite collaborative software development site of
 course ;-)

 https://bitbucket.org/wence/petsc-dev-omp/overview


 I looked through the code and I'm concerned that all the OpenMP code
 is inlined into vec/impls/seq and mat/impls/aij/seq with, as far as I
 can tell, no way to use OpenMP for some objects in a simulation, but
 not others. I think that all the pragmas should have
 num_threads(x->nthreads) clauses. We can compute the correct number of
 threads based on sizes when memory is allocated (or specified through
 command line options, inherited from related objects, etc).

The num_threads(x->nthreads) is worth investigating. However, in the
benchmarks I have done so far it would seem that the only two sensible
values for nthreads would be 1 or the total number of threads. When you
set num_threads(1) it seems that OpenMP (for some implementations at
least) is sensible enough not to do anything silly that would introduce
scheduling/synchronisation overheads. Once you use more than one thread
then you incur the scheduling/synchronisation overheads. As you increase
the number of threads you may for small arrays see parallel efficiency
decreasing, but I have not seen the actual time increasing. If this is a
general result then it might not be a big deal that OpenMP is inlined
into vec/impls/seq and mat/impls/aij/seq so long as the num_threads was
used as suggested.

For determining the cut-off array size for using all threads or just one
- an interesting option would be to determine this at run time, i.e.
learn at run time what the appropriate cut-off is.
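
A minimal sketch of such a run-time cut-off (the threshold and the names are
placeholders for illustration, not measured values):

#include <omp.h>

#define VEC_OMP_MIN_LEN 10000        /* assumed tuning parameter */

static int choose_nthreads(int n)
{
  /* short vectors are not worth the scheduling/synchronisation overhead */
  return (n < VEC_OMP_MIN_LEN) ? 1 : omp_get_max_threads();
}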



 I don't think we can get away from OpenMP schedule overhead (several
 hundred cycles) even for those objects that we choose not to use
 threads for, but (at least with my gcc tests), that overhead is only
 about a third of the cost of actually starting a parallel region.

I had a quick go at trying to get some sensible benchmarks for this but
I was getting too much system noise. I am particularly interested in
seeing if the overhead goes to zero if num_threads(1) is used. The next
step is to take a look at the EPCC OpenMP microbenchmarks to see if they
have tied down these issues.
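
The kind of micro-benchmark meant here could look roughly like this (a
sketch with an arbitrary repetition count, timing an empty num_threads(1)
region to see whether its overhead really vanishes):

#include <stdio.h>
#include <omp.h>

int main(void)
{
  const int reps = 100000;
  volatile int sink = 0;
  double t0, t1;
  int r;

  t0 = omp_get_wtime();
  for (r = 0; r < reps; r++) {
#pragma omp parallel num_threads(1)
    { sink = sink + 1; }
  }
  t1 = omp_get_wtime();
  printf("empty num_threads(1) region: %g us per entry\n", 1e6*(t1-t0)/reps);
  return 0;
}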



 It's really not acceptable to insert unguarded

 #pragma omp ...

 into the code because this will generate tons of warnings or errors
 with compilers that don't know about OpenMP. It would be better to
 test for _Pragma and use


I'm surprised by this. I'm not aware of any compiler that doesn't have
OpenMP support - and if you do not actually enable OpenMP, compilers
generally just ignore the pragma. Do you know of any compiler that does
not have OpenMP support which will complain?



 #define PetscPragmatize(x) _Pragma(#x)
 #if defined(PETSC_USE_OPENMP)
 #  define PetscPragmaOMP(x) PetscPragmatize(omp x)
 #else
 #  define PetscPragmaOMP(x)
 #endif

 then use

 PetscPragmaOMP(parallel for ...)

 We should probably use a variant for object-based threading

 #define PetscPragmaOMPObject(obj,x) PetscPragmaOMP(x
 num_threads((obj)->nthreads))


This may not be flexible enough. You frequently want to have a parallel
region, and then have multiple omp for's within that one region.



 In the case of multiple objects, I think you usually want the object
 being modified to control the number of threads.

I take this point.



 In many cases, I would prefer more control over the partition of the
 loop. For example, in many cases, I'd be willing to tolerate a slight
 computational imbalance between threads in exchange for working
 exclusively within my page. Note that the arithmetic to compute such
 things is orders of magnitude less expensive than the
 schedule/distribution to threads. I don't know how to do that except to

 PragmaOMP(parallel) {
   int nthreads = omp_get_num_threads();
   int tnum = omp_get_thread_num();
   int start,end;
   // compute start and end
    for (int i=start; i<end; i++) {
 // the work
   }
 }

 We could perhaps capture some of this common logic in a macro:

 #define VecOMPParallelBegin(X,args) do { \
   PragmaOMPObject(X,parallel args) { \
   PetscInt _start, _end; \
   VecOMPGetThreadLocalPart(X,_start,_end); \
   { do {} while(0)

 #define VecOMPParallelEnd() }}} while(0)

 VecOMPParallelBegin(X, shared/private ...);
 {
   PetscInt i;
    for (i=_start; i<_end; i++) {
 // the work
   }
 }
 VecOMPParallelEnd();

 That should reasonably give us complete run-time control of the number
 of parallel threads per object and their distribution, within the
 constraints of contiguous thread partition.

I think what you describe is close to Fig 3 of this paper written by
your neighbours:
http://greg.bronevetsky.com/papers/2008IWOMP.pdf
However, before making the implementation more complex, it 

[petsc-dev] OpenMP/Vec

2012-02-27 Thread Sean Farley
On Mon, Feb 27, 2012 at 4:31 PM, Gerard Gorman g.gorman at imperial.ac.uk wrote:

 I'm surprised by this. I'm not aware of any compiler that doesn't have
 OpenMP support - and if you do not actually enable OpenMP, compilers
 generally just ignore the pragma. Do you know of any compiler that does
 not have OpenMP support which will complain?


clang (which is on any mac with xcode >= 4.0) will not compile any OpenMP
(but gcc will compile just fine):


fatal error: 'omp.h' file not found
#include <omp.h>
         ^
1 error generated.


 It's not in any llvm stuff as of last month:

http://www.phoronix.com/scan.php?page=news_itempx=MTA0Mzc

Putting in this layer of abstraction is really needed (especially if there are
any calls to omp_ functions).


[petsc-dev] OpenMP/Vec

2012-02-27 Thread Jed Brown
On Mon, Feb 27, 2012 at 16:31, Gerard Gorman g.gorman at imperial.ac.uk wrote:

 I had a quick go at trying to get some sensible benchmarks for this but
 I was getting too much system noise. I am particularly interested in
 seeing if the overhead goes to zero if num_threads(1) is used.


What timing method did you use? I did not see overhead going to zero when
num_threads goes to 1 when using GCC compilers, but Intel seems to do
fairly well.



 I'm surprised by this. I'm not aware of any compiler that doesn't have
 OpenMP support - and if you do not actually enable OpenMP, compilers
 generally just ignore the pragma. Do you know of any compiler that does
 not have OpenMP support which will complain?


Sean points out that omp.h might not be available, but that misses the
point. As far as I know, recent mainstream compilers have enough sense to
at least ignore these directives, but I'm sure there are still cases where
it would be an issue. More importantly, #pragma was a misfeature that
should never be used now that _Pragma() exists. The latter is better not
just because it can be turned off, but because it can be manipulated using
macros and can be explicitly compiled out.


 This may not be flexible enough. You frequently want to have a parallel
 region, and then have multiple omp for's within that one region.


PetscPragmaOMPObject(obj, parallel)
{
PetscPragmaOMP(whatever you normally write for this loop)
for () { }
...
and so on
}


 I think what you describe is close to Fig 3 of this paper written by
 your neighbours:
 http://greg.bronevetsky.com/papers/2008IWOMP.pdf
 However, before making the implementation more complex, it would be good
 to benchmark the current approach and use a tool like likwid to measure
 the NUMA traffic so we can get a good handle on the costs.


Sure.


 Well this is where the implementation details get richer and there are
 many options - they also become less portable. For example, what does
 all this mean for the sparc64 processors which are UMA.


Delay to runtime, use an ignorant partition for UMA. (Blue Gene/Q is also
essentially uniform.) But note that even with uniform memory, cache still
makes it somewhat hierarchical.


 Not to mention
 Intel MIC which also supports OpenMP. I guess I am cautious about
 getting too bogged down with very invasive optimisations until we have
 benchmarked the basic approach which in a wide range of use cases will
 achieve good thread/page locality as illustrated previously.


I guess I'm just interested in exposing enough semantic information to be
able to schedule a few different ways using run-time (or, if absolutely
necessary, configure-time) options. I don't want to have to revisit
individual loops.


[petsc-dev] OpenMP/Vec

2012-02-26 Thread Jed Brown
On Sun, Feb 26, 2012 at 04:07, Gerard Gorman g.gorman at imperial.ac.uk wrote:

  Did you post a repository yet? I'd like to have a look at the code.

 It's on Barry's favourite collaborative software development site of
 course ;-)

 https://bitbucket.org/wence/petsc-dev-omp/overview


I looked through the code and I'm concerned that all the OpenMP code is
inlined into vec/impls/seq and mat/impls/aij/seq with, as far as I can
tell, no way to use OpenMP for some objects in a simulation, but not
others. I think that all the pragmas should have num_threads(x->nthreads)
clauses. We can compute the correct number of threads based on sizes when
memory is allocated (or specified through command line options, inherited
from related objects, etc).

I don't think we can get away from OpenMP schedule overhead (several
hundred cycles) even for those objects that we choose not to use threads
for, but (at least with my gcc tests), that overhead is only about a third
of the cost of actually starting a parallel region.

It's really not acceptable to insert unguarded

#pragma omp ...

into the code because this will generate tons of warnings or errors with
compilers that don't know about OpenMP. It would be better to test for
_Pragma and use

#define PetscPragmatize(x) _Pragma(#x)
#if defined(PETSC_USE_OPENMP)
#  define PetscPragmaOMP(x) PetscPragmatize(omp x)
#else
#  define PetscPragmaOMP(x)
#endif

then use

PetscPragmaOMP(parallel for ...)

We should probably use a variant for object-based threading

#define PetscPragmaOMPObject(obj,x) PetscPragmaOMP(x
num_threads((obj)->nthreads))

In the case of multiple objects, I think you usually want the object being
modified to control the number of threads.

In many cases, I would prefer more control over the partition of the loop.
For example, in many cases, I'd be willing to tolerate a slight
computational imbalance between threads in exchange for working exclusively
within my page. Note that the arithmetic to compute such things is orders
of magnitude less expensive than the schedule/distribution to threads. I
don't know how to do that except to

PragmaOMP(parallel) {
  int nthreads = omp_get_num_threads();
  int tnum = omp_get_thread_num();
  int start,end;
  // compute start and end
  for (int i=start; i<end; i++) {
// the work
  }
}

We could perhaps capture some of this common logic in a macro:

#define VecOMPParallelBegin(X,args) do { \
  PragmaOMPObject(X,parallel args) { \
  PetscInt _start, _end; \
  VecOMPGetThreadLocalPart(X,_start,_end); \
  { do {} while(0)

#define VecOMPParallelEnd() }}} while(0)

VecOMPParallelBegin(X, shared/private ...);
{
  PetscInt i;
  for (i=_start; i<_end; i++) {
// the work
  }
}
VecOMPParallelEnd();

That should reasonably give us complete run-time control of the number of
parallel threads per object and their distribution, within the constraints
of contiguous thread partition. That also leaves open the possibility of
using libnuma to query and migrate pages. (For example, a short vector that
needs to be accessed from multiple NUMA nodes might intentionally be
faulted with pages spread apart even though other vectors of similar size
might be accessed from within one NUMA nodes and thus not use threads at
all. (One 4 KiB page is only 512 doubles, but if the memory is local to a
single NUMA node, we wouldn't use threads until the vector length was 4 to
8 times larger.)


[petsc-dev] OpenMP/Vec

2012-02-26 Thread recrusader
Dear Jed,

For multithread support in PETSc, my question is whether KSP and/or PC work
when Vec and Mat use multithread mode.

Thanks,
Yujie

On Sun, Feb 26, 2012 at 6:39 PM, Jed Brown jedbrown at mcs.anl.gov wrote:

  On Sun, Feb 26, 2012 at 04:07, Gerard Gorman g.gorman at imperial.ac.uk wrote:

  Did you post a repository yet? I'd like to have a look at the code.

 It's on Barry's favourite collaborative software development site of
 course ;-)

 https://bitbucket.org/wence/petsc-dev-omp/overview


 I looked through the code and I'm concerned that all the OpenMP code is
 inlined into vec/impls/seq and mat/impls/aij/seq with, as far as I can
 tell, no way to use OpenMP for some objects in a simulation, but not
 others. I think that all the pragmas should have num_threads(x->nthreads)
 clauses. We can compute the correct number of threads based on sizes when
 memory is allocated (or specified through command line options, inherited
 from related objects, etc).

 I don't think we can get away from OpenMP schedule overhead (several
 hundred cycles) even for those objects that we choose not to use threads
 for, but (at least with my gcc tests), that overhead is only about a third
 of the cost of actually starting a parallel region.

 It's really not acceptable to insert unguarded

 #pragma omp ...

 into the code because this will generate tons of warnings or errors with
 compilers that don't know about OpenMP. It would be better to test for
 _Pragma and use

 #define PetscPragmatize(x) _Pragma(#x)
 #if defined(PETSC_USE_OPENMP)
 #  define PetscPragmaOMP(x) PetscPragmatize(omp x)
 #else
 #  define PetscPragmaOMP(x)
 #endif

 then use

 PetscPragmaOMP(parallel for ...)

 We should probably use a variant for object-based threading

 #define PetscPragmaOMPObject(obj,x) PetscPragmaOMP(x
  num_threads((obj)->nthreads))

 In the case of multiple objects, I think you usually want the object being
 modified to control the number of threads.

 In many cases, I would prefer more control over the partition of the loop.
 For example, in many cases, I'd be willing to tolerate a slight
 computational imbalance between threads in exchange for working exclusively
 within my page. Note that the arithmetic to compute such things is orders
 of magnitude less expensive than the schedule/distribution to threads. I
 don't know how to do that except to

 PragmaOMP(parallel) {
   int nthreads = omp_get_num_threads();
   int tnum = omp_get_thread_num();
   int start,end;
   // compute start and end
    for (int i=start; i<end; i++) {
 // the work
   }
 }

 We could perhaps capture some of this common logic in a macro:

 #define VecOMPParallelBegin(X,args) do { \
   PragmaOMPObject(X,parallel args) { \
   PetscInt _start, _end; \
   VecOMPGetThreadLocalPart(X,_start,_end); \
   { do {} while(0)

 #define VecOMPParallelEnd() }}} while(0)

 VecOMPParallelBegin(X, shared/private ...);
 {
   PetscInt i;
    for (i=_start; i<_end; i++) {
 // the work
   }
 }
 VecOMPParallelEnd();

 That should reasonably give us complete run-time control of the number of
 parallel threads per object and their distribution, within the constraints
 of contiguous thread partition. That also leaves open the possibility of
 using libnuma to query and migrate pages. (For example, a short vector that
 needs to be accessed from multiple NUMA nodes might intentionally be
 faulted with pages spread apart even though other vectors of similar size
 might be accessed from within one NUMA nodes and thus not use threads at
 all. (One 4 KiB page is only 512 doubles, but if the memory is local to a
 single NUMA node, we wouldn't use threads until the vector length was 4 to
 8 times larger.)



[petsc-dev] OpenMP/Vec

2012-02-16 Thread Gerard Gorman
Hi

I have been running benchmarks on the OpenMP branch of petsc-dev on an
Intel Westmere (Intel(R) Xeon(R) CPU X5670 @ 2.93GHz).

You can see all graphs + test code + code to generate results in the tar
ball linked below and I am just going to give a quick summary here.
http://amcg.ese.ic.ac.uk/~ggorman/omp_vec_benchmarks.tar.gz

There are 3 sets of results:

gcc/ : GCC 4.6
intel/ : Intel 12.0 with MKL
intel-pinning/ : as above put applying hard affinity.

Files matching  mpi_*.pdf show the MPI speedup and parallel efficiency
for a range of vector sizes. Similarly for omp_*.pdf with respect to
OpenMP. The remaining files directly compare scaling of MPI Vs OpenMP
for the various tests for the largest vector size.

I think the results are very encouraging and there are many interesting
little details in there. I am just going to summarise a few here that I
think are particularly important.

1. In most cases the threaded code performs as well as, and in many
cases better than the MPI code.

2. For GCC I did not use a threaded blas. For Intel I used
-lmkl_intel_thread. However, it appears dnrm2 is not threaded. It seems
to be a common feature among other threaded blas libraries that Level 1
is not completely threaded (e.g. cray). Unfortunately most of this is
experience/anecdotal information. I do not know of any proper survey. We
have the option here of either rolling our own (a sketch follows after this
list) or ignoring the issue until profiling shows it is a problem...and
eventually someone else will release a fully threaded blas.

3. Comparing intel/ and intel-pinning/ is particularly interesting.
First touch has been applied to all memory in VecCreate so that memory
should be paged correctly for NUMA. But first touch does not gain you
much if threads migrate, so for the intel-pinning/ results I set the env
KMP_AFFINITY=scatter to get hard affinity. You can see clearly from the
results that this improves parallel efficiency by a few percentage
points in many cases. It also really smooths out efficiency dips as you
run on different number of threads.
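
On the rolling-our-own point in 2. above, a minimal sketch of a threaded
level-1 kernel (an OpenMP reduction for the 2-norm; unlike a real dnrm2 it
does no overflow/underflow scaling, and the function name is made up):

#include <math.h>

static double omp_nrm2(int n, const double *x)
{
  double sum = 0.0;
  int i;
#pragma omp parallel for reduction(+:sum)
  for (i = 0; i < n; i++) sum += x[i]*x[i];
  return sqrt(sum);
}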

Full blown benchmarks would not make a lot of sense until we get the Mat
classes threaded in a similar fashion. However, at this point I would
like feedback on the direction this is taking and if we can start
getting code committed.

Cheers
Gerard




[petsc-dev] OpenMP support

2012-02-14 Thread Gerard Gorman
Matthew Knepley emailed the following on 14/02/12 00:34:

 As a first step - can we add OpenMP support to PETSc conf?
 Lawrence made
 a first pass at this:
 
 https://bitbucket.org/wence/petsc-dev-omp/src/52afd5fd2c25/config/PETSc/packages/openmp.py
  It does need extending because it will fail in its current state
 for a
 number of compilers. I am guessing we would have to reimplement
 something like ax_openmp or similar...what is the right thing to
 do here?


 Can you explain the right test?

The definitive test would be to try to compile a test code: eg

int main(){
#ifndef _OPENMP
choke me
#endif
return 0;
}

The OpenMP standard specifies that the preprocessor macro name _OPENMP
should be set.

As for the actual compile flag ... reading FindOpenMP.cmake you can see
that they take the test compile approach as above, and they have a list
of candidates (OpenMP_C_FLAG_CANDIDATES) which are tested. Another
option would be to allow --with-openmp[=magic_flag] for the odd compiler
not recognised. For example I notice that the Fujitsu compiler is not
listed in FindOpenMP.cmake.

Cheers
Gerard




[petsc-dev] OpenMP support

2012-02-13 Thread Gerard Gorman
Hi

I have been working with Lawrence Mitchell and Michele Weiland at EPCC
to add OpenMP support to the mat/vec classes and we are at the stage
that we would like to give other people a chance to play with it.

Lawrence put a branch on bitbucket if you want to browse:
https://bitbucket.org/wence/petsc-dev-omp/overview

I think that there is a lot in there that needs
discussion(/modification), so I propose that we try to break this into a
number of discussions/steps (also I do not want to write a several page
email).

I am making the assumption here that OpenMP is interesting enough to
want to put in PETSc. Other than the garden variety multicore, it can
also be used on Intel MIC. Does this need further discussion?

As a first step - can we add OpenMP support to PETSc conf? Lawrence made
a first pass at this:
https://bitbucket.org/wence/petsc-dev-omp/src/52afd5fd2c25/config/PETSc/packages/openmp.py
It does need extending because it will fail in its current state for a
number of compilers. I am guessing we would have to reimplement
something like ax_openmp or similar...what is the right thing to do here?

Regarding cmake - I only recently learned cmake myself and I have been
using:
FIND_PACKAGE(OpenMP) (i.e. cmake does all the heavy lifting in
FindOpenMP.cmake which it provides). However, as PETSc does not appear
to use the find_package feature I am not sure what approach should be
adopted for PETSc. Suggestions?

Cheers
Gerard




[petsc-dev] OpenMP support

2012-02-13 Thread Matthew Knepley
On Mon, Feb 13, 2012 at 9:28 AM, Gerard Gorman g.gorman at imperial.ac.uk wrote:

 Hi

 I have been working with Lawrence Mitchell and Michele Weiland at EPCC
 to add OpenMP support to the mat/vec classes and we are at the stage
 that we would like to give other people a chance to play with it.

 Lawrence put a branch on bitbucket if you want to browse:
 https://bitbucket.org/wence/petsc-dev-omp/overview

 I think that there is a lot in there that needs
 discussion(/modification), so I propose that we try to break this into a
 number of discussions/steps (also I do not want to write a several page
 email).

 I am making the assumption here that OpenMP is interesting enough to
 want to put in PETSc. Other than the garden variety multicore, it can
 also be used on Intel MIC. Does this need further discussion?

 As a first step - can we add OpenMP support to PETSc conf? Lawrence made
 a first pass at this:

 https://bitbucket.org/wence/petsc-dev-omp/src/52afd5fd2c25/config/PETSc/packages/openmp.py
 It does need extending because it will fail in its current state for a
 number of compilers. I am guessing we would have to reimplement
 something like ax_openmp or similar...what is the right thing to do here?


Can you explain the right test?


 Regarding cmake - I only recently learned cmake myself and I have been
 using:
 FIND_PACKAGE(OpenMP) (i.e. cmake does all the heavy lifting in
 FindOpenMP.cmake which it provides). However, as PETSc does not appear
 to use the find_package feature I am not sure what approach should be
 adopted for PETSc. Suggestions?


If the configure works right, there is nothing left to do for CMake.

  Matt


 Cheers
 Gerard




-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener