Re: [petsc-dev] OpenMP
Ah, great. I guessed right (that was the page I was looking at but I still don't see this):

    #if defined(_OPENMP) && _OPENMP >= 201811

On Sat, Nov 6, 2021 at 8:47 PM Junchao Zhang wrote:
> On Sat, Nov 6, 2021 at 3:51 PM Mark Adams wrote:
>> Yea, that is a bit inscrutable, but I see mumps is the main/only user of this:
>>
>>   /* if using PETSc OpenMP support, we only call MUMPS on master ranks.
>>      Before/after the call, we change/restore CPUs the master ranks can run on */
>>
>> And I see _OPENMP is a macro for the release date (mm) of the OMP version. It's not clear what the v5.0 is (https://www.openmp.org/specifications/)
>
> {200505,"2.5"},{200805,"3.0"},{201107,"3.1"},{201307,"4.0"},{201511,"4.5"},{201811,"5.0"},{202011,"5.1"}
>
>> On Sat, Nov 6, 2021 at 4:27 PM Junchao Zhang wrote:
>>> On Sat, Nov 6, 2021 at 5:51 AM Mark Adams wrote:
>>>> Two questions on OMP:
>>>>
>>>> * Can I test for the version of OMP? I want >= 5 and I see this, which looks promising:
>>>>   include/petscsys.h:#elif defined(_OPENMP) && *_OPENMP >= 201307* && !defined(_WIN32)
>>>>
>>>> * What is the difference between HAVE_OPENMP and HAVE_OPENMP_SUPPORT.
>>>
>>> # this is different from HAVE_OPENMP. HAVE_OPENMP_SUPPORT checks if we have facilities to support
>>> # running PETSc in flat-MPI mode and third party libraries in MPI+OpenMP hybrid mode
>>> if self.mpi.found and self.mpi.support_mpi3_shm and self.pthread.found and self.hwloc.found:
>>>   # Apple pthread does not provide this functionality
>>>   if self.function.check('pthread_barrier_init', libraries = 'pthread'):
>>>     self.addDefine('HAVE_OPENMP_SUPPORT', 1)
>>>
>>>> Thanks, Mark
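[Archive note] The _OPENMP macro expands to the yyyymm date of the OpenMP specification the compiler implements, so Junchao's table above maps directly onto preprocessor guards. A minimal, self-contained sketch of gating on OpenMP 5.0 the way the guard quoted above does (this little program is illustrative only and is not part of PETSc):

    #include <stdio.h>

    int main(void)
    {
    #if defined(_OPENMP) && _OPENMP >= 201811
      printf("OpenMP 5.0 or later available (_OPENMP = %d)\n", _OPENMP);
    #elif defined(_OPENMP)
      printf("older OpenMP (_OPENMP = %d)\n", _OPENMP);
    #else
      printf("compiled without OpenMP\n");
    #endif
      return 0;
    }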
Re: [petsc-dev] OpenMP
On Sat, Nov 6, 2021 at 3:51 PM Mark Adams wrote: > Yea, that is a bit inscrutable, but I see mumps is the main/only user of > this: > > /* if using PETSc OpenMP support, we only call MUMPS on master ranks. > Before/after the call, we change/restore CPUs the master ranks can run on */ > > And I see _OPENMP is a macro for the release date (mm) of the OMP > version. It's not clear what the v5.0 is ( > https://www.openmp.org/specifications/) > {200505,"2.5"},{200805,"3.0"},{201107,"3.1"},{201307,"4.0"},{201511,"4.5"},{201811,"5.0"},{202011,"5.1"} On Sat, Nov 6, 2021 at 4:27 PM Junchao Zhang > wrote: > >> >> >> On Sat, Nov 6, 2021 at 5:51 AM Mark Adams wrote: >> >>> Two questions on OMP: >>> >>> * Can I test for the version of OMP? I want >= 5 and I see this, which >>> looks promising: >>> include/petscsys.h:#elif defined(_OPENMP) && *_OPENMP >= 201307* && >>> !defined(_WIN32) >>> >>> * What is the difference between HAVE_OPENMP and >>> HAVE_OPENMP_SUPPORT. >>> >>> # this is different from HAVE_OPENMP. HAVE_OPENMP_SUPPORT checks if we >> have facilities to support >> # running PETSc in flat-MPI mode and third party libraries in MPI+OpenMP >> hybrid mode >> if self.mpi.found and self.mpi.support_mpi3_shm and self.pthread.found >> and self.hwloc.found: >> # Apple pthread does not provide this functionality >> if self.function.check('pthread_barrier_init', libraries = 'pthread'): >> self.addDefine('HAVE_OPENMP_SUPPORT', 1) >> >> >>> Thanks, >>> Mark >>> >>
Re: [petsc-dev] OpenMP
Yea, that is a bit inscrutable, but I see mumps is the main/only user of this: /* if using PETSc OpenMP support, we only call MUMPS on master ranks. Before/after the call, we change/restore CPUs the master ranks can run on */ And I see _OPENMP is a macro for the release date (mm) of the OMP version. It's not clear what the v5.0 is ( https://www.openmp.org/specifications/) On Sat, Nov 6, 2021 at 4:27 PM Junchao Zhang wrote: > > > On Sat, Nov 6, 2021 at 5:51 AM Mark Adams wrote: > >> Two questions on OMP: >> >> * Can I test for the version of OMP? I want >= 5 and I see this, which >> looks promising: >> include/petscsys.h:#elif defined(_OPENMP) && *_OPENMP >= 201307* && >> !defined(_WIN32) >> >> * What is the difference between HAVE_OPENMP and >> HAVE_OPENMP_SUPPORT. >> >> # this is different from HAVE_OPENMP. HAVE_OPENMP_SUPPORT checks if we > have facilities to support > # running PETSc in flat-MPI mode and third party libraries in MPI+OpenMP > hybrid mode > if self.mpi.found and self.mpi.support_mpi3_shm and self.pthread.found and > self.hwloc.found: > # Apple pthread does not provide this functionality > if self.function.check('pthread_barrier_init', libraries = 'pthread'): > self.addDefine('HAVE_OPENMP_SUPPORT', 1) > > >> Thanks, >> Mark >> >
Re: [petsc-dev] OpenMP
On Sat, Nov 6, 2021 at 5:51 AM Mark Adams wrote:
> Two questions on OMP:
>
> * Can I test for the version of OMP? I want >= 5 and I see this, which looks promising:
>   include/petscsys.h:#elif defined(_OPENMP) && *_OPENMP >= 201307* && !defined(_WIN32)
>
> * What is the difference between HAVE_OPENMP and HAVE_OPENMP_SUPPORT.

# this is different from HAVE_OPENMP. HAVE_OPENMP_SUPPORT checks if we have facilities to support
# running PETSc in flat-MPI mode and third party libraries in MPI+OpenMP hybrid mode
if self.mpi.found and self.mpi.support_mpi3_shm and self.pthread.found and self.hwloc.found:
  # Apple pthread does not provide this functionality
  if self.function.check('pthread_barrier_init', libraries = 'pthread'):
    self.addDefine('HAVE_OPENMP_SUPPORT', 1)

> Thanks,
> Mark
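[Archive note] For orientation: the addDefine() call above ends up in ${PETSC_ARCH}/include/petscconf.h, so both flags can be tested at compile time. A rough sketch of what that looks like and how the distinction is used (the PETSC_ prefix is the usual one addDefine produces; the comments are editorial, not taken from the configure source):

    /* petscconf.h (sketch) */
    #define PETSC_HAVE_OPENMP 1          /* an OpenMP compiler flag was accepted        */
    #define PETSC_HAVE_OPENMP_SUPPORT 1  /* MPI-3 shared memory, pthread_barrier_init,
                                            and hwloc were all found                    */

    /* user or library code */
    #if defined(PETSC_HAVE_OPENMP_SUPPORT)
      /* flat-MPI PETSc can hand work to an MPI+OpenMP third-party library,
         e.g. calling MUMPS only on master ranks as described in this thread */
    #endif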
[petsc-dev] OpenMP
Two questions on OMP:

* Can I test for the version of OMP? I want >= 5 and I see this, which looks promising:
  include/petscsys.h:#elif defined(_OPENMP) && *_OPENMP >= 201307* && !defined(_WIN32)

* What is the difference between HAVE_OPENMP and HAVE_OPENMP_SUPPORT.

Thanks,
Mark
Re: [petsc-dev] OpenMP and web page
I agree. We should just make sure it "disappears" with the transition to the Sphinx docs.

  Barry

> On Jan 15, 2021, at 10:32 AM, jacob@gmail.com wrote:
>
> Alternatively, I simply don’t include it in the MR to port the website to sphinx.
>
> Best regards,
>
> Jacob Faibussowitsch
> (Jacob Fai - booss - oh - vitch)
> Cell: (312) 694-3391
>
> From: petsc-dev On Behalf Of Mark Adams
> Sent: Friday, January 15, 2021 10:20
> To: For users of the development version of PETSc
> Subject: [petsc-dev] OpenMP and web page
>
> I am experimenting with launching asynchronous GPU solves in PCFieldsplit/additive using OpenMP and I found this web page, which I think is obsolete and should be removed:
> https://www.mcs.anl.gov/petsc/miscellaneous/petscthreads.html
>
> Mark
Re: [petsc-dev] OpenMP and web page
Alternatively, I simply don’t include it in the MR to port the website to sphinx. Best regards, Jacob Faibussowitsch (Jacob Fai - booss - oh - vitch) Cell: (312) 694-3391 From: petsc-dev On Behalf Of Mark Adams Sent: Friday, January 15, 2021 10:20 To: For users of the development version of PETSc Subject: [petsc-dev] OpenMP and web page I am experimenting with launching asynchronous GPU solves in PCFieldsplit/additive using OpenMP and I found this web page, which I think is obsolete and should be removed: https://www.mcs.anl.gov/petsc/miscellaneous/petscthreads.html Mark
[petsc-dev] OpenMP and web page
I am experimenting with launching asynchronous GPU solves in PCFieldsplit/additive using OpenMP and I found this web page, which I think is obsolete and should be removed: https://www.mcs.anl.gov/petsc/miscellaneous/petscthreads.html Mark
Re: [petsc-dev] OpenMP for GPU course
> On Mar 19, 2019, at 12:22 PM, Matthew Knepley wrote: > > On Tue, Mar 19, 2019 at 1:17 PM Jed Brown wrote: > Matthew Knepley writes: > > > Are you saying that using OpenMP 4.5 "offload" is something you would do? > > For applications that make sense for GPUs, yes. > > Would you say this workshop is oriented toward application that make sense > for GPUs, or > are they trying to sell this to everyone regardless of the suitability? bnl website is currently off-line; at least from my location > > Matt > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/
Re: [petsc-dev] OpenMP for GPU course
"Smith, Barry F." writes: >> On Mar 19, 2019, at 12:17 PM, Jed Brown wrote: >> >> Matthew Knepley writes: >> >>> Are you saying that using OpenMP 4.5 "offload" is something you would do? >> >> For applications that make sense for GPUs, yes. > > Oh, so you mean for AI ;) We really missed the boat by not rebranding as PETAi ("intelligence is the easy part") and registering pet.ai.
Re: [petsc-dev] OpenMP for GPU course
> On Mar 19, 2019, at 12:17 PM, Jed Brown wrote: > > Matthew Knepley writes: > >> Are you saying that using OpenMP 4.5 "offload" is something you would do? > > For applications that make sense for GPUs, yes. Oh, so you mean for AI ;)
Re: [petsc-dev] OpenMP for GPU course
> On Mar 19, 2019, at 12:12 PM, Matthew Knepley wrote: > > On Tue, Mar 19, 2019 at 12:31 PM Jed Brown via petsc-dev > wrote: > These are well-organized events that puts application teams together > with compiler developers and the like. I served as a mentor for the > Boulder hackathon last year (joining a NASA team that included PETSc > user Gaetan Kenway) and learned a lot. It's highly recommended to > prepare for the event by focusing the code you want to work on to > something concrete and easily modifiable, and to have a test suite that > can be easily run to evaluate correctness and performance. In our case, > Michael Barad had done an exemplary job in that preparation. > > Are you saying that using OpenMP 4.5 "offload" is something you would do? It may be the only choice ;(. In addition, if it does perform poorly it would be good to have concrete evidence that it performs poorly. > > Matt > > "Smith, Barry F. via petsc-dev" writes: > > >I got this off an ECP mailing list. > > > > > > OpenMP Brookathon 2019 > > April 29 – May 2, 2019 > > URL: https://www.bnl.gov/ompbrookathon2019/ > > The Computational Science Initiative at Brookhaven National Laboratory > > (BNL) is organizing in conjunction with Oak Ridge National Laboratory > > (ORNL) and IBM, the "OpenMP Brookathon 2019". This event is sponsored by > > ECP and driven by the ECP SOLLVE Project. The goal of this hackathon is to > > port, optimize and evolve applications towards the latest OpenMP versions > > (4.5+). In practical terms, this event will enable application teams and > > developers to accelerate their code with the use of GPUs, as well as > > exploiting the latest OpenMP functionality to program (IBM Power9) > > multi-core platforms. Prospective user groups of large hybrid CPU-GPU > > systems will send teams of at least 3 developers along with either (1) a > > scalable application that could benefit from GPU accelerators, or (2) an > > application running on accelerators that has already written OpenMP and > > needs optimization or (3) applications that have OpenACC in their codes and > > need assistance to convert them to OpenMP 4.5 offload. There will be > > intensive mentoring during this 4-day hands-on event. Programming > > experience with OpenMP 4.5 offload or CUDA is not a requirement. We will > > hold training events / tutorials covering the set of background topics > > required. In the weeks preceding the hackathon, you will have a chance to > > attend training to prepare you for the event. Prior GPU experience is not > > required! > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/
Re: [petsc-dev] OpenMP for GPU course
Matthew Knepley writes: > Are you saying that using OpenMP 4.5 "offload" is something you would do? For applications that make sense for GPUs, yes.
Re: [petsc-dev] OpenMP for GPU course
These are well-organized events that puts application teams together with compiler developers and the like. I served as a mentor for the Boulder hackathon last year (joining a NASA team that included PETSc user Gaetan Kenway) and learned a lot. It's highly recommended to prepare for the event by focusing the code you want to work on to something concrete and easily modifiable, and to have a test suite that can be easily run to evaluate correctness and performance. In our case, Michael Barad had done an exemplary job in that preparation. "Smith, Barry F. via petsc-dev" writes: >I got this off an ECP mailing list. > > > OpenMP Brookathon 2019 > April 29 – May 2, 2019 > URL: https://www.bnl.gov/ompbrookathon2019/ > The Computational Science Initiative at Brookhaven National Laboratory (BNL) > is organizing in conjunction with Oak Ridge National Laboratory (ORNL) and > IBM, the "OpenMP Brookathon 2019". This event is sponsored by ECP and driven > by the ECP SOLLVE Project. The goal of this hackathon is to port, optimize > and evolve applications towards the latest OpenMP versions (4.5+). In > practical terms, this event will enable application teams and developers to > accelerate their code with the use of GPUs, as well as exploiting the latest > OpenMP functionality to program (IBM Power9) multi-core platforms. > Prospective user groups of large hybrid CPU-GPU systems will send teams of at > least 3 developers along with either (1) a scalable application that could > benefit from GPU accelerators, or (2) an application running on accelerators > that has already written OpenMP and needs optimization or (3) applications > that have OpenACC in their codes and need assistance to convert them to > OpenMP 4.5 offload. There will be intensive mentoring during this 4-day > hands-on event. Programming experience with OpenMP 4.5 offload or CUDA is not > a requirement. We will hold training events / tutorials covering the set of > background topics required. In the weeks preceding the hackathon, you will > have a chance to attend training to prepare you for the event. Prior GPU > experience is not required!
[petsc-dev] OpenMP for GPU course
I got this off an ECP mailing list. OpenMP Brookathon 2019 April 29 – May 2, 2019 URL: https://www.bnl.gov/ompbrookathon2019/ The Computational Science Initiative at Brookhaven National Laboratory (BNL) is organizing in conjunction with Oak Ridge National Laboratory (ORNL) and IBM, the "OpenMP Brookathon 2019". This event is sponsored by ECP and driven by the ECP SOLLVE Project. The goal of this hackathon is to port, optimize and evolve applications towards the latest OpenMP versions (4.5+). In practical terms, this event will enable application teams and developers to accelerate their code with the use of GPUs, as well as exploiting the latest OpenMP functionality to program (IBM Power9) multi-core platforms. Prospective user groups of large hybrid CPU-GPU systems will send teams of at least 3 developers along with either (1) a scalable application that could benefit from GPU accelerators, or (2) an application running on accelerators that has already written OpenMP and needs optimization or (3) applications that have OpenACC in their codes and need assistance to convert them to OpenMP 4.5 offload. There will be intensive mentoring during this 4-day hands-on event. Programming experience with OpenMP 4.5 offload or CUDA is not a requirement. We will hold training events / tutorials covering the set of background topics required. In the weeks preceding the hackathon, you will have a chance to attend training to prepare you for the event. Prior GPU experience is not required!
Re: [petsc-dev] openmp
Thanks. Could you please send an example makefile for this small example as src/ts/examples/tutorials/makefile is huge and I don't know which part of it manages the -I -L stuff. Svetlana On Thu, 7 Nov 2013, at 1:19, Barry Smith wrote: Your makefile does not indicate the location of mpif.h to your FORTRAN compiler which in the case of —with-mpi=0 is in ${PETSC}/include/mpiuni Note that if you simply copy a makefile from PETSc, say src/ts/examples/tutorials/makefile and modify that slightly for you code you don’t need to manage all the -I -L stuff yourself, our makefiles take care of it and are portable for different MPIs etc. Barry On Nov 6, 2013, at 12:25 AM, Svetlana Tkachenko svetlana.tkache...@fastmail.fm wrote: I have configured petsc-dev (downloaded it today) with these options, and a small example. It appears to fail to compile without MPI with the error message: ./configure --with-cc=gcc --with-fc=gfortran --download-f-blas-lapack --with-openmp --with-mpi=0 ~/dev/test/petsc $ echo $LD_LIBRARY_PATH /home/username/petsc/linux-amd64/lib:/opt/openmpi/lib ~/dev/test/petsc $ cat solver.f subroutine solver() #include finclude/petscsys.h PetscErrorCode ierr print *, Entered petsc. ! Init PETSc call PetscInitialize(PETSC_NULL_CHARACTER,ierr) CHKERRQ(ierr) print *, Init done. ! Finalise PETSc call PetscFinalize(ierr) CHKERRQ(ierr) print *, Finalized. end ~/dev/test/petsc $ cat myexample.f program myexample call solver end ~/dev/test/petsc $ cat makefile include ${PETSC_DIR}/conf/variables myexample: myexample.o solver.o ; gfortran -o myexample myexample.o solver.o -lpetsc -L${PETSC_DIR}/${PETSC_ARCH}/lib -fopenmp solver.o: ; gfortran -c -cpp -I${PETSC_DIR}/include -O0 solver.f -lpetsc -I${PETSC_DIR}/${PETSC_ARCH}/include -L${PETSC_DIR}/${PETSC_ARCH}/lib -lpetsc -fopenmp myexample.o: ; gfortran -c -cpp -I${PETSC_DIR}/include -O0 myexample.f -lpetsc -I${PETSC_DIR}/${PETSC_ARCH}/include -L${PETSC_DIR}/${PETSC_ARCH}/lib -lpetsc -fopenmp ~/dev/test/petsc $ make gfortran -c -cpp -I/home/username/petsc/include -O0 myexample.f -lpetsc -I/home/username/petsc/linux-amd64/include -L/home/username/petsc/linux-amd64/lib -lpetsc -fopenmp gfortran -c -cpp -I/home/username/petsc/include -O0 solver.f -lpetsc -I/home/username/petsc/linux-amd64/include -L/home/username/petsc/linux-amd64/lib -lpetsc -fopenmp In file included from solver.f:3: /home/username/petsc/include/finclude/petscsys.h:10: error: mpif.h: No such file or directory /home/username/petsc/include/finclude/petscsys.h:163.29: Included at solver.f:3: parameter(MPIU_SCALAR = MPI_DOUBLE_PRECISION) 1 Error: Parameter 'mpi_double_precision' at (1) has not been declared or is a variable, which does not reduce to a constant expression /home/username/petsc/include/finclude/petscsys.h:171.30: Included at solver.f:3: parameter(MPIU_INTEGER = MPI_INTEGER) 1 Error: Parameter 'mpi_integer' at (1) has not been declared or is a variable, which does not reduce to a constant expression make: *** [solver.o] Error 1 ~/dev/test/petsc $
Re: [petsc-dev] openmp
On Wed, Nov 6, 2013 at 4:47 PM, Svetlana Tkachenko svetlana.tkache...@fastmail.fm wrote: Thanks. Could you please send an example makefile for this small example as src/ts/examples/tutorials/makefile is huge and I don't know which part of it manages the -I -L stuff. prog: prog.o ${CLINKER} -o prog prog.o ${PETSC_LIB} include ${PETSC_DIR}/conf/variables include ${PETSC_DIR}/conf/rules Matt Svetlana On Thu, 7 Nov 2013, at 1:19, Barry Smith wrote: Your makefile does not indicate the location of mpif.h to your FORTRAN compiler which in the case of —with-mpi=0 is in ${PETSC}/include/mpiuni Note that if you simply copy a makefile from PETSc, say src/ts/examples/tutorials/makefile and modify that slightly for you code you don’t need to manage all the -I -L stuff yourself, our makefiles take care of it and are portable for different MPIs etc. Barry On Nov 6, 2013, at 12:25 AM, Svetlana Tkachenko svetlana.tkache...@fastmail.fm wrote: I have configured petsc-dev (downloaded it today) with these options, and a small example. It appears to fail to compile without MPI with the error message: ./configure --with-cc=gcc --with-fc=gfortran --download-f-blas-lapack --with-openmp --with-mpi=0 ~/dev/test/petsc $ echo $LD_LIBRARY_PATH /home/username/petsc/linux-amd64/lib:/opt/openmpi/lib ~/dev/test/petsc $ cat solver.f subroutine solver() #include finclude/petscsys.h PetscErrorCode ierr print *, Entered petsc. ! Init PETSc call PetscInitialize(PETSC_NULL_CHARACTER,ierr) CHKERRQ(ierr) print *, Init done. ! Finalise PETSc call PetscFinalize(ierr) CHKERRQ(ierr) print *, Finalized. end ~/dev/test/petsc $ cat myexample.f program myexample call solver end ~/dev/test/petsc $ cat makefile include ${PETSC_DIR}/conf/variables myexample: myexample.o solver.o ; gfortran -o myexample myexample.o solver.o -lpetsc -L${PETSC_DIR}/${PETSC_ARCH}/lib -fopenmp solver.o: ; gfortran -c -cpp -I${PETSC_DIR}/include -O0 solver.f -lpetsc -I${PETSC_DIR}/${PETSC_ARCH}/include -L${PETSC_DIR}/${PETSC_ARCH}/lib -lpetsc -fopenmp myexample.o: ; gfortran -c -cpp -I${PETSC_DIR}/include -O0 myexample.f -lpetsc -I${PETSC_DIR}/${PETSC_ARCH}/include -L${PETSC_DIR}/${PETSC_ARCH}/lib -lpetsc -fopenmp ~/dev/test/petsc $ make gfortran -c -cpp -I/home/username/petsc/include -O0 myexample.f -lpetsc -I/home/username/petsc/linux-amd64/include -L/home/username/petsc/linux-amd64/lib -lpetsc -fopenmp gfortran -c -cpp -I/home/username/petsc/include -O0 solver.f -lpetsc -I/home/username/petsc/linux-amd64/include -L/home/username/petsc/linux-amd64/lib -lpetsc -fopenmp In file included from solver.f:3: /home/username/petsc/include/finclude/petscsys.h:10: error: mpif.h: No such file or directory /home/username/petsc/include/finclude/petscsys.h:163.29: Included at solver.f:3: parameter(MPIU_SCALAR = MPI_DOUBLE_PRECISION) 1 Error: Parameter 'mpi_double_precision' at (1) has not been declared or is a variable, which does not reduce to a constant expression /home/username/petsc/include/finclude/petscsys.h:171.30: Included at solver.f:3: parameter(MPIU_INTEGER = MPI_INTEGER) 1 Error: Parameter 'mpi_integer' at (1) has not been declared or is a variable, which does not reduce to a constant expression make: *** [solver.o] Error 1 ~/dev/test/petsc $ -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener
Re: [petsc-dev] openmp
Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes:

> Thanks. Could you please send an example makefile for this small example as src/ts/examples/tutorials/makefile is huge and I don't know which part of it manages the -I -L stuff.

There are examples in the user's manual. I would start with this:

ALL: ex2

include ${PETSC_DIR}/conf/variables
include ${PETSC_DIR}/conf/rules

ex2: ex2.o chkopts
	${CLINKER} -o $@ $^ ${PETSC_LIB}
Re: [petsc-dev] openmp
On Thu, 7 Nov 2013, at 9:22, Jed Brown wrote:
> Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes:
>> Thanks. Could you please send an example makefile for this small example as src/ts/examples/tutorials/makefile is huge and I don't know which part of it manages the -I -L stuff.
>
> There are examples in the user's manual. I would start with this:
>
> ALL: ex2
>
> include ${PETSC_DIR}/conf/variables
> include ${PETSC_DIR}/conf/rules
>
> ex2: ex2.o chkopts
> 	${CLINKER} -o $@ $^ ${PETSC_LIB}

On Thu, 7 Nov 2013, at 9:21, Matthew Knepley wrote:
> prog: prog.o
> 	${CLINKER} -o prog prog.o ${PETSC_LIB}
>
> include ${PETSC_DIR}/conf/variables
> include ${PETSC_DIR}/conf/rules
>
>    Matt

Right. I have spent half of an hour now trying to imagine what to do to link, trying everything like a headless chicken, and it did not work. I would appreciate if you could come up with something that links, not just runs a single program.

Svetlana
Re: [petsc-dev] openmp
Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes:

> Right. I have spent half of an hour now trying to imagine what to do to link trying everything like a headless chicken and it did not work.

You always have to send the error message.

> I would appreciate if you could come up with something that links, not just runs a single program.

The makefiles we suggested link a program of the same name as the source file. Do you have multiple source files? You only have to edit the one line and run make.

program: several.o object.o files.o
	${CLINKER} -o $@ $^ ${PETSC_LIB}

include ${PETSC_DIR}/conf/variables
include ${PETSC_DIR}/conf/rules
Re: [petsc-dev] openmp
On Thu, 7 Nov 2013, at 10:22, Jed Brown wrote: Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes: Right. I have spent half of an hour now trying to imagine what to do to link trying everything like a headless chicken and it did not work. You always have to send the error message. I would appreciate if you could come up with something that links, not just runs a single program. The makefiles we suggested link a program of the same name as the source file. Do you have multiple source files? You only have to edit the one line and run make. program: several.o object.o files.o ${CLINKER} -o $@ $^ ${PETSC_LIB} include ${PETSC_DIR}/conf/variables include ${PETSC_DIR}/conf/rules Email had 1 attachment: + Attachment2 1k (application/pgp-signature) ~/dev/test/petsc $ cat makefile myexample: myexample.o solver.o ${CLINKER} -o $@ $^ ${PETSC_LIB} include ${PETSC_DIR}/conf/variables include ${PETSC_DIR}/conf/rules ~/dev/test/petsc $ cat solver.f subroutine solver() #include finclude/petscsys.h PetscErrorCode ierr print *, Entered petsc. ! Init PETSc call PetscInitialize(PETSC_NULL_CHARACTER,ierr) CHKERRQ(ierr) print *, Init done. ! Finalise PETSc call PetscFinalize(ierr) CHKERRQ(ierr) print *, Finalized. end ~/dev/test/petsc $ make gfortran -c -fPIC -Wall -Wno-unused-variable -g -fopenmp -o solver.o solver.f Warning: solver.f:2: Illegal preprocessor directive solver.f:3.12: PetscErrorCode ierr 1 Error: Unclassifiable statement at (1) solver.f:8.14: CHKERRQ(ierr) 1 Error: Unclassifiable statement at (1) solver.f:13.14: CHKERRQ(ierr) 1 Error: Unclassifiable statement at (1) make: [solver.o] Error 1 (ignored) gcc -fopenmp -fopenmp -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -fno-inline -O0 -fopenmp -o myexample myexample.o solver.o -Wl,-rpath,/home/username/petsc/linux-amd64/lib -L/home/username/petsc/linux-amd64/lib -lpetsc -Wl,-rpath,/home/username/petsc/linux-amd64/lib -lflapack -lfblas -lX11 -lpthread -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -lgfortran -lm -lm -lstdc++ -lstdc++ -ldl -lgcc_s -ldl gcc: solver.o: No such file or directory make: *** [myexample] Error 1 ~/dev/test/petsc $
Re: [petsc-dev] openmp
Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes:

> On Thu, 7 Nov 2013, at 10:22, Jed Brown wrote:
>> Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes:
>>> Right. I have spent half of an hour now trying to imagine what to do to link trying everything like a headless chicken and it did not work.
>>
>> You always have to send the error message.
>>
>>> I would appreciate if you could come up with something that links, not just runs a single program.
>>
>> The makefiles we suggested link a program of the same name as the source file. Do you have multiple source files? You only have to edit the one line and run make.
>>
>> program: several.o object.o files.o
>> 	${CLINKER} -o $@ $^ ${PETSC_LIB}
>>
>> include ${PETSC_DIR}/conf/variables
>> include ${PETSC_DIR}/conf/rules
>
> ~/dev/test/petsc $ cat makefile
> myexample: myexample.o solver.o
> 	${CLINKER} -o $@ $^ ${PETSC_LIB}
>
> include ${PETSC_DIR}/conf/variables
> include ${PETSC_DIR}/conf/rules
>
> ~/dev/test/petsc $ cat solver.f
> subroutine solver()
> #include finclude/petscsys.h

Name your source file solver.F so that the Fortran compiler preprocesses it. (Or add the option -cpp, but that is more confusing and less portable, so rename the source file.)
Re: [petsc-dev] openmp
On Thu, 7 Nov 2013, at 12:34, Jed Brown wrote: Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes: On Thu, 7 Nov 2013, at 10:22, Jed Brown wrote: Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes: Right. I have spent half of an hour now trying to imagine what to do to link trying everything like a headless chicken and it did not work. You always have to send the error message. I would appreciate if you could come up with something that links, not just runs a single program. The makefiles we suggested link a program of the same name as the source file. Do you have multiple source files? You only have to edit the one line and run make. program: several.o object.o files.o ${CLINKER} -o $@ $^ ${PETSC_LIB} include ${PETSC_DIR}/conf/variables include ${PETSC_DIR}/conf/rules Email had 1 attachment: + Attachment2 1k (application/pgp-signature) ~/dev/test/petsc $ cat makefile myexample: myexample.o solver.o ${CLINKER} -o $@ $^ ${PETSC_LIB} include ${PETSC_DIR}/conf/variables include ${PETSC_DIR}/conf/rules ~/dev/test/petsc $ cat solver.f subroutine solver() #include finclude/petscsys.h Name your source file solver.F so that the Fortran compiler preprocesses it. (Or add the option -cpp, but that is more confusing and less portable, so rename the source file.) Email had 1 attachment: + Attachment2 1k (application/pgp-signature) ~/dev/test/petsc $ mv solver.f solver.F ~/dev/test/petsc $ make gfortran -c -fPIC -Wall -Wno-unused-variable -g -fopenmp -I/home/username/petsc/include -I/home/username/petsc/linux-amd64/include -I/home/username/petsc/include/mpiuni-o solver.o solver.F solver.F:8.46: if (ierr .ne. 0) call MPI_Abort(PETSC_COMM_WORLD,ierr,ierr 1 Error: Missing ')' in statement at or before (1) solver.F:8.72: if (ierr .ne. 0) call MPI_Abort(PETSC_COMM_WORLD,ierr,ierr 1 Warning: Line truncated at (1) solver.F:13.46: if (ierr .ne. 0) call MPI_Abort(PETSC_COMM_WORLD,ierr,ierr 1 Error: Missing ')' in statement at or before (1) solver.F:13.72: if (ierr .ne. 0) call MPI_Abort(PETSC_COMM_WORLD,ierr,ierr 1 Warning: Line truncated at (1) make: [solver.o] Error 1 (ignored) gcc -fopenmp -fopenmp -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -fno-inline -O0 -fopenmp -o myexample myexample.o solver.o -Wl,-rpath,/home/username/petsc/linux-amd64/lib -L/home/username/petsc/linux-amd64/lib -lpetsc -Wl,-rpath,/home/username/petsc/linux-amd64/lib -lflapack -lfblas -lX11 -lpthread -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -lgfortran -lm -lm -lstdc++ -lstdc++ -ldl -lgcc_s -ldl gcc: solver.o: No such file or directory make: *** [myexample] Error 1 ~/dev/test/petsc $
Re: [petsc-dev] openmp
Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes:

> ~/dev/test/petsc $ mv solver.f solver.F
> ~/dev/test/petsc $ make
> gfortran -c -fPIC -Wall -Wno-unused-variable -g -fopenmp -I/home/username/petsc/include -I/home/username/petsc/linux-amd64/include -I/home/username/petsc/include/mpiuni -o solver.o solver.F
> solver.F:8.46:
>     if (ierr .ne. 0) call MPI_Abort(PETSC_COMM_WORLD,ierr,ierr
>                                              1
> Error: Missing ')' in statement at or before (1)

You indented so far that the expanded macro spilled over the 72-character line length needed to fit on a punch card in the 1950s. If you would like to modernize your Fortran dialect beyond the constraints of punch cards, you could consider naming your file .F90 or adding the option -ffree-form, perhaps also with -ffree-line-length-none.
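[Archive note] For readers reconstructing the fix from the archive: below is a sketch of the solver file after the rename to free form, based on the solver.f listed earlier in this thread (the archive stripped the quotes around the include path and the print strings; they are restored here, and nothing else is new). With the .F90 suffix, gfortran both runs the preprocessor and drops the 72-column fixed-form limit, so the expanded CHKERRQ fits on one line.

    ! solver.F90 -- preprocessed, free-form source
    subroutine solver()
    #include "finclude/petscsys.h"
      PetscErrorCode ierr

      print *, 'Entered petsc.'
      ! Init PETSc
      call PetscInitialize(PETSC_NULL_CHARACTER, ierr)
      CHKERRQ(ierr)
      print *, 'Init done.'
      ! Finalise PETSc
      call PetscFinalize(ierr)
      CHKERRQ(ierr)
      print *, 'Finalized.'
    end subroutine solver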
Re: [petsc-dev] openmp
On Thu, 7 Nov 2013, at 12:56, Jed Brown wrote:
> Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes:
>> ~/dev/test/petsc $ mv solver.f solver.F
>> ~/dev/test/petsc $ make
>> gfortran -c -fPIC -Wall -Wno-unused-variable -g -fopenmp -I/home/username/petsc/include -I/home/username/petsc/linux-amd64/include -I/home/username/petsc/include/mpiuni -o solver.o solver.F
>> solver.F:8.46:
>>     if (ierr .ne. 0) call MPI_Abort(PETSC_COMM_WORLD,ierr,ierr
>> Error: Missing ')' in statement at or before (1)
>
> You indented so far that the expanded macro spilled over the 72-character line length needed to fit on a punch card in the 1950s. If you would like to modernize your Fortran dialect beyond the constraints of punch cards, you could consider naming your file .F90 or adding the option -ffree-form, perhaps also with -ffree-line-length-none.

Thanks! I need to apologize as I didn't mention it in the previous message: in my solver.F no I did not exceed the line length and I don't know what it is referring to even. That line is not mine.

Svetlana
Re: [petsc-dev] openmp
Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes:

> Thanks! I need to apologize as I didn't mention it in the previous message: in my solver.F no I did not exceed the line length and I don't know what it is referring to even. That line is not mine.

It is the expanded error checking macro. Fortran does not provide a concise way to do error checking except to use the C preprocessor, but the line length requirements apply to the _expanded_ macro.

Fortran is a perpetual nuisance, but once you learn about all the potholes, barbed wire, broken glass, and polonium in the food, it's possible to get by.
Re: [petsc-dev] openmp
On Thu, 7 Nov 2013, at 13:51, Jed Brown wrote: Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes: On Thu, 7 Nov 2013, at 12:56, Jed Brown wrote: Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes: ~/dev/test/petsc $ mv solver.f solver.F ~/dev/test/petsc $ make gfortran -c -fPIC -Wall -Wno-unused-variable -g -fopenmp -I/home/username/petsc/include -I/home/username/petsc/linux-amd64/include -I/home/username/petsc/include/mpiuni-o solver.o solver.F solver.F:8.46: if (ierr .ne. 0) call MPI_Abort(PETSC_COMM_WORLD,ierr,ierr 1 Error: Missing ')' in statement at or before (1) You indented so far that the expanded macro spilled over the 72-character line length needed to fit on a punch card in the 1950s. If you would like to modernize your Fortran dialect beyond the constraints of punch cards, you could consider naming your file .F90 or adding the option -ffree-form, perhaps also with -ffree-line-length-none. Email had 1 attachment: + Attachment2 1k (application/pgp-signature) Thank you for the advice. I've named both files .F90 and now that's what it's got. ~/dev/test/petsc $ ls makefile myexample.F90 solver.F90 ~/dev/test/petsc $ make gfortran -c -fPIC -Wall -Wno-unused-variable -g -fopenmp -I/home/username/petsc/include -I/home/username/petsc/linux-amd64/include -I/home/username/petsc/include/mpiuni-o myexample.o myexample.F90 gfortran -c -fPIC -Wall -Wno-unused-variable -g -fopenmp -I/home/username/petsc/include -I/home/username/petsc/linux-amd64/include -I/home/username/petsc/include/mpiuni-o solver.o solver.F90 gcc -fopenmp -fopenmp -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -fno-inline -O0 -fopenmp -o myexample myexample.o solver.o -Wl,-rpath,/home/username/petsc/linux-amd64/lib -L/home/username/petsc/linux-amd64/lib -lpetsc -Wl,-rpath,/home/username/petsc/linux-amd64/lib -lflapack -lfblas -lX11 -lpthread -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -lgfortran -lm -lm -lstdc++ -lstdc++ -ldl -lgcc_s -ldl /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64/crt1.o: In function `_start': (.text+0x20): undefined reference to `main' What do you expect? Your file only contains a subroutine, no program. What do you mean? (I don't think the program name has to be 'main'.). ~/dev/test/petsc $ cat myexample.F90 program myexample call solver end ~/dev/test/petsc $
Re: [petsc-dev] openmp
Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes:

> What do you mean? (I don't think the program name has to be 'main'.).

No, it doesn't. The name is meaningless in Fortran, but you need to use the keyword program.

> ~/dev/test/petsc $ cat myexample.F90
> program myexample
>   call solver
> end
> ~/dev/test/petsc $

Add myexample.o to the makefile so it gets compiled.
Re: [petsc-dev] openmp
On Thu, 7 Nov 2013, at 13:59, Jed Brown wrote: Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes: What do you mean? (I don't think the program name has to be 'main'.). No, it doesn't. The name is meaningless in Fortran, but you need to use the keyword program. ~/dev/test/petsc $ cat myexample.F90 program myexample call solver end ~/dev/test/petsc $ Add myexample.o to the makefile so it gets compiled. Already did, please, see: ~/dev/test/petsc $ cat makefile myexample: myexample.o solver.o ${CLINKER} -o $@ $^ ${PETSC_LIB} include ${PETSC_DIR}/conf/variables include ${PETSC_DIR}/conf/rules ~/dev/test/petsc $
Re: [petsc-dev] openmp
Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes:

> On Thu, 7 Nov 2013, at 13:59, Jed Brown wrote:
>> Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes:
>>> What do you mean? (I don't think the program name has to be 'main'.).
>>
>> No, it doesn't. The name is meaningless in Fortran, but you need to use the keyword program.
>>
>>> ~/dev/test/petsc $ cat myexample.F90
>>> program myexample
>>>   call solver
>>> end
>>> ~/dev/test/petsc $
>>
>> Add myexample.o to the makefile so it gets compiled.
>
> Already did, please, see:
>
> ~/dev/test/petsc $ cat makefile
> myexample: myexample.o solver.o
> 	${CLINKER} -o $@ $^ ${PETSC_LIB}
>
> include ${PETSC_DIR}/conf/variables
> include ${PETSC_DIR}/conf/rules
> ~/dev/test/petsc $

Run make clean; make. If you get the same error, check

$ nm myexample.o
U _gfortran_set_args
U _gfortran_set_options
U _GLOBAL_OFFSET_TABLE_
T main
r options.0.1881
U solver_
Re: [petsc-dev] openmp
On Thu, 7 Nov 2013, at 14:15, Jed Brown wrote: Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes: On Thu, 7 Nov 2013, at 13:59, Jed Brown wrote: Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes: What do you mean? (I don't think the program name has to be 'main'.). No, it doesn't. The name is meaningless in Fortran, but you need to use the keyword program. ~/dev/test/petsc $ cat myexample.F90 program myexample call solver end ~/dev/test/petsc $ Add myexample.o to the makefile so it gets compiled. Already did, please, see: ~/dev/test/petsc $ cat makefile myexample: myexample.o solver.o ${CLINKER} -o $@ $^ ${PETSC_LIB} include ${PETSC_DIR}/conf/variables include ${PETSC_DIR}/conf/rules ~/dev/test/petsc $ Run make clean; make. If you get the same error, check $ nm myexample.o U _gfortran_set_args U _gfortran_set_options U _GLOBAL_OFFSET_TABLE_ T main r options.0.1881 U solver_ ~/dev/test/petsc $ make clean ~/dev/test/petsc $ make gfortran -c -fPIC -Wall -Wno-unused-variable -g -fopenmp -I/home/username/petsc/include -I/home/username/petsc/linux-amd64/include -I/home/username/petsc/include/mpiuni-o myexample.o myexample.F90 gfortran -c -fPIC -Wall -Wno-unused-variable -g -fopenmp -I/home/username/petsc/include -I/home/username/petsc/linux-amd64/include -I/home/username/petsc/include/mpiuni-o solver.o solver.F90 gcc -fopenmp -fopenmp -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -fno-inline -O0 -fopenmp -o myexample myexample.o solver.o -Wl,-rpath,/home/username/petsc/linux-amd64/lib -L/home/username/petsc/linux-amd64/lib -lpetsc -Wl,-rpath,/home/username/petsc/linux-amd64/lib -lflapack -lfblas -lX11 -lpthread -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -lgfortran -lm -lm -lstdc++ -lstdc++ -ldl -lgcc_s -ldl /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64/crt1.o: In function `_start': (.text+0x20): undefined reference to `main' collect2: ld returned 1 exit status make: *** [myexample] Error 1 ~/dev/test/petsc $ nm myexample.o T MAIN__ U _GLOBAL_OFFSET_TABLE_ U _gfortran_set_options r options.0.1516 U solver_ ~/dev/test/petsc $
Re: [petsc-dev] openmp
On Thu, 7 Nov 2013, at 14:22, Svetlana Tkachenko wrote: On Thu, 7 Nov 2013, at 14:15, Jed Brown wrote: Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes: On Thu, 7 Nov 2013, at 13:59, Jed Brown wrote: Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes: What do you mean? (I don't think the program name has to be 'main'.). No, it doesn't. The name is meaningless in Fortran, but you need to use the keyword program. ~/dev/test/petsc $ cat myexample.F90 program myexample call solver end ~/dev/test/petsc $ Add myexample.o to the makefile so it gets compiled. Already did, please, see: ~/dev/test/petsc $ cat makefile myexample: myexample.o solver.o ${CLINKER} -o $@ $^ ${PETSC_LIB} include ${PETSC_DIR}/conf/variables include ${PETSC_DIR}/conf/rules ~/dev/test/petsc $ Run make clean; make. If you get the same error, check $ nm myexample.o U _gfortran_set_args U _gfortran_set_options U _GLOBAL_OFFSET_TABLE_ T main r options.0.1881 U solver_ ~/dev/test/petsc $ make clean ~/dev/test/petsc $ make gfortran -c -fPIC -Wall -Wno-unused-variable -g -fopenmp -I/home/username/petsc/include -I/home/username/petsc/linux-amd64/include -I/home/username/petsc/include/mpiuni-o myexample.o myexample.F90 gfortran -c -fPIC -Wall -Wno-unused-variable -g -fopenmp -I/home/username/petsc/include -I/home/username/petsc/linux-amd64/include -I/home/username/petsc/include/mpiuni-o solver.o solver.F90 gcc -fopenmp -fopenmp -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -fno-inline -O0 -fopenmp -o myexample myexample.o solver.o -Wl,-rpath,/home/username/petsc/linux-amd64/lib -L/home/username/petsc/linux-amd64/lib -lpetsc -Wl,-rpath,/home/username/petsc/linux-amd64/lib -lflapack -lfblas -lX11 -lpthread -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -lgfortran -lm -lm -lstdc++ -lstdc++ -ldl -lgcc_s -ldl /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64/crt1.o: In function `_start': (.text+0x20): undefined reference to `main' collect2: ld returned 1 exit status make: *** [myexample] Error 1 ~/dev/test/petsc $ nm myexample.o T MAIN__ U _GLOBAL_OFFSET_TABLE_ U _gfortran_set_options r options.0.1516 U solver_ ~/dev/test/petsc $ Using FLINKER instead of CLINKER in the makefile gets it to compile without errors. However, the previous behaviour persists (?). ~/dev/test/petsc $ make gfortran -c -fPIC -Wall -Wno-unused-variable -g -fopenmp -I/home/username/petsc/include -I/home/username/petsc/linux-amd64/include -I/home/username/petsc/include/mpiuni-o myexample.o myexample.F90 gfortran -c -fPIC -Wall -Wno-unused-variable -g -fopenmp -I/home/username/petsc/include -I/home/username/petsc/linux-amd64/include -I/home/username/petsc/include/mpiuni-o solver.o solver.F90 gfortran -fopenmp -fopenmp -fPIC -Wall -Wno-unused-variable -g -fopenmp -o myexample myexample.o solver.o -Wl,-rpath,/home/username/petsc/linux-amd64/lib -L/home/username/petsc/linux-amd64/lib -lpetsc -Wl,-rpath,/home/username/petsc/linux-amd64/lib -lflapack -lfblas -lX11 -lpthread -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -lgfortran -lm -lm -lstdc++ -lstdc++ -ldl -lgcc_s -ldl ~/dev/test/petsc $ ./myexample Entered petsc. Init done. WARNING! There are options you set that were not used! WARNING! could be spelling mistake, etc! Option left: name:-threadcomm_nthreads value: 8 Option left: name:-threadcomm_type value: openmp Finalized. ~/dev/test/petsc $
Re: [petsc-dev] openmp
On Thu, 7 Nov 2013, at 14:24, Svetlana Tkachenko wrote: On Thu, 7 Nov 2013, at 14:22, Svetlana Tkachenko wrote: On Thu, 7 Nov 2013, at 14:15, Jed Brown wrote: Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes: On Thu, 7 Nov 2013, at 13:59, Jed Brown wrote: Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes: What do you mean? (I don't think the program name has to be 'main'.). No, it doesn't. The name is meaningless in Fortran, but you need to use the keyword program. ~/dev/test/petsc $ cat myexample.F90 program myexample call solver end ~/dev/test/petsc $ Add myexample.o to the makefile so it gets compiled. Already did, please, see: ~/dev/test/petsc $ cat makefile myexample: myexample.o solver.o ${CLINKER} -o $@ $^ ${PETSC_LIB} include ${PETSC_DIR}/conf/variables include ${PETSC_DIR}/conf/rules ~/dev/test/petsc $ Run make clean; make. If you get the same error, check $ nm myexample.o U _gfortran_set_args U _gfortran_set_options U _GLOBAL_OFFSET_TABLE_ T main r options.0.1881 U solver_ ~/dev/test/petsc $ make clean ~/dev/test/petsc $ make gfortran -c -fPIC -Wall -Wno-unused-variable -g -fopenmp -I/home/username/petsc/include -I/home/username/petsc/linux-amd64/include -I/home/username/petsc/include/mpiuni-o myexample.o myexample.F90 gfortran -c -fPIC -Wall -Wno-unused-variable -g -fopenmp -I/home/username/petsc/include -I/home/username/petsc/linux-amd64/include -I/home/username/petsc/include/mpiuni-o solver.o solver.F90 gcc -fopenmp -fopenmp -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -fno-inline -O0 -fopenmp -o myexample myexample.o solver.o -Wl,-rpath,/home/username/petsc/linux-amd64/lib -L/home/username/petsc/linux-amd64/lib -lpetsc -Wl,-rpath,/home/username/petsc/linux-amd64/lib -lflapack -lfblas -lX11 -lpthread -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -lgfortran -lm -lm -lstdc++ -lstdc++ -ldl -lgcc_s -ldl /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../lib64/crt1.o: In function `_start': (.text+0x20): undefined reference to `main' collect2: ld returned 1 exit status make: *** [myexample] Error 1 ~/dev/test/petsc $ nm myexample.o T MAIN__ U _GLOBAL_OFFSET_TABLE_ U _gfortran_set_options r options.0.1516 U solver_ ~/dev/test/petsc $ Using FLINKER instead of CLINKER in the makefile gets it to compile without errors. However, the previous behaviour persists (?). ~/dev/test/petsc $ make gfortran -c -fPIC -Wall -Wno-unused-variable -g -fopenmp -I/home/username/petsc/include -I/home/username/petsc/linux-amd64/include -I/home/username/petsc/include/mpiuni-o myexample.o myexample.F90 gfortran -c -fPIC -Wall -Wno-unused-variable -g -fopenmp -I/home/username/petsc/include -I/home/username/petsc/linux-amd64/include -I/home/username/petsc/include/mpiuni-o solver.o solver.F90 gfortran -fopenmp -fopenmp -fPIC -Wall -Wno-unused-variable -g -fopenmp -o myexample myexample.o solver.o -Wl,-rpath,/home/username/petsc/linux-amd64/lib -L/home/username/petsc/linux-amd64/lib -lpetsc -Wl,-rpath,/home/username/petsc/linux-amd64/lib -lflapack -lfblas -lX11 -lpthread -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -lgfortran -lm -lm -lstdc++ -lstdc++ -ldl -lgcc_s -ldl ~/dev/test/petsc $ ./myexample Entered petsc. Init done. WARNING! There are options you set that were not used! WARNING! could be spelling mistake, etc! Option left: name:-threadcomm_nthreads value: 8 Option left: name:-threadcomm_type value: openmp Finalized. 
~/dev/test/petsc $

Correction: Not previous behaviour, rather, new behaviour...
Re: [petsc-dev] openmp
Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes:

> Using FLINKER instead of CLINKER in the makefile gets it to compile without errors. However, the previous behaviour persists (?).
>
> ~/dev/test/petsc $ make
> gfortran -c -fPIC -Wall -Wno-unused-variable -g -fopenmp -I/home/username/petsc/include -I/home/username/petsc/linux-amd64/include -I/home/username/petsc/include/mpiuni -o myexample.o myexample.F90
> gfortran -c -fPIC -Wall -Wno-unused-variable -g -fopenmp -I/home/username/petsc/include -I/home/username/petsc/linux-amd64/include -I/home/username/petsc/include/mpiuni -o solver.o solver.F90
> gfortran -fopenmp -fopenmp -fPIC -Wall -Wno-unused-variable -g -fopenmp -o myexample myexample.o solver.o -Wl,-rpath,/home/username/petsc/linux-amd64/lib -L/home/username/petsc/linux-amd64/lib -lpetsc -Wl,-rpath,/home/username/petsc/linux-amd64/lib -lflapack -lfblas -lX11 -lpthread -Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -L/usr/lib/gcc/x86_64-redhat-linux/4.4.7 -lgfortran -lm -lm -lstdc++ -lstdc++ -ldl -lgcc_s -ldl
> ~/dev/test/petsc $ ./myexample
>  Entered petsc.
>  Init done.
> WARNING! There are options you set that were not used!
> WARNING! could be spelling mistake, etc!
> Option left: name:-threadcomm_nthreads value: 8
> Option left: name:-threadcomm_type value: openmp
>  Finalized.
> ~/dev/test/petsc $

Did you add these options to .petscrc? Run with -log_summary and send the output so we can see what configuration you are using.

At the top of this thread, it looks like you did not configure --with-threadcomm. I recommend using --with-threadcomm --with-pthreadclasses --with-openmp.
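[Archive note] To make the recommendation concrete, a sketch of the configure and run lines it implies; every option is quoted from this thread, while the thread count and paths are just illustrative:

    # reconfigure with the threading communicator enabled, in addition to
    # the options used at the top of the thread
    ./configure --with-cc=gcc --with-fc=gfortran --download-f-blas-lapack \
        --with-mpi=0 --with-openmp --with-threadcomm --with-pthreadclasses

    # then the run-time options are no longer "options you set that were not used"
    ./myexample -threadcomm_type openmp -threadcomm_nthreads 8 -log_summary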
Re: [petsc-dev] openmp
On Thu, 7 Nov 2013, at 12:56, Jed Brown wrote: Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes: ~/dev/test/petsc $ mv solver.f solver.F ~/dev/test/petsc $ make gfortran -c -fPIC -Wall -Wno-unused-variable -g -fopenmp -I/home/username/petsc/include -I/home/username/petsc/linux-amd64/include -I/home/username/petsc/include/mpiuni-o solver.o solver.F solver.F:8.46: if (ierr .ne. 0) call MPI_Abort(PETSC_COMM_WORLD,ierr,ierr 1 Error: Missing ')' in statement at or before (1) You indented so far that the expanded macro spilled over the 72-character line length needed to fit on a punch card in the 1950s. If you would like to modernize your Fortran dialect beyond the constraints of punch cards, you could consider naming your file .F90 or adding the option -ffree-form, perhaps also with -ffree-line-length-none. For the bigger (non-test) project, I would really make use of --free-whatever options you gave. But simply writing prog: prog.o foo.o ${FLINKER} -ffree-form -o $@ $^ ${PETSC_LIB}, and looking at 'make' output, shows that it ignored my lines entirely and keeps trying to compile without -ffree-form.
Re: [petsc-dev] openmp
Svetlana Tkachenko svetlana.tkache...@fastmail.fm writes:

> For the bigger (non-test) project, I would really make use of --free-whatever options you gave. But simply writing
>
> prog: prog.o foo.o
> 	${FLINKER} -ffree-form -o $@ $^ ${PETSC_LIB}
>
> and looking at 'make' output, shows that it ignored my lines entirely and keeps trying to compile without -ffree-form.

These are *compiler* options, not *linker* options. Put it in FFLAGS instead.

make FFLAGS=-ffree-form
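[Archive note] If typing the flag on every make invocation gets tedious, the same thing can be set in the makefile itself. A sketch only: it assumes the conf/variables and conf/rules of that era pick up a user-set FFLAGS and append it to the Fortran compile line (which is how Jed's one-liner above works); verify against your PETSc version.

    FFLAGS = -ffree-form -ffree-line-length-none

    myexample: myexample.o solver.o
    	${FLINKER} -o $@ $^ ${PETSC_LIB}

    include ${PETSC_DIR}/conf/variables
    include ${PETSC_DIR}/conf/rules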
[petsc-dev] openmp
I have configured petsc-dev (downloaded it today) with these options, and a small example. It appears to fail to compile without MPI with the error message: ./configure --with-cc=gcc --with-fc=gfortran --download-f-blas-lapack --with-openmp --with-mpi=0 ~/dev/test/petsc $ echo $LD_LIBRARY_PATH /home/username/petsc/linux-amd64/lib:/opt/openmpi/lib ~/dev/test/petsc $ cat solver.f subroutine solver() #include finclude/petscsys.h PetscErrorCode ierr print *, Entered petsc. ! Init PETSc call PetscInitialize(PETSC_NULL_CHARACTER,ierr) CHKERRQ(ierr) print *, Init done. ! Finalise PETSc call PetscFinalize(ierr) CHKERRQ(ierr) print *, Finalized. end ~/dev/test/petsc $ cat myexample.f program myexample call solver end ~/dev/test/petsc $ cat makefile include ${PETSC_DIR}/conf/variables myexample: myexample.o solver.o ; gfortran -o myexample myexample.o solver.o -lpetsc -L${PETSC_DIR}/${PETSC_ARCH}/lib -fopenmp solver.o: ; gfortran -c -cpp -I${PETSC_DIR}/include -O0 solver.f -lpetsc -I${PETSC_DIR}/${PETSC_ARCH}/include -L${PETSC_DIR}/${PETSC_ARCH}/lib -lpetsc -fopenmp myexample.o: ; gfortran -c -cpp -I${PETSC_DIR}/include -O0 myexample.f -lpetsc -I${PETSC_DIR}/${PETSC_ARCH}/include -L${PETSC_DIR}/${PETSC_ARCH}/lib -lpetsc -fopenmp ~/dev/test/petsc $ make gfortran -c -cpp -I/home/username/petsc/include -O0 myexample.f -lpetsc -I/home/username/petsc/linux-amd64/include -L/home/username/petsc/linux-amd64/lib -lpetsc -fopenmp gfortran -c -cpp -I/home/username/petsc/include -O0 solver.f -lpetsc -I/home/username/petsc/linux-amd64/include -L/home/username/petsc/linux-amd64/lib -lpetsc -fopenmp In file included from solver.f:3: /home/username/petsc/include/finclude/petscsys.h:10: error: mpif.h: No such file or directory /home/username/petsc/include/finclude/petscsys.h:163.29: Included at solver.f:3: parameter(MPIU_SCALAR = MPI_DOUBLE_PRECISION) 1 Error: Parameter 'mpi_double_precision' at (1) has not been declared or is a variable, which does not reduce to a constant expression /home/username/petsc/include/finclude/petscsys.h:171.30: Included at solver.f:3: parameter(MPIU_INTEGER = MPI_INTEGER) 1 Error: Parameter 'mpi_integer' at (1) has not been declared or is a variable, which does not reduce to a constant expression make: *** [solver.o] Error 1 ~/dev/test/petsc $
[petsc-dev] OpenMP in PETSc when calling from Fortran?
Hi again, On 01. mars 2013 20:06, Jed Brown wrote: Matrix and vector operations are probably running in parallel, but probably not the operations that are taking time. Always send -log_summary if you have a performance question. I don't think they are running in parallel. When I analyze my code in Intel Vtune Amplifier, the only routines running in parallel are my own OpenMP ones. Indeed, if I comment out my OpenMP pragmas and recompile my code, it never uses more than one thread. -log_summary is shown below; this is using -pc_type lu -ksp_type bcgs. The fastest PC for my cases is usually BoomerAMG from HYPRE, so i used LU instead here in order to limit the test to PETSc only. The summary agrees with Vtune that MatLUFactorNumeric is the most time-consuming routine; in general it seems that the PC is always the most time-consuming. Any advice on how to get OpenMP working? Regards, ?smund -- PETSc Performance Summary: -- ./run on a arch-linux2-c-opt named vsl161 with 1 processor, by asmunder Wed Mar 6 10:14:55 2013 Using Petsc Development HG revision: 58cc6199509f1642f637843f1ca468283bf5ced9 HG Date: Wed Jan 30 00:39:35 2013 -0600 Max Max/MinAvg Total Time (sec): 4.446e+02 1.0 4.446e+02 Objects: 2.017e+03 1.0 2.017e+03 Flops:3.919e+11 1.0 3.919e+11 3.919e+11 Flops/sec:8.815e+08 1.0 8.815e+08 8.815e+08 MPI Messages: 0.000e+00 0.0 0.000e+00 0.000e+00 MPI Message Lengths: 0.000e+00 0.0 0.000e+00 0.000e+00 MPI Reductions: 2.818e+03 1.0 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N -- 2N flops and VecAXPY() for complex vectors of length N -- 8N flops Summary of Stages: - Time -- - Flops - --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 4.4460e+02 100.0% 3.9191e+11 100.0% 0.000e+00 0.0% 0.000e+000.0% 2.817e+03 100.0% See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
%T - percent time in this phase %f - percent flops in this phase %M - percent messages in this phase %L - percent message lengths in this phase %R - percent reductions in this phase Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors) EventCount Time (sec) Flops --- Global --- --- Stage --- Total Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %f %M %L %R %T %f %M %L %R Mflop/s --- Event Stage 0: Main Stage VecDot 802 1.0 9.2811e-02 1.0 1.96e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2117 VecDotNorm2 401 1.0 7.1333e-02 1.0 1.96e+08 1.0 0.0e+00 0.0e+00 4.0e+02 0 0 0 0 14 0 0 0 0 14 2755 VecNorm 1203 1.0 7.8265e-02 1.0 2.95e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 3766 VecCopy 802 1.0 1.1754e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecSet 1211 1.0 9.9961e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAXPY 401 1.0 4.5847e-02 1.0 9.82e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2143 VecAXPBYCZ 802 1.0 1.3489e-01 1.0 3.93e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 2913 VecWAXPY 802 1.0 1.2292e-01 1.0 1.96e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1599 VecAssemblyBegin 802 1.0 2.4509e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 VecAssemblyEnd 802 1.0 6.7234e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatMult
[petsc-dev] OpenMP in PETSc when calling from Fortran?
I don't see any options for turning on the threads here? #PETSc Option Table entries: -ksp_type bcgs -log_summary -pc_type lu #End of PETSc Option Table entries From http://www.mcs.anl.gov/petsc/features/threads.html ? The three important run-time options for using threads are: ? -threadcomm_nthreads nthreads: Sets the number of threads ? -threadcomm_affinities list_of_affinities: Sets the core affinities of threads ? -threadcomm_type nothread,pthread,openmp: Threading model (OpenMP, pthread, nothread) ? Run with -help to see the avialable options with threads. ? A few tutorial examples are located at $PETSC_DIR/src/sys/threadcomm/examples/tutorials Also LU is a direct solver that is not threaded so using threads for this exact run will not help (much) at all. The threads will only show useful speed up for iterative methods. Barry As time goes by we hope to have more extensive support in more routines for threads but things like factorization and solve are difficult so out side help would be very useful. On Mar 6, 2013, at 3:39 AM, ?smund Ervik Asmund.Ervik at sintef.no wrote: Hi again, On 01. mars 2013 20:06, Jed Brown wrote: Matrix and vector operations are probably running in parallel, but probably not the operations that are taking time. Always send -log_summary if you have a performance question. I don't think they are running in parallel. When I analyze my code in Intel Vtune Amplifier, the only routines running in parallel are my own OpenMP ones. Indeed, if I comment out my OpenMP pragmas and recompile my code, it never uses more than one thread. -log_summary is shown below; this is using -pc_type lu -ksp_type bcgs. The fastest PC for my cases is usually BoomerAMG from HYPRE, so i used LU instead here in order to limit the test to PETSc only. The summary agrees with Vtune that MatLUFactorNumeric is the most time-consuming routine; in general it seems that the PC is always the most time-consuming. Any advice on how to get OpenMP working? Regards, ?smund -- PETSc Performance Summary: -- ./run on a arch-linux2-c-opt named vsl161 with 1 processor, by asmunder Wed Mar 6 10:14:55 2013 Using Petsc Development HG revision: 58cc6199509f1642f637843f1ca468283bf5ced9 HG Date: Wed Jan 30 00:39:35 2013 -0600 Max Max/MinAvg Total Time (sec): 4.446e+02 1.0 4.446e+02 Objects: 2.017e+03 1.0 2.017e+03 Flops:3.919e+11 1.0 3.919e+11 3.919e+11 Flops/sec:8.815e+08 1.0 8.815e+08 8.815e+08 MPI Messages: 0.000e+00 0.0 0.000e+00 0.000e+00 MPI Message Lengths: 0.000e+00 0.0 0.000e+00 0.000e+00 MPI Reductions: 2.818e+03 1.0 Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract) e.g., VecAXPY() for real vectors of length N -- 2N flops and VecAXPY() for complex vectors of length N -- 8N flops Summary of Stages: - Time -- - Flops - --- Messages --- -- Message Lengths -- -- Reductions -- Avg %Total Avg %Total counts %Total Avg %Total counts %Total 0: Main Stage: 4.4460e+02 100.0% 3.9191e+11 100.0% 0.000e+00 0.0% 0.000e+000.0% 2.817e+03 100.0% See the 'Profiling' chapter of the users' manual for details on interpreting output. Phase summary info: Count: number of times phase was executed Time and Flops: Max - maximum over all processors Ratio - ratio of maximum to minimum over all processors Mess: number of messages sent Avg. len: average message length (bytes) Reduct: number of global reductions Global: entire computation Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop(). 
   %T - percent time in this phase        %f - percent flops in this phase
   %M - percent messages in this phase    %L - percent message lengths in this phase
   %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)

Event              Count      Time (sec)     Flops                              --- Global ---  --- Stage ---   Total
                   Max Ratio  Max      Ratio  Max      Ratio  Mess    Avg len  Reduct  %T %f %M %L %R  %T %f %M %L %R  Mflop/s
[the event table itself is truncated in the archive]
[petsc-dev] OpenMP in PETSc when calling from Fortran?
On Fri, Mar 1, 2013 at 3:26 AM, Åsmund Ervik asmund.ervik at ntnu.no wrote:

> Thanks for clarifying this. I am already using OpenMP pragmas in non-PETSc routines in my code, and using petsc-dev. Are you saying that I should also somehow use OpenMP pragmas around the calls to KSPSolve etc.?
>
> Suppose that my program is usually run like this:
>   ./run -pc_type gamg -ksp_type bcgs
> with other values left to their defaults, and I want to make it run in parallel:
>   ./run -pc_type gamg -ksp_type bcgs -threadcomm_type openmp -threadcomm_nthreads 8
> When I do this, the PC and KSP still run in serial as far as I can tell, and the program does not execute faster. What am I missing here?

Matrix and vector operations are probably running in parallel, but probably not the operations that are taking time. Always send -log_summary if you have a performance question.

> In case it is of interest, the matrix from my Poisson equation has in the range of 0.4 - 1 million nonzero elements, on average 5 per row.
[petsc-dev] OpenMP in PETSc when calling from Fortran?
Hi Barry,

On 28. feb. 2013 17:38, Barry Smith wrote:
> 2) You should not need petscthreadcomm.h in Fortran. Simply use OpenMP pragmas in your portion of the code.

Thanks for clarifying this. I am already using OpenMP pragmas in non-PETSc routines in my code, and using petsc-dev. Are you saying that I should also somehow use OpenMP pragmas around the calls to KSPSolve etc.?

Suppose that my program is usually run like this:
  ./run -pc_type gamg -ksp_type bcgs
with other values left to their defaults, and I want to make it run in parallel:
  ./run -pc_type gamg -ksp_type bcgs -threadcomm_type openmp -threadcomm_nthreads 8
When I do this, the PC and KSP still run in serial as far as I can tell, and the program does not execute faster. What am I missing here?

In case it is of interest, the matrix from my Poisson equation has in the range of 0.4 - 1 million nonzero elements, on average 5 per row.

Regards,
Åsmund
[petsc-dev] OpenMP compiler options
The OpenMP flags do not definitively identify that OpenMP is used. In particular, IBM XL interprets Cray's option -h omp as being equivalent to -soname omp, then silently ignores the Open MP pragmas. We can perhaps fix this instance by moving -qsmp up in the list, but we may eventually need to move it to compilerOptions.py.

  def configureLibrary(self):
    '''Checks for -fopenmp compiler flag'''
    '''Needs to check if OpenMP actually exists and works'''
    self.setCompilers.pushLanguage('C')
    #
    for flag in ["-fopenmp", # Gnu
                 "-h omp",   # Cray
                 "-mp",      # Portland Group
                 "-Qopenmp", # Intel windows
                 "-openmp",  # Intel
                 "",         # Empty, if compiler automatically accepts openmp
                 "-xopenmp", # Sun
                 "+Oopenmp", # HP
                 "-qsmp",    # IBM XL C/c++
                 "/openmp"   # Microsoft Visual Studio
                ]:
      if self.setCompilers.checkCompilerFlag(flag):
        ompflag = flag
        break
[petsc-dev] OpenMP compiler options
On Tue, May 29, 2012 at 3:52 PM, Jed Brown jedbrown at mcs.anl.gov wrote:

> The OpenMP flags do not definitively identify that OpenMP is used. In particular, IBM XL interprets Cray's option -h omp as being equivalent to -soname omp, then silently ignores the Open MP pragmas. We can perhaps fix this instance by moving -qsmp up in the list, but we may eventually need to move it to compilerOptions.py.

Move it up, and add it to the comment. And people think OpenMP is the easy way?

   Matt

>   def configureLibrary(self):
>     '''Checks for -fopenmp compiler flag'''
>     '''Needs to check if OpenMP actually exists and works'''
>     self.setCompilers.pushLanguage('C')
>     #
>     for flag in ["-fopenmp", # Gnu
>                  "-h omp",   # Cray
>                  "-mp",      # Portland Group
>                  "-Qopenmp", # Intel windows
>                  "-openmp",  # Intel
>                  "",         # Empty, if compiler automatically accepts openmp
>                  "-xopenmp", # Sun
>                  "+Oopenmp", # HP
>                  "-qsmp",    # IBM XL C/c++
>                  "/openmp"   # Microsoft Visual Studio
>                 ]:
>       if self.setCompilers.checkCompilerFlag(flag):
>         ompflag = flag
>         break

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener
[petsc-dev] OpenMP compiler options
Perhaps the 'empty' flag check should be at the beginning of the list..

[we do have other places in configure where we fix the order in which flags are checked - due to similar conflicts between compilers]

Satish

On Tue, 29 May 2012, Jed Brown wrote:

> The OpenMP flags do not definitively identify that OpenMP is used. In particular, IBM XL interprets Cray's option -h omp as being equivalent to -soname omp, then silently ignores the Open MP pragmas. We can perhaps fix this instance by moving -qsmp up in the list, but we may eventually need to move it to compilerOptions.py.
>
>   def configureLibrary(self):
>     '''Checks for -fopenmp compiler flag'''
>     '''Needs to check if OpenMP actually exists and works'''
>     self.setCompilers.pushLanguage('C')
>     #
>     for flag in ["-fopenmp", # Gnu
>                  "-h omp",   # Cray
>                  "-mp",      # Portland Group
>                  "-Qopenmp", # Intel windows
>                  "-openmp",  # Intel
>                  "",         # Empty, if compiler automatically accepts openmp
>                  "-xopenmp", # Sun
>                  "+Oopenmp", # HP
>                  "-qsmp",    # IBM XL C/c++
>                  "/openmp"   # Microsoft Visual Studio
>                 ]:
>       if self.setCompilers.checkCompilerFlag(flag):
>         ompflag = flag
>         break
[petsc-dev] OpenMP compiler options
On Tue, May 29, 2012 at 10:56 AM, Satish Balay balay at mcs.anl.gov wrote:

> Perhaps the 'empty' flag check should be at the beginning of the list..

No, because most compilers ignore the pragmas when no options are given. I do not know of any portable way, short of running code, to determine whether a compiler used the OpenMP pragmas or ignored them.

> [we do have other places in configure where we fix the order in which flags are checked - due to similar conflicts between compilers]
>
> Satish
>
> On Tue, 29 May 2012, Jed Brown wrote:
>
>> The OpenMP flags do not definitively identify that OpenMP is used. In particular, IBM XL interprets Cray's option -h omp as being equivalent to -soname omp, then silently ignores the Open MP pragmas. We can perhaps fix this instance by moving -qsmp up in the list, but we may eventually need to move it to compilerOptions.py.
>>
>>   def configureLibrary(self):
>>     '''Checks for -fopenmp compiler flag'''
>>     '''Needs to check if OpenMP actually exists and works'''
>>     self.setCompilers.pushLanguage('C')
>>     #
>>     for flag in ["-fopenmp", # Gnu
>>                  "-h omp",   # Cray
>>                  "-mp",      # Portland Group
>>                  "-Qopenmp", # Intel windows
>>                  "-openmp",  # Intel
>>                  "",         # Empty, if compiler automatically accepts openmp
>>                  "-xopenmp", # Sun
>>                  "+Oopenmp", # HP
>>                  "-qsmp",    # IBM XL C/c++
>>                  "/openmp"   # Microsoft Visual Studio
>>                 ]:
>>       if self.setCompilers.checkCompilerFlag(flag):
>>         ompflag = flag
>>         break
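A minimal sketch of the kind of run test alluded to above (illustrative only, not part of PETSc's configure; the file name is invented): build it with the candidate flag and run it. A compiler that merely ignores the pragmas prints 1; a working OpenMP build with OMP_NUM_THREADS > 1 prints the thread count. It deliberately avoids omp.h so it compiles even where that header does not exist.

  /* omp_runcheck.c (hypothetical): count threads entering the parallel region */
  #include <stdio.h>

  int main(void)
  {
    int count = 0;
  #pragma omp parallel
    {
  #pragma omp atomic
      count++;
    }
    printf("threads entering parallel region: %d\n", count);
    return 0;
  }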
[petsc-dev] OpenMP Support
Hi Dave

I should just say that we still have not finished a code review with the petsc-dev team so any faults should be assumed to be ours rather than theirs! Michele just put a preprint of our paper covering the preliminary work on arXiv, which you might find useful: http://arxiv.org/abs/1205.2005

As we have not yet merged with the trunk, you'd need to pull a branch from: https://bitbucket.org/wence/petsc-dev-omp and configure using --with-openmp

A key issue when running is to set thread/core affinity. Unfortunately there is no general way of doing this - it depends on your compiler. But likwid-pin can make your life a little easier: http://code.google.com/p/likwid/wiki/LikwidPin

Cheers
Gerard

Dave Nystrom emailed the following on 10/05/12 05:55:

Hi Gerard, Thanks for the info. Is there any documentation on how to use the petsc OpenMP support? I would be interested in trying it out. Thanks, Dave

Gerard Gorman writes:

Hi Dave OpenMP support exists for vec and mat (only AIJ so far). There is a big difference in performance depending on available memory bandwidth and the compiler OpenMP implementation. In application codes (such as Fluidity which is our main target code for this work) there are other significant costs such as matrix assembly. So in general you have to consider how easy it will be to thread the other computationally expensive sections of your code, otherwise the overall speed-up of your application will be modest. Cheers Gerard

Dave Nystrom emailed the following on 09/05/12 04:29:

Is the pthreads support further along than the OpenMP support? I have not tried the pthreads support yet. Does either the pthreads support or the OpenMP support implement the matvec or do they just do vector type operations?

Jed Brown writes:

On Tue, May 8, 2012 at 9:23 PM, Dave Nystrom dnystrom1 at comcast.net wrote: I see that petsc-dev now has some OpenMP support. Would a serial, non-mpi code that uses petsc-dev be able to gain much performance improvement from it now for the case of doing sparse linear solve with cg and jacobi preconditioning?

The kernels are being transitioned to use the threadcomm, which enables OpenMP and other threading models. We anticipate that pthreads will provide the best performance because operations can be less synchronous than with OpenMP (for which a parallel region implies barrier semantics). But if other parts of an application are using OpenMP, it would be preferable for PETSc to also use OpenMP so that it can share the same thread pool. The same applies to TBB.
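On the affinity point above: a small probe like the following can confirm where threads actually land once likwid-pin or a compiler's affinity environment variable is in effect. It is only a sketch; sched_getcpu() is Linux/glibc-specific and the file name is invented.

  /* affinity_probe.c (hypothetical): report which core each OpenMP thread runs on */
  #define _GNU_SOURCE
  #include <sched.h>
  #include <stdio.h>
  #include <omp.h>

  int main(void)
  {
  #pragma omp parallel
    {
      /* output order is arbitrary; this is only a diagnostic */
      printf("thread %d of %d on core %d\n",
             omp_get_thread_num(), omp_get_num_threads(), sched_getcpu());
    }
    return 0;
  }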
[petsc-dev] OpenMP Support
Hi Dave OpenMP support exists for vec and mat (only AIJ so far). There is a big difference in performance depending on available memory bandwidth and the compiler OpenMP implementation. In application codes (such as Fluidity which is our main target code for this work) there are other significant costs such as matrix assembly. So in general you have to consider how easy it will be to thread to the other computationally expensive sections of your code, otherwise the overall speed-up of your application will be modest. Cheers Gerard Dave Nystrom emailed the following on 09/05/12 04:29: Is the pthreads support further along than the OpenMP support? I have not tried the pthreads support yet. Does either the pthreads support or the OpenMP support implement the matvec or do they just do vector type operations? Jed Brown writes: On Tue, May 8, 2012 at 9:23 PM, Dave Nystrom dnystrom1 at comcast.net wrote: I see that petsc-dev now has some OpenMP support. Would a serial, non-mpi code that uses petsc-dev be able to gain much performance improvement from it now for the case of doing sparse linear solve with cg and jacobi preconditioning? The kernels are being transitioned to use the threadcomm, which enables OpenMP and other threading models. We anticipate that pthreads will provide the best performance because operations can be less synchronous than with OpenMP (for which a parallel region implies barrier semantics). But if other parts of an application are using OpenMP, it would be preferable for PETSc to also use OpenMP so that it can share the same thread pool. The same applies to TBB.
[petsc-dev] OpenMP Support
Hi Gerard, Thanks for the info. Is there any documentation on how to use the petsc OpenMP support? I would be interested in trying it out. Thanks, Dave Gerard Gorman writes: Hi Dave OpenMP support exists for vec and mat (only AIJ so far). There is a big difference in performance depending on available memory bandwidth and the compiler OpenMP implementation. In application codes (such as Fluidity which is our main target code for this work) there are other significant costs such as matrix assembly. So in general you have to consider how easy it will be to thread to the other computationally expensive sections of your code, otherwise the overall speed-up of your application will be modest. Cheers Gerard Dave Nystrom emailed the following on 09/05/12 04:29: Is the pthreads support further along than the OpenMP support? I have not tried the pthreads support yet. Does either the pthreads support or the OpenMP support implement the matvec or do they just do vector type operations? Jed Brown writes: On Tue, May 8, 2012 at 9:23 PM, Dave Nystrom dnystrom1 at comcast.net wrote: I see that petsc-dev now has some OpenMP support. Would a serial, non-mpi code that uses petsc-dev be able to gain much performance improvement from it now for the case of doing sparse linear solve with cg and jacobi preconditioning? The kernels are being transitioned to use the threadcomm, which enables OpenMP and other threading models. We anticipate that pthreads will provide the best performance because operations can be less synchronous than with OpenMP (for which a parallel region implies barrier semantics). But if other parts of an application are using OpenMP, it would be preferable for PETSc to also use OpenMP so that it can share the same thread pool. The same applies to TBB.
[petsc-dev] OpenMP Support
I see that petsc-dev now has some OpenMP support. Would a serial, non-mpi code that uses petsc-dev be able to gain much performance improvement from it now for the case of doing sparse linear solve with cg and jacobi preconditioning? Thanks, Dave
[petsc-dev] OpenMP Support
On Tue, May 8, 2012 at 9:23 PM, Dave Nystrom dnystrom1 at comcast.net wrote: I see that petsc-dev now has some OpenMP support. Would a serial, non-mpi code that uses petsc-dev be able to gain much performance improvement from it now for the case of doing sparse linear solve with cg and jacobi preconditioning?

The kernels are being transitioned to use the threadcomm, which enables OpenMP and other threading models. We anticipate that pthreads will provide the best performance because operations can be less synchronous than with OpenMP (for which a parallel region implies barrier semantics). But if other parts of an application are using OpenMP, it would be preferable for PETSc to also use OpenMP so that it can share the same thread pool. The same applies to TBB.
[petsc-dev] OpenMP Support
Is the pthreads support further along than the OpenMP support? I have not tried the pthreads support yet. Does either the pthreads support or the OpenMP support implement the matvec or do they just do vector type operations? Jed Brown writes: On Tue, May 8, 2012 at 9:23 PM, Dave Nystrom dnystrom1 at comcast.net wrote: I see that petsc-dev now has some OpenMP support. Would a serial, non-mpi code that uses petsc-dev be able to gain much performance improvement from it now for the case of doing sparse linear solve with cg and jacobi preconditioning? The kernels are being transitioned to use the threadcomm, which enables OpenMP and other threading models. We anticipate that pthreads will provide the best performance because operations can be less synchronous than with OpenMP (for which a parallel region implies barrier semantics). But if other parts of an application are using OpenMP, it would be preferable for PETSc to also use OpenMP so that it can share the same thread pool. The same applies to TBB.
[petsc-dev] OpenMP Support
On Tue, May 8, 2012 at 10:29 PM, Dave Nystrom Dave.Nystrom at tachyonlogic.com wrote: Is the pthreads support further along than the OpenMP support? I have not tried the pthreads support yet. Does either the pthreads support or the OpenMP support implement the matvec or do they just do vector type operations?

The pthreads stuff has been available for a while, including some matrix kernels, but it's being transformed now, so it will get better.
[petsc-dev] OpenMP/Vec
Dear all,

I've been doing some of the programming associated with the work Gerard's describing, so here's an attempt at a bit of a response with some summary of where we'd go next.

On 28/02/12 04:39, Jed Brown wrote: On Mon, Feb 27, 2012 at 16:31, Gerard Gorman g.gorman at imperial.ac.uk wrote:

I had a quick go at trying to get some sensible benchmarks for this but I was getting too much system noise. I am particularly interested in seeing if the overhead goes to zero if num_threads(1) is used.

What timing method did you use? I did not see overhead going to zero when num_threads goes to 1 when using GCC compilers, but Intel seems to do fairly well.

It's possible that if one asks the gcc folk nicely enough that the overhead might go to zero. Alternately, if it turns out that overheads are unavoidable, and we've gone down the macro-ification route anyway (see below) we could explicitly make a runtime decision based on numthreads in PETSc. So that:

  PetscOMPParallelFor(..., numthreads)
  for ( ... )
    ;

would turn into

  if ( numthreads == 1 )
    for ( ... )
      ;
  else
    #pragma omp parallel for
    for ( ... )
      ;

The branch should hopefully be cheaper than the thread sync overhead.

I'm surprised by this. I'm not aware of any compiler that doesn't have OpenMP support - and if you do not actually enable OpenMP, compilers generally just ignore the pragma. Do you know of any compiler that does not have OpenMP support which will complain?

Sean points out that omp.h might not be available, but that misses the point. As far as I know, recent mainstream compilers have enough sense to at least ignore these directives, but I'm sure there are still cases where it would be an issue. More importantly, #pragma was a misfeature that should never be used now that _Pragma() exists. The latter is better not just because it can be turned off, but because it can be manipulated using macros and can be explicitly compiled out.

This may not be flexible enough. You frequently want to have a parallel region, and then have multiple omp for's within that one region.

  PetscPragmaOMPObject(obj, parallel)
  {
    PetscPragmaOMP(whatever you normally write for this loop)
    for () {
    }
    ... and so on
  }

So this is easy to do as you say, the exact best way to do it will probably become clear when writing the code. As far as attaching the threading information to the object goes, it would be nice to do so staying within the sequential implementation (rather than copying all the code as is done for pthreads right now). I'd envisage something like this:

  PetscPragmaOMPObject(vec, parallel)
  {
    PetscPragmaOMP(for...)
    for ( i = PetscVecLowerBound(vec); i < PetscVecUpperBound(vec); i++ ) ...;
  }

Where one has:

  #if defined(PETSC_HAVE_OPENMP)
  #define PetscVecLowerBound(vec) compute_lower_bound
  #define PetscVecUpperBound(vec) compute_upper_bound
  #else
  #define PetscVecLowerBound(vec) 0
  #define PetscVecUpperBound(vec) vec->map->n
  #endif

Or is this too ugly for words? Computing the lower and upper bounds could either be done for every loop, or, if it's only a property of the vector length, we could stash it in some extra slots at creation time. For a first cut, computation of the upper and lower bounds will do exactly what a static schedule does (i.e. chunking by size/nthread).
I think what you describe is close to Fig 3 of this paper written by your neighbours: http://greg.bronevetsky.com/papers/2008IWOMP.pdf However, before making the implementation more complex, it would be good to benchmark the current approach and use a tool like likwid to measure the NUMA traffic so we can get a good handle on the costs.

Sure.

Well this is where the implementation details get richer and there are many options - they also become less portable. For example, what does all this mean for the sparc64 processors which are UMA.

Delay to runtime, use an ignorant partition for UMA. (Blue Gene/Q is also essentially uniform.) But note that even with uniform memory, cache still makes it somewhat hierarchical.

Not to mention Intel MIC which also supports OpenMP. I guess I am cautious about getting too bogged down with very invasive optimisations until we have benchmarked the basic approach which in a wide range of use cases will achieve good thread/page locality as illustrated previously.

I guess I'm just interested in exposing enough semantic information to be able to schedule a few different ways using run-time (or, if absolutely necessary, configure-time) options. I don't want to have to revisit individual loops.

So to take a single, representative, kernel we move from something like:

  PetscErrorCode VecConjugate_Seq(Vec xin)
  {
    PetscScalar    *x;
    PetscInt       n = xin->map->n;
    PetscInt       i;
    PetscErrorCode ierr;
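The archived message breaks off here. A hedged sketch of the threaded form it appears to be heading towards, using the bounds macros proposed above (names taken from the email's proposal, not from the actual branch), might look like this:

  /* Hedged sketch only -- not code from the petsc-dev-omp branch. */
  PetscErrorCode VecConjugate_Seq(Vec xin)
  {
    PetscScalar    *x;
    PetscInt       i;
    PetscErrorCode ierr;

    PetscFunctionBegin;
    ierr = VecGetArray(xin,&x);CHKERRQ(ierr);
    PetscPragmaOMPObject(xin,parallel private(i))
    {
      /* each thread walks only its own chunk, just as a static schedule would */
      for (i = PetscVecLowerBound(xin); i < PetscVecUpperBound(xin); i++) x[i] = PetscConj(x[i]);
    }
    ierr = VecRestoreArray(xin,&x);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }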
[petsc-dev] OpenMP/Vec
On Sun, Feb 26, 2012 at 22:54, recrusader recrusader at gmail.com wrote: For multithread support in PETSc, my question is whether KSP and/or PC work when Vec and Mat use multithread mode.

Yes
[petsc-dev] OpenMP/Vec
Jed Brown emailed the following on 27/02/12 00:39: On Sun, Feb 26, 2012 at 04:07, Gerard Gorman g.gorman at imperial.ac.uk wrote:

Did you post a repository yet? I'd like to have a look at the code.

It's on Barry's favourite collaborative software development site of course ;-) https://bitbucket.org/wence/petsc-dev-omp/overview

I looked through the code and I'm concerned that all the OpenMP code is inlined into vec/impls/seq and mat/impls/aij/seq with, as far as I can tell, no way to use OpenMP for some objects in a simulation, but not others. I think that all the pragmas should have num_threads(x->nthreads) clauses. We can compute the correct number of threads based on sizes when memory is allocated (or specified through command line options, inherited from related objects, etc).

The num_threads(x->nthreads) is worth investigating. However, in the benchmarks I have done so far it would seem that the only two sensible values for nthreads would be 1 or the total number of threads. When you set num_threads(1) it seems that OpenMP (for some implementations at least) is sensible enough not to do anything silly that would introduce scheduling/synchronisation overheads. Once you use more than one thread then you incur the scheduling/synchronisation overheads. As you increase the number of threads you may for small arrays see parallel efficiency decreasing, but I have not seen the actual time increasing. If this is a general result then it might not be a big deal that OpenMP is inlined into vec/impls/seq and mat/impls/aij/seq so long as the num_threads was used as suggested. For determining the cut-off array size for using all or one thread - an interesting option would be to determine this at run time, i.e. learn at run time what the appropriate cut-off is.

I don't think we can get away from OpenMP schedule overhead (several hundred cycles) even for those objects that we choose not to use threads for, but (at least with my gcc tests), that overhead is only about a third of the cost of actually starting a parallel region.

I had a quick go at trying to get some sensible benchmarks for this but I was getting too much system noise. I am particularly interested in seeing if the overhead goes to zero if num_threads(1) is used. The next step is to take a look at the EPCC OpenMP microbenchmarks to see if they have tied down these issues.

It's really not acceptable to insert unguarded #pragma omp ... into the code because this will generate tons of warnings or errors with compilers that don't know about OpenMP. It would be better to test for _Pragma and use

I'm surprised by this. I'm not aware of any compiler that doesn't have OpenMP support - and if you do not actually enable OpenMP, compilers generally just ignore the pragma. Do you know of any compiler that does not have OpenMP support which will complain?

  #define PetscPragmatize(x) _Pragma(#x)
  #if defined(PETSC_USE_OPENMP)
  #  define PetscPragmaOMP(x) PetscPragmatize(omp x)
  #else
  #  define PetscPragmaOMP(x)
  #endif

then use PetscPragmaOMP(parallel for ...) We should probably use a variant for object-based threading

  #define PetscPragmaOMPObject(obj,x) PetscPragmaOMP(x num_threads((obj)->nthreads))

This may not be flexible enough. You frequently want to have a parallel region, and then have multiple omp for's within that one region.

In the case of multiple objects, I think you usually want the object being modified to control the number of threads.

I take this point.
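A hedged sketch of the run-time cut-off idea mentioned above; the helper name and the threshold are purely illustrative, not from the branch:

  #include <omp.h>

  /* ChooseNumThreads (hypothetical): below some length use one thread, above it
     use them all; the constant would be tuned, or learned at run time. */
  int ChooseNumThreads(long n)
  {
    const long cutoff = 10000;                  /* illustrative value only */
    return (n < cutoff) ? 1 : omp_get_max_threads();
  }

  /* e.g. used as:  #pragma omp parallel for num_threads(ChooseNumThreads(n)) */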
In many cases, I would prefer more control over the partition of the loop. For example, in many cases, I'd be willing to tolerate a slight computational imbalance between threads in exchange for working exclusively within my page. Note that the arithmetic to compute such things is orders of magnitude less expensive than the schedule/distribution to threads. I don't know how to do that except to

  PragmaOMP(parallel)
  {
    int nthreads = omp_get_num_threads();
    int tnum     = omp_get_thread_num();
    int start,end;
    // compute start and end
    for (int i=start; i<end; i++) {
      // the work
    }
  }

We could perhaps capture some of this common logic in a macro:

  #define VecOMPParallelBegin(X,args) do {       \
    PragmaOMPObject(X,parallel args) {           \
    PetscInt _start, _end;                       \
    VecOMPGetThreadLocalPart(X,_start,_end);     \
    { do {} while(0)

  #define VecOMPParallelEnd() }}} while(0)

  VecOMPParallelBegin(X, shared/private ...);
  {
    PetscInt i;
    for (i=_start; i<_end; i++) {
      // the work
    }
  }
  VecOMPParallelEnd();

That should reasonably give us complete run-time control of the number of parallel threads per object and their distribution, within the constraints of contiguous thread partition.

I think what you describe is close to Fig 3 of this paper written by your neighbours: http://greg.bronevetsky.com/papers/2008IWOMP.pdf However, before making the implementation more complex, it would be good to benchmark the current approach and use a tool like likwid to measure the NUMA traffic so we can get a good handle on the costs.
[petsc-dev] OpenMP/Vec
On Mon, Feb 27, 2012 at 4:31 PM, Gerard Gorman g.gorman at imperial.ac.uk wrote:

I'm surprised by this. I'm not aware of any compiler that doesn't have OpenMP support - and if you do not actually enable OpenMP, compilers generally just ignore the pragma. Do you know of any compiler that does not have OpenMP support which will complain?

clang (which is on any mac with xcode >= 4.0) will not compile any OpenMP (but gcc will compile just fine):

  fatal error: 'omp.h' file not found
  #include <omp.h>
           ^
  1 error generated.

It's not in any llvm stuff as of last month: http://www.phoronix.com/scan.php?page=news_item&px=MTA0Mzc

Putting in this layer of abstraction is really needed (especially if there are any calls to omp_ functions)
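A hedged sketch of the sort of guard being asked for, building on the PetscPragmaOMP idea discussed in this thread (PetscNumOMPThreads is a made-up name for illustration): both the header and any omp_*() calls stay behind the configure-time macro, so a compiler without OpenMP, such as that clang, never sees them.

  #if defined(PETSC_HAVE_OPENMP)
  #  include <omp.h>
  #  define PetscNumOMPThreads() omp_get_max_threads()
  #else
  #  define PetscNumOMPThreads() 1   /* no OpenMP: behave as a single thread */
  #endif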
[petsc-dev] OpenMP/Vec
On Mon, Feb 27, 2012 at 16:31, Gerard Gorman g.gorman at imperial.ac.uk wrote:

I had a quick go at trying to get some sensible benchmarks for this but I was getting too much system noise. I am particularly interested in seeing if the overhead goes to zero if num_threads(1) is used.

What timing method did you use? I did not see overhead going to zero when num_threads goes to 1 when using GCC compilers, but Intel seems to do fairly well.

I'm surprised by this. I'm not aware of any compiler that doesn't have OpenMP support - and if you do not actually enable OpenMP, compilers generally just ignore the pragma. Do you know of any compiler that does not have OpenMP support which will complain?

Sean points out that omp.h might not be available, but that misses the point. As far as I know, recent mainstream compilers have enough sense to at least ignore these directives, but I'm sure there are still cases where it would be an issue. More importantly, #pragma was a misfeature that should never be used now that _Pragma() exists. The latter is better not just because it can be turned off, but because it can be manipulated using macros and can be explicitly compiled out.

This may not be flexible enough. You frequently want to have a parallel region, and then have multiple omp for's within that one region.

  PetscPragmaOMPObject(obj, parallel)
  {
    PetscPragmaOMP(whatever you normally write for this loop)
    for () {
    }
    ... and so on
  }

I think what you describe is close to Fig 3 of this paper written by your neighbours: http://greg.bronevetsky.com/papers/2008IWOMP.pdf However, before making the implementation more complex, it would be good to benchmark the current approach and use a tool like likwid to measure the NUMA traffic so we can get a good handle on the costs.

Sure.

Well this is where the implementation details get richer and there are many options - they also become less portable. For example, what does all this mean for the sparc64 processors which are UMA.

Delay to runtime, use an ignorant partition for UMA. (Blue Gene/Q is also essentially uniform.) But note that even with uniform memory, cache still makes it somewhat hierarchical.

Not to mention Intel MIC which also supports OpenMP. I guess I am cautious about getting too bogged down with very invasive optimisations until we have benchmarked the basic approach which in a wide range of use cases will achieve good thread/page locality as illustrated previously.

I guess I'm just interested in exposing enough semantic information to be able to schedule a few different ways using run-time (or, if absolutely necessary, configure-time) options. I don't want to have to revisit individual loops.
[petsc-dev] OpenMP/Vec
On Sun, Feb 26, 2012 at 04:07, Gerard Gorman g.gorman at imperial.ac.uk wrote:

Did you post a repository yet? I'd like to have a look at the code.

It's on Barry's favourite collaborative software development site of course ;-) https://bitbucket.org/wence/petsc-dev-omp/overview

I looked through the code and I'm concerned that all the OpenMP code is inlined into vec/impls/seq and mat/impls/aij/seq with, as far as I can tell, no way to use OpenMP for some objects in a simulation, but not others. I think that all the pragmas should have num_threads(x->nthreads) clauses. We can compute the correct number of threads based on sizes when memory is allocated (or specified through command line options, inherited from related objects, etc).

I don't think we can get away from OpenMP schedule overhead (several hundred cycles) even for those objects that we choose not to use threads for, but (at least with my gcc tests), that overhead is only about a third of the cost of actually starting a parallel region.

It's really not acceptable to insert unguarded #pragma omp ... into the code because this will generate tons of warnings or errors with compilers that don't know about OpenMP. It would be better to test for _Pragma and use

  #define PetscPragmatize(x) _Pragma(#x)
  #if defined(PETSC_USE_OPENMP)
  #  define PetscPragmaOMP(x) PetscPragmatize(omp x)
  #else
  #  define PetscPragmaOMP(x)
  #endif

then use PetscPragmaOMP(parallel for ...) We should probably use a variant for object-based threading

  #define PetscPragmaOMPObject(obj,x) PetscPragmaOMP(x num_threads((obj)->nthreads))

In the case of multiple objects, I think you usually want the object being modified to control the number of threads.

In many cases, I would prefer more control over the partition of the loop. For example, in many cases, I'd be willing to tolerate a slight computational imbalance between threads in exchange for working exclusively within my page. Note that the arithmetic to compute such things is orders of magnitude less expensive than the schedule/distribution to threads. I don't know how to do that except to

  PragmaOMP(parallel)
  {
    int nthreads = omp_get_num_threads();
    int tnum     = omp_get_thread_num();
    int start,end;
    // compute start and end
    for (int i=start; i<end; i++) {
      // the work
    }
  }

We could perhaps capture some of this common logic in a macro:

  #define VecOMPParallelBegin(X,args) do {       \
    PragmaOMPObject(X,parallel args) {           \
    PetscInt _start, _end;                       \
    VecOMPGetThreadLocalPart(X,_start,_end);     \
    { do {} while(0)

  #define VecOMPParallelEnd() }}} while(0)

  VecOMPParallelBegin(X, shared/private ...);
  {
    PetscInt i;
    for (i=_start; i<_end; i++) {
      // the work
    }
  }
  VecOMPParallelEnd();

That should reasonably give us complete run-time control of the number of parallel threads per object and their distribution, within the constraints of contiguous thread partition. That also leaves open the possibility of using libnuma to query and migrate pages. (For example, a short vector that needs to be accessed from multiple NUMA nodes might intentionally be faulted with pages spread apart even though other vectors of similar size might be accessed from within one NUMA node and thus not use threads at all.) (One 4 KiB page is only 512 doubles, but if the memory is local to a single NUMA node, we wouldn't use threads until the vector length was 4 to 8 times larger.)
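For concreteness, a hedged sketch of how a simple kernel might use those proposed macros; the function name is invented for illustration, and _start/_end are the identifiers introduced by VecOMPParallelBegin as above:

  PetscErrorCode VecScale_SeqOMP(Vec xin,PetscScalar alpha)   /* hypothetical */
  {
    PetscScalar    *xx;
    PetscErrorCode ierr;

    PetscFunctionBegin;
    ierr = VecGetArray(xin,&xx);CHKERRQ(ierr);
    VecOMPParallelBegin(xin, default(shared));
    {
      PetscInt i;
      for (i=_start; i<_end; i++) xx[i] *= alpha;   /* each thread scales its own chunk */
    }
    VecOMPParallelEnd();
    ierr = VecRestoreArray(xin,&xx);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }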
[petsc-dev] OpenMP/Vec
Dear Jed,

For multithread support in PETSc, my question is whether KSP and/or PC work when Vec and Mat use multithread mode.

Thanks,
Yujie

On Sun, Feb 26, 2012 at 6:39 PM, Jed Brown jedbrown at mcs.anl.gov wrote: On Sun, Feb 26, 2012 at 04:07, Gerard Gorman g.gorman at imperial.ac.uk wrote:

Did you post a repository yet? I'd like to have a look at the code.

It's on Barry's favourite collaborative software development site of course ;-) https://bitbucket.org/wence/petsc-dev-omp/overview

I looked through the code and I'm concerned that all the OpenMP code is inlined into vec/impls/seq and mat/impls/aij/seq with, as far as I can tell, no way to use OpenMP for some objects in a simulation, but not others. I think that all the pragmas should have num_threads(x->nthreads) clauses. We can compute the correct number of threads based on sizes when memory is allocated (or specified through command line options, inherited from related objects, etc).

I don't think we can get away from OpenMP schedule overhead (several hundred cycles) even for those objects that we choose not to use threads for, but (at least with my gcc tests), that overhead is only about a third of the cost of actually starting a parallel region.

It's really not acceptable to insert unguarded #pragma omp ... into the code because this will generate tons of warnings or errors with compilers that don't know about OpenMP. It would be better to test for _Pragma and use

  #define PetscPragmatize(x) _Pragma(#x)
  #if defined(PETSC_USE_OPENMP)
  #  define PetscPragmaOMP(x) PetscPragmatize(omp x)
  #else
  #  define PetscPragmaOMP(x)
  #endif

then use PetscPragmaOMP(parallel for ...) We should probably use a variant for object-based threading

  #define PetscPragmaOMPObject(obj,x) PetscPragmaOMP(x num_threads((obj)->nthreads))

In the case of multiple objects, I think you usually want the object being modified to control the number of threads.

In many cases, I would prefer more control over the partition of the loop. For example, in many cases, I'd be willing to tolerate a slight computational imbalance between threads in exchange for working exclusively within my page. Note that the arithmetic to compute such things is orders of magnitude less expensive than the schedule/distribution to threads. I don't know how to do that except to

  PragmaOMP(parallel)
  {
    int nthreads = omp_get_num_threads();
    int tnum     = omp_get_thread_num();
    int start,end;
    // compute start and end
    for (int i=start; i<end; i++) {
      // the work
    }
  }

We could perhaps capture some of this common logic in a macro:

  #define VecOMPParallelBegin(X,args) do {       \
    PragmaOMPObject(X,parallel args) {           \
    PetscInt _start, _end;                       \
    VecOMPGetThreadLocalPart(X,_start,_end);     \
    { do {} while(0)

  #define VecOMPParallelEnd() }}} while(0)

  VecOMPParallelBegin(X, shared/private ...);
  {
    PetscInt i;
    for (i=_start; i<_end; i++) {
      // the work
    }
  }
  VecOMPParallelEnd();

That should reasonably give us complete run-time control of the number of parallel threads per object and their distribution, within the constraints of contiguous thread partition. That also leaves open the possibility of using libnuma to query and migrate pages. (For example, a short vector that needs to be accessed from multiple NUMA nodes might intentionally be faulted with pages spread apart even though other vectors of similar size might be accessed from within one NUMA node and thus not use threads at all.) (One 4 KiB page is only 512 doubles, but if the memory is local to a single NUMA node, we wouldn't use threads until the vector length was 4 to 8 times larger.)
[petsc-dev] OpenMP/Vec
Hi

I have been running benchmarks on the OpenMP branch of petsc-dev on an Intel Westmere (Intel(R) Xeon(R) CPU X5670 @ 2.93GHz). You can see all graphs + test code + code to generate results in the tar ball linked below and I am just going to give a quick summary here. http://amcg.ese.ic.ac.uk/~ggorman/omp_vec_benchmarks.tar.gz

There are 3 sets of results:
  gcc/           : GCC 4.6
  intel/         : Intel 12.0 with MKL
  intel-pinning/ : as above but applying hard affinity.

Files matching mpi_*.pdf show the MPI speedup and parallel efficiency for a range of vector sizes. Similarly for omp_*.pdf with respect to OpenMP. The remaining files directly compare scaling of MPI vs OpenMP for the various tests for the largest vector size.

I think the results are very encouraging and there are many interesting little details in there. I am just going to summarise a few here that I think are particularly important.

1. In most cases the threaded code performs as well as, and in many cases better than, the MPI code.

2. For GCC I did not use a threaded blas. For Intel I used -lmkl_intel_thread. However, it appears dnrm2 is not threaded. It seems to be a common feature among other threaded blas libraries that Level 1 is not completely threaded (e.g. cray). Unfortunately most of this is experience/anecdotal information. I do not know of any proper survey. We have the option here of either rolling our own or ignoring the issue until profiling shows it is a problem...and eventually someone else will release a fully threaded blas.

3. Comparing intel/ and intel-pinning/ is particularly interesting. First touch has been applied to all memory in VecCreate so that memory should be paged correctly for NUMA. But first touch does not gain you much if threads migrate, so for the intel-pinning/ results I set the env KMP_AFFINITY=scatter to get hard affinity. You can see clearly from the results that this improves parallel efficiency by a few percentage points in many cases. It also really smooths out efficiency dips as you run on different numbers of threads.

Full blown benchmarks would not make a lot of sense until we get the Mat classes threaded in a similar fashion. However, at this point I would like feedback on the direction this is taking and if we can start getting code committed.

Cheers
Gerard
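For reference, the first-touch initialisation mentioned in point 3 is usually along the lines of the sketch below (purely illustrative, not the branch's VecCreate code): the pages are faulted in by the threads that will later operate on them, using the same static partition as the compute loops, so each page lands on the right NUMA node.

  #include <petscsys.h>

  /* Hedged sketch of NUMA-aware first touch for a freshly allocated array. */
  static void FirstTouch(PetscScalar *a, PetscInt n)
  {
    PetscInt i;
  #pragma omp parallel for schedule(static)
    for (i = 0; i < n; i++) a[i] = 0.0;   /* the thread that touches a page owns it */
  }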
[petsc-dev] OpenMP support
Matthew Knepley emailed the following on 14/02/12 00:34:

As a first step - can we add OpenMP support to PETSc conf? Lawrence made a first pass at this: https://bitbucket.org/wence/petsc-dev-omp/src/52afd5fd2c25/config/PETSc/packages/openmp.py It does need extending because it will fail in its current state for a number of compilers. I am guessing we would have to reimplement something like ax_openmp or similar...what is the right thing to do here?

Can you explain the right test?

The definitive test would be to try to compile a test code, e.g.

  int main(){
  #ifndef _OPENMP
    choke me
  #endif
    return 0;
  }

The OpenMP standard specifies that the preprocessor macro name _OPENMP should be set. As for the actual compile flag ... reading FindOpenMP.cmake you can see that they take the test compile approach as above, and they have a list of candidates (OpenMP_C_FLAG_CANDIDATES) which are tested. Another option would be to allow --with-openmp[=magic_flag] for the odd compiler not recognised. For example I notice that the Fujitsu compiler is not listed in FindOpenMP.cmake.

Cheers
Gerard
[petsc-dev] OpenMP support
Hi

I have been working with Lawrence Mitchell and Michele Weiland at EPCC to add OpenMP support to the mat/vec classes and we are at the stage that we would like to give other people a chance to play with it. Lawrence put a branch on bitbucket if you want to browse: https://bitbucket.org/wence/petsc-dev-omp/overview

I think that there is a lot in there that needs discussion(/modification), so I propose that we try to break this into a number of discussions/steps (also I do not want to write a several page email).

I am making the assumption here that OpenMP is interesting enough to want to put in PETSc. Other than the garden variety multicore, it can also be used on Intel MIC. Does this need further discussion?

As a first step - can we add OpenMP support to PETSc conf? Lawrence made a first pass at this: https://bitbucket.org/wence/petsc-dev-omp/src/52afd5fd2c25/config/PETSc/packages/openmp.py It does need extending because it will fail in its current state for a number of compilers. I am guessing we would have to reimplement something like ax_openmp or similar...what is the right thing to do here?

Regarding cmake - I only recently learned cmake myself and I have been using: FIND_PACKAGE(OpenMP) (i.e. cmake does all the heavy lifting in FindOpenMP.cmake which it provides). However, as PETSc does not appear to use the find_package feature I am not sure what approach should be adopted for PETSc. Suggestions?

Cheers
Gerard
[petsc-dev] OpenMP support
On Mon, Feb 13, 2012 at 9:28 AM, Gerard Gorman g.gorman at imperial.ac.uk wrote:

> Hi
>
> I have been working with Lawrence Mitchell and Michele Weiland at EPCC to add OpenMP support to the mat/vec classes and we are at the stage that we would like to give other people a chance to play with it. Lawrence put a branch on bitbucket if you want to browse: https://bitbucket.org/wence/petsc-dev-omp/overview
>
> I think that there is a lot in there that needs discussion(/modification), so I propose that we try to break this into a number of discussions/steps (also I do not want to write a several page email).
>
> I am making the assumption here that OpenMP is interesting enough to want to put in PETSc. Other than the garden variety multicore, it can also be used on Intel MIC. Does this need further discussion?
>
> As a first step - can we add OpenMP support to PETSc conf? Lawrence made a first pass at this: https://bitbucket.org/wence/petsc-dev-omp/src/52afd5fd2c25/config/PETSc/packages/openmp.py It does need extending because it will fail in its current state for a number of compilers. I am guessing we would have to reimplement something like ax_openmp or similar...what is the right thing to do here?

Can you explain the right test?

> Regarding cmake - I only recently learned cmake myself and I have been using: FIND_PACKAGE(OpenMP) (i.e. cmake does all the heavy lifting in FindOpenMP.cmake which it provides). However, as PETSc does not appear to use the find_package feature I am not sure what approach should be adopted for PETSc. Suggestions?

If the configure works right, there is nothing left to do for CMake.

   Matt

> Cheers
> Gerard

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener