OK, I thought it was a compile-time option.

I can't answer which option is best, but if I interpret Jed correctly, we
should build OpenBLAS single-threaded.
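
For reference, a minimal sketch of what that build could look like
(untested; USE_THREAD=0 is the flag from the OpenBLAS docs, the install
prefix is just an example):

    # build OpenBLAS with threading disabled, then install it
    make USE_THREAD=0
    make PREFIX=/opt/openblas-serial install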

--
Anders


On Tue, Mar 31, 2015 at 8:41 PM, Johannes Ring <[email protected]> wrote:

> On Tue, Mar 31, 2015 at 8:35 PM, Anders Logg <[email protected]>
> wrote:
> >
> >
> > On Tue, Mar 31, 2015 at 8:22 PM, Johannes Ring <[email protected]> wrote:
> >>
> >> On Tue, Mar 31, 2015 at 7:03 PM, Jan Blechta <[email protected]>
> >> wrote:
> >> > It is OpenBLAS.
> >> >
> >> > Citing from https://github.com/xianyi/OpenBLAS/wiki/faq#multi-threaded
> >> >
> >> >   If your application is already multi-threaded, it will conflict with
> >> >   OpenBLAS multi-threading.
> >>
> >> Good find! With OPENBLAS_NUM_THREADS=1 (or OMP_NUM_THREADS=1) set, the
> >> Cahn-Hilliard demo takes 35s with 1.6.0dev, while it took 1m7s without
> >> this environment variable.
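> >>
> >> For example, as a one-off run from the demo directory (no rebuild
> >> needed):
> >>
> >>     OPENBLAS_NUM_THREADS=1 DOLFIN_NOPLOT=1 python demo_cahn-hilliard.py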
> >>
> >> Johannes
> >
> >
> > Great! Can you push this fix? Then I can try here as well.
>
> This is a runtime variable. We could add it to the config file, or
> should we build the single-threaded version of OpenBLAS (using
> make USE_THREAD=0)?
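>
> If we go the config-file route, a minimal sketch (assuming the config
> file is a shell script that gets sourced):
>
>     # disable OpenBLAS threading at runtime
>     export OPENBLAS_NUM_THREADS=1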
>
> Johannes
>
> > --
> > Anders
> >
> >
> >> > We must be careful about this. (Are the PETSc devs aware? CCing Jed
> >> > Brown.) It should be taken into account in HashDist, and possibly we
> >> > could check for it during DOLFIN configure. It may also have something
> >> > to do with issues #326 and #491.
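> >> >
> >> > As a rough configure-time check, something like this might work
> >> > (untested; it assumes the OpenBLAS in the loader path exports
> >> > openblas_get_num_threads, which recent versions do; a value greater
> >> > than 1 would indicate a threaded build):
> >> >
> >> >     python -c "import ctypes; \
> >> >       print(ctypes.CDLL('libopenblas.so').openblas_get_num_threads())"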
> >> >
> >> > Jan
> >> >
> >> >
> >> > On Tue, 31 Mar 2015 17:47:15 +0200
> >> > Jan Blechta <[email protected]> wrote:
> >> >
> >> >> Together with Jaroslav Hron, we have found here that a HashDist
> >> >> build (about 2 months old) of 1.5 spawns processes/threads and steals
> >> >> all the (hyper-threading) cores on the machine. Can you confirm this,
> >> >> Anders?
> >> >>
> >> >> I'm not sure which piece of software is doing this. Any guesses?
> >> >> I'd like to know which software is being so cheeky.
> >> >>
> >> >>     time OMP_NUM_THREADS=1 DOLFIN_NOPLOT=1 python demo_cahn-hilliard.py
> >> >>
> >> >> produces much more satisfactory timings.
> >> >>
> >> >> Jan
> >> >>
> >> >>
> >> >> On Tue, 31 Mar 2015 15:09:24 +0000
> >> >> Anders Logg <[email protected]> wrote:
> >> >>
> >> >> > Hmm... So what conclusions should one draw?
> >> >> >
> >> >> > - Major difference lies in PETSc LU solver
> >> >> >
> >> >> > - 1.6dev looks faster than 1.5 for Johannes
> >> >> >
> >> >> > - assemble_cells twice as fast in the Debian package as in the
> >> >> > HashDist build
> >> >> >
> >> >> > - Apply (PETScVector) happens a lot more than it used to
> >> >> >
> >> >> > - Init tensor, build sparsity, delete sparsity happens a lot less
> >> >> >
> >> >> > Important questions:
> >> >> >
> >> >> > - Are the Debian packages built with more optimization than the
> >> >> > HashDist build uses? (indicated by faster assemble_cells for the
> >> >> > Debian version)
> >> >> >
> >> >> > - How can the PETSc LU solve timings change? Are different PETSc
> >> >> > versions being used, or is PETSc built differently?
> >> >> >
> >> >> > --
> >> >> > Anders
> >> >> >
> >> >> >
> >> >> > On Tue, Mar 31, 2015 at 10:25 AM, Johannes Ring <[email protected]> wrote:
> >> >> >
> >> >> > > Here are my numbers (see attachment).
> >> >> > >
> >> >> > > Johannes
> >> >> > >
> >> >> > > On Tue, Mar 31, 2015 at 9:46 AM, Garth N. Wells <[email protected]>
> >> >> > > wrote:
> >> >> > > > FEniCS 1.4 package (Ubuntu 14.10)
> >> >> > > >
> >> >> > > > Summary of timings                                       | Average time  Total time  Reps
> >> >> > > > ------------------------------------------------------------------------------------------
> >> >> > > > Apply (PETScMatrix)                                      | 0.00033009    0.079882     242
> >> >> > > > Apply (PETScVector)                                      | 6.9951e-06    0.005806     830
> >> >> > > > Assemble cells                                           | 0.017927      9.5731       534
> >> >> > > > Boost Cuthill-McKee graph ordering (from dolfin::Graph)  | 9.5844e-05    9.5844e-05     1
> >> >> > > > Build Boost CSR graph                                    | 7.7009e-05    7.7009e-05     1
> >> >> > > > Build mesh number mesh entities                          | 0             0              2
> >> >> > > > Build sparsity                                           | 0.0041105     0.0082209      2
> >> >> > > > Delete sparsity                                          | 1.0729e-06    2.1458e-06     2
> >> >> > > > Init MPI                                                 | 0.055825      0.055825       1
> >> >> > > > Init PETSc                                               | 0.056171      0.056171       1
> >> >> > > > Init dof vector                                          | 0.00018656    0.00037313     2
> >> >> > > > Init dofmap                                              | 0.0064399     0.0064399      1
> >> >> > > > Init dofmap from UFC dofmap                              | 0.0017549     0.0035098      2
> >> >> > > > Init tensor                                              | 0.0002135     0.00042701     2
> >> >> > > > LU solver                                                | 0.11543       27.933       242
> >> >> > > > PETSc LU solver                                          | 0.1154        27.926       242
> >> >> > > >
> >> >> > > >
> >> >> > > >
> >> >> > > > FEniCS dev (my build, using PETSc dev)
> >> >> > > >
> >> >> > > > [MPI_AVG] Summary of timings     |  reps    wall avg    wall tot
> >> >> > > > ----------------------------------------------------------------
> >> >> > > > Apply (PETScMatrix)              |   242  0.00020009    0.048421
> >> >> > > > Apply (PETScVector)              |   830  8.5487e-06   0.0070954
> >> >> > > > Assemble cells                   |   534    0.017001      9.0787
> >> >> > > > Build mesh number mesh entities  |     1    7.35e-07    7.35e-07
> >> >> > > > Build sparsity                   |     2   0.0068867    0.013773
> >> >> > > > Delete sparsity                  |     2    9.88e-07   1.976e-06
> >> >> > > > Init MPI                         |     1   0.0023164   0.0023164
> >> >> > > > Init PETSc                       |     1    0.002519    0.002519
> >> >> > > > Init dof vector                  |     2  0.00016088  0.00032177
> >> >> > > > Init dofmap                      |     1     0.04457     0.04457
> >> >> > > > Init dofmap from UFC dofmap      |     1   0.0035997   0.0035997
> >> >> > > > Init tensor                      |     2  0.00034076  0.00068153
> >> >> > > > LU solver                        |   242    0.097293      23.545
> >> >> > > > PETSc LU solver                  |   242    0.097255      23.536
> >> >> > > > SCOTCH graph ordering            |     1   0.0005598   0.0005598
> >> >> > > > compute connectivity 1 - 2       |     1  0.00088592  0.00088592
> >> >> > > > compute entities dim = 1         |     1    0.028021    0.028021
> >> >> > > >
> >> >> > > > Garth
> >> >> > > >
> >> >> > > >
> >> >> > > > On Mon, Mar 30, 2015 at 11:37 PM, Jan Blechta
> >> >> > > > <[email protected]> wrote:
> >> >> > > >> Could you guys run it with
> >> >> > > >>
> >> >> > > >>   list_timings()
> >> >> > > >>
> >> >> > > >> to get a detailed breakdown of where the time is spent?
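> >> >> > > >>
> >> >> > > >> E.g., for the Cahn-Hilliard demo (assuming the demo does
> >> >> > > >> "from dolfin import *", so the function is already in scope):
> >> >> > > >>
> >> >> > > >>     echo "list_timings()" >> demo_cahn-hilliard.py
> >> >> > > >>     DOLFIN_NOPLOT=1 python demo_cahn-hilliard.py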
> >> >> > > >>
> >> >> > > >> Jan
> >> >> > > >>
> >> >> > > >>
> >> >> > > >> On Mon, 30 Mar 2015 23:21:41 +0200
> >> >> > > >> Johannes Ring <[email protected]> wrote:
> >> >> > > >>
> >> >> > > >>> On Mon, Mar 30, 2015 at 8:37 PM, Anders Logg
> >> >> > > >>> <[email protected]> wrote:
> >> >> > > >>> > Could you or someone else build FEniCS with
> >> >> > > >>> > fenics-install.sh (takes time but is presumably automatic)
> >> >> > > >>> > and compare?
> >> >> > > >>>
> >> >> > > >>> I got 53s with the Debian packages and 1m5s with the
> >> >> > > >>> HashDist-based installation.
> >> >> > > >>>
> >> >> > > >>> > The alternative would be for me to build FEniCS manually,
> >> >> > > >>> > but that takes a lot of effort and it's not clear I can
> >> >> > > >>> > make a "good" build. It would be good to get a number, not
> >> >> > > >>> > only to check for a possible regression but also to test
> >> >> > > >>> > whether something is suboptimal in the HashDist build.
> >> >> > > >>> >
> >> >> > > >>> > Johannes, is the HashDist build compiled with optimization?
> >> >> > > >>>
> >> >> > > >>> DOLFIN is built with CMAKE_BUILD_TYPE=Release. The flags for
> >> >> > > >>> building PETSc are listed below.
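> >> >> > > >>>
> >> >> > > >>> (One way to double-check the build type on any build tree:
> >> >> > > >>>
> >> >> > > >>>     grep CMAKE_BUILD_TYPE CMakeCache.txt
> >> >> > > >>>
> >> >> > > >>> run from the DOLFIN build directory.)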
> >> >> > > >>>
> >> >> > > >>> Johannes
> >> >> > > >>>
> >> >> > > >>> PETSc flags for Debian package:
> >> >> > > >>>
> >> >> > > >>> PETSC_DIR=/tmp/src/petsc-3.4.2.dfsg1 PETSC_ARCH=linux-gnu-c-opt \
> >> >> > > >>>   ./config/configure.py \
> >> >> > > >>>   --with-shared-libraries --with-debugging=0 \
> >> >> > > >>>   --useThreads 0 --with-clanguage=C++ --with-c-support \
> >> >> > > >>>   --with-fortran-interfaces=1 \
> >> >> > > >>>   --with-mpi-dir=/usr/lib/openmpi --with-mpi-shared=1 \
> >> >> > > >>>   --with-blas-lib=-lblas --with-lapack-lib=-llapack \
> >> >> > > >>>   --with-blacs=1 --with-blacs-include=/usr/include \
> >> >> > > >>>   --with-blacs-lib=[/usr/lib/libblacsCinit-openmpi.so,/usr/lib/libblacs-openmpi.so] \
> >> >> > > >>>   --with-scalapack=1 --with-scalapack-include=/usr/include \
> >> >> > > >>>   --with-scalapack-lib=/usr/lib/libscalapack-openmpi.so \
> >> >> > > >>>   --with-mumps=1 --with-mumps-include=/usr/include \
> >> >> > > >>>   --with-mumps-lib=[/usr/lib/libdmumps.so,/usr/lib/libzmumps.so,/usr/lib/libsmumps.so,/usr/lib/libcmumps.so,/usr/lib/libmumps_common.so,/usr/lib/libpord.so] \
> >> >> > > >>>   --with-umfpack=1 --with-umfpack-include=/usr/include/suitesparse \
> >> >> > > >>>   --with-umfpack-lib=[/usr/lib/libumfpack.so,/usr/lib/libamd.so] \
> >> >> > > >>>   --with-cholmod=1 --with-cholmod-include=/usr/include/suitesparse \
> >> >> > > >>>   --with-cholmod-lib=/usr/lib/libcholmod.so \
> >> >> > > >>>   --with-spooles=1 --with-spooles-include=/usr/include/spooles \
> >> >> > > >>>   --with-spooles-lib=/usr/lib/libspooles.so \
> >> >> > > >>>   --with-hypre=1 --with-hypre-dir=/usr \
> >> >> > > >>>   --with-ptscotch=1 --with-ptscotch-include=/usr/include/scotch \
> >> >> > > >>>   --with-ptscotch-lib=[/usr/lib/libptesmumps.so,/usr/lib/libptscotch.so,/usr/lib/libptscotcherr.so] \
> >> >> > > >>>   --with-fftw=1 --with-fftw-include=/usr/include \
> >> >> > > >>>   --with-fftw-lib=[/usr/lib/x86_64-linux-gnu/libfftw3.so,/usr/lib/x86_64-linux-gnu/libfftw3_mpi.so] \
> >> >> > > >>>   --with-hdf5=1 --with-hdf5-dir=/usr/lib/x86_64-linux-gnu/hdf5/openmpi \
> >> >> > > >>>   --CXX_LINKER_FLAGS="-Wl,--no-as-needed"
> >> >> > > >>>
> >> >> > > >>>
> >> >> > > >>> PETSc flags for HashDist-based build:
> >> >> > > >>>
> >> >> > > >>> mkdir ${PWD}/_tmp && TMPDIR=${PWD}/_tmp \
> >> >> > > >>>   ./configure --prefix="${ARTIFACT}" \
> >> >> > > >>>   COPTFLAGS=-O2 \
> >> >> > > >>>   --with-shared-libraries=1 \
> >> >> > > >>>   --with-debugging=0 \
> >> >> > > >>>   --with-ssl=0 \
> >> >> > > >>>   --with-blas-lapack-lib=${OPENBLAS_DIR}/lib/libopenblas.so \
> >> >> > > >>>   --with-metis-dir=$PARMETIS_DIR \
> >> >> > > >>>   --with-parmetis-dir=$PARMETIS_DIR \
> >> >> > > >>>   --with-scotch-dir=${SCOTCH_DIR} \
> >> >> > > >>>   --with-ptscotch-dir=${SCOTCH_DIR} \
> >> >> > > >>>   --with-suitesparse=1 \
> >> >> > > >>>   --with-suitesparse-include=${SUITESPARSE_DIR}/include/suitesparse \
> >> >> > > >>>   --with-suitesparse-lib=[${SUITESPARSE_DIR}/lib/libumfpack.a,libklu.a,libcholmod.a,libbtf.a,libccolamd.a,libcolamd.a,libcamd.a,libamd.a,libsuitesparseconfig.a] \
> >> >> > > >>>   --with-hypre=1 \
> >> >> > > >>>   --with-hypre-include=${HYPRE_DIR}/include \
> >> >> > > >>>   --with-hypre-lib=${HYPRE_DIR}/lib/libHYPRE.so \
> >> >> > > >>>   --with-mpi-compilers \
> >> >> > > >>>   CC=$MPICC \
> >> >> > > >>>   CXX=$MPICXX \
> >> >> > > >>>   F77=$MPIF77 \
> >> >> > > >>>   F90=$MPIF90 \
> >> >> > > >>>   FC=$MPIF90 \
> >> >> > > >>>   --with-patchelf-dir=$PATCHELF_DIR \
> >> >> > > >>>   --with-python-dir=$PYTHON_DIR \
> >> >> > > >>>   --with-superlu_dist-dir=$SUPERLU_DIST_DIR \
> >> >> > > >>>   --download-mumps=1 \
> >> >> > > >>>   --download-scalapack=1 \
> >> >> > > >>>   --download-blacs=1 \
> >> >> > > >>>   --download-ml=1
> >> >> > > >>>
> >> >> > > >>>
> >> >> > > >>> > --
> >> >> > > >>> > Anders
> >> >> > > >>> >
> >> >> > > >>> >
> >> >> > > >>> > On Mon, Mar 30, 2015 at 5:05 PM, Garth N. Wells
> >> >> > > >>> > <[email protected]> wrote:
> >> >> > > >>> >>
> >> >> > > >>> >> On Mon, Mar 30, 2015 at 1:34 PM, Anders Logg
> >> >> > > >>> >> <[email protected]> wrote:
> >> >> > > >>> >> > See this question on the QA forum:
> >> >> > > >>> >> >
> >> >> > > >>> >> > http://fenicsproject.org/qa/6875/ubuntu-compile-from-source-which-provide-better-performance
> >> >> > > >>> >> >
> >> >> > > >>> >> > The Cahn-Hilliard demo takes 40 seconds with the 1.3 Ubuntu
> >> >> > > >>> >> > packages and 52 seconds with 1.5+ built from source. Are
> >> >> > > >>> >> > these regressions in performance, or is Johannes that much
> >> >> > > >>> >> > better at building Debian packages than I am at building
> >> >> > > >>> >> > FEniCS (with HashDist)?
> >> >> > > >>> >> >
> >> >> > > >>> >>
> >> >> > > >>> >> With the 1.4 Ubuntu package (Ubuntu 14.10), I get 42s. With
> >> >> > > >>> >> my build of the dev version (I don't use HashDist) I get
> >> >> > > >>> >> 34s.
> >> >> > > >>> >>
> >> >> > > >>> >> Garth
> >> >> > > >>> >>
> >> >> > > >>> >> > PS: Looking at the benchbot, there seem to have been some
> >> >> > > >>> >> > regressions in the timing facilities with the recent
> >> >> > > >>> >> > changes:
> >> >> > > >>> >> >
> >> >> > > >>> >> > http://fenicsproject.org/benchbot/
> >> >> > > >>> >> >
> >> >> > > >>> >> > --
> >> >> > > >>> >> > Anders
> >> >> > > >>> >> >
> >> >> > > >>> >> >