Dear all,

I think parmetis is not involved since I still run out of memory if I use
the following options:
export opts='-st_type sinvert -st_ksp_type preonly -st_pc_type lu
-st_pc_factor_mat_solver_type superlu_dist -eps_true_residual 1'
and issuing:
mpiexec -n 24 ./ex7 -f1 A.petsc -f2 B.petsc -eps_nev 1 -eps_target
-4.008e-3+1.57142i $opts -eps_target_magnitude -eps_tol 1e-14 -memory_view
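
(For what it's worth, one could presumably make this test even more explicit by forcing a METIS-free column permutation in superlu_dist, e.g. adding something like

-mat_superlu_dist_colperm MMD_AT_PLUS_A

to $opts; I am assuming MMD_AT_PLUS_A is an accepted value for that option. The memory behaviour would then be independent of any (Par)METIS ordering at all.)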

Bottom line: the memory usage of petsc-3.9.4 / slepc-3.9.2 is much lower
than that of the current version. I can only solve relatively small problems
with the 3.12 series :(
I have an example with smaller matrices that will likely fail on a 32 GB RAM
machine with petsc-3.12 but runs just fine with petsc-3.9. The -memory_view
output is:

with petsc-3.9.4: (log 'justfine.log' attached)

Summary of Memory Usage in PETSc
Maximum (over computational time) process memory:        total 1.6665e+10 max 7.5674e+08 min 6.4215e+08
Current process memory:                                  total 1.5841e+10 max 7.2881e+08 min 6.0905e+08
Maximum (over computational time) space PetscMalloc()ed: total 3.1290e+09 max 1.5868e+08 min 1.0179e+08
Current space PetscMalloc()ed:                           total 1.8808e+06 max 7.8368e+04 min 7.8368e+04


with petsc-3.12.2: (log 'toobig.log' attached)

Summary of Memory Usage in PETSc
Maximum (over computational time) process memory:        total 3.1564e+10 max 1.3662e+09 min 1.2604e+09
Current process memory:                                  total 3.0355e+10 max 1.3082e+09 min 1.2254e+09
Maximum (over computational time) space PetscMalloc()ed: total 2.7618e+09 max 1.4339e+08 min 8.6493e+07
Current space PetscMalloc()ed:                           total 3.6127e+06 max 1.5053e+05 min 1.5053e+05

Strangely, when monitoring with 'top' I see an *appreciably higher* peak
memory use, usually about twice what -memory_view ends up reporting, both for
petsc-3.9.4 and the current version. The program usually fails at this peak
if not enough RAM is available.
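
In case it helps to narrow down where that extra resident memory lives, here is a minimal sketch (my own, untested) of a helper that could be called from ex7.c around EPSSolve() to print each rank's peak resident set size next to the PetscMalloc high-water mark. It only uses the standard PetscMemory*/PetscMalloc* query routines, and it assumes PetscMemorySetGetMaximumUsage() is called right after SlepcInitialize() so that the RSS maximum is tracked:

#include <petscsys.h>

/* Sketch only: report per-rank memory, to compare against what 'top' shows.
   Assumes PetscMemorySetGetMaximumUsage() was called right after
   SlepcInitialize()/PetscInitialize() so the peak RSS is recorded. */
static PetscErrorCode ReportMemory(MPI_Comm comm)
{
  PetscErrorCode ierr;
  PetscMPIInt    rank;
  PetscLogDouble rss_now,rss_max,mal_now,mal_max;

  PetscFunctionBeginUser;
  ierr = MPI_Comm_rank(comm,&rank);CHKERRQ(ierr);
  ierr = PetscMemoryGetCurrentUsage(&rss_now);CHKERRQ(ierr);   /* current resident set size */
  ierr = PetscMemoryGetMaximumUsage(&rss_max);CHKERRQ(ierr);   /* peak resident set size    */
  ierr = PetscMallocGetCurrentUsage(&mal_now);CHKERRQ(ierr);   /* bytes currently PetscMalloc()ed */
  ierr = PetscMallocGetMaximumUsage(&mal_max);CHKERRQ(ierr);   /* peak bytes PetscMalloc()ed      */
  ierr = PetscSynchronizedPrintf(comm,"[%d] RSS: now %g max %g   PetscMalloc: now %g max %g (bytes)\n",
                                 rank,rss_now,rss_max,mal_now,mal_max);CHKERRQ(ierr);
  ierr = PetscSynchronizedFlush(comm,PETSC_STDOUT);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

The gap between the RSS numbers and the PetscMalloc numbers should be roughly the memory allocated internally by SuperLU_DIST (and MPI), which -memory_view does not count as PetscMalloc()ed space; and since PETSc can only sample the resident set size at the points where it is asked, transient peaks inside the factorization may well be missed, which could explain why 'top' reports roughly twice as much.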

The matrices for the example quoted above can be downloaded here (I use
SLEPc's tutorial ex7.c to solve the problem):
https://www.dropbox.com/s/as9bec9iurjra6r/A.petsc?dl=0  (about 600 MB)
https://www.dropbox.com/s/u2bbmng23rp8l91/B.petsc?dl=0  (about 210 MB)

I haven't been able to use a debugger successfully since I am on a compute
node without the possibility of an xterm ... note that I have no experience
using a debugger, so any help on that will also be appreciated! (A rough
sketch of one way to do it without an xterm is included below.) I hope I can
switch to the current petsc/slepc version for my production runs soon...
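
For reference, here is a rough sketch of the two ways I understand this is usually done on a compute node without an xterm (the exact PETSc option names below are my assumption for this release, so please correct me):

# (a) Let PETSc start gdb in the current terminal, on rank 0 only
#     (option names assumed: -start_in_debugger noxterm, -debugger_nodes 0):
mpiexec -n 24 ./ex7 -f1 A.petsc -f2 B.petsc $opts -start_in_debugger noxterm -debugger_nodes 0

# (b) Attach gdb by hand to one of the running ranks: find the PID of an
#     ex7 process with 'top' or 'ps aux | grep ex7', then
gdb -p <PID>
(gdb) continue      # let it run until it crashes or runs out of memory
(gdb) backtrace     # print the stack trace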

Thanks again!
Santiago



On Thu, Jan 9, 2020 at 4:25 PM Stefano Zampini <stefano.zamp...@gmail.com>
wrote:

> Can you reproduce the issue with smaller matrices? Or with a debug build
> (i.e. using --with-debugging=1 and compilation flags -O2 -g)?
>
> The only changes in parmetis between the two PETSc releases are these
> below, but I don’t see how they could cause issues
>
> kl-18448:pkg-parmetis szampini$ git log -2
> commit ab4fedc6db1f2e3b506be136e3710fcf89ce16ea (HEAD -> master, tag:
> v4.0.3-p5, origin/master, origin/dalcinl/random, origin/HEAD)
> Author: Lisandro Dalcin <dalc...@gmail.com>
> Date:   Thu May 9 18:44:10 2019 +0300
>
>     GKLib: Make FPRFX##randInRange() portable for 32bit/64bit indices
>
> commit 2b4afc79a79ef063f369c43da2617fdb64746dd7
> Author: Lisandro Dalcin <dalc...@gmail.com>
> Date:   Sat May 4 17:22:19 2019 +0300
>
>     GKlib: Use gk_randint32() to define the RandomInRange() macro
>
>
>
> On Jan 9, 2020, at 4:31 AM, Smith, Barry F. via petsc-users <
> petsc-users@mcs.anl.gov> wrote:
>
>
>  This is extremely worrisome:
>
> ==23361== Use of uninitialised value of size 8
> ==23361==    at 0x847E939: gk_randint64 (random.c:99)
> ==23361==    by 0x847EF88: gk_randint32 (random.c:128)
> ==23361==    by 0x81EBF0B: libparmetis__Match_Global (in
> /space/hpc-home/trianas/petsc-3.12.3/arch-linux2-c-debug/lib/libparmetis.so)
>
> do you get that with PETSc-3.9.4 or only with 3.12.3?
>
>   This may result in Parmetis using non-random numbers and then giving
> back an inappropriate ordering that requires more memory for SuperLU_DIST.
>
>  Suggest looking at the code, or running in the debugger to see what is
> going on there. We use parmetis all the time and don't see this.
>
>  Barry
>
>
>
>
>
>
> On Jan 8, 2020, at 4:34 PM, Santiago Andres Triana <rep...@gmail.com>
> wrote:
>
> Dear Matt, petsc-users:
>
> Finally back after the holidays to try to solve this issue, thanks for
> your patience!
> I compiled the latest petsc (3.12.3) with debugging enabled, and the same
> problem appears: relatively large matrices result in out-of-memory errors.
> This is not the case for petsc-3.9.4; all is fine there.
> This is a non-Hermitian, generalized eigenvalue problem; I generate the A
> and B matrices myself and then use example 7 (from the slepc tutorial at
> $SLEPC_DIR/src/eps/examples/tutorials/ex7.c ) to solve the problem:
>
> mpiexec -n 24 valgrind --tool=memcheck -q --num-callers=20
> --log-file=valgrind.log.%p ./ex7 -malloc off -f1 A.petsc -f2 B.petsc
> -eps_nev 1 -eps_target -2.5e-4+1.56524i -eps_target_magnitude -eps_tol
> 1e-14 $opts
>
> where the $opts variable is:
> export opts='-st_type sinvert -st_ksp_type preonly -st_pc_type lu
> -eps_error_relative ::ascii_info_detail -st_pc_factor_mat_solver_type
> superlu_dist -mat_superlu_dist_iterrefine 1 -mat_superlu_dist_colperm
> PARMETIS -mat_superlu_dist_parsymbfact 1 -eps_converged_reason
> -eps_conv_rel -eps_monitor_conv -eps_true_residual 1'
>
> The output from valgrind (a sample from one processor) and from the program
> are attached.
> If it's of any use, the matrices are here (might need at least 180 GB of
> RAM to solve the problem successfully under petsc-3.9.4):
>
> https://www.dropbox.com/s/as9bec9iurjra6r/A.petsc?dl=0
> https://www.dropbox.com/s/u2bbmng23rp8l91/B.petsc?dl=0
>
> With petsc-3.9.4 and slepc-3.9.2 I can use matrices up to 10 GB (with 240
> GB of RAM), but only up to 3 GB with the latest petsc/slepc.
> Any suggestions, comments or any other help are very much appreciated!
>
> Cheers,
> Santiago
>
>
>
> On Mon, Dec 23, 2019 at 11:19 PM Matthew Knepley <knep...@gmail.com>
> wrote:
> On Mon, Dec 23, 2019 at 3:14 PM Santiago Andres Triana <rep...@gmail.com>
> wrote:
> Dear all,
>
> After upgrading to petsc 3.12.2 my solver program crashes consistently.
> Before the upgrade I was using petsc 3.9.4 with no problems.
>
> My application deals with a complex-valued, generalized eigenvalue
> problem. The matrices involved are relatively large, typically 2 to 10 GB
> in size, which is no problem for petsc 3.9.4.
>
> Are you sure that your indices do not exceed 4B? If they do, you need to
> configure using
>
>  --with-64-bit-indices
>
> Also, it would be nice if you ran with the debugger so we can get a stack
> trace for the SEGV.
>
>  Thanks,
>
>    Matt
>
> However, after the upgrade I can only obtain solutions when the matrices
> are small; the solver crashes when the matrices' size exceeds about 1.5 GB:
>
> [0]PETSC ERROR:
> ------------------------------------------------------------------------
> [0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the
> batch system) has told this process to end
> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> [0]PETSC ERROR: or see
> https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS
> X to find memory corruption errors
> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and
> run
> [0]PETSC ERROR: to get more information on the crash.
>
> and so on for each cpu.
>
>
> I tried using valgrind and this is the typical output:
>
> ==2874== Conditional jump or move depends on uninitialised value(s)
> ==2874==    at 0x4018178: index (in /lib64/ld-2.22.so)
> ==2874==    by 0x400752D: expand_dynamic_string_token (in /lib64/
> ld-2.22.so)
> ==2874==    by 0x4008009: _dl_map_object (in /lib64/ld-2.22.so)
> ==2874==    by 0x40013E4: map_doit (in /lib64/ld-2.22.so)
> ==2874==    by 0x400EA53: _dl_catch_error (in /lib64/ld-2.22.so)
> ==2874==    by 0x4000ABE: do_preload (in /lib64/ld-2.22.so)
> ==2874==    by 0x4000EC0: handle_ld_preload (in /lib64/ld-2.22.so)
> ==2874==    by 0x40034F0: dl_main (in /lib64/ld-2.22.so)
> ==2874==    by 0x4016274: _dl_sysdep_start (in /lib64/ld-2.22.so)
> ==2874==    by 0x4004A99: _dl_start (in /lib64/ld-2.22.so)
> ==2874==    by 0x40011F7: ??? (in /lib64/ld-2.22.so)
> ==2874==    by 0x12: ???
> ==2874==
>
>
> These are my configuration options. Identical for both petsc 3.9.4 and
> 3.12.2:
>
> ./configure --with-scalar-type=complex --download-mumps
> --download-parmetis --download-metis --download-scalapack=1
> --download-fblaslapack=1 --with-debugging=0 --download-superlu_dist=1
> --download-ptscotch=1 CXXOPTFLAGS='-O3 -march=native' FOPTFLAGS='-O3
> -march=native' COPTFLAGS='-O3 -march=native'
>
>
> Thanks in advance for any comments or ideas!
>
> Cheers,
> Santiago
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> <test1.e6034496><valgrind.log.23361>
>
>
>
>
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------



      ##########################################################
      #                                                        #
      #                          WARNING!!!                    #
      #                                                        #
      #   This code was compiled with a debugging option,      #
      #   To get timing results run ./configure                #
      #   using --with-debugging=no, the performance will      #
      #   be generally two or three times faster.              #
      #                                                        #
      ##########################################################


./ex7 on a arch-linux2-c-debug named hpcb-n01 with 24 processors, by trianas Thu Jan  9 21:01:38 2020
Using Petsc Release Version 3.9.4, Sep, 11, 2018 

                         Max       Max/Min        Avg      Total 
Time (sec):           5.941e+01      1.00054   5.939e+01
Objects:              6.300e+01      1.00000   6.300e+01
Flop:                 3.271e+08      1.16325   3.042e+08  7.301e+09
Flop/sec:            5.509e+06      1.16356   5.122e+06  1.229e+08
Memory:               1.587e+08      1.55890              3.129e+09
MPI Messages:         8.300e+01      8.30000   1.917e+01  4.600e+02
MPI Message Lengths:  4.025e+08     27.51161   1.861e+06  8.562e+08
MPI Reductions:       9.990e+02      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 5.9392e+01 100.0%  7.3011e+09 100.0%  4.600e+02 100.0%  1.861e+06      100.0%  9.910e+02  99.2% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------


      ##########################################################
      #                                                        #
      #                          WARNING!!!                    #
      #                                                        #
      #   This code was compiled with a debugging option,      #
      #   To get timing results run ./configure                #
      #   using --with-debugging=no, the performance will      #
      #   be generally two or three times faster.              #
      #                                                        #
      ##########################################################


Event                Count      Time (sec)     Flop                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

BuildTwoSidedF         3 1.0 2.0218e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatMult               22 1.0 1.0256e-01 1.5 1.18e+08 1.6 1.4e+02 3.5e+05 0.0e+00  0 31 30  6  0   0 31 30  6  0 22352
MatSolve              16 1.0 4.4843e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  8  0  0  0  0   8  0  0  0  0     0
MatLUFactorSym         1 1.0 6.2037e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatLUFactorNum         1 1.0 5.2568e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 89  0  0  0  0  89  0  0  0  0     0
MatConvert             1 1.0 2.5325e-02 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin       3 1.0 3.5961e-0111.0 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  1   0  0  0  0  1     0
MatAssemblyEnd         3 1.0 1.8667e-01 1.0 0.00e+00 0.0 1.8e+02 4.3e+04 6.9e+01  0  0 40  1  7   0  0 40  1  7     0
MatGetRow          81840 1.0 1.9601e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetRowIJ            1 1.0 2.6226e-06 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 2.8653e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatLoad                2 1.0 9.1971e-01 1.0 0.00e+00 0.0 2.3e+02 3.5e+06 8.6e+01  2  0 50 94  9   2  0 50 94  9     0
MatAXPY                1 1.0 6.7604e-01 1.0 0.00e+00 0.0 9.2e+01 4.3e+04 3.5e+01  1  0 20  0  4   1  0 20  0  4     0
VecNorm                3 1.0 6.8927e-04 1.5 9.82e+05 1.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  1   0  0  0  0  1 34196
VecCopy               17 1.0 1.1094e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet                22 1.0 2.8210e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY                3 1.0 1.3402e-03 1.2 9.82e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 17588
VecScatterBegin       22 1.0 1.2519e-03 1.1 0.00e+00 0.0 1.4e+02 3.5e+05 0.0e+00  0  0 30  6  0   0  0 30  6  0     0
VecScatterEnd         22 1.0 1.3747e-02 5.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSetRandom           1 1.0 4.6382e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
EPSSetUp               1 1.0 5.3278e+01 1.0 0.00e+00 0.0 9.2e+01 4.3e+04 1.3e+02 90  0 20  0 13  90  0 20  0 13     0
EPSSolve               1 1.0 5.8069e+01 1.0 3.08e+08 1.1 1.8e+02 1.9e+05 8.0e+02 98 95 40  4 80  98 95 40  4 81   120
STSetUp                1 1.0 5.3274e+01 1.0 0.00e+00 0.0 9.2e+01 4.3e+04 8.1e+01 90  0 20  0  8  90  0 20  0  8     0
STApply               16 1.0 4.5754e+00 1.0 6.49e+07 1.5 0.0e+00 0.0e+00 1.6e+01  8 18  0  0  2   8 18  0  0  2   286
STMatSolve            16 1.0 4.5395e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.6e+01  8  0  0  0  2   8  0  0  0  2     0
BVCopy                 1 1.0 2.7299e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  0   0  0  0  0  0     0
BVMultVec             34 1.0 5.2741e-02 1.0 9.92e+07 1.0 0.0e+00 0.0e+00 1.3e+02  0 33  0  0 13   0 33  0  0 13 45137
BVMultInPlace          2 1.0 3.0698e-02 1.1 5.57e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  4351
BVDotVec              32 1.0 5.3395e-02 1.0 9.92e+07 1.0 0.0e+00 0.0e+00 1.6e+02  0 33  0  0 16   0 33  0  0 16 44584
BVOrthogonalizeV      17 1.0 1.0149e-01 1.0 1.88e+08 1.0 0.0e+00 0.0e+00 4.2e+02  0 62  0  0 42   0 62  0  0 42 44437
BVScale               17 1.0 1.3008e-03 1.1 2.78e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0 51338
BVSetRandom            1 1.0 4.6455e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  0   0  0  0  0  0     0
DSSolve                1 1.0 3.4571e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
DSVectors              3 1.0 4.4417e-04 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
DSOther                1 1.0 3.5477e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp               1 1.0 3.8147e-06 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve              16 1.0 4.5385e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.6e+01  8  0  0  0  2   8  0  0  0  2     0
PCSetUp                1 1.0 5.2572e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.8e+01 89  0  0  0  2  89  0  0  0  2     0
PCApply               16 1.0 4.4844e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  8  0  0  0  0   8  0  0  0  0     0
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Viewer     4              3         2520     0.
              Matrix    15             15    125236536     0.
              Vector    22             22     19713856     0.
           Index Set    10             10       995280     0.
         Vec Scatter     4              4         4928     0.
          EPS Solver     1              1         2276     0.
  Spectral Transform     1              1          848     0.
       Basis Vectors     1              1         2168     0.
         PetscRandom     1              1          662     0.
              Region     1              1          672     0.
       Direct Solver     1              1        17440     0.
       Krylov Solver     1              1         1176     0.
      Preconditioner     1              1         1000     0.
========================================================================================================================
Average time to get PetscTime(): 2.38419e-08
Average time for MPI_Barrier(): 5.34058e-06
Average time for zero size MPI_Send(): 2.08616e-06
#PETSc Option Table entries:
-eps_nev 1
-eps_target -4.008e-3+1.57142i
-eps_target_magnitude
-eps_tol 1e-14
-eps_true_residual 1
-f1 A.petsc
-f2 B.petsc
-log_view :justfine.log
-matload_block_size 1
-memory_view
-st_ksp_type preonly
-st_pc_factor_mat_solver_type superlu_dist
-st_pc_type lu
-st_type sinvert
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 16 sizeof(PetscInt) 4
Configure options: --with-scalar-type=complex --download-mumps --download-parmetis --download-metis --download-scalapack=1 --download-fblaslapack=1 --with-debugging=1 --download-superlu_dist=1 --download-ptscotch=1 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 COPTFLAGS=-O3
-----------------------------------------
Libraries compiled on 2020-01-09 07:55:04 on hpca-login 
Machine characteristics: Linux-4.4.114-94.14-default-x86_64-with-SuSE-12-x86_64
Using PETSc directory: /space/hpc-home/trianas/petsc-3.9.4
Using PETSc arch: arch-linux2-c-debug
-----------------------------------------

Using C compiler: mpicc    -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -O3  
Using Fortran compiler: mpif90   -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -O3    
-----------------------------------------

Using include paths: -I/space/hpc-home/trianas/petsc-3.9.4/include -I/space/hpc-home/trianas/petsc-3.9.4/arch-linux2-c-debug/include
-----------------------------------------

Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/space/hpc-home/trianas/petsc-3.9.4/arch-linux2-c-debug/lib -L/space/hpc-home/trianas/petsc-3.9.4/arch-linux2-c-debug/lib -lpetsc -Wl,-rpath,/space/hpc-home/trianas/petsc-3.9.4/arch-linux2-c-debug/lib -L/space/hpc-home/trianas/petsc-3.9.4/arch-linux2-c-debug/lib -Wl,-rpath,/space/hpc-apps/bira/18/mpich/lib64 -L/space/hpc-apps/bira/18/mpich/lib64 -Wl,-rpath,/space/hpc-apps/bira/18/base/lib64 -L/space/hpc-apps/bira/18/base/lib64 -Wl,-rpath,/space/hpc-apps/bira/18/py36/lib64 -L/space/hpc-apps/bira/18/py36/lib64 -Wl,-rpath,/space/hpc-apps/bira/18/base/lib64/gcc/x86_64-pc-linux-gnu/7.2.0 -L/space/hpc-apps/bira/18/base/lib64/gcc/x86_64-pc-linux-gnu/7.2.0 -Wl,-rpath,/space/hpc-apps/obs/flex-2.6.4/lib -L/space/hpc-apps/obs/flex-2.6.4/lib -Wl,-rpath,/space/hpc-apps/obs/bison-3.1/lib -L/space/hpc-apps/obs/bison-3.1/lib -Wl,-rpath,/space/hpc-apps/obs/valgrind-3.11.0/lib -L/space/hpc-apps/obs/valgrind-3.11.0/lib -Wl,-rpath,/sw/sdev/intel/parallel_studio_xe_2015_update_3-pguyan/composer_xe_2015.3.187/mkl/lib/intel64 -L/sw/sdev/intel/parallel_studio_xe_2015_update_3-pguyan/composer_xe_2015.3.187/mkl/lib/intel64 -Wl,-rpath,/sw/sdev/intel/parallel_studio_xe_2015_update_3-pguyan/composer_xe_2015.3.187/mkl/lib/mic -L/sw/sdev/intel/parallel_studio_xe_2015_update_3-pguyan/composer_xe_2015.3.187/mkl/lib/mic -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lsuperlu_dist -lflapack -lfblas -lparmetis -lmetis -lptesmumps -lptscotch -lptscotcherr -lesmumps -lscotch -lscotcherr -lm -lX11 -lpthread -lstdc++ -ldl -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lrt -lm -lpthread -lz -lstdc++ -ldl
-----------------------------------------



      ##########################################################
      #                                                        #
      #                          WARNING!!!                    #
      #                                                        #
      #   This code was compiled with a debugging option,      #
      #   To get timing results run ./configure                #
      #   using --with-debugging=no, the performance will      #
      #   be generally two or three times faster.              #
      #                                                        #
      ##########################################################


************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------



      ##########################################################
      #                                                        #
      #                       WARNING!!!                       #
      #                                                        #
      #   This code was compiled with a debugging option.      #
      #   To get timing results run ./configure                #
      #   using --with-debugging=no, the performance will      #
      #   be generally two or three times faster.              #
      #                                                        #
      ##########################################################


./ex7 on a arch-linux2-c-debug named hpcb-n01 with 24 processors, by trianas Thu Jan  9 20:48:17 2020
Using Petsc Release Version 3.12.3, Jan, 03, 2020 

                         Max       Max/Min     Avg       Total 
Time (sec):           6.043e+01     1.000   6.043e+01
Objects:              6.700e+01     1.000   6.700e+01
Flop:                 3.338e+08     1.162   3.106e+08  7.454e+09
Flop/sec:             5.525e+06     1.162   5.140e+06  1.234e+08
Memory:               1.434e+08     1.658   1.151e+08  2.762e+09
MPI Messages:         8.400e+01     8.000   2.012e+01  4.830e+02
MPI Message Lengths:  4.027e+08    27.218   1.781e+06  8.602e+08
MPI Reductions:       1.061e+03     1.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total 
 0:      Main Stage: 6.0429e+01 100.0%  7.4541e+09 100.0%  4.830e+02 100.0%  1.781e+06      100.0%  1.053e+03  99.2% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------


      ##########################################################
      #                                                        #
      #                       WARNING!!!                       #
      #                                                        #
      #   This code was compiled with a debugging option.      #
      #   To get timing results run ./configure                #
      #   using --with-debugging=no, the performance will      #
      #   be generally two or three times faster.              #
      #                                                        #
      ##########################################################


Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

BuildTwoSided          4 1.0 1.5187e-04 1.1 0.00e+00 0.0 6.9e+01 4.0e+00 0.0e+00  0  0 14  0  0   0  0 14  0  0     0
BuildTwoSidedF         3 1.0 2.0576e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatMult               22 1.0 1.0126e-01 1.7 1.18e+08 1.6 1.4e+02 3.5e+05 0.0e+00  0 31 29  6  0   0 31 29  6  0 22640
MatSolve              16 1.0 5.2042e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  9  0  0  0  0   9  0  0  0  0     0
MatLUFactorSym         1 1.0 2.5845e-04 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatLUFactorNum         1 1.0 5.3477e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 88  0  0  0  0  88  0  0  0  0     0
MatConvert             1 1.0 3.3887e-02 2.0 0.00e+00 0.0 6.9e+01 5.8e+04 6.0e+00  0  0 14  0  1   0  0 14  0  1     0
MatAssemblyBegin       3 1.0 2.5305e-01 5.9 0.00e+00 0.0 0.0e+00 0.0e+00 6.0e+00  0  0  0  0  1   0  0  0  0  1     0
MatAssemblyEnd         3 1.0 1.0584e-01 1.0 0.00e+00 0.0 1.4e+02 5.8e+04 5.7e+01  0  0 29  1  5   0  0 29  1  5     0
MatGetRowIJ            1 1.0 4.7684e-07 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         1 1.0 2.8210e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatLoad                2 1.0 9.9677e-01 1.0 0.00e+00 0.0 2.1e+02 3.9e+06 8.6e+01  2  0 43 94  8   2  0 43 94  8     0
MatAXPY                1 1.0 1.6761e-01 1.0 1.46e+06 1.8 6.9e+01 5.8e+04 3.5e+01  0  0 14  0  3   0  0 14  0  3   163
VecNorm               19 1.0 6.7976e-03 2.0 6.22e+06 1.0 0.0e+00 0.0e+00 3.8e+01  0  2  0  0  4   0  2  0  0  4 21960
VecCopy               17 1.0 1.0455e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet                22 1.0 1.8997e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY                3 1.0 7.8321e-04 1.3 9.82e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 30094
VecScatterBegin       22 1.0 1.2724e-03 1.3 0.00e+00 0.0 1.4e+02 3.5e+05 0.0e+00  0  0 29  6  0   0  0 29  6  0     0
VecScatterEnd         22 1.0 1.2303e-02 9.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSetRandom           1 1.0 3.6097e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFSetGraph             4 1.0 3.5715e-04 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFSetUp                4 1.0 3.8376e-03 1.1 0.00e+00 0.0 2.1e+02 5.8e+04 0.0e+00  0  0 43  1  0   0  0 43  1  0     0
SFBcastOpBegin        22 1.0 1.2124e-03 1.3 0.00e+00 0.0 1.4e+02 3.5e+05 0.0e+00  0  0 29  6  0   0  0 29  6  0     0
SFBcastOpEnd          22 1.0 1.2223e-02 9.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
EPSSetUp               1 1.0 5.3688e+01 1.0 1.46e+06 1.8 1.4e+02 5.8e+04 1.5e+02 89  0 29  1 14  89  0 29  1 15     1
EPSSolve               1 1.0 5.9127e+01 1.0 3.15e+08 1.1 2.3e+02 1.7e+05 8.6e+02 98 95 48  5 81  98 95 48  5 82   120
STSetUp                1 1.0 5.3683e+01 1.0 1.46e+06 1.8 1.4e+02 5.8e+04 9.3e+01 89  0 29  1  9  89  0 29  1  9     1
STApply               16 1.0 5.2970e+00 1.0 7.01e+07 1.4 0.0e+00 0.0e+00 4.8e+01  9 19  0  0  5   9 19  0  0  5   271
STMatSolve            16 1.0 5.2621e+00 1.0 5.24e+06 1.0 0.0e+00 0.0e+00 4.8e+01  9  2  0  0  5   9  2  0  0  5    24
BVCopy                 1 1.0 2.3913e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  0   0  0  0  0  0     0
BVMultVec             34 1.0 4.5938e-02 1.0 9.92e+07 1.0 0.0e+00 0.0e+00 1.3e+02  0 32  0  0 12   0 32  0  0 12 51822
BVMultInPlace          2 1.0 2.8484e-03 1.1 5.57e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0 46891
BVDotVec              32 1.0 4.7087e-02 1.0 9.92e+07 1.0 0.0e+00 0.0e+00 1.6e+02  0 32  0  0 15   0 32  0  0 15 50557
BVOrthogonalizeV      17 1.0 8.8424e-02 1.0 1.88e+08 1.0 0.0e+00 0.0e+00 4.2e+02  0 60  0  0 39   0 60  0  0 40 51001
BVScale               17 1.0 7.2718e-04 1.1 2.78e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0 91837
BVSetRandom            1 1.0 3.6461e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00  0  0  0  0  0   0  0  0  0  0     0
DSSolve                1 1.0 3.3617e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
DSVectors              3 1.0 1.0419e-04 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
DSOther                1 1.0 4.0054e-05 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp               1 1.0 6.9141e-06 7.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve              16 1.0 5.2619e+00 1.0 5.24e+06 1.0 0.0e+00 0.0e+00 4.8e+01  9  2  0  0  5   9  2  0  0  5    24
PCSetUp                1 1.0 5.3481e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.6e+01 89  0  0  0  2  89  0  0  0  2     0
PCApply               16 1.0 5.2042e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  9  0  0  0  0   9  0  0  0  0     0
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Viewer     4              3         2520     0.
              Matrix    15             15    125237144     0.
              Vector    22             22     19714528     0.
           Index Set    10             10       995096     0.
         Vec Scatter     4              4         3168     0.
   Star Forest Graph     4              4         3936     0.
          EPS Solver     1              1         2292     0.
  Spectral Transform     1              1          848     0.
       Basis Vectors     1              1         2184     0.
         PetscRandom     1              1          662     0.
              Region     1              1          672     0.
       Direct Solver     1              1        17456     0.
       Krylov Solver     1              1         1400     0.
      Preconditioner     1              1         1000     0.
========================================================================================================================
Average time to get PetscTime(): 2.38419e-08
Average time for MPI_Barrier(): 6.05583e-06
Average time for zero size MPI_Send(): 1.80801e-06
#PETSc Option Table entries:
-eps_nev 1
-eps_target -4.008e-3+1.57142i
-eps_target_magnitude
-eps_tol 1e-14
-eps_true_residual 1
-f1 A.petsc
-f2 B.petsc
-log_view :toobig.log
-matload_block_size 1
-memory_view
-st_ksp_type preonly
-st_pc_factor_mat_solver_type superlu_dist
-st_pc_type lu
-st_type sinvert
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 16 sizeof(PetscInt) 4
Configure options: --with-scalar-type=complex --download-mumps --download-parmetis --download-metis --download-scalapack=1 --download-fblaslapack=1 --with-debugging=1 --download-superlu_dist=1 --download-ptscotch=1 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 COPTFLAGS=-O3
-----------------------------------------
Libraries compiled on 2020-01-08 19:48:25 on hpca-login 
Machine characteristics: Linux-4.4.114-94.14-default-x86_64-with-SuSE-12-x86_64
Using PETSc directory: /space/hpc-home/trianas/petsc-3.12.3
Using PETSc arch: arch-linux2-c-debug
-----------------------------------------

Using C compiler: mpicc  -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -O3  
Using Fortran compiler: mpif90  -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -O3    
-----------------------------------------

Using include paths: -I/space/hpc-home/trianas/petsc-3.12.3/include -I/space/hpc-home/trianas/petsc-3.12.3/arch-linux2-c-debug/include
-----------------------------------------

Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/space/hpc-home/trianas/petsc-3.12.3/arch-linux2-c-debug/lib -L/space/hpc-home/trianas/petsc-3.12.3/arch-linux2-c-debug/lib -lpetsc -Wl,-rpath,/space/hpc-home/trianas/petsc-3.12.3/arch-linux2-c-debug/lib -L/space/hpc-home/trianas/petsc-3.12.3/arch-linux2-c-debug/lib -Wl,-rpath,/space/hpc-apps/bira/18/mpich/lib64 -L/space/hpc-apps/bira/18/mpich/lib64 -Wl,-rpath,/space/hpc-apps/bira/18/base/lib64 -L/space/hpc-apps/bira/18/base/lib64 -Wl,-rpath,/space/hpc-apps/bira/18/py36/lib64 -L/space/hpc-apps/bira/18/py36/lib64 -Wl,-rpath,/space/hpc-apps/bira/18/base/lib64/gcc/x86_64-pc-linux-gnu/7.2.0 -L/space/hpc-apps/bira/18/base/lib64/gcc/x86_64-pc-linux-gnu/7.2.0 -Wl,-rpath,/space/hpc-apps/obs/flex-2.6.4/lib -L/space/hpc-apps/obs/flex-2.6.4/lib -Wl,-rpath,/space/hpc-apps/obs/bison-3.1/lib -L/space/hpc-apps/obs/bison-3.1/lib -Wl,-rpath,/space/hpc-apps/obs/valgrind-3.11.0/lib -L/space/hpc-apps/obs/valgrind-3.11.0/lib -Wl,-rpath,/sw/sdev/intel/parallel_studio_xe_2015_update_3-pguyan/composer_xe_2015.3.187/mkl/lib/intel64 -L/sw/sdev/intel/parallel_studio_xe_2015_update_3-pguyan/composer_xe_2015.3.187/mkl/lib/intel64 -Wl,-rpath,/sw/sdev/intel/parallel_studio_xe_2015_update_3-pguyan/composer_xe_2015.3.187/mkl/lib/mic -L/sw/sdev/intel/parallel_studio_xe_2015_update_3-pguyan/composer_xe_2015.3.187/mkl/lib/mic -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lsuperlu_dist -lflapack -lfblas -lptesmumps -lptscotchparmetis -lptscotch -lptscotcherr -lesmumps -lscotch -lscotcherr -lpthread -lparmetis -lmetis -lm -lX11 -lstdc++ -ldl -lmpifort -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lrt -lquadmath -lstdc++ -ldl
-----------------------------------------



      ##########################################################
      #                                                        #
      #                       WARNING!!!                       #
      #                                                        #
      #   This code was compiled with a debugging option.      #
      #   To get timing results run ./configure                #
      #   using --with-debugging=no, the performance will      #
      #   be generally two or three times faster.              #
      #                                                        #
      ##########################################################

