Re: [petsc-users] killed 9 signal after upgrade from petsc 3.9.4 to 3.12.2

2020-01-10 Thread Santiago Andres Triana
Hi Barry, petsc-users:

Just updated to petsc-3.12.3 and the memory behaviour is about the same as with
3.12.2, i.e. about 2x the memory use of petsc-3.9.4


petsc-3.12.3 (uses superlu_dist-6.2.0)

Summary of Memory Usage in PETSc
Maximum (over computational time) process memory: total 2.9368e+10  max 1.2922e+09  min 1.1784e+09
Current process memory: total 2.8192e+10  max 1.2263e+09  min 1.1456e+09
Maximum (over computational time) space PetscMalloc()ed: total 2.7619e+09  max 1.4339e+08  min 8.6494e+07
Current space PetscMalloc()ed: total 3.6127e+06  max 1.5053e+05  min 1.5053e+05


petsc-3.9.4

Summary of Memory Usage in PETSc
Maximum (over computational time) process memory: total 1.5695e+10  max 7.1985e+08  min 6.0131e+08
Current process memory: total 1.3186e+10  max 6.9240e+08  min 4.2821e+08
Maximum (over computational time) space PetscMalloc()ed: total 3.1290e+09  max 1.5869e+08  min 1.0179e+08
Current space PetscMalloc()ed: total 1.8808e+06  max 7.8368e+04  min 7.8368e+04


However, it seems that the culprit is superlu_dist: I recompiled the current
petsc/slepc with superlu_dist-5.4.0 (using the option
--download-superlu_dist=/home/spin/superlu_dist-5.4.0.tar.gz) and the
result is this:

petsc-3.12.3 with superlu_dist-5.4.0:

Summary of Memory Usage in PETSc
Maximum (over computational time) process memory: total 1.5636e+10  max 7.1217e+08  min 5.9963e+08
Current process memory: total 1.3401e+10  max 6.5498e+08  min 4.2626e+08
Maximum (over computational time) space PetscMalloc()ed: total 2.7619e+09  max 1.4339e+08  min 8.6494e+07
Current space PetscMalloc()ed: total 3.6127e+06  max 1.5053e+05  min 1.5053e+05

I could not compile petsc-3.12.3 with the exact superlu_dist version that
petsc-3.9.4 uses (5.3.0), but I will try newer versions to see how they
perform ... I guess I should address this issue to the superlu maintainers?
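
For reference, the (re)configure invocation for pinning a specific SuperLU_DIST
tarball is sketched below; the tarball path is the one mentioned above and the
remaining options are the same ones from my usual configure line, so take it as
an illustration rather than the literal command I ran:

./configure --with-scalar-type=complex --download-mumps --download-parmetis \
  --download-metis --download-scalapack=1 --download-fblaslapack=1 \
  --with-debugging=0 --download-ptscotch=1 \
  --download-superlu_dist=/home/spin/superlu_dist-5.4.0.tar.gz \
  CXXOPTFLAGS='-O3 -march=native' FOPTFLAGS='-O3 -march=native' \
  COPTFLAGS='-O3 -march=native'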

Thanks!
Santiago

On Fri, Jan 10, 2020 at 9:19 PM Smith, Barry F.  wrote:

>
>   Can you please try v3.12.3?  There was some funky business mistakenly
> added related to partitioning that has been fixed in 3.12.3.
>
>Barry
>
>
> > On Jan 10, 2020, at 1:57 PM, Santiago Andres Triana 
> wrote:
> >
> > Dear all,
> >
> > I ran the program with valgrind --tool=massif, but the results are cryptic
> to me ... not sure who the memory hog is! The logs are attached.
> >
> > The command I used is:
> > mpiexec -n 24 valgrind --tool=massif --num-callers=20
> --log-file=valgrind.log.%p ./ex7 -f1 A.petsc -f2 B.petsc -eps_nev 1 $opts
> -eps_target -4.008e-3+1.57142i -eps_target_magnitude -eps_tol 1e-14
> >
> > Is there any possibility to install a version of superlu_dist (or mumps)
> different from what the petsc version automatically downloads?
> >
> > Thanks!
> > Santiago
> >
> >
> > On Thu, Jan 9, 2020 at 10:04 PM Dave May 
> wrote:
> > This kind of issue is difficult to untangle because you have potentially
> three pieces of software which might have changed between v3.9 and v3.12,
> namely
> > PETSc, SLEPc and SuperLU_DIST.
> > You need to isolate which software component is responsible for the 2x
> increase in memory.
> >
> > When I look at the memory usage in the log files, things look very very
> similar for the raw PETSc objects.
> >
> > [v3.9]
> > --- Event Stage 0: Main Stage
> >
> >               Viewer     4              3         2520     0.
> >               Matrix    15             15    125236536     0.
> >               Vector    22             22     19713856     0.
> >            Index Set    10             10       995280     0.
> >          Vec Scatter     4              4         4928     0.
> >           EPS Solver     1              1         2276     0.
> >   Spectral Transform     1              1          848     0.
> >        Basis Vectors     1              1         2168     0.
> >          PetscRandom     1              1          662     0.
> >               Region     1              1          672     0.
> >        Direct Solver     1              1        17440     0.
> >        Krylov Solver     1              1         1176     0.
> >       Preconditioner     1              1         1000     0.
> >
> > versus
> >
> > [v3.12]
> > --- Event Stage 0: Main Stage
> >
> >               Viewer     4              3         2520     0.
> >               Matrix    15             15    125237144     0.
> >               Vector    22             22     19714528     0.
> >            Index Set    10             10       995096     0.
> >          Vec Scatter     4 

Re: [petsc-users] killed 9 signal after upgrade from petsc 3.9.4 to 3.12.2

2020-01-09 Thread Santiago Andres Triana
Dear all,

I think parmetis is not involved since I still run out of memory if I use
the following options:
export opts='-st_type sinvert -st_ksp_type preonly -st_pc_type lu
-st_pc_factor_mat_solver_type superlu_dist -eps_true_residual 1'
and  issuing:
mpiexec -n 24 ./ex7 -f1 A.petsc -f2 B.petsc -eps_nev 1 -eps_target
-4.008e-3+1.57142i $opts -eps_target_magnitude -eps_tol 1e-14 -memory_view

The bottom line is that the memory usage of petsc-3.9.4 / slepc-3.9.2 is much
lower than that of the current version. I can only solve relatively small
problems using the 3.12 series :(
I have an example with smaller matrices that will likely fail on a 32 Gb
RAM machine with petsc-3.12 but runs just fine with petsc-3.9. The
-memory_view output is:

with petsc-3.9.4: (log 'justfine.log' attached)

Summary of Memory Usage in PETSc
Maximum (over computational time) process memory: total 1.6665e+10  max 7.5674e+08  min 6.4215e+08
Current process memory: total 1.5841e+10  max 7.2881e+08  min 6.0905e+08
Maximum (over computational time) space PetscMalloc()ed: total 3.1290e+09  max 1.5868e+08  min 1.0179e+08
Current space PetscMalloc()ed: total 1.8808e+06  max 7.8368e+04  min 7.8368e+04


with petsc-3.12.2: (log 'toobig.log' attached)

Summary of Memory Usage in PETSc
Maximum (over computational time) process memory: total 3.1564e+10  max 1.3662e+09  min 1.2604e+09
Current process memory: total 3.0355e+10  max 1.3082e+09  min 1.2254e+09
Maximum (over computational time) space PetscMalloc()ed: total 2.7618e+09  max 1.4339e+08  min 8.6493e+07
Current space PetscMalloc()ed: total 3.6127e+06  max 1.5053e+05  min 1.5053e+05

Strangely, monitoring with 'top' I can see *appreciably higher* peak memory
use, usually twice what -memory_view ends up reporting, both for
petsc-3.9.4 and the current version. The program usually fails at this peak if
not enough RAM is available.
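
In case it helps, this is roughly how I cross-check the per-rank peak resident
memory that 'top' hints at against what -memory_view prints; it assumes GNU time
is available as /usr/bin/time on the compute node, so it is a sketch rather than
the exact commands I run:

# wrap every MPI rank with GNU time so the kernel-reported peak RSS is logged per rank
mpiexec -n 24 /usr/bin/time -v ./ex7 -f1 A.petsc -f2 B.petsc -eps_nev 1 -eps_target \
  -4.008e-3+1.57142i $opts -eps_target_magnitude -eps_tol 1e-14 -memory_view 2> peak_rss.log
# one 'Maximum resident set size (kbytes)' line per rank; compare these
# with the 'process memory' figures reported by -memory_view
grep 'Maximum resident set size' peak_rss.log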

The matrices for the example quoted above can be downloaded here (I use
slepc's tutorial ex7.c to solve the problem):
https://www.dropbox.com/s/as9bec9iurjra6r/A.petsc?dl=0  (about 600 Mb)
https://www.dropbox.com/s/u2bbmng23rp8l91/B.petsc?dl=0  (about 210 Mb)

I haven't been able to use a debugger successfully since I am using a
compute node without the possibility of an xterm ... note that I have no
experience using a debugger, so any help on that will also be appreciated!
I hope I can switch to the current petsc/slepc version for my production runs
soon...

Thanks again!
Santiago



On Thu, Jan 9, 2020 at 4:25 PM Stefano Zampini 
wrote:

> Can you reproduce the issue with smaller matrices? Or with a debug build
> (i.e. using --with-debugging=1 and compilation flags -O2 -g)?
>
> The only changes in parmetis between the two PETSc releases are these
> below, but I don’t see how they could cause issues
>
> kl-18448:pkg-parmetis szampini$ git log -2
> commit ab4fedc6db1f2e3b506be136e3710fcf89ce16ea (HEAD -> master, tag:
> v4.0.3-p5, origin/master, origin/dalcinl/random, origin/HEAD)
> Author: Lisandro Dalcin 
> Date:   Thu May 9 18:44:10 2019 +0300
>
> GKLib: Make FPRFX##randInRange() portable for 32bit/64bit indices
>
> commit 2b4afc79a79ef063f369c43da2617fdb64746dd7
> Author: Lisandro Dalcin 
> Date:   Sat May 4 17:22:19 2019 +0300
>
> GKlib: Use gk_randint32() to define the RandomInRange() macro
>
>
>
> On Jan 9, 2020, at 4:31 AM, Smith, Barry F. via petsc-users <
> petsc-users@mcs.anl.gov> wrote:
>
>
>  This is extremely worrisome:
>
> ==23361== Use of uninitialised value of size 8
> ==23361==at 0x847E939: gk_randint64 (random.c:99)
> ==23361==by 0x847EF88: gk_randint32 (random.c:128)
> ==23361==by 0x81EBF0B: libparmetis__Match_Global (in
> /space/hpc-home/trianas/petsc-3.12.3/arch-linux2-c-debug/lib/libparmetis.so)
>
> do you get that with PETSc-3.9.4 or only with 3.12.3?
>
>   This may result in Parmetis using non-random numbers and then giving
> back an inappropriate ordering that requires more memory for SuperLU_DIST.
>
>  Suggest looking at the code, or running in the debugger to see what is
> going on there. We use parmetis all the time and don't see this.
>
>  Barry
>
>
>
>
>
>
> On Jan 8, 2020, at 4:34 PM, Santiago Andres Triana 
> wrote:
>
> Dear Matt, petsc-users:
>
> Finally back after the holidays to try to solve this issue, thanks for
> your patience!
> I compiled the latest petsc (3.12.3) with debugging enabled, and the same
> problem appears: relatively large matrices result in out-of-memory errors.
> This is not the case for petsc-3.9.4; all is fine there.
> This is a non-Hermitian, generalized eigenvalue problem. I generate the A
> and B matrices myself and then use example 7 (from the slepc tutorial at
> $SLEPC_DIR/src/eps/examples/tutorials/ex7.c ) to

Re: [petsc-users] killed 9 signal after upgrade from petsc 3.9.4 to 3.12.2

2020-01-08 Thread Santiago Andres Triana
Dear Matt, petsc-users:

Finally back after the holidays to try to solve this issue, thanks for your
patience!
I compiled the latest petsc (3.12.3) with debugging enabled, and the same
problem appears: relatively large matrices result in out-of-memory errors.
This is not the case for petsc-3.9.4; all is fine there.
This is a non-Hermitian, generalized eigenvalue problem. I generate the A
and B matrices myself and then use example 7 (from the slepc tutorial at
$SLEPC_DIR/src/eps/examples/tutorials/ex7.c ) to solve the problem:

mpiexec -n 24 valgrind --tool=memcheck -q --num-callers=20
--log-file=valgrind.log.%p ./ex7 -malloc off -f1 A.petsc -f2 B.petsc
-eps_nev 1 -eps_target -2.5e-4+1.56524i -eps_target_magnitude -eps_tol
1e-14 $opts

where the $opts variable is:
export opts='-st_type sinvert -st_ksp_type preonly -st_pc_type lu
-eps_error_relative ::ascii_info_detail -st_pc_factor_mat_solver_type
superlu_dist -mat_superlu_dist_iterrefine 1 -mat_superlu_dist_colperm
PARMETIS -mat_superlu_dist_parsymbfact 1 -eps_converged_reason
-eps_conv_rel -eps_monitor_conv -eps_true_residual 1'

the output from valgrind (sample from one processor) and from the program
are attached.
If it's of any use, the matrices are here (might need at least 180 Gb of RAM
to solve the problem successfully under petsc-3.9.4):

https://www.dropbox.com/s/as9bec9iurjra6r/A.petsc?dl=0
https://www.dropbox.com/s/u2bbmng23rp8l91/B.petsc?dl=0

With petsc-3.9.4 and slepc-3.9.2 I can use matrices up to 10 Gb (with 240 Gb
RAM), but only up to 3 Gb with the latest petsc/slepc.
Any suggestions, comments or any other help are very much appreciated!

Cheers,
Santiago



On Mon, Dec 23, 2019 at 11:19 PM Matthew Knepley  wrote:

> On Mon, Dec 23, 2019 at 3:14 PM Santiago Andres Triana 
> wrote:
>
>> Dear all,
>>
>> After upgrading to petsc 3.12.2 my solver program crashes consistently.
>> Before the upgrade I was using petsc 3.9.4 with no problems.
>>
>> My application deals with a complex-valued, generalized eigenvalue
>> problem. The matrices involved are relatively large, typically 2 to 10 Gb
>> in size, which is no problem for petsc 3.9.4.
>>
>
> Are you sure that your indices do not exceed 4B? If they do, you need to
> configure using
>
>   --with-64-bit-indices
>
> Also, it would be nice if you ran with the debugger so we can get a stack
> trace for the SEGV.
>
>   Thanks,
>
> Matt
>
>
>> However, after the upgrade I can only obtain solutions when the matrices
>> are small; the solver crashes when the matrices' size exceeds about 1.5 Gb:
>>
>> [0]PETSC ERROR:
>> 
>> [0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the
>> batch system) has told this process to end
>> [0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
>> [0]PETSC ERROR: or see
>> https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>> [0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS
>> X to find memory corruption errors
>> [0]PETSC ERROR: configure using --with-debugging=yes, recompile, link,
>> and run
>> [0]PETSC ERROR: to get more information on the crash.
>>
>> and so on for each cpu.
>>
>>
>> I tried using valgrind and this is the typical output:
>>
>> ==2874== Conditional jump or move depends on uninitialised value(s)
>> ==2874==at 0x4018178: index (in /lib64/ld-2.22.so)
>> ==2874==by 0x400752D: expand_dynamic_string_token (in /lib64/
>> ld-2.22.so)
>> ==2874==by 0x4008009: _dl_map_object (in /lib64/ld-2.22.so)
>> ==2874==by 0x40013E4: map_doit (in /lib64/ld-2.22.so)
>> ==2874==by 0x400EA53: _dl_catch_error (in /lib64/ld-2.22.so)
>> ==2874==by 0x4000ABE: do_preload (in /lib64/ld-2.22.so)
>> ==2874==by 0x4000EC0: handle_ld_preload (in /lib64/ld-2.22.so)
>> ==2874==by 0x40034F0: dl_main (in /lib64/ld-2.22.so)
>> ==2874==by 0x4016274: _dl_sysdep_start (in /lib64/ld-2.22.so)
>> ==2874==by 0x4004A99: _dl_start (in /lib64/ld-2.22.so)
>> ==2874==by 0x40011F7: ??? (in /lib64/ld-2.22.so)
>> ==2874==by 0x12: ???
>> ==2874==
>>
>>
>> These are my configuration options. Identical for both petsc 3.9.4 and
>> 3.12.2:
>>
>> ./configure --with-scalar-type=complex --download-mumps
>> --download-parmetis --download-metis --download-scalapack=1
>> --download-fblaslapack=1 --with-debugging=0 --download-superlu_dist=1
>> --download-ptscotch=1 CXXOPTFLAGS='-O3 -march=native' FOPTFLAGS='-O3
>> -march=native' COPTFLAGS='-O3 -march=native'
>>
>>
>> Thanks in advance for any comments or ideas!
>>
>> Cheers,
>> Santiago
>>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
>


test1.e6034496
Description: Binary data


valgrind.log.23361
Description: Binary data


[petsc-users] killed 9 signal after upgrade from petsc 3.9.4 to 3.12.2

2019-12-23 Thread Santiago Andres Triana
Dear all,

After upgrading to petsc 3.12.2 my solver program crashes consistently.
Before the upgrade I was using petsc 3.9.4 with no problems.

My application deals with a complex-valued, generalized eigenvalue problem.
The matrices involved are relatively large, typically 2 to 10 Gb in size,
which is no problem for petsc 3.9.4.
However, after the upgrade I can only obtain solutions when the matrices
are small; the solver crashes when the matrices' size exceeds about 1.5 Gb:

[0]PETSC ERROR:

[0]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the
batch system) has told this process to end
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see
https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[0]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X
to find memory corruption errors
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and
run
[0]PETSC ERROR: to get more information on the crash.

and so on for each cpu.


I tried using valgrind and this is the typical output:

==2874== Conditional jump or move depends on uninitialised value(s)
==2874==at 0x4018178: index (in /lib64/ld-2.22.so)
==2874==by 0x400752D: expand_dynamic_string_token (in /lib64/ld-2.22.so)
==2874==by 0x4008009: _dl_map_object (in /lib64/ld-2.22.so)
==2874==by 0x40013E4: map_doit (in /lib64/ld-2.22.so)
==2874==by 0x400EA53: _dl_catch_error (in /lib64/ld-2.22.so)
==2874==by 0x4000ABE: do_preload (in /lib64/ld-2.22.so)
==2874==by 0x4000EC0: handle_ld_preload (in /lib64/ld-2.22.so)
==2874==by 0x40034F0: dl_main (in /lib64/ld-2.22.so)
==2874==by 0x4016274: _dl_sysdep_start (in /lib64/ld-2.22.so)
==2874==by 0x4004A99: _dl_start (in /lib64/ld-2.22.so)
==2874==by 0x40011F7: ??? (in /lib64/ld-2.22.so)
==2874==by 0x12: ???
==2874==


These are my configuration options. Identical for both petsc 3.9.4 and
3.12.2:

./configure --with-scalar-type=complex --download-mumps --download-parmetis
--download-metis --download-scalapack=1 --download-fblaslapack=1
--with-debugging=0 --download-superlu_dist=1 --download-ptscotch=1
CXXOPTFLAGS='-O3 -march=native' FOPTFLAGS='-O3 -march=native'
COPTFLAGS='-O3 -march=native'


Thanks in advance for any comments or ideas!

Cheers,
Santiago


[petsc-users] problem downloading "fix-syntax-for-nag.tar.gx"

2019-11-19 Thread Santiago Andres Triana via petsc-users
Hello petsc-users:

I found this error when configure tries to download fblaslapack:

***
 UNABLE to CONFIGURE with GIVEN OPTIONS(see configure.log for
details):
---
Error during download/extract/detection of FBLASLAPACK:
file could not be opened successfully
Downloaded package FBLASLAPACK from:
https://bitbucket.org/petsc/pkg-fblaslapack/get/origin/barry/2019-08-22/fix-syntax-for-nag.tar.gz
is not a tarball.
[or installed python cannot process compressed files]
* If you are behind a firewall - please fix your proxy and rerun ./configure
  For example at LANL you may need to set the environmental variable
http_proxy (or HTTP_PROXY?) to  http://proxyout.lanl.gov
* You can run with --with-packages-download-dir=/adirectory and ./configure
will instruct you what packages to download manually
* or you can download the above URL manually, to
/yourselectedlocation/fix-syntax-for-nag.tar.gz
  and use the configure option:
  --download-fblaslapack=/yourselectedlocation/fix-syntax-for-nag.tar.gz
***


Any ideas? The file in question doesn't seem to exist ... Thanks a lot in
advance!

Santiago


Re: [petsc-users] Segmentation violation

2018-10-31 Thread Santiago Andres Triana via petsc-users
Hi Hong,

You can find the matrices here:
https://www.dropbox.com/s/ejpa9owkv8tjnwi/A.petsc?dl=0
https://www.dropbox.com/s/urjtxaezl0cv3om/B.petsc?dl=0

Changing the target value leads to the same error. What is strange is that
this works without a problem on two other machines, but on my main
workstation (the one I use for developing and testing) it fails :(

Thanks so much for your help!
Santiago



On Wed, Oct 31, 2018 at 2:48 AM Zhang, Hong  wrote:

> Santiago,
> The shift '-eps_target -2e-3+1.01i' is very close to the eigenvalues. What
> happens if you pick a target a little farther away from your eigenvalues?
> I suspect mumps encounters a zero pivot during numerical factorization.
> There are options to handle it, but I need matrices A and B to investigate.
> I am not sure if the problem comes from memory bug.
> Anyway, I'm cc'ing mumps developers here.
>
> Hong
>
> On Tue, Oct 30, 2018 at 8:09 PM Smith, Barry F. via petsc-users <
> petsc-users@mcs.anl.gov> wrote:
>
>>
>>   Yeah this doesn't look good for MUMPS but isn't for sure the problem
>> either.
>>
>>The valgrind output should be sent to the MUMPS developers.
>>
>>Hong,
>>
>>  Can you send this to the MUMPS developers and see what they say?
>>
>> Thanks
>>
>>Barry
>>
>>
>> > On Oct 30, 2018, at 2:04 PM, Santiago Andres Triana 
>> wrote:
>> >
>> > This is the output of
>> > mpiexec -n 2 valgrind --tool=memcheck -q --num-callers=20
>> --log-file=valgrind.log.%p ./ex7 -malloc off -f1 A.petsc -f2 B.petsc
>> -eps_nev 4 -eps_target -2e-3+1.01i -st_type sinvert
>> >
>> > Generalized eigenproblem stored in file.
>> >
>> >  Reading COMPLEX matrices from binary files...
>> > [1]PETSC ERROR:
>> 
>> > [1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation,
>> probably memory access out of range
>> > [1]PETSC ERROR: Try option -start_in_debugger or
>> -on_error_attach_debugger
>> > [1]PETSC ERROR: or see
>> http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>> > [1]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac
>> OS X to find memory corruption errors
>> > [1]PETSC ERROR: likely location of problem given in stack below
>> > [1]PETSC ERROR: -  Stack Frames
>> 
>> > [1]PETSC ERROR: Note: The EXACT line numbers in the stack are not
>> available,
>> > [1]PETSC ERROR:   INSTEAD the line number of the start of the
>> function
>> > [1]PETSC ERROR:   is given.
>> > [1]PETSC ERROR: [1] MatFactorNumeric_MUMPS line 1205
>> /home/spin2/petsc-3.10.2/src/mat/impls/aij/mpi/mumps/mumps.c
>> > [1]PETSC ERROR: [1] MatLUFactorNumeric line 3054
>> /home/spin2/petsc-3.10.2/src/mat/interface/matrix.c
>> > [1]PETSC ERROR: [1] PCSetUp_LU line 59
>> /home/spin2/petsc-3.10.2/src/ksp/pc/impls/factor/lu/lu.c
>> > [1]PETSC ERROR: [1] PCSetUp line 894
>> /home/spin2/petsc-3.10.2/src/ksp/pc/interface/precon.c
>> > [1]PETSC ERROR: [1] KSPSetUp line 304
>> /home/spin2/petsc-3.10.2/src/ksp/ksp/interface/itfunc.c
>> > [1]PETSC ERROR: [1] STSetUp_Sinvert line 96
>> /home/spin2/slepc-3.10.1/src/sys/classes/st/impls/sinvert/sinvert.c
>> > [1]PETSC ERROR: [1] STSetUp line 233
>> /home/spin2/slepc-3.10.1/src/sys/classes/st/interface/stsolve.c
>> > [1]PETSC ERROR: [1] EPSSetUp line 104
>> /home/spin2/slepc-3.10.1/src/eps/interface/epssetup.c
>> > [1]PETSC ERROR: [1] EPSSolve line 129
>> /home/spin2/slepc-3.10.1/src/eps/interface/epssolve.c
>> > [1]PETSC ERROR: - Error Message
>> --
>> > [1]PETSC ERROR: Signal received
>> > [1]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
>> for trouble shooting.
>> > [1]PETSC ERROR: Petsc Release Version 3.10.2, Oct, 09, 2018
>> > [1]PETSC ERROR: ./ex7 on a arch-linux2-c-opt named wobble-wkst-as by
>> spin2 Tue Oct 30 19:42:18 2018
>> > [1]PETSC ERROR: Configure options --download-mpich
>> -with-scalar-type=complex --download-mumps --download-parmetis
>> --download-metis --download-scalapack --download-fblaslapack
>> --with-debugging=1 --download-superlu_dist --download-ptscotch
>> > [1]PETSC ERROR: #1 User provided function() line 0 in  unknown file
>> > application called MPI_Abort(MPI_COMM_WORLD, 59) - process 1
>> >
>> >
>> >
>> > and 

Re: [petsc-users] [SLEPc] ex5 fails, error in lapack

2018-10-28 Thread Santiago Andres Triana
Hi Dave,

Indeed, I added that last arg myself after the configure script asked for
it (--with-batch seems to need it). I just tried with petsc-3.9.1, without
the --with-batch and --known-64-bit-blas-indices=1 options, and everything is
working nicely. I will try again later with the latest version.

Thanks!

Santiago

On Sun, Oct 28, 2018 at 10:31 AM Dave May  wrote:

>
>
> On Sun, 28 Oct 2018 at 09:37, Santiago Andres Triana 
> wrote:
>
>> Hi petsc-users,
>>
>> I am experiencing problems running ex5 and ex7 from the slepc tutorial.
>> This is after upgrade to petsc-3.10.2 and slepc-3.10.1. Has anyone run into
>> this problem? see the error message below. Any help or advice would be
>> highly appreciated. Thanks in advance!
>>
>> Santiago
>>
>>
>>
>> trianas@hpcb-n02:/home/trianas/slepc-3.10.1/src/eps/examples/tutorials>
>> ./ex5 -eps_nev 4
>>
>> Markov Model, N=120 (m=15)
>>
>> [0]PETSC ERROR: - Error Message
>> --
>> [0]PETSC ERROR: Error in external library
>> [0]PETSC ERROR: Error in LAPACK subroutine hseqr: info=0
>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
>> for trouble shooting.
>> [0]PETSC ERROR: Petsc Release Version 3.10.2, Oct, 09, 2018
>> [0]PETSC ERROR: ./ex5 on a arch-linux2-c-opt named hpcb-n02 by trianas
>> Sun Oct 28 09:30:18 2018
>> [0]PETSC ERROR: Configure options --known-level1-dcache-size=32768
>> --known-level1-dcache-linesize=64 --known-level1-dcache-assoc=8
>> --known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2
>> --known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8
>> --known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8
>> --known-bits-per-byte=8 --known-memcmp-ok=1 --known-sizeof-MPI_Comm=4
>> --known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --known-mpi-int64_t=1
>> --known-mpi-c-double-complex=1 --known-has-attribute-aligned=1
>> --with-scalar-type=complex --download-mumps=1 --download-parmetis
>> --download-metis --download-scalapack=1 --download-fblaslapack=1
>> --with-debugging=0 --download-superlu_dist=1 --download-ptscotch=1
>> CXXOPTFLAGS="-O3 -march=native" FOPTFLAGS="-O3 -march=native"
>> COPTFLAGS="-O3 -march=native" --with-batch --known-64-bit-blas-indices=1
>>
>
> I think this last arg is wrong if you use --download-fblaslapack.
>
> Did you explicitly add this option yourself?
>
>
> [0]PETSC ERROR: #1 DSSolve_NHEP() line 586 in
>> /space/hpc-home/trianas/slepc-3.10.1/src/sys/classes/ds/impls/nhep/dsnhep.c
>> [0]PETSC ERROR: #2 DSSolve() line 586 in
>> /space/hpc-home/trianas/slepc-3.10.1/src/sys/classes/ds/interface/dsops.c
>> [0]PETSC ERROR: #3 EPSSolve_KrylovSchur_Default() line 275 in
>> /space/hpc-home/trianas/slepc-3.10.1/src/eps/impls/krylov/krylovschur/krylovschur.c
>> [0]PETSC ERROR: #4 EPSSolve() line 148 in
>> /space/hpc-home/trianas/slepc-3.10.1/src/eps/interface/epssolve.c
>> [0]PETSC ERROR: #5 main() line 90 in
>> /home/trianas/slepc-3.10.1/src/eps/examples/tutorials/ex5.c
>> [0]PETSC ERROR: PETSc Option Table entries:
>> [0]PETSC ERROR: -eps_nev 4
>> [0]PETSC ERROR: End of Error Message ---send entire
>> error message to petsc-ma...@mcs.anl.gov--
>> application called MPI_Abort(MPI_COMM_WORLD, 76) - process 0
>> [unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=76
>> :
>> system msg for write_line failure : Bad file descriptor
>>
>>


[petsc-users] [SLEPc] ex5 fails, error in lapack

2018-10-28 Thread Santiago Andres Triana
Hi petsc-users,

I am experiencing problems running ex5 and ex7 from the slepc tutorial.
This is after upgrading to petsc-3.10.2 and slepc-3.10.1. Has anyone run into
this problem? See the error message below. Any help or advice would be
highly appreciated. Thanks in advance!

Santiago



trianas@hpcb-n02:/home/trianas/slepc-3.10.1/src/eps/examples/tutorials>
./ex5 -eps_nev 4

Markov Model, N=120 (m=15)

[0]PETSC ERROR: - Error Message
--
[0]PETSC ERROR: Error in external library
[0]PETSC ERROR: Error in LAPACK subroutine hseqr: info=0
[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for
trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.10.2, Oct, 09, 2018
[0]PETSC ERROR: ./ex5 on a arch-linux2-c-opt named hpcb-n02 by trianas Sun
Oct 28 09:30:18 2018
[0]PETSC ERROR: Configure options --known-level1-dcache-size=32768
--known-level1-dcache-linesize=64 --known-level1-dcache-assoc=8
--known-sizeof-char=1 --known-sizeof-void-p=8 --known-sizeof-short=2
--known-sizeof-int=4 --known-sizeof-long=8 --known-sizeof-long-long=8
--known-sizeof-float=4 --known-sizeof-double=8 --known-sizeof-size_t=8
--known-bits-per-byte=8 --known-memcmp-ok=1 --known-sizeof-MPI_Comm=4
--known-sizeof-MPI_Fint=4 --known-mpi-long-double=1 --known-mpi-int64_t=1
--known-mpi-c-double-complex=1 --known-has-attribute-aligned=1
--with-scalar-type=complex --download-mumps=1 --download-parmetis
--download-metis --download-scalapack=1 --download-fblaslapack=1
--with-debugging=0 --download-superlu_dist=1 --download-ptscotch=1
CXXOPTFLAGS="-O3 -march=native" FOPTFLAGS="-O3 -march=native"
COPTFLAGS="-O3 -march=native" --with-batch --known-64-bit-blas-indices=1
[0]PETSC ERROR: #1 DSSolve_NHEP() line 586 in
/space/hpc-home/trianas/slepc-3.10.1/src/sys/classes/ds/impls/nhep/dsnhep.c
[0]PETSC ERROR: #2 DSSolve() line 586 in
/space/hpc-home/trianas/slepc-3.10.1/src/sys/classes/ds/interface/dsops.c
[0]PETSC ERROR: #3 EPSSolve_KrylovSchur_Default() line 275 in
/space/hpc-home/trianas/slepc-3.10.1/src/eps/impls/krylov/krylovschur/krylovschur.c
[0]PETSC ERROR: #4 EPSSolve() line 148 in
/space/hpc-home/trianas/slepc-3.10.1/src/eps/interface/epssolve.c
[0]PETSC ERROR: #5 main() line 90 in
/home/trianas/slepc-3.10.1/src/eps/examples/tutorials/ex5.c
[0]PETSC ERROR: PETSc Option Table entries:
[0]PETSC ERROR: -eps_nev 4
[0]PETSC ERROR: End of Error Message ---send entire
error message to petsc-ma...@mcs.anl.gov--
application called MPI_Abort(MPI_COMM_WORLD, 76) - process 0
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=76
:
system msg for write_line failure : Bad file descriptor


Re: [petsc-users] problem with installation using quad precision

2018-07-30 Thread Santiago Andres Triana
Dear Karl, Jed:

It was indeed the --with-fortran-kernels=1 option that was the culprit. Without
it the make check step succeeds :)

Thanks so much for your prompt help!

Santiago



On Mon, Jul 30, 2018 at 6:58 PM, Karl Rupp  wrote:

> Hi Santiago,
>
>
> I am trying to install petsc with the option --with-precision=__float128.
>> The ./configure goes fine, as well as the make all stage. However, the make
>> check step to test the libraries fails with the following error:
>>
>> /usr/bin/ld: home/spin/petsc-3.9.3/arch-linux2-c-opt/lib/libpetsc.so:
>> undefined reference to `dgemv_'
>>
>>
>> this is the configure command I use:
>>
>> ./configure --with-scalar-type=complex --with-precision=__float128
>> --with-debugging=0 --with-fortran-kernels=1 COPTFLAGS='-O3 -march=native
>> -mtune=native' CXXOPTFLAGS='-O3 -march=native -mtune=native' FOPTFLAGS='-O3
>> -march=native -mtune=native' --download-f2cblaslapack
>>
>>
>> Any hints or suggestions are welcome. Thanks so much in advance!
>>
>
> I just verified the following to work on my laptop:
>
> ./configure --with-scalar-type=complex --with-precision=__float128
> --download-f2cblaslapack --download-mpich
>
> As Jed pointed out, --with-fortran-kernels=1 is probably clashing with
> --download-f2cblaslapack. Does the build succeed without
> --with-fortran-kernels=1?
>
> Best regards,
> Karli
>


[petsc-users] problem with installation using quad precision

2018-07-30 Thread Santiago Andres Triana
Dear petsc-users,

I am trying to install petsc with the option --with-precision=__float128.
The ./configure goes fine, as well as the make all stage. However, the make
check step to test the libraries fails with the following error:

/usr/bin/ld: home/spin/petsc-3.9.3/arch-linux2-c-opt/lib/libpetsc.so:
undefined reference to `dgemv_'


this is the configure command I use:

./configure --with-scalar-type=complex --with-precision=__float128
--with-debugging=0 --with-fortran-kernels=1 COPTFLAGS='-O3 -march=native
-mtune=native' CXXOPTFLAGS='-O3 -march=native -mtune=native' FOPTFLAGS='-O3
-march=native -mtune=native' --download-f2cblaslapack


Any hints or suggestions are welcome. Thanks so much in advance!

Santiago


Re: [petsc-users] Generalized eigenvalue problem using quad precision

2018-03-05 Thread Santiago Andres Triana
 processor): 945
INFOG(22) (size in MB of memory effectively used during factorization - sum over all processors): 8386
INFOG(23) (after analysis: value of ICNTL(6) effectively used): 0
INFOG(24) (after analysis: value of ICNTL(12) effectively used): 1
INFOG(25) (after factorization: number of pivots modified by static pivoting): 0
INFOG(28) (after factorization: number of null pivots encountered): 0
INFOG(29) (after factorization: effective number of entries in the factors (sum over all processors)): 358009746
INFOG(30, 31) (after solution: size in Mbytes of memory used during solution phase): 1105, 9342
INFOG(32) (after analysis: type of analysis done): 1
INFOG(33) (value used for ICNTL(8)): 7
INFOG(34) (exponent of the determinant if determinant is requested): 0
linear system matrix = precond matrix:
Mat Object: 12 MPI processes
  type: mpiaij
  rows=903168, cols=903168
  total: nonzeros=17538393, allocated nonzeros=17538393
  total number of mallocs used during MatSetValues calls =0
    not using I-node (on process 0) routines

On Sun, Mar 4, 2018 at 10:12 PM, Jose E. Roman <jro...@dsic.upv.es> wrote:

> Why do you want to move to quad precision? Double precision is usually
> enough.
> The fact that B is singular should not be a problem, provided that you do
> shift-and-invert with a nonzero target value.
> Can you send the output of -eps_view so that I can get a better idea what
> you are doing?
>
> Jose
>
>
> > El 5 mar 2018, a las 0:50, Santiago Andres Triana <rep...@gmail.com>
> escribió:
> >
> > Dear all,
> >
> > A rather general question, is there any possibility of solving a
> complex-valued generalized eigenvalue problem using quad (or extended)
> precision when the 'B' matrix is singular? So far I have been using MUMPS
> with double precision with good results but I require eventually extended
> precision. Any comment or advice highly welcome. Thanks in advance!
> >
> > Santiago
>
>


[petsc-users] Generalized eigenvalue problem using quad precision

2018-03-04 Thread Santiago Andres Triana
Dear all,

A rather general question: is there any possibility of solving a
complex-valued generalized eigenvalue problem using quad (or extended)
precision when the 'B' matrix is singular? So far I have been using MUMPS
with double precision with good results, but I will eventually require
extended precision. Any comment or advice is highly welcome. Thanks in advance!

Santiago


[petsc-users] quad precision solvers

2017-12-31 Thread Santiago Andres Triana
Hi petsc-users,

What solvers (either petsc-native or external packages) are available for
quad precision (i.e. __float128) computations? I am dealing with a large
(1e6 x 1e6), sparse, complex-valued, non-Hermitian, and non-symmetric
generalized eigenvalue problem. So far I have been using MUMPS (with
Krylov-Schur) with double precision, but I'd like to have an idea of the
round-off errors I might be incurring.

Thanks in advance for any comments!
Cheers, Andres.


Re: [petsc-users] configure cannot find a c preprocessor

2017-12-20 Thread Santiago Andres Triana
This is what I get:

hpca-login:~> mpicc -show
gcc -I/opt/sgi/mpt/mpt-2.12/include -L/opt/sgi/mpt/mpt-2.12/lib -lmpi
-lpthread /usr/lib64/libcpuset.so.1 /usr/lib64/libbitmask.so.1

On Wed, Dec 20, 2017 at 11:59 PM, Satish Balay <ba...@mcs.anl.gov> wrote:

> >>>
> Executing: mpicc -E  
> -I/dev/shm/pbs.3111462.hpc-pbs/petsc-fdYfuH/config.setCompilers
> /dev/shm/pbs.3111462.hpc-pbs/petsc-fdYfuH/config.setCompilers/conftest.c
> stderr:
> gcc: warning: /usr/lib64/libcpuset.so.1: linker input file unused because
> linking not done
> gcc: warning: /usr/lib64/libbitmask.so.1: linker input file unused because
> linking not done
> <<<<
>
> Looks like your mpicc is printing this verbose thing on stdout [why is
> it doing a link check during preprocessing?] - thus confusing PETSc
> configure.
>
> Workaround is to fix this compiler not to print such messages. Or use
> different compilers..
>
> What do you have for:
>
> mpicc -show
>
>
> Satish
>
> On Wed, 20 Dec 2017, Santiago Andres Triana wrote:
>
> > Dear petsc-users,
> >
> > I'm trying to install petsc in a cluster using SGI's MPT. The mpicc
> > compiler is in the search path. The configure command is:
> >
> > ./configure --with-scalar-type=complex --with-mumps=1 --download-mumps
> > --download-parmetis --download-metis --download-scalapack
> >
> > However, this leads to an error (configure.log attached):
> >
> > 
> ===
> >  Configuring PETSc to compile on your system
> >
> > 
> ===
> > TESTING: checkCPreprocessor from
> > config.setCompilers(config/BuildSystem/config/setCompilers.py:599)
> >
> > 
> ***
> >  UNABLE to CONFIGURE with GIVEN OPTIONS(see configure.log for
> > details):
> > 
> ---
> > Cannot find a C preprocessor
> > 
> ***
> >
> > The configure.log says something about cpp32, here's the excerpt:
> >
> > Possible ERROR while running preprocessor: exit code 256
> > stderr:
> > gcc: error: cpp32: No such file or directory
> >
> >
> > Any ideas of what is going wrong? any help or comments are highly
> > appreciated. Thanks in advance!
> >
> > Andres
> >
>
>


[petsc-users] configure cannot find a c preprocessor

2017-12-20 Thread Santiago Andres Triana
Dear petsc-users,

I'm trying to install petsc on a cluster using SGI's MPT. The mpicc
compiler is in the search path. The configure command is:

./configure --with-scalar-type=complex --with-mumps=1 --download-mumps
--download-parmetis --download-metis --download-scalapack

However, this leads to an error (configure.log attached):

===
 Configuring PETSc to compile on your system

===
TESTING: checkCPreprocessor from
config.setCompilers(config/BuildSystem/config/setCompilers.py:599)

***
 UNABLE to CONFIGURE with GIVEN OPTIONS(see configure.log for
details):
---
Cannot find a C preprocessor
***

The configure.log says something about cpp32, here's the excerpt:

Possible ERROR while running preprocessor: exit code 256
stderr:
gcc: error: cpp32: No such file or directory


Any ideas of what is going wrong? any help or comments are highly
appreciated. Thanks in advance!

Andres


configure.log
Description: Binary data


Re: [petsc-users] configure fails with batch+scalapack

2017-12-19 Thread Santiago Andres Triana
Epilogue:

I was able to complete the configuration and compilation using an
interactive session on one compute node. Certainly, there was no need for
the --with-batch option.

However, at run time, SGI MPT's mpiexec_mpt (required by the job
scheduler on this cluster) throws a cryptic error: Cannot find executable:
-f
It does not seem petsc-specific, though, as other MPI programs also fail.

In any case I would like to thank you all for the prompt help!

Santiago

On Mon, Dec 18, 2017 at 1:03 AM, Smith, Barry F. <bsm...@mcs.anl.gov> wrote:

>
>   Configure runs fine. When it runs fine absolutely no reason to run it
> with --with-batch.
>
>Make test fails because it cannot launch parallel jobs directly using
> the mpiexec it is using.
>
>You need to determine how to submit jobs on this system and then you
> are ready to go.
>
>Barry
>
>
> > On Dec 17, 2017, at 4:55 PM, Santiago Andres Triana <rep...@gmail.com>
> wrote:
> >
> > Thanks for your quick responses!
> >
> > Attached is the configure.log obtained without using the --with-batch
> option. It configures without errors but fails at the 'make test' stage. A
> snippet of the output with the error (which I attributed to the job
> manager) is:
> >
> >
> >
> > >   Local host:  hpca-login
> > >   Registerable memory: 32768 MiB
> > >   Total memory:65427 MiB
> > >
> > > Your MPI job will continue, but may be behave poorly and/or hang.
> > > 
> --
> > 3c25
> > < 0 KSP Residual norm 0.239155
> > ---
> > > 0 KSP Residual norm 0.235858
> > 6c28
> > < 0 KSP Residual norm 6.81968e-05
> > ---
> > > 0 KSP Residual norm 2.30906e-05
> > 9a32,33
> > > [hpca-login:38557] 1 more process has sent help message
> help-mpi-btl-openib.txt / reg mem limit low
> > > [hpca-login:38557] Set MCA parameter "orte_base_help_aggregate" to 0
> to see all help / error messages
> > /home/trianas/petsc-3.8.3/src/snes/examples/tutorials
> > Possible problem with ex19_fieldsplit_fieldsplit_mumps, diffs above
> > =
> > Possible error running Fortran example src/snes/examples/tutorials/ex5f
> with 1 MPI process
> > See http://www.mcs.anl.gov/petsc/documentation/faq.html
> > 
> --
> > WARNING: It appears that your OpenFabrics subsystem is configured to only
> > allow registering part of your physical memory.  This can cause MPI jobs
> to
> > run with erratic performance, hang, and/or crash.
> >
> > This may be caused by your OpenFabrics vendor limiting the amount of
> > physical memory that can be registered.  You should investigate the
> > relevant Linux kernel module parameters that control how much physical
> > memory can be registered, and increase them to allow registering all
> > physical memory on your machine.
> >
> > See this Open MPI FAQ item for more information on these Linux kernel
> module
> > parameters:
> >
> > http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
> >
> >   Local host:  hpca-login
> >   Registerable memory: 32768 MiB
> >   Total memory:65427 MiB
> >
> > Your MPI job will continue, but may be behave poorly and/or hang.
> > --------
> --
> > Number of SNES iterations = 4
> > Completed test examples
> > =
> > Now to evaluate the computer systems you plan use - do:
> > make PETSC_DIR=/home/trianas/petsc-3.8.3 PETSC_ARCH=arch-linux2-c-debug
> streams
> >
> >
> >
> >
> > On Sun, Dec 17, 2017 at 11:32 PM, Matthew Knepley <knep...@gmail.com>
> wrote:
> > On Sun, Dec 17, 2017 at 3:29 PM, Santiago Andres Triana <
> rep...@gmail.com> wrote:
> > Dear petsc-users,
> >
> > I'm trying to install petsc in a cluster that uses a job manager.  This
> is the configure command I use:
> >
> > ./configure --known-mpi-shared-libraries=1 --with-scalar-type=complex
> --with-mumps=1 --download-mumps --download-parmetis
> --with-blaslapack-dir=/sw/sdev/intel/psxe2015u3/composer_xe_2015.3.187/mkl
> --download-metis --with-scalapack=1 --download-scalapack --with-batch
> >
> > This fails when including the option --with-batch together with
> --download-scalapack:
> >
> > We need configure.log
> >
> > =

[petsc-users] configure fails with batch+scalapack

2017-12-17 Thread Santiago Andres Triana
Dear petsc-users,

I'm trying to install petsc on a cluster that uses a job manager. This is
the configure command I use:

./configure --known-mpi-shared-libraries=1 --with-scalar-type=complex
--with-mumps=1 --download-mumps --download-parmetis
--with-blaslapack-dir=/sw/sdev/intel/psxe2015u3/composer_xe_2015.3.187/mkl
--download-metis --with-scalapack=1 --download-scalapack --with-batch

This fails when including the option --with-batch together with
--download-scalapack:

===
 Configuring PETSc to compile on your system

===
TESTING: check from
config.libraries(config/BuildSystem/config/libraries.py:158)

 ***
 UNABLE to CONFIGURE with GIVEN OPTIONS(see configure.log for
details):
---
Unable to find scalapack in default locations!
Perhaps you can specify with --with-scalapack-dir=
If you do not want scalapack, then give --with-scalapack=0
You might also consider using --download-scalapack instead
***


However, if I omit the --with-batch option, the configure script manages to
succeed (it downloads and compiles scalapack; the install fails later at
the 'make test' stage because of the job manager). Any help or suggestion is
highly appreciated. Thanks in advance!

Andres