Re: [petsc-users] LU Performance

2019-07-05 Thread Abhyankar, Shrirang G via petsc-users
I believe standalone UMFPACK uses an approximate minimum degree (AMD) ordering by 
default, provided by the AMD package.

Use the option -pc_factor_mat_ordering_type amd to select the AMD ordering. You 
can also try the quotient minimum degree (qmd) ordering available in PETSc 
(-pc_factor_mat_ordering_type qmd).
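If you prefer to fix the ordering in code rather than through the options database, a 
minimal sketch (not from this thread; it assumes a KSP named ksp that already has its 
operators set) would be:

#include <petscksp.h>

/* Sketch: select the AMD ordering for the LU factorization in code.
   Equivalent to -pc_type lu -pc_factor_mat_ordering_type amd;
   use MATORDERINGQMD instead for the qmd ordering. */
PetscErrorCode UseAMDOrdering(KSP ksp)
{
  PC             pc;
  PetscErrorCode ierr;

  ierr = KSPSetType(ksp, KSPPREONLY);CHKERRQ(ierr);
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);
  ierr = PCFactorSetMatOrderingType(pc, MATORDERINGAMD);CHKERRQ(ierr);
  return 0;
}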

Thanks,
Shri
From: petsc-users  on behalf of Jed Brown via 
petsc-users 
Reply-To: Jed Brown 
Date: Friday, July 5, 2019 at 9:48 AM
To: Stefano Zampini , Jared Crean 

Cc: PETSc users list 
Subject: Re: [petsc-users] LU Performance

Stefano Zampini via petsc-users <petsc-users@mcs.anl.gov> writes:

Jared,

The petsc output shows

package used to perform factorization: petsc

You are not using umfpack, but the PETSc native LU. You can run with 
-options_left  to see the options that are not processed from the PETSc options 
database.

-pc_factor_mat_solver_package was used until PETSc 3.9.  You're using a
rather old (~2016) PETSc with the newer option name, so the option is not recognized.

https://www.mcs.anl.gov/petsc/documentation/changes/39.html
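
A quick way to confirm which spelling your PETSc accepts (a hedged example reusing the 
./test driver from the original post): run once with each option name together with 
-options_left. On a 3.7-era PETSc the *_package form should be consumed and select 
UMFPACK, while the *_type form should be reported as unused:

./test -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package umfpack -options_left
./test -ksp_type preonly -pc_type lu -pc_factor_mat_solver_type umfpack -options_left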



On Jul 5, 2019, at 10:26 AM, Jared Crean via petsc-users 
<petsc-users@mcs.anl.gov> wrote:
 This is in reply to both David and Barry's emails.
 I am using the Umfpack that Petsc built (--download-suitesparse=yes was 
passed to configure), so all the compiler flags and Blas/Lapack libraries are 
the same.  I used OpenBlas for Blas and Lapack, with multi-threading disabled.  
When calling Umfpack directly, the factorization takes about 4 seconds, compared 
to 135 seconds spent in MatLUFactorNum when using Umfpack via Petsc.
 I added a call to umfpack_di_report_control()  (which prints  the Umfpack 
parameters) to my code, and also added -mat_umfpack_prl 2 to the Petsc options, 
which should cause Petsc to call the same function just before doing the 
symbolic factorization (umfpack.c line 245 in Petsc 3.7.6). The output is 
attached (also with the -ksp_view option).  My code did print the parameters, 
but Petsc did not, which makes me think MatLUFactorSymbolic_UMFPACK never got 
called.  For reference, here is how I am invoking the program:
./test -ksp_type preonly -pc_type lu -pc_factor_mat_solver_type umfpack 
-log_view -ksp_view -mat_umfpack_prl 2 > fout_umfpacklu
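
For reference, a self-contained sketch of the direct call (not the attached test.c) that 
prints the same parameter report UMFPACK should emit from inside PETSc when 
-mat_umfpack_prl 2 is honored:

#include <stdio.h>
#include "umfpack.h"

int main(void)
{
  double Control[UMFPACK_CONTROL];

  umfpack_di_defaults(Control);        /* load UMFPACK's default parameters        */
  Control[UMFPACK_PRL] = 2;            /* print level, like -mat_umfpack_prl 2     */
  umfpack_di_report_control(Control);  /* dump ordering, strategy, thresholds, ... */
  return 0;
}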
 Jared Crean
On 7/5/19 4:02 AM, Smith, Barry F. wrote:
When you use Umfpack standalone do you use OpenMP threads? When you use 
umfpack alone do you use thread-enabled BLAS/LAPACK? Perhaps OpenBLAS or MKL?
You can run both cases with -ksp_view and it will print more details 
indicating the solver used.
 Do you use the same compiler and the same options when compiling PETSc and 
Umfpack standalone? Is the Umfpack standalone time in the numerical 
factorization much smaller? Perhaps umfpack is using a much better ordering 
than when used with PETSc (perhaps the default orderings are different).
Does Umfpack have a routine that triggers output of the parameters, etc., it is 
using? If you can trigger it you might see differences between standalone and 
not.
Barry
On Jul 4, 2019, at 4:05 PM, Jared Crean via petsc-users 
<petsc-users@mcs.anl.gov> wrote:
Hello,
 I am getting very bad performance from the Umfpack LU solver when I use it 
via Petsc compared to calling Umfpack directly. It takes about 5.5 seconds to 
factor and solve the matrix with Umfpack, but 140 seconds when I use Petsc with 
-ksp_type preonly -pc_type lu -pc_factor_mat_solver_type umfpack.
 I have attached a minimal example (test.c) that reads a matrix from a 
file, solves with Umfpack, and then solves with Petsc.  The matrix data files 
are not included because they are about 250 megabytes.  I also attached the 
output of the program with -log_view for -pc_factor_mat_solver_type umfpack 
(fout_umfpacklu) and -pc_factor_mat_solver_type petsc (fout_petsclu).  Both 
results show nearly all of the time is spent in MatLUFactorNum.  The times are 
very similar, so I am wondering if Petsc is really calling Umfpack or if the 
Petsc LU solver is getting called in both cases.
 Jared Crean
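
[A stripped-down sketch of the comparison described above, not the attached test.c, 
which reads the matrix from a file. The compressed-sparse-column arrays Ap/Ai/Ax, the 
PETSc Mat A, and the Vecs b/x are assumed to hold the same n-by-n system.]

#include <petscksp.h>
#include "umfpack.h"

PetscErrorCode SolveBothWays(int n, const int *Ap, const int *Ai, const double *Ax,
                             const double *rhs, double *sol,
                             Mat A, Vec b, Vec x)
{
  void          *Symbolic, *Numeric;
  KSP            ksp;
  PC             pc;
  PetscErrorCode ierr;

  /* Direct UMFPACK path: symbolic + numeric factorization, then one solve. */
  umfpack_di_symbolic(n, n, Ap, Ai, Ax, &Symbolic, NULL, NULL);
  umfpack_di_numeric(Ap, Ai, Ax, Symbolic, &Numeric, NULL, NULL);
  umfpack_di_solve(UMFPACK_A, Ap, Ai, Ax, sol, rhs, Numeric, NULL, NULL);
  umfpack_di_free_symbolic(&Symbolic);
  umfpack_di_free_numeric(&Numeric);

  /* PETSc path: LU preconditioner applied once (preonly). */
  ierr = KSPCreate(PETSC_COMM_SELF, &ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
  ierr = KSPSetType(ksp, KSPPREONLY);CHKERRQ(ierr);
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);
  /* Pick up -pc_factor_mat_solver_package / _type umfpack from the options
     database; -ksp_view then reports which factorization package actually ran. */
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  return 0;
}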




