Re: [petsc-dev] https://software.intel.com/en-us/devcloud/oneapi

2019-11-18 Thread Jed Brown via petsc-dev
And for those who've said that the future is exclusively GPUs, the most efficient HPC machine in the world right now is CPU only.

https://www.top500.org/green500/lists/2019/11/

On Nov 18, 2019 19:52, Jed Brown wrote:
This and OpenMP target are the recommended models for Aurora.

On Nov 18, 2019 19:13, "Balay, Satish via petsc-dev" wrote:
Ah - ok - so we need to use this oneapi for aurora..

https://hothardware.com/news/intel-ponte-vecchio-7nm-exascale-gpu-for-hpc-market

Satish



Re: [petsc-dev] https://software.intel.com/en-us/devcloud/oneapi

2019-11-18 Thread Jed Brown via petsc-dev
This and OpenMP target are the recommended models for Aurora.

On Nov 18, 2019 19:13, "Balay, Satish via petsc-dev" wrote:
Ah - ok - so we need to use this oneapi for aurora..

https://hothardware.com/news/intel-ponte-vecchio-7nm-exascale-gpu-for-hpc-market

Satish



Re: [petsc-dev] [Suggestion] Configure QOL Improvements

2019-11-09 Thread Jed Brown via petsc-dev
Matthew Knepley  writes:

> On Wed, Oct 23, 2019 at 4:59 PM Jed Brown  wrote:
>
>> Matthew Knepley  writes:
>>
>> > That is an unreliable check for Z. You would not eliminate the case
>> > where you give --with-Z, but the check fails, so Z is not available,
>> > but you do not find out until checking X or Y.
>>
>> You can verify that Z works in a fraction of a second, but building X
>> may take minutes.  Work bottom up, verify everything that is given
>> before building anything.
>>
>
> That is exactly what we do now. What are you talking about?

No, it does not.  See how libceed is built before determining that
libpng does not have its required dependency of zlib?

$ time ./configure PETSC_ARCH=ompi-png --download-libceed --download-libpng --with-fortran-interfaces=0
=============================================================================
             Configuring PETSc to compile on your system
=============================================================================
=============================================================================
  Warning: PETSC_ARCH from environment does not match command-line or name
  of script.
  Warning: Using from command-line or name of script: ompi-png, ignoring
  environment: ompi-optg
=============================================================================
=============================================================================
  Trying to download git://https://github.com/CEED/libceed.git for LIBCEED
=============================================================================
=============================================================================
  Compiling libceed; this may take several minutes
=============================================================================
=============================================================================
  Installing libceed; this may take several minutes
=============================================================================
TESTING: checkDependencies from config.packages.libpng(config/BuildSystem/config/package.py:834)
*******************************************************************************
    UNABLE to CONFIGURE with GIVEN OPTIONS (see configure.log for details):
-------------------------------------------------------------------------------
Did not find package ZLIB needed by libpng.
Enable the package using --with-zlib or --download-zlib
*******************************************************************************

38.987 real   29.069 user   8.760 sys   97.03 cpu


Note that this is most of the way through the configure before
recognizing that anything is missing.  In contrast, spack can tell me in
a fraction of a second that libpng needs zlib.  We should work out the
dependency graph first to make sure it has no unsatisfied nodes, then
check bottom-up that provided stuff works before building.

$ time spack spec libpng
Input spec

libpng

Concretized

libpng@1.6.34%gcc@9.2.0 arch=linux-arch-skylake
    ^zlib@1.2.11%gcc@9.2.0+optimize+pic+shared arch=linux-arch-skylake

0.407 real   0.358 user   0.034 sys   96.44 cpu


Re: [petsc-dev] Right-preconditioned GMRES

2019-11-06 Thread Jed Brown via petsc-dev
"Smith, Barry F. via petsc-dev"  writes:

>   Some idiot logged what they did, but not why they did it.
>
> commit bf108f309acab50613e150419c680842cf4b8a05 (HEAD)
> Author: Barry Smith 
> Date:   Thu Mar 18 20:40:53 2004 -0600
>
> bk-changeset-1.2063.1.1
> barrysmith@barry-smiths-computer.local|ChangeSet|20040319024053|12244
> ChangeSet
>   1.2063.1.1 04/03/18 20:40:53 barrysmith@barry-smiths-computer.local +5 
> -0
>   if matrix is symmetric try to use preconditioner options for symmetric 
> matrices
>
>
>Here is my guess as to this guy's reasoning 15 years ago: if the user knows 
> their problem is SPD and they thus switch to CG they will get garbage using 
> the default ASM type, they will be confused and unhappy with the result not 
> realizing that it is due to an inappropriate preconditioner default when 
> using CG. The penalty is, of course, someone using GMRES will get slower 
> convergence than they should as you point out.
>
>Today I think we could do better. We could introduce the concept of
>a "symmetric" preconditioner PCIsSymmetric() PCIsSymmetricKnown()
>and then CG/all KSP that require symmetric preconditioners could
>query this information and error immediately if the PC indicates it
>is NOT symmetric. 

There are some methods in the literature where the preconditioner is
nonsymmetric, but the RHS (or preconditioned RHS) is in some benign
space in which the preconditioned operator is actually symmetric.  So
I'd rather not error, but it'd be fine to require explicit instructions
(via options or the API) to use a nonsymmetric PC with CG/MINRES.


Re: [petsc-dev] note on submiting gitlab pipelines

2019-10-31 Thread Jed Brown via petsc-dev
"Balay, Satish via petsc-dev"  writes:

> Just a reminder:
>
> We've had regular changes to CI - [.gitlab-ci.yaml] - so its generally
> a good idea to rebase branches to latest maint/master before starting
> any new test pipeline.
>
> One can always check if .gitlab-ci.yaml was updated with:
>
> git fetch
> git log my-branch..master .gitlab-ci.yaml

The fetch updates 'origin/master', not 'master', so the appropriate
check is

  git log ..origin/master .gitlab-ci.yaml

When you rebase, you should try to confirm that intermediate commits on
your branch still compile (otherwise you can break bisection).

> [even without an updated .gitlab-ci.yaml - a rebase generally helps
> with better usage of package cache for the builds - thus reducing
> rebuilds of external-packages]
>
> Satish
>
> On Wed, 23 Oct 2019, Balay, Satish via petsc-dev wrote:
>
>> A note to all petsc developers @gitlab who start test pipelines on
>> MRs:
>> 
>> Please rebase the MR branch over latest master (or maint - if
>> appropriate) before starting the test pipeline to use the latest
>> ci fixes.
>> 
>> i.e. make sure e73aa2c6ed is in your branch - when the test is done.
>> Without this fix - you might see a success status even though some
>> tests have failed. [so the results would be useless].
>> 
>> One way to check if your branch has this fix:
>> 
>> git branch --contains e73aa2c6ed 
>> 
>> thanks,
>> Satish
>> 


Re: [petsc-dev] AVX kernels, old gcc, still broken

2019-10-26 Thread Jed Brown via petsc-dev
"Smith, Barry F."  writes:

>> On Oct 26, 2019, at 9:09 AM, Jed Brown  wrote:
>> 
>> "Smith, Barry F."  writes:
>> 
>>>  The proposed fix is #if defined(PETSC_USE_AVX512_KERNELS)   && && && && && 
>>> in https://gitlab.com/petsc/petsc/merge_requests/2213/diffs
>> 
>> Looks fine; approved.
>> 
>>> but note that PETSC_USE_AVX512_KERNELS does not even do a configure check 
>>> to make sure it is valid. The user has to guess that passing that flag will 
>>> work. Of course a proper configure test is needed and since a proper test 
>>> is needed it can handle all the issues in one place instead of having one 
>>> issue in  configure and n - 1 in the source code. 
>> 
>> What are "all the issues"?  32-bit indices, precision=double,
>> scalar=real?  So we'll need 8 CPP macros that test each of those
>> combinations?
>
>No, if suddenly there is support for single precision for example, the 
> developer would modify the configure test to turn on PETSC_USE_AVX512_KERNELS 
> for that additional  case and not touch the source code at all; 


1. We still need to distinguish code paths for single and double.  They
definitely have to touch the source code because double-precision
intrinsics don't work on single-precision data (see the sketch after
point 3 below).

2. The work is inevitably done incrementally so the developer needs a
#if test to work one kernel at a time.  It's error-prone to create a new
preprocessor macro for testing and change it before submitting the MR.
It also forces all single-precision support into a single commit that
implements several kernels instead of just one.

3. "git pull && make libs" will fail for every user of master because
they need to reconfigure.  Testing the branch before merging becomes a
many-minutes job instead of 10 seconds to recompile a few files and run
a test.  Switching back to 'master' after commenting/approval might
require getting back the old macros, thus another reconfigure.
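
To make point 1 concrete, here is a tiny sketch (the function names are
illustrative, not PETSc source; it assumes an AVX-512 CPU and a compiler
that actually provides the reduce intrinsics, which, as this thread shows,
old gcc does not): the intrinsics are typed, so a kernel written against
__m512d cannot be reused for single precision.

#include <immintrin.h>

/* Double and single precision use different intrinsic types and names,
   so the two code paths cannot share a body.  Compile with -mavx512f. */
double sum8_double(const double *x)
{
  __m512d v = _mm512_loadu_pd(x);   /* 8 doubles per 512-bit register */
  return _mm512_reduce_add_pd(v);
}

float sum16_float(const float *x)
{
  __m512 v = _mm512_loadu_ps(x);    /* 16 floats per 512-bit register */
  return _mm512_reduce_add_ps(v);
}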


Re: [petsc-dev] AVX kernels, old gcc, still broken

2019-10-26 Thread Jed Brown via petsc-dev
"Smith, Barry F."  writes:

>   The proposed fix is #if defined(PETSC_USE_AVX512_KERNELS)   && && && && && 
> in https://gitlab.com/petsc/petsc/merge_requests/2213/diffs

Looks fine; approved.

> but note that PETSC_USE_AVX512_KERNELS does not even do a configure check to 
> make sure it is valid. The user has to guess that passing that flag will 
> work. Of course a proper configure test is needed and since a proper test is 
> needed it can handle all the issues in one place instead of having one issue 
> in  configure and n - 1 in the source code. 

What are "all the issues"?  32-bit indices, precision=double,
scalar=real?  So we'll need 8 CPP macros that test each of those
combinations?

>   This is a basic implementation disagreement, I hate CPP and think it should 
> be used minimally, you hate configure and think it should be used minimally.


Re: [petsc-dev] AVX kernels, old gcc, still broken

2019-10-25 Thread Jed Brown via petsc-dev
"Smith, Barry F. via petsc-dev"  writes:

>This needs to be fixed properly with a configure test(s) and not with huge 
> and inconsistent checks like this  
>
> #if defined(PETSC_HAVE_IMMINTRIN_H) && defined(__AVX512F__) && 
> defined(PETSC_USE_REAL_DOUBLE) && !defined(PETSC_USE_COMPLEX) && 
> !defined(PETSC_USE_64BIT_INDICES)  or this
>
> #elif defined(PETSC_USE_AVX512_KERNELS) && defined(PETSC_HAVE_IMMINTRIN_H) && 
> defined(__AVX512F__) && defined(PETSC_USE_REAL_DOUBLE) && 
> !defined(PETSC_USE_COMPLEX) && !defined(PETSC_USE_64BIT_INDICES) && 
> !defined(PETSC_SKIP_IMMINTRIN_H_CUDAWORKAROUND)
>
>
>
>self.useAVX512Kernels = self.framework.argDB['with-avx512-kernels']
> if self.useAVX512Kernels:
>   self.addDefine('USE_AVX512_KERNELS', 1)
>
>  Here you should check that the needed include files are available, that 
> it is 32 bit integers, that defined(__AVX512F__) exists and that appropriate 
> functions exist. 

What happens when one AVX512 kernel gets a 64-bit integer or
single-precision implementation?  You'll have new macros for each
combination of criteria instead of using && on the normal macros?

We should investigate whether #pragma omp simd or ivdep can generate
comparable vectorization without directly using the intrinsics -- those
are quite a bit more portable when they work, and they often work with
some experimentation.
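
For example, a portable sketch of that alternative (the kernel name is
illustrative; the quality of the generated code depends on the compiler and
flags such as -march=native and -fopenmp-simd):

/* y += a*x: let the compiler vectorize a plain loop via an OpenMP SIMD hint
   instead of hand-written AVX-512 intrinsics. */
void axpy_kernel(int n, double a, const double *restrict x, double *restrict y)
{
  int i;
#if defined(_OPENMP)
  #pragma omp simd
#endif
  for (i = 0; i < n; i++) y[i] += a * x[i];
}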

> Maybe you need two configure checks if there are two different types of 
> functionality you are trying to catch? 
>
> Continuing to hack away with gross #if def means this crud is 
> unmaintainable and will always haunt you. Yes the incremental cost of doing a 
> proper configure test is there but once that is done the maintenance costs 
> (which have been haunting use for months) will be gone.
>
>   Barry
>
>
>
>
>
>
>
>
>
>> On Oct 25, 2019, at 2:49 AM, Lisandro Dalcin via petsc-dev 
>>  wrote:
>> 
>> 
>> 
>> On Fri, 25 Oct 2019 at 01:40, Balay, Satish  wrote:
>> I'm curious why this issue comes up for you. The code was unrelated to 
>> --with-avx512-kernels=0 option.
>> 
>> Its relying on __AVX512F__and PETSC_HAVE_IMMINTRIN_H flags. And
>> assumes immintrin.h has a definition for _mm512_reduce_add_pd()
>> 
>> Is the flag __AVX512F__ always set on your machine by gcc? 
>> 
>> And does this change based on the hardware? I just tried this build
>> [same os/compiler] on "Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz" - and
>> can't reproduce the issue.
>> 
>> I do see _mm512_reduce_add_pd is missing from immintrin.h - but the
>> flag __AVX512F__ is not set for me.
>> 
>> 
>> Of course it is not set, you are just invoking the preprocessor. Try this 
>> way:
>> 
>> $ cat xyz.c
>> #if defined __AVX512F__
>> #error "avx512f flag set"
>> #endif
>> 
>> $ gcc -march=native -c xyz.c
>> xyz.c:2:2: error: #error "avx512f flag set"
>>  #error "avx512f flag set"
>> 
>> I forgot to mention my XXXOPTFLAGS, full reconfigure script below.
>> Do we have some Ubuntu 16 builder using system GCC?
>>  Maybe we should use `-march=native -O3 -g3` in one of these builders?
>> 
>> $ cat arch-gnu-opt/lib/petsc/conf/reconfigure-arch-gnu-opt.py 
>> #!/usr/bin/python
>> if __name__ == '__main__':
>>   import sys
>>   import os
>>   sys.path.insert(0, os.path.abspath('config'))
>>   import configure
>>   configure_options = [
>> '--COPTFLAGS=-march=native -mtune=native -O3',
>> '--CXXOPTFLAGS=-march=native -mtune=native -O3',
>> '--FOPTFLAGS=-march=native -mtune=native -O3',
>> '--download-metis=1',
>> '--download-p4est=1',
>> '--download-parmetis=1',
>> '--with-avx512-kernels=0',
>> '--with-debugging=0',
>> '--with-fortran-bindings=0',
>> '--with-zlib=1',
>> 'CC=mpicc',
>> 'CXX=mpicxx',
>> 'FC=mpifort',
>> 'PETSC_ARCH=arch-gnu-opt',
>>   ]
>>   configure.petsc_configure(configure_options)
>> 
>> 
>> -- 
>> Lisandro Dalcin
>> 
>> Research Scientist
>> Extreme Computing Research Center (ECRC)
>> King Abdullah University of Science and Technology (KAUST)
>> http://ecrc.kaust.edu.sa/


Re: [petsc-dev] Feed back on report on performance of vector operations on Summit requested

2019-10-23 Thread Jed Brown via petsc-dev
IMO, Figures 2 and 7+ are more interesting when the x axis (vector size)
is replaced by execution time.  We don't scale by fixing the resource
and increasing the problem size, we choose the global problem size based
on accuracy/model complexity and choose a Pareto tradeoff of execution
time with efficiency (1/cost) to decide how many nodes to use.  Most of
those sloping tails on the left become vertical lines under that
transformation.

How is latency defined in Figure 6?

Data upon which the latency-bandwidth model is derived should be plotted
to show the fit, and the model needs to be constrained to avoid negative
latency.

If you give me access to the repository with data and current plotting
scripts, I can take a crack at slicing it in the way that I think would
be useful.

"Smith, Barry F. via petsc-dev"  writes:

>We've prepared a short report on the performance of vector operations on 
> Summit and would appreciate any feed back including: inconsistencies, lack of 
> clarity, incorrect notation or terminology, etc.
>
>Thanks
>
> Barry, Hannah, and Richard


Re: [petsc-dev] [Suggestion] Configure QOL Improvements

2019-10-23 Thread Jed Brown via petsc-dev
Matthew Knepley  writes:

> That is an unreliable check for Z. You would not eliminate the case
> where you give --with-Z, but the check fails, so Z is not available,
> but you do not find out until checking X or Y.

You can verify that Z works in a fraction of a second, but building X
may take minutes.  Work bottom up, verify everything that is given
before building anything.


Re: [petsc-dev] [Suggestion] Configure QOL Improvements

2019-10-23 Thread Jed Brown via petsc-dev
Matthew Knepley  writes:

> On Wed, Oct 23, 2019 at 2:27 PM Jed Brown  wrote:
>
>> Matthew Knepley via petsc-dev  writes:
>>
>> > On Wed, Oct 23, 2019 at 11:24 AM Faibussowitsch, Jacob via petsc-dev <
>> > petsc-dev@mcs.anl.gov> wrote:
>> >> As I am largely unfamiliar with the internals of the configure process,
>> >> this is potentially more of an involved change than I am imagining,
>> given
>> >> that many libraries likely have many small dependencies and hooks which
>> >> have to be set throughout the configuration process, and so it's possible
>> >> not everything could be skipped.
>> >>
>> >
>> > We had this many years ago. It was removed because the benefits did not
>> > outweigh the costs.
>>
>> I don't know if it's still the case, but it should be possible to run
>> non-interactively (like apt-get -y).  My bigger complaint is that
>> missing dependencies aren't resolved in the first couple seconds.
>>
>
> How do you know that you actually have something until you actually
> run the tests? This is the classic misconception of pkg-config, "I'll
> just believe the user", which generated 99% of user mail over the
> first 20 years of PETSc.

All other build systems get this right.  You're asking to build X and Y,
where Y depends on Z, and there is no --download-Z or --with-Z.  You
shouldn't need to build X before noticing that Z is unavailable.


Re: [petsc-dev] [Suggestion] Configure QOL Improvements

2019-10-23 Thread Jed Brown via petsc-dev
Matthew Knepley via petsc-dev  writes:

> On Wed, Oct 23, 2019 at 11:24 AM Faibussowitsch, Jacob via petsc-dev <
> petsc-dev@mcs.anl.gov> wrote:
>> As I am largely unfamiliar with the internals of the configure process,
>> this is potentially more of an involved change than I am imagining, given
>> that many libraries likely have many small dependencies and hooks which
>> have to be set throughout the configuration process, and so it's possible
>> not everything could be skipped.
>>
>
> We had this many years ago. It was removed because the benefits did not
> outweigh the costs.

I don't know if it's still the case, but it should be possible to run
non-interactively (like apt-get -y).  My bigger complaint is that
missing dependencies aren't resolved in the first couple seconds.


Re: [petsc-dev] Wrong "failed tests" command

2019-10-21 Thread Jed Brown via petsc-dev
Switch to search (we really don't need to invoke Python for this) and slap a %
on the end of anything that might have something extra after it.

make -f gmakefile test globsearch='mat_tests-ex128_1 
mat_tests-ex37_nsize-2_mat_type-mpibaij_mat_block_size%'

Scott Kruger  writes:

> When we created the map of directory+test+variants to targets we did not 
> use enough delimiters to allow the inverse map to be determined so 
> creating the proper list of target names is an ill-posed problem.   The 
> globsearch was a way of trying to just catch more of them, but obviously 
> it still has a bug.  I'll look at it.
>
> Scott
>
>
> On 10/21/19 12:47 PM, Jed Brown wrote:
>> Yeah, it's missing the numeric block size.  The following works
>> 
>> /usr/bin/make -f gmakefile test globsearch='mat_tests-ex128_1 
>> mat_tests-ex37_nsize-2_mat_type-mpibaij_mat_block_size-1'
>> 
>> Also, globsearch can be replaced by search in this usage.
>> 
>> "Smith, Barry F. via petsc-dev"  writes:
>> 
>>>May need more work on the tester infrastructure?
>>>
 On Oct 21, 2019, at 12:30 PM, Pierre Jolivet via petsc-dev 
  wrote:

 Hello,
 In this pipeline build log, 
 https://gitlab.com/petsc/petsc/-/jobs/326525063, it shows that I can rerun 
 failed tests using the following command:
 /usr/bin/make -f gmakefile test globsearch='mat_tests-ex128_1 
 mat_tests-ex37_nsize-2_mat_type-mpibaij_mat_block_size 
 mat_tests-ex37_nsize-1_mat_type-mpibaij_mat_block_size mat_tests-ex128_2 
 mat_tests-ex37_nsize-2_mat_type-baij_mat_block_size 
 mat_tests-ex37_nsize-1_mat_type-baij_mat_block_size mat_tests-ex30_4 
 mat_tests-ex37_nsize-2_mat_type-sbaij_mat_block_size mat_tests-ex18_* 
 mat_tests-ex76_3 mat_tests-ex37_nsize-2_mat_type-mpisbaij_mat_block_size 
 mat_tests-ex37_nsize-1_mat_type-sbaij_mat_block_size'

 If used, this command does not run any of the mat_test-ex37* tests.

 Thanks,
 Pierre
>
> -- 
> Tech-X Corporation   kru...@txcorp.com
> 5621 Arapahoe Ave, Suite A   Phone: (720) 974-1841
> Boulder, CO 80303Fax:   (303) 448-7756


Re: [petsc-dev] Wrong "failed tests" command

2019-10-21 Thread Jed Brown via petsc-dev
Yeah, it's missing the numeric block size.  The following works

/usr/bin/make -f gmakefile test globsearch='mat_tests-ex128_1 
mat_tests-ex37_nsize-2_mat_type-mpibaij_mat_block_size-1'

Also, globsearch can be replaced by search in this usage.

"Smith, Barry F. via petsc-dev"  writes:

>   May need more work on the tester infrastructure?
>
>> On Oct 21, 2019, at 12:30 PM, Pierre Jolivet via petsc-dev 
>>  wrote:
>> 
>> Hello,
>> In this pipeline build log, https://gitlab.com/petsc/petsc/-/jobs/326525063, 
>> it shows that I can rerun failed tests using the following command:
>> /usr/bin/make -f gmakefile test globsearch='mat_tests-ex128_1 
>> mat_tests-ex37_nsize-2_mat_type-mpibaij_mat_block_size 
>> mat_tests-ex37_nsize-1_mat_type-mpibaij_mat_block_size mat_tests-ex128_2 
>> mat_tests-ex37_nsize-2_mat_type-baij_mat_block_size 
>> mat_tests-ex37_nsize-1_mat_type-baij_mat_block_size mat_tests-ex30_4 
>> mat_tests-ex37_nsize-2_mat_type-sbaij_mat_block_size mat_tests-ex18_* 
>> mat_tests-ex76_3 mat_tests-ex37_nsize-2_mat_type-mpisbaij_mat_block_size 
>> mat_tests-ex37_nsize-1_mat_type-sbaij_mat_block_size'
>> 
>> If used, this command does not run any of the mat_test-ex37* tests.
>> 
>> Thanks,
>> Pierre


Re: [petsc-dev] "participants" on gitlab

2019-10-21 Thread Jed Brown via petsc-dev
All "developers" are listed as able to grant (optional) approvals --
approval from codeowners/integrators is still needed regardless of those
optional approvals.  We should perhaps remove that because I don't know
a way to have some able to approve without the notification problem you
mention below.  Unfortunately, I think that reduces incentive to review,
and we're always stressed for reviewing resources.

"Zhang, Hong via petsc-dev"  writes:

> How is the list of participants determined when a MR is created on gitlab? It 
> seems to include everybody by default. Is there any way to shorten the list? 
> Ideally only the participants involved in the particular MR should be picked. 
> Note that currently there is a huge gap between the ''Participate'' and ''On 
> mention'' levels in the notification settings. With the former, I get spammed 
> with notifications whenever a new MR is created. With the later, I won’t 
> receive any notification (even someone replied my comments) unless explicitly 
> @ by someone.
>
> Hong (Mr.)


Re: [petsc-dev] ksp_error_if_not_converged in multilevel solvers

2019-10-21 Thread Jed Brown via petsc-dev
Pierre Jolivet via petsc-dev  writes:

> On Oct 20, 2019, at 6:07 PM, "Smith, Barry F."  wrote:
>
>> 
>>   The reason the code works this way is that normally 
>> -ksp_error_if_not_converged is propagated into the inner (and innerer) 
>> solves and normally it is desirable that these inner solves do not error 
>> simply because they reach the maximum number of iterations since for nested 
>> iterative methods generally we don't need or care if the inner solves 
>> "converge". 
>
> I fully agree with you on the last part of the above sentence. Thus, this 
> makes me question the first part (which I wasn't aware of): why is 
> error_if_not_converged being propagated to inner solves?
> I'm sure there are good usages, but if one cares that ksp_1 (which depends on 
> ksp_2) converges, why should an error be thrown if ksp_2 does not converge as 
> long as ksp_1 does (I guess this goes along your last paragraph)?

What if the user is debugging a singular or indefinite coarse operator
when they expect it to be SPD?  Sure, they could set that flag
directly for the coarse KSP via the options database.


Re: [petsc-dev] BlockGetIndices and GetBlockIndices

2019-10-21 Thread Jed Brown via petsc-dev
Pierre Jolivet via petsc-dev  writes:

>> On 21 Oct 2019, at 7:52 AM, Smith, Barry F.  wrote:
>> 
>> 
>> 
>>> On Oct 21, 2019, at 12:23 AM, Pierre Jolivet  
>>> wrote:
>>> 
>>> 
>>> 
 On 21 Oct 2019, at 7:11 AM, Smith, Barry F.  wrote:
 
 
 
> On Oct 20, 2019, at 11:52 PM, Pierre Jolivet  
> wrote:
> 
> 
> 
>> On 21 Oct 2019, at 6:42 AM, Smith, Barry F.  wrote:
>> 
>> Could you provide a use case where you want to access/have a block size 
>> of a IS that is not an ISBlock? 
> 
> In the end, all I really want is to get access to the underlying 
> is->data->idx without having to worry about the subclass of is.
> I don’t have such a use case, but I don’t think this is really related to 
> what I want to achieve (or maybe it is…).
 
 ISGetIndices()
>>> 
>>> Not true for ISBlock with bs > 1.
>> 
>>  Certainly supposed to be, is there a bug?
>> 
>> static PetscErrorCode ISGetIndices_Block(IS in,const PetscInt *idx[])
>> {
>>  IS_Block   *sub = (IS_Block*)in->data;
>>  PetscErrorCode ierr;
>>  PetscInt   i,j,k,bs,n,*ii,*jj;
>> 
>>  PetscFunctionBegin;
>>  ierr = PetscLayoutGetBlockSize(in->map, &bs);CHKERRQ(ierr);
>> 
>> Dang, there is that stupid layout stuff again. Who put this crap in. 
>> 
>>  ierr = PetscLayoutGetLocalSize(in->map, &n);CHKERRQ(ierr);
>>  n   /= bs;
>>  if (bs == 1) *idx = sub->idx;
>>  else {
>
> There it is, I don’t want this if/else. ISGetBlockIndices would have been a 
> function always returning sub->idx.

Your code still can't skip the branch because ISGeneral can have bs>1.
Block size in IS means "these indices go together" while ISBlock imposes
the further constraint: "indices in each block are contiguous".  So you
can't just take the code you quoted where it returns the raw idx:

if (!isblock) {
  ISGetIndices(is,&indices);
  ISLocalToGlobalMappingCreate(comm,1,n,indices,PETSC_COPY_VALUES,mapping);
  ISRestoreIndices(is,&indices);
} else {
  ISGetBlockSize(is,&bs);
  ISBlockGetIndices(is,&indices);
  ISLocalToGlobalMappingCreate(comm,bs,n/bs,indices,PETSC_COPY_VALUES,mapping);
  ISBlockRestoreIndices(is,&indices);
}


PetscErrorCode  ISGetBlockSize(IS is,PetscInt *size)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = PetscLayoutGetBlockSize(is->map, size);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}
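

To make that distinction concrete, a small illustrative sketch (the values
and the function name are made up): an ISGeneral can carry a block size
without its blocks being contiguous, whereas ISBlock stores block indices
whose expanded entries are contiguous by construction.

#include <petscis.h>

static PetscErrorCode BlockSizeExample(void)
{
  IS             isg, isb;
  PetscInt       gidx[4] = {3, 7, 10, 14}; /* two "blocks" of 2, not contiguous */
  PetscInt       bidx[2] = {3, 10};        /* block indices: expand to 6,7 and 20,21 */
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = ISCreateGeneral(PETSC_COMM_SELF, 4, gidx, PETSC_COPY_VALUES, &isg);CHKERRQ(ierr);
  ierr = ISSetBlockSize(isg, 2);CHKERRQ(ierr); /* "these indices go together", no contiguity implied */
  ierr = ISCreateBlock(PETSC_COMM_SELF, 2, 2, bidx, PETSC_COPY_VALUES, &isb);CHKERRQ(ierr);
  ierr = ISDestroy(&isg);CHKERRQ(ierr);
  ierr = ISDestroy(&isb);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}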



Re: [petsc-dev] PetscLayoutFindOwner and PetscLayoutFindOwnerIndex

2019-10-16 Thread Jed Brown via petsc-dev
Pierre Jolivet via petsc-dev  writes:

>> On 16 Oct 2019, at 8:01 PM, Zhang, Junchao  wrote:
>> 
>> The value of "owner" should fit in PetscMPIInt.
>
> Are you implying that BuildSystem always promotes PetscInt to be able to 
> store a PetscMPIInt (what if you configure with 32 bit indices and a 64 bit 
> MPI implementation)?

This isn't possible at present.  MPI uses "int" and PETSc uses either
"int" or "int64_t".  On ILP64, they're all 64-bit, but there is
currently no way to use int32_t on ILP64 and there would need to be a
new MPI standard to make it use int64_t on LP64.
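
A small sketch of the distinction (the helper name is made up): PetscMPIInt
is always a plain C int because that is what MPI takes, while PetscInt may
be int64_t, so conversions should go through a checked cast.

#include <petscsys.h>

/* Convert an owner computed as a PetscInt into an MPI rank (PetscMPIInt),
   erroring cleanly if it would overflow a 32-bit int. */
static PetscErrorCode OwnerToRank(PetscInt owner, PetscMPIInt *rank)
{
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = PetscMPIIntCast(owner, rank);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}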


Re: [petsc-dev] BlockGetIndices and GetBlockIndices

2019-10-16 Thread Jed Brown via petsc-dev
Stefano Zampini  writes:

> I just took a look at the ISGENERAL code. ISSetBlockSize_General just sets 
> the block size of the layout (??)
> ISGetIndices always return the data->idx memory.
> So, a more profound question is: what is the model behind setting the block 
> size on a ISGENERAL? And on a IS in general?

My recollection is that it was added to support MatNest and
MatGetLocalSubMatrix preserving block sizes so that MatSetValuesBlocked
could be used.


Re: [petsc-dev] BlockGetIndices and GetBlockIndices

2019-10-16 Thread Jed Brown via petsc-dev
Stefano Zampini via petsc-dev  writes:

>> Thoughts and/or comments? Would it make sense to add an 
>> ISGetBlockIndices/ISRestoreBlockIndices or would that be too confusing for 
>> the user?
>
> That would be more general and I think it makes sense, and should pair with  
> ISGetBlockSize

What happens if you call ISGetBlockIndices on an ISGeneral for which
blocks are not contiguous?


Re: [petsc-dev] Periodic meshes with <3 elements per edge?

2019-10-15 Thread Jed Brown via petsc-dev
I think this thread got dropped when I was on travel (two months ago and
I'm just now getting back to it, eek!).  Matt, could you please comment
on this model?

Jed Brown via petsc-dev  writes:

> Matthew Knepley  writes:
>
>>>> >> The local points could be distinct for
>>>> >> both fields and coordinates, with the global SF de-duplicating the
>>>> >> periodic points for fields, versus leaving them distinct for
>>>> >> coordinates.
>>>> >
>>>> >
>>>> > Oh, no I would never do that.
>>>>
>>>> Can you help me understand why that model is bad?
>>>>
>>>
>>> I'm also interested in the answer to this question, because I am
>>> considering something similar for DMStag; if DM has a periodic BC, the
>>> corresponding coordinate DM has a "none"  BC, so the boundary points are
>>> duplicated - this would hopefully make it much easier to locate particles
>>> in elements.
>>>
>>
>> If you start asking topological questions of the mesh, it looked
>> complicated to get them all right. For example, if you start expanding
>> the overlap over the periodic boundary. 
>
> How is this different from what we have now?  You have to go through
> global points anyway to connect between processors, so why would it
> matter if the point and its periodic alias may appear separately in a
> local space?
>
>> Fundamentally, periodicity is a topological notion. It is not defined
>> by the coordinate chart.
>
> The global SF would be the same as you have now.  The local SF would
> distinguish the alias only so those points would be valid in the
> coordinate chart.  So the periodic mesh
>
>   A -- B -- C -- D -- a
>
> on two processes would be represented via the cones
>
>   {AB, BC}  {CD, Da}
>
> with l2g
>
>   {0,1,2} {2,3,0} for fields
>   {0,1,2} {2,3,4} for coordinates
>
>
> Why doesn't this work, or where is the greater complexity of this model
> versus the present scheme of localizing coordinates?


Re: [petsc-dev] Right-preconditioned GMRES

2019-10-13 Thread Jed Brown via petsc-dev
"Smith, Barry F. via petsc-dev"  writes:

>   Is this one process with one subdomain? (And hence no meaningful overlap 
> since there is nothing to overlap?) And you expect to get the "exact" answer 
> on one iteration? 
>
>   Please run the right preconditioned GMRES with -pc_asm_type [restrict and 
> basic and none]  -ksp_monitor_true_solution  and send the output for the 
> three cases.
>
>   For kicks you can also try FGMRES (which always uses right preconditioning) 
> to see if the same problem appears.

Note that FGMRES (and GCR) does not need to apply the preconditioner
again to recover the solution, thus you save one preconditioner
application by choosing them (in exchange for needing to store two
vectors per Krylov iteration).  This often pays off when the iteration
count is <10 or so.  It's also a reason why results may be different,
e.g., in case of null space issues.


Re: [petsc-dev] Why no SpGEMM support in AIJCUSPARSE and AIJVIENNACL?

2019-10-03 Thread Jed Brown via petsc-dev
Karl Rupp  writes:

>> Do you have any experience with nsparse?
>> 
>> https://github.com/EBD-CREST/nsparse
>> 
>> I've seen claims that it is much faster than cuSPARSE for sparse
>> matrix-matrix products.
>
> I haven't tried nsparse, no.
>
> But since the performance comes from a hardware feature (cache), I would 
> be surprised if there is a big performance leap over ViennaCL. (There's 
> certainly some potential for some tweaking of ViennaCL's kernels; but 
> note that even ViennaCL is much faster than cuSPARSE's spGEMM on average).
>
> With the libaxb-wrapper we can just add nsparse as an operations backend 
> and then easily try it out and compare against the other packages. In 
> the end it doesn't matter which package provides the best performance; 
> we just want to leverage it :-)

Indeed.  It'll be interesting to compare whenever someone has time to
add the interface.  I guess we could compare with a Matrix Market file
any time (using the driver distributed with nsparse).


Re: [petsc-dev] Why no SpGEMM support in AIJCUSPARSE and AIJVIENNACL?

2019-10-02 Thread Jed Brown via petsc-dev
Do you have any experience with nsparse?

https://github.com/EBD-CREST/nsparse

I've seen claims that it is much faster than cuSPARSE for sparse
matrix-matrix products.

Karl Rupp via petsc-dev  writes:

> Hi Richard,
>
> CPU spGEMM is about twice as fast even on the GPU-friendly case of a 
> single rank: http://viennacl.sourceforge.net/viennacl-benchmarks-spmm.html
>
> I agree that it would be good to have a GPU-MatMatMult for the sake of 
> experiments. Under these performance constraints it's not top priority, 
> though.
>
> Best regards,
> Karli
>
>
> On 10/3/19 12:00 AM, Mills, Richard Tran via petsc-dev wrote:
>> Fellow PETSc developers,
>> 
>> I am wondering why the AIJCUSPARSE and AIJVIENNACL matrix types do not 
>> support the sparse matrix-matrix multiplication (SpGEMM, or MatMatMult() 
>> in PETSc parlance) routines provided by cuSPARSE and ViennaCL, 
>> respectively. Is there a good reason that I shouldn't add those? My 
>> guess is that support was not added because SpGEMM is hard to do well on 
>> a GPU compared to many CPUs (it is hard to compete with, say, Intel Xeon 
>> CPUs with their huge caches) and it has been the case that one would 
>> generally be better off doing these operations on the CPU. Since the 
>> trend at the big supercomputing centers seems to be to put more and more 
>> of the computational power into GPUs, I'm thinking that I should add the 
>> option to use the GPU library routines for SpGEMM, though. Is there some 
>> good reason to *not* do this that I am not aware of? (Maybe the CPUs are 
>> better for this even on a machine like Summit, but I think we're at the 
>> point that we should at least be able to experimentally verify this.)
>> 
>> --Richard


Re: [petsc-dev] Better error message for missing components

2019-10-01 Thread Jed Brown via petsc-dev
Matthew Knepley via petsc-dev  writes:

> Can anyone think of a way to get a better message from

We could register all types and implement PetscViewerCreate_HDF5() to
raise an error when not configured with HDF5.  The "downside" is that
-help would show implementations that aren't supported by the current
configuration, but I see that as a minor consequence, and perhaps not
even negative (because users would learn about implementations that
might be useful).
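
A rough sketch of that idea (illustrative only, not the actual PETSc
source): register the "hdf5" type name unconditionally and let the
constructor stub raise PETSC_ERR_SUP when HDF5 support was not configured.

#include <petscviewer.h>

#if !defined(PETSC_HAVE_HDF5)
/* Stub registered under the "hdf5" type name so the error can be specific. */
PETSC_EXTERN PetscErrorCode PetscViewerCreate_HDF5(PetscViewer v)
{
  PetscFunctionBegin;
  SETERRQ(PetscObjectComm((PetscObject)v), PETSC_ERR_SUP,
          "PETSc was not configured with HDF5: reconfigure with --download-hdf5 or --with-hdf5-dir=<dir>");
  PetscFunctionReturn(0);
}
#endif

PetscViewerRegisterAll() would then register this stub via
PetscViewerRegister(PETSCVIEWERHDF5, PetscViewerCreate_HDF5) just as it
would register the real constructor.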

> [0]PETSC ERROR: - Error Message
> --
> [0]PETSC ERROR: Unknown type. Check for miss-spelling or missing package:
> http://www.mcs.anl.gov/petsc/documentation/installation.html#external
> [0]PETSC ERROR: Unknown PetscViewer type given: hdf5
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for
> trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.10.2, Oct, 09, 2018
> [0]PETSC ERROR: ./testPlex on a  named kif.eng.buffalo.edu by bldenton Mon
> Sep 30 21:18:30 2019
> [0]PETSC ERROR: Configure options --with-make-np=24
> --prefix=/kif1/data/shared/software/libs/petsc/3.10.2/gcc/8.2.0/mpich/3.2.1/openblas/0.2.20/opt
> --with-debugging=false --COPTFLAGS="-O3 -mavx" --CXXOPTFLAGS="-O3 -mavx"
> --FOPTFLAGS=-O3 --with-shared-libraries=1
> --with-mpi-dir=/kif1/data/shared/software/libs/mpich/3.2.1/gcc/8.2.0
> --with-mumps=true --download-mumps=1 --with-metis=true --download-metis=1
> --with-parmetis=true --download-parmetis=1 --with-superlu=true
> --download-superlu=1 --with-superludir=true --download-superlu_dist=1
> --with-blacs=true --download-blacs=1 --with-scalapack=true
> --download-scalapack=1 --with-hypre=true --download-hypre=1
> --with-blas-lib="[/kif1/data/shared/software/libs/openblas/0.2.20/gcc/8.2.0/lib/libopenblas.so]"
> --with-lapack-lib="[/kif1/data/shared/software/libs/openblas/0.2.20/gcc/8.2.0/lib/libopenblas.so]"
> --LDFLAGS=
> [0]PETSC ERROR: #1 PetscViewerSetType() line 444 in
> /kif1/data/shared/software/builddir/petsc-L9c6Pv/petsc-3.10.2/src/sys/classes/viewer/interface/viewreg.c
> [0]PETSC ERROR: #2 PetscOptionsGetViewer() line 327 in
> /kif1/data/shared/software/builddir/petsc-L9c6Pv/petsc-3.10.2/src/sys/classes/viewer/interface/viewreg.c
> [0]PETSC ERROR: #3 PetscObjectViewFromOptions() line 133 in
> /kif1/data/shared/software/builddir/petsc-L9c6Pv/petsc-3.10.2/src/sys/objects/destroy.c
> [0]PETSC ERROR: #4 main() line 210 in
> /kif1/data/users/bldenton/EGADSlite/egadsPlex/egadsPlex.c
>
> I want it to say "Try configuring with --download-hdf5"
>
>   Thanks,
>
>  Matt
>
> -- 
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/ 


Re: [petsc-dev] It would be really nice if you could run a single job on the pipeline with a branch

2019-09-23 Thread Jed Brown via petsc-dev
"Smith, Barry F. via petsc-dev"  writes:

>> On Sep 22, 2019, at 11:26 PM, Balay, Satish  wrote:
>> 
>> Even-though a fix addresses a breakage in a single build - that change
>> could break other things so its generally best to run a full test.
>
>   Sure before a merge we want everything tested but when one is iterating on 
> a single bug it would be a much better use of resources

FWIW, any containerized jobs can easily be run locally.

I think there would be complexity for GitLab to deploy this due to
artifacts and other state that may create dependencies between jobs, but
for our independent jobs, it should be possible.


Re: [petsc-dev] MatMult on Summit

2019-09-22 Thread Jed Brown via petsc-dev
Run two resource sets on one side versus separate nodes.

On Sep 22, 2019 08:46, "Smith, Barry F." wrote:
   I'm guessing it would be very difficult to connect this particular performance bug with a
decrease in performance for an actual full application since models don't catch this level of
detail well (and since you cannot run the application without the bug to see the better
performance)?  IBM/Nvidia are not going to care about it if it is just an abstract oddity as
opposed to clearly demonstrating a problem for the use of the machine, especially if the
machine is an orphan.

> On Sep 22, 2019, at 8:35 AM, Jed Brown via petsc-dev  wrote:
> 
> Karl Rupp  writes:
> 
>>> I wonder if the single-node latency bugs on AC922 are related to these
>>> weird performance results.
>>> 
>>> https://docs.google.com/spreadsheets/d/1amFJIbpvs9oJcUc-WntsFHO_C0LE7xFJeor-oElt0LY/edit#gid=0
>>> 
>> 
>> Thanks for these numbers!
>> Intra-Node > Inter-Node is indeed weird. I haven't observed such an 
>> inversion before.
> 
> As far as I know, it's been there since the machines were deployed
> despite obviously being a bug.  I know people at LLNL regard it as a
> bug, but it has not been their top priority (presumably at least in part
> because applications have not clearly expressed the impact of latency
> regressions on their science).




Re: [petsc-dev] MatMult on Summit

2019-09-22 Thread Jed Brown via petsc-dev
Karl Rupp  writes:

>> I wonder if the single-node latency bugs on AC922 are related to these
>> weird performance results.
>> 
>> https://docs.google.com/spreadsheets/d/1amFJIbpvs9oJcUc-WntsFHO_C0LE7xFJeor-oElt0LY/edit#gid=0
>> 
>
> Thanks for these numbers!
> Intra-Node > Inter-Node is indeed weird. I haven't observed such an 
> inversion before.

As far as I know, it's been there since the machines were deployed
despite obviously being a bug.  I know people at LLNL regard it as a
bug, but it has not been their top priority (presumably at least in part
because applications have not clearly expressed the impact of latency
regressions on their science).


Re: [petsc-dev] MatMult on Summit

2019-09-22 Thread Jed Brown via petsc-dev
"Smith, Barry F."  writes:

>> On Sep 21, 2019, at 11:43 PM, Jed Brown  wrote:
>> 
>> "Smith, Barry F."  writes:
>> 
>>>  Jed,
>>> 
>>>  What does latency as a function of message size mean?   It is in the plots
>> 
>> It's just the wall-clock time to ping-pong a message of that size.  All
>> the small sizes take the same amount of time (i.e., the latency), then
>> transition to being network bandwidth limited for large sizes.
>
>Thanks, this is fine for the small size. But he has the graph up to
>size 100 and the plotted values change for larger sizes, surely
>for 100 the time is a combination of latency and bandwidth?
>Isn't calling it latency a misnomer, or do people use this
>inconsistent terminology when doing ping-pongs?

Latency of an operation is just how long from when you initiate it until
it completes.  Latency in a performance model, such as LogP, is additive
with other factors (like bandwidth and compute throughput).
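
For reference, the simplest such additive model for the ping-pong time of an
m-byte message (a sketch, not a fit to the data in question) is

  T(m) \approx \alpha + m/\beta

which is flat at the latency \alpha for small messages and bandwidth-limited
(slope 1/\beta) for large ones.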


Re: [petsc-dev] MatMult on Summit

2019-09-21 Thread Jed Brown via petsc-dev
"Smith, Barry F."  writes:

>   Jed,
>
>   What does latency as a function of message size mean?   It is in the plots

It's just the wall-clock time to ping-pong a message of that size.  All
the small sizes take the same amount of time (i.e., the latency), then
transition to being network bandwidth limited for large sizes.

>
>> On Sep 21, 2019, at 11:15 PM, Jed Brown via petsc-dev 
>>  wrote:
>> 
>> Karl Rupp via petsc-dev  writes:
>> 
>>> Hi Junchao,
>>> 
>>> thanks, these numbers are interesting.
>>> 
>>> Do you have an easy way to evaluate the benefits of a CUDA-aware MPI vs. 
>>> a non-CUDA-aware MPI that still keeps the benefits of your 
>>> packing/unpacking routines?
>>> 
>>> I'd like to get a feeling of where the performance gains come from. Is 
>>> it due to the reduced PCI-Express transfer 
>> 
>> It's NVLink, not PCI-express.
>> 
>> I wonder if the single-node latency bugs on AC922 are related to these
>> weird performance results.
>> 
>> https://docs.google.com/spreadsheets/d/1amFJIbpvs9oJcUc-WntsFHO_C0LE7xFJeor-oElt0LY/edit#gid=0


Re: [petsc-dev] MatMult on Summit

2019-09-21 Thread Jed Brown via petsc-dev
Karl Rupp via petsc-dev  writes:

> Hi Junchao,
>
> thanks, these numbers are interesting.
>
> Do you have an easy way to evaluate the benefits of a CUDA-aware MPI vs. 
> a non-CUDA-aware MPI that still keeps the benefits of your 
> packing/unpacking routines?
>
> I'd like to get a feeling of where the performance gains come from. Is 
> it due to the reduced PCI-Express transfer 

It's NVLink, not PCI-express.

I wonder if the single-node latency bugs on AC922 are related to these
weird performance results.

https://docs.google.com/spreadsheets/d/1amFJIbpvs9oJcUc-WntsFHO_C0LE7xFJeor-oElt0LY/edit#gid=0


Re: [petsc-dev] MatMult on Summit

2019-09-21 Thread Jed Brown via petsc-dev
For an AIJ matrix with 32-bit integers, this is 1 flops/6 bytes, or 165
GB/s for the node for the best case (42 ranks).
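
(Rough arithmetic behind those figures, counting only the matrix data and one
multiply-add per nonzero:

  \frac{2\ \text{flops}}{8\ \text{B value} + 4\ \text{B column index}}
    = \frac{1\ \text{flop}}{6\ \text{bytes}},
  \qquad 27.5\ \text{GF/s} \times 6\ \text{bytes/flop} \approx 165\ \text{GB/s},

where 27.5 GF/s is the 42-rank MatMult rate quoted below.)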

My understanding is that these systems have 8 channels of DDR4-2666 per
socket, which is ~340 GB/s of theoretical bandwidth on a 2-socket
system, and 270 GB/s STREAM Triad according to this post

  
https://openpowerblog.wordpress.com/2018/07/19/epyc-skylake-vs-power9-stream-memory-bandwidth-comparison-via-zaius-barreleye-g2/

Is this 60% of Triad the best we can get for SpMV?

"Zhang, Junchao via petsc-dev"  writes:

> 42 cores have better performance.
>
> 36 MPI ranks
> MatMult  100 1.0 2.2435e+00 1.0 1.75e+09 1.3 2.9e+04 4.5e+04 
> 0.0e+00  6 99 97 28  0 100100100100  0 25145   0  0 0.00e+000 
> 0.00e+00  0
> VecScatterBegin  100 1.0 2.1869e-02 3.3 0.00e+00 0.0 2.9e+04 4.5e+04 
> 0.0e+00  0  0 97 28  0   1  0100100  0 0   0  0 0.00e+000 
> 0.00e+00  0
> VecScatterEnd100 1.0 7.9205e-0152.6 0.00e+00 0.0 0.0e+00 0.0e+00 
> 0.0e+00  1  0  0  0  0  22  0  0  0  0 0   0  0 0.00e+000 
> 0.00e+00  0
>
> --Junchao Zhang
>
>
> On Sat, Sep 21, 2019 at 9:41 PM Smith, Barry F.  wrote:
>
>   Junchao,
>
> Mark has a good point; could you also try for completeness the CPU with 
> 36 cores and see if it is any better than the 42 core case?
>
>   Barry
>
>   So extrapolating about 20 nodes of the CPUs is equivalent to 1 node of the 
> GPUs for the multiply for this problem size.
>
>> On Sep 21, 2019, at 6:40 PM, Mark Adams  wrote:
>>
>> I came up with 36 cores/node for CPU GAMG runs. The memory bus is pretty 
>> saturated at that point.
>>
>> On Sat, Sep 21, 2019 at 1:44 AM Zhang, Junchao via petsc-dev  wrote:
>> Here are CPU version results on one node with 24 cores, 42 cores. Click the 
>> links for core layout.
>>
>> 24 MPI ranks, https://jsrunvisualizer.olcf.ornl.gov/?s4f1o01n6c4g1r14d1b21l0=
>> MatMult  100 1.0 3.1431e+00 1.0 2.63e+09 1.2 1.9e+04 5.9e+04 
>> 0.0e+00  8 99 97 25  0 100100100100  0 17948   0  0 0.00e+000 
>> 0.00e+00  0
>> VecScatterBegin  100 1.0 2.0583e-02 2.3 0.00e+00 0.0 1.9e+04 5.9e+04 
>> 0.0e+00  0  0 97 25  0   0  0100100  0 0   0  0 0.00e+000 
>> 0.00e+00  0
>> VecScatterEnd100 1.0 1.0639e+0050.0 0.00e+00 0.0 0.0e+00 0.0e+00 
>> 0.0e+00  2  0  0  0  0  19  0  0  0  0 0   0  0 0.00e+000 
>> 0.00e+00  0
>>
>> 42 MPI ranks, https://jsrunvisualizer.olcf.ornl.gov/?s4f1o01n6c7g1r17d1b21l0=
>> MatMult  100 1.0 2.0519e+00 1.0 1.52e+09 1.3 3.5e+04 4.1e+04 
>> 0.0e+00 23 99 97 30  0 100100100100  0 27493   0  0 0.00e+000 
>> 0.00e+00  0
>> VecScatterBegin  100 1.0 2.0971e-02 3.4 0.00e+00 0.0 3.5e+04 4.1e+04 
>> 0.0e+00  0  0 97 30  0   1  0100100  0 0   0  0 0.00e+000 
>> 0.00e+00  0
>> VecScatterEnd100 1.0 8.5184e-0162.0 0.00e+00 0.0 0.0e+00 0.0e+00 
>> 0.0e+00  6  0  0  0  0  24  0  0  0  0 0   0  0 0.00e+000 
>> 0.00e+00  0
>>
>> --Junchao Zhang
>>
>>
>> On Fri, Sep 20, 2019 at 11:48 PM Smith, Barry F.  wrote:
>>
>>   Junchao,
>>
>>Very interesting. For completeness please run also 24 and 42 CPUs without 
>> the GPUs. Note that the default layout for CPU cores is not good. You will 
>> want 3 cores on each socket then 12 on each.
>>
>>   Thanks
>>
>>Barry
>>
>>   Since Tim is one of our reviewers next week this is a very good test 
>> matrix :-)
>>
>>
>> > On Sep 20, 2019, at 11:39 PM, Zhang, Junchao via petsc-dev  wrote:
>> >
>> > Click the links to visualize it.
>> >
>> > 6 ranks
>> > https://jsrunvisualizer.olcf.ornl.gov/?s4f1o01n6c1g1r11d1b21l0=
>> > jsrun -n 6 -a 1 -c 1 -g 1 -r 6 --latency_priority GPU-GPU 
>> > --launch_distribution packed --bind packed:1 js_task_info ./ex900 -f 
>> > HV15R.aij -mat_type aijcusparse -vec_type cuda -n 100 -log_view
>> >
>> > 24 ranks
>> > https://jsrunvisualizer.olcf.ornl.gov/?s4f1o01n6c4g1r14d1b21l0=
>> > jsrun -n 6 -a 4 -c 4 -g 1 -r 6 --latency_priority GPU-GPU 
>> > --launch_distribution packed --bind packed:1 js_task_info ./ex900 -f 
>> > HV15R.aij -mat_type aijcusparse -vec_type cuda -n 100 -log_view
>> >
>> > --Junchao Zhang
>> >
>> >
>> > On Fri, Sep 20, 2019 at 11:34 PM Mills, Richard Tran via petsc-dev  wrote:
>> > Junchao,
>> >
>> > Can you share your 'jsrun' command so that we can see how you are mapping 
>> > things to resource sets?
>> >
>> > --Richard
>> >
>> > On 9/20/19 11:22 PM, Zhang, Junchao via petsc-dev wrote:
>> >> I downloaded a sparse matrix (HV15R) from Florida Sparse Matrix 
>> >> Collection. Its size is about 2M x 2M. Then I ran the same MatMult 100 
>> >> times on one node of Summit with -mat_type aijcusparse -vec_type cuda. I 
>> >> found MatMult was almost dominated by VecScatter in this simple test. 
>> >> Us

Re: [petsc-dev] Tip while using valgrind

2019-09-21 Thread Jed Brown via petsc-dev
"Smith, Barry F. via petsc-dev"  writes:

>   When using valgrind it is important to understand that it does not 
> immediately make a report when it finds uninitialized memory; it only 
> makes a report when an uninitialized value would cause a change in the 
> program flow (like in an if statement). This is why sometimes it seems to 
> report an uninitialized variable that doesn't make sense. It could be that 
> the value at the location came from an earlier uninitialized location and 
> that is why valgrind is reporting it, not because the reported location was 
> uninitialized.  Using the valgrind option --track-origins=yes is very useful 
> since it will always point back to the area of memory that had the 
> uninitialized value.

Yes, I sometimes wish --track-origins was the default, though it does
slow Valgrind down even more.

>  I'm sending this out because twice recently I've struggled with cases where 
> the uninitialized location "traveled" a long way before valgrind reported it 
> and my confusion as to how valgrind worked kept making me leap to the wrong 
> conclusions.

We can also use Valgrind client checks to assert that memory is defined at 
various stages.

  VALGRIND_CHECK_MEM_IS_DEFINED

http://valgrind.org/docs/manual/mc-manual.html
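
For example, a minimal sketch (assumes the Valgrind headers are installed;
the macro is a no-op when the program is not running under Valgrind):

#include <stddef.h>
#include <valgrind/memcheck.h>

/* Ask Valgrind to report here if any byte of the array is undefined,
   instead of waiting for a branch to depend on it. */
static void CheckDefined(const double *a, size_t n)
{
  VALGRIND_CHECK_MEM_IS_DEFINED(a, n * sizeof(*a));
}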


Re: [petsc-dev] test harness: output of actually executed command for V=1 gone?

2019-09-20 Thread Jed Brown via petsc-dev
"Smith, Barry F."  writes:

>> Satish and Barry:  Do we need the Error codes or can I revert to previous 
>> functionality?
>
>   I think it is important to display the error codes.
>
>   How about displaying at the bottom how to run the broken tests? You already 
> show how to run them with the test harness, you could also print how to run 
> them directly? Better then mixing it up with the TAP output?

How about a target for it?

make -f gmakefile show-test search=abcd

We already have print-test, which might more accurately be named ls-test.


Re: [petsc-dev] test harness: output of actually executed command for V=1 gone?

2019-09-20 Thread Jed Brown via petsc-dev
Hapla  Vaclav via petsc-dev  writes:

> On 20 Sep 2019, at 19:59, Scott Kruger  wrote:
>
>
> On 9/20/19 10:44 AM, Hapla Vaclav via petsc-dev wrote:
> I used to copy the command actually run by the test harness, change to the 
> example's directory, and paste the command (just changing one .. to ., e.g. 
> ../ex1 to ./ex1).
> Is this output gone? Bad news. I think there should definitely be an option 
> to quickly reproduce the test run to work on failing tests.
>
> I only modified the V=0 option to suppress the TAP 'ok' output.
>
> I think you are referring to the 'not ok' now giving the error code instead 
> of the cmd which is now true regardless of V.  This was suggested by others.  
> I defer to the larger group on what's desired here.
>
> Note that it is sometimes tedious to deduce the whole command line from the test 
> declarations, for example because of multiple args: lines.
>
> Personally, I recommend just cd'ing into the test directory and running the 
> scripts by hand.
>
> For example:
> cd $PETSC_ARCH/tests/ksp/ksp/examples/tests/runex22
> cat ksp_ksp_tests-ex22_1.sh
> mpiexec  -n 1 ../ex22   > ex22_1.tmp 2> runex22.err
>
> OK, this takes a bit more time but does to job.

That's yucky.  I think we should have an option to print the command(s)
that would be run, one line per expanded {{a b c}}, so we can copy-paste
into the terminal with only one step of indirection.


Re: [petsc-dev] How to check that MatMatMult is available

2019-09-19 Thread Jed Brown via petsc-dev
Pierre Jolivet via petsc-dev  writes:

> Hello,
> Given a Mat A, I’d like to know if there is an implementation available for 
> doing C=A*B
> I was previously using MatHasOperation(A, MATOP_MATMAT_MULT, &hasMatMatMult) 
> but the result is not correct in at least two cases:

Do you want MATOP_MAT_MULT and MATOP_TRANSPOSE_MAT_MULT?

> 1) A is a MATTRANSPOSE and the underlying Mat B=A^T has a MatTransposeMatMult 
> implementation (there is currently no MATOP_MATTRANSPOSEMAT_MULT)
> 2) A is a MATNEST. This could be fixed in MatHasOperation_Nest, by checking 
> MATOP_MATMAT_MULT of all matrices in the MATNEST, but this would be incorrect 
> again if there is a single MATTRANSPOSE in there
> What is then the proper way to check that I can indeed call MatMatMult(A,…)?
> Do I need to copy/paste all this
> https://www.mcs.anl.gov/petsc/petsc-current/src/mat/interface/matrix.c.html#line9801
> in user code?

Unfortunately, I don't think there is any common interface for the
string handling, though it would make sense to add one because code of
this sort is copied many times:

/* dispatch based on the type of A and B from their PetscObject's PetscFunctionLists. */
char multname[256];
ierr = PetscStrncpy(multname,"MatMatMult_",sizeof(multname));CHKERRQ(ierr);
ierr = PetscStrlcat(multname,((PetscObject)A)->type_name,sizeof(multname));CHKERRQ(ierr);
ierr = PetscStrlcat(multname,"_",sizeof(multname));CHKERRQ(ierr);
ierr = PetscStrlcat(multname,((PetscObject)B)->type_name,sizeof(multname));CHKERRQ(ierr);
ierr = PetscStrlcat(multname,"_C",sizeof(multname));CHKERRQ(ierr); /* e.g., multname = "MatMatMult_seqdense_seqaij_C" */
ierr = PetscObjectQueryFunction((PetscObject)B,multname,&mult);CHKERRQ(ierr);
if (!mult) SETERRQ2(PetscObjectComm((PetscObject)A),PETSC_ERR_ARG_INCOMP,"MatMatMult requires A, %s, to be compatible with B, %s",((PetscObject)A)->type_name,((PetscObject)B)->type_name);

> Thanks,
> Pierre
>
> PS: in my case, C and B are always of type MATDENSE. Should we handle
> this in MatMatMult and never error out for such a simple case?

I would say yes.

> Indeed, one can just loop on the columns of B and C by doing multiple
> MatMult. This is what I’m currently doing in user code when
> hasMatMatMult == PETSC_FALSE.
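
A sketch of that column-by-column fallback (the function name is made up;
it assumes B and C are already-created MATDENSE matrices with layouts
compatible with A, stored column-major with the leading dimensions reported
by MatDenseGetLDA()):

#include <petscmat.h>

static PetscErrorCode MatMatMultByColumns(Mat A, Mat B, Mat C)
{
  PetscErrorCode ierr;
  PetscInt       j, N, ldb, ldc;
  PetscScalar    *barray, *carray;
  Vec            x, y;

  PetscFunctionBegin;
  ierr = MatCreateVecs(A, &x, &y);CHKERRQ(ierr);  /* x ~ columns of A, y ~ rows of A */
  ierr = MatGetSize(B, NULL, &N);CHKERRQ(ierr);   /* number of columns of B and C */
  ierr = MatDenseGetLDA(B, &ldb);CHKERRQ(ierr);
  ierr = MatDenseGetLDA(C, &ldc);CHKERRQ(ierr);
  ierr = MatDenseGetArray(B, &barray);CHKERRQ(ierr);
  ierr = MatDenseGetArray(C, &carray);CHKERRQ(ierr);
  for (j = 0; j < N; j++) {
    ierr = VecPlaceArray(x, barray + j*ldb);CHKERRQ(ierr); /* local part of column j of B */
    ierr = VecPlaceArray(y, carray + j*ldc);CHKERRQ(ierr); /* local part of column j of C */
    ierr = MatMult(A, x, y);CHKERRQ(ierr);
    ierr = VecResetArray(y);CHKERRQ(ierr);
    ierr = VecResetArray(x);CHKERRQ(ierr);
  }
  ierr = MatDenseRestoreArray(C, &carray);CHKERRQ(ierr);
  ierr = MatDenseRestoreArray(B, &barray);CHKERRQ(ierr);
  ierr = VecDestroy(&y);CHKERRQ(ierr);
  ierr = VecDestroy(&x);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}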


Re: [petsc-dev] Master broken after changes to PetscSection headers

2019-09-19 Thread Jed Brown via petsc-dev
Stefano Zampini  writes:

> So, for example, including petscmat.h we get all the constructors,
> including "petscdm.h" we don't get DMPlexCreate... (BTW, this should
> spelled DMCreatePlex if we follow the Mat convention)

That's a legacy convention for DM.  There is a usage difference in that
anyone calling DMPlexCreate will also call lots of other DMPlex*
functions, but a caller of MatCreateAIJ may not call any MatAIJ*
functions.  I don't personally care whether DMPlexCreate is changed to
DMCreatePlex.

> If we plan to do any change, we should do it right before a release. Making
> it after, it will be a pain managing maint fixes and merges to master.
>
> I'm of the opinion that "petsc.h" should expose everything PETSc offers,
> including API. But what about "petscksp.h" for example? Should it only
> expose its *types.h dependencies? or the full API for all the objects down
> the hierarchy? (PC,Mat,Vec,IS etc)?

I would prefer only the *types.h, but that's a very disruptive change.


Re: [petsc-dev] Master broken after changes to PetscSection headers

2019-09-19 Thread Jed Brown via petsc-dev
Václav Hapla via petsc-dev  writes:

> On 19 September 2019 12:23:43 CEST, Matthew Knepley  wrote:
>>On Thu, Sep 19, 2019 at 6:21 AM Matthew Knepley 
>>wrote:
>>
>>> On Thu, Sep 19, 2019 at 6:20 AM Stefano Zampini
>>
>>> wrote:
>>>
 So why it is in the vec package?

>>>
>>> Its in IS, so if anything it should go in petscis.h. I will move it
>>there.
>>>
>>
>>Now that we are doing this. I think petscsf.h belongs there too, not
>>just
>>the types.
>
> The is directory just groups different utility classes dealing with integer 
> mappings, probably just for historical reasons. I don't think they should be 
> considered as belonging to the IS package. I don't see a point in including it 
> all in petscis.h. Why should users of KSP, e.g., implicitly include all this 
> stuff when they don't need it?

I would rather move toward headers including only *types.h from their
dependencies.  Reducing such dependencies is a breaking change so we
need to be judicious and document it, but adding dependencies
gratuitously is the wrong direction, IMO.  (It slows down incremental
rebuilds of PETSc and user code.)


Re: [petsc-dev] Gitlab notifications

2019-09-12 Thread Jed Brown via petsc-dev
"Mills, Richard Tran via petsc-dev"  writes:

> On 9/12/19 6:33 AM, Jed Brown via petsc-dev wrote:
> [...]
>> https://docs.gitlab.com/ee/user/project/code_owners.html
>>
>> We currently require approval from Integration (of which you are a
>> member) and a code owner (as specified in the file).
>>
>> We used to have optional approvals from any other developer, but Satish
>> just removed that due to this notification thing, which I guess means
>> that any other developer (non-integrator, non-owner) should just comment
>> their approval if they find time to review.
>
> Alright, this CODEOWNERS thing is new to me. I assume that everyone should go 
> and edit this file and add themselves as "code owners" for the relevant 
> portions of PETSc that they've done significant development on?

It's a statement that you want MRs to depend on your approval (or that
of a co-author) before being eligible to merge.


Re: [petsc-dev] Gitlab notifications

2019-09-12 Thread Jed Brown via petsc-dev
I added it.

"Balay, Satish"  writes:

> Ah good to know.
>
> I've tried adding back 'Team' with 'petsc/developers' listed - but
> 'petsc/developers' keeps disappearing from it. Not sure whats
> happening..
>
>
> thanks,
> Satish
>
> On Thu, 12 Sep 2019, Scott Kruger wrote:
>
>> 
>> 
>> Here's what I did:
>> 
>> Settings -> Notifications -> developers + Participate
>> 
>> The default is "Global".  "Participate" is what I'm using now.
>> 
>> There is a "Custom", but it confuses me since it says you can
>> use it to match "Participate", but you can't do something like:
>> Email all new issues, but only show me the MR's I am mentioned
>> in or own.
>> 
>> Scott
>> 
>> 
>> On 9/12/19 7:39 AM, Balay, Satish via petsc-dev wrote:
>> > On Thu, 12 Sep 2019, Jed Brown via petsc-dev wrote:
>> > 
>> >> Matthew Knepley via petsc-dev  writes:
>> >>
>> >>> On Thu, Sep 12, 2019 at 9:05 AM Balay, Satish via petsc-dev <
>> >>> petsc-dev@mcs.anl.gov> wrote:
>> >>>
>> >>>> When a new MR is created, approval rules default to 'Integration' and
>> >>>> 'Team'
>> >>>>
>> >>>> So everyone in the team probably receives emails on all MRs. Now that
>> >>>> we have CODEOWNERS setup - perhaps the Team should be removed?
>> >>>>
>> >>>
>> >>> Can you explain CODEOWNERS to me? I cannot find it on the GItlab site. I
>> >>> want to see every MR.
>> >>
>> >> https://docs.gitlab.com/ee/user/project/code_owners.html
>> >>
>> >> We currently require approval from Integration (of which you are a
>> >> member) and a code owner (as specified in the file).
>> >>
>> >> We used to have optional approvals from any other developer, but Satish
>> >> just removed that due to this notification thing, which I guess means
>> >> that any other developer (non-integrator, non-owner) should just comment
>> >> their approval if they find time to review.
>> > 
>> > Ah - forgot the primary purpose of having 'Team' in the 'approve' list.
>> > Is there a way to disable notifications for team  - unless they 
>> > participate?
>> > 
>> > Satish
>> > 
>> 
>> 


Re: [petsc-dev] Gitlab notifications

2019-09-12 Thread Jed Brown via petsc-dev
Matthew Knepley via petsc-dev  writes:

> On Thu, Sep 12, 2019 at 9:05 AM Balay, Satish via petsc-dev <
> petsc-dev@mcs.anl.gov> wrote:
>
>> When a new MR is created, approval rules default to 'Integration' and
>> 'Team'
>>
>> So everyone in the team probably receives emails on all MRs. Now that
>> we have CODEOWNERS setup - perhaps the Team should be removed?
>>
>
> Can you explain CODEOWNERS to me? I cannot find it on the GItlab site. I
> want to see every MR.

https://docs.gitlab.com/ee/user/project/code_owners.html

We currently require approval from Integration (of which you are a
member) and a code owner (as specified in the file).

We used to have optional approvals from any other developer, but Satish
just removed that due to this notification thing, which I guess means
that any other developer (non-integrator, non-owner) should just comment
their approval if they find time to review.



Re: [petsc-dev] gitlab migration for pull request

2019-09-11 Thread Jed Brown via petsc-dev
Please fork the repository on GitLab
(https://gitlab.com/petsc/petsc/-/forks/new) and push your branch to the
fork, then make a merge request.  If you become a regular contributor,
we can give you push privileges to the main repository.
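
For example (assuming your fork ends up at gitlab.com/your-username/petsc and
your branch is named my-branch; both names are placeholders):

  git remote add fork https://gitlab.com/your-username/petsc.git
  git push -u fork my-branch

The -u sets the fork as the branch's upstream, and GitLab prints a link for
opening the merge request right after the push.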

Pierre Gosselet via petsc-dev  writes:

> Dear all,
> I am sorry, I have not understood how to migrate Push Request from
> bitbucket to gitlab.
>
> I think I need the equivalent of
> git remote set-url origin https://gitlab.com/petsc/petsc.git
> applied to my development branch (I see that my branch's remote is
> incorrect in .git/config).

You can fix that using git branch --set-upstream-to (see the man page).
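For example, something like

  git branch --set-upstream-to=origin/your-branch your-branch

(branch names here are placeholders for whatever you called yours).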

> Or am I supposed to make a fresh fork and MR from gitlab ?
>
> thank you for your help.
> best regards
> pierre
>
>
>
> -- 
> Pierre Gosselet
> CR CNRS (research agent) 
> LMT -- ENS Paris-Saclay/UMR8535
> 61 av. du président Wilson, 94235 CACHAN
> tel: +33 1 47405333


Re: [petsc-dev] PETSC_HAVE_ZLIB gone with PR #1853

2019-09-05 Thread Jed Brown via petsc-dev
Can we query HDF5 to determine whether it supports zlib?  When shipping
shared libraries, some people will use a different libhdf5, so it'd be
better to determine this at run-time.
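
A minimal sketch of a run-time check (untested here; plain HDF5 API, no PETSc
wrappers):

#include <hdf5.h>

/* Returns 1 if the libhdf5 we are running against can use the deflate (zlib)
   filter, 0 otherwise. */
static int HDF5HasZlib(void)
{
  htri_t avail = H5Zfilter_avail(H5Z_FILTER_DEFLATE);
  return avail > 0;
}

A viewer could call something like this before reading a compressed dataset
and raise a readable error instead of letting HDF5 fail deep inside the read.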

"Smith, Barry F. via petsc-dev"  writes:

>Vaclav,
>
>  At the time of the PR Jed complained about all the configure information 
> passed to PETSc source that was not used or relevant to PETSc. 
>
>  Hence I did a cleanup, part of the cleanup was packages not directly 
> used by PETSc no longer had information generated about them, zlib was
> one of them because I did a grep of HAVE_ZLIB in PETSc which would not detect 
> the zlib in the requires. 
>
>   We can certainly bring it back ASAP.
>
>   But I'd like to explore additional options, since the use of zlib in 
> the requires: location is a bit of a hack that only works for those specific 
> tests. 
> You say that HDF fails when trying to process compressed HDF5 if HDF5 was 
> built without zlib. Is there any other way to determine if the HDF5 file is 
> compressed with zlib? An HDF5 call specifically for that? Some indicator in the 
> file? If so our viewers could use this to check the file and generate a very 
> useful error message instead of having HDF5 misbehave when called.
>
>   When bringing it back I will make it more specific, for example 
> HDF5_BUILT_WITH_ZLIB because just because zlib exists it doesn't mean HDF5 
> was built with it.
>
>   Sorry for the inconvenience, we'll fix it better than ever
>
> Barry
>
>
>
>
>> On Sep 5, 2019, at 8:52 AM, Hapla Vaclav  wrote:
>> 
>> Barry, you did a petscconf.h cleanup in BitBucket PR #1834 (merge commit 
>> 52556f0f).
>> 
>> The problem is PETSC_HAVE_ZLIB is no longer set when ZLIB is installed.
>> That effectively disables my tests with `requires: zlib` in
>>  src/mat/examples/tutorials/ex10.c
>>  src/ksp/ksp/examples/tutorials/ex27.c
>> 
>> That condition is there because loading an HDF5 file which uses compression 
>> without zlib installed leads to HDF5 errors (very cryptic at least in some 
>> older HDF5 versions).
>> 
>> Was the PETSC_HAVE_ZLIB removal intentional? Could it be brought back?
>> 
>> Vaclav


Re: [petsc-dev] Negative blocksize

2019-09-04 Thread Jed Brown via petsc-dev
"Smith, Barry F."  writes:

>   Jed,
>
>  Good recall. We could use the new flag that indicates the block size was 
> never set by the user to allow a change from the 1?

Yeah, I thought that had been the idea behind -1, but the code doesn't seem to 
enforce it.


Re: [petsc-dev] Negative blocksize

2019-09-04 Thread Jed Brown via petsc-dev
"Smith, Barry F. via petsc-dev"  writes:

>   It seems reasonable at SetUp time to make it 1. If we need to have the 
> information that user never set it (I don't know why we would want this) then 
> that can go into a new flag.

I think I recall code paths in which the blocksize is set after
PetscLayoutSetUp, and allowed so long as it's compatible with the sizes.
It's possible that those are no longer relevant.


Re: [petsc-dev] petsc-dev post from tabrez...@gmail.com requires approval

2019-08-28 Thread Jed Brown via petsc-dev
Lisandro Dalcin via petsc-dev  writes:

>>If this line was protected with  #if defined(PETSC_HAVE_METIS) and
>> PETSc was not installed with ParMetis, but only Metis would the code run
>> correctly? Or is it somehow that even though you are only using metis here
>> you still need parmetis? For what reason?
>>
>>
> With some code changes, it would be possible to support PETSc with METIS
> and no ParMETIS, but that would only cover a special (though frequent) case
> (partitioning an initially sequential mesh), and the PetscPartitioner type
> name is "parmetis", so that would be confusing. I think it is not really
> worth it to make these changes.

I recall Jack observing that sequential METIS often outperformed (in
time) ParMETIS for nested dissection ordering in Elemental's sparse
solvers.  This isn't trivial since we'd need to implement gathering to
one process and scattering the results back, but it's something to
consider if we add support for calling METIS directly.


Re: [petsc-dev] Sequential external packages and MPI

2019-08-22 Thread Jed Brown via petsc-dev
"Smith, Barry F. via petsc-dev"  writes:

>> Our Metis wrapper is marked as a sequential one, but since you are linking 
>> libmetis with MPI, this is problematic for some configurations.
>
>   What is your work flow? Are you using --prefix to compile particular 
> combinations of external packages and put them in the prefix directory ? Then 
> you can make PETSc builds and just use --with-xxx-dir=/prefixlocation to use 
> them when building PETSc?
>
>   With this model you can use the sequential compilers for the sequential 
> libraries and the MPI ones for the MPI libraries for example
>
>   ./configure --download-metis --with-mpi=0  --prefix=/home/bsmith/myprebuilts
>
>./configure --download-parmetis --with-metis-dir=/home/bsmith/myprebuilts 
> --prefix=/home/bsmith/myprebuilts
>
>For a sequential PETSc build that uses metis then
>
>   ./configure --with-metis-dir=/home/bsmith/myprebuilts --with-mpi=0
>
>   For a parallel PETSc build 
>
>  /configure --with-metis-dir=/home/bsmith/myprebuilts 
> --with-parmetis-dir=/home/bsmith/myprebuilts. 

What if Pierre were to pull out the underlying compilers to configure
without mpicc wrappers, something like

  ./configure CC=gcc CXX=g++ --with-mpi-include=... --with-mpi-lib=...

Packages that don't framework.require MPI shouldn't be linked to MPI.


Re: [petsc-dev] Working Group Beginners: Feedback On Layout

2019-08-16 Thread Jed Brown via petsc-dev
FYI, the source for this example is here:

https://bitbucket.org/psanan/sphinx_scratch/src/master/introductory_tutorial_ksp.rst

(raw) 
https://bitbucket.org/psanan/sphinx_scratch/raw/a19b48b61e50181e754becb57fc6ff36d7639005/introductory_tutorial_ksp.rst

I'm concerned that the code is copied in rather than referenced from its
original source.  I think you could modify it to use start-at and end-at to
extract the pieces.

I don't think it's necessary to discuss every part, but it would be
useful to have more prose (perhaps with equations/diagrams) between
blocks.

I'd like for each function to be clickable, like it is in our current
documentation and in Doxygen.  (We've been chatting about this in
previous discussion, but haven't found a clean solution.)

"Faibussowitsch, Jacob via petsc-dev"  writes:

> Hello All PETSC Developers/Users!
>
> As many of you may or may not know, PETSc recently held an all-hands 
> strategic meeting to chart the medium term course for the group. As part of 
> this meeting a working group was formed to focus on beginner tutorial guides 
> aimed at bringing new users up to speed on how to program basic to 
> intermediate PETSc scripts. We have just completed a first draft of our 
> template for these guides and would like to ask you all for your feedback!  
> Any and all feedback would be greatly appreciated, however please limit your 
> feedback to the general layout and structure. The visual presentation of the 
> web page and content is still all a WIP, and is not necessarily 
> representative of the finished product.
>
> That being said, in order to keep the project moving forward we will soft-cap 
> feedback collection by the end of next Friday (August 23) so that we can get 
> started on writing the tutorials and integrating them with the rest of the 
> revamped user-guides. Please email me directly at 
> jfaibussowit...@anl.gov with your comments! 
> Be sure to include specific details and examples of what you like and don’t 
> like with your mail.
>
> Here is the template:
> http://patricksanan.com/temp/_build/html/introductory_tutorial_ksp.html
>
> Sincerely,
>
> Jacob Faibussowitsch


Re: [petsc-dev] Periodic meshes with <3 elements per edge?

2019-08-14 Thread Jed Brown via petsc-dev
Matthew Knepley  writes:

>>> >> The local points could be distinct for
>>> >> both fields and coordinates, with the global SF de-duplicating the
>>> >> periodic points for fields, versus leaving them distinct for
>>> >> coordinates.
>>> >
>>> >
>>> > Oh, no I would never do that.
>>>
>>> Can you help me understand why that model is bad?
>>>
>>
>> I'm also interested in the answer to this question, because I am
>> considering something similar for DMStag; if DM has a periodic BC, the
>> corresponding coordinate DM has a "none"  BC, so the boundary points are
>> duplicated - this would hopefully make it much easier to locate particles
>> in elements.
>>
>
> If you start asking topological questions of the mesh, it looked
> complicated to get them all right. For example, if you start expanding
> the overlap over the periodic boundary. 

How is this different from what we have now?  You have to go through
global points anyway to connect between processors, so why would it
matter if the point and its periodic alias may appear separately in a
local space?

> Fundamentally, periodicity is a topological notion. It is not defined
> by the coordinate chart.

The global SF would be the same as you have now.  The local SF would
distinguish the alias only so those points would be valid in the
coordinate chart.  So the periodic mesh

  A -- B -- C -- D -- a

on two processes would be represented via the cones

  {AB, BC}  {CD, Da}

with l2g

  {0,1,2} {2,3,0} for fields
  {0,1,2} {2,3,4} for coordinates


Why doesn't this work, or where is the greater complexity of this model
versus the present scheme of localizing coordinates?


Re: [petsc-dev] [petsc-maint] running CUDA on SUMMIT

2019-08-14 Thread Jed Brown via petsc-dev
"Smith, Barry F."  writes:

>> On Aug 14, 2019, at 5:58 PM, Jed Brown  wrote:
>> 
>> "Smith, Barry F."  writes:
>> 
 On Aug 14, 2019, at 2:37 PM, Jed Brown  wrote:
 
 Mark Adams via petsc-dev  writes:
 
> On Wed, Aug 14, 2019 at 2:35 PM Smith, Barry F.  
> wrote:
> 
>> 
>> Mark,
>> 
>>  Would you be able to make one run using single precision? Just single
>> everywhere since that is all we support currently?
>> 
>> 
> Experience in engineering at least is single does not work for FE
> elasticity. I have tried it many years ago and have heard this from 
> others.
> This problem is pretty simple other than using Q2. I suppose I could try
> it, but just be aware the FE people might say that single sucks.
 
 When they say that single sucks, is it for the definition of the
 operator or the preconditioner?
 
 As point of reference, we can apply Q2 elasticity operators in double
 precision at nearly a billion dofs/second per GPU.
>>> 
>>>  And in single you get what?
>> 
>> I don't have exact numbers, but <2x faster on V100, and it sort of
>> doesn't matter because preconditioning cost will dominate.  
>
>When using block formats a much higher percentage of the bandwidth goes to 
> moving the double precision matrix entries so switching to single could 
> conceivably benefitup to almost a factor of two. 
>
> Depending on the matrix structure perhaps the column indices could be 
> handled by a shift and short j indices. Or 2 shifts and 2 sets of j indices

Shorts are a problem, but a lot of matrices are actually quite
compressible if you subtract the row from all the column indices.  I've
done some experiments using zstd and the CPU decode rate is competitive
to better than DRAM bandwidth.  But that gives up random access, which
seems important for vectorization.  Maybe someone who knows more about
decompression on GPUs can comment?
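
For concreteness, the transform I mean is just this (a sketch with made-up
names, applied to the CSR index array before handing it to zstd):

#include <petscsys.h>

/* Subtract the row index from each column index so that near-diagonal
   entries become small, repetitive values that compress well. */
static void DeltaEncodeColumns(PetscInt m, const PetscInt *rowptr, PetscInt *colidx)
{
  for (PetscInt i = 0; i < m; i++) {
    for (PetscInt j = rowptr[i]; j < rowptr[i + 1]; j++) colidx[j] -= i;
  }
}

Adding the row index back after decompression restores the original indices.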

>> The big win
>> of single is on consumer-grade GPUs, which DOE doesn't install and
>> NVIDIA forbids to be used in data centers (because they're so
>> cost-effective ;-)).
>
>DOE LCFs are not our only customers. Cheap-o engineering professors
>might stack a bunch of consumer grade in their lab, would they
>benefit? Satish's basement could hold a great deal of consumer
>grades.

Fair point.  Time is also important so most companies buy the more
expensive hardware on the assumption it means less frequent problems
(due to lack of ECC, etc.).


Re: [petsc-dev] [petsc-maint] running CUDA on SUMMIT

2019-08-14 Thread Jed Brown via petsc-dev
"Smith, Barry F."  writes:

>> On Aug 14, 2019, at 2:37 PM, Jed Brown  wrote:
>> 
>> Mark Adams via petsc-dev  writes:
>> 
>>> On Wed, Aug 14, 2019 at 2:35 PM Smith, Barry F.  wrote:
>>> 
 
  Mark,
 
   Would you be able to make one run using single precision? Just single
 everywhere since that is all we support currently?
 
 
>>> Experience in engineering at least is single does not work for FE
>>> elasticity. I have tried it many years ago and have heard this from others.
>>> This problem is pretty simple other than using Q2. I suppose I could try
>>> it, but just be aware the FE people might say that single sucks.
>> 
>> When they say that single sucks, is it for the definition of the
>> operator or the preconditioner?
>> 
>> As point of reference, we can apply Q2 elasticity operators in double
>> precision at nearly a billion dofs/second per GPU.
>
>   And in single you get what?

I don't have exact numbers, but <2x faster on V100, and it sort of
doesn't matter because preconditioning cost will dominate.  The big win
of single is on consumer-grade GPUs, which DOE doesn't install and
NVIDIA forbids to be used in data centers (because they're so
cost-effective ;-)).

>> I'm skeptical of big wins in preconditioning (especially setup) due to
>> the cost and irregularity of indexing being large compared to the
>> bandwidth cost of the floating point values.


Re: [petsc-dev] [petsc-maint] running CUDA on SUMMIT

2019-08-14 Thread Jed Brown via petsc-dev
Brad Aagaard via petsc-dev  writes:

> Q2 is often useful in problems with body forces (such as gravitational 
> body forces), which tend to have linear variations in stress.

It's similar on the free-surface Stokes side, where pressure has a
linear gradient and must be paired with a stable velocity space.

Regarding elasticity, it would be useful to have collect some
application problems where Q2 shows a big advantage.

We should be able to solve Q2 at the same or lower cost per dof to Q1
(multigrid for this case isn't off-the-shelf at present, but it's
something we're working on).

> On 8/14/19 2:51 PM, Mark Adams via petsc-dev wrote:
>> 
>> 
>> Do you have any applications that specifically want Q2 (versus Q1)
>> elasticity or have some test problems that would benefit?
>> 
>> 
>> No, I'm just trying to push things.


Re: [petsc-dev] [petsc-maint] running CUDA on SUMMIT

2019-08-14 Thread Jed Brown via petsc-dev
Mark Adams  writes:

> On Wed, Aug 14, 2019 at 3:37 PM Jed Brown  wrote:
>
>> Mark Adams via petsc-dev  writes:
>>
>> > On Wed, Aug 14, 2019 at 2:35 PM Smith, Barry F. 
>> wrote:
>> >
>> >>
>> >>   Mark,
>> >>
>> >>Would you be able to make one run using single precision? Just single
>> >> everywhere since that is all we support currently?
>> >>
>> >>
>> > Experience in engineering at least is single does not work for FE
>> > elasticity. I have tried it many years ago and have heard this from
>> others.
>> > This problem is pretty simple other than using Q2. I suppose I could try
>> > it, but just be aware the FE people might say that single sucks.
>>
>> When they say that single sucks, is it for the definition of the
>> operator or the preconditioner?
>>
>
> Operator.
>
> And I've seen GMRES stagnate when using single in communication in parallel
> Gauss-Seidel. Roundoff is nonlinear.

Fair; single may still be useful in the preconditioner while using
double for operator and Krylov.

Do you have any applications that specifically want Q2 (versus Q1)
elasticity or have some test problems that would benefit?

>> As point of reference, we can apply Q2 elasticity operators in double
>> precision at nearly a billion dofs/second per GPU.
>
>
>> I'm skeptical of big wins in preconditioning (especially setup) due to
>> the cost and irregularity of indexing being large compared to the
>> bandwidth cost of the floating point values.
>>


Re: [petsc-dev] [petsc-maint] running CUDA on SUMMIT

2019-08-14 Thread Jed Brown via petsc-dev
Mark Adams via petsc-dev  writes:

> On Wed, Aug 14, 2019 at 2:35 PM Smith, Barry F.  wrote:
>
>>
>>   Mark,
>>
>>Would you be able to make one run using single precision? Just single
>> everywhere since that is all we support currently?
>>
>>
> Experience in engineering at least is single does not work for FE
> elasticity. I have tried it many years ago and have heard this from others.
> This problem is pretty simple other than using Q2. I suppose I could try
> it, but just be aware the FE people might say that single sucks.

When they say that single sucks, is it for the definition of the
operator or the preconditioner?

As point of reference, we can apply Q2 elasticity operators in double
precision at nearly a billion dofs/second per GPU.

I'm skeptical of big wins in preconditioning (especially setup) due to
the cost and irregularity of indexing being large compared to the
bandwidth cost of the floating point values.


Re: [petsc-dev] Periodic meshes with <3 elements per edge?

2019-08-14 Thread Jed Brown via petsc-dev
Matthew Knepley  writes:

> On Wed, Aug 14, 2019 at 11:46 AM Jed Brown  wrote:
>
>> Matthew Knepley  writes:
>>
>> > On Tue, Aug 13, 2019 at 7:35 PM Stefano Zampini <
>> stefano.zamp...@gmail.com>
>> > wrote:
>> >
>> >>
>> >>
>> >> On Aug 14, 2019, at 1:19 AM, Jed Brown via petsc-dev <
>> >> petsc-dev@mcs.anl.gov> wrote:
>> >>
>> >> [Cc: petsc-dev]
>> >>
>> >> Also, why is our current mode of localized coordinates preferred over
>> >> the coordinate DM being non-periodic?  Is the intent to preserve that
>> >> for every point in a DM, the point is also valid in the coordinate DM?
>> >>
>> >> Yes.
>> >
>> >> Can there be "gaps" in a chart?
>> >>
>> >> Yes.
>> >
>> >> I've been digging around in the implementation because there is no
>> >> documentation of localized coordinates, but it feels more complicated
>> >> than I'd have hoped.
>> >>
>> >>
>> >> A while ago, “localization” of coordinates was supporting only very
>> simple
>> >> cases, where periodic points were identified though the ‘maxCell’
>> parameter
>> >> (used to compute  the proper cell coordinates). I think this is the
>> reason
>> >> why you need at least 3 cells to support periodicity, since the BoxMesh
>> >> constructor uses the maxCell trick.
>> >>
>> >
>> > This was my original conception since it was the only fully automatic way
>> > to apply periodicity on an unstructured mesh that
>> > was read from a file or generator.
>>
>> I take it this is in reference to this logic.
>>
>> PetscErrorCode DMLocalizeCoordinateReal_Internal(DM dm, PetscInt dim, const PetscReal anchor[], const PetscReal in[], PetscReal out[])
>> {
>>   PetscInt d;
>>
>>   PetscFunctionBegin;
>>   if (!dm->maxCell) {
>>     for (d = 0; d < dim; ++d) out[d] = in[d];
>>   } else {
>>     for (d = 0; d < dim; ++d) {
>>       if ((dm->bdtype[d] != DM_BOUNDARY_NONE) && (PetscAbsReal(anchor[d] - in[d]) > dm->maxCell[d])) {
>>         out[d] = anchor[d] > in[d] ? dm->L[d] + in[d] : in[d] - dm->L[d];
>>       } else {
>>         out[d] = in[d];
>>       }
>>     }
>>   }
>>   PetscFunctionReturn(0);
>> }
>>
>>
>> This implies that the mesh be aligned to coordinates (at least in the
>> periodic dimension)
>
>
> No, it does not. That is the point, that you can have edges crossing the
> periodic boundary.

I mean that u(x) = u(x+(L,0)) rather than u(x) = u(x+v) where ||v||=L is
arbitrary.

>> An (unstructured) mesh generator has to support periodicity anyway to
>> generate a mesh with that topology.
>
> Or you can fake it.

How would you handle a different number/distribution of points on
surfaces that are meant to be joined periodically?

>> >> The DMPlex code fully support coordinates localized only in those cells
>> >> touching the periodic boundary. (I’m not a fan of this, since it
>> requires a
>> >> lot of ‘if’ ‘else’ switches )
>> >>
>> >
>> > I believe that Lisandro wanted this to cut down on redundant storage of
>> > coordinates.
>>
>> What was the rationale for having these special cells with
>> localized/DG-style coordinates versus storing coordinates as a
>> non-periodic continuous field?
>
>
> I do not understand what you mean.
>
>
>> The local points could be distinct for
>> both fields and coordinates, with the global SF de-duplicating the
>> periodic points for fields, versus leaving them distinct for
>> coordinates.
>
>
> Oh, no I would never do that.

Can you help me understand why that model is bad?

>> (Global coordinates are often not used.)  It seems like a
>> simpler model to me, and would eliminate a lot of if/else statements,
>> but you've been thinking about this more than me.
>>
>> >> I think domain_box_size 1 is not possible, we can probably allow
>> >> domain_box_size 2.
>> >>
>> >
>> > Technically, now a single box is possible with higher order coordinate
>> > spaces, but you have to do everything by hand
>> > and it completely untested.
>>
>> Why would higher order coordinate spaces be needed?  I could localize
>> coordinates for all cells.
>>
>
> I mean if you wanted a single cell that was periodic, you could do that
> with higher order coordinates.

But it would also work with linear coordinates, no?  (Assume the user
localizes their own coordinates.)


Re: [petsc-dev] Periodic meshes with <3 elements per edge?

2019-08-14 Thread Jed Brown via petsc-dev
Jed Brown via petsc-dev  writes:

> Matthew Knepley  writes:
>
>> On Tue, Aug 13, 2019 at 7:35 PM Stefano Zampini 
>> wrote:
>>
>>>
>>>
>>> On Aug 14, 2019, at 1:19 AM, Jed Brown via petsc-dev <
>>> petsc-dev@mcs.anl.gov> wrote:
>>>
>>> [Cc: petsc-dev]
>>>
>>> Also, why is our current mode of localized coordinates preferred over
>>> the coordinate DM being non-periodic?  Is the intent to preserve that
>>> for every point in a DM, the point is also valid in the coordinate DM?
>>>
>>> Yes.
>>
>>> Can there be "gaps" in a chart?
>>>
>>> Yes.
>>
>>> I've been digging around in the implementation because there is no
>>> documentation of localized coordinates, but it feels more complicated
>>> than I'd have hoped.
>>>
>>>
>>> A while ago, “localization” of coordinates was supporting only very simple
>>> cases, where periodic points were identified though the ‘maxCell’ parameter
>>> (used to compute  the proper cell coordinates). I think this is the reason
>>> why you need at least 3 cells to support periodicity, since the BoxMesh
>>> constructor uses the maxCell trick.
>>>
>>
>> This was my original conception since it was the only fully automatic way
>> to apply periodicity on an unstructured mesh that
>> was read from a file or generator.
>
> I take it this is in reference to this logic.
>
> PetscErrorCode DMLocalizeCoordinateReal_Internal(DM dm, PetscInt dim, const PetscReal anchor[], const PetscReal in[], PetscReal out[])
> {
>   PetscInt d;
>
>   PetscFunctionBegin;
>   if (!dm->maxCell) {
>     for (d = 0; d < dim; ++d) out[d] = in[d];
>   } else {
>     for (d = 0; d < dim; ++d) {
>       if ((dm->bdtype[d] != DM_BOUNDARY_NONE) && (PetscAbsReal(anchor[d] - in[d]) > dm->maxCell[d])) {
>         out[d] = anchor[d] > in[d] ? dm->L[d] + in[d] : in[d] - dm->L[d];
>       } else {
>         out[d] = in[d];
>       }
>     }
>   }
>   PetscFunctionReturn(0);
> }
>
>
> This implies that the mesh be aligned to coordinates (at least in the
> periodic dimension) and that dm->L[d] be set (perhaps by the user?).
> And if the mesh generator specifies periodicity, as with Gmsh below,
> this logic isn't needed.

If we know dm->L[d], we can also know the min and max values in that
dimension.  Then, any time we come to an inverted element, we can
localize such that the Jacobian is positive and coordinates lie in the
bounding box [min,max].

I guess you're concerned about doubly-periodic meshes, such as this ("a"
is the implicit alias for "A"), where the periodic element Dcab has
positive Jacobian since it is a rotation of ABCD.

 a -- b -- a
 |    |    |
 C -- D -- c
 |    |    |
 A -- B -- a

Note that BacD and CDba are correctly flagged as negative Jacobian in
this case, but (without some biasing choice) we can't determine whether
to fix an element given an BACD to become BacD or baCD.  Let's use the
current algorithm for doubly-periodic.

But when there is only one dimension of periodicity, what do you think
of using the bounding box?

 E -- F -- e
 |    |    |
 C -- D -- c
 |    |    |
 A -- B -- a

In this case, we have elements like BacD and know that we can't move B
and D because they would go outside the bounding box.  We need to move
two vertices and the only choices that stay in the bounding box are
x[A]+(L,0) and x[C]+(L,0).

We can even have

 E -- e
 |    |
 C -- c
 |    |
 A -- a

which gives us zero Jacobian (AacC), and we know we have to move two
vertices.  The only positive orientation is AacC because aAcC is tangled
and aACc has negative Jacobian.

Where does this sort of technique fail?


Re: [petsc-dev] Periodic meshes with <3 elements per edge?

2019-08-14 Thread Jed Brown via petsc-dev
Matthew Knepley  writes:

> On Tue, Aug 13, 2019 at 7:35 PM Stefano Zampini 
> wrote:
>
>>
>>
>> On Aug 14, 2019, at 1:19 AM, Jed Brown via petsc-dev <
>> petsc-dev@mcs.anl.gov> wrote:
>>
>> [Cc: petsc-dev]
>>
>> Also, why is our current mode of localized coordinates preferred over
>> the coordinate DM being non-periodic?  Is the intent to preserve that
>> for every point in a DM, the point is also valid in the coordinate DM?
>>
>> Yes.
>
>> Can there be "gaps" in a chart?
>>
>> Yes.
>
>> I've been digging around in the implementation because there is no
>> documentation of localized coordinates, but it feels more complicated
>> than I'd have hoped.
>>
>>
>> A while ago, “localization” of coordinates was supporting only very simple
>> cases, where periodic points were identified though the ‘maxCell’ parameter
>> (used to compute  the proper cell coordinates). I think this is the reason
>> why you need at least 3 cells to support periodicity, since the BoxMesh
>> constructor uses the maxCell trick.
>>
>
> This was my original conception since it was the only fully automatic way
> to apply periodicity on an unstructured mesh that
> was read from a file or generator.

I take it this is in reference to this logic.

PetscErrorCode DMLocalizeCoordinateReal_Internal(DM dm, PetscInt dim, const PetscReal anchor[], const PetscReal in[], PetscReal out[])
{
  PetscInt d;

  PetscFunctionBegin;
  if (!dm->maxCell) {
    for (d = 0; d < dim; ++d) out[d] = in[d];
  } else {
    for (d = 0; d < dim; ++d) {
      if ((dm->bdtype[d] != DM_BOUNDARY_NONE) && (PetscAbsReal(anchor[d] - in[d]) > dm->maxCell[d])) {
        out[d] = anchor[d] > in[d] ? dm->L[d] + in[d] : in[d] - dm->L[d];
      } else {
        out[d] = in[d];
      }
    }
  }
  PetscFunctionReturn(0);
}


This implies that the mesh be aligned to coordinates (at least in the
periodic dimension) and that dm->L[d] be set (perhaps by the user?).
And if the mesh generator specifies periodicity, as with Gmsh below,
this logic isn't needed.

>> Now, you can also inform Plex about periodicity without the maxCell trick
>> , see e.g.
>> https://bitbucket.org/petsc/petsc/src/6a494beb09767ff86fff34131928e076224d7569/src/dm/impls/plex/plexgmsh.c#lines-1468.
>> In this case, it is user responsibility to populate the cell part of the
>> coordinate section with the proper localized coordinates.
>>
>
> This is a great addition from Stefano and Lisandro, but note that it is
> nontrivial. The user has to identify the
> periodic boundary.

An (unstructured) mesh generator has to support periodicity anyway to
generate a mesh with that topology.

>> The DMPlex code fully support coordinates localized only in those cells
>> touching the periodic boundary. (I’m not a fan of this, since it requires a
>> lot of ‘if’ ‘else’ switches )
>>
>
> I believe that Lisandro wanted this to cut down on redundant storage of
> coordinates.

What was the rationale for having these special cells with
localized/DG-style coordinates versus storing coordinates as a
non-periodic continuous field?  The local points could be distinct for
both fields and coordinates, with the global SF de-duplicating the
periodic points for fields, versus leaving them distinct for
coordinates.  (Global coordinates are often not used.)  It seems like a
simpler model to me, and would eliminate a lot of if/else statements,
but you've been thinking about this more than me.

>> I think domain_box_size 1 is not possible, we can probably allow
>> domain_box_size 2.
>>
>
> Technically, now a single box is possible with higher order coordinate
> spaces, but you have to do everything by hand
> and it completely untested.

Why would higher order coordinate spaces be needed?  I could localize
coordinates for all cells.

I'm asking about this partly for spectral discretizations and partly as
a hack to use code written for 3D to solve 2D problems.


Re: [petsc-dev] Periodic meshes with <3 elements per edge?

2019-08-13 Thread Jed Brown via petsc-dev
[Cc: petsc-dev]

Also, why is our current mode of localized coordinates preferred over
the coordinate DM being non-periodic?  Is the intent to preserve that
for every point in a DM, the point is also valid in the coordinate DM?
Can there be "gaps" in a chart?

I've been digging around in the implementation because there is no
documentation of localized coordinates, but it feels more complicated
than I'd have hoped.

Jed Brown  writes:

> Can this be fixed?  Even better, can we allow -domain_box_size 1?
>
> $ mpich/tests/dm/impls/plex/examples/tests/ex1 -dim 1 -domain_box_sizes 2 
> -cell_simplex 0 -x_periodicity periodic
> [0]PETSC ERROR: - Error Message 
> --
> [0]PETSC ERROR: Invalid argument
> [0]PETSC ERROR: Mesh cell 1 is inverted, |J| = -0.25
> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for 
> trouble shooting.
> [0]PETSC ERROR: Petsc Development GIT revision: v3.11.3-1683-g1ac5c604ca  GIT 
> Date: 2019-08-13 14:39:38 +
> [0]PETSC ERROR: mpich/tests/dm/impls/plex/examples/tests/ex1 on a mpich named 
> joule.cs.colorado.edu by jed Tue Aug 13 17:11:25 2019
> [0]PETSC ERROR: Configure options --download-chaco --download-ctetgen 
> --download-exodusii --download-hypre --download-med --download-ml 
> --download-mumps --download-pnetcdf --download-pragmati
> c --download-scalapack --download-spai --download-sundials --download-superlu 
> --download-superlu_dist --download-triangle --with-c2html 
> --with-eigen-dir=/usr --with-hdf5-dir=/opt/hdf5-mpich -
> -with-lgrind --with-metis --with-mpi-dir=/home/jed/usr/ccache/mpich/ 
> --download-netcdf --download-conduit --with-parmetis --with-single-library=0 
> --with-suitesparse --with-yaml --with-zlib -P
> ETSC_ARCH=mpich COPTFLAGS="-Og -march=native -g" CXXOPTFLAGS="-Og 
> -march=native -g" FOPTFLAGS="-Og -march=native -g"
> [0]PETSC ERROR: #1 DMPlexCheckGeometry() line 7029 in 
> /home/jed/petsc/src/dm/impls/plex/plex.c
> [0]PETSC ERROR: #2 CreateMesh() line 412 in 
> /home/jed/petsc/src/dm/impls/plex/examples/tests/ex1.c
> [0]PETSC ERROR: #3 main() line 426 in 
> /home/jed/petsc/src/dm/impls/plex/examples/tests/ex1.c
> [0]PETSC ERROR: PETSc Option Table entries:
> [0]PETSC ERROR: -cell_simplex 0
> [0]PETSC ERROR: -dim 1
> [0]PETSC ERROR: -domain_box_sizes 2
> [0]PETSC ERROR: -malloc_test
> [0]PETSC ERROR: -x_periodicity periodic
> [0]PETSC ERROR: End of Error Message ---send entire error 
> message to petsc-ma...@mcs.anl.gov--
> application called MPI_Abort(MPI_COMM_WORLD, 62) - process 0
> [unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=62
> :
> system msg for write_line failure : Bad file descriptor


Re: [petsc-dev] [petsc-users] MatMultTranspose memory usage

2019-07-31 Thread Jed Brown via petsc-dev
https://bitbucket.org/petsc/petsc/issues/333/use-64-bit-indices-for-row-offsets-in

"Smith, Barry F."  writes:

>   Make an issue
>
>
>> On Jul 30, 2019, at 7:00 PM, Jed Brown  wrote:
>> 
>> "Smith, Barry F. via petsc-users"  writes:
>> 
>>>   The reason this worked for 4 processes is that the largest count in that 
>>> case was roughly 6,653,750,976/4 which does fit into an int. PETSc only 
>>> needs to know the number of nonzeros on each process, it doesn't need to 
>>> know the amount across all the processors. In other words you may want to 
>>> use a different PETSC_ARCH (different configuration) for small number of 
>>> processors and large number depending on how large your problem is. Or you 
>>> can always use 64 bit integers at a little performance and memory cost.
>> 
>> We could consider always using 64-bit ints for quantities like row
>> starts, keeping column indices (the "heavy" part) in 32-bit.  This may
>> become a more frequent issue with fatter nodes and many GPUs potentially
>> being driven by a single MPI rank.


Re: [petsc-dev] [petsc-users] MatMultTranspose memory usage

2019-07-30 Thread Jed Brown via petsc-dev
"Smith, Barry F. via petsc-users"  writes:

>The reason this worked for 4 processes is that the largest count in that 
> case was roughly 6,653,750,976/4 which does fit into an int. PETSc only needs 
> to know the number of nonzeros on each process, it doesn't need to know the 
> amount across all the processors. In other words you may want to use a 
> different PETSC_ARCH (different configuration) for small number of processors 
> and large number depending on how large your problem is. Or you can always 
> use 64 bit integers at a little performance and memory cost.

We could consider always using 64-bit ints for quantities like row
starts, keeping column indices (the "heavy" part) in 32-bit.  This may
become a more frequent issue with fatter nodes and many GPUs potentially
being driven by a single MPI rank.
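
A sketch of the layout I mean (illustrative only, not the actual MatSeqAIJ
data structure):

#include <stdint.h>

typedef struct {
  int64_t  nrows;
  int64_t *rowptr;  /* length nrows+1; total nonzero count may exceed 2^31-1 */
  int32_t *colidx;  /* length rowptr[nrows]; 32-bit suffices while the local
                       column space stays below 2^31 */
  double  *vals;    /* length rowptr[nrows] */
} CSRMixedWidth;

The "heavy" arrays (colidx, vals) keep their current size; only the small
row-offset array pays for the wider integer.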


Re: [petsc-dev] MPI shared library check broken for a very long time !!!!!!!

2019-07-29 Thread Jed Brown via petsc-dev
"Smith, Barry F."  writes:

>   I don't know what it means.
>
>   I just know that for several years the result of the test said the MPI 
> libraries were not shared. I don't think that changed anything the rest of 
> configure did.

Can we delete it?

$ git grep '\.shared\b' config
config/BuildSystem/config/packages/MPI.py:self.shared   = 0
config/BuildSystem/config/packages/MPI.py:  self.shared = 
self.libraries.checkShared('#include 
\n','MPI_Init','MPI_Initialized','MPI_Finalize',checkLink = 
self.checkPackageLink,libraries = self.lib, defaultArg = 
'known-mpi-shared-libraries', executor = self.mpiexec)
config/BuildSystem/config/packages/MPI.py:  self.shared = 0


Re: [petsc-dev] MPI shared library check broken for a very long time !!!!!!!

2019-07-29 Thread Jed Brown via petsc-dev
Does this mean we've been incorrectly identifying shared libraries all this 
time?

"Smith, Barry F. via petsc-dev"  writes:

>   Jed and Matt,
>
>I have two problems with the MPI shared library check goes back to at 
> least 3.5
>
> 1) Executing: /Users/barrysmith/soft/gnu-gfortran/bin/mpiexec 
> /var/folders/y5/5_h50n196d3_hpl0jbpv51phgn/T/petsc-5Abny2/config.libraries/conftest
> sh: /Users/barrysmith/soft/gnu-gfortran/bin/mpiexec 
> /var/folders/y5/5_h50n196d3_hpl0jbpv51phgn/T/petsc-5Abny2/config.libraries/conftest
> Executing: /Users/barrysmith/soft/gnu-gfortran/bin/mpiexec 
> /var/folders/y5/5_h50n196d3_hpl0jbpv51phgn/T/petsc-5Abny2/config.libraries/conftest
> sh: 
> ERROR while running executable: Could not execute 
> "/Users/barrysmith/soft/gnu-gfortran/bin/mpiexec 
> /var/folders/y5/5_h50n196d3_hpl0jbpv51phgn/T/petsc-5Abny2/config.libraries/conftest":
> Could not find initialization function
>
>This is due to the visibility flag being passed in building the test 
> libraries hence symbol not visible from outside
>
> 2) If you turn off the visibility flag with ./configure --with-visibility=0 
> then the problem becomes
>
> Could not find initialization check function
>
> I could not figure out why this fails. Not related to visibility
>
> Both gnu and clang compilers.


Re: [petsc-dev] What's the easiest route (for a beginner) to visualize a solution from a PETSc example?

2019-07-25 Thread Jed Brown via petsc-dev
Dave May via petsc-dev  writes:

> I'd describe how to use the binary dump and how to generate vtk files.
>
> The first is the most universal as it's completely generic and does not
> depend on a dm, thus users with their own mesh data structure and or don't
> have a mesh at all can use it. Would be worthwhile also providing a
> minimal python+matplotlib script which loads the data and spits out a PDF
> so folks don't have to depend on matlab.

If you have a parallel example that doesn't use DM, you're gonna have a
lot of bookkeeping to plot the data after loading the vector.  Seems
like a distraction that will trap a lot of people in the minutia instead
of the big picture.


Re: [petsc-dev] What's the easiest route (for a beginner) to visualize a solution from a PETSc example?

2019-07-25 Thread Jed Brown via petsc-dev
X11 plotting is unreliable (requires installing something non-obvious)
on anything but Linux.  I use it in live demos, but it's so limited I
wouldn't recommend it to users.  VTK is more discoverable/explorable for
users; install a binary and make all the plots they want.

Patrick Sanan via petsc-dev  writes:

> This came up in the beginner's working group meeting. We all seemed to
> agree that a very powerful thing for beginners is to be able to run a set
> of well-defined instructions to go from 0 to being able to solve and
> visualize a simple problem (I'm imagining a PDE on a 2D spatial domain).
>
> PETSc itself isn't a visualization library, obviously, so there are many
> ways to visualize data but most involve some external tools. We'd be
> interested in opinions on what we should recommend to beginners, for
> example one or more of:
> - Dump binary, load into MATLAB/Octave/Python+numpy+matplotlib
> - Dump something which Paraview and/or VisIt can open
> - Use PETSc's native drawing (X window) capabilities
> - Include custom script for the tutorials, say which requires libpng and
> produces an image
> - ASCII art


[petsc-dev] AGU Session: T003: Advances in Computational Geosciences

2019-07-22 Thread Jed Brown via petsc-dev
If you are thinking about attending the American Geophysical Union Fall
Meeting (Dec 9-13 in San Francisco), please consider submitting an
abstract to this interdisciplinary session.  Abstracts are due July 31.

T003: Advances in Computational Geosciences

This session highlights advances in the theory and practice of
computational geoscience, from improvements in numerical methods to
their application to outstanding problems in the Earth sciences. Common
issues include robust and efficient solvers, multiscale discretizations,
design of benchmark problems and standards for comparison. Increasing
data and computational power necessitates open source scientific
libraries and workflow automation for model setup, 3D feature
connectivity, and data assimilation, and automation in uncertainty
representation and propagation, optimal design of field studies, risk
quantification, and testing the predictive power of numerical
simulations. By bringing these crosscutting computational activities
together in one session, we hope to sharpen our collective understanding
of fundamental challenges, level of rigor, and opportunities for
reusable implementations. Contributions from all areas are welcome,
including, but not limited to, fault modeling, tectonics, subduction,
seismology, magma dynamics, mantle convection, the core, as well as
surface processes, hydrology, and cryosphere.

Confirmed invited presenters: Talea Mayo, Andreas Fichtner

https://agu.confex.com/agu/fm19/prelim.cgi/Session/83797

Conveners
  Jed Brown
  University of Colorado at Boulder
  Alice-Agnes Gabriel
  Ludwig-Maximilians-Universität
  Georg S Reuber
  Johannes Gutenberg University of Mainz
  Nathan Collier
  Oak Ridge National Laboratory


Re: [petsc-dev] Dependence of build on test target broken

2019-07-13 Thread Jed Brown via petsc-dev
"Smith, Barry F. via petsc-dev"  writes:

>   Satish,
>
> I am confused. I checked out the commit just before this commit and do 
>
> $ touch src/mat/interface/matrix.c
> $ make -j 12 -f gmakefile.test test globsearch="snes*tests*ex1*"

Use "-f gmakefile" if you want to include library build dependencies
(almost always while developing).  gmakefile.test primarily exists to
run tests with an installed PETSc.

> Use "/usr/bin/make V=1" to see verbose compile lines, "/usr/bin/make V=0" to 
> suppress.
>   CC arch-basic/tests/snes/examples/tests/ex1.o
>   CC arch-basic/tests/snes/examples/tests/ex17.o
> ...
>
> It is not attempting to rebuild the library, I then editing matrix.c with a 
> blank line and again 
> make test had no interest in rebuilding the library.
>
> I then checkout maint and do the same thing. It doesn't bother rebuilding the 
> library, just starts on the tests
>
>
>   What am I misunderstanding? Something to with the Mac? 
>
>   Barry
>
>
>
>> On Jul 11, 2019, at 2:35 PM, Balay, Satish via petsc-dev 
>>  wrote:
>> 
>> On Thu, 11 Jul 2019, Matthew Knepley via petsc-dev wrote:
>> 
>>> After my latest pull of master, making the 'test' target no longer rebuilds
>>> the library. I have tested this on a few arches, and rebuilt. This is
>>> pretty inconvenient, but I do not know how to fix it.
>> 
>> 
>> git bisect gives the following.
>> 
>> Satish
>> 
>> -
>> 
>> 27d73d1f0a5c445a3a02971e31a2a1a02ed6d224 is the first bad commit
>> commit 27d73d1f0a5c445a3a02971e31a2a1a02ed6d224
>> Author: Barry Smith 
>> Date:   Sat Jun 22 22:56:05 2019 -0500
>> 
>>Fix the error from gmakefile.test test about trying to remove a non-empty 
>> directory
>> 
>>The problem was the target to rm -r the directory was running at the same 
>> time as
>>tests tests where generatering new files in the directory
>> 
>>Commit-type: bug-fix
>> 
>> :100644 100644 67a00e0c1c9ce3eb91a88f314e81a74ac278131f 
>> 9c0c1b310a238fc27df3af114ec96336ef5640d2 M  gmakefile.test
>> 


Re: [petsc-dev] [Radev, Martin] Re: Adding a new encoding for FP data

2019-07-11 Thread Jed Brown via petsc-dev
"Zhang, Junchao"  writes:

> A side question: Do lossy compressors have value for PETSc?

Perhaps if they're very fast, but I think it's usually not PETSc's place
to be performing such compression due to tolerances being really subtle.

There certainly is a place for preconditioning using reduced precision.
PETSc used to have MatScalar to store Mat entries in reduced (single)
precision while MFFD Jacobian application and Krylov work stayed in
double.  That was used in FUN3D papers circa 2000 and was "successful",
but rarely used in practice (PETSc had to be built a special way) and
removed due to the maintenance burden.  I think there would be interest
in a runtime option to compress matrix entries.  For regular stencil
operations where they may be a lot of redundancy, such compression could
be lossless.  For general problems, simply working in reduced precision
would be enough.
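
The simplest version of reduced precision in the preconditioner is to store
the matrix entries in single while keeping vectors and accumulation in double;
a sketch on plain CSR (not PETSc data structures):

#include <stdint.h>

static void SpMVSingleValues(int64_t m, const int64_t *rowptr, const int32_t *colidx,
                             const float *vals, const double *x, double *y)
{
  for (int64_t i = 0; i < m; i++) {
    double sum = 0.0;
    for (int64_t j = rowptr[i]; j < rowptr[i + 1]; j++) sum += (double)vals[j] * x[colidx[j]];
    y[i] = sum;
  }
}

This halves the bandwidth spent on matrix values, which is the larger share of
the per-entry traffic, while the Krylov vectors stay in double.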


Re: [petsc-dev] [Radev, Martin] Re: Adding a new encoding for FP data

2019-07-11 Thread Jed Brown via petsc-dev
"Smith, Barry F."  writes:

>   Sorry, I wasn't clear. Just meant something simpler. Compress the matrix to 
> copy it to the GPU for faster transfers (and uncompress it appropriately on 
> the GPU).

Oh, perhaps.  Probably not relevant with NVLink (because it's nearly as fast as 
DRAM), but could be a win for PCI-e.


Re: [petsc-dev] [Radev, Martin] Re: Adding a new encoding for FP data

2019-07-11 Thread Jed Brown via petsc-dev
I don't know anything about zstd (or competitive compression) for GPU,
but doubt it works at the desired granularity.  I think SpMV on late-gen
CPUs can be accelerated by zstd column index compression, especially for
semi-structured problems, but likely also for unstructured problems
numbered by breadth-first search or similar.  But we'd need to demo that
use specifically.

"Smith, Barry F."  writes:

>   CPU to GPU? Especially matrices?
>
>> On Jul 11, 2019, at 9:05 AM, Jed Brown via petsc-dev  
>> wrote:
>> 
>> Zstd is a remarkably good compressor.  I've experimented with it for
>> compressing column indices for sparse matrices on structured grids and
>> (after a simple transform: subtracting the row number) gotten
>> decompression speed in the neighborhood of 10 GB/s (i.e., faster per
>> core than DRAM).  I've been meaning to follow up.  The transformation
>> described below (splitting the bytes) is yielding decompression speed
>> around 1GB/s (in this link below), which isn't competitive for things
>> like MatMult, but could be useful for things like trajectory
>> checkpointing.
>> 
>> https://drive.google.com/file/d/1wfLQyO2G5nofYFkS7pVbUW0-oJkQqBvv/view
>> 
>> 
>> From: "Radev, Martin" 
>> Subject: Re: Adding a new encoding for FP data
>> Date: July 11, 2019 at 4:55:03 AM CDT
>> To: "d...@arrow.apache.org" 
>> Cc: "Raoofy, Amir" , "Karlstetter, Roman" 
>> 
>> Reply-To: 
>> 
>> 
>> Hello Liya Fan,
>> 
>> 
>> this explains the technique but for a more complex case:
>> 
>> https://fgiesen.wordpress.com/2011/01/24/x86-code-compression-in-kkrunchy/
>> 
>> For FP data, the approach which seemed to be the best is the following.
>> 
>> Say we have a buffer of two 32-bit floating point values:
>> 
>> buf = [af, bf]
>> 
>> We interpret each FP value as a 32-bit uint and look at each individual 
>> byte. We have 8 bytes in total for this small input.
>> 
>> buf = [af0, af1, af2, af3, bf0, bf1, bf2, bf3]
>> 
>> Then we apply stream splitting and the new buffer becomes:
>> 
>> newbuf = [af0, bf0, af1, bf1, af2, bf2, af3, bf3]
>> 
>> We compress newbuf.
>> 
>> Due to similarities the sign bits, mantissa bits and MSB exponent bits, we 
>> might have a lot more repetitions in data. For scientific data, the 2nd and 
>> 3rd byte for 32-bit data is probably largely noise. Thus in the original 
>> representation we would always have a few bytes of data which could appear 
>> somewhere else in the buffer and then a couple bytes of possible noise. In 
>> the new representation we have a long stream of data which could compress 
>> well and then a sequence of noise towards the end.
>> 
>> This transformation improved compression ratio as can be seen in the report.
>> 
>> It also improved speed for ZSTD. This could be because ZSTD makes a decision 
>> of how to compress the data - RLE, new huffman tree, huffman tree of the 
>> previous frame, raw representation. Each can potentially achieve a different 
>> compression ratio and compression/decompression speed. It turned out that 
>> when the transformation is applied, zstd would attempt to compress fewer 
>> frames and copy the other. This could lead to less attempts to build a 
>> huffman tree. It's hard to pin-point the exact reason.
>> 
>> I did not try other lossless text compressors but I expect similar results.
>> 
>> For code, I can polish my patches, create a Jira task and submit the patches 
>> for review.
>> 
>> 
>> Regards,
>> 
>> Martin
>> 
>> 
>> 
>> From: Fan Liya 
>> Sent: Thursday, July 11, 2019 11:32:53 AM
>> To: d...@arrow.apache.org
>> Cc: Raoofy, Amir; Karlstetter, Roman
>> Subject: Re: Adding a new encoding for FP data
>> 
>> Hi Radev,
>> 
>> Thanks for the information. It seems interesting.
>> IMO, Arrow has much to do for data compression. However, it seems there are
>> some differences for memory data compression and external storage data
>> compression.
>> 
>> Could you please provide some reference for stream splitting?
>> 
>> Best,
>> Liya Fan
>> 
>> On Thu, Jul 11, 2019 at 5:15 PM Radev, Martin  wrote:
>> 
>>> Hello people,
>>> 
>>> 
>>> there has been discussion in the Apache Parquet mailing list on adding a
>>> new encoder for FP data.
>>> The reason for this is that the supported c

[petsc-dev] [Radev, Martin] Re: Adding a new encoding for FP data

2019-07-11 Thread Jed Brown via petsc-dev
Zstd is a remarkably good compressor.  I've experimented with it for
compressing column indices for sparse matrices on structured grids and
(after a simple transform: subtracting the row number) gotten
decompression speed in the neighborhood of 10 GB/s (i.e., faster per
core than DRAM).  I've been meaning to follow up.  The transformation
described below (splitting the bytes) is yielding decompression speed
around 1GB/s (in this link below), which isn't competitive for things
like MatMult, but could be useful for things like trajectory
checkpointing.

https://drive.google.com/file/d/1wfLQyO2G5nofYFkS7pVbUW0-oJkQqBvv/view

--- Begin Message ---
Hello Liya Fan,


this explains the technique but for a more complex case:

https://fgiesen.wordpress.com/2011/01/24/x86-code-compression-in-kkrunchy/

For FP data, the approach which seemed to be the best is the following.

Say we have a buffer of two 32-bit floating point values:

buf = [af, bf]

We interpret each FP value as a 32-bit uint and look at each individual byte. 
We have 8 bytes in total for this small input.

buf = [af0, af1, af2, af3, bf0, bf1, bf2, bf3]

Then we apply stream splitting and the new buffer becomes:

newbuf = [af0, bf0, af1, bf1, af2, bf2, af3, bf3]

We compress newbuf.

Due to similarities the sign bits, mantissa bits and MSB exponent bits, we 
might have a lot more repetitions in data. For scientific data, the 2nd and 3rd 
byte for 32-bit data is probably largely noise. Thus in the original 
representation we would always have a few bytes of data which could appear 
somewhere else in the buffer and then a couple bytes of possible noise. In the 
new representation we have a long stream of data which could compress well and 
then a sequence of noise towards the end.

This transformation improved compression ratio as can be seen in the report.

It also improved speed for ZSTD. This could be because ZSTD makes a decision of 
how to compress the data - RLE, new huffman tree, huffman tree of the previous 
frame, raw representation. Each can potentially achieve a different compression 
ratio and compression/decompression speed. It turned out that when the 
transformation is applied, zstd would attempt to compress fewer frames and copy 
the other. This could lead to less attempts to build a huffman tree. It's hard 
to pin-point the exact reason.

I did not try other lossless text compressors but I expect similar results.

For code, I can polish my patches, create a Jira task and submit the patches 
for review.


Regards,

Martin



From: Fan Liya 
Sent: Thursday, July 11, 2019 11:32:53 AM
To: d...@arrow.apache.org
Cc: Raoofy, Amir; Karlstetter, Roman
Subject: Re: Adding a new encoding for FP data

Hi Radev,

Thanks for the information. It seems interesting.
IMO, Arrow has much to do for data compression. However, it seems there are
some differences for memory data compression and external storage data
compression.

Could you please provide some reference for stream splitting?

Best,
Liya Fan

On Thu, Jul 11, 2019 at 5:15 PM Radev, Martin  wrote:

> Hello people,
>
>
> there has been discussion in the Apache Parquet mailing list on adding a
> new encoder for FP data.
> The reason for this is that the supported compressors by Apache Parquet
> (zstd, gzip, etc) do not compress well raw FP data.
>
>
> In my investigation it turns out that a very simple simple technique,
> named stream splitting, can improve the compression ratio and even speed
> for some of the compressors.
>
> You can read about the results here:
> https://drive.google.com/file/d/1wfLQyO2G5nofYFkS7pVbUW0-oJkQqBvv/view
>
>
> I went through the developer guide for Apache Arrow and wrote a patch to
> add the new encoding and test coverage for it.
>
> I will polish my patch and work in parallel to extend the Apache Parquet
> format for the new encoding.
>
>
> If you have any concerns, please let me know.
>
>
> Regards,
>
> Martin
>
>
--- End Message ---


Re: [petsc-dev] circular dependencies SLEPc

2019-07-09 Thread Jed Brown via petsc-dev
Matthew Knepley  writes:

> On Mon, Jul 8, 2019 at 10:37 PM Jed Brown via petsc-dev <
> petsc-dev@mcs.anl.gov> wrote:
>
>> "Smith, Barry F. via petsc-dev"  writes:
>>
>> >> On Jul 8, 2019, at 9:53 PM, Jakub Kruzik via petsc-dev <
>> petsc-dev@mcs.anl.gov> wrote:
>> >>
>> >> Just to clarify, the suggested solution is a plug-in sitting anywhere
>> in the PETSc source tree with postponed compilation and using
>> __attribute__((constructor)) to register (as in libCEED) for static
>> libraries?
>> >
>> >   Yes, this is my understanding. Good luck.
>>
>> There is some nontrivial infrastructure that would be needed for this
>> model.
>>
>> 1. This new component needs to be built into a new library such as
>>libpetsc-plugin.a (when static).
>>
>> 2. Users need to know when they should link this module.  They'll need a
>>link line something like -lpetsc-plugin -lslepc -lpetsc in this case.
>>It would need to be specified correctly in makefiles and pkg-config.
>>
>> 3. Anything with __attribute__((constructor)) runs *before* main, thus
>>before PetscInitialize.  There would need to be a new mechanism to
>>register a callback to be run at the end of PetscInitialize.
>>
>
> I think the simpler course it just to declare that this does not work
> outside of dynamic linking.
> The number of platforms that do not have dynamic linking is small, and we
> are not putting
> anything critical here. This decision can be reevaluated, but for now it
> makes everything much
> much simpler.

For the library to be loaded automatically, you would define a runpath,
perhaps defaulting to

  $(dirname /path/to/lib/libpetsc.so)/petsc/plugins

that would be scanned and each shared library in that directory would be
PetscDLLibraryOpen'd?
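
To sketch what that scan could look like, here is a toy version using plain
POSIX readdir/dlopen rather than PETSc's PetscDLLibrary wrapper (which is what
a real implementation would presumably use); the function name, the crude ".so"
filter, and the error handling are all illustrative only.

  #include <dirent.h>
  #include <dlfcn.h>
  #include <stdio.h>
  #include <string.h>

  /* Load every shared library found in the plugin directory.  Each plugin is
     expected to register itself (constructor or init hook), so merely loading
     it is enough.  A missing directory is not an error. */
  static int LoadPlugins(const char *dir)
  {
    DIR *d = opendir(dir);
    struct dirent *ent;
    char path[4096];

    if (!d) return 0;
    while ((ent = readdir(d))) {
      if (!strstr(ent->d_name, ".so")) continue;   /* crude filter */
      snprintf(path, sizeof(path), "%s/%s", dir, ent->d_name);
      if (!dlopen(path, RTLD_NOW)) fprintf(stderr, "skipping %s: %s\n", path, dlerror());
    }
    closedir(d);
    return 0;
  }

(Needs -ldl on most systems; a real version would derive dir from the installed
libpetsc location as above.)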


Re: [petsc-dev] circular dependencies SLEPc

2019-07-08 Thread Jed Brown via petsc-dev
"Smith, Barry F."  writes:

>> There is some nontrivial infrastructure that would be needed for this
>> model.
>> 
>> 1. This new component needs to be built into a new library such as
>>   libpetsc-plugin.a (when static).
>> 
>> 2. Users need to know when they should link this module.  They'll need a
>>   link line something like -lpetsc-plugin -lslepc -lpetsc in this case.
>>   It would need to be specified correctly in makefiles and pkg-config.
>> 
>> 3. Anything with __attribute__((constructor)) runs *before* main, thus
>>   before PetscInitialize.  There would need to be a new mechanism to
>>   register a callback to be run at the end of PetscInitialize.
>
>Are you saying we need something like PetscPlugInRegister(PetscErrorCode 
> (*)(void)) that can be called before PetscInitialize() by plugin libraries 
> that 
> registers the function that PetscInitialize() than calls? This is doable, 
> just needs to use malloc() directly and cannot use PETSc's FList constructs.

Yes, either with malloc or some fixed (fairly large) number of slots.
If using malloc, you'd want an __attribute__((destructor)) that runs
after main to free the list.  (You can't free it in PetscFinalize
because the user could call PetscInitialize again; you need to free it
to avoid Valgrind noise.)
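
A minimal sketch of that mechanism, with made-up names (PetscPlugInRegister and
friends here are not existing PETSc functions):

  #include <stdlib.h>

  typedef int (*PlugInInitFn)(void);   /* stand-in for PetscErrorCode (*)(void) */

  static PlugInInitFn *plugin_inits;
  static int           plugin_count, plugin_alloc;

  /* Called from a plugin's __attribute__((constructor)), i.e. before main and
     before PetscInitialize, so it may only use malloc, not PETSc or MPI. */
  int PetscPlugInRegister(PlugInInitFn fn)
  {
    if (plugin_count == plugin_alloc) {
      plugin_alloc = plugin_alloc ? 2*plugin_alloc : 8;
      plugin_inits = realloc(plugin_inits, plugin_alloc*sizeof(*plugin_inits));
      if (!plugin_inits) return 1;
    }
    plugin_inits[plugin_count++] = fn;
    return 0;
  }

  /* Called at the end of PetscInitialize; now the callbacks may safely call
     PCRegister() and friends.  The list is kept so that a later call to
     PetscInitialize still sees the plugins. */
  int PetscPlugInInitializeAll(void)
  {
    for (int i = 0; i < plugin_count; i++)
      if (plugin_inits[i]()) return 1;
    return 0;
  }

  /* Runs after main, so it cannot interfere with a repeated
     PetscInitialize/PetscFinalize cycle, and keeps valgrind quiet. */
  __attribute__((destructor)) static void PetscPlugInFreeList(void)
  {
    free(plugin_inits);
    plugin_inits = NULL;
    plugin_count = plugin_alloc = 0;
  }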


Re: [petsc-dev] circular dependencies SLEPc

2019-07-08 Thread Jed Brown via petsc-dev
"Smith, Barry F. via petsc-dev"  writes:

>> On Jul 8, 2019, at 9:53 PM, Jakub Kruzik via petsc-dev 
>>  wrote:
>> 
>> Just to clarify, the suggested solution is a plug-in sitting anywhere in the 
>> PETSc source tree with postponed compilation and using 
>> __attribute__((constructor)) to register (as in libCEED) for static 
>> libraries?
>
>   Yes, this is my understanding. Good luck.

There is some nontrivial infrastructure that would be needed for this
model.

1. This new component needs to be built into a new library such as
   libpetsc-plugin.a (when static).

2. Users need to know when they should link this module.  They'll need a
   link line something like -lpetsc-plugin -lslepc -lpetsc in this case.
   It would need to be specified correctly in makefiles and pkg-config.

3. Anything with __attribute__((constructor)) runs *before* main, thus
   before PetscInitialize.  There would need to be a new mechanism to
   register a callback to be run at the end of PetscInitialize.


Re: [petsc-dev] Slowness of PetscSortIntWithArrayPair in MatAssembly

2019-07-02 Thread Jed Brown via petsc-dev
John Peterson  writes:

>> Do you add values many times into the same location?  The array length
>> will be the number of misses to the local part of the matrix.  We could
>> (and maybe should) make the stash use a hash instead of building the
>> array with multiplicity and combining duplicates later.
>>
>
>  This is a 3D model with a so-called "spider" node that is connected to
> (and constrained in terms of) many other nodes by 1D "beam" elements. So,
> yes, I would imagine the dofs of the spider node would receive contributions
> from many (possibly off-processor) elements.

Makes sense.

> The "legacy" variant sends all that redundant data and inserts one at a
>> time into the local data structures.
>>
>
> OK, I guess in this case the problem is so small that we don't even notice
> the communication time, even for the legacy algorithm.

Fande said it took a second, which sounds like a long time to me -- it's
enough time to move many GB in memory.
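
For what the hash-based stash could look like, a toy sketch (fixed-size
open-addressing table, invented names; a real implementation would resize and
handle INSERT_VALUES as well as ADD_VALUES):

  #include <stdint.h>

  #define STASH_SLOTS 1024               /* toy size; a real stash would grow */

  typedef struct { int64_t key; double val; } StashEntry;  /* key = -1 when empty */
  typedef struct { StashEntry slot[STASH_SLOTS]; } Stash;

  static void StashInit(Stash *s)
  {
    for (int i = 0; i < STASH_SLOTS; i++) s->slot[i].key = -1;
  }

  /* ADD_VALUES semantics: repeated additions to the same (row,col) are combined
     immediately, so nothing needs to be sorted or deduplicated before sending. */
  static int StashAdd(Stash *s, int32_t row, int32_t col, double v)
  {
    int64_t  key = ((int64_t)row << 32) | (uint32_t)col;
    uint64_t h   = (uint64_t)key * UINT64_C(0x9E3779B97F4A7C15);  /* cheap hash */
    for (int probe = 0; probe < STASH_SLOTS; probe++) {
      StashEntry *e = &s->slot[(h + probe) % STASH_SLOTS];
      if (e->key == key) { e->val += v; return 0; }
      if (e->key == -1)  { e->key = key; e->val = v; return 0; }
    }
    return 1;                            /* full; a real version would rehash */
  }

In a case like the spider node, repeated contributions to the same off-process
entry would then collapse immediately instead of growing the array that later
has to be sorted.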


Re: [petsc-dev] Slowness of PetscSortIntWithArrayPair in MatAssembly

2019-07-02 Thread Jed Brown via petsc-dev
John Peterson  writes:

> On Tue, Jul 2, 2019 at 1:44 PM Jed Brown  wrote:
>
>> Fande Kong via petsc-dev  writes:
>>
>> > Hi Developers,
>> >
>> > John just noticed that the matrix assembly was slow when having
>> sufficient
>> > amount of off-diagonal entries. It was not a MPI issue since I was  able
>> to
>> > reproduce the issue using two cores on my desktop, that is, "mpirun -n
>> 2".
>> >
>> > I turned  on a profiling, and 99.99% of the time was spent
>> > on PetscSortIntWithArrayPair (recursively calling).   It took THREE
>> MINUTES
>> >  to get the assembly done. And then changed to use the option
>> > "-matstash_legacy" to restore
>> > the code to the old assembly routine, and the same code took ONE SECOND
>> to
>> > get the matrix assembly done.
>>
>> Uff.
>>
>> > Should write any better sorting algorithms?
>>
>> It would be good to confirm (e.g., via debugger) that the problematic
>> array has some particular structure.  The naive quicksort in PETSc
>> degrades to quadratic for some ordered input data. I (cheap) "fixed"
>> that in 2010, but never propagated it to the WithArray variants.  I
>> think we should do something similar (better median estimation) or
>> perhaps move to radix sorts.
>>
>>
>> https://bitbucket.org/petsc/petsc/commits/ef8e358335c5882e7d377b87160557d47280dc77
>
>
> Hi Jed,
>
> I replied to Junchao on Fande's thread, but my email was held for
> moderation. Apologies if you have already seen that mail, but as I
> mentioned there, I think the issue is that we pass an already-sorted array
> that is also very long for some reason to PetscSortIntWithArrayPair (see
> stack trace below).
>
>
> That's what I think is happening. We managed to get a stack trace from one
> of the slow running cases, and it was over 100k frames deep in recursive
> PetscSortIntWithArrayPair_Private calls (see below). So perhaps we are
> seeing worst-case O(n) stack frame depth for quicksort on an already-sorted
> array, but I also am not sure where the big number, n=35241426, comes from,
> as the problem did not have nearly that many DOFs.

Do you add values many times into the same location?  The array length
will be the number of misses to the local part of the matrix.  We could
(and maybe should) make the stash use a hash instead of building the
array with multiplicity and combining duplicates later.

The "legacy" variant sends all that redundant data and inserts one at a
time into the local data structures.

> #104609 PetscSortIntWithArrayPair_Private (L=0x7ff6a413e650,
> J=0x7ff69bace650, K=0x7ff68277e650, right=33123131)
> #104610 PetscSortIntWithArrayPair_Private (L=0x7ff6a413e650,
> J=0x7ff69bace650, K=0x7ff68277e650, right=35241425)
> #104611 PetscSortIntWithArrayPair (n=35241426, L=0x7ff6a413e650,
> J=0x7ff69bace650, K=0x7ff68277e650)
> #104612 MatStashSortCompress_Private (stash=0x555f7603b628,
> insertmode=ADD_VALUES)
> #104613 MatStashScatterBegin_BTS (mat=0x555f7603aee0, stash=0x555f7603b628,
> owners=0x555f7623c9c0)
> #104614 MatStashScatterBegin_Private (mat=0x555f7603aee0,
> stash=0x555f7603b628, owners=0x555f7623c9c0)
> #104615 MatAssemblyBegin_MPIAIJ (mat=0x555f7603aee0,
> mode=MAT_FINAL_ASSEMBLY)
>
>
>
> -- 
> John


Re: [petsc-dev] Slowness of PetscSortIntWithArrayPair in MatAssembly

2019-07-02 Thread Jed Brown via petsc-dev
Fande Kong via petsc-dev  writes:

> Hi Developers,
>
> John just noticed that the matrix assembly was slow when there was a sufficient
> number of off-diagonal entries. It was not an MPI issue, since I was able to
> reproduce the issue using two cores on my desktop, that is, "mpirun -n 2".
>
> I turned on profiling, and 99.99% of the time was spent
> on PetscSortIntWithArrayPair (calling itself recursively). It took THREE MINUTES
> to get the assembly done. I then changed to the option "-matstash_legacy" to
> restore the old assembly routine, and the same code took ONE SECOND to
> get the matrix assembly done.

Uff.

> Should we write a better sorting algorithm?

It would be good to confirm (e.g., via debugger) that the problematic
array has some particular structure.  The naive quicksort in PETSc
degrades to quadratic for some ordered input data. I (cheap) "fixed"
that in 2010, but never propagated it to the WithArray variants.  I
think we should do something similar (better median estimation) or
perhaps move to radix sorts.

https://bitbucket.org/petsc/petsc/commits/ef8e358335c5882e7d377b87160557d47280dc77
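
If that 2010 approach (better pivot/median estimation) were propagated to the
WithArrayPair variants, the pivot choice could look something like this
median-of-three sketch (illustrative only, not the actual PETSc code):

  /* Median of the first, middle, and last keys.  For already-sorted input this
     yields a balanced partition, so the recursion depth stays O(log n) and the
     total cost stays O(n log n) instead of degrading to O(n^2) with an
     O(n)-deep recursion. */
  static int MedianOfThreeIndex(const int *L, int lo, int hi)
  {
    int mid = lo + (hi - lo)/2;
    int a = L[lo], b = L[mid], c = L[hi];
    if ((a <= b && b <= c) || (c <= b && b <= a)) return mid;
    if ((b <= a && a <= c) || (c <= a && a <= b)) return lo;
    return hi;
  }

Recursing on the smaller partition and looping on the larger (or switching to a
radix sort on the integer keys) would additionally bound the stack depth even
for adversarial input.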


Re: [petsc-dev] alternatives to cygwin on Windows with PETSc

2019-06-29 Thread Jed Brown via petsc-dev
"Smith, Barry F. via petsc-dev"  writes:

>   Does it make sense to recommend/suggest  git bash for Windows as an 
> alternative/in addition to Cygwin?

I would love to be able to recommend git-bash and/or WSL2 (which now
includes a full Linux kernel).  I don't have a system on which to test,
but it should be possible to make it work (if it doesn't already).


Re: [petsc-dev] Unused macros in petscconf.h

2019-06-29 Thread Jed Brown via petsc-dev
Matthew Knepley  writes:

> On Sat, Jun 29, 2019 at 8:39 AM Jed Brown  wrote:
>
>> Matthew Knepley  writes:
>>
>> > On Fri, Jun 28, 2019 at 4:37 PM Jed Brown  wrote:
>> >
>> >> Matthew Knepley  writes:
>> >>
>> >> > On Fri, Jun 28, 2019 at 2:04 PM Smith, Barry F. via petsc-dev <
>> >> > petsc-dev@mcs.anl.gov> wrote:
>> >> >
>> >> >>
>> >> >>   You are right, these do not belong in petscconf.h
>> >> >>
>> >> >
>> >> > The problematic thing here is hiding information from users of
>> >> > PETSc. If you are a user that counts on PETSc configure to check
>> >> > something, but then we hide it because we do not use it, I would not
>> >> > be happy.
>> >>
>> >> You want PETSc to test things that it doesn't use because maybe a user
>> >> would want to know?  Where does that end
>> >
>> >
>> > Very clearly it ends with testing the things users SPECIFICALLY ASKED
>> > US TO TEST on the configure command line.
>>
>> They asked us to test the size of short and for the existence of sched.h
>> and mkstemp?
>>
>
> I have no problem trimming the automatic tests we do (I copied them from
> Autoconf),
> as long as we provide a way to turn them on again.

I want to delete the code that we don't use.  In the rest of PETSc, when
we delete code, we delete it, not comment it out or make it optional and
untested.

>> >> and how would we ever know if
>> >> the information is correct?
>> >>
>> >
>> > This is just nonsensical. We know it's correct because we tested it.
>>
>> Only by the code that decides whether to define the macro, but if one of
>> those tests is/becomes broken (this has happened), we wouldn't know.
>>
>
> I am not sure that worrying that code might become broken is a first order
> problem.

This unused & untested code is a net liability, not an asset.


Re: [petsc-dev] Unused macros in petscconf.h

2019-06-29 Thread Jed Brown via petsc-dev
Matthew Knepley  writes:

> On Fri, Jun 28, 2019 at 4:37 PM Jed Brown  wrote:
>
>> Matthew Knepley  writes:
>>
>> > On Fri, Jun 28, 2019 at 2:04 PM Smith, Barry F. via petsc-dev <
>> > petsc-dev@mcs.anl.gov> wrote:
>> >
>> >>
>> >>   You are right, these do not belong in petscconf.h
>> >>
>> >
>> > The problematic thing here is hiding information from users of
>> > PETSc. If you are a user that counts on PETSc configure to check
>> > something, but then we hide it because we do not use it, I would not
>> > be happy.
>>
>> You want PETSc to test things that it doesn't use because maybe a user
>> would want to know?  Where does that end
>
>
> Very clearly it ends with testing the things users SPECIFICALLY ASKED
> US TO TEST on the configure command line.

They asked us to test the size of short and for the existence of sched.h
and mkstemp?

>> and how would we ever know if
>> the information is correct?
>>
>
> This is just nonsensical. We know it's correct because we tested it.

Only by the code that decides whether to define the macro, but if one of
those tests is/becomes broken (this has happened), we wouldn't know.


Re: [petsc-dev] Unused macros in petscconf.h

2019-06-28 Thread Jed Brown via petsc-dev
Matthew Knepley  writes:

> On Fri, Jun 28, 2019 at 2:04 PM Smith, Barry F. via petsc-dev <
> petsc-dev@mcs.anl.gov> wrote:
>
>>
>>   You are right, these do not belong in petscconf.h
>>
>
> The problematic thing here is hiding information from users of
> PETSc. If you are a user that counts on PETSc configure to check
> something, but then we hide it because we do not use it, I would not
> be happy.

You want PETSc to test things that it doesn't use because maybe a user
would want to know?  Where does that end and how would we ever know if
the information is correct?


[petsc-dev] Unused macros in petscconf.h

2019-06-28 Thread Jed Brown via petsc-dev
We have a lot of lines like this

$ grep -c HAVE_LIB $PETSC_ARCH/include/petscconf.h
96

but only four of these are ever checked in src/.  Delete them?

IMO, unused stuff should not go into petscconf.h.  We have to scroll up
past these lines every time configure crashes.  These are apparently all
unused:

$ for name in $(grep '#define \w\+ ' $PETSC_ARCH/include/petscconf.h | cut -f 2 -d\ ); do \
    rg -q -c $name include/ src/ || echo $name; done
STDC_HEADERS
HAVE_MATH_INFINITY
ANSI_DECLARATORS
PETSC_PATH_SEPARATOR
PETSC_HAVE_BLASLAPACK
PETSC_HAVE_PTHREAD
PETSC_HAVE_NETCDF
PETSC_HAVE_PNETCDF
PETSC_HAVE_METIS
PETSC_HAVE_MATHLIB
PETSC_HAVE_ZLIB
PETSC_TIME_WITH_SYS_TIME
PETSC_HAVE_MATH_H
PETSC_HAVE_ENDIAN_H
PETSC_HAVE_LIMITS_H
PETSC_HAVE_SEARCH_H
PETSC_HAVE_SCHED_H
PETSC_HAVE_PTHREAD_H
PETSC_HAVE_CXX_NAMESPACE
PETSC_HAVE_FORTRAN_TYPE_INITIALIZE
PETSC_HAVE_LIBDL
PETSC_HAVE_LIBQUADMATH
PETSC_HAVE_LIBX11
PETSC_HAVE_LIBZ
PETSC_HAVE_LIBTRIANGLE
PETSC_HAVE_LIBMETIS
PETSC_HAVE_LIBPARMETIS
PETSC_HAVE_LIBHDF5HL_FORTRAN
PETSC_HAVE_LIBHDF5_FORTRAN
PETSC_HAVE_LIBHDF5_HL
PETSC_HAVE_LIBHDF5
PETSC_HAVE_LIBPNETCDF
PETSC_HAVE_LIBNETCDF
PETSC_HAVE_LIBEXODUS
PETSC_HAVE_LIBBLAS
PETSC_HAVE_LIBLAPACK
PETSC_HAVE_LIBSTDC__
PETSC_HAVE_LIBMPI_USEMPIF08
PETSC_HAVE_LIBMPI_USEMPI_IGNORE_TKR
PETSC_HAVE_LIBMPI_MPIFH
PETSC_HAVE_LIBMPI
PETSC_HAVE_LIBGFORTRAN
PETSC_HAVE_LIBGCC_S
PETSC_HAVE_LIBPTHREAD
PETSC_HAVE_LIBSUNDIALS_CVODE
PETSC_HAVE_LIBSUNDIALS_NVECSERIAL
PETSC_HAVE_LIBSUNDIALS_NVECPARALLEL
PETSC_HAVE_LIBML
PETSC_HAVE_LIBSUPERLU_DIST
PETSC_HAVE_LIBSUPERLU
PETSC_HAVE_LIBUMFPACK
PETSC_HAVE_LIBKLU
PETSC_HAVE_LIBCHOLMOD
PETSC_HAVE_LIBBTF
PETSC_HAVE_LIBCCOLAMD
PETSC_HAVE_LIBCOLAMD
PETSC_HAVE_LIBCAMD
PETSC_HAVE_LIBAMD
PETSC_HAVE_LIBSUITESPARSECONFIG
PETSC_HAVE_LIBSCALAPACK
PETSC_HAVE_LIBCMUMPS
PETSC_HAVE_LIBDMUMPS
PETSC_HAVE_LIBSMUMPS
PETSC_HAVE_LIBZMUMPS
PETSC_HAVE_LIBMUMPS_COMMON
PETSC_HAVE_LIBPORD
PETSC_HAVE_LIBHYPRE
PETSC_USE_SCALAR_REAL
PETSC_HAVE_REAL___FLOAT128
PETSC_RETSIGTYPE
PETSC_SIZEOF_SHORT
PETSC_SIZEOF_MPI_COMM
PETSC_SIZEOF_MPI_FINT
PETSC_HAVE_VPRINTF
PETSC_HAVE_VFPRINTF
PETSC_HAVE_MKSTEMP
PETSC_HAVE_GETHOSTBYNAME
PETSC_HAVE_SOCKET
PETSC_HAVE_TIMES
PETSC_HAVE_SIGSET
PETSC_HAVE_GETTIMEOFDAY
PETSC_HAVE_SIGNAL
PETSC_HAVE_GET_NPROCS
PETSC_HAVE_SIGACTION
PETSC_HAVE__GFORTRAN_IARGC
PETSC_HAVE_SHARED_LIBRARIES
PETSC_USE_GDB_DEBUGGER
PETSC_VERSION_BRANCH_GIT
PETSC_HAVE_MPI_COMM_F2C
PETSC_HAVE_MPI_COMM_C2F
PETSC_HAVE_MPI_FINT
PETSC_HAVE_MPI_REPLACE
PETSC_HAVE_MPI_WIN_ALLOCATE_SHARED
PETSC_HAVE_MPI_WIN_SHARED_QUERY
PETSC_HAVE_MPI_ALLTOALLW
PETSC_HAVE_MPI_COMM_SPAWN
PETSC_HAVE_MPI_TYPE_GET_EXTENT
PETSC_HAVE_PTHREAD_BARRIER_T
PETSC_HAVE_SCHED_CPU_SET_T
PETSC_HAVE_SYS_SYSCTL_H
PETSC_LEVEL1_DCACHE_SIZE
PETSC_LEVEL1_DCACHE_ASSOC


Re: [petsc-dev] configure using stale packages

2019-06-28 Thread Jed Brown via petsc-dev
"Smith, Barry F."  writes:

>    If we stash the yyy value from --download-xxx=yyy and the state of the xxx.py, 
> then we can know that the package may need to be re-downloaded, 
> re-configured, rebuilt, and reinstalled. Essentially, get the dependencies of 
> package xxx on itself right. There is also the dependency of package xxx on 
> package www (for example, hdf5 changes, thus pnetcdf needs to be rebuilt). 
> Finally, there is the dependency of xxx on other stuff that may change in 
> config (for example, package.py).
>
> My branch handles all these dependencies (at the cost of rebuilding
> more than what may need to be rebuilt) 

I think the cost of manually rerunning configure repeatedly to notice
dependencies, delete pkg.conf files, etc. (and the associated cognitive
load) is *much* worse than rebuilding a bit more than strictly
necessary.  We've been paying for --download being a quick-n-dirty hack.
If it's going to be a package manager, it should accurately track
dependencies and dependent versions.

> but it is a different model than how we have traditionally used
> --download-xxx. It is not clear to me if we should just discard the
> old model for a new one, support them both or something else. It seems
> inefficient to support both models forever, maybe a smarter hybrid of
> the two is desirable. I need to develop some experience with my new
> model to see how it can be improved, extended or if it is complete
> trash. I am not sure trying to "patch up" the old model is the best
> approach; though maybe it is.
>
>   Barry
>
>
>> On Jun 28, 2019, at 10:41 AM, Jed Brown via petsc-dev 
>>  wrote:
>> 
>> If we configure with --download-pnetcdf (version 1.9.0), then update the
>> PETSc repository to use a new version (1.11.2), then re-run ./configure
>> --download-pnetcdf, we get a warning making us look like dolts:
>> 
>> ===
>>  
>>
>>  Warning: Using version 1.9.0 of package pnetcdf; PETSc is tested with 
>> 1.11 
>>   
>>  Suggest using --download-pnetcdf for a compatible pnetcdf   
>>  
>> 
>> ===
>> 
>> It looks like we have a mechanism for gitcommit versions, but not for
>> normal numbered versions?
>> 
>> NB: There are other changes in updating pnetcdf that will bypass this
>> issue today, but I wanted to raise it in case someone had a vision for
>> how the logic should be organized.  It's currently spread out, and
>> any attempt at a fresh download is skipped if there is a matching
>> download directory:
>> 
>>if not self.packageDir: self.packageDir = self.downLoad()


Re: [petsc-dev] configure using stale packages

2019-06-28 Thread Jed Brown via petsc-dev
"Balay, Satish"  writes:

> On Fri, 28 Jun 2019, Jed Brown via petsc-dev wrote:
>
>> If we configure with --download-pnetcdf (version 1.9.0), then update the
>> PETSc repository to use a new version (1.11.2), then re-run ./configure
>> --download-pnetcdf, we get a warning making us look like dolts:
>> 
>> ===
>>  
>>
>>   Warning: Using version 1.9.0 of package pnetcdf; PETSc is tested with 
>> 1.11 
>>   
>>   Suggest using --download-pnetcdf for a compatible pnetcdf  
>>  
>>  
>> ===
>> 
>> It looks like we have a mechanism for gitcommit versions, but not for
>> normal numbered versions?
>
> Yes. We have some overloaded functionality here wrt --download-package
> for tarballs.  This is to partly support:
>
> --download-package=URL
>
> [where URL could be a different version - or have a different dir structure 
> in the tarball].

Right, I understand being permissive in that setting, but if someone is
asking to use the package version defined in packages/foo.py, we should
strive for configure to produce the same result as if we had deleted
PETSC_ARCH and reconfigured from scratch with the same options.

> And then - our desire to avoid re-downloading the package if it's already
> downloaded.  Perhaps this part can be improved by stashing the tarball
> somehow so that it corresponds to the URL used... [but sometimes one can use
> the same tarball with a different URL]

If we trust our former selves, perhaps we can record self.version at the
time of a download and clean/re-download any time we're being asked to
download a different version (e.g., because packages/foo.py has changed).

> Wrt a git repo, one can overcome this with:
>
> --download-package=git://URL --download-package-commit=HASH
>
> [but configure does not expect the URL to change after a clone is
> created - so that part can break]

Yes, I had to manually fix this in about a dozen PETSC_ARCHes after the
Hypre repository moved.


[petsc-dev] configure using stale packages

2019-06-28 Thread Jed Brown via petsc-dev
If we configure with --download-pnetcdf (version 1.9.0), then update the
PETSc repository to use a new version (1.11.2), then re-run ./configure
--download-pnetcdf, we get a warning making us look like dolts:

===
  Warning: Using version 1.9.0 of package pnetcdf; PETSc is tested with 1.11
  Suggest using --download-pnetcdf for a compatible pnetcdf
===

It looks like we have a mechanism for gitcommit versions, but not for
normal numbered versions?

NB: There are other changes in updating pnetcdf that will bypass this
issue today, but I wanted to raise it in case someone had a vision for
how the logic should be organized.  It's currently spread out, and
any attempt at a fresh download is skipped if there is a matching
download directory:

if not self.packageDir: self.packageDir = self.downLoad()


Re: [petsc-dev] circular dependencies SLEPc

2019-06-26 Thread Jed Brown via petsc-dev
Matthew Knepley  writes:

> On Wed, Jun 26, 2019 at 3:42 PM Jed Brown via petsc-dev <
> petsc-dev@mcs.anl.gov> wrote:
>
>> "Smith, Barry F."  writes:
>>
>> >> On Jun 26, 2019, at 1:53 PM, Jed Brown  wrote:
>> >>
>> >> "Smith, Barry F."  writes:
>> >>
>> >>>  It is still a PC, it may as part of its computation solve an
>> eigenvalue problem but its use is as a PC, hence does not belong in SLEPc.
>> >>
>> >> Fine; it does not belong in src/ksp/pc/.
>> >
>> >   Why not? From the code management point of view that is the perfect
>> place for it. It just depends on an external package in the same way that
>> PCHYPRE depends on an external library. Having it off in some other
>> directory src/plugins would serve no purpose. Of course making sure it
>> doesn't get compiled into -lpetsc may require a tweak to the make
>> infrastructure. Make could, for example, skip plugin subdirectories.
>>
>> I think it's confusing to have code that is part of libpetsc.so
>> alongside code that is not (e.g., won't be accessible to users unless
>> they also build SLEPc and link the plugin).
>>
>> >   BTW: Matt's perverse use of SNES from DMPlex could also be fixed to
>> >   work this way instead of the disgusting PetscObject casting used to
>> >   cancel the SNES object.
>>
>> That code could be part of libpetscsnes.so.
>>
>
> What? I thought I moved everything to SNES a long time ago.

I thought there was a place where SNES was cast to PetscObject.  There
is DMAddField, but it's different.

PetscViewerVTKAddField is another example of code that uses PetscObject
to avoid depending on a higher level type like Vec.
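
The pattern is roughly the following: the low-level code stores a PetscObject
plus a caller-supplied write callback, so it never needs petscvec.h, and the
caller casts its Vec to PetscObject.  (Schematic sketch only, not the actual
PetscViewerVTKAddField signature; the names are invented.)

  #include <petscsys.h>

  typedef PetscErrorCode (*ObjectWriteFn)(PetscObject,void*);

  typedef struct {
    PetscObject   obj;     /* really a Vec, but this file does not know that */
    ObjectWriteFn write;   /* provided by the higher-level (Vec-aware) caller */
  } FieldLink;

  static PetscErrorCode FieldLinkSet(FieldLink *link,PetscObject obj,ObjectWriteFn write)
  {
    PetscErrorCode ierr;

    PetscFunctionBegin;
    ierr = PetscObjectReference(obj);CHKERRQ(ierr);  /* keep the object alive */
    link->obj   = obj;
    link->write = write;
    PetscFunctionReturn(0);
  }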


Re: [petsc-dev] circular dependencies SLEPc

2019-06-26 Thread Jed Brown via petsc-dev
"Smith, Barry F."  writes:

>> On Jun 26, 2019, at 1:53 PM, Jed Brown  wrote:
>> 
>> "Smith, Barry F."  writes:
>> 
>>>  It is still a PC, it may as part of its computation solve an eigenvalue 
>>> problem but its use is as a PC, hence does not belong in SLEPc.
>> 
>> Fine; it does not belong in src/ksp/pc/.
>
>   Why not? From the code management point of view that is the perfect place 
> for it. It just depends on an external package in the same way that PCHYPRE 
> depends on an external library. Having it off in some other directory 
> src/plugins would serve no purpose. Of course making sure it doesn't get 
> compiled into -lpetsc may require a tweak to the make infrastructure. Make 
> could, for example, skip plugin subdirectories.

I think it's confusing to have code that is part of libpetsc.so
alongside code that is not (e.g., won't be accessible to users unless
they also build SLEPc and link the plugin).

>   BTW: Matt's perverse use of SNES from DMPlex could also be fixed to
>   work this way instead of the disgusting PetscObject casting used to
>   cancel the SNES object.

That code could be part of libpetscsnes.so.


Re: [petsc-dev] circular dependencies SLEPc

2019-06-26 Thread Jed Brown via petsc-dev
"Smith, Barry F."  writes:

>> On Jun 26, 2019, at 1:53 PM, Jed Brown  wrote:
>> 
>> "Smith, Barry F."  writes:
>> 
>>>  It can be a plug-in whose source sits in the PETSc source tree, even in 
>>> the PC directory. It gets built by the PETSc build system after the 
>>> build system installs PETSc and SLEPc (in the Spack world it would have its 
>>> own Spack file that just depends on PETSc and SLEPc). Pretty easy to setup 
>>> (and yes less hacky than my previous suggestion). 
>>> 
>>>   The only thing missing is the PCRegister(newmethod) inside the PETSc 
>>> library so -pc_type newmethod is truly transparent but actually even that 
>>> is workable with dynamic libraries, each PETSc dynamic library has a 
>>> function that is called by PETSc when PETSc loads the library, this routine 
>>> could call the registration function. One can provide, for example in an 
>>> environment variable, a list of dynamic-library plug-ins that PETSc loads, 
>>> but I now realize
>>> an even more transparent way: in the PETSc install library directory we 
>>> have a subdirectory (called say petsc-plugins), PETSc sys would 
>>> automatically load these libraries at run time and thus transparently 
>>> register the new PC.  Almost close enough to satisfy Matt without the 
>>> hackiness of my first suggestion that Jed hated. Drawback: only with 
>>> dynamic libraries for the plug-ins.
>> 
>> With static libraries, the user would just need to link the 
>> libpetsc-plugin.a.
>> 
>> Actual registration can be done using __attribute((constructor)) (or a
>> C++ static constructor, but I think the attribute is widely supported).
>
>   Good idea. If the combination of the two is widely supported then we don't 
> need to even worry about the dynamic library case, which will be a relief.

We're using __attribute((constructor)) in libCEED to avoid having a central 
registration list (like in PCRegisterAll).  It has been working fine.
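
For reference, that pattern boils down to something like this (self-contained
toy, not the actual libCEED source):

  #include <stdio.h>

  typedef int (*BackendInitFn)(void);

  /* Tiny registry; in libCEED the equivalent lives in the core library. */
  static struct { const char *prefix; BackendInitFn init; } backends[32];
  static int nbackends;

  static void RegisterBackend(const char *prefix, BackendInitFn init)
  {
    if (nbackends < 32) { backends[nbackends].prefix = prefix; backends[nbackends].init = init; nbackends++; }
  }

  static int InitRefBackend(void) { return 0; }  /* would fill in function pointers */

  /* Lives in the backend's own source file and runs when the library is
     loaded, before main, so no central RegisterAll has to list every backend. */
  __attribute__((constructor)) static void RegisterRefBackend(void)
  {
    RegisterBackend("/cpu/self/ref", InitRefBackend);
  }

  int main(void)
  {
    for (int i = 0; i < nbackends; i++) printf("registered %s\n", backends[i].prefix);
    return 0;
  }

One caveat for the static case: linkers only pull in archive members whose
symbols are referenced, so a constructor in an otherwise-unreferenced member of
a .a may be dropped; the plugin library may need to be forced in (e.g. with
--whole-archive or an explicit -lpetsc-plugin as above).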


Re: [petsc-dev] circular dependencies SLEPc

2019-06-26 Thread Jed Brown via petsc-dev
"Smith, Barry F."  writes:

>   It is still a PC; it may, as part of its computation, solve an eigenvalue 
> problem, but its use is as a PC, hence it does not belong in SLEPc.

Fine; it does not belong in src/ksp/pc/.


Re: [petsc-dev] circular dependencies SLEPc

2019-06-26 Thread Jed Brown via petsc-dev
"Smith, Barry F."  writes:

>   It can be a plug-in whose source sits in the PETSc source tree, even in the 
> PC directory. It gets built by the PETSc build system after the 
> build system installs PETSc and SLEPc (in the Spack world it would have its 
> own Spack file that just depends on PETSc and SLEPc). Pretty easy to setup 
> (and yes less hacky than my previous suggestion). 
>
>    The only thing missing is the PCRegister(newmethod) inside the PETSc 
> library so that -pc_type newmethod is truly transparent, but actually even that is 
> workable with dynamic libraries: each PETSc dynamic library has a function 
> that is called by PETSc when PETSc loads the library, and this routine could call 
> the registration function. One can provide, for example in an environment 
> variable, a list of dynamic-library plug-ins that PETSc loads, but I now realize 
> an even more transparent way: in the PETSc install library directory we have 
> a subdirectory (called, say, petsc-plugins); PETSc sys would automatically load 
> these libraries at run time and thus transparently register the new PC.  
> Almost close enough to satisfy Matt without the hackiness of my first 
> suggestion that Jed hated. Drawback: only with dynamic libraries for the 
> plug-ins.

With static libraries, the user would just need to link the libpetsc-plugin.a.

Actual registration can be done using __attribute((constructor)) (or a
C++ static constructor, but I think the attribute is widely supported).


Re: [petsc-dev] circular dependencies SLEPc

2019-06-26 Thread Jed Brown via petsc-dev
Jed Brown  writes:

> Patrick Sanan  writes:
>
>> How about a plug-in PC implementation, compiled as its own dynamic library,
>> depending on both SLEPc and PETSc?
>
> Of course, but such a thing would need its own continuous integration, etc.

We could develop a better system for packaging and distribution of
plugins, but that isn't going to make Matt happy.


Re: [petsc-dev] circular dependencies SLEPc

2019-06-26 Thread Jed Brown via petsc-dev
"Smith, Barry F."  writes:

>> You can implement and register a PC in SLEPc (it would go in libslepc.so).
>
>   It makes no sense to have a PC in SLEPc. 

We're talking about a PC that is implemented by iteratively solving an
eigenproblem.


Re: [petsc-dev] circular dependencies SLEPc

2019-06-26 Thread Jed Brown via petsc-dev
Patrick Sanan  writes:

> How about a plug-in PC implementation, compiled as its own dynamic library,
> depending on both SLEPc and PETSc?

Of course, but such a thing would need its own continuous integration, etc.


Re: [petsc-dev] circular dependencies SLEPc

2019-06-26 Thread Jed Brown via petsc-dev
Matthew Knepley  writes:

> On Wed, Jun 26, 2019 at 1:05 PM Jed Brown  wrote:
>
>> Matthew Knepley  writes:
>>
>> > On Wed, Jun 26, 2019 at 12:45 PM Jed Brown  wrote:
>> >
>> >> Matthew Knepley  writes:
>> >>
>> >> >> You can implement and register a PC in SLEPc (it would go in
>> >> libslepc.so).
>> >> >>
>> >> >
>> >> > I think this is the bad workflow solution. What Barry suggested will
>> work
>> >> > and be MUCH easier for a developer. Isn't
>> >> > the point of our tools to make our lives easier, not to enforce rules
>> >> that
>> >> > make them harder?
>> >>
>> >> Circular dependencies with a special build process is an enormous
>> >> development and distribution tax.
>> >>
>> >
>> > The difference in the arguments here is that there are very specific
>> > problems with the "right" way,
>> > namely that I need to deal with two different repos, two testing systems,
>> > release schedules, etc.
>> > Whereas the taxes above are currently theoretical.
>>
>> It isn't remotely theoretical.
>>
>> You could propose merging SLEPc into the PETSc repository (similar to
>> what we did with TAO a while back) if you think "PETSc" code will
>> frequently need to depend on SLEPc, but creating a circular dependency
>> between separate packages is worse than having code in Vec that depends
>> on DM.
>>
>
> I think there will be very frequent dependencies. I would say this is
> a very convenient stopgap that is preferable to making anyone work in
> both places at once.  That is a very real development nightmare.

As a concrete issue unrelated to packaging/distribution (which is very
important), what happens when a PETSc interface used by SLEPc changes in
a branch?  If the repositories are separate and you have this circular
dependency, the PETSc build and tests fail until SLEPc updates to the
new interface and that lands in 'master' where PETSc can use it.  But
the SLEPc updates can't land until the branch with this new change is
merged in PETSc, so you have to do custom testing and synchronize these
merges.  Stop-the-world disruption is not okay, full stop.

I think it's up to Jose whether closer integration is desirable.


Re: [petsc-dev] circular dependencies SLEPc

2019-06-26 Thread Jed Brown via petsc-dev
Matthew Knepley  writes:

> On Wed, Jun 26, 2019 at 12:45 PM Jed Brown  wrote:
>
>> Matthew Knepley  writes:
>>
>> >> You can implement and register a PC in SLEPc (it would go in
>> libslepc.so).
>> >>
>> >
>> > I think this is the bad workflow solution. What Barry suggested will work
>> > and be MUCH easier for a developer. Isn't
>> > the point of our tools to make our lives easier, not to enforce rules
>> that
>> > make them harder?
>>
>> Circular dependencies with a special build process is an enormous
>> development and distribution tax.
>>
>
> The difference in the arguments here is that there are very specific
> problems with the "right" way,
> namely that I need to deal with two different repos, two testing systems,
> release schedules, etc.
> Whereas the taxes above are currently theoretical.

It isn't remotely theoretical.

You could propose merging SLEPc into the PETSc repository (similar to
what we did with TAO a while back) if you think "PETSc" code will
frequently need to depend on SLEPc, but creating a circular dependency
between separate packages is worse than having code in Vec that depends
on DM.


Re: [petsc-dev] circular dependencies SLEPc

2019-06-26 Thread Jed Brown via petsc-dev
Matthew Knepley  writes:

>> You can implement and register a PC in SLEPc (it would go in libslepc.so).
>>
>
> I think this is the bad workflow solution. What Barry suggested will work
> and be MUCH easier for a developer. Isn't
> the point of our tools to make our lives easier, not to enforce rules that
> make them harder?

Circular dependencies with a special build process is an enormous
development and distribution tax.


Re: [petsc-dev] circular dependencies SLEPc

2019-06-26 Thread Jed Brown via petsc-dev
"Smith, Barry F. via petsc-dev"  writes:

>> On Jun 26, 2019, at 9:56 AM, Balay, Satish via petsc-dev 
>>  wrote:
>> 
>> On Wed, 26 Jun 2019, Jakub Kruzik via petsc-dev wrote:
>> 
>>> Hello,
>>> 
>>> as I mentioned in PR #1819, I would like to use SLEPc in PETSc.
>>> 
>>> Currently when PETSc is configured with --download-slepc, it defines
>>> PETSC_HAVE_SLEPC and each compilation of PETSc recompiles SLEPc.
>> 
>> yes - slepc uses petsc, so when petsc is updated - it's best to rebuild slepc
>> 
>> You can ignore PETSC_HAVE_SLEPC flag [its just a build tool thingy]
>> PETSc code does not use this flag - and there is no circular
>> dependency.
>> 
>>> The first way to use SLEPc is from an example. That should be easy, all we
>>> need is to add -lslepc when compiling an example.
>> 
>> It's best to use slepc examples as templates - and slepc makefiles [as 
>> examples].
>> 
>> --download-slepc is a convenience feature to install petsc and slepc in
>> a single go. It does not change how you would use slepc.
>> 
>> Satish
>> 
>> 
>>> 
>>> The other option is to use SLEPc inside PETSc code. I do not know how to
>>> achieve this. One way could be to define PETSC_HAVE_SLEPC after the
>>> compilation of SLEPc and again compile PETSc but this time linking with 
>>> SLEPc.
>>> Although, even if it works, it is ugly.
>
>    If you make SLEPc calls from PETSc source you should only need the SLEPc 
> header files to compile the PETSc source, not the SLEPc library. So one way 
> to accomplish this would be to do a "partial" install of SLEPc, build PETSc 
> (which uses SLEPc), and then complete the SLEPc install. When --download-slepc 
> is used this would mean that during the SLEPc.py script it would copy over the 
> SLEPc include files to the prefix location, and after PETSc is built it would 
> build the SLEPc libraries and move them to the prefix location.  The one iffy 
> thing is that SLEPc include files may depend on generated PETSc include files 
> (which are not fully generated until configure is done). Thus instead of 
> having SLEPc.py move the SLEPc includes to the prefix location it would need 
> to postpone that until just at the end of configure (we have other packages 
> that do this). So when you're ready to try this out let us know and we can help 
> with the infrastructure. (It will avoid two builds of either PETSc or SLEPc.)

That is disgusting.

If code in libpetsc.so depends on libslepc.so, then you'd have a circular 
dependency.


You can implement and register a PC in SLEPc (it would go in libslepc.so).
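
Concretely, the SLEPc side would provide an ordinary PC implementation and
register it, roughly along these lines (bare-bones sketch: the type name and
routines are invented, error handling and the destroy/eigensolve logic are
omitted, and it assumes the private pcimpl.h ops layout):

  #include <petsc/private/pcimpl.h>   /* private header, for pc->ops and pc->pmat */
  #include <slepceps.h>

  typedef struct { EPS eps; } PC_EigExample;

  static PetscErrorCode PCSetUp_EigExample(PC pc)
  {
    PC_EigExample *ctx = (PC_EigExample*)pc->data;
    PetscErrorCode ierr;

    PetscFunctionBegin;
    /* Solve the eigenproblem for pc->pmat here and keep whatever apply needs. */
    ierr = EPSSetOperators(ctx->eps,pc->pmat,NULL);CHKERRQ(ierr);
    ierr = EPSSolve(ctx->eps);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

  static PetscErrorCode PCApply_EigExample(PC pc,Vec x,Vec y)
  {
    PetscErrorCode ierr;

    PetscFunctionBegin;
    ierr = VecCopy(x,y);CHKERRQ(ierr);  /* placeholder; a real PC would apply the
                                           deflation/projection built in setup */
    PetscFunctionReturn(0);
  }

  static PetscErrorCode PCCreate_EigExample(PC pc)
  {
    PC_EigExample *ctx;
    PetscErrorCode ierr;

    PetscFunctionBegin;
    ierr = PetscNew(&ctx);CHKERRQ(ierr);
    ierr = EPSCreate(PetscObjectComm((PetscObject)pc),&ctx->eps);CHKERRQ(ierr);
    pc->data       = ctx;
    pc->ops->setup = PCSetUp_EigExample;
    pc->ops->apply = PCApply_EigExample;
    PetscFunctionReturn(0);
  }

  /* Called from SLEPc's initialization, so the type is available (and
     -pc_type eigexample works) whenever libslepc is linked and initialized. */
  PetscErrorCode SlepcRegisterEigExamplePC(void)
  {
    return PCRegister("eigexample",PCCreate_EigExample);
  }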


Re: [petsc-dev] PETSc blame digest (next) 2019-06-20

2019-06-20 Thread Jed Brown via petsc-dev
"Hapla  Vaclav"  writes:

>> On 20 Jun 2019, at 15:56, Vaclav Hapla  wrote:
>> 
>> 
>> 
>>> On 20 Jun 2019, at 15:52, Vaclav Hapla  wrote:
>>> 
>>> 
>>> 
 On 20 Jun 2019, at 15:15, Hapla Vaclav  wrote:
 
 
 
> On 20 Jun 2019, at 15:14, Jed Brown  wrote:
> 
> Hapla  Vaclav via petsc-dev  writes:
> 
>>> On 20 Jun 2019, at 14:28, PETSc checkBuilds 
>>>  wrote:
>>> 
>>> 
>>> 
>>> Dear PETSc developer,
>>> 
>>> This email contains listings of contributions attributed to you by
>>> `git blame` that caused compiler errors or warnings in PETSc automated
>>> testing.  Follow the links to see the full log files. Please attempt to 
>>> fix
>>> the issues promptly or let us know at petsc-dev@mcs.anl.gov if you are 
>>> unable
>>> to resolve the issues.
>>> 
>>> Thanks,
>>> The PETSc development team
>>> 
>>> 
>>> 
>>> warnings attributed to commit 
>>> https://bitbucket.org/petsc/petsc/commits/eb91f32
>>> MatLoad_Dense_HDF5 impl.
>>> 
>>> src/mat/impls/dense/seq/densehdf5.c:62
>>> [http://ftp.mcs.anl.gov/pub/petsc/nightlylogs//archive/2019/06/20/build_next_arch-linux-pkgs-cxx-mlib_el6.log]
>>>  
>>> /home/sandbox/petsc/petsc.next-3/src/mat/impls/dense/seq/densehdf5.c:62:
>>>  undefined reference to `PetscViewerHDF5Load'
>> 
>> Does it mean I need to change visibility of PetscViewerHDF5Load in 
>> isimpl.h to PETSC_EXTERN?
>> Are you OK with that?
> 
> Yes, and when doing that, it needs a Developer level man page.
 
 OK, thanks.
>>> 
>>> So as a rule of thumb, every PETSC_EXTERN function should be documented, 
>>> even those declared in private headers?
>>> 
>>> Then src/vec/is/utils/hdf5io.c should have
>>> #include <petsc/private/isimpl.h> /*I "petsc/private/isimpl.h" I*/
>>> ?
>>> Looks a bit weird, doesn't it?
>>> 
>>> I think there are countless cases petsc-wide which break this.
>> 
>> dev manual page 15 bullet 15:
>> "private functions may need to be marked PETSC_EXTERN"
>> There is an example of MatHeaderReplace. It doesn't have a manpage.
>> 
>
> Well, I'm not against making it public (it originally was) and writing a 
> manpage, but then it should also be moved to the public header file.

If it's meant to be private, it needs _Private.  If you don't otherwise
need private/isimpl.h, then it should probably become public.

MatHeaderReplace is public (in petscmat.h) so it should have a man page.


Re: [petsc-dev] PETSc blame digest (next) 2019-06-20

2019-06-20 Thread Jed Brown via petsc-dev
Hapla  Vaclav via petsc-dev  writes:

>> On 20 Jun 2019, at 14:28, PETSc checkBuilds  
>> wrote:
>> 
>> 
>> 
>> Dear PETSc developer,
>> 
>> This email contains listings of contributions attributed to you by
>> `git blame` that caused compiler errors or warnings in PETSc automated
>> testing.  Follow the links to see the full log files. Please attempt to fix
>> the issues promptly or let us know at petsc-dev@mcs.anl.gov if you are unable
>> to resolve the issues.
>> 
>> Thanks,
>>  The PETSc development team
>> 
>> 
>> 
>> warnings attributed to commit 
>> https://bitbucket.org/petsc/petsc/commits/eb91f32
>> MatLoad_Dense_HDF5 impl.
>> 
>>  src/mat/impls/dense/seq/densehdf5.c:62
>>
>> [http://ftp.mcs.anl.gov/pub/petsc/nightlylogs//archive/2019/06/20/build_next_arch-linux-pkgs-cxx-mlib_el6.log]
>>  
>> /home/sandbox/petsc/petsc.next-3/src/mat/impls/dense/seq/densehdf5.c:62: 
>> undefined reference to `PetscViewerHDF5Load'
>
> Does it mean I need to change visibility of PetscViewerHDF5Load in isimpl.h 
> to PETSC_EXTERN?
> Are you OK with that?

Yes, and when doing that, it needs a Developer level man page.


Re: [petsc-dev] moving from BitBucket to GitLab

2019-06-18 Thread Jed Brown via petsc-dev
Alexander Lindsay  writes:

> I'm assuming this would be served out of an Argonne domain?

No, gitlab.com.

> On Sun, Jun 16, 2019 at 12:49 PM Jed Brown via petsc-dev <
> petsc-dev@mcs.anl.gov> wrote:
>
>> "Zhang, Hong via petsc-dev"  writes:
>>
>> > If it is mainly because of CI, why don't we host petsc on GitHub and use
>> the GitLab CI?
>> > https://about.gitlab.com/solutions/github/
>>
>> There are significant missing features for that mode of operation.
>>
>> https://gitlab.com/gitlab-org/gitlab-ce/issues/60158
>>
>> > GitHub has been the biggest social network for developers. Changing a
>> utility is easy to me, but changing a social network isn't.
>>
>> From a previous conversation with Barry:
>>
>> | Barry writes:
>> | >   BTW: You're going to get some abuse for advocating for GitLab and
>> not GitHub; I don't care because I'm a contrarian but it would be good if
>> you had a few sentences about why moving to GitLab is
>> | > better than GitHub (and it can't be open source philosophy arguments
>> :-)
>> |
>> | The short answer is that I think the PR integration with CI/metrics is
>> | clearly superior to anything presently available at GitHub, GitLab can
>> | import all the issues/pull requests/comments, and GitLab supports math
>> | in comments.
>> |
>> | GitHub has more community visibility.  If everyone wants to move there,
>> | it'd be okay with me, but wouldn't completely resolve the CI situation
>> | and we'd lose lots of our history.  On purely technical merits outside
>> | of CI and import, I think GitLab is on par with GitHub.
>>


Re: [petsc-dev] User(s) manual sections field in manual pages?

2019-06-17 Thread Jed Brown via petsc-dev
Patrick Sanan  writes:

>> It ought, I suppose, be possible to write a plugin that adds links
>> automagically to all keywords in formatted source, but I don't know the
>> details of how these are written.
>>
> Sounds like Jed's suggesting that this could be done with a script similar
> to the one that exists now for the manual. How about
> 1. Use pandoc to convert (a section of) the dev manual to the new .rst
> format
> 2. Modify the existing script to add links from (temporary copies of) the
> new .rst to the existing man pages (still using the existing htmlmap file)
> 3. Generate an example of what the new html dev manual would like
> This would
> - probably reveal some gotchas about this approach
> - be a good step along the way to a useful first result, which is the
> manual and dev manual on the web

I think this would be a good start.

