Re: [petsc-dev] New implementation of PtAP based on all-at-once algorithm

2019-04-11 Thread Smith, Barry F. via petsc-dev



> On Apr 11, 2019, at 9:07 PM, Mark Adams via petsc-dev  
> wrote:
> 
> Interesting, nice work.
> 
> It would be interesting to get the flop counters working.
> 
> This looks like GMG, I assume 3D.
> 
> The degree of parallelism is not very realistic. You should probably run a 
> 10x smaller problem, at least, or use 10x more processes.

   Why do you say that? He's got his machine with a certain amount of physical 
memory per node; are you saying he should ignore/not use 90% of that physical 
memory for his simulation? Should he buy a machine 10x bigger just because it 
means having fewer degrees of freedom per node (and who's footing the bill for 
this purchase)? At INL they run simulations for a purpose, not just for scalability 
studies, and there are no dang GPUs or barely used over-sized monstrosities 
sitting around to brag about twice a year at SC.

   Barry



> I guess it does not matter. This is basically like a one-node run because the 
> subdomains are so large.
> 
> And are you sure the numerics are the same with and without hypre? Hypre is 
> 15x slower. Any ideas what is going on?
> 
> It might be interesting to scale this test down to a node to see if this is 
> from communication.
> 
> Again, nice work,
> Mark
> 
> 
> On Thu, Apr 11, 2019 at 7:08 PM Fande Kong  wrote:
> Hi Developers,
> 
> I just want to share some good news.  It is known that PETSc-ptap-scalable takes 
> too much memory for some applications because it needs to build intermediate 
> data structures.  Following Mark's suggestions, I implemented the 
> all-at-once algorithm, which does not cache any intermediate data. 
> 
> I did some comparisons; the new implementation is actually scalable in terms 
> of memory usage and compute time, even though it is still slower than 
> "ptap-scalable".  Some memory profiling results are in the attachments. 
> The new all-at-once implementation uses a similar amount of memory to hypre, 
> but it is way faster than hypre.
> 
> For example, for a problem with 14,893,346,880 unknowns using 10,000 
> processor cores, here are the timing results:
> 
> Hypre algorithm:
> 
> MatPtAP   50 1.0 3.5353e+03 1.0 0.00e+00 0.0 1.9e+07 3.3e+04 
> 6.0e+02 33  0  1  0 17  33  0  1  0 17 0
> MatPtAPSymbolic   50 1.0 2.3969e-02 13.0 0.00e+00 0.0 0.0e+00 0.0e+00 
> 0.0e+00  0  0  0  0  0   0  0  0  0  0 0
> MatPtAPNumeric50 1.0 3.5353e+03 1.0 0.00e+00 0.0 1.9e+07 3.3e+04 
> 6.0e+02 33  0  1  0 17  33  0  1  0 17 0
> 
> PETSc scalable PtAP:
> 
> MatPtAP   50 1.0 1.1453e+02 1.0 2.07e+09 3.8 6.6e+07 2.0e+05 
> 7.5e+02  2  1  4  6 20   2  1  4  6 20 129418
> MatPtAPSymbolic   50 1.0 5.1562e+01 1.0 0.00e+00 0.0 4.1e+07 1.4e+05 
> 3.5e+02  1  0  3  3  9   1  0  3  3  9 0
> MatPtAPNumeric50 1.0 6.3072e+01 1.0 2.07e+09 3.8 2.4e+07 3.1e+05 
> 4.0e+02  1  1  2  4 11   1  1  2  4 11 235011
> 
> New implementation of the all-at-once algorithm:
> 
> MatPtAP   50 1.0 2.2153e+02 1.0 0.00e+00 0.0 1.0e+08 1.4e+05 
> 6.0e+02  4  0  7  7 17   4  0  7  7 17 0
> MatPtAPSymbolic   50 1.0 1.1055e+02 1.0 0.00e+00 0.0 7.9e+07 1.2e+05 
> 2.0e+02  2  0  5  4  6   2  0  5  4  6 0
> MatPtAPNumeric50 1.0 1.1102e+02 1.0 0.00e+00 0.0 2.6e+07 2.0e+05 
> 4.0e+02  2  0  2  3 11   2  0  2  3 11 0
> 
> 
> You can see here that the all-at-once algorithm is a bit slower than ptap-scalable, 
> but it uses much less memory.
> 
> 
> Fande
>  
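
For anyone who wants to reproduce this comparison, the three timing blocks above 
correspond to PtAP implementations that can be selected at runtime. Below is a 
minimal sketch in C, assuming the implementation is chosen with the -matptap_via 
option; the accepted values (including an all-at-once one) may differ between 
PETSc versions, so treat the option strings as placeholders.

  #include <petscmat.h>

  /* Sketch: form C = P^T A P with a runtime-selected PtAP implementation.
     A and P are assumed to be assembled MPIAIJ matrices created elsewhere. */
  static PetscErrorCode FormCoarseOperator(Mat A, Mat P, Mat *C)
  {
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    /* The option value "allatonce" is an assumption; older PETSc versions
       may only accept e.g. "scalable", "nonscalable", or "hypre". */
    ierr = PetscOptionsSetValue(NULL, "-matptap_via", "allatonce");CHKERRQ(ierr);
    /* fill = 2.0 is a rough guess for nnz(C) relative to nnz(A) + nnz(P) */
    ierr = MatPtAP(A, P, MAT_INITIAL_MATRIX, 2.0, C);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

Running the same executable with -log_view then produces MatPtAP, MatPtAPSymbolic, 
and MatPtAPNumeric rows like the ones quoted above.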



Re: [petsc-dev] New implementation of PtAP based on all-at-once algorithm

2019-04-11 Thread Mark Adams via petsc-dev
Interesting, nice work.

It would be interesting to get the flop counters working.

This looks like GMG, I assume 3D.

The degree of parallelism is not very realistic. You should probably run a
10x smaller problem, at least, or use 10x more processes. I guess it does
not matter. This is basically like a one-node run because the subdomains are
so large.

And are you sure the numerics are the same with and without hypre? Hypre is
15x slower. Any ideas what is going on?

It might be interesting to scale this test down to a node to see if this is
from communication.

Again, nice work,
Mark


On Thu, Apr 11, 2019 at 7:08 PM Fande Kong  wrote:

> Hi Developers,
>
> I just want to share some good news.  It is known that PETSc-ptap-scalable
> takes too much memory for some applications because it needs to build
> intermediate data structures.  Following Mark's suggestions, I
> implemented the all-at-once algorithm, which does not cache any intermediate
> data.
>
> I did some comparisons; the new implementation is actually scalable in
> terms of memory usage and compute time, even though it is still
> slower than "ptap-scalable".  Some memory profiling results are in
> the attachments. The new all-at-once implementation uses a similar amount
> of memory to hypre, but it is way faster than hypre.
>
> For example, for a problem with 14,893,346,880 unknowns using 10,000
> processor cores, here are the timing results:
>
> Hypre algorithm:
>
> MatPtAP   50 1.0 3.5353e+03 1.0 0.00e+00 0.0 1.9e+07 3.3e+04
> 6.0e+02 33  0  1  0 17  33  0  1  0 17 0
> MatPtAPSymbolic   50 1.0 2.3969e-02 13.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0 0
> MatPtAPNumeric50 1.0 3.5353e+03 1.0 0.00e+00 0.0 1.9e+07 3.3e+04
> 6.0e+02 33  0  1  0 17  33  0  1  0 17 0
>
> PETSc scalable PtAP:
>
> MatPtAP   50 1.0 1.1453e+02 1.0 2.07e+09 3.8 6.6e+07 2.0e+05
> 7.5e+02  2  1  4  6 20   2  1  4  6 20 129418
> MatPtAPSymbolic   50 1.0 5.1562e+01 1.0 0.00e+00 0.0 4.1e+07 1.4e+05
> 3.5e+02  1  0  3  3  9   1  0  3  3  9 0
> MatPtAPNumeric50 1.0 6.3072e+01 1.0 2.07e+09 3.8 2.4e+07 3.1e+05
> 4.0e+02  1  1  2  4 11   1  1  2  4 11 235011
>
> New implementation of the all-at-once algorithm:
>
> MatPtAP   50 1.0 2.2153e+02 1.0 0.00e+00 0.0 1.0e+08 1.4e+05
> 6.0e+02  4  0  7  7 17   4  0  7  7 17 0
> MatPtAPSymbolic   50 1.0 1.1055e+02 1.0 0.00e+00 0.0 7.9e+07 1.2e+05
> 2.0e+02  2  0  5  4  6   2  0  5  4  6 0
> MatPtAPNumeric50 1.0 1.1102e+02 1.0 0.00e+00 0.0 2.6e+07 2.0e+05
> 4.0e+02  2  0  2  3 11   2  0  2  3 11 0
>
>
> You can see here that the all-at-once algorithm is a bit slower than ptap-scalable,
> but it uses much less memory.
>
>
> Fande
>
>
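
Mark's remark about the flop counters refers to the 0.00e+00 flop columns in the
hypre and all-at-once rows: -log_view can only report flops that the kernels log
themselves. Below is a minimal sketch of how a kernel feeds those counters, using
a hypothetical event name and a rough two-flops-per-entry estimate rather than the
actual PtAP kernels.

  #include <petscmat.h>

  static PetscLogEvent PTAP_KERNEL; /* hypothetical custom event */

  /* Register the event once, after PetscInitialize(). */
  static PetscErrorCode RegisterPtAPEvent(void)
  {
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    ierr = PetscLogEventRegister("MyPtAPKernel", MAT_CLASSID, &PTAP_KERNEL);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

  /* Inside the numeric phase, log the work explicitly so -log_view
     shows a nonzero flop rate for this event. */
  static PetscErrorCode PtAPKernelWork(PetscInt nnz)
  {
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    ierr = PetscLogEventBegin(PTAP_KERNEL, 0, 0, 0, 0);CHKERRQ(ierr);
    /* ... local triple-product work would go here ... */
    ierr = PetscLogFlops(2.0*nnz);CHKERRQ(ierr); /* multiply + add per entry, a rough estimate */
    ierr = PetscLogEventEnd(PTAP_KERNEL, 0, 0, 0, 0);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

The existing MatPtAPNumeric event would need similar PetscLogFlops() calls inside
the new implementation for its flop column to be populated.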


Re: [petsc-dev] New implementation of PtAP based on all-at-once algorithm

2019-04-11 Thread Smith, Barry F. via petsc-dev


  Excellent! Thanks

   Barry


> On Apr 11, 2019, at 6:08 PM, Fande Kong via petsc-dev  
> wrote:
> 
> Hi Developers,
> 
> I just want to share some good news.  It is known that PETSc-ptap-scalable takes 
> too much memory for some applications because it needs to build intermediate 
> data structures.  Following Mark's suggestions, I implemented the 
> all-at-once algorithm, which does not cache any intermediate data. 
> 
> I did some comparisons; the new implementation is actually scalable in terms 
> of memory usage and compute time, even though it is still slower than 
> "ptap-scalable".  Some memory profiling results are in the attachments. 
> The new all-at-once implementation uses a similar amount of memory to hypre, 
> but it is way faster than hypre.
> 
> For example, for a problem with 14,893,346,880 unknowns using 10,000 
> processor cores, here are the timing results:
> 
> Hypre algorithm:
> 
> MatPtAP   50 1.0 3.5353e+03 1.0 0.00e+00 0.0 1.9e+07 3.3e+04 
> 6.0e+02 33  0  1  0 17  33  0  1  0 17 0
> MatPtAPSymbolic   50 1.0 2.3969e-02 13.0 0.00e+00 0.0 0.0e+00 0.0e+00 
> 0.0e+00  0  0  0  0  0   0  0  0  0  0 0
> MatPtAPNumeric50 1.0 3.5353e+03 1.0 0.00e+00 0.0 1.9e+07 3.3e+04 
> 6.0e+02 33  0  1  0 17  33  0  1  0 17 0
> 
> PETSc scalable PtAP:
> 
> MatPtAP   50 1.0 1.1453e+02 1.0 2.07e+09 3.8 6.6e+07 2.0e+05 
> 7.5e+02  2  1  4  6 20   2  1  4  6 20 129418
> MatPtAPSymbolic   50 1.0 5.1562e+01 1.0 0.00e+00 0.0 4.1e+07 1.4e+05 
> 3.5e+02  1  0  3  3  9   1  0  3  3  9 0
> MatPtAPNumeric50 1.0 6.3072e+01 1.0 2.07e+09 3.8 2.4e+07 3.1e+05 
> 4.0e+02  1  1  2  4 11   1  1  2  4 11 235011
> 
> New implementation of the all-at-once algorithm:
> 
> MatPtAP   50 1.0 2.2153e+02 1.0 0.00e+00 0.0 1.0e+08 1.4e+05 
> 6.0e+02  4  0  7  7 17   4  0  7  7 17 0
> MatPtAPSymbolic   50 1.0 1.1055e+02 1.0 0.00e+00 0.0 7.9e+07 1.2e+05 
> 2.0e+02  2  0  5  4  6   2  0  5  4  6 0
> MatPtAPNumeric50 1.0 1.1102e+02 1.0 0.00e+00 0.0 2.6e+07 2.0e+05 
> 4.0e+02  2  0  2  3 11   2  0  2  3 11 0
> 
> 
> You can see here that the all-at-once algorithm is a bit slower than ptap-scalable, 
> but it uses much less memory.
> 
> 
> Fande
>  
> 
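
Since the attached memory profiles are the main argument for the all-at-once path,
here is one way such numbers can be collected from inside an application; this is
only a sketch, and the helper name, the MPI reduction, and the MB formatting are
illustrative rather than part of PETSc.

  #include <petscsys.h>

  /* Print the largest current memory use over all ranks at a given stage. */
  static PetscErrorCode ReportMaxMemory(MPI_Comm comm, const char stage[])
  {
    PetscErrorCode ierr;
    PetscLogDouble local, global;

    PetscFunctionBeginUser;
    ierr = PetscMemoryGetCurrentUsage(&local);CHKERRQ(ierr);
    /* PetscLogDouble is a double, so MPI_DOUBLE is the matching MPI type */
    ierr = MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_MAX, comm);CHKERRQ(ierr);
    ierr = PetscPrintf(comm, "%s: max memory per rank %g MB\n", stage, global/1048576.0);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

Running with -memory_view gives a similar summary at PetscFinalize() without any
code changes.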



Re: [petsc-dev] Fwd: [firedrake] Dockerised build tests.

2019-04-11 Thread Jed Brown via petsc-dev
Lawrence Mitchell via petsc-dev  writes:

>> On 11 Apr 2019, at 21:02, Matthew Knepley via petsc-dev 
>>  wrote:
>> 
>> Jed, should we be doing this? My first impression is that our builds catch a 
>> lot of configure errors so we do not want it.
>
> We still configure and build firedrake in the tests. This is just for 
> downstream applications that test on our build hardware with master. The 
> containers are built on a passing master build of firedrake itself, so the 
> downstream tests don't have to recreate that build. 

And this is certainly something PETSc can do, though we'd have to agree
on some supported configuration(s).

The biggest cost for PETSc is that we have so many damn configuration
options.


Re: [petsc-dev] Fwd: [firedrake] Dockerised build tests.

2019-04-11 Thread Lawrence Mitchell via petsc-dev



> On 11 Apr 2019, at 21:02, Matthew Knepley via petsc-dev 
>  wrote:
> 
> Jed, should we be doing this? My first impression is that our builds catch a 
> lot of configure errors so we do not want it.

We still configure and build firedrake in the tests. This is just for 
downstream applications that test on our build hardware with master. The 
containers are built on a passing master build of firedrake itself, so the 
downstream tests don't have to recreate that build. 

Lawrence

Re: [petsc-dev] Fwd: [firedrake] Dockerised build tests.

2019-04-11 Thread Jed Brown via petsc-dev
You mean using it for all the dependencies?  Most commits don't change
anything in the config/ tree, so we don't need to re-run configure.  But
they do generally change the source so we'd still need to build PETSc.
We could set up caching to only rebuild what is needed, but the details
of the caching would be specific to the environment we're running in
(currently a mix of Satish's cron jobs and Jenkins).

I've been experimenting with GitLab-CI recently, which would be another
option (much less resource-hungry and click-around-configuration-heavy
than Jenkins).  (GitLab-CI works with repositories not hosted at GitLab,
though the PR integration and related features are much better when
hosted on GitLab.)

Matthew Knepley via petsc-dev  writes:

> Jed, should we be doing this? My first impression is that our builds catch
> a lot of configure errors so we do not want it.
>
>
>Matt
>
> -- Forwarded message -
> From: Ham, David A 
> Date: Thu, Apr 11, 2019, 12:11
> Subject: [firedrake] Dockerised build tests.
> To: firedrake 
>
> Dear Firedrakers,
>
> As of this afternoon, the build test systems for the Firedrake and Gusto master
> branches are containerised; Thetis will follow shortly. This enables us to
> use significantly more build resources. For Gusto and Thetis it also
> removes the need for the build system to build Firedrake (and in particular
> PETSc) on every push.
>
> Short version: this makes build testing on Jenkins faster.
>
> *If you maintain branches of Firedrake, Gusto or Thetis:* please merge or
> rebase on the respective master. This will cause your Jenkinsfile to pick
> up the necessary updates. For Thetis branch maintainers, you need to wait
> until the branch lands. This will hopefully be in the next 24 hours.
>
> *If you are using a continuous integration system for a project that builds
> on Firedrake:* consider basing your builder on the
> firedrakeproject/firedrake-vanilla:latest container on Docker Hub. This
> will save you the load of rebuilding Firedrake every time.
>
> *If you just use Firedrake*: You don’t need to do anything. You shouldn’t
> notice the change.
>
> Regards,
>
> David
>
> --
> Dr David Ham
> Department of Mathematics
> Imperial College London
> https://www.imperial.ac.uk/people/david.ham
> https://www.firedrakeproject.org
>
> ___
> firedrake mailing list
> firedr...@imperial.ac.uk
> https://mailman.ic.ac.uk/mailman/listinfo/firedrake