Re: [petsc-dev] New implementation of PtAP based on all-at-once algorithm
> On Apr 11, 2019, at 9:07 PM, Mark Adams via petsc-dev wrote:
>
> Interesting, nice work.
>
> It would be interesting to get the flop counters working.
>
> This looks like GMG, I assume 3D.
>
> The degree of parallelism is not very realistic. You should probably run a
> 10x smaller problem, at least, or use 10x more processes.

Why do you say that? He's got his machine with a certain amount of physical memory per node; are you saying he should ignore/not use 90% of that physical memory for his simulation? He should buy a machine 10x bigger just because it means having fewer degrees of freedom per node (who's footing the bill for this purchase?). At INL they run simulations for a purpose, not just for scalability studies, and there are no dang GPUs or barely used over-sized monstrosities sitting around to brag about twice a year at SC.

   Barry

> I guess it does not matter. This is basically like a one-node run because the
> subdomains are so large.
>
> And are you sure the numerics are the same with and without hypre? Hypre is
> 15x slower. Any ideas what is going on?
>
> It might be interesting to scale this test down to a node to see if this is
> from communication.
>
> Again, nice work,
> Mark
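A side note on the flop counters Mark asks about: the zero flop and flop-rate entries in the hypre and all-at-once MatPtAPNumeric rows quoted further down usually just mean those code paths never call PetscLogFlops(). The sketch below shows roughly what registering the work inside a numeric PtAP-style kernel looks like; the function name, arguments, and loop structure are invented for illustration and are not the actual PETSc source -- only PetscLogFlops()/CHKERRQ() are the real API.

#include <petscsys.h>

/* Sketch only: how a numeric kernel can register its arithmetic with PETSc's
   flop counters so that MatPtAPNumeric stops reporting 0 flops in -log_view.
   The kernel itself (name, arguments, loop) is invented for illustration. */
static PetscErrorCode MyPtAPNumericKernel(PetscInt nrows, const PetscInt rowlen[],
                                          const PetscScalar pa[], const PetscScalar aa[],
                                          PetscScalar ca[])
{
  PetscInt       i, j, k = 0;
  PetscLogDouble flops = 0.0;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  for (i = 0; i < nrows; i++) {
    for (j = 0; j < rowlen[i]; j++, k++) {
      ca[i] += pa[k] * aa[k];            /* one multiply and one add per nonzero */
    }
    flops += 2.0 * rowlen[i];            /* count them: 2 flops per nonzero */
  }
  ierr = PetscLogFlops(flops);CHKERRQ(ierr); /* this is what makes -log_view report flops */
  PetscFunctionReturn(0);
}

With calls like this in place, -log_view would show nonzero flop rates for the all-at-once MatPtAPNumeric rows, which would make the three algorithms comparable on flops as well as time and communication.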
Re: [petsc-dev] New implementation of PtAP based on all-at-once algorithm
Interesting, nice work.

It would be interesting to get the flop counters working.

This looks like GMG, I assume 3D.

The degree of parallelism is not very realistic. You should probably run a 10x smaller problem, at least, or use 10x more processes. I guess it does not matter. This is basically like a one-node run because the subdomains are so large.

And are you sure the numerics are the same with and without hypre? Hypre is 15x slower. Any ideas what is going on?

It might be interesting to scale this test down to a node to see if this is from communication.

Again, nice work,
Mark

On Thu, Apr 11, 2019 at 7:08 PM Fande Kong wrote:

> Hi Developers,
>
> I just want to share some good news. It is known that PETSc-ptap-scalable
> takes too much memory for some applications because it needs to build
> intermediate data structures. Following Mark's suggestions, I implemented
> the all-at-once algorithm, which does not cache any intermediate data.
>
> I did some comparisons; the new implementation is actually scalable in
> terms of memory usage and compute time, even though it is still slower
> than "ptap-scalable". There are some memory profiling results (see the
> attachments). The new all-at-once implementation uses a similar amount of
> memory to hypre, but it is way faster than hypre.
>
> For example, for a problem with 14,893,346,880 unknowns on 10,000
> processor cores, here are the timing results:
>
> Hypre algorithm:
>
> MatPtAP           50 1.0 3.5353e+03  1.0 0.00e+00 0.0 1.9e+07 3.3e+04 6.0e+02 33 0 1 0 17  33 0 1 0 17      0
> MatPtAPSymbolic   50 1.0 2.3969e-02 13.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0 0 0 0  0   0 0 0 0  0      0
> MatPtAPNumeric    50 1.0 3.5353e+03  1.0 0.00e+00 0.0 1.9e+07 3.3e+04 6.0e+02 33 0 1 0 17  33 0 1 0 17      0
>
> PETSc scalable PtAP:
>
> MatPtAP           50 1.0 1.1453e+02  1.0 2.07e+09 3.8 6.6e+07 2.0e+05 7.5e+02  2 1 4 6 20   2 1 4 6 20 129418
> MatPtAPSymbolic   50 1.0 5.1562e+01  1.0 0.00e+00 0.0 4.1e+07 1.4e+05 3.5e+02  1 0 3 3  9   1 0 3 3  9      0
> MatPtAPNumeric    50 1.0 6.3072e+01  1.0 2.07e+09 3.8 2.4e+07 3.1e+05 4.0e+02  1 1 2 4 11   1 1 2 4 11 235011
>
> New implementation of the all-at-once algorithm:
>
> MatPtAP           50 1.0 2.2153e+02  1.0 0.00e+00 0.0 1.0e+08 1.4e+05 6.0e+02  4 0 7 7 17   4 0 7 7 17      0
> MatPtAPSymbolic   50 1.0 1.1055e+02  1.0 0.00e+00 0.0 7.9e+07 1.2e+05 2.0e+02  2 0 5 4  6   2 0 5 4  6      0
> MatPtAPNumeric    50 1.0 1.1102e+02  1.0 0.00e+00 0.0 2.6e+07 2.0e+05 4.0e+02  2 0 2 3 11   2 0 2 3 11      0
>
> You can see here that the all-at-once algorithm is a bit slower than
> ptap-scalable, but it uses much less memory.
>
> Fande
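For anyone who wants to reproduce a small version of this comparison, here is a minimal driver. It is a sketch, not Fande's actual test: the 1D operator, the toy interpolation P, and the option name -matptap_via allatonce are assumptions (the option spelling matches how the all-at-once algorithm is exposed in later PETSc releases). Run it under -log_view to get MatPtAP/MatPtAPSymbolic/MatPtAPNumeric rows like the ones above, and switch between -matptap_via scalable and -matptap_via allatonce to pick the algorithm at run time.

/* Minimal MatPtAP driver (a sketch; the operator and P are made up, not Fande's test).
   Example run:  mpiexec -n 4 ./ptap_demo -n 100000 -matptap_via allatonce -log_view   */
#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat            A, P, C;
  PetscInt       n = 1000, nc, i, Istart, Iend;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
  ierr = PetscOptionsGetInt(NULL, NULL, "-n", &n, NULL);CHKERRQ(ierr);
  nc   = (n + 1) / 2;   /* size of the toy coarse space */

  /* A: a 1D Laplacian-like MPIAIJ matrix of global size n x n */
  ierr = MatCreateAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, n, n, 3, NULL, 2, NULL, &A);CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A, &Istart, &Iend);CHKERRQ(ierr);
  for (i = Istart; i < Iend; i++) {
    if (i > 0)     { ierr = MatSetValue(A, i, i-1, -1.0, INSERT_VALUES);CHKERRQ(ierr); }
    if (i < n - 1) { ierr = MatSetValue(A, i, i+1, -1.0, INSERT_VALUES);CHKERRQ(ierr); }
    ierr = MatSetValue(A, i, i, 2.0, INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  /* P: a toy piecewise-constant interpolation from nc coarse points, just to exercise PtAP */
  ierr = MatCreateAIJ(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, n, nc, 1, NULL, 1, NULL, &P);CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(P, &Istart, &Iend);CHKERRQ(ierr);
  for (i = Istart; i < Iend; i++) {
    ierr = MatSetValue(P, i, i/2, 1.0, INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(P, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(P, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  /* C = P^T A P; the algorithm is selected at run time with -matptap_via <scalable|allatonce|...> */
  ierr = MatPtAP(A, P, MAT_INITIAL_MATRIX, 2.0, &C);CHKERRQ(ierr);

  ierr = MatDestroy(&C);CHKERRQ(ierr);
  ierr = MatDestroy(&P);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

The absolute numbers from a small run will not mean much, but switching -matptap_via and comparing the MatPtAPSymbolic/MatPtAPNumeric rows gives a feel for how the algorithms trade memory for time.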
Re: [petsc-dev] New implementation of PtAP based on all-at-once algorithm
Excellent!

Thanks

   Barry

> On Apr 11, 2019, at 6:08 PM, Fande Kong via petsc-dev wrote:
>
> Hi Developers,
>
> I just want to share some good news. It is known that PETSc-ptap-scalable
> takes too much memory for some applications because it needs to build
> intermediate data structures. Following Mark's suggestions, I implemented
> the all-at-once algorithm, which does not cache any intermediate data.
Re: [petsc-dev] Fwd: [firedrake] Dockerised build tests.
Lawrence Mitchell via petsc-dev writes:

>> On 11 Apr 2019, at 21:02, Matthew Knepley via petsc-dev wrote:
>>
>> Jed, should we be doing this? My first impression is that our builds catch a
>> lot of configure errors so we do not want it.
>
> We still configure and build firedrake in the tests. This is just for
> downstream applications that test on our build hardware with master. The
> containers are built on a passing master build of firedrake itself, so the
> downstream tests don't have to recreate that build.

And this is certainly something PETSc can do, though we'd have to agree on some supported configuration(s). The biggest cost for PETSc is that we have so many damn configuration options.
Re: [petsc-dev] Fwd: [firedrake] Dockerised build tests.
> On 11 Apr 2019, at 21:02, Matthew Knepley via petsc-dev wrote:
>
> Jed, should we be doing this? My first impression is that our builds catch a
> lot of configure errors so we do not want it.

We still configure and build firedrake in the tests. This is just for downstream applications that test on our build hardware with master. The containers are built on a passing master build of firedrake itself, so the downstream tests don't have to recreate that build.

Lawrence
Re: [petsc-dev] Fwd: [firedrake] Dockerised build tests.
You mean using it for all the dependencies? Most commits don't change anything in the config/ tree, so we don't need to re-run configure. But they do generally change the source, so we'd still need to build PETSc.

We could set up caching to only rebuild what is needed, but the details of the caching would be specific to the environment we're running in (currently a mix of Satish's cron jobs and Jenkins). I've been experimenting with GitLab-CI recently, which would be another option (much less resource-hungry and click-around-configuration-heavy than Jenkins). (GitLab-CI works with repositories not hosted at GitLab, though the PR integration and related features are much better when hosted on GitLab.)

Matthew Knepley via petsc-dev writes:

> Jed, should we be doing this? My first impression is that our builds catch
> a lot of configure errors so we do not want it.
>
>    Matt
>
> -- Forwarded message -
> From: Ham, David A
> Date: Thu, Apr 11, 2019, 12:11
> Subject: [firedrake] Dockerised build tests.
> To: firedrake
>
> Dear Firedrakers,
>
> As of this afternoon, the build test systems for the Firedrake and Gusto
> master branches are containerised; Thetis will follow shortly. This enables
> us to use significantly more build resources. For Gusto and Thetis it also
> removes the need for the build system to build Firedrake (and in particular
> PETSc) on every push.
>
> Short version: this makes build testing on Jenkins faster.
>
> *If you maintain branches of Firedrake, Gusto or Thetis:* please merge or
> rebase on the respective master. This will cause your Jenkinsfile to pick
> up the necessary updates. For Thetis branch maintainers, you need to wait
> until the branch lands. This will hopefully be in the next 24 hours.
>
> *If you are using a continuous integration system for a project that builds
> on Firedrake:* consider basing your builder on the
> firedrakeproject/firedrake-vanilla:latest container on Docker Hub. This
> will save you the load of rebuilding Firedrake every time.
>
> *If you just use Firedrake:* You don’t need to do anything. You shouldn’t
> notice the change.
>
> Regards,
>
> David
>
> --
> Dr David Ham
> Department of Mathematics
> Imperial College London
> https://www.imperial.ac.uk/people/david.ham
> https://www.firedrakeproject.org