Re: [petsc-dev] MatMult on Summit

2019-09-20 Thread Smith, Barry F. via petsc-dev


  Dang, makes the GPUs less impressive :-). 

> On Sep 21, 2019, at 12:44 AM, Zhang, Junchao  wrote:
> 
> Here are CPU version results on one node with 24 cores, 42 cores. Click the 
> links for core layout.
> 
> 24 MPI ranks, https://jsrunvisualizer.olcf.ornl.gov/?s4f1o01n6c4g1r14d1b21l0=
> MatMult          100 1.0 3.1431e+00 1.0  2.63e+09 1.2 1.9e+04 5.9e+04 0.0e+00  8 99 97 25  0 100 100 100 100  0 17948     0    0 0.00e+00    0 0.00e+00  0
> VecScatterBegin  100 1.0 2.0583e-02 2.3  0.00e+00 0.0 1.9e+04 5.9e+04 0.0e+00  0  0 97 25  0   0   0 100 100  0     0     0    0 0.00e+00    0 0.00e+00  0
> VecScatterEnd    100 1.0 1.0639e+00 50.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0  19   0   0   0  0     0     0    0 0.00e+00    0 0.00e+00  0
> 
> 42 MPI ranks, https://jsrunvisualizer.olcf.ornl.gov/?s4f1o01n6c7g1r17d1b21l0=
> MatMult          100 1.0 2.0519e+00 1.0  1.52e+09 1.3 3.5e+04 4.1e+04 0.0e+00 23 99 97 30  0 100 100 100 100  0 27493     0    0 0.00e+00    0 0.00e+00  0
> VecScatterBegin  100 1.0 2.0971e-02 3.4  0.00e+00 0.0 3.5e+04 4.1e+04 0.0e+00  0  0 97 30  0   1   0 100 100  0     0     0    0 0.00e+00    0 0.00e+00  0
> VecScatterEnd    100 1.0 8.5184e-01 62.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  6  0  0  0  0  24   0   0   0  0     0     0    0 0.00e+00    0 0.00e+00  0
> 
> --Junchao Zhang
> 
> 
> On Fri, Sep 20, 2019 at 11:48 PM Smith, Barry F.  wrote:
> 
>   Junchao,
> 
>Very interesting. For completeness please run also 24 and 42 CPUs without 
> the GPUs. Note that the default layout for CPU cores is not good. You will 
> want 3 cores on each socket then 12 on each.
> 
>   Thanks
> 
>Barry
> 
>   Since Tim is one of our reviewers next week this is a very good test matrix 
> :-)
> 
> 
> > On Sep 20, 2019, at 11:39 PM, Zhang, Junchao via petsc-dev 
> >  wrote:
> > 
> > Click the links to visualize it.
> > 
> > 6 ranks
> > https://jsrunvisualizer.olcf.ornl.gov/?s4f1o01n6c1g1r11d1b21l0=
> > jsrun -n 6 -a 1 -c 1 -g 1 -r 6 --latency_priority GPU-GPU 
> > --launch_distribution packed --bind packed:1 js_task_info ./ex900 -f 
> > HV15R.aij -mat_type aijcusparse -vec_type cuda -n 100 -log_view
> > 
> > 24 ranks
> > https://jsrunvisualizer.olcf.ornl.gov/?s4f1o01n6c4g1r14d1b21l0=
> > jsrun -n 6 -a 4 -c 4 -g 1 -r 6 --latency_priority GPU-GPU 
> > --launch_distribution packed --bind packed:1 js_task_info ./ex900 -f 
> > HV15R.aij -mat_type aijcusparse -vec_type cuda -n 100 -log_view
> > 
> > --Junchao Zhang
> > 
> > 
> > On Fri, Sep 20, 2019 at 11:34 PM Mills, Richard Tran via petsc-dev 
> >  wrote:
> > Junchao,
> > 
> > Can you share your 'jsrun' command so that we can see how you are mapping 
> > things to resource sets?
> > 
> > --Richard
> > 
> > On 9/20/19 11:22 PM, Zhang, Junchao via petsc-dev wrote:
> >> I downloaded a sparse matrix (HV15R) from Florida Sparse Matrix 
> >> Collection. Its size is about 2M x 2M. Then I ran the same MatMult 100 
> >> times on one node of Summit with -mat_type aijcusparse -vec_type cuda. I 
> >> found MatMult was almost dominated by VecScatter in this simple test. 
> >> Using 6 MPI ranks + 6 GPUs,  I found CUDA aware SF could improve 
> >> performance. But if I enabled Multi-Process Service on Summit and used 24 
> >> ranks + 6 GPUs, I found CUDA aware SF hurt performance. I don't know why 
> >> and have to profile it. I will also collect  data with multiple nodes. Are 
> >> the matrix and tests proper?
> >> 
> >> 
> >> EventCount  Time (sec) Flop
> >>   --- Global ---  --- Stage   Total   GPU- CpuToGpu -   - 
> >> GpuToCpu - GPU
> >>Max Ratio  Max Ratio   Max  Ratio  Mess   AvgLen  
> >> Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   
> >> Count   Size  %F
> >> ---
> >> 6 MPI ranks (CPU version)
> >> MatMult  100 1.0 1.1895e+01 1.0 9.63e+09 1.1 2.8e+03 2.2e+05 
> >> 0.0e+00 24 99 97 18  0 100100100100  0  4743   0  0 0.00e+000 
> >> 0.00e+00  0
> >> VecScatterBegin  100 1.0 4.9145e-02 3.0 0.00e+00 0.0 2.8e+03 2.2e+05 
> >> 0.0e+00  0  0 97 18  0   0  0100100  0 0   0  0 0.00e+000 
> >> 0.00e+00  0
> >> VecScatterEnd100 1.0 2.9441e+00133  0.00e+00 0.0 0.0e+00 0.0e+00 
> >> 0.0e+00  3  0  0  0  0  13  0  0  0  0 0   0  0 0.00e+000 
> >> 0.00e+00  0
> >> 
> >> 6 MPI ranks + 6 GPUs + regular SF
> >> MatMult  100 1.0 1.7800e-01 1.0 9.66e+09 1.1 2.8e+03 2.2e+05 
> >> 0.0e+00  0 99 97 18  0 100100100100  0 318057   3084009 100 1.02e+02  100 
> >> 2.69e+02 100
> >> VecScatterBegin  100 1.0 1.2786e-01 1.3 0.00e+00 0.0 2.8e+03 2.2e+05 

Re: [petsc-dev] MatMult on Summit

2019-09-20 Thread Zhang, Junchao via petsc-dev
Here are CPU version results on one node with 24 cores, 42 cores. Click the 
links for core layout.

24 MPI ranks, https://jsrunvisualizer.olcf.ornl.gov/?s4f1o01n6c4g1r14d1b21l0=
MatMult          100 1.0 3.1431e+00 1.0  2.63e+09 1.2 1.9e+04 5.9e+04 0.0e+00  8 99 97 25  0 100 100 100 100  0 17948     0    0 0.00e+00    0 0.00e+00  0
VecScatterBegin  100 1.0 2.0583e-02 2.3  0.00e+00 0.0 1.9e+04 5.9e+04 0.0e+00  0  0 97 25  0   0   0 100 100  0     0     0    0 0.00e+00    0 0.00e+00  0
VecScatterEnd    100 1.0 1.0639e+00 50.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0  19   0   0   0  0     0     0    0 0.00e+00    0 0.00e+00  0

42 MPI ranks, https://jsrunvisualizer.olcf.ornl.gov/?s4f1o01n6c7g1r17d1b21l0=
MatMult          100 1.0 2.0519e+00 1.0  1.52e+09 1.3 3.5e+04 4.1e+04 0.0e+00 23 99 97 30  0 100 100 100 100  0 27493     0    0 0.00e+00    0 0.00e+00  0
VecScatterBegin  100 1.0 2.0971e-02 3.4  0.00e+00 0.0 3.5e+04 4.1e+04 0.0e+00  0  0 97 30  0   1   0 100 100  0     0     0    0 0.00e+00    0 0.00e+00  0
VecScatterEnd    100 1.0 8.5184e-01 62.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  6  0  0  0  0  24   0   0   0  0     0     0    0 0.00e+00    0 0.00e+00  0

--Junchao Zhang


On Fri, Sep 20, 2019 at 11:48 PM Smith, Barry F. 
mailto:bsm...@mcs.anl.gov>> wrote:

  Junchao,

   Very interesting. For completeness, please also run the 24- and 42-core cases 
without the GPUs. Note that the default layout for CPU cores is not good; you will 
want 3 cores on each socket, then 12 on each.
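
A possible jsrun line for such a CPU-only layout (an untested sketch, not from this 
thread; the flags are the ones used in the commands quoted below, while the 
resource-set sizes and the 12/12 split are assumptions):

  # Hypothetical 24-rank CPU-only run: two resource sets per node, 12 ranks each,
  # no GPUs, so the ranks split 12/12 across the two sockets; verify the actual
  # binding with js_task_info before trusting it.
  jsrun -n 2 -a 12 -c 12 -g 0 -r 2 --launch_distribution packed --bind packed:1 \
    js_task_info ./ex900 -f HV15R.aij -n 100 -log_view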

  Thanks

   Barry

  Since Tim is one of our reviewers next week this is a very good test matrix 
:-)


> On Sep 20, 2019, at 11:39 PM, Zhang, Junchao via petsc-dev 
> mailto:petsc-dev@mcs.anl.gov>> wrote:
>
> Click the links to visualize it.
>
> 6 ranks
> https://jsrunvisualizer.olcf.ornl.gov/?s4f1o01n6c1g1r11d1b21l0=
> jsrun -n 6 -a 1 -c 1 -g 1 -r 6 --latency_priority GPU-GPU 
> --launch_distribution packed --bind packed:1 js_task_info ./ex900 -f 
> HV15R.aij -mat_type aijcusparse -vec_type cuda -n 100 -log_view
>
> 24 ranks
> https://jsrunvisualizer.olcf.ornl.gov/?s4f1o01n6c4g1r14d1b21l0=
> jsrun -n 6 -a 4 -c 4 -g 1 -r 6 --latency_priority GPU-GPU 
> --launch_distribution packed --bind packed:1 js_task_info ./ex900 -f 
> HV15R.aij -mat_type aijcusparse -vec_type cuda -n 100 -log_view
>
> --Junchao Zhang
>
>
> On Fri, Sep 20, 2019 at 11:34 PM Mills, Richard Tran via petsc-dev 
> mailto:petsc-dev@mcs.anl.gov>> wrote:
> Junchao,
>
> Can you share your 'jsrun' command so that we can see how you are mapping 
> things to resource sets?
>
> --Richard
>
> On 9/20/19 11:22 PM, Zhang, Junchao via petsc-dev wrote:
>> I downloaded a sparse matrix (HV15R) from Florida Sparse Matrix Collection. 
>> Its size is about 2M x 2M. Then I ran the same MatMult 100 times on one node 
>> of Summit with -mat_type aijcusparse -vec_type cuda. I found MatMult was 
>> almost dominated by VecScatter in this simple test. Using 6 MPI ranks + 6 
>> GPUs,  I found CUDA aware SF could improve performance. But if I enabled 
>> Multi-Process Service on Summit and used 24 ranks + 6 GPUs, I found CUDA 
>> aware SF hurt performance. I don't know why and have to profile it. I will 
>> also collect  data with multiple nodes. Are the matrix and tests proper?
>>
>> 
>> EventCount  Time (sec) Flop  
>> --- Global ---  --- Stage   Total   GPU- CpuToGpu -   - GpuToCpu 
>> - GPU
>>Max Ratio  Max Ratio   Max  Ratio  Mess   AvgLen  
>> Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count  
>>  Size  %F
>> ---
>> 6 MPI ranks (CPU version)
>> MatMult  100 1.0 1.1895e+01 1.0 9.63e+09 1.1 2.8e+03 2.2e+05 
>> 0.0e+00 24 99 97 18  0 100100100100  0  4743   0  0 0.00e+000 
>> 0.00e+00  0
>> VecScatterBegin  100 1.0 4.9145e-02 3.0 0.00e+00 0.0 2.8e+03 2.2e+05 
>> 0.0e+00  0  0 97 18  0   0  0100100  0 0   0  0 0.00e+000 
>> 0.00e+00  0
>> VecScatterEnd100 1.0 2.9441e+00133  0.00e+00 0.0 0.0e+00 0.0e+00 
>> 0.0e+00  3  0  0  0  0  13  0  0  0  0 0   0  0 0.00e+000 
>> 0.00e+00  0
>>
>> 6 MPI ranks + 6 GPUs + regular SF
>> MatMult  100 1.0 1.7800e-01 1.0 9.66e+09 1.1 2.8e+03 2.2e+05 
>> 0.0e+00  0 99 97 18  0 100100100100  0 318057   3084009 100 1.02e+02  100 
>> 2.69e+02 100
>> VecScatterBegin  100 1.0 1.2786e-01 1.3 0.00e+00 0.0 2.8e+03 2.2e+05 
>> 0.0e+00  0  0 97 18  0  64  0100100  0 0   0  0 0.00e+00  100 
>> 2.69e+02  0
>> VecScatterEnd100 1.0 6.2196e-02 3.0 0.00e+00 0.0 0.0e+00 0.0e+00 
>> 0.0e+00  0  0  0  0  0  22  0  0  0  0 0   0  0 0.00e+000 

[petsc-dev] Tip while using valgrind

2019-09-20 Thread Smith, Barry F. via petsc-dev


  When using valgrind it is important to understand that it does not 
immediately make a report when it finds an uninitialized memory, it only makes 
a report when an uninitialized memory would cause a change in the program flow 
(like in an if statement). This is why sometimes it seems to report an 
uninitialized variable that doesn't make sense. It could be that the value at 
the location came from an earlier uninitialized location and that is why 
valgrind is reporting it, not because the reported location was uninitialized.  
Using the valgrind option --track-origins=yes is very useful since it will 
always point back to the area of memory that had the uninitialized value.
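
For instance, a typical invocation looks like the following (the executable and its 
options are placeholders; the valgrind flag itself is the point):

  # Report, for every use of an uninitialized value that changes program flow,
  # where that value was originally allocated.
  valgrind --track-origins=yes ./ex1 -log_view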

 I'm sending this out because twice recently I've struggled with cases where 
the initialized location "traveled" a long way before valgrind reported it and 
my confusion as to how valgrind worked kept making me leap to the wrong 
conclusions.

   Barry




Re: [petsc-dev] Configure hangs on Summit

2019-09-20 Thread Smith, Barry F. via petsc-dev


  Then the hang is curious. 

> On Sep 20, 2019, at 11:28 PM, Mills, Richard Tran  wrote:
> 
> Everything that Barry says about '--with-batch' is valid, but let me point 
> out one thing about Summit: You don't need "--with-batch" at all, because the 
> Summit login/compile nodes run the same hardware (minus the GPUs) and 
> software stack as the back-end compute nodes. This makes configuring and 
> building software far, far easier than we are used to on the big LCF 
> machines. I was actually shocked when I found this out -- I'd gotten so used 
> to struggling with cross-compilers, etc.
> 
> --Richard
> 
> On 9/20/19 9:28 PM, Smith, Barry F. wrote:
>> --with-batch is still there and should be used in such circumstances. 
>> The difference is that --with-batch does not generate a program that you 
>> need to submit to the batch system before continuing the configure. Instead 
>> --with-batch guesses at and skips some of the tests (with clear warnings on 
>> how you can adjust the guesses).
>> 
>>  Regarding the hanging. This happens because the thread monitoring of 
>> configure started executables was removed years ago since it was slow and 
>> occasionally buggy (the default wait was an absurd 10 minutes too). Thus 
>> when configure tried to test an mpiexec that hung the test would hang.   
>> There is code in one of my branches I've been struggling to get into master 
>> for a long time that puts back the thread monitoring for this one call with 
>> a small timeout so you should never see this hang again.
>> 
>>   Barry
>> 
>>We could be a little clever and have configure detect it is on a Cray or 
>> other batch system and automatically add the batch option. That would be a 
>> nice little feature for someone to add. Probably just a few lines of code. 
>>
>> 
>> 
>>> On Sep 20, 2019, at 8:59 PM, Mills, Richard Tran via petsc-dev 
>>> 
>>>  wrote:
>>> 
>>> Hi Junchao,
>>> 
>>> Glad you've found a workaround, but I don't know why you are hitting this 
>>> problem. The last time I built PETSc on Summit (just a couple days ago), I 
>>> didn't have this problem. I'm working from the example template that's in 
>>> the PETSc repo at config/examples/arch-olcf-summit-opt.py.
>>> 
>>> Can you point me to your configure script on Summit so I can try to 
>>> reproduce your problem?
>>> 
>>> --Richard
>>> 
>>> On 9/20/19 4:25 PM, Zhang, Junchao via petsc-dev wrote:
>>> 
 Satish's trick --with-mpiexec=/bin/true solved the problem.  Thanks.
 --Junchao Zhang
 
 
 On Fri, Sep 20, 2019 at 3:50 PM Junchao Zhang 
 
  wrote:
 My configure hangs on Summit at
   TESTING: configureMPIEXEC from 
 config.packages.MPI(config/BuildSystem/config/packages/MPI.py:170)
 
 On the machine one has to use script to submit jobs. So why do we need 
 configureMPIEXEC? Do I need to use --with-batch? I remember we removed 
 that.
 
 --Junchao Zhang
 
> 



Re: [petsc-dev] MatMult on Summit

2019-09-20 Thread Smith, Barry F. via petsc-dev


  Junchao,

   Very interesting. For completeness, please also run the 24- and 42-core cases 
without the GPUs. Note that the default layout for CPU cores is not good; you will 
want 3 cores on each socket, then 12 on each.

  Thanks

   Barry

  Since Tim is one of our reviewers next week this is a very good test matrix 
:-)


> On Sep 20, 2019, at 11:39 PM, Zhang, Junchao via petsc-dev 
>  wrote:
> 
> Click the links to visualize it.
> 
> 6 ranks
> https://jsrunvisualizer.olcf.ornl.gov/?s4f1o01n6c1g1r11d1b21l0=
> jsrun -n 6 -a 1 -c 1 -g 1 -r 6 --latency_priority GPU-GPU 
> --launch_distribution packed --bind packed:1 js_task_info ./ex900 -f 
> HV15R.aij -mat_type aijcusparse -vec_type cuda -n 100 -log_view
> 
> 24 ranks
> https://jsrunvisualizer.olcf.ornl.gov/?s4f1o01n6c4g1r14d1b21l0=
> jsrun -n 6 -a 4 -c 4 -g 1 -r 6 --latency_priority GPU-GPU 
> --launch_distribution packed --bind packed:1 js_task_info ./ex900 -f 
> HV15R.aij -mat_type aijcusparse -vec_type cuda -n 100 -log_view
> 
> --Junchao Zhang
> 
> 
> On Fri, Sep 20, 2019 at 11:34 PM Mills, Richard Tran via petsc-dev 
>  wrote:
> Junchao,
> 
> Can you share your 'jsrun' command so that we can see how you are mapping 
> things to resource sets?
> 
> --Richard
> 
> On 9/20/19 11:22 PM, Zhang, Junchao via petsc-dev wrote:
>> I downloaded a sparse matrix (HV15R) from Florida Sparse Matrix Collection. 
>> Its size is about 2M x 2M. Then I ran the same MatMult 100 times on one node 
>> of Summit with -mat_type aijcusparse -vec_type cuda. I found MatMult was 
>> almost dominated by VecScatter in this simple test. Using 6 MPI ranks + 6 
>> GPUs,  I found CUDA aware SF could improve performance. But if I enabled 
>> Multi-Process Service on Summit and used 24 ranks + 6 GPUs, I found CUDA 
>> aware SF hurt performance. I don't know why and have to profile it. I will 
>> also collect  data with multiple nodes. Are the matrix and tests proper?
>> 
>> 
>> EventCount  Time (sec) Flop  
>> --- Global ---  --- Stage   Total   GPU- CpuToGpu -   - GpuToCpu 
>> - GPU
>>Max Ratio  Max Ratio   Max  Ratio  Mess   AvgLen  
>> Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count  
>>  Size  %F
>> ---
>> 6 MPI ranks (CPU version)
>> MatMult  100 1.0 1.1895e+01 1.0 9.63e+09 1.1 2.8e+03 2.2e+05 
>> 0.0e+00 24 99 97 18  0 100100100100  0  4743   0  0 0.00e+000 
>> 0.00e+00  0
>> VecScatterBegin  100 1.0 4.9145e-02 3.0 0.00e+00 0.0 2.8e+03 2.2e+05 
>> 0.0e+00  0  0 97 18  0   0  0100100  0 0   0  0 0.00e+000 
>> 0.00e+00  0
>> VecScatterEnd100 1.0 2.9441e+00133  0.00e+00 0.0 0.0e+00 0.0e+00 
>> 0.0e+00  3  0  0  0  0  13  0  0  0  0 0   0  0 0.00e+000 
>> 0.00e+00  0
>> 
>> 6 MPI ranks + 6 GPUs + regular SF
>> MatMult  100 1.0 1.7800e-01 1.0 9.66e+09 1.1 2.8e+03 2.2e+05 
>> 0.0e+00  0 99 97 18  0 100100100100  0 318057   3084009 100 1.02e+02  100 
>> 2.69e+02 100
>> VecScatterBegin  100 1.0 1.2786e-01 1.3 0.00e+00 0.0 2.8e+03 2.2e+05 
>> 0.0e+00  0  0 97 18  0  64  0100100  0 0   0  0 0.00e+00  100 
>> 2.69e+02  0
>> VecScatterEnd100 1.0 6.2196e-02 3.0 0.00e+00 0.0 0.0e+00 0.0e+00 
>> 0.0e+00  0  0  0  0  0  22  0  0  0  0 0   0  0 0.00e+000 
>> 0.00e+00  0
>> VecCUDACopyTo100 1.0 1.0850e-02 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 
>> 0.0e+00  0  0  0  0  0   5  0  0  0  0 0   0100 1.02e+020 
>> 0.00e+00  0
>> VecCopyFromSome  100 1.0 1.0263e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 
>> 0.0e+00  0  0  0  0  0  54  0  0  0  0 0   0  0 0.00e+00  100 
>> 2.69e+02  0
>> 
>> 6 MPI ranks + 6 GPUs + CUDA-aware SF
>> MatMult  100 1.0 1.1112e-01 1.0 9.66e+09 1.1 2.8e+03 2.2e+05 
>> 0.0e+00  1 99 97 18  0 100100100100  0 509496   3133521   0 0.00e+000 
>> 0.00e+00 100
>> VecScatterBegin  100 1.0 7.9461e-02 1.1 0.00e+00 0.0 2.8e+03 2.2e+05 
>> 0.0e+00  1  0 97 18  0  70  0100100  0 0   0  0 0.00e+000 
>> 0.00e+00  0
>> VecScatterEnd100 1.0 2.2805e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 
>> 0.0e+00  0  0  0  0  0  17  0  0  0  0 0   0  0 0.00e+000 
>> 0.00e+00  0
>> 
>> 24 MPI ranks + 6 GPUs + regular SF
>> MatMult  100 1.0 1.1094e-01 1.0 2.63e+09 1.2 1.9e+04 5.9e+04 
>> 0.0e+00  1 99 97 25  0 100100100100  0 510337   951558  100 4.61e+01  100 
>> 6.72e+01 100
>> VecScatterBegin  100 1.0 4.8966e-02 1.8 0.00e+00 0.0 1.9e+04 5.9e+04 
>> 0.0e+00  0  0 97 25  0  34  0100100  0 0   0  0 0.00e+00  100 
>> 6.72e+01  0
>> VecScatterEnd100 1.0 7.2969e-02 4.9 0.00e+00 0.0 0.0e+00 0.0e+00 
>> 

Re: [petsc-dev] MatMult on Summit

2019-09-20 Thread Zhang, Junchao via petsc-dev
Click the links to visualize it.

6 ranks
https://jsrunvisualizer.olcf.ornl.gov/?s4f1o01n6c1g1r11d1b21l0=
jsrun -n 6 -a 1 -c 1 -g 1 -r 6 --latency_priority GPU-GPU --launch_distribution 
packed --bind packed:1 js_task_info ./ex900 -f HV15R.aij -mat_type aijcusparse 
-vec_type cuda -n 100 -log_view

24 ranks
https://jsrunvisualizer.olcf.ornl.gov/?s4f1o01n6c4g1r14d1b21l0=
jsrun -n 6 -a 4 -c 4 -g 1 -r 6 --latency_priority GPU-GPU --launch_distribution 
packed --bind packed:1 js_task_info ./ex900 -f HV15R.aij -mat_type aijcusparse 
-vec_type cuda -n 100 -log_view

--Junchao Zhang


On Fri, Sep 20, 2019 at 11:34 PM Mills, Richard Tran via petsc-dev 
mailto:petsc-dev@mcs.anl.gov>> wrote:
Junchao,

Can you share your 'jsrun' command so that we can see how you are mapping 
things to resource sets?

--Richard

On 9/20/19 11:22 PM, Zhang, Junchao via petsc-dev wrote:
I downloaded a sparse matrix (HV15R) 
from Florida Sparse Matrix Collection. Its size is about 2M x 2M. Then I ran 
the same MatMult 100 times on one node of Summit with -mat_type aijcusparse 
-vec_type cuda. I found MatMult was almost dominated by VecScatter in this 
simple test. Using 6 MPI ranks + 6 GPUs,  I found CUDA aware SF could improve 
performance. But if I enabled Multi-Process Service on Summit and used 24 ranks 
+ 6 GPUs, I found CUDA aware SF hurt performance. I don't know why and have to 
profile it. I will also collect  data with multiple nodes. Are the matrix and 
tests proper?


EventCount  Time (sec) Flop 
 --- Global ---  --- Stage   Total   GPU- CpuToGpu -   - GpuToCpu - GPU
   Max Ratio  Max Ratio   Max  Ratio  Mess   AvgLen  Reduct 
 %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count   Size  %F
---
6 MPI ranks (CPU version)
MatMult  100 1.0 1.1895e+01 1.0 9.63e+09 1.1 2.8e+03 2.2e+05 
0.0e+00 24 99 97 18  0 100100100100  0  4743   0  0 0.00e+000 
0.00e+00  0
VecScatterBegin  100 1.0 4.9145e-02 3.0 0.00e+00 0.0 2.8e+03 2.2e+05 
0.0e+00  0  0 97 18  0   0  0100100  0 0   0  0 0.00e+000 
0.00e+00  0
VecScatterEnd100 1.0 2.9441e+00133  0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  3  0  0  0  0  13  0  0  0  0 0   0  0 0.00e+000 
0.00e+00  0

6 MPI ranks + 6 GPUs + regular SF
MatMult  100 1.0 1.7800e-01 1.0 9.66e+09 1.1 2.8e+03 2.2e+05 
0.0e+00  0 99 97 18  0 100100100100  0 318057   3084009 100 1.02e+02  100 
2.69e+02 100
VecScatterBegin  100 1.0 1.2786e-01 1.3 0.00e+00 0.0 2.8e+03 2.2e+05 
0.0e+00  0  0 97 18  0  64  0100100  0 0   0  0 0.00e+00  100 
2.69e+02  0
VecScatterEnd100 1.0 6.2196e-02 3.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0  22  0  0  0  0 0   0  0 0.00e+000 
0.00e+00  0
VecCUDACopyTo100 1.0 1.0850e-02 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   5  0  0  0  0 0   0100 1.02e+020 
0.00e+00  0
VecCopyFromSome  100 1.0 1.0263e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0  54  0  0  0  0 0   0  0 0.00e+00  100 
2.69e+02  0

6 MPI ranks + 6 GPUs + CUDA-aware SF
MatMult  100 1.0 1.1112e-01 1.0 9.66e+09 1.1 2.8e+03 2.2e+05 
0.0e+00  1 99 97 18  0 100100100100  0 509496   3133521   0 0.00e+000 
0.00e+00 100
VecScatterBegin  100 1.0 7.9461e-02 1.1 0.00e+00 0.0 2.8e+03 2.2e+05 
0.0e+00  1  0 97 18  0  70  0100100  0 0   0  0 0.00e+000 
0.00e+00  0
VecScatterEnd100 1.0 2.2805e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0  17  0  0  0  0 0   0  0 0.00e+000 
0.00e+00  0

24 MPI ranks + 6 GPUs + regular SF
MatMult  100 1.0 1.1094e-01 1.0 2.63e+09 1.2 1.9e+04 5.9e+04 
0.0e+00  1 99 97 25  0 100100100100  0 510337   951558  100 4.61e+01  100 
6.72e+01 100
VecScatterBegin  100 1.0 4.8966e-02 1.8 0.00e+00 0.0 1.9e+04 5.9e+04 
0.0e+00  0  0 97 25  0  34  0100100  0 0   0  0 0.00e+00  100 
6.72e+01  0
VecScatterEnd100 1.0 7.2969e-02 4.9 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  1  0  0  0  0  42  0  0  0  0 0   0  0 0.00e+000 
0.00e+00  0
VecCUDACopyTo100 1.0 4.4487e-03 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   3  0  0  0  0 0   0100 4.61e+010 
0.00e+00  0
VecCopyFromSome  100 1.0 4.3315e-02 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0  29  0  0  0  0 0   0  0 0.00e+00  100 
6.72e+01  0

24 MPI ranks + 6 GPUs + CUDA-aware SF
MatMult  100 1.0 1.4597e-01 1.2 2.63e+09 1.2 1.9e+04 5.9e+04 
0.0e+00  1 99 97 25  0 100100100100  0 387864   973391   

Re: [petsc-dev] MatMult on Summit

2019-09-20 Thread Mills, Richard Tran via petsc-dev
Junchao,

Can you share your 'jsrun' command so that we can see how you are mapping 
things to resource sets?

--Richard

On 9/20/19 11:22 PM, Zhang, Junchao via petsc-dev wrote:
I downloaded a sparse matrix (HV15R) 
from Florida Sparse Matrix Collection. Its size is about 2M x 2M. Then I ran 
the same MatMult 100 times on one node of Summit with -mat_type aijcusparse 
-vec_type cuda. I found MatMult was almost dominated by VecScatter in this 
simple test. Using 6 MPI ranks + 6 GPUs,  I found CUDA aware SF could improve 
performance. But if I enabled Multi-Process Service on Summit and used 24 ranks 
+ 6 GPUs, I found CUDA aware SF hurt performance. I don't know why and have to 
profile it. I will also collect  data with multiple nodes. Are the matrix and 
tests proper?


EventCount  Time (sec) Flop 
 --- Global ---  --- Stage   Total   GPU- CpuToGpu -   - GpuToCpu - GPU
   Max Ratio  Max Ratio   Max  Ratio  Mess   AvgLen  Reduct 
 %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count   Size  %F
---
6 MPI ranks (CPU version)
MatMult  100 1.0 1.1895e+01 1.0 9.63e+09 1.1 2.8e+03 2.2e+05 
0.0e+00 24 99 97 18  0 100100100100  0  4743   0  0 0.00e+000 
0.00e+00  0
VecScatterBegin  100 1.0 4.9145e-02 3.0 0.00e+00 0.0 2.8e+03 2.2e+05 
0.0e+00  0  0 97 18  0   0  0100100  0 0   0  0 0.00e+000 
0.00e+00  0
VecScatterEnd100 1.0 2.9441e+00133  0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  3  0  0  0  0  13  0  0  0  0 0   0  0 0.00e+000 
0.00e+00  0

6 MPI ranks + 6 GPUs + regular SF
MatMult  100 1.0 1.7800e-01 1.0 9.66e+09 1.1 2.8e+03 2.2e+05 
0.0e+00  0 99 97 18  0 100100100100  0 318057   3084009 100 1.02e+02  100 
2.69e+02 100
VecScatterBegin  100 1.0 1.2786e-01 1.3 0.00e+00 0.0 2.8e+03 2.2e+05 
0.0e+00  0  0 97 18  0  64  0100100  0 0   0  0 0.00e+00  100 
2.69e+02  0
VecScatterEnd100 1.0 6.2196e-02 3.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0  22  0  0  0  0 0   0  0 0.00e+000 
0.00e+00  0
VecCUDACopyTo100 1.0 1.0850e-02 2.3 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   5  0  0  0  0 0   0100 1.02e+020 
0.00e+00  0
VecCopyFromSome  100 1.0 1.0263e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0  54  0  0  0  0 0   0  0 0.00e+00  100 
2.69e+02  0

6 MPI ranks + 6 GPUs + CUDA-aware SF
MatMult  100 1.0 1.1112e-01 1.0 9.66e+09 1.1 2.8e+03 2.2e+05 
0.0e+00  1 99 97 18  0 100100100100  0 509496   3133521   0 0.00e+000 
0.00e+00 100
VecScatterBegin  100 1.0 7.9461e-02 1.1 0.00e+00 0.0 2.8e+03 2.2e+05 
0.0e+00  1  0 97 18  0  70  0100100  0 0   0  0 0.00e+000 
0.00e+00  0
VecScatterEnd100 1.0 2.2805e-02 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0  17  0  0  0  0 0   0  0 0.00e+000 
0.00e+00  0

24 MPI ranks + 6 GPUs + regular SF
MatMult  100 1.0 1.1094e-01 1.0 2.63e+09 1.2 1.9e+04 5.9e+04 
0.0e+00  1 99 97 25  0 100100100100  0 510337   951558  100 4.61e+01  100 
6.72e+01 100
VecScatterBegin  100 1.0 4.8966e-02 1.8 0.00e+00 0.0 1.9e+04 5.9e+04 
0.0e+00  0  0 97 25  0  34  0100100  0 0   0  0 0.00e+00  100 
6.72e+01  0
VecScatterEnd100 1.0 7.2969e-02 4.9 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  1  0  0  0  0  42  0  0  0  0 0   0  0 0.00e+000 
0.00e+00  0
VecCUDACopyTo100 1.0 4.4487e-03 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   3  0  0  0  0 0   0100 4.61e+010 
0.00e+00  0
VecCopyFromSome  100 1.0 4.3315e-02 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0  29  0  0  0  0 0   0  0 0.00e+00  100 
6.72e+01  0

24 MPI ranks + 6 GPUs + CUDA-aware SF
MatMult  100 1.0 1.4597e-01 1.2 2.63e+09 1.2 1.9e+04 5.9e+04 
0.0e+00  1 99 97 25  0 100100100100  0 387864   9733910 0.00e+000 
0.00e+00 100
VecScatterBegin  100 1.0 6.4899e-02 2.9 0.00e+00 0.0 1.9e+04 5.9e+04 
0.0e+00  1  0 97 25  0  35  0100100  0 0   0  0 0.00e+000 
0.00e+00  0
VecScatterEnd100 1.0 1.1179e-01 4.1 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  1  0  0  0  0  48  0  0  0  0 0   0  0 0.00e+000 
0.00e+00  0


--Junchao Zhang



Re: [petsc-dev] Configure hangs on Summit

2019-09-20 Thread Mills, Richard Tran via petsc-dev
Everything that Barry says about '--with-batch' is valid, but let me point out 
one thing about Summit: You don't need "--with-batch" at all, because the 
Summit login/compile nodes run the same hardware (minus the GPUs) and software 
stack as the back-end compute nodes. This makes configuring and building 
software far, far easier than we are used to on the big LCF machines. I was 
actually shocked when I found this out -- I'd gotten so used to struggling with 
cross-compilers, etc.

--Richard

On 9/20/19 9:28 PM, Smith, Barry F. wrote:


--with-batch is still there and should be used in such circumstances. The 
difference is that --with-batch does not generate a program that you need to 
submit to the batch system before continuing the configure. Instead 
--with-batch guesses at and skips some of the tests (with clear warnings on how 
you can adjust the guesses).

 Regarding the hanging. This happens because the thread monitoring of 
configure started executables was removed years ago since it was slow and 
occasionally buggy (the default wait was an absurd 10 minutes too). Thus when 
configure tried to test an mpiexec that hung the test would hang.   There is 
code in one of my branches I've been struggling to get into master for a long 
time that puts back the thread monitoring for this one call with a small 
timeout so you should never see this hang again.

  Barry

   We could be a little clever and have configure detect it is on a Cray or 
other batch system and automatically add the batch option. That would be a nice 
little feature for someone to add. Probably just a few lines of code.




On Sep 20, 2019, at 8:59 PM, Mills, Richard Tran via petsc-dev 
 wrote:

Hi Junchao,

Glad you've found a workaround, but I don't know why you are hitting this 
problem. The last time I built PETSc on Summit (just a couple days ago), I 
didn't have this problem. I'm working from the example template that's in the 
PETSc repo at config/examples/arch-olcf-summit-opt.py.

Can you point me to your configure script on Summit so I can try to reproduce 
your problem?

--Richard

On 9/20/19 4:25 PM, Zhang, Junchao via petsc-dev wrote:


Satish's trick --with-mpiexec=/bin/true solved the problem.  Thanks.
--Junchao Zhang


On Fri, Sep 20, 2019 at 3:50 PM Junchao Zhang 
 wrote:
My configure hangs on Summit at
  TESTING: configureMPIEXEC from 
config.packages.MPI(config/BuildSystem/config/packages/MPI.py:170)

On the machine one has to use script to submit jobs. So why do we need 
configureMPIEXEC? Do I need to use --with-batch? I remember we removed that.

--Junchao Zhang










[petsc-dev] MatMult on Summit

2019-09-20 Thread Zhang, Junchao via petsc-dev
I downloaded a sparse matrix (HV15R) from the Florida Sparse Matrix Collection. 
Its size is about 2M x 2M. Then I ran the same MatMult 100 times on one node of 
Summit with -mat_type aijcusparse -vec_type cuda. I found MatMult was almost 
dominated by VecScatter in this simple test. Using 6 MPI ranks + 6 GPUs, I found 
CUDA-aware SF could improve performance. But if I enabled the Multi-Process 
Service on Summit and used 24 ranks + 6 GPUs, I found CUDA-aware SF hurt 
performance. I don't know why and have to profile it. I will also collect data 
with multiple nodes. Are the matrix and tests proper?
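
(For reference, on Summit the Multi-Process Service is typically requested per job; 
this is my recollection of the LSF directive, not something taken from this thread:

  #BSUB -alloc_flags gpumps   # let multiple MPI ranks share each GPU through MPS
)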


Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ---   Total    GPU    - CpuToGpu -   - GpuToCpu - GPU
                       Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count   Size  %F
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
6 MPI ranks (CPU version)
MatMult              100 1.0 1.1895e+01 1.0  9.63e+09 1.1 2.8e+03 2.2e+05 0.0e+00 24 99 97 18  0 100 100 100 100  0   4743       0    0 0.00e+00    0 0.00e+00   0
VecScatterBegin      100 1.0 4.9145e-02 3.0  0.00e+00 0.0 2.8e+03 2.2e+05 0.0e+00  0  0 97 18  0   0   0 100 100  0      0       0    0 0.00e+00    0 0.00e+00   0
VecScatterEnd        100 1.0 2.9441e+00 133  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0  13   0   0   0  0      0       0    0 0.00e+00    0 0.00e+00   0

6 MPI ranks + 6 GPUs + regular SF
MatMult              100 1.0 1.7800e-01 1.0  9.66e+09 1.1 2.8e+03 2.2e+05 0.0e+00  0 99 97 18  0 100 100 100 100  0 318057 3084009  100 1.02e+02  100 2.69e+02 100
VecScatterBegin      100 1.0 1.2786e-01 1.3  0.00e+00 0.0 2.8e+03 2.2e+05 0.0e+00  0  0 97 18  0  64   0 100 100  0      0       0    0 0.00e+00  100 2.69e+02   0
VecScatterEnd        100 1.0 6.2196e-02 3.0  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  22   0   0   0  0      0       0    0 0.00e+00    0 0.00e+00   0
VecCUDACopyTo        100 1.0 1.0850e-02 2.3  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   5   0   0   0  0      0       0  100 1.02e+02    0 0.00e+00   0
VecCopyFromSome      100 1.0 1.0263e-01 1.2  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  54   0   0   0  0      0       0    0 0.00e+00  100 2.69e+02   0

6 MPI ranks + 6 GPUs + CUDA-aware SF
MatMult              100 1.0 1.1112e-01 1.0  9.66e+09 1.1 2.8e+03 2.2e+05 0.0e+00  1 99 97 18  0 100 100 100 100  0 509496 3133521    0 0.00e+00    0 0.00e+00 100
VecScatterBegin      100 1.0 7.9461e-02 1.1  0.00e+00 0.0 2.8e+03 2.2e+05 0.0e+00  1  0 97 18  0  70   0 100 100  0      0       0    0 0.00e+00    0 0.00e+00   0
VecScatterEnd        100 1.0 2.2805e-02 1.5  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  17   0   0   0  0      0       0    0 0.00e+00    0 0.00e+00   0

24 MPI ranks + 6 GPUs + regular SF
MatMult              100 1.0 1.1094e-01 1.0  2.63e+09 1.2 1.9e+04 5.9e+04 0.0e+00  1 99 97 25  0 100 100 100 100  0 510337  951558  100 4.61e+01  100 6.72e+01 100
VecScatterBegin      100 1.0 4.8966e-02 1.8  0.00e+00 0.0 1.9e+04 5.9e+04 0.0e+00  0  0 97 25  0  34   0 100 100  0      0       0    0 0.00e+00  100 6.72e+01   0
VecScatterEnd        100 1.0 7.2969e-02 4.9  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0  42   0   0   0  0      0       0    0 0.00e+00    0 0.00e+00   0
VecCUDACopyTo        100 1.0 4.4487e-03 1.8  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   3   0   0   0  0      0       0  100 4.61e+01    0 0.00e+00   0
VecCopyFromSome      100 1.0 4.3315e-02 1.9  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0  29   0   0   0  0      0       0    0 0.00e+00  100 6.72e+01   0

24 MPI ranks + 6 GPUs + CUDA-aware SF
MatMult              100 1.0 1.4597e-01 1.2  2.63e+09 1.2 1.9e+04 5.9e+04 0.0e+00  1 99 97 25  0 100 100 100 100  0 387864  973391    0 0.00e+00    0 0.00e+00 100
VecScatterBegin      100 1.0 6.4899e-02 2.9  0.00e+00 0.0 1.9e+04 5.9e+04 0.0e+00  1  0 97 25  0  35   0 100 100  0      0       0    0 0.00e+00    0 0.00e+00   0
VecScatterEnd        100 1.0 1.1179e-01 4.1  0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0  48   0   0   0  0      0       0    0 0.00e+00    0 0.00e+00   0


--Junchao Zhang


Re: [petsc-dev] Configure hangs on Summit

2019-09-20 Thread Zhang, Junchao via petsc-dev
Richard,
  I almost copied arch-olcf-summit-opt.py. The hanging is random: I hit it a few 
weeks ago, retried, and it passed. It happened again today when I did a fresh 
configure.
  On the Summit login nodes, mpiexec is actually in everyone's PATH. I did "ps ux" 
and found the configure script was executing "mpiexec ... "
--Junchao Zhang


On Fri, Sep 20, 2019 at 8:59 PM Mills, Richard Tran via petsc-dev 
mailto:petsc-dev@mcs.anl.gov>> wrote:
Hi Junchao,

Glad you've found a workaround, but I don't know why you are hitting this 
problem. The last time I built PETSc on Summit (just a couple days ago), I 
didn't have this problem. I'm working from the example template that's in the 
PETSc repo at config/examples/arch-olcf-summit-opt.py.

Can you point me to your configure script on Summit so I can try to reproduce 
your problem?

--Richard

On 9/20/19 4:25 PM, Zhang, Junchao via petsc-dev wrote:
Satish's trick --with-mpiexec=/bin/true solved the problem.  Thanks.
--Junchao Zhang


On Fri, Sep 20, 2019 at 3:50 PM Junchao Zhang 
mailto:jczh...@mcs.anl.gov>> wrote:
My configure hangs on Summit at
  TESTING: configureMPIEXEC from 
config.packages.MPI(config/BuildSystem/config/packages/MPI.py:170)

On the machine one has to use script to submit jobs. So why do we need 
configureMPIEXEC? Do I need to use --with-batch? I remember we removed that.

--Junchao Zhang



Re: [petsc-dev] test harness: output of actually executed command for V=1 gone?

2019-09-20 Thread Jed Brown via petsc-dev
"Smith, Barry F."  writes:

>> Satish and Barry:  Do we need the Error codes or can I revert to previous 
>> functionality?
>
>   I think it is important to display the error codes.
>
>   How about displaying at the bottom how to run the broken tests? You already 
> show how to run them with the test harness, you could also print how to run 
> them directly? Better than mixing it up with the TAP output?

How about a target for it?

make -f gmakefile show-test search=abcd

We already have print-test, which might more accurately be named ls-test.


Re: [petsc-dev] test harness: output of actually executed command for V=1 gone?

2019-09-20 Thread Smith, Barry F. via petsc-dev



> On Sep 20, 2019, at 4:18 PM, Scott Kruger via petsc-dev 
>  wrote:
> 
> 
> 
> 
> 
> On 9/20/19 2:49 PM, Jed Brown wrote:
>> Hapla  Vaclav via petsc-dev  writes:
>>> On 20 Sep 2019, at 19:59, Scott Kruger 
>>> mailto:kru...@txcorp.com>> wrote:
>>> 
>>> 
>>> On 9/20/19 10:44 AM, Hapla Vaclav via petsc-dev wrote:
>>> I was used to copy the command actually run by test harness, change to 
>>> example's directory and paste the command (just changing one .. to ., e.g. 
>>> ../ex1 to ./ex1).
>>> Is this output gone? Bad news. I think there should definitely be an option 
>>> to quickly reproduce the test run to work on failing tests.
>>> 
>>> I only modified the V=0 option to suppress the TAP 'ok' output.
>>> 
>>> I think you are referring to the 'not ok' now giving the error code instead 
>>> of the cmd which is now true regardless of V.  This was suggested by 
>>> others.  I defer to the larger group on what's desired here.
>>> 
>>> Note that is sometimes tedious to deduce the whole command line from the 
>>> test declarations, for example because of multiple args: lines.
>>> 
>>> Personally, I recommend just cd'ing into the test directory and running the 
>>> scripts by hand.
>>> 
>>> For example:
>>> cd $PETSC_ARCH/tests/ksp/ksp/examples/tests/runex22
>>> cat ksp_ksp_tests-ex22_1.sh
>>> mpiexec  -n 1 ../ex22   > ex22_1.tmp 2> runex22.err
>>> 
>>> OK, this takes a bit more time but does the job.
>> That's yucky.  I think we should have an option to print the command(s)
>> that would be run, one line per expanded {{a b c}}, so we can copy-paste
>> into the terminal with only one step of indirection.
> 
> Ugh.  I'm dealing with bash at this level - not python.
> 
> Satish and Barry:  Do we need the Error codes or can I revert to previous 
> functionality?

  I think it is important to display the error codes.

  How about displaying at the bottom how to run the broken tests? You already 
show how to run them with the test harness, you could also print how to run 
them directly? Better than mixing it up with the TAP output?

   Barry

> 
> Scott
> 
> 
> -- 
> Tech-X Corporation   kru...@txcorp.com
> 5621 Arapahoe Ave, Suite A   Phone: (720) 974-1841
> Boulder, CO 80303Fax:   (303) 448-7756



Re: [petsc-dev] Configure hangs on Summit

2019-09-20 Thread Smith, Barry F. via petsc-dev


--with-batch is still there and should be used in such circumstances. The 
difference is that --with-batch does not generate a program that you need to 
submit to the batch system before continuing the configure. Instead 
--with-batch guesses at and skips some of the tests (with clear warnings on how 
you can adjust the guesses).

 Regarding the hanging. This happens because the thread monitoring of 
configure started executables was removed years ago since it was slow and 
occasionally buggy (the default wait was an absurd 10 minutes too). Thus when 
configure tried to test an mpiexec that hung the test would hang.   There is 
code in one of my branches I've been struggling to get into master for a long 
time that puts back the thread monitoring for this one call with a small 
timeout so you should never see this hang again.

  Barry

   We could be a little clever and have configure detect it is on a Cray or 
other batch system and automatically add the batch option. That would be a nice 
little feature for someone to add. Probably just a few lines of code. 
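
A minimal sketch of such a configure line (the compiler wrappers are placeholders; 
--with-batch is the only option this paragraph is about):

  # Skip the tests that would need to run executables and accept (or later
  # adjust) the guessed values.
  ./configure --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --with-batch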
   

> On Sep 20, 2019, at 8:59 PM, Mills, Richard Tran via petsc-dev 
>  wrote:
> 
> Hi Junchao,
> 
> Glad you've found a workaround, but I don't know why you are hitting this 
> problem. The last time I built PETSc on Summit (just a couple days ago), I 
> didn't have this problem. I'm working from the example template that's in the 
> PETSc repo at config/examples/arch-olcf-summit-opt.py.
> 
> Can you point me to your configure script on Summit so I can try to reproduce 
> your problem?
> 
> --Richard
> 
> On 9/20/19 4:25 PM, Zhang, Junchao via petsc-dev wrote:
>> Satish's trick --with-mpiexec=/bin/true solved the problem.  Thanks.
>> --Junchao Zhang
>> 
>> 
>> On Fri, Sep 20, 2019 at 3:50 PM Junchao Zhang  wrote:
>> My configure hangs on Summit at
>>   TESTING: configureMPIEXEC from 
>> config.packages.MPI(config/BuildSystem/config/packages/MPI.py:170)
>> 
>> On the machine one has to use script to submit jobs. So why do we need 
>> configureMPIEXEC? Do I need to use --with-batch? I remember we removed that.
>> 
>> --Junchao Zhang
> 



Re: [petsc-dev] Configure hangs on Summit

2019-09-20 Thread Mills, Richard Tran via petsc-dev
Hi Junchao,

Glad you've found a workaround, but I don't know why you are hitting this 
problem. The last time I built PETSc on Summit (just a couple days ago), I 
didn't have this problem. I'm working from the example template that's in the 
PETSc repo at config/examples/arch-olcf-summit-opt.py.

Can you point me to your configure script on Summit so I can try to reproduce 
your problem?

--Richard

On 9/20/19 4:25 PM, Zhang, Junchao via petsc-dev wrote:
Satish's trick --with-mpiexec=/bin/true solved the problem.  Thanks.
--Junchao Zhang


On Fri, Sep 20, 2019 at 3:50 PM Junchao Zhang 
mailto:jczh...@mcs.anl.gov>> wrote:
My configure hangs on Summit at
  TESTING: configureMPIEXEC from 
config.packages.MPI(config/BuildSystem/config/packages/MPI.py:170)

On the machine one has to use script to submit jobs. So why do we need 
configureMPIEXEC? Do I need to use --with-batch? I remember we removed that.

--Junchao Zhang



Re: [petsc-dev] test harness: output of actually executed command for V=1 gone?

2019-09-20 Thread Hapla Vaclav via petsc-dev



> On 20 Sep 2019, at 23:18, Scott Kruger  wrote:
> 
> 
> 
> 
> 
> On 9/20/19 2:49 PM, Jed Brown wrote:
>> Hapla  Vaclav via petsc-dev  writes:
>>> On 20 Sep 2019, at 19:59, Scott Kruger 
>>> mailto:kru...@txcorp.com>> wrote:
>>> 
>>> 
>>> On 9/20/19 10:44 AM, Hapla Vaclav via petsc-dev wrote:
>>> I was used to copy the command actually run by test harness, change to 
>>> example's directory and paste the command (just changing one .. to ., e.g. 
>>> ../ex1 to ./ex1).
>>> Is this output gone? Bad news. I think there should definitely be an option 
>>> to quickly reproduce the test run to work on failing tests.
>>> 
>>> I only modified the V=0 option to suppress the TAP 'ok' output.
>>> 
>>> I think you are referring to the 'not ok' now giving the error code instead 
>>> of the cmd which is now true regardless of V.  This was suggested by 
>>> others.  I defer to the larger group on what's desired here.
>>> 
>>> Note that is sometimes tedious to deduce the whole command line from the 
>>> test declarations, for example because of multiple args: lines.
>>> 
>>> Personally, I recommend just cd'ing into the test directory and running the 
>>> scripts by hand.
>>> 
>>> For example:
>>> cd $PETSC_ARCH/tests/ksp/ksp/examples/tests/runex22
>>> cat ksp_ksp_tests-ex22_1.sh
>>> mpiexec  -n 1 ../ex22   > ex22_1.tmp 2> runex22.err
>>> 
>>> OK, this takes a bit more time but does the job.
>> That's yucky.  I think we should have an option to print the command(s)
>> that would be run, one line per expanded {{a b c}}, so we can copy-paste
>> into the terminal with only one step of indirection.
> 
> Ugh.  I'm dealing with bash at this level - not python.
> 
> Satish and Barry:  Do we need the Error codes or can I revert to previous 
> functionality?

What about triggering the previous behavior only conditionally, e.g. by 
something like SHOWCMD=1, at least for now?

Of course, in longer term something like Jed suggests would be very welcome.

Vaclav

> 
> Scott
> 
> 
> -- 
> Tech-X Corporation   kru...@txcorp.com
> 5621 Arapahoe Ave, Suite A   Phone: (720) 974-1841
> Boulder, CO 80303Fax:   (303) 448-7756



Re: [petsc-dev] Configure hangs on Summit

2019-09-20 Thread Zhang, Junchao via petsc-dev
Satish's trick --with-mpiexec=/bin/true solved the problem.  Thanks.
--Junchao Zhang


On Fri, Sep 20, 2019 at 3:50 PM Junchao Zhang 
mailto:jczh...@mcs.anl.gov>> wrote:
My configure hangs on Summit at
  TESTING: configureMPIEXEC from 
config.packages.MPI(config/BuildSystem/config/packages/MPI.py:170)

On the machine one has to use script to submit jobs. So why do we need 
configureMPIEXEC? Do I need to use --with-batch? I remember we removed that.

--Junchao Zhang
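
For reference, the workaround amounts to one extra option on an otherwise normal 
configure line (a sketch; the compiler wrappers are placeholders):

  # /bin/true exits immediately with success, so the mpiexec probe cannot hang.
  ./configure --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 \
    --with-mpiexec=/bin/true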


Re: [petsc-dev] test harness: output of actually executed command for V=1 gone?

2019-09-20 Thread Scott Kruger via petsc-dev






On 9/20/19 2:49 PM, Jed Brown wrote:

Hapla  Vaclav via petsc-dev  writes:


On 20 Sep 2019, at 19:59, Scott Kruger 
mailto:kru...@txcorp.com>> wrote:


On 9/20/19 10:44 AM, Hapla Vaclav via petsc-dev wrote:
I was used to copy the command actually run by test harness, change to 
example's directory and paste the command (just changing one .. to ., e.g. 
../ex1 to ./ex1).
Is this output gone? Bad news. I think there should definitely be an option to 
quickly reproduce the test run to work on failing tests.

I only modified the V=0 option to suppress the TAP 'ok' output.

I think you are referring to the 'not ok' now giving the error code instead of 
the cmd which is now true regardless of V.  This was suggested by others.  I 
defer to the larger group on what's desired here.

Note that is sometimes tedious to deduce the whole command line from the test 
declarations, for example because of multiple args: lines.

Personally, I recommend just cd'ing into the test directory and running the 
scripts by hand.

For example:
cd $PETSC_ARCH/tests/ksp/ksp/examples/tests/runex22
cat ksp_ksp_tests-ex22_1.sh
mpiexec  -n 1 ../ex22   > ex22_1.tmp 2> runex22.err

OK, this takes a bit more time but does the job.


That's yucky.  I think we should have an option to print the command(s)
that would be run, one line per expanded {{a b c}}, so we can copy-paste
into the terminal with only one step of indirection.


Ugh.  I'm dealing with bash at this level - not python.

Satish and Barry:  Do we need the Error codes or can I revert to 
previous functionality?


Scott


--
Tech-X Corporation   kru...@txcorp.com
5621 Arapahoe Ave, Suite A   Phone: (720) 974-1841
Boulder, CO 80303Fax:   (303) 448-7756


[petsc-dev] Configure hangs on Summit

2019-09-20 Thread Zhang, Junchao via petsc-dev
My configure hangs on Summit at
  TESTING: configureMPIEXEC from 
config.packages.MPI(config/BuildSystem/config/packages/MPI.py:170)

On this machine one has to use a script to submit jobs. So why do we need 
configureMPIEXEC? Do I need to use --with-batch? I remember we removed that.

--Junchao Zhang


Re: [petsc-dev] test harness: output of actually executed command for V=1 gone?

2019-09-20 Thread Jed Brown via petsc-dev
Hapla  Vaclav via petsc-dev  writes:

> On 20 Sep 2019, at 19:59, Scott Kruger 
> mailto:kru...@txcorp.com>> wrote:
>
>
> On 9/20/19 10:44 AM, Hapla Vaclav via petsc-dev wrote:
> I was used to copy the command actually run by test harness, change to 
> example's directory and paste the command (just changing one .. to ., e.g. 
> ../ex1 to ./ex1).
> Is this output gone? Bad news. I think there should definitely be an option 
> to quickly reproduce the test run to work on failing tests.
>
> I only modified the V=0 option to suppress the TAP 'ok' output.
>
> I think you are referring to the 'not ok' now giving the error code instead 
> of the cmd which is now true regardless of V.  This was suggested by others.  
> I defer to the larger group on what's desired here.
>
> Note that is sometimes tedious to deduce the whole command line from the test 
> declarations, for example because of multiple args: lines.
>
> Personally, I recommend just cd'ing into the test directory and running the 
> scripts by hand.
>
> For example:
> cd $PETSC_ARCH/tests/ksp/ksp/examples/tests/runex22
> cat ksp_ksp_tests-ex22_1.sh
> mpiexec  -n 1 ../ex22   > ex22_1.tmp 2> runex22.err
>
> OK, this takes a bit more time but does the job.

That's yucky.  I think we should have an option to print the command(s)
that would be run, one line per expanded {{a b c}}, so we can copy-paste
into the terminal with only one step of indirection.


Re: [petsc-dev] test harness: output of actually executed command for V=1 gone?

2019-09-20 Thread Hapla Vaclav via petsc-dev

On 20 Sep 2019, at 19:59, Scott Kruger 
mailto:kru...@txcorp.com>> wrote:


On 9/20/19 10:44 AM, Hapla Vaclav via petsc-dev wrote:
I was used to copy the command actually run by test harness, change to 
example's directory and paste the command (just changing one .. to ., e.g. 
../ex1 to ./ex1).
Is this output gone? Bad news. I think there should definitely be an option to 
quickly reproduce the test run to work on failing tests.

I only modified the V=0 option to suppress the TAP 'ok' output.

I think you are referring to the 'not ok' now giving the error code instead of 
the cmd which is now true regardless of V.  This was suggested by others.  I 
defer to the larger group on what's desired here.

Note that is sometimes tedious to deduce the whole command line from the test 
declarations, for example because of multiple args: lines.

Personally, I recommend just cd'ing into the test directory and running the 
scripts by hand.

For example:
cd $PETSC_ARCH/tests/ksp/ksp/examples/tests/runex22
cat ksp_ksp_tests-ex22_1.sh
mpiexec  -n 1 ../ex22   > ex22_1.tmp 2> runex22.err

OK, this takes a bit more time but does the job.


or

cd $PETSC_ARCH/tests/ksp/ksp/examples/tests
./runex22.sh
ok ksp_ksp_tests-ex22_1
ok diff-ksp_ksp_tests-ex22_1

A problem with this is that if the test includes many {{ }}, it runs all 
combinations and one can't focus on just one failing. E.g.
  make -f gmakefile.test test 
search='dm_impls_plex_tests-ex18_7_hdf5_repart_nsize-5interpolate-serial'
does not run anything and one needs to enter e.g.
  make -f gmakefile.test test search='dm_impls_plex_tests-ex18_7_hdf5_repart%'
which can run tons of tests and one has to wait and then CTRL+C.

BTW I think there should be an underscore after nsize:
dm_impls_plex_tests-ex18_7_hdf5_repart_nsize-5_interpolate-serial


Thanks,
Vaclav




Scott



--
Tech-X Corporation   kru...@txcorp.com
5621 Arapahoe Ave, Suite A   Phone: (720) 974-1841
Boulder, CO 80303Fax:   (303) 448-7756



Re: [petsc-dev] Should we add something about GPU support to the user manual?

2019-09-20 Thread Bisht, Gautam via petsc-dev
Hi Richard,

Information about PETSc’s support for GPUs would be super helpful. Btw, I 
noticed that at the PETSc User Meeting 2019 you gave a talk on “Progress with 
PETSc on Manycore and GPU-based Systems on the Path to Exascale”, but the 
slides for the talk were not up on the website. Is it possible for you to share 
those slides or post them online?

Thanks,
-Gautam

On Sep 12, 2019, at 10:18 AM, Mills, Richard Tran via petsc-dev 
mailto:petsc-dev@mcs.anl.gov>> wrote:

Fellow PETSc developers,

I've had a few people recently ask me something along the lines of "Where do I 
look in the user manual for information about how to use GPUs with PETSc?", and 
then I have to give them the slightly embarrassing answer that there is nothing 
in there. Since we officially added GPU support a few releases ago, it might be 
appropriate to put something in the manual (even though our GPU support is 
still a moving target). I think I can draft something based on the existing 
tutorial material that Karl and I have been presenting. Do others think this 
would be worthwhile, or is our GPU support still too immature to belong in the 
manual? And are there any thoughts on where this belongs in the manual?

--Richard



Re: [petsc-dev] test harness: output of actually executed command for V=1 gone?

2019-09-20 Thread Scott Kruger via petsc-dev






On 9/20/19 10:44 AM, Hapla Vaclav via petsc-dev wrote:

I was used to copy the command actually run by test harness, change to 
example's directory and paste the command (just changing one .. to ., e.g. 
../ex1 to ./ex1).

Is this output gone? Bad news. I think there should definitely be an option to 
quickly reproduce the test run to work on failing tests.


I only modified the V=0 option to suppress the TAP 'ok' output.

I think you are referring to the 'not ok' now giving the error code 
instead of the cmd which is now true regardless of V.  This was 
suggested by others.  I defer to the larger group on what's desired here.




Note that is sometimes tedious to deduce the whole command line from the test 
declarations, for example because of multiple args: lines.


Personally, I recommend just cd'ing into the test directory and running 
the scripts by hand.


For example:
cd $PETSC_ARCH/tests/ksp/ksp/examples/tests/runex22
cat ksp_ksp_tests-ex22_1.sh
mpiexec  -n 1 ../ex22   > ex22_1.tmp 2> runex22.err

or

cd $PETSC_ARCH/tests/ksp/ksp/examples/tests
./runex22.sh
 ok ksp_ksp_tests-ex22_1
 ok diff-ksp_ksp_tests-ex22_1



Scott



--
Tech-X Corporation   kru...@txcorp.com
5621 Arapahoe Ave, Suite A   Phone: (720) 974-1841
Boulder, CO 80303Fax:   (303) 448-7756


[petsc-dev] test harness: output of actually executed command for V=1 gone?

2019-09-20 Thread Hapla Vaclav via petsc-dev
I used to copy the command actually run by the test harness, change to the 
example's directory, and paste the command (just changing one .. to ., e.g. 
../ex1 to ./ex1).

Is this output gone? Bad news. I think there should definitely be an option to 
quickly reproduce the test run to work on failing tests.

Note that it is sometimes tedious to deduce the whole command line from the test 
declarations, for example because of multiple args: lines.

Vaclav

Re: [petsc-dev] How to check that MatMatMult is available

2019-09-20 Thread Pierre Jolivet via petsc-dev

> On 20 Sep 2019, at 7:36 AM, Jed Brown  > wrote:
> 
> Pierre Jolivet via petsc-dev  > writes:
> 
>> Hello,
>> Given a Mat A, I’d like to know if there is an implementation available for 
>> doing C=A*B
>> I was previously using MatHasOperation(A, MATOP_MATMAT_MULT, &hasMatMatMult) 
>> but the result is not correct in at least two cases:
> 
> Do you want MATOP_MAT_MULT and MATOP_TRANSPOSE_MAT_MULT?

Ah, OK, MATMAT => MatMatMat, so you are right.
I’ll make sure MatHasOperation_Transpose and MatHasOperation_Nest return the 
correct value then.

>> 1) A is a MATTRANSPOSE and the underlying Mat B=A^T has a 
>> MatTransposeMatMult implementation (there is currently no 
>> MATOP_MATTRANSPOSEMAT_MULT)
>> 2) A is a MATNEST. This could be fixed in MatHasOperation_Nest, by checking 
>> MATOP_MATMAT_MULT of all matrices in the MATNEST, but this would be 
>> incorrect again if there is a single MATTRANSPOSE in there
>> What is then the proper way to check that I can indeed call MatMatMult(A,…)?
>> Do I need to copy/paste all this 
>> https://www.mcs.anl.gov/petsc/petsc-current/src/mat/interface/matrix.c.html#line9801
>> in user code?
> 
> Unfortunately, I don't think there is any common interface for the
> string handling, though it would make sense to add one because code of
> this sort is copied many times:
> 
>   /* dispatch based on the type of A and B from their PetscObject's PetscFunctionLists. */
>   char multname[256];
>   ierr = PetscStrncpy(multname,"MatMatMult_",sizeof(multname));CHKERRQ(ierr);
>   ierr = PetscStrlcat(multname,((PetscObject)A)->type_name,sizeof(multname));CHKERRQ(ierr);
>   ierr = PetscStrlcat(multname,"_",sizeof(multname));CHKERRQ(ierr);
>   ierr = PetscStrlcat(multname,((PetscObject)B)->type_name,sizeof(multname));CHKERRQ(ierr);
>   ierr = PetscStrlcat(multname,"_C",sizeof(multname));CHKERRQ(ierr); /* e.g., multname = "MatMatMult_seqdense_seqaij_C" */
>   ierr = PetscObjectQueryFunction((PetscObject)B,multname,&mult);CHKERRQ(ierr);
>   if (!mult) SETERRQ2(PetscObjectComm((PetscObject)A),PETSC_ERR_ARG_INCOMP,"MatMatMult requires A, %s, to be compatible with B, %s",((PetscObject)A)->type_name,((PetscObject)B)->type_name);
> 
>> Thanks,
>> Pierre
>> 
>> PS: in my case, C and B are always of type MATDENSE. Should we handle
>> this in MatMatMult and never error out for such a simple case?
> 
> I would say yes.

Perfect, that’s all I need!

Thanks,
Pierre

>> Indeed, one can just loop on the columns of B and C by doing multiple
>> MatMult. This is what I’m currently doing in user code when
>> hasMatMatMult == PETSC_FALSE.
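
A rough sketch of the kind of guard discussed here, combining MatHasOperation with 
the column-by-column MatMult fallback described in the PS (an illustration under the 
stated assumptions, not code from this thread; the function name is hypothetical and 
the fallback fills C with MatSetValue for clarity, which is not the efficient way to 
do it):

  /* Hedged illustration: try MatMatMult if A advertises it, otherwise form
   * C = A*B one column at a time with MatMult.  Assumes B is MATDENSE. */
  #include <petscmat.h>

  static PetscErrorCode MatMatMultOrColumnFallback(Mat A, Mat B, Mat *C)
  {
    PetscBool      hasMatMult;
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    ierr = MatHasOperation(A, MATOP_MAT_MULT, &hasMatMult);CHKERRQ(ierr);
    if (hasMatMult) {
      ierr = MatMatMult(A, B, MAT_INITIAL_MATRIX, PETSC_DEFAULT, C);CHKERRQ(ierr);
    } else {                         /* fallback: one MatMult per column of B */
      PetscInt           m, M, n, N, i, j, rstart;
      Vec                bj, cj;
      const PetscScalar *ca;

      ierr = MatGetLocalSize(A, &m, NULL);CHKERRQ(ierr);
      ierr = MatGetSize(A, &M, NULL);CHKERRQ(ierr);
      ierr = MatGetLocalSize(B, NULL, &n);CHKERRQ(ierr);
      ierr = MatGetSize(B, NULL, &N);CHKERRQ(ierr);
      ierr = MatCreateDense(PetscObjectComm((PetscObject)A), m, n, M, N, NULL, C);CHKERRQ(ierr);
      ierr = MatCreateVecs(A, &bj, &cj);CHKERRQ(ierr);   /* bj: columns of A, cj: rows of A */
      ierr = MatGetOwnershipRange(*C, &rstart, NULL);CHKERRQ(ierr);
      for (j = 0; j < N; j++) {
        ierr = MatGetColumnVector(B, bj, j);CHKERRQ(ierr);  /* bj = B(:,j)     */
        ierr = MatMult(A, bj, cj);CHKERRQ(ierr);            /* cj = A * B(:,j) */
        ierr = VecGetArrayRead(cj, &ca);CHKERRQ(ierr);
        for (i = 0; i < m; i++) {                           /* C(:,j) = cj     */
          ierr = MatSetValue(*C, rstart+i, j, ca[i], INSERT_VALUES);CHKERRQ(ierr);
        }
        ierr = VecRestoreArrayRead(cj, &ca);CHKERRQ(ierr);
      }
      ierr = MatAssemblyBegin(*C, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatAssemblyEnd(*C, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = VecDestroy(&bj);CHKERRQ(ierr);
      ierr = VecDestroy(&cj);CHKERRQ(ierr);
    }
    PetscFunctionReturn(0);
  }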