Re: [petsc-dev] Should we add something about GPU support to the user manual?

2019-10-29 Thread Mills, Richard Tran via petsc-dev
Hi Gautam,

Apologies for overlooking this. The slides are now available online:

  https://www.mcs.anl.gov/petsc/meetings/2019/slides/mills-petsc-2019.pdf

There isn't a whole lot in terms of GPU results in there, but GPU support is 
currently a very active area of development for the PETSc team and we should 
have a lot more to report soon. (We'll have several presentations at SIAM-PP in 
Seattle on this topic, for instance.)

Best regards,
Richard

On 9/20/19 12:37 PM, Bisht, Gautam wrote:
Hi Richard,

Information about PETSc’s support for GPUs would be super helpful. Btw, I 
noticed that at the PETSc User 2019 meeting you gave a talk on "Progress with 
PETSc on Manycore and GPU-based Systems on the Path to Exascale", but the 
slides for the talk were not up on the website. Is it possible for you to 
share those slides or post them online?

Thanks,
-Gautam

On Sep 12, 2019, at 10:18 AM, Mills, Richard Tran via petsc-dev 
<petsc-dev@mcs.anl.gov> wrote:

Fellow PETSc developers,

I've had a few people recently ask me something along the lines of "Where do I 
look in the user manual for information about how to use GPUs with PETSc?", and 
then I have to give them the slightly embarrassing answer that there is nothing 
in there. Since we officially added GPU support a few releases ago, it might be 
appropriate to put something in the manual (even though our GPU support is 
still a moving target). I think I can draft something based on the existing 
tutorial material that Karl and I have been presenting. Do others think this 
would be worthwhile, or is our GPU support still too immature to belong in the 
manual? And are there any thoughts on where this belongs in the manual?

--Richard




Re: [petsc-dev] Feed back on report on performance of vector operations on Summit requested

2019-10-29 Thread Smith, Barry F. via petsc-dev


  Karl,

Thanks for your comments.


> On Oct 10, 2019, at 12:34 AM, Karl Rupp via petsc-dev wrote:
> 
> Hi,
> 
> Table 2 reports negative latencies. This doesn't look right to me ;-)
> If it's the outcome of a parameter fit to the performance model, then use a 
> parameter name (e.g. alpha) instead of the term 'latency'.

  Per Jed's suggestion we will include some plots and additional information to 
make clearer what happens on the CPU.


> 
> Figure 11 has a very narrow range in the y-coordinate and thus exaggerates 
> the variation greatly. "GPU performance" should be adjusted to something like 
> "execution time" to explain the meaning of the y-axis.

Thanks. Fixed by also adding the next size, 10^7.

> 
> Page 12: The latency for VecDot is higher than for VecAXPY because VecDot 
> requires the result to be copied back to the host. This is an additional 
> operation.

   Good point. We will include this. 
> 
> Regarding performance measurements: Did you synchronize after each kernel 
> launch? I.e. did you run (approach A)

For all our runs, as stated near the beginning of the text, "many times" == 1. 
This seems to work fine and the results are reproducible, so I don't see a 
need to run multiple times.

  Barry



> for (many times) {
>   synchronize();
>   start_timer();
>   kernel_launch();
>   synchronize();
>   stop_timer();
> }
> and then take averages over the timings obtained, or did you (approach B)
> synchronize();
> start_timer();
> for (many times) {
>   kernel_launch();
> }
> synchronize();
> stop_timer();
> and then divide the obtained time by the number of runs?
> 
> Approach A will report a much higher latency than approach B, because 
> synchronizations are expensive (i.e. your latency consists of kernel launch 
> latency plus device synchronization latency). Approach B is slightly 
> over-optimistic, but I've found it to better match what one observes for an 
> algorithm involving several kernel launches.
> 
> Best regards,
> Karli
> 
> 
> 
> On 10/10/19 12:34 AM, Smith, Barry F. via petsc-dev wrote:
>> We've prepared a short report on the performance of vector operations on 
>> Summit and would appreciate any feedback including: inconsistencies, lack 
>> of clarity, incorrect notation or terminology, etc.
>> Thanks
>> Barry, Hannah, and Richard
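
The difference between Karl's approach A and approach B above can be illustrated with a toy cost model. This is only a sketch: the three cost constants are made-up assumptions, not Summit measurements, and the function names are hypothetical. It models asynchronous launches as a simple host/device pipeline and charges a flat cost per synchronization.

```python
# Toy cost model, not a measurement: every constant here is an assumption.
LAUNCH_US = 5.0   # assumed host-side cost to enqueue one kernel (microseconds)
KERNEL_US = 2.0   # assumed device execution time of one small kernel
SYNC_US = 20.0    # assumed flat cost of one device synchronization


def approach_a(repeats):
    """Synchronize around every launch; return average time per launch."""
    total = 0.0
    for _ in range(repeats):
        # The timer covers the launch, the kernel, and the trailing synchronize.
        total += LAUNCH_US + KERNEL_US + SYNC_US
    return total / repeats


def approach_b(repeats):
    """Time a whole batch with one trailing synchronize; return time per launch."""
    host_t = 0.0  # time at which the host has finished enqueueing
    dev_t = 0.0   # time at which the device has finished executing
    for _ in range(repeats):
        host_t += LAUNCH_US
        # A kernel starts once it is enqueued and the device is idle.
        dev_t = max(dev_t, host_t) + KERNEL_US
    total = max(host_t, dev_t) + SYNC_US
    return total / repeats


if __name__ == "__main__":
    print("approach A per launch (us):", approach_a(1000))
    print("approach B per launch (us):", approach_b(1000))
```

In this model approach A charges the synchronization cost to every launch, while approach B amortizes a single synchronization over the batch and lets kernel execution overlap with later launches, which is why A reports the larger per-launch latency.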



Re: [petsc-dev] Feed back on report on performance of vector operations on Summit requested

2019-10-29 Thread Smith, Barry F. via petsc-dev


  Jed,

 Thanks for your feedback.


> On Oct 23, 2019, at 7:15 PM, Jed Brown wrote:
> 
> IMO, Figures 2 and 7+ are more interesting when the x axis (vector size)
> is replaced by execution time.  


> We don't scale by fixing the resource
> and increasing the problem size, we choose the global problem size based
> on accuracy/model complexity and choose a Pareto tradeoff of execution
> time with efficiency (1/cost) to decide how many nodes to use.  Most of
> those sloping tails on the left become vertical lines under that
> transformation.

   I don't see the connection between your first sentence and the other 
sentences.

   How does the plot with time instead of size tell you what number of 
processors to use?

   I don't understand the plots with x as a time axis, so I suspect most 
potential readers won't either. The only point of the plots is really to give 
an idea of the scale of the performance, and to show that performance is low 
except for large sizes, so we will keep the plot axes as is.
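
One possible reading of Jed's remark that the sloping tails become vertical lines can be sketched with a simple latency-bandwidth model t = alpha + n / beta. The model and both constants below are assumptions chosen for illustration, not values from the report.

```python
# Illustrative constants only; neither value is taken from the report.
ALPHA_US = 10.0   # assumed startup latency in microseconds
BETA = 1.0e5      # assumed streaming rate in vector entries per microsecond


def time_and_rate(n):
    """Return (execution time in us, achieved rate in entries per us)."""
    t = ALPHA_US + n / BETA
    return t, n / t


if __name__ == "__main__":
    for k in range(1, 9):
        n = 10 ** k
        t, rate = time_and_rate(n)
        print(f"n = 1e{k}   time = {t:12.4f} us   rate = {rate:12.2f}")
```

For the small sizes the execution time is almost exactly the latency while the rate varies by orders of magnitude, so plotting rate against time stacks those points at nearly the same x value instead of along a slope.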


> 
> How is latency defined in Figure 6?

  Least squares. We will reference the definition given later.
> 
> Data upon which the latency-bandwidth model is derived should be plotted
> to show the fit, and the model needs to be constrained to avoid negative
> latency.

   For the CPU we are playing with the plots to see if they can convey some 
useful information. The performance over each range of values on the CPU is 
simply not linear, hence a linear fit gives incomplete information. The 
negative latencies perhaps don't convey particularly useful information, so we 
may not list them. The bandwidth value, I think, is still useful because it 
does convey the local performance of the memory.
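
To make the negative-latency issue concrete, here is a minimal sketch of a least-squares fit of t = alpha + n / beta, with an optional constraint alpha >= 0 as Jed suggests. The function name and the sample data are hypothetical; the data are made up to be mildly nonlinear and are not from the report.

```python
def fit_latency_bandwidth(sizes, times, nonneg_latency=True):
    """Least-squares fit of t = alpha + n / beta; returns (alpha, beta)."""
    m = len(sizes)
    mean_n = sum(sizes) / m
    mean_t = sum(times) / m
    sxx = sum((n - mean_n) ** 2 for n in sizes)
    sxy = sum((n - mean_n) * (t - mean_t) for n, t in zip(sizes, times))
    slope = sxy / sxx
    alpha = mean_t - slope * mean_n
    if nonneg_latency and alpha < 0.0:
        # The constraint is active, so the optimum of this convex problem
        # lies at alpha = 0, which is a least-squares fit through the origin.
        alpha = 0.0
        slope = (sum(n * t for n, t in zip(sizes, times))
                 / sum(n * n for n in sizes))
    return alpha, 1.0 / slope


if __name__ == "__main__":
    # Made-up timings (microseconds) that grow faster than linearly with size.
    sizes = [1.0e3, 1.0e4, 1.0e5, 1.0e6]
    times = [1.0, 10.0, 150.0, 2000.0]
    print("unconstrained:", fit_latency_bandwidth(sizes, times, False))
    print("constrained:  ", fit_latency_bandwidth(sizes, times, True))
```

With data like these the unconstrained intercept comes out negative, which is a property of fitting a line to nonlinear data rather than a physical latency. The constrained variant pins the latency at zero and still returns a bandwidth estimate.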

> 
> If you give me access to the repository with data and current plotting
> scripts, I can take a crack at slicing it in the way that I think would
> be useful.

  Hannah will give you the data.

  Barry

> 
> "Smith, Barry F. via petsc-dev" writes:
> 
>>   We've prepared a short report on the performance of vector operations on 
>> Summit and would appreciate any feedback including: inconsistencies, lack 
>> of clarity, incorrect notation or terminology, etc.
>> 
>>   Thanks
>> 
>>   Barry, Hannah, and Richard