Karl,

    Thanks for your comments.


> On Oct 10, 2019, at 12:34 AM, Karl Rupp via petsc-dev <petsc-dev@mcs.anl.gov> 
> wrote:
> 
> Hi,
> 
> Table 2 reports negative latencies. This doesn't look right to me ;-)
> If it's the outcome of a parameter fit to the performance model, then use a 
> parameter name (e.g. alpha) instead of the term 'latency'.

  Per Jed's suggestion, we will include some plots and additional information to 
make it clearer what happens on the CPU.
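
  To illustrate Karl's point that a fitted parameter can go negative even when 
every measured time is positive, here is a minimal sketch. It assumes a 
straight-line model (time = alpha + slope * size) fitted by ordinary least 
squares; the report's actual model may differ. The data are synthetic and made 
up for this example, not taken from the report, and the names fit_line, sizes, 
and times are hypothetical.

```python
# Synthetic, made-up timings purely for illustration (NOT from the report).
# sizes: vector length in arbitrary units; times: measured time in arbitrary units.
sizes = [1.0, 2.0, 3.0, 4.0]
times = [1.0, 1.0, 2.0, 4.0]  # all positive, and flat at the small sizes


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = alpha + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    alpha = mean_y - slope * mean_x  # intercept: the fitted "latency" parameter
    return alpha, slope


alpha, slope = fit_line(sizes, times)
print(f"alpha = {alpha:.3f}, slope = {slope:.3f}")
```

  Because the synthetic timings are flat at small sizes and then grow, the 
fitted line is tilted so that its intercept falls below zero. In that situation 
alpha is a fit parameter rather than a physical latency.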

    
> 
> Figure 11 has a very narrow range in the y-coordinate and thus exaggerates 
> the variation greatly. "GPU performance" should be adjusted to something like 
> "execution time" to explain the meaning of the y-axis.

    Thanks. Fixed by also adding the next size, 10^7.

> 
> Page 12: The latency for VecDot is higher than for VecAXPY because VecDot 
> requires the result to be copied back to the host. This is an additional 
> operation.

   Good point. We will include this. 
> 
> Regarding performance measurements: Did you synchronize after each kernel 
> launch? I.e. did you run (approach A)

For all our runs, as stated near the beginning of the text, many times == 1. 
This seems to work fine and the results are reproducible, so I don't see a need 
to run multiple times.

  Barry



> for (many times) {
>   synchronize();
>   start_timer();
>   kernel_launch();
>   synchronize();
>   stop_timer();
> }
> and then take averages over the timings obtained, or did you (approach B)
> synchronize();
> start_timer();
> for (many times) {
>   kernel_launch();
> }
> synchronize();
> stop_timer();
> and then divide the obtained time by the number of runs?
> 
> Approach A will report a much higher latency than the latter, because 
> synchronizations are expensive (i.e. your latency consists of kernel launch 
> latency plus device synchronization latency). Approach B is slightly 
> over-optimistic, but I've found it to better match what one observes for an 
> algorithm involving several kernel launches.
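
The difference between the two approaches can be sketched with a toy cost 
model. This is only an illustration under stated assumptions: host-side launch 
overhead and device-side kernel execution overlap like a two-stage pipeline, 
each synchronize has a fixed cost, and the numbers are hypothetical rather than 
measured on Summit. The function names are made up for this sketch.

```python
def per_launch_time_a(launch, kernel, sync):
    """Approach A: each timed iteration is launch, kernel execution, synchronize.

    Nothing overlaps, so every iteration pays all three costs.
    """
    return launch + kernel + sync


def per_launch_time_b(n, launch, kernel, sync):
    """Approach B: queue n launches, synchronize once, divide the total by n.

    Assumed toy model: launches and kernel executions overlap as a two-stage
    pipeline, so the slower stage sets the steady-state rate.
    """
    total = n * max(launch, kernel) + min(launch, kernel) + sync
    return total / n


# Hypothetical costs in microseconds, chosen only for illustration.
LAUNCH, KERNEL, SYNC = 5.0, 3.0, 20.0

time_a = per_launch_time_a(LAUNCH, KERNEL, SYNC)
time_b = per_launch_time_b(1000, LAUNCH, KERNEL, SYNC)
time_b_single = per_launch_time_b(1, LAUNCH, KERNEL, SYNC)

print(f"approach A: {time_a:.3f} us per launch")
print(f"approach B (n=1000): {time_b:.3f} us per launch")
print(f"approach B (n=1): {time_b_single:.3f} us per launch")
```

In this model the synchronization cost is paid on every iteration in approach A 
but amortized over all n launches in approach B. With a single repetition 
(n = 1) the two approaches give the same number.
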
> 
> Best regards,
> Karli
> 
> 
> 
> On 10/10/19 12:34 AM, Smith, Barry F. via petsc-dev wrote:
>>    We've prepared a short report on the performance of vector operations on 
>> Summit and would appreciate any feedback, including inconsistencies, lack 
>> of clarity, incorrect notation or terminology, etc.
>>    Thanks
>>     Barry, Hannah, and Richard
