Xiangdong <[email protected]> writes:

> For these functions, the flop ratios are all 1.1, while the time ratios
> are 1.5-2.2, so the amount of work is roughly balanced across processes.
> Runs on both Stampede and my group's cluster show similar behavior. Given
> that I only use 256 cores, do you think it is likely that my job was
> assigned cores with different speeds? How can I test/measure this, since
> the job is assigned to different nodes each time?
>
> Are there any other factors I should look into for this behavior (flop
> ratio 1.1 but time ratio 1.5-2.1 for non-communicating functions)?
Memory bandwidth can be an issue: some nodes may simply have slower memory installed. Or, as happened to Dave and me at ETH, a stale, lopsided ramdisk partition left behind by a previous job can cause all of your memory to be faulted onto a single memory channel, crippling bandwidth. You can investigate such issues with numastat and third-party profilers. I would start by seeing whether you can reproduce the imbalance with simpler PETSc examples, then try to distinguish the performance of a flops-limited local operation from a bandwidth-limited one. It might be simple to figure out, but it might also take a lot of work.
