Thanks Philippe.

For reference, these are the two machine types in typical use in the
Packet system:

c1.large.arm - 96-core Cavium (Marvell) ThunderX
c2.large.arm - 32-core Ampere eMag

The Ampere data sheet from their OEM (Lenovo) is below.

https://amperecomputing.com/wp-content/uploads/2019/01/Lenovo_ThinkSystem_HR330A_20190118.pdf

Memory configuration of both systems is 128G, so there should be no
need to adjust for that.

The Travis systems are indeed using the c2.large.arm builds within an
LXD container. There may be additional limits applied to each container
there, but you certainly want to use more than 3 cores. The Travis
machines are specially configured with NVMe rather than SSD storage,
which helps I/O quite a bit.
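
If you want to double-check what a given container actually exposes,
something like the sketch below should work (the cgroup paths assume a
cgroup v1 host and will differ under cgroup v2; treat it as a rough
illustration rather than the exact Travis setup):

    # How many CPUs the usual tools will report inside the container:
    getconf _NPROCESSORS_ONLN
    nproc
    # Whether a CPU quota is applied to the container (cgroup v1 paths):
    cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us    # -1 means no quota
    cat /sys/fs/cgroup/cpu/cpu.cfs_period_us   # quota / period = effective CPUs

If the quota works out to fewer CPUs than getconf reports, a -j value
derived from getconf will oversubscribe the container.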

Builds across a variety of multi-core systems can be hard to tune for
optimal performance. Depending on the system - and they vary a lot -
adding more cores may or may not buy you more throughput, since at some
point you start to max out memory bandwidth. As a concrete example, the
Marvell ThunderX2 configuration will report 4 hardware threads per core
to the system, suggesting you can do `make -j 224`, but if you time
actual throughput you will often be better off with `make -j 56`,
apparently because scheduling leads to contention between threads.
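
If you have machine time to spare, a crude sweep along these lines
tends to find the sweet spot faster than reasoning about it; the job
counts here are just illustrative values for a ThunderX2-class box:

    # Hypothetical sweep over candidate -j values; adjust the list to
    # the machine at hand. Uses GNU time for a one-line elapsed figure
    # (the shell builtin 'time' works too, just without -f).
    for j in 28 56 112 224; do
        make clean >/dev/null
        echo "=== make -j$j ==="
        /usr/bin/time -f "%e seconds elapsed" make -j"$j" >/dev/null
    done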

I concur with your decision to run the test suite single-threaded.
Unless the test environment has been designed from the start to use
many cores, my observation is that many real-world test suites have
unavoidable dependencies on test run order, and generally make
optimistic assumptions about machine state between tests.

Happy to be a resource for any other performance issues, and if you run
into anything fun I'm also happy to relay it to my friends at Ampere,
who have been tracking performance and regressions across a wide set of
operating systems and open source packages. I'm also always looking for
people with the interest, tools, and good intuition to make the best
use of machines with many cores.

thanks

Ed

On Fri, Jan 31, 2020 at 10:52 AM Philippe Mathieu-Daudé <phi...@redhat.com>
wrote:

> (Cc'ing Ed Vielmetti)
>
> On 1/30/20 12:32 PM, Alex Bennée wrote:
> > The arm64 hardware was especially hit by only building on 3 of the 32
> > available cores. Introduce a JOBS environment variable which we use
> > for all parallel builds. We still run the main checks single threaded
> > though so to make it easier to spot hangs.
> >
> > Signed-off-by: Alex Bennée <alex.ben...@linaro.org>
> > ---
> >   .travis.yml | 20 +++++++++++---------
> >   1 file changed, 11 insertions(+), 9 deletions(-)
> >
> > diff --git a/.travis.yml b/.travis.yml
> > index 1b92f40eab..a600f508b0 100644
> > --- a/.travis.yml
> > +++ b/.travis.yml
> > @@ -85,6 +85,8 @@ git:
> >   # Common first phase for all steps
> >   before_install:
> >     - if command -v ccache ; then ccache --zero-stats ; fi
> > +  - export JOBS=$(($(getconf _NPROCESSORS_ONLN) + 1))
>
> Yeah finally!
>
> Note, on the Cavium ThunderX CN88XX provided by Packet, Ed Vielmetti
> once suggested to use the --load-average make option due to Amdahl's
> law, and I noticed a minor speedup using -j96 -l47.5 (-l48 already
> starts to decrease).
>
> On https://docs.travis-ci.com/user/reference/overview/#linux I read
> "LXD compliant OS images for arm64 are run in Packet."
>
> Per
>
> https://travis-ci.community/t/what-machine-s-does-travis-use-for-arm64/5579/2
> the CPU seems to be a Ampere eMAG Skylark:
> https://en.wikichip.org/wiki/apm/microarchitectures/skylark
> Probably the eMAG 8180:
> https://en.wikichip.org/wiki/ampere_computing/emag/8180
>
> I don't know what would be the best limit for this CPU.
>
> Back to this patch, it indeed reduced the build time by 2+, so:
> Reviewed-by: Philippe Mathieu-Daudé <phi...@redhat.com>
> Tested-by: Philippe Mathieu-Daudé <phi...@redhat.com>
>
> > +  - echo "=== Using ${JOBS} simultaneous jobs ==="
> >
> >   # Configure step - may be overridden
> >   before_script:
> > @@ -93,7 +95,7 @@ before_script:
> >
> >   # Main build & test - rarely overridden - controlled by TEST_CMD
> >   script:
> > -  - BUILD_RC=0 && make -j3 || BUILD_RC=$?
> > +  - BUILD_RC=0 && make -j${JOBS} || BUILD_RC=$?
> >     - if [ "$BUILD_RC" -eq 0 ] ; then travis_retry ${TEST_CMD} ; else
> $(exit $BUILD_RC); fi
> >   after_script:
> >     - if command -v ccache ; then ccache --show-stats ; fi
> > @@ -125,7 +127,7 @@ matrix:
> >         env:
> >           - BASE_CONFIG="--enable-tools"
> >           - CONFIG="--disable-user --disable-system"
> > -        - TEST_CMD="make check-unit check-softfloat -j3"
> > +        - TEST_CMD="make check-unit check-softfloat -j${JOBS}"
> >           - CACHE_NAME="${TRAVIS_BRANCH}-linux-gcc-default"
> >
> >
> > @@ -160,13 +162,13 @@ matrix:
> >       - name: "check-unit coroutine=ucontext"
> >         env:
> >           - CONFIG="--with-coroutine=ucontext --disable-tcg"
> > -        - TEST_CMD="make check-unit -j3 V=1"
> > +        - TEST_CMD="make check-unit -j${JOBS} V=1"
> >
> >
> >       - name: "check-unit coroutine=sigaltstack"
> >         env:
> >           - CONFIG="--with-coroutine=sigaltstack --disable-tcg"
> > -        - TEST_CMD="make check-unit -j3 V=1"
> > +        - TEST_CMD="make check-unit -j${JOBS} V=1"
> >
> >
> >       # Check we can build docs and tools (out of tree)
> > @@ -366,7 +368,7 @@ matrix:
> >       - name: "GCC check-tcg (user)"
> >         env:
> >           - CONFIG="--disable-system --enable-debug-tcg"
> > -        - TEST_CMD="make -j3 check-tcg V=1"
> > +        - TEST_CMD="make -j${JOBS} check-tcg V=1"
> >           - CACHE_NAME="${TRAVIS_BRANCH}-linux-gcc-debug-tcg"
> >
> >
> > @@ -375,7 +377,7 @@ matrix:
> >       - name: "GCC plugins check-tcg (user)"
> >         env:
> >           - CONFIG="--disable-system --enable-plugins --enable-debug-tcg
> --target-list-exclude=sparc64-linux-user"
> > -        - TEST_CMD="make -j3 check-tcg V=1"
> > +        - TEST_CMD="make -j${JOBS} check-tcg V=1"
> >           - CACHE_NAME="${TRAVIS_BRANCH}-linux-gcc-debug-tcg"
> >
> >
> > @@ -383,7 +385,7 @@ matrix:
> >       - name: "GCC check-tcg (some-softmmu)"
> >         env:
> >           - CONFIG="--enable-debug-tcg
> --target-list=xtensa-softmmu,arm-softmmu,aarch64-softmmu,alpha-softmmu"
> > -        - TEST_CMD="make -j3 check-tcg V=1"
> > +        - TEST_CMD="make -j${JOBS} check-tcg V=1"
> >           - CACHE_NAME="${TRAVIS_BRANCH}-linux-gcc-debug-tcg"
> >
> >
> > @@ -391,7 +393,7 @@ matrix:
> >       - name: "GCC plugins check-tcg (some-softmmu)"
> >         env:
> >           - CONFIG="--enable-plugins --enable-debug-tcg
> --target-list=xtensa-softmmu,arm-softmmu,aarch64-softmmu,alpha-softmmu"
> > -        - TEST_CMD="make -j3 check-tcg V=1"
> > +        - TEST_CMD="make -j${JOBS} check-tcg V=1"
> >           - CACHE_NAME="${TRAVIS_BRANCH}-linux-gcc-debug-tcg"
> >
> >       - name: "[aarch64] GCC check-tcg"
> > @@ -500,7 +502,7 @@ matrix:
> >           - BUILD_DIR="release/build/dir" SRC_DIR="../../.."
> >           - BASE_CONFIG="--prefix=$PWD/dist"
> >           -
> CONFIG="--target-list=x86_64-softmmu,aarch64-softmmu,armeb-linux-user,ppc-linux-user"
> > -        - TEST_CMD="make install -j3"
> > +        - TEST_CMD="make install -j${JOBS}"
> >           - QEMU_VERSION="${TRAVIS_TAG:1}"
> >           - CACHE_NAME="${TRAVIS_BRANCH}-linux-gcc-default"
> >         script:
> >
>
>
