Fergus Henderson wrote:
        What's possibly more interesting are the performance results.  I'm
        compiling about 1300 source files on nodes that have 16 CPUs each.

        Single node (plain GNU make, no distcc):

        -j2   8m 19s
        -j4   5m 46s
        -j5   6m 39s
        -j8  10m 35s

        I don't understand this, but it is repeatable.  Any ideas on that one?


    It looks like your machine probably has 4 CPUs, with each job using
    nearly 100% CPU.

My node actually has four quad-core processors.
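
For what it's worth, a quick way to confirm the count on Linux:

grep -c '^processor' /proc/cpuinfo

That reports 16 here, consistent with the Cpu0-Cpu15 lines in the top output below.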

make -j4 cpu utilization, according to top:

Cpu0  :  0.7%us,  4.0%sy,  0.0%ni, 95.3%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu1  : 13.6%us, 29.7%sy,  0.0%ni, 56.8%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu2  :  0.7%us,  1.8%sy,  0.0%ni, 97.4%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu3  :  0.7%us,  0.7%sy,  0.0%ni, 98.5%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu4  : 26.8%us, 22.4%sy,  0.0%ni, 49.6%id,  0.0%wa,  0.0%hi,  1.1%si
Cpu5  :  7.0%us, 15.8%sy,  0.0%ni, 77.3%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu6  :  2.9%us,  2.9%sy,  0.0%ni, 94.2%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu7  :  2.6%us,  0.7%sy,  0.0%ni, 96.7%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu8  :  0.7%us,  3.7%sy,  0.0%ni, 95.6%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu9  :  9.5%us, 15.8%sy,  0.0%ni, 74.7%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu10 :  0.7%us,  2.9%sy,  0.0%ni, 96.4%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu11 :  0.4%us,  0.4%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu12 :  1.5%us,  5.8%sy,  0.0%ni, 92.7%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu13 :  5.1%us,  5.1%sy,  0.0%ni, 89.8%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu14 :  6.6%us,  5.1%sy,  0.0%ni, 88.3%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu15 : 27.5%us, 30.0%sy,  0.0%ni, 41.8%id,  0.0%wa,  0.0%hi,  0.7%si

It bounces all over the place, but this is not an atypical snapshot. The machine is mostly idle, according to the idle (%id) column. Is this somehow a major resource contention issue? Disk access, maybe? Note that I have tried building on a local /tmp partition; while overall performance improves a bit, the scaling with increasing -j does not. -j5 is still slower than -j4.
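
One way to test the disk theory is to watch I/O wait and per-device utilization while the build runs; a sketch, assuming the sysstat tools are installed. (Note that %wa is 0.0 across the board in the snapshot above, which already argues against simple disk wait.)

# overall CPU, run-queue, and swap activity, sampled once a second
vmstat 1
# extended per-device statistics; a device near 100 in the %util
# column while the CPUs sit idle would point at the disk
iostat -x 1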

A make -j8 run barely eats any more CPU. This is with fully local disk, too:

Cpu0  : 20.0%us, 38.2%sy,  0.0%ni, 41.8%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu1  :  5.4%us, 30.4%sy,  0.0%ni, 64.3%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu2  : 14.5%us, 21.8%sy,  0.0%ni, 63.6%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu3  : 12.7%us, 41.8%sy,  0.0%ni, 45.5%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu4  : 20.0%us, 27.3%sy,  0.0%ni, 52.7%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu5  :  3.7%us, 29.6%sy,  0.0%ni, 66.7%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu6  :  1.8%us, 32.7%sy,  0.0%ni, 65.5%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu7  :  1.8%us,  3.5%sy,  0.0%ni, 94.7%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu8  :  1.8%us, 30.9%sy,  0.0%ni, 67.3%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu9  :  5.4%us, 23.2%sy,  0.0%ni, 71.4%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu10 :  3.6%us, 12.7%sy,  0.0%ni, 83.6%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu11 :  0.0%us,  7.3%sy,  0.0%ni, 92.7%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu12 :  3.6%us, 32.7%sy,  0.0%ni, 63.6%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu13 :  5.4%us, 19.6%sy,  0.0%ni, 75.0%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu14 :  0.0%us,  5.5%sy,  0.0%ni, 94.5%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu15 :  3.5%us, 28.1%sy,  0.0%ni, 68.4%id,  0.0%wa,  0.0%hi,  0.0%si

Recall that -j8 results in a 10m 35s build, whereas -j4 results in a 5m 46s build.
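
What stands out in that -j8 snapshot is that system time (%sy) dwarfs user time (%us) on nearly every CPU. One way to see where the kernel time goes is to count syscalls across make and all of its children; a sketch, assuming strace is available (the output file name is arbitrary):

# -c summarizes syscall counts and times, -f follows child processes;
# lots of time in futex/flock/fcntl would suggest lock contention,
# while heavy stat/open traffic would point back at NFS
strace -c -f -o make-syscalls.txt make -j8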

        I'm not sure how to effectively profile this.  All the sources are
        on NFS.

        So I then went multi-node with 4 jobs per node.  Using localhost as
        a server only seems to slow things down, incidentally.

        1 node,  -j4:  5m 28s (using distcc and 1 remote node)
        2 nodes, -j8:  2m 57s
        3 nodes, -j12: 2m 16s
        4 nodes, -j16: 1m 58s
        5 nodes, -j20: 2m 7s

        Scaling seems to break down around the 4-node mark.  Our link step
        is only 5-6 seconds, so we are not getting bound by that.  Messing
        with -j further doesn't seem to help.  Any ideas for profiling this
        to find the final bottlenecks?


    First, try running "top" during the build to determine the CPU usage
    on your local host.  If it stays near 100%, then the bottleneck is
    local jobs such as linking and/or include scanning, and top will show
    you which jobs are using the CPU most.  That's quite likely to be the
    limiting factor if you have a large number of nodes.

Not surprisingly, given the single-node results above, the localhost CPU is mostly idle during a multi-node build as well. A snapshot:

Cpu0  :  0.0%us, 11.5%sy,  0.0%ni, 88.5%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu1  :  1.9%us, 26.9%sy,  0.0%ni, 71.2%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu2  :  1.9%us, 29.6%sy,  0.0%ni, 68.5%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu3  :  1.9%us, 17.0%sy,  0.0%ni, 81.1%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu4  :  9.4%us, 43.4%sy,  0.0%ni, 43.4%id,  0.0%wa,  0.0%hi,  3.8%si
Cpu5  :  3.8%us, 28.3%sy,  0.0%ni, 67.9%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu6  :  1.9%us, 18.9%sy,  0.0%ni, 79.2%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu7  :  1.9%us, 28.8%sy,  0.0%ni, 69.2%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu8  :  1.9%us, 11.3%sy,  0.0%ni, 86.8%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu9  :  3.7%us, 37.0%sy,  0.0%ni, 59.3%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu10 :  1.9%us, 26.4%sy,  0.0%ni, 71.7%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu11 :  0.0%us, 11.3%sy,  0.0%ni, 88.7%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu12 :  1.9%us, 15.4%sy,  0.0%ni, 82.7%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu13 :  1.9%us, 30.2%sy,  0.0%ni, 67.9%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu14 :  3.8%us, 22.6%sy,  0.0%ni, 73.6%id,  0.0%wa,  0.0%hi,  0.0%si
Cpu15 :  1.9%us, 24.5%sy,  0.0%ni, 73.6%id,  0.0%wa,  0.0%hi,  0.0%si
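
A single snapshot like this is noisy; top's batch mode makes it easier to average over a whole build. A sketch:

# log a sample every 5 seconds for a minute while the build runs
top -b -d 5 -n 12 > top-during-build.log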


    Another possibility is lack of parallelism in your Makefile; you may
    have 1300 source files, but the dependencies in your Makefile probably
    mean that you can't actually run 1300 compiles in parallel.  Maybe
    your Makefile only allows about 16 compiles to run in parallel on
    average.

I believe I have fixed my makefiles so that, after a couple of short initial serial steps, compilation is fully parallel, both across directories and across source files within a directory. I do see directories being interleaved in my output, and also big bursts of files from the same directory being launched.
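
Concretely, the shape I am describing is roughly this (a simplified sketch; the directory and target names are illustrative, not my actual tree):

# every object depends only on its own source file, so nothing stops
# make -jN (or distcc, with CC="distcc gcc") from running N compiles
# at once, across directories as well as within them
SRCS := $(wildcard src/*.c) $(wildcard lib/*.c)
OBJS := $(SRCS:.c=.o)

prog: $(OBJS)
	$(CC) -o $@ $(OBJS)

%.o: %.c
	$(CC) -c -o $@ $<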


--
Robert W. Anderson
Center for Applied Scientific Computing
Email: anderson...@llnl.gov
Tel: 925-424-2858  Fax: 925-423-8704