The include server could be the bottleneck. What's the CPU usage for the include server process? Or it could be disk I/O. Try iostat or vmstat to profile that.
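For the disk-I/O check, something along these lines samples the kernel's aggregate CPU counters the way vmstat does between two ticks. This is a Linux-only sketch, not distcc tooling, and it assumes the standard /proc/stat field order (user, nice, system, idle, iowait, irq, softirq); a high iowait share during the build would point at disk:

```python
# Sample /proc/stat twice and report where CPU time went in between,
# roughly the same breakdown vmstat/iostat print. Linux-only; returns
# None on systems without /proc/stat.
import os
import time

def cpu_breakdown(interval=1.0):
    if not os.path.exists("/proc/stat"):
        return None
    def snap():
        with open("/proc/stat") as f:
            # First line is the aggregate "cpu" row; fields are tick counts.
            return [int(x) for x in f.readline().split()[1:]]
    before = snap()
    time.sleep(interval)
    after = snap()
    delta = [b - a for a, b in zip(before, after)]
    total = sum(delta) or 1
    names = ["user", "nice", "system", "idle", "iowait", "irq", "softirq"]
    return {n: 100.0 * d / total for n, d in zip(names, delta)}

if __name__ == "__main__":
    print(cpu_breakdown(0.5))
```

Run it in a loop alongside the build; if the "iowait" entry stays near zero, disk is probably not the limiter.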
On Thu, Apr 30, 2009 at 1:58 PM, Robert W. Anderson <anderson...@poptop.llnl.gov> wrote:

> Fergus Henderson wrote:
>
>> What's possibly more interesting is the performance results. I'm
>> compiling about 1300 source files on nodes that have 16 CPUs each.
>>
>> Single node (plain GNU make, no distcc):
>>
>>   -j2   8m 19s
>>   -j4   5m 46s
>>   -j5   6m 39s
>>   -j8  10m 35s
>>
>> I don't understand this, but it is repeatable. Any ideas on that one?
>>
>> It looks like your machine probably has 4 CPUs, with each job using
>> nearly 100% CPU.
>
> My node actually has four quad-core processors.
>
> make -j4 CPU utilization, according to top:
>
> Cpu0  :  0.7%us,  4.0%sy, 0.0%ni, 95.3%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu1  : 13.6%us, 29.7%sy, 0.0%ni, 56.8%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu2  :  0.7%us,  1.8%sy, 0.0%ni, 97.4%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu3  :  0.7%us,  0.7%sy, 0.0%ni, 98.5%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu4  : 26.8%us, 22.4%sy, 0.0%ni, 49.6%id, 0.0%wa, 0.0%hi, 1.1%si
> Cpu5  :  7.0%us, 15.8%sy, 0.0%ni, 77.3%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu6  :  2.9%us,  2.9%sy, 0.0%ni, 94.2%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu7  :  2.6%us,  0.7%sy, 0.0%ni, 96.7%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu8  :  0.7%us,  3.7%sy, 0.0%ni, 95.6%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu9  :  9.5%us, 15.8%sy, 0.0%ni, 74.7%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu10 :  0.7%us,  2.9%sy, 0.0%ni, 96.4%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu11 :  0.4%us,  0.4%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu12 :  1.5%us,  5.8%sy, 0.0%ni, 92.7%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu13 :  5.1%us,  5.1%sy, 0.0%ni, 89.8%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu14 :  6.6%us,  5.1%sy, 0.0%ni, 88.3%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu15 : 27.5%us, 30.0%sy, 0.0%ni, 41.8%id, 0.0%wa, 0.0%hi, 0.7%si
>
> It bounces all over the place, but this is not an atypical snapshot. The
> machine is mostly idle, according to the fourth (idle) column. Is this
> somehow a major resource contention issue? Disk access, maybe?
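To put a number on that snapshot, here are the sixteen idle percentages transcribed from the quoted top output, averaged:

```python
# Average the per-CPU idle percentages from the -j4 "top" snapshot
# quoted above, to quantify "the machine is mostly idle".
idle_j4 = [95.3, 56.8, 97.4, 98.5, 49.6, 77.3, 94.2, 96.7,
           95.6, 74.7, 96.4, 99.3, 92.7, 89.8, 88.3, 41.8]
avg_idle = sum(idle_j4) / len(idle_j4)
print(f"average idle at -j4: {avg_idle:.1f}%")  # ~84% idle
```

So at -j4 the box is running the equivalent of roughly 2.5 of its 16 cores flat out, which is consistent with something other than CPU being the constraint.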
> Note that I have tried local disk access on a /tmp partition, and while
> overall performance improves a bit, the scaling with increasing -j does
> not. -j5 is still slower than -j4.
>
> A make -j8 run barely eats any more CPU. This is with fully local disk,
> too:
>
> Cpu0  : 20.0%us, 38.2%sy, 0.0%ni, 41.8%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu1  :  5.4%us, 30.4%sy, 0.0%ni, 64.3%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu2  : 14.5%us, 21.8%sy, 0.0%ni, 63.6%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu3  : 12.7%us, 41.8%sy, 0.0%ni, 45.5%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu4  : 20.0%us, 27.3%sy, 0.0%ni, 52.7%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu5  :  3.7%us, 29.6%sy, 0.0%ni, 66.7%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu6  :  1.8%us, 32.7%sy, 0.0%ni, 65.5%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu7  :  1.8%us,  3.5%sy, 0.0%ni, 94.7%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu8  :  1.8%us, 30.9%sy, 0.0%ni, 67.3%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu9  :  5.4%us, 23.2%sy, 0.0%ni, 71.4%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu10 :  3.6%us, 12.7%sy, 0.0%ni, 83.6%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu11 :  0.0%us,  7.3%sy, 0.0%ni, 92.7%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu12 :  3.6%us, 32.7%sy, 0.0%ni, 63.6%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu13 :  5.4%us, 19.6%sy, 0.0%ni, 75.0%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu14 :  0.0%us,  5.5%sy, 0.0%ni, 94.5%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu15 :  3.5%us, 28.1%sy, 0.0%ni, 68.4%id, 0.0%wa, 0.0%hi, 0.0%si
>
> Recall that -j8 results in a 10m build whereas -j4 results in a 5m
> build.
>
> I'm not sure how to effectively profile this.
>
>> All the sources are on NFS.
>>
>> So I then went multi-node w/ 4 jobs per node. Using localhost as a
>> server only seems to slow things down, incidentally.
>>
>>   1 node,  -j4:  5m 28s (using distcc and 1 remote node)
>>   2 nodes, -j8:  2m 57s
>>   3 nodes, -j12: 2m 16s
>>   4 nodes, -j16: 1m 58s
>>   5 nodes, -j20: 2m 7s
>>
>> Scaling seems to break down around the 4 node mark. Our link step
>> is only 5-6 seconds, so we are not getting bound by that. Messing
>> with -j further doesn't seem to help.
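The quoted timings reduce to speedups and, as one crude reading, an Amdahl's-law serial-fraction estimate. This takes the 1-node distcc run as the baseline and is only a back-of-the-envelope model, not a measurement of the actual build graph:

```python
# Speedup and per-node efficiency from the distcc timings quoted above,
# with the 1-node 5m28s run as baseline.
times = {1: 5 * 60 + 28, 2: 2 * 60 + 57, 3: 2 * 60 + 16,
         4: 1 * 60 + 58, 5: 2 * 60 + 7}          # nodes -> seconds
base = times[1]
speedups = {n: base / t for n, t in times.items()}
for n, t in sorted(times.items()):
    print(f"{n} nodes: {t:4d}s  speedup {speedups[n]:.2f}x  "
          f"efficiency {speedups[n] / n:.0%}")

# If S(n) = 1 / (s + (1 - s)/n), the observed 4-node speedup implies a
# serial (non-distributable) fraction s of the build:
n, S = 4, speedups[4]
serial_fraction = (n / S - 1) / (n - 1)
print(f"implied serial fraction at 4 nodes: {serial_fraction:.0%}")  # ~15%
```

An implied serial fraction of roughly 15% is far more than a 5-6 second link step in a ~2-minute build can account for, which again points at some shared local stage (preprocessing, NFS, the include server) rather than the link.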
>> Any ideas for profiling this to find any final bottlenecks?
>>
>> First, try running "top" during the build to determine the CPU usage on
>> your local host. If it stays near 100%, then the bottleneck is local
>> jobs such as linking and/or include scanning, and top will show you
>> which jobs are using the CPU most. That's quite likely to be the
>> limiting factor if you have a large number of nodes.
>
> Not surprisingly (now), the localhost CPU is mostly idle as well during
> a multi-node build. A snapshot:
>
> Cpu0  :  0.0%us, 11.5%sy, 0.0%ni, 88.5%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu1  :  1.9%us, 26.9%sy, 0.0%ni, 71.2%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu2  :  1.9%us, 29.6%sy, 0.0%ni, 68.5%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu3  :  1.9%us, 17.0%sy, 0.0%ni, 81.1%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu4  :  9.4%us, 43.4%sy, 0.0%ni, 43.4%id, 0.0%wa, 0.0%hi, 3.8%si
> Cpu5  :  3.8%us, 28.3%sy, 0.0%ni, 67.9%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu6  :  1.9%us, 18.9%sy, 0.0%ni, 79.2%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu7  :  1.9%us, 28.8%sy, 0.0%ni, 69.2%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu8  :  1.9%us, 11.3%sy, 0.0%ni, 86.8%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu9  :  3.7%us, 37.0%sy, 0.0%ni, 59.3%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu10 :  1.9%us, 26.4%sy, 0.0%ni, 71.7%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu11 :  0.0%us, 11.3%sy, 0.0%ni, 88.7%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu12 :  1.9%us, 15.4%sy, 0.0%ni, 82.7%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu13 :  1.9%us, 30.2%sy, 0.0%ni, 67.9%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu14 :  3.8%us, 22.6%sy, 0.0%ni, 73.6%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu15 :  1.9%us, 24.5%sy, 0.0%ni, 73.6%id, 0.0%wa, 0.0%hi, 0.0%si
>
>> Another possibility is lack of parallelism in your Makefile; you may
>> have 1300 source files, but the dependencies in your Makefile probably
>> mean that you can't actually run 1300 compiles in parallel. Maybe your
>> Makefile only allows about 16 compiles to run in parallel on average.
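One way to test the limited-Makefile-parallelism theory directly is to sample how many compiler processes are actually alive at once during the build. The following is a Linux-only sketch; the `cc1` substring assumes GCC's internal compiler process name, so adjust it for other toolchains:

```python
# Count live processes whose command line mentions "cc1" by scanning
# /proc. Run this in a loop during "make -jN": if the count never gets
# near N, the Makefile (not the machines) is limiting parallelism.
import os

def count_compilers(needle=b"cc1"):
    if not os.path.isdir("/proc"):
        return 0  # non-Linux fallback
    count = 0
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        try:
            with open(f"/proc/{pid}/cmdline", "rb") as f:
                if needle in f.read():
                    count += 1
        except OSError:
            pass  # process exited between listdir() and open()
    return count

if __name__ == "__main__":
    print(count_compilers())
```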
> I believe I fixed my makefiles so that, after a couple of short initial
> serial steps, compilation of the source is fully parallel, both per
> directory and per source file. I do see the directories being
> interleaved in my output, and also big bursts of files from the same
> directory being launched.
>
> --
> Robert W. Anderson
> Center for Applied Scientific Computing
> Email: anderson...@llnl.gov
> Tel: 925-424-2858  Fax: 925-423-8704
>
> __
> distcc mailing list            http://distcc.samba.org/
> To unsubscribe or change options:
> https://lists.samba.org/mailman/listinfo/distcc

--
Fergus Henderson <fer...@google.com>