The include server could be the bottleneck. What's the CPU usage for the include server process? Or it could be disk I/O. Try iostat or vmstat to profile that.
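For the disk-I/O check, something along these lines samples the kernel's aggregate CPU counters the way vmstat does between two ticks. This is a Linux-only sketch, not distcc tooling, and it assumes the standard /proc/stat field order (user, nice, system, idle, iowait, irq, softirq); a high iowait share during the build would point at disk:

```python
# Sample /proc/stat twice and report where CPU time went in between,
# roughly the same breakdown vmstat/iostat print. Linux-only; returns
# None on systems without /proc/stat.
import os
import time

def cpu_breakdown(interval=1.0):
    if not os.path.exists("/proc/stat"):
        return None
    def snap():
        with open("/proc/stat") as f:
            # First line is the aggregate "cpu" row; fields are tick counts.
            return [int(x) for x in f.readline().split()[1:]]
    before = snap()
    time.sleep(interval)
    after = snap()
    delta = [b - a for a, b in zip(before, after)]
    total = sum(delta) or 1
    names = ["user", "nice", "system", "idle", "iowait", "irq", "softirq"]
    return {n: 100.0 * d / total for n, d in zip(names, delta)}

if __name__ == "__main__":
    print(cpu_breakdown(0.5))
```

Run it in a loop alongside the build; if the "iowait" entry stays near zero, disk is probably not the limiter.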
On Thu, Apr 30, 2009 at 1:58 PM, Robert W. Anderson <anderson...@poptop.llnl.gov> wrote:

> Fergus Henderson wrote:
>
>> What's possibly more interesting is the performance results. I'm
>> compiling about 1300 source files on nodes that have 16 CPUs each.
>>
>> Single node (plain GNU make, no distcc):
>>
>>   -j2   8m 19s
>>   -j4   5m 46s
>>   -j5   6m 39s
>>   -j8  10m 35s
>>
>> I don't understand this, but it is repeatable. Any ideas on that one?
>>
>> It looks like your machine probably has 4 CPUs, with each job using
>> nearly 100% CPU.
>
> My node actually has four quad-core processors.
>
> make -j4 CPU utilization, according to top:
>
> Cpu0  :  0.7%us,  4.0%sy, 0.0%ni, 95.3%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu1  : 13.6%us, 29.7%sy, 0.0%ni, 56.8%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu2  :  0.7%us,  1.8%sy, 0.0%ni, 97.4%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu3  :  0.7%us,  0.7%sy, 0.0%ni, 98.5%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu4  : 26.8%us, 22.4%sy, 0.0%ni, 49.6%id, 0.0%wa, 0.0%hi, 1.1%si
> Cpu5  :  7.0%us, 15.8%sy, 0.0%ni, 77.3%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu6  :  2.9%us,  2.9%sy, 0.0%ni, 94.2%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu7  :  2.6%us,  0.7%sy, 0.0%ni, 96.7%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu8  :  0.7%us,  3.7%sy, 0.0%ni, 95.6%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu9  :  9.5%us, 15.8%sy, 0.0%ni, 74.7%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu10 :  0.7%us,  2.9%sy, 0.0%ni, 96.4%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu11 :  0.4%us,  0.4%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu12 :  1.5%us,  5.8%sy, 0.0%ni, 92.7%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu13 :  5.1%us,  5.1%sy, 0.0%ni, 89.8%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu14 :  6.6%us,  5.1%sy, 0.0%ni, 88.3%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu15 : 27.5%us, 30.0%sy, 0.0%ni, 41.8%id, 0.0%wa, 0.0%hi, 0.7%si
>
> It bounces all over the place, but this is not an atypical snapshot. The
> machine is mostly idle, according to the fourth (idle) column. Is this
> somehow a major resource contention issue? Disk access, maybe?
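To put a number on that snapshot, here are the sixteen idle percentages transcribed from the quoted top output, averaged:

```python
# Average the per-CPU idle percentages from the -j4 "top" snapshot
# quoted above, to quantify "the machine is mostly idle".
idle_j4 = [95.3, 56.8, 97.4, 98.5, 49.6, 77.3, 94.2, 96.7,
           95.6, 74.7, 96.4, 99.3, 92.7, 89.8, 88.3, 41.8]
avg_idle = sum(idle_j4) / len(idle_j4)
print(f"average idle at -j4: {avg_idle:.1f}%")  # ~84% idle
```

So at -j4 the box is running the equivalent of roughly 2.5 of its 16 cores flat out, which is consistent with something other than CPU being the constraint.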
> Note that I have tried local disk access on a /tmp partition, and while
> overall performance improves a bit, the scaling with increasing -j does
> not. -j5 is still slower than -j4.
>
> A make -j8 run barely eats any more CPU. This is with fully local disk,
> too:
>
> Cpu0  : 20.0%us, 38.2%sy, 0.0%ni, 41.8%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu1  :  5.4%us, 30.4%sy, 0.0%ni, 64.3%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu2  : 14.5%us, 21.8%sy, 0.0%ni, 63.6%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu3  : 12.7%us, 41.8%sy, 0.0%ni, 45.5%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu4  : 20.0%us, 27.3%sy, 0.0%ni, 52.7%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu5  :  3.7%us, 29.6%sy, 0.0%ni, 66.7%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu6  :  1.8%us, 32.7%sy, 0.0%ni, 65.5%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu7  :  1.8%us,  3.5%sy, 0.0%ni, 94.7%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu8  :  1.8%us, 30.9%sy, 0.0%ni, 67.3%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu9  :  5.4%us, 23.2%sy, 0.0%ni, 71.4%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu10 :  3.6%us, 12.7%sy, 0.0%ni, 83.6%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu11 :  0.0%us,  7.3%sy, 0.0%ni, 92.7%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu12 :  3.6%us, 32.7%sy, 0.0%ni, 63.6%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu13 :  5.4%us, 19.6%sy, 0.0%ni, 75.0%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu14 :  0.0%us,  5.5%sy, 0.0%ni, 94.5%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu15 :  3.5%us, 28.1%sy, 0.0%ni, 68.4%id, 0.0%wa, 0.0%hi, 0.0%si
>
> Recall that -j8 results in a 10m build whereas -j4 results in a 5m
> build.
>
> I'm not sure how to effectively profile this.
>
>> All the sources are on NFS.
>>
>> So I then went multi-node w/ 4 jobs per node. Using localhost as a
>> server only seems to slow things down, incidentally.
>>
>>   1 node,  -j4:  5m 28s (using distcc and 1 remote node)
>>   2 nodes, -j8:  2m 57s
>>   3 nodes, -j12: 2m 16s
>>   4 nodes, -j16: 1m 58s
>>   5 nodes, -j20: 2m 7s
>>
>> Scaling seems to break down around the 4 node mark. Our link step
>> is only 5-6 seconds, so we are not getting bound by that. Messing
>> with -j further doesn't seem to help.
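The quoted timings reduce to speedups and, as one crude reading, an Amdahl's-law serial-fraction estimate. This takes the 1-node distcc run as the baseline and is only a back-of-the-envelope model, not a measurement of the actual build graph:

```python
# Speedup and per-node efficiency from the distcc timings quoted above,
# with the 1-node 5m28s run as baseline.
times = {1: 5 * 60 + 28, 2: 2 * 60 + 57, 3: 2 * 60 + 16,
         4: 1 * 60 + 58, 5: 2 * 60 + 7}          # nodes -> seconds
base = times[1]
speedups = {n: base / t for n, t in times.items()}
for n, t in sorted(times.items()):
    print(f"{n} nodes: {t:4d}s  speedup {speedups[n]:.2f}x  "
          f"efficiency {speedups[n] / n:.0%}")

# If S(n) = 1 / (s + (1 - s)/n), the observed 4-node speedup implies a
# serial (non-distributable) fraction s of the build:
n, S = 4, speedups[4]
serial_fraction = (n / S - 1) / (n - 1)
print(f"implied serial fraction at 4 nodes: {serial_fraction:.0%}")  # ~15%
```

An implied serial fraction of roughly 15% is far more than a 5-6 second link step in a ~2-minute build can account for, which again points at some shared local stage (preprocessing, NFS, the include server) rather than the link.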
>> Any ideas for profiling this to find any final bottlenecks?
>>
>> First, try running "top" during the build to determine the CPU usage on
>> your local host. If it stays near 100%, then the bottleneck is local
>> jobs such as linking and/or include scanning, and top will show you
>> which jobs are using the CPU most. That's quite likely to be the
>> limiting factor if you have a large number of nodes.
>
> Not surprisingly (now), the localhost CPU is mostly idle as well during
> a multi-node build. A snapshot:
>
> Cpu0  :  0.0%us, 11.5%sy, 0.0%ni, 88.5%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu1  :  1.9%us, 26.9%sy, 0.0%ni, 71.2%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu2  :  1.9%us, 29.6%sy, 0.0%ni, 68.5%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu3  :  1.9%us, 17.0%sy, 0.0%ni, 81.1%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu4  :  9.4%us, 43.4%sy, 0.0%ni, 43.4%id, 0.0%wa, 0.0%hi, 3.8%si
> Cpu5  :  3.8%us, 28.3%sy, 0.0%ni, 67.9%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu6  :  1.9%us, 18.9%sy, 0.0%ni, 79.2%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu7  :  1.9%us, 28.8%sy, 0.0%ni, 69.2%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu8  :  1.9%us, 11.3%sy, 0.0%ni, 86.8%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu9  :  3.7%us, 37.0%sy, 0.0%ni, 59.3%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu10 :  1.9%us, 26.4%sy, 0.0%ni, 71.7%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu11 :  0.0%us, 11.3%sy, 0.0%ni, 88.7%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu12 :  1.9%us, 15.4%sy, 0.0%ni, 82.7%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu13 :  1.9%us, 30.2%sy, 0.0%ni, 67.9%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu14 :  3.8%us, 22.6%sy, 0.0%ni, 73.6%id, 0.0%wa, 0.0%hi, 0.0%si
> Cpu15 :  1.9%us, 24.5%sy, 0.0%ni, 73.6%id, 0.0%wa, 0.0%hi, 0.0%si
>
>> Another possibility is lack of parallelism in your Makefile; you may
>> have 1300 source files, but the dependencies in your Makefile probably
>> mean that you can't actually run 1300 compiles in parallel. Maybe your
>> Makefile only allows about 16 compiles to run in parallel on average.
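One way to test the limited-Makefile-parallelism theory directly is to sample how many compiler processes are actually alive at once during the build. The following is a Linux-only sketch; the `cc1` substring assumes GCC's internal compiler process name, so adjust it for other toolchains:

```python
# Count live processes whose command line mentions "cc1" by scanning
# /proc. Run this in a loop during "make -jN": if the count never gets
# near N, the Makefile (not the machines) is limiting parallelism.
import os

def count_compilers(needle=b"cc1"):
    if not os.path.isdir("/proc"):
        return 0  # non-Linux fallback
    count = 0
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        try:
            with open(f"/proc/{pid}/cmdline", "rb") as f:
                if needle in f.read():
                    count += 1
        except OSError:
            pass  # process exited between listdir() and open()
    return count

if __name__ == "__main__":
    print(count_compilers())
```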
> I believe I fixed my makefiles so that, after a couple of short initial
> serial steps, compilation of the source is fully parallel, both per
> directory and per source file. I do see the directories being
> interleaved in my output, and also big bursts of files from the same
> directory being launched.
>
> --
> Robert W. Anderson
> Center for Applied Scientific Computing
> Email: anderson...@llnl.gov
> Tel: 925-424-2858  Fax: 925-423-8704
>
> __
> distcc mailing list            http://distcc.samba.org/
> To unsubscribe or change options:
> https://lists.samba.org/mailman/listinfo/distcc

--
Fergus Henderson <fer...@google.com>