Fergus Henderson wrote:
On Tue, Apr 28, 2009 at 4:28 PM, Robert W. Anderson <anderson...@poptop.llnl.gov> wrote:


    I have an environment where we have many nodes potentially available
    for compilation, and all of them see the same file spaces via NFS.
    We are seeing decent performance out of distcc 3.1 using pump mode,
    but from reading the docs there may be big performance gains left to
    wring out in this special(?) case.

    If I understand correctly, distcc's pump mode finds a set of header
    files necessary to send along with the source file to enable
    compilation on a remote node.  In a homogeneous environment, it
    seems both steps here are unnecessary if the master and slave nodes
    are more or less indistinguishable in terms of compiler, sources,
    and headers.

    I think we could really achieve some screaming compile times (over
    thousands of source files) if these steps could be bypassed with the
    user's explicit acknowledgement that he is making assumptions about
    the homogeneity of his build server machines.

    How extensive would the modifications be to support such an
    optimization?  It was not clear to me after a few minutes of poking
    around in the source, so I thought I'd seek an expert opinion first.


Typically NFS is a lot slower than local file access.
So it's not clear that this approach would actually improve overall performance.

Distcc can work faster than NFS, because it sends all of the source files at once, requiring only one round-trip between the client and the distcc server for each compilation. With NFS, you need a round-trip between the distcc server and the NFS server for each header file that is included (directly or indirectly) from the source file being compiled.
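To make the round-trip argument concrete, here is a rough back-of-the-envelope model. It is only a sketch: the function names, the file counts, and the 0.5 ms round-trip time are illustrative assumptions, not measurements from distcc or from any real NFS setup, and the model ignores caching, bandwidth, and parallelism.

```python
def nfs_round_trips(num_sources: int, headers_per_source: int) -> int:
    """Worst case for servers reading over NFS: one round-trip per
    included header, for every compilation (no caching assumed)."""
    return num_sources * headers_per_source


def distcc_round_trips(num_sources: int) -> int:
    """distcc sending everything at once: one client/server
    round-trip per compilation."""
    return num_sources


def estimated_latency_seconds(round_trips: int, rtt_ms: float) -> float:
    """Total time spent purely waiting on round-trips."""
    return round_trips * rtt_ms / 1000.0


if __name__ == "__main__":
    # Hypothetical build: 1000 source files, 200 headers each, 0.5 ms RTT.
    sources, headers, rtt_ms = 1000, 200, 0.5
    for label, trips in [
        ("NFS on servers", nfs_round_trips(sources, headers)),
        ("distcc", distcc_round_trips(sources)),
    ]:
        print(label, trips, "round-trips,",
              estimated_latency_seconds(trips, rtt_ms), "s of latency")
```

With these made-up numbers the NFS case spends on the order of 100 seconds on latency alone, against about half a second for distcc. Real NFS clients cache, so the actual gap would be smaller.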

Of course, with distcc, if your source files are on NFS, the client needs to make the same round-trips to the NFS server to fetch the files. But this is not as bad as having the distcc servers do that: the distcc client need only fetch each file once for the whole build, not once for each compilation in which it is referenced, and after that the file will probably be cached. In addition, the client machine is more likely to have source files cached from previous builds, since on the client you're probably compiling the same sources that you compiled last time, whereas the distcc server machines serve lots of different users who may be compiling very different programs.
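The "once per build" versus "once per compilation" difference can be illustrated with a small counting sketch. The include map below is invented for the example, and the per-compilation figure assumes the worst case in which the servers share no cache at all.

```python
def count_fetches(includes: dict[str, set[str]]) -> tuple[int, int]:
    """Return (fetches if every compilation re-reads its files,
               fetches if each file is read once for the whole build).

    `includes` maps each source file to the set of headers it pulls in,
    directly or indirectly.
    """
    # Each compilation reads its source file plus all of its headers.
    per_compilation = sum(len(headers) + 1 for headers in includes.values())

    # A single caching client reads every distinct file exactly once.
    unique_files = set(includes)
    for headers in includes.values():
        unique_files |= headers

    return per_compilation, len(unique_files)


if __name__ == "__main__":
    # Hypothetical two-file project sharing most of its headers.
    example = {
        "a.c": {"x.h", "y.h"},
        "b.c": {"x.h", "y.h", "z.h"},
    }
    print(count_fetches(example))
```

The more headers the source files share, the larger the gap between the two counts becomes.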

This approach also raises additional security considerations. Currently distcc servers normally run as user "distcc", which may not have access to the user's NFS files, so this approach would not work if the source files are not world-readable. Of course, it would be possible to address this issue by having the distcc server authenticate the user and then access the user's files on NFS as that user, but that would require additional authentication, which would have a performance impact. For example, one way to do it would be to use distcc's ssh mode, but that mode has a major performance impact. (The recently posted patches for GSSAPI support have less performance impact, but the impact is still significant.)

For the approach that you are considering, you may not need to use distcc at all; a simple script using ssh may be sufficient, though the overheads of ssh may be prohibitive (ssh connection sharing may help with that, although that has security concerns of its own). If you do want to modify distcc, I'd guess that the modifications needed would be moderate in scope.
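As a rough illustration of the "simple script using ssh" idea, the sketch below builds one ssh command per source file, assigning hosts round-robin. It assumes every host sees the same NFS paths and has the same compiler installed. The host names, working directory, compiler, and flags are placeholders. The `ControlMaster`, `ControlPath`, and `ControlPersist` options are OpenSSH's connection-sharing settings; whether they are acceptable depends on the security concerns mentioned above.

```python
import itertools
import shlex

# OpenSSH connection sharing: reuse one connection per host instead of
# paying the full ssh handshake for every compilation.
SSH_SHARING_OPTS = [
    "-o", "ControlMaster=auto",
    "-o", "ControlPath=~/.ssh/cm-%r@%h:%p",
    "-o", "ControlPersist=60",
]


def remote_compile_commands(sources, hosts, workdir,
                            compiler="gcc", flags=("-O2",)):
    """Build one ssh argv list per source file, cycling through hosts.

    Nothing is executed here; the caller decides how to run the commands.
    """
    commands = []
    for src, host in zip(sources, itertools.cycle(hosts)):
        obj = src.rsplit(".", 1)[0] + ".o"
        # The remote shell sees: cd <workdir> && <compiler> <flags> -c src -o obj
        remote = "cd {} && {}".format(
            shlex.quote(workdir),
            shlex.join([compiler, *flags, "-c", src, "-o", obj]),
        )
        commands.append(["ssh", *SSH_SHARING_OPTS, host, remote])
    return commands


if __name__ == "__main__":
    # Placeholder hosts and paths, for illustration only.
    for cmd in remote_compile_commands(["a.c", "b.c", "c.c"],
                                       ["host1", "host2"], "/nfs/proj"):
        print(shlex.join(cmd))
```

Each argv list could then be handed to `subprocess.run`, for example from a thread pool sized to the number of hosts. The sketch does no load balancing or failure handling, both of which distcc provides.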

Fergus,

Thanks for the clear and detailed reply. First I should note that I am already using ssh mode (via rsh) because I was unable to make TCP mode work. I don't know whether this is caused by some kind of port-blocking restriction on my machine or by something else:

distcc[1663] (dcc_pump_sendfile) ERROR: sendfile failed: Connection reset by peer
distcc[1663] (dcc_writex) ERROR: failed to write: Broken pipe
distcc[1663] Warning: failed to distribute source.c to host16,cpp,lzo, running locally instead

Perhaps getting TCP mode running should be my first performance priority.

I just tried what you suggested in your last paragraph, manually distributing compiles via rsh, and am finding that it is, as you suspected, a little slower than distcc using pump mode. Rather than pursue that any further, based on your comments, I would like to see if I can get TCP pump mode working first.

Thanks,
--
Robert W. Anderson
Center for Applied Scientific Computing
Email: anderson...@llnl.gov
Tel: 925-424-2858  Fax: 925-423-8704
__
distcc mailing list
http://distcc.samba.org/
To unsubscribe or change options:
https://lists.samba.org/mailman/listinfo/distcc