On Tue, Apr 28, 2009 at 4:28 PM, Robert W. Anderson <anderson...@poptop.llnl.gov> wrote:
>
> I have an environment where we have many nodes potentially available for
> compilation, and all of them see the same file spaces via NFS. We are
> seeing decent performance out of distcc 3.1 using pump mode, but from
> reading the docs there may be big performance gains left to wring out in
> this special(?) case.
>
> If I understand correctly, distcc's pump mode finds a set of header files
> necessary to send along with the source file to enable compilation on a
> remote node. In a homogeneous environment, it seems both steps here are
> unnecessary if the master and slave nodes are more or less indistinguishable
> in terms of compiler, sources, and headers.
>
> I think we could really achieve some screaming compile times (over
> thousands of source files) if these steps could be bypassed with the user's
> explicit acknowledgement that he is making assumptions about the homogeneity
> of his build server machines.
>
> How extensive would the modifications be to support such an optimization?
> It was not clear to me after a few minutes of poking around in the source,
> and thought I'd seek an expert opinion first.

Typically, NFS is a lot slower than local file access, so it's not clear that this approach would actually improve overall performance. Distcc can work faster than NFS because it sends all of the needed source and header files at once, requiring only one round-trip between the client and the distcc server for each compilation. With NFS, you need a round-trip between the distcc server and the NFS server for each header file that is included (directly or indirectly) from the source file being compiled.
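To make the round-trip argument concrete, here is a hypothetical counting model in Python. It counts only request/response round-trips (it ignores bandwidth, caching on the distcc servers, and NFS attribute caching), and the build shape in the made-up helper `hypothetical_build()` is invented for illustration, not measured:

```python
from typing import List, Set, Tuple

# Hypothetical back-of-the-envelope model (illustrative only, not measured).
# Each compilation is represented by the set of file names it reads: the
# source file plus every header it includes, directly or indirectly.


def nfs_round_trips_on_servers(compilations: List[Set[str]]) -> int:
    """Servers read files over NFS themselves: one NFS round-trip per file
    per compilation that uses it (assuming no server-side caching)."""
    return sum(len(files) for files in compilations)


def pump_mode_round_trips(compilations: List[Set[str]]) -> Tuple[int, int]:
    """Pump mode: one client<->distcc-server round-trip per compilation, plus
    the client fetching each distinct file from NFS once for the whole build.
    Returns (distcc_round_trips, client_nfs_round_trips)."""
    distinct_files: Set[str] = set().union(*compilations)
    return len(compilations), len(distinct_files)


def hypothetical_build(units: int = 1000, shared_headers: int = 199) -> List[Set[str]]:
    """Made-up build: each unit has its own .c file and includes the same
    set of shared headers."""
    headers = {"inc{}.h".format(j) for j in range(shared_headers)}
    return [{"src{}.c".format(i)} | headers for i in range(units)]


if __name__ == "__main__":
    build = hypothetical_build()
    print("NFS on servers:", nfs_round_trips_on_servers(build))
    distcc_rt, client_nfs_rt = pump_mode_round_trips(build)
    print("pump mode:", distcc_rt, "distcc +", client_nfs_rt, "client NFS")
```

With this made-up build, having the servers read files over NFS costs 200000 NFS round-trips, versus 1000 distcc round-trips plus 1199 client-side NFS fetches in pump mode, because the 199 shared headers are fetched by the client only once.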
Of course, with distcc, if your source files are on NFS, the client needs to make the same round-trips to the NFS server to fetch the files. But this is not as bad as having the distcc servers do it, because the distcc client needs to fetch each file only once for the whole build, not once for each compilation in which it is referenced, and after that the file will probably be cached. In addition, the client machine is more likely to have the source files cached from previous builds, since on the client you're probably compiling the same sources that you compiled last time, whereas the distcc server machines serve lots of different users who may be compiling very different programs.

Another issue with this approach is security. Currently, distcc servers normally run as user "distcc", which may not have access to the user's NFS files, so this approach would not work if the source files are not world-readable. It would be possible to address this by having the distcc server authenticate the user and then access the user's files on NFS as that user, but the extra authentication would have a performance cost. For example, one way to do it would be to use distcc's ssh mode, but that mode has a major performance impact. (The recently posted patches for GSSAPI support have less performance impact, but the impact is still significant.)

For the approach that you are considering, you may not need to use distcc at all; a simple script using ssh may be sufficient, though the overhead of ssh may be prohibitive (ssh connection sharing may help with that, although it has security concerns of its own).

If you do want to modify distcc, I'd guess that the modifications needed would be moderate in scope.

Cheers,
Fergus.

--
Fergus Henderson <fer...@google.com>
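The "simple script using ssh" alternative mentioned in the reply could look roughly like the Python sketch below. It assumes every build node mounts the same NFS paths at the same locations and has the same compiler installed, so nothing is shipped; the host name `buildnode01` and the helper names are hypothetical. The `ControlMaster`, `ControlPath`, and `ControlPersist` settings are standard OpenSSH connection-sharing options. As noted in the reply, sharing a master connection has its own security concerns, since other processes running as the same user can reuse the control socket.

```python
import os
import shlex
import subprocess
from typing import List, Optional, Sequence


def build_ssh_compile_command(host: str, compiler_args: Sequence[str], cwd: str,
                              share_connection: bool = True) -> List[str]:
    """Build an ssh command that runs the compiler on `host` in directory `cwd`.

    Assumes `host` sees the same NFS paths and has the same compiler, so the
    remote compiler reads sources and headers directly from NFS.
    """
    # ssh joins its trailing arguments into one string that the remote login
    # shell parses, so quote every piece to survive spaces or metacharacters.
    remote = "cd {} && {}".format(
        shlex.quote(cwd), " ".join(shlex.quote(arg) for arg in compiler_args))
    cmd = ["ssh"]
    if share_connection:
        # OpenSSH connection multiplexing: later invocations reuse one master
        # connection per host instead of paying for a new handshake each time.
        cmd += ["-o", "ControlMaster=auto",
                "-o", "ControlPath=~/.ssh/cm-%r@%h:%p",
                "-o", "ControlPersist=60"]
    cmd += [host, remote]
    return cmd


def compile_remotely(host: str, compiler_args: Sequence[str],
                     cwd: Optional[str] = None) -> int:
    """Run one compilation on `host`. Returns ssh's exit status, which is the
    remote command's status unless ssh itself fails (then 255)."""
    cmd = build_ssh_compile_command(host, compiler_args, cwd or os.getcwd())
    return subprocess.run(cmd).returncode
```

A build driver would call, for example, `compile_remotely("buildnode01", ["gcc", "-c", "foo.c", "-o", "foo.o"])` for each source file, spreading calls across hosts. Note that this still has each build node fetch every header over NFS for every compilation, which is exactly the cost discussed above.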
__
distcc mailing list            http://distcc.samba.org/
To unsubscribe or change options:
https://lists.samba.org/mailman/listinfo/distcc