Hi everyone,
I'm trying to find a good way to run "distcc" on a cluster that is running
Scyld ClusterWare from Penguin Computing. This architecture consists of
several compute nodes which are hidden from the external network behind
a single master node which is responsible for managing a work queue and
dispatching jobs to appropriate compute nodes. The master and the
compute nodes are on a private network and can see each other, but the
only external access is to the master node. The "proper" way to use the
system is to submit jobs via the queuing system. I managed to come up
with a job script that does just that: it submits a job which reserves
several nodes, and when it gets scheduled, it runs "distccd" on the
assigned nodes and then does a "distcc" compile on the master node.
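For reference, the job script is roughly along these lines. This is only a sketch: the "#PBS" directive, the $PBS_NODEFILE variable, the node count, and the subnet are placeholders for whatever your queuing setup actually provides, and the script is written under /tmp here just so the snippet stands alone.

```shell
# Rough sketch of the queue-submitted job script described above.
# Scheduler directives and variables are placeholders.
cat > /tmp/distcc-job.sh <<'EOF'
#!/bin/sh
#PBS -l nodes=4
# Start a distccd on every node the scheduler assigned to this job.
# $PBS_NODEFILE (a PBS convention) lists the assigned nodes.
for node in $(sort -u "$PBS_NODEFILE"); do
    bpsh "$node" distccd --daemon --allow 192.168.0.0/16
done
# Point distcc at those nodes and run the build from the master node.
DISTCC_HOSTS=$(sort -u "$PBS_NODEFILE" | tr '\n' ' ')
export DISTCC_HOSTS
make -j8 CC=distcc
EOF
chmod +x /tmp/distcc-job.sh
```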
This works, but there are several disadvantages. First, it's not all
that interactive. Submitting a compile job and having to wait some
indeterminate amount of time for it to execute is sort of perverse...
developers might as well compile on their own machines. Second, and I
think this is the most frustrating problem: since "distcc" is
running on the head node, the head node must have access to all the
source code. That means developers must upload their code to the head
node, or put it on an NFS drive. Doing either of these defeats distcc's
best feature, namely the transport protocol that it provides to allow
you to compile stuff on your *own* desktop machine using local storage.
So, we've been searching for alternatives. Two ideas came up, but both
are rather iffy, so I thought I would ping this group before spending a
lot of time fiddling with them. The first idea was to "daisy-chain"
distcc. The idea is that we would run "distccd" on each of the worker
nodes (outside of the queuing system), and run another "distccd" on the
head node. The daemon on the head node would accept connections from the
outside world, and when it tried to run "gcc", it would really be
running "distcc", which would forward the request to a "distccd" on a
worker node. So, in the outside network, developers would run "distcc",
but set their DISTCC_HOSTS to only one machine -- the head node of the
cluster.
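To make the daisy-chain idea concrete, the "gcc" that the head node's distccd runs would be a small wrapper placed early in the daemon's PATH (distcc's masquerade-directory style), which re-dispatches the compile to the inner ring. This is just a sketch: "n0 n1 n2" are placeholder names for the private compute nodes, and /tmp is used only so the snippet stands alone.

```shell
# Sketch of the daisy-chain wrapper: a fake "gcc" seen only by the
# head node's distccd, which forwards the compile to the compute nodes.
mkdir -p /tmp/distcc-masq
cat > /tmp/distcc-masq/gcc <<'EOF'
#!/bin/sh
# Forward this compile to the inner distccd servers. Call the real
# compiler by its full path so distcc doesn't recurse into this wrapper.
DISTCC_HOSTS="n0 n1 n2" exec distcc /usr/bin/gcc "$@"
EOF
chmod +x /tmp/distcc-masq/gcc
# The head node's daemon would then be started with the wrapper first
# in its PATH, e.g.:
#   PATH=/tmp/distcc-masq:$PATH distccd --daemon --allow <office subnet>
```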
So that's the first idea. The second idea is similar in that the head
node runs "distccd", and that developers have only that machine in their
DISTCC_HOSTS, but now, when the "distccd" runs "gcc", it instead runs a
wrapper script which calls "gcc" via the Scyld ClusterWare "bpsh" wrapper,
which starts the job on the head node, then migrates it to a compute
node. My concern with this approach is the possibility of there being
a lot of overhead in migrating hundreds of small "gcc" tasks to the
compute nodes one at a time.
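Concretely, the wrapper I have in mind is only a few lines. Again a sketch, not a tested implementation: I'm assuming bpsh's usual "bpsh <node> command args..." form, the 0..3 node range is a placeholder, and the naive per-process node pick is just for illustration.

```shell
# Sketch of the bpsh-based wrapper the head node's distccd would run
# as its "gcc". Written under /tmp so the snippet stands alone.
mkdir -p /tmp/bpsh-wrap
cat > /tmp/bpsh-wrap/gcc <<'EOF'
#!/bin/sh
# Pick a compute node (placeholder: spread by PID over nodes 0..3)
# and migrate the compile there with bpsh, which keeps stdin/stdout/
# stderr attached -- which is what distccd needs.
NODE=$(( $$ % 4 ))
exec bpsh "$NODE" /usr/bin/gcc "$@"
EOF
chmod +x /tmp/bpsh-wrap/gcc
```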
So those are the two ideas. Neither of them seems ideal. Since I doubt
anyone can comment on the second idea (it's likely something we just
have to try), my questions to the group are 1) whether there is a third,
better alternative someone has come up with, and 2) whether I should
even attempt the "daisy-chaining" approach (would distcc be able to
handle this, or would it get hopelessly confused?).
Any thoughts would be very much appreciated! Thank you!
-- Marcio
__
distcc mailing list http://distcc.samba.org/
To unsubscribe or change options:
https://lists.samba.org/mailman/listinfo/distcc