Good afternoon. I've been following the discussion by Dan, Victor, Jean, and others regarding load balancing of distcc jobs. Sorry I've been lurking and haven't said anything until now.
Actually, I have said something on the subject, but it was quite a while ago. Martin probably still remembers the "tcpbalance" distcc load balancing proxy[1] that I'd written specifically for solving the kinds of load balancing problems that Dan & Victor have been trying to solve. Tcpbalance does the following:

* Chooses the fastest available distcc server at the moment. Servers are configured in a list, sorted from fastest CPUs to slowest, so it's possible to use hand-me-down machines with slower CPUs: they'll be used only if there's enough demand for tcpbalance to walk that far down the list. Machines with multiple CPUs can be configured to accept multiple simultaneous jobs from the proxy. If all servers are busy, the proxy waits until a server becomes available. (At first, I'd used TCP load balancing proxies designed for HTTP use. They work fine as distcc proxies, but their round-robin scheduling was suboptimal, especially when M developers with N simultaneous jobs approached or exceeded the number of CPUs available.)
* Supports builds initiated on multiple machines and by multiple developers.
* Does not rely on NFS or any other mechanism for sharing state information.
* Requires no changes to distcc: it's just a dumb TCP proxy (once the distcc server is chosen). All developers have a DISTCC_HOSTS environment variable that is simply "tcpbalance_proxy_host/N", where N is the same number that the developer uses for "make -jN".
* Automatically detects if a distcc server is down and stops sending jobs to down servers.
* Has an embedded HTTP server to show the status of all distcc servers: # of jobs currently active, total # of jobs since the proxy started, total # of seconds used by all jobs since the proxy started, ...
* Well-tested in a cross-compilation environment.

I'd written tcpbalance while I was at Caspian Networks.
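The scheduling policy in the first bullet (walk a fastest-first list, take the first host with a free job slot, otherwise queue the job until one frees up) can be sketched roughly like this. This is a Python sketch with invented names, not tcpbalance's actual Erlang code:

```python
from collections import deque

class Balancer:
    """Sketch of tcpbalance's policy: hosts listed fastest CPUs first,
    each with a max number of simultaneous jobs (usually 1 per CPU)."""

    def __init__(self, servers):
        # servers: list of (hostname, max_jobs) pairs, fastest first
        self.servers = servers
        self.active = {host: 0 for host, _ in servers}
        self.queue = deque()          # jobs waiting for a free slot

    def acquire(self, job):
        # Walk the list fastest-first; slower hand-me-down machines
        # are reached only when everything faster is saturated.
        for host, max_jobs in self.servers:
            if self.active[host] < max_jobs:
                self.active[host] += 1
                return host
        self.queue.append(job)        # all busy: wait for a release
        return None

    def release(self, host):
        # A job finished; hand the freed slot to the oldest waiter.
        self.active[host] -= 1
        if self.queue:
            job = self.queue.popleft()
            return job, self.acquire(job)
        return None
```

Because every walk starts at the head of the list, the slow machines see traffic only under heavy demand, which is exactly the behavior the round-robin HTTP proxies couldn't give.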
Although I'd left Caspian before tcpbalance could be deployed across the entire company[2], we did have several developers using the same tcpbalance proxy quite happily. Other tcpbalance features:

* Allowed sysadmins to change the up/down state of existing servers, as well as add & remove machines from the server list.
* On-the-fly code upgrades, so I could fix bugs without taking the proxy down even for 1 second.
* Crash-proof: if a bug caused the server to crash, the crashed thread(s) would automatically be restarted by a "supervisor" component. Tcpbalance is written in Erlang[3], which is a "concurrency-oriented programming language"[4], and such "micro-reboots" are extremely easy to do. If you've heard of the trend of Recovery-Oriented Computing[5], the concept is quite similar (but much less heavyweight than when done in other programming languages).

Things that tcpbalance does not do:

* Tcpbalance doesn't keep track of other activities on backend machines. It assumes that they're dedicated to distcc use. If they're used for other purposes (e.g. desktops, running other CPU-intensive jobs), then a status monitor would have to be added to change the server status based on that external activity. Not hard to do, but I didn't need to do it.

To be honest, I don't know why tcpbalance isn't used more often. It's probably a combination of factors:[6]

* I haven't done much to promote tcpbalance's existence.
* Building and installing the Erlang VM and its accompanying software packages (stuff like a complete CORBA development environment) is too much of a hassle: too much disk space for the installation (40-50MB), too much time, too steep a learning curve, too-verbose documentation (when I bothered to write any documentation at all), ...
* People were worried about performance: the proxy is a single point where all distcc traffic goes in and out. In reality, a single-CPU 800MHz proxy machine (100Mbit/sec Ethernet) handles 20+ simultaneous jobs without a problem, and that was good enough at the time.
* Not that many people need to solve this particular problem.

Dan's host randomization hack is quite cool for its simplicity. It solves the scheduling problem without any shared state. The solution isn't optimal, but it's good enough for some environments.

Discussion of solving the distributed scheduling problem has popped up again in the last few weeks. I've had an idea kicking around for a while to solve the biggest "problems" of the tcpbalance solution:

* Use a more "conventional" (otherwise known as "popular") language.
* Avoid the perceived performance "problem" of the man-in-the-middle proxy. (Such a solution will probably be more scalable: fewer worries about CPU utilization, network utilization, etc.)

So, this weekend, I decided to set aside a bit of time to cook up such a "better" solution. It's in Tcl (runs on UNIX, Windows, Mac, ...), it's quite simple (simpler than "icecream"[7], as far as I can tell), it has most of the features that tcpbalance does, and it wouldn't be very hard to add other features that other environments need (e.g. a status monitor on multi-use distcc servers: load average, screen saver status, whatever). It would require a small change to the distcc client in order to be effective. Attached is a very short Tcl script that demonstrates that it's possible to just change distcc's view of the DISTCC_HOSTS environment variable, and everything else can remain as-is.

The Tcl programs that I'm attaching to this message are:

1. balance-daemon.tcl. It listens to 3 TCP ports:
   * 5656: tells the distcc client which distcc server to send the job to. The client should keep the TCP connection open until the job is done: that's how the daemon knows that the client is done and the distcc server is idle & ready for another job.
   * 5657: sends back a summary of the daemon's state. It's moderately human-readable. It's also parsable by balance-status-daemon.tcl, which uses it to get the list of distcc servers.
   * 5658: used by balance-status-daemon.tcl to change the up/down state of distcc servers.

2. balance-status-daemon.tcl. Queries the balance-daemon on port 5657, then makes a simple TCP connection to each distcc server, then sends the status update to the balance-daemon on port 5658. (It was simpler Tcl code to use these separate TCP ports, no other reason.) Because it opens a lot of TCP connections to distccd and then immediately closes them, distccd will spew a lot of complaints to syslog; feel free to ignore them. It was easier to perform this status-checking function outside of the balance-daemon for Tcl event loop reasons (or rather my understanding of how to (ab)use the Tcl event loop). But it does set the precedent of using a simple little protocol to let outside agents change the state of the balance-daemon.

3. bal-wrap.tcl ... use "bal-wrap.tcl distcc gcc ...". It has hardcoded values for how to talk to the balance-daemon, and it has some other flaws, but it demonstrates what would need to change in the distcc client in order to use balance-daemon.

Distcc has been a wonderful, time-saving tool. I hope that these tools help other people solve their problems. If the techniques and/or source get folded into the distcc distribution, great. If not, c'est la vie. Enjoy!

-Scott

[1] Martin has mentioned Tcpbalance on the list a couple of times since my posting, IIRC. If you're interested, the source code is at http://www.snookles.com/erlang/tcpbalance/.

[2] Actually, it would be 1 proxy per build cluster, and each development office had its own build cluster (to avoid cross-continent distcc traffic ... although running distcc in California and using a tcpbalance proxy & distcc servers in Minnesota was almost as good).

[3] See http://www.erlang.org/.

[4] See http://www.sics.se/~joe/talks/ll2_2002.pdf for the slides that Joe Armstrong used for a presentation he made to the 2002 MIT Lightweight Languages Conference.
A recording of his presentation is available at http://ll2.ai.mit.edu/. One of his examples is an HTTP server written in Erlang that handles 80,000 concurrent HTTP requests on a modest single-CPU Linux box. Interestingly, at the same conference, Todd Proebsting of Microsoft Research (IIRC) said in his presentation ("Disruptive Programming Language Technologies") that the concurrency & isolation features of Erlang are extremely useful.

[5] Ask Mr. Google to look for the following terms & names: recovery-oriented computing, crash-only computing, micro-reboots, Armando Fox, George Candea. They use UNIX processes to keep a fault in one software component from affecting other components (e.g. running components in separate Java VMs). Shameless plug: Erlang's threading model provides very strict isolation between threads within the same VM, so separate UNIX processes aren't required. And Erlang has been providing ROC-like "process supervisor" hierarchies for over a decade.

[6] I guess I'm in an enumerative mood. :-)

[7] http://wiki.kde.org/tiki-index.php?page=icecream

--- balance-daemon.tcl ---

#!/bin/sh
# Trick to get next line ignored by tclsh \
exec /usr/bin/tclsh "$0" ${1+"$@"}

###
### Distcc load balancing daemon.
### Copyright (c) Scott Lystig Fritchie, 2004.
### All rights reserved.
###
### Permission is hereby granted, free of charge, to any person obtaining a
### copy of this software and associated documentation files (the "Software"),
### to deal in the Software without restriction, including without limitation
### the rights to use, copy, modify, merge, publish, distribute, sublicense,
### and/or sell copies of the Software, and to permit persons to whom the
### Software is furnished to do so, subject to the following conditions:
###
### The above copyright notice and this permission notice shall be included
### in all copies or substantial portions of the Software.
###
### THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
### IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
### FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
### THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
### OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
### ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
### OTHER DEALINGS IN THE SOFTWARE.
###

###
### Global variables
###

# The TCP port number that we listen to for balancing queries
set BalanceTcpPort 5656
# The TCP port number that we listen to for status queries
set StatusTcpPort 5657
# The TCP port number that we listen to for distcc server status updates.
set StatusUpdateTcpPort 5658

# The distcc server list.  The inside lists are pairs of:
# 1. "hostname/IP address : distcc server TCP port number", just like
#    the DISTCC_HOSTS syntax
# 2. The maximum number of simultaneous distcc jobs (usually 1 job
#    per CPU).
#
# The hosts should be listed in order of fastest to slowest CPUs.
# We assume all CPUs on a host are the same speed.
set ServerList [list \
    [list "snookles:3632" 1] \
    [list "newbb:3632" 1] \
    [list "10.1.1.143" 1] \
]

set IdleServers 0; # Will be set correctly later
set DownServers 0

# This is the waiting queue of sockets waiting for an idle server.
# Initialize to be an empty list.
set WaitQueue [list]
set WaitQueueLen 0

# ServerStatus is an array that will be initialized later.
# ServerUpState is an array that will be initialized later.
# ClientStatus is an array that will be initialized later.
# Verbose debugging output
set Verbose 1

###
### Procs
###

proc error_msg {msg} {
    puts stderr $msg
}

proc verbose {msg} {
    global Verbose
    if {$Verbose} { puts $msg }
}

proc add_to_wait_queue {tuple} {
    global WaitQueue WaitQueueLen
    lappend WaitQueue $tuple
    incr WaitQueueLen
}

proc remove_from_wait_queue {} {
    global WaitQueue WaitQueueLen
    if {$WaitQueueLen == 0} {
        return ""
    } else {
        incr WaitQueueLen -1
        set tuple [lindex $WaitQueue 0]
        set WaitQueue [lreplace $WaitQueue 0 0]
        return $tuple
    }
}

proc get_server_tuple {server_info_wanted} {
    global ServerList
    foreach pair $ServerList {
        set server_info [lindex $pair 0]
        if {$server_info == $server_info_wanted} {
            return $pair
        }
    }
    return ""
}

proc get_idle_server {} {
    global ServerList IdleServers DownServers ServerStatus
    global ServerUpState
    if {$IdleServers - $DownServers <= 0} {
        return ""
    }
    foreach pair $ServerList {
        set server_info [lindex $pair 0]
        if {$ServerStatus($server_info) > 0 && \
                $ServerUpState($server_info) == "up"} {
            incr ServerStatus($server_info) -1
            incr IdleServers -1
            return $server_info
        }
    }
    error "get_idle_server: should never happen"
}

proc put_idle_server {server_info} {
    global ServerList IdleServers ServerStatus
    verbose "put_idle_server: $server_info is now idle"
    incr ServerStatus($server_info)
    incr IdleServers
}

proc save_client_state {client_s client_addr client_port server_info} {
    global ClientStatus
    set ClientStatus($client_s) [list $client_addr $client_port $server_info]
}

proc sock_readable {sock} {
    global ClientStatus
    if {[eof $sock]} {
        set client_addr [lindex $ClientStatus($sock) 0]
        set client_port [lindex $ClientStatus($sock) 1]
        set server_info [lindex $ClientStatus($sock) 2]
        verbose "Sock $sock has closed: client_addr = $client_addr, client_port = $client_port, server_info = $server_info"
        unset ClientStatus($sock)
        put_idle_server $server_info
        catch {
            fileevent $sock readable ""
            close $sock
        }
        # Main loop only runs if no other clients are connected.
        # So, we usually need to trigger a new dispatch here.
        dispatch_waiting_clients
    } else {
        while {[read $sock 1024] != ""} {
            verbose "Sock $sock sent us something..."
        }
    }
}

proc handle_new_client {client_s client_addr client_port} {
    verbose "client_s = $client_s, client_addr = $client_addr, client_port = $client_port"
    set server_info [get_idle_server]
    if {$server_info == ""} {
        verbose "No idle servers, append to wait queue"
        add_to_wait_queue [list $client_s $client_addr $client_port]
        return
    }
    save_client_state $client_s $client_addr $client_port $server_info
    catch {
        puts $client_s $server_info
        flush $client_s
    }
    fconfigure $client_s -blocking 0 -buffering none -translation binary
    fileevent $client_s readable [list sock_readable $client_s]
    verbose "handle_new_client: end"
}

proc dispatch_waiting_clients {} {
    global IdleServers DownServers
    verbose "dispatch_waiting_clients: top"
    while {$IdleServers - $DownServers > 0} {
        verbose "dispatch_waiting_clients: top of loop"
        set last_dispatched 0
        set tuple [remove_from_wait_queue]
        if {$tuple == ""} { break }
        set client_s [lindex $tuple 0]
        set client_addr [lindex $tuple 1]
        set client_port [lindex $tuple 2]
        verbose "dispatch_waiting_client: client_s = $client_s"
        handle_new_client $client_s $client_addr $client_port
    }
    verbose "dispatch_waiting_clients: end"
}

proc handle_new_status_client {client_s client_addr client_port} {
    global BalanceTcpPort StatusTcpPort
    global ServerList IdleServers DownServers
    global WaitQueue WaitQueueLen
    global ServerStatus ClientStatus
    global ServerUpState
    catch {
        puts $client_s "BalanceTcpPort = $BalanceTcpPort, StatusTcpPort = $StatusTcpPort"
        puts $client_s "ServerList = $ServerList"
        puts $client_s "IdleServers = $IdleServers"
        puts $client_s "DownServers = $DownServers"
        puts $client_s "WaitQueue = $WaitQueue"
        puts $client_s "WaitQueueLen = $WaitQueueLen"
        puts $client_s "ServerStatus (available distcc servers) = [array get ServerStatus]"
        puts $client_s "ServerUpState (distcc server up/down status) = \
                [array get ServerUpState]"
        puts $client_s "ClientStatus (who's connected to what) = [array get ClientStatus]"
        flush $client_s
        close $client_s
    }
}

proc handle_new_statusupdate_client {client_s client_addr client_port} {
    global ServerUpState
    global DownServers
    set line [gets $client_s]
    set l [split $line]
    set server_info [lindex $l 0]
    set status [lindex $l 1]
    if {$ServerUpState($server_info) == "down" && $status == "up"} {
        set tuple [get_server_tuple $server_info]
        if {$tuple != ""} {
            verbose "State change: $server_info was down, now up"
            set ServerUpState($server_info) "up"
            set max [lindex $tuple 1]
            incr DownServers "-$max"
            dispatch_waiting_clients
        }
    }
    if {$ServerUpState($server_info) == "up" && $status == "down"} {
        set tuple [get_server_tuple $server_info]
        if {$tuple != ""} {
            verbose "State change: $server_info was up, now down"
            set ServerUpState($server_info) "down"
            set max [lindex $tuple 1]
            incr DownServers $max
        }
    }
    close $client_s
}

###
### Argument handling and startup tasks
###

if {[catch {set Balance_s [socket -server handle_new_client $BalanceTcpPort]} cv]} {
    error_msg "Fatal error initializing TCP port $BalanceTcpPort: $cv"
    exit 1
}
if {[catch {set Status_s [socket -server handle_new_status_client $StatusTcpPort]} cv]} {
    error_msg "Fatal error initializing TCP port $StatusTcpPort: $cv"
    exit 1
}
if {[catch {set Update_s [socket -server handle_new_statusupdate_client $StatusUpdateTcpPort]} cv]} {
    error_msg "Fatal error initializing TCP port $StatusUpdateTcpPort: $cv"
    exit 1
}

###
### Initialize the ServerStatus array.
###
set sum 0
foreach pair $ServerList {
    set server_info [lindex $pair 0]
    set max [lindex $pair 1]
    set ServerStatus($server_info) $max
    set ServerUpState($server_info) up ;# External source gives host down info
    incr sum $max
}
set IdleServers $sum

###
### Start running the Tcl event loop
###

vwait forever
error "NOT REACHED"

--- balance-status-daemon.tcl ---

#!/bin/sh
# Trick to get next line ignored by tclsh \
exec /usr/bin/tclsh "$0" ${1+"$@"}

###
### Distcc load balancing daemon.
### Copyright (c) Scott Lystig Fritchie, 2004.
### All rights reserved.
###
### Permission is hereby granted, free of charge, to any person obtaining a
### copy of this software and associated documentation files (the "Software"),
### to deal in the Software without restriction, including without limitation
### the rights to use, copy, modify, merge, publish, distribute, sublicense,
### and/or sell copies of the Software, and to permit persons to whom the
### Software is furnished to do so, subject to the following conditions:
###
### The above copyright notice and this permission notice shall be included
### in all copies or substantial portions of the Software.
###
### THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
### IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
### FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
### THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
### OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
### ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
### OTHER DEALINGS IN THE SOFTWARE.
###

set LbHost [lindex $argv 0]
set LbStatusPort [lindex $argv 1]
set LbStatusUpdatePort [lindex $argv 2]
set TestInterval 10
set Verbose 1

proc verbose {msg} {
    global Verbose
    if {$Verbose} { puts $msg }
}

while {1} {
    if {[catch {
        set sock [socket $LbHost $LbStatusPort]
        set stuff [read $sock]
        close $sock
        regexp {ServerList = ([^\n]*)\n} $stuff dummy server_list
        foreach server $server_list {
            set server_info [lindex $server 0]
            set si [split $server_info ":"]
            if {[lindex $si 1] == ""} {
                set port 3632
            } else {
                set port [lindex $si 1]
            }
            if {[catch {
                set sock [socket [lindex $si 0] $port]
                close $sock
            }]} {
                verbose "$server_info is down"
                set status down
            } else {
                verbose "$server_info is up"
                set status up
            }
            set sock [socket $LbHost $LbStatusUpdatePort]
            puts $sock "$server_info $status"
            close $sock
        }
    } cv]} {
        puts stderr "Catch: $cv"
    }
    after [expr {$TestInterval * 1000}]
}

--- bal-wrap.tcl ---

#!/bin/sh
# Trick to get next line ignored by tclsh \
exec /usr/bin/tclsh "$0" ${1+"$@"}

set LbHost localhost
set LbPort 5656

#
# There are three things that we don't do well:
#
# 1. We mix the compiler's stderr into stdout.
# 2. We don't preserve the exit status of the compiler.
# 3. We assume that the load balancing service is always available.
#
# However, this is meant to be a demo only, so I
# (at least) can live with the limitations.
#

set sock [socket $LbHost $LbPort]
set server_info [gets $sock]
set env(DISTCC_HOSTS) $server_info
set cmd $argv
catch {set out [eval "exec $cmd 2>@ stdout"] ; puts $out}
catch {close $sock}
exit 0

__
distcc mailing list
http://distcc.samba.org/
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/distcc
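As a postscript: the client side of the port-5656 protocol that bal-wrap.tcl implements (read one line naming a distcc server, use it while the job runs, and let the daemon detect completion when the socket closes) can be sketched in Python. The function and class names here, and the stand-in daemon, are my own illustration, not part of the attached scripts:

```python
import socket
import socketserver
import threading

def run_with_balancer(host, port, job):
    """Ask the balance daemon for a server, run job(server_info), and
    keep the TCP connection open until the job finishes: closing the
    socket is what tells the daemon the server is idle again."""
    sock = socket.create_connection((host, port))
    try:
        server_info = sock.makefile().readline().strip()
        return job(server_info)   # e.g. run distcc with DISTCC_HOSTS set
    finally:
        sock.close()              # daemon sees EOF -> server freed

# A stand-in daemon (illustration only) that always hands out one
# server, mimicking the one-line reply of balance-daemon.tcl.
class FakeDaemon(socketserver.ThreadingTCPServer):
    allow_reuse_address = True

class Handler(socketserver.StreamRequestHandler):
    def handle(self):
        self.wfile.write(b"snookles:3632\n")
        self.rfile.read()         # block until the client closes
```

A real wrapper would set `os.environ["DISTCC_HOSTS"] = server_info` inside `job` and exec the compiler there, just as bal-wrap.tcl does.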