It seems that if there's nowhere to execute the job, we want the client program to simply pause, before consuming too many resources, until it is dequeued by a server ready to do the job (or until a local slot becomes available).
On Thu Oct 16 2014 at 2:43:35 AM Łukasz Tasz <luk...@tasz.eu> wrote:
> Hi Martin,
>
> Let's assume that you can trigger more compilation tasks than you have
> executors. In this scenario you are facing a situation where the cluster
> is saturated. When such a compilation is triggered by two developers, or
> two CI (e.g. Jenkins) jobs, then the cluster is saturated twice over...
>
> The default behaviour is to lock a slot locally and try to connect three
> times; if that fails, fall back, and if fallback is disabled the CI gets
> a failed build (fallback is not an option, since the local machine cannot
> handle -j $(distcc -j)).
>
> Consider this scenario: I have 1000 objects and 500 executors.
> - a clean build on one machine takes
>   1000 * 20 sec (one object) = 20000 sec / 16 processors = 1250 sec,
> - on the cluster: (1000/500) * 20 sec = 40 sec.
>
> Saturating the cluster was impossible without pump mode, but now, with
> pump mode after the "warm up" effect, pump can dispatch many tasks, and I
> have faced situations where the saturated cluster destroys almost every
> compilation.
>
> My expectation is that the cluster won't reject my connect, or that the
> rejection will be handled, either by the client or by the server.
>
> By the server:
> - accept every connection,
> - fork a child if the connection was not accepted by a child,
> - in the case of pump mode, prepare the local dir structure and receive
>   the headers,
> - --critical section starts here-- multi-value semaphore with value
>   maxchild,
> - execute the job,
> - release the semaphore.
>
> Also, what you suggested may be an even better solution, since the
> client will pick the first available executor instead of entering a
> queue, so distcc could make the connection already in dcc_lock_one().
>
> I already tried to set DISTCC_DIR on a common NFS share, but when you
> are triggering so many jobs, this started to be a bottleneck...
> I won't even get into locking on NFS, or the scenario where somebody
> takes a lock on NFS and the machine then crashes - that will not work
> by design :)
>
> I know that this scenario does not happen very often, and that it has a
> more or less peaky characteristic, but we should be happy that the
> distcc cluster is saturated, and this case should be handled.
>
> Hope it's more clear now!
> br
> LT
>
> Łukasz Tasz
>
>
> 2014-10-16 1:39 GMT+02:00 Martin Pool <m...@sourcefrog.net>:
> > Can you try to explain more clearly what difference in queueing
> > behaviour you expect from this change?
> >
> > I think probably the main change that's needed is for the client to
> > ask all masters if they have space, to avoid needing to effectively
> > poll by retrying, or getting stuck waiting for a particular server.
> >
> > On Wed, Oct 15, 2014 at 12:53 PM, Łukasz Tasz <luk...@tasz.eu> wrote:
> >> Hi Guys,
> >>
> >> Please correct me if I'm wrong:
> >> - currently distcc tries to connect to the server 3 times, with a
> >>   small delay,
> >> - the server forks x children and all of them try to accept the
> >>   incoming connection.
> >> If the server runs out of children (all of them are busy), the client
> >> will fall back, and within the next 60 sec will not try this machine.
> >>
> >> What do you think about redesigning distcc so that the master server
> >> always accepts the incoming connection and forks a child, but at the
> >> same time only x of them can enter the compilation task
> >> (dcc_spawn_child)? (Maybe preforking could still be used?)
> >>
> >> This would create a kind of queue; the client can always decide on
> >> its own how long it can wait (the maximum is DISTCC_IO_TIMEOUT), but
> >> it's still faster to wait, since on the cluster side it's probably
> >> just a peak of saturation, rather than falling back to the local
> >> machine.
> >>
> >> Currently I'm facing a situation where many jobs fall back, and the
> >> local machine is being killed by make's -j calculated for distccd...
> >>
> >> Another trick may be to pick a different machine if the current one
> >> is busy, but this may be much more complex in my opinion.
> >>
> >> What do you think?
> >> regards
> >> Łukasz Tasz
> >> __
> >> distcc mailing list http://distcc.samba.org/
> >> To unsubscribe or change options:
> >> https://lists.samba.org/mailman/listinfo/distcc
> >
> > --
> > Martin