You seem to have missed the configuration variable "spreadOutApps"
and its comment:

// As a temporary workaround before better ways of configuring memory, we allow users to set
// a flag that will perform round-robin scheduling across the nodes (spreading out each app
// among all the nodes) instead of trying to consolidate each app onto a small # of nodes.

2015-03-14 10:41 GMT+08:00 bit1...@163.com <bit1...@163.com>:

> Hi, sparkers,
> When I read the code that computes resource allocation for a newly
> submitted application in the Master#schedule method, I got a question
> about data locality:
>
> // Pack each app into as few nodes as possible until we've assigned all its cores
> for (worker <- workers if worker.coresFree > 0 && worker.state == WorkerState.ALIVE) {
>   for (app <- waitingApps if app.coresLeft > 0) {
>     if (canUse(app, worker)) {
>       val coresToUse = math.min(worker.coresFree, app.coresLeft)
>       if (coresToUse > 0) {
>         val exec = app.addExecutor(worker, coresToUse)
>         launchExecutor(worker, exec)
>         app.state = ApplicationState.RUNNING
>       }
>     }
>   }
> }
>
> It looks like the resource allocation policy here is that the Master will
> assign as few workers as possible, as long as those few workers have
> enough resources for the application.
> My question is: assuming the data the application will process is spread
> across all the worker nodes, is data locality lost under the above policy?
> I am not sure whether I have understood this correctly or have missed something.
>
> ------------------------------
> bit1...@163.com

-- 王海华
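To make the difference concrete, here is a minimal, self-contained sketch of the two policies. This is not Spark's actual Master code: the `Worker` case class, the `allocate` helper, and its parameters are made up for illustration. With `spreadOut = true` cores are granted round-robin, one per worker per pass (spreading the app across nodes, which helps locality when data is spread out); with `spreadOut = false` each worker is drained before moving to the next (consolidation, as in the quoted loop).

```scala
// Hypothetical model of a worker with some free cores (not Spark's WorkerInfo).
case class Worker(id: String, var coresFree: Int)

// Sketch of the two allocation policies toggled by spreadOutApps.
// Returns a map from worker id to the number of cores assigned to the app.
def allocate(workers: Seq[Worker], coresWanted: Int, spreadOut: Boolean): Map[String, Int] = {
  var left = coresWanted
  val assigned = scala.collection.mutable.Map[String, Int]().withDefaultValue(0)
  if (spreadOut) {
    // Round-robin: keep cycling over workers, granting one core per pass,
    // so the app ends up spread over as many nodes as possible.
    while (left > 0 && workers.exists(_.coresFree > 0)) {
      for (w <- workers if left > 0 && w.coresFree > 0) {
        w.coresFree -= 1
        assigned(w.id) += 1
        left -= 1
      }
    }
  } else {
    // Consolidate: take as many cores as each worker offers before moving on,
    // packing the app onto as few nodes as possible.
    for (w <- workers if left > 0 && w.coresFree > 0) {
      val take = math.min(w.coresFree, left)
      w.coresFree -= take
      assigned(w.id) += take
      left -= take
    }
  }
  assigned.toMap
}
```

For example, with three workers offering 4 free cores each and an app wanting 6 cores, spreading out yields 2 cores on each of the three workers, while consolidating yields 4 cores on the first worker and 2 on the second, leaving the third untouched.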