+1!
Tommaso

2010/12/2 Filipe David Manana <[email protected]>
> Hi all,
>
> While thinking about how to implement HAMA-298 (killing a job):
>
> - Each task of a user's submitted job is run in a thread
> (BSPTaskRunner) spawned by a GroomServer. Stopping a thread is not
> trivial, and it requires the thread to cooperate. Typically one sets
> a volatile boolean flag, which the thread periodically checks; the
> thread *should* then stop once the flag's value has changed. This
> requires implementers of the class BSP to check this boolean, and
> possibly requires adding the same sort of behaviour to BSPPeer.
>
> - If a task is simply blocked, for whatever reason, it will never stop.
>
> - If we assume users' job code will do this check, we're being too
> optimistic.
>
> Hadoop spawns a new JVM for each task, and the child JVM communicates
> with the parent (TaskTracker) through the TaskUmbilicalProtocol. This
> makes it easier to stop a task (simply kill the child process),
> doesn't require users' job code to periodically check some special
> boolean and, equally important, allows isolation of errors.
>
> I think we should implement the same idea: not only would it help us
> implement a better job-kill solution, it would also be a big +1 for
> when we support multiple running tasks per GroomServer.
>
> If everyone agrees with this, I'll start working on it (it will
> probably involve adding a lot of new code).
>
> regards,
>
> --
> Filipe David Manana,
> [email protected], [email protected]
>
> "Reasonable men adapt themselves to the world.
> Unreasonable men adapt the world to themselves.
> That's why all progress depends on unreasonable men."
>
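For anyone less familiar with the cooperative-stop pattern from the first bullet, here is a minimal Java sketch of why it falls short. The class and method names (CooperativeStopSketch, CooperativeTask, requestStop, etc.) are hypothetical and are not Hama's BSP, BSPPeer or BSPTaskRunner API. A task that polls a volatile flag stops promptly; a task stuck in a blocking call never reaches its check. ReentrantLock.lock() is used here only as an assumed stand-in for "simply blocked", because it ignores both the flag and Thread.interrupt().

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

final class CooperativeStopSketch {

    /** Hypothetical task: it stops only because it polls the flag itself. */
    static final class CooperativeTask implements Runnable {
        // volatile: a write by the stopping thread becomes visible to the task thread
        private volatile boolean stopRequested = false;

        void requestStop() {
            stopRequested = true;
        }

        @Override
        public void run() {
            while (!stopRequested) {
                // ... one unit of user work would go here ...
                Thread.yield();
            }
        }
    }

    /** Returns true if the polling task exited within timeoutMillis of the stop request. */
    static boolean cooperativeTaskStops(long timeoutMillis) throws InterruptedException {
        CooperativeTask task = new CooperativeTask();
        Thread worker = new Thread(task, "cooperative-task");
        worker.start();
        task.requestStop();
        worker.join(timeoutMillis);
        return !worker.isAlive();
    }

    /**
     * Returns true if a task blocked in ReentrantLock.lock() is still alive after the
     * stop flag is set and the thread is interrupted (lock() is not interruptible).
     */
    static boolean blockedTaskIgnoresStop(long timeoutMillis) throws InterruptedException {
        ReentrantLock lock = new ReentrantLock();
        AtomicBoolean stopRequested = new AtomicBoolean(false);
        lock.lock(); // this thread holds the lock, so the worker will block on it
        Thread worker = new Thread(() -> {
            lock.lock(); // blocks here; the flag check below is unreachable while blocked
            try {
                if (stopRequested.get()) {
                    return; // the task would stop here, but only once it is unblocked
                }
            } finally {
                lock.unlock();
            }
        }, "blocked-task");
        worker.start();
        stopRequested.set(true);
        worker.interrupt(); // has no effect on a thread waiting in lock()
        worker.join(timeoutMillis);
        boolean stillAlive = worker.isAlive();
        lock.unlock(); // clean up so the demo thread can terminate
        worker.join();
        return stillAlive;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("cooperative task stopped: " + cooperativeTaskStops(1000));
        System.out.println("blocked task still alive: " + blockedTaskIgnoresStop(200));
    }
}
```

With a child JVM per task, as proposed above, the parent would instead hold a java.lang.Process handle and could call destroy() or destroyForcibly() on it, so the task would stop whether or not its code ever checks a flag.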
