Hi all,

While thinking about how to implement HAMA-298 (killing a job), I ran into the following:

- Each task of a user's submitted job runs in a thread (BSPTaskRunner) spawned by a GroomServer. Stopping a thread is not trivial, and it requires the thread to cooperate. Typically one sets a volatile boolean flag, and the thread periodically checks it and *should* stop once its value changes. This requires implementers of the BSP class to check this boolean, and possibly adding the same sort of behaviour to BSPPeer.
- If a task is simply blocked, for whatever reason, it will never stop.
- If we assume users' job code will do this check, we're being too optimistic.

Hadoop spawns a new JVM for each task, and the child JVM communicates with the parent (TaskTracker) through the TaskUmbilicalProtocol. This makes it easier to stop a task (simply kill the child process), doesn't require users' job code to periodically check some special boolean and, equally important, allows isolation of errors.

I think we should implement the same idea. Not only does it help implement a better job kill solution, it will also be a big +1 for when we support multiple running tasks per GroomServer.

If everyone agrees with this, I'll start working on it (it will possibly mean adding a lot of new code).

regards,

-- 
Filipe David Manana,
[email protected], [email protected]

"Reasonable men adapt themselves to the world.
Unreasonable men adapt the world to themselves.
That's why all progress depends on unreasonable men."
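P.S. To make the difference concrete, here is a rough, self-contained sketch of the two approaches. It is not Hama or Hadoop code: KillSketch, FlagTask, killThread and killChildProcess are hypothetical names made up for illustration, a CountDownLatch stands in for "blocked for whatever reason", and the child process is just a `sleep` command (so it assumes a Unix-like machine) rather than a real child JVM talking over something like TaskUmbilicalProtocol.

```java
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class KillSketch {

    /** Hypothetical task body; stands in for user code run by a task thread. */
    static class FlagTask implements Runnable {
        private volatile boolean killed = false;
        private final CountDownLatch blocker; // null means "never block"

        FlagTask(CountDownLatch blocker) { this.blocker = blocker; }

        void kill() { killed = true; }

        @Override
        public void run() {
            try {
                if (blocker != null) {
                    blocker.await(); // parked here, so the flag is not read while blocked
                }
                while (!killed) {
                    Thread.yield(); // cooperative code: poll the flag between units of work
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Sets the kill flag and reports whether the thread stopped within waitMillis. */
    static boolean killThread(boolean blocked, long waitMillis) {
        CountDownLatch blocker = blocked ? new CountDownLatch(1) : null;
        FlagTask task = new FlagTask(blocker);
        Thread t = new Thread(task);
        t.setDaemon(true); // so a stuck thread cannot keep this demo JVM alive
        t.start();
        task.kill();
        try {
            t.join(waitMillis);
            return !t.isAlive();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            if (blocker != null) blocker.countDown(); // release the thread after measuring
        }
    }

    /** Kills a child process from the parent; no cooperation from the child is needed. */
    static boolean killChildProcess() {
        try {
            // Assumption: a Unix-like system with a `sleep` binary on the PATH.
            Process child = new ProcessBuilder("sleep", "60").start();
            child.destroyForcibly();
            return child.waitFor(5, TimeUnit.SECONDS); // true if the child exited in time
        } catch (IOException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("cooperative thread stopped: " + killThread(false, 2000));
        System.out.println("blocked thread stopped:     " + killThread(true, 300));
        System.out.println("child process killed:       " + killChildProcess());
    }
}
```

On a Unix-like machine I would expect the cooperative thread and the child process to be reported as stopped, and the blocked thread not, which is the asymmetry described above: the flag only helps when the task code gets around to reading it, while killing a process needs nothing from the task.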
