Hi folks,

Branch TEZ-2003 has made a fair amount of progress over the last several months. This would be a good time to get started with merging the branch back to master. It's been running stable on a multi-node cluster for a while now. 0.8 is just getting started, so merging this in now would allow for additional testing. Also, the merge/rebase is starting to get fairly painful.

Committers/interested individuals, could you please start reviewing the changes and providing feedback? I'd like to target a merge vote by the end of the month or early next month, if that's possible.

Here's the core list of what has (and has not) changed in the branch, with some details which should help with reviews.
- The Task Communicator plane has been made pluggable. TaskAttemptListenerImp(l)TezDag has been split into a controller and TaskCommunicators. Most of the logic for handling task heartbeats, etc. has moved into TezTaskCommunicator.
- Custom ContainerLaunchers allowed.
- Custom TaskSchedulers allowed.
- Task execution in the runtime re-written to be less prone to errors and races (TezTaskRunner2 and related classes).
- Runtime modified to preempt and interrupt running tasks.
- New event type TA_KILLED, for situations where the runtime decides to kill a task.
- Node tracking and blacklisting is per scheduler source.
- Test framework added to test external services.
- The core state machines and DAG co-ordination remain mostly unchanged (except for the TA_KILLED support).

Features which exist in the branch, as a side effect of the changes:
- Support for uber mode (running tasks within the AM).
- Partial support for preemptable tasks instead of containers (this exists only in the runtime; additional changes are required in the runtime and AM to make this production ready).

Pending work items, which are going to get added over the next few days:
- API modifications to make them consistent with the way Tez defines APIs (Plugin / PluginContext).
- API simplification for specification of which plugins run in an AM, and how vertices use these (this is currently conf based).
- Additional unit tests.

I expect all the APIs to evolve as we move forward and learn from usage. To that extent, these changes will be Unstable and can change even across minor releases.

Thanks
- Sid