Hi folks, I'm going to share the current status of Tajo's DAG framework. Then, I'd like to discuss refactoring the Yarn-related code and Tajo's DAG framework. I'm looking forward to your advice and ideas. After this discussion, I hope we can create some concrete Jira issues.
= Current status of the DAG framework =
* The DAG framework consists of two parts: a representation part and a control part.
* MasterPlan and ExecutionBlock belong to the representation part.
* Query and SubQuery belong to the control part.
* Query is a finite state machine that controls a DAG of ExecutionBlocks.
* SubQuery is a finite state machine that controls an ExecutionBlock.

= More detailed description =
* A distributed execution plan (MasterPlan.java) is a directed acyclic graph, where each vertex is an ExecutionBlock and each edge represents a data channel.
* An ExecutionBlock is a logical execution unit that can be distributed across nodes.
** It is similar to a map or reduce phase in the MapReduce framework.
** An ExecutionBlock includes a logical plan to be transformed into a physical execution plan that runs on each machine.
** A data channel indicates a pull-based data transmission by default and carries one of several repartition types, such as range, hash, and list.
* Query internally uses a FIFO scheduler (ExecutionBlockCursor) for a DAG of ExecutionBlocks.
** Each call of ExecutionBlockCursor::nextBlock() retrieves the next ExecutionBlock to be executed, in postfix order. This preserves the dependencies among ExecutionBlocks.
* For each ExecutionBlock, a SubQuery launches containers and then reuses them for all tasks of that SubQuery. After all tasks are completed, the SubQuery kills all containers by invoking ContainerManager.stopContainer().

= Discussions =
* The FIFO scheduler is inefficient. Even when resources are available in the cluster, it executes only one ExecutionBlock at a time. We need a better scheduler.
* For each ExecutionBlock, a SubQuery requests containers from Yarn's ResourceManager. However, I haven't found a good way to determine the number of containers and the proper resources for each container.
* We need a Local Mode (TAJO-45) for the Tajo cluster. However, it looks somewhat complicated because many parts are tied to Yarn.
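To make the postfix-order point concrete, here is a minimal, self-contained Java sketch of how a cursor could walk a DAG of execution blocks so that every block is yielded only after all the blocks it depends on. The `Block` class and `PostOrderCursor` are illustrative stand-ins, not Tajo's actual MasterPlan/ExecutionBlockCursor classes.

```java
import java.util.*;

// Hypothetical simplified stand-in for an ExecutionBlock; not a Tajo class.
class Block {
    final String name;
    final List<Block> children = new ArrayList<>(); // upstream blocks this one depends on

    Block(String name) { this.name = name; }
}

public class PostOrderCursor {
    // Collect blocks in postfix (post-order), so each block appears
    // after every block it depends on -- the property a FIFO cursor
    // needs to preserve ExecutionBlock dependencies.
    static List<Block> postOrder(Block root) {
        List<Block> out = new ArrayList<>();
        visit(root, new HashSet<>(), out);
        return out;
    }

    private static void visit(Block b, Set<Block> visited, List<Block> out) {
        if (!visited.add(b)) return;        // skip blocks already scheduled
        for (Block child : b.children) {
            visit(child, visited, out);     // dependencies first
        }
        out.add(b);                         // then the block itself
    }

    public static void main(String[] args) {
        // terminal <- join <- {scanA, scanB}
        Block scanA = new Block("scanA");
        Block scanB = new Block("scanB");
        Block join = new Block("join");
        join.children.add(scanA);
        join.children.add(scanB);
        Block terminal = new Block("terminal");
        terminal.children.add(join);

        for (Block b : postOrder(terminal)) {
            System.out.println(b.name);
        }
        // prints scanA, scanB, join, terminal -- each block after its dependencies
    }
}
```

Note that this traversal still yields one linear order, which is exactly the FIFO limitation discussed above: independent blocks like scanA and scanB could in principle run concurrently, but a single cursor serializes them.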
How about refactoring all parts to be independent of Yarn?
* In the current implementation, Tajo uses Yarn as a cluster resource manager and launches containers when a query is issued. However, this approach is very slow: the initialization cost (allocating and launching containers) takes at least 3-5 seconds even on 32 cluster nodes in my experience. How about considering a standby mode?
** Standby mode means that a number of TaskRunners are kept on standby according to the user's request.

Best regards,
Hyunsik Choi
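The standby idea above can be sketched with a plain Java thread pool: workers are launched once, up front, and queries then reuse them, paying no per-query container-launch latency. Everything here (`StandbyPool`, the task-string API) is an illustrative assumption, not a proposal for Tajo's actual TaskRunner interface.

```java
import java.util.concurrent.*;

// Minimal sketch of "standby mode": a fixed pool of pre-launched
// workers (standing in for TaskRunners) consumes submitted tasks,
// instead of allocating and launching a new container through
// Yarn's RM for every query. All names are hypothetical.
public class StandbyPool {
    private final ExecutorService workers;

    // Launch 'size' standby workers up front, e.g. at daemon start,
    // so the launch cost is paid once rather than per query.
    StandbyPool(int size) {
        workers = Executors.newFixedThreadPool(size);
    }

    // Submitting a task reuses an already-running worker.
    Future<String> submit(String taskId) {
        return workers.submit(() -> "done:" + taskId);
    }

    // Convenience wrapper that blocks for the result; checked
    // exceptions are rewrapped so callers stay simple.
    String run(String taskId) {
        try {
            return submit(taskId).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }

    void shutdown() { workers.shutdown(); }

    public static void main(String[] args) {
        StandbyPool pool = new StandbyPool(4);
        System.out.println(pool.run("t1")); // prints done:t1
        pool.shutdown();
    }
}
```

The trade-off is the usual one for standby pools: idle workers hold cluster resources even when no query is running, which is why the number of standby TaskRunners would need to come from the user's request, as suggested above.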
