[ https://issues.apache.org/jira/browse/TAJO-540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890328#comment-13890328 ]

Min Zhou commented on TAJO-540:
-------------------------------

Continuing from my previous two comments: Sparrow improves on the "power of two
choices" algorithm in two respects: 1) assigning tasks by queue length cannot
accurately estimate the real time cost of a task, and 2) it suffers from the
concurrent scheduling problem, where several schedulers place tasks at the same
time. See the Sparrow paper for the details.
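
The sketch below contrasts the two placement rules on a toy model to
illustrate the first issue. It is a simplified illustration, not Sparrow's or
Tajo's code: the worker names, the "list of queued task durations" model, and
the function names are hypothetical, and Sparrow's batch sampling and late
binding (reservations plus worker-initiated task requests) are reduced to
sorting the probed workers by when they would become free.

```python
import random
from typing import Dict, List

# Hypothetical model: each worker maps to the durations (in seconds) of the
# tasks already queued on it. Queue length = len(...), real wait = sum(...).
Queues = Dict[str, List[float]]


def power_of_two_choices(queues: Queues, rng: random.Random) -> str:
    """Probe two distinct random workers and place the task on the one with
    the shorter queue (queue length is the only load signal)."""
    a, b = rng.sample(sorted(queues), 2)
    return a if len(queues[a]) <= len(queues[b]) else b


def batch_sampling_late_binding(queues: Queues, num_tasks: int, d: int,
                                rng: random.Random) -> List[str]:
    """Simplified Sparrow-style placement: probe d * num_tasks workers for the
    whole job and leave a reservation on each. A worker asks for a task only
    when its queue drains, so tasks go to the probed workers that become free
    first (lowest real wait), not to the ones with the shortest queue."""
    probed = rng.sample(sorted(queues), min(d * num_tasks, len(queues)))
    free_order = sorted(probed, key=lambda w: sum(queues[w]))
    # The first num_tasks requests receive tasks; the remaining reservations
    # would be cancelled.
    return free_order[:num_tasks]


if __name__ == "__main__":
    queues = {"w1": [10.0], "w2": [0.5, 0.5]}
    # w1 has the shorter queue but the longer real wait (10s vs 1s).
    print(power_of_two_choices(queues, random.Random(0)))  # w1
    print(batch_sampling_late_binding(queues, 1, 2, random.Random(0)))  # ['w2']
```

With two workers both rules probe everyone, so the difference comes only from
the load signal: queue length picks w1, while waiting for the first worker to
become free picks w2.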

As I mentioned, if we want to use a low-latency scheduler in an interactive or
real-time system, we need to radically change the current design of Tajo's
scheduling.

First, the way we use YARN is quite different from Spark and Impala. In Tajo,
resource requests are issued by Tajo workers, one container per task/queryunit
attempt. Spark and Impala, in contrast, use YARN only as a higher-layer
scheduler for resource management, and use a Sparrow(-like) scheduler as their
own internal, lower-layer scheduler to achieve low latency. YARN allocates the
resources for a whole Spark/Impala cluster, not for individual tasks.

For example, suppose a Spark cluster has 1 master and 10 slaves, where the
master needs 10GB of memory and each slave needs 20GB. YARN allocates a 10GB
container for the master daemon and a 20GB container for each slave daemon.
Because those daemons are long-lived processes, the resources stay occupied by
the Spark cluster for a long time. YARN revokes them only when a slave is
decommissioned from the cluster.
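
To make the difference concrete, here is a toy model of the two ways of using
YARN. It does not call any YARN API: `Container`, `long_lived_cluster`,
`per_task_containers`, and `total_memory_gb` are hypothetical names, and the
200-task, 1GB-per-task query is an invented workload. With the numbers from
the Spark example, the long-lived cluster holds 11 containers and 210GB for
its whole lifetime.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Container:
    role: str        # "master", "slave", or "task"
    memory_gb: int


def long_lived_cluster(num_slaves: int, master_gb: int,
                       slave_gb: int) -> List[Container]:
    """Spark/Impala style: YARN grants one container per daemon, and the
    containers stay occupied until a daemon is decommissioned."""
    master = [Container("master", master_gb)]
    slaves = [Container("slave", slave_gb) for _ in range(num_slaves)]
    return master + slaves


def per_task_containers(num_tasks: int, task_gb: int) -> List[Container]:
    """Tajo style as described in the comment: one YARN container per
    task/queryunit attempt, so every task needs its own allocation."""
    return [Container("task", task_gb) for _ in range(num_tasks)]


def total_memory_gb(containers: List[Container]) -> int:
    return sum(c.memory_gb for c in containers)


if __name__ == "__main__":
    cluster = long_lived_cluster(num_slaves=10, master_gb=10, slave_gb=20)
    print(len(cluster), total_memory_gb(cluster))  # 11 210
    # Hypothetical query with 200 tasks of 1GB each under the per-task model.
    print(len(per_task_containers(num_tasks=200, task_gb=1)))  # 200
```

Roughly, the long-lived model pays for YARN allocation once and keeps the
resources reserved, so its internal scheduler can place each task without
going back to YARN; the per-task model puts a YARN allocation in front of
every task attempt.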

> (Umbrella) Implement Tajo Query Scheduler
> -----------------------------------------
>
>                 Key: TAJO-540
>                 URL: https://issues.apache.org/jira/browse/TAJO-540
>             Project: Tajo
>          Issue Type: New Feature
>            Reporter: Hyunsik Choi
>
> Currently, there is no Tajo query scheduler, so all queries launched
> simultaneously compete for cluster resources, which are managed by
> TajoResourceManager.
> In this issue, we will investigate, design, and implement a Tajo query
> scheduler. This is an umbrella issue for that. We will create subtasks for
> them.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)