[ 
https://issues.apache.org/jira/browse/TAJO-603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902982#comment-13902982
 ] 

Min Zhou edited comment on TAJO-603 at 2/17/14 6:17 AM:
--------------------------------------------------------

Hi Jihoon, 

Thank you for giving me the background on this. I get your point. However, I
don't think it's necessary for a bundle of tasks to share one container. Here
is my reasoning.

1. In standalone mode, each container is actually a thread. That is fast
enough, so there is no need to share.
2. In YARN mode, I think we might support two types of YARN deployment in the
future.
   One works the way Spark/Impala does: YARN starts a Tajo cluster, and
resources are allocated to Tajo daemons such as the Tajo master and workers.
You can think of this type as a dedicated cluster, used only for low-latency
SQL queries. Task resources are allocated the same way as in standalone mode,
so container sharing is not needed here either.
   The other type is like the current implementation: Tajo shares resources
with other applications such as Storm, Samza, Spark, or MapReduce jobs.
Sharing containers only reduces the overhead of YARN scheduling; it can't
fundamentally solve the problem, right? In practice, Tajo queries of this type
are usually not very latency-sensitive, and the data volume tends to be very
large, so a job typically takes minutes. You can think of this type as a
replacement for Hive ETL jobs. Being as fast as Tez is enough for users, and
the overhead of YARN scheduling is quite small in those situations.

What do you think, Jihoon?

Regards,
Min


was (Author: coderplay):
Hi Jihoon, 

Thank you for giving me the background on this. I get your point. However, I
don't think it's necessary for a bundle of tasks to share one container. Here
is my reasoning.

1. In standalone mode, each container is actually a thread. That is fast
enough, so there is no need to share.
2. In YARN mode, I think we might support two types of YARN deployment in the
future.
   One works the way Spark/Impala does: YARN starts a Tajo cluster, and
resources are allocated to Tajo daemons such as the Tajo master and workers.
You can think of this type as a dedicated cluster, used only for low-latency
SQL queries.
   The other type is like the current implementation: Tajo shares resources
with other applications such as Storm, Samza, Spark, or MapReduce jobs. You
can think of Tajo queries of this type as a replacement for Hive ETL jobs.
They are usually not very latency-sensitive, and the data volume tends to be
very large, so a job typically takes minutes. For this kind of job, I think
being as fast as Tez is enough. The overhead of YARN scheduling is quite small.

For the reasons above, why not make the code simpler than before?

What do you think, Jihoon?

Regards,
Min

> Move container allocation from SubQuery down to QueryUnitAttempt
> ----------------------------------------------------------------
>
>                 Key: TAJO-603
>                 URL: https://issues.apache.org/jira/browse/TAJO-603
>             Project: Tajo
>          Issue Type: Sub-task
>            Reporter: Min Zhou
>             Fix For: 1.0-incubating
>
>         Attachments: schedule.png
>
>
> Tajo currently allocates all of the containers in SubQuery. That makes things
> complicated. Both SubQuery and DefaultTaskScheduler have to hold a copy of the
> allocated containers and running tasks, and the event flow is difficult to
> understand.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
