[
https://issues.apache.org/jira/browse/TAJO-603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902982#comment-13902982
]
Min Zhou edited comment on TAJO-603 at 2/17/14 6:17 AM:
--------------------------------------------------------
Hi Jihoon,
Thank you for giving me the background on this. I get your point. However, I
don't think it's necessary for a bundle of tasks to share one container. Here
is my reasoning.
1. For standalone mode, each container is actually a thread. That is fast
enough, so there is no need to share.
2. For YARN mode, I think we might support two types in the future.
One works as Spark/Impala do: YARN starts a Tajo cluster, and resources are
allocated to Tajo daemons such as the Tajo master or workers. You can consider
this type of cluster a dedicated cluster, used only for low-latency SQL queries.
Task resource allocation works the same way as in standalone mode, so container
sharing is not needed there either.
The other type is like the current implementation, which shares resources with
other applications such as Storm, Samza, Spark, or MapReduce jobs. Sharing
containers only reduces the overhead of YARN scheduling; it can't fundamentally
solve the problem, right? Actually, Tajo queries of this type are generally not
very speed sensitive, and the data volume tends to be very large, so a job
usually takes minutes. You can consider this type a replacement for Hive ETL
jobs. Being as fast as Tez is enough for users. The overhead of YARN scheduling
is quite light in those situations.
What do you think, Jihoon?
Regards,
Min
was (Author: coderplay):
Hi Jihoon,
Thank you for giving me the background on this. I get your point. However, I
don't think it's necessary for a bundle of tasks to share one container. Here
is my reasoning.
1. For standalone mode, each container is actually a thread. That is fast
enough, so there is no need to share.
2. For YARN mode, I think we might support two types in the future.
One works as Spark/Impala do: YARN starts a Tajo cluster, and resources are
allocated to Tajo daemons such as the Tajo master or workers. You can consider
this type of cluster a dedicated cluster, used only for low-latency SQL queries.
The other type is like the current implementation, which shares resources with
other applications such as Storm, Samza, Spark, or MapReduce jobs. You can
consider Tajo queries of this type a replacement for Hive ETL jobs. They are
generally not very speed sensitive, and the data volume tends to be very large,
so a job usually takes minutes. I think for this kind of job, being as fast as
Tez is enough. The overhead of YARN scheduling is quite light.
Given the reasons above, why not make the code simpler than before?
What do you think, Jihoon?
Regards,
Min
> Move container allocation from SubQuery down to QueryUnitAttempt
> ----------------------------------------------------------------
>
> Key: TAJO-603
> URL: https://issues.apache.org/jira/browse/TAJO-603
> Project: Tajo
> Issue Type: Sub-task
> Reporter: Min Zhou
> Fix For: 1.0-incubating
>
> Attachments: schedule.png
>
>
> Tajo currently allocates all of the containers in SubQuery. That makes things
> complicated: both SubQuery and DefaultTaskScheduler have to hold a copy of
> the allocated containers and running tasks, and the event flow is difficult
> to understand.