If by "MapReduce 2 jobs" you mean Hadoop YARN, you are right. Tajo uses Hadoop YARN as its primary resource manager.
- hyunsik

On Tue, May 28, 2013 at 7:46 AM, Tejas Patil <[email protected]> wrote:
> Please correct me if I am wrong.
>
> Hive: converts a query to MapReduce job(s). Can work on large-scale data
> irrespective of the size of the result set.
> Impala: runs daemons across all data nodes to get results; no MapReduce job
> is launched. Good for queries with a small result set.
> Tajo: converts a query to MapReduce 2 job(s). Smarter in terms of the query
> plans it generates and its physical operator selection, both based on
> cluster characteristics.
>
>
> On Sun, May 26, 2013 at 7:47 AM, Jihoon Son <[email protected]> wrote:
>
> > I'm sorry to send this mail again.
> > I cannot understand why the lower part of the above mail is regarded as a
> > signature.
> > =====================================================
> >
> > Hi, Tejas
> >
> > The key difference between Tajo and Impala is the design goal. To speed up
> > query processing, Impala uses main memory as much as possible and transfers
> > intermediate data via streaming. If a query requires more memory than is
> > available, Impala cannot process it. That is why Impala itself says it is
> > not a replacement for Hive.
> >
> > Tajo, in contrast, uses a query optimizer that considers the user query,
> > the characteristics of the data, the status of the cluster, and so on. As a
> > result, Tajo can process a query with Impala's algorithm, Hive's algorithm,
> > or other algorithms. For example, Tajo can process a join query using
> > either a repartition join or a merge join, and intermediate results can be
> > materialized to disk or kept in memory. Since Tajo builds its query plan
> > from all of these factors, it can always process user queries. So we can
> > say that Tajo can be an alternative to Hive.
> >
> > Tajo can outperform Hive for most queries. The key reason is that Tajo uses
> > its own query engine, while Hive uses MapReduce.
> > This limits Hive to MapReduce-based algorithms, whereas Tajo can use more
> > optimized ones.
> >
> > A sort query is a good example. Hive supports only hash partitioning, so
> > each node sorts its data locally in the map phase and *ONE NODE* must
> > perform the global sort in the reduce phase.
> > Tajo, however, supports a sort algorithm based on range partitioning. In the
> > first phase, each node sorts its data locally as in Hive, but the
> > intermediate data are partitioned by ranges of the sort key. In the second
> > phase, each node sorts its own partition locally. Because every key in one
> > partition is smaller than every key in the next, concatenating the
> > partitions' outputs in range order yields correctly sorted final results.
> >
> > If you have any questions about this,
> > please feel free to ask.
> >
> > Thanks,
> > Jihoon
> >
> >
> >
> > 2013/5/26 Jihoon Son <[email protected]>
> > > 2013/5/26 Tejas Patil <[email protected]>
> > >
> > > > Hi @dev,
> > > >
> > > > Can anyone comment on the differences between Tajo, Hive, and Impala?
> > > > Also, why does Tajo perform better than Hive? In what scenarios would
> > > > it be good to use Tajo, and when would it be bad?
> > > >
> > > > Thanks,
> > > > Tejas Patil
> > > > http://www.linkedin.com/in/tejaspatil1
> > > >
> > >
> > > --
> > > Jihoon Son
> > >
> > > Database & Information Systems Group,
> > > Prof. Yon Dohn Chung Lab.
> > > Dept. of Computer Science & Engineering,
> > > Korea University
> > > 1, 5-ga, Anam-dong, Seongbuk-gu,
> > > Seoul, 136-713, Republic of Korea
> > >
> > > Tel : +82-2-3290-3580
> > > E-mail : [email protected]
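Jihoon notes in the thread above that Tajo can execute a join as either a repartition join or a merge join. The sketch below shows textbook, single-process versions of both, assuming in-memory lists of (key, payload) rows; the function and type names are hypothetical and this is not Tajo's code. It does not model how Tajo's optimizer chooses between the two strategies (the thread does not describe that); it only shows that both compute the same result by different means.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, str]          # hypothetical row shape: (join key, payload)
Joined = Tuple[int, str, str]  # (join key, left payload, right payload)


def repartition_join(left: List[Row], right: List[Row], num_partitions: int) -> List[Joined]:
    """Repartition join: hash-partition both inputs on the join key so that
    matching rows land in the same partition, then join each partition locally."""
    left_parts: List[List[Row]] = [[] for _ in range(num_partitions)]
    right_parts: List[List[Row]] = [[] for _ in range(num_partitions)]
    for row in left:
        left_parts[hash(row[0]) % num_partitions].append(row)
    for row in right:
        right_parts[hash(row[0]) % num_partitions].append(row)

    result: List[Joined] = []
    for lpart, rpart in zip(left_parts, right_parts):  # one iteration per "node"
        table: Dict[int, List[str]] = defaultdict(list)
        for key, lval in lpart:  # build a hash table on the left side
            table[key].append(lval)
        for key, rval in rpart:  # probe it with the right side
            for lval in table.get(key, []):
                result.append((key, lval, rval))
    return result


def merge_join(left: List[Row], right: List[Row]) -> List[Joined]:
    """Merge join: sort both inputs on the join key, then scan them in step."""
    left = sorted(left, key=lambda r: r[0])
    right = sorted(right, key=lambda r: r[0])
    result: List[Joined] = []
    i = j = 0
    while i < len(left) and j < len(right):
        lkey, rkey = left[i][0], right[j][0]
        if lkey < rkey:
            i += 1
        elif lkey > rkey:
            j += 1
        else:
            # Find the run of equal keys on each side and emit their cross product.
            i_end, j_end = i, j
            while i_end < len(left) and left[i_end][0] == lkey:
                i_end += 1
            while j_end < len(right) and right[j_end][0] == rkey:
                j_end += 1
            for _, lval in left[i:i_end]:
                for _, rval in right[j:j_end]:
                    result.append((lkey, lval, rval))
            i, j = i_end, j_end
    return result


if __name__ == "__main__":
    left_rows = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
    right_rows = [(2, "x"), (3, "y"), (4, "z"), (2, "w")]
    print(sorted(repartition_join(left_rows, right_rows, num_partitions=3)))
    print(sorted(merge_join(left_rows, right_rows)))
```

The trade-off the sketch makes visible: the repartition join needs no sorting but must shuffle both inputs by key, while the merge join needs no hash table but requires both inputs to be sorted on the join key, which is cheap when they already are.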
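Jihoon's sort example in the thread above can be made concrete with a small single-process sketch. It is an illustration under stated assumptions, not Hive's or Tajo's code: each inner list stands for the data on one node, the function names are hypothetical, and the range split points are passed in by hand because the thread does not say how Tajo chooses them. It shows why hash partitioning across several reducers does not give a global order (hence Hive's single reducer), and why range partitioning does.

```python
import bisect
import heapq
from typing import List, Sequence

# Hypothetical helpers: each inner list stands for the data held by one node.


def single_reducer_sort(local_runs: List[List[int]]) -> List[int]:
    """Hive-style global sort: local sorts in the map phase, then ONE reducer merges everything."""
    sorted_runs = [sorted(run) for run in local_runs]  # map phase, one run per node
    return list(heapq.merge(*sorted_runs))  # reduce phase on a single node


def hash_partitioned_outputs(local_runs: List[List[int]], num_reducers: int) -> List[List[int]]:
    """Hash partitioning across several reducers: each output is sorted,
    but keys are scattered by hash, so the outputs are not globally ordered."""
    buckets: List[List[int]] = [[] for _ in range(num_reducers)]
    for run in local_runs:
        for key in sorted(run):
            buckets[hash(key) % num_reducers].append(key)
    return [sorted(bucket) for bucket in buckets]


def range_partition_sort(local_runs: List[List[int]], boundaries: Sequence[int]) -> List[int]:
    """Two-phase sort with range partitioning.

    boundaries must be sorted; partition i receives keys k with
    boundaries[i-1] <= k < boundaries[i] (first and last partitions are open-ended).
    """
    partitions: List[List[int]] = [[] for _ in range(len(boundaries) + 1)]
    # Phase 1: each node sorts locally, then sends each key to the partition owning its range.
    for run in local_runs:
        for key in sorted(run):
            partitions[bisect.bisect_right(boundaries, key)].append(key)
    # Phase 2: each partition is sorted independently (in parallel on a real cluster).
    outputs = [sorted(part) for part in partitions]
    # Every key in partition i is smaller than every key in partition i+1,
    # so concatenating the outputs in partition order is globally sorted.
    return [key for out in outputs for key in out]


if __name__ == "__main__":
    runs = [[42, 7, 19], [3, 88, 25], [61, 12, 50]]
    print(single_reducer_sort(runs))
    print(hash_partitioned_outputs(runs, 3))
    print(range_partition_sort(runs, boundaries=[20, 50]))
```

With the sample data, the single-reducer and range-partitioned versions both return [3, 7, 12, 19, 25, 42, 50, 61, 88], while the hash-partitioned outputs are [[3, 12, 42], [7, 19, 25, 61, 88], [50]]: each list is sorted, but their concatenation is not. A real system would also need split points that balance the partitions, for example chosen from a sample of the keys; fixed boundaries only keep the sketch short.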
