Re: difference between Tajo, Hive and Impala

Hyunsik Choi Mon, 27 May 2013 21:19:32 -0700

If by "MapReduce 2 jobs" you mean Hadoop YARN, you are right. Tajo uses
Hadoop YARN as its primary resource manager.


- hyunsik


On Tue, May 28, 2013 at 7:46 AM, Tejas Patil <[email protected]> wrote:

> Please correct me if I am wrong.
>
> Hive: converts a query to MapReduce job(s). Can work on large-scale data
> irrespective of the size of the result set.
> Impala: runs daemons across all data nodes to get results. No MapReduce
> job is launched. Good for queries with a small result set.
> Tajo: converts a query to MapReduce 2 job(s). Smarter in terms of the
> query plans generated and the physical operators selected, both based on
> cluster characteristics.
>
>
> On Sun, May 26, 2013 at 7:47 AM, Jihoon Son <[email protected]> wrote:
>
> > I'm sorry to send this mail again.
> > I don't understand why the lower part of my previous mail was treated
> > as a signature.
> > =====================================================
> >
> > Hi, Tejas
> >
> > The key difference between Tajo and Impala is the design goal. To
> > increase query processing performance, Impala takes an approach in which
> > main memory is used as much as possible and intermediate data are
> > transferred via streaming. If a query requires too much memory, Impala
> > cannot process it. Thus, the Impala project says that it is not a
> > replacement for Hive.
> >
> > Tajo, however, uses a query optimizer that considers the user query, the
> > characteristics of the data, the status of the cluster, and so on. Thus,
> > Tajo can process a query with Impala's algorithm, Hive's algorithm, or
> > any other algorithm. For example, Tajo can process a join query using
> > either a repartition join or a merge join. Intermediate results can be
> > materialized to disk or kept in memory. Since Tajo builds its query plan
> > considering the factors mentioned above, it can always process user
> > queries. So we can say that Tajo can be a replacement for Hive.
> >
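To make the two join strategies named in the quoted paragraph concrete, here is a minimal single-process sketch in Python. It is not Tajo's code: the function names, the `(key, value)` row layout, and the use of `hash(key) % num_partitions` as the partitioner are all assumptions made for illustration. A repartition join shuffles both inputs by a hash of the join key and joins each partition independently; a merge join sorts both inputs on the join key and walks them with two cursors.

```python
from collections import defaultdict


def repartition_join(left, right, num_partitions):
    """Hash-partition both inputs on the join key, then join each partition."""
    left_parts = defaultdict(list)
    right_parts = defaultdict(list)
    for key, value in left:
        left_parts[hash(key) % num_partitions].append((key, value))
    for key, value in right:
        right_parts[hash(key) % num_partitions].append((key, value))

    joined = []
    # On a cluster, each partition could be joined on a different node.
    for p in range(num_partitions):
        table = defaultdict(list)
        for key, value in left_parts[p]:
            table[key].append(value)
        for key, value in right_parts[p]:
            for left_value in table.get(key, ()):
                joined.append((key, left_value, value))
    return joined


def merge_join(left, right):
    """Sort both inputs on the join key, then advance two cursors in step."""
    left = sorted(left, key=lambda row: row[0])
    right = sorted(right, key=lambda row: row[0])
    joined, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][0] < right[j][0]:
            i += 1
        elif left[i][0] > right[j][0]:
            j += 1
        else:
            key = left[i][0]
            # Find the run of right-side rows that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == key:
                j_end += 1
            # Pair every left-side row with that key against the whole run.
            while i < len(left) and left[i][0] == key:
                for _, right_value in right[j:j_end]:
                    joined.append((key, left[i][1], right_value))
                i += 1
            j = j_end
    return joined
```

Both functions return the same inner-join rows, possibly in a different order. Which one is cheaper depends on things like whether the inputs are already sorted and how much memory is available, which is the kind of decision the quoted message attributes to the optimizer.
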
> > Tajo can outperform Hive for most queries. The key reason is that Tajo
> > uses its own query engine while Hive uses MapReduce. This restricts Hive
> > to MapReduce-based algorithms, whereas Tajo can use a more optimized
> > algorithm.
> >
> > A sort query is a good example. Hive supports only hash partitioning.
> > Thus, each node sorts data locally in the map phase, and *ONE NODE* must
> > perform the global sort in the reduce phase.
> > Tajo, however, supports a sort algorithm that uses range partitioning.
> > In the first phase, each node sorts data locally as in Hive, but the
> > intermediate data are partitioned by ranges of the sort key. In the
> > second phase, each node performs a local sort to get the final results.
> > Since the intermediate data are partitioned by ranges of the sort key,
> > the per-node outputs, read in range order, form a correct globally
> > sorted result.
> >
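The two sort plans described in the quoted paragraph can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Tajo's or Hive's implementation: each "node" is simulated as a plain list, the function names are hypothetical, and the range boundaries come from a small sample of the keys, a choice the original message does not specify.

```python
import bisect
import heapq


def pick_boundaries(sample, num_partitions):
    """Pick up to num_partitions - 1 split points from a sample of sort keys."""
    ordered = sorted(sample)
    if not ordered:
        return []
    step = len(ordered) / num_partitions
    return [ordered[int(step * i)] for i in range(1, num_partitions)]


def single_node_global_sort(node_inputs):
    """Local sort on every node, then ONE node merges all sorted runs."""
    runs = [sorted(rows) for rows in node_inputs]  # phase 1: parallel
    return list(heapq.merge(*runs))                # phase 2: a single node


def range_partitioned_sort(node_inputs, num_partitions):
    """Local sort on every node, shuffle by key range, then sort each range."""
    # Assumption: boundaries are taken from the first few keys of each node.
    sample = [key for rows in node_inputs for key in rows[:10]]
    boundaries = pick_boundaries(sample, num_partitions)

    partitions = [[] for _ in range(num_partitions)]
    for rows in node_inputs:
        for key in sorted(rows):                   # phase 1: parallel
            # bisect_right gives the index of the range that contains key.
            partitions[bisect.bisect_right(boundaries, key)].append(key)

    # Phase 2: every node sorts only its own range.
    sorted_parts = [sorted(part) for part in partitions]
    # Ranges are disjoint and ordered, so concatenation is globally sorted.
    return [key for part in sorted_parts for key in part]
```

Both functions return the same sorted list. The difference is where the second phase runs: in the first plan a single node handles every row, while in the second each node handles only the rows that fall in its key range.
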
> > If you have any questions about this,
> > please feel free to ask.
> >
> > Thanks,
> > Jihoon
> >
> >
> >
> > > 2013/5/26 Tejas Patil <[email protected]>
> > >
> > >> Hi @dev,
> > >>
> > >> Can anyone comment on the differences between Tajo, Hive and Impala?
> > >> Also, why does Tajo perform better than Hive? In what scenarios
> > >> would it be good to use Tajo, and when would it be bad?
> > >>
> > >> Thanks,
> > >> Tejas Patil
> > >> http://www.linkedin.com/in/tejaspatil1
> > >>
> >
> >
> >
> > --
> > Jihoon Son
> >
> > Database & Information Systems Group,
> > Prof. Yon Dohn Chung Lab.
> > Dept. of Computer Science & Engineering,
> > Korea University
> > 1, 5-ga, Anam-dong, Seongbuk-gu,
> > Seoul, 136-713, Republic of Korea
> >
> > Tel : +82-2-3290-3580
> > E-mail : [email protected]
> >
>
