Hello Hive users,

We have released Hive on MR3 1.10. MR3 is an execution engine similar to
MapReduce and Tez, and it supports Hadoop, Kubernetes, and standalone mode.
Hive-MR3 runs Hive 3.1.3 with MR3 as its execution backend. If you are
interested, please give it a try.

In MR3 1.10, we have rewritten the shuffle library in Tez. In the previous
version, each task managed its fetchers independently of the others. Now all
fetchers inside a container are managed by a common shuffle server.
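
To illustrate the idea only, here is a minimal Python sketch of a
container-wide shuffle server. It is not MR3 code: the class ShuffleServer,
its methods, and the task and source names are all hypothetical, and the
actual fetch is replaced by a placeholder. It only shows tasks handing
fetch requests to one shared pool instead of each running its own fetchers.

```python
from concurrent.futures import ThreadPoolExecutor


class ShuffleServer:
    """Hypothetical container-wide fetch scheduler (not MR3's actual API)."""

    def __init__(self, max_fetchers):
        # A single pool bounds the number of concurrent fetches
        # for every task running in the container.
        self._pool = ThreadPoolExecutor(max_workers=max_fetchers)

    def fetch(self, task_id, source):
        # Tasks submit requests here instead of creating fetcher threads.
        return self._pool.submit(self._fetch_one, task_id, source)

    @staticmethod
    def _fetch_one(task_id, source):
        # Placeholder for reading one upstream output over the network.
        return (task_id, source, f"data-from-{source}")

    def shutdown(self):
        self._pool.shutdown(wait=True)


server = ShuffleServer(max_fetchers=4)
futures = [
    server.fetch(task, src)
    for task in ("task-1", "task-2")
    for src in ("map-0", "map-1")
]
results = [f.result() for f in futures]
server.shutdown()
```

With a shared pool, the limit on concurrent fetches applies to the whole
container rather than to each task separately.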

For those interested in a performance comparison, here are the latest
results of testing Hive-MR3 1.9/1.10, Trino 435, and Spark 3.4.1 on the
(original) TPC-DS benchmark at a scale factor of 10TB. All systems were
tested with Java 17.

Hive-MR3 1.9: total 6473 seconds, geo-mean 25.0 seconds.
Hive-MR3 1.10: total 6138 seconds, geo-mean 24.4 seconds.
Trino 435: total 6950 seconds, geo-mean 19.2 seconds. Query 23 returns
wrong results. Query 72 fails.
Spark 3.4.1 (using Parquet instead of ORC): total 19044 seconds, geo-mean
35.9 seconds.
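
For readers comparing the two metrics, the following Python sketch shows
how a total and a geometric mean are computed from per-query times, and why
a system can have a higher total but a lower geo-mean. The per-query times
below are made up for illustration and are not taken from the benchmark;
the function name total_and_geomean is also hypothetical.

```python
import math


def total_and_geomean(times):
    """Return (sum of times, geometric mean of times) for positive times."""
    total = sum(times)
    # Geometric mean computed as exp of the mean of logs.
    geomean = math.exp(sum(math.log(t) for t in times) / len(times))
    return total, geomean


# Made-up per-query times in seconds (not benchmark data).
engine_a = [10.0, 10.0, 10.0, 400.0]  # fast on most queries, one very slow
engine_b = [20.0, 20.0, 20.0, 200.0]  # slower on most queries, no outlier

total_a, geo_a = total_and_geomean(engine_a)
total_b, geo_b = total_and_geomean(engine_b)
print(f"A: total {total_a:.0f}s, geo-mean {geo_a:.1f}s")
print(f"B: total {total_b:.0f}s, geo-mean {geo_b:.1f}s")
```

The total is dominated by the longest-running queries, while the geo-mean
weights every query equally in relative terms.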

Thank you,

--- Sungwoo
