Has anybody tried converting HiveQL queries to SparkSQL? If so, would you
share your experience and the pros and cons, please? Thank you.
On Thu, Jul 30, 2015 at 10:37 AM, Bigdata techguy bigdatatech...@gmail.com
wrote:
Thanks Jorn for the response and for the pointer questions to Hive
optimization tips
is
happening, using compression, using the best data types for join columns,
denormalizing, etc. I am using Hive version 0.13.
The idea behind this POC is to find the strengths of SparkSQL over HiveQL
and identify the use cases where SparkSQL can perform better than HiveQL
other than
Hi All,
I have a fairly complex HiveQL data-processing job that I am trying to convert
to SparkSQL to improve performance. Below is what it does.
Select around 100 columns including Aggregates
From a FACT_TABLE
Joined to the summary of the same FACT_TABLE
Joined to 2 smaller DIMENSION tables.
The
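For readers following the thread, the query shape described above (a fact table joined to an aggregate of itself and to two small dimension tables) can be sketched roughly as below. This is only an illustration under stated assumptions: every table and column name is hypothetical, the real query selects around 100 columns rather than six, and Python's built-in sqlite3 is used purely so the SQL can be run without a cluster. It is not Hive or Spark, and it says nothing about performance in either engine.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database; all names and rows are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE fact_table (id INTEGER, cust_id INTEGER, prod_id INTEGER, amount REAL);
        CREATE TABLE dim_customer (cust_id INTEGER, cust_name TEXT);
        CREATE TABLE dim_product (prod_id INTEGER, prod_name TEXT);
        INSERT INTO fact_table VALUES (1, 10, 100, 5.0), (2, 10, 101, 7.0), (3, 11, 100, 3.0);
        INSERT INTO dim_customer VALUES (10, 'alice'), (11, 'bob');
        INSERT INTO dim_product VALUES (100, 'widget'), (101, 'gadget');
        """
    )
    return conn


# Fact table joined to a per-customer summary of itself,
# then to two small dimension tables.
QUERY = """
SELECT f.id, c.cust_name, p.prod_name, f.amount, s.total_amount, s.row_count
FROM fact_table f
JOIN (SELECT cust_id,
             SUM(amount) AS total_amount,
             COUNT(*)    AS row_count
      FROM fact_table
      GROUP BY cust_id) s ON f.cust_id = s.cust_id
JOIN dim_customer c ON f.cust_id = c.cust_id
JOIN dim_product  p ON f.prod_id = p.prod_id
ORDER BY f.id
"""


def run_query(conn):
    """Run the sketch query and return all rows as a list of tuples."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in run_query(build_demo_db()):
        print(row)
```

The summary subquery is the part that reads the fact table a second time, so it is a natural place to start when comparing how the two engines plan the job.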
Which Hive version are you using? Do you run it on Tez? Are you using the
ORC format? Do you use compression? Snappy? Do you use Bloom filters? Do
you insert the data sorted on the right columns? Do you use partitioning?
Did you increase the replication factor for often used tables or