Re: HiveQL to SparkSQL

2015-08-03 Thread Bigdata techguy
Has anybody tried converting HiveQL queries to SparkSQL? If so, would you please share your experience and the pros and cons? Thank you. On Thu, Jul 30, 2015 at 10:37 AM, Bigdata techguy bigdatatech...@gmail.com wrote: Thanks Jorn for the response and for the pointer questions to Hive optimization tips

Re: HiveQL to SparkSQL

2015-07-30 Thread Bigdata techguy
is happening, using compression, using the best data types for join columns, denormalizing, etc. I am using Hive version 0.13. The idea behind this POC is to find the strengths of SparkSQL over HiveQL and identify the use cases where SparkSQL can perform better than HiveQL other than

HiveQL to SparkSQL

2015-07-29 Thread Bigdata techguy
Hi All, I have a fairly complex HiveQL data processing job which I am trying to convert to SparkSQL to improve performance. Below is what it does: it selects around 100 columns, including aggregates, from a FACT_TABLE joined to a summary of the same FACT_TABLE, joined to 2 smaller DIMENSION tables. The
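For illustration, the query shape described in this post might look like the sketch below. All table and column names (fact_table, dimension_a, amount, and so on) are hypothetical, since the post does not list them, and only a handful of columns are shown rather than the roughly 100 mentioned. The run_query helper is also an assumption: it accepts any callable that takes SQL text, which in Spark 1.x could be the sql method of a HiveContext.

```python
# Sketch of the described query shape: a fact table joined to a summary
# of itself and to two smaller dimension tables.
# Every table and column name below is hypothetical.
FACT_SUMMARY_QUERY = """
SELECT f.key_id,
       d1.name AS dim_a_name,
       d2.name AS dim_b_name,
       SUM(f.amount) AS total_amount,
       MAX(s.group_total) AS group_total
FROM fact_table f
JOIN (SELECT group_id, SUM(amount) AS group_total
      FROM fact_table
      GROUP BY group_id) s
  ON f.group_id = s.group_id
JOIN dimension_a d1 ON f.dim_a_id = d1.id
JOIN dimension_b d2 ON f.dim_b_id = d2.id
GROUP BY f.key_id, d1.name, d2.name
"""


def run_query(sql_runner, query=FACT_SUMMARY_QUERY):
    """Pass the SQL text to any callable that executes SQL.

    With Spark 1.x this could be the sql method of a HiveContext
    (an assumption; the post does not show how the query is submitted).
    """
    return sql_runner(query)
```

Keeping the statement as plain SQL text means the same string can be submitted to Hive and to SparkSQL, which makes a side-by-side timing comparison easier to set up.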

Re: HiveQL to SparkSQL

2015-07-29 Thread Jörn Franke
What Hive version are you using? Do you run it on Tez? Are you using the ORC format? Do you use compression? Snappy? Do you use Bloom filters? Do you insert the data sorted on the right columns? Do you use partitioning? Did you increase the replication factor for often-used tables or