Re: Hive on Spark is not populating correct records
After a lot of experiments, I have figured out that this is a potential bug in Cloudera's Hive on Spark: it does not produce consistent output for aggregate functions. Hopefully it will be fixed in the next release.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Hive-on-Spark-is-not-populating-correct-records-tp28128p28650.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe e-mail: user-unsubscr...@spark.apache.org
Hive on Spark is not populating correct records
Hi,

I am not sure whether this is the right place to discuss this issue.

I am running the following Hive query multiple times, with the execution engine set to Hive on Spark and to Hive on MapReduce.

With Hive on Spark: the result (count) was different on every execution.
With Hive on MapReduce: the result (count) was the same on every execution.

It seems that Hive on Spark behaves differently on each execution and does not produce the correct result.

The data volumes are as follows:
my_table1 (left): 30 million records
my_table2 (right): 85 million records

--
Thanks
Vikash

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Hive-on-Spark-is-not-populating-correct-records-tp28128.html
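For anyone who wants to repeat the comparison described above, here is a minimal sketch of a harness that runs the same count query several times per engine and checks whether the results agree. The names `repeat_counts`, `is_consistent`, `compare_engines` and the `run_query` callable are hypothetical and not from the original post. `run_query` is assumed to issue `SET hive.execution.engine=<engine>;` (values `mr` or `spark`) on its own connection, run the query, and return the count as an integer. The query itself is not shown in the post, so it is left to the caller.

```python
from typing import Callable, Dict, List, Sequence


def repeat_counts(run_query: Callable[[str], int], engine: str, runs: int = 5) -> List[int]:
    """Run the same count query `runs` times on one engine and collect the counts.

    `run_query` is a hypothetical callable supplied by the caller; it is assumed
    to set hive.execution.engine to `engine`, run the query, and return the count.
    """
    return [run_query(engine) for _ in range(runs)]


def is_consistent(counts: Sequence[int]) -> bool:
    """True when every execution returned the same count."""
    return len(set(counts)) <= 1


def compare_engines(
    run_query: Callable[[str], int],
    engines: Sequence[str] = ("mr", "spark"),
    runs: int = 5,
) -> Dict[str, List[int]]:
    """Collect repeated counts for each execution engine, keyed by engine name."""
    return {engine: repeat_counts(run_query, engine, runs) for engine in engines}
```

Calling `compare_engines(run_query)` and then `is_consistent(...)` on each list would show the symptom reported here if the `spark` list contains differing counts while the `mr` list does not. This only detects the inconsistency; it says nothing about its cause.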