Re: Hive on Spark is not populating correct records

2017-05-04 Thread Vikash Pareek
After a lot of experiments, I have figured out that this is likely a bug in
Cloudera's Hive on Spark.
Hive on Spark does not produce consistent output for aggregate functions.

Hopefully, it will be fixed in the next release.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Hive-on-Spark-is-not-populating-correct-records-tp28128p28650.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Hive on Spark is not populating correct records

2016-11-24 Thread Vikash Pareek
Hi,

I am not sure whether this is the right place to discuss this issue.

I am running the following Hive query multiple times, once with Hive on Spark
and once with Hive on MapReduce as the execution engine.

With Hive on Spark: the result (count) was different on every execution.
With Hive on MapReduce: the result (count) was the same on every execution.

It seems that Hive on Spark behaves differently on each execution and does not
produce the correct result.

The volume of data is as follows:
my_table1 (left): 30 million records
my_table2 (right): 85 million records
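
For anyone who wants to repeat the comparison described above, here is a
minimal sketch of how it could be scripted. It is not the setup used in this
thread. `run_count` is a hypothetical callable that you would supply yourself:
it is assumed to switch the engine (for example via the Hive property
`hive.execution.engine`, with values such as `mr` or `spark`), run the count
query, and return the count as an integer.

```python
from typing import Callable, Dict, List


def compare_engines(run_count: Callable[[str], int],
                    engines: List[str],
                    runs: int = 5) -> Dict[str, List[int]]:
    """Run the same count query `runs` times per engine and collect the counts.

    `run_count` is a hypothetical, user-supplied function (an assumption of
    this sketch): given an engine name, it executes the query under that
    engine and returns the resulting count.
    """
    return {engine: [run_count(engine) for _ in range(runs)]
            for engine in engines}


def is_consistent(counts: List[int]) -> bool:
    """Return True when every execution produced the same count."""
    return len(set(counts)) <= 1
```

Calling `compare_engines(run_count, ["mr", "spark"])` and then
`is_consistent(...)` on each list would show whether an engine returns a
different count between executions.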

-- Thanks
Vikash




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Hive-on-Spark-is-not-populating-correct-records-tp28128.html