XiaodongCui created SPARK-19102:
-----------------------------------

             Summary: Accuracy error of spark SQL results
                 Key: SPARK-19102
                 URL: https://issues.apache.org/jira/browse/SPARK-19102
             Project: Spark
          Issue Type: Bug
          Components: Spark Core, SQL
    Affects Versions: 1.6.1, 1.6.0
         Environment: Spark 1.6.0, Hadoop 2.6.0, JDK 1.8, CentOS 6.6
            Reporter: XiaodongCui

The two queries in the code below return different values for the same aggregate: sumprice in the second query's result is 10000 times larger than in the first query's result. The bug reproduces only when the query has the form SUM(a * b), COUNT(DISTINCT c).

    DataFrame df1 = sqlContext.read().parquet("hdfs://cdh01:8020/sandboxdata_A/test/a");
    df1.registerTempTable("hd_salesflat");
    DataFrame cube5 = sqlContext.sql("SELECT areacode1, SUM(quantity*unitprice) AS sumprice FROM hd_salesflat GROUP BY areacode1");
    DataFrame cube6 = sqlContext.sql("SELECT areacode1, SUM(quantity*unitprice) AS sumprice, COUNT(DISTINCT transno) FROM hd_salesflat GROUP BY areacode1");
    cube5.show(50);
    cube6.show(50);
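For reference, here is a minimal plain-Java sketch of what the two queries are expected to compute. It does not use Spark and does not reproduce the bug. It only shows that adding COUNT(DISTINCT transno) should leave sumprice unchanged for every areacode1. The class name AggregateCheck, the Row type, the sample rows, and the choice of BigDecimal for quantity and unitprice are all assumptions made for illustration; the report does not state the column types or show any data.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Plain-Java model of the two queries; no Spark involved.
class AggregateCheck {

    // One row of a hypothetical hd_salesflat sample (column types are assumed).
    static final class Row {
        final String areacode1;
        final String transno;
        final BigDecimal quantity;
        final BigDecimal unitprice;

        Row(String areacode1, String transno, String quantity, String unitprice) {
            this.areacode1 = areacode1;
            this.transno = transno;
            this.quantity = new BigDecimal(quantity);
            this.unitprice = new BigDecimal(unitprice);
        }
    }

    // Models: SELECT areacode1, SUM(quantity*unitprice) ... GROUP BY areacode1
    static Map<String, BigDecimal> sumPrice(List<Row> rows) {
        Map<String, BigDecimal> out = new TreeMap<>();
        for (Row r : rows) {
            out.merge(r.areacode1, r.quantity.multiply(r.unitprice), BigDecimal::add);
        }
        return out;
    }

    // Models: SELECT areacode1, COUNT(DISTINCT transno) ... GROUP BY areacode1
    static Map<String, Integer> distinctTrans(List<Row> rows) {
        Map<String, Set<String>> seen = new TreeMap<>();
        for (Row r : rows) {
            seen.computeIfAbsent(r.areacode1, k -> new HashSet<>()).add(r.transno);
        }
        Map<String, Integer> out = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : seen.entrySet()) {
            out.put(e.getKey(), e.getValue().size());
        }
        return out;
    }

    // Invented sample data, not taken from the reporter's parquet file.
    static List<Row> sampleRows() {
        return Arrays.asList(
            new Row("A", "T1", "2", "10.50"),
            new Row("A", "T1", "1", "4.00"),
            new Row("A", "T2", "3", "1.25"),
            new Row("B", "T3", "5", "2.00"));
    }

    public static void main(String[] args) {
        List<Row> rows = sampleRows();
        Map<String, BigDecimal> sums = sumPrice(rows);
        Map<String, Integer> counts = distinctTrans(rows);
        // The sum is computed independently of the distinct count,
        // so both queries should report the same sumprice per area.
        for (String area : sums.keySet()) {
            System.out.println(area + " sumprice=" + sums.get(area)
                + " distinct_transno=" + counts.get(area));
        }
    }
}
```

Comparing cube5 and cube6 against values computed this way on a small extract of the table would show which of the two results is the wrong one.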