[ https://issues.apache.org/jira/browse/SPARK-19102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
XiaodongCui updated SPARK-19102:
--------------------------------
    Attachment: a.zip

The attached file is my data; the data is in Parquet format.

> Accuracy error of Spark SQL results
> -----------------------------------
>
>                 Key: SPARK-19102
>                 URL: https://issues.apache.org/jira/browse/SPARK-19102
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 1.6.0, 1.6.1
>         Environment: Spark 1.6.0, Hadoop 2.6.0, JDK 1.8, CentOS 6.6
>            Reporter: XiaodongCui
>         Attachments: a.zip
>
> The problem is that in the results of the code below, the second column's value is not the same: the second query's result is 10000 times bigger than the first query's. The bug only reappears with the pattern SUM(a * b), COUNT(DISTINCT c).
>
> DataFrame df1 = sqlContext.read().parquet("hdfs://cdh01:8020/sandboxdata_A/test/a");
> df1.registerTempTable("hd_salesflat");
> DataFrame cube5 = sqlContext.sql("SELECT areacode1, SUM(quantity*unitprice) AS sumprice FROM hd_salesflat GROUP BY areacode1");
> DataFrame cube6 = sqlContext.sql("SELECT areacode1, SUM(quantity*unitprice) AS sumprice, COUNT(DISTINCT transno) FROM hd_salesflat GROUP BY areacode1");
> cube5.show(50);
> cube6.show(50);
>
> My data:
>
> transno  | quantity | unitprice | areacode1
> ---------|----------|-----------|----------
> 76317828 | 1.0000   | 25.0000   | HDCN
>
> Data schema:
>
> |-- areacode1: string (nullable = true)
> |-- quantity: decimal(20,4) (nullable = true)
> |-- unitprice: decimal(20,4) (nullable = true)
> |-- transno: string (nullable = true)
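A note on the factor of 10000: the report does not identify the cause, but the size of the error is suggestive. Both operands are decimal(20,4), and multiplying two scale-4 decimals produces a scale-8 result; if at some point the scale-8 unscaled value were interpreted with the original scale of 4, the value would read exactly 10^(8-4) = 10000 times too large. The following standalone sketch (plain java.math.BigDecimal, no Spark, and only a hypothesis about the bug's mechanism) illustrates that arithmetic:

```java
import java.math.BigDecimal;

public class DecimalScaleDemo {
    public static void main(String[] args) {
        // Same values and types as the reporter's sample row: decimal(20,4)
        BigDecimal quantity = new BigDecimal("1.0000");
        BigDecimal unitprice = new BigDecimal("25.0000");

        // Multiplying two scale-4 decimals yields a scale-8 product.
        BigDecimal product = quantity.multiply(unitprice);
        System.out.println(product);         // 25.00000000
        System.out.println(product.scale()); // 8

        // Hypothetical failure mode: keep the scale-8 unscaled digits but
        // label the value with the input scale of 4. The result reads
        // 10000 times too large, matching the discrepancy in the report.
        BigDecimal misread = new BigDecimal(product.unscaledValue(), 4);
        System.out.println(misread);         // 250000.0000
    }
}
```

This does not prove the mechanism inside Spark 1.6's aggregation path; it only shows that a scale bookkeeping error of exactly this kind would produce the observed 10000x inflation.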