[ 
https://issues.apache.org/jira/browse/SPARK-19102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XiaodongCui updated SPARK-19102:
--------------------------------
    Description: 
The problem is that the two queries below return different values in the
second column (sumprice): the value from the second query is 10000 times
larger than the value from the first. The bug only reproduces with queries of
the form SUM(a * b), COUNT(DISTINCT c).

        DataFrame df1 = sqlContext.read().parquet("hdfs://cdh01:8020/sandboxdata_A/test/a");
        df1.registerTempTable("hd_salesflat");
        // Query 1: SUM only.
        DataFrame cube5 = sqlContext.sql(
            "SELECT areacode1, SUM(quantity*unitprice) AS sumprice "
            + "FROM hd_salesflat GROUP BY areacode1");
        // Query 2: the same SUM plus COUNT(DISTINCT ...).
        DataFrame cube6 = sqlContext.sql(
            "SELECT areacode1, SUM(quantity*unitprice) AS sumprice, COUNT(DISTINCT transno) "
            + "FROM hd_salesflat GROUP BY areacode1");
        cube5.show(50);
        cube6.show(50);
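
To check the "10000 times" observation per group rather than by eye, the two
result sets can be compared key by key. The following is only a sketch in plain
Java: ResultRatioCheck, ratios and allEqual are hypothetical names (not part of
Spark), the maps are assumed to hold areacode1 -> sumprice pairs collected from
cube5 and cube6, and the numbers in main are made-up illustrative values, not
data from this report.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper (not part of Spark): compares two per-group aggregates. */
final class ResultRatioCheck {

    /**
     * For every key present in both maps, returns second / first.
     * Keys missing from the second map, or whose first value is zero,
     * are skipped (the latter to avoid division by zero).
     */
    static Map<String, BigDecimal> ratios(Map<String, BigDecimal> first,
                                          Map<String, BigDecimal> second) {
        Map<String, BigDecimal> out = new TreeMap<>();
        for (Map.Entry<String, BigDecimal> e : first.entrySet()) {
            BigDecimal other = second.get(e.getKey());
            if (other == null || e.getValue().signum() == 0) {
                continue;
            }
            out.put(e.getKey(), other.divide(e.getValue(), MathContext.DECIMAL64));
        }
        return out;
    }

    /** True if there is at least one ratio and all ratios equal the factor numerically. */
    static boolean allEqual(Map<String, BigDecimal> ratios, BigDecimal factor) {
        if (ratios.isEmpty()) {
            return false;
        }
        for (BigDecimal r : ratios.values()) {
            // compareTo ignores scale, so 10000 and 1E+4 count as equal.
            if (r.compareTo(factor) != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up illustrative values, NOT data from the report.
        Map<String, BigDecimal> cube5 = new TreeMap<>();
        cube5.put("A01", new BigDecimal("12.50"));
        cube5.put("A02", new BigDecimal("3.20"));

        Map<String, BigDecimal> cube6 = new TreeMap<>();
        cube6.put("A01", new BigDecimal("125000.00"));
        cube6.put("A02", new BigDecimal("32000.00"));

        Map<String, BigDecimal> r = ratios(cube5, cube6);
        System.out.println("ratios: " + r);
        System.out.println("constant factor 10000: "
            + allEqual(r, new BigDecimal("10000")));
    }
}
```

If every group shows the same factor, the discrepancy is a uniform scaling of
sumprice rather than a difference in which rows were aggregated, which narrows
down where to look.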



  was:
The results of the two queries below are not the same: the second SQL result
is 10000 times larger than the first. The bug only reproduces with queries of
the form SUM(a * b), COUNT(DISTINCT c).

        DataFrame df1 = sqlContext.read().parquet("hdfs://cdh01:8020/sandboxdata_A/test/a");
        df1.registerTempTable("hd_salesflat");
        DataFrame cube5 = sqlContext.sql(
            "SELECT areacode1, SUM(quantity*unitprice) AS sumprice "
            + "FROM hd_salesflat GROUP BY areacode1");
        DataFrame cube6 = sqlContext.sql(
            "SELECT areacode1, SUM(quantity*unitprice) AS sumprice, COUNT(DISTINCT transno) "
            + "FROM hd_salesflat GROUP BY areacode1");
        cube5.show(50);
        cube6.show(50);




> Accuracy error of spark SQL results
> -----------------------------------
>
>                 Key: SPARK-19102
>                 URL: https://issues.apache.org/jira/browse/SPARK-19102
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 1.6.0, 1.6.1
>         Environment: Spark 1.6.0, Hadoop 2.6.0, JDK 1.8, CentOS 6.6
>            Reporter: XiaodongCui


