[jira] [Updated] (SPARK-19102) Accuracy error of spark SQL results
[ https://issues.apache.org/jira/browse/SPARK-19102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

XiaodongCui updated SPARK-19102:
--------------------------------
Description:

The problem: cube6's second column, sumprice, is twice as large as cube5's sumprice, but they should be equal. The first SQL is "SELECT areacode1, SUM(quantity*unitprice) AS sumprice FROM hd_salesflat GROUP BY areacode1"; the second SQL is "SELECT areacode1, SUM(quantity*unitprice) AS sumprice, COUNT(DISTINCT transno) FROM hd_salesflat GROUP BY areacode1". The sumprice results should be equal, but they are not. The bug only reproduces with the pattern SUM(a * b) combined with COUNT(DISTINCT c).

Code:

DataFrame df1 = sqlContext.read().parquet("hdfs://cdh01:8020/sandboxdata_A/test/a");
df1.registerTempTable("hd_salesflat");
DataFrame cube5 = sqlContext.sql("SELECT areacode1, SUM(quantity*unitprice) AS sumprice FROM hd_salesflat GROUP BY areacode1");
DataFrame cube6 = sqlContext.sql("SELECT areacode1, SUM(quantity*unitprice) AS sumprice, COUNT(DISTINCT transno) FROM hd_salesflat GROUP BY areacode1");
cube5.select("sumprice").show(50);
cube6.select("sumprice").show(50);

My data has only one row; the relevant columns (data in the attached file):

transno  | quantity | unitprice | areacode1
76317828 | 1.0000   | 25.0000   | HDCN

Data schema:

 |-- areacode1: string (nullable = true)
 |-- quantity: decimal(20,4) (nullable = true)
 |-- unitprice: decimal(20,4) (nullable = true)
 |-- transno: string (nullable = true)

> Accuracy error of spark SQL results
> -----------------------------------
>
>                 Key: SPARK-19102
>                 URL: https://issues.apache.org/jira/browse/SPARK-19102
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.0, 1.6.1
>         Environment: Spark 1.6.0, Hadoop 2.6.0, JDK 1.8, CentOS 6.6
>            Reporter: XiaodongCui
>         Attachments: a.zip
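For context on how adding COUNT(DISTINCT …) next to a plain SUM can plausibly double the SUM: Spark 1.6 plans such mixed queries by expanding each input row into one copy per aggregate group, tagged with a group id. The sketch below is plain Python and illustrative only — it is not Spark's actual code and not a confirmed root-cause analysis for this ticket — but it shows how an expand-based plan returns 25 when the SUM consumes only its own copies, and 50 (the reported doubled value) when it consumes every copy:

```python
from decimal import Decimal

# The single row from the report: (transno, quantity, unitprice, areacode1)
rows = [("76317828", Decimal("1.0000"), Decimal("25.0000"), "HDCN")]

def expand(rows):
    """Emit each input row once per aggregate group, the way an
    expand-based distinct-aggregate rewrite duplicates rows with a
    group id (gid)."""
    out = []
    for transno, qty, price, area in rows:
        out.append((area, 0, qty * price, None))     # gid 0: feeds SUM(quantity*unitprice)
        out.append((area, 1, qty * price, transno))  # gid 1: feeds COUNT(DISTINCT transno)
    return out

def sumprice(expanded, guard_by_gid):
    """SUM over the expanded rows. A correct plan only consumes the
    gid-0 copies; a plan that fails to mask the duplicated projections
    counts every row twice."""
    total = Decimal(0)
    for area, gid, product, _transno in expanded:
        if not guard_by_gid or gid == 0:
            total += product
    return total

correct = sumprice(expand(rows), guard_by_gid=True)   # == 25, matches cube5
doubled = sumprice(expand(rows), guard_by_gid=False)  # == 50, the reported cube6 value
```

With one input row, the unguarded variant is exactly twice the guarded one — consistent with the 2x discrepancy described above, whatever the precise defect in Spark 1.6 turns out to be.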
[jira] [Reopened] (SPARK-19102) Accuracy error of spark SQL results
[ https://issues.apache.org/jira/browse/SPARK-19102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

XiaodongCui reopened SPARK-19102:
---------------------------------

The data under the path hdfs://cdh01:8020/sandboxdata_A/test/a is in the attached file.
[jira] [Updated] (SPARK-19102) Accuracy error of spark SQL results
[ https://issues.apache.org/jira/browse/SPARK-19102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

XiaodongCui updated SPARK-19102:
--------------------------------
Attachment: a.zip

The attached file is my data; the data is in Parquet format.
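As a sanity check that the two queries must agree on sumprice, the same two statements can be run against a reference SQL engine. A minimal sketch using SQLite (with REAL columns standing in for decimal(20,4) — an assumption for illustration only, not part of the original report):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE hd_salesflat (transno TEXT, quantity REAL, unitprice REAL, areacode1 TEXT)")
# the single row from the report
con.execute("INSERT INTO hd_salesflat VALUES ('76317828', 1.0, 25.0, 'HDCN')")

cube5 = con.execute(
    "SELECT areacode1, SUM(quantity*unitprice) AS sumprice "
    "FROM hd_salesflat GROUP BY areacode1").fetchall()
cube6 = con.execute(
    "SELECT areacode1, SUM(quantity*unitprice) AS sumprice, COUNT(DISTINCT transno) "
    "FROM hd_salesflat GROUP BY areacode1").fetchall()

print(cube5)  # [('HDCN', 25.0)]
print(cube6)  # [('HDCN', 25.0, 1)]
```

Both queries report sumprice = 25.0 here, which is what the reporter expected from Spark as well.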
[jira] [Updated] (SPARK-19102) Accuracy error of spark SQL results

[ https://issues.apache.org/jira/browse/SPARK-19102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

XiaodongCui updated SPARK-19102:
--------------------------------
    Description:

The problem: the second column (sumprice) of the two results from the code below is not the same; the second query's result is twice as large as the first's, although they should be equal. The bug only reproduces with the pattern SUM(a * b) combined with COUNT(DISTINCT c).

DataFrame df1 = sqlContext.read().parquet("hdfs://cdh01:8020/sandboxdata_A/test/a");
df1.registerTempTable("hd_salesflat");
DataFrame cube5 = sqlContext.sql("SELECT areacode1, SUM(quantity*unitprice) AS sumprice FROM hd_salesflat GROUP BY areacode1");
DataFrame cube6 = sqlContext.sql("SELECT areacode1, SUM(quantity*unitprice) AS sumprice, COUNT(DISTINCT transno) FROM hd_salesflat GROUP BY areacode1");
cube5.show(50);
cube6.show(50);

was: (same description, with slightly different wording)

> Accuracy error of spark SQL results
> -----------------------------------
>
>                 Key: SPARK-19102
>                 URL: https://issues.apache.org/jira/browse/SPARK-19102
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 1.6.0, 1.6.1
>        Environment: Spark 1.6.0, Hadoop 2.6.0, JDK 1.8, CentOS 6.6
>            Reporter: XiaodongCui
>
> The problem: the second column (sumprice) of the two results is not the same; the second query's result is twice as large as the first's, although they should be equal. The bug only reproduces with the pattern SUM(a * b) combined with COUNT(DISTINCT c).
>
> DataFrame df1 = sqlContext.read().parquet("hdfs://cdh01:8020/sandboxdata_A/test/a");
> df1.registerTempTable("hd_salesflat");
> DataFrame cube5 = sqlContext.sql("SELECT areacode1, SUM(quantity*unitprice) AS sumprice FROM hd_salesflat GROUP BY areacode1");
> DataFrame cube6 = sqlContext.sql("SELECT areacode1, SUM(quantity*unitprice) AS sumprice, COUNT(DISTINCT transno) FROM hd_salesflat GROUP BY areacode1");
> cube5.show(50);
> cube6.show(50);

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-19102) Accuracy error of spark SQL results

XiaodongCui created SPARK-19102:
--------------------------------

             Summary: Accuracy error of spark SQL results
                 Key: SPARK-19102
                 URL: https://issues.apache.org/jira/browse/SPARK-19102
             Project: Spark
          Issue Type: Bug
          Components: Spark Core, SQL
    Affects Versions: 1.6.1, 1.6.0
         Environment: Spark 1.6.0, Hadoop 2.6.0, JDK 1.8, CentOS 6.6
            Reporter: XiaodongCui

The problem: the results of the code below are not the same; the second query's sumprice is twice as large as the first's, although they should be equal. The bug only reproduces with the pattern SUM(a * b) combined with COUNT(DISTINCT c).

DataFrame df1 = sqlContext.read().parquet("hdfs://cdh01:8020/sandboxdata_A/test/a");
df1.registerTempTable("hd_salesflat");
DataFrame cube5 = sqlContext.sql("SELECT areacode1, SUM(quantity*unitprice) AS sumprice FROM hd_salesflat GROUP BY areacode1");
DataFrame cube6 = sqlContext.sql("SELECT areacode1, SUM(quantity*unitprice) AS sumprice, COUNT(DISTINCT transno) FROM hd_salesflat GROUP BY areacode1");
cube5.show(50);
cube6.show(50);
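As a sanity check on what both queries should return, the expected sumprice can be computed with plain BigDecimal arithmetic. This is a minimal sketch, not part of the report: it assumes the single sample row holds quantity = 1.0000 and unitprice = 25.0000 (matching the decimal(20,4) schema), and the class name SumpriceCheck is invented for illustration.

```java
import java.math.BigDecimal;

public class SumpriceCheck {
    public static void main(String[] args) {
        // Assumed values of the single sample row (decimal(20,4) in the schema)
        BigDecimal quantity = new BigDecimal("1.0000");
        BigDecimal unitprice = new BigDecimal("25.0000");

        // Over one row, SUM(quantity * unitprice) is just the product
        BigDecimal sumprice = quantity.multiply(unitprice);
        System.out.println(sumprice); // prints 25.00000000

        // The report says cube6 returns a value twice as large,
        // i.e. 50.00000000 instead of the correct 25.00000000
        System.out.println(sumprice.multiply(new BigDecimal("2")));
    }
}
```

Both cube5 and cube6 should therefore show 25.0000 for areacode1 = HDCN; a cube6 result of exactly double that value suggests the row's contribution to the sum is counted twice when COUNT(DISTINCT ...) is added to the same aggregation.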