[jira] [Commented] (SPARK-31430) Bug in the approximate quantile computation.

2020-10-09 Thread Sean R. Owen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211286#comment-17211286
 ] 

Sean R. Owen commented on SPARK-31430:
--

Sounds good, I usually mark as a Duplicate.

> Bug in the approximate quantile computation.
> 
>
> Key: SPARK-31430
> URL: https://issues.apache.org/jira/browse/SPARK-31430
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Siddartha Naidu
>Priority: Major
> Attachments: approx_quantile_data.csv
>
>
> I am seeing a bug where passing a lower relative error to the 
> {{approxQuantile}} function leads to incorrect results when the data is 
> spread across multiple partitions. Setting a relative error of 1e-6 causes 
> it to compute equal values for the 0.9 and 1.0 quantiles. Coalescing back 
> to 1 partition gives correct results. This issue was not present in Spark 
> version 2.4.5; we noticed it when testing 3.0.0-preview.
> {{>>> from pyspark.sql import types as T}}
> {{>>> df = spark.read.csv('file:///tmp/approx_quantile_data.csv', header=True, schema=T.StructType([T.StructField('Store', T.StringType(), True), T.StructField('seconds', T.LongType(), True)]))}}
> {{>>> df = df.repartition(200, 'Store').localCheckpoint()}}
> {{>>> df.approxQuantile('seconds', [0.8, 0.9, 1.0], 0.0001)}}
> {{[1422576000.0, 1430352000.0, 1438300800.0]}}
> {{>>> df.approxQuantile('seconds', [0.8, 0.9, 1.0], 0.1)}}
> {{[1422576000.0, 1430524800.0, 1438300800.0]}}
> {color:#de350b}{{>>> df.approxQuantile('seconds', [0.8, 0.9, 1.0], 0.01)}}{color}
> {color:#de350b}{{[1422576000.0, 1438300800.0, 1438300800.0]}}{color}
> {{>>> df.coalesce(1).approxQuantile('seconds', [0.8, 0.9, 1.0], 0.01)}}
> {{[1422576000.0, 1430524800.0, 1438300800.0]}}
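
For context, the Spark documentation for approxQuantile states the guarantee as: with N rows, probability p and relative error err, the returned sample x should satisfy floor((p - err) * N) <= rank(x) <= ceil((p + err) * N). The sketch below is not part of the original report. It is a plain-Python check of that bound against an exactly sorted copy of the column, which is one way to confirm whether a result such as the highlighted 0.9 quantile violates the guarantee. The helper name within_rank_bound and the toy data are hypothetical, and ties are handled by accepting any rank the value could occupy.

```python
import math
from bisect import bisect_left, bisect_right


def within_rank_bound(sorted_values, x, p, err):
    """Check floor((p - err) * N) <= rank(x) <= ceil((p + err) * N).

    sorted_values must be sorted ascending. Ranks are 1-based. With
    duplicates, x may occupy a range of ranks; any overlap with the
    allowed interval counts as a pass (an assumption of this sketch).
    """
    n = len(sorted_values)
    lo_rank = bisect_left(sorted_values, x) + 1  # smallest rank x can occupy
    hi_rank = bisect_right(sorted_values, x)     # largest rank x can occupy
    if hi_rank < lo_rank:
        return False  # x does not occur in the data at all
    lower = math.floor((p - err) * n)
    upper = math.ceil((p + err) * n)
    return lo_rank <= upper and hi_rank >= lower


# Toy data standing in for the 'seconds' column: the integers 1..100.
values = list(range(1, 101))

# A value near the 90th rank satisfies the bound for p=0.9, err=0.01.
print(within_rank_bound(values, 90, 0.9, 0.01))
# Returning the maximum for p=0.9 (the symptom in the report) does not.
print(within_rank_bound(values, 100, 0.9, 0.01))
```

On a real DataFrame the sorted list could be obtained by collecting the column to the driver, which is only practical for small data such as the attached CSV.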



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31430) Bug in the approximate quantile computation.

2020-10-09 Thread Aoyuan Liao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211282#comment-17211282
 ] 

Aoyuan Liao commented on SPARK-31430:
-

[~srowen] This is already fixed.




[jira] [Commented] (SPARK-31430) Bug in the approximate quantile computation.

2020-10-05 Thread Vladimir (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17207920#comment-17207920
 ] 

Vladimir commented on SPARK-31430:
--

Bug fixed in https://issues.apache.org/jira/browse/SPARK-32908




[jira] [Commented] (SPARK-31430) Bug in the approximate quantile computation.

2020-05-12 Thread Karim Magomedov (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17105363#comment-17105363
 ] 

Karim Magomedov commented on SPARK-31430:
-

I'd like to work on this issue
