[jira] [Commented] (SPARK-33726) Duplicate field names causes wrong answers during aggregation

2021-02-02 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17277644#comment-17277644
 ] 

Apache Spark commented on SPARK-33726:
--

User 'yliou' has created a pull request for this issue:
https://github.com/apache/spark/pull/31447

> Duplicate field names causes wrong answers during aggregation
> -
>
>                 Key: SPARK-33726
>                 URL: https://issues.apache.org/jira/browse/SPARK-33726
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.4, 3.0.1
>            Reporter: Yian Liou
>            Assignee: Yian Liou
>            Priority: Major
>              Labels: correctness
>             Fix For: 2.4.8, 3.0.2, 3.1.1
>
>
> We saw this bug at Workday.
> Duplicate field names for different fields can cause
> org.apache.spark.sql.catalyst.expressions.RowBasedKeyValueBatch#allocate to
> return a fixed-length batch when it should return a variable-length batch,
> leading to wrong results.
> This example produces wrong results in the Spark shell:
> scala> sql("with T as (select id as a, -id as x from range(3)), U as (select id as b, cast(id as string) as x from range(3)) select T.x, U.x, min(a) as ma, min(b) as mb from T join U on a=b group by U.x, T.x").show
>
> | x  | x | ma   | mb   |
> | -2 | 2 | 0    | null |
> | -1 | 1 | null | 1    |
> | 0  | 0 | 0    | 0    |
>
> instead of the correct output:
>
> | x  | x | ma | mb |
> | 0  | 0 | 0  | 0  |
> | -2 | 2 | 2  | 2  |
> | -1 | 1 | 1  | 1  |
> The issue can be solved by iterating over the fields themselves instead of 
> field names. 
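
The failure mode described above can be sketched in isolation. The following is a minimal, hypothetical model and not Spark's code: the Field class, the exampleKey helper, and the name-to-field map in which a later duplicate overwrites an earlier one are all assumptions made for illustration. It shows how a check that resolves each field by name can conclude that every key field is fixed-length when one of two same-named fields is not, while a check that iterates over the fields themselves does not.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not Spark's classes: a field is a name plus a flag
// saying whether its type has a fixed width.
class DuplicateFieldNames {
    static final class Field {
        final String name;
        final boolean fixedLength;

        Field(String name, boolean fixedLength) {
            this.name = name;
            this.fixedLength = fixedLength;
        }
    }

    // Grouping key from the example query: U.x is a string (variable length),
    // T.x is a bigint (fixed length). Both fields are named "x".
    static List<Field> exampleKey() {
        return Arrays.asList(new Field("x", false), new Field("x", true));
    }

    // Assumed lookup: a name-to-field map in which a later duplicate
    // overwrites an earlier one.
    static boolean allFixedByName(List<Field> schema) {
        Map<String, Field> byName = new HashMap<>();
        for (Field f : schema) {
            byName.put(f.name, f);
        }
        boolean allFixed = true;
        for (Field f : schema) {
            // Resolving by name collapses both "x" fields onto one entry.
            allFixed = allFixed && byName.get(f.name).fixedLength;
        }
        return allFixed;
    }

    // Iterating over the fields themselves sees each field's own type.
    static boolean allFixedByField(List<Field> schema) {
        boolean allFixed = true;
        for (Field f : schema) {
            allFixed = allFixed && f.fixedLength;
        }
        return allFixed;
    }

    public static void main(String[] args) {
        List<Field> key = exampleKey();
        System.out.println("by name:  " + allFixedByName(key));  // true (wrong)
        System.out.println("by field: " + allFixedByField(key)); // false (correct)
    }
}
```

In this sketch the by-name check would select a fixed-length layout for a key that contains a string, which is the kind of mismatch the report attributes the wrong aggregates to.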



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33726) Duplicate field names causes wrong answers during aggregation

2021-01-25 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17271600#comment-17271600
 ] 

Apache Spark commented on SPARK-33726:
--

User 'yliou' has created a pull request for this issue:
https://github.com/apache/spark/pull/31327

> Duplicate field names causes wrong answers during aggregation
> -
>
>                 Key: SPARK-33726
>                 URL: https://issues.apache.org/jira/browse/SPARK-33726
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.4, 3.0.1
>            Reporter: Yian Liou
>            Assignee: Yian Liou
>            Priority: Major
>              Labels: correctness
>             Fix For: 3.0.2, 3.1.1
>



[jira] [Commented] (SPARK-33726) Duplicate field names causes wrong answers during aggregation

2020-12-15 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249858#comment-17249858
 ] 

Apache Spark commented on SPARK-33726:
--

User 'yliou' has created a pull request for this issue:
https://github.com/apache/spark/pull/30788

> Duplicate field names causes wrong answers during aggregation
> -
>
>                 Key: SPARK-33726
>                 URL: https://issues.apache.org/jira/browse/SPARK-33726
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.4, 3.0.1
>            Reporter: Yian Liou
>            Priority: Major
>              Labels: correctness
>



[jira] [Commented] (SPARK-33726) Duplicate field names causes wrong answers during aggregation

2020-12-09 Thread Yian Liou (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246812#comment-17246812
 ] 

Yian Liou commented on SPARK-33726:
---

Will create a PR for the issue.

> Duplicate field names causes wrong answers during aggregation
> -
>
>                 Key: SPARK-33726
>                 URL: https://issues.apache.org/jira/browse/SPARK-33726
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.4, 3.0.1
>            Reporter: Yian Liou
>            Priority: Major
>