[jira] [Commented] (SPARK-34794) Nested higher-order functions broken in DSL

2021-05-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17338411#comment-17338411
 ] 

Apache Spark commented on SPARK-34794:
--

User 'maropu' has created a pull request for this issue:
https://github.com/apache/spark/pull/32424

> Nested higher-order functions broken in DSL
> ---
>
> Key: SPARK-34794
> URL: https://issues.apache.org/jira/browse/SPARK-34794
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1, 3.2.0
> Environment: 3.1.1
>Reporter: Daniel Solow
>Priority: Major
>  Labels: correctness
>
> In Spark 3, if I have:
> {code:java}
> val df = Seq(
> (Seq(1,2,3), Seq("a", "b", "c"))
> ).toDF("numbers", "letters")
> {code}
> and I want to take the cross product of these two arrays, I can do the 
> following in SQL:
> {code:java}
> df.selectExpr("""
>     FLATTEN(
>         TRANSFORM(
>             numbers,
>             number -> TRANSFORM(
>                 letters,
>                 letter -> (number AS number, letter AS letter)
>             )
>         )
>     ) AS zipped
> """).show(false)
> +------------------------------------------------------------------------+
> |zipped                                                                  |
> +------------------------------------------------------------------------+
> |[{1, a}, {1, b}, {1, c}, {2, a}, {2, b}, {2, c}, {3, a}, {3, b}, {3, c}]|
> +------------------------------------------------------------------------+
> {code}
> This works fine. But when I try the equivalent using the Scala DSL, the 
> result is wrong:
> {code:java}
> df.select(
>     f.flatten(
>         f.transform(
>             $"numbers",
>             (number: Column) => f.transform(
>                 $"letters",
>                 (letter: Column) => f.struct(
>                     number.as("number"),
>                     letter.as("letter")
>                 )
>             )
>         )
>     ).as("zipped")
> ).show(10, false)
> +------------------------------------------------------------------------+
> |zipped                                                                  |
> +------------------------------------------------------------------------+
> |[{a, a}, {b, b}, {c, c}, {a, a}, {b, b}, {c, c}, {a, a}, {b, b}, {c, c}]|
> +------------------------------------------------------------------------+
> {code}
> Note that the numbers are not included in the output. The explain for this 
> second version is:
> {code:java}
> == Parsed Logical Plan ==
> 'Project [flatten(transform('numbers, lambdafunction(transform('letters, 
> lambdafunction(struct(NamePlaceholder, lambda 'x AS number#442, 
> NamePlaceholder, lambda 'x AS letter#443), lambda 'x, false)), lambda 'x, 
> false))) AS zipped#444]
> +- Project [_1#303 AS numbers#308, _2#304 AS letters#309]
>+- LocalRelation [_1#303, _2#304]
> == Analyzed Logical Plan ==
> zipped: array<struct<number:int,letter:string>>
> Project [flatten(transform(numbers#308, lambdafunction(transform(letters#309, 
> lambdafunction(struct(number, lambda x#446, letter, lambda x#446), lambda 
> x#446, false)), lambda x#445, false))) AS zipped#444]
> +- Project [_1#303 AS numbers#308, _2#304 AS letters#309]
>+- LocalRelation [_1#303, _2#304]
> == Optimized Logical Plan ==
> LocalRelation [zipped#444]
> == Physical Plan ==
> LocalTableScan [zipped#444]
> {code}
> It seems the lambda variable name x is hardcoded. And sure enough: 
> https://github.com/apache/spark/blob/v3.1.1/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L3647
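
The shadowing described above can be illustrated with a tiny self-contained model (plain Scala, not Spark internals; the names are illustrative): variables are matched against bindings innermost-first, so when every lambda binds the same name "x", the inner binding wins for every reference.

```scala
// Resolve a variable against a stack of (name, value) bindings,
// innermost first -- a simplified model of how an analyzer matches
// lambda variables to their binders.
def resolve(name: String, env: List[(String, String)]): Option[String] =
  env.collectFirst { case (n, v) if n == name => v }

// With a hardcoded name, both `number` and `letter` become `x`,
// and the innermost binding shadows the outer one for both fields:
val hardcoded = List("x" -> "letter", "x" -> "number") // innermost first
val both = resolve("x", hardcoded) // Some("letter") -- outer unreachable

// With a distinct name per lambda, each reference finds its own binder:
val fresh = List("x_1" -> "letter", "x_0" -> "number")
val num = resolve("x_0", fresh) // Some("number")
val let = resolve("x_1", fresh) // Some("letter")
```

This matches the observed output: every struct field resolves to the innermost lambda's value (`letter`), producing `{a, a}`, `{b, b}`, `{c, c}` instead of the cross product.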



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-34794) Nested higher-order functions broken in DSL

2021-03-18 Thread Daniel Solow (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304533#comment-17304533
 ] 

Daniel Solow commented on SPARK-34794:
--

PR is at https://github.com/apache/spark/pull/31887

Thanks to [~rspitzer] for suggesting AtomicInteger
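
A minimal sketch of the AtomicInteger idea (illustrative only, not the exact Spark internals): derive each lambda variable's name from a global counter instead of always using the literal "x", so nested lambdas can never shadow one another.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Generate a unique lambda-variable name per call; the counter
// guarantees that nested lambdas bind distinct names.
object FreshName {
  private val counter = new AtomicInteger(0)
  def freshVarName(prefix: String): String =
    s"${prefix}_${counter.getAndIncrement()}"
}

val outer = FreshName.freshVarName("x") // e.g. "x_0"
val inner = FreshName.freshVarName("x") // e.g. "x_1" -- never collides
```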




[jira] [Commented] (SPARK-34794) Nested higher-order functions broken in DSL

2021-03-18 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-34794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304532#comment-17304532
 ] 

Apache Spark commented on SPARK-34794:
--

User 'dmsolow' has created a pull request for this issue:
https://github.com/apache/spark/pull/31887
