[GitHub] spark issue #20858: [SPARK-23736][SQL] Implementation of the concat_arrays f...
Github user mn-mikke commented on the issue: https://github.com/apache/spark/pull/20858 retest please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20858: [SPARK-23736][SQL] Implementation of the concat_arrays f...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20858 ok to test
[GitHub] spark issue #20858: [SPARK-23736][SQL] Implementation of the concat_arrays f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20858 Merged build finished. Test FAILed.
[GitHub] spark issue #20858: [SPARK-23736][SQL] Implementation of the concat_arrays f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20858 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88596/ Test FAILed.
[GitHub] spark issue #20858: [SPARK-23736][SQL] Implementation of the concat_arrays f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20858

**[Test build #88596 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88596/testReport)** for PR 20858 at commit [`bb46c3d`](https://github.com/apache/spark/commit/bb46c3d3d3e18a9e05ddb6fe6efda3c25c2711a4).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class UnresolvedConcat(children: Seq[Expression]) extends Expression`
[GitHub] spark issue #20858: [SPARK-23736][SQL] Implementation of the concat_arrays f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20858 **[Test build #88596 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88596/testReport)** for PR 20858 at commit [`bb46c3d`](https://github.com/apache/spark/commit/bb46c3d3d3e18a9e05ddb6fe6efda3c25c2711a4).
[GitHub] spark issue #20858: [SPARK-23736][SQL] Implementation of the concat_arrays f...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20858 @mn-mikke Could you update the PR title?
[GitHub] spark issue #20858: [SPARK-23736][SQL] Implementation of the concat_arrays f...
Github user mn-mikke commented on the issue: https://github.com/apache/spark/pull/20858 I have merged the `concat` and `concat_arrays` functions into one, using an unresolved expression that is resolved later, once the argument types are known. Do you have any objections to this approach?
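For readers following the thread, here is a minimal sketch of the "unresolved expression, then resolution" idea. It is not the code in this PR: the simplified `DataType` hierarchy and the names `UnresolvedConcatSketch`, `ConcatStrings` and `ConcatArrays` are hypothetical stand-ins for Catalyst classes, chosen only to show how one placeholder can dispatch to a string or an array implementation once the argument types are known.

```scala
// Hypothetical, heavily simplified stand-ins for Catalyst data types.
sealed trait DataType
case object StringType extends DataType
case object IntType extends DataType
final case class ArrayType(elementType: DataType) extends DataType

// Hypothetical, simplified expression tree.
sealed trait Expr { def dataType: DataType }
final case class Literal(value: Any, dataType: DataType) extends Expr
final case class ConcatStrings(children: Seq[Expr]) extends Expr {
  def dataType: DataType = StringType
}
final case class ConcatArrays(children: Seq[Expr]) extends Expr {
  // All children share one array type, so the first child's type is the result type.
  def dataType: DataType = children.head.dataType
}

// Placeholder created before the argument types are known (assumption:
// in this sketch it is not itself an Expr, to keep the model small).
final case class UnresolvedConcatSketch(children: Seq[Expr]) {
  // Choose the concrete expression from the distinct input types.
  def resolve: Either[String, Expr] = {
    val types = children.map(_.dataType).distinct
    types match {
      case Seq(StringType)   => Right(ConcatStrings(children))
      case Seq(ArrayType(_)) => Right(ConcatArrays(children))
      case _ =>
        Left(s"cannot resolve concat for input types: ${types.mkString(", ")}")
    }
  }
}
```

In this model, mixed or empty inputs are rejected at resolution time rather than at evaluation time, which mirrors the analysis-time errors quoted elsewhere in this thread.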
[GitHub] spark issue #20858: [SPARK-23736][SQL] Implementation of the concat_arrays f...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20858 Also, PostgreSQL has the function `array_cat` for concatenating arrays, so it might be better to make the behaviour the same as PostgreSQL's: https://www.postgresql.org/docs/10/static/functions-array.html
[GitHub] spark issue #20858: [SPARK-23736][SQL] Implementation of the concat_arrays f...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20858

Should this function handle arrays with different (but compatible) element types?

```
scala> sql("select concat_arrays(array(1L, 2L), array(3, 4))").show
org.apache.spark.sql.AnalysisException: cannot resolve 'concat_arrays(array(1L, 2L), array(3, 4))' due to data type mismatch: input to function concat_arrays should all be the same type, but it's [array, array]; line 1 pos 7;
'Project [unresolvedalias(concat_arrays(array(1, 2), array(3, 4)), None)]
+- OneRowRelation
```

Also, could you add more tests for this case in `SQLQueryTestSuite`? We could probably add a new test file such as `concat_arrays.sql` in `typeCoercion.native`.
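To make the question concrete, here is a small sketch of what "different and compatible" element types could mean: find one element type that every input array can be widened to, and fail only if none exists. This is an illustrative model under stated assumptions, not Spark's type-coercion code. The `ElemType` hierarchy, the `rank` ordering and `commonElementType` are all hypothetical names.

```scala
// Hypothetical element types; numeric ones carry a rank, narrowest first.
sealed trait ElemType
sealed abstract class NumericElem(val rank: Int) extends ElemType
case object IntT extends NumericElem(1)
case object LongT extends NumericElem(2)
case object DoubleT extends NumericElem(3)
case object BoolT extends ElemType

object ArrayCoercionSketch {
  // Returns the element type all inputs can be widened to, if one exists.
  def commonElementType(types: Seq[ElemType]): Option[ElemType] = types match {
    case Seq() => None
    // Identical types need no widening.
    case _ if types.distinct.size == 1 => Some(types.head)
    // Assumption: any numeric type can be widened to the highest-ranked one.
    case _ if types.forall(_.isInstanceOf[NumericElem]) =>
      Some(types.collect { case n: NumericElem => n }.maxBy(_.rank))
    // Mixed numeric and non-numeric inputs are treated as incompatible.
    case _ => None
  }
}
```

Under this model, the `array(1L, 2L)` and `array(3, 4)` inputs from the example above would share the element type `LongT`, so the concatenation could be accepted after widening the second array.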
[GitHub] spark issue #20858: [SPARK-23736][SQL] Implementation of the concat_arrays f...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20858

The current code can't handle inner arrays:

```
scala> sql("select concat_arrays(array(1, 2, array(3, 4)), array(5, 6, 7, 8))").show
org.apache.spark.sql.AnalysisException: cannot resolve 'array(1, 2, array(3, 4))' due to data type mismatch: input to function array should all be the same type, but it's [int, int, array]; line 1 pos 21;
'Project [unresolvedalias('concat_arrays(array(1, 2, array(3, 4)), array(5, 6, 7, 8)), None)]
+- OneRowRelation
```

IMHO, it's better to make this function behave the same as PostgreSQL's: https://www.postgresql.org/docs/10/static/functions-array.html Could you update the code to handle this?
[GitHub] spark issue #20858: [SPARK-23736][SQL] Implementation of the concat_arrays f...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20858 ok, I'll check later!
[GitHub] spark issue #20858: [SPARK-23736][SQL] Implementation of the concat_arrays f...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20858 @maropu Maybe you can help @mn-mikke by reviewing this PR? I will open an umbrella JIRA for the built-in functions we plan to add in Apache Spark 2.4. The list includes several functions for operating on nested data.
[GitHub] spark issue #20858: [SPARK-23736][SQL] Implementation of the concat_arrays f...
Github user mn-mikke commented on the issue: https://github.com/apache/spark/pull/20858

@maropu What other libraries do you mean? I'm not aware of any library providing this functionality on top of Spark SQL. When using Spark SQL as an ETL tool for structured and nested data, people are forced to use UDFs for transforming arrays, since the current API for array columns is lacking. This approach has several drawbacks:
- poor code readability
- Catalyst cannot see inside UDFs when performing optimizations
- it is impossible to track the data lineage of the transformation (a key aspect for the financial industry, see [Spline](https://absaoss.github.io/spline/) and the [Spline paper](https://github.com/AbsaOSS/spline/releases/download/release%2F0.2.7/Spline_paper_IEEE_2018.pdf))

So my colleagues and I decided to extend the current Spark SQL API with well-known collection functions like concat, flatten, zipWithIndex, etc. We don't want to keep this functionality just in our fork of Spark, but would like to share it with others.
[GitHub] spark issue #20858: [SPARK-23736][SQL] Implementation of the concat_arrays f...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/20858 Thanks for this work! One question: why do you think we need to support this API natively in Spark? Do other libraries support this as a first-class feature?
[GitHub] spark issue #20858: [SPARK-23736][SQL] Implementation of the concat_arrays f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20858 Can one of the admins verify this patch?