[GitHub] spark issue #21215: [SPARK-24148][SQL] Overloading array function to support...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21215 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21215: [SPARK-24148][SQL] Overloading array function to support...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21215 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90137/ Test PASSed.
[GitHub] spark issue #21215: [SPARK-24148][SQL] Overloading array function to support...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21215 Merged build finished. Test PASSed.
[GitHub] spark issue #21215: [SPARK-24148][SQL] Overloading array function to support...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21215

**[Test build #90137 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90137/testReport)** for PR 21215 at commit [`e151ab7`](https://github.com/apache/spark/commit/e151ab7475fed32b2baaca5c0cbcf427a5c09ad3).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21215: [SPARK-24148][SQL] Overloading array function to support...
Github user mn-mikke commented on the issue: https://github.com/apache/spark/pull/21215

@maropu Really nice idea to create typed empty arrays via a `Literal` expression! On the other hand, if we consider the creation of typed empty arrays an elementary operation, I feel that the end user shouldn't have to work with classes from Catalyst internals. I've tailored the solution according to your suggestion, but I still think that some function should be introduced. What do you think?
[GitHub] spark issue #21215: [SPARK-24148][SQL] Overloading array function to support...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21215 **[Test build #90137 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90137/testReport)** for PR 21215 at commit [`e151ab7`](https://github.com/apache/spark/commit/e151ab7475fed32b2baaca5c0cbcf427a5c09ad3).
[GitHub] spark issue #21215: [SPARK-24148][SQL] Overloading array function to support...
Github user lokm01 commented on the issue: https://github.com/apache/spark/pull/21215 @maropu Thanks! Didn't know about creating a literal this way. Don't you feel that the suggested change is way more elegant?
[GitHub] spark issue #21215: [SPARK-24148][SQL] Overloading array function to support...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21215

Like this?

```
scala> val structTy = StructType.fromDDL("a ARRAY<STRUCT<b: INT, c: STRING>>")
structTy: org.apache.spark.sql.types.StructType = StructType(StructField(a,ArrayType(StructType(StructField(b,IntegerType,true), StructField(c,StringType,true)),true),true))

scala> val newCol = new Column(Literal.create(Seq.empty[Inner], structTy.head.dataType))
newCol: org.apache.spark.sql.Column = []

scala> val df = Seq(1, 2, 3).toDF("a").withColumn("b", newCol)
df: org.apache.spark.sql.DataFrame = [a: int, b: array<struct<b:int,c:string>>]

scala> df.show
+---+---+
|  a|  b|
+---+---+
|  1| []|
|  2| []|
|  3| []|
+---+---+

scala> df.printSchema
root
 |-- a: integer (nullable = false)
 |-- b: array (nullable = false)
 |    |-- element: struct (containsNull = true)
 |    |    |-- b: integer (nullable = true)
 |    |    |-- c: string (nullable = true)
```
[GitHub] spark issue #21215: [SPARK-24148][SQL] Overloading array function to support...
Github user lokm01 commented on the issue: https://github.com/apache/spark/pull/21215

@maropu That would work if you had Scala case classes for all the types. In our case, we're working on a generic framework where we only have Spark schemas (and I'd rather not generate case classes at runtime). Can you suggest an existing way to do this using Spark's `DataType`, please?
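For the schema-only case described in this comment, one possible sketch builds the empty array directly from a Spark `DataType`, so no case classes are needed. It assumes Spark 2.3 to 3.x, where `Column` has a public constructor taking a Catalyst `Expression`. The helper name `emptyArrayOf`, the object name, and the example schema are hypothetical, and `Literal` is a Catalyst-internal class rather than a stable public API.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.catalyst.expressions.Literal
import org.apache.spark.sql.types._

object EmptyTypedArray {
  // Hypothetical helper: an empty array column whose type comes from a schema,
  // not from a Scala case class. Relies on Catalyst internals (assumption:
  // Spark 2.3 to 3.x, where `new Column(expression)` is available).
  def emptyArrayOf(arrayType: ArrayType): Column =
    new Column(Literal.create(Seq.empty[Any], arrayType))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("empty-typed-array")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Illustrative element type mirroring the Inner(b, c) struct in this thread.
    val elementType = StructType(Seq(
      StructField("b", IntegerType),
      StructField("c", StringType)))
    val arrayType = ArrayType(elementType)

    val df = Seq(1, 2, 3).toDF("a").withColumn("b", emptyArrayOf(arrayType))
    df.printSchema()
    df.show()

    spark.stop()
  }
}
```

In a generic framework the `ArrayType` would typically be looked up from an existing schema, for example `schema("a").dataType`, and matched against `ArrayType` before calling the helper.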
[GitHub] spark issue #21215: [SPARK-24148][SQL] Overloading array function to support...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21215

How about this?

```
scala> val df = Seq(Outer(Seq.empty[Inner]), Outer(Seq.empty[Inner])).toDF("a")
df: org.apache.spark.sql.DataFrame = [a: array<struct<b:int,c:string>>]

scala> df.printSchema
root
 |-- a: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- b: integer (nullable = false)
 |    |    |-- c: string (nullable = true)

scala> df.show
+---+
|  a|
+---+
| []|
| []|
+---+

scala> val df = Seq(1, 2, 3).toDF("a").withColumn("b", typedLit(Seq.empty[Inner]))
df: org.apache.spark.sql.DataFrame = [a: int, b: array<struct<b:int,c:string>>]

scala> df.printSchema
root
 |-- a: integer (nullable = false)
 |-- b: array (nullable = false)
 |    |-- element: struct (containsNull = true)
 |    |    |-- b: integer (nullable = false)
 |    |    |-- c: string (nullable = true)

scala> df.show
+---+---+
|  a|  b|
+---+---+
|  1| []|
|  2| []|
|  3| []|
+---+---+
```
[GitHub] spark issue #21215: [SPARK-24148][SQL] Overloading array function to support...
Github user lokm01 commented on the issue: https://github.com/apache/spark/pull/21215

Hey @maropu,

So we've encountered a number of issues with casting:

1. Casting an empty array to an array of primitive types caused an exception on 2.2.1, but works on 2.3.0+, so that's sorted.
2. We're still facing an issue on 2.3.0 when we try to cast an empty array to an array of complex types. See the following example:

```
import org.apache.spark.sql.SparkSession

case class Outer(a: List[Inner])
case class Inner(b: Int, c: String)

object App4 extends App {
  val spark = SparkSession.builder().appName("").master("local[*]").getOrCreate()
  import spark.implicits._
  import org.apache.spark.sql.functions._

  // Empty DataFrame, used only to obtain the schema of Outer.
  val df = spark.createDataFrame(Seq[Outer]())
  // Fails at analysis time: array() cannot be cast to an array of structs.
  val r = spark.range(100).select(array().cast(df.schema("a").dataType))
  r.printSchema()
  r.show
}
```

This code produces

> Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve 'array()' due to data type mismatch: cannot cast array<string> to array<struct<b:int,c:string>>;;
[GitHub] spark issue #21215: [SPARK-24148][SQL] Overloading array function to support...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21215 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90086/ Test FAILed.
[GitHub] spark issue #21215: [SPARK-24148][SQL] Overloading array function to support...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21215 Merged build finished. Test FAILed.
[GitHub] spark issue #21215: [SPARK-24148][SQL] Overloading array function to support...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21215

**[Test build #90086 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90086/testReport)** for PR 21215 at commit [`9c12457`](https://github.com/apache/spark/commit/9c124574a3fefe2e63dcd95bd03e47f1f8d5071a).

* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21215: [SPARK-24148][SQL] Overloading array function to support...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21215

Do you wanna do this?

```
scala> sql("select array()").printSchema
root
 |-- array(): array (nullable = false)
 |    |-- element: string (containsNull = false)

scala> sql("select CAST(array() AS ARRAY<INT>) c").printSchema
root
 |-- c: array (nullable = false)
 |    |-- element: integer (containsNull = true)

scala> sql("select CAST(array() AS ARRAY<INT>) c").show
+---+
|  c|
+---+
| []|
+---+
```
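The same cast can also be written with the DataFrame API. This is a minimal sketch, assuming Spark 2.3 or later, where this thread reports that casting an empty array to an array of a primitive type works. The object name, method name, and column alias are illustrative.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.array

object CastEmptyArray {
  // Casts the untyped empty array produced by array() to array<int>.
  def emptyIntArrays(spark: SparkSession): DataFrame =
    spark.range(3).select(array().cast("array<int>").as("c"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cast-empty-array")
      .master("local[*]")
      .getOrCreate()

    val df = emptyIntArrays(spark)
    df.printSchema()
    df.show()

    spark.stop()
  }
}
```

This covers only primitive element types. For an array of structs, the cast from `array()` is rejected at analysis time on 2.3.0, as reported elsewhere in this thread.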
[GitHub] spark issue #21215: [SPARK-24148][SQL] Overloading array function to support...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21215 **[Test build #90086 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90086/testReport)** for PR 21215 at commit [`9c12457`](https://github.com/apache/spark/commit/9c124574a3fefe2e63dcd95bd03e47f1f8d5071a).
[GitHub] spark issue #21215: [SPARK-24148][SQL] Overloading array function to support...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21215 retest this please
[GitHub] spark issue #21215: [SPARK-24148][SQL] Overloading array function to support...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21215 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90079/ Test FAILed.
[GitHub] spark issue #21215: [SPARK-24148][SQL] Overloading array function to support...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21215 Merged build finished. Test FAILed.
[GitHub] spark issue #21215: [SPARK-24148][SQL] Overloading array function to support...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21215

**[Test build #90079 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90079/testReport)** for PR 21215 at commit [`9c12457`](https://github.com/apache/spark/commit/9c124574a3fefe2e63dcd95bd03e47f1f8d5071a).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21215: [SPARK-24148][SQL] Overloading array function to support...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21215 **[Test build #90079 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90079/testReport)** for PR 21215 at commit [`9c12457`](https://github.com/apache/spark/commit/9c124574a3fefe2e63dcd95bd03e47f1f8d5071a).
[GitHub] spark issue #21215: [SPARK-24148][SQL] Overloading array function to support...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21215

**[Test build #90073 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90073/testReport)** for PR 21215 at commit [`44b1852`](https://github.com/apache/spark/commit/44b18520dcf8e3e3639756cd8a12f75ea1080bee).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class CreateArray(children: Seq[Expression], defaultElementType: DataType = StringType)`
[GitHub] spark issue #21215: [SPARK-24148][SQL] Overloading array function to support...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21215 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90073/ Test FAILed.
[GitHub] spark issue #21215: [SPARK-24148][SQL] Overloading array function to support...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21215 Merged build finished. Test FAILed.
[GitHub] spark issue #21215: [SPARK-24148][SQL] Overloading array function to support...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21215 **[Test build #90073 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90073/testReport)** for PR 21215 at commit [`44b1852`](https://github.com/apache/spark/commit/44b18520dcf8e3e3639756cd8a12f75ea1080bee).
[GitHub] spark issue #21215: [SPARK-24148][SQL] Overloading array function to support...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21215 ok to test
[GitHub] spark issue #21215: [SPARK-24148][SQL] Overloading array function to support...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21215 Can one of the admins verify this patch?
[GitHub] spark issue #21215: [SPARK-24148][SQL] Overloading array function to support...
Github user mn-mikke commented on the issue: https://github.com/apache/spark/pull/21215 @lokm01 @gatorsmile @maropu @ueshin
[GitHub] spark issue #21215: [SPARK-24148][SQL] Overloading array function to support...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21215 Can one of the admins verify this patch?