[GitHub] spark pull request: [SPARK-12477][SQL] - Tungsten projection fails...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/10429#issuecomment-167266382 It looks like this accidentally broke test compilation in branch-1.5; I'm hotfixing in #10478. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12477][SQL] - Tungsten projection fails...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/10429
[GitHub] spark pull request: [SPARK-12477][SQL] - Tungsten projection fails...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/10429#issuecomment-166816809 Thanks - I've merged this in master and branch-1.6 and branch-1.5.
[GitHub] spark pull request: [SPARK-12477][SQL] - Tungsten projection fails...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10429#issuecomment-166778676 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12477][SQL] - Tungsten projection fails...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10429#issuecomment-166778680 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48223/
[GitHub] spark pull request: [SPARK-12477][SQL] - Tungsten projection fails...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10429#issuecomment-166778197 **[Test build #48223 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48223/consoleFull)** for PR 10429 at commit [`64f95ec`](https://github.com/apache/spark/commit/64f95ec3b307d44af42ee021f707904eae3a7076). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12477][SQL] - Tungsten projection fails...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10429#issuecomment-166767492 **[Test build #48223 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48223/consoleFull)** for PR 10429 at commit [`64f95ec`](https://github.com/apache/spark/commit/64f95ec3b307d44af42ee021f707904eae3a7076).
[GitHub] spark pull request: [SPARK-12477][SQL] - Tungsten projection fails...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/10429#issuecomment-166766070 test this
[GitHub] spark pull request: [SPARK-12477][SQL] - Tungsten projection fails...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10429#issuecomment-166741715 **[Test build #2250 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2250/consoleFull)** for PR 10429 at commit [`b1fc7e5`](https://github.com/apache/spark/commit/b1fc7e5d7d120411342cd2234ab1b65b096dd524). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12477][SQL] - Tungsten projection fails...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10429#issuecomment-166739024 **[Test build #2250 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2250/consoleFull)** for PR 10429 at commit [`b1fc7e5`](https://github.com/apache/spark/commit/b1fc7e5d7d120411342cd2234ab1b65b096dd524).
[GitHub] spark pull request: [SPARK-12477][SQL] - Tungsten projection fails...
Github user pierre-borckmans commented on the pull request: https://github.com/apache/spark/pull/10429#issuecomment-166737236 @rxin I fixed the test title, and the scala style issues. I ran `dev/scalastyle` successfully.
[GitHub] spark pull request: [SPARK-12477][SQL] - Tungsten projection fails...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/10429#discussion_r48297749
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameComplexTypeSuite.scala ---
@@ -43,4 +43,12 @@ class DataFrameComplexTypeSuite extends QueryTest with SharedSQLContext {
     val df = sparkContext.parallelize(Seq((1, 1))).toDF("a", "b")
     df.select(array($"a").as("s")).select(f(expr("s[0]"))).collect()
   }
+
+  test("Accessing null element in array field") {
--- End diff --
title
[GitHub] spark pull request: [SPARK-12477][SQL] - Tungsten projection fails...
Github user pierre-borckmans commented on a diff in the pull request: https://github.com/apache/spark/pull/10429#discussion_r48297482
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameComplexTypeSuite.scala ---
@@ -43,4 +43,12 @@ class DataFrameComplexTypeSuite extends QueryTest with SharedSQLContext {
     val df = sparkContext.parallelize(Seq((1, 1))).toDF("a", "b")
     df.select(array($"a").as("s")).select(f(expr("s[0]"))).collect()
   }
+
+  test("Accessing null element in array field") {
+    val df = sc.parallelize(Seq((Seq("val1",null,"val2"),Seq(Some(1),None,Some(2))))).toDF("s","i")
--- End diff --
Indeed, will do.
[GitHub] spark pull request: [SPARK-12477][SQL] - Tungsten projection fails...
Github user pierre-borckmans commented on a diff in the pull request: https://github.com/apache/spark/pull/10429#discussion_r48297441
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameComplexTypeSuite.scala ---
@@ -43,4 +43,12 @@ class DataFrameComplexTypeSuite extends QueryTest with SharedSQLContext {
     val df = sparkContext.parallelize(Seq((1, 1))).toDF("a", "b")
     df.select(array($"a").as("s")).select(f(expr("s[0]"))).collect()
   }
+
+  test("Accessing null element in array field") {
--- End diff --
@rxin You mean as the test title or as a comment?
[GitHub] spark pull request: [SPARK-12477][SQL] - Tungsten projection fails...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10429#issuecomment-166709057 **[Test build #2249 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2249/consoleFull)** for PR 10429 at commit [`3c8a795`](https://github.com/apache/spark/commit/3c8a7955dbb62648de83b9cd5595f7687092f55e). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12477][SQL] - Tungsten projection fails...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10429#issuecomment-166708237 **[Test build #2249 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2249/consoleFull)** for PR 10429 at commit [`3c8a795`](https://github.com/apache/spark/commit/3c8a7955dbb62648de83b9cd5595f7687092f55e).
[GitHub] spark pull request: [SPARK-12477][SQL] - Tungsten projection fails...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/10429#discussion_r48287722
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameComplexTypeSuite.scala ---
@@ -43,4 +43,12 @@ class DataFrameComplexTypeSuite extends QueryTest with SharedSQLContext {
     val df = sparkContext.parallelize(Seq((1, 1))).toDF("a", "b")
     df.select(array($"a").as("s")).select(f(expr("s[0]"))).collect()
   }
+
+  test("Accessing null element in array field") {
+    val df = sc.parallelize(Seq((Seq("val1",null,"val2"),Seq(Some(1),None,Some(2))))).toDF("s","i")
--- End diff --
You need to add spaces after each comma; otherwise this will fail the style check.
[GitHub] spark pull request: [SPARK-12477][SQL] - Tungsten projection fails...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/10429#discussion_r48287694
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameComplexTypeSuite.scala ---
@@ -43,4 +43,12 @@ class DataFrameComplexTypeSuite extends QueryTest with SharedSQLContext {
     val df = sparkContext.parallelize(Seq((1, 1))).toDF("a", "b")
     df.select(array($"a").as("s")).select(f(expr("s[0]"))).collect()
   }
+
+  test("Accessing null element in array field") {
--- End diff --
best to add the JIRA ticket here, i.e. "SPARK-12477 accessing null element in array field"
[GitHub] spark pull request: [SPARK-12477][SQL] - Tungsten projection fails...
Github user nongli commented on the pull request: https://github.com/apache/spark/pull/10429#issuecomment-166706210 LGTM
[GitHub] spark pull request: [SPARK-12477][SQL] - Tungsten projection fails...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/10429#issuecomment-166700171 cc @nongli can you review this?
[GitHub] spark pull request: [SPARK 12477][SQL] Tungsten projection fails f...
Github user pierre-borckmans commented on the pull request: https://github.com/apache/spark/pull/10429#issuecomment-166562978
@rxin This PR incidentally also fixes another issue. Accessing a null element in an array of IntegerType erroneously returned 0:
```
scala> val df = sc.parallelize(Seq((Seq("val1",null,"val2"),Seq(Some(1),None,Some(2))))).toDF("s","i")
df: org.apache.spark.sql.DataFrame = [s: array<string>, i: array<int>]
scala> df.selectExpr("i[1]").collect()(0)
res1: org.apache.spark.sql.Row = [0]
```
It now correctly returns null.
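The thread does not say why the integer case returned 0 rather than failing. One plausible reading, offered here only as an assumption, is that a primitive-backed array stores a default value in a null slot and records the nullness separately, so a read that skips the null check surfaces the default. The sketch below is plain Scala with no Spark dependency; `IntArrayWithNulls`, `getRaw`, `get`, and `from` are hypothetical names invented for illustration and are not Spark APIs.

```scala
// Hypothetical model (not Spark code): an int array whose nulls live in a separate mask.
final class IntArrayWithNulls(values: Array[Int], isNull: Array[Boolean]) {
  // Reads the raw slot only; a null slot shows up as the default value 0.
  def getRaw(i: Int): Int = values(i)

  // Consults the null mask first, so a null slot is reported as None.
  def get(i: Int): Option[Int] = if (isNull(i)) None else Some(values(i))
}

object IntArrayWithNulls {
  // Builds the model from optional elements, storing 0 in every null slot.
  def from(elems: Seq[Option[Int]]): IntArrayWithNulls =
    new IntArrayWithNulls(elems.map(_.getOrElse(0)).toArray, elems.map(_.isEmpty).toArray)
}
```

With `IntArrayWithNulls.from(Seq(Some(1), None, Some(2)))`, `getRaw(1)` yields `0` while `get(1)` yields `None`, which mirrors the before and after behaviour reported in the comment above.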
[GitHub] spark pull request: [SPARK 12477][SQL] Tungsten projection fails f...
Github user pierre-borckmans commented on the pull request: https://github.com/apache/spark/pull/10429#issuecomment-166561023 @rxin I added a small test, let me know if more should be added.
[GitHub] spark pull request: [SPARK 12477][SQL] Tungsten projection fails f...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/10429#issuecomment-166557339 Maybe DataFrameComplexTypeSuite?
[GitHub] spark pull request: [SPARK 12477][SQL] Tungsten projection fails f...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10429#issuecomment-166557047 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK 12477][SQL] Tungsten projection fails f...
Github user pierre-borckmans commented on the pull request: https://github.com/apache/spark/pull/10429#issuecomment-166557024 @rxin Where should it go to be sure?
[GitHub] spark pull request: [SPARK 12477][SQL] Tungsten projection fails f...
Github user pierre-borckmans commented on the pull request: https://github.com/apache/spark/pull/10429#issuecomment-166556972 @rxin Sure!
[GitHub] spark pull request: [SPARK 12477][SQL] Tungsten projection fails f...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/10429#issuecomment-166556865 Can you also add a unit test?
[GitHub] spark pull request: [SPARK 12477][SQL] Tungsten projection fails f...
GitHub user pierre-borckmans opened a pull request: https://github.com/apache/spark/pull/10429
[SPARK 12477][SQL] Tungsten projection fails for null values in array fields
Accessing null elements in an array field fails when tungsten is enabled. It works in Spark 1.3.1, and in Spark > 1.5 with Tungsten disabled. This PR solves this by checking if the accessed element in the array field is null, in the generated code.
Example:
```
// Array of String
case class AS( as: Seq[String] )
val dfAS = sc.parallelize( Seq( AS ( Seq("a",null,"b") ) ) ).toDF
dfAS.registerTempTable("T_AS")
for (i <- 0 to 2) { println(i + " = " + sqlContext.sql(s"select as[$i] from T_AS").collect.mkString(","))}
```
With Tungsten disabled:
```
0 = [a]
1 = [null]
2 = [b]
```
With Tungsten enabled:
```
15/12/22 09:32:50 ERROR Executor: Exception in task 7.0 in stage 1.0 (TID 15)
java.lang.NullPointerException
    at org.apache.spark.sql.catalyst.expressions.UnsafeRowWriters$UTF8StringWriter.getSize(UnsafeRowWriters.java:90)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
    at org.apache.spark.sql.execution.TungstenProject$$anonfun$3$$anonfun$apply$3.apply(basicOperators.scala:90)
    at org.apache.spark.sql.execution.TungstenProject$$anonfun$3$$anonfun$apply$3.apply(basicOperators.scala:88)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
```
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/pierre-borckmans/spark SPARK-12477_Tungsten-Projection-Null-Element-In-Array
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/10429.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #10429
commit b6a79e7fe73b5a1cabbc39a50fa4e47dd4f2a079
Author: pierre-borckmans
Date: 2015-12-22T08:43:55Z
    CHECK if element in array field is null
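The fix described in the pull request (check whether the accessed array element is null before using it) can be illustrated outside Spark with a minimal sketch. This is not the generated code from the patch: `NullElementSketch`, `sizeOf`, `unguardedSize`, and `guardedSize` are hypothetical names, and `sizeOf` merely stands in for any step that dereferences the element, as the `getSize` call in the stack trace does.

```scala
// Hypothetical model (not Spark's generated code) of guarded vs. unguarded element access.
object NullElementSketch {
  // Stand-in for a size computation that dereferences its argument.
  def sizeOf(s: String): Int = s.getBytes("UTF-8").length

  // Unguarded access: throws NullPointerException when the element is null.
  def unguardedSize(arr: Seq[String], i: Int): Int = sizeOf(arr(i))

  // Guarded access: test the element for null first and report null as None.
  def guardedSize(arr: Seq[String], i: Int): Option[Int] = {
    val elem = arr(i)
    if (elem == null) None else Some(sizeOf(elem))
  }

  def main(args: Array[String]): Unit = {
    val as = Seq("a", null, "b")
    for (i <- 0 to 2) println(s"$i = ${guardedSize(as, i)}")
  }
}
```

Calling `unguardedSize(Seq("a", null, "b"), 1)` fails with a `NullPointerException`, much like the Tungsten-enabled run above, while `guardedSize` returns `None` for the same index. Representing null as `Option` is a choice made for this sketch only; Spark rows carry null directly.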