[GitHub] spark pull request #13631: [SPARK-15911][SQL] Remove the additional Project ...
Github user viirya closed the pull request at: https://github.com/apache/spark/pull/13631

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13631: [SPARK-15911][SQL] Remove the additional Project ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13631#discussion_r67091914

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -470,20 +470,16 @@ class Analyzer(
       // Assume partition columns are correctly placed at the end of the child's output
       i.copy(table = EliminateSubqueryAliases(table))
     } else {
-      // Set up the table's partition scheme with all dynamic partitions by moving partition
-      // columns to the end of the column list, in partition order.
-      val (inputPartCols, columns) = child.output.partition { attr =>
-        tablePartitionNames.contains(attr.name)
-      }
       // All partition columns are dynamic because this InsertIntoTable had no partitioning
-      val partColumns = tablePartitionNames.map { name =>
-        inputPartCols.find(_.name == name).getOrElse(
-          throw new AnalysisException(s"Cannot find partition column $name"))
+      tablePartitionNames.filterNot { name =>
+        child.output.exists(_.name == name)
--- End diff --

Hmm, indeed. As we now resolve by ordering rather than by name, this check is not needed anymore. I will remove it.
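To illustrate the point in the comment above, here is a hypothetical, simplified sketch (not the actual Analyzer code; the object and method names are invented for illustration) contrasting name-based resolution, which requires a "cannot find partition column" existence check, with ordering-based resolution, which makes that check unnecessary:

```scala
// Hypothetical sketch contrasting the two resolution strategies.
object PartitionColumnCheck {
  // Name-based resolution: every table partition name must be found among the
  // input columns, so a missing-column check (here, an exception) is required.
  def byName(inputCols: Seq[String], partNames: Seq[String]): Seq[String] =
    partNames.map { name =>
      inputCols.find(_ == name).getOrElse(
        throw new IllegalArgumentException(s"Cannot find partition column $name"))
    }

  // Ordering-based resolution: the trailing input columns are bound to the
  // partition columns purely by position, so no name lookup (and hence no
  // existence check) is needed.
  def byOrdering(inputCols: Seq[String], partNames: Seq[String]): Seq[String] =
    inputCols.takeRight(partNames.length)
}
```

With ordering-based resolution, `byOrdering(Seq("a", "b", "x", "y"), Seq("p1", "p2"))` simply yields `Seq("x", "y")` even though no input column is named `p1` or `p2`, which is why the name-existence check in the diff can be dropped.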
[GitHub] spark pull request #13631: [SPARK-15911][SQL] Remove the additional Project ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13631#discussion_r67008835

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -470,20 +470,16 @@ class Analyzer(
       // Assume partition columns are correctly placed at the end of the child's output
       i.copy(table = EliminateSubqueryAliases(table))
     } else {
-      // Set up the table's partition scheme with all dynamic partitions by moving partition
-      // columns to the end of the column list, in partition order.
-      val (inputPartCols, columns) = child.output.partition { attr =>
-        tablePartitionNames.contains(attr.name)
-      }
       // All partition columns are dynamic because this InsertIntoTable had no partitioning
-      val partColumns = tablePartitionNames.map { name =>
-        inputPartCols.find(_.name == name).getOrElse(
-          throw new AnalysisException(s"Cannot find partition column $name"))
+      tablePartitionNames.filterNot { name =>
+        child.output.exists(_.name == name)
--- End diff --

Do we really need this check?
[GitHub] spark pull request #13631: [SPARK-15911][SQL] Remove the additional Project ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13631#discussion_r66929775

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -480,10 +480,10 @@ class Analyzer(
         inputPartCols.find(_.name == name).getOrElse(
           throw new AnalysisException(s"Cannot find partition column $name"))
       }
+      // Assume partition columns are correctly placed at the end of the child's output
       i.copy(
         table = EliminateSubqueryAliases(table),
-        partition = tablePartitionNames.map(_ -> None).toMap,
-        child = Project(columns ++ partColumns, child))
--- End diff --

Yea, simplified now.
[GitHub] spark pull request #13631: [SPARK-15911][SQL] Remove the additional Project ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13631#discussion_r66927656

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala ---
@@ -257,27 +257,6 @@ class InsertIntoHiveTableSuite extends QueryTest with TestHiveSingleton with Bef
     }
   }

-  test("Detect table partitioning with correct partition order") {
--- End diff --

It was added by PR #12239.
[GitHub] spark pull request #13631: [SPARK-15911][SQL] Remove the additional Project ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13631#discussion_r66827140

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -480,10 +480,10 @@ class Analyzer(
         inputPartCols.find(_.name == name).getOrElse(
           throw new AnalysisException(s"Cannot find partition column $name"))
       }
+      // Assume partition columns are correctly placed at the end of the child's output
      i.copy(
         table = EliminateSubqueryAliases(table),
-        partition = tablePartitionNames.map(_ -> None).toMap,
-        child = Project(columns ++ partColumns, child))
--- End diff --

Looks like the else branch can be simplified even further?
[GitHub] spark pull request #13631: [SPARK-15911][SQL] Remove the additional Project ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13631#discussion_r66826931

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala ---
@@ -257,27 +257,6 @@ class InsertIntoHiveTableSuite extends QueryTest with TestHiveSingleton with Bef
     }
   }

-  test("Detect table partitioning with correct partition order") {
--- End diff --

Can you link to the PR that added this test?
[GitHub] spark pull request #13631: [SPARK-15911][SQL] Remove the additional Project ...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/13631

[SPARK-15911][SQL] Remove the additional Project to be consistent with SQL

## What changes were proposed in this pull request?

Currently, in `DataFrameWriter`'s `insertInto` and the `Analyzer`'s `ResolveRelations`, we add an additional Project to adjust column ordering. However, this resolution should use column ordering rather than column names; this is how Hive handles dynamic partitioning.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 inserttable

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13631.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #13631

commit 5f4455ae3400302c4f3cb019419dbdada4edf5c9
Author: Liang-Chi Hsieh
Date: 2016-06-13T04:00:40Z

    Remove the additional Project to be consistent with SQL.