[GitHub] spark pull request #13631: [SPARK-15911][SQL] Remove the additional Project ...

2016-06-20 Thread viirya
Github user viirya closed the pull request at:

https://github.com/apache/spark/pull/13631





[GitHub] spark pull request #13631: [SPARK-15911][SQL] Remove the additional Project ...

2016-06-14 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/13631#discussion_r67091914
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -470,20 +470,16 @@ class Analyzer(
           // Assume partition columns are correctly placed at the end of the child's output
           i.copy(table = EliminateSubqueryAliases(table))
         } else {
-          // Set up the table's partition scheme with all dynamic partitions by moving partition
-          // columns to the end of the column list, in partition order.
-          val (inputPartCols, columns) = child.output.partition { attr =>
-            tablePartitionNames.contains(attr.name)
-          }
           // All partition columns are dynamic because this InsertIntoTable had no partitioning
-          val partColumns = tablePartitionNames.map { name =>
-            inputPartCols.find(_.name == name).getOrElse(
-              throw new AnalysisException(s"Cannot find partition column $name"))
+          tablePartitionNames.filterNot { name =>
+            child.output.exists(_.name == name)
--- End diff --

Hmm, indeed. Since we now resolve partition columns by ordering rather than by name, this check is not needed anymore. I will remove it.
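
For context, a small self-contained sketch (illustrative names only, not the actual Analyzer code) of the existence check being dropped, and why it becomes moot once resolution is purely positional:

```scala
// Hypothetical illustration of the check under discussion: it reports any
// table partition column whose name is missing from the child's output.
object MissingPartitionCheckSketch {
  case class Attr(name: String)

  def missingPartitionColumns(
      childOutput: Seq[Attr],
      tablePartitionNames: Seq[String]): Seq[String] =
    tablePartitionNames.filterNot(name => childOutput.exists(_.name == name))

  def main(args: Array[String]): Unit = {
    val childOutput = Seq(Attr("id"), Attr("value"), Attr("event_date"))
    // Under name-based resolution this mismatch would have been an error; once
    // partition columns are taken purely by position, names never need to match.
    println(missingPartitionColumns(childOutput, Seq("dt"))) // List(dt)
  }
}
```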





[GitHub] spark pull request #13631: [SPARK-15911][SQL] Remove the additional Project ...

2016-06-14 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13631#discussion_r67008835
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -470,20 +470,16 @@ class Analyzer(
           // Assume partition columns are correctly placed at the end of the child's output
           i.copy(table = EliminateSubqueryAliases(table))
         } else {
-          // Set up the table's partition scheme with all dynamic partitions by moving partition
-          // columns to the end of the column list, in partition order.
-          val (inputPartCols, columns) = child.output.partition { attr =>
-            tablePartitionNames.contains(attr.name)
-          }
           // All partition columns are dynamic because this InsertIntoTable had no partitioning
-          val partColumns = tablePartitionNames.map { name =>
-            inputPartCols.find(_.name == name).getOrElse(
-              throw new AnalysisException(s"Cannot find partition column $name"))
+          tablePartitionNames.filterNot { name =>
+            child.output.exists(_.name == name)
--- End diff --

Do we really need this check?





[GitHub] spark pull request #13631: [SPARK-15911][SQL] Remove the additional Project ...

2016-06-14 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/13631#discussion_r66929775
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -480,10 +480,10 @@ class Analyzer(
             inputPartCols.find(_.name == name).getOrElse(
               throw new AnalysisException(s"Cannot find partition column $name"))
           }
+          // Assume partition columns are correctly placed at the end of the child's output
           i.copy(
             table = EliminateSubqueryAliases(table),
-            partition = tablePartitionNames.map(_ -> None).toMap,
-            child = Project(columns ++ partColumns, child))
--- End diff --

Yeah, it's simplified now.





[GitHub] spark pull request #13631: [SPARK-15911][SQL] Remove the additional Project ...

2016-06-14 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/13631#discussion_r66927656
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala ---
@@ -257,27 +257,6 @@ class InsertIntoHiveTableSuite extends QueryTest with TestHiveSingleton with Bef
       }
   }

-  test("Detect table partitioning with correct partition order") {
--- End diff --

It was added by PR #12239.





[GitHub] spark pull request #13631: [SPARK-15911][SQL] Remove the additional Project ...

2016-06-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13631#discussion_r66827140
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -480,10 +480,10 @@ class Analyzer(
             inputPartCols.find(_.name == name).getOrElse(
               throw new AnalysisException(s"Cannot find partition column $name"))
           }
+          // Assume partition columns are correctly placed at the end of the child's output
           i.copy(
             table = EliminateSubqueryAliases(table),
-            partition = tablePartitionNames.map(_ -> None).toMap,
-            child = Project(columns ++ partColumns, child))
--- End diff --

Looks like the else branch can be simplified even further?
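
For intuition only, a toy model of what dropping the extra Project means (the mini plan types below are made up for illustration, not the real Catalyst `LogicalPlan`, `Project`, or `InsertIntoTable` classes): instead of wrapping the child in a reordering Project, the insert keeps the child as-is and relies on column position.

```scala
// Toy, self-contained sketch of the simplification being discussed; all types
// here are hypothetical stand-ins for the Catalyst plan nodes.
object DropExtraProjectSketch {
  sealed trait Plan
  case class Relation(output: Seq[String]) extends Plan
  case class Project(output: Seq[String], child: Plan) extends Plan
  case class Insert(table: String, partitionNames: Seq[String], child: Plan) extends Plan

  // Old shape: move partition columns to the end (by name) via an extra Project.
  def withProject(table: String, parts: Seq[String], child: Relation): Insert = {
    val dataCols = child.output.filterNot(c => parts.contains(c))
    Insert(table, parts, Project(dataCols ++ parts, child))
  }

  // New shape: trust the child's column order (Hive-style) and skip the Project.
  def withoutProject(table: String, parts: Seq[String], child: Relation): Insert =
    Insert(table, parts, child)

  def main(args: Array[String]): Unit = {
    val child = Relation(Seq("id", "value", "p"))
    println(withProject("t", Seq("p"), child))    // child wrapped in a Project
    println(withoutProject("t", Seq("p"), child)) // child passed through untouched
  }
}
```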





[GitHub] spark pull request #13631: [SPARK-15911][SQL] Remove the additional Project ...

2016-06-13 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13631#discussion_r66826931
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala ---
@@ -257,27 +257,6 @@ class InsertIntoHiveTableSuite extends QueryTest with TestHiveSingleton with Bef
       }
   }

-  test("Detect table partitioning with correct partition order") {
--- End diff --

can you link to the PR that added this test?





[GitHub] spark pull request #13631: [SPARK-15911][SQL] Remove the additional Project ...

2016-06-12 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/13631

[SPARK-15911][SQL] Remove the additional Project to be consistent with SQL

## What changes were proposed in this pull request?

Currently, in `DataFrameWriter`'s `insertInto` and in `Analyzer`'s `ResolveRelations`, we add an additional Project to adjust column ordering. However, this resolution should be based on column ordering, not column names, which is how Hive handles dynamic partitions.
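
For illustration, a hedged sketch of the position-based semantics at the DataFrame level (the table name `t`, its schema, and the sample data are made up for this example; it assumes Hive support is enabled):

```scala
import org.apache.spark.sql.SparkSession

object InsertByPositionExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("insert-by-position")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Hypothetical table: CREATE TABLE t (id INT, value STRING) PARTITIONED BY (p STRING)
    val df = Seq((1, "a", "2016-06-12"), (2, "b", "2016-06-13")).toDF("id", "value", "dt")

    // With ordering-based resolution, the trailing column feeds the table's
    // partition column by position, regardless of its name ("dt" here vs. "p"),
    // matching Hive's dynamic-partition behavior.
    df.write.insertInto("t")

    spark.stop()
  }
}
```

Depending on the Hive configuration, dynamic-partition inserts may also require `hive.exec.dynamic.partition=true` and `hive.exec.dynamic.partition.mode=nonstrict`.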

## How was this patch tested?
Existing tests.




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 inserttable

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13631.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13631


commit 5f4455ae3400302c4f3cb019419dbdada4edf5c9
Author: Liang-Chi Hsieh 
Date:   2016-06-13T04:00:40Z

Remove the additional Project to be consistent with SQL.



