[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-176686941
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-176686945
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50368/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-176686339
  
**[Test build #50368 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50368/consoleFull)**
 for PR 10630 at commit 
[`b600089`](https://github.com/apache/spark/commit/b600089d4736e9768a8968fb4dead08d28014ac1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-29 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-176653934
  
Thank you! Just cleaned the codes. : )


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-176659772
  
**[Test build #50368 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50368/consoleFull)**
 for PR 10630 at commit 
[`b600089`](https://github.com/apache/spark/commit/b600089d4736e9768a8968fb4dead08d28014ac1).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-29 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-176656084
  
LGTM, pending test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-29 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-176921008
  
Thanks - I'm going to merge this.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-29 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/10630


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-28 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-176628957
  
LGTM. we can merge it first and @gatorsmile can address remaining comments 
in a follow-up PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-28 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-176630203
  
This is not that big. Let's just do it together here.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10630#discussion_r51233504
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceOperatorSuite.scala
 ---
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.dsl.expressions._
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.plans.{LeftSemi, PlanTest}
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.rules.RuleExecutor
+
+class ReplaceOperatorSuite extends PlanTest {
+
+  object Optimize extends RuleExecutor[LogicalPlan] {
+val batches =
+  Batch("Intersect", FixedPoint(100),
+ReplaceIntersectWithSemiJoin) :: Nil
+  }
+
+  test("replace Intersect with Left-semi Join") {
--- End diff --

also move the test for `ReplaceDistinctWithAggregate` here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10630#discussion_r51233589
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala
 ---
@@ -270,7 +270,8 @@ class AnalysisErrorSuite extends AnalysisTest {
 val error = intercept[AnalysisException] {
   SimpleAnalyzer.checkAnalysis(join)
 }
-assert(error.message.contains("Failure when resolving conflicting 
references in Join"))
+assert(error.message.contains("Failure when resolving conflicting 
references\n" +
--- End diff --

we can revert this after revert the error message change.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10630#discussion_r51233135
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 ---
@@ -214,12 +214,22 @@ trait CheckAnalysis {
   s"""Only a single table generating function is allowed in a 
SELECT clause, found:
  | 
${exprs.map(_.prettyString).mkString(",")}""".stripMargin)
 
-  // Special handling for cases when self-join introduce duplicate 
expression ids.
-  case j @ Join(left, right, _, _) if 
left.outputSet.intersect(right.outputSet).nonEmpty =>
-val conflictingAttributes = 
left.outputSet.intersect(right.outputSet)
+  case j: Join if !j.duplicateResolved =>
+val conflictingAttributes = 
j.left.outputSet.intersect(j.right.outputSet)
 failAnalysis(
   s"""
- |Failure when resolving conflicting references in Join:
--- End diff --

now we can keep this message as it only checks join :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10630#discussion_r51233162
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 ---
@@ -214,12 +214,22 @@ trait CheckAnalysis {
   s"""Only a single table generating function is allowed in a 
SELECT clause, found:
  | 
${exprs.map(_.prettyString).mkString(",")}""".stripMargin)
 
-  // Special handling for cases when self-join introduce duplicate 
expression ids.
-  case j @ Join(left, right, _, _) if 
left.outputSet.intersect(right.outputSet).nonEmpty =>
-val conflictingAttributes = 
left.outputSet.intersect(right.outputSet)
+  case j: Join if !j.duplicateResolved =>
+val conflictingAttributes = 
j.left.outputSet.intersect(j.right.outputSet)
 failAnalysis(
   s"""
- |Failure when resolving conflicting references in Join:
+ |Failure when resolving conflicting references
+ |in operator ${operator.simpleString}:
+ |$plan
+ |Conflicting attributes: 
${conflictingAttributes.mkString(",")}
+ |""".stripMargin)
+
+  case i: Intersect if !i.duplicateResolved =>
+val conflictingAttributes = 
i.left.outputSet.intersect(i.right.outputSet)
+failAnalysis(
+  s"""
+ |Failure when resolving conflicting references
+ |in operator ${operator.simpleString}:
--- End diff --

same here, we could say `Intersect` directly


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10630#discussion_r51233323
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 ---
@@ -315,4 +315,7 @@ abstract class BinaryNode extends LogicalPlan {
   def right: LogicalPlan
 
   override def children: Seq[LogicalPlan] = Seq(left, right)
+
+  override lazy val resolved: Boolean =
+expressions.forall(_.resolved) && childrenResolved
--- End diff --

why override this? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-176441985
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-176441989
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50298/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-176487231
  
**[Test build #50313 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50313/consoleFull)**
 for PR 10630 at commit 
[`e51de8f`](https://github.com/apache/spark/commit/e51de8f98b25281b338ca57f0f13645b626c7673).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-176507818
  
**[Test build #50313 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50313/consoleFull)**
 for PR 10630 at commit 
[`e51de8f`](https://github.com/apache/spark/commit/e51de8f98b25281b338ca57f0f13645b626c7673).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `abstract class SetOperation(left: LogicalPlan, right: LogicalPlan) 
extends BinaryNode`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-176508356
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50313/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-28 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-176482358
  
retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-176508353
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10630#discussion_r51091050
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -388,57 +445,18 @@ class Analyzer(
 .map(_.asInstanceOf[NamedExpression])
 a.copy(aggregateExpressions = expanded)
 
-  // Special handling for cases when self-join introduce duplicate 
expression ids.
-  case j @ Join(left, right, _, _) if !j.selfJoinResolved =>
-val conflictingAttributes = 
left.outputSet.intersect(right.outputSet)
-logDebug(s"Conflicting attributes 
${conflictingAttributes.mkString(",")} in $j")
-
-right.collect {
-  // Handle base relations that might appear more than once.
-  case oldVersion: MultiInstanceRelation
-  if 
oldVersion.outputSet.intersect(conflictingAttributes).nonEmpty =>
-val newVersion = oldVersion.newInstance()
-(oldVersion, newVersion)
-
-  // Handle projects that create conflicting aliases.
-  case oldVersion @ Project(projectList, _)
-  if 
findAliases(projectList).intersect(conflictingAttributes).nonEmpty =>
-(oldVersion, oldVersion.copy(projectList = 
newAliases(projectList)))
-
-  case oldVersion @ Aggregate(_, aggregateExpressions, _)
-  if 
findAliases(aggregateExpressions).intersect(conflictingAttributes).nonEmpty =>
-(oldVersion, oldVersion.copy(aggregateExpressions = 
newAliases(aggregateExpressions)))
-
-  case oldVersion: Generate
-  if 
oldVersion.generatedSet.intersect(conflictingAttributes).nonEmpty =>
-val newOutput = oldVersion.generatorOutput.map(_.newInstance())
-(oldVersion, oldVersion.copy(generatorOutput = newOutput))
-
-  case oldVersion @ Window(_, windowExpressions, _, _, child)
-  if 
AttributeSet(windowExpressions.map(_.toAttribute)).intersect(conflictingAttributes)
-.nonEmpty =>
-(oldVersion, oldVersion.copy(windowExpressions = 
newAliases(windowExpressions)))
-}
-// Only handle first case, others will be fixed on the next pass.
-.headOption match {
-  case None =>
-/*
- * No result implies that there is a logical plan node that 
produces new references
- * that this rule cannot handle. When that is the case, there 
must be another rule
- * that resolves these conflicts. Otherwise, the analysis will 
fail.
- */
-j
-  case Some((oldRelation, newRelation)) =>
-val attributeRewrites = 
AttributeMap(oldRelation.output.zip(newRelation.output))
-val newRight = right transformUp {
-  case r if r == oldRelation => newRelation
-} transformUp {
-  case other => other transformExpressions {
-case a: Attribute => attributeRewrites.get(a).getOrElse(a)
-  }
-}
-j.copy(right = newRight)
-}
+  // To resolve duplicate expression IDs for all the BinaryNode
+  case b: BinaryNode if !b.duplicateResolved => b match {
+case j @ Join(left, right, _, _) =>
+  j.copy(right = dedupRight(left, right))
+case i @ Intersect(left, right) =>
+  i.copy(right = dedupRight(left, right))
+case e @ Except(left, right) =>
+  e.copy(right = dedupRight(left, right))
+case cg: CoGroup =>
--- End diff --

ah makes sense, let's add some comment to explain this :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-28 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/10630#discussion_r51163511
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -388,57 +445,18 @@ class Analyzer(
 .map(_.asInstanceOf[NamedExpression])
 a.copy(aggregateExpressions = expanded)
 
-  // Special handling for cases when self-join introduce duplicate 
expression ids.
-  case j @ Join(left, right, _, _) if !j.selfJoinResolved =>
-val conflictingAttributes = 
left.outputSet.intersect(right.outputSet)
-logDebug(s"Conflicting attributes 
${conflictingAttributes.mkString(",")} in $j")
-
-right.collect {
-  // Handle base relations that might appear more than once.
-  case oldVersion: MultiInstanceRelation
-  if 
oldVersion.outputSet.intersect(conflictingAttributes).nonEmpty =>
-val newVersion = oldVersion.newInstance()
-(oldVersion, newVersion)
-
-  // Handle projects that create conflicting aliases.
-  case oldVersion @ Project(projectList, _)
-  if 
findAliases(projectList).intersect(conflictingAttributes).nonEmpty =>
-(oldVersion, oldVersion.copy(projectList = 
newAliases(projectList)))
-
-  case oldVersion @ Aggregate(_, aggregateExpressions, _)
-  if 
findAliases(aggregateExpressions).intersect(conflictingAttributes).nonEmpty =>
-(oldVersion, oldVersion.copy(aggregateExpressions = 
newAliases(aggregateExpressions)))
-
-  case oldVersion: Generate
-  if 
oldVersion.generatedSet.intersect(conflictingAttributes).nonEmpty =>
-val newOutput = oldVersion.generatorOutput.map(_.newInstance())
-(oldVersion, oldVersion.copy(generatorOutput = newOutput))
-
-  case oldVersion @ Window(_, windowExpressions, _, _, child)
-  if 
AttributeSet(windowExpressions.map(_.toAttribute)).intersect(conflictingAttributes)
-.nonEmpty =>
-(oldVersion, oldVersion.copy(windowExpressions = 
newAliases(windowExpressions)))
-}
-// Only handle first case, others will be fixed on the next pass.
-.headOption match {
-  case None =>
-/*
- * No result implies that there is a logical plan node that 
produces new references
- * that this rule cannot handle. When that is the case, there 
must be another rule
- * that resolves these conflicts. Otherwise, the analysis will 
fail.
- */
-j
-  case Some((oldRelation, newRelation)) =>
-val attributeRewrites = 
AttributeMap(oldRelation.output.zip(newRelation.output))
-val newRight = right transformUp {
-  case r if r == oldRelation => newRelation
-} transformUp {
-  case other => other transformExpressions {
-case a: Attribute => attributeRewrites.get(a).getOrElse(a)
-  }
-}
-j.copy(right = newRight)
-}
+  // To resolve duplicate expression IDs for all the BinaryNode
+  case b: BinaryNode if !b.duplicateResolved => b match {
+case j @ Join(left, right, _, _) =>
+  j.copy(right = dedupRight(left, right))
+case i @ Intersect(left, right) =>
+  i.copy(right = dedupRight(left, right))
+case e @ Except(left, right) =>
+  e.copy(right = dedupRight(left, right))
+case cg: CoGroup =>
--- End diff --

Will do. : )


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-27 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10630#discussion_r51061765
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -388,57 +445,18 @@ class Analyzer(
 .map(_.asInstanceOf[NamedExpression])
 a.copy(aggregateExpressions = expanded)
 
-  // Special handling for cases when self-join introduce duplicate 
expression ids.
-  case j @ Join(left, right, _, _) if !j.selfJoinResolved =>
-val conflictingAttributes = 
left.outputSet.intersect(right.outputSet)
-logDebug(s"Conflicting attributes 
${conflictingAttributes.mkString(",")} in $j")
-
-right.collect {
-  // Handle base relations that might appear more than once.
-  case oldVersion: MultiInstanceRelation
-  if 
oldVersion.outputSet.intersect(conflictingAttributes).nonEmpty =>
-val newVersion = oldVersion.newInstance()
-(oldVersion, newVersion)
-
-  // Handle projects that create conflicting aliases.
-  case oldVersion @ Project(projectList, _)
-  if 
findAliases(projectList).intersect(conflictingAttributes).nonEmpty =>
-(oldVersion, oldVersion.copy(projectList = 
newAliases(projectList)))
-
-  case oldVersion @ Aggregate(_, aggregateExpressions, _)
-  if 
findAliases(aggregateExpressions).intersect(conflictingAttributes).nonEmpty =>
-(oldVersion, oldVersion.copy(aggregateExpressions = 
newAliases(aggregateExpressions)))
-
-  case oldVersion: Generate
-  if 
oldVersion.generatedSet.intersect(conflictingAttributes).nonEmpty =>
-val newOutput = oldVersion.generatorOutput.map(_.newInstance())
-(oldVersion, oldVersion.copy(generatorOutput = newOutput))
-
-  case oldVersion @ Window(_, windowExpressions, _, _, child)
-  if 
AttributeSet(windowExpressions.map(_.toAttribute)).intersect(conflictingAttributes)
-.nonEmpty =>
-(oldVersion, oldVersion.copy(windowExpressions = 
newAliases(windowExpressions)))
-}
-// Only handle first case, others will be fixed on the next pass.
-.headOption match {
-  case None =>
-/*
- * No result implies that there is a logical plan node that 
produces new references
- * that this rule cannot handle. When that is the case, there 
must be another rule
- * that resolves these conflicts. Otherwise, the analysis will 
fail.
- */
-j
-  case Some((oldRelation, newRelation)) =>
-val attributeRewrites = 
AttributeMap(oldRelation.output.zip(newRelation.output))
-val newRight = right transformUp {
-  case r if r == oldRelation => newRelation
-} transformUp {
-  case other => other transformExpressions {
-case a: Attribute => attributeRewrites.get(a).getOrElse(a)
-  }
-}
-j.copy(right = newRight)
-}
+  // To resolve duplicate expression IDs for all the BinaryNode
+  case b: BinaryNode if !b.duplicateResolved => b match {
+case j @ Join(left, right, _, _) =>
+  j.copy(right = dedupRight(left, right))
+case i @ Intersect(left, right) =>
+  i.copy(right = dedupRight(left, right))
+case e @ Except(left, right) =>
+  e.copy(right = dedupRight(left, right))
+case cg: CoGroup =>
--- End diff --

For other operators, can we construct test cases that can make them fail 
without de-duplication? If we can, then we should create a JIRA and fix it in 
another PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/10630#discussion_r51059167
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -388,57 +445,18 @@ class Analyzer(
 .map(_.asInstanceOf[NamedExpression])
 a.copy(aggregateExpressions = expanded)
 
-  // Special handling for cases when self-join introduce duplicate 
expression ids.
-  case j @ Join(left, right, _, _) if !j.selfJoinResolved =>
-val conflictingAttributes = 
left.outputSet.intersect(right.outputSet)
-logDebug(s"Conflicting attributes 
${conflictingAttributes.mkString(",")} in $j")
-
-right.collect {
-  // Handle base relations that might appear more than once.
-  case oldVersion: MultiInstanceRelation
-  if 
oldVersion.outputSet.intersect(conflictingAttributes).nonEmpty =>
-val newVersion = oldVersion.newInstance()
-(oldVersion, newVersion)
-
-  // Handle projects that create conflicting aliases.
-  case oldVersion @ Project(projectList, _)
-  if 
findAliases(projectList).intersect(conflictingAttributes).nonEmpty =>
-(oldVersion, oldVersion.copy(projectList = 
newAliases(projectList)))
-
-  case oldVersion @ Aggregate(_, aggregateExpressions, _)
-  if 
findAliases(aggregateExpressions).intersect(conflictingAttributes).nonEmpty =>
-(oldVersion, oldVersion.copy(aggregateExpressions = 
newAliases(aggregateExpressions)))
-
-  case oldVersion: Generate
-  if 
oldVersion.generatedSet.intersect(conflictingAttributes).nonEmpty =>
-val newOutput = oldVersion.generatorOutput.map(_.newInstance())
-(oldVersion, oldVersion.copy(generatorOutput = newOutput))
-
-  case oldVersion @ Window(_, windowExpressions, _, _, child)
-  if 
AttributeSet(windowExpressions.map(_.toAttribute)).intersect(conflictingAttributes)
-.nonEmpty =>
-(oldVersion, oldVersion.copy(windowExpressions = 
newAliases(windowExpressions)))
-}
-// Only handle first case, others will be fixed on the next pass.
-.headOption match {
-  case None =>
-/*
- * No result implies that there is a logical plan node that 
produces new references
- * that this rule cannot handle. When that is the case, there 
must be another rule
- * that resolves these conflicts. Otherwise, the analysis will 
fail.
- */
-j
-  case Some((oldRelation, newRelation)) =>
-val attributeRewrites = 
AttributeMap(oldRelation.output.zip(newRelation.output))
-val newRight = right transformUp {
-  case r if r == oldRelation => newRelation
-} transformUp {
-  case other => other transformExpressions {
-case a: Attribute => attributeRewrites.get(a).getOrElse(a)
-  }
-}
-j.copy(right = newRight)
-}
+  // To resolve duplicate expression IDs for all the BinaryNode
+  case b: BinaryNode if !b.duplicateResolved => b match {
+case j @ Join(left, right, _, _) =>
+  j.copy(right = dedupRight(left, right))
+case i @ Intersect(left, right) =>
+  i.copy(right = dedupRight(left, right))
+case e @ Except(left, right) =>
+  e.copy(right = dedupRight(left, right))
+case cg: CoGroup =>
--- End diff --

In this case, it should work! Let me know if we should deduplicate the 
expression IDs for the other operators. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/10630#discussion_r51084606
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -388,57 +445,18 @@ class Analyzer(
 .map(_.asInstanceOf[NamedExpression])
 a.copy(aggregateExpressions = expanded)
 
-  // Special handling for cases when self-join introduce duplicate 
expression ids.
-  case j @ Join(left, right, _, _) if !j.selfJoinResolved =>
-val conflictingAttributes = 
left.outputSet.intersect(right.outputSet)
-logDebug(s"Conflicting attributes 
${conflictingAttributes.mkString(",")} in $j")
-
-right.collect {
-  // Handle base relations that might appear more than once.
-  case oldVersion: MultiInstanceRelation
-  if 
oldVersion.outputSet.intersect(conflictingAttributes).nonEmpty =>
-val newVersion = oldVersion.newInstance()
-(oldVersion, newVersion)
-
-  // Handle projects that create conflicting aliases.
-  case oldVersion @ Project(projectList, _)
-  if 
findAliases(projectList).intersect(conflictingAttributes).nonEmpty =>
-(oldVersion, oldVersion.copy(projectList = 
newAliases(projectList)))
-
-  case oldVersion @ Aggregate(_, aggregateExpressions, _)
-  if 
findAliases(aggregateExpressions).intersect(conflictingAttributes).nonEmpty =>
-(oldVersion, oldVersion.copy(aggregateExpressions = 
newAliases(aggregateExpressions)))
-
-  case oldVersion: Generate
-  if 
oldVersion.generatedSet.intersect(conflictingAttributes).nonEmpty =>
-val newOutput = oldVersion.generatorOutput.map(_.newInstance())
-(oldVersion, oldVersion.copy(generatorOutput = newOutput))
-
-  case oldVersion @ Window(_, windowExpressions, _, _, child)
-  if 
AttributeSet(windowExpressions.map(_.toAttribute)).intersect(conflictingAttributes)
-.nonEmpty =>
-(oldVersion, oldVersion.copy(windowExpressions = 
newAliases(windowExpressions)))
-}
-// Only handle first case, others will be fixed on the next pass.
-.headOption match {
-  case None =>
-/*
- * No result implies that there is a logical plan node that 
produces new references
- * that this rule cannot handle. When that is the case, there 
must be another rule
- * that resolves these conflicts. Otherwise, the analysis will 
fail.
- */
-j
-  case Some((oldRelation, newRelation)) =>
-val attributeRewrites = 
AttributeMap(oldRelation.output.zip(newRelation.output))
-val newRight = right transformUp {
-  case r if r == oldRelation => newRelation
-} transformUp {
-  case other => other transformExpressions {
-case a: Attribute => attributeRewrites.get(a).getOrElse(a)
-  }
-}
-j.copy(right = newRight)
-}
+  // To resolve duplicate expression IDs for all the BinaryNode
+  case b: BinaryNode if !b.duplicateResolved => b match {
+case j @ Join(left, right, _, _) =>
+  j.copy(right = dedupRight(left, right))
+case i @ Intersect(left, right) =>
+  i.copy(right = dedupRight(left, right))
+case e @ Except(left, right) =>
+  e.copy(right = dedupRight(left, right))
+case cg: CoGroup =>
--- End diff --

@cloud-fan After code changes, I just realized the problem. If we replace 
intersect with left-semi join during analysis, it does not work. We have to 
de-duplicate the left and the right children at first; otherwise, the tree will 
be like
```
== Analyzed Logical Plan ==
id: int
Distinct
+- Join LeftSemi, Some((id#1 <=> id#1))
   :- Sample 0.0, 0.4, false, 1
   :  +- Sort [id#1 ASC], false
   : +- Project [_1#0 AS id#1]
   :+- LogicalRDD [_1#0], MapPartitionsRDD[2] at apply at 
Transformer.scala:22
   +- Sample 0.4, 1.0, false, 1
  +- Sort [id#4 ASC], false
 +- Project [_1#0 AS id#4]
+- LogicalRDD [_1#0], MapPartitionsRDD[2] at apply at 
Transformer.scala:22
```
The Join condition is invalid. Should we keep the rule in the optimizer? 

BTW, I will remove the `Except` and `CoGroup`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact 

[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/10630#discussion_r51074011
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -388,57 +445,18 @@ class Analyzer(
 .map(_.asInstanceOf[NamedExpression])
 a.copy(aggregateExpressions = expanded)
 
-  // Special handling for cases when self-join introduce duplicate 
expression ids.
-  case j @ Join(left, right, _, _) if !j.selfJoinResolved =>
-val conflictingAttributes = 
left.outputSet.intersect(right.outputSet)
-logDebug(s"Conflicting attributes 
${conflictingAttributes.mkString(",")} in $j")
-
-right.collect {
-  // Handle base relations that might appear more than once.
-  case oldVersion: MultiInstanceRelation
-  if 
oldVersion.outputSet.intersect(conflictingAttributes).nonEmpty =>
-val newVersion = oldVersion.newInstance()
-(oldVersion, newVersion)
-
-  // Handle projects that create conflicting aliases.
-  case oldVersion @ Project(projectList, _)
-  if 
findAliases(projectList).intersect(conflictingAttributes).nonEmpty =>
-(oldVersion, oldVersion.copy(projectList = 
newAliases(projectList)))
-
-  case oldVersion @ Aggregate(_, aggregateExpressions, _)
-  if 
findAliases(aggregateExpressions).intersect(conflictingAttributes).nonEmpty =>
-(oldVersion, oldVersion.copy(aggregateExpressions = 
newAliases(aggregateExpressions)))
-
-  case oldVersion: Generate
-  if 
oldVersion.generatedSet.intersect(conflictingAttributes).nonEmpty =>
-val newOutput = oldVersion.generatorOutput.map(_.newInstance())
-(oldVersion, oldVersion.copy(generatorOutput = newOutput))
-
-  case oldVersion @ Window(_, windowExpressions, _, _, child)
-  if 
AttributeSet(windowExpressions.map(_.toAttribute)).intersect(conflictingAttributes)
-.nonEmpty =>
-(oldVersion, oldVersion.copy(windowExpressions = 
newAliases(windowExpressions)))
-}
-// Only handle first case, others will be fixed on the next pass.
-.headOption match {
-  case None =>
-/*
- * No result implies that there is a logical plan node that 
produces new references
- * that this rule cannot handle. When that is the case, there 
must be another rule
- * that resolves these conflicts. Otherwise, the analysis will 
fail.
- */
-j
-  case Some((oldRelation, newRelation)) =>
-val attributeRewrites = 
AttributeMap(oldRelation.output.zip(newRelation.output))
-val newRight = right transformUp {
-  case r if r == oldRelation => newRelation
-} transformUp {
-  case other => other transformExpressions {
-case a: Attribute => attributeRewrites.get(a).getOrElse(a)
-  }
-}
-j.copy(right = newRight)
-}
+  // To resolve duplicate expression IDs for all the BinaryNode
+  case b: BinaryNode if !b.duplicateResolved => b match {
+case j @ Join(left, right, _, _) =>
+  j.copy(right = dedupRight(left, right))
+case i @ Intersect(left, right) =>
+  i.copy(right = dedupRight(left, right))
+case e @ Except(left, right) =>
+  e.copy(right = dedupRight(left, right))
+case cg: CoGroup =>
--- End diff --

So far, I am unable to create such test cases. Let me revert it back and 
enable de-duplication for Join only. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-176013909
  
**[Test build #50257 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50257/consoleFull)**
 for PR 10630 at commit 
[`3be78c4`](https://github.com/apache/spark/commit/3be78c4197519edb4659d8064468125b5223dd29).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-176036668
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-176036575
  
**[Test build #50257 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50257/consoleFull)**
 for PR 10630 at commit 
[`3be78c4`](https://github.com/apache/spark/commit/3be78c4197519edb4659d8064468125b5223dd29).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-176036669
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50257/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-27 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10630#discussion_r51040321
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -111,6 +113,7 @@ object SamplePushDown extends Rule[LogicalPlan] {
 }
 
 /**
+<<< HEAD
--- End diff --

remove this


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-27 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10630#discussion_r51041095
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -388,57 +445,18 @@ class Analyzer(
 .map(_.asInstanceOf[NamedExpression])
 a.copy(aggregateExpressions = expanded)
 
-  // Special handling for cases when self-join introduce duplicate 
expression ids.
-  case j @ Join(left, right, _, _) if !j.selfJoinResolved =>
-val conflictingAttributes = 
left.outputSet.intersect(right.outputSet)
-logDebug(s"Conflicting attributes 
${conflictingAttributes.mkString(",")} in $j")
-
-right.collect {
-  // Handle base relations that might appear more than once.
-  case oldVersion: MultiInstanceRelation
-  if 
oldVersion.outputSet.intersect(conflictingAttributes).nonEmpty =>
-val newVersion = oldVersion.newInstance()
-(oldVersion, newVersion)
-
-  // Handle projects that create conflicting aliases.
-  case oldVersion @ Project(projectList, _)
-  if 
findAliases(projectList).intersect(conflictingAttributes).nonEmpty =>
-(oldVersion, oldVersion.copy(projectList = 
newAliases(projectList)))
-
-  case oldVersion @ Aggregate(_, aggregateExpressions, _)
-  if 
findAliases(aggregateExpressions).intersect(conflictingAttributes).nonEmpty =>
-(oldVersion, oldVersion.copy(aggregateExpressions = 
newAliases(aggregateExpressions)))
-
-  case oldVersion: Generate
-  if 
oldVersion.generatedSet.intersect(conflictingAttributes).nonEmpty =>
-val newOutput = oldVersion.generatorOutput.map(_.newInstance())
-(oldVersion, oldVersion.copy(generatorOutput = newOutput))
-
-  case oldVersion @ Window(_, windowExpressions, _, _, child)
-  if 
AttributeSet(windowExpressions.map(_.toAttribute)).intersect(conflictingAttributes)
-.nonEmpty =>
-(oldVersion, oldVersion.copy(windowExpressions = 
newAliases(windowExpressions)))
-}
-// Only handle first case, others will be fixed on the next pass.
-.headOption match {
-  case None =>
-/*
- * No result implies that there is a logical plan node that 
produces new references
- * that this rule cannot handle. When that is the case, there 
must be another rule
- * that resolves these conflicts. Otherwise, the analysis will 
fail.
- */
-j
-  case Some((oldRelation, newRelation)) =>
-val attributeRewrites = 
AttributeMap(oldRelation.output.zip(newRelation.output))
-val newRight = right transformUp {
-  case r if r == oldRelation => newRelation
-} transformUp {
-  case other => other transformExpressions {
-case a: Attribute => attributeRewrites.get(a).getOrElse(a)
-  }
-}
-j.copy(right = newRight)
-}
+  // To resolve duplicate expression IDs for all the BinaryNode
+  case b: BinaryNode if !b.duplicateResolved => b match {
+case j @ Join(left, right, _, _) =>
+  j.copy(right = dedupRight(left, right))
+case i @ Intersect(left, right) =>
+  i.copy(right = dedupRight(left, right))
+case e @ Except(left, right) =>
+  e.copy(right = dedupRight(left, right))
+case cg: CoGroup =>
--- End diff --

do we have test cases to prove that duplicated expression IDs in set 
operations will cause problems?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/10630#discussion_r51054458
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -388,57 +445,18 @@ class Analyzer(
 .map(_.asInstanceOf[NamedExpression])
 a.copy(aggregateExpressions = expanded)
 
-  // Special handling for cases when self-join introduce duplicate 
expression ids.
-  case j @ Join(left, right, _, _) if !j.selfJoinResolved =>
-val conflictingAttributes = 
left.outputSet.intersect(right.outputSet)
-logDebug(s"Conflicting attributes 
${conflictingAttributes.mkString(",")} in $j")
-
-right.collect {
-  // Handle base relations that might appear more than once.
-  case oldVersion: MultiInstanceRelation
-  if 
oldVersion.outputSet.intersect(conflictingAttributes).nonEmpty =>
-val newVersion = oldVersion.newInstance()
-(oldVersion, newVersion)
-
-  // Handle projects that create conflicting aliases.
-  case oldVersion @ Project(projectList, _)
-  if 
findAliases(projectList).intersect(conflictingAttributes).nonEmpty =>
-(oldVersion, oldVersion.copy(projectList = 
newAliases(projectList)))
-
-  case oldVersion @ Aggregate(_, aggregateExpressions, _)
-  if 
findAliases(aggregateExpressions).intersect(conflictingAttributes).nonEmpty =>
-(oldVersion, oldVersion.copy(aggregateExpressions = 
newAliases(aggregateExpressions)))
-
-  case oldVersion: Generate
-  if 
oldVersion.generatedSet.intersect(conflictingAttributes).nonEmpty =>
-val newOutput = oldVersion.generatorOutput.map(_.newInstance())
-(oldVersion, oldVersion.copy(generatorOutput = newOutput))
-
-  case oldVersion @ Window(_, windowExpressions, _, _, child)
-  if 
AttributeSet(windowExpressions.map(_.toAttribute)).intersect(conflictingAttributes)
-.nonEmpty =>
-(oldVersion, oldVersion.copy(windowExpressions = 
newAliases(windowExpressions)))
-}
-// Only handle first case, others will be fixed on the next pass.
-.headOption match {
-  case None =>
-/*
- * No result implies that there is a logical plan node that 
produces new references
- * that this rule cannot handle. When that is the case, there 
must be another rule
- * that resolves these conflicts. Otherwise, the analysis will 
fail.
- */
-j
-  case Some((oldRelation, newRelation)) =>
-val attributeRewrites = 
AttributeMap(oldRelation.output.zip(newRelation.output))
-val newRight = right transformUp {
-  case r if r == oldRelation => newRelation
-} transformUp {
-  case other => other transformExpressions {
-case a: Attribute => attributeRewrites.get(a).getOrElse(a)
-  }
-}
-j.copy(right = newRight)
-}
+  // To resolve duplicate expression IDs for all the BinaryNode
+  case b: BinaryNode if !b.duplicateResolved => b match {
+case j @ Join(left, right, _, _) =>
+  j.copy(right = dedupRight(left, right))
+case i @ Intersect(left, right) =>
+  i.copy(right = dedupRight(left, right))
+case e @ Except(left, right) =>
+  e.copy(right = dedupRight(left, right))
+case cg: CoGroup =>
--- End diff --

After we convert Intersect to Join, the query can return a wrong answer due 
to the pushdown of Join predicate in Optimizer. That is why I did the change. 
Based on the comments 
https://github.com/apache/spark/pull/10630#discussion_r49227379  , I also tried 
to resolve this issue in the other operators.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/10630#discussion_r51055023
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -111,6 +113,7 @@ object SamplePushDown extends Rule[LogicalPlan] {
 }
 
 /**
+<<< HEAD
--- End diff --

will do. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-27 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/10630#discussion_r51057617
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -388,57 +445,18 @@ class Analyzer(
 .map(_.asInstanceOf[NamedExpression])
 a.copy(aggregateExpressions = expanded)
 
-  // Special handling for cases when self-join introduce duplicate 
expression ids.
-  case j @ Join(left, right, _, _) if !j.selfJoinResolved =>
-val conflictingAttributes = 
left.outputSet.intersect(right.outputSet)
-logDebug(s"Conflicting attributes 
${conflictingAttributes.mkString(",")} in $j")
-
-right.collect {
-  // Handle base relations that might appear more than once.
-  case oldVersion: MultiInstanceRelation
-  if 
oldVersion.outputSet.intersect(conflictingAttributes).nonEmpty =>
-val newVersion = oldVersion.newInstance()
-(oldVersion, newVersion)
-
-  // Handle projects that create conflicting aliases.
-  case oldVersion @ Project(projectList, _)
-  if 
findAliases(projectList).intersect(conflictingAttributes).nonEmpty =>
-(oldVersion, oldVersion.copy(projectList = 
newAliases(projectList)))
-
-  case oldVersion @ Aggregate(_, aggregateExpressions, _)
-  if 
findAliases(aggregateExpressions).intersect(conflictingAttributes).nonEmpty =>
-(oldVersion, oldVersion.copy(aggregateExpressions = 
newAliases(aggregateExpressions)))
-
-  case oldVersion: Generate
-  if 
oldVersion.generatedSet.intersect(conflictingAttributes).nonEmpty =>
-val newOutput = oldVersion.generatorOutput.map(_.newInstance())
-(oldVersion, oldVersion.copy(generatorOutput = newOutput))
-
-  case oldVersion @ Window(_, windowExpressions, _, _, child)
-  if 
AttributeSet(windowExpressions.map(_.toAttribute)).intersect(conflictingAttributes)
-.nonEmpty =>
-(oldVersion, oldVersion.copy(windowExpressions = 
newAliases(windowExpressions)))
-}
-// Only handle first case, others will be fixed on the next pass.
-.headOption match {
-  case None =>
-/*
- * No result implies that there is a logical plan node that 
produces new references
- * that this rule cannot handle. When that is the case, there 
must be another rule
- * that resolves these conflicts. Otherwise, the analysis will 
fail.
- */
-j
-  case Some((oldRelation, newRelation)) =>
-val attributeRewrites = 
AttributeMap(oldRelation.output.zip(newRelation.output))
-val newRight = right transformUp {
-  case r if r == oldRelation => newRelation
-} transformUp {
-  case other => other transformExpressions {
-case a: Attribute => attributeRewrites.get(a).getOrElse(a)
-  }
-}
-j.copy(right = newRight)
-}
+  // To resolve duplicate expression IDs for all the BinaryNode
+  case b: BinaryNode if !b.duplicateResolved => b match {
+case j @ Join(left, right, _, _) =>
+  j.copy(right = dedupRight(left, right))
+case i @ Intersect(left, right) =>
+  i.copy(right = dedupRight(left, right))
+case e @ Except(left, right) =>
+  e.copy(right = dedupRight(left, right))
+case cg: CoGroup =>
--- End diff --

If we replace intersect with left-semi join during analysis, then we can 
get de-duplication for free(the rule for join can do the work), will it help 
here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-175469365
  
**[Test build #50176 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50176/consoleFull)**
 for PR 10630 at commit 
[`e566d79`](https://github.com/apache/spark/commit/e566d79b12ab1d9ed9e3124fec8290cdba99549b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-175432369
  
**[Test build #50176 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50176/consoleFull)**
 for PR 10630 at commit 
[`e566d79`](https://github.com/apache/spark/commit/e566d79b12ab1d9ed9e3124fec8290cdba99549b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-175469557
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-175469558
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50176/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-26 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10630#discussion_r50897086
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -125,17 +128,15 @@ object EliminateSerialization extends 
Rule[LogicalPlan] {
 
 /**
  * Pushes certain operations to both sides of a Union, Intersect or Except 
operator.
+===
+ * Pushes certain operations to both sides of a Union or Except operator.
+>>> IntersectBySemiJoinMerged
--- End diff --

Seems we need to remove this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-26 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/10630#discussion_r50899426
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -1055,6 +1045,22 @@ object ReplaceDistinctWithAggregate extends 
Rule[LogicalPlan] {
 }
 
 /**
+ * Replaces logical [[Intersect]] operator with a left-semi [[Join]] 
operator.
+ * {{{
+ *   SELECT a1, a2 FROM Tab1 INTERSECT SELECT b1, b2 FROM Tab2
+ *   ==>  SELECT DISTINCT a1, a2 FROM Tab1 LEFT SEMI JOIN Tab2 ON a1<=>b1 
AND a2<=>b2
+ * }}}
+ */
+object ReplaceIntersectWithSemiJoin extends Rule[LogicalPlan] {
--- End diff --

I think we need to add a comment at here to mention that this rewrite is 
just for `INTERSECT` (i.e. `INTERSECT DISTINCT`) and is not applicable to 
`INTERSECT ALL`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-26 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/10630#discussion_r50901811
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -125,17 +128,15 @@ object EliminateSerialization extends 
Rule[LogicalPlan] {
 
 /**
  * Pushes certain operations to both sides of a Union, Intersect or Except 
operator.
+===
+ * Pushes certain operations to both sides of a Union or Except operator.
+>>> IntersectBySemiJoinMerged
--- End diff --

yeah, sure, will do. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-26 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/10630#discussion_r50901898
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -1055,6 +1045,22 @@ object ReplaceDistinctWithAggregate extends 
Rule[LogicalPlan] {
 }
 
 /**
+ * Replaces logical [[Intersect]] operator with a left-semi [[Join]] 
operator.
+ * {{{
+ *   SELECT a1, a2 FROM Tab1 INTERSECT SELECT b1, b2 FROM Tab2
+ *   ==>  SELECT DISTINCT a1, a2 FROM Tab1 LEFT SEMI JOIN Tab2 ON a1<=>b1 
AND a2<=>b2
+ * }}}
+ */
+object ReplaceIntersectWithSemiJoin extends Rule[LogicalPlan] {
--- End diff --

Yeah, will do it. Actually, I will also implement `INTERSECT ALL` after 
this one. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-25 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-174630543
  
I don't think its a problem for there to be conflicting attribute ids for 
set operations, this is because only on childs attribute references need to be 
propagated up (unlike with a join).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-25 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-174665556
  
Yeah, agree! Thank you!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-23 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-174165179
  
When resolving the conflicts, I realized the multi-children `Union` might 
have duplicate `exprId`. So far, I did not add a function to de-duplicate them. 
This is not a trivial work, if needed. When the children has hundreds of nodes, 
it is infeasible to use the current per-pair de-duplication. That means, we 
need to rewrite the whole function `dedup`. 

Let me know if we need to open a separate PR to do it now. So far, unlike 
`Intersect`, we did not hit any issue even if there exist duplicate `exprId` 
values. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-174165441
  
**[Test build #49936 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49936/consoleFull)**
 for PR 10630 at commit 
[`6a7979d`](https://github.com/apache/spark/commit/6a7979d35d5cf89d9cd14eb6d6a2a4418a44a669).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-174171695
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49936/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-174171694
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-174171631
  
**[Test build #49936 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49936/consoleFull)**
 for PR 10630 at commit 
[`6a7979d`](https://github.com/apache/spark/commit/6a7979d35d5cf89d9cd14eb6d6a2a4418a44a669).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-22 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-174148184
  
sure, let me do it tonight. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-22 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-174144971
  
@gatorsmile there is a conflict now - mind bringing it up to date?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-170220929
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-170208865
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-170208817
  
**[Test build #49038 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49038/consoleFull)**
 for PR 10630 at commit 
[`f820c61`](https://github.com/apache/spark/commit/f820c616fe217494ccaed0bf74a0a7410ce503cf).
 * This patch **fails from timeout after a configured wait of \`250m\`**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class Intersect(left: LogicalPlan, right: LogicalPlan) extends 
SetOperation(left, right)`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-170208866
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49038/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-09 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-170209280
  
retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-170209903
  
**[Test build #49044 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49044/consoleFull)**
 for PR 10630 at commit 
[`4372170`](https://github.com/apache/spark/commit/4372170f600eb25996c3aa4f09d569312c263686).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-170220857
  
**[Test build #49044 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49044/consoleFull)**
 for PR 10630 at commit 
[`4372170`](https://github.com/apache/spark/commit/4372170f600eb25996c3aa4f09d569312c263686).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-170220930
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49044/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-169926993
  
**[Test build #49008 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49008/consoleFull)**
 for PR 10630 at commit 
[`27192be`](https://github.com/apache/spark/commit/27192be27708265aba6707d1100eefca537fb2b8).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-169927020
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49008/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-08 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-169927573
  
Both failure are not related to this PR. They are caused by the test case 
`randomSplit on reordered partitions` in 
https://github.com/apache/spark/pull/10626. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-169931210
  
**[Test build #49012 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49012/consoleFull)**
 for PR 10630 at commit 
[`a932cdb`](https://github.com/apache/spark/commit/a932cdb05c889ded8de72024b2eef19723bdb0f4).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-169936735
  
**[Test build #49012 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49012/consoleFull)**
 for PR 10630 at commit 
[`a932cdb`](https://github.com/apache/spark/commit/a932cdb05c889ded8de72024b2eef19723bdb0f4).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class JavaSampleActorReceiver extends JavaActorReceiver `
  * `public class JavaActorWordCount `
  * `abstract class ActorReceiver extends Actor `
  * `abstract class JavaActorReceiver extends UntypedActor `


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-169936845
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49012/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-169936843
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-169927019
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-170082147
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-170081871
  
**[Test build #49020 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49020/consoleFull)**
 for PR 10630 at commit 
[`04a26bd`](https://github.com/apache/spark/commit/04a26bd3beaab9d8ea3a4b300eb9f413b5eaa704).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class Intersect(left: LogicalPlan, right: LogicalPlan) extends 
SetOperation(left, right) `


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-08 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/10630#discussion_r49227379
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -388,57 +441,14 @@ class Analyzer(
 .map(_.asInstanceOf[NamedExpression])
 a.copy(aggregateExpressions = expanded)
 
-  // Special handling for cases when self-join introduce duplicate 
expression ids.
+  // Special handling for cases when self-join introduces duplicate 
expression IDs.
   case j @ Join(left, right, _, _) if !j.selfJoinResolved =>
-val conflictingAttributes = 
left.outputSet.intersect(right.outputSet)
-logDebug(s"Conflicting attributes 
${conflictingAttributes.mkString(",")} in $j")
-
-right.collect {
-  // Handle base relations that might appear more than once.
-  case oldVersion: MultiInstanceRelation
-  if 
oldVersion.outputSet.intersect(conflictingAttributes).nonEmpty =>
-val newVersion = oldVersion.newInstance()
-(oldVersion, newVersion)
-
-  // Handle projects that create conflicting aliases.
-  case oldVersion @ Project(projectList, _)
-  if 
findAliases(projectList).intersect(conflictingAttributes).nonEmpty =>
-(oldVersion, oldVersion.copy(projectList = 
newAliases(projectList)))
-
-  case oldVersion @ Aggregate(_, aggregateExpressions, _)
-  if 
findAliases(aggregateExpressions).intersect(conflictingAttributes).nonEmpty =>
-(oldVersion, oldVersion.copy(aggregateExpressions = 
newAliases(aggregateExpressions)))
-
-  case oldVersion: Generate
-  if 
oldVersion.generatedSet.intersect(conflictingAttributes).nonEmpty =>
-val newOutput = oldVersion.generatorOutput.map(_.newInstance())
-(oldVersion, oldVersion.copy(generatorOutput = newOutput))
-
-  case oldVersion @ Window(_, windowExpressions, _, _, child)
-  if 
AttributeSet(windowExpressions.map(_.toAttribute)).intersect(conflictingAttributes)
-.nonEmpty =>
-(oldVersion, oldVersion.copy(windowExpressions = 
newAliases(windowExpressions)))
-}
-// Only handle first case, others will be fixed on the next pass.
-.headOption match {
-  case None =>
-/*
- * No result implies that there is a logical plan node that 
produces new references
- * that this rule cannot handle. When that is the case, there 
must be another rule
- * that resolves these conflicts. Otherwise, the analysis will 
fail.
- */
-j
-  case Some((oldRelation, newRelation)) =>
-val attributeRewrites = 
AttributeMap(oldRelation.output.zip(newRelation.output))
-val newRight = right transformUp {
-  case r if r == oldRelation => newRelation
-} transformUp {
-  case other => other transformExpressions {
-case a: Attribute => attributeRewrites.get(a).getOrElse(a)
-  }
-}
-j.copy(right = newRight)
-}
+j.copy(right = buildRightChild4Deduplication(left, right))
+
+  // Special handling for cases when self-intersect introduces 
duplicate expression IDs.
+  // In Optimizer, Intersect will be converted to semi join.
+  case i @ Intersect(left, right) if !i.duplicateResolved =>
+i.copy(right = buildRightChild4Deduplication(left, right))
--- End diff --

Do we need to do this for all non-leaf/unary nodes?  I think the only 
reason this wasn't a problem in the past was there were no optimizer rules for 
those nodes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-08 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/10630#discussion_r49239333
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -344,6 +344,59 @@ class Analyzer(
   }
 }
 
+def buildRightChild4Deduplication (left: LogicalPlan, right: 
LogicalPlan): LogicalPlan = {
--- End diff --

Sure. : )


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-08 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/10630#discussion_r49239389
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala
 ---
@@ -116,7 +116,18 @@ case class Union(left: LogicalPlan, right: 
LogicalPlan) extends SetOperation(lef
   }
 }
 
-case class Intersect(left: LogicalPlan, right: LogicalPlan) extends 
SetOperation(left, right)
+case class Intersect(left: LogicalPlan, right: LogicalPlan) extends 
SetOperation(left, right) {
+  def duplicateResolved: Boolean = 
left.outputSet.intersect(right.outputSet).isEmpty
+
+  // Intersect is only resolved if they don't introduce ambiguous 
expression ids,
+  // since it will be converted to semi Join by optimizer.
--- End diff --

Will update the comments in the code. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-08 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/10630#discussion_r49227764
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala
 ---
@@ -116,7 +116,18 @@ case class Union(left: LogicalPlan, right: 
LogicalPlan) extends SetOperation(lef
   }
 }
 
-case class Intersect(left: LogicalPlan, right: LogicalPlan) extends 
SetOperation(left, right)
+case class Intersect(left: LogicalPlan, right: LogicalPlan) extends 
SetOperation(left, right) {
+  def duplicateResolved: Boolean = 
left.outputSet.intersect(right.outputSet).isEmpty
+
+  // Intersect is only resolved if they don't introduce ambiguous 
expression ids,
+  // since it will be converted to semi Join by optimizer.
--- End diff --

This check might belong more generally in `childrenResolved` as I mention 
above.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-08 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/10630#discussion_r49239278
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -388,57 +441,14 @@ class Analyzer(
 .map(_.asInstanceOf[NamedExpression])
 a.copy(aggregateExpressions = expanded)
 
-  // Special handling for cases when self-join introduce duplicate 
expression ids.
+  // Special handling for cases when self-join introduces duplicate 
expression IDs.
   case j @ Join(left, right, _, _) if !j.selfJoinResolved =>
-val conflictingAttributes = 
left.outputSet.intersect(right.outputSet)
-logDebug(s"Conflicting attributes 
${conflictingAttributes.mkString(",")} in $j")
-
-right.collect {
-  // Handle base relations that might appear more than once.
-  case oldVersion: MultiInstanceRelation
-  if 
oldVersion.outputSet.intersect(conflictingAttributes).nonEmpty =>
-val newVersion = oldVersion.newInstance()
-(oldVersion, newVersion)
-
-  // Handle projects that create conflicting aliases.
-  case oldVersion @ Project(projectList, _)
-  if 
findAliases(projectList).intersect(conflictingAttributes).nonEmpty =>
-(oldVersion, oldVersion.copy(projectList = 
newAliases(projectList)))
-
-  case oldVersion @ Aggregate(_, aggregateExpressions, _)
-  if 
findAliases(aggregateExpressions).intersect(conflictingAttributes).nonEmpty =>
-(oldVersion, oldVersion.copy(aggregateExpressions = 
newAliases(aggregateExpressions)))
-
-  case oldVersion: Generate
-  if 
oldVersion.generatedSet.intersect(conflictingAttributes).nonEmpty =>
-val newOutput = oldVersion.generatorOutput.map(_.newInstance())
-(oldVersion, oldVersion.copy(generatorOutput = newOutput))
-
-  case oldVersion @ Window(_, windowExpressions, _, _, child)
-  if 
AttributeSet(windowExpressions.map(_.toAttribute)).intersect(conflictingAttributes)
-.nonEmpty =>
-(oldVersion, oldVersion.copy(windowExpressions = 
newAliases(windowExpressions)))
-}
-// Only handle first case, others will be fixed on the next pass.
-.headOption match {
-  case None =>
-/*
- * No result implies that there is a logical plan node that 
produces new references
- * that this rule cannot handle. When that is the case, there 
must be another rule
- * that resolves these conflicts. Otherwise, the analysis will 
fail.
- */
-j
-  case Some((oldRelation, newRelation)) =>
-val attributeRewrites = 
AttributeMap(oldRelation.output.zip(newRelation.output))
-val newRight = right transformUp {
-  case r if r == oldRelation => newRelation
-} transformUp {
-  case other => other transformExpressions {
-case a: Attribute => attributeRewrites.get(a).getOrElse(a)
-  }
-}
-j.copy(right = newRight)
-}
+j.copy(right = buildRightChild4Deduplication(left, right))
+
+  // Special handling for cases when self-intersect introduces 
duplicate expression IDs.
+  // In Optimizer, Intersect will be converted to semi join.
+  case i @ Intersect(left, right) if !i.duplicateResolved =>
+i.copy(right = buildRightChild4Deduplication(left, right))
--- End diff --

I see, will do a change to make it more general. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-170082149
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49020/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-08 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/10630#discussion_r49227485
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -344,6 +344,59 @@ class Analyzer(
   }
 }
 
+def buildRightChild4Deduplication (left: LogicalPlan, right: 
LogicalPlan): LogicalPlan = {
--- End diff --

extra space.  Also I'm not a huge fan of `4`.  How about `dedupRight` with 
some scala doc.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-170200677
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49039/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-170200645
  
**[Test build #49039 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49039/consoleFull)**
 for PR 10630 at commit 
[`4372170`](https://github.com/apache/spark/commit/4372170f600eb25996c3aa4f09d569312c263686).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-170200676
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-170193612
  
**[Test build #49039 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49039/consoleFull)**
 for PR 10630 at commit 
[`4372170`](https://github.com/apache/spark/commit/4372170f600eb25996c3aa4f09d569312c263686).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-170189725
  
**[Test build #49038 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49038/consoleFull)**
 for PR 10630 at commit 
[`f820c61`](https://github.com/apache/spark/commit/f820c616fe217494ccaed0bf74a0a7410ce503cf).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-08 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-170189804
  
In this update, the code changes include
  - Fixed a bug in `Range` by inheriting the trait `MultiInstanceRelation`.
  - Added a de-duplication resolution for all the binary nodes: `Except`, 
`Union` and `Co-Group`, besides `Intersect` and `Join`.
  - Added a new function `duplicateResolved` for all the binary nodes.
  - Improved the analysis exception message when failure to resolve 
conflicting references
  - Resolved all the other comments. 

The analysis procedure is kind of tricky. I am unable to directly include 
`duplicateResolved` into `childrenResolved`. `resolve` is lazy evaluated. The 
resolution procedure need to follow the order: children at first, then itself, 
and then deduplicate the attributes' expression IDs in tis children.  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-08 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-170013936
  
Found the root cause of this failure. We also need to change the analyzer 
to handle duplicate expression ids. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-08 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-170052294
  
The latest update is for resolving the ambiguous attributes, which could 
trigger invalid predicate pushdown by the Optimizer rule 
`PushPredicateThroughJoin`. The solution is the same as what we are doing for 
self join in analyzer. 

Thank you!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-170053763
  
**[Test build #49020 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49020/consoleFull)**
 for PR 10630 at commit 
[`04a26bd`](https://github.com/apache/spark/commit/04a26bd3beaab9d8ea3a4b300eb9f413b5eaa704).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-169591104
  
**[Test build #48916 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48916/consoleFull)**
 for PR 10630 at commit 
[`e4c34f0`](https://github.com/apache/spark/commit/e4c34f01acbb626a8084c99a5f0ad26fbfb175e2).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-169618123
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-169635223
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48912/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-169618125
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48916/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-169719510
  
**[Test build #48947 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48947/consoleFull)**
 for PR 10630 at commit 
[`e4c34f0`](https://github.com/apache/spark/commit/e4c34f01acbb626a8084c99a5f0ad26fbfb175e2).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-07 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/10630#discussion_r49094926
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -966,6 +952,22 @@ object ReplaceDistinctWithAggregate extends 
Rule[LogicalPlan] {
 }
 
 /**
+ * Replaces logical [[Intersect]] operator with a left-semi [[Join]] 
operator.
+ * {{{
+ *   SELECT a1, a2 FROM Tab1 INTERSECT SELECT b1, b2 FROM Tab2
+ *   ==>  SELECT a1, a2 FROM Tab1 LEFT SEMI JOIN Tab2 ON a1<=>b1 AND 
a2<=>b2
+ * }}}
+ */
+object ReplaceIntersectWithSemiJoin extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+case Intersect(left, right) =>
+  assert(left.output.size == right.output.size)
+  val joinCond = left.output.zip(right.output).map { case (l, r) => 
EqualNullSafe(l, r) }
+  Join(left, right, LeftSemi, joinCond.reduceLeftOption(And))
--- End diff --

You are right! It should remove duplicates. I just tried it in DB2. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-169913011
  
**[Test build #49007 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49007/consoleFull)**
 for PR 10630 at commit 
[`27192be`](https://github.com/apache/spark/commit/27192be27708265aba6707d1100eefca537fb2b8).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-169916605
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49007/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-169916531
  
**[Test build #49007 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49007/consoleFull)**
 for PR 10630 at commit 
[`27192be`](https://github.com/apache/spark/commit/27192be27708265aba6707d1100eefca537fb2b8).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-169916600
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-07 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-169913342
  
In this update, the code changes include
  - Added `DISTINCT` on top of `Semi Join` for `Intersect`
  - Added a test case for detecting if the duplicate values are removed 
after `Intersect`
  - Threw an exception if the `Intersect` is not converted before 
conversion from logical plan to physical plan
  - Merge two rules `ReplaceIntersectWithSemiJoin` and 
`ReplaceDistinctWithAggregate` in the same batch `"Replace Operators"` for 
ensuring `ReplaceIntersectWithSemiJoin` should be called before 
`ReplaceDistinctWithAggregate` 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...

2016-01-07 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/10630#issuecomment-169917411
  
retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



  1   2   >