[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-12-30 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-168112377
  
I'm going to close this pull request. If this is still relevant and you are 
interested in pushing it forward, please open a new pull request. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-12-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3249





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-09-01 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-136913305
  
Also cc @chenghao-intel, who wrote the similar patch #4812.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-09-01 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-136913268
  
@ravipesala can you answer @marmbrus' question and/or rebase this to master 
so we can decide how to proceed with it?





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-06-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-114973063
  
Build started.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-06-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-114973023
  
 Build triggered.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-06-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-114973373
  
  [Test build #35706 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35706/consoleFull) for PR 3249 at commit [`7653eee`](https://github.com/apache/spark/commit/7653eee3d8ded847ceeb3a03e11a507b827c5066).
 * This patch **does not merge cleanly**.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-06-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-114999781
  
Build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-06-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-114999674
  
  [Test build #35706 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35706/consoleFull) for PR 3249 at commit [`7653eee`](https://github.com/apache/spark/commit/7653eee3d8ded847ceeb3a03e11a507b827c5066).
 * This patch **passes all tests**.
 * This patch **does not merge cleanly**.
 * This patch adds the following public classes _(experimental)_:
   * `case class SubqueryExpression(subquery: LogicalPlan) extends Expression`






[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-04-02 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-89090615
  
Sorry for letting this languish.  What is the status here and how does this 
relate to  #4812?





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-03-02 Thread ravipesala
Github user ravipesala commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-76770343
  
@chenghao-intel Thank you for reviewing it. I will go through your comments 
and fix them. Regarding the `not in` case, we can use a `left outer join`; 
I will try to add it to the same PR.
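The left-outer-join rewrite proposed here can be sketched as follows. This is a hypothetical simulation, not code from the PR; the table rows and column name `x` are made up for illustration. Note that SQL's `NOT IN` additionally returns no rows when the subquery yields a NULL, which a faithful rewrite must also handle.

```python
def not_in_via_left_outer_join(t1, t2_keys):
    """Return rows of t1 whose 'x' value is absent from t2_keys.

    Simulates: SELECT * FROM t1 LEFT OUTER JOIN t2 ON t1.x = t2.y
               WHERE t2.y IS NULL
    """
    result = []
    for row in t1:
        # Left outer join: check whether the row finds a match in t2.
        matched = row["x"] in t2_keys
        # Keep only rows where the join produced no match (the NULL side),
        # which is exactly the NOT IN semantics for non-NULL data.
        if not matched:
            result.append(row)
    return result

t1 = [{"x": 1}, {"x": 2}, {"x": 3}]
t2_keys = {2}
print(not_in_via_left_outer_join(t1, t2_keys))  # [{'x': 1}, {'x': 3}]
```

This "anti-join" shape is the standard way planners express `NOT IN` over NULL-free columns.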





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-02-27 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/3249#discussion_r25552305
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -422,6 +424,108 @@ class Analyzer(catalog: Catalog,
         Generate(g, join = false, outer = false, None, child)
     }
   }
+
+  /**
+   * Transforms the query which has subquery expressions in the where clause
+   * to a left semi join:
+   * select T1.x from T1 where T1.x in (select T2.y from T2) is transformed to
+   * select T1.x from T1 left semi join T2 on T1.x = T2.y.
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+      case p: LogicalPlan if !p.childrenResolved => p
+      case filter @ Filter(conditions, child) =>
+        val subqueryExprs = conditions.collect {
+          case In(exp, Seq(SubqueryExpression(subquery))) => (exp, subquery)
+        }
+        // Replace subqueries with a dummy true literal since they are evaluated separately now.
+        val transformedConds = conditions.transform {
+          case In(_, Seq(SubqueryExpression(_))) => Literal(true)
+        }
+        subqueryExprs match {
+          case Seq() => filter // No subqueries.
+          case Seq((exp, subquery)) =>
+            createLeftSemiJoin(
+              child,
+              exp,
+              subquery,
+              transformedConds)
+          case _ =>
+            throw new TreeNodeException(filter, "Only one SubQuery expression is supported.")
+        }
+    }
+
+    /**
+     * Create a LeftSemi join of the parent query with the subquery which is mentioned
+     * in the 'IN' predicate, and combine the subquery conditions and parent query conditions.
+     */
+    def createLeftSemiJoin(
+        left: LogicalPlan,
+        value: Expression,
+        subquery: LogicalPlan,
+        parentConds: Expression): LogicalPlan = {
+      val (transformedPlan, subqueryConds) = transformAndGetConditions(value, subquery)
+      // Add both parent query conditions and subquery conditions as join conditions.
+      val allPredicates = And(parentConds, subqueryConds)
+      Join(left, transformedPlan, LeftSemi, Some(allPredicates))
+    }
+
+    /**
+     * Transform the subquery LogicalPlan and add the expressions which are used as
+     * filters to the projection, and also return the filter conditions used in the subquery.
+     */
+    def transformAndGetConditions(
+        value: Expression,
+        subquery: LogicalPlan): (LogicalPlan, Expression) = {
+      val expr = new scala.collection.mutable.ArrayBuffer[Expression]()
+      // TODO: we only decorrelate subqueries in very specific cases like the cases
+      // mentioned above in the documentation. More complex queries, like subqueries
+      // inside subqueries, can be supported in the future.
+      val transformedPlan = subquery transform {
+        case project @ Project(projectList, f @ Filter(condition, child)) =>
+          // Don't support more than one item in the select list of the subquery.
+          if (projectList.size > 1) {
+            throw new TreeNodeException(
+              project,
+              "SubQuery can contain only one item in Select List")
+          }
+          val resolvedChild = ResolveRelations(child)
+          // Add the expressions which are used as filters in the subquery to the projections.
+          val toBeAddedExprs = f.references.filter { a =>
+            resolvedChild.resolve(a.name, resolver) != None && !project.outputSet.contains(a)
+          }
+          val nameToExprMap = collection.mutable.Map[String, Alias]()
+          // Create aliases for all projection expressions.
+          val witAliases = (projectList ++ toBeAddedExprs).zipWithIndex.map {
+            case (exp, index) =>
+              nameToExprMap.put(exp.name, Alias(exp, s"sqc$index")())
+              Alias(exp, s"sqc$index")()
+          }
+          // Replace the condition column names with alias names.
+          val transformedConds = condition.transform {
+            case a: Attribute if resolvedChild.resolve(a.name, resolver) != None =>
+              nameToExprMap.get(a.name).get.toAttribute
+          }
+          // Join the first projection column of the subquery to the main query and
+          // add it as a condition.
+          // TODO: We can avoid this if the parent condition already has this condition.
+          expr += EqualTo(value, witAliases(0).toAttribute)
+          expr += transformedConds
--- End diff --

Connecting the subquery with the join condition doesn't make sense to me, as 
we will transform the whole logical plan as 

[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-02-27 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-76503896
  
Thank you @ravipesala for implementing this. However, this PR probably 
involves some unnecessary join condition transformations; you probably need to 
first understand the rule for pushing down the join filter/condition. Sorry, 
please correct me if I misunderstood something.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-02-27 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/3249#discussion_r25552219
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -422,6 +424,108 @@ class Analyzer(catalog: Catalog,
         Generate(g, join = false, outer = false, None, child)
     }
   }
+
+  /**
+   * Transforms the query which has subquery expressions in the where clause
+   * to a left semi join:
+   * select T1.x from T1 where T1.x in (select T2.y from T2) is transformed to
+   * select T1.x from T1 left semi join T2 on T1.x = T2.y.
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+      case p: LogicalPlan if !p.childrenResolved => p
+      case filter @ Filter(conditions, child) =>
+        val subqueryExprs = conditions.collect {
+          case In(exp, Seq(SubqueryExpression(subquery))) => (exp, subquery)
+        }
+        // Replace subqueries with a dummy true literal since they are evaluated separately now.
+        val transformedConds = conditions.transform {
+          case In(_, Seq(SubqueryExpression(_))) => Literal(true)
+        }
+        subqueryExprs match {
+          case Seq() => filter // No subqueries.
+          case Seq((exp, subquery)) =>
+            createLeftSemiJoin(
+              child,
+              exp,
+              subquery,
+              transformedConds)
+          case _ =>
+            throw new TreeNodeException(filter, "Only one SubQuery expression is supported.")
+        }
+    }
+
+    /**
+     * Create a LeftSemi join of the parent query with the subquery which is mentioned
+     * in the 'IN' predicate, and combine the subquery conditions and parent query conditions.
+     */
+    def createLeftSemiJoin(
+        left: LogicalPlan,
+        value: Expression,
+        subquery: LogicalPlan,
+        parentConds: Expression): LogicalPlan = {
+      val (transformedPlan, subqueryConds) = transformAndGetConditions(value, subquery)
+      // Add both parent query conditions and subquery conditions as join conditions.
+      val allPredicates = And(parentConds, subqueryConds)
+      Join(left, transformedPlan, LeftSemi, Some(allPredicates))
+    }
+
+    /**
+     * Transform the subquery LogicalPlan and add the expressions which are used as
+     * filters to the projection, and also return the filter conditions used in the subquery.
+     */
+    def transformAndGetConditions(
+        value: Expression,
+        subquery: LogicalPlan): (LogicalPlan, Expression) = {
+      val expr = new scala.collection.mutable.ArrayBuffer[Expression]()
+      // TODO: we only decorrelate subqueries in very specific cases like the cases
+      // mentioned above in the documentation. More complex queries, like subqueries
+      // inside subqueries, can be supported in the future.
+      val transformedPlan = subquery transform {
+        case project @ Project(projectList, f @ Filter(condition, child)) =>
+          // Don't support more than one item in the select list of the subquery.
+          if (projectList.size > 1) {
+            throw new TreeNodeException(
+              project,
+              "SubQuery can contain only one item in Select List")
+          }
+          val resolvedChild = ResolveRelations(child)
+          // Add the expressions which are used as filters in the subquery to the projections.
+          val toBeAddedExprs = f.references.filter { a =>
+            resolvedChild.resolve(a.name, resolver) != None && !project.outputSet.contains(a)
+          }
+          val nameToExprMap = collection.mutable.Map[String, Alias]()
+          // Create aliases for all projection expressions.
+          val witAliases = (projectList ++ toBeAddedExprs).zipWithIndex.map {
+            case (exp, index) =>
+              nameToExprMap.put(exp.name, Alias(exp, s"sqc$index")())
+              Alias(exp, s"sqc$index")()
+          }
+          // Replace the condition column names with alias names.
+          val transformedConds = condition.transform {
--- End diff --

I am not so sure why you care about the subquery condition; as the Hive wiki says:
```
As of Hive 0.13 some types of subqueries are supported in the WHERE clause. 
Those are queries where the result of the query can be treated as a constant 
for IN and NOT IN statements (called uncorrelated subqueries because the 
subquery does not reference columns from the parent query):
```
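The uncorrelated case the Hive wiki describes can be sketched like this. It is an illustrative simulation only; the row data and the column name `key` are assumptions, not anything from the PR. Because the subquery references no columns of the outer query, its result can be materialized once as a constant set.

```python
outer_rows = [{"key": 1}, {"key": 2}, {"key": 3}]
subquery_rows = [{"key": 2}, {"key": 3}, {"key": 4}]

# Evaluate the uncorrelated subquery once, up front...
constant_set = {r["key"] for r in subquery_rows}

# ...then filter the outer query against it, exactly as IN (constant set).
in_result = [r for r in outer_rows if r["key"] in constant_set]
print(in_result)  # [{'key': 2}, {'key': 3}]
```

A correlated subquery could not be hoisted this way, since its result would change per outer row; that is why only the uncorrelated form is treated as a constant.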




[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-02-27 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/3249#discussion_r25552386
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -422,6 +424,108 @@ class Analyzer(catalog: Catalog,
         Generate(g, join = false, outer = false, None, child)
     }
   }
+
+  /**
+   * Transforms the query which has subquery expressions in the where clause
+   * to a left semi join:
+   * select T1.x from T1 where T1.x in (select T2.y from T2) is transformed to
+   * select T1.x from T1 left semi join T2 on T1.x = T2.y.
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+      case p: LogicalPlan if !p.childrenResolved => p
+      case filter @ Filter(conditions, child) =>
+        val subqueryExprs = conditions.collect {
+          case In(exp, Seq(SubqueryExpression(subquery))) => (exp, subquery)
+        }
+        // Replace subqueries with a dummy true literal since they are evaluated separately now.
+        val transformedConds = conditions.transform {
+          case In(_, Seq(SubqueryExpression(_))) => Literal(true)
+        }
+        subqueryExprs match {
+          case Seq() => filter // No subqueries.
+          case Seq((exp, subquery)) =>
+            createLeftSemiJoin(
+              child,
+              exp,
+              subquery,
+              transformedConds)
+          case _ =>
+            throw new TreeNodeException(filter, "Only one SubQuery expression is supported.")
+        }
+    }
+
+    /**
+     * Create a LeftSemi join of the parent query with the subquery which is mentioned
+     * in the 'IN' predicate, and combine the subquery conditions and parent query conditions.
+     */
+    def createLeftSemiJoin(
+        left: LogicalPlan,
+        value: Expression,
+        subquery: LogicalPlan,
+        parentConds: Expression): LogicalPlan = {
+      val (transformedPlan, subqueryConds) = transformAndGetConditions(value, subquery)
+      // Add both parent query conditions and subquery conditions as join conditions.
+      val allPredicates = And(parentConds, subqueryConds)
+      Join(left, transformedPlan, LeftSemi, Some(allPredicates))
+    }
+
+    /**
+     * Transform the subquery LogicalPlan and add the expressions which are used as
+     * filters to the projection, and also return the filter conditions used in the subquery.
+     */
+    def transformAndGetConditions(
+        value: Expression,
+        subquery: LogicalPlan): (LogicalPlan, Expression) = {
+      val expr = new scala.collection.mutable.ArrayBuffer[Expression]()
+      // TODO: we only decorrelate subqueries in very specific cases like the cases
+      // mentioned above in the documentation. More complex queries, like subqueries
+      // inside subqueries, can be supported in the future.
+      val transformedPlan = subquery transform {
+        case project @ Project(projectList, f @ Filter(condition, child)) =>
+          // Don't support more than one item in the select list of the subquery.
+          if (projectList.size > 1) {
+            throw new TreeNodeException(
+              project,
+              "SubQuery can contain only one item in Select List")
+          }
+          val resolvedChild = ResolveRelations(child)
+          // Add the expressions which are used as filters in the subquery to the projections.
+          val toBeAddedExprs = f.references.filter { a =>
+            resolvedChild.resolve(a.name, resolver) != None && !project.outputSet.contains(a)
+          }
+          val nameToExprMap = collection.mutable.Map[String, Alias]()
+          // Create aliases for all projection expressions.
+          val witAliases = (projectList ++ toBeAddedExprs).zipWithIndex.map {
+            case (exp, index) =>
+              nameToExprMap.put(exp.name, Alias(exp, s"sqc$index")())
+              Alias(exp, s"sqc$index")()
+          }
+          // Replace the condition column names with alias names.
+          val transformedConds = condition.transform {
+            case a: Attribute if resolvedChild.resolve(a.name, resolver) != None =>
+              nameToExprMap.get(a.name).get.toAttribute
+          }
+          // Join the first projection column of the subquery to the main query and
+          // add it as a condition.
+          // TODO: We can avoid this if the parent condition already has this condition.
+          expr += EqualTo(value, witAliases(0).toAttribute)
+          expr += transformedConds
--- End diff --



[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-02-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-76455136
  
  [Test build #28083 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28083/consoleFull) for PR 3249 at commit [`7653eee`](https://github.com/apache/spark/commit/7653eee3d8ded847ceeb3a03e11a507b827c5066).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-02-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-76467964
  
  [Test build #28083 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28083/consoleFull) for PR 3249 at commit [`7653eee`](https://github.com/apache/spark/commit/7653eee3d8ded847ceeb3a03e11a507b827c5066).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `case class SubqueryExpression(subquery: LogicalPlan) extends Expression`
   * `logError("User class threw exception: " + cause.getMessage, cause)`






[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-02-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-76467971
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28083/





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-02-27 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-76358262
  
@ravipesala I've created another PR for `[NOT] EXISTS`; I believe `IN` can be 
transformed into `EXISTS`, and the implementation can be much simpler. Can you 
review the code for me?

Please see #4812 
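The equivalence this relies on can be checked with a small simulation. This is a hedged sketch, not the PR's code: for non-NULL data, `a.key IN (SELECT b.key FROM src_b)` behaves the same as the correlated `EXISTS (SELECT 1 FROM src_b b WHERE b.key = a.key)`. Table contents and the column name `key` are assumptions for illustration.

```python
src_a = [{"key": 1}, {"key": 2}]
src_b = [{"key": 2}, {"key": 3}]

def exists(rows, predicate):
    """EXISTS: true iff the (sub)query yields at least one row."""
    return any(predicate(r) for r in rows)

# IN form: a.key IN (SELECT b.key FROM src_b)
in_result = [a for a in src_a if a["key"] in {b["key"] for b in src_b}]

# EXISTS form: EXISTS (SELECT 1 FROM src_b b WHERE b.key = a.key)
exists_result = [
    a for a in src_a
    if exists(src_b, lambda b, a=a: b["key"] == a["key"])  # a bound per row
]

assert in_result == exists_result
print(exists_result)  # [{'key': 2}]
```

This is why a single `EXISTS` code path can serve both predicates, with `IN` desugared into a correlated equality.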





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-02-27 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-76501277
  
@ravipesala, do you have any idea how to implement `NOT IN`? I believe we 
should consider how to implement `NOT IN` while doing `IN`; or should they 
come in the same PR?

BTW, can you also enable the Hive compatibility suites like `subquery_in.q` 
or `subquery_in_having.q` if you think they are also supported by this PR.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-02-27 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/3249#discussion_r25551625
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SubqueryExpression.scala ---
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+
+/**
+ * Evaluates whether `subquery` result contains `value`. 
+ * For example: 'SELECT * FROM src a WHERE a.key in (SELECT b.key FROM src b)'
+ * @param subquery  In the above example 'SELECT b.key FROM src b' is 'subquery'
+ */
+case class SubqueryExpression(subquery: LogicalPlan) extends Expression {
--- End diff --

Instead of making the subquery a fake expression, a better idea is probably to 
create a new logical plan like 
```
SubQueryIn(left: LogicalPlan, nested: LogicalPlan, isNotIn: Boolean)
```
That's also how I implement the `EXISTS` at 
https://github.com/apache/spark/pull/4812/files#diff-9a11e98e8f4bd1c4bb18ca6a7a7b8948R262
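The shape of the proposed node can be illustrated with a minimal sketch. These classes are illustrative only (they mirror the suggestion above, not Spark's actual Catalyst classes): the subquery becomes a dedicated logical plan node with two child plans, rather than a plan smuggled inside an expression.

```python
from dataclasses import dataclass

@dataclass
class LogicalPlan:
    """Base class standing in for Catalyst's LogicalPlan."""

@dataclass
class Relation(LogicalPlan):
    """A leaf plan standing in for a table scan."""
    name: str

@dataclass
class SubQueryIn(LogicalPlan):
    """The proposed node: IN/NOT IN as a binary logical plan."""
    left: LogicalPlan     # the outer query plan
    nested: LogicalPlan   # the subquery plan
    isNotIn: bool         # distinguishes IN from NOT IN

# SELECT ... FROM T1 WHERE ... IN (SELECT ... FROM T2)
plan = SubQueryIn(left=Relation("T1"), nested=Relation("T2"), isNotIn=False)
print(plan.isNotIn)  # False
```

The advantage is that an analyzer rule can rewrite `SubQueryIn` into a semi join by matching on the plan node directly, instead of searching every filter condition for an embedded plan.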





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-02-27 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/3249#discussion_r25551740
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -422,6 +424,108 @@ class Analyzer(catalog: Catalog,
         Generate(g, join = false, outer = false, None, child)
     }
   }
+
+  /**
+   * Transforms the query which has subquery expressions in the where clause
+   * to a left semi join:
+   * select T1.x from T1 where T1.x in (select T2.y from T2) is transformed to
+   * select T1.x from T1 left semi join T2 on T1.x = T2.y.
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+    def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+      case p: LogicalPlan if !p.childrenResolved => p
+      case filter @ Filter(conditions, child) =>
--- End diff --

We are not going to handle the non-`Subquery` case here, right? How about
```
case filter @ Filter(In(expr, SubqueryExpression(subquery)), child) =>
```






[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-02-27 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/3249#discussion_r25551872
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -422,6 +424,108 @@ class Analyzer(catalog: Catalog,
 Generate(g, join = false, outer = false, None, child)
 }
   }
+
+  /**
+   * Transforms the query which has subquery expressions in where clause 
to left semi join.
+   * select T1.x from T1 where T1.x in (select T2.y from T2) transformed to
+   * select T1.x from T1 left semi join T2 on T1.x = T2.y.
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case p: LogicalPlan if !p.childrenResolved => p
+  case filter @ Filter(conditions, child) =>
+val subqueryExprs = conditions.collect {
+  case In(exp, Seq(SubqueryExpression(subquery))) => (exp, subquery)
+}
+// Replace subqueries with a dummy true literal since they are evaluated separately now.
+val transformedConds = conditions.transform {
+  case In(_, Seq(SubqueryExpression(_))) => Literal(true)
+}
+subqueryExprs match {
+  case Seq() => filter // No subqueries.
+  case Seq((exp, subquery)) =>
+createLeftSemiJoin(
+  child,
+  exp,
+  subquery,
+  transformedConds)
+  case _ =>
+throw new TreeNodeException(filter, "Only one SubQuery expression is supported.")
+}
+}
+
+/**
+ * Creates a LeftSemi join between the parent query and the subquery
+ * mentioned in the 'IN' predicate, combining the subquery conditions
+ * with the parent query conditions.
+ */ 
+def createLeftSemiJoin(left: LogicalPlan,
+value: Expression,
+subquery: LogicalPlan,
+parentConds: Expression) : LogicalPlan = {
+  val (transformedPlan, subqueryConds) = transformAndGetConditions(value, subquery)
+  // Add both parent query conditions and subquery conditions as join conditions
+  val allPredicates = And(parentConds, subqueryConds)
+  Join(left, transformedPlan, LeftSemi, Some(allPredicates))
+}
+
+/**
+ * Transforms the subquery LogicalPlan, adding the expressions used as
+ * filters to the projection, and also returns the filter conditions
+ * used in the subquery.
+def transformAndGetConditions(value: Expression,
+  subquery: LogicalPlan): (LogicalPlan, Expression) = {
+  val expr = new scala.collection.mutable.ArrayBuffer[Expression]()
+  // TODO: we only decorrelate subqueries in very specific cases, like the
+  // cases documented above. More complex queries, such as subqueries
+  // nested inside subqueries, can be supported in the future.
+  val transformedPlan = subquery transform {
+case project @ Project(projectList, f @ Filter(condition, child)) =>
+  // Don't support more than one item in select list of subquery
+  if (projectList.size > 1) {
+throw new TreeNodeException(
+project,
+"SubQuery can contain only one item in Select List")
+  }
+  val resolvedChild = ResolveRelations(child)
--- End diff --

Why do you resolve the relation here? I believe the subquery should already be 
resolved before entering the `SubQueryExpressions` rule. Is that a limitation of 
using `SubQueryExpression`, which cannot check whether its child logical plan 
is resolved?
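
A side note on why the diff above targets a `LeftSemi` join rather than a plain inner join: when the subquery side contains duplicate matching values, an inner join would duplicate left-side rows and break `IN` semantics, while a semi join emits each left row at most once. A minimal Python sketch (toy data, not Spark code):

```python
t1 = [{"x": 3}]
t2 = [{"y": 3}, {"y": 3}]  # duplicate match on the subquery side

def inner_join_xs(left, right):
    # inner join emits one output row per matching (left, right) pair
    return [l["x"] for l in left for r in right if l["x"] == r["y"]]

def left_semi_xs(left, right):
    # left semi join emits each left row at most once
    return [l["x"] for l in left if any(l["x"] == r["y"] for r in right)]

assert inner_join_xs(t1, t2) == [3, 3]  # duplicates leak in
assert left_semi_xs(t1, t2) == [3]      # matches IN semantics
```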





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-02-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-74372645
  
  [Test build #27482 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27482/consoleFull)
 for   PR 3249 at commit 
[`9e404c0`](https://github.com/apache/spark/commit/9e404c053a9c43f5d5d6b2b696dd3cbd4f2c7c8f).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-02-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-74373266
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27483/
Test FAILed.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-02-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-74373265
  
  [Test build #27483 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27483/consoleFull)
 for   PR 3249 at commit 
[`e70bc79`](https://github.com/apache/spark/commit/e70bc798337fbff98e2b7389d7143453c37b7054).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-02-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-74372730
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27482/
Test FAILed.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-02-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-74372729
  
  [Test build #27482 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27482/consoleFull)
 for   PR 3249 at commit 
[`9e404c0`](https://github.com/apache/spark/commit/9e404c053a9c43f5d5d6b2b696dd3cbd4f2c7c8f).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-02-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-74373203
  
  [Test build #27483 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27483/consoleFull)
 for   PR 3249 at commit 
[`e70bc79`](https://github.com/apache/spark/commit/e70bc798337fbff98e2b7389d7143453c37b7054).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-02-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-74378694
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27486/
Test FAILed.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-02-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-74378691
  
  [Test build #27486 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27486/consoleFull)
 for   PR 3249 at commit 
[`dbd0b87`](https://github.com/apache/spark/commit/dbd0b873acc6cb928140b3044ac3b989e9e54240).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class SubqueryExpression(subquery: LogicalPlan) extends 
Expression `






[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-02-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-74378995
  
  [Test build #27487 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27487/consoleFull)
 for   PR 3249 at commit 
[`036f3c6`](https://github.com/apache/spark/commit/036f3c60bc1e190fddf3c4593b6b6f8f28e5d62b).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-02-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-74378280
  
  [Test build #27486 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27486/consoleFull)
 for   PR 3249 at commit 
[`dbd0b87`](https://github.com/apache/spark/commit/dbd0b873acc6cb928140b3044ac3b989e9e54240).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-02-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-74377584
  
  [Test build #27484 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27484/consoleFull)
 for   PR 3249 at commit 
[`0a90b21`](https://github.com/apache/spark/commit/0a90b218b37ab3618bc074956c0b9165a1cb5ce2).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-02-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-74377896
  
  [Test build #27484 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27484/consoleFull)
 for   PR 3249 at commit 
[`0a90b21`](https://github.com/apache/spark/commit/0a90b218b37ab3618bc074956c0b9165a1cb5ce2).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class SubqueryExpression(subquery: LogicalPlan) extends 
Expression `






[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-02-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-74377898
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27484/
Test FAILed.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-02-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-74381608
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27487/
Test PASSed.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-02-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-74381599
  
  [Test build #27487 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27487/consoleFull)
 for   PR 3249 at commit 
[`036f3c6`](https://github.com/apache/spark/commit/036f3c60bc1e190fddf3c4593b6b6f8f28e5d62b).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class SubqueryExpression(subquery: LogicalPlan) extends 
Expression `






[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-02-06 Thread ravipesala
Github user ravipesala commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-73280560
  
@marmbrus Please check whether it is ok.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-02-05 Thread scwf
Github user scwf commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-73187723
  
LGTM.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-02-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-73187137
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26897/
Test PASSed.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-02-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-73187135
  
  [Test build #26897 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26897/consoleFull)
 for   PR 3249 at commit 
[`eb0a13b`](https://github.com/apache/spark/commit/eb0a13b2d2fb87b04899c05b62ce82c237dff750).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class SubqueryExpression(subquery: LogicalPlan) extends 
Expression `






[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-02-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-73182749
  
  [Test build #26897 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26897/consoleFull)
 for   PR 3249 at commit 
[`eb0a13b`](https://github.com/apache/spark/commit/eb0a13b2d2fb87b04899c05b62ce82c237dff750).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-02-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-73100763
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26847/
Test FAILed.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-02-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-73100753
  
  [Test build #26847 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26847/consoleFull)
 for   PR 3249 at commit 
[`e49452b`](https://github.com/apache/spark/commit/e49452be83f0f7f1ca2cf252348e03d454137ed0).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-02-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-73098157
  
  [Test build #26847 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26847/consoleFull)
 for   PR 3249 at commit 
[`e49452b`](https://github.com/apache/spark/commit/e49452be83f0f7f1ca2cf252348e03d454137ed0).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-01-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-70603746
  
  [Test build #25788 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25788/consoleFull)
 for   PR 3249 at commit 
[`0347f02`](https://github.com/apache/spark/commit/0347f027cef940d987cd079357059c7695e9400f).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class SubqueryExpression(subquery: LogicalPlan) extends 
Expression `






[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-01-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-70603747
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25788/
Test PASSed.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-01-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-70599526
  
  [Test build #25788 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25788/consoleFull)
 for   PR 3249 at commit 
[`0347f02`](https://github.com/apache/spark/commit/0347f027cef940d987cd079357059c7695e9400f).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-01-19 Thread ravipesala
Github user ravipesala commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-70528755
  
@marmbrus Please review it.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-01-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-70356119
  
  [Test build #25697 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25697/consoleFull)
 for   PR 3249 at commit 
[`86e98d5`](https://github.com/apache/spark/commit/86e98d5484d69167d1de7b1bc264016989924a59).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-01-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-70357598
  
  [Test build #25697 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25697/consoleFull)
 for   PR 3249 at commit 
[`86e98d5`](https://github.com/apache/spark/commit/86e98d5484d69167d1de7b1bc264016989924a59).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-01-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-70357600
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25697/
Test PASSed.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-01-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-68677425
  
  [Test build #25049 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25049/consoleFull)
 for   PR 3249 at commit 
[`09e471a`](https://github.com/apache/spark/commit/09e471aa670298bcb1e8e135d260da54a11de88e).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class SubqueryExpression(subquery: LogicalPlan) extends 
Expression `






[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-01-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-68677427
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25049/
Test PASSed.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-01-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-68673730
  
  [Test build #25049 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25049/consoleFull)
 for   PR 3249 at commit 
[`09e471a`](https://github.com/apache/spark/commit/09e471aa670298bcb1e8e135d260da54a11de88e).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-01-04 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/spark/pull/3249#discussion_r22448850
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -414,6 +418,123 @@ class Analyzer(catalog: Catalog,
 Generate(g, join = false, outer = false, None, child)
 }
   }
+
+  /**
+   * Transforms the query which has subquery expressions in where clause 
to join queries.
+   * Case 1 Uncorrelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2)
+   * -- rewritten query
+   * Select C from R1 left semi join (select B as sqc0 from R2) subquery 
on R1.A = subquery.sqc0
+   *
+   * Case 2 Correlated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)
+   * -- rewritten query
+   * select C from R1 left semi join (select B as sqc0, R2.Y as sqc1 from 
R2) subquery
+   *   on R1.X = subquery.sqc1 and R1.A = subquery.sqc0
+   * 
+   * Refer: 
https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case p: LogicalPlan if !p.childrenResolved => p
+  case filter @ Filter(conditions, child) =>
+val subqueryExprs = new scala.collection.mutable.ArrayBuffer[In]()
+val nonSubQueryConds = new scala.collection.mutable.ArrayBuffer[Expression]()
+conditions.collect {
+  case s @ In(exp, Seq(SubqueryExpression(subquery))) =>
+subqueryExprs += s
+}
+val transformedConds = conditions.transform {
+  // Replace with dummy
+  case s @ In(exp, Seq(SubqueryExpression(subquery))) =>
+Literal(true)
+}
+if (subqueryExprs.size == 1) {
+  val subqueryExpr = subqueryExprs.remove(0)
+  createLeftSemiJoin(
+child,
+subqueryExpr.value,
+subqueryExpr.list(0).asInstanceOf[SubqueryExpression].subquery,
+transformedConds)
+} else if (subqueryExprs.size > 1) {
+  // Only one subquery expression is supported.
+  throw new TreeNodeException(filter, "Only one SubQuery expression is supported.")
+} else {
+  filter
+}
+}
+
+/**
+ * Create LeftSemi join with parent query to the subquery which is 
mentioned in 'IN' predicate
+ * And combine the subquery conditions and parent query conditions.
+ */ 
+def createLeftSemiJoin(left: LogicalPlan,
+value: Expression,
+subquery: LogicalPlan,
+parentConds: Expression) : LogicalPlan = {
+  val (transformedPlan, subqueryConds) = 
transformAndGetConditions(value, subquery)
+  // Unify the parent query conditions and subquery conditions and add 
these as join conditions
+  val unifyConds = And(parentConds, subqueryConds)
+  Join(left, transformedPlan, LeftSemi, Some(unifyConds))
+}
+
+/**
+ * Transform the subquery LogicalPlan and add the expressions which 
are used as filters to the
+ * projection. And also return filter conditions used in subquery
+ */
+def transformAndGetConditions(value: Expression,
+  subquery: LogicalPlan): (LogicalPlan, Expression) = {
+  val expr = new scala.collection.mutable.ArrayBuffer[Expression]()
+  val transformedPlan = subquery transform {
+case project @ Project(projectList, f @ Filter(condition, child)) =>
+  // Don't support more than one item in select list of subquery
+  if (projectList.size > 1) {
+throw new TreeNodeException(
+project,
+"SubQuery can contain only one item in Select List")
+  }
+  val resolvedChild = ResolveRelations(child)
+  // Add the expressions to the projections which are used as 
filters in subquery
+  val toBeAddedExprs = f.references.filter { a =>
+resolvedChild.resolve(a.name, resolver) != None && !project.outputSet.contains(a) }
+  val nameToExprMap = collection.mutable.Map[String, Alias]()
+  // Create aliases for all projection expressions.
+  val witAliases = (projectList ++ toBeAddedExprs).zipWithIndex.map {
+case (exp, index) =>
+  nameToExprMap.put(exp.name, Alias(exp, s"sqc$index")())
+  Alias(exp, s"sqc$index")()
+  }
+  // Replace the condition column names with alias names.
+  val transformedConds = condition.transform {
+

[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-01-04 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/spark/pull/3249#discussion_r22448848
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SubqueryExpression.scala
 ---
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+
+/**
+ * Evaluates whether `subquery` result contains `value`. 
+ * For example : 'SELECT * FROM src a WHERE a.key in (SELECT b.key FROM 
src b)'
+ * @param subquery  In the above example 'SELECT b.key FROM src b' is 
'subquery'
+ */
+case class SubqueryExpression(subquery: LogicalPlan) extends Expression {
+
+  type EvaluatedType = Any
+  def dataType = subquery.output.head.dataType
+  override def foldable = false
+  def nullable = true
+  override def toString = s"SubqueryExpression($subquery)"
--- End diff --

Ok. I will change it.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-01-04 Thread ravipesala
Github user ravipesala commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-68673618
  
Thank you for reviewing it. I fixed the review comments and added a TODO for future support of more complex queries.
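The rewrite documented in the patch (uncorrelated `IN` subquery to a left semi join) can be sanity-checked on plain Scala collections. This is an illustrative sketch of the semantics only; the table contents and the names `r1`, `r2` are invented, and no Catalyst API is involved.

```scala
// Sketch: why `A in (select B from R2)` can be rewritten as a semi join.
object SemiJoinRewriteDemo {
  // R1(A, C) and R2(B); invented rows for illustration.
  val r1 = Seq((1, "x"), (2, "y"), (3, "z")) // (A, C)
  val r2 = Seq(1, 3, 5)                      // B

  // Original query: select C from R1 where R1.A in (select B from R2)
  val viaIn: Seq[String] = r1.collect { case (a, c) if r2.contains(a) => c }

  // Rewritten: R1 left semi join (select B as sqc0 from R2) on R1.A = sqc0.
  // A semi join keeps each left row at most once, matching IN semantics.
  val sqc0 = r2.toSet
  val viaSemiJoin: Seq[String] = r1.filter { case (a, _) => sqc0(a) }.map(_._2)

  def main(args: Array[String]): Unit = {
    assert(viaIn == viaSemiJoin)
    println(viaIn.mkString(",")) // x,z
  }
}
```

Both paths must agree for every input; the rewrite is only valid because the semi join never duplicates left-side rows.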





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-01-04 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/spark/pull/3249#discussion_r22448842
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -414,6 +418,123 @@ class Analyzer(catalog: Catalog,
 Generate(g, join = false, outer = false, None, child)
 }
   }
+
+  /**
+   * Transforms the query which has subquery expressions in where clause 
to join queries.
+   * Case 1 Uncorrelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2)
+   * -- rewritten query
+   * Select C from R1 left semi join (select B as sqc0 from R2) subquery 
on R1.A = subquery.sqc0
+   *
+   * Case 2 Correlated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)
+   * -- rewritten query
+   * select C from R1 left semi join (select B as sqc0, R2.Y as sqc1 from 
R2) subquery
+   *   on R1.X = subquery.sqc1 and R1.A = subquery.sqc0
+   * 
+   * Refer: 
https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case p: LogicalPlan if !p.childrenResolved => p
+  case filter @ Filter(conditions, child) =>
+val subqueryExprs = new scala.collection.mutable.ArrayBuffer[In]()
+val nonSubQueryConds = new scala.collection.mutable.ArrayBuffer[Expression]()
+conditions.collect {
+  case s @ In(exp, Seq(SubqueryExpression(subquery))) =>
+subqueryExprs += s
+}
+val transformedConds = conditions.transform {
+  // Replace with dummy
+  case s @ In(exp, Seq(SubqueryExpression(subquery))) =>
+Literal(true)
+}
+if (subqueryExprs.size == 1) {
+  val subqueryExpr = subqueryExprs.remove(0)
+  createLeftSemiJoin(
+child,
+subqueryExpr.value,
+subqueryExpr.list(0).asInstanceOf[SubqueryExpression].subquery,
+transformedConds)
+} else if (subqueryExprs.size > 1) {
+  // Only one subquery expression is supported.
+  throw new TreeNodeException(filter, "Only one SubQuery expression is supported.")
+} else {
+  filter
+}
--- End diff --

Thanks for the code snippet. It is good; I will add it like this.
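The suggested `collect` one-liner works because `collect` filters and maps in a single pass, returning the matches directly, so the mutable `ArrayBuffer` plus side-effecting closure becomes unnecessary. A toy, self-contained stand-in (these case classes only mimic Catalyst's `In`/`SubqueryExpression`; they are not the real ones):

```scala
// Toy expression hierarchy mimicking the Catalyst shapes used in the patch.
sealed trait Expr
case class SubqueryExpression(sql: String) extends Expr
case class In(value: String, list: Seq[Expr]) extends Expr
case class EqualTo(l: String, r: String) extends Expr

object CollectDemo {
  val conditions: Seq[Expr] = Seq(
    EqualTo("a.key", "b.key"),
    In("a.key", Seq(SubqueryExpression("SELECT b.key FROM src b"))))

  // Matching expressions come back directly; no buffer or `+=` needed.
  val subqueryExprs: Seq[In] = conditions.collect {
    case s @ In(_, Seq(SubqueryExpression(_))) => s
  }

  def main(args: Array[String]): Unit = {
    assert(subqueryExprs.size == 1)
    println(subqueryExprs.head.value) // a.key
  }
}
```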





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2015-01-04 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/spark/pull/3249#discussion_r22448876
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -414,6 +418,123 @@ class Analyzer(catalog: Catalog,
 Generate(g, join = false, outer = false, None, child)
 }
   }
+
+  /**
+   * Transforms the query which has subquery expressions in where clause 
to join queries.
+   * Case 1 Uncorrelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2)
+   * -- rewritten query
+   * Select C from R1 left semi join (select B as sqc0 from R2) subquery 
on R1.A = subquery.sqc0
+   *
+   * Case 2 Correlated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)
+   * -- rewritten query
+   * select C from R1 left semi join (select B as sqc0, R2.Y as sqc1 from 
R2) subquery
+   *   on R1.X = subquery.sqc1 and R1.A = subquery.sqc0
+   * 
+   * Refer: 
https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case p: LogicalPlan if !p.childrenResolved => p
+  case filter @ Filter(conditions, child) =>
+val subqueryExprs = new scala.collection.mutable.ArrayBuffer[In]()
+val nonSubQueryConds = new scala.collection.mutable.ArrayBuffer[Expression]()
+conditions.collect {
+  case s @ In(exp, Seq(SubqueryExpression(subquery))) =>
+subqueryExprs += s
+}
+val transformedConds = conditions.transform {
+  // Replace with dummy
+  case s @ In(exp, Seq(SubqueryExpression(subquery))) =>
+Literal(true)
+}
+if (subqueryExprs.size == 1) {
+  val subqueryExpr = subqueryExprs.remove(0)
+  createLeftSemiJoin(
+child,
+subqueryExpr.value,
+subqueryExpr.list(0).asInstanceOf[SubqueryExpression].subquery,
+transformedConds)
+} else if (subqueryExprs.size > 1) {
+  // Only one subquery expression is supported.
+  throw new TreeNodeException(filter, "Only one SubQuery expression is supported.")
+} else {
+  filter
+}
+}
+
+/**
+ * Create LeftSemi join with parent query to the subquery which is 
mentioned in 'IN' predicate
+ * And combine the subquery conditions and parent query conditions.
+ */ 
+def createLeftSemiJoin(left: LogicalPlan,
+value: Expression,
+subquery: LogicalPlan,
+parentConds: Expression) : LogicalPlan = {
+  val (transformedPlan, subqueryConds) = 
transformAndGetConditions(value, subquery)
+  // Unify the parent query conditions and subquery conditions and add 
these as join conditions
+  val unifyConds = And(parentConds, subqueryConds)
+  Join(left, transformedPlan, LeftSemi, Some(unifyConds))
+}
+
+/**
+ * Transform the subquery LogicalPlan and add the expressions which 
are used as filters to the
+ * projection. And also return filter conditions used in subquery
+ */
+def transformAndGetConditions(value: Expression,
+  subquery: LogicalPlan): (LogicalPlan, Expression) = {
+  val expr = new scala.collection.mutable.ArrayBuffer[Expression]()
+  val transformedPlan = subquery transform {
+case project @ Project(projectList, f @ Filter(condition, child)) =>
--- End diff --

Yes, this type of query cannot be evaluated in the present code. Maybe we can expand it in the future; I will add the TODO. Even Hive does not seem to work for this type of query.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2014-12-30 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/3249#discussion_r22363285
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -414,6 +418,123 @@ class Analyzer(catalog: Catalog,
 Generate(g, join = false, outer = false, None, child)
 }
   }
+
+  /**
+   * Transforms the query which has subquery expressions in where clause 
to join queries.
+   * Case 1 Uncorrelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2)
+   * -- rewritten query
+   * Select C from R1 left semi join (select B as sqc0 from R2) subquery 
on R1.A = subquery.sqc0
+   *
+   * Case 2 Correlated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)
+   * -- rewritten query
+   * select C from R1 left semi join (select B as sqc0, R2.Y as sqc1 from 
R2) subquery
+   *   on R1.X = subquery.sqc1 and R1.A = subquery.sqc0
+   * 
+   * Refer: 
https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case p: LogicalPlan if !p.childrenResolved => p
+  case filter @ Filter(conditions, child) =>
+val subqueryExprs = new scala.collection.mutable.ArrayBuffer[In]()
+val nonSubQueryConds = new 
scala.collection.mutable.ArrayBuffer[Expression]()
+conditions.collect {
--- End diff --

Collect returns the matching expressions:

```scala
val subqueryExprs = conditions.collect {
  case s @ In(exp, Seq(SubqueryExpression(subquery))) => s
}
```





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2014-12-30 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/3249#discussion_r22363305
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -414,6 +418,123 @@ class Analyzer(catalog: Catalog,
 Generate(g, join = false, outer = false, None, child)
 }
   }
+
+  /**
+   * Transforms the query which has subquery expressions in where clause 
to join queries.
+   * Case 1 Uncorrelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2)
+   * -- rewritten query
+   * Select C from R1 left semi join (select B as sqc0 from R2) subquery 
on R1.A = subquery.sqc0
+   *
+   * Case 2 Correlated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)
+   * -- rewritten query
+   * select C from R1 left semi join (select B as sqc0, R2.Y as sqc1 from 
R2) subquery
+   *   on R1.X = subquery.sqc1 and R1.A = subquery.sqc0
+   * 
+   * Refer: 
https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case p: LogicalPlan if !p.childrenResolved => p
+  case filter @ Filter(conditions, child) =>
+val subqueryExprs = new scala.collection.mutable.ArrayBuffer[In]()
+val nonSubQueryConds = new 
scala.collection.mutable.ArrayBuffer[Expression]()
--- End diff --

This is unused?





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2014-12-30 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/3249#discussion_r22363362
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -414,6 +418,123 @@ class Analyzer(catalog: Catalog,
 Generate(g, join = false, outer = false, None, child)
 }
   }
+
+  /**
+   * Transforms the query which has subquery expressions in where clause 
to join queries.
+   * Case 1 Uncorrelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2)
+   * -- rewritten query
+   * Select C from R1 left semi join (select B as sqc0 from R2) subquery 
on R1.A = subquery.sqc0
+   *
+   * Case 2 Correlated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)
+   * -- rewritten query
+   * select C from R1 left semi join (select B as sqc0, R2.Y as sqc1 from 
R2) subquery
+   *   on R1.X = subquery.sqc1 and R1.A = subquery.sqc0
+   * 
+   * Refer: 
https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case p: LogicalPlan if !p.childrenResolved => p
+  case filter @ Filter(conditions, child) =>
+val subqueryExprs = new scala.collection.mutable.ArrayBuffer[In]()
+val nonSubQueryConds = new 
scala.collection.mutable.ArrayBuffer[Expression]()
+conditions.collect {
+  case s @ In(exp, Seq(SubqueryExpression(subquery))) =>
--- End diff --

Also I'd consider extracting to a tuple `(exp, subquery)` here to avoid the 
weirdness of `.list(0)` below.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2014-12-30 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/3249#discussion_r22363511
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -414,6 +418,123 @@ class Analyzer(catalog: Catalog,
 Generate(g, join = false, outer = false, None, child)
 }
   }
+
+  /**
+   * Transforms the query which has subquery expressions in where clause 
to join queries.
+   * Case 1 Uncorrelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2)
+   * -- rewritten query
+   * Select C from R1 left semi join (select B as sqc0 from R2) subquery 
on R1.A = subquery.sqc0
+   *
+   * Case 2 Correlated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)
+   * -- rewritten query
+   * select C from R1 left semi join (select B as sqc0, R2.Y as sqc1 from 
R2) subquery
+   *   on R1.X = subquery.sqc1 and R1.A = subquery.sqc0
+   * 
+   * Refer: 
https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case p: LogicalPlan if !p.childrenResolved => p
+  case filter @ Filter(conditions, child) =>
+val subqueryExprs = new scala.collection.mutable.ArrayBuffer[In]()
+val nonSubQueryConds = new scala.collection.mutable.ArrayBuffer[Expression]()
+conditions.collect {
+  case s @ In(exp, Seq(SubqueryExpression(subquery))) =>
+subqueryExprs += s
+}
+val transformedConds = conditions.transform {
+  // Replace with dummy
+  case s @ In(exp, Seq(SubqueryExpression(subquery))) =>
+Literal(true)
+}
+if (subqueryExprs.size == 1) {
+  val subqueryExpr = subqueryExprs.remove(0)
+  createLeftSemiJoin(
+child,
+subqueryExpr.value,
+subqueryExpr.list(0).asInstanceOf[SubqueryExpression].subquery,
+transformedConds)
+} else if (subqueryExprs.size > 1) {
+  // Only one subquery expression is supported.
+  throw new TreeNodeException(filter, "Only one SubQuery expression is supported.")
+} else {
+  filter
+}
+}
+
+/**
+ * Create LeftSemi join with parent query to the subquery which is 
mentioned in 'IN' predicate
+ * And combine the subquery conditions and parent query conditions.
+ */ 
+def createLeftSemiJoin(left: LogicalPlan,
+value: Expression,
+subquery: LogicalPlan,
+parentConds: Expression) : LogicalPlan = {
+  val (transformedPlan, subqueryConds) = 
transformAndGetConditions(value, subquery)
+  // Unify the parent query conditions and subquery conditions and add 
these as join conditions
+  val unifyConds = And(parentConds, subqueryConds)
--- End diff --

This is a bit of a nit, but unify has [a pretty specific 
meaning](http://en.wikipedia.org/wiki/Unification_%28computer_science%29) when 
talking about predicates, and I don't think that is what is happening here.  
Maybe `allPredicates` or something?





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2014-12-30 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/3249#discussion_r22365235
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -414,6 +418,123 @@ class Analyzer(catalog: Catalog,
 Generate(g, join = false, outer = false, None, child)
 }
   }
+
+  /**
+   * Transforms the query which has subquery expressions in where clause 
to join queries.
+   * Case 1 Uncorrelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2)
+   * -- rewritten query
+   * Select C from R1 left semi join (select B as sqc0 from R2) subquery 
on R1.A = subquery.sqc0
+   *
+   * Case 2 Correlated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)
+   * -- rewritten query
+   * select C from R1 left semi join (select B as sqc0, R2.Y as sqc1 from 
R2) subquery
+   *   on R1.X = subquery.sqc1 and R1.A = subquery.sqc0
+   * 
+   * Refer: 
https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case p: LogicalPlan if !p.childrenResolved => p
+  case filter @ Filter(conditions, child) =>
+val subqueryExprs = new scala.collection.mutable.ArrayBuffer[In]()
+val nonSubQueryConds = new scala.collection.mutable.ArrayBuffer[Expression]()
+conditions.collect {
+  case s @ In(exp, Seq(SubqueryExpression(subquery))) =>
+subqueryExprs += s
+}
+val transformedConds = conditions.transform {
+  // Replace with dummy
+  case s @ In(exp, Seq(SubqueryExpression(subquery))) =>
+Literal(true)
+}
+if (subqueryExprs.size == 1) {
+  val subqueryExpr = subqueryExprs.remove(0)
+  createLeftSemiJoin(
+child,
+subqueryExpr.value,
+subqueryExpr.list(0).asInstanceOf[SubqueryExpression].subquery,
+transformedConds)
+} else if (subqueryExprs.size > 1) {
+  // Only one subquery expression is supported.
+  throw new TreeNodeException(filter, "Only one SubQuery expression is supported.")
+} else {
+  filter
+}
+}
+
+/**
+ * Create LeftSemi join with parent query to the subquery which is 
mentioned in 'IN' predicate
+ * And combine the subquery conditions and parent query conditions.
+ */ 
+def createLeftSemiJoin(left: LogicalPlan,
+value: Expression,
+subquery: LogicalPlan,
+parentConds: Expression) : LogicalPlan = {
+  val (transformedPlan, subqueryConds) = 
transformAndGetConditions(value, subquery)
+  // Unify the parent query conditions and subquery conditions and add 
these as join conditions
+  val unifyConds = And(parentConds, subqueryConds)
+  Join(left, transformedPlan, LeftSemi, Some(unifyConds))
+}
+
+/**
+ * Transform the subquery LogicalPlan and add the expressions which 
are used as filters to the
+ * projection. And also return filter conditions used in subquery
+ */
+def transformAndGetConditions(value: Expression,
+  subquery: LogicalPlan): (LogicalPlan, Expression) = {
+  val expr = new scala.collection.mutable.ArrayBuffer[Expression]()
+  val transformedPlan = subquery transform {
+case project @ Project(projectList, f @ Filter(condition, child)) =>
--- End diff --

This works for the two queries you specified but is going to fail as soon 
as things get even a little more complicated.  For example

```sql
SELECT a.key FROM src a
WHERE a.key in
(SELECT b FROM (SELECT b.key FROM src b WHERE b.key in (230) and a.value=b.value) a)
```
```





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2014-12-30 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/3249#discussion_r22365363
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -414,6 +418,123 @@ class Analyzer(catalog: Catalog,
 Generate(g, join = false, outer = false, None, child)
 }
   }
+
+  /**
+   * Transforms the query which has subquery expressions in where clause 
to join queries.
+   * Case 1 Uncorrelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2)
+   * -- rewritten query
+   * Select C from R1 left semi join (select B as sqc0 from R2) subquery 
on R1.A = subquery.sqc0
+   *
+   * Case 2 Correlated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)
+   * -- rewritten query
+   * select C from R1 left semi join (select B as sqc0, R2.Y as sqc1 from 
R2) subquery
+   *   on R1.X = subquery.sqc1 and R1.A = subquery.sqc0
+   * 
+   * Refer: 
https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case p: LogicalPlan if !p.childrenResolved = p
+  case filter @ Filter(conditions, child) =
+val subqueryExprs = new scala.collection.mutable.ArrayBuffer[In]()
+val nonSubQueryConds = new 
scala.collection.mutable.ArrayBuffer[Expression]()
+conditions.collect {
+  case s @ In(exp, Seq(SubqueryExpression(subquery))) =
+subqueryExprs += s
+}
+val transformedConds = conditions.transform {
+  // Replace with dummy
+  case s @ In(exp,Seq(SubqueryExpression(subquery))) =
+Literal(true)
+}
+if (subqueryExprs.size == 1) {
+  val subqueryExpr = subqueryExprs.remove(0)
+  createLeftSemiJoin(
+child,
+subqueryExpr.value,
+subqueryExpr.list(0).asInstanceOf[SubqueryExpression].subquery,
+transformedConds)
+} else if (subqueryExprs.size > 1) {
+  // Only one subquery expression is supported.
+  throw new TreeNodeException(filter, "Only one SubQuery expression is supported.")
+} else {
+  filter
+}
+}
+
+/**
+ * Create LeftSemi join with parent query to the subquery which is 
mentioned in 'IN' predicate
+ * And combine the subquery conditions and parent query conditions.
+ */ 
+def createLeftSemiJoin(left: LogicalPlan,
+value: Expression,
+subquery: LogicalPlan,
+parentConds: Expression) : LogicalPlan = {
+  val (transformedPlan, subqueryConds) = 
transformAndGetConditions(value, subquery)
+  // Unify the parent query conditions and subquery conditions and add 
these as join conditions
+  val unifyConds = And(parentConds, subqueryConds)
+  Join(left, transformedPlan, LeftSemi, Some(unifyConds))
+}
+
+/**
+ * Transform the subquery LogicalPlan and add the expressions which 
are used as filters to the
+ * projection. And also return filter conditions used in subquery
+ */
+def transformAndGetConditions(value: Expression,
+  subquery: LogicalPlan): (LogicalPlan, Expression) = {
+  val expr = new scala.collection.mutable.ArrayBuffer[Expression]()
+  val transformedPlan = subquery transform {
+case project @ Project(projectList, f @ Filter(condition, child)) =>
+  // Don't support more than one item in select list of subquery
+  if (projectList.size > 1) {
+    throw new TreeNodeException(
+      project,
+      "SubQuery can contain only one item in Select List")
+  }
+  val resolvedChild = ResolveRelations(child)
+  // Add the expressions to the projections which are used as 
filters in subquery
+  val toBeAddedExprs = f.references.filter { a =>
+    resolvedChild.resolve(a.name, resolver) != None && !project.outputSet.contains(a) }
+  val nameToExprMap = collection.mutable.Map[String, Alias]()
+  // Create aliases for all projection expressions.
+  val withAliases = (projectList ++ toBeAddedExprs).zipWithIndex.map {
+    case (exp, index) =>
+      nameToExprMap.put(exp.name, Alias(exp, s"sqc$index")())
+      Alias(exp, s"sqc$index")()
+  }
+  // Replace the condition column names with alias names.
+  val transformedConds = condition.transform {
+

[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2014-12-30 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/3249#discussion_r22365419
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -414,6 +418,123 @@ class Analyzer(catalog: Catalog,
 Generate(g, join = false, outer = false, None, child)
 }
   }
+
+  /**
+   * Transforms the query which has subquery expressions in where clause 
to join queries.
+   * Case 1 Uncorrelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2)
+   * -- rewritten query
+   * Select C from R1 left semi join (select B as sqc0 from R2) subquery 
on R1.A = subquery.sqc0
+   *
+   * Case 2 Correlated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)
+   * -- rewritten query
+   * select C from R1 left semi join (select B as sqc0, R2.Y as sqc1 from 
R2) subquery
+   *   on R1.X = subquery.sqc1 and R1.A = subquery.sqc0
+   * 
+   * Refer: 
https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case p: LogicalPlan if !p.childrenResolved => p
+  case filter @ Filter(conditions, child) =>
+val subqueryExprs = new scala.collection.mutable.ArrayBuffer[In]()
+val nonSubQueryConds = new 
scala.collection.mutable.ArrayBuffer[Expression]()
+conditions.collect {
+  case s @ In(exp, Seq(SubqueryExpression(subquery))) =>
+subqueryExprs += s
+}
+val transformedConds = conditions.transform {
+  // Replace with dummy
+  case s @ In(exp, Seq(SubqueryExpression(subquery))) =>
+Literal(true)
+}
+if (subqueryExprs.size == 1) {
+  val subqueryExpr = subqueryExprs.remove(0)
+  createLeftSemiJoin(
+child,
+subqueryExpr.value,
+subqueryExpr.list(0).asInstanceOf[SubqueryExpression].subquery,
+transformedConds)
+} else if (subqueryExprs.size > 1) {
+  // Only one subquery expression is supported.
+  throw new TreeNodeException(filter, "Only one SubQuery expression is supported.")
+} else {
+  filter
+}
--- End diff --

Maybe simpler as:
```scala
val subqueryExprs = conditions.collect {
  case In(exp, Seq(SubqueryExpression(subquery))) => (exp, subquery)
}
// Replace subqueries with a dummy true literal since they are
// evaluated separately now.
val transformedConds = conditions.transform {
  case In(_, Seq(SubqueryExpression(_))) => Literal(true)
}
subqueryExprs match {
  case Seq() => filter // No subqueries.
  case Seq((exp, subquery)) =>
    createLeftSemiJoin(
      child,
      exp,
      subquery,
      transformedConds)
  case _ =>
    throw new TreeNodeException(filter, "Only one SubQuery expression is supported.")
}
```
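The two-pass collect/transform idea in the suggestion above can be exercised on a toy expression tree. This is a sketch with made-up types (`Expr`, `Lit`, `InSub`, `AndE`), not the Catalyst API; it only shows the pattern of collecting matches in one pass and replacing them with a dummy in another.

```scala
// Toy expression ADT standing in for Catalyst expressions.
sealed trait Expr
case class Lit(b: Boolean) extends Expr
case class InSub(col: String, sub: String) extends Expr
case class AndE(l: Expr, r: Expr) extends Expr

object Demo {
  // Pass 1: collect every InSub node in the tree (like conditions.collect).
  def collectSubs(e: Expr): Seq[InSub] = e match {
    case s: InSub   => Seq(s)
    case AndE(l, r) => collectSubs(l) ++ collectSubs(r)
    case _          => Seq()
  }

  // Pass 2: replace each InSub with a dummy true literal
  // (like conditions.transform replacing with Literal(true)).
  def replaceSubs(e: Expr): Expr = e match {
    case _: InSub   => Lit(true)
    case AndE(l, r) => AndE(replaceSubs(l), replaceSubs(r))
    case other      => other
  }

  def main(args: Array[String]): Unit = {
    val cond = AndE(InSub("R1.A", "select B from R2"), Lit(false))
    assert(collectSubs(cond) == Seq(InSub("R1.A", "select B from R2")))
    assert(replaceSubs(cond) == AndE(Lit(true), Lit(false)))
  }
}
```

The collected sequence can then be pattern matched with `Seq()` / `Seq(x)` / `_` exactly as the suggestion does, which keeps the "at most one subquery" check in one place.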





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2014-12-30 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/3249#discussion_r22365551
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -414,6 +418,123 @@ class Analyzer(catalog: Catalog,
 Generate(g, join = false, outer = false, None, child)
 }
   }
+
+  /**
+   * Transforms the query which has subquery expressions in where clause 
to join queries.
+   * Case 1 Uncorrelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2)
+   * -- rewritten query
+   * Select C from R1 left semi join (select B as sqc0 from R2) subquery 
on R1.A = subquery.sqc0
+   *
+   * Case 2 Correlated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)
+   * -- rewritten query
+   * select C from R1 left semi join (select B as sqc0, R2.Y as sqc1 from 
R2) subquery
+   *   on R1.X = subquery.sqc1 and R1.A = subquery.sqc0
+   * 
+   * Refer: 
https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case p: LogicalPlan if !p.childrenResolved => p
+  case filter @ Filter(conditions, child) =>
+val subqueryExprs = new scala.collection.mutable.ArrayBuffer[In]()
+val nonSubQueryConds = new 
scala.collection.mutable.ArrayBuffer[Expression]()
+conditions.collect {
+  case s @ In(exp, Seq(SubqueryExpression(subquery))) =>
+subqueryExprs += s
+}
+val transformedConds = conditions.transform {
+  // Replace with dummy
+  case s @ In(exp, Seq(SubqueryExpression(subquery))) =>
+Literal(true)
+}
+if (subqueryExprs.size == 1) {
+  val subqueryExpr = subqueryExprs.remove(0)
+  createLeftSemiJoin(
+child,
+subqueryExpr.value,
+subqueryExpr.list(0).asInstanceOf[SubqueryExpression].subquery,
+transformedConds)
+} else if (subqueryExprs.size > 1) {
+  // Only one subquery expression is supported.
+  throw new TreeNodeException(filter, "Only one SubQuery expression is supported.")
+} else {
+  filter
+}
+}
+
+/**
+ * Create LeftSemi join with parent query to the subquery which is 
mentioned in 'IN' predicate
+ * And combine the subquery conditions and parent query conditions.
+ */ 
+def createLeftSemiJoin(left: LogicalPlan,
+value: Expression,
+subquery: LogicalPlan,
+parentConds: Expression) : LogicalPlan = {
+  val (transformedPlan, subqueryConds) = 
transformAndGetConditions(value, subquery)
+  // Unify the parent query conditions and subquery conditions and add 
these as join conditions
+  val unifyConds = And(parentConds, subqueryConds)
+  Join(left, transformedPlan, LeftSemi, Some(unifyConds))
+}
+
+/**
+ * Transform the subquery LogicalPlan and add the expressions which 
are used as filters to the
+ * projection. And also return filter conditions used in subquery
+ */
+def transformAndGetConditions(value: Expression,
+  subquery: LogicalPlan): (LogicalPlan, Expression) = {
+  val expr = new scala.collection.mutable.ArrayBuffer[Expression]()
+  val transformedPlan = subquery transform {
+case project @ Project(projectList, f @ Filter(condition, child)) =>
--- End diff --

Okay, so this is a pretty hard problem.  Perhaps it's okay to add these 
simple cases and then try to expand support in the future.  Out of curiosity, 
does Hive support the query above?  We should at least leave a TODO here that 
says we only decorrelate subqueries in very specific cases.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2014-12-30 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/3249#discussion_r22365598
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SubqueryExpression.scala
 ---
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+
+/**
+ * Evaluates whether `subquery` result contains `value`. 
+ * For example : 'SELECT * FROM src a WHERE a.key in (SELECT b.key FROM 
src b)'
+ * @param subquery  In the above example 'SELECT b.key FROM src b' is 
'subquery'
+ */
+case class SubqueryExpression(subquery: LogicalPlan) extends Expression {
+
+  type EvaluatedType = Any
+  def dataType = subquery.output.head.dataType
+  override def foldable = false
+  def nullable = true
+  override def toString = s"SubqueryExpression($subquery)"
--- End diff --

This doesn't print nicely since `subquery` has newlines in it.  Perhaps 
just `${subquery.output.mkString(", ")}`.
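The concern can be seen with a toy stand-in for the plan type. `ToyPlan` and `Attr` below are hypothetical types invented for this sketch (not Catalyst classes); the point is only that interpolating the whole plan yields a multi-line string, while summarizing the output attributes yields a single line.

```scala
// Toy stand-ins: an attribute and a plan whose toString spans several lines.
case class Attr(name: String)
case class ToyPlan(output: Seq[Attr], treeString: String) {
  override def toString = treeString
}

object ToStringDemo {
  val plan = ToyPlan(Seq(Attr("key")), "Project [key]\n Filter ...\n  Relation src")

  // Embedding the plan: the newlines leak into one-line-per-node plan dumps.
  def verbose: String = s"SubqueryExpression($plan)"

  // Summarizing just the output columns, as suggested above: one clean line.
  def compact: String =
    s"SubqueryExpression(${plan.output.map(_.name).mkString(", ")})"

  def main(args: Array[String]): Unit = {
    assert(verbose.contains("\n"))
    assert(compact == "SubqueryExpression(key)")
  }
}
```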





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2014-12-30 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-68400682
  
Okay, I think this is getting closer.  Thanks for working on this.  As I 
commented, I'm worried that this only handles a few very specific cases.  
However, since this is pretty clean / isolated, perhaps that is okay and we can 
expand it in the future.
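For readers following the thread, the rewrite's core assumption is that an IN subquery filter keeps exactly the rows a left semi join would keep. A plain-collections sketch (toy data, no Spark) illustrates that equivalence under the uncorrelated case from the rule's doc comment:

```scala
// select C from R1 where R1.A in (select B from R2), modeled on Seq.
object SemiJoinDemo {
  val r1 = Seq(("c1", 1), ("c2", 2), ("c3", 3)) // rows of R1 as (C, A)
  val r2 = Seq(1, 3)                            // values of R2.B

  // IN-subquery semantics: keep a row iff its A appears in the subquery result.
  val viaIn = r1.collect { case (c, a) if r2.contains(a) => c }

  // Left-semi-join semantics: emit each left row at most once when a match
  // exists on the join key, never duplicating for multiple matches.
  val viaSemiJoin = r1.collect { case (c, a) if r2.exists(_ == a) => c }

  def main(args: Array[String]): Unit = {
    assert(viaIn == Seq("c1", "c3"))
    assert(viaIn == viaSemiJoin)
  }
}
```

Note this equivalence is what makes the semi join safe even when R2 contains duplicate B values: a regular inner join would duplicate matching R1 rows, while the semi join, like IN, does not.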





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2014-12-21 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/spark/pull/3249#discussion_r22148838
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -314,6 +318,113 @@ class Analyzer(catalog: Catalog, registry: 
FunctionRegistry, caseSensitive: Bool
 protected def containsStar(exprs: Seq[Expression]): Boolean =
   exprs.collect { case _: Star = true }.nonEmpty
   }
+
+  /**
+   * Transforms the query which has subquery expressions in where clause 
to join queries.
+   * Case 1 Uncorrelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2)
+   * -- rewritten query
+   * Select C from R1 left semi join (select B as sqc0 from R2) subquery 
on R1.A = subquery.sqc0
+   *
+   * Case 2 Correlated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)
+   * -- rewritten query
+   * select C from R1 left semi join (select B as sqc0, R2.Y as sqc1 from 
R2) subquery
+   *   on R1.X = subquery.sqc1 and R1.A = subquery.sqc0
+   * 
+   * Refer: 
https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case filter @ Filter(conditions, child) =>
+val subqueryExprs = new 
scala.collection.mutable.ArrayBuffer[SubqueryExpression]()
+val nonSubQueryConds = new 
scala.collection.mutable.ArrayBuffer[Expression]()
+val transformedConds = conditions.transform {
--- End diff --

Ok.  Done in two steps.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2014-12-21 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/spark/pull/3249#discussion_r22148840
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -314,6 +318,113 @@ class Analyzer(catalog: Catalog, registry: 
FunctionRegistry, caseSensitive: Bool
 protected def containsStar(exprs: Seq[Expression]): Boolean =
   exprs.collect { case _: Star = true }.nonEmpty
   }
+
+  /**
+   * Transforms the query which has subquery expressions in where clause 
to join queries.
+   * Case 1 Uncorrelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2)
+   * -- rewritten query
+   * Select C from R1 left semi join (select B as sqc0 from R2) subquery 
on R1.A = subquery.sqc0
+   *
+   * Case 2 Correlated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)
+   * -- rewritten query
+   * select C from R1 left semi join (select B as sqc0, R2.Y as sqc1 from 
R2) subquery
+   *   on R1.X = subquery.sqc1 and R1.A = subquery.sqc0
+   * 
+   * Refer: 
https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case filter @ Filter(conditions, child) =>
+val subqueryExprs = new 
scala.collection.mutable.ArrayBuffer[SubqueryExpression]()
+val nonSubQueryConds = new 
scala.collection.mutable.ArrayBuffer[Expression]()
+val transformedConds = conditions.transform {
+  // Replace with dummy
+  case s @ SubqueryExpression(exp, subquery) =>
+subqueryExprs += s
+Literal(true)
+}
+if (subqueryExprs.size == 1) {
+  val subqueryExpr = subqueryExprs.remove(0)
+  createLeftSemiJoin(
+child, subqueryExpr.value,
+subqueryExpr.subquery, transformedConds)
+} else if (subqueryExprs.size > 1) {
+  // Only one subquery expression is supported.
+  throw new TreeNodeException(filter, "Only 1 SubQuery expression is supported.")
--- End diff --

Ok





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2014-12-21 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/spark/pull/3249#discussion_r22148841
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -314,6 +318,113 @@ class Analyzer(catalog: Catalog, registry: 
FunctionRegistry, caseSensitive: Bool
 protected def containsStar(exprs: Seq[Expression]): Boolean =
   exprs.collect { case _: Star = true }.nonEmpty
   }
+
+  /**
+   * Transforms the query which has subquery expressions in where clause 
to join queries.
+   * Case 1 Uncorrelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2)
+   * -- rewritten query
+   * Select C from R1 left semi join (select B as sqc0 from R2) subquery 
on R1.A = subquery.sqc0
+   *
+   * Case 2 Correlated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)
+   * -- rewritten query
+   * select C from R1 left semi join (select B as sqc0, R2.Y as sqc1 from 
R2) subquery
+   *   on R1.X = subquery.sqc1 and R1.A = subquery.sqc0
+   * 
+   * Refer: 
https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case filter @ Filter(conditions, child) =>
+val subqueryExprs = new 
scala.collection.mutable.ArrayBuffer[SubqueryExpression]()
+val nonSubQueryConds = new 
scala.collection.mutable.ArrayBuffer[Expression]()
+val transformedConds = conditions.transform {
+  // Replace with dummy
+  case s @ SubqueryExpression(exp, subquery) =>
+subqueryExprs += s
+Literal(true)
+}
+if (subqueryExprs.size == 1) {
+  val subqueryExpr = subqueryExprs.remove(0)
+  createLeftSemiJoin(
+child, subqueryExpr.value,
+subqueryExpr.subquery, transformedConds)
--- End diff --

Ok





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2014-12-21 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/spark/pull/3249#discussion_r22148845
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -314,6 +318,113 @@ class Analyzer(catalog: Catalog, registry: 
FunctionRegistry, caseSensitive: Bool
 protected def containsStar(exprs: Seq[Expression]): Boolean =
   exprs.collect { case _: Star = true }.nonEmpty
   }
+
+  /**
+   * Transforms the query which has subquery expressions in where clause 
to join queries.
+   * Case 1 Uncorrelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2)
+   * -- rewritten query
+   * Select C from R1 left semi join (select B as sqc0 from R2) subquery 
on R1.A = subquery.sqc0
+   *
+   * Case 2 Correlated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)
+   * -- rewritten query
+   * select C from R1 left semi join (select B as sqc0, R2.Y as sqc1 from 
R2) subquery
+   *   on R1.X = subquery.sqc1 and R1.A = subquery.sqc0
+   * 
+   * Refer: 
https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case filter @ Filter(conditions, child) =>
+val subqueryExprs = new 
scala.collection.mutable.ArrayBuffer[SubqueryExpression]()
+val nonSubQueryConds = new 
scala.collection.mutable.ArrayBuffer[Expression]()
+val transformedConds = conditions.transform {
+  // Replace with dummy
+  case s @ SubqueryExpression(exp, subquery) =>
+subqueryExprs += s
+Literal(true)
+}
+if (subqueryExprs.size == 1) {
+  val subqueryExpr = subqueryExprs.remove(0)
+  createLeftSemiJoin(
+child, subqueryExpr.value,
+subqueryExpr.subquery, transformedConds)
+} else if (subqueryExprs.size > 1) {
+  // Only one subquery expression is supported.
+  throw new TreeNodeException(filter, "Only 1 SubQuery expression is supported.")
+} else {
+  filter
+}
+}
+
+/**
+ * Create LeftSemi join with parent query to the subquery which is 
mentioned in 'IN' predicate
+ * And combine the subquery conditions and parent query conditions.
+ */ 
+def createLeftSemiJoin(left: LogicalPlan,
+value: Expression, subquery: LogicalPlan,
+parentConds: Expression) : LogicalPlan = {
+  val (transformedPlan, subqueryConds) = transformAndGetConditions(
+  value, subquery)
--- End diff --

Ok.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2014-12-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-6306
  
  [Test build #24684 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24684/consoleFull)
 for   PR 3249 at commit 
[`8cae35b`](https://github.com/apache/spark/commit/8cae35bc70094011d9e62f465cdae87aacbf9240).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2014-12-21 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/spark/pull/3249#discussion_r22148859
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -314,6 +318,113 @@ class Analyzer(catalog: Catalog, registry: 
FunctionRegistry, caseSensitive: Bool
 protected def containsStar(exprs: Seq[Expression]): Boolean =
   exprs.collect { case _: Star = true }.nonEmpty
   }
+
+  /**
+   * Transforms the query which has subquery expressions in where clause 
to join queries.
+   * Case 1 Uncorrelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2)
+   * -- rewritten query
+   * Select C from R1 left semi join (select B as sqc0 from R2) subquery 
on R1.A = subquery.sqc0
+   *
+   * Case 2 Correlated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)
+   * -- rewritten query
+   * select C from R1 left semi join (select B as sqc0, R2.Y as sqc1 from 
R2) subquery
+   *   on R1.X = subquery.sqc1 and R1.A = subquery.sqc0
+   * 
+   * Refer: 
https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case filter @ Filter(conditions, child) =>
+val subqueryExprs = new 
scala.collection.mutable.ArrayBuffer[SubqueryExpression]()
+val nonSubQueryConds = new 
scala.collection.mutable.ArrayBuffer[Expression]()
+val transformedConds = conditions.transform {
+  // Replace with dummy
+  case s @ SubqueryExpression(exp, subquery) =>
+subqueryExprs += s
+Literal(true)
+}
+if (subqueryExprs.size == 1) {
+  val subqueryExpr = subqueryExprs.remove(0)
+  createLeftSemiJoin(
+child, subqueryExpr.value,
+subqueryExpr.subquery, transformedConds)
+} else if (subqueryExprs.size > 1) {
+  // Only one subquery expression is supported.
+  throw new TreeNodeException(filter, "Only 1 SubQuery expression is supported.")
+} else {
+  filter
+}
+}
+
+/**
+ * Create LeftSemi join with parent query to the subquery which is 
mentioned in 'IN' predicate
+ * And combine the subquery conditions and parent query conditions.
+ */ 
+def createLeftSemiJoin(left: LogicalPlan,
+value: Expression, subquery: LogicalPlan,
+parentConds: Expression) : LogicalPlan = {
+  val (transformedPlan, subqueryConds) = transformAndGetConditions(
+  value, subquery)
+  // Unify the parent query conditions and subquery conditions and add 
these as join conditions
+  val unifyConds = And(parentConds, subqueryConds)
+  Join(left, transformedPlan, LeftSemi, Some(unifyConds))
+}
+
+/**
+ * Transform the subquery LogicalPlan and add the expressions which 
are used as filters to the
+ * projection. And also return filter conditions used in subquery
+ */
+def transformAndGetConditions(value: Expression,
+  subquery: LogicalPlan): (LogicalPlan, Expression) = {
+  val expr = new scala.collection.mutable.ArrayBuffer[Expression]()
+  val transformedPlan = subquery transform {
+case project @ Project(projectList, f @ Filter(condition, child)) =>
+  // Don't support more than 1 item in select list of subquery
+  if (projectList.size > 1) {
+    throw new TreeNodeException(project, "SubQuery can contain only 1 item in Select List")
+  }
+  val resolvedChild = ResolveRelations(child)
--- End diff --

Guarding here may not work, because the SubqueryExpression is not resolved along 
with the main query, so I guess we need to resolve it on demand.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2014-12-21 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/spark/pull/3249#discussion_r2214
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -314,6 +318,113 @@ class Analyzer(catalog: Catalog, registry: 
FunctionRegistry, caseSensitive: Bool
 protected def containsStar(exprs: Seq[Expression]): Boolean =
   exprs.collect { case _: Star = true }.nonEmpty
   }
+
+  /**
+   * Transforms the query which has subquery expressions in where clause 
to join queries.
+   * Case 1 Uncorrelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2)
+   * -- rewritten query
+   * Select C from R1 left semi join (select B as sqc0 from R2) subquery 
on R1.A = subquery.sqc0
+   *
+   * Case 2 Correlated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)
+   * -- rewritten query
+   * select C from R1 left semi join (select B as sqc0, R2.Y as sqc1 from 
R2) subquery
+   *   on R1.X = subquery.sqc1 and R1.A = subquery.sqc0
+   * 
+   * Refer: 
https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case filter @ Filter(conditions, child) =>
+val subqueryExprs = new 
scala.collection.mutable.ArrayBuffer[SubqueryExpression]()
+val nonSubQueryConds = new 
scala.collection.mutable.ArrayBuffer[Expression]()
+val transformedConds = conditions.transform {
+  // Replace with dummy
+  case s @ SubqueryExpression(exp, subquery) =>
+subqueryExprs += s
+Literal(true)
+}
+if (subqueryExprs.size == 1) {
+  val subqueryExpr = subqueryExprs.remove(0)
+  createLeftSemiJoin(
+child, subqueryExpr.value,
+subqueryExpr.subquery, transformedConds)
+} else if (subqueryExprs.size > 1) {
+  // Only one subquery expression is supported.
+  throw new TreeNodeException(filter, "Only 1 SubQuery expression is supported.")
+} else {
+  filter
+}
+}
+
+/**
+ * Create LeftSemi join with parent query to the subquery which is 
mentioned in 'IN' predicate
+ * And combine the subquery conditions and parent query conditions.
+ */ 
+def createLeftSemiJoin(left: LogicalPlan,
+value: Expression, subquery: LogicalPlan,
+parentConds: Expression) : LogicalPlan = {
+  val (transformedPlan, subqueryConds) = transformAndGetConditions(
+  value, subquery)
+  // Unify the parent query conditions and subquery conditions and add 
these as join conditions
+  val unifyConds = And(parentConds, subqueryConds)
+  Join(left, transformedPlan, LeftSemi, Some(unifyConds))
+}
+
+/**
+ * Transform the subquery LogicalPlan and add the expressions which 
are used as filters to the
+ * projection. And also return filter conditions used in subquery
+ */
+def transformAndGetConditions(value: Expression,
+  subquery: LogicalPlan): (LogicalPlan, Expression) = {
+  val expr = new scala.collection.mutable.ArrayBuffer[Expression]()
+  val transformedPlan = subquery transform {
+case project @ Project(projectList, f @ Filter(condition, child)) =>
+  // Don't support more than 1 item in select list of subquery
+  if (projectList.size > 1) {
+    throw new TreeNodeException(project, "SubQuery can contain only 1 item in Select List")
+  }
+  val resolvedChild = ResolveRelations(child)
+  // Add the expressions to the projections which are used as 
filters in subquery
+  val toBeAddedExprs = f.references.filter(
+      a => resolvedChild.resolve(a.name, resolver) != None && !projectList.contains(a))
--- End diff --

Guarding here may not work, because the SubqueryExpression is not resolved along 
with the main query, so I guess we need to resolve it on demand.
And I used `project.outputSet` to filter.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2014-12-21 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/spark/pull/3249#discussion_r22148896
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -314,6 +318,113 @@ class Analyzer(catalog: Catalog, registry: 
FunctionRegistry, caseSensitive: Bool
 protected def containsStar(exprs: Seq[Expression]): Boolean =
  exprs.collect { case _: Star => true }.nonEmpty
   }
+
+  /**
+   * Transforms the query which has subquery expressions in where clause 
to join queries.
+   * Case 1 Uncorrelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2)
+   * -- rewritten query
+   * Select C from R1 left semi join (select B as sqc0 from R2) subquery 
on R1.A = subquery.sqc0
+   *
+   * Case 2 Correlated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)
+   * -- rewritten query
+   * select C from R1 left semi join (select B as sqc0, R2.Y as sqc1 from 
R2) subquery
+   *   on R1.X = subquery.sqc1 and R1.A = subquery.sqc0
+   * 
+   * Refer: 
https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case filter @ Filter(conditions, child) =>
+val subqueryExprs = new 
scala.collection.mutable.ArrayBuffer[SubqueryExpression]()
+val nonSubQueryConds = new 
scala.collection.mutable.ArrayBuffer[Expression]()
+val transformedConds = conditions.transform{
+  // Replace with dummy
+  case s @ SubqueryExpression(exp,subquery) =>
+subqueryExprs += s
+Literal(true)
+}
+if (subqueryExprs.size == 1) {
+  val subqueryExpr = subqueryExprs.remove(0)
+  createLeftSemiJoin(
+child, subqueryExpr.value,
+subqueryExpr.subquery, transformedConds)
+} else if (subqueryExprs.size > 1) {
+  // Only one subquery expression is supported.
+  throw new TreeNodeException(filter, "Only 1 SubQuery expression is supported.")
+} else {
+  filter
+}
+}
+
+/**
+ * Create LeftSemi join with parent query to the subquery which is 
mentioned in 'IN' predicate
+ * And combine the subquery conditions and parent query conditions.
+ */ 
+def createLeftSemiJoin(left: LogicalPlan,
+value: Expression, subquery: LogicalPlan,
+parentConds: Expression) : LogicalPlan = {
+  val (transformedPlan, subqueryConds) = transformAndGetConditions(
+  value, subquery)
+  // Unify the parent query conditions and subquery conditions and add 
these as join conditions
+  val unifyConds = And(parentConds, subqueryConds)
+  Join(left, transformedPlan, LeftSemi, Some(unifyConds))
+}
+
+/**
+ * Transform the subquery LogicalPlan and add the expressions which 
are used as filters to the
+ * projection. And also return filter conditions used in subquery
+ */
+def transformAndGetConditions(value: Expression,
+  subquery: LogicalPlan): (LogicalPlan, Expression) = {
+  val expr = new scala.collection.mutable.ArrayBuffer[Expression]()
+  val transformedPlan = subquery transform {
+case project @ Project(projectList, f @ Filter(condition, child)) =>
+  // Don't support more than 1 item in select list of subquery
+  if(projectList.size > 1) {
+throw new TreeNodeException(project, "SubQuery can contain only 1 item in Select List")
+  }
+  val resolvedChild = ResolveRelations(child)
+  // Add the expressions to the projections which are used as 
filters in subquery
+  val toBeAddedExprs = f.references.filter(
+  a => resolvedChild.resolve(a.name, resolver) != None && !projectList.contains(a))
+  val cache = collection.mutable.Map[String, String]()
+  // Create aliases for all projection expressions.
+  val witAliases = (projectList ++ toBeAddedExprs).zipWithIndex.map {
+case (exp, index) =>
+  cache.put(exp.name, s"sqc$index")
+  Alias(exp, s"sqc$index")()
+  }
+  // Replace the condition column names with alias names.
+  val transformedConds = condition.transform{
+case a: Attribute if resolvedChild.resolve(a.name, resolver) != None =>
+  UnresolvedAttribute("subquery." + cache.get(a.name).get)
--- End diff --

Ok. Added ```Alias``` in all places and removed
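To make the aliasing scheme under discussion concrete, here is a plain-Python sketch (hypothetical names, not the actual Catalyst code): every column the subquery projects or its filter references is aliased as `sqc<index>`, and the correlated condition is rewritten against `subquery.<alias>`:

```python
# Hypothetical sketch of the aliasing step in transformAndGetConditions:
# each column used by the subquery (projected, or referenced by its filter)
# gets an alias sqc<index>, and the filter condition is rewritten to refer
# to the aliased subquery output.

project_list = ["B"]          # select list of the subquery
filter_refs = ["B", "Y"]      # columns referenced by the subquery's filter

# Columns referenced by the filter but not already projected must be added.
to_be_added = [c for c in filter_refs if c not in project_list]

# Alias every output column as sqc0, sqc1, ...
cache = {}
aliased = []
for index, col in enumerate(project_list + to_be_added):
    alias = f"sqc{index}"
    cache[col] = alias
    aliased.append((col, alias))

# Rewrite a correlated condition R1.X = R2.Y against the aliased output.
condition = ("R1.X", "=", "Y")
rewritten = (condition[0], condition[1], "subquery." + cache[condition[2]])

assert aliased == [("B", "sqc0"), ("Y", "sqc1")]
assert rewritten == ("R1.X", "=", "subquery.sqc1")
```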

[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2014-12-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-6608
  
  [Test build #24685 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24685/consoleFull)
 for   PR 3249 at commit 
[`11a238d`](https://github.com/apache/spark/commit/11a238d06320940f0278eae621df36d05436970f).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2014-12-21 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/spark/pull/3249#discussion_r22148903
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SubqueryExpression.scala
 ---
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+
+/**
+ * Evaluates whether `subquery` result contains `value`. 
+ * For example : 'SELECT * FROM src a WHERE a.key in (SELECT b.key FROM 
src b)'
+ * @param value   In the above example 'a.key' is 'value'
+ * @param subquery  In the above example 'SELECT b.key FROM src b' is 
'subquery'
+ */
+case class SubqueryExpression(value: Expression, subquery: LogicalPlan) 
extends Expression {
--- End diff --

Ok. Added like ```In('a.key, SubqueryExpression(...))```





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2014-12-21 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/spark/pull/3249#discussion_r22148913
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SubqueryExpression.scala
 ---
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+
+/**
+ * Evaluates whether `subquery` result contains `value`. 
+ * For example : 'SELECT * FROM src a WHERE a.key in (SELECT b.key FROM 
src b)'
+ * @param value   In the above example 'a.key' is 'value'
+ * @param subquery  In the above example 'SELECT b.key FROM src b' is 
'subquery'
+ */
+case class SubqueryExpression(value: Expression, subquery: LogicalPlan) 
extends Expression {
+
+  type EvaluatedType = Any
+  def dataType = value.dataType
+  override def foldable = value.foldable
+  def nullable = value.nullable
+  override def toString = s"SubqueryExpression($value, $subquery)"
+  override lazy val resolved = childrenResolved
--- End diff --

Yes, it is always unresolved. It will only be converted, not executed.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2014-12-21 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/spark/pull/3249#discussion_r22148917
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SubqueryExpression.scala
 ---
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+
+/**
+ * Evaluates whether `subquery` result contains `value`. 
+ * For example : 'SELECT * FROM src a WHERE a.key in (SELECT b.key FROM 
src b)'
+ * @param value   In the above example 'a.key' is 'value'
+ * @param subquery  In the above example 'SELECT b.key FROM src b' is 
'subquery'
+ */
+case class SubqueryExpression(value: Expression, subquery: LogicalPlan) 
extends Expression {
+
+  type EvaluatedType = Any
+  def dataType = value.dataType
+  override def foldable = value.foldable
+  def nullable = value.nullable
--- End diff --

Yes. I guess it is nullable.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2014-12-21 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/spark/pull/3249#discussion_r22149004
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -314,6 +318,113 @@ class Analyzer(catalog: Catalog, registry: 
FunctionRegistry, caseSensitive: Bool
 protected def containsStar(exprs: Seq[Expression]): Boolean =
  exprs.collect { case _: Star => true }.nonEmpty
   }
+
+  /**
+   * Transforms the query which has subquery expressions in where clause 
to join queries.
+   * Case 1 Uncorrelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2)
+   * -- rewritten query
+   * Select C from R1 left semi join (select B as sqc0 from R2) subquery 
on R1.A = subquery.sqc0
+   *
+   * Case 2 Correlated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)
+   * -- rewritten query
+   * select C from R1 left semi join (select B as sqc0, R2.Y as sqc1 from 
R2) subquery
+   *   on R1.X = subquery.sqc1 and R1.A = subquery.sqc0
+   * 
+   * Refer: 
https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case filter @ Filter(conditions, child) =>
+val subqueryExprs = new 
scala.collection.mutable.ArrayBuffer[SubqueryExpression]()
+val nonSubQueryConds = new 
scala.collection.mutable.ArrayBuffer[Expression]()
+val transformedConds = conditions.transform{
+  // Replace with dummy
+  case s @ SubqueryExpression(exp,subquery) =>
+subqueryExprs += s
+Literal(true)
+}
+if (subqueryExprs.size == 1) {
+  val subqueryExpr = subqueryExprs.remove(0)
+  createLeftSemiJoin(
+child, subqueryExpr.value,
+subqueryExpr.subquery, transformedConds)
+} else if (subqueryExprs.size > 1) {
+  // Only one subquery expression is supported.
+  throw new TreeNodeException(filter, "Only 1 SubQuery expression is supported.")
+} else {
+  filter
+}
+}
+
+/**
+ * Create LeftSemi join with parent query to the subquery which is 
mentioned in 'IN' predicate
+ * And combine the subquery conditions and parent query conditions.
+ */ 
+def createLeftSemiJoin(left: LogicalPlan,
+value: Expression, subquery: LogicalPlan,
+parentConds: Expression) : LogicalPlan = {
+  val (transformedPlan, subqueryConds) = transformAndGetConditions(
+  value, subquery)
+  // Unify the parent query conditions and subquery conditions and add 
these as join conditions
+  val unifyConds = And(parentConds, subqueryConds)
+  Join(left, transformedPlan, LeftSemi, Some(unifyConds))
+}
+
+/**
+ * Transform the subquery LogicalPlan and add the expressions which 
are used as filters to the
+ * projection. And also return filter conditions used in subquery
+ */
+def transformAndGetConditions(value: Expression,
+  subquery: LogicalPlan): (LogicalPlan, Expression) = {
+  val expr = new scala.collection.mutable.ArrayBuffer[Expression]()
+  val transformedPlan = subquery transform {
+case project @ Project(projectList, f @ Filter(condition, child)) =>
+  // Don't support more than 1 item in select list of subquery
+  if(projectList.size > 1) {
+throw new TreeNodeException(project, "SubQuery can contain only 1 item in Select List")
+  }
+  val resolvedChild = ResolveRelations(child)
+  // Add the expressions to the projections which are used as 
filters in subquery
+  val toBeAddedExprs = f.references.filter(
+  a => resolvedChild.resolve(a.name, resolver) != None && !projectList.contains(a))
+  val cache = collection.mutable.Map[String, String]()
--- End diff --

I may not be able to use ```AttributeMap``` without resolving the subquery completely.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2014-12-21 Thread ravipesala
Github user ravipesala commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-67778115
  
Thank you for reviewing it.
I have worked on the review comments; please review it.
I guess the ```SubqueryExpression``` may not be resolved along with the main query, and we also may not be able to resolve it separately, as it may contain references to the main query, like ```select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)```. Here R1.X is used inside the subquery, so I guess we can resolve it on demand. Please comment.
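As a sanity check of the rewrite itself, a self-contained plain-Python sketch (not Spark code; the table contents are made up) shows that the correlated IN predicate and the left semi join with the lifted conditions return the same rows:

```python
# Hypothetical sketch: a correlated IN predicate is equivalent to a left
# semi join whose condition combines the IN equality with the correlation
# predicate. Tables R1/R2 are invented sample data.

R1 = [{"A": 1, "C": "x", "X": 10},
      {"A": 2, "C": "y", "X": 20},
      {"A": 3, "C": "z", "X": 30}]
R2 = [{"B": 1, "Y": 10},
      {"B": 2, "Y": 99}]

# Original form: select C from R1 where R1.A in (select B from R2 where R1.X = R2.Y)
original = [r1["C"] for r1 in R1
            if r1["A"] in [r2["B"] for r2 in R2 if r1["X"] == r2["Y"]]]

# Rewritten form: left semi join on R1.A = subquery.sqc0 and R1.X = subquery.sqc1
subquery = [{"sqc0": r2["B"], "sqc1": r2["Y"]} for r2 in R2]
rewritten = [r1["C"] for r1 in R1
             if any(r1["A"] == s["sqc0"] and r1["X"] == s["sqc1"] for s in subquery)]

assert original == rewritten == ["x"]
```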





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2014-12-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-67779525
  
  [Test build #24684 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24684/consoleFull)
 for   PR 3249 at commit 
[`8cae35b`](https://github.com/apache/spark/commit/8cae35bc70094011d9e62f465cdae87aacbf9240).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class SubqueryExpression(subquery: LogicalPlan) extends 
Expression `






[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2014-12-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-67779527
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24684/
Test PASSed.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2014-12-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-67779852
  
  [Test build #24685 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24685/consoleFull)
 for   PR 3249 at commit 
[`11a238d`](https://github.com/apache/spark/commit/11a238d06320940f0278eae621df36d05436970f).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class SubqueryExpression(subquery: LogicalPlan) extends 
Expression `






[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2014-12-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3249#issuecomment-67779856
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24685/
Test PASSed.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2014-12-17 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/3249#discussion_r22004038
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -314,6 +318,113 @@ class Analyzer(catalog: Catalog, registry: 
FunctionRegistry, caseSensitive: Bool
 protected def containsStar(exprs: Seq[Expression]): Boolean =
  exprs.collect { case _: Star => true }.nonEmpty
   }
+
+  /**
+   * Transforms the query which has subquery expressions in where clause 
to join queries.
+   * Case 1 Uncorrelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2)
+   * -- rewritten query
+   * Select C from R1 left semi join (select B as sqc0 from R2) subquery 
on R1.A = subquery.sqc0
+   *
+   * Case 2 Correlated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)
+   * -- rewritten query
+   * select C from R1 left semi join (select B as sqc0, R2.Y as sqc1 from 
R2) subquery
+   *   on R1.X = subquery.sqc1 and R1.A = subquery.sqc0
+   * 
+   * Refer: 
https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case filter @ Filter(conditions, child) =>
+val subqueryExprs = new 
scala.collection.mutable.ArrayBuffer[SubqueryExpression]()
+val nonSubQueryConds = new 
scala.collection.mutable.ArrayBuffer[Expression]()
+val transformedConds = conditions.transform{
--- End diff --

Space before `{`.

Also I would consider doing this in two steps to avoid depending on 
transform for side effects: a collect to get the list and then a transform to 
replace with `true`.
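The two-step pattern suggested here can be sketched on a toy expression tree (plain Python, hypothetical `Expr` type, not Catalyst's `TreeNode`):

```python
# Hypothetical sketch of the review suggestion: instead of collecting
# subquery expressions as a side effect inside transform, first collect
# them, then transform the tree to replace each one with Literal(true).
from dataclasses import dataclass

@dataclass(frozen=True)
class Expr:
    op: str
    children: tuple = ()

def collect(node, pred):
    """Pre-order collect of all nodes matching pred (no mutation)."""
    found = [node] if pred(node) else []
    for c in node.children:
        found.extend(collect(c, pred))
    return found

def transform(node, rule):
    """Bottom-up transform: rebuild children, then apply rule if it matches."""
    rebuilt = Expr(node.op, tuple(transform(c, rule) for c in node.children))
    replaced = rule(rebuilt)
    return replaced if replaced is not None else rebuilt

# A condition tree: <leaf> AND <subquery expression>
cond = Expr("and", (Expr("leaf"), Expr("subquery")))

# Step 1: collect the subquery expressions.
subqueries = collect(cond, lambda n: n.op == "subquery")

# Step 2: transform, replacing each subquery expression with a true literal.
literal_true = Expr("true")
no_subq = transform(cond, lambda n: literal_true if n.op == "subquery" else None)

assert len(subqueries) == 1
assert collect(no_subq, lambda n: n.op == "subquery") == []
assert no_subq == Expr("and", (Expr("leaf"), Expr("true")))
```

Keeping the collect and the transform separate means each traversal has a single purpose, which is easier to reason about than accumulating into a buffer from inside a rewrite rule.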





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2014-12-17 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/3249#discussion_r22004108
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -314,6 +318,113 @@ class Analyzer(catalog: Catalog, registry: 
FunctionRegistry, caseSensitive: Bool
 protected def containsStar(exprs: Seq[Expression]): Boolean =
  exprs.collect { case _: Star => true }.nonEmpty
   }
+
+  /**
+   * Transforms the query which has subquery expressions in where clause 
to join queries.
+   * Case 1 Uncorrelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2)
+   * -- rewritten query
+   * Select C from R1 left semi join (select B as sqc0 from R2) subquery 
on R1.A = subquery.sqc0
+   *
+   * Case 2 Correlated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)
+   * -- rewritten query
+   * select C from R1 left semi join (select B as sqc0, R2.Y as sqc1 from 
R2) subquery
+   *   on R1.X = subquery.sqc1 and R1.A = subquery.sqc0
+   * 
+   * Refer: 
https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case filter @ Filter(conditions, child) =>
+val subqueryExprs = new 
scala.collection.mutable.ArrayBuffer[SubqueryExpression]()
+val nonSubQueryConds = new 
scala.collection.mutable.ArrayBuffer[Expression]()
+val transformedConds = conditions.transform{
+  // Replace with dummy
+  case s @ SubqueryExpression(exp,subquery) =>
+subqueryExprs += s
+Literal(true)
+}
+if (subqueryExprs.size == 1) {
+  val subqueryExpr = subqueryExprs.remove(0)
+  createLeftSemiJoin(
+child, subqueryExpr.value,
+subqueryExpr.subquery, transformedConds)
+} else if (subqueryExprs.size > 1) {
+  // Only one subquery expression is supported.
+  throw new TreeNodeException(filter, "Only 1 SubQuery expression is supported.")
--- End diff --

Only one subquery expression is supported





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2014-12-17 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/3249#discussion_r22004302
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -314,6 +318,113 @@ class Analyzer(catalog: Catalog, registry: 
FunctionRegistry, caseSensitive: Bool
 protected def containsStar(exprs: Seq[Expression]): Boolean =
  exprs.collect { case _: Star => true }.nonEmpty
   }
+
+  /**
+   * Transforms the query which has subquery expressions in where clause 
to join queries.
+   * Case 1 Uncorrelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2)
+   * -- rewritten query
+   * Select C from R1 left semi join (select B as sqc0 from R2) subquery 
on R1.A = subquery.sqc0
+   *
+   * Case 2 Correlated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)
+   * -- rewritten query
+   * select C from R1 left semi join (select B as sqc0, R2.Y as sqc1 from 
R2) subquery
+   *   on R1.X = subquery.sqc1 and R1.A = subquery.sqc0
+   * 
+   * Refer: 
https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case filter @ Filter(conditions, child) =>
+val subqueryExprs = new 
scala.collection.mutable.ArrayBuffer[SubqueryExpression]()
+val nonSubQueryConds = new 
scala.collection.mutable.ArrayBuffer[Expression]()
+val transformedConds = conditions.transform{
+  // Replace with dummy
+  case s @ SubqueryExpression(exp,subquery) =>
+subqueryExprs += s
+Literal(true)
+}
+if (subqueryExprs.size == 1) {
+  val subqueryExpr = subqueryExprs.remove(0)
+  createLeftSemiJoin(
+child, subqueryExpr.value,
+subqueryExpr.subquery, transformedConds)
--- End diff --

If you need to wrap then I'd put each arg on its own line.





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2014-12-17 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/3249#discussion_r22004338
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -314,6 +318,113 @@ class Analyzer(catalog: Catalog, registry: 
FunctionRegistry, caseSensitive: Bool
 protected def containsStar(exprs: Seq[Expression]): Boolean =
  exprs.collect { case _: Star => true }.nonEmpty
   }
+
+  /**
+   * Transforms the query which has subquery expressions in where clause 
to join queries.
+   * Case 1 Uncorrelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2)
+   * -- rewritten query
+   * Select C from R1 left semi join (select B as sqc0 from R2) subquery 
on R1.A = subquery.sqc0
+   *
+   * Case 2 Correlated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)
+   * -- rewritten query
+   * select C from R1 left semi join (select B as sqc0, R2.Y as sqc1 from 
R2) subquery
+   *   on R1.X = subquery.sqc1 and R1.A = subquery.sqc0
+   * 
+   * Refer: 
https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case filter @ Filter(conditions, child) =>
+val subqueryExprs = new 
scala.collection.mutable.ArrayBuffer[SubqueryExpression]()
+val nonSubQueryConds = new 
scala.collection.mutable.ArrayBuffer[Expression]()
+val transformedConds = conditions.transform{
+  // Replace with dummy
+  case s @ SubqueryExpression(exp,subquery) =>
+subqueryExprs += s
+Literal(true)
+}
+if (subqueryExprs.size == 1) {
+  val subqueryExpr = subqueryExprs.remove(0)
+  createLeftSemiJoin(
+child, subqueryExpr.value,
+subqueryExpr.subquery, transformedConds)
+} else if (subqueryExprs.size > 1) {
+  // Only one subquery expression is supported.
+  throw new TreeNodeException(filter, "Only 1 SubQuery expression is supported.")
+} else {
+  filter
+}
+}
+
+/**
+ * Create LeftSemi join with parent query to the subquery which is 
mentioned in 'IN' predicate
+ * And combine the subquery conditions and parent query conditions.
+ */ 
+def createLeftSemiJoin(left: LogicalPlan,
+value: Expression, subquery: LogicalPlan,
+parentConds: Expression) : LogicalPlan = {
+  val (transformedPlan, subqueryConds) = transformAndGetConditions(
+  value, subquery)
--- End diff --

Does this need to wrap?  If so you should break at the `=`.
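As an aside for readers of the archive, the IN-to-left-semi-join rewrite described in the quoted scaladoc can be sanity-checked with plain Scala collections. This is a toy sketch over made-up data, not code from the PR:

```scala
// Toy model (not Spark code): `SELECT C FROM R1 WHERE R1.A IN (SELECT B FROM R2)`
// yields the same rows as a left semi join of R1 against the subquery on A = B.
case class R1Row(a: Int, c: String)

val r1 = Seq(R1Row(1, "x"), R1Row(2, "y"), R1Row(3, "z"))
val r2B = Seq(1, 3, 3) // models `SELECT B FROM R2` (duplicates on purpose)

// Original form: IN-predicate filter.
val viaIn = r1.filter(row => r2B.contains(row.a)).map(_.c)

// Rewritten form: a left semi join keeps an R1 row iff at least one R2 match
// exists, and never duplicates R1 rows even when R2 has duplicate keys.
val viaSemiJoin = r1.filter(row => r2B.exists(_ == row.a)).map(_.c)

assert(viaIn == Seq("x", "z"))
assert(viaIn == viaSemiJoin)
```

The duplicate key in `r2B` is the reason a left semi join (rather than an inner join) is the correct rewrite: an inner join would emit `R1Row(3, "z")` twice.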


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2014-12-17 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/3249#discussion_r22004386
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -314,6 +318,113 @@ class Analyzer(catalog: Catalog, registry: 
FunctionRegistry, caseSensitive: Bool
 protected def containsStar(exprs: Seq[Expression]): Boolean =
  exprs.collect { case _: Star => true }.nonEmpty
   }
+
+  /**
+   * Transforms the query which has subquery expressions in where clause 
to join queries.
+   * Case 1 Uncorelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2)
+   * -- rewritten query
+   * Select C from R1 left semi join (select B as sqc0 from R2) subquery 
on R1.A = subquery.sqc0
+   *
+   * Case 2 Corelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)
+   * -- rewritten query
+   * select C from R1 left semi join (select B as sqc0, R2.Y as sqc1 from 
R2) subquery
+   *   on R1.X = subquery.sqc1 and R1.A = subquery.sqc0
+   * 
+   * Refer: 
https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case filter @ Filter(conditions, child) =>
+val subqueryExprs = new 
scala.collection.mutable.ArrayBuffer[SubqueryExpression]()
+val nonSubQueryConds = new 
scala.collection.mutable.ArrayBuffer[Expression]()
+val transformedConds = conditions.transform{
+  // Replace with dummy
+  case s @ SubqueryExpression(exp,subquery) =>
+subqueryExprs += s
+Literal(true)
+}
+if (subqueryExprs.size == 1) {
+  val subqueryExpr = subqueryExprs.remove(0)
+  createLeftSemiJoin(
+child, subqueryExpr.value,
+subqueryExpr.subquery, transformedConds)
+} else if (subqueryExprs.size > 1) {
+  // Only one subquery expression is supported.
+  throw new TreeNodeException(filter, "Only 1 SubQuery expression is supported.")
+} else {
+  filter
+}
+}
+
+/**
+ * Create LeftSemi join with parent query to the subquery which is 
mentioned in 'IN' predicate
+ * And combine the subquery conditions and parent query conditions.
+ */ 
+def createLeftSemiJoin(left: LogicalPlan,
+value: Expression, subquery: LogicalPlan,
+parentConds: Expression) : LogicalPlan = {
--- End diff --

Same as above, if you need to wrap then put each thing on its own line.
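For readers following along, the wrapping convention marmbrus is pointing at puts each parameter on its own line when a signature does not fit on one line. A self-contained illustration (parameter types simplified to `String` here so the snippet runs on its own; the real signature uses `LogicalPlan` and `Expression`):

```scala
// The point is only the layout: one parameter per line, double-indented,
// with the return type on the last parameter line.
def createLeftSemiJoin(
    left: String,
    value: String,
    subquery: String,
    parentConds: String): String =
  s"Join($left, $subquery, LeftSemi, Some(And($parentConds, $value = sqc0)))"

assert(createLeftSemiJoin("R1", "R1.A", "R2", "R1.X = sqc1").startsWith("Join(R1"))
```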





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2014-12-17 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/3249#discussion_r22004443
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -314,6 +318,113 @@ class Analyzer(catalog: Catalog, registry: 
FunctionRegistry, caseSensitive: Bool
 protected def containsStar(exprs: Seq[Expression]): Boolean =
  exprs.collect { case _: Star => true }.nonEmpty
   }
+
+  /**
+   * Transforms the query which has subquery expressions in where clause 
to join queries.
+   * Case 1 Uncorelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2)
+   * -- rewritten query
+   * Select C from R1 left semi join (select B as sqc0 from R2) subquery 
on R1.A = subquery.sqc0
+   *
+   * Case 2 Corelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)
+   * -- rewritten query
+   * select C from R1 left semi join (select B as sqc0, R2.Y as sqc1 from 
R2) subquery
+   *   on R1.X = subquery.sqc1 and R1.A = subquery.sqc0
+   * 
+   * Refer: 
https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case filter @ Filter(conditions, child) =>
+val subqueryExprs = new 
scala.collection.mutable.ArrayBuffer[SubqueryExpression]()
+val nonSubQueryConds = new 
scala.collection.mutable.ArrayBuffer[Expression]()
+val transformedConds = conditions.transform{
+  // Replace with dummy
+  case s @ SubqueryExpression(exp,subquery) =>
+subqueryExprs += s
+Literal(true)
+}
+if (subqueryExprs.size == 1) {
+  val subqueryExpr = subqueryExprs.remove(0)
+  createLeftSemiJoin(
+child, subqueryExpr.value,
+subqueryExpr.subquery, transformedConds)
+} else if (subqueryExprs.size > 1) {
+  // Only one subquery expression is supported.
+  throw new TreeNodeException(filter, "Only 1 SubQuery expression is supported.")
+} else {
+  filter
+}
+}
+
+/**
+ * Create LeftSemi join with parent query to the subquery which is 
mentioned in 'IN' predicate
+ * And combine the subquery conditions and parent query conditions.
+ */ 
+def createLeftSemiJoin(left: LogicalPlan,
+value: Expression, subquery: LogicalPlan,
+parentConds: Expression) : LogicalPlan = {
+  val (transformedPlan, subqueryConds) = transformAndGetConditions(
+  value, subquery)
+  // Unify the parent query conditions and subquery conditions and add 
these as join conditions
+  val unifyConds = And(parentConds, subqueryConds)
+  Join(left, transformedPlan, LeftSemi, Some(unifyConds))
+}
+
+/**
+ * Transform the subquery LogicalPlan and add the expressions which 
are used as filters to the
+ * projection. And also return filter conditions used in subquery
+ */
+def transformAndGetConditions(value: Expression,
+  subquery: LogicalPlan): (LogicalPlan, Expression) = {
+  val expr = new scala.collection.mutable.ArrayBuffer[Expression]()
+  val transformedPlan = subquery transform {
+case project @ Project(projectList, f @ Filter(condition, child)) =>
+  // Don't support more than 1 item in select list of subquery
+  if(projectList.size > 1) {
+throw new TreeNodeException(project, "SubQuery can contain only 1 item in Select List")
+  }
+  val resolvedChild = ResolveRelations(child)
--- End diff --

Instead of doing this, I'd propose you guard the rule at the top to only 
apply once the child plan is resolved.
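A minimal illustration of that suggestion, using stand-in types rather than real Catalyst classes (the names below are hypothetical, not the PR's code):

```scala
// Stand-in plan types showing the guard pattern: only fire the rewrite once
// the child plan is resolved, and leave the node untouched otherwise so the
// analyzer's fixed-point driver can revisit it on a later iteration.
sealed trait Plan { def resolved: Boolean }
case class Relation(name: String, resolved: Boolean) extends Plan
case class Filter(cond: String, child: Plan) extends Plan {
  def resolved: Boolean = child.resolved
}

def rewriteSubqueries(p: Plan): Plan = p match {
  // The `if child.resolved` guard replaces hand-resolution inside the rule.
  case Filter(cond, child) if child.resolved =>
    Filter(cond + " [rewritten as semi join]", child)
  case other => other
}

val ready    = rewriteSubqueries(Filter("a IN (...)", Relation("r1", resolved = true)))
val notReady = rewriteSubqueries(Filter("a IN (...)", Relation("r1", resolved = false)))

assert(ready.asInstanceOf[Filter].cond.endsWith("[rewritten as semi join]"))
assert(notReady == Filter("a IN (...)", Relation("r1", resolved = false)))
```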





[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2014-12-17 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/3249#discussion_r22004962
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -314,6 +318,113 @@ class Analyzer(catalog: Catalog, registry: 
FunctionRegistry, caseSensitive: Bool
 protected def containsStar(exprs: Seq[Expression]): Boolean =
  exprs.collect { case _: Star => true }.nonEmpty
   }
+
+  /**
+   * Transforms the query which has subquery expressions in where clause 
to join queries.
+   * Case 1 Uncorelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2)
+   * -- rewritten query
+   * Select C from R1 left semi join (select B as sqc0 from R2) subquery 
on R1.A = subquery.sqc0
+   *
+   * Case 2 Corelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)
+   * -- rewritten query
+   * select C from R1 left semi join (select B as sqc0, R2.Y as sqc1 from 
R2) subquery
+   *   on R1.X = subquery.sqc1 and R1.A = subquery.sqc0
+   * 
+   * Refer: 
https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case filter @ Filter(conditions, child) =>
+val subqueryExprs = new 
scala.collection.mutable.ArrayBuffer[SubqueryExpression]()
+val nonSubQueryConds = new 
scala.collection.mutable.ArrayBuffer[Expression]()
+val transformedConds = conditions.transform{
+  // Replace with dummy
+  case s @ SubqueryExpression(exp,subquery) =>
+subqueryExprs += s
+Literal(true)
+}
+if (subqueryExprs.size == 1) {
+  val subqueryExpr = subqueryExprs.remove(0)
+  createLeftSemiJoin(
+child, subqueryExpr.value,
+subqueryExpr.subquery, transformedConds)
+} else if (subqueryExprs.size > 1) {
+  // Only one subquery expression is supported.
+  throw new TreeNodeException(filter, "Only 1 SubQuery expression is supported.")
+} else {
+  filter
+}
+}
+
+/**
+ * Create LeftSemi join with parent query to the subquery which is 
mentioned in 'IN' predicate
+ * And combine the subquery conditions and parent query conditions.
+ */ 
+def createLeftSemiJoin(left: LogicalPlan,
+value: Expression, subquery: LogicalPlan,
+parentConds: Expression) : LogicalPlan = {
+  val (transformedPlan, subqueryConds) = transformAndGetConditions(
+  value, subquery)
+  // Unify the parent query conditions and subquery conditions and add 
these as join conditions
+  val unifyConds = And(parentConds, subqueryConds)
+  Join(left, transformedPlan, LeftSemi, Some(unifyConds))
+}
+
+/**
+ * Transform the subquery LogicalPlan and add the expressions which 
are used as filters to the
+ * projection. And also return filter conditions used in subquery
+ */
+def transformAndGetConditions(value: Expression,
+  subquery: LogicalPlan): (LogicalPlan, Expression) = {
+  val expr = new scala.collection.mutable.ArrayBuffer[Expression]()
+  val transformedPlan = subquery transform {
+case project @ Project(projectList, f @ Filter(condition, child)) =>
+  // Don't support more than 1 item in select list of subquery
+  if(projectList.size > 1) {
+throw new TreeNodeException(project, "SubQuery can contain only 1 item in Select List")
+  }
+  val resolvedChild = ResolveRelations(child)
+  // Add the expressions to the projections which are used as 
filters in subquery
+  val toBeAddedExprs = f.references.filter(
+  a=>resolvedChild.resolve(a.name, resolver) != None && !projectList.contains(a))
--- End diff --

A few things here
 - I'd add the guard above so we don't have to resolve here.  Hand 
resolution like you are doing here will work for simple cases but doesn't run 
to fixed point.
 - Never compare attributes by name, and don't inspect the project list to 
see what the output will be.  The reasons are as follows: names can be 
ambiguous and the project list could have aliases and other things that confuse 
the inspection.  Instead use `outputSet` which returns an `AttributeSet` that 
does comparisons correctly.
 - if you need multi-line lambdas, prefer the following syntax:
```  
  list.filter { a =>
a
  }
```

All together I think you want something like:
 

[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2014-12-17 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/3249#discussion_r22005225
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -314,6 +318,113 @@ class Analyzer(catalog: Catalog, registry: 
FunctionRegistry, caseSensitive: Bool
 protected def containsStar(exprs: Seq[Expression]): Boolean =
  exprs.collect { case _: Star => true }.nonEmpty
   }
+
+  /**
+   * Transforms the query which has subquery expressions in where clause 
to join queries.
+   * Case 1 Uncorelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2)
+   * -- rewritten query
+   * Select C from R1 left semi join (select B as sqc0 from R2) subquery 
on R1.A = subquery.sqc0
+   *
+   * Case 2 Corelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)
+   * -- rewritten query
+   * select C from R1 left semi join (select B as sqc0, R2.Y as sqc1 from 
R2) subquery
+   *   on R1.X = subquery.sqc1 and R1.A = subquery.sqc0
+   * 
+   * Refer: 
https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case filter @ Filter(conditions, child) =>
+val subqueryExprs = new 
scala.collection.mutable.ArrayBuffer[SubqueryExpression]()
+val nonSubQueryConds = new 
scala.collection.mutable.ArrayBuffer[Expression]()
+val transformedConds = conditions.transform{
+  // Replace with dummy
+  case s @ SubqueryExpression(exp,subquery) =>
+subqueryExprs += s
+Literal(true)
+}
+if (subqueryExprs.size == 1) {
+  val subqueryExpr = subqueryExprs.remove(0)
+  createLeftSemiJoin(
+child, subqueryExpr.value,
+subqueryExpr.subquery, transformedConds)
+} else if (subqueryExprs.size > 1) {
+  // Only one subquery expression is supported.
+  throw new TreeNodeException(filter, "Only 1 SubQuery expression is supported.")
+} else {
+  filter
+}
+}
+
+/**
+ * Create LeftSemi join with parent query to the subquery which is 
mentioned in 'IN' predicate
+ * And combine the subquery conditions and parent query conditions.
+ */ 
+def createLeftSemiJoin(left: LogicalPlan,
+value: Expression, subquery: LogicalPlan,
+parentConds: Expression) : LogicalPlan = {
+  val (transformedPlan, subqueryConds) = transformAndGetConditions(
+  value, subquery)
+  // Unify the parent query conditions and subquery conditions and add 
these as join conditions
+  val unifyConds = And(parentConds, subqueryConds)
+  Join(left, transformedPlan, LeftSemi, Some(unifyConds))
+}
+
+/**
+ * Transform the subquery LogicalPlan and add the expressions which 
are used as filters to the
+ * projection. And also return filter conditions used in subquery
+ */
+def transformAndGetConditions(value: Expression,
+  subquery: LogicalPlan): (LogicalPlan, Expression) = {
+  val expr = new scala.collection.mutable.ArrayBuffer[Expression]()
+  val transformedPlan = subquery transform {
+case project @ Project(projectList, f @ Filter(condition, child)) =>
+  // Don't support more than 1 item in select list of subquery
+  if(projectList.size > 1) {
+throw new TreeNodeException(project, "SubQuery can contain only 1 item in Select List")
+  }
+  val resolvedChild = ResolveRelations(child)
+  // Add the expressions to the projections which are used as 
filters in subquery
+  val toBeAddedExprs = f.references.filter(
+  a=>resolvedChild.resolve(a.name, resolver) != None && !projectList.contains(a))
+  val cache = collection.mutable.Map[String, String]()
+  // Create aliases for all projection expressions.
+  val witAliases = (projectList ++ 
toBeAddedExprs).zipWithIndex.map {
+case (exp, index) =>
+  cache.put(exp.name, s"sqc$index")
+  Alias(exp, s"sqc$index")()
+  }
+  // Replace the condition column names with alias names.
+  val transformedConds = condition.transform{
--- End diff --

Space before `{`



[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2014-12-17 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/3249#discussion_r22005471
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -314,6 +318,113 @@ class Analyzer(catalog: Catalog, registry: 
FunctionRegistry, caseSensitive: Bool
 protected def containsStar(exprs: Seq[Expression]): Boolean =
  exprs.collect { case _: Star => true }.nonEmpty
   }
+
+  /**
+   * Transforms the query which has subquery expressions in where clause 
to join queries.
+   * Case 1 Uncorelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2)
+   * -- rewritten query
+   * Select C from R1 left semi join (select B as sqc0 from R2) subquery 
on R1.A = subquery.sqc0
+   *
+   * Case 2 Corelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)
+   * -- rewritten query
+   * select C from R1 left semi join (select B as sqc0, R2.Y as sqc1 from 
R2) subquery
+   *   on R1.X = subquery.sqc1 and R1.A = subquery.sqc0
+   * 
+   * Refer: 
https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case filter @ Filter(conditions, child) =>
+val subqueryExprs = new 
scala.collection.mutable.ArrayBuffer[SubqueryExpression]()
+val nonSubQueryConds = new 
scala.collection.mutable.ArrayBuffer[Expression]()
+val transformedConds = conditions.transform{
+  // Replace with dummy
+  case s @ SubqueryExpression(exp,subquery) =>
+subqueryExprs += s
+Literal(true)
+}
+if (subqueryExprs.size == 1) {
+  val subqueryExpr = subqueryExprs.remove(0)
+  createLeftSemiJoin(
+child, subqueryExpr.value,
+subqueryExpr.subquery, transformedConds)
+} else if (subqueryExprs.size > 1) {
+  // Only one subquery expression is supported.
+  throw new TreeNodeException(filter, "Only 1 SubQuery expression is supported.")
+} else {
+  filter
+}
+}
+
+/**
+ * Create LeftSemi join with parent query to the subquery which is 
mentioned in 'IN' predicate
+ * And combine the subquery conditions and parent query conditions.
+ */ 
+def createLeftSemiJoin(left: LogicalPlan,
+value: Expression, subquery: LogicalPlan,
+parentConds: Expression) : LogicalPlan = {
+  val (transformedPlan, subqueryConds) = transformAndGetConditions(
+  value, subquery)
+  // Unify the parent query conditions and subquery conditions and add 
these as join conditions
+  val unifyConds = And(parentConds, subqueryConds)
+  Join(left, transformedPlan, LeftSemi, Some(unifyConds))
+}
+
+/**
+ * Transform the subquery LogicalPlan and add the expressions which 
are used as filters to the
+ * projection. And also return filter conditions used in subquery
+ */
+def transformAndGetConditions(value: Expression,
+  subquery: LogicalPlan): (LogicalPlan, Expression) = {
+  val expr = new scala.collection.mutable.ArrayBuffer[Expression]()
+  val transformedPlan = subquery transform {
+case project @ Project(projectList, f @ Filter(condition, child)) =>
+  // Don't support more than 1 item in select list of subquery
+  if(projectList.size > 1) {
+throw new TreeNodeException(project, "SubQuery can contain only 1 item in Select List")
+  }
+  val resolvedChild = ResolveRelations(child)
+  // Add the expressions to the projections which are used as 
filters in subquery
+  val toBeAddedExprs = f.references.filter(
+  a=>resolvedChild.resolve(a.name, resolver) != None && !projectList.contains(a))
+  val cache = collection.mutable.Map[String, String]()
+  // Create aliases for all projection expressions.
+  val witAliases = (projectList ++ 
toBeAddedExprs).zipWithIndex.map {
+case (exp, index) =>
+  cache.put(exp.name, s"sqc$index")
+  Alias(exp, s"sqc$index")()
+  }
+  // Replace the condition column names with alias names.
+  val transformedConds = condition.transform{
+case a: Attribute if resolvedChild.resolve(a.name, resolver) != None =>
+  UnresolvedAttribute("subquery." + cache.get(a.name).get)
--- End diff --

We should avoid creating more unresolved attributes 

[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2014-12-17 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/3249#discussion_r22005513
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -314,6 +318,113 @@ class Analyzer(catalog: Catalog, registry: 
FunctionRegistry, caseSensitive: Bool
 protected def containsStar(exprs: Seq[Expression]): Boolean =
  exprs.collect { case _: Star => true }.nonEmpty
   }
+
+  /**
+   * Transforms the query which has subquery expressions in where clause 
to join queries.
+   * Case 1 Uncorelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2)
+   * -- rewritten query
+   * Select C from R1 left semi join (select B as sqc0 from R2) subquery 
on R1.A = subquery.sqc0
+   *
+   * Case 2 Corelated queries
+   * -- original query
+   * select C from R1 where R1.A in (Select B from R2 where R1.X = R2.Y)
+   * -- rewritten query
+   * select C from R1 left semi join (select B as sqc0, R2.Y as sqc1 from 
R2) subquery
+   *   on R1.X = subquery.sqc1 and R1.A = subquery.sqc0
+   * 
+   * Refer: 
https://issues.apache.org/jira/secure/attachment/12614003/SubQuerySpec.pdf
+   */
+  object SubQueryExpressions extends Rule[LogicalPlan] {
+
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case filter @ Filter(conditions, child) =>
+val subqueryExprs = new 
scala.collection.mutable.ArrayBuffer[SubqueryExpression]()
+val nonSubQueryConds = new 
scala.collection.mutable.ArrayBuffer[Expression]()
+val transformedConds = conditions.transform{
+  // Replace with dummy
+  case s @ SubqueryExpression(exp,subquery) =>
+subqueryExprs += s
+Literal(true)
+}
+if (subqueryExprs.size == 1) {
+  val subqueryExpr = subqueryExprs.remove(0)
+  createLeftSemiJoin(
+child, subqueryExpr.value,
+subqueryExpr.subquery, transformedConds)
+} else if (subqueryExprs.size > 1) {
+  // Only one subquery expression is supported.
+  throw new TreeNodeException(filter, "Only 1 SubQuery expression is supported.")
+} else {
+  filter
+}
+}
+
+/**
+ * Create LeftSemi join with parent query to the subquery which is 
mentioned in 'IN' predicate
+ * And combine the subquery conditions and parent query conditions.
+ */ 
+def createLeftSemiJoin(left: LogicalPlan,
+value: Expression, subquery: LogicalPlan,
+parentConds: Expression) : LogicalPlan = {
+  val (transformedPlan, subqueryConds) = transformAndGetConditions(
+  value, subquery)
+  // Unify the parent query conditions and subquery conditions and add 
these as join conditions
+  val unifyConds = And(parentConds, subqueryConds)
+  Join(left, transformedPlan, LeftSemi, Some(unifyConds))
+}
+
+/**
+ * Transform the subquery LogicalPlan and add the expressions which 
are used as filters to the
+ * projection. And also return filter conditions used in subquery
+ */
+def transformAndGetConditions(value: Expression,
+  subquery: LogicalPlan): (LogicalPlan, Expression) = {
+  val expr = new scala.collection.mutable.ArrayBuffer[Expression]()
+  val transformedPlan = subquery transform {
+case project @ Project(projectList, f @ Filter(condition, child)) =>
+  // Don't support more than 1 item in select list of subquery
+  if(projectList.size > 1) {
+throw new TreeNodeException(project, "SubQuery can contain only 1 item in Select List")
+  }
+  val resolvedChild = ResolveRelations(child)
+  // Add the expressions to the projections which are used as 
filters in subquery
+  val toBeAddedExprs = f.references.filter(
+  a=>resolvedChild.resolve(a.name, resolver) != None && !projectList.contains(a))
+  val cache = collection.mutable.Map[String, String]()
+  // Create aliases for all projection expressions.
+  val witAliases = (projectList ++ 
toBeAddedExprs).zipWithIndex.map {
+case (exp, index) =>
+  cache.put(exp.name, s"sqc$index")
+  Alias(exp, s"sqc$index")()
+  }
+  // Replace the condition column names with alias names.
+  val transformedConds = condition.transform{
+case a: Attribute if resolvedChild.resolve(a.name, resolver) != None =>
+  UnresolvedAttribute("subquery." + cache.get(a.name).get)
+  }
+  // Join the first projection column of subquery 

[GitHub] spark pull request: [SPARK-4226][SQL] SparkSQL - Add support for s...

2014-12-17 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/3249#discussion_r22005662
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SubqueryExpression.scala
 ---
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+
+/**
+ * Evaluates whether `subquery` result contains `value`. 
+ * For example : 'SELECT * FROM src a WHERE a.key in (SELECT b.key FROM 
src b)'
+ * @param value   In the above example 'a.key' is 'value'
+ * @param subquery  In the above example 'SELECT b.key FROM src b' is 
'subquery'
+ */
+case class SubqueryExpression(value: Expression, subquery: LogicalPlan) 
extends Expression {
--- End diff --

Why do we need both the `value` and the `subquery`? Could we do something 
like `subquery.output.head.dataType`?
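A sketch of what that suggestion might look like, with stand-in types rather than real Catalyst classes (all names below are hypothetical, not the PR's code):

```scala
// Stand-in types showing how the expression's type could be derived from the
// subquery's single projected column instead of being carried separately.
case class Attribute(name: String, dataType: String)
case class SubqueryPlan(output: Seq[Attribute])

case class InSubquery(subqueryPlan: SubqueryPlan) {
  // The IN-list's element type is simply the type of the subquery's one
  // output column, so no extra field is needed just for typing.
  def dataType: String = subqueryPlan.output.head.dataType
}

val e = InSubquery(SubqueryPlan(Seq(Attribute("b", "IntegerType"))))
assert(e.dataType == "IntegerType")
```

Note `value` is still needed to evaluate the predicate itself; the question is only whether type information must be duplicated.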




