subject:"\[GitHub\] spark pull request #14132\: \[SPARK\-16475\]\[SQL\] Broadcast Hint for SQL Queries"

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-30 Thread dongjoon-hyun

Github user dongjoon-hyun closed the pull request at:

https://github.com/apache/spark/pull/14132


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-25 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r72182414
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1774,6 +1775,49 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the 
given name parameters.
+   *
+   * This rule substitutes `UnresolvedRelation`s in `Substitute` batch 
before `ResolveRelations`
+   * rule is applied. Here are two reasons.
+   * - To support `MetastoreRelation` in Hive module.
+   * - To reduce the effect of `Hint` on the other rules.
+   *
+   * After this rule, it is guaranteed that there exists no unknown `Hint` 
in the plan.
+   * All new `Hint`s should be transformed into concrete Hint classes 
`BroadcastHint` here.
+   */
+  object SubstituteHints extends Rule[LogicalPlan] {
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case logical: LogicalPlan => logical transformDown {
+case h @ Hint(name, parameters, child)
+if Seq("BROADCAST", "BROADCASTJOIN", 
"MAPJOIN").contains(name.toUpperCase) =>
+  var resolvedChild = child
+
+  for (table <- parameters) {
+var stop = false
+resolvedChild = resolvedChild.transformDown {
--- End diff --

Oh, I see. That's the reason why I can not find the clear logic. Thank you 
so much, @yhuai !


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-25 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r72181762
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1774,6 +1775,49 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the 
given name parameters.
+   *
+   * This rule substitutes `UnresolvedRelation`s in `Substitute` batch 
before `ResolveRelations`
+   * rule is applied. Here are two reasons.
+   * - To support `MetastoreRelation` in Hive module.
+   * - To reduce the effect of `Hint` on the other rules.
+   *
+   * After this rule, it is guaranteed that there exists no unknown `Hint` 
in the plan.
+   * All new `Hint`s should be transformed into concrete Hint classes 
`BroadcastHint` here.
+   */
+  object SubstituteHints extends Rule[LogicalPlan] {
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case logical: LogicalPlan => logical transformDown {
+case h @ Hint(name, parameters, child)
+if Seq("BROADCAST", "BROADCASTJOIN", 
"MAPJOIN").contains(name.toUpperCase) =>
+  var resolvedChild = child
+
+  for (table <- parameters) {
+var stop = false
+resolvedChild = resolvedChild.transformDown {
--- End diff --

You probably want to look at older versions of Hive. For example, 0.10. 
After https://issues.apache.org/jira/browse/HIVE-3784, Hive does not really use 
map join hint.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-25 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r72179412
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1774,6 +1775,49 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the 
given name parameters.
+   *
+   * This rule substitutes `UnresolvedRelation`s in `Substitute` batch 
before `ResolveRelations`
+   * rule is applied. Here are two reasons.
+   * - To support `MetastoreRelation` in Hive module.
+   * - To reduce the effect of `Hint` on the other rules.
+   *
+   * After this rule, it is guaranteed that there exists no unknown `Hint` 
in the plan.
+   * All new `Hint`s should be transformed into concrete Hint classes 
`BroadcastHint` here.
+   */
+  object SubstituteHints extends Rule[LogicalPlan] {
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case logical: LogicalPlan => logical transformDown {
+case h @ Hint(name, parameters, child)
+if Seq("BROADCAST", "BROADCASTJOIN", 
"MAPJOIN").contains(name.toUpperCase) =>
+  var resolvedChild = child
+
+  for (table <- parameters) {
+var stop = false
+resolvedChild = resolvedChild.transformDown {
--- End diff --

cc @yhuai, any idea how does hive handle sql hints?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-25 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r72178111
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1774,6 +1775,49 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the 
given name parameters.
+   *
+   * This rule substitutes `UnresolvedRelation`s in `Substitute` batch 
before `ResolveRelations`
+   * rule is applied. Here are two reasons.
+   * - To support `MetastoreRelation` in Hive module.
+   * - To reduce the effect of `Hint` on the other rules.
+   *
+   * After this rule, it is guaranteed that there exists no unknown `Hint` 
in the plan.
+   * All new `Hint`s should be transformed into concrete Hint classes 
`BroadcastHint` here.
+   */
+  object SubstituteHints extends Rule[LogicalPlan] {
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case logical: LogicalPlan => logical transformDown {
+case h @ Hint(name, parameters, child)
+if Seq("BROADCAST", "BROADCASTJOIN", 
"MAPJOIN").contains(name.toUpperCase) =>
+  var resolvedChild = child
+
+  for (table <- parameters) {
+var stop = false
+resolvedChild = resolvedChild.transformDown {
--- End diff --

For your case, what I understand is the DFS should be used. I've search 
Hive codebase until now, but I couldn't find the clear logic until now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-25 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r72176938
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1774,6 +1775,49 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the 
given name parameters.
+   *
+   * This rule substitutes `UnresolvedRelation`s in `Substitute` batch 
before `ResolveRelations`
+   * rule is applied. Here are two reasons.
+   * - To support `MetastoreRelation` in Hive module.
+   * - To reduce the effect of `Hint` on the other rules.
+   *
+   * After this rule, it is guaranteed that there exists no unknown `Hint` 
in the plan.
+   * All new `Hint`s should be transformed into concrete Hint classes 
`BroadcastHint` here.
+   */
+  object SubstituteHints extends Rule[LogicalPlan] {
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case logical: LogicalPlan => logical transformDown {
+case h @ Hint(name, parameters, child)
+if Seq("BROADCAST", "BROADCASTJOIN", 
"MAPJOIN").contains(name.toUpperCase) =>
+  var resolvedChild = child
+
+  for (table <- parameters) {
+var stop = false
+resolvedChild = resolvedChild.transformDown {
--- End diff --

yea, for this example, DFS is right and BFS is wrong. But how about the 
case I mentioned above?

I think we don't need to traverse the plan tree recursively, just find the 
sub-tree of relation part, flatten the join node if any, and then iterate these 
relations, wrap it with broadcast hint if it's a table and table name matches 
the hint argument.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-25 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r72114534
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1774,6 +1775,49 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the 
given name parameters.
+   *
+   * This rule substitutes `UnresolvedRelation`s in `Substitute` batch 
before `ResolveRelations`
+   * rule is applied. Here are two reasons.
+   * - To support `MetastoreRelation` in Hive module.
+   * - To reduce the effect of `Hint` on the other rules.
+   *
+   * After this rule, it is guaranteed that there exists no unknown `Hint` 
in the plan.
+   * All new `Hint`s should be transformed into concrete Hint classes 
`BroadcastHint` here.
+   */
+  object SubstituteHints extends Rule[LogicalPlan] {
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case logical: LogicalPlan => logical transformDown {
+case h @ Hint(name, parameters, child)
+if Seq("BROADCAST", "BROADCASTJOIN", 
"MAPJOIN").contains(name.toUpperCase) =>
+  var resolvedChild = child
+
+  for (table <- parameters) {
+var stop = false
+resolvedChild = resolvedChild.transformDown {
--- End diff --

Since `JOIN` is binary operation, **closest** does not mean BFS. Please see 
the following example. We don't think that the hint will match the fourth `t1`. 
If that term is ambiguous, I will change that into the first one in DFS.
```
scala> sql("select * from t1 JOIN t1 JOIN t1 JOIN t1").explain
== Physical Plan ==
CartesianProduct
:- CartesianProduct
:  :- CartesianProduct
:  :  :- *Range (0, 1, splits=8)
:  :  +- *Range (0, 1, splits=8)
:  +- *Range (0, 1, splits=8)
+- *Range (0, 1, splits=8)

scala> sql("select /*+ MAPJOIN(t1) */ * from t1 JOIN t1 JOIN t1 JOIN 
t1").explain
== Physical Plan ==
CartesianProduct
:- CartesianProduct
:  :- BroadcastNestedLoopJoin BuildLeft, Inner, true
:  :  :- BroadcastExchange IdentityBroadcastMode
:  :  :  +- *Range (0, 1, splits=8)
:  :  +- *Range (0, 1, splits=8)
:  +- *Range (0, 1, splits=8)
+- *Range (0, 1, splits=8)
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-25 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r72052381
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1774,6 +1775,49 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the 
given name parameters.
+   *
+   * This rule substitutes `UnresolvedRelation`s in `Substitute` batch 
before `ResolveRelations`
+   * rule is applied. Here are two reasons.
+   * - To support `MetastoreRelation` in Hive module.
+   * - To reduce the effect of `Hint` on the other rules.
+   *
+   * After this rule, it is guaranteed that there exists no unknown `Hint` 
in the plan.
+   * All new `Hint`s should be transformed into concrete Hint classes 
`BroadcastHint` here.
+   */
+  object SubstituteHints extends Rule[LogicalPlan] {
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case logical: LogicalPlan => logical transformDown {
+case h @ Hint(name, parameters, child)
+if Seq("BROADCAST", "BROADCASTJOIN", 
"MAPJOIN").contains(name.toUpperCase) =>
+  var resolvedChild = child
+
+  for (table <- parameters) {
+var stop = false
+resolvedChild = resolvedChild.transformDown {
--- End diff --

and I think BFS is the one that matches the semantic "closest".


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-25 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r7205
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1774,6 +1775,49 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the 
given name parameters.
+   *
+   * This rule substitutes `UnresolvedRelation`s in `Substitute` batch 
before `ResolveRelations`
+   * rule is applied. Here are two reasons.
+   * - To support `MetastoreRelation` in Hive module.
+   * - To reduce the effect of `Hint` on the other rules.
+   *
+   * After this rule, it is guaranteed that there exists no unknown `Hint` 
in the plan.
+   * All new `Hint`s should be transformed into concrete Hint classes 
`BroadcastHint` here.
+   */
+  object SubstituteHints extends Rule[LogicalPlan] {
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case logical: LogicalPlan => logical transformDown {
+case h @ Hint(name, parameters, child)
+if Seq("BROADCAST", "BROADCASTJOIN", 
"MAPJOIN").contains(name.toUpperCase) =>
+  var resolvedChild = child
+
+  for (table <- parameters) {
+var stop = false
+resolvedChild = resolvedChild.transformDown {
--- End diff --

`SELECT /*+ MAPJOIN(a) */ FROM (SELECT * FROM a) x JOIN a`
If we do DFS, we will add the hint to the table `a` insode subquery, not 
the table `a` after JOIN.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-25 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r72040705
  
--- Diff: 
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -347,6 +347,16 @@ querySpecification
windows?)
 ;
 
+hint
+: '/*+' hintStatement '*/'
--- End diff --

Yes. Currently, it does. Should we expand this, too?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-25 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r72040281
  
--- Diff: 
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -347,6 +347,16 @@ querySpecification
windows?)
 ;
 
+hint
+: '/*+' hintStatement '*/'
--- End diff --

I might have missed something, but do we only support one hint?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-25 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r72040264
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1774,6 +1775,49 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the 
given name parameters.
+   *
+   * This rule substitutes `UnresolvedRelation`s in `Substitute` batch 
before `ResolveRelations`
+   * rule is applied. Here are two reasons.
+   * - To support `MetastoreRelation` in Hive module.
+   * - To reduce the effect of `Hint` on the other rules.
+   *
+   * After this rule, it is guaranteed that there exists no unknown `Hint` 
in the plan.
+   * All new `Hint`s should be transformed into concrete Hint classes 
`BroadcastHint` here.
+   */
+  object SubstituteHints extends Rule[LogicalPlan] {
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case logical: LogicalPlan => logical transformDown {
+case h @ Hint(name, parameters, child)
+if Seq("BROADCAST", "BROADCASTJOIN", 
"MAPJOIN").contains(name.toUpperCase) =>
+  var resolvedChild = child
+
+  for (table <- parameters) {
+var stop = false
+resolvedChild = resolvedChild.transformDown {
--- End diff --

I'll check again this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-25 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r72038785
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1774,6 +1775,49 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the 
given name parameters.
+   *
+   * This rule substitutes `UnresolvedRelation`s in `Substitute` batch 
before `ResolveRelations`
+   * rule is applied. Here are two reasons.
+   * - To support `MetastoreRelation` in Hive module.
+   * - To reduce the effect of `Hint` on the other rules.
+   *
+   * After this rule, it is guaranteed that there exists no unknown `Hint` 
in the plan.
+   * All new `Hint`s should be transformed into concrete Hint classes 
`BroadcastHint` here.
+   */
+  object SubstituteHints extends Rule[LogicalPlan] {
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case logical: LogicalPlan => logical transformDown {
+case h @ Hint(name, parameters, child)
+if Seq("BROADCAST", "BROADCASTJOIN", 
"MAPJOIN").contains(name.toUpperCase) =>
+  var resolvedChild = child
+
+  for (table <- parameters) {
+var stop = false
+resolvedChild = resolvedChild.transformDown {
--- End diff --

I think DFS is right for this match. Do you think BFS?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-25 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r72037553
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1774,6 +1775,49 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the 
given name parameters.
+   *
+   * This rule substitutes `UnresolvedRelation`s in `Substitute` batch 
before `ResolveRelations`
+   * rule is applied. Here are two reasons.
+   * - To support `MetastoreRelation` in Hive module.
+   * - To reduce the effect of `Hint` on the other rules.
+   *
+   * After this rule, it is guaranteed that there exists no unknown `Hint` 
in the plan.
+   * All new `Hint`s should be transformed into concrete Hint classes 
`BroadcastHint` here.
+   */
+  object SubstituteHints extends Rule[LogicalPlan] {
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case logical: LogicalPlan => logical transformDown {
+case h @ Hint(name, parameters, child)
+if Seq("BROADCAST", "BROADCASTJOIN", 
"MAPJOIN").contains(name.toUpperCase) =>
+  var resolvedChild = child
+
+  for (table <- parameters) {
+var stop = false
+resolvedChild = resolvedChild.transformDown {
--- End diff --

Does this really work? `transformDown` is DFS I think


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-25 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r72036822
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 ---
@@ -372,6 +372,10 @@ trait CheckAnalysis extends PredicateHelper {
  |in operator ${operator.simpleString}
""".stripMargin)
 
+  case Hint(_, _, _) =>
+throw new IllegalStateException(
+  "logical hint operator should have been removed by the 
optimizer")
--- End diff --

Oh!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-25 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r72036883
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1774,6 +1775,49 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the 
given name parameters.
+   *
+   * This rule substitutes `UnresolvedRelation`s in `Substitute` batch 
before `ResolveRelations`
+   * rule is applied. Here are two reasons.
+   * - To support `MetastoreRelation` in Hive module.
+   * - To reduce the effect of `Hint` on the other rules.
+   *
+   * After this rule, it is guaranteed that there exists no unknown `Hint` 
in the plan.
+   * All new `Hint`s should be transformed into concrete Hint classes 
`BroadcastHint` here.
+   */
+  object SubstituteHints extends Rule[LogicalPlan] {
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case logical: LogicalPlan => logical transformDown {
+case h @ Hint(name, parameters, child)
+if Seq("BROADCAST", "BROADCASTJOIN", 
"MAPJOIN").contains(name.toUpperCase) =>
--- End diff --

Sure. I see.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-25 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r72036607
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1774,6 +1775,49 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the 
given name parameters.
+   *
+   * This rule substitutes `UnresolvedRelation`s in `Substitute` batch 
before `ResolveRelations`
+   * rule is applied. Here are two reasons.
+   * - To support `MetastoreRelation` in Hive module.
+   * - To reduce the effect of `Hint` on the other rules.
+   *
+   * After this rule, it is guaranteed that there exists no unknown `Hint` 
in the plan.
+   * All new `Hint`s should be transformed into concrete Hint classes 
`BroadcastHint` here.
+   */
+  object SubstituteHints extends Rule[LogicalPlan] {
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case logical: LogicalPlan => logical transformDown {
+case h @ Hint(name, parameters, child)
+if Seq("BROADCAST", "BROADCASTJOIN", 
"MAPJOIN").contains(name.toUpperCase) =>
--- End diff --

we can make it a constant value(use set instead of seq) in this rule.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-25 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r72036166
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 ---
@@ -372,6 +372,10 @@ trait CheckAnalysis extends PredicateHelper {
  |in operator ${operator.simpleString}
""".stripMargin)
 
+  case Hint(_, _, _) =>
+throw new IllegalStateException(
+  "logical hint operator should have been removed by the 
optimizer")
--- End diff --

by analyzer


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-24 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r72012385
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1774,6 +1775,49 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the 
given name parameters.
+   *
+   * This rule substitutes `UnresolvedRelation`s in `Substitute` batch 
before `ResolveRelations`
+   * rule is applied. Here are two reasons.
+   * - To support `MetastoreRelation` in Hive module.
+   * - To reduce the effect of `Hint` on the other rules.
+   *
+   * After this rule, it is guaranteed that there exists no unknown `Hint` 
in the plan.
+   * All new `Hint`s should be transformed into concrete Hint classes 
`BroadcastHint` here.
+   */
+  object SubstituteHints extends Rule[LogicalPlan] {
--- End diff --

Yep. I will wrap the whole query plan. `Project` is not always there.
```
SELECT /*+ MAPJOIN(t) */ * from t, u where 1=1 group by 1 order by 1

'Sort [1 ASC], true
+- 'Aggregate [1], [*]
   +- 'Filter (1 = 1)
  +- 'Hint MAPJOIN, [t]  
 +- 'Join Inner  
:- 'UnresolvedRelation `t`   
+- 'UnresolvedRelation `u`   
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-24 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r72011582
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1774,6 +1775,49 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the 
given name parameters.
+   *
+   * This rule substitutes `UnresolvedRelation`s in `Substitute` batch 
before `ResolveRelations`
+   * rule is applied. Here are two reasons.
+   * - To support `MetastoreRelation` in Hive module.
+   * - To reduce the effect of `Hint` on the other rules.
+   *
+   * After this rule, it is guaranteed that there exists no unknown `Hint` 
in the plan.
+   * All new `Hint`s should be transformed into concrete Hint classes 
`BroadcastHint` here.
+   */
+  object SubstituteHints extends Rule[LogicalPlan] {
--- End diff --

First of all, I'll take a look again. I need some boot-up time. :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-24 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r72011413
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1774,6 +1775,49 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the 
given name parameters.
+   *
+   * This rule substitutes `UnresolvedRelation`s in `Substitute` batch 
before `ResolveRelations`
+   * rule is applied. Here are two reasons.
+   * - To support `MetastoreRelation` in Hive module.
+   * - To reduce the effect of `Hint` on the other rules.
+   *
+   * After this rule, it is guaranteed that there exists no unknown `Hint` 
in the plan.
+   * All new `Hint`s should be transformed into concrete Hint classes 
`BroadcastHint` here.
+   */
+  object SubstituteHints extends Rule[LogicalPlan] {
--- End diff --

For the second one, I will revise again.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-24 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r72011394
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1774,6 +1775,49 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the 
given name parameters.
+   *
+   * This rule substitutes `UnresolvedRelation`s in `Substitute` batch 
before `ResolveRelations`
+   * rule is applied. Here are two reasons.
+   * - To support `MetastoreRelation` in Hive module.
+   * - To reduce the effect of `Hint` on the other rules.
+   *
+   * After this rule, it is guaranteed that there exists no unknown `Hint` 
in the plan.
+   * All new `Hint`s should be transformed into concrete Hint classes 
`BroadcastHint` here.
+   */
+  object SubstituteHints extends Rule[LogicalPlan] {
--- End diff --

I see. Since this is `SELECT hint`, which one do you mean?
- `Hint(Project)`
- `Project(Hint)`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-24 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r72011145
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1774,6 +1775,49 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the 
given name parameters.
+   *
+   * This rule substitutes `UnresolvedRelation`s in `Substitute` batch 
before `ResolveRelations`
+   * rule is applied. Here are two reasons.
+   * - To support `MetastoreRelation` in Hive module.
+   * - To reduce the effect of `Hint` on the other rules.
+   *
+   * After this rule, it is guaranteed that there exists no unknown `Hint` 
in the plan.
+   * All new `Hint`s should be transformed into concrete Hint classes 
`BroadcastHint` here.
+   */
+  object SubstituteHints extends Rule[LogicalPlan] {
--- End diff --

then should we wrap the whole query plan with `Hint` instead of just the 
relation? 
https://github.com/apache/spark/pull/14132#discussion-diff-71820260R343


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-24 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r72010521
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1774,6 +1775,49 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the 
given name parameters.
+   *
+   * This rule substitutes `UnresolvedRelation`s in `Substitute` batch 
before `ResolveRelations`
+   * rule is applied. Here are two reasons.
+   * - To support `MetastoreRelation` in Hive module.
+   * - To reduce the effect of `Hint` on the other rules.
+   *
+   * After this rule, it is guaranteed that there exists no unknown `Hint` 
in the plan.
+   * All new `Hint`s should be transformed into concrete Hint classes 
`BroadcastHint` here.
+   */
+  object SubstituteHints extends Rule[LogicalPlan] {
--- End diff --

Hi, @cloud-fan . Thank you again.

Yes. That's true. So, at my first commit of this PR, I actually implemented 
to attach just the concrete `BroadcastHint` to the relations in 
`AstBuilder.scala`. But, there were some advices to
- Move out the logic from the parser module (for preventing future updates 
on parser module)
- Generalize the hint syntax like the current code. 

Since a general `Hint` is attached on a logical plan not a table relation, 
we can substitute any other plans (filter/project/aggregator), too. The 
following is my previous comment about this situation. The child of hint is 
`JOIN`.

> Yep. I grasp your point. First, the given relation is just a logical 
plan, not a table. In the following case, the relation is Join.
> SELECT /*+ MAPJOIN(parquet_t0) */ * FROM parquet_t0, parquet_t1



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-24 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r72008134
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1774,6 +1775,49 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the 
given name parameters.
+   *
+   * This rule substitutes `UnresolvedRelation`s in `Substitute` batch 
before `ResolveRelations`
+   * rule is applied. Here are two reasons.
+   * - To support `MetastoreRelation` in Hive module.
+   * - To reduce the effect of `Hint` on the other rules.
+   *
+   * After this rule, it is guaranteed that there exists no unknown `Hint` 
in the plan.
+   * All new `Hint`s should be transformed into concrete Hint classes 
`BroadcastHint` here.
+   */
+  object SubstituteHints extends Rule[LogicalPlan] {
--- End diff --

how does hive handle sql hints? This rule looks a little complex to me, 
according to the fact that hints can only be applied to table relation. It will 
be great if we can wrap `UnresolvedRelation` with `Hint` in the parser, and 
then it will be very easy to resolve hints at analyzer.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-21 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71826514
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -193,6 +196,9 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
 case OneRowRelation =>
   ""
 
+case Hint(_, _, child) =>
--- End diff --

I remember the reason. Originally, I designed `Hint` is removed by some 
rules of `Analyzer` or `Optimizer`. But, now, it's changed. Only rules of 
`Analyzer` handles that and removes all unknown hints. `CheckAnalysis` is the 
correct position.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-21 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71825927
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -193,6 +196,9 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
 case OneRowRelation =>
   ""
 
+case Hint(_, _, child) =>
--- End diff --

Ur, it's added by previous advice comments.
No problem. I'll move that into `CheckAnalysis`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-21 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71825810
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ---
@@ -339,8 +339,11 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] 
with Logging {
   case SqlBaseParser.SELECT =>
 // Regular select
 
+// Add hints.
+val withHint = relation.optionalMap(ctx.hint)(withHints)
--- End diff --

Yep. I grasp your point. First, the given `relation` is just a logical 
plan, not a table. In the following case, the `relation` is `Join`.
 ```
SELECT /*+ MAPJOIN(parquet_t0) */ * FROM parquet_t0, parquet_t1
```
This means just a context. 

Second, I agree with your point. Generally, `hint` is about tables like 
join order or access path. We can provides some hints on filter conditions, I'm 
not sure about the cases so far. But, even for that cases, we can adapt that in 
`Analyzer` easily. In other words, we can identify `Hint("FILTERHINT")` in the 
same SPJ.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-21 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71823645
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -193,6 +196,9 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
 case OneRowRelation =>
   ""
 
+case Hint(_, _, child) =>
--- End diff --

so why do we have this case? `SQLBuilder` will be applied on analyzed plan 
right?

And I think it's better to push this check in `CheckAnalysis`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-21 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71823613
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ---
@@ -339,8 +339,11 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] 
with Logging {
   case SqlBaseParser.SELECT =>
 // Regular select
 
+// Add hints.
+val withHint = relation.optionalMap(ctx.hint)(withHints)
--- End diff --

for now the hint can only affect the relations(broadcast hint), so it's ok 
to just put the `Hint` on top of relation. But generally hint can affect 
anything in the select clause, e.g. maybe the filter condition.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-21 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71823152
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -193,6 +196,9 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
 case OneRowRelation =>
   ""
 
+case Hint(_, _, child) =>
--- End diff --

Also, `SparkStrategies` ensures that here 
https://github.com/apache/spark/pull/14132/files#diff-7253a38df7e111ecf6b1ef71feba383bR347
 , too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-21 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71823079
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -193,6 +196,9 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
 case OneRowRelation =>
   ""
 
+case Hint(_, _, child) =>
--- End diff --

After applying the rules of `Analyzer`, there will be no more `Hint` case 
class.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-21 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71822992
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ---
@@ -339,8 +339,11 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] 
with Logging {
   case SqlBaseParser.SELECT =>
 // Regular select
 
+// Add hints.
+val withHint = relation.optionalMap(ctx.hint)(withHints)
--- End diff --

Ur, could you give some example what you concern?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-21 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71822903
  
--- Diff: 
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -347,6 +347,16 @@ querySpecification
windows?)
 ;
 
+hint
+: '/*+' hintStatement '*/'
+;
+
+hintStatement
+: hintName=identifier
+| hintName=identifier '(' parameters+=identifier 
parameters+=identifier ')'
--- End diff --

Thank you for review, @cloud-fan .
The first goal of this PR provides a general syntax for hints, not only 
broadcast hints. The `(` and `)` syntax is for `INDEX(t idx_emp)` style. You 
can see the testcase for this in the testcase, too.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-21 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71820467
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -193,6 +196,9 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
 case OneRowRelation =>
   ""
 
+case Hint(_, _, child) =>
--- End diff --

will an analyzed plan contain `Hint`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-21 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71820260
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 ---
@@ -339,8 +339,11 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] 
with Logging {
   case SqlBaseParser.SELECT =>
 // Regular select
 
+// Add hints.
+val withHint = relation.optionalMap(ctx.hint)(withHints)
--- End diff --

according to the grammar: `SELECT /*+ hint... */ ...`, it seems that hint 
should be applied to the whole SELECT clause, not only its relations?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-21 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71819403
  
--- Diff: 
sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -347,6 +347,16 @@ querySpecification
windows?)
 ;
 
+hint
+: '/*+' hintStatement '*/'
+;
+
+hintStatement
+: hintName=identifier
+| hintName=identifier '(' parameters+=identifier 
parameters+=identifier ')'
--- End diff --

do we need this rule for broadcast hint?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-21 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71756794
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -162,6 +162,9 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
 toSQL(p.right),
 p.condition.map(" ON " + _.sql).getOrElse(""))
 
+case h @ Hint(_, _, s @ SubqueryAlias(alias, p @ Project(_, _: 
SQLTable))) =>
--- End diff --

Sure!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-21 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71756744
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -425,6 +443,46 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
   }
 }
 
+/**
+ * Merge and move upward to the nearest Project.
+ * A broadcast hint comment is scattered into multiple nodes inside 
the plan, and the
+ * information of BroadcastHint resides its current position inside 
the plan. In order to
+ * reconstruct broadcast hint comment, we need to pack the information 
of BroadcastHint into
+ * Hint("BROADCAST", _, _) and collect them up by moving upward to the 
nearest Project node.
+ */
+object NormalizeBroadcastHint extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan 
transformUp {
+// Capture the broadcasted information and store it in Hint.
+case BroadcastHint(child @ SubqueryAlias(_, Project(_, 
SQLTable(database, table, _, _ =>
+  Hint("BROADCAST", Seq(table), child)
+
+// Nearest Project is found.
+case p @ Project(_, Hint(_, _, _)) => p
+
+// Merge BROADCAST hints up to the nearest Project.
+case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) 
=>
+  h.copy(parameters = params1 ++ params2)
+case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ 
Hint("BROADCAST", p2, right), _, _) =>
+  h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right 
= right))
+
+// Bubble up BROADCAST hints to the nearest Project.
+case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
+  h.copy(child = j.copy(left = hintChild))
+case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
+  h.copy(child = j.copy(right = hintChild))
+
+// Other UnaryNodes are bypassed.
+case u: UnaryNode
+if u.child.isInstanceOf[Hint] && 
u.child.asInstanceOf[Hint].name.equals("BROADCAST") =>
--- End diff --

No problem, if the decision is made.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-21 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71756590
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -425,6 +449,44 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
   }
 }
 
+/**
+ * Merge and move upward to the nearest Project.
+ * A broadcast hint comment is scattered into multiple nodes inside 
the plan, and the
+ * information of BroadcastHint resides its current position inside 
the plan. In order to
+ * reconstruct broadcast hint comment, we need to pack the information 
of BroadcastHint into
+ * Hint("BROADCAST", _, _) and collect them up by moving upward to the 
nearest Project node.
+ */
+object NormalizeBroadcastHint extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan 
transformUp {
+// Capture the broadcasted information and store it in Hint.
+case BroadcastHint(child @ SubqueryAlias(_, Project(_, 
SQLTable(database, table, _, _ =>
+  Hint("BROADCAST", Seq(table), child)
+
+// Nearest Project is found.
+case p @ Project(_, Hint(_, _, _)) => p
+
+// Merge BROADCAST hints up to the nearest Project.
+case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) 
=>
+  h.copy(parameters = params1 ++ params2)
+case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ 
Hint("BROADCAST", p2, right), _, _) =>
+  h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right 
= right))
+
+// Bubble up BROADCAST hints to the nearest Project.
+case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
+  h.copy(child = j.copy(left = hintChild))
+case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
+  h.copy(child = j.copy(right = hintChild))
--- End diff --

Yep. No problem. For both ways, I'm ready. :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-21 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71736274
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -162,6 +162,9 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
 toSQL(p.right),
 p.condition.map(" ON " + _.sql).getOrElse(""))
 
+case h @ Hint(_, _, s @ SubqueryAlias(alias, p @ Project(_, _: 
SQLTable))) =>
--- End diff --

The new way looks cleaner, but this looks like a special handling. Can you 
write a code comment to explain when it is being used?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-21 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71735446
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -425,6 +443,46 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
   }
 }
 
+/**
+ * Merge and move upward to the nearest Project.
+ * A broadcast hint comment is scattered into multiple nodes inside 
the plan, and the
+ * information of BroadcastHint resides its current position inside 
the plan. In order to
+ * reconstruct broadcast hint comment, we need to pack the information 
of BroadcastHint into
+ * Hint("BROADCAST", _, _) and collect them up by moving upward to the 
nearest Project node.
+ */
+object NormalizeBroadcastHint extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan 
transformUp {
+// Capture the broadcasted information and store it in Hint.
+case BroadcastHint(child @ SubqueryAlias(_, Project(_, 
SQLTable(database, table, _, _ =>
+  Hint("BROADCAST", Seq(table), child)
+
+// Nearest Project is found.
+case p @ Project(_, Hint(_, _, _)) => p
+
+// Merge BROADCAST hints up to the nearest Project.
+case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) 
=>
+  h.copy(parameters = params1 ++ params2)
+case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ 
Hint("BROADCAST", p2, right), _, _) =>
+  h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right 
= right))
+
+// Bubble up BROADCAST hints to the nearest Project.
+case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
+  h.copy(child = j.copy(left = hintChild))
+case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
+  h.copy(child = j.copy(right = hintChild))
+
+// Other UnaryNodes are bypassed.
+case u: UnaryNode
+if u.child.isInstanceOf[Hint] && 
u.child.asInstanceOf[Hint].name.equals("BROADCAST") =>
--- End diff --

As mentioned before, how about using `isBroadcastHint` here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-21 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71735281
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -425,6 +449,44 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
   }
 }
 
+/**
+ * Merge and move upward to the nearest Project.
+ * A broadcast hint comment is scattered into multiple nodes inside 
the plan, and the
+ * information of BroadcastHint resides its current position inside 
the plan. In order to
+ * reconstruct broadcast hint comment, we need to pack the information 
of BroadcastHint into
+ * Hint("BROADCAST", _, _) and collect them up by moving upward to the 
nearest Project node.
+ */
+object NormalizeBroadcastHint extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan 
transformUp {
+// Capture the broadcasted information and store it in Hint.
+case BroadcastHint(child @ SubqueryAlias(_, Project(_, 
SQLTable(database, table, _, _ =>
+  Hint("BROADCAST", Seq(table), child)
+
+// Nearest Project is found.
+case p @ Project(_, Hint(_, _, _)) => p
+
+// Merge BROADCAST hints up to the nearest Project.
+case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) 
=>
+  h.copy(parameters = params1 ++ params2)
+case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ 
Hint("BROADCAST", p2, right), _, _) =>
+  h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right 
= right))
+
+// Bubble up BROADCAST hints to the nearest Project.
+case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
+  h.copy(child = j.copy(left = hintChild))
+case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
+  h.copy(child = j.copy(right = hintChild))
--- End diff --

Actually, `isBroadcastHint` can be reused below for `UnaryNode`
```SQL
case j: Join if j.children.exists(isBroadcastHint) =>
  val parameter = 
j.children.filter(isBroadcastHint).map(_.asInstanceOf[Hint]).flatMap(_.parameters)
  val newChildren = j.children.map(x => if (isBroadcastHint(x)) 
x.asInstanceOf[Hint].child else x)
  Hint("BROADCAST",
parameter = parameter,
j.copy(left = newChildren(0), right = newChildren(1)))
```

However, I am not sure which way is better. Leave it here and let the 
Committers decide it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-21 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71694743
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -425,6 +449,44 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
   }
 }
 
+/**
+ * Merge and move upward to the nearest Project.
+ * A broadcast hint comment is scattered into multiple nodes inside 
the plan, and the
+ * information of BroadcastHint resides its current position inside 
the plan. In order to
+ * reconstruct broadcast hint comment, we need to pack the information 
of BroadcastHint into
+ * Hint("BROADCAST", _, _) and collect them up by moving upward to the 
nearest Project node.
+ */
+object NormalizeBroadcastHint extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan 
transformUp {
+// Capture the broadcasted information and store it in Hint.
+case BroadcastHint(child @ SubqueryAlias(_, Project(_, 
SQLTable(database, table, _, _ =>
+  Hint("BROADCAST", Seq(table), child)
+
+// Nearest Project is found.
+case p @ Project(_, Hint(_, _, _)) => p
+
+// Merge BROADCAST hints up to the nearest Project.
+case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) 
=>
+  h.copy(parameters = params1 ++ params2)
+case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ 
Hint("BROADCAST", p2, right), _, _) =>
+  h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right 
= right))
+
+// Bubble up BROADCAST hints to the nearest Project.
+case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
+  h.copy(child = j.copy(left = hintChild))
+case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
+  h.copy(child = j.copy(right = hintChild))
--- End diff --

For the functional equivalence, you can see the logic. Also, the above code 
is passed `LogicalPlanToSQLSuite` with the existing tests, too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-21 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71694195
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -425,6 +449,44 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
   }
 }
 
+/**
+ * Merge and move upward to the nearest Project.
+ * A broadcast hint comment is scattered into multiple nodes inside 
the plan, and the
+ * information of BroadcastHint resides its current position inside 
the plan. In order to
+ * reconstruct broadcast hint comment, we need to pack the information 
of BroadcastHint into
+ * Hint("BROADCAST", _, _) and collect them up by moving upward to the 
nearest Project node.
+ */
+object NormalizeBroadcastHint extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan 
transformUp {
+// Capture the broadcasted information and store it in Hint.
+case BroadcastHint(child @ SubqueryAlias(_, Project(_, 
SQLTable(database, table, _, _ =>
+  Hint("BROADCAST", Seq(table), child)
+
+// Nearest Project is found.
+case p @ Project(_, Hint(_, _, _)) => p
+
+// Merge BROADCAST hints up to the nearest Project.
+case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) 
=>
+  h.copy(parameters = params1 ++ params2)
+case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ 
Hint("BROADCAST", p2, right), _, _) =>
+  h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right 
= right))
+
+// Bubble up BROADCAST hints to the nearest Project.
+case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
+  h.copy(child = j.copy(left = hintChild))
+case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
+  h.copy(child = j.copy(right = hintChild))
--- End diff --

Here is the candidate. Since we cannot take advantage of `case` statement, 
I added a function for that.
```
private def isBroadcastHint(logicalPlan: LogicalPlan): Boolean =
logicalPlan.isInstanceOf[Hint] && 
logicalPlan.asInstanceOf[Hint].name.equals("BRAODCAST")

...
case j: Join if j.children.exists(isBroadcastHint) =>
  val newChildren =
j.children.map(x => if (isBroadcastHint(x)) x.asInstanceOf[Hint].child 
else x)
  Hint("BROADCAST",

j.children.filter(isBroadcastHint).map(_.asInstanceOf[Hint]).flatMap(_.parameters),
j.copy(left = newChildren(0), right = newChildren(1)))
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-21 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71683008
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -356,8 +372,14 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
   }
 }
 
+val broadcastHint = project match {
+  case p @ Project(projectList, Hint("BROADCAST", tables, child)) =>
+if (tables.nonEmpty) s"/*+ MAPJOIN(${tables.mkString(", ")}) */" 
else ""
+  case _ => ""
+}
 build(
   "SELECT",
+  broadcastHint,
--- End diff --

Hmm. For this one, I found more reasonable solution. Instead of adding new 
dummy `Project`, we can utilize the existing dummy `Project` on the `SQLTable`.

Basically, the root cause of this problem of `groupingSetToSQL` is that it 
removes the project by doing this `toSQL(project.child)`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-21 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71671510
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -356,8 +372,14 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
   }
 }
 
+val broadcastHint = project match {
+  case p @ Project(projectList, Hint("BROADCAST", tables, child)) =>
+if (tables.nonEmpty) s"/*+ MAPJOIN(${tables.mkString(", ")}) */" 
else ""
+  case _ => ""
+}
 build(
   "SELECT",
+  broadcastHint,
--- End diff --

To do this, we need to add the extra dummy `Project` for the following 
condition.
```scala
case a @ Aggregate(_, _, e @ Expand(_, _, p: Project)) if isGroupingSet(a, 
e, p) =>
  groupingSetToSQL(a, e, p)
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-20 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71627841
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -425,6 +449,44 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
   }
 }
 
+/**
+ * Merge and move upward to the nearest Project.
+ * A broadcast hint comment is scattered into multiple nodes inside 
the plan, and the
+ * information of BroadcastHint resides its current position inside 
the plan. In order to
+ * reconstruct broadcast hint comment, we need to pack the information 
of BroadcastHint into
+ * Hint("BROADCAST", _, _) and collect them up by moving upward to the 
nearest Project node.
+ */
+object NormalizeBroadcastHint extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan 
transformUp {
+// Capture the broadcasted information and store it in Hint.
+case BroadcastHint(child @ SubqueryAlias(_, Project(_, 
SQLTable(database, table, _, _ =>
+  Hint("BROADCAST", Seq(table), child)
+
+// Nearest Project is found.
+case p @ Project(_, Hint(_, _, _)) => p
+
+// Merge BROADCAST hints up to the nearest Project.
+case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) 
=>
+  h.copy(parameters = params1 ++ params2)
+case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ 
Hint("BROADCAST", p2, right), _, _) =>
+  h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right 
= right))
+
+// Bubble up BROADCAST hints to the nearest Project.
+case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
+  h.copy(child = j.copy(left = hintChild))
+case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
+  h.copy(child = j.copy(right = hintChild))
--- End diff --

You mean before committing here? No problem.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-20 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71627448
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -425,6 +449,44 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
   }
 }
 
+/**
+ * Merge and move upward to the nearest Project.
+ * A broadcast hint comment is scattered into multiple nodes inside 
the plan, and the
+ * information of BroadcastHint resides its current position inside 
the plan. In order to
+ * reconstruct broadcast hint comment, we need to pack the information 
of BroadcastHint into
+ * Hint("BROADCAST", _, _) and collect them up by moving upward to the 
nearest Project node.
+ */
+object NormalizeBroadcastHint extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan 
transformUp {
+// Capture the broadcasted information and store it in Hint.
+case BroadcastHint(child @ SubqueryAlias(_, Project(_, 
SQLTable(database, table, _, _ =>
+  Hint("BROADCAST", Seq(table), child)
+
+// Nearest Project is found.
+case p @ Project(_, Hint(_, _, _)) => p
+
+// Merge BROADCAST hints up to the nearest Project.
+case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) 
=>
+  h.copy(parameters = params1 ++ params2)
+case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ 
Hint("BROADCAST", p2, right), _, _) =>
+  h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right 
= right))
+
+// Bubble up BROADCAST hints to the nearest Project.
+case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
+  h.copy(child = j.copy(left = hintChild))
+case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
+  h.copy(child = j.copy(right = hintChild))
--- End diff --

Can you post your code when you merge them in a single case?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-20 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71627192
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -425,6 +449,44 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
   }
 }
 
+/**
+ * Merge and move upward to the nearest Project.
+ * A broadcast hint comment is scattered into multiple nodes inside 
the plan, and the
+ * information of BroadcastHint resides its current position inside 
the plan. In order to
+ * reconstruct broadcast hint comment, we need to pack the information 
of BroadcastHint into
+ * Hint("BROADCAST", _, _) and collect them up by moving upward to the 
nearest Project node.
+ */
+object NormalizeBroadcastHint extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan 
transformUp {
+// Capture the broadcasted information and store it in Hint.
+case BroadcastHint(child @ SubqueryAlias(_, Project(_, 
SQLTable(database, table, _, _ =>
+  Hint("BROADCAST", Seq(table), child)
+
+// Nearest Project is found.
+case p @ Project(_, Hint(_, _, _)) => p
+
+// Merge BROADCAST hints up to the nearest Project.
+case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) 
=>
+  h.copy(parameters = params1 ++ params2)
+case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ 
Hint("BROADCAST", p2, right), _, _) =>
+  h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right 
= right))
+
+// Bubble up BROADCAST hints to the nearest Project.
+case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
+  h.copy(child = j.copy(left = hintChild))
+case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
+  h.copy(child = j.copy(right = hintChild))
--- End diff --

Ur, in this case, I thought the combined one looks complex, doesn't it?
I'm not sure that also could reduce the number of lines.

Anyway, it's your advice. I'll try to find some way. :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-20 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71626761
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -425,6 +449,44 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
   }
 }
 
+/**
+ * Merge and move upward to the nearest Project.
+ * A broadcast hint comment is scattered into multiple nodes inside 
the plan, and the
+ * information of BroadcastHint resides its current position inside 
the plan. In order to
+ * reconstruct broadcast hint comment, we need to pack the information 
of BroadcastHint into
+ * Hint("BROADCAST", _, _) and collect them up by moving upward to the 
nearest Project node.
+ */
+object NormalizeBroadcastHint extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan 
transformUp {
+// Capture the broadcasted information and store it in Hint.
+case BroadcastHint(child @ SubqueryAlias(_, Project(_, 
SQLTable(database, table, _, _ =>
+  Hint("BROADCAST", Seq(table), child)
+
+// Nearest Project is found.
+case p @ Project(_, Hint(_, _, _)) => p
+
+// Merge BROADCAST hints up to the nearest Project.
+case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) 
=>
+  h.copy(parameters = params1 ++ params2)
+case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ 
Hint("BROADCAST", p2, right), _, _) =>
+  h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right 
= right))
+
+// Bubble up BROADCAST hints to the nearest Project.
+case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
+  h.copy(child = j.copy(left = hintChild))
+case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
+  h.copy(child = j.copy(right = hintChild))
+
+// Other UnaryNodes are bypassed.
+case u: UnaryNode
+if u.child.isInstanceOf[Hint] && 
u.child.asInstanceOf[Hint].name.equals("BROADCAST") =>
+  val hint = u.child.asInstanceOf[Hint]
+  hint.copy(child = u.withNewChildren(Seq(hint.child)))
+
+// Other binary or higher operations are ignored.
--- End diff --

We still need to explain why they can be ignored. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-20 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71626714
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -425,6 +449,44 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
   }
 }
 
+/**
+ * Merge and move upward to the nearest Project.
+ * A broadcast hint comment is scattered into multiple nodes inside 
the plan, and the
+ * information of BroadcastHint resides its current position inside 
the plan. In order to
+ * reconstruct broadcast hint comment, we need to pack the information 
of BroadcastHint into
+ * Hint("BROADCAST", _, _) and collect them up by moving upward to the 
nearest Project node.
+ */
+object NormalizeBroadcastHint extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan 
transformUp {
+// Capture the broadcasted information and store it in Hint.
+case BroadcastHint(child @ SubqueryAlias(_, Project(_, 
SQLTable(database, table, _, _ =>
+  Hint("BROADCAST", Seq(table), child)
+
+// Nearest Project is found.
+case p @ Project(_, Hint(_, _, _)) => p
+
+// Merge BROADCAST hints up to the nearest Project.
+case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) 
=>
+  h.copy(parameters = params1 ++ params2)
+case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ 
Hint("BROADCAST", p2, right), _, _) =>
+  h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right 
= right))
+
+// Bubble up BROADCAST hints to the nearest Project.
+case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
+  h.copy(child = j.copy(left = hintChild))
+case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
+  h.copy(child = j.copy(right = hintChild))
--- End diff --

The above three `Join` cases can be combined into a single one. The code 
might looks cleaner


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-20 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71626636
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -193,6 +193,12 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
 case OneRowRelation =>
   ""
 
+case Hint(_, _, child) =>
+  toSQL(child)
+
+case BroadcastHint(child) =>
+  toSQL(child)
--- End diff --

Yep. Right.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-20 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71626212
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -193,6 +193,12 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
 case OneRowRelation =>
   ""
 
+case Hint(_, _, child) =>
+  toSQL(child)
+
+case BroadcastHint(child) =>
+  toSQL(child)
--- End diff --

This is dead code. Right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-20 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71625598
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -356,8 +372,14 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
   }
 }
 
+val broadcastHint = project match {
+  case p @ Project(projectList, Hint("BROADCAST", tables, child)) =>
+if (tables.nonEmpty) s"/*+ MAPJOIN(${tables.mkString(", ")}) */" 
else ""
+  case _ => ""
+}
 build(
   "SELECT",
+  broadcastHint,
--- End diff --

Okay. I'll try that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-20 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71625297
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -356,8 +372,14 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
   }
 }
 
+val broadcastHint = project match {
+  case p @ Project(projectList, Hint("BROADCAST", tables, child)) =>
+if (tables.nonEmpty) s"/*+ MAPJOIN(${tables.mkString(", ")}) */" 
else ""
+  case _ => ""
+}
 build(
   "SELECT",
+  broadcastHint,
--- End diff --

Just did a quick check. Except `groupingSetToSQL`, the other nodes are not 
converting the `Project` operator to SQL. That is the reason why we have to add 
a special handling for `BroadcastHint`. Then, maybe we can add an extra dummy 
`Project` in `groupingSetToSQL`. If so, we can avoid adding the special 
handling for `BroadcastHint`




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-20 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71623500
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -356,8 +372,14 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
   }
 }
 
+val broadcastHint = project match {
+  case p @ Project(projectList, Hint("BROADCAST", tables, child)) =>
+if (tables.nonEmpty) s"/*+ MAPJOIN(${tables.mkString(", ")}) */" 
else ""
+  case _ => ""
+}
 build(
   "SELECT",
+  broadcastHint,
--- End diff --

That is like a temporary patch, instead of a permanent solution. We need a 
clean way to resolve it. Let me find what is the root cause


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-19 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71465508
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1774,6 +1775,35 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the 
given name parameters.
--- End diff --

Also, in the PR description, too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-19 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71464803
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1774,6 +1775,35 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the 
given name parameters.
--- End diff --

Oh, I missed you comment here. It's too far from the bottom now. :)
I'll add more `prerequisite, dependency assumptions` here now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-19 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71464346
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -356,8 +372,14 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
   }
 }
 
+val broadcastHint = project match {
+  case p @ Project(projectList, Hint("BROADCAST", tables, child)) =>
+if (tables.nonEmpty) s"/*+ MAPJOIN(${tables.mkString(", ")}) */" 
else ""
+  case _ => ""
+}
 build(
   "SELECT",
+  broadcastHint,
--- End diff --

"SELECT" occurs in the followings. But, I didn't added meaning-logic based 
on the testcases.
- aggregateToSQL
- generateToSQL  => This has only "SELECT 1"
- groupingSetToSQL
- projectToSQL
- windowToSQL

I think the test coverage enough due to it generates the above cases. If 
you suggests more testcases, I welcome. I like robustness both for this PR and 
for the future.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-19 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71463733
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -356,8 +372,14 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
   }
 }
 
+val broadcastHint = project match {
+  case p @ Project(projectList, Hint("BROADCAST", tables, child)) =>
+if (tables.nonEmpty) s"/*+ MAPJOIN(${tables.mkString(", ")}) */" 
else ""
+  case _ => ""
+}
 build(
   "SELECT",
+  broadcastHint,
--- End diff --

It's the result of Window test query. For the Windows query, there were 
nested Projects.
```
test("broadcast hint with window") {
checkSQL(
  """
|SELECT /*+ MAPJOIN(parquet_t1) */
|   x.key, MAX(y.key) OVER (PARTITION BY x.key % 5 ORDER BY 
x.key)
|FROM parquet_t1 x JOIN parquet_t1 y ON x.key = y.key
  """.stripMargin,
  "broadcast_hint_window")
  }
```

I had the same feeling why some "SELECT" doesn't happen. 

After https://issues.apache.org/jira/browse/SPARK-16576 , I think this kind 
of weirdness will be reduced.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-19 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71463337
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -425,6 +449,44 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
   }
 }
 
+/**
+ * Merge and move upward to the nearest Project.
+ * A broadcast hint comment is scattered into multiple nodes inside 
the plan, and the
+ * information of BroadcastHint resides its current position inside 
the plan. In order to
+ * reconstruct broadcast hint comment, we need to pack the information 
of BroadcastHint into
+ * Hint("BROADCAST", _, _) and collect them up by moving upward to the 
nearest Project node.
+ */
+object NormalizeBroadcastHint extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan 
transformUp {
+// Capture the broadcasted information and store it in Hint.
+case BroadcastHint(child @ SubqueryAlias(_, Project(_, 
SQLTable(database, table, _, _ =>
+  Hint("BROADCAST", Seq(table), child)
+
+// Nearest Project is found.
+case p @ Project(_, Hint(_, _, _)) => p
+
+// Merge BROADCAST hints up to the nearest Project.
+case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) 
=>
+  h.copy(parameters = params1 ++ params2)
+case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ 
Hint("BROADCAST", p2, right), _, _) =>
+  h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right 
= right))
+
+// Bubble up BROADCAST hints to the nearest Project.
+case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
+  h.copy(child = j.copy(left = hintChild))
+case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
+  h.copy(child = j.copy(right = hintChild))
+
+// Other UnaryNodes are bypassed.
+case u: UnaryNode
+  if u.child.isInstanceOf[Hint] && 
u.child.asInstanceOf[Hint].name.equals("BROADCAST") =>
--- End diff --

Sure.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-19 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71457670
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -425,6 +449,44 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
   }
 }
 
+/**
+ * Merge and move upward to the nearest Project.
+ * A broadcast hint comment is scattered into multiple nodes inside 
the plan, and the
+ * information of BroadcastHint resides its current position inside 
the plan. In order to
+ * reconstruct broadcast hint comment, we need to pack the information 
of BroadcastHint into
+ * Hint("BROADCAST", _, _) and collect them up by moving upward to the 
nearest Project node.
+ */
+object NormalizeBroadcastHint extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan 
transformUp {
+// Capture the broadcasted information and store it in Hint.
+case BroadcastHint(child @ SubqueryAlias(_, Project(_, 
SQLTable(database, table, _, _ =>
+  Hint("BROADCAST", Seq(table), child)
+
+// Nearest Project is found.
+case p @ Project(_, Hint(_, _, _)) => p
+
+// Merge BROADCAST hints up to the nearest Project.
+case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) 
=>
+  h.copy(parameters = params1 ++ params2)
+case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ 
Hint("BROADCAST", p2, right), _, _) =>
+  h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right 
= right))
+
+// Bubble up BROADCAST hints to the nearest Project.
+case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
+  h.copy(child = j.copy(left = hintChild))
+case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
+  h.copy(child = j.copy(right = hintChild))
+
+// Other UnaryNodes are bypassed.
+case u: UnaryNode
+  if u.child.isInstanceOf[Hint] && 
u.child.asInstanceOf[Hint].name.equals("BROADCAST") =>
--- End diff --

uh, yeah! please add two more spaces before `if`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-19 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71457606
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -356,8 +372,14 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
   }
 }
 
+val broadcastHint = project match {
+  case p @ Project(projectList, Hint("BROADCAST", tables, child)) =>
+if (tables.nonEmpty) s"/*+ MAPJOIN(${tables.mkString(", ")}) */" 
else ""
+  case _ => ""
+}
 build(
   "SELECT",
+  broadcastHint,
--- End diff --

Could you please do more investigation on this? The current solution looks 
not clean to me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-19 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71446840
  
--- Diff: 
sql/hive/src/test/resources/sqlgen/broadcast_hint_nested_query_1.sql ---
@@ -0,0 +1,7 @@
+-- This file is automatically generated by LogicalPlanToSQLSuite.
+SELECT /*+ MAPJOIN(parquet_t0) */ *
+FROM (SELECT id tid FROM parquet_t0) T
+JOIN (SELECT key uid FROM parquet_t1) U
+ON tid=uid

+
+SELECT `gen_attr` AS `tid`, `gen_attr` AS `uid` FROM (SELECT `gen_attr`, 
`gen_attr` FROM (SELECT /*+ MAPJOIN(parquet_t0) */ `gen_attr` AS `gen_attr` 
FROM (SELECT `id` AS `gen_attr` FROM `default`.`parquet_t0`) AS gen_subquery_0) 
AS T INNER JOIN (SELECT `gen_attr` AS `gen_attr` FROM (SELECT `key` AS 
`gen_attr`, `value` AS `gen_attr` FROM `default`.`parquet_t1`) AS 
gen_subquery_1) AS U ON (`gen_attr` = `gen_attr`)) AS gen_subquery_2
--- End diff --

Yep. I'm trying to recover them in 
https://github.com/apache/spark/pull/14257 , too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-19 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71444098
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -425,6 +449,44 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
   }
 }
 
+/**
+ * Merge and move upward to the nearest Project.
+ * A broadcast hint comment is scattered into multiple nodes inside 
the plan, and the
+ * information of BroadcastHint resides its current position inside 
the plan. In order to
+ * reconstruct broadcast hint comment, we need to pack the information 
of BroadcastHint into
+ * Hint("BROADCAST", _, _) and collect them up by moving upward to the 
nearest Project node.
+ */
+object NormalizeBroadcastHint extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan 
transformUp {
+// Capture the broadcasted information and store it in Hint.
+case BroadcastHint(child @ SubqueryAlias(_, Project(_, 
SQLTable(database, table, _, _ =>
+  Hint("BROADCAST", Seq(table), child)
+
+// Nearest Project is found.
+case p @ Project(_, Hint(_, _, _)) => p
+
+// Merge BROADCAST hints up to the nearest Project.
+case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) 
=>
+  h.copy(parameters = params1 ++ params2)
+case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ 
Hint("BROADCAST", p2, right), _, _) =>
+  h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right 
= right))
+
+// Bubble up BROADCAST hints to the nearest Project.
+case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
+  h.copy(child = j.copy(left = hintChild))
+case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
+  h.copy(child = j.copy(right = hintChild))
+
+// Other UnaryNodes are bypassed.
+case u: UnaryNode
+  if u.child.isInstanceOf[Hint] && 
u.child.asInstanceOf[Hint].name.equals("BROADCAST") =>
+  val hint = u.child.asInstanceOf[Hint]
+  hint.copy(child = u.withNewChildren(Seq(hint.child)))
+
+// Other binary operations are ignored.
--- End diff --

Yep. I worried you say like that. :) I hesitated to use `higher` or 
`multinary`.
May I enumerate all of them? At first, I did but remove that comment since 
it will be outdated.
Let me think about this more.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-19 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71443911
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -425,6 +449,44 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
   }
 }
 
+/**
+ * Merge and move upward to the nearest Project.
+ * A broadcast hint comment is scattered into multiple nodes inside 
the plan, and the
+ * information of BroadcastHint resides its current position inside 
the plan. In order to
+ * reconstruct broadcast hint comment, we need to pack the information 
of BroadcastHint into
+ * Hint("BROADCAST", _, _) and collect them up by moving upward to the 
nearest Project node.
+ */
+object NormalizeBroadcastHint extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan 
transformUp {
+// Capture the broadcasted information and store it in Hint.
+case BroadcastHint(child @ SubqueryAlias(_, Project(_, 
SQLTable(database, table, _, _ =>
+  Hint("BROADCAST", Seq(table), child)
+
+// Nearest Project is found.
+case p @ Project(_, Hint(_, _, _)) => p
+
+// Merge BROADCAST hints up to the nearest Project.
+case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) 
=>
+  h.copy(parameters = params1 ++ params2)
+case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ 
Hint("BROADCAST", p2, right), _, _) =>
+  h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right 
= right))
+
+// Bubble up BROADCAST hints to the nearest Project.
+case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
+  h.copy(child = j.copy(left = hintChild))
+case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
+  h.copy(child = j.copy(right = hintChild))
+
+// Other UnaryNodes are bypassed.
+case u: UnaryNode
+  if u.child.isInstanceOf[Hint] && 
u.child.asInstanceOf[Hint].name.equals("BROADCAST") =>
--- End diff --

? This one is switching the `UnaryNode` and its child `Hint`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-19 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71443752
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -356,8 +372,14 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
   }
 }
 
+val broadcastHint = project match {
+  case p @ Project(projectList, Hint("BROADCAST", tables, child)) =>
+if (tables.nonEmpty) s"/*+ MAPJOIN(${tables.mkString(", ")}) */" 
else ""
+  case _ => ""
+}
 build(
   "SELECT",
+  broadcastHint,
--- End diff --

Yep. At first, I thought we need to change `projectToSQL`, but the current 
logic requires this. For `windowToSQL`, it seems not to be attached 
`BroadcastHint` directly.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-19 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71443423
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala
 ---
@@ -447,4 +447,42 @@ class PlanParserSuite extends PlanTest {
 assertEqual("select a, b from db.c where x !> 1",
   table("db", "c").where('x <= 1).select('a, 'b))
   }
+
+  test("select hint syntax") {
+// Hive compatibility: Missing parameter raises ParseException.
+val m = intercept[ParseException] {
+  parsePlan("SELECT /*+ HINT() */ * FROM t")
+}.getMessage
+assert(m.contains("no viable alternative at input"))
+
+// Hive compatibility: No database.
+val m2 = intercept[ParseException] {
+  parsePlan("SELECT /*+ MAPJOIN(default.t) */ * from default.t;")
+}.getMessage
+assert(m2.contains("no viable alternative at input"))
+
+comparePlans(
+  parsePlan("SELECT /*+ HINT */ * FROM t"),
+  Hint("HINT", Seq.empty, table("t")).select(star()))
+
+comparePlans(
+  parsePlan("SELECT /*+ HINT */ * FROM t"),
+  Hint("HINT", Seq.empty, table("t")).select(star()))
--- End diff --

My bad.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-19 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71443387
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala
 ---
@@ -447,4 +447,42 @@ class PlanParserSuite extends PlanTest {
 assertEqual("select a, b from db.c where x !> 1",
   table("db", "c").where('x <= 1).select('a, 'b))
   }
+
+  test("select hint syntax") {
+// Hive compatibility: Missing parameter raises ParseException.
+val m = intercept[ParseException] {
+  parsePlan("SELECT /*+ HINT() */ * FROM t")
+}.getMessage
+assert(m.contains("no viable alternative at input"))
+
+// Hive compatibility: No database.
+val m2 = intercept[ParseException] {
+  parsePlan("SELECT /*+ MAPJOIN(default.t) */ * from default.t;")
+}.getMessage
+assert(m2.contains("no viable alternative at input"))
+
+comparePlans(
+  parsePlan("SELECT /*+ HINT */ * FROM t"),
+  Hint("HINT", Seq.empty, table("t")).select(star()))
+
+comparePlans(
+  parsePlan("SELECT /*+ HINT */ * FROM t"),
+  Hint("HINT", Seq.empty, table("t")).select(star()))
+
+comparePlans(
+  parsePlan("SELECT /*+ BROADCASTJOIN(u) */ * FROM t"),
+  Hint("BROADCASTJOIN", Seq("u"), table("t")).select(star()))
+
+comparePlans(
+  parsePlan("SELECT /*+ MAPJOIN(u) */ * FROM t"),
+  Hint("MAPJOIN", Seq("u"), table("t")).select(star()))
+
+comparePlans(
+  parsePlan("SELECT /*+ STREAMTABLE(a,b,c) */ * FROM t"),
+  Hint("STREAMTABLE", Seq("a", "b", "c"), table("t")).select(star()))
+
+comparePlans(
+  parsePlan("SELECT /*+ INDEX(t emp_job_ix) */ * FROM t"),
+  Hint("INDEX", Seq("t", "emp_job_ix"), table("t")).select(star()))
+  }
--- End diff --

Yep. I added like this.
```
comparePlans(
  parsePlan("SELECT /*+ MAPJOIN(`default.t`) */ * from `default.t`"),
  Hint("MAPJOIN", Seq("default.t"), table("default.t")).select(star()))
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-19 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71441985
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1774,6 +1775,35 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the 
given name parameters.
--- End diff --

It sounds like you remove the comment you put before. 

Second, we also need to explain the restriction and the future work.

Third, we also need to document the issue you hit in the PR description. It 
can help us understand the issues when we enhance it later.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-19 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71441464
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -425,6 +449,44 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
   }
 }
 
+/**
+ * Merge and move upward to the nearest Project.
+ * A broadcast hint comment is scattered into multiple nodes inside 
the plan, and the
+ * information of BroadcastHint resides its current position inside 
the plan. In order to
+ * reconstruct broadcast hint comment, we need to pack the information 
of BroadcastHint into
+ * Hint("BROADCAST", _, _) and collect them up by moving upward to the 
nearest Project node.
+ */
+object NormalizeBroadcastHint extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan 
transformUp {
+// Capture the broadcasted information and store it in Hint.
+case BroadcastHint(child @ SubqueryAlias(_, Project(_, 
SQLTable(database, table, _, _ =>
+  Hint("BROADCAST", Seq(table), child)
+
+// Nearest Project is found.
+case p @ Project(_, Hint(_, _, _)) => p
+
+// Merge BROADCAST hints up to the nearest Project.
+case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) 
=>
+  h.copy(parameters = params1 ++ params2)
+case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ 
Hint("BROADCAST", p2, right), _, _) =>
+  h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right 
= right))
+
+// Bubble up BROADCAST hints to the nearest Project.
+case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
+  h.copy(child = j.copy(left = hintChild))
+case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
+  h.copy(child = j.copy(right = hintChild))
+
+// Other UnaryNodes are bypassed.
+case u: UnaryNode
+  if u.child.isInstanceOf[Hint] && 
u.child.asInstanceOf[Hint].name.equals("BROADCAST") =>
+  val hint = u.child.asInstanceOf[Hint]
+  hint.copy(child = u.withNewChildren(Seq(hint.child)))
+
+// Other binary operations are ignored.
--- End diff --

This is not clear. Which operations are ignored? and Why they are ignored? 
You know, `Union` is not a binary operation.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-19 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71441209
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -425,6 +449,44 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
   }
 }
 
+/**
+ * Merge and move upward to the nearest Project.
+ * A broadcast hint comment is scattered into multiple nodes inside 
the plan, and the
+ * information of BroadcastHint resides its current position inside 
the plan. In order to
+ * reconstruct broadcast hint comment, we need to pack the information 
of BroadcastHint into
+ * Hint("BROADCAST", _, _) and collect them up by moving upward to the 
nearest Project node.
+ */
+object NormalizeBroadcastHint extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan 
transformUp {
+// Capture the broadcasted information and store it in Hint.
+case BroadcastHint(child @ SubqueryAlias(_, Project(_, 
SQLTable(database, table, _, _ =>
+  Hint("BROADCAST", Seq(table), child)
+
+// Nearest Project is found.
+case p @ Project(_, Hint(_, _, _)) => p
+
+// Merge BROADCAST hints up to the nearest Project.
+case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) 
=>
+  h.copy(parameters = params1 ++ params2)
+case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ 
Hint("BROADCAST", p2, right), _, _) =>
+  h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right 
= right))
+
+// Bubble up BROADCAST hints to the nearest Project.
+case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
+  h.copy(child = j.copy(left = hintChild))
+case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
+  h.copy(child = j.copy(right = hintChild))
+
+// Other UnaryNodes are bypassed.
+case u: UnaryNode
+  if u.child.isInstanceOf[Hint] && 
u.child.asInstanceOf[Hint].name.equals("BROADCAST") =>
--- End diff --

Yu can put another case for Hint. Then, you  do not need this `if`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-19 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71441085
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -356,8 +372,14 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
   }
 }
 
+val broadcastHint = project match {
+  case p @ Project(projectList, Hint("BROADCAST", tables, child)) =>
+if (tables.nonEmpty) s"/*+ MAPJOIN(${tables.mkString(", ")}) */" 
else ""
+  case _ => ""
+}
 build(
   "SELECT",
+  broadcastHint,
--- End diff --

This is in `groupingSetToSQL`. This looks weird. How about the others? 
e.g., `windowToSQL`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-19 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71433270
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala
 ---
@@ -447,4 +447,42 @@ class PlanParserSuite extends PlanTest {
 assertEqual("select a, b from db.c where x !> 1",
   table("db", "c").where('x <= 1).select('a, 'b))
   }
+
+  test("select hint syntax") {
+// Hive compatibility: Missing parameter raises ParseException.
+val m = intercept[ParseException] {
+  parsePlan("SELECT /*+ HINT() */ * FROM t")
+}.getMessage
+assert(m.contains("no viable alternative at input"))
+
+// Hive compatibility: No database.
+val m2 = intercept[ParseException] {
+  parsePlan("SELECT /*+ MAPJOIN(default.t) */ * from default.t;")
+}.getMessage
+assert(m2.contains("no viable alternative at input"))
+
+comparePlans(
+  parsePlan("SELECT /*+ HINT */ * FROM t"),
+  Hint("HINT", Seq.empty, table("t")).select(star()))
+
+comparePlans(
+  parsePlan("SELECT /*+ HINT */ * FROM t"),
+  Hint("HINT", Seq.empty, table("t")).select(star()))
--- End diff --

A duplicate case, right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-19 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71433044
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala
 ---
@@ -447,4 +447,42 @@ class PlanParserSuite extends PlanTest {
 assertEqual("select a, b from db.c where x !> 1",
   table("db", "c").where('x <= 1).select('a, 'b))
   }
+
+  test("select hint syntax") {
+// Hive compatibility: Missing parameter raises ParseException.
+val m = intercept[ParseException] {
+  parsePlan("SELECT /*+ HINT() */ * FROM t")
+}.getMessage
+assert(m.contains("no viable alternative at input"))
+
+// Hive compatibility: No database.
+val m2 = intercept[ParseException] {
+  parsePlan("SELECT /*+ MAPJOIN(default.t) */ * from default.t;")
+}.getMessage
+assert(m2.contains("no viable alternative at input"))
+
+comparePlans(
+  parsePlan("SELECT /*+ HINT */ * FROM t"),
+  Hint("HINT", Seq.empty, table("t")).select(star()))
+
+comparePlans(
+  parsePlan("SELECT /*+ HINT */ * FROM t"),
+  Hint("HINT", Seq.empty, table("t")).select(star()))
+
+comparePlans(
+  parsePlan("SELECT /*+ BROADCASTJOIN(u) */ * FROM t"),
+  Hint("BROADCASTJOIN", Seq("u"), table("t")).select(star()))
+
+comparePlans(
+  parsePlan("SELECT /*+ MAPJOIN(u) */ * FROM t"),
+  Hint("MAPJOIN", Seq("u"), table("t")).select(star()))
+
+comparePlans(
+  parsePlan("SELECT /*+ STREAMTABLE(a,b,c) */ * FROM t"),
+  Hint("STREAMTABLE", Seq("a", "b", "c"), table("t")).select(star()))
+
+comparePlans(
+  parsePlan("SELECT /*+ INDEX(t emp_job_ix) */ * FROM t"),
+  Hint("INDEX", Seq("t", "emp_job_ix"), table("t")).select(star()))
+  }
--- End diff --

This should work. Please add this test case.

```parsePlan("SELECT /*+ MAPJOIN(`default.t`) */ * from default.t")```



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-19 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71432748
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala
 ---
@@ -447,4 +447,42 @@ class PlanParserSuite extends PlanTest {
 assertEqual("select a, b from db.c where x !> 1",
   table("db", "c").where('x <= 1).select('a, 'b))
   }
+
+  test("select hint syntax") {
+// Hive compatibility: Missing parameter raises ParseException.
+val m = intercept[ParseException] {
+  parsePlan("SELECT /*+ HINT() */ * FROM t")
+}.getMessage
+assert(m.contains("no viable alternative at input"))
+
+// Hive compatibility: No database.
+val m2 = intercept[ParseException] {
+  parsePlan("SELECT /*+ MAPJOIN(default.t) */ * from default.t;")
--- End diff --

remove the semicolon


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-19 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71430308
  
--- Diff: 
sql/hive/src/test/resources/sqlgen/broadcast_hint_nested_query_1.sql ---
@@ -0,0 +1,7 @@
+-- This file is automatically generated by LogicalPlanToSQLSuite.
+SELECT /*+ MAPJOIN(parquet_t0) */ *
+FROM (SELECT id tid FROM parquet_t0) T
+JOIN (SELECT key uid FROM parquet_t1) U
+ON tid=uid

+
+SELECT `gen_attr` AS `tid`, `gen_attr` AS `uid` FROM (SELECT `gen_attr`, 
`gen_attr` FROM (SELECT /*+ MAPJOIN(parquet_t0) */ `gen_attr` AS `gen_attr` 
FROM (SELECT `id` AS `gen_attr` FROM `default`.`parquet_t0`) AS gen_subquery_0) 
AS T INNER JOIN (SELECT `gen_attr` AS `gen_attr` FROM (SELECT `key` AS 
`gen_attr`, `value` AS `gen_attr` FROM `default`.`parquet_t1`) AS 
gen_subquery_1) AS U ON (`gen_attr` = `gen_attr`)) AS gen_subquery_2
--- End diff --

oh, I see. The compare removes the id.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-19 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71429222
  
--- Diff: 
sql/hive/src/test/resources/sqlgen/broadcast_hint_nested_query_1.sql ---
@@ -0,0 +1,7 @@
+-- This file is automatically generated by LogicalPlanToSQLSuite.
+SELECT /*+ MAPJOIN(parquet_t0) */ *
+FROM (SELECT id tid FROM parquet_t0) T
+JOIN (SELECT key uid FROM parquet_t1) U
+ON tid=uid

+
+SELECT `gen_attr` AS `tid`, `gen_attr` AS `uid` FROM (SELECT `gen_attr`, 
`gen_attr` FROM (SELECT /*+ MAPJOIN(parquet_t0) */ `gen_attr` AS `gen_attr` 
FROM (SELECT `id` AS `gen_attr` FROM `default`.`parquet_t0`) AS gen_subquery_0) 
AS T INNER JOIN (SELECT `gen_attr` AS `gen_attr` FROM (SELECT `key` AS 
`gen_attr`, `value` AS `gen_attr` FROM `default`.`parquet_t1`) AS 
gen_subquery_1) AS U ON (`gen_attr` = `gen_attr`)) AS gen_subquery_2
--- End diff --

This is wrong, right?

If we remove the hint, the output should be like
```
SELECT `gen_attr_86` AS `tid`, 
   `gen_attr_87` AS `uid` 
FROM   (SELECT `gen_attr_86`, 
   `gen_attr_87` 
FROM   (SELECT `gen_attr_13` AS `gen_attr_86` 
FROM   (SELECT `id` AS `gen_attr_13` 
FROM   `default`.`parquet_t0`) AS gen_subquery_0) 
AS T 
   INNER JOIN (SELECT `gen_attr_88` AS `gen_attr_87` 
   FROM   (SELECT `key`   AS `gen_attr_88`, 
  `value` AS `gen_attr_89` 
   FROM   `default`.`parquet_t1`) AS 
  gen_subquery_1) AS 
  U 
   ON ( `gen_attr_86` = `gen_attr_87` )) AS 
gen_subquery_2 
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-19 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71425663
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/BroadcastHintSuite.scala ---
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import org.apache.spark.sql.QueryTest
+import org.apache.spark.sql.catalyst.plans.logical.BroadcastHint
+import org.apache.spark.sql.hive.test.TestHiveSingleton
+import org.apache.spark.sql.test.SQLTestUtils
+
+class BroadcastHintSuite extends QueryTest with SQLTestUtils with 
TestHiveSingleton {
+  test("broadcast hint on Hive table") {
+withTable("hive_t", "hive_u") {
+  spark.sql("CREATE TABLE hive_t(a int)")
+  spark.sql("CREATE TABLE hive_u(b int)")
+
+  val hive_t = spark.table("hive_t").queryExecution.analyzed
+  val hive_u = spark.table("hive_u").queryExecution.analyzed
+
+  val plan = spark.sql("SELECT /*+ MAPJOIN(hive_t) */ * FROM hive_t, 
hive_u")
+.queryExecution.analyzed
+  assert(plan.children(0).children(0).isInstanceOf[BroadcastHint])
--- End diff --

Sure! No problem.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-19 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71425497
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala
 ---
@@ -933,4 +933,124 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest 
with SQLTestUtils {
   checkSQL("select * from orc_t", "select_orc_table")
 }
   }
+
+  test("broadcast hint on single table") {
+checkSQL("SELECT /*+ MAPJOIN(parquet_t0) */ * FROM parquet_t0",
+  "broadcast_hint_single_table_1")
+
+checkSQL("SELECT /*+ MAPJOIN(parquet_t0, parquet_t0) */ * FROM 
parquet_t0",
+  "broadcast_hint_single_table_2")
+
+checkSQL(
+  "SELECT /*+ MAPJOIN(parquet_t0) */ * FROM parquet_t0 as a",
+  "broadcast_hint_single_table_3")
+
+checkSQL(
+  """
+|SELECT /*+ MAPJOIN(parquet_t0) */ *
+|FROM (SELECT id tid FROM parquet_t0) T
+|JOIN (SELECT id uid FROM parquet_t0) U
+|ON tid=uid
+  """.stripMargin,
+  "broadcast_hint_single_4")
+  }
+
+  test("broadcast hint on multiple tables") {
+checkSQL(
+  "SELECT /*+ MAPJOIN(parquet_t0) */ * FROM parquet_t0, parquet_t1",
+  "broadcast_hint_multiple_table_1")
+checkSQL(
+  "SELECT /*+ MAPJOIN(parquet_t1) */ * FROM parquet_t0, parquet_t1",
+  "broadcast_hint_multiple_table_2")
+  }
+
+  test("broadcast hint on nested query 1") {
+checkSQL(
+  """
+|SELECT /*+ MAPJOIN(parquet_t0) */ *
+|FROM (SELECT id tid FROM parquet_t0) T
+|JOIN (SELECT key uid FROM parquet_t1) U
+|ON tid=uid
+  """.stripMargin,
+  "broadcast_hint_nested_query_1")
+  }
+
+  test("broadcast hint on nested query 2") {
+checkSQL(
+  """
+|SELECT /*+ MAPJOIN(parquet_t1) */ tid
+|FROM (SELECT id tid FROM parquet_t0) T
+|JOIN (SELECT key uid FROM parquet_t1) U
+|ON tid=uid
+  """.stripMargin,
+  "broadcast_hint_nested_query_2")
+  }
+
+  test("multiple broadcast hints on multiple tables") {
+checkSQL(
+  "SELECT /*+ MAPJOIN(parquet_t0, parquet_t1) */ * FROM parquet_t0, 
parquet_t1",
+  "multiple_broadcast_hints")
+  }
+
+  test("broadcast hint with filter") {
+checkSQL(
+  "SELECT /*+ MAPJOIN(parquet_t0) */ * FROM parquet_t0 WHERE id < 10",
+  "broadcast_hint_with_filter")
+  }
+
+  test("broadcast hint with filter/limit") {
+checkSQL(
+  "SELECT /*+ MAPJOIN(parquet_t0) */ * FROM parquet_t0 WHERE id < 10 
LIMIT 10",
+  "broadcast_hint_with_filter_limit")
+  }
+
+  test("broadcast hint with generator") {
+checkSQL(
+  "SELECT * FROM (SELECT /*+ MAPJOIN(parquet_t0) */ 
EXPLODE(ARRAY(1,2,3)) FROM parquet_t0) T",
+  "broadcast_hint_generator")
+  }
+
+  test("broadcast hint with groupby/having/orderby") {
+checkSQL(
+  """
+|SELECT /*+ MAPJOIN(parquet_t0) */ *
+|FROM parquet_t0
+|WHERE id > 0
+|GROUP BY id
+|HAVING count(*) > 0
+|ORDER BY id
+  """.stripMargin,
+  "broadcast_hint_groupby_having_orderby")
+  }
+
+  test("broadcast hint with window") {
+checkSQL(
+  """
+|SELECT /*+ MAPJOIN(parquet_t1) */
+|   x.key, MAX(y.key) OVER (PARTITION BY x.key % 5 ORDER BY 
x.key)
+|FROM parquet_t1 x JOIN parquet_t1 y ON x.key = y.key
+  """.stripMargin,
+  "broadcast_hint_window")
+  }
+
+  test("broadcast hint with rollup") {
+checkSQL(
+  """
+|SELECT /*+ MAPJOIN(parquet_t1) */
+|   count(*) as cnt, key%5, grouping_id()
+|FROM parquet_t1
+|GROUP BY key % 5 WITH ROLLUP""".stripMargin,
--- End diff --

format


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-19 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71424857
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/BroadcastHintSuite.scala ---
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import org.apache.spark.sql.QueryTest
+import org.apache.spark.sql.catalyst.plans.logical.BroadcastHint
+import org.apache.spark.sql.hive.test.TestHiveSingleton
+import org.apache.spark.sql.test.SQLTestUtils
+
+class BroadcastHintSuite extends QueryTest with SQLTestUtils with 
TestHiveSingleton {
+  test("broadcast hint on Hive table") {
+withTable("hive_t", "hive_u") {
+  spark.sql("CREATE TABLE hive_t(a int)")
+  spark.sql("CREATE TABLE hive_u(b int)")
+
+  val hive_t = spark.table("hive_t").queryExecution.analyzed
+  val hive_u = spark.table("hive_u").queryExecution.analyzed
+
+  val plan = spark.sql("SELECT /*+ MAPJOIN(hive_t) */ * FROM hive_t, 
hive_u")
+.queryExecution.analyzed
+  assert(plan.children(0).children(0).isInstanceOf[BroadcastHint])
--- End diff --

This is fragile. You can use `collectFirst` instead. Also check the content 
of `BroadcastHint`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-18 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71265020
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1774,6 +1775,51 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the 
given name parameters.
+   */
+  object SubstituteHints extends Rule[LogicalPlan] {
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case logical: LogicalPlan => logical transformDown {
+case h @ Hint(name, parameters, child)
+if Seq("BROADCAST", "BROADCASTJOIN", 
"MAPJOIN").contains(name.toUpperCase) =>
+  var resolvedChild = child
+
+  for (param <- parameters) {
+val names = param.split("\\.")
+val tid = if (names.length > 1) {
+  TableIdentifier(names(1), Some(names(0)))
+} else {
+  TableIdentifier(param, None)
+}
+try {
+  catalog.lookupRelation(tid)
+
+  var stop = false
+  resolvedChild = resolvedChild.transformDown {
+case r @ BroadcastHint(SubqueryAlias(t, _))
+  if !stop && resolver(t, tid.identifier) =>
+  stop = true
+  r
+case r @ SubqueryAlias(t, _) if !stop && resolver(t, 
tid.identifier) =>
+  stop = true
+  BroadcastHint(r)
--- End diff --

I think we have to remove it; otherwise, the result will be wrong.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-18 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71261455
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1774,6 +1775,51 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the 
given name parameters.
+   */
+  object SubstituteHints extends Rule[LogicalPlan] {
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case logical: LogicalPlan => logical transformDown {
+case h @ Hint(name, parameters, child)
+if Seq("BROADCAST", "BROADCASTJOIN", 
"MAPJOIN").contains(name.toUpperCase) =>
+  var resolvedChild = child
+
+  for (param <- parameters) {
+val names = param.split("\\.")
+val tid = if (names.length > 1) {
+  TableIdentifier(names(1), Some(names(0)))
+} else {
+  TableIdentifier(param, None)
+}
+try {
+  catalog.lookupRelation(tid)
+
+  var stop = false
+  resolvedChild = resolvedChild.transformDown {
+case r @ BroadcastHint(SubqueryAlias(t, _))
+  if !stop && resolver(t, tid.identifier) =>
+  stop = true
+  r
+case r @ SubqueryAlias(t, _) if !stop && resolver(t, 
tid.identifier) =>
+  stop = true
+  BroadcastHint(r)
--- End diff --

Sure, If you want to remove this, I can simply this. It's a little bit 
legacy I tried to follow your advice as much as possible. (As I mentioned 
before, I decide to block in Parser layer after I found that Hive does.)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-18 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71261042
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -425,6 +452,49 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
   }
 }
 
+/**
+ * Merge and move upward to the nearest Project.
+ * A broadcast hint comment is scattered into multiple nodes inside 
the plan, and the
+ * information of BroadcastHint resides its current position inside 
the plan. In order to
+ * reconstruct broadcast hint comment, we need to pack the information 
of BroadcastHint into
+ * Hint("BROADCAST", _, _) and collect them up by moving upward to the 
nearest Project node.
+ */
+object NormalizeBroadcastHint extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan 
transformUp {
+// Capture the broadcasted information and store it in Hint.
+case BroadcastHint(child @ SubqueryAlias(_, Project(_, 
SQLTable(database, table, _, _ =>
+  Hint("BROADCAST", Seq(table), child)
+
+// Nearest Project is found.
+case p @ Project(_, Hint(_, _, _)) => p
+
+// Merge BROADCAST hints up to the nearest Project.
+case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) 
=>
+  h.copy(parameters = params1 ++ params2)
+case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ 
Hint("BROADCAST", p2, right), _, _) =>
+  h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right 
= right))
+
+// Bubble up BROADCAST hints to the nearest Project.
+case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
+  h.copy(child = j.copy(left = hintChild))
+case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
+  h.copy(child = j.copy(right = hintChild))
+case s @ SubqueryAlias(_, h @ Hint("BROADCAST", _, hintChild)) =>
+  h.copy(child = s.copy(child = hintChild))
+case ll @ LocalLimit(_, h @ Hint("BROADCAST", _, hintChild)) =>
+  h.copy(child = ll.copy(child = hintChild))
+case f @ Filter(_, h @ Hint("BROADCAST", _, hintChild)) =>
+  h.copy(child = f.copy(child = hintChild))
+case a @ Aggregate(_, _, h @ Hint("BROADCAST", _, hintChild)) =>
+  h.copy(child = a.copy(child = hintChild))
+case s @ Sort(_, _, h @ Hint("BROADCAST", _, hintChild)) =>
+  h.copy(child = s.copy(child = hintChild))
+case g @ Generate(_, _, _, _, _, h @ Hint("BROADCAST", _, 
hintChild)) =>
+  h.copy(child = g.copy(child = hintChild))
+// Set operation is not allowed to be across. 
UNION/INTERCEPT/EXCEPT
--- End diff --

Okay. But, that was the reason not to allowed there. Hmm, maybe it looks 
different. Sorry.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-18 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71260947
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1774,6 +1775,51 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the 
given name parameters.
+   */
+  object SubstituteHints extends Rule[LogicalPlan] {
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case logical: LogicalPlan => logical transformDown {
+case h @ Hint(name, parameters, child)
+if Seq("BROADCAST", "BROADCASTJOIN", 
"MAPJOIN").contains(name.toUpperCase) =>
+  var resolvedChild = child
+
+  for (param <- parameters) {
+val names = param.split("\\.")
+val tid = if (names.length > 1) {
+  TableIdentifier(names(1), Some(names(0)))
+} else {
+  TableIdentifier(param, None)
+}
+try {
+  catalog.lookupRelation(tid)
+
+  var stop = false
+  resolvedChild = resolvedChild.transformDown {
+case r @ BroadcastHint(SubqueryAlias(t, _))
+  if !stop && resolver(t, tid.identifier) =>
+  stop = true
+  r
+case r @ SubqueryAlias(t, _) if !stop && resolver(t, 
tid.identifier) =>
+  stop = true
+  BroadcastHint(r)
--- End diff --

If we do not support `db.table`, why we still compare the whole identifier? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-18 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71260158
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -425,6 +452,49 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
   }
 }
 
+/**
+ * Merge and move upward to the nearest Project.
+ * A broadcast hint comment is scattered into multiple nodes inside 
the plan, and the
+ * information of BroadcastHint resides its current position inside 
the plan. In order to
+ * reconstruct broadcast hint comment, we need to pack the information 
of BroadcastHint into
+ * Hint("BROADCAST", _, _) and collect them up by moving upward to the 
nearest Project node.
+ */
+object NormalizeBroadcastHint extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan 
transformUp {
+// Capture the broadcasted information and store it in Hint.
+case BroadcastHint(child @ SubqueryAlias(_, Project(_, 
SQLTable(database, table, _, _ =>
+  Hint("BROADCAST", Seq(table), child)
+
+// Nearest Project is found.
+case p @ Project(_, Hint(_, _, _)) => p
+
+// Merge BROADCAST hints up to the nearest Project.
+case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) 
=>
+  h.copy(parameters = params1 ++ params2)
+case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ 
Hint("BROADCAST", p2, right), _, _) =>
+  h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right 
= right))
+
+// Bubble up BROADCAST hints to the nearest Project.
+case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
+  h.copy(child = j.copy(left = hintChild))
+case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
+  h.copy(child = j.copy(right = hintChild))
+case s @ SubqueryAlias(_, h @ Hint("BROADCAST", _, hintChild)) =>
+  h.copy(child = s.copy(child = hintChild))
+case ll @ LocalLimit(_, h @ Hint("BROADCAST", _, hintChild)) =>
+  h.copy(child = ll.copy(child = hintChild))
+case f @ Filter(_, h @ Hint("BROADCAST", _, hintChild)) =>
+  h.copy(child = f.copy(child = hintChild))
+case a @ Aggregate(_, _, h @ Hint("BROADCAST", _, hintChild)) =>
+  h.copy(child = a.copy(child = hintChild))
+case s @ Sort(_, _, h @ Hint("BROADCAST", _, hintChild)) =>
+  h.copy(child = s.copy(child = hintChild))
+case g @ Generate(_, _, _, _, _, h @ Hint("BROADCAST", _, 
hintChild)) =>
+  h.copy(child = g.copy(child = hintChild))
+// Set operation is not allowed to be across. 
UNION/INTERCEPT/EXCEPT
--- End diff --

Your reply is different from the comment in the code. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-18 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71259834
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -87,6 +87,7 @@ class Analyzer(
   EliminateUnions),
 Batch("Resolution", fixedPoint,
   ResolveRelations ::
+  SubstituteHints ::
--- End diff --

In each batch, the order of rules should not matter. That means, the rule 
`SubstituteHints` impacts the other rules. Please fix it. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-18 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71259759
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1774,6 +1775,51 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the 
given name parameters.
+   */
+  object SubstituteHints extends Rule[LogicalPlan] {
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case logical: LogicalPlan => logical transformDown {
+case h @ Hint(name, parameters, child)
+if Seq("BROADCAST", "BROADCASTJOIN", 
"MAPJOIN").contains(name.toUpperCase) =>
+  var resolvedChild = child
+
+  for (param <- parameters) {
+val names = param.split("\\.")
--- End diff --

Sorry, I do not think this is beyond of the scope. Instead, this is a bug. 
This identifier stores a table identifier. If we do not use the same parsing 
solution, the result will be wrong.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-18 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71259501
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -425,6 +452,49 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
   }
 }
 
+/**
+ * Merge and move upward to the nearest Project.
+ * A broadcast hint comment is scattered into multiple nodes inside 
the plan, and the
+ * information of BroadcastHint resides its current position inside 
the plan. In order to
+ * reconstruct broadcast hint comment, we need to pack the information 
of BroadcastHint into
+ * Hint("BROADCAST", _, _) and collect them up by moving upward to the 
nearest Project node.
+ */
+object NormalizeBroadcastHint extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan 
transformUp {
+// Capture the broadcasted information and store it in Hint.
+case BroadcastHint(child @ SubqueryAlias(_, Project(_, 
SQLTable(database, table, _, _ =>
+  Hint("BROADCAST", Seq(table), child)
+
+// Nearest Project is found.
+case p @ Project(_, Hint(_, _, _)) => p
+
+// Merge BROADCAST hints up to the nearest Project.
+case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) 
=>
+  h.copy(parameters = params1 ++ params2)
+case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ 
Hint("BROADCAST", p2, right), _, _) =>
+  h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right 
= right))
+
+// Bubble up BROADCAST hints to the nearest Project.
--- End diff --

If you read what we did in SQLBuilder, you might know that is not the 
normal way we did. @rxin gave the same comment below: 
https://github.com/apache/spark/pull/14132#issuecomment-233503432. 

Keeping a white list is hard to maintain. You know, I still can find more 
missing cases here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-18 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71227811
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala
 ---
@@ -988,9 +988,88 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest 
with SQLTestUtils {
 )
   }
 
+  test("broadcast hint with window") {
+checkGeneratedSQL(
+  """
+|SELECT /*+ MAPJOIN(parquet_t1) */
+|   x.key, MAX(y.key) OVER (PARTITION BY x.key % 5 ORDER BY 
x.key)
+|FROM parquet_t1 x JOIN parquet_t1 y ON x.key = y.key
+  """.stripMargin,
+  """
+|SELECT `gen_attr` AS `key`, `gen_attr` AS `max(key) OVER 
(PARTITION BY (key % CAST(5 AS BIGINT)) ORDER BY key ASC RANGE BETWEEN 
UNBOUNDED PRECEDING AND CURRENT ROW)`
+|FROM (SELECT `gen_attr`, `gen_attr`
+|  FROM (SELECT gen_subquery_2.`gen_attr`, 
gen_subquery_2.`gen_attr`, gen_subquery_2.`gen_attr`,
+|   max(`gen_attr`) OVER (PARTITION BY `gen_attr` 
ORDER BY `gen_attr` ASC
+| RANGE BETWEEN UNBOUNDED 
PRECEDING AND CURRENT ROW) AS `gen_attr`
+|FROM (SELECT /*+ MAPJOIN(parquet_t1) */ `gen_attr`, 
`gen_attr`,
+| (`gen_attr` % CAST(5 AS BIGINT)) AS 
`gen_attr`
+|  FROM ((SELECT `key` AS `gen_attr`, `value` AS 
`gen_attr`
+| FROM `default`.`parquet_t1`)
+| AS gen_subquery_0)
+|  AS x
+|   INNER JOIN (SELECT `key` AS `gen_attr`, `value` AS 
`gen_attr`
+|   FROM `default`.`parquet_t1`)
+|   AS gen_subquery_1
+|   ON (`gen_attr` = `gen_attr`))
+|   AS gen_subquery_2)
+|  AS gen_subquery_3)
+|  AS x
+  """.stripMargin
+)
+  }
+
+  test("broadcast hint with rollup") {
+checkGeneratedSQL(
+  """
+|SELECT /*+ MAPJOIN(parquet_t1) */
+|   count(*) as cnt, key%5, grouping_id()
+|FROM parquet_t1
+|GROUP BY key % 5 WITH ROLLUP""".stripMargin,
+  """
+|SELECT `gen_attr` AS `cnt`, `gen_attr` AS `(key % CAST(5 AS 
BIGINT))`, `gen_attr` AS `grouping_id()`
+|FROM (SELECT /*+ MAPJOIN(parquet_t1) */ count(1) AS `gen_attr`,
+| (`gen_attr` % CAST(5 AS BIGINT)) AS `gen_attr`, 
grouping_id() AS `gen_attr`
+|  FROM (SELECT `key` AS `gen_attr`, `value` AS `gen_attr`
+|FROM `default`.`parquet_t1`)
+|AS gen_subquery_0
+|  GROUP BY (`gen_attr` % CAST(5 AS BIGINT))
+|  GROUPING SETS(((`gen_attr` % CAST(5 AS BIGINT))), ()))
+|  AS gen_subquery_1
+  """.stripMargin)
+
+  }
+
+  test("broadcast hint with grouping sets") {
+checkGeneratedSQL(
+  s"""
+ |SELECT /*+ MAPJOIN(parquet_t1) */
+ |   count(*) AS cnt, key % 5 AS k1, key - 5 AS 
k2,grouping_id() AS k3
+ |FROM (SELECT key, key % 2, key - 5 FROM parquet_t1) t GROUP BY 
key % 5, key - 5
+ |GROUPING SETS (key % 5, key - 5)
+  """.stripMargin,
+  """
+|SELECT `gen_attr` AS `cnt`, `gen_attr` AS `k1`, `gen_attr` AS 
`k2`, `gen_attr` AS `k3`
+|FROM (SELECT count(1) AS `gen_attr`, (`gen_attr` % CAST(5 AS 
BIGINT)) AS `gen_attr`,
+| (`gen_attr` - CAST(5 AS BIGINT)) AS `gen_attr`, 
grouping_id() AS `gen_attr`
+|  FROM (SELECT /*+ MAPJOIN(parquet_t1) */ `gen_attr`,
+|   (`gen_attr` % CAST(2 AS BIGINT)) AS `gen_attr`,
+|   (`gen_attr` - CAST(5 AS BIGINT)) AS `gen_attr`
+|FROM (SELECT `key` AS `gen_attr`, `value` AS 
`gen_attr`
+|  FROM `default`.`parquet_t1`)
+|  AS gen_subquery_0)
+|AS t
+|  GROUP BY (`gen_attr` % CAST(5 AS BIGINT)), (`gen_attr` - 
CAST(5 AS BIGINT))
+|  GROUPING SETS(((`gen_attr` % CAST(5 AS BIGINT))), 
((`gen_attr` - CAST(5 AS BIGINT)
+|  AS gen_subquery_1
+  """.stripMargin)
+  }
+
+
   // This will be removed in favor of SPARK-16590.
   private def checkGeneratedSQL(sqlString: String, expectedString: 
String): Unit = {
-val generatedSQL = new SQLBuilder(sql(sqlString)).toSQL
+val df = sql(sqlString)
+logError("\n" + df.queryExecution.analyzed.treeString)
--- End diff --

After https://github.com/apache/spark/pull/14235, I need to change

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-18 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71227405
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala
 ---
@@ -988,9 +988,88 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest 
with SQLTestUtils {
 )
   }
 
+  test("broadcast hint with window") {
+checkGeneratedSQL(
+  """
+|SELECT /*+ MAPJOIN(parquet_t1) */
+|   x.key, MAX(y.key) OVER (PARTITION BY x.key % 5 ORDER BY 
x.key)
+|FROM parquet_t1 x JOIN parquet_t1 y ON x.key = y.key
+  """.stripMargin,
+  """
+|SELECT `gen_attr` AS `key`, `gen_attr` AS `max(key) OVER 
(PARTITION BY (key % CAST(5 AS BIGINT)) ORDER BY key ASC RANGE BETWEEN 
UNBOUNDED PRECEDING AND CURRENT ROW)`
+|FROM (SELECT `gen_attr`, `gen_attr`
+|  FROM (SELECT gen_subquery_2.`gen_attr`, 
gen_subquery_2.`gen_attr`, gen_subquery_2.`gen_attr`,
+|   max(`gen_attr`) OVER (PARTITION BY `gen_attr` 
ORDER BY `gen_attr` ASC
+| RANGE BETWEEN UNBOUNDED 
PRECEDING AND CURRENT ROW) AS `gen_attr`
+|FROM (SELECT /*+ MAPJOIN(parquet_t1) */ `gen_attr`, 
`gen_attr`,
+| (`gen_attr` % CAST(5 AS BIGINT)) AS 
`gen_attr`
+|  FROM ((SELECT `key` AS `gen_attr`, `value` AS 
`gen_attr`
+| FROM `default`.`parquet_t1`)
+| AS gen_subquery_0)
+|  AS x
+|   INNER JOIN (SELECT `key` AS `gen_attr`, `value` AS 
`gen_attr`
+|   FROM `default`.`parquet_t1`)
+|   AS gen_subquery_1
+|   ON (`gen_attr` = `gen_attr`))
+|   AS gen_subquery_2)
+|  AS gen_subquery_3)
+|  AS x
+  """.stripMargin
+)
+  }
+
+  test("broadcast hint with rollup") {
+checkGeneratedSQL(
+  """
+|SELECT /*+ MAPJOIN(parquet_t1) */
+|   count(*) as cnt, key%5, grouping_id()
+|FROM parquet_t1
+|GROUP BY key % 5 WITH ROLLUP""".stripMargin,
+  """
+|SELECT `gen_attr` AS `cnt`, `gen_attr` AS `(key % CAST(5 AS 
BIGINT))`, `gen_attr` AS `grouping_id()`
+|FROM (SELECT /*+ MAPJOIN(parquet_t1) */ count(1) AS `gen_attr`,
+| (`gen_attr` % CAST(5 AS BIGINT)) AS `gen_attr`, 
grouping_id() AS `gen_attr`
+|  FROM (SELECT `key` AS `gen_attr`, `value` AS `gen_attr`
+|FROM `default`.`parquet_t1`)
+|AS gen_subquery_0
+|  GROUP BY (`gen_attr` % CAST(5 AS BIGINT))
+|  GROUPING SETS(((`gen_attr` % CAST(5 AS BIGINT))), ()))
+|  AS gen_subquery_1
+  """.stripMargin)
+
+  }
+
+  test("broadcast hint with grouping sets") {
+checkGeneratedSQL(
+  s"""
+ |SELECT /*+ MAPJOIN(parquet_t1) */
+ |   count(*) AS cnt, key % 5 AS k1, key - 5 AS 
k2,grouping_id() AS k3
+ |FROM (SELECT key, key % 2, key - 5 FROM parquet_t1) t GROUP BY 
key % 5, key - 5
+ |GROUPING SETS (key % 5, key - 5)
+  """.stripMargin,
+  """
+|SELECT `gen_attr` AS `cnt`, `gen_attr` AS `k1`, `gen_attr` AS 
`k2`, `gen_attr` AS `k3`
+|FROM (SELECT count(1) AS `gen_attr`, (`gen_attr` % CAST(5 AS 
BIGINT)) AS `gen_attr`,
+| (`gen_attr` - CAST(5 AS BIGINT)) AS `gen_attr`, 
grouping_id() AS `gen_attr`
+|  FROM (SELECT /*+ MAPJOIN(parquet_t1) */ `gen_attr`,
+|   (`gen_attr` % CAST(2 AS BIGINT)) AS `gen_attr`,
+|   (`gen_attr` - CAST(5 AS BIGINT)) AS `gen_attr`
+|FROM (SELECT `key` AS `gen_attr`, `value` AS 
`gen_attr`
+|  FROM `default`.`parquet_t1`)
+|  AS gen_subquery_0)
+|AS t
+|  GROUP BY (`gen_attr` % CAST(5 AS BIGINT)), (`gen_attr` - 
CAST(5 AS BIGINT))
+|  GROUPING SETS(((`gen_attr` % CAST(5 AS BIGINT))), 
((`gen_attr` - CAST(5 AS BIGINT)
+|  AS gen_subquery_1
+  """.stripMargin)
+  }
+
+
   // This will be removed in favor of SPARK-16590.
   private def checkGeneratedSQL(sqlString: String, expectedString: 
String): Unit = {
-val generatedSQL = new SQLBuilder(sql(sqlString)).toSQL
+val df = sql(sqlString)
+logError("\n" + df.queryExecution.analyzed.treeString)
--- End diff --

Oops. Please ignore this since we will be under review for a while.

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-18 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71203247
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala
 ---
@@ -755,4 +755,243 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest 
with SQLTestUtils {
   checkHiveQl("select * from orc_t")
 }
   }
+
+  test("broadcast hint on single table") {
+checkGeneratedSQL(
+  "SELECT /*+ MAPJOIN(parquet_t0) */ * FROM parquet_t0",
+  """
+|SELECT `gen_attr` AS `id`
+|FROM (SELECT /*+ MAPJOIN(parquet_t0) */ `gen_attr`
+|  FROM (SELECT `id` AS `gen_attr`
+|FROM `default`.`parquet_t0`)
+|AS gen_subquery_0)
+|  AS parquet_t0
+|""".stripMargin)
+
+checkGeneratedSQL(
+  "SELECT /*+ MAPJOIN(parquet_t0, parquet_t0) */ * FROM parquet_t0",
+  """
+|SELECT `gen_attr` AS `id`
+|FROM (SELECT /*+ MAPJOIN(parquet_t0) */ `gen_attr`
+|  FROM (SELECT `id` AS `gen_attr`
+|FROM `default`.`parquet_t0`)
+|AS gen_subquery_0)
+|  AS parquet_t0
+|""".stripMargin)
+
+checkGeneratedSQL(
+  "SELECT /*+ MAPJOIN(parquet_t0) */ * FROM parquet_t0 as a",
+  """
+|SELECT `gen_attr` AS `id`
+|FROM (SELECT /*+ MAPJOIN(parquet_t0) */ `gen_attr`
+|  FROM ((SELECT `id` AS `gen_attr`
+| FROM `default`.`parquet_t0`)
+| AS gen_subquery_0)
+|AS a)
+|  AS a
+  """.stripMargin)
+
+checkGeneratedSQL(
+  """
+|SELECT /*+ MAPJOIN(parquet_t0) */ *
--- End diff --

It's beyond the scope of this PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-18 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71203038
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1774,6 +1775,51 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the 
given name parameters.
+   */
+  object SubstituteHints extends Rule[LogicalPlan] {
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case logical: LogicalPlan => logical transformDown {
+case h @ Hint(name, parameters, child)
+if Seq("BROADCAST", "BROADCASTJOIN", 
"MAPJOIN").contains(name.toUpperCase) =>
+  var resolvedChild = child
+
+  for (param <- parameters) {
+val names = param.split("\\.")
+val tid = if (names.length > 1) {
+  TableIdentifier(names(1), Some(names(0)))
+} else {
+  TableIdentifier(param, None)
+}
+try {
+  catalog.lookupRelation(tid)
+
+  var stop = false
+  resolvedChild = resolvedChild.transformDown {
+case r @ BroadcastHint(SubqueryAlias(t, _))
+  if !stop && resolver(t, tid.identifier) =>
+  stop = true
+  r
+case r @ SubqueryAlias(t, _) if !stop && resolver(t, 
tid.identifier) =>
+  stop = true
+  BroadcastHint(r)
--- End diff --

Here, too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-18 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71202999
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -425,6 +452,49 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
   }
 }
 
+/**
+ * Merge and move upward to the nearest Project.
+ * A broadcast hint comment is scattered into multiple nodes inside 
the plan, and the
+ * information of BroadcastHint resides its current position inside 
the plan. In order to
+ * reconstruct broadcast hint comment, we need to pack the information 
of BroadcastHint into
+ * Hint("BROADCAST", _, _) and collect them up by moving upward to the 
nearest Project node.
+ */
+object NormalizeBroadcastHint extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan 
transformUp {
+// Capture the broadcasted information and store it in Hint.
+case BroadcastHint(child @ SubqueryAlias(_, Project(_, 
SQLTable(database, table, _, _ =>
+  Hint("BROADCAST", Seq(table), child)
+
+// Nearest Project is found.
+case p @ Project(_, Hint(_, _, _)) => p
+
+// Merge BROADCAST hints up to the nearest Project.
+case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) 
=>
+  h.copy(parameters = params1 ++ params2)
+case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ 
Hint("BROADCAST", p2, right), _, _) =>
+  h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right 
= right))
+
+// Bubble up BROADCAST hints to the nearest Project.
+case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
+  h.copy(child = j.copy(left = hintChild))
+case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
+  h.copy(child = j.copy(right = hintChild))
+case s @ SubqueryAlias(_, h @ Hint("BROADCAST", _, hintChild)) =>
--- End diff --

It seems `database.table` issue which I mentioned before.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-18 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71202822
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -425,6 +452,49 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
   }
 }
 
+/**
+ * Merge and move upward to the nearest Project.
+ * A broadcast hint comment is scattered into multiple nodes inside 
the plan, and the
+ * information of BroadcastHint resides its current position inside 
the plan. In order to
+ * reconstruct broadcast hint comment, we need to pack the information 
of BroadcastHint into
+ * Hint("BROADCAST", _, _) and collect them up by moving upward to the 
nearest Project node.
+ */
+object NormalizeBroadcastHint extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan 
transformUp {
+// Capture the broadcasted information and store it in Hint.
+case BroadcastHint(child @ SubqueryAlias(_, Project(_, 
SQLTable(database, table, _, _ =>
+  Hint("BROADCAST", Seq(table), child)
+
+// Nearest Project is found.
+case p @ Project(_, Hint(_, _, _)) => p
+
+// Merge BROADCAST hints up to the nearest Project.
+case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) 
=>
+  h.copy(parameters = params1 ++ params2)
+case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ 
Hint("BROADCAST", p2, right), _, _) =>
+  h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right 
= right))
+
+// Bubble up BROADCAST hints to the nearest Project.
+case j @ Join(h @ Hint("BROADCAST", _, hintChild), _, _, _) =>
+  h.copy(child = j.copy(left = hintChild))
+case j @ Join(_, h @ Hint("BROADCAST", _, hintChild), _, _) =>
+  h.copy(child = j.copy(right = hintChild))
+case s @ SubqueryAlias(_, h @ Hint("BROADCAST", _, hintChild)) =>
+  h.copy(child = s.copy(child = hintChild))
+case ll @ LocalLimit(_, h @ Hint("BROADCAST", _, hintChild)) =>
+  h.copy(child = ll.copy(child = hintChild))
+case f @ Filter(_, h @ Hint("BROADCAST", _, hintChild)) =>
+  h.copy(child = f.copy(child = hintChild))
+case a @ Aggregate(_, _, h @ Hint("BROADCAST", _, hintChild)) =>
+  h.copy(child = a.copy(child = hintChild))
+case s @ Sort(_, _, h @ Hint("BROADCAST", _, hintChild)) =>
+  h.copy(child = s.copy(child = hintChild))
+case g @ Generate(_, _, _, _, _, h @ Hint("BROADCAST", _, 
hintChild)) =>
+  h.copy(child = g.copy(child = hintChild))
+// Set operation is not allowed to be across. 
UNION/INTERCEPT/EXCEPT
--- End diff --

It's not a restriction, SET operation UNION/INTERCETP/EXCEPT has `Project` 
inside it. So, I skip those.

As I mentioned before, theoretically, the Hint can not be restored into the 
original `Project`. In addition to that, 
since `Canonicalizer` adds a number of extra things, we can not move upward 
enough, too. We need to stop of  nearest `Project`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-18 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71202124
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/catalyst/SQLBuilder.scala ---
@@ -425,6 +452,49 @@ class SQLBuilder(logicalPlan: LogicalPlan) extends 
Logging {
   }
 }
 
+/**
+ * Merge and move upward to the nearest Project.
+ * A broadcast hint comment is scattered into multiple nodes inside 
the plan, and the
+ * information of BroadcastHint resides its current position inside 
the plan. In order to
+ * reconstruct broadcast hint comment, we need to pack the information 
of BroadcastHint into
+ * Hint("BROADCAST", _, _) and collect them up by moving upward to the 
nearest Project node.
+ */
+object NormalizeBroadcastHint extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan 
transformUp {
+// Capture the broadcasted information and store it in Hint.
+case BroadcastHint(child @ SubqueryAlias(_, Project(_, 
SQLTable(database, table, _, _ =>
+  Hint("BROADCAST", Seq(table), child)
+
+// Nearest Project is found.
+case p @ Project(_, Hint(_, _, _)) => p
+
+// Merge BROADCAST hints up to the nearest Project.
+case Hint("BROADCAST", params1, h @ Hint("BROADCAST", params2, _)) 
=>
+  h.copy(parameters = params1 ++ params2)
+case j @ Join(h1 @ Hint("BROADCAST", p1, left), h2 @ 
Hint("BROADCAST", p2, right), _, _) =>
+  h1.copy(parameters = p1 ++ p2, child = j.copy(left = left, right 
= right))
+
+// Bubble up BROADCAST hints to the nearest Project.
--- End diff --

Thanks. I added `GlobalLimit`, too. It seems my mistake during removing the 
operator not allowed in SQLBuilder. When I develop Optimizers, I frequently 
asked to use white-list approach. I hope to keep this style if you are okay.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-18 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71201300
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1774,6 +1775,51 @@ class Analyzer(
   }
 
   /**
+   * Substitute Hints.
+   * - BROADCAST/BROADCASTJOIN/MAPJOIN match the closest table with the 
given name parameters.
+   */
+  object SubstituteHints extends Rule[LogicalPlan] {
+def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  case logical: LogicalPlan => logical transformDown {
+case h @ Hint(name, parameters, child)
+if Seq("BROADCAST", "BROADCASTJOIN", 
"MAPJOIN").contains(name.toUpperCase) =>
+  var resolvedChild = child
+
+  for (param <- parameters) {
+val names = param.split("\\.")
--- End diff --

I took a look at #14244 . It's about `TableIdentifier`. This PR is designed 
to use `identifier` only instead of `tableIdentifier`.
```
tableIdentifier
: (db=identifier '.')? table=identifier
;
```
It's beyond of the scope.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-18 Thread dongjoon-hyun

Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71200265
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -87,6 +87,7 @@ class Analyzer(
   EliminateUnions),
 Batch("Resolution", fixedPoint,
   ResolveRelations ::
+  SubstituteHints ::
--- End diff --

Yep. Here is the error when we move it from the batch `Resolution` into the 
batch `Nondeterministic`.
```
unresolved operator 'Project [*];
org.apache.spark.sql.AnalysisException: unresolved operator 'Project [*];
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:40)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:58)
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14132: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-07-17 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14132#discussion_r71100213
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/catalyst/LogicalPlanToSQLSuite.scala
 ---
@@ -755,4 +755,243 @@ class LogicalPlanToSQLSuite extends SQLBuilderTest 
with SQLTestUtils {
   checkHiveQl("select * from orc_t")
 }
   }
+
+  test("broadcast hint on single table") {
+checkGeneratedSQL(
+  "SELECT /*+ MAPJOIN(parquet_t0) */ * FROM parquet_t0",
+  """
+|SELECT `gen_attr` AS `id`
+|FROM (SELECT /*+ MAPJOIN(parquet_t0) */ `gen_attr`
+|  FROM (SELECT `id` AS `gen_attr`
+|FROM `default`.`parquet_t0`)
+|AS gen_subquery_0)
+|  AS parquet_t0
+|""".stripMargin)
+
+checkGeneratedSQL(
+  "SELECT /*+ MAPJOIN(parquet_t0, parquet_t0) */ * FROM parquet_t0",
+  """
+|SELECT `gen_attr` AS `id`
+|FROM (SELECT /*+ MAPJOIN(parquet_t0) */ `gen_attr`
+|  FROM (SELECT `id` AS `gen_attr`
+|FROM `default`.`parquet_t0`)
+|AS gen_subquery_0)
+|  AS parquet_t0
+|""".stripMargin)
+
+checkGeneratedSQL(
+  "SELECT /*+ MAPJOIN(parquet_t0) */ * FROM parquet_t0 as a",
+  """
+|SELECT `gen_attr` AS `id`
+|FROM (SELECT /*+ MAPJOIN(parquet_t0) */ `gen_attr`
+|  FROM ((SELECT `id` AS `gen_attr`
+| FROM `default`.`parquet_t0`)
+| AS gen_subquery_0)
+|AS a)
+|  AS a
+  """.stripMargin)
+
+checkGeneratedSQL(
+  """
+|SELECT /*+ MAPJOIN(parquet_t0) */ *
--- End diff --

This should be `T. parquet_t0`, right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

1 2 >

1 - 100 of 194 matches

Mail list logo