[GitHub] spark pull request #20405: [SPARK-23229][SQL] Dataset.hint should use planWi...

2018-10-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20405


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20405: [SPARK-23229][SQL] Dataset.hint should use planWi...

2018-05-08 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20405#discussion_r186637073
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1216,7 +1216,7 @@ class Dataset[T] private[sql](
*/
   @scala.annotation.varargs
   def hint(name: String, parameters: Any*): Dataset[T] = withTypedPlan {
-UnresolvedHint(name, parameters, logicalPlan)
+UnresolvedHint(name, parameters, planWithBarrier)
--- End diff --

I think so. cc @jaceklaskowski 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20405: [SPARK-23229][SQL] Dataset.hint should use planWi...

2018-05-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20405#discussion_r186632978
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1216,7 +1216,7 @@ class Dataset[T] private[sql](
*/
   @scala.annotation.varargs
   def hint(name: String, parameters: Any*): Dataset[T] = withTypedPlan {
-UnresolvedHint(name, parameters, logicalPlan)
+UnresolvedHint(name, parameters, planWithBarrier)
--- End diff --

shall we go with this workaround?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20405: [SPARK-23229][SQL] Dataset.hint should use planWi...

2018-01-28 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20405#discussion_r164340914
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1216,7 +1216,7 @@ class Dataset[T] private[sql](
*/
   @scala.annotation.varargs
   def hint(name: String, parameters: Any*): Dataset[T] = withTypedPlan {
-UnresolvedHint(name, parameters, logicalPlan)
+UnresolvedHint(name, parameters, planWithBarrier)
--- End diff --

 ah, I see. makes sense to me.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20405: [SPARK-23229][SQL] Dataset.hint should use planWi...

2018-01-28 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20405#discussion_r164340278
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1216,7 +1216,7 @@ class Dataset[T] private[sql](
*/
   @scala.annotation.varargs
   def hint(name: String, parameters: Any*): Dataset[T] = withTypedPlan {
-UnresolvedHint(name, parameters, logicalPlan)
+UnresolvedHint(name, parameters, planWithBarrier)
--- End diff --

If we do that, it is basically as the same as passing the logical plan 
without barrier.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20405: [SPARK-23229][SQL] Dataset.hint should use planWi...

2018-01-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20405#discussion_r164338606
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1216,7 +1216,7 @@ class Dataset[T] private[sql](
*/
   @scala.annotation.varargs
   def hint(name: String, parameters: Any*): Dataset[T] = withTypedPlan {
-UnresolvedHint(name, parameters, logicalPlan)
+UnresolvedHint(name, parameters, planWithBarrier)
--- End diff --

a possible workaround is to explicitly go through the barrier in the hint 
resolution rules, so that we can still use barrier here and skip analysis in 
other analyzer rules.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20405: [SPARK-23229][SQL] Dataset.hint should use planWi...

2018-01-27 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20405#discussion_r164267184
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1216,7 +1216,7 @@ class Dataset[T] private[sql](
*/
   @scala.annotation.varargs
   def hint(name: String, parameters: Any*): Dataset[T] = withTypedPlan {
-UnresolvedHint(name, parameters, logicalPlan)
+UnresolvedHint(name, parameters, planWithBarrier)
--- End diff --

@jaceklaskowski Because the logical plan is wrapped in analysis barrier, 
`ResolveBroadcastHints` can't traverse down it to reach the 
`UnresolvedRelation` at 
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala#L60-L61.




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20405: [SPARK-23229][SQL] Dataset.hint should use planWi...

2018-01-27 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/20405#discussion_r164267100
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1216,7 +1216,7 @@ class Dataset[T] private[sql](
*/
   @scala.annotation.varargs
   def hint(name: String, parameters: Any*): Dataset[T] = withTypedPlan {
-UnresolvedHint(name, parameters, logicalPlan)
+UnresolvedHint(name, parameters, planWithBarrier)
--- End diff --

I thought that that's what `ResolveBroadcastHints` does --> 
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala#L93-L101,
 doesn't it? I'm going to write a test case for it to confirm (and that's what 
I was asking for in the email to dev@spark the other day).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20405: [SPARK-23229][SQL] Dataset.hint should use planWi...

2018-01-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20405#discussion_r164250206
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1216,7 +1216,7 @@ class Dataset[T] private[sql](
*/
   @scala.annotation.varargs
   def hint(name: String, parameters: Any*): Dataset[T] = withTypedPlan {
-UnresolvedHint(name, parameters, logicalPlan)
+UnresolvedHint(name, parameters, planWithBarrier)
--- End diff --

I think @viirya has a valid concern. think about
```
val df1 = spark.table("t").select("id")
df1.hint("broadcast", "id")
```
We should transform down the plan of `df1`, find the bottom table relation 
and apply the hint.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20405: [SPARK-23229][SQL] Dataset.hint should use planWi...

2018-01-26 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/20405#discussion_r164191381
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1216,7 +1216,7 @@ class Dataset[T] private[sql](
*/
   @scala.annotation.varargs
   def hint(name: String, parameters: Any*): Dataset[T] = withTypedPlan {
-UnresolvedHint(name, parameters, logicalPlan)
+UnresolvedHint(name, parameters, planWithBarrier)
--- End diff --

My understanding however is that `planWithBarrier` is already analyzed (and 
`ResolveBroadcastHints` as the very first rule had its chance to do its work). 
That's the extra processing `hint` does every time it's called. Using 
`planWithBarrier` makes it less "painful".

Just use `hint` twice and see the analyzed plan.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20405: [SPARK-23229][SQL] Dataset.hint should use planWi...

2018-01-26 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20405#discussion_r164102996
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1216,7 +1216,7 @@ class Dataset[T] private[sql](
*/
   @scala.annotation.varargs
   def hint(name: String, parameters: Any*): Dataset[T] = withTypedPlan {
-UnresolvedHint(name, parameters, logicalPlan)
+UnresolvedHint(name, parameters, planWithBarrier)
--- End diff --

I think `ResolveBroadcastHints` rule will traverse recursively the children 
of logical plan. If we wrap it with a barrier, we can't be traverse down the 
tree.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20405: [SPARK-23229][SQL] Dataset.hint should use planWi...

2018-01-26 Thread jaceklaskowski
GitHub user jaceklaskowski opened a pull request:

https://github.com/apache/spark/pull/20405

[SPARK-23229][SQL] Dataset.hint should use planWithBarrier logical plan

## What changes were proposed in this pull request?

Every time `Dataset.hint` is used it triggers execution of logical 
commands, their unions and hint resolution (among other things that analyzer 
does).

`hint` should use `planWithBarrier` instead.

Fixes https://issues.apache.org/jira/browse/SPARK-23229

## How was this patch tested?

Existing unit tests, local build + awaiting Jenkins

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaceklaskowski/spark 
SPARK-23229-hint-planWithBarrier

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20405.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20405


commit 47bb245353202208f2c41634c3796c8e4d2be663
Author: Jacek Laskowski 
Date:   2018-01-26T11:20:48Z

[SPARK-23229][SQL] Dataset.hint should use planWithBarrier logical plan




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org