[GitHub] spark pull request #20405: [SPARK-23229][SQL] Dataset.hint should use planWi...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20405 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20405: [SPARK-23229][SQL] Dataset.hint should use planWi...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20405#discussion_r186637073 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1216,7 +1216,7 @@ class Dataset[T] private[sql]( */ @scala.annotation.varargs def hint(name: String, parameters: Any*): Dataset[T] = withTypedPlan { -UnresolvedHint(name, parameters, logicalPlan) +UnresolvedHint(name, parameters, planWithBarrier) --- End diff -- I think so. cc @jaceklaskowski --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20405: [SPARK-23229][SQL] Dataset.hint should use planWi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20405#discussion_r186632978 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1216,7 +1216,7 @@ class Dataset[T] private[sql]( */ @scala.annotation.varargs def hint(name: String, parameters: Any*): Dataset[T] = withTypedPlan { -UnresolvedHint(name, parameters, logicalPlan) +UnresolvedHint(name, parameters, planWithBarrier) --- End diff -- shall we go with this workaround? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20405: [SPARK-23229][SQL] Dataset.hint should use planWi...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20405#discussion_r164340914 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1216,7 +1216,7 @@ class Dataset[T] private[sql]( */ @scala.annotation.varargs def hint(name: String, parameters: Any*): Dataset[T] = withTypedPlan { -UnresolvedHint(name, parameters, logicalPlan) +UnresolvedHint(name, parameters, planWithBarrier) --- End diff -- ah, I see. makes sense to me. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20405: [SPARK-23229][SQL] Dataset.hint should use planWi...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20405#discussion_r164340278 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1216,7 +1216,7 @@ class Dataset[T] private[sql]( */ @scala.annotation.varargs def hint(name: String, parameters: Any*): Dataset[T] = withTypedPlan { -UnresolvedHint(name, parameters, logicalPlan) +UnresolvedHint(name, parameters, planWithBarrier) --- End diff -- If we do that, it is basically as the same as passing the logical plan without barrier. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20405: [SPARK-23229][SQL] Dataset.hint should use planWi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20405#discussion_r164338606 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1216,7 +1216,7 @@ class Dataset[T] private[sql]( */ @scala.annotation.varargs def hint(name: String, parameters: Any*): Dataset[T] = withTypedPlan { -UnresolvedHint(name, parameters, logicalPlan) +UnresolvedHint(name, parameters, planWithBarrier) --- End diff -- a possible workaround is to explicitly go through the barrier in the hint resolution rules, so that we can still use barrier here and skip analysis in other analyzer rules. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20405: [SPARK-23229][SQL] Dataset.hint should use planWi...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20405#discussion_r164267184 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1216,7 +1216,7 @@ class Dataset[T] private[sql]( */ @scala.annotation.varargs def hint(name: String, parameters: Any*): Dataset[T] = withTypedPlan { -UnresolvedHint(name, parameters, logicalPlan) +UnresolvedHint(name, parameters, planWithBarrier) --- End diff -- @jaceklaskowski Because the logical plan is wrapped in analysis barrier, `ResolveBroadcastHints` can't traverse down it to reach the `UnresolvedRelation` at https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala#L60-L61. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20405: [SPARK-23229][SQL] Dataset.hint should use planWi...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/20405#discussion_r164267100 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1216,7 +1216,7 @@ class Dataset[T] private[sql]( */ @scala.annotation.varargs def hint(name: String, parameters: Any*): Dataset[T] = withTypedPlan { -UnresolvedHint(name, parameters, logicalPlan) +UnresolvedHint(name, parameters, planWithBarrier) --- End diff -- I thought that that's what `ResolveBroadcastHints` does --> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveHints.scala#L93-L101, doesn't it? I'm going to write a test case for it to confirm (and that's what I was asking for in the email to dev@spark the other day). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20405: [SPARK-23229][SQL] Dataset.hint should use planWi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20405#discussion_r164250206 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1216,7 +1216,7 @@ class Dataset[T] private[sql]( */ @scala.annotation.varargs def hint(name: String, parameters: Any*): Dataset[T] = withTypedPlan { -UnresolvedHint(name, parameters, logicalPlan) +UnresolvedHint(name, parameters, planWithBarrier) --- End diff -- I think @viirya has a valid concern. think about ``` val df1 = spark.table("t").select("id") df1.hint("broadcast", "id") ``` We should transform down the plan of `df1`, find the bottom table relation and apply the hint. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20405: [SPARK-23229][SQL] Dataset.hint should use planWi...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/20405#discussion_r164191381 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1216,7 +1216,7 @@ class Dataset[T] private[sql]( */ @scala.annotation.varargs def hint(name: String, parameters: Any*): Dataset[T] = withTypedPlan { -UnresolvedHint(name, parameters, logicalPlan) +UnresolvedHint(name, parameters, planWithBarrier) --- End diff -- My understanding however is that `planWithBarrier` is already analyzed (and `ResolveBroadcastHints` as the very first rule had its chance to do its work). That's the extra processing `hint` does every time it's called. Using `planWithBarrier` makes it less "painful". Just use `hint` twice and see the analyzed plan. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20405: [SPARK-23229][SQL] Dataset.hint should use planWi...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20405#discussion_r164102996 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1216,7 +1216,7 @@ class Dataset[T] private[sql]( */ @scala.annotation.varargs def hint(name: String, parameters: Any*): Dataset[T] = withTypedPlan { -UnresolvedHint(name, parameters, logicalPlan) +UnresolvedHint(name, parameters, planWithBarrier) --- End diff -- I think `ResolveBroadcastHints` rule will traverse recursively the children of logical plan. If we wrap it with a barrier, we can't be traverse down the tree. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20405: [SPARK-23229][SQL] Dataset.hint should use planWi...
GitHub user jaceklaskowski opened a pull request: https://github.com/apache/spark/pull/20405 [SPARK-23229][SQL] Dataset.hint should use planWithBarrier logical plan ## What changes were proposed in this pull request? Every time `Dataset.hint` is used it triggers execution of logical commands, their unions and hint resolution (among other things that analyzer does). `hint` should use `planWithBarrier` instead. Fixes https://issues.apache.org/jira/browse/SPARK-23229 ## How was this patch tested? Existing unit tests, local build + awaiting Jenkins You can merge this pull request into a Git repository by running: $ git pull https://github.com/jaceklaskowski/spark SPARK-23229-hint-planWithBarrier Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20405.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20405 commit 47bb245353202208f2c41634c3796c8e4d2be663 Author: Jacek Laskowski Date: 2018-01-26T11:20:48Z [SPARK-23229][SQL] Dataset.hint should use planWithBarrier logical plan --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org