[GitHub] spark pull request #23204: Revert "[SPARK-21052][SQL] Add hash map metrics t...

2018-12-09 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23204#discussion_r240104812
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashJoin.scala ---
@@ -213,10 +213,6 @@ trait HashJoin {
   s"BroadcastHashJoin should not take $x as the JoinType")
 }
 
-// At the end of the task, we update the avg hash probe.
-TaskContext.get().addTaskCompletionListener[Unit](_ =>
--- End diff --

In this file, the `join` method still takes an `avgHashProbe: SQLMetric` parameter; we should remove that as well.
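
For readers following along, a rough Scala sketch of that cleanup is below. Only the `avgHashProbe: SQLMetric` parameter is taken from this PR; the rest of the parameter list is reconstructed from memory, and `AnyRef` stands in for the package-private `HashedRelation` type, so treat it as illustrative rather than the exact signature:

    import org.apache.spark.sql.catalyst.InternalRow
    import org.apache.spark.sql.execution.metric.SQLMetric

    // Sketch only: the real trait is HashJoin in org.apache.spark.sql.execution.joins.
    trait HashJoinCleanupSketch {
      // Before (what the revert currently leaves behind): avgHashProbe is still
      // a parameter even though nothing updates it any more.
      // protected def join(
      //     streamedIter: Iterator[InternalRow],
      //     hashedRelation: AnyRef,
      //     numOutputRows: SQLMetric,
      //     avgHashProbe: SQLMetric): Iterator[InternalRow]

      // After: drop the unused avgHashProbe parameter and fix up its callers.
      protected def join(
          streamedIter: Iterator[InternalRow],
          hashedRelation: AnyRef,
          numOutputRows: SQLMetric): Iterator[InternalRow]
    }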


---




[GitHub] spark pull request #23204: Revert "[SPARK-21052][SQL] Add hash map metrics t...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23204#discussion_r238323331
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
@@ -483,8 +470,6 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
*/
   def getValue(key: Long, resultRow: UnsafeRow): UnsafeRow = {
 if (isDense) {
-  numKeyLookups += 1
-  numProbes += 1
--- End diff --

+1, like I said in 
https://github.com/apache/spark/pull/23204/files#r238257371


---




[GitHub] spark pull request #23204: Revert "[SPARK-21052][SQL] Add hash map metrics t...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23204#discussion_r238322699
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
@@ -483,8 +470,6 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
*/
   def getValue(key: Long, resultRow: UnsafeRow): UnsafeRow = {
 if (isDense) {
-  numKeyLookups += 1
-  numProbes += 1
--- End diff --

If this is proven to cause the performance regression, I think it's safer to revert the change in `HashAggregateExec` as well, since both are doing the same thing.


---




[GitHub] spark pull request #23204: Revert "[SPARK-21052][SQL] Add hash map metrics t...

2018-12-03 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/23204#discussion_r238270550
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
@@ -483,8 +470,6 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
*/
   def getValue(key: Long, resultRow: UnsafeRow): UnsafeRow = {
 if (isDense) {
-  numKeyLookups += 1
-  numProbes += 1
--- End diff --

If, as your test shows, this is the cause of the performance regression, we can revert just this and the related changes. The changes in `HashAggregateExec`, etc. can be kept.
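
To make the trade-off concrete, the pattern being reverted amounts to bumping two counters on every lookup in the join's innermost loop. A simplified, self-contained Scala sketch is below; the counter names follow the diff context above, while the class, its constructor, and the lookup body are purely illustrative:

    // Simplified stand-in for LongToUnsafeRowMap's dense-path lookup: the two
    // increments are what this PR removes from the per-row hot path.
    final class ProbeCountingMapSketch(values: Array[Long]) {
      private var numKeyLookups = 0L
      private var numProbes = 0L

      def getValueDense(key: Long): Long = {
        numKeyLookups += 1        // removed by this PR
        numProbes += 1            // removed by this PR
        values(key.toInt)         // placeholder for the real dense-array access
      }

      // This ratio is what used to feed the "avg hash probe" metric at the end
      // of the task, both here and in the aggregation hash map.
      def averageProbesPerLookup: Double =
        if (numKeyLookups == 0) 1.0 else numProbes.toDouble / numKeyLookups
    }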


---




[GitHub] spark pull request #23204: Revert "[SPARK-21052][SQL] Add hash map metrics t...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23204#discussion_r238264122
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala ---
@@ -57,12 +57,6 @@ class SQLMetric(val metricType: String, initValue: Long = 0L) extends Accumulato
 
   override def add(v: Long): Unit = _value += v
 
-  // We can set a double value to `SQLMetric` which stores only long value, if it is
--- End diff --

I think we can keep the changes in this file as well.
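
For context, the hunk above concerns the average-metric support in `SQLMetric`, which stores a double-valued average inside a long-backed accumulator. A minimal, self-contained sketch of that fixed-point idea follows; it is not the actual `SQLMetric` code, and the scaling factor is purely illustrative:

    // Illustrative only: an average such as "avg hash probe" is a double, but
    // the accumulator stores a long, so keep it as a scaled fixed-point value
    // and divide it back out when the metric is displayed.
    object AvgMetricEncodingSketch {
      private val base = 10L  // hypothetical scaling factor

      def encode(avg: Double): Long = (avg * base).toLong
      def decode(stored: Long): Double = stored.toDouble / base

      def main(args: Array[String]): Unit = {
        val stored = encode(1.7)
        println(stored)          // 17
        println(decode(stored))  // 1.7
      }
    }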


---




[GitHub] spark pull request #23204: Revert "[SPARK-21052][SQL] Add hash map metrics t...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23204#discussion_r23825
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIterator.scala ---
@@ -374,22 +374,6 @@ class TungstenAggregationIterator(
 }
   }
 
-  TaskContext.get().addTaskCompletionListener[Unit](_ => {
--- End diff --

Ditto: it's better to put this code block here, so let's keep this change.
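
The block in question follows the usual pattern of updating the averaged metric once per task via a completion listener, instead of doing metric work inside the per-row aggregation loop. A rough Scala sketch is below; `currentAvgHashProbe()` is an illustrative placeholder for the hash map's probe-average accessor, and the double-valued `set` is assumed to be the overload described in the SQLMetrics.scala comment quoted earlier in this thread:

    import org.apache.spark.TaskContext
    import org.apache.spark.sql.execution.metric.SQLMetric

    // Sketch: register a listener so the metric is set exactly once, when the
    // task finishes, keeping the hot aggregation loop free of metric updates.
    class MetricOnTaskCompletionSketch(avgHashProbe: SQLMetric) {
      private def currentAvgHashProbe(): Double = 1.0  // placeholder value

      // Must be called from code running inside a task, where
      // TaskContext.get() is non-null.
      def register(): Unit = {
        TaskContext.get().addTaskCompletionListener[Unit] { _ =>
          avgHashProbe.set(currentAvgHashProbe())
        }
      }
    }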


---




[GitHub] spark pull request #23204: Revert "[SPARK-21052][SQL] Add hash map metrics t...

2018-12-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23204#discussion_r238257371
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
@@ -63,7 +63,7 @@ case class HashAggregateExec(
 "peakMemory" -> SQLMetrics.createSizeMetric(sparkContext, "peak 
memory"),
 "spillSize" -> SQLMetrics.createSizeMetric(sparkContext, "spill size"),
 "aggTime" -> SQLMetrics.createTimingMetric(sparkContext, "aggregate 
time"),
-"avgHashProbe" -> SQLMetrics.createAverageMetric(sparkContext, "avg 
hash probe"))
+"avgHashmapProbe" -> SQLMetrics.createAverageMetric(sparkContext, "avg 
hashmap probe"))
--- End diff --

I know it's easy to just run the `git revert` command, but I'd like to revert it manually, since that PR was merged a long time ago. We should also keep changes like this renaming, as they are not really related to the performance regression.


---




[GitHub] spark pull request #23204: Revert "[SPARK-21052][SQL] Add hash map metrics t...

2018-12-03 Thread JkSelf
GitHub user JkSelf opened a pull request:

https://github.com/apache/spark/pull/23204

Revert "[SPARK-21052][SQL] Add hash map metrics to join"

Because of the performance degradation discussed in [SPARK-26155](https://issues.apache.org/jira/browse/SPARK-26155), we are reverting [SPARK-21052](https://issues.apache.org/jira/browse/SPARK-21052).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/JkSelf/spark revert-21052

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23204.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23204


commit 7d5008d11e37086f4a8206276791b9424bcf60b7
Author: jiake 
Date:   2018-12-03T08:18:44Z

Revert "[SPARK-21052][SQL] Add hash map metrics to join"

This reverts commit 18066f2e61f430b691ed8a777c9b4e5786bf9dbc.

Conflicts:

sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala

sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIterator.scala

sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala

sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashJoin.scala

sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala

sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala




---
