[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-148515049
  
[Test build #43801 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43801/console) for PR 8931 at commit [`7e1b68d`](https://github.com/apache/spark/commit/7e1b68d9aa9745f96164ed230f3156189b2086a1).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-148515344
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43801/





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-148515341
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-148514504
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43799/





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/8931





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-148514502
  
Build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-148514249
  
[Test build #43799 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43799/console) for PR 8931 at commit [`4d0e8c5`](https://github.com/apache/spark/commit/4d0e8c5908824389cda955c59aa89833d03440a8).
 * This patch **passes all tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-15 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-148532025
  
Merged into master





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-148482047
  
 Build triggered.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-148482080
  
Build started.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-148483326
  
Merged build started.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-148483297
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-148484027
  
[Test build #43801 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43801/consoleFull) for PR 8931 at commit [`7e1b68d`](https://github.com/apache/spark/commit/7e1b68d9aa9745f96164ed230f3156189b2086a1).





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-148482468
  
[Test build #43799 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43799/consoleFull) for PR 8931 at commit [`4d0e8c5`](https://github.com/apache/spark/commit/4d0e8c5908824389cda955c59aa89833d03440a8).





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-14 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/8931#discussion_r42062981
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala ---
@@ -40,7 +40,9 @@ case class TungstenAggregate(
 
   override private[sql] lazy val metrics = Map(
     "numInputRows" -> SQLMetrics.createLongMetric(sparkContext, "number of input rows"),
-    "numOutputRows" -> SQLMetrics.createLongMetric(sparkContext, "number of output rows"))
+    "numOutputRows" -> SQLMetrics.createLongMetric(sparkContext, "number of output rows"),
+    "dataSize" -> SQLMetrics.createSizeMetric(sparkContext, "data size"),
+    "spilledSize" -> SQLMetrics.createSizeMetric(sparkContext, "spilled size"))
--- End diff --

I think this can just be `spill size`. If you change this then you should 
also change all the variable names





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-14 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/8931#discussion_r42065959
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala ---
@@ -102,20 +96,53 @@ private object LongSQLMetricParam extends SQLMetricParam[LongSQLMetricValue, Lon
 
   override def zero(initialValue: LongSQLMetricValue): LongSQLMetricValue = zero
 
-  override def zero: LongSQLMetricValue = new LongSQLMetricValue(0L)
+  override def zero: LongSQLMetricValue = new LongSQLMetricValue(initialValue)
 }
 
 private[sql] object SQLMetrics {
 
-  def createLongMetric(sc: SparkContext, name: String): LongSQLMetric = {
-    val acc = new LongSQLMetric(name)
+  def createLongMetric(
--- End diff --

can this be `private`?





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-14 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/8931#discussion_r42065719
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala ---
@@ -102,20 +96,53 @@ private object LongSQLMetricParam extends SQLMetricParam[LongSQLMetricValue, Lon
 
   override def zero(initialValue: LongSQLMetricValue): LongSQLMetricValue = zero
 
-  override def zero: LongSQLMetricValue = new LongSQLMetricValue(0L)
+  override def zero: LongSQLMetricValue = new LongSQLMetricValue(initialValue)
 }
 
 private[sql] object SQLMetrics {
 
-  def createLongMetric(sc: SparkContext, name: String): LongSQLMetric = {
-    val acc = new LongSQLMetric(name)
+  def createLongMetric(
+     sc: SparkContext,
+     name: String,
+     stringValue: Seq[Long] => String,
+     initialValue: Long): LongSQLMetric = {
+    val param = new LongSQLMetricParam(stringValue, initialValue)
+    val acc = new LongSQLMetric(name, param)
     sc.cleaner.foreach(_.registerAccumulatorForCleanup(acc))
     acc
   }
 
+  def createLongMetric(sc: SparkContext, name: String): LongSQLMetric = {
+    createLongMetric(sc, name, _.sum.toString, 0L)
+  }
+
+  /**
+   * Create a metric to report the size information(including total, min, med, max) like data size,
+   * spilled size, etc.
+   */
+  def createSizeMetric(sc: SparkContext, name: String): LongSQLMetric = {
+    val stringValue = (values: Seq[Long]) => {
+      // This is a work around for https://issues.apache.org/jira/browse/SPARK-11013
--- End diff --

also, work around -> workaround





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-14 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/8931#discussion_r42065808
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala ---
@@ -102,20 +96,53 @@ private object LongSQLMetricParam extends SQLMetricParam[LongSQLMetricValue, Lon
 
   override def zero(initialValue: LongSQLMetricValue): LongSQLMetricValue = zero
 
-  override def zero: LongSQLMetricValue = new LongSQLMetricValue(0L)
+  override def zero: LongSQLMetricValue = new LongSQLMetricValue(initialValue)
 }
 
 private[sql] object SQLMetrics {
 
-  def createLongMetric(sc: SparkContext, name: String): LongSQLMetric = {
-    val acc = new LongSQLMetric(name)
+  def createLongMetric(
+     sc: SparkContext,
+     name: String,
+     stringValue: Seq[Long] => String,
+     initialValue: Long): LongSQLMetric = {
+    val param = new LongSQLMetricParam(stringValue, initialValue)
+    val acc = new LongSQLMetric(name, param)
     sc.cleaner.foreach(_.registerAccumulatorForCleanup(acc))
     acc
   }
 
+  def createLongMetric(sc: SparkContext, name: String): LongSQLMetric = {
+    createLongMetric(sc, name, _.sum.toString, 0L)
+  }
+
+  /**
+   * Create a metric to report the size information(including total, min, med, max) like data size,
--- End diff --

space before (





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-14 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/8931#discussion_r42065649
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala ---
@@ -102,20 +96,53 @@ private object LongSQLMetricParam extends SQLMetricParam[LongSQLMetricValue, Lon
 
   override def zero(initialValue: LongSQLMetricValue): LongSQLMetricValue = zero
 
-  override def zero: LongSQLMetricValue = new LongSQLMetricValue(0L)
+  override def zero: LongSQLMetricValue = new LongSQLMetricValue(initialValue)
 }
 
 private[sql] object SQLMetrics {
 
-  def createLongMetric(sc: SparkContext, name: String): LongSQLMetric = {
-    val acc = new LongSQLMetric(name)
+  def createLongMetric(
+     sc: SparkContext,
+     name: String,
+     stringValue: Seq[Long] => String,
+     initialValue: Long): LongSQLMetric = {
+    val param = new LongSQLMetricParam(stringValue, initialValue)
+    val acc = new LongSQLMetric(name, param)
     sc.cleaner.foreach(_.registerAccumulatorForCleanup(acc))
     acc
   }
 
+  def createLongMetric(sc: SparkContext, name: String): LongSQLMetric = {
+    createLongMetric(sc, name, _.sum.toString, 0L)
+  }
+
+  /**
+   * Create a metric to report the size information(including total, min, med, max) like data size,
+   * spilled size, etc.
+   */
+  def createSizeMetric(sc: SparkContext, name: String): LongSQLMetric = {
+    val stringValue = (values: Seq[Long]) => {
+      // This is a work around for https://issues.apache.org/jira/browse/SPARK-11013
+      // We use -1 as initial value of the accumulator, if the accumulator is valid, we will update
+      // it at the end of task and the value will be at least 0.
+      val validValues = values.filter(_ >= 0)
+      val Seq(sum, min, med, max) = {
+        val metric = if (validValues.length == 0) {
+          Seq.fill(4)(0L)
+        } else {
+          val sorted = validValues.sorted
+          Seq(sorted.sum, sorted(0), sorted(validValues.length / 2), sorted(validValues.length - 1))
+        }
+        metric.map(Utils.bytesToString)
+      }
+      s"\n$sum ($min, $med, $max)"
+    }
+    createLongMetric(sc, s"$name total (min, med, max)", stringValue, -1L)
--- End diff --

can you add an example of what it looks like?
```
// e.g. data size total (min, med, max): 100g (100m, 1g, 10g)
```
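
For illustration, the display string requested above can be sketched in plain Scala without a SparkContext. This is a minimal sketch and not the code in the PR: `SizeMetricSketch`, `humanBytes`, and `format` are hypothetical names, and `humanBytes` is a simplified stand-in for `Utils.bytesToString`, whose exact output format is not shown in this thread and likely differs. The sketch also joins the name and the value on one line, as in the example comment, whereas the quoted diff builds them separately.

```scala
object SizeMetricSketch {
  // Hypothetical stand-in for Utils.bytesToString: whole units only, no decimals.
  def humanBytes(size: Long): String = {
    val units = Seq((1L << 30) -> "GB", (1L << 20) -> "MB", (1L << 10) -> "KB")
    units.collectFirst {
      case (factor, unit) if size >= factor => s"${size / factor} $unit"
    }.getOrElse(s"$size B")
  }

  // Follows the logic of the quoted diff: drop negative (sentinel) values,
  // then report the total followed by (min, med, max).
  def format(name: String, values: Seq[Long]): String = {
    val valid = values.filter(_ >= 0).sorted
    val stats =
      if (valid.isEmpty) Seq.fill(4)(0L)
      else Seq(valid.sum, valid.head, valid(valid.length / 2), valid.last)
    val Seq(sum, min, med, max) = stats.map(humanBytes)
    s"$name total (min, med, max): $sum ($min, $med, $max)"
  }
}
```

With per-task values of 1024, 2048, and 4096 bytes plus one untouched accumulator left at -1, `SizeMetricSketch.format("data size", Seq(-1L, 1024L, 2048L, 4096L))` would yield `data size total (min, med, max): 7 KB (1 KB, 2 KB, 4 KB)` under these assumptions.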





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-14 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/8931#discussion_r42065578
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala ---
@@ -102,20 +96,53 @@ private object LongSQLMetricParam extends SQLMetricParam[LongSQLMetricValue, Lon
 
   override def zero(initialValue: LongSQLMetricValue): LongSQLMetricValue = zero
 
-  override def zero: LongSQLMetricValue = new LongSQLMetricValue(0L)
+  override def zero: LongSQLMetricValue = new LongSQLMetricValue(initialValue)
 }
 
 private[sql] object SQLMetrics {
 
-  def createLongMetric(sc: SparkContext, name: String): LongSQLMetric = {
-    val acc = new LongSQLMetric(name)
+  def createLongMetric(
+     sc: SparkContext,
+     name: String,
+     stringValue: Seq[Long] => String,
+     initialValue: Long): LongSQLMetric = {
+    val param = new LongSQLMetricParam(stringValue, initialValue)
+    val acc = new LongSQLMetric(name, param)
     sc.cleaner.foreach(_.registerAccumulatorForCleanup(acc))
     acc
   }
 
+  def createLongMetric(sc: SparkContext, name: String): LongSQLMetric = {
+    createLongMetric(sc, name, _.sum.toString, 0L)
+  }
+
+  /**
+   * Create a metric to report the size information(including total, min, med, max) like data size,
+   * spilled size, etc.
+   */
+  def createSizeMetric(sc: SparkContext, name: String): LongSQLMetric = {
+    val stringValue = (values: Seq[Long]) => {
+      // This is a work around for https://issues.apache.org/jira/browse/SPARK-11013
--- End diff --

nit: just put `SPARK-11013` here instead of the whole URL





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-14 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/8931#discussion_r42065490
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala ---
@@ -102,20 +96,53 @@ private object LongSQLMetricParam extends SQLMetricParam[LongSQLMetricValue, Lon
 
   override def zero(initialValue: LongSQLMetricValue): LongSQLMetricValue = zero
 
-  override def zero: LongSQLMetricValue = new LongSQLMetricValue(0L)
+  override def zero: LongSQLMetricValue = new LongSQLMetricValue(initialValue)
 }
 
 private[sql] object SQLMetrics {
 
-  def createLongMetric(sc: SparkContext, name: String): LongSQLMetric = {
-    val acc = new LongSQLMetric(name)
+  def createLongMetric(
+     sc: SparkContext,
+     name: String,
+     stringValue: Seq[Long] => String,
+     initialValue: Long): LongSQLMetric = {
--- End diff --

indent by 1 more space





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-14 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-148230227
  
LGTM. @cloud-fan can you try to produce a screenshot where the spill size 
is not always 0?





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-14 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-148230804
  
A question for a follow-up patch: Right now each operator looks a little 
cluttered. @rxin @pwendell What do you guys think about displaying just the 
total by default, and showing the quantiles on hover?





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-147829168
  
[Test build #43665 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43665/consoleFull) for PR 8931 at commit [`fdae182`](https://github.com/apache/spark/commit/fdae1827564a0535f22a19b432442d66e56f12a6).





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-147827720
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-147827799
  
Merged build started.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-147863989
  
[Test build #43665 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43665/console) for PR 8931 at commit [`fdae182`](https://github.com/apache/spark/commit/fdae1827564a0535f22a19b432442d66e56f12a6).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-147864159
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43665/





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-147864158
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-147528913
  
Merged build started.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-12 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-147530041
  
Worked around https://issues.apache.org/jira/browse/SPARK-11013 by using -1 
as the initial value and keeping only the valid values (greater than or equal to 0).

It's ready for review now :)
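
A minimal sketch of the workaround described above, written as plain Java rather than against Spark's accumulator API. Each per-task slot starts at -1, and only values greater than 0 are kept before aggregating. The class and method names are hypothetical and are not taken from the PR.

```java
import java.util.Arrays;

// Hypothetical helper, not Spark code: illustrates the "-1 as initial value" idea.
final class SentinelMetrics {
    // Marker for "this task never reported a value".
    static final long UNSET = -1L;

    // Create one slot per task, all initialised to the sentinel.
    static long[] newSlots(int tasks) {
        long[] slots = new long[tasks];
        Arrays.fill(slots, UNSET);
        return slots;
    }

    // Keep only the values a task actually reported (greater than 0).
    static long[] validValues(long[] perTaskValues) {
        return Arrays.stream(perTaskValues).filter(v -> v > 0).toArray();
    }

    // Sum of the valid values; untouched slots contribute nothing.
    static long total(long[] perTaskValues) {
        return Arrays.stream(validValues(perTaskValues)).sum();
    }
}
```

With this shape, a slot that was never updated cannot be mistaken for a real measurement, at the cost of also dropping genuine zero readings.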





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-147530110
  
  [Test build #43586 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43586/consoleFull)
 for   PR 8931 at commit 
[`380662e`](https://github.com/apache/spark/commit/380662e70041a3bd0b59e2870b3f7ee8c8e1aa6e).





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-147569486
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-147569487
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43603/
Test FAILed.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-147534437
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43586/
Test FAILed.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-147534435
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-147534385
  
  [Test build #43586 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43586/console)
 for   PR 8931 at commit 
[`380662e`](https://github.com/apache/spark/commit/380662e70041a3bd0b59e2870b3f7ee8c8e1aa6e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-147565828
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-147565843
  
Merged build started.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-147528883
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-147589254
  
  [Test build #43612 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43612/consoleFull)
 for   PR 8931 at commit 
[`796fce1`](https://github.com/apache/spark/commit/796fce147f2bafa10b115d305ceaab8b382134e9).





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-147587062
  
Merged build started.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-147614038
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-12 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-147613951
  
  [Test build #43612 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43612/console)
 for   PR 8931 at commit 
[`796fce1`](https://github.com/apache/spark/commit/796fce147f2bafa10b115d305ceaab8b382134e9).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-147614040
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43612/
Test PASSed.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-147587017
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-12 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-147586197
  
retest this please.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-08 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/8931#discussion_r41531452
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIterator.scala ---
@@ -690,6 +697,9 @@ class TungstenAggregationIterator(
 val mapMemory = hashMap.getPeakMemoryUsedBytes
 val sorterMemory = Option(externalSorter).map(_.getPeakMemoryUsedBytes).getOrElse(0L)
 val peakMemory = Math.max(mapMemory, sorterMemory)
+totalPeakMemory += peakMemory
+maxPeakMemory += peakMemory
+totalSpilledSize += TaskContext.get().taskMetrics().memoryBytesSpilled - spilledSizeBefore
--- End diff --

@andrewor14 is there any place that needs to create multiple threads in a 
task? Currently, accumulators are not thread-safe. I assume there is only one 
thread updating accumulators and one thread collecting accumulators' values 
when running a task. If there are multiple updating threads, we need to make 
accumulators thread-safe.
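
As a rough illustration of what "thread-safe" would mean for a pair of counters like the ones in this diff, here is a sketch that uses `java.util.concurrent.atomic.LongAdder` for the running total and `LongAccumulator` for the running maximum. This is an assumption about one possible approach, not how Spark's accumulators are implemented; the class name is hypothetical.

```java
import java.util.concurrent.atomic.LongAccumulator;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for a total/max metric pair that several threads may update.
final class ThreadSafeMemoryMetrics {
    private final LongAdder total = new LongAdder();
    private final LongAccumulator max = new LongAccumulator(Math::max, 0L);

    // Safe to call from any number of updating threads.
    void record(long peakBytes) {
        total.add(peakBytes);
        max.accumulate(peakBytes);
    }

    long total() { return total.sum(); }

    long max() { return max.get(); }

    // Demo driver: each thread records the same value once, then all are joined.
    static ThreadSafeMemoryMetrics recordFromThreads(int threads, long bytesEach) {
        ThreadSafeMemoryMetrics metrics = new ThreadSafeMemoryMetrics();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> metrics.record(bytesEach));
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return metrics;
    }
}
```

A plain `long` field updated with `+=` from several threads can lose updates; the atomic classes avoid that without an explicit lock.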





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-08 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/8931#discussion_r41534622
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIterator.scala ---
@@ -690,6 +697,9 @@ class TungstenAggregationIterator(
 val mapMemory = hashMap.getPeakMemoryUsedBytes
 val sorterMemory = Option(externalSorter).map(_.getPeakMemoryUsedBytes).getOrElse(0L)
 val peakMemory = Math.max(mapMemory, sorterMemory)
+totalPeakMemory += peakMemory
+maxPeakMemory += peakMemory
+totalSpilledSize += TaskContext.get().taskMetrics().memoryBytesSpilled - spilledSizeBefore
--- End diff --

It's possible for pipelined transformations inside of a stage to be 
executed in separate threads if there's a PythonRDD, RRDD, PipedRDD, or Hive 
ScriptTransformation transformation in the middle of the chain.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-08 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-146675441
  
This is blocked by https://issues.apache.org/jira/browse/SPARK-11013
Will work on that first.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-14172
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-14191
  
Merged build started.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-146669509
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-146669512
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43420/
Test FAILed.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-07 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/8931#discussion_r41468414
  
--- Diff: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java ---
@@ -560,7 +560,7 @@ public boolean putNewKey(
   // Here, we'll copy the data into our data pages. Because we only store a relative offset from
   // the key address instead of storing the absolute address of the value, the key and value
   // must be stored in the same memory page.
-  // (8 byte key length) (key) (value)
+  // (8 bytes length info) (key) (value)
--- End diff --

why change this? wasn't it correct before?





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-07 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-146389520
  
I slightly prefer to separate the total and the quantiles, e.g.
```
memory total: 10G
memory per task (min / med / max): 10MB / 100MB / 1GB
```
These are really two different metrics: the first is the total memory 
aggregated across all tasks, while the second is the memory used within each 
task. Right now you have many numbers on the same line where some of them mean 
different things.
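
A small sketch of the two-line layout proposed above, assuming the per-task peak memory values are available as an array of byte counts. The class, the method names, and the 1024-based unit formatting are illustrative choices, not part of the PR.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical formatter: prints the total on one line and the per-task quantiles on another.
final class MemoryReport {
    // Render a byte count with a binary (1024-based) unit and one decimal place.
    static String humanBytes(long bytes) {
        String[] units = {"B", "KB", "MB", "GB", "TB"};
        double value = bytes;
        int unit = 0;
        while (value >= 1024 && unit < units.length - 1) {
            value /= 1024;
            unit++;
        }
        return String.format(Locale.ROOT, "%.1f %s", value, units[unit]);
    }

    static String render(long[] perTaskBytes) {
        if (perTaskBytes.length == 0) {
            throw new IllegalArgumentException("need at least one task value");
        }
        long[] sorted = perTaskBytes.clone();
        Arrays.sort(sorted);
        long total = Arrays.stream(sorted).sum();
        long min = sorted[0];
        long med = sorted[sorted.length / 2]; // upper median for even lengths
        long max = sorted[sorted.length - 1];
        return "memory total: " + humanBytes(total) + "\n"
            + "memory per task (min / med / max): "
            + humanBytes(min) + " / " + humanBytes(med) + " / " + humanBytes(max);
    }
}
```

Keeping the aggregate and the per-task distribution on separate lines means each line carries numbers of a single kind.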





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-07 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/8931#discussion_r41468810
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIterator.scala ---
@@ -690,6 +697,9 @@ class TungstenAggregationIterator(
 val mapMemory = hashMap.getPeakMemoryUsedBytes
 val sorterMemory = Option(externalSorter).map(_.getPeakMemoryUsedBytes).getOrElse(0L)
 val peakMemory = Math.max(mapMemory, sorterMemory)
+totalPeakMemory += peakMemory
+maxPeakMemory += peakMemory
+totalSpilledSize += TaskContext.get().taskMetrics().memoryBytesSpilled - spilledSizeBefore
--- End diff --

This seems fine, but I wonder what happens if you have multiple threads per 
task, i.e. if another thread spills a lot of data in the meantime in a 
different operator. Is that a potential problem?
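
To make the concern concrete, here is a single-threaded simulation of the before/after pattern in the diff. It assumes the spill counter is shared by every operator in the task; `TaskCounter` is a hypothetical stand-in for the task-level metric, not Spark's `TaskMetrics`.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of attributing spill to one operator via a shared task-wide counter.
final class SpillDelta {
    // Stand-in for a "bytes spilled" counter shared by all operators in a task.
    static final class TaskCounter {
        final AtomicLong bytesSpilled = new AtomicLong();
    }

    // Returns what one operator would report as its own spill using "after minus before".
    static long attributedSpill(TaskCounter counter, long ownSpill, long spillByOthersMeanwhile) {
        long before = counter.bytesSpilled.get();
        counter.bytesSpilled.addAndGet(ownSpill);
        // Another operator in the same task spills while this one is still running.
        counter.bytesSpilled.addAndGet(spillByOthersMeanwhile);
        return counter.bytesSpilled.get() - before;
    }
}
```

When nothing else spills, the delta equals the operator's own spill; any spill by another operator in the same window is silently added to it.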





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-07 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/8931#discussion_r41468737
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala ---
@@ -35,6 +36,12 @@ private[sql] abstract class SQLMetric[R <: SQLMetricValue[T], T](
  */
 private[sql] trait SQLMetricParam[R <: SQLMetricValue[T], T] extends AccumulableParam[R, T] {
 
+  val mergeValues: (T, T) => T
--- End diff --

should add a comment here. What are you merging?
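
One plausible reading of `mergeValues: (T, T) => T`, stated here as an assumption rather than as the PR's documented intent, is a function that combines the current metric value with a newly reported one: a sum for totals and a maximum for per-task peaks. The Java sketch below uses `LongBinaryOperator` for that role; the class name is hypothetical.

```java
import java.util.function.LongBinaryOperator;

// Hypothetical metric whose update rule is supplied as a merge function.
final class MergingMetric {
    // Combines the current value with an incoming value, e.g. Long::sum or Math::max.
    private final LongBinaryOperator mergeValues;
    private long value;

    MergingMetric(long initialValue, LongBinaryOperator mergeValues) {
        this.value = initialValue;
        this.mergeValues = mergeValues;
    }

    // Fold one more reported value into the metric.
    void add(long reported) {
        value = mergeValues.applyAsLong(value, reported);
    }

    long value() { return value; }
}
```

Under this reading, the same `add` call behaves as "accumulate" for a total metric and as "keep the largest" for a max metric, which is the kind of behaviour a comment on the field could spell out.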





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-07 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/8931#discussion_r41419650
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala ---
@@ -69,6 +72,9 @@ case class TungstenAggregate(
   protected override def doExecute(): RDD[InternalRow] = attachTree(this, "execute") {
 val numInputRows = longMetric("numInputRows")
 val numOutputRows = longMetric("numOutputRows")
+val totalPeakMemory = longMetric("totalPeakMemory")
+val maxPeakMemory = longMetric("maxPeakMemory")
--- End diff --

max and peak are redundant.  maxTaskMemory? maxPerTaskMemory?





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-07 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-146270250
  
What was the decision on having percentiles for per task memory?





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-07 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-146291989
  
> memory total (min,med,max): 10GB (1MB, 100MB, 1GB)

@pwendell that's my new favorite, with the caveat that we should say `data 
size` not `memory` (to avoid confusion since the same memory could actually be 
reused by multiple tasks)





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-07 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-146292851
  
(and data size should include all the data, including spilled)





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-07 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-146293403
  
SGTM.

On Wed, Oct 7, 2015 at 11:55 AM, Reynold Xin wrote:

> (and data size should include all the data, including spilled)






[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-07 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/8931#discussion_r41424166
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala ---
@@ -69,6 +72,9 @@ case class TungstenAggregate(
   protected override def doExecute(): RDD[InternalRow] = attachTree(this, "execute") {
 val numInputRows = longMetric("numInputRows")
 val numOutputRows = longMetric("numOutputRows")
+val totalPeakMemory = longMetric("totalPeakMemory")
--- End diff --

Also is this really total and peak, or just total?





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-07 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/8931#discussion_r41426589
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala ---
@@ -69,6 +72,9 @@ case class TungstenAggregate(
   protected override def doExecute(): RDD[InternalRow] = attachTree(this, "execute") {
 val numInputRows = longMetric("numInputRows")
 val numOutputRows = longMetric("numOutputRows")
+val totalPeakMemory = longMetric("totalPeakMemory")
--- End diff --

Also it's weird to say "total" in some places but not in others. For 
instance, records in and out are also totals, but it doesn't say "total" in 
those.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-07 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-146285580
  
I think we want to have concise and consistent names here across all the 
metrics. Here is my proposal for naming:

```
input rows
output rows
spilled data
memory
task memory (max)
task spilled data (max)
```

I think the word "peak" is not necessary because I assume you report the 
peak memory over the lifetime of a task. I think the word "total" is not 
necessary because these are accumulated values and can be assumed to be total 
unless otherwise stated.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-07 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-146288685
  
The reason I like accumulated memory is that it's something that should be 
roughly constant over multiple runs of a workload so people can get a sense of 
how much data they are buffering during execution. The max and median will 
depend a lot on how tasks are scheduled, etc, so they don't give someone a 
great idea of how they can change their query or data to get memory under 
control. It's just like how in Hadoop you can see the total input size for a job. 
These totals are often really helpful.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-07 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-146289878
  
My ideal UI would be:

high level:
- input rows
- output rows
- spill (using on-disk size)
- data size (instead of calling it memory - this should include the total 
data size)

low level (maybe a tab or a separate page):
- histogram on task memory
- histogram on task spill

If we don't have time to implement the histogram, we can either not report 
the individual task ones, or report them if they are not too verbose. 






[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-07 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-146278708
  
This PR shows that we can support percentile metrics (like max) and what the 
result looks like on the web page, so we can discuss the percentile metrics 
here.
Unlike the stage page, we don't have a table to show metrics, so having a 
lot of percentile metrics may look messy (imagine min, 25%, median, 75%, max all 
appearing in an operator block line by line).
It looks to me that the max task memory is the most important, so I only put 
it here. If you do think all percentile metrics are necessary, we can add them 
easily.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-07 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-146280376
  
I think at least "min" "median" and "max" per task memory would be useful 
and not too much clutter.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/8931#discussion_r41425408
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala
 ---
@@ -69,6 +72,9 @@ case class TungstenAggregate(
   protected override def doExecute(): RDD[InternalRow] = attachTree(this, 
"execute") {
 val numInputRows = longMetric("numInputRows")
 val numOutputRows = longMetric("numOutputRows")
+val totalPeakMemory = longMetric("totalPeakMemory")
--- End diff --

it's a total amount of all tasks' peak memory.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-07 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/8931#discussion_r41425821
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala
 ---
@@ -69,6 +72,9 @@ case class TungstenAggregate(
   protected override def doExecute(): RDD[InternalRow] = attachTree(this, 
"execute") {
 val numInputRows = longMetric("numInputRows")
 val numOutputRows = longMetric("numOutputRows")
+val totalPeakMemory = longMetric("totalPeakMemory")
--- End diff --

I see, given limited space, I question the utility of this metric (these 
tasks might not have even been running at the same time).  The percentiles seem 
way more useful (i.e. is skew causing some partitions to spill or are they all 
spilling).  We can leave it in if @pwendell and @rxin feel strongly though.
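
To make the concern above concrete, here is a minimal sketch of why a sum of per-task peaks can overstate what was ever held at one time. Everything in it is hypothetical and not Spark code: `TaskRun`, `sumOfPeaks` and `maxConcurrent` are illustrative names, and the model assumes each task holds its peak for its whole run.

```scala
// Hypothetical model, not Spark code: a task's run interval and its peak memory.
final case class TaskRun(startMs: Long, endMs: Long, peakBytes: Long)

object PeakVsTotal {
  // What a summed accumulator would report: every task's peak added together.
  def sumOfPeaks(tasks: Seq[TaskRun]): Long = tasks.map(_.peakBytes).sum

  // Upper bound on memory held at any instant, assuming each task holds
  // its peak for its entire run.
  def maxConcurrent(tasks: Seq[TaskRun]): Long = {
    // Emit +peak at start and -peak at end. Sorting by (time, delta) puts
    // releases before acquisitions when two events share a timestamp.
    val events = tasks
      .flatMap(t => Seq((t.startMs, t.peakBytes), (t.endMs, -t.peakBytes)))
      .sortBy { case (time, delta) => (time, delta) }
    // Running total after each event; the largest value is the bound.
    events.scanLeft(0L)((held, event) => held + event._2).max
  }
}
```

If the first of three tasks finishes before the other two start, `sumOfPeaks` still counts all three, while `maxConcurrent` only counts the two that overlap.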





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-07 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/8931#discussion_r41426412
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala
 ---
@@ -69,6 +72,9 @@ case class TungstenAggregate(
   protected override def doExecute(): RDD[InternalRow] = attachTree(this, 
"execute") {
 val numInputRows = longMetric("numInputRows")
 val numOutputRows = longMetric("numOutputRows")
+val totalPeakMemory = longMetric("totalPeakMemory")
--- End diff --

Hey can you give a more precise definition here of what this means? I think 
the word "Peak" is throwing me off and maybe we could delete it, if you say 
"memory" I will assume you mean the maximum amount of memory a task is using 
over its lifetime. I think on this one it might be best to just discuss it 
briefly in person.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-07 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-146287101
  
I agree "peak" is not the right word because we always report the memory at 
the end of the task, so it's not "peak".

I'll change to use @pwendell 's naming suggestions if others don't have 
better ideas.
 cc @marmbrus @rxin 





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-07 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-146287008
  
@pwendell , I like those.  I'd consider dropping memory completely and 
adding (min).





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-07 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-146289643
  
BTW - one alternative would be to create an accumulator that tracks max, 
min, median, and total and then have it display nicely in two lines. For 
instance:

```
memory total (min,med,max):
10GB (1MB,100MB,1GB)
```
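
A rough sketch of how such a two-line summary could be computed from per-task values. This is not the PR's implementation: `MemorySummary`, `formatBytes` and `summarize` are hypothetical names, the units are binary (1KB = 1024B), and the median is the upper median for an even number of tasks.

```scala
// Hypothetical helper, not part of Spark: summarizes per-task memory values.
object MemorySummary {
  // Format a byte count with a coarse binary unit, rounded to a whole number.
  def formatBytes(bytes: Long): String = {
    val units = Seq("B", "KB", "MB", "GB", "TB")
    var value = bytes.toDouble
    var i = 0
    while (value >= 1024 && i < units.length - 1) {
      value /= 1024
      i += 1
    }
    f"$value%.0f${units(i)}"
  }

  // Produce "total (min,med,max)" from one value per task.
  def summarize(taskBytes: Seq[Long]): String = {
    require(taskBytes.nonEmpty, "need at least one task value")
    val sorted = taskBytes.sorted
    val total = sorted.sum
    val median = sorted(sorted.length / 2) // upper median for even counts
    val min = formatBytes(sorted.head)
    val med = formatBytes(median)
    val max = formatBytes(sorted.last)
    s"${formatBytes(total)} ($min,$med,$max)"
  }
}
```

A real accumulator would also have to carry every task's value to the driver, since a median cannot be merged from partial medians.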





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-146080586
  
  [Test build #43314 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43314/console)
 for   PR 8931 at commit 
[`0d881ab`](https://github.com/apache/spark/commit/0d881ab779b37053e0e28559279fc18da3bbcb51).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-146080646
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-146080647
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43314/
Test PASSed.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-146057685
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-146057707
  
Merged build started.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-06 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-146057863
  
cc @andrewor14 @yhuai 





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-146058733
  
  [Test build #43312 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43312/consoleFull)
 for   PR 8931 at commit 
[`59a23f7`](https://github.com/apache/spark/commit/59a23f75da0a59e2f1423390df2fbb68347a9db4).





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-146059664
  
  [Test build #43312 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43312/console)
 for   PR 8931 at commit 
[`59a23f7`](https://github.com/apache/spark/commit/59a23f75da0a59e2f1423390df2fbb68347a9db4).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-146059680
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43312/
Test FAILed.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-146059674
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-146060882
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-146060898
  
Merged build started.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-146061603
  
  [Test build #43314 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43314/consoleFull)
 for   PR 8931 at commit 
[`0d881ab`](https://github.com/apache/spark/commit/0d881ab779b37053e0e28559279fc18da3bbcb51).





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-10-06 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-146062259
  
@zsxwing can you have a look?





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-09-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-144111611
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-09-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-144103038
  
  [Test build #43082 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43082/consoleFull)
 for   PR 8931 at commit 
[`4f93884`](https://github.com/apache/spark/commit/4f93884ffb2dd8eedda508ba35bf9f804e5a016e).





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-09-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-144101834
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-09-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-144111542
  
  [Test build #43082 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43082/console)
 for   PR 8931 at commit 
[`4f93884`](https://github.com/apache/spark/commit/4f93884ffb2dd8eedda508ba35bf9f804e5a016e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-09-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-144101862
  
Merged build started.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-09-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-144111616
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43082/
Test FAILed.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-09-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-144223101
  
Merged build started.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-09-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-144223081
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-09-29 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/8931#discussion_r40735019
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregate.scala
 ---
@@ -39,7 +40,9 @@ case class TungstenAggregate(
 
   override private[sql] lazy val metrics = Map(
 "numInputRows" -> SQLMetrics.createLongMetric(sparkContext, "number of 
input rows"),
-"numOutputRows" -> SQLMetrics.createLongMetric(sparkContext, "number 
of output rows"))
+"numOutputRows" -> SQLMetrics.createLongMetric(sparkContext, "number 
of output rows"),
+"numBytesUsed" ->
--- End diff --

+1 `peakMemoryUsed`. We will show the unit in the value string, right? So, 
we do not say the unit in the metric name?





[GitHub] spark pull request: [SPARK-10412][SQL] report memory usage for tun...

2015-09-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8931#issuecomment-144224468
  
  [Test build #1831 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1831/consoleFull)
 for   PR 8931 at commit 
[`36e700f`](https://github.com/apache/spark/commit/36e700f6690abc50b34bbc21ddc035e8f2f76956).




