[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/12899


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-10 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-218243622
  
LGTM, merging into master 2.0. Let's address any follow-ups in a future patch.


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-10 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/12899#discussion_r62723405
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -155,7 +155,13 @@ private[spark] abstract class Task[T](
    */
   def collectAccumulatorUpdates(taskFailed: Boolean = false): Seq[AccumulatorV2[_, _]] = {
     if (context != null) {
-      context.taskMetrics.accumulators().filter { a => !taskFailed || a.countFailedValues }
+      context.taskMetrics.internalAccums.filter { a =>
+        // RESULT_SIZE accumulator is always zero at executor, we need to send it back as its
+        // value will be updated at driver side.
+        !a.isZero || a.name == Some(InternalAccumulator.RESULT_SIZE)
--- End diff --

OK, I'll add it when I merge.


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-10 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/12899#discussion_r62711158
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -155,7 +155,13 @@ private[spark] abstract class Task[T](
    */
   def collectAccumulatorUpdates(taskFailed: Boolean = false): Seq[AccumulatorV2[_, _]] = {
     if (context != null) {
-      context.taskMetrics.accumulators().filter { a => !taskFailed || a.countFailedValues }
+      context.taskMetrics.internalAccums.filter { a =>
+        // RESULT_SIZE accumulator is always zero at executor, we need to send it back as its
+        // value will be updated at driver side.
+        !a.isZero || a.name == Some(InternalAccumulator.RESULT_SIZE)
--- End diff --

Can we add a comment to say internal accumulators will always count on 
failures?
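The filtering being discussed can be modelled outside Spark with a small sketch. This is a simplified, hypothetical model: `Acc` and `CollectUpdatesSketch` are illustrative stand-ins, not Spark's `AccumulatorV2` or `Task` API, the result-size name string is assumed, and it assumes external accumulators keep the `countFailedValues` filter from the removed line. Internal accumulators are dropped only when zero (RESULT_SIZE excepted), which is where the network IO saving comes from; external ones are kept even when zero.

```scala
// Hypothetical, simplified stand-in for an accumulator; NOT Spark's AccumulatorV2 API.
final case class Acc(name: Option[String], value: Long, countFailedValues: Boolean) {
  def isZero: Boolean = value == 0L
}

object CollectUpdatesSketch {
  // Assumed name string, playing the role of InternalAccumulator.RESULT_SIZE.
  val ResultSize: String = "internal.metrics.resultSize"

  def collectAccumulatorUpdates(
      internalAccums: Seq[Acc],
      externalAccums: Seq[Acc],
      taskFailed: Boolean): Seq[Acc] = {
    // Internal accumulators (task metrics) always count failed values, so they are
    // filtered only on zero-ness. RESULT_SIZE is zero on the executor but is kept,
    // because its value is filled in on the driver side.
    val internal = internalAccums.filter { a =>
      !a.isZero || a.name.contains(ResultSize)
    }
    // Zero-valued external accumulators (e.g. SQL metrics) are not dropped; on a
    // failed task, only those registered with countFailedValues = true are sent.
    val external = externalAccums.filter(a => !taskFailed || a.countFailedValues)
    internal ++ external
  }
}
```

In this model a zero-valued internal metric such as run time is never sent, while a zero-valued SQL metric still is, unless the task failed and the metric does not count failed values.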


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-10 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/12899#discussion_r62710891
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -155,7 +155,13 @@ private[spark] abstract class Task[T](
    */
   def collectAccumulatorUpdates(taskFailed: Boolean = false): Seq[AccumulatorV2[_, _]] = {
     if (context != null) {
-      context.taskMetrics.accumulators().filter { a => !taskFailed || a.countFailedValues }
+      context.taskMetrics.internalAccums.filter { a =>
+        // RESULT_SIZE accumulator is always zero at executor, we need to send it back as its
+        // value will be updated at driver side.
+        !a.isZero || a.name == Some(InternalAccumulator.RESULT_SIZE)
+      // zero value external accumulators may still be useful, e.g. SQLMetrics, we should not filter
--- End diff --

Sounds good to me.


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-10 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/12899#discussion_r62709430
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -155,7 +155,13 @@ private[spark] abstract class Task[T](
    */
   def collectAccumulatorUpdates(taskFailed: Boolean = false): Seq[AccumulatorV2[_, _]] = {
     if (context != null) {
-      context.taskMetrics.accumulators().filter { a => !taskFailed || a.countFailedValues }
+      context.taskMetrics.internalAccums.filter { a =>
+        // RESULT_SIZE accumulator is always zero at executor, we need to send it back as its
+        // value will be updated at driver side.
+        !a.isZero || a.name == Some(InternalAccumulator.RESULT_SIZE)
+      // zero value external accumulators may still be useful, e.g. SQLMetrics, we should not filter
--- End diff --

If this is the only issue, can we merge this pull request and open a new PR to fix these semantics, which there is clearly some disagreement about?


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-10 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/12899#discussion_r62708707
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -155,7 +155,13 @@ private[spark] abstract class Task[T](
    */
   def collectAccumulatorUpdates(taskFailed: Boolean = false): Seq[AccumulatorV2[_, _]] = {
     if (context != null) {
-      context.taskMetrics.accumulators().filter { a => !taskFailed || a.countFailedValues }
+      context.taskMetrics.internalAccums.filter { a =>
+        // RESULT_SIZE accumulator is always zero at executor, we need to send it back as its
+        // value will be updated at driver side.
+        !a.isZero || a.name == Some(InternalAccumulator.RESULT_SIZE)
+      // zero value external accumulators may still be useful, e.g. SQLMetrics, we should not filter
--- End diff --

If there is no spilling, we could say the spill size is undefined (null for unknown). We also have the total value, so we could know how many tasks had spilled. Right now we can't tell how many tasks spilled, which is actually worse.

I don't think it's wrong. 
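A minimal sketch of the "zero as null" idea, assuming per-task spill sizes in bytes and using `Option` in place of `null`. `SpillAsNullSketch` and its methods are hypothetical names for illustration, not Spark code. Counting the defined entries gives the number of tasks that spilled, while the total is the same as summing the raw values.

```scala
object SpillAsNullSketch {
  // Map each task's spill size to None when it is zero ("no spill / unknown"),
  // mirroring the suggestion to treat a zero value as null.
  def toNullable(spillBytes: Seq[Long]): Seq[Option[Long]] =
    spillBytes.map(b => if (b == 0L) None else Some(b))

  // Number of tasks that actually spilled: the non-null entries.
  def tasksThatSpilled(values: Seq[Option[Long]]): Int = values.count(_.isDefined)

  // Total spill size, ignoring nulls (equal to summing the original values).
  def totalSpill(values: Seq[Option[Long]]): Long = values.flatten.sum
}
```

For five tasks spilling `0, 0, 100, 0, 50` bytes, this model reports 2 spilling tasks and a total of 150 bytes.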


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/12899#discussion_r62622150
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -155,7 +155,13 @@ private[spark] abstract class Task[T](
    */
   def collectAccumulatorUpdates(taskFailed: Boolean = false): Seq[AccumulatorV2[_, _]] = {
     if (context != null) {
-      context.taskMetrics.accumulators().filter { a => !taskFailed || a.countFailedValues }
+      context.taskMetrics.internalAccums.filter { a =>
+        // RESULT_SIZE accumulator is always zero at executor, we need to send it back as its
+        // value will be updated at driver side.
+        !a.isZero || a.name == Some(InternalAccumulator.RESULT_SIZE)
+      // zero value external accumulators may still be useful, e.g. SQLMetrics, we should not filter
--- End diff --

These are task-level statistics. Say an operator launches 100 tasks, 99 of which don't spill and only one of which spills 100 MB. If we don't include zero values, the average `spilling size` will be 100 MB, which is obviously wrong.
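The arithmetic in this example can be checked with a small sketch. `SpillAverageSketch` is a hypothetical helper, not Spark code, and values are in MB for readability. The second function models what a driver would compute if zero-valued updates were never sent.

```scala
object SpillAverageSketch {
  // Average over all tasks, treating "no spill" as 0 MB.
  def avgIncludingZeros(perTask: Seq[Long]): Double =
    if (perTask.isEmpty) 0.0 else perTask.sum.toDouble / perTask.size

  // Average over only the tasks that reported a non-zero value.
  def avgExcludingZeros(perTask: Seq[Long]): Double = {
    val nonZero = perTask.filter(_ != 0L)
    if (nonZero.isEmpty) 0.0 else nonZero.sum.toDouble / nonZero.size
  }
}
```

With `val tasks = Seq.fill(99)(0L) :+ 100L`, `avgIncludingZeros(tasks)` is 1.0 while `avgExcludingZeros(tasks)` is 100.0, matching the 100 MB figure in the comment.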


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-09 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/12899#discussion_r62614867
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -155,7 +155,13 @@ private[spark] abstract class Task[T](
    */
   def collectAccumulatorUpdates(taskFailed: Boolean = false): Seq[AccumulatorV2[_, _]] = {
     if (context != null) {
-      context.taskMetrics.accumulators().filter { a => !taskFailed || a.countFailedValues }
+      context.taskMetrics.internalAccums.filter { a =>
+        // RESULT_SIZE accumulator is always zero at executor, we need to send it back as its
+        // value will be updated at driver side.
+        !a.isZero || a.name == Some(InternalAccumulator.RESULT_SIZE)
+      // zero value external accumulators may still be useful, e.g. SQLMetrics, we should not filter
--- End diff --

We could treat a zero value as `null`; it's reasonable not to include `null` in sum/avg. Is there any downside to excluding these zeros?


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-218042313
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58185/


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-218042312
  
Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-218042198
  
**[Test build #58185 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58185/consoleFull)** for PR 12899 at commit [`18aa4ab`](https://github.com/apache/spark/commit/18aa4abb4ddd4cf0800e0b353077d083f66096de).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-218027664
  
**[Test build #58185 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58185/consoleFull)** for PR 12899 at commit [`18aa4ab`](https://github.com/apache/spark/commit/18aa4abb4ddd4cf0800e0b353077d083f66096de).


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-09 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/12899#discussion_r62595519
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala ---
@@ -66,7 +66,7 @@ private[sql] object SQLMetrics {
 
   def createMetric(sc: SparkContext, name: String): SQLMetric = {
     val acc = new SQLMetric(SUM_METRIC)
-    acc.register(sc, name = Some(name), countFailedValues = true)
+    acc.register(sc, name = Some(name), countFailedValues = false)
--- End diff --

It was a mistake; SQLMetric should not count failed values.
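The thread does not spell out the reason, but one plausible illustration is a row count: if a failed attempt's partial value is merged and the retried attempt then reports its full value, the total is inflated. The sketch below is a hypothetical, simplified model of a driver-side merge; `FailedValuesSketch` and `Attempt` are illustrative names, not Spark's API, and the numbers are made up.

```scala
object FailedValuesSketch {
  // Hypothetical record of one task attempt's local value for a single metric.
  final case class Attempt(value: Long, failed: Boolean)

  // Driver-side merge: values from failed attempts are included only when the
  // metric was registered with countFailedValues = true.
  def merged(attempts: Seq[Attempt], countFailedValues: Boolean): Long =
    attempts.filter(a => !a.failed || countFailedValues).map(_.value).sum
}
```

If an attempt outputs 4 rows and then fails, and the retry outputs all 10 rows, counting failed values yields 14, while not counting them yields the true 10.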


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-09 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/12899#discussion_r62595350
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -155,7 +155,13 @@ private[spark] abstract class Task[T](
    */
   def collectAccumulatorUpdates(taskFailed: Boolean = false): Seq[AccumulatorV2[_, _]] = {
     if (context != null) {
-      context.taskMetrics.accumulators().filter { a => !taskFailed || a.countFailedValues }
+      context.taskMetrics.internalAccums.filter { a =>
+        // RESULT_SIZE accumulator is always zero at executor, we need to send it back as its
+        // value will be updated at driver side.
+        !a.isZero || a.name == Some(InternalAccumulator.RESULT_SIZE)
+      // zero value external accumulators may still be useful, e.g. SQLMetrics, we should not filter
--- End diff --

And it's for the UI.


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-09 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/12899#discussion_r62595291
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -155,7 +155,13 @@ private[spark] abstract class Task[T](
    */
   def collectAccumulatorUpdates(taskFailed: Boolean = false): Seq[AccumulatorV2[_, _]] = {
     if (context != null) {
-      context.taskMetrics.accumulators().filter { a => !taskFailed || a.countFailedValues }
+      context.taskMetrics.internalAccums.filter { a =>
+        // RESULT_SIZE accumulator is always zero at executor, we need to send it back as its
+        // value will be updated at driver side.
+        !a.isZero || a.name == Some(InternalAccumulator.RESULT_SIZE)
+      // zero value external accumulators may still be useful, e.g. SQLMetrics, we should not filter
--- End diff --

Because SQL metrics have some statistics, e.g. max, min, avg, we need all values, even zero ones.


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-09 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/12899#discussion_r62539578
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -155,7 +155,13 @@ private[spark] abstract class Task[T](
    */
   def collectAccumulatorUpdates(taskFailed: Boolean = false): Seq[AccumulatorV2[_, _]] = {
     if (context != null) {
-      context.taskMetrics.accumulators().filter { a => !taskFailed || a.countFailedValues }
+      context.taskMetrics.internalAccums.filter { a =>
+        // RESULT_SIZE accumulator is always zero at executor, we need to send it back as its
+        // value will be updated at driver side.
+        !a.isZero || a.name == Some(InternalAccumulator.RESULT_SIZE)
+      // zero value external accumulators may still be useful, e.g. SQLMetrics, we should not filter
--- End diff --

Another question: why are zero values of SQL metrics useful? Are they only useful for tests or the UI?


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-09 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/12899#discussion_r62539364
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -155,7 +155,13 @@ private[spark] abstract class Task[T](
    */
   def collectAccumulatorUpdates(taskFailed: Boolean = false): Seq[AccumulatorV2[_, _]] = {
     if (context != null) {
-      context.taskMetrics.accumulators().filter { a => !taskFailed || a.countFailedValues }
+      context.taskMetrics.internalAccums.filter { a =>
+        // RESULT_SIZE accumulator is always zero at executor, we need to send it back as its
+        // value will be updated at driver side.
+        !a.isZero || a.name == Some(InternalAccumulator.RESULT_SIZE)
+      // zero value external accumulators may still be useful, e.g. SQLMetrics, we should not filter
--- End diff --

SQL metrics should not count values from failed tasks. Will we fix that in this PR or in a separate one? Either way, this part should then be updated as well.


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-09 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217932594
  
This patch itself LGTM. I'm merging it into master 2.0.


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-09 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/12899#discussion_r62538599
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -155,7 +155,13 @@ private[spark] abstract class Task[T](
    */
   def collectAccumulatorUpdates(taskFailed: Boolean = false): Seq[AccumulatorV2[_, _]] = {
     if (context != null) {
-      context.taskMetrics.accumulators().filter { a => !taskFailed || a.countFailedValues }
+      context.taskMetrics.internalAccums.filter { a =>
+        // RESULT_SIZE accumulator is always zero at executor, we need to send it back as its
+        // value will be updated at driver side.
+        !a.isZero || a.name == Some(InternalAccumulator.RESULT_SIZE)
+      // zero value external accumulators may still be useful, e.g. SQLMetrics, we should not filter
--- End diff --

We should probably change the semantics of "internal" to mean internal to Spark (i.e. including SQL metrics), but that's a separate issue.


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/12899#discussion_r62410420
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -155,7 +155,13 @@ private[spark] abstract class Task[T](
    */
   def collectAccumulatorUpdates(taskFailed: Boolean = false): Seq[AccumulatorV2[_, _]] = {
     if (context != null) {
-      context.taskMetrics.accumulators().filter { a => !taskFailed || a.countFailedValues }
+      context.taskMetrics.internalAccums.filter { a =>
+        // RESULT_SIZE accumulator is always zero at executor, we need to send it back as its
+        // value will be updated at driver side.
+        !a.isZero || a.name == Some(InternalAccumulator.RESULT_SIZE)
+      // zero value external accumulators may still be useful, e.g. SQLMetrics, we should not filter
--- End diff --

There are 2 concepts:

1. Internal accumulators, such as GC time and result size, which are internal to the DAGScheduler.
2. `countFailedValues` accumulators: `countFailedValues` is an internal flag that only Spark itself can set. All internal accumulators are `countFailedValues` accumulators, and SQLMetrics are also `countFailedValues` accumulators.


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-06 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/12899#discussion_r62397915
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -155,7 +155,13 @@ private[spark] abstract class Task[T](
    */
   def collectAccumulatorUpdates(taskFailed: Boolean = false): Seq[AccumulatorV2[_, _]] = {
     if (context != null) {
-      context.taskMetrics.accumulators().filter { a => !taskFailed || a.countFailedValues }
+      context.taskMetrics.internalAccums.filter { a =>
+        // RESULT_SIZE accumulator is always zero at executor, we need to send it back as its
+        // value will be updated at driver side.
+        !a.isZero || a.name == Some(InternalAccumulator.RESULT_SIZE)
+      // zero value external accumulators may still be useful, e.g. SQLMetrics, we should not filter
--- End diff --

"Internal" here means task metrics, i.e. internal to core.


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-06 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/12899#discussion_r62374320
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -155,7 +155,13 @@ private[spark] abstract class Task[T](
    */
   def collectAccumulatorUpdates(taskFailed: Boolean = false): Seq[AccumulatorV2[_, _]] = {
     if (context != null) {
-      context.taskMetrics.accumulators().filter { a => !taskFailed || a.countFailedValues }
+      context.taskMetrics.internalAccums.filter { a =>
+        // RESULT_SIZE accumulator is always zero at executor, we need to send it back as its
+        // value will be updated at driver side.
+        !a.isZero || a.name == Some(InternalAccumulator.RESULT_SIZE)
+      // zero value external accumulators may still be useful, e.g. SQLMetrics, we should not filter
--- End diff --

We recently changed countOnFailure from false to true; is that a design change?

Why are SQLMetrics external accumulators?


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217394779
  
Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217394784
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57969/


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217394568
  
**[Test build #57969 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57969/consoleFull)** for PR 12899 at commit [`b4e7385`](https://github.com/apache/spark/commit/b4e7385823880b622f17d5cdf57fca037fe93cb7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217369918
  
**[Test build #57969 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57969/consoleFull)**
 for PR 12899 at commit 
[`b4e7385`](https://github.com/apache/spark/commit/b4e7385823880b622f17d5cdf57fca037fe93cb7).


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217352598
  
Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217352599
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57959/
Test FAILed.


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217352525
  
**[Test build #57959 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57959/consoleFull)**
 for PR 12899 at commit 
[`cb21034`](https://github.com/apache/spark/commit/cb210349233e06a368f4693b92ed50314e168eab).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217344297
  
**[Test build #57959 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57959/consoleFull)**
 for PR 12899 at commit 
[`cb21034`](https://github.com/apache/spark/commit/cb210349233e06a368f4693b92ed50314e168eab).


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217340511
  
Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217340512
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57946/
Test FAILed.


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217340426
  
**[Test build #57946 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57946/consoleFull)**
 for PR 12899 at commit 
[`b48dda8`](https://github.com/apache/spark/commit/b48dda8402bd85ab02586e36bd7eb9440c140e00).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217338062
  
Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217338038
  
**[Test build #57944 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57944/consoleFull)**
 for PR 12899 at commit 
[`41f5cb4`](https://github.com/apache/spark/commit/41f5cb4da8d9192bc75f547ec1b4dd68d6205161).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217338064
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57944/
Test FAILed.


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217330300
  
**[Test build #57946 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57946/consoleFull)**
 for PR 12899 at commit 
[`b48dda8`](https://github.com/apache/spark/commit/b48dda8402bd85ab02586e36bd7eb9440c140e00).


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217324749
  
**[Test build #57944 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57944/consoleFull)**
 for PR 12899 at commit 
[`41f5cb4`](https://github.com/apache/spark/commit/41f5cb4da8d9192bc75f547ec1b4dd68d6205161).


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-05 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217324631
  
retest this please


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217229568
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57898/
Test FAILed.


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217229565
  
Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217229469
  
**[Test build #57898 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57898/consoleFull)**
 for PR 12899 at commit 
[`41f5cb4`](https://github.com/apache/spark/commit/41f5cb4da8d9192bc75f547ec1b4dd68d6205161).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-05 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217203206
  
retest this please


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217204592
  
**[Test build #57898 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57898/consoleFull)**
 for PR 12899 at commit 
[`41f5cb4`](https://github.com/apache/spark/commit/41f5cb4da8d9192bc75f547ec1b4dd68d6205161).


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217200613
  
Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217200615
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57893/
Test FAILed.


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217200510
  
**[Test build #57893 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57893/consoleFull)**
 for PR 12899 at commit 
[`41f5cb4`](https://github.com/apache/spark/commit/41f5cb4da8d9192bc75f547ec1b4dd68d6205161).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217197047
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57892/
Test FAILed.


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217197044
  
Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217196930
  
**[Test build #57892 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57892/consoleFull)**
 for PR 12899 at commit 
[`2d3d3d4`](https://github.com/apache/spark/commit/2d3d3d4b65d659188f2a328282fbf81e35657014).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217172496
  
**[Test build #57893 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57893/consoleFull)**
 for PR 12899 at commit 
[`41f5cb4`](https://github.com/apache/spark/commit/41f5cb4da8d9192bc75f547ec1b4dd68d6205161).


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-05 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/12899#discussion_r62196091
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -1097,8 +1097,8 @@ class DAGScheduler(
 throw new SparkException(s"attempted to access non-existent accumulator $id")
 }
 acc.merge(updates.asInstanceOf[AccumulatorV2[Any, Any]])
-// To avoid UI cruft, ignore cases where value wasn't updated
-if (acc.name.isDefined && !updates.isZero) {
+// Only display named accumulators on UI.
+if (acc.name.isDefined) {
--- End diff --
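The two conditions in this hunk differ only in the `!updates.isZero` check. A minimal Scala sketch of both, assuming a hypothetical `AccUpdate` type that carries just a name and a zero flag (Spark's real code works on `AccumulatorV2` instances, and `shownBefore`/`shownAfter` are illustrative names, not Spark APIs):

```scala
// Hypothetical stand-in for an accumulator update; NOT Spark's AccumulatorV2.
final case class AccUpdate(name: Option[String], isZero: Boolean)

// Removed line: report a named accumulator only if its update is non-zero.
def shownBefore(u: AccUpdate): Boolean = u.name.isDefined && !u.isZero

// Added line: report every named accumulator, including zero-valued updates.
def shownAfter(u: AccUpdate): Boolean = u.name.isDefined

val updates = Seq(
  AccUpdate(Some("records"), isZero = false),
  AccUpdate(Some("errors"), isZero = true),
  AccUpdate(None, isZero = false) // unnamed: hidden under both rules
)

val before = updates.filter(shownBefore).flatMap(_.name) // Seq("records")
val after = updates.filter(shownAfter).flatMap(_.name)   // Seq("records", "errors")
```

With these inputs the old rule keeps only `records`, while the new rule also keeps the zero-valued `errors` update; the unnamed update is dropped by both.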

I reverted this change because we can't use an `assert` here: it's surrounded by
a `try catch` that captures all NonFatal exceptions.
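Why an `assert` cannot work inside such a block can be shown in a few lines of Scala. `Predef.assert` throws `java.lang.AssertionError`, and `scala.util.control.NonFatal` matches every `Throwable` except a short list of fatal ones (`VirtualMachineError`, `ThreadDeath`, `InterruptedException`, `LinkageError`, `ControlThrowable`), so a failed assertion is caught like any other error. `updateGuarded` and the checked condition below are hypothetical stand-ins, not the actual DAGScheduler code.

```scala
import scala.util.control.NonFatal

// Hypothetical helper mimicking a block guarded by a NonFatal catch.
// Returns the caught error's description instead of rethrowing it.
def updateGuarded(body: => Unit): Option[String] =
  try {
    body
    None
  } catch {
    // Real code would typically log here; this sketch just records the message.
    case NonFatal(e) => Some(s"${e.getClass.getName}: ${e.getMessage}")
  }

// assert throws java.lang.AssertionError, which NonFatal treats as non-fatal,
// so the failure is caught and swallowed instead of reaching the caller.
val swallowed = updateGuarded {
  val updateIsZero = true // hypothetical invariant violation
  assert(!updateIsZero, "update should not be zero")
}
```

A violated invariant inside such a block is therefore absorbed by the handler rather than failing the caller.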


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217171297
  
**[Test build #57892 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57892/consoleFull)**
 for PR 12899 at commit 
[`2d3d3d4`](https://github.com/apache/spark/commit/2d3d3d4b65d659188f2a328282fbf81e35657014).


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217115149
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57866/
Test FAILed.


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217115146
  
Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217115088
  
**[Test build #57866 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57866/consoleFull)**
 for PR 12899 at commit 
[`c523fee`](https://github.com/apache/spark/commit/c523feeb58376ed4813d1e5119638fe6528f742a).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217101715
  
**[Test build #57866 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57866/consoleFull)**
 for PR 12899 at commit 
[`c523fee`](https://github.com/apache/spark/commit/c523feeb58376ed4813d1e5119638fe6528f742a).


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-05 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217100897
  
@davies, I think the added `assert` can guarantee this?


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-05 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217100854
  
retest this please


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-04 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217075463
  
It would be great if we could have a test to make sure that we always have this
behavior.


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217075163
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57832/
Test FAILed.


[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217075088
  
**[Test build #57832 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57832/consoleFull)**
 for PR 12899 at commit 
[`c523fee`](https://github.com/apache/spark/commit/c523feeb58376ed4813d1e5119638fe6528f742a).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217075162
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217064808
  
**[Test build #57832 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57832/consoleFull)**
 for PR 12899 at commit 
[`c523fee`](https://github.com/apache/spark/commit/c523feeb58376ed4813d1e5119638fe6528f742a).





[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-04 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-217002044
  
Since we have only one BlockStatusesAccumulator object in TaskMetrics, it 
may not be worth doing 2).





[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-04 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/12899#discussion_r62112750
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -1097,8 +1097,8 @@ class DAGScheduler(
 throw new SparkException(s"attempted to access non-existent accumulator $id")
 }
 acc.merge(updates.asInstanceOf[AccumulatorV2[Any, Any]])
-// To avoid UI cruft, ignore cases where value wasn't updated
-if (acc.name.isDefined && !updates.isZero) {
+// Only display named accumulators on UI.
+if (acc.name.isDefined) {
--- End diff --

It's because the executor no longer sends back updates whose value is 
zero, so the second condition is always assumed to be true. Maybe we should add 
an assert or something instead.
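The assert-instead-of-filter idea can be sketched as follows. This is a hypothetical, simplified model of the driver-side merge (`DriverAcc` and `DriverMerge` are illustrative names, not the actual `DAGScheduler` code): the zero check becomes an invariant that is asserted, and only the "named accumulators are shown on the UI" condition remains.

```scala
import scala.collection.mutable

// Hypothetical simplified accumulator; NOT Spark's AccumulatorV2.
final class DriverAcc(val id: Long, val name: Option[String]) {
  private var sum = 0L
  def add(v: Long): Unit = sum += v
  def merge(other: DriverAcc): Unit = sum += other.value
  def value: Long = sum
  def isZero: Boolean = sum == 0L
}

object DriverMerge {
  // Merges one executor update into the driver copy and records which
  // accumulator names should be displayed.
  def mergeUpdate(driverAcc: DriverAcc, update: DriverAcc, uiNames: mutable.Buffer[String]): Unit = {
    // Executors are assumed to drop zero-valued updates, so the former
    // `!updates.isZero` filter is expressed as an assertion instead.
    assert(!update.isZero, s"zero-valued update received for accumulator ${update.id}")
    driverAcc.merge(update)
    // Only named accumulators are displayed on the UI.
    driverAcc.name.foreach(uiNames += _)
  }
}
```

The design choice is that a violated assumption fails loudly instead of being silently hidden from the UI.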





[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-04 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/12899#discussion_r62112538
  
--- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala ---
@@ -291,25 +291,32 @@ private[spark] object TaskMetrics extends Logging {
 
 private[spark] class BlockStatusesAccumulator
   extends AccumulatorV2[(BlockId, BlockStatus), Seq[(BlockId, BlockStatus)]] {
-  private[this] var _seq = ArrayBuffer.empty[(BlockId, BlockStatus)]
+  private[this] var _seq: ArrayBuffer[(BlockId, BlockStatus)] = _
 
-  override def isZero(): Boolean = _seq.isEmpty
+  private def seq = {
--- End diff --

Can you add a return type?
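A minimal sketch of a lazily allocated buffer whose accessor carries an explicit return type. To keep it self-contained, `String` and `Int` stand in for `BlockId` and `BlockStatus`, and `LazyStatuses` is a hypothetical name; this is not the code from the PR.

```scala
import scala.collection.mutable.ArrayBuffer

// Simplified stand-in: (String, Int) replaces (BlockId, BlockStatus).
final class LazyStatuses {
  // Left null until the first add, so an untouched instance holds no buffer.
  private[this] var _seq: ArrayBuffer[(String, Int)] = _

  // Explicit return type on the lazily initializing accessor.
  private def seq: ArrayBuffer[(String, Int)] = {
    if (_seq == null) _seq = ArrayBuffer.empty[(String, Int)]
    _seq
  }

  // Checks the field directly so that asking "is it zero?" never allocates.
  def isZero: Boolean = _seq == null || _seq.isEmpty
  def add(v: (String, Int)): Unit = seq += v
  def value: Seq[(String, Int)] = if (_seq == null) Seq.empty else _seq.toList
}
```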





[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-04 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-216998738
  
I agree. I think (1) is straightforward and we should do it. (2) I'm not so 
sure since it only affects one of the accumulators.





[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-216965921
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-216965926
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57771/
Test FAILed.





[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-216965728
  
**[Test build #57771 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57771/consoleFull)**
 for PR 12899 at commit 
[`b4571f9`](https://github.com/apache/spark/commit/b4571f96d0e73bdd5c0b53d2f90ff68c0bc98105).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-04 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-216943351
  
@cloud-fan we don't need to do this here, but I think we can also 
substantially cut down the size of a task result if we consolidate all the 
accumulators into a single one in TaskMetrics.
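The intuition behind consolidating can be illustrated with a rough size comparison. This is a hypothetical sketch, not Spark's TaskMetrics: it assumes plain JDK serialization and assumes both sides agree on metric positions, so the consolidated form ships only an `Array[Long]` instead of one `(id, name, value)` record per accumulator.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

object Consolidation {
  // Bytes produced by plain JDK serialization (stream header included).
  def serializedSize(obj: AnyRef): Int = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(obj) finally out.close()
    bytes.size()
  }

  // Separate representation: one (id, name, value) tuple per accumulator.
  def separate(values: Array[Long]): Array[(Long, String, Long)] =
    values.zipWithIndex.map { case (v, i) => (i.toLong, s"metric$i", v) }

  // Consolidated representation: ids and names are implied by position,
  // so only the raw values travel.
  def consolidated(values: Array[Long]): Array[Long] = values.clone()
}
```

Real savings would depend on the serializer and on what each accumulator carries; the sketch only shows why per-accumulator overhead adds up.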






[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-04 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-216940164
  
```
scala> ser.newInstance().serialize(ArrayBuffer.empty[Any])
res4: java.nio.ByteBuffer = java.nio.HeapByteBuffer[pos=0 lim=173 cap=214]

scala> ser.newInstance().serialize(null)
res5: java.nio.ByteBuffer = java.nio.HeapByteBuffer[pos=0 lim=5 cap=32]

scala> ser.newInstance().serialize(new java.util.ArrayList[Long])
res6: java.nio.ByteBuffer = java.nio.HeapByteBuffer[pos=0 lim=58 cap=64]

scala> ser.newInstance().serialize(1L)
res7: java.nio.ByteBuffer = java.nio.HeapByteBuffer[pos=0 lim=82 cap=128]

scala> ser.newInstance().serialize(Array(1L))
res8: java.nio.ByteBuffer = java.nio.HeapByteBuffer[pos=0 lim=35 cap=64]
```
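The REPL session does not show how `ser` was created. Assuming plain JDK serialization (an assumption), the key comparison, `null` versus an empty `ArrayBuffer`, can be reproduced without Spark using a small helper (`SerializedSize` is a hypothetical name). The 5-byte result for `null` is consistent with JDK serialization: a 4-byte stream header plus a 1-byte null marker.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import scala.collection.mutable.ArrayBuffer

object SerializedSize {
  // Number of bytes plain JDK serialization produces for `obj`,
  // including the stream header written by ObjectOutputStream.
  def of(obj: AnyRef): Int = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try out.writeObject(obj) finally out.close()
    bytes.size()
  }
}

// Usage: compare an unallocated field (null) with an allocated empty buffer.
// SerializedSize.of(null)
// SerializedSize.of(ArrayBuffer.empty[Any])
```

This gap is what makes leaving an untouched buffer as `null` attractive when many task results are sent back.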





[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-04 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-216937041
  
@cloud-fan How much can we gain from 2)? 





[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-04 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/12899#discussion_r62077628
  
--- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala ---
@@ -291,25 +291,32 @@ private[spark] object TaskMetrics extends Logging {
 
 private[spark] class BlockStatusesAccumulator
   extends AccumulatorV2[(BlockId, BlockStatus), Seq[(BlockId, BlockStatus)]] {
-  private[this] var _seq = ArrayBuffer.empty[(BlockId, BlockStatus)]
+  private[this] var _seq: ArrayBuffer[(BlockId, BlockStatus)] = _
 
-  override def isZero(): Boolean = _seq.isEmpty
+  private def seq = {
--- End diff --

Should this be thread-safe?
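Whether thread safety is needed depends on whether more than one thread can touch the same instance, which the thread does not settle. If it were needed, one straightforward option is to guard both the lazy allocation and every read with the instance lock. A hypothetical sketch, again with `(String, Int)` standing in for `(BlockId, BlockStatus)`:

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical thread-safe variant of a lazily allocated buffer accumulator.
final class SyncLazyStatuses {
  // Guarded by `this`; null until the first add.
  private[this] var _seq: ArrayBuffer[(String, Int)] = _

  def add(v: (String, Int)): Unit = synchronized {
    // Allocation and append happen under the same lock, so two threads
    // cannot each create a buffer and lose one of them.
    if (_seq == null) _seq = ArrayBuffer.empty[(String, Int)]
    _seq += v
  }

  def isZero: Boolean = synchronized { _seq == null || _seq.isEmpty }

  // Returns an immutable snapshot so callers never see later mutations.
  def value: List[(String, Int)] = synchronized {
    if (_seq == null) Nil else _seq.toList
  }
}
```

Locking on every call has a cost; if each instance is only ever used by one task thread, the unsynchronized version is sufficient.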





[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-04 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/12899#discussion_r62077575
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -1097,8 +1097,8 @@ class DAGScheduler(
 throw new SparkException(s"attempted to access non-existent accumulator $id")
 }
 acc.merge(updates.asInstanceOf[AccumulatorV2[Any, Any]])
-// To avoid UI cruft, ignore cases where value wasn't updated
-if (acc.name.isDefined && !updates.isZero) {
+// Only display named accumulators on UI.
+if (acc.name.isDefined) {
--- End diff --

Why this change?





[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-216935103
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-216935106
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57767/
Test FAILed.





[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-216935011
  
**[Test build #57767 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57767/consoleFull)**
 for PR 12899 at commit 
[`b4571f9`](https://github.com/apache/spark/commit/b4571f96d0e73bdd5c0b53d2f90ff68c0bc98105).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-216934067
  
**[Test build #57771 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57771/consoleFull)**
 for PR 12899 at commit 
[`b4571f9`](https://github.com/apache/spark/commit/b4571f96d0e73bdd5c0b53d2f90ff68c0bc98105).





[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-04 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-216932538
  
test this please





[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-216923365
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57766/
Test FAILed.





[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-216923361
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-216923251
  
**[Test build #57766 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57766/consoleFull)**
 for PR 12899 at commit 
[`57fc9af`](https://github.com/apache/spark/commit/57fc9afbda5265d389771c0e102266e39171034b).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-04 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-216900257
  
**[Test build #57767 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57767/consoleFull)**
 for PR 12899 at commit 
[`b4571f9`](https://github.com/apache/spark/commit/b4571f96d0e73bdd5c0b53d2f90ff68c0bc98105).





[GitHub] spark pull request: [SPARK-12837][CORE] reduce network IO for accu...

2016-05-04 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/12899#issuecomment-216899245
  
cc @davies @rxin

