[jira] [Commented] (SPARK-16827) Stop reporting spill metrics as shuffle metrics

2016-11-12 Thread Gaoxiang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15659972#comment-15659972
 ] 

Gaoxiang Liu commented on SPARK-16827:
--

ping ping..

> Stop reporting spill metrics as shuffle metrics
> ---
>
> Key: SPARK-16827
> URL: https://issues.apache.org/jira/browse/SPARK-16827
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 2.0.0
>Reporter: Sital Kedia
>Assignee: Brian Cho
>  Labels: performance
>
> One of our Hive jobs looks like this -
> {code}
>  SELECT  userid
>  FROM  table1 a
>  JOIN table2 b
>   ONa.ds = '2016-07-15'
>   AND  b.ds = '2016-07-15'
>   AND  a.source_id = b.id
> {code}
> After upgrading to Spark 2.0, the job is significantly slower. Digging a 
> little into it, we found that one of the stages produces an excessive amount 
> of shuffle data. Please note that this is a regression from Spark 1.6: Stage 
> 2 of the job, which used to produce 32KB of shuffle data with 1.6, now 
> produces more than 400GB with Spark 2.0. We also tried turning off 
> whole-stage code generation, but that did not help.
> PS - Even though the intermediate shuffle data size is huge, the job still 
> produces accurate output.
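
For illustration only, a minimal sketch of the mis-attribution named in the 
issue title (spill writes and shuffle writes sharing one metrics object), 
using hypothetical, simplified names rather than the actual Spark classes:

{code}
// Hypothetical sketch (not the actual Spark classes): when spill files are
// written through the same metrics object as shuffle output, spilled bytes
// are reported as shuffle bytes.
class WriteMetrics {
  private var bytes = 0L
  def incBytesWritten(n: Long): Unit = bytes += n
  def bytesWritten: Long = bytes
}

object SpillVsShuffle extends App {
  val shuffleWriteMetrics = new WriteMetrics

  // Genuine shuffle output: 32KB, as Stage 2 produced on Spark 1.6.
  shuffleWriteMetrics.incBytesWritten(32L * 1024)

  // Sorter spills routed into the same object inflate the reported size.
  shuffleWriteMetrics.incBytesWritten(400L * 1024 * 1024 * 1024)

  // Reports roughly 400GB of "shuffle" data even though the real shuffle
  // output is tiny.
  println(s"reported shuffle bytes: ${shuffleWriteMetrics.bytesWritten}")
}
{code}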



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16827) Stop reporting spill metrics as shuffle metrics

2016-10-12 Thread Gaoxiang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571007#comment-15571007
 ] 

Gaoxiang Liu commented on SPARK-16827:
--

I am quite new to Spark, and I am now a bit confused about the requirement. 
[~sitalke...@gmail.com], [~chobrian], could you please comment here on the 
requirement, based on what [~rxin] said?

> Stop reporting spill metrics as shuffle metrics
> ---
>
> Key: SPARK-16827
> URL: https://issues.apache.org/jira/browse/SPARK-16827
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 2.0.0
>Reporter: Sital Kedia
>Assignee: Brian Cho
>  Labels: performance
> Fix For: 2.0.2, 2.1.0
>
>
> One of our Hive jobs looks like this -
> {code}
>  SELECT  userid
>  FROM  table1 a
>  JOIN table2 b
>   ONa.ds = '2016-07-15'
>   AND  b.ds = '2016-07-15'
>   AND  a.source_id = b.id
> {code}
> After upgrading to Spark 2.0, the job is significantly slower. Digging a 
> little into it, we found that one of the stages produces an excessive amount 
> of shuffle data. Please note that this is a regression from Spark 1.6: Stage 
> 2 of the job, which used to produce 32KB of shuffle data with 1.6, now 
> produces more than 400GB with Spark 2.0. We also tried turning off 
> whole-stage code generation, but that did not help.
> PS - Even though the intermediate shuffle data size is huge, the job still 
> produces accurate output.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-16827) Stop reporting spill metrics as shuffle metrics

2016-10-12 Thread Gaoxiang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570446#comment-15570446
 ] 

Gaoxiang Liu edited comment on SPARK-16827 at 10/13/16 1:16 AM:


[~rxin], for this one, I think spill bytes (both memory and disk) and shuffle 
bytes are already logged and reported, right?
Also, if I want to add a spill time metric, do you suggest I create a parent 
class DiskWriteMetrics, have ShuffleWriteMetrics and my new class (e.g. 
SpillWriteMetrics) inherit from it, and then pass the parent class 
(DiskWriteMetrics) to UnsafeSorterSpillWriter 
(https://github.com/facebook/FB-Spark/blob/fb-2.0/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java#L209)?

Or do you suggest renaming the ShuffleWriteMetrics class to something like 
WriteMetrics?
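
To make the first option concrete, here is a minimal sketch (all names and 
shapes are hypothetical, following this comment rather than the actual Spark 
API):

{code}
// Minimal sketch of option 1 (hypothetical names, not the real Spark API):
// a common parent so spill and shuffle writes share one interface but are
// accounted separately.
abstract class DiskWriteMetrics {
  private var _bytesWritten = 0L
  private var _writeTimeNanos = 0L
  def incBytesWritten(n: Long): Unit = _bytesWritten += n
  def incWriteTime(ns: Long): Unit = _writeTimeNanos += ns
  def bytesWritten: Long = _bytesWritten
  def writeTimeNanos: Long = _writeTimeNanos
}

class ShuffleWriteMetrics extends DiskWriteMetrics // genuine shuffle output
class SpillWriteMetrics extends DiskWriteMetrics   // sorter spills

// The spill writer would then accept the parent type, so the caller
// decides which bucket each write is accounted against:
class UnsafeSorterSpillWriter(metrics: DiskWriteMetrics) {
  def write(record: Array[Byte]): Unit = {
    val start = System.nanoTime()
    // ... write the record to the spill file ...
    metrics.incBytesWritten(record.length.toLong)
    metrics.incWriteTime(System.nanoTime() - start)
  }
}
{code}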


was (Author: dreamworks007):
[~rxin], for this one, if I want to add a spill time metric, do you suggest I 
create a parent class DiskWriteMetrics, have ShuffleWriteMetrics and my new 
class (e.g. SpillWriteMetrics) inherit from it, and then pass the parent class 
(DiskWriteMetrics) to UnsafeSorterSpillWriter 
(https://github.com/facebook/FB-Spark/blob/fb-2.0/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java#L209)?

Or do you suggest renaming the ShuffleWriteMetrics class to something like 
WriteMetrics?

> Stop reporting spill metrics as shuffle metrics
> ---
>
> Key: SPARK-16827
> URL: https://issues.apache.org/jira/browse/SPARK-16827
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 2.0.0
>Reporter: Sital Kedia
>Assignee: Brian Cho
>  Labels: performance
> Fix For: 2.1.0
>
>
> One of our Hive jobs looks like this -
> {code}
>  SELECT  userid
>  FROM  table1 a
>  JOIN table2 b
>   ONa.ds = '2016-07-15'
>   AND  b.ds = '2016-07-15'
>   AND  a.source_id = b.id
> {code}
> After upgrading to Spark 2.0, the job is significantly slower. Digging a 
> little into it, we found that one of the stages produces an excessive amount 
> of shuffle data. Please note that this is a regression from Spark 1.6: Stage 
> 2 of the job, which used to produce 32KB of shuffle data with 1.6, now 
> produces more than 400GB with Spark 2.0. We also tried turning off 
> whole-stage code generation, but that did not help.
> PS - Even though the intermediate shuffle data size is huge, the job still 
> produces accurate output.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-16827) Stop reporting spill metrics as shuffle metrics

2016-10-12 Thread Gaoxiang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570446#comment-15570446
 ] 

Gaoxiang Liu edited comment on SPARK-16827 at 10/13/16 1:14 AM:


[~rxin], for this one, if I want to add a spill time metric, do you suggest I 
create a parent class DiskWriteMetrics, have ShuffleWriteMetrics and my new 
class (e.g. SpillWriteMetrics) inherit from it, and then pass the parent class 
(DiskWriteMetrics) to UnsafeSorterSpillWriter 
(https://github.com/facebook/FB-Spark/blob/fb-2.0/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java#L209)?

Or do you suggest renaming the ShuffleWriteMetrics class to something like 
WriteMetrics?


was (Author: dreamworks007):
[~rxin], for this one, if I want to add spill metrics, do you suggest I create 
a parent class DiskWriteMetrics, have ShuffleWriteMetrics and my new class 
(e.g. SpillWriteMetrics) inherit from it, and then pass the parent class 
(DiskWriteMetrics) to UnsafeSorterSpillWriter 
(https://github.com/facebook/FB-Spark/blob/fb-2.0/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java#L209)?

Or do you suggest renaming the ShuffleWriteMetrics class to something like 
WriteMetrics?

> Stop reporting spill metrics as shuffle metrics
> ---
>
> Key: SPARK-16827
> URL: https://issues.apache.org/jira/browse/SPARK-16827
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 2.0.0
>Reporter: Sital Kedia
>Assignee: Brian Cho
>  Labels: performance
> Fix For: 2.1.0
>
>
> One of our Hive jobs looks like this -
> {code}
>  SELECT  userid
>  FROM  table1 a
>  JOIN table2 b
>   ONa.ds = '2016-07-15'
>   AND  b.ds = '2016-07-15'
>   AND  a.source_id = b.id
> {code}
> After upgrading to Spark 2.0, the job is significantly slower. Digging a 
> little into it, we found that one of the stages produces an excessive amount 
> of shuffle data. Please note that this is a regression from Spark 1.6: Stage 
> 2 of the job, which used to produce 32KB of shuffle data with 1.6, now 
> produces more than 400GB with Spark 2.0. We also tried turning off 
> whole-stage code generation, but that did not help.
> PS - Even though the intermediate shuffle data size is huge, the job still 
> produces accurate output.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16827) Stop reporting spill metrics as shuffle metrics

2016-10-12 Thread Gaoxiang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570446#comment-15570446
 ] 

Gaoxiang Liu commented on SPARK-16827:
--

[~rxin], for this one, if I want to add spill metrics, do you suggest I create 
a parent class DiskWriteMetrics, have ShuffleWriteMetrics and my new class 
(e.g. SpillWriteMetrics) inherit from it, and then pass the parent class 
(DiskWriteMetrics) to UnsafeSorterSpillWriter 
(https://github.com/facebook/FB-Spark/blob/fb-2.0/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java#L209)?

Or do you suggest renaming the ShuffleWriteMetrics class to something like 
WriteMetrics?

> Stop reporting spill metrics as shuffle metrics
> ---
>
> Key: SPARK-16827
> URL: https://issues.apache.org/jira/browse/SPARK-16827
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 2.0.0
>Reporter: Sital Kedia
>Assignee: Brian Cho
>  Labels: performance
> Fix For: 2.1.0
>
>
> One of our Hive jobs looks like this -
> {code}
>  SELECT  userid
>  FROM  table1 a
>  JOIN table2 b
>   ONa.ds = '2016-07-15'
>   AND  b.ds = '2016-07-15'
>   AND  a.source_id = b.id
> {code}
> After upgrading to Spark 2.0, the job is significantly slower. Digging a 
> little into it, we found that one of the stages produces an excessive amount 
> of shuffle data. Please note that this is a regression from Spark 1.6: Stage 
> 2 of the job, which used to produce 32KB of shuffle data with 1.6, now 
> produces more than 400GB with Spark 2.0. We also tried turning off 
> whole-stage code generation, but that did not help.
> PS - Even though the intermediate shuffle data size is huge, the job still 
> produces accurate output.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-3577) Add task metric to report spill time

2016-10-10 Thread Gaoxiang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563038#comment-15563038
 ] 

Gaoxiang Liu edited comment on SPARK-3577 at 10/11/16 4:49 AM:
---

[~rxin] I find that the spill size metric was already added in 
https://github.com/apache/spark/commit/bb8098f203e6faddf2e1a04b03d62037e6c7#diff-1bd3dc38f6306e0a822f93d62c32b1d0,
and I have confirmed it in the UI (please refer to the attachment of this 
JIRA - https://issues.apache.org/jira/secure/attachment/12832515/spill_size.jpg).

Also, we noticed that it's weird that the spill size is somehow not reported 
in the reducer, but is reported in the mapper.

Back to the previous question: if the spill time metric is still relevant to 
add, I plan to work on it if there are no objections.
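
As a rough sketch, the spill time accounting I have in mind could look like 
this (names are hypothetical and only illustrate the idea; this is not the 
existing Spark API):

{code}
// Hypothetical sketch of a spill-time task metric (illustrative names only).
class TaskSpillMetrics {
  private var spillNanos = 0L
  def incSpillTime(ns: Long): Unit = spillNanos += ns
  def spillTimeNanos: Long = spillNanos
}

object SpillTiming {
  // Wrap each spill so its wall-clock cost lands in the spill metric
  // instead of being folded into shuffle write time.
  def timedSpill[T](metrics: TaskSpillMetrics)(spill: => T): T = {
    val start = System.nanoTime()
    try spill
    finally metrics.incSpillTime(System.nanoTime() - start)
  }
}
{code}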


was (Author: dreamworks007):
I find that the spill size metric was already added in 
https://github.com/apache/spark/commit/bb8098f203e6faddf2e1a04b03d62037e6c7#diff-1bd3dc38f6306e0a822f93d62c32b1d0,
and I have confirmed it in the UI (please refer to the attachment of this 
JIRA - https://issues.apache.org/jira/secure/attachment/12832515/spill_size.jpg).

Also, we noticed that it's weird that the spill size is somehow not reported 
in the reducer, but is reported in the mapper.

Back to the previous question: if the spill time metric is still relevant to 
add, I plan to work on it if there are no objections.

> Add task metric to report spill time
> 
>
> Key: SPARK-3577
> URL: https://issues.apache.org/jira/browse/SPARK-3577
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 1.1.0
>Reporter: Kay Ousterhout
>Priority: Minor
> Attachments: spill_size.jpg
>
>
> The {{ExternalSorter}} passes its own {{ShuffleWriteMetrics}} into 
> {{ExternalSorter}}.  The write time recorded in those metrics is never used.  
> We should probably add task metrics to report this spill time, since for 
> shuffles, this would have previously been reported as part of shuffle write 
> time (with the original hash-based sorter).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-3577) Add task metric to report spill time

2016-10-10 Thread Gaoxiang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563038#comment-15563038
 ] 

Gaoxiang Liu edited comment on SPARK-3577 at 10/10/16 6:17 PM:
---

I find that the spill size metric was already added in 
https://github.com/apache/spark/commit/bb8098f203e6faddf2e1a04b03d62037e6c7#diff-1bd3dc38f6306e0a822f93d62c32b1d0,
and I have confirmed it in the UI (please refer to the attachment of this 
JIRA - https://issues.apache.org/jira/secure/attachment/12832515/spill_size.jpg).

Also, we noticed that it's weird that the spill size is somehow not reported 
in the reducer, but is reported in the mapper.

Back to the previous question: if the spill time metric is still relevant to 
add, I plan to work on it if there are no objections.


was (Author: dreamworks007):
I find that the spill size metric was already added in 
https://github.com/apache/spark/commit/bb8098f203e6faddf2e1a04b03d62037e6c7#diff-1bd3dc38f6306e0a822f93d62c32b1d0,
and I have confirmed it in the UI (please refer to the attachment of this JIRA).

Also, we noticed that it's weird that the spill size is somehow not reported 
in the reducer, but is reported in the mapper.

Back to the previous question: if the spill time metric is still relevant to 
add, I plan to work on it if there are no objections.

> Add task metric to report spill time
> 
>
> Key: SPARK-3577
> URL: https://issues.apache.org/jira/browse/SPARK-3577
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 1.1.0
>Reporter: Kay Ousterhout
>Priority: Minor
> Attachments: spill_size.jpg
>
>
> The {{ExternalSorter}} passes its own {{ShuffleWriteMetrics}} into 
> {{ExternalSorter}}.  The write time recorded in those metrics is never used.  
> We should probably add task metrics to report this spill time, since for 
> shuffles, this would have previously been reported as part of shuffle write 
> time (with the original hash-based sorter).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-3577) Add task metric to report spill time

2016-10-10 Thread Gaoxiang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563038#comment-15563038
 ] 

Gaoxiang Liu edited comment on SPARK-3577 at 10/10/16 6:18 PM:
---

I find that the spill size metric was already added in 
https://github.com/apache/spark/commit/bb8098f203e6faddf2e1a04b03d62037e6c7#diff-1bd3dc38f6306e0a822f93d62c32b1d0,
and I have confirmed it in the UI (please refer to the attachment of this 
JIRA - https://issues.apache.org/jira/secure/attachment/12832515/spill_size.jpg).

Also, we noticed that it's weird that the spill size is somehow not reported 
in the reducer, but is reported in the mapper.

Back to the previous question: if the spill time metric is still relevant to 
add, I plan to work on it if there are no objections.


was (Author: dreamworks007):
I find that the spill size metric was already added in 
https://github.com/apache/spark/commit/bb8098f203e6faddf2e1a04b03d62037e6c7#diff-1bd3dc38f6306e0a822f93d62c32b1d0,
and I have confirmed it in the UI (please refer to the attachment of this 
JIRA - https://issues.apache.org/jira/secure/attachment/12832515/spill_size.jpg).

Also, we noticed that it's weird that the spill size is somehow not reported 
in the reducer, but is reported in the mapper.

Back to the previous question: if the spill time metric is still relevant to 
add, I plan to work on it if there are no objections.

> Add task metric to report spill time
> 
>
> Key: SPARK-3577
> URL: https://issues.apache.org/jira/browse/SPARK-3577
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 1.1.0
>Reporter: Kay Ousterhout
>Priority: Minor
> Attachments: spill_size.jpg
>
>
> The {{ExternalSorter}} passes its own {{ShuffleWriteMetrics}} into 
> {{ExternalSorter}}.  The write time recorded in those metrics is never used.  
> We should probably add task metrics to report this spill time, since for 
> shuffles, this would have previously been reported as part of shuffle write 
> time (with the original hash-based sorter).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-3577) Add task metric to report spill time

2016-10-10 Thread Gaoxiang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563038#comment-15563038
 ] 

Gaoxiang Liu edited comment on SPARK-3577 at 10/10/16 6:17 PM:
---

I find that the spill size metric was already added in 
https://github.com/apache/spark/commit/bb8098f203e6faddf2e1a04b03d62037e6c7#diff-1bd3dc38f6306e0a822f93d62c32b1d0,
and I have confirmed it in the UI (please refer to the attachment of this JIRA).

Also, we noticed that it's weird that the spill size is somehow not reported 
in the reducer, but is reported in the mapper.

Back to the previous question: if the spill time metric is still relevant to 
add, I plan to work on it if there are no objections.


was (Author: dreamworks007):
I find that the spill size metric was already added in 
https://github.com/apache/spark/commit/bb8098f203e6faddf2e1a04b03d62037e6c7#diff-1bd3dc38f6306e0a822f93d62c32b1d0,
and I have confirmed it in the UI.

Also, we noticed that it's weird that the spill size is somehow not reported 
in the reducer, but is reported in the mapper.

Back to the previous question: if the spill time metric is still relevant to 
add, I plan to work on it if there are no objections.

> Add task metric to report spill time
> 
>
> Key: SPARK-3577
> URL: https://issues.apache.org/jira/browse/SPARK-3577
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 1.1.0
>Reporter: Kay Ousterhout
>Priority: Minor
> Attachments: spill_size.jpg
>
>
> The {{ExternalSorter}} passes its own {{ShuffleWriteMetrics}} into 
> {{ExternalSorter}}.  The write time recorded in those metrics is never used.  
> We should probably add task metrics to report this spill time, since for 
> shuffles, this would have previously been reported as part of shuffle write 
> time (with the original hash-based sorter).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-3577) Add task metric to report spill time

2016-10-10 Thread Gaoxiang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gaoxiang Liu updated SPARK-3577:

Comment: was deleted

(was: spill size metrics)

> Add task metric to report spill time
> 
>
> Key: SPARK-3577
> URL: https://issues.apache.org/jira/browse/SPARK-3577
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 1.1.0
>Reporter: Kay Ousterhout
>Priority: Minor
> Attachments: spill_size.jpg
>
>
> The {{ExternalSorter}} passes its own {{ShuffleWriteMetrics}} into 
> {{ExternalSorter}}.  The write time recorded in those metrics is never used.  
> We should probably add task metrics to report this spill time, since for 
> shuffles, this would have previously been reported as part of shuffle write 
> time (with the original hash-based sorter).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3577) Add task metric to report spill time

2016-10-10 Thread Gaoxiang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563038#comment-15563038
 ] 

Gaoxiang Liu commented on SPARK-3577:
-

I find that the spill size metric was already added in 
https://github.com/apache/spark/commit/bb8098f203e6faddf2e1a04b03d62037e6c7#diff-1bd3dc38f6306e0a822f93d62c32b1d0,
and I have confirmed it in the UI.

Also, we noticed that it's weird that the spill size is somehow not reported 
in the reducer, but is reported in the mapper.

Back to the previous question: if the spill time metric is still relevant to 
add, I plan to work on it if there are no objections.

> Add task metric to report spill time
> 
>
> Key: SPARK-3577
> URL: https://issues.apache.org/jira/browse/SPARK-3577
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 1.1.0
>Reporter: Kay Ousterhout
>Priority: Minor
> Attachments: spill_size.jpg
>
>
> The {{ExternalSorter}} passes its own {{ShuffleWriteMetrics}} into 
> {{ExternalSorter}}.  The write time recorded in those metrics is never used.  
> We should probably add task metrics to report this spill time, since for 
> shuffles, this would have previously been reported as part of shuffle write 
> time (with the original hash-based sorter).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3577) Add task metric to report spill time

2016-10-10 Thread Gaoxiang Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gaoxiang Liu updated SPARK-3577:

Attachment: spill_size.jpg

spill size metrics

> Add task metric to report spill time
> 
>
> Key: SPARK-3577
> URL: https://issues.apache.org/jira/browse/SPARK-3577
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 1.1.0
>Reporter: Kay Ousterhout
>Priority: Minor
> Attachments: spill_size.jpg
>
>
> The {{ExternalSorter}} passes its own {{ShuffleWriteMetrics}} into 
> {{ExternalSorter}}.  The write time recorded in those metrics is never used.  
> We should probably add task metrics to report this spill time, since for 
> shuffles, this would have previously been reported as part of shuffle write 
> time (with the original hash-based sorter).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3577) Add task metric to report spill time

2016-10-10 Thread Gaoxiang Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15562925#comment-15562925
 ] 

Gaoxiang Liu commented on SPARK-3577:
-

Hi [~kayousterhout],

Just want to make sure that this JIRA is still relevant, right? Have there 
been any changes to the requirement?

I am currently working on this one, so I just want to make sure.

Thanks!

> Add task metric to report spill time
> 
>
> Key: SPARK-3577
> URL: https://issues.apache.org/jira/browse/SPARK-3577
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 1.1.0
>Reporter: Kay Ousterhout
>Priority: Minor
>
> The {{ExternalSorter}} passes its own {{ShuffleWriteMetrics}} into 
> {{ExternalSorter}}.  The write time recorded in those metrics is never used.  
> We should probably add task metrics to report this spill time, since for 
> shuffles, this would have previously been reported as part of shuffle write 
> time (with the original hash-based sorter).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org