[jira] [Commented] (SPARK-16827) Stop reporting spill metrics as shuffle metrics
[ https://issues.apache.org/jira/browse/SPARK-16827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15659972#comment-15659972 ] Gaoxiang Liu commented on SPARK-16827: -- ping ping.. > Stop reporting spill metrics as shuffle metrics > --- > > Key: SPARK-16827 > URL: https://issues.apache.org/jira/browse/SPARK-16827 > Project: Spark > Issue Type: Bug > Components: Shuffle, Spark Core >Affects Versions: 2.0.0 >Reporter: Sital Kedia >Assignee: Brian Cho > Labels: performance > > One of our Hive jobs looks like this: > {code} > SELECT userid > FROM table1 a > JOIN table2 b > ON a.ds = '2016-07-15' > AND b.ds = '2016-07-15' > AND a.source_id = b.id > {code} > After upgrading to Spark 2.0, the job is significantly slower. Digging a little > into it, we found that one of the stages produces an excessive amount of > shuffle data. Please note that this is a regression from Spark 1.6. Stage 2 > of the job, which used to produce 32KB of shuffle data with 1.6, now produces > more than 400GB with Spark 2.0. We also tried turning off whole-stage code > generation, but that did not help. > PS - Even if the intermediate shuffle data size is huge, the job still > produces accurate output. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16827) Stop reporting spill metrics as shuffle metrics
[ https://issues.apache.org/jira/browse/SPARK-16827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15571007#comment-15571007 ] Gaoxiang Liu commented on SPARK-16827: -- I am quite new to Spark, and I am a bit confused about the requirement now - [~sitalke...@gmail.com], [~chobrian], could you please comment on what the requirement is, based on what [~rxin] said? > Stop reporting spill metrics as shuffle metrics > --- > > Key: SPARK-16827 > URL: https://issues.apache.org/jira/browse/SPARK-16827 > Project: Spark > Issue Type: Bug > Components: Shuffle, Spark Core >Affects Versions: 2.0.0 >Reporter: Sital Kedia >Assignee: Brian Cho > Labels: performance > Fix For: 2.0.2, 2.1.0 > > > One of our Hive jobs looks like this: > {code} > SELECT userid > FROM table1 a > JOIN table2 b > ON a.ds = '2016-07-15' > AND b.ds = '2016-07-15' > AND a.source_id = b.id > {code} > After upgrading to Spark 2.0, the job is significantly slower. Digging a little > into it, we found that one of the stages produces an excessive amount of > shuffle data. Please note that this is a regression from Spark 1.6. Stage 2 > of the job, which used to produce 32KB of shuffle data with 1.6, now produces > more than 400GB with Spark 2.0. We also tried turning off whole-stage code > generation, but that did not help. > PS - Even if the intermediate shuffle data size is huge, the job still > produces accurate output. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-16827) Stop reporting spill metrics as shuffle metrics
[ https://issues.apache.org/jira/browse/SPARK-16827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570446#comment-15570446 ] Gaoxiang Liu edited comment on SPARK-16827 at 10/13/16 1:16 AM: [~rxin], for this one, I think the spill bytes (both memory and disk) and the shuffle bytes are already logged and reported, right? Also, if I want to add a spill time metric, do you suggest I create a parent class DiskWriteMetrics, have ShuffleWriteMetrics and my new class (e.g. SpillWriteMetrics) inherit from it, and then pass the parent class (DiskWriteMetrics) to UnsafeSorterSpillWriter https://github.com/facebook/FB-Spark/blob/fb-2.0/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java#L209 ? Or do you suggest renaming the ShuffleWriteMetrics class to something like WriteMetrics? was (Author: dreamworks007): [~rxin], for this one, if I want to add a spill time metric, do you suggest I create a parent class DiskWriteMetrics, have ShuffleWriteMetrics and my new class (e.g. SpillWriteMetrics) inherit from it, and then pass the parent class (DiskWriteMetrics) to UnsafeSorterSpillWriter https://github.com/facebook/FB-Spark/blob/fb-2.0/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java#L209 ? Or do you suggest renaming the ShuffleWriteMetrics class to something like WriteMetrics? > Stop reporting spill metrics as shuffle metrics > --- > > Key: SPARK-16827 > URL: https://issues.apache.org/jira/browse/SPARK-16827 > Project: Spark > Issue Type: Bug > Components: Shuffle, Spark Core >Affects Versions: 2.0.0 >Reporter: Sital Kedia >Assignee: Brian Cho > Labels: performance > Fix For: 2.1.0 > > > One of our Hive jobs looks like this: > {code} > SELECT userid > FROM table1 a > JOIN table2 b > ON a.ds = '2016-07-15' > AND b.ds = '2016-07-15' > AND a.source_id = b.id > {code} > After upgrading to Spark 2.0, the job is significantly slower. Digging a little > into it, we found that one of the stages produces an excessive amount of > shuffle data. Please note that this is a regression from Spark 1.6. Stage 2 > of the job, which used to produce 32KB of shuffle data with 1.6, now produces > more than 400GB with Spark 2.0. We also tried turning off whole-stage code > generation, but that did not help. > PS - Even if the intermediate shuffle data size is huge, the job still > produces accurate output. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
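To make the proposal in the comment above concrete, here is a minimal sketch of the suggested hierarchy, assuming the commenter's proposed names (DiskWriteMetrics, SpillWriteMetrics); the fields, method names, and the spill-writer constructor are simplified illustrations, not the actual Spark 2.0 APIs.

{code}
// Sketch only: DiskWriteMetrics and SpillWriteMetrics are proposed names, and the
// fields/signatures below are simplified; they do not match the real Spark classes.
abstract class DiskWriteMetrics {
  private var _bytesWritten: Long = 0L
  private var _writeTime: Long = 0L // time spent writing to disk, in nanoseconds

  def bytesWritten: Long = _bytesWritten
  def writeTime: Long = _writeTime
  def incBytesWritten(v: Long): Unit = { _bytesWritten += v }
  def incWriteTime(v: Long): Unit = { _writeTime += v }
}

// Shuffle writes keep their own fields but share the common parent ...
class ShuffleWriteMetrics extends DiskWriteMetrics {
  private var _recordsWritten: Long = 0L
  def recordsWritten: Long = _recordsWritten
  def incRecordsWritten(v: Long): Unit = { _recordsWritten += v }
}

// ... while spills get a separate metrics object, so spill bytes and spill time
// are no longer attributed to shuffle writes.
class SpillWriteMetrics extends DiskWriteMetrics

// The spill writer would then accept the parent type instead of ShuffleWriteMetrics.
class UnsafeSorterSpillWriterSketch(writeMetrics: DiskWriteMetrics) {
  def write(buffer: Array[Byte]): Unit = {
    val start = System.nanoTime()
    // ... serialize and write the buffer to the spill file ...
    writeMetrics.incBytesWritten(buffer.length)
    writeMetrics.incWriteTime(System.nanoTime() - start)
  }
}
{code}

The alternative raised in the comment (renaming ShuffleWriteMetrics to something like WriteMetrics) would instead keep a single shared class, with shuffle writers and spill writers reporting into separate instances of it.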
[jira] [Comment Edited] (SPARK-16827) Stop reporting spill metrics as shuffle metrics
[ https://issues.apache.org/jira/browse/SPARK-16827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570446#comment-15570446 ] Gaoxiang Liu edited comment on SPARK-16827 at 10/13/16 1:14 AM: [~rxin], for this one, if I want to add a spill time metric, do you suggest I create a parent class DiskWriteMetrics, have ShuffleWriteMetrics and my new class (e.g. SpillWriteMetrics) inherit from it, and then pass the parent class (DiskWriteMetrics) to UnsafeSorterSpillWriter https://github.com/facebook/FB-Spark/blob/fb-2.0/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java#L209 ? Or do you suggest renaming the ShuffleWriteMetrics class to something like WriteMetrics? was (Author: dreamworks007): [~rxin], for this one, if I want to add spill metrics, do you suggest I create a parent class DiskWriteMetrics, have ShuffleWriteMetrics and my new class (e.g. SpillWriteMetrics) inherit from it, and then pass the parent class (DiskWriteMetrics) to UnsafeSorterSpillWriter https://github.com/facebook/FB-Spark/blob/fb-2.0/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java#L209 ? Or do you suggest renaming the ShuffleWriteMetrics class to something like WriteMetrics? > Stop reporting spill metrics as shuffle metrics > --- > > Key: SPARK-16827 > URL: https://issues.apache.org/jira/browse/SPARK-16827 > Project: Spark > Issue Type: Bug > Components: Shuffle, Spark Core >Affects Versions: 2.0.0 >Reporter: Sital Kedia >Assignee: Brian Cho > Labels: performance > Fix For: 2.1.0 > > > One of our Hive jobs looks like this: > {code} > SELECT userid > FROM table1 a > JOIN table2 b > ON a.ds = '2016-07-15' > AND b.ds = '2016-07-15' > AND a.source_id = b.id > {code} > After upgrading to Spark 2.0, the job is significantly slower. Digging a little > into it, we found that one of the stages produces an excessive amount of > shuffle data. Please note that this is a regression from Spark 1.6. Stage 2 > of the job, which used to produce 32KB of shuffle data with 1.6, now produces > more than 400GB with Spark 2.0. We also tried turning off whole-stage code > generation, but that did not help. > PS - Even if the intermediate shuffle data size is huge, the job still > produces accurate output. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16827) Stop reporting spill metrics as shuffle metrics
[ https://issues.apache.org/jira/browse/SPARK-16827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570446#comment-15570446 ] Gaoxiang Liu commented on SPARK-16827: -- [~rxin], for this one, if I want to add spill metrics, do you suggest I create a parent class DiskWriteMetrics, have ShuffleWriteMetrics and my new class (e.g. SpillWriteMetrics) inherit from it, and then pass the parent class (DiskWriteMetrics) to UnsafeSorterSpillWriter https://github.com/facebook/FB-Spark/blob/fb-2.0/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java#L209 ? Or do you suggest renaming the ShuffleWriteMetrics class to something like WriteMetrics? > Stop reporting spill metrics as shuffle metrics > --- > > Key: SPARK-16827 > URL: https://issues.apache.org/jira/browse/SPARK-16827 > Project: Spark > Issue Type: Bug > Components: Shuffle, Spark Core >Affects Versions: 2.0.0 >Reporter: Sital Kedia >Assignee: Brian Cho > Labels: performance > Fix For: 2.1.0 > > > One of our Hive jobs looks like this: > {code} > SELECT userid > FROM table1 a > JOIN table2 b > ON a.ds = '2016-07-15' > AND b.ds = '2016-07-15' > AND a.source_id = b.id > {code} > After upgrading to Spark 2.0, the job is significantly slower. Digging a little > into it, we found that one of the stages produces an excessive amount of > shuffle data. Please note that this is a regression from Spark 1.6. Stage 2 > of the job, which used to produce 32KB of shuffle data with 1.6, now produces > more than 400GB with Spark 2.0. We also tried turning off whole-stage code > generation, but that did not help. > PS - Even if the intermediate shuffle data size is huge, the job still > produces accurate output. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-3577) Add task metric to report spill time
[ https://issues.apache.org/jira/browse/SPARK-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563038#comment-15563038 ] Gaoxiang Liu edited comment on SPARK-3577 at 10/11/16 4:49 AM: --- [~rxin] I found that the spill size metric was already added in https://github.com/apache/spark/commit/bb8098f203e6faddf2e1a04b03d62037e6c7#diff-1bd3dc38f6306e0a822f93d62c32b1d0, and I have confirmed it in the UI (please refer to the attachment of this JIRA - https://issues.apache.org/jira/secure/attachment/12832515/spill_size.jpg). Also, we noticed that it's weird that the spill size is somehow not reported in the reducer, but is reported in the mapper. Back to the previous question: if the spill time metric is still relevant to add, I plan to work on it if there are no objections. was (Author: dreamworks007): I found that the spill size metric was already added in https://github.com/apache/spark/commit/bb8098f203e6faddf2e1a04b03d62037e6c7#diff-1bd3dc38f6306e0a822f93d62c32b1d0, and I have confirmed it in the UI (please refer to the attachment of this JIRA - https://issues.apache.org/jira/secure/attachment/12832515/spill_size.jpg). Also, we noticed that it's weird that the spill size is somehow not reported in the reducer, but is reported in the mapper. Back to the previous question: if the spill time metric is still relevant to add, I plan to work on it if there are no objections. > Add task metric to report spill time > > > Key: SPARK-3577 > URL: https://issues.apache.org/jira/browse/SPARK-3577 > Project: Spark > Issue Type: Bug > Components: Shuffle, Spark Core >Affects Versions: 1.1.0 >Reporter: Kay Ousterhout >Priority: Minor > Attachments: spill_size.jpg > > > The {{ExternalSorter}} passes its own {{ShuffleWriteMetrics}} into > {{ExternalSorter}}. The write time recorded in those metrics is never used. > We should probably add task metrics to report this spill time, since for > shuffles, this would have previously been reported as part of shuffle write > time (with the original hash-based sorter). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
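For context on what the requested metric might look like, below is a rough sketch of a dedicated spill-time task metric; TaskMetricsSketch, ExternalSorterSketch, and their members are invented for illustration and do not match the real TaskMetrics or ExternalSorter APIs.

{code}
// Sketch only: illustrates the idea of recording spill time as its own task metric,
// kept separate from shuffle write time. All names and signatures here are invented.
class TaskMetricsSketch {
  private var _spillWriteTime: Long = 0L   // total nanoseconds spent writing spill files
  private var _diskBytesSpilled: Long = 0L // total bytes written to spill files

  def spillWriteTime: Long = _spillWriteTime
  def diskBytesSpilled: Long = _diskBytesSpilled
  def incSpillWriteTime(v: Long): Unit = { _spillWriteTime += v }
  def incDiskBytesSpilled(v: Long): Unit = { _diskBytesSpilled += v }
}

class ExternalSorterSketch(taskMetrics: TaskMetricsSketch) {
  // On each spill, time the disk write and record it under the spill metric
  // rather than folding it into shuffle write time (or dropping it entirely).
  def spill(records: Iterator[Array[Byte]]): Unit = {
    val start = System.nanoTime()
    var bytesSpilled = 0L
    records.foreach { bytes =>
      // ... serialize and write bytes to the spill file ...
      bytesSpilled += bytes.length
    }
    taskMetrics.incDiskBytesSpilled(bytesSpilled)
    taskMetrics.incSpillWriteTime(System.nanoTime() - start)
  }
}
{code}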
[jira] [Comment Edited] (SPARK-3577) Add task metric to report spill time
[ https://issues.apache.org/jira/browse/SPARK-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563038#comment-15563038 ] Gaoxiang Liu edited comment on SPARK-3577 at 10/10/16 6:17 PM: --- I found that the spill size metric was already added in https://github.com/apache/spark/commit/bb8098f203e6faddf2e1a04b03d62037e6c7#diff-1bd3dc38f6306e0a822f93d62c32b1d0, and I have confirmed it in the UI (please refer to the attachment of this JIRA - https://issues.apache.org/jira/secure/attachment/12832515/spill_size.jpg). Also, we noticed that it's weird that the spill size is somehow not reported in the reducer, but is reported in the mapper. Back to the previous question: if the spill time metric is still relevant to add, I plan to work on it if there are no objections. was (Author: dreamworks007): I found that the spill size metric was already added in https://github.com/apache/spark/commit/bb8098f203e6faddf2e1a04b03d62037e6c7#diff-1bd3dc38f6306e0a822f93d62c32b1d0, and I have confirmed it in the UI (please refer to the attachment of this JIRA). Also, we noticed that it's weird that the spill size is somehow not reported in the reducer, but is reported in the mapper. Back to the previous question: if the spill time metric is still relevant to add, I plan to work on it if there are no objections. > Add task metric to report spill time > > > Key: SPARK-3577 > URL: https://issues.apache.org/jira/browse/SPARK-3577 > Project: Spark > Issue Type: Bug > Components: Shuffle, Spark Core >Affects Versions: 1.1.0 >Reporter: Kay Ousterhout >Priority: Minor > Attachments: spill_size.jpg > > > The {{ExternalSorter}} passes its own {{ShuffleWriteMetrics}} into > {{ExternalSorter}}. The write time recorded in those metrics is never used. > We should probably add task metrics to report this spill time, since for > shuffles, this would have previously been reported as part of shuffle write > time (with the original hash-based sorter). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-3577) Add task metric to report spill time
[ https://issues.apache.org/jira/browse/SPARK-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563038#comment-15563038 ] Gaoxiang Liu edited comment on SPARK-3577 at 10/10/16 6:18 PM: --- I found that the spill size metric was already added in https://github.com/apache/spark/commit/bb8098f203e6faddf2e1a04b03d62037e6c7#diff-1bd3dc38f6306e0a822f93d62c32b1d0, and I have confirmed it in the UI (please refer to the attachment of this JIRA - https://issues.apache.org/jira/secure/attachment/12832515/spill_size.jpg). Also, we noticed that it's weird that the spill size is somehow not reported in the reducer, but is reported in the mapper. Back to the previous question: if the spill time metric is still relevant to add, I plan to work on it if there are no objections. was (Author: dreamworks007): I found that the spill size metric was already added in https://github.com/apache/spark/commit/bb8098f203e6faddf2e1a04b03d62037e6c7#diff-1bd3dc38f6306e0a822f93d62c32b1d0, and I have confirmed it in the UI (please refer to the attachment of this JIRA - https://issues.apache.org/jira/secure/attachment/12832515/spill_size.jpg). Also, we notices that it's weird that the spill size is somehow not reported in the reducer, but is reported in the mapper. Back to the previous question: if the spill time metric is still relevant to add, I plan to work on it if there are no objections. > Add task metric to report spill time > > > Key: SPARK-3577 > URL: https://issues.apache.org/jira/browse/SPARK-3577 > Project: Spark > Issue Type: Bug > Components: Shuffle, Spark Core >Affects Versions: 1.1.0 >Reporter: Kay Ousterhout >Priority: Minor > Attachments: spill_size.jpg > > > The {{ExternalSorter}} passes its own {{ShuffleWriteMetrics}} into > {{ExternalSorter}}. The write time recorded in those metrics is never used. > We should probably add task metrics to report this spill time, since for > shuffles, this would have previously been reported as part of shuffle write > time (with the original hash-based sorter). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-3577) Add task metric to report spill time
[ https://issues.apache.org/jira/browse/SPARK-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563038#comment-15563038 ] Gaoxiang Liu edited comment on SPARK-3577 at 10/10/16 6:17 PM: --- I found that the spill size metric was already added in https://github.com/apache/spark/commit/bb8098f203e6faddf2e1a04b03d62037e6c7#diff-1bd3dc38f6306e0a822f93d62c32b1d0, and I have confirmed it in the UI (please refer to the attachment of this JIRA). Also, we noticed that it's weird that the spill size is somehow not reported in the reducer, but is reported in the mapper. Back to the previous question: if the spill time metric is still relevant to add, I plan to work on it if there are no objections. was (Author: dreamworks007): I found that the spill size metric was already added in https://github.com/apache/spark/commit/bb8098f203e6faddf2e1a04b03d62037e6c7#diff-1bd3dc38f6306e0a822f93d62c32b1d0, and I have confirmed it in the UI. Also, we noticed that it's weird that the spill size is somehow not reported in the reducer, but is reported in the mapper. Back to the previous question: if the spill time metric is still relevant to add, I plan to work on it if there are no objections. > Add task metric to report spill time > > > Key: SPARK-3577 > URL: https://issues.apache.org/jira/browse/SPARK-3577 > Project: Spark > Issue Type: Bug > Components: Shuffle, Spark Core >Affects Versions: 1.1.0 >Reporter: Kay Ousterhout >Priority: Minor > Attachments: spill_size.jpg > > > The {{ExternalSorter}} passes its own {{ShuffleWriteMetrics}} into > {{ExternalSorter}}. The write time recorded in those metrics is never used. > We should probably add task metrics to report this spill time, since for > shuffles, this would have previously been reported as part of shuffle write > time (with the original hash-based sorter). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-3577) Add task metric to report spill time
[ https://issues.apache.org/jira/browse/SPARK-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gaoxiang Liu updated SPARK-3577: Comment: was deleted (was: spill size metrics) > Add task metric to report spill time > > > Key: SPARK-3577 > URL: https://issues.apache.org/jira/browse/SPARK-3577 > Project: Spark > Issue Type: Bug > Components: Shuffle, Spark Core >Affects Versions: 1.1.0 >Reporter: Kay Ousterhout >Priority: Minor > Attachments: spill_size.jpg > > > The {{ExternalSorter}} passes its own {{ShuffleWriteMetrics}} into > {{ExternalSorter}}. The write time recorded in those metrics is never used. > We should probably add task metrics to report this spill time, since for > shuffles, this would have previously been reported as part of shuffle write > time (with the original hash-based sorter). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3577) Add task metric to report spill time
[ https://issues.apache.org/jira/browse/SPARK-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563038#comment-15563038 ] Gaoxiang Liu commented on SPARK-3577: - I found that the spill size metric was already added in https://github.com/apache/spark/commit/bb8098f203e6faddf2e1a04b03d62037e6c7#diff-1bd3dc38f6306e0a822f93d62c32b1d0, and I have confirmed it in the UI. Also, we noticed that it's weird that the spill size is somehow not reported in the reducer, but is reported in the mapper. Back to the previous question: if the spill time metric is still relevant to add, I plan to work on it if there are no objections. > Add task metric to report spill time > > > Key: SPARK-3577 > URL: https://issues.apache.org/jira/browse/SPARK-3577 > Project: Spark > Issue Type: Bug > Components: Shuffle, Spark Core >Affects Versions: 1.1.0 >Reporter: Kay Ousterhout >Priority: Minor > > The {{ExternalSorter}} passes its own {{ShuffleWriteMetrics}} into > {{ExternalSorter}}. The write time recorded in those metrics is never used. > We should probably add task metrics to report this spill time, since for > shuffles, this would have previously been reported as part of shuffle write > time (with the original hash-based sorter). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3577) Add task metric to report spill time
[ https://issues.apache.org/jira/browse/SPARK-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gaoxiang Liu updated SPARK-3577: Attachment: spill_size.jpg spill size metrics > Add task metric to report spill time > > > Key: SPARK-3577 > URL: https://issues.apache.org/jira/browse/SPARK-3577 > Project: Spark > Issue Type: Bug > Components: Shuffle, Spark Core >Affects Versions: 1.1.0 >Reporter: Kay Ousterhout >Priority: Minor > Attachments: spill_size.jpg > > > The {{ExternalSorter}} passes its own {{ShuffleWriteMetrics}} into > {{ExternalSorter}}. The write time recorded in those metrics is never used. > We should probably add task metrics to report this spill time, since for > shuffles, this would have previously been reported as part of shuffle write > time (with the original hash-based sorter). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3577) Add task metric to report spill time
[ https://issues.apache.org/jira/browse/SPARK-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15562925#comment-15562925 ] Gaoxiang Liu commented on SPARK-3577: - Hi [~kayousterhout], I just want to make sure that this JIRA is still relevant. Are there any changes to the requirement? I am currently working on this one, so I just want to make sure. Thanks! > Add task metric to report spill time > > > Key: SPARK-3577 > URL: https://issues.apache.org/jira/browse/SPARK-3577 > Project: Spark > Issue Type: Bug > Components: Shuffle, Spark Core >Affects Versions: 1.1.0 >Reporter: Kay Ousterhout >Priority: Minor > > The {{ExternalSorter}} passes its own {{ShuffleWriteMetrics}} into > {{ExternalSorter}}. The write time recorded in those metrics is never used. > We should probably add task metrics to report this spill time, since for > shuffles, this would have previously been reported as part of shuffle write > time (with the original hash-based sorter). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org