[jira] [Updated] (SPARK-10250) Scala PairRDDFunctions.groupByKey() should be fault-tolerant of single large groups

2019-05-20 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-10250:
-
Labels: bulk-closed  (was: )

> Scala PairRDDFunctions.groupByKey() should be fault-tolerant of single large 
> groups
> ---
>
> Key: SPARK-10250
> URL: https://issues.apache.org/jira/browse/SPARK-10250
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.4.1
> Reporter: Matt Cheah
> Priority: Minor
> Labels: bulk-closed
>
> PairRDDFunctions.groupByKey() is less robust than Python's equivalent, as 
> PySpark's groupByKey can spill single large groups to disk. We should bring 
> the Scala implementation up to parity.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10250) Scala PairRDDFunctions.groupByKey() should be fault-tolerant of single large groups

2015-12-15 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-10250:
--
Summary: Scala PairRDDFunctions.groupByKey() should be fault-tolerant of 
single large groups  (was: Scala PairRDDFuncitons.groupByKey() should be 
fault-tolerant of single large groups)

> Scala PairRDDFunctions.groupByKey() should be fault-tolerant of single large 
> groups
> ---
>
> Key: SPARK-10250
> URL: https://issues.apache.org/jira/browse/SPARK-10250
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.4.1
> Reporter: Matt Cheah
> Priority: Minor
>
> PairRDDFunctions.groupByKey() is less robust than Python's equivalent, as 
> PySpark's groupByKey can spill single large groups to disk. We should bring 
> the Scala implementation up to parity.


