[ 
https://issues.apache.org/jira/browse/SPARK-16984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414571#comment-15414571
 ] 

Apache Spark commented on SPARK-16984:
--------------------------------------

User 'robert3005' has created a pull request for this issue:
https://github.com/apache/spark/pull/14573

> executeTake tries all partitions if first partition is empty
> ------------------------------------------------------------
>
>                 Key: SPARK-16984
>                 URL: https://issues.apache.org/jira/browse/SPARK-16984
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: Robert Kruszewski
>
> In executeTake, if the first partition returns 0 rows, the next attempt 
> tries all partitions. This leads to pathological cases where the first 
> partition is empty and the rest have data, which unfortunately can happen 
> with skewed data. Empirically, it is better to make a few extra round 
> trips than to risk killing the driver with one big collect.
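
The trade-off described in the issue can be sketched with a small, Spark-free
simulation. This is an illustration only, not the code from the pull request:
the function name plan_scans, the scale_up parameter, the growth factor of 4,
and the 1.5x interpolation used once some rows have been seen are all
assumptions made for the example. Each simulated partition task returns at
most n rows, and the function reports how many partitions were scanned and
how many rows reached the driver in each round.

```python
def plan_scans(partition_sizes, n, scale_up=None):
    """Simulate a take(n) that scans partitions in rounds.

    Returns one (partitions_scanned, rows_sent_to_driver) tuple per round.

    scale_up=None : after an empty first round, scan every remaining
                    partition at once (the behaviour the issue describes).
    scale_up=k    : hypothetical alternative; scan at most k times the
                    number of partitions scanned so far.
    """
    total = len(partition_sizes)
    collected = 0   # rows gathered on the driver so far
    scanned = 0     # partitions scanned so far
    rounds = []
    while collected < n and scanned < total:
        if scanned == 0:
            to_try = 1  # always start with a single partition
        elif collected == 0:
            # Nothing found yet: either jump to everything, or grow gradually.
            to_try = total - scanned if scale_up is None else scanned * scale_up
        else:
            # Some rows seen: extrapolate how many more partitions are needed
            # (the 1.5x overestimate is an assumption of this sketch).
            to_try = max(int(1.5 * n * scanned / collected) - scanned, 1)
            if scale_up is not None:
                to_try = min(to_try, scanned * scale_up)
        end = min(scanned + to_try, total)
        # Each partition task ships at most n rows back to the driver.
        rows = sum(min(size, n) for size in partition_sizes[scanned:end])
        rounds.append((end - scanned, rows))
        collected += rows
        scanned = end
    return rounds


# Skewed input: an empty first partition followed by 999 full ones.
sizes = [0] + [1000] * 999
print(plan_scans(sizes, 10))              # [(1, 0), (999, 9990)]
print(plan_scans(sizes, 10, scale_up=4))  # [(1, 0), (4, 40)]
```

In this toy input, both strategies finish in two rounds, but the first sends
9990 rows to the driver to satisfy a request for 10, while the gradual one
sends 40. With other data layouts the gradual strategy may need more rounds,
which is the round-trip cost the reporter considers acceptable.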



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
