[jira] [Comment Edited] (SPARK-27025) Speed up toLocalIterator

2019-03-04 Thread Erik van Oosten (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783220#comment-16783220
 ] 

Erik van Oosten edited comment on SPARK-27025 at 3/4/19 10:36 AM:
--

[~hyukjin.kwon] Maybe I misunderstood Sean's comment. I understood it to mean 
that every invocation of toLocalIterator would either benefit or, at worst, 
see no negative side effect.

Under this assumption, it would be better to put the 
cache/count/iterate/unpersist logic directly in toLocalIterator.

I cannot make any assumptions about the number of use cases.
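
The cache/count/iterate/unpersist logic mentioned above could look roughly like the following PySpark-style sketch. It is only an illustration, not Spark's implementation: the helper name eager_local_iterator is hypothetical, and it assumes its argument exposes the RDD methods cache(), count(), toLocalIterator() and unpersist().

```python
def eager_local_iterator(rdd):
    """Yield the elements of `rdd` on the driver, one partition at a time.

    Hypothetical helper, not part of Spark. `rdd` is assumed to offer the
    RDD methods cache(), count(), toLocalIterator() and unpersist().
    """
    # This is a generator: nothing runs until the first element is requested.
    rdd.cache()  # mark the RDD so computed partitions stay on the executors
    try:
        rdd.count()  # a single job that computes (and caches) all partitions
        # Partitions are still fetched to the driver one at a time.
        yield from rdd.toLocalIterator()
    finally:
        # Runs when iteration finishes, fails, or the generator is closed.
        rdd.unpersist()
```

The count() call costs an extra job, and the cached partitions occupy executor storage until unpersist() runs, so whether this helps depends on the workload.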


> Speed up toLocalIterator
>
> Key: SPARK-27025
> URL: https://issues.apache.org/jira/browse/SPARK-27025
> Project: Spark
> Issue Type: Wish
> Components: Spark Core
> Affects Versions: 2.3.3
> Reporter: Erik van Oosten
> Priority: Major
>
> Method {{toLocalIterator}} fetches the partitions to the driver one by one. 
> However, as far as I can see, the computation required for a 
> not-yet-fetched partition is not kicked off until that partition is fetched. 
> Effectively, only one partition is computed at a time. 
> Desired behavior: start computing all partitions immediately while 
> retaining the download-one-partition-at-a-time behavior.
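
The desired behavior can be illustrated outside Spark with a plain-Python analogy. This sketch makes no claim about how Spark schedules jobs: each partition is a hypothetical zero-argument function, all of them are submitted to a thread pool at once, and the results are handed to the consumer one at a time, in partition order. The name eager_ordered_results and the max_workers parameter are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def eager_ordered_results(partition_tasks, max_workers=4):
    """Start every task right away, but yield results one at a time, in order.

    `partition_tasks` is a list of zero-argument callables, each standing in
    for the computation of one partition (an assumption for this sketch).
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything up front so computation begins immediately.
        futures = [pool.submit(task) for task in partition_tasks]
        # Hand results over one by one; result() blocks only while the
        # requested partition is still being computed.
        for future in futures:
            yield future.result()


if __name__ == "__main__":
    tasks = [lambda i=i: [i, i * 10] for i in range(3)]
    for partition in eager_ordered_results(tasks):
        print(partition)
```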



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-27025) Speed up toLocalIterator

2019-03-02 Thread Erik van Oosten (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782322#comment-16782322
 ] 

Erik van Oosten edited comment on SPARK-27025 at 3/2/19 8:43 AM:
-

The point is to _not_ fetch proactively.

I have a program in which several steps need to be executed before anything can 
be transferred to the driver. So why can't the executors start executing 
immediately, and only transfer the results to the driver when they are ready?




