GitHub user viirya commented on the issue:

    https://github.com/apache/spark/pull/16677
  
    @sujith71955 To optimize `executeTake` this way, we would need to collect
statistics of the RDD. However, `executeTake` scans partitions incrementally.
Ideally, it scans only a few partitions to return `n` rows, and the remaining
partitions are skipped without ever being materialized. So, going back to the
beginning: IMHO, collecting the statistics would materialize all partitions,
which runs counter to the very optimization `executeTake` is designed to provide.
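To make the trade-off concrete, here is a simplified Python model. It is not Spark's actual `executeTake` code. It assumes partitions are lazily computed thunks, and it uses a hypothetical `scale_up_factor` that grows each retry batch. Taking 5 rows computes only the first partition, while gathering per-partition row counts (the "statistics") computes every partition:

```python
from typing import Callable, List, Tuple

Partition = Callable[[], List[int]]  # hypothetical: a lazily computed partition


def execute_take(partitions: List[Partition], n: int,
                 scale_up_factor: int = 4) -> Tuple[List[int], int]:
    """Simplified incremental take: returns up to n rows and the number of
    partitions that were materialized. scale_up_factor is an assumption."""
    rows: List[int] = []
    scanned = 0
    parts_to_try = 1
    while len(rows) < n and scanned < len(partitions):
        batch = partitions[scanned:scanned + parts_to_try]
        for compute in batch:
            rows.extend(compute())  # materialize only partitions in this batch
        scanned += len(batch)
        parts_to_try = scanned * scale_up_factor  # try more partitions next round
    return rows[:n], scanned


def collect_row_counts(partitions: List[Partition]) -> List[int]:
    """Exact per-partition statistics: every partition must be computed."""
    return [len(compute()) for compute in partitions]


# Usage: 100 partitions of 10 rows each, recording which ones get computed.
computed: List[int] = []


def make_partition(i: int) -> Partition:
    def compute() -> List[int]:
        computed.append(i)
        return list(range(i * 10, i * 10 + 10))
    return compute


parts = [make_partition(i) for i in range(100)]

rows, scanned = execute_take(parts, 5)
take_cost = len(computed)      # partitions computed by the incremental take

computed.clear()
counts = collect_row_counts(parts)
stats_cost = len(computed)     # partitions computed to gather statistics
```

In this model the take touches 1 partition while the statistics pass touches all 100, which is the cost concern raised above.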

