[GitHub] spark pull request: SPARK-11258 Converting a Spark DataFrame into ...

FRosner Mon, 26 Oct 2015 14:44:28 -0700

Github user FRosner commented on the pull request:

    https://github.com/apache/spark/pull/9222#issuecomment-151293332
  
    @felixcheung the runtime performance tests I conducted showed that the 
performance gain is minimal if enough RAM is available. The new implementation 
was consistently faster, but only by a few percent. However, when we tried to 
load a huge Parquet file (70 GB uncompressed), we ran into problems with the 
old implementation. As @Gerrrr indicated, this might be related to the map 
followed by .toArray, which causes memory overhead. I think that when 
collecting data to the driver, we should keep memory consumption small.
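
To illustrate the kind of overhead meant by "map and then .toArray", here is a 
minimal Scala sketch. It is not the actual Spark code: the object name 
ColumnCollect, the Row alias, and both method names are hypothetical and only 
stand in for turning collected rows into per-column arrays. The first variant 
builds an intermediate collection for every column before copying it into an 
array; the second fills preallocated column arrays in a single pass.

```scala
// Hypothetical illustration only; names and types are assumptions,
// not taken from the Spark code base.
object ColumnCollect {
  type Row = Array[Any]

  // Variant A: for each column, map over all rows and then call toArray.
  // On a strict Seq, map materialises an intermediate collection per column,
  // which toArray then copies again.
  def colsViaMap(rows: Seq[Row], numCols: Int): Array[Array[Any]] =
    (0 until numCols).map { c =>
      rows.map(row => row(c)).toArray
    }.toArray

  // Variant B: allocate the column arrays once and fill them in one pass,
  // so no per-column intermediate collection is created.
  def colsPreallocated(rows: Array[Row], numCols: Int): Array[Array[Any]] = {
    val cols = Array.ofDim[Any](numCols, rows.length)
    var r = 0
    while (r < rows.length) {
      var c = 0
      while (c < numCols) {
        cols(c)(r) = rows(r)(c)
        c += 1
      }
      r += 1
    }
    cols
  }
}
```

Both variants produce the same columns. They differ only in how many 
temporary copies exist while the result is being built, which is what matters 
once the collected data approaches the size of the driver's memory.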




