GitHub user FRosner commented on the pull request:

    https://github.com/apache/spark/pull/9222#issuecomment-151293332

@felixcheung The runtime performance tests I conducted showed that the performance gain is minimal if enough RAM is available. The new implementation was faster every time, but only by a few percent. However, when we tried to load a huge Parquet file (70 GB uncompressed), we ran into problems with the old implementation. As @Gerrrr indicated, this might be related to the map followed by .toArray, which causes memory overhead. I think that when collecting data to the driver, we should keep memory consumption small.
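
To illustrate the kind of overhead meant here, below is a minimal sketch. It is not the code from this pull request. It assumes rows are plain `Seq[Any]` values rather than Spark rows, and the names `CollectSketch`, `colsViaMap`, and `colsPreallocated` are hypothetical. The first variant builds an intermediate collection per column before copying it into an array; the second writes directly into preallocated arrays.

```scala
object CollectSketch {
  // Hypothetical stand-in for a collected row: one value per column.
  type Row = Seq[Any]

  // map + toArray: for each column, map allocates an intermediate Seq
  // holding all values, which toArray then copies into a new array.
  def colsViaMap(rows: Seq[Row], numCols: Int): Array[Array[Any]] =
    (0 until numCols).map { c =>
      rows.map(r => r(c)).toArray
    }.toArray

  // Preallocated: allocate the column arrays once and fill them in a
  // single pass over the rows, without intermediate collections.
  def colsPreallocated(rows: Seq[Row], numCols: Int): Array[Array[Any]] = {
    val cols = Array.ofDim[Any](numCols, rows.length)
    val it = rows.iterator
    var i = 0
    while (it.hasNext) {
      val row = it.next()
      var c = 0
      while (c < numCols) {
        cols(c)(i) = row(c)
        c += 1
      }
      i += 1
    }
    cols
  }
}
```

Both variants return the same columns. The difference is only in how many temporary objects exist while the result is being built, which is what may matter when the collected data is close to the size of the driver's heap.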