[ https://issues.apache.org/jira/browse/SPARK-8125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626456#comment-14626456 ]
Apache Spark commented on SPARK-8125:
-------------------------------------

User 'liancheng' has created a pull request for this issue:
https://github.com/apache/spark/pull/7396

> Accelerate ParquetRelation2 metadata discovery
> ----------------------------------------------
>
>                 Key: SPARK-8125
>                 URL: https://issues.apache.org/jira/browse/SPARK-8125
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 1.4.0
>            Reporter: Cheng Lian
>            Assignee: Cheng Lian
>            Priority: Blocker
>
> For large Parquet tables (e.g., with thousands of partitions), it can be very
> slow to discover Parquet metadata for schema merging and generating splits
> for Spark jobs. We need to accelerate this process. One possible solution
> is to do the discovery via a distributed Spark job.
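
To illustrate the idea in the issue description (discovering Parquet metadata in parallel instead of serially on the driver), here is a minimal sketch in plain Python. It is not the approach taken in the pull request, and it does not use Spark or Parquet APIs: a thread pool stands in for Spark executors, and the FOOTERS table, read_schema, merge_schemas, merge_chunk and discover_schema are all hypothetical names invented for this example. Each task merges the schemas of its own chunk of files, and the driver only merges the few partial results.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical stand-in for Parquet footers: file path -> {column: type name}.
# A real job would read each file's footer instead of consulting a dict.
FOOTERS = {
    "part=1/a.parquet": {"id": "int64", "name": "string"},
    "part=2/b.parquet": {"id": "int64", "score": "double"},
    "part=3/c.parquet": {"id": "int64", "name": "string", "ts": "timestamp"},
}


def read_schema(path):
    """Return the schema of one file (assumed to come from its footer)."""
    return FOOTERS[path]


def merge_schemas(left, right):
    """Union two schemas; fail if a column appears with two different types."""
    merged = dict(left)
    for column, dtype in right.items():
        if merged.setdefault(column, dtype) != dtype:
            raise ValueError(
                f"conflicting types for column {column!r}: "
                f"{merged[column]} vs {dtype}"
            )
    return merged


def merge_chunk(paths):
    """Work done by one task: read and merge the schemas of its files."""
    return reduce(merge_schemas, (read_schema(p) for p in paths), {})


def discover_schema(paths, num_tasks=2):
    """Split the file list into chunks, merge each chunk in parallel,
    then merge the partial schemas on the caller's side."""
    chunks = [paths[i::num_tasks] for i in range(num_tasks)]
    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        partials = list(pool.map(merge_chunk, chunks))
    return reduce(merge_schemas, partials, {})


if __name__ == "__main__":
    print(discover_schema(sorted(FOOTERS)))
```

The design point is that the per-file work (reading footers) is spread across tasks, so the coordinating process handles only num_tasks partial schemas rather than one footer per file.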