[ https://issues.apache.org/jira/browse/SPARK-21435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16089442#comment-16089442 ]
Li Yuanjian commented on SPARK-21435:
-------------------------------------

[~sowen] I tested the patch in our scenario and added a unit test. I think it can skip empty files in Parquet.

> Empty files should be skipped while write to file
> -------------------------------------------------
>
>                 Key: SPARK-21435
>                 URL: https://issues.apache.org/jira/browse/SPARK-21435
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Li Yuanjian
>            Priority: Minor
>
> Consider this scenario: the source table has many partitions and data files, but
> after the query filter only a small amount of data is written to the destination dir.
> In this case the destination dir or table will contain many empty files, or files
> that hold only schema metadata (Parquet format). I know we can use coalesce, but
> skipping the empty files may be a better fit in this case.
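
The behaviour requested in the description can be illustrated with a small sketch. This is not Spark's writer and not the patch under discussion. It is plain Python using only the standard library, and the names `write_partitions`, `skip_empty` and `demo`, the CSV output and the `part-NNNNN.csv` naming are all assumptions made for illustration. The idea is that a partition left with no rows after filtering produces no output file.

```python
import csv
import tempfile
from pathlib import Path


def write_partitions(partitions, dest_dir, skip_empty=True):
    """Write each partition to its own part file and return the paths written.

    `partitions` is a list of partitions, each a list of rows (lists of strings).
    With skip_empty=True, a partition with no rows produces no file at all.
    This is a hypothetical helper, not a Spark API.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    for index, rows in enumerate(partitions):
        if skip_empty and not rows:
            continue  # nothing to write, so do not create a file
        path = dest / f"part-{index:05d}.csv"
        with path.open("w", newline="") as handle:
            csv.writer(handle).writerows(rows)
        written.append(path)
    return written


def demo():
    # Made-up data: after a selective filter only one of four partitions has rows.
    partitions = [[], [], [["1", "a"], ["2", "b"]], []]
    with tempfile.TemporaryDirectory() as tmp:
        kept = write_partitions(partitions, Path(tmp) / "skip")
        everything = write_partitions(partitions, Path(tmp) / "all", skip_empty=False)
        return [p.name for p in kept], len(everything)
```

Calling `demo()` writes the same four partitions twice. With skipping enabled only the non-empty partition yields a file, while without it every partition yields one. The coalesce workaround mentioned in the description instead reduces the number of partitions before writing, which is a separate approach and is not shown here.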