[ https://issues.apache.org/jira/browse/SPARK-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon updated SPARK-10216:
---------------------------------
    Fix Version/s: 2.3.0

> Avoid creating empty files during overwrite into Hive table with group by query
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-10216
>                 URL: https://issues.apache.org/jira/browse/SPARK-10216
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Keuntae Park
>            Assignee: Hyukjin Kwon
>            Priority: Minor
>             Fix For: 2.3.0
>
>
> The Exchange introduced by a GROUP BY query produces the number of partitions
> specified in 'spark.sql.shuffle.partitions'.
> Hence, even when the number of distinct group-by keys is small,
> INSERT INTO with a GROUP BY query tries to write at least 200 files (the default
> value of 'spark.sql.shuffle.partitions'),
> which results in many empty files.
> I think this is undesirable because subsequent queries on the resulting table
> will also create zero-size partitions and schedule unnecessary tasks that do
> no useful work while processing those queries.
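To make the arithmetic behind the report concrete, here is a minimal simulation, not Spark code. It assumes a simplified model in which each distinct group-by key lands in partition hash(key) % num_partitions and one output file is written per shuffle partition whether or not it holds rows. Spark actually uses Murmur3 hashing; Python's built-in hash is a stand-in. The function name simulate_group_by_output is hypothetical.

```python
from collections import Counter

# Default value of spark.sql.shuffle.partitions in Spark.
DEFAULT_SHUFFLE_PARTITIONS = 200


def simulate_group_by_output(distinct_keys, num_partitions=DEFAULT_SHUFFLE_PARTITIONS):
    """Return (files_written, empty_files) for a hash-partitioned GROUP BY.

    Hypothetical model: each distinct key goes to partition
    hash(key) % num_partitions (a stand-in for Spark's Murmur3-based
    partitioning), and one file is written per shuffle partition,
    including partitions that received no rows.
    """
    # Count how many keys landed in each partition; partitions absent
    # from the Counter received no rows at all.
    rows_per_partition = Counter(hash(k) % num_partitions for k in distinct_keys)
    files_written = num_partitions
    empty_files = num_partitions - len(rows_per_partition)
    return files_written, empty_files


if __name__ == "__main__":
    # 10 distinct keys spread over the default 200 shuffle partitions.
    files, empty = simulate_group_by_output(range(10))
    print(f"files written: {files}, empty: {empty}")
```

With 10 small integer keys (whose Python hash is the integer itself) only 10 of the 200 partitions receive rows, so the model writes 200 files of which 190 are empty. Calling simulate_group_by_output(range(10), num_partitions=10) instead yields no empty files, which is why lowering spark.sql.shuffle.partitions for such a query avoids the problem in this model.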