cen yuhai created SPARK-16188:
---------------------------------

             Summary: Spark sql will create a lot of small files
                 Key: SPARK-16188
                 URL: https://issues.apache.org/jira/browse/SPARK-16188
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.6.0, 2.0.0
         Environment: spark 1.6.1
            Reporter: cen yuhai

I find that Spark SQL writes as many output files as there are partitions. When the result is small, this produces too many small files, and most of them are empty. Hive has a feature that checks the average size of the output files: if the average file size is smaller than "hive.merge.smallfiles.avgsize", Hive adds a job to merge the files.
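
To make the merge check described above concrete, here is a minimal sketch in plain Python. It is not Hive's or Spark's actual code: the function names `should_merge` and `target_file_count` are hypothetical, and the two byte values are assumptions chosen to play the role of "hive.merge.smallfiles.avgsize" and of a target size for merged files.

```python
# Assumed threshold in bytes, standing in for hive.merge.smallfiles.avgsize.
AVG_SIZE_THRESHOLD = 16_000_000

# Assumed target size in bytes for each file produced by the merge step.
TARGET_FILE_SIZE = 256_000_000


def should_merge(file_sizes, threshold=AVG_SIZE_THRESHOLD):
    """Return True when the average output file size is below the threshold.

    file_sizes is a list of sizes in bytes, one entry per output file.
    With zero or one file there is nothing to merge.
    """
    if len(file_sizes) <= 1:
        return False
    average = sum(file_sizes) / len(file_sizes)
    return average < threshold


def target_file_count(file_sizes, target_size=TARGET_FILE_SIZE):
    """Return how many files the merged output would need (at least one)."""
    total = sum(file_sizes)
    # Ceiling division without floating point.
    return max(1, -(-total // target_size))


if __name__ == "__main__":
    # Hypothetical job output: 200 partitions, only one of them non-empty.
    sizes = [0] * 199 + [1200]
    if should_merge(sizes):
        print("merge into", target_file_count(sizes), "file(s)")
```

In Spark, a count computed this way could be passed to something like `DataFrame.coalesce(n)` before writing, although that would have to be done by hand, since the report is precisely that Spark SQL has no such automatic step.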