cen yuhai created SPARK-16188:
---------------------------------

             Summary: Spark sql will create a lot of small files
                 Key: SPARK-16188
                 URL: https://issues.apache.org/jira/browse/SPARK-16188
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.6.0, 2.0.0
         Environment: spark 1.6.1
            Reporter: cen yuhai


I find that Spark SQL creates as many output files as there are partitions.
When the result is small, this produces too many small files, and most of
them are empty.
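
To make the effect concrete, here is a minimal sketch in plain Python (not
Spark code). It assumes every partition writes exactly one file, as described
above. The function name and the modulo-based partitioning are hypothetical
stand-ins for illustration; Spark's real partitioner is different. The
partition count of 200 is the default value of spark.sql.shuffle.partitions.

```python
# Illustration only: models "one output file per partition".
def count_output_files(row_keys, num_partitions):
    """Return (total_files, empty_files) if each partition writes one file."""
    rows_per_partition = [0] * num_partitions
    for key in row_keys:
        # Stand-in for hash partitioning; Spark's real hash function differs.
        rows_per_partition[hash(key) % num_partitions] += 1
    empty_files = sum(1 for n in rows_per_partition if n == 0)
    return num_partitions, empty_files


# A 5-row result spread over 200 partitions.
total, empty = count_output_files(range(5), 200)
print(total, empty)  # prints "200 195"
```

With only 5 rows, at most 5 partitions can hold data, so at least 195 of the
200 files are empty no matter how the rows are hashed.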

Hive has a feature that checks the average output file size. If the average
file size is smaller than "hive.merge.smallfiles.avgsize", Hive adds a job to
merge the files.
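
A rough sketch of that check in plain Python, assuming the decision is just
"average size below threshold". The function name and the example sizes are
hypothetical, and this is not Hive's actual implementation. The threshold is
passed in as a parameter; 16000000 bytes is only an example value for
"hive.merge.smallfiles.avgsize", so check the setting in your Hive version.

```python
# Illustration only: decide whether an extra merge job would be worthwhile.
def needs_merge(file_sizes_bytes, avg_size_threshold):
    """Return True if the average file size is below the threshold."""
    if not file_sizes_bytes:
        # No output files, so there is nothing to merge.
        return False
    avg_size = sum(file_sizes_bytes) / len(file_sizes_bytes)
    return avg_size < avg_size_threshold


# 195 empty files plus 5 files of 1 KB each: the average is far below 16 MB.
sizes = [0] * 195 + [1024] * 5
print(needs_merge(sizes, 16000000))
```

Something similar could run in Spark SQL after a write to decide whether to
add a merge step.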


