[jira] [Commented] (SPARK-16188) Spark sql create a lot of small files

xianlongZhang (JIRA) Wed, 16 Aug 2017 18:33:47 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-16188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129742#comment-16129742
 ]


xianlongZhang commented on SPARK-16188:
---------------------------------------

But when we use Spark SQL, we cannot call the 'coalesce' method. What should 
we do in this case? This happens often in our production environment, and we 
have not found a better solution so far.

> Spark sql create a lot of small files
> -------------------------------------
>
>                 Key: SPARK-16188
>                 URL: https://issues.apache.org/jira/browse/SPARK-16188
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.6.0
>         Environment: spark 1.6.1
>            Reporter: cen yuhai
>
> I find that Spark SQL creates as many output files as there are partitions. 
> When the results are small, this produces too many small files, and most of 
> them are empty. 
> Hive has a function that checks the average file size. If the average file 
> size is smaller than "hive.merge.smallfiles.avgsize", Hive adds a job to 
> merge the files.
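
For readers unfamiliar with the Hive behaviour described in the issue, the
check can be sketched roughly as below. This is a minimal illustration, not
Hive's or Spark's actual code: the function name `needs_merge` is
hypothetical, the 16,000,000-byte default is an assumption about the usual
value of "hive.merge.smallfiles.avgsize" (check your own configuration), and
Hive's real decision also depends on other settings that are not modeled here.

```python
from statistics import mean

# Assumed default for hive.merge.smallfiles.avgsize, in bytes.
# This value is an assumption; verify it against your Hive configuration.
ASSUMED_AVG_SIZE_THRESHOLD = 16_000_000


def needs_merge(file_sizes, avg_size_threshold=ASSUMED_AVG_SIZE_THRESHOLD):
    """Return True when the average output file size is below the threshold.

    file_sizes is a list of output file sizes in bytes. An empty list means
    there is nothing to merge, so the function returns False.
    """
    if not file_sizes:
        return False
    return mean(file_sizes) < avg_size_threshold


# Hypothetical job output: 200 partitions, most of them empty.
sizes = [0] * 190 + [4096] * 10
print(needs_merge(sizes))
```

With mostly empty output files, as in the issue description, the average
falls far below the threshold, so a follow-up merge job would be scheduled.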




