Re: spark sql job create too many files in HDFS when doing insert overwrite hive table

2016-04-28 Thread linxi zeng
BTW, I have created a JIRA task to follow this issue: https://issues.apache.org/jira/browse/SPARK-14974

spark sql job create too many files in HDFS when doing insert overwrite hive table

2016-04-28 Thread linxi zeng
Hi,

Recently, we often encounter problems using Spark SQL to insert data into a partitioned table (ex.: insert overwrite table $output_table partition(dt) select xxx from tmp_table). After the Spark job starts running on YARN, the app will create too many files (ex. 2,000,000+, or even 10,000,000+),
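
For context, here is a minimal sketch of the kind of statement involved and one commonly used way to bound the number of output files. The table and column names (output_table, tmp_table, col_a, col_b, dt) and the setting values are placeholders, not taken from the original job, and this is not presented as the fix discussed in the thread.

    -- With dynamic partitioning, each writing task can open one file per
    -- partition value it sees, so the file count grows as tasks x partitions.
    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;
    SET spark.sql.shuffle.partitions=200;   -- bounds the number of writing tasks

    INSERT OVERWRITE TABLE output_table PARTITION (dt)
    SELECT col_a, col_b, dt
    FROM tmp_table
    DISTRIBUTE BY dt;   -- routes all rows for a given dt to the same task,
                        -- so each partition is written by far fewer files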