CarbonData table partitioned by date generates many small files

陈星宇 Mon, 04 Jun 2018 23:49:46 -0700

Hi CarbonData team,


I am using CarbonData 1.3.1 to create a table and load data. The load generates
many small files and the Spark job is very slow. I suspect the number of files is
related to the number of Spark jobs, but if I decrease the number of jobs, the job
fails with an out-of-memory error. See my DDL below:


create table xx.xx(
dept_name string,
xx
.
.
.
) PARTITIONED BY (xxx date)
STORED BY 'carbondata' TBLPROPERTIES('SORT_COLUMNS'='xxx,xxx,xxx ,xxx,xxx')
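To make the suspected relationship concrete, here is a rough back-of-the-envelope model. It is a sketch under stated assumptions, not CarbonData's actual write logic: it treats the "number of Spark jobs" as the number of write tasks, assumes rows are spread evenly across tasks, and assumes every task receives rows for every date, so each task writes one file per date partition. The function name `load_shape` and all numbers are hypothetical.

```python
import math


def load_shape(total_rows, num_tasks, num_dates):
    """Hypothetical model of a partitioned load (not CarbonData internals).

    Assumes every task receives rows for every date partition and writes
    one file per partition it touches, and that rows are spread evenly
    across tasks.
    Returns (number_of_files, rows_handled_per_task).
    """
    # One file per (task, date) pair under the assumption above.
    files = num_tasks * num_dates
    # Fewer tasks means each task must buffer and sort more rows.
    rows_per_task = math.ceil(total_rows / num_tasks)
    return files, rows_per_task


print(load_shape(1_000_000_000, 200, 30))
print(load_shape(1_000_000_000, 20, 30))
```

Under this model, going from 200 to 20 tasks for a hypothetical 1,000,000,000-row load over 30 dates cuts the file count from 6,000 to 600 but raises the rows each task must handle from 5,000,000 to 50,000,000. That matches the trade-off between many small files and out-of-memory failures.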



Please give me some advice.


Thanks,


ChenXingYu
