Hi, Jacky
The intention is that the current loading cannot take full advantage of my resources and fails randomly.
I want to make the insert process faster and more stable.
Yes, you are right, we should also find out why the loading failed.
I don't think LOCAL_SORT and BATCH_SORT should use
“sparkSession.sparkContext.defaultParallelism”. Using the Spark default behavior is
more reasonable because it is based on the data size, so users don't need to adjust
it as the data grows.
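
To illustrate the difference, here is a rough sketch. It is a simplified model, not the actual Spark or CarbonData logic: the function names and the 256 MB split size are assumptions chosen only for illustration. With size-based splitting the task count grows with the input, while a fixed parallelism stays the same no matter how large the data is.

```python
MB = 1024 * 1024
GB = 1024 * MB


def size_based_tasks(input_bytes: int, max_split_bytes: int) -> int:
    """Simplified model: one task per split of at most max_split_bytes."""
    if max_split_bytes <= 0:
        raise ValueError("max_split_bytes must be positive")
    # Integer ceiling division; always schedule at least one task.
    return max(1, (input_bytes + max_split_bytes - 1) // max_split_bytes)


def fixed_tasks(default_parallelism: int) -> int:
    """Fixed model: the task count ignores the input size entirely."""
    return default_parallelism


if __name__ == "__main__":
    for size in (1 * GB, 10 * GB, 100 * GB):
        print(
            size // GB, "GB:",
            size_based_tasks(size, 256 * MB), "tasks (size-based) vs",
            fixed_tasks(8), "tasks (fixed)",
        )
```

In this model the fixed setting has to be retuned by hand as the table grows, which is the adjustment I would like users to avoid.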
Best regards!
Yuhai Cen
On Oct 28, 2017, at 12:21, Jacky Li <[email protected]> wrote:
Hi,
I am not getting the intention behind this proposal. Is it because of the
loading failure? If yes, we should find out why the loading failed.
If not, then what is the intention?
Actually I think the “carbon.number.of.cores.while.loading” property should be
marked as obsolete.
GLOBAL_SORT and NO_SORT should use the Spark default behavior.
LOCAL_SORT and BATCH_SORT should use
“sparkSession.sparkContext.defaultParallelism” as the number of cores for local sorting.
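
The rule above could be sketched roughly as follows. This is a hypothetical helper, not existing CarbonData code: the name `loading_cores` and the convention of returning `None` to mean "leave it to Spark" are assumptions made for illustration.

```python
from typing import Optional

# Sort scopes named in the proposal above.
SPARK_DEFAULT_SCOPES = {"GLOBAL_SORT", "NO_SORT"}
DEFAULT_PARALLELISM_SCOPES = {"LOCAL_SORT", "BATCH_SORT"}


def loading_cores(sort_scope: str, default_parallelism: int) -> Optional[int]:
    """Return the core count to use for loading, or None for the Spark default.

    default_parallelism stands in for the value of
    sparkSession.sparkContext.defaultParallelism.
    """
    scope = sort_scope.strip().upper()
    if scope in SPARK_DEFAULT_SCOPES:
        return None  # hypothetical convention: do not override Spark
    if scope in DEFAULT_PARALLELISM_SCOPES:
        return default_parallelism
    raise ValueError(f"unknown sort scope: {sort_scope!r}")
```

With such a rule there would be no need for users to set “carbon.number.of.cores.while.loading” at all.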
Regards,
Jacky
> On Oct 27, 2017, at 8:43 AM, cenyuhai11 <[email protected]> wrote:
>
> When I insert data into CarbonData from a table, I have to do the
> following:
> 1. select count(1) from table1
> and then
> 2. insert into table table1 select * from table1
>
> Why do I have to execute "select count(1) from table1" first?
> Because the number of tasks is computed by CarbonData, and it depends on how
> many executor hosts we have at that moment!
>
> I don't think this is the right way. We should let Spark control the number
> of tasks.
> Setting the parameter "mapred.max.split.size" is a common way to adjust the
> number of tasks.
>
> Even when I do step 2 this way, some tasks still fail, which increases the
> insert time.
>
> So I suggest that we don't adjust the number of tasks and just use the default
> behavior of Spark.
> Then, if there are small files, add a fast merge job (merge data at
> blocklet level, just as )
>
> So we also need to set the default value of
> "carbon.number.of.cores.while.loading" to 1.
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/