Hi All,
Recently, I found that Carbon over-uses cluster resources. By design, a Carbon 
workflow does not behave like a common Spark task, which does one small piece 
of work in one thread; instead, each Carbon task has its own internal logic 
and spawns additional threads.

For example,
1. Launch carbon with --num-executors=1 but set 
carbon.number.of.cores.while.loading=10;
2. Load a no_sort table with multi-block input, say N Iterator<CarbonRowBatch>: 
carbon will start N tasks in parallel, and in each task the 
CarbonFactDataHandlerColumnar creates model.getNumberOfCores() threads (let's 
say C) in its ProducerPool. In total this launches N*C threads. ==> This is 
the case that makes me treat this as a serious problem: too many threads can 
stall the executor so that it fails to send heartbeats and gets killed.
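The multiplicative blow-up above can be sketched in plain Java. This is a hedged illustration, not CarbonData's actual code: the names ThreadBlowupDemo, n, c, and peak are mine, standing in for the N parallel loading tasks and the C-sized ProducerPool per task.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadBlowupDemo {
    public static void main(String[] args) throws InterruptedException {
        int n = 4;   // stand-in for N parallel loading tasks
        int c = 10;  // stand-in for carbon.number.of.cores.while.loading
        AtomicInteger workers = new AtomicInteger();
        ExecutorService tasks = Executors.newFixedThreadPool(n);
        CountDownLatch allDone = new CountDownLatch(n);
        for (int t = 0; t < n; t++) {
            tasks.submit(() -> {
                // Each "task" builds its own producer pool of C threads,
                // mirroring the per-task ProducerPool described above.
                ExecutorService producers = Executors.newFixedThreadPool(c);
                CountDownLatch taskDone = new CountDownLatch(c);
                for (int p = 0; p < c; p++) {
                    producers.submit(() -> {
                        workers.incrementAndGet(); // count every producer thread job
                        taskDone.countDown();
                    });
                }
                try {
                    taskDone.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                producers.shutdown();
                allDone.countDown();
            });
        }
        allDone.await();
        tasks.shutdown();
        // Total producer workers = n * c, independent of executor cores.
        System.out.println("producer workers created: " + workers.get()); // 40
    }
}
```

Even with --num-executors=1, nothing in this pattern caps the total at the executor's core count: the pools multiply (n * c = 40 here) because each task sizes its own pool without awareness of its siblings.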

So the over-use comes down to how thread pools are used.

This affects overall cluster resource usage and may lead to misleading 
performance results.

I hope this gets your attention when fixing bugs or writing new code.
