Hi Ajantha,
If we think of this problem from the opposite direction, carbon may waste 
resources if users do not set the properties correctly. 

What about the case of concurrent loading? 

So first of all, we need to figure out where executor services are used and how 
many there are. If we keep the logic of one node one task, we need to bound the 
overall number of running threads within a task. 

Then, a little further thinking:
Is a global executor service possible? That may cause some dependencies 
between different steps of loading.
Is it possible to have a separate executor service for each step (or other 
unit) of loading? Can those specific executor services change size? (e.g. once 
local sort is done, most threads could work on writing and none on input 
reading and converting)
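On the resizing question: the JDK's ThreadPoolExecutor already allows changing pool size at runtime via setCorePoolSize/setMaximumPoolSize, so a per-step pool could in principle shrink after its step finishes and free capacity for another step's pool. A rough sketch (illustrative only, not carbon code):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch of a resizable per-step pool. After local sort completes, this
// pool is shrunk; the freed budget could then be given to the writer pool.
public class ResizablePoolDemo {
    public static void main(String[] args) {
        ThreadPoolExecutor sortPool = new ThreadPoolExecutor(
                8, 8, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>());

        // ... local-sort phase runs with up to 8 threads ...

        // local sort done: shrink core first, then max (max must stay >= core)
        sortPool.setCorePoolSize(2);
        sortPool.setMaximumPoolSize(2);
        System.out.println(sortPool.getMaximumPoolSize()); // prints 2

        sortPool.shutdown();
    }
}
```

Note the ordering: setMaximumPoolSize throws IllegalArgumentException if the new maximum is below the current core size, so the core size is lowered first.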

BTW, do you know why the configuration "carbon.number.of.cores.while.loading" 
was introduced? 




On 2020/04/15 13:54:50, Ajantha Bhat <ajanthab...@gmail.com> wrote: 
> Hi Manhua,
> 
> For only No sort and Local sort, we don't follow spark task launch logic.
> we have our own logic of one node one task. And inside that task we can
> control resource by configuration (carbon.number.of.cores.while.loading)
> 
> As you pointed in the above mail, *N * C is controlled by configuration*
> and the default value of C is 2.
> *I see over use cluster problem only if you configure it badly.*
> 
> Do you have any suggestion to the change design? Feel free to raise a
> discussion and work on it.
> 
> Thanks,
> Ajantha
> 
> On Tue, Apr 14, 2020 at 6:06 PM Liang Chen <chenliang6...@gmail.com> wrote:
> 
> > OK, thank you for reporting this issue, let us look into it.
> >
> > Regards
> > Liang
> >
> >
> > Manhua Jiang wrote
> > > Hi All,
> > > Recently, I found carbon over-uses cluster resources. Generally the design
> > > of the carbon work flow does not act like a common spark task, which does
> > > only one small piece of work in one thread; instead the task has its own logic.
> > >
> > > For example,
> > > 1. launch carbon with --num-executors=1 but set
> > > carbon.number.of.cores.while.loading=10;
> > > 2. a no_sort table with multi-block input, say N Iterator<CarbonRowBatch>:
> > > carbon will start N tasks in parallel, and in each task the
> > > CarbonFactDataHandlerColumnar has model.getNumberOfCores() (let's say C)
> > > threads in its ProducerPool, launching N*C threads in total. ==> This is the
> > > case that makes me take this as a serious problem: too many threads stall
> > > the executor from sending heartbeats, and it gets killed.
> > >
> > > So, the over-use is related to usage of threadpool.
> > >
> > > This would affect the cluster's overall resource usage and may lead to
> > > wrong performance results.
> > >
> > > I hope this gets your attention when fixing or writing new code.
> >
> >
> >
> >
> >
> >
> 
