Hi,
By "simple insert", do you mean "insert by values"? I don't think this will
be used frequently in a real data pipeline. Typically, insert is used to
load data from another table or from an external table.
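To make the distinction concrete, these are the two insert shapes I mean, sketched in generic SQL via Python's sqlite3 (not Carbon itself; the table names are made up for illustration):

```python
# Illustrative only: "insert by values" vs. insert-from-table, shown with
# sqlite3 because the distinction is plain SQL, not Carbon-specific.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE target (id INTEGER, name TEXT)")
cur.execute("CREATE TABLE source (id INTEGER, name TEXT)")
cur.executemany("INSERT INTO source VALUES (?, ?)", [(1, "a"), (2, "b")])

# "Insert by values": a handful of literal rows -- rare in real pipelines.
cur.execute("INSERT INTO target VALUES (0, 'seed')")

# Insert from another table: the common bulk-load path in data pipelines.
cur.execute("INSERT INTO target SELECT id, name FROM source")

count = cur.execute("SELECT COUNT(*) FROM target").fetchone()[0]
print(count)  # 3
```

The first form moves a few literal rows; the second moves whole datasets, which is where distributed execution (and its overheads) actually pays off.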
For a one-row insert (insert by values), I don't think we need to
avoid the Spark RDD flow. Also, based on your design, using the SDK to
write a transactional table segment brings the extra overhead of creating
metadata files manually. Considering the scope of the changes and their
value addition in real-world scenarios,
-1 from my side for this requirement.
Thanks,
Ajantha
On Tue, 2 Feb, 2021, 6:51 pm akshay_nuthala,
wrote:
> Hi Community,
>
> As Carbon is closely integrated with Spark, insert operations in Carbon are
> done using the Spark API. This in turn fires Spark jobs, which add various
> overheads such as task serialisation cost, extra memory consumption,
> execution time on remote nodes, shuffle, etc.
>
> For simple insert operations, we can improve performance by reusing the
> SDK (which is plain Java code) to achieve the same result, thereby cutting
> out the overheads discussed above.
>
> Following is the link to the design document. Please give your valuable
> comments/inputs/suggestions.
>
>
> https://docs.google.com/document/d/1BcbTcO__vZbLLuhU73NIcbJOM2FRcKBa-ZxackofAS0/edit?usp=sharing
>
> Thanks,
>
> Regards,
> N Akshay Kumar
>
>
>
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
>