Hi Vinod,

As Jacky mentioned, many people have been expecting this feature. I think
Update/Delete should be a basic module of CarbonData, although it is a
complex problem for a distributed storage system. The solution you proposed
is based on the traditional 'Base + Delta' approach, which has been applied
successfully in Bigtable, HBase, Kudu, etc. Regarding your proposed solution
for CarbonData, I have some questions in addition to the doubts Jacky raised
about transactions and indexes:

1. How do we limit the IO overhead that delta files add? I think there are
two possible query approaches for delta files: (1) load all the delta data
and replace rows in the base query result that also exist in a delta file.
This may increase IO overhead, which CarbonData tries to reduce as much as
possible. (2) Build a separate index for all delta files, or label delta
records and upgrade the file format. Is that right?
2. When and how should minor/major compaction be done on (base + delta) or
(delta + delta)?
3. Are there any issues with updating or deleting Directory items?
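
To make approach (1) in question 1 concrete, here is a minimal Python sketch
of a merge-on-read scan, as I understand the proposal. It is only an
illustration under my own assumptions: the row layout, the function names
(scan_with_deltas, compact) and the in-memory set/list used for the deltas
are hypothetical and are not CarbonData APIs or file formats.

```python
from typing import Iterable, Iterator, List, Set, Tuple

# Hypothetical row layout: (row_id, value). Real CarbonData rows differ.
Row = Tuple[int, str]


def scan_with_deltas(
    base_rows: Iterable[Row],
    delete_delta: Set[int],
    update_delta: Iterable[Row],
) -> Iterator[Row]:
    """Merge-on-read: skip deleted base rows, then emit update-delta rows."""
    for row_id, value in base_rows:
        # The whole delete delta is held in memory and checked per row;
        # this is the extra IO/memory cost raised in question 1.
        if row_id in delete_delta:
            continue
        yield (row_id, value)
    # An update is modelled as delete + insert, so the new versions of
    # updated rows come from the update delta.
    for row in update_delta:
        yield row


def compact(
    base_rows: Iterable[Row],
    delete_delta: Set[int],
    update_delta: Iterable[Row],
) -> List[Row]:
    """Assumed (base + delta) compaction: materialize the merged view."""
    return list(scan_with_deltas(base_rows, delete_delta, update_delta))


base = [(1, "a"), (2, "b"), (3, "c")]
deleted = {2, 3}        # row 2 was updated, row 3 was deleted
updated = [(2, "B")]    # new version of row 2
result = list(scan_with_deltas(base, deleted, updated))
```

In this sketch every query pays for reading both delta sets in full, which
is why I ask about a separate index in approach (2). Under the same
assumptions, a (base + delta) compaction is just the merged result written
back as a new base with empty deltas.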

I look forward to the detailed design of your solution.

Please correct me if I am wrong.

Best Regards,
He Xiaoqiao


On Tue, Nov 15, 2016 at 5:39 PM, Jacky Li <jacky.li...@qq.com> wrote:

> Hi Vinod,
>
> It is great to have this feature, as there were many people asking for
> data update during the CarbonData meetup earlier. I believe it will be
> useful for many big data applications.
>
> For the solution you proposed, I have the following doubts:
> 1. Data update is complex if transactions are involved, so what level of
> ACID support are you thinking about?
> 2. If I understand correctly, you are proposing to do data update via a
> base + delta file approach, right? In that case, a new file format needs
> to be added to the CarbonData project.
> 3. As CarbonData has built-in support for indexes, any idea what the
> impact is on the B-tree index already in driver and executor memory?
>
> Regards,
> Jacky
>
> > On Nov 15, 2016, at 12:25 PM, Vinod KC <vinod.kc...@gmail.com> wrote:
> >
> > Hi All
> > I would like to propose the following new features in CarbonData:
> > 1) An Update statement to support modifying existing records in a
> > CarbonData table
> > 2) A Delete statement to remove records from a CarbonData table
> >
> > A) Update operation: The 'Update' feature can be added to CarbonData
> > using intermediate delta files [delete/update delta files], with less
> > impact on existing code.
> > An update can be considered a ‘delete’ followed by an ‘insert’ operation.
> > Once an update is done on a CarbonData file, during a select query the
> > CarbonData store reader can use the delete delta data cache to exclude
> > deleted records in that segment and then include records from the newly
> > added update delta files.
> >
> > B) Delete operation: For a delete operation, a delete delta file will be
> > added to each segment that contains matching records. During a select
> > query, the CarbonData reader will exclude those deleted records from the
> > result set.
> >
> > Please share your suggestions and thoughts about the design and
> > functional aspects of this feature. I’ll share a detailed design document
> > about the above thoughts later.
> >
> > Regards
> > Vinod
>
>
>
>
