Hi David,
+1

Initially, when the segment concept was introduced, a segment was viewed as
a folder to which data is added incrementally over time, with data retention
use cases such as "delete segments before a given date" in mind. In that
model, if updated records were written into a new segment, old records would
look like new records and the retention model would not work on that data.
So updated records were written to the same segment folder.

But later the partition concept was introduced, which is a cleaner way to
implement retention, and deleting by a time column is an even better
method.
So inserting the updated records into a new segment makes sense.
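
To illustrate the retention point, here is a toy model in plain Python. It is
not CarbonData code; the segment layout and the function names
retain_by_segment_date and retain_by_time_column are made up for this
example. It shows a January record that was updated in September and
rewritten into a new segment: segment-date retention keeps it, while
retention on a time column still removes it.

```python
from datetime import date

def retain_by_segment_date(segments, cutoff):
    """Segment-level retention: drop whole segments loaded before cutoff."""
    return [s for s in segments if s["load_date"] >= cutoff]

def retain_by_time_column(segments, cutoff):
    """Row-level retention: drop rows whose event_date is before cutoff."""
    kept = []
    for s in segments:
        rows = [r for r in s["rows"] if r["event_date"] >= cutoff]
        if rows:
            kept.append({"load_date": s["load_date"], "rows": rows})
    return kept

# Hypothetical state after an update moved record 1 into a new segment.
segments = [
    # Original segment: its only row was removed by the update.
    {"load_date": date(2020, 1, 1), "rows": []},
    # New segment written by the update: old event_date, new load_date.
    {"load_date": date(2020, 9, 5),
     "rows": [{"id": 1, "event_date": date(2020, 1, 1)}]},
]
cutoff = date(2020, 6, 1)

by_segment = retain_by_segment_date(segments, cutoff)  # old record survives
by_column = retain_by_time_column(segments, cutoff)    # old record is dropped
```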

The only disadvantage may come later, when supporting the single-column data
update/replace feature that Likun mentioned previously.

So, to generalize, the update feature can support inserting the updated
records into a new segment. The logic to reload indexes when segments are
updated can still be there; however, when no data is inserted into old
segments, the reload of indexes needs to be avoided.
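
A rough sketch of that flow, again as a toy Python model and not the actual
CarbonData implementation. The Table and Segment classes are hypothetical,
and it assumes the old copies are only marked deleted in their original
segments (no data inserted there), so only the new segment has its indexes
loaded.

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    seg_id: int
    rows: dict = field(default_factory=dict)    # row id -> value
    deleted: set = field(default_factory=set)   # row ids marked as deleted

class Table:
    def __init__(self):
        self.segments = {}
        self.index_loads = []   # segment ids whose indexes were loaded
        self.next_id = 0

    def load(self, rows):
        """Write rows as a new segment and load indexes for it only."""
        seg = Segment(self.next_id, dict(rows))
        self.segments[seg.seg_id] = seg
        self.next_id += 1
        self.index_loads.append(seg.seg_id)
        return seg.seg_id

    def update(self, changes):
        """Mark old copies deleted, then write new values to a new segment."""
        for seg in self.segments.values():
            # No data is inserted into old segments, so no index reload here.
            seg.deleted |= changes.keys() & seg.rows.keys()
        return self.load(changes)

    def read(self):
        """Return the live rows across all segments."""
        out = {}
        for seg in self.segments.values():
            for key, value in seg.rows.items():
                if key not in seg.deleted:
                    out[key] = value
        return out

table = Table()
first = table.load({1: "a", 2: "b"})
second = table.update({2: "B"})
```

After the update, table.index_loads holds one entry per segment and the old
segment appears in it only once, from its initial load.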

The increasing number of segments need not be a reason to block this, as a
growing number of segments is a problem anyway and needs to be solved using
compaction, either horizontal or vertical. Also, optimization of segment
file storage, either file based or DB based (embedded or external), for very
large deployments needs to be solved independently.

Regards,
Ramana

On Sat, Sep 5, 2020 at 7:58 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:

> Hi David. Thanks for proposing this.
>
> *+1 from my side.*
>
> I have seen users with tables of 200K segments stored in the cloud.
> It will be really slow to reload all the segments where an update happened
> for indexes like SI, min-max, MV.
>
> So, it is good to write the updated data as a new segment
> and just load the new segment's indexes (try to reuse this flow:
> UpdateTableModel.loadAsNewSegment = true).
>
> The user can compact the segments to avoid the many new segments created by
> updates, and I guess we can also move the compacted segments to the table
> status history to avoid more entries in the table status.
>
> Thanks,
> Ajantha
>
>
>
> On Fri, Sep 4, 2020 at 1:48 PM David CaiQiang <david.c...@gmail.com>
> wrote:
>
> > Hi Akash,
> >
> >     3. An update operation contains an insert operation. The update
> > operation will handle this issue the same way the insert operation does.
> >
> >
> >
> > -----
> > Best Regards
> > David Cai
> > --
> > Sent from:
> > http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
> >
>
