Hi David,

+1

Initially, when the segment concept was introduced, a segment was viewed as a folder that is added incrementally over time, so data-retention use cases such as "delete segments before a given date" were considered. In that model, if updated records are written into a new segment, old records become new records and the retention model no longer works on that data. So updated records were written to the same segment folder.
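To make the retention concern concrete, here is a rough sketch in plain Python. It is not CarbonData code: the Segment class and all helper names are made up for illustration, and a segment is modelled only as a load date plus a few rows.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Segment:
    # Hypothetical stand-in for a segment folder: a load date plus its rows.
    load_date: date
    rows: dict = field(default_factory=dict)  # record id -> value


def drop_segments_before(segments, cutoff):
    """Segment-level retention: keep only segments loaded on or after cutoff."""
    return [s for s in segments if s.load_date >= cutoff]


def update_in_place(segments, record_id, value):
    """Original behaviour: the updated record stays in its original segment."""
    for seg in segments:
        if record_id in seg.rows:
            seg.rows[record_id] = value


def update_as_new_segment(segments, record_id, value, today):
    """Proposed behaviour: drop the old row and write the new one to a new segment."""
    for seg in segments:
        seg.rows.pop(record_id, None)
    segments.append(Segment(today, {record_id: value}))


def surviving_ids(segments):
    """Record ids that are still present after retention."""
    return sorted(rid for s in segments for rid in s.rows)


def build():
    return [
        Segment(date(2020, 1, 1), {"a": 1, "b": 2}),
        Segment(date(2020, 6, 1), {"c": 3}),
    ]


cutoff = date(2020, 3, 1)

same_segment = build()
update_in_place(same_segment, "a", 10)
kept_in_place = surviving_ids(drop_segments_before(same_segment, cutoff))

new_segment = build()
update_as_new_segment(new_segment, "a", 10, today=date(2020, 9, 5))
kept_new_segment = surviving_ids(drop_segments_before(new_segment, cutoff))

print(kept_in_place)     # ['c']
print(kept_new_segment)  # ['a', 'c']
```

Record "a" was first loaded in January, yet after an update that writes to a new segment it survives a cutoff in March, because segment-level retention only sees the new segment's load date.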
But later, once the partition concept was introduced, partitioning became a cleaner way to implement retention, and even delete by a time column is a better method. So inserting the updated records into a new segment makes sense. The only disadvantage may come later, when supporting the single-column data update/replace feature that Likun mentioned previously.

So, to generalize, the update feature can support inserting the updated records into a new segment. The logic to reload indexes when segments are updated can stay; however, when no data is inserted into old segments, reloading their indexes needs to be avoided.

The growing number of segments need not be the deciding factor for this proposal, since a growing segment count is a problem in any case and needs to be solved using compaction, either horizontal or vertical. Also, optimization of segment file storage, either file-based or DB-based (embedded or external), for very large deployments needs to be solved independently.

Regards,
Ramana

On Sat, Sep 5, 2020 at 7:58 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:

> Hi David. Thanks for proposing this.
>
> *+1 from my side.*
>
> I have seen users with 200K segments table stored in cloud.
> It will be really slow to reload all the segments where update happened for
> indexes like SI, min-max, MV.
>
> So, it is good to write as a new segment
> and just load new segment indexes. (try to reuse this flow
> UpdateTableModel.loadAsNewSegment = true)
>
> and user can compact the segments to avoid many new segments created by
> update.
> and we can also move the compacted segments to table status history I guess
> to avoid more entries in table status.
>
> Thanks,
> Ajantha
>
> On Fri, Sep 4, 2020 at 1:48 PM David CaiQiang <david.c...@gmail.com>
> wrote:
>
> > Hi Akash,
> >
> > 3. Update operation contain a insert operation. Update operation will
> > do the same thing how the insert operation process this issue.
> >
> > -----
> > Best Regards
> > David Cai
> > --
> > Sent from:
> > http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/