Re: [discuss]CarbonData update operation enhance

2020-09-22 Thread David CaiQiang
Hi Linwood,
  1. It would be better to implement "Update feature enhancement" first; it
will create a new segment to store the new files.
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/Discussion-Update-feature-enhancement-td99769.html
  2. Clean delete delta files
  Currently Carbon needs to clean invalid .deletedelta files before an
update/delete. If we don't clean them, they will be treated as valid
.deletedelta files after the next update/delete.

  How can we avoid cleaning invalid .deletedelta files while making sure they
don't impact the data after the next update/delete operation? (See the sketch
below.)
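
  For illustration only, here is a minimal sketch of one possible direction
(the class, field, and method names below are hypothetical, not the actual
SegmentUpdateStatusManager API): treat the latest delete delta timestamp
recorded in the update status as the single source of truth, so stale
.deletedelta files can simply be ignored at read time instead of having to be
physically cleaned before every update/delete.

  import java.util.List;
  import java.util.stream.Collectors;

  class DeleteDeltaFilter {
    // Assumed to come from the table update status (hypothetical structure).
    static class BlockUpdateDetail {
      String blockName;                 // e.g. "part-0-0_batchno0-0-0"
      long latestDeleteDeltaTimestamp;  // latest valid delete delta timestamp
    }

    // Keep only the delete delta files whose trailing timestamp matches the
    // latest valid timestamp recorded for the block; older files are stale
    // and are ignored rather than cleaned eagerly.
    static List<String> validDeleteDeltaFiles(List<String> deltaFileNames,
                                              BlockUpdateDetail detail) {
      String expectedSuffix = "-" + detail.latestDeleteDeltaTimestamp + ".deletedelta";
      return deltaFileNames.stream()
          .filter(name -> name.startsWith(detail.blockName))
          .filter(name -> name.endsWith(expectedSuffix))
          .collect(Collectors.toList());
    }
  }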



-
Best Regards
David Cai


Re: Regarding Carbondata Benchmarking & Feature presentation

2020-09-22 Thread Liang Chen
Hi

Great.
Happy to see more and more companies using Apache CarbonData.

Regards
Liang


Vimal Das Kammath wrote
> Hi Carbondata Team,
> 
> I am working on proposing Carbondata to the Data Analytics team in Uber.
> It would be great if any of you could share the latest benchmarking and
> feature/design presentation.
> 
> Regards,
> Vimal







Re: [discuss]CarbonData update operation enhance

2020-09-22 Thread Liang Chen
Hi

Thank you for starting this discussion.
This proposal is for improving data update performance, right?

Regards
Liang


Linwood wrote
> *[Background]*
> The update operation cleans up delta files before updating (see
> cleanUpDeltaFiles(carbonTable, false)). This loops over the metadata path
> and the segment path many times. When there are too many files, the overhead
> grows and the update takes longer.
> 
> *[Motivation & Goal]*
> During the update process, reduce the loop traversal, or move
> cleanUpDeltaFiles out into a separate method/task.
> 
> *[Modification]*
> There are several possible solutions, as follows.
> 
> Solution 1:
> 
> cleanUpDeltaFiles contains several nearly identical file-listing calls, such
> as updateStatusManager.getUpdateDeltaFilesList(segment, false,
> CarbonCommonConstants.UPDATE_DELTA_FILE_EXT, true, allSegmentFiles, true) and
> updateStatusManager.getUpdateDeltaFilesList(segment, false,
> CarbonCommonConstants.UPDATE_INDEX_FILE_EXT, true, allSegmentFiles, true).
> They differ only in the file extension, yet each call traverses the segment
> path again. We can merge them into a single pass (see the sketch below).
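> 
> A rough sketch of the merged listing, for illustration only (the helper is
> hypothetical, the real getUpdateDeltaFilesList code may differ, and the
> java.util and Carbon imports are omitted):
> 
>   // Traverse allSegmentFiles once and bucket update delta files and update
>   // index files in the same pass, instead of one listing per extension.
>   private Map<String, List<CarbonFile>> listUpdateFilesOnePass(CarbonFile[] allSegmentFiles) {
>     Map<String, List<CarbonFile>> filesByExt = new HashMap<>();
>     filesByExt.put(CarbonCommonConstants.UPDATE_DELTA_FILE_EXT, new ArrayList<>());
>     filesByExt.put(CarbonCommonConstants.UPDATE_INDEX_FILE_EXT, new ArrayList<>());
>     for (CarbonFile file : allSegmentFiles) {  // single traversal
>       String name = file.getName();
>       if (name.endsWith(CarbonCommonConstants.UPDATE_DELTA_FILE_EXT)) {
>         filesByExt.get(CarbonCommonConstants.UPDATE_DELTA_FILE_EXT).add(file);
>       } else if (name.endsWith(CarbonCommonConstants.UPDATE_INDEX_FILE_EXT)) {
>         filesByExt.get(CarbonCommonConstants.UPDATE_INDEX_FILE_EXT).add(file);
>       }
>     }
>     return filesByExt;
>   }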
> 
> Solution 2:
> 
> Building on solution 1, use Spark or MapReduce to distribute the cleanup
> work to other nodes (see the sketch below).
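> 
> For illustration (the cleanup helper is hypothetical), the distribution
> could look roughly like this with the Spark Java API:
> 
>   // Push per-segment cleanup to the executors instead of looping over all
>   // segments on the driver; cleanUpDeltaFilesForSegment is a hypothetical
>   // static helper that cleans one segment.
>   static void distributedCleanUp(JavaSparkContext jsc, List<String> segmentIds) {
>     jsc.parallelize(segmentIds, segmentIds.size())
>        .foreach(segmentId -> cleanUpDeltaFilesForSegment(segmentId));  // runs on executors
>   }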
> 
> Solution 3:
> 
> Move cleanUpDeltaFiles into a separate task and run it in the early morning
> or when the cluster is not busy.
> 
> Solution 4:
> 
> Establish a garbage collection bin that provides interfaces for the program
> to decide when files enter the bin and how they are handled (see the sketch
> below).
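> 
> For illustration, the kind of interface Solution 4 imagines could look like
> this (all names are hypothetical, not an existing Carbon API):
> 
>   public interface TrashBin {
>     // Decide whether a file should be moved into the bin instead of deleted.
>     boolean shouldMoveToTrash(String filePath, long fileTimestamp);
> 
>     // Move the file into the bin so it is no longer visible to queries.
>     void moveToTrash(String filePath);
> 
>     // Physically remove entries older than the retention threshold,
>     // e.g. from a scheduled maintenance job.
>     void purge(long retentionMillis);
>   }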
> 
> Please vote on these solutions.
> 
> Best Regards,
> LinWood
> 
> 
> 




