Eliminating delta files to improve query performance is quite reasonable, and compaction is a candidate for doing so. However, I have some questions about this; maybe they will help with your design.
Q1: A segment with delta files has had UD (update/delete) operations applied to it before, which means there will still be UD operations on it in the future. So, is it worth compacting this segment? Also, please keep in mind that UD operations will be blocked while the compaction is running.

Q2: I feel there may be too many kinds of compaction in carbondata... What if in the future I want another compaction that merges smaller carbondata files into larger ones? Will we add yet another kind of compaction? I think it's time for us to consider extensibility for the future while proposing this feature.

Q3: Currently all kinds of compaction use the query procedure to rewrite all the records of the related segments. Suppose we have a segment with 100 carbondata files and we delete only one record in this segment. The penalty of rewriting all the records of this segment is heavy.
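
To make the concern in Q3 concrete, here is a rough back-of-the-envelope sketch. Everything in it is an assumption for illustration: the file names, the 1,000,000 rows per file, and both functions are made up and are not carbondata APIs. "Rewrite only the touched files" is just a hypothetical alternative to compare against, not something carbondata is claimed to support.

```python
# Hypothetical cost model for Q3; not carbondata code or API.

def rows_scanned_full(file_rows):
    """Rows read when compaction rewrites every file in the segment."""
    return sum(file_rows.values())


def rows_scanned_touched_only(file_rows, deleted_per_file):
    """Rows read if only files that have delete deltas were rewritten."""
    return sum(
        file_rows[name]
        for name, deleted in deleted_per_file.items()
        if deleted > 0
    )


# Assumed segment: 100 files with 1,000,000 rows each (made-up numbers).
segment = {f"part-{i}.carbondata": 1_000_000 for i in range(100)}
# One record deleted, in a single file.
deletes = {"part-0.carbondata": 1}

full = rows_scanned_full(segment)
touched = rows_scanned_touched_only(segment, deletes)
print(full, touched, full // touched)
```

Under these assumed numbers the full rewrite reads 100 times as many rows as rewriting only the one affected file, all to drop a single record.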
