RE: Re:[DISCUSSION] Support Incremental load in datamap and other MV datamap enhancement

ChuanYin Xu Mon, 18 Feb 2019 07:33:34 -0800

“For index datamap we can have same behavior as mv datamap only”
===
Hmm, that's exactly what concerns me. In your plan, if the index datamap is
lazy, then each time a data load completes, the datamap will be ignored until
the index is rebuilt for the new segment, even though the index data for all
the historical segments could still be used during that time. In short, I think
this implementation is *UNACCEPTABLE*. Actually, I also came across the ‘lazy
index datamap’ feature months ago and gave up on the idea because I thought this
implementation would make the feature useless and that no one would try it in
the real world.
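To make the concern concrete, here is a minimal Python model of the behaviour described above. It is a sketch under stated assumptions, not carbondata code: `Segment`, its per-blocklet `index` map (a stand-in for real index data such as a bloom filter) and `prune_all_or_nothing` are hypothetical names, and the query is simplified to an equality filter on one integer value. A single segment without index data disables index pruning for every segment.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Segment:
    segment_id: str
    blocklets: List[str]
    # Hypothetical per-blocklet index data (stand-in for real index content);
    # None means the lazy index has not been rebuilt for this segment yet.
    index: Optional[Dict[str, Set[int]]] = None


def prune_all_or_nothing(segments: List[Segment], value: int) -> Set[Tuple[str, str]]:
    """Model of the lazy-datamap plan: the datamap is used only when every
    segment has index data; otherwise it is skipped for the whole query."""
    if all(s.index is not None for s in segments):
        # Every segment is indexed: keep only blocklets the index says may match.
        return {(s.segment_id, b) for s in segments for b in s.blocklets
                if value in s.index[b]}
    # At least one segment (e.g. the newest load) lacks index data, so the
    # datamap is ignored and every blocklet of every segment is kept,
    # including historical segments whose index data already exists.
    return {(s.segment_id, b) for s in segments for b in s.blocklets}


segments = [
    Segment("0", ["b0", "b1"], index={"b0": {1, 2}, "b1": {3}}),  # historical, indexed
    Segment("1", ["b0", "b1"]),                                   # just loaded, not rebuilt
]
print(len(prune_all_or_nothing(segments, 3)))  # 4: segment 0's index is not used
```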


I strongly *recommend* the following implementation. Each time a data load
finishes, if a query is fired before the index data for the new segment has
been generated:

  1.  For the historical segments that already have index data, carbondata
prunes using the corresponding index datamap data and returns pruning
result A;
  2.  For the historical segments and the newly loaded segment that do not yet
have index data, carbondata skips pruning with the corresponding index
datamap and returns pruning result B;
  3.  On the driver side, carbondata uses A union B as the final pruning result.
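The three steps above can be sketched as follows. This is a minimal Python model under stated assumptions, not carbondata code: `Segment`, the per-blocklet `index` map, `prune_by_segment` and the single-value equality filter are hypothetical simplifications. Segments with index data are pruned through the index (result A), segments without it keep every blocklet (result B), and the driver returns A union B.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Segment:
    segment_id: str
    blocklets: List[str]
    # Hypothetical per-blocklet index data; None means the index for this
    # segment has not been generated yet (new load, or not yet rebuilt).
    index: Optional[Dict[str, Set[int]]] = None


Blocklet = Tuple[str, str]  # (segment_id, blocklet_id)


def prune_with_index(seg: Segment, value: int) -> Set[Blocklet]:
    # Step 1: keep only the blocklets whose index data says `value` may occur.
    return {(seg.segment_id, b) for b in seg.blocklets if value in seg.index[b]}


def skip_index(seg: Segment) -> Set[Blocklet]:
    # Step 2: no index data yet, so nothing can be pruned; keep every blocklet.
    return {(seg.segment_id, b) for b in seg.blocklets}


def prune_by_segment(segments: List[Segment], value: int) -> Set[Blocklet]:
    result_a: Set[Blocklet] = set()
    result_b: Set[Blocklet] = set()
    for seg in segments:
        if seg.index is not None:
            result_a |= prune_with_index(seg, value)
        else:
            result_b |= skip_index(seg)
    # Step 3: the driver-side pruning result is A union B.
    return result_a | result_b


segments = [
    Segment("0", ["b0", "b1"], index={"b0": {1, 2}, "b1": {3}}),  # historical, indexed
    Segment("1", ["b0"]),                                         # historical, not indexed
    Segment("2", ["b0", "b1"]),                                   # newly loaded
]
print(sorted(prune_by_segment(segments, 3)))
# [('0', 'b1'), ('1', 'b0'), ('2', 'b0'), ('2', 'b1')]
```

Only the segments that lack index data are kept in full; segment 0 still benefits from its index. The result stays correct because skipping pruning for a segment never drops a blocklet that could match.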

The above implementation means that carbondata should *support pruning by 
segment*.
