ShreelekhyaG opened a new pull request, #4287:
URL: https://github.com/apache/carbondata/pull/4287

    ### Why is this PR needed?
   1. Performance degradation during incremental updates is more pronounced on 
partition tables.
   - During an update, the prune step lists the files under the segment 
path to find the carbondata files and build the `fileNameToMetaInfoMapping` map. 
With incremental updates on a partition table, the number of invalid files keeps 
growing with each update, so listing files becomes progressively slower.
   
   2. The cache entries for invalid segments are not removed after delete/update.
    
    ### What changes were proposed in this PR?
   1. Instead of listing files, get the carbon file directly from the file 
name and create the BlockMetaInfo in `createBlockMetaInfo`. 
     _**Impact when tested on a single partition with 100 segments:**_ 
           - Significant improvement is observed in the incremental 
update operation.
           - `select count(*)` improves from 200 secs to 9 secs, because the 
`select count(*)` flow was listing files for each segment and the map was not 
reused.
   
   2. Clear invalid/deleted segments from the cache after delete and update.
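
To illustrate the difference between listing a segment directory and resolving files by name, here is a minimal standalone sketch. It is not CarbonData code: the class `BlockMetaLookupSketch`, both method names, and the use of a plain `Map<String, Long>` (path to file size) in place of BlockMetaInfo are assumptions made for illustration only.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch; names and types are illustrative, not CarbonData APIs.
public class BlockMetaLookupSketch {

    // Listing approach: walks every entry in the directory, so the cost grows
    // with the number of stale files left behind by earlier updates.
    public static Map<String, Long> listThenMap(Path segmentDir) throws IOException {
        List<Path> entries;
        try (Stream<Path> files = Files.list(segmentDir)) {
            entries = files.collect(Collectors.toList());
        }
        Map<String, Long> mapping = new HashMap<>();
        for (Path p : entries) {
            if (p.getFileName().toString().endsWith(".carbondata")) {
                mapping.put(p.toString(), Files.size(p));
            }
        }
        return mapping;
    }

    // Direct approach: touches only the files that are actually referenced,
    // so the cost is independent of how many stale files share the directory.
    public static Map<String, Long> resolveDirectly(List<Path> referencedFiles)
            throws IOException {
        Map<String, Long> mapping = new HashMap<>();
        for (Path p : referencedFiles) {
            mapping.put(p.toString(), Files.size(p));
        }
        return mapping;
    }
}
```

In this sketch the listing variant also returns stale data files that happen to match the suffix, while the direct variant returns exactly the referenced files.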
       
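
The cache cleanup after delete and update can be pictured with a similar sketch, assuming a cache keyed by segment id. The class `SegmentCacheSketch` and its methods are hypothetical and do not reflect CarbonData's actual cache implementation.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; not CarbonData's cache classes.
public class SegmentCacheSketch {
    private final Map<String, Object> cache = new ConcurrentHashMap<>();

    public void put(String segmentId, Object cachedIndex) {
        cache.put(segmentId, cachedIndex);
    }

    public boolean contains(String segmentId) {
        return cache.containsKey(segmentId);
    }

    // Drops every entry whose segment id is no longer valid and returns
    // the number of evicted entries.
    public int evictInvalid(Set<String> validSegmentIds) {
        int before = cache.size();
        cache.keySet().retainAll(validSegmentIds);
        return before - cache.size();
    }
}
```

Calling `evictInvalid` with the set of currently valid segment ids after a delete or update keeps entries for dead segments from accumulating.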
    ### Does this PR introduce any user interface change?
    - No
   
    ### Is any new testcase added?
    - Yes
   
       
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
