hemantk-12 opened a new pull request, #5236:
URL: https://github.com/apache/ozone/pull/5236

   ## What changes were proposed in this pull request?
   The problem is that [SstFilteringService](https://github.com/apache/ozone/blob/master/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/SstFilteringService.java)
 and the [SST pruning service](https://github.com/apache/ozone/blob/master/hadoop-hdds/rocksdb-checkpoint-differ/src/main/java/org/apache/ozone/rocksdiff/RocksDBCheckpointDiffer.java#L1483)
 work independently, and each tries to reclaim space by deleting unnecessary SST
 files. SstFilteringService deletes files that don't belong to the
 snapshotted bucket, and the SST pruning service deletes files that are not
 required for diff calculations. The compaction DAG, on the other hand, is global at
 the Ozone level and is not aware of either cleanup. The problem arises
 when calculating the delta files for two snapshots and the traversal reaches
 this
 [condition](https://github.com/apache/ozone/blob/c801c02455982d3488cb099942f86912a492dc89/hadoop-hdds/rocksdb-checkpoint-differ/src/main/java/org/apache/ozone/rocksdiff/RocksDBCheckpointDiffer.java#L1049).
 Graph traversal adds a node because it is
 not present in the toSnapshot (it might have been deleted by
 SstFilteringService), and the node is later added to the diff files because of this
 [condition](https://github.com/apache/ozone/blob/c801c02455982d3488cb099942f86912a492dc89/hadoop-hdds/rocksdb-checkpoint-differ/src/main/java/org/apache/ozone/rocksdiff/RocksDBCheckpointDiffer.java#L1024).
   Before [returning delta files to
 SnapshotDiffManager](https://github.com/apache/ozone/blob/c801c02455982d3488cb099942f86912a492dc89/hadoop-hdds/rocksdb-checkpoint-differ/src/main/java/org/apache/ozone/rocksdiff/RocksDBCheckpointDiffer.java#L877),
 we look for the files in either the [active DB dir or the SST backup
 dir](https://github.com/apache/ozone/blob/c801c02455982d3488cb099942f86912a492dc89/hadoop-hdds/rocksdb-checkpoint-differ/src/main/java/org/apache/ozone/rocksdiff/RocksDBCheckpointDiffer.java#L832).
 The active DB dir doesn't have these files because they were compacted, and the SST
 backup dir doesn't have them because the SST pruning service removed them.
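
The failure mode described above can be sketched with a small in-memory model. This is an illustration only, not the code in RocksDBCheckpointDiffer: the class `SstLookupModel`, its methods, and the file names are hypothetical, directories are modeled as sets of file names, and the model assumes that compaction input files are kept in the SST backup dir until they are pruned.

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

/** Hypothetical in-memory stand-in for the two directories searched for delta files. */
final class SstLookupModel {
  final Set<String> activeDbDir = new HashSet<>();
  final Set<String> sstBackupDir = new HashSet<>();

  /** Models a compaction: inputs leave the active DB (assumed to be kept in the backup dir). */
  void compact(Set<String> inputs, String output) {
    sstBackupDir.addAll(inputs);
    activeDbDir.removeAll(inputs);
    activeDbDir.add(output);
  }

  /** Models the pruning service deleting a file it considers unnecessary for diffs. */
  void prune(String fileName) {
    sstBackupDir.remove(fileName);
  }

  /** Looks in the active DB dir first, then the backup dir; empty if the file is in neither. */
  Optional<String> resolve(String fileName) {
    if (activeDbDir.contains(fileName)) {
      return Optional.of("activeDb/" + fileName);
    }
    if (sstBackupDir.contains(fileName)) {
      return Optional.of("sstBackup/" + fileName);
    }
    return Optional.empty();
  }
}
```

In this model, a file that was compacted and then pruned resolves to an empty result, which corresponds to the case where the traversal reports a delta file that can no longer be found on disk.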
   
   Detailed 
[explanation](https://issues.apache.org/jira/browse/HDDS-8940?focusedCommentId=17755663&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17755663)
 and 
[example](https://issues.apache.org/jira/browse/HDDS-8940?focusedCommentId=17755668&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17755668).
   
   In this PR, it is proposed to keep the key range in each DAG node and use it to
return early while traversing.
   A new DAO class, CompactionLogEntry, is added, which persists compaction
file information to the compaction log.
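
A minimal sketch of the key-range idea, under stated assumptions: `SstNode`, `KeyRangeTraversal`, and `collectDeltaFiles` are hypothetical names, the traversal is a simplified breadth-first walk rather than the actual differ logic, and keys are compared with `String.compareTo` as a stand-in for RocksDB's bytewise comparator. Each node stores the smallest and largest key of its SST file, and the walk skips any node whose range cannot contain a key with the bucket prefix.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical DAG node: an SST file plus the inclusive key range it covers. */
final class SstNode {
  final String fileName;
  final String startKey;
  final String endKey;
  final List<SstNode> successors = new ArrayList<>();

  SstNode(String fileName, String startKey, String endKey) {
    this.fileName = fileName;
    this.startKey = startKey;
    this.endKey = endKey;
  }

  /** True if [startKey, endKey] can contain a key that starts with the prefix. */
  boolean mayContainPrefix(String prefix) {
    if (endKey.compareTo(prefix) < 0) {
      return false; // whole file sorts before every key with this prefix
    }
    if (startKey.compareTo(prefix) > 0 && !startKey.startsWith(prefix)) {
      return false; // whole file sorts after every key with this prefix
    }
    return true;
  }
}

final class KeyRangeTraversal {
  /** Collects leaf files reachable from the roots that are absent from the destination snapshot. */
  static Set<String> collectDeltaFiles(List<SstNode> roots, Set<String> destSnapshotFiles,
      String bucketPrefix) {
    Set<String> delta = new LinkedHashSet<>();
    Set<String> visited = new HashSet<>();
    Deque<SstNode> queue = new ArrayDeque<>(roots);
    while (!queue.isEmpty()) {
      SstNode node = queue.poll();
      if (!visited.add(node.fileName)) {
        continue; // already handled via another path
      }
      if (!node.mayContainPrefix(bucketPrefix)) {
        continue; // early return: no bucket keys here, so do not follow this node
      }
      if (destSnapshotFiles.contains(node.fileName)) {
        continue; // file is shared by both snapshots
      }
      if (node.successors.isEmpty()) {
        delta.add(node.fileName);
      } else {
        queue.addAll(node.successors);
      }
    }
    return delta;
  }
}
```

With this check, a file that holds only keys of another bucket is never added to the delta set, even if a cleanup service has already deleted it from the snapshot being compared.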
   
   ## What is the link to the Apache JIRA
   https://issues.apache.org/jira/browse/HDDS-8940
   
   ## How was this patch tested?
   * Existing unit tests.
   * New tests are in progress.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.