nsivabalan edited a comment on pull request #2438: URL: https://github.com/apache/hudi/pull/2438#issuecomment-864413719
I think we can simplify things. Let me go over some pseudocode of interest.

Within DeltaSync.read():

```
// set right checkpoint value
if (cfg.checkpoint != null && !commitMetadata.contains(Checkpoint_RESET_Key)) {
  checkpoint = cfg.checkpoint;
} else if (commitMetadata.contains(Checkpoint_Key)) {
  checkpoint = commitMetadata.get(Checkpoint_Key);
} else {
  checkpoint = Option.empty();
}
```

Note that the first `if` condition deals with Checkpoint_RESET_Key, whereas the `else if` condition deals with Checkpoint_Key.

Within write():

```
// towards the end
commitMetadata.put(Checkpoint_Key, /* updated checkpoint after writing */);
if (cfg.checkpoint != null) {
  commitMetadata.add(Checkpoint_RESET_Key);
}
```

If cfg.checkpoint is set, it is honored only during the first round. At the end of the first batch, we add Checkpoint_RESET_Key to the commit metadata, so in subsequent batches the checkpoint is parsed from commitMetadata.

With this PR, the only addition is that we are introducing a new checkpoint type. Let me propose a simple add-on to the above code that would work for us.

Within DeltaSync.read():

```
// set right checkpoint value
boolean resetCheckpointType = true; // New addition
if (cfg.checkpoint != null && !commitMetadata.contains(Checkpoint_RESET_Key)) {
  checkpoint = cfg.checkpoint;
  resetCheckpointType = false; // New addition
} else if (commitMetadata.contains(Checkpoint_Key)) {
  checkpoint = commitMetadata.get(Checkpoint_Key);
} else {
  checkpoint = Option.empty();
}
// New addition
if (resetCheckpointType) {
  // reset checkpoint type if set
}
```

No other changes are required. This is based on the assumption that Checkpoint_RESET_Key and the checkpoint type go hand in hand. During the first batch, the checkpoint type could be set, and there won't be any Checkpoint_RESET_Key set. From the second batch onward, it should be the reverse: the checkpoint type should not be set, but Checkpoint_RESET_Key should be part of the commit metadata.
Given this assumption, we don't really need to add the checkpoint type to commitMetadata, and we can still decide whether or not to use the checkpoint type.
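To make the proposal easier to check, here is a minimal, self-contained Java sketch of the read and write steps from the pseudocode in this comment. It is not Hudi's actual DeltaSync code: the class `CheckpointSketch`, the methods `resolve` and `commitMetadataAfterWrite`, the `Resolved` holder, and the two key strings are all hypothetical names introduced for illustration. The pseudocode does not say which value is stored under Checkpoint_RESET_Key, so the sketch assumes it stores the configured checkpoint.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the logic discussed in this comment; not Hudi's real API.
class CheckpointSketch {
    // Assumed key names, chosen only for this illustration.
    static final String CHECKPOINT_KEY = "checkpoint.key";
    static final String CHECKPOINT_RESET_KEY = "checkpoint.reset_key";

    // Outcome of the read-side decision.
    static final class Resolved {
        final Optional<String> checkpoint;   // checkpoint to resume from, if any
        final boolean useCheckpointType;     // false means the configured type is reset (ignored)

        Resolved(Optional<String> checkpoint, boolean useCheckpointType) {
            this.checkpoint = checkpoint;
            this.useCheckpointType = useCheckpointType;
        }
    }

    // Mirrors the proposed read(): the configured checkpoint wins only while
    // the last commit carries no reset key; otherwise fall back to the commit.
    static Resolved resolve(String cfgCheckpoint, Map<String, String> lastCommitMetadata) {
        boolean resetCheckpointType = true;
        Optional<String> checkpoint;
        if (cfgCheckpoint != null && !lastCommitMetadata.containsKey(CHECKPOINT_RESET_KEY)) {
            checkpoint = Optional.of(cfgCheckpoint);
            resetCheckpointType = false;
        } else if (lastCommitMetadata.containsKey(CHECKPOINT_KEY)) {
            checkpoint = Optional.of(lastCommitMetadata.get(CHECKPOINT_KEY));
        } else {
            checkpoint = Optional.empty();
        }
        return new Resolved(checkpoint, !resetCheckpointType);
    }

    // Mirrors the end of write(): always record the updated checkpoint, and
    // record the reset key whenever a checkpoint was configured.
    static Map<String, String> commitMetadataAfterWrite(String cfgCheckpoint, String updatedCheckpoint) {
        Map<String, String> metadata = new HashMap<>();
        metadata.put(CHECKPOINT_KEY, updatedCheckpoint);
        if (cfgCheckpoint != null) {
            // Assumption: the stored value is the configured checkpoint.
            metadata.put(CHECKPOINT_RESET_KEY, cfgCheckpoint);
        }
        return metadata;
    }

    public static void main(String[] args) {
        // First batch: no previous commit metadata, configured checkpoint is honored.
        Resolved first = resolve("100", new HashMap<>());
        System.out.println(first.checkpoint + " useType=" + first.useCheckpointType);

        // Second batch: the reset key is present, so the commit's checkpoint is used.
        Map<String, String> metadata = commitMetadataAfterWrite("100", "250");
        Resolved second = resolve("100", metadata);
        System.out.println(second.checkpoint + " useType=" + second.useCheckpointType);
    }
}
```

In this sketch the checkpoint type never needs to be written to the commit metadata: whether it applies follows entirely from whether the reset key is already present, which is the assumption stated above.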