akalash commented on pull request #16101: URL: https://github.com/apache/flink/pull/16101#issuecomment-856759212
> IIUC, the changes only clarify the timings but don't add any new information (checkpointDuration was logged before; finalizationTime can be inferred from log message timestamps).

That is true, but it is hidden knowledge. As you can see in the ticket (and I agree with it), everybody expected the difference between 'Triggering checkpoint' and 'Completed checkpoint' to be equal to the checkpoint duration, which is not the case. My changes just make this explicit in order to remove the misunderstanding.

> WDYT about logging the duration of CheckpointCoordinator.dropSubsumedCheckpoints and CheckpointSubsumeHelper.subsume?

It is not merely a suspect. It is definitely the reason for the delay (more precisely, org.apache.flink.runtime.checkpoint.CompletedCheckpoint#discard -> FileStateHandle#discardState). But I don't think that adding a separate timing for subsume would help us much, because subsume is complex in itself and we would need a timing for every step inside it, and so on. In general, I have also thought about this, and a universal time tracker that can measure the different steps of a checkpoint looks like a good idea, but I don't think we want to do it now.
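
For illustration only, the "universal time tracker" idea could look roughly like the sketch below. `StepTimer`, its methods, and the step names are hypothetical and are not existing Flink classes or part of this PR. The sketch only shows how per-step durations (for example checkpoint duration vs. finalization) might be collected and logged separately.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical helper (not a Flink class): records how long each named step takes. */
final class StepTimer {
    private final LongSupplier nanoClock;
    // LinkedHashMap keeps the steps in the order they were first timed.
    private final Map<String, Long> durationsNanos = new LinkedHashMap<>();

    StepTimer() {
        this(System::nanoTime);
    }

    /** The clock is injectable so the timer can be tested deterministically. */
    StepTimer(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
    }

    /** Runs the action and adds its elapsed time to the named step, even if it throws. */
    void time(String step, Runnable action) {
        long start = nanoClock.getAsLong();
        try {
            action.run();
        } finally {
            long elapsed = nanoClock.getAsLong() - start;
            durationsNanos.merge(step, elapsed, Long::sum);
        }
    }

    /** Total recorded time for a step in whole milliseconds (0 if never timed). */
    long millis(String step) {
        return durationsNanos.getOrDefault(step, 0L) / 1_000_000L;
    }

    /** One-line summary suitable for a log message. */
    String summary() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : durationsNanos.entrySet()) {
            if (sb.length() > 0) {
                sb.append(", ");
            }
            sb.append(e.getKey()).append('=').append(e.getValue() / 1_000_000L).append(" ms");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        StepTimer timer = new StepTimer();
        // Placeholder work; in a real coordinator these would be the actual steps.
        timer.time("finalization", () -> { });
        timer.time("discardSubsumed", () -> { });
        System.out.println("Completed checkpoint (" + timer.summary() + ")");
    }
}
```

Timing each step inside subsume would then be a matter of wrapping it in `timer.time(...)`, which is exactly the extra plumbing that makes this a larger change than the logging fix discussed here.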
