dhruvilshah3 commented on a change in pull request #9110:
URL: https://github.com/apache/kafka/pull/9110#discussion_r464100823



##########
File path: core/src/main/scala/kafka/log/Log.scala
##########
@@ -2227,14 +2210,17 @@ class Log(@volatile private var _dir: File,
    * @param segments The log segments to schedule for deletion
   * @param asyncDelete Whether the segment files should be deleted asynchronously
    */
-  private def removeAndDeleteSegments(segments: Iterable[LogSegment], asyncDelete: Boolean): Unit = {
+  private def removeAndDeleteSegments(segments: Iterable[LogSegment],
+                                      asyncDelete: Boolean,
+                                      reason: SegmentDeletionReason): Unit = {
     if (segments.nonEmpty) {
       lock synchronized {
        // As most callers hold an iterator into the `segments` collection and `removeAndDeleteSegment` mutates it by
        // removing the deleted segment, we should force materialization of the iterator here, so that results of the
        // iteration remain valid and deterministic.
         val toDelete = segments.toList
         toDelete.foreach { segment =>
+          info(s"${reason.reasonString(this, segment)}")

Review comment:
       @ijuma We log one message per deleted segment. This could cause a temporary increase in log volume when DeleteRecords is used or when retention is lowered, for example.
   
   Overall, we have a few options with different tradeoffs:
   
   1. Log a common reason per batch being deleted, including the base offsets of the segments being deleted. This was the behavior before https://github.com/apache/kafka/pull/8850. e.g.
   ```
   Deleting segments due to retention time 999ms breach. BaseOffsets: (0,5,...).
   ```
   
   2. Log a common reason per batch being deleted, including base offsets and metadata of the segments. e.g.
   ```
   Deleting segments due to retention time 999ms breach: 
LogSegment(baseOffset=0, size=360, lastModifiedTime=1596387738000, 
largestRecordTimestamp=Some(1596387737414)),LogSegment(baseOffset=5, size=360, 
lastModifiedTime=1596387738000, largestRecordTimestamp=Some(1596387737414)),...
   ```
   
   3. Log one message per segment being deleted. This is the current behavior. e.g.
   ```
   Segment with base offset 0 will be deleted due to retention time 999ms 
breach based on the largest record timestamp from the segment, which is ...
   Segment with base offset 5 will be deleted due to retention time 999ms 
breach based on the largest record timestamp from the segment, which is ...
   ...
   ```
   
   Doing (2) may be a reasonable tradeoff. It eliminates some of the redundancy at the cost of making it harder to glean per-segment metadata. Let me know what you think.
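
   To make option (2) concrete, here is a rough sketch of how a batch-level reason message could be built. The names `SegmentMeta`, `RetentionMsBreach`, and `logMessage` are illustrative stand-ins for this discussion, not Kafka's actual types or the `SegmentDeletionReason` API in this PR:
   ```scala
   // Hypothetical, simplified stand-in for a log segment's metadata.
   case class SegmentMeta(baseOffset: Long, size: Int, lastModifiedTime: Long,
                          largestRecordTimestamp: Option[Long])

   // Hypothetical reason interface: one message per deletion *batch* (option 2),
   // with per-segment metadata appended to the common reason string.
   trait SegmentDeletionReason {
     def logMessage(segments: Seq[SegmentMeta]): String
   }

   final class RetentionMsBreach(retentionMs: Long) extends SegmentDeletionReason {
     override def logMessage(segments: Seq[SegmentMeta]): String = {
       // Render each segment's metadata inline so a single info() call covers the batch.
       val metadata = segments.map { s =>
         s"LogSegment(baseOffset=${s.baseOffset}, size=${s.size}, " +
           s"lastModifiedTime=${s.lastModifiedTime}, " +
           s"largestRecordTimestamp=${s.largestRecordTimestamp})"
       }.mkString(", ")
       s"Deleting segments due to retention time ${retentionMs}ms breach: $metadata"
     }
   }

   object Demo extends App {
     val reason = new RetentionMsBreach(999L)
     val segs = Seq(
       SegmentMeta(0, 360, 1596387738000L, Some(1596387737414L)),
       SegmentMeta(5, 360, 1596387738000L, Some(1596387737414L)))
     // One log line for the whole batch, instead of one per segment.
     println(reason.logMessage(segs))
   }
   ```
   The caller would then emit `info(reason.logMessage(toDelete))` once per batch rather than logging inside the per-segment loop.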




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

