hachikuji commented on a change in pull request #9110:
URL: https://github.com/apache/kafka/pull/9110#discussion_r464545190
##########
File path: core/src/main/scala/kafka/log/Log.scala
##########
@@ -2227,13 +2210,16 @@ class Log(@volatile private var _dir: File,
* @param segments The log segments to schedule for deletion
* @param asyncDelete Whether the segment files should be deleted asynchronously
*/
- private def removeAndDeleteSegments(segments: Iterable[LogSegment], asyncDelete: Boolean): Unit = {
+ private def removeAndDeleteSegments(segments: Iterable[LogSegment],
+ asyncDelete: Boolean,
+ reason: SegmentDeletionReason): Unit = {
if (segments.nonEmpty) {
lock synchronized {
// As most callers hold an iterator into the `segments` collection and `removeAndDeleteSegment` mutates it by
// removing the deleted segment, we should force materialization of the iterator here, so that results of the
// iteration remain valid and deterministic.
val toDelete = segments.toList
+ info(s"${reason.reasonString(this, toDelete)}")
Review comment:
It's a little annoying to have to pass the segments through just so they can
be added to each log message individually. Maybe we could do it like this instead:
```scala
info(s"Deleting segments due to ${reason.reasonString(this)}: $toDelete")
```
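For illustration, here is a rough sketch of how that could look end to end. It is not the PR's actual code: `Log` and `LogSegment` are simplified stand-ins, the message wording inside each `reasonString` is made up, and `DeletionLogging.deletionMessage` is a hypothetical helper that only exists here so the resulting string can be inspected without a logger.

```scala
// Simplified stand-ins for kafka.log.LogSegment and kafka.log.Log.
// These are assumptions for illustration, not the real classes.
final case class LogSegment(baseOffset: Long, size: Int)
final case class Log(name: String, retentionMs: Long)

sealed trait SegmentDeletionReason {
  // The reason only explains *why*; it no longer needs the segments.
  def reasonString(log: Log): String
}

case object RetentionMsBreachDeletion extends SegmentDeletionReason {
  // Hypothetical wording; the real message text may differ.
  override def reasonString(log: Log): String =
    s"retention time ${log.retentionMs}ms breach"
}

case object LogDeletion extends SegmentDeletionReason {
  // Hypothetical wording; the real message text may differ.
  override def reasonString(log: Log): String = "log deletion"
}

object DeletionLogging {
  // Hypothetical helper: builds the one message the caller would pass to info(...),
  // appending the segments in a single place.
  def deletionMessage(log: Log,
                      reason: SegmentDeletionReason,
                      toDelete: List[LogSegment]): String =
    s"Deleting segments due to ${reason.reasonString(log)}: $toDelete"
}
```

With this shape, each reason stays a one-liner and the list of segments is formatted once at the call site, so adding a new reason does not require touching the segment formatting.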
##########
File path: core/src/main/scala/kafka/log/Log.scala
##########
@@ -2686,3 +2670,50 @@ object LogMetricNames {
List(NumLogSegments, LogStartOffset, LogEndOffset, Size)
}
}
+
+sealed trait SegmentDeletionReason {
+ def reasonString(log: Log, toDelete: Iterable[LogSegment]): String
+}
+
+case object RetentionMsBreachDeletion extends SegmentDeletionReason {
Review comment:
nit: is it necessary to add `Deletion` to all of these? Maybe only
`LogDeletion` needs it since it is referring to deletion of the log itself.
##########
File path: core/src/main/scala/kafka/log/LogSegment.scala
##########
@@ -413,7 +413,7 @@ class LogSegment private[log] (val log: FileRecords,
override def toString: String = "LogSegment(baseOffset=" + baseOffset +
", size=" + size +
", lastModifiedTime=" + lastModified +
- ", largestTime=" + largestTimestamp +
+ ", largestRecordTimestamp=" + largestRecordTimestamp +
Review comment:
I'm ok with the change. I think it's better to reflect the underlying
fields directly; redundant information just adds noise to the logs.