[ https://issues.apache.org/jira/browse/KAFKA-405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Neha Narkhede updated KAFKA-405:
--------------------------------

    Attachment: kafka-405-v2.patch

1. Good point. Changed the name to Log.truncateTo(targetOffset).
2. It is possible to truncate multiple segments, and the truncateTo API handles that: it deletes segments whose start offset is greater than targetOffset. Only one segment will ever need to be truncated; the rest have to be deleted.
3. Changed FileMessageSet.truncateUpto to truncateTo to make it consistent with Log.truncateTo.
4. Created a wrapper, HighwatermarkCheckpoint, and documented the file format there. I had thought about atomic updates, but there is only one thread that serializes the checkpoints, so I didn't think swapping the old file with the new one would be required, no?
5. As for unit testing, I've added a new test, HighwatermarkPersistenceTest, that tests writing and reading high watermark values for single as well as multiple partitions.
6. I think it is a good idea to version the high watermark file, just in case we didn't cover something that we need to in the future.
7. We have a JIRA open for fixing all getters/setters, so I'll defer that change. The logEndOffset logic is a little tricky. It seems correct not to expose a separate API to set logEndOffsetUpdateTime and to just let the logEndOffset setter do it. But here is the problem: the leader needs to record its own log end offset update time while appending messages to its local log. However, since the Log doesn't know anything about logEndOffsetUpdateTime, its append API cannot set the update time. Also, the leader cannot use the logEndOffset setter API, since its log end offset is recorded by its local Log object; the logEndOffset setter API is meant only to record the follower's log end offset. But since it makes sense for the update time to be updated while setting the logEndOffset, I've fixed it. Basically, the logEndOffset setter API updates only the logEndOffsetUpdateTime when a local log exists for the replica.
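To make the two setter cases concrete, here is a rough sketch. The Replica shape, field names, and the injected clock are illustrative assumptions, not the patch's actual code:

```scala
// Hypothetical, simplified Replica to illustrate the setter behaviour in point 7.
// The `time` parameter stands in for Kafka's Time abstraction, for testability.
class Replica(val hasLocalLog: Boolean,
              time: () => Long = () => System.currentTimeMillis()) {
  private var leo: Long = 0L
  private var leoUpdateTimeMs: Long = time()

  def logEndOffset: Long = leo
  def logEndOffsetUpdateTimeMs: Long = leoUpdateTimeMs

  // For a replica backed by a local log (the leader), the offset itself is
  // tracked by the Log, so the setter records only the update time. For a
  // remote (follower) replica, it records both the offset and the update time.
  def logEndOffset_=(newOffset: Long): Unit = {
    if (!hasLocalLog)
      leo = newOffset
    leoUpdateTimeMs = time()
  }
}
```

With this shape, `replica.logEndOffset = n` touches the stored offset only when no local log backs the replica, while the update time is refreshed in every case.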
For all other cases, it updates both the logEndOffset and the logEndOffsetUpdateTime.
8. Yeah, there are several callers that use the getReplica() API and don't always re-throw an exception. Some re-throw an error, while others use the Option to return a default value for some state of the Replica (the high watermark). Case match in Scala is good for that, since it always evaluates to a value, which a try/catch block doesn't. But if all the callers throw an exception, then it makes sense to have getReplica throw it instead.
9. You have a valid point about the KafkaScheduler usage. However, we name the thread appropriately with every instance of the scheduler. Ideally, if there were a way to override the base thread name independently with the same scheduler, it would be possible to use a single scheduler.
10. Good point about abbreviations. Fixed that; let's standardize on this. Also changed the name updateLEO to updateLeo.

> Improve the high water mark maintenance to store high watermarks for all
> partitions in a single file on disk
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-405
>                 URL: https://issues.apache.org/jira/browse/KAFKA-405
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8
>            Reporter: Neha Narkhede
>            Assignee: Neha Narkhede
>         Attachments: kafka-405-v1.patch, kafka-405-v2.patch
>
>
> KAFKA-46 introduced per partition leader high watermarks. But it stores those
> in one file per partition. A more performant solution would be to store all
> high watermarks in a single file on disk

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
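As an aside, the delete-versus-truncate split described in point 2 can be sketched as follows. The Segment representation here is a stand-in for illustration only, not the patch's FileMessageSet/Log code:

```scala
// Stand-in for a log segment covering offsets [startOffset, nextOffset).
case class Segment(startOffset: Long, var nextOffset: Long)

// Sketch of the truncateTo rule from point 2: segments whose start offset is
// beyond the target are deleted outright; among the segments that remain, at
// most one can straddle the target, and that one is truncated to it.
def truncateTo(segments: List[Segment], targetOffset: Long): List[Segment] = {
  val kept = segments.filter(_.startOffset <= targetOffset) // rest are deleted
  kept.foreach { s =>
    if (s.nextOffset > targetOffset) s.nextOffset = targetOffset // truncate
  }
  kept
}
```

Since segments are contiguous and sorted by start offset, only the last kept segment can satisfy the truncation condition, which is why a single truncation ever happens.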