[ https://issues.apache.org/jira/browse/KAFKA-405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Neha Narkhede updated KAFKA-405:
--------------------------------

    Attachment: kafka-405-v2.patch

1. Good point. Changed the name to Log.truncateTo(targetOffset).
2. It is possible to truncate multiple segments, and the truncateTo API handles that: it deletes segments whose start offset is greater than targetOffset. Only one segment will ever need to be truncated; the rest have to be deleted.
3. Changed FileMessageSet.truncateUpto to truncateTo to make it consistent with Log.truncateTo.
4. Created a wrapper, HighwatermarkCheckpoint, and documented the file format there. I had thought about atomic updates, but there is only one thread that serializes the checkpoints, so I didn't think swapping the old file with the new one would be required, no?
5. As for unit testing, I've added a new test, HighwatermarkPersistenceTest, that tests writing and reading high watermark values for single as well as multiple partitions.
6. I think it is a good idea to version the high watermark file, just in case we didn't cover something that we need to in the future.
7. We have a JIRA open for fixing all getters/setters, so I'll defer that change. The logEndOffset logic is a little tricky. It seems correct not to expose a separate API to set logEndOffsetUpdateTime and to just let the logEndOffset setter do it. But here is the problem: the leader needs to record its own log end offset update time while appending messages to its local log. However, since the Log doesn't know anything about logEndOffsetUpdateTime, its append API cannot set the update time. Also, the leader cannot use the logEndOffset setter API, since its log end offset is recorded by its local Log object; the logEndOffset setter API is meant only to record the follower's log end offset. But since it makes sense for the update time to be updated while setting the logEndOffset, I've fixed it. Basically, the logEndOffset setter API updates only the logEndOffsetUpdateTime when a local log exists for the replica.
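To make the two setter cases concrete, here is a rough sketch. The Replica shape, field names, and the injected clock are illustrative assumptions, not the patch's actual code:

```scala
// Hypothetical, simplified Replica to illustrate the setter behaviour in point 7.
// The `time` parameter stands in for Kafka's Time abstraction, for testability.
class Replica(val hasLocalLog: Boolean,
              time: () => Long = () => System.currentTimeMillis()) {
  private var leo: Long = 0L
  private var leoUpdateTimeMs: Long = time()

  def logEndOffset: Long = leo
  def logEndOffsetUpdateTimeMs: Long = leoUpdateTimeMs

  // For a replica backed by a local log (the leader), the offset itself is
  // tracked by the Log, so the setter records only the update time. For a
  // remote (follower) replica, it records both the offset and the update time.
  def logEndOffset_=(newOffset: Long): Unit = {
    if (!hasLocalLog)
      leo = newOffset
    leoUpdateTimeMs = time()
  }
}
```

With this shape, `replica.logEndOffset = n` touches the stored offset only when no local log backs the replica, while the update time is refreshed in every case.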
For all other cases, it updates both the logEndOffset and the logEndOffsetUpdateTime.
8. Yeah, there are several callers that use the getReplica() API and don't always re-throw an exception. Some re-throw an error, while others use the Option to return a default value for some state of the Replica (the high watermark). Case match in Scala is good for that, since it always evaluates to a value, which a try/catch block doesn't. But if all the callers throw an exception, then it makes sense to have getReplica throw it instead.
9. You have a valid point about the KafkaScheduler usage. However, we name the thread appropriately with every instance of the scheduler. Ideally, if there were a way to override the base thread name independently with the same scheduler, it would be possible to use a single scheduler.
10. Good point about abbreviations. Fixed that; let's standardize on this. Also changed the name updateLEO to updateLeo.

> Improve the high water mark maintenance to store high watermarks for all
> partitions in a single file on disk
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-405
>                 URL: https://issues.apache.org/jira/browse/KAFKA-405
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8
>            Reporter: Neha Narkhede
>            Assignee: Neha Narkhede
>         Attachments: kafka-405-v1.patch, kafka-405-v2.patch
>
>
> KAFKA-46 introduced per partition leader high watermarks. But it stores those
> in one file per partition. A more performant solution would be to store all
> high watermarks in a single file on disk

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
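As an aside, the delete-versus-truncate split described in point 2 can be sketched as follows. The Segment representation here is a stand-in for illustration only, not the patch's FileMessageSet/Log code:

```scala
// Stand-in for a log segment covering offsets [startOffset, nextOffset).
case class Segment(startOffset: Long, var nextOffset: Long)

// Sketch of the truncateTo rule from point 2: segments whose start offset is
// beyond the target are deleted outright; among the segments that remain, at
// most one can straddle the target, and that one is truncated to it.
def truncateTo(segments: List[Segment], targetOffset: Long): List[Segment] = {
  val kept = segments.filter(_.startOffset <= targetOffset) // rest are deleted
  kept.foreach { s =>
    if (s.nextOffset > targetOffset) s.nextOffset = targetOffset // truncate
  }
  kept
}
```

Since segments are contiguous and sorted by start offset, only the last kept segment can satisfy the truncation condition, which is why a single truncation ever happens.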