[ 
https://issues.apache.org/jira/browse/KAFKA-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14328260#comment-14328260
 ] 

Aditya Auradkar commented on KAFKA-1546:
----------------------------------------

I do have a concern about the heuristic. [~jkreps] Using your example:

"if (!fetchedData.readToEndOfLog)
   this.lagBegin = System.currentTimeMillis()
 else
   this.lagBegin = -1

Then the liveness criteria is
partitionLagging = this.lagBegin > 0 &&
   System.currentTimeMillis() - this.lagBegin > REPLICA_LAG_TIME_MS"

The time counter starts when a read doesn't reach the end of the log and stops 
only when one does. The lag therefore measures how long the replica has been 
behind, not how far behind it is in terms of applying commits. For example, a 
replica could be catching up quickly, but the "replica.lag.max.ms" counter 
would keep increasing until it fully catches up, at which point it abruptly 
drops to zero.
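
To make the concern concrete, here is a minimal sketch of the heuristic as I read it. The class and method names (LagTracker, onFetch, lagMs, isLagging) are hypothetical and not Kafka code, and the clock is passed in explicitly so the behaviour is deterministic. It also assumes lagBegin is set only on the first fetch that falls short of the log end, which is the "counter starts ... and only stops" reading described above; the quoted snippet, taken literally, would reset lagBegin on every such fetch.

```java
/** Hypothetical sketch of the time-based lag heuristic; not Kafka's implementation. */
class LagTracker {
    private final long replicaLagTimeMs;
    private long lagBegin = -1L; // -1 means the replica is caught up

    LagTracker(long replicaLagTimeMs) {
        this.replicaLagTimeMs = replicaLagTimeMs;
    }

    /** Record one fetch. Assumption: the counter starts once and is not restarted. */
    void onFetch(boolean readToEndOfLog, long nowMs) {
        if (readToEndOfLog) {
            lagBegin = -1L;          // caught up: counter drops back to zero
        } else if (lagBegin < 0) {
            lagBegin = nowMs;        // first fetch that fell short: start the counter
        }
    }

    /** Time spent behind the log end, regardless of how many messages remain. */
    long lagMs(long nowMs) {
        return lagBegin < 0 ? 0L : nowMs - lagBegin;
    }

    boolean isLagging(long nowMs) {
        return lagBegin >= 0 && nowMs - lagBegin > replicaLagTimeMs;
    }

    public static void main(String[] args) {
        LagTracker tracker = new LagTracker(10_000L);
        // A replica that is steadily catching up: fewer messages behind on each fetch.
        long[] messagesBehind = {900, 600, 300, 0};
        long now = 1_000L;
        for (long behind : messagesBehind) {
            tracker.onFetch(behind == 0, now);
            System.out.println("t=" + now + " behind=" + behind
                    + " lagMs=" + tracker.lagMs(now)
                    + " lagging=" + tracker.isLagging(now));
            now += 6_000L;
        }
    }
}
```

In this run the replica is flagged as lagging on the third fetch even though the number of messages it is behind keeps shrinking, and the counter only returns to zero on the fetch that reaches the log end.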


> Automate replica lag tuning
> ---------------------------
>
>                 Key: KAFKA-1546
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1546
>             Project: Kafka
>          Issue Type: Improvement
>          Components: replication
>    Affects Versions: 0.8.0, 0.8.1, 0.8.1.1
>            Reporter: Neha Narkhede
>            Assignee: Aditya Auradkar
>              Labels: newbie++
>
> Currently, there is no good way to tune the replica lag configs to 
> automatically account for high and low volume topics on the same cluster. 
> For the low-volume topic it will take a very long time to detect a lagging
> replica, and for the high-volume topic it will have false-positives.
> One approach to making this easier would be to have the configuration
> be something like replica.lag.max.ms and translate this into a number
> of messages dynamically based on the throughput of the partition.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
