[
https://issues.apache.org/jira/browse/KAFKA-4558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15806421#comment-15806421
]
Apurva Mehta edited comment on KAFKA-4558 at 1/7/17 2:00 AM:
-------------------------------------------------------------
So I had a look at the code. All 13 tests that use
`ProduceConsumeValidate` have changed since that commit, so it would be
unproductive to revert that change at this point.
Regarding your proposal for two metrics: partitions assigned and per-partition
lag may not be what we want. In particular, in the `ProduceConsumeValidate`
test, the producer is started after the consumer. So if the topic is originally
empty, or if the consumer is configured to read from the end, the lag will
always be zero. This is per my understanding of how lag is reported, viz. how
far from the tail of the log the consumer is. So the lag metric probably won't
be very useful in the majority of cases.
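To make the lag argument concrete, here is a minimal sketch of lag as "distance from the tail of the log". The function name and the offsets are illustrative assumptions, not the actual metric implementation.

```python
def consumer_lag(log_end_offset, consumer_position):
    """Lag as described above: how far the consumer trails the tail of the log."""
    return max(log_end_offset - consumer_position, 0)


# Case 1: the topic is originally empty, so the tail and the position are both 0.
empty_topic_lag = consumer_lag(log_end_offset=0, consumer_position=0)

# Case 2: the topic already holds 500 records (made-up number), but the
# consumer is configured to read from the end, so it starts at the tail.
read_from_end_lag = consumer_lag(log_end_offset=500, consumer_position=500)

# Only a consumer that starts behind the tail reports a non-zero lag.
read_from_start_lag = consumer_lag(log_end_offset=500, consumer_position=0)
```

In both of the first two cases the lag is zero before the producer has written anything, so the value cannot distinguish a ready consumer from one that is still initializing.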
But waiting until partitions assigned is non-zero may be what we want. The
tests I have seen just have a single console consumer for the entire topic, so
there should be enough partitions to go around. (Of course, this may not be
true in the future.) At the very least it will be better than what we have
right now. And if there are not enough partitions to go around, the test will
fail early (since the wait_until will time out), and can be diagnosed before
check-in.
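A rough sketch of that gating logic follows. The `wait_until` here is a small local stand-in written for this example (not the test framework's own helper), and `FakeConsumer`, `assigned_partitions`, and `start_producer_when_ready` are hypothetical names used only for illustration.

```python
import time


def wait_until(condition, timeout_sec, backoff_sec=0.1, err_msg=""):
    """Local stand-in: poll condition() until it is true or the timeout expires."""
    deadline = time.time() + timeout_sec
    while True:
        if condition():
            return
        if time.time() >= deadline:
            raise TimeoutError(err_msg)
        time.sleep(backoff_sec)


class FakeConsumer:
    """Hypothetical consumer that reports 0 partitions for the first few polls."""

    def __init__(self, polls_until_assigned, partitions):
        self._remaining = polls_until_assigned
        self._partitions = partitions

    def assigned_partitions(self):
        if self._remaining > 0:
            self._remaining -= 1
            return 0
        return self._partitions


def start_producer_when_ready(consumer, start_producer, timeout_sec=5):
    # Block the producer until the consumer reports a non-zero assignment.
    # If no partitions are ever assigned, this times out and fails early.
    wait_until(lambda: consumer.assigned_partitions() > 0,
               timeout_sec=timeout_sec,
               backoff_sec=0.01,
               err_msg="Consumer was not assigned any partitions in time")
    start_producer()
```

With this shape, a consumer that never receives partitions surfaces as a timeout at startup rather than as an intermittent validation failure later in the test.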
Regarding implementation of partitions assigned alone, I thought it might be
worth staging the implementation by first using the metric through JMX. This
would give us a shorter turnaround time and validate whether this approach is
sufficient to fix the current issues. We can even play with different metrics
more quickly if necessary.
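As a sketch of the JMX-based first stage, the snippet below extracts the latest partitions-assigned value from CSV-style JMX output. The CSV layout, the MBean name in the sample, and the `assigned-partitions` attribute name are assumptions made for illustration; the real output would need to be checked against whatever JMX tool the tests use.

```python
import csv
import io

# Assumed attribute name for the partitions-assigned metric.
ASSIGNED_ATTR = "assigned-partitions"


def latest_assigned_partitions(jmx_csv_text):
    """Return the most recent assigned-partitions value, or 0 if none was seen."""
    rows = [row for row in csv.reader(io.StringIO(jmx_csv_text)) if row]
    if len(rows) < 2:
        return 0  # header only, or no output yet
    header, last = rows[0], rows[-1]
    for name, value in zip(header, last):
        if name.endswith(":" + ASSIGNED_ATTR):
            return int(float(value))
    return 0


# Made-up sample in the assumed format: a header row, then one row per poll.
sample = (
    '"time","kafka.consumer:type=consumer-coordinator-metrics,'
    'client-id=console-consumer:assigned-partitions"\n'
    "1483754400000,0.0\n"
    "1483754401000,6.0\n"
)
assigned = latest_assigned_partitions(sample)
```

Swapping in a different metric would then only mean changing the attribute being matched, which is what makes this route quick to experiment with.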
Finally, would adding an HttpMetricsReporter necessitate a KIP?
> throttling_test fails if the producer starts too fast.
> ------------------------------------------------------
>
> Key: KAFKA-4558
> URL: https://issues.apache.org/jira/browse/KAFKA-4558
> Project: Kafka
> Issue Type: Bug
> Reporter: Apurva Mehta
> Assignee: Apurva Mehta
>
> As described in https://issues.apache.org/jira/browse/KAFKA-4526, the
> throttling test will fail if the producer in the produce-consume-validate
> loop starts up before the consumer is fully initialized.
> We need to block the start of the producer until the consumer is ready to go.
> The current plan is to poll the consumer for a particular metric (like, for
> instance, partition assignment) which will act as a good proxy for successful
> initialization. Currently, we just check for the existence of a process with
> the PID, which is not a strong enough check, causing the test to fail
> intermittently.