fixing the cc for navina. On Fri, Apr 29, 2016 at 1:06 AM, Onur Karaman <[email protected]> wrote:
> Hey everyone. I think we might need to have an actual discussion on an > issue I brought up a while ago in > https://issues.apache.org/jira/browse/KAFKA-3494. It seems like > client-ids are being used for too many things today: > 1. kafka-request.log. This helps if you ever want to associate a client > with a specific request. Maybe you're looking for a badly behaved client. > Maybe the client has reported unexpectedly long response times from the > broker and you want to figure out what was happening. > 2. quotas. Quotas today are implemented on a (client-id, broker) > granularity. > 3. metrics. KafkaConsumer and KafkaProducer metrics only go as granular as > the client-id. > > The reason I'm bringing this up is because it looks like there's a > conflict in intent for client-ids between the quota and metrics scenarios. > One of the motivating factors for choosing the client-id for quotas was > that it allows for flexibility in the granularity of the quota enforcement. > For instance, entire services can share the same id to get some form of > (service, broker) granularity quotas. From my understanding, client-id was > chosen as the quota id because it's a property that already exists on the > clients, so we'd be able to quota older clients with no additional work, > and reusing it had relatively low impact. > > So while quotas encourage reuse of client-ids across client instances, > there is a common scenario where the metrics fall apart and mbeans get > overwritten. It looks like if there are two KafkaConsumers or two > KafkaProducers with the same client-id in the same jvm, then JmxReporter > will unregister the first client's mbeans while registering the second > client's mbeans. > > It seems like for the three use cases noted above (kafka-request.log, > metrics, quotas), there are different desirable characteristics: > 1. kafka-request.log at the very least would want an id that could > distinguish individual client instances, but it might be nice to go even > more granular at say a per connection level. > 2. quotas would want an id that's sharable among a group of clients that > wish to be quotad together. This id can be defined by the user. > 3. metrics would want an id that could distinguish invidual client > instance. This id can be defined by the user. We expect it to stay the same > across process restarts so we can potentially associate metrics across > process restarts. > > To resolve this, I think we'd want metrics to have another tag to > differentiate mbeans from instances with the same client-id. Another > alternative is to make quotas depend on a quota id instead of client-id (as > brought up in KIP-55), but this means we no longer can quota older clients > out of the box. > > Other suggestions are welcome! >
