[
https://issues.apache.org/jira/browse/KAFKA-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954308#comment-13954308
]
Jay Kreps commented on KAFKA-1348:
----------------------------------
Jay, the cluster maintains knowledge of its state as brokers join and leave as
well as where each partition is currently hosted, who is the leader for each
partition, etc. This is the information the client needs to direct its requests
and it is a lot more than just what nodes are alive and in the cluster. You get
this information by issuing a metadata request to any broker in the cluster.
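
As a rough illustration of what that metadata lets a client do, here is a minimal Python sketch. The class and field names (ClusterMetadata, brokers, leaders) and the example hostnames are invented for this example; they are not Kafka's wire format or client API.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class ClusterMetadata:
    # broker id -> "host:port" of each live broker (hypothetical layout)
    brokers: Dict[int, str] = field(default_factory=dict)
    # (topic, partition) -> broker id of the current leader
    leaders: Dict[Tuple[str, int], int] = field(default_factory=dict)

    def leader_address(self, topic: str, partition: int) -> str:
        """Return the address a request for this partition should be sent to."""
        broker_id = self.leaders[(topic, partition)]
        return self.brokers[broker_id]


# Example snapshot, as a client might hold it after one metadata request.
metadata = ClusterMetadata(
    brokers={1: "broker1.example.com:9092", 2: "broker2.example.com:9092"},
    leaders={("events", 0): 1, ("events", 1): 2},
)
```

The point is that knowing which nodes are alive (brokers) is not enough; the client also needs the partition-to-leader mapping to direct each request.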
Once you are connected to the cluster the client issues metadata requests to
find out about any cluster changes. The client automatically issues metadata
requests at a fixed interval or any time it gets an error talking to a broker
that might indicate stale metadata (e.g. a network exception, timeout, not
leader exception, etc.).
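
The two refresh triggers described above (a fixed interval, or an error that hints at stale metadata) can be sketched roughly as follows. The function name, the error labels and the interval value are assumptions for illustration only, not the real client's identifiers.

```python
from typing import Optional

# Hypothetical labels standing in for the error cases named above.
STALE_METADATA_ERRORS = {"network_exception", "timeout", "not_leader"}


def should_refresh(now: float, last_refresh: float, interval: float,
                   error: Optional[str] = None) -> bool:
    """Decide whether the client should issue a new metadata request."""
    # An error that may indicate stale metadata forces an immediate refresh.
    if error in STALE_METADATA_ERRORS:
        return True
    # Otherwise refresh only once the fixed interval has elapsed.
    return now - last_refresh >= interval
```

An error unrelated to cluster topology would not trigger a refresh in this sketch; only the periodic timer would.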
So in your scenario, once connected, the clients discover the new brokers as
they are added by sending metadata requests to the existing brokers. As soon as
ec2-12-123-456-444.compute-1.amazonaws.com becomes leader for any partition the
client needs to send data to, the client will discover the leadership change.
It is true that if you suddenly killed 100% of the brokers in your cluster and
replaced them with 100% new brokers, there would be no one left that you know
about to tell you about the changes. The solution to this is not to kill 100%
of your cluster all at once.
So the problem that needs to be solved is the problem of bootstrapping
knowledge of at least one active node in the cluster. If you knew this then you
could use that node to find out where all the partitions are hosted so you
could publish data. But how to find that out?
The way we do this is by giving a comma-separated list of bootstrap nodes that
you can use for your initial bootstrap on startup. This list is only used during
initialization. After initialization all further metadata updates will use the
full set of alive nodes.
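
A minimal sketch of that bootstrap step, assuming entries of the form host:port: parse the list, then try each node in turn until one answers a metadata request. Both function names are hypothetical, and fetch_metadata is a caller-supplied stand-in for the real request, not an actual client call.

```python
from typing import Callable, List, Tuple

Node = Tuple[str, int]


def parse_bootstrap_list(value: str) -> List[Node]:
    """Split "host1:port1,host2:port2" into (host, port) pairs."""
    nodes = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, _, port = entry.rpartition(":")
        nodes.append((host, int(port)))
    return nodes


def bootstrap(nodes: List[Node], fetch_metadata: Callable[[Node], dict]) -> dict:
    """Return metadata from the first bootstrap node that responds."""
    last_error = None
    for node in nodes:
        try:
            return fetch_metadata(node)
        except OSError as exc:  # connection refused, timeout, etc.
            last_error = exc
    raise ConnectionError("no bootstrap node reachable") from last_error
```

Only one reachable node is needed; everything else about the cluster comes from the metadata that node returns.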
The problem, as I understand it, is that if you already have some home-grown
service discovery mechanism you may not want to configure any bootstrap URLs
directly; instead you may want to configure the URL of your service discovery
system to help make initial contact with a broker. This makes sense. But this
contact will just be used during initialization to bootstrap acquiring full
metadata about partition assignment.
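
One way to picture such a hook, as a sketch only: treat the bootstrap source as a pluggable callable that is consulted once at initialization. All names here (BootstrapSource, static_source, discovery_source, Client) are hypothetical, and lookup stands in for a query to whatever discovery system you run; no real discovery API is used.

```python
from typing import Callable, List

# A bootstrap source is any zero-argument callable returning "host:port" strings.
BootstrapSource = Callable[[], List[str]]


def static_source(broker_list: str) -> BootstrapSource:
    """Bootstrap addresses taken from a fixed comma-separated list."""
    addresses = [a.strip() for a in broker_list.split(",") if a.strip()]
    return lambda: addresses


def discovery_source(lookup: Callable[[str], List[str]], service: str) -> BootstrapSource:
    """Bootstrap addresses obtained by asking a discovery service."""
    return lambda: lookup(service)


class Client:
    def __init__(self, source: BootstrapSource):
        # The source is consulted once, during initialization only; later
        # updates would come from metadata requests to the brokers themselves.
        self.bootstrap_addresses = source()
```

Either source yields the same thing, a handful of addresses for first contact, so the rest of the client does not need to know which one was used.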
Hopefully that makes sense.
> Producer's Broker Discovery Interface
> -------------------------------------
>
> Key: KAFKA-1348
> URL: https://issues.apache.org/jira/browse/KAFKA-1348
> Project: Kafka
> Issue Type: Improvement
> Components: producer
> Reporter: Jay Bae
> Assignee: Jun Rao
>
> Producer has a property 'broker.list' static configuration. I need a
> requirement to be able to override this behavior such as Netflix Eureka
> Discovery module. Let me contribute and please add this to 0.8.1.1 release.