[ 
https://issues.apache.org/jira/browse/KAFKA-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954308#comment-13954308
 ] 

Jay Kreps commented on KAFKA-1348:
----------------------------------

Jay, the cluster maintains knowledge of its state as brokers join and leave as 
well as where each partition is currently hosted, who is the leader for each 
partition, etc. This is the information the client needs to direct its requests 
and it is a lot more than just what nodes are alive and in the cluster. You get 
this information by issuing a metadata request to any broker in the cluster.
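
To make that concrete, here is a rough sketch of the kind of state a client might keep after a metadata response. The class and method names (ClusterSnapshot, addBroker, setLeader, leaderAddress) are made up for illustration and are not the actual client classes.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical client-side view of the cluster, filled in from a metadata response. */
final class ClusterSnapshot {
    // broker id -> "host:port" of each broker currently alive in the cluster
    private final Map<Integer, String> liveBrokers = new HashMap<>();
    // "topic-partition" -> broker id of that partition's current leader
    private final Map<String, Integer> leaders = new HashMap<>();

    void addBroker(int brokerId, String hostPort) {
        liveBrokers.put(brokerId, hostPort);
    }

    void setLeader(String topic, int partition, int brokerId) {
        leaders.put(topic + "-" + partition, brokerId);
    }

    /** Returns the leader's address, or null if unknown (a cue to refresh metadata). */
    String leaderAddress(String topic, int partition) {
        Integer brokerId = leaders.get(topic + "-" + partition);
        return brokerId == null ? null : liveBrokers.get(brokerId);
    }
}
```

The point is only that the client needs the partition-to-leader mapping, not just a list of live hosts.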

Once you are connected to the cluster, the client issues metadata requests to 
find out about any cluster changes. The client automatically issues metadata 
requests at a fixed interval or any time it gets an error talking to a broker 
that might indicate stale metadata (e.g. a network exception, a timeout, a 
not-leader exception, etc.).
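
The refresh rule above could be sketched roughly like this. MetadataRefreshPolicy and its methods are hypothetical names, and the interval value is whatever you configure; this is not the real client code, just the decision logic as described.

```java
/** Hypothetical sketch: decides when a client should re-fetch cluster metadata. */
final class MetadataRefreshPolicy {
    private final long refreshIntervalMs; // assumed fixed refresh interval
    private long lastRefreshMs;           // time of the last successful refresh
    private boolean possiblyStale;        // set when an error hints at stale metadata

    MetadataRefreshPolicy(long refreshIntervalMs, long nowMs) {
        this.refreshIntervalMs = refreshIntervalMs;
        this.lastRefreshMs = nowMs;
    }

    /** Call on a network exception, timeout, not-leader error, and similar. */
    void onPossiblyStaleError() {
        possiblyStale = true;
    }

    /** True if the interval has elapsed or an error suggested stale metadata. */
    boolean shouldRefresh(long nowMs) {
        return possiblyStale || nowMs - lastRefreshMs >= refreshIntervalMs;
    }

    /** Call after a metadata response has been applied. */
    void onRefreshed(long nowMs) {
        lastRefreshMs = nowMs;
        possiblyStale = false;
    }
}
```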

So in your scenario, once connected, the clients discover the new brokers as 
they are added by sending metadata requests to the existing brokers. As soon as 
ec2-12-123-456-444.compute-1.amazonaws.com becomes leader for any partition the 
client needs to send data to, the client will discover the leadership change.

It is true that if you suddenly killed 100% of the brokers in your cluster and 
replaced them with 100% new brokers, there would be no one left that you know 
about to tell you about the changes. The solution to this is not to kill 100% 
of your cluster all at once.

So the problem that needs to be solved is the problem of bootstrapping 
knowledge of at least one active node in the cluster. If you knew this then you 
could use that node to find out where all the partitions are hosted so you 
could publish data. But how to find that out?

The way we do this is by giving a comma-separated list of bootstrap nodes that 
you can use for your initial bootstrap on startup. This is only used during 
initialization. After initialization all further metadata updates will use the 
full set of alive nodes.
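
As an illustration, parsing such a list might look like the sketch below. BootstrapList is a made-up helper, the host:port entry format is an assumption, and the hostnames in the usage line are placeholders.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: turns "host1:9092,host2:9092" into a list of addresses. */
final class BootstrapList {
    static List<InetSocketAddress> parse(String csv) {
        List<InetSocketAddress> nodes = new ArrayList<>();
        for (String entry : csv.split(",")) {
            String hostPort = entry.trim();
            if (hostPort.isEmpty()) {
                continue; // tolerate stray commas
            }
            int colon = hostPort.lastIndexOf(':');
            if (colon <= 0 || colon == hostPort.length() - 1) {
                throw new IllegalArgumentException("Expected host:port but got: " + hostPort);
            }
            String host = hostPort.substring(0, colon);
            int port = Integer.parseInt(hostPort.substring(colon + 1));
            // Unresolved on purpose: DNS lookup is deferred until we actually connect.
            nodes.add(InetSocketAddress.createUnresolved(host, port));
        }
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("At least one bootstrap node is required");
        }
        return nodes;
    }
}
```

Usage would be something like BootstrapList.parse("broker1.example.com:9092, broker2.example.com:9092"). The result is only needed to make first contact; after that the metadata response supplies the real broker set.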

The problem, as I understand it, is that if you already have some home-grown 
service discovery mechanism you may not want to configure any bootstrap URLs 
directly; instead you may want to configure the URL of your service discovery 
system to help make initial contact with a broker. This makes sense. But this 
contact will just be used during initialization to bootstrap acquiring full 
metadata about partition assignment.
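
One possible shape for such a hook, purely as a sketch: every name here (BootstrapDiscovery, StaticBootstrapDiscovery, InitialContact) is hypothetical and nothing like it exists in the producer today. A service-discovery backed implementation would simply return whatever hosts the registry reports.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical plug-in point: supplies "host:port" candidates for initial contact only. */
interface BootstrapDiscovery {
    List<String> candidates();
}

/** Default behaviour: a static comma-separated list from configuration. */
final class StaticBootstrapDiscovery implements BootstrapDiscovery {
    private final List<String> nodes = new ArrayList<>();

    StaticBootstrapDiscovery(String csv) {
        for (String entry : csv.split(",")) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                nodes.add(trimmed);
            }
        }
    }

    @Override
    public List<String> candidates() {
        return new ArrayList<>(nodes); // defensive copy
    }
}

final class InitialContact {
    /** Returns the first candidate that answers a metadata probe. */
    static String firstReachable(BootstrapDiscovery discovery, Predicate<String> canFetchMetadata) {
        for (String node : discovery.candidates()) {
            if (canFetchMetadata.test(node)) {
                return node;
            }
        }
        throw new IllegalStateException("No bootstrap candidate answered a metadata request");
    }
}
```

The discovery source is consulted only to find one live broker. Everything after that comes from the cluster's own metadata, so the hook never has to track partition leadership.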

Hopefully that makes sense.



> Producer's Broker Discovery Interface
> -------------------------------------
>
>                 Key: KAFKA-1348
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1348
>             Project: Kafka
>          Issue Type: Improvement
>          Components: producer 
>            Reporter: Jay Bae
>            Assignee: Jun Rao
>
> Producer has a property 'broker.list' static configuration. I need a 
> requirement to be able to override this behavior such as Netflix Eureka 
> Discovery module. Let me contribute and please add this to 0.8.1.1 release.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
