If the producer no longer requires zk.connect, how can it recognize new brokers, or brokers which went down?
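For context, the quoted thread below explains that the 0.8 producer takes broker.list instead of zk.connect, and that pointing it at a single VIP lets brokers come and go without any producer-side config change. A minimal sketch of such a config, assuming the 0.8 property names discussed in the thread (broker.list, serializer.class); the VIP hostname is hypothetical:

```java
import java.util.Properties;

public class ProducerConfigSketch {
    public static Properties buildConfig() {
        Properties props = new Properties();
        // Point the producer at a VIP rather than enumerating brokers, so
        // brokers can be added/removed without touching producer config.
        // "kafka-vip.example.com" is a made-up address for illustration.
        props.put("broker.list", "kafka-vip.example.com:9092");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(buildConfig().getProperty("broker.list"));
    }
}
```

With a VIP in front of the cluster, the producer needs only one stable address; per-partition leader discovery then happens through the getMetadata request discussed in the thread.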
On Tue, Nov 20, 2012 at 8:31 AM, Jun Rao <[email protected]> wrote:

> David,
>
> The change in 0.8 is that instead of requiring zk.connect, we require
> broker.list. In both cases, you typically provide a list of hosts and
> ports. Functionality-wise, they achieve the same thing, i.e., the producer
> is able to send the data to the right broker. Are you saying that
> zk.connect is more convenient? One benefit of using broker.list is that
> one can provide a VIP as the only host. This makes it easy to add/remove
> brokers, since no producer-side config needs to be changed. Changing hosts
> in zk.connect, on the other hand, requires config changes in the client.
> Another reason for removing zkclient from the producer is that if the
> client GCs, it can cause churn in the producer and extra load on the ZK
> server. Since our producer can be embedded in any client, it's hard for us
> to control the GC rate. So, removing zkclient from the producer relieves
> the potential pressure from client GC.
>
> We still rely on ZK for failure detection and leader election on the
> broker and the consumer, though.
>
> Thanks,
>
> Jun
>
> On Tue, Nov 20, 2012 at 7:54 AM, David Arthur <[email protected]> wrote:
>
>> On Nov 20, 2012, at 12:23 AM, Jun Rao wrote:
>>
>>> Jason,
>>>
>>> In 0.8, the producer doesn't use zkclient at all. You just need to set
>>> broker.list.
>>
>> This seems like a regression in functionality. For me, one of the
>> benefits of Kafka is only needing to know about ZooKeeper.
>>
>>> A number of things have changed in 0.8. First, the number of partitions
>>> of a topic is global in a cluster, and they don't really change as new
>>> brokers are added. Second, a partition is assigned to multiple brokers
>>> for replication, and one of the replicas is the leader, which serves
>>> writes. When a producer starts up, it first uses the getMetadata api to
>>> figure out the replica assignment for the relevant topic/partition.
>>> It then issues a produce request directly to the broker where the leader
>>> resides. If the leader broker goes down, the producer gets an exception,
>>> and it will re-issue the getMetadata api to obtain the information about
>>> the new leader.
>>>
>>> Thanks,
>>>
>>> Jun
>>>
>>> On Mon, Nov 19, 2012 at 1:29 PM, Jason Rosenberg <[email protected]>
>>> wrote:
>>>
>>>> Well, they do use zk though, to get the initial list of kafka nodes,
>>>> and while zk is available, presumably they do use it to keep up with
>>>> the dynamically changing set of kafka brokers, no? You are just saying
>>>> that if zk goes away, 0.8 producers can keep on producing, as long as
>>>> the kafka cluster remains stable?
>>>>
>>>> Jason
>>>>
>>>> On Mon, Nov 19, 2012 at 12:20 PM, Neha Narkhede
>>>> <[email protected]> wrote:
>>>>
>>>>> In 0.8, producers don't use zk. When producers encounter an error
>>>>> while sending data, they use a special getMetadata request to refresh
>>>>> the kafka cluster info from a randomly selected Kafka broker, and
>>>>> retry sending the data.
>>>>>
>>>>> Thanks,
>>>>> Neha
>>>>>
>>>>> On Mon, Nov 19, 2012 at 12:10 PM, Jason Rosenberg <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Are you saying that in 0.8, producers don't use zkclient? Or don't
>>>>>> need it? How can a producer dynamically respond to a change in the
>>>>>> kafka cluster without zk?
>>>>>>
>>>>>> On Mon, Nov 19, 2012 at 8:07 AM, Jun Rao <[email protected]> wrote:
>>>>>>
>>>>>>> Jae,
>>>>>>>
>>>>>>> In 0.8, producers don't need a ZK client anymore. Instead, the
>>>>>>> producer uses a new getMetadata api to get topic/partition/leader
>>>>>>> information from the broker. Consumers still need a ZK client. We
>>>>>>> plan to redesign the consumer post 0.8 and can keep this in mind.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Jun
>>>>>>>
>>>>>>> On Sun, Nov 18, 2012 at 10:35 PM, Bae, Jae Hyeon
>>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>>> I want to suggest that kafka create only one instance of ZkClient
>>>>>>>> globally, because ZkClient is thread safe and this would let many
>>>>>>>> users easily customize the kafka source code for Zookeeper.
>>>>>>>>
>>>>>>>> In our company's cloud environment, it is not recommended to create
>>>>>>>> a ZkClient from the zkConnect string directly, because the
>>>>>>>> zookeeper cluster can change dynamically. So, I have to create the
>>>>>>>> ZkClient using our company's own platform library. Because of this
>>>>>>>> requirement, I can't use the kafka jar file directly. I can modify
>>>>>>>> and build the kafka source code, but I have to repeat this work
>>>>>>>> whenever I update the kafka version, which is pretty annoying.
>>>>>>>>
>>>>>>>> So, my suggestion is: let me pass a ZkClient into Producer,
>>>>>>>> Consumer, and Broker, as in the following example.
>>>>>>>>
>>>>>>>> Producer<String, String> producer =
>>>>>>>>     ProducerBuilder.withZkClient(zkClient)
>>>>>>>>         .build<String, String>(producerConfig);
>>>>>>>>
>>>>>>>> ConsumerConnector connector =
>>>>>>>>     Consumer.withZkClient(zkClient)
>>>>>>>>         .createJavaConsumerConnector(new ConsumerConfig(consumerProps));
>>>>>>>>
>>>>>>>> KafkaServer is a little more complicated, but I believe that
>>>>>>>> without much effort we could refactor KafkaServer to be customized
>>>>>>>> with a ZkClient.
>>>>>>>>
>>>>>>>> I would really appreciate it if this suggestion were accepted and
>>>>>>>> merged into 0.8. If you want me to contribute this, please let me
>>>>>>>> know. If you are positive about this idea, I will contribute very
>>>>>>>> happily.
>>>>>>>>
>>>>>>>> Thank you
>>>>>>>> Best, Jae
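The send-and-refresh behavior Neha and Jun describe above (send to the current leader; on failure, re-issue getMetadata against a broker and retry) can be sketched roughly as follows. This is a toy model, not the real Kafka client: `Cluster`, `leaderFor`, and `send` are illustrative stand-ins for the getMetadata and produce requests.

```java
// Toy model of the 0.8 send path: on a send failure, refresh metadata
// to find the new leader and retry. No real Kafka types are used here.
public class MetadataRetrySketch {

    public interface Cluster {
        // Stand-in for the getMetadata request: who leads this partition?
        String leaderFor(String topic, int partition);
        // Stand-in for a produce request; false simulates a send failure.
        boolean send(String broker, String message);
    }

    public static String sendWithRetry(Cluster cluster, String topic,
                                       int partition, String message,
                                       int maxRetries) {
        String leader = cluster.leaderFor(topic, partition);
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (cluster.send(leader, message)) {
                return leader; // delivered to the current leader
            }
            // Send failed (e.g. leader moved): refresh metadata, retry.
            leader = cluster.leaderFor(topic, partition);
        }
        throw new RuntimeException(
            "exhausted retries for " + topic + "-" + partition);
    }
}
```

Note that no ZooKeeper watch is involved: the producer only learns about cluster changes lazily, when a send fails, which is why it keeps working through a ZK outage as long as the broker set stays stable.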

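Jae's proposal above can be modeled as a small builder that accepts an externally constructed, shared ZK client instead of the library creating one internally from a zk.connect string. All names here (`ZkClientLike`, `ProducerLike`, `ProducerBuilder`) are illustrative stand-ins for what such an API might look like, not actual Kafka or zkclient classes.

```java
// Sketch of injecting a caller-owned ZK client into a producer builder,
// so one thread-safe client instance can be shared across components.
public class ZkInjectionSketch {

    // Stand-in for org.I0Itec.zkclient.ZkClient.
    public interface ZkClientLike {
        String connectString();
    }

    // Stand-in for a producer that keeps a reference to the shared client.
    public static final class ProducerLike {
        public final ZkClientLike zk;
        ProducerLike(ZkClientLike zk) { this.zk = zk; }
    }

    public static final class ProducerBuilder {
        private final ZkClientLike zk;
        private ProducerBuilder(ZkClientLike zk) { this.zk = zk; }

        // Entry point Jae proposes: start from an existing client.
        public static ProducerBuilder withZkClient(ZkClientLike zk) {
            return new ProducerBuilder(zk);
        }

        public ProducerLike build() {
            return new ProducerLike(zk); // reuse, never construct, the client
        }
    }
}
```

The design point is that the library never calls `new ZkClient(zkConnect)` itself, so environments with dynamically changing ZK ensembles can supply a client built by their own platform library.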