Re: New Consumer API discussion

Chris Riccomini Mon, 03 Mar 2014 10:19:50 -0800

Hey Guys,

Sorry for the late follow up. Here are my questions/thoughts on the API:

1. Why is the config String->Object instead of String->String?

2. Are these Java docs correct?

  KafkaConsumer(java.util.Map<java.lang.String,java.lang.Object> configs)
  A consumer is instantiated by providing a set of key-value pairs as
configuration and a ConsumerRebalanceCallback implementation

There is no ConsumerRebalanceCallback parameter.

3. Would like to have a method:

  poll(long timeout, java.util.concurrent.TimeUnit timeUnit,
TopicPartition... topicAndPartitionsToPoll)

I see I can effectively do this by just fiddling with subscribe and
unsubscribe before each poll. Is this a low-overhead operation? Can I just
unsubscribe from everything after each poll, then re-subscribe to a topic
the next iteration. I would probably be doing this in a fairly tight loop.

4. The behavior of AUTO_OFFSET_RESET_CONFIG is overloaded. I think there
are use cases for decoupling "what to do when no offset exists" from "what
to do when I'm out of range". I might want to start from smallest the
first time I run, but fail if I ever get offset out of range.

5. ENABLE_JMX could use Java docs, even though it's fairly
self-explanatory.

6. Clarity about whether FETCH_BUFFER_CONFIG is per-topic/partition, or
across all topic/partitions is useful. I believe it's per-topic/partition,
right? That is, setting to 2 megs with two TopicAndPartitions would result
in 4 megs worth of data coming in per fetch, right?

7. What does the consumer do if METADATA_FETCH_TIMEOUT_CONFIG times out?
Retry, or throw exception?

8. Does RECONNECT_BACKOFF_MS_CONFIG apply to both metadata requests and
fetch requests?

9. What does SESSION_TIMEOUT_MS default to?

10. Is this consumer thread-safe?

11. How do you use a different offset management strategy? Your email
implies that it's pluggable, but I don't see how. "The offset management
strategy defaults to Kafka based offset management and the API provides a
way for the user to use a customized offset store to manage the consumer's
offsets."

12. If I wish to decouple the consumer from the offset checkpointing, is
it OK to use Joel's offset management stuff directly, rather than through
the consumer's commit API?

Cheers,
Chris

On 2/10/14 10:54 AM, "Neha Narkhede" <neha.narkh...@gmail.com> wrote:

>As mentioned in previous emails, we are also working on a
>re-implementation
>of the consumer. I would like to use this email thread to discuss the
>details of the public API. I would also like us to be picky about this
>public api now so it is as good as possible and we don't need to break it
>in the future.
>
>The best way to get a feel for the API is actually to take a look at the
>javadoc<http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/
>doc/kafka/clients/consumer/KafkaConsumer.html>,
>the hope is to get the api docs good enough so that it is
>self-explanatory.
>You can also take a look at the configs
>here<http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc
>/kafka/clients/consumer/ConsumerConfig.html>
>
>Some background info on implementation:
>
>At a high level the primary difference in this consumer is that it removes
>the distinction between the "high-level" and "low-level" consumer. The new
>consumer API is non blocking and instead of returning a blocking iterator,
>the consumer provides a poll() API that returns a list of records. We
>think
>this is better compared to the blocking iterators since it effectively
>decouples the threading strategy used for processing messages from the
>consumer. It is worth noting that the consumer is entirely single threaded
>and runs in the user thread. The advantage is that it can be easily
>rewritten in less multi-threading-friendly languages. The consumer batches
>data and multiplexes I/O over TCP connections to each of the brokers it
>communicates with, for high throughput. The consumer also allows long poll
>to reduce the end-to-end message latency for low throughput data.
>
>The consumer provides a group management facility that supports the
>concept
>of a group with multiple consumer instances (just like the current
>consumer). This is done through a custom heartbeat and group management
>protocol transparent to the user. At the same time, it allows users the
>option to subscribe to a fixed set of partitions and not use group
>management at all. The offset management strategy defaults to Kafka based
>offset management and the API provides a way for the user to use a
>customized offset store to manage the consumer's offsets.
>
>A key difference in this consumer also is the fact that it does not depend
>on zookeeper at all.
>
>More details about the new consumer design are
>here<https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Consumer+
>Rewrite+Design>
>
>Please take a look at the new
>API<http://people.apache.org/~nehanarkhede/kafka-0.9-consumer-javadoc/doc/
>kafka/clients/consumer/KafkaConsumer.html>and
>give us any thoughts you may have.
>
>Thanks,
>Neha

Re: New Consumer API discussion

Reply via email to