[ https://issues.apache.org/jira/browse/KAFKA-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16588002#comment-16588002 ]
ASF GitHub Bot commented on KAFKA-7063:
---------------------------------------

hachikuji closed pull request #5240: KAFKA-7063: Update documentation to remove references to old producers and consumers
URL: https://github.com/apache/kafka/pull/5240

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/docs/api.html b/docs/api.html
index 015d5140bd0..ea511b44453 100644
--- a/docs/api.html
+++ b/docs/api.html
@@ -115,12 +115,6 @@ <h3><a id="adminapi" href="#adminapi">2.5 AdminClient API</a></h3>
 For more information about the AdminClient APIs, see the <a href="/{{version}}/javadoc/index.html?org/apache/kafka/clients/admin/AdminClient.html" title="Kafka {{dotVersion}} Javadoc">javadoc</a>.
 <p>
- <h3><a id="legacyapis" href="#legacyapis">2.6 Legacy APIs</a></h3>
-
- <p>
- A more limited legacy producer and consumer api is also included in Kafka. These old Scala APIs are deprecated and only still available for compatibility purposes. Information on them can be found here <a href="/081/documentation.html#producerapi" title="Kafka 0.8.1 Docs">
- here</a>.
- </p> </script> <div class="p-api"></div> diff --git a/docs/configuration.html b/docs/configuration.html index e5576f9263b..bde53a749c7 100644 --- a/docs/configuration.html +++ b/docs/configuration.html @@ -260,190 +260,14 @@ <h3><a id="topicconfigs" href="#topicconfigs">3.2 Topic-Level Configs</a></h3> <h3><a id="producerconfigs" href="#producerconfigs">3.3 Producer Configs</a></h3> - Below is the configuration of the Java producer: + Below is the configuration of the producer: <!--#include virtual="generated/producer_config.html" --> - <p> - For those interested in the legacy Scala producer configs, information can be found <a href="http://kafka.apache.org/082/documentation.html#producerconfigs"> - here</a>. - </p> - <h3><a id="consumerconfigs" href="#consumerconfigs">3.4 Consumer Configs</a></h3> - In 0.9.0.0 we introduced the new Java consumer as a replacement for the older Scala-based simple and high-level consumers. - The configs for both new and old consumers are described below. - - <h4><a id="newconsumerconfigs" href="#newconsumerconfigs">3.4.1 New Consumer Configs</a></h4> - Below is the configuration for the new consumer: + Below is the configuration for the consumer: <!--#include virtual="generated/consumer_config.html" --> - <h4><a id="oldconsumerconfigs" href="#oldconsumerconfigs">3.4.2 Old Consumer Configs</a></h4> - - The essential old consumer configurations are the following: - <ul> - <li><code>group.id</code> - <li><code>zookeeper.connect</code> - </ul> - - <table class="data-table"> - <tbody><tr> - <th>Property</th> - <th>Default</th> - <th>Description</th> - </tr> - <tr> - <td>group.id</td> - <td colspan="1"></td> - <td>A string that uniquely identifies the group of consumer processes to which this consumer belongs. 
By setting the same group id multiple processes indicate that they are all part of the same consumer group.</td> - </tr> - <tr> - <td>zookeeper.connect</td> - <td colspan="1"></td> - <td>Specifies the ZooKeeper connection string in the form <code>hostname:port</code> where host and port are the host and port of a ZooKeeper server. To allow connecting through other ZooKeeper nodes when that ZooKeeper machine is down you can also specify multiple hosts in the form <code>hostname1:port1,hostname2:port2,hostname3:port3</code>. - <p> - The server may also have a ZooKeeper chroot path as part of its ZooKeeper connection string which puts its data under some path in the global ZooKeeper namespace. If so the consumer should use the same chroot path in its connection string. For example to give a chroot path of <code>/chroot/path</code> you would give the connection string as <code>hostname1:port1,hostname2:port2,hostname3:port3/chroot/path</code>.</td> - </tr> - <tr> - <td>consumer.id</td> - <td colspan="1">null</td> - <td> - <p>Generated automatically if not set.</p> - </td> - </tr> - <tr> - <td>socket.timeout.ms</td> - <td colspan="1">30 * 1000</td> - <td>The socket timeout for network requests. The actual timeout set will be fetch.wait.max.ms + socket.timeout.ms.</td> - </tr> - <tr> - <td>socket.receive.buffer.bytes</td> - <td colspan="1">64 * 1024</td> - <td>The socket receive buffer for network requests</td> - </tr> - <tr> - <td>fetch.message.max.bytes</td> - <td nowrap>1024 * 1024</td> - <td>The number of bytes of messages to attempt to fetch for each topic-partition in each fetch request. These bytes will be read into memory for each partition, so this helps control the memory used by the consumer. 
The fetch request size must be at least as large as the maximum message size the server allows or else it is possible for the producer to send messages larger than the consumer can fetch.</td> - </tr> - <tr> - <td>num.consumer.fetchers</td> - <td colspan="1">1</td> - <td>The number fetcher threads used to fetch data.</td> - </tr> - <tr> - <td>auto.commit.enable</td> - <td colspan="1">true</td> - <td>If true, periodically commit to ZooKeeper the offset of messages already fetched by the consumer. This committed offset will be used when the process fails as the position from which the new consumer will begin.</td> - </tr> - <tr> - <td>auto.commit.interval.ms</td> - <td colspan="1">60 * 1000</td> - <td>The frequency in ms that the consumer offsets are committed to zookeeper.</td> - </tr> - <tr> - <td>queued.max.message.chunks</td> - <td colspan="1">2</td> - <td>Max number of message chunks buffered for consumption. Each chunk can be up to fetch.message.max.bytes.</td> - </tr> - <tr> - <td>rebalance.max.retries</td> - <td colspan="1">4</td> - <td>When a new consumer joins a consumer group the set of consumers attempt to "rebalance" the load to assign partitions to each consumer. If the set of consumers changes while this assignment is taking place the rebalance will fail and retry. This setting controls the maximum number of attempts before giving up.</td> - </tr> - <tr> - <td>fetch.min.bytes</td> - <td colspan="1">1</td> - <td>The minimum amount of data the server should return for a fetch request. If insufficient data is available the request will wait for that much data to accumulate before answering the request.</td> - </tr> - <tr> - <td>fetch.wait.max.ms</td> - <td colspan="1">100</td> - <td>The maximum amount of time the server will block before answering the fetch request if there isn't sufficient data to immediately satisfy fetch.min.bytes</td> - </tr> - <tr> - <td>rebalance.backoff.ms</td> - <td>2000</td> - <td>Backoff time between retries during rebalance. 
If not set explicitly, the value in zookeeper.sync.time.ms is used. - </td> - </tr> - <tr> - <td>refresh.leader.backoff.ms</td> - <td colspan="1">200</td> - <td>Backoff time to wait before trying to determine the leader of a partition that has just lost its leader.</td> - </tr> - <tr> - <td>auto.offset.reset</td> - <td colspan="1">largest</td> - <td> - <p>What to do when there is no initial offset in ZooKeeper or if an offset is out of range:<br/>* smallest : automatically reset the offset to the smallest offset<br/>* largest : automatically reset the offset to the largest offset<br/>* anything else: throw exception to the consumer</p> - </td> - </tr> - <tr> - <td>consumer.timeout.ms</td> - <td colspan="1">-1</td> - <td>Throw a timeout exception to the consumer if no message is available for consumption after the specified interval</td> - </tr> - <tr> - <td>exclude.internal.topics</td> - <td colspan="1">true</td> - <td>Whether messages from internal topics (such as offsets) should be exposed to the consumer.</td> - </tr> - <tr> - <td>client.id</td> - <td colspan="1">group id value</td> - <td>The client id is a user-specified string sent in each request to help trace calls. It should logically identify the application making the request.</td> - </tr> - <tr> - <td>zookeeper.session.timeout.ms </td> - <td colspan="1">6000</td> - <td>ZooKeeper session timeout. 
If the consumer fails to heartbeat to ZooKeeper for this period of time it is considered dead and a rebalance will occur.</td> - </tr> - <tr> - <td>zookeeper.connection.timeout.ms</td> - <td colspan="1">6000</td> - <td>The max time that the client waits while establishing a connection to zookeeper.</td> - </tr> - <tr> - <td>zookeeper.sync.time.ms </td> - <td colspan="1">2000</td> - <td>How far a ZK follower can be behind a ZK leader</td> - </tr> - <tr> - <td>offsets.storage</td> - <td colspan="1">zookeeper</td> - <td>Select where offsets should be stored (zookeeper or kafka).</td> - </tr> - <tr> - <td>offsets.channel.backoff.ms</td> - <td colspan="1">1000</td> - <td>The backoff period when reconnecting the offsets channel or retrying failed offset fetch/commit requests.</td> - </tr> - <tr> - <td>offsets.channel.socket.timeout.ms</td> - <td colspan="1">10000</td> - <td>Socket timeout when reading responses for offset fetch/commit requests. This timeout is also used for ConsumerMetadata requests that are used to query for the offset manager.</td> - </tr> - <tr> - <td>offsets.commit.max.retries</td> - <td colspan="1">5</td> - <td>Retry the offset commit up to this many times on failure. This retry count only applies to offset commits during shut-down. It does not apply to commits originating from the auto-commit thread. It also does not apply to attempts to query for the offset coordinator before committing offsets. i.e., if a consumer metadata request fails for any reason, it will be retried and that retry does not count toward this limit.</td> - </tr> - <tr> - <td>dual.commit.enabled</td> - <td colspan="1">true</td> - <td>If you are using "kafka" as offsets.storage, you can dual commit offsets to ZooKeeper (in addition to Kafka). This is required during migration from zookeeper-based offset storage to kafka-based offset storage. 
With respect to any given consumer group, it is safe to turn this off after all instances within that group have been migrated to the new version that commits offsets to the broker (instead of directly to ZooKeeper).</td> - </tr> - <tr> - <td>partition.assignment.strategy</td> - <td colspan="1">range</td> - <td><p>Select between the "range" or "roundrobin" strategy for assigning partitions to consumer streams.<p>The round-robin partition assignor lays out all the available partitions and all the available consumer threads. It then proceeds to do a round-robin assignment from partition to consumer thread. If the subscriptions of all consumer instances are identical, then the partitions will be uniformly distributed. (i.e., the partition ownership counts will be within a delta of exactly one across all consumer threads.) Round-robin assignment is permitted only if: (a) Every topic has the same number of streams within a consumer instance (b) The set of subscribed topics is identical for every consumer instance within the group.<p> Range partitioning works on a per-topic basis. For each topic, we lay out the available partitions in numeric order and the consumer threads in lexicographic order. We then divide the number of partitions by the total number of consumer streams (threads) to determine the number of partitions to assign to each consumer. If it does not evenly divide, then the first few consumers will have one extra partition.</td> - </tr> - </tbody> - </table> - - - <p>More details about consumer configuration can be found in the scala class <code>kafka.consumer.ConsumerConfig</code>.</p> - <h3><a id="connectconfigs" href="#connectconfigs">3.5 Kafka Connect Configs</a></h3> Below is the configuration of the Kafka Connect framework. 
<!--#include virtual="generated/connect_config.html" --> diff --git a/docs/implementation.html b/docs/implementation.html index 8e1c50a6895..4ecce7b4485 100644 --- a/docs/implementation.html +++ b/docs/implementation.html @@ -231,31 +231,29 @@ <h4><a id="impl_guarantees" href="#impl_guarantees">Guarantees</a></h4> <h3><a id="distributionimpl" href="#distributionimpl">5.5 Distribution</a></h3> <h4><a id="impl_offsettracking" href="#impl_offsettracking">Consumer Offset Tracking</a></h4> <p> - The high-level consumer tracks the maximum offset it has consumed in each partition and periodically commits its offset vector so that it can resume from those offsets in the event of a restart. Kafka provides the option to store all the offsets for a given consumer group in a designated broker (for that group) called the <i>offset manager</i>. i.e., any consumer instance in that consumer group should send its offset commits and fetches to that offset manager (broker). The high-level consumer handles this automatically. If you use the simple consumer you will need to manage offsets manually. This is currently unsupported in the Java simple consumer which can only commit or fetch offsets in ZooKeeper. If you use the Scala simple consumer you can discover the offset manager and explicitly commit or fetch offsets to the offset manager. A consumer can look up its offset manager by issuing a GroupCoordinatorRequest to any Kafka broker and reading the GroupCoordinatorResponse which will contain the offset manager. The consumer can then proceed to commit or fetch offsets from the offsets manager broker. In case the offset manager moves, the consumer will need to rediscover the offset manager. If you wish to manage your offsets manually, you can take a look at these <a href="https://cwiki.apache.org/confluence/display/KAFKA/Committing+and+fetching+consumer+offsets+in+Kafka">code samples that explain how to issue OffsetCommitRequest and OffsetFetchRequest</a>. 
+ Kafka consumer tracks the maximum offset it has consumed in each partition and has the capability to commit offsets so + that it can resume from those offsets in the event of a restart. Kafka provides the option to store all the offsets for + a given consumer group in a designated broker (for that group) called the group coordinator. i.e., any consumer instance + in that consumer group should send its offset commits and fetches to that group coordinator (broker). Consumer groups are + assigned to coordinators based on their group names. A consumer can look up its coordinator by issuing a FindCoordinatorRequest + to any Kafka broker and reading the FindCoordinatorResponse which will contain the coordinator details. The consumer + can then proceed to commit or fetch offsets from the coordinator broker. In case the coordinator moves, the consumer will + need to rediscover the coordinator. Offset commits can be done automatically or manually by consumer instance. </p> <p> - When the offset manager receives an OffsetCommitRequest, it appends the request to a special <a href="#compaction">compacted</a> Kafka topic named <i>__consumer_offsets</i>. The offset manager sends a successful offset commit response to the consumer only after all the replicas of the offsets topic receive the offsets. In case the offsets fail to replicate within a configurable timeout, the offset commit will fail and the consumer may retry the commit after backing off. (This is done automatically by the high-level consumer.) The brokers periodically compact the offsets topic since it only needs to maintain the most recent offset commit per partition. The offset manager also caches the offsets in an in-memory table in order to serve offset fetches quickly. + When the group coordinator receives an OffsetCommitRequest, it appends the request to a special <a href="#compaction">compacted</a> Kafka topic named <i>__consumer_offsets</i>. 
+ The broker sends a successful offset commit response to the consumer only after all the replicas of the offsets topic receive the offsets. + In case the offsets fail to replicate within a configurable timeout, the offset commit will fail and the consumer may retry the commit after backing off. + The brokers periodically compact the offsets topic since it only needs to maintain the most recent offset commit per partition. + The coordinator also caches the offsets in an in-memory table in order to serve offset fetches quickly. </p> <p> - When the offset manager receives an offset fetch request, it simply returns the last committed offset vector from the offsets cache. In case the offset manager was just started or if it just became the offset manager for a new set of consumer groups (by becoming a leader for a partition of the offsets topic), it may need to load the offsets topic partition into the cache. In this case, the offset fetch will fail with an OffsetsLoadInProgress exception and the consumer may retry the OffsetFetchRequest after backing off. (This is done automatically by the high-level consumer.) - </p> - - <h5><a id="offsetmigration" href="#offsetmigration">Migrating offsets from ZooKeeper to Kafka</a></h5> - <p> - Kafka consumers in earlier releases store their offsets by default in ZooKeeper. It is possible to migrate these consumers to commit offsets into Kafka by following these steps: - <ol> - <li>Set <code>offsets.storage=kafka</code> and <code>dual.commit.enabled=true</code> in your consumer config. - </li> - <li>Do a rolling bounce of your consumers and then verify that your consumers are healthy. - </li> - <li>Set <code>dual.commit.enabled=false</code> in your consumer config. - </li> - <li>Do a rolling bounce of your consumers and then verify that your consumers are healthy. 
- </li> - </ol> - A roll-back (i.e., migrating from Kafka back to ZooKeeper) can also be performed using the above steps if you set <code>offsets.storage=zookeeper</code>. + When the coordinator receives an offset fetch request, it simply returns the last committed offset vector from the offsets cache. + In case coordinator was just started or if it just became the coordinator for a new set of consumer groups (by becoming a leader for a partition of the offsets topic), + it may need to load the offsets topic partition into the cache. In this case, the offset fetch will fail with an + CoordinatorLoadInProgressException and the consumer may retry the OffsetFetchRequest after backing off. </p> <h4><a id="impl_zookeeper" href="#impl_zookeeper">ZooKeeper Directories</a></h4> @@ -287,47 +285,6 @@ <h4><a id="impl_zktopic" href="#impl_zktopic">Broker Topic Registry</a></h4> Each broker registers itself under the topics it maintains and stores the number of partitions for that topic. </p> - <h4><a id="impl_zkconsumers" href="#impl_zkconsumers">Consumers and Consumer Groups</a></h4> - <p> - Consumers of topics also register themselves in ZooKeeper, in order to coordinate with each other and balance the consumption of data. Consumers can also store their offsets in ZooKeeper by setting <code>offsets.storage=zookeeper</code>. However, this offset storage mechanism will be deprecated in a future release. Therefore, it is recommended to <a href="#offsetmigration">migrate offsets storage to Kafka</a>. - </p> - - <p> - Multiple consumers can form a group and jointly consume a single topic. Each consumer in the same group is given a shared group_id. - For example if one consumer is your foobar process, which is run across three machines, then you might assign this group of consumers the id "foobar". This group id is provided in the configuration of the consumer, and is your way to tell the consumer which group it belongs to. 
- </p> - - <p> - The consumers in a group divide up the partitions as fairly as possible, each partition is consumed by exactly one consumer in a consumer group. - </p> - - <h4><a id="impl_zkconsumerid" href="#impl_zkconsumerid">Consumer Id Registry</a></h4> - <p> - In addition to the group_id which is shared by all consumers in a group, each consumer is given a transient, unique consumer_id (of the form hostname:uuid) for identification purposes. Consumer ids are registered in the following directory. - <pre class="brush: json;"> - /consumers/[group_id]/ids/[consumer_id] --> {"version":...,"subscription":{...:...},"pattern":...,"timestamp":...} (ephemeral node) - </pre> - Each of the consumers in the group registers under its group and creates a znode with its consumer_id. The value of the znode contains a map of <topic, #streams>. This id is simply used to identify each of the consumers which is currently active within a group. This is an ephemeral node so it will disappear if the consumer process dies. - </p> - - <h4><a id="impl_zkconsumeroffsets" href="#impl_zkconsumeroffsets">Consumer Offsets</a></h4> - <p> - Consumers track the maximum offset they have consumed in each partition. This value is stored in a ZooKeeper directory if <code>offsets.storage=zookeeper</code>. - </p> - <pre class="brush: json;"> - /consumers/[group_id]/offsets/[topic]/[partition_id] --> offset_counter_value (persistent node) - </pre> - - <h4><a id="impl_zkowner" href="#impl_zkowner">Partition Owner registry</a></h4> - - <p> - Each broker partition is consumed by a single consumer within a given consumer group. The consumer must establish its ownership of a given partition before any consumption can begin. To establish its ownership, a consumer writes its own id in an ephemeral node under the particular broker partition it is claiming. 
- </p> - - <pre class="brush: json;"> - /consumers/[group_id]/owners/[topic]/[partition_id] --> consumer_node_id (ephemeral node) - </pre> - <h4><a id="impl_clusterid" href="#impl_clusterid">Cluster Id</a></h4> <p> @@ -342,45 +299,6 @@ <h4><a id="impl_brokerregistration" href="#impl_brokerregistration">Broker node <p> The broker nodes are basically independent, so they only publish information about what they have. When a broker joins, it registers itself under the broker node registry directory and writes information about its host name and port. The broker also register the list of existing topics and their logical partitions in the broker topic registry. New topics are registered dynamically when they are created on the broker. </p> - - <h4><a id="impl_consumerregistration" href="#impl_consumerregistration">Consumer registration algorithm</a></h4> - - <p> - When a consumer starts, it does the following: - <ol> - <li> Register itself in the consumer id registry under its group. - </li> - <li> Register a watch on changes (new consumers joining or any existing consumers leaving) under the consumer id registry. (Each change triggers rebalancing among all consumers within the group to which the changed consumer belongs.) - </li> - <li> Register a watch on changes (new brokers joining or any existing brokers leaving) under the broker id registry. (Each change triggers rebalancing among all consumers in all consumer groups.) </li> - <li> If the consumer creates a message stream using a topic filter, it also registers a watch on changes (new topics being added) under the broker topic registry. (Each change will trigger re-evaluation of the available topics to determine which topics are allowed by the topic filter. A new allowed topic will trigger rebalancing among all consumers within the consumer group.)</li> - <li> Force itself to rebalance within in its consumer group. 
- </li> - </ol> - </p> - - <h4><a id="impl_consumerrebalance" href="#impl_consumerrebalance">Consumer rebalancing algorithm</a></h4> - <p> - The consumer rebalancing algorithms allows all the consumers in a group to come into consensus on which consumer is consuming which partitions. Consumer rebalancing is triggered on each addition or removal of both broker nodes and other consumers within the same group. For a given topic and a given consumer group, broker partitions are divided evenly among consumers within the group. A partition is always consumed by a single consumer. This design simplifies the implementation. Had we allowed a partition to be concurrently consumed by multiple consumers, there would be contention on the partition and some kind of locking would be required. If there are more consumers than partitions, some consumers won't get any data at all. During rebalancing, we try to assign partitions to consumers in such a way that reduces the number of broker nodes each consumer has to connect to. - </p> - <p> - Each consumer does the following during rebalancing: - </p> - <pre class="brush: text;"> - 1. For each topic T that C<sub>i</sub> subscribes to - 2. let P<sub>T</sub> be all partitions producing topic T - 3. let C<sub>G</sub> be all consumers in the same group as C<sub>i</sub> that consume topic T - 4. sort P<sub>T</sub> (so partitions on the same broker are clustered together) - 5. sort C<sub>G</sub> - 6. let i be the index position of C<sub>i</sub> in C<sub>G</sub> and let N = size(P<sub>T</sub>)/size(C<sub>G</sub>) - 7. assign partitions from i*N to (i+1)*N - 1 to consumer C<sub>i</sub> - 8. remove current entries owned by C<sub>i</sub> from the partition owner registry - 9. 
add newly assigned partitions to the partition owner registry - (we may need to re-try this until the original partition owner releases its ownership) - </pre> - <p> - When rebalancing is triggered at one consumer, rebalancing should be triggered in other consumers within the same group about the same time. - </p> </script> <div class="p-implementation"></div> diff --git a/docs/ops.html b/docs/ops.html index 1445bfd2d25..9d4f6cc4dfe 100644 --- a/docs/ops.html +++ b/docs/ops.html @@ -127,12 +127,7 @@ <h4><a id="basic_ops_mirror_maker" href="#basic_ops_mirror_maker">Mirroring data --producer.config producer.properties --whitelist my-topic </pre> Note that we specify the list of topics with the <code>--whitelist</code> option. This option allows any regular expression using <a href="http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html">Java-style regular expressions</a>. So you could mirror two topics named <i>A</i> and <i>B</i> using <code>--whitelist 'A|B'</code>. Or you could mirror <i>all</i> topics using <code>--whitelist '*'</code>. Make sure to quote any regular expression to ensure the shell doesn't try to expand it as a file path. For convenience we allow the use of ',' instead of '|' to specify a list of topics. - <p> - Sometimes it is easier to say what it is that you <i>don't</i> want. Instead of using <code>--whitelist</code> to say what you want - to mirror you can use <code>--blacklist</code> to say what to exclude. This also takes a regular expression argument. - However, <code>--blacklist</code> is not supported when the new consumer has been enabled (i.e. when <code>bootstrap.servers</code> - has been defined in the consumer configuration). - <p> + Combining mirroring with the configuration <code>auto.create.topics.enable=true</code> makes it possible to have a replica cluster that will automatically create and replicate all data in a source cluster even as new topics are added. 
<h4><a id="basic_ops_consumer_lag" href="#basic_ops_consumer_lag">Checking consumer position</a></h4> @@ -140,29 +135,15 @@ <h4><a id="basic_ops_consumer_lag" href="#basic_ops_consumer_lag">Checking consu <pre class="brush: bash;"> > bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group my-group - Note: This will only show information about consumers that use the Java consumer API (non-ZooKeeper-based consumers). - TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID my-topic 0 2 4 2 consumer-1-029af89c-873c-4751-a720-cefd41a669d6 /127.0.0.1 consumer-1 my-topic 1 2 3 1 consumer-1-029af89c-873c-4751-a720-cefd41a669d6 /127.0.0.1 consumer-1 my-topic 2 2 3 1 consumer-2-42c1abd4-e3b2-425d-a8bb-e1ea49b29bb2 /127.0.0.1 consumer-2 </pre> - This tool also works with ZooKeeper-based consumers: - <pre class="brush: bash;"> - > bin/kafka-consumer-groups.sh --zookeeper localhost:2181 --describe --group my-group - - Note: This will only show information about consumers that use ZooKeeper (not those using the Java consumer API). - - TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID - my-topic 0 2 4 2 my-group_consumer-1 - my-topic 1 2 3 1 my-group_consumer-1 - my-topic 2 2 3 1 my-group_consumer-2 - </pre> - <h4><a id="basic_ops_consumer_group" href="#basic_ops_consumer_group">Managing Consumer Groups</a></h4> - With the ConsumerGroupCommand tool, we can list, describe, or delete consumer groups. When using the <a href="http://kafka.apache.org/documentation.html#newconsumerapi">new consumer API</a> (where the broker handles coordination of partition handling and rebalance), the group can be deleted manually, or automatically when the last committed offset for that group expires. Manual deletion works only if the group does not have any active members. + With the ConsumerGroupCommand tool, we can list, describe, or delete the consumer groups. 
The consumer group can be deleted manually, or automatically when the last committed offset for that group expires. Manual deletion works only if the group does not have any active members. For example, to list all consumer groups across all topics: @@ -186,7 +167,7 @@ <h4><a id="basic_ops_consumer_group" href="#basic_ops_consumer_group">Managing C topic3 2 243655 398812 155157 consumer4-117fe4d3-c6c1-4178-8ee9-eb4a3954bee0 /127.0.0.1 consumer4 </pre> - There are a number of additional "describe" options that can be used to provide more detailed information about a consumer group that uses the new consumer API: + There are a number of additional "describe" options that can be used to provide more detailed information about a consumer group: <ul> <li>--members: This option provides the list of all active members in the consumer group. <pre class="brush: bash;"> @@ -225,7 +206,6 @@ <h4><a id="basic_ops_consumer_group" href="#basic_ops_consumer_group">Managing C <pre class="brush: bash;"> > bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --delete --group my-group --group my-other-group - Note: This will not show information about old Zookeeper-based consumers. Deletion of requested consumer groups ('my-group', 'my-other-group') was successful. 
</pre> @@ -660,14 +640,8 @@ <h3><a id="datacenters" href="#datacenters">6.2 Datacenters</a></h3> <h3><a id="config" href="#config">6.3 Kafka Configuration</a></h3> <h4><a id="clientconfig" href="#clientconfig">Important Client Configurations</a></h4> - The most important old Scala producer configurations control - <ul> - <li>acks</li> - <li>compression</li> - <li>sync vs async production</li> - <li>batch size (for async producers)</li> - </ul> - The most important new Java producer configurations control + + The most important producer configurations are: <ul> <li>acks</li> <li>compression</li> @@ -805,7 +779,7 @@ <h5><a id="ext4" href="#ext4">EXT4 Notes</a></h5> <h3><a id="monitoring" href="#monitoring">6.6 Monitoring</a></h3> - Kafka uses Yammer Metrics for metrics reporting in the server and Scala clients. The Java clients use Kafka Metrics, a built-in metrics registry that minimizes transitive dependencies pulled into client applications. Both expose metrics via JMX and can be configured to report stats using pluggable stats reporters to hook up to your monitoring system. + Kafka uses Yammer Metrics for metrics reporting in the server. The Java clients use Kafka Metrics, a built-in metrics registry that minimizes transitive dependencies pulled into client applications. Both expose metrics via JMX and can be configured to report stats using pluggable stats reporters to hook up to your monitoring system. <p> All Kafka rate metrics have a corresponding cumulative count metric with suffix <code>-total</code>. For example, <code>records-consumed-rate</code> has a corresponding metric named <code>records-consumed-total</code>. @@ -985,10 +959,7 @@ <h3><a id="monitoring" href="#monitoring">6.6 Monitoring</a></h3> </tr> <tr> <td>Number of messages the consumer lags behind the producer by. 
Published by the consumer, not broker.</td> - <td> - <p><em>Old consumer:</em> kafka.consumer:type=ConsumerFetcherManager,name=MaxLag,clientId=([-.\w]+)</p> - <p><em>New consumer:</em> kafka.consumer:type=consumer-fetch-manager-metrics,client-id={client-id} Attribute: records-lag-max</p> - </td> + <td>kafka.consumer:type=consumer-fetch-manager-metrics,client-id={client-id} Attribute: records-lag-max</td> <td></td> </tr> <tr> @@ -1295,11 +1266,11 @@ <h5><a id="producer_sender_monitoring" href="#producer_sender_monitoring">Produc <!--#include virtual="generated/producer_metrics.html" --> - <h4><a id="new_consumer_monitoring" href="#new_consumer_monitoring">New consumer monitoring</a></h4> + <h4><a id="consumer_monitoring" href="#consumer_monitoring">consumer monitoring</a></h4> - The following metrics are available on new consumer instances. + The following metrics are available on consumer instances. - <h5><a id="new_consumer_group_monitoring" href="#new_consumer_group_monitoring">Consumer Group Metrics</a></h5> + <h5><a id="consumer_group_monitoring" href="#consumer_group_monitoring">Consumer Group Metrics</a></h5> <table class="data-table"> <tbody> <tr> @@ -1395,7 +1366,7 @@ <h5><a id="new_consumer_group_monitoring" href="#new_consumer_group_monitoring"> </tbody> </table> - <h5><a id="new_consumer_fetch_monitoring" href="#new_consumer_fetch_monitoring">Consumer Fetch Metrics</a></h5> + <h5><a id="consumer_fetch_monitoring" href="#consumer_fetch_monitoring">Consumer Fetch Metrics</a></h5> <!--#include virtual="generated/consumer_metrics.html" --> diff --git a/docs/security.html b/docs/security.html index 743d673fe80..d7859e08d72 100644 --- a/docs/security.html +++ b/docs/security.html @@ -1286,7 +1286,7 @@ <h4><a id="zk_authz_new" href="#zk_authz_new">7.6.1 New clusters</a></h4> <li> Set the configuration property <tt>zookeeper.set.acl</tt> in each broker to true</li> </ol> - The metadata stored in ZooKeeper for the Kafka cluster is world-readable, but can only be 
modified by the brokers. The rationale behind this decision is that the data stored in ZooKeeper is not sensitive, but inappropriate manipulation of that data can cause cluster disruption. We also recommend limiting the access to ZooKeeper via network segmentation (only brokers and some admin tools need access to ZooKeeper if the new Java consumer and producer clients are used). + The metadata stored in ZooKeeper for the Kafka cluster is world-readable, but can only be modified by the brokers. The rationale behind this decision is that the data stored in ZooKeeper is not sensitive, but inappropriate manipulation of that data can cause cluster disruption. We also recommend limiting the access to ZooKeeper via network segmentation (only brokers and some admin tools need access to ZooKeeper). <h4><a id="zk_authz_migration" href="#zk_authz_migration">7.6.2 Migrating clusters</a></h4> If you are running a version of Kafka that does not support security or simply with security disabled, and you want to make the cluster secure, then you need to execute the following steps to enable ZooKeeper authentication with minimal disruption to your operations: diff --git a/docs/toc.html b/docs/toc.html index e7d939e72b7..f1897fe3a0c 100644 --- a/docs/toc.html +++ b/docs/toc.html @@ -36,7 +36,6 @@ <li><a href="/{{version}}/documentation/streams">2.3 Streams API</a> <li><a href="#connectapi">2.4 Connect API</a> <li><a href="#adminapi">2.5 AdminClient API</a> - <li><a href="#legacyapis">2.6 Legacy APIs</a> </ul> </li> <li><a href="#configuration">3. 
Configuration</a> @@ -45,10 +44,6 @@ <li><a href="#topicconfigs">3.2 Topic Configs</a> <li><a href="#producerconfigs">3.3 Producer Configs</a> <li><a href="#consumerconfigs">3.4 Consumer Configs</a> - <ul> - <li><a href="#newconsumerconfigs">3.4.1 New Consumer Configs</a> - <li><a href="#oldconsumerconfigs">3.4.2 Old Consumer Configs</a> - </ul> <li><a href="#connectconfigs">3.5 Kafka Connect Configs</a> <li><a href="#streamsconfigs">3.6 Kafka Streams Configs</a> <li><a href="#adminclientconfigs">3.7 AdminClient Configs</a> ---------------------------------------------------------------- > Update documentation to remove references to old producers and consumers > ------------------------------------------------------------------------ > > Key: KAFKA-7063 > URL: https://issues.apache.org/jira/browse/KAFKA-7063 > Project: Kafka > Issue Type: Improvement > Reporter: Ismael Juma > Assignee: Manikumar > Priority: Major > Labels: newbie > > We should also remove any mention of "new consumer" or "new producer". They > should just be "producer" and "consumer". > Finally, any mention of "Scala producer/consumer/client" should also be > removed.