[jira] [Created] (KAFKA-7813) JmxTool throws NPE when --object-name is omitted
Attila Sasvari created KAFKA-7813: - Summary: JmxTool throws NPE when --object-name is omitted Key: KAFKA-7813 URL: https://issues.apache.org/jira/browse/KAFKA-7813 Project: Kafka Issue Type: Bug Reporter: Attila Sasvari

Running the JMX tool without the --object-name parameter results in a NullPointerException:

{code}
$ bin/kafka-run-class.sh kafka.tools.JmxTool --jmx-url service:jmx:rmi:///jndi/rmi://127.0.0.1:/jmxrmi
...
Exception in thread "main" java.lang.NullPointerException
at kafka.tools.JmxTool$$anonfun$3.apply(JmxTool.scala:143)
at kafka.tools.JmxTool$$anonfun$3.apply(JmxTool.scala:143)
at scala.collection.LinearSeqOptimized$class.exists(LinearSeqOptimized.scala:93)
at scala.collection.immutable.List.exists(List.scala:84)
at kafka.tools.JmxTool$.main(JmxTool.scala:143)
at kafka.tools.JmxTool.main(JmxTool.scala)
{code}

The documentation of the tool says:

{code}
--object-name A JMX object name to use as a query. This can contain wild cards, and this option can be given multiple times to specify more than one query. If no objects are specified all objects will be queried.
{code}

Running the tool with {{--object-name ''}} also results in an NPE:

{code}
$ bin/kafka-run-class.sh kafka.tools.JmxTool --jmx-url service:jmx:rmi:///jndi/rmi://127.0.0.1:/jmxrmi --object-name ''
...
Exception in thread "main" java.lang.NullPointerException
at kafka.tools.JmxTool$.main(JmxTool.scala:197)
at kafka.tools.JmxTool.main(JmxTool.scala)
{code}

Running the tool with --object-name but without an argument fails with an OptionMissingRequiredArgumentException:

{code}
$ bin/kafka-run-class.sh kafka.tools.JmxTool --jmx-url service:jmx:rmi:///jndi/rmi://127.0.0.1:/jmxrmi --object-name
Exception in thread "main" joptsimple.OptionMissingRequiredArgumentException: Option object-name requires an argument
at joptsimple.RequiredArgumentOptionSpec.detectOptionArgument(RequiredArgumentOptionSpec.java:48)
at joptsimple.ArgumentAcceptingOptionSpec.handleOption(ArgumentAcceptingOptionSpec.java:257)
at joptsimple.OptionParser.handleLongOptionToken(OptionParser.java:513)
at joptsimple.OptionParserState$2.handleArgument(OptionParserState.java:56)
at joptsimple.OptionParser.parse(OptionParser.java:396)
at kafka.tools.JmxTool$.main(JmxTool.scala:104)
at kafka.tools.JmxTool.main(JmxTool.scala)
{code}

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
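For reference, the JMX API itself already supports the documented "query everything" behavior: passing null to MBeanServerConnection.queryNames() matches every registered MBean, so the tool could fall back to a null query instead of dereferencing a missing option. A minimal sketch against the local platform MBean server (illustrative only, not JmxTool's actual code):

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

public class QueryAllMBeans {
    public static void main(String[] args) throws Exception {
        // For illustration we query the local platform MBean server;
        // JmxTool would use an MBeanServerConnection obtained over RMI.
        MBeanServerConnection conn = ManagementFactory.getPlatformMBeanServer();
        // Passing null for both the name pattern and the query expression
        // matches every registered MBean -- the behavior the documentation
        // promises when no --object-name is given.
        Set<ObjectName> all = conn.queryNames(null, null);
        System.out.println("MBeans found: " + all.size());
    }
}
```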
[jira] [Created] (KAFKA-7752) zookeeper-security-migration.sh does not remove ACL on kafka-acl-extended
Attila Sasvari created KAFKA-7752: - Summary: zookeeper-security-migration.sh does not remove ACL on kafka-acl-extended Key: KAFKA-7752 URL: https://issues.apache.org/jira/browse/KAFKA-7752 Project: Kafka Issue Type: Bug Components: tools Affects Versions: 2.0.0 Reporter: Attila Sasvari

I executed {{zookeeper-security-migration.sh --zookeeper.connect $(hostname -f):2181/kafka --zookeeper.acl secure}} to secure the Kafka znodes and then {{zookeeper-security-migration.sh --zookeeper.connect $(hostname -f):2181/kafka --zookeeper.acl unsecure}} to unsecure them. I noticed that the tool did not remove the ACLs on certain znodes:

{code}
] getAcl /kafka/kafka-acl-extended
'world,'anyone
: r
'sasl,'kafka
: cdrwa
{code}

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KAFKA-7696) kafka-delegation-tokens.sh using a config file that contains security.protocol=SASL_PLAINTEXT throws OutOfMemoryError if it tries to connect to an SSL-enabled secured br
[ https://issues.apache.org/jira/browse/KAFKA-7696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Sasvari resolved KAFKA-7696. --- Resolution: Duplicate > kafka-delegation-tokens.sh using a config file that contains > security.protocol=SASL_PLAINTEXT throws OutOfMemoryError if it tries to > connect to an SSL-enabled secured broker > - > > Key: KAFKA-7696 > URL: https://issues.apache.org/jira/browse/KAFKA-7696 > Project: Kafka > Issue Type: Bug > Components: tools >Affects Versions: 1.1.0, 2.0.0 >Reporter: Attila Sasvari >Assignee: Viktor Somogyi >Priority: Major > > When the command-config file of kafka-delegation-tokens contain > security.protocol=SASL_PLAINTEXT instead of SASL_SSL (i.e. due to a user > error), the process throws a java.lang.OutOfMemoryError upon connection > attempt to a secured (i.e. Kerberized, SSL-enabled) Kafka broker. > {code} > [2018-12-03 11:27:13,221] ERROR Uncaught exception in thread > 'kafka-admin-client-thread | adminclient-1': > (org.apache.kafka.common.utils.KafkaThread) > java.lang.OutOfMemoryError: Java heap space > at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57) > at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) > at > org.apache.kafka.common.memory.MemoryPool$1.tryAllocate(MemoryPool.java:30) > at > org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:112) > at > org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.receiveResponseOrToken(SaslClientAuthenticator.java:407) > at > org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.receiveKafkaResponse(SaslClientAuthenticator.java:497) > at > org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.authenticate(SaslClientAuthenticator.java:207) > at > org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:173) > at > org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:533) > at 
org.apache.kafka.common.network.Selector.poll(Selector.java:468) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:535) > at > org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.run(KafkaAdminClient.java:1125) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7696) kafka-delegation-tokens.sh using a config file that contains security.protocol=SASL_PLAINTEXT throws OutOfMemoryError if it tries to connect to an SSL-enabled secured bro
Attila Sasvari created KAFKA-7696: - Summary: kafka-delegation-tokens.sh using a config file that contains security.protocol=SASL_PLAINTEXT throws OutOfMemoryError if it tries to connect to an SSL-enabled secured broker Key: KAFKA-7696 URL: https://issues.apache.org/jira/browse/KAFKA-7696 Project: Kafka Issue Type: Bug Components: tools Affects Versions: 2.0.0, 1.1.0 Reporter: Attila Sasvari

When the command-config file of kafka-delegation-tokens contains security.protocol=SASL_PLAINTEXT instead of SASL_SSL (i.e. due to a user error), the process throws a java.lang.OutOfMemoryError upon connection attempt to a secured (i.e. Kerberized, SSL-enabled) Kafka broker.

{code}
[2018-12-03 11:27:13,221] ERROR Uncaught exception in thread 'kafka-admin-client-thread | adminclient-1': (org.apache.kafka.common.utils.KafkaThread)
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
at org.apache.kafka.common.memory.MemoryPool$1.tryAllocate(MemoryPool.java:30)
at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:112)
at org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.receiveResponseOrToken(SaslClientAuthenticator.java:407)
at org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.receiveKafkaResponse(SaslClientAuthenticator.java:497)
at org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.authenticate(SaslClientAuthenticator.java:207)
at org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:173)
at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:533)
at org.apache.kafka.common.network.Selector.poll(Selector.java:468)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:535)
at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.run(KafkaAdminClient.java:1125)
at java.lang.Thread.run(Thread.java:745)
{code}

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7691) Encypt-then-MAC Delegation token metadata
Attila Sasvari created KAFKA-7691: - Summary: Encypt-then-MAC Delegation token metadata Key: KAFKA-7691 URL: https://issues.apache.org/jira/browse/KAFKA-7691 Project: Kafka Issue Type: Improvement Reporter: Attila Sasvari Currently delegation token metadata is stored unencrypted in Zookeeper. Kafka brokers could implement a strategy called [Encrypt-then-MAC|https://en.wikipedia.org/wiki/Authenticated_encryption#Encrypt-then-MAC_(EtM)] to encrypt sensitive metadata information about delegation tokens. For more details, please read https://cwiki.apache.org/confluence/display/KAFKA/KIP-395%3A+Encypt-then-MAC+Delegation+token+metadata -- This message was sent by Atlassian JIRA (v7.6.3#76005)
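As a rough illustration of the Encrypt-then-MAC construction the KIP proposes (not the KIP's actual design; the key handling, algorithms, and byte layout here are placeholder choices), a sketch using AES-CBC for encryption and HMAC-SHA256 for authentication:

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.Mac;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class EtmSketch {
    // Encrypt-then-MAC: encrypt first, then MAC the IV and ciphertext,
    // so tampering is detected before any decryption is attempted.
    static byte[] seal(SecretKey encKey, SecretKey macKey, byte[] plaintext) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, encKey);
        byte[] iv = cipher.getIV();        // random 16-byte IV picked by init()
        byte[] ct = cipher.doFinal(plaintext);
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(macKey);
        mac.update(iv);
        byte[] tag = mac.doFinal(ct);      // 32-byte tag over IV || ciphertext
        return ByteBuffer.allocate(iv.length + ct.length + tag.length)
                .put(iv).put(ct).put(tag).array();
    }

    static byte[] open(SecretKey encKey, SecretKey macKey, byte[] sealed) throws Exception {
        byte[] iv = Arrays.copyOfRange(sealed, 0, 16);
        byte[] ct = Arrays.copyOfRange(sealed, 16, sealed.length - 32);
        byte[] tag = Arrays.copyOfRange(sealed, sealed.length - 32, sealed.length);
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(macKey);
        mac.update(iv);
        byte[] expected = mac.doFinal(ct);
        // Constant-time comparison; reject before touching the cipher.
        if (!MessageDigest.isEqual(expected, tag)) throw new SecurityException("MAC mismatch");
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.DECRYPT_MODE, encKey, new IvParameterSpec(iv));
        return cipher.doFinal(ct);
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey encKey = kg.generateKey();
        SecretKey macKey = new SecretKeySpec(kg.generateKey().getEncoded(), "HmacSHA256");
        byte[] sealed = seal(encKey, macKey, "tokenID=abc,owner=alice".getBytes("UTF-8"));
        System.out.println(new String(open(encKey, macKey, sealed), "UTF-8"));
    }
}
```

The point of the ordering is that a broker can reject a tampered znode payload on MAC verification alone, without ever feeding attacker-controlled ciphertext to the decryption path.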
[jira] [Created] (KAFKA-7455) JmxTool cannot connect to an SSL-enabled JMX RMI port
Attila Sasvari created KAFKA-7455: - Summary: JmxTool cannot connect to an SSL-enabled JMX RMI port Key: KAFKA-7455 URL: https://issues.apache.org/jira/browse/KAFKA-7455 Project: Kafka Issue Type: Bug Components: tools Reporter: Attila Sasvari

When JmxTool tries to connect to an SSL-enabled JMX RMI port with JMXConnectorFactory's connect(), the connection attempt results in a "java.rmi.ConnectIOException: non-JRMP server at remote endpoint":

{code}
$ export KAFKA_OPTS="-Djavax.net.ssl.trustStore=/tmp/kafka.server.truststore.jks -Djavax.net.ssl.trustStorePassword=test"
$ bin/kafka-run-class.sh kafka.tools.JmxTool --object-name "kafka.server:type=kafka-metrics-count" --jmx-url service:jmx:rmi:///jndi/rmi://localhost:9393/jmxrmi
java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.CommunicationException [Root exception is java.rmi.ConnectIOException: non-JRMP server at remote endpoint]
at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:369)
at javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:270)
at kafka.tools.JmxTool$.main(JmxTool.scala:120)
at kafka.tools.JmxTool.main(JmxTool.scala)
{code}

The problem is that {{JmxTool}} does not specify {{SslRMIClientSocketFactory}} when it tries to connect (https://github.com/apache/kafka/blob/70d90c371833b09cf934c8c2358171433892a085/core/src/main/scala/kafka/tools/JmxTool.scala#L120):

{code}
jmxc = JMXConnectorFactory.connect(url, null)
{code}

To connect to a secured RMI port, it should pass an environment map that contains a {{("com.sun.jndi.rmi.factory.socket", new SslRMIClientSocketFactory)}} entry. More info:
- https://docs.oracle.com/cd/E19698-01/816-7609/security-35/index.html
- https://docs.oracle.com/javase/8/docs/technotes/guides/management/agent.html

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
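A minimal sketch of the suggested fix: build the environment map with the SSL socket factory entry and hand it to JMXConnectorFactory.connect() instead of null. The URL is the one from the report; the class name is illustrative:

```java
import java.util.HashMap;
import java.util.Map;
import javax.management.remote.JMXServiceURL;
import javax.rmi.ssl.SslRMIClientSocketFactory;

public class JmxSslEnv {
    public static void main(String[] args) throws Exception {
        // Instead of passing null as the environment, hand connect() a map
        // telling the JNDI/RMI registry lookup to use an SSL socket factory.
        Map<String, Object> env = new HashMap<>();
        env.put("com.sun.jndi.rmi.factory.socket", new SslRMIClientSocketFactory());
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9393/jmxrmi");
        // JMXConnectorFactory.connect(url, env) would now negotiate TLS on
        // the registry connection; not invoked here because it needs a live,
        // SSL-enabled JMX endpoint.
        System.out.println("env prepared for " + url);
    }
}
```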
[jira] [Created] (KAFKA-7418) Add '--help' option to all available Kafka CLI commands
Attila Sasvari created KAFKA-7418: - Summary: Add '--help' option to all available Kafka CLI commands Key: KAFKA-7418 URL: https://issues.apache.org/jira/browse/KAFKA-7418 Project: Kafka Issue Type: Improvement Components: tools Reporter: Attila Sasvari

Currently, the '--help' option is not recognized by some Kafka commands. For example:

{code}
$ kafka-console-producer --help
help is not a recognized option
{code}

However, the '--help' option is supported by other commands:

{code}
$ kafka-verifiable-producer --help
usage: verifiable-producer [-h] --topic TOPIC --broker-list HOST1:PORT1[,HOST2:PORT2[...]]
[--max-messages MAX-MESSAGES] [--throughput THROUGHPUT] [--acks ACKS]
[--producer.config CONFIG_FILE] [--message-create-time CREATETIME]
[--value-prefix VALUE-PREFIX] ...
{code}

To provide a consistent user experience, it would be nice to add a '--help' option to all Kafka commands.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7392) Allow to specify subnet for Docker containers using standard CIDR notation
Attila Sasvari created KAFKA-7392: - Summary: Allow to specify subnet for Docker containers using standard CIDR notation Key: KAFKA-7392 URL: https://issues.apache.org/jira/browse/KAFKA-7392 Project: Kafka Issue Type: Improvement Components: system tests Reporter: Attila Sasvari

During Kafka system test execution, the IP range of the Docker subnet 'ducknet' is allocated by Docker:

{code}
docker network inspect ducknet
[
    {
        "Name": "ducknet",
        "Id": "f4325c524feee777817b9cc57b91634e20de96127409c1906c2c156bfeb4beeb",
        "Created": "2018-09-09T11:53:40.4332613Z",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": {},
            "Config": [
                {
                    "Subnet": "172.23.0.0/16",
                    "Gateway": "172.23.0.1"
                }
            ]
        },
{code}

The default bridge (docker0) can be controlled [externally|https://success.docker.com/article/how-do-i-configure-the-default-bridge-docker0-network-for-docker-engine-to-a-different-subnet] through /etc/docker/daemon.json; the subnet created by ducknet, however, is not configurable. This can be a problem, as many businesses make extensive use of the [RFC1918|https://tools.ietf.org/html/rfc1918] private address space (such as 172.16.0.0/12: 172.16.0.0 - 172.31.255.255) for internal networks (e.g. VPN).

h4. Proposed changes:
- Introduce a new subnet argument that can be used by {{ducker-ak up}} to specify the IP range using standard CIDR notation, and extend the help message with the following:

{code}
If --subnet is specified, the default Docker subnet is overridden by the given IP address and netmask, using standard CIDR notation. For example: 192.168.1.5/24.
{code}

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
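A sketch of how the proposed --subnet value could be validated before being handed to Docker (e.g. to {{docker network create --subnet ...}}); both the flag and the helper are hypothetical, not existing ducker-ak code:

```java
import java.net.InetAddress;

public class CidrCheck {
    // Minimal validation of a hypothetical --subnet argument in standard
    // IPv4 CIDR notation, e.g. 192.168.1.0/24. Deliberately simple: it
    // checks "address/prefix" shape, address parseability, and prefix range.
    static boolean validCidr(String cidr) {
        String[] parts = cidr.split("/");
        if (parts.length != 2) return false;
        try {
            InetAddress.getByName(parts[0]);          // address part parses
            int prefix = Integer.parseInt(parts[1]);  // prefix length 0..32
            return prefix >= 0 && prefix <= 32;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(validCidr("192.168.1.0/24")); // well-formed
        System.out.println(validCidr("172.16.0.0/33"));  // prefix out of range
    }
}
```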
[jira] [Created] (KAFKA-7289) Performance tools should allow user to specify output type
Attila Sasvari created KAFKA-7289: - Summary: Performance tools should allow user to specify output type Key: KAFKA-7289 URL: https://issues.apache.org/jira/browse/KAFKA-7289 Project: Kafka Issue Type: Improvement Components: tools Affects Versions: 2.0.0 Reporter: Attila Sasvari

Currently, org.apache.kafka.tools.ProducerPerformance and kafka.tools.ConsumerPerformance do not provide command line options to specify output type(s). Sample output of ProducerPerformance is as follows:

{code}
1000 records sent, 48107.452807 records/sec (9.18 MB/sec), 3284.34 ms avg latency, 3858.00 ms max latency, 3313 ms 50th, 3546 ms 95th, 3689 ms 99th, 3842 ms 99.9th.
{code}

It would, however, be nice to allow users to generate performance reports in a machine-readable format (such as CSV and JSON). This way, performance results could be easily processed by external applications (e.g. displayed in charts). It will probably require a KIP.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
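To illustrate, the figures in the sample line above could be re-emitted as a CSV row or a JSON object roughly like this (the field names are invented for the sketch, not a proposed schema; agreeing on one would be part of the KIP):

```java
import java.util.Locale;

public class PerfReport {
    // Sketch of machine-readable output for the performance tools: the
    // figures the tools already print, reformatted as CSV or JSON.
    static String toCsv(long records, double recsPerSec, double mbPerSec,
                        double avgMs, double maxMs) {
        return String.format(Locale.ROOT, "%d,%.2f,%.2f,%.2f,%.2f",
                records, recsPerSec, mbPerSec, avgMs, maxMs);
    }

    static String toJson(long records, double recsPerSec, double mbPerSec,
                         double avgMs, double maxMs) {
        return String.format(Locale.ROOT,
                "{\"records\":%d,\"recordsPerSec\":%.2f,\"mbPerSec\":%.2f,"
                        + "\"avgLatencyMs\":%.2f,\"maxLatencyMs\":%.2f}",
                records, recsPerSec, mbPerSec, avgMs, maxMs);
    }

    public static void main(String[] args) {
        // The figures from the sample ProducerPerformance line above.
        System.out.println(toCsv(1000, 48107.452807, 9.18, 3284.34, 3858.00));
        System.out.println(toJson(1000, 48107.452807, 9.18, 3284.34, 3858.00));
    }
}
```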
[jira] [Resolved] (KAFKA-7159) mark configuration files in confluent-kafka RPM SPEC file
[ https://issues.apache.org/jira/browse/KAFKA-7159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Sasvari resolved KAFKA-7159. --- Resolution: Won't Fix > mark configuration files in confluent-kafka RPM SPEC file > - > > Key: KAFKA-7159 > URL: https://issues.apache.org/jira/browse/KAFKA-7159 > Project: Kafka > Issue Type: Improvement > Components: packaging >Affects Versions: 1.1.0 > Environment: RHEL7 >Reporter: Robert Fabisiak >Priority: Trivial > Labels: rpm > > All configuration files in kafka RPM SPEC file should be marked with %config > prefix in %files section. > This would prevent overwrites during install/upgrade and uninstall operations > [https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html-single/rpm_packaging_guide/index#files] > It's especially important to save configuration during package upgrades. > Section to change in SPEC file: > {code:java} > %files > %config(noreplace) %{_sysconfdir}/kafka/*.conf > %config(noreplace) %{_sysconfdir}/kafka/*.properties > {code} > It would also be good to mark documentation files with %doc -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-6960) Remove the methods from the internal Scala AdminClient that are provided by the new AdminClient
Attila Sasvari created KAFKA-6960: - Summary: Remove the methods from the internal Scala AdminClient that are provided by the new AdminClient Key: KAFKA-6960 URL: https://issues.apache.org/jira/browse/KAFKA-6960 Project: Kafka Issue Type: Improvement Affects Versions: 2.0.0 Reporter: Attila Sasvari This is a follow-up task of KAFKA-6884. We should remove all the methods from the internal Scala AdminClient that are provided by the new AdminClient. To "safe delete" them (i.e. {{deleteConsumerGroups, describeConsumerGroup, listGroups, listAllGroups, listAllGroupsFlattened}}), related tests need to be reviewed and adjusted (for example: the tests in core_tests and streams_test). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-6883) KafkaShortnamer should allow to convert Kerberos principal name to upper case user name
Attila Sasvari created KAFKA-6883: - Summary: KafkaShortnamer should allow to convert Kerberos principal name to upper case user name Key: KAFKA-6883 URL: https://issues.apache.org/jira/browse/KAFKA-6883 Project: Kafka Issue Type: Improvement Reporter: Attila Sasvari KAFKA-5764 implemented support to convert Kerberos principal name to lower case Linux user name via auth_to_local rules. As a follow-up, KafkaShortnamer could be further extended to allow converting principal names to uppercase by appending /U to the rule. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
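For illustration, assuming the proposed /U suffix mirrors the existing /L one added by KAFKA-5764, a broker configuration using it might look like this (hypothetical until the feature is implemented):

```
sasl.kerberos.principal.to.local.rules=RULE:[1:$1@$0](.*@EXAMPLE.COM)s/@.*///U,DEFAULT
```

Such a rule would strip the realm and then uppercase the result, mapping e.g. {{alice@EXAMPLE.COM}} to the local user {{ALICE}}.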
[jira] [Resolved] (KAFKA-6799) Consumer livelock during consumer group rebalance
[ https://issues.apache.org/jira/browse/KAFKA-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Sasvari resolved KAFKA-6799. --- Resolution: Information Provided > Consumer livelock during consumer group rebalance > - > > Key: KAFKA-6799 > URL: https://issues.apache.org/jira/browse/KAFKA-6799 > Project: Kafka > Issue Type: Bug > Components: clients, consumer >Affects Versions: 1.0.0, 0.11.0.2, 1.1.0 >Reporter: Pierre-Henri Dezanneau >Assignee: Attila Sasvari >Priority: Critical > > We have the following environment: > * 1 kafka cluster with 3 brokers > * 1 topic with 3 partitions > * 1 producer > * 1 consumer group with 3 consumers > From this setup, we remove one broker from the cluster, the hard way, by > simply killing it. Quite often, we see that the consumer group is not > rebalanced correctly. By that I mean that all 3 consumers stop consuming and > get stuck in a loop, forever. > The thread dump shows that the consumer threads aren't blocked but run > forever in {{AbstractCoordinator.ensureCoordinatorReady}}, holding a lock due > to the {{synchronized}} keyword on the calling method. Heartbeat threads are > blocked, waiting for the consumer threads to release the lock. This situation > prevents all consumers from consuming any more records. > We built a simple project which seems to reliably demonstrate this: > {code:sh} > $ mkdir -p /tmp/sandbox && cd /tmp/sandbox > $ git clone https://github.com/phdezann/helloworld-kafka-livelock > $ cd helloworld-kafka-livelock && ./spin.sh > ... 
> livelock detected > {code} > {code:sh|title=Consumer thread|borderStyle=solid} > "kafka-consumer-1@10733" daemon prio=5 tid=0x31 nid=NA runnable > java.lang.Thread.State: RUNNABLE >blocks kafka-coordinator-heartbeat-thread | helloWorldGroup@10728 > at sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:-1) > at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) > at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93) > at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) > - locked <0x2a15> (a sun.nio.ch.EPollSelectorImpl) > - locked <0x2a16> (a java.util.Collections$UnmodifiableSet) > - locked <0x2a17> (a sun.nio.ch.Util$3) > at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) > at org.apache.kafka.common.network.Selector.select(Selector.java:684) > at org.apache.kafka.common.network.Selector.poll(Selector.java:408) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:460) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:261) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224) > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:156) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:228) > - locked <0x2a0c> (a > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:205) > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:279) > at > org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1149) > at > 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1115) > at > org.helloworld.kafka.bus.HelloWorldKafkaListener.lambda$createConsumerInDedicatedThread$0(HelloWorldKafkaListener.java:45) > at > org.helloworld.kafka.bus.HelloWorldKafkaListener$$Lambda$42.1776656466.run(Unknown > Source:-1) > at java.lang.Thread.run(Thread.java:748) > {code} > {code:sh|title=Heartbeat thread|borderStyle=solid} > "kafka-coordinator-heartbeat-thread | helloWorldGroup@10728" daemon prio=5 > tid=0x36 nid=NA waiting for monitor entry > java.lang.Thread.State: BLOCKED >waiting for kafka-consumer-1@10733 to release lock on <0x2a0c> (a > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator) > at java.lang.Object.wait(Object.java:-1) > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$HeartbeatThread.run(AbstractCoordinator.java:955) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KAFKA-6703) MirrorMaker cannot make progress when any matched topic from a whitelist regexp has -1 leader
[ https://issues.apache.org/jira/browse/KAFKA-6703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Sasvari resolved KAFKA-6703. --- Resolution: Not A Bug It turned out I forgot to set the proper replication factor for {{__consumer_offsets}}. The default replication factor is 1, and the consumer group controller (determined by {{partitionFor(group)}} in GroupCoordinator) was down. After changing the replication factor to 3, I no longer experienced the issue. I still saw a couple of messages like {code:java} [2018-03-23 16:45:52,298] DEBUG [Consumer clientId=2-1, groupId=2] Leader for partition testR1P3-2 is unavailable for fetching offset (org.apache.kafka.clients.consumer.internals.Fetcher){code} but messages in other topics matched by the whitelist regexp were fetched by MirrorMaker. > MirrorMaker cannot make progress when any matched topic from a whitelist > regexp has -1 leader > - > > Key: KAFKA-6703 > URL: https://issues.apache.org/jira/browse/KAFKA-6703 > Project: Kafka > Issue Type: Bug >Affects Versions: 1.1.0 >Reporter: Attila Sasvari >Priority: Major > > Scenario: > - the MM whitelist regexp matches multiple topics > - the destination cluster has 5 brokers with multiple topics, replication factor 3 > - without partition reassignment, shut down 2 brokers > - suppose a topic has no leader any more because it was off-sync and the > leader and the rest of the replicas are hosted on the downed brokers. 
> - so we have 1 topic with some partitions with leader -1 > - the rest of the matching topics has 3 replicas with leaders > MM will not produce into any of the matched topics until: > - the "orphaned" topic removed or > - the partition reassign carried out from the downed brokers (suppose you > can turn these back on) > In the MirrorMaker logs, there are a lot of messages like the following ones: > {code} > [2018-03-22 19:55:32,522] DEBUG [Consumer clientId=consumer-1, > groupId=console-consumer-43054] Coordinator discovery failed, refreshing > metadata (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > [2018-03-22 19:55:32,522] DEBUG [Consumer clientId=consumer-1, > groupId=console-consumer-43054] Sending metadata request > (type=MetadataRequest, topics=) to node 192.168.1.102:9092 (id: 0 rack: > null) (org.apache.kafka.clients.NetworkClient) > [2018-03-22 19:55:32,525] DEBUG Updated cluster metadata version 10 to > Cluster(id = Y-qtoFP-RMq2uuVnkEKAAw, nodes = [192.168.1.102:9092 (id: 0 rack: > null)], partitions = [Partition(topic = testR1P2, partition = 1, leader = > none, replicas = [42], isr = [], offlineReplicas = [42]), Partition(topic = > testR1P1, partition = 0, leader = 0, replicas = [0], isr = [0], > offlineReplicas = []), Partition(topic = testAlive, partition = 0, leader = > 0, replicas = [0], isr = [0], offlineReplicas = []), Partition(topic = > testERRR, partition = 0, leader = 0, replicas = [0], isr = [0], > offlineReplicas = []), Partition(topic = testR1P2, partition = 0, leader = 0, > replicas = [0], isr = [0], offlineReplicas = [])]) > (org.apache.kafka.clients.Metadata) > [2018-03-22 19:55:32,525] DEBUG [Consumer clientId=consumer-1, > groupId=console-consumer-43054] Sending FindCoordinator request to broker > 192.168.1.102:9092 (id: 0 rack: null) > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > [2018-03-22 19:55:32,525] DEBUG [Consumer clientId=consumer-1, > groupId=console-consumer-43054] Received 
FindCoordinator response > ClientResponse(receivedTimeMs=1521744932525, latencyMs=0, disconnected=false, > requestHeader=RequestHeader(apiKey=FIND_COORDINATOR, apiVersion=1, > clientId=consumer-1, correlationId=19), > responseBody=FindCoordinatorResponse(throttleTimeMs=0, errorMessage='null', > error=COORDINATOR_NOT_AVAILABLE, node=:-1 (id: -1 rack: null))) > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > [2018-03-22 19:55:32,526] DEBUG [Consumer clientId=consumer-1, > groupId=console-consumer-43054] Group coordinator lookup failed: The > coordinator is not available. > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator) > {code} > Interestingly, if MirrorMaker uses {{zookeeper.connect}} in its consumer > properties file, then an OldConsumer is created, and it can make progress. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-6703) MirrorMaker cannot make progress when any matched topic from a whitelist regexp has -1 leader
Attila Sasvari created KAFKA-6703: - Summary: MirrorMaker cannot make progress when any matched topic from a whitelist regexp has -1 leader Key: KAFKA-6703 URL: https://issues.apache.org/jira/browse/KAFKA-6703 Project: Kafka Issue Type: Bug Affects Versions: 1.1.0 Reporter: Attila Sasvari

Scenario:
- the MM whitelist regexp matches multiple topics
- the destination cluster has 5 brokers with multiple topics, replication factor 3
- shut down 2 brokers without partition reassignment
- suppose a topic has no leader any more because it was off-sync and the leader and the rest of the replicas are hosted on the downed brokers
- so we have 1 topic with some partitions with leader -1
- the rest of the matching topics have 3 replicas with leaders

MM will not produce into any of the matched topics until:
- the "orphaned" topic is removed, or
- the partition reassignment is carried out from the downed brokers (suppose you can turn these back on)

In the MirrorMaker logs, there are a lot of messages like the following ones:

{code}
[2018-03-22 18:59:07,781] DEBUG [Consumer clientId=1-1, groupId=1] Sending FindCoordinator request to broker 192.168.1.102:9092 (id: 0 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2018-03-22 18:59:07,781] DEBUG [Consumer clientId=1-0, groupId=1] Sending FindCoordinator request to broker 192.168.1.102:9092 (id: 0 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2018-03-22 18:59:07,783] DEBUG [Consumer clientId=1-0, groupId=1] Received FindCoordinator response ClientResponse(receivedTimeMs=1521741547782, latencyMs=1, disconnected=false, requestHeader=RequestHeader(apiKey=FIND_COORDINATOR, apiVersion=1, clientId=1-0, correlationId=71), responseBody=FindCoordinatorResponse(throttleTimeMs=0, errorMessage='null', error=COORDINATOR_NOT_AVAILABLE, node=:-1 (id: -1 rack: null))) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2018-03-22 18:59:07,783] DEBUG [Consumer clientId=1-1, groupId=1] Received FindCoordinator response ClientResponse(receivedTimeMs=1521741547782, latencyMs=1, disconnected=false, requestHeader=RequestHeader(apiKey=FIND_COORDINATOR, apiVersion=1, clientId=1-1, correlationId=71), responseBody=FindCoordinatorResponse(throttleTimeMs=0, errorMessage='null', error=COORDINATOR_NOT_AVAILABLE, node=:-1 (id: -1 rack: null))) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2018-03-22 18:59:07,783] DEBUG [Consumer clientId=1-0, groupId=1] Group coordinator lookup failed: The coordinator is not available. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2018-03-22 18:59:07,783] DEBUG [Consumer clientId=1-1, groupId=1] Group coordinator lookup failed: The coordinator is not available. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2018-03-22 18:59:07,783] DEBUG [Consumer clientId=1-0, groupId=1] Coordinator discovery failed, refreshing metadata (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
[2018-03-22 18:59:07,783] DEBUG [Consumer clientId=1-1, groupId=1] Coordinator discovery failed, refreshing metadata (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
{code}

Interestingly, if MirrorMaker uses {{zookeeper.connect}} in its consumer properties file, then an OldConsumer is created, and it can make progress.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)