[jira] [Commented] (KAFKA-9535) Metadata not updated when consumer encounters FENCED_LEADER_EPOCH
[ https://issues.apache.org/jira/browse/KAFKA-9535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17034172#comment-17034172 ] ASF GitHub Bot commented on KAFKA-9535: --- abbccdda commented on pull request #8088: KAFKA-9535: Update metadata upon receiving FENCED_LEADER_EPOCH in ListOffset URL: https://github.com/apache/kafka/pull/8088 Today, if we attempt to list offsets with a fenced leader epoch, the consumer will retry indefinitely without updating the metadata. The fix is to trigger the metadata update call when we see `FENCED_LEADER_EPOCH`, even on a partial failure. `UNKNOWN_LEADER_EPOCH`, on the other hand, indicates metadata staleness on the broker side, so the consumer doesn't have to update its metadata. ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Metadata not updated when consumer encounters FENCED_LEADER_EPOCH > - > > Key: KAFKA-9535 > URL: https://issues.apache.org/jira/browse/KAFKA-9535 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.3.0, 2.4.0, 2.5.0 >Reporter: Boyang Chen >Priority: Major > > Inside the consumer Fetcher's handling of ListOffsetResponse, if we hit > `FENCED_LEADER_EPOCH` on the partition level, the client will blindly retry > without refreshing the metadata, creating a stuck state as the local leader > epoch never gets updated and constantly fails the broker check. -- This message was sent by Atlassian Jira (v8.3.4#803005)
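The decision described in the PR above can be sketched in isolation: a metadata refresh is needed only when the failure indicates that the consumer's own cached epoch is stale. A minimal, hypothetical sketch follows; `ListOffsetError` and `ListOffsetRetryPolicy` are illustrative names, not the consumer Fetcher's actual internals.

```java
// Hypothetical sketch of the retry decision described in the PR above.
// ListOffsetError and ListOffsetRetryPolicy are illustrative names,
// not classes from the Kafka consumer.
enum ListOffsetError { NONE, FENCED_LEADER_EPOCH, UNKNOWN_LEADER_EPOCH }

final class ListOffsetRetryPolicy {
    // FENCED_LEADER_EPOCH means the consumer's cached leader epoch is stale,
    // so a metadata refresh must happen before a retry can succeed.
    // UNKNOWN_LEADER_EPOCH means the broker's metadata is stale instead, so
    // retrying without a client-side refresh is enough.
    static boolean shouldUpdateMetadata(ListOffsetError error) {
        return error == ListOffsetError.FENCED_LEADER_EPOCH;
    }
}
```

In the real client, returning true here would correspond to requesting a metadata update before scheduling the retry, which is what breaks the infinite-retry loop.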
[jira] [Commented] (KAFKA-9525) Allow explicit rebalance triggering on the Consumer
[ https://issues.apache.org/jira/browse/KAFKA-9525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17034116#comment-17034116 ] ASF GitHub Bot commented on KAFKA-9525: --- ableegoldman commented on pull request #8087: KAFKA-9525: add enforceRebalance method to Consumer API URL: https://github.com/apache/kafka/pull/8087 As described in [KIP-568](https://cwiki.apache.org/confluence/display/KAFKA/KIP-568%3A+Explicit+rebalance+triggering+on+the+Consumer). Waiting on acceptance of the KIP to write the tests, on the off chance something changes. But rest assured unit tests are coming ⚡️ Will also kick off existing Streams system tests which leverage this new API (eg version probing, sometimes broker bounce) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Allow explicit rebalance triggering on the Consumer > --- > > Key: KAFKA-9525 > URL: https://issues.apache.org/jira/browse/KAFKA-9525 > Project: Kafka > Issue Type: Improvement > Components: clients >Reporter: Sophie Blee-Goldman >Assignee: Sophie Blee-Goldman >Priority: Major > Labels: needs-kip > > Currently the only way to explicitly trigger a rebalance is by unsubscribing > the consumer. This has two drawbacks: it does not work with static > membership, and it causes the consumer to revoke all its currently owned > partitions. Streams relies on being able to enforce a rebalance for its > version probing upgrade protocol and the upcoming KIP-441, both of which > should be able to work with static membership and be able to leverage the > improvements of KIP-429 to no longer revoke all owned partitions. > We should add an API that will allow users to explicitly trigger a rebalance > without going through #unsubscribe -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (KAFKA-9047) AdminClient group operations may not respect backoff
[ https://issues.apache.org/jira/browse/KAFKA-9047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanjana Kaundinya reassigned KAFKA-9047: Assignee: Sanjana Kaundinya (was: Vikas Singh) > AdminClient group operations may not respect backoff > > > Key: KAFKA-9047 > URL: https://issues.apache.org/jira/browse/KAFKA-9047 > Project: Kafka > Issue Type: Bug > Components: admin >Reporter: Jason Gustafson >Assignee: Sanjana Kaundinya >Priority: Major > > The retry logic for consumer group operations in the admin client is > complicated by the need to find the coordinator. Instead of simple retry > loops which send the same request over and over, we can get more complex > retry loops like the following: > # Send FindCoordinator to B -> Coordinator is A > # Send DescribeGroup to A -> NOT_COORDINATOR > # Go back to 1 > Currently we construct a new Call object for each step in this loop, which > means we lose some of the retry bookkeeping such as the last retry time and the > number of tries. This means it is possible to have tight retry loops which > bounce between steps 1 and 2 and do not respect the retry backoff. -- This message was sent by Atlassian Jira (v8.3.4#803005)
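The bookkeeping loss described above can be made concrete: if state like the following were carried across the FindCoordinator/DescribeGroups round trips, instead of being reset with each freshly constructed Call, the backoff would be respected. This is a self-contained sketch; `RetryState` is an illustrative name, not an actual AdminClient class.

```java
// Self-contained sketch of the retry bookkeeping discussed above. If this
// state survives across the FindCoordinator/DescribeGroups steps rather than
// being reset with each new Call object, tight retry loops are prevented.
// RetryState is a hypothetical name, not an AdminClient class.
final class RetryState {
    int tries = 0;
    long lastAttemptMs = Long.MIN_VALUE; // no attempt made yet

    // A retry is allowed only once the backoff since the last attempt has elapsed.
    boolean readyToRetry(long nowMs, long retryBackoffMs) {
        return nowMs >= lastAttemptMs + retryBackoffMs;
    }

    void recordAttempt(long nowMs) {
        tries++;
        lastAttemptMs = nowMs;
    }
}
```

The bug amounts to constructing a fresh `RetryState` on every coordinator hop, so `readyToRetry` always answers true.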
[jira] [Assigned] (KAFKA-8403) Suppress needs a Materialized variant
[ https://issues.apache.org/jira/browse/KAFKA-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjin Lee reassigned KAFKA-8403: -- Assignee: Dongjin Lee > Suppress needs a Materialized variant > - > > Key: KAFKA-8403 > URL: https://issues.apache.org/jira/browse/KAFKA-8403 > Project: Kafka > Issue Type: Improvement > Components: streams >Reporter: John Roesler >Assignee: Dongjin Lee >Priority: Major > Labels: needs-kip > > WIP KIP: > [https://cwiki.apache.org/confluence/display/KAFKA/KIP-508%3A+Make+Suppression+State+Queriable] > The newly added KTable Suppress operator lacks a Materialized variant, which > would be useful if you wanted to query the results of the suppression. > Suppression results will eventually match the upstream results, but the > intermediate distinction may be meaningful for some applications. For > example, you might want to query only the final results of a windowed > aggregation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-9536) Integration tests for KIP-558
Konstantine Karantasis created KAFKA-9536: - Summary: Integration tests for KIP-558 Key: KAFKA-9536 URL: https://issues.apache.org/jira/browse/KAFKA-9536 Project: Kafka Issue Type: Test Reporter: Konstantine Karantasis Assignee: Konstantine Karantasis Extend testing coverage for [KIP-558|https://cwiki.apache.org/confluence/display/KAFKA/KIP-558%3A+Track+the+set+of+actively+used+topics+by+connectors+in+Kafka+Connect] with integration tests and additional unit tests. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9440) Add ConsumerGroupCommand to delete static members
[ https://issues.apache.org/jira/browse/KAFKA-9440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17034056#comment-17034056 ] Jason Gustafson commented on KAFKA-9440: [~xuel1] Feel free to pick this up if you have time. Here is the link to KIP guidelines: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals. > Add ConsumerGroupCommand to delete static members > - > > Key: KAFKA-9440 > URL: https://issues.apache.org/jira/browse/KAFKA-9440 > Project: Kafka > Issue Type: Improvement >Reporter: Boyang Chen >Priority: Major > Labels: help-wanted, kip, newbie, newbie++ > > We introduced a new AdminClient API removeMembersFromConsumerGroup in 2.4. It > would be good to expose the API as part of the ConsumerGroupCommand for > easy command-line usage. > This change requires a new KIP; just posting it here in case anyone who > uses static membership would like to pick it up. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9181) Flaky test kafka.api.SaslGssapiSslEndToEndAuthorizationTest.testNoConsumeWithoutDescribeAclViaSubscribe
[ https://issues.apache.org/jira/browse/KAFKA-9181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17034043#comment-17034043 ] ASF GitHub Bot commented on KAFKA-9181: --- soondenana commented on pull request #8084: KAFKA-9181; Maintain clean separation between local and group subscriptions in consumer's SubscriptionState (#7941) URL: https://github.com/apache/kafka/pull/8084 Reviewers: Jason Gustafson, Guozhang Wang (cherry picked from commit a565d1a182cc69c9994c4512b5e9877e97f06cdf) ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Flaky test > kafka.api.SaslGssapiSslEndToEndAuthorizationTest.testNoConsumeWithoutDescribeAclViaSubscribe > --- > > Key: KAFKA-9181 > URL: https://issues.apache.org/jira/browse/KAFKA-9181 > Project: Kafka > Issue Type: Test > Components: core >Reporter: Bill Bejeck >Assignee: Rajini Sivaram >Priority: Major > Labels: flaky-test, tests > Fix For: 2.5.0 > > > Failed in > [https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/26571/testReport/junit/kafka.api/SaslGssapiSslEndToEndAuthorizationTest/testNoConsumeWithoutDescribeAclViaSubscribe/] > > {noformat} > Error Messageorg.apache.kafka.common.errors.TopicAuthorizationException: Not > authorized to access topics: > [topic2]Stacktraceorg.apache.kafka.common.errors.TopicAuthorizationException: > Not authorized to access topics: [topic2] > Standard OutputAdding ACLs for resource > `ResourcePattern(resourceType=CLUSTER, name=kafka-cluster, > patternType=LITERAL)`: > (principal=User:kafka, host=*, operation=CLUSTER_ACTION, > permissionType=ALLOW) > Current ACLs for resource `Cluster:LITERAL:kafka-cluster`: > User:kafka has Allow permission for operations: ClusterAction from > hosts: * > Adding ACLs for resource `ResourcePattern(resourceType=TOPIC, name=*, > patternType=LITERAL)`: > (principal=User:kafka, host=*, operation=READ, permissionType=ALLOW) > Current ACLs for resource `Topic:LITERAL:*`: > User:kafka has Allow permission for operations: Read from hosts: * > Debug is true storeKey true useTicketCache false useKeyTab true doNotPrompt > false ticketCache is null isInitiator true KeyTab is > /tmp/kafka6494439724844851846.tmp refreshKrb5Config is false principal is > kafka/localh...@example.com tryFirstPass is false useFirstPass is false > storePass is false clearPass is false > principal is kafka/localh...@example.com > Will use keytab > Commit Succeeded > [2019-11-13 04:43:16,187] ERROR [ReplicaFetcher replicaId=1, leaderId=0, 
> fetcherId=0] Error for partition __consumer_offsets-0 at offset 0 > (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. > [2019-11-13 04:43:16,191] ERROR [ReplicaFetcher replicaId=2, leaderId=0, > fetcherId=0] Error for partition __consumer_offsets-0 at offset 0 > (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. > [2019-11-13 04:43:16,384] ERROR [ReplicaFetcher replicaId=1, leaderId=2, > fetcherId=0] Error for partition e2etopic-0 at offset 0 > (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. > [2019-11-13 04:43:16,384] ERROR [ReplicaFetcher replicaId=0, leaderId=2, > fetcherId=0] Error for partition e2etopic-0 at offset 0 > (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. > Adding ACLs for resource `ResourcePattern(resourceType=TOPIC, name=e2etopic, > patternType=LITERAL)`: > (principal=User:client, host=*, operation=WRITE,
[jira] [Updated] (KAFKA-9535) Metadata not updated when consumer encounters FENCED_LEADER_EPOCH
[ https://issues.apache.org/jira/browse/KAFKA-9535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boyang Chen updated KAFKA-9535: --- Summary: Metadata not updated when consumer encounters FENCED_LEADER_EPOCH (was: Metadata not updated when consumer encounters leader epoch related failures) > Metadata not updated when consumer encounters FENCED_LEADER_EPOCH > - > > Key: KAFKA-9535 > URL: https://issues.apache.org/jira/browse/KAFKA-9535 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.3.0, 2.4.0, 2.5.0 >Reporter: Boyang Chen >Priority: Major > > Inside the consumer Fetcher's handling of ListOffsetResponse, if we hit > `FENCED_LEADER_EPOCH` on partition level, the client will blindly retry > without refreshing the metadata, creating a stuck state as the local leader > epoch never gets updated. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-9535) Metadata not updated when consumer encounters FENCED_LEADER_EPOCH
[ https://issues.apache.org/jira/browse/KAFKA-9535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boyang Chen updated KAFKA-9535: --- Description: Inside the consumer Fetcher's handling of ListOffsetResponse, if we hit `FENCED_LEADER_EPOCH` on partition level, the client will blindly retry without refreshing the metadata, creating a stuck state as the local leader epoch never gets updated and constantly fails the broker check. (was: Inside the consumer Fetcher's handling of ListOffsetResponse, if we hit `FENCED_LEADER_EPOCH` on partition level, the client will blindly retry without refreshing the metadata, creating a stuck state as the local leader epoch never gets updated.) > Metadata not updated when consumer encounters FENCED_LEADER_EPOCH > - > > Key: KAFKA-9535 > URL: https://issues.apache.org/jira/browse/KAFKA-9535 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.3.0, 2.4.0, 2.5.0 >Reporter: Boyang Chen >Priority: Major > > Inside the consumer Fetcher's handling of ListOffsetResponse, if we hit > `FENCED_LEADER_EPOCH` on partition level, the client will blindly retry > without refreshing the metadata, creating a stuck state as the local leader > epoch never gets updated and constantly fails the broker check. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-9535) Metadata not updated when consumer encounters leader epoch related failures
[ https://issues.apache.org/jira/browse/KAFKA-9535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boyang Chen updated KAFKA-9535: --- Description: Inside the consumer Fetcher's handling of ListOffsetResponse, if we hit `FENCED_LEADER_EPOCH` on partition level, the client will blindly retry without refreshing the metadata, creating a stuck state as the local leader epoch never gets updated. > Metadata not updated when consumer encounters leader epoch related failures > --- > > Key: KAFKA-9535 > URL: https://issues.apache.org/jira/browse/KAFKA-9535 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.3.0, 2.4.0, 2.5.0 >Reporter: Boyang Chen >Priority: Major > > Inside the consumer Fetcher's handling of ListOffsetResponse, if we hit > `FENCED_LEADER_EPOCH` on partition level, the client will blindly retry > without refreshing the metadata, creating a stuck state as the local leader > epoch never gets updated. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-9535) Metadata not updated when consumer encounters leader epoch related failures
[ https://issues.apache.org/jira/browse/KAFKA-9535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boyang Chen updated KAFKA-9535: --- Affects Version/s: 2.5.0 2.3.0 2.4.0 > Metadata not updated when consumer encounters leader epoch related failures > --- > > Key: KAFKA-9535 > URL: https://issues.apache.org/jira/browse/KAFKA-9535 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.3.0, 2.4.0, 2.5.0 >Reporter: Boyang Chen >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9374) Worker can be disabled by blocked connectors
[ https://issues.apache.org/jira/browse/KAFKA-9374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17034006#comment-17034006 ] Chris Egerton commented on KAFKA-9374: -- Hi [~tombentley], sorry for the delay in replying. After some thought, it seems that setting a timeout on interactions with connectors but keeping those actions synchronous within the herder's tick method isn't really a viable approach. There doesn't seem to be a good value to use for that timeout; if it's too conservative it may be impossible to start some connectors that have to do heavy-duty initialization on startup, and if it's too liberal there will still be the original problem (for however long the timeout is) of the worker being effectively disabled during that period, and potentially even dropping out of the group. Instead, in [https://github.com/apache/kafka/pull/8069], I've made changes to make most connector interactions (specifically, calls to the {{start}}, {{stop}}, {{config}}, {{validate}}, and {{initialize}} methods) completely asynchronous and handle any follow-up logic via callback. In the {{DistributedHerder}} class, this callback adds a new herder request to the queue, which helps keep the class thread-safe and preserves some of the guarantees provided by the current {{tick}} model. Unfortunately, this means that status tracking for connectors becomes... difficult. If we don't establish a timeout for any of our connector interactions, we also then don't have a good metric for knowing if/when to update the status of a connector to {{FAILED}}. At this point, the best we may be able to do is include log messages detailing when certain connector interactions are scheduled, and when those interactions are complete. That should at least provide a decent method for diagnosing via log files whether a connector is blocking and effectively a zombie. In the future, a KIP may be warranted for adding a new metric to track the number and types of zombie connectors/tasks. 
This also still leaves the door open for zombie thread creation; any connector that blocks in any of the aforementioned methods will still be taking up a thread until/unless it returns control to the framework. > Worker can be disabled by blocked connectors > > > Key: KAFKA-9374 > URL: https://issues.apache.org/jira/browse/KAFKA-9374 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect >Affects Versions: 1.0.0, 1.0.1, 1.0.2, 1.1.0, 1.1.1, 2.0.0, 2.0.1, 2.1.0, > 2.2.0, 2.1.1, 2.3.0, 2.2.1, 2.2.2, 2.4.0, 2.3.1 >Reporter: Chris Egerton >Assignee: Chris Egerton >Priority: Major > > If a connector hangs during any of its {{initialize}}, {{start}}, {{stop}}, > {{taskConfigs}}, {{taskClass}}, {{version}}, {{config}}, or {{validate}} > methods, the worker will be disabled for some types of requests thereafter, > including connector creation, connector reconfiguration, and connector > deletion. > -This only occurs in distributed mode and is due to the threading model used > by the > [DistributedHerder|https://github.com/apache/kafka/blob/03f763df8a8d9482d8c099806336f00cf2521465/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java] > class.- This affects both distributed and standalone mode. Distributed > herders perform some connector work synchronously in their {{tick}} thread, > which also handles group membership and some REST requests. The majority of > the herder methods for the standalone herder are {{synchronized}}, including > those for creating, updating, and deleting connectors; as long as one of > those methods blocks, all subsequent calls to any of these methods will also > be blocked. > > One potential solution could be to treat connectors that fail to start, stop, > etc. 
in time similarly to tasks that fail to stop within the [task graceful > shutdown timeout > period|https://github.com/apache/kafka/blob/03f763df8a8d9482d8c099806336f00cf2521465/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerConfig.java#L121-L126] > by handling all connector interactions on a separate thread, waiting for > them to complete within a timeout, and abandoning the thread (and > transitioning the connector to the {{FAILED}} state, if it has been created > at all) if that timeout expires. -- This message was sent by Atlassian Jira (v8.3.4#803005)
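The timeout-and-abandon approach sketched in the ticket above can be illustrated with plain java.util.concurrent primitives. This is a hedged sketch only; `GuardedInvoker` and `ConnectorState` are hypothetical names, not Kafka Connect classes.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hedged sketch of the timeout-and-abandon idea: run the connector
// interaction on a separate thread, wait up to a timeout, and abandon the
// (potentially zombie) thread if it expires, so the herder tick thread is
// never blocked. GuardedInvoker and ConnectorState are hypothetical names.
enum ConnectorState { RUNNING, FAILED }

final class GuardedInvoker {
    private final ExecutorService executor = Executors.newCachedThreadPool();

    ConnectorState invoke(Runnable connectorCall, long timeoutMs) {
        Future<?> future = executor.submit(connectorCall);
        try {
            future.get(timeoutMs, TimeUnit.MILLISECONDS);
            return ConnectorState.RUNNING;
        } catch (TimeoutException e) {
            // The call keeps running on its thread (a potential zombie),
            // but we stop waiting and transition the connector to FAILED.
            return ConnectorState.FAILED;
        } catch (InterruptedException | ExecutionException e) {
            return ConnectorState.FAILED;
        }
    }

    void shutdown() {
        executor.shutdownNow(); // interrupts any abandoned calls
    }
}
```

As the comment thread notes, the abandoned thread still occupies resources until the connector call returns, which is exactly the zombie-thread trade-off being discussed.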
[jira] [Created] (KAFKA-9535) Metadata not updated when consumer encounters leader epoch related failures
Boyang Chen created KAFKA-9535: -- Summary: Metadata not updated when consumer encounters leader epoch related failures Key: KAFKA-9535 URL: https://issues.apache.org/jira/browse/KAFKA-9535 Project: Kafka Issue Type: Bug Reporter: Boyang Chen -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-7061) Enhanced log compaction
[ https://issues.apache.org/jira/browse/KAFKA-7061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17034004#comment-17034004 ] Senthilnathan Muthusamy commented on KAFKA-7061: [~guozhang] [~junrao] [~mjsax] looking to get this moving... we have been waiting a long time on the code review... can you please help move it along. The 12th is the code freeze date, right? Do you think we can't make it for the 2.5 release? > Enhanced log compaction > --- > > Key: KAFKA-7061 > URL: https://issues.apache.org/jira/browse/KAFKA-7061 > Project: Kafka > Issue Type: Improvement > Components: core >Affects Versions: 2.5.0 >Reporter: Luis Cabral >Assignee: Senthilnathan Muthusamy >Priority: Major > Labels: kip > > Enhance log compaction to support more than just offset comparison, so the > insertion order isn't dictating which records to keep. > Default behavior is kept as it was, with the enhanced approach having to be > purposely activated. > The enhanced compaction is done either via the record timestamp, by setting > the new configuration to "timestamp", or via the record headers by setting > this configuration to anything other than the default "offset" or the > reserved "timestamp". > See > [KIP-280|https://cwiki.apache.org/confluence/display/KAFKA/KIP-280%3A+Enhanced+log+compaction] > for more details. > +From Guozhang:+ We should emphasize on the WIKI that the newly introduced > config yields to the existing "log.cleanup.policy", i.e. if the latter's > value is `delete` not `compact`, then the previous config would be ignored. > +From Jun Rao:+ With the timestamp/header strategy, the behavior of the > application may need to change. In particular, the application can't just > blindly take the record with a larger offset and assume that it's the value > to keep. It needs to check the timestamp or the header now. So, it would be > useful to at least document this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
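The strategy choice described above, and Jun Rao's caveat that a larger offset no longer implies the winning record, can be sketched as a pure comparison. The `Record` and `CompactionStrategy` classes here are illustrative only, not the broker's log cleaner code, and the tie-breaking rules are an assumption of this sketch.

```java
// Illustrative sketch of the KIP-280 compaction strategies described above;
// Record and CompactionStrategy are hypothetical names, not broker classes.
final class Record {
    final long offset;
    final long timestamp;
    final long headerVersion; // version carried in a designated record header

    Record(long offset, long timestamp, long headerVersion) {
        this.offset = offset;
        this.timestamp = timestamp;
        this.headerVersion = headerVersion;
    }
}

final class CompactionStrategy {
    // Decide whether `candidate` should replace the currently kept record for
    // the same key. Ties on timestamp/header version fall back to offset.
    static boolean shouldReplace(String strategy, Record kept, Record candidate) {
        switch (strategy) {
            case "offset": // default: insertion order wins
                return candidate.offset > kept.offset;
            case "timestamp":
                return candidate.timestamp != kept.timestamp
                        ? candidate.timestamp > kept.timestamp
                        : candidate.offset > kept.offset;
            default: // any other value names a header key holding a version
                return candidate.headerVersion != kept.headerVersion
                        ? candidate.headerVersion > kept.headerVersion
                        : candidate.offset > kept.offset;
        }
    }
}
```

Note how, under the "timestamp" strategy, a record with a larger offset but an older timestamp loses, which is exactly the behavior change applications must account for.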
[jira] [Commented] (KAFKA-9534) Topics could not be deleted when there is a concurrent create topic request loop
[ https://issues.apache.org/jira/browse/KAFKA-9534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033997#comment-17033997 ] Boyang Chen commented on KAFKA-9534: The reproduction is not very consistent; it will take some time to debug further.

> Topics could not be deleted when there is a concurrent create topic request
> loop
>
> Key: KAFKA-9534
> URL: https://issues.apache.org/jira/browse/KAFKA-9534
> Project: Kafka
> Issue Type: Bug
> Components: admin
>Affects Versions: 2.3.0, 2.5.0
>Reporter: Boyang Chen
>Priority: Major
>
> The reproduce steps:
> # start local ZK
> # start local broker
> # Run the following script, which keeps creating an input topic until success:
>
> {code:java}
> package kafka.examples;
>
> import org.apache.kafka.clients.admin.Admin;
> import org.apache.kafka.clients.admin.NewTopic;
> import org.apache.kafka.clients.consumer.ConsumerConfig;
> import org.apache.kafka.common.errors.TopicExistsException;
>
> import java.util.Arrays;
> import java.util.List;
> import java.util.Properties;
> import java.util.concurrent.CountDownLatch;
> import java.util.concurrent.ExecutionException;
>
> public class Reproduce {
>     public static void main(String[] args) throws ExecutionException, InterruptedException {
>         Properties props = new Properties();
>         props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG,
>             KafkaProperties.KAFKA_SERVER_URL + ":" + KafkaProperties.KAFKA_SERVER_PORT);
>         Admin adminClient = Admin.create(props);
>         createTopics(adminClient);
>         CountDownLatch createSucceed = new CountDownLatch(1);
>         Thread deleteTopicThread = new Thread(() -> {
>             List<String> topicsToDelete = Arrays.asList("input-topic", "output-topic");
>             while (true) {
>                 try {
>                     Thread.sleep(1000);
>                     adminClient.deleteTopics(topicsToDelete).all().get();
>                     if (createSucceed.getCount() == 0) {
>                         break;
>                     }
>                 } catch (ExecutionException | InterruptedException e) {
>                     System.out.println("Encountered exception during topic deletion: " + e.getCause());
>                 }
>             }
>             System.out.println("Deleted old topics: " + topicsToDelete);
>         });
>         deleteTopicThread.start();
>         while (true) {
>             try {
>                 createTopics(adminClient);
>                 System.out.println("Created new topic!");
>                 break;
>             } catch (ExecutionException | InterruptedException e) {
>                 if (!(e.getCause() instanceof TopicExistsException)) {
>                     throw e;
>                 }
>                 System.out.println("Metadata of the old topics are not cleared yet... " + e.getMessage());
>                 Thread.sleep(1000);
>             }
>         }
>         createSucceed.countDown();
>         deleteTopicThread.join();
>     }
>
>     private static void createTopics(Admin adminClient) throws InterruptedException, ExecutionException {
>         adminClient.createTopics(Arrays.asList(
>             new NewTopic("input-topic", 1, (short) 1),
>             new NewTopic("output-topic", 1, (short) 1))).all().get();
>     }
> }
> {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-9534) Topics could not be deleted when there is a concurrent create topic request loop
[ https://issues.apache.org/jira/browse/KAFKA-9534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boyang Chen updated KAFKA-9534: --- Description: The reproduce steps: # start local ZK # start local broker # Run the following script, which keeps creating an input topic until success:

{code:java}
package kafka.examples;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.errors.TopicExistsException;

import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;

public class Reproduce {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG,
            KafkaProperties.KAFKA_SERVER_URL + ":" + KafkaProperties.KAFKA_SERVER_PORT);
        Admin adminClient = Admin.create(props);
        createTopics(adminClient);
        CountDownLatch createSucceed = new CountDownLatch(1);
        Thread deleteTopicThread = new Thread(() -> {
            List<String> topicsToDelete = Arrays.asList("input-topic", "output-topic");
            while (true) {
                try {
                    Thread.sleep(1000);
                    adminClient.deleteTopics(topicsToDelete).all().get();
                    if (createSucceed.getCount() == 0) {
                        break;
                    }
                } catch (ExecutionException | InterruptedException e) {
                    System.out.println("Encountered exception during topic deletion: " + e.getCause());
                }
            }
            System.out.println("Deleted old topics: " + topicsToDelete);
        });
        deleteTopicThread.start();
        while (true) {
            try {
                createTopics(adminClient);
                System.out.println("Created new topic!");
                break;
            } catch (ExecutionException | InterruptedException e) {
                if (!(e.getCause() instanceof TopicExistsException)) {
                    throw e;
                }
                System.out.println("Metadata of the old topics are not cleared yet... " + e.getMessage());
                Thread.sleep(1000);
            }
        }
        createSucceed.countDown();
        deleteTopicThread.join();
    }

    private static void createTopics(Admin adminClient) throws InterruptedException, ExecutionException {
        adminClient.createTopics(Arrays.asList(
            new NewTopic("input-topic", 1, (short) 1),
            new NewTopic("output-topic", 1, (short) 1))).all().get();
    }
}
{code}

was:Not sure this is indeed a bug, just making it under track.

> Topics could not be deleted when there is a concurrent create topic request
> loop
>
> Key: KAFKA-9534
> URL: https://issues.apache.org/jira/browse/KAFKA-9534
> Project: Kafka
> Issue Type: Bug
> Components: admin
>Affects Versions: 2.3.0, 2.5.0
>Reporter: Boyang Chen
>Priority: Major
>
> The reproduce steps:
> # start local ZK
> # start local broker
> # Run the following script which keeps creating an input topic until
> success:
>
> {code:java}
> package kafka.examples;
> import org.apache.kafka.clients.admin.Admin;
> import org.apache.kafka.clients.admin.NewTopic;
> import org.apache.kafka.clients.consumer.ConsumerConfig;
> import org.apache.kafka.common.errors.TopicExistsException;
> import java.util.Arrays;
> import java.util.List;
> import java.util.Properties;
> import java.util.concurrent.CountDownLatch;
> import java.util.concurrent.ExecutionException;
> public class Reproduce {
>     public static void main(String[] args) throws ExecutionException, InterruptedException {
>         Properties props = new Properties();
>         props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG,
>             KafkaProperties.KAFKA_SERVER_URL + ":" + KafkaProperties.KAFKA_SERVER_PORT);
>         Admin adminClient = Admin.create(props);
>         createTopics(adminClient);
>         CountDownLatch createSucceed = new CountDownLatch(1);
>         Thread deleteTopicThread = new Thread(() -> {
>             List<String> topicsToDelete = Arrays.asList("input-topic", "output-topic");
>             while (true) {
>                 try {
>                     Thread.sleep(1000);
>                     adminClient.deleteTopics(topicsToDelete).all().get();
>                     if (createSucceed.getCount() == 0) {
>                         break;
>                     }
>                 }
>                 catch (ExecutionException | InterruptedException e) {
>
[jira] [Updated] (KAFKA-9534) Topics could not be deleted when there is a concurrent create topic request loop
[ https://issues.apache.org/jira/browse/KAFKA-9534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boyang Chen updated KAFKA-9534: --- Affects Version/s: 2.3.0 > Topics could not be deleted when there is a concurrent create topic request > loop > > > Key: KAFKA-9534 > URL: https://issues.apache.org/jira/browse/KAFKA-9534 > Project: Kafka > Issue Type: Bug > Components: admin >Affects Versions: 2.3.0, 2.5.0 >Reporter: Boyang Chen >Priority: Major > > Not sure this is indeed a bug, just making it under track. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-9534) Topics could not be deleted when there is a concurrent create topic request loop
[ https://issues.apache.org/jira/browse/KAFKA-9534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boyang Chen updated KAFKA-9534: --- Summary: Topics could not be deleted when there is a concurrent create topic request loop (was: Potential inconsistent result for listTopic talking to older brokers) > Topics could not be deleted when there is a concurrent create topic request > loop > > > Key: KAFKA-9534 > URL: https://issues.apache.org/jira/browse/KAFKA-9534 > Project: Kafka > Issue Type: Bug > Components: admin >Affects Versions: 2.5.0 >Reporter: Boyang Chen >Priority: Major > > Not sure this is indeed a bug, just making it under track. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9517) KTable Joins Without Materialized Argument Yield Results That Further Joins NPE On
[ https://issues.apache.org/jira/browse/KAFKA-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033985#comment-17033985 ] John Roesler commented on KAFKA-9517: - Quick update, I've merged this fix to trunk, 2.5, and 2.4. I'll go ahead and mark this ticket resolved so that it doesn't prevent the creation of 2.4.1 or 2.5.0 release candidates. I'd still very much appreciate it if you can test it to make sure it resolves the issue for your use case. > KTable Joins Without Materialized Argument Yield Results That Further Joins > NPE On > -- > > Key: KAFKA-9517 > URL: https://issues.apache.org/jira/browse/KAFKA-9517 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 2.4.0 >Reporter: Paul Snively >Assignee: John Roesler >Priority: Blocker > Fix For: 2.5.0, 2.4.1 > > Attachments: test.tar.xz > > > The `KTable` API implemented > [here|https://github.com/apache/kafka/blob/2.4.0/streams/src/main/java/org/apache/kafka/streams/kstream/internals/KTableImpl.java#L842-L844] > calls `doJoinOnForeignKey` with an argument of > `Materialized.with(null, null)`, as apparently do several other APIs. As the > comment spanning > [these lines|https://github.com/apache/kafka/blob/2.4.0/streams/src/main/java/org/apache/kafka/streams/kstream/internals/KTableImpl.java#L1098-L1099] > makes clear, the result is a > `KTable` whose `valueSerde` (as a `KTableImpl`) is `null`. Therefore, > attempts to `join` etc. on the resulting `KTable` fail with a > `NullPointerException`. > While there is an obvious workaround—explicitly construct the required > `Materialized` and use the APIs that take it as an argument—I have to admit I > find the existence of public APIs with this sort of bug, particularly when > the bug is literally documented as a comment in the source code, astonishing > to the point of incredulity. It calls the quality and trustworthiness of > Kafka Streams into serious question, and if a resolution is not forthcoming > within a week, we will be left with no other option but to consider technical > alternatives. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9517) KTable Joins Without Materialized Argument Yield Results That Further Joins NPE On
[ https://issues.apache.org/jira/browse/KAFKA-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033962#comment-17033962 ] ASF GitHub Bot commented on KAFKA-9517: --- vvcephei commented on pull request #8061: KAFKA-9517: Fix default serdes with FK join URL: https://github.com/apache/kafka/pull/8061 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > KTable Joins Without Materialized Argument Yield Results That Further Joins > NPE On > -- > > Key: KAFKA-9517 > URL: https://issues.apache.org/jira/browse/KAFKA-9517 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 2.4.0 >Reporter: Paul Snively >Assignee: John Roesler >Priority: Blocker > Fix For: 2.5.0, 2.4.1 > > Attachments: test.tar.xz > > > The `KTable` API implemented > [here|https://github.com/apache/kafka/blob/2.4.0/streams/src/main/java/org/apache/kafka/streams/kstream/internals/KTableImpl.java#L842-L844] > calls `doJoinOnForeignKey` with an argument of > `Materialized.with(null, null)`, as apparently do several other APIs. As the > comment spanning > [these lines|https://github.com/apache/kafka/blob/2.4.0/streams/src/main/java/org/apache/kafka/streams/kstream/internals/KTableImpl.java#L1098-L1099] > makes clear, the result is a > `KTable` whose `valueSerde` (as a `KTableImpl`) is `null`. Therefore, > attempts to `join` etc. on the resulting `KTable` fail with a > `NullPointerException`. > While there is an obvious workaround—explicitly construct the required > `Materialized` and use the APIs that take it as an argument—I have to admit I > find the existence of public APIs with this sort of bug, particularly when > the bug is literally documented as a comment in the source code, astonishing > to the point of incredulity. 
It calls the quality and trustworthiness of > Kafka Streams into serious question, and if a resolution is not forthcoming > within a week, we will be left with no other option but to consider technical > alternatives. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9517) KTable Joins Without Materialized Argument Yield Results That Further Joins NPE On
[ https://issues.apache.org/jira/browse/KAFKA-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033960#comment-17033960 ] John Roesler commented on KAFKA-9517: - Hi Paul, Ah, those failures shouldn't affect your testing. #8015 depends on a fix I added for the TopologyTestDriver (https://github.com/apache/kafka/pull/8065). If you want to clear the error, you can cherry-pick that one also, or you can just skip the tests and build the artifacts directly with `installAll`. Sorry for neglecting to mention that initially; both changes were part of 8015 to begin with, but the reviewers rightly suggested I should pull out the TopologyTestDriver fix into a separately verified PR. Thanks, -John > KTable Joins Without Materialized Argument Yield Results That Further Joins > NPE On > -- > > Key: KAFKA-9517 > URL: https://issues.apache.org/jira/browse/KAFKA-9517 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 2.4.0 >Reporter: Paul Snively >Assignee: John Roesler >Priority: Blocker > Fix For: 2.5.0, 2.4.1 > > Attachments: test.tar.xz > > > The `KTable` API implemented > [here|https://github.com/apache/kafka/blob/2.4.0/streams/src/main/java/org/apache/kafka/streams/kstream/internals/KTableImpl.java#L842-L844] > calls `doJoinOnForeignKey` with an argument of > `Materialized.with(null, null)`, as apparently do several other APIs. As the > comment spanning > [these lines|https://github.com/apache/kafka/blob/2.4.0/streams/src/main/java/org/apache/kafka/streams/kstream/internals/KTableImpl.java#L1098-L1099] > makes clear, the result is a > `KTable` whose `valueSerde` (as a `KTableImpl`) is `null`. Therefore, > attempts to `join` etc. on the resulting `KTable` fail with a > `NullPointerException`. 
> While there is an obvious workaround—explicitly construct the required > `Materialized` and use the APIs that take it as an argument—I have to admit I > find the existence of public APIs with this sort of bug, particularly when > the bug is literally documented as a comment in the source code, astonishing > to the point of incredulity. It calls the quality and trustworthiness of > Kafka Streams into serious question, and if a resolution is not forthcoming > within a week, we will be left with no other option but to consider technical > alternatives. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KAFKA-9523) Reduce flakiness of BranchedMultiLevelRepartitionConnectedTopologyTest
[ https://issues.apache.org/jira/browse/KAFKA-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang resolved KAFKA-9523. -- Fix Version/s: 2.5.0 Resolution: Fixed > Reduce flakiness of BranchedMultiLevelRepartitionConnectedTopologyTest > -- > > Key: KAFKA-9523 > URL: https://issues.apache.org/jira/browse/KAFKA-9523 > Project: Kafka > Issue Type: Test > Components: streams, unit tests >Reporter: Boyang Chen >Assignee: Boyang Chen >Priority: Major > Fix For: 2.5.0 > > > KAFKA-9335 introduced an integration test to verify that the topology builder > itself can survive building a complex topology. This test is sometimes flaky > due to the stream client's connection to the broker, so we should consider > making it less flaky by either converting it to a unit test or focusing on > making the test logic more robust. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9523) Reduce flakiness of BranchedMultiLevelRepartitionConnectedTopologyTest
[ https://issues.apache.org/jira/browse/KAFKA-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033947#comment-17033947 ] ASF GitHub Bot commented on KAFKA-9523: --- guozhangwang commented on pull request #8081: KAFKA-9523: Migrate BranchedMultiLevelRepartitionConnectedTopologyTest into a unit test URL: https://github.com/apache/kafka/pull/8081 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Reduce flakiness of BranchedMultiLevelRepartitionConnectedTopologyTest > -- > > Key: KAFKA-9523 > URL: https://issues.apache.org/jira/browse/KAFKA-9523 > Project: Kafka > Issue Type: Test > Components: streams, unit tests >Reporter: Boyang Chen >Assignee: Boyang Chen >Priority: Major > > KAFKA-9335 introduced an integration test to verify that the topology builder > itself can survive building a complex topology. This test is sometimes flaky > due to the stream client's connection to the broker, so we should consider > making it less flaky by either converting it to a unit test or focusing on > making the test logic more robust. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9480) Value for Task-level Metric process-rate is Constant Zero
[ https://issues.apache.org/jira/browse/KAFKA-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033946#comment-17033946 ] ASF GitHub Bot commented on KAFKA-9480: --- guozhangwang commented on pull request #8018: KAFKA-9480: Fix bug that prevented to measure task-level process-rate URL: https://github.com/apache/kafka/pull/8018 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Value for Task-level Metric process-rate is Constant Zero > -- > > Key: KAFKA-9480 > URL: https://issues.apache.org/jira/browse/KAFKA-9480 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 2.4.0 >Reporter: Bruno Cadonna >Assignee: Bruno Cadonna >Priority: Major > Fix For: 2.5.0 > > > The value for task-level metric process-rate is constant zero. The value > should reflect the number of calls to {{process()}} on source processors > which clearly cannot be constant zero. > This behavior applies to built-in metrics version {{latest}}. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KAFKA-9505) InternalTopicManager may fall into an infinite loop with partially created topics
[ https://issues.apache.org/jira/browse/KAFKA-9505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang resolved KAFKA-9505. -- Fix Version/s: 2.5.0 Resolution: Fixed > InternalTopicManager may fall into an infinite loop with partially created > topics > --- > > Key: KAFKA-9505 > URL: https://issues.apache.org/jira/browse/KAFKA-9505 > Project: Kafka > Issue Type: Bug > Components: streams >Reporter: Guozhang Wang >Assignee: Guozhang Wang >Priority: Major > Fix For: 2.5.0 > > > In {{InternalTopicManager#validateTopics(topicsNotReady, topics)}}, the second > argument, the topics map, does not change, while the first, topicsNotReady, may > shrink as some topics are validated; however, inside that function we still loop > over the second map, so the loop may never complete. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9505) InternalTopicManager may fall into an infinite loop with partially created topics
[ https://issues.apache.org/jira/browse/KAFKA-9505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033942#comment-17033942 ] ASF GitHub Bot commented on KAFKA-9505: --- guozhangwang commented on pull request #8039: KAFKA-9505: Only loop over topics-to-validate in retries URL: https://github.com/apache/kafka/pull/8039 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > InternalTopicManager may fall into an infinite loop with partially created > topics > --- > > Key: KAFKA-9505 > URL: https://issues.apache.org/jira/browse/KAFKA-9505 > Project: Kafka > Issue Type: Bug > Components: streams >Reporter: Guozhang Wang >Assignee: Guozhang Wang >Priority: Major > > In {{InternalTopicManager#validateTopics(topicsNotReady, topics)}}, the second > argument, the topics map, does not change, while the first, topicsNotReady, may > shrink as some topics are validated; however, inside that function we still loop > over the second map, so the loop may never complete. -- This message was sent by Atlassian Jira (v8.3.4#803005)
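The loop-condition mistake described in the ticket can be shown with plain collections. This is a hypothetical stand-in for the retry logic, not the real InternalTopicManager code: looping until the full topic set is empty never terminates once validation only shrinks the not-ready set, while looping over the not-ready set itself finishes immediately.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class RetryLoopSketch {
    // 'condition' is the set the loop checks; 'notReady' shrinks as topics in
    // 'existing' pass validation; 'cap' bounds the sketch so it always returns.
    static int passes(Set<String> condition, Set<String> notReady,
                      Set<String> existing, int cap) {
        int n = 0;
        while (!condition.isEmpty() && n < cap) {
            n++;
            notReady.removeIf(existing::contains); // partial progress each pass
        }
        return n;
    }

    public static void main(String[] args) {
        Set<String> existing = new HashSet<>(Arrays.asList("a", "b"));

        // Buggy shape: loop while the full, never-shrinking topic map is
        // non-empty -> spins until the cap even though everything validated.
        Set<String> allTopics = new HashSet<>(existing);
        System.out.println(passes(allTopics, new HashSet<>(existing), existing, 100)); // 100

        // Fixed shape (as in PR #8039's title): loop over the shrinking
        // topics-to-validate set instead.
        Set<String> notReady = new HashSet<>(existing);
        System.out.println(passes(notReady, notReady, existing, 100)); // 1
    }
}
```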
[jira] [Assigned] (KAFKA-9523) Reduce flakiness of BranchedMultiLevelRepartitionConnectedTopologyTest
[ https://issues.apache.org/jira/browse/KAFKA-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boyang Chen reassigned KAFKA-9523: -- Assignee: Boyang Chen > Reduce flakiness of BranchedMultiLevelRepartitionConnectedTopologyTest > -- > > Key: KAFKA-9523 > URL: https://issues.apache.org/jira/browse/KAFKA-9523 > Project: Kafka > Issue Type: Test > Components: streams, unit tests >Reporter: Boyang Chen >Assignee: Boyang Chen >Priority: Major > > KAFKA-9335 introduced an integration test to verify that the topology builder > itself can survive building a complex topology. This test is sometimes flaky > due to the stream client's connection to the broker, so we should consider > making it less flaky by either converting it to a unit test or focusing on > making the test logic more robust. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-9534) Potential inconsistent result for listTopic talking to older brokers
Boyang Chen created KAFKA-9534: -- Summary: Potential inconsistent result for listTopic talking to older brokers Key: KAFKA-9534 URL: https://issues.apache.org/jira/browse/KAFKA-9534 Project: Kafka Issue Type: Bug Components: admin Affects Versions: 2.5.0 Reporter: Boyang Chen Not sure this is indeed a bug; just filing it to keep it tracked. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-9450) Decouple inner state flushing from committing
[ https://issues.apache.org/jira/browse/KAFKA-9450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sophie Blee-Goldman updated KAFKA-9450: --- Description: When EOS is turned on, the commit interval is set quite low (100ms) and all the store layers are flushed during a commit. This is necessary for forwarding records in the cache to the changelog, but unfortunately also forces rocksdb to flush the current memtable before it's full. The result is a large number of small writes to disk, losing the benefits of batching, and a large number of very small L0 files that are likely to slow compaction. Since we have to delete the stores to recreate from scratch anyways during an unclean shutdown with EOS, we may as well skip flushing the innermost StateStore during a commit and only do so during a graceful shutdown, before a rebalance, etc. This is currently blocked on a refactoring of the state store layers to allow decoupling the flush of the caching layer from the actual state store. Note that this is especially problematic with EOS due to the necessarily-low commit interval, but still hurts even with at-least-once and a much larger commit interval. was: When EOS is turned on, the commit interval is set quite low (100ms) and all the store layers are flushed during a commit. This is necessary for forwarding records in the cache to the changelog, but unfortunately also forces rocksdb to flush the current memtable before it's full. The result is a large number of small writes to disk, losing the benefits of batching, and a large number of very small L0 files that are likely to slow compaction. Since we have to delete the stores to recreate from scratch anyways during an unclean shutdown with EOS, we may as well skip flushing the innermost StateStore during a commit and only do so during a graceful shutdown, before a rebalance, etc. 
This is currently blocked on a refactoring of the state store layers to allow decoupling the flush of the caching layer from the actual state store. > Decouple inner state flushing from committing > - > > Key: KAFKA-9450 > URL: https://issues.apache.org/jira/browse/KAFKA-9450 > Project: Kafka > Issue Type: Improvement > Components: streams >Reporter: Sophie Blee-Goldman >Priority: Major > > When EOS is turned on, the commit interval is set quite low (100ms) and all > the store layers are flushed during a commit. This is necessary for > forwarding records in the cache to the changelog, but unfortunately also > forces rocksdb to flush the current memtable before it's full. The result is > a large number of small writes to disk, losing the benefits of batching, and > a large number of very small L0 files that are likely to slow compaction. > Since we have to delete the stores to recreate from scratch anyways during an > unclean shutdown with EOS, we may as well skip flushing the innermost > StateStore during a commit and only do so during a graceful shutdown, before > a rebalance, etc. This is currently blocked on a refactoring of the state > store layers to allow decoupling the flush of the caching layer from the > actual state store. > Note that this is especially problematic with EOS due to the necessarily-low > commit interval, but still hurts even with at-least-once and a much larger > commit interval. -- This message was sent by Atlassian Jira (v8.3.4#803005)
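The EOS/commit-interval relationship described above, stated in Streams configuration terms (a sketch; the values shown are the documented defaults, and raising the interval is only one possible mitigation):

```properties
# With exactly-once processing enabled, Kafka Streams lowers the default
# commit interval from 30000 ms to 100 ms.
processing.guarantee=exactly_once

# Every commit flushes all store layers, including the RocksDB memtable, so a
# 100 ms interval forces ~10 small flushes per second. A larger interval
# trades end-to-end latency for fewer forced flushes.
commit.interval.ms=100
```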
[jira] [Updated] (KAFKA-9450) Decouple inner state flushing from committing
[ https://issues.apache.org/jira/browse/KAFKA-9450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sophie Blee-Goldman updated KAFKA-9450: --- Summary: Decouple inner state flushing from committing (was: Decouple inner state flushing from committing with EOS) > Decouple inner state flushing from committing > - > > Key: KAFKA-9450 > URL: https://issues.apache.org/jira/browse/KAFKA-9450 > Project: Kafka > Issue Type: Improvement > Components: streams >Reporter: Sophie Blee-Goldman >Priority: Major > > When EOS is turned on, the commit interval is set quite low (100ms) and all > the store layers are flushed during a commit. This is necessary for > forwarding records in the cache to the changelog, but unfortunately also > forces rocksdb to flush the current memtable before it's full. The result is > a large number of small writes to disk, losing the benefits of batching, and > a large number of very small L0 files that are likely to slow compaction. > Since we have to delete the stores to recreate from scratch anyways during an > unclean shutdown with EOS, we may as well skip flushing the innermost > StateStore during a commit and only do so during a graceful shutdown, before > a rebalance, etc. This is currently blocked on a refactoring of the state > store layers to allow decoupling the flush of the caching layer from the > actual state store. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9450) Decouple inner state flushing from committing with EOS
[ https://issues.apache.org/jira/browse/KAFKA-9450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033915#comment-17033915 ] Sophie Blee-Goldman commented on KAFKA-9450: [~NaviBrar] I assume that was only merged into the most recent RocksDB version? We aren't able to bump the rocksdb dependency further until the next major version bump due to some breaking changes in the options-related API. Not sure if the rocks folks might be willing to cherry-pick this back to a 5.x version and release that; if not, maybe you should make a ticket to track this and mark it as blocked by KAFKA-8897 so it doesn't get forgotten > Decouple inner state flushing from committing with EOS > -- > > Key: KAFKA-9450 > URL: https://issues.apache.org/jira/browse/KAFKA-9450 > Project: Kafka > Issue Type: Improvement > Components: streams >Reporter: Sophie Blee-Goldman >Priority: Major > > When EOS is turned on, the commit interval is set quite low (100ms) and all > the store layers are flushed during a commit. This is necessary for > forwarding records in the cache to the changelog, but unfortunately also > forces rocksdb to flush the current memtable before it's full. The result is > a large number of small writes to disk, losing the benefits of batching, and > a large number of very small L0 files that are likely to slow compaction. > Since we have to delete the stores to recreate from scratch anyways during an > unclean shutdown with EOS, we may as well skip flushing the innermost > StateStore during a commit and only do so during a graceful shutdown, before > a rebalance, etc. This is currently blocked on a refactoring of the state > store layers to allow decoupling the flush of the caching layer from the > actual state store. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-8897) Increase Version of RocksDB
[ https://issues.apache.org/jira/browse/KAFKA-8897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sophie Blee-Goldman updated KAFKA-8897: --- Fix Version/s: 3.0.0 > Increase Version of RocksDB > --- > > Key: KAFKA-8897 > URL: https://issues.apache.org/jira/browse/KAFKA-8897 > Project: Kafka > Issue Type: Improvement > Components: streams >Reporter: Bruno Cadonna >Assignee: Bruno Cadonna >Priority: Major > Fix For: 3.0.0 > > > A higher version (6+) of RocksDB is needed for some metrics specified in > KIP-471. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-8897) Increase Version of RocksDB
[ https://issues.apache.org/jira/browse/KAFKA-8897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sophie Blee-Goldman updated KAFKA-8897: --- Priority: Blocker (was: Major) > Increase Version of RocksDB > --- > > Key: KAFKA-8897 > URL: https://issues.apache.org/jira/browse/KAFKA-8897 > Project: Kafka > Issue Type: Improvement > Components: streams >Reporter: Bruno Cadonna >Assignee: Bruno Cadonna >Priority: Blocker > Fix For: 3.0.0 > > > A higher version (6+) of RocksDB is needed for some metrics specified in > KIP-471. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9523) Reduce flakiness of BranchedMultiLevelRepartitionConnectedTopologyTest
[ https://issues.apache.org/jira/browse/KAFKA-9523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033913#comment-17033913 ] ASF GitHub Bot commented on KAFKA-9523: --- abbccdda commented on pull request #8081: KAFKA-9523: Migrate BranchedMultiLevelRepartitionConnectedTopologyTest into a unit test URL: https://github.com/apache/kafka/pull/8081 Relying on an integration test to catch an algorithm bug introduces more flakiness; convert it into a unit test to reduce the flakiness until we upgrade the Java/Scala libs. ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Reduce flakiness of BranchedMultiLevelRepartitionConnectedTopologyTest > -- > > Key: KAFKA-9523 > URL: https://issues.apache.org/jira/browse/KAFKA-9523 > Project: Kafka > Issue Type: Test > Components: streams, unit tests >Reporter: Boyang Chen >Priority: Major > > KAFKA-9335 introduced an integration test to verify that the topology builder > itself can survive building a complex topology. This test is sometimes flaky > due to the stream client's connection to the broker, so we should consider > making it less flaky by either converting it to a unit test or focusing on > making the test logic more robust. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KAFKA-9423) Refine layout of configuration options on website and make individual settings directly linkable
[ https://issues.apache.org/jira/browse/KAFKA-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mickael Maison resolved KAFKA-9423. --- Fix Version/s: 2.6.0 Resolution: Fixed > Refine layout of configuration options on website and make individual > settings directly linkable > > > Key: KAFKA-9423 > URL: https://issues.apache.org/jira/browse/KAFKA-9423 > Project: Kafka > Issue Type: Improvement > Components: documentation >Reporter: Sönke Liebau >Assignee: Sönke Liebau >Priority: Trivial > Fix For: 2.6.0 > > Attachments: option1.png, option2.png, option3.png, option4.png > > > KAFKA-8474 changed the layout of configuration options on the website from a > table which over time ran out of horizontal space to a list. > This vastly improved readability but is not yet ideal. Further discussion was > had in the [PR|https://github.com/apache/kafka/pull/6870] for KAFKA-8474. > This ticket is to move that discussion to a separate thread and make it more > visible to other people and to give subsequent PRs a home. > Currently proposed options are attached to this issue. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (KAFKA-6266) Kafka 1.0.0 : Repeated occurrence of WARN Resetting first dirty offset of __consumer_offsets-xx to log start offset 203569 since the checkpointed offset 120955 is invali
[ https://issues.apache.org/jira/browse/KAFKA-6266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mao reassigned KAFKA-6266: Assignee: David Mao (was: Anna Povzner) > Kafka 1.0.0 : Repeated occurrence of WARN Resetting first dirty offset of > __consumer_offsets-xx to log start offset 203569 since the checkpointed > offset 120955 is invalid. (kafka.log.LogCleanerManager$) > -- > > Key: KAFKA-6266 > URL: https://issues.apache.org/jira/browse/KAFKA-6266 > Project: Kafka > Issue Type: Bug > Components: offset manager >Affects Versions: 1.0.0, 1.0.1 > Environment: CentOS 7, Apache kafka_2.12-1.0.0 >Reporter: VinayKumar >Assignee: David Mao >Priority: Major > Fix For: 2.5.0, 2.4.1 > > > I upgraded Kafka from 0.10.2.1 to 1.0.0 version. From then, I see the below > warnings in the log. > I'm seeing these continuously in the log, and want these to be fixed, so > that they won't repeat. Can someone please help me in fixing the below > warnings. > {code} > WARN Resetting first dirty offset of __consumer_offsets-17 to log start > offset 3346 since the checkpointed offset 3332 is invalid. > (kafka.log.LogCleanerManager$) > WARN Resetting first dirty offset of __consumer_offsets-23 to log start > offset 4 since the checkpointed offset 1 is invalid. > (kafka.log.LogCleanerManager$) > WARN Resetting first dirty offset of __consumer_offsets-19 to log start > offset 203569 since the checkpointed offset 120955 is invalid. > (kafka.log.LogCleanerManager$) > WARN Resetting first dirty offset of __consumer_offsets-35 to log start > offset 16957 since the checkpointed offset 7 is invalid. > (kafka.log.LogCleanerManager$) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-9533) ValueTransform forwards `null` values
[ https://issues.apache.org/jira/browse/KAFKA-9533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Viamari updated KAFKA-9533: --- Description: According to the documentation for `KStream#transformValues`, nulls returned from `ValueTransformer#transform` are not forwarded (see [KStream#transformValues|https://kafka.apache.org/24/javadoc/org/apache/kafka/streams/kstream/KStream.html#transformValues-org.apache.kafka.streams.kstream.ValueTransformerSupplier-java.lang.String...-]). However, this does not appear to be the case. In `KStreamTransformValuesProcessor#process` the result of the transform is forwarded directly. {code:java} @Override public void process(final K key, final V value) { context.forward(key, valueTransformer.transform(key, value)); } {code} was: According to the documentation for `KStream#transformValues`, nulls returned from `ValueTransformer#transform` are not forwarded. (see [KStream#transformValues|https://kafka.apache.org/24/javadoc/org/apache/kafka/streams/kstream/KStream.html#transformValues-org.apache.kafka.streams.kstream.ValueTransformerSupplier-java.lang.String...-] However, this does not appear to be the case. In `KStreamTransformValuesProcessor#transform` the result of the transform is forwarded directly. {code:java} @Override public void process(final K key, final V value) { context.forward(key, valueTransformer.transform(key, value)); } {code} > ValueTransform forwards `null` values > - > > Key: KAFKA-9533 > URL: https://issues.apache.org/jira/browse/KAFKA-9533 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 2.4.0 >Reporter: Michael Viamari >Priority: Minor > > According to the documentation for `KStream#transformValues`, nulls returned > from `ValueTransformer#transform` are not forwarded. 
(see > [KStream#transformValues|https://kafka.apache.org/24/javadoc/org/apache/kafka/streams/kstream/KStream.html#transformValues-org.apache.kafka.streams.kstream.ValueTransformerSupplier-java.lang.String...-]). > However, this does not appear to be the case. In > `KStreamTransformValuesProcessor#process` the result of the transform is > forwarded directly. > {code:java} > @Override > public void process(final K key, final V value) { > context.forward(key, valueTransformer.transform(key, value)); > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-9533) ValueTransform forwards `null` values
Michael Viamari created KAFKA-9533: -- Summary: ValueTransform forwards `null` values Key: KAFKA-9533 URL: https://issues.apache.org/jira/browse/KAFKA-9533 Project: Kafka Issue Type: Bug Components: streams Affects Versions: 2.4.0 Reporter: Michael Viamari According to the documentation for `KStream#transformValues`, nulls returned from `ValueTransformer#transform` are not forwarded (see [KStream#transformValues|https://kafka.apache.org/24/javadoc/org/apache/kafka/streams/kstream/KStream.html#transformValues-org.apache.kafka.streams.kstream.ValueTransformerSupplier-java.lang.String...-]). However, this does not appear to be the case. In `KStreamTransformValuesProcessor#transform` the result of the transform is forwarded directly. {code:java} @Override public void process(final K key, final V value) { context.forward(key, valueTransformer.transform(key, value)); } {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
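The quoted `process()` implementation forwards the transformer's result unconditionally. A minimal, self-contained sketch of the documented contract — where a null transform result is dropped instead of forwarded — using plain Java stand-ins for the processor context and transformer (all names here are illustrative, not the real Streams API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

public class NullDropSketch {
    // Stand-in for process(): apply the transformer, collect what would have
    // been forwarded downstream, and drop nulls per the documented contract.
    static <K, V> List<String> process(K key, V value,
                                       BiFunction<K, V, V> transformer) {
        List<String> forwarded = new ArrayList<>();
        V result = transformer.apply(key, value);
        if (result != null) {           // the guard missing from the quoted code
            forwarded.add(key + "=" + result);
        }
        return forwarded;
    }

    public static void main(String[] args) {
        // A transformer that filters out odd values by returning null.
        BiFunction<String, Integer, Integer> evensOnly =
            (k, v) -> v % 2 == 0 ? v : null;
        System.out.println(process("a", 2, evensOnly)); // [a=2]
        System.out.println(process("b", 3, evensOnly)); // []
    }
}
```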
[jira] [Commented] (KAFKA-9487) Followup : KAFKA-9445(Allow fetching a key from a single partition); addressing code review comments
[ https://issues.apache.org/jira/browse/KAFKA-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033823#comment-17033823 ] ASF GitHub Bot commented on KAFKA-9487: --- vvcephei commented on pull request #8033: KAFKA-9487: Follow-up PR of Kafka-9445 URL: https://github.com/apache/kafka/pull/8033 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Followup : KAFKA-9445(Allow fetching a key from a single partition); > addressing code review comments > > > Key: KAFKA-9487 > URL: https://issues.apache.org/jira/browse/KAFKA-9487 > Project: Kafka > Issue Type: New Feature > Components: streams >Reporter: Navinder Brar >Assignee: Navinder Brar >Priority: Blocker > Fix For: 2.5.0 > > > A few code review comments are left to be addressed from Kafka 9445, which I > will be addressing in this PR. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9423) Refine layout of configuration options on website and make individual settings directly linkable
[ https://issues.apache.org/jira/browse/KAFKA-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033816#comment-17033816 ] ASF GitHub Bot commented on KAFKA-9423: --- mimaison commented on pull request #7955: KAFKA-9423: Refine layout of configuration options on website and make individual settings directly linkable URL: https://github.com/apache/kafka/pull/7955 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refine layout of configuration options on website and make individual > settings directly linkable > > > Key: KAFKA-9423 > URL: https://issues.apache.org/jira/browse/KAFKA-9423 > Project: Kafka > Issue Type: Improvement > Components: documentation >Reporter: Sönke Liebau >Assignee: Sönke Liebau >Priority: Trivial > Attachments: option1.png, option2.png, option3.png, option4.png > > > KAFKA-8474 changed the layout of configuration options on the website from a > table which over time ran out of horizontal space to a list. > This vastly improved readability but is not yet ideal. Further discussion was > had in the [PR|https://github.com/apache/kafka/pull/6870] for KAFKA-8474. > This ticket is to move that discussion to a separate thread and make it more > visible to other people and to give subsequent PRs a home. > Currently proposed options are attached to this issue. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-4090) JVM runs into OOM if (Java) client uses a SSL port without setting the security protocol
[ https://issues.apache.org/jira/browse/KAFKA-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033810#comment-17033810 ] David Mollitor commented on KAFKA-4090: --- https://github.com/apache/kafka/pull/8066 > JVM runs into OOM if (Java) client uses a SSL port without setting the > security protocol > > > Key: KAFKA-4090 > URL: https://issues.apache.org/jira/browse/KAFKA-4090 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 0.9.0.1, 0.10.0.1, 2.1.0 >Reporter: Jaikiran Pai >Assignee: Alexandre Dupriez >Priority: Major > > Quoting from the mail thread that was sent to Kafka mailing list: > {quote} > We have been using Kafka 0.9.0.1 (server and Java client libraries). So far > we had been using it with plaintext transport but recently have been > considering upgrading to using SSL. It mostly works except that a > mis-configured producer (and even consumer) causes a hard to relate > OutOfMemory exception and thus causing the JVM in which the client is > running, to go into a bad state. We can consistently reproduce that OOM very > easily. We decided to check if this is something that is fixed in 0.10.0.1 so > upgraded one of our test systems to that version (both server and client > libraries) but still see the same issue. Here's how it can be easily > reproduced > 1. Enable SSL listener on the broker via server.properties, as per the Kafka > documentation > {code} > listeners=PLAINTEXT://:9092,SSL://:9093 > ssl.keystore.location= > ssl.keystore.password=pass > ssl.key.password=pass > ssl.truststore.location= > ssl.truststore.password=pass > {code} > 2. Start zookeeper and kafka server > 3. Create a "oom-test" topic (which will be used for these tests): > {code} > kafka-topics.sh --zookeeper localhost:2181 --create --topic oom-test > --partitions 1 --replication-factor 1 > {code} > 4. 
Create a simple producer which sends a single message to the topic via > Java (new producer) APIs: > {code} > public class OOMTest { > public static void main(final String[] args) throws Exception { > final Properties kafkaProducerConfigs = new Properties(); > // NOTE: Intentionally use a SSL port without specifying > security.protocol as SSL > > kafkaProducerConfigs.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, > "localhost:9093"); > > kafkaProducerConfigs.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, > StringSerializer.class.getName()); > > kafkaProducerConfigs.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, > StringSerializer.class.getName()); > try (KafkaProducer producer = new > KafkaProducer<>(kafkaProducerConfigs)) { > System.out.println("Created Kafka producer"); > final String topicName = "oom-test"; > final String message = "Hello OOM!"; > // send a message to the topic > final Future recordMetadataFuture = > producer.send(new ProducerRecord<>(topicName, message)); > final RecordMetadata sentRecordMetadata = > recordMetadataFuture.get(); > System.out.println("Sent message '" + message + "' to topic '" + > topicName + "'"); > } > System.out.println("Tests complete"); > } > } > {code} > Notice that the server URL is using a SSL endpoint localhost:9093 but isn't > specifying any of the other necessary SSL configs like security.protocol. > 5. For the sake of easily reproducing this issue run this class with a max > heap size of 256MB (-Xmx256M). 
Running this code throws up the following > OutOfMemoryError in one of the Sender threads: > {code} > 18:33:25,770 ERROR [KafkaThread] - Uncaught exception in > kafka-producer-network-thread | producer-1: > java.lang.OutOfMemoryError: Java heap space > at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57) > at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) > at > org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93) > at > org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71) > at > org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153) > at > org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134) > at org.apache.kafka.common.network.Selector.poll(Selector.java:286) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:256) > at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216) > at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128) > at java.lang.Thread.run(Thread.java:745) > {code} > Note that I set it to 256MB as heap size to easily reproduce it but this > isn't specific to
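The usual explanation of this OOM is that the plaintext client misreads the TLS handshake bytes as a 4-byte response-size field and then tries to allocate a buffer of that bogus size. The reproducer above omits `security.protocol`; a configuration like the following sketch (placeholder truststore path and password) avoids the failure mode by telling the client that the port speaks SSL:

```java
import java.util.Properties;

public class SslProducerConfig {
    static Properties sslProducerProps() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9093");
        // Absent in the OOM reproducer: without this the client speaks PLAINTEXT to an SSL port.
        props.setProperty("security.protocol", "SSL");
        props.setProperty("ssl.truststore.location", "/path/to/client.truststore.jks");
        props.setProperty("ssl.truststore.password", "pass");
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }
}
```

Passing these properties to the `KafkaProducer` constructor in the test above is enough to make the send succeed against the SSL listener.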
[jira] [Commented] (KAFKA-9517) KTable Joins Without Materialized Argument Yield Results That Further Joins NPE On
[ https://issues.apache.org/jira/browse/KAFKA-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033785#comment-17033785 ] Paul Snively commented on KAFKA-9517: - Thanks for the suggestion to squash and cherry-pick #8015 and #8061. I've done that, and `./gradlew test` is giving me four errors that seem related to the cherry-picked PRs. I'm attaching the test report for others to perhaps analyze. My colleague and I will also attempt to reproduce the issues we specifically encountered in using 2.4.0. I'm reasonably confident we can also take some time to review these two PRs, but that seems somewhat unlikely to happen today. [^test.tar.xz] > KTable Joins Without Materialized Argument Yield Results That Further Joins > NPE On > -- > > Key: KAFKA-9517 > URL: https://issues.apache.org/jira/browse/KAFKA-9517 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 2.4.0 >Reporter: Paul Snively >Assignee: John Roesler >Priority: Blocker > Fix For: 2.5.0, 2.4.1 > > Attachments: test.tar.xz > > > The `KTable` API implemented [here|https://github.com/apache/kafka/blob/2.4.0/streams/src/main/java/org/apache/kafka/streams/kstream/internals/KTableImpl.java#L842-L844] calls `doJoinOnForeignKey` with an argument of > `Materialized.with(null, null)`, as apparently do several other APIs. As the > comment spanning [these lines|#L1098-L1099] makes clear, the result is a > `KTable` whose `valueSerde` (as a `KTableImpl`) is `null`. Therefore, > attempts to `join` etc. on the resulting `KTable` fail with a > `NullPointerException`. > While there is an obvious workaround—explicitly construct the required > `Materialized` and use the APIs that take it as an argument—I have to admit I > find the existence of public APIs with this sort of bug, particularly when > the bug is literally documented as a comment in the source code, astonishing > to the point of incredulity. 
It calls the quality and trustworthiness of > Kafka Streams into serious question, and if a resolution is not forthcoming > within a week, we will be left with no other option but to consider technical > alternatives. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-9517) KTable Joins Without Materialized Argument Yield Results That Further Joins NPE On
[ https://issues.apache.org/jira/browse/KAFKA-9517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Snively updated KAFKA-9517: Attachment: test.tar.xz > KTable Joins Without Materialized Argument Yield Results That Further Joins > NPE On > -- > > Key: KAFKA-9517 > URL: https://issues.apache.org/jira/browse/KAFKA-9517 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 2.4.0 >Reporter: Paul Snively >Assignee: John Roesler >Priority: Blocker > Fix For: 2.5.0, 2.4.1 > > Attachments: test.tar.xz > > > The `KTable` API implemented [here|https://github.com/apache/kafka/blob/2.4.0/streams/src/main/java/org/apache/kafka/streams/kstream/internals/KTableImpl.java#L842-L844] calls `doJoinOnForeignKey` with an argument of > `Materialized.with(null, null)`, as apparently do several other APIs. As the > comment spanning [these lines|#L1098-L1099] makes clear, the result is a > `KTable` whose `valueSerde` (as a `KTableImpl`) is `null`. Therefore, > attempts to `join` etc. on the resulting `KTable` fail with a > `NullPointerException`. > While there is an obvious workaround—explicitly construct the required > `Materialized` and use the APIs that take it as an argument—I have to admit I > find the existence of public APIs with this sort of bug, particularly when > the bug is literally documented as a comment in the source code, astonishing > to the point of incredulity. It calls the quality and trustworthiness of > Kafka Streams into serious question, and if a resolution is not forthcoming > within a week, we will be left with no other option but to consider technical > alternatives. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (KAFKA-8623) KafkaProducer possible deadlock when sending to different topics
[ https://issues.apache.org/jira/browse/KAFKA-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Byrne resolved KAFKA-8623. Fix Version/s: 2.3.0 Resolution: Fixed This appears to be due to an issue concerning the handling of consecutive metadata updates in clients, where the first update could effectively clear the request for the second, because no record was maintained of which version/instance of the request was outstanding. This was fixed in PR [6621|https://github.com/apache/kafka/pull/6221] (see item 3), which is available in the 2.3.0 release. > KafkaProducer possible deadlock when sending to different topics > > > Key: KAFKA-8623 > URL: https://issues.apache.org/jira/browse/KAFKA-8623 > Project: Kafka > Issue Type: Bug > Components: producer >Affects Versions: 2.2.1 >Reporter: Alexander Bagiev >Assignee: Kun Song >Priority: Critical > Fix For: 2.3.0 > > > Project with bug reproduction: [https://github.com/abagiev/kafka-producer-bug] > It was found that sending two messages in two different topics in a row > results in hanging of KafkaProducer for 60s and the following exception: > {noformat} > org.springframework.kafka.core.KafkaProducerException: Failed to send; nested > exception is org.apache.kafka.common.errors.TimeoutException: Failed to > update metadata after 6 ms. 
> at > org.springframework.kafka.core.KafkaTemplate.lambda$buildCallback$0(KafkaTemplate.java:405) > ~[spring-kafka-2.2.7.RELEASE.jar:2.2.7.RELEASE] > at > org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:877) > ~[kafka-clients-2.0.1.jar:na] > at > org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:803) > ~[kafka-clients-2.0.1.jar:na] > at > org.springframework.kafka.core.DefaultKafkaProducerFactory$CloseSafeProducer.send(DefaultKafkaProducerFactory.java:444) > ~[spring-kafka-2.2.7.RELEASE.jar:2.2.7.RELEASE] > at > org.springframework.kafka.core.KafkaTemplate.doSend(KafkaTemplate.java:381) > ~[spring-kafka-2.2.7.RELEASE.jar:2.2.7.RELEASE] > at > org.springframework.kafka.core.KafkaTemplate.send(KafkaTemplate.java:193) > ~[spring-kafka-2.2.7.RELEASE.jar:2.2.7.RELEASE] > ... > {noformat} > It looks like KafkaProducer requests two times for meta information for each > topic and hangs just before second request due to some deadlock. When 60s > pass TimeoutException is thrown and meta information is requested/received > immediately (but after exception has been already thrown). > The issue in the example project is reproduced every time; and the use case > is trivial. > This is a critical bug for us. -- This message was sent by Atlassian Jira (v8.3.4#803005)
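The resolution comment's explanation can be illustrated with a toy model (illustrative only; these are not Kafka's actual `Metadata` classes): a single "update requested" flag loses the second request when the first update completes, whereas comparing a request version against the last completed version keeps it pending:

```java
public class MetadataUpdateDemo {
    // Buggy scheme: a single flag cannot record which request a completed update satisfies.
    static class FlagMetadata {
        private boolean updateRequested;
        void requestUpdate() { updateRequested = true; }
        void updateCompleted() { updateRequested = false; }
        boolean needsUpdate() { return updateRequested; }
    }

    // Fixed scheme: a completed update only satisfies requests made before it started.
    static class VersionedMetadata {
        private int requestVersion;
        private int lastCompletedVersion;
        int requestUpdate() { return ++requestVersion; }
        void updateCompleted(int versionAtRequestTime) { lastCompletedVersion = versionAtRequestTime; }
        boolean needsUpdate() { return lastCompletedVersion < requestVersion; }
    }
}
```

In the flag scheme, the sequence "request for topic A, request for topic B, A's response arrives" clears the flag and silently drops B's request, leaving the producer to block until the metadata timeout; the versioned scheme still reports an update as needed.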
[jira] [Commented] (KAFKA-9515) Upgrade ZooKeeper to 3.5.7
[ https://issues.apache.org/jira/browse/KAFKA-9515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033759#comment-17033759 ] Ron Dagostino commented on KAFKA-9515: -- We should probably also expand the system tests to include the case where ZK TLS is enabled but clientAuth=none. > Upgrade ZooKeeper to 3.5.7 > -- > > Key: KAFKA-9515 > URL: https://issues.apache.org/jira/browse/KAFKA-9515 > Project: Kafka > Issue Type: Improvement >Reporter: Ismael Juma >Assignee: Ismael Juma >Priority: Blocker > Fix For: 2.5.0, 2.4.1 > > > There are some critical fixes in ZK 3.5.7 and the first RC has been posted: > [https://mail-archives.apache.org/mod_mbox/zookeeper-dev/202002.mbox/%3cCAGH6_KiULzemT-V4x_2ybWeKLMvQ+eh=q-dzsiz8a-ypp5t...@mail.gmail.com%3e] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9515) Upgrade ZooKeeper to 3.5.7
[ https://issues.apache.org/jira/browse/KAFKA-9515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033753#comment-17033753 ] Ismael Juma commented on KAFKA-9515: [~rndgstn] I actually think 2.5 should ship with ZK 3.5.7. Does that mean we have to do extra work? > Upgrade ZooKeeper to 3.5.7 > -- > > Key: KAFKA-9515 > URL: https://issues.apache.org/jira/browse/KAFKA-9515 > Project: Kafka > Issue Type: Improvement >Reporter: Ismael Juma >Assignee: Ismael Juma >Priority: Blocker > Fix For: 2.5.0, 2.4.1 > > > There are some critical fixes in ZK 3.5.7 and the first RC has been posted: > [https://mail-archives.apache.org/mod_mbox/zookeeper-dev/202002.mbox/%3cCAGH6_KiULzemT-V4x_2ybWeKLMvQ+eh=q-dzsiz8a-ypp5t...@mail.gmail.com%3e] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (KAFKA-9515) Upgrade ZooKeeper to 3.5.7
[ https://issues.apache.org/jira/browse/KAFKA-9515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033753#comment-17033753 ] Ismael Juma edited comment on KAFKA-9515 at 2/10/20 4:58 PM: - [~rndgstn] I actually think AK 2.5 should ship with ZK 3.5.7. Does that mean we have to do extra work? was (Author: ijuma): [~rndgstn] I actually think 2.5 should ship with ZK 3.5.7. Does that mean we have to do extra work? > Upgrade ZooKeeper to 3.5.7 > -- > > Key: KAFKA-9515 > URL: https://issues.apache.org/jira/browse/KAFKA-9515 > Project: Kafka > Issue Type: Improvement >Reporter: Ismael Juma >Assignee: Ismael Juma >Priority: Blocker > Fix For: 2.5.0, 2.4.1 > > > There are some critical fixes in ZK 3.5.7 and the first RC has been posted: > [https://mail-archives.apache.org/mod_mbox/zookeeper-dev/202002.mbox/%3cCAGH6_KiULzemT-V4x_2ybWeKLMvQ+eh=q-dzsiz8a-ypp5t...@mail.gmail.com%3e] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-8374) KafkaApis.handleLeaderAndIsrRequest not robust to ZooKeeper exceptions
[ https://issues.apache.org/jira/browse/KAFKA-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033734#comment-17033734 ] Jun Rao commented on KAFKA-8374: This could be related to https://issues.apache.org/jira/browse/KAFKA-9307. > KafkaApis.handleLeaderAndIsrRequest not robust to ZooKeeper exceptions > -- > > Key: KAFKA-8374 > URL: https://issues.apache.org/jira/browse/KAFKA-8374 > Project: Kafka > Issue Type: Bug > Components: core, offset manager >Affects Versions: 2.0.1 > Environment: Linux x86_64 (Ubuntu Xenial) running on AWS EC2 >Reporter: Mike Mintz >Assignee: Bob Barrett >Priority: Major > > h2. Summary of bug (theory) > During a leader election, when a broker is transitioning from leader to > follower on some __consumer_offset partitions and some __transaction_state > partitions, it’s possible for a ZooKeeper exception to be thrown, leaving the > GroupMetadataManager in an inconsistent state. > > In particular, in KafkaApis.handleLeaderAndIsrRequest in the > onLeadershipChange callback, it’s possible for > TransactionCoordinator.handleTxnEmigration to throw > ZooKeeperClientExpiredException, ending the updatedFollowers.foreach loop > early. If there were any __consumer_offset partitions to be handled later in > the loop, GroupMetadataManager will be left with stale data in its > groupMetadataCache. Later, when this broker resumes leadership for the > affected __consumer_offset partitions, it will fail to load the updated > groups into the cache since it uses putIfNotExists, and it will serve stale > offsets to consumers. > > h2. Details of what we experienced > We ran into this issue running Kafka 2.0.1 in production. Several Kafka > consumers received stale offsets when reconnecting to their group coordinator > after a leadership election on their __consumer_offsets partition. This > caused them to reprocess many duplicate messages. 
> > We believe we’ve tracked down the root cause: * On 2019-04-01, we were having > memory pressure in ZooKeeper, and we were getting several > ZooKeeperClientExpiredException errors in the logs. > * The impacted consumers were all in __consumer_offsets-15. There was a > leader election on this partition, and leadership transitioned from broker > 1088 to broker 1069. During this leadership election, the former leader > (1088) saw a ZooKeeperClientExpiredException (stack trace below). This > happened inside KafkaApis.handleLeaderAndIsrRequest, specifically in > onLeadershipChange while it was updating a __transaction_state partition. > Since there are no “Scheduling unloading” or “Finished unloading” log > messages in this period, we believe it threw this exception before getting to > __consumer_offsets-15, so it never got a chance to call > GroupCoordinator.handleGroupEmigration, which means this broker didn’t unload > offsets from its GroupMetadataManager. > * Four days later, on 2019-04-05, we manually restarted broker 1069, so > broker 1088 became the leader of __consumer_offsets-15 again. When it ran > GroupMetadataManager.loadGroup, it presumably failed to update > groupMetadataCache since it uses putIfNotExists, and it would have found the > group id already in the cache. Unfortunately we did not have debug logging > enabled, but I would expect to have seen a log message like "Attempt to load > group ${group.groupId} from log with generation ${group.generationId} failed > because there is already a cached group with generation > ${currentGroup.generationId}". > * After the leadership election, the impacted consumers reconnected to > broker 1088 and received stale offsets that correspond to the last committed > offsets around 2019-04-01. > > h2. 
Relevant log/stacktrace > {code:java} > [2019-04-01 22:44:18.968617] [2019-04-01 22:44:18,963] ERROR [KafkaApi-1088] > Error when handling request > {controller_id=1096,controller_epoch=122,partition_states=[...,{topic=__consumer_offsets,partition=15,controller_epoch=122,leader=1069,leader_epoch=440,isr=[1092,1088,1069],zk_version=807,replicas=[1069,1088,1092],is_new=false},...],live_leaders=[{id=1069,host=10.68.42.121,port=9094}]} > (kafka.server.KafkaApis) > [2019-04-01 22:44:18.968689] kafka.zookeeper.ZooKeeperClientExpiredException: > Session expired either before or while waiting for connection > [2019-04-01 22:44:18.968712] at > kafka.zookeeper.ZooKeeperClient$$anonfun$kafka$zookeeper$ZooKeeperClient$$waitUntilConnected$1.apply$mcV$sp(ZooKeeperClient.scala:238) > [2019-04-01 22:44:18.968736] at > kafka.zookeeper.ZooKeeperClient$$anonfun$kafka$zookeeper$ZooKeeperClient$$waitUntilConnected$1.apply(ZooKeeperClient.scala:226) > [2019-04-01 22:44:18.968759] at >
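The `putIfNotExists` hazard described in the theory above is easy to demonstrate with a plain map (names are illustrative, not the `GroupMetadataManager` API): once a stale entry survives a failed emigration, a put-if-absent reload can never replace it, and only removing the entry first (what `handleGroupEmigration` would have done) lets the fresh generation in:

```java
import java.util.concurrent.ConcurrentHashMap;

public class StaleCacheDemo {
    static final ConcurrentHashMap<String, Integer> groupGeneration = new ConcurrentHashMap<>();

    // Mirrors the put-if-absent reload described above: an existing (stale) entry wins.
    static int loadGroup(String groupId, int generationFromLog) {
        groupGeneration.putIfAbsent(groupId, generationFromLog);
        return groupGeneration.get(groupId);
    }
}
```

Seeding the map with a stale generation and then reloading with a newer one returns the stale value; removing the entry before the reload returns the fresh one.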
[jira] [Commented] (KAFKA-9515) Upgrade ZooKeeper to 3.5.7
[ https://issues.apache.org/jira/browse/KAFKA-9515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033729#comment-17033729 ] Ron Dagostino commented on KAFKA-9515: -- ZooKeeper 3.5.7 also adds support for the "ssl.clientAuth=[want|need|none]" configuration on the ZooKeeper server side. This means with v3.5.7 client certificates become optional (they are required in 3.5.6, which is what shipped with AK 2.4 and what will ship with AK 2.5). As per [this GitHub PR conversation for KIP 515|https://github.com/apache/kafka/pull/8003#discussion_r376476887] (text adjusted a bit now that we have more info): "We need to decide in 3 places (KafkaServer, ConfigCommand, and ZkSecurityMigrator) whether or not the ZooKeeper client should generate ACLs in ZooKeeper when creating znodes. Prior to the possibility of x509 authentication it was easy to decide: was SASL enabled to ZooKeeper or not. Now it is supported for SASL to not be enabled but x509 auth to be enabled -- and in that case we want to generate ACLs. So in the 3 cases we have to look for this possibility. I agree it is entirely possible that ZooKeeper might not authenticate the client -- technically in ZK 3.5.6 it is not possible to turn that off, but it will be possible in ZK 3.5.7 and beyond. So while with ZooKeeper 3.5.6 it isn't an issue, at some point in the future it will be. It is possible that ZK might ignore the client certificate, we might generate ACLs, and those ACLs might grant access to World. One idea to avoid this is to make the connection with ACLs enabled, create a random temporary znode, read the ACLs, and check if it is world-enabled; then abort at that point if it is. It would probably be a good idea to add this when we upgrade to ZooKeeper 3.5.7." 
> Upgrade ZooKeeper to 3.5.7 > -- > > Key: KAFKA-9515 > URL: https://issues.apache.org/jira/browse/KAFKA-9515 > Project: Kafka > Issue Type: Improvement >Reporter: Ismael Juma >Assignee: Ismael Juma >Priority: Blocker > Fix For: 2.5.0, 2.4.1 > > > There are some critical fixes in ZK 3.5.7 and the first RC has been posted: > [https://mail-archives.apache.org/mod_mbox/zookeeper-dev/202002.mbox/%3cCAGH6_KiULzemT-V4x_2ybWeKLMvQ+eh=q-dzsiz8a-ypp5t...@mail.gmail.com%3e] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-9469) Add zookeeper.ssl.context.supplier.class config if/when adopting ZooKeeper 3.6
[ https://issues.apache.org/jira/browse/KAFKA-9469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033727#comment-17033727 ] Ron Dagostino commented on KAFKA-9469: -- Just checked and "zookeeper.ssl.context.supplier.class" did not make it into ZooKeeper 3.5.7. So we must still wait for v3.6. > Add zookeeper.ssl.context.supplier.class config if/when adopting ZooKeeper 3.6 > -- > > Key: KAFKA-9469 > URL: https://issues.apache.org/jira/browse/KAFKA-9469 > Project: Kafka > Issue Type: New Feature > Components: config >Reporter: Ron Dagostino >Assignee: Ron Dagostino >Priority: Minor > > The "zookeeper.ssl.context.supplier.class" configuration doesn't actually > exist in ZooKeeper 3.5.6. The ZooKeeper admin guide documents it as being > there, but it doesn't appear in the code. This means we can't support it in > KIP-515, and it has been removed from that KIP. > I checked the latest ZooKeeper 3.6 SNAPSHOT, and it has been added. So this > config could probably be added to Kafka via a new, small KIP if/when we > upgrade to ZooKeeper 3.6 (which looks to be in release-candidate stage at the > moment). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (KAFKA-9531) java.net.UnknownHostException loop on VM rolling update using CNAME
[ https://issues.apache.org/jira/browse/KAFKA-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033650#comment-17033650 ] Rui Abreu edited comment on KAFKA-9531 at 2/10/20 3:13 PM: --- Seems related to the following issues: https://issues.apache.org/jira/browse/KAFKA-7890 https://issues.apache.org/jira/browse/KAFKA-8057 https://issues.apache.org/jira/browse/KAFKA-7755 was (Author: rabreu): Seems related to the following issues: https://issues.apache.org/jira/browse/KAFKA-7890 https://issues.apache.org/jira/browse/KAFKA-8057 > java.net.UnknownHostException loop on VM rolling update using CNAME > --- > > Key: KAFKA-9531 > URL: https://issues.apache.org/jira/browse/KAFKA-9531 > Project: Kafka > Issue Type: Bug > Components: clients, controller, network, producer >Affects Versions: 2.4.0 >Reporter: Rui Abreu >Priority: Major > > Hello, > > My cluster setup is based on VMs behind a DNS CNAME. > Example: node.internal is a CNAME to either nodeA.internal or nodeB.internal > Since kafka-client 1.2.1, it has been observed that sometimes Kafka clients > get stuck in a loop with the exception: > Example after nodeB.internal is replaced with nodeA.internal > > {code:java} > 2020-02-10T12:11:28.181Z o.a.k.c.NetworkClient [WARN]- [Consumer > clientId=consumer-6, groupId=consumer.group] Error connecting to node > nodeB.internal:9092 (id: 2 rack: null) > java.net.UnknownHostException: nodeB.internal:9092 > at java.net.InetAddress.getAllByName0(InetAddress.java:1281) > ~[?:1.8.0_222] > at java.net.InetAddress.getAllByName(InetAddress.java:1193) > ~[?:1.8.0_222] > at java.net.InetAddress.getAllByName(InetAddress.java:1127) > ~[?:1.8.0_222] > at org.apache.kafka.clients.ClientUtils.resolve(ClientUtils.java:104) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.ClusterConnectionStates$NodeConnectionState.currentAddress(ClusterConnectionStates.java:403) > ~[stormjar.jar:?] 
> at > org.apache.kafka.clients.ClusterConnectionStates$NodeConnectionState.access$200(ClusterConnectionStates.java:363) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.ClusterConnectionStates.currentAddress(ClusterConnectionStates.java:151) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:943) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.NetworkClient.access$600(NetworkClient.java:68) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate(NetworkClient.java:1114) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate(NetworkClient.java:1005) > ~[stormjar.jar:?] > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:537) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:262) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:161) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:366) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1251) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1220) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1159) > ~[stormjar.jar:?] > at > org.apache.storm.kafka.spout.KafkaSpout.pollKafkaBroker(KafkaSpout.java:365) > ~[stormjar.jar:?] 
> at > org.apache.storm.kafka.spout.KafkaSpout.nextTuple(KafkaSpout.java:294) > ~[stormjar.jar:?] > at > org.apache.storm.daemon.executor$fn__10715$fn__10730$fn__10761.invoke(executor.clj:649) > ~[storm-core-1.1.3.jar:1.1.3] > at org.apache.storm.util$async_loop$fn__553.invoke(util.clj:484) > ~[storm-core-1.1.3.jar:1.1.3] > at clojure.lang.AFn.run(AFn.java:22) ~[clojure-1.7.0.jar:?] > at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222] > {code} > > The time it spends in the loop is arbitrary, but it seems the client > effectively stops while this is happening. > This error contrasts with instances where the client is able to recover on > its own after a few seconds: > {code:java} >
[jira] [Assigned] (KAFKA-8623) KafkaProducer possible deadlock when sending to different topics
[ https://issues.apache.org/jira/browse/KAFKA-8623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kun Song reassigned KAFKA-8623: --- Assignee: Kun Song > KafkaProducer possible deadlock when sending to different topics > > > Key: KAFKA-8623 > URL: https://issues.apache.org/jira/browse/KAFKA-8623 > Project: Kafka > Issue Type: Bug > Components: producer >Affects Versions: 2.2.1 >Reporter: Alexander Bagiev >Assignee: Kun Song >Priority: Critical > > Project with bug reproduction: [https://github.com/abagiev/kafka-producer-bug] > It was found that sending two messages in two different topics in a row > results in hanging of KafkaProducer for 60s and the following exception: > {noformat} > org.springframework.kafka.core.KafkaProducerException: Failed to send; nested > exception is org.apache.kafka.common.errors.TimeoutException: Failed to > update metadata after 6 ms. > at > org.springframework.kafka.core.KafkaTemplate.lambda$buildCallback$0(KafkaTemplate.java:405) > ~[spring-kafka-2.2.7.RELEASE.jar:2.2.7.RELEASE] > at > org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:877) > ~[kafka-clients-2.0.1.jar:na] > at > org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:803) > ~[kafka-clients-2.0.1.jar:na] > at > org.springframework.kafka.core.DefaultKafkaProducerFactory$CloseSafeProducer.send(DefaultKafkaProducerFactory.java:444) > ~[spring-kafka-2.2.7.RELEASE.jar:2.2.7.RELEASE] > at > org.springframework.kafka.core.KafkaTemplate.doSend(KafkaTemplate.java:381) > ~[spring-kafka-2.2.7.RELEASE.jar:2.2.7.RELEASE] > at > org.springframework.kafka.core.KafkaTemplate.send(KafkaTemplate.java:193) > ~[spring-kafka-2.2.7.RELEASE.jar:2.2.7.RELEASE] > ... > {noformat} > It looks like KafkaProducer requests two times for meta information for each > topic and hangs just before second request due to some deadlock. 
When 60s > pass TimeoutException is thrown and meta information is requested/received > immediately (but after exception has been already thrown). > The issue in the example project is reproduced every time; and the use case > is trivial. > This is a critical bug for us. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KAFKA-9531) java.net.UnknownHostException loop on VM rolling update using CNAME
[ https://issues.apache.org/jira/browse/KAFKA-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Abreu updated KAFKA-9531: - Component/s: network > java.net.UnknownHostException loop on VM rolling update using CNAME > --- > > Key: KAFKA-9531 > URL: https://issues.apache.org/jira/browse/KAFKA-9531 > Project: Kafka > Issue Type: Bug > Components: clients, controller, network, producer >Affects Versions: 2.4.0 >Reporter: Rui Abreu >Priority: Major > > Hello, > > My cluster setup is based on VMs behind a DNS CNAME. > Example: node.internal is a CNAME to either nodeA.internal or nodeB.internal > Since kafka-client 1.2.1, it has been observed that sometimes Kafka clients > get stuck in a loop with the exception: > Example after nodeB.internal is replaced with nodeA.internal > > {code:java} > 2020-02-10T12:11:28.181Z o.a.k.c.NetworkClient [WARN]- [Consumer > clientId=consumer-6, groupId=consumer.group] Error connecting to node > nodeB.internal:9092 (id: 2 rack: null) > java.net.UnknownHostException: nodeB.internal:9092 > at java.net.InetAddress.getAllByName0(InetAddress.java:1281) > ~[?:1.8.0_222] > at java.net.InetAddress.getAllByName(InetAddress.java:1193) > ~[?:1.8.0_222] > at java.net.InetAddress.getAllByName(InetAddress.java:1127) > ~[?:1.8.0_222] > at org.apache.kafka.clients.ClientUtils.resolve(ClientUtils.java:104) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.ClusterConnectionStates$NodeConnectionState.currentAddress(ClusterConnectionStates.java:403) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.ClusterConnectionStates$NodeConnectionState.access$200(ClusterConnectionStates.java:363) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.ClusterConnectionStates.currentAddress(ClusterConnectionStates.java:151) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:943) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.NetworkClient.access$600(NetworkClient.java:68) > ~[stormjar.jar:?] 
> at > org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate(NetworkClient.java:1114) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate(NetworkClient.java:1005) > ~[stormjar.jar:?] > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:537) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:262) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:161) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:366) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1251) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1220) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1159) > ~[stormjar.jar:?] > at > org.apache.storm.kafka.spout.KafkaSpout.pollKafkaBroker(KafkaSpout.java:365) > ~[stormjar.jar:?] > at > org.apache.storm.kafka.spout.KafkaSpout.nextTuple(KafkaSpout.java:294) > ~[stormjar.jar:?] > at > org.apache.storm.daemon.executor$fn__10715$fn__10730$fn__10761.invoke(executor.clj:649) > ~[storm-core-1.1.3.jar:1.1.3] > at org.apache.storm.util$async_loop$fn__553.invoke(util.clj:484) > ~[storm-core-1.1.3.jar:1.1.3] > at clojure.lang.AFn.run(AFn.java:22) ~[clojure-1.7.0.jar:?] 
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222] > {code} > > The time it spends in the loop is arbitrary, but it seems the client > effectively stops while this is happening. > This error contrasts with instances where the client is able to recover on > its own after a few seconds: > {code:java} > 2020-02-08T01:15:37.390Z o.a.k.c.c.i.AbstractCoordinator [INFO] - [Consumer > clientId=consumer-7, groupId=consumer-group] Group coordinator > nodeA.internal:9092 (id: 2147483645 rack: null) is unavailable or invalid, > will attempt rediscovery > > 2020-02-08T01:15:37.885Z o.a.k.c.c.i.AbstractCoordinator [INFO] - [Consumer > clientId=consumer-7, groupId=consumer-group] Discovered group coordinator > nodeB.internal:9092 (id:
[jira] [Commented] (KAFKA-9504) Memory leak in KafkaMetrics registered to MBean
[ https://issues.apache.org/jira/browse/KAFKA-9504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033663#comment-17033663 ] Murilo Tavares commented on KAFKA-9504: --- I just ran the above against version 2.3.1, and it looks fine on that release. So apparently this was really introduced in 2.4.0. > Memory leak in KafkaMetrics registered to MBean > --- > > Key: KAFKA-9504 > URL: https://issues.apache.org/jira/browse/KAFKA-9504 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 2.4.0 >Reporter: Andreas Holmén >Priority: Major > > After close() is called on a KafkaConsumer, some registered MBeans are not > unregistered, causing a leak. > > > {code:java} > import static > org.apache.kafka.clients.consumer.ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG; > import java.lang.management.ManagementFactory; > import java.util.HashMap; > import java.util.Map; > import javax.management.MBeanServer; > import org.apache.kafka.clients.consumer.KafkaConsumer; > import org.apache.kafka.common.serialization.ByteArrayDeserializer; > public class Leaker { > private static String bootstrapServers = "hostname:9092"; > > public static void main(String[] args) throws InterruptedException { > MBeanServer mBeanServer = ManagementFactory.getPlatformMBeanServer(); > Map<String, Object> props = new HashMap<>(); > props.put(BOOTSTRAP_SERVERS_CONFIG, bootstrapServers); > > int beans = mBeanServer.getMBeanCount(); > for (int i = 0; i < 100; i++) { >KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props, new > ByteArrayDeserializer(), new ByteArrayDeserializer()); >consumer.close(); > } > int newBeans = mBeanServer.getMBeanCount(); > System.out.println("\nbeans delta: " + (newBeans - beans)); > } > } > {code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
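[Editor's note] The reproduction above needs a broker and the kafka-clients jar. The measurement technique itself can be isolated with plain JMX: the sketch below (the Demo MBean is invented purely for illustration) shows what a leak-free register/unregister cycle looks like against the platform MBeanServer, which is exactly the property a correct KafkaConsumer.close() should preserve.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class MBeanLeakCheck {
    // Standard MBean naming: the management interface must be named <impl>MBean.
    public interface DemoMBean { int getValue(); }
    public static class Demo implements DemoMBean {
        @Override public int getValue() { return 42; }
    }

    // Register and immediately unregister n MBeans; a clean shutdown path
    // leaves the platform MBean count unchanged, i.e. a delta of 0.
    static int registerAndUnregister(int n) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        int before = server.getMBeanCount();
        for (int i = 0; i < n; i++) {
            ObjectName name = new ObjectName("demo:type=LeakCheck,id=" + i);
            server.registerMBean(new Demo(), name);
            server.unregisterMBean(name); // the step the leaking close() path skips
        }
        return server.getMBeanCount() - before;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("beans delta: " + registerAndUnregister(100));
    }
}
```

A positive delta after the loop, as in the reporter's output, is the leak signature.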
[jira] [Commented] (KAFKA-8904) Reduce metadata lookups when producing to a large number of topics
[ https://issues.apache.org/jira/browse/KAFKA-8904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033653#comment-17033653 ] ASF GitHub Bot commented on KAFKA-8904: --- rajinisivaram commented on pull request #7781: KAFKA-8904: Improve producer's topic metadata fetching. URL: https://github.com/apache/kafka/pull/7781 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Reduce metadata lookups when producing to a large number of topics > -- > > Key: KAFKA-8904 > URL: https://issues.apache.org/jira/browse/KAFKA-8904 > Project: Kafka > Issue Type: Improvement > Components: controller, producer >Reporter: Brian Byrne >Priority: Minor > > Per [~lbradstreet]: > > "The problem was that the producer starts with no knowledge of topic > metadata. So they start the producer up, and then they start sending messages > to any of the thousands of topics that exist. Each time a message is sent to > a new topic, it'll trigger a metadata request if the producer doesn't know > about it. These metadata requests are done in serial such that if you send > 2000 messages to 2000 topics, it will trigger 2000 new metadata requests. > > Each successive metadata request will include every topic seen so far, so the > first metadata request will include 1 topic, the second will include 2 > topics, etc. > > An additional problem is that this can take a while, and metadata expiry (for > metadata that has not been recently used) is hard-coded to 5 mins, so if the > initial fetches take long enough you can end up evicting the metadata > before you send another message to a topic. > So the approaches above are: > 1. 
We can linger for a bit before making a metadata request, allow more sends > to go through, and then batch the metadata request for topics we need in a > single metadata request. > 2. We can allow pre-seeding the producer with metadata for a list of topics > you care about. > I prefer 1 if we can make it work." -- This message was sent by Atlassian Jira (v8.3.4#803005)
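[Editor's note] Approach 1 above (linger, then batch) can be sketched without touching Kafka internals. The toy simulation below only illustrates the counting argument: 2000 sends to 2000 unknown topics collapse into one combined metadata request instead of 2000 serial ones. Class and method names are invented for illustration; this is not the producer's actual code.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class MetadataBatcher {
    private final Set<String> pending = new LinkedHashSet<>();
    private final List<Set<String>> issued = new ArrayList<>();

    // Called on each send to a topic with no cached metadata: just record it.
    public void send(String topic) { pending.add(topic); }

    // Called when the linger window elapses: one request covering every
    // topic seen during the window, instead of one request per topic.
    public void flush() {
        if (!pending.isEmpty()) {
            issued.add(new LinkedHashSet<>(pending));
            pending.clear();
        }
    }

    public int requestCount() { return issued.size(); }
    public Set<String> request(int i) { return issued.get(i); }

    public static void main(String[] args) {
        MetadataBatcher batcher = new MetadataBatcher();
        for (int i = 0; i < 2000; i++) batcher.send("topic-" + i);
        batcher.flush();
        System.out.println("metadata requests issued: " + batcher.requestCount());
    }
}
```

The trade-off the lingering introduces is a small added latency on the first send to each new topic, which is why the comment frames it as "linger for a bit".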
[jira] [Commented] (KAFKA-9531) java.net.UnknownHostException loop on VM rolling update using CNAME
[ https://issues.apache.org/jira/browse/KAFKA-9531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033650#comment-17033650 ] Rui Abreu commented on KAFKA-9531: -- Seems related to the following issues: https://issues.apache.org/jira/browse/KAFKA-7890 https://issues.apache.org/jira/browse/KAFKA-8057 > java.net.UnknownHostException loop on VM rolling update using CNAME > --- > > Key: KAFKA-9531 > URL: https://issues.apache.org/jira/browse/KAFKA-9531 > Project: Kafka > Issue Type: Bug > Components: clients, controller, producer >Affects Versions: 2.4.0 >Reporter: Rui Abreu >Priority: Major > > Hello, > > My cluster setup is based on VMs behind a DNS CNAME. > Example: node.internal is a CNAME to either nodeA.internal or nodeB.internal > Since kafka-client 1.2.1, it has been observed that sometimes Kafka clients > get stuck in a loop with the exception: > Example after nodeB.internal is replaced with nodeA.internal > > {code:java} > 2020-02-10T12:11:28.181Z o.a.k.c.NetworkClient [WARN]- [Consumer > clientId=consumer-6, groupId=consumer.group] Error connecting to node > nodeB.internal:9092 (id: 2 rack: null) > java.net.UnknownHostException: nodeB.internal:9092 > at java.net.InetAddress.getAllByName0(InetAddress.java:1281) > ~[?:1.8.0_222] > at java.net.InetAddress.getAllByName(InetAddress.java:1193) > ~[?:1.8.0_222] > at java.net.InetAddress.getAllByName(InetAddress.java:1127) > ~[?:1.8.0_222] > at org.apache.kafka.clients.ClientUtils.resolve(ClientUtils.java:104) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.ClusterConnectionStates$NodeConnectionState.currentAddress(ClusterConnectionStates.java:403) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.ClusterConnectionStates$NodeConnectionState.access$200(ClusterConnectionStates.java:363) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.ClusterConnectionStates.currentAddress(ClusterConnectionStates.java:151) > ~[stormjar.jar:?] 
> at > org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:943) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.NetworkClient.access$600(NetworkClient.java:68) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate(NetworkClient.java:1114) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate(NetworkClient.java:1005) > ~[stormjar.jar:?] > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:537) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:262) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:161) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:366) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1251) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1220) > ~[stormjar.jar:?] > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1159) > ~[stormjar.jar:?] > at > org.apache.storm.kafka.spout.KafkaSpout.pollKafkaBroker(KafkaSpout.java:365) > ~[stormjar.jar:?] > at > org.apache.storm.kafka.spout.KafkaSpout.nextTuple(KafkaSpout.java:294) > ~[stormjar.jar:?] 
> at > org.apache.storm.daemon.executor$fn__10715$fn__10730$fn__10761.invoke(executor.clj:649) > ~[storm-core-1.1.3.jar:1.1.3] > at org.apache.storm.util$async_loop$fn__553.invoke(util.clj:484) > ~[storm-core-1.1.3.jar:1.1.3] > at clojure.lang.AFn.run(AFn.java:22) ~[clojure-1.7.0.jar:?] > at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222] > {code} > > The time it spends in the loop is arbitrary, but it seems the client > effectively stops while this is happening. > This error contrasts with instances where the client is able to recover on > its own after a few seconds: > {code:java} > 2020-02-08T01:15:37.390Z o.a.k.c.c.i.AbstractCoordinator [INFO] - [Consumer > clientId=consumer-7, groupId=consumer-group] Group coordinator > nodeA.internal:9092 (id: 2147483645 rack: null) is unavailable or invalid, > will attempt rediscovery > > 2020-02-08T01:15:37.885Z
[jira] [Created] (KAFKA-9531) java.net.UnknownHostException loop on VM rolling update using CNAME
Rui Abreu created KAFKA-9531: Summary: java.net.UnknownHostException loop on VM rolling update using CNAME Key: KAFKA-9531 URL: https://issues.apache.org/jira/browse/KAFKA-9531 Project: Kafka Issue Type: Bug Components: clients, controller, producer Affects Versions: 2.4.0 Reporter: Rui Abreu Hello, My cluster setup is based on VMs behind a DNS CNAME. Example: node.internal is a CNAME to either nodeA.internal or nodeB.internal Since kafka-client 1.2.1, it has been observed that sometimes Kafka clients get stuck in a loop with the exception: Example after nodeB.internal is replaced with nodeA.internal {code:java} 2020-02-10T12:11:28.181Z o.a.k.c.NetworkClient [WARN]- [Consumer clientId=consumer-6, groupId=consumer.group] Error connecting to node nodeB.internal:9092 (id: 2 rack: null) java.net.UnknownHostException: nodeB.internal:9092 at java.net.InetAddress.getAllByName0(InetAddress.java:1281) ~[?:1.8.0_222] at java.net.InetAddress.getAllByName(InetAddress.java:1193) ~[?:1.8.0_222] at java.net.InetAddress.getAllByName(InetAddress.java:1127) ~[?:1.8.0_222] at org.apache.kafka.clients.ClientUtils.resolve(ClientUtils.java:104) ~[stormjar.jar:?] at org.apache.kafka.clients.ClusterConnectionStates$NodeConnectionState.currentAddress(ClusterConnectionStates.java:403) ~[stormjar.jar:?] at org.apache.kafka.clients.ClusterConnectionStates$NodeConnectionState.access$200(ClusterConnectionStates.java:363) ~[stormjar.jar:?] at org.apache.kafka.clients.ClusterConnectionStates.currentAddress(ClusterConnectionStates.java:151) ~[stormjar.jar:?] at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:943) ~[stormjar.jar:?] at org.apache.kafka.clients.NetworkClient.access$600(NetworkClient.java:68) ~[stormjar.jar:?] at org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate(NetworkClient.java:1114) ~[stormjar.jar:?] at org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate(NetworkClient.java:1005) ~[stormjar.jar:?] 
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:537) ~[stormjar.jar:?] at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:262) ~[stormjar.jar:?] at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:233) ~[stormjar.jar:?] at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224) ~[stormjar.jar:?] at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:161) ~[stormjar.jar:?] at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:366) ~[stormjar.jar:?] at org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1251) ~[stormjar.jar:?] at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1220) ~[stormjar.jar:?] at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1159) ~[stormjar.jar:?] at org.apache.storm.kafka.spout.KafkaSpout.pollKafkaBroker(KafkaSpout.java:365) ~[stormjar.jar:?] at org.apache.storm.kafka.spout.KafkaSpout.nextTuple(KafkaSpout.java:294) ~[stormjar.jar:?] at org.apache.storm.daemon.executor$fn__10715$fn__10730$fn__10761.invoke(executor.clj:649) ~[storm-core-1.1.3.jar:1.1.3] at org.apache.storm.util$async_loop$fn__553.invoke(util.clj:484) ~[storm-core-1.1.3.jar:1.1.3] at clojure.lang.AFn.run(AFn.java:22) ~[clojure-1.7.0.jar:?] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_222] {code} The time it spends in the loop is arbitrary, but it seems the client effectively stops while this is happening. 
This error contrasts with instances where the client is able to recover on its own after a few seconds: {code:java} 2020-02-08T01:15:37.390Z o.a.k.c.c.i.AbstractCoordinator [INFO] - [Consumer clientId=consumer-7, groupId=consumer-group] Group coordinator nodeA.internal:9092 (id: 2147483645 rack: null) is unavailable or invalid, will attempt rediscovery 2020-02-08T01:15:37.885Z o.a.k.c.c.i.AbstractCoordinator [INFO] - [Consumer clientId=consumer-7, groupId=consumer-group] Discovered group coordinator nodeB.internal:9092 (id: 2147483646 rack: null) 2020-02-08T01:15:37.885Z o.a.k.c.ClusterConnectionStates [INFO] - [Consumer clientId=consumer-7, groupId=consumer-group] Hostname for node 2147483646 changed from nodeA.internal to nodeB.internal {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
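[Editor's note] Not a confirmed fix for this ticket, but for CNAME-fronted brokers the client-side DNS behavior is configurable since Kafka 2.1 via `client.dns.lookup` (KIP-235/KIP-302), and is worth trying before the client-restart workaround. A sketch of the relevant properties; the bootstrap hostname is the example name from the report:

```properties
# Hypothetical mitigation sketch -- not verified against this exact loop.
bootstrap.servers=node.internal:9092
# Try every IP a hostname resolves to before failing the connect attempt:
client.dns.lookup=use_all_dns_ips
# Alternative documented value, aimed at CNAME-based bootstrap setups:
# client.dns.lookup=resolve_canonical_bootstrap_servers_only
```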
[jira] [Created] (KAFKA-9532) Deleting the consumer group programmatically using RESTful API
Rakshith Mamidala created KAFKA-9532: - Summary: Deleting the consumer group programmatically using RESTful API Key: KAFKA-9532 URL: https://issues.apache.org/jira/browse/KAFKA-9532 Project: Kafka Issue Type: Wish Components: clients Affects Versions: 2.4.0 Reporter: Rakshith Mamidala Fix For: 2.4.0 As a project requirement, instead of listening for messages and consuming / storing message data in a database, we create consumer groups at runtime per user (to avoid thread-safety issues), use consumer.poll and consumer.seekToBeginning, and once all the messages are read we close the connection and unsubscribe the consumer group. What happens in Kafka is that the consumer groups move from active state to DEAD state but never get removed / deleted; Kafka Tools shows all the consumers even when they are DEAD. *What we want:* # How do we remove / delete consumer groups programmatically? # Is there a REST endpoint / command line / script to delete consumer groups? What are they? # What impact can DEAD consumer groups have in a production environment? -- This message was sent by Atlassian Jira (v8.3.4#803005)
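[Editor's note] Apache Kafka itself exposes no REST endpoint, but inactive groups can be deleted programmatically via the Java AdminClient (`deleteConsumerGroups`) and from the command line with the shipped tool. A sketch; the bootstrap host and group id below are placeholders, and the group must have no active members for deletion to succeed:

```shell
# Delete one (inactive) consumer group by id:
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --delete --group per-user-group-42

# List groups first to find the DEAD ones worth cleaning up:
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list
```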
[jira] [Commented] (KAFKA-4090) JVM runs into OOM if (Java) client uses a SSL port without setting the security protocol
[ https://issues.apache.org/jira/browse/KAFKA-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033588#comment-17033588 ] Alexandre Dupriez commented on KAFKA-4090: -- Thanks [~belugabehr] for sharing your results. Do you have a pre-PR with your changes so that I can have a look? > JVM runs into OOM if (Java) client uses a SSL port without setting the > security protocol > > > Key: KAFKA-4090 > URL: https://issues.apache.org/jira/browse/KAFKA-4090 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 0.9.0.1, 0.10.0.1, 2.1.0 >Reporter: Jaikiran Pai >Assignee: Alexandre Dupriez >Priority: Major > > Quoting from the mail thread that was sent to Kafka mailing list: > {quote} > We have been using Kafka 0.9.0.1 (server and Java client libraries). So far > we had been using it with plaintext transport but recently have been > considering upgrading to using SSL. It mostly works except that a > mis-configured producer (and even consumer) causes a hard to relate > OutOfMemory exception and thus causing the JVM in which the client is > running, to go into a bad state. We can consistently reproduce that OOM very > easily. We decided to check if this is something that is fixed in 0.10.0.1 so > upgraded one of our test systems to that version (both server and client > libraries) but still see the same issue. Here's how it can be easily > reproduced > 1. Enable SSL listener on the broker via server.properties, as per the Kafka > documentation > {code} > listeners=PLAINTEXT://:9092,SSL://:9093 > ssl.keystore.location= > ssl.keystore.password=pass > ssl.key.password=pass > ssl.truststore.location= > ssl.truststore.password=pass > {code} > 2. Start zookeeper and kafka server > 3. Create a "oom-test" topic (which will be used for these tests): > {code} > kafka-topics.sh --zookeeper localhost:2181 --create --topic oom-test > --partitions 1 --replication-factor 1 > {code} > 4. 
Create a simple producer which sends a single message to the topic via > Java (new producer) APIs: > {code} > public class OOMTest { > public static void main(final String[] args) throws Exception { > final Properties kafkaProducerConfigs = new Properties(); > // NOTE: Intentionally use a SSL port without specifying > security.protocol as SSL > > kafkaProducerConfigs.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, > "localhost:9093"); > > kafkaProducerConfigs.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, > StringSerializer.class.getName()); > > kafkaProducerConfigs.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, > StringSerializer.class.getName()); > try (KafkaProducer<String, String> producer = new > KafkaProducer<>(kafkaProducerConfigs)) { > System.out.println("Created Kafka producer"); > final String topicName = "oom-test"; > final String message = "Hello OOM!"; > // send a message to the topic > final Future<RecordMetadata> recordMetadataFuture = > producer.send(new ProducerRecord<>(topicName, message)); > final RecordMetadata sentRecordMetadata = > recordMetadataFuture.get(); > System.out.println("Sent message '" + message + "' to topic '" + > topicName + "'"); > } > System.out.println("Tests complete"); > } > } > {code} > Notice that the server URL is using a SSL endpoint localhost:9093 but isn't > specifying any of the other necessary SSL configs like security.protocol. > 5. For the sake of easily reproducing this issue run this class with a max > heap size of 256MB (-Xmx256M). 
Running this code throws up the following > OutOfMemoryError in one of the Sender threads: > {code} > 18:33:25,770 ERROR [KafkaThread] - Uncaught exception in > kafka-producer-network-thread | producer-1: > java.lang.OutOfMemoryError: Java heap space > at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) > at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) > at > org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93) > at > org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71) > at > org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153) > at > org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134) > at org.apache.kafka.common.network.Selector.poll(Selector.java:286) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:256) > at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216) > at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128) > at java.lang.Thread.run(Thread.java:745) > {code} > Note that I set
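[Editor's note] For reference, the client-side misconfiguration that triggers this OOM is the missing `security.protocol`: the client speaks plaintext to an SSL listener and misinterprets the TLS handshake bytes as a huge receive size. Declaring the protocol for the 9093 listener in the reproduction above requires at minimum something like the following (truststore path and password are placeholders):

```properties
bootstrap.servers=localhost:9093
# Tell the client the port speaks SSL instead of plaintext:
security.protocol=SSL
ssl.truststore.location=/path/to/client.truststore.jks
ssl.truststore.password=pass
```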
[jira] [Commented] (KAFKA-9423) Refine layout of configuration options on website and make individual settings directly linkable
[ https://issues.apache.org/jira/browse/KAFKA-9423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033569#comment-17033569 ] ASF GitHub Bot commented on KAFKA-9423: --- mimaison commented on pull request #251: KAFKA-9423: Refine layout of configuration options on website and make individual settings directly linkable URL: https://github.com/apache/kafka-site/pull/251 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Refine layout of configuration options on website and make individual > settings directly linkable > > > Key: KAFKA-9423 > URL: https://issues.apache.org/jira/browse/KAFKA-9423 > Project: Kafka > Issue Type: Improvement > Components: documentation >Reporter: Sönke Liebau >Assignee: Sönke Liebau >Priority: Trivial > Attachments: option1.png, option2.png, option3.png, option4.png > > > KAFKA-8474 changed the layout of configuration options on the website from a > table which over time ran out of horizontal space to a list. > This vastly improved readability but is not yet ideal. Further discussion was > had in the [PR|https://github.com/apache/kafka/pull/6870] for KAFKA-8474. > This ticket is to move that discussion to a separate thread and make it more > visible to other people and to give subsequent PRs a home. > Currently proposed options are attached to this issue. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-7052) ExtractField SMT throws NPE - needs clearer error message
[ https://issues.apache.org/jira/browse/KAFKA-7052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033473#comment-17033473 ] Robin Moffatt commented on KAFKA-7052: -- I agree that even just improving the error message would be a good first step here. > ExtractField SMT throws NPE - needs clearer error message > - > > Key: KAFKA-7052 > URL: https://issues.apache.org/jira/browse/KAFKA-7052 > Project: Kafka > Issue Type: Improvement > Components: KafkaConnect >Reporter: Robin Moffatt >Priority: Major > > With the following Single Message Transform: > {code:java} > "transforms.ExtractId.type":"org.apache.kafka.connect.transforms.ExtractField$Key", > "transforms.ExtractId.field":"id"{code} > Kafka Connect errors with: > {code:java} > java.lang.NullPointerException > at > org.apache.kafka.connect.transforms.ExtractField.apply(ExtractField.java:61) > at > org.apache.kafka.connect.runtime.TransformationChain.apply(TransformationChain.java:38){code} > There should be a better error message here, identifying the reason for the > NPE. > Version: Confluent Platform 4.1.1 -- This message was sent by Atlassian Jira (v8.3.4#803005)
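[Editor's note] A minimal sketch of the kind of fail-fast check being asked for. This is not Connect's actual ExtractField code; the class, method, and messages below are invented to show the pattern of replacing a bare NPE with an error that names the field and the reason.

```java
import java.util.Map;

public class ExtractFieldSketch {
    // Extract a named field, failing with a descriptive message instead of
    // letting a null key/value or missing field surface as a raw NPE.
    static Object extract(Map<String, ?> struct, String field) {
        if (struct == null) {
            throw new IllegalArgumentException(
                "Cannot extract field '" + field + "': record key/value is null");
        }
        if (!struct.containsKey(field)) {
            throw new IllegalArgumentException(
                "Cannot extract field '" + field + "': no such field in " + struct.keySet());
        }
        return struct.get(field);
    }

    public static void main(String[] args) {
        System.out.println(extract(Map.of("id", 42), "id")); // prints 42
    }
}
```

The point of the sketch is that the operator would see which transform input was null, rather than a stack trace into ExtractField.java.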