[jira] [Created] (KAFKA-8885) The Kafka Protocol should Support Optional Tagged Fields
Colin P. McCabe created KAFKA-8885: -- Summary: The Kafka Protocol should Support Optional Tagged Fields Key: KAFKA-8885 URL: https://issues.apache.org/jira/browse/KAFKA-8885 Project: Kafka Issue Type: New Feature Reporter: Colin P. McCabe Assignee: Colin P. McCabe Implement KIP-482: The Kafka Protocol should Support Optional Tagged Fields See [KIP-482|https://cwiki.apache.org/confluence/display/KAFKA/KIP-482%3A+The+Kafka+Protocol+should+Support+Optional+Tagged+Fields] -- This message was sent by Atlassian Jira (v8.3.2#803003)
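As background for readers, KIP-482 lays tagged fields out as an unsigned-varint count, then for each field an unsigned-varint tag, an unsigned-varint size, and the raw field bytes, so readers can skip tags they do not recognize. A minimal Python model of that layout (a simplified sketch, not Kafka's generated serialization code):

```python
def write_uvarint(n):
    """Encode a non-negative int as an unsigned varint."""
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)

def read_uvarint(buf, pos):
    """Decode an unsigned varint; return (value, new_position)."""
    shift = result = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not (b & 0x80):
            return result, pos
        shift += 7

def encode_tagged_fields(fields):
    """fields: {tag: raw_bytes}. Emits count, then (tag, size, bytes) per field."""
    out = bytearray(write_uvarint(len(fields)))
    for tag in sorted(fields):
        payload = fields[tag]
        out += write_uvarint(tag) + write_uvarint(len(payload)) + payload
    return bytes(out)

def decode_tagged_fields(buf, known_tags):
    """Unknown tags are skipped rather than treated as errors."""
    count, pos = read_uvarint(buf, 0)
    result = {}
    for _ in range(count):
        tag, pos = read_uvarint(buf, pos)
        size, pos = read_uvarint(buf, pos)
        payload, pos = buf[pos:pos + size], pos + size
        if tag in known_tags:
            result[tag] = payload
    return result
```

The skip-unknown-tags behavior is what lets new optional fields be added without breaking older readers.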
[jira] [Created] (KAFKA-8840) Fix bug where ClientCompatibilityFeaturesTest fails when running multiple iterations
Colin P. McCabe created KAFKA-8840: -- Summary: Fix bug where ClientCompatibilityFeaturesTest fails when running multiple iterations Key: KAFKA-8840 URL: https://issues.apache.org/jira/browse/KAFKA-8840 Project: Kafka Issue Type: Bug Components: system tests Reporter: Colin P. McCabe Assignee: Colin P. McCabe Fix a bug where ClientCompatibilityFeaturesTest fails when running multiple iterations. The error message looks like this: {code} 16:47:58 OSError: [Errno 17] File exists: '/home/jenkins/workspace/system-test-kafka-branch-builder/results/2019-08-26--001/ClientCompatibilityFeaturesTest/run_compatibility_test/broker_version=0.10.1.1/0 {code} -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (KAFKA-8826) Improve logging for SSL ciphers and principals
Colin P. McCabe created KAFKA-8826: -- Summary: Improve logging for SSL ciphers and principals Key: KAFKA-8826 URL: https://issues.apache.org/jira/browse/KAFKA-8826 Project: Kafka Issue Type: Improvement Reporter: Colin P. McCabe Assignee: Colin P. McCabe -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (KAFKA-8746) Kibosh must handle an empty JSON string from Trogdor
Colin P. McCabe created KAFKA-8746: -- Summary: Kibosh must handle an empty JSON string from Trogdor Key: KAFKA-8746 URL: https://issues.apache.org/jira/browse/KAFKA-8746 Project: Kafka Issue Type: Bug Reporter: Colin P. McCabe Assignee: Colin P. McCabe When Trogdor wants to clear all the faults injected to Kibosh, it sends the empty JSON object `{}`. However, Kibosh expects `{"faults":[]}` instead. Kibosh should handle the empty JSON object, since that's consistent with how Trogdor handles empty JSON fields in general (if they're empty, they can be omitted). We should also have a test for this. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
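The normalization the issue asks for could look like this sketch (the function name is illustrative, not Kibosh's actual API; Kibosh itself is not written in Python):

```python
import json

def parse_faults(raw):
    """Treat an empty JSON object from Trogdor as {"faults": []},
    since Trogdor omits fields that are empty."""
    obj = json.loads(raw)
    return obj.get("faults") or []
```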
[jira] [Created] (KAFKA-8684) Create a test of consumer group behavior that tests old clients and new servers
Colin P. McCabe created KAFKA-8684: -- Summary: Create a test of consumer group behavior that tests old clients and new servers Key: KAFKA-8684 URL: https://issues.apache.org/jira/browse/KAFKA-8684 Project: Kafka Issue Type: Improvement Reporter: Colin P. McCabe We should create a test of consumer group behavior that tests old clients and new servers. The current tests would not catch another bug like KAFKA-8653. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (KAFKA-8644) The Kafka protocol generator should allow null defaults for bytes and array fields
Colin P. McCabe created KAFKA-8644: -- Summary: The Kafka protocol generator should allow null defaults for bytes and array fields Key: KAFKA-8644 URL: https://issues.apache.org/jira/browse/KAFKA-8644 Project: Kafka Issue Type: Improvement Reporter: Colin P. McCabe Assignee: Colin P. McCabe The Kafka protocol generator should allow null defaults for bytes and array fields. Currently, null defaults are only allowed for string fields. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
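For context, a hypothetical field entry in the spirit of the JSON message specification files, showing the kind of null default the generator would need to accept for an array field (field names and exact syntax here are illustrative, not taken from a real schema):

```json
{
  "name": "AssignedPartitions",
  "type": "[]int32",
  "versions": "0+",
  "nullableVersions": "0+",
  "default": "null",
  "about": "The assigned partitions, or null if none were specified."
}
```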
[jira] [Resolved] (KAFKA-7788) Support null defaults in KAFKA-7609 RPC specifications
[ https://issues.apache.org/jira/browse/KAFKA-7788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin P. McCabe resolved KAFKA-7788. Resolution: Duplicate > Support null defaults in KAFKA-7609 RPC specifications > -- > > Key: KAFKA-7788 > URL: https://issues.apache.org/jira/browse/KAFKA-7788 > Project: Kafka > Issue Type: Sub-task >Reporter: Colin P. McCabe >Assignee: Colin P. McCabe >Priority: Minor > > It would be nice if we could support null values as defaults in the > KAFKA-7609 RPC specification files. Null defaults should be allowed only if > the field is nullable in all supported versions of the field. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KAFKA-6771) Make specifying partitions in RoundTripWorkload, ProduceBench more flexible
[ https://issues.apache.org/jira/browse/KAFKA-6771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin P. McCabe resolved KAFKA-6771. Resolution: Fixed > Make specifying partitions in RoundTripWorkload, ProduceBench more flexible > --- > > Key: KAFKA-6771 > URL: https://issues.apache.org/jira/browse/KAFKA-6771 > Project: Kafka > Issue Type: Improvement > Components: system tests >Reporter: Colin P. McCabe >Assignee: Colin P. McCabe >Priority: Major > > Make specifying partitions in RoundTripWorkload, ProduceBench more flexible -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-8628) Auto-generated Kafka RPC code should be able to use zero-copy ByteBuffers
Colin P. McCabe created KAFKA-8628: -- Summary: Auto-generated Kafka RPC code should be able to use zero-copy ByteBuffers Key: KAFKA-8628 URL: https://issues.apache.org/jira/browse/KAFKA-8628 Project: Kafka Issue Type: Improvement Reporter: Colin P. McCabe Assignee: Colin P. McCabe Auto-generated Kafka RPC code should be able to use zero-copy ByteBuffers -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-8594) Add Kafka Streams compatibility test for Kafka 2.3.0
Colin P. McCabe created KAFKA-8594: -- Summary: Add Kafka Streams compatibility test for Kafka 2.3.0 Key: KAFKA-8594 URL: https://issues.apache.org/jira/browse/KAFKA-8594 Project: Kafka Issue Type: Improvement Reporter: Colin P. McCabe We should add a compatibility test for the Kafka 2.3.0 release -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-8560) The Kafka protocol generator should support common structures
Colin P. McCabe created KAFKA-8560: -- Summary: The Kafka protocol generator should support common structures Key: KAFKA-8560 URL: https://issues.apache.org/jira/browse/KAFKA-8560 Project: Kafka Issue Type: Improvement Reporter: Colin P. McCabe Assignee: Colin P. McCabe The Kafka protocol generator should support common structures. This would make things simpler in cases where we need to refer to a single structure from multiple places in a message. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KAFKA-8448) Too many kafka.log.Log instances (Memory Leak)
[ https://issues.apache.org/jira/browse/KAFKA-8448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin P. McCabe resolved KAFKA-8448. Resolution: Fixed Fix Version/s: 2.4.0 > Too many kafka.log.Log instances (Memory Leak) > -- > > Key: KAFKA-8448 > URL: https://issues.apache.org/jira/browse/KAFKA-8448 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.2.0 > Environment: Red Hat 4.4.7-16, java version "1.8.0_152", > kafka_2.12-2.2.0 >Reporter: Juan Olivares >Assignee: Justine Olshan >Priority: Major > Fix For: 2.4.0 > > > We have a custom Kafka health check which creates a topic, adds some ACLs > (read/write topic and group), produces & consumes a single message, and then > quickly removes it and all the related ACLs created. We close the consumer > involved, but not the producer. > We have observed that the number of instances of {{kafka.log.Log}} keeps growing, while > there's no evidence of topics being leaked, either by running > {{/opt/kafka/bin/kafka-topics.sh --zookeeper localhost:2181 --describe}} > or by looking at the disk directory where topics are stored. > After looking at the heap dump we've observed the following: > - None of the {{kafka.log.Log}} references ({{currentLogs}}, > {{logsToBeDeleted}} and {{logsToBeDeleted}}) in {{kafka.log.LogManager}} > holds the large number of {{kafka.log.Log}} instances. > - The only reference preventing {{kafka.log.Log}} instances from being garbage collected > seems to be > {{java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue}}, which > contains scheduled tasks created with the name > {{PeriodicProducerExpirationCheck}}. > I can see in the code that for every {{kafka.log.Log}} a task with this name > is scheduled. 
> {code:java} > scheduler.schedule(name = "PeriodicProducerExpirationCheck", fun = () => { > lock synchronized { > producerStateManager.removeExpiredProducers(time.milliseconds) > } > }, period = producerIdExpirationCheckIntervalMs, delay = > producerIdExpirationCheckIntervalMs, unit = TimeUnit.MILLISECONDS) > {code} > However, it seems those tasks are never unscheduled or cancelled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
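The retention chain described above can be modeled in a few lines of Python (a toy illustration of the mechanism only, not Kafka code): a periodic task whose closure captures the log keeps it reachable through the scheduler's work queue until the task is cancelled.

```python
import weakref

class Log:
    """Stand-in for kafka.log.Log."""

# Stand-in for ScheduledThreadPoolExecutor's DelayedWorkQueue.
work_queue = []

def schedule_expiration_check(log):
    # Like PeriodicProducerExpirationCheck, the queued task's closure
    # captures the log, pinning it in memory.
    work_queue.append(lambda: log)

def demo_leak():
    log = Log()
    ref = weakref.ref(log)
    schedule_expiration_check(log)
    del log                      # drop the last "user" reference
    leaked = ref() is not None   # still reachable via the queued task
    work_queue.clear()           # cancelling the task releases it
    freed = ref() is None
    return leaked, freed
```

Under CPython's reference counting, `demo_leak()` shows the object surviving only as long as the uncancelled task sits in the queue.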
[jira] [Resolved] (KAFKA-8265) Connect Client Config Override policy
[ https://issues.apache.org/jira/browse/KAFKA-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin P. McCabe resolved KAFKA-8265. Resolution: Fixed Fix Version/s: 2.3 > Connect Client Config Override policy > - > > Key: KAFKA-8265 > URL: https://issues.apache.org/jira/browse/KAFKA-8265 > Project: Kafka > Issue Type: New Feature > Components: KafkaConnect >Reporter: Magesh kumar Nandakumar >Assignee: Magesh kumar Nandakumar >Priority: Major > Fix For: 2.3 > > > Right now, each source connector and sink connector inherits its client > configurations from the worker properties. Within the worker properties, all > configurations that have a prefix of "producer." or "consumer." are applied > to all source connectors and sink connectors, respectively. > We should allow the "producer." and "consumer." configurations to be overridden in > accordance with an override policy determined by the administrator. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KAFKA-3816) Provide more context in Kafka Connect log messages using MDC
[ https://issues.apache.org/jira/browse/KAFKA-3816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin P. McCabe resolved KAFKA-3816. Resolution: Fixed > Provide more context in Kafka Connect log messages using MDC > > > Key: KAFKA-3816 > URL: https://issues.apache.org/jira/browse/KAFKA-3816 > Project: Kafka > Issue Type: Improvement > Components: KafkaConnect >Affects Versions: 0.9.0.1 >Reporter: Randall Hauch >Assignee: Randall Hauch >Priority: Critical > Fix For: 2.3.0 > > > Currently it is relatively difficult to correlate individual log messages > with the various threads and activities that are going on within a Kafka > Connect worker, let alone a cluster of workers. Log messages should provide > more context to make correlation easier and to allow log-scraping tools to coalesce > related log messages. > One simple way to do this is by using _mapped diagnostic contexts_, or MDC. > This is supported by the SLF4J API, and by the Logback and Log4J logging > frameworks. > Basically, the framework would be changed so that each thread is configured > with one or more MDC parameters using the > {{org.slf4j.MDC.put(String,String)}} method in SLF4J. Once that thread is > configured, all log messages made using that thread have that context. The > logs can then be configured to use those parameters. > It would be ideal to define a convention for connectors and the Kafka Connect > framework. A single set of MDC parameters means that the logging framework > can use the specific parameters in its message formats. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
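The MDC idea generalizes beyond SLF4J. As a rough analogue (a sketch, not the Connect implementation), Python's logging can attach per-thread context through a filter so every record carries it:

```python
import logging
import threading

_context = threading.local()

def set_log_context(**values):
    # Analogous to org.slf4j.MDC.put(): configure the current thread once,
    # and every subsequent record from it carries the context.
    _context.values = values

class MdcFilter(logging.Filter):
    def filter(self, record):
        # Stamp the thread's context onto the record so a format string
        # can reference it.
        for key, value in getattr(_context, "values", {}).items():
            setattr(record, key, value)
        return True  # never suppress the record, only annotate it
```

A handler format string such as `"[%(connector)s|%(task)s] %(message)s"` could then surface the context; the `connector`/`task` keys here are hypothetical, not Connect's actual MDC parameter names.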
[jira] [Created] (KAFKA-8482) alterReplicaLogDirs should be better documented
Colin P. McCabe created KAFKA-8482: -- Summary: alterReplicaLogDirs should be better documented Key: KAFKA-8482 URL: https://issues.apache.org/jira/browse/KAFKA-8482 Project: Kafka Issue Type: Improvement Components: admin Reporter: Colin P. McCabe Fix For: 2.4.0 alterReplicaLogDirs should be better documented. In particular, it should document what exceptions it throws in {{AdminClient.java}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KAFKA-8449) Restart task on reconfiguration under incremental cooperative rebalancing
[ https://issues.apache.org/jira/browse/KAFKA-8449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin P. McCabe resolved KAFKA-8449. Resolution: Fixed > Restart task on reconfiguration under incremental cooperative rebalancing > - > > Key: KAFKA-8449 > URL: https://issues.apache.org/jira/browse/KAFKA-8449 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect >Affects Versions: 2.3.0 >Reporter: Konstantine Karantasis >Assignee: Konstantine Karantasis >Priority: Blocker > Fix For: 2.3.0 > > > Tasks that are already running and are not redistributed are not currently > restarted under incremental cooperative rebalancing when their configuration > changes. With eager rebalancing, the restart was triggered by, and therefore > implied by, rebalancing itself. But now existing tasks will not read the new > configuration unless restarted via the REST API. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KAFKA-8475) Temporarily restore SslFactory.sslContext() helper
[ https://issues.apache.org/jira/browse/KAFKA-8475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin P. McCabe resolved KAFKA-8475. Resolution: Fixed Fix Version/s: 2.3 > Temporarily restore SslFactory.sslContext() helper > -- > > Key: KAFKA-8475 > URL: https://issues.apache.org/jira/browse/KAFKA-8475 > Project: Kafka > Issue Type: Bug >Reporter: Colin P. McCabe >Assignee: Randall Hauch >Priority: Blocker > Fix For: 2.3 > > > Temporarily restore the SslFactory.sslContext() function, which some > connectors use. This function is not a public API and it will be removed > eventually. For now, we will mark it as deprecated. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-8475) Temporarily restore SslFactory.sslContext() helper
Colin P. McCabe created KAFKA-8475: -- Summary: Temporarily restore SslFactory.sslContext() helper Key: KAFKA-8475 URL: https://issues.apache.org/jira/browse/KAFKA-8475 Project: Kafka Issue Type: Bug Reporter: Colin P. McCabe Assignee: Randall Hauch Temporarily restore the SslFactory.sslContext() function, which some connectors use. This function is not a public API and it will be removed eventually. For now, we will mark it as deprecated. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KAFKA-5476) Implement a system test that creates network partitions
[ https://issues.apache.org/jira/browse/KAFKA-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin P. McCabe resolved KAFKA-5476. Resolution: Duplicate > Implement a system test that creates network partitions > --- > > Key: KAFKA-5476 > URL: https://issues.apache.org/jira/browse/KAFKA-5476 > Project: Kafka > Issue Type: Test >Reporter: Colin P. McCabe >Assignee: Colin P. McCabe >Priority: Major > > Implement a system test that creates network partitions -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KAFKA-7921) Instable KafkaStreamsTest
[ https://issues.apache.org/jira/browse/KAFKA-7921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin P. McCabe resolved KAFKA-7921. Resolution: Fixed Let's re-open this if we see another failure. > Instable KafkaStreamsTest > - > > Key: KAFKA-7921 > URL: https://issues.apache.org/jira/browse/KAFKA-7921 > Project: Kafka > Issue Type: Bug > Components: streams, unit tests >Affects Versions: 2.3.0 >Reporter: Matthias J. Sax >Priority: Critical > Labels: flaky-test > Fix For: 2.3.0 > > > {{KafkaStreamsTest}} failed multiple times, eg, > {quote}java.lang.AssertionError: Condition not met within timeout 15000. > Streams never started. > at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:365) > at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:325) > at > org.apache.kafka.streams.KafkaStreamsTest.shouldThrowOnCleanupWhileRunning(KafkaStreamsTest.java:556){quote} > or > {quote}java.lang.AssertionError: Condition not met within timeout 15000. > Streams never started. 
> at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:365) > at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:325) > at > org.apache.kafka.streams.KafkaStreamsTest.testStateThreadClose(KafkaStreamsTest.java:255){quote} > > The preserved logs are as follows: > {quote}[2019-02-12 07:02:17,198] INFO Kafka version: 2.3.0-SNAPSHOT > (org.apache.kafka.common.utils.AppInfoParser:109) > [2019-02-12 07:02:17,198] INFO Kafka commitId: 08036fa4b1e5b822 > (org.apache.kafka.common.utils.AppInfoParser:110) > [2019-02-12 07:02:17,199] INFO stream-client [clientId] State transition from > CREATED to REBALANCING (org.apache.kafka.streams.KafkaStreams:263) > [2019-02-12 07:02:17,200] INFO stream-thread [clientId-StreamThread-238] > Starting (org.apache.kafka.streams.processor.internals.StreamThread:767) > [2019-02-12 07:02:17,200] INFO stream-client [clientId] State transition from > REBALANCING to PENDING_SHUTDOWN (org.apache.kafka.streams.KafkaStreams:263) > [2019-02-12 07:02:17,200] INFO stream-thread [clientId-StreamThread-239] > Starting (org.apache.kafka.streams.processor.internals.StreamThread:767) > [2019-02-12 07:02:17,200] INFO stream-thread [clientId-StreamThread-238] > State transition from CREATED to STARTING > (org.apache.kafka.streams.processor.internals.StreamThread:214) > [2019-02-12 07:02:17,200] INFO stream-thread [clientId-StreamThread-239] > State transition from CREATED to STARTING > (org.apache.kafka.streams.processor.internals.StreamThread:214) > [2019-02-12 07:02:17,200] INFO stream-thread [clientId-StreamThread-238] > Informed to shut down > (org.apache.kafka.streams.processor.internals.StreamThread:1192) > [2019-02-12 07:02:17,201] INFO stream-thread [clientId-StreamThread-238] > State transition from STARTING to PENDING_SHUTDOWN > (org.apache.kafka.streams.processor.internals.StreamThread:214) > [2019-02-12 07:02:17,201] INFO stream-thread [clientId-StreamThread-239] > Informed to shut down > 
(org.apache.kafka.streams.processor.internals.StreamThread:1192) > [2019-02-12 07:02:17,201] INFO stream-thread [clientId-StreamThread-239] > State transition from STARTING to PENDING_SHUTDOWN > (org.apache.kafka.streams.processor.internals.StreamThread:214) > [2019-02-12 07:02:17,205] INFO Cluster ID: J8uJhiTKQx-Y_i9LzT0iLg > (org.apache.kafka.clients.Metadata:365) > [2019-02-12 07:02:17,205] INFO Cluster ID: J8uJhiTKQx-Y_i9LzT0iLg > (org.apache.kafka.clients.Metadata:365) > [2019-02-12 07:02:17,205] INFO [Consumer > clientId=clientId-StreamThread-238-consumer, groupId=appId] Discovered group > coordinator localhost:36122 (id: 2147483647 rack: null) > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:675) > [2019-02-12 07:02:17,205] INFO [Consumer > clientId=clientId-StreamThread-239-consumer, groupId=appId] Discovered group > coordinator localhost:36122 (id: 2147483647 rack: null) > (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:675) > [2019-02-12 07:02:17,206] INFO [Consumer > clientId=clientId-StreamThread-238-consumer, groupId=appId] Revoking > previously assigned partitions [] > (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:458) > [2019-02-12 07:02:17,206] INFO [Consumer > clientId=clientId-StreamThread-239-consumer, groupId=appId] Revoking > previously assigned partitions [] > (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:458) > [2019-02-12 07:02:17,206] INFO [Consumer > clientId=clientId-StreamThread-238-consumer, groupId=appId] (Re-)joining > group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:491) > [2019-02-12 07:02:17,206] INFO [Consumer > clientId=clientId-StreamThread-239-consumer, groupId=appId] (Re-)joining > group (org.apache.kafka
[jira] [Resolved] (KAFKA-7992) Add a server start time metric
[ https://issues.apache.org/jira/browse/KAFKA-7992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin P. McCabe resolved KAFKA-7992. Resolution: Fixed Fix Version/s: 2.3.0 > Add a server start time metric > -- > > Key: KAFKA-7992 > URL: https://issues.apache.org/jira/browse/KAFKA-7992 > Project: Kafka > Issue Type: Improvement >Reporter: Stanislav Kozlovski >Assignee: Stanislav Kozlovski >Priority: Minor > Labels: needs-kip > Fix For: 2.3.0 > > > KIP: KIP-436 > As with all software systems, observability into their health is critical. > With many deployment platforms (be they custom-built or open-source), tasks > like restarting a misbehaving server in a cluster are completely automated. > With Kafka, monitoring systems have no definitive source of truth to gauge > when a server/client has been started. They are left to either use arbitrary > Kafka-specific metrics as a heuristic or the JVM RuntimeMXBean's StartTime, > which is not exactly indicative of when the application itself started. > It would be useful to have a metric exposing when the Kafka server has > started. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-8345) Create an Administrative API for Replica Reassignment
Colin P. McCabe created KAFKA-8345: -- Summary: Create an Administrative API for Replica Reassignment Key: KAFKA-8345 URL: https://issues.apache.org/jira/browse/KAFKA-8345 Project: Kafka Issue Type: Improvement Reporter: Colin P. McCabe Assignee: Colin P. McCabe Create an Administrative API for Replica Reassignment, as discussed in KIP-455 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KAFKA-8134) ProducerConfig.LINGER_MS_CONFIG undocumented breaking change in kafka-clients 2.1
[ https://issues.apache.org/jira/browse/KAFKA-8134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin P. McCabe resolved KAFKA-8134. Resolution: Fixed > ProducerConfig.LINGER_MS_CONFIG undocumented breaking change in kafka-clients > 2.1 > - > > Key: KAFKA-8134 > URL: https://issues.apache.org/jira/browse/KAFKA-8134 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 2.1.0 >Reporter: Sam Lendle >Assignee: Dhruvil Shah >Priority: Major > Fix For: 2.3.0, 2.1.2, 2.2.1 > > > Prior to 2.1, the type of the "linger.ms" config was Long, but was changed to > Integer in 2.1.0 ([https://github.com/apache/kafka/pull/5270]). A config using > a Long value for that parameter, which works with kafka-clients < 2.1, will > cause a ConfigException to be thrown when constructing a KafkaProducer if > kafka-clients is upgraded to >= 2.1. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-8255) Replica fetcher thread exits with OffsetOutOfRangeException
Colin P. McCabe created KAFKA-8255: -- Summary: Replica fetcher thread exits with OffsetOutOfRangeException Key: KAFKA-8255 URL: https://issues.apache.org/jira/browse/KAFKA-8255 Project: Kafka Issue Type: Bug Components: core Reporter: Colin P. McCabe Replica fetcher threads can exit with OffsetOutOfRangeException when the log start offset has advanced beyond the high water mark on the fetching broker. Example stack trace: {code} org.apache.kafka.common.KafkaException: Error processing data for partition __consumer_offsets-46 offset 18761 at kafka.server.AbstractFetcherThread$$anonfun$kafka$server$AbstractFetcherThread$$processFetchRequest$2$$anonfun$apply$mcV$sp$3$$anonfun$apply$10.apply(AbstractFetcherThread.scala:335) at kafka.server.AbstractFetcherThread$$anonfun$kafka$server$AbstractFetcherThread$$processFetchRequest$2$$anonfun$apply$mcV$sp$3$$anonfun$apply$10.apply(AbstractFetcherThread.scala:294) at scala.Option.foreach(Option.scala:257) at kafka.server.AbstractFetcherThread$$anonfun$kafka$server$AbstractFetcherThread$$processFetchRequest$2$$anonfun$apply$mcV$sp$3.apply(AbstractFetcherThread.scala:294) at kafka.server.AbstractFetcherThread$$anonfun$kafka$server$AbstractFetcherThread$$processFetchRequest$2$$anonfun$apply$mcV$sp$3.apply(AbstractFetcherThread.scala:293) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at kafka.server.AbstractFetcherThread$$anonfun$kafka$server$AbstractFetcherThread$$processFetchRequest$2.apply$mcV$sp(AbstractFetcherThread.scala:293) at kafka.server.AbstractFetcherThread$$anonfun$kafka$server$AbstractFetcherThread$$processFetchRequest$2.apply(AbstractFetcherThread.scala:293) at kafka.server.AbstractFetcherThread$$anonfun$kafka$server$AbstractFetcherThread$$processFetchRequest$2.apply(AbstractFetcherThread.scala:293) at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251) at 
kafka.server.AbstractFetcherThread.kafka$server$AbstractFetcherThread$$processFetchRequest(AbstractFetcherThread.scala:292) at kafka.server.AbstractFetcherThread$$anonfun$maybeFetch$1.apply(AbstractFetcherThread.scala:132) at kafka.server.AbstractFetcherThread$$anonfun$maybeFetch$1.apply(AbstractFetcherThread.scala:131) at scala.Option.foreach(Option.scala:257) at kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:131) at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:113) at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82) Caused by: org.apache.kafka.common.errors.OffsetOutOfRangeException: Cannot increment the log start offset to 4808819 of partition __consumer_offsets-46 since it is larger than the high watermark 18761 [2019-04-16 14:16:42,257] INFO [ReplicaFetcher replicaId=1001, leaderId=1003, fetcherId=0] Stopped (kafka.server.ReplicaFetcherThread) {code} It seems that we should not terminate the replica fetcher thread in this case. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-8238) Log how many bytes and messages were read from __consumer_offsets
Colin P. McCabe created KAFKA-8238: -- Summary: Log how many bytes and messages were read from __consumer_offsets Key: KAFKA-8238 URL: https://issues.apache.org/jira/browse/KAFKA-8238 Project: Kafka Issue Type: Improvement Reporter: Colin P. McCabe We should log how many bytes and messages were read from __consumer_offsets. Currently we only log how long it took. Example: {code} [GroupMetadataManager brokerId=2] Finished loading offsets and group metadata from __consumer_offsets-22 in 23131 milliseconds. {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
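The enriched log line could look something like the following sketch (the function name and exact wording are hypothetical, not the actual GroupMetadataManager change):

```python
def loading_summary(partition, elapsed_ms, records):
    # Count messages and bytes while loading, then include both in the
    # completion message alongside the existing elapsed time.
    total_bytes = sum(len(r) for r in records)
    return (f"Finished loading offsets and group metadata from {partition} "
            f"in {elapsed_ms} milliseconds ({len(records)} messages, "
            f"{total_bytes} bytes).")
```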
[jira] [Created] (KAFKA-8168) Add a generated ApiMessageType class
Colin P. McCabe created KAFKA-8168: -- Summary: Add a generated ApiMessageType class Key: KAFKA-8168 URL: https://issues.apache.org/jira/browse/KAFKA-8168 Project: Kafka Issue Type: Improvement Reporter: Colin P. McCabe Assignee: Colin P. McCabe Add a generated ApiMessageType class. This will make it easier to do operations based on the type of an ApiMessage. Once all the RPCs are converted to use protocol generation, we can switch to using this instead of ApiKeys.java (possibly renaming this to ApiKeys.java?) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-8158) Add EntityType for Kafka RPC fields
Colin P. McCabe created KAFKA-8158: -- Summary: Add EntityType for Kafka RPC fields Key: KAFKA-8158 URL: https://issues.apache.org/jira/browse/KAFKA-8158 Project: Kafka Issue Type: Improvement Reporter: Colin P. McCabe Assignee: Colin P. McCabe Add an EntityType for Kafka RPC fields so that we know what type they should be. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KAFKA-8150) Fix bugs in handling null arrays in generated RPC code
[ https://issues.apache.org/jira/browse/KAFKA-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin P. McCabe resolved KAFKA-8150. Resolution: Fixed Fix Version/s: 2.2.1 The code path that this fixes isn't used in 2.2, I think. But I backported the patch to that branch just for the purpose of future-proofing. > Fix bugs in handling null arrays in generated RPC code > -- > > Key: KAFKA-8150 > URL: https://issues.apache.org/jira/browse/KAFKA-8150 > Project: Kafka > Issue Type: Bug >Reporter: Colin P. McCabe >Assignee: Colin P. McCabe >Priority: Major > Fix For: 2.2.1 > > > Fix bugs in handling null arrays in generated RPC code. > toString should not get a NullPointerException. > Also, read() must properly translate a negative array length to a null field. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
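The wire convention at issue can be sketched like this (a simplified Python model with made-up helper names, not the generated Java code): a serialized array length of -1 denotes a null array, which both the read path and toString must tolerate.

```python
import struct

def read_nullable_int32_array(buf, pos):
    """Read a big-endian int32 length; -1 means a null array, not an error."""
    (length,) = struct.unpack_from(">i", buf, pos)
    pos += 4
    if length < 0:
        return None, pos  # negative length encodes null, not empty
    values = list(struct.unpack_from(f">{length}i", buf, pos))
    return values, pos + 4 * length

def array_to_string(values):
    """toString must print null arrays instead of raising."""
    return "null" if values is None else str(values)
```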
[jira] [Created] (KAFKA-8150) Generated protocol class toString functions get NPEs when array fields are null
Colin P. McCabe created KAFKA-8150: -- Summary: Generated protocol class toString functions get NPEs when array fields are null Key: KAFKA-8150 URL: https://issues.apache.org/jira/browse/KAFKA-8150 Project: Kafka Issue Type: Bug Reporter: Colin P. McCabe Assignee: Colin P. McCabe Generated protocol class toString functions get NPEs when array fields are null. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-8129) Shade Kafka client dependencies
Colin P. McCabe created KAFKA-8129: -- Summary: Shade Kafka client dependencies Key: KAFKA-8129 URL: https://issues.apache.org/jira/browse/KAFKA-8129 Project: Kafka Issue Type: Improvement Components: clients Reporter: Colin P. McCabe Assignee: Colin P. McCabe The Kafka client should shade its library dependencies. This will ensure that its dependencies don't collide with those employed by users. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KAFKA-7974) KafkaAdminClient loses worker thread/enters zombie state when initial DNS lookup fails
[ https://issues.apache.org/jira/browse/KAFKA-7974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin P. McCabe resolved KAFKA-7974. Resolution: Fixed Fix Version/s: 2.2.0 > KafkaAdminClient loses worker thread/enters zombie state when initial DNS > lookup fails > -- > > Key: KAFKA-7974 > URL: https://issues.apache.org/jira/browse/KAFKA-7974 > Project: Kafka > Issue Type: Bug >Reporter: Nicholas Parker >Priority: Major > Fix For: 2.2.0 > > > Version: kafka-clients-2.1.0 > I have some code that creates a KafkaAdminClient instance and then > invokes listTopics(). I was seeing the following stacktrace in the logs, > after which the KafkaAdminClient instance became unresponsive: > {code:java} > ERROR [kafka-admin-client-thread | adminclient-1] 2019-02-18 01:00:45,597 > KafkaThread.java:51 - Uncaught exception in thread 'kafka-admin-client-thread > | adminclient-1': > java.lang.IllegalStateException: No entry found for connection 0 > at > org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:330) > at > org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:134) > at > org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:921) > at org.apache.kafka.clients.NetworkClient.ready(NetworkClient.java:287) > at > org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.sendEligibleCalls(KafkaAdminClient.java:898) > at > org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.run(KafkaAdminClient.java:1113) > at java.lang.Thread.run(Thread.java:748){code} > From looking at the code I was able to trace down a possible cause: > * NetworkClient.ready() invokes this.initiateConnect() as seen in the above > stacktrace > * NetworkClient.initiateConnect() invokes > ClusterConnectionStates.connecting(), which internally invokes > ClientUtils.resolve() to resolve the host when creating an entry for the > connection. 
> * If this host lookup fails, a UnknownHostException can be thrown back to > NetworkClient.initiateConnect() and the connection entry is not created in > ClusterConnectionStates. This exception doesn't get logged so this is a guess > on my part. > * NetworkClient.initiateConnect() catches the exception and attempts to call > ClusterConnectionStates.disconnected(), which throws an IllegalStateException > because no entry had yet been created due to the lookup failure. > * This IllegalStateException ends up killing the worker thread and > KafkaAdminClient gets stuck, never returning from listTopics(). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
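The failure sequence in the bullets above can be condensed into a toy state machine (Python, purely illustrative of the described bug, not the Kafka client code):

```python
class ConnectionStates:
    """Toy model of ClusterConnectionStates."""

    def __init__(self):
        self.states = {}

    def connecting(self, node, resolve):
        resolve(node)  # DNS lookup happens before the entry is created ...
        self.states[node] = "CONNECTING"

    def disconnected(self, node):
        if node not in self.states:
            # ... so this mirrors "No entry found for connection 0"
            raise RuntimeError(f"No entry found for connection {node}")
        self.states[node] = "DISCONNECTED"

def initiate_connect(states, node, resolve):
    try:
        states.connecting(node, resolve)
    except OSError:
        # The catch block assumes connecting() registered the entry, so the
        # cleanup itself raises, which is what kills the worker thread.
        states.disconnected(node)
```

When the resolver raises, the cleanup path throws instead of recording the disconnect, matching the reporter's analysis.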
[jira] [Created] (KAFKA-8060) The Kafka protocol generator should allow null default values for strings
Colin P. McCabe created KAFKA-8060: -- Summary: The Kafka protocol generator should allow null default values for strings Key: KAFKA-8060 URL: https://issues.apache.org/jira/browse/KAFKA-8060 Project: Kafka Issue Type: Improvement Reporter: Colin P. McCabe Assignee: Colin P. McCabe The Kafka protocol generator should allow null default values for strings. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KAFKA-7792) Trogdor should have an uptime function
[ https://issues.apache.org/jira/browse/KAFKA-7792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin P. McCabe resolved KAFKA-7792. Resolution: Fixed > Trogdor should have an uptime function > -- > > Key: KAFKA-7792 > URL: https://issues.apache.org/jira/browse/KAFKA-7792 > Project: Kafka > Issue Type: Improvement >Reporter: Colin P. McCabe >Assignee: Stanislav Kozlovski >Priority: Minor > > Trogdor should have an uptime function which returns how long the coordinator > or agent has been up. This will also be a good way to test that the daemon > is running without fetching a full status. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KAFKA-7793) Improve the Trogdor command-line
[ https://issues.apache.org/jira/browse/KAFKA-7793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin P. McCabe resolved KAFKA-7793. Resolution: Fixed > Improve the Trogdor command-line > > > Key: KAFKA-7793 > URL: https://issues.apache.org/jira/browse/KAFKA-7793 > Project: Kafka > Issue Type: Improvement >Reporter: Colin P. McCabe >Assignee: Colin P. McCabe >Priority: Major > > Improve the Trogdor command-line. It should be easier to launch tasks from a > task spec in a file. It should be easier to list the currently-running tasks > in a readable way. We should be able to filter the currently-running tasks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KAFKA-7790) Fix Bugs in Trogdor Task Expiration
[ https://issues.apache.org/jira/browse/KAFKA-7790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin P. McCabe resolved KAFKA-7790. Resolution: Fixed > Fix Bugs in Trogdor Task Expiration > --- > > Key: KAFKA-7790 > URL: https://issues.apache.org/jira/browse/KAFKA-7790 > Project: Kafka > Issue Type: Improvement >Reporter: Stanislav Kozlovski >Assignee: Stanislav Kozlovski >Priority: Major > > If an Agent process is restarted, it will be re-sent the worker > specifications for any tasks that are not DONE. The agent will run these > tasks for the original time period. It should be fixed to run them only for > the remaining task time. There is also a bug where the coordinator can > sometimes re-create a worker even when the task is DONE. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KAFKA-7741) Bad dependency via SBT
[ https://issues.apache.org/jira/browse/KAFKA-7741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin P. McCabe resolved KAFKA-7741. Resolution: Fixed > Bad dependency via SBT > -- > > Key: KAFKA-7741 > URL: https://issues.apache.org/jira/browse/KAFKA-7741 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 2.0.0, 2.0.1, 2.1.0 > Environment: Windows 10 professional, IntelliJ IDEA 2017.1 >Reporter: sacha barber >Assignee: John Roesler >Priority: Major > Fix For: 2.2.0, 2.1.1, 2.0.2 > > > I am using the Kafka-Streams-Scala 2.1.0 JAR. > And if I create a new Scala project using SBT with these dependencies > {code} > name := "ScalaKafkaStreamsDemo" > version := "1.0" > scalaVersion := "2.12.1" > libraryDependencies += "org.apache.kafka" %% "kafka" % "2.0.0" > libraryDependencies += "org.apache.kafka" % "kafka-clients" % "2.0.0" > libraryDependencies += "org.apache.kafka" % "kafka-streams" % "2.0.0" > libraryDependencies += "org.apache.kafka" %% "kafka-streams-scala" % "2.0.0" > //TEST > libraryDependencies += "org.scalatest" %% "scalatest" % "3.0.5" % Test > libraryDependencies += "org.apache.kafka" % "kafka-streams-test-utils" % > "2.0.0" % Test > {code} > I get this error > > {code} > SBT 'ScalaKafkaStreamsDemo' project refresh failed > Error:Error while importing SBT project:...[info] Resolving > jline#jline;2.14.1 ... > [warn] [FAILED ] > javax.ws.rs#javax.ws.rs-api;2.1.1!javax.ws.rs-api.${packaging.type}: (0ms) > [warn] local: tried > [warn] > C:\Users\sacha\.ivy2\local\javax.ws.rs\javax.ws.rs-api\2.1.1\${packaging.type}s\javax.ws.rs-api.${packaging.type} > [warn] public: tried > [warn] > https://repo1.maven.org/maven2/javax/ws/rs/javax.ws.rs-api/2.1.1/javax.ws.rs-api-2.1.1.${packaging.type} > [info] downloading > https://repo1.maven.org/maven2/org/apache/kafka/kafka-streams-test-utils/2.1.0/kafka-streams-test-utils-2.1.0.jar > ... 
> [info] [SUCCESSFUL ] > org.apache.kafka#kafka-streams-test-utils;2.1.0!kafka-streams-test-utils.jar > (344ms) > [warn] :: > [warn] :: FAILED DOWNLOADS :: > [warn] :: ^ see resolution messages for details ^ :: > [warn] :: > [warn] :: javax.ws.rs#javax.ws.rs-api;2.1.1!javax.ws.rs-api.${packaging.type} > [warn] :: > [trace] Stack trace suppressed: run 'last *:ssExtractDependencies' for the > full output. > [trace] Stack trace suppressed: run 'last *:update' for the full output. > [error] (*:ssExtractDependencies) sbt.ResolveException: download failed: > javax.ws.rs#javax.ws.rs-api;2.1.1!javax.ws.rs-api.${packaging.type} > [error] (*:update) sbt.ResolveException: download failed: > javax.ws.rs#javax.ws.rs-api;2.1.1!javax.ws.rs-api.${packaging.type} > [error] Total time: 8 s, completed 16-Dec-2018 19:27:21 > Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=384M; > support was removed in 8.0. See complete log in > file:/C:/Users/sacha/.IdeaIC2017.1/system/log/sbt.last.log > {code} > This seems to be a common issue with a bad dependency from Kafka to > javax.ws.rs-api. > If I drop the Kafka version down to 2.0.0 and add this line to my SBT file, > this error goes away: > {code} > libraryDependencies += "javax.ws.rs" % "javax.ws.rs-api" % "2.1" > artifacts(Artifact("javax.ws.rs-api", "jar", "jar")) > {code} > > However, I would like to work with the 2.1.0 version. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KAFKA-7856) Cryptographic Issues by Insufficient Entropy
[ https://issues.apache.org/jira/browse/KAFKA-7856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin P. McCabe resolved KAFKA-7856. Resolution: Invalid > Cryptographic Issues by Insufficient Entropy > > > Key: KAFKA-7856 > URL: https://issues.apache.org/jira/browse/KAFKA-7856 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: Victor Sartori >Priority: Major > Labels: patch, pull-request-available, security > Fix For: 2.1.1 > > > We passed the Kafka client through security analysis, and the scans report: > CWE-331 - Flaw: medium, SANS TOP 25 > [https://cwe.mitre.org/data/definitions/331.html] > > A PR on GitHub is present. (https://github.com/apache/kafka/pull/6184) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7844) Fix "Cannot lock buildSrc build lock as it has already been locked by this process" error in jarAll
Colin P. McCabe created KAFKA-7844: -- Summary: Fix "Cannot lock buildSrc build lock as it has already been locked by this process" error in jarAll Key: KAFKA-7844 URL: https://issues.apache.org/jira/browse/KAFKA-7844 Project: Kafka Issue Type: Bug Reporter: Colin P. McCabe Assignee: Colin P. McCabe When building with jarAll, installAll, and the other "All" targets, I get this error: {code} Cannot lock buildSrc build lock as it has already been locked by this process {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7832) Use automatic RPC generation in CreateTopics
Colin P. McCabe created KAFKA-7832: -- Summary: Use automatic RPC generation in CreateTopics Key: KAFKA-7832 URL: https://issues.apache.org/jira/browse/KAFKA-7832 Project: Kafka Issue Type: Sub-task Reporter: Colin P. McCabe Assignee: Colin P. McCabe Use automatic RPC generation for the CreateTopics RPC. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7830) Convert Kafka RPCs to use automatically generated code
Colin P. McCabe created KAFKA-7830: -- Summary: Convert Kafka RPCs to use automatically generated code Key: KAFKA-7830 URL: https://issues.apache.org/jira/browse/KAFKA-7830 Project: Kafka Issue Type: Improvement Components: clients, core Reporter: Colin P. McCabe Assignee: Colin P. McCabe KAFKA-7609 added a way of automatically generating code for reading and writing Kafka RPC message types from JSON schemas. Automatically generated code is preferable to manually written serialization code: * It is less tedious and error-prone to use than hand-written code. * For developers writing Kafka clients in other languages, the JSON schemas are useful in a way that the Java serialization code is not. * It will eventually be possible to automatically validate aspects of cross-version compatibility, when using JSON message schemas. * Once all of the RPCs are converted, we can drop using Structs in favor of serializing directly to ByteBuffer, to reduce GC load. This JIRA tracks converting the current hand-written message serialization code to automatically generated serialization. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7793) Improve the Trogdor command-line
Colin P. McCabe created KAFKA-7793: -- Summary: Improve the Trogdor command-line Key: KAFKA-7793 URL: https://issues.apache.org/jira/browse/KAFKA-7793 Project: Kafka Issue Type: Improvement Reporter: Colin P. McCabe Improve the Trogdor command-line. It should be easier to launch tasks from a task spec in a file. It should be easier to list the currently-running tasks in a readable way. We should be able to filter the currently-running tasks. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7792) Trogdor should have an uptime function
Colin P. McCabe created KAFKA-7792: -- Summary: Trogdor should have an uptime function Key: KAFKA-7792 URL: https://issues.apache.org/jira/browse/KAFKA-7792 Project: Kafka Issue Type: Improvement Reporter: Colin P. McCabe Assignee: Colin P. McCabe Trogdor should have an uptime function which returns how long the coordinator or agent has been up. This will also be a good way to test that the daemon is running without fetching a full status. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7788) Support null defaults in KAFKA-7609 RPC specifications
Colin P. McCabe created KAFKA-7788: -- Summary: Support null defaults in KAFKA-7609 RPC specifications Key: KAFKA-7788 URL: https://issues.apache.org/jira/browse/KAFKA-7788 Project: Kafka Issue Type: Improvement Reporter: Colin P. McCabe It would be nice if we could support null values as defaults in the KAFKA-7609 RPC specification files. null defaults should be allowed only if the field is nullable in all supported versions of the field. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
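For illustration, a nullable field carrying a null default might be declared like this in a KAFKA-7609-style spec file. This is a hypothetical fragment — the field name is invented, and (per the request above) "null" would only be legal as a default where the field is nullable in every supported version:

```json
{
  "name": "ExampleGroupInstanceId",
  "type": "string",
  "versions": "1+",
  "nullableVersions": "1+",
  "default": "null",
  "about": "A hypothetical nullable field whose default is null."
}
```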
[jira] [Created] (KAFKA-7787) Add error specifications to KAFKA-7609
Colin P. McCabe created KAFKA-7787: -- Summary: Add error specifications to KAFKA-7609 Key: KAFKA-7787 URL: https://issues.apache.org/jira/browse/KAFKA-7787 Project: Kafka Issue Type: Improvement Reporter: Colin P. McCabe In our RPC JSON, it would be nice if we could specify what versions of a response could contain what errors. See the discussion here: https://github.com/apache/kafka/pull/5893 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KAFKA-5775) Implement Fault Injection testing for Kafka
[ https://issues.apache.org/jira/browse/KAFKA-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin P. McCabe resolved KAFKA-5775. Resolution: Fixed > Implement Fault Injection testing for Kafka > --- > > Key: KAFKA-5775 > URL: https://issues.apache.org/jira/browse/KAFKA-5775 > Project: Kafka > Issue Type: Bug >Reporter: Colin P. McCabe >Assignee: Colin P. McCabe >Priority: Major > > Implement fault injection testing for Apache Kafka. The general idea is to > create faults such as network partitions or disk failures, and see what > happens in the cluster. The fault injector can run as part of a ducktape > system test, or standalone. More details here: > https://cwiki.apache.org/confluence/display/KAFKA/Fault+Injection -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KAFKA-6696) Trogdor should support destroying tasks
[ https://issues.apache.org/jira/browse/KAFKA-6696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin P. McCabe resolved KAFKA-6696. Resolution: Fixed > Trogdor should support destroying tasks > --- > > Key: KAFKA-6696 > URL: https://issues.apache.org/jira/browse/KAFKA-6696 > Project: Kafka > Issue Type: Improvement > Components: system tests >Reporter: Colin P. McCabe >Assignee: Colin P. McCabe >Priority: Major > > Trogdor should support destroying tasks. This will make it more practical to > have very long running Trogdor instances. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KAFKA-6372) Trogdor should use LogContext for log messages
[ https://issues.apache.org/jira/browse/KAFKA-6372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin P. McCabe resolved KAFKA-6372. Resolution: Fixed > Trogdor should use LogContext for log messages > -- > > Key: KAFKA-6372 > URL: https://issues.apache.org/jira/browse/KAFKA-6372 > Project: Kafka > Issue Type: Improvement >Reporter: Colin P. McCabe >Assignee: Colin P. McCabe >Priority: Minor > > Trogdor should use LogContext for log messages, rather than manually > prefixing log messages with the context. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KAFKA-7428) ConnectionStressSpec: add "action", allow multiple clients
[ https://issues.apache.org/jira/browse/KAFKA-7428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin P. McCabe resolved KAFKA-7428. Resolution: Fixed > ConnectionStressSpec: add "action", allow multiple clients > -- > > Key: KAFKA-7428 > URL: https://issues.apache.org/jira/browse/KAFKA-7428 > Project: Kafka > Issue Type: Test >Reporter: Colin P. McCabe >Assignee: Colin P. McCabe >Priority: Major > > ConnectionStressSpec: add "action", allow multiple clients -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KAFKA-7183) Add a trogdor test that creates many connections to brokers
[ https://issues.apache.org/jira/browse/KAFKA-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin P. McCabe resolved KAFKA-7183. Resolution: Fixed > Add a trogdor test that creates many connections to brokers > --- > > Key: KAFKA-7183 > URL: https://issues.apache.org/jira/browse/KAFKA-7183 > Project: Kafka > Issue Type: Test > Components: system tests >Reporter: Colin P. McCabe >Assignee: Colin P. McCabe >Priority: Major > > Add a trogdor test that creates many connections to brokers -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KAFKA-6785) Add Trogdor documentation
[ https://issues.apache.org/jira/browse/KAFKA-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin P. McCabe resolved KAFKA-6785. Resolution: Fixed > Add Trogdor documentation > - > > Key: KAFKA-6785 > URL: https://issues.apache.org/jira/browse/KAFKA-6785 > Project: Kafka > Issue Type: Improvement > Components: system tests >Reporter: Colin P. McCabe >Assignee: Colin P. McCabe >Priority: Major > > Add documentation for the Trogdor framework -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7609) Add Protocol Generator for Kafka
Colin P. McCabe created KAFKA-7609: -- Summary: Add Protocol Generator for Kafka Key: KAFKA-7609 URL: https://issues.apache.org/jira/browse/KAFKA-7609 Project: Kafka Issue Type: Improvement Reporter: Colin P. McCabe Assignee: Colin P. McCabe Rather than hand-writing the code for sending and receiving all the different versions of the Kafka RPC protocol, we should have a protocol generator which can generate this code from an interface definition language (IDL). This will make it less labor-intensive and error-prone to add new message types and versions. It will also make it easier to support these new RPC changes in clients such as librdkafka. Eventually, we should be able to get rid of the Struct classes and serialize messages directly to byte buffers. This will greatly reduce the garbage collection overhead of the network stack, especially when handling large messages. Furthermore, having a formal definition for the Kafka protocol may eventually allow us to expose other transports such as REST without writing lots of glue code to translate back and forth. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
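To give a sense of the idea, an IDL entry for one message type might look like the following. This is an invented sketch in the spirit of the proposal, not a schema copied from the Kafka source tree; the apiKey, message name, and fields are all hypothetical:

```json
{
  "apiKey": 99,
  "type": "request",
  "name": "ExampleFooRequest",
  "validVersions": "0-2",
  "fields": [
    { "name": "TopicName", "type": "string", "versions": "0+",
      "about": "The name of the topic to operate on." },
    { "name": "TimeoutMs", "type": "int32", "versions": "1+", "default": "30000",
      "about": "How long to wait for the operation, in milliseconds." }
  ]
}
```

The generator would emit the per-version read/write code for each field range ("0+", "1+", ...), which is exactly the tedious part of hand-written serialization.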
[jira] [Resolved] (KAFKA-7495) AdminClient thread dies on invalid input
[ https://issues.apache.org/jira/browse/KAFKA-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin P. McCabe resolved KAFKA-7495. Resolution: Duplicate > AdminClient thread dies on invalid input > > > Key: KAFKA-7495 > URL: https://issues.apache.org/jira/browse/KAFKA-7495 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Xavier Léauté >Priority: Major > > The following code results in an uncaught IllegalArgumentException in the > admin client thread, resulting in a zombie admin client. > {code} > AclBindingFilter aclFilter = new AclBindingFilter( > new ResourcePatternFilter(ResourceType.UNKNOWN, null, PatternType.ANY), > AccessControlEntryFilter.ANY > ); > kafkaAdminClient.describeAcls(aclFilter).values().get(); > {code} > See the resulting stacktrace below > {code} > ERROR [kafka-admin-client-thread | adminclient-3] Uncaught exception in > thread 'kafka-admin-client-thread | adminclient-3': > (org.apache.kafka.common.utils.KafkaThread) > java.lang.IllegalArgumentException: Filter contain UNKNOWN elements > at > org.apache.kafka.common.requests.DescribeAclsRequest.validate(DescribeAclsRequest.java:140) > at > org.apache.kafka.common.requests.DescribeAclsRequest.(DescribeAclsRequest.java:92) > at > org.apache.kafka.common.requests.DescribeAclsRequest$Builder.build(DescribeAclsRequest.java:77) > at > org.apache.kafka.common.requests.DescribeAclsRequest$Builder.build(DescribeAclsRequest.java:67) > at org.apache.kafka.clients.NetworkClient.doSend(NetworkClient.java:450) > at org.apache.kafka.clients.NetworkClient.send(NetworkClient.java:411) > at > org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.sendEligibleCalls(KafkaAdminClient.java:910) > at > org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.run(KafkaAdminClient.java:1107) > at java.base/java.lang.Thread.run(Thread.java:844) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7496) KafkaAdminClient#describeAcls should handle invalid filters gracefully
Colin P. McCabe created KAFKA-7496: -- Summary: KafkaAdminClient#describeAcls should handle invalid filters gracefully Key: KAFKA-7496 URL: https://issues.apache.org/jira/browse/KAFKA-7496 Project: Kafka Issue Type: Bug Components: admin Reporter: Colin P. McCabe Assignee: Colin P. McCabe KafkaAdminClient#describeAcls should handle invalid filters gracefully. Specifically, it should return a future which yields an exception. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
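The pattern being asked for — validate input eagerly, but report the failure through the returned future rather than letting an exception escape onto an internal thread — can be sketched with the standard CompletableFuture API. The describeAcls below is a hypothetical stand-in, not the real KafkaAdminClient method:

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical admin-style API: invalid input yields a future that
// completes exceptionally, instead of an exception thrown on a
// background thread (which is what killed the worker in KAFKA-7495).
public class GracefulDescribe {
    static CompletableFuture<String> describeAcls(String filter) {
        if (filter == null || filter.contains("UNKNOWN")) {
            CompletableFuture<String> failed = new CompletableFuture<>();
            failed.completeExceptionally(
                new IllegalArgumentException("invalid filter: " + filter));
            return failed; // caller sees the error when it calls get()
        }
        return CompletableFuture.completedFuture("acls for " + filter);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(describeAcls("topic=foo").get());
        System.out.println(describeAcls("UNKNOWN").isCompletedExceptionally());
    }
}
```

With this shape, the caller's existing error handling around future.get() covers bad input the same way it covers broker-side errors.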
[jira] [Created] (KAFKA-7466) Implement KIP-339: Create a new IncrementalAlterConfigs API
Colin P. McCabe created KAFKA-7466: -- Summary: Implement KIP-339: Create a new IncrementalAlterConfigs API Key: KAFKA-7466 URL: https://issues.apache.org/jira/browse/KAFKA-7466 Project: Kafka Issue Type: Improvement Reporter: Colin P. McCabe Assignee: Colin P. McCabe Implement KIP-339: Create a new IncrementalAlterConfigs API -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7428) ConnectionStressSpec: add "action", allow multiple clients
Colin P. McCabe created KAFKA-7428: -- Summary: ConnectionStressSpec: add "action", allow multiple clients Key: KAFKA-7428 URL: https://issues.apache.org/jira/browse/KAFKA-7428 Project: Kafka Issue Type: Test Reporter: Colin P. McCabe Assignee: Colin P. McCabe ConnectionStressSpec: add "action", allow multiple clients -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7183) Add a trogdor test that creates many connections to brokers
Colin P. McCabe created KAFKA-7183: -- Summary: Add a trogdor test that creates many connections to brokers Key: KAFKA-7183 URL: https://issues.apache.org/jira/browse/KAFKA-7183 Project: Kafka Issue Type: Test Components: system tests Reporter: Colin P. McCabe Assignee: Colin P. McCabe Add a trogdor test that creates many connections to brokers -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7109) KafkaConsumer should close its incremental fetch sessions on close
Colin P. McCabe created KAFKA-7109: -- Summary: KafkaConsumer should close its incremental fetch sessions on close Key: KAFKA-7109 URL: https://issues.apache.org/jira/browse/KAFKA-7109 Project: Kafka Issue Type: Improvement Reporter: Colin P. McCabe Assignee: Colin P. McCabe KafkaConsumer should close its incremental fetch sessions on close. Currently, the sessions are not closed, but simply time out once the consumer is gone. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7051) Improve the efficiency of the ReplicaManager when there are many partitions
Colin P. McCabe created KAFKA-7051: -- Summary: Improve the efficiency of the ReplicaManager when there are many partitions Key: KAFKA-7051 URL: https://issues.apache.org/jira/browse/KAFKA-7051 Project: Kafka Issue Type: Bug Components: replication Affects Versions: 0.8.0 Reporter: Colin P. McCabe Assignee: Colin P. McCabe -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-6785) Add Trogdor documentation
Colin P. McCabe created KAFKA-6785: -- Summary: Add Trogdor documentation Key: KAFKA-6785 URL: https://issues.apache.org/jira/browse/KAFKA-6785 Project: Kafka Issue Type: Improvement Components: system tests Reporter: Colin P. McCabe Assignee: Colin P. McCabe Add documentation for the Trogdor framework -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-6771) Make specifying partitions in RoundTripWorkload, ProduceBench more flexible
Colin P. McCabe created KAFKA-6771: -- Summary: Make specifying partitions in RoundTripWorkload, ProduceBench more flexible Key: KAFKA-6771 URL: https://issues.apache.org/jira/browse/KAFKA-6771 Project: Kafka Issue Type: Improvement Components: system tests Reporter: Colin P. McCabe Assignee: Colin P. McCabe Make specifying partitions in RoundTripWorkload, ProduceBench more flexible -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KAFKA-6457) Error: NOT_LEADER_FOR_PARTITION leads to NPE
[ https://issues.apache.org/jira/browse/KAFKA-6457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin P. McCabe resolved KAFKA-6457. Resolution: Resolved I believe this is a duplicate of KAFKA-6260 > Error: NOT_LEADER_FOR_PARTITION leads to NPE > > > Key: KAFKA-6457 > URL: https://issues.apache.org/jira/browse/KAFKA-6457 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 1.0.0 >Reporter: Seweryn Habdank-Wojewodzki >Priority: Major > > One of our nodes was dead. Then the second one took over all responsibility. > But the streaming application crashed in the meanwhile due to an NPE caused by > {{Error: NOT_LEADER_FOR_PARTITION}}. > The stack trace is below. > > Is this something expected? > > {code:java} > 2018-01-17 11:47:21 [my] [WARN ] Sender:251 - [Producer ...2018-01-17 > 11:47:21 [my] [WARN ] Sender:251 - [Producer > clientId=restreamer-my-fef07ca9-b067-45c0-a5af-68b5a1730dac-StreamThread-1-producer] > Got error produce response with correlation id 768962 on topic-partition > my_internal_topic-5, retrying (9 attempts left). Error: > NOT_LEADER_FOR_PARTITION > 2018-01-17 11:47:21 [my] [WARN ] Sender:251 - [Producer > clientId=restreamer-my-fef07ca9-b067-45c0-a5af-68b5a1730dac-StreamThread-1-producer] > Got error produce response with correlation id 768962 on topic-partition > my_internal_topic-7, retrying (9 attempts left). Error: > NOT_LEADER_FOR_PARTITION > 2018-01-17 11:47:21 [my] [ERROR] AbstractCoordinator:296 - [Consumer > clientId=restreamer-my-fef07ca9-b067-45c0-a5af-68b5a1730dac-StreamThread-1-consumer, > groupId=restreamer-my] Heartbeat thread for group restreamer-my failed due > to unexpected error > java.lang.NullPointerException: null > at > org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:436) > ~[my-restreamer.jar:?] > at org.apache.kafka.common.network.Selector.poll(Selector.java:395) > ~[my-restreamer.jar:?]
> at > org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:460) > ~[my-restreamer.jar:?] > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:238) > ~[my-restreamer.jar:?] > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.pollNoWakeup(ConsumerNetworkClient.java:275) > ~[my-restreamer.jar:?] > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$HeartbeatThread.run(AbstractCoordinator.java:934) > [my-restreamer.jar:?] > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KAFKA-6459) By (Re-)joining group StreamThread got NPE
[ https://issues.apache.org/jira/browse/KAFKA-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin P. McCabe resolved KAFKA-6459. Resolution: Duplicate Fix Version/s: 1.0.1 Duplicate of KAFKA-6260 > By (Re-)joining group StreamThread got NPE > -- > > Key: KAFKA-6459 > URL: https://issues.apache.org/jira/browse/KAFKA-6459 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 1.0.0 >Reporter: Seweryn Habdank-Wojewodzki >Priority: Major > Fix For: 1.0.1 > > > We encountered more instabilities in Kafka Streams. > While (re-)joining the group, the StreamThread got an NPE. > {code} > 2018-01-18 09:48:44 INFO AbstractCoordinator:336 - [Consumer > clientId=kafka-endpoint-c1e67c77-c0c8-413d-b0f2-31f22bdeae05-StreamThread-1-consumer, > groupId=kafka-endpoint] (Re-)joining group > 2018-01-18 09:48:44 INFO StreamPartitionAssignor:341 - stream-thread > [kafka-endpoint-c1e67c77-c0c8-413d-b0f2-31f22bdeae05-StreamThread-1-consumer] > Assigned tasks to clients as > \{c1e67c77-c0c8-413d-b0f2-31f22bdeae05=[activeTasks: ([0_0, 0_1, 0_2, 0_3, > 0_4, 0_5, 0_6, 0_7, 0_8, 0_9]) standbyTasks: ([]) assignedTasks: ([0_0, 0_1, > 0_2, 0_3, 0_4, 0_5, 0_6, 0_7, 0_8, 0_9]) prevActiveTasks: ([0_0, 0_1, 0_2, > 0_3, 0_4]) prevAssignedTasks: ([0_0, 0_1, 0_2, 0_3, 0_4]) capacity: 1]}.
> 2018-01-18 09:48:44 INFO AbstractCoordinator:341 - [Consumer > clientId=kafka-endpoint-c1e67c77-c0c8-413d-b0f2-31f22bdeae05-StreamThread-1-consumer, > groupId=kafka-endpoint] Successfully joined group with generation 3950 > 2018-01-18 09:48:44 INFO ConsumerCoordinator:341 - [Consumer > clientId=kafka-endpoint-c1e67c77-c0c8-413d-b0f2-31f22bdeae05-StreamThread-1-consumer, > groupId=kafka-endpoint] Setting newly assigned partitions > [my_internal_topic-6, my_internal_topic-5, my_internal_topic-8, > my_internal_topic-7, my_internal_topic-9, my_internal_topic-0, > my_internal_topic-2, my_internal_topic-1, my_internal_topic-4, > my_internal_topic-3] > 2018-01-18 09:48:44 INFO StreamThread:346 - stream-thread > [kafka-endpoint-c1e67c77-c0c8-413d-b0f2-31f22bdeae05-StreamThread-1] State > transition from PARTITIONS_REVOKED to PARTITIONS_ASSIGNED > 2018-01-18 09:48:44 INFO StreamThread:351 - stream-thread > [kafka-endpoint-c1e67c77-c0c8-413d-b0f2-31f22bdeae05-StreamThread-1] > partition assignment took 149 ms. > current active tasks: [0_0, 0_1, 0_2, 0_3, 0_4, 0_5, 0_6, 0_7, 0_8, > 0_9] > current standby tasks: [] > previous active tasks: [0_0, 0_1, 0_2, 0_3, 0_4] > 2018-01-18 09:48:44 ERROR StreamThread:306 - stream-thread > [kafka-endpoint-c1e67c77-c0c8-413d-b0f2-31f22bdeae05-StreamThread-1] > Encountered the following error during processing: > java.lang.NullPointerException: null > at > org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:436) > ~[my-kafka-endpoint.jar:?] > at org.apache.kafka.common.network.Selector.poll(Selector.java:395) > ~[my-kafka-endpoint.jar:?] > at > org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:460) > ~[my-kafka-endpoint.jar:?] > at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:238) > ~[my-kafka-endpoint.jar:?] 
> at > org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.pollNoWakeup(ConsumerNetworkClient.java:275) > ~[my-kafka-endpoint.jar:?] > at > org.apache.kafka.clients.consumer.internals.AbstractCoordinator$HeartbeatThread.run(AbstractCoordinator.java:934) > ~[my-kafka-endpoint.jar:?] > 2018-01-18 09:48:44 INFO StreamThread:346 - stream-thread > [kafka-endpoint-c1e67c77-c0c8-413d-b0f2-31f22bdeae05-StreamThread-1] State > transition from PARTITIONS_ASSIGNED to PENDING_SHUTDOWN > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-6696) Trogdor should support destroying tasks
Colin P. McCabe created KAFKA-6696: -- Summary: Trogdor should support destroying tasks Key: KAFKA-6696 URL: https://issues.apache.org/jira/browse/KAFKA-6696 Project: Kafka Issue Type: Improvement Components: system tests Reporter: Colin P. McCabe Assignee: Colin P. McCabe Trogdor should support destroying tasks. This will make it more practical to have very long running Trogdor instances. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-6694) The Trogdor Coordinator should support filtering task responses
Colin P. McCabe created KAFKA-6694: -- Summary: The Trogdor Coordinator should support filtering task responses Key: KAFKA-6694 URL: https://issues.apache.org/jira/browse/KAFKA-6694 Project: Kafka Issue Type: Improvement Components: system tests Reporter: Colin P. McCabe Assignee: Colin P. McCabe Currently, a user must get the status of all tasks when hitting the {{/coordinator/tasks}} endpoint on the Trogdor coordinator. To make the responses smaller, the Trogdor coordinator should support filtering task responses. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-6688) The Trogdor coordinator should track task statuses
Colin P. McCabe created KAFKA-6688: -- Summary: The Trogdor coordinator should track task statuses Key: KAFKA-6688 URL: https://issues.apache.org/jira/browse/KAFKA-6688 Project: Kafka Issue Type: Improvement Components: system tests Reporter: Colin P. McCabe Assignee: Colin P. McCabe The Trogdor coordinator should track task statuses -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-6658) Fix RoundTripWorkload and make k/v generation configurable
Colin P. McCabe created KAFKA-6658: -- Summary: Fix RoundTripWorkload and make k/v generation configurable Key: KAFKA-6658 URL: https://issues.apache.org/jira/browse/KAFKA-6658 Project: Kafka Issue Type: Bug Components: system tests, unit tests Reporter: Colin P. McCabe Assignee: Colin P. McCabe Fix RoundTripWorkload. Currently, RoundTripWorkload is unable to get the sequence number of the keys that it produced. Also, make PayloadGenerator an interface which can have multiple implementations: constant, uniform random, and sequential, and allow different payload generators to be used for keys and values. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
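The pluggable-generator idea can be sketched as a small interface with interchangeable implementations. The names and signatures below are illustrative, not the ones that landed in Trogdor; in particular, encoding the position into the key is what lets the workload recover sequence numbers on the consuming side:

```java
import java.nio.ByteBuffer;
import java.util.Random;

// Illustrative sketch of a pluggable payload generator.
interface PayloadGenerator {
    byte[] generate(long position);
}

// Same bytes at every position.
class ConstantPayloadGenerator implements PayloadGenerator {
    private final byte[] value;
    ConstantPayloadGenerator(byte[] value) { this.value = value; }
    public byte[] generate(long position) { return value.clone(); }
}

// Encodes the position itself, so consumers can recover sequence numbers.
class SequentialPayloadGenerator implements PayloadGenerator {
    public byte[] generate(long position) {
        return ByteBuffer.allocate(Long.BYTES).putLong(position).array();
    }
}

// Deterministic "random" bytes seeded by position, so runs are repeatable.
class UniformRandomPayloadGenerator implements PayloadGenerator {
    private final int size;
    UniformRandomPayloadGenerator(int size) { this.size = size; }
    public byte[] generate(long position) {
        byte[] out = new byte[size];
        new Random(position).nextBytes(out);
        return out;
    }
}

public class PayloadDemo {
    public static void main(String[] args) {
        // Different generators for keys and values, as the issue requests.
        PayloadGenerator keys = new SequentialPayloadGenerator();
        PayloadGenerator values = new ConstantPayloadGenerator(new byte[]{1, 2, 3});
        System.out.println(ByteBuffer.wrap(keys.generate(42L)).getLong()); // 42
        System.out.println(values.generate(42L).length);                   // 3
    }
}
```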
[jira] [Created] (KAFKA-6372) Trogdor should use LogContext for log messages
Colin P. McCabe created KAFKA-6372: -- Summary: Trogdor should use LogContext for log messages Key: KAFKA-6372 URL: https://issues.apache.org/jira/browse/KAFKA-6372 Project: Kafka Issue Type: Sub-task Reporter: Colin P. McCabe Assignee: Colin P. McCabe Priority: Minor Trogdor should use LogContext for log messages, rather than manually prefixing log messages with the context. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-6368) AdminClient should contact multiple nodes before timing out a call
Colin P. McCabe created KAFKA-6368: -- Summary: AdminClient should contact multiple nodes before timing out a call Key: KAFKA-6368 URL: https://issues.apache.org/jira/browse/KAFKA-6368 Project: Kafka Issue Type: Bug Components: admin Affects Versions: 0.10.1.0 Reporter: Colin P. McCabe Assignee: Colin P. McCabe The AdminClient should contact multiple nodes before timing out a call. Right now, we could use up our entire call timeout just waiting for one very slow node to respond. We probably need to decouple the call timeout from the NetworkClient request timeout. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
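The decoupling proposed here can be sketched generically: the logical call keeps one overall deadline, and each node attempt consumes only part of it before the client moves on. The names and types below are illustrative, not AdminClient internals:

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Sketch: iterate over candidate nodes until the overall deadline expires,
// so one unresponsive node cannot consume the whole call timeout. Each
// attempt is assumed to enforce its own (shorter) per-request timeout.
class MultiNodeCall {
    static <N, R> Optional<R> tryNodes(List<N> nodes,
                                       Function<N, Optional<R>> attempt,
                                       long deadlineMs) {
        for (N node : nodes) {
            if (System.currentTimeMillis() >= deadlineMs) {
                break;  // overall call timeout exhausted
            }
            Optional<R> result = attempt.apply(node);  // bounded per-node try
            if (result.isPresent()) {
                return result;  // first node that answers wins
            }
        }
        return Optional.empty();
    }
}
```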
[jira] [Created] (KAFKA-6299) Fix AdminClient error handling when metadata changes
Colin P. McCabe created KAFKA-6299: -- Summary: Fix AdminClient error handling when metadata changes Key: KAFKA-6299 URL: https://issues.apache.org/jira/browse/KAFKA-6299 Project: Kafka Issue Type: Bug Reporter: Colin P. McCabe Assignee: Colin P. McCabe
* AdminClient should retry requests for which the controller or partition leader has changed
* Fix an issue where AdminClient requests might not get a security exception, even when a metadata fetch fails with an authorization exception.
* Fix a possible issue where AdminClient might leak a socket after the timeout expires on a hard close, if a very narrow race condition is hit
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-6298) Line numbers on log messages are incorrect
Colin P. McCabe created KAFKA-6298: -- Summary: Line numbers on log messages are incorrect Key: KAFKA-6298 URL: https://issues.apache.org/jira/browse/KAFKA-6298 Project: Kafka Issue Type: Bug Reporter: Colin P. McCabe The line numbers on log messages are all incorrect now. For example, AdminClient should have this log message on line 394: {code}
394    log.debug("Kafka admin client initialized")
{code} But instead, it shows up as being on line 177: {code}
[2017-12-01 15:42:18,710] DEBUG [AdminClient clientId=adminclient-1] Kafka admin client initialized (org.apache.kafka.clients.admin.KafkaAdminClient:177)
{code} The line numbers appear to be coming from {{LogContext.java}}: {code}
174    @Override
175    public void debug(String message) {
176        if (logger.isDebugEnabled())
177            logger.debug(logPrefix + message);
178    }
{code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
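The root cause is that LogContext wraps the underlying logger, so the logging framework computes the call site from the wrapper's own frame. One general technique for fixing this, sketched here with java.util.logging rather than Kafka's actual code, is to have the wrapper supply the real caller's location explicitly (slf4j exposes LocationAwareLogger for the same purpose):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Illustrative wrapper (not Kafka's LogContext): instead of calling
// logger.debug() and letting the framework blame the wrapper's line,
// look up the caller's frame and pass it to logp() explicitly.
class PrefixedLogger {
    private final Logger logger;
    private final String prefix;

    PrefixedLogger(Logger logger, String prefix) {
        this.logger = logger;
        this.prefix = prefix;
    }

    void debug(String message) {
        // Frame 0 is this method; frame 1 is whoever called debug().
        StackTraceElement caller = new Throwable().getStackTrace()[1];
        logger.logp(Level.FINE, caller.getClassName(), caller.getMethodName(),
                    prefix + message);
    }
}
```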
[jira] [Created] (KAFKA-6255) Add ProduceBench to Trogdor
Colin P. McCabe created KAFKA-6255: -- Summary: Add ProduceBench to Trogdor Key: KAFKA-6255 URL: https://issues.apache.org/jira/browse/KAFKA-6255 Project: Kafka Issue Type: Sub-task Reporter: Colin P. McCabe Assignee: Colin P. McCabe Add ProduceBench, a benchmark of producer latency, to Trogdor. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-6254) Introduce Incremental FetchRequests to Increase Partition Scalability
Colin P. McCabe created KAFKA-6254: -- Summary: Introduce Incremental FetchRequests to Increase Partition Scalability Key: KAFKA-6254 URL: https://issues.apache.org/jira/browse/KAFKA-6254 Project: Kafka Issue Type: Improvement Reporter: Colin P. McCabe Assignee: Colin P. McCabe Introduce Incremental FetchRequests to Increase Partition Scalability. See https://cwiki.apache.org/confluence/pages/editpage.action?pageId=74687799 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-6247) Fix system test dependency issues
Colin P. McCabe created KAFKA-6247: -- Summary: Fix system test dependency issues Key: KAFKA-6247 URL: https://issues.apache.org/jira/browse/KAFKA-6247 Project: Kafka Issue Type: Bug Components: system tests Reporter: Colin P. McCabe Kibosh needs to be installed on Vagrant instances as well as in Docker environments. And we need to download old Apache Kafka releases from a stable mirror that will not purge old releases. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-6102) Consolidate MockTime implementations between connect and clients
Colin P. McCabe created KAFKA-6102: -- Summary: Consolidate MockTime implementations between connect and clients Key: KAFKA-6102 URL: https://issues.apache.org/jira/browse/KAFKA-6102 Project: Kafka Issue Type: Improvement Reporter: Colin P. McCabe Assignee: Colin P. McCabe Priority: Minor Consolidate MockTime implementations between connect and clients -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-6070) ducker-ak: add ipaddress and enum34 dependencies to docker image
Colin P. McCabe created KAFKA-6070: -- Summary: ducker-ak: add ipaddress and enum34 dependencies to docker image Key: KAFKA-6070 URL: https://issues.apache.org/jira/browse/KAFKA-6070 Project: Kafka Issue Type: Bug Reporter: Colin P. McCabe Assignee: Colin P. McCabe ducker-ak: add ipaddress and enum34 dependencies to docker image -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-6060) Add workload generation capabilities to Trogdor
Colin P. McCabe created KAFKA-6060: -- Summary: Add workload generation capabilities to Trogdor Key: KAFKA-6060 URL: https://issues.apache.org/jira/browse/KAFKA-6060 Project: Kafka Issue Type: Sub-task Reporter: Colin P. McCabe Assignee: Colin P. McCabe Add workload generation capabilities to Trogdor -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-5849) Add partitioned produce consume test
Colin P. McCabe created KAFKA-5849: -- Summary: Add partitioned produce consume test Key: KAFKA-5849 URL: https://issues.apache.org/jira/browse/KAFKA-5849 Project: Kafka Issue Type: Bug Reporter: Colin P. McCabe Assignee: Colin P. McCabe Add partitioned produce consume test -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-5838) Speed up running system tests in docker a bit
Colin P. McCabe created KAFKA-5838: -- Summary: Speed up running system tests in docker a bit Key: KAFKA-5838 URL: https://issues.apache.org/jira/browse/KAFKA-5838 Project: Kafka Issue Type: Bug Reporter: Colin P. McCabe Assignee: Colin P. McCabe Speed up running system tests in docker a bit by using optimized sshd options. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-5811) Trogdor should handle injecting disk faults
Colin P. McCabe created KAFKA-5811: -- Summary: Trogdor should handle injecting disk faults Key: KAFKA-5811 URL: https://issues.apache.org/jira/browse/KAFKA-5811 Project: Kafka Issue Type: Sub-task Reporter: Colin P. McCabe Assignee: Colin P. McCabe Trogdor should handle injecting disk faults -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-5806) Fix transient unit test failure in trogdor coordinator shutdown
Colin P. McCabe created KAFKA-5806: -- Summary: Fix transient unit test failure in trogdor coordinator shutdown Key: KAFKA-5806 URL: https://issues.apache.org/jira/browse/KAFKA-5806 Project: Kafka Issue Type: Sub-task Components: system tests Reporter: Colin P. McCabe Assignee: Colin P. McCabe Fix transient unit test failure in trogdor coordinator shutdown -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-5791) Add HTTPS support to the fault injector
Colin P. McCabe created KAFKA-5791: -- Summary: Add HTTPS support to the fault injector Key: KAFKA-5791 URL: https://issues.apache.org/jira/browse/KAFKA-5791 Project: Kafka Issue Type: Sub-task Reporter: Colin P. McCabe Add HTTPS support to the fault injector -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-5777) Add ducktape integration for the Trogdor Fault injection daemon
Colin P. McCabe created KAFKA-5777: -- Summary: Add ducktape integration for the Trogdor Fault injection daemon Key: KAFKA-5777 URL: https://issues.apache.org/jira/browse/KAFKA-5777 Project: Kafka Issue Type: Improvement Components: system tests Reporter: Colin P. McCabe -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-5776) Add the Trogdor fault injection daemon
Colin P. McCabe created KAFKA-5776: -- Summary: Add the Trogdor fault injection daemon Key: KAFKA-5776 URL: https://issues.apache.org/jira/browse/KAFKA-5776 Project: Kafka Issue Type: Improvement Components: system tests Reporter: Colin P. McCabe -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-5775) Implement Fault Injection testing for Kafka
Colin P. McCabe created KAFKA-5775: -- Summary: Implement Fault Injection testing for Kafka Key: KAFKA-5775 URL: https://issues.apache.org/jira/browse/KAFKA-5775 Project: Kafka Issue Type: Bug Reporter: Colin P. McCabe Assignee: Colin P. McCabe Implement fault injection testing for Apache Kafka. The general idea is to create faults such as network partitions or disk failures, and see what happens in the cluster. The fault injector can run as part of a ducktape system test, or standalone. More details here: https://cwiki.apache.org/confluence/display/KAFKA/Fault+Injection -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-5768) Upgrade ducktape version to 0.7.0, and use new kill_java_processes
Colin P. McCabe created KAFKA-5768: -- Summary: Upgrade ducktape version to 0.7.0, and use new kill_java_processes Key: KAFKA-5768 URL: https://issues.apache.org/jira/browse/KAFKA-5768 Project: Kafka Issue Type: Improvement Reporter: Colin P. McCabe Assignee: Colin P. McCabe Upgrade the ducktape version to 0.7.0. Use the new `kill_java_processes` function in ducktape to kill only the processes that are part of a service when starting or stopping a service, rather than killing all java processes (in some cases) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-5744) ShellTest: add tests for attempting to run nonexistent program, error return
Colin P. McCabe created KAFKA-5744: -- Summary: ShellTest: add tests for attempting to run nonexistent program, error return Key: KAFKA-5744 URL: https://issues.apache.org/jira/browse/KAFKA-5744 Project: Kafka Issue Type: Improvement Components: unit tests Reporter: Colin P. McCabe Assignee: Colin P. McCabe ShellTest should have tests for attempting to run a nonexistent program, and for running a program which returns an error. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-5743) Ducktape services should store their files in subdirectories of /mnt
Colin P. McCabe created KAFKA-5743: -- Summary: Ducktape services should store their files in subdirectories of /mnt Key: KAFKA-5743 URL: https://issues.apache.org/jira/browse/KAFKA-5743 Project: Kafka Issue Type: Improvement Components: system tests Reporter: Colin P. McCabe Assignee: Colin P. McCabe Currently, ducktape services like KafkaService store their files directly in /mnt. This means that cleanup involves running {{rm -rf /mnt/*}}. It would be better if services stored their files in subdirectories of /mnt. For example, KafkaService could store its files in /mnt/kafka. This would make cleanup simpler and avoid the need to remove all of /mnt. It would also make running multiple services on the same node possible. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-5737) KafkaAdminClient thread should be daemon
Colin P. McCabe created KAFKA-5737: -- Summary: KafkaAdminClient thread should be daemon Key: KAFKA-5737 URL: https://issues.apache.org/jira/browse/KAFKA-5737 Project: Kafka Issue Type: Bug Affects Versions: 0.11.0.0 Reporter: Colin P. McCabe Assignee: Colin P. McCabe The admin client thread should be daemon, for consistency with the consumer and producer threads. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-5720) In Jenkins, kafka.api.SaslSslAdminClientIntegrationTest failed with org.apache.kafka.common.errors.TimeoutException
Colin P. McCabe created KAFKA-5720: -- Summary: In Jenkins, kafka.api.SaslSslAdminClientIntegrationTest failed with org.apache.kafka.common.errors.TimeoutException Key: KAFKA-5720 URL: https://issues.apache.org/jira/browse/KAFKA-5720 Project: Kafka Issue Type: Bug Reporter: Colin P. McCabe Priority: Minor In Jenkins, kafka.api.SaslSslAdminClientIntegrationTest failed with org.apache.kafka.common.errors.TimeoutException. {code} java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Aborted due to timeout. at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45) at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32) at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89) at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:213) at kafka.api.AdminClientIntegrationTest.testCallInFlightTimeouts(AdminClientIntegrationTest.scala:399) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298) at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.lang.Thread.run(Thread.java:748) Caused by: org.apache.kafka.common.errors.TimeoutException: Aborted due to timeout. {code} It's unclear whether this was an environment error or test bug. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-5659) AdminClient#describeConfigs makes an extra empty request when only broker info is requested
Colin P. McCabe created KAFKA-5659: -- Summary: AdminClient#describeConfigs makes an extra empty request when only broker info is requested Key: KAFKA-5659 URL: https://issues.apache.org/jira/browse/KAFKA-5659 Project: Kafka Issue Type: Bug Affects Versions: 0.11.0.0 Reporter: Colin P. McCabe Assignee: Colin P. McCabe AdminClient#describeConfigs makes an extra empty request when only broker info is requested -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-5623) ducktape kafka service: do not assume Service contains num_nodes
Colin P. McCabe created KAFKA-5623: -- Summary: ducktape kafka service: do not assume Service contains num_nodes Key: KAFKA-5623 URL: https://issues.apache.org/jira/browse/KAFKA-5623 Project: Kafka Issue Type: Bug Components: system tests Affects Versions: 0.11.0.1 Reporter: Colin P. McCabe In the ducktape kafka service, we should not assume that {{ducktape.Service}} contains {{num_nodes}}. In newer versions of ducktape, it does not. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-5602) Add --custom-ducktape flag to ducker-ak
Colin P. McCabe created KAFKA-5602: -- Summary: Add --custom-ducktape flag to ducker-ak Key: KAFKA-5602 URL: https://issues.apache.org/jira/browse/KAFKA-5602 Project: Kafka Issue Type: Improvement Components: system tests Reporter: Colin P. McCabe Assignee: Colin P. McCabe We should add a --custom-ducktape flag to ducker-ak. This will make it easy to run new versions of Ducktape against Apache Kafka. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-5565) Add a broker metric specifying the number of consumer group rebalances in progress
Colin P. McCabe created KAFKA-5565: -- Summary: Add a broker metric specifying the number of consumer group rebalances in progress Key: KAFKA-5565 URL: https://issues.apache.org/jira/browse/KAFKA-5565 Project: Kafka Issue Type: Improvement Reporter: Colin P. McCabe We should add a broker metric specifying the number of consumer group rebalances in progress. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-5484) Refactor kafkatest docker support
Colin P. McCabe created KAFKA-5484: -- Summary: Refactor kafkatest docker support Key: KAFKA-5484 URL: https://issues.apache.org/jira/browse/KAFKA-5484 Project: Kafka Issue Type: Bug Reporter: Colin P. McCabe Assignee: Colin P. McCabe Refactor kafkatest docker support to fix some issues. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-5476) Implement a system test that creates network partitions
Colin P. McCabe created KAFKA-5476: -- Summary: Implement a system test that creates network partitions Key: KAFKA-5476 URL: https://issues.apache.org/jira/browse/KAFKA-5476 Project: Kafka Issue Type: Test Reporter: Colin P. McCabe Assignee: Colin P. McCabe Implement a system test that creates network partitions -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (KAFKA-5445) Document exceptions thrown by AdminClient methods
[ https://issues.apache.org/jira/browse/KAFKA-5445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16049351#comment-16049351 ] Colin P. McCabe commented on KAFKA-5445: We should also add tests that these exceptions are thrown, where possible > Document exceptions thrown by AdminClient methods > - > > Key: KAFKA-5445 > URL: https://issues.apache.org/jira/browse/KAFKA-5445 > Project: Kafka > Issue Type: Improvement >Reporter: Ismael Juma >Assignee: Colin P. McCabe > Fix For: 0.11.0.1 > > > AdminClient should document the exceptions that users may have to handle. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (KAFKA-5336) ListGroup requires Describe on Cluster, but the command-line AclCommand tool does not allow this to be set
[ https://issues.apache.org/jira/browse/KAFKA-5336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin P. McCabe updated KAFKA-5336: --- Summary: ListGroup requires Describe on Cluster, but the command-line AclCommand tool does not allow this to be set (was: The required ACL permission for ListGroup is invalid) > ListGroup requires Describe on Cluster, but the command-line AclCommand tool > does not allow this to be set > -- > > Key: KAFKA-5336 > URL: https://issues.apache.org/jira/browse/KAFKA-5336 > Project: Kafka > Issue Type: Bug > Components: security >Affects Versions: 0.10.2.1 >Reporter: Vahid Hashemian >Assignee: Vahid Hashemian >Priority: Minor > Fix For: 0.11.0.0 > > > The {{ListGroup}} API authorizes requests with _Describe_ access to the > cluster resource: > {code} > def handleListGroupsRequest(request: RequestChannel.Request) { > if (!authorize(request.session, Describe, Resource.ClusterResource)) { > sendResponseMaybeThrottle(request, requestThrottleMs => > ListGroupsResponse.fromError(requestThrottleMs, > Errors.CLUSTER_AUTHORIZATION_FAILED)) > } else { > ... > {code} > However, the list of operations (or permissions) allowed for the cluster > resource does not include _Describe_: > {code} > val ResourceTypeToValidOperations = Map[ResourceType, Set[Operation]] ( > ... > Cluster -> Set(Create, ClusterAction, DescribeConfigs, AlterConfigs, > IdempotentWrite, All), > ... > ) > {code} > Only a user with _All_ cluster permission can successfully call the > {{ListGroup}} API. No other permission (not even any combination that does > not include _All_) would let user use this API. > The bug could be as simple as a typo in the API handler. Though it's not > obvious what actual permission was meant to be used there (perhaps > _DescribeConfigs_?) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (KAFKA-5336) The required ACL permission for ListGroup is invalid
[ https://issues.apache.org/jira/browse/KAFKA-5336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin P. McCabe resolved KAFKA-5336. Resolution: Duplicate Fix Version/s: 0.11.0.0 KAFKA-5292 in 0.11 added {{Describe}} as a valid operation on {{Cluster}} > The required ACL permission for ListGroup is invalid > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-3925) Default log.dir=/tmp/kafka-logs is unsafe
[ https://issues.apache.org/jira/browse/KAFKA-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043469#comment-16043469 ] Colin P. McCabe commented on KAFKA-3925: The issue with {{/var}} is that if you run Kafka as an ordinary user, you cannot write to this directory. Perhaps we could write to {{$HOME/kafka-logs}}? Although that would imply a different location for different users. I also think a lot of unit tests rely on the {{/tmp}} location. > Default log.dir=/tmp/kafka-logs is unsafe > - > > Key: KAFKA-3925 > URL: https://issues.apache.org/jira/browse/KAFKA-3925 > Project: Kafka > Issue Type: Bug > Components: config >Affects Versions: 0.10.0.0 > Environment: Various, depends on OS and configuration >Reporter: Peter Davis > > Many operating systems are configured to delete files under /tmp. For > example Ubuntu has > [tmpreaper|http://manpages.ubuntu.com/manpages/wily/man8/tmpreaper.8.html], > others use tmpfs, others delete /tmp on startup. > Defaults are OK to make getting started easier but should not be unsafe > (risking data loss). > Something under /var would be a better default log.dir under *nix. Or > relative to the Kafka bin directory to avoid needing root. > If the default cannot be changed, I would suggest a special warning print to > the console on broker startup if log.dir is under /tmp. > See [users list > thread|http://mail-archives.apache.org/mod_mbox/kafka-users/201607.mbox/%3cCAD5tkZb-0MMuWJqHNUJ3i1+xuNPZ4tnQt-RPm65grxE0=0o...@mail.gmail.com%3e]. > I've also been bitten by this. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-1955) Explore disk-based buffering in new Kafka Producer
[ https://issues.apache.org/jira/browse/KAFKA-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043458#comment-16043458 ] Colin P. McCabe commented on KAFKA-1955: As Jay wrote, there are some potential problems with the disk-based buffering approach: {quote} The cons of the second approach are the following: 1. You end up depending on disks on all the producer machines. If you have 10,000 producers, that is 10k places state is kept. These tend to fail a lot. 2. You can get data arbitrarily delayed 3. You still don't tolerate hard outages since there is no replication in the producer tier 4. This tends to make problems with duplicates more common in certain failure scenarios. {quote} Do we have potential solutions for these? bq. I believe a malloc/free implementation over `MappedByteBuffer` will be the best choice. This will allow the producer buffers to use a file like a piece of memory at the cost of maintaining a more complex free list. How do you plan on ensuring that the messages are written to disk in a timely fashion? It seems possible that you could lose quite a lot of data if you lose power before the memory-mapped regions are written back to disk. Also, a malloc implementation is quite a lot of complexity; are we sure it's worth it? If we are going to do this, we'd probably want to start with something like an append-only log on which we call {{fsync}} periodically. Also, we would need a KIP... > Explore disk-based buffering in new Kafka Producer > -- > > Key: KAFKA-1955 > URL: https://issues.apache.org/jira/browse/KAFKA-1955 > Project: Kafka > Issue Type: Improvement > Components: producer >Affects Versions: 0.8.2.0 >Reporter: Jay Kreps >Assignee: Jay Kreps > Attachments: KAFKA-1955.patch, > KAFKA-1955-RABASED-TO-8th-AUG-2015.patch > > > There are two approaches to using Kafka for capturing event data that has no > other "source of truth store": > 1. 
Just write to Kafka and try hard to keep the Kafka cluster up as you would > a database. > 2. Write to some kind of local disk store and copy from that to Kafka. > The cons of the second approach are the following: > 1. You end up depending on disks on all the producer machines. If you have > 10,000 producers, that is 10k places state is kept. These tend to fail a lot. > 2. You can get data arbitrarily delayed > 3. You still don't tolerate hard outages since there is no replication in the > producer tier > 4. This tends to make problems with duplicates more common in certain failure > scenarios. > There is one big pro, though: you don't have to keep Kafka running all the > time. > So far we have done nothing in Kafka to help support approach (2), but people > have built a lot of buffering things. It's not clear that this is necessarily > bad. > However implementing this in the new Kafka producer might actually be quite > easy. Here is an idea for how to do it. Implementation of this idea is > probably pretty easy but it would require some pretty thorough testing to see > if it was a success. > The new producer maintains a pool of ByteBuffer instances which it attempts > to recycle and uses to buffer and send messages. When unsent data is queuing > waiting to be sent to the cluster it is hanging out in this pool. > One approach to implementing a disk-backed buffer would be to slightly > generalize this so that the buffer pool has the option to use a mmap'd file > backend for its ByteBuffers. When the BufferPool was created with a > totalMemory setting of 1GB it would preallocate a 1GB sparse file and memory > map it, then chop the file into batchSize MappedByteBuffer pieces and > populate its buffer with those. > Everything else would work normally except now all the buffered data would be > disk backed and in cases where there was significant backlog these would > start to fill up and page out. 
> We currently allow messages larger than batchSize and to handle these we do a > one-off allocation of the necessary size. We would have to disallow this when > running in mmap mode. However since the disk buffer will be really big this > should not be a significant limitation as the batch size can be pretty big. > We would want to ensure that the pooling always gives out the most recently > used ByteBuffer (I think it does). This way under normal operation where > requests are processed quickly a given buffer would be reused many times > before any physical disk write activity occurred. > Note that although this lets the producer buffer very large amounts of data > the buffer isn't really fault-tolerant, since the ordering in the file isn't > known so there is no easy way to recover the producer's buffer in a failure. > So the scope of this feature would just be to
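The "append-only log with periodic fsync" starting point suggested in the comment above could look roughly like this sketch; it is a single-writer toy, not a worked-out design, and the class name and parameters are invented for illustration:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch: an append-only, memory-mapped buffer that is periodically forced
// to disk. Records written between force() calls can still be lost on a
// power failure, which is one of the durability caveats discussed above.
class MappedAppendLog implements AutoCloseable {
    private final FileChannel channel;
    private final MappedByteBuffer buffer;
    private final long forceIntervalMs;
    private long lastForceMs = System.currentTimeMillis();

    MappedAppendLog(Path path, int capacity, long forceIntervalMs) throws IOException {
        this.channel = FileChannel.open(path, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE);
        // Mapping READ_WRITE beyond the current file size extends the file.
        this.buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, capacity);
        this.forceIntervalMs = forceIntervalMs;
    }

    // Appends one length-prefixed record; returns false when the file is full.
    synchronized boolean append(byte[] record) {
        if (buffer.remaining() < 4 + record.length) {
            return false;
        }
        buffer.putInt(record.length);
        buffer.put(record);
        long now = System.currentTimeMillis();
        if (now - lastForceMs >= forceIntervalMs) {
            buffer.force();   // flush dirty pages; bounds the data-loss window
            lastForceMs = now;
        }
        return true;
    }

    @Override public void close() throws IOException {
        buffer.force();
        channel.close();
    }
}
```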