[jira] [Created] (KAFKA-9339) Increased CPU utilization in brokers in 2.4.0
James Brown created KAFKA-9339:
-----------------------------------

Summary: Increased CPU utilization in brokers in 2.4.0
Key: KAFKA-9339
URL: https://issues.apache.org/jira/browse/KAFKA-9339
Project: Kafka
Issue Type: Bug
Affects Versions: 2.4.0
Environment: CentOS 6; Java 1.8.0_232 (OpenJDK)
Reporter: James Brown

I upgraded one of my company's test clusters from 2.3.1 to 2.4.0 and have noticed a significant (40%) increase in the CPU time consumed. This is a small cluster of three nodes (running on t2.large EC2 instances, all in the same AZ) pushing about 150 messages/s in aggregate, spread across 208 topics (a total of 266 partitions; most topics only have one partition). Leadership is reasonably well distributed, and each node leads between 83 and 94 partitions.

This CPU time increase is visible symmetrically on all three nodes in the cluster (e.g., the controller isn't using more CPU than the other nodes). The CPU consumption did not return to normal after I did the second restart to bump the log and inter-broker protocol versions to 2.4, so I don't think it has anything to do with down-converting to the 2.3 protocols.

No settings were changed, nor was anything about the JVM changed. There is nothing interesting being written to the logs, and there is no sign of any instability (partitions aren't being reassigned, etc.).

The best guess I have for the increased CPU usage is that the number of garbage collections increased by approximately 30%, suggesting that something is churning a lot more garbage inside Kafka. This is a small cluster, so it only has a 3GB heap allocated to Kafka on each node; we're using G1GC with some light tuning and are on Java 8, if that helps. We are only using OpenJDK, so I don't think I can produce a Flight Recorder profile.

The kafka-users mailing list suggested this was worth filing a Jira issue about.
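[Editor's note, not part of the original report: one OpenJDK-friendly way to quantify the suspected GC churn is plain Java 8 GC logging. The flags below are standard HotSpot options, added to the broker JVM by whatever mechanism a given deployment uses; the log path is a placeholder.]

    # Standard Java 8 GC-logging flags; capture logs on 2.3.1 and 2.4.0 and compare.
    # /var/log/kafka/gc.log is a placeholder path.
    -Xloggc:/var/log/kafka/gc.log
    -XX:+PrintGCDetails
    -XX:+PrintGCDateStamps
    -XX:+PrintGCApplicationStoppedTime
    -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=50M

Comparing young-generation collection frequency and the heap reclaimed per collection between the two versions should confirm (or rule out) an allocation-rate increase; async-profiler in allocation mode is another OpenJDK-compatible option for finding where the extra garbage is produced.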
[jira] [Created] (KAFKA-9338) Incremental fetch sessions do not maintain or use leader epoch for fencing purposes
Lucas Bradstreet created KAFKA-9338:
-----------------------------------

Summary: Incremental fetch sessions do not maintain or use leader epoch for fencing purposes
Key: KAFKA-9338
URL: https://issues.apache.org/jira/browse/KAFKA-9338
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 2.4.0, 2.3.0, 2.2.0, 2.1.0
Reporter: Lucas Bradstreet

KIP-320 added the ability to fence replicas by detecting stale leader epochs from followers, and to help consumers handle unclean truncation. Unfortunately, the incremental fetch session handling does not maintain or use the leader epoch in the fetch session cache. As a result, it does not appear that the leader epoch is used for fencing the majority of the time. I'm not sure if this is only the case after incremental fetch sessions are established - it may be that the first "full" fetch session is safe.

Optional.empty is returned for the FetchRequest.PartitionData here:
https://github.com/apache/kafka/blob/a4cbdc6a7b3140ccbcd0e2339e28c048b434974e/core/src/main/scala/kafka/server/FetchSession.scala#L111

I believe this affects brokers from 2.1.0, when fencing was improved on the replica fetcher side, and consumers from 2.3.0 and above, which is when client-side truncation detection was added.
Re: Permission to create KIP
What is your wiki account id?

-Matthias

On 12/27/19 7:12 AM, Sagar wrote:
> Hi,
>
> I have done some work on adding prefix scan for state stores and would like
> to create a KIP for the same.
>
> Thanks!
> Sagar.
Re: [DISCUSS] KIP-515: Enable ZK client to use the new TLS supported authentication
Hi everyone. I would like to make the following changes to the KIP.

MOTIVATION: Include a statement that it will be difficult in the short term to deprecate direct Zookeeper communication in kafka-configs.{sh, bat} (which invoke kafka.admin.ConfigCommand), because bootstrapping a Kafka cluster with encrypted passwords in Zookeeper is an explicitly-supported use case; therefore it is in scope to be able to securely configure the CLI tools that still leverage non-deprecated direct Zookeeper communication for TLS (the other 2 tools are kafka-reassign-partitions.{sh, bat} and zookeeper-security-migration.sh).

GOALS: Support the secure configuration of TLS-encrypted communication between Zookeeper and:
a) Kafka brokers
b) The three CLI tools mentioned above that still support direct, non-deprecated communication to Zookeeper

It is explicitly out-of-scope to deprecate any direct Zookeeper communication in CLI tools as part of this KIP; such work will occur in future KIPs instead.

PUBLIC INTERFACES:

1) The following new broker configurations will be recognized:
zookeeper.client.secure (default value = false, for backwards compatibility)
zookeeper.clientCnxnSocket
zookeeper.ssl.keyStore.location
zookeeper.ssl.keyStore.password
zookeeper.ssl.trustStore.location
zookeeper.ssl.trustStore.password

It will be an error for any of the last 5 values to be left unspecified if zookeeper.client.secure is explicitly set to true.

2) In addition, the kafka.security.authorizer.AclAuthorizer class supports the ability to connect to a different Zookeeper instance than the one the brokers use. We therefore also add the following optional configs, which override the corresponding ones from above when present:
authorizer.zookeeper.client.secure
authorizer.zookeeper.clientCnxnSocket
authorizer.zookeeper.ssl.keyStore.location
authorizer.zookeeper.ssl.keyStore.password
authorizer.zookeeper.ssl.trustStore.location
authorizer.zookeeper.ssl.trustStore.password

3) The three CLI tools mentioned above will support a new --zk-tls-config-file option. The following properties will be recognized in that file, and unrecognized properties will be ignored to allow the possibility of pointing --zk-tls-config-file at the broker's config file:
zookeeper.client.secure (default value = false)
zookeeper.clientCnxnSocket
zookeeper.ssl.keyStore.location
zookeeper.ssl.keyStore.password
zookeeper.ssl.trustStore.location
zookeeper.ssl.trustStore.password

It will be an error for any of the last 5 values to be left unspecified if zookeeper.client.secure is explicitly set to true.

Ron

On Mon, Dec 23, 2019 at 3:03 PM Ron Dagostino wrote:
> Hi everyone. Let's get this discussion going again now that Kafka 2.4
> with Zookeeper 3.5.6 is out.
>
> First, regarding the KIP number, the other KIP that was using this number
> moved to KIP 534, so KIP 515 remains the correct number for this
> discussion. I've updated the Kafka Improvement Proposal page to list this
> KIP in the 515 slot, so we're all set there.
>
> Regarding the actual issues under discussion, I think there are some
> things we should clarify.
>
> 1) It is possible to use TLS connectivity to Zookeeper from Apache Kafka
> 2.4 -- the problem is that configuration information has to be passed via
> system properties as "-D" command line options on the java invocation of
> the broker, and those are not secure (anyone with access to the box can
> see the command line used to invoke the broker); the configuration
> includes sensitive information (e.g. a keystore password), so we need a
> secure mechanism for passing the configuration values. I believe the real
> motivation for this KIP is to harden the configuration mechanism for
> Zookeeper TLS connectivity.
>
> 2) I believe the list of CLI tools that continue to use direct Zookeeper
> connectivity in a non-deprecated fashion is:
> a) zookeeper-security-migration.sh/kafka.admin.ZkSecurityMigrator
> b) kafka-reassign-partitions.{sh,bat}/kafka.admin.ReassignPartitionsCommand
> c) kafka-configs.{sh, bat}/kafka.admin.ConfigCommand
>
> 3) I believe kafka.admin.ConfigCommand presents a conundrum because it
> explicitly states in a comment that a supported use case is bootstrapping
> a Kafka cluster with encrypted passwords in Zookeeper (see
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/admin/ConfigCommand.scala#L65).
> This means it will be especially difficult to deprecate this particular
> direct Zookeeper connectivity without a different storage mechanism for
> dynamic configuration values being available (i.e. the self-managed
> quorum referred to in KIP-500).
>
> I think it would be easier and simpler to harden the Zookeeper TLS
> configuration in both Kafka and in CLI tools compared to trying to
> deprecate the direct Zookeeper connectivity in the above 3 CLI tools --
> especially when ConfigCommand has no obvious short-term path to
> deprecation.
>
> Regarding how to harden the
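[Editor's note, for illustration only: a sketch of a file that could be passed via the proposed --zk-tls-config-file option, using the property names listed in the message above rather than any final KIP wording. ClientCnxnSocketNetty is ZooKeeper's Netty-based client socket; all paths, passwords, and ports are placeholders.]

    # Sketch of a ZK TLS config file for the proposed --zk-tls-config-file option.
    # Property names come from the message above; all values are placeholders.
    zookeeper.client.secure=true
    zookeeper.clientCnxnSocket=org.apache.zookeeper.ClientCnxnSocketNetty
    zookeeper.ssl.keyStore.location=/path/to/broker.keystore.jks
    zookeeper.ssl.keyStore.password=keystore-password
    zookeeper.ssl.trustStore.location=/path/to/broker.truststore.jks
    zookeeper.ssl.trustStore.password=truststore-password

An invocation would then look something like the following, assuming the option lands in kafka-configs.{sh,bat} as proposed (2182 being a hypothetical TLS-enabled ZooKeeper port):

    ./bin/kafka-configs.sh --zookeeper zk1.example.com:2182 \
      --zk-tls-config-file zk-tls.properties \
      --entity-type brokers --entity-name 0 --describe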
Re: Kafka 2.4.0 & Mirror Maker 2.0 Error
Thanks Peter, I'll take a look.

Ryanne

On Fri, Dec 27, 2019, 7:48 AM Péter Sinóros-Szabó wrote:
> Hi,
>
> I see the same. I just downloaded the Kafka zip and I ran:
>
> ~/kafka-2.4.0-rc3$ ./bin/connect-mirror-maker.sh config/connect-mirror-maker.properties
>
> Peter
>
> On Mon, 16 Dec 2019 at 17:14, Ryanne Dolan wrote:
>
> > Hey Jamie, are you running the MM2 connectors on an existing Connect
> > cluster, or with the connect-mirror-maker.sh driver? Given your question
> > about plugin.path I'm guessing the former. Is the Connect cluster running
> > 2.4.0 as well? The jars should land in the Connect runtime without any
> > need to modify the plugin.path or copy jars around.
> >
> > Ryanne
> >
> > On Mon, Dec 16, 2019, 6:23 AM Jamie wrote:
> >
> > > Hi All,
> > > I'm trying to set up MirrorMaker 2.0 with Kafka 2.4.0; however, I'm
> > > receiving the following errors on startup:
> > >
> > > ERROR Plugin class loader for connector 'org.apache.kafka.connect.mirror.MirrorSourceConnector' was not found. Returning: org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader@187eb9a8 (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> > > ERROR Plugin class loader for connector 'org.apache.kafka.connect.mirror.MirrorHeartbeatConnector' was not found. Returning: org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader@187eb9a8 (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> > > ERROR Plugin class loader for connector 'org.apache.kafka.connect.mirror.MirrorCheckpointConnector' was not found. Returning: org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader@187eb9a8 (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> > >
> > > I've checked that the jar file containing these class files is in the
> > > class path.
> > > Is there anything I need to add to plugin.path in the connect properties
> > > when running MirrorMaker?
> > > Many Thanks,
> > > Jamie
>
> --
> - Sini
Re: Kafka 2.4.0 & Mirror Maker 2.0 Error
Hi,

I see the same. I just downloaded the Kafka zip and I ran:

~/kafka-2.4.0-rc3$ ./bin/connect-mirror-maker.sh config/connect-mirror-maker.properties

Peter

On Mon, 16 Dec 2019 at 17:14, Ryanne Dolan wrote:
> Hey Jamie, are you running the MM2 connectors on an existing Connect
> cluster, or with the connect-mirror-maker.sh driver? Given your question
> about plugin.path I'm guessing the former. Is the Connect cluster running
> 2.4.0 as well? The jars should land in the Connect runtime without any need
> to modify the plugin.path or copy jars around.
>
> Ryanne
>
> On Mon, Dec 16, 2019, 6:23 AM Jamie wrote:
>
> > Hi All,
> > I'm trying to set up MirrorMaker 2.0 with Kafka 2.4.0; however, I'm
> > receiving the following errors on startup:
> >
> > ERROR Plugin class loader for connector 'org.apache.kafka.connect.mirror.MirrorSourceConnector' was not found. Returning: org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader@187eb9a8 (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> > ERROR Plugin class loader for connector 'org.apache.kafka.connect.mirror.MirrorHeartbeatConnector' was not found. Returning: org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader@187eb9a8 (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> > ERROR Plugin class loader for connector 'org.apache.kafka.connect.mirror.MirrorCheckpointConnector' was not found. Returning: org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader@187eb9a8 (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
> >
> > I've checked that the jar file containing these class files is in the
> > class path.
> > Is there anything I need to add to plugin.path in the connect properties
> > when running MirrorMaker?
> > Many Thanks,
> > Jamie

--
- Sini
Permission to create KIP
Hi,

I have done some work on adding prefix scan for state stores and would like to create a KIP for the same.

Thanks!
Sagar.
Re: [DISCUSS] KIP-405: Kafka Tiered Storage
Hi all,

Jun:
> (a) Cost: S3 list object requests cost $0.005 per 1000 requests. If you
> have 100,000 partitions and want to pull the metadata for each partition at
> the rate of 1/sec. It can cost $0.5/sec, which is roughly $40K per day.

I want to note here that no reasonably durable storage will be cheap at 100k RPS. For example, DynamoDB might give the same ballpark figures. If we want to keep the pull-based approach, we can try to reduce this number in several ways: doing listings less frequently (as Satish mentioned, with the current defaults it's ~3.33k RPS for your example; both figures are re-checked briefly after this message), or batching listing operations in some way (depending on the storage; it might require changing the RSM interface).

> There are different ways for doing push based metadata propagation. Some
> object stores may support that already. For example, S3 supports events
> notification

This sounds interesting. However, I see a couple of issues with using it:
1. As I understand the documentation, notification delivery is not guaranteed and it's recommended to periodically do LIST to fill the gaps, which brings us back to the same LIST consistency guarantees issue.
2. The same goes for broker start: to get the current state, we need to LIST.
3. The dynamic set of multiple consumers (RSMs): AFAIK SQS and SNS aren't designed for such a case.

Alexandre:
> A.1 As commented on PR 7561, S3 consistency model [1][2] implies RSM cannot
> rely solely on S3 APIs to guarantee the expected strong consistency. The
> proposed implementation [3] would need to be updated to take this into
> account. Let's talk more about this.

Thank you for the feedback. I clearly see the need for changing the S3 implementation to provide stronger consistency guarantees. As I see from this thread, there are several possible approaches to this. Let's discuss RemoteLogManager's contract and behavior (like pull vs push model) further before picking one (or several?) of them. I'm going to do some evaluation of DynamoDB for the pull-based approach, if it's possible to apply it at a reasonable cost, and also of the push-based approach with a Kafka topic as the medium.

> A.2.3 Atomicity – what does an implementation of RSM need to provide with
> respect to atomicity of the APIs copyLogSegment, cleanupLogUntil and
> deleteTopicPartition? If a partial failure happens in any of those (e.g. in
> the S3 implementation, if one of the multiple uploads fails [4]),

The S3 implementation is going to change, but it's worth clarifying anyway. The segment log file is uploaded after S3 has acked the upload of all other files associated with the segment, and only after this does the whole segment file set become visible remotely for operations like listRemoteSegments [1]. In case of an upload failure, the files that have been successfully uploaded stay as invisible garbage that is collected by cleanupLogUntil (or overwritten successfully later). The opposite happens during deletion: log files are deleted first. This approach should generally work when we solve the consistency issues by adding a strongly consistent storage: a segment's uploaded files remain invisible garbage until some metadata about them is written.

> A.3 Caching – storing locally the segments retrieved from the remote
> storage is excluded as it does not align with the original intent and even
> defeats some of its purposes (save disk space etc.). That said, could there
> be other types of use cases where the pattern of access to the remotely
> stored segments would benefit from local caching (and potentially
> read-ahead)? Consider the use case of a large pool of consumers which start
> a backfill at the same time for one day worth of data from one year ago
> stored remotely. Caching the segments locally would allow uncoupling the
> load on the remote storage from the load on the Kafka cluster. Maybe the
> RLM could expose a configuration parameter to switch that feature on/off?

I tend to agree here: caching remote segments locally and making this configurable sounds pretty practical to me. We should implement this, though maybe not in the first iteration.

Br,
Ivan

[1] https://github.com/harshach/kafka/pull/18/files#diff-4d73d01c16caed6f2548fc3063550ef0R152

On Thu, 19 Dec 2019 at 19:49, Alexandre Dupriez wrote:
> Hi Jun,
>
> Thank you for the feedback. I am trying to understand how a push-based
> approach would work.
> In order for the metadata to be propagated (under the assumption you
> stated), would you plan to add a new API in Kafka to allow the
> metadata store to send them directly to the brokers?
>
> Thanks,
> Alexandre
>
> On Wed, 18 Dec 2019 at 20:14, Jun Rao wrote:
> >
> > Hi, Satish and Ivan,
> >
> > There are different ways for doing push based metadata propagation. Some
> > object stores may support that already. For example, S3 supports events
> > notification (
> > https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html).
> > Otherwise one could use
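[Editor's note: a quick sanity check of the cost figures quoted at the top of this message, re-deriving only numbers already stated there, with $0.005 per 1000 LIST requests.]

    \[
    \frac{\$0.005}{1000\ \text{req}} \times 100{,}000\ \tfrac{\text{req}}{\text{s}} \times 86{,}400\ \tfrac{\text{s}}{\text{day}} \approx \$43{,}200/\text{day}
    \]
    \[
    \frac{\$0.005}{1000\ \text{req}} \times 3{,}330\ \tfrac{\text{req}}{\text{s}} \times 86{,}400\ \tfrac{\text{s}}{\text{day}} \approx \$1{,}440/\text{day}
    \]

That is, the "roughly $40K per day" figure corresponds to one listing per partition per second, while the ~3.33k RPS figure Satish mentioned brings the same workload down to roughly $1.4K per day.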
[jira] [Created] (KAFKA-9337) Simplifying standalone mm2-connect config
karan kumar created KAFKA-9337:
-----------------------------------

Summary: Simplifying standalone mm2-connect config
Key: KAFKA-9337
URL: https://issues.apache.org/jira/browse/KAFKA-9337
Project: Kafka
Issue Type: Improvement
Components: KafkaConnect, mirrormaker
Affects Versions: 2.4.0
Reporter: karan kumar

One of the nice things about Kafka is that setting it up in a local environment is really simple. I was trying out the latest feature, i.e. MM2, and found that it took me some time to get a minimal setup running. The default config provided assumes that there will already be 3 brokers running, due to the default replication factor of the admin topics the MM2 connector creates.

This got me thinking that most people would follow the same approach I did:
1. Start a single-broker cluster on 9092
2. Start another single-broker cluster on, let's say, 10002
3. Start MM2 with "./bin/connect-mirror-maker.sh ./config/connect-mirror-maker.properties"

What happened was that I had to supply a lot more configs.

This Jira was created following a discussion on the mailing list:
https://lists.apache.org/thread.html/%3ccajxudh13kw3nam3ho69wrozsyovwue1nxf9hkcbawc9r-3d...@mail.gmail.com%3E

cc [~ryannedolan]
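[Editor's note, for illustration only and not part of the ticket: with two single-broker clusters as in the steps above, the extra configs mostly amount to forcing replication factor 1 for the internal topics MM2 and Connect create. The property names below are the usual MM2/Connect ones, but they should be double-checked against the 2.4.0 defaults.]

    # Hypothetical minimal connect-mirror-maker.properties for two local
    # single-broker clusters (ports taken from the steps above).
    clusters = A, B
    A.bootstrap.servers = localhost:9092
    B.bootstrap.servers = localhost:10002
    A->B.enabled = true
    A->B.topics = .*
    # Internal/admin topics default to replication factor 3; a single broker
    # cannot satisfy that, so override them down to 1.
    replication.factor = 1
    checkpoints.topic.replication.factor = 1
    heartbeats.topic.replication.factor = 1
    offset-syncs.topic.replication.factor = 1
    offset.storage.replication.factor = 1
    status.storage.replication.factor = 1
    config.storage.replication.factor = 1

Which is arguably exactly the friction the ticket describes: a single-host trial run should not require this many overrides.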
Re: [DISCUSS] KIP-552: Add interface to handle unused config
Hi,

Indeed, changing the log level to debug would be the easiest option, and I think it would be a good solution. If no one objects, I'm ready to move forward with this approach and submit an MR.

The only minor concern I have is that having it at debug level might make it a bit less friendly for developers, especially those taking their first steps with Kafka. For example, if you misspelled a property name and are trying to understand why things don't do what you expect, having a warning might save some time. Other than that, I cannot see any reason to have warnings there.

Thanks,
Artur

On Thu, Dec 26, 2019 at 10:01 PM John Roesler wrote:
>
> Thanks for the KIP, Artur!
>
> For reference, here is the KIP:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-552%3A+Add+interface+to+handle+unused+config
>
> I agree, these warnings are kind of a nuisance. Would it be feasible just to
> leverage log4j in some way to make it easy to filter these messages? For
> example, we could move those warnings to debug level, or even use a separate
> logger for them.
>
> Thanks for starting the discussion.
> -John
>
> On Tue, Dec 24, 2019, at 07:23, Artur Burtsev wrote:
> > Hi,
> >
> > This KIP provides a way to deal with the warning "The configuration {}
> > was supplied but isn't a known config." when it is not relevant.
> >
> > Cheers,
> > Artur
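[Editor's note, a sketch not taken from the thread: even before any KIP change, log4j can filter these messages per logger. Assuming the warning is emitted by AbstractConfig.logUnused() — an assumption worth verifying against the client version in use — a single line in the application's log4j.properties silences that logger.]

    # Assumption: the "unknown config" warning comes from this logger; verify first.
    log4j.logger.org.apache.kafka.common.config.AbstractConfig=ERROR

The downside is that this hides any other warnings logged by AbstractConfig as well, which is part of why the debug level or a dedicated logger discussed above would be the cleaner fix.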