What metrics to use to diagnose performance degradation?
Hi,

I have a Kafka cluster with two topics in it, X and Y. I have two unrelated applications: one reads from topic X and the other from topic Y. The applications share no resources except the Kafka and K8s clusters. In both applications the processing is very basic and no changes were made to it.

Over the last few days, the Grafana dashboards for both topics show drops in processing rate in both applications, and the drops are perfectly correlated between the two applications.

Which metrics exposed by the Kafka brokers can help me understand whether the root cause is related to Kafka performance?

Thanks,
Victoria

---
NOTICE:
This email and all attachments are confidential, may be proprietary, and may be privileged or otherwise protected from disclosure. They are intended solely for the individual or entity to whom the email is addressed. However, mistakes sometimes happen in addressing emails. If you believe that you are not an intended recipient, please stop reading immediately. Do not copy, forward, or rely on the contents in any way. Notify the sender and/or Imperva, Inc. by telephone at +1 (650) 832-6006 and then delete or destroy any copy of this email and its attachments. The sender reserves and asserts all rights to confidentiality, as well as any privileges that may apply. Any disclosure, copying, distribution or action taken or omitted to be taken by an unintended recipient in reliance on this message is prohibited and may be unlawful. Please consider the environment before printing this email.
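Since the question above is which broker metrics to watch: below is a minimal, dependency-free Java sketch (only the JDK's javax.management) that lists, and optionally polls over JMX, broker MBeans commonly used to spot cluster-wide slowdowns. The class name, the host:port argument, and the assumption that the brokers expose a JMX port are illustrative; the MBean names follow standard Kafka broker JMX naming, and per-topic variants exist with a `,topic=X` suffix.

```java
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class BrokerHealthCheck {
    // Broker-side MBeans that commonly reveal cluster-wide slowdowns.
    public static final String[] METRICS = {
        "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec",
        "kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec",
        "kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce",
        "kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchConsumer",
        "kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent",
        "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions",
    };

    /** Returns true if every metric string parses as a JMX ObjectName. */
    public static boolean validNames() {
        try {
            for (String m : METRICS) new ObjectName(m);
            return true;
        } catch (MalformedObjectNameException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {            // offline: just list the names
            for (String m : METRICS) System.out.println(m);
            return;
        }
        // args[0] is assumed to be "host:port" of a broker's JMX listener
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + args[0] + "/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            for (String m : METRICS) {
                ObjectName name = new ObjectName(m);
                System.out.println(name);
                // Attribute sets differ per metric type (meters expose rates,
                // gauges expose Value), so enumerate rather than hard-code.
                for (MBeanAttributeInfo a : conn.getMBeanInfo(name).getAttributes()) {
                    if (a.isReadable())
                        System.out.println("  " + a.getName() + " = "
                                + conn.getAttribute(name, a.getName()));
                }
            }
        }
    }
}
```

A correlated drop in both consumers with flat BytesInPerSec but rising TotalTimeMs or falling RequestHandlerAvgIdlePercent would point at the brokers (or the hosts they share) rather than the applications.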
Number of topics to which a producer sends
Hi,

Background: Java, Kafka 2.1.0

I have an application that sends to two different topics. In theory I can use the same producer for both. Are there any advantages to having a producer per topic? I looked for best practices on this matter but didn't find any...

Thanks,
Victoria
Keys and partitions
Hi,

I have userId as a key. Many users have moderate amounts of data, but some users have more, and a few have huge amounts of data. I have been thinking about the following aspects of partitioning:

1. If two or more large users fall into the same partition, I might end up with one or more large partitions, unbalanced relative to the other partitions.
2. If smaller users fall into the same partition as a huge user, the small users might get slower processing due to the amount of data the huge user has.
3. If message order is not critical, I might want several consumers to work on the data of the same huge user, so I would like to spread one userId over several partitions.

I have some ideas on how to partition to solve those issues, but if you have something that worked well for you in production I would love to hear it. Links to relevant blog posts etc. are also welcome.

Thanks,
Victoria
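Not an official Kafka recipe, but one pattern sometimes used for the third point above is "key salting": traffic from known-huge users is spread over a small window of partitions, deliberately giving up per-user ordering. A dependency-free sketch follows; the class and method names are illustrative, and a plain String hash stands in for the murmur2 hash that Kafka's DefaultPartitioner applies to the serialized key bytes.

```java
import java.util.concurrent.ThreadLocalRandom;

public class SaltedKeys {
    private static int toPositive(int n) { return n & 0x7fffffff; }

    /** Plain keyed partitioning: all of a user's data lands in one partition. */
    public static int partitionFor(String userId, int numPartitions) {
        return toPositive(userId.hashCode()) % numPartitions;
    }

    /**
     * Salted partitioning for known-huge users: the user's traffic is spread
     * over `spread` consecutive partitions, so several consumers can share
     * the load. Ordering across those partitions is given up for that user.
     */
    public static int saltedPartitionFor(String userId, int numPartitions, int spread) {
        int base = toPositive(userId.hashCode()) % numPartitions;
        int salt = ThreadLocalRandom.current().nextInt(spread);
        return (base + salt) % numPartitions;
    }

    public static void main(String[] args) {
        System.out.println("plain : " + partitionFor("huge-user", 12));
        for (int i = 0; i < 5; i++)
            System.out.println("salted: " + saltedPartitionFor("huge-user", 12, 3));
    }
}
```

A common variant keeps a list of known-huge userIds and salts only those keys, so per-user ordering is preserved for everyone else.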
Re: Disk space - sharp increase in usage
Regarding the kafka-logs directory: it was an interesting lead, we checked, and it is the same. Regarding replication factor and retention: I am not looking for the current configuration, I am looking for metrics that can show me a change. Still looking for more ideas.

On 02/06/2020, 11:31, "Peter Bukowinski" wrote:

> On Jun 2, 2020, at 12:56 AM, Victoria Zuberman wrote:
>
> Hi,
>
> Background:
> Kafka cluster
> 7 brokers, with 4T disk each
> version 2.3 (recently upgraded from 0.1.0 via 1.0.1)
>
> Problem:
> Used disk space went from 40% to 80%. Looking for the root cause.
>
> Suspects:
>
> 1. Incoming traffic
> Ruled out; according to metrics there is no significant change in "bytes in" for topics in the cluster.
>
> 2. Upgrade
> The rise started on the day of the upgrade to 2.3, but we upgraded another cluster in the same way and don't see a similar issue there. Is there a known change or issue in 2.3 related to disk space usage?
>
> 3. Replication factor
> Is there a way to see whether the replication factor of any topic was changed recently? Didn't find it in metrics...

You can use the kafka-topics.sh script to check the replica count for all your topics. Upgrading would not have affected the replica count, though.

> 4. Retention
> Is there a way to see whether retention was changed recently? Didn't find it in metrics...

You can use kafka-topics.sh --zookeeper host:2181 --describe --topics-with-overrides to list any topics with non-default retention, but I'm guessing that's not it.

If your disk usage went from 40% to 80% on all brokers (effectively doubled), it could be that your kafka data log directory path(s) changed during the upgrade. As you upgraded and (re)started each broker, it would have left the existing data under the old path and created new topic partition directories and logs under the new path as it rejoined the cluster. Have you verified that your data log directory locations are the same as they used to be?

> Would appreciate any other ideas or investigation leads
>
> Thanks,
> Victoria
Disk space - sharp increase in usage
Hi,

Background:
Kafka cluster, 7 brokers with 4T disk each, version 2.3 (recently upgraded from 0.1.0 via 1.0.1).

Problem:
Used disk space went from 40% to 80%. Looking for the root cause.

Suspects:

1. Incoming traffic
Ruled out; according to metrics there is no significant change in "bytes in" for topics in the cluster.

2. Upgrade
The rise started on the day of the upgrade to 2.3, but we upgraded another cluster in the same way and don't see a similar issue there. Is there a known change or issue in 2.3 related to disk space usage?

3. Replication factor
Is there a way to see whether the replication factor of any topic was changed recently? Didn't find it in metrics...

4. Retention
Is there a way to see whether retention was changed recently? Didn't find it in metrics...

Would appreciate any other ideas or investigation leads.

Thanks,
Victoria
Re: Partitioning issue when a broker is going down
Regarding the number of partitions:

Still don't understand it fully. I revisited the Java default partitioner. I see that the available partitions are used only when no key is provided (i.e., when partitioning is effectively round-robin). When a key is provided, it uses the total number of partitions, regardless of availability. This makes sense, since assigning messages to a different partition when some partitions become unavailable would violate the ordering guarantee within a partition. Providing a custom partitioner that uses the available-partition count as the partition count sounds strange.

Still looking for insights on whether there is a flow where the number of partitions reported for a topic is less than what was configured when the topic was created.

On 17/05/2020, 22:18, "Peter Bukowinski" wrote:

> On May 17, 2020, at 11:45 AM, Victoria Zuberman wrote:
>
> Regarding acks=all:
> Interesting point. Will check acks and min.insync.replicas values.
> If I understand the root cause that you are suggesting correctly, given my RF=2 and 3 brokers in the cluster: with min.insync.replicas > 1 and acks=all, removing one broker means a partition that had a replica on the removed broker can't be written to until the replica is up on another broker?

That is correct. From a producer standpoint, the unaffected partitions will still be able to accept data, so depending on data rate and message size, producers may not be negatively affected by the missing broker.

> Regarding the number of partitions:
> The producer to this topic is using librdkafka with the partitioner_cb callback, which receives the number of partitions as partitions_cnt.

This makes sense; when called, you will get the partitions that are able to accept data. When a broker goes down and some topics become under-replicated, and your producer settings exclude the remaining replicas of those partitions as valid targets, then partitions_cnt will only enumerate the remaining partitions.

> Still trying to understand how the library obtains the partitions_cnt value.
> I wonder if the behavior is similar to the Java library, where the default partitioner uses the number of available partitions as the number of current partitions...

The logic is similar, as that is how Kafka is designed. The client will fetch the topic's metadata (including partitions available for writing) on connect, on error, and at the interval determined by topic.metadata.refresh.interval.ms, unless it is set to -1.

> On 17/05/2020, 20:59, "Peter Bukowinski" wrote:
>
> If your producer is set to use acks=all, then it won't be able to produce to the topic partitions that had replicas on the missing broker until the replacement broker has finished catching up to be included in the ISR.
>
> What method are you using that reports on the number of topic partitions? If some partitions go offline, the cluster still knows how many there are supposed to be, so I'm curious what is reporting 10 when there should be 15.
>
> -- Peter
>
>> On May 17, 2020, at 10:36 AM, Victoria Zuberman wrote:
>>
>> Hi,
>>
>> Kafka cluster with 3 brokers, version 1.0.1.
>> Topic with 15 partitions, replication factor 2. All replicas in sync.
>> Bringing down one of the brokers (ungracefully), then adding a broker in version 1.0.1.
>>
>> During this process, do we expect either of the following to happen:
>>
>> 1. Some of the partitions become unavailable for the producer to write to
>> 2. The cluster reports the number of partitions for the topic as 10 and not 15
>>
>> It seems like both issues take place in our case, for about a minute.
>>
>> We are trying to understand whether this is expected behavior and, if not, what could be causing it.
>>
>> Thanks,
>> Victoria
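The keyed-vs-unkeyed behavior discussed in this thread can be sketched without any Kafka dependency. This is an illustrative class, not the real DefaultPartitioner (which murmur2-hashes the serialized key bytes): keyed records hash over the total partition count even if some partitions currently have no leader, while null-key round-robin records are restricted to the available ones.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class DefaultPartitionerSketch {
    private final AtomicInteger counter = new AtomicInteger();

    public int partition(String key, int numPartitions, List<Integer> availablePartitions) {
        if (key != null) {
            // Keyed: hash over ALL partitions, ignoring availability, so that
            // per-key ordering is preserved; may target an offline partition.
            return (key.hashCode() & 0x7fffffff) % numPartitions;
        }
        if (!availablePartitions.isEmpty()) {
            // Null key: round-robin over partitions that currently have a leader.
            int next = counter.getAndIncrement() & 0x7fffffff;
            return availablePartitions.get(next % availablePartitions.size());
        }
        // No partition is available; fall back to round-robin over all of them.
        return (counter.getAndIncrement() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        DefaultPartitionerSketch p = new DefaultPartitionerSketch();
        List<Integer> available = List.of(0, 1, 2, 4); // partition 3's leader is down
        // Keyed sends use the total count (5 here), available or not:
        System.out.println("keyed  : " + p.partition("user-7", 5, available));
        // Unkeyed sends only ever land on an available partition:
        System.out.println("unkeyed: " + p.partition(null, 5, available));
    }
}
```

This mirrors why a custom partitioner that treats partitions_cnt (the available count) as the total count would silently remap keys while a broker is down.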
Re: Partitioning issue when a broker is going down
Regarding acks=all:

Interesting point. Will check acks and min.insync.replicas values. If I understand the root cause that you are suggesting correctly, given my RF=2 and 3 brokers in the cluster: with min.insync.replicas > 1 and acks=all, removing one broker means a partition that had a replica on the removed broker can't be written to until the replica is up on another broker?

Regarding the number of partitions:

The producer to this topic is using librdkafka with the partitioner_cb callback, which receives the number of partitions as partitions_cnt. Still trying to understand how the library obtains the partitions_cnt value. I wonder if the behavior is similar to the Java library, where the default partitioner uses the number of available partitions as the number of current partitions...

On 17/05/2020, 20:59, "Peter Bukowinski" wrote:

If your producer is set to use acks=all, then it won't be able to produce to the topic partitions that had replicas on the missing broker until the replacement broker has finished catching up to be included in the ISR.

What method are you using that reports on the number of topic partitions? If some partitions go offline, the cluster still knows how many there are supposed to be, so I'm curious what is reporting 10 when there should be 15.

-- Peter

> On May 17, 2020, at 10:36 AM, Victoria Zuberman wrote:
>
> Hi,
>
> Kafka cluster with 3 brokers, version 1.0.1.
> Topic with 15 partitions, replication factor 2. All replicas in sync.
> Bringing down one of the brokers (ungracefully), then adding a broker in version 1.0.1.
>
> During this process, do we expect either of the following to happen:
>
> 1. Some of the partitions become unavailable for the producer to write to
> 2. The cluster reports the number of partitions for the topic as 10 and not 15
>
> It seems like both issues take place in our case, for about a minute.
>
> We are trying to understand whether this is expected behavior and, if not, what could be causing it.
>
> Thanks,
> Victoria
Partitioning issue when a broker is going down
Hi,

Kafka cluster with 3 brokers, version 1.0.1.
Topic with 15 partitions, replication factor 2. All replicas in sync.
Bringing down one of the brokers (ungracefully), then adding a broker in version 1.0.1.

During this process, do we expect either of the following to happen:

1. Some of the partitions become unavailable for the producer to write to
2. The cluster reports the number of partitions for the topic as 10 and not 15

It seems like both issues take place in our case, for about a minute. We are trying to understand whether this is expected behavior and, if not, what could be causing it.

Thanks,
Victoria
Re: Using Kafka AdminUtils
Hi John,

Thanks a lot for the valuable information. I looked at KafkaAdminClient and I see that it offers a createTopics method that indeed seems suitable. I still have a couple of questions:

1. The documentation does not mention the expected behavior if the specified topic already exists. Will it fail? Will it throw a TopicExistsException? If the topic existed before createTopics was called, will it remain unchanged? The behavior is not easily deduced from the KafkaAdminClient code alone, though I did try.
2. I see that AdminClient has been supported for a while now, but the API is still marked as Evolving. From the version notes it seems that its basic functionality (like createTopics) remains pretty stable. Is it considered stable enough for production?

Thanks,
Victoria

On 16/02/2020, 20:15, "John Roesler" wrote:

Hi Victoria,

I've used the AdminClient for this kind of thing before. It's the official Java client for administrative actions like creating topics. You can create topics with any partition count, replication factor, or any other config.

I hope this helps,
John

On Sat, Feb 15, 2020, at 22:41, Victoria Zuberman wrote:

> Hi,
>
> I have an application based on Kafka Streams.
> It reads from a Kafka topic (I call this topic the "input topic").
> That topic has many partitions and their number varies based on the env in which the application is running.
> I don't want to create different input topics manually.
> Configuration of auto.create.topics.enable and num.partitions is not enough for me.
> The solution I am looking to implement is to check during application init whether the input topic exists and, if not, to create it with the relevant partition count and replication factor.
>
> I found the following example that uses kafka.admin.AdminUtils and it seems to be suitable:
> https://www.codota.com/code/java/methods/kafka.admin.AdminUtils/createTopic
>
> Please advise whether using AdminUtils is considered good practice.
> Is AdminUtils functionality considered stable and reliable?
> If there are other solutions, I would appreciate hearing about them.
>
> Thanks,
> Victoria
Using Kafka AdminUtils
Hi,

I have an application based on Kafka Streams. It reads from a Kafka topic (I call this topic the "input topic"). That topic has many partitions and their number varies based on the env in which the application is running. I don't want to create different input topics manually. Configuration of auto.create.topics.enable and num.partitions is not enough for me.

The solution I am looking to implement is to check during application init whether the input topic exists and, if not, to create it with the relevant partition count and replication factor.

I found the following example that uses kafka.admin.AdminUtils and it seems to be suitable:
https://www.codota.com/code/java/methods/kafka.admin.AdminUtils/createTopic

Please advise whether using AdminUtils is considered good practice. Is AdminUtils functionality considered stable and reliable? If there are other solutions, I would appreciate hearing about them.

Thanks,
Victoria