What metrics to use to diagnose performance degradation?

2021-02-23 Thread Victoria Zuberman
Hi,

I have a Kafka cluster with two topics in it, X and Y.
I have two unrelated applications; one reads from topic X and one from topic Y.
Those applications don’t share any resources except for the Kafka and K8s clusters.
In both applications the processing is very basic and no changes were made to it.


Over the last few days I see drops in the processing rate of both applications 
in Grafana.
The drops are perfectly correlated between the two applications.


Which metrics exposed by the Kafka brokers can help me understand whether the 
root cause is related to Kafka performance?

Thanks,
Victoria
---
NOTICE:
This email and all attachments are confidential, may be proprietary, and may be 
privileged or otherwise protected from disclosure. They are intended solely for 
the individual or entity to whom the email is addressed. However, mistakes 
sometimes happen in addressing emails. If you believe that you are not an 
intended recipient, please stop reading immediately. Do not copy, forward, or 
rely on the contents in any way. Notify the sender and/or Imperva, Inc. by 
telephone at +1 (650) 832-6006 and then delete or destroy any copy of this 
email and its attachments. The sender reserves and asserts all rights to 
confidentiality, as well as any privileges that may apply. Any disclosure, 
copying, distribution or action taken or omitted to be taken by an unintended 
recipient in reliance on this message is prohibited and may be unlawful.
Please consider the environment before printing this email.


Number of topics to which producer sends

2020-10-14 Thread Victoria Zuberman
Hi,

Background: Java, Kafka 2.1.0

I have an application that sends to two different topics.
In theory I can use the same producer for both.
Are there any advantages to having a producer per topic?
I looked for best practices on this matter but didn’t find any...

Thanks,
Victoria


Keys and partitions

2020-07-06 Thread Victoria Zuberman
Hi,

I have userId as a key.
Many users have moderate amounts of data, but some users have more and some 
have huge amounts of data.

I have been thinking about the following aspects of partitioning:

  1.  If two or more large users fall into the same partition, I might end up 
with large partition(s), unbalanced with the other partitions
  2.  If smaller users fall into the same partition as a huge user, the small 
users might get slower processing due to the amount of data the huge user has
  3.  If the order of the messages is not critical, I might want to allow 
several consumers to work on the data of the same huge user, and therefore 
would like to spread one userId over several partitions
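For point 3, one pattern that is sometimes used is to "salt" the key of the largest users so their records spread over several partitions. Below is a minimal sketch of the idea; the user IDs, salt count, and hash are illustrative (the Java client actually hashes keys with murmur2, while plain hashCode is used here for brevity):

```java
import java.util.HashSet;
import java.util.Set;

/** Sketch: spread one huge user's records over several partitions by salting the key. */
public class SaltedKeys {
    // Hash-based partition choice, a stand-in for the client's murmur2-based logic.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions; // mask sign bit, then mod
    }

    // For a huge user, append a rotating salt in [0, salts) to the key so its
    // records can land on up to `salts` different partitions instead of one.
    static int partitionForHugeUser(String userId, long recordSeq, int salts, int numPartitions) {
        long salt = recordSeq % salts;
        return partitionFor(userId + "#" + salt, numPartitions);
    }

    public static void main(String[] args) {
        int partitions = 12;
        // A normal user's records all land on one partition.
        System.out.println("user-42 -> partition " + partitionFor("user-42", partitions));
        // A huge user's records rotate over the partitions of its 4 salted keys.
        Set<Integer> used = new HashSet<>();
        for (long seq = 0; seq < 4; seq++) {
            used.add(partitionForHugeUser("huge-user", seq, 4, partitions));
        }
        System.out.println("huge-user spreads over partitions " + used);
    }
}
```

The trade-off is exactly the one point 3 accepts: per-user ordering is lost across the salted keys, and a consumer that needs all of one user's data has to read from several partitions.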

I have some ideas for partitioning schemes to solve those issues, but if you 
have something that worked well for you in production I would love to hear it.
Also, any links to relevant blog posts etc. are welcome.

Thanks,
Victoria


Re: Disk space - sharp increase in usage

2020-06-02 Thread Victoria Zuberman
Regarding the kafka-logs directory: it was an interesting lead, but we checked 
and it is the same.

Regarding replication factor and retention: I am not looking for the current 
values, I am looking for metrics that can show whether a change happened.

Still looking for more ideas

On 02/06/2020, 11:31, "Peter Bukowinski"  wrote:

CAUTION: This message was sent from outside the company. Do not click links 
or open attachments unless you recognize the sender and know the content is 
safe.


> On Jun 2, 2020, at 12:56 AM, Victoria Zuberman 
 wrote:
>
> Hi,
>
> Background:
> Kafka cluster
> 7 brokers, with 4T disk each
> version 2.3 (recently upgraded from 0.1.0 via 1.0.1)
>
> Problem:
> Used disk space went from 40% to 80%.
> Looking for root cause.
>
> Suspects:
>
>  1.  Incoming traffic
>
> Ruled out: according to metrics there is no significant change in “bytes in” 
for topics in the cluster
>
>  2.  Upgrade
>
> The rise started on the day of the upgrade to 2.3
>
> But we upgraded another cluster in the same way and we don’t see a similar 
issue there
>
> Is there a known change or issue in 2.3 related to disk space usage?
>
>  3.  Replication factor
>
> Is there a way to see whether the replication factor of any topic was changed 
recently? Didn’t find it in metrics...

You can use the kafka-topics.sh script to check the replica count for all 
your topics. Upgrading would not have affected the replica count, though.

>  4.  Retention
>
> Is there a way to see whether retention was changed recently? Didn’t find it 
in metrics...

You can use kafka-topics.sh --zookeeper host:2181 --describe 
--topics-with-overrides
to list any topics with non-default retention, but I’m guessing that’s not 
it.

If your disk usage went from 40% to 80% on all brokers (effectively doubled), 
it could be that your Kafka data log directory path(s) changed during the 
upgrade. As you upgraded each broker and restarted Kafka, it would have left 
the existing data under the old path and created new topic-partition 
directories and logs under the new path as it rejoined the cluster. Have you 
verified that your log directory locations are the same as they used to be?

> Would appreciate any other ideas or investigation leads
>
> Thanks,
> Victoria
>




Disk space - sharp increase in usage

2020-06-02 Thread Victoria Zuberman
Hi,

Background:
Kafka cluster
7 brokers, with 4T disk each
version 2.3 (recently upgraded from 0.1.0 via 1.0.1)

Problem:
Used disk space went from 40% to 80%.
Looking for root cause.

Suspects:

  1.  Incoming traffic

Ruled out: according to metrics there is no significant change in “bytes in” 
for topics in the cluster

  2.  Upgrade

The rise started on the day of the upgrade to 2.3

But we upgraded another cluster in the same way and we don’t see a similar 
issue there

Is there a known change or issue in 2.3 related to disk space usage?

  3.  Replication factor

Is there a way to see whether the replication factor of any topic was changed 
recently? Didn’t find it in metrics...

  4.  Retention

Is there a way to see whether retention was changed recently? Didn’t find it in 
metrics...
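As a sanity check for suspects 3 and 4, note that steady-state disk usage is roughly bytes-in rate × retention × replication factor, so doubling either retention or the replication factor would match a jump from 40% to 80% with unchanged “bytes in”. A back-of-the-envelope sketch (all numbers hypothetical):

```java
/** Sketch: rough steady-state disk footprint of a topic across the whole cluster. */
public class DiskEstimate {
    // bytesInPerSec:     producer traffic into the topic (before replication)
    // retentionSec:      retention.ms / 1000
    // replicationFactor: number of copies of each partition
    static long steadyStateBytes(long bytesInPerSec, long retentionSec, int replicationFactor) {
        return bytesInPerSec * retentionSec * replicationFactor;
    }

    public static void main(String[] args) {
        // Hypothetical: 10 MB/s in, 7 days retention, RF=2.
        long base = steadyStateBytes(10_000_000L, 7L * 24 * 3600, 2);
        System.out.println("baseline footprint: " + base + " bytes");
        // Doubling retention (or RF) doubles the footprint.
        long doubledRetention = steadyStateBytes(10_000_000L, 14L * 24 * 3600, 2);
        System.out.println("doubled retention is 2x baseline: " + (doubledRetention == 2 * base));
    }
}
```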

Would appreciate any other ideas or investigation leads

Thanks,
Victoria



Re: Partitioning issue when a broker is going down

2020-05-17 Thread Victoria Zuberman
Regarding the number of partitions:
Still don't understand it fully.
I revisited the Java default partitioner.
I see that available partitions are used only when a key is not provided 
(effectively round-robin).
When a key is provided, it uses the total number of partitions (regardless of 
availability).
This makes sense, since assigning messages to a different partition when some 
partitions become unavailable would violate the ordering guarantee within a 
partition.
Providing a custom partitioner that uses the available-partition count as its 
number of partitions sounds strange.
Still looking for insights on whether there is a flow where the number of 
partitions reported for a topic is less than what was configured when the 
topic was created.
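The keyed-vs-keyless split described above can be sketched as follows. This is a simplified model of a default-style partitioner's branch logic, not the actual client code (the real Java client hashes the serialized key with murmur2; here an integer key hash is passed in directly):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Simplified model of the keyed-vs-keyless branch in a default-style partitioner. */
public class DefaultStylePartitioner {
    private final AtomicInteger counter = new AtomicInteger();

    // keyHash:             stand-in for murmur2(serializedKey); null means "no key"
    // numPartitions:       TOTAL partitions of the topic, even if some are offline
    // availablePartitions: partitions that currently have a live leader
    int partition(Integer keyHash, int numPartitions, List<Integer> availablePartitions) {
        if (keyHash != null) {
            // Keyed: hash over ALL partitions, so a key always maps to the same
            // partition and ordering within a partition is preserved.
            return (keyHash & 0x7fffffff) % numPartitions;
        }
        // Keyless: round-robin, but only over partitions that can accept writes.
        int next = counter.getAndIncrement();
        if (!availablePartitions.isEmpty()) {
            return availablePartitions.get(next % availablePartitions.size());
        }
        return next % numPartitions; // no availability info: fall back to all
    }

    public static void main(String[] args) {
        DefaultStylePartitioner p = new DefaultStylePartitioner();
        // Keyed records stick to the same partition whether or not partition 1 is up.
        System.out.println(p.partition(12345, 15, List.of(0, 2, 3)));
        System.out.println(p.partition(12345, 15, List.of(0, 1, 2, 3)));
        // Keyless records rotate over available partitions only.
        System.out.println(p.partition(null, 15, List.of(0, 2)));
        System.out.println(p.partition(null, 15, List.of(0, 2)));
    }
}
```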

On 17/05/2020, 22:18, "Peter Bukowinski"  wrote:



> On May 17, 2020, at 11:45 AM, Victoria Zuberman 
 wrote:
>
>  Regarding acks=all:
> -
> Interesting point. Will check the acks and min.insync.replicas values.
> If I understand the root cause that you are suggesting correctly, given 
my RF=2 and 3 brokers in the cluster:
> with min.insync.replicas > 1 and acks=all, removing one broker ---> a 
partition that had a replica on the removed broker can't be written to until 
the replica is up on another broker?

That is correct. From a producer standpoint, the unaffected partitions will 
still be able to accept data, so depending on data rate and message size, 
producers may not be negatively affected by the missing broker.

> Regarding the number of partitions
> -
> The producer to this topic uses librdkafka with a partitioner_cb 
callback, which receives the number of partitions as partitions_cnt.

This makes sense: when called, you will get the partitions that are able to 
accept data. When a broker goes down and some topics become under-replicated, 
and your producer settings exclude the remaining replicas of those partitions 
as valid targets, partitions_cnt will only enumerate the remaining partitions.

> Still trying to understand how the library obtains the partitions_cnt value.
> I wonder if the behavior is similar to the Java library, where the default 
partitioner uses the number of available partitions as the number of current 
partitions...

The logic is similar, as that is how Kafka is designed. The client will 
fetch the topic’s metadata (including the partitions available for writing) on 
connect, on error, and at the interval determined by 
topic.metadata.refresh.interval.ms, unless it is set to -1.

> On 17/05/2020, 20:59, "Peter Bukowinski"  wrote:
>
>
>If your producer is set to use acks=all, then it won’t be able to 
produce to the topic partitions that had replicas on the missing broker 
until the replacement broker has finished catching up to be included in the ISR.
>
>What method are you using that reports on the number of topic 
partitions? If some partitions go offline, the cluster still knows how many 
there are supposed to be, so I’m curious what is reporting 10 when there should 
be 15.
>
>-- Peter
>
>> On May 17, 2020, at 10:36 AM, Victoria Zuberman 
 wrote:
>>
>> Hi,
>>
>> Kafka cluster with 3 brokers, version 1.0.1.
>> Topic with 15 partitions, replication factor 2. All replicas in sync.
>> Bringing down one of the brokers (ungracefully), then adding a broker in 
version 1.0.1
>>
>> During this process, should we expect either of the following to happen:
>>
>> 1.  Some of the partitions become unavailable for the producer to write to
>> 2.  The cluster reports the number of partitions of the topic as 10 and not 
15
>> It seems like both issues take place in our case, for about a minute.
>>
>> We are trying to understand whether it is an expected behavior and if 
not, what can be causing it.
>>
>> Thanks,
>> Victoria

Re: Partitioning issue when a broker is going down

2020-05-17 Thread Victoria Zuberman
Regarding acks=all:
-
Interesting point. Will check the acks and min.insync.replicas values.
If I understand the root cause that you are suggesting correctly, given my RF=2 
and 3 brokers in the cluster:
with min.insync.replicas > 1 and acks=all, removing one broker ---> a partition 
that had a replica on the removed broker can't be written to until the replica 
is up on another broker?

Regarding the number of partitions
-
The producer to this topic uses librdkafka with a partitioner_cb callback, 
which receives the number of partitions as partitions_cnt.
Still trying to understand how the library obtains the partitions_cnt value.
I wonder if the behavior is similar to the Java library, where the default 
partitioner uses the number of available partitions as the number of current 
partitions...

On 17/05/2020, 20:59, "Peter Bukowinski"  wrote:


If your producer is set to use acks=all, then it won’t be able to produce 
to the topic partitions that had replicas on the missing broker until the 
replacement broker has finished catching up to be included in the ISR.

What method are you using that reports on the number of topic partitions? 
If some partitions go offline, the cluster still knows how many there are 
supposed to be, so I’m curious what is reporting 10 when there should be 15.

-- Peter

> On May 17, 2020, at 10:36 AM, Victoria Zuberman 
 wrote:
>
> Hi,
>
> Kafka cluster with 3 brokers, version 1.0.1.
> Topic with 15 partitions, replication factor 2. All replicas in sync.
> Bringing down one of the brokers (ungracefully), then adding a broker in 
version 1.0.1
>
> During this process, should we expect either of the following to happen:
>
>  1.  Some of the partitions become unavailable for the producer to write to
>  2.  The cluster reports the number of partitions of the topic as 10 and not 
15
> It seems like both issues take place in our case, for about a minute.
>
> We are trying to understand whether it is an expected behavior and if 
not, what can be causing it.
>
> Thanks,
> Victoria




Partitioning issue when a broker is going down

2020-05-17 Thread Victoria Zuberman
Hi,

Kafka cluster with 3 brokers, version 1.0.1.
Topic with 15 partitions, replication factor 2. All replicas in sync.
Bringing down one of the brokers (ungracefully), then adding a broker in 
version 1.0.1

During this process, should we expect either of the following to happen:

  1.  Some of the partitions become unavailable for the producer to write to
  2.  The cluster reports the number of partitions of the topic as 10 and not 15
It seems like both issues take place in our case, for about a minute.

We are trying to understand whether it is an expected behavior and if not, what 
can be causing it.

Thanks,
Victoria


Re: Using Kafka AdminUtils

2020-02-16 Thread Victoria Zuberman
Hi, John

Thanks a lot for the valuable information.
I looked at KafkaAdminClient and I see that it offers a createTopics method 
that indeed seems suitable.

I still have a couple of questions:

1. In the documentation it is not mentioned what the expected behavior is if 
the specified topic already exists.
 Will it fail?
 Will it throw a TopicExistsException?
 If the topic existed before createTopics was called, will it remain unchanged?
 The behavior is not easily deduced from the KafkaAdminClient code alone; I 
did try.

2. I see that AdminClient has been supported for a while now, but the API is 
still marked as Evolving.
 From the version notes it seems that its basic functionality (like 
createTopics) remains pretty stable.
 Is it considered stable enough for production?

Thanks,
Victoria


On 16/02/2020, 20:15, "John Roesler"  wrote:

Hi Victoria,

I’ve used the AdminClient for this kind of thing before. It’s the official 
Java client for administrative actions like creating topics. You can create 
topics with any partition count, replication factor, or other config.

I hope this helps,
John

On Sat, Feb 15, 2020, at 22:41, Victoria Zuberman wrote:
> Hi,
>
> I have an application based on Kafka Streams.
> It reads from Kafka topic (I call this topic “input topic”).
> That topic has many partitions and their number varies based on the env
> in which application is running.
> I don’t want to create different input topics manually.
> Configuration of auto.create.topics.enable and num.partitions is not
> enough for me.
> The solution I am looking to implement is to check during application
> init whether the input topic exists and if not to create it with
> relevant partition number and replication factor.
>
> I found the following example that uses kafka.admin.AdminUtils and it
> seems to be suitable:
> 
https://www.codota.com/code/java/methods/kafka.admin.AdminUtils/createTopic
>
> Please advise whether using AdminUtils is considered a good practice.
> Is AdminUtils functionality considered stable and reliable?
> If there are other solutions, I would appreciate hearing about them.
>
> Thanks,
> Victoria
>




Using Kafka AdminUtils

2020-02-15 Thread Victoria Zuberman
Hi,

I have an application based on Kafka Streams.
It reads from a Kafka topic (I call this the “input topic”).
That topic has many partitions, and their number varies based on the 
environment in which the application is running.
I don’t want to create the different input topics manually.
Configuring auto.create.topics.enable and num.partitions is not enough for 
me.
The solution I am looking to implement is to check during application init 
whether the input topic exists and, if not, to create it with the relevant 
partition count and replication factor.

I found the following example that uses kafka.admin.AdminUtils and it seems to 
be suitable:
https://www.codota.com/code/java/methods/kafka.admin.AdminUtils/createTopic

Please advise whether using AdminUtils is considered a good practice.
Is AdminUtils functionality considered stable and reliable?
If there are other solutions, I would appreciate hearing about them.

Thanks,
Victoria
