[jira] [Comment Edited] (KAFKA-16344) Internal topic mm2-offset-syncsinternal created with single partition is putting more load on the broker

2024-03-25 Thread Greg Harris (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17830685#comment-17830685
 ] 

Greg Harris edited comment on KAFKA-16344 at 3/25/24 8:06 PM:
--

Hi [~janardhanag] Yes, you should consider increasing offset.lag.max. A value of 
0 is best for precise offset translation, but it can significantly increase the 
amount of traffic on the offset-syncs topic.

For topics without offset gaps (e.g. not compacted, not transactional), 
offset.lag.max currently behaves as a ratio between a mirrored topic's 
records/sec and that topic's additional load on the offset-syncs topic. There's 
a semaphore 
[https://github.com/apache/kafka/blob/f8ce7feebcede6c6da93c649413a64695bc8d4c1/connect/mirror/src/main/java/org/apache/kafka/connect/mirror/MirrorSourceTask.java#L95]
 that prevents this load from slowing down the mirror source connector, but it 
won't protect the brokers hosting the mm2 offset-syncs topic from being 
overloaded.
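
For illustration, here is a minimal sketch of that back-pressure pattern, 
simplified and with illustrative names rather than the exact Kafka source: at 
most a fixed number of offset-sync sends are allowed in flight, and further 
syncs are skipped instead of blocking mirroring.
{code:java}
import java.util.concurrent.Semaphore;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Simplified sketch of the back-pressure around offset-sync sends.
class OffsetSyncBackpressure {
    // Illustrative limit; the real constant lives in MirrorSourceTask.
    private static final int MAX_OUTSTANDING_OFFSET_SYNCS = 10;
    private final Semaphore outstandingOffsetSyncs = new Semaphore(MAX_OUTSTANDING_OFFSET_SYNCS);

    void maybeSendOffsetSync(KafkaProducer<byte[], byte[]> producer,
                             ProducerRecord<byte[], byte[]> sync) {
        // If too many syncs are already in flight, drop this one so that
        // mirroring itself is never slowed down by the offset-syncs producer.
        if (!outstandingOffsetSyncs.tryAcquire()) {
            return;
        }
        // The brokers hosting the offset-syncs topic still have to absorb
        // every send that does go through.
        producer.send(sync, (metadata, error) -> outstandingOffsetSyncs.release());
    }
}
{code}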

For example, if you were to mirror 100 records with offset.lag.max=0, up to 100 
records could be written to the offset-syncs topic, i.e. one sync per mirrored 
record. With offset.lag.max=100, the number of offset syncs could drop to 1, 
100x fewer. With offset.lag.max=1, the number of offset syncs could still be 
about 100, i.e. effectively no reduction. To answer your questions:

> 1. What is the proportion of mm2-offsetsyncsinternal topic writes to overall 
> MM2 traffic?

This is configurable with offset.lag.max for topics with contiguous offsets. 
For transactional topics it isn't configurable: each transaction may cause 1 or 
2 offset syncs, and that isn't limited by offset.lag.max, so reducing the 
throughput may require increasing the transaction intervals in your source 
applications.
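
As a rough back-of-the-envelope sketch (my own approximation, not a formula 
from the Kafka source), the proportion for a topic with contiguous offsets 
looks roughly like this:
{code:java}
// Rough approximation for topics with contiguous offsets: roughly one offset
// sync per offset.lag.max mirrored records, and one per record when it is 0.
// The real rate also depends on the other sync conditions and the semaphore.
final class OffsetSyncEstimate {
    static long approxOffsetSyncs(long mirroredRecords, long offsetLagMax) {
        return mirroredRecords / Math.max(offsetLagMax, 1);
    }

    public static void main(String[] args) {
        System.out.println(approxOffsetSyncs(100, 1));   // ~100 syncs, no reduction
        System.out.println(approxOffsetSyncs(100, 100)); // ~1 sync, ~100x fewer
    }
}
{code}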

> 2. Is there a way to tune the internal topic writes with increased MM2 
> traffic?

As the throughput of your MM2 instance increases, you should increase 
offset.lag.max to keep the load on the offset-syncs topic roughly fixed.

> 3. I want to understand the expected mm2-offsetsyncsinternal write TPS for a 
> given amount of MM2 traffic. Are there any tunable parameters to reduce these 
> writes, and what are the consequences of tuning if any?

There isn't a closed-form formula, because a sync may be initiated by multiple 
conditions 
[https://github.com/apache/kafka/blob/f8ce7feebcede6c6da93c649413a64695bc8d4c1/connect/mirror/src/main/java/org/apache/kafka/connect/mirror/MirrorSourceTask.java#L360-L363]
 and suppressed by the semaphore when producer latency becomes significant.
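
Paraphrasing the linked conditions (this is my own illustrative restatement, 
with made-up names, not the exact source), a sync is considered when roughly 
any of these hold:
{code:java}
// Illustrative paraphrase of the kinds of conditions that can trigger a sync
// for a partition; names are invented for readability, see the linked source
// for the real logic.
static boolean shouldEmitOffsetSync(boolean neverSyncedThisPartition,
                                    long translationDrift,   // how far translation trails the latest record
                                    long maxOffsetLag,       // offset.lag.max
                                    boolean upstreamOffsetGap,
                                    boolean downstreamRewind) {
    return neverSyncedThisPartition
        || translationDrift >= maxOffsetLag
        || upstreamOffsetGap
        || downstreamRewind;
}
{code}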

The consequence of decreasing throughput on this topic is less precise offset 
translation in the MirrorCheckpointConnector, and increased redelivery when 
failing over from source to target consumer offsets. If you aren't using this 
connector, then the offset-syncs topic isn't read from at all. You currently 
can't turn the topic off, but there is a KIP open for that: 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-1031%3A+Control+offset+translation+in+MirrorSourceConnector]
 . If, for example, you were satisfied with 30 seconds of redelivery for a topic 
with 1000 records/second, you could set offset.lag.max to 30*1000 = 30000.
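
The same back-of-the-envelope reasoning as above, turned around to pick a 
starting value (again my own sketch, assuming a steady record rate and 
contiguous offsets):
{code:java}
// Acceptable redelivery window (seconds) times the topic's record rate gives
// a starting point for offset.lag.max; tune from there based on observed load.
static long offsetLagMaxFor(long recordsPerSecond, long acceptableRedeliverySeconds) {
    return recordsPerSecond * acceptableRedeliverySeconds; // e.g. 1000 * 30 = 30000
}
{code}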

> 4. In a larger Kafka cluster, if a single broker is overloaded with 
> mm2-offsetsyncsinternal traffic, can it lead to a broker crash? Are there any 
> guidelines available for such scenarios? Currently, we have 6 brokers with 
> 24K MM2 traffic, and internal writes are at 10K, resulting in a 20% CPU 
> increase on one broker.

I'm not familiar with the failure conditions for overloaded brokers. If you are 
seeing failures, then I would definitely recommend trying to tune MM2, or trying 
to use quotas [https://kafka.apache.org/documentation/#design_quotas] to get 
this topic-partition under control.
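
Note that Kafka quotas apply per user/client rather than per topic, so this 
would mean throttling the producer MM2 uses for offset syncs. A minimal sketch 
with the Admin API follows; the bootstrap server, client.id, and byte rate are 
placeholders you'd have to adapt (the client.id your MM2 offset-syncs producer 
actually uses depends on your setup):
{code:java}
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.quota.ClientQuotaAlteration;
import org.apache.kafka.common.quota.ClientQuotaEntity;

public class ApplyMm2ProducerQuota {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder: the cluster hosting the offset-syncs topic.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "target-broker:9092");

        try (Admin admin = Admin.create(props)) {
            // Placeholder client.id: whatever client.id your MM2 offset-syncs producer uses.
            ClientQuotaEntity entity = new ClientQuotaEntity(
                    Map.of(ClientQuotaEntity.CLIENT_ID, "mm2-offset-syncs-producer"));
            // Cap that client's produce rate, e.g. to 1 MiB/s.
            ClientQuotaAlteration alteration = new ClientQuotaAlteration(
                    entity,
                    List.of(new ClientQuotaAlteration.Op("producer_byte_rate", 1_048_576.0)));
            admin.alterClientQuotas(List.of(alteration)).all().get();
        }
    }
}
{code}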

> 5. Are there any limitations on Kafka broker scaling? I mean, how far can a 
> single Kafka cluster be expanded, given the mm2-offsetsyncsinternal topic?

Because the offset syncs topic only supports a single partition, adding brokers 
will not solve this problem, other than having a broker dedicated to only 
serving this one topic. At some point, you would need two separate offset-syncs 
topics and separate MirrorSourceConnectors to shard the load.

> 6. How can the system be dimensioned to handle MM2 internal topic writes 
> effectively? Are there any recommended figures available? For instance, for a 
> given amount of traffic (X), what percentage increase in CPU (Y) should each 
> broker have to handle MM2 internal topic writes? Note that in other pods, 
> this resource may not be utilized.

The Kafka project doesn't publish "expected" dimensions, other than the default 
values of configurations. We don't publish performance analysis on any 
particular hardware setups, because the diversity of hardware running Kafka is 
just too vast to capture properly.



[jira] [Comment Edited] (KAFKA-16344) Internal topic mm2-offset-syncsinternal created with single partition is putting more load on the broker

2024-03-07 Thread Janardhana Gopalachar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17824385#comment-17824385
 ] 

Janardhana Gopalachar edited comment on KAFKA-16344 at 3/7/24 2:27 PM:
---

Hi [~gharris1727],

Currently in our MirrorMaker spec we have offset.lag.max: 0. Should it be set 
to a maximum of 100, and will this reduce the throughput on the source or 
target topics?

Is setting offset.lag.max to 0 contributing to the CPU load?

What value could be set if it needs to be more than 100?
{code:java}
sourceConnector:
        maxTasks: 12
        settings:
          consumer.max.poll.records: 2000
          consumer.ssl.endpoint.identification.algorithm: ""
          offset-syncs.topic.location: target
          offset-syncs.topic.replication.factor: 3
          offset.lag.max: 0
          producer.ssl.endpoint.identification.algorithm: ""
          refresh.topics.interval.seconds: 300
          replication.factor: 3
          replication.policy.class: org.apache.kafka.connect.mirror.IdentityReplicationPolicy
          replication.policy.separator: ""
          ssl.endpoint.identification.algorithm: ""
          sync.topic.acls.enabled: "false"{code}
Along with this, we have the questions below:

1. What is the proportion of mm2-offsetsyncsinternal topic writes to overall 
MM2 traffic?

2. Is there a way to tune the internal topic writes with increased MM2 traffic?

3. I want to understand the expected mm2-offsetsyncsinternal write TPS for a 
given amount of MM2 traffic. Are there any tunable parameters to reduce these 
writes, and what are the consequences of tuning if any?

4. In a larger Kafka cluster, if a single broker is overloaded with 
mm2-offsetsyncsinternal traffic, can it lead to a broker crash? Are there any 
guidelines available for such scenarios? Currently, we have 6 brokers with 24K 
MM2 traffic, and internal writes are at 10K, resulting in a 20% CPU increase on 
one broker.

5. Are there any limitations on Kafka broker scaling? I mean, how far can a 
single Kafka cluster be expanded, given the mm2-offsetsyncsinternal topic?

6. How can the system be dimensioned to handle MM2 internal topic writes 
effectively? Are there any recommended figures available? For instance, for a 
given amount of traffic (X), what percentage increase in CPU (Y) should each 
broker have to handle MM2 internal topic writes? Note that in other pods, this 
resource may not be utilized.

Regards

Jana




[jira] [Comment Edited] (KAFKA-16344) Internal topic mm2-offset-syncsinternal created with single partition is putting more load on the broker

2024-03-06 Thread Janardhana Gopalachar (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17823930#comment-17823930
 ] 

Janardhana Gopalachar edited comment on KAFKA-16344 at 3/6/24 12:19 PM:


Hi [~gharris1727],

While processing 24k events/second, the MM2 internal topic receives about 10k 
events/sec. Are there any parameters that can be tuned so that the CPU load on 
the broker that is leader for the internal topic's partition can be distributed?

Regards

Jana



> Internal topic mm2-offset-syncsinternal created with single 
> partition is putting more load on the broker
> -
>
> Key: KAFKA-16344
> URL: https://issues.apache.org/jira/browse/KAFKA-16344
> Project: Kafka
>  Issue Type: Bug
>  Components: connect
>  Affects Versions: 3.5.1
>  Reporter: Janardhana Gopalachar
>  Priority: Major
>
> We are using Kafka version 3.5.1. We see that the internal topic created by 
> MirrorMaker, mm2-offset-syncsinternal, is created with a single partition, 
> due to which the CPU load on the broker that is leader for this partition is 
> higher than on the other brokers. Can multiple partitions be created for the 
> topic so that the CPU load gets distributed?
>  
> Topic: mm2-offset-syncs.cluster-a.internal    TopicId: XRvTDbogT8ytNhqX2YTyrA    PartitionCount: 1    ReplicationFactor: 3    Configs: min.insync.replicas=2,cleanup.policy=compact,message.format.version=3.0-IV1
>     Topic: mm2-offset-syncs.cluster-a.internal    Partition: 0    Leader: 2    Replicas: 2,1,0    Isr: 2,1,0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)