[jira] [Commented] (KAFKA-16582) Feature Request: Introduce max.record.size Configuration Parameter for Producers

2024-07-23 Thread Ramiz Mehran (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17868018#comment-17868018
 ] 

Ramiz Mehran commented on KAFKA-16582:
--

Yes [~xiaodoujiang].

The reason is basically large payloads. We want a limit that applies before 
compression, so that we can avoid building such large batches in the first place.
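
Until something like {{max.record.size}} exists, this kind of pre-compression 
guard can only live in application code. A minimal sketch (the 5 MB limit and 
the helper class are illustrative, not an existing Kafka API):

{code:java}
import java.nio.charset.StandardCharsets;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.RecordTooLargeException;

public class PreCompressionGuard {

    // Illustrative limit on the serialized, uncompressed value: 5 MB.
    private static final int MAX_UNCOMPRESSED_BYTES = 5 * 1024 * 1024;

    // Rejects a record before it reaches the producer's accumulator,
    // i.e. before any compression or batching happens.
    public static void sendWithLimit(KafkaProducer<String, String> producer,
                                     String topic, String key, String value) {
        int uncompressedSize = value.getBytes(StandardCharsets.UTF_8).length;
        if (uncompressedSize > MAX_UNCOMPRESSED_BYTES) {
            throw new RecordTooLargeException("Uncompressed record of "
                    + uncompressedSize + " bytes exceeds the client-side limit of "
                    + MAX_UNCOMPRESSED_BYTES + " bytes");
        }
        producer.send(new ProducerRecord<>(topic, key, value));
    }
}
{code}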

> Feature Request: Introduce max.record.size Configuration Parameter for 
> Producers
> 
>
> Key: KAFKA-16582
> URL: https://issues.apache.org/jira/browse/KAFKA-16582
> Project: Kafka
>  Issue Type: New Feature
>  Components: producer 
>Affects Versions: 3.6.2
>Reporter: Ramiz Mehran
>Priority: Major
>
> {*}Summary{*}:
> Currently, Kafka producers have a {{max.request.size}} configuration that 
> limits the size of the request sent to Kafka brokers, which includes both 
> compressed and uncompressed data sizes. However, it is also the maximum size 
> of an individual record before it is compressed. This can lead to 
> inefficiencies and unexpected behaviours, particularly when records are 
> significantly large before compression but fit multiple times into the 
> {{max.request.size}} after compression.
> {*}Problem{*}:
> During spikes in data transmission, especially with large records, batches 
> that fit within {{max.request.size}} after compression can still cause 
> increased latency and a processing backlog because of the sheer volume of 
> uncompressed data they contain. The problem is particularly pronounced with 
> highly efficient compression algorithms like zstd, where the compressed size 
> allows large batches that are inefficient to process.
> {*}Proposed Solution{*}:
> Introduce a new producer configuration parameter: {{max.record.size}}. This 
> parameter would allow administrators to define the maximum size of a record 
> before it is compressed, making system behaviour more predictable by 
> separating the uncompressed record size limit from the compressed request 
> size limit.
> {*}Benefits{*}:
>  # {*}Predictability{*}: Producers can reject records that exceed the 
> {{max.record.size}} before spending resources on compression.
>  # {*}Efficiency{*}: Helps in maintaining efficient batch sizes and system 
> throughput, especially under high load conditions.
>  # {*}System Stability{*}: Avoids the potential for large batch processing 
> which can affect latency and throughput negatively.
> {*}Example{*}: Consider a scenario where the producer sends records up to 20 
> MB in size which, when compressed, fit multiple times into a batch under the 
> 25 MB {{max.request.size}}. Such batches can be problematic to process 
> efficiently, even though they meet the current maximum request size 
> constraint. With {{max.record.size}} set to, say, 5 MB, each record's 
> uncompressed size would be capped separately, while {{max.request.size}} 
> would only limit the size of the compressed request, preventing very large 
> requests and the latency spikes they cause.
> {*}Steps to Reproduce{*}:
>  # Configure a Kafka producer with {{max.request.size}} set to 25 MB.
>  # Send multiple uncompressed records close to 20 MB that compress to less 
> than 25 MB.
>  # Observe the impact on Kafka broker performance and client-side latency.
> {*}Expected Behavior{*}: The producer should allow administrators to set both 
> a pre-compression record size limit and a post-compression total request size 
> limit.
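
For illustration, this is roughly how the proposal would look in producer 
configuration. Note that {{max.record.size}} does not exist in Kafka today; the 
property name and the 5 MB / 25 MB values are only those used in the example 
above:

{code:java}
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProposedConfigSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "zstd");

        // Existing limit: caps the size of a (compressed) request sent to the broker.
        props.put(ProducerConfig.MAX_REQUEST_SIZE_CONFIG, 25 * 1024 * 1024);

        // Proposed limit (not part of Kafka today): caps a single record's
        // serialized size before compression, e.g. 5 MB.
        props.put("max.record.size", 5 * 1024 * 1024);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Under the proposal, any record whose uncompressed size exceeds
            // max.record.size would be rejected up front, before compression.
        }
    }
}
{code}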



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16582) Feature Request: Introduce max.record.size Configuration Parameter for Producers

2024-04-17 Thread Ramiz Mehran (Jira)
Ramiz Mehran created KAFKA-16582:


 Summary: Feature Request: Introduce max.record.size Configuration 
Parameter for Producers
 Key: KAFKA-16582
 URL: https://issues.apache.org/jira/browse/KAFKA-16582
 Project: Kafka
  Issue Type: New Feature
  Components: producer 
Affects Versions: 3.6.2
Reporter: Ramiz Mehran


{*}Summary{*}:

Currently, Kafka producers have a {{max.request.size}} configuration that 
limits the size of the request sent to Kafka brokers, which includes both 
compressed and uncompressed data sizes. However, it is also the maximum size of 
an individual record before it is compressed. This can lead to inefficiencies 
and unexpected behaviours, particularly when records are significantly large 
before compression but fit multiple times into the {{max.request.size}} after 
compression.

{*}Problem{*}:

During spikes in data transmission, especially with large records, batches that 
fit within {{max.request.size}} after compression can still cause increased 
latency and a processing backlog because of the sheer volume of uncompressed 
data they contain. The problem is particularly pronounced with highly efficient 
compression algorithms like zstd, where the compressed size allows large 
batches that are inefficient to process.

{*}Proposed Solution{*}:

Introduce a new producer configuration parameter: {{max.record.size}}. This 
parameter would allow administrators to define the maximum size of a record 
before it is compressed, making system behaviour more predictable by separating 
the uncompressed record size limit from the compressed request size limit.

{*}Benefits{*}:
 # {*}Predictability{*}: Producers can reject records that exceed the 
{{max.record.size}} before spending resources on compression.
 # {*}Efficiency{*}: Helps in maintaining efficient batch sizes and system 
throughput, especially under high load conditions.
 # {*}System Stability{*}: Avoids the potential for large batch processing 
which can affect latency and throughput negatively.

{*}Example{*}: Consider a scenario where the producer sends records up to 20 MB 
in size which, when compressed, fit multiple times into a batch under the 25 MB 
{{max.request.size}}. Such batches can be problematic to process efficiently, 
even though they meet the current maximum request size constraint. With 
{{max.record.size}} set to, say, 5 MB, each record's uncompressed size would be 
capped separately, while {{max.request.size}} would only limit the size of the 
compressed request, preventing very large requests and the latency spikes they 
cause.

{*}Steps to Reproduce{*}:
 # Configure a Kafka producer with {{max.request.size}} set to 25 MB.
 # Send multiple uncompressed records close to 20 MB that compress to less than 
25 MB.
 # Observe the impact on Kafka broker performance and client-side latency.

{*}Expected Behavior{*}: The producer should allow administrators to set both a 
pre-compression record size limit and a post-compression total request size 
limit.
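
A rough sketch of the reproduce steps above (the topic name and bootstrap 
address are placeholders; broker- and topic-level limits such as 
{{message.max.bytes}} / {{max.message.bytes}} would also need to be raised for 
the requests to be accepted):

{code:java}
import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class LargeBatchRepro {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "zstd");
        props.put(ProducerConfig.MAX_REQUEST_SIZE_CONFIG, 25 * 1024 * 1024); // 25 MB
        props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 64L * 1024 * 1024);   // room for several 20 MB records
        props.put(ProducerConfig.LINGER_MS_CONFIG, 100);                     // encourage batching

        // A ~20 MB value of repeated characters compresses to a tiny fraction of
        // its size under zstd, so several such records can share one request.
        char[] filler = new char[20 * 1024 * 1024];
        Arrays.fill(filler, 'x');
        String value = new String(filler);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 10; i++) {
                producer.send(new ProducerRecord<>("large-record-test", Integer.toString(i), value));
            }
            producer.flush(); // then observe broker-side request handling and client latency
        }
    }
}
{code}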



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-10034) Clarify Usage of "batch.size" and "max.request.size" Producer Configs

2024-04-17 Thread Ramiz Mehran (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838450#comment-17838450
 ] 

Ramiz Mehran commented on KAFKA-10034:
--

"Firstly, this configuration is a cap on the maximum uncompressed record batch 
size." should be changed to "Firstly, this configuration is a cap on the 
maximum uncompressed record size."

As the cap is on uncompressed single record's max size.
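
To make the point concrete, a small sketch of the current behaviour (topic name 
and bootstrap address are placeholders): the size check runs on the serialized, 
uncompressed record, so a 2 MB value is rejected even though zstd would 
compress it to well under the 1 MB {{max.request.size}}.

{code:java}
import java.util.Properties;
import java.util.concurrent.ExecutionException;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class PerRecordCapDemo {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.MAX_REQUEST_SIZE_CONFIG, 1024 * 1024); // 1 MB
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "zstd");      // does not help: check is pre-compression

        String value = "x".repeat(2 * 1024 * 1024); // 2 MB, highly compressible

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("some-topic", value)).get();
        } catch (ExecutionException e) {
            // Fails with RecordTooLargeException: the uncompressed record
            // exceeds max.request.size, regardless of how well it compresses.
            System.out.println("Rejected: " + e.getCause().getMessage());
        }
    }
}
{code}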

> Clarify Usage of "batch.size" and "max.request.size" Producer Configs
> -
>
> Key: KAFKA-10034
> URL: https://issues.apache.org/jira/browse/KAFKA-10034
> Project: Kafka
>  Issue Type: Improvement
>  Components: docs, producer 
>Reporter: Mark Cox
>Assignee: Badai Aqrandista
>Priority: Minor
>
> The documentation around the producer configurations "batch.size" and 
> "max.request.size", and how they relate to one another, can be confusing.
> In reality, the "max.request.size" is a hard limit on each individual record, 
> but the documentation makes it seem this is the maximum size of a request 
> sent to Kafka.  If there is a situation where "batch.size" is set greater 
> than "max.request.size" (and each individual record is smaller than 
> "max.request.size") you could end up with larger requests than expected sent 
> to Kafka.
> There are a few things that could be considered to make this clearer:
>  # Improve the documentation to clarify the two producer configurations and 
> how they relate to each other
>  # Provide a producer check, and possibly a warning, if "batch.size" is found 
> to be greater than "max.request.size"
>  # The producer could take the _minimum_ of "batch.size" or "max.request.size"
>  
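
Item 2 in the list above could be approximated today with an application-level 
startup check. A minimal sketch (the class name and property values are 
illustrative):

{code:java}
import java.util.Properties;

import org.apache.kafka.clients.producer.ProducerConfig;

public class ProducerConfigSanityCheck {

    // Warns when batch.size is configured larger than max.request.size.
    public static void warnOnInconsistentSizes(Properties props) {
        int batchSize = Integer.parseInt(
                props.getProperty(ProducerConfig.BATCH_SIZE_CONFIG, "16384"));         // default 16 KB
        int maxRequestSize = Integer.parseInt(
                props.getProperty(ProducerConfig.MAX_REQUEST_SIZE_CONFIG, "1048576")); // default 1 MB
        if (batchSize > maxRequestSize) {
            System.err.printf(
                    "WARNING: batch.size (%d) exceeds max.request.size (%d); "
                    + "requests may end up larger than expected.%n",
                    batchSize, maxRequestSize);
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, "2097152");       // 2 MB batches
        props.put(ProducerConfig.MAX_REQUEST_SIZE_CONFIG, "1048576"); // 1 MB requests
        warnOnInconsistentSizes(props);
    }
}
{code}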



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] (KAFKA-10034) Clarify Usage of "batch.size" and "max.request.size" Producer Configs

2024-04-17 Thread Ramiz Mehran (Jira)


[ https://issues.apache.org/jira/browse/KAFKA-10034 ]


Ramiz Mehran deleted comment on KAFKA-10034:
--

was (Author: JIRAUSER290918):
"Firstly, this configuration is a cap on the maximum uncompressed record batch 
size." should be changed to "Firstly, this configuration is a cap on the 
maximum uncompressed record size."

> Clarify Usage of "batch.size" and "max.request.size" Producer Configs
> -
>
> Key: KAFKA-10034
> URL: https://issues.apache.org/jira/browse/KAFKA-10034
> Project: Kafka
>  Issue Type: Improvement
>  Components: docs, producer 
>Reporter: Mark Cox
>Assignee: Badai Aqrandista
>Priority: Minor
>
> The documentation around the producer configurations "batch.size" and 
> "max.request.size", and how they relate to one another, can be confusing.
> In reality, the "max.request.size" is a hard limit on each individual record, 
> but the documentation makes it seem this is the maximum size of a request 
> sent to Kafka.  If there is a situation where "batch.size" is set greater 
> than "max.request.size" (and each individual record is smaller than 
> "max.request.size") you could end up with larger requests than expected sent 
> to Kafka.
> There are a few things that could be considered to make this clearer:
>  # Improve the documentation to clarify the two producer configurations and 
> how they relate to each other
>  # Provide a producer check, and possibly a warning, if "batch.size" is found 
> to be greater than "max.request.size"
>  # The producer could take the _minimum_ of "batch.size" or "max.request.size"
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-10034) Clarify Usage of "batch.size" and "max.request.size" Producer Configs

2024-04-17 Thread Ramiz Mehran (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838447#comment-17838447
 ] 

Ramiz Mehran commented on KAFKA-10034:
--

"Firstly, this configuration is a cap on the maximum uncompressed record batch 
size." should be changed to "Firstly, this configuration is a cap on the 
maximum uncompressed record size."

> Clarify Usage of "batch.size" and "max.request.size" Producer Configs
> -
>
> Key: KAFKA-10034
> URL: https://issues.apache.org/jira/browse/KAFKA-10034
> Project: Kafka
>  Issue Type: Improvement
>  Components: docs, producer 
>Reporter: Mark Cox
>Assignee: Badai Aqrandista
>Priority: Minor
>
> The documentation around the producer configurations "batch.size" and 
> "max.request.size", and how they relate to one another, can be confusing.
> In reality, the "max.request.size" is a hard limit on each individual record, 
> but the documentation makes it seem this is the maximum size of a request 
> sent to Kafka.  If there is a situation where "batch.size" is set greater 
> than "max.request.size" (and each individual record is smaller than 
> "max.request.size") you could end up with larger requests than expected sent 
> to Kafka.
> There are a few things that could be considered to make this clearer:
>  # Improve the documentation to clarify the two producer configurations and 
> how they relate to each other
>  # Provide a producer check, and possibly a warning, if "batch.size" is found 
> to be greater than "max.request.size"
>  # The producer could take the _minimum_ of "batch.size" or "max.request.size"
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14282) RecordCollector throws exception on message processing

2022-11-07 Thread Ramiz Mehran (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17630184#comment-17630184
 ] 

Ramiz Mehran commented on KAFKA-14282:
--

I am also facing the same issue. Adding logs below:



  07-11-2022 19:00:31.602 [,] ERROR 
org.apache.kafka.streams.processor.internals.RecordCollectorImpl: -  
stream-thread [-StreamThread-8] task [0_136] Unable to records bytes 
produced to topic  by sink node KSTREAM-SINK-09 as the node 
is not recognized.

Known sink nodes are [].


> RecordCollector throws exception on message processing
> --
>
> Key: KAFKA-14282
> URL: https://issues.apache.org/jira/browse/KAFKA-14282
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.3.1
>Reporter: Sebastian Bruckner
>Priority: Major
>
> Since we upgraded from version 3.2.0 to 3.3.1 we have seen a lot of exceptions 
> thrown by the RecordCollector
> {code:java}
> stream-thread [XXX-StreamThread-1] task [2_8] Unable to records bytes 
> produced to topic XXX by sink node KSTREAM-SINK-33 as the node is not 
> recognized.
> Known sink nodes are [KSTREAM-SINK-57, 
> XXX-joined-fk-subscription-registration-sink]. 
> {code}
> Restarting the application did not help.
> I think this is related to 
> [KIP-846|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211886093]
>  which was introduced in 3.3.0 with the ticket 
> https://issues.apache.org/jira/browse/KAFKA-13945 .



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-7077) KIP-318: Make Kafka Connect Source idempotent

2021-01-21 Thread Mehran (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-7077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17269901#comment-17269901
 ] 

Mehran commented on KAFKA-7077:
---

Any update?

> KIP-318: Make Kafka Connect Source idempotent
> -
>
> Key: KAFKA-7077
> URL: https://issues.apache.org/jira/browse/KAFKA-7077
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 2.0.0
>Reporter: Stephane Maarek
>Assignee: Stephane Maarek
>Priority: Major
>
> KIP Link: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-318%3A+Make+Kafka+Connect+Source+idempotent



--
This message was sent by Atlassian Jira
(v8.3.4#803005)