[ 
https://issues.apache.org/jira/browse/KAFKA-19462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen updated KAFKA-19462:
------------------------------
    Description: 
Currently, in the local fetch case, we calculate the remaining bytes to be
fetched for each partition from the "fetch.max.bytes" and
"max.partition.fetch.bytes" configs. For example:
 # Config:
max.partition.fetch.bytes = 1MB
fetch.max.bytes = 1.5MB
 # Topic foo has 2 partitions.
 # Consumer fetches data from topic foo.
 # Fetch from foo-0 first: it returns 1MB of data (max.partition.fetch.bytes),
so 0.5MB remains available to be fetched.
 # Fetch at most 0.5MB from foo-1.
 # Total returned: 1.5MB of records, which stays within fetch.max.bytes.
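
The local-only budgeting in the steps above can be modelled roughly as follows. This is a simplified sketch, not Kafka's ReplicaManager code: the class `LocalFetchBudget`, its `plan` and `total` methods, and the per-partition "available bytes" inputs are hypothetical, and 1MB is assumed to mean 1,048,576 bytes. The model also leaves out the documented exception that lets an oversized first record batch through so the consumer can make progress.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the local fetch budget; names are illustrative, not Kafka APIs.
final class LocalFetchBudget {
    static final long MB = 1024L * 1024L; // assumption: 1MB = 2^20 bytes

    /**
     * Reads partitions in insertion order. Each partition may return at most
     * min(maxPartitionBytes, remaining fetch budget) bytes, and whatever it
     * returns is subtracted from the shared budget.
     */
    static Map<String, Long> plan(Map<String, Long> availableBytes,
                                  long maxPartitionBytes,
                                  long fetchMaxBytes) {
        Map<String, Long> read = new LinkedHashMap<>();
        long remaining = fetchMaxBytes;
        for (Map.Entry<String, Long> e : availableBytes.entrySet()) {
            long limit = Math.min(maxPartitionBytes, remaining);
            long bytes = Math.min(e.getValue(), limit);
            read.put(e.getKey(), bytes);
            remaining -= bytes;
        }
        return read;
    }

    static long total(Map<String, Long> read) {
        return read.values().stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        Map<String, Long> available = new LinkedHashMap<>();
        available.put("foo-0", 2 * MB); // more data than one partition may return
        available.put("foo-1", 2 * MB);
        Map<String, Long> read = plan(available, MB, MB + MB / 2);
        System.out.println(read);                              // {foo-0=1048576, foo-1=524288}
        System.out.println("total = " + total(read) + " bytes"); // total = 1572864 bytes
    }
}
```

Because every partition's bytes are charged against `remaining` as soon as they are read, the sum can never exceed fetch.max.bytes in this model.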

However, in the remote + local fetch case, we don't know how much data a
tiered partition will return until we query the remote log metadata manager or
other resources, so we have no value to report to replicaManager beforehand.
Currently, we treat it as 0 bytes read, which is why the final returned data
can exceed the "fetch.max.bytes" value.

For example:
 # Config:
max.partition.fetch.bytes = 1MB
fetch.max.bytes = 1.5MB
 # Topic foo has 2 partitions, and topic boo, which has tiered storage
enabled, has 1 partition.
 # Consumer fetches data from topics foo and boo.
 # Fetch from boo-0: because we don't know how much data we can get, count it
as 0 bytes and send it to the remote async read.
 # Fetch from foo-0: it returns 1MB of data, so 0.5MB remains available to be
fetched.
 # Fetch at most 0.5MB from foo-1.
 # The remote async read for boo-0 returns 1MB of data
(max.partition.fetch.bytes).
 # Total returned: 2.5MB of records, which exceeds `fetch.max.bytes = 1.5MB`.
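
The arithmetic of this remote + local scenario can be reproduced with a small model. It is a sketch under stated assumptions, not the broker's code path: the class `RemoteLocalFetchOverrun`, the `Partition` record, and the two-phase `fetch` method are hypothetical, 1MB is assumed to be 1,048,576 bytes, and the record syntax needs Java 16 or later. A remote partition is charged 0 bytes up front and its deferred read is capped only by max.partition.fetch.bytes, as the steps above describe.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the overrun described in this issue; names are illustrative only.
final class RemoteLocalFetchOverrun {
    static final long MB = 1024L * 1024L; // assumption: 1MB = 2^20 bytes

    record Partition(String name, boolean remote, long availableBytes) {}

    /**
     * Phase 1 walks partitions in order: a remote (tiered) partition is counted
     * as 0 bytes and deferred to an async read; a local partition reads
     * min(maxPartitionBytes, remaining) bytes. Phase 2 completes the deferred
     * remote reads, capped by maxPartitionBytes only. Returns total bytes returned.
     */
    static long fetch(List<Partition> partitions, long maxPartitionBytes, long fetchMaxBytes) {
        long remaining = fetchMaxBytes;
        long total = 0;
        List<Partition> deferredRemote = new ArrayList<>();
        for (Partition p : partitions) {
            if (p.remote()) {
                deferredRemote.add(p); // size unknown up front: treated as 0 bytes read
                continue;
            }
            long bytes = Math.min(p.availableBytes(), Math.min(maxPartitionBytes, remaining));
            total += bytes;
            remaining -= bytes;
        }
        for (Partition p : deferredRemote) {
            // The remote read is bounded by maxPartitionBytes, not by the remaining budget.
            total += Math.min(p.availableBytes(), maxPartitionBytes);
        }
        return total;
    }

    public static void main(String[] args) {
        List<Partition> partitions = List.of(
                new Partition("boo-0", true, 2 * MB),
                new Partition("foo-0", false, 2 * MB),
                new Partition("foo-1", false, 2 * MB));
        long total = fetch(partitions, MB, MB + MB / 2);
        System.out.println("total = " + total + " bytes"); // total = 2621440 bytes (2.5MB)
    }
}
```

In this model the local partitions have already consumed the whole 1.5MB budget by the time boo-0's size is known, so its 1MB is added on top instead of being subtracted from the budget.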

 


> "fetch.max.bytes" config is not honored when remote + local fetch
> -----------------------------------------------------------------
>
>                 Key: KAFKA-19462
>                 URL: https://issues.apache.org/jira/browse/KAFKA-19462
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Luke Chen
>            Assignee: Luke Chen
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
