Hi all, If there are no more comments, I'll open a VOTE thread.
-- Kamal On Sat, May 4, 2024 at 8:39 AM Kamal Chandraprakash < kamal.chandraprak...@gmail.com> wrote: > Luke, > > Thanks for the review! > > DelayedFetch and DelayedRemoteFetch are orthogonal calls > <https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/ReplicaManager.scala#L1559> > . > Only one of them will be active in a given FETCH request. > > The fetch request might take more than 500 ms when the time > taken to read data from remote storage exceeds 500 ms and > `remote.fetch.max.wait.ms` is configured higher than 500 ms. > > -- > Kamal > > > On Fri, May 3, 2024 at 1:55 PM Luke Chen <show...@gmail.com> wrote: > >> Hi Kamal, >> >> Thanks for the KIP! >> Sorry for the late review. >> >> Overall LGTM! Just 1 question: >> >> If one fetch request contains 2 partitions: [p1, p2] >> fetch.max.wait.ms: 500, remote.fetch.max.wait.ms: 1000 >> >> And now, p1 fetch offset is the log end offset and has no new data coming, >> and p2 fetch offset is to fetch from remote storage. >> And suppose the fetch from remote storage takes 1000ms. >> So, question: >> Will this fetch request return in 500ms or 1000ms? >> And what will be returned? >> >> I think before this change, it'll return within 500ms, right? >> But it's not clear what behavior it will be after this KIP. >> >> Thank you. >> Luke >> >> >> On Fri, May 3, 2024 at 1:56 PM Kamal Chandraprakash < >> kamal.chandraprak...@gmail.com> wrote: >> >> > Christo, >> > >> > We have localTimeMs, remoteTimeMs, and totalTimeMs as part of the >> > FetchConsumer request metric. >> > >> > >> > >> kafka.network:type=RequestMetrics,name={LocalTimeMs|RemoteTimeMs|TotalTimeMs},request={Produce|FetchConsumer|FetchFollower} >> > >> > RemoteTimeMs refers to the amount of time spent in the purgatory for >> normal >> > fetch requests >> > and amount of time spent in reading the remote data for remote-fetch >> > requests. Do we want >> > to have a separate `TieredStorageTimeMs` to capture the time spent in >> > remote-read requests? >> > >> > With per-broker level timer metrics combined with the request level >> > metrics, the user will have >> > sufficient information. >> > >> > Metric name = >> > >> > >> kafka.log.remote:type=RemoteLogManager,name=RemoteLogReaderFetchRateAndTimeMs >> > >> > -- >> > Kamal >> > >> > On Mon, Apr 29, 2024 at 1:38 PM Christo Lolov <christolo...@gmail.com> >> > wrote: >> > >> > > Heya! >> > > >> > > Is it difficult to instead add the metric at >> > > kafka.network:type=RequestMetrics,name=TieredStorageMs (or some other >> > > name=*)? Alternatively, if it is difficult to add it there, is it >> > possible >> > > to add 2 metrics, one at the RequestMetrics level (even if it is >> > > total-time-ms - (all other times)) and one at what you are proposing? >> As >> > an >> > > operator I would find it strange to not see the metric in the >> > > RequestMetrics. >> > > >> > > Your thoughts? >> > > >> > > Best, >> > > Christo >> > > >> > > On Sun, 28 Apr 2024 at 10:52, Kamal Chandraprakash < >> > > kamal.chandraprak...@gmail.com> wrote: >> > > >> > > > Christo, >> > > > >> > > > Updated the KIP with the remote fetch latency metric. Please take >> > another >> > > > look! >> > > > >> > > > -- >> > > > Kamal >> > > > >> > > > On Sun, Apr 28, 2024 at 12:23 PM Kamal Chandraprakash < >> > > > kamal.chandraprak...@gmail.com> wrote: >> > > > >> > > > > Hi Federico, >> > > > > >> > > > > Thanks for the suggestion! Updated the config name to " >> > > > > remote.fetch.max.wait.ms". >> > > > > >> > > > > Christo, >> > > > > >> > > > > Good point. We don't have the remote-read latency metrics to >> measure >> > > the >> > > > > performance of the remote read requests. I'll update the KIP to >> emit >> > > this >> > > > > metric. >> > > > > >> > > > > -- >> > > > > Kamal >> > > > > >> > > > > >> > > > > On Sat, Apr 27, 2024 at 4:03 PM Federico Valeri < >> > fedeval...@gmail.com> >> > > > > wrote: >> > > > > >> > > > >> Hi Kamal, it looks like all TS configurations starts with >> "remote." >> > > > >> prefix, so I was wondering if we should name it >> > > > >> "remote.fetch.max.wait.ms". >> > > > >> >> > > > >> On Fri, Apr 26, 2024 at 7:07 PM Kamal Chandraprakash >> > > > >> <kamal.chandraprak...@gmail.com> wrote: >> > > > >> > >> > > > >> > Hi all, >> > > > >> > >> > > > >> > If there are no more comments, I'll start a vote thread by >> > tomorrow. >> > > > >> > Please review the KIP. >> > > > >> > >> > > > >> > Thanks, >> > > > >> > Kamal >> > > > >> > >> > > > >> > On Sat, Mar 30, 2024 at 11:08 PM Kamal Chandraprakash < >> > > > >> > kamal.chandraprak...@gmail.com> wrote: >> > > > >> > >> > > > >> > > Hi all, >> > > > >> > > >> > > > >> > > Bumping the thread. Please review this KIP. Thanks! >> > > > >> > > >> > > > >> > > On Thu, Feb 1, 2024 at 9:11 PM Kamal Chandraprakash < >> > > > >> > > kamal.chandraprak...@gmail.com> wrote: >> > > > >> > > >> > > > >> > >> Hi Jorge, >> > > > >> > >> >> > > > >> > >> Thanks for the review! Added your suggestions to the KIP. >> PTAL. >> > > > >> > >> >> > > > >> > >> The `fetch.max.wait.ms` config will be also applicable for >> > > topics >> > > > >> > >> enabled with remote storage. >> > > > >> > >> Updated the description to: >> > > > >> > >> >> > > > >> > >> ``` >> > > > >> > >> The maximum amount of time the server will block before >> > answering >> > > > the >> > > > >> > >> fetch request >> > > > >> > >> when it is reading near to the tail of the partition >> > > > >> (high-watermark) and >> > > > >> > >> there isn't >> > > > >> > >> sufficient data to immediately satisfy the requirement >> given by >> > > > >> > >> fetch.min.bytes. >> > > > >> > >> ``` >> > > > >> > >> >> > > > >> > >> -- >> > > > >> > >> Kamal >> > > > >> > >> >> > > > >> > >> On Thu, Feb 1, 2024 at 12:12 AM Jorge Esteban Quilcate >> Otoya < >> > > > >> > >> quilcate.jo...@gmail.com> wrote: >> > > > >> > >> >> > > > >> > >>> Hi Kamal, >> > > > >> > >>> >> > > > >> > >>> Thanks for this KIP! It should help to solve one of the >> main >> > > > issues >> > > > >> with >> > > > >> > >>> tiered storage at the moment that is dealing with >> individual >> > > > >> consumer >> > > > >> > >>> configurations to avoid flooding logs with interrupted >> > > exceptions. >> > > > >> > >>> >> > > > >> > >>> One of the topics discussed in [1][2] was on the semantics >> of >> > ` >> > > > >> > >>> fetch.max.wait.ms` and how it's affected by remote >> storage. >> > > > Should >> > > > >> we >> > > > >> > >>> consider within this KIP the update of `fetch.max.wail.ms` >> > docs >> > > > to >> > > > >> > >>> clarify >> > > > >> > >>> it only applies to local storage? >> > > > >> > >>> >> > > > >> > >>> Otherwise, LGTM -- looking forward to see this KIP adopted. >> > > > >> > >>> >> > > > >> > >>> [1] https://issues.apache.org/jira/browse/KAFKA-15776 >> > > > >> > >>> [2] >> > > > >> >> https://github.com/apache/kafka/pull/14778#issuecomment-1820588080 >> > > > >> > >>> >> > > > >> > >>> On Tue, 30 Jan 2024 at 01:01, Kamal Chandraprakash < >> > > > >> > >>> kamal.chandraprak...@gmail.com> wrote: >> > > > >> > >>> >> > > > >> > >>> > Hi all, >> > > > >> > >>> > >> > > > >> > >>> > I have opened a KIP-1018 >> > > > >> > >>> > < >> > > > >> > >>> > >> > > > >> > >>> >> > > > >> >> > > > >> > > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1018%3A+Introduce+max+remote+fetch+timeout+config+for+DelayedRemoteFetch+requests >> > > > >> > >>> > > >> > > > >> > >>> > to introduce dynamic max-remote-fetch-timeout broker >> config >> > to >> > > > >> give >> > > > >> > >>> more >> > > > >> > >>> > control to the operator. >> > > > >> > >>> > >> > > > >> > >>> > >> > > > >> > >>> > >> > > > >> > >>> >> > > > >> >> > > > >> > > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1018%3A+Introduce+max+remote+fetch+timeout+config+for+DelayedRemoteFetch+requests >> > > > >> > >>> > >> > > > >> > >>> > Let me know if you have any feedback or suggestions. >> > > > >> > >>> > >> > > > >> > >>> > -- >> > > > >> > >>> > Kamal >> > > > >> > >>> > >> > > > >> > >>> >> > > > >> > >> >> > > > >> >> > > > > >> > > > >> > > >> > >> >