Hi all, While testing the patch [1], realised that introducing a new REMOTE_STORAGE_NOT_READY error-code is not compatible with the consumer. Consumer does not retry the FETCH request for all the retriable exceptions [2] instead it retries only for specific error codes. Dropping the KIP-1007 <https://cwiki.apache.org/confluence/display/KAFKA/KIP-1007%3A+Introduce+Remote+Storage+Not+Ready+Exception> as it is not compatible with the older clients. Thanks!
[1]: https://github.com/apache/kafka/pull/14822 [2]: https://sourcegraph.com/github.com/apache/kafka@trunk/-/blob/clients/src/main/java/org/apache/kafka/clients/consumer/internals/FetchCollector.java?L325 -- Kamal On Fri, Jan 5, 2024 at 5:03 PM Divij Vaidya <divijvaidy...@gmail.com> wrote: > Thank you for addressing my concerns Kamal. Though, instead of the KIP, I > actually was suggesting to add it in JavaDoc so that someone looking at the > exception is able to understand what it means. We can discuss that during > the PR review though. > > The KIP looks good to me. > > -- > Divij Vaidya > > > > On Fri, Jan 5, 2024 at 10:44 AM Satish Duggana <satish.dugg...@gmail.com> > wrote: > > > Thanks for the KIP Kamal, LGTM. > > > > On Tue, 26 Dec 2023 at 10:23, Kamal Chandraprakash > > <kamal.chandraprak...@gmail.com> wrote: > > > > > > Hi Divij, > > > > > > Thanks for reviewing the KIP! I've updated the KIP with the below > > > documentation. Let me know if it needs to be changed: > > > > > > The consumer can read the local data as long as it knows the offset > from > > > where to fetch the data from. > > > When there is no initial offset, the consumer decides the offset based > on > > > the below config: > > > > > > ``` > > > auto.offset.reset = earliest / latest / none > > > ``` > > > > > > - For `earliest` offset policy and any offset that lies in the > remote > > > storage, the consumer (FETCH request) > > > cannot be able to make progress until the remote log metadata gets > > > synced. > > > - In a FETCH request, when there are multiple partitions where a > > subset > > > of them are consuming from local > > > and others from remote, then only the partitions which are consuming > > > from the remote cannot make > > > progress and the partitions that fetch data from local storage > should > > be > > > able to make progress. > > > - In a FETCH request, when the fetch-offset for a partition is > within > > > the local-storage, then it should be able > > > to consume the messages. > > > - All the calls to LIST_OFFETS will fail until the remote log > metadata > > > gets synced. > > > > > > > > > The code link that is mentioned is referring to the `LIST_OFFSETS` > > handler. > > > Usually, consumers don't use this API > > > unless it's explicitly called by the user. > > > > > > > > > On Fri, Dec 22, 2023 at 4:10 PM Divij Vaidya <divijvaidy...@gmail.com> > > > wrote: > > > > > > > Thanks for the KIP, Kamal. > > > > > > > > The change looks good to me, though, I think we can do a better job > at > > > > documenting what the error means for the clients and users. > > > > > > > > Correct me if I'm wrong, when remote metadata is being synced on a > new > > > > leader, we cannot fetch even the local data (as per [1]), hence, > > partition > > > > is considered "unreadable" but writes (and all other operations such > as > > > > admin operations) can continue to work on that partition. If my > > > > understanding is correct, perhaps, please clarify this in the error > > > > description. In absence of it, it is difficult to determine what this > > error > > > > means for operations that can be performed on a partition. > > > > > > > > [1] > > > > > > > > > > > https://github.com/apache/kafka/blob/82808873cbf6a95611243c2e7984c4aa6ff2cfff/core/src/main/scala/kafka/log/UnifiedLog.scala#L1336 > > > > > > > > > > > > -- > > > > Divij Vaidya > > > > > > > > > > > > > > > > On Tue, Dec 12, 2023 at 9:58 AM Kamal Chandraprakash < > > > > kamal.chandraprak...@gmail.com> wrote: > > > > > > > > > Thanks Luke for reviewing this KIP! > > > > > > > > > > If there are no more comments from others, I'll start the VOTE > since > > this > > > > > is a minor KIP. > > > > > > > > > > On Mon, Dec 11, 2023 at 1:01 PM Luke Chen <show...@gmail.com> > wrote: > > > > > > > > > > > Hi Kamal, > > > > > > > > > > > > Thanks for the KIP! > > > > > > LGTM. > > > > > > > > > > > > Thanks. > > > > > > Luke > > > > > > > > > > > > On Wed, Nov 22, 2023 at 7:28 PM Kamal Chandraprakash < > > > > > > kamal.chandraprak...@gmail.com> wrote: > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > I would like to start a discussion to introduce a new error > code > > for > > > > > > > retriable remote storage errors. Please take a look at the > > proposal: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1007%3A+Introduce+Remote+Storage+Not+Ready+Exception > > > > > > > > > > > > > > > > > > > > > > > > >