Hi all, I want to bring this to a conclusion (positive or negative), so if there are no more questions in a couple of days, I'll put the KIP to the vote.
Best, Ivan On Fri, 5 May 2023 at 18:42, Ivan Yurchenko <ivan0yurche...@gmail.com> wrote: > Hi Alexandre, > > > combining custom > > metadata with rlmMetadata increases coupling between Kafka and the > > plugin. > > This is true. However, (if I understand your concern correctly,) > rlmMetadata in the current form may be independent from RSM plugins, but > data they point to are accessible only via the particular plugin (the one > that wrote the data or a compatible one). It seems, this coupling already > exists, but it is implicit. To make my point more concrete, imagine an S3 > RSM which maps RemoteLogSegmentMetadata objects to S3 object keys. This > mapping logic is a part of the RSM plugin and without it the metadata is > useless. I think it will not get worse if (to follow the example) the > plugin makes the said S3 object keys explicit by adding them to the > metadata. From the high level point of view, moving the custom metadata to > a separate topic doesn't change the picture: it's still the plugin that > binds the standard and custom metadata together. > > > > For instance, the custom metadata may need to be modified > > outside of Kafka, but the rlmMetadata would still be cached on brokers > > independently of any update of custom metadata. Since both types of > > metadata are authored by different systems, and are cached in > > different layers, this may become a problem, or make plugin migration > > more difficult. What do you think? > > This is indeed a problem. I think a solution to this would be to clearly > state that metadata being modified outside of Kafka is out of scope and > instruct the plugin authors that custom metadata could be provided only > reactively from the copyLogSegmentData method and must remain immutable > after that. Does it make sense? > > > > Yes, you are right that the suggested alternative is to let the plugin > > store its own metadata separately with a solution chosen by the admin > > or plugin provider. For instance, it could be using a dedicated topic > > if chosen to, or relying on an external key-value store. > > I see. Yes, this option always exists and doesn't even require a KIP. The > biggest drawback I see is that a plugin will need to reimplement the > consumer/producer + caching mechanics that will exist on the broker side > for the standard remote metadata. I'd like to avoid this and this KIP is > the best solution I see. > > Best, > Ivan > > > > On Tue, 18 Apr 2023 at 13:02, Alexandre Dupriez < > alexandre.dupr...@gmail.com> wrote: > >> Hi Ivan, >> >> Thanks for the follow-up. >> >> Yes, you are right that the suggested alternative is to let the plugin >> store its own metadata separately with a solution chosen by the admin >> or plugin provider. For instance, it could be using a dedicated topic >> if chosen to, or relying on an external key-value store. >> >> I agree with you on the existing risks associated with running >> third-party code inside Apache Kafka. That said, combining custom >> metadata with rlmMetadata increases coupling between Kafka and the >> plugin. For instance, the custom metadata may need to be modified >> outside of Kafka, but the rlmMetadata would still be cached on brokers >> independently of any update of custom metadata. Since both types of >> metadata are authored by different systems, and are cached in >> different layers, this may become a problem, or make plugin migration >> more difficult. What do you think? >> >> I have a vague memory of this being discussed back when the tiered >> storage KIP was started. Maybe Satish has more background on this. >> >> Thanks, >> Alexandre >> >> Le lun. 17 avr. 2023 à 16:50, Ivan Yurchenko >> <ivan0yurche...@gmail.com> a écrit : >> > >> > Hi Alexandre, >> > >> > Thank you for your feedback! >> > >> > > One question I would have is, what is the benefit of adding these >> > > custom metadata in the rlmMetadata rather than letting the plugin >> > > manage access and persistence to them? >> > >> > Could you please elaborate? Do I understand correctly that the idea is >> that >> > the plugin will have its own storage for those custom metadata, for >> example >> > a special topic? >> > >> > > It would be possible for a user >> > > to use custom metadata large enough to adversely impact access to and >> > > caching of the rlmMetadata by Kafka. >> > >> > Since the custom metadata is 100% under control of the RSM plugin, the >> risk >> > is as big as the risk of running a third-party code (i.e. the RSM >> plugin). >> > The cluster admin must make the decision if they trust it. >> > To mitigate this risk and put it under control, the RSM plugin >> > implementations could document what custom metadata they use and >> estimate >> > their size. >> > >> > Best, >> > Ivan >> > >> > >> > On Mon, 17 Apr 2023 at 18:14, Alexandre Dupriez < >> alexandre.dupr...@gmail.com> >> > wrote: >> > >> > > Hi Ivan, >> > > >> > > Thank you for the KIP. >> > > >> > > I think the KIP clearly explains the need for out-of-band metadata >> > > authored and used by an implementation of the remote storage manager. >> > > One question I would have is, what is the benefit of adding these >> > > custom metadata in the rlmMetadata rather than letting the plugin >> > > manage access and persistence to them? >> > > >> > > Maybe one disadvantage and potential risk with the approach proposed >> > > in the KIP is that the rlmMetadata is not of a predefined, relatively >> > > constant size (although corner cases with thousands of leader epochs >> > > in the leader epoch map are possible). It would be possible for a user >> > > to use custom metadata large enough to adversely impact access to and >> > > caching of the rlmMetadata by Kafka. >> > > >> > > Thanks, >> > > Alexandre >> > > >> > > Le jeu. 6 avr. 2023 à 16:03, hzh0425 <hzhka...@163.com> a écrit : >> > > > >> > > > I think it's a good idea as we may want to store remote segments in >> > > different buckets >> > > > >> > > > >> > > > >> > > > | | >> > > > hzhka...@163.com >> > > > | >> > > > | >> > > > 邮箱:hzhka...@163.com >> > > > | >> > > > >> > > > >> > > > >> > > > >> > > > ---- 回复的原邮件 ---- >> > > > | 发件人 | Ivan Yurchenko<ivan0yurche...@gmail.com> | >> > > > | 日期 | 2023年04月06日 22:37 | >> > > > | 收件人 | dev@kafka.apache.org<dev@kafka.apache.org> | >> > > > | 抄送至 | | >> > > > | 主题 | [DISCUSS] KIP-917: Additional custom metadata for remote log >> > > segment | >> > > > Hello! >> > > > >> > > > I would like to start the discussion thread on KIP-917: Additional >> custom >> > > > metadata for remote log segment [1] >> > > > This KIP is fairly small and proposes to add a new field to the >> remote >> > > > segment metadata. >> > > > >> > > > Thank you! >> > > > >> > > > Best, >> > > > Ivan >> > > > >> > > > [1] >> > > > >> > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-917%3A+Additional+custom+metadata+for+remote+log+segment >> > > >> >