Hi Ivan,

Thanks for the follow-up.

Yes, you are right that the suggested alternative is to let the plugin
store its own metadata separately, using a solution chosen by the admin
or the plugin provider. For instance, it could use a dedicated topic
or rely on an external key-value store.

I agree with you on the existing risks associated with running
third-party code inside Apache Kafka. That said, combining custom
metadata with rlmMetadata increases coupling between Kafka and the
plugin. For instance, the custom metadata may need to be modified
outside of Kafka, but the rlmMetadata would still be cached on brokers
independently of any update to the custom metadata. Since the two types
of metadata are authored by different systems and cached in different
layers, this may become a problem or make plugin migration more
difficult. What do you think?
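
To make the caching concern concrete, here is a minimal sketch in
Python. It is not Kafka code: SegmentMetadata, BrokerCache and the
dictionary standing in for the metadata store are hypothetical names
chosen only for illustration, and the real broker-side caching may
behave differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentMetadata:
    """Hypothetical stand-in for a remote log segment's metadata."""
    segment_id: str
    custom: bytes  # plugin-authored payload, opaque to the broker


class BrokerCache:
    """Hypothetical broker-side cache that reads the store only once per segment."""

    def __init__(self, store):
        self._store = store
        self._cache = {}

    def get(self, segment_id):
        # Populate the cache on first access; later reads never hit the store.
        if segment_id not in self._cache:
            self._cache[segment_id] = self._store[segment_id]
        return self._cache[segment_id]


# The "store" is a plain dict standing in for wherever the metadata is persisted.
store = {"seg-1": SegmentMetadata("seg-1", b"bucket=a")}
cache = BrokerCache(store)

first = cache.get("seg-1")

# The plugin (or an external tool) rewrites the custom metadata outside the broker.
store["seg-1"] = SegmentMetadata("seg-1", b"bucket=b")

# The broker still serves the value it cached before the update.
stale = cache.get("seg-1")
```

In this sketch the second read still sees the old custom metadata,
which is the kind of divergence I have in mind when the two are
bundled together.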

I have a vague memory of this being discussed back when the tiered
storage KIP was started. Maybe Satish has more background on this.

Thanks,
Alexandre

On Mon, 17 Apr 2023 at 16:50, Ivan Yurchenko
<ivan0yurche...@gmail.com> wrote:
>
> Hi Alexandre,
>
> Thank you for your feedback!
>
> > One question I would have is, what is the benefit of adding these
> > custom metadata in the rlmMetadata rather than letting the plugin
> > manage access and persistence to them?
>
> Could you please elaborate? Do I understand correctly that the idea is that
> the plugin will have its own storage for those custom metadata, for example
> a special topic?
>
> > It would be possible for a user
> > to use custom metadata large enough to adversely impact access to and
> > caching of the rlmMetadata by Kafka.
>
> Since the custom metadata is 100% under the control of the RSM plugin, the
> risk is as big as the risk of running third-party code (i.e. the RSM plugin).
> The cluster admin must decide whether they trust it.
> To mitigate this risk and keep it under control, RSM plugin
> implementations could document what custom metadata they use and estimate
> its size.
>
> Best,
> Ivan
>
>
> On Mon, 17 Apr 2023 at 18:14, Alexandre Dupriez <alexandre.dupr...@gmail.com>
> wrote:
>
> > Hi Ivan,
> >
> > Thank you for the KIP.
> >
> > I think the KIP clearly explains the need for out-of-band metadata
> > authored and used by an implementation of the remote storage manager.
> > One question I would have is, what is the benefit of adding these
> > custom metadata in the rlmMetadata rather than letting the plugin
> > manage access and persistence to them?
> >
> > Maybe one disadvantage and potential risk with the approach proposed
> > in the KIP is that the rlmMetadata would no longer be of a predefined,
> > relatively constant size (although corner cases with thousands of leader epochs
> > in the leader epoch map are possible). It would be possible for a user
> > to use custom metadata large enough to adversely impact access to and
> > caching of the rlmMetadata by Kafka.
> >
> > Thanks,
> > Alexandre
> >
> > On Thu, 6 Apr 2023 at 16:03, hzh0425 <hzhka...@163.com> wrote:
> > >
> > > I think it's a good idea as we may want to store remote segments in
> > > different buckets
> > >
> > > ---- Replied message ----
> > > | From | Ivan Yurchenko<ivan0yurche...@gmail.com> |
> > > | Date | 6 Apr 2023, 22:37 |
> > > | To | dev@kafka.apache.org<dev@kafka.apache.org> |
> > > | Cc | |
> > > | Subject | [DISCUSS] KIP-917: Additional custom metadata for remote log segment |
> > > Hello!
> > >
> > > I would like to start the discussion thread on KIP-917: Additional custom
> > > metadata for remote log segment [1].
> > > This KIP is fairly small and proposes to add a new field to the remote
> > > segment metadata.
> > >
> > > Thank you!
> > >
> > > Best,
> > > Ivan
> > >
> > > [1] https://cwiki.apache.org/confluence/display/KAFKA/KIP-917%3A+Additional+custom+metadata+for+remote+log+segment
> >
