Hi Alexandre,

> combining custom
> metadata with rlmMetadata increases coupling between Kafka and the
> plugin.

This is true. However, if I understand your concern correctly,
rlmMetadata in its current form may be independent of RSM plugins, but
the data it points to is accessible only via a particular plugin (the one
that wrote the data or a compatible one). It seems this coupling already
exists, but it is implicit. To make my point more concrete, imagine an S3
RSM which maps RemoteLogSegmentMetadata objects to S3 object keys. This
mapping logic is a part of the RSM plugin and without it the metadata is
useless. I think it will not get worse if (to follow the example) the
plugin makes the said S3 object keys explicit by adding them to the
metadata. From a high-level point of view, moving the custom metadata to
a separate topic doesn't change the picture: it's still the plugin that
binds the standard and custom metadata together.
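
To illustrate the S3 example, here is a minimal Java sketch of the two
variants. All names in it (Segment, deriveKey, encodeKey, resolveKey and
the key layout) are hypothetical and are not taken from Kafka or from any
existing plugin; Segment only stands in for the few
RemoteLogSegmentMetadata fields the example needs.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

final class S3KeyMappingSketch {

    // Hypothetical stand-in for a few RemoteLogSegmentMetadata fields,
    // plus the opaque custom metadata bytes discussed in this thread.
    static final class Segment {
        final String topic;
        final int partition;
        final String segmentId;
        final Optional<byte[]> customMetadata;

        Segment(String topic, int partition, String segmentId,
                Optional<byte[]> customMetadata) {
            this.topic = topic;
            this.partition = partition;
            this.segmentId = segmentId;
            this.customMetadata = customMetadata;
        }
    }

    // Implicit coupling: the object key is derived by plugin-private logic
    // (assumed layout), so the metadata alone does not locate the data.
    static String deriveKey(String prefix, Segment s) {
        return prefix + "/" + s.topic + "-" + s.partition + "/" + s.segmentId + ".log";
    }

    // Explicit coupling: the plugin serializes the key it actually used.
    static byte[] encodeKey(String key) {
        return key.getBytes(StandardCharsets.UTF_8);
    }

    // Prefer the explicit key when present, otherwise fall back to derivation.
    static String resolveKey(String prefix, Segment s) {
        return s.customMetadata
                .map(bytes -> new String(bytes, StandardCharsets.UTF_8))
                .orElseGet(() -> deriveKey(prefix, s));
    }
}
```

In both variants only the plugin can interpret the result: either it owns
the derivation rule or it owns the encoding of the stored key.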


> For instance, the custom metadata may need to be modified
> outside of Kafka, but the rlmMetadata would still be cached on brokers
> independently of any update of custom metadata. Since both types of
> metadata are authored by different systems, and are cached in
> different layers, this may become a problem, or make plugin migration
> more difficult. What do you think?

This is indeed a problem. I think a solution to this would be to clearly
state that modifying metadata outside of Kafka is out of scope, and to
instruct plugin authors that custom metadata can be provided only
from the copyLogSegmentData method and must remain immutable after
that. Does that make sense?
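
As a rough illustration of that rule, the sketch below shows one
hypothetical way a "set once at copy time, immutable afterwards"
contract could be enforced. CustomMetadata, SegmentEntry and
onCopyFinished are names made up for this example and do not describe
the KIP's actual API.

```java
import java.util.Arrays;
import java.util.Optional;

final class WriteOnceCustomMetadataSketch {

    // Hypothetical immutable wrapper around opaque plugin bytes.
    // Defensive copies keep callers from mutating the stored value.
    static final class CustomMetadata {
        private final byte[] value;

        CustomMetadata(byte[] value) {
            this.value = Arrays.copyOf(value, value.length);
        }

        byte[] value() {
            return Arrays.copyOf(value, value.length);
        }
    }

    // Hypothetical per-segment record kept on the Kafka side.
    static final class SegmentEntry {
        private Optional<CustomMetadata> custom = Optional.empty();
        private boolean copyFinished = false;

        // Called once with whatever the plugin returned from its copy call;
        // any later attempt to change the custom metadata is rejected.
        void onCopyFinished(Optional<CustomMetadata> fromPlugin) {
            if (copyFinished) {
                throw new IllegalStateException("custom metadata is immutable after copy");
            }
            custom = fromPlugin;
            copyFinished = true;
        }

        Optional<CustomMetadata> custom() {
            return custom;
        }
    }
}
```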


> Yes, you are right that the suggested alternative is to let the plugin
> store its own metadata separately with a solution chosen by the admin
> or plugin provider. For instance, it could be using a dedicated topic
> if chosen to, or relying on an external key-value store.

I see. Yes, this option always exists and doesn't even require a KIP. The
biggest drawback I see is that a plugin will need to reimplement the
consumer/producer + caching mechanics that will exist on the broker side
for the standard remote metadata. I'd like to avoid this and this KIP is
the best solution I see.

Best,
Ivan



On Tue, 18 Apr 2023 at 13:02, Alexandre Dupriez <alexandre.dupr...@gmail.com>
wrote:

> Hi Ivan,
>
> Thanks for the follow-up.
>
> Yes, you are right that the suggested alternative is to let the plugin
> store its own metadata separately with a solution chosen by the admin
> or plugin provider. For instance, it could be using a dedicated topic
> if chosen to, or relying on an external key-value store.
>
> I agree with you on the existing risks associated with running
> third-party code inside Apache Kafka. That said, combining custom
> metadata with rlmMetadata increases coupling between Kafka and the
> plugin. For instance, the custom metadata may need to be modified
> outside of Kafka, but the rlmMetadata would still be cached on brokers
> independently of any update of custom metadata. Since both types of
> metadata are authored by different systems, and are cached in
> different layers, this may become a problem, or make plugin migration
> more difficult. What do you think?
>
> I have a vague memory of this being discussed back when the tiered
> storage KIP was started. Maybe Satish has more background on this.
>
> Thanks,
> Alexandre
>
> On Mon, 17 Apr 2023 at 16:50, Ivan Yurchenko
> <ivan0yurche...@gmail.com> wrote:
> >
> > Hi Alexandre,
> >
> > Thank you for your feedback!
> >
> > > One question I would have is, what is the benefit of adding these
> > > custom metadata in the rlmMetadata rather than letting the plugin
> > > manage access and persistence to them?
> >
> > Could you please elaborate? Do I understand correctly that the idea is
> > that the plugin will have its own storage for those custom metadata,
> > for example a special topic?
> >
> > > It would be possible for a user
> > > to use custom metadata large enough to adversely impact access to and
> > > caching of the rlmMetadata by Kafka.
> >
> > Since the custom metadata is 100% under control of the RSM plugin, the
> > risk is as big as the risk of running third-party code (i.e. the RSM
> > plugin).
> > The cluster admin must decide whether they trust it.
> > To mitigate this risk and put it under control, the RSM plugin
> > implementations could document what custom metadata they use and estimate
> > their size.
> >
> > Best,
> > Ivan
> >
> >
> > On Mon, 17 Apr 2023 at 18:14, Alexandre Dupriez
> > <alexandre.dupr...@gmail.com> wrote:
> >
> > > Hi Ivan,
> > >
> > > Thank you for the KIP.
> > >
> > > I think the KIP clearly explains the need for out-of-band metadata
> > > authored and used by an implementation of the remote storage manager.
> > > One question I would have is, what is the benefit of adding these
> > > custom metadata in the rlmMetadata rather than letting the plugin
> > > manage access and persistence to them?
> > >
> > > Maybe one disadvantage and potential risk with the approach proposed
> > > in the KIP is that the rlmMetadata is not of a predefined, relatively
> > > constant size (although corner cases with thousands of leader epochs
> > > in the leader epoch map are possible). It would be possible for a user
> > > to use custom metadata large enough to adversely impact access to and
> > > caching of the rlmMetadata by Kafka.
> > >
> > > Thanks,
> > > Alexandre
> > >
> > > On Thu, 6 Apr 2023 at 16:03, hzh0425 <hzhka...@163.com> wrote:
> > > >
> > > > I think it's a good idea as we may want to store remote segments in
> > > > different buckets
> > > >
> > > > ---- Original message ----
> > > > | From | Ivan Yurchenko<ivan0yurche...@gmail.com> |
> > > > | Date | 6 April 2023, 22:37 |
> > > > | To | dev@kafka.apache.org<dev@kafka.apache.org> |
> > > > | Cc | |
> > > > | Subject | [DISCUSS] KIP-917: Additional custom metadata for remote log segment |
> > > > Hello!
> > > >
> > > > I would like to start the discussion thread on KIP-917: Additional
> > > > custom metadata for remote log segment [1]
> > > > This KIP is fairly small and proposes to add a new field to the
> > > > remote segment metadata.
> > > >
> > > > Thank you!
> > > >
> > > > Best,
> > > > Ivan
> > > >
> > > > [1]
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-917%3A+Additional+custom+metadata+for+remote+log+segment
>
