Hi all,

I want to bring this to a conclusion (positive or negative), so if there
are no more questions in a couple of days, I'll put the KIP to the vote.

Best,
Ivan


On Fri, 5 May 2023 at 18:42, Ivan Yurchenko <ivan0yurche...@gmail.com>
wrote:

> Hi Alexandre,
>
> > combining custom
> > metadata with rlmMetadata increases coupling between Kafka and the
> > plugin.
>
> This is true. However (if I understand your concern correctly), while
> rlmMetadata in its current form may be independent from RSM plugins, the
> data it points to is accessible only via the particular plugin (the one
> that wrote the data, or a compatible one). It seems this coupling already
> exists, but it is implicit. To make my point more concrete, imagine an S3
> RSM which maps RemoteLogSegmentMetadata objects to S3 object keys. This
> mapping logic is part of the RSM plugin, and without it the metadata is
> useless. I think it will not get worse if (to follow the example) the
> plugin makes the said S3 object keys explicit by adding them to the
> metadata. From a high-level point of view, moving the custom metadata to
> a separate topic doesn't change the picture: it's still the plugin that
> binds the standard and custom metadata together.
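To make the S3 example concrete, here is a minimal sketch (hypothetical names and JDK-only stand-ins for the Kafka types, not the KIP's actual API) of how a plugin's implicit key mapping could be made explicit by recording the key as custom metadata:

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical stand-in for the opaque custom metadata wrapper discussed in
// KIP-917: a byte array that only the authoring RSM plugin interprets.
final class CustomMetadata {
    private final byte[] value;
    CustomMetadata(byte[] value) { this.value = value.clone(); }
    byte[] value() { return value.clone(); }
}

final class S3KeyMapper {
    // The implicit coupling: the plugin derives the S3 object key from the
    // standard segment metadata (topic, partition, segment UUID).
    static String objectKey(String topic, int partition, UUID segmentId) {
        return String.format("%s-%d/%s.log", topic, partition, segmentId);
    }

    // Making it explicit: record the chosen key as custom metadata, so the
    // mapping travels with the segment's rlmMetadata instead of living only
    // inside the plugin's code.
    static CustomMetadata keyAsCustomMetadata(String topic, int partition, UUID segmentId) {
        return new CustomMetadata(
                objectKey(topic, partition, segmentId).getBytes(StandardCharsets.UTF_8));
    }
}
```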
>
>
> > For instance, the custom metadata may need to be modified
> > outside of Kafka, but the rlmMetadata would still be cached on brokers
> > independently of any update of custom metadata. Since both types of
> > metadata are authored by different systems, and are cached in
> > different layers, this may become a problem, or make plugin migration
> > more difficult. What do you think?
>
> This is indeed a problem. I think a solution would be to clearly state
> that metadata being modified outside of Kafka is out of scope, and to
> instruct plugin authors that custom metadata can be provided only once,
> from the copyLogSegmentData method, and must remain immutable after
> that. Does this make sense?
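As a sketch of that contract (simplified, hypothetical stand-in types; the real interface is Kafka's RemoteStorageManager), the plugin would hand over the custom metadata exactly once, as the return value of copyLogSegmentData:

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Simplified sketch of the amended RSM contract discussed in this KIP.
// Real types (RemoteStorageManager, LogSegmentData, etc.) are replaced by
// JDK-only stand-ins to keep the example self-contained.
interface RemoteStorageManagerSketch {
    // The plugin may return opaque custom metadata exactly once, when the
    // segment is copied. The broker stores it alongside the rlmMetadata;
    // the plugin must treat it as immutable afterwards.
    Optional<byte[]> copyLogSegmentData(String segmentId, byte[] segmentData);
}

final class S3RsmSketch implements RemoteStorageManagerSketch {
    @Override
    public Optional<byte[]> copyLogSegmentData(String segmentId, byte[] segmentData) {
        // The upload itself is omitted; the point is returning the chosen
        // object key as custom metadata so it is recorded once, up front.
        String objectKey = "tiered-segments/" + segmentId + ".log";
        return Optional.of(objectKey.getBytes(StandardCharsets.UTF_8));
    }
}
```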
>
>
> > Yes, you are right that the suggested alternative is to let the plugin
> > store its own metadata separately with a solution chosen by the admin
> > or plugin provider. For instance, it could be using a dedicated topic
> > if chosen to, or relying on an external key-value store.
>
> I see. Yes, this option always exists and doesn't even require a KIP. The
> biggest drawback I see is that a plugin would need to reimplement the
> consumer/producer and caching mechanics that will exist on the broker side
> for the standard remote metadata. I'd like to avoid this, and this KIP is
> the best solution I see.
>
> Best,
> Ivan
>
>
>
> On Tue, 18 Apr 2023 at 13:02, Alexandre Dupriez <
> alexandre.dupr...@gmail.com> wrote:
>
>> Hi Ivan,
>>
>> Thanks for the follow-up.
>>
>> Yes, you are right that the suggested alternative is to let the plugin
>> store its own metadata separately with a solution chosen by the admin
>> or plugin provider. For instance, it could be using a dedicated topic
>> if chosen to, or relying on an external key-value store.
>>
>> I agree with you on the existing risks associated with running
>> third-party code inside Apache Kafka. That said, combining custom
>> metadata with rlmMetadata increases coupling between Kafka and the
>> plugin. For instance, the custom metadata may need to be modified
>> outside of Kafka, but the rlmMetadata would still be cached on brokers
>> independently of any update of custom metadata. Since both types of
>> metadata are authored by different systems, and are cached in
>> different layers, this may become a problem, or make plugin migration
>> more difficult. What do you think?
>>
>> I have a vague memory of this being discussed back when the tiered
>> storage KIP was started. Maybe Satish has more background on this.
>>
>> Thanks,
>> Alexandre
>>
>> On Mon, 17 Apr 2023 at 16:50, Ivan Yurchenko
>> <ivan0yurche...@gmail.com> wrote:
>> >
>> > Hi Alexandre,
>> >
>> > Thank you for your feedback!
>> >
>> > > One question I would have is, what is the benefit of adding these
>> > > custom metadata in the rlmMetadata rather than letting the plugin
>> > > manage access and persistence to them?
>> >
>> > Could you please elaborate? Do I understand correctly that the idea is
>> > that the plugin will have its own storage for this custom metadata, for
>> > example a special topic?
>> >
>> > > It would be possible for a user
>> > > to use custom metadata large enough to adversely impact access to and
>> > > caching of the rlmMetadata by Kafka.
>> >
>> > Since the custom metadata is 100% under the control of the RSM plugin,
>> > the risk is as big as the risk of running any third-party code (i.e.
>> > the RSM plugin itself). The cluster admin must decide whether they
>> > trust it.
>> > To mitigate this risk and keep it under control, RSM plugin
>> > implementations could document what custom metadata they use and
>> > estimate its size.
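For illustration, a broker-side safeguard could additionally cap the serialized size of custom metadata (a hypothetical sketch; such a limit is an assumption here, not something the thread has agreed on):

```java
// Hypothetical broker-side guard: refuse to persist custom metadata whose
// serialized size exceeds a configured cap, so an oversized entry from a
// plugin cannot degrade rlmMetadata access and caching.
final class CustomMetadataLimiter {
    private final int maxBytes;

    CustomMetadataLimiter(int maxBytes) { this.maxBytes = maxBytes; }

    // Null means "the plugin provided no custom metadata", which always fits.
    boolean fits(byte[] customMetadata) {
        return customMetadata == null || customMetadata.length <= maxBytes;
    }
}
```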
>> >
>> > Best,
>> > Ivan
>> >
>> >
>> > On Mon, 17 Apr 2023 at 18:14, Alexandre Dupriez <
>> alexandre.dupr...@gmail.com>
>> > wrote:
>> >
>> > > Hi Ivan,
>> > >
>> > > Thank you for the KIP.
>> > >
>> > > I think the KIP clearly explains the need for out-of-band metadata
>> > > authored and used by an implementation of the remote storage manager.
>> > > One question I would have is, what is the benefit of adding these
>> > > custom metadata in the rlmMetadata rather than letting the plugin
>> > > manage access and persistence to them?
>> > >
>> > > Maybe one disadvantage and potential risk with the approach proposed
>> > > in the KIP is that the rlmMetadata is not of a predefined, relatively
>> > > constant size (although corner cases with thousands of leader epochs
>> > > in the leader epoch map are possible). It would be possible for a user
>> > > to use custom metadata large enough to adversely impact access to and
>> > > caching of the rlmMetadata by Kafka.
>> > >
>> > > Thanks,
>> > > Alexandre
>> > >
>> > > On Thu, 6 Apr 2023 at 16:03, hzh0425 <hzhka...@163.com> wrote:
>> > > >
>> > > > I think it's a good idea as we may want to store remote segments in
>> > > different buckets
>> > > >
>> > > >
>> > > > ---- Original message ----
>> > > > From: Ivan Yurchenko <ivan0yurche...@gmail.com>
>> > > > Date: 2023-04-06 22:37
>> > > > To: dev@kafka.apache.org
>> > > > Subject: [DISCUSS] KIP-917: Additional custom metadata for remote log
>> > > > segment
>> > > > Hello!
>> > > >
>> > > > I would like to start the discussion thread on KIP-917: Additional
>> custom
>> > > > metadata for remote log segment [1]
>> > > > This KIP is fairly small and proposes to add a new field to the
>> remote
>> > > > segment metadata.
>> > > >
>> > > > Thank you!
>> > > >
>> > > > Best,
>> > > > Ivan
>> > > >
>> > > > [1]
>> > > >
>> > >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-917%3A+Additional+custom+metadata+for+remote+log+segment
>> > >
>>
>
