Hi Ivan,

I agree this is a good improvement for remote storage plugin.
One question from me,
I saw you add the `RemoteLogSegmentMetadataRecord` and
`RemoteLogSegmentMetadataSnapshotRecord`, I'd like to confirm we're
not adding these records into any new API request/response, right?
I think the serialization/deserialization should be the remote storage
plugin provider's responsibility. And in Kafka's point of view, this
customMetadata is just a bunch of byte array, we don't have to parse
it.
Is my understanding correct?
If so, maybe you can add some comments into the KIP to mention it, to
avoid confusion.

Usually the "record" being added in the KIP will be the one affecting
the public interface, ex: API request/response.
So I'd like to confirm it.


Thanks.
Luke

On Thu, Jun 8, 2023 at 2:09 AM Ivan Yurchenko <ivan0yurche...@gmail.com> wrote:
>
> Hi Satish,
>
> Thank you for your feedback.
>
> I've nothing against going from Map<String, byte[]> to byte[].
> Serialization should not be a problem for RSM implementations: `Struct`,
> `Schema` and other useful serde classes are distributed as a part of the
> kafka-clients library.
>
> Also a good idea to add the size limiting setting, some
> `remote.log.metadata.custom.metadata.max.size`. A sensible default may be
> 10 KB, which is enough to host a struct with 10 long (almost) 1K symbol
> ASCII strings.
>
> If a piece of custom metadata exceeds the limit, the execution of
> RLMTask.copyLogSegmentsToRemote should be interrupted with an error message.
>
> Does this sound good?
> If so, I'll update the KIP accordingly. And I think it may be time for the
> vote after that.
>
> Best,
> Ivan
>
>
>
> On Sat, 3 Jun 2023 at 17:13, Satish Duggana <satish.dugg...@gmail.com>
> wrote:
>
> > Hi Ivan,
> > Thanks for the KIP.
> >
> > The motivation of the KIP looks reasonable to me. It requires a way
> > for RSM providers to add custom metadata with the existing
> > RemoteLogSegmentMetadata. I remember we wanted to introduce a very
> > similar change in the earlier proposals called
> > RemoteLogMetadataContext. But we dropped that as we did not feel a
> > strong need at that time and we wanted to revisit it if needed. But I
> > see there is a clear need for this kind of custom metadata to keep
> > with RemoteLogSegmentMetadata.
> >
> > It is better to introduce a new class for this custom metadata in
> > RemoteLogSegmentMetadata like below for any changes in the future.
> > RemoteLogSegmentMetadata will have this as an optional value.
> >
> > public class RemoteLogSegmentMetadata {
> > ...
> > public static class CustomMetadata {
> >      private final byte[] value;
> >     ...
> > }
> > ...
> > }
> >
> > This is completely opaque and it is the RSM implementation provider's
> > responsibility in serializing and deserializing the bytes. We can
> > introduce a property to guard the size with a configurable property
> > with a default value to avoid any unwanted large size values.
> >
> > Thanks,
> > Satish.
> >
> > On Tue, 30 May 2023 at 10:59, Ivan Yurchenko <ivan0yurche...@gmail.com>
> > wrote:
> > >
> > > Hi all,
> > >
> > > I want to bring this to a conclusion (positive or negative), so if there
> > > are no more questions in a couple of days, I'll put the KIP to the vote.
> > >
> > > Best,
> > > Ivan
> > >
> > >
> > > On Fri, 5 May 2023 at 18:42, Ivan Yurchenko <ivan0yurche...@gmail.com>
> > > wrote:
> > >
> > > > Hi Alexandre,
> > > >
> > > > > combining custom
> > > > > metadata with rlmMetadata increases coupling between Kafka and the
> > > > > plugin.
> > > >
> > > > This is true. However, (if I understand your concern correctly,)
> > > > rlmMetadata in the current form may be independent from RSM plugins,
> > but
> > > > data they point to are accessible only via the particular plugin (the
> > one
> > > > that wrote the data or a compatible one). It seems, this coupling
> > already
> > > > exists, but it is implicit. To make my point more concrete, imagine an
> > S3
> > > > RSM which maps RemoteLogSegmentMetadata objects to S3 object keys. This
> > > > mapping logic is a part of the RSM plugin and without it the metadata
> > is
> > > > useless. I think it will not get worse if (to follow the example) the
> > > > plugin makes the said S3 object keys explicit by adding them to the
> > > > metadata. From the high level point of view, moving the custom
> > metadata to
> > > > a separate topic doesn't change the picture: it's still the plugin that
> > > > binds the standard and custom metadata together.
> > > >
> > > >
> > > > > For instance, the custom metadata may need to be modified
> > > > > outside of Kafka, but the rlmMetadata would still be cached on
> > brokers
> > > > > independently of any update of custom metadata. Since both types of
> > > > > metadata are authored by different systems, and are cached in
> > > > > different layers, this may become a problem, or make plugin migration
> > > > > more difficult. What do you think?
> > > >
> > > > This is indeed a problem. I think a solution to this would be to
> > clearly
> > > > state that metadata being modified outside of Kafka is out of scope and
> > > > instruct the plugin authors that custom metadata could be provided only
> > > > reactively from the copyLogSegmentData method and must remain immutable
> > > > after that. Does it make sense?
> > > >
> > > >
> > > > > Yes, you are right that the suggested alternative is to let the
> > plugin
> > > > > store its own metadata separately with a solution chosen by the admin
> > > > > or plugin provider. For instance, it could be using a dedicated topic
> > > > > if chosen to, or relying on an external key-value store.
> > > >
> > > > I see. Yes, this option always exists and doesn't even require a KIP.
> > The
> > > > biggest drawback I see is that a plugin will need to reimplement the
> > > > consumer/producer + caching mechanics that will exist on the broker
> > side
> > > > for the standard remote metadata. I'd like to avoid this and this KIP
> > is
> > > > the best solution I see.
> > > >
> > > > Best,
> > > > Ivan
> > > >
> > > >
> > > >
> > > > On Tue, 18 Apr 2023 at 13:02, Alexandre Dupriez <
> > > > alexandre.dupr...@gmail.com> wrote:
> > > >
> > > >> Hi Ivan,
> > > >>
> > > >> Thanks for the follow-up.
> > > >>
> > > >> Yes, you are right that the suggested alternative is to let the plugin
> > > >> store its own metadata separately with a solution chosen by the admin
> > > >> or plugin provider. For instance, it could be using a dedicated topic
> > > >> if chosen to, or relying on an external key-value store.
> > > >>
> > > >> I agree with you on the existing risks associated with running
> > > >> third-party code inside Apache Kafka. That said, combining custom
> > > >> metadata with rlmMetadata increases coupling between Kafka and the
> > > >> plugin. For instance, the custom metadata may need to be modified
> > > >> outside of Kafka, but the rlmMetadata would still be cached on brokers
> > > >> independently of any update of custom metadata. Since both types of
> > > >> metadata are authored by different systems, and are cached in
> > > >> different layers, this may become a problem, or make plugin migration
> > > >> more difficult. What do you think?
> > > >>
> > > >> I have a vague memory of this being discussed back when the tiered
> > > >> storage KIP was started. Maybe Satish has more background on this.
> > > >>
> > > >> Thanks,
> > > >> Alexandre
> > > >>
> > > >> Le lun. 17 avr. 2023 à 16:50, Ivan Yurchenko
> > > >> <ivan0yurche...@gmail.com> a écrit :
> > > >> >
> > > >> > Hi Alexandre,
> > > >> >
> > > >> > Thank you for your feedback!
> > > >> >
> > > >> > > One question I would have is, what is the benefit of adding these
> > > >> > > custom metadata in the rlmMetadata rather than letting the plugin
> > > >> > > manage access and persistence to them?
> > > >> >
> > > >> > Could you please elaborate? Do I understand correctly that the idea
> > is
> > > >> that
> > > >> > the plugin will have its own storage for those custom metadata, for
> > > >> example
> > > >> > a special topic?
> > > >> >
> > > >> > > It would be possible for a user
> > > >> > > to use custom metadata large enough to adversely impact access to
> > and
> > > >> > > caching of the rlmMetadata by Kafka.
> > > >> >
> > > >> > Since the custom metadata is 100% under control of the RSM plugin,
> > the
> > > >> risk
> > > >> > is as big as the risk of running a third-party code (i.e. the RSM
> > > >> plugin).
> > > >> > The cluster admin must make the decision if they trust it.
> > > >> > To mitigate this risk and put it under control, the RSM plugin
> > > >> > implementations could document what custom metadata they use and
> > > >> estimate
> > > >> > their size.
> > > >> >
> > > >> > Best,
> > > >> > Ivan
> > > >> >
> > > >> >
> > > >> > On Mon, 17 Apr 2023 at 18:14, Alexandre Dupriez <
> > > >> alexandre.dupr...@gmail.com>
> > > >> > wrote:
> > > >> >
> > > >> > > Hi Ivan,
> > > >> > >
> > > >> > > Thank you for the KIP.
> > > >> > >
> > > >> > > I think the KIP clearly explains the need for out-of-band metadata
> > > >> > > authored and used by an implementation of the remote storage
> > manager.
> > > >> > > One question I would have is, what is the benefit of adding these
> > > >> > > custom metadata in the rlmMetadata rather than letting the plugin
> > > >> > > manage access and persistence to them?
> > > >> > >
> > > >> > > Maybe one disadvantage and potential risk with the approach
> > proposed
> > > >> > > in the KIP is that the rlmMetadata is not of a predefined,
> > relatively
> > > >> > > constant size (although corner cases with thousands of leader
> > epochs
> > > >> > > in the leader epoch map are possible). It would be possible for a
> > user
> > > >> > > to use custom metadata large enough to adversely impact access to
> > and
> > > >> > > caching of the rlmMetadata by Kafka.
> > > >> > >
> > > >> > > Thanks,
> > > >> > > Alexandre
> > > >> > >
> > > >> > > Le jeu. 6 avr. 2023 à 16:03, hzh0425 <hzhka...@163.com> a écrit :
> > > >> > > >
> > > >> > > > I think it's a good idea as we may want to store remote
> > segments in
> > > >> > > different buckets
> > > >> > > >
> > > >> > > >
> > > >> > > >
> > > >> > > > | |
> > > >> > > > hzhka...@163.com
> > > >> > > > |
> > > >> > > > |
> > > >> > > > 邮箱:hzhka...@163.com
> > > >> > > > |
> > > >> > > >
> > > >> > > >
> > > >> > > >
> > > >> > > >
> > > >> > > > ---- 回复的原邮件 ----
> > > >> > > > | 发件人 | Ivan Yurchenko<ivan0yurche...@gmail.com> |
> > > >> > > > | 日期 | 2023年04月06日 22:37 |
> > > >> > > > | 收件人 | dev@kafka.apache.org<dev@kafka.apache.org> |
> > > >> > > > | 抄送至 | |
> > > >> > > > | 主题 | [DISCUSS] KIP-917: Additional custom metadata for remote
> > log
> > > >> > > segment |
> > > >> > > > Hello!
> > > >> > > >
> > > >> > > > I would like to start the discussion thread on KIP-917:
> > Additional
> > > >> custom
> > > >> > > > metadata for remote log segment [1]
> > > >> > > > This KIP is fairly small and proposes to add a new field to the
> > > >> remote
> > > >> > > > segment metadata.
> > > >> > > >
> > > >> > > > Thank you!
> > > >> > > >
> > > >> > > > Best,
> > > >> > > > Ivan
> > > >> > > >
> > > >> > > > [1]
> > > >> > > >
> > > >> > >
> > > >>
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-917%3A+Additional+custom+metadata+for+remote+log+segment
> > > >> > >
> > > >>
> > > >
> >

Reply via email to