Hi Matthias,

Thanks for reviewing it. That's a nice question. I can provide you the
answer as a user of the API - but of course it very much depends on what
the original intention towards the user is.
The only reason I found out it was opinionated was when my tests using
in-memory materialization started failing due to the state-store clean-up
bug
<https://stackoverflow.com/questions/50602512/failed-to-delete-the-state-directory-in-ide-for-kafka-stream-application>
in Windows systems.

To be honest, I think I wouldn't have investigated how subscription stores
are used without seeing that my system couldn't handle the expected data.
Must the user know about the subscription-stores beforehand? If yes, then
indeed we should deprecate.

Having said that, it seems to me that deprecating would not be consistent
with other parts of the API that have parameterized methods hiding
complexity.
E.g. from the top of my head, join() will make a rocksDB materialization if
no-materialization method is provided.

public <V1, R> KTable<K, R> join(final KTable<K, V1> other, final
ValueJoiner<? super V, ? super V1, ? extends R> joiner0


I see this as something equivalent. If the user didn't provide all the
methods, shouldn't it be fine to fallback to a default materialization
method as the other sections of the API already does?
In that scenario, there's no need to deprecate and we just need to make a
reasonable guess towards it. Using the same method as the table-store
sounds like a reasonable guess in this scenario.

In my perspective as a user, I would prefer to have simpler methods knowing
I would be losing some fine-grain configuration. Does it sound reasonable
to you?

Cheers,
Marco Lotz

On Wed, Apr 14, 2021 at 8:47 PM Matthias J. Sax <mj...@apache.org> wrote:

> I just re-read the KIP, and I am wondering why we would *only add* new
> methods.
>
> Is it an expected use case to only change the main stores but not the
> subscription stores?
>
> Wondering if we should deprecate the existing methods that only accept a
> single `Materialized` parameter?
>
>
> -Matthias
>
> On 4/9/21 1:32 PM, Marco Aurélio Lotz wrote:
> > Hi everyone,
> >
> > I am fine with sticking with Materialized and adding the info to the
> > javadoc then - so we keep the footprint smaller.
> > I will then update the KIP to match what we discussed here and send it
> for
> > a vote next week.
> >
> > I will raise a new bug-fix ticket and change KAFKA-10383
> > <https://issues.apache.org/jira/browse/KAFKA-10383> to become a feature
> -
> > if that's ok.
> >
> > Cheers,
> > Marco
> >
> > On Wed, Apr 7, 2021 at 4:15 AM Matthias J. Sax <mj...@apache.org> wrote:
> >
> >> Just catching up here.
> >>
> >> I agree that we have two issue, and the first (align subscription store
> >> to main store) can be done as a bug-fix.
> >>
> >> For the KIP (that addressed the second), I tend to agree that reusing
> >> `Materialized` might be better as it would keep the API surface area
> >> smaller.
> >>
> >>
> >> -Matthias
> >>
> >> On 4/6/21 8:48 AM, John Roesler wrote:
> >>> Hi Marco,
> >>>
> >>> Just a quick clarification: I just reviewed the Materialized class. It
> >> looks like the only undesirable members are:
> >>> 1. Retention
> >>> 2. Key/Value serdes
> >>>
> >>> The underlying store type would be “KeyValueStore<Bytes,byte[]>” , for
> >> which case the withRetention javadoc already says it’s ignored.
> >>>
> >>> Perhaps we could just stick with Materialized by adding a note to the
> >> Key/Value serdes setters that they are ignored for FKJoin subscription
> >> stores?
> >>>
> >>> Not as elegant as a new config class, but these config classes actually
> >> bring a fair amount of complexity, so it might be nice to avoid a new
> one.
> >>>
> >>> Thanks,
> >>> John
> >>>
> >>> On Tue, Apr 6, 2021, at 10:28, Marco Aurélio Lotz wrote:
> >>>> Hi John / Guozhang,
> >>>>
> >>>> If I correctly understood John's message, he agrees on having the two
> >>>> scenarios (piggy-back and api extension). In my view, these two
> >> scenarios
> >>>> are separate tasks - the first one is a bug-fix and the second is an
> >>>> improvement on the current API.
> >>>>
> >>>> - bug-fix: On the current API, we change its implementation to piggy
> >> back
> >>>> on the materialization method provided to the materialized parameter.
> >> This
> >>>> way it will not be opinionated anymore and will not force RocksDb
> >>>> persistence for subscription store. Thus an in-memory materialized
> >>>> parameter would imply an in-memory subscription store, for example.
> >> From my
> >>>> understanding, the original implementation tried to be as unopionated
> >>>> towards storage methods as possible - and the current implementation
> is
> >> not
> >>>> allowing that. Was that the case? We would still need to add this
> >>>> modification to the update notes, since it may affect some
> deployments.
> >>>>
> >>>> - improvement: We extend the API to allow a user to fine tune
> different
> >>>> materialization methods for subscription and join store. This is done
> by
> >>>> adding a new parameter to the associated methods.
> >>>>
> >>>> Does it sound reasonable to you Guozhang?
> >>>> On your question, does it make sense for an user to decide retention
> >>>> policies (withRetention method) or caching for subscription stores? I
> >> can
> >>>> see why to finetune Logging for example, but in a first moment not
> these
> >>>> other behaviours. That's why I am unsure about using Materialized
> class.
> >>>>
> >>>> @John, I will update the KIP with your points as soon as we clarify
> >> this.
> >>>>
> >>>> Cheers,
> >>>> Marco
> >>>>
> >>>> On Tue, Apr 6, 2021 at 1:17 AM Guozhang Wang <wangg...@gmail.com>
> >> wrote:
> >>>>
> >>>>> Thanks Marco / John,
> >>>>>
> >>>>> I think the arguments for not piggy-backing on the existing
> >> Materialized
> >>>>> makes sense; on the other hand, if we go this route should we just
> use
> >> a
> >>>>> separate Materialized than using an extended /
> >>>>> narrowed-scoped MaterializedSubscription since it seems we want to
> >> allow
> >>>>> users to fully customize this store?
> >>>>>
> >>>>> Guozhang
> >>>>>
> >>>>> On Thu, Apr 1, 2021 at 5:28 PM John Roesler <vvcep...@apache.org>
> >> wrote:
> >>>>>
> >>>>>> Thanks Marco,
> >>>>>>
> >>>>>> Sorry if I caused any trouble!
> >>>>>>
> >>>>>> I don’t remember what I was thinking before, but reasoning about it
> >> now,
> >>>>>> you might need the fine-grained choice if:
> >>>>>>
> >>>>>> 1. The number or size of records in each partition of both tables is
> >>>>>> small(ish), but the cardinality of the join is very high. Then you
> >> might
> >>>>>> want an in-memory table store, but an on-disk subscription store.
> >>>>>>
> >>>>>> 2. The number or size of records is very large, but the join
> >> cardinality
> >>>>>> is low. Then you might need an on-disk table store, but an in-memory
> >>>>>> subscription store.
> >>>>>>
> >>>>>> 3. You might want a different kind (or differently configured) store
> >> for
> >>>>>> the subscription store, since it’s access pattern is so different.
> >>>>>>
> >>>>>> If you buy these, it might be good to put the justification into the
> >> KIP.
> >>>>>>
> >>>>>> I’m in favor of the default you’ve proposed.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> John
> >>>>>>
> >>>>>> On Mon, Mar 29, 2021, at 04:24, Marco Aurélio Lotz wrote:
> >>>>>>> Hi Guozhang,
> >>>>>>>
> >>>>>>> Apologies for the late answer. Originally that was my proposal - to
> >>>>>>> piggyback on the provided materialisation method (
> >>>>>>> https://issues.apache.org/jira/browse/KAFKA-10383).
> >>>>>>> John Roesler suggested to us to provide even further fine tuning on
> >> API
> >>>>>>> level parameters. Maybe we could see this as two sides of the same
> >>>>> coin:
> >>>>>>>
> >>>>>>> - On the current API, we change it to piggy back on the
> >> materialization
> >>>>>>> method provided to the join store.
> >>>>>>> - We extend the API to allow a user to fine tune different
> >>>>>> materialization
> >>>>>>> methods for subscription and join store.
> >>>>>>>
> >>>>>>> What do you think?
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> Marco
> >>>>>>>
> >>>>>>> On Thu, Mar 4, 2021 at 8:04 PM Guozhang Wang <wangg...@gmail.com>
> >>>>> wrote:
> >>>>>>>
> >>>>>>>> Thanks Marco,
> >>>>>>>>
> >>>>>>>> Just a quick thought: what if we reuse the existing Materialized
> >>>>>> object for
> >>>>>>>> both subscription and join stores, instead of introducing a new
> >>>>> param /
> >>>>>>>> class?
> >>>>>>>>
> >>>>>>>> Guozhang
> >>>>>>>>
> >>>>>>>> On Tue, Mar 2, 2021 at 1:07 AM Marco Aurélio Lotz <
> >>>>>> cont...@marcolotz.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi folks,
> >>>>>>>>>
> >>>>>>>>> I would like to invite everyone to discuss further KIP-718:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-718%3A+Make+KTable+Join+on+Foreign+key+unopinionated
> >>>>>>>>>
> >>>>>>>>> I welcome all feedback on it.
> >>>>>>>>>
> >>>>>>>>> Kind Regards,
> >>>>>>>>> Marco Lotz
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> -- Guozhang
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> -- Guozhang
> >>>>>
> >>>>
> >>
> >
>

Reply via email to