Re: [DISCUSS] KIP-819: Merge multiple KStreams in one operation

Chris Egerton Tue, 29 Mar 2022 21:21:33 -0700

Hi all,

The Named-first variant seems pretty appealing:
    merge(Named named, KStream... streams)
It doesn't align with the existing merge methods, but it does at least
follow precedent set by the (now-deprecated) branch method [1].


A Collection-based alternative seems slightly less appealing, but only
because I'm guessing it'll be more common for the set of to-be-merged
streams to be known at compile time. In that case, the syntactic sugar
provided by a variadic method is preferable to having to wrap your set of
streams in a call to, e.g., Arrays::asList [2].

An issue I have with the split variant:
    merge(KStream first, Named named, KStream... rest)
is that it doesn't seem very intuitive to users who aren't familiar with
Streams and don't have the context of how/when the overloaded variants to
the merge method were introduced.

If we really want things to be consistent, one possibility is to add a
Named-first variant:
    merge(Named named, KStream... streams)
Deprecate the existing Named variant:
    merge(KStream stream, Named named)
And change the existing single-arg merge method:
    merge(KStream stream)
To become variadic (like proposed by Matthias earlier in the thread):
    merge(KStream... streams)

In the end, we'd have three methods, possibly reduced to two by the next
major release:
    merge(Named named, KStream... streams)
    merge(KStream... streams)
    merge(KStream stream, Named named) (deprecated)

RE supporting both a Collection-based and a variadic method: it doesn't
look like this is too common in the Streams API right now and considering
how trivial it is to convert from one style to another in most cases
(either with the aforementioned Arrays::asList to turn a static
compile-time set of streams into a Collection, or with Collection::toArray
[3] to turn a runtime Collection of streams into an array which can be
passed to a variadic method), it doesn't seem worth it to pollute the API
space with multiple methods that provide the exact same behavior.

[1] -
https://kafka.apache.org/31/javadoc/org/apache/kafka/streams/kstream/KStream.html#branch(org.apache.kafka.streams.kstream.Named,org.apache.kafka.streams.kstream.Predicate..
.)
[2] -
https://docs.oracle.com/javase/8/docs/api/java/util/Arrays.html#asList-T...-
[3] -
https://docs.oracle.com/javase/8/docs/api/java/util/Collection.html#toArray-T:A-

Cheers,

Chris

On Tue, Mar 29, 2022 at 11:14 PM Matthias J. Sax <mj...@apache.org> wrote:

> My understand was, that the original proposal was to have:
>
>    merge(KStream stream);
>    merge(KStream... streams);
>
> Maybe I misunderstood.
>
>
> I am not really a fan of
>
>    merge(KStream stream, KStream... streams);
>
> because it seem to break the `Collection` pattern. If I have a
> collection of KStreams, I need to artificially extract one and pass as
> first argument and pass all others using the second argument.
>
> On the other hand, _if_ I have a collection, going back to the original
> proposal of
>
>    merge(Collection<KStream> streams);
>
> would work, too. Maybe bottom line is, that we might want to have both
> (`Collection` and vararg) to optimize for both cases? On the other hand
> it feels rather redundant? Also not sure if both are compatible?
>
>
>
> The issue with Named is interesting. Wondering if we should just flip
> the argument order:
>
>    merge(Named name, KStream... streams);
>
> Falling back to `Collection` would also avoid this issue.
>
>
>
> -Matthias
>
>
> On 3/29/22 1:43 AM, Nick Telford wrote:
> > Yeah, the Named parameter makes it a little trickier. My suggestion would
> > be to add an additional overload that looks like:
> >
> > KStream<K, V> merged(KStream<K, V> first, Named named, KStream<K, V>
> rest);
> >
> > It's not ideal having the Named parameter split the other parameters; we
> > could alternatively move the Named parameter to be first, but then that
> > wouldn't align with the rest of the API.
> >
> > Nick
> >
> > On Tue, 29 Mar 2022 at 05:20, Chris Egerton <fearthecel...@gmail.com>
> wrote:
> >
> >> Hi all,
> >>
> >> Java permits the overload. Simple test class to demonstrate:
> >>
> >> ```
> >> public class Test {
> >>      private final String field;
> >>
> >>      public Test(String field) {
> >>          this.field = field;
> >>      }
> >>
> >>      public Test merge(Test that) {
> >>          return new Test("Single-arg merge: " + this.field + ", " +
> >> that.field);
> >>      }
> >>
> >>      public Test merge(Test that, Test... those) {
> >>          String newField = "Varargs merge: " + this.field + ", " +
> >> that.field;
> >>          for (Test test : those) newField += ", " + test.field;
> >>          return new Test(newField);
> >>      }
> >>
> >>      public static void main(String[] args) {
> >>          Test t1 = new Test("t1"), t2 = new Test("t2"), t3 = new
> Test("t3");
> >>          Test merge1 = t1.merge(t2), merge2 = t1.merge(t2, t3);
> >>          System.out.println(merge1.field); // Single-arg merge: t1, t2
> >>          System.out.println(merge2.field); // Varargs merge: t1, t2, t3
> >>      }
> >> }
> >> ```
> >>
> >> There's a great StackOverflow writeup on the subject [1], which explains
> >> that during method resolution, priority is given to methods whose
> >> signatures match the argument list without taking boxing/unboxing or
> >> varargs into consideration:
> >>
> >>> The first phase performs overload resolution without permitting boxing
> or
> >> unboxing conversion, or the use of variable arity method invocation. If
> no
> >> applicable method is found during this phase then processing continues
> to
> >> the second phase.
> >>> The second phase performs overload resolution while allowing boxing and
> >> unboxing, but still precludes the use of variable arity method
> invocation.
> >> If no applicable method is found during this phase then processing
> >> continues to the third phase.
> >>> The third phase allows overloading to be combined with variable arity
> >> methods, boxing, and unboxing.
> >>
> >> I'm curious if it's worth keeping a variant that accepts a Named
> parameter?
> >> Might be tricky to accommodate since variadic arguments have to be last.
> >>
> >> [1] - https://stackoverflow.com/a/48850722
> >>
> >> Cheers,
> >>
> >> Chris
> >>
> >> On Mon, Mar 28, 2022 at 11:46 PM Matthias J. Sax <mj...@apache.org>
> wrote:
> >>
> >>> I think Java does not allow to have both overloads, because it would
> >>> result in ambiguity?
> >>>
> >>> If you call `s1.merge(s2)` it's unclear which method you want to call.
> >>>
> >>>
> >>> -Matthias
> >>>
> >>>
> >>> On 3/28/22 7:20 AM, Nick Telford wrote:
> >>>> Hi Matthias,
> >>>>
> >>>> How about instead of changing the signature of the existing method to
> >>>> variadic, we simply add a new overload which takes variadic args:
> >>>>
> >>>> KStream<K, V> merge(KStream<K, V> first, KStream<K, V>... rest);
> >>>>
> >>>> That way, we maintain both source *and* binary compatibility for the
> >>>> existing method, and we can enforce that there is always at least one
> >>>> stream (argument) being merged.
> >>>>
> >>>> I'm fine dropping the static methods. As you said, this is mostly all
> >>> just
> >>>> syntax sugar anyway, but I do think allowing multiple streams to be
> >>> merged
> >>>> together is a benefit. My motivation was that we generate diagrams for
> >>> our
> >>>> Topologies, and having several binary merges becomes quite messy when
> a
> >>>> single n-ary merge is what you're really modelling.
> >>>>
> >>>> Regards,
> >>>>
> >>>> Nick
> >>>>
> >>>> On Thu, 24 Mar 2022 at 21:24, Matthias J. Sax <mj...@apache.org>
> >> wrote:
> >>>>
> >>>>> Thanks for proposing this KIP.
> >>>>>
> >>>>> I feel a little bit torn by the idea. In general, we try to keep the
> >>>>> surface area small, and only add APIs that delivery (significant)
> >> value.
> >>>>>
> >>>>> It seems the current proposal is more or less about syntactic sugar,
> >>>>> what can still be valuable, but I am not really sure about it.
> >>>>>
> >>>>> I am also wondering, if we could use a variadic argument instead of a
> >>>>> `Collection`:
> >>>>>
> >>>>>        KStream<K, V> merge(KStream<K, V>... streams);
> >>>>>
> >>>>> This way, we could just replace the existing method in a backward
> >>>>> compatible way (well, source code compatible only) and thus not
> >> increase
> >>>>> the surface area of the API while still achieving your goal?
> >>>>>
> >>>>> A `merge()` with zero argument would just be a no-op (same as for
> >> using
> >>>>> `Collection` I assume?).
> >>>>>
> >>>>>
> >>>>> For adding the static methods: It seems not to be a common pattern to
> >>>>> me? I might be better not to add them and leave it to users to write
> a
> >>>>> small helper method themselves if they have such a pattern?
> >>>>>
> >>>>>
> >>>>> -Matthias
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 1/31/22 7:35 AM, Nick Telford wrote:
> >>>>>> Hi everyone,
> >>>>>>
> >>>>>> I'd like to discuss KIP 819:
> >>>>>>
> >>>>>
> >>>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-819%3A+Merge+multiple+KStreams+in+one+operation
> >>>>>>
> >>>>>> This is a simple KIP that adds/modifies the KStream#merge API to
> >> enable
> >>>>>> many streams to be merged in a single graph node.
> >>>>>>
> >>>>>> Regards,
> >>>>>>
> >>>>>> Nick Telford
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>

Re: [DISCUSS] KIP-819: Merge multiple KStreams in one operation

Reply via email to