Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-28 Thread Matthias J. Sax
sed in DSL, I agree both has their arguments:
>>>>
>>>> 1. On one side, people using the DSL layer probably do not need to be aware
>>>> (or rather, "learn about") of the "topology" concept, although this concept
>>>> is a publicly exposed one in Kafka Streams.
>>>>
>>>> 2. On the other side, StreamsBuilder#build() returning a Topology object
>>>> sounds a little weird, at least to me (admittedly subjective matter).
>>>>
>>>>
>>>> Since the second bullet point seems to be more "subjective" and many people
>>>> are not worried about it, I'm OK to go with the other option.
>>>>
>>>>
>>>> Guozhang
>>>>
>>>>
>>>> On Wed, Mar 22, 2017 at 8:58 AM, Michael Noll <mich...@confluent.io>
>>>> wrote:
>>>>
>>>>> Forwarding to kafka-user.
>>>>>
>>>>>
>>>>> -- Forwarded message --
>>>>> From: Michael Noll <mich...@confluent.io>
>>>>> Date: Wed, Mar 22, 2017 at 8:48 AM
>>>>> Subject: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API
>>>>> To: d...@kafka.apache.org
>>>>>
>>>>>
>>>>> Matthias,
>>>>>
>>>>>> @Michael:
>>>>>>
>>>>>> You seemed to agree with Jay about not exposing the `Topology` concept
>>>>>> in our main entry class (ie, current KStreamBuilder), thus, I
>>>>>> interpreted that you do not want `Topology` in the name either (I am a
>>>>>> little surprised by your last response, that goes the opposite
>>>>> direction).
>>>>>
>>>>> Oh, sorry for not being clear.
>>>>>
>>>>> What I wanted to say in my earlier email was the following:  Yes, I do
>>>>> agree with most of Jay's reasoning, notably about carefully deciding how
>>>>> much and which parts of the API/concept "surface" we expose to users of
>>>> the
>>>>> DSL.  However, and this is perhaps where I wasn't very clear, I disagree
>>>> on
>>>>> the particular opinion about not exposing the topology concept to DSL
>>>>> users.  Instead, I think the concept of a topology is important to
>>>>> understand even for DSL users -- particularly because of the way the DSL
>>>> is
>>>>> currently wiring your processing logic via the builder pattern.  (As I
>>>>> noted, e.g. Akka uses a different approach where you might be able to get
>>>>> away with not exposing the "topology" concept, but even in Akka there's
>>>> the
>>>>> notion of graphs and flows.)
>>>>>
>>>>>
>>>>>>> StreamsBuilder builder = new StreamsBuilder();
>>>>>>>
>>>>>>> // And here you'd define your...well, what actually?
>>>>>>> // Ah right, you are composing a topology here, though you are
>>>> not
>>>>>>> aware of it.
>>>>>>
>>>>>> Yes. You are not aware of if -- that's the whole point about it --
>>>> don't
>>>>>> put the Topology concept in the focus...
>>>>>
>>>>> Let me turn this around, because that was my point: it's confusing to
>>>> have
>>>>> a name "StreamsBuilder" if that thing isn't building streams, and it is
>>>>> not.
>>>>>
>>>>> As I mentioned before, I do think it is a benefit to make it clear to DSL
>>>>> users that there are two aspects at play: (1) defining the logic/plan of
>>>>> your processing, and (2) the execution of that plan.  I have a less
>>>> strong
>>>>> opinion whether or not having "topology" in the names would help to
>>>>> communicate this separation as well as combination of (1) and (2) to make
>>>>> your app work as expected.
>>>>>
>>>>> If we stick with `KafkaStreams` for (2) *and* don't like having
>>>> "topology"
>>>>> in the name, then perhaps we should rename `KStreamBuilder` to
>>>>> `KafkaStreamsBuilder`.  That at least gives some illusion of a combo of
>>>> (1)
>>>>> and (2).  IMHO, `KafkaStreamsBuilder` highlights better that "it is a
>>>>> builder/h

Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-27 Thread Matthias J. Sax
Hi,

I would like to trigger this discussion again. It seems that the naming
question is rather subjective and both main alternatives (w/ or w/o the
word "Topology" in the name) have pros/cons.

If you have any further thought, please share it. At the moment I still
propose `StreamsBuilder` in the KIP.

I also want do point out, that the VOTE thread was already started. So
if you like the current KIP, please cast your vote there.


Thanks a lot!


-Matthias


On 3/23/17 3:38 PM, Matthias J. Sax wrote:
> Jay,
> 
> about the naming schema:
> 
>>>1. "kstreams" - the DSL
>>>2. "processor api" - the lower level callback/topology api
>>>3. KStream/KTable - entities in the kstreams dsl
>>>4. "Kafka Streams" - General name for stream processing stuff in Kafka,
>>>including both kstreams and the processor API plus the underlying
>>>implementation.
> 
> It think this terminology has some issues... To me, `kstreams` was
> always not more than an abbreviation for `Kafka Streams` -- thus (1) and
> (4) kinda collide here. Following questions on the mailing list etc I
> often see people using kstreams or kstream exactly a abbr. for "Kafka
> Streams"
> 
>> I think referring to the dsl as "kstreams" is cute and pneumonic and not
>> particularly confusing.
> 
> I disagree here. It's a very subtle difference between `kstreams` and
> `KStream` -- just singular/plural, thus (1) and (3) also "collide" --
> it's just too close to each other.
> 
> Thus, I really think it's a good idea to get a new name for the DSL to
> get a better separation of the 4 concepts.
> 
> Furthermore, we use the term "Streams API". Thus, I think
> `StreamsBuilder` (or `StreamsTopologyBuilder`) are both very good names.
> 
> 
> Thus, I prefer to keep the KIP as is (suggesting `StreamsBuilder`).
> 
> I will start a VOTE thread. Of course, we can still discuss the naming
> issue. :)
> 
> 
> 
> -Matthias
> 
> 
> On 3/22/17 8:53 PM, Jay Kreps wrote:
>> I don't feel strongly on this, so I'm happy with whatever everyone else
>> wants.
>>
>> Michael, I'm not arguing that people don't need to understand topologies, I
>> just think it is like rocks db, you need to understand it when
>> debugging/operating but not in the initial coding since the metaphor we're
>> providing at this layer isn't a topology of processors but rather something
>> like the collections api. Anyhow it won't hurt people to have it there.
>>
>> For the original KStreamBuilder thing, I think that came from the naming we
>> discussed originally:
>>
>>1. "kstreams" - the DSL
>>2. "processor api" - the lower level callback/topology api
>>3. KStream/KTable - entities in the kstreams dsl
>>4. "Kafka Streams" - General name for stream processing stuff in Kafka,
>>including both kstreams and the processor API plus the underlying
>>implementation.
>>
>> I think referring to the dsl as "kstreams" is cute and pneumonic and not
>> particularly confusing. Just like referring to the "java collections
>> library" isn't confusing even though it contains the Iterator interface
>> which is not actually itself a collection.
>>
>> So I think KStreamBuilder should technically have been KstreamsBuilder and
>> is intended not to be a builder of a KStream but rather the builder for the
>> kstreams DSL. Okay, yes, that *is* slightly confusing. :-)
>>
>> -Jay
>>
>> On Wed, Mar 22, 2017 at 11:25 AM, Guozhang Wang <wangg...@gmail.com> wrote:
>>
>>> Regarding the naming of `StreamsTopologyBuilder` v.s. `StreamsBuilder` that
>>> are going to be used in DSL, I agree both has their arguments:
>>>
>>> 1. On one side, people using the DSL layer probably do not need to be aware
>>> (or rather, "learn about") of the "topology" concept, although this concept
>>> is a publicly exposed one in Kafka Streams.
>>>
>>> 2. On the other side, StreamsBuilder#build() returning a Topology object
>>> sounds a little weird, at least to me (admittedly subjective matter).
>>>
>>>
>>> Since the second bullet point seems to be more "subjective" and many people
>>> are not worried about it, I'm OK to go with the other option.
>>>
>>>
>>> Guozhang
>>>
>>>
>>> On Wed, Mar 22, 2017 at 8:58 AM, Michael Noll <mich...@confluent.io>
>>> wrote:
>>>
>>>> Forwarding to kafka-user.
>

Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-23 Thread Matthias J. Sax
Jay,

about the naming schema:

>>1. "kstreams" - the DSL
>>2. "processor api" - the lower level callback/topology api
>>3. KStream/KTable - entities in the kstreams dsl
>>4. "Kafka Streams" - General name for stream processing stuff in Kafka,
>>including both kstreams and the processor API plus the underlying
>>implementation.

It think this terminology has some issues... To me, `kstreams` was
always not more than an abbreviation for `Kafka Streams` -- thus (1) and
(4) kinda collide here. Following questions on the mailing list etc I
often see people using kstreams or kstream exactly a abbr. for "Kafka
Streams"

> I think referring to the dsl as "kstreams" is cute and pneumonic and not
> particularly confusing.

I disagree here. It's a very subtle difference between `kstreams` and
`KStream` -- just singular/plural, thus (1) and (3) also "collide" --
it's just too close to each other.

Thus, I really think it's a good idea to get a new name for the DSL to
get a better separation of the 4 concepts.

Furthermore, we use the term "Streams API". Thus, I think
`StreamsBuilder` (or `StreamsTopologyBuilder`) are both very good names.


Thus, I prefer to keep the KIP as is (suggesting `StreamsBuilder`).

I will start a VOTE thread. Of course, we can still discuss the naming
issue. :)



-Matthias


On 3/22/17 8:53 PM, Jay Kreps wrote:
> I don't feel strongly on this, so I'm happy with whatever everyone else
> wants.
> 
> Michael, I'm not arguing that people don't need to understand topologies, I
> just think it is like rocks db, you need to understand it when
> debugging/operating but not in the initial coding since the metaphor we're
> providing at this layer isn't a topology of processors but rather something
> like the collections api. Anyhow it won't hurt people to have it there.
> 
> For the original KStreamBuilder thing, I think that came from the naming we
> discussed originally:
> 
>1. "kstreams" - the DSL
>2. "processor api" - the lower level callback/topology api
>3. KStream/KTable - entities in the kstreams dsl
>4. "Kafka Streams" - General name for stream processing stuff in Kafka,
>including both kstreams and the processor API plus the underlying
>implementation.
> 
> I think referring to the dsl as "kstreams" is cute and pneumonic and not
> particularly confusing. Just like referring to the "java collections
> library" isn't confusing even though it contains the Iterator interface
> which is not actually itself a collection.
> 
> So I think KStreamBuilder should technically have been KstreamsBuilder and
> is intended not to be a builder of a KStream but rather the builder for the
> kstreams DSL. Okay, yes, that *is* slightly confusing. :-)
> 
> -Jay
> 
> On Wed, Mar 22, 2017 at 11:25 AM, Guozhang Wang <wangg...@gmail.com> wrote:
> 
>> Regarding the naming of `StreamsTopologyBuilder` v.s. `StreamsBuilder` that
>> are going to be used in DSL, I agree both has their arguments:
>>
>> 1. On one side, people using the DSL layer probably do not need to be aware
>> (or rather, "learn about") of the "topology" concept, although this concept
>> is a publicly exposed one in Kafka Streams.
>>
>> 2. On the other side, StreamsBuilder#build() returning a Topology object
>> sounds a little weird, at least to me (admittedly subjective matter).
>>
>>
>> Since the second bullet point seems to be more "subjective" and many people
>> are not worried about it, I'm OK to go with the other option.
>>
>>
>> Guozhang
>>
>>
>> On Wed, Mar 22, 2017 at 8:58 AM, Michael Noll <mich...@confluent.io>
>> wrote:
>>
>>> Forwarding to kafka-user.
>>>
>>>
>>> -- Forwarded message --
>>> From: Michael Noll <mich...@confluent.io>
>>> Date: Wed, Mar 22, 2017 at 8:48 AM
>>> Subject: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API
>>> To: d...@kafka.apache.org
>>>
>>>
>>> Matthias,
>>>
>>>> @Michael:
>>>>
>>>> You seemed to agree with Jay about not exposing the `Topology` concept
>>>> in our main entry class (ie, current KStreamBuilder), thus, I
>>>> interpreted that you do not want `Topology` in the name either (I am a
>>>> little surprised by your last response, that goes the opposite
>>> direction).
>>>
>>> Oh, sorry for not being clear.
>>>
>>> What I wanted to say in my earlier email was the following:  Yes, I do
>>

Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-22 Thread Jay Kreps
I don't feel strongly on this, so I'm happy with whatever everyone else
wants.

Michael, I'm not arguing that people don't need to understand topologies, I
just think it is like rocks db, you need to understand it when
debugging/operating but not in the initial coding since the metaphor we're
providing at this layer isn't a topology of processors but rather something
like the collections api. Anyhow it won't hurt people to have it there.

For the original KStreamBuilder thing, I think that came from the naming we
discussed originally:

   1. "kstreams" - the DSL
   2. "processor api" - the lower level callback/topology api
   3. KStream/KTable - entities in the kstreams dsl
   4. "Kafka Streams" - General name for stream processing stuff in Kafka,
   including both kstreams and the processor API plus the underlying
   implementation.

I think referring to the dsl as "kstreams" is cute and pneumonic and not
particularly confusing. Just like referring to the "java collections
library" isn't confusing even though it contains the Iterator interface
which is not actually itself a collection.

So I think KStreamBuilder should technically have been KstreamsBuilder and
is intended not to be a builder of a KStream but rather the builder for the
kstreams DSL. Okay, yes, that *is* slightly confusing. :-)

-Jay

On Wed, Mar 22, 2017 at 11:25 AM, Guozhang Wang <wangg...@gmail.com> wrote:

> Regarding the naming of `StreamsTopologyBuilder` v.s. `StreamsBuilder` that
> are going to be used in DSL, I agree both has their arguments:
>
> 1. On one side, people using the DSL layer probably do not need to be aware
> (or rather, "learn about") of the "topology" concept, although this concept
> is a publicly exposed one in Kafka Streams.
>
> 2. On the other side, StreamsBuilder#build() returning a Topology object
> sounds a little weird, at least to me (admittedly subjective matter).
>
>
> Since the second bullet point seems to be more "subjective" and many people
> are not worried about it, I'm OK to go with the other option.
>
>
> Guozhang
>
>
> On Wed, Mar 22, 2017 at 8:58 AM, Michael Noll <mich...@confluent.io>
> wrote:
>
> > Forwarding to kafka-user.
> >
> >
> > ------ Forwarded message --
> > From: Michael Noll <mich...@confluent.io>
> > Date: Wed, Mar 22, 2017 at 8:48 AM
> > Subject: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API
> > To: d...@kafka.apache.org
> >
> >
> > Matthias,
> >
> > > @Michael:
> > >
> > > You seemed to agree with Jay about not exposing the `Topology` concept
> > > in our main entry class (ie, current KStreamBuilder), thus, I
> > > interpreted that you do not want `Topology` in the name either (I am a
> > > little surprised by your last response, that goes the opposite
> > direction).
> >
> > Oh, sorry for not being clear.
> >
> > What I wanted to say in my earlier email was the following:  Yes, I do
> > agree with most of Jay's reasoning, notably about carefully deciding how
> > much and which parts of the API/concept "surface" we expose to users of
> the
> > DSL.  However, and this is perhaps where I wasn't very clear, I disagree
> on
> > the particular opinion about not exposing the topology concept to DSL
> > users.  Instead, I think the concept of a topology is important to
> > understand even for DSL users -- particularly because of the way the DSL
> is
> > currently wiring your processing logic via the builder pattern.  (As I
> > noted, e.g. Akka uses a different approach where you might be able to get
> > away with not exposing the "topology" concept, but even in Akka there's
> the
> > notion of graphs and flows.)
> >
> >
> > > > StreamsBuilder builder = new StreamsBuilder();
> > > >
> > > > // And here you'd define your...well, what actually?
> > > > // Ah right, you are composing a topology here, though you are
> not
> > > > aware of it.
> > >
> > > Yes. You are not aware of if -- that's the whole point about it --
> don't
> > > put the Topology concept in the focus...
> >
> > Let me turn this around, because that was my point: it's confusing to
> have
> > a name "StreamsBuilder" if that thing isn't building streams, and it is
> > not.
> >
> > As I mentioned before, I do think it is a benefit to make it clear to DSL
> > users that there are two aspects at play: (1) defining the logic/plan of
> > your processing, and (2) the execution of that plan.  I have a less
> strong
> > opin

Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-22 Thread Guozhang Wang
Regarding the naming of `StreamsTopologyBuilder` v.s. `StreamsBuilder` that
are going to be used in DSL, I agree both has their arguments:

1. On one side, people using the DSL layer probably do not need to be aware
(or rather, "learn about") of the "topology" concept, although this concept
is a publicly exposed one in Kafka Streams.

2. On the other side, StreamsBuilder#build() returning a Topology object
sounds a little weird, at least to me (admittedly subjective matter).


Since the second bullet point seems to be more "subjective" and many people
are not worried about it, I'm OK to go with the other option.


Guozhang


On Wed, Mar 22, 2017 at 8:58 AM, Michael Noll <mich...@confluent.io> wrote:

> Forwarding to kafka-user.
>
>
> -- Forwarded message --
> From: Michael Noll <mich...@confluent.io>
> Date: Wed, Mar 22, 2017 at 8:48 AM
> Subject: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API
> To: d...@kafka.apache.org
>
>
> Matthias,
>
> > @Michael:
> >
> > You seemed to agree with Jay about not exposing the `Topology` concept
> > in our main entry class (ie, current KStreamBuilder), thus, I
> > interpreted that you do not want `Topology` in the name either (I am a
> > little surprised by your last response, that goes the opposite
> direction).
>
> Oh, sorry for not being clear.
>
> What I wanted to say in my earlier email was the following:  Yes, I do
> agree with most of Jay's reasoning, notably about carefully deciding how
> much and which parts of the API/concept "surface" we expose to users of the
> DSL.  However, and this is perhaps where I wasn't very clear, I disagree on
> the particular opinion about not exposing the topology concept to DSL
> users.  Instead, I think the concept of a topology is important to
> understand even for DSL users -- particularly because of the way the DSL is
> currently wiring your processing logic via the builder pattern.  (As I
> noted, e.g. Akka uses a different approach where you might be able to get
> away with not exposing the "topology" concept, but even in Akka there's the
> notion of graphs and flows.)
>
>
> > > StreamsBuilder builder = new StreamsBuilder();
> > >
> > > // And here you'd define your...well, what actually?
> > > // Ah right, you are composing a topology here, though you are not
> > > aware of it.
> >
> > Yes. You are not aware of if -- that's the whole point about it -- don't
> > put the Topology concept in the focus...
>
> Let me turn this around, because that was my point: it's confusing to have
> a name "StreamsBuilder" if that thing isn't building streams, and it is
> not.
>
> As I mentioned before, I do think it is a benefit to make it clear to DSL
> users that there are two aspects at play: (1) defining the logic/plan of
> your processing, and (2) the execution of that plan.  I have a less strong
> opinion whether or not having "topology" in the names would help to
> communicate this separation as well as combination of (1) and (2) to make
> your app work as expected.
>
> If we stick with `KafkaStreams` for (2) *and* don't like having "topology"
> in the name, then perhaps we should rename `KStreamBuilder` to
> `KafkaStreamsBuilder`.  That at least gives some illusion of a combo of (1)
> and (2).  IMHO, `KafkaStreamsBuilder` highlights better that "it is a
> builder/helper for the Kafka Streams API", rather than "a builder for
> streams".
>
> Also, I think some of the naming challenges we're discussing here are
> caused by having this builder pattern in the first place.  If the Streams
> API was implemented in Scala, for example, we could use implicits for
> helping us to "stitch streams/tables together to build the full topology",
> thus using a different (better?) approach to composing your topologies that
> through a builder pattern.  So: perhaps there's a better way then the
> builder, and that way would also be clearer on terminology?  That said,
> this might take this KIP off-scope.
>
> -Michael
>
>
>
>
> On Wed, Mar 22, 2017 at 12:33 AM, Matthias J. Sax <matth...@confluent.io>
> wrote:
>
> > @Guozhang:
> >
> > I recognized that you want to have `Topology` in the name. But it seems
> > that more people preferred to not have it (Jay, Ram, Michael [?],
> myself).
> >
> > @Michael:
> >
> > You seemed to agree with Jay about not exposing the `Topology` concept
> > in our main entry class (ie, current KStreamBuilder), thus, I
> > interpreted that you do not want `Topology` in the name either (I am a
> > 

Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-22 Thread Michael Noll
Forwarding to kafka-user.


-- Forwarded message --
From: Michael Noll <mich...@confluent.io>
Date: Wed, Mar 22, 2017 at 8:48 AM
Subject: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API
To: d...@kafka.apache.org


Matthias,

> @Michael:
>
> You seemed to agree with Jay about not exposing the `Topology` concept
> in our main entry class (ie, current KStreamBuilder), thus, I
> interpreted that you do not want `Topology` in the name either (I am a
> little surprised by your last response, that goes the opposite direction).

Oh, sorry for not being clear.

What I wanted to say in my earlier email was the following:  Yes, I do
agree with most of Jay's reasoning, notably about carefully deciding how
much and which parts of the API/concept "surface" we expose to users of the
DSL.  However, and this is perhaps where I wasn't very clear, I disagree on
the particular opinion about not exposing the topology concept to DSL
users.  Instead, I think the concept of a topology is important to
understand even for DSL users -- particularly because of the way the DSL is
currently wiring your processing logic via the builder pattern.  (As I
noted, e.g. Akka uses a different approach where you might be able to get
away with not exposing the "topology" concept, but even in Akka there's the
notion of graphs and flows.)


> > StreamsBuilder builder = new StreamsBuilder();
> >
> > // And here you'd define your...well, what actually?
> > // Ah right, you are composing a topology here, though you are not
> > aware of it.
>
> Yes. You are not aware of if -- that's the whole point about it -- don't
> put the Topology concept in the focus...

Let me turn this around, because that was my point: it's confusing to have
a name "StreamsBuilder" if that thing isn't building streams, and it is not.

As I mentioned before, I do think it is a benefit to make it clear to DSL
users that there are two aspects at play: (1) defining the logic/plan of
your processing, and (2) the execution of that plan.  I have a less strong
opinion whether or not having "topology" in the names would help to
communicate this separation as well as combination of (1) and (2) to make
your app work as expected.

If we stick with `KafkaStreams` for (2) *and* don't like having "topology"
in the name, then perhaps we should rename `KStreamBuilder` to
`KafkaStreamsBuilder`.  That at least gives some illusion of a combo of (1)
and (2).  IMHO, `KafkaStreamsBuilder` highlights better that "it is a
builder/helper for the Kafka Streams API", rather than "a builder for
streams".

Also, I think some of the naming challenges we're discussing here are
caused by having this builder pattern in the first place.  If the Streams
API was implemented in Scala, for example, we could use implicits for
helping us to "stitch streams/tables together to build the full topology",
thus using a different (better?) approach to composing your topologies that
through a builder pattern.  So: perhaps there's a better way then the
builder, and that way would also be clearer on terminology?  That said,
this might take this KIP off-scope.

-Michael




On Wed, Mar 22, 2017 at 12:33 AM, Matthias J. Sax <matth...@confluent.io>
wrote:

> @Guozhang:
>
> I recognized that you want to have `Topology` in the name. But it seems
> that more people preferred to not have it (Jay, Ram, Michael [?], myself).
>
> @Michael:
>
> You seemed to agree with Jay about not exposing the `Topology` concept
> in our main entry class (ie, current KStreamBuilder), thus, I
> interpreted that you do not want `Topology` in the name either (I am a
> little surprised by your last response, that goes the opposite direction).
>
> > StreamsBuilder builder = new StreamsBuilder();
> >
> > // And here you'd define your...well, what actually?
> > // Ah right, you are composing a topology here, though you are not
> > aware of it.
>
> Yes. You are not aware of if -- that's the whole point about it -- don't
> put the Topology concept in the focus...
>
> Furthermore,
>
> >>> So what are you building here with StreamsBuilder?  Streams (hint: No)?
> >>> And what about tables -- is there a TableBuilder (hint: No)?
>
> I am not sure, if this is too much a concern. In contrast to
> `KStreamBuilder` (singular) that contains `KStream` and thus puts
> KStream concept in focus and thus degrade `KTable`, `StreamsBuilder`
> (plural) focuses on "Streams API". IMHO, it does not put focus on
> KStream. It's just a builder from the Streams API -- you don't need to
> worry what you are building -- and you don't need to think about the
> `Topology` concept (of course, you see that .build() return a Topology).
>
>
> P

Re: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-21 Thread Guozhang Wang
Just to clarify, I did want to have the term `Topology` as part of the
class name, for the reasons above. I'm not too worried about to be
consistent with the previous names, but I feel the `XXTopologyBuilder` is
better than `XXStreamsBuilder` since it's build() function returns a
Topology object.


Guozhang


On Mon, Mar 20, 2017 at 12:53 PM, Michael Noll <mich...@confluent.io> wrote:

> Hmm, I must admit I don't like this last update all too much.
>
> Basically we would have:
>
> StreamsBuilder builder = new StreamsBuilder();
>
> // And here you'd define your...well, what actually?
> // Ah right, you are composing a topology here, though you are not
> aware of it.
>
> KafkaStreams streams = new KafkaStreams(builder.build(),
> streamsConfiguration);
>
> So what are you building here with StreamsBuilder?  Streams (hint: No)?
> And what about tables -- is there a TableBuilder (hint: No)?
>
> I also interpret Guozhang's last response as that he'd prefer to have
> "Topology" in the class/interface names.  I am aware that we shouldn't
> necessarily use the status quo to make decisions about future changes, but
> the very first concept we explain in the Kafka Streams documentation is
> "Stream Processing Topology":
> https://kafka.apache.org/0102/documentation/streams#streams_concepts
>
> -Michael
>
>
>
> On Mon, Mar 20, 2017 at 7:55 PM, Matthias J. Sax <matth...@confluent.io>
> wrote:
>
> > \cc users list
> >
> >
> >  Forwarded Message 
> > Subject: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API
> > Date: Mon, 20 Mar 2017 11:51:01 -0700
> > From: Matthias J. Sax <matth...@confluent.io>
> > Organization: Confluent Inc
> > To: d...@kafka.apache.org
> >
> > I want to push this discussion further.
> >
> > Guozhang's argument about "exposing" the Topology class is valid. It's a
> > public class anyway, so it's not as issue. However, I think the question
> > is not too much about exposing but about "advertising" (ie, putting it
> > into the focus) or not at DSL level.
> >
> >
> > If I interpret the last replies correctly, it seems that we could agree
> > on "StreamsBuilder" as name. I did update the KIP accordingly. Please
> > correct me, if I got this wrong.
> >
> >
> > If there are not other objects -- this naming discussion was the last
> > open point to far -- I would like the start the VOTE thread.
> >
> >
> > -Matthias
> >
> >
> > On 3/14/17 2:37 PM, Guozhang Wang wrote:
> > > I'd like to keep the term "Topology" inside the builder class since, as
> > > Matthias mentioned, this builder#build() function returns a "Topology"
> > > object, whose type is a public class anyways. Although you can argue to
> > let
> > > users always call
> > >
> > > "new KafkaStreams(builder.build())"
> > >
> > > I think it is still more benefit to expose this concept.
> > >
> > >
> > >
> > > Guozhang
> > >
> > > On Tue, Mar 14, 2017 at 10:43 AM, Matthias J. Sax <
> matth...@confluent.io
> > >
> > > wrote:
> > >
> > >> Thanks for your input Michael.
> > >>
> > >>>> - KafkaStreams as the new name for the builder that creates the
> > logical
> > >>>> plan, with e.g. `KafkaStreams.stream("intput-topic")` and
> > >>>> `KafkaStreams.table("input-topic")`.
> > >>
> > >> I don't thinks this is a good idea, for multiple reasons:
> > >>
> > >> (1) We would reuse a name for a completely different purpose. The same
> > >> argument for not renaming KStreamBuilder to TopologyBuilder. The
> > >> confusion would just be too large.
> > >>
> > >> So if we would start from scratch, it might be ok to do so, but now we
> > >> cannot make this move, IMHO.
> > >>
> > >> Also a clarification question: do you suggest to have static methods
> > >> #stream and #table -- I am not sure if this would work?
> > >> (or was you code snippet just simplification?)
> > >>
> > >>
> > >> (2) Kafka Streams is basically a "processing client" next to consumer
> > >> and producer client. Thus, the name KafkaStreams aligns to the naming
> > >> schema of KafkaConsumer and KafkaProducer. I am not sure if it would
> be
> > >> a good choice to "break" th

Re: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-20 Thread Michael Noll
Hmm, I must admit I don't like this last update all too much.

Basically we would have:

StreamsBuilder builder = new StreamsBuilder();

// And here you'd define your...well, what actually?
// Ah right, you are composing a topology here, though you are not
aware of it.

KafkaStreams streams = new KafkaStreams(builder.build(),
streamsConfiguration);

So what are you building here with StreamsBuilder?  Streams (hint: No)?
And what about tables -- is there a TableBuilder (hint: No)?

I also interpret Guozhang's last response as that he'd prefer to have
"Topology" in the class/interface names.  I am aware that we shouldn't
necessarily use the status quo to make decisions about future changes, but
the very first concept we explain in the Kafka Streams documentation is
"Stream Processing Topology":
https://kafka.apache.org/0102/documentation/streams#streams_concepts

-Michael



On Mon, Mar 20, 2017 at 7:55 PM, Matthias J. Sax <matth...@confluent.io>
wrote:

> \cc users list
>
>
>  Forwarded Message --------
> Subject: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API
> Date: Mon, 20 Mar 2017 11:51:01 -0700
> From: Matthias J. Sax <matth...@confluent.io>
> Organization: Confluent Inc
> To: d...@kafka.apache.org
>
> I want to push this discussion further.
>
> Guozhang's argument about "exposing" the Topology class is valid. It's a
> public class anyway, so it's not as issue. However, I think the question
> is not too much about exposing but about "advertising" (ie, putting it
> into the focus) or not at DSL level.
>
>
> If I interpret the last replies correctly, it seems that we could agree
> on "StreamsBuilder" as name. I did update the KIP accordingly. Please
> correct me, if I got this wrong.
>
>
> If there are not other objects -- this naming discussion was the last
> open point to far -- I would like the start the VOTE thread.
>
>
> -Matthias
>
>
> On 3/14/17 2:37 PM, Guozhang Wang wrote:
> > I'd like to keep the term "Topology" inside the builder class since, as
> > Matthias mentioned, this builder#build() function returns a "Topology"
> > object, whose type is a public class anyways. Although you can argue to
> let
> > users always call
> >
> > "new KafkaStreams(builder.build())"
> >
> > I think it is still more benefit to expose this concept.
> >
> >
> >
> > Guozhang
> >
> > On Tue, Mar 14, 2017 at 10:43 AM, Matthias J. Sax <matth...@confluent.io
> >
> > wrote:
> >
> >> Thanks for your input Michael.
> >>
> >>>> - KafkaStreams as the new name for the builder that creates the
> logical
> >>>> plan, with e.g. `KafkaStreams.stream("intput-topic")` and
> >>>> `KafkaStreams.table("input-topic")`.
> >>
> >> I don't thinks this is a good idea, for multiple reasons:
> >>
> >> (1) We would reuse a name for a completely different purpose. The same
> >> argument for not renaming KStreamBuilder to TopologyBuilder. The
> >> confusion would just be too large.
> >>
> >> So if we would start from scratch, it might be ok to do so, but now we
> >> cannot make this move, IMHO.
> >>
> >> Also a clarification question: do you suggest to have static methods
> >> #stream and #table -- I am not sure if this would work?
> >> (or was you code snippet just simplification?)
> >>
> >>
> >> (2) Kafka Streams is basically a "processing client" next to consumer
> >> and producer client. Thus, the name KafkaStreams aligns to the naming
> >> schema of KafkaConsumer and KafkaProducer. I am not sure if it would be
> >> a good choice to "break" this naming scheme.
> >>
> >> Btw: this is also the reason, why we have KafkaStreams#close() -- and
> >> not KafkaStreams#stop() -- because #close() aligns with consumer and
> >> producer client.
> >>
> >>
> >> (3) On more argument against using KafkaStreams as DSL entry class would
> >> be, that it would need to create a Topology that can be given to the
> >> "runner/processing-client". Thus the pattern would be
> >>
> >>> Topology topology = streams.build();
> >>> KafkaStramsRunner runner = new KafkaStreamsRunner(..., topology)
> >>
> >> (or of course as a one liner).
> >>
> >>
> >>
> >> On the other hand, there was the idea (that we intentionally excluded
> >> from the KIP), to change the "client instantiation" pat

Fwd: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-20 Thread Matthias J. Sax
\cc users list


 Forwarded Message 
Subject: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API
Date: Mon, 20 Mar 2017 11:51:01 -0700
From: Matthias J. Sax <matth...@confluent.io>
Organization: Confluent Inc
To: d...@kafka.apache.org

I want to push this discussion further.

Guozhang's argument about "exposing" the Topology class is valid. It's a
public class anyway, so it's not as issue. However, I think the question
is not too much about exposing but about "advertising" (ie, putting it
into the focus) or not at DSL level.


If I interpret the last replies correctly, it seems that we could agree
on "StreamsBuilder" as name. I did update the KIP accordingly. Please
correct me, if I got this wrong.


If there are not other objects -- this naming discussion was the last
open point to far -- I would like the start the VOTE thread.


-Matthias


On 3/14/17 2:37 PM, Guozhang Wang wrote:
> I'd like to keep the term "Topology" inside the builder class since, as
> Matthias mentioned, this builder#build() function returns a "Topology"
> object, whose type is a public class anyways. Although you can argue to let
> users always call
> 
> "new KafkaStreams(builder.build())"
> 
> I think it is still more benefit to expose this concept.
> 
> 
> 
> Guozhang
> 
> On Tue, Mar 14, 2017 at 10:43 AM, Matthias J. Sax <matth...@confluent.io>
> wrote:
> 
>> Thanks for your input Michael.
>>
>>>> - KafkaStreams as the new name for the builder that creates the logical
>>>> plan, with e.g. `KafkaStreams.stream("intput-topic")` and
>>>> `KafkaStreams.table("input-topic")`.
>>
>> I don't thinks this is a good idea, for multiple reasons:
>>
>> (1) We would reuse a name for a completely different purpose. The same
>> argument for not renaming KStreamBuilder to TopologyBuilder. The
>> confusion would just be too large.
>>
>> So if we would start from scratch, it might be ok to do so, but now we
>> cannot make this move, IMHO.
>>
>> Also a clarification question: do you suggest to have static methods
>> #stream and #table -- I am not sure if this would work?
>> (or was you code snippet just simplification?)
>>
>>
>> (2) Kafka Streams is basically a "processing client" next to consumer
>> and producer client. Thus, the name KafkaStreams aligns to the naming
>> schema of KafkaConsumer and KafkaProducer. I am not sure if it would be
>> a good choice to "break" this naming scheme.
>>
>> Btw: this is also the reason, why we have KafkaStreams#close() -- and
>> not KafkaStreams#stop() -- because #close() aligns with consumer and
>> producer client.
>>
>>
>> (3) On more argument against using KafkaStreams as DSL entry class would
>> be, that it would need to create a Topology that can be given to the
>> "runner/processing-client". Thus the pattern would be
>>
>>> Topology topology = streams.build();
>>> KafkaStramsRunner runner = new KafkaStreamsRunner(..., topology)
>>
>> (or of course as a one liner).
>>
>>
>>
>> On the other hand, there was the idea (that we intentionally excluded
>> from the KIP), to change the "client instantiation" pattern.
>>
>> Right now, a new client in actively instantiated (ie, by calling "new")
>> and the topology if provided as a constructor argument. However,
>> especially for DSL (not sure if it would make sense for PAPI), the DSL
>> builder could create the client for the user.
>>
>> Something like this:
>>
>>> KStreamBuilder builder = new KStreamBuilder();
>>> builder.whatever() // use the builder
>>>
>>> StreamsConfig config = 
>>> KafkaStreams streams = builder.getKafkaStreams(config);
>>
>> If we change the patter like this, the notion a the "DSL builder" would
>> change, as it does not create a topology anymore, but it creates the
>> "processing client". This would address Jay's concern about "not
>> exposing concept users don't need the understand" and would not require
>> to include the word "Topology" in the DSL builder class name, because
>> the builder does not build a Topology anymore.
>>
>> I just put some names that came to my mind first hand -- did not think
>> about good names. It's just to discuss the pattern.
>>
>>
>>
>> -Matthias
>>
>>
>>
>>
>>
>> On 3/14/17 3:36 AM, Michael Noll wrote:
>>> I see Jay's point, and I agree with much of it -- not

Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-14 Thread Guozhang Wang
I'd like to keep the term "Topology" inside the builder class since, as
Matthias mentioned, this builder#build() function returns a "Topology"
object, whose type is a public class anyways. Although you can argue to let
users always call

"new KafkaStreams(builder.build())"

I think it is still more benefit to expose this concept.



Guozhang

On Tue, Mar 14, 2017 at 10:43 AM, Matthias J. Sax 
wrote:

> Thanks for your input Michael.
>
> >> - KafkaStreams as the new name for the builder that creates the logical
> >> plan, with e.g. `KafkaStreams.stream("intput-topic")` and
> >> `KafkaStreams.table("input-topic")`.
>
> I don't thinks this is a good idea, for multiple reasons:
>
> (1) We would reuse a name for a completely different purpose. The same
> argument for not renaming KStreamBuilder to TopologyBuilder. The
> confusion would just be too large.
>
> So if we would start from scratch, it might be ok to do so, but now we
> cannot make this move, IMHO.
>
> Also a clarification question: do you suggest to have static methods
> #stream and #table -- I am not sure if this would work?
> (or was you code snippet just simplification?)
>
>
> (2) Kafka Streams is basically a "processing client" next to consumer
> and producer client. Thus, the name KafkaStreams aligns to the naming
> schema of KafkaConsumer and KafkaProducer. I am not sure if it would be
> a good choice to "break" this naming scheme.
>
> Btw: this is also the reason, why we have KafkaStreams#close() -- and
> not KafkaStreams#stop() -- because #close() aligns with consumer and
> producer client.
>
>
> (3) On more argument against using KafkaStreams as DSL entry class would
> be, that it would need to create a Topology that can be given to the
> "runner/processing-client". Thus the pattern would be
>
> > Topology topology = streams.build();
> > KafkaStramsRunner runner = new KafkaStreamsRunner(..., topology)
>
> (or of course as a one liner).
>
>
>
> On the other hand, there was the idea (that we intentionally excluded
> from the KIP), to change the "client instantiation" pattern.
>
> Right now, a new client in actively instantiated (ie, by calling "new")
> and the topology if provided as a constructor argument. However,
> especially for DSL (not sure if it would make sense for PAPI), the DSL
> builder could create the client for the user.
>
> Something like this:
>
> > KStreamBuilder builder = new KStreamBuilder();
> > builder.whatever() // use the builder
> >
> > StreamsConfig config = 
> > KafkaStreams streams = builder.getKafkaStreams(config);
>
> If we change the patter like this, the notion a the "DSL builder" would
> change, as it does not create a topology anymore, but it creates the
> "processing client". This would address Jay's concern about "not
> exposing concept users don't need the understand" and would not require
> to include the word "Topology" in the DSL builder class name, because
> the builder does not build a Topology anymore.
>
> I just put some names that came to my mind first hand -- did not think
> about good names. It's just to discuss the pattern.
>
>
>
> -Matthias
>
>
>
>
>
> On 3/14/17 3:36 AM, Michael Noll wrote:
> > I see Jay's point, and I agree with much of it -- notably about being
> > careful which concepts we do and do not expose, depending on which user
> > group / user type is affected.  That said, I'm not sure yet whether or
> not
> > we should get rid of "Topology" (or a similar term) in the DSL.
> >
> > For what it's worth, here's how related technologies define/name their
> > "topologies" and "builders".  Note that, in all cases, it's about
> > constructing a logical processing plan, which then is being executed/run.
> >
> > - `Pipeline` (Google Dataflow/Apache Beam)
> > - To add a source you first instantiate the Source (e.g.
> > `TextIO.Read.from("gs://some/inputData.txt")`),
> >   then attach it to your processing plan via
> `Pipeline#apply()`.
> >   This setup is a bit different to our DSL because in our DSL the
> > builder does both, i.e.
> >   instantiating + auto-attaching to itself.
> > - To execute the processing plan you call `Pipeline#execute()`.
> > - `StreamingContext`` (Spark): This setup is similar to our DSL.
> > - To add a source you call e.g.
> > `StreamingContext#socketTextStream("localhost", )`.
> > - To execute the processing plan you call
> `StreamingContext#execute()`.
> > - `StreamExecutionEnvironment` (Flink): This setup is similar to our DSL.
> > - To add a source you call e.g.
> > `StreamExecutionEnvironment#socketTextStream("localhost", )`.
> > - To execute the processing plan you call
> > `StreamExecutionEnvironment#execute()`.
> > - `Graph`/`Flow` (Akka Streams), as a result of composing Sources (~
> > `KStreamBuilder.stream()`) and Sinks (~ `KStream#to()`)
> >   into Flows, which are [Runnable]Graphs.
> > - You instantiate a Source directly, and then compose the Source with
> > Sinks to create a 

Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-14 Thread Matthias J. Sax
Thanks for your input Michael.

>> - KafkaStreams as the new name for the builder that creates the logical
>> plan, with e.g. `KafkaStreams.stream("intput-topic")` and
>> `KafkaStreams.table("input-topic")`.

I don't thinks this is a good idea, for multiple reasons:

(1) We would reuse a name for a completely different purpose. The same
argument for not renaming KStreamBuilder to TopologyBuilder. The
confusion would just be too large.

So if we would start from scratch, it might be ok to do so, but now we
cannot make this move, IMHO.

Also a clarification question: do you suggest to have static methods
#stream and #table -- I am not sure if this would work?
(or was you code snippet just simplification?)


(2) Kafka Streams is basically a "processing client" next to consumer
and producer client. Thus, the name KafkaStreams aligns to the naming
schema of KafkaConsumer and KafkaProducer. I am not sure if it would be
a good choice to "break" this naming scheme.

Btw: this is also the reason, why we have KafkaStreams#close() -- and
not KafkaStreams#stop() -- because #close() aligns with consumer and
producer client.


(3) On more argument against using KafkaStreams as DSL entry class would
be, that it would need to create a Topology that can be given to the
"runner/processing-client". Thus the pattern would be

> Topology topology = streams.build();
> KafkaStramsRunner runner = new KafkaStreamsRunner(..., topology)

(or of course as a one liner).



On the other hand, there was the idea (that we intentionally excluded
from the KIP), to change the "client instantiation" pattern.

Right now, a new client in actively instantiated (ie, by calling "new")
and the topology if provided as a constructor argument. However,
especially for DSL (not sure if it would make sense for PAPI), the DSL
builder could create the client for the user.

Something like this:

> KStreamBuilder builder = new KStreamBuilder();
> builder.whatever() // use the builder
>
> StreamsConfig config = 
> KafkaStreams streams = builder.getKafkaStreams(config);

If we change the patter like this, the notion a the "DSL builder" would
change, as it does not create a topology anymore, but it creates the
"processing client". This would address Jay's concern about "not
exposing concept users don't need the understand" and would not require
to include the word "Topology" in the DSL builder class name, because
the builder does not build a Topology anymore.

I just put some names that came to my mind first hand -- did not think
about good names. It's just to discuss the pattern.



-Matthias





On 3/14/17 3:36 AM, Michael Noll wrote:
> I see Jay's point, and I agree with much of it -- notably about being
> careful which concepts we do and do not expose, depending on which user
> group / user type is affected.  That said, I'm not sure yet whether or not
> we should get rid of "Topology" (or a similar term) in the DSL.
> 
> For what it's worth, here's how related technologies define/name their
> "topologies" and "builders".  Note that, in all cases, it's about
> constructing a logical processing plan, which then is being executed/run.
> 
> - `Pipeline` (Google Dataflow/Apache Beam)
> - To add a source you first instantiate the Source (e.g.
> `TextIO.Read.from("gs://some/inputData.txt")`),
>   then attach it to your processing plan via `Pipeline#apply()`.
>   This setup is a bit different to our DSL because in our DSL the
> builder does both, i.e.
>   instantiating + auto-attaching to itself.
> - To execute the processing plan you call `Pipeline#execute()`.
> - `StreamingContext`` (Spark): This setup is similar to our DSL.
> - To add a source you call e.g.
> `StreamingContext#socketTextStream("localhost", )`.
> - To execute the processing plan you call `StreamingContext#execute()`.
> - `StreamExecutionEnvironment` (Flink): This setup is similar to our DSL.
> - To add a source you call e.g.
> `StreamExecutionEnvironment#socketTextStream("localhost", )`.
> - To execute the processing plan you call
> `StreamExecutionEnvironment#execute()`.
> - `Graph`/`Flow` (Akka Streams), as a result of composing Sources (~
> `KStreamBuilder.stream()`) and Sinks (~ `KStream#to()`)
>   into Flows, which are [Runnable]Graphs.
> - You instantiate a Source directly, and then compose the Source with
> Sinks to create a RunnableGraph:
>   see signature `Source#to[Mat2](sink: Graph[SinkShape[Out], Mat2]):
> RunnableGraph[Mat]`.
> - To execute the processing plan you call `Flow#run()`.
> 
> In our DSL, in comparison, we do:
> 
> - `KStreamBuilder` (Kafka Streams API)
> - To add a source you call e.g. `KStreamBuilder#stream("input-topic")`.
> - To execute the processing plan you create a `KafkaStreams` instance
> from `KStreamBuilder`
>   (where the builder will instantiate the topology = processing plan to
> be executed), and then
>   call `KafkaStreams#start()`.  Think of `KafkaStreams` as our runner.
> 
> 

Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-14 Thread Michael Noll
I see Jay's point, and I agree with much of it -- notably about being
careful which concepts we do and do not expose, depending on which user
group / user type is affected.  That said, I'm not sure yet whether or not
we should get rid of "Topology" (or a similar term) in the DSL.

For what it's worth, here's how related technologies define/name their
"topologies" and "builders".  Note that, in all cases, it's about
constructing a logical processing plan, which then is being executed/run.

- `Pipeline` (Google Dataflow/Apache Beam)
- To add a source you first instantiate the Source (e.g.
`TextIO.Read.from("gs://some/inputData.txt")`),
  then attach it to your processing plan via `Pipeline#apply()`.
  This setup is a bit different to our DSL because in our DSL the
builder does both, i.e.
  instantiating + auto-attaching to itself.
- To execute the processing plan you call `Pipeline#execute()`.
- `StreamingContext`` (Spark): This setup is similar to our DSL.
- To add a source you call e.g.
`StreamingContext#socketTextStream("localhost", )`.
- To execute the processing plan you call `StreamingContext#execute()`.
- `StreamExecutionEnvironment` (Flink): This setup is similar to our DSL.
- To add a source you call e.g.
`StreamExecutionEnvironment#socketTextStream("localhost", )`.
- To execute the processing plan you call
`StreamExecutionEnvironment#execute()`.
- `Graph`/`Flow` (Akka Streams), as a result of composing Sources (~
`KStreamBuilder.stream()`) and Sinks (~ `KStream#to()`)
  into Flows, which are [Runnable]Graphs.
- You instantiate a Source directly, and then compose the Source with
Sinks to create a RunnableGraph:
  see signature `Source#to[Mat2](sink: Graph[SinkShape[Out], Mat2]):
RunnableGraph[Mat]`.
- To execute the processing plan you call `Flow#run()`.

In our DSL, in comparison, we do:

- `KStreamBuilder` (Kafka Streams API)
- To add a source you call e.g. `KStreamBuilder#stream("input-topic")`.
- To execute the processing plan you create a `KafkaStreams` instance
from `KStreamBuilder`
  (where the builder will instantiate the topology = processing plan to
be executed), and then
  call `KafkaStreams#start()`.  Think of `KafkaStreams` as our runner.

First, I agree with the sentiment that the current name of `KStreamBuilder`
isn't great (which is why we're having this discussion).  Also, that
finding a good name is tricky. ;-)

Second, even though I agree with many of Jay's points I'm not sure whether
I like the `StreamsBuilder` suggestion (i.e. any name that does not include
"topology" or a similar term) that much more.  It still doesn't describe
what that class actually does, and what the difference to `KafkaStreams`
is.  IMHO, the point of `KStreamBuilder` is that it lets you build a
logical plan (what we call "topology"), and `KafkaStreams` is the thing
that executes that plan.  I'm not yet convinced that abstracting these two
points away from the user is a good idea if the argument is that it's
potentially confusing to beginners (a claim which I am not sure is actually
true).

That said, if we rather favor "good-sounding but perhaps less technically
correct names", I'd argue we should not even use something like "Builder".
We could, for example, also pick the following names:

- KafkaStreams as the new name for the builder that creates the logical
plan, with e.g. `KafkaStreams.stream("intput-topic")` and
`KafkaStreams.table("input-topic")`.
- KafkaStreamsRunner as the new name for the executioner of the plan, with
`KafkaStreamsRunner(KafkaStreams).run()`.



On Tue, Mar 14, 2017 at 5:56 AM, Sriram Subramanian 
wrote:

> StreamsBuilder would be my vote.
>
> > On Mar 13, 2017, at 9:42 PM, Jay Kreps  wrote:
> >
> > Hey Matthias,
> >
> > Make sense, I'm more advocating for removing the word topology than any
> > particular new replacement.
> >
> > -Jay
> >
> > On Mon, Mar 13, 2017 at 12:30 PM, Matthias J. Sax  >
> > wrote:
> >
> >> Jay,
> >>
> >> thanks for your feedback
> >>
> >>> What if instead we called it KStreamsBuilder?
> >>
> >> That's the current name and I personally think it's not the best one.
> >> The main reason why I don't like KStreamsBuilder is, that we have the
> >> concepts of KStreams and KTables, and the builder creates both. However,
> >> the name puts he focus on KStream and devalues KTable.
> >>
> >> I understand your argument, and I am personally open the remove the
> >> "Topology" part, and name it "StreamsBuilder". Not sure what others
> >> think about this.
> >>
> >>
> >> About Processor API: I like the idea in general, but I thinks it's out
> >> of scope for this KIP. KIP-120 has the focus on removing leaking
> >> internal APIs and do some cleanup how our API reflects some concepts.
> >>
> >> However, I added your idea to API discussion Wiki page and we take if
> >> from there:
> >> https://cwiki.apache.org/confluence/display/KAFKA/
> 

Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-13 Thread Sriram Subramanian
StreamsBuilder would be my vote.

> On Mar 13, 2017, at 9:42 PM, Jay Kreps  wrote:
> 
> Hey Matthias,
> 
> Make sense, I'm more advocating for removing the word topology than any
> particular new replacement.
> 
> -Jay
> 
> On Mon, Mar 13, 2017 at 12:30 PM, Matthias J. Sax 
> wrote:
> 
>> Jay,
>> 
>> thanks for your feedback
>> 
>>> What if instead we called it KStreamsBuilder?
>> 
>> That's the current name and I personally think it's not the best one.
>> The main reason why I don't like KStreamsBuilder is, that we have the
>> concepts of KStreams and KTables, and the builder creates both. However,
>> the name puts he focus on KStream and devalues KTable.
>> 
>> I understand your argument, and I am personally open the remove the
>> "Topology" part, and name it "StreamsBuilder". Not sure what others
>> think about this.
>> 
>> 
>> About Processor API: I like the idea in general, but I thinks it's out
>> of scope for this KIP. KIP-120 has the focus on removing leaking
>> internal APIs and do some cleanup how our API reflects some concepts.
>> 
>> However, I added your idea to API discussion Wiki page and we take if
>> from there:
>> https://cwiki.apache.org/confluence/display/KAFKA/
>> Kafka+Streams+Discussions
>> 
>> 
>> 
>> -Matthias
>> 
>> 
>>> On 3/13/17 11:52 AM, Jay Kreps wrote:
>>> Two things:
>>> 
>>>   1. This is a minor thing but the proposed new name for KStreamBuilder
>>>   is StreamsTopologyBuilder. I actually think we should not put
>> topology in
>>>   the name as topology is not a concept you need to understand at the
>>>   kstreams layer right now. I'd think of three categories of concepts:
>> (1)
>>>   concepts you need to understand to get going even for a simple
>> example, (2)
>>>   concepts you need to understand to operate and debug a real
>> production app,
>>>   (3) concepts we truly abstract and you don't need to ever understand.
>> I
>>>   think in the kstream layer topologies are currently category (2), and
>> this
>>>   is where they belong. By introducing the name in even the simplest
>> example
>>>   it means the user has to go read about toplogies to really understand
>> even
>>>   this simple snippet. What if instead we called it KStreamsBuilder?
>>>   2. For the processor api, I think this api is mostly not for end
>> users.
>>>   However this are a couple cases where it might make sense to expose
>> it. I
>>>   think users coming from Samza, or JMS's MessageListener (
>>>   https://docs.oracle.com/javaee/7/api/javax/jms/MessageListener.html)
>>>   understand a simple callback interface for message processing. In
>> fact,
>>>   people often ask why Kafka's consumer doesn't provide such an
>> interface.
>>>   I'd argue we do, it's KafkaStreams. The only issue is that the
>> processor
>>>   API documentation is a bit scary for a person implementing this type
>> of
>>>   api. My observation is that people using this style of API don't do a
>> lot
>>>   of cross-message operations, then just do single message operations
>> and use
>>>   a database for anything that spans messages. They also don't factor
>> their
>>>   code into many MessageListeners and compose them, they just have one
>>>   listener that has the complete handling logic. Say I am a user who
>> wants to
>>>   implement a single Processor in this style. Do we have an easy way to
>> do
>>>   that today (either with the .transform/.process methods in kstreams
>> or with
>>>   the topology apis)? Is there anything we can do in the way of trivial
>>>   helper code to make this better? Also, how can we explain that
>> pattern to
>>>   people? I think currently we have pretty in-depth docs on our apis
>> but I
>>>   suspect a person trying to figure out how to implement a simple
>> callback
>>>   might get a bit lost trying to figure out how to wire it up. A simple
>> five
>>>   line example in the docs would probably help a lot. Not sure if this
>> is
>>>   best addressed in this KIP or is a side comment.
>>> 
>>> Cheers,
>>> 
>>> -Jay
>>> 
>>> On Fri, Feb 3, 2017 at 3:33 PM, Matthias J. Sax 
>>> wrote:
>>> 
 Hi All,
 
 I did prepare a KIP to do some cleanup some of Kafka's Streaming API.
 
 Please have a look here:
 https://cwiki.apache.org/confluence/display/KAFKA/KIP-
 120%3A+Cleanup+Kafka+Streams+builder+API
 
 Looking forward to your feedback!
 
 
 -Matthias
>> 
>> 


Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-13 Thread Jay Kreps
Hey Matthias,

Make sense, I'm more advocating for removing the word topology than any
particular new replacement.

-Jay

On Mon, Mar 13, 2017 at 12:30 PM, Matthias J. Sax 
wrote:

> Jay,
>
> thanks for your feedback
>
> > What if instead we called it KStreamsBuilder?
>
> That's the current name and I personally think it's not the best one.
> The main reason why I don't like KStreamsBuilder is, that we have the
> concepts of KStreams and KTables, and the builder creates both. However,
> the name puts he focus on KStream and devalues KTable.
>
> I understand your argument, and I am personally open the remove the
> "Topology" part, and name it "StreamsBuilder". Not sure what others
> think about this.
>
>
> About Processor API: I like the idea in general, but I thinks it's out
> of scope for this KIP. KIP-120 has the focus on removing leaking
> internal APIs and do some cleanup how our API reflects some concepts.
>
> However, I added your idea to API discussion Wiki page and we take if
> from there:
> https://cwiki.apache.org/confluence/display/KAFKA/
> Kafka+Streams+Discussions
>
>
>
> -Matthias
>
>
> On 3/13/17 11:52 AM, Jay Kreps wrote:
> > Two things:
> >
> >1. This is a minor thing but the proposed new name for KStreamBuilder
> >is StreamsTopologyBuilder. I actually think we should not put
> topology in
> >the name as topology is not a concept you need to understand at the
> >kstreams layer right now. I'd think of three categories of concepts:
> (1)
> >concepts you need to understand to get going even for a simple
> example, (2)
> >concepts you need to understand to operate and debug a real
> production app,
> >(3) concepts we truly abstract and you don't need to ever understand.
> I
> >think in the kstream layer topologies are currently category (2), and
> this
> >is where they belong. By introducing the name in even the simplest
> example
> >it means the user has to go read about toplogies to really understand
> even
> >this simple snippet. What if instead we called it KStreamsBuilder?
> >2. For the processor api, I think this api is mostly not for end
> users.
> >However this are a couple cases where it might make sense to expose
> it. I
> >think users coming from Samza, or JMS's MessageListener (
> >https://docs.oracle.com/javaee/7/api/javax/jms/MessageListener.html)
> >understand a simple callback interface for message processing. In
> fact,
> >people often ask why Kafka's consumer doesn't provide such an
> interface.
> >I'd argue we do, it's KafkaStreams. The only issue is that the
> processor
> >API documentation is a bit scary for a person implementing this type
> of
> >api. My observation is that people using this style of API don't do a
> lot
> >of cross-message operations, then just do single message operations
> and use
> >a database for anything that spans messages. They also don't factor
> their
> >code into many MessageListeners and compose them, they just have one
> >listener that has the complete handling logic. Say I am a user who
> wants to
> >implement a single Processor in this style. Do we have an easy way to
> do
> >that today (either with the .transform/.process methods in kstreams
> or with
> >the topology apis)? Is there anything we can do in the way of trivial
> >helper code to make this better? Also, how can we explain that
> pattern to
> >people? I think currently we have pretty in-depth docs on our apis
> but I
> >suspect a person trying to figure out how to implement a simple
> callback
> >might get a bit lost trying to figure out how to wire it up. A simple
> five
> >line example in the docs would probably help a lot. Not sure if this
> is
> >best addressed in this KIP or is a side comment.
> >
> > Cheers,
> >
> > -Jay
> >
> > On Fri, Feb 3, 2017 at 3:33 PM, Matthias J. Sax 
> > wrote:
> >
> >> Hi All,
> >>
> >> I did prepare a KIP to do some cleanup some of Kafka's Streaming API.
> >>
> >> Please have a look here:
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >> 120%3A+Cleanup+Kafka+Streams+builder+API
> >>
> >> Looking forward to your feedback!
> >>
> >>
> >> -Matthias
> >>
> >>
> >
>
>


Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-13 Thread Matthias J. Sax
Steven,

thanks for your feedback.

I am not sure about KafkaStreamsBuilder (even if agree that it is better
than KStreamBuilder), because it sounds like a builder that creates a
KafkaStreams instance. But that's of course not the case. It builds a
Topology -- that was the reason to consider calling it TopologyBuilder.

I suggested StreamsTopologyBuilder (instead of TopologyBuilder) to avoid
any confusion with the current TopologyBuilder (that we are going to
rename to Topology).

We could also go with DslBuilder (or DslTopologyBuilder as suggested by
Michael) -- it should be clear, that this does not build a DSL :) to
contract against KafkaStreamsBuilder.



-Matthias


On 3/13/17 12:46 PM, Steven Schlansker wrote:
> 
>> On Mar 13, 2017, at 12:30 PM, Matthias J. Sax  wrote:
>>
>> Jay,
>>
>> thanks for your feedback
>>
>>> What if instead we called it KStreamsBuilder?
>>
>> That's the current name and I personally think it's not the best one.
>> The main reason why I don't like KStreamsBuilder is, that we have the
>> concepts of KStreams and KTables, and the builder creates both. However,
>> the name puts he focus on KStream and devalues KTable.
>>
>> I understand your argument, and I am personally open the remove the
>> "Topology" part, and name it "StreamsBuilder". Not sure what others
>> think about this.
> 
> If you worry about that, you could consider "KafkaStreamsBuilder"
> which highlights that it is about Kafka Streams the project as
> opposed to KStream the feature.  A little bit wordier, but you probably
> only type it a couple times.
> 



signature.asc
Description: OpenPGP digital signature


Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-13 Thread Steven Schlansker

> On Mar 13, 2017, at 12:30 PM, Matthias J. Sax  wrote:
> 
> Jay,
> 
> thanks for your feedback
> 
>> What if instead we called it KStreamsBuilder?
> 
> That's the current name and I personally think it's not the best one.
> The main reason why I don't like KStreamsBuilder is, that we have the
> concepts of KStreams and KTables, and the builder creates both. However,
> the name puts he focus on KStream and devalues KTable.
> 
> I understand your argument, and I am personally open the remove the
> "Topology" part, and name it "StreamsBuilder". Not sure what others
> think about this.

If you worry about that, you could consider "KafkaStreamsBuilder"
which highlights that it is about Kafka Streams the project as
opposed to KStream the feature.  A little bit wordier, but you probably
only type it a couple times.



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-13 Thread Matthias J. Sax
Jay,

thanks for your feedback

> What if instead we called it KStreamsBuilder?

That's the current name and I personally think it's not the best one.
The main reason why I don't like KStreamsBuilder is, that we have the
concepts of KStreams and KTables, and the builder creates both. However,
the name puts he focus on KStream and devalues KTable.

I understand your argument, and I am personally open the remove the
"Topology" part, and name it "StreamsBuilder". Not sure what others
think about this.


About Processor API: I like the idea in general, but I thinks it's out
of scope for this KIP. KIP-120 has the focus on removing leaking
internal APIs and do some cleanup how our API reflects some concepts.

However, I added your idea to API discussion Wiki page and we take if
from there:
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Discussions



-Matthias


On 3/13/17 11:52 AM, Jay Kreps wrote:
> Two things:
> 
>1. This is a minor thing but the proposed new name for KStreamBuilder
>is StreamsTopologyBuilder. I actually think we should not put topology in
>the name as topology is not a concept you need to understand at the
>kstreams layer right now. I'd think of three categories of concepts: (1)
>concepts you need to understand to get going even for a simple example, (2)
>concepts you need to understand to operate and debug a real production app,
>(3) concepts we truly abstract and you don't need to ever understand. I
>think in the kstream layer topologies are currently category (2), and this
>is where they belong. By introducing the name in even the simplest example
>it means the user has to go read about toplogies to really understand even
>this simple snippet. What if instead we called it KStreamsBuilder?
>2. For the processor api, I think this api is mostly not for end users.
>However this are a couple cases where it might make sense to expose it. I
>think users coming from Samza, or JMS's MessageListener (
>https://docs.oracle.com/javaee/7/api/javax/jms/MessageListener.html)
>understand a simple callback interface for message processing. In fact,
>people often ask why Kafka's consumer doesn't provide such an interface.
>I'd argue we do, it's KafkaStreams. The only issue is that the processor
>API documentation is a bit scary for a person implementing this type of
>api. My observation is that people using this style of API don't do a lot
>of cross-message operations, then just do single message operations and use
>a database for anything that spans messages. They also don't factor their
>code into many MessageListeners and compose them, they just have one
>listener that has the complete handling logic. Say I am a user who wants to
>implement a single Processor in this style. Do we have an easy way to do
>that today (either with the .transform/.process methods in kstreams or with
>the topology apis)? Is there anything we can do in the way of trivial
>helper code to make this better? Also, how can we explain that pattern to
>people? I think currently we have pretty in-depth docs on our apis but I
>suspect a person trying to figure out how to implement a simple callback
>might get a bit lost trying to figure out how to wire it up. A simple five
>line example in the docs would probably help a lot. Not sure if this is
>best addressed in this KIP or is a side comment.
> 
> Cheers,
> 
> -Jay
> 
> On Fri, Feb 3, 2017 at 3:33 PM, Matthias J. Sax 
> wrote:
> 
>> Hi All,
>>
>> I did prepare a KIP to do some cleanup some of Kafka's Streaming API.
>>
>> Please have a look here:
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> 120%3A+Cleanup+Kafka+Streams+builder+API
>>
>> Looking forward to your feedback!
>>
>>
>> -Matthias
>>
>>
> 



signature.asc
Description: OpenPGP digital signature


Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-13 Thread Jay Kreps
Two things:

   1. This is a minor thing but the proposed new name for KStreamBuilder
   is StreamsTopologyBuilder. I actually think we should not put topology in
   the name as topology is not a concept you need to understand at the
   kstreams layer right now. I'd think of three categories of concepts: (1)
   concepts you need to understand to get going even for a simple example, (2)
   concepts you need to understand to operate and debug a real production app,
   (3) concepts we truly abstract and you don't need to ever understand. I
   think in the kstream layer topologies are currently category (2), and this
   is where they belong. By introducing the name in even the simplest example
   it means the user has to go read about toplogies to really understand even
   this simple snippet. What if instead we called it KStreamsBuilder?
   2. For the processor api, I think this api is mostly not for end users.
   However this are a couple cases where it might make sense to expose it. I
   think users coming from Samza, or JMS's MessageListener (
   https://docs.oracle.com/javaee/7/api/javax/jms/MessageListener.html)
   understand a simple callback interface for message processing. In fact,
   people often ask why Kafka's consumer doesn't provide such an interface.
   I'd argue we do, it's KafkaStreams. The only issue is that the processor
   API documentation is a bit scary for a person implementing this type of
   api. My observation is that people using this style of API don't do a lot
   of cross-message operations, then just do single message operations and use
   a database for anything that spans messages. They also don't factor their
   code into many MessageListeners and compose them, they just have one
   listener that has the complete handling logic. Say I am a user who wants to
   implement a single Processor in this style. Do we have an easy way to do
   that today (either with the .transform/.process methods in kstreams or with
   the topology apis)? Is there anything we can do in the way of trivial
   helper code to make this better? Also, how can we explain that pattern to
   people? I think currently we have pretty in-depth docs on our apis but I
   suspect a person trying to figure out how to implement a simple callback
   might get a bit lost trying to figure out how to wire it up. A simple five
   line example in the docs would probably help a lot. Not sure if this is
   best addressed in this KIP or is a side comment.

Cheers,

-Jay

On Fri, Feb 3, 2017 at 3:33 PM, Matthias J. Sax 
wrote:

> Hi All,
>
> I did prepare a KIP to do some cleanup some of Kafka's Streaming API.
>
> Please have a look here:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 120%3A+Cleanup+Kafka+Streams+builder+API
>
> Looking forward to your feedback!
>
>
> -Matthias
>
>


Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-11 Thread Matthias J. Sax
Thanks for your feedback Guozhang.


1) There are multiple ways to do this. Let me know what you think about
all options:

(a) I updated the KIP to this:

> public final class Source implements Node {
> public final String name;
> // topicNames and topicPattern are mutually exclusive, i.e., only one 
> will be not-null
> public final List topicNames; // null if #addSource(..., 
> Pattern) was used
> public final Pattern topicPattern; // null if #addSource(..., 
> String...) was used
> }

(b) We could also go with a single variable (as originally proposed).
This would have the advantage (compared to (a)), that null checks are
not required accessing TopologyDescription#Source class.

> String topics; // can be comma separated list of topic names or pattern (as 
> String)

However, with an encoded list or an encoded pattern it's required to
parse the string again, what we want to avoid in the first place.

(c) Use a single variable as in (b)

> String topics; // always a pattern (as String)

We translate a list of topic names into a pattern
"topic1|topic2|topic3". We loose the information if the source was added
via list or via pattern.



2) Your understanding is correct. Added a comment to the KIP.



3) I would keep KafkaStreams#toString() -- it's conceptually two
different things and runtime information is useful, too. But as its
return value is ambiguous to parse (and must be parsed in the first
place what is cumbersome), we could add KafkaStreams#describe() as

> public synchronized KafkaStreamsDescription describe();

KafkaStreamsDescription class would be similar to TopologyDescription to
allow programmatic access to runtime information. I guess we could even
reuse (parts of) TopologyDescription within KafkaStreamsDescription to
avoid code duplication.

If you think this would be useful, I can extend the KIP accordingly.



-Matthias




On 3/10/17 1:38 PM, Guozhang Wang wrote:
> One more question here:
> 
> 3. with TopologyDescription, do we still want to keep the
> `KafkaStream.toString()` function? I think it may still have some advantage
> such that it contains tasks information after `KafkaStream#start()` has
> been called, but much of it is duplicate with the TopologyDescription, and
> it is only in the form of the string hence hard to programmatically
> leverage. So would like to hear your thoughts.
> 
> Guozhang
> 
> On Thu, Mar 9, 2017 at 11:20 PM, Guozhang Wang  wrote:
> 
>> Thanks Matthias, the updated KIP lgtm overall. A couple of minor comments:
>>
>> 1. With regard to this class:
>>
>> public final class Source implements Node {
>> public final String name;
>> public final String topic; // can be topic name or pattern (as
>> String)
>> }
>>
>> Note that the source node could contain more than a single topic, i.e. a
>> list of topics besides a pattern.
>>
>> 2. With regard to
>>
>>   public synchronized TopologyDescription describe();
>>
>> My understand is that whenever the topology is modified, one needs to call
>> this function again to get a new description object, as the old one won't
>> be updated automatically. Hence the usage pattern would be:
>>
>> TopologyDescription description = topology.describe();
>>
>> topology.addProcessor(...)
>>
>> description = topology.describe(); // have to call again
>>
>> ---
>>
>> Is that right? If yes could you clarify this in the wiki?
>>
>>
>>
>> Guozhang
>>
>> On Thu, Mar 9, 2017 at 2:30 AM, Michael Noll  wrote:
>>
>>> Thanks for the update, Matthias.
>>>
>>> +1 to the points 1,2,3,4 you mentioned.
>>>
>>> Naming is always a tricky subject, but renaming KStreamBuilder
>>> to StreamsTopologyBuilder looks ok to me (I would have had a slight
>>> preference towards DslTopologyBuilder, but hey.)  The most important
>>> aspect
>>> is, IMHO, what you also pointed out:  to make it clear that the current
>>> KStreamBuilder actually builds a topology (though currently the latter is
>>> actually called `TopologyBuilder` currently), and not a `KStream`.
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Mar 8, 2017 at 11:43 PM, Matthias J. Sax 
>>> wrote:
>>>
 Hi,

 sorry for not replying earlier and thanks for all your feedback. After
 some more discussions I updated the KIP. The new proposal puts some
 other design considerations into account, that I want to highlight
 shortly. Those considerations, automatically resolve the concerns
>>> raised.

 First some answers:

> The PAPI processors I use in my KStreams app are all functioning on
 KTable
> internals.  I wouldn't be able to convert them to
>>> process()/transform().
>
> What's the harm in permitting both APIs to be used in the same
 application?

 It's not about "harm" but about design. We want to switch from a
 "inheritance" to a "composition" pattern.

 About the interface idea: using a shared 

Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-10 Thread Guozhang Wang
One more question here:

3. with TopologyDescription, do we still want to keep the
`KafkaStream.toString()` function? I think it may still have some advantage
such that it contains tasks information after `KafkaStream#start()` has
been called, but much of it is duplicate with the TopologyDescription, and
it is only in the form of the string hence hard to programmatically
leverage. So would like to hear your thoughts.

Guozhang

On Thu, Mar 9, 2017 at 11:20 PM, Guozhang Wang  wrote:

> Thanks Matthias, the updated KIP lgtm overall. A couple of minor comments:
>
> 1. With regard to this class:
>
> public final class Source implements Node {
> public final String name;
> public final String topic; // can be topic name or pattern (as
> String)
> }
>
> Note that the source node could contain more than a single topic, i.e. a
> list of topics besides a pattern.
>
> 2. With regard to
>
>   public synchronized TopologyDescription describe();
>
> My understand is that whenever the topology is modified, one needs to call
> this function again to get a new description object, as the old one won't
> be updated automatically. Hence the usage pattern would be:
>
> TopologyDescription description = topology.describe();
>
> topology.addProcessor(...)
>
> description = topology.describe(); // have to call again
>
> ---
>
> Is that right? If yes could you clarify this in the wiki?
>
>
>
> Guozhang
>
> On Thu, Mar 9, 2017 at 2:30 AM, Michael Noll  wrote:
>
>> Thanks for the update, Matthias.
>>
>> +1 to the points 1,2,3,4 you mentioned.
>>
>> Naming is always a tricky subject, but renaming KStreamBuilder
>> to StreamsTopologyBuilder looks ok to me (I would have had a slight
>> preference towards DslTopologyBuilder, but hey.)  The most important
>> aspect
>> is, IMHO, what you also pointed out:  to make it clear that the current
>> KStreamBuilder actually builds a topology (though currently the latter is
>> actually called `TopologyBuilder` currently), and not a `KStream`.
>>
>>
>>
>>
>>
>> On Wed, Mar 8, 2017 at 11:43 PM, Matthias J. Sax 
>> wrote:
>>
>> > Hi,
>> >
>> > sorry for not replying earlier and thanks for all your feedback. After
>> > some more discussions I updated the KIP. The new proposal puts some
>> > other design considerations into account, that I want to highlight
>> > shortly. Those considerations, automatically resolve the concerns
>> raised.
>> >
>> > First some answers:
>> >
>> > > The PAPI processors I use in my KStreams app are all functioning on
>> > KTable
>> > > internals.  I wouldn't be able to convert them to
>> process()/transform().
>> > >
>> > > What's the harm in permitting both APIs to be used in the same
>> > application?
>> >
>> > It's not about "harm" but about design. We want to switch from a
>> > "inheritance" to a "composition" pattern.
>> >
>> > About the interface idea: using a shared interface would not help to get
>> > a composition pattern
>> >
>> >
>> > Next I want to give the design considerations leading to the updated
>> KIP:
>> >
>> > 1) Using KStreamBuilder in the constructor of KafkaStreams is unnatural.
>> > KafkaStreams client executes a `Topology` and this execution should be
>> > independent of the way the topology is "put together", ie, low-level API
>> > or DSL.
>> >
>> > 2) Thus, we don't want to have any changes to KafkaStreams class.
>> >
>> > 3) Thus, KStreamBuilder needs to have a method `build()` that returns a
>> > `Topology` that can be passed into KafakStreams.
>> >
>> > 4) Because `KStreamBuilder` should build a `Topology` I suggest to
>> > rename the new class to `StreamsTopologyBuilder` (the name
>> > TopologyBuilder would actually be more natural, but would be easily
>> > confused with old low-level API TopologyBuilder).
>> >
>> > Thus, PAPI and DSL can be mixed-and-matched with full power, as
>> > StreamsTopologyBuilder return the created Topology via #build().
>> >
>> > I also removed `final` for both builder classes.
>> >
>> >
>> >
>> > With regard to the larger scope of the overal API redesign, I also want
>> > to point to a summary of API issues:
>> > https://cwiki.apache.org/confluence/display/KAFKA/
>> > Kafka+Streams+Discussions
>> >
>> > Thus, this KIP is only one building block of a larger improvement
>> > effort, and we hope to get as much as possible done for 0.11. If you
>> > have any API improvement ideas, please share them so we can come up with
>> > an holistic sound design (instead of uncoordinated local improvements
>> > that might diverge)
>> >
>> >
>> >
>> > Looking forward to your feedback on this KIP and the other API issues.
>> >
>> >
>> >
>> > -Matthias
>> >
>> >
>> >
>> >
>> > On 2/15/17 7:36 PM, Mathieu Fenniak wrote:
>> > > On Wed, Feb 15, 2017 at 5:04 PM, Matthias J. Sax <
>> matth...@confluent.io>
>> > > wrote:
>> > >
>> > >> - We also removed method #topologyBuilder() from KStreamBuilder
>> because
>> > >> we think 

Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-09 Thread Guozhang Wang
Thanks Matthias, the updated KIP lgtm overall. A couple of minor comments:

1. With regard to this class:

public final class Source implements Node {
public final String name;
public final String topic; // can be topic name or pattern (as
String)
}

Note that the source node could contain more than a single topic, i.e. a
list of topics besides a pattern.

2. With regard to

  public synchronized TopologyDescription describe();

My understand is that whenever the topology is modified, one needs to call
this function again to get a new description object, as the old one won't
be updated automatically. Hence the usage pattern would be:

TopologyDescription description = topology.describe();

topology.addProcessor(...)

description = topology.describe(); // have to call again

---

Is that right? If yes could you clarify this in the wiki?



Guozhang

On Thu, Mar 9, 2017 at 2:30 AM, Michael Noll  wrote:

> Thanks for the update, Matthias.
>
> +1 to the points 1,2,3,4 you mentioned.
>
> Naming is always a tricky subject, but renaming KStreamBuilder
> to StreamsTopologyBuilder looks ok to me (I would have had a slight
> preference towards DslTopologyBuilder, but hey.)  The most important aspect
> is, IMHO, what you also pointed out:  to make it clear that the current
> KStreamBuilder actually builds a topology (though currently the latter is
> actually called `TopologyBuilder` currently), and not a `KStream`.
>
>
>
>
>
> On Wed, Mar 8, 2017 at 11:43 PM, Matthias J. Sax 
> wrote:
>
> > Hi,
> >
> > sorry for not replying earlier and thanks for all your feedback. After
> > some more discussions I updated the KIP. The new proposal puts some
> > other design considerations into account, that I want to highlight
> > shortly. Those considerations, automatically resolve the concerns raised.
> >
> > First some answers:
> >
> > > The PAPI processors I use in my KStreams app are all functioning on
> > KTable
> > > internals.  I wouldn't be able to convert them to
> process()/transform().
> > >
> > > What's the harm in permitting both APIs to be used in the same
> > application?
> >
> > It's not about "harm" but about design. We want to switch from a
> > "inheritance" to a "composition" pattern.
> >
> > About the interface idea: using a shared interface would not help to get
> > a composition pattern
> >
> >
> > Next I want to give the design considerations leading to the updated KIP:
> >
> > 1) Using KStreamBuilder in the constructor of KafkaStreams is unnatural.
> > KafkaStreams client executes a `Topology` and this execution should be
> > independent of the way the topology is "put together", ie, low-level API
> > or DSL.
> >
> > 2) Thus, we don't want to have any changes to KafkaStreams class.
> >
> > 3) Thus, KStreamBuilder needs to have a method `build()` that returns a
> > `Topology` that can be passed into KafakStreams.
> >
> > 4) Because `KStreamBuilder` should build a `Topology` I suggest to
> > rename the new class to `StreamsTopologyBuilder` (the name
> > TopologyBuilder would actually be more natural, but would be easily
> > confused with old low-level API TopologyBuilder).
> >
> > Thus, PAPI and DSL can be mixed-and-matched with full power, as
> > StreamsTopologyBuilder return the created Topology via #build().
> >
> > I also removed `final` for both builder classes.
> >
> >
> >
> > With regard to the larger scope of the overal API redesign, I also want
> > to point to a summary of API issues:
> > https://cwiki.apache.org/confluence/display/KAFKA/
> > Kafka+Streams+Discussions
> >
> > Thus, this KIP is only one building block of a larger improvement
> > effort, and we hope to get as much as possible done for 0.11. If you
> > have any API improvement ideas, please share them so we can come up with
> > an holistic sound design (instead of uncoordinated local improvements
> > that might diverge)
> >
> >
> >
> > Looking forward to your feedback on this KIP and the other API issues.
> >
> >
> >
> > -Matthias
> >
> >
> >
> >
> > On 2/15/17 7:36 PM, Mathieu Fenniak wrote:
> > > On Wed, Feb 15, 2017 at 5:04 PM, Matthias J. Sax <
> matth...@confluent.io>
> > > wrote:
> > >
> > >> - We also removed method #topologyBuilder() from KStreamBuilder
> because
> > >> we think #transform() should provide all functionality you need to
> > >> mix-an-match Processor API and DSL. If there is any further concern
> > >> about this, please let us know.
> > >>
> > >
> > > Hi Matthias,
> > >
> > > Yes, I'm sorry I didn't respond sooner, but I still have a lot of
> > concerns
> > > about this.  You're correct to point out that transform() can be used
> for
> > > some of the output situations I pointed out; albeit it seems somewhat
> > > awkward to do so in a "transform" method; what do you do with the
> retval?
> > >
> > > The PAPI processors I use in my KStreams app are all functioning on
> > KTable
> > > internals.  I wouldn't be able to convert them 

Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-09 Thread Michael Noll
Thanks for the update, Matthias.

+1 to the points 1,2,3,4 you mentioned.

Naming is always a tricky subject, but renaming KStreamBuilder
to StreamsTopologyBuilder looks ok to me (I would have had a slight
preference towards DslTopologyBuilder, but hey.)  The most important aspect
is, IMHO, what you also pointed out:  to make it clear that the current
KStreamBuilder actually builds a topology (though currently the latter is
actually called `TopologyBuilder` currently), and not a `KStream`.





On Wed, Mar 8, 2017 at 11:43 PM, Matthias J. Sax 
wrote:

> Hi,
>
> sorry for not replying earlier and thanks for all your feedback. After
> some more discussions I updated the KIP. The new proposal puts some
> other design considerations into account, that I want to highlight
> shortly. Those considerations, automatically resolve the concerns raised.
>
> First some answers:
>
> > The PAPI processors I use in my KStreams app are all functioning on
> KTable
> > internals.  I wouldn't be able to convert them to process()/transform().
> >
> > What's the harm in permitting both APIs to be used in the same
> application?
>
> It's not about "harm" but about design. We want to switch from a
> "inheritance" to a "composition" pattern.
>
> About the interface idea: using a shared interface would not help to get
> a composition pattern
>
>
> Next I want to give the design considerations leading to the updated KIP:
>
> 1) Using KStreamBuilder in the constructor of KafkaStreams is unnatural.
> KafkaStreams client executes a `Topology` and this execution should be
> independent of the way the topology is "put together", ie, low-level API
> or DSL.
>
> 2) Thus, we don't want to have any changes to KafkaStreams class.
>
> 3) Thus, KStreamBuilder needs to have a method `build()` that returns a
> `Topology` that can be passed into KafakStreams.
>
> 4) Because `KStreamBuilder` should build a `Topology` I suggest to
> rename the new class to `StreamsTopologyBuilder` (the name
> TopologyBuilder would actually be more natural, but would be easily
> confused with old low-level API TopologyBuilder).
>
> Thus, PAPI and DSL can be mixed-and-matched with full power, as
> StreamsTopologyBuilder return the created Topology via #build().
>
> I also removed `final` for both builder classes.
>
>
>
> With regard to the larger scope of the overal API redesign, I also want
> to point to a summary of API issues:
> https://cwiki.apache.org/confluence/display/KAFKA/
> Kafka+Streams+Discussions
>
> Thus, this KIP is only one building block of a larger improvement
> effort, and we hope to get as much as possible done for 0.11. If you
> have any API improvement ideas, please share them so we can come up with
> an holistic sound design (instead of uncoordinated local improvements
> that might diverge)
>
>
>
> Looking forward to your feedback on this KIP and the other API issues.
>
>
>
> -Matthias
>
>
>
>
> On 2/15/17 7:36 PM, Mathieu Fenniak wrote:
> > On Wed, Feb 15, 2017 at 5:04 PM, Matthias J. Sax 
> > wrote:
> >
> >> - We also removed method #topologyBuilder() from KStreamBuilder because
> >> we think #transform() should provide all functionality you need to
> >> mix-an-match Processor API and DSL. If there is any further concern
> >> about this, please let us know.
> >>
> >
> > Hi Matthias,
> >
> > Yes, I'm sorry I didn't respond sooner, but I still have a lot of
> concerns
> > about this.  You're correct to point out that transform() can be used for
> > some of the output situations I pointed out; albeit it seems somewhat
> > awkward to do so in a "transform" method; what do you do with the retval?
> >
> > The PAPI processors I use in my KStreams app are all functioning on
> KTable
> > internals.  I wouldn't be able to convert them to process()/transform().
> >
> > What's the harm in permitting both APIs to be used in the same
> application?
> >
> > Mathieu
> >
>
>


Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-03-08 Thread Matthias J. Sax
Hi,

sorry for not replying earlier and thanks for all your feedback. After
some more discussions I updated the KIP. The new proposal puts some
other design considerations into account, that I want to highlight
shortly. Those considerations, automatically resolve the concerns raised.

First some answers:

> The PAPI processors I use in my KStreams app are all functioning on KTable
> internals.  I wouldn't be able to convert them to process()/transform().
> 
> What's the harm in permitting both APIs to be used in the same application?

It's not about "harm" but about design. We want to switch from a
"inheritance" to a "composition" pattern.

About the interface idea: using a shared interface would not help to get
a composition pattern


Next I want to give the design considerations leading to the updated KIP:

1) Using KStreamBuilder in the constructor of KafkaStreams is unnatural.
KafkaStreams client executes a `Topology` and this execution should be
independent of the way the topology is "put together", ie, low-level API
or DSL.

2) Thus, we don't want to have any changes to KafkaStreams class.

3) Thus, KStreamBuilder needs to have a method `build()` that returns a
`Topology` that can be passed into KafakStreams.

4) Because `KStreamBuilder` should build a `Topology` I suggest to
rename the new class to `StreamsTopologyBuilder` (the name
TopologyBuilder would actually be more natural, but would be easily
confused with old low-level API TopologyBuilder).

Thus, PAPI and DSL can be mixed-and-matched with full power, as
StreamsTopologyBuilder return the created Topology via #build().

I also removed `final` for both builder classes.



With regard to the larger scope of the overal API redesign, I also want
to point to a summary of API issues:
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Discussions

Thus, this KIP is only one building block of a larger improvement
effort, and we hope to get as much as possible done for 0.11. If you
have any API improvement ideas, please share them so we can come up with
an holistic sound design (instead of uncoordinated local improvements
that might diverge)



Looking forward to your feedback on this KIP and the other API issues.



-Matthias




On 2/15/17 7:36 PM, Mathieu Fenniak wrote:
> On Wed, Feb 15, 2017 at 5:04 PM, Matthias J. Sax 
> wrote:
> 
>> - We also removed method #topologyBuilder() from KStreamBuilder because
>> we think #transform() should provide all functionality you need to
>> mix-an-match Processor API and DSL. If there is any further concern
>> about this, please let us know.
>>
> 
> Hi Matthias,
> 
> Yes, I'm sorry I didn't respond sooner, but I still have a lot of concerns
> about this.  You're correct to point out that transform() can be used for
> some of the output situations I pointed out; albeit it seems somewhat
> awkward to do so in a "transform" method; what do you do with the retval?
> 
> The PAPI processors I use in my KStreams app are all functioning on KTable
> internals.  I wouldn't be able to convert them to process()/transform().
> 
> What's the harm in permitting both APIs to be used in the same application?
> 
> Mathieu
> 



signature.asc
Description: OpenPGP digital signature


Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-02-15 Thread Mathieu Fenniak
On Wed, Feb 15, 2017 at 5:04 PM, Matthias J. Sax 
wrote:

> - We also removed method #topologyBuilder() from KStreamBuilder because
> we think #transform() should provide all functionality you need to
> mix-an-match Processor API and DSL. If there is any further concern
> about this, please let us know.
>

Hi Matthias,

Yes, I'm sorry I didn't respond sooner, but I still have a lot of concerns
about this.  You're correct to point out that transform() can be used for
some of the output situations I pointed out; albeit it seems somewhat
awkward to do so in a "transform" method; what do you do with the retval?

The PAPI processors I use in my KStreams app are all functioning on KTable
internals.  I wouldn't be able to convert them to process()/transform().

What's the harm in permitting both APIs to be used in the same application?

Mathieu


Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-02-14 Thread Matthias J. Sax
You can already output any number of record within .transform() using
the provided Context object from init()...


-Matthias

On 2/14/17 9:16 AM, Guozhang Wang wrote:
>> and you can't output multiple records or branching logic from a
> transform();
> 
> For output multiple records in transform, we are currently working on
> https://issues.apache.org/jira/browse/KAFKA-4217, I think that should cover
> this use case.
> 
> For branching the output in transform, I agree this is not perfect but I
> think users can follow some patterns like "stream.transform().branch()",
> would that work for you?
> 
> 
> Guozhang
> 
> 
> On Tue, Feb 14, 2017 at 8:29 AM, Mathieu Fenniak <
> mathieu.fenn...@replicon.com> wrote:
> 
>> On Tue, Feb 14, 2017 at 1:14 AM, Guozhang Wang  wrote:
>>
>>> Some thoughts on the mixture usage of DSL / PAPI:
>>>
>>> There were some suggestions on mixing the usage of DSL and PAPI:
>>> https://issues.apache.org/jira/browse/KAFKA-3455, and after thinking it
>> a
>>> bit more carefully, I'd rather not recommend users following this
>> pattern,
>>> since in DSL this can always be achieved in process() / transform().
>> Hence
>>> I think it is okay to prevent such patterns in the new APIs. And for the
>>> same reasons, I think we can remove KStreamBuilder#newName() from the
>>> public APIs.
>>>
>>
>> I'm not sure that things can always be achieved by process() /
>> transform()... there are some limitations to these APIs.  You can't output
>> from a process(), and you can't output multiple records or branching logic
>> from a transform(); these are things that can be done in the PAPI quite
>> easily.
>>
>> I definitely understand a preference for using process()/transform() where
>> possible, but, they don't seem to replace the PAPI.
>>
>> I would love to be operating in a world that was entirely DSL.  But the DSL
>> is limited, and it isn't extensible (... by any stable API).  I don't mind
>> reaching into internals today and making my own life difficult to extend
>> it, and I'd continue to find a way to do that if you made the APIs distinct
>> and split, but I'm just expressing my preference that you not do that. :-)
>>
>> And about printing the topology for debuggability: I agrees this is a
>>> potential drawback, and I'd suggest maintain some functionality to build
>> a
>>> "dry topology" as Mathieu suggested; the difficulty is that, internally
>> we
>>> need a different "copy" of the topology for each thread so that they will
>>> not share any states, so we cannot directly pass in the topology into
>>> KafkaStreams instead of the topology builder. So how about adding a
>>> `ToplogyBuilder#toString` function which calls `build()` internally then
>>> prints the built dry topology?
>>>
>>
>> Well, this sounds better than KafkaStreams#toString() in that it doesn't
>> require a running processor.  But I'd really love to have a simple object
>> model for the topology, not a string output, so that I can output my own
>> debug format.  I currently have that in the form of
>> TopologyBuilder#nodeGroups() & TopologyBuilder#build(Integer).
>>
>> Mathieu
>>
> 
> 
> 



signature.asc
Description: OpenPGP digital signature


Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-02-14 Thread Guozhang Wang
> and you can't output multiple records or branching logic from a
transform();

For output multiple records in transform, we are currently working on
https://issues.apache.org/jira/browse/KAFKA-4217, I think that should cover
this use case.

For branching the output in transform, I agree this is not perfect but I
think users can follow some patterns like "stream.transform().branch()",
would that work for you?


Guozhang


On Tue, Feb 14, 2017 at 8:29 AM, Mathieu Fenniak <
mathieu.fenn...@replicon.com> wrote:

> On Tue, Feb 14, 2017 at 1:14 AM, Guozhang Wang  wrote:
>
> > Some thoughts on the mixture usage of DSL / PAPI:
> >
> > There were some suggestions on mixing the usage of DSL and PAPI:
> > https://issues.apache.org/jira/browse/KAFKA-3455, and after thinking it
> a
> > bit more carefully, I'd rather not recommend users following this
> pattern,
> > since in DSL this can always be achieved in process() / transform().
> Hence
> > I think it is okay to prevent such patterns in the new APIs. And for the
> > same reasons, I think we can remove KStreamBuilder#newName() from the
> > public APIs.
> >
>
> I'm not sure that things can always be achieved by process() /
> transform()... there are some limitations to these APIs.  You can't output
> from a process(), and you can't output multiple records or branching logic
> from a transform(); these are things that can be done in the PAPI quite
> easily.
>
> I definitely understand a preference for using process()/transform() where
> possible, but, they don't seem to replace the PAPI.
>
> I would love to be operating in a world that was entirely DSL.  But the DSL
> is limited, and it isn't extensible (... by any stable API).  I don't mind
> reaching into internals today and making my own life difficult to extend
> it, and I'd continue to find a way to do that if you made the APIs distinct
> and split, but I'm just expressing my preference that you not do that. :-)
>
> And about printing the topology for debuggability: I agrees this is a
> > potential drawback, and I'd suggest maintain some functionality to build
> a
> > "dry topology" as Mathieu suggested; the difficulty is that, internally
> we
> > need a different "copy" of the topology for each thread so that they will
> > not share any states, so we cannot directly pass in the topology into
> > KafkaStreams instead of the topology builder. So how about adding a
> > `ToplogyBuilder#toString` function which calls `build()` internally then
> > prints the built dry topology?
> >
>
> Well, this sounds better than KafkaStreams#toString() in that it doesn't
> require a running processor.  But I'd really love to have a simple object
> model for the topology, not a string output, so that I can output my own
> debug format.  I currently have that in the form of
> TopologyBuilder#nodeGroups() & TopologyBuilder#build(Integer).
>
> Mathieu
>



-- 
-- Guozhang


Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-02-14 Thread Mathieu Fenniak
On Tue, Feb 14, 2017 at 9:37 AM, Damian Guy  wrote:

> > And about printing the topology for debuggability: I agrees this is a
> > > potential drawback, and I'd suggest maintain some functionality to
> build
> > a
> > > "dry topology" as Mathieu suggested; the difficulty is that, internally
> > we
> > > need a different "copy" of the topology for each thread so that they
> will
> > > not share any states, so we cannot directly pass in the topology into
> > > KafkaStreams instead of the topology builder. So how about adding a
> > > `ToplogyBuilder#toString` function which calls `build()` internally
> then
> > > prints the built dry topology?
> > >
> >
> > Well, this sounds better than KafkaStreams#toString() in that it doesn't
> > require a running processor.  But I'd really love to have a simple object
> > model for the topology, not a string output, so that I can output my own
> > debug format.  I currently have that in the form of
> > TopologyBuilder#nodeGroups() & TopologyBuilder#build(Integer).
> >
>
> How about a method on TopologyBuilder that you pass a functional interface
> to and it gets called once for each ProcessorTopology? We may need to add a
> new public class that represents the topology (i'm not sure we want to
> 'expose` ProcessorTopology as public).
>

That sounds awesome.

Mathieu


Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-02-14 Thread Damian Guy
> And about printing the topology for debuggability: I agrees this is a
> > potential drawback, and I'd suggest maintain some functionality to build
> a
> > "dry topology" as Mathieu suggested; the difficulty is that, internally
> we
> > need a different "copy" of the topology for each thread so that they will
> > not share any states, so we cannot directly pass in the topology into
> > KafkaStreams instead of the topology builder. So how about adding a
> > `ToplogyBuilder#toString` function which calls `build()` internally then
> > prints the built dry topology?
> >
>
> Well, this sounds better than KafkaStreams#toString() in that it doesn't
> require a running processor.  But I'd really love to have a simple object
> model for the topology, not a string output, so that I can output my own
> debug format.  I currently have that in the form of
> TopologyBuilder#nodeGroups() & TopologyBuilder#build(Integer).
>

How about a method on TopologyBuilder that you pass a functional interface
to and it gets called once for each ProcessorTopology? We may need to add a
new public class that represents the topology (i'm not sure we want to
'expose` ProcessorTopology as public).


Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-02-14 Thread Mathieu Fenniak
On Tue, Feb 14, 2017 at 1:14 AM, Guozhang Wang  wrote:

> Some thoughts on the mixture usage of DSL / PAPI:
>
> There were some suggestions on mixing the usage of DSL and PAPI:
> https://issues.apache.org/jira/browse/KAFKA-3455, and after thinking it a
> bit more carefully, I'd rather not recommend users following this pattern,
> since in DSL this can always be achieved in process() / transform(). Hence
> I think it is okay to prevent such patterns in the new APIs. And for the
> same reasons, I think we can remove KStreamBuilder#newName() from the
> public APIs.
>

I'm not sure that things can always be achieved by process() /
transform()... there are some limitations to these APIs.  You can't output
from a process(), and you can't output multiple records or branching logic
from a transform(); these are things that can be done in the PAPI quite
easily.

I definitely understand a preference for using process()/transform() where
possible, but, they don't seem to replace the PAPI.

I would love to be operating in a world that was entirely DSL.  But the DSL
is limited, and it isn't extensible (... by any stable API).  I don't mind
reaching into internals today and making my own life difficult to extend
it, and I'd continue to find a way to do that if you made the APIs distinct
and split, but I'm just expressing my preference that you not do that. :-)

And about printing the topology for debuggability: I agrees this is a
> potential drawback, and I'd suggest maintain some functionality to build a
> "dry topology" as Mathieu suggested; the difficulty is that, internally we
> need a different "copy" of the topology for each thread so that they will
> not share any states, so we cannot directly pass in the topology into
> KafkaStreams instead of the topology builder. So how about adding a
> `ToplogyBuilder#toString` function which calls `build()` internally then
> prints the built dry topology?
>

Well, this sounds better than KafkaStreams#toString() in that it doesn't
require a running processor.  But I'd really love to have a simple object
model for the topology, not a string output, so that I can output my own
debug format.  I currently have that in the form of
TopologyBuilder#nodeGroups() & TopologyBuilder#build(Integer).

Mathieu


Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-02-14 Thread Guozhang Wang
Some thoughts on the mixture usage of DSL / PAPI:

There were some suggestions on mixing the usage of DSL and PAPI:
https://issues.apache.org/jira/browse/KAFKA-3455, and after thinking it a
bit more carefully, I'd rather not recommend users following this pattern,
since in DSL this can always be achieved in process() / transform(). Hence
I think it is okay to prevent such patterns in the new APIs. And for the
same reasons, I think we can remove KStreamBuilder#newName() from the
public APIs.

About KStreamBuilder#addInternalTopic(): I admit that we can optimize the
built topology for cases like "table.groupBy(..).aggregate(fn1);
table.groupBy(/*same
groupBy key*/).aggregate(fn2);" we can reuse the same repartition topic,
and we have plans to apply query optimization to the building process of
the topology. For now I'd rather suggest using KStream#through() to reuse
the topic.

And about printing the topology for debuggability: I agrees this is a
potential drawback, and I'd suggest maintain some functionality to build a
"dry topology" as Mathieu suggested; the difficulty is that, internally we
need a different "copy" of the topology for each thread so that they will
not share any states, so we cannot directly pass in the topology into
KafkaStreams instead of the topology builder. So how about adding a
`ToplogyBuilder#toString` function which calls `build()` internally then
prints the built dry topology?


Guozhang


On Tue, Feb 7, 2017 at 6:32 AM, Mathieu Fenniak <
mathieu.fenn...@replicon.com> wrote:

> On Mon, Feb 6, 2017 at 2:35 PM, Matthias J. Sax 
> wrote:
>
> > - adding KStreamBuilder#topologyBuilder() seems like be a good idea to
> > address any concern with limited access to TopologyBuilder and DSL/PAPI
> > mix-and-match approach. However, we should try to cover as much as
> > possible with #process(), #transform() etc.
> >
>
> That sounds like it'll work for me.
>
>
> > - about TopologyBuilder.nodeGroups & TopologyBuilder.build: not sure
> > what analysis you do -- there is also KafkaStreams#toString() that
> > describes the topology/DAG of the job. @Mathieu: Could you use this for
> > your analysis?
> >
>
> Well, I'd like to be able to output a graphviz diagram of my processor
> topology.  I am aware of KafkaStreams#toString(), but, it isn't the format
> I want, if I remember correctly I found it was ambiguous to parse &
> transform, and it also has the limitation of requiring a running and
> processing application as toString() doesn't return anything useful until
> the consumer stream threads are running.
>
> What I've whipped up with the existing ProcessorTopology API (
> https://gist.github.com/mfenniak/04f9c0bea8a1a2e0a747d678117df9f7) just
> builds a "dry" topology (ie. no data being processed) and outputs a graph.
> It's hooked into my app so that I can run with a specific command-line
> option to output the graph without having to start the processor.
>
> It's not the worst thing in the world to lose, or to have to jump through
> some reflection hoops to do. :-)  Perhaps a better approach would be to
> have an API designed specifically for this kind of introspection,
> independent of the much more commonly used API to build a topology.
>
> Mathieu
>



-- 
-- Guozhang


Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-02-07 Thread Mathieu Fenniak
On Mon, Feb 6, 2017 at 2:35 PM, Matthias J. Sax 
wrote:

> - adding KStreamBuilder#topologyBuilder() seems like be a good idea to
> address any concern with limited access to TopologyBuilder and DSL/PAPI
> mix-and-match approach. However, we should try to cover as much as
> possible with #process(), #transform() etc.
>

That sounds like it'll work for me.


> - about TopologyBuilder.nodeGroups & TopologyBuilder.build: not sure
> what analysis you do -- there is also KafkaStreams#toString() that
> describes the topology/DAG of the job. @Mathieu: Could you use this for
> your analysis?
>

Well, I'd like to be able to output a graphviz diagram of my processor
topology.  I am aware of KafkaStreams#toString(), but, it isn't the format
I want, if I remember correctly I found it was ambiguous to parse &
transform, and it also has the limitation of requiring a running and
processing application as toString() doesn't return anything useful until
the consumer stream threads are running.

What I've whipped up with the existing ProcessorTopology API (
https://gist.github.com/mfenniak/04f9c0bea8a1a2e0a747d678117df9f7) just
builds a "dry" topology (ie. no data being processed) and outputs a graph.
It's hooked into my app so that I can run with a specific command-line
option to output the graph without having to start the processor.

It's not the worst thing in the world to lose, or to have to jump through
some reflection hoops to do. :-)  Perhaps a better approach would be to
have an API designed specifically for this kind of introspection,
independent of the much more commonly used API to build a topology.

Mathieu


Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-02-06 Thread Gwen Shapira
Sounds good :)

On Mon, Feb 6, 2017 at 5:40 PM, Matthias J. Sax  wrote:
> Gwen,
>
> thanks for your feedback.
>
> I completely agree that KStreamBuilder#merge() is miss placed and should
> belong to KStream. However, I wanted to keep this KIP focus on one thing.
>
> As mentioned in a previous reply, we plan to have at least one more KIP
> to clean up DSL -- this future KIP should include exact this change.
>
>
> -Matthias
>
>
> On 2/6/17 4:26 PM, Gwen Shapira wrote:
>> I like the cleanup a lot :)
>>
>> The cleaner lines between PAPI and DSL are very helpful to beginners
>> who try to make sense of a long list of methods.
>>
>> I noticed that the "merge" method is still part of StreamBuilder. I
>> thought it belongs inside KStream. Merge seems a lot like the SQL
>> "union" operator, so I expect it to be a method of the same object as
>> "join". At least, it isn't immediately clear to me why "join" and
>> "merge" belong in two different levels of the hierarchy. They both
>> transform two (or more) streams into one.
>>
>> Gwen
>>
>> On Fri, Feb 3, 2017 at 3:33 PM, Matthias J. Sax  
>> wrote:
>>> Hi All,
>>>
>>> I did prepare a KIP to do some cleanup some of Kafka's Streaming API.
>>>
>>> Please have a look here:
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-120%3A+Cleanup+Kafka+Streams+builder+API
>>>
>>> Looking forward to your feedback!
>>>
>>>
>>> -Matthias
>>>
>>
>>
>>
>



-- 
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog


Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-02-06 Thread Matthias J. Sax
Gwen,

thanks for your feedback.

I completely agree that KStreamBuilder#merge() is miss placed and should
belong to KStream. However, I wanted to keep this KIP focus on one thing.

As mentioned in a previous reply, we plan to have at least one more KIP
to clean up DSL -- this future KIP should include exact this change.


-Matthias


On 2/6/17 4:26 PM, Gwen Shapira wrote:
> I like the cleanup a lot :)
> 
> The cleaner lines between PAPI and DSL are very helpful to beginners
> who try to make sense of a long list of methods.
> 
> I noticed that the "merge" method is still part of StreamBuilder. I
> thought it belongs inside KStream. Merge seems a lot like the SQL
> "union" operator, so I expect it to be a method of the same object as
> "join". At least, it isn't immediately clear to me why "join" and
> "merge" belong in two different levels of the hierarchy. They both
> transform two (or more) streams into one.
> 
> Gwen
> 
> On Fri, Feb 3, 2017 at 3:33 PM, Matthias J. Sax  wrote:
>> Hi All,
>>
>> I did prepare a KIP to do some cleanup some of Kafka's Streaming API.
>>
>> Please have a look here:
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-120%3A+Cleanup+Kafka+Streams+builder+API
>>
>> Looking forward to your feedback!
>>
>>
>> -Matthias
>>
> 
> 
> 



signature.asc
Description: OpenPGP digital signature


Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-02-06 Thread Gwen Shapira
I like the cleanup a lot :)

The cleaner lines between PAPI and DSL are very helpful to beginners
who try to make sense of a long list of methods.

I noticed that the "merge" method is still part of StreamBuilder. I
thought it belongs inside KStream. Merge seems a lot like the SQL
"union" operator, so I expect it to be a method of the same object as
"join". At least, it isn't immediately clear to me why "join" and
"merge" belong in two different levels of the hierarchy. They both
transform two (or more) streams into one.

Gwen

On Fri, Feb 3, 2017 at 3:33 PM, Matthias J. Sax  wrote:
> Hi All,
>
> I did prepare a KIP to do some cleanup some of Kafka's Streaming API.
>
> Please have a look here:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-120%3A+Cleanup+Kafka+Streams+builder+API
>
> Looking forward to your feedback!
>
>
> -Matthias
>



-- 
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog


Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-02-06 Thread Matthias J. Sax
Mathieu, Damian,

thanks a lot for your feedback. It's very valuable to see what, how and
why people are using certain methods right now.

We don't want to lock people out (that's why we put this KIP on users
list, too) and we want to keep the ability to mix-and-match DSL and
Processor API.

Furthermore, we actually plan another KIP to clean-up DSL. If we see
that we put some limitation with this KIP, we can add missing features
in this follow up KIP. We want to get both done before 0.10.3 so only
people working on trunk would be affected for some time.


- updated both #build() methods -- I also added one overload for each
that was missing

- adding KStreamBuilder#topologyBuilder() seems like be a good idea to
address any concern with limited access to TopologyBuilder and DSL/PAPI
mix-and-match approach. However, we should try to cover as much as
possible with #process(), #transform() etc.

- about TopologyBuilder#addInternalTopic(): I am not sure about this --
internal topics are a DSL concept to simplify the user code by avoiding
manual calls to #through() -- thus, I would prefer to remove it from
TopologyBuilder. Many methods are "potentially useful" -- as long as
removing a method does not limit things that can be done, I rather
remove them. TopologyBuilder API should give basic and complete API --
but no need for syntactic sugar IMHO. And Mathieu also stated "This
could be worked around fairly easily."

- about TopologyBuilder.nodeGroups & TopologyBuilder.build: not sure
what analysis you do -- there is also KafkaStreams#toString() that
describes the topology/DAG of the job. @Mathieu: Could you use this for
your analysis?

- KStreamBuilder.newName: similar argument as above -- if you use
TopologyBuilder, you have to create your own names (there is no need
that KStreamBuilder to provided helper methods for TopologyBuilder
usage). And Mathieu also stated "Lacking this method would be fairly
easy to work around".


-Matthias



On 2/6/17 8:07 AM, Damian Guy wrote:
> Hi Matthias,
> 
> Thanks for the KIP.  Should TopologyBuilder#build() and
> KStreamBuilder#build accept a StreamsConfig as an argument?
> 
> Should we add a KStreamBuilder#topologyBuilder() for cases where people
> want to mix and match DSL & PAPI? Or do you think we already provide enough
> support for that via the DSL methods, i.e, KStream#process(...),
> KStream.transform(...) etc?
> 
> I wonder if we could leave TopologyBuilder#addInternalTopic() ? It seems
> like it could be potentially useful? Perhaps the method could be renamed.
> 
> Thanks,
> Damian
> 
> On Mon, 6 Feb 2017 at 15:39 Mathieu Fenniak 
> wrote:
> 
>> Hi Matthias,
>>
>> I use a few of the methods that you're pointing out that will be deprecated
>> and don't have an apparent alternative, so I wanted to just let you know
>> what they are and what my use-cases are for them.
>>
>> First of all, I use a combination of DSL and PAPI in the same application
>> very happily.  It looks like this API would not permit that at all... it
>> looks like the intent here is to block that kind of usage, but as the DSL
>> is not extensible (short of building on "internal" API classes and using
>> reflection), this creates some real limitations in the type of applications
>> that can be built.
>>
>> TopologyBuilder.addInternalTopic -- as we've discussed before, I have some
>> reusable join components that build multiple processors w/ internal
>> repartition topics.  We use addInternalTopic to create application-id
>> prefixed repartition topics.  This could be worked around fairly easily.
>>
>> TopologyBuilder.nodeGroups & TopologyBuilder.build -- I have an option in
>> my KS app to run once & output a graphviz document with my entire topology
>> for debugging and analysis purposes; I use these methods to
>> create ProcessorTopology instances to inspect the topology and create this
>> output.  I don't really see any alternative approach with this new API.
>>
>> KStreamBuilder.newName -- Similar to addInternalTopic, I use this to create
>> processor names in reusable components.  Lacking this method would be
>> fairly easy to work around.
>>
>> Mathieu
>>
>>
>> On Fri, Feb 3, 2017 at 4:33 PM, Matthias J. Sax 
>> wrote:
>>
>>> Hi All,
>>>
>>> I did prepare a KIP to do some cleanup some of Kafka's Streaming API.
>>>
>>> Please have a look here:
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>> 120%3A+Cleanup+Kafka+Streams+builder+API
>>>
>>> Looking forward to your feedback!
>>>
>>>
>>> -Matthias
>>>
>>>
>>
> 



signature.asc
Description: OpenPGP digital signature


Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-02-06 Thread Damian Guy
Hi Matthias,

Thanks for the KIP.  Should TopologyBuilder#build() and
KStreamBuilder#build accept a StreamsConfig as an argument?

Should we add a KStreamBuilder#topologyBuilder() for cases where people
want to mix and match DSL & PAPI? Or do you think we already provide enough
support for that via the DSL methods, i.e, KStream#process(...),
KStream.transform(...) etc?

I wonder if we could leave TopologyBuilder#addInternalTopic() ? It seems
like it could be potentially useful? Perhaps the method could be renamed.

Thanks,
Damian

On Mon, 6 Feb 2017 at 15:39 Mathieu Fenniak 
wrote:

> Hi Matthias,
>
> I use a few of the methods that you're pointing out that will be deprecated
> and don't have an apparent alternative, so I wanted to just let you know
> what they are and what my use-cases are for them.
>
> First of all, I use a combination of DSL and PAPI in the same application
> very happily.  It looks like this API would not permit that at all... it
> looks like the intent here is to block that kind of usage, but as the DSL
> is not extensible (short of building on "internal" API classes and using
> reflection), this creates some real limitations in the type of applications
> that can be built.
>
> TopologyBuilder.addInternalTopic -- as we've discussed before, I have some
> reusable join components that build multiple processors w/ internal
> repartition topics.  We use addInternalTopic to create application-id
> prefixed repartition topics.  This could be worked around fairly easily.
>
> TopologyBuilder.nodeGroups & TopologyBuilder.build -- I have an option in
> my KS app to run once & output a graphviz document with my entire topology
> for debugging and analysis purposes; I use these methods to
> create ProcessorTopology instances to inspect the topology and create this
> output.  I don't really see any alternative approach with this new API.
>
> KStreamBuilder.newName -- Similar to addInternalTopic, I use this to create
> processor names in reusable components.  Lacking this method would be
> fairly easy to work around.
>
> Mathieu
>
>
> On Fri, Feb 3, 2017 at 4:33 PM, Matthias J. Sax 
> wrote:
>
> > Hi All,
> >
> > I did prepare a KIP to do some cleanup some of Kafka's Streaming API.
> >
> > Please have a look here:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 120%3A+Cleanup+Kafka+Streams+builder+API
> >
> > Looking forward to your feedback!
> >
> >
> > -Matthias
> >
> >
>


Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-02-06 Thread Mathieu Fenniak
Hi Matthias,

I use a few of the methods that you're pointing out that will be deprecated
and don't have an apparent alternative, so I wanted to just let you know
what they are and what my use-cases are for them.

First of all, I use a combination of DSL and PAPI in the same application
very happily.  It looks like this API would not permit that at all... it
looks like the intent here is to block that kind of usage, but as the DSL
is not extensible (short of building on "internal" API classes and using
reflection), this creates some real limitations in the type of applications
that can be built.

TopologyBuilder.addInternalTopic -- as we've discussed before, I have some
reusable join components that build multiple processors w/ internal
repartition topics.  We use addInternalTopic to create application-id
prefixed repartition topics.  This could be worked around fairly easily.

TopologyBuilder.nodeGroups & TopologyBuilder.build -- I have an option in
my KS app to run once & output a graphviz document with my entire topology
for debugging and analysis purposes; I use these methods to
create ProcessorTopology instances to inspect the topology and create this
output.  I don't really see any alternative approach with this new API.

KStreamBuilder.newName -- Similar to addInternalTopic, I use this to create
processor names in reusable components.  Lacking this method would be
fairly easy to work around.

Mathieu


On Fri, Feb 3, 2017 at 4:33 PM, Matthias J. Sax 
wrote:

> Hi All,
>
> I did prepare a KIP to do some cleanup some of Kafka's Streaming API.
>
> Please have a look here:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 120%3A+Cleanup+Kafka+Streams+builder+API
>
> Looking forward to your feedback!
>
>
> -Matthias
>
>


Fwd: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-02-04 Thread Matthias J. Sax
cc'ed from dev


 Forwarded Message 
Subject: Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API
Date: Sat, 4 Feb 2017 11:30:46 -0800
From: Matthias J. Sax <matth...@confluent.io>
Organization: Confluent Inc
To: d...@kafka.apache.org

I think the right pattern should be to use TopologyBuilder as a class
member, like new KStreamBuilder will do, instead of a class hierarchy,
in case you want to offer your own higher level abstraction to describe
a topology.

Why is it important that you can derive a new class from it?

I am not too strict about about. It's just a suggestion to make it final
to enforce a different usage pattern, that IMHO is better than the
current one.


-Matthias


On 2/3/17 5:15 PM, radai wrote:
> why is it so important to make those classes final ?
> 
> On Fri, Feb 3, 2017 at 3:33 PM, Matthias J. Sax <matth...@confluent.io>
> wrote:
> 
>> Hi All,
>>
>> I did prepare a KIP to do some cleanup some of Kafka's Streaming API.
>>
>> Please have a look here:
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> 120%3A+Cleanup+Kafka+Streams+builder+API
>>
>> Looking forward to your feedback!
>>
>>
>> -Matthias
>>
>>
> 





signature.asc
Description: OpenPGP digital signature


Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-02-03 Thread radai
why is it so important to make those classes final ?

On Fri, Feb 3, 2017 at 3:33 PM, Matthias J. Sax 
wrote:

> Hi All,
>
> I did prepare a KIP to do some cleanup some of Kafka's Streaming API.
>
> Please have a look here:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 120%3A+Cleanup+Kafka+Streams+builder+API
>
> Looking forward to your feedback!
>
>
> -Matthias
>
>