Re: [VOTE] KIP-67: Queryable state for Kafka Streams

Neha Narkhede Sun, 10 Jul 2016 11:22:32 -0700

A few thoughts that became apparent after looking at example code showing
what an application's architecture and code might look like with these
changes. Apologies for the late realization.

1. "user.endpoint" will be defined very differently by different
applications. I don't think Kafka Streams should generalize to accept any
connection URL: we expect to only expose metadata expressed as HostInfo
(which is defined by host & port), and hence need to interpret
"user.endpoint" as host & port. Applications will have their own endpoint
configs that will take many forms, and they will be responsible for parsing
out host and port and configuring Kafka Streams accordingly.
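
To make that concrete, here is a rough sketch of what the application side
might look like. It assumes the application keeps its own endpoint URL and
hands Kafka Streams only host:port under the "user.endpoint" name currently
in the KIP. The class and method names are made up for illustration, and
plain java.util.Properties stands in for the real Streams config.

```java
import java.net.URI;
import java.util.Properties;

public class EndpointConfig {
    // Hypothetical helper: reduce an application-specific endpoint URL to "host:port".
    static String hostPort(String endpointUrl) {
        URI uri = URI.create(endpointUrl);
        if (uri.getHost() == null || uri.getPort() == -1) {
            throw new IllegalArgumentException(
                "endpoint must include an explicit host and port: " + endpointUrl);
        }
        return uri.getHost() + ":" + uri.getPort();
    }

    // Assumption: the setting is still called "user.endpoint", as in the current KIP text.
    static Properties streamsProperties(String endpointUrl) {
        Properties props = new Properties();
        props.setProperty("user.endpoint", hostPort(endpointUrl));
        return props;
    }

    public static void main(String[] args) {
        Properties props = streamsProperties("https://example.com:8080/api/rest/v1/");
        System.out.println(props.getProperty("user.endpoint"));
    }
}
```

The path and scheme stay in the application's own config; only the host and
port are passed on.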

If we are in fact limiting this to host and port, I wonder if we should
change the name of "user.endpoint" to something more specific. We have
clients expose host/port pairs as "bootstrap.servers". Should this be
"application.server"?

2. I don't think we should expose another abstraction called
KafkaStreamsInstance to the user. This is related to the discussion of the
right abstraction that we want to expose to an application. The abstraction
discussion should probably be part of the KIP itself, but let me give a
quick summary of my thoughts here:
   1. The person implementing an application using Queryable State has
   likely already made some choices for the service layer: a REST framework,
   Thrift, or whatever. We don't really want to add another RPC framework to
   this mix, nor do we want to try to make Kafka's RPC mechanism general
   purpose.
   2. Likewise, it should be clear that the API you want to expose to the
   front-end/client service is not necessarily the API you'd need internally,
   as there may be additional filtering/processing in the router.

Given these constraints, what we prefer to add is a fairly low-level
"toolbox" that would let you do anything you want, but requires you to
route and perform any aggregation or processing yourself. This pattern is
not recommended for all kinds of services/apps, but there is definitely a
category of things where it is a big win; other advanced applications are
out of scope.
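
As a rough illustration of "do the aggregation yourself", here is a minimal
scatter-gather sketch. It assumes each instance can return partial counts
for a query. The ScatterGather class and the fetch function are
hypothetical; fetch stands in for whatever RPC layer the application
already uses, and nothing here is Kafka Streams API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class ScatterGather {
    // Calls every instance through the application's own RPC function and
    // merges the partial results by summing the values for each key.
    static Map<String, Long> queryAll(List<String> hostPorts,
                                      Function<String, Map<String, Long>> fetch) {
        Map<String, Long> merged = new HashMap<>();
        for (String hostPort : hostPorts) {
            fetch.apply(hostPort).forEach((key, count) -> merged.merge(key, count, Long::sum));
        }
        return merged;
    }

    public static void main(String[] args) {
        // Fake RPC for the demo: every instance reports one local count for "clicks".
        Function<String, Map<String, Long>> fakeFetch =
            hostPort -> Collections.singletonMap("clicks", 10L);
        List<String> instances = Arrays.asList("host-a:7070", "host-b:7070");
        System.out.println(queryAll(instances, fakeFetch));
    }
}
```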

The APIs we expose should take the following things into consideration:
1. Make it clear to the user that they will do the routing, aggregation,
processing themselves. So the bare minimum that we want to expose is store
and partition metadata per application server identified by the host and
port.
2. Ensure that the API exposes abstractions that are known to the user or
are intuitive to the user.
3. Avoid exposing internal objects or implementation details to the user.

So tying all this into answering the question of what we should expose
through the APIs -

In Kafka Streams, the user is aware of the concept of tasks and partitions
since the application scales with the number of partitions and tasks are
the construct for logical parallelism. The user is also aware of the
concept of state stores though until now they were not user accessible.
With Queryable State, the bare minimum abstractions that we need to expose
are state stores and the location of state store partitions.

For exposing the state stores, the getStore() APIs look good but I think
for locating the state store partitions, we should go back to the original
proposal of simply exposing some sort of getPartitionMetadata() that
returns a PartitionMetadata or TaskMetadata object keyed by HostInfo.

The application will convert the HostInfo (host and port) into some
connection URL to talk to the other app instances via its own RPC mechanism
depending on whether it needs to scatter-gather or just query. The
application will know how a key maps to a partition and through
PartitionMetadata it will know how to locate the server that hosts the
store that has the partition hosting that key.
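
A minimal sketch of that lookup, under several assumptions: the metadata
arrives as a map from HostInfo to the partitions that host serves, the
HostInfo class below is a stand-in and not the real one, partitionFor() is
a placeholder hash (the application would have to use the same partitioning
as its producers), and the URL layout is invented.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.Set;

public class KeyRouter {
    // Stand-in for the HostInfo (host and port) discussed above.
    static final class HostInfo {
        final String host;
        final int port;
        HostInfo(String host, int port) { this.host = host; this.port = port; }
        @Override public boolean equals(Object o) {
            return o instanceof HostInfo
                && ((HostInfo) o).host.equals(host) && ((HostInfo) o).port == port;
        }
        @Override public int hashCode() { return Objects.hash(host, port); }
    }

    // Placeholder key-to-partition mapping; NOT Kafka's default partitioner.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Finds the instance whose partition set contains the key's partition.
    static Optional<HostInfo> hostFor(String key, int numPartitions,
                                      Map<HostInfo, Set<Integer>> metadata) {
        int partition = partitionFor(key, numPartitions);
        return metadata.entrySet().stream()
            .filter(entry -> entry.getValue().contains(partition))
            .map(Map.Entry::getKey)
            .findFirst();
    }

    // The application decides how host and port become a connection URL.
    static String toUrl(HostInfo info, String store, String key) {
        return "http://" + info.host + ":" + info.port + "/state/" + store + "/" + key;
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        int p = partitionFor("alice", numPartitions);
        Map<HostInfo, Set<Integer>> metadata = new HashMap<>();
        metadata.put(new HostInfo("host-a", 7070), Collections.singleton(p));
        metadata.put(new HostInfo("host-b", 7070),
                     Collections.singleton((p + 1) % numPartitions));
        hostFor("alice", numPartitions, metadata)
            .ifPresent(h -> System.out.println(toUrl(h, "counts", "alice")));
    }
}
```

For a scatter-gather query the application would instead iterate over every
HostInfo in the map and merge the answers itself.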

On Fri, Jul 8, 2016 at 9:40 AM, Michael Noll <mich...@confluent.io> wrote:

> Addendum in case my previous email wasn't clear:
>
> > So for any given instance of a streams application there will never be
> > both a v1 and v2 alive at the same time
>
> That's right.  But the current live instance will be able to tell other
> instances, via its endpoint setting, whether it wants to be contacted at v1
> or at v2.  The other instances can't guess that.  Think: if an older
> instance manually composed the "rest" of an endpoint URI, having only
> the host and port from the endpoint setting, it might not know that the new
> instances have a different endpoint suffix, for example.
>
>
> On Fri, Jul 8, 2016 at 6:37 PM, Michael Noll <mich...@confluent.io> wrote:
>
> > Damian,
> >
> > about the rolling upgrade comment:  An instance A will contact another
> > instance B by the latter's endpoint, right?  So if A has no further
> > information available than B's host and port, then how should instance A
> > know whether it should call B at /v1/ or at /v2/?  I agree that my
> > suggestion isn't foolproof, but it is afaict better than the host:port
> > approach.
> >
> >
> >
> > On Fri, Jul 8, 2016 at 5:15 PM, Damian Guy <damian....@gmail.com> wrote:
> >
> >> Michael - i'm ok with changing it to a string. Any one else have a strong
> >> opinion on this?
> >>
> >> FWIW - i don't think it will work fine as is during the rolling upgrade
> >> scenario as the service that is listening on the port needs to be embedded
> >> within each instance. So for any given instance of a streams application
> >> there will never be both a v1 and v2 alive at the same time (unless of
> >> course the process didn't shutdown properly, but then you have another
> >> problem...).
> >>
> >> On Fri, 8 Jul 2016 at 15:26 Michael Noll <mich...@confluent.io> wrote:
> >>
> >> > I have one further comment about `StreamsConfig.USER_ENDPOINT_CONFIG`.
> >> >
> >> > I think we should consider not restricting the value of this setting to
> >> > only `host:port` pairs.  By design, this setting is capturing user-driven
> >> > metadata to define an endpoint, so why restrict the creativity or
> >> > flexibility of our users?  I can imagine, for example, that users would
> >> > like to set values such as `https://host:port/api/rest/v1/` in this field
> >> > (e.g. being able to distinguish between `.../v1/` and `.../v2/` may help in
> >> > scenarios such as rolling upgrades, where, during the upgrade, older
> >> > instances may need to coexist with newer instances).
> >> >
> >> > That said, I don't have a strong opinion here.
> >> >
> >> > -Michael
> >> >
> >> >
> >> >
> >> > On Fri, Jul 8, 2016 at 2:55 PM, Matthias J. Sax <matth...@confluent.io> wrote:
> >> >
> >> > > +1
> >> > >
> >> > > On 07/08/2016 11:03 AM, Eno Thereska wrote:
> >> > > > +1 (non-binding)
> >> > > >
> >> > > >> On 7 Jul 2016, at 18:31, Sriram Subramanian <r...@confluent.io> wrote:
> >> > > >>
> >> > > >> +1
> >> > > >>
> >> > > >> On Thu, Jul 7, 2016 at 9:53 AM, Henry Cai <h...@pinterest.com.invalid>
> >> > > >> wrote:
> >> > > >>
> >> > > >>> +1
> >> > > >>>
> >> > > >>> On Thu, Jul 7, 2016 at 6:48 AM, Michael Noll <mich...@confluent.io> wrote:
> >> > > >>>
> >> > > >>>> +1 (non-binding)
> >> > > >>>>
> >> > > >>>> On Thu, Jul 7, 2016 at 10:24 AM, Damian Guy <damian....@gmail.com> wrote:
> >> > > >>>>
> >> > > >>>>> Thanks Henry - we've updated the KIP with an example and the new config
> >> > > >>>>> parameter required. FWIW the user doesn't register a listener, they provide
> >> > > >>>>> a host:port in config. It is expected they will start a service running on
> >> > > >>>>> that host:port that they can use to connect to the running KafkaStreams
> >> > > >>>>> Instance.
> >> > > >>>>>
> >> > > >>>>> Thanks,
> >> > > >>>>> Damian
> >> > > >>>>>
> >> > > >>>>> On Thu, 7 Jul 2016 at 06:06 Henry Cai <h...@pinterest.com.invalid> wrote:
> >> > > >>>>>
> >> > > >>>>>> It wasn't quite clear to me how the user program interacts with the
> >> > > >>>>>> discovery API, especially on the user supplied listener part: how does
> >> > > >>>>>> the user program supply that listener to KafkaStreams, and how does
> >> > > >>>>>> KafkaStreams know which port the user listener is running on? Maybe a
> >> > > >>>>>> more complete end-to-end example would help, including the steps on
> >> > > >>>>>> registering the user listener and whether the user listener needs to be
> >> > > >>>>>> involved with task reassignment.
> >> > > >>>>>>
> >> > > >>>>>>
> >> > > >>>>>> On Wed, Jul 6, 2016 at 9:13 PM, Guozhang Wang <wangg...@gmail.com> wrote:
> >> > > >>>>>>
> >> > > >>>>>>> +1
> >> > > >>>>>>>
> >> > > >>>>>>> On Wed, Jul 6, 2016 at 12:44 PM, Damian Guy <damian....@gmail.com> wrote:
> >> > > >>>>>>>
> >> > > >>>>>>>> Hi all,
> >> > > >>>>>>>>
> >> > > >>>>>>>> I'd like to initiate the voting process for KIP-67
> >> > > >>>>>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-67%3A+Queryable+state+for+Kafka+Streams>
> >> > > >>>>>>>>
> >> > > >>>>>>>> KAFKA-3909 <https://issues.apache.org/jira/browse/KAFKA-3909> is the
> >> > > >>>>>>>> top level JIRA for this effort.
> >> > > >>>>>>>>
> >> > > >>>>>>>> Initial PRs for Step 1 of the process are:
> >> > > >>>>>>>> Expose State Store Names <https://github.com/apache/kafka/pull/1526> and
> >> > > >>>>>>>> Query Local State Stores <https://github.com/apache/kafka/pull/1565>
> >> > > >>>>>>>>
> >> > > >>>>>>>> Thanks,
> >> > > >>>>>>>> Damian
> >> > > >>>>>>>>
> >> > > >>>>>>>
> >> > > >>>>>>>
> >> > > >>>>>>>
> >> > > >>>>>>> --
> >> > > >>>>>>> -- Guozhang
> >> > > >>>>>>>
> >> > > >>>>>>
> >> > > >>>>>
> >> > > >>>>
> >> > > >>>>
> >> > > >>>>
> >> > > >>>> --
> >> > > >>>> Best regards,
> >> > > >>>> Michael Noll
> >> > > >>>>
> >> > > >>>>
> >> > > >>>>
> >> > > >>>> *Michael G. Noll | Product Manager | Confluent | +1 650.453.5860
> >> > > >>>> Download Apache Kafka and Confluent Platform: www.confluent.io/download
> >> > > >>>> <http://www.confluent.io/download>*
> >> > > >>>>
> >> > > >>>
> >> > > >
> >> > >
> >> > >
> >> >
> >> >
> >> > --
> >> > Best regards,
> >> > Michael Noll
> >> >
> >> >
> >> >
> >> >
> >>
> >
> >
> >
> > --
> > Best regards,
> > Michael Noll
> >
> >
> >
> >
>
>
>
> --
> Best regards,
> Michael Noll
>
>
>
>

-- 
Thanks,
Neha
