Re: [DISCUSS] KIP 130: Expose states of active tasks to KafkaStreams public API

Damian Guy Thu, 02 Mar 2017 03:12:10 -0800

Hi Florian,

Thanks for the KIP.


It seems there is some overlap here with what we already have in
KafkaStreams.allMetadata(). This currently returns a
Collection<StreamsMetadata> where each StreamsMetadata instance holds the
state stores and partition assignment for every instance of the
KafkaStreams application. I'm wondering if that is good enough for what you
are trying to achieve? If not could it be modified to include the per
Thread assignment?

Thanks,
Damian






On Wed, 1 Mar 2017 at 22:49 Florian Hussonnois <[email protected]>
wrote:

> Hi Matthias,
>
> First, I will answer to your last question.
>
> The main reason to have both TaskState#assignment and
> TaskState#consumedOffsetsByPartition is that tasks have no consumed offsets
> until at least one message is consumed for each partition even if previous
> offsets exist for the consumer group.
> So yes this methods are redundant as it only diverge at application
> startup.
>
> About the use case, currently we are developping for a customer a little
> framework based on KafkaStreams which transform/denormalize data before
> ingesting into hadoop.
>
> We have a cluster of workers (SpringBoot) which instantiate KStreams
> topologies dynamicaly based on dataflow configurations.
> Each configuration describes a topic to consume and how to process messages
> (this looks like NiFi processors API).
>
> Our architecture is inspired from KafkaConnect. We have topics for configs
> and states which are consumed by each workers (actually we have reused some
> internals classes to the connect API).
>
> Now, we would like to develop UIs to visualize topics and partitions
> consumed by our worker applications.
>
> Also, I think it would be nice to be able,  in the futur, to develop web
> UIs similar to Spark but for KafkaStreams to visualize DAGs...so maybe this
> KIP is just a first step.
>
> Thanks,
>
> 2017-03-01 22:52 GMT+01:00 Matthias J. Sax <[email protected]>:
>
> > Thanks for the KIP.
> >
> > I am wondering a little bit, why you need to expose this information.
> > Can you describe some use cases?
> >
> > Would it be worth to unify this new API with KafkaStreams#state() to get
> > the overall state of an application without the need to call two
> > different methods? Not sure how this unified API might look like though.
> >
> >
> > One minor comment about the API: TaskState#assignment seems to be
> > redundant. It should be the same as
> > TaskState#consumedOffsetsByPartition.keySet()
> >
> > Or do I miss something?
> >
> >
> > -Matthias
> >
> > On 3/1/17 5:19 AM, Florian Hussonnois wrote:
> > > Hi Eno,
> > >
> > > Yes, but the state() method only returns the global state of the
> > > KafkaStream application (ie: CREATED, RUNNING, REBALANCING,
> > > PENDING_SHUTDOWN, NOT_RUNNING).
> > >
> > > An alternative to this KIP would be to change this method to return
> more
> > > information instead of adding a new method.
> > >
> > > 2017-03-01 13:46 GMT+01:00 Eno Thereska <[email protected]>:
> > >
> > >> Thanks Florian,
> > >>
> > >> Have you had a chance to look at the new state methods in 0.10.2,
> e.g.,
> > >> KafkaStreams.state()?
> > >>
> > >> Thanks
> > >> Eno
> > >>> On 1 Mar 2017, at 11:54, Florian Hussonnois <[email protected]>
> > >> wrote:
> > >>>
> > >>> Hi all,
> > >>>
> > >>> I have just created KIP-130 to add a new method to the KafkaStreams
> API
> > >> in
> > >>> order to expose the states of threads and active tasks.
> > >>>
> > >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP+
> > >> 130%3A+Expose+states+of+active+tasks+to+KafkaStreams+public+API
> > >>>
> > >>>
> > >>> Thanks,
> > >>>
> > >>> --
> > >>> Florian HUSSONNOIS
> > >>
> > >>
> > >
> > >
> >
> >
>
>
> --
> Florian HUSSONNOIS
>

Re: [DISCUSS] KIP 130: Expose states of active tasks to KafkaStreams public API

Reply via email to