Nice.

On Tue, Nov 24, 2015 at 4:32 PM, Thomas Weise <[email protected]>
wrote:

> The new API provides for explicit offset storage on the broker side (see
> commitXXX methods):
>
>
> http://kafka.apache.org/090/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
>
> With this, we won't need the current offset manager approach.
>
> On Tue, Nov 24, 2015 at 1:32 PM, Pramod Immaneni <[email protected]>
> wrote:
>
> > What support does the new API provide for offset management?
> >
> > On Tue, Nov 24, 2015 at 1:09 PM, Thomas Weise <[email protected]>
> > wrote:
> >
> > > This discussion applies to Kafka 0.8.x only?
> > >
> > > With the new consumer API, offset management can be delegated, so we
> > > won't need this component any longer. Each partition can then record
> > > the offset for the committed window.
> > >
> > > Thomas
> > >
> > > On Tue, Nov 24, 2015 at 12:59 PM, Siyuan Hua <[email protected]>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I would like your input on the design of the OffsetManager for the
> > > > Kafka input operator.
> > > >
> > > > First of all, some background on the OffsetManager and why we may
> > > > need it. The OffsetManager is a plugin in the Kafka input operator
> > > > for customized offset management. Its API is called whenever the
> > > > consumer offset (the message that has been emitted, along with the
> > > > window id) changes.
> > > > The OffsetManager is different from offset checkpointing: we still
> > > > use the checkpointed offset to recover a node from failure.
> > > >
> > > > There are two reasons for needing an OffsetManager:
> > > >   1) Users might want to store offsets in their own way (HDFS,
> > > >      ZooKeeper, a database, etc.)
> > > >   2) Users might want to continue consuming after an application
> > > >      restart.
> > > >
> > > > In the current version, the OffsetManager works in a central mode:
> > > > each partition only reports its offset(s) to the StatsListener, and
> > > > the listener calls the OffsetManager to update the offsets.
> > > > The other possibility is to make the OffsetManager work in a
> > > > distributed mode, where each partition updates its offset(s) on its
> > > > own.
> > > >
> > > > The distributed mode is more straightforward, but the developer
> > > > needs to know that it is distributed and has to handle writes from
> > > > multiple nodes and any collisions on their own. On the other hand,
> > > > it is closer to real time and is not exposed to a failure of the
> > > > stats reporter.
> > > >
> > > > Any input is welcome, thanks!
> > > >
> > > > Regards,
> > > > Siyuan
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Mon, Nov 16, 2015 at 10:53 AM, Siyuan Hua <[email protected]>
> > > > wrote:
> > > >
> > > > > I will be working on rewriting the kafka input operator.
> > > > >
> > > > > Here is the ticket
> > > > > https://malhar.atlassian.net/browse/MLHR-1904
> > > > >
> > > > > Here are some comments on the ticket
> > > > >
> > > > > The RC2 is out here
> > > > > https://people.apache.org/~junrao/kafka-0.9.0.0-candidate2/
> > > > >
> > > > > We will keep most features of the old input operator, but the
> > > > > internal mechanism will change, for example by using the new API
> > > > > to refresh the metadata.
> > > > > The bugs that will be fixed:
> > > > >
> > > > >    - Synchronized offset checkpoint
> > > > >    - Transient offsetmanager
> > > > >
> > > > > New features:
> > > > >
> > > > >    - Support customized partition schema
> > > > >    - Default OffsetManager using new
> > > > >
> > > > > Improvements:
> > > > >
> > > > >    - Add window id and application name to OffsetManager interface
> > > > >    - Support multi-topic
> > > > >    - Easy configuration
> > > > >
> > > > >
> > > > > Please leave thoughts here or on the ticket. Thanks
> > > > >
> > > > > Best,
> > > > > Siyuan
> > > > >
> > > >
> > >
> >
>
