What support does the new API provide for offset management? On Tue, Nov 24, 2015 at 1:09 PM, Thomas Weise <[email protected]> wrote:
> This discussion applies to Kafka 0.8.x only? > > With the new consumer API, offset management can be delegated, we won't > need this component any longer. Each partition can record the offset for > the committed window then. > > Thomas > > On Tue, Nov 24, 2015 at 12:59 PM, Siyuan Hua <[email protected]> > wrote: > > > Hi all, > > > > I need your idea for the design of OffsetManager for kafka input operator > > > > First of all, some background of OffsetManager and why we may need it. > The > > OffsetManager is a plugin in kafka input operator for cutomized offset > > management. The API will be called if the consumer offset(the message > that > > has been emitted, along with the window id) changes. > > OffsetManager is different from offset checkpointing, we still use > > checkpointed offset to recover node from failure. > > > > 2 reasons for the need of OffsetManager: > > 1) User might want to store offsets in their own way (hdfs, zookeeper, > > database, etc) > > 2) User might want to continue consuming at application restart. > > > > In the current version, the OffsetManager works in a central mode, each > > partition only reports the offset(s) to Statslistener, the listener calls > > OffsetManager to update the offsets. > > The other possibility is make the OffsetManager work in a distributed > > mode. Each partition update the offset(s) on its own. > > > > The distributed mode is more straightforward, but the developer needs to > > know it's distributed, you have to manage write from multiple nodes, > > collisions on your own. But also it's more real time at no risk of > failure > > of stats reporter > > > > Any input is welcome, thanks! > > > > Regards, > > Siyuan > > > > > > > > > > > > On Mon, Nov 16, 2015 at 10:53 AM, Siyuan Hua <[email protected]> > > wrote: > > > > > I will be working on rewriting the kafka input operator. > > > > > > Here is the ticket > > > https://malhar.atlassian.net/browse/MLHR-1904 > > > > > > Here is some comments on the ticket > > > > > > The RC2 is out here > > > https://people.apache.org/~junrao/kafka-0.9.0.0-candidate2/ > > > > > > We will keep most features of the old input operator but the internal > > > mechanism will be changed, for example, using new API to refresh the > > > metadata > > > The bugs that will be fixed: > > > > > > - Synchronized offset checkpoint > > > - Transient offsetmanager > > > > > > New features: > > > > > > - Support customized partition schema > > > - Default OffsetManager using new > > > > > > Improvement > > > > > > - Add window id and application name to OffsetManager interface > > > - Support multi-topic > > > - Easy configuration > > > > > > > > > Please leave thoughts here or on the ticket. Thanks > > > > > > Best, > > > Siyuan > > > > > >
