This discussion applies to Kafka 0.8.x only? With the new consumer API, offset management can be delegated, we won't need this component any longer. Each partition can record the offset for the committed window then.
Thomas On Tue, Nov 24, 2015 at 12:59 PM, Siyuan Hua <[email protected]> wrote: > Hi all, > > I need your idea for the design of OffsetManager for kafka input operator > > First of all, some background of OffsetManager and why we may need it. The > OffsetManager is a plugin in kafka input operator for cutomized offset > management. The API will be called if the consumer offset(the message that > has been emitted, along with the window id) changes. > OffsetManager is different from offset checkpointing, we still use > checkpointed offset to recover node from failure. > > 2 reasons for the need of OffsetManager: > 1) User might want to store offsets in their own way (hdfs, zookeeper, > database, etc) > 2) User might want to continue consuming at application restart. > > In the current version, the OffsetManager works in a central mode, each > partition only reports the offset(s) to Statslistener, the listener calls > OffsetManager to update the offsets. > The other possibility is make the OffsetManager work in a distributed > mode. Each partition update the offset(s) on its own. > > The distributed mode is more straightforward, but the developer needs to > know it's distributed, you have to manage write from multiple nodes, > collisions on your own. But also it's more real time at no risk of failure > of stats reporter > > Any input is welcome, thanks! > > Regards, > Siyuan > > > > > > On Mon, Nov 16, 2015 at 10:53 AM, Siyuan Hua <[email protected]> > wrote: > > > I will be working on rewriting the kafka input operator. > > > > Here is the ticket > > https://malhar.atlassian.net/browse/MLHR-1904 > > > > Here is some comments on the ticket > > > > The RC2 is out here > > https://people.apache.org/~junrao/kafka-0.9.0.0-candidate2/ > > > > We will keep most features of the old input operator but the internal > > mechanism will be changed, for example, using new API to refresh the > > metadata > > The bugs that will be fixed: > > > > - Synchronized offset checkpoint > > - Transient offsetmanager > > > > New features: > > > > - Support customized partition schema > > - Default OffsetManager using new > > > > Improvement > > > > - Add window id and application name to OffsetManager interface > > - Support multi-topic > > - Easy configuration > > > > > > Please leave thoughts here or on the ticket. Thanks > > > > Best, > > Siyuan > > >
