Re: KafkaStreams aggregation with multiple instance
1. The aggregation is done per message key. As a silly example, if your messages were data about new car sales and you wanted to count how many cars were sold by color, you could consume the messages and then "re-key" them so that the message key is the color. Later in your streams topology, you would aggregate (count) based on that new key. Because Kafka guarantees that the same key always winds up in the same partition, you won't have a scenario where messages with the key "red" end up being consumed by more than one instance. "Red" might always be consumed/aggregated on instance A, "blue" on instance B, and so on.

2. You can use other data stores as state stores, and the documentation describes how to do this. However, my opinion is that unless you have a good reason NOT to use RocksDB, I would use RocksDB, especially to start with.

Hope that helps!

Alex

On Fri, May 7, 2021 at 12:59 AM Pietro Galassi wrote:

> Hi Neeraj,
>
> 1) I have multiple instances reading from ordersTopic and using aggregate
> (sum). So if instance A reads and does a +1 and instance B reads and does
> a +1 at the same time, can I end up with wrong counts (could some +1 be
> lost)? Yes, I'm using message keys and multiple partitions.
>
> 2) What state store can I use? I'm currently using Spring Kafka, and it
> seems to rely on RocksDB.
>
> Regards,
> Pietro
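To make the point about keys and partitions in the reply above concrete, here is a minimal sketch in plain Java with no Kafka dependency. It only simulates the idea: every key maps deterministically to one partition, and each partition keeps its own local count, so no two "instances" ever increment the same key. The class and method names (KeyedCountSketch, partitionFor, countPerPartition) are hypothetical, and the hash used here is a stand-in: Kafka's default partitioner hashes the serialized key bytes with murmur2, not String.hashCode().

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of key-based partitioning; not Kafka Streams code.
// All names in this class are hypothetical and chosen for illustration.
class KeyedCountSketch {

    // Stand-in partitioner (assumption): the same key always yields the same
    // partition. Kafka's default partitioner uses a different hash (murmur2
    // over the serialized key), but has the same deterministic property.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // One local count map per partition. Since a partition is consumed by
    // exactly one instance of a consumer group, each map models the local
    // state of whichever instance owns that partition.
    static List<Map<String, Long>> countPerPartition(List<String> keys, int numPartitions) {
        List<Map<String, Long>> stores = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            stores.add(new HashMap<>());
        }
        for (String key : keys) {
            // The +1 for a given key always lands in the same local store.
            stores.get(partitionFor(key, numPartitions)).merge(key, 1L, Long::sum);
        }
        return stores;
    }

    public static void main(String[] args) {
        // Messages already re-keyed by color, as in the car-sales example.
        List<String> colors = List.of("red", "blue", "red", "green", "blue", "red");
        List<Map<String, Long>> stores = countPerPartition(colors, 3);
        for (int p = 0; p < stores.size(); p++) {
            System.out.println("partition " + p + " -> " + stores.get(p));
        }
    }
}
```

Whatever partition "red" hashes to, all three "red" messages are counted in that one store, which is why concurrent +1 updates from different instances do not collide for the same key.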
Re: KafkaStreams aggregation with multiple instance
Hi Neeraj,

1) I have multiple instances reading from ordersTopic and using aggregate (sum). So if instance A reads and does a +1 and instance B reads and does a +1 at the same time, can I end up with wrong counts (could some +1 be lost)? Yes, I'm using message keys and multiple partitions.

2) What state store can I use? I'm currently using Spring Kafka, and it seems to rely on RocksDB.

Regards,
Pietro

On Fri, May 7, 2021 at 12:39 AM Neeraj Vaidya wrote:

> Hi Pietro,
>
> 1) What do you mean by problems in counts due to multiple instances?
> Also, do you use keys in your messages?
>
> 2) If you want to maintain state and refer to that state when processing
> each message, then yes, you will need a state store. A state store will
> also be needed, I guess, if you want to query that state externally.
>
> Regards,
> Neeraj
Re: KafkaStreams aggregation with multiple instance
Hi Pietro,

1) What do you mean by problems in counts due to multiple instances? Also, do you use keys in your messages?

2) If you want to maintain state and refer to that state when processing each message, then yes, you will need a state store. A state store will also be needed, I guess, if you want to query that state externally.

Regards,
Neeraj

On Friday, 7 May, 2021, 01:47:59 am GMT+10, Pietro Galassi wrote:

> Hi all,
>
> I hope you can help me figure out this scenario.
>
> I have a multi-instance microservice that consumes from a topic
> (ordersTopic); all instances use the same consumer group.
>
> This microservice uses a KStream to aggregate (sum) topic events and
> produces results on another topic (countTopic).
>
> I have two questions:
>
> 1) Can I have problems with the counts due to multiple instances of the
> same microservice?
> 2) Do I need RocksDB and a materialized view in order to store data?
>
> Thanks a lot.
>
> Regards,
> Pietro Galassi
KafkaStreams aggregation with multiple instance
Hi all,

I hope you can help me figure out this scenario.

I have a multi-instance microservice that consumes from a topic (ordersTopic); all instances use the same consumer group.

This microservice uses a KStream to aggregate (sum) topic events and produces results on another topic (countTopic).

I have two questions:

1) Can I have problems with the counts due to multiple instances of the same microservice?
2) Do I need RocksDB and a materialized view in order to store data?

Thanks a lot.

Regards,
Pietro Galassi