Re: Is this a decent use case for Kafka Streams?
Unfortunately this notion isn't applicable: "...At the end of a time window..."

If you comb through the archives of this group you'll see many questions about notifications for the 'end of an aggregation window' and a similar number of replies from the Kafka group stating that such a notion doesn't really exist. Each window is kept open so that late-arriving records can be incorporated. You can specify the lifetime of a given window, but you don't get any sort of signal when it expires. A record that arrives after said expiration will trigger a new window to be created.

On Wed, Jul 12, 2017 at 5:06 PM, Stephen Powis wrote:
> Hey! I was hoping I could get some input from people more experienced with
> Kafka Streams to determine if they'd be a good use case/solution for me.
>
> I have multi-tenant clients submitting data to a Kafka topic that they want
> ETL'd to a third party service. I'd like to batch and group these by
> tenant over a time window, somewhere between 1 and 5 minutes. At the end
> of a time window then issue an API request to the third party service for
> each tenant sending the batch of data over.
>
> Other points of note:
> - Ideally we'd have exactly-once semantics, sending data multiple times
> would typically be bad. But we'd need to gracefully handle things like API
> request errors / service outages.
>
> - We currently use Storm for doing stream processing, but the long running
> time-windows and potentially large amount of data stored in memory make me
> a bit nervous to use it for this.
>
> Thoughts? Thanks in Advance!
> Stephen
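As a rough illustration of the window behaviour described in the reply above, here is a toy in-memory model in plain Java. It is not the Kafka Streams API; the class name `WindowSketch`, its fields, and the "drop a window once stream time passes its start plus the retention" rule are all assumptions made for the sketch. It only shows the three points from the reply: late records join an open window, expiry produces no signal, and a record arriving after expiry lands in a fresh window.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of per-tenant tumbling windows with a retention period (hypothetical, not Kafka Streams). */
class WindowSketch {
    final long sizeMs;       // length of each window
    final long retentionMs;  // how long a window is kept, measured from its start
    long streamTimeMs = 0;   // highest record timestamp seen so far
    // window start -> tenant -> records batched so far
    final TreeMap<Long, Map<String, List<String>>> windows = new TreeMap<>();

    WindowSketch(long sizeMs, long retentionMs) {
        this.sizeMs = sizeMs;
        this.retentionMs = retentionMs;
    }

    /** Adds a record and returns the current contents of the window it landed in. */
    List<String> add(String tenant, long timestampMs, String value) {
        streamTimeMs = Math.max(streamTimeMs, timestampMs);
        // Expired windows are dropped silently: nothing is emitted and no callback fires.
        windows.headMap(streamTimeMs - retentionMs, true).clear();
        long start = timestampMs - (timestampMs % sizeMs);
        List<String> batch = windows
                .computeIfAbsent(start, s -> new HashMap<>())
                .computeIfAbsent(tenant, t -> new ArrayList<>());
        batch.add(value);
        return batch;
    }

    public static void main(String[] args) {
        WindowSketch w = new WindowSketch(60_000L, 300_000L); // 1 minute windows, 5 minute retention
        w.add("tenant-a", 10_000L, "r1");
        w.add("tenant-a", 70_000L, "r2");
        // Late record: its window [0, 60000) is still retained, so it joins r1.
        System.out.println(w.add("tenant-a", 20_000L, "late"));    // [r1, late]
        w.add("tenant-a", 400_000L, "r3");                         // stream time advances, old windows expire
        // Record arriving after expiry: a new window is created holding only this record.
        System.out.println(w.add("tenant-a", 30_000L, "tooLate")); // [tooLate]
    }
}
```

Note that nothing in the model ever says "window [0, 60000) is finished", which is why an "issue the API request at the end of the window" step has no natural place to hook in.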
Re: Is this a decent use case for Kafka Streams?
From just looking at your description of the problem, I'd say yes, this looks like a typical scenario for Kafka Streams. Kafka Streams also supports exactly-once semantics as of 0.11.

Cheers
Eno

> On 12 Jul 2017, at 17:06, Stephen Powis wrote:
>
> Hey! I was hoping I could get some input from people more experienced with
> Kafka Streams to determine if they'd be a good use case/solution for me.
>
> I have multi-tenant clients submitting data to a Kafka topic that they want
> ETL'd to a third party service. I'd like to batch and group these by
> tenant over a time window, somewhere between 1 and 5 minutes. At the end
> of a time window then issue an API request to the third party service for
> each tenant sending the batch of data over.
>
> Other points of note:
> - Ideally we'd have exactly-once semantics, sending data multiple times
> would typically be bad. But we'd need to gracefully handle things like API
> request errors / service outages.
>
> - We currently use Storm for doing stream processing, but the long running
> time-windows and potentially large amount of data stored in memory make me
> a bit nervous to use it for this.
>
> Thoughts? Thanks in Advance!
> Stephen
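For reference, a minimal sketch of how the exactly-once mode mentioned above is switched on in 0.11. The config keys `application.id`, `bootstrap.servers` and `processing.guarantee` are real Streams settings; the class name `StreamsConfigSketch`, the helper method, and the example values are placeholders invented here. Keep in mind that this guarantee applies to processing that reads from and writes to Kafka; a request to a third party API sits outside it, so retries there would still need their own handling.

```java
import java.util.Properties;

/** Hypothetical helper that assembles Streams settings; only java.util is used. */
class StreamsConfigSketch {
    static Properties exactlyOnceProps(String applicationId, String bootstrapServers) {
        Properties props = new Properties();
        props.put("application.id", applicationId);       // placeholder application name
        props.put("bootstrap.servers", bootstrapServers); // placeholder broker address
        // Available from Kafka 0.11.0; the default is "at_least_once".
        props.put("processing.guarantee", "exactly_once");
        return props;
    }

    public static void main(String[] args) {
        Properties props = exactlyOnceProps("tenant-etl", "localhost:9092");
        System.out.println(props.getProperty("processing.guarantee"));
    }
}
```

The resulting `Properties` object is what would typically be handed to the Streams application when it is constructed.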
Is this a decent use case for Kafka Streams?
Hey! I was hoping I could get some input from people more experienced with Kafka Streams to determine if they'd be a good use case/solution for me.

I have multi-tenant clients submitting data to a Kafka topic that they want ETL'd to a third party service. I'd like to batch and group these by tenant over a time window, somewhere between 1 and 5 minutes. At the end of a time window then issue an API request to the third party service for each tenant sending the batch of data over.

Other points of note:
- Ideally we'd have exactly-once semantics, sending data multiple times would typically be bad. But we'd need to gracefully handle things like API request errors / service outages.

- We currently use Storm for doing stream processing, but the long running time-windows and potentially large amount of data stored in memory make me a bit nervous to use it for this.

Thoughts? Thanks in Advance!
Stephen