Re: Is this a decent use case for Kafka Streams?

2017-07-13 Thread Jon Yeargers
Unfortunately, this notion isn't applicable: "...At the end of a time window..."

If you comb through the archives of this group, you'll see many questions
about notifications for the 'end of an aggregation window' and a similar
number of replies from the Kafka group stating that such a notion doesn't
really exist. Each window is kept open so that late-arriving records can be
incorporated. You can specify the lifetime of a given window, but you don't
get any sort of signal when it expires. A record that arrives after that
expiration will trigger the creation of a new window.
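
To make the late-record point concrete, here is a minimal plain-Java sketch. It is not the Kafka Streams API: the class name TumblingWindowSketch and its methods are hypothetical, and it assumes non-negative millisecond timestamps and fixed-size, non-overlapping windows. It only shows that a window can be derived from a record's timestamp alone, so a late record updates the window it belongs to and nothing ever signals that the window is "done".

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; this is not the Kafka Streams API.
public class TumblingWindowSketch {
    // Record counts per window, keyed by window start in milliseconds.
    private final Map<Long, Integer> countsByWindow = new TreeMap<>();
    private final long windowSizeMs;

    public TumblingWindowSketch(long windowSizeMs) {
        this.windowSizeMs = windowSizeMs;
    }

    // Start of the fixed-size window that contains the given timestamp.
    public static long windowStart(long timestampMs, long windowSizeMs) {
        return timestampMs - (timestampMs % windowSizeMs);
    }

    // Adds one record to its window and returns that window's updated count.
    public int add(long timestampMs) {
        long start = windowStart(timestampMs, windowSizeMs);
        return countsByWindow.merge(start, 1, Integer::sum);
    }

    public static void main(String[] args) {
        TumblingWindowSketch w = new TumblingWindowSketch(60_000L);
        System.out.println(w.add(5_000L));   // first record in window [0, 60000): prints 1
        System.out.println(w.add(65_000L));  // first record in window [60000, 120000): prints 1
        System.out.println(w.add(59_000L));  // late record, updates window [0, 60000) again: prints 2
    }
}
```

The third call is the interesting one: the record arrives after a newer window has already been started, yet the older window's result changes. Any downstream consumer therefore has to tolerate updated results for a window it has already seen.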




Re: Is this a decent use case for Kafka Streams?

2017-07-13 Thread Eno Thereska
From just looking at your description of the problem, I'd say yes, this looks
like a typical scenario for Kafka Streams. Kafka Streams also supports
exactly-once semantics as of 0.11.

Cheers
Eno




Is this a decent use case for Kafka Streams?

2017-07-12 Thread Stephen Powis
Hey! I was hoping I could get some input from people more experienced with
Kafka Streams to determine whether it would be a good fit for my use case.

I have multi-tenant clients submitting data to a Kafka topic that they want
ETL'd to a third-party service. I'd like to batch and group these by
tenant over a time window, somewhere between 1 and 5 minutes. At the end
of a time window, I'd then issue an API request to the third-party service
for each tenant, sending the batch of data over.

Other points of note:
- Ideally we'd have exactly-once semantics; sending data multiple times
would typically be bad. But we'd need to gracefully handle things like API
request errors / service outages.

- We currently use Storm for stream processing, but the long-running
time windows and the potentially large amount of data stored in memory make
me a bit nervous about using it for this.

Thoughts? Thanks in advance!
Stephen
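
For readers weighing the same design, the per-tenant batching described above can be sketched in plain Java, independent of any streaming framework. Everything here is an assumption for illustration: TenantBatcher is a hypothetical class, the `sender` callback stands in for the third-party API call, and records are plain strings held in memory. A failed batch stays buffered so the next flush retries it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical illustration only; not part of Kafka, Kafka Streams, or Storm.
public class TenantBatcher {
    // Records waiting to be sent, grouped by tenant.
    private final Map<String, List<String>> pending = new HashMap<>();

    // Buffers one record under its tenant.
    public void add(String tenant, String record) {
        pending.computeIfAbsent(tenant, t -> new ArrayList<>()).add(record);
    }

    // Sends one batch per tenant. `sender` returns true on success.
    // Failed batches stay buffered and are retried on the next flush.
    // Returns the number of batches sent successfully.
    public int flush(BiPredicate<String, List<String>> sender) {
        int sent = 0;
        Iterator<Map.Entry<String, List<String>>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, List<String>> entry = it.next();
            // Pass a copy so the sender cannot mutate the buffered batch.
            if (sender.test(entry.getKey(), new ArrayList<>(entry.getValue()))) {
                it.remove();
                sent++;
            }
        }
        return sent;
    }

    // Number of tenants that still have unsent records.
    public int pendingTenants() {
        return pending.size();
    }

    public static void main(String[] args) {
        TenantBatcher batcher = new TenantBatcher();
        batcher.add("tenant-a", "r1");
        batcher.add("tenant-a", "r2");
        batcher.add("tenant-b", "r3");
        // Simulate an outage that affects tenant-b only.
        int sent = batcher.flush((tenant, batch) -> !tenant.equals("tenant-b"));
        System.out.println(sent + " sent, " + batcher.pendingTenants() + " pending");
    }
}
```

In this sketch flush would be called from a timer every 1 to 5 minutes. Note that it gives at-least-once delivery at best: if the remote service accepts a batch but the call still reports failure, the retry sends the batch again, so true exactly-once delivery would also need deduplication or idempotent requests on the receiving side.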