Hi Michael,

Samza is designed for high-throughput, real-time processing. If your job makes
HTTP requests or calls other external services, you may not get the same
performance as a job that does not. Technically speaking, though, nothing
prevents you from doing this (well, it is discouraged anyway :). Samza does
not provide built-in support for this by default, so you may want to be a
little cautious when implementing it.
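As one hedged illustration of "being cautious", a job can avoid repeating the same blocking call for keys it has already seen. The sketch below is plain Java with no Samza API. `CachedLookup` and the `fetch` function (standing in for the HTTP call or database query) are hypothetical names, and a cache like this only helps if the same URLs recur and slightly stale results are acceptable.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical helper: a small LRU cache in front of a blocking lookup
 * (for example, an HTTP call that fetches a page title). It uses no extra
 * threads, so it fits a single-threaded, one-message-at-a-time task.
 */
public class CachedLookup {
    private final Function<String, String> fetch; // the blocking call, supplied by the caller
    private final Map<String, String> cache;
    private int misses = 0;

    public CachedLookup(Function<String, String> fetch, int maxEntries) {
        this.fetch = fetch;
        // accessOrder = true: iteration order runs from least to most recently used
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                // Evict the least recently used entry once the limit is exceeded
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached value, paying the external latency only on a miss. */
    public String get(String key) {
        String value = cache.get(key);
        if (value == null) {
            misses++;
            value = fetch.apply(key);
            cache.put(key, value);
        }
        return value;
    }

    /** Number of times the underlying blocking call was made. */
    public int misses() {
        return misses;
    }
}
```

For example, `new CachedLookup(url -> fetchTitle(url), 10_000)` (with `fetchTitle` being whatever blocking client the job already uses) would call the external service once per distinct URL still held in the cache, rather than once per message.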

Thanks,

Fang, Yan
yanfang...@gmail.com

On Sun, Sep 20, 2015 at 4:28 PM, Michael Sklyar <mikesk...@gmail.com> wrote:

> Hi,
>
> What would be the best approach for doing "blocking" operations in Samza?
>
> For example, we have a Kafka stream of URLs for which we need to gather
> external data via HTTP (such as the Alexa rank, the page title and
> headers). Other scenarios include database access and decision making via
> a rule engine.
>
> Samza processes messages in a single thread, and HTTP requests might take
> hundreds of milliseconds. With the single-threaded design the throughput
> would be very limited, which could be solved with an asynchronous approach.
> However, the Samza documentation explicitly states
> "*You are strongly discouraged from using threads in your job’s code*".
>
> It seems that Samza's design suits "data transformation" scenarios very well;
> what is not clear is how well it can support calls to external services.
>
> Thanks,
> Michael Sklyar
>
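Michael's throughput concern can be made concrete with Little's law: if each message waits on one external call of latency L, a task that handles one message at a time cannot exceed 1/L messages per second, and keeping k calls in flight raises that bound to roughly k/L. A minimal Java sketch of the arithmetic follows; `ThroughputEstimate` is a hypothetical name, and the 200 ms latency and 50 in-flight figures are illustrative assumptions, not measurements.

```java
/**
 * Back-of-the-envelope throughput bound for one task that makes a blocking
 * external call per message. Assumes the external call dominates processing time.
 */
public class ThroughputEstimate {
    /** Upper bound on messages per second with {@code inFlight} requests outstanding at once. */
    public static double maxMessagesPerSecond(double latencyMs, int inFlight) {
        if (latencyMs <= 0 || inFlight <= 0) {
            throw new IllegalArgumentException("latencyMs and inFlight must be positive");
        }
        // Little's law: throughput = concurrency / latency (latency converted to seconds)
        return inFlight * 1000.0 / latencyMs;
    }

    public static void main(String[] args) {
        // One request at a time, 200 ms each: the "hundreds of milliseconds" case
        System.out.println(maxMessagesPerSecond(200, 1));
        // Same latency with 50 requests in flight
        System.out.println(maxMessagesPerSecond(200, 50));
    }
}
```

Under these assumptions a strictly sequential task tops out at 5 messages per second, while 50 concurrent requests would allow about 250, which is the gap an asynchronous approach is meant to close.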
