Hi Michael,
I agree with what Yan said. While nothing stops you from doing it, it is
not encouraged because it affects throughput and real-time processing.

{quote}
It seems that Samza design suits very well "data transformation" scenarios,
what is not clear is how well can it support external services?
{quote}
We have some similar use cases at LinkedIn where Samza jobs need to
query external data sources. One pattern we use is for the job to
bootstrap the data from the source using a change-capture system like
Databus and buffer it locally before processing the input streams.
Depending on the scale of your data, this model may or may not work for
you. However, there is no built-in support for this in Samza.
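For illustration, here is a minimal sketch of that bootstrap-then-process flow in plain Java. It deliberately does not use the Samza API: the class LocalLookupJob, the ChangeEvent type, and the "url -> rank" data are hypothetical stand-ins. Change events (which in practice would come from a change-capture system such as Databus) are first applied to a local table, and only then are input messages enriched by a local lookup instead of a blocking HTTP call.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of "bootstrap the external data, buffer it locally,
 * then process input". Plain Java, not the Samza API.
 */
public class LocalLookupJob {

    /** Hypothetical change-capture record: an upsert or a delete of one key. */
    public static final class ChangeEvent {
        final String key;
        final String value;
        final boolean deleted;

        public ChangeEvent(String key, String value, boolean deleted) {
            this.key = key;
            this.value = value;
            this.deleted = deleted;
        }
    }

    // Local copy of the external data, keyed by URL (assumed schema).
    private final Map<String, String> localTable = new HashMap<>();
    private boolean bootstrapped = false;

    /** Applies one change event to the local copy. */
    public void applyChange(ChangeEvent e) {
        if (e.deleted) {
            localTable.remove(e.key);
        } else {
            localTable.put(e.key, e.value);
        }
    }

    /** Phase 1: replay the change log before any input message is processed. */
    public void bootstrap(List<ChangeEvent> changeLog) {
        for (ChangeEvent e : changeLog) {
            applyChange(e);
        }
        bootstrapped = true;
    }

    /** Phase 2: enrich an input message with a local, non-blocking lookup. */
    public String process(String url) {
        if (!bootstrapped) {
            throw new IllegalStateException("bootstrap must finish before processing input");
        }
        return url + " -> " + localTable.getOrDefault(url, "unknown");
    }

    public static void main(String[] args) {
        LocalLookupJob job = new LocalLookupJob();
        job.bootstrap(List.of(
                new ChangeEvent("example.com", "rank=42", false),
                new ChangeEvent("old.example", "rank=7", false),
                new ChangeEvent("old.example", null, true)));
        // Later change events keep the local copy fresh while input is processed.
        job.applyChange(new ChangeEvent("example.com", "rank=40", false));
        System.out.println(job.process("example.com")); // example.com -> rank=40
        System.out.println(job.process("old.example")); // old.example -> unknown
    }
}
```

The per-message cost becomes an in-memory map lookup, which is why this avoids the throughput problem of one HTTP round trip per message; the trade-off, as noted above, is that the whole data set has to fit in local storage.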

Thanks!
Navina

On Sun, Sep 20, 2015 at 7:55 PM, Yan Fang <yanfang...@gmail.com> wrote:

> Hi Michael,
>
> Samza is designed for high-throughput and real-time processing. If you are
> using HTTP requests or other external services, you may not get the same
> performance as without them. However, technically speaking, there is
> nothing stopping you from doing this (well, it is discouraged anyway :). Samza
> does not provide this feature by default, so you may want to be a little
> cautious when implementing it.
>
> Thanks,
>
> Fang, Yan
> yanfang...@gmail.com
>
> On Sun, Sep 20, 2015 at 4:28 PM, Michael Sklyar <mikesk...@gmail.com> wrote:
>
> > Hi,
> >
> > What would be the best approach for doing "blocking" operations in Samza?
> >
> > For example, we have a Kafka stream of URLs for which we need to gather
> > external data via HTTP (such as Alexa rank, the page title and
> > headers). Other scenarios include database access and decision making
> > via a rule engine.
> >
> > Samza processes messages in a single thread, and HTTP requests might take
> > hundreds of milliseconds. With the single-threaded design the throughput
> > would be very limited, which could be solved with an asynchronous approach.
> > However, the Samza documentation explicitly states
> > "*You are strongly discouraged from using threads in your job’s code*".
> >
> > It seems that Samza design suits very well "data transformation" scenarios,
> > what is not clear is how well can it support external services?
> >
> > Thanks,
> > Michael Sklyar
> >
>



-- 
Navina R.