Just to chime in, I'd be very interested in the monitoring blog post too.
We're building a Kafka-based implementation for a robust data pipeline. At
first glance, Samza looks interesting for monitoring use cases.

On Sun, Oct 20, 2013 at 2:53 PM, Garry Turkington <
[email protected]> wrote:

> Hi Chris,
>
> Thanks for all this, makes sense.  Be interested to hear where things go
> with the locality optimizations. I'm just looking at deploying our first
> Kafka cluster to change how we do data distribution and that's not going to
> initially be collocated with the Hadoop cluster.  Samza's tight Kafka
> integration is one of the things that has drawn me to it so I'm looking
> forward (!) to seeing what sort of performance/latency I get from the
> remote/smaller Kafka setup.
>
> Looking forward to the blog post on the monitoring jobs written in Samza.
> We're in the earlier stages of a common service framework so have the
> luxury of building on the experiences of others who learned this stuff the
> hard way. :)
>
> Regards
> Garry
>
> -----Original Message-----
> From: Chris Riccomini [mailto:[email protected]]
> Sent: 18 October 2013 19:01
> To: [email protected]
> Subject: Re: Special Bay Area HUG: Tajo and Samza
>
> Hey Gary,
>
> Thanks!
>
> Locality: A few things to note here.
>
> 1. We run one broker per host, as you suggest (18 nodes = 18 brokers).
> 2. Samza does not explicitly try to do any co-location right now. Any
> locality that we get is purely luck.
> 3. YARN allows you to make resource requests for a specific host/rack.
> This is the feature we would like to use to provide better locality.
>
> We haven't done any meaningful evaluation of the locality we're getting
> (or would get) right now, though.
>
> Operations: Yes, we have a pretty cool set of Samza jobs that Jakob wrote
> to do some metrics/monitoring stuff. He can probably talk more about it
> than I can. We're planning on putting up a blog post in the near future
> about it.
>
> More broadly, we have a pretty well defined service container at LinkedIn.
> These services are called via RPC. Every time an RPC request is made, the
> service logs out information about the request: who sent the request, what
> method was called, how long it took to process, etc etc. In addition, we
> also have all WARN/ERROR log events flowing through Kafka as well (via
> Kafka's Log4j appender). There is a brief mention of this in:
>
>   http://sites.computer.org/debull/A12june/pipeline.pdf
>
> As you can imagine, there are a ton of things you can do with this data. :)
>
> Cheers,
> Chris
>
> On 10/18/13 4:44 AM, "Garry Turkington" <[email protected]>
> wrote:
>
> >Hi Chris,
> >
> >Nice presentation -- 2 questions:
> >
> >1. I had wondered about the references to Kafka broker colocation I'd
> >seen around the place.  So for example in the 18-node sized cluster you
> >mention you'd have 18 Kafka brokers running there, 1 per host?  Do you
> >actually get any sort of data locality benefits from this, is there a
> >way to ensure that the Samza container on host x is processing the
> >partitions of each topic on the collocated Kafka broker?  Or am I missing
> >the intent?
> >
> >2. Interested in your mention of using something like Samza for
> >processing of monitoring and metric type data, it's something we've
> >been talking about internally.  Anything been published on what you are
> >doing in that space?
> >
> >Thanks!
> >Garry
> >
> >-----Original Message-----
> >From: Chris Riccomini [mailto:[email protected]]
> >Sent: 17 October 2013 21:54
> >To: [email protected]
> >Subject: Re: Special Bay Area HUG: Tajo and Samza
> >
> >Hey Guys,
> >
> >On a related note, my talk from the YARN meet up at LinkedIn is now
> >online:
> >
> >  https://www.youtube.com/watch?v=7YBmUKjzg7c
> >
> >If you're not too familiar with Samza, this is a great place to start.
> >
> >Also, feedback welcome on presentation content, style, etc.
> >
> >Cheers,
> >Chris
> >
> >On 10/17/13 11:08 AM, "Jakob Homan" <[email protected]> wrote:
> >
> >>Hey everybody-
> >>   Join us at LinkedIn Nov. 5 for a special HUG dedicated to two new
> >>awesome Incubator projects, Tajo, a low-latency SQL query engine atop
> >>YARN and Samza.
> >>
> >>http://www.meetup.com/hadoop/events/146077932/
> >>
> >>-Jakob
> >
> >
> >-----
> >No virus found in this message.
> >Checked by AVG - www.avg.com
> >Version: 2013.0.3408 / Virus Database: 3222/6751 - Release Date:
> >10/15/13
>
>
