Just to chime in, I'd be very interested in the monitoring blog post too. We're doing a Kafka implementation for a robust data pipeline. At first glance, Samza does look interesting for monitoring use cases.
On Sun, Oct 20, 2013 at 2:53 PM, Garry Turkington <[email protected]> wrote:

> Hi Chris,
>
> Thanks for all this, makes sense. Be interested to hear where things go
> with the locality optimizations. I'm just looking at deploying our first
> Kafka cluster to change how we do data distribution and that's not going to
> initially be collocated with the Hadoop cluster. Samza's tight Kafka
> integration is one of the things that has drawn me to it so I'm looking
> forward (!) to seeing what sort of performance/latency I get from the
> remote/smaller Kafka setup.
>
> Looking forward to the blog post on the monitoring jobs written in Samza.
> We're in the earlier stages of a common service framework so have the
> luxury of building on the experiences of others who learned this stuff the
> hard way. :)
>
> Regards
> Garry
>
> -----Original Message-----
> From: Chris Riccomini [mailto:[email protected]]
> Sent: 18 October 2013 19:01
> To: [email protected]
> Subject: Re: Special Bay Area HUG: Tajo and Samza
>
> Hey Gary,
>
> Thanks!
>
> Locality: A few things to note here.
>
> 1. We run one broker per host, as you suggest (18 nodes = 18 brokers).
> 2. Samza does not explicitly try to do any co-location right now. Any
>    locality that we get is purely luck.
> 3. YARN allows you to make resource requests for a specific host/rack.
>    This is the feature we would like to use to provide better locality.
>
> We haven't done any meaningful evaluation of the locality we're getting
> (or would get) right now, though.
>
> Operations: Yes, we have a pretty cool set of Samza jobs that Jakob wrote
> to do some metrics/monitoring stuff. He can probably talk more about it
> than I can. We're planning on putting up a blog post in the near future
> about it.
>
> More broadly, we have a pretty well defined service container at LinkedIn.
> These services are called via RPC. Every time an RPC request is made, the
> service logs out information about the request: who sent the request, what
> method was called, how long it took to process, etc etc. In addition, we
> also have all WARN/ERROR log events flowing through Kafka as well (via
> Kafka's Log4j appender). There is a brief mention of this in:
>
> http://sites.computer.org/debull/A12june/pipeline.pdf
>
> As you can imagine, there are a ton of things you can do with this data. :)
>
> Cheers,
> Chris
>
> On 10/18/13 4:44 AM, "Garry Turkington" <[email protected]>
> wrote:
>
> >Hi Chris,
> >
> >Nice presentation -- 2 questions:
> >
> >1. I had wondered about the references to Kafka broker colocation I'd
> >seen around the place. So for example in the 18-node sized cluster you
> >mention you'd have 18 Kafka brokers running there, 1 per host? Do you
> >actually get any sort of data locality benefits from this, is there a
> >way to ensure that the Samza container on host x is processing the
> >partitions of each topic on the collocated Kafka broker? Or am I missing
> >the intent?
> >
> >2. Interested at your mention of using something like Samza for
> >processing of monitoring and metric type data, it's something we've
> >been talking about internally. Anything been published on what you are
> >doing in that space?
> >
> >Thanks!
> >Garry
> >
> >-----Original Message-----
> >From: Chris Riccomini [mailto:[email protected]]
> >Sent: 17 October 2013 21:54
> >To: [email protected]
> >Subject: Re: Special Bay Area HUG: Tajo and Samza
> >
> >Hey Guys,
> >
> >On a related note, my talk from the YARN meet up at LinkedIn is now
> >online:
> >
> > https://www.youtube.com/watch?v=7YBmUKjzg7c
> >
> >If you're not too familiar with Samza, this is a great place to start.
> >
> >Also, feedback welcome on presentation content, style, etc.
> >
> >Cheers,
> >Chris
> >
> >On 10/17/13 11:08 AM, "Jakob Homan" <[email protected]> wrote:
> >
> >>Hey everybody-
> >>  Join us at LinkedIn Nov. 5 for a special HUG dedicated to two new
> >>awesome Incubator projects, Tajo, a low-latency SQL query engine atop
> >>YARN and Samza.
> >>
> >>http://www.meetup.com/hadoop/events/146077932/
> >>
> >>-Jakob
