Hi Chris,

Thanks for all this, it makes sense. I'd be interested to hear where things go
with the locality optimizations. I'm just looking at deploying our first Kafka
cluster to change how we do data distribution, and it won't initially be
colocated with the Hadoop cluster. Samza's tight Kafka
integration is one of the things that has drawn me to it so I'm looking forward 
(!) to seeing what sort of performance/latency I get from the remote/smaller 
Kafka setup.

Looking forward to the blog post on the monitoring jobs written in Samza.  
We're in the earlier stages of a common service framework so have the luxury of 
building on the experiences of others who learned this stuff the hard way. :)

Regards
Garry

-----Original Message-----
From: Chris Riccomini [mailto:[email protected]] 
Sent: 18 October 2013 19:01
To: [email protected]
Subject: Re: Special Bay Area HUG: Tajo and Samza

Hey Garry,

Thanks!

Locality: A few things to note here.

1. We run one broker per host, as you suggest (18 nodes = 18 brokers).
2. Samza does not explicitly try to do any co-location right now. Any locality 
that we get is purely luck.
3. YARN allows you to make resource requests for a specific host/rack.
This is the feature we would like to use to provide better locality.

We haven't done any meaningful evaluation of the locality we're getting (or 
would get) right now, though.
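
As a rough illustration of what such an evaluation could measure, here is a minimal sketch in Python. It assumes three hypothetical inputs that are not Samza, Kafka, or YARN APIs: a map from partition to the host of its leader broker, a map from partition to the container consuming it, and a map from container to the host it runs on. It simply reports the fraction of partitions read by a container on the same host as the leader broker.

```python
def locality_fraction(partition_leader_host, partition_container, container_host):
    """Fraction of partitions whose consuming container shares a host
    with the partition's leader broker.

    All three arguments are hypothetical plain dicts used for
    illustration; they do not correspond to any real Samza/Kafka API.
    """
    if not partition_leader_host:
        return 0.0
    local = 0
    for partition, leader_host in partition_leader_host.items():
        container = partition_container[partition]
        # A read is "local" when the container runs where the leader lives.
        if container_host[container] == leader_host:
            local += 1
    return local / len(partition_leader_host)


# Made-up placement: 4 partitions, 2 containers, 3 hosts.
leaders = {0: "host-a", 1: "host-b", 2: "host-c", 3: "host-a"}
assignment = {0: "c0", 1: "c1", 2: "c0", 3: "c1"}
hosts = {"c0": "host-a", "c1": "host-c"}
print(locality_fraction(leaders, assignment, hosts))
```

With no explicit co-location, the result depends entirely on where the containers happen to be scheduled, which is the "purely luck" case described in point 2.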

Operations: Yes, we have a pretty cool set of Samza jobs that Jakob wrote to do 
some metrics/monitoring stuff. He can probably talk more about it than I can. 
We're planning on putting up a blog post in the near future about it.

More broadly, we have a pretty well defined service container at LinkedIn.
These services are called via RPC. Every time an RPC request is made, the
service logs information about the request: who sent it, what method was
called, how long it took to process, etc. In addition, all WARN/ERROR log
events flow through Kafka (via Kafka's Log4j appender). There is a brief
mention of this in:

  http://sites.computer.org/debull/A12june/pipeline.pdf

As you can imagine, there are a ton of things you can do with this data. :)

Cheers,
Chris

On 10/18/13 4:44 AM, "Garry Turkington" <[email protected]>
wrote:

>Hi Chris,
>
>Nice presentation -- 2 questions:
>
>1. I had wondered about the references to Kafka broker colocation I'd 
>seen around the place.  So for example in the 18-node sized cluster you 
>mention you'd have 18 Kafka brokers running there, 1 per host?  Do you 
>actually get any sort of data locality benefits from this, is there a 
>way to ensure that the Samza container on host x is processing the 
>partitions of each topic on the collocated Kafka broker?  Or am I missing the 
>intent?
>
>2. Interested in your mention of using something like Samza for 
>processing of monitoring and metric type data, it's something we've 
>been talking about internally.  Anything been published on what you are 
>doing in that space?
>
>Thanks!
>Garry
>
>-----Original Message-----
>From: Chris Riccomini [mailto:[email protected]]
>Sent: 17 October 2013 21:54
>To: [email protected]
>Subject: Re: Special Bay Area HUG: Tajo and Samza
>
>Hey Guys,
>
>On a related note, my talk from the YARN meet up at LinkedIn is now
>online:
>
>  https://www.youtube.com/watch?v=7YBmUKjzg7c
>
>If you're not too familiar with Samza, this is a great place to start.
>
>Also, feedback welcome on presentation content, style, etc.
>
>Cheers,
>Chris
>
>On 10/17/13 11:08 AM, "Jakob Homan" <[email protected]> wrote:
>
>>Hey everybody-
>>   Join us at LinkedIn Nov. 5 for a special HUG dedicated to two new 
>>awesome Incubator projects: Tajo, a low-latency SQL query engine atop 
>>YARN, and Samza.
>>
>>http://www.meetup.com/hadoop/events/146077932/
>>
>>-Jakob
>
>
>-----
>No virus found in this message.
>Checked by AVG - www.avg.com
>Version: 2013.0.3408 / Virus Database: 3222/6751 - Release Date: 
>10/15/13

