Jun, compression is an awesome feature. Re: file rolling - I was referring to Apache log rotation, not Kafka.

On Thu, Sep 29, 2011 at 12:20 PM, Jun Rao <[email protected]> wrote:

> Eric,
>
> Thanks for the analysis. A couple of comments:
>
> Kafka recently added the end-to-end compression feature and we will be
> releasing it soon. Please see
> https://issues.apache.org/jira/browse/KAFKA-79 for details.
>
> About the file rolling support, are you referring to Kafka log? Kafka logs
> are rolled based on a preconfigured size.
>
> Thanks,
>
> Jun
>
> On Thu, Sep 29, 2011 at 11:25 AM, Eric Hauser <[email protected]> wrote:
>
>> Jeremy,
>>
>> I've used both Flume and Kafka, and I can provide some info for comparison:
>>
>> Flume
>> - The current Flume release 0.9.4 has some pretty nasty bugs in it
>> (most have been fixed in trunk).
>> - Flume is more complex to maintain operations-wise (IMO) than Kafka
>> since you have to set up masters and collectors (you don't necessarily
>> need collectors if you aren't writing to HDFS)
>> - Flume has a well defined pattern for doing what you want:
>>
>> http://www.cloudera.com/blog/2010/09/using-flume-to-collect-apache-2-web-server-logs/
>>
>> Kafka
>> - If you need multiple Kafka partitions for the logs, you will want to
>> partition by host so the messages arrive in order for the same host
>> - You can use the same piped technique as Flume to publish to Kafka,
>> but you'll have to write a little code to publish and subscribe to the
>> stream
>> - Kafka does not provide any of the file rolling, compression, etc.
>> that Flume provides
>> - If you ever want to do anything more interesting with those log
>> files than just send them to one location, publishing them to Kafka
>> would allow you to add additional consumers later. Flume has a
>> concept of fanout sinks, but I don't care for the way it works.
>>
>>
>>
>> On Thu, Sep 29, 2011 at 1:48 PM, Jun Rao <[email protected]> wrote:
>> > Jeremy,
>> >
>> > Yes, Kafka will be a good fit for that.
>> >
>> > Thanks,
>> >
>> > Jun
>> >
>> > On Thu, Sep 29, 2011 at 10:12 AM, Jeremy Hanna
>> > <[email protected]> wrote:
>> >
>> >> We have a number of web servers in ec2 and periodically we just blow them
>> >> away and create new ones. That makes keeping logs problematic. We're
>> >> looking for a way to stream the logs from those various sources directly to
>> >> a central log server - either just a single server or hdfs or something like
>> >> that.
>> >>
>> >> My question is whether kafka is a good fit for that or should I be looking
>> >> more along the lines of flume or scribe?
>> >>
>> >> Many thanks.
>> >>
>> >> Jeremy
>> >
>> >
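
For anyone reading this thread later, the "partition by host" suggestion in Eric's message could look roughly like the sketch below. This is only an illustration of the idea, not Kafka's partitioner or any Kafka client API: the names `partition_for_host`, `route`, and `NUM_PARTITIONS` are made up for the example, and the partition count of 4 is an assumption. The point is that a stable hash of the hostname always sends one host's log lines to the same partition, so their relative order is kept within that partition.

```python
import zlib

NUM_PARTITIONS = 4  # hypothetical partition count, not taken from the thread


def partition_for_host(host: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a hostname to a partition index using a stable hash."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # crc32 gives the same value in every process, unlike Python's
    # built-in hash() for strings, which is randomized per process.
    return zlib.crc32(host.encode("utf-8")) % num_partitions


def route(lines):
    """Group (host, log_line) pairs by partition, keeping arrival order."""
    partitions = {}
    for host, line in lines:
        partitions.setdefault(partition_for_host(host), []).append((host, line))
    return partitions


if __name__ == "__main__":
    sample = [
        ("web-01", "GET /index.html 200"),
        ("web-02", "GET /about.html 200"),
        ("web-01", "GET /missing 404"),
    ]
    for partition, entries in sorted(route(sample).items()):
        print(partition, entries)
```

Two different hosts may share a partition; what matters for ordering is that a single host never spans more than one.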
