Jun,
Compression is an awesome feature.
Re: file rolling - I was referring to Apache log rotation, not Kafka.


On Thu, Sep 29, 2011 at 12:20 PM, Jun Rao <[email protected]> wrote:
> Eric,
>
> Thanks for the analysis. A couple of comments:
>
> Kafka recently added the end-to-end compression feature and we will be
> releasing it soon. Please see
> https://issues.apache.org/jira/browse/KAFKA-79 for details.
>
> About the file rolling support, are you referring to Kafka log? Kafka logs
> are rolled based on a preconfigured size.
>
> Thanks,
>
> Jun
>
> On Thu, Sep 29, 2011 at 11:25 AM, Eric Hauser <[email protected]> wrote:
>
>> Jeremy,
>>
>> I've used both Flume and Kafka, and I can provide some info for comparison:
>>
>> Flume
>> - The current Flume release 0.9.4 has some pretty nasty bugs in it
>> (most have been fixed in trunk).
>> - Flume is more complex to maintain operations-wise (IMO) than Kafka
>> since you have to set up masters and collectors (you don't necessarily
>> need collectors if you aren't writing to HDFS)
>> - Flume has a well defined pattern for doing what you want:
>>
>> http://www.cloudera.com/blog/2010/09/using-flume-to-collect-apache-2-web-server-logs/
>>
>> Kafka
>> - If you need multiple Kafka partitions for the logs, you will want to
>> partition by host so the messages arrive in order for the same host
>> - You can use the same piped technique as Flume to publish to Kafka,
>> but you'll have to write a little code to publish and subscribe to the
>> stream
>> - Kafka does not provide any of the file rolling, compression, etc.
>> that Flume provides
>> - If you ever want to do anything more interesting with those log
>> files than just send them to one location, publishing them to Kafka
>> would allow you to add additional consumers later.  Flume has a
>> concept of fanout sinks, but I don't care for the way it works.
>>
>>
>>
>> On Thu, Sep 29, 2011 at 1:48 PM, Jun Rao <[email protected]> wrote:
>> > Jeremy,
>> >
>> > Yes, Kafka will be a good fit for that.
>> >
>> > Thanks,
>> >
>> > Jun
>> >
>> > On Thu, Sep 29, 2011 at 10:12 AM, Jeremy Hanna
>> > <[email protected]> wrote:
>> >
>> >> We have a number of web servers in ec2 and periodically we just blow them
>> >> away and create new ones.  That makes keeping logs problematic.  We're
>> >> looking for a way to stream the logs from those various sources directly to
>> >> a central log server - either just a single server or hdfs or something like
>> >> that.
>> >>
>> >> My question is whether kafka is a good fit for that or should I be looking
>> >> more along the lines of flume or scribe?
>> >>
>> >> Many thanks.
>> >>
>> >> Jeremy
>> >
>>
>
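
For anyone reading this thread later: the "partition by host" suggestion above can be sketched roughly as follows. This is only an illustration of the idea, not Kafka's API or its actual partitioner. The function names (partition_for_host, route) and the choice of CRC32 as the hash are assumptions made for the example. The point is that a stable hash of the hostname always picks the same partition, so lines from one host stay in the order they were published.

```python
import zlib


def partition_for_host(host: str, num_partitions: int) -> int:
    """Map a hostname to a partition index in a stable way (hypothetical helper)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # crc32 gives the same value in every process, unlike Python's
    # built-in hash() for strings, which is randomized per run.
    return zlib.crc32(host.encode("utf-8")) % num_partitions


def route(lines, num_partitions):
    """Group (host, line) pairs by partition, preserving arrival order."""
    partitions = {i: [] for i in range(num_partitions)}
    for host, line in lines:
        partitions[partition_for_host(host, num_partitions)].append((host, line))
    return partitions


if __name__ == "__main__":
    # Made-up sample data: two web servers interleaving access-log lines.
    sample = [
        ("web-1", "GET /a"),
        ("web-2", "GET /b"),
        ("web-1", "GET /c"),
    ]
    for index, entries in route(sample, 4).items():
        print(index, entries)
```

In a real setup the hostname would be handed to the producer as the partitioning key, and how that key maps to a partition depends on the client library and version in use.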
