Just want to make sure I understand the setup:

1. 9 Hadoop servers that were fed the data
2. 1 server that generated the syslog data, which was spread across the 6 Flume agent servers
3. 6 Flume agent servers that collected data in memory and flushed it to the 9 Hadoop servers
Is that right?

On Tue, May 8, 2012 at 1:49 AM, Jarek Jarcec Cecho <jar...@apache.org> wrote:

> Thanks Mike,
> this is indeed very helpful!
>
> Jarcec
>
> On Mon, May 07, 2012 at 06:55:49PM -0700, Mike Percy wrote:
> > Hi folks,
> > Will McQueen and I have been doing some Flume NG stress and performance testing, and we wanted to share some of our recent findings. The focus of the most recent tests has been on the syslog TCP source, memory channel, and HDFS sink.
> >
> > I wrote some software to generate load in syslog format over TCP and to automate some of the analysis. The first thing we wanted to verify is that no data was lost during these tests (a.k.a. correctness), with a close second priority being of course throughput (performance). I used Pig and AvroStorage from piggybank in the data integrity analysis, and committed the compiled (0.11 trunk) piggybank jar so the load analysis scripts would be relatively easy to use. It seems to be compatible with Pig 0.8.1. I am a little wary of having to maintain that type of thing at the Apache org level so for now I have checked all the code in on Github under an ASL 2.0 license:
> >
> > https://github.com/mpercy/flume-load-gen
> >
> > I have created a Wiki page with the performance metrics we have come up with so far. The executive summary is that at the time of this writing, we have observed Flume NG on a single machine processing events at a throughput rate of 70,000+ events/sec with no data loss.
> >
> > https://cwiki.apache.org/confluence/display/FLUME/Flume+NG+Performance+Measurements
> >
> > I have put more details on the wiki page itself. Please let me know if you want me to add more detail. I'll be looking into improving the performance of these components going forward, however we wanted to post these results to set a public performance baseline of Flume NG.
> >
> > If others have done performance testing, we would love to see your results if you can post the details.
> >
> > Regards,
> > Mike