Denny, I am not sure anyone has actually benchmarked the FileChannel. What kind of performance are you getting right now? If you have a patch that improves performance significantly, please feel free to submit it; we would definitely like to get such a patch committed.

Thanks,
Hari

--
Hari Shreedharan
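Neither message shows how the 5 MB/s figure was measured. Below is a minimal sketch of a standalone harness that isolates FileChannel put/commit throughput from source and sink effects. The class name, directories, and batch counts are illustrative assumptions, not anything from this thread; the context keys are the standard FileChannel parameters.

// Hypothetical harness: measures raw FileChannel put/commit throughput
// with no source or sink attached. Paths and sizes are assumptions.
import org.apache.flume.Context;
import org.apache.flume.Transaction;
import org.apache.flume.channel.file.FileChannel;
import org.apache.flume.conf.Configurables;
import org.apache.flume.event.EventBuilder;

public class FileChannelBench {
  public static void main(String[] args) {
    FileChannel channel = new FileChannel();
    channel.setName("bench");

    Context ctx = new Context();
    ctx.put("checkpointDir", "/tmp/fc-bench/checkpoint"); // assumed path
    ctx.put("dataDirs", "/tmp/fc-bench/data");            // assumed path
    ctx.put("capacity", "1000000");
    ctx.put("transactionCapacity", "1000");
    Configurables.configure(channel, ctx);
    channel.start();

    byte[] body = new byte[500]; // matches the 500-byte bodies in the report
    int batchSize = 1000;        // events per transaction
    int batches = 1000;

    long start = System.nanoTime();
    for (int b = 0; b < batches; b++) {
      Transaction tx = channel.getTransaction();
      tx.begin();
      try {
        for (int i = 0; i < batchSize; i++) {
          channel.put(EventBuilder.withBody(body));
        }
        tx.commit();
      } catch (RuntimeException e) {
        tx.rollback();
        throw e;
      } finally {
        tx.close();
      }
    }
    long elapsedMs = (System.nanoTime() - start) / 1000000L;
    double mb = (double) batchSize * batches * body.length / (1024.0 * 1024.0);
    System.out.printf("put: %.1f MB in %d ms = %.1f MB/s%n",
        mb, elapsedMs, mb * 1000.0 / elapsedMs);
    channel.stop();
  }
}

Each commit syncs the channel's write-ahead log to disk, so events-per-transaction is usually the dominant knob; comparing runs at batch sizes of 1, 100, and 1000 makes it easy to tell whether a low MB/s number is fsync cost or something else.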
On Thursday, August 2, 2012 at 8:02 PM, Denny Ye wrote:

> hi all,
> I posted the performance of MemoryChannel last week; that is normal
> throughput in most environments. By contrast, the FileChannel result is
> well below expectation in the same environment with the same parameters:
> almost 5 MB/s.
>
> I would like to know your throughput results for FileChannel
> specifically. Am I doing something wrong? The result is hard to believe.
>
> I have also tuned it with several code changes, which raised throughput
> to 30 MB/s. I think there are still many points that impact performance.
>
> Would anyone share their throughput results or tuning feedback?
>
> -Regards
> Denny Ye
>
>
> ---------- Forwarded message ----------
> From: Denny Ye <[email protected]>
> Date: 2012/7/25
> Subject: Latest Flume test report and problem
> To: [email protected]
>
>
> hi all,
> Last week I tested Flume with the ScribeSource
> (https://issues.apache.org/jira/browse/FLUME-1382) and the HDFS sink.
> Detailed conditions and deployment are listed below. Too many full GCs
> limit the throughput, and a large number of events are promoted into the
> old generation. I have applied some tuning methods, with little effect.
> Could someone give me feedback or a tip for reducing the GC problem?
> Wish your attention.
>
> PS: Using Mike's report template at
> https://cwiki.apache.org/FLUME/flume-ng-performance-measurements.html
>
> Flume Performance Test 2012-07-25
>
> Overview
> The Flume agent ran on its own physical machine in a single JVM. A
> separate client machine generated load against the Flume box in
> List<LogEntry> format. Flume stored data onto a 4-node HDFS cluster
> configured on its own separate hardware. No virtual machines were used
> in this test.
>
> Hardware specs
> CPU: Intel Xeon L5640, 2 sockets @ 2.27 GHz (12 physical cores)
> Memory: 16 GB
> OS: CentOS release 5.3 (Final)
>
> Flume configuration
> Java version: 1.6.0_20 (Java HotSpot 64-Bit Server VM)
> JAVA OPTS: -Xms1024m -Xmx4096m -XX:PermSize=256m -XX:NewRatio=1
> -XX:SurvivorRatio=5 -XX:InitialTenuringThreshold=15
> -XX:MaxTenuringThreshold=31 -XX:PretenureSizeThreshold=4096
> Num. agents: 1
> Num. parallel flows: 5
> Source: ScribeSource
> Channel: MemoryChannel
> Sink: HDFSEventSink
> Selector: RandomSelector
>
> Config-file
> # list sources, channels, sinks for the agent
> agent.sources = seqGenSrc
> agent.channels = mc1 mc2 mc3 mc4 mc5
> agent.sinks = hdfsSin1 hdfsSin2 hdfsSin3 hdfsSin4 hdfsSin5
>
> # define sources
> agent.sources.seqGenSrc.type = org.apache.flume.source.scribe.ScribeSource
> agent.sources.seqGenSrc.selector.type = io.flume.RandomSelector
>
> # define sinks
> agent.sinks.hdfsSin1.type = hdfs
> agent.sinks.hdfsSin1.hdfs.path = /flume_test/data1/
> agent.sinks.hdfsSin1.hdfs.rollInterval = 300
> agent.sinks.hdfsSin1.hdfs.rollSize = 0
> agent.sinks.hdfsSin1.hdfs.rollCount = 1000000
> agent.sinks.hdfsSin1.hdfs.batchSize = 10000
> agent.sinks.hdfsSin1.hdfs.fileType = DataStream
> agent.sinks.hdfsSin1.hdfs.txnEventMax = 1000
> # ... define sinks #2 #3 #4 #5 ...
>
> # define channels
> agent.channels.mc1.type = memory
> agent.channels.mc1.capacity = 1000000
> agent.channels.mc1.transactionCapacity = 1000
> # ... define channels #2 #3 #4 #5 ...
>
> # specify the channel each sink and source should use
> agent.sources.seqGenSrc.channels = mc1 mc2 mc3 mc4 mc5
> agent.sinks.hdfsSin1.channel = mc1
> # ... specify sinks #2 #3 #4 #5 ...
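The report above exercises MemoryChannel; the FileChannel runs under discussion would swap the channel definitions, roughly as below. This is a sketch, not the poster's actual configuration: the directory paths are assumptions, while the keys are the standard FileChannel parameters.

# define channels -- FileChannel variant (paths are hypothetical)
agent.channels.mc1.type = file
agent.channels.mc1.checkpointDir = /data/flume/mc1/checkpoint
agent.channels.mc1.dataDirs = /data/flume/mc1/data
agent.channels.mc1.capacity = 1000000
agent.channels.mc1.transactionCapacity = 1000
# ... define channels #2 #3 #4 #5 ...

Keeping checkpointDir and each channel's dataDirs on separate physical disks is the usual first lever when FileChannel throughput disappoints, since every transaction commit syncs the data files.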
> Hadoop configuration
> The HDFS sink was connected to a 4-node Hadoop cluster running CDH3u1.
> Each HDFS sink wrote its data to a different path.
>
> Visualization of test setup
> https://lh3.googleusercontent.com/dGumq1pu1Wr3Bj8WJmRHOoLWmUlGqxC4wW7_XCNO9R1wuh15LRXaKKxGoccpjBXtgqcdSVW-vtg
> There are 10 Scribe clients, and each client sends 20 million LogEntry
> objects to the ScribeSource.
>
> Data description
> List<LogEntry> entries containing a string category and a byte-array
> body. The body size is 500 bytes.
>
> Results
> Throughput:
> Average: Source: 46.4 MB/s, Sink: 45.2 MB/s
> Maximum: Source: 67.1 MB/s, Sink: 88.3 MB/s
>
> CPU: Average: 196%, Maximum: 440%
>
> GC: Young GC: 1636 times, Full GC: 384 times
>
> No data loss.
>
> Heap and GC
> By analyzing the JVM heap, we found many LogEntry objects in the old
> generation. We have tried some optimizations, but the results are not
> satisfactory. We will continue to track this limitation.
>
> Full GC log examples:
> [Full GC [PSYoungGen: 1497984K->0K(1797568K)] [PSOldGen:
> 1720643K->1693741K(2097152K)] 3218627K->1693741K(3894720K) [PSPermGen:
> 14566K->14566K(262144K)], 5.0027700 secs] [Times: user=5.01 sys=0.00,
> real=5.00 secs]
> [Full GC [PSYoungGen: 1497960K->0K(1797568K)] [PSOldGen:
> 1693805K->1752540K(2097152K)] 3191765K->1752540K(3894720K) [PSPermGen:
> 14571K->14571K(262144K)], 5.0732570 secs] [Times: user=5.07 sys=0.00,
> real=5.07 secs]
> [Full GC [PSYoungGen: 1497984K->0K(1797568K)] [PSOldGen:
> 1752540K->1642553K(2097152K)] 3250524K->1642553K(3894720K) [PSPermGen:
> 14572K->14568K(262144K)], 5.0710730 secs] [Times: user=5.07 sys=0.01,
> real=5.08 secs]
>
> -Regards
> Denny Ye
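Two details in the report's JVM options stand out next to these logs. First, -Xms1024m with -Xmx4096m lets the heap resize repeatedly under load; pinning -Xms to -Xmx removes that variable. Second, HotSpot stores object age in a 4-bit header field, so -XX:MaxTenuringThreshold=31 effectively behaves as 15. Beyond that, a common experiment on HotSpot 1.6 is to switch the old generation to CMS and log GC activity to measure promotion. The flags below are a sketch of such an experiment, not a verified fix for this workload; the young-gen size and occupancy fraction are assumptions to tune from.

JAVA_OPTS="-Xms4096m -Xmx4096m -Xmn1536m -XX:PermSize=256m \
  -XX:SurvivorRatio=5 -XX:MaxTenuringThreshold=15 \
  -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
  -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly \
  -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/tmp/flume-gc.log"

Note also that the old generation may be holding live data rather than garbage: at 500 bytes per event, five memory channels with capacity 1000000 can legitimately queue roughly 2.5 GB of events, which is consistent with the ~1.7 GB surviving each Full GC above. Lowering the channel capacity shrinks that resident set directly, whatever collector is in use.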
