Denny, I am not sure anyone has actually benchmarked the FileChannel. What kind of performance are you getting right now? If you have a patch that improves performance significantly, please feel free to submit it; we would definitely like to get such a patch committed.

Thanks,
Hari

--
Hari Shreedharan
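Neither message shows how the 5 MB/s figure was measured. Below is a minimal sketch of a standalone harness that isolates FileChannel put/commit throughput from source and sink effects. The class name, directories, and batch counts are illustrative assumptions, not anything from this thread; the context keys are the standard FileChannel parameters.

// Hypothetical harness: measures raw FileChannel put/commit throughput
// with no source or sink attached. Paths and sizes are assumptions.
import org.apache.flume.Context;
import org.apache.flume.Transaction;
import org.apache.flume.channel.file.FileChannel;
import org.apache.flume.conf.Configurables;
import org.apache.flume.event.EventBuilder;

public class FileChannelBench {
  public static void main(String[] args) {
    FileChannel channel = new FileChannel();
    channel.setName("bench");

    Context ctx = new Context();
    ctx.put("checkpointDir", "/tmp/fc-bench/checkpoint"); // assumed path
    ctx.put("dataDirs", "/tmp/fc-bench/data");            // assumed path
    ctx.put("capacity", "1000000");
    ctx.put("transactionCapacity", "1000");
    Configurables.configure(channel, ctx);
    channel.start();

    byte[] body = new byte[500]; // matches the 500-byte bodies in the report
    int batchSize = 1000;        // events per transaction
    int batches = 1000;

    long start = System.nanoTime();
    for (int b = 0; b < batches; b++) {
      Transaction tx = channel.getTransaction();
      tx.begin();
      try {
        for (int i = 0; i < batchSize; i++) {
          channel.put(EventBuilder.withBody(body));
        }
        tx.commit();
      } catch (RuntimeException e) {
        tx.rollback();
        throw e;
      } finally {
        tx.close();
      }
    }
    long elapsedMs = (System.nanoTime() - start) / 1000000L;
    double mb = (double) batchSize * batches * body.length / (1024.0 * 1024.0);
    System.out.printf("put: %.1f MB in %d ms = %.1f MB/s%n",
        mb, elapsedMs, mb * 1000.0 / elapsedMs);
    channel.stop();
  }
}

Each commit syncs the channel's write-ahead log to disk, so events-per-transaction is usually the dominant knob; comparing runs at batch sizes of 1, 100, and 1000 makes it easy to tell whether a low MB/s number is fsync cost or something else.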
On Thursday, August 2, 2012 at 8:02 PM, Denny Ye wrote:

> hi all,
> I posted the performance of MemoryChannel last week; that is normal
> throughput in most environments. By contrast, the FileChannel result is
> well below expectation in the same environment with the same parameters:
> almost 5 MB/s.
>
> I would like to know your throughput results for FileChannel
> specifically. Am I doing something wrong? The result is hard to believe.
>
> I have also tuned it with several code changes, which raised throughput
> to 30 MB/s. I think there are still many points that impact performance.
>
> Would anyone share their throughput results or tuning feedback?
>
> -Regards
> Denny Ye
>
>
> ---------- Forwarded message ----------
> From: Denny Ye <[email protected]>
> Date: 2012/7/25
> Subject: Latest Flume test report and problem
> To: [email protected]
>
>
> hi all,
> Last week I tested Flume with the ScribeSource
> (https://issues.apache.org/jira/browse/FLUME-1382) and the HDFS sink.
> Detailed conditions and deployment are listed below. Too many full GCs
> limit the throughput, and a large number of events are promoted into the
> old generation. I have applied some tuning methods, with little effect.
> Could someone give me feedback or a tip for reducing the GC problem?
> Wish your attention.
>
> PS: Using Mike's report template at
> https://cwiki.apache.org/FLUME/flume-ng-performance-measurements.html
>
> Flume Performance Test 2012-07-25
>
> Overview
> The Flume agent ran on its own physical machine in a single JVM. A
> separate client machine generated load against the Flume box in
> List<LogEntry> format. Flume stored data onto a 4-node HDFS cluster
> configured on its own separate hardware. No virtual machines were used
> in this test.
>
> Hardware specs
> CPU: Intel Xeon L5640, 2 sockets @ 2.27 GHz (12 physical cores)
> Memory: 16 GB
> OS: CentOS release 5.3 (Final)
>
> Flume configuration
> Java version: 1.6.0_20 (Java HotSpot 64-Bit Server VM)
> JAVA OPTS: -Xms1024m -Xmx4096m -XX:PermSize=256m -XX:NewRatio=1
> -XX:SurvivorRatio=5 -XX:InitialTenuringThreshold=15
> -XX:MaxTenuringThreshold=31 -XX:PretenureSizeThreshold=4096
> Num. agents: 1
> Num. parallel flows: 5
> Source: ScribeSource
> Channel: MemoryChannel
> Sink: HDFSEventSink
> Selector: RandomSelector
>
> Config-file
> # list sources, channels, sinks for the agent
> agent.sources = seqGenSrc
> agent.channels = mc1 mc2 mc3 mc4 mc5
> agent.sinks = hdfsSin1 hdfsSin2 hdfsSin3 hdfsSin4 hdfsSin5
>
> # define sources
> agent.sources.seqGenSrc.type = org.apache.flume.source.scribe.ScribeSource
> agent.sources.seqGenSrc.selector.type = io.flume.RandomSelector
>
> # define sinks
> agent.sinks.hdfsSin1.type = hdfs
> agent.sinks.hdfsSin1.hdfs.path = /flume_test/data1/
> agent.sinks.hdfsSin1.hdfs.rollInterval = 300
> agent.sinks.hdfsSin1.hdfs.rollSize = 0
> agent.sinks.hdfsSin1.hdfs.rollCount = 1000000
> agent.sinks.hdfsSin1.hdfs.batchSize = 10000
> agent.sinks.hdfsSin1.hdfs.fileType = DataStream
> agent.sinks.hdfsSin1.hdfs.txnEventMax = 1000
> # ... define sinks #2 #3 #4 #5 ...
>
> # define channels
> agent.channels.mc1.type = memory
> agent.channels.mc1.capacity = 1000000
> agent.channels.mc1.transactionCapacity = 1000
> # ... define channels #2 #3 #4 #5 ...
>
> # specify the channel each sink and source should use
> agent.sources.seqGenSrc.channels = mc1 mc2 mc3 mc4 mc5
> agent.sinks.hdfsSin1.channel = mc1
> # ... specify sinks #2 #3 #4 #5 ...
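The report above exercises MemoryChannel; the FileChannel runs under discussion would swap the channel definitions, roughly as below. This is a sketch, not the poster's actual configuration: the directory paths are assumptions, while the keys are the standard FileChannel parameters.

# define channels -- FileChannel variant (paths are hypothetical)
agent.channels.mc1.type = file
agent.channels.mc1.checkpointDir = /data/flume/mc1/checkpoint
agent.channels.mc1.dataDirs = /data/flume/mc1/data
agent.channels.mc1.capacity = 1000000
agent.channels.mc1.transactionCapacity = 1000
# ... define channels #2 #3 #4 #5 ...

Keeping checkpointDir and each channel's dataDirs on separate physical disks is the usual first lever when FileChannel throughput disappoints, since every transaction commit syncs the data files.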
> Hadoop configuration
> The HDFS sink was connected to a 4-node Hadoop cluster running CDH3u1.
> Each HDFS sink wrote its data to a different path.
>
> Visualization of test setup
> https://lh3.googleusercontent.com/dGumq1pu1Wr3Bj8WJmRHOoLWmUlGqxC4wW7_XCNO9R1wuh15LRXaKKxGoccpjBXtgqcdSVW-vtg
> There are 10 Scribe clients, and each client sends 20 million LogEntry
> objects to the ScribeSource.
>
> Data description
> List<LogEntry> entries containing a string category and a byte-array
> body. The body size is 500 bytes.
>
> Results
> Throughput:
> Average: Source: 46.4 MB/s, Sink: 45.2 MB/s
> Maximum: Source: 67.1 MB/s, Sink: 88.3 MB/s
>
> CPU: Average: 196%, Maximum: 440%
>
> GC: Young GC: 1636 times, Full GC: 384 times
>
> No data loss.
>
> Heap and GC
> By analyzing the JVM heap, we found many LogEntry objects in the old
> generation. We have tried some optimizations, but the results are not
> satisfactory. We will continue to track this limitation.
>
> Full GC log examples:
> [Full GC [PSYoungGen: 1497984K->0K(1797568K)] [PSOldGen:
> 1720643K->1693741K(2097152K)] 3218627K->1693741K(3894720K) [PSPermGen:
> 14566K->14566K(262144K)], 5.0027700 secs] [Times: user=5.01 sys=0.00,
> real=5.00 secs]
> [Full GC [PSYoungGen: 1497960K->0K(1797568K)] [PSOldGen:
> 1693805K->1752540K(2097152K)] 3191765K->1752540K(3894720K) [PSPermGen:
> 14571K->14571K(262144K)], 5.0732570 secs] [Times: user=5.07 sys=0.00,
> real=5.07 secs]
> [Full GC [PSYoungGen: 1497984K->0K(1797568K)] [PSOldGen:
> 1752540K->1642553K(2097152K)] 3250524K->1642553K(3894720K) [PSPermGen:
> 14572K->14568K(262144K)], 5.0710730 secs] [Times: user=5.07 sys=0.01,
> real=5.08 secs]
>
> -Regards
> Denny Ye
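Two details in the report's JVM options stand out next to these logs. First, -Xms1024m with -Xmx4096m lets the heap resize repeatedly under load; pinning -Xms to -Xmx removes that variable. Second, HotSpot stores object age in a 4-bit header field, so -XX:MaxTenuringThreshold=31 effectively behaves as 15. Beyond that, a common experiment on HotSpot 1.6 is to switch the old generation to CMS and log GC activity to measure promotion. The flags below are a sketch of such an experiment, not a verified fix for this workload; the young-gen size and occupancy fraction are assumptions to tune from.

JAVA_OPTS="-Xms4096m -Xmx4096m -Xmn1536m -XX:PermSize=256m \
  -XX:SurvivorRatio=5 -XX:MaxTenuringThreshold=15 \
  -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
  -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly \
  -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/tmp/flume-gc.log"

Note also that the old generation may be holding live data rather than garbage: at 500 bytes per event, five memory channels with capacity 1000000 can legitimately queue roughly 2.5 GB of events, which is consistent with the ~1.7 GB surviving each Full GC above. Lowering the channel capacity shrinks that resident set directly, whatever collector is in use.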
