[ https://issues.apache.org/jira/browse/FLUME-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185606#comment-14185606 ]
Hari Shreedharan commented on FLUME-2517:
-----------------------------------------

+1. This looks good to me. Let me run the tests

> Performance issue: SimpleDateFormat constructor takes 30% of
> HDFSEventSink.process()
> ------------------------------------------------------------------------------------
>
>                 Key: FLUME-2517
>                 URL: https://issues.apache.org/jira/browse/FLUME-2517
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v1.5.0.1
>         Environment: linux i686
> java version "1.7.0_45"
>            Reporter: Pal Konyves
>              Labels: performance
>         Attachments: flume_2517.patch, flume_2517.png
>
>
> I started investigating why HDFS sink has so bad throughput in v 1.5.0.0. It
> seems to be better in 1.6.0.0 (current trunk).
> PseudoTx channel was filling up, because HDFS Sink could not write as fast as
> data coming from source.
> Profiling from jconsole revealed that 30% of the time spent in
> HDFSEventSink.process() method is taken by constructing SimpleDateFormat
> objects. SimpleDateFormat object is notoriously a heavy and time consuming
> object to create. It is also not thread-safe.
> It is used in HDFS Sink to calculate the path that contains date-time
> wildcards. I will provide a patch to cache SimpleDateFormat objects for
> thread. With this patch, the PseudoTx channel I used for testing was not
> constantly filling up, and throughput was much better.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
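
For readers of the archive, the per-thread caching idea described in the issue can be sketched roughly as follows. This is a minimal illustration only and is not the contents of flume_2517.patch; the class name DateFormatCache and its methods get() and format() are hypothetical names made up for this sketch. It assumes the formatter can be keyed by its pattern string alone and uses the JVM default locale.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import java.util.TimeZone;

/** Hypothetical helper for illustration; not the code from flume_2517.patch. */
final class DateFormatCache {
    // One map per thread, so a SimpleDateFormat instance is never shared
    // between threads (SimpleDateFormat is not thread-safe).
    private static final ThreadLocal<Map<String, SimpleDateFormat>> CACHE =
        new ThreadLocal<Map<String, SimpleDateFormat>>() {
            @Override
            protected Map<String, SimpleDateFormat> initialValue() {
                return new HashMap<String, SimpleDateFormat>();
            }
        };

    private DateFormatCache() {}

    /** Returns the calling thread's formatter for the pattern, creating it once. */
    static SimpleDateFormat get(String pattern) {
        Map<String, SimpleDateFormat> perThread = CACHE.get();
        SimpleDateFormat format = perThread.get(pattern);
        if (format == null) {
            // The expensive constructor runs once per (thread, pattern) pair.
            format = new SimpleDateFormat(pattern);
            perThread.put(pattern, format);
        }
        return format;
    }

    /** Formats a timestamp with the cached formatter, setting the zone on each call. */
    static String format(String pattern, long timestampMillis, TimeZone zone) {
        SimpleDateFormat format = get(pattern);
        format.setTimeZone(zone);
        return format.format(new Date(timestampMillis));
    }
}
```

A caller on the sink's hot path would then use DateFormatCache.format("yyyy/MM/dd", timestamp, zone) instead of constructing a new SimpleDateFormat for every event. The time zone is set on each call because a cached instance keeps whatever zone was last assigned to it.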