[ 
https://issues.apache.org/jira/browse/FLUME-2517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14185606#comment-14185606
 ] 

Hari Shreedharan commented on FLUME-2517:
-----------------------------------------

+1. This looks good to me. Let me run the tests.

> Performance issue: SimpleDateFormat constructor takes 30% of 
> HDFSEventSink.process()
> ------------------------------------------------------------------------------------
>
>                 Key: FLUME-2517
>                 URL: https://issues.apache.org/jira/browse/FLUME-2517
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v1.5.0.1
>         Environment: linux i686
> java version "1.7.0_45"
>            Reporter: Pal Konyves
>              Labels: performance
>         Attachments: flume_2517.patch, flume_2517.png
>
>
> I started investigating why the HDFS sink has such poor throughput in 
> v1.5.0.0. It seems to be better in 1.6.0.0 (current trunk).
> The PseudoTx channel was filling up because the HDFS sink could not write as 
> fast as data arrived from the source.
> Profiling with jconsole revealed that 30% of the time spent in the 
> HDFSEventSink.process() method goes to constructing SimpleDateFormat 
> objects. SimpleDateFormat is notoriously heavy and time-consuming to 
> create. It is also not thread-safe.
> The HDFS sink uses it to compute paths that contain date-time 
> wildcards. I will provide a patch that caches SimpleDateFormat objects per 
> thread. With this patch, the PseudoTx channel I used for testing no longer 
> filled up constantly, and throughput was much better.
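
For readers of the archive, the per-thread caching idea described above can be sketched roughly as follows. This is only an illustration and not the contents of flume_2517.patch: the class name DateFormatCache, its methods, and the choice of Locale.US are assumptions made for the example. It uses only java.text.SimpleDateFormat and java.lang.ThreadLocal from the JDK and stays within Java 7 syntax.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TimeZone;

/** Hypothetical helper for illustration; not Flume's actual code. */
public class DateFormatCache {

    // One map per thread. SimpleDateFormat is not thread-safe, so an
    // instance is only ever used by the thread that created it.
    private static final ThreadLocal<Map<String, SimpleDateFormat>> CACHE =
        new ThreadLocal<Map<String, SimpleDateFormat>>() {
            @Override
            protected Map<String, SimpleDateFormat> initialValue() {
                return new HashMap<String, SimpleDateFormat>();
            }
        };

    /** Returns the calling thread's formatter for the pattern, creating it once. */
    public static SimpleDateFormat get(String pattern) {
        Map<String, SimpleDateFormat> formats = CACHE.get();
        SimpleDateFormat format = formats.get(pattern);
        if (format == null) {
            // The expensive constructor runs once per (thread, pattern) pair
            // instead of once per event.
            format = new SimpleDateFormat(pattern, Locale.US);
            formats.put(pattern, format);
        }
        return format;
    }

    /** Formats a timestamp with the cached formatter in the given time zone. */
    public static String format(String pattern, TimeZone tz, long timestampMillis) {
        SimpleDateFormat format = get(pattern);
        // The cached instance is reused, so the zone is set on every call.
        format.setTimeZone(tz);
        return format.format(new Date(timestampMillis));
    }

    public static void main(String[] args) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        System.out.println(format("yyyy/MM/dd/HH", utc, 0L)); // prints 1970/01/01/00
    }
}
```

The trade-off in this sketch is a small amount of memory per thread and per distinct pattern in exchange for skipping the constructor on every call. A sink that resolves only a few path patterns would keep only a few formatters per thread.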



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
