[ https://issues.apache.org/jira/browse/FLUME-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465268#comment-13465268 ]

Denny Ye commented on FLUME-1573:
---------------------------------

Sorry, I can't agree with you. The duplicated file names are caused by the 
original BucketWriter code, so this is not an optional choice or just 'simply 
a non-static instance variable'. 
Let me explain the root cause of this bug, although I already tried to at the 
bottom of the bug description.
The original design was presumably meant to keep file names unique across 
concurrent Sinks, and the designer chose a CAS-protected counter as the 
solution. But what is the base value of that counter in each Sink? That is the 
failure point: each Sink's counter was seeded with the timestamp of its own 
creation. The base value should be unified across all Sinks, not a private 
per-Sink setting. That is why I used a static instance: to hold one unified 
counter for all Sinks.
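
To make this concrete, here is a minimal sketch (class and field names are 
illustrative, not the actual BucketWriter source) of the difference between a 
per-instance counter seeded from the clock and a single shared static counter:

{code:java}
import java.util.concurrent.atomic.AtomicLong;

public class FileNameCounterSketch {

    // Buggy pattern: each writer seeds its own counter from the clock, so two
    // writers created in the same millisecond start from the same value and
    // generate identical file suffixes.
    private final AtomicLong perInstanceCounter =
            new AtomicLong(System.currentTimeMillis());

    // Intended pattern: one static counter shared by every writer in the
    // process. incrementAndGet() is a CAS on the same underlying value, so no
    // two callers can observe the same suffix.
    private static final AtomicLong sharedCounter =
            new AtomicLong(System.currentTimeMillis());

    public String nextPrivateName(String pathPrefix) {
        return pathPrefix + "." + perInstanceCounter.incrementAndGet() + ".tmp";
    }

    public String nextSharedName(String pathPrefix) {
        return pathPrefix + "." + sharedCounter.incrementAndGet() + ".tmp";
    }
}
{code}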
                
> Duplicated HDFS file name when multiple SinkRunner was existing
> ---------------------------------------------------------------
>
>                 Key: FLUME-1573
>                 URL: https://issues.apache.org/jira/browse/FLUME-1573
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v1.2.0
>            Reporter: Denny Ye
>            Assignee: Denny Ye
>             Fix For: v1.3.0
>
>         Attachments: FLUME-1573.patch
>
>
> Multiple HDFS Sinks write events into storage, and a timeout exception keeps 
> occurring:
> {code:xml}
> 11 Sep 2012 07:04:53,478 WARN  
> [SinkRunner-PollingRunner-DefaultSinkProcessor] 
> (org.apache.flume.sink.hdfs.HDFSEventSink.process:442)  - HDFS IO error
> java.io.IOException: Callable timed out after 10000 ms
>         at 
> org.apache.flume.sink.hdfs.HDFSEventSink.callWithTimeout(HDFSEventSink.java:342)
>         at 
> org.apache.flume.sink.hdfs.HDFSEventSink.append(HDFSEventSink.java:713)
>         at 
> org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:412)
>         at 
> org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>         at java.lang.Thread.run(Thread.java:619)
> Caused by: java.util.concurrent.TimeoutException
>         at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:228)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:91)
>         at 
> org.apache.flume.sink.hdfs.HDFSEventSink.callWithTimeout(HDFSEventSink.java:335)
>         ... 5 more
> {code}
> I suspected an HDFS timeout or a slow response. As expected, I found a 
> duplicated-creation exception for the same file on the HDFS side. Flume also 
> logged the same symptom: a duplicated file name.
> {code:xml}
> 13 Sep 2012 02:09:35,432 INFO  [hdfs-hdfsSink-3-call-runner-7] 
> (org.apache.flume.sink.hdfs.BucketWriter.doOpen:189)  - Creating 
> /FLUME/dt=2012-09-13/02-host.1347501924111.tmp
> 13 Sep 2012 02:09:36,425 INFO  [hdfs-hdfsSink-4-call-runner-8] 
> (org.apache.flume.sink.hdfs.BucketWriter.doOpen:189)  - Creating 
> /FLUME/dt=2012-09-13/02-host.1347501924111.tmp
> {code}
> Different threads were going to create the same file, even though their 
> creation times did not conflict.
> The root cause appears to be incorrect usage of the AtomicLong field named 
> 'fileExtensionCounter' in BucketWriter. All threads should share the same 
> counter, protected by CAS, rather than each thread holding its own private 
> copy; a private per-thread counter cannot prevent HDFS path conflicts.
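> For illustration only (a sketch, not the actual Flume code): two Sinks whose 
> BucketWriters are created in the same millisecond, each seeding a private 
> counter from the clock, end up targeting the same HDFS path, matching the two 
> "Creating" log lines above.
> {code:java}
> import java.util.concurrent.atomic.AtomicLong;
>
> public class DuplicatePathDemo {
>     public static void main(String[] args) {
>         long creationTime = System.currentTimeMillis();
>         // Each Sink's BucketWriter holds its own counter seeded from the clock.
>         AtomicLong sink3Counter = new AtomicLong(creationTime);
>         AtomicLong sink4Counter = new AtomicLong(creationTime);
>
>         String path3 = "/FLUME/dt=2012-09-13/02-host."
>                 + sink3Counter.incrementAndGet() + ".tmp";
>         String path4 = "/FLUME/dt=2012-09-13/02-host."
>                 + sink4Counter.incrementAndGet() + ".tmp";
>
>         // Both Sinks attempt to create the same HDFS file.
>         System.out.println(path3.equals(path4)); // prints: true
>     }
> }
> {code}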

