Denny Ye created FLUME-1573:
-------------------------------

             Summary: Duplicated HDFS file names when multiple SinkRunners exist
                 Key: FLUME-1573
                 URL: https://issues.apache.org/jira/browse/FLUME-1573
             Project: Flume
          Issue Type: Bug
          Components: Sinks+Sources
    Affects Versions: v1.2.0
            Reporter: Denny Ye
            Assignee: Denny Ye
             Fix For: v1.3.0


With multiple HDFS sinks writing events to storage, the following timeout 
exception occurs consistently:
{code:xml}
11 Sep 2012 07:04:53,478 WARN  [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink.process:442)  - HDFS IO error
java.io.IOException: Callable timed out after 10000 ms
        at org.apache.flume.sink.hdfs.HDFSEventSink.callWithTimeout(HDFSEventSink.java:342)
        at org.apache.flume.sink.hdfs.HDFSEventSink.append(HDFSEventSink.java:713)
        at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:412)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.util.concurrent.TimeoutException
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:228)
        at java.util.concurrent.FutureTask.get(FutureTask.java:91)
        at org.apache.flume.sink.hdfs.HDFSEventSink.callWithTimeout(HDFSEventSink.java:335)
        ... 5 more
{code}

I suspected an HDFS timeout or a slow response. As expected, I found a 
duplicate file creation exception for the same file name on the HDFS side. 
Flume's own log also shows the same file name being created twice:
{code:xml}
13 Sep 2012 02:09:35,432 INFO  [hdfs-hdfsSink-3-call-runner-7] (org.apache.flume.sink.hdfs.BucketWriter.doOpen:189)  - Creating /FLUME/dt=2012-09-13/02-host.1347501924111.tmp
13 Sep 2012 02:09:36,425 INFO  [hdfs-hdfsSink-4-call-runner-8] (org.apache.flume.sink.hdfs.BucketWriter.doOpen:189)  - Creating /FLUME/dt=2012-09-13/02-host.1347501924111.tmp
{code}

Two different threads tried to create the same file, roughly one second apart 
rather than at the same instant.

I believe the root cause is incorrect use of the AtomicLong field named 
'fileExtensionCounter' in BucketWriter. All threads should share a single 
counter, protected by CAS, instead of each thread holding its own private 
instance. As it stands, the counter does nothing to prevent HDFS path 
conflicts.
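
To illustrate the suspected problem, here is a minimal sketch. It is not the 
actual BucketWriter code: the class FileNameCounterSketch, the nested Writer 
class, and the method nextTmpPath are hypothetical names, and the seed value 
is chosen only so the result matches the path in the log above. It assumes 
each writer appends an incremented AtomicLong value to the file name.

```java
import java.util.concurrent.atomic.AtomicLong;

public class FileNameCounterSketch {

    /** Hypothetical stand-in for a writer that builds HDFS temp-file paths. */
    static final class Writer {
        private final AtomicLong counter;

        Writer(AtomicLong counter) {
            this.counter = counter;
        }

        /** Appends the next counter value to the prefix, e.g. prefix.123.tmp */
        String nextTmpPath(String prefix) {
            return prefix + "." + counter.incrementAndGet() + ".tmp";
        }
    }

    public static void main(String[] args) {
        String prefix = "/FLUME/dt=2012-09-13/02-host";
        long seed = 1347501924110L; // assumed seed, for illustration only

        // Private counter per writer: AtomicLong is atomic only within one
        // instance, so two writers with equal counter values collide.
        Writer a = new Writer(new AtomicLong(seed));
        Writer b = new Writer(new AtomicLong(seed));
        System.out.println(a.nextTmpPath(prefix));
        System.out.println(b.nextTmpPath(prefix)); // same path as the line above

        // One counter shared by all writers: incrementAndGet (a CAS loop)
        // hands every caller a distinct value, so the paths differ.
        AtomicLong shared = new AtomicLong(seed);
        Writer c = new Writer(shared);
        Writer d = new Writer(shared);
        System.out.println(c.nextTmpPath(prefix));
        System.out.println(d.nextTmpPath(prefix)); // different path
    }
}
```

In this sketch the only difference between the two cases is whether the 
writers receive the same AtomicLong instance. A shared counter guarantees 
unique values only within one JVM, so agents on separate hosts would still 
need something else (such as the host name in the prefix) to stay distinct.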

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
