[jira] [Comment Edited] (FLUME-1573) Duplicated HDFS file name when multiple SinkRunner was existing

Denny Ye (JIRA) Tue, 25 Sep 2012 18:50:11 -0700

    [ 
https://issues.apache.org/jira/browse/FLUME-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463456#comment-13463456
 ]


Denny Ye edited comment on FLUME-1573 at 9/26/12 12:48 PM:
-----------------------------------------------------------

Sure, we are talking about naming conflicts at two levels: multiple Flume 
processes on different hosts, and multiple Sinks in the same Flume instance. The 
former is a classic problem in distributed clusters. It's hard to 
synchronize timestamps across multiple Flume processes, whether the property is 
static or non-static. It can be resolved by the method you described above: 
attaching an identifier to each Flume process, such as the host name or something 
else. I attached my flume.conf 
here: [https://issues.apache.org/jira/browse/FLUME-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463428#comment-13463428]
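
As a rough illustration of the cross-host case, here is a minimal sketch that embeds a per-process identifier in the file name. The class HostScopedFileNamer and its methods are hypothetical names made up for this example; they are not part of Flume, and the name layout only imitates the paths shown in the logs.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper, NOT part of Flume: file names carry a per-process identifier. */
final class HostScopedFileNamer {
    private final String hostId;       // assumed to be unique per Flume process, e.g. a host name
    private final AtomicLong counter;  // seeded by the caller, e.g. with a start timestamp

    HostScopedFileNamer(String hostId, long startValue) {
        this.hostId = hostId;
        this.counter = new AtomicLong(startValue);
    }

    /** Builds "<prefix>-<hostId>.<counter>.tmp"; the counter advances on every call. */
    String nextFileName(String prefix) {
        return prefix + "-" + hostId + "." + counter.incrementAndGet() + ".tmp";
    }
}
```

With this layout, two processes that start from the same timestamp still produce different names as long as their identifiers differ. In practice the identifier could come from something like InetAddress.getLocalHost().getHostName(), assuming host names are unique in the cluster.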

Next, we can handle the conflict between multiple Sinks in a single Flume 
instance. This is the easy case: it can be synchronized within the process by 
using an ordinary lock or CAS. The old code in BucketWriter uses Java 
synchronization incorrectly, in my opinion. I have just provided one possible 
solution to avoid conflicts between multiple threads. Of course, we can discuss 
in more detail how to reach the behavior we expect.
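
For the in-process case, a minimal sketch of the idea (not the attached patch, and not the actual BucketWriter code) is a single counter shared by every writer in the JVM and advanced atomically. The class SharedCounterWriter and the method nextTmpPath are hypothetical names used only for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Simplified stand-in for a writer; NOT the real BucketWriter. */
final class SharedCounterWriter {
    // Static: one counter for the whole process, so all writers draw from one sequence.
    private static final AtomicLong FILE_EXTENSION_COUNTER =
            new AtomicLong(System.currentTimeMillis());

    private final String pathPrefix;

    SharedCounterWriter(String pathPrefix) {
        this.pathPrefix = pathPrefix;
    }

    /** incrementAndGet is an atomic update, so no two callers receive the same value. */
    String nextTmpPath() {
        return pathPrefix + "." + FILE_EXTENSION_COUNTER.incrementAndGet() + ".tmp";
    }
}
```

Two writers created with the same prefix, even in the same millisecond, then get different paths. If each writer held its own non-static counter seeded from the clock, both could start from the same value and produce the same path.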
                
> Duplicated HDFS file name when multiple SinkRunner was existing
> ---------------------------------------------------------------
>
>                 Key: FLUME-1573
>                 URL: https://issues.apache.org/jira/browse/FLUME-1573
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v1.2.0
>            Reporter: Denny Ye
>            Assignee: Denny Ye
>             Fix For: v1.3.0
>
>         Attachments: FLUME-1573.patch
>
>
> Multiple HDFS Sinks write events into storage. A timeout exception occurs 
> constantly:
> {code:xml}
> 11 Sep 2012 07:04:53,478 WARN  
> [SinkRunner-PollingRunner-DefaultSinkProcessor] 
> (org.apache.flume.sink.hdfs.HDFSEventSink.process:442)  - HDFS IO error
> java.io.IOException: Callable timed out after 10000 ms
>         at 
> org.apache.flume.sink.hdfs.HDFSEventSink.callWithTimeout(HDFSEventSink.java:342)
>         at 
> org.apache.flume.sink.hdfs.HDFSEventSink.append(HDFSEventSink.java:713)
>         at 
> org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:412)
>         at 
> org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>         at java.lang.Thread.run(Thread.java:619)
> Caused by: java.util.concurrent.TimeoutException
>         at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:228)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:91)
>         at 
> org.apache.flume.sink.hdfs.HDFSEventSink.callWithTimeout(HDFSEventSink.java:335)
>         ... 5 more
> {code}
> I suspected an HDFS timeout or a slow response. As 
> expected, I found a duplicate-creation exception for the same path in HDFS. 
> Flume also logged the same duplicated file name:
> {code:xml}
> 13 Sep 2012 02:09:35,432 INFO  [hdfs-hdfsSink-3-call-runner-7] 
> (org.apache.flume.sink.hdfs.BucketWriter.doOpen:189)  - Creating 
> /FLUME/dt=2012-09-13/02-host.1347501924111.tmp
> 13 Sep 2012 02:09:36,425 INFO  [hdfs-hdfsSink-4-call-runner-8] 
> (org.apache.flume.sink.hdfs.BucketWriter.doOpen:189)  - Creating 
> /FLUME/dt=2012-09-13/02-host.1347501924111.tmp
> {code}
> Different threads tried to create the same file, even though not at the same moment.
> I think the root cause is incorrect usage of the AtomicLong property named 
> 'fileExtensionCounter' in BucketWriter. Different threads should share one 
> counter protected by CAS, rather than each thread holding its own private copy. 
> A private counter per thread does nothing to prevent HDFS path conflicts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
