[ https://issues.apache.org/jira/browse/FLUME-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947190#comment-15947190 ]

Tom Harrison commented on FLUME-2451:
-------------------------------------

Piotr -- we upgraded Flume to the version supplied in Cloudera CDH 5.5 last 
year, and it resolved all of the issues we were running into -- it has been 
rock solid since. I'm not sure how that version corresponds to the Apache 
version, but the issue was on the client side (we still run an older version 
on the server), so you might try the latest version from the Cloudera 
distribution. This was a major issue for us, and it was resolved completely 
by the new version.

> HDFS Sink Cannot Reconnect After NameNode Restart
> -------------------------------------------------
>
>                 Key: FLUME-2451
>                 URL: https://issues.apache.org/jira/browse/FLUME-2451
>             Project: Flume
>          Issue Type: Bug
>          Components: File Channel, Sinks+Sources
>    Affects Versions: 1.4.0
>         Environment: 8 node CDH 4.2.2 (2.0.0-cdh4.2.2) cluster
> All cluster machines are running Ubuntu 12.04 x86_64
>            Reporter: Andrew O'Neill
>            Assignee: Roshan Naik
>              Labels: HDFS, Sink
>         Attachments: FLUME-2451.patch, FLUME-2451.v2.patch, FLUME-2451.v3.patch
>
>
> I am testing a simple flume setup with a Sequence Generator Source, a File 
> Channel, and an HDFS Sink (see my flume.conf below). This configuration works 
> as expected until I reboot the cluster’s NameNode or until I restart the HDFS 
> service on the cluster. At this point, it appears that the Flume Agent cannot 
> reconnect to HDFS and must be manually restarted.
> Here is our flume.conf:
>     appserver.sources = rawtext
>     appserver.channels = testchannel
>     appserver.sinks = test_sink
>     appserver.sources.rawtext.type = seq
>     appserver.sources.rawtext.channels = testchannel
>     appserver.channels.testchannel.type = file
>     appserver.channels.testchannel.capacity = 10000000
>     appserver.channels.testchannel.minimumRequiredSpace = 214748364800
>     appserver.channels.testchannel.checkpointDir = /Users/aoneill/Desktop/testchannel/checkpoint
>     appserver.channels.testchannel.dataDirs = /Users/aoneill/Desktop/testchannel/data
>     appserver.channels.testchannel.maxFileSize = 20000000
>     appserver.sinks.test_sink.type = hdfs
>     appserver.sinks.test_sink.channel = testchannel
>     appserver.sinks.test_sink.hdfs.path = hdfs://cluster01:8020/user/aoneill/flumetest
>     appserver.sinks.test_sink.hdfs.closeTries = 3
>     appserver.sinks.test_sink.hdfs.filePrefix = events-
>     appserver.sinks.test_sink.hdfs.fileSuffix = .avro
>     appserver.sinks.test_sink.hdfs.fileType = DataStream
>     appserver.sinks.test_sink.hdfs.writeFormat = Text
>     appserver.sinks.test_sink.hdfs.inUsePrefix = inuse-
>     appserver.sinks.test_sink.hdfs.inUseSuffix = .avro
>     appserver.sinks.test_sink.hdfs.rollCount = 100000
>     appserver.sinks.test_sink.hdfs.rollInterval = 30
>     appserver.sinks.test_sink.hdfs.rollSize = 10485760
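>
> For reference, an agent with a file like this is normally started with the
> stock flume-ng launcher, e.g. flume-ng agent --conf ./conf --conf-file
> flume.conf --name appserver (the conf directory and file name here are
> placeholders, not from the original report).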
> These are the two error messages that the Flume Agent outputs constantly 
> after the restart:
>     2014-08-26 10:47:24,572 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.sink.hdfs.AbstractHDFSWriter.isUnderReplicated(AbstractHDFSWriter.java:96)] Unexpected error while checking replication factor
>     java.lang.reflect.InvocationTargetException
>         at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.flume.sink.hdfs.AbstractHDFSWriter.getNumCurrentReplicas(AbstractHDFSWriter.java:162)
>         at org.apache.flume.sink.hdfs.AbstractHDFSWriter.isUnderReplicated(AbstractHDFSWriter.java:82)
>         at org.apache.flume.sink.hdfs.BucketWriter.shouldRotate(BucketWriter.java:452)
>         at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:387)
>         at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:392)
>         at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>         at java.lang.Thread.run(Thread.java:744)
>     Caused by: java.net.ConnectException: Connection refused
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:735)
>         at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:207)
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:525)
>         at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1253)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:891)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:881)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:982)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:779)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:448)
> and
>     2014-08-26 10:47:29,592 (SinkRunner-PollingRunner-DefaultSinkProcessor) [WARN - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:418)] HDFS IO error
>     java.net.ConnectException: Connection refused
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:735)
>         at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:207)
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:525)
>         at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1253)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:891)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:881)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:982)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:779)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:448)
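>
> Two observations on the traces above (both sketches below are illustrative
> patterns only, not the attached patches or the exact Flume source). First,
> the InvocationTargetException in the first trace is an artifact of how the
> sink checks replication: getNumCurrentReplicas is not a public, stable API
> on every Hadoop release, so AbstractHDFSWriter looks the method up and
> invokes it reflectively, and the real failure (the ConnectException in the
> Caused by) arrives wrapped:
>     // Sketch: reflectively probe a method that may not exist on this Hadoop version.
>     import java.io.OutputStream;
>     import java.lang.reflect.InvocationTargetException;
>     import java.lang.reflect.Method;
>
>     class ReplicaProbe {
>         static int numCurrentReplicas(OutputStream os) throws Exception {
>             Method m;
>             try {
>                 m = os.getClass().getMethod("getNumCurrentReplicas");
>             } catch (NoSuchMethodException e) {
>                 return -1; // older Hadoop: no replication info, skip the check
>             }
>             m.setAccessible(true);
>             try {
>                 return ((Number) m.invoke(os)).intValue();
>             } catch (InvocationTargetException e) {
>                 // The stream's real failure -- here java.net.ConnectException from
>                 // the dead datanode pipeline -- is e.getCause(), not e itself.
>                 throw e;
>             }
>         }
>     }
> Second, both traces share the same shape of root cause: the BucketWriter
> keeps appending through a stale DFSOutputStream whose datanode pipeline can
> no longer be rebuilt, and nothing discards that handle. The usual recovery
> pattern is to close and drop the writer on an unrecoverable IOException so
> the next batch opens a fresh connection:
>     // Sketch: drop a stale writer on IOException so the next append reopens it.
>     import java.io.Closeable;
>     import java.io.IOException;
>
>     class RecoveringWriter {
>         private Closeable writer; // stands in for the HDFS bucket writer
>
>         void append(byte[] event) throws IOException {
>             if (writer == null) {
>                 writer = open(); // fresh FileSystem handle and output stream
>             }
>             try {
>                 write(writer, event);
>             } catch (IOException e) {
>                 try { writer.close(); } catch (IOException ignored) { }
>                 writer = null; // force a reopen on the next attempt
>                 throw e;       // let the channel transaction roll back and retry
>             }
>         }
>
>         private Closeable open() throws IOException { return () -> { }; /* open HDFS file */ }
>         private void write(Closeable w, byte[] event) throws IOException { /* append */ }
>     }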
> I can provide additional information if needed. Thank you very much for any 
> insight you are able to provide into this problem.



