Thanks, Hari, for your help with this. Appreciate it. We will work towards upgrading to CDH 4.2.1 soon, and hopefully this issue will be resolved.
~Rahul.

________________________________
From: Hari Shreedharan <hshreedha...@cloudera.com>
To: "user@flume.apache.org" <user@flume.apache.org>
Sent: Monday, May 13, 2013 7:58 PM
Subject: Re: IOException with HDFS-Sink:flushOrSync

The patch also made it into Hadoop 2.0.3.

On Monday, May 13, 2013, Hari Shreedharan wrote:

> Looks like CDH4.2.1 does have that patch:
> http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.2.1.CHANGES.txt
> (but it was not in CDH4.1.2)
>
> Hari
>
> --
> Hari Shreedharan
>
> On Monday, May 13, 2013 at 7:23 PM, Rahul Ravindran wrote:
>
>> We are using CDH 4.1.2 (Hadoop 2.0.0). It looks like CDH 4.2.1 also uses
>> the same Hadoop version. Any suggestions on mitigations?
>>
>> Sent from my phone. Excuse the terseness.
>>
>> On May 13, 2013, at 7:12 PM, Hari Shreedharan
>> <hshreedha...@cloudera.com> wrote:
>>
>>> What version of Hadoop are you using? It looks like you are getting hit
>>> by https://issues.apache.org/jira/browse/HADOOP-6762.
>>>
>>> Hari
>>>
>>> --
>>> Hari Shreedharan
>>>
>>> On Monday, May 13, 2013 at 6:50 PM, Matt Wise wrote:
>>>
>>>> So we've just had this happen twice on two different Flume machines...
>>>> We're using the HDFS sink as well, but ours writes to an s3n:// URL.
>>>> Both times our sink stopped working and the file channel clogged up
>>>> immediately, causing serious problems. A restart of Flume worked, but
>>>> the file channel was so backed up at that point that it took a good
>>>> long while to get Flume started up again properly.
>>>>
>>>> Is anyone else seeing this behavior?
>>>>
>>>> (Oh, and we're running Flume 1.3.0.)
>>>>
>>>> On May 7, 2013, at 8:42 AM, Rahul Ravindran <rahu...@yahoo.com> wrote:
>>>>
>>>>> Hi,
>>>>> We have noticed this a few times now: we get an IOException from
>>>>> HDFS, and it stops the channel from draining until the Flume process
>>>>> is restarted. The logs are below; namenode-v01-00b is the active
>>>>> namenode (namenode-v01-00a is standby). We are using Quorum Journal
>>>>> Manager for our Namenode HA, but no Namenode failover was initiated.
>>>>> If this is an expected error, should Flume handle it and retry
>>>>> gracefully (thereby not requiring a restart)?
>>>>> Thanks,
>>>>> ~Rahul.
>>>>>
>>>>> 07 May 2013 06:35:02,494 WARN [hdfs-hdfs-sink4-call-runner-2] (org.apache.flume.sink.hdfs.BucketWriter.append:378) - Caught IOException writing to HDFSWriter (IOException flush:java.io.IOException: Failed on local exception: java.nio.channels.ClosedByInterruptException; Host Details : local host is: "flumefs-v01-10a.a.com/10.40.85.170"; destination host is: "namenode-v01-00a.a.com":8020; ). Closing file (hdfs://nameservice1/user/br/data_platform/eventstream/event/flumefs-v01-10a-4//event.1367891734983.tmp) and rethrowing exception.
>>>>> 07 May 2013 06:35:02,494 WARN [hdfs-hdfs-sink4-call-runner-2] (org.apache.flume.sink.hdfs.BucketWriter.append:384) - Caught IOException while closing file (hdfs://nameservice1/user/br/data_platform/eventstream/event/flumefs-v01-10a-4//event.1367891734983.tmp). Exception follows.
>>>>> java.io.IOException: IOException flush:java.io.IOException: Failed on local exception: java.nio.channels.ClosedByInterruptException; Host Details : local host is: "flumefs-v01-10a.a.com/10.40.85.170"; destination host is: "namenode-v01-00a.a.com":8020;
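For context on why an interrupt surfaces as an IOException here: HADOOP-6762 comes down to the java.nio rule that interrupting a thread blocked in I/O on an InterruptibleChannel closes the channel itself before the exception is delivered, so one interrupted call poisons the connection for every other user of it. The sketch below demonstrates that JDK behavior in isolation (plain Java, no Flume or Hadoop classes; the class name InterruptDemo is invented for illustration):

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.ClosedByInterruptException;
    import java.nio.channels.FileChannel;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class InterruptDemo {
        public static void main(String[] args) throws Exception {
            Path tmp = Files.createTempFile("interrupt-demo", ".dat");
            try (FileChannel channel = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
                Thread writer = new Thread(() -> {
                    ByteBuffer buf = ByteBuffer.allocate(8192);
                    try {
                        // Keep rewriting the same 8 KB at offset 0 until the
                        // interrupt arrives mid-I/O.
                        while (true) {
                            buf.clear();
                            channel.write(buf, 0);
                        }
                    } catch (ClosedByInterruptException e) {
                        // The JVM closed the channel *before* delivering this
                        // exception; every other user of the channel now fails too.
                        System.out.println("channel closed by interrupt: " + e);
                    } catch (IOException e) {
                        System.out.println("other I/O failure: " + e);
                    }
                });
                writer.start();
                Thread.sleep(50);       // let the writer get into channel.write()
                writer.interrupt();     // interrupt while (likely) blocked in I/O
                writer.join();
                System.out.println("channel still open? " + channel.isOpen());  // false
            } finally {
                Files.deleteIfExists(tmp);
            }
        }
    }

Interrupting the writer thread closes the FileChannel for the whole process, which is the analogue of a Flume call-runner thread being interrupted mid-RPC and killing the shared NameNode connection.

On Rahul's question of whether Flume could retry gracefully instead of requiring a restart: once the channel has been closed by an interrupt, retrying the same flush on the same stream cannot succeed; the poisoned writer has to be discarded and the file reopened first. A bounded retry might then take the following shape (a hypothetical helper for illustration, not Flume's actual BucketWriter logic):

    import java.io.IOException;
    import java.util.concurrent.Callable;

    /** Hypothetical helper -- not part of Flume -- showing the shape of a
     *  bounded retry around a flush that may hit a transient IOException.
     *  Assumes flushOp reopens/rebuilds its stream internally on each call,
     *  since a ClosedByInterruptException-poisoned stream never recovers. */
    public final class RetryingFlush {
        public static void flushWithRetry(Callable<Void> flushOp,
                                          int maxAttempts,       // must be >= 1
                                          long backoffMillis) throws Exception {
            IOException last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    flushOp.call();
                    return;                                      // success: stop retrying
                } catch (IOException e) {
                    last = e;
                    if (attempt < maxAttempts) {
                        Thread.sleep(backoffMillis * attempt);   // linear backoff
                    }
                }
            }
            throw last;                                          // exhausted: surface the error
        }
    }

The durable fix, though, is the HADOOP-6762 patch itself, which is why the practical answer in this thread was upgrading to a build that contains it (Hadoop 2.0.3, or CDH 4.2.1).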