[ https://issues.apache.org/jira/browse/FLUME-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13399064#comment-13399064 ]

Will McQueen commented on FLUME-1308:
-------------------------------------

Config file used:

agent.channels = c1
agent.sources = r1 r2
agent.sinks = k1
#
agent.channels.c1.type = FILE
agent.channels.c1.checkpointDir = /tmp/flume-check
agent.channels.c1.dataDirs = /tmp/flume-data
agent.channels.c1.checkpointInterval = 30000
agent.channels.c1.capacity = 2100000
#
agent.sources.r1.channels = c1
agent.sources.r1.type = AVRO
agent.sources.r1.bind = 0.0.0.0
agent.sources.r1.port = 41414
agent.sources.r1.interceptors = i1 i2
agent.sources.r1.interceptors.i1.type = TIMESTAMP
agent.sources.r1.interceptors.i2.type = HOST
agent.sources.r1.interceptors.i2.useIP = true
#
agent.sources.r2.channels = c1
agent.sources.r2.type = AVRO
agent.sources.r2.bind = 0.0.0.0
agent.sources.r2.port = 41415
agent.sources.r2.interceptors = i1 i2
agent.sources.r2.interceptors.i1.type = TIMESTAMP
agent.sources.r2.interceptors.i2.type = HOST
agent.sources.r2.interceptors.i2.useIP = true
#
agent.sinks.k1.channel = c1
agent.sinks.k1.type = HDFS
agent.sinks.k1.hdfs.path = hdfs://localhost/test/%{client}/
agent.sinks.k1.hdfs.fileType = DataStream
agent.sinks.k1.hdfs.rollSize = 0
agent.sinks.k1.hdfs.rollCount = 100000
agent.sinks.k1.hdfs.rollInterval = 0
agent.sinks.k1.hdfs.batchSize = 10
agent.sinks.k1.hdfs.maxOpenFiles = 1
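
The agent can be launched with the standard flume-ng script along these
lines (the conf directory and the config file name agent.conf are
placeholders for whatever is used locally):

    flume-ng agent --conf conf --conf-file agent.conf --name agent \
        -Dflume.root.logger=INFO,console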

> HDFS Sink throws DFSOutputStream exception when maxOpenFiles=1
> --------------------------------------------------------------
>
>                 Key: FLUME-1308
>                 URL: https://issues.apache.org/jira/browse/FLUME-1308
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v1.2.0
>         Environment: RHEL 6.2 64-bit
>            Reporter: Will McQueen
>             Fix For: v1.2.0
>
>
> When I set the HDFS sink to have maxOpenFiles=1, then two things happen:
> 1) Events propagate very slowly to HDFS.
> 2) Events are repeated (e.g., after a while the same 100 or so events appear
> repeatedly in HDFS, even though each event should have a unique payload per
> the test I'm running).
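
Per the Flume user guide, hdfs.maxOpenFiles caps how many bucket writers the
HDFS sink keeps open at once, closing the oldest writer when the cap is
exceeded. With the path above ending in %{client} and two clients feeding the
agent, a cap of 1 means the two buckets keep evicting each other's writer,
which would fit both symptoms. A minimal, hypothetical Java sketch of that
eviction pattern (illustrative only, not Flume's actual implementation; all
names here are made up):

import java.util.LinkedHashMap;
import java.util.Map;

public class LruWriterCacheSketch {
    static final int MAX_OPEN_FILES = 1; // mirrors hdfs.maxOpenFiles = 1

    // An access-ordered LinkedHashMap with removeEldestEntry is a simple
    // LRU; eviction here stands in for closing a bucket writer.
    static final Map<String, StringBuilder> writers =
        new LinkedHashMap<String, StringBuilder>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(
                    Map.Entry<String, StringBuilder> eldest) {
                boolean evict = size() > MAX_OPEN_FILES;
                if (evict) {
                    System.out.println("closing writer for " + eldest.getKey());
                }
                return evict;
            }
        };

    static void append(String bucketPath, String event) {
        StringBuilder writer = writers.get(bucketPath);
        if (writer == null) {
            System.out.println("opening writer for " + bucketPath);
            writer = new StringBuilder();
            writers.put(bucketPath, writer); // may close the other bucket's writer
        }
        writer.append(event).append('\n');
    }

    public static void main(String[] args) {
        // Two clients -> two bucket paths; with a cap of 1, every
        // alternation forces a close followed by a reopen.
        for (int i = 0; i < 3; i++) {
            append("/test/client-a/", "event " + i);
            append("/test/client-b/", "event " + i);
        }
    }
}

If a flush can still reach a writer that was just evicted this way, the
"DFSOutputStream is closed" error in the log excerpt below would be exactly
what one would expect.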
> Steps:
> 1) Launch 2 avro clients targeting a single avro source whose associated
> channel is a file channel (also tried a memory channel; same issue). A
> sample client invocation is sketched after the log excerpt below.
> 2) View the logs, and you're likely to see:
> 2012-06-21 16:27:34,106 WARN hdfs.HDFSEventSink: HDFS IO error
> java.io.IOException: DFSOutputStream is closed
>         at org.apache.hadoop.hdfs.DFSOutputStream.isClosed(DFSOutputStream.java:1193)
>         at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1453)
>         at org.apache.hadoop.hdfs.DFSOutputStream.sync(DFSOutputStream.java:1437)
>         at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:116)
>         at org.apache.flume.sink.hdfs.HDFSDataStream.sync(HDFSDataStream.java:95)
>         at org.apache.flume.sink.hdfs.BucketWriter.doFlush(BucketWriter.java:276)
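
For step 1 above, the stock avro-client can stand in for the two clients,
along these lines (file names are placeholders; note that the sink path keys
on a %{client} header, which these bare commands do not set by themselves):

    flume-ng avro-client --conf conf -H localhost -p 41414 -F events-a.txt
    flume-ng avro-client --conf conf -H localhost -p 41415 -F events-b.txt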
