[ https://issues.apache.org/jira/browse/FLUME-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13399068#comment-13399068 ]
Will McQueen commented on FLUME-1308:
-------------------------------------
I'm using Flume NG 'trunk' branch, commit
0a483c7bfa76950277fbb04b5dcddfc5164137f9
Command to launch Flume agent:
bin/flume-ng agent --conf conf --conf-file conf/flume.conf --name agent
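For reference, a conf/flume.conf along the following lines is consistent with the
commands and paths in this comment (the actual file isn't attached here, so the
component names and the hdfs.path pattern are assumptions; maxOpenFiles=1 and the
DataStream file type follow from the issue description and stack trace below):
# Assumed layout: two avro sources (ports 41414/41415), one file channel,
# one HDFS sink with maxOpenFiles=1 bucketing on the 'client' header.
agent.sources = src1 src2
agent.channels = ch1
agent.sinks = hdfsSink

agent.sources.src1.type = avro
agent.sources.src1.bind = localhost
agent.sources.src1.port = 41414
agent.sources.src1.channels = ch1

agent.sources.src2.type = avro
agent.sources.src2.bind = localhost
agent.sources.src2.port = 41415
agent.sources.src2.channels = ch1

agent.channels.ch1.type = file
agent.channels.ch1.checkpointDir = /tmp/flume-check
agent.channels.ch1.dataDirs = /tmp/flume-data

agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.channel = ch1
agent.sinks.hdfsSink.hdfs.path = /test/%{client}
agent.sinks.hdfsSink.hdfs.fileType = DataStream
agent.sinks.hdfsSink.hdfs.maxOpenFiles = 1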
Commands to launch the avro clients (be sure to wait until the agent is fully
initialized):
bin/flume-ng avro-client --conf conf -Dflume.root.logger=DEBUG,console --host localhost --port 41414 --headerFile /tmp/flume-client-1.header --filename /tmp/flume-client.data
bin/flume-ng avro-client --conf conf -Dflume.root.logger=DEBUG,console --host localhost --port 41415 --headerFile /tmp/flume-client-2.header --filename /tmp/flume-client.data
Command to populate /tmp/flume-client.data file with data (1M events, numbered
starting at 1):
seq 1000000 > /tmp/flume-client.data
Commands to watch the filechannel's checkpoint and data dirs:
watch -n1 ls -al /tmp/flume-check/
watch -n1 ls -al /tmp/flume-data/
Command to watch the flume log for exceptions:
tail -f flume.log
(need to occasionally ctrl-C and then re-run this command when the log file
rolls)
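Alternatively, if the local tail supports -F (GNU and BSD tail do), this follows
the log across rolls without the ctrl-C/re-run step:
tail -F flume.log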
Commands to monitor event count in HDFS (I'm using a local Hadoop
2.0.0-cdh4.0.0 installation):
hadoop fs -cat "/test/client1/*" | wc -l
hadoop fs -cat "/test/client2/*" | wc -l
Contents of /tmp/flume-client-1.header:
client = client1
Contents of /tmp/flume-client-2.header:
client = client2
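The two header files can be created with, e.g.:
echo "client = client1" > /tmp/flume-client-1.header
echo "client = client2" > /tmp/flume-client-2.header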
Contents of conf/flume-env.sh:
JAVA_OPTS="-Xms4096m -Xmx4096m"
> HDFS Sink throws DFSOutputStream exception when maxOpenFiles=1
> --------------------------------------------------------------
>
> Key: FLUME-1308
> URL: https://issues.apache.org/jira/browse/FLUME-1308
> Project: Flume
> Issue Type: Bug
> Components: Sinks+Sources
> Affects Versions: v1.2.0
> Environment: RHEL 6.2 64-bit
> Reporter: Will McQueen
> Fix For: v1.2.0
>
>
> When I set the HDFS sink to have maxOpenFiles=1, then 2 things happen:
> 1) Events propagate very slowly to HDFS
> 2) Events are repeated (e.g., after a while the same 100 or so events appear
> repeatedly in HDFS, where each event should have a unique payload per the test
> I'm running).
> Steps:
> 1) Launch 2 avro clients targeting a single avro source whose associated
> channel is a file channel (also tried a memory channel; same issue)
> 2) View the logs, and you're likely to see:
> 2012-06-21 16:27:34,106 WARN hdfs.HDFSEventSink: HDFS IO error
> java.io.IOException: DFSOutputStream is closed
>         at org.apache.hadoop.hdfs.DFSOutputStream.isClosed(DFSOutputStream.java:1193)
>         at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1453)
>         at org.apache.hadoop.hdfs.DFSOutputStream.sync(DFSOutputStream.java:1437)
>         at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:116)
>         at org.apache.flume.sink.hdfs.HDFSDataStream.sync(HDFSDataStream.java:95)
>         at org.apache.flume.sink.hdfs.BucketWriter.doFlush(BucketWriter.java:276)