[
https://issues.apache.org/jira/browse/FLUME-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556494#comment-13556494
]
Connor Woodson commented on FLUME-1850:
---------------------------------------
Is that the config that was used when the issue came about? How many different
hosts are connected? With these settings there shouldn't be many files open
unless a lot of hosts are connected, each requiring its own file (both
rollInterval and idleTimeout should cause files to be closed every hour or so
and then eventually removed from the linked list).
Could a rollTimerPoolSize of 1 (the default) be preventing some buckets from
rolling or timing out? Or maybe there's a bug in SequenceFile that keeps it
from freeing itself up.
The only way to confirm this is a bug, I suppose, is to try various
configurations and narrow down what is preventing the files from being closed;
maybe see whether hdfs.fileType=DataStream gives you the same issue.
If the issue is just that files aren't closing properly, a workaround is to
limit hdfs.maxOpenFiles to a reasonable number (the default is 5000).
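As a sketch of what the suggestions above might look like together, here is a hypothetical HDFS sink configuration (the agent, channel, and sink names, path, and specific values are illustrative assumptions, not from the reporter's setup; only the hdfs.* keys are the point):

```properties
# Hypothetical names: agent1 / hdfsSink / memChannel are placeholders.
agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.channel = memChannel
agent1.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/events/%Y-%m-%d

# Close files after an hour regardless of activity...
agent1.sinks.hdfsSink.hdfs.rollInterval = 3600
# ...and sooner if a bucket goes idle (seconds).
agent1.sinks.hdfsSink.hdfs.idleTimeout = 600
# More roll-timer threads, in case a pool size of 1 is the bottleneck.
agent1.sinks.hdfsSink.hdfs.rollTimerPoolSize = 4
# Rule out a SequenceFile-specific leak by writing plain streams.
agent1.sinks.hdfsSink.hdfs.fileType = DataStream
# Workaround: cap tracked BucketWriters well below the 5000 default,
# so stale writers are evicted instead of accumulating on the heap.
agent1.sinks.hdfsSink.hdfs.maxOpenFiles = 500
```

Trying these one at a time (rather than all together) would make it easier to see which setting, if any, stops the leak.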
> OutOfMemory Error
> -----------------
>
> Key: FLUME-1850
> URL: https://issues.apache.org/jira/browse/FLUME-1850
> Project: Flume
> Issue Type: Bug
> Components: Node
> Affects Versions: v1.3.0
> Environment: RHEL 6
> Reporter: Mohit Anchlia
> Attachments: flume-oo.docx, Screen Shot 2013-01-16 at 11.05.55 PM.png
>
>
> We are using flume-1.3.0. After flume is up for a while (30 days+) we get
> OutOfMemory error. Our heap is set to 2G and load on the system is very low.
> Around 50 request/minute. We use AvroClient and long lived connection.
> Below is the stack trace. I don't have the heap dump but I plan to enable
> that for next time.
> 13/01/16 09:09:38 ERROR hdfs.HDFSEventSink: process failed
> java.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:2786)
> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
> at java.io.DataOutputStream.write(DataOutputStream.java:90)
> at org.apache.hadoop.io.Text.write(Text.java:282)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
> at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.append(SequenceFile.java:1320)
> at org.apache.flume.sink.hdfs.HDFSSequenceFile.append(HDFSSequenceFile.java:72)
> at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:376)
> at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:729)
> at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:727)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Exception in thread "SinkRunner-PollingRunner-DefaultSinkProcessor"
> java.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:2786)
> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
> at java.io.DataOutputStream.write(DataOutputStream.java:90)
> at org.apache.hadoop.io.Text.write(Text.java:282)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
> at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.append(SequenceFile.java:1320)
> at org.apache.flume.sink.hdfs.HDFSSequenceFile.append(HDFSSequenceFile.java:72)
> at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:376)
> at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:729)
> at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:727)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira