[
https://issues.apache.org/jira/browse/FLUME-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556889#comment-13556889
]
Juhani Connolly commented on FLUME-1850:
----------------------------------------
The relevant bit from that:
"Juhani Connolly 2 months, 2 weeks ago (Oct. 31, 2012, 3:56 a.m.)
Hmm... I can see that as a viable approach, but I am curious about what happens
with the sfWriters map in HDFSEventSink... It seems like old writers are just
abandoned there forever? I would like to clean them up properly (I believe this
is common in the use case where files are dumped into a file named by date).
While not major, this does seem like it would lead to a buildup of inactive
writers. We've had OOM errors when running Flume with an HDFS sink using the
default memory settings. I have no idea if this is related; perhaps it could
be? It looks to me that nowhere other than the stop method is the sfWriters map
ever cleaned up.
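The cleanup being asked about here can be sketched as a size-bounded, access-ordered map that closes the least-recently-used writer on eviction. This is only an illustration, not Flume's actual code: `Writer` and `BoundedWriterCache` are hypothetical stand-ins for BucketWriter and the sfWriters map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: Writer and BoundedWriterCache are hypothetical
// stand-ins, not Flume's BucketWriter or the real sfWriters map.
public class BoundedWriterCache {

    // Stand-in for a BucketWriter: just records whether close() was called.
    static class Writer {
        boolean closed = false;
        void close() { closed = true; }
    }

    // Access-ordered LinkedHashMap; removeEldestEntry fires on each put,
    // letting us close and drop the least-recently-used writer once the
    // cap (analogous to maxOpenFiles) is exceeded.
    static Map<String, Writer> newCache(final int maxOpenFiles) {
        return new LinkedHashMap<String, Writer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Writer> eldest) {
                if (size() > maxOpenFiles) {
                    eldest.getValue().close(); // release resources before eviction
                    return true;
                }
                return false;
            }
        };
    }

    public static void main(String[] args) {
        Map<String, Writer> cache = newCache(2);
        Writer first = new Writer();
        cache.put("/logs/2012-10-31/00", first);
        cache.put("/logs/2012-10-31/01", new Writer());
        cache.put("/logs/2012-10-31/02", new Writer()); // evicts and closes "first"
        System.out.println(cache.size() + " " + first.closed); // 2 true
    }
}
```

The key point of the sketch is that eviction and close() happen together, so a bounded cache never strands an open writer the way an unbounded, never-pruned map does.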
Juhani Connolly 2 months, 2 weeks ago (Oct. 31, 2012, 6:11 a.m.)
So, I took a heap dump and checked the retained size for BucketWriter
objects... Around 4000 bytes all told.
After a week, one of our aggregator/hdfs sink nodes has 1500 bucket writers
alive in memory, for about 6mb of memory spent on what are essentially dead
objects. This is because we generate a new path (based on time) every hour, for
each host/data type. We're still running in a test phase, with only a handful
of our servers feeding data, so with more servers and more time, this moderate
amount of memory doing nothing will keep growing.
So at the end of the day, at some point, HDFSEventSink does need to get
involved to clean this stuff up.
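The growth described above is easy to quantify: stale writers accumulate at roughly hosts × data types × hours of uptime. A back-of-envelope sketch, using the ~4000-bytes-per-writer figure from the heap dump (the host and data-type counts below are made-up examples, not from the report):

```java
// Back-of-envelope for the leak described above: one new BucketWriter per
// host/data-type/hour, none ever evicted. The 4000-bytes-per-writer figure
// comes from the heap dump; the host and data-type counts are illustrative.
public class StaleWriterEstimate {
    static long staleBytes(int hosts, int dataTypes, int hoursRunning, int bytesPerWriter) {
        long writers = (long) hosts * dataTypes * hoursRunning;
        return writers * bytesPerWriter;
    }

    public static void main(String[] args) {
        // e.g. 3 hosts x 3 data types x one week of hourly paths = 1512 writers,
        // close to the ~1500 writers / ~6mb observed after a week
        long bytes = staleBytes(3, 3, 7 * 24, 4000);
        System.out.println(bytes + " stale bytes"); // 6048000 stale bytes
    }
}
```

Because the writer count is linear in both fleet size and uptime, the leak is slow in a small test deployment but unbounded in production.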
Juhani Connolly 2 months, 2 weeks ago (Oct. 31, 2012, 6:12 a.m.)
Uh, that is, 4000 bytes each. Most of that is in the writer.
Mike Percy 2 months, 2 weeks ago (Oct. 31, 2012, 7:27 a.m.)
Yeah, you're right about the sfWriters map. The original implementation
contained that thing and I never tried to address that issue... it won't cause
correctness problems (since close() is effectively idempotent) but yes it will
consume some memory. That problem exists with all of the existing rolling code,
so it would not just exist with the new code for the close-on-idle feature.
One easy band-aid would be to redefine the default maxOpenFiles to, say, 500.
That would reduce the severity of the memory reclamation delay, at the limit.
If each object takes 4K then a cache of 500 would only take up 2MB, which isn't
terrible. Another simple approach, which would be a bit ugly (need to be
careful with the circular reference) would be to provide a sfWriters reference
to each BucketWriter instance, and when the BucketWriter's close() method is
called then it can remove itself from the cache if it's still there. Speaking
of which, I would prefer to use the Guava CacheBuilder over what we are using
now if we can.
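The second suggestion above (a writer that removes itself from the cache in close()) can be sketched in plain Java. This is hypothetical illustration only: `SelfEvictingWriter` and its cache are stand-in names, not Flume's BucketWriter or sfWriters.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of the self-removal idea: each writer keeps a reference
// back to the cache that owns it (the circular reference to be careful with)
// and unregisters itself in close(). SelfEvictingWriter is a hypothetical
// name, not Flume's actual BucketWriter.
public class SelfEvictingWriter {
    private final String path;
    private final Map<String, SelfEvictingWriter> owningCache;
    private boolean closed = false;

    SelfEvictingWriter(String path, Map<String, SelfEvictingWriter> owningCache) {
        this.path = path;
        this.owningCache = owningCache;
    }

    // Idempotent, mirroring the note above that close() is effectively
    // idempotent: safe whether the cache drops us first or we close first.
    public synchronized void close() {
        if (closed) {
            return;
        }
        closed = true;
        owningCache.remove(path, this); // remove only if the mapping is still ours
    }

    public boolean isClosed() { return closed; }

    public static void main(String[] args) {
        Map<String, SelfEvictingWriter> cache = new ConcurrentHashMap<>();
        SelfEvictingWriter w = new SelfEvictingWriter("/logs/2012-10-31/03", cache);
        cache.put("/logs/2012-10-31/03", w);
        w.close(); // the writer cleans itself out of the cache
        System.out.println(cache.isEmpty()); // true
    }
}
```

The Guava direction mentioned above would express the same idea declaratively: a `CacheBuilder` with a `maximumSize` and a removal listener that closes evicted writers, avoiding the hand-rolled back-reference entirely.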
Anyway, aside from simple workarounds such as the above, I think the whole
HDFSEventSink/BucketWriter interaction would need to be significantly
refactored to solve this design flaw, which makes me nervous since the HDFS
sink is rather stable today after fishing out many very subtle bugs over
several months of testing and use."
> OutOfMemory Error
> -----------------
>
> Key: FLUME-1850
> URL: https://issues.apache.org/jira/browse/FLUME-1850
> Project: Flume
> Issue Type: Bug
> Components: Node
> Affects Versions: v1.3.0
> Environment: RHEL 6
> Reporter: Mohit Anchlia
> Attachments: flume-oo.docx, Screen Shot 2013-01-16 at 11.05.55 PM.png
>
>
> We are using flume-1.3.0. After flume has been up for a while (30+ days) we get
> an OutOfMemory error. Our heap is set to 2G and load on the system is very low,
> around 50 requests/minute. We use AvroClient and a long-lived connection.
> Below is the stack trace. I don't have the heap dump, but I plan to enable
> that for next time.
> 13/01/16 09:09:38 ERROR hdfs.HDFSEventSink: process failed
> java.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:2786)
> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
> at java.io.DataOutputStream.write(DataOutputStream.java:90)
> at org.apache.hadoop.io.Text.write(Text.java:282)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
> at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.append(SequenceFile.java:1320)
> at org.apache.flume.sink.hdfs.HDFSSequenceFile.append(HDFSSequenceFile.java:72)
> at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:376)
> at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:729)
> at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:727)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Exception in thread "SinkRunner-PollingRunner-DefaultSinkProcessor"
> java.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:2786)
> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
> at java.io.DataOutputStream.write(DataOutputStream.java:90)
> at org.apache.hadoop.io.Text.write(Text.java:282)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
> at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.append(SequenceFile.java:1320)
> at org.apache.flume.sink.hdfs.HDFSSequenceFile.append(HDFSSequenceFile.java:72)
> at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:376)
> at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:729)
> at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:727)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira