[
https://issues.apache.org/jira/browse/FLUME-1850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556889#comment-13556889
]
Juhani Connolly commented on FLUME-1850:
----------------------------------------
The relevant bit from that:
"Juhani Connolly 2 months, 2 weeks ago (Oct. 31, 2012, 3:56 a.m.)
Hmm... I can see that as a viable approach, but I am curious about what happens
with the sfWriters map in HDFSEventSink... It seems like old writers are just
abandoned there forever? I would like to clean them up properly (I believe this
is common in the use case where files are dumped into a file named by date).
While not major, this does seem like it would lead to a buildup of inactive
writers. We've had OOM errors when running Flume with an HDFS sink using the
default memory settings. I have no idea if this is related; perhaps it could
be? It looks to me that nowhere other than the stop method is the sfWriters map
ever cleaned up.
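The cleanup being asked about here can be sketched as a size-bounded, access-ordered map that closes the least-recently-used writer on eviction. This is only an illustration, not Flume's actual code: `Writer` and `BoundedWriterCache` are hypothetical stand-ins for BucketWriter and the sfWriters map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: Writer and BoundedWriterCache are hypothetical
// stand-ins, not Flume's BucketWriter or the real sfWriters map.
public class BoundedWriterCache {

    // Stand-in for a BucketWriter: just records whether close() was called.
    static class Writer {
        boolean closed = false;
        void close() { closed = true; }
    }

    // Access-ordered LinkedHashMap; removeEldestEntry fires on each put,
    // letting us close and drop the least-recently-used writer once the
    // cap (analogous to maxOpenFiles) is exceeded.
    static Map<String, Writer> newCache(final int maxOpenFiles) {
        return new LinkedHashMap<String, Writer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Writer> eldest) {
                if (size() > maxOpenFiles) {
                    eldest.getValue().close(); // release resources before eviction
                    return true;
                }
                return false;
            }
        };
    }

    public static void main(String[] args) {
        Map<String, Writer> cache = newCache(2);
        Writer first = new Writer();
        cache.put("/logs/2012-10-31/00", first);
        cache.put("/logs/2012-10-31/01", new Writer());
        cache.put("/logs/2012-10-31/02", new Writer()); // evicts and closes "first"
        System.out.println(cache.size() + " " + first.closed); // 2 true
    }
}
```

The key point of the sketch is that eviction and close() happen together, so a bounded cache never strands an open writer the way an unbounded, never-pruned map does.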
Juhani Connolly 2 months, 2 weeks ago (Oct. 31, 2012, 6:11 a.m.)
So, I took a heap dump and checked the retained size for BucketWriter
objects... Around 4000 bytes all told.
After a week, one of our aggregator/hdfs sink nodes has 1500 bucket writers
alive in memory, for about 6mb of memory spent on what are essentially dead
objects. This is because we generate a new path (based on time) every hour, for
each host/data type. We're still running in a test phase, with only a handful
of our servers feeding data, so with more servers and more time, this moderate
amount of memory doing nothing will keep growing.
So at the end of the day, at some point, HDFSEventSink does need to get
involved to clean this stuff up.
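The growth described above is easy to quantify: stale writers accumulate at roughly hosts × data types × hours of uptime. A back-of-envelope sketch, using the ~4000-bytes-per-writer figure from the heap dump (the host and data-type counts below are made-up examples, not from the report):

```java
// Back-of-envelope for the leak described above: one new BucketWriter per
// host/data-type/hour, none ever evicted. The 4000-bytes-per-writer figure
// comes from the heap dump; the host and data-type counts are illustrative.
public class StaleWriterEstimate {
    static long staleBytes(int hosts, int dataTypes, int hoursRunning, int bytesPerWriter) {
        long writers = (long) hosts * dataTypes * hoursRunning;
        return writers * bytesPerWriter;
    }

    public static void main(String[] args) {
        // e.g. 3 hosts x 3 data types x one week of hourly paths = 1512 writers,
        // close to the ~1500 writers / ~6mb observed after a week
        long bytes = staleBytes(3, 3, 7 * 24, 4000);
        System.out.println(bytes + " stale bytes"); // 6048000 stale bytes
    }
}
```

Because the writer count is linear in both fleet size and uptime, the leak is slow in a small test deployment but unbounded in production.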
Juhani Connolly 2 months, 2 weeks ago (Oct. 31, 2012, 6:12 a.m.)
Uh, that is, 4000 bytes each. Most of that is in the writer.
Mike Percy 2 months, 2 weeks ago (Oct. 31, 2012, 7:27 a.m.)
Yeah, you're right about the sfWriters map. The original implementation
contained that thing and I never tried to address that issue... it won't cause
correctness problems (since close() is effectively idempotent) but yes it will
consume some memory. That problem exists with all of the existing rolling code,
so it would not just exist with the new code for the close-on-idle feature.
One easy band-aid would be to redefine the default maxOpenFiles to, say, 500.
That would reduce the severity of the memory reclamation delay, at the limit.
If each object takes 4K then a cache of 500 would only take up 2MB, which isn't
terrible. Another simple approach, which would be a bit ugly (need to be
careful with the circular reference) would be to provide a sfWriters reference
to each BucketWriter instance, and when the BucketWriter's close() method is
called then it can remove itself from the cache if it's still there. Speaking
of which, I would prefer to use the Guava CacheBuilder over what we are using
now if we can.
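The second suggestion above (a writer that removes itself from the cache in close()) can be sketched in plain Java. This is hypothetical illustration only: `SelfEvictingWriter` and its cache are stand-in names, not Flume's BucketWriter or sfWriters.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of the self-removal idea: each writer keeps a reference
// back to the cache that owns it (the circular reference to be careful with)
// and unregisters itself in close(). SelfEvictingWriter is a hypothetical
// name, not Flume's actual BucketWriter.
public class SelfEvictingWriter {
    private final String path;
    private final Map<String, SelfEvictingWriter> owningCache;
    private boolean closed = false;

    SelfEvictingWriter(String path, Map<String, SelfEvictingWriter> owningCache) {
        this.path = path;
        this.owningCache = owningCache;
    }

    // Idempotent, mirroring the note above that close() is effectively
    // idempotent: safe whether the cache drops us first or we close first.
    public synchronized void close() {
        if (closed) {
            return;
        }
        closed = true;
        owningCache.remove(path, this); // remove only if the mapping is still ours
    }

    public boolean isClosed() { return closed; }

    public static void main(String[] args) {
        Map<String, SelfEvictingWriter> cache = new ConcurrentHashMap<>();
        SelfEvictingWriter w = new SelfEvictingWriter("/logs/2012-10-31/03", cache);
        cache.put("/logs/2012-10-31/03", w);
        w.close(); // the writer cleans itself out of the cache
        System.out.println(cache.isEmpty()); // true
    }
}
```

The Guava direction mentioned above would express the same idea declaratively: a `CacheBuilder` with a `maximumSize` and a removal listener that closes evicted writers, avoiding the hand-rolled back-reference entirely.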
Anyway, aside from simple workarounds such as the above, I think the whole
HDFSEventSink/BucketWriter interaction would need to be significantly
refactored to solve this design flaw, which makes me nervous since the HDFS
sink is rather stable today after fishing out many very subtle bugs over
several months of testing and use."
> OutOfMemory Error
> -----------------
>
> Key: FLUME-1850
> URL: https://issues.apache.org/jira/browse/FLUME-1850
> Project: Flume
> Issue Type: Bug
> Components: Node
> Affects Versions: v1.3.0
> Environment: RHEL 6
> Reporter: Mohit Anchlia
> Attachments: flume-oo.docx, Screen Shot 2013-01-16 at 11.05.55 PM.png
>
>
> We are using flume-1.3.0. After flume has been up for a while (30+ days) we get
> an OutOfMemory error. Our heap is set to 2G and load on the system is very low,
> around 50 requests/minute. We use AvroClient and a long-lived connection.
> Below is the stack trace. I don't have the heap dump, but I plan to enable
> that for next time.
> 13/01/16 09:09:38 ERROR hdfs.HDFSEventSink: process failed
> java.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:2786)
> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
> at java.io.DataOutputStream.write(DataOutputStream.java:90)
> at org.apache.hadoop.io.Text.write(Text.java:282)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
> at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.append(SequenceFile.java:1320)
> at org.apache.flume.sink.hdfs.HDFSSequenceFile.append(HDFSSequenceFile.java:72)
> at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:376)
> at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:729)
> at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:727)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Exception in thread "SinkRunner-PollingRunner-DefaultSinkProcessor"
> java.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:2786)
> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
> at java.io.DataOutputStream.write(DataOutputStream.java:90)
> at org.apache.hadoop.io.Text.write(Text.java:282)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
> at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
> at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.append(SequenceFile.java:1320)
> at org.apache.flume.sink.hdfs.HDFSSequenceFile.append(HDFSSequenceFile.java:72)
> at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:376)
> at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:729)
> at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:727)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira