[ https://issues.apache.org/jira/browse/PIG-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15096796#comment-15096796 ]
Koji Noguchi commented on PIG-3251:
-----------------------------------

Thanks [~rohini]. Created PIG-4779 for tracking. In my test environment it was throwing a different IOException and incorrectly passing the test. I'll check.

> Bzip2TextInputFormat requires double the memory of maximum record size
> ----------------------------------------------------------------------
>
>                 Key: PIG-3251
>                 URL: https://issues.apache.org/jira/browse/PIG-3251
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>             Fix For: 0.16.0
>
>         Attachments: pig-3251-trunk-v01.patch, pig-3251-trunk-v02.patch,
> pig-3251-trunk-v03.patch, pig-3251-trunk-v04.patch, pig-3251-trunk-v05.patch,
> pig-3251-trunk-v06.patch, pig-3251-trunk-v07.patch, pig-3251-trunk-v08.patch,
> pig-3251-trunk-v09.patch
>
>
> While looking at a user's OOM heap dump, I noticed that Pig's
> Bzip2TextInputFormat consumes memory in both
> Bzip2TextInputFormat.buffer (ByteArrayOutputStream)
> and the actual Text that is returned as the line.
> For example, with a single record of 160MBytes, buffer was 268MBytes and
> Text was 160MBytes.
> We can probably eliminate one of them.
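
The 268MBytes figure in the description is consistent with a byte buffer that doubles its capacity as it fills. The sketch below is only an illustration of that arithmetic, not the Pig code: the class name BufferGrowthSketch and the method capacityAfter are hypothetical, and it assumes the buffer starts at 32 bytes (the default for java.io.ByteArrayOutputStream's no-argument constructor) and is filled in small writes, neither of which the issue text confirms.

```java
public class BufferGrowthSketch {
    // Assumption: the buffer starts at 32 bytes, the default capacity of
    // java.io.ByteArrayOutputStream when built with its no-arg constructor.
    public static final long INITIAL_CAPACITY = 32;

    /**
     * Models the capacity of a doubling buffer once it holds recordBytes
     * bytes, assuming the data arrives in small writes so that every
     * growth step is a doubling.
     */
    public static long capacityAfter(long recordBytes) {
        long capacity = INITIAL_CAPACITY;
        while (capacity < recordBytes) {
            capacity *= 2; // grow by doubling until the record fits
        }
        return capacity;
    }

    public static void main(String[] args) {
        long record = 160L * 1024 * 1024;       // one 160 MiB record
        long buffer = capacityAfter(record);    // backing array of the buffer
        System.out.println("buffer capacity:     " + buffer + " bytes");
        // The record is then copied into a separate object (the Text in the
        // issue), so both allocations are live at the same time.
        System.out.println("buffer + line copy:  " + (buffer + record) + " bytes");
    }
}
```

Under these assumptions a 160 MiB record leaves the buffer at 2^28 = 268,435,456 bytes, which matches the reported 268MBytes, and holding the copied line alongside it brings the total to 436,207,616 bytes for a single record.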