[ 
https://issues.apache.org/jira/browse/PIG-3251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648788#comment-13648788
 ] 

Koji Noguchi commented on PIG-3251:
-----------------------------------

bq. FYI, a couple of tests from TestBZip are failing after applying my patch. 
Looking.

3 tests failed.  
{noformat}
Testcase: testBZ2Concatenation took 38.266 sec
  FAILED
Expected exception: java.io.IOException
junit.framework.AssertionFailedError: Expected exception: java.io.IOException

Testcase: testBlockHeaderEndingWithCR took 49.539 sec
  FAILED
expected:<82094> but was:<82093>
junit.framework.AssertionFailedError: expected:<82094> but was:<82093>
  at org.apache.pig.test.TestBZip.testCount(TestBZip.java:256)
  at org.apache.pig.test.TestBZip.testBlockHeaderEndingWithCR(TestBZip.java:112)

Testcase: testBlockHeaderEndingAtSplitNotByteAligned took 48.996 sec
  FAILED
expected:<74999> but was:<101591>
junit.framework.AssertionFailedError: expected:<74999> but was:<101591>
  at org.apache.pig.test.TestBZip.testCount(TestBZip.java:256)
  at org.apache.pig.test.TestBZip.testBlockHeaderEndingAtSplitNotByteAligned(TestBZip.java:88)
{noformat}

The "testBZ2Concatenation" failure is expected, since Hadoop's bzip2 codec 
handles concatenated bzip2 files (whereas Pig's TestBZip checks that reading 
them reliably fails).
The other two are worrisome to me.  I am asking a colleague to check, which 
will take some time.  Depending on what we find, we may need to change the 
condition for using Hadoop's bzip2 codec.

                
> Bzip2TextInputFormat requires double the memory of maximum record size
> ----------------------------------------------------------------------
>
>                 Key: PIG-3251
>                 URL: https://issues.apache.org/jira/browse/PIG-3251
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Minor
>         Attachments: pig-3251-trunk-v01.patch, pig-3251-trunk-v02.patch, 
> pig-3251-trunk-v03.patch, pig-3251-trunk-v04.patch, pig-3251-trunk-v05.patch
>
>
> While looking at a user's OOM heap dump, I noticed that Pig's 
> Bzip2TextInputFormat holds memory in both
> Bzip2TextInputFormat.buffer (a ByteArrayOutputStream) 
> and the actual Text that is returned as the line.
> For example, for a single record of 160 MBytes, buffer was 268 MBytes and 
> Text was 160 MBytes.  
> We can probably eliminate one of them.
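
The double footprint described above can be reproduced at small scale. The 
sketch below is not Pig's code: the class BufferDoubling and its helper 
footprint() are hypothetical names, a plain byte[] stands in for Hadoop's 
Text, and writing one byte at a time is an assumption about how the record 
gets buffered. It only shows that a ByteArrayOutputStream keeps its own 
backing array alive while toByteArray() hands back a second, full copy of the 
record. The reported 268 MBytes is close to 2^28 = 268,435,456 bytes, which 
would be consistent with a backing array that grows by doubling, but that is 
an inference and not something the heap dump report states.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration class; not part of Pig or Hadoop.
class BufferDoubling {

    /** Exposes the length of the protected backing array of ByteArrayOutputStream. */
    static class PeekableBuffer extends ByteArrayOutputStream {
        int capacity() {
            return buf.length;
        }
    }

    /**
     * Buffers recordLen bytes one at a time (assumed access pattern), then
     * copies them out. Returns {backing-array capacity, length of the copy}.
     * Both arrays are reachable at the same time when this method returns.
     */
    static long[] footprint(int recordLen) {
        PeekableBuffer buffer = new PeekableBuffer();
        for (int i = 0; i < recordLen; i++) {
            buffer.write('x');
        }
        // toByteArray() allocates a new array; the internal buffer is kept.
        byte[] line = buffer.toByteArray();
        return new long[] { buffer.capacity(), line.length };
    }

    public static void main(String[] args) {
        long[] f = footprint(160);
        System.out.println("buffer capacity = " + f[0] + " bytes");
        System.out.println("record copy     = " + f[1] + " bytes");
        System.out.println("held together   = " + (f[0] + f[1]) + " bytes");
    }
}
```

The exact capacity depends on the JDK's growth policy, which is an 
implementation detail, so the sketch prints it instead of assuming a value. 
What matters is that the capacity is at least the record size and that the 
copy comes on top of it.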

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
