[
https://issues.apache.org/jira/browse/HADOOP-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Arun C Murthy updated HADOOP-1193:
----------------------------------
Attachment: HADOOP-1193_1_20070517.patch
Here is a patch while I continue further testing... Hairong, could you try it
and see if it works for you? Thanks!
Basically I went ahead and implemented a 'codec pool' to reuse the
direct-buffer based codecs so that we don't create too many of them...
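
To illustrate the idea (this is only a hand-wavy sketch, not the code in the
attached patch; the class and method names below are made up), the pool simply
caches idle (de)compressor instances by class and hands them back out instead
of constructing a new one per stream:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SimpleCodecPool<T> {
  // Idle instances, keyed by their concrete class.
  private final Map<Class<?>, List<T>> pool = new HashMap<Class<?>, List<T>>();

  // Borrow a pooled instance of the given class, or null on a cache miss.
  public synchronized T borrow(Class<? extends T> clazz) {
    List<T> idle = pool.get(clazz);
    return (idle == null || idle.isEmpty()) ? null : idle.remove(idle.size() - 1);
  }

  // Hand an instance back once the stream is done with it.
  public synchronized void giveBack(T instance) {
    List<T> idle = pool.get(instance.getClass());
    if (idle == null) {
      idle = new ArrayList<T>();
      pool.put(instance.getClass(), idle);
    }
    idle.add(instance);
  }
}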
Results from sorting 1 million records via TestSequenceFile with RECORD
compression (number of codec instances created):
                  trunk   H-1193
Compressors:       1382        3
Decompressors:     1520       12
--------------------------------
Total:             2902       15
Results are even more dramatic for BLOCK compression, since each Reader needs 4
codecs (one each for key, keyLen, val & valLen). On the back of this patch I
have also bumped the default zlib direct-buffer size from 1K to 64K, which
should improve performance further.
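
For completeness, here is how a caller might use the sketch above; this is
again illustrative only, with java.util.zip.Deflater standing in for the zlib
compressor and the 64K buffer merely echoing the larger scratch size, not the
patch's actual configuration:

import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

public class PoolUsageExample {
  private static final SimpleCodecPool<Deflater> POOL = new SimpleCodecPool<Deflater>();

  public static byte[] compress(byte[] data) {
    Deflater deflater = POOL.borrow(Deflater.class);
    if (deflater == null) {
      deflater = new Deflater();            // pay construction cost only on a miss
    }
    try {
      deflater.reset();
      deflater.setInput(data);
      deflater.finish();
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[64 * 1024];     // larger scratch buffer, in the spirit of the 64K bump
      while (!deflater.finished()) {
        out.write(buf, 0, deflater.deflate(buf));
      }
      return out.toByteArray();
    } finally {
      POOL.giveBack(deflater);              // return the instance for the next writer to reuse
    }
  }
}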
Appreciate any review/feedback.
> Map/reduce job gets OutOfMemoryException when set map out to be compressed
> --------------------------------------------------------------------------
>
> Key: HADOOP-1193
> URL: https://issues.apache.org/jira/browse/HADOOP-1193
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.12.2
> Reporter: Hairong Kuang
> Assigned To: Arun C Murthy
> Attachments: HADOOP-1193_1_20070517.patch
>
>
> One of my jobs quickly fails with an OutOfMemoryException when I set the map
> output to be compressed. It worked fine with release 0.10.