[ https://issues.apache.org/jira/browse/TEZ-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Muhammad Samir Khan updated TEZ-3809:
-------------------------------------
    Attachment: TEZ-3809.002.patch

Removed the unused imports and fixed the names for new methods.

> The buffer size allocated for InMemoryMapOutput can be optimized
> ----------------------------------------------------------------
>
>                 Key: TEZ-3809
>                 URL: https://issues.apache.org/jira/browse/TEZ-3809
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Muhammad Samir Khan
>            Assignee: Muhammad Samir Khan
>         Attachments: TEZ-3809.001.patch, TEZ-3809.002.patch
>
>
> Related jiras: TEZ-3752 and TEZ-3732.
> - When shuffling input to memory, the decompressed length is used to create
> the InMemoryMapOutput object. However, IFile.Reader's readToMemory reads 4
> bytes less (the IFile header). These 4 bytes can be optimized away and, in an
> extreme case of 10,000,000 fetches, this can save ~38 MB (TEZ-3732).
> - Memory-to-memory merge sums up the sizes of the input InMemoryMapOutput
> buffers to allocate the new InMemoryMapOutput. However, each input carries
> its own two EOF_MARKERs, while the merged buffer needs only two, at the end.
> - InMemoryWriter wraps the output BoundedByteArrayOutputStream in
> IFileOutputStream, which writes a checksum at close. This creates an
> inconsistency between the primary input buffers, which don't have a checksum,
> and the merged buffers, which do. The IFileOutputStream wrap can be removed to
> save 4 bytes per merged buffer.
> - InMemoryWriter does not account for the two EOF_MARKERs written at close(),
> so the getRawLength() method is off by two bytes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
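
The savings listed in the description can be checked with simple arithmetic. The following is a minimal sketch, not code from the patch: the class and method names are hypothetical, and the byte sizes (a 4-byte IFile header, and two EOF_MARKERs of one byte each) are assumptions read off the figures quoted in the description.

```java
public final class ShuffleBufferSavings {
    // Assumed sizes, taken from the figures in the issue description.
    static final int IFILE_HEADER_BYTES = 4;
    // Two EOF_MARKERs, assumed one byte each ("off by two bytes").
    static final int EOF_MARKERS_BYTES = 2;

    // Bytes saved if each fetch allocates its buffer without the IFile header.
    public static long headerSavings(long fetches) {
        return fetches * IFILE_HEADER_BYTES;
    }

    // Size of a merged buffer when each input's trailing EOF_MARKERs are
    // dropped and a single pair is written at the end.
    public static long mergedBufferSize(long[] inputSizes) {
        long total = 0;
        for (long size : inputSizes) {
            total += size - EOF_MARKERS_BYTES;
        }
        return total + EOF_MARKERS_BYTES;
    }

    public static void main(String[] args) {
        long saved = headerSavings(10_000_000L);
        System.out.printf("header savings: %d bytes (%.1f MiB)%n",
                saved, saved / (1024.0 * 1024.0));

        long[] inputs = {100, 200, 300};
        long naive = 0;
        for (long size : inputs) {
            naive += size;
        }
        System.out.printf("merged size: %d bytes (naive sum: %d)%n",
                mergedBufferSize(inputs), naive);
    }
}
```

Under these assumptions, merging n inputs saves 2 * (n - 1) bytes compared with summing the input sizes directly.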