[jira] [Commented] (TEZ-3752) Reduce Object size of InMemoryMapOutput for large jobs
[ https://issues.apache.org/jira/browse/TEZ-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16104002#comment-16104002 ]

TezQA commented on TEZ-3752:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12879018/TEZ-3752.001.patch
  against master revision 4b5448d.

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.

    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:red}-1 core tests{color}. The patch failed these unit tests in :
        org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2590//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2590//console

This message is automatically generated.

> Reduce Object size of InMemoryMapOutput for large jobs
> ------------------------------------------------------
>
>                 Key: TEZ-3752
>                 URL: https://issues.apache.org/jira/browse/TEZ-3752
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jonathan Eagles
>            Assignee: Muhammad Samir Khan
>         Attachments: TEZ-3752.001.patch
>
>
> Follow-on jira from TEZ-3732. The InMemoryMapOutput has a
> BoundedByteArrayOutputStream that is only used in the Merged MapOutput case.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TEZ-3752) Reduce Object size of InMemoryMapOutput for large jobs
[ https://issues.apache.org/jira/browse/TEZ-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16103880#comment-16103880 ]

Muhammad Samir Khan commented on TEZ-3752:
------------------------------------------

However, this test doesn't actually hit the RLE case. InMemoryWriter has RLE turned off, since the Writer constructor it calls sets the rle flag to false.
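
For readers unfamiliar with the RLE case mentioned here: with run-length encoding of keys, a record that repeats the previous key can be marked as "same key" rather than carrying the key again. The sketch below is not Tez code. SameKeyRuns and countSameKey are hypothetical names, and the method only counts how many records in an already sorted key stream repeat the previous key, which is roughly the kind of quantity a same-key counter would report.

```java
import java.util.Arrays;
import java.util.List;

public class SameKeyRuns {

    // Hypothetical helper (not part of Tez): counts the records whose key
    // equals the key of the record immediately before them.
    static int countSameKey(List<String> sortedKeys) {
        int count = 0;
        String prev = null;
        for (String key : sortedKeys) {
            if (key.equals(prev)) {
                count++; // this record repeats the previous key
            }
            prev = key;
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "a", "b", "c", "c", "c");
        System.out.println("same-key records: " + countSameKey(keys));
    }
}
```

If the writer never enables RLE, a counter of this kind stays at zero no matter how many duplicate keys the input holds, which is why a test built on such a writer cannot exercise the RLE path.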
[jira] [Commented] (TEZ-3752) Reduce Object size of InMemoryMapOutput for large jobs
[ https://issues.apache.org/jira/browse/TEZ-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16103716#comment-16103716 ]

Muhammad Samir Khan commented on TEZ-3752:
------------------------------------------

Ran orderedwordcount with -Dtez.shuffle-vertex-manager.enable.auto-parallel=true -Dtez.runtime.io.sort.factor=4 -Dtez.runtime.shuffle.memory-to-memory.enable=true. Sorted the output (via sort) and diff'd against the output from orderedwordcount without the changes.

Also turned on the '"writeFile SAME_KEY count=" + count' log line in TezMerger.writeFile to ensure we hit the RLE case with in memory merge:

2017-07-27 18:19:18,128 [INFO] [MemToMemMerger [Tokenizer]] |orderedgrouped.MergeManager|: Tokenizer: Initiating Memory-to-Memory merge with 4 segments of total-size: 22182024
2017-07-27 18:19:18,770 [INFO] [MemToMemMerger [Tokenizer]] |impl.TezMerger|: writeFile SAME_KEY count=1544269
2017-07-27 18:19:18,771 [INFO] [MemToMemMerger [Tokenizer]] |orderedgrouped.MergeManager|: Tokenizer Memory-to-Memory merge of the 4 files in-memory complete with mergeOutputSize=22182024
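
The sort-then-diff check described in this comment can also be sketched in Java. This is only an illustration and not part of the patch: SortedOutputDiff and sameAfterSort are hypothetical names, and the two file paths are supplied by the caller. It treats the two job outputs as equal when they contain the same lines regardless of order.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortedOutputDiff {

    // Hypothetical helper: true when both outputs hold the same lines,
    // ignoring the order in which the lines were written.
    static boolean sameAfterSort(List<String> baseline, List<String> patched) {
        List<String> left = new ArrayList<>(baseline);  // copy so inputs stay untouched
        List<String> right = new ArrayList<>(patched);
        Collections.sort(left);
        Collections.sort(right);
        return left.equals(right);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: SortedOutputDiff <baseline-output> <patched-output>");
            return;
        }
        List<String> baseline = Files.readAllLines(Paths.get(args[0]));
        List<String> patched = Files.readAllLines(Paths.get(args[1]));
        System.out.println(sameAfterSort(baseline, patched) ? "outputs match" : "outputs differ");
    }
}
```

Sorting both sides first matters because the order of lines in the job output is not guaranteed to be the same across two runs, so a plain line-by-line diff could report differences that are only reorderings.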
[jira] [Commented] (TEZ-3752) Reduce Object size of InMemoryMapOutput for large jobs
[ https://issues.apache.org/jira/browse/TEZ-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102127#comment-16102127 ]

Jonathan Eagles commented on TEZ-3752:
--------------------------------------

This approach and implementation look correct. Can you post some results of running jobs with RLE to verify merge is correct in that scenario?
[jira] [Commented] (TEZ-3752) Reduce Object size of InMemoryMapOutput for large jobs
[ https://issues.apache.org/jira/browse/TEZ-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102046#comment-16102046 ]

Muhammad Samir Khan commented on TEZ-3752:
------------------------------------------

JOL dump:

Before:
-internals:
{code}
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
# Objects are 8 bytes aligned.
# Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
# Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]

Instantiated the sample instance via org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput(org.apache.tez.runtime.library.common.InputAttemptIdentifier,org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetchedInputAllocatorOrderedGrouped,long,boolean,org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$1)

org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput object internals:
 OFFSET  SIZE  TYPE  DESCRIPTION  VALUE
      0     4  (object header)  01 00 00 00 (00000001 00000000 00000000 00000000) (1)
      4     4  (object header)  00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4  (object header)  78 12 01 f8 (01111000 00010010 00000001 11111000) (-134147464)
     12     4  int  MapOutput.id  1
     16     1  boolean  MapOutput.primaryMapOutput  false
     17     3  (alignment/padding gap)
     20     4  org.apache.tez.runtime.library.common.InputAttemptIdentifier  MapOutput.attemptIdentifier  null
     24     4  org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetchedInputAllocatorOrderedGrouped  MapOutput.callback  null
     28     4  org.apache.hadoop.io.BoundedByteArrayOutputStream  InMemoryMapOutput.byteStream  (object)
Instance size: 32 bytes
Space losses: 3 bytes internal + 0 bytes external = 3 bytes total
{code}
-footprint:
{code}
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
# Objects are 8 bytes aligned.
# Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
# Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]

Instantiated the sample instance via org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput(org.apache.tez.runtime.library.common.InputAttemptIdentifier,org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetchedInputAllocatorOrderedGrouped,long,boolean,org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$1)

org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput@10bdf5e5d footprint:
 COUNT  AVG  SUM  DESCRIPTION
     1   16   16  [B
     1   32   32  org.apache.hadoop.io.BoundedByteArrayOutputStream
     1   32   32  org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput
     3        80  (total)
{code}

After:
-internals:
{code}
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
# Objects are 8 bytes aligned.
# Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
# Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]

Instantiated the sample instance via org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput(org.apache.tez.runtime.library.common.InputAttemptIdentifier,org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetchedInputAllocatorOrderedGrouped,long,boolean,org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$1)

org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput object internals:
 OFFSET  SIZE  TYPE  DESCRIPTION  VALUE
      0     4