[jira] [Comment Edited] (TEZ-3809) The buffer size allocated for InMemoryMapOutput can be optimized

2017-08-02 Thread Muhammad Samir Khan (JIRA)

[ https://issues.apache.org/jira/browse/TEZ-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16111794#comment-16111794 ]

Muhammad Samir Khan edited comment on TEZ-3809 at 8/2/17 10:07 PM:
---

I took a heap dump of the ordered word count job just before the final merge.
In the "after" case, one of the outputs was written to disk instead of being
kept in memory, which is why that dump has 37 entries.

Before:

Class Name | Shallow Heap | Retained Heap | Percentage
---
java.lang.Thread @ 0x5d2c473f8  ShuffleAndMergeRunner {Tokenizer} Thread | 120 | 2,229,207,992 | 96.48%
|- java.util.ArrayList @ 0x73f978f10 | 24 | 2,229,206,760 | 96.48%
|  '- java.lang.Object[38] @ 0x73f979130 | 168 | 2,229,206,736 | 96.48%
|     |- org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput @ 0x5e4a88898 | 32 | 68,078,192 | 2.95%
|     |- org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput @ 0x631e0b260 | 32 | 67,839,520 | 2.94%
|     |- org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput @ 0x5e4a888b8 | 32 | 67,700,608 | 2.93%
|     |- org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput @ 0x73f9db168 | 32 | 67,500,816 | 2.92%
|     |- org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput @ 0x60ab36218 | 32 | 67,408,704 | 2.92%
|     |- org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput @ 0x631deed28 | 32 | 67,367,424 | 2.92%
|     |- org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput @ 0x743b86ee0 | 32 | 67,337,936 | 2.91%
|     |- org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput @ 0x60af3a698 | 32 | 67,300,896 | 2.91%
|     |- org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput @ 0x631e0c5b8 | 32 | 67,282,464 | 2.91%
|     |- org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput @ 0x60ab33140 | 32 | 67,264,304 | 2.91%
|     |- org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput @ 0x5e4a88878 | 32 | 67,127,368 | 2.91%
|     |- org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput @ 0x631e0b218 | 32 | 67,098,216 | 2.90%
|     |- org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput @ 0x631e0c6c8 | 32 | 67,064,504 | 2.90%
|     |- org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput @ 0x5d239a6c8 | 32 | 67,003,776 | 2.90%
|     |- org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput @ 0x5d23b7e10 | 32 | 66,965,296 | 2.90%
|     |- org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput @ 0x631def2b8 | 32 | 66,928,032 | 2.90%
|     |- org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput @ 0x60ab351d0 | 32 | 66,916,896 | 2.90%
|     |- org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput @ 0x74805dfb8 | 32 | 66,886,272 | 2.89%
|     |- org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput @ 0x60af39598 | 32 | 66,718,800 | 2.89%
|     |- org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput @ 0x73fb0fb78 | 32 | 66,688,296 | 2.89%
|     |- org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput @ 0x631e0c4b0 | 32 | 66,656,312 | 2.88%
|     |- org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput @ 0x60af39578 | 32 | 66,629,936 | 2.88%
|     |- org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput @ 0x631deec30 | 32 | 66,584,576 | 2.88%
|     |- org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput @ 0x631e
