[jira] [Commented] (TEZ-2850) Tez MergeManager OOM for small Map Outputs
[ https://issues.apache.org/jira/browse/TEZ-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14967407#comment-14967407 ]

Siddharth Seth commented on TEZ-2850:

Not from me. I think this should go back to 0.6. I don't believe any more releases are planned from the 0.5 line - so I haven't been backporting patches to the 0.5 branch.

> Tez MergeManager OOM for small Map Outputs
> ------------------------------------------
>
> Key: TEZ-2850
> URL: https://issues.apache.org/jira/browse/TEZ-2850
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Saikat
> Assignee: Jonathan Eagles
> Attachments: OOM_1.png, OOM_2.png, OOM_3.png, TEZ-2850.1.patch, TEZ-2850.2.patch, TEZ-2850.3.patch, TEZ-2850_test.patch

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TEZ-2850) Tez MergeManager OOM for small Map Outputs
[ https://issues.apache.org/jira/browse/TEZ-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14963548#comment-14963548 ]

Jonathan Eagles commented on TEZ-2850:

[~sseth], [~gopalv], any other comments/concerns before this goes in? Also, should this go back to 0.5/0.6 or just to 0.7?
[jira] [Commented] (TEZ-2850) Tez MergeManager OOM for small Map Outputs
[ https://issues.apache.org/jira/browse/TEZ-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961627#comment-14961627 ]

Siddharth Seth commented on TEZ-2850:

+1. This looks good. The null in close would have shown up as an NPE in ChecksumInputStream if it were invoked.
[jira] [Commented] (TEZ-2850) Tez MergeManager OOM for small Map Outputs
[ https://issues.apache.org/jira/browse/TEZ-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961474#comment-14961474 ]

Gopal V commented on TEZ-2850:

[~jeagles]: good call on the in == null check, that's a valid assumption for InMemoryReader.

I notice that there's some possibility of an NPE in close() if checksumIn is null, but I'm pretty sure it doesn't get called in normal operation.
[jira] [Commented] (TEZ-2850) Tez MergeManager OOM for small Map Outputs
[ https://issues.apache.org/jira/browse/TEZ-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961446#comment-14961446 ]

TezQA commented on TEZ-2850:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12767127/TEZ-2850.3.patch
  against master revision 25f0247.

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.

    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:red}-1 core tests{color}. The patch failed these unit tests in :
                  org.apache.tez.dag.app.TestSpeculation

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1232//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1232//console

This message is automatically generated.
[jira] [Commented] (TEZ-2850) Tez MergeManager OOM for small Map Outputs
[ https://issues.apache.org/jira/browse/TEZ-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961265#comment-14961265 ]

Jonathan Eagles commented on TEZ-2850:

TEZ-2901 was filed to take over the limit on the maximum number of in-memory segments. This ticket will be for removing the unnecessary IFileInputStream.
[jira] [Commented] (TEZ-2850) Tez MergeManager OOM for small Map Outputs
[ https://issues.apache.org/jira/browse/TEZ-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14961167#comment-14961167 ]

Saikat commented on TEZ-2850:

[~jeagles] unassigning myself due to the time-critical nature of this bug. Will take up the bug if it is still unresolved.
[jira] [Commented] (TEZ-2850) Tez MergeManager OOM for small Map Outputs
[ https://issues.apache.org/jira/browse/TEZ-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955306#comment-14955306 ]

Siddharth Seth commented on TEZ-2850:

Yes, that should take care of not allocating the buffer. An alternate constructor may be required in the main Reader.
[jira] [Commented] (TEZ-2850) Tez MergeManager OOM for small Map Outputs
[ https://issues.apache.org/jira/browse/TEZ-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955234#comment-14955234 ]

Saikat commented on TEZ-2850:

[~sseth] if my understanding is correct, when we call the InMemoryReader constructor, which in turn calls the IFile.Reader superclass constructor, we should pass a flag telling it not to allocate the IFileInputStream object, since checksumIn is not used when the data is already in memory.
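The constructor idea discussed here can be sketched roughly as follows. This is a minimal illustration only: the `Sketch*` classes are hypothetical stand-ins, not the actual Tez `IFile.Reader`, `InMemoryReader` or `IFileInputStream`, and the only detail taken from the thread is the fixed 4096-byte buffer and the fact that the checksum stream is unused for in-memory data.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a checksumming stream that owns a fixed buffer.
class SketchChecksumStream {
    final InputStream in;
    final byte[] buffer = new byte[4096]; // fixed per-stream cost discussed in this thread

    SketchChecksumStream(InputStream in) {
        this.in = in;
    }
}

// Hypothetical base reader: only wraps the stream when one is supplied.
class SketchReader {
    protected final SketchChecksumStream checksumIn; // null when data is already in memory

    SketchReader(InputStream in) {
        this.checksumIn = (in == null) ? null : new SketchChecksumStream(in);
    }

    void close() {
        if (checksumIn == null) {
            return; // nothing was allocated, so nothing to close
        }
        try {
            checksumIn.in.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Hypothetical in-memory reader: passes no stream, so no 4096-byte buffer is created.
class SketchInMemoryReader extends SketchReader {
    final byte[] data;

    SketchInMemoryReader(byte[] data) {
        super(null);
        this.data = data;
    }
}
```

With this shape, a reader over a 100-byte in-memory payload holds only the payload reference, while a stream-backed reader still gets its checksum buffer.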
[jira] [Commented] (TEZ-2850) Tez MergeManager OOM for small Map Outputs
[ https://issues.apache.org/jira/browse/TEZ-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14954018#comment-14954018 ]

TezQA commented on TEZ-2850:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12766181/TEZ-2850.2.patch
  against master revision 822bc69.

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.

    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1210//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1210//console

This message is automatically generated.
[jira] [Commented] (TEZ-2850) Tez MergeManager OOM for small Map Outputs
[ https://issues.apache.org/jira/browse/TEZ-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14954009#comment-14954009 ]

Siddharth Seth commented on TEZ-2850:

[~saikatr], as [~gopalv] pointed out in the previous comment - the 4K chunks are not required for in-memory segments. Fixing that would be a lot simpler - and less error prone (races on when to trigger a spill). We should target that as the fix for this issue.
[jira] [Commented] (TEZ-2850) Tez MergeManager OOM for small Map Outputs
[ https://issues.apache.org/jira/browse/TEZ-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14952646#comment-14952646 ]

TezQA commented on TEZ-2850:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12766061/TEZ-2850.1.patch
  against master revision ba63219.

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.

    {color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 3.0.1) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:red}-1 core tests{color}. The patch failed these unit tests in :
                  org.apache.tez.runtime.library.api.TestTezRuntimeConfiguration

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/1208//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/1208//artifact/patchprocess/newPatchFindbugsWarningstez-runtime-library.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1208//console

This message is automatically generated.
[jira] [Commented] (TEZ-2850) Tez MergeManager OOM for small Map Outputs
[ https://issues.apache.org/jira/browse/TEZ-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14936021#comment-14936021 ]

Siddharth Seth commented on TEZ-2850:

That is a very good point. The checksum has already been computed/verified while writing the segment to a buffer. Looks like setting up the constructors correctly will take care of this.
[jira] [Commented] (TEZ-2850) Tez MergeManager OOM for small Map Outputs
[ https://issues.apache.org/jira/browse/TEZ-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14935959#comment-14935959 ]

Gopal V commented on TEZ-2850:

I've been trying to understand why we even have a reference to IFileInputStream from the Segment.

The shuffleToMemory() should throw away the IFileInputStream as soon as it copies the data into memory.

From my understanding of the merger code, for in-memory segments, this buffer is assumed to be already thrown away after the reader pulls it into memory.

Only disk segments should be having 4kb chunks attached to them (a total of 4Mb with a 100 sort factor).
[jira] [Commented] (TEZ-2850) Tez MergeManager OOM for small Map Outputs
[ https://issues.apache.org/jira/browse/TEZ-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933962#comment-14933962 ]

Siddharth Seth commented on TEZ-2850:

bq. How do we estimate the size of the segments, since it may vary for each map output?

I mean the size of the segment data structure in memory. That should be independent of the data size. Looking at the heap images you've posted - this is approximately 5.5K? At 3% of the memory allocated for shuffle, that comes to about 1024 segments for a 200MB allocation.

bq. What should be the default number of segments (should it be 0, so that 0 means ignore this setting)?

A high number. Something like 4096. 0 would disable the checks.

mapreduce.reduce.merge.inmem.threshold in hadoop corresponds to tez.runtime.shuffle.memory-to-memory.segments - which indicates the number of segments after which an in-mem merge will be triggered, if enabled. This is slightly different - it's a limit on the segments, but triggers a disk merge instead of an in-mem merge. It'll have to be consolidated with in-mem merge once that is tested properly with Tez.

The property could be named tez.runtime.shuffle.in-memory.segments.max
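The segment estimate in this comment can be reproduced with a short sketch. The inputs (3% of a 200MB shuffle allocation, about 5.5K per segment) are the figures quoted above; the class and method names are hypothetical, and reading "K" and "MB" as binary units is an assumption.

```java
// Hypothetical helper, not part of Tez: derives a segment cap from a memory budget.
final class SegmentCapEstimate {
    /**
     * Number of segment data structures that fit in a given percentage of the shuffle memory.
     * Integer arithmetic is used so the result is exact and rounds down.
     */
    static long maxSegments(long shuffleMemoryBytes, int percentForSegments, long bytesPerSegment) {
        long budgetBytes = shuffleMemoryBytes * percentForSegments / 100;
        return budgetBytes / bytesPerSegment;
    }

    public static void main(String[] args) {
        long shuffleMemory = 200L * 1024 * 1024; // 200MB allocation
        long perSegment = 5632L;                 // 5.5K, assuming K = 1024 bytes
        System.out.println(maxSegments(shuffleMemory, 3, perSegment));
    }
}
```

Under these assumptions the cap works out to a little over 1100 segments, which is in the same range as the "about 1024" quoted in the comment.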
[jira] [Commented] (TEZ-2850) Tez MergeManager OOM for small Map Outputs
[ https://issues.apache.org/jira/browse/TEZ-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908260#comment-14908260 ]

Saikat commented on TEZ-2850:

[~sseth] some questions about the approach that you mention:

1. "We should try capping the value based on a rough estimate of the size of segments." How do we estimate the size of the segments, since it may vary for each map output? And what percent should be set as the default?

2. What should be the default number of segments (should it be 0, so that 0 means ignore this setting)?
(commitmemory>mergethreshold || (inMemMergeSegmentsThreshold != 0 && inMemoryMapOutputs.size() > inMemMergeSegmentsThreshold))

3. What should be the flag name? Hadoop has something like "mapreduce.reduce.merge.inmem.threshold".
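The condition in question 2 can be written out as a small standalone predicate. This is only a sketch of the proposal as worded in the comment: the class and method names are hypothetical, and it is not the code from any of the attached patches.

```java
// Hypothetical helper, not part of Tez: spill when memory OR segment count is exceeded.
final class SpillDecision {
    /**
     * @param commitMemory       bytes of map output committed to memory so far
     * @param mergeThreshold     memory threshold in bytes that triggers a spill
     * @param inMemorySegments   number of map outputs currently held in memory
     * @param segmentsThreshold  cap on in-memory segments; 0 disables this check
     */
    static boolean shouldSpill(long commitMemory, long mergeThreshold,
                               int inMemorySegments, int segmentsThreshold) {
        boolean memoryExceeded = commitMemory > mergeThreshold;
        boolean tooManySegments = segmentsThreshold != 0 && inMemorySegments > segmentsThreshold;
        return memoryExceeded || tooManySegments;
    }
}
```

With a non-zero cap, many tiny outputs trigger a spill even though committed memory stays far below the threshold; with a cap of 0 the behaviour falls back to the memory check alone.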
[jira] [Commented] (TEZ-2850) Tez MergeManager OOM for small Map Outputs
[ https://issues.apache.org/jira/browse/TEZ-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904810#comment-14904810 ]

Siddharth Seth commented on TEZ-2850:

I think it'll be better to add / compute another parameter which would limit the number of segments which are retained in memory, i.e. spill to disk if 1) the memory threshold is exceeded, or 2) the #segments limit is reached. This could be a configurable parameter - which serves more as an upper limit. We should try capping the value based on a rough estimate of the size of segments. The JVM size cannot be used as an available memory parameter, since multiple Inputs/Outputs could be running in the same JVM. We could limit this to a small fraction of the allocated memory for the shuffle.
[jira] [Commented] (TEZ-2850) Tez MergeManager OOM for small Map Outputs
[ https://issues.apache.org/jira/browse/TEZ-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14904597#comment-14904597 ]

Saikat commented on TEZ-2850:

Thanks [~gopalv] [~sseth] for the explanation. So how do we go about handling this scenario? Can we have a Tez config flag to turn this optimization feature on/off? If so, what name should be used for the flag? I can submit a patch for review.

So this IFileInputStream optimization flag and/or tweaking the shuffle.merge.percent flag can resolve this problem. (Without this optimization turned off, we might need to set a very low value of around 0.01 for shuffle.merge.percent.)
[jira] [Commented] (TEZ-2850) Tez MergeManager OOM for small Map Outputs
[ https://issues.apache.org/jira/browse/TEZ-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903821#comment-14903821 ]

Gopal V commented on TEZ-2850:

Good catch [~saikatr], that's 4kb of space overhead for 100 bytes of data.

The perf fix was to fix the total # of JNI calls to libhadoop.so CRC32.

With this fix, the Writable deserialization is unbuffered - so an IntWritable will trigger 1 JNI call out to libhadoop.so per 4 byte Integer read (also see HADOOP-10778).
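The trade-off behind the 4096-byte buffer can be illustrated by counting checksum calls for small reads versus one chunked read. This sketch uses the JDK's `java.util.zip.CRC32` as a stand-in; it does not call libhadoop.so or any Tez class, and the class name is hypothetical.

```java
import java.util.zip.CRC32;

// Hypothetical helper, not part of Tez: counts checksum update calls per read size.
final class ChecksumCallCount {
    /** Checksums data in reads of readSize bytes; returns {number of update() calls, CRC value}. */
    static long[] checksum(byte[] data, int readSize) {
        CRC32 crc = new CRC32();
        long calls = 0;
        for (int off = 0; off < data.length; off += readSize) {
            int len = Math.min(readSize, data.length - off);
            crc.update(data, off, len); // one checksum call per read
            calls++;
        }
        return new long[] {calls, crc.getValue()};
    }
}
```

For 4096 bytes of data, reading 4 bytes at a time makes 1024 checksum calls while a single 4096-byte chunk makes one, and both arrive at the same checksum value. If each call crosses into native code, the per-call overhead is what the buffering is meant to amortize.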
[jira] [Commented] (TEZ-2850) Tez MergeManager OOM for small Map Outputs
[ https://issues.apache.org/jira/browse/TEZ-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903787#comment-14903787 ]

Siddharth Seth commented on TEZ-2850:

Nice find! I believe this change was made to reduce the number of times the checksum is computed, and to try and compute it in chunks of 4096 for better performance. cc [~gopalv]

Other than the 4K buffer, there's a bunch of other objects, references etc per Segment - I won't be surprised if this adds up to a KB. Along with the memory spill limit, adding a limit on the number of in-memory segments would help. The memory-to-memory merger would normally have helped in this case, but that's not tested and should not be enabled.
[jira] [Commented] (TEZ-2850) Tez MergeManager OOM for small Map Outputs
[ https://issues.apache.org/jira/browse/TEZ-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903562#comment-14903562 ]

Rohini Palaniswamy commented on TEZ-2850:

bq. A reducer vertex task fetches around 200000 map outputs, each of around ~100 odd bytes.

In case the question comes up of why there is such a high number of map outputs: it is because of auto parallelism. Consider the case of an auto parallelism estimate of 999 for both the source and target vertex, which is very common with Pig (999 is the default upper limit for estimation). But the source produces less data, making it change the target vertex parallelism to 1 (it may be a higher number in Saikat's case). So 1 task can end up fetching 999*999 = 998001 map outputs.
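The fan-in arithmetic described here can be sketched as below. The model is an assumption made for illustration (each source task writes one output per originally estimated target task, and the partitions are divided evenly among the tasks that remain after parallelism is reduced); the class and method names are hypothetical and this is not how Tez itself computes anything.

```java
// Hypothetical helper, not part of Tez: estimates map outputs fetched per reducer task.
final class FanInEstimate {
    /**
     * @param sourceTasks          number of tasks in the source vertex
     * @param estimatedTargetTasks target parallelism at planning time (partitions per source task)
     * @param actualTargetTasks    target parallelism after it was reduced at runtime
     */
    static long outputsPerTask(int sourceTasks, int estimatedTargetTasks, int actualTargetTasks) {
        // Ceiling division: partitions each remaining target task has to cover.
        int partitionsPerTask = (estimatedTargetTasks + actualTargetTasks - 1) / actualTargetTasks;
        return (long) sourceTasks * partitionsPerTask;
    }
}
```

With 999 source tasks, 999 estimated target tasks and a final parallelism of 1, this gives the 998001 outputs mentioned in the comment; with no reduction in parallelism each task fetches only 999.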
[jira] [Commented] (TEZ-2850) Tez MergeManager OOM for small Map Outputs
[ https://issues.apache.org/jira/browse/TEZ-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903501#comment-14903501 ]

Hitesh Shah commented on TEZ-2850:

\cc [~rajesh.balamohan] [~sseth] [~gopalv]
[jira] [Commented] (TEZ-2850) Tez MergeManager OOM for small Map Outputs
[ https://issues.apache.org/jira/browse/TEZ-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903498#comment-14903498 ]

Saikat commented on TEZ-2850:

adding [~jeagles] [~rohini] for watch
[jira] [Commented] (TEZ-2850) Tez MergeManager OOM for small Map Outputs
[ https://issues.apache.org/jira/browse/TEZ-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903497#comment-14903497 ]

Saikat commented on TEZ-2850:

[~hitesh] I was going through Hadoop's IFileInputStream implementation (hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/IFileInputStream.java) and found that this buffer[4096] is not present in Hadoop, only in Tez.

I submitted a tentative patch in which the Tez IFileInputStream behaves exactly like Hadoop's.

Can you please shed some light on what this added buffer does?
[jira] [Commented] (TEZ-2850) Tez MergeManager OOM for small Map Outputs
[ https://issues.apache.org/jira/browse/TEZ-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903481#comment-14903481 ]

Saikat commented on TEZ-2850:

This is a unique scenario that we faced while running a Tez job.

A reducer vertex task fetches around 200000 map outputs, each of around ~100 odd bytes. So the total map output size is around 200000 * 100 ~ 20MB.

The MergeManager has a merge threshold check: if committed memory crosses this threshold, the in-memory merger is triggered and spills the fetched in-memory map outputs to disk to free up memory.

In our scenario, mergethreshold (~500MB) >> commitMemory (~20MB), so the in-memory merger never gets triggered.

Finally, when finalMerge() is called in close(), MergeManager calls createInMemorySegments() to do the final merge. Here, when Tez creates an IFileInputStream object for the InMemoryReader, the IFileInputStream allocates a buffer of size 4096 (hard-coded).

Thus the total size of a single in-memory segment comes to around 5KB, even though the data in the segment is only on the order of 100 bytes.

So, for 200000 map outputs, the total size is 200000 * 5000 ~ 1GB, which causes an OOM.

Attached is a snapshot of the heap dump which shows this scenario.
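The totals in this report can be checked with a short calculation. The sketch assumes roughly 200000 map outputs, which is the count at which the report's ~20MB payload and ~1GB retained size both work out from ~100 bytes of data and ~5000 bytes of footprint per segment. The class and method names are hypothetical and not part of Tez.

```java
// Hypothetical helper, not part of Tez: compares real payload with retained heap.
final class SegmentOverheadEstimate {
    /** Bytes of actual map-output data held in memory. */
    static long payloadBytes(long mapOutputs, long bytesPerOutput) {
        return mapOutputs * bytesPerOutput;
    }

    /** Bytes retained when every in-memory segment carries a fixed footprint. */
    static long retainedBytes(long mapOutputs, long bytesPerSegment) {
        return mapOutputs * bytesPerSegment;
    }

    public static void main(String[] args) {
        long outputs = 200_000L; // assumed count, see the note above
        System.out.println("payload bytes:  " + payloadBytes(outputs, 100));
        System.out.println("retained bytes: " + retainedBytes(outputs, 5_000));
    }
}
```

The retained size is 50 times the payload, which is why a memory threshold that only counts the ~20MB of committed data never fires before the heap is exhausted.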