[jira] [Commented] (HDFS-4940) namenode OOMs under Bigtop's TestCLI
[ https://issues.apache.org/jira/browse/HDFS-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13751900#comment-13751900 ] Roman Shaposhnik commented on HDFS-4940: @arun any reason not to close this? I don't think I can repro it anymore. > namenode OOMs under Bigtop's TestCLI > > > Key: HDFS-4940 > URL: https://issues.apache.org/jira/browse/HDFS-4940 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Roman Shaposhnik >Assignee: Roman Shaposhnik >Priority: Blocker > Fix For: 2.3.0 > > > Bigtop's TestCLI when executed against Hadoop 2.1.0 seems to make it OOM > quite reliably regardless of the heap size settings. I'm attaching a heap > dump URL. Alliteratively anybody can just take Bigtop's tests, compiled them > against Hadoop 2.1.0 bits and try to reproduce it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4940) namenode OOMs under Bigtop's TestCLI
[ https://issues.apache.org/jira/browse/HDFS-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730221#comment-13730221 ] Roman Shaposhnik commented on HDFS-4940: I wasn't able to reproduce it in my 2.1.0-beta RC testing. I suggest we close this one as non-reproducible. Objections? > namenode OOMs under Bigtop's TestCLI > > > Key: HDFS-4940 > URL: https://issues.apache.org/jira/browse/HDFS-4940 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Roman Shaposhnik >Assignee: Roman Shaposhnik >Priority: Blocker > Fix For: 2.1.0-beta > > > Bigtop's TestCLI when executed against Hadoop 2.1.0 seems to make it OOM > quite reliably regardless of the heap size settings. I'm attaching a heap > dump URL. Alliteratively anybody can just take Bigtop's tests, compiled them > against Hadoop 2.1.0 bits and try to reproduce it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4940) namenode OOMs under Bigtop's TestCLI
[ https://issues.apache.org/jira/browse/HDFS-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724823#comment-13724823 ] Roman Shaposhnik commented on HDFS-4940: Sorry just got back from vacation -- I'm currently building/running tests on RC1 in Bigtop -- will update you on Wed. > namenode OOMs under Bigtop's TestCLI > > > Key: HDFS-4940 > URL: https://issues.apache.org/jira/browse/HDFS-4940 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Roman Shaposhnik >Assignee: Roman Shaposhnik >Priority: Blocker > Fix For: 2.1.0-beta > > > Bigtop's TestCLI when executed against Hadoop 2.1.0 seems to make it OOM > quite reliably regardless of the heap size settings. I'm attaching a heap > dump URL. Alliteratively anybody can just take Bigtop's tests, compiled them > against Hadoop 2.1.0 bits and try to reproduce it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4940) namenode OOMs under Bigtop's TestCLI
[ https://issues.apache.org/jira/browse/HDFS-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723282#comment-13723282 ] Arun C Murthy commented on HDFS-4940: - [~rvs] Did you get a chance to look at this? I'm inclined to push this to a 2.1.1-beta to unblock 2.1.0-beta since it'll be good to get f/b on APIs etc. while we investigate this further. Makes sense? Thanks. > namenode OOMs under Bigtop's TestCLI > > > Key: HDFS-4940 > URL: https://issues.apache.org/jira/browse/HDFS-4940 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Roman Shaposhnik >Assignee: Roman Shaposhnik >Priority: Blocker > Fix For: 2.1.0-beta > > > Bigtop's TestCLI when executed against Hadoop 2.1.0 seems to make it OOM > quite reliably regardless of the heap size settings. I'm attaching a heap > dump URL. Alliteratively anybody can just take Bigtop's tests, compiled them > against Hadoop 2.1.0 bits and try to reproduce it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4940) namenode OOMs under Bigtop's TestCLI
[ https://issues.apache.org/jira/browse/HDFS-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721096#comment-13721096 ] Colin Patrick McCabe commented on HDFS-4940: HADOOP-9676 was not the root cause. It was just a secondary issue that was making debugging more difficult. We are still waiting on more data (such as logs) to help understand why BigTop's TestCLI is failing. Roman will be back from vacation next week and I believe he said "I'll totally pick this JIRA back up once I get back." So I will assign this to him. > namenode OOMs under Bigtop's TestCLI > > > Key: HDFS-4940 > URL: https://issues.apache.org/jira/browse/HDFS-4940 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Roman Shaposhnik >Priority: Blocker > Fix For: 2.1.0-beta > > > Bigtop's TestCLI when executed against Hadoop 2.1.0 seems to make it OOM > quite reliably regardless of the heap size settings. I'm attaching a heap > dump URL. Alliteratively anybody can just take Bigtop's tests, compiled them > against Hadoop 2.1.0 bits and try to reproduce it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4940) namenode OOMs under Bigtop's TestCLI
[ https://issues.apache.org/jira/browse/HDFS-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720963#comment-13720963 ] Arun C Murthy commented on HDFS-4940: - HADOOP-9676 is a long standing bug - how is that Bigtop never hit it before? What am I missing here? > namenode OOMs under Bigtop's TestCLI > > > Key: HDFS-4940 > URL: https://issues.apache.org/jira/browse/HDFS-4940 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Roman Shaposhnik >Priority: Blocker > Fix For: 2.1.0-beta > > > Bigtop's TestCLI when executed against Hadoop 2.1.0 seems to make it OOM > quite reliably regardless of the heap size settings. I'm attaching a heap > dump URL. Alliteratively anybody can just take Bigtop's tests, compiled them > against Hadoop 2.1.0 bits and try to reproduce it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4940) namenode OOMs under Bigtop's TestCLI
[ https://issues.apache.org/jira/browse/HDFS-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13701800#comment-13701800 ] Roman Shaposhnik commented on HDFS-4940: [~sureshms] Suresh, sorry for the belated reply -- I'm actually on vacation till last week of July. While I'm away you can just run a Bigtop's TestCLI against the latest 2.1-based bits. Drop an email on Bigtop's ML and I'm sure folks will be more than happy to help with the details. I'll totally pick this JIRA back up once I get back. > namenode OOMs under Bigtop's TestCLI > > > Key: HDFS-4940 > URL: https://issues.apache.org/jira/browse/HDFS-4940 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Roman Shaposhnik >Priority: Blocker > Fix For: 2.1.0-beta > > > Bigtop's TestCLI when executed against Hadoop 2.1.0 seems to make it OOM > quite reliably regardless of the heap size settings. I'm attaching a heap > dump URL. Alliteratively anybody can just take Bigtop's tests, compiled them > against Hadoop 2.1.0 bits and try to reproduce it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4940) namenode OOMs under Bigtop's TestCLI
[ https://issues.apache.org/jira/browse/HDFS-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699562#comment-13699562 ] Suresh Srinivas commented on HDFS-4940: --- [~rvs] Can you please post the logs so that we can debug what causes buffer growth? > namenode OOMs under Bigtop's TestCLI > > > Key: HDFS-4940 > URL: https://issues.apache.org/jira/browse/HDFS-4940 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Roman Shaposhnik >Priority: Blocker > Fix For: 2.1.0-beta > > > Bigtop's TestCLI when executed against Hadoop 2.1.0 seems to make it OOM > quite reliably regardless of the heap size settings. I'm attaching a heap > dump URL. Alliteratively anybody can just take Bigtop's tests, compiled them > against Hadoop 2.1.0 bits and try to reproduce it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4940) namenode OOMs under Bigtop's TestCLI
[ https://issues.apache.org/jira/browse/HDFS-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13697533#comment-13697533 ] Suresh Srinivas commented on HDFS-4940: --- [~rvs] Thanks for the update. I am really interested in seeing what is causing the memory growth. Namenode logs with HADOOP-9676 will help understand the issue. > namenode OOMs under Bigtop's TestCLI > > > Key: HDFS-4940 > URL: https://issues.apache.org/jira/browse/HDFS-4940 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Roman Shaposhnik >Priority: Blocker > Fix For: 2.1.0-beta > > > Bigtop's TestCLI when executed against Hadoop 2.1.0 seems to make it OOM > quite reliably regardless of the heap size settings. I'm attaching a heap > dump URL. Alliteratively anybody can just take Bigtop's tests, compiled them > against Hadoop 2.1.0 bits and try to reproduce it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4940) namenode OOMs under Bigtop's TestCLI
[ https://issues.apache.org/jira/browse/HDFS-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13697371#comment-13697371 ] Roman Shaposhnik commented on HDFS-4940: HADOOP-9676 now allows to isolated this down to a few tests. I'll keep you guys posted. > namenode OOMs under Bigtop's TestCLI > > > Key: HDFS-4940 > URL: https://issues.apache.org/jira/browse/HDFS-4940 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Roman Shaposhnik >Priority: Blocker > Fix For: 2.1.0-beta > > > Bigtop's TestCLI when executed against Hadoop 2.1.0 seems to make it OOM > quite reliably regardless of the heap size settings. I'm attaching a heap > dump URL. Alliteratively anybody can just take Bigtop's tests, compiled them > against Hadoop 2.1.0 bits and try to reproduce it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4940) namenode OOMs under Bigtop's TestCLI
[ https://issues.apache.org/jira/browse/HDFS-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13697268#comment-13697268 ] Suresh Srinivas commented on HDFS-4940: --- bq. Have you checked out the heap dump? yes. I agree with your assessment based on that alone. bq. I'm still not sure how the test is causing this problem It would be good to get bigtop results for this to understand what causes this. > namenode OOMs under Bigtop's TestCLI > > > Key: HDFS-4940 > URL: https://issues.apache.org/jira/browse/HDFS-4940 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Roman Shaposhnik >Priority: Blocker > Fix For: 2.1.0-beta > > > Bigtop's TestCLI when executed against Hadoop 2.1.0 seems to make it OOM > quite reliably regardless of the heap size settings. I'm attaching a heap > dump URL. Alliteratively anybody can just take Bigtop's tests, compiled them > against Hadoop 2.1.0 bits and try to reproduce it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4940) namenode OOMs under Bigtop's TestCLI
[ https://issues.apache.org/jira/browse/HDFS-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13695846#comment-13695846 ] Colin Patrick McCabe commented on HDFS-4940: Have you checked out the heap dump? I looked at it in Eclipse Memory Analyzer and that's how I found the giant buffer inside {{org.apache.hadoop.ipc.Server$Connection}}. Unless the heap dump is wrong, it really does look like this is where the memory is going. I'm still not sure how the test is causing this problem-- hopefully the patch from HADOOP-9676 will make this more reproducible (apparently it formerly didn't reproduce all the time) > namenode OOMs under Bigtop's TestCLI > > > Key: HDFS-4940 > URL: https://issues.apache.org/jira/browse/HDFS-4940 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Roman Shaposhnik >Priority: Blocker > Fix For: 2.1.0-beta > > > Bigtop's TestCLI when executed against Hadoop 2.1.0 seems to make it OOM > quite reliably regardless of the heap size settings. I'm attaching a heap > dump URL. Alliteratively anybody can just take Bigtop's tests, compiled them > against Hadoop 2.1.0 bits and try to reproduce it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4940) namenode OOMs under Bigtop's TestCLI
[ https://issues.apache.org/jira/browse/HDFS-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13695840#comment-13695840 ] Roman Shaposhnik commented on HDFS-4940: [~sureshms] yes NN still OOMs -- the heapdump I provided is from the NN, not from the client. Client is now fine. I'm actually rebuilding the 2.1 RC with the patch from HADOOP-9676 to try and see what leads to the gigantic RPC call. > namenode OOMs under Bigtop's TestCLI > > > Key: HDFS-4940 > URL: https://issues.apache.org/jira/browse/HDFS-4940 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Roman Shaposhnik >Priority: Blocker > Fix For: 2.1.0-beta > > > Bigtop's TestCLI when executed against Hadoop 2.1.0 seems to make it OOM > quite reliably regardless of the heap size settings. I'm attaching a heap > dump URL. Alliteratively anybody can just take Bigtop's tests, compiled them > against Hadoop 2.1.0 bits and try to reproduce it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4940) namenode OOMs under Bigtop's TestCLI
[ https://issues.apache.org/jira/browse/HDFS-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13695829#comment-13695829 ] Suresh Srinivas commented on HDFS-4940: --- [~rvs], we had exchange few emails about this. With Xmx == Xms, do you still see namenode running OOME? Also I had suggested setting bigger PermSize on the client side, did you get a chance to test that? bq. so I think the issue here is that the RPC layer reads 4 bytes from the client, and allocates a buffer of that size, no matter how big it is. While it is possible, the likelihood of some request sending incorrect size in these tests is probably low. I am not sure if this is the cause. > namenode OOMs under Bigtop's TestCLI > > > Key: HDFS-4940 > URL: https://issues.apache.org/jira/browse/HDFS-4940 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Roman Shaposhnik >Priority: Blocker > Fix For: 2.1.0-beta > > > Bigtop's TestCLI when executed against Hadoop 2.1.0 seems to make it OOM > quite reliably regardless of the heap size settings. I'm attaching a heap > dump URL. Alliteratively anybody can just take Bigtop's tests, compiled them > against Hadoop 2.1.0 bits and try to reproduce it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4940) namenode OOMs under Bigtop's TestCLI
[ https://issues.apache.org/jira/browse/HDFS-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13695747#comment-13695747 ] Colin Patrick McCabe commented on HDFS-4940: so I think the issue here is that the RPC layer reads 4 bytes from the client, and allocates a buffer of that size, no matter how big it is. Code here: {code} dataLength = dataLengthBuffer.getInt(); if ((dataLength == Client.PING_CALL_ID) && (!useWrap)) { // covers the !useSasl too dataLengthBuffer.clear(); return 0; // ping message } if (dataLength < 0) { LOG.warn("Unexpected data length " + dataLength + "!! from " + getHostAddress()); } data = ByteBuffer.allocate(dataLength); {code} It seems silly to allocate such large RPC buffers. It allows clients to bring down the NN with just one or two RPCs. Why don't we make the maximum RPC size configurable, and perhaps default to 64 MB or something? Even a block report of a few million blocks should fit in that size. > namenode OOMs under Bigtop's TestCLI > > > Key: HDFS-4940 > URL: https://issues.apache.org/jira/browse/HDFS-4940 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Roman Shaposhnik >Priority: Blocker > Fix For: 2.1.0-beta > > > Bigtop's TestCLI when executed against Hadoop 2.1.0 seems to make it OOM > quite reliably regardless of the heap size settings. I'm attaching a heap > dump URL. Alliteratively anybody can just take Bigtop's tests, compiled them > against Hadoop 2.1.0 bits and try to reproduce it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4940) namenode OOMs under Bigtop's TestCLI
[ https://issues.apache.org/jira/browse/HDFS-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13695715#comment-13695715 ] Colin Patrick McCabe commented on HDFS-4940: You've got 1 GB of data being held in the 'data' element of a {{org.apache.hadoop.ipc.Server$Connection}} object. It looks like it's in the buffer inside the connection. > namenode OOMs under Bigtop's TestCLI > > > Key: HDFS-4940 > URL: https://issues.apache.org/jira/browse/HDFS-4940 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Roman Shaposhnik >Priority: Blocker > Fix For: 2.1.0-beta > > > Bigtop's TestCLI when executed against Hadoop 2.1.0 seems to make it OOM > quite reliably regardless of the heap size settings. I'm attaching a heap > dump URL. Alliteratively anybody can just take Bigtop's tests, compiled them > against Hadoop 2.1.0 bits and try to reproduce it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-4940) namenode OOMs under Bigtop's TestCLI
[ https://issues.apache.org/jira/browse/HDFS-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694846#comment-13694846 ] Roman Shaposhnik commented on HDFS-4940: Here's where to get the heap dump: http://people.apache.org/~rvs/jira/apache/HDFS-4940/java_pid29413.hprof.bz2 > namenode OOMs under Bigtop's TestCLI > > > Key: HDFS-4940 > URL: https://issues.apache.org/jira/browse/HDFS-4940 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Roman Shaposhnik >Priority: Blocker > Fix For: 2.1.0-beta > > > Bigtop's TestCLI when executed against Hadoop 2.1.0 seems to make it OOM > quite reliably regardless of the heap size settings. I'm attaching a heap > dump URL. Alliteratively anybody can just take Bigtop's tests, compiled them > against Hadoop 2.1.0 bits and try to reproduce it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira