[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable
[ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13083529#comment-13083529 ] Hudson commented on MAPREDUCE-2489: --- Integrated in Hadoop-Common-trunk-Commit #728 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/728/]) MAPREDUCE-2489. Jobsplits with random hostnames can make the queue unusable (jeffrey naisbit via mahadev) mahadev : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1156821 Files : * /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/JobTracker.java * /hadoop/common/trunk/mapreduce/CHANGES.txt * /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/JobInProgress.java Jobsplits with random hostnames can make the queue unusable --- Key: MAPREDUCE-2489 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.20.205.0, 0.23.0 Reporter: Jeffrey Naisbitt Assignee: Jeffrey Naisbitt Fix For: 0.20.205.0, 0.23.0 Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s-v4.patch, MAPREDUCE-2489-0.20s-v5.patch, MAPREDUCE-2489-0.20s-v6.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred-v5.patch, MAPREDUCE-2489-mapred-v6.patch, MAPREDUCE-2489-mapred-v7.patch, MAPREDUCE-2489-mapred.patch We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names. This caused a major slowdown for the JobTracker. We should prevent invalid InputSplit hostnames from affecting everyone else. I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise). We could also fail the job after a certain number of failures in the resolve. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable
[ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13083809#comment-13083809 ] Hudson commented on MAPREDUCE-2489: --- Integrated in Hadoop-Mapreduce-trunk-Commit #761 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/761/]) MAPREDUCE-2489. Jobsplits with random hostnames can make the queue unusable (jeffrey naisbit via mahadev) mahadev : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1156821 Files : * /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/JobTracker.java * /hadoop/common/trunk/mapreduce/CHANGES.txt * /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/JobInProgress.java Jobsplits with random hostnames can make the queue unusable --- Key: MAPREDUCE-2489 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.20.205.0, 0.23.0 Reporter: Jeffrey Naisbitt Assignee: Jeffrey Naisbitt Fix For: 0.20.205.0, 0.23.0 Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s-v4.patch, MAPREDUCE-2489-0.20s-v5.patch, MAPREDUCE-2489-0.20s-v6.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred-v5.patch, MAPREDUCE-2489-mapred-v6.patch, MAPREDUCE-2489-mapred-v7.patch, MAPREDUCE-2489-mapred.patch We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names. This caused a major slowdown for the JobTracker. We should prevent invalid InputSplit hostnames from affecting everyone else. I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise). We could also fail the job after a certain number of failures in the resolve. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable
[ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13083814#comment-13083814 ] Hudson commented on MAPREDUCE-2489: --- Integrated in Hadoop-Mapreduce-trunk #752 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/752/]) MAPREDUCE-2489. Jobsplits with random hostnames can make the queue unusable (jeffrey naisbit via mahadev) mahadev : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1156821 Files : * /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/JobTracker.java * /hadoop/common/trunk/mapreduce/CHANGES.txt * /hadoop/common/trunk/mapreduce/src/java/org/apache/hadoop/mapred/JobInProgress.java Jobsplits with random hostnames can make the queue unusable --- Key: MAPREDUCE-2489 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.20.205.0, 0.23.0 Reporter: Jeffrey Naisbitt Assignee: Jeffrey Naisbitt Fix For: 0.20.205.0, 0.23.0 Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s-v4.patch, MAPREDUCE-2489-0.20s-v5.patch, MAPREDUCE-2489-0.20s-v6.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred-v5.patch, MAPREDUCE-2489-mapred-v6.patch, MAPREDUCE-2489-mapred-v7.patch, MAPREDUCE-2489-mapred.patch We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names. This caused a major slowdown for the JobTracker. We should prevent invalid InputSplit hostnames from affecting everyone else. I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise). We could also fail the job after a certain number of failures in the resolve. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable
[ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13081917#comment-13081917 ] Jeffrey Naisbitt commented on MAPREDUCE-2489: - The only test failures on trunk are unrelated to this patch and fail with a clean checkout of trunk as well. trunk test-patch results (since this patch will fail until HADOOP-7499 has been committed): [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] -1 system test framework. The patch failed system test framework compile. The extra findbugs warning is unrelated to this bug (Incorrect lazy initialization of static field org.apache.hadoop.mapred.TaskLog.localFS in org.apache.hadoop.mapred.TaskLog.writeToIndexFile(String, boolean)) The system test framework is because HADOOP-7499 has not been committed yet. Jobsplits with random hostnames can make the queue unusable --- Key: MAPREDUCE-2489 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.20.205.0, 0.23.0 Reporter: Jeffrey Naisbitt Assignee: Jeffrey Naisbitt Fix For: 0.20.205.0, 0.23.0 Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s-v4.patch, MAPREDUCE-2489-0.20s-v5.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred-v5.patch, MAPREDUCE-2489-mapred-v6.patch, MAPREDUCE-2489-mapred.patch We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names. This caused a major slowdown for the JobTracker. We should prevent invalid InputSplit hostnames from affecting everyone else. I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise). We could also fail the job after a certain number of failures in the resolve. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable
[ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13080438#comment-13080438 ] Mahadev konar commented on MAPREDUCE-2489: -- +1 for 0.20S patch. I have committed that to the branch. Thanks Jeffrey. I'll wait for the trunk common and mapred patches before closing this jira. Jobsplits with random hostnames can make the queue unusable --- Key: MAPREDUCE-2489 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.20.205.0, 0.23.0 Reporter: Jeffrey Naisbitt Assignee: Jeffrey Naisbitt Fix For: 0.20.205.0, 0.23.0 Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s-v4.patch, MAPREDUCE-2489-0.20s-v5.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred-v5.patch, MAPREDUCE-2489-mapred.patch We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names. This caused a major slowdown for the JobTracker. We should prevent invalid InputSplit hostnames from affecting everyone else. I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise). We could also fail the job after a certain number of failures in the resolve. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable
[ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13079472#comment-13079472 ] Jeffrey Naisbitt commented on MAPREDUCE-2489: - I meant 'ant test' instead of 'mvn test'. I was running with ant-1.8.2, which apparently caused the test failures. All of the tests pass with ant-1.7.1 Jobsplits with random hostnames can make the queue unusable --- Key: MAPREDUCE-2489 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.20.205.0, 0.23.0 Reporter: Jeffrey Naisbitt Assignee: Jeffrey Naisbitt Fix For: 0.20.205.0, 0.23.0 Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s-v4.patch, MAPREDUCE-2489-0.20s-v5.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred-v5.patch, MAPREDUCE-2489-mapred.patch We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names. This caused a major slowdown for the JobTracker. We should prevent invalid InputSplit hostnames from affecting everyone else. I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise). We could also fail the job after a certain number of failures in the resolve. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable
[ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13079540#comment-13079540 ] Jeffrey Naisbitt commented on MAPREDUCE-2489: - To clarify, all the tests pass for the 0.20s patch. I am still looking at the patch for trunk. Jobsplits with random hostnames can make the queue unusable --- Key: MAPREDUCE-2489 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.20.205.0, 0.23.0 Reporter: Jeffrey Naisbitt Assignee: Jeffrey Naisbitt Fix For: 0.20.205.0, 0.23.0 Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s-v4.patch, MAPREDUCE-2489-0.20s-v5.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred-v5.patch, MAPREDUCE-2489-mapred.patch We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names. This caused a major slowdown for the JobTracker. We should prevent invalid InputSplit hostnames from affecting everyone else. I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise). We could also fail the job after a certain number of failures in the resolve. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable
[ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13076195#comment-13076195 ] Jeffrey Naisbitt commented on MAPREDUCE-2489: - I agree. It would definitely fit better in NetUtils. I'll post new patches soon. Jobsplits with random hostnames can make the queue unusable --- Key: MAPREDUCE-2489 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.20.205.0, 0.23.0 Reporter: Jeffrey Naisbitt Assignee: Jeffrey Naisbitt Fix For: 0.20.205.0, 0.23.0 Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred.patch We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names. This caused a major slowdown for the JobTracker. We should prevent invalid InputSplit hostnames from affecting everyone else. I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise). We could also fail the job after a certain number of failures in the resolve. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable
[ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13076283#comment-13076283 ] Jeffrey Naisbitt commented on MAPREDUCE-2489: - 20-security branch test-patch results: [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. Jobsplits with random hostnames can make the queue unusable --- Key: MAPREDUCE-2489 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.20.205.0, 0.23.0 Reporter: Jeffrey Naisbitt Assignee: Jeffrey Naisbitt Fix For: 0.20.205.0, 0.23.0 Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s-v4.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred.patch We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names. This caused a major slowdown for the JobTracker. We should prevent invalid InputSplit hostnames from affecting everyone else. I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise). We could also fail the job after a certain number of failures in the resolve. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable
[ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073838#comment-13073838 ] Mahadev konar commented on MAPREDUCE-2489: -- Jeffrey, One minor nit, The method: {code} static void verifyHostnames(String[] names) throws UnknownHostException { {code} does not seem appropriate for JobInProgress class. It needs to be moved out to some helper class. NetUtils seems more appropriate for this helper method. What do you think? Jobsplits with random hostnames can make the queue unusable --- Key: MAPREDUCE-2489 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.20.205.0, 0.23.0 Reporter: Jeffrey Naisbitt Assignee: Jeffrey Naisbitt Fix For: 0.20.205.0, 0.23.0 Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred.patch We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names. This caused a major slowdown for the JobTracker. We should prevent invalid InputSplit hostnames from affecting everyone else. I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise). We could also fail the job after a certain number of failures in the resolve. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable
[ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13070456#comment-13070456 ] Jeffrey Naisbitt commented on MAPREDUCE-2489: - Mahadev, There are two parts to this patch. The most important is the use of the new resolveValidHosts function and throwing the UnknownHostException up the chain. That part ensures that the hostnames are actually valid and is needed to resolve the problem this Jira was filed for. The second part (that you describe above) is meant to only be an early sanity check. All it is trying to do is ensure that the hostnames appear to be valid. With this, I'm just trying to check the hostname string itself to see that it contains valid characters and such so we can fail early on if the hostname string contains garbage (like in the original case this Jira was filed for). The URI object seemed the easiest and most standard way I could find to do this. The URI object creation will fail, or the uri.getHost() method will return null if the string passed in doesn't match their regular expressions for what a URI should look like (which includes checking the hostname portion). The URI object requires a schema (http://, ftp://, etc.), so I add one on if it doesn't already have one in the string. Again, this is only really meant to catch the garbage string cases, and it's not meant to be a perfect check that will catch all hostname problems (those are caught by the first part of the patch mentioned above). Does that make sense? If you have other questions or suggestions, please let me know :) Thanks for looking at it! Jobsplits with random hostnames can make the queue unusable --- Key: MAPREDUCE-2489 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.20.205.0, 0.23.0 Reporter: Jeffrey Naisbitt Assignee: Jeffrey Naisbitt Fix For: 0.20.205.0, 0.23.0 Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred.patch We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names. This caused a major slowdown for the JobTracker. We should prevent invalid InputSplit hostnames from affecting everyone else. I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise). We could also fail the job after a certain number of failures in the resolve. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable
[ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13070318#comment-13070318 ] Mahadev konar commented on MAPREDUCE-2489: -- Jeffrey, Sorry, I am a little unclear on what the patch is doing. Can you please specify what you are trying to achieve with the patch? The patch seems to create a URI with hostname and checking if its a valid URI or not? How is that verifying if a hostname is valid or not? Jobsplits with random hostnames can make the queue unusable --- Key: MAPREDUCE-2489 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.20.205.0, 0.23.0 Reporter: Jeffrey Naisbitt Assignee: Jeffrey Naisbitt Fix For: 0.20.205.0, 0.23.0 Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred.patch We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names. This caused a major slowdown for the JobTracker. We should prevent invalid InputSplit hostnames from affecting everyone else. I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise). We could also fail the job after a certain number of failures in the resolve. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable
[ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13065433#comment-13065433 ] Hadoop QA commented on MAPREDUCE-2489: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12486481/MAPREDUCE-2489-mapred-v4.patch against trunk revision 1146517. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The patch appears to cause tar ant target to fail. -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: -1 contrib tests. The patch failed contrib unit tests. -1 system test framework. The patch failed system test framework compile. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/467//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/467//console This message is automatically generated. Jobsplits with random hostnames can make the queue unusable --- Key: MAPREDUCE-2489 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.20.205.0, 0.23.0 Reporter: Jeffrey Naisbitt Assignee: Jeffrey Naisbitt Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred.patch We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names. This caused a major slowdown for the JobTracker. We should prevent invalid InputSplit hostnames from affecting everyone else. I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise). We could also fail the job after a certain number of failures in the resolve. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable
[ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053997#comment-13053997 ] Jeffrey Naisbitt commented on MAPREDUCE-2489: - test-patch results for the 0.20s patch: [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 9 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. Jobsplits with random hostnames can make the queue unusable --- Key: MAPREDUCE-2489 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.20.205.0, 0.23.0 Reporter: Jeffrey Naisbitt Assignee: Jeffrey Naisbitt Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred.patch We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names. This caused a major slowdown for the JobTracker. We should prevent invalid InputSplit hostnames from affecting everyone else. I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise). We could also fail the job after a certain number of failures in the resolve. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable
[ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054000#comment-13054000 ] Hadoop QA commented on MAPREDUCE-2489: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12483623/MAPREDUCE-2489-mapred-v3.patch against trunk revision 1138301. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The patch appears to cause tar ant target to fail. -1 findbugs. The patch appears to cause Findbugs (version 1.3.9) to fail. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these core unit tests: -1 contrib tests. The patch failed contrib unit tests. -1 system test framework. The patch failed system test framework compile. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/413//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/413//console This message is automatically generated. Jobsplits with random hostnames can make the queue unusable --- Key: MAPREDUCE-2489 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.20.205.0, 0.23.0 Reporter: Jeffrey Naisbitt Assignee: Jeffrey Naisbitt Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred.patch We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names. This caused a major slowdown for the JobTracker. We should prevent invalid InputSplit hostnames from affecting everyone else. I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise). We could also fail the job after a certain number of failures in the resolve. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable
[ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054002#comment-13054002 ] Jeffrey Naisbitt commented on MAPREDUCE-2489: - Again, this patch will fail until HADOOP-7314 has been committed. Jobsplits with random hostnames can make the queue unusable --- Key: MAPREDUCE-2489 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.20.205.0, 0.23.0 Reporter: Jeffrey Naisbitt Assignee: Jeffrey Naisbitt Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred.patch We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names. This caused a major slowdown for the JobTracker. We should prevent invalid InputSplit hostnames from affecting everyone else. I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise). We could also fail the job after a certain number of failures in the resolve. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable
[ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049876#comment-13049876 ] Hadoop QA commented on MAPREDUCE-2489: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12482679/MAPREDUCE-2489-0.20s.patch against trunk revision 1136000. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/397//console This message is automatically generated. Jobsplits with random hostnames can make the queue unusable --- Key: MAPREDUCE-2489 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.20.205.0, 0.23.0 Reporter: Jeffrey Naisbitt Assignee: Jeffrey Naisbitt Attachments: MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred.patch We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names. This caused a major slowdown for the JobTracker. We should prevent invalid InputSplit hostnames from affecting everyone else. I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise). We could also fail the job after a certain number of failures in the resolve. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable
[ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049941#comment-13049941 ] Jeffrey Naisbitt commented on MAPREDUCE-2489: - Test-patch results for 20-security patch: [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 6 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. Jobsplits with random hostnames can make the queue unusable --- Key: MAPREDUCE-2489 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.20.205.0, 0.23.0 Reporter: Jeffrey Naisbitt Assignee: Jeffrey Naisbitt Attachments: MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred.patch We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names. This caused a major slowdown for the JobTracker. We should prevent invalid InputSplit hostnames from affecting everyone else. I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise). We could also fail the job after a certain number of failures in the resolve. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable
[ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13038008#comment-13038008 ] jirapos...@reviews.apache.org commented on MAPREDUCE-2489: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/776/ --- Review request for hadoop-mapreduce. Summary --- We saw an issue where a custom InputSplit was returning invalid hostnames (non-repeating) for the splits that were then causing the JobTracker to attempt to excessively resolve host names. This caused a major slowdown for the JobTracker. We should prevent invalid InputSplit hostnames from affecting everyone else. I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise). We could also fail the job after a certain number of failures in the resolve. NOTE: This requires the changes in HADOOP-7314 This addresses bug MAPREDUCE-2489. https://issues.apache.org/jira/browse/MAPREDUCE-2489 Diffs - trunk/ivy.xml 1125074 trunk/ivy/libraries.properties 1125074 trunk/src/contrib/mumak/src/java/org/apache/hadoop/mapred/SimulatorJobTracker.java 1125074 trunk/src/java/org/apache/hadoop/mapred/JobInProgress.java 1125074 trunk/src/java/org/apache/hadoop/mapred/JobTracker.java 1125074 Diff: https://reviews.apache.org/r/776/diff Testing --- Thanks, Jeffrey Jobsplits with random hostnames can make the queue unusable --- Key: MAPREDUCE-2489 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Reporter: Jeffrey Naisbitt Assignee: Jeffrey Naisbitt Attachments: MAPREDUCE-2489-mapred.patch We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names. This caused a major slowdown for the JobTracker. We should prevent invalid InputSplit hostnames from affecting everyone else. I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise). We could also fail the job after a certain number of failures in the resolve. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable
[ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13036847#comment-13036847 ] Jeffrey Naisbitt commented on MAPREDUCE-2489: - @Allen, apologies for not being clear in the description. Also, I expect this patch will fail in the Hudson build until HADOOP-7314 has been committed. Jobsplits with random hostnames can make the queue unusable --- Key: MAPREDUCE-2489 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Reporter: Jeffrey Naisbitt Assignee: Jeffrey Naisbitt Attachments: MAPREDUCE-2489-mapred.patch We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names. This caused a major slowdown for the JobTracker. We should prevent invalid InputSplit hostnames from affecting everyone else. I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise). We could also fail the job after a certain number of failures in the resolve. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable
[ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13033116#comment-13033116 ] Jeffrey Naisbitt commented on MAPREDUCE-2489: - Honestly, I'm not sure what caching was enabled at the time. How would caching have helped in this case though - where we have basically tons of lookups on garbage hostnames? (none of these strings are repeated) Jobsplits with random hostnames can make the queue unusable --- Key: MAPREDUCE-2489 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Reporter: Jeffrey Naisbitt Assignee: Jeffrey Naisbitt We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names. This caused a major slowdown for the JobTracker. We should prevent invalid InputSplit hostnames from affecting everyone else. I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise). We could also fail the job after a certain number of failures in the resolve. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable
[ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13033129#comment-13033129 ] Allen Wittenauer commented on MAPREDUCE-2489: - Ah, ok, the non-repeating hostnames wasn't clear in the description. Jobsplits with random hostnames can make the queue unusable --- Key: MAPREDUCE-2489 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Reporter: Jeffrey Naisbitt Assignee: Jeffrey Naisbitt We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names. This caused a major slowdown for the JobTracker. We should prevent invalid InputSplit hostnames from affecting everyone else. I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise). We could also fail the job after a certain number of failures in the resolve. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable
[ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032738#comment-13032738 ] Allen Wittenauer commented on MAPREDUCE-2489: - I'm assuming no nscd or dns cache was running on the JT? Jobsplits with random hostnames can make the queue unusable --- Key: MAPREDUCE-2489 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Reporter: Jeffrey Naisbitt Assignee: Jeffrey Naisbitt We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names. This caused a major slowdown for the JobTracker. We should prevent invalid InputSplit hostnames from affecting everyone else. I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise). We could also fail the job after a certain number of failures in the resolve. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira