[jira] [Commented] (MAPREDUCE-4488) Port MAPREDUCE-463 (The job setup and cleanup tasks should be optional) to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448597#comment-13448597 ]

Tom White commented on MAPREDUCE-4488:

Arun - the patch correctly allows setup and cleanup to be disabled; the problem is that the locking is incorrect. So that's what we need to fix - or did you have another idea?

Port MAPREDUCE-463 (The job setup and cleanup tasks should be optional) to branch-1

    Key: MAPREDUCE-4488
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-4488
    Project: Hadoop Map/Reduce
    Issue Type: New Feature
    Components: mrv1, performance
    Affects Versions: 1.0.3
    Reporter: Tom White
    Assignee: Tom White
    Attachments: fix-mr-4488.patch, MAPREDUCE-4488.patch

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4488) Port MAPREDUCE-463 (The job setup and cleanup tasks should be optional) to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445707#comment-13445707 ]

Karthik Kambatla commented on MAPREDUCE-4488:

Thanks Todd. I'll run JCarder before and after the fix and report back.
[jira] [Commented] (MAPREDUCE-4488) Port MAPREDUCE-463 (The job setup and cleanup tasks should be optional) to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445967#comment-13445967 ]

Arun C Murthy commented on MAPREDUCE-4488:

Let's revisit the original patch? Tom?
[jira] [Commented] (MAPREDUCE-4488) Port MAPREDUCE-463 (The job setup and cleanup tasks should be optional) to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445979#comment-13445979 ]

Tom White commented on MAPREDUCE-4488:

I agree. I'm going to revert this and MAPREDUCE-4567.
[jira] [Commented] (MAPREDUCE-4488) Port MAPREDUCE-463 (The job setup and cleanup tasks should be optional) to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446029#comment-13446029 ]

Tom White commented on MAPREDUCE-4488:

Karthik - thanks for investigating. Regarding your fix, it would be better to reduce the scope of the lock on the JT to the {{job.initTasks()}} statement. However, even this might be excessively wide, since initTasks() reads input split files, etc. There might be a way of reducing the scope of the synchronization on JobInProgress in initTasks() so that it can take a lock on the JT first, before making the setupComplete() call. But as Arun rightly points out, the locking in the JT is very delicate, so we have to be conservative here; at the very least, having a clean jcarder run would be prudent.
[jira] [Commented] (MAPREDUCE-4488) Port MAPREDUCE-463 (The job setup and cleanup tasks should be optional) to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446065#comment-13446065 ]

Arun C Murthy commented on MAPREDUCE-4488:

The concern I have is that the JT that MAPREDUCE-463 was written against is very different from the current JT. Originally, I did this work for the 2009 terasort record, and it was since ported over to branch-0.21. However, since then the locking in the JT has changed significantly - hence my advice to revisit. Thoughts?
[jira] [Commented] (MAPREDUCE-4488) Port MAPREDUCE-463 (The job setup and cleanup tasks should be optional) to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445401#comment-13445401 ]

Karthik Kambatla commented on MAPREDUCE-4488:

It looks like the change, in particular the implementation of {{JobInProgress#setupComplete()}}, has introduced a race leading to the following deadlock, as noticed in our clusters:

{noformat}
Thread 42 (IPC Server handler 1 on 8021):
  State: BLOCKED
  Blocked count: 203661
  Waited count: 563040
  Blocked on org.apache.hadoop.mapred.JobInProgress@6ab8d396
  Blocked by 243 (pool-7-thread-1)
  Stack:
    org.apache.hadoop.mapred.JobInProgress.runningMaps(JobInProgress.java:884)
    org.apache.hadoop.mapred.JobSchedulable.getRunningTasks(JobSchedulable.java:110)
    org.apache.hadoop.mapred.PoolSchedulable.getRunningTasks(PoolSchedulable.java:132)
    org.apache.hadoop.mapred.FairScheduler.assignTasks(FairScheduler.java:351)
    org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:2935)
    sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    java.lang.reflect.Method.invoke(Method.java:597)
    org.apache.hadoop.ipc.WritableRpcEngine$Server$WritableRpcInvoker.call(WritableRpcEngine.java:474)
    org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
    org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
    org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
    java.security.AccessController.doPrivileged(Native Method)
    javax.security.auth.Subject.doAs(Subject.java:396)
    org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
    org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)

Thread 243 (pool-7-thread-1):
  State: BLOCKED
  Blocked count: 435
  Waited count: 569
  Blocked on org.apache.hadoop.mapred.JobTracker@3cfa54fe
  Blocked by 42 (IPC Server handler 1 on 8021)
  Stack:
    org.apache.hadoop.mapred.JobTracker.getClusterStatus(JobTracker.java:3616)
    org.apache.hadoop.mapred.JobInProgress.jobComplete(JobInProgress.java:2713)
    org.apache.hadoop.mapred.JobInProgress.setupComplete(JobInProgress.java:837)
    org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:790)
    org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3750)
    org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:79)
    java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    java.lang.Thread.run(Thread.java:662)
{noformat}

We should probably revert the commit, and fix it.
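A cycle like the one in the trace above, where each thread is blocked on a monitor the other holds, can also be found programmatically. The following is a minimal sketch, not taken from the patch: the class name, the thread names, and the two plain lock objects standing in for the JobInProgress and JobTracker monitors are all hypothetical. It deliberately provokes the inversion with two daemon threads and asks the JVM to report it through the standard `ThreadMXBean.findMonitorDeadlockedThreads()` call.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical demo class; jobLock and trackerLock merely stand in for the
// JobInProgress and JobTracker monitors seen in the thread dump.
class DeadlockDetectSketch {

    // Starts two daemon threads that take the locks in opposite orders, then
    // polls the JVM until it reports the cycle. Returns the number of threads
    // reported as deadlocked (0 if none were found within about 5 seconds).
    // The two daemon threads stay blocked for the rest of the JVM's life.
    static int provokeAndDetect() {
        final Object jobLock = new Object();
        final Object trackerLock = new Object();
        final CountDownLatch bothHoldFirst = new CountDownLatch(2);

        // Mirrors the init path: job lock first, tracker lock second.
        Thread init = new Thread(() -> {
            synchronized (jobLock) {
                bothHoldFirst.countDown();
                awaitQuietly(bothHoldFirst);
                synchronized (trackerLock) { }
            }
        }, "init-path");

        // Mirrors the heartbeat path: tracker lock first, job lock second.
        Thread heartbeat = new Thread(() -> {
            synchronized (trackerLock) {
                bothHoldFirst.countDown();
                awaitQuietly(bothHoldFirst);
                synchronized (jobLock) { }
            }
        }, "heartbeat-path");

        init.setDaemon(true);
        heartbeat.setDaemon(true);
        init.start();
        heartbeat.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 100; i++) {
            long[] ids = mx.findMonitorDeadlockedThreads(); // null when no cycle exists
            if (ids != null) {
                return ids.length;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 0;
            }
        }
        return 0;
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The latch only makes the interleaving deterministic for the demonstration; in the cluster the same interleaving happened by chance, which is why the problem shows up as a race.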
[jira] [Commented] (MAPREDUCE-4488) Port MAPREDUCE-463 (The job setup and cleanup tasks should be optional) to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445410#comment-13445410 ]

Karthik Kambatla commented on MAPREDUCE-4488:

As the stack trace shows, the deadlock is because of the following:
- {{JobInProgress#jobComplete()}} (while holding the {{JobInProgress}} lock) is blocked on the {{JobTracker}} lock via the call to the synchronized method {{JobTracker#getClusterStatus()}}
- {{FairScheduler}} (while holding the {{JobTracker}} lock by calling {{synchronized heartbeat()}}) tries to acquire the {{JobInProgress}} lock via the call to the synchronized method {{JobInProgress#runningMaps()}}
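The two acquisition paths described above form a lock-order inversion: one path takes the job lock and then wants the tracker lock, the other does the reverse. A common remedy is to make every path take the locks in one agreed order. The sketch below illustrates that idea only; it is not the proposed fix, and the class, the method names, and the two lock objects are hypothetical stand-ins rather than Hadoop code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-ins for the JobTracker and JobInProgress monitors.
class LockOrderSketch {
    static final Object trackerLock = new Object(); // plays the JobTracker monitor
    static final Object jobLock = new Object();     // plays the JobInProgress monitor
    static int completed = 0;                       // guarded by both locks

    // Heartbeat-like path: tracker lock first, then job lock.
    static void heartbeatPath() {
        synchronized (trackerLock) {
            synchronized (jobLock) {
                completed++;
            }
        }
    }

    // Init-like path using the SAME order. The deadlocking variant would take
    // jobLock first and only then ask for trackerLock.
    static void initPath() {
        synchronized (trackerLock) {
            synchronized (jobLock) {
                completed++;
            }
        }
    }

    // Runs both paths concurrently; returns true if both threads finish
    // within the timeout, which they can because no cycle is possible.
    static boolean runBoth(final int iterations) {
        final CountDownLatch done = new CountDownLatch(2);
        Thread a = new Thread(() -> {
            for (int i = 0; i < iterations; i++) heartbeatPath();
            done.countDown();
        });
        Thread b = new Thread(() -> {
            for (int i = 0; i < iterations; i++) initPath();
            done.countDown();
        });
        a.setDaemon(true);
        b.setDaemon(true);
        a.start();
        b.start();
        try {
            return done.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With a single global order there is no way for two threads to each hold the lock the other needs, at the cost of holding the outer lock for longer than may be desirable.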
[jira] [Commented] (MAPREDUCE-4488) Port MAPREDUCE-463 (The job setup and cleanup tasks should be optional) to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445417#comment-13445417 ]

Karthik Kambatla commented on MAPREDUCE-4488:

On examining the code, it appears safe to modify {{JobTracker#getClusterStatus()}} to non-synchronized. All the statements in the method are guarded by a
{code}synchronized (taskTracker) {} {code}
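The reasoning above is that a method whose whole body already sits inside a narrower `synchronized` block does not also need the object's own monitor. The following sketch shows that pattern in isolation, under the assumption that the inner lock really does guard all the state the method touches. `NarrowLockSketch`, its `trackers` map, and its methods are hypothetical and say nothing about whether the change is safe in the real JobTracker.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical class; `trackers` plays the role of the inner guarded structure.
class NarrowLockSketch {
    private final Map<String, Integer> trackers = new HashMap<>();

    void addTracker(String name, int slots) {
        synchronized (trackers) {
            trackers.put(name, slots);
        }
    }

    // Not declared synchronized: only the inner monitor is taken, so a caller
    // never has to wait for the monitor of the enclosing object.
    int totalSlots() {
        synchronized (trackers) {
            int sum = 0;
            for (int s : trackers.values()) {
                sum += s;
            }
            return sum;
        }
    }

    // Holds the outer monitor on the current thread while another thread calls
    // totalSlots(). Returns true if that call completes with the expected sum.
    static boolean readableWhileOuterHeld() {
        final NarrowLockSketch sketch = new NarrowLockSketch();
        sketch.addTracker("tt1", 4);
        final CountDownLatch done = new CountDownLatch(1);
        final int[] result = new int[1];
        Thread reader = new Thread(() -> {
            result[0] = sketch.totalSlots();
            done.countDown();
        });
        reader.setDaemon(true);
        synchronized (sketch) { // stands in for a thread holding the outer lock
            reader.start();
            try {
                return done.await(5, TimeUnit.SECONDS) && result[0] == 4;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

Had `totalSlots()` been declared `synchronized`, the reader thread would have waited for the outer monitor and the check would have timed out.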
[jira] [Commented] (MAPREDUCE-4488) Port MAPREDUCE-463 (The job setup and cleanup tasks should be optional) to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445587#comment-13445587 ]

Karthik Kambatla commented on MAPREDUCE-4488:

As we cannot deterministically validate the lack of deadlocks in a piece of code, I was thinking of the following two options:
- Verify lock ordering: in this case, we can write a test to verify that {{JobTracker#initJob()}} acquires the lock on {{JobTracker}} before acquiring the lock on {{JobInProgress}}. This would prevent future changes to the lock-ordering.
- Run two threads with sleep statements to force a deadlock in most cases. However, it remains a best-effort test.

I am very keen on learning alternate ways of testing deadlocks and which option to prefer.
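The first option, verifying lock ordering, can be approximated with the standard `Thread.holdsLock(Object)` check. What follows is a minimal sketch of that idea and not the test that was proposed for the patch: `LockOrderAssertion`, its two lock objects, and `lockJobChecked` are hypothetical names, and a real test would have to hook the check into the actual initialization path.

```java
// Hypothetical helper; `tracker` and `job` stand in for the JobTracker and
// JobInProgress monitors.
class LockOrderAssertion {
    static final Object tracker = new Object();
    static final Object job = new Object();

    // Takes the job lock, but fails fast if the calling thread does not
    // already hold the tracker lock, i.e. if the agreed order is violated.
    static void lockJobChecked(Runnable body) {
        if (!Thread.holdsLock(tracker)) {
            throw new IllegalStateException("job lock requested without tracker lock");
        }
        synchronized (job) {
            body.run();
        }
    }

    // Correct order: tracker first, then job. The check passes.
    static boolean correctOrderPasses() {
        synchronized (tracker) {
            lockJobChecked(() -> { });
            return true;
        }
    }

    // Wrong order: job requested with no tracker lock held. The check trips.
    static boolean wrongOrderIsRejected() {
        try {
            lockJobChecked(() -> { });
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }
}
```

Unlike the sleep-based option, a check of this kind is deterministic: it fails on the first out-of-order acquisition instead of waiting for an unlucky interleaving.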
[jira] [Commented] (MAPREDUCE-4488) Port MAPREDUCE-463 (The job setup and cleanup tasks should be optional) to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445596#comment-13445596 ]

Arun C Murthy commented on MAPREDUCE-4488:

I'm concerned, let's spend time on this one. JT locking is one of my worst nightmares.
[jira] [Commented] (MAPREDUCE-4488) Port MAPREDUCE-463 (The job setup and cleanup tasks should be optional) to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445600#comment-13445600 ]

Karthik Kambatla commented on MAPREDUCE-4488:

Arun, what do you think of tests verifying lock ordering on all JT methods? We can verify that we hold the JT lock before acquiring any other lock. That way, the JT itself wouldn't be involved in deadlocks?
[jira] [Commented] (MAPREDUCE-4488) Port MAPREDUCE-463 (The job setup and cleanup tasks should be optional) to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445678#comment-13445678 ]

Todd Lipcon commented on MAPREDUCE-4488:

You can use jcarder to check for lock inversions like this. See http://wiki.apache.org/hadoop/HowToUseJCarder for details. I haven't run it on branch-1 for a while but I'd be really surprised if it didn't catch this deadlock.
[jira] [Commented] (MAPREDUCE-4488) Port MAPREDUCE-463 (The job setup and cleanup tasks should be optional) to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428188#comment-13428188 ]

Tom White commented on MAPREDUCE-4488:

Alejandro - the code is from MAPREDUCE-463. Can I make the changes you suggest in another JIRA so that branches 1 and 2 are kept the same?
[jira] [Commented] (MAPREDUCE-4488) Port MAPREDUCE-463 (The job setup and cleanup tasks should be optional) to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428195#comment-13428195 ]

Alejandro Abdelnur commented on MAPREDUCE-4488:

+1
[jira] [Commented] (MAPREDUCE-4488) Port MAPREDUCE-463 (The job setup and cleanup tasks should be optional) to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427785#comment-13427785 ]

Ahmed Radwan commented on MAPREDUCE-4488:

+1 Thanks Tom! On a related note, I think this property needs better documentation, so that (from a user perspective) it is clear when separate setup and cleanup tasks are not needed and it is safe to set it to false.
[jira] [Commented] (MAPREDUCE-4488) Port MAPREDUCE-463 (The job setup and cleanup tasks should be optional) to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427839#comment-13427839 ]

Alejandro Abdelnur commented on MAPREDUCE-4488:

Looks good, some minor comments:
* JobInProgress constructors: is there a need to create a JobContext to get the value of the flag? Why not just do a conf.get()?
* JobInProgress initSetupCleanupTask(): invert the IF condition and do the logic within the IF block; then there is no need for a return call.
* JobInProgress setupComplete(): use an ELSE instead of the return call at the end of the first IF block.