[jira] Created: (MAPREDUCE-1482) Better handling of task diagnostic information stored in the TaskInProgress
Better handling of task diagnostic information stored in the TaskInProgress --- Key: MAPREDUCE-1482 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1482 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Reporter: Amar Kamat Task diagnostic information can be very large at times, eating up the JobTracker's memory. There should be some way to avoid storing large error strings in the JobTracker. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
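One straightforward mitigation, sketched below, is to cap each diagnostic string before it is stored. This is only an illustration of the idea raised in the issue, not the approach the project settled on; the class name and the 2 KB limit are hypothetical.

```java
// Hypothetical helper: cap task diagnostic strings before the
// TaskInProgress keeps them in JobTracker memory. The 2 KB limit is an
// illustrative choice, not a value taken from the issue.
public class DiagnosticsTrimmer {
    static final int MAX_DIAGNOSTIC_LENGTH = 2048;

    /** Returns the diagnostic unchanged if short enough, otherwise a
     *  truncated copy with a marker so operators know data was dropped. */
    public static String trim(String diagnostic) {
        if (diagnostic == null || diagnostic.length() <= MAX_DIAGNOSTIC_LENGTH) {
            return diagnostic;
        }
        return diagnostic.substring(0, MAX_DIAGNOSTIC_LENGTH) + "...[truncated]";
    }
}
```

A fixed cap bounds memory per task attempt regardless of how large the underlying error output is.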
[jira] Updated: (MAPREDUCE-1307) Introduce the concept of Job Permissions
[ https://issues.apache.org/jira/browse/MAPREDUCE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod K V updated MAPREDUCE-1307: - Attachment: MAPREDUCE-1307-20100211.txt Updated patch that fixes the client side to print a nice message in case of unauthorized access. NOTE: CompletedJobStore needs to be fixed w.r.t. authorization. This might involve serializing the ACLs to the job-store on DFS and using the same for authorizing further requests. I'll do it as part of a follow-up issue. Introduce the concept of Job Permissions Key: MAPREDUCE-1307 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1307 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Devaraj Das Assignee: Vinod K V Fix For: 0.22.0 Attachments: 1307-early-1.patch, MAPREDUCE-1307-20100210.txt, MAPREDUCE-1307-20100211.txt It would be good to define the notion of job permissions analogous to file permissions. Then the JobTracker can restrict who can read (e.g. look at the job page) or modify (e.g. kill) jobs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1483) CompletedJobStore should be authorized using job-acls
CompletedJobStore should be authorized using job-acls - Key: MAPREDUCE-1483 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1483 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker, security Reporter: Vinod K V Fix For: 0.22.0 MAPREDUCE-1307 adds job-acls. CompletedJobStore serves job-status off DFS after jobs are long gone and needs to have job-acls also serialized so as to facilitate authorization of job related requests. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1354) Refactor JobTracker.submitJob to not lock the JobTracker during the HDFS accesses
[ https://issues.apache.org/jira/browse/MAPREDUCE-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832437#action_12832437 ] Hemanth Yamijala commented on MAPREDUCE-1354: - One thing we noticed is that the getCounters call in JobInProgress is synchronized. The wrapper call to getCounters in JobTracker acquires a lock on the JT and then calls JobInProgress.getCounters. The problem is that if the job is being initialized under initTasks, the JobTracker lock can be held for a long time. We saw an instance of this on our clusters. To avoid this case, one solution could be to check whether the job being queried has been inited. This pattern is used in getTaskCompletionEvents. Refactor JobTracker.submitJob to not lock the JobTracker during the HDFS accesses - Key: MAPREDUCE-1354 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1354 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Reporter: Devaraj Das Assignee: Arun C Murthy Priority: Critical Attachments: MAPREDUCE-1354_yhadoop20.patch It'd be nice to have the JobTracker object not be locked while accessing the HDFS for reading the jobconf file and while writing the jobinfo file in the submitJob method. We should see if we can avoid taking the lock altogether. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
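The "check if the job is inited" pattern mentioned in the comment can be sketched as follows. This is a minimal model of the idea, not the actual JobTracker/JobInProgress code; the class and field names are illustrative.

```java
// Sketch of the pattern suggested above: consult a volatile "inited" flag
// before taking the object lock, so a caller that holds the JobTracker
// lock never blocks behind a long-running initTasks(). Names are
// illustrative, not the actual Hadoop classes.
public class JobStatusHolder {
    private volatile boolean inited = false;
    private long counters = 0;

    public synchronized void initTasks() {
        // ... long-running initialization (e.g. job.split localization) ...
        inited = true;
    }

    /** Returns null instead of blocking while the job is still initializing. */
    public Long getCountersIfInited() {
        if (!inited) {
            return null;          // caller can report "job not ready yet"
        }
        synchronized (this) {     // safe: init has finished, lock is short-lived
            return counters;
        }
    }
}
```

The volatile read costs almost nothing, while the early return keeps the expensive initialization lock out of the caller's path.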
[jira] Commented: (MAPREDUCE-1455) Authorization for servlets
[ https://issues.apache.org/jira/browse/MAPREDUCE-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832450#action_12832450 ] Ravi Gummadi commented on MAPREDUCE-1455: - One more thing: /logs, /static, /stack, /conf, /logLevel etc. are not going through authorization as part of this JIRA. That needs changes in Common and will be addressed in a separate JIRA. Authorization for servlets -- Key: MAPREDUCE-1455 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1455 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: jobtracker, security, tasktracker Reporter: Devaraj Das Assignee: Ravi Gummadi Fix For: 0.22.0 This jira is about building the authorization for servlets (on top of MAPREDUCE-1307). That is, the JobTracker/TaskTracker runs authorization checks on web requests based on the configured job permissions. For example, if the job permission is 600, then no one except the authenticated user can look at the job details via the browser. The authenticated user in the servlet can be obtained using the HttpServletRequest method. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
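The 600-style check described in the issue can be modeled roughly as below. This is a self-contained sketch of the decision only; the permission encoding and class name are assumptions, and the real check is built on the job ACLs introduced by MAPREDUCE-1307.

```java
// Minimal model of the servlet-side check: with a job "permission" of 600,
// only the job owner may view the job page; with others-read (e.g. 644),
// anyone may. In a real servlet the remote user would come from
// HttpServletRequest.getRemoteUser(); here it is passed in directly.
public class JobViewAuthorizer {
    /** othersRead is true when the job permission grants read to everyone
     *  (e.g. 644); false for owner-only access (e.g. 600). */
    public static boolean canView(String remoteUser, String jobOwner,
                                  boolean othersRead) {
        if (othersRead) {
            return true;
        }
        return remoteUser != null && remoteUser.equals(jobOwner);
    }
}
```

An unauthenticated request (null remote user) is rejected for owner-only jobs, matching the intent that only the authenticated submitter sees the details.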
[jira] Updated: (MAPREDUCE-1398) TaskLauncher remains stuck on tasks waiting for free nodes even if task is killed.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-1398: --- Assignee: Amareshwari Sriramadasu Status: Patch Available (was: Open) TaskLauncher remains stuck on tasks waiting for free nodes even if task is killed. -- Key: MAPREDUCE-1398 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1398 Project: Hadoop Map/Reduce Issue Type: Bug Components: tasktracker Reporter: Hemanth Yamijala Assignee: Amareshwari Sriramadasu Attachments: patch-1398.txt Tasks could be assigned to trackers for slots that are running other tasks in a commit pending state. This is an optimization done to pipeline task assignment and launch. When the task reaches the tracker, it waits until sufficient slots become free for it. This wait is done in the TaskLauncher thread. Now, while waiting, if the task is killed externally (maybe because the job finishes, etc), the TaskLauncher is not notified of this. So, it continues to wait for the killed task to get sufficient slots. If slots do not become free for a long time, this would result in considerable delay in waking up the TaskLauncher thread. If the waiting task happens to be a high RAM task, then it is also wasteful, because by waking up, it can make way for normal tasks that can run on the available number of slots. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1398) TaskLauncher remains stuck on tasks waiting for free nodes even if task is killed.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-1398: --- Attachment: patch-1398.txt Patch fixing the bug. Added a testcase which fails without the patch and passes with the patch. TaskLauncher remains stuck on tasks waiting for free nodes even if task is killed. -- Key: MAPREDUCE-1398 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1398 Project: Hadoop Map/Reduce Issue Type: Bug Components: tasktracker Reporter: Hemanth Yamijala Attachments: patch-1398.txt Tasks could be assigned to trackers for slots that are running other tasks in a commit pending state. This is an optimization done to pipeline task assignment and launch. When the task reaches the tracker, it waits until sufficient slots become free for it. This wait is done in the TaskLauncher thread. Now, while waiting, if the task is killed externally (maybe because the job finishes, etc), the TaskLauncher is not notified of this. So, it continues to wait for the killed task to get sufficient slots. If slots do not become free for a long time, this would result in considerable delay in waking up the TaskLauncher thread. If the waiting task happens to be a high RAM task, then it is also wasteful, because by waking up, it can make way for normal tasks that can run on the available number of slots. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
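The general shape of the fix can be sketched with plain wait/notify: the waiting launcher re-checks a "killed" flag every time it wakes, and a kill notifies the waiter instead of leaving it blocked until slots free up. This is a minimal model under those assumptions, not the actual TaskTracker/TaskLauncher code.

```java
// Sketch: a launcher thread waits for slots but gives up promptly when the
// task it is waiting for is killed. Class and method names are illustrative.
public class SlotWaiter {
    private int freeSlots = 0;
    private boolean killed = false;

    /** Returns true if the slots were obtained, false if the task was
     *  killed (or the thread interrupted) while waiting. */
    public synchronized boolean awaitSlots(int needed) {
        while (freeSlots < needed && !killed) {
            try {
                wait();                       // woken by addSlots() or kill()
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        if (killed) {
            return false;                     // give up instead of waiting forever
        }
        freeSlots -= needed;
        return true;
    }

    public synchronized void addSlots(int n) { freeSlots += n; notifyAll(); }

    public synchronized void kill() { killed = true; notifyAll(); }
}
```

The key point matching the bug report: without the notifyAll() in kill(), the launcher would keep sleeping until slots happened to free up, which is exactly the stuck behavior described above.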
[jira] Commented: (MAPREDUCE-1354) Refactor JobTracker.submitJob to not lock the JobTracker during the HDFS accesses
[ https://issues.apache.org/jira/browse/MAPREDUCE-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832468#action_12832468 ] Amar Kamat commented on MAPREDUCE-1354: --- Job initialization (job.split localization) can also take a considerable amount of time. Hence we should avoid any getter calls into JobInProgress while initialization is in progress. The following are the other methods that first lock the JobTracker and then the JobInProgress, potentially locking up the JobTracker during job initialization:
- getMapTaskReports()
- getReduceTaskReports()
- getCleanupTaskReports()
- getSetupTaskReports()
- getTaskCompletionEvents()
- getTaskDiagnostics()
- setJobPriority()
Refactor JobTracker.submitJob to not lock the JobTracker during the HDFS accesses - Key: MAPREDUCE-1354 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1354 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Reporter: Devaraj Das Assignee: Arun C Murthy Priority: Critical Attachments: MAPREDUCE-1354_yhadoop20.patch It'd be nice to have the JobTracker object not be locked while accessing the HDFS for reading the jobconf file and while writing the jobinfo file in the submitJob method. We should see if we can avoid taking the lock altogether. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1474) forrest docs for archives is out of date.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832528#action_12832528 ] Hadoop QA commented on MAPREDUCE-1474: -- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12435397/MAPREDUCE-1474.patch against trunk revision 908321. +1 @author. The patch does not contain any @author tags. +0 tests included. The patch appears to be a documentation patch that doesn't require tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/313/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/313/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/313/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/313/console This message is automatically generated. forrest docs for archives is out of date. - Key: MAPREDUCE-1474 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1474 Project: Hadoop Map/Reduce Issue Type: Bug Components: documentation Reporter: Mahadev konar Assignee: Mahadev konar Fix For: 0.22.0 Attachments: MAPREDUCE-1474.patch The docs for archives are out of date. The new docs that were checked into hadoop common were lost because of the project split. 
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1305) Running distcp with -delete incurs avoidable penalties
[ https://issues.apache.org/jira/browse/MAPREDUCE-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832541#action_12832541 ] Hadoop QA commented on MAPREDUCE-1305: -- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12435423/M1305-2.patch against trunk revision 908321. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/441/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/441/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/441/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/441/console This message is automatically generated. Running distcp with -delete incurs avoidable penalties -- Key: MAPREDUCE-1305 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1305 Project: Hadoop Map/Reduce Issue Type: Improvement Components: distcp Affects Versions: 0.20.1 Reporter: Peter Romianowski Assignee: Peter Romianowski Attachments: M1305-1.patch, M1305-2.patch, MAPREDUCE-1305.patch *First problem* In org.apache.hadoop.tools.DistCp#deleteNonexisting we serialize FileStatus objects when the path is all we need. 
The performance problem comes from org.apache.hadoop.fs.RawLocalFileSystem.RawLocalFileStatus#write, which tries to retrieve file permissions by issuing an 'ls -ld' on the path, which is painfully slow. Changed that to serialize just the Path and not the FileStatus. *Second problem* To delete the files we invoke the hadoop command-line tool with the option '-rmr path'; again, once for each file. Changed that to dstfs.delete(path, true). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
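The second fix amounts to replacing one forked shell process per file with a single in-process recursive delete; in DistCp itself that is the dstfs.delete(path, true) call on the destination FileSystem. The self-contained sketch below models the same idea with java.io.File so it can run without Hadoop on the classpath; the class name is hypothetical.

```java
import java.io.File;

// Model of the fix: delete a stale destination path with one direct
// recursive filesystem call instead of forking "hadoop fs -rmr <path>"
// once per file. java.io.File stands in for Hadoop's FileSystem here.
public class RecursiveDelete {
    /** Recursively deletes f; returns true if f no longer exists. */
    public static boolean delete(File f) {
        if (f.isDirectory()) {
            File[] children = f.listFiles();
            if (children != null) {
                for (File c : children) {
                    delete(c);        // depth-first: empty the directory first
                }
            }
        }
        return f.delete();
    }
}
```

Forking a JVM-backed CLI per file costs process startup each time; the direct call is a single metadata operation, which is where the avoidable penalty in the issue title comes from.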
[jira] Commented: (MAPREDUCE-1305) Running distcp with -delete incurs avoidable penalties
[ https://issues.apache.org/jira/browse/MAPREDUCE-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832575#action_12832575 ] Peter Romianowski commented on MAPREDUCE-1305: -- Thanks Chris for removing the calls to FsShell. I've been very busy lately, so I did not manage to compile the patch. Running distcp with -delete incurs avoidable penalties -- Key: MAPREDUCE-1305 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1305 Project: Hadoop Map/Reduce Issue Type: Improvement Components: distcp Affects Versions: 0.20.1 Reporter: Peter Romianowski Assignee: Peter Romianowski Attachments: M1305-1.patch, M1305-2.patch, MAPREDUCE-1305.patch *First problem* In org.apache.hadoop.tools.DistCp#deleteNonexisting we serialize FileStatus objects when the path is all we need. The performance problem comes from org.apache.hadoop.fs.RawLocalFileSystem.RawLocalFileStatus#write which tries to retrieve file permissions by issuing a ls -ld path which is painfully slow. Changed that to just serialize Path and not FileStatus. *Second problem* To delete the files we invoke the hadoop command line tool with option -rmr path. Again, for each file. Changed that to dstfs.delete(path, true) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1251) c++ utils doesn't compile
[ https://issues.apache.org/jira/browse/MAPREDUCE-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832581#action_12832581 ] Todd Lipcon commented on MAPREDUCE-1251: This should be committed to branch-0.20 as well, since it causes a failure to build from the release source on many systems. c++ utils doesn't compile - Key: MAPREDUCE-1251 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1251 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.1, 0.20.2, 0.21.0, 0.22.0 Environment: ubuntu karmic 64-bit Reporter: Eli Collins Assignee: Eli Collins Attachments: HDFS-790-1.patch, HDFS-790.patch, MR-1251.patch c++ utils doesn't compile on ubuntu karmic 64-bit. The latest patch for HADOOP-5611 needs to be applied first. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Reopened: (MAPREDUCE-1251) c++ utils doesn't compile
[ https://issues.apache.org/jira/browse/MAPREDUCE-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon reopened MAPREDUCE-1251: c++ utils doesn't compile - Key: MAPREDUCE-1251 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1251 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.1, 0.20.2, 0.21.0, 0.22.0 Environment: ubuntu karmic 64-bit Reporter: Eli Collins Assignee: Eli Collins Attachments: HDFS-790-1.patch, HDFS-790.patch, MR-1251.patch c++ utils doesn't compile on ubuntu karmic 64-bit. The latest patch for HADOOP-5611 needs to be applied first. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1305) Running distcp with -delete incurs avoidable penalties
[ https://issues.apache.org/jira/browse/MAPREDUCE-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated MAPREDUCE-1305: -- Hadoop Flags: [Reviewed] +1 patch looks good. Running distcp with -delete incurs avoidable penalties -- Key: MAPREDUCE-1305 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1305 Project: Hadoop Map/Reduce Issue Type: Improvement Components: distcp Affects Versions: 0.20.1 Reporter: Peter Romianowski Assignee: Peter Romianowski Attachments: M1305-1.patch, M1305-2.patch, MAPREDUCE-1305.patch *First problem* In org.apache.hadoop.tools.DistCp#deleteNonexisting we serialize FileStatus objects when the path is all we need. The performance problem comes from org.apache.hadoop.fs.RawLocalFileSystem.RawLocalFileStatus#write which tries to retrieve file permissions by issuing a ls -ld path which is painfully slow. Changed that to just serialize Path and not FileStatus. *Second problem* To delete the files we invoke the hadoop command line tool with option -rmr path. Again, for each file. Changed that to dstfs.delete(path, true) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1484) Framework should not sort the input splits
Framework should not sort the input splits -- Key: MAPREDUCE-1484 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1484 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Owen O'Malley Currently the framework sorts the input splits by size before the job is submitted. This makes it very difficult to run map only jobs that transform the input because the assignment of input names to output names isn't obvious. We fixed this once in HADOOP-1440, but the fix was broken so it was rolled back. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1434) Dynamic add input for one job
[ https://issues.apache.org/jira/browse/MAPREDUCE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832608#action_12832608 ] Aaron Kimball commented on MAPREDUCE-1434: -- Owen, The {{getNewInputSplits}} method proposed above requires the InputFormat to maintain state containing the previously-enumerated InputSplits. The proposed command-line tools suggest independent user-side processes performing the addition of files to the job, which makes this challenging. Given that splits are calculated on the client, but the true list of input splits is held by the JobTracker (or is/could the splits file be written to HDFS?), calculating just the delta might be challenging. I think it might be more reasonable if one of the following things were true:
* The client code just calls {{getInputSplits()}} again. The same algorithm is run as in initial job submission, but the output list may be longer than the previous list returned by this method. The InputFormat is responsible for ensuring that it doesn't return any fewer splits than it did before (i.e., don't drop inputs).
* For that matter, if the input queue for a job is dynamic, I don't see why this same mechanism couldn't be used to drop splits that are, for whatever reason, irrelevant.
* {{getNewInputSplits()}} should have the signature {{InputSplit[] getNewInputSplits(JobContext job, List<InputSplit> existingSplits) throws IOException, InterruptedException}}.
The latter case would present to the user a list of the existing inputs read from the existing 'splits' file for the job. That way state-tracking is unnecessary; you can just use (e.g.) a PathFilter to disregard things already in {{existingSplits}}. A final proposition is that users must manually specify new paths (or other arbitrary arguments like database table names, URLs, etc.) to include, in addition to the InputFormat.
In which case, it might look more sane to have:
* {{getNewInputSplits()}} should have the signature {{InputSplit[] getNewInputSplits(JobContext job, String... newSplitHints) throws IOException, InterruptedException}}.
The {{newSplitHints}} parameter is effectively a user-specified argv; it can be decoded as a list of Paths, database tables, etc., and used appropriately by the InputFormat to generate new splits. Another question: what are the semantics of a doubly-specified split? (I am especially curious about the inexact-match case, where the same file in HDFS is enumerated twice but the splits are at different offsets.) Can/should the same file be processed twice in a job? Finally: why does a user-disconnect timeout kill the job? That's different from the usual case in MapReduce, where a user disconnect is not noticed by the server-side processes at all. I would think that a user-disconnect timeout should declare that all the input has been added and that the reduce phase can begin, not that it should kill things. Dynamic add input for one job - Key: MAPREDUCE-1434 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1434 Project: Hadoop Map/Reduce Issue Type: New Feature Environment: 0.19.0 Reporter: Xing Shi We always have to upload the data to HDFS first; only then can we analyze it using Hadoop MapReduce. Sometimes the upload process takes a long time, so if we could add input to a job while it runs, that time could be saved. WHAT? Client:
a) hadoop job -add-input jobId inputFormat ... : add the input to the job
b) hadoop job -add-input done : tell the JobTracker that the input has been fully prepared
c) hadoop job -add-input status jobid : show how many inputs the job has
HOWTO? Mainly, I think we should do three things:
1. JobClient: JobClient should support adding input to a job; JobClient generates the splits and submits them to the JobTracker.
2. JobTracker: JobTracker supports addInput and adds the new tasks to the original map tasks. Because the uploaded data will be processed quickly, the scheduler should also be updated to support keeping a map task pending until the client reports that the job's input is done.
3. Reducer: the reducer should also update the number of map tasks, so that it shuffles correctly.
This is a rough idea, and I will update it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
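The delta-computing variant Aaron proposes (hand the InputFormat the splits the framework already knows about, get back only the new ones) can be sketched as below. Everything here is hypothetical scaffolding for illustration, with plain Strings standing in for InputSplits; it is not Hadoop API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Model of the proposed getNewInputSplits(job, existingSplits) contract:
// return only splits not already known, so the InputFormat needs no
// state-tracking of its own. Strings stand in for real InputSplits.
public class DynamicInputFormat {
    /** Returns the inputs not present in existingSplits, preserving order
     *  and never re-emitting or dropping a split the framework already has. */
    public static List<String> getNewInputSplits(List<String> allInputs,
                                                 List<String> existingSplits) {
        Set<String> seen = new HashSet<>(existingSplits);
        List<String> fresh = new ArrayList<>();
        for (String input : allInputs) {
            if (!seen.contains(input)) {
                fresh.add(input);
            }
        }
        return fresh;
    }
}
```

This is essentially the PathFilter-over-existingSplits idea from the comment: the set membership test plays the role of the filter that disregards inputs already enumerated.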
[jira] Commented: (MAPREDUCE-1436) Deadlock in preemption code in fair scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832623#action_12832623 ] Matei Zaharia commented on MAPREDUCE-1436: -- Are you suggesting that I add a JobTracker lock in update() or in the JobListener methods? I think it's best to add it in update() because it also gets called from a separate thread. This actually happens quite rarely now (it used to be every few seconds, but it's every 15 seconds after MAPREDUCE-706, and can be set higher pretty safely). BTW, I found another deadlock that seems to be much rarer (it happened when I was submitting about 50 jobs simultaneously) but is not related to preemption:
{code}
Found one Java-level deadlock:
=============================
IPC Server handler 24 on 9001:
  waiting to lock monitor 0x40c91750 (object 0x7fc0243e2c20, a org.apache.hadoop.mapred.JobTracker),
  which is held by IPC Server handler 0 on 9001
IPC Server handler 0 on 9001:
  waiting to lock monitor 0x40bc0770 (object 0x7fc0243e3080, a org.apache.hadoop.mapred.FairScheduler),
  which is held by FairScheduler update thread
FairScheduler update thread:
  waiting to lock monitor 0x4095dd98 (object 0x7fc0258bc0d0, a org.apache.hadoop.mapred.JobInProgress),
  which is held by IPC Server handler 0 on 9001

Java stack information for the threads listed above:
===================================================
IPC Server handler 24 on 9001:
  at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:2487)
  - waiting to lock 0x7fc0243e2c20 (a org.apache.hadoop.mapred.JobTracker)
  at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:396)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
IPC Server handler 0 on 9001:
  at org.apache.hadoop.mapred.JobTracker.finalizeJob(JobTracker.java:2115)
  - waiting to lock 0x7fc0243e3080 (a org.apache.hadoop.mapred.FairScheduler)
  - locked 0x7fc0243e3420 (a java.util.TreeMap)
  - locked 0x7fc0243e2c20 (a org.apache.hadoop.mapred.JobTracker)
  at org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:2510)
  - locked 0x7fc0258bc0d0 (a org.apache.hadoop.mapred.JobInProgress)
  at org.apache.hadoop.mapred.JobInProgress.jobComplete(JobInProgress.java:2146)
  at org.apache.hadoop.mapred.JobInProgress.completedTask(JobInProgress.java:2084)
  - locked 0x7fc0258bc0d0 (a org.apache.hadoop.mapred.JobInProgress)
  at org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:883)
  - locked 0x7fc0258bc0d0 (a org.apache.hadoop.mapred.JobInProgress)
  at org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:3564)
  at org.apache.hadoop.mapred.JobTracker.processHeartbeat(JobTracker.java:2758)
  - locked 0x7fc0243e2c20 (a org.apache.hadoop.mapred.JobTracker)
  at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:2553)
  - locked 0x7fc0243e2c20 (a org.apache.hadoop.mapred.JobTracker)
  at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:396)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
FairScheduler update thread:
  at org.apache.hadoop.mapred.JobInProgress.scheduleReduces(JobInProgress.java:1203)
  - waiting to lock 0x7fc0258bc0d0 (a org.apache.hadoop.mapred.JobInProgress)
  at org.apache.hadoop.mapred.JobSchedulable.updateDemand(JobSchedulable.java:53)
  at org.apache.hadoop.mapred.PoolSchedulable.updateDemand(PoolSchedulable.java:81)
  at org.apache.hadoop.mapred.FairScheduler.update(FairScheduler.java:577)
  - locked 0x7fc0243e3080 (a org.apache.hadoop.mapred.FairScheduler)
  at org.apache.hadoop.mapred.FairScheduler$UpdateThread.run(FairScheduler.java:277)
{code}
The problem in this
[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats
[ https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832642#action_12832642 ] Hadoop QA commented on MAPREDUCE-1309: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12435485/mapreduce-1309--2010-02-10.patch against trunk revision 908321. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 17 new or modified tests. -1 javadoc. The javadoc tool appears to have generated 1 warning messages. -1 javac. The applied patch generated 2219 javac compiler warnings (more than the trunk's current 2215 warnings). +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/442/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/442/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/442/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/442/console This message is automatically generated. 
I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats - Key: MAPREDUCE-1309 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Dick King Assignee: Dick King Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, demuxer-plus-concatenated-files--2010-01-06.patch, demuxer-plus-concatenated-files--2010-01-08-b.patch, demuxer-plus-concatenated-files--2010-01-08-c.patch, demuxer-plus-concatenated-files--2010-01-08-d.patch, demuxer-plus-concatenated-files--2010-01-08.patch, demuxer-plus-concatenated-files--2010-01-11.patch, mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch There are two orthogonal questions to answer when processing a job tracker log: how will the logs and the xml configuration files be packaged, and in which release of hadoop map/reduce were the logs generated? The existing rumen only has a couple of answers to this question. The new engine will handle three answers to the version question: 0.18, 0.20 and current, and two answers to the packaging question: separate files with names derived from the job ID, and concatenated files with a header between sections [used for easier file interchange]. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1470) Move Delegation token into Common so that we can use it for MapReduce also
[ https://issues.apache.org/jira/browse/MAPREDUCE-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832645#action_12832645 ] Hudson commented on MAPREDUCE-1470: --- Integrated in Hadoop-Mapreduce-trunk #232 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/232/]) Move Delegation token into Common so that we can use it for MapReduce also -- Key: MAPREDUCE-1470 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1470 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.22.0 Attachments: mr-1470.patch We need to update one reference for map/reduce when we move the hdfs delegation tokens. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1433) Create a Delegation token for MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832646#action_12832646 ] Hudson commented on MAPREDUCE-1433: --- Integrated in Hadoop-Mapreduce-trunk #232 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/232/]) Create a Delegation token for MapReduce --- Key: MAPREDUCE-1433 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: security Reporter: Owen O'Malley Assignee: Owen O'Malley Fix For: 0.22.0 Attachments: 1433.bp20.patch, 1433.bp20.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch Occasionally, MapReduce jobs need to launch other MapReduce jobs. With security enabled, the task needs to authenticate to the JobTracker as the user with a token. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1399) The archive command shows a null error message
[ https://issues.apache.org/jira/browse/MAPREDUCE-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832648#action_12832648 ] Hudson commented on MAPREDUCE-1399: --- Integrated in Hadoop-Mapreduce-trunk #232 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/232/]) The archive command shows a null error message -- Key: MAPREDUCE-1399 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1399 Project: Hadoop Map/Reduce Issue Type: Bug Components: harchive Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Fix For: 0.22.0 Attachments: m1399_20100204.patch, m1399_20100205.patch, m1399_20100205trunk.patch, m1399_20100205trunk2.patch, m1399_20100205trunk2_y0.20.patch, MAPREDUCE-1399.patch {noformat} bash-3.1$ hadoop archive -archiveName foo.har -p . foo . Exception in archives null {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1448) [Mumak] mumak.sh does not honor --config option.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832647#action_12832647 ] Hudson commented on MAPREDUCE-1448: --- Integrated in Hadoop-Mapreduce-trunk #232 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/232/]) [Mumak] mumak.sh does not honor --config option. Key: MAPREDUCE-1448 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1448 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.21.0, 0.22.0 Reporter: Hong Tang Assignee: Hong Tang Fix For: 0.21.0 Attachments: mapred-1448-2.patch, mapred-1448.patch When --config is specified, mumak.sh should put the customized conf directory in the classpath. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1425) archive throws OutOfMemoryError
[ https://issues.apache.org/jira/browse/MAPREDUCE-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832644#action_12832644 ] Hudson commented on MAPREDUCE-1425: --- Integrated in Hadoop-Mapreduce-trunk #232 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/232/]) archive throws OutOfMemoryError --- Key: MAPREDUCE-1425 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1425 Project: Hadoop Map/Reduce Issue Type: Improvement Components: harchive Reporter: Tsz Wo (Nicholas), SZE Assignee: Mahadev konar Fix For: 0.22.0 Attachments: har.sh, m1425_20100129TextFileGenerator.patch, MAPREDUCE-1425.patch, MAPREDUCE-1425.patch, MAPREDUCE-1425.patch, MAPREDUCE-1425_y_0.20.patch {noformat} -bash-3.1$ hadoop archive -archiveName t4.har -p . t4 . Exception in thread main java.lang.OutOfMemoryError: Java heap space at java.util.regex.Pattern.compile(Pattern.java:1432) at java.util.regex.Pattern.init(Pattern.java:1133) at java.util.regex.Pattern.compile(Pattern.java:847) at java.lang.String.replace(String.java:2208) at org.apache.hadoop.fs.Path.normalizePath(Path.java:146) at org.apache.hadoop.fs.Path.initialize(Path.java:137) at org.apache.hadoop.fs.Path.init(Path.java:126) at org.apache.hadoop.fs.Path.makeQualified(Path.java:296) at org.apache.hadoop.hdfs.DistributedFileSystem.makeQualified(DistributedFileSystem.java:244) at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:256) at org.apache.hadoop.tools.HadoopArchives.archive(HadoopArchives.java:393) at org.apache.hadoop.tools.HadoopArchives.run(HadoopArchives.java:736) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) at org.apache.hadoop.tools.HadoopArchives.main(HadoopArchives.java:751) {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1320) StringBuffer - StringBuilder occurrence
[ https://issues.apache.org/jira/browse/MAPREDUCE-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832654#action_12832654 ] Hadoop QA commented on MAPREDUCE-1320: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12428677/MAPREDUCE-1320.patch against trunk revision 908321. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/443/console This message is automatically generated. StringBuffer - StringBuilder occurrence Key: MAPREDUCE-1320 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1320 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.22.0 Reporter: Kay Kay Fix For: 0.22.0 Attachments: MAPREDUCE-1320.patch A good number of toString() implementations use StringBuffer when the reference clearly does not go out of scope of the method and no concurrency is needed. The patch replaces those occurrences of StringBuffer with StringBuilder. Created against the map/reduce project trunk. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
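The change described in MAPREDUCE-1320 can be illustrated with a minimal sketch (the class and field names below are made up for the example): when the buffer never escapes the method, the unsynchronized StringBuilder is the idiomatic choice.

```java
// Illustration of the StringBuffer -> StringBuilder change: the buffer is
// local to toString(), never shared across threads, so no synchronization
// is needed and StringBuilder avoids taking an uncontended lock per append.
public class ToStringExample {
  private final String name = "job_201002110001_0001";
  private final int maps = 4;

  @Override
  public String toString() {
    // Before the patch this would have been `new StringBuffer()`.
    StringBuilder sb = new StringBuilder();
    sb.append("JobInProgress(").append(name)
      .append(", maps=").append(maps).append(')');
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(new ToStringExample());
  }
}
```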
[jira] Created: (MAPREDUCE-1485) CapacityScheduler should prevent a single job from taking over large parts of a cluster
CapacityScheduler should prevent a single job from taking over large parts of a cluster --- Key: MAPREDUCE-1485 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1485 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/capacity-sched Reporter: Arun C Murthy Assignee: Arun C Murthy Fix For: 0.22.0 The proposal is to have a per-queue limit on the number of concurrent tasks a job can run on a cluster. We've seen cases where a single, large job took over a majority of the cluster - worse, it meant that any bug in it caused issues for both the NameNode _and_ the JobTracker. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-323) Improve the way job history files are managed
[ https://issues.apache.org/jira/browse/MAPREDUCE-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832677#action_12832677 ] Edward Capriolo commented on MAPREDUCE-323: --- Being able to control the structure better is definitely a nice feature. Practically, dividing the job folders by mm/dd/yy would solve the immediate problem of having to clean and restart your JobTracker when you hit the ext3 limit. Introducing a variable into the jobtracker mapred.jobhistory.maxjobhistory and a FIFO queue might be helpful as well. As things stand now, downtime and cleanup are needed to keep the JobTracker running well, which is less than optimal. Improve the way job history files are managed - Key: MAPREDUCE-323 URL: https://issues.apache.org/jira/browse/MAPREDUCE-323 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.21.0, 0.22.0 Reporter: Amar Kamat Assignee: Amareshwari Sriramadasu Priority: Critical Today all the jobhistory files are dumped in one _job-history_ folder. This can cause problems when there is a need to search the history folder (job-recovery etc). It would be nice if we grouped all the jobs under a _user_ folder. So all the jobs for user _amar_ will go in _history-folder/amar/_. Jobs can be categorized using various features like _jobid, date, jobname_ etc but using _username_ will make the search much more efficient and also will not result in a namespace explosion. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1431) archive does not work with distcp -update
[ https://issues.apache.org/jira/browse/MAPREDUCE-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832697#action_12832697 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-1431: --- Took a closer look: HarFileSystem extends FilterFileSystem and it uses the underlying file system to get file checksum. That's why we got Wrong FS since HarFileSystem passes a har:// path to the underlying fs.getFileChecksum(..). In our case, the underlying fs is hdfs. archive does not work with distcp -update - Key: MAPREDUCE-1431 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1431 Project: Hadoop Map/Reduce Issue Type: Bug Components: harchive Reporter: Tsz Wo (Nicholas), SZE Assignee: Mahadev konar Fix For: 0.22.0 The following distcp command works. {noformat} hadoop distcp -Dmapred.job.queue.name=q har://hdfs-nn_hostname:8020/user/tsz/t101.har/t101 t101_distcp {noformat} However, it does not work for -update. {noformat} -bash-3.1$ hadoop distcp -Dmapred.job.queue.name=q -update har://hdfs-nn_hostname:8020/user/tsz/t101.har/t101 t101_distcp 10/01/29 20:06:53 INFO tools.DistCp: srcPaths=[har://hdfs-nn_hostname:8020/user/tsz/t101.har/t101] 10/01/29 20:06:53 INFO tools.DistCp: destPath=t101 java.lang.IllegalArgumentException: Wrong FS: har://hdfs-nn_hostname:8020/user/tsz/t101.har/t101/text-, expected: hdfs://nn_hostname at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:310) at org.apache.hadoop.hdfs.DistributedFileSystem.checkPath(DistributedFileSystem.java:99) at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:155) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:463) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:46) at org.apache.hadoop.fs.FilterFileSystem.getFileChecksum(FilterFileSystem.java:250) at org.apache.hadoop.tools.DistCp.sameFile(DistCp.java:1204) at 
org.apache.hadoop.tools.DistCp.setup(DistCp.java:1084) ... {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
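Nicholas's diagnosis above (HarFileSystem hands a har:// path to the underlying hdfs, whose checkPath rejects the foreign scheme) can be reproduced in miniature with plain URIs. The classes and the `toUnderlying` translation below are small stand-ins sketching the spirit of a fix, not the actual Hadoop code.

```java
import java.net.URI;

// Miniature reproduction of the "Wrong FS" failure: a scheme check in the
// style of FileSystem.checkPath rejects a har:// URI handed to an hdfs://
// file system. These are stand-in methods, not the Hadoop ones.
public class WrongFsDemo {
  static void checkPath(URI fsUri, URI path) {
    if (path.getScheme() != null
        && !path.getScheme().equals(fsUri.getScheme())) {
      throw new IllegalArgumentException(
          "Wrong FS: " + path + ", expected: " + fsUri);
    }
  }

  // A fix in the spirit of the comment: map the archive path onto the
  // underlying file system's namespace before delegating to it.
  static URI toUnderlying(URI harPath) {
    // har://hdfs-host:8020/user/... -> hdfs://host:8020/user/...
    String auth = harPath.getAuthority().replaceFirst("^hdfs-", "");
    return URI.create("hdfs://" + auth + harPath.getPath());
  }

  public static void main(String[] args) {
    URI fs = URI.create("hdfs://nn:8020");
    URI har = URI.create("har://hdfs-nn:8020/user/tsz/t101.har/t101");
    checkPath(fs, toUnderlying(har)); // passes only after translation
    System.out.println(toUnderlying(har));
  }
}
```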
[jira] Commented: (MAPREDUCE-1341) Sqoop should have an option to create hive tables and skip the table import step
[ https://issues.apache.org/jira/browse/MAPREDUCE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832705#action_12832705 ] Hadoop QA commented on MAPREDUCE-1341: -- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12435484/MAPREDUCE-1341.6.patch against trunk revision 908321. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 27 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/314/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/314/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/314/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/314/console This message is automatically generated. 
Sqoop should have an option to create hive tables and skip the table import step Key: MAPREDUCE-1341 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1341 Project: Hadoop Map/Reduce Issue Type: New Feature Components: contrib/sqoop Affects Versions: 0.22.0 Reporter: Leonid Furman Assignee: Leonid Furman Priority: Minor Fix For: 0.22.0 Attachments: MAPREDUCE-1341.2.patch, MAPREDUCE-1341.3.patch, MAPREDUCE-1341.4.patch, MAPREDUCE-1341.5.patch, MAPREDUCE-1341.6.patch, MAPREDUCE-1341.patch In case the client only needs to create tables in hive, it would be helpful if Sqoop had an optional parameter: --hive-create-only which would omit the time consuming table import step, generate hive create table statements and run them. If this feature seems useful, I can generate the patch. I have modified the Sqoop code and built it on my development machine, and it seems to be working well. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1334) contrib/index - test - TestIndexUpdater fails due to an additional presence of file _SUCCESS in hdfs
[ https://issues.apache.org/jira/browse/MAPREDUCE-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832709#action_12832709 ] Hadoop QA commented on MAPREDUCE-1334: -- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12429081/MAPREDUCE-1334.patch against trunk revision 908321. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/444/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/444/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/444/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/444/console This message is automatically generated. 
contrib/index - test - TestIndexUpdater fails due to an additional presence of file _SUCCESS in hdfs - Key: MAPREDUCE-1334 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1334 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/index Reporter: Kay Kay Priority: Critical Fix For: 0.21.0 Attachments: MAPREDUCE-1334.patch $ cd src/contrib/index $ ant clean test This fails the test TestIndexUpdater due to a mismatch in the doneFileNames data structure, when it is being run with different parameters (an ArrayIndexOutOfBoundsException is raised when inserting elements into the doneFileNames array). Debugging further, there seems to be an additional file - hdfs://localhost:36021/myoutput/_SUCCESS - taken into consideration in addition to those that begin with done*. The presence of the extra file causes the error. Attaching a patch that circumvents this by increasing the array length of shards by 1. But longer term, the test fixtures probably need to be revisited to see whether the presence of _SUCCESS as a file is a good thing to begin with, before we even get to this test case. Any comments / suggestions on the same welcome. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (MAPREDUCE-1375) TestFileArgs fails intermittently
[ https://issues.apache.org/jira/browse/MAPREDUCE-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon reassigned MAPREDUCE-1375: -- Assignee: Todd Lipcon TestFileArgs fails intermittently - Key: MAPREDUCE-1375 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1375 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Reporter: Amar Kamat Assignee: Todd Lipcon Fix For: 0.22.0 Attachments: TEST-org.apache.hadoop.streaming.TestFileArgs.txt TestFileArgs failed once for me with the following error {code} expected:[job.jar sidefile tmp ] but was:[] sidefile tmp ] but was:[] at org.apache.hadoop.streaming.TestStreaming.checkOutput(TestStreaming.java:107) at org.apache.hadoop.streaming.TestStreaming.testCommandLine(TestStreaming.java:123) {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1375) TestFileArgs fails intermittently
[ https://issues.apache.org/jira/browse/MAPREDUCE-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832751#action_12832751 ] Todd Lipcon commented on MAPREDUCE-1375: I think I got this figured out. The issue is that the test actually tries to write some 'roses are red' text to ls's stdin. Very infrequently, the ls will actually complete before the data can be flushed, so the task gets a Broken pipe exception - see MAPREDUCE-1481. I'm actually unsure whether MAPREDUCE-1481 is a bug, but the easy fix for this test is to make the input empty so that no data gets written into ls's stdin. I'm running the test in a loop with this fix now. If it keeps going for a couple hours without failure I'll post a patch. (before, this loop would usually fail after about 10 minutes) TestFileArgs fails intermittently - Key: MAPREDUCE-1375 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1375 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Reporter: Amar Kamat Assignee: Todd Lipcon Fix For: 0.22.0 Attachments: TEST-org.apache.hadoop.streaming.TestFileArgs.txt TestFileArgs failed once for me with the following error {code} expected:[job.jar sidefile tmp ] but was:[] sidefile tmp ] but was:[] at org.apache.hadoop.streaming.TestStreaming.checkOutput(TestStreaming.java:107) at org.apache.hadoop.streaming.TestStreaming.testCommandLine(TestStreaming.java:123) {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1481) Streaming should swallow IOExceptions when closing clientOut
[ https://issues.apache.org/jira/browse/MAPREDUCE-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832753#action_12832753 ] Todd Lipcon commented on MAPREDUCE-1481: Actually, I think this is a bug but not quite how I described it. If the flush fails, it means we were trying to write data into a streaming executable that didn't consume all of its input. I don't know what the expected behavior is here. Right now, the behavior is that we stop consuming its output, but the task still succeeds so long as the exit code is 0. I think this is incorrect. We should either entirely fail the task regardless of exit code, or we should consume the rest of its output. Streaming should swallow IOExceptions when closing clientOut Key: MAPREDUCE-1481 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1481 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming Affects Versions: 0.20.1, 0.21.0, 0.22.0 Reporter: Todd Lipcon Assignee: Todd Lipcon in PipeMapRed.mapRedFinished, streaming flushes and closes clientOut_, the handle to the subprocess's stdin. If the subprocess has already exited or closed its stdin, this will generate a Broken Pipe IOException. This causes us to skip waitOutputThreads, which is incorrect, since the subprocess may have data still written from stdout that needs to be read. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
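The behavior the issue title asks for - swallow the broken-pipe IOException on close so that the output-draining step still runs - can be sketched as follows. The stream and method names are illustrative, not PipeMapRed's actual code.

```java
import java.io.IOException;
import java.io.OutputStream;

// Sketch of the proposed behavior: flushing/closing the subprocess's stdin
// may throw a broken-pipe IOException (the child already exited or closed
// stdin), and that must not short-circuit draining the child's remaining
// stdout. Returning a flag lets the caller decide how to treat the task.
public class SafeClose {
  static boolean closeQuietly(OutputStream clientOut) {
    try {
      clientOut.flush();
      clientOut.close();
      return true;
    } catch (IOException e) {
      // Swallow it so the caller still goes on to wait for output threads.
      return false;
    }
  }

  public static void main(String[] args) {
    OutputStream brokenPipe = new OutputStream() {
      @Override public void write(int b) throws IOException {
        throw new IOException("Broken pipe");
      }
      @Override public void flush() throws IOException {
        throw new IOException("Broken pipe");
      }
    };
    System.out.println(closeQuietly(brokenPipe));
  }
}
```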
[jira] Updated: (MAPREDUCE-1375) TestFileArgs fails intermittently
[ https://issues.apache.org/jira/browse/MAPREDUCE-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated MAPREDUCE-1375: --- Attachment: mapreduce-1375.txt I think this patch fixes the problem. TestFileArgs fails intermittently - Key: MAPREDUCE-1375 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1375 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Reporter: Amar Kamat Assignee: Todd Lipcon Fix For: 0.22.0 Attachments: mapreduce-1375.txt, TEST-org.apache.hadoop.streaming.TestFileArgs.txt TestFileArgs failed once for me with the following error {code} expected:[job.jar sidefile tmp ] but was:[] sidefile tmp ] but was:[] at org.apache.hadoop.streaming.TestStreaming.checkOutput(TestStreaming.java:107) at org.apache.hadoop.streaming.TestStreaming.testCommandLine(TestStreaming.java:123) {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1375) TestFileArgs fails intermittently
[ https://issues.apache.org/jira/browse/MAPREDUCE-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated MAPREDUCE-1375: --- Status: Patch Available (was: Open) TestFileArgs fails intermittently - Key: MAPREDUCE-1375 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1375 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Reporter: Amar Kamat Assignee: Todd Lipcon Fix For: 0.22.0 Attachments: mapreduce-1375.txt, TEST-org.apache.hadoop.streaming.TestFileArgs.txt TestFileArgs failed once for me with the following error {code} expected:[job.jar sidefile tmp ] but was:[] sidefile tmp ] but was:[] at org.apache.hadoop.streaming.TestStreaming.checkOutput(TestStreaming.java:107) at org.apache.hadoop.streaming.TestStreaming.testCommandLine(TestStreaming.java:123) {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1462) Enable context-specific and stateful serializers in MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom White updated MAPREDUCE-1462: - Attachment: MAPREDUCE-1462-mr.patch MAPREDUCE-1462-common.patch In order to help understand the problem better I've created a demonstration patch that uses the SerializationContext-based user API, while retaining the Serialization code that exists in common. (In fact, I had to make some changes to the Serialization code so that it can retain its metadata in an instance variable.) Here's what the configuration looks like for the user: {code} Schema keySchema = Schema.create(Schema.Type.STRING); Schema valSchema = Schema.create(Schema.Type.LONG); job.setSerialization(Job.SerializationContext.MAP_OUTPUT_KEY, new AvroGenericSerialization(keySchema)); job.setSerialization(Job.SerializationContext.MAP_OUTPUT_VALUE, new AvroGenericSerialization(valSchema)); {code} Enable context-specific and stateful serializers in MapReduce - Key: MAPREDUCE-1462 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1462 Project: Hadoop Map/Reduce Issue Type: New Feature Components: task Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: h-1462.patch, MAPREDUCE-1462-common.patch, MAPREDUCE-1462-mr.patch Although the current serializer framework is powerful, within the context of a job it is limited to picking a single serializer for a given class. Additionally, Avro generic serialization can make use of additional configuration/state such as the schema. (Most other serialization frameworks including Writable, Jute/Record IO, Thrift, Avro Specific, and Protocol Buffers only need the object's class name to deserialize the object.) With the goal of keeping the easy things easy and maintaining backwards compatibility, we should be able to allow applications to use context specific (eg. map output key) serializers in addition to the current type based ones that handle the majority of the cases. 
Furthermore, we should be able to support serializer-specific configuration/metadata in a type-safe manner without cluttering up the base API with a lot of new methods that will confuse new users. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-434) local map-reduce job limited to single reducer
[ https://issues.apache.org/jira/browse/MAPREDUCE-434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832781#action_12832781 ] Hadoop QA commented on MAPREDUCE-434: - +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12435513/MAPREDUCE-434.5.patch against trunk revision 908321. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/315/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/315/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/315/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/315/console This message is automatically generated. 
local map-reduce job limited to single reducer -- Key: MAPREDUCE-434 URL: https://issues.apache.org/jira/browse/MAPREDUCE-434 Project: Hadoop Map/Reduce Issue Type: Bug Environment: local job tracker Reporter: Yoram Arnon Assignee: Aaron Kimball Priority: Minor Attachments: MAPREDUCE-434.2.patch, MAPREDUCE-434.3.patch, MAPREDUCE-434.4.patch, MAPREDUCE-434.5.patch, MAPREDUCE-434.patch when mapred.job.tracker is set to 'local', my setNumReduceTasks call is ignored, and the number of reduce tasks is set at 1. This prevents me from locally debugging my partition function, which tries to partition based on the number of reduce tasks. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
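Why the local runner's forced single reducer hides partitioner bugs can be seen from a typical hash partitioner: the assumed `getPartition` below mirrors the usual modulo-based logic (it is an illustrative stand-in, not Hadoop's HashPartitioner itself). With numReduceTasks fixed at 1, every key lands in partition 0 and the partitioning logic is never exercised.

```java
// With numReduceTasks == 1 (the local runner's forced setting), a
// modulo-based partitioner always returns 0, so partition-function bugs
// only surface once the job runs on a real cluster with several reducers.
public class PartitionDemo {
  static int getPartition(String key, int numReduceTasks) {
    // Mask off the sign bit so negative hash codes don't yield a
    // negative partition index.
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
  }

  public static void main(String[] args) {
    for (String k : new String[] {"alpha", "beta", "gamma"}) {
      // Locally everything collapses to 0; with 4 reducers the keys
      // spread out and partitioning behavior becomes observable.
      System.out.println(k + ": local=" + getPartition(k, 1)
          + " cluster=" + getPartition(k, 4));
    }
  }
}
```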
[jira] Updated: (MAPREDUCE-326) The lowest level map-reduce APIs should be byte oriented
[ https://issues.apache.org/jira/browse/MAPREDUCE-326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom White updated MAPREDUCE-326: Attachment: MAPREDUCE-326.pdf Here's a proposal for a binary API for review. The lowest level map-reduce APIs should be byte oriented Key: MAPREDUCE-326 URL: https://issues.apache.org/jira/browse/MAPREDUCE-326 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: eric baldeschwieler Attachments: MAPREDUCE-326.pdf As discussed here: https://issues.apache.org/jira/browse/HADOOP-1986#action_12551237 The templates, serializers and other complexities that allow map-reduce to use arbitrary types complicate the design and lead to lots of object creates and other overhead that a byte oriented design would not suffer. I believe the lowest level implementation of hadoop map-reduce should have byte string oriented APIs (for keys and values). This API would be more performant, simpler and more easily cross language. The existing API could be maintained as a thin layer on top of the leaner API. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1480) CombineFileRecordReader does not properly initialize child RecordReader
[ https://issues.apache.org/jira/browse/MAPREDUCE-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832784#action_12832784 ] Hadoop QA commented on MAPREDUCE-1480: -- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12435529/MAPREDUCE-1480.2.patch against trunk revision 908321. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/445/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/445/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/445/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/445/console This message is automatically generated. CombineFileRecordReader does not properly initialize child RecordReader --- Key: MAPREDUCE-1480 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1480 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Aaron Kimball Assignee: Aaron Kimball Attachments: MAPREDUCE-1480.2.patch, MAPREDUCE-1480.patch CombineFileRecordReader instantiates child RecordReader instances but never calls their initialize() method to give them the proper TaskAttemptContext. 
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
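The bug above is easiest to see as a pattern: a combining reader constructs its child readers but never forwards initialize() to them, so the children run without a TaskAttemptContext. The sketch below illustrates the fix with simplified stand-in types; ChildReader and CombiningReader are illustrative only, not the actual org.apache.hadoop.mapreduce classes (whose initialize() takes an InputSplit and a TaskAttemptContext).

```java
// Simplified stand-ins for the Hadoop types involved; illustrative only.
interface RecordReaderLike {
    void initialize(String taskAttemptContext); // real API: initialize(InputSplit, TaskAttemptContext)
    boolean isInitialized();
}

class ChildReader implements RecordReaderLike {
    private boolean initialized = false;
    public void initialize(String ctx) { initialized = true; }
    public boolean isInitialized() { return initialized; }
}

// The fix: the combining reader must forward initialize() to the child
// it constructs, rather than only constructing it.
class CombiningReader {
    private final RecordReaderLike child = new ChildReader();
    void initialize(String ctx) {
        child.initialize(ctx); // the call that was missing in the reported bug
    }
    boolean childInitialized() { return child.isInitialized(); }
}

public class Mapreduce1480Sketch {
    static boolean demo() {
        CombiningReader r = new CombiningReader();
        r.initialize("attempt_ctx");
        return r.childInitialized();
    }
    public static void main(String[] args) {
        System.out.println("child initialized: " + demo()); // prints "child initialized: true"
    }
}
```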
[jira] Created: (MAPREDUCE-1486) Configuration data should be preserved within the same MapTask
Configuration data should be preserved within the same MapTask -- Key: MAPREDUCE-1486 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1486 Project: Hadoop Map/Reduce Issue Type: Bug Components: task Reporter: Aaron Kimball Assignee: Aaron Kimball Map tasks involve a number of Contexts -- at least a TaskAttemptContext and a MapContext. These context objects contain a Configuration each; when one context is initialized, it initializes its own Configuration by deep-copying a previous Configuration. If one Context instance is used entirely prior to a second, more specific Context then the second Context should contain the configuration data initialized in the previous Context. This specifically affects the interaction between an InputFormat and its RecordReader instance(s). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
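To see why the deep copy loses data, consider this toy stand-in for Hadoop's Configuration copy constructor (the Conf class and the property key below are hypothetical, chosen only to show the pattern): a context that copies the configuration before an earlier component finishes writing to it never sees those writes.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for org.apache.hadoop.conf.Configuration; illustrative only.
class Conf {
    private final Map<String, String> props;
    Conf() { props = new HashMap<>(); }
    Conf(Conf other) { props = new HashMap<>(other.props); } // deep copy, like new Configuration(conf)
    void set(String k, String v) { props.put(k, v); }
    String get(String k) { return props.get(k); }
}

public class Mapreduce1486Sketch {
    public static void main(String[] args) {
        Conf taskAttemptConf = new Conf();

        // A second context deep-copies the configuration *before* the
        // InputFormat has run...
        Conf mapContextConf = new Conf(taskAttemptConf);

        // ...so anything the InputFormat sets afterwards is invisible to it.
        taskAttemptConf.set("inputformat.detected.codec", "gzip"); // hypothetical key

        System.out.println("seen by early copy: " + mapContextConf.get("inputformat.detected.codec")); // null

        // Copying (or sharing) only after the earlier context has finished
        // writing lets the data flow forward through the task.
        Conf laterCopy = new Conf(taskAttemptConf);
        System.out.println("seen by later copy: " + laterCopy.get("inputformat.detected.codec")); // gzip
    }
}
```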
[jira] Updated: (MAPREDUCE-1486) Configuration data should be preserved within the same MapTask
[ https://issues.apache.org/jira/browse/MAPREDUCE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Kimball updated MAPREDUCE-1486: - Attachment: MAPREDUCE-1486.patch Attaching patch which fixes this problem; now the same configuration data will flow forward through the map task. This patch also contains a test case that highlights the problem. Configuration data should be preserved within the same MapTask -- Key: MAPREDUCE-1486 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1486 Project: Hadoop Map/Reduce Issue Type: Bug Components: task Reporter: Aaron Kimball Assignee: Aaron Kimball Attachments: MAPREDUCE-1486.patch Map tasks involve a number of Contexts -- at least a TaskAttemptContext and a MapContext. These context objects contain a Configuration each; when one context is initialized, it initializes its own Configuration by deep-copying a previous Configuration. If one Context instance is used entirely prior to a second, more specific Context then the second Context should contain the configuration data initialized in the previous Context. This specifically affects the interaction between an InputFormat and its RecordReader instance(s). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1486) Configuration data should be preserved within the same MapTask
[ https://issues.apache.org/jira/browse/MAPREDUCE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Kimball updated MAPREDUCE-1486: - Status: Patch Available (was: Open) Configuration data should be preserved within the same MapTask -- Key: MAPREDUCE-1486 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1486 Project: Hadoop Map/Reduce Issue Type: Bug Components: task Reporter: Aaron Kimball Assignee: Aaron Kimball Attachments: MAPREDUCE-1486.patch Map tasks involve a number of Contexts -- at least a TaskAttemptContext and a MapContext. These context objects contain a Configuration each; when one context is initialized, it initializes its own Configuration by deep-copying a previous Configuration. If one Context instance is used entirely prior to a second, more specific Context then the second Context should contain the configuration data initialized in the previous Context. This specifically affects the interaction between an InputFormat and its RecordReader instance(s). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1341) Sqoop should have an option to create hive tables and skip the table import step
[ https://issues.apache.org/jira/browse/MAPREDUCE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832797#action_12832797 ] Aaron Kimball commented on MAPREDUCE-1341: -- +1; patch #6 looks good to me. If someone could commit this, that'd be superb. Sqoop should have an option to create hive tables and skip the table import step Key: MAPREDUCE-1341 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1341 Project: Hadoop Map/Reduce Issue Type: New Feature Components: contrib/sqoop Affects Versions: 0.22.0 Reporter: Leonid Furman Assignee: Leonid Furman Priority: Minor Fix For: 0.22.0 Attachments: MAPREDUCE-1341.2.patch, MAPREDUCE-1341.3.patch, MAPREDUCE-1341.4.patch, MAPREDUCE-1341.5.patch, MAPREDUCE-1341.6.patch, MAPREDUCE-1341.patch In case the client only needs to create tables in hive, it would be helpful if Sqoop had an optional parameter: --hive-create-only which would omit the time consuming table import step, generate hive create table statements and run them. If this feature seems useful, I can generate the patch. I have modified the Sqoop code and built it on my development machine, and it seems to be working well. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1341) Sqoop should have an option to create hive tables and skip the table import step
[ https://issues.apache.org/jira/browse/MAPREDUCE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832803#action_12832803 ] Leonid Furman commented on MAPREDUCE-1341: -- Thanks, Aaron! Sqoop should have an option to create hive tables and skip the table import step Key: MAPREDUCE-1341 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1341 Project: Hadoop Map/Reduce Issue Type: New Feature Components: contrib/sqoop Affects Versions: 0.22.0 Reporter: Leonid Furman Assignee: Leonid Furman Priority: Minor Fix For: 0.22.0 Attachments: MAPREDUCE-1341.2.patch, MAPREDUCE-1341.3.patch, MAPREDUCE-1341.4.patch, MAPREDUCE-1341.5.patch, MAPREDUCE-1341.6.patch, MAPREDUCE-1341.patch In case the client only needs to create tables in hive, it would be helpful if Sqoop had an optional parameter: --hive-create-only which would omit the time consuming table import step, generate hive create table statements and run them. If this feature seems useful, I can generate the patch. I have modified the Sqoop code and built it on my development machine, and it seems to be working well. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-326) The lowest level map-reduce APIs should be byte oriented
[ https://issues.apache.org/jira/browse/MAPREDUCE-326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom White updated MAPREDUCE-326: Attachment: MAPREDUCE-326-api.patch And an accompanying draft patch for the raw API classes. The lowest level map-reduce APIs should be byte oriented Key: MAPREDUCE-326 URL: https://issues.apache.org/jira/browse/MAPREDUCE-326 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: eric baldeschwieler Attachments: MAPREDUCE-326-api.patch, MAPREDUCE-326.pdf As discussed here: https://issues.apache.org/jira/browse/HADOOP-1986#action_12551237 The templates, serializers and other complexities that allow map-reduce to use arbitrary types complicate the design and lead to lots of object creates and other overhead that a byte oriented design would not suffer. I believe the lowest level implementation of hadoop map-reduce should have byte string oriented APIs (for keys and values). This API would be more performant, simpler and more easily cross language. The existing API could be maintained as a thin layer on top of the leaner API. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
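As a rough illustration of what a byte-oriented lowest-level API could look like, the sketch below passes keys and values as raw byte arrays, so the framework forces no deserialization and creates no typed key/value objects. These interfaces are hypothetical, not the classes in the attached patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical byte-oriented low-level map API; illustrative only.
interface RawMapper {
    // Keys and values are raw bytes; typed APIs could layer on top of this.
    void map(byte[] key, byte[] value, RawCollector out);
}

interface RawCollector {
    void collect(byte[] key, byte[] value);
}

public class Mapreduce326Sketch {
    // An identity mapper needs no type information or object creation at all.
    static final RawMapper IDENTITY = (k, v, out) -> out.collect(k, v);

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        IDENTITY.map("k1".getBytes(StandardCharsets.UTF_8),
                     "v1".getBytes(StandardCharsets.UTF_8),
                     (k, v) -> seen.add(new String(k, StandardCharsets.UTF_8) + "="
                                        + new String(v, StandardCharsets.UTF_8)));
        System.out.println(seen); // prints [k1=v1]
    }
}
```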
[jira] Commented: (MAPREDUCE-1220) Implement an in-cluster LocalJobRunner
[ https://issues.apache.org/jira/browse/MAPREDUCE-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832838#action_12832838 ] Tom White commented on MAPREDUCE-1220: -- bq. Most of the effort involved teasing out the framework in the MapTask and ReduceTask to allow several components such as MapOutputBuffer, ReduceValuesIterator etc. to be used as 'pluggable' components. Interesting. MAPREDUCE-326 has a proposal for making these components pluggable, which might make the work of this JIRA simpler. Implement an in-cluster LocalJobRunner -- Key: MAPREDUCE-1220 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1220 Project: Hadoop Map/Reduce Issue Type: New Feature Components: client, jobtracker Reporter: Arun C Murthy Assignee: Arun C Murthy Fix For: 0.22.0 Attachments: MAPREDUCE-1220_yhadoop20.patch Currently, very small map-reduce jobs suffer from latency issues due to overheads in Hadoop Map-Reduce such as scheduling, jvm startup etc. We've periodically tried to optimize all parts of the framework to achieve lower latencies. I'd like to turn the problem around a little bit. I propose we allow very small jobs to run as a single-task job with multiple maps and reduces, i.e. similar to our current implementation of the LocalJobRunner. Thus, under certain conditions (maybe user-set configuration, or if the input data is small, i.e. less than a DFS blocksize) we could launch a special task which will run all the maps in a serial manner, followed by the reduces. This would really help small jobs achieve significantly smaller latencies, thanks to less scheduling overhead, less jvm startup cost, and no shuffle over the network. This would be a huge benefit, especially on large clusters, to small Hive/Pig queries. Thoughts? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1341) Sqoop should have an option to create hive tables and skip the table import step
[ https://issues.apache.org/jira/browse/MAPREDUCE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom White updated MAPREDUCE-1341: - Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) I've just committed this. Thanks Leonid! Sqoop should have an option to create hive tables and skip the table import step Key: MAPREDUCE-1341 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1341 Project: Hadoop Map/Reduce Issue Type: New Feature Components: contrib/sqoop Affects Versions: 0.22.0 Reporter: Leonid Furman Assignee: Leonid Furman Priority: Minor Fix For: 0.22.0 Attachments: MAPREDUCE-1341.2.patch, MAPREDUCE-1341.3.patch, MAPREDUCE-1341.4.patch, MAPREDUCE-1341.5.patch, MAPREDUCE-1341.6.patch, MAPREDUCE-1341.patch In case the client only needs to create tables in hive, it would be helpful if Sqoop had an optional parameter: --hive-create-only which would omit the time consuming table import step, generate hive create table statements and run them. If this feature seems useful, I can generate the patch. I have modified the Sqoop code and built it on my development machine, and it seems to be working well. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1469) Sqoop should disable speculative execution in export
[ https://issues.apache.org/jira/browse/MAPREDUCE-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom White updated MAPREDUCE-1469: - Resolution: Fixed Fix Version/s: 0.22.0 Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) +1 I've just committed this. Thanks Aaron! Sqoop should disable speculative execution in export Key: MAPREDUCE-1469 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1469 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/sqoop Reporter: Aaron Kimball Assignee: Aaron Kimball Fix For: 0.22.0 Attachments: MAPREDUCE-1469.patch Concurrent writers of the same output shard may cause the database to try to insert duplicate primary keys concurrently. Not a good situation. Speculative execution should be forced off for this operation. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
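The fix described above amounts to a configuration change made before the export job is submitted. The sketch below uses java.util.Properties as a stand-in for Hadoop's job configuration; the key shown matches the 0.20-era property name (JobConf also offers a typed setMapSpeculativeExecution(false) setter), and the helper method name is hypothetical.

```java
import java.util.Properties;

// Illustrative sketch: Properties stands in for Hadoop's JobConf.
public class Mapreduce1469Sketch {
    // Hypothetical helper: build the configuration for an export job.
    static Properties configureExportJob() {
        Properties conf = new Properties();
        // Two speculative attempts of the same export map would both INSERT
        // the same rows, so the database could see duplicate primary keys.
        // Force speculative execution off for the (map-only) export job.
        conf.setProperty("mapred.map.tasks.speculative.execution", "false");
        return conf;
    }

    public static void main(String[] args) {
        System.out.println(configureExportJob()); // prints {mapred.map.tasks.speculative.execution=false}
    }
}
```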
[jira] Commented: (MAPREDUCE-1476) committer.needsTaskCommit should not be called for a task cleanup attempt
[ https://issues.apache.org/jira/browse/MAPREDUCE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832845#action_12832845 ]

Hadoop QA commented on MAPREDUCE-1476:
--

+1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12435549/patch-1476.txt against trunk revision 908321.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/316/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/316/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/316/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/316/console

This message is automatically generated.

committer.needsTaskCommit should not be called for a task cleanup attempt
-
Key: MAPREDUCE-1476
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1476
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 0.20.1
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
Fix For: 0.22.0
Attachments: patch-1476.txt

Currently, Task.done() calls committer.needsTaskCommit() to know whether it needs a commit or not. This need not be called for a task cleanup attempt, as no commit is required for a cleanup attempt. Due to MAPREDUCE-1409, we saw a case where a cleanup attempt went into COMMIT_PENDING state.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
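The control flow described above can be sketched as a simple guard: a cleanup attempt has nothing to commit, so needsTaskCommit() should never even be consulted for it. The types and method below are simplified stand-ins, not the real Task/OutputCommitter classes.

```java
// Simplified sketch of the Task.done() decision; illustrative only.
public class Mapreduce1476Sketch {

    interface Committer {
        boolean needsTaskCommit();
    }

    // Returns the state the attempt should report when it finishes.
    static String done(boolean isCleanupAttempt, Committer committer) {
        if (isCleanupAttempt) {
            // Skip the commit check entirely; without this guard the cleanup
            // attempt could wrongly end up in COMMIT_PENDING (the case seen
            // via MAPREDUCE-1409).
            return "SUCCEEDED";
        }
        return committer.needsTaskCommit() ? "COMMIT_PENDING" : "SUCCEEDED";
    }

    public static void main(String[] args) {
        Committer alwaysCommit = () -> true;
        System.out.println(done(true, alwaysCommit));  // SUCCEEDED: cleanup never commits
        System.out.println(done(false, alwaysCommit)); // COMMIT_PENDING
    }
}
```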