[jira] Updated: (MAPREDUCE-1813) NPE in PipeMapred.MRErrorThread
[ https://issues.apache.org/jira/browse/MAPREDUCE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod K V updated MAPREDUCE-1813: - Status: Patch Available (was: Open) Hadoop Flags: [Reviewed] +1 for the patch. Submitting to Hudson. > NPE in PipeMapred.MRErrorThread > --- > > Key: MAPREDUCE-1813 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1813 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.20.1 >Reporter: Amareshwari Sriramadasu >Assignee: Ravi Gummadi > Fix For: 0.22.0 > > Attachments: 1813.patch, 1813.v1.2.patch, 1813.v1.3.patch, > 1813.v1.4.patch, 1813.v1.patch > > > Some reduce tasks fail with following NPE > java.lang.RuntimeException: java.lang.NullPointerException > at > org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325) > at > org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:540) > at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137) > at > org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:412) > at org.apache.hadoop.mapred.Child.main(Child.java:159) > Caused by: java.lang.NullPointerException >at > org.apache.hadoop.streaming.PipeMapRed$MRErrorThread.setStatus(PipeMapRed.java:517) > at > org.apache.hadoop.streaming.PipeMapRed$MRErrorThread.run(PipeMapRed.java:449) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
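The stack trace above points at MRErrorThread.setStatus dereferencing something that may not yet be initialized when the error thread runs. A minimal sketch of the null-guard pattern such a fix typically uses — the class and field names here are illustrative, not the actual streaming code:

```java
public class StatusReporter {
    // May only be set after the error thread has already started (illustrative).
    private volatile String statusPrefix;

    public void setStatusPrefix(String prefix) {
        this.statusPrefix = prefix;
    }

    // Guard against the not-yet-initialized field instead of dereferencing it.
    public String setStatus(String line) {
        String prefix = statusPrefix; // read the volatile once to avoid races
        if (prefix == null) {
            return line; // fall back to the raw line
        }
        return prefix + ": " + line;
    }
}
```

Reading the volatile field into a local once means the null check and the use cannot see two different values.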
[jira] Commented: (MAPREDUCE-1854) [herriot] Automate health script system test
[ https://issues.apache.org/jira/browse/MAPREDUCE-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877732#action_12877732 ] Konstantin Boudnik commented on MAPREDUCE-1854: --- Looks like the license of JSch shouldn't be an issue, because the BSD license is generally acceptable for Apache software (think Ant). > [herriot] Automate health script system test > > > Key: MAPREDUCE-1854 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1854 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: test > Environment: Herriot framework >Reporter: Balaji Rajagopalan >Assignee: Balaji Rajagopalan > Attachments: health_script_5.txt > > Original Estimate: 120h > Remaining Estimate: 120h > > 1. There are three scenarios: first, induce an error from the health script and > verify that the task tracker is blacklisted. > 2. Make the health script time out and verify the task tracker is blacklisted. > 3. Make an error in the health script path and make sure the task tracker > stays healthy. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1854) [herriot] Automate health script system test
[ https://issues.apache.org/jira/browse/MAPREDUCE-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877730#action_12877730 ] Konstantin Boudnik commented on MAPREDUCE-1854: --- Some comments: - changing visibility from 'package private' to 'public' for testing purposes isn't advisable. Consider injecting a getter with public access instead - same here: {noformat} - static class TaskTrackerHealthStatus implements Writable { + public static class TaskTrackerHealthStatus implements Writable { {noformat} - AbstractTestCase sounds like a utility methods' class to me. Unless a common parent is really required from some design perspective, I wouldn't recommend clogging the class hierarchy with unnecessary inheritance. - using hard-coded paths like {{/tmp/}} restricts the tests' applicability. Would it be better to use a configurable location, i.e. the mapred data directory or something? - JUnit v3 imports: +import junit.framework.Assert - using {{StringBuffer}} to append a couple of tokens and convert the result to a new {{String}} seems excessive. Why not use {{String}}? {noformat} +StringBuffer localFile = new StringBuffer(); +localFile.append(scriptDir).append(File.separator).append(scriptName); +cmdArgs.add(localFile.toString()); {noformat} - remove commented-out lines of code which seem like debugging leftovers - try to generate the patch with '--no-prefix' to avoid extra prefixes in the file paths - this JavaDoc seems incomplete {noformat} + * This directly calls the JobTracker public with no modifications + * @param trackerID uniquely indentifies the task tracker + * @return + * @throws IOException is thrown in case of RPC error {noformat} - there are some unused imports - are the changes in AbstractDaemonCluster.java related to this patch? - looks like the change in DaemonProtocolAspect.aj is unrelated to this patch, isn't it? 
;) - same about ClusterProcessManager, HadoopDaemonRemoteCluster, and RemoteProcess. As you're clearly using Git (this isn't SVN - it is a great SCM system!) for the development work, try to have a separate branch for any JIRA you're working on. This way you'll avoid any mess and accidental inclusion of irrelevant files. - writing a new script every time we need to run some sort of ssh command looks bad. I have a couple of alternative thoughts: ** using a pure Java ssh client like JSch ** creating a wrapper around the ssh command using the Shell class (in case the above is impossible because of license issues or something) I think this is enough for a start :) > [herriot] Automate health script system test > > > Key: MAPREDUCE-1854 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1854 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: test > Environment: Herriot framework >Reporter: Balaji Rajagopalan >Assignee: Balaji Rajagopalan > Attachments: health_script_5.txt > > Original Estimate: 120h > Remaining Estimate: 120h > > 1. There are three scenarios: first, induce an error from the health script and > verify that the task tracker is blacklisted. > 2. Make the health script time out and verify the task tracker is blacklisted. > 3. Make an error in the health script path and make sure the task tracker > stays healthy. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
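The last suggestion above (a wrapper instead of a new shell script per ssh call) can be sketched without any third-party dependency. This is an illustrative sketch using plain ProcessBuilder; JSch or Hadoop's Shell class are the alternatives the review actually proposes, and the class and method names here are hypothetical:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical thin wrapper around the ssh command line.
public class SshCommand {

    // Build the argument vector; separated out so it can be tested
    // without actually opening a connection.
    static List<String> buildArgv(String user, String host, String... command) {
        List<String> argv = new ArrayList<>(Arrays.asList("ssh", user + "@" + host));
        argv.addAll(Arrays.asList(command));
        return argv;
    }

    // Run the remote command and return its combined stdout/stderr.
    public static String run(String user, String host, String... command)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(buildArgv(user, host, command))
                .redirectErrorStream(true).start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        if (p.waitFor() != 0) {
            throw new IOException("ssh exited non-zero:\n" + out);
        }
        return out.toString();
    }
}
```

Keeping the argv construction in one place is the point: tests call one method instead of templating a new script per scenario.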
[jira] Updated: (MAPREDUCE-1813) NPE in PipeMapred.MRErrorThread
[ https://issues.apache.org/jira/browse/MAPREDUCE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Gummadi updated MAPREDUCE-1813: Attachment: 1813.v1.4.patch Attaching a new patch, as I missed replacing one occurrence of the readOutput() call in the earlier patch. > NPE in PipeMapred.MRErrorThread > --- > > Key: MAPREDUCE-1813 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1813 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.20.1 >Reporter: Amareshwari Sriramadasu >Assignee: Ravi Gummadi > Fix For: 0.22.0 > > Attachments: 1813.patch, 1813.v1.2.patch, 1813.v1.3.patch, > 1813.v1.4.patch, 1813.v1.patch > > > Some reduce tasks fail with following NPE > java.lang.RuntimeException: java.lang.NullPointerException > at > org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325) > at > org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:540) > at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137) > at > org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:412) > at org.apache.hadoop.mapred.Child.main(Child.java:159) > Caused by: java.lang.NullPointerException >at > org.apache.hadoop.streaming.PipeMapRed$MRErrorThread.setStatus(PipeMapRed.java:517) > at > org.apache.hadoop.streaming.PipeMapRed$MRErrorThread.run(PipeMapRed.java:449) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1778) CompletedJobStatusStore initialization should fail if {mapred.job.tracker.persist.jobstatus.dir} is unwritable
[ https://issues.apache.org/jira/browse/MAPREDUCE-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877728#action_12877728 ] Hadoop QA commented on MAPREDUCE-1778: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12446825/mapred-1778-4.patch against trunk revision 953490. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/233/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/233/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/233/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/233/console This message is automatically generated. 
> CompletedJobStatusStore initialization should fail if > {mapred.job.tracker.persist.jobstatus.dir} is unwritable > -- > > Key: MAPREDUCE-1778 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1778 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobtracker >Reporter: Amar Kamat >Assignee: Krishna Ramachandran > Attachments: mapred-1778-1.patch, mapred-1778-2.patch, > mapred-1778-3.patch, mapred-1778-4.patch, mapred-1778.patch > > > If {mapred.job.tracker.persist.jobstatus.dir} points to an unwritable > location or mkdir of {mapred.job.tracker.persist.jobstatus.dir} fails, then > CompletedJobStatusStore silently ignores the failure and disables > CompletedJobStatusStore. Ideally the JobTracker should bail out early > indicating a misconfiguration. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1606) TestJobACLs may timeout as there are no slots for launching JOB_CLEANUP task
[ https://issues.apache.org/jira/browse/MAPREDUCE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod K V updated MAPREDUCE-1606: - Status: Patch Available (was: Open) Hadoop Flags: [Reviewed] > TestJobACLs may timeout as there are no slots for launching JOB_CLEANUP task > > > Key: MAPREDUCE-1606 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1606 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Affects Versions: 0.22.0 >Reporter: Ravi Gummadi >Assignee: Ravi Gummadi > Fix For: 0.22.0 > > Attachments: 1606.patch, 1606.v1.1.patch, 1606.v1.patch, > 1606.v2.patch, MAPREDUCE-1606-20100610.txt, MR1606.20S.1.patch, > MR1606.20S.patch, MR1606.patch > > > TestJobACLs may timeout as there are no slots for launching the JOB_CLEANUP task, > because a MiniMRCluster with 0 TaskTrackers is started in the test. In trunk, > we can set the config property mapreduce.job.committer.setup.cleanup.needed > to false so that we don't get into this issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1829) JobInProgress.findSpeculativeTask should use min() to find the candidate instead of sort()
[ https://issues.apache.org/jira/browse/MAPREDUCE-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877719#action_12877719 ] Ravi Gummadi commented on MAPREDUCE-1829: - I think slowestTIP is more meaningful because we are launching a speculative attempt for the slowest task only. No? Why do we call the TIP selected (in this method) "latestTIP"? > JobInProgress.findSpeculativeTask should use min() to find the candidate > instead of sort() > -- > > Key: MAPREDUCE-1829 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1829 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobtracker >Affects Versions: 0.22.0 >Reporter: Scott Chen >Assignee: Scott Chen > Fix For: 0.22.0 > > Attachments: MAPREDUCE-1829-20100610.txt, MAPREDUCE-1829.txt > > > findSpeculativeTask needs only one candidate to speculate, so it does not need > to sort the whole list. It may look OK, but someone can still submit big jobs > with small slow-task thresholds. In this case, this sorting becomes expensive. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
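The optimization in the issue title can be shown with plain collections: selecting a single extreme element is an O(n) Collections.min scan rather than an O(n log n) sort. A minimal sketch under the assumption that candidates are ordered by progress rate; the names are illustrative, not the actual JobInProgress code:

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SpeculationPick {
    // One O(n) pass: the slowest candidate has the minimum progress rate,
    // so there is no need to sort the whole candidate list.
    static double pickSlowest(List<Double> progressRates) {
        return Collections.min(progressRates, Comparator.naturalOrder());
    }
}
```

The same Comparator that drove the old sort can be reused directly as the min() ordering, so behavior stays identical for the chosen candidate.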
[jira] Commented: (MAPREDUCE-1606) TestJobACLs may timeout as there are no slots for launching JOB_CLEANUP task
[ https://issues.apache.org/jira/browse/MAPREDUCE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877717#action_12877717 ] Ravi Gummadi commented on MAPREDUCE-1606: - Patch looks good. Reduced the execution time to 8 sec. +1 > TestJobACLs may timeout as there are no slots for launching JOB_CLEANUP task > > > Key: MAPREDUCE-1606 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1606 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Affects Versions: 0.22.0 >Reporter: Ravi Gummadi >Assignee: Ravi Gummadi > Fix For: 0.22.0 > > Attachments: 1606.patch, 1606.v1.1.patch, 1606.v1.patch, > 1606.v2.patch, MAPREDUCE-1606-20100610.txt, MR1606.20S.1.patch, > MR1606.20S.patch, MR1606.patch > > > TestJobACLs may timeout as there are no slots for launching the JOB_CLEANUP task, > because a MiniMRCluster with 0 TaskTrackers is started in the test. In trunk, > we can set the config property mapreduce.job.committer.setup.cleanup.needed > to false so that we don't get into this issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1853) MultipleOutputs does not cache TaskAttemptContext
[ https://issues.apache.org/jira/browse/MAPREDUCE-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877703#action_12877703 ] Amareshwari Sriramadasu commented on MAPREDUCE-1853: bq. so for every reduce call it creates a new Job instance ...which creates a new LocalJobRunner. Though this will be fixed by MAPREDUCE-1505, the caching introduced in the patch is a good optimization. Can you regenerate the patch with the --no-prefix option, so that it can be applied with the command "patch -p0 < patch-file"? > MultipleOutputs does not cache TaskAttemptContext > - > > Key: MAPREDUCE-1853 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1853 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 0.21.0 > Environment: OSX 10.6 > java6 >Reporter: Torsten Curdt >Priority: Critical > Attachments: cache-task-attempts.diff > > > In MultipleOutputs there is > [code] > private TaskAttemptContext getContext(String nameOutput) throws IOException { > // The following trick leverages the instantiation of a record writer via > // the job thus supporting arbitrary output formats. > Job job = new Job(context.getConfiguration()); > job.setOutputFormatClass(getNamedOutputFormatClass(context, nameOutput)); > job.setOutputKeyClass(getNamedOutputKeyClass(context, nameOutput)); > job.setOutputValueClass(getNamedOutputValueClass(context, nameOutput)); > TaskAttemptContext taskContext = > new TaskAttemptContextImpl(job.getConfiguration(), > context.getTaskAttemptID()); > return taskContext; > } > [code] > so for every reduce call it creates a new Job instance ...which creates a new > LocalJobRunner. > That does not sound like a good idea. > You end up with a flood of "jvm.JvmMetrics: Cannot initialize JVM Metrics > with processName=JobTracker, sessionId= - already initialized" > This should probably also be added to 0.22. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
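The caching the comment endorses can be sketched generically: construct the expensive per-named-output context once and reuse it on later reduce calls. This is a simplified, hypothetical stand-in — the real code builds a Hadoop Job and TaskAttemptContextImpl, whereas here a plain class is cached by output name:

```java
import java.util.HashMap;
import java.util.Map;

public class NamedOutputContexts {
    static int constructions = 0; // counts how often the expensive path runs

    // Stand-in for the TaskAttemptContext that is costly to build
    // (in the real code, building it also spins up a LocalJobRunner).
    static final class Context {
        final String nameOutput;
        Context(String nameOutput) {
            this.nameOutput = nameOutput;
            constructions++;
        }
    }

    private final Map<String, Context> cache = new HashMap<>();

    // Build once per named output instead of once per record.
    Context getContext(String nameOutput) {
        return cache.computeIfAbsent(nameOutput, Context::new);
    }
}
```

With the cache keyed by named output, a job with two named outputs constructs two contexts total, regardless of how many records the reducer processes.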
[jira] Commented: (MAPREDUCE-1778) CompletedJobStatusStore initialization should fail if {mapred.job.tracker.persist.jobstatus.dir} is unwritable
[ https://issues.apache.org/jira/browse/MAPREDUCE-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877697#action_12877697 ] Amar Kamat commented on MAPREDUCE-1778: --- bq. I only see FileStatus.isDir() Look at [FileStatus.java|http://tinyurl.com/2uaew2a] from hadoop-common. Comments: # I still see a few lines of code crossing the 80-column margin. # Please add comments. For example {code} + } else { +FileStatus stat = fs.getFileStatus(path); +FsPermission actual = stat.getPermission(); +if (!stat.isDir()) + throw new DiskErrorException("not a directory: " + + path.toString()); +FsAction user = actual.getUserAction(); +if (!user.implies(FsAction.READ)) + throw new DiskErrorException("directory is not readable: " + + path.toString()); +if (!user.implies(FsAction.WRITE)) + throw new DiskErrorException("directory is not writable: " + + path.toString()); {code} It would be nice to explain why it's done this way. Testcase: # It would be better if we test the following scenarios: ## the directory doesn't exist and mkdir() fails [example: the parent folder doesn't have write permissions] ## the directory exists but has no read/write permissions ## the completed-job status store is a file and not a directory I think the current changes in the patch test scenario #2. Can you please add tests for scenario #1 and scenario #3? # If the JobTracker gets started, it should be shut down. # For a newly created job-status-store dir, it would be nice to validate its permissions (i.e. JOB_STATUS_STORE_DIR_PERMISSION). 
> CompletedJobStatusStore initialization should fail if > {mapred.job.tracker.persist.jobstatus.dir} is unwritable > -- > > Key: MAPREDUCE-1778 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1778 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobtracker >Reporter: Amar Kamat >Assignee: Krishna Ramachandran > Attachments: mapred-1778-1.patch, mapred-1778-2.patch, > mapred-1778-3.patch, mapred-1778-4.patch, mapred-1778.patch > > > If {mapred.job.tracker.persist.jobstatus.dir} points to an unwritable > location or mkdir of {mapred.job.tracker.persist.jobstatus.dir} fails, then > CompletedJobStatusStore silently ignores the failure and disables > CompletedJobStatusStore. Ideally the JobTracker should bail out early > indicating a misconfiguration. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
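The permission checks in the quoted diff have a close analogue in plain Java, which also covers the failure modes the review asks to test (not a directory, unreadable, unwritable). This is an illustrative sketch only, not the actual FsAction-based Hadoop code:

```java
import java.io.File;
import java.io.IOException;

public class DirValidator {
    // Fail fast on a misconfigured store dir, as the issue suggests
    // the JobTracker should, instead of silently disabling the store.
    static void checkDir(File path) throws IOException {
        if (!path.isDirectory()) {
            throw new IOException("not a directory: " + path);
        }
        if (!path.canRead()) {
            throw new IOException("directory is not readable: " + path);
        }
        if (!path.canWrite()) {
            throw new IOException("directory is not writable: " + path);
        }
    }
}
```

Note the Hadoop version must go through FsPermission/FsAction rather than File, because the store may live on HDFS where local-file checks do not apply.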
[jira] Updated: (MAPREDUCE-1831) Delete the co-located replicas when raiding file
[ https://issues.apache.org/jira/browse/MAPREDUCE-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Chen updated MAPREDUCE-1831: -- Attachment: MAPREDUCE-1831.20100610.txt Update: now looking at blocks on the same stripe only, and using FSNamesystem instead of FileSystem to get block information. > Delete the co-located replicas when raiding file > > > Key: MAPREDUCE-1831 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1831 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: contrib/raid >Affects Versions: 0.22.0 >Reporter: Scott Chen >Assignee: Scott Chen > Fix For: 0.22.0 > > Attachments: MAPREDUCE-1831.20100610.txt, MAPREDUCE-1831.txt, > MAPREDUCE-1831.v1.1.txt > > > In raid, it is good to have the blocks on the same stripe located on > different machines. > This way, when one machine is down, it does not break two blocks on the stripe. > By doing this, we can decrease the block error probability in raid from > O(p^3) to O(p^4), which can be a huge improvement (where p is the replica-missing > probability). > One way to do this is to add a new BlockPlacementPolicy which > deletes the replicas that are co-located. > So when raiding the file, we can make the remaining replicas live on > different machines. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
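The description's O(p^3) to O(p^4) claim is easy to put in numbers: the improvement factor is 1/p, so for a replica-missing probability of p = 0.01 the per-stripe error probability drops by about 100x. A quick check (the value of p here is assumed for illustration, not taken from the issue):

```java
public class RaidErrorProb {
    public static void main(String[] args) {
        double p = 0.01; // illustrative replica-missing probability
        double before = Math.pow(p, 3); // O(p^3) per the description
        double after = Math.pow(p, 4);  // O(p^4) after the change
        System.out.printf("before=%.1e after=%.1e improvement=%.0fx%n",
                before, after, before / after);
    }
}
```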
[jira] Updated: (MAPREDUCE-1831) Delete the co-located replicas when raiding file
[ https://issues.apache.org/jira/browse/MAPREDUCE-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Chen updated MAPREDUCE-1831: -- Status: Patch Available (was: Open) > Delete the co-located replicas when raiding file > > > Key: MAPREDUCE-1831 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1831 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: contrib/raid >Affects Versions: 0.22.0 >Reporter: Scott Chen >Assignee: Scott Chen > Fix For: 0.22.0 > > Attachments: MAPREDUCE-1831.20100610.txt, MAPREDUCE-1831.txt, > MAPREDUCE-1831.v1.1.txt > > > In raid, it is good to have the blocks on the same stripe located on > different machines. > This way, when one machine is down, it does not break two blocks on the stripe. > By doing this, we can decrease the block error probability in raid from > O(p^3) to O(p^4), which can be a huge improvement (where p is the replica-missing > probability). > One way to do this is to add a new BlockPlacementPolicy which > deletes the replicas that are co-located. > So when raiding the file, we can make the remaining replicas live on > different machines. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1018) Document changes to the memory management and scheduling model
[ https://issues.apache.org/jira/browse/MAPREDUCE-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hemanth Yamijala updated MAPREDUCE-1018: Status: Patch Available (was: Open) > Document changes to the memory management and scheduling model > -- > > Key: MAPREDUCE-1018 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1018 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: documentation >Affects Versions: 0.21.0 >Reporter: Hemanth Yamijala >Assignee: Hemanth Yamijala >Priority: Blocker > Fix For: 0.21.0 > > Attachments: MAPRED-1018-1.patch, MAPRED-1018-2.patch, > MAPRED-1018-3.patch, MAPRED-1018-4.patch.txt, MAPRED-1018-5.patch.txt, > MAPRED-1018-6.patch.txt, MAPRED-1018-7.patch.txt, MAPRED-1018-8.patch.txt, > MAPRED-1018-9.patch.txt, MAPRED-1018-commons.patch > > > There were changes done to the configuration, monitoring and scheduling of > high-RAM jobs. This must be documented in mapred-default.xml and also in > the Forrest documentation. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1018) Document changes to the memory management and scheduling model
[ https://issues.apache.org/jira/browse/MAPREDUCE-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hemanth Yamijala updated MAPREDUCE-1018: Attachment: MAPRED-1018-9.patch.txt Attaching a new patch that merges with trunk and incorporates Vinod's review comments. One of the links will work only when the corresponding changes to cluster_setup are committed to Common. This is expected. > Document changes to the memory management and scheduling model > -- > > Key: MAPREDUCE-1018 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1018 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: documentation >Affects Versions: 0.21.0 >Reporter: Hemanth Yamijala >Assignee: Hemanth Yamijala >Priority: Blocker > Fix For: 0.21.0 > > Attachments: MAPRED-1018-1.patch, MAPRED-1018-2.patch, > MAPRED-1018-3.patch, MAPRED-1018-4.patch.txt, MAPRED-1018-5.patch.txt, > MAPRED-1018-6.patch.txt, MAPRED-1018-7.patch.txt, MAPRED-1018-8.patch.txt, > MAPRED-1018-9.patch.txt, MAPRED-1018-commons.patch > > > There were changes done to the configuration, monitoring and scheduling of > high-RAM jobs. This must be documented in mapred-default.xml and also in > the Forrest documentation. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1778) CompletedJobStatusStore initialization should fail if {mapred.job.tracker.persist.jobstatus.dir} is unwritable
[ https://issues.apache.org/jira/browse/MAPREDUCE-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Ramachandran updated MAPREDUCE-1778: Status: Patch Available (was: Open) From Arun's review (since the last patch): check read/write permissions using FsAction.getUserAction(). Per Amar's recommendation: modified TestJobStatusPersistency.java > CompletedJobStatusStore initialization should fail if > {mapred.job.tracker.persist.jobstatus.dir} is unwritable > -- > > Key: MAPREDUCE-1778 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1778 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobtracker >Reporter: Amar Kamat >Assignee: Krishna Ramachandran > Attachments: mapred-1778-1.patch, mapred-1778-2.patch, > mapred-1778-3.patch, mapred-1778-4.patch, mapred-1778.patch > > > If {mapred.job.tracker.persist.jobstatus.dir} points to an unwritable > location or mkdir of {mapred.job.tracker.persist.jobstatus.dir} fails, then > CompletedJobStatusStore silently ignores the failure and disables > CompletedJobStatusStore. Ideally the JobTracker should bail out early > indicating a misconfiguration. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1778) CompletedJobStatusStore initialization should fail if {mapred.job.tracker.persist.jobstatus.dir} is unwritable
[ https://issues.apache.org/jira/browse/MAPREDUCE-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Ramachandran updated MAPREDUCE-1778: Attachment: mapred-1778-4.patch Update: includes Arun's changes; removed the new test and modified the existing TestJobStatusPersistency.java (per Amar). > CompletedJobStatusStore initialization should fail if > {mapred.job.tracker.persist.jobstatus.dir} is unwritable > -- > > Key: MAPREDUCE-1778 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1778 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobtracker >Reporter: Amar Kamat >Assignee: Krishna Ramachandran > Attachments: mapred-1778-1.patch, mapred-1778-2.patch, > mapred-1778-3.patch, mapred-1778-4.patch, mapred-1778.patch > > > If {mapred.job.tracker.persist.jobstatus.dir} points to an unwritable > location or mkdir of {mapred.job.tracker.persist.jobstatus.dir} fails, then > CompletedJobStatusStore silently ignores the failure and disables > CompletedJobStatusStore. Ideally the JobTracker should bail out early > indicating a misconfiguration. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1831) Delete the co-located replicas when raiding file
[ https://issues.apache.org/jira/browse/MAPREDUCE-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Chen updated MAPREDUCE-1831: -- Description: In raid, it is good to have the blocks on the same stripe located on different machines. This way, when one machine is down, it does not break two blocks on the stripe. By doing this, we can decrease the block error probability in raid from O(p^3) to O(p^4), which can be a huge improvement (where p is the replica-missing probability). One way to do this is to add a new BlockPlacementPolicy which deletes the replicas that are co-located. So when raiding the file, we can make the remaining replicas live on different machines. was: In raid, it is good to have the blocks on the same stripe located on different machines. This way, when one machine is down, it does not break two blocks on the stripe. By doing this, we can decrease the block error probability in raid from O(p^3) to O(p^4), which can be a huge improvement. One way to do this is to add a new BlockPlacementPolicy which deletes the replicas that are co-located. So when raiding the file, we can make the remaining replicas live on different machines. > Delete the co-located replicas when raiding file > > > Key: MAPREDUCE-1831 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1831 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: contrib/raid >Affects Versions: 0.22.0 >Reporter: Scott Chen >Assignee: Scott Chen > Fix For: 0.22.0 > > Attachments: MAPREDUCE-1831.txt, MAPREDUCE-1831.v1.1.txt > > > In raid, it is good to have the blocks on the same stripe located on > different machines. > This way, when one machine is down, it does not break two blocks on the stripe. > By doing this, we can decrease the block error probability in raid from > O(p^3) to O(p^4), which can be a huge improvement (where p is the replica-missing > probability). 
> One way to do this is to add a new BlockPlacementPolicy which > deletes the replicas that are co-located. > So when raiding the file, we can make the remaining replicas live on > different machines. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1831) Delete the co-located replicas when raiding file
[ https://issues.apache.org/jira/browse/MAPREDUCE-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Chen updated MAPREDUCE-1831: -- Status: Open (was: Patch Available) > Delete the co-located replicas when raiding file > > > Key: MAPREDUCE-1831 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1831 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: contrib/raid >Affects Versions: 0.22.0 >Reporter: Scott Chen >Assignee: Scott Chen > Fix For: 0.22.0 > > Attachments: MAPREDUCE-1831.txt, MAPREDUCE-1831.v1.1.txt > > > In raid, it is good to have the blocks on the same stripe located on > different machines. > This way, when one machine is down, it does not break two blocks on the stripe. > By doing this, we can decrease the block error probability in raid from > O(p^3) to O(p^4), which can be a huge improvement. > One way to do this is to add a new BlockPlacementPolicy which > deletes the replicas that are co-located. > So when raiding the file, we can make the remaining replicas live on > different machines. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1831) Delete the co-located replicas when raiding file
[ https://issues.apache.org/jira/browse/MAPREDUCE-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877676#action_12877676 ] dhruba borthakur commented on MAPREDUCE-1831: - > Is there any block placement policy to make sure that parity does not land on > the same node as its source file? This is work in progress and we will post it to a new JIRA soon. > Delete the co-located replicas when raiding file > > > Key: MAPREDUCE-1831 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1831 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: contrib/raid >Affects Versions: 0.22.0 >Reporter: Scott Chen >Assignee: Scott Chen > Fix For: 0.22.0 > > Attachments: MAPREDUCE-1831.txt, MAPREDUCE-1831.v1.1.txt > > > In raid, it is good to have the blocks on the same stripe located on > different machines. > This way, when one machine is down, it does not break two blocks on the stripe. > By doing this, we can decrease the block error probability in raid from > O(p^3) to O(p^4), which can be a huge improvement. > One way to do this is to add a new BlockPlacementPolicy which > deletes the replicas that are co-located. > So when raiding the file, we can make the remaining replicas live on > different machines. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1831) Delete the co-located replicas when raiding file
[ https://issues.apache.org/jira/browse/MAPREDUCE-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877674#action_12877674 ] Wittawat Tantisiriroj commented on MAPREDUCE-1831: -- Is there any block placement policy to make sure that parity does not land on the same node as its source file? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
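The O(p^3) to O(p^4) claim in the issue description can be sanity-checked with a rough independence model. This is only an illustrative sketch: the failure model, stripe geometry, and method names below are assumptions for the example, not contrib/raid's actual math.

```java
// Rough, illustrative model of the block-loss probabilities discussed above.
// Assumes independent machine failures with probability p; the single-parity
// stripe geometry here is hypothetical, not taken from contrib/raid.
class RaidLossModel {
    // Plain 3-way replication: a block is lost only if all 3 replicas fail.
    static double replicationLoss(double p) {
        return Math.pow(p, 3);
    }

    // Raid with one parity block per stripe: losing a block also requires at
    // least one of the other (stripeLen - 1) stripe members to fail, pushing
    // the leading term to O(p^4) -- but only if stripe members do NOT share
    // machines. Co-located replicas collapse this back toward O(p^3), which
    // is why the issue proposes deleting co-located replicas.
    static double raidLoss(double p, int stripeLen) {
        return Math.pow(p, 3) * (1 - Math.pow(1 - p, stripeLen - 1));
    }
}
```

For small p the raid term behaves like (stripeLen - 1) * p^4, consistent with the O(p^4) figure quoted in the description.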
[jira] Updated: (MAPREDUCE-1856) Extract a subset of tests for smoke (DOA) validation
[ https://issues.apache.org/jira/browse/MAPREDUCE-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Boudnik updated MAPREDUCE-1856: -- Attachment: MAPREDUCE-1856.patch This patch introduces an extra target which allows executing all tests excluding the commit and smoke sets. Also, about 15 minutes' worth of tests is added. > Extract a subset of tests for smoke (DOA) validation > > > Key: MAPREDUCE-1856 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1856 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: build >Affects Versions: 0.21.0 >Reporter: Konstantin Boudnik >Assignee: Konstantin Boudnik > Attachments: MAPREDUCE-1856.patch > > > Similar to that of HDFS-1199 for MapReduce. > Adds an ability to run up to 30 minutes of tests to 'smoke' the MapReduce > build (i.e. find possible issues faster than the full test cycle does). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1856) Extract a subset of tests for smoke (DOA) validation
Extract a subset of tests for smoke (DOA) validation Key: MAPREDUCE-1856 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1856 Project: Hadoop Map/Reduce Issue Type: Improvement Components: build Affects Versions: 0.21.0 Reporter: Konstantin Boudnik Assignee: Konstantin Boudnik Similar to that of HDFS-1199 for MapReduce. Adds an ability to run up to 30 minutes of tests to 'smoke' the MapReduce build (i.e. find possible issues faster than the full test cycle does). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1855) refreshSuperUserGroupsConfiguration for MR should use server side configuration for the refresh (for HADOOP-6815)
[ https://issues.apache.org/jira/browse/MAPREDUCE-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boris Shkolnik updated MAPREDUCE-1855: -- Attachment: MAPREDUCE-1855-1.patch > refreshSuperUserGroupsConfiguration for MR should use server side > configuration for the refresh (for HADOOP-6815) > - > > Key: MAPREDUCE-1855 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1855 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Boris Shkolnik >Assignee: Boris Shkolnik > Attachments: MAPREDUCE-1855-1.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1849) Implement a FlumeJava-like library for operations over parallel collections using Hadoop MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877637#action_12877637 ] Jake Mannix commented on MAPREDUCE-1849: While I agree that doing cool Hadoop work in functional JVM languages like Scala and Clojure is a great idea, I think part of the point of this paper's findings (and the point of this particular JIRA ticket) concerns a simple, object-oriented Java API with "distributed primitives" that the typical Java programmer can easily understand and integrate with their current code with minimal effort. > Implement a FlumeJava-like library for operations over parallel collections > using Hadoop MapReduce > -- > > Key: MAPREDUCE-1849 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1849 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Reporter: Jeff Hammerbacher > > The API used internally at Google is described in great detail at > http://portal.acm.org/citation.cfm?id=1806596.1806638. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1033) Resolve location of scripts and configuration files after project split
[ https://issues.apache.org/jira/browse/MAPREDUCE-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom White updated MAPREDUCE-1033: - Status: Resolved (was: Patch Available) Resolution: Fixed I've just committed this. > Resolve location of scripts and configuration files after project split > --- > > Key: MAPREDUCE-1033 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1033 > Project: Hadoop Map/Reduce > Issue Type: Sub-task >Affects Versions: 0.21.0 >Reporter: Vinod K V >Assignee: Tom White >Priority: Blocker > Fix For: 0.21.0 > > Attachments: MAPREDUCE-1033.patch, MAPREDUCE-1033.patch, > MAPREDUCE-1033.patch, MAPREDUCE-1033.patch > > > At present, all the sub-projects - common, hdfs and mapreduce - have copies > of all the configuration files. Common configuration files should be left in > common, mapreduce specific files should be moved to mapreduce project, same > with hdfs related files. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1744) DistributedCache creates its own FileSytem instance when adding a file/archive to the path
[ https://issues.apache.org/jira/browse/MAPREDUCE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krishna Ramachandran updated MAPREDUCE-1744: Attachment: mapred-1744-3.patch After further discussion with Arun, it has been decided to put this on hold until we reconcile with mapred-950. Just for completeness, I modified the existing APIs, add*toClassPath(Path ...), to be compatible. > DistributedCache creates its own FileSytem instance when adding a > file/archive to the path > -- > > Key: MAPREDUCE-1744 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1744 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Dick King >Assignee: Krishna Ramachandran > Attachments: BZ-3503564--2010-05-06.patch, h1744.patch, > mapred-1744-1.patch, mapred-1744-2.patch, mapred-1744-3.patch, > mapred-1744.patch, MAPREDUCE-1744.patch > > > According to the contract of {{UserGroupInformation.doAs()}} the only > required operations within the {{doAs()}} block are the > creation of a {{JobClient}} or getting a {{FileSystem}}. > The {{DistributedCache.add(File/Archive)ToClasspath()}} methods create a > {{FileSystem}} instance outside of the {{doAs()}} block; > this {{FileSystem}} instance is not in the scope of the proxy user but of the > superuser, and permissions may make the method > fail. > One option is to overload the methods above to receive a filesystem. > Another option is to obtain the {{FileSystem}} within a {{doAs()}} block; > for this it would be required to have the proxy > user set in the passed configuration. > The second option seems nicer, but I don't know if the proxy user is set as a > property in the jobconf. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (MAPREDUCE-1795) add error option if file-based record-readers fail to consume all input (e.g., concatenated gzip, bzip2)
[ https://issues.apache.org/jira/browse/MAPREDUCE-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Roelofs resolved MAPREDUCE-1795. - Resolution: Won't Fix Per previous comment, we're going to fix the underlying issue instead (i.e., make decompressors support concatenated streams). See MAPREDUCE-469. > add error option if file-based record-readers fail to consume all input > (e.g., concatenated gzip, bzip2) > > > Key: MAPREDUCE-1795 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1795 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Greg Roelofs >Assignee: Greg Roelofs > > When running MapReduce with concatenated gzip files as input, only the first > part ("member" in gzip spec parlance, http://www.ietf.org/rfc/rfc1952.txt) is > read; the remainder is silently ignored. As a first step toward fixing that, > this issue will add a configurable option to throw an error in such cases. > MAPREDUCE-469 is the tracker for the more complete fix/feature, whenever that > occurs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-323) Improve the way job history files are managed
[ https://issues.apache.org/jira/browse/MAPREDUCE-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877559#action_12877559 ] Dick King commented on MAPREDUCE-323: - My proposal does present the display faster when the history files are numerous, but we do lose the ability to display the total count or to jump to the last page. > Improve the way job history files are managed > - > > Key: MAPREDUCE-323 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-323 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobtracker >Affects Versions: 0.21.0, 0.22.0 >Reporter: Amar Kamat >Assignee: Dick King >Priority: Critical > > Today all the jobhistory files are dumped in one _job-history_ folder. This > can cause problems when there is a need to search the history folder > (job-recovery etc). It would be nice if we grouped all the jobs under a _user_ > folder. So all the jobs for user _amar_ will go in _history-folder/amar/_. > Jobs can be categorized using various features like _jobid, date, jobname_ > etc, but using _username_ will make the search much more efficient and also > will not result in a namespace explosion. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1033) Resolve location of scripts and configuration files after project split
[ https://issues.apache.org/jira/browse/MAPREDUCE-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877624#action_12877624 ] Hudson commented on MAPREDUCE-1033: --- Integrated in Hadoop-Common-trunk-Commit #293 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk-Commit/293/]) HADOOP-6794. Move configuration and script files post split. Includes HDFS-1181, MAPREDUCE-1033. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1849) Implement a FlumeJava-like library for operations over parallel collections using Hadoop MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877615#action_12877615 ] Jakob Homan commented on MAPREDUCE-1849: bq. I am not sure closures such as Scala's would work on a distributed, multi-JVM setup such as Hadoop. Otherwise I agree with Luke's POV. Matei and the Spark guys got it working quite well: http://www.cs.berkeley.edu/~matei/spark/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-323) Improve the way job history files are managed
[ https://issues.apache.org/jira/browse/MAPREDUCE-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877614#action_12877614 ] Chris Douglas commented on MAPREDUCE-323: - The scope of this issue has not been well defined. The designs are arguing about the correct subset of a database to implement for JobHistory, leaving a wide range of known (and as Allen points out, unknown) use cases ill served. This will not converge quickly. For purposes of consensus, this issue is a bug; the _existing_ functionality is not handled efficiently. It should go without saying that the design should not be over-specific to today's use cases, but the issue's focus should remain on solving the problems cited and servicing the use cases already in the system. This is a misbehaving component, not a project implementing a small database in HDFS. Perhaps the title should change to reflect this. There are 3 operations to support (please amend as necessary): # Lookup by JobID. This should not be worse than O\(log n) (and should be O\(1)), as it is a frequent operation. # Find a set of jobs run by a particular user # Find a set of jobs with names matching a regex (2) and (3) can require a scan, but the cost should be bounded. If there are common operator activities (like archiving old history, etc) then the layout should support that, but arbitrary queries are out of scope. The problems with the flat hierarchy are, obviously, the cost of listing files both in the JobTracker and NameNode. This can be ameliorated, somewhat, by HDFS-1091 and HDFS-985, but further optimizations/caching are possible if one can assume that recent entries are more relevant. Dick/[Doug|https://issues.apache.org/jira/browse/MAPREDUCE-323?focusedCommentId=12771987&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12771987]'s format looks sound to me. 
Amar identified many complexities in implementing the configurable-schema, mini-database proposal and in my opinion: while the solutions are feasible, the virtues of a simpler fix for this issue outweigh the costs of solving those problems. I particularly like the idea of bounding scans of JobHistory to _n_ entries, unless the user requests a deeper search. Caching recent entries, metadata about which subdirectories are sufficient for _n_ entries, etc. are all reasonable optimizations, but adopting the new layout should be sufficient for this issue. Agreed? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1849) Implement a FlumeJava-like library for operations over parallel collections using Hadoop MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877604#action_12877604 ] Olivier Grisel commented on MAPREDUCE-1849: --- I am not sure closures such as Scala's would work on a distributed, multi-JVM setup such as Hadoop. Otherwise I agree with Luke's POV. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1778) CompletedJobStatusStore initialization should fail if {mapred.job.tracker.persist.jobstatus.dir} is unwritable
[ https://issues.apache.org/jira/browse/MAPREDUCE-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877569#action_12877569 ] Krishna Ramachandran commented on MAPREDUCE-1778: - Amar, I only see FileStatus.isDir() in trunk (org.apache.hadoop.fs): /** * Is this a directory? * @return true if this is a directory */ public boolean isDir() { return isdir; } > CompletedJobStatusStore initialization should fail if > {mapred.job.tracker.persist.jobstatus.dir} is unwritable > -- > > Key: MAPREDUCE-1778 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1778 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobtracker >Reporter: Amar Kamat >Assignee: Krishna Ramachandran > Attachments: mapred-1778-1.patch, mapred-1778-2.patch, > mapred-1778-3.patch, mapred-1778.patch > > > If {mapred.job.tracker.persist.jobstatus.dir} points to an unwritable > location or mkdir of {mapred.job.tracker.persist.jobstatus.dir} fails, then > CompletedJobStatusStore silently ignores the failure and disables > CompletedJobStatusStore. Ideally the JobTracker should bail out early, > indicating a misconfiguration. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-323) Improve the way job history files are managed
[ https://issues.apache.org/jira/browse/MAPREDUCE-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877567#action_12877567 ] Allen Wittenauer commented on MAPREDUCE-323: > 7: Perhaps there needs to be a programmatic API as well, reducing the need > for people to read directories. In fact, I worry we are building something to make a faster UI but making it impossible to use in any other manner. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-323) Improve the way job history files are managed
[ https://issues.apache.org/jira/browse/MAPREDUCE-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877513#action_12877513 ] Dick King commented on MAPREDUCE-323: - I've given this some more thought and I've devised a new design. I don't think that the subdirectory _per se_ is the important issue, except to keep the directory sizes manageable. However, the important operations should be supported, with good performance, preferably in the {{jobhistory.jsp}} interface. We have to support reasonable searches in the {{jsp}}. To that end, I would do the following: 1: let the done jobs' directory structure be {{DONE/jobtracker-timestamp/123/456/789}} where {{123456789}} is the job ID serial number. Leading zeros are depicted in the directory even if they're not in the serial number. Perhaps {{jobtracker-timestamp}} should be {{jobtracker-id}}? 2: In the {{jsp}}, we could present newest jobs first. This is probably what people want, and in common cases it speeds up the presentation when the user displays an early page. With the current naming convention, these are the jobs with the lexicographically latest file names. 3: All the URLs in the {{jsp}} pages [including those behind forms] would have a starting job tracker ID and serial number encoded, so we can continue from where we left off, even though we keep adding new jobs to the beginning because of 2:. Subsequent pages will not overlap previous pages just because new jobs have been added at the beginning. 4: When we do searches, we work back through the directories in reverse order, so we can stop when we populate a page rather than reading all of the history files' names. 5: For low-yield searches we'll consider offering to stop after, say, 10K non-matching jobs have been ignored. This lets us process mistyped queries in a reasonable time. 6: The start time is of interest. 
Inside the {{JobHistory}} code, as the cached history files are being copied to the {{DONE}} directory, an approximation of the start time is available in the modification time of the {{conf.xml}} file. We can copy that, either to the modification time of the new job history file [using {{setTime}}], or encode it into the filename in some manner [as we do with the job name]. Either way, we can then present it in the {{jsp}} result, or filter based on time ranges. What does the community think? 7: Perhaps there needs to be a programmatic API as well, reducing the need for people to read directories. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
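The sharded layout from point 1 of the design above can be sketched as a simple path mapping: zero-pad the job serial number to nine digits and split it into three directory levels. The helper and its names below are hypothetical, purely to illustrate the proposed {{DONE/jobtracker-timestamp/123/456/789}} convention.

```java
// Sketch of the layout from point 1 above: a nine-digit, zero-padded job
// serial number split into three directory levels under the jobtracker
// timestamp. Method and constant names are hypothetical, not Hadoop APIs.
class DonePathLayout {
    static String donePath(String jobtrackerTimestamp, long serial) {
        String digits = String.format("%09d", serial); // leading zeros kept
        return "DONE/" + jobtrackerTimestamp + "/"
                + digits.substring(0, 3) + "/"
                + digits.substring(3, 6) + "/"
                + digits.substring(6, 9);
    }
}
```

Because the padded serial sorts lexicographically in job order, walking these directories in reverse gives the newest-first traversal that points 2 and 4 rely on.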
[jira] Commented: (MAPREDUCE-1829) JobInProgress.findSpeculativeTask should use min() to find the candidate instead of sort()
[ https://issues.apache.org/jira/browse/MAPREDUCE-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877510#action_12877510 ] Scott Chen commented on MAPREDUCE-1829: --- Cool. The code looks neater this way. One small comment: Should we use "latestTIP" instead of "slowestTIP" to make it clearer? > JobInProgress.findSpeculativeTask should use min() to find the candidate > instead of sort() > -- > > Key: MAPREDUCE-1829 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1829 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobtracker >Affects Versions: 0.22.0 >Reporter: Scott Chen >Assignee: Scott Chen > Fix For: 0.22.0 > > Attachments: MAPREDUCE-1829-20100610.txt, MAPREDUCE-1829.txt > > > findSpeculativeTask needs only one candidate to speculate, so it does not need > to sort the whole list. It may look OK, but someone can still submit big jobs > with small slow-task thresholds. In this case, this sorting becomes expensive. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
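The min()-instead-of-sort() change can be illustrated generically. The class and field names below are simplified stand-ins, not the actual JobInProgress/TaskInProgress types; the point is only that Collections.min() finds the same element as sort-then-take-first in one O(n) pass instead of O(n log n).

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Simplified stand-in for picking a single speculation candidate. Sorting
// the whole candidate list just to take its first element is O(n log n);
// Collections.min() with the same comparator yields that element in O(n).
class SpeculativePick {
    static class Tip {
        final String id;
        final double progress;
        Tip(String id, double progress) { this.id = id; this.progress = progress; }
    }

    // Candidate with the least progress, without sorting the whole list.
    static Tip pickCandidate(List<Tip> tips) {
        return Collections.min(tips, Comparator.comparingDouble((Tip t) -> t.progress));
    }
}
```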
[jira] Commented: (MAPREDUCE-1849) Implement a FlumeJava-like library for operations over parallel collections using Hadoop MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877506#action_12877506 ] Luke Lu commented on MAPREDUCE-1849: I had some experience with Cascading in production code. One of the major benefits of being a Java library, from my POV, is easy unit testing of various user-defined operations, which is inconvenient in most DSLs. OTOH, Cascading forces you to define data-flows explicitly (which is not so bad if you have a nice FlowBuilder utility class). FlumeJava, IMO, actually captures the essence of MapReduce as originated from functional programming. The immutable P* collections and side-effect free (no global effect) DoFn's allow many optimization opportunities a la Haskell's lazy evaluation (deferred evaluation in the paper). However, the lack of type inference and closures in Java makes the usage much more verbose than necessary. I think similar libraries could be better implemented in Scala. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
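For readers unfamiliar with the primitives being discussed, here is a drastically simplified, single-JVM sketch of the two core FlumeJava shapes: an immutable PCollection and a side-effect-free DoFn. Everything here is illustrative and eager; the real library runs DoFns over distributed data and defers execution so it can optimize the plan into MapReduce jobs.

```java
import java.util.ArrayList;
import java.util.List;

// Toy, eager, local sketch of FlumeJava's core shapes. The names mirror the
// paper, but the implementation is a stand-in: real PCollections are
// distributed and parallelDo is deferred into an optimized execution plan.
class MiniFlume {
    interface DoFn<I, O> {
        O apply(I input); // side-effect free by convention
    }

    static final class PCollection<T> {
        private final List<T> data;
        PCollection(List<T> data) { this.data = List.copyOf(data); } // immutable

        <O> PCollection<O> parallelDo(DoFn<T, O> fn) {
            List<O> out = new ArrayList<>();
            for (T t : data) out.add(fn.apply(t)); // would be a map phase
            return new PCollection<>(out);
        }

        List<T> materialize() { return data; }
    }
}
```

The verbosity Luke mentions shows up as soon as DoFns are written as anonymous classes rather than lambdas, which is the gap Scala's closures would close.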
[jira] Created: (MAPREDUCE-1855) refreshSuperUserGroupsConfiguration for MR should use server side configuration for the refresh (for HADOOP-6815)
refreshSuperUserGroupsConfiguration for MR should use server side configuration for the refresh (for HADOOP-6815) - Key: MAPREDUCE-1855 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1855 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Boris Shkolnik Assignee: Boris Shkolnik -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1854) [herriot] Automate health script system test
[ https://issues.apache.org/jira/browse/MAPREDUCE-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Rajagopalan updated MAPREDUCE-1854: -- Attachment: health_script_5.txt The first patch for review. > [herriot] Automate health script system test > > > Key: MAPREDUCE-1854 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1854 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: test > Environment: Herriot framework >Reporter: Balaji Rajagopalan >Assignee: Balaji Rajagopalan > Attachments: health_script_5.txt > > Original Estimate: 120h > Remaining Estimate: 120h > > 1. There are three scenarios: first, induce an error from the health script and > verify that the task tracker is blacklisted. > 2. Make the health script time out and verify the task tracker is blacklisted. > 3. Make an error in the health script path and make sure the task tracker > stays healthy. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1854) [herriot] Automate health script system test
[herriot] Automate health script system test Key: MAPREDUCE-1854 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1854 Project: Hadoop Map/Reduce Issue Type: New Feature Components: test Environment: Herriot framework Reporter: Balaji Rajagopalan Assignee: Balaji Rajagopalan 1. There are three scenarios: first, induce an error from the health script and verify that the task tracker is blacklisted. 2. Make the health script time out and verify the task tracker is blacklisted. 3. Make an error in the health script path and make sure the task tracker stays healthy. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1849) Implement a FlumeJava-like library for operations over parallel collections using Hadoop MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877477#action_12877477 ] Jake Mannix commented on MAPREDUCE-1849: {quote} The main difference from Pig seems to be allowing users to work in Java. {quote} To add my $0.02: FlumeJava lets developers work in an object-oriented language, *period*. The difference between writing a Pig "script" or a SQL (or Hive variant thereof) "query" and being able to seamlessly integrate distributed primitives (primitive not meaning Java primitive, but "basic building block") in a standard Java program is *amazing*. The real comparison is between FlumeJava and *Cascading*, which also lets you stay in Java-land and has a query-plan optimizer. I'm no expert in Cascading, but it seems the primitives in Cascading are "verbs" related to flows, while FlumeJava really settles on a DistributedDataSet (PCollection, for them) as the object which has methods and can be passed to methods of other (either distributed or normal) objects. I don't know if that is clearly better, but it certainly seems more in line with the way most people program in Java. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1849) Implement a FlumeJava-like library for operations over parallel collections using Hadoop MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877464#action_12877464 ] Jeff Hammerbacher commented on MAPREDUCE-1849: -- Some things you get for free from being a Java library: control flow (branching, looping, etc.), composability (functions, classes, packages), IDE support, etc. Having PigLatin execute on top of something like FlumeJava could be interesting. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: [jira] Commented: (MAPREDUCE-1849) Implement a FlumeJava-like library for operations over parallel collections using Hadoop MapReduce
I think a prerequisite for implementing FlumeJava is to improve JobControl to allow DAGs of Hadoop jobs such that independent jobs can be executed in parallel. It also needs to be enriched with intermediate data management. A simpler alternative would be to implement FlumeJava on top of Oozie. Ideally, FlumeJava should be a Pig backend. - Original Message - From: Jeff Hammerbacher (JIRA) To: mapreduce-issues@hadoop.apache.org Sent: Thu Jun 10 08:31:18 2010 Subject: [jira] Commented: (MAPREDUCE-1849) Implement a FlumeJava-like library for operations over parallel collections using Hadoop MapReduce [ https://issues.apache.org/jira/browse/MAPREDUCE-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877451#action_12877451 ] Jeff Hammerbacher commented on MAPREDUCE-1849: -- Owen: sure. They provide "derived operators" as well, like count(), join(), and top(). The main difference from Pig seems to be allowing users to work in Java. In fact, the Google team initially implemented their approach in a new language called Lumberjack, but mentions that, among other things, the implementation of a new language was a lot of work, and most importantly, novelty is an obstacle to adoption. They settled on Java and seem to have had some internal success. > Implement a FlumeJava-like library for operations over parallel collections > using Hadoop MapReduce > -- > > Key: MAPREDUCE-1849 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1849 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Reporter: Jeff Hammerbacher > > The API used internally at Google is described in great detail at > http://portal.acm.org/citation.cfm?id=1806596.1806638. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
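[Editor's note] The DAG-of-jobs idea above can be sketched with a toy scheduler. Job names are invented; Hadoop's JobControl would manage real Job objects and their addDependingJob() edges, while this sketch only shows the wave-based ordering in which independent jobs can run in parallel:

```java
import java.util.*;

// Toy illustration of DAG job control: repeatedly launch every job whose
// dependencies have all completed, so independent jobs share the same
// parallel "wave". A job here is just a name with a list of dependencies.
public class DagWaves {
    static List<List<String>> schedule(Map<String, List<String>> deps) {
        Set<String> done = new HashSet<>();
        List<List<String>> waves = new ArrayList<>();
        while (done.size() < deps.size()) {
            List<String> wave = new ArrayList<>();
            for (Map.Entry<String, List<String>> e : deps.entrySet()) {
                if (!done.contains(e.getKey()) && done.containsAll(e.getValue())) {
                    wave.add(e.getKey());
                }
            }
            if (wave.isEmpty()) throw new IllegalStateException("dependency cycle");
            Collections.sort(wave);  // deterministic order for display only
            done.addAll(wave);
            waves.add(wave);
        }
        return waves;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("extractA", Collections.emptyList());
        deps.put("extractB", Collections.emptyList());
        deps.put("join", Arrays.asList("extractA", "extractB"));
        // extractA and extractB are independent, so they land in the first wave.
        System.out.println(schedule(deps)); // prints [[extractA, extractB], [join]]
    }
}
```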
[jira] Commented: (MAPREDUCE-1849) Implement a FlumeJava-like library for operations over parallel collections using Hadoop MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877451#action_12877451 ] Jeff Hammerbacher commented on MAPREDUCE-1849: -- Owen: sure. They provide "derived operators" as well, like count(), join(), and top(). The main difference from Pig seems to be allowing users to work in Java. In fact, the Google team initially implemented their approach in a new language called Lumberjack, but mentioned that, among other things, the implementation of a new language was a lot of work and, most importantly, novelty is an obstacle to adoption. They settled on Java and seem to have had some internal success. > Implement a FlumeJava-like library for operations over parallel collections > using Hadoop MapReduce > -- > > Key: MAPREDUCE-1849 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1849 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Reporter: Jeff Hammerbacher > > The API used internally at Google is described in great detail at > http://portal.acm.org/citation.cfm?id=1806596.1806638. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1849) Implement a FlumeJava-like library for operations over parallel collections using Hadoop MapReduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877435#action_12877435 ] Owen O'Malley commented on MAPREDUCE-1849: -- I haven't read the paper yet, but can you summarize how this differs from Pig? Those operators all map into Pig's operators one to one. Pig also supports join, which is *really* nice to have automated support for. > Implement a FlumeJava-like library for operations over parallel collections > using Hadoop MapReduce > -- > > Key: MAPREDUCE-1849 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1849 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Reporter: Jeff Hammerbacher > > The API used internally at Google is described in great detail at > http://portal.acm.org/citation.cfm?id=1806596.1806638. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1853) MultipleOutputs does not cache TaskAttemptContext
[ https://issues.apache.org/jira/browse/MAPREDUCE-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Torsten Curdt updated MAPREDUCE-1853: - Attachment: cache-task-attempts.diff > MultipleOutputs does not cache TaskAttemptContext > - > > Key: MAPREDUCE-1853 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1853 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 0.21.0 > Environment: OSX 10.6 > java6 >Reporter: Torsten Curdt >Priority: Critical > Attachments: cache-task-attempts.diff > > > In MultipleOutputs there is > {code} > private TaskAttemptContext getContext(String nameOutput) throws IOException { > // The following trick leverages the instantiation of a record writer via > // the job thus supporting arbitrary output formats. > Job job = new Job(context.getConfiguration()); > job.setOutputFormatClass(getNamedOutputFormatClass(context, nameOutput)); > job.setOutputKeyClass(getNamedOutputKeyClass(context, nameOutput)); > job.setOutputValueClass(getNamedOutputValueClass(context, nameOutput)); > TaskAttemptContext taskContext = > new TaskAttemptContextImpl(job.getConfiguration(), > context.getTaskAttemptID()); > return taskContext; > } > {code} > so for every reduce call it creates a new Job instance... which creates a new > LocalJobRunner. > That does not sound like a good idea. > You end up with a flood of "jvm.JvmMetrics: Cannot initialize JVM Metrics > with processName=JobTracker, sessionId= - already initialized" > This should probably also be added to 0.22. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
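[Editor's note] A minimal sketch of the caching fix, assuming cache-task-attempts.diff memoizes the context per named output (the actual patch may differ). "Ctx" is a stand-in for TaskAttemptContext so the sketch stays self-contained:

```java
import java.util.*;

// Memoize the per-named-output context so the expensive new-Job construction
// happens once per output name instead of once per reduce() call.
class ContextCache {
    // Stand-in for TaskAttemptContext; MultipleOutputs would cache the real type.
    static final class Ctx {
        final String nameOutput;
        Ctx(String nameOutput) { this.nameOutput = nameOutput; }
    }

    private final Map<String, Ctx> cache = new HashMap<>();
    int expensiveBuilds = 0; // how many times the costly path actually ran

    Ctx getContext(String nameOutput) {
        // computeIfAbsent = build on first use, hand back the cached one afterwards.
        return cache.computeIfAbsent(nameOutput, n -> {
            expensiveBuilds++;   // in MultipleOutputs this is where new Job(conf) lives
            return new Ctx(n);
        });
    }
}

public class ContextCacheDemo {
    public static void main(String[] args) {
        ContextCache cache = new ContextCache();
        for (int i = 0; i < 1000; i++) cache.getContext("text"); // 1000 reduce() calls
        System.out.println(cache.expensiveBuilds); // prints 1, not 1000
    }
}
```

With the cache in place, the "Cannot initialize JVM Metrics ... already initialized" flood goes away because only one Job (and hence one LocalJobRunner) is created per named output.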
[jira] Created: (MAPREDUCE-1853) MultipleOutputs does not cache TaskAttemptContext
MultipleOutputs does not cache TaskAttemptContext - Key: MAPREDUCE-1853 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1853 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.21.0 Environment: OSX 10.6 java6 Reporter: Torsten Curdt Priority: Critical Attachments: cache-task-attempts.diff In MultipleOutputs there is {code} private TaskAttemptContext getContext(String nameOutput) throws IOException { // The following trick leverages the instantiation of a record writer via // the job thus supporting arbitrary output formats. Job job = new Job(context.getConfiguration()); job.setOutputFormatClass(getNamedOutputFormatClass(context, nameOutput)); job.setOutputKeyClass(getNamedOutputKeyClass(context, nameOutput)); job.setOutputValueClass(getNamedOutputValueClass(context, nameOutput)); TaskAttemptContext taskContext = new TaskAttemptContextImpl(job.getConfiguration(), context.getTaskAttemptID()); return taskContext; } {code} so for every reduce call it creates a new Job instance... which creates a new LocalJobRunner. That does not sound like a good idea. You end up with a flood of "jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized" This should probably also be added to 0.22. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-323) Improve the way job history files are managed
[ https://issues.apache.org/jira/browse/MAPREDUCE-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877425#action_12877425 ] Allen Wittenauer commented on MAPREDUCE-323: This should work just like core dump configuration options. A pattern is provided via an option and the system replaces the pattern's parameters with the job's unique values. This way everyone can get what they want in a very simple interface. Hard-coding a log file name is something that we shouldn't be doing anyway. > -1. There is no point supporting configuration options which are clearly > infeasible in several cases. If we stop hard coding log file names and use pattern substitution, then this isn't the case anymore. > Improve the way job history files are managed > - > > Key: MAPREDUCE-323 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-323 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobtracker >Affects Versions: 0.21.0, 0.22.0 >Reporter: Amar Kamat >Assignee: Dick King >Priority: Critical > > Today all the jobhistory files are dumped in one _job-history_ folder. This > can cause problems when there is a need to search the history folder > (job-recovery etc). It would be nice if we group all the jobs under a _user_ > folder. So all the jobs for user _amar_ will go in _history-folder/amar/_. > Jobs can be categorized using various features like _jobid, date, jobname_ > etc but using _username_ will make the search much more efficient and also > will not result into namespace explosion. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
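[Editor's note] The core-dump-style substitution Allen describes could look like the following sketch. The %u/%j/%d tokens are invented for illustration, not proposed option names:

```java
// The admin supplies a pattern via a config option; the framework replaces the
// tokens with the job's unique values, so no file name is ever hard-coded.
public class HistoryPathPattern {
    static String expand(String pattern, String user, String jobId, String date) {
        return pattern.replace("%u", user)
                      .replace("%j", jobId)
                      .replace("%d", date);
    }

    public static void main(String[] args) {
        // Per-user grouping, as the issue description asks for:
        System.out.println(expand("/history/%u/%d/%j", "amar", "job_201006100001_0042", "2010-06-10"));
        // prints /history/amar/2010-06-10/job_201006100001_0042
    }
}
```

Anyone who prefers a flat layout, a per-date layout, or anything else just supplies a different pattern, which is the "everyone gets what they want" property of the core-dump approach.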
[jira] Updated: (MAPREDUCE-1606) TestJobACLs may timeout as there are no slots for launching JOB_CLEANUP task
[ https://issues.apache.org/jira/browse/MAPREDUCE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod K V updated MAPREDUCE-1606: - Attachment: MAPREDUCE-1606-20100610.txt Here's a patch which reduces the test time to 8-10 seconds. Ravi, can you please look at it? > TestJobACLs may timeout as there are no slots for launching JOB_CLEANUP task > > > Key: MAPREDUCE-1606 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1606 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Affects Versions: 0.22.0 >Reporter: Ravi Gummadi >Assignee: Ravi Gummadi > Fix For: 0.22.0 > > Attachments: 1606.patch, 1606.v1.1.patch, 1606.v1.patch, > 1606.v2.patch, MAPREDUCE-1606-20100610.txt, MR1606.20S.1.patch, > MR1606.20S.patch, MR1606.patch > > > TestJobACLs may timeout as there are no slots for launching the JOB_CLEANUP task. > This is because MiniMRCluster is started with 0 TaskTrackers in the test. In trunk, > we can set the config property mapreduce.job.committer.setup.cleanup.needed > to false so that we don't get into this issue. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1354) Incremental enhancements to the JobTracker for better scalability
[ https://issues.apache.org/jira/browse/MAPREDUCE-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod K V updated MAPREDUCE-1354: - Fix Version/s: 0.22.0 > Incremental enhancements to the JobTracker for better scalability > - > > Key: MAPREDUCE-1354 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1354 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobtracker >Reporter: Devaraj Das >Assignee: Dick King >Priority: Critical > Fix For: 0.22.0 > > Attachments: mapreduce-1354--2010-03-10.patch, > mapreduce-1354--2010-05-13.patch, MAPREDUCE-1354_yhadoop20.patch, > MAPREDUCE-1354_yhadoop20.patch, MAPREDUCE-1354_yhadoop20.patch, > MAPREDUCE-1354_yhadoop20.patch, MAPREDUCE-1354_yhadoop20.patch, > MAPREDUCE-1354_yhadoop20.patch, MAPREDUCE-1354_yhadoop20.patch, > mr-1354-y20.patch > > > It'd be nice to have the JobTracker object not be locked while accessing the > HDFS for reading the jobconf file and while writing the jobinfo file in the > submitJob method. We should see if we can avoid taking the lock altogether. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
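[Editor's note] The lock-narrowing idea in the submitJob description can be sketched as follows, assuming the goal is to keep slow HDFS I/O outside the JobTracker-wide lock. All names here are stand-ins, not the real JobTracker code:

```java
import java.util.*;

// Do the slow jobconf read and jobinfo write without holding the global lock,
// then take the lock only for the cheap in-memory bookkeeping.
public class SubmitSketch {
    private final Set<String> jobs = new HashSet<>();

    private String readJobConf(String jobId) { return "conf-for-" + jobId; } // slow HDFS read in reality
    private void writeJobInfo(String jobId, String conf) { /* slow HDFS write in reality */ }

    public void submitJob(String jobId) {
        String conf = readJobConf(jobId);   // no lock held during I/O
        writeJobInfo(jobId, conf);          // still no lock held
        synchronized (this) {               // lock covers only the fast update
            jobs.add(jobId);
        }
    }

    public synchronized boolean isRegistered(String jobId) {
        return jobs.contains(jobId);
    }
}
```

The scalability win comes from other heartbeat and RPC threads no longer queuing behind a lock that is held across HDFS round trips.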
[jira] Updated: (MAPREDUCE-1533) Reduce or remove usage of String.format() usage in CapacityTaskScheduler.updateQSIObjects and Counters.makeEscapedString()
[ https://issues.apache.org/jira/browse/MAPREDUCE-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod K V updated MAPREDUCE-1533: - Hadoop Flags: [Reviewed] Fix Version/s: 0.22.0 > Reduce or remove usage of String.format() usage in > CapacityTaskScheduler.updateQSIObjects and Counters.makeEscapedString() > -- > > Key: MAPREDUCE-1533 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1533 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobtracker >Affects Versions: 0.20.1 >Reporter: Rajesh Balamohan >Assignee: Dick King > Fix For: 0.22.0 > > Attachments: mapreduce-1533--2010-05-10a.patch, > mapreduce-1533--2010-05-21.patch, mapreduce-1533--2010-05-21a.patch, > mapreduce-1533--2010-05-24.patch, MAPREDUCE-1533-and-others-20100413.1.txt, > MAPREDUCE-1533-and-others-20100413.bugfix.txt, mapreduce-1533-v1.4.patch, > mapreduce-1533-v1.8.patch > > > When short jobs are executed in hadoop with OutOfBandHeardBeat=true, JT > executes heartBeat() method heavily. This internally makes a call to > CapacityTaskScheduler.updateQSIObjects(). > CapacityTaskScheduler.updateQSIObjects(), internally calls String.format() > for setting the job scheduling information. Based on the datastructure size > of "jobQueuesManager" and "queueInfoMap", the number of times String.format() > gets executed becomes very high. String.format() internally does pattern > matching which turns to be out very heavy (This was revealed while profiling > JT. Almost 57% of time was spent in CapacityScheduler.assignTasks(), out of > which String.format() took 46%. > Would it be possible to do String.format() only at the time of invoking > JobInProgress.getSchedulingInfo?. This might reduce the pressure on JT while > processing heartbeats. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
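[Editor's note] The suggestion at the end, formatting only when JobInProgress.getSchedulingInfo() is invoked, amounts to deferring String.format() out of the heartbeat path. A minimal sketch with illustrative field names, not the real CapacityTaskScheduler internals:

```java
// Update cheap counters on every heartbeat; pay for String.format() (and its
// pattern parsing) only when somebody actually asks for the string.
public class LazySchedulingInfo {
    private int running;
    private int pending;

    // Called from the hot heartbeat path: plain field writes, no formatting.
    public void update(int running, int pending) {
        this.running = running;
        this.pending = pending;
    }

    // Formatting happens on demand, e.g. when the web UI renders the queue.
    public String getSchedulingInfo() {
        return String.format("%d running, %d pending", running, pending);
    }
}
```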
[jira] Updated: (MAPREDUCE-1829) JobInProgress.findSpeculativeTask should use min() to find the candidate instead of sort()
[ https://issues.apache.org/jira/browse/MAPREDUCE-1829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod K V updated MAPREDUCE-1829: - Attachment: MAPREDUCE-1829-20100610.txt There's another loop above this one which weeds out TIPs that don't need to be, or cannot be, speculated. How about collapsing these two into a single loop? Attaching a patch for this; what do you think? > JobInProgress.findSpeculativeTask should use min() to find the candidate > instead of sort() > -- > > Key: MAPREDUCE-1829 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1829 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobtracker >Affects Versions: 0.22.0 >Reporter: Scott Chen >Assignee: Scott Chen > Fix For: 0.22.0 > > Attachments: MAPREDUCE-1829-20100610.txt, MAPREDUCE-1829.txt > > > findSpeculativeTask needs only one candidate to speculate so it does not need > to sort the whole list. It may look OK, but someone can still submit big jobs > with small slow task thresholds. In this case, this sorting becomes expensive. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
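[Editor's note] The min()-versus-sort() point can be demonstrated with plain java.util collections. The progress values are invented; the real code orders TIP objects with a comparator:

```java
import java.util.*;

// Collections.min() scans the candidate list once (O(n)); sorting just to read
// element 0 costs O(n log n). Both pick the same speculative candidate.
public class SpeculativePick {
    public static void main(String[] args) {
        List<Double> progress = new ArrayList<>(Arrays.asList(0.9, 0.2, 0.7, 0.4));

        // Old approach: sort the whole list, then take the head.
        List<Double> sorted = new ArrayList<>(progress);
        Collections.sort(sorted);
        double bySort = sorted.get(0);

        // Proposed approach: one pass, no sort.
        double byMin = Collections.min(progress);

        System.out.println(bySort == byMin); // prints true -- same candidate, less work
    }
}
```

With a big job and a small slow-task threshold, the candidate list is nearly the whole task list, which is exactly when the O(n log n) sort hurts.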
[jira] Updated: (MAPREDUCE-1813) NPE in PipeMapred.MRErrorThread
[ https://issues.apache.org/jira/browse/MAPREDUCE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Gummadi updated MAPREDUCE-1813: Attachment: 1813.v1.3.patch Attaching a new patch with the test-case refactoring. Moved the validation of task status from testStreamingStatus() to testReporting() and removed the testStreamingStatus() method. Modified the Perl script of TestStreamingStatus.java to write to stdout (similar to the one in TestStreamingEmptyInpnonemptyOut.java) and removed the file TestStreamingEmptyInpnonemptyOut.java. As there were 3 copies of the readOutput() method in different files, removed 2 of the 3 and kept the one in MapReduceTestUtil.java. > NPE in PipeMapred.MRErrorThread > --- > > Key: MAPREDUCE-1813 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1813 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.20.1 >Reporter: Amareshwari Sriramadasu >Assignee: Ravi Gummadi > Fix For: 0.22.0 > > Attachments: 1813.patch, 1813.v1.2.patch, 1813.v1.3.patch, > 1813.v1.patch > > > Some reduce tasks fail with following NPE > java.lang.RuntimeException: java.lang.NullPointerException > at > org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325) > at > org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:540) > at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137) > at > org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:412) > at org.apache.hadoop.mapred.Child.main(Child.java:159) > Caused by: java.lang.NullPointerException >at > org.apache.hadoop.streaming.PipeMapRed$MRErrorThread.setStatus(PipeMapRed.java:517) > at > org.apache.hadoop.streaming.PipeMapRed$MRErrorThread.run(PipeMapRed.java:449) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1018) Document changes to the memory management and scheduling model
[ https://issues.apache.org/jira/browse/MAPREDUCE-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877369#action_12877369 ] Hemanth Yamijala commented on MAPREDUCE-1018: - Vinod, I was in the process of generating two patches; thanks for taking the extra steps. That will fix the broken links etc. I will also work on your comment related to occupying multiple slots. In cluster_setup.xml, I deliberately removed the documentation on java.opts and ulimits, and instead linked to mapred_tutorial, where they are described in detail. Having the same definitions in two places would be very difficult to maintain. Once you have looked at the changes for mapred_tutorial, you can tell whether this suffices. Please expect a new patch in a day or so. > Document changes to the memory management and scheduling model > -- > > Key: MAPREDUCE-1018 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1018 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: documentation >Affects Versions: 0.21.0 >Reporter: Hemanth Yamijala >Assignee: Hemanth Yamijala >Priority: Blocker > Fix For: 0.21.0 > > Attachments: MAPRED-1018-1.patch, MAPRED-1018-2.patch, > MAPRED-1018-3.patch, MAPRED-1018-4.patch.txt, MAPRED-1018-5.patch.txt, > MAPRED-1018-6.patch.txt, MAPRED-1018-7.patch.txt, MAPRED-1018-8.patch.txt, > MAPRED-1018-commons.patch > > > There were changes done for the configuration, monitoring and scheduling of > high ram jobs. This must be documented in the mapred-defaults.xml and also in > the forrest documentation -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1813) NPE in PipeMapred.MRErrorThread
[ https://issues.apache.org/jira/browse/MAPREDUCE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Gummadi updated MAPREDUCE-1813: Status: Open (was: Patch Available) > NPE in PipeMapred.MRErrorThread > --- > > Key: MAPREDUCE-1813 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1813 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.20.1 >Reporter: Amareshwari Sriramadasu >Assignee: Ravi Gummadi > Fix For: 0.22.0 > > Attachments: 1813.patch, 1813.v1.2.patch, 1813.v1.patch > > > Some reduce tasks fail with following NPE > java.lang.RuntimeException: java.lang.NullPointerException > at > org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325) > at > org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:540) > at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137) > at > org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:474) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:412) > at org.apache.hadoop.mapred.Child.main(Child.java:159) > Caused by: java.lang.NullPointerException >at > org.apache.hadoop.streaming.PipeMapRed$MRErrorThread.setStatus(PipeMapRed.java:517) > at > org.apache.hadoop.streaming.PipeMapRed$MRErrorThread.run(PipeMapRed.java:449) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1744) DistributedCache creates its own FileSytem instance when adding a file/archive to the path
[ https://issues.apache.org/jira/browse/MAPREDUCE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877355#action_12877355 ] Amareshwari Sriramadasu commented on MAPREDUCE-1744: I see that the issue is trying to solve the add*ToClassPath APIs, which obtain a FileSystem in their implementation and construct a URI from it. If these APIs took a URI, like the other add/set APIs, then I feel the issue would no longer exist. Am I missing something? I see these two APIs as exceptions that do not take URIs, and all the APIs should follow a similar signature, as proposed in MAPREDUCE-950. We take a URI as the argument for these APIs because we interpret its fragment portion as the symlink name, among other things. So I still feel MAPREDUCE-950 is the solution for this. > DistributedCache creates its own FileSytem instance when adding a > file/archive to the path > -- > > Key: MAPREDUCE-1744 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1744 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Dick King >Assignee: Krishna Ramachandran > Attachments: BZ-3503564--2010-05-06.patch, h1744.patch, > mapred-1744-1.patch, mapred-1744-2.patch, mapred-1744.patch, > MAPREDUCE-1744.patch > > > According to the contract of {{UserGroupInformation.doAs()}} the only > required operations within the {{doAs()}} block are the > creation of a {{JobClient}} or getting a {{FileSystem}} . > The {{DistributedCache.add(File/Archive)ToClasspath()}} methods create a > {{FileSystem}} instance outside of the {{doAs()}} block, > this {{FileSystem}} instance is not in the scope of the proxy user but of the > superuser and permissions may make the method > fail. > One option is to overload the methods above to receive a filesystem. > Another option is to obtain the {{FileSystem}} within a {{doAs()}} block, > for this it would be required to have the proxy > user set in the passed configuration.
> The second option seems nicer, but I don't know if the proxy user is set as a > property in the jobconf. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
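[Editor's note] The fragment-as-symlink convention Amareshwari refers to can be seen with a plain java.net.URI. The paths are invented, and this only parses the URI; the actual symlinking is done by the DistributedCache machinery:

```java
import java.net.URI;

// Why these APIs take a URI rather than a Path: the part after '#' names the
// symlink that the framework creates in the task's working directory.
public class CacheUriDemo {
    public static void main(String[] args) throws Exception {
        URI cacheFile = new URI("hdfs://namenode:8020/shared/dictionary.txt#dict");
        System.out.println(cacheFile.getPath());     // prints /shared/dictionary.txt
        System.out.println(cacheFile.getFragment()); // prints dict  (the symlink name)
    }
}
```

Dropping URIs from the signatures would lose the fragment channel, which is why the thread concludes they are "stuck with URI" for the symlink feature.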
[jira] Commented: (MAPREDUCE-1744) DistributedCache creates its own FileSytem instance when adding a file/archive to the path
[ https://issues.apache.org/jira/browse/MAPREDUCE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877354#action_12877354 ] Arun C Murthy commented on MAPREDUCE-1744: -- I take it back; Amareshwari's comment seems reasonable, and it looks like we are stuck with URI anyway for the 'symlink' feature. Sigh. > DistributedCache creates its own FileSytem instance when adding a > file/archive to the path > -- > > Key: MAPREDUCE-1744 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1744 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Dick King >Assignee: Krishna Ramachandran > Attachments: BZ-3503564--2010-05-06.patch, h1744.patch, > mapred-1744-1.patch, mapred-1744-2.patch, mapred-1744.patch, > MAPREDUCE-1744.patch > > > According to the contract of {{UserGroupInformation.doAs()}} the only > required operations within the {{doAs()}} block are the > creation of a {{JobClient}} or getting a {{FileSystem}} . > The {{DistributedCache.add(File/Archive)ToClasspath()}} methods create a > {{FileSystem}} instance outside of the {{doAs()}} block, > this {{FileSystem}} instance is not in the scope of the proxy user but of the > superuser and permissions may make the method > fail. > One option is to overload the methods above to receive a filesystem. > Another option is to obtain the {{FileSystem}} within a {{doAs()}} block, > for this it would be required to have the proxy > user set in the passed configuration. > The second option seems nicer, but I don't know if the proxy user is set as a > property in the jobconf. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.