[jira] [Created] (MAPREDUCE-5022) Tasklogs disappear if JVM reuse is enabled
Karthik Kambatla created MAPREDUCE-5022:
---

Summary: Tasklogs disappear if JVM reuse is enabled
Key: MAPREDUCE-5022
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5022
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: task
Affects Versions: 1.1.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla

Task logs can't be seen when mapred.job.reuse.jvm.num.tasks is set to -1, but the logs are visible when the same is set to 1.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4443) MR AM and job history server should be resilient to jobs that exceed counter limits
[ https://issues.apache.org/jira/browse/MAPREDUCE-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated MAPREDUCE-4443:
---

Labels: usability (was: )

> MR AM and job history server should be resilient to jobs that exceed counter limits
>
> Key: MAPREDUCE-4443
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4443
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 2.0.0-alpha
> Reporter: Rahul Jain
> Labels: usability
> Attachments: am_failed_counter_limits.txt
>
> We saw this problem migrating applications to MapReduce v2. Our applications use Hadoop counters extensively (1000+ counters for certain jobs). While this may not be a recommended best practice in Hadoop, the real issue here is the reliability of the framework when applications exceed counter limits.
> The Hadoop servers (YARN, history server) were originally brought up with mapreduce.job.counters.max=1000 in core-site.xml.
> We then ran a map-reduce job under an application using its own job-specific override, with mapreduce.job.counters.max=1
> All the tasks for the job finished successfully; however, the overall job still failed due to the AM encountering exceptions such as:
> {code}
> 2012-07-12 17:31:43,485 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 71
> 2012-07-12 17:31:43,502 FATAL [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
> org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counters: 1001 max=1000
>     at org.apache.hadoop.mapreduce.counters.Limits.checkCounters(Limits.java:58)
>     at org.apache.hadoop.mapreduce.counters.Limits.incrCounters(Limits.java:65)
>     at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:77)
>     at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:94)
>     at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:105)
>     at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.incrAllCounters(AbstractCounterGroup.java:202)
>     at org.apache.hadoop.mapreduce.counters.AbstractCounters.incrAllCounters(AbstractCounters.java:337)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.constructFinalFullcounters(JobImpl.java:1212)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.mayBeConstructFinalFullCounters(JobImpl.java:1198)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.createJobFinishedEvent(JobImpl.java:1179)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.logJobHistoryFinishedEvent(JobImpl.java:711)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.checkJobCompleteSuccess(JobImpl.java:737)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$TaskCompletedTransition.checkJobForCompletion(JobImpl.java:1360)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$TaskCompletedTransition.transition(JobImpl.java:1340)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$TaskCompletedTransition.transition(JobImpl.java:1323)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:380)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:666)
>     at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:113)
>     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:890)
>     at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:886)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:74)
>     at java.lang.Thread.run(Thread.java:662)
> 2012-07-12 17:31:43,502 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..
> 2012-07-12 17:31:43,503 INFO [Thread-1] org.apache.had
> {code}
> The overall job failed, and the job history wasn't accessible either at the end of the job (didn't
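The failure mode in the stack trace is an uncaught LimitExceededException escaping into the AsyncDispatcher thread during final counter aggregation. A minimal self-contained sketch of the resilience the summary asks for — toy classes, not Hadoop's actual Counters API — catches the overflow during aggregation and reports it instead of killing the dispatcher:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (hypothetical names, not Hadoop's API) of resilient counter
// aggregation: on limit overflow, keep what fits and signal degradation
// rather than letting the exception propagate into the dispatcher thread.
class CounterLimitDemo {
    static class LimitExceededException extends RuntimeException {
        LimitExceededException(String msg) { super(msg); }
    }

    static class Counters {
        private final int max;
        private final Map<String, Long> counters = new HashMap<>();

        Counters(int max) { this.max = max; }

        void increment(String name, long by) {
            // Creating a NEW counter beyond the limit throws, mirroring
            // "Too many counters: 1001 max=1000" from the report.
            if (!counters.containsKey(name) && counters.size() >= max) {
                throw new LimitExceededException(
                    "Too many counters: " + (counters.size() + 1) + " max=" + max);
            }
            counters.merge(name, by, Long::sum);
        }

        int size() { return counters.size(); }
    }

    // Resilient merge: returns false on overflow so the caller can mark the
    // job's counters as truncated instead of failing the whole job.
    static boolean mergeAll(Counters target, Map<String, Long> source) {
        try {
            for (Map.Entry<String, Long> e : source.entrySet()) {
                target.increment(e.getKey(), e.getValue());
            }
            return true;
        } catch (LimitExceededException lee) {
            return false;
        }
    }
}
```

The design point is simply that the aggregation path, not the dispatcher, owns the exception; whatever recovery policy is chosen (truncate, drop, or fail the job cleanly), the AM and history server stay up.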
[jira] [Updated] (MAPREDUCE-4443) MR AM and job history server should be resilient to jobs that exceed counter limits
[ https://issues.apache.org/jira/browse/MAPREDUCE-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated MAPREDUCE-4443:
---

Summary: MR AM and job history server should be resilient to jobs that exceed counter limits (was: Yarn framework components (AM, job history server) should be resilient to applications exceeding counter limits)
[jira] [Commented] (MAPREDUCE-377) Add serialization for Protocol Buffers
[ https://issues.apache.org/jira/browse/MAPREDUCE-377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583715#comment-13583715 ] Josh Hansen commented on MAPREDUCE-377:
---

writeDelimitedTo(OutputStream), mergeDelimitedFrom(InputStream), and parseDelimitedFrom(InputStream) have all made it into the standard Protocol Buffers library now. See https://developers.google.com/protocol-buffers/docs/reference/java/com/google/protobuf/MessageLite#writeDelimitedTo(java.io.OutputStream) . That should resolve one obvious obstacle to addressing this issue.

There were questions a few years ago about whether this issue is still relevant; I'm with Tom White that it's very relevant for people who want to use their protobuf data in Hadoop MapReduce. Avro in particular doesn't meet the needs of my organization due to its lack of a sparse representation. Twitter's elephant-bird library (https://github.com/kevinweil/elephant-bird) provides some protobuf-in-Hadoop support, but it's less than obvious how to use it with protobufs that are not LZO-compressed.

> Add serialization for Protocol Buffers
>
> Key: MAPREDUCE-377
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-377
> Project: Hadoop Map/Reduce
> Issue Type: Wish
> Reporter: Tom White
> Assignee: Alex Loddengaard
> Attachments: hadoop-3788-v1.patch, hadoop-3788-v2.patch, hadoop-3788-v3.patch, protobuf-java-2.0.1.jar, protobuf-java-2.0.2.jar
>
> Protocol Buffers (http://code.google.com/p/protobuf/) are a way of encoding data in a compact binary format. This issue is to write a ProtocolBuffersSerialization to support using Protocol Buffers types in MapReduce programs, including an example program. This should probably go into contrib.
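For context, the delimited methods mentioned above frame each message with a base-128 varint length prefix so that multiple messages can share a single stream. A minimal sketch of that framing in plain Java, with no protobuf dependency — the helper names here are illustrative, not part of the protobuf API:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Sketch of the framing writeDelimitedTo/parseDelimitedFrom rely on:
// a varint length prefix followed by the message payload. Helper names
// are hypothetical; real code would call the protobuf methods directly.
class DelimitedFraming {
    static void writeDelimited(OutputStream out, byte[] payload) throws IOException {
        int len = payload.length;
        while ((len & ~0x7F) != 0) {          // emit 7 bits at a time;
            out.write((len & 0x7F) | 0x80);   // high bit = more bytes follow
            len >>>= 7;
        }
        out.write(len);
        out.write(payload);
    }

    static byte[] readDelimited(InputStream in) throws IOException {
        int len = 0, shift = 0, b;
        do {
            b = in.read();
            if (b < 0) throw new IOException("truncated varint");
            len |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        byte[] payload = new byte[len];
        int off = 0;
        while (off < len) {
            int n = in.read(payload, off, len - off);
            if (n < 0) throw new IOException("truncated payload");
            off += n;
        }
        return payload;
    }
}
```

Because each record carries its own length, a reader can resynchronize record-by-record — exactly the property a Hadoop serialization for protobufs needs when packing many messages into one file split.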
[jira] [Commented] (MAPREDUCE-5021) Add an addDirectoryToClassPath method DistributedCache
[ https://issues.apache.org/jira/browse/MAPREDUCE-5021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583693#comment-13583693 ] Alejandro Abdelnur commented on MAPREDUCE-5021:
---

This is equivalent to how java -classpath supports wildcards, i.e. {{lib/'*'}}, to easily specify all JARs in a directory. Regarding adding a new method, an alternative would be making the existing addFileToClasspath() detect whether the provided path is a directory and, if so, add all JARs under it.

> Add an addDirectoryToClassPath method DistributedCache
>
> Key: MAPREDUCE-5021
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5021
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: client, distributed-cache
> Affects Versions: 2.0.3-alpha
> Reporter: Sandy Ryza
>
> As adding a directory of jars to the class path is a common use of the distributed cache, it would be easier on API consumers if they could call a method that adds all the files in a directory for them.
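The directory expansion discussed above could look roughly like the following sketch. The helper is hypothetical (DistributedCache has no such method yet), and it assumes the caller would then hand each discovered JAR to the existing addFileToClassPath-style call:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper sketching addDirectoryToClassPath: collect every
// *.jar directly under a directory so each can be added to the class path
// individually, mirroring what java -classpath lib/'*' does.
class ClasspathDirs {
    static List<File> jarsInDirectory(File dir) {
        List<File> jars = new ArrayList<>();
        // listFiles returns null for non-directories; treat that as empty.
        File[] entries = dir.listFiles((d, name) -> name.endsWith(".jar"));
        if (entries != null) {
            Collections.addAll(jars, entries);
        }
        return jars;
    }
}
```

Whether this lives in a new method or inside addFileToClasspath() (per the comment above) is an API choice; the expansion logic is the same either way.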
[jira] [Created] (MAPREDUCE-5021) Add an addDirectoryToClassPath method DistributedCache
Sandy Ryza created MAPREDUCE-5021:
---

Summary: Add an addDirectoryToClassPath method DistributedCache
Key: MAPREDUCE-5021
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5021
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: client, distributed-cache
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza

As adding a directory of jars to the class path is a common use of the distributed cache, it would be easier on API consumers if they could call a method that adds all the files in a directory for them.
[jira] [Commented] (MAPREDUCE-5020) Compile failure with JDK8
[ https://issues.apache.org/jira/browse/MAPREDUCE-5020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583628#comment-13583628 ] Hadoop QA commented on MAPREDUCE-5020:
---

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12570367/MAPREDUCE-5020.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3354//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3354//console

This message is automatically generated.
[jira] [Updated] (MAPREDUCE-5020) Compile failure with JDK8
[ https://issues.apache.org/jira/browse/MAPREDUCE-5020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trevor Robinson updated MAPREDUCE-5020:
---

Attachment: MAPREDUCE-5020.patch

The attached patch simply adds a {{(K[])}} cast to the result of {{sampler.getSample()}}. This was sufficient to build Hadoop with the Java 8 (preview) compiler.
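The cast the patch adds works because of erasure: inside a generic method, {{(K[])}} compiles to a no-op at runtime, so it safely satisfies JDK 8's stricter enforcement of JLS 15.12.2.6. A self-contained toy example — the names are illustrative, not InputSampler's actual code — showing the same raw-type/erased-return situation and the cast that fixes it:

```java
import java.util.Arrays;
import java.util.List;

// Toy reproduction (hypothetical names) of the MAPREDUCE-5020 pattern:
// a raw-typed Sampler erases getSample's return type to Object[], so the
// result must be cast back to K[] to compile under JDK 8's stricter rules.
class SamplerDemo {
    interface Sampler<K> {
        K[] getSample(List<K> input);
    }

    @SuppressWarnings({"unchecked", "rawtypes"})
    static <K> int countSamples(Sampler sampler, List<K> input) {
        // Raw invocation returns Object[]; without the (K[]) cast, JDK 8
        // rejects the assignment ("Object[] cannot be converted to K[]").
        // At runtime K erases to Object, so the cast is a no-op.
        K[] samples = (K[]) sampler.getSample(input);
        return samples.length;
    }

    public static void main(String[] args) {
        Sampler<String> s = in -> in.toArray(new String[0]);
        List<String> data = Arrays.asList("a", "b");
        System.out.println(countSamples(s, data)); // prints 2
    }
}
```

The cast is safe here precisely because the array is only used through the generic type inside the method; it would be unsafe if the Object[] were ever exposed to a caller expecting a concrete K[].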
[jira] [Updated] (MAPREDUCE-5020) Compile failure with JDK8
[ https://issues.apache.org/jira/browse/MAPREDUCE-5020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trevor Robinson updated MAPREDUCE-5020:
---

Status: Patch Available (was: Open)
[jira] [Assigned] (MAPREDUCE-5020) Compile failure with JDK8
[ https://issues.apache.org/jira/browse/MAPREDUCE-5020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trevor Robinson reassigned MAPREDUCE-5020:
---

Assignee: Trevor Robinson
[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583592#comment-13583592 ] Surenkumar Nihalani commented on MAPREDUCE-4974:
---

I don't see the null check for key/value being the beneficial part of the optimization. Can you post the patch and I'll review it? I agree with change (2):

bq. if we have compressionCodecs & codec instantiated only if it's a compressed input.

> Optimising the LineRecordReader initialize() method
>
> Key: MAPREDUCE-4974
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mrv1, mrv2, performance
> Affects Versions: 2.0.2-alpha, 0.23.5
> Environment: Hadoop Linux
> Reporter: Arun A K
> Assignee: Gelesh
> Labels: patch, performance
> Attachments: MAPREDUCE-4974.1.patch, MAPREDUCE-4974.2.patch, MAPREDUCE-4974.3.patch
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> I found there is scope for optimizing the code in initialize() if we have compressionCodecs & codec instantiated only when the input is compressed. Meanwhile, Gelesh George Omathil added that we could avoid the null check of key & value. This would save time, since the null check is done for every next key/value generation. The intention is to instantiate only once and avoid NPE as well. Hope both could be met if we initialize key & value in the initialize() method. We both have worked on it.
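The two proposed changes can be sketched with a simplified toy reader (hypothetical class, not the real LineRecordReader, which works with LongWritable/Text and CompressionCodecFactory): allocate key and value once in initialize() so the per-record path carries no null checks, and consult codec machinery only when the path looks compressed.

```java
import java.util.Iterator;
import java.util.List;

// Simplified sketch of the MAPREDUCE-4974 proposal. StringBuilder stands in
// for the reusable key/value writables; the codec string stands in for a
// CompressionCodec looked up lazily, only for compressed inputs.
class LineReaderSketch {
    private StringBuilder key;
    private StringBuilder value;
    private String codec;
    private Iterator<String> lines;
    private long recordNum;

    void initialize(String path, List<String> input) {
        key = new StringBuilder();    // created once here,
        value = new StringBuilder();  // never null-checked per record
        // Codec machinery instantiated only for compressed input paths.
        codec = path.endsWith(".gz") ? "gzip" : null;
        lines = input.iterator();
        recordNum = 0;
    }

    boolean nextKeyValue() {
        if (!lines.hasNext()) {
            return false;
        }
        key.setLength(0);             // reuse, no allocation or null check
        key.append(recordNum++);
        value.setLength(0);
        value.append(lines.next());
        return true;
    }

    String currentValue() { return value.toString(); }
    boolean isCompressed() { return codec != null; }
}
```

The saving the reporters describe is the removal of a branch (and possible allocation) from nextKeyValue(), which runs once per input line, in exchange for one-time work in initialize().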
[jira] [Updated] (MAPREDUCE-5020) Compile failure with JDK8
[ https://issues.apache.org/jira/browse/MAPREDUCE-5020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trevor Robinson updated MAPREDUCE-5020:
---

Labels: build-failure jdk8 (was: jdk8)
[jira] [Created] (MAPREDUCE-5020) Compile failure with JDK8
Trevor Robinson created MAPREDUCE-5020:
---

Summary: Compile failure with JDK8
Key: MAPREDUCE-5020
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5020
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: client
Affects Versions: 2.0.3-alpha
Environment: java version "1.8.0-ea"
Java(TM) SE Runtime Environment (build 1.8.0-ea-b36e)
Java HotSpot(TM) Client VM (build 25.0-b04, mixed mode)
Reporter: Trevor Robinson

Compiling `org/apache/hadoop/mapreduce/lib/partition/InputSampler.java` fails with the Java 8 preview compiler due to its stricter enforcement of JLS 15.12.2.6 (for [Java 5|http://docs.oracle.com/javase/specs/jls/se5.0/html/expressions.html#15.12.2.6] or [Java 7|http://docs.oracle.com/javase/specs/jls/se7/html/jls-15.html#jls-15.12.2.6]), which demands that methods applicable via unchecked conversion have their return type erased:

{noformat}
[ERROR] hadoop-common/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/partition/InputSampler.java:[320,35] error: incompatible types: Object[] cannot be converted to K[]
{noformat}

{code}
@SuppressWarnings("unchecked") // getInputFormat, getOutputKeyComparator
public static void writePartitionFile(Job job, Sampler sampler)
    throws IOException, ClassNotFoundException, InterruptedException {
  Configuration conf = job.getConfiguration();
  final InputFormat inf =
      ReflectionUtils.newInstance(job.getInputFormatClass(), conf);
  int numPartitions = job.getNumReduceTasks();
  K[] samples = sampler.getSample(inf, job); // returns Object[] according to JLS
{code}

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5020) Compile failure with JDK8
[ https://issues.apache.org/jira/browse/MAPREDUCE-5020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trevor Robinson updated MAPREDUCE-5020: --- Description: Compiling {{org/apache/hadoop/mapreduce/lib/partition/InputSampler.java}} fails with the Java 8 preview compiler due to its stricter enforcement of JLS 15.12.2.6 (for [Java 5|http://docs.oracle.com/javase/specs/jls/se5.0/html/expressions.html#15.12.2.6] or [Java 7|http://docs.oracle.com/javase/specs/jls/se7/html/jls-15.html#jls-15.12.2.6]), which demands that methods applicable via unchecked conversion have their return type erased: {noformat} [ERROR] hadoop-common/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/partition/InputSampler.java:[320,35] error: incompatible types: Object[] cannot be converted to K[] {noformat} {code} @SuppressWarnings("unchecked") // getInputFormat, getOutputKeyComparator public static void writePartitionFile(Job job, Sampler sampler) throws IOException, ClassNotFoundException, InterruptedException { Configuration conf = job.getConfiguration(); final InputFormat inf = ReflectionUtils.newInstance(job.getInputFormatClass(), conf); int numPartitions = job.getNumReduceTasks(); K[] samples = sampler.getSample(inf, job); // returns Object[] according to JLS {code} was: Compiling `org/apache/hadoop/mapreduce/lib/partition/InputSampler.java` fails with the Java 8 preview compiler due to its stricter enforcement of JLS 15.12.2.6 (for [Java 5|http://docs.oracle.com/javase/specs/jls/se5.0/html/expressions.html#15.12.2.6] or [Java 7|http://docs.oracle.com/javase/specs/jls/se7/html/jls-15.html#jls-15.12.2.6]), which demands that methods applicable via unchecked conversion have their return type erased: {noformat} [ERROR] hadoop-common/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/partition/InputSampler.java:[320,35] error: 
incompatible types: Object[] cannot be converted to K[] {noformat} {code} @SuppressWarnings("unchecked") // getInputFormat, getOutputKeyComparator public static void writePartitionFile(Job job, Sampler sampler) throws IOException, ClassNotFoundException, InterruptedException { Configuration conf = job.getConfiguration(); final InputFormat inf = ReflectionUtils.newInstance(job.getInputFormatClass(), conf); int numPartitions = job.getNumReduceTasks(); K[] samples = sampler.getSample(inf, job); // returns Object[] according to JLS {code} > Compile failure with JDK8 > - > > Key: MAPREDUCE-5020 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5020 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client >Affects Versions: 2.0.3-alpha > Environment: java version "1.8.0-ea" > Java(TM) SE Runtime Environment (build 1.8.0-ea-b36e) > Java HotSpot(TM) Client VM (build 25.0-b04, mixed mode) >Reporter: Trevor Robinson > Labels: jdk8 > > Compiling {{org/apache/hadoop/mapreduce/lib/partition/InputSampler.java}} > fails with the Java 8 preview compiler due to its stricter enforcement of JLS > 15.12.2.6 (for [Java > 5|http://docs.oracle.com/javase/specs/jls/se5.0/html/expressions.html#15.12.2.6] > or [Java > 7|http://docs.oracle.com/javase/specs/jls/se7/html/jls-15.html#jls-15.12.2.6]), > which demands that methods applicable via unchecked conversion have their > return type erased: > {noformat} > [ERROR] > hadoop-common/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/partition/InputSampler.java:[320,35] > error: incompatible types: Object[] cannot be converted to K[] > {noformat} > {code} > @SuppressWarnings("unchecked") // getInputFormat, getOutputKeyComparator > public static void writePartitionFile(Job job, Sampler sampler) > throws IOException, ClassNotFoundException, InterruptedException { > Configuration conf = job.getConfiguration(); > final InputFormat inf = > 
ReflectionUtils.newInstance(job.getInputFormatClass(), conf); > int numPartitions = job.getNumReduceTasks(); > K[] samples = sampler.getSample(inf, job); // returns Object[] according > to JLS > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
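The JLS rule at issue can be reproduced without Hadoop. A minimal, self-contained sketch (class and method names invented for illustration, not the actual InputSampler API): invoking a generic method through a raw receiver erases its return type, which is why JDK 8 rejects the `K[]` assignment quoted above, while a properly parameterized receiver keeps `K[]`:

```java
// Self-contained reproduction (names invented, not Hadoop's) of the
// JLS 15.12.2.6 behavior the JDK 8 compiler enforces strictly: a generic
// method invoked through a raw receiver has its return type erased.
class Sampler<K> {
    @SuppressWarnings("unchecked")
    K[] getSample() {
        return (K[]) new String[] { "a", "b" };
    }
}

public class ErasureDemo {
    public static void main(String[] args) {
        // Raw receiver: the erased return type is Object[]; under JDK 8,
        // `String[] s = raw.getSample();` fails to compile, exactly like
        // the "Object[] cannot be converted to K[]" error quoted above.
        Sampler raw = new Sampler<String>();
        Object[] erased = raw.getSample();

        // Parameterized receiver: the return type stays K[] = String[].
        Sampler<String> typed = new Sampler<String>();
        String[] ok = typed.getSample();

        System.out.println(erased.length + " " + ok.length); // prints "2 2"
    }
}
```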
[jira] [Commented] (MAPREDUCE-3951) Tasks are not evenly spread throughout cluster in MR2
[ https://issues.apache.org/jira/browse/MAPREDUCE-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583495#comment-13583495 ] Sandy Ryza commented on MAPREDUCE-3951: --- As of now, the fair and capacity schedulers support even spreading, but the fifo scheduler does not. The capacity scheduler is the default scheduler. Should we still try to add it to the fifo scheduler as well? > Tasks are not evenly spread throughout cluster in MR2 > - > > Key: MAPREDUCE-3951 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-3951 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: scheduler >Affects Versions: 0.23.0, 0.24.0 >Reporter: Todd Lipcon > > In MR1 (at least with the fair and fifo schedulers), if you submit a job that > needs fewer resources than the cluster can provide, the tasks are spread > relatively evenly across the nodes. For example, submitting a 100-map job to a > 50-node cluster with 10 slots per node results in 2 tasks on each machine. In > MR2, however, the tasks would pile up on the first 10 nodes of the cluster, > leaving the other nodes unused. This is highly suboptimal for many use cases. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5006) streaming tests failing
[ https://issues.apache.org/jira/browse/MAPREDUCE-5006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583481#comment-13583481 ] Alejandro Abdelnur commented on MAPREDUCE-5006: --- I think it is fine to just fix the testcases as the current patch proposes, since the testcases were coded assuming there is 1 map when using the local runner. > streaming tests failing > --- > > Key: MAPREDUCE-5006 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5006 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 2.0.4-beta >Reporter: Alejandro Abdelnur >Assignee: Sandy Ryza > Attachments: MAPREDUCE-5006.patch > > > The following 2 tests are failing in trunk > * org.apache.hadoop.streaming.TestStreamReduceNone > * org.apache.hadoop.streaming.TestStreamXmlRecordReader -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5006) streaming tests failing
[ https://issues.apache.org/jira/browse/MAPREDUCE-5006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583472#comment-13583472 ] Sandy Ryza commented on MAPREDUCE-5006: --- MAPREDUCE-4994 made it so that LocalClientProtocolProvider doesn't always set the number of map tasks to 1. As 2 is the number in mapred-default, this means that the local job runner now defaults to 2 mappers. Ideally, it would be set to default to 1 and still be overridable via command line, but I am told this isn't possible with the current way that configurations work. We could possibly add in a mapred.local.job.maps? > streaming tests failing > --- > > Key: MAPREDUCE-5006 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5006 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 2.0.4-beta >Reporter: Alejandro Abdelnur >Assignee: Sandy Ryza > Attachments: MAPREDUCE-5006.patch > > > The following 2 tests are failing in trunk > * org.apache.hadoop.streaming.TestStreamReduceNone > * org.apache.hadoop.streaming.TestStreamXmlRecordReader -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
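The workaround described in the comment above can be sketched as a command-line override. This is an illustrative invocation, not part of the patch; it assumes the Hadoop 2 property name `mapreduce.job.maps` (the successor of `mapred.map.tasks`) and invented input/output paths:

```shell
# Illustrative only: after MAPREDUCE-4994 the local job runner inherits
# mapreduce.job.maps=2 from mapred-default.xml, so a streaming test that
# assumes a single mapper must pin the value back to 1 explicitly.
hadoop jar hadoop-streaming.jar \
  -D mapreduce.job.maps=1 \
  -input /tmp/in \
  -output /tmp/out \
  -mapper /bin/cat \
  -numReduceTasks 0
```

This is a configuration fragment; it requires a Hadoop installation to run.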
[jira] [Commented] (MAPREDUCE-4693) Historyserver should provide counters for failed tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-4693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583450#comment-13583450 ] Siddharth Seth commented on MAPREDUCE-4693: --- bq. ... and the counters is converted to jobhistory.JhCounters while serializing. Storing the counters as org.apache.hadoop.mapreduce.Counters prevents a duplicate copy of the counters until they're actually serialized; that happens in the getDatum() method (MAPREDUCE-3511). Other than this one change and a couple of minor formatting fixes, the patch looks good. > Historyserver should provide counters for failed tasks > -- > > Key: MAPREDUCE-4693 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4693 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobhistoryserver, mrv2 >Affects Versions: 2.0.3-alpha, 0.23.6 >Reporter: Jason Lowe >Assignee: Xuan Gong > Labels: usability > Attachments: MAPREDUCE-4693.1.patch, MAPREDUCE-4693.2.patch > > > Currently the historyserver is not providing counters for failed tasks, even > though they are available via the AM as long as the job is still running. > Those counters are lost when the client needs to redirect to the > historyserver after the job completes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4693) Historyserver should provide counters for failed tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-4693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated MAPREDUCE-4693: -- Status: Open (was: Patch Available) > Historyserver should provide counters for failed tasks > -- > > Key: MAPREDUCE-4693 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4693 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobhistoryserver, mrv2 >Affects Versions: 2.0.3-alpha, 0.23.6 >Reporter: Jason Lowe >Assignee: Xuan Gong > Labels: usability > Attachments: MAPREDUCE-4693.1.patch, MAPREDUCE-4693.2.patch > > > Currently the historyserver is not providing counters for failed tasks, even > though they are available via the AM as long as the job is still running. > Those counters are lost when the client needs to redirect to the > historyserver after the job completes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583406#comment-13583406 ] Gelesh commented on MAPREDUCE-4974: --- Also, as [~ak.a...@aol.com] has mentioned: 1) avoid the 'if (newSize == 0)' check inside the loop, and 2) have compressionCodecs & codec instantiated only if the input is compressed. I hope these two points are valid; please share your thoughts... > Optimising the LineRecordReader initialize() method > --- > > Key: MAPREDUCE-4974 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mrv1, mrv2, performance >Affects Versions: 2.0.2-alpha, 0.23.5 > Environment: Hadoop Linux >Reporter: Arun A K >Assignee: Gelesh > Labels: patch, performance > Attachments: MAPREDUCE-4974.1.patch, MAPREDUCE-4974.2.patch, > MAPREDUCE-4974.3.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > I found there is scope for optimizing the code in initialize() if we > have compressionCodecs & codec instantiated only if the input is compressed. > Meanwhile, Gelesh George Omathil added that we could avoid the null check of > key & value. This would save time, since the null check is done for every > key/value generated. The intention is to instantiate only once and avoid > NPE as well. Both could be met if key & value are initialized in the > initialize() method. We both have worked on it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583403#comment-13583403 ] Gelesh commented on MAPREDUCE-4974: --- [~snihalani], thanks for bringing up that very valid point. In that case, what if we eliminate the null check for value alone, and keep the null check for key as it is? > Optimising the LineRecordReader initialize() method > --- > > Key: MAPREDUCE-4974 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mrv1, mrv2, performance >Affects Versions: 2.0.2-alpha, 0.23.5 > Environment: Hadoop Linux >Reporter: Arun A K >Assignee: Gelesh > Labels: patch, performance > Attachments: MAPREDUCE-4974.1.patch, MAPREDUCE-4974.2.patch, > MAPREDUCE-4974.3.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > I found there is scope for optimizing the code in initialize() if we > have compressionCodecs & codec instantiated only if the input is compressed. > Meanwhile, Gelesh George Omathil added that we could avoid the null check of > key & value. This would save time, since the null check is done for every > key/value generated. The intention is to instantiate only once and avoid > NPE as well. Both could be met if key & value are initialized in the > initialize() method. We both have worked on it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
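The two optimizations under discussion can be sketched in a few lines. This is an illustrative, self-contained toy, not the actual LineRecordReader code (the extension check stands in for CompressionCodecFactory, and all names are invented): allocate key and value once in initialize() so the per-record loop needs no null checks, and resolve a codec only when the input is actually compressed:

```java
// Illustrative sketch only: names and structure are invented, not the actual
// org.apache.hadoop.mapreduce.lib.input.LineRecordReader. It shows the two
// ideas from the discussion above: create key/value once in initialize() so
// nextKeyValue() has no per-record null checks, and resolve a codec only for
// input that is actually compressed.
public class SimpleLineReader {
    private Long key;
    private StringBuilder value;
    private String codec; // stays null for uncompressed input

    public void initialize(String path) {
        key = 0L;                    // allocated once here, so the record loop
        value = new StringBuilder(); // never needs `if (key == null)` checks
        // Codec lookup only for compressed input; the extension check is a
        // stand-in for a real CompressionCodecFactory lookup.
        codec = path.endsWith(".gz") ? "gzip" : null;
    }

    public boolean isCompressed() {
        return codec != null;
    }

    public static void main(String[] args) {
        SimpleLineReader r = new SimpleLineReader();
        r.initialize("input/part-00000.gz");
        System.out.println(r.isCompressed()); // prints "true"
    }
}
```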
[jira] [Commented] (MAPREDUCE-5008) Merger progress miscounts with respect to EOF_MARKER
[ https://issues.apache.org/jira/browse/MAPREDUCE-5008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583388#comment-13583388 ] Tom White commented on MAPREDUCE-5008: -- Yes, that is fine. +1 for the patch. > Merger progress miscounts with respect to EOF_MARKER > > > Key: MAPREDUCE-5008 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5008 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.0.3-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: MAPREDUCE-5008-branch-1.patch, MAPREDUCE-5008.patch, > MAPREDUCE-5008.patch > > > After MAPREDUCE-2264, a segment's raw data length is calculated without the > EOF_MARKER bytes. However, when the merge is counting how many bytes it > processed, it includes the marker. This can cause the merge progress to go > above 100%. > Whether these EOF_MARKER bytes should count should be consistent between the > two. > This is a JIRA instead of an amendment because MAPREDUCE-2264 already went into > 2.0.3. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (MAPREDUCE-3778) Per-state RM app-pages should have search ala JHS pages
[ https://issues.apache.org/jira/browse/MAPREDUCE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved MAPREDUCE-3778. Resolution: Duplicate This is already fixed as I see now. Closing as duplicate. > Per-state RM app-pages should have search ala JHS pages > --- > > Key: MAPREDUCE-3778 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-3778 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2, webapps >Affects Versions: 0.23.0 >Reporter: Vinod Kumar Vavilapalli > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5008) Merger progress miscounts with respect to EOF_MARKER
[ https://issues.apache.org/jira/browse/MAPREDUCE-5008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583357#comment-13583357 ] Sandy Ryza commented on MAPREDUCE-5008: --- With the merge turned on in the local job runner (MAPREDUCE-434), TestReporter, which includes progress counting, fails. With this patch and MAPREDUCE-434, it passes. Does that sound sufficient to you, Tom? > Merger progress miscounts with respect to EOF_MARKER > > > Key: MAPREDUCE-5008 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5008 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.0.3-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: MAPREDUCE-5008-branch-1.patch, MAPREDUCE-5008.patch, > MAPREDUCE-5008.patch > > > After MAPREDUCE-2264, a segment's raw data length is calculated without the > EOF_MARKER bytes. However, when the merge is counting how many bytes it > processed, it includes the marker. This can cause the merge progress to go > above 100%. > Whether these EOF_MARKER bytes should count should be consistent between the > two. > This is a JIRA instead of an amendment because MAPREDUCE-2264 already went into > 2.0.3. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
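The miscount described in the issue can be illustrated with toy arithmetic (all numbers invented; the real marker is written by IFile and its size here is just for illustration): when the denominator excludes the EOF marker bytes but the numerator includes them, progress exceeds 1.0:

```java
// Toy illustration, numbers not from the JIRA: a 100-byte segment whose raw
// data length (post-MAPREDUCE-2264) excludes the EOF marker, merged by code
// that still counts the marker bytes it read.
public class ProgressDemo {
    static final long EOF_MARKER_BYTES = 2; // size invented for illustration

    // progress = bytes the merger counted / raw data length it was told
    static double mergeProgress(long bytesProcessed, long rawDataLength) {
        return (double) bytesProcessed / rawDataLength;
    }

    public static void main(String[] args) {
        long rawDataLength = 100;                               // marker excluded
        long bytesProcessed = rawDataLength + EOF_MARKER_BYTES; // marker included
        // 102 / 100 = 1.02 -> reported as 102% merge progress
        System.out.println(mergeProgress(bytesProcessed, rawDataLength));
    }
}
```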
[jira] [Commented] (MAPREDUCE-5017) Provide access to launcher job URL from web console when using Map Reduce action
[ https://issues.apache.org/jira/browse/MAPREDUCE-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583355#comment-13583355 ] Ryota Egashira commented on MAPREDUCE-5017: --- Oh, I didn't know that. Thanks, Harsh. > Provide access to launcher job URL from web console when using Map Reduce > action > - > > Key: MAPREDUCE-5017 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5017 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: trunk >Reporter: Ryota Egashira >Assignee: Ryota Egashira > Fix For: trunk > > > There are applications where a custom InputFormat is used in an MR action, and > log messages from the InputFormat are written to the launcher task log. For > debugging purposes, users need to check the launcher task log, but currently in > the MR action Oozie automatically swaps the external ID and does not expose the > launcher ID in the web console. (Right now the only way is to grep oozie.log.) > This JIRA is to show the launcher job URL on the web console when using the Map > Reduce action -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5018) Support raw binary data with Hadoop streaming
[ https://issues.apache.org/jira/browse/MAPREDUCE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583352#comment-13583352 ] Hadoop QA commented on MAPREDUCE-5018: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12570328/mapstream against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3353//console This message is automatically generated. > Support raw binary data with Hadoop streaming > - > > Key: MAPREDUCE-5018 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5018 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: contrib/streaming >Reporter: Jay Hacker >Priority: Minor > Attachments: justbytes.jar, MAPREDUCE-5018.patch, mapstream > > > People often have a need to run older programs over many files, and turn to > Hadoop streaming as a reliable, performant batch system. There are good > reasons for this: > 1. Hadoop is convenient: they may already be using it for mapreduce jobs, and > it is easy to spin up a cluster in the cloud. > 2. It is reliable: HDFS replicates data and the scheduler retries failed jobs. > 3. It is reasonably performant: it moves the code to the data, maintaining > locality, and scales with the number of nodes. > Historically Hadoop is of course oriented toward processing key/value pairs, > and so needs to interpret the data passing through it. Unfortunately, this > makes it difficult to use Hadoop streaming with programs that don't deal in > key/value pairs, or with binary data in general. For example, something as > simple as running md5sum to verify the integrity of files will not give the > correct result, due to Hadoop's interpretation of the data. 
> There have been several attempts at binary serialization schemes for Hadoop > streaming, such as TypedBytes (HADOOP-1722); however, these are still aimed > at efficiently encoding key/value pairs, and not passing data through > unmodified. Even the "RawBytes" serialization scheme adds length fields to > the data, rendering it not-so-raw. > I often have a need to run a Unix filter on files stored in HDFS; currently, > the only way I can do this on the raw data is to copy the data out and run > the filter on one machine, which is inconvenient, slow, and unreliable. It > would be very convenient to run the filter as a map-only job, allowing me to > build on existing (well-tested!) building blocks in the Unix tradition > instead of reimplementing them as mapreduce programs. > However, most existing tools don't know about file splits, and so want to > process whole files; and of course many expect raw binary input and output. > The solution is to run a map-only job with an InputFormat and OutputFormat > that just pass raw bytes and don't split. It turns out to be a little more > complicated with streaming; I have attached a patch with the simplest > solution I could come up with. I call the format "JustBytes" (as "RawBytes" > was already taken), and it should be usable with most recent versions of > Hadoop. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5018) Support raw binary data with Hadoop streaming
[ https://issues.apache.org/jira/browse/MAPREDUCE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Hacker updated MAPREDUCE-5018: -- Attachment: mapstream justbytes.jar I've attached a jar file with source and compiled binaries for people who want to try it out without recompiling Hadoop. You can use the attached 'mapstream' shell script to run it easily. For those interested in performance, the TL;DR is about 10X slower than native. That's running 'cat' as the mapper on one file that fits in one block, compared to cat on a local ext4 filesystem on the same machine. If your files span multiple blocks, the non-local reads will be even slower. That also doesn't include job overhead. However, most mappers will be more CPU intensive, and the relative overhead of I/O diminishes; YMMV. > Support raw binary data with Hadoop streaming > - > > Key: MAPREDUCE-5018 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5018 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: contrib/streaming >Reporter: Jay Hacker >Priority: Minor > Attachments: justbytes.jar, MAPREDUCE-5018.patch, mapstream > > > People often have a need to run older programs over many files, and turn to > Hadoop streaming as a reliable, performant batch system. There are good > reasons for this: > 1. Hadoop is convenient: they may already be using it for mapreduce jobs, and > it is easy to spin up a cluster in the cloud. > 2. It is reliable: HDFS replicates data and the scheduler retries failed jobs. > 3. It is reasonably performant: it moves the code to the data, maintaining > locality, and scales with the number of nodes. > Historically Hadoop is of course oriented toward processing key/value pairs, > and so needs to interpret the data passing through it. Unfortunately, this > makes it difficult to use Hadoop streaming with programs that don't deal in > key/value pairs, or with binary data in general. 
For example, something as > simple as running md5sum to verify the integrity of files will not give the > correct result, due to Hadoop's interpretation of the data. > There have been several attempts at binary serialization schemes for Hadoop > streaming, such as TypedBytes (HADOOP-1722); however, these are still aimed > at efficiently encoding key/value pairs, and not passing data through > unmodified. Even the "RawBytes" serialization scheme adds length fields to > the data, rendering it not-so-raw. > I often have a need to run a Unix filter on files stored in HDFS; currently, > the only way I can do this on the raw data is to copy the data out and run > the filter on one machine, which is inconvenient, slow, and unreliable. It > would be very convenient to run the filter as a map-only job, allowing me to > build on existing (well-tested!) building blocks in the Unix tradition > instead of reimplementing them as mapreduce programs. > However, most existing tools don't know about file splits, and so want to > process whole files; and of course many expect raw binary input and output. > The solution is to run a map-only job with an InputFormat and OutputFormat > that just pass raw bytes and don't split. It turns out to be a little more > complicated with streaming; I have attached a patch with the simplest > solution I could come up with. I call the format "JustBytes" (as "RawBytes" > was already taken), and it should be usable with most recent versions of > Hadoop. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
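The "simplest possible Writable wrapper around a byte[]" described for this issue can be sketched without Hadoop. This standalone version is not the attached patch and omits the Writable interface entirely; it only demonstrates the "read until the buffer is full or EOF, remembering the number of bytes" behavior:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Standalone sketch of the read loop described for JustBytesWritable (the
// attached patch implements Hadoop's Writable interface; this version only
// shows the buffer-filling behavior).
public class JustBytesSketch {
    private final byte[] buf;
    private int length;

    public JustBytesSketch(int capacity) {
        buf = new byte[capacity];
    }

    // Fill the buffer from the stream until it is full or EOF,
    // remembering how many bytes actually arrived.
    public void readFrom(DataInputStream in) throws IOException {
        length = 0;
        int n;
        while (length < buf.length
                && (n = in.read(buf, length, buf.length - length)) != -1) {
            length += n;
        }
    }

    public int getLength() {
        return length;
    }

    public static void main(String[] args) throws IOException {
        JustBytesSketch w = new JustBytesSketch(64 * 1024); // 64K default chunk
        w.readFrom(new DataInputStream(
                new ByteArrayInputStream("raw bytes".getBytes())));
        System.out.println(w.getLength()); // prints "9"
    }
}
```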
[jira] [Commented] (MAPREDUCE-5018) Support raw binary data with Hadoop streaming
[ https://issues.apache.org/jira/browse/MAPREDUCE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583302#comment-13583302 ] Hadoop QA commented on MAPREDUCE-5018: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12570317/MAPREDUCE-5018.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3352//console This message is automatically generated. > Support raw binary data with Hadoop streaming > - > > Key: MAPREDUCE-5018 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5018 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: contrib/streaming >Reporter: Jay Hacker >Priority: Minor > Attachments: MAPREDUCE-5018.patch > > > People often have a need to run older programs over many files, and turn to > Hadoop streaming as a reliable, performant batch system. There are good > reasons for this: > 1. Hadoop is convenient: they may already be using it for mapreduce jobs, and > it is easy to spin up a cluster in the cloud. > 2. It is reliable: HDFS replicates data and the scheduler retries failed jobs. > 3. It is reasonably performant: it moves the code to the data, maintaining > locality, and scales with the number of nodes. > Historically Hadoop is of course oriented toward processing key/value pairs, > and so needs to interpret the data passing through it. 
Unfortunately, this > makes it difficult to use Hadoop streaming with programs that don't deal in > key/value pairs, or with binary data in general. For example, something as > simple as running md5sum to verify the integrity of files will not give the > correct result, due to Hadoop's interpretation of the data. > There have been several attempts at binary serialization schemes for Hadoop > streaming, such as TypedBytes (HADOOP-1722); however, these are still aimed > at efficiently encoding key/value pairs, and not passing data through > unmodified. Even the "RawBytes" serialization scheme adds length fields to > the data, rendering it not-so-raw. > I often have a need to run a Unix filter on files stored in HDFS; currently, > the only way I can do this on the raw data is to copy the data out and run > the filter on one machine, which is inconvenient, slow, and unreliable. It > would be very convenient to run the filter as a map-only job, allowing me to > build on existing (well-tested!) building blocks in the Unix tradition > instead of reimplementing them as mapreduce programs. > However, most existing tools don't know about file splits, and so want to > process whole files; and of course many expect raw binary input and output. > The solution is to run a map-only job with an InputFormat and OutputFormat > that just pass raw bytes and don't split. It turns out to be a little more > complicated with streaming; I have attached a patch with the simplest > solution I could come up with. I call the format "JustBytes" (as "RawBytes" > was already taken), and it should be usable with most recent versions of > Hadoop. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5018) Support raw binary data with Hadoop streaming
[ https://issues.apache.org/jira/browse/MAPREDUCE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Hacker updated MAPREDUCE-5018: -- Attachment: MAPREDUCE-5018.patch justbytes patch submitted for code review. > Support raw binary data with Hadoop streaming > - > > Key: MAPREDUCE-5018 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5018 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: contrib/streaming >Reporter: Jay Hacker >Priority: Minor > Attachments: MAPREDUCE-5018.patch > > > People often have a need to run older programs over many files, and turn to > Hadoop streaming as a reliable, performant batch system. There are good > reasons for this: > 1. Hadoop is convenient: they may already be using it for mapreduce jobs, and > it is easy to spin up a cluster in the cloud. > 2. It is reliable: HDFS replicates data and the scheduler retries failed jobs. > 3. It is reasonably performant: it moves the code to the data, maintaining > locality, and scales with the number of nodes. > Historically Hadoop is of course oriented toward processing key/value pairs, > and so needs to interpret the data passing through it. Unfortunately, this > makes it difficult to use Hadoop streaming with programs that don't deal in > key/value pairs, or with binary data in general. For example, something as > simple as running md5sum to verify the integrity of files will not give the > correct result, due to Hadoop's interpretation of the data. > There have been several attempts at binary serialization schemes for Hadoop > streaming, such as TypedBytes (HADOOP-1722); however, these are still aimed > at efficiently encoding key/value pairs, and not passing data through > unmodified. Even the "RawBytes" serialization scheme adds length fields to > the data, rendering it not-so-raw. 
> I often have a need to run a Unix filter on files stored in HDFS; currently, > the only way I can do this on the raw data is to copy the data out and run > the filter on one machine, which is inconvenient, slow, and unreliable. It > would be very convenient to run the filter as a map-only job, allowing me to > build on existing (well-tested!) building blocks in the Unix tradition > instead of reimplementing them as mapreduce programs. > However, most existing tools don't know about file splits, and so want to > process whole files; and of course many expect raw binary input and output. > The solution is to run a map-only job with an InputFormat and OutputFormat > that just pass raw bytes and don't split. It turns out to be a little more > complicated with streaming; I have attached a patch with the simplest > solution I could come up with. I call the format "JustBytes" (as "RawBytes" > was already taken), and it should be usable with most recent versions of > Hadoop. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5018) Support raw binary data with Hadoop streaming
[ https://issues.apache.org/jira/browse/MAPREDUCE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Hacker updated MAPREDUCE-5018: -- Target Version/s: trunk Release Note: Add "-io justbytes" I/O format to allow raw binary streaming. Status: Patch Available (was: Open) This patch adds a 'JustBytesWritable' and supporting InputFormat, OutputFormat, InputWriter, and OutputReader to support passing raw, unmodified, unaugmented bytes through Hadoop streaming. The purpose is to be able to run arbitrary Unix filters on entire binary files stored in HDFS as map-only jobs, taking advantage of locality and reliability offered by Hadoop. The code is very straightforward; most methods are only one line. A few design notes: 1. Data is stored in a JustBytesWritable, which is the simplest possible Writable wrapper around a byte[]. It literally just reads until the buffer is full or EOF and remembers the number of bytes. 2. Data is read by JustBytesInputFormat in 64K chunks by default and stored in a JustBytesWritable key; the value is a NullWritable, but no value is ever read or written. The key is used instead of the value to allow the possibility of using it in a reduce. 3. Input files are never split, as most programs are not able to handle splits. 4. Input files are not decompressed, as the purpose is to get raw data to a program, people may want to operate on compressed data (e.g., md5sum on archives), and as most tools do not expect automatic decompression, this is the "least surprising" option. It's also trivial to throw a "zcat" in front of your filter. 5. Output is even simpler than input, and just writes the bytes of a JustBytesWritable key to the output stream. Output is never compressed, for similar reasons as above. 6. The code uses the old mapred API, as that is what streaming uses. 
Streaming inserts an InputWriter between the InputFormat and the map executable, and an OutputReader between the map executable and the OutputFormat; the JustBytes version simply passes the key bytes on through. I've augmented IdentifierResolver to recognize "-io justbytes" on the command line and set the input/output classes appropriately. I've included a shell script called "mapstream" to run streaming with all required command line parameters; it makes running a binary map-only job as easy as: mapstream indir command outdir which runs "command" on every file in indir and writes the results to outdir. I welcome feedback, especially if there is an even simpler way to do this. I'm not hung up on the JustBytes name; I'd be happy to switch to a better one. If people like the general approach, I will add unit tests and resubmit. Also please let me know if I should break this into separate patches for common and mapreduce. > Support raw binary data with Hadoop streaming > - > > Key: MAPREDUCE-5018 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5018 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: contrib/streaming >Reporter: Jay Hacker >Priority: Minor > > People often have a need to run older programs over many files, and turn to > Hadoop streaming as a reliable, performant batch system. There are good > reasons for this: > 1. Hadoop is convenient: they may already be using it for mapreduce jobs, and > it is easy to spin up a cluster in the cloud. > 2. It is reliable: HDFS replicates data and the scheduler retries failed jobs. > 3. It is reasonably performant: it moves the code to the data, maintaining > locality, and scales with the number of nodes. > Historically Hadoop is of course oriented toward processing key/value pairs, > and so needs to interpret the data passing through it. Unfortunately, this > makes it difficult to use Hadoop streaming with programs that don't deal in > key/value pairs, or with binary data in general.
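The "-io justbytes" dispatch described above can be sketched roughly as follows. This is a hypothetical stand-in for streaming's IdentifierResolver logic; the class-name strings below are assumptions, not the patch's actual identifiers:

```java
// Rough sketch of how an "-io" identifier such as "justbytes" might select
// an InputWriter/OutputReader pair, in the spirit of streaming's
// IdentifierResolver. All names below are illustrative assumptions.
public class IoResolverSketch {
    // Returns {inputWriterClassName, outputReaderClassName} for an identifier.
    static String[] resolve(String identifier) {
        if ("justbytes".equals(identifier)) {
            return new String[] {"JustBytesInputWriter", "JustBytesOutputReader"};
        }
        // default: text-based key/value handling
        return new String[] {"TextInputWriter", "TextOutputReader"};
    }

    public static void main(String[] args) {
        for (String id : new String[] {"justbytes", "text"}) {
            String[] io = resolve(id);
            System.out.println(id + " -> " + io[0] + ", " + io[1]);
        }
    }
}
```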
For example, something as > simple as running md5sum to verify the integrity of files will not give the > correct result, due to Hadoop's interpretation of the data. > There have been several attempts at binary serialization schemes for Hadoop > streaming, such as TypedBytes (HADOOP-1722); however, these are still aimed > at efficiently encoding key/value pairs, and not passing data through > unmodified. Even the "RawBytes" serialization scheme adds length fields to > the data, rendering it not-so-raw. > I often have a need to run a Unix filter on files stored in HDFS; currently, > the only way I can do this on the raw data is to copy the data out and run > the filter on one machine, which is inconvenient, slow, and unreliable. It > would be very convenient to run the filter as a map-only job, allowing me to > build on existing (well-tested!) building blocks in the Unix tradition > instead of reimplementing them as mapreduce programs.
[jira] [Created] (MAPREDUCE-5019) Fair scheduler should allow preemption on reducers only
Damien Hardy created MAPREDUCE-5019: --- Summary: Fair scheduler should allow preemption on reducers only Key: MAPREDUCE-5019 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5019 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv1, scheduler Affects Versions: 2.0.2-alpha Environment: CDH4.1.2 Reporter: Damien Hardy Priority: Minor
The Fair Scheduler is very good, but consider a big MR job running many mappers and reducers (10M + 10R) and then a small MR job (1M + 1R) submitted to the same pool, on a cluster with slots for 10 mappers and 10 reducers:
- The big job takes all the map slots.
- The small job waits for a map slot.
- The first big-job map task finishes, and the small job takes the map slot it needs.
- Meanwhile, the big job's reducers take all the reduce slots to copy and sort.
- The small job finishes its map, then must wait for all of the big job's maps to end and for one of its reducers to finish before it can get a reduce slot.
- All of the big job's reducers stall after sorting, waiting for the maps to finish one by one.
If I have one big job and many small ones, I don't want newly arriving small jobs killing the big job's running map tasks to get a slot. But I think it could be useful for a small job to kill reducer tasks (and only reducers) rather than wait for the big job to finish all its maps and one of its reducers. The rule could be: a job that has finished all its maps and is waiting for a reduce slot may kill reducer tasks of a job that still has maps running (assuming those reducers are just waiting on copy and sort). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
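The rule proposed at the end of the report boils down to a single predicate. A minimal sketch, with hypothetical names (this is not Fair Scheduler code):

```java
// Minimal sketch of the proposed preemption rule: a job that has finished
// all of its maps and is waiting on a reduce slot may preempt a reducer of
// a job that still has maps running. Names are illustrative.
public class ReducerPreemptionRule {
    static boolean mayPreemptReducer(boolean requesterAllMapsDone,
                                     boolean victimStillRunningMaps) {
        // While the victim's maps are still running, its reducers are only
        // copying/sorting, so killing one of them loses little work.
        return requesterAllMapsDone && victimStillRunningMaps;
    }

    public static void main(String[] args) {
        System.out.println(mayPreemptReducer(true, true));   // small job may preempt
        System.out.println(mayPreemptReducer(false, true));  // requester still mapping: no
        System.out.println(mayPreemptReducer(true, false));  // victim's maps done: no
    }
}
```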
[jira] [Created] (MAPREDUCE-5018) Support raw binary data with Hadoop streaming
Jay Hacker created MAPREDUCE-5018: - Summary: Support raw binary data with Hadoop streaming Key: MAPREDUCE-5018 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5018 Project: Hadoop Map/Reduce Issue Type: New Feature Components: contrib/streaming Reporter: Jay Hacker Priority: Minor People often have a need to run older programs over many files, and turn to Hadoop streaming as a reliable, performant batch system. There are good reasons for this: 1. Hadoop is convenient: they may already be using it for mapreduce jobs, and it is easy to spin up a cluster in the cloud. 2. It is reliable: HDFS replicates data and the scheduler retries failed jobs. 3. It is reasonably performant: it moves the code to the data, maintaining locality, and scales with the number of nodes. Historically Hadoop is of course oriented toward processing key/value pairs, and so needs to interpret the data passing through it. Unfortunately, this makes it difficult to use Hadoop streaming with programs that don't deal in key/value pairs, or with binary data in general. For example, something as simple as running md5sum to verify the integrity of files will not give the correct result, due to Hadoop's interpretation of the data. There have been several attempts at binary serialization schemes for Hadoop streaming, such as TypedBytes (HADOOP-1722); however, these are still aimed at efficiently encoding key/value pairs, and not passing data through unmodified. Even the "RawBytes" serialization scheme adds length fields to the data, rendering it not-so-raw. I often have a need to run a Unix filter on files stored in HDFS; currently, the only way I can do this on the raw data is to copy the data out and run the filter on one machine, which is inconvenient, slow, and unreliable. It would be very convenient to run the filter as a map-only job, allowing me to build on existing (well-tested!) building blocks in the Unix tradition instead of reimplementing them as mapreduce programs. 
However, most existing tools don't know about file splits, and so want to process whole files; and of course many expect raw binary input and output. The solution is to run a map-only job with an InputFormat and OutputFormat that just pass raw bytes and don't split. It turns out to be a little more complicated with streaming; I have attached a patch with the simplest solution I could come up with. I call the format "JustBytes" (as "RawBytes" was already taken), and it should be usable with most recent versions of Hadoop. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method
[ https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583195#comment-13583195 ] Surenkumar Nihalani commented on MAPREDUCE-4974: I don't think we return null anymore when there is no key, so this no longer follows the contract at http://hadoop.apache.org/docs/r1.0.4/api/org/apache/hadoop/mapreduce/RecordReader.html#getCurrentKey%28%29 Can you confirm it still works? > Optimising the LineRecordReader initialize() method > --- > > Key: MAPREDUCE-4974 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mrv1, mrv2, performance >Affects Versions: 2.0.2-alpha, 0.23.5 > Environment: Hadoop Linux >Reporter: Arun A K >Assignee: Gelesh > Labels: patch, performance > Attachments: MAPREDUCE-4974.1.patch, MAPREDUCE-4974.2.patch, > MAPREDUCE-4974.3.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > There is scope to optimize the code in initialize(): compressionCodecs and > codec need to be instantiated only when the input is actually compressed. > Meanwhile, Gelesh George Omathil suggested we could also avoid the null > checks of key and value, which are currently done on every next key/value > generation; that would save time. The intention is to instantiate key and > value only once, in initialize(), avoiding both the repeated null checks and > NPEs. We have both worked on it. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
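The two optimizations under discussion can be sketched side by side. This is an illustration of the pattern only, with stand-in fields instead of Hadoop's CompressionCodecFactory, LongWritable, and Text:

```java
// Sketch of the two proposed optimizations for LineRecordReader.initialize():
// (1) build codec machinery only when the input file is actually compressed,
// and (2) allocate key/value once up front so nextKeyValue() needs no
// per-record null checks. All names here are illustrative stand-ins.
// Note the contract concern raised in the comment above: an eagerly created
// key means getCurrentKey() no longer returns null before the first record.
public class LineReaderInitSketch {
    int codecFactoryBuilds = 0; // counts stand-in CompressionCodecFactory creations
    Object key, value;          // stand-ins for LongWritable / Text

    void initialize(String fileName) {
        // (1) lazy: only compressed inputs pay for the codec lookup
        if (fileName.endsWith(".gz") || fileName.endsWith(".bz2")) {
            codecFactoryBuilds++;
        }
        // (2) eager: key/value created once, never null afterwards
        key = new long[1];
        value = new StringBuilder();
    }

    // Helper for demonstration: how many codec factories a given input costs.
    static int buildsFor(String fileName) {
        LineReaderInitSketch r = new LineReaderInitSketch();
        r.initialize(fileName);
        return r.codecFactoryBuilds;
    }

    public static void main(String[] args) {
        System.out.println(buildsFor("part-00000"));    // plain text: 0
        System.out.println(buildsFor("part-00000.gz")); // compressed: 1
    }
}
```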
[jira] [Commented] (MAPREDUCE-4951) Container preemption interpreted as task failure
[ https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583177#comment-13583177 ] Hudson commented on MAPREDUCE-4951: --- Integrated in Hadoop-Mapreduce-trunk #1351 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1351/]) MAPREDUCE-4951. Container preemption interpreted as task failure. Contributed by Sandy Ryza. (Revision 1448615) Result = FAILURE tomwhite : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448615 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRMContainerAllocator.java > Container preemption interpreted as task failure > > > Key: MAPREDUCE-4951 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4951 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: applicationmaster, mr-am, mrv2 >Affects Versions: 2.0.2-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Fix For: 2.0.4-beta > > Attachments: MAPREDUCE-4951-1.patch, MAPREDUCE-4951-2.patch, > MAPREDUCE-4951.patch > > > When YARN reports a completed container to the MR AM, it always interprets it > as a failure. This can lead to a job failing because too many of its tasks > failed, when in fact they only failed because the scheduler preempted them. > MR needs to recognize the special exit code value of -100 and interpret it as > a container being killed instead of a container failure. -- This message is automatically generated by JIRA. 
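The fix described in MAPREDUCE-4951 hinges on one classification decision. A minimal sketch of the idea (the -100 value comes from the issue text; the method and state names here are assumptions, not the patch's API):

```java
// Sketch of the MAPREDUCE-4951 idea: treat the special container exit
// status -100 as a preemption (a kill), not a task failure, so it does not
// count toward the job's failure limit. Names are illustrative.
public class ContainerExitSketch {
    static final int PREEMPTED_EXIT = -100;

    static String classify(int exitStatus) {
        if (exitStatus == PREEMPTED_EXIT) {
            return "KILLED"; // preempted by the scheduler: retry without blame
        }
        return exitStatus == 0 ? "SUCCEEDED" : "FAILED";
    }

    public static void main(String[] args) {
        System.out.println(classify(-100)); // KILLED
        System.out.println(classify(0));    // SUCCEEDED
        System.out.println(classify(1));    // FAILED
    }
}
```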
[jira] [Commented] (MAPREDUCE-5013) mapred.JobStatus compatibility: MR2 missing constructors from MR1
[ https://issues.apache.org/jira/browse/MAPREDUCE-5013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583178#comment-13583178 ] Hudson commented on MAPREDUCE-5013: --- Integrated in Hadoop-Mapreduce-trunk #1351 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1351/]) MAPREDUCE-5013. mapred.JobStatus compatibility: MR2 missing constructors from MR1. Contributed by Sandy Ryza. (Revision 1448602) Result = FAILURE tomwhite : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448602 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/JobStatus.java > mapred.JobStatus compatibility: MR2 missing constructors from MR1 > - > > Key: MAPREDUCE-5013 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5013 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client >Affects Versions: 2.0.3-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Fix For: 2.0.4-beta > > Attachments: MAPREDUCE-5013.patch > > > JobStatus is missing the following constructors in MR2 that were present in > MR1 > public org.apache.hadoop.mapred.JobStatus(org.apache.hadoop.mapred.JobID, > float, float, float, int); > public org.apache.hadoop.mapred.JobStatus(org.apache.hadoop.mapred.JobID, > float, float, int); > public org.apache.hadoop.mapred.JobStatus(org.apache.hadoop.mapred.JobID, > float, float, float, int, org.apache.hadoop.mapred.JobPriority); > public org.apache.hadoop.mapred.JobStatus(org.apache.hadoop.mapred.JobID, > float, float, float, float, int, org.apache.hadoop.mapred.JobPriority); -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4846) Some JobQueueInfo methods are public in MR1 but protected in MR2
[ https://issues.apache.org/jira/browse/MAPREDUCE-4846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583176#comment-13583176 ] Hudson commented on MAPREDUCE-4846: --- Integrated in Hadoop-Mapreduce-trunk #1351 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1351/]) MAPREDUCE-4846. Some JobQueueInfo methods are public in MR1 but protected in MR2. Contributed by Sandy Ryza. (Revision 1448597) Result = FAILURE tomwhite : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448597 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/JobQueueInfo.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/QueueConfigurationParser.java > Some JobQueueInfo methods are public in MR1 but protected in MR2 > > > Key: MAPREDUCE-4846 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4846 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: client >Affects Versions: 2.0.2-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Fix For: 2.0.4-beta > > Attachments: MAPREDUCE-4846-1.patch, MAPREDUCE-4846-1.patch, > MAPREDUCE-4846.patch > > > setQueueName, setSchedulingInfo, and setQueueState were public in MR1, but > are private in MR2. They should be made public with > InterfaceAudience.Private. > getQueueState was public, but is now package private. It has been replaced > with getState, which returns a QueueState instead of a String. It should be > made public and deprecated, with a documentation reference to getState. > Should the other setter methods in JobQueueInfo that were not in MR1 be > changed to public/InterfaceAudience.Private for consistency? -- This message is automatically generated by JIRA. 
[jira] [Commented] (MAPREDUCE-4502) Multi-level aggregation with combining the result of maps per node/rack
[ https://issues.apache.org/jira/browse/MAPREDUCE-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583150#comment-13583150 ] Tsuyoshi OZAWA commented on MAPREDUCE-4502: --- Can someone explain why my patch got "-1 overall" even though all the individual checks passed? > Multi-level aggregation with combining the result of maps per node/rack > --- > > Key: MAPREDUCE-4502 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4502 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: applicationmaster, mrv2 >Affects Versions: 3.0.0 >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Attachments: design_v2.pdf, MAPREDUCE-4502.1.patch, > MAPREDUCE-4502.2.patch, MAPREDUCE-4502.3.patch, MAPREDUCE-4502.4.patch, > MAPREDUCE-4525-pof.diff, speculative_draft.pdf > > > The shuffle cost is expensive in Hadoop in spite of the existence of the > combiner, because the scope of combining is limited to a single MapTask. > A good way to solve this problem is to aggregate the results of maps per > node/rack by launching combiners. > This JIRA is to implement the multi-level aggregation infrastructure, > including combining per container (MAPREDUCE-3902 is related) and > coordinating containers by the application master, without breaking the > fault tolerance of jobs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
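The per-node/per-rack aggregation idea can be pictured with a small sketch: combine map outputs once per node, then combine the much smaller per-node results at the rack or global level. A word-count-style merge is used here purely as an illustration; it is not the patch's code:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of multi-level aggregation: a node-level combine first, then a
// second combine over the per-node results. Word counts are illustrative.
public class MultiLevelCombineSketch {
    static Map<String, Long> combine(List<Map<String, Long>> partials) {
        Map<String, Long> out = new HashMap<>();
        for (Map<String, Long> p : partials) {
            p.forEach((k, v) -> out.merge(k, v, Long::sum));
        }
        return out;
    }

    public static void main(String[] args) {
        // two map tasks on node A, one on node B
        Map<String, Long> m1 = Map.of("a", 1L, "b", 2L);
        Map<String, Long> m2 = Map.of("a", 3L);
        Map<String, Long> m3 = Map.of("b", 5L);
        Map<String, Long> nodeA = combine(List.of(m1, m2));         // node-level combine
        Map<String, Long> nodeB = combine(List.of(m3));
        Map<String, Long> global = combine(List.of(nodeA, nodeB));  // rack/global level
        System.out.println(global); // totals: a=4, b=7
    }
}
```

The payoff is that the second-level combine sees one input per node instead of one per map task, which is what shrinks the shuffle.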
[jira] [Updated] (MAPREDUCE-4951) Container preemption interpreted as task failure
[ https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom White updated MAPREDUCE-4951: - Resolution: Fixed Fix Version/s: 2.0.4-beta Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I just committed this. Thanks Sandy. > Container preemption interpreted as task failure > > > Key: MAPREDUCE-4951 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4951 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: applicationmaster, mr-am, mrv2 >Affects Versions: 2.0.2-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Fix For: 2.0.4-beta > > Attachments: MAPREDUCE-4951-1.patch, MAPREDUCE-4951-2.patch, > MAPREDUCE-4951.patch > > > When YARN reports a completed container to the MR AM, it always interprets it > as a failure. This can lead to a job failing because too many of its tasks > failed, when in fact they only failed because the scheduler preempted them. > MR needs to recognize the special exit code value of -100 and interpret it as > a container being killed instead of a container failure. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4951) Container preemption interpreted as task failure
[ https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583138#comment-13583138 ] Hudson commented on MAPREDUCE-4951: --- Integrated in Hadoop-trunk-Commit #3372 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3372/]) MAPREDUCE-4951. Container preemption interpreted as task failure. Contributed by Sandy Ryza. (Revision 1448615) Result = SUCCESS tomwhite : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448615 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRMContainerAllocator.java > Container preemption interpreted as task failure > > > Key: MAPREDUCE-4951 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4951 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: applicationmaster, mr-am, mrv2 >Affects Versions: 2.0.2-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: MAPREDUCE-4951-1.patch, MAPREDUCE-4951-2.patch, > MAPREDUCE-4951.patch > > > When YARN reports a completed container to the MR AM, it always interprets it > as a failure. This can lead to a job failing because too many of its tasks > failed, when in fact they only failed because the scheduler preempted them. > MR needs to recognize the special exit code value of -100 and interpret it as > a container being killed instead of a container failure. -- This message is automatically generated by JIRA. 
[jira] [Commented] (MAPREDUCE-5013) mapred.JobStatus compatibility: MR2 missing constructors from MR1
[ https://issues.apache.org/jira/browse/MAPREDUCE-5013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583132#comment-13583132 ] Hudson commented on MAPREDUCE-5013: --- Integrated in Hadoop-trunk-Commit #3371 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3371/]) MAPREDUCE-5013. mapred.JobStatus compatibility: MR2 missing constructors from MR1. Contributed by Sandy Ryza. (Revision 1448602) Result = SUCCESS tomwhite : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448602 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/JobStatus.java > mapred.JobStatus compatibility: MR2 missing constructors from MR1 > - > > Key: MAPREDUCE-5013 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5013 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client >Affects Versions: 2.0.3-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Fix For: 2.0.4-beta > > Attachments: MAPREDUCE-5013.patch > > > JobStatus is missing the following constructors in MR2 that were present in > MR1 > public org.apache.hadoop.mapred.JobStatus(org.apache.hadoop.mapred.JobID, > float, float, float, int); > public org.apache.hadoop.mapred.JobStatus(org.apache.hadoop.mapred.JobID, > float, float, int); > public org.apache.hadoop.mapred.JobStatus(org.apache.hadoop.mapred.JobID, > float, float, float, int, org.apache.hadoop.mapred.JobPriority); > public org.apache.hadoop.mapred.JobStatus(org.apache.hadoop.mapred.JobID, > float, float, float, float, int, org.apache.hadoop.mapred.JobPriority); -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4502) Multi-level aggregation with combining the result of maps per node/rack
[ https://issues.apache.org/jira/browse/MAPREDUCE-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583133#comment-13583133 ] Hadoop QA commented on MAPREDUCE-4502: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12570292/MAPREDUCE-4502.4.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified test files. {color:green}+1 tests included appear to have a timeout.{color} {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3351//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3351//console This message is automatically generated. 
> Multi-level aggregation with combining the result of maps per node/rack > --- > > Key: MAPREDUCE-4502 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4502 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: applicationmaster, mrv2 >Affects Versions: 3.0.0 >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Attachments: design_v2.pdf, MAPREDUCE-4502.1.patch, > MAPREDUCE-4502.2.patch, MAPREDUCE-4502.3.patch, MAPREDUCE-4502.4.patch, > MAPREDUCE-4525-pof.diff, speculative_draft.pdf > > > The shuffle cost is expensive in Hadoop in spite of the existence of the > combiner, because the scope of combining is limited to a single MapTask. > A good way to solve this problem is to aggregate the results of maps per > node/rack by launching combiners. > This JIRA is to implement the multi-level aggregation infrastructure, > including combining per container (MAPREDUCE-3902 is related) and > coordinating containers by the application master, without breaking the > fault tolerance of jobs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4846) Some JobQueueInfo methods are public in MR1 but protected in MR2
[ https://issues.apache.org/jira/browse/MAPREDUCE-4846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583126#comment-13583126 ] Hudson commented on MAPREDUCE-4846: --- Integrated in Hadoop-trunk-Commit #3370 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/3370/]) MAPREDUCE-4846. Some JobQueueInfo methods are public in MR1 but protected in MR2. Contributed by Sandy Ryza. (Revision 1448597) Result = SUCCESS tomwhite : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448597 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/JobQueueInfo.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/QueueConfigurationParser.java > Some JobQueueInfo methods are public in MR1 but protected in MR2 > > > Key: MAPREDUCE-4846 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4846 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: client >Affects Versions: 2.0.2-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Fix For: 2.0.4-beta > > Attachments: MAPREDUCE-4846-1.patch, MAPREDUCE-4846-1.patch, > MAPREDUCE-4846.patch > > > setQueueName, setSchedulingInfo, and setQueueState were public in MR1, but > are private in MR2. They should be made public with > InterfaceAudience.Private. > getQueueState was public, but is now package private. It has been replaced > with getState, which returns a QueueState instead of a String. It should be > made public and deprecated, with a documentation reference to getState. > Should the other setter methods in JobQueueInfo that were not in MR1 be > changed to public/InterfaceAudience.Private for consistency? -- This message is automatically generated by JIRA. 
[jira] [Updated] (MAPREDUCE-5013) mapred.JobStatus compatibility: MR2 missing constructors from MR1
[ https://issues.apache.org/jira/browse/MAPREDUCE-5013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom White updated MAPREDUCE-5013: - Resolution: Fixed Fix Version/s: 2.0.4-beta Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) +1 I just committed this. Thanks Sandy. > mapred.JobStatus compatibility: MR2 missing constructors from MR1 > - > > Key: MAPREDUCE-5013 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5013 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client >Affects Versions: 2.0.3-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Fix For: 2.0.4-beta > > Attachments: MAPREDUCE-5013.patch > > > JobStatus is missing the following constructors in MR2 that were present in > MR1 > public org.apache.hadoop.mapred.JobStatus(org.apache.hadoop.mapred.JobID, > float, float, float, int); > public org.apache.hadoop.mapred.JobStatus(org.apache.hadoop.mapred.JobID, > float, float, int); > public org.apache.hadoop.mapred.JobStatus(org.apache.hadoop.mapred.JobID, > float, float, float, int, org.apache.hadoop.mapred.JobPriority); > public org.apache.hadoop.mapred.JobStatus(org.apache.hadoop.mapred.JobID, > float, float, float, float, int, org.apache.hadoop.mapred.JobPriority); -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5008) Merger progress miscounts with respect to EOF_MARKER
[ https://issues.apache.org/jira/browse/MAPREDUCE-5008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583125#comment-13583125 ] Tom White commented on MAPREDUCE-5008: -- Is there a way to test this, manual or otherwise? > Merger progress miscounts with respect to EOF_MARKER > > > Key: MAPREDUCE-5008 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5008 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.0.3-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: MAPREDUCE-5008-branch-1.patch, MAPREDUCE-5008.patch, > MAPREDUCE-5008.patch > > > After MAPREDUCE-2264, a segment's raw data length is calculated without the > EOF_MARKER bytes. However, when the merge is counting how many bytes it > processed, it includes the marker. This can cause the merge progress to go > above 100%. > Whether these EOF_MARKER bytes should count should be consistent between the > two. > This is a JIRA instead of an amendment because MAPREDUCE-2264 already went > into 2.0.3. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
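The arithmetic behind the >100% progress is easy to see in isolation. In this sketch the per-segment marker size is assumed purely for illustration; only the mismatch between the two counts matters:

```java
// Why merge progress can pass 100% (MAPREDUCE-5008): the total raw length
// excludes the EOF marker bytes, but the bytes-processed count includes
// them. The 2-byte marker size below is an assumption for illustration.
public class MergeProgressSketch {
    static double progress(long bytesProcessed, long totalRawLength) {
        return (double) bytesProcessed / totalRawLength;
    }

    public static void main(String[] args) {
        long segments = 10;
        long dataPerSegment = 100;
        long markerBytes = 2;                                        // assumed, illustrative
        long totalRawLength = segments * dataPerSegment;             // markers excluded
        long processed = segments * (dataPerSegment + markerBytes);  // markers included
        System.out.println(progress(processed, totalRawLength));     // 1.02 -> over 100%
    }
}
```

Counting the marker on both sides (or neither side) makes the ratio top out at exactly 1.0, which is the consistency the issue asks for.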
[jira] [Updated] (MAPREDUCE-4846) Some JobQueueInfo methods are public in MR1 but protected in MR2
[ https://issues.apache.org/jira/browse/MAPREDUCE-4846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom White updated MAPREDUCE-4846: - Resolution: Fixed Fix Version/s: 2.0.4-beta Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) +1 I just committed this. Thanks Sandy. > Some JobQueueInfo methods are public in MR1 but protected in MR2 > > > Key: MAPREDUCE-4846 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4846 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: client >Affects Versions: 2.0.2-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Fix For: 2.0.4-beta > > Attachments: MAPREDUCE-4846-1.patch, MAPREDUCE-4846-1.patch, > MAPREDUCE-4846.patch > > > setQueueName, setSchedulingInfo, and setQueueState were public in MR1, but > are private in MR2. They should be made public with > InterfaceAudience.Private. > getQueueState was public, but is now package private. It has been replaced > with getState, which returns a QueueState instead of a String. It should be > made public and deprecated, with a documentation reference to getState. > Should the other setter methods in JobQueueInfo that were not in MR1 be > changed to public/InterfaceAudience.Private for consistency? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5006) streaming tests failing
[ https://issues.apache.org/jira/browse/MAPREDUCE-5006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583120#comment-13583120 ] Tom White commented on MAPREDUCE-5006: -- Sandy, what changed to cause the tests to fail? > streaming tests failing > --- > > Key: MAPREDUCE-5006 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5006 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 2.0.4-beta >Reporter: Alejandro Abdelnur >Assignee: Sandy Ryza > Attachments: MAPREDUCE-5006.patch > > > The following 2 tests are failing in trunk > * org.apache.hadoop.streaming.TestStreamReduceNone > * org.apache.hadoop.streaming.TestStreamXmlRecordReader
[jira] [Updated] (MAPREDUCE-4502) Multi-level aggregation with combining the result of maps per node/rack
[ https://issues.apache.org/jira/browse/MAPREDUCE-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated MAPREDUCE-4502: -- Attachment: MAPREDUCE-4502.4.patch Fixed to handle exceptions correctly and added a timeout to TestJobConf. > Multi-level aggregation with combining the result of maps per node/rack > --- > > Key: MAPREDUCE-4502 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4502 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: applicationmaster, mrv2 >Affects Versions: 3.0.0 >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Attachments: design_v2.pdf, MAPREDUCE-4502.1.patch, > MAPREDUCE-4502.2.patch, MAPREDUCE-4502.3.patch, MAPREDUCE-4502.4.patch, > MAPREDUCE-4525-pof.diff, speculative_draft.pdf > > > The shuffle cost is high in Hadoop in spite of the existence of the > combiner, because the scope of combining is limited to a single MapTask. > To solve this problem, it is a good approach to aggregate the results of maps per > node/rack by launching a combiner. > This JIRA is to implement the multi-level aggregation infrastructure, > including combining per container (MAPREDUCE-3902 is related) and coordinating > containers by the application master without breaking fault tolerance of jobs.
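The idea in the MAPREDUCE-4502 description can be sketched as merging the combiner outputs of co-located map tasks before anything is shuffled to reducers. The class, data, and counts below are hypothetical illustrations, not the patch's actual implementation.

```java
import java.util.*;

// Toy node-level aggregation: combine the outputs of map tasks running on
// the same node before shuffle. Purely illustrative of the concept.
public class NodeLevelCombineSketch {
    static Map<String, Integer> combinePerNode(List<Map<String, Integer>> mapOutputs) {
        Map<String, Integer> merged = new HashMap<>();
        for (Map<String, Integer> out : mapOutputs) {
            // Sum counts for each key across all map outputs on this node.
            out.forEach((k, v) -> merged.merge(k, v, Integer::sum));
        }
        return merged;
    }

    public static void main(String[] args) {
        // Two map tasks on one node, each already combined within the task.
        Map<String, Integer> map1 = Map.of("hadoop", 3, "yarn", 1);
        Map<String, Integer> map2 = Map.of("hadoop", 2, "yarn", 4);
        Map<String, Integer> nodeLocal = combinePerNode(List.of(map1, map2));
        // Without node-level combining, 4 records are shuffled; with it, only 2.
        System.out.println(nodeLocal.get("hadoop")); // 5
        System.out.println(nodeLocal.get("yarn"));   // 5
    }
}
```

The win grows with the number of map tasks per node: per-task combining cannot merge duplicate keys across tasks, while a node-level (or rack-level) pass can, shrinking shuffle traffic before it leaves the machine.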
[jira] [Commented] (MAPREDUCE-5017) Provide access to launcher job URL from web console when using Map Reduce action
[ https://issues.apache.org/jira/browse/MAPREDUCE-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583045#comment-13583045 ] Harsh J commented on MAPREDUCE-5017: Hi Ryota -- We can (in future) also make use of the 'More Actions -> Move' feature on top, to move JIRAs between projects, and save time that way :) > Provide access to launcher job URL from web console when using Map Reduce > action > - > > Key: MAPREDUCE-5017 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5017 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: trunk >Reporter: Ryota Egashira >Assignee: Ryota Egashira > Fix For: trunk > > > There are applications where a custom InputFormat is used in the MR action, and log > messages from the InputFormat are written to the launcher task log. For debugging > purposes, users need to check the launcher task log, but currently in the MR > action Oozie automatically swaps the external ID and does not expose the launcher > ID in the web console (currently the only way is to grep oozie.log). This JIRA is to > show the launcher job URL on the web console when using the Map Reduce action
[jira] [Commented] (MAPREDUCE-4991) coverage for gridmix
[ https://issues.apache.org/jira/browse/MAPREDUCE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583019#comment-13583019 ] Hadoop QA commented on MAPREDUCE-4991: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12569337/MAPREDUCE-4991-branch-0.23.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 11 new or modified test files. {color:red}-1 one of tests included doesn't have a timeout.{color} {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3350//console This message is automatically generated. > coverage for gridmix > > > Key: MAPREDUCE-4991 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4991 > Project: Hadoop Map/Reduce > Issue Type: Test >Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.7 >Reporter: Aleksey Gorshkov > Attachments: MAPREDUCE-4991-branch-0.23.patch, > MAPREDUCE-4991-branch-2.patch, MAPREDUCE-4991-trunk.patch > > > fix coverage for GridMix > MAPREDUCE-4991-trunk.patch is the patch for trunk, > MAPREDUCE-4991-branch-2.patch for branch-2, and > MAPREDUCE-4991-branch-0.23.patch for branch-0.23 > known failure: > org.apache.hadoop.mapred.gridmix.TestGridmixSummary.testExecutionSummarizer. > It will be addressed in a follow-up issue
[jira] [Updated] (MAPREDUCE-4991) coverage for gridmix
[ https://issues.apache.org/jira/browse/MAPREDUCE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Gorshkov updated MAPREDUCE-4991: Status: Patch Available (was: Open) > coverage for gridmix > > > Key: MAPREDUCE-4991 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4991 > Project: Hadoop Map/Reduce > Issue Type: Test >Affects Versions: 2.0.3-alpha, 3.0.0, 0.23.7 >Reporter: Aleksey Gorshkov > Attachments: MAPREDUCE-4991-branch-0.23.patch, > MAPREDUCE-4991-branch-2.patch, MAPREDUCE-4991-trunk.patch > > > fix coverage for GridMix > MAPREDUCE-4991-trunk.patch is the patch for trunk, > MAPREDUCE-4991-branch-2.patch for branch-2, and > MAPREDUCE-4991-branch-0.23.patch for branch-0.23 > known failure: > org.apache.hadoop.mapred.gridmix.TestGridmixSummary.testExecutionSummarizer. > It will be addressed in a follow-up issue
[jira] [Commented] (MAPREDUCE-4502) Multi-level aggregation with combining the result of maps per node/rack
[ https://issues.apache.org/jira/browse/MAPREDUCE-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583003#comment-13583003 ] Hadoop QA commented on MAPREDUCE-4502: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12570272/MAPREDUCE-4502.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified test files. {color:red}-1 one of tests included doesn't have a timeout.{color} {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3349//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3349//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3349//console This message is automatically generated. 
> Multi-level aggregation with combining the result of maps per node/rack > --- > > Key: MAPREDUCE-4502 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4502 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: applicationmaster, mrv2 >Affects Versions: 3.0.0 >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Attachments: design_v2.pdf, MAPREDUCE-4502.1.patch, > MAPREDUCE-4502.2.patch, MAPREDUCE-4502.3.patch, MAPREDUCE-4525-pof.diff, > speculative_draft.pdf > > > The shuffle cost is high in Hadoop in spite of the existence of the > combiner, because the scope of combining is limited to a single MapTask. > To solve this problem, it is a good approach to aggregate the results of maps per > node/rack by launching a combiner. > This JIRA is to implement the multi-level aggregation infrastructure, > including combining per container (MAPREDUCE-3902 is related) and coordinating > containers by the application master without breaking fault tolerance of jobs.