[jira] [Created] (MAPREDUCE-5022) Tasklogs disappear if JVM reuse is enabled

2013-02-21 Thread Karthik Kambatla (JIRA)
Karthik Kambatla created MAPREDUCE-5022:
---

 Summary: Tasklogs disappear if JVM reuse is enabled
 Key: MAPREDUCE-5022
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5022
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Affects Versions: 1.1.1
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla


Task logs are not visible when mapred.job.reuse.jvm.num.tasks is set to -1, but 
the logs are visible when the same property is set to 1.
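For reference, the setting involved is configured in mapred-site.xml under MR1. A minimal fragment (the default value is 1; -1 lets a JVM be reused by an unlimited number of the job's tasks):

```xml
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <!-- 1 (default): one task per JVM; -1: reuse the JVM for unlimited tasks -->
  <value>-1</value>
</property>
```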

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-4443) MR AM and job history server should be resilient to jobs that exceed counter limits

2013-02-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-4443:
---

Labels: usability  (was: )

> MR AM and job history server should be resilient to jobs that exceed counter 
> limits 
> 
>
> Key: MAPREDUCE-4443
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4443
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha
>Reporter: Rahul Jain
>  Labels: usability
> Attachments: am_failed_counter_limits.txt
>
>
> We saw this problem migrating applications to MapReduceV2:
> Our applications use hadoop counters extensively (1000+ counters for certain 
> jobs). While this may not be one of recommended best practices in hadoop, the 
> real issue here is reliability of the framework when applications exceed 
> counter limits.
> The hadoop servers (yarn, history server) were originally brought up with 
> mapreduce.job.counters.max=1000 under core-site.xml
> We then ran map-reduce job under an application using its own job specific 
> overrides, with  mapreduce.job.counters.max=1
> All the tasks for the job finished successfully; however the overall job 
> still failed due to AM encountering exceptions as:
> {code}
> 2012-07-12 17:31:43,485 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 71
> 2012-07-12 17:31:43,502 FATAL [AsyncDispatcher event handler] 
> org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
> org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many 
> counters: 1001 max=1000
> at 
> org.apache.hadoop.mapreduce.counters.Limits.checkCounters(Limits.java:58) 
> at org.apache.hadoop.mapreduce.counters.Limits.incrCounters(Limits.java:65)
> at 
> org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounter(AbstractCounterGroup.java:77)
> at 
> org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.addCounterImpl(AbstractCounterGroup.java:94)
> at 
> org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.findCounter(AbstractCounterGroup.java:105)
> at 
> org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.incrAllCounters(AbstractCounterGroup.java:202)
> at 
> org.apache.hadoop.mapreduce.counters.AbstractCounters.incrAllCounters(AbstractCounters.java:337)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.constructFinalFullcounters(JobImpl.java:1212)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.mayBeConstructFinalFullCounters(JobImpl.java:1198)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.createJobFinishedEvent(JobImpl.java:1179)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.logJobHistoryFinishedEvent(JobImpl.java:711)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.checkJobCompleteSuccess(JobImpl.java:737)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$TaskCompletedTransition.checkJobForCompletion(JobImpl.java:1360)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$TaskCompletedTransition.transition(JobImpl.java:1340)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$TaskCompletedTransition.transition(JobImpl.java:1323)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:380)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:666)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:113)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:890)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:886)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:74)   
>  at java.lang.Thread.run(Thread.java:662)
> 2012-07-12 17:31:43,502 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..
> 2012-07-12 17:31:43,503 INFO [Thread-1] org.apache.had
> {code}
> The overall job failed, and the job history wasn't accessible either at the 
> end of the job (didn't
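The failure mode above can be modeled in isolation. A simplified sketch of a hard counter limit, where exceeding the maximum throws rather than degrading gracefully (the method names echo Limits.checkCounters/incrCounters from the stack trace, but this is not the Hadoop implementation):

```java
// Minimal model of a hard counter limit: once 'max' is exceeded, any further
// increment throws -- which is what kills the AM while it aggregates the
// final job counters, even after every task has already succeeded.
class CounterLimits {
  private final int max;
  private int count;

  CounterLimits(int max) {
    this.max = max;
  }

  // Validate a proposed counter total against the configured limit.
  void checkCounters(int size) {
    if (size > max) {
      throw new IllegalStateException("Too many counters: " + size + " max=" + max);
    }
  }

  // Increment, but only after the limit check passes.
  void incrCounters() {
    checkCounters(count + 1);
    count++;
  }

  int getCount() {
    return count;
  }
}
```

Because the AM aggregates counters from all tasks while constructing the final job counters, the limit can be tripped at the very end of the job even though each individual task stayed under it.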

[jira] [Updated] (MAPREDUCE-4443) MR AM and job history server should be resilient to jobs that exceed counter limits

2013-02-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-4443:
---

Summary: MR AM and job history server should be resilient to jobs that 
exceed counter limits   (was: Yarn framework components (AM, job history 
server) should be resilient to applications exceeding counter limits )

> MR AM and job history server should be resilient to jobs that exceed counter 
> limits 
> 
>
> Key: MAPREDUCE-4443
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4443
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha
>Reporter: Rahul Jain
> Attachments: am_failed_counter_limits.txt
>

[jira] [Commented] (MAPREDUCE-377) Add serialization for Protocol Buffers

2013-02-21 Thread Josh Hansen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583715#comment-13583715
 ] 

Josh Hansen commented on MAPREDUCE-377:
---

writeDelimitedTo(OutputStream), mergeDelimitedFrom(InputStream), and 
parseDelimitedFrom(InputStream) have all made it into the standard Protocol 
Buffers library now. See 
https://developers.google.com/protocol-buffers/docs/reference/java/com/google/protobuf/MessageLite#writeDelimitedTo(java.io.OutputStream).
That should resolve one obvious obstacle to addressing this issue.

There were questions a few years ago about whether this issue is still 
relevant; I'm with Tom White that it's very relevant for people who want to use 
their protobuf data in Hadoop MapReduce. Avro in particular doesn't meet the 
needs of my organization due to its lack of a sparse representation.

Twitter's elephant-bird library (https://github.com/kevinweil/elephant-bird) 
provides some protobuf-in-Hadoop support, but it's less than obvious how to use 
it with protobufs that are not LZO-compressed.
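For context, writeDelimitedTo/parseDelimitedFrom matter because a serialization for MapReduce must frame many messages on a single stream. The same idea in plain Java, using a fixed 4-byte length prefix rather than protobuf's varint encoding (an illustrative sketch only, not the protobuf wire format):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

// Length-prefixed framing: each record is written as <length><bytes>, so a
// reader can pull records off the stream one at a time with no other
// delimiter -- the role writeDelimitedTo/parseDelimitedFrom play for protobufs.
class DelimitedFraming {
  static void writeDelimited(DataOutputStream out, byte[] record) throws IOException {
    out.writeInt(record.length);
    out.write(record);
  }

  // Returns the next record, or null at a clean end of stream.
  static byte[] readDelimited(DataInputStream in) throws IOException {
    int length;
    try {
      length = in.readInt();
    } catch (EOFException e) {
      return null;
    }
    byte[] record = new byte[length];
    in.readFully(record);
    return record;
  }
}
```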

> Add serialization for Protocol Buffers
> --
>
> Key: MAPREDUCE-377
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-377
> Project: Hadoop Map/Reduce
>  Issue Type: Wish
>Reporter: Tom White
>Assignee: Alex Loddengaard
> Attachments: hadoop-3788-v1.patch, hadoop-3788-v2.patch, 
> hadoop-3788-v3.patch, protobuf-java-2.0.1.jar, protobuf-java-2.0.2.jar
>
>
> Protocol Buffers (http://code.google.com/p/protobuf/) are a way of encoding 
> data in a compact binary format. This issue is to write a 
> ProtocolBuffersSerialization to support using Protocol Buffers types in 
> MapReduce programs, including an example program. This should probably go 
> into contrib. 



[jira] [Commented] (MAPREDUCE-5021) Add an addDirectoryToClassPath method to DistributedCache

2013-02-21 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583693#comment-13583693
 ] 

Alejandro Abdelnur commented on MAPREDUCE-5021:
---

This is equivalent to how java -classpath supports wildcards, i.e. {{lib/'*'}}, 
to easily specify all JARs in a directory. Rather than adding a new method, an 
alternative would be to make the existing addFileToClassPath() detect whether 
the provided path is a directory and, if so, add all JARs under it.
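Either variant boils down to expanding a directory into the JARs it contains. A plain-Java sketch of that expansion against the local filesystem (the real change would walk Hadoop's FileSystem/Path API; the class and method names here are illustrative):

```java
import java.io.File;
import java.io.FilenameFilter;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Expands a directory into the JAR files it directly contains, mirroring what
// java -classpath lib/* does and what an addDirectoryToClassPath (or a
// directory-aware addFileToClassPath) would iterate over.
class JarDirectoryExpander {
  static List<String> expandJarDirectory(File dir) {
    List<String> jars = new ArrayList<String>();
    File[] matches = dir.listFiles(new FilenameFilter() {
      public boolean accept(File d, String name) {
        return name.endsWith(".jar");
      }
    });
    if (matches != null) {
      for (File f : matches) {
        jars.add(f.getName());
      }
    }
    Collections.sort(jars); // deterministic order for callers
    return jars;
  }
}
```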

> Add an addDirectoryToClassPath method to DistributedCache
> --
>
> Key: MAPREDUCE-5021
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5021
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client, distributed-cache
>Affects Versions: 2.0.3-alpha
>Reporter: Sandy Ryza
>
> As adding a directory of JARs to the class path is a common use of the 
> distributed cache, it would be easier on API consumers if they could call a 
> method that adds all the files in a directory for them. 



[jira] [Created] (MAPREDUCE-5021) Add an addDirectoryToClassPath method to DistributedCache

2013-02-21 Thread Sandy Ryza (JIRA)
Sandy Ryza created MAPREDUCE-5021:
-

 Summary: Add an addDirectoryToClassPath method to DistributedCache
 Key: MAPREDUCE-5021
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5021
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client, distributed-cache
Affects Versions: 2.0.3-alpha
Reporter: Sandy Ryza


As adding a directory of JARs to the class path is a common use of the 
distributed cache, it would be easier on API consumers if they could call a 
method that adds all the files in a directory for them. 



[jira] [Commented] (MAPREDUCE-5020) Compile failure with JDK8

2013-02-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583628#comment-13583628
 ] 

Hadoop QA commented on MAPREDUCE-5020:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12570367/MAPREDUCE-5020.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3354//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3354//console

This message is automatically generated.

> Compile failure with JDK8
> -
>
> Key: MAPREDUCE-5020
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5020
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.0.3-alpha
> Environment: java version "1.8.0-ea"
> Java(TM) SE Runtime Environment (build 1.8.0-ea-b36e)
> Java HotSpot(TM) Client VM (build 25.0-b04, mixed mode)
>Reporter: Trevor Robinson
>Assignee: Trevor Robinson
>  Labels: build-failure, jdk8
> Attachments: MAPREDUCE-5020.patch
>
>
> Compiling {{org/apache/hadoop/mapreduce/lib/partition/InputSampler.java}} 
> fails with the Java 8 preview compiler due to its stricter enforcement of JLS 
> 15.12.2.6 (for [Java 
> 5|http://docs.oracle.com/javase/specs/jls/se5.0/html/expressions.html#15.12.2.6]
>  or [Java 
> 7|http://docs.oracle.com/javase/specs/jls/se7/html/jls-15.html#jls-15.12.2.6]),
>  which demands that methods applicable via unchecked conversion have their 
> return type erased:
> {noformat}
> [ERROR] 
> hadoop-common/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/partition/InputSampler.java:[320,35]
>  error: incompatible types: Object[] cannot be converted to K[]
> {noformat}
> {code}
>   @SuppressWarnings("unchecked") // getInputFormat, getOutputKeyComparator
>   public static <K,V> void writePartitionFile(Job job, Sampler<K,V> sampler) 
>   throws IOException, ClassNotFoundException, InterruptedException {
> Configuration conf = job.getConfiguration();
> final InputFormat inf = 
> ReflectionUtils.newInstance(job.getInputFormatClass(), conf);
> int numPartitions = job.getNumReduceTasks();
> K[] samples = sampler.getSample(inf, job); // returns Object[] according 
> to JLS
> {code}



[jira] [Updated] (MAPREDUCE-5020) Compile failure with JDK8

2013-02-21 Thread Trevor Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Robinson updated MAPREDUCE-5020:
---

Attachment: MAPREDUCE-5020.patch

The attached patch simply adds a {{(K[])}} cast to the result of 
{{sampler.getSample()}}. This was sufficient to build Hadoop with the Java 8 
(preview) compiler.
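The rule can be reproduced outside Hadoop. In the self-contained sketch below, calling a generic method through a raw receiver is an unchecked invocation, so JLS 15.12.2.6 erases its return type to Object[], and JDK8's javac demands the same {{(K[])}} cast the patch adds (the types are illustrative stand-ins, not Hadoop's):

```java
import java.util.Arrays;

interface Sampler<K> {
  K[] getSample();
}

class StringSampler implements Sampler<String> {
  public String[] getSample() {
    return new String[] { "a", "b" };
  }
}

class ErasureDemo {
  // 'sampler' is a raw Sampler, so getSample() is an unchecked invocation:
  // per JLS 15.12.2.6 its return type erases to Object[], and without the
  // (K[]) cast JDK8 reports "Object[] cannot be converted to K[]".
  @SuppressWarnings("unchecked")
  static <K> K[] sample(Sampler sampler) {
    return (K[]) sampler.getSample();
  }

  public static void main(String[] args) {
    String[] s = ErasureDemo.<String>sample(new StringSampler());
    System.out.println(Arrays.toString(s)); // [a, b]
  }
}
```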

> Compile failure with JDK8
> -
>
> Key: MAPREDUCE-5020
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5020
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.0.3-alpha
> Environment: java version "1.8.0-ea"
> Java(TM) SE Runtime Environment (build 1.8.0-ea-b36e)
> Java HotSpot(TM) Client VM (build 25.0-b04, mixed mode)
>Reporter: Trevor Robinson
>  Labels: build-failure, jdk8
> Attachments: MAPREDUCE-5020.patch
>



[jira] [Updated] (MAPREDUCE-5020) Compile failure with JDK8

2013-02-21 Thread Trevor Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Robinson updated MAPREDUCE-5020:
---

Status: Patch Available  (was: Open)

> Compile failure with JDK8
> -
>
> Key: MAPREDUCE-5020
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5020
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.0.3-alpha
> Environment: java version "1.8.0-ea"
> Java(TM) SE Runtime Environment (build 1.8.0-ea-b36e)
> Java HotSpot(TM) Client VM (build 25.0-b04, mixed mode)
>Reporter: Trevor Robinson
>Assignee: Trevor Robinson
>  Labels: build-failure, jdk8
> Attachments: MAPREDUCE-5020.patch
>



[jira] [Assigned] (MAPREDUCE-5020) Compile failure with JDK8

2013-02-21 Thread Trevor Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Robinson reassigned MAPREDUCE-5020:
--

Assignee: Trevor Robinson

> Compile failure with JDK8
> -
>
> Key: MAPREDUCE-5020
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5020
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.0.3-alpha
> Environment: java version "1.8.0-ea"
> Java(TM) SE Runtime Environment (build 1.8.0-ea-b36e)
> Java HotSpot(TM) Client VM (build 25.0-b04, mixed mode)
>Reporter: Trevor Robinson
>Assignee: Trevor Robinson
>  Labels: build-failure, jdk8
> Attachments: MAPREDUCE-5020.patch
>



[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-02-21 Thread Surenkumar Nihalani (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583592#comment-13583592
 ] 

Surenkumar Nihalani commented on MAPREDUCE-4974:


I don't see the null check for key/value as the beneficial part of the 
optimization. Can you post the patch so I can review it?

I agree with the change:
bq. 2) if we have compressionCodecs & codec instantiated only if it's a 
compressed input.


> Optimising the LineRecordReader initialize() method
> ---
>
> Key: MAPREDUCE-4974
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv1, mrv2, performance
>Affects Versions: 2.0.2-alpha, 0.23.5
> Environment: Hadoop Linux
>Reporter: Arun A K
>Assignee: Gelesh
>  Labels: patch, performance
> Attachments: MAPREDUCE-4974.1.patch, MAPREDUCE-4974.2.patch, 
> MAPREDUCE-4974.3.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I found there is scope for optimizing the code in initialize(), if we have 
> compressionCodecs & codec instantiated only if the input is compressed.
> Meanwhile, Gelesh George Omathil added that we could avoid the null check of 
> key & value. This would save time, since the null check is currently done for 
> every next key/value generation. The intention is to instantiate only once 
> and avoid an NPE as well. Both could be met if key & value are initialized in 
> the initialize() method. We both have worked on it.
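A plain-Java sketch of the two proposed changes: defer codec setup until the input is known to be compressed, and create key/value once during initialization so no per-record null check is needed (the class and field names are illustrative, not the actual LineRecordReader members):

```java
// Illustrative reader: codec machinery is built only for compressed inputs,
// and key/value are created once up front instead of being null-checked on
// every nextKeyValue() call.
class SketchRecordReader {
  private String codec;          // stands in for CompressionCodec
  private StringBuilder key;     // stands in for LongWritable
  private StringBuilder value;   // stands in for Text
  private int pos;
  private final String[] lines;

  SketchRecordReader(String fileName, String[] lines) {
    this.lines = lines;
    // Lazy codec setup: only pay for it when the input is compressed.
    if (fileName.endsWith(".gz") || fileName.endsWith(".bz2")) {
      codec = "codec-for:" + fileName;
    }
    // Eager key/value creation: nextKeyValue() never needs a null check.
    key = new StringBuilder();
    value = new StringBuilder();
  }

  boolean nextKeyValue() {
    if (pos >= lines.length) {
      return false;
    }
    key.setLength(0);
    key.append(pos);
    value.setLength(0);
    value.append(lines[pos]);
    pos++;
    return true;
  }

  boolean isCompressed() {
    return codec != null;
  }
}
```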



[jira] [Updated] (MAPREDUCE-5020) Compile failure with JDK8

2013-02-21 Thread Trevor Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Robinson updated MAPREDUCE-5020:
---

Labels: build-failure jdk8  (was: jdk8)

> Compile failure with JDK8
> -
>
> Key: MAPREDUCE-5020
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5020
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.0.3-alpha
> Environment: java version "1.8.0-ea"
> Java(TM) SE Runtime Environment (build 1.8.0-ea-b36e)
> Java HotSpot(TM) Client VM (build 25.0-b04, mixed mode)
>Reporter: Trevor Robinson
>  Labels: build-failure, jdk8
>



[jira] [Created] (MAPREDUCE-5020) Compile failure with JDK8

2013-02-21 Thread Trevor Robinson (JIRA)
Trevor Robinson created MAPREDUCE-5020:
--

 Summary: Compile failure with JDK8
 Key: MAPREDUCE-5020
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5020
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 2.0.3-alpha
 Environment: java version "1.8.0-ea"
Java(TM) SE Runtime Environment (build 1.8.0-ea-b36e)
Java HotSpot(TM) Client VM (build 25.0-b04, mixed mode)
Reporter: Trevor Robinson


Compiling `org/apache/hadoop/mapreduce/lib/partition/InputSampler.java` fails 
with the Java 8 preview compiler due to its stricter enforcement of JLS 
15.12.2.6 (for [Java 
5|http://docs.oracle.com/javase/specs/jls/se5.0/html/expressions.html#15.12.2.6]
 or [Java 
7|http://docs.oracle.com/javase/specs/jls/se7/html/jls-15.html#jls-15.12.2.6]), 
which demands that methods applicable via unchecked conversion have their 
return type erased:

{noformat}
[ERROR] 
hadoop-common/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/partition/InputSampler.java:[320,35]
 error: incompatible types: Object[] cannot be converted to K[]
{noformat}

{code}
  @SuppressWarnings("unchecked") // getInputFormat, getOutputKeyComparator
  public static <K,V> void writePartitionFile(Job job, Sampler<K,V> sampler) 
  throws IOException, ClassNotFoundException, InterruptedException {
Configuration conf = job.getConfiguration();
final InputFormat inf = 
ReflectionUtils.newInstance(job.getInputFormatClass(), conf);
int numPartitions = job.getNumReduceTasks();
K[] samples = sampler.getSample(inf, job); // returns Object[] according to 
JLS
{code}



[jira] [Updated] (MAPREDUCE-5020) Compile failure with JDK8

2013-02-21 Thread Trevor Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Trevor Robinson updated MAPREDUCE-5020:
---

Description: 
Compiling {{org/apache/hadoop/mapreduce/lib/partition/InputSampler.java}} fails 
with the Java 8 preview compiler due to its stricter enforcement of JLS 
15.12.2.6 (for [Java 
5|http://docs.oracle.com/javase/specs/jls/se5.0/html/expressions.html#15.12.2.6]
 or [Java 
7|http://docs.oracle.com/javase/specs/jls/se7/html/jls-15.html#jls-15.12.2.6]), 
which demands that methods applicable via unchecked conversion have their 
return type erased:

{noformat}
[ERROR] 
hadoop-common/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/partition/InputSampler.java:[320,35]
 error: incompatible types: Object[] cannot be converted to K[]
{noformat}

{code}
  @SuppressWarnings("unchecked") // getInputFormat, getOutputKeyComparator
  public static <K,V> void writePartitionFile(Job job, Sampler<K,V> sampler) 
      throws IOException, ClassNotFoundException, InterruptedException {
    Configuration conf = job.getConfiguration();
    final InputFormat inf = 
        ReflectionUtils.newInstance(job.getInputFormatClass(), conf);
    int numPartitions = job.getNumReduceTasks();
    K[] samples = sampler.getSample(inf, job); // returns Object[] according to JLS
{code}

  was:
Compiling `org/apache/hadoop/mapreduce/lib/partition/InputSampler.java` fails 
with the Java 8 preview compiler due to its stricter enforcement of JLS 
15.12.2.6 (for [Java 
5|http://docs.oracle.com/javase/specs/jls/se5.0/html/expressions.html#15.12.2.6]
 or [Java 
7|http://docs.oracle.com/javase/specs/jls/se7/html/jls-15.html#jls-15.12.2.6]), 
which demands that methods applicable via unchecked conversion have their 
return type erased:

{noformat}
[ERROR] 
hadoop-common/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/partition/InputSampler.java:[320,35]
 error: incompatible types: Object[] cannot be converted to K[]
{noformat}

{code}
  @SuppressWarnings("unchecked") // getInputFormat, getOutputKeyComparator
  public static <K,V> void writePartitionFile(Job job, Sampler<K,V> sampler) 
      throws IOException, ClassNotFoundException, InterruptedException {
    Configuration conf = job.getConfiguration();
    final InputFormat inf = 
        ReflectionUtils.newInstance(job.getInputFormatClass(), conf);
    int numPartitions = job.getNumReduceTasks();
    K[] samples = sampler.getSample(inf, job); // returns Object[] according to JLS
{code}


> Compile failure with JDK8
> -
>
> Key: MAPREDUCE-5020
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5020
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.0.3-alpha
> Environment: java version "1.8.0-ea"
> Java(TM) SE Runtime Environment (build 1.8.0-ea-b36e)
> Java HotSpot(TM) Client VM (build 25.0-b04, mixed mode)
>Reporter: Trevor Robinson
>  Labels: jdk8
>
> Compiling {{org/apache/hadoop/mapreduce/lib/partition/InputSampler.java}} 
> fails with the Java 8 preview compiler due to its stricter enforcement of JLS 
> 15.12.2.6 (for [Java 
> 5|http://docs.oracle.com/javase/specs/jls/se5.0/html/expressions.html#15.12.2.6]
>  or [Java 
> 7|http://docs.oracle.com/javase/specs/jls/se7/html/jls-15.html#jls-15.12.2.6]),
>  which demands that methods applicable via unchecked conversion have their 
> return type erased:
> {noformat}
> [ERROR] 
> hadoop-common/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/partition/InputSampler.java:[320,35]
>  error: incompatible types: Object[] cannot be converted to K[]
> {noformat}
> {code}
>   @SuppressWarnings("unchecked") // getInputFormat, getOutputKeyComparator
>   public static <K,V> void writePartitionFile(Job job, Sampler<K,V> sampler) 
>       throws IOException, ClassNotFoundException, InterruptedException {
>     Configuration conf = job.getConfiguration();
>     final InputFormat inf = 
>         ReflectionUtils.newInstance(job.getInputFormatClass(), conf);
>     int numPartitions = job.getNumReduceTasks();
>     K[] samples = sampler.getSample(inf, job); // returns Object[] according to JLS
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-3951) Tasks are not evenly spread throughout cluster in MR2

2013-02-21 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583495#comment-13583495
 ] 

Sandy Ryza commented on MAPREDUCE-3951:
---

As of now, the fair and capacity schedulers support even spreading, but the 
fifo scheduler does not.  The capacity scheduler is the default scheduler.  
Should we still try to add it to the fifo scheduler as well?

> Tasks are not evenly spread throughout cluster in MR2
> -
>
> Key: MAPREDUCE-3951
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3951
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 0.23.0, 0.24.0
>Reporter: Todd Lipcon
>
> In MR1 (at least with the fair and fifo schedulers), if you submit a job that 
> needs fewer resources than the cluster can provide, the tasks are spread 
> relatively evenly across the node. For example, submitting a 100-map job to a 
> 50-node cluster, each with 10 slots, results in 2 tasks on each machine. In 
> MR2, however, the tasks would pile up on the first 10 nodes of the cluster, 
> leaving the other nodes unused. This is highly suboptimal for many use cases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5006) streaming tests failing

2013-02-21 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583481#comment-13583481
 ] 

Alejandro Abdelnur commented on MAPREDUCE-5006:
---

I think it is fine to just fix the testcases as the current patch proposes, 
since the testcases were coded assuming there is 1 map when using the local runner.

> streaming tests failing
> ---
>
> Key: MAPREDUCE-5006
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5006
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/streaming
>Affects Versions: 2.0.4-beta
>Reporter: Alejandro Abdelnur
>Assignee: Sandy Ryza
> Attachments: MAPREDUCE-5006.patch
>
>
> The following 2 tests are failing in trunk
> * org.apache.hadoop.streaming.TestStreamReduceNone
> * org.apache.hadoop.streaming.TestStreamXmlRecordReader

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5006) streaming tests failing

2013-02-21 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583472#comment-13583472
 ] 

Sandy Ryza commented on MAPREDUCE-5006:
---

MAPREDUCE-4994 made it so that LocalClientProtocolProvider doesn't always set 
the number of map tasks to 1.  As 2 is the value in mapred-default, the local 
job runner now defaults to 2 mappers.  Ideally, it would default to 1 and still 
be overridable via the command line, but I am told this isn't possible with the 
way configurations currently work.  Could we possibly add a 
mapred.local.job.maps property?
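The fallback logic such a property could give the local runner can be sketched as follows; java.util.Properties stands in for Hadoop's Configuration here, and the mapred.local.job.maps key is the hypothetical one proposed above, not an existing setting.

```java
import java.util.Properties;

// Sketch: prefer a dedicated local-runner key, then the generic
// mapreduce.job.maps from mapred-default, then a default of 1.
class LocalMapsDemo {
    static int localMaps(Properties conf) {
        String v = conf.getProperty("mapred.local.job.maps",
                       conf.getProperty("mapreduce.job.maps", "1"));
        return Integer.parseInt(v);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("mapreduce.job.maps", "2"); // mapred-default value
        System.out.println(localMaps(conf));         // falls back to the generic key: 2
        conf.setProperty("mapred.local.job.maps", "1");
        System.out.println(localMaps(conf));         // dedicated key wins: 1
    }
}
```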

> streaming tests failing
> ---
>
> Key: MAPREDUCE-5006
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5006
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/streaming
>Affects Versions: 2.0.4-beta
>Reporter: Alejandro Abdelnur
>Assignee: Sandy Ryza
> Attachments: MAPREDUCE-5006.patch
>
>
> The following 2 tests are failing in trunk
> * org.apache.hadoop.streaming.TestStreamReduceNone
> * org.apache.hadoop.streaming.TestStreamXmlRecordReader

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4693) Historyserver should provide counters for failed tasks

2013-02-21 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583450#comment-13583450
 ] 

Siddharth Seth commented on MAPREDUCE-4693:
---

bq. ... and the counters is converted to jobhistory.JhCounters while 
serializing.

Storing the counters as org.apache.hadoop.mapreduce.Counters prevents a 
duplicate copy of the counters until they're actually serialized, which happens 
in the getDatum() method (MAPREDUCE-3511).

Other than this one change and a couple of minor formatting fixes, the patch 
looks good.
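The lazy-conversion pattern referred to above can be sketched like this; the class names are stand-ins for illustration, not the actual Counters/JhCounters types.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: the event holds the live counters object and converts to the
// serialized form only when getDatum() is called, avoiding a duplicate copy.
class LazyCountersDemo {
    static class LiveCounters {                 // stands in for mapreduce.Counters
        final Map<String, Long> m = new LinkedHashMap<>();
    }
    static class SerializedCounters {           // stands in for jobhistory.JhCounters
        final String payload;
        SerializedCounters(String p) { payload = p; }
    }

    static class Event {
        private final LiveCounters counters;    // kept live, no eager copy
        Event(LiveCounters c) { counters = c; }
        SerializedCounters getDatum() {         // conversion happens only here
            return new SerializedCounters(counters.m.toString());
        }
    }

    public static void main(String[] args) {
        LiveCounters c = new LiveCounters();
        c.m.put("RECORDS", 42L);
        System.out.println(new Event(c).getDatum().payload); // {RECORDS=42}
    }
}
```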

> Historyserver should provide counters for failed tasks
> --
>
> Key: MAPREDUCE-4693
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4693
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver, mrv2
>Affects Versions: 2.0.3-alpha, 0.23.6
>Reporter: Jason Lowe
>Assignee: Xuan Gong
>  Labels: usability
> Attachments: MAPREDUCE-4693.1.patch, MAPREDUCE-4693.2.patch
>
>
> Currently the historyserver is not providing counters for failed tasks, even 
> though they are available via the AM as long as the job is still running.  
> Those counters are lost when the client needs to redirect to the 
> historyserver after the job completes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-4693) Historyserver should provide counters for failed tasks

2013-02-21 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated MAPREDUCE-4693:
--

Status: Open  (was: Patch Available)

> Historyserver should provide counters for failed tasks
> --
>
> Key: MAPREDUCE-4693
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4693
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver, mrv2
>Affects Versions: 2.0.3-alpha, 0.23.6
>Reporter: Jason Lowe
>Assignee: Xuan Gong
>  Labels: usability
> Attachments: MAPREDUCE-4693.1.patch, MAPREDUCE-4693.2.patch
>
>
> Currently the historyserver is not providing counters for failed tasks, even 
> though they are available via the AM as long as the job is still running.  
> Those counters are lost when the client needs to redirect to the 
> historyserver after the job completes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-02-21 Thread Gelesh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583406#comment-13583406
 ] 

Gelesh commented on MAPREDUCE-4974:
---

Also, as [~ak.a...@aol.com] has mentioned:
1) avoid the ' if (newSize == 0) ' check inside the loop, and
2) have compressionCodecs & codec instantiated only if the input is compressed.

I hope these two points are valid; please share your thoughts.
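The refactoring under discussion can be sketched as below; the field and method names are illustrative stand-ins, not the actual LineRecordReader members.

```java
// Sketch: create key/value once in initialize() so the per-record path
// (nextKeyValue) needs no null checks or allocations.
class LineReaderDemo {
    Object key;           // stands in for LongWritable
    StringBuilder value;  // stands in for Text

    void initialize() {
        // instantiated exactly once, up front
        key = new Object();
        value = new StringBuilder();
    }

    boolean nextKeyValue(String line) {
        // no "if (value == null) value = new Text();" on the hot path;
        // the buffer is simply reset and reused for each record
        value.setLength(0);
        value.append(line);
        return true;
    }
}
```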

> Optimising the LineRecordReader initialize() method
> ---
>
> Key: MAPREDUCE-4974
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv1, mrv2, performance
>Affects Versions: 2.0.2-alpha, 0.23.5
> Environment: Hadoop Linux
>Reporter: Arun A K
>Assignee: Gelesh
>  Labels: patch, performance
> Attachments: MAPREDUCE-4974.1.patch, MAPREDUCE-4974.2.patch, 
> MAPREDUCE-4974.3.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I found there is scope for optimizing the code in initialize() if we have 
> compressionCodecs & codec instantiated only if it is a compressed input.
> Meanwhile, Gelesh George Omathil added that we could avoid the null check of 
> key & value. This would save time, since the null check is done for every 
> next key/value generation. The intention is to instantiate only once and 
> avoid an NPE as well. Both could be met if we initialize key & value in the 
> initialize() method. We have both worked on it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-02-21 Thread Gelesh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583403#comment-13583403
 ] 

Gelesh commented on MAPREDUCE-4974:
---

[~snihalani],
Thanks for bringing up that very valid point.
In that case, what if we eliminate the null check for the value alone and keep 
the null check for the key as is?


> Optimising the LineRecordReader initialize() method
> ---
>
> Key: MAPREDUCE-4974
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv1, mrv2, performance
>Affects Versions: 2.0.2-alpha, 0.23.5
> Environment: Hadoop Linux
>Reporter: Arun A K
>Assignee: Gelesh
>  Labels: patch, performance
> Attachments: MAPREDUCE-4974.1.patch, MAPREDUCE-4974.2.patch, 
> MAPREDUCE-4974.3.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I found there is scope for optimizing the code in initialize() if we have 
> compressionCodecs & codec instantiated only if it is a compressed input.
> Meanwhile, Gelesh George Omathil added that we could avoid the null check of 
> key & value. This would save time, since the null check is done for every 
> next key/value generation. The intention is to instantiate only once and 
> avoid an NPE as well. Both could be met if we initialize key & value in the 
> initialize() method. We have both worked on it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5008) Merger progress miscounts with respect to EOF_MARKER

2013-02-21 Thread Tom White (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583388#comment-13583388
 ] 

Tom White commented on MAPREDUCE-5008:
--

Yes, that is fine. +1 for the patch.

> Merger progress miscounts with respect to EOF_MARKER
> 
>
> Key: MAPREDUCE-5008
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5008
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.0.3-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: MAPREDUCE-5008-branch-1.patch, MAPREDUCE-5008.patch, 
> MAPREDUCE-5008.patch
>
>
> After MAPREDUCE-2264, a segment's raw data length is calculated without the 
> EOF_MARKER bytes.  However, when the merge is counting how many bytes it 
> processed, it includes the marker.  This can cause the merge progress to go 
> above 100%.
> Whether these EOF_MARKER bytes should count should be consistent between the 
> two.
> This is a JIRA instead of an amendment because MAPREDUCE-2264 already went 
> into 2.0.3.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (MAPREDUCE-3778) Per-state RM app-pages should have search ala JHS pages

2013-02-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved MAPREDUCE-3778.


Resolution: Duplicate

This is already fixed as I see now. Closing as duplicate.

> Per-state RM app-pages should have search ala JHS pages
> ---
>
> Key: MAPREDUCE-3778
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3778
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2, webapps
>Affects Versions: 0.23.0
>Reporter: Vinod Kumar Vavilapalli
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5008) Merger progress miscounts with respect to EOF_MARKER

2013-02-21 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583357#comment-13583357
 ] 

Sandy Ryza commented on MAPREDUCE-5008:
---

With the merge turned on in the local job runner (MAPREDUCE-434), TestReporter, 
which covers progress counting, fails.  With this patch and MAPREDUCE-434, it 
passes.  Does that sound sufficient to you, Tom?
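The mismatch is small but enough to push the reported fraction past 1.0; a toy calculation (the marker size and byte counts here are made up for illustration) shows how:

```java
// Toy illustration of the miscount: the segment's reported raw length
// excludes the EOF marker, but the merge counts the marker bytes it reads,
// so processed bytes can exceed the total and progress exceeds 1.0.
class MergeProgressDemo {
    static final int EOF_MARKER_BYTES = 2;   // illustrative size only

    public static void main(String[] args) {
        long rawDataLength = 1000;                        // marker excluded
        long bytesProcessed = 1000 + EOF_MARKER_BYTES;    // marker included
        double progress = (double) bytesProcessed / rawDataLength;
        System.out.println(progress > 1.0);               // true
    }
}
```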

> Merger progress miscounts with respect to EOF_MARKER
> 
>
> Key: MAPREDUCE-5008
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5008
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.0.3-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: MAPREDUCE-5008-branch-1.patch, MAPREDUCE-5008.patch, 
> MAPREDUCE-5008.patch
>
>
> After MAPREDUCE-2264, a segment's raw data length is calculated without the 
> EOF_MARKER bytes.  However, when the merge is counting how many bytes it 
> processed, it includes the marker.  This can cause the merge progress to go 
> above 100%.
> Whether these EOF_MARKER bytes should count should be consistent between the 
> two.
> This is a JIRA instead of an amendment because MAPREDUCE-2264 already went 
> into 2.0.3.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5017) Provide access to launcher job URL from web console when using Map Reduce action

2013-02-21 Thread Ryota Egashira (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583355#comment-13583355
 ] 

Ryota Egashira commented on MAPREDUCE-5017:
---

Oh, I didn't know that. Thanks, Harsh.

> Provide access to launcher job URL from web console when using Map Reduce 
> action 
> -
>
> Key: MAPREDUCE-5017
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5017
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: trunk
>Reporter: Ryota Egashira
>Assignee: Ryota Egashira
> Fix For: trunk
>
>
> There are applications where a custom InputFormat is used in an MR action, 
> and log messages from the InputFormat are written to the launcher task log. 
> For debugging purposes, users need to check the launcher task log, but 
> currently in the MR action, Oozie automatically swaps the external ID and 
> does not expose the launcher ID in the web console (right now the only way 
> is to grep oozie.log). This JIRA is to show the launcher job URL on the web 
> console when using a Map Reduce action.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5018) Support raw binary data with Hadoop streaming

2013-02-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583352#comment-13583352
 ] 

Hadoop QA commented on MAPREDUCE-5018:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12570328/mapstream
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3353//console

This message is automatically generated.

> Support raw binary data with Hadoop streaming
> -
>
> Key: MAPREDUCE-5018
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5018
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/streaming
>Reporter: Jay Hacker
>Priority: Minor
> Attachments: justbytes.jar, MAPREDUCE-5018.patch, mapstream
>
>
> People often have a need to run older programs over many files, and turn to 
> Hadoop streaming as a reliable, performant batch system.  There are good 
> reasons for this:
> 1. Hadoop is convenient: they may already be using it for mapreduce jobs, and 
> it is easy to spin up a cluster in the cloud.
> 2. It is reliable: HDFS replicates data and the scheduler retries failed jobs.
> 3. It is reasonably performant: it moves the code to the data, maintaining 
> locality, and scales with the number of nodes.
> Historically Hadoop is of course oriented toward processing key/value pairs, 
> and so needs to interpret the data passing through it.  Unfortunately, this 
> makes it difficult to use Hadoop streaming with programs that don't deal in 
> key/value pairs, or with binary data in general.  For example, something as 
> simple as running md5sum to verify the integrity of files will not give the 
> correct result, due to Hadoop's interpretation of the data.  
> There have been several attempts at binary serialization schemes for Hadoop 
> streaming, such as TypedBytes (HADOOP-1722); however, these are still aimed 
> at efficiently encoding key/value pairs, and not passing data through 
> unmodified.  Even the "RawBytes" serialization scheme adds length fields to 
> the data, rendering it not-so-raw.
> I often have a need to run a Unix filter on files stored in HDFS; currently, 
> the only way I can do this on the raw data is to copy the data out and run 
> the filter on one machine, which is inconvenient, slow, and unreliable.  It 
> would be very convenient to run the filter as a map-only job, allowing me to 
> build on existing (well-tested!) building blocks in the Unix tradition 
> instead of reimplementing them as mapreduce programs.
> However, most existing tools don't know about file splits, and so want to 
> process whole files; and of course many expect raw binary input and output.  
> The solution is to run a map-only job with an InputFormat and OutputFormat 
> that just pass raw bytes and don't split.  It turns out to be a little more 
> complicated with streaming; I have attached a patch with the simplest 
> solution I could come up with.  I call the format "JustBytes" (as "RawBytes" 
> was already taken), and it should be usable with most recent versions of 
> Hadoop.
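The core of the non-splitting, pass-through approach the description outlines can be reduced to reading a whole file as one unframed byte record; this sketch uses plain java.nio rather than the patch's actual JustBytes classes, whose names and structure are not reproduced here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: in the real InputFormat, isSplitable() would return false and the
// record reader would emit the file's bytes unmodified as a single value,
// with no length prefix or other framing.
class WholeFileReadDemo {
    static byte[] readWholeFile(Path p) throws IOException {
        return Files.readAllBytes(p); // the entire file, byte-for-byte
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("justbytes", ".bin");
        Files.write(tmp, new byte[] {0, 1, 2, 3});
        System.out.println(readWholeFile(tmp).length); // 4
        Files.delete(tmp);
    }
}
```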

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-5018) Support raw binary data with Hadoop streaming

2013-02-21 Thread Jay Hacker (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Hacker updated MAPREDUCE-5018:
--

Attachment: mapstream
justbytes.jar

I've attached a jar file with source and compiled binaries for people who want 
to try it out without recompiling Hadoop.  You can use the attached 'mapstream' 
shell script to run it easily.

For those interested in performance, the TL;DR is that it is about 10X slower 
than native.  That's running 'cat' as the mapper on one file that fits in one 
block, compared to cat on a local ext4 filesystem on the same machine.  If your 
files span multiple blocks, the non-local reads will be even slower.  That also 
doesn't include job overhead.  However, most mappers will be more CPU 
intensive, and the relative overhead of I/O diminishes; YMMV.

> Support raw binary data with Hadoop streaming
> -
>
> Key: MAPREDUCE-5018
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5018
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/streaming
>Reporter: Jay Hacker
>Priority: Minor
> Attachments: justbytes.jar, MAPREDUCE-5018.patch, mapstream
>
>
> People often have a need to run older programs over many files, and turn to 
> Hadoop streaming as a reliable, performant batch system.  There are good 
> reasons for this:
> 1. Hadoop is convenient: they may already be using it for mapreduce jobs, and 
> it is easy to spin up a cluster in the cloud.
> 2. It is reliable: HDFS replicates data and the scheduler retries failed jobs.
> 3. It is reasonably performant: it moves the code to the data, maintaining 
> locality, and scales with the number of nodes.
> Historically Hadoop is of course oriented toward processing key/value pairs, 
> and so needs to interpret the data passing through it.  Unfortunately, this 
> makes it difficult to use Hadoop streaming with programs that don't deal in 
> key/value pairs, or with binary data in general.  For example, something as 
> simple as running md5sum to verify the integrity of files will not give the 
> correct result, due to Hadoop's interpretation of the data.  
> There have been several attempts at binary serialization schemes for Hadoop 
> streaming, such as TypedBytes (HADOOP-1722); however, these are still aimed 
> at efficiently encoding key/value pairs, and not passing data through 
> unmodified.  Even the "RawBytes" serialization scheme adds length fields to 
> the data, rendering it not-so-raw.
> I often have a need to run a Unix filter on files stored in HDFS; currently, 
> the only way I can do this on the raw data is to copy the data out and run 
> the filter on one machine, which is inconvenient, slow, and unreliable.  It 
> would be very convenient to run the filter as a map-only job, allowing me to 
> build on existing (well-tested!) building blocks in the Unix tradition 
> instead of reimplementing them as mapreduce programs.
> However, most existing tools don't know about file splits, and so want to 
> process whole files; and of course many expect raw binary input and output.  
> The solution is to run a map-only job with an InputFormat and OutputFormat 
> that just pass raw bytes and don't split.  It turns out to be a little more 
> complicated with streaming; I have attached a patch with the simplest 
> solution I could come up with.  I call the format "JustBytes" (as "RawBytes" 
> was already taken), and it should be usable with most recent versions of 
> Hadoop.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5018) Support raw binary data with Hadoop streaming

2013-02-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583302#comment-13583302
 ] 

Hadoop QA commented on MAPREDUCE-5018:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12570317/MAPREDUCE-5018.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3352//console

This message is automatically generated.

> Support raw binary data with Hadoop streaming
> -
>
> Key: MAPREDUCE-5018
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5018
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/streaming
>Reporter: Jay Hacker
>Priority: Minor
> Attachments: MAPREDUCE-5018.patch
>
>
> People often have a need to run older programs over many files, and turn to 
> Hadoop streaming as a reliable, performant batch system.  There are good 
> reasons for this:
> 1. Hadoop is convenient: they may already be using it for mapreduce jobs, and 
> it is easy to spin up a cluster in the cloud.
> 2. It is reliable: HDFS replicates data and the scheduler retries failed jobs.
> 3. It is reasonably performant: it moves the code to the data, maintaining 
> locality, and scales with the number of nodes.
> Historically Hadoop is of course oriented toward processing key/value pairs, 
> and so needs to interpret the data passing through it.  Unfortunately, this 
> makes it difficult to use Hadoop streaming with programs that don't deal in 
> key/value pairs, or with binary data in general.  For example, something as 
> simple as running md5sum to verify the integrity of files will not give the 
> correct result, due to Hadoop's interpretation of the data.  
> There have been several attempts at binary serialization schemes for Hadoop 
> streaming, such as TypedBytes (HADOOP-1722); however, these are still aimed 
> at efficiently encoding key/value pairs, and not passing data through 
> unmodified.  Even the "RawBytes" serialization scheme adds length fields to 
> the data, rendering it not-so-raw.
> I often have a need to run a Unix filter on files stored in HDFS; currently, 
> the only way I can do this on the raw data is to copy the data out and run 
> the filter on one machine, which is inconvenient, slow, and unreliable.  It 
> would be very convenient to run the filter as a map-only job, allowing me to 
> build on existing (well-tested!) building blocks in the Unix tradition 
> instead of reimplementing them as mapreduce programs.
> However, most existing tools don't know about file splits, and so want to 
> process whole files; and of course many expect raw binary input and output.  
> The solution is to run a map-only job with an InputFormat and OutputFormat 
> that just pass raw bytes and don't split.  It turns out to be a little more 
> complicated with streaming; I have attached a patch with the simplest 
> solution I could come up with.  I call the format "JustBytes" (as "RawBytes" 
> was already taken), and it should be usable with most recent versions of 
> Hadoop.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-5018) Support raw binary data with Hadoop streaming

2013-02-21 Thread Jay Hacker (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Hacker updated MAPREDUCE-5018:
--

Attachment: MAPREDUCE-5018.patch

justbytes patch submitted for code review.

> Support raw binary data with Hadoop streaming
> -
>
> Key: MAPREDUCE-5018
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5018
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/streaming
>Reporter: Jay Hacker
>Priority: Minor
> Attachments: MAPREDUCE-5018.patch
>
>
> People often have a need to run older programs over many files, and turn to 
> Hadoop streaming as a reliable, performant batch system.  There are good 
> reasons for this:
> 1. Hadoop is convenient: they may already be using it for mapreduce jobs, and 
> it is easy to spin up a cluster in the cloud.
> 2. It is reliable: HDFS replicates data and the scheduler retries failed jobs.
> 3. It is reasonably performant: it moves the code to the data, maintaining 
> locality, and scales with the number of nodes.
> Historically Hadoop is of course oriented toward processing key/value pairs, 
> and so needs to interpret the data passing through it.  Unfortunately, this 
> makes it difficult to use Hadoop streaming with programs that don't deal in 
> key/value pairs, or with binary data in general.  For example, something as 
> simple as running md5sum to verify the integrity of files will not give the 
> correct result, due to Hadoop's interpretation of the data.  
> There have been several attempts at binary serialization schemes for Hadoop 
> streaming, such as TypedBytes (HADOOP-1722); however, these are still aimed 
> at efficiently encoding key/value pairs, and not passing data through 
> unmodified.  Even the "RawBytes" serialization scheme adds length fields to 
> the data, rendering it not-so-raw.
> I often have a need to run a Unix filter on files stored in HDFS; currently, 
> the only way I can do this on the raw data is to copy the data out and run 
> the filter on one machine, which is inconvenient, slow, and unreliable.  It 
> would be very convenient to run the filter as a map-only job, allowing me to 
> build on existing (well-tested!) building blocks in the Unix tradition 
> instead of reimplementing them as mapreduce programs.
> However, most existing tools don't know about file splits, and so want to 
> process whole files; and of course many expect raw binary input and output.  
> The solution is to run a map-only job with an InputFormat and OutputFormat 
> that just pass raw bytes and don't split.  It turns out to be a little more 
> complicated with streaming; I have attached a patch with the simplest 
> solution I could come up with.  I call the format "JustBytes" (as "RawBytes" 
> was already taken), and it should be usable with most recent versions of 
> Hadoop.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-5018) Support raw binary data with Hadoop streaming

2013-02-21 Thread Jay Hacker (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Hacker updated MAPREDUCE-5018:
--

Target Version/s: trunk
Release Note: Add "-io justbytes" I/O format to allow raw binary 
streaming.
  Status: Patch Available  (was: Open)

This patch adds a 'JustBytesWritable' and supporting InputFormat, OutputFormat, 
InputWriter, and OutputReader to support passing raw, unmodified, unaugmented 
bytes through Hadoop streaming.  The purpose is to be able to run arbitrary 
Unix filters on entire binary files stored in HDFS as map-only jobs, taking 
advantage of locality and reliability offered by Hadoop.

The code is very straightforward; most methods are only one line.

A few design notes:

1. Data is stored in a JustBytesWritable, which is the simplest possible 
Writable wrapper around a byte[].  It literally just reads until the buffer is 
full or EOF and remembers the number of bytes.

2. Data is read by JustBytesInputFormat in 64K chunks by default and stored in 
a JustBytesWritable key; the value is a NullWritable, but no value is ever read 
or written.  The key is used instead of the value to allow the possibility of 
using it in a reduce.

3. Input files are never split, as most programs are not able to handle splits.

4. Input files are not decompressed: the purpose is to get raw data to a 
program, people may want to operate on compressed data (e.g., md5sum on 
archives), and most tools do not expect automatic decompression, so this is 
the "least surprising" option.  It's also trivial to put "zcat" in front of 
your filter.

5. Output is even simpler than input, and just writes the bytes of a 
JustBytesWritable key to the output stream.  Output is never compressed, for 
similar reasons as above.

6. The code uses the old mapred API, as that is what streaming uses.

Streaming inserts an InputWriter between the InputFormat and the map 
executable, and an OutputReader between the map executable and the 
OutputFormat; the JustBytes versions simply pass the key bytes straight through.

I've augmented IdentifierResolver to recognize "-io justbytes" on the command 
line and set the input/output classes appropriately.

I've included a shell script called "mapstream" to run streaming with all 
required command line parameters; it makes running a binary map-only job as 
easy as:

mapstream indir command outdir

which runs "command" on every file in indir and writes the results to outdir.

I welcome feedback, especially if there is an even simpler way to do this.  I'm 
not hung up on the JustBytes name, I'd be happy to switch to a better one.  If 
people like the general approach, I will add unit tests and resubmit.  Also 
please let me know if I should break this into separate patches for common and 
mapreduce.
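
For reference, the "read until the buffer is full or EOF" behavior from design 
note 1 can be sketched without any Hadoop dependencies. This is an illustration 
only: JustBytesSketch and readUntilFullOrEof are invented names, and the real 
JustBytesWritable in the patch implements org.apache.hadoop.io.Writable.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical, Hadoop-free sketch of the JustBytesWritable read loop:
// fill the buffer until it is full or the stream hits EOF, and remember
// how many bytes were actually read.
public class JustBytesSketch {
    static int readUntilFullOrEof(InputStream in, byte[] buf) {
        int total = 0;
        try {
            while (total < buf.length) {
                int n = in.read(buf, total, buf.length - total);
                if (n < 0) break;      // EOF before the buffer filled up
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;                   // number of valid bytes in buf
    }

    public static void main(String[] args) {
        byte[] data = "hello".getBytes();
        byte[] buf = new byte[64 * 1024];   // 64K default chunk, per note 2
        System.out.println(readUntilFullOrEof(new ByteArrayInputStream(data), buf)); // prints 5
    }
}
```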

> Support raw binary data with Hadoop streaming
> -
>
> Key: MAPREDUCE-5018
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5018
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/streaming
>Reporter: Jay Hacker
>Priority: Minor
>
> People often have a need to run older programs over many files, and turn to 
> Hadoop streaming as a reliable, performant batch system.  There are good 
> reasons for this:
> 1. Hadoop is convenient: they may already be using it for mapreduce jobs, and 
> it is easy to spin up a cluster in the cloud.
> 2. It is reliable: HDFS replicates data and the scheduler retries failed jobs.
> 3. It is reasonably performant: it moves the code to the data, maintaining 
> locality, and scales with the number of nodes.
> Historically Hadoop is of course oriented toward processing key/value pairs, 
> and so needs to interpret the data passing through it.  Unfortunately, this 
> makes it difficult to use Hadoop streaming with programs that don't deal in 
> key/value pairs, or with binary data in general.  For example, something as 
> simple as running md5sum to verify the integrity of files will not give the 
> correct result, due to Hadoop's interpretation of the data.  
> There have been several attempts at binary serialization schemes for Hadoop 
> streaming, such as TypedBytes (HADOOP-1722); however, these are still aimed 
> at efficiently encoding key/value pairs, and not passing data through 
> unmodified.  Even the "RawBytes" serialization scheme adds length fields to 
> the data, rendering it not-so-raw.
> I often have a need to run a Unix filter on files stored in HDFS; currently, 
> the only way I can do this on the raw data is to copy the data out and run 
> the filter on one machine, which is inconvenient, slow, and unreliable.  It 
> would be very convenient to run the filter as a map-only job, allowing me to 
> build on existing (well-tested!) building blocks in the 

[jira] [Created] (MAPREDUCE-5019) Fair scheduler should allow preemption on reducer only

2013-02-21 Thread Damien Hardy (JIRA)
Damien Hardy created MAPREDUCE-5019:
---

 Summary: Fair scheduler should allow preemption on reducer only
 Key: MAPREDUCE-5019
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5019
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv1, scheduler
Affects Versions: 2.0.2-alpha
 Environment: CDH4.1.2
Reporter: Damien Hardy
Priority: Minor


The fair scheduler is very good.
But consider a big MR job running lots of mappers and reducers (10M + 10R),
then a small MR job in the same pool (1M + 1R),
with slots for 10 mappers and 10 reducers:

 - The big job takes all the map slots.
 - The small job waits for a map slot.
 - The first big-job map task finishes.
 - The small job takes the map slot it needs.
 - Meanwhile, all the reducers of the big job take all the reduce slots to 
copy and sort.
 - The small job finishes its map and must wait for all the big job's maps to 
end, and for one of its reducers to end, before getting a reduce slot.
 - All the reducers stall after sorting, waiting for the mappers to end one by 
one...

If I have a big job and a lot of small ones, I don't want each new small job 
to kill running map tasks of the big job to get a slot.

I think it could be useful for the small job to be able to kill a reducer task 
(and only a reducer) instead of waiting for the big job to finish all its map 
tasks and one of its reducers.

The rule could be: a job that has finished all its maps and is waiting for a 
reduce slot may kill reducer tasks of a job that still has map tasks running 
(assuming those reducers are just waiting to copy and sort).
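
The proposed rule can be written down as a simple predicate. The sketch below 
is an illustration only, under the issue's assumptions: the class, method, and 
field names are invented, and this is not fair scheduler code.

```java
// Hypothetical sketch of the reduce-only preemption rule proposed above:
// a job whose maps are all finished and which is starved for a reduce slot
// may preempt a reducer of a job that still has map tasks running (those
// reducers are assumed to be idle, waiting to copy and sort).
public class ReduceOnlyPreemptionRule {
    static class JobState {
        final boolean mapsFinished;
        final boolean waitingForReduceSlot;
        final int runningMaps;
        JobState(boolean mapsFinished, boolean waitingForReduceSlot, int runningMaps) {
            this.mapsFinished = mapsFinished;
            this.waitingForReduceSlot = waitingForReduceSlot;
            this.runningMaps = runningMaps;
        }
    }

    static boolean mayPreemptReducerOf(JobState starved, JobState victim) {
        return starved.mapsFinished
                && starved.waitingForReduceSlot
                && victim.runningMaps > 0;
    }

    public static void main(String[] args) {
        JobState small = new JobState(true, true, 0);   // maps done, starved
        JobState big = new JobState(false, false, 8);   // still mapping
        System.out.println(mayPreemptReducerOf(small, big)); // prints true
    }
}
```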

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (MAPREDUCE-5018) Support raw binary data with Hadoop streaming

2013-02-21 Thread Jay Hacker (JIRA)
Jay Hacker created MAPREDUCE-5018:
-

 Summary: Support raw binary data with Hadoop streaming
 Key: MAPREDUCE-5018
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5018
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: contrib/streaming
Reporter: Jay Hacker
Priority: Minor


People often have a need to run older programs over many files, and turn to 
Hadoop streaming as a reliable, performant batch system.  There are good 
reasons for this:

1. Hadoop is convenient: they may already be using it for mapreduce jobs, and 
it is easy to spin up a cluster in the cloud.
2. It is reliable: HDFS replicates data and the scheduler retries failed jobs.
3. It is reasonably performant: it moves the code to the data, maintaining 
locality, and scales with the number of nodes.

Historically Hadoop is of course oriented toward processing key/value pairs, 
and so needs to interpret the data passing through it.  Unfortunately, this 
makes it difficult to use Hadoop streaming with programs that don't deal in 
key/value pairs, or with binary data in general.  For example, something as 
simple as running md5sum to verify the integrity of files will not give the 
correct result, due to Hadoop's interpretation of the data.  

There have been several attempts at binary serialization schemes for Hadoop 
streaming, such as TypedBytes (HADOOP-1722); however, these are still aimed at 
efficiently encoding key/value pairs, and not passing data through unmodified.  
Even the "RawBytes" serialization scheme adds length fields to the data, 
rendering it not-so-raw.

I often have a need to run a Unix filter on files stored in HDFS; currently, 
the only way I can do this on the raw data is to copy the data out and run the 
filter on one machine, which is inconvenient, slow, and unreliable.  It would 
be very convenient to run the filter as a map-only job, allowing me to build on 
existing (well-tested!) building blocks in the Unix tradition instead of 
reimplementing them as mapreduce programs.

However, most existing tools don't know about file splits, and so want to 
process whole files; and of course many expect raw binary input and output.  
The solution is to run a map-only job with an InputFormat and OutputFormat that 
just pass raw bytes and don't split.  It turns out to be a little more 
complicated with streaming; I have attached a patch with the simplest solution 
I could come up with.  I call the format "JustBytes" (as "RawBytes" was already 
taken), and it should be usable with most recent versions of Hadoop.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4974) Optimising the LineRecordReader initialize() method

2013-02-21 Thread Surenkumar Nihalani (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583195#comment-13583195
 ] 

Surenkumar Nihalani commented on MAPREDUCE-4974:


I don't think we return null when there is no key, so it no longer follows the 
contract at 
http://hadoop.apache.org/docs/r1.0.4/api/org/apache/hadoop/mapreduce/RecordReader.html#getCurrentKey%28%29

Can you confirm it works?

> Optimising the LineRecordReader initialize() method
> ---
>
> Key: MAPREDUCE-4974
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4974
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv1, mrv2, performance
>Affects Versions: 2.0.2-alpha, 0.23.5
> Environment: Hadoop Linux
>Reporter: Arun A K
>Assignee: Gelesh
>  Labels: patch, performance
> Attachments: MAPREDUCE-4974.1.patch, MAPREDUCE-4974.2.patch, 
> MAPREDUCE-4974.3.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I found there is scope for optimizing the code in initialize(): we need 
> compressionCodecs and codec instantiated only if the input is compressed.
> Meanwhile, Gelesh George Omathil added that we could avoid the null checks on 
> key and value. This would save time, since the null check is currently done 
> for every next key/value generation. The intention is to instantiate them 
> only once and avoid NPEs as well. Both goals could be met by initializing key 
> and value in the initialize() method. We have both worked on it.
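
The idea in the description can be sketched without Hadoop: allocate key and 
value once in initialize() so the per-record loop needs no null checks. All 
names below are invented for illustration; this is not LineRecordReader code.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical sketch: key and value are allocated once in initialize(),
// so nextKeyValue() reuses them with no "if (value == null)" branch on
// every record.
public class EagerInitReader {
    private StringBuilder key;    // stands in for LongWritable
    private StringBuilder value;  // stands in for Text
    private Iterator<String> lines;

    void initialize(Iterable<String> split) {
        key = new StringBuilder();    // created once, up front
        value = new StringBuilder();
        lines = split.iterator();
    }

    boolean nextKeyValue() {
        if (!lines.hasNext()) return false;
        value.setLength(0);
        value.append(lines.next());   // no per-record null check needed
        return true;
    }

    public static void main(String[] args) {
        EagerInitReader r = new EagerInitReader();
        r.initialize(Arrays.asList("line one", "line two"));
        int records = 0;
        while (r.nextKeyValue()) records++;
        System.out.println(records); // prints 2
    }
}
```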

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4951) Container preemption interpreted as task failure

2013-02-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583177#comment-13583177
 ] 

Hudson commented on MAPREDUCE-4951:
---

Integrated in Hadoop-Mapreduce-trunk #1351 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1351/])
MAPREDUCE-4951. Container preemption interpreted as task failure. 
Contributed by Sandy Ryza. (Revision 1448615)

 Result = FAILURE
tomwhite : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448615
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRMContainerAllocator.java


> Container preemption interpreted as task failure
> 
>
> Key: MAPREDUCE-4951
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4951
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, mr-am, mrv2
>Affects Versions: 2.0.2-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Fix For: 2.0.4-beta
>
> Attachments: MAPREDUCE-4951-1.patch, MAPREDUCE-4951-2.patch, 
> MAPREDUCE-4951.patch
>
>
> When YARN reports a completed container to the MR AM, it always interprets it 
> as a failure.  This can lead to a job failing because too many of its tasks 
> failed, when in fact they only failed because the scheduler preempted them.
> MR needs to recognize the special exit code value of -100 and interpret it as 
> a container being killed instead of a container failure.
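
The fix described above boils down to one comparison. The sketch below is an 
illustration only: the constant name and Outcome enum are invented, though 
-100 is the exit status value named in the issue.

```java
// Hypothetical sketch of the check MAPREDUCE-4951 asks for: a container
// that exits with the preemption status -100 should count as KILLED,
// not FAILED, so preempted attempts don't push the job toward failure.
public class PreemptionCheck {
    static final int ABORTED_CONTAINER_EXIT_STATUS = -100; // value from the issue text

    enum Outcome { FAILED, KILLED }

    static Outcome classify(int containerExitStatus) {
        return containerExitStatus == ABORTED_CONTAINER_EXIT_STATUS
                ? Outcome.KILLED   // preempted by the scheduler
                : Outcome.FAILED;  // a genuine task failure
    }

    public static void main(String[] args) {
        System.out.println(classify(-100)); // prints KILLED
        System.out.println(classify(1));    // prints FAILED
    }
}
```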

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5013) mapred.JobStatus compatibility: MR2 missing constructors from MR1

2013-02-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583178#comment-13583178
 ] 

Hudson commented on MAPREDUCE-5013:
---

Integrated in Hadoop-Mapreduce-trunk #1351 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1351/])
MAPREDUCE-5013. mapred.JobStatus compatibility: MR2 missing constructors 
from MR1. Contributed by Sandy Ryza. (Revision 1448602)

 Result = FAILURE
tomwhite : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448602
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/JobStatus.java


> mapred.JobStatus compatibility: MR2 missing constructors from MR1
> -
>
> Key: MAPREDUCE-5013
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5013
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.0.3-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Fix For: 2.0.4-beta
>
> Attachments: MAPREDUCE-5013.patch
>
>
> JobStatus is missing the following constructors in MR2 that were present in 
> MR1
> public org.apache.hadoop.mapred.JobStatus(org.apache.hadoop.mapred.JobID, 
> float, float, float, int);
> public org.apache.hadoop.mapred.JobStatus(org.apache.hadoop.mapred.JobID, 
> float, float, int);
> public org.apache.hadoop.mapred.JobStatus(org.apache.hadoop.mapred.JobID, 
> float, float, float, int, org.apache.hadoop.mapred.JobPriority);
> public org.apache.hadoop.mapred.JobStatus(org.apache.hadoop.mapred.JobID, 
> float, float, float, float, int, org.apache.hadoop.mapred.JobPriority);

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4846) Some JobQueueInfo methods are public in MR1 but protected in MR2

2013-02-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583176#comment-13583176
 ] 

Hudson commented on MAPREDUCE-4846:
---

Integrated in Hadoop-Mapreduce-trunk #1351 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1351/])
MAPREDUCE-4846. Some JobQueueInfo methods are public in MR1 but protected 
in MR2. Contributed by Sandy Ryza. (Revision 1448597)

 Result = FAILURE
tomwhite : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448597
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/JobQueueInfo.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/QueueConfigurationParser.java


> Some JobQueueInfo methods are public in MR1 but protected in MR2
> 
>
> Key: MAPREDUCE-4846
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4846
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 2.0.2-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Fix For: 2.0.4-beta
>
> Attachments: MAPREDUCE-4846-1.patch, MAPREDUCE-4846-1.patch, 
> MAPREDUCE-4846.patch
>
>
> setQueueName, setSchedulingInfo, and setQueueState were public in MR1, but 
> are private in MR2.  They should be made public with 
> InterfaceAudience.Private.
> getQueueState was public, but is now package private.  It has been replaced 
> with getState, which returns a QueueState instead of a String.  It should be 
> made public and deprecated, with a documentation reference to getState.
> Should the other setter methods in JobQueueInfo that were not in MR1 be 
> changed to public/InterfaceAudience.Private for consistency?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4502) Multi-level aggregation with combining the result of maps per node/rack

2013-02-21 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583150#comment-13583150
 ] 

Tsuyoshi OZAWA commented on MAPREDUCE-4502:
---

Can someone explain why my patch got "-1 overall" while this patch passed all 
checks?

> Multi-level aggregation with combining the result of maps per node/rack
> ---
>
> Key: MAPREDUCE-4502
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4502
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster, mrv2
>Affects Versions: 3.0.0
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: design_v2.pdf, MAPREDUCE-4502.1.patch, 
> MAPREDUCE-4502.2.patch, MAPREDUCE-4502.3.patch, MAPREDUCE-4502.4.patch, 
> MAPREDUCE-4525-pof.diff, speculative_draft.pdf
>
>
> Shuffle costs are high in Hadoop despite the existence of the combiner, 
> because the scope of combining is limited to a single MapTask. To solve this 
> problem, a good approach is to aggregate the results of maps per node/rack 
> by launching combiners.
> This JIRA is to implement the multi-level aggregation infrastructure, 
> including combining per container (MAPREDUCE-3902 is related) and 
> coordinating containers via the application master, without breaking the 
> fault tolerance of jobs.
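
The effect the description aims for can be illustrated in plain Java: 
combining first within each map task and again per node shrinks what crosses 
the network during the shuffle. This is a toy word-count sketch, not the 
patch's implementation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration of multi-level combining: per-map-task counts are
// combined again at the node level, so only one record per key leaves
// the node for the reducers.
public class MultiLevelCombine {
    static Map<String, Integer> combine(List<Map<String, Integer>> parts) {
        Map<String, Integer> out = new HashMap<>();
        for (Map<String, Integer> part : parts)
            part.forEach((word, count) -> out.merge(word, count, Integer::sum));
        return out;
    }

    public static void main(String[] args) {
        // output of two map-task combiners running on the same node
        Map<String, Integer> mapTask1 = Map.of("a", 1, "b", 2);
        Map<String, Integer> mapTask2 = Map.of("a", 3);
        // node-level combine before the shuffle
        Map<String, Integer> nodeLevel = combine(Arrays.asList(mapTask1, mapTask2));
        System.out.println(nodeLevel.get("a")); // prints 4
    }
}
```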

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-4951) Container preemption interpreted as task failure

2013-02-21 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated MAPREDUCE-4951:
-

   Resolution: Fixed
Fix Version/s: 2.0.4-beta
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I just committed this. Thanks Sandy.

> Container preemption interpreted as task failure
> 
>
> Key: MAPREDUCE-4951
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4951
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, mr-am, mrv2
>Affects Versions: 2.0.2-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Fix For: 2.0.4-beta
>
> Attachments: MAPREDUCE-4951-1.patch, MAPREDUCE-4951-2.patch, 
> MAPREDUCE-4951.patch
>
>
> When YARN reports a completed container to the MR AM, it always interprets it 
> as a failure.  This can lead to a job failing because too many of its tasks 
> failed, when in fact they only failed because the scheduler preempted them.
> MR needs to recognize the special exit code value of -100 and interpret it as 
> a container being killed instead of a container failure.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4951) Container preemption interpreted as task failure

2013-02-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583138#comment-13583138
 ] 

Hudson commented on MAPREDUCE-4951:
---

Integrated in Hadoop-trunk-Commit #3372 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3372/])
MAPREDUCE-4951. Container preemption interpreted as task failure. 
Contributed by Sandy Ryza. (Revision 1448615)

 Result = SUCCESS
tomwhite : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448615
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestRMContainerAllocator.java


> Container preemption interpreted as task failure
> 
>
> Key: MAPREDUCE-4951
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4951
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, mr-am, mrv2
>Affects Versions: 2.0.2-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: MAPREDUCE-4951-1.patch, MAPREDUCE-4951-2.patch, 
> MAPREDUCE-4951.patch
>
>
> When YARN reports a completed container to the MR AM, it always interprets it 
> as a failure.  This can lead to a job failing because too many of its tasks 
> failed, when in fact they only failed because the scheduler preempted them.
> MR needs to recognize the special exit code value of -100 and interpret it as 
> a container being killed instead of a container failure.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5013) mapred.JobStatus compatibility: MR2 missing constructors from MR1

2013-02-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583132#comment-13583132
 ] 

Hudson commented on MAPREDUCE-5013:
---

Integrated in Hadoop-trunk-Commit #3371 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3371/])
MAPREDUCE-5013. mapred.JobStatus compatibility: MR2 missing constructors 
from MR1. Contributed by Sandy Ryza. (Revision 1448602)

 Result = SUCCESS
tomwhite : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448602
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/JobStatus.java


> mapred.JobStatus compatibility: MR2 missing constructors from MR1
> -
>
> Key: MAPREDUCE-5013
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5013
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.0.3-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Fix For: 2.0.4-beta
>
> Attachments: MAPREDUCE-5013.patch
>
>
> JobStatus is missing the following constructors in MR2 that were present in 
> MR1
> public org.apache.hadoop.mapred.JobStatus(org.apache.hadoop.mapred.JobID, 
> float, float, float, int);
> public org.apache.hadoop.mapred.JobStatus(org.apache.hadoop.mapred.JobID, 
> float, float, int);
> public org.apache.hadoop.mapred.JobStatus(org.apache.hadoop.mapred.JobID, 
> float, float, float, int, org.apache.hadoop.mapred.JobPriority);
> public org.apache.hadoop.mapred.JobStatus(org.apache.hadoop.mapred.JobID, 
> float, float, float, float, int, org.apache.hadoop.mapred.JobPriority);

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4502) Multi-level aggregation with combining the result of maps per node/rack

2013-02-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583133#comment-13583133
 ] 

Hadoop QA commented on MAPREDUCE-4502:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12570292/MAPREDUCE-4502.4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified test files.

{color:green}+1 tests included appear to have a timeout.{color}

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3351//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3351//console

This message is automatically generated.

> Multi-level aggregation with combining the result of maps per node/rack
> ---
>
> Key: MAPREDUCE-4502
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4502
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster, mrv2
>Affects Versions: 3.0.0
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: design_v2.pdf, MAPREDUCE-4502.1.patch, 
> MAPREDUCE-4502.2.patch, MAPREDUCE-4502.3.patch, MAPREDUCE-4502.4.patch, 
> MAPREDUCE-4525-pof.diff, speculative_draft.pdf
>
>
> Shuffle costs are high in Hadoop despite the existence of the combiner, 
> because the scope of combining is limited to a single MapTask. To solve this 
> problem, a good approach is to aggregate the results of maps per node/rack 
> by launching combiners.
> This JIRA is to implement the multi-level aggregation infrastructure, 
> including combining per container (MAPREDUCE-3902 is related) and 
> coordinating containers via the application master, without breaking the 
> fault tolerance of jobs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4846) Some JobQueueInfo methods are public in MR1 but protected in MR2

2013-02-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583126#comment-13583126
 ] 

Hudson commented on MAPREDUCE-4846:
---

Integrated in Hadoop-trunk-Commit #3370 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/3370/])
MAPREDUCE-4846. Some JobQueueInfo methods are public in MR1 but protected 
in MR2. Contributed by Sandy Ryza. (Revision 1448597)

 Result = SUCCESS
tomwhite : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1448597
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/JobQueueInfo.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/QueueConfigurationParser.java


> Some JobQueueInfo methods are public in MR1 but protected in MR2
> 
>
> Key: MAPREDUCE-4846
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4846
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 2.0.2-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Fix For: 2.0.4-beta
>
> Attachments: MAPREDUCE-4846-1.patch, MAPREDUCE-4846-1.patch, 
> MAPREDUCE-4846.patch
>
>
> setQueueName, setSchedulingInfo, and setQueueState were public in MR1, but 
> are private in MR2.  They should be made public with 
> InterfaceAudience.Private.
> getQueueState was public, but is now package private.  It has been replaced 
> with getState, which returns a QueueState instead of a String.  It should be 
> made public and deprecated, with a documentation reference to getState.
> Should the other setter methods in JobQueueInfo that were not in MR1 be 
> changed to public/InterfaceAudience.Private for consistency?
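A minimal sketch of what the description asks for (a simplified stand-in class, not the actual JobQueueInfo patch): the MR1-era String-returning accessor comes back as public and deprecated, delegating to the newer enum-returning getState.

```java
// Simplified stand-in for JobQueueInfo (not the committed patch): the MR1
// String-returning accessor is restored as public and deprecated, and it
// delegates to the enum-returning getState(). The real class would also
// carry @InterfaceAudience.Private on the re-exposed setters.
public class JobQueueInfoSketch {
    public enum QueueState { RUNNING, STOPPED }

    private String queueName;
    private QueueState state = QueueState.RUNNING;

    // Was public in MR1, private in MR2; made public again here.
    public void setQueueName(String name) { this.queueName = name; }

    public String getQueueName() { return queueName; }

    public QueueState getState() { return state; }

    /** @deprecated use {@link #getState()} instead. */
    @Deprecated
    public String getQueueState() {
        return getState().toString().toLowerCase();
    }

    public static void main(String[] args) {
        JobQueueInfoSketch q = new JobQueueInfoSketch();
        q.setQueueName("default");
        System.out.println(q.getQueueName() + ": " + q.getQueueState());
    }
}
```

Delegation keeps the enum as the single source of truth, so MR1 callers keep compiling while new code migrates to getState.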

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-5013) mapred.JobStatus compatibility: MR2 missing constructors from MR1

2013-02-21 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated MAPREDUCE-5013:
-

   Resolution: Fixed
Fix Version/s: 2.0.4-beta
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

+1 I just committed this. Thanks Sandy.

> mapred.JobStatus compatibility: MR2 missing constructors from MR1
> -
>
> Key: MAPREDUCE-5013
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5013
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.0.3-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Fix For: 2.0.4-beta
>
> Attachments: MAPREDUCE-5013.patch
>
>
> JobStatus is missing the following constructors in MR2 that were present in 
> MR1:
> public org.apache.hadoop.mapred.JobStatus(org.apache.hadoop.mapred.JobID, 
> float, float, float, int);
> public org.apache.hadoop.mapred.JobStatus(org.apache.hadoop.mapred.JobID, 
> float, float, int);
> public org.apache.hadoop.mapred.JobStatus(org.apache.hadoop.mapred.JobID, 
> float, float, float, int, org.apache.hadoop.mapred.JobPriority);
> public org.apache.hadoop.mapred.JobStatus(org.apache.hadoop.mapred.JobID, 
> float, float, float, float, int, org.apache.hadoop.mapred.JobPriority);
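One common way such constructors are restored (a simplified sketch with invented fields, not the attached patch) is to have each shorter MR1-era form delegate to the fullest form with a default for the argument it lacks:

```java
// Simplified stand-in (invented fields; not the attached patch): the
// shorter MR1-era constructor is restored by delegating to the fullest
// form and defaulting the missing argument.
public class JobStatusSketch {
    private final float mapProgress;
    private final float reduceProgress;
    private final float cleanupProgress;
    private final int runState;

    // Fullest form, analogous to the multi-float MR1 constructor.
    public JobStatusSketch(float map, float reduce, float cleanup, int runState) {
        this.mapProgress = map;
        this.reduceProgress = reduce;
        this.cleanupProgress = cleanup;
        this.runState = runState;
    }

    // Restored shorter form: delegate, defaulting cleanup progress to 0.
    public JobStatusSketch(float map, float reduce, int runState) {
        this(map, reduce, 0.0f, runState);
    }

    public float getMapProgress() { return mapProgress; }

    public float getCleanupProgress() { return cleanupProgress; }

    public int getRunState() { return runState; }

    public static void main(String[] args) {
        JobStatusSketch s = new JobStatusSketch(0.5f, 0.25f, 1);
        System.out.println(s.getMapProgress() + " " + s.getCleanupProgress());
    }
}
```

Binary compatibility with code compiled against MR1 is the point: the old call sites resolve against the restored signatures without recompilation.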

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5008) Merger progress miscounts with respect to EOF_MARKER

2013-02-21 Thread Tom White (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583125#comment-13583125
 ] 

Tom White commented on MAPREDUCE-5008:
--

Is there a way to test this, manual or otherwise?

> Merger progress miscounts with respect to EOF_MARKER
> 
>
> Key: MAPREDUCE-5008
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5008
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.0.3-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: MAPREDUCE-5008-branch-1.patch, MAPREDUCE-5008.patch, 
> MAPREDUCE-5008.patch
>
>
> After MAPREDUCE-2264, a segment's raw data length is calculated without the 
> EOF_MARKER bytes.  However, when the merge is counting how many bytes it 
> processed, it includes the marker.  This can cause the merge progress to go 
> above 100%.
> Whether these EOF_MARKER bytes should count should be consistent between the 
> two.
> This is a JIRA instead of an amendment because MAPREDUCE-2264 already went into 
> 2.0.3.
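The mismatch can be illustrated with made-up numbers (the marker size below is illustrative, not taken from the real IFile accounting):

```java
// Numeric illustration of the description (made-up sizes, not real IFile
// accounting): totalRawLength excludes EOF_MARKER bytes after
// MAPREDUCE-2264, but the merge counts them as processed, so the ratio
// can exceed 1.0, i.e. progress above 100%.
public class MergeProgressSketch {

    static double progress(long processedBytes, long totalRawLength) {
        return (double) processedBytes / totalRawLength;
    }

    public static void main(String[] args) {
        int segments = 10;
        long dataPerSegment = 100;   // raw data bytes, markers excluded
        long markerBytes = 2;        // illustrative EOF_MARKER size

        long totalRawLength = segments * dataPerSegment;                 // 1000
        long processed = segments * (dataPerSegment + markerBytes);      // 1020

        System.out.println(progress(processed, totalRawLength)); // 1.02 > 1.0
    }
}
```

Counting (or excluding) the marker bytes consistently on both sides makes the ratio top out at exactly 1.0.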

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-4846) Some JobQueueInfo methods are public in MR1 but protected in MR2

2013-02-21 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated MAPREDUCE-4846:
-

   Resolution: Fixed
Fix Version/s: 2.0.4-beta
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

+1 I just committed this. Thanks Sandy.

> Some JobQueueInfo methods are public in MR1 but protected in MR2
> 
>
> Key: MAPREDUCE-4846
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4846
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 2.0.2-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Fix For: 2.0.4-beta
>
> Attachments: MAPREDUCE-4846-1.patch, MAPREDUCE-4846-1.patch, 
> MAPREDUCE-4846.patch
>
>
> setQueueName, setSchedulingInfo, and setQueueState were public in MR1, but 
> are private in MR2.  They should be made public with 
> InterfaceAudience.Private.
> getQueueState was public, but is now package private.  It has been replaced 
> with getState, which returns a QueueState instead of a String.  It should be 
> made public and deprecated, with a documentation reference to getState.
> Should the other setter methods in JobQueueInfo that were not in MR1 be 
> changed to public/InterfaceAudience.Private for consistency?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5006) streaming tests failing

2013-02-21 Thread Tom White (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583120#comment-13583120
 ] 

Tom White commented on MAPREDUCE-5006:
--

Sandy, what changed to cause the tests to fail?

> streaming tests failing
> ---
>
> Key: MAPREDUCE-5006
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5006
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/streaming
>Affects Versions: 2.0.4-beta
>Reporter: Alejandro Abdelnur
>Assignee: Sandy Ryza
> Attachments: MAPREDUCE-5006.patch
>
>
> The following 2 tests are failing in trunk
> * org.apache.hadoop.streaming.TestStreamReduceNone
> * org.apache.hadoop.streaming.TestStreamXmlRecordReader

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-4502) Multi-level aggregation with combining the result of maps per node/rack

2013-02-21 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated MAPREDUCE-4502:
--

Attachment: MAPREDUCE-4502.4.patch

Fixed to handle exception correctly and add timeout to TestJobConf.

> Multi-level aggregation with combining the result of maps per node/rack
> ---
>
> Key: MAPREDUCE-4502
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4502
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster, mrv2
>Affects Versions: 3.0.0
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: design_v2.pdf, MAPREDUCE-4502.1.patch, 
> MAPREDUCE-4502.2.patch, MAPREDUCE-4502.3.patch, MAPREDUCE-4502.4.patch, 
> MAPREDUCE-4525-pof.diff, speculative_draft.pdf
>
>
> Shuffle in Hadoop is expensive in spite of the combiner, because the scope 
> of combining is limited to a single MapTask. To solve this problem, it is 
> effective to aggregate map outputs per node/rack by launching combiners.
> This JIRA is to implement the multi-level aggregation infrastructure, 
> including combining per container (MAPREDUCE-3902 is related) and 
> coordination of containers by the application master, without breaking the 
> fault tolerance of jobs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5017) Provide access to launcher job URL from web console when using Map Reduce action

2013-02-21 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583045#comment-13583045
 ] 

Harsh J commented on MAPREDUCE-5017:


Hi Ryota -- We can (in future) also make use of the 'More Actions -> Move' 
feature on top, to move JIRAs between projects, and save time that way :)

> Provide access to launcher job URL from web console when using Map Reduce 
> action 
> -
>
> Key: MAPREDUCE-5017
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5017
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: trunk
>Reporter: Ryota Egashira
>Assignee: Ryota Egashira
> Fix For: trunk
>
>
> There are applications where a custom InputFormat is used in an MR action, 
> and log messages from the InputFormat are written to the launcher task log. 
> For debugging purposes, users need to check the launcher task log, but 
> currently for MR actions Oozie automatically swaps the external ID and does 
> not expose the launcher ID in the web console (currently the only way is to 
> grep oozie.log). This JIRA is to show the launcher job URL on the web 
> console when using a Map Reduce action.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4991) coverage for gridmix

2013-02-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583019#comment-13583019
 ] 

Hadoop QA commented on MAPREDUCE-4991:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12569337/MAPREDUCE-4991-branch-0.23.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 11 new 
or modified test files.

  {color:red}-1 one of tests included doesn't have a timeout.{color}

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3350//console

This message is automatically generated.

> coverage for gridmix
> 
>
> Key: MAPREDUCE-4991
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4991
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.7
>Reporter: Aleksey Gorshkov
> Attachments: MAPREDUCE-4991-branch-0.23.patch, 
> MAPREDUCE-4991-branch-2.patch, MAPREDUCE-4991-trunk.patch
>
>
> Fix test coverage for GridMix.
> MAPREDUCE-4991-trunk.patch is the patch for trunk, 
> MAPREDUCE-4991-branch-2.patch for branch-2, and 
> MAPREDUCE-4991-branch-0.23.patch for branch-0.23.
> Known failure: 
> org.apache.hadoop.mapred.gridmix.TestGridmixSummary.testExecutionSummarizer. 
> It will be addressed in a follow-up issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-4991) coverage for gridmix

2013-02-21 Thread Aleksey Gorshkov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Gorshkov updated MAPREDUCE-4991:


Status: Patch Available  (was: Open)

> coverage for gridmix
> 
>
> Key: MAPREDUCE-4991
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4991
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>Affects Versions: 2.0.3-alpha, 3.0.0, 0.23.7
>Reporter: Aleksey Gorshkov
> Attachments: MAPREDUCE-4991-branch-0.23.patch, 
> MAPREDUCE-4991-branch-2.patch, MAPREDUCE-4991-trunk.patch
>
>
> Fix test coverage for GridMix.
> MAPREDUCE-4991-trunk.patch is the patch for trunk, 
> MAPREDUCE-4991-branch-2.patch for branch-2, and 
> MAPREDUCE-4991-branch-0.23.patch for branch-0.23.
> Known failure: 
> org.apache.hadoop.mapred.gridmix.TestGridmixSummary.testExecutionSummarizer. 
> It will be addressed in a follow-up issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4502) Multi-level aggregation with combining the result of maps per node/rack

2013-02-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583003#comment-13583003
 ] 

Hadoop QA commented on MAPREDUCE-4502:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12570272/MAPREDUCE-4502.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified test files.

  {color:red}-1 one of tests included doesn't have a timeout.{color}

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3349//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3349//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3349//console

This message is automatically generated.

> Multi-level aggregation with combining the result of maps per node/rack
> ---
>
> Key: MAPREDUCE-4502
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4502
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster, mrv2
>Affects Versions: 3.0.0
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: design_v2.pdf, MAPREDUCE-4502.1.patch, 
> MAPREDUCE-4502.2.patch, MAPREDUCE-4502.3.patch, MAPREDUCE-4525-pof.diff, 
> speculative_draft.pdf
>
>
> Shuffle in Hadoop is expensive in spite of the combiner, because the scope 
> of combining is limited to a single MapTask. To solve this problem, it is 
> effective to aggregate map outputs per node/rack by launching combiners.
> This JIRA is to implement the multi-level aggregation infrastructure, 
> including combining per container (MAPREDUCE-3902 is related) and 
> coordination of containers by the application master, without breaking the 
> fault tolerance of jobs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira