[jira] [Updated] (MAPREDUCE-6027) mr jobs with relative paths can fail

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-6027:

Labels: BB2015-05-TBR  (was: )

> mr jobs with relative paths can fail
> 
>
> Key: MAPREDUCE-6027
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6027
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: job submission
>Reporter: Wing Yew Poon
>Assignee: Wing Yew Poon
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-6027.patch
>
>
> I built hadoop from branch-2 and tried to run terasort as follows:
> {noformat}
> wypoon$ bin/hadoop jar 
> share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-SNAPSHOT.jar terasort 
> sort-input sort-output
> 14/08/07 08:57:55 INFO terasort.TeraSort: starting
> 2014-08-07 08:57:56.229 java[36572:1903] Unable to load realm info from 
> SCDynamicStore
> 14/08/07 08:57:56 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 14/08/07 08:57:57 INFO input.FileInputFormat: Total input paths to process : 2
> Spent 156ms computing base-splits.
> Spent 2ms computing TeraScheduler splits.
> Computing input splits took 159ms
> Sampling 2 splits of 2
> Making 1 from 10 sampled records
> Computing parititions took 626ms
> Spent 789ms computing partitions.
> 14/08/07 08:57:57 INFO client.RMProxy: Connecting to ResourceManager at 
> localhost/127.0.0.1:8032
> 14/08/07 08:57:58 INFO mapreduce.JobSubmitter: Cleaning up the staging area 
> /tmp/hadoop-yarn/staging/wypoon/.staging/job_1407426900134_0001
> java.lang.IllegalArgumentException: Can not create a Path from an empty URI
>   at org.apache.hadoop.fs.Path.checkPathArg(Path.java:140)
>   at org.apache.hadoop.fs.Path.(Path.java:192)
>   at 
> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
>   at 
> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.checkPermissionOfOther(ClientDistributedCacheManager.java:275)
>   at 
> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.ancestorsHaveExecutePermissions(ClientDistributedCacheManager.java:256)
>   at 
> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.isPublic(ClientDistributedCacheManager.java:243)
>   at 
> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineCacheVisibilities(ClientDistributedCacheManager.java:162)
>   at 
> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:58)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:265)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
>   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
>   at org.apache.hadoop.examples.terasort.TeraSort.run(TeraSort.java:316)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at org.apache.hadoop.examples.terasort.TeraSort.main(TeraSort.java:325)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
>   at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
>   at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> {noformat}
> If I use absolute paths for the input and output directories, the job runs fine.
> This breakage is due to HADOOP-10876.
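> A minimal illustration (not the attached patch) of how a relative path such as "sort-input" could be qualified against the default FileSystem before the distributed-cache code builds URIs from it:
> {code}
> // Hedged sketch: qualify a relative command-line path so that later URI
> // handling (e.g. in ClientDistributedCacheManager) never sees an empty URI.
> Path input = new Path("sort-input");        // relative, as typed on the command line
> FileSystem fs = FileSystem.get(conf);
> Path qualified = fs.makeQualified(input);   // e.g. hdfs://host:8020/user/wypoon/sort-input
> {code}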



[jira] [Updated] (MAPREDUCE-5876) SequenceFileRecordReader NPE if close() is called before initialize()

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5876:

Labels: BB2015-05-TBR  (was: )

> SequenceFileRecordReader NPE if close() is called before initialize()
> -
>
> Key: MAPREDUCE-5876
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5876
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.3.0, 2.4.0
>Reporter: Reinis Vicups
>Assignee: Tsuyoshi Ozawa
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5876.1.patch
>
>
> org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader extends 
> org.apache.hadoop.mapreduce.RecordReader which in turn implements 
> java.io.Closeable.
> According to the Java spec, java.io.Closeable#close() has to be idempotent 
> (http://docs.oracle.com/javase/7/docs/api/java/io/Closeable.html), which it currently is 
> not.
> An NPE is thrown if the close() method is invoked without previously 
> calling the initialize() method. This happens because the SequenceFile.Reader field "in" is 
> null.
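> A minimal sketch of an idempotent close() (illustrative only; the attached patch is authoritative), assuming "in" is the SequenceFile.Reader field that initialize() assigns:
> {code}
> @Override
> public synchronized void close() throws IOException {
>   if (in != null) {   // 'in' is only assigned in initialize()
>     in.close();
>     in = null;        // a second close() becomes a no-op, as Closeable requires
>   }
> }
> {code}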



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6003) Resource Estimator suggests huge map output in some cases

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-6003:

Labels: BB2015-05-TBR  (was: )

> Resource Estimator suggests huge map output in some cases
> -
>
> Key: MAPREDUCE-6003
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6003
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 1.2.1
>Reporter: Chengbing Liu
>Assignee: Chengbing Liu
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-6003-branch-1.2.patch
>
>
> In some cases, ResourceEstimator can return way too large map output 
> estimation. This happens when input size is not correctly calculated.
> A typical case is joining two Hive tables (one in HDFS and the other in 
> HBase). The maps that process the HBase table finish first and report an input 
> length of 0 due to its TableInputFormat. Then, for a map that processes the 
> HDFS table, the estimated output size is very large because of the wrong 
> input size, making it impossible to assign the map task.
> There are two possible solutions to this problem:
> (1) Make input size correct for each case, e.g. HBase, etc.
> (2) Use another algorithm to estimate the map output, or at least make it 
> closer to reality.
> I prefer the second way, since the first would require all possibilities to 
> be taken care of. It is not easy for some inputs such as URIs.
> In my opinion, we could make a second estimation which is independent of the 
> input size:
> estimationB = (completedMapOutputSize / completedMaps) * totalMaps * 10
> Here, multiplying by 10 makes the estimation more conservative, so that a task 
> is less likely to be assigned somewhere without enough space.
> The former estimation goes like this:
> estimationA = (inputSize * completedMapOutputSize * 2.0) / 
> completedMapInputSize
> My suggestion is to take minimum of the two estimations:
> estimation = min(estimationA, estimationB)
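> Expressed as code, the proposal above amounts to the following sketch (variable names are illustrative, not taken from the patch):
> {code}
> // estimationA: the existing input-size-based estimate; estimationB: the proposed
> // input-size-independent estimate with a conservative factor of 10.
> long estimationA = (long) (inputSize * completedMapOutputSize * 2.0 / completedMapInputSize);
> long estimationB = (completedMapOutputSize / completedMaps) * totalMaps * 10;
> long estimation  = Math.min(estimationA, estimationB);
> {code}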



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-3182) loadgen ignores -m command line when writing random data

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-3182:

Labels: BB2015-05-TBR  (was: )

> loadgen ignores -m command line when writing random data
> 
>
> Key: MAPREDUCE-3182
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3182
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2, test
>Affects Versions: 0.23.0, 2.3.0
>Reporter: Jonathan Eagles
>Assignee: Chen He
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-3182.patch
>
>
> If no input directories are specified, loadgen goes into a special mode where 
> random data is generated and written. In that mode, setting the number of 
> mappers (-m command line option) is overridden by a calculation. Instead, it 
> should honor the user-specified number of mappers and only fall 
> back to the calculation when none is given, as in the sketch below. The documentation should also be updated to 
> match the new behavior in the code.
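> A rough sketch of the intended behavior (hypothetical variable names, not the attached patch): only compute the map count when -m was not supplied.
> {code}
> // userMaps is the value parsed from the -m option (<= 0 when not given, in this sketch)
> int computedMaps = (int) Math.max(1, totalBytesToWrite / bytesPerMap);
> int numMaps = (userMaps > 0) ? userMaps : computedMaps;
> job.setNumMapTasks(numMaps);   // loadgen uses the old mapred.JobConf API
> {code}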



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5871) Estimate Job Endtime

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5871:

Labels: BB2015-05-TBR  (was: )

> Estimate Job Endtime
> 
>
> Key: MAPREDUCE-5871
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5871
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Maysam Yabandeh
>Assignee: Maysam Yabandeh
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5871.patch
>
>
> YARN-1969 adds a new earliest-endtime-first policy to the fair scheduler. As 
> a prerequisite step, the AppMaster should estimate its end time and send it 
> to the RM via the heartbeat. This jira focuses on how the AppMaster performs 
> this estimation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6023) Fix SuppressWarnings from "unchecked" to "rawtypes" in O.A.H.mapreduce.lib.input.TaggedInputSplit

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-6023:

Labels: BB2015-05-TBR newbie  (was: newbie)

> Fix SuppressWarnings from "unchecked" to "rawtypes" in 
> O.A.H.mapreduce.lib.input.TaggedInputSplit
> -
>
> Key: MAPREDUCE-6023
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6023
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Junping Du
>Assignee: Abhilash Srimat Tirumala Pallerlamudi
>Priority: Minor
>  Labels: BB2015-05-TBR, newbie
> Attachments: MAPREDUCE-6023.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-3517) "map.input.path" is null at the first split when use CombieFileInputFormat

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-3517:

Labels: BB2015-05-TBR  (was: )

>  "map.input.path" is null at the first split when use CombieFileInputFormat
> ---
>
> Key: MAPREDUCE-3517
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3517
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: task
>Affects Versions: 0.20.203.0
>Reporter: wanbin
>  Labels: BB2015-05-TBR
> Attachments: CombineFileRecordReader.diff, MAPREDUCE-3517.02.patch
>
>
>  "map.input.path" is null at the first split when use CombieFileInputFormat. 
> because in runNewMapper function, mapContext instead of taskContext which is 
> set "map.input.path".  so we need set "map.input.path" again to mapContext



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5883) "Total megabyte-seconds" in job counters is slightly misleading

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5883:

Labels: BB2015-05-TBR  (was: )

> "Total megabyte-seconds" in job counters is slightly misleading
> ---
>
> Key: MAPREDUCE-5883
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5883
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5883.patch
>
>
> The following counters are in milliseconds so "megabyte-seconds" might be 
> better stated as "megabyte-milliseconds"
> MB_MILLIS_MAPS.name=   Total megabyte-seconds taken by all map 
> tasks
> MB_MILLIS_REDUCES.name=Total megabyte-seconds taken by all reduce 
> tasks
> VCORES_MILLIS_MAPS.name=   Total vcore-seconds taken by all map tasks
> VCORES_MILLIS_REDUCES.name=Total vcore-seconds taken by all reduce 
> tasks



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5845) TestShuffleHandler failing intermittently on windows

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5845:

Labels: BB2015-05-TBR  (was: )

> TestShuffleHandler failing intermittently on windows
> 
>
> Key: MAPREDUCE-5845
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5845
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
>  Labels: BB2015-05-TBR
> Attachments: apache-mapreduce-5845.0.patch
>
>
> TestShuffleHandler fails intermittently on Windows - specifically, 
> testClientClosesConnection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5225) SplitSampler in mapreduce.lib should use a SPLIT_STEP to jump around splits

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5225:

Labels: BB2015-05-TBR  (was: )

> SplitSampler in mapreduce.lib should use a SPLIT_STEP to jump around splits
> ---
>
> Key: MAPREDUCE-5225
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5225
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5225.1.patch
>
>
> Currently, SplitSampler only samples the first maxSplitsSampled splits, a behavior 
> introduced by MAPREDUCE-1820. However, jumping around all splits is in general preferable 
> to sampling only the first N splits.
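> A simplified sketch of the stepping idea (not the attached patch): instead of reading only the first N splits, advance through the split list with a stride so samples come from across the whole input.
> {code}
> int splitStep = Math.max(1, splits.size() / maxSplitsSampled);
> int samplesPerSplit = numSamples / maxSplitsSampled;
> for (int i = 0; i < splits.size() && samples.size() < numSamples; i += splitStep) {
>   RecordReader<K, V> reader = inf.createRecordReader(splits.get(i), context);
>   reader.initialize(splits.get(i), context);
>   int taken = 0;
>   while (reader.nextKeyValue() && taken++ < samplesPerSplit) {
>     samples.add(ReflectionUtils.copy(context.getConfiguration(), reader.getCurrentKey(), null));
>   }
>   reader.close();
> }
> {code}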



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5700) historyServer can't show container's log when aggregation is not enabled

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5700:

Labels: BB2015-05-TBR  (was: )

> historyServer can't show container's log when aggregation is not enabled
> 
>
> Key: MAPREDUCE-5700
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5700
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.23.7, 2.0.4-alpha, 2.2.0
> Environment:  yarn.log-aggregation-enable=false , HistoryServer will 
> show like this:
> Aggregation is not enabled. Try the nodemanager at hd13-vm1:34669
>Reporter: Hong Shen
>Assignee: Hong Shen
>  Labels: BB2015-05-TBR
> Attachments: yarn-647-2.patch, yarn-647.patch
>
>
> When yarn.log-aggregation-enable is set to false, after an MR app completes 
> we can't view the container's log from the HistoryServer; it shows a message 
> like:
> Aggregation is not enabled. Try the nodemanager at hd13-vm1:34669
> We don't want to aggregate the container's logs because that would put 
> pressure on the namenode, but sometimes we still want to take a look at a 
> container's log.
> Should we show the container's log through the HistoryServer even if 
> yarn.log-aggregation-enable is set to false?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5708) Duplicate String.format in getSpillFileForWrite

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5708:

Labels: BB2015-05-TBR  (was: )

> Duplicate String.format in getSpillFileForWrite
> ---
>
> Key: MAPREDUCE-5708
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5708
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Konstantin Weitz
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: 0001-Removed-duplicate-String.format.patch
>
>   Original Estimate: 10m
>  Remaining Estimate: 10m
>
> The code responsible for formatting the spill file name (namely 
> _getSpillFileForWrite_) unnecessarily calls _String.format_ twice. This not 
> only affects performance, but also leads to the odd requirement that task 
> attempt ids cannot contain _%_ characters (because these would be interpreted 
> as format specifiers in the outer _String.format_ call).
> I assume this was done by mistake, as it could only be useful if task attempt 
> ids contained _%n_.
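> An illustration of the problem described above (the pattern string is illustrative, not copied from the source):
> {code}
> // Buggy shape: the already-formatted name is passed through String.format again,
> // so any '%' in the task attempt id is re-interpreted as a format specifier.
> String buggy = String.format(String.format("%s_spill_%d.out", taskAttemptId, spillNumber));
> // Formatting once is sufficient and keeps '%' characters in the id literal:
> String fixed = String.format("%s_spill_%d.out", taskAttemptId, spillNumber);
> {code}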



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5216) While using TextSplitter in DataDrivenDBInputformat, the lower limit (split start) always remains the same, for all splits.

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5216:

Labels: BB2015-05-TBR  (was: )

> While using TextSplitter in DataDrivenDBInputformat, the lower limit (split 
> start) always remains the same, for all splits.
> ---
>
> Key: MAPREDUCE-5216
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5216
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Gelesh
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5216.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> While using TextSplitter in DataDrivenDBInputFormat, the lower limit (split 
> start) always remains the same for all splits, i.e.:
> Split 1: Start = A, End = M; Split 2: Start = A, End = P; Split 3: Start = A, End = S
> instead of
> Split 1: Start = A, End = M; Split 2: Start = M, End = P; Split 3: Start = P, End = S



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5577) Allow querying the JobHistoryServer by job arrival time

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5577:

Labels: BB2015-05-TBR  (was: )

> Allow querying the JobHistoryServer by job arrival time
> ---
>
> Key: MAPREDUCE-5577
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5577
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobhistoryserver
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5577.patch
>
>
>   The JobHistoryServer REST APIs currently allow querying by job submit time 
> and finish time.  However, jobs don't necessarily arrive in order of their 
> finish time, meaning that a client who wants to stay on top of all completed 
> jobs needs to query large time intervals to make sure they're not missing 
> anything.  Exposing functionality to allow querying by the time a job lands 
> at the JobHistoryServer would allow clients to set the start of their query 
> interval to the time of their last query. 
> The arrival time of a job would be defined as the time that it lands in the 
> done directory and can be picked up using the last modified date on history 
> files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4487) Reduce job latency by removing hardcoded sleep statements

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-4487:

Labels: BB2015-05-TBR  (was: )

> Reduce job latency by removing hardcoded sleep statements
> -
>
> Key: MAPREDUCE-4487
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4487
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv1, mrv2, performance
>Affects Versions: 1.0.3, 2.0.0-alpha
>Reporter: Tom White
>Assignee: Tom White
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-4487-mr2.patch, MAPREDUCE-4487.patch
>
>
> There are a few places in MapReduce where there are hardcoded sleep 
> statements. By replacing them with wait/notify or similar it's possible to 
> reduce latency for short running jobs.
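> A generic sketch of the kind of change described (not tied to any particular call site in the patches): replace a fixed-interval polling sleep with wait/notify so the waiter wakes up as soon as the state changes.
> {code}
> // Waiting side: previously something like "while (!done) Thread.sleep(500);"
> // (InterruptedException handling omitted for brevity)
> synchronized (lock) {
>   while (!done) {
>     lock.wait(500);          // still bounded, but woken immediately on notify
>   }
> }
> // Completing side:
> synchronized (lock) {
>   done = true;
>   lock.notifyAll();
> }
> {code}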



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5227) JobTrackerMetricsSource and QueueMetrics should standardize naming rules

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5227:

Labels: BB2015-05-TBR  (was: )

> JobTrackerMetricsSource and QueueMetrics should standardize naming rules
> 
>
> Key: MAPREDUCE-5227
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5227
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv1
>Affects Versions: 1.1.3, 1.2.1
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5227-1.1-branch.1.patch, 
> MAPREDUCE-5227-branch-1.1.patch, MAPREDUCE-5227.1.patch
>
>
> JobTrackerMetricsSource and QueueMetrics provide users with some metrics, 
> but their naming rules ("jobs_running", "running_maps", "running_reduces") 
> sometimes confuse users. They should be standardized.
> One concern is backward compatibility, so one idea is to share the 
> MetricMutableGaugeInt object between the old and new property names, 
> e.g. to share runningMaps between "running_maps" and "maps_running".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5248) Let NNBenchWithoutMR specify the replication factor for its test

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5248:

Labels: BB2015-05-TBR  (was: )

> Let NNBenchWithoutMR specify the replication factor for its test
> 
>
> Key: MAPREDUCE-5248
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5248
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client, test
>Affects Versions: 3.0.0
>Reporter: Erik Paulson
>Assignee: Erik Paulson
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5248.patch, MAPREDUCE-5248.txt
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The NNBenchWithoutMR test creates files with a replicationFactorPerFile 
> hard-coded to 1. It'd be nice to be able to specify that on the commandline.
> Also, it'd be great if MAPREDUCE-4750 was merged along with this fix. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5264) FileAlreadyExistsException is assumed to be thrown by FileSystem#mkdirs or FileContext#mkdir in the codebase

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5264:

Labels: BB2015-05-TBR  (was: )

> FileAlreadyExistsException is assumed to be thrown by FileSystem#mkdirs or 
> FileContext#mkdir in the codebase
> 
>
> Key: MAPREDUCE-5264
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5264
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.1.0-beta
>Reporter: Rémy SAISSY
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5264.20130607.1.patch
>
>
> According to https://issues.apache.org/jira/browse/HADOOP-9438,
> FileSystem#mkdirs and FileContext#mkdir do not throw 
> FileAlreadyExistsException if the directory already exist.
> Some places in the mapreduce codebase assume that FileSystem#mkdirs or 
> FileContext#mkdir throws FileAlreadyExistsException.
> At least the following files are concerned:
>  - YarnChild.java
>  - JobHistoryEverntHandler.java
>  - HistoryFileManager.java
> It would be good to re-review and patch this if needed.
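> A hedged sketch of a defensive pattern matching the HADOOP-9438 semantics (not the attached patch): rely on the boolean return value instead of on FileAlreadyExistsException.
> {code}
> // mkdirs does not throw FileAlreadyExistsException when the directory already
> // exists; only fail if it returned false and the path is still not a directory.
> if (!fs.mkdirs(dir) && !fs.getFileStatus(dir).isDirectory()) {
>   throw new IOException("Could not create directory " + dir);
> }
> {code}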



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4840) Delete dead code and deprecate public API related to skipping bad records

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-4840:

Labels: BB2015-05-TBR  (was: )

> Delete dead code and deprecate public API related to skipping bad records
> -
>
> Key: MAPREDUCE-4840
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4840
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha
>Reporter: Mostafa Elhemali
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-4840.patch
>
>
> It looks like the decision was made in MAPREDUCE-1932 to remove support for 
> skipping bad records rather than fix it (it doesn't work right now in trunk). 
> If that's the case, then we should probably delete all the dead code related 
> to it and deprecate the public APIs for it, right?
> Dead code I'm talking about:
> 1. Task class: skipping, skipRanges, writeSkipRecs
> 2. MapTask class:  SkippingRecordReader inner class
> 3. ReduceTask class: SkippingReduceValuesIterator inner class
> 4. Tests: TestBadRecords
> Public API:
> 1. SkipBadRecords class



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5969:

Labels: BB2015-05-TBR  (was: )

> Private non-Archive Files' size add twice in Distributed Cache directory size 
> calculation.
> --
>
> Key: MAPREDUCE-5969
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5969
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv1
>Reporter: zhihai xu
>Assignee: zhihai xu
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5969.branch1.1.patch, 
> MAPREDUCE-5969.branch1.patch
>
>
> Private non-Archive Files' sizes are added twice in the Distributed Cache directory size 
> calculation. The private non-Archive file list is passed in by the "-files" command 
> line option. The Distributed Cache directory size is used to check whether 
> the total size of the cache files exceeds the cache size limitation; the default 
> cache size limitation is 10G.
> I added logging in addCacheInfoUpdate and setSize in 
> TrackerDistributedCacheManager.java.
> I used the following command to test:
> hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files 
> hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar
>  /tmp/zxu/test_in/ /tmp/zxu/test_out
> which adds two files into the distributed cache: WordCount.java and wordcount.jar.
> The WordCount.java file size is 2395 bytes and the wordcount.jar file size is 3865 
> bytes. The total should be 6260.
> The logs show these file sizes being added twice:
> once before download to the local node and a second time after download 
> to the local node, so the total file number becomes 4 instead of 2:
> addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local
> addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local
> addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local
> In the code, for a private non-Archive file, the first time we add the file size is 
> in 
> getLocalCache:
> {code}
> if (!isArchive) {
>   //for private archives, the lengths come over RPC from the 
>   //JobLocalizer since the JobLocalizer is the one who expands
>   //archives and gets the total length
>   lcacheStatus.size = fileStatus.getLen();
>   LOG.info("getLocalCache:" + localizedPath + " size = "
>   + lcacheStatus.size);
>   // Increase the size and sub directory count of the cache
>   // from baseDirSize and baseDirNumberSubDir.
>   baseDirManager.addCacheInfoUpdate(lcacheStatus);
> }
> {code}
> The second time we add file size is at 
> setSize:
> {code}
>   synchronized (status) {
> status.size = size;
> baseDirManager.addCacheInfoUpdate(status);
>   }
> {code}
> The fix is to not add the file size again for a private non-Archive file after 
> download (downloadCacheObject).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4818) Easier identification of tasks that timeout during localization

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-4818:

Labels: BB2015-05-TBR usability  (was: usability)

> Easier identification of tasks that timeout during localization
> ---
>
> Key: MAPREDUCE-4818
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4818
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am
>Affects Versions: 0.23.3, 2.0.3-alpha
>Reporter: Jason Lowe
>Assignee: Siqi Li
>  Labels: BB2015-05-TBR, usability
> Attachments: MAPREDUCE-4818.v1.patch, MAPREDUCE-4818.v2.patch, 
> MAPREDUCE-4818.v3.patch, MAPREDUCE-4818.v4.patch, MAPREDUCE-4818.v5.patch
>
>
> When a task is taking too long to localize and is killed by the AM due to 
> task timeout, the job UI/history is not very helpful.  The attempt simply 
> lists a diagnostic stating it was killed due to timeout, but there are no 
> logs for the attempt since it never actually got started.  There are log 
> messages on the NM that show the container never made it past localization by 
> the time it was killed, but users often do not have access to those logs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4216) Make MultipleOutputs generic to support non-file output formats

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-4216:

Labels: BB2015-05-TBR Output  (was: Output)

> Make MultipleOutputs generic to support non-file output formats
> ---
>
> Key: MAPREDUCE-4216
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4216
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Affects Versions: 1.0.2
>Reporter: Robbie Strickland
>  Labels: BB2015-05-TBR, Output
> Attachments: MAPREDUCE-4216.patch
>
>
> The current MultipleOutputs implementation is tied to FileOutputFormat in 
> such a way that it is not extensible to other types of output. It should be 
> made more generic, such as with an interface that can be implemented for 
> different outputs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-3115) OOM When the value for the property "mapred.map.multithreadedrunner.class" is set to MultithreadedMapper instance.

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-3115:

Labels: BB2015-05-TBR  (was: )

> OOM When the value for the property "mapred.map.multithreadedrunner.class" is 
> set to MultithreadedMapper instance.
> --
>
> Key: MAPREDUCE-3115
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3115
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv1
>Affects Versions: 0.23.0, 1.0.0
> Environment: NA
>Reporter: Bhallamudi Venkata Siva Kamesh
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-3115.2.patch, MAPREDUCE-3115.patch
>
>
> When we set the value of the property *mapred.map.multithreadedrunner.class* 
> to MultithreadedMapper using 
> MultithreadedMapper.setMapperClass(), it simply throws an 
> IllegalArgumentException.
> But when we set the same property through the job's conf object, using 
> job.getConfiguration().setClass(*mapred.map.multithreadedrunner.class*, 
> MultithreadedMapper.class, Mapper.class), it throws an OOM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5203) Make AM of M/R Use NMClient

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5203:

Labels: BB2015-05-TBR  (was: )

> Make AM of M/R Use NMClient
> ---
>
> Key: MAPREDUCE-5203
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5203
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5203.1.patch, MAPREDUCE-5203.2.patch, 
> MAPREDUCE-5203.3.patch, MAPREDUCE-5203.4.patch, MAPREDUCE-5203.5.patch
>
>
> YARN-422 adds NMClient. AM of mapreduce should use it instead of using the 
> raw ContainerManager proxy directly. ContainerLauncherImpl needs to be 
> changed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-2632) Avoid calling the partitioner when the numReduceTasks is 1.

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-2632:

Labels: BB2015-05-TBR  (was: )

> Avoid calling the partitioner when the numReduceTasks is 1.
> ---
>
> Key: MAPREDUCE-2632
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2632
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: tasktracker
>Affects Versions: 0.23.0
>Reporter: Ravi Teja Ch N V
>Assignee: Ravi Teja Ch N V
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-2632-1.patch, MAPREDUCE-2632.patch
>
>
> We can avoid the call to the partitioner when the number of reducers is 
> 1. This avoids unnecessary computation by the partitioner.
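> The idea expressed as a sketch (illustrative; see the attached patches for the real change):
> {code}
> // With a single reducer every record belongs to partition 0, so the
> // user's Partitioner never needs to be consulted.
> final int partition = (numReduceTasks == 1)
>     ? 0
>     : partitioner.getPartition(key, value, numReduceTasks);
> {code}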



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5374) CombineFileRecordReader does not set "map.input.*" configuration parameters for first file read

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5374:

Labels: BB2015-05-TBR  (was: )

> CombineFileRecordReader does not set "map.input.*" configuration parameters 
> for first file read
> ---
>
> Key: MAPREDUCE-5374
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5374
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Dave Beech
>Assignee: Dave Beech
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5374.patch, MAPREDUCE-5374.patch
>
>
> The CombineFileRecordReader operates on splits consisting of multiple files. 
> Each time a new record reader is initialised for a "chunk", certain 
> parameters are supposed to be set on the configuration object 
> (map.input.file, map.input.start and map.input.length)
> However, the first reader is initialised in a different way to subsequent 
> ones (i.e. initialize is called by the MapTask directly rather than from 
> inside the record reader class). Because of this, these config parameters are 
> not set properly and are returned as null when you access them from inside a 
> mapper. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5499) Fix synchronization issues of the setters/getters of *PBImpl which take in/return lists

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5499:

Labels: BB2015-05-TBR  (was: )

> Fix synchronization issues of the setters/getters of *PBImpl which take 
> in/return lists
> ---
>
> Key: MAPREDUCE-5499
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5499
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Zhijie Shen
>Assignee: Xuan Gong
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5499.1.patch, MAPREDUCE-5499.2.patch
>
>
> Similar to YARN-609. There're the following *PBImpls which need to be fixed:
> 1. GetDiagnosticsResponsePBImpl
> 2. GetTaskAttemptCompletionEventsResponsePBImpl
> 3. GetTaskReportsResposnePBImpl
> 4. CounterGroupPBImpl
> 5. JobReportPBImpl
> 6. TaskReportPBImpl



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5392) "mapred job -history all" command throws IndexOutOfBoundsException

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5392:

Labels: BB2015-05-TBR  (was: )

> "mapred job -history all" command throws IndexOutOfBoundsException
> --
>
> Key: MAPREDUCE-5392
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5392
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 3.0.0, 2.0.5-alpha, 2.2.0
>Reporter: Shinichi Yamashita
>Assignee: Shinichi Yamashita
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5392.2.patch, MAPREDUCE-5392.3.patch, 
> MAPREDUCE-5392.4.patch, MAPREDUCE-5392.5.patch, MAPREDUCE-5392.patch, 
> MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, 
> MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, 
> MAPREDUCE-5392.patch, MAPREDUCE-5392.patch
>
>
> When I use an "all" option by "mapred job -history" comamnd, the following 
> exceptions are displayed and do not work.
> {code}
> Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String 
> index out of range: -3
> at java.lang.String.substring(String.java:1875)
> at 
> org.apache.hadoop.mapreduce.util.HostUtil.convertTrackerNameToHostName(HostUtil.java:49)
> at 
> org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.getTaskLogsUrl(HistoryViewer.java:459)
> at 
> org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.printAllTaskAttempts(HistoryViewer.java:235)
> at 
> org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.print(HistoryViewer.java:117)
> at org.apache.hadoop.mapreduce.tools.CLI.viewHistory(CLI.java:472)
> at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:313)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> at org.apache.hadoop.mapred.JobClient.main(JobClient.java:1233)
> {code}
> This is because the node name recorded in the history file is not prefixed with "tracker_". 
> The patch therefore makes it possible to read a history file even if the 
> node name lacks the "tracker_" prefix.
> In addition, it fixes the URL of the displayed task log.
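> A hedged sketch of the guard described above (not the attached patch): only strip the "tracker_" prefix when it is actually present before deriving the host name.
> {code}
> static String convertTrackerNameToHostName(String trackerName) {
>   String name = trackerName.startsWith("tracker_")
>       ? trackerName.substring("tracker_".length())
>       : trackerName;                      // history files may record plain host names
>   int colon = name.indexOf(':');
>   return colon >= 0 ? name.substring(0, colon) : name;
> }
> {code}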



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4065) Add .proto files to built tarball

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-4065:

Labels: BB2015-05-TBR  (was: )

> Add .proto files to built tarball
> -
>
> Key: MAPREDUCE-4065
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4065
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: build
>Affects Versions: 0.23.2, 2.4.0
>Reporter: Ralph H Castain
>Assignee: Tsuyoshi Ozawa
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-4065.1.patch
>
>
> Please add the .proto files to the built tarball so that users can build 3rd 
> party tools that use protocol buffers without having to do an svn checkout of 
> the source code.
> Sorry I don't know more about Maven, or I would provide a patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6040) distcp should automatically use /.reserved/raw when run by the superuser

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-6040:

Labels: BB2015-05-TBR  (was: )

> distcp should automatically use /.reserved/raw when run by the superuser
> 
>
> Key: MAPREDUCE-6040
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6040
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distcp
>Affects Versions: 3.0.0
>Reporter: Andrew Wang
>Assignee: Charles Lamb
>  Labels: BB2015-05-TBR
> Attachments: HDFS-6134-Distcp-cp-UseCasesTable2.pdf, 
> MAPREDUCE-6040.001.patch, MAPREDUCE-6040.002.patch
>
>
> On HDFS-6134, [~sanjay.radia] asked for distcp to automatically prepend 
> /.reserved/raw if the distcp is being performed by the superuser and 
> /.reserved/raw is supported by both the source and destination filesystems. 
> This behavior only occurs if none of the src and target pathnames are 
> /.reserved/raw.
> The -disablereservedraw flag can be used to disable this option.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5889) Deprecate FileInputFormat.setInputPaths(Job, String) and FileInputFormat.addInputPaths(Job, String)

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5889:

Labels: BB2015-05-TBR newbie  (was: newbie)

> Deprecate FileInputFormat.setInputPaths(Job, String) and 
> FileInputFormat.addInputPaths(Job, String)
> ---
>
> Key: MAPREDUCE-5889
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5889
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Akira AJISAKA
>Assignee: Akira AJISAKA
>Priority: Minor
>  Labels: BB2015-05-TBR, newbie
> Attachments: MAPREDUCE-5889.3.patch, MAPREDUCE-5889.patch, 
> MAPREDUCE-5889.patch
>
>
> {{FileInputFormat.setInputPaths(Job job, String commaSeparatedPaths)}} and 
> {{FileInputFormat.addInputPaths(Job job, String commaSeparatedPaths)}} fail 
> to parse commaSeparatedPaths if a comma is included in the file path. (e.g. 
> Path: {{/path/file,with,comma}})
> We should deprecate these methods and document to use {{setInputPaths(Job 
> job, Path... inputPaths)}} and {{addInputPaths(Job job, Path... inputPaths)}} 
> instead.
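> Example of the recommended alternative, using the existing Path-based overloads, where a comma inside a file name is no longer treated as a separator:
> {code}
> // Comma is part of the file name, not a path separator:
> FileInputFormat.setInputPaths(job, new Path("/path/file,with,comma"));
> // Additional inputs can be added one Path at a time:
> FileInputFormat.addInputPath(job, new Path("/path/other-input"));
> {code}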



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5929) YARNRunner.java, path for jobJarPath not set correctly

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5929:

Labels: BB2015-05-TBR newbie patch  (was: newbie patch)

> YARNRunner.java, path for jobJarPath not set correctly
> --
>
> Key: MAPREDUCE-5929
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5929
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Chao Tian
>Assignee: Rahul Palamuttam
>  Labels: BB2015-05-TBR, newbie, patch
> Attachments: MAPREDUCE-5929.patch
>
>
> In YARNRunner.java, line 357,
> Path jobJarPath = new Path(jobConf.get(MRJobConfig.JAR));
> This causes the job.jar file to miss scheme, host and port number on 
> distributed file systems other than hdfs. 
> If we compare line 357 with line 344, there "job.xml" is actually set as
>  
> Path jobConfPath = new Path(jobSubmitDir,MRJobConfig.JOB_CONF_FILE);
> It appears "jobSubmitDir" is missing on line 357, which causes this problem. 
> On HDFS, the additional qualification step will correct this problem, but not on 
> other generic distributed file systems.
> The proposed change is to replace line 357 with
> Path jobJarPath = new Path(jobConf.get(jobSubmitDir,MRJobConfig.JAR));
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6038) A boolean may be set error in the Word Count v2.0 in MapReduce Tutorial

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-6038:

Labels: BB2015-05-TBR  (was: )

> A boolean may be set error in the Word Count v2.0 in MapReduce Tutorial
> ---
>
> Key: MAPREDUCE-6038
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6038
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
> Environment: java version 1.8.0_11, HotSpot 64-bit
>Reporter: Pei Ma
>Assignee: Tsuyoshi Ozawa
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-6038.1.patch
>
>
> As a beginner learning the basics of MapReduce, I found that I 
> couldn't run WordCount2 using the command "bin/hadoop jar wc.jar 
> WordCount2 /user/joe/wordcount/input /user/joe/wordcount/output" from the 
> Tutorial. The VM threw a NullPointerException at line 47. On line 
> 45, the default value returned by "conf.getBoolean" is true. That is to say, 
> when "wordcount.skip.patterns" is not set, WordCount2 will still 
> execute getCacheFiles, and patternsURIs gets a null value. When the 
> "-skip" option is not given, "wordcount.skip.patterns" will not be set, 
> and a NullPointerException comes out.
> In short, the block after the if-statement on line 45 shouldn't be executed 
> when the "-skip" option is absent from the command. Line 45 should probably 
> read "if (conf.getBoolean("wordcount.skip.patterns", false)) { "; 
> just change the default boolean.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5817) mappers get rescheduled on node transition even after all reducers are completed

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5817:

Labels: BB2015-05-TBR  (was: )

> mappers get rescheduled on node transition even after all reducers are 
> completed
> 
>
> Key: MAPREDUCE-5817
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5817
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster
>Affects Versions: 2.3.0
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
>  Labels: BB2015-05-TBR
> Attachments: mapreduce-5817.patch
>
>
> We're seeing a behavior where a job runs long after all reducers were already 
> finished. We found that the job was rescheduling and running a number of 
> mappers beyond the point of reducer completion. In one situation, the job ran 
> for some 9 more hours after all reducers completed!
> This happens because whenever a node transition (to an unusable state) comes 
> into the app master, it just reschedules all mappers that already ran on the 
> node in all cases.
> Therefore, any node transition has the potential to extend the job's runtime. 
> Once this window opens, another node transition can prolong it, and this can 
> happen indefinitely in theory.
> If there is some instability in the pool (unhealthy, etc.) for a duration, 
> then any big job is severely vulnerable to this problem.
> If all reducers have been completed, JobImpl.actOnUnusableNode() should not 
> reschedule mapper tasks. If all reducers are completed, the mapper outputs 
> are no longer needed, and there is no need to reschedule mapper tasks as they 
> would not be consumed anyway.
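> A hedged sketch of the proposed guard (method and variable names are illustrative, not from the patch):
> {code}
> // In JobImpl.actOnUnusableNode(): once every reducer has finished, the map
> // outputs on the lost node can no longer be fetched by anyone, so skip the
> // rescheduling entirely.
> if (completedReduceCount == totalReduceCount) {
>   return;                            // nothing left to consume re-run map outputs
> }
> rescheduleCompletedMapsOn(nodeId);   // existing behavior, illustrative name
> {code}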



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5490) MapReduce doesn't set the environment variable for children processes

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5490:

Labels: BB2015-05-TBR  (was: )

> MapReduce doesn't set the environment variable for children processes
> -
>
> Key: MAPREDUCE-5490
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5490
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5490.patch, mr-5490.patch, mr-5490.patch
>
>
> Currently, MapReduce uses the command line argument to pass the classpath to 
> the child. This breaks if the process forks a child that needs the same 
> classpath. Such a case happens in Hive when it uses map-side joins. I propose 
> that we make MapReduce in branch-1 use the CLASSPATH environment variable 
> like YARN does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6020) Too many threads blocking on the global JobTracker lock from getJobCounters, optimize getJobCounters to release global JobTracker lock before access the per job count

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-6020:

Labels: BB2015-05-TBR  (was: )

> Too many threads blocking on the global JobTracker lock from getJobCounters, 
> optimize getJobCounters to release global JobTracker lock before access the 
> per job counter in JobInProgress
> -
>
> Key: MAPREDUCE-6020
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6020
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.23.10
>Reporter: zhihai xu
>Assignee: zhihai xu
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-6020.branch1.patch
>
>
> Too many threads block on the global JobTracker lock from getJobCounters; 
> getJobCounters should be optimized to release the global JobTracker lock before accessing the 
> per-job counters in JobInProgress. Many JobClients may call 
> getJobCounters on the JobTracker at the same time, and the current code locks the 
> JobTracker, blocking all threads that want to get counters from JobInProgress. It is 
> better to release the JobTracker lock when getting counters from 
> JobInProgress (job.getCounters(counters)), so all threads can run in parallel 
> when accessing their own job counters.
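> A branch-1-style sketch of the narrowing described above (illustrative; the attached patch is authoritative): hold the JobTracker lock only for the job lookup, then read the per-job counters outside of it.
> {code}
> JobInProgress job;
> synchronized (this) {              // global JobTracker lock: lookup only
>   job = jobs.get(jobid);
> }
> if (job != null) {
>   job.getCounters(counters);       // per-job counters read without the global lock
> }
> {code}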



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5981) Log levels of certain MR logs can be changed to DEBUG

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5981:

Labels: BB2015-05-TBR  (was: )

> Log levels of certain MR logs can be changed to DEBUG
> -
>
> Key: MAPREDUCE-5981
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5981
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5981.patch
>
>
> Following map reduce logs can be changed to DEBUG log level.
> 1. In 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher#copyFromHost(Fetcher.java : 
> 313), the second log is not required to be at info level. It can be moved 
> to debug, as a warn log is printed anyway if verifyReply fails.
>   SecureShuffleUtils.verifyReply(replyHash, encHash, shuffleSecretKey);
>   LOG.info("for url="+msgToEncode+" sent hash and received reply");
> 2. Thread related info need not be printed in logs at INFO level. Below 2 
> logs can be moved to DEBUG
> a) In 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl#getHost(ShuffleSchedulerImpl.java
>  : 381), below log can be changed to DEBUG
>LOG.info("Assigning " + host + " with " + host.getNumKnownMapOutputs() +
>" to " + Thread.currentThread().getName());
> b) In 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.getMapsForHost(ShuffleSchedulerImpl.java
>  : 411), below log can be changed to DEBUG
>  LOG.info("assigned " + includedMaps + " of " + totalSize + " to " +
>  host + " to " + Thread.currentThread().getName());
>  
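> For example, the first statement above could become (a sketch of the suggested change):
> {code}
> if (LOG.isDebugEnabled()) {
>   LOG.debug("for url=" + msgToEncode + " sent hash and received reply");
> }
> {code}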



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5362) clean up POM dependencies

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5362:

Labels: BB2015-05-TBR  (was: )

> clean up POM dependencies
> -
>
> Key: MAPREDUCE-5362
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5362
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: build
>Affects Versions: 2.1.0-beta
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5362.patch, mr-5362-0.patch
>
>
> Intermediate 'pom' modules define dependencies inherited by leaf modules.
> This is causing issues in intellij IDE.
> We should normalize the leaf modules like in common, hdfs and tools where all 
> dependencies are defined in each leaf module and the intermediate 'pom' 
> module do not define any dependency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6030) In mr-jobhistory-daemon.sh, some env variables are not affected by mapred-env.sh

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-6030:

Labels: BB2015-05-TBR  (was: )

> In mr-jobhistory-daemon.sh, some env variables are not affected by 
> mapred-env.sh
> 
>
> Key: MAPREDUCE-6030
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6030
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Affects Versions: 2.4.1
>Reporter: Youngjoon Kim
>Assignee: Youngjoon Kim
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-6030.patch
>
>
> In mr-jobhistory-daemon.sh, some env variables are exported before sourcing 
> mapred-env.sh, so these variables don't use values defined in mapred-env.sh.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4711) Append time elapsed since job-start-time for finished tasks

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-4711:

Labels: BB2015-05-TBR  (was: )

> Append time elapsed since job-start-time for finished tasks
> ---
>
> Key: MAPREDUCE-4711
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4711
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Affects Versions: 0.23.3
>Reporter: Ravi Prakash
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-4711.branch-0.23.patch
>
>
> In 0.20.x/1.x, the analyze job link gave this information
> bq. The last Map task task_ finished at (relative to the Job launch 
> time): 5/10 20:23:10 (1hrs, 27mins, 54sec)
> The time it took for the last task to finish needs to be calculated mentally 
> in 0.23. I believe we should print it next to the finish time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4957) Throw FileNotFoundException when running in single node and "mapreduce.framework.name" is local

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-4957:

Labels: BB2015-05-TBR  (was: )

> Throw FileNotFoundException when running in single node and 
> "mapreduce.framework.name" is local
> ---
>
> Key: MAPREDUCE-4957
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4957
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Yi Liu
>Assignee: Yi Liu
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-4957.patch, MAPREDUCE-4957.patch
>
>
> Running on a single node with "mapreduce.framework.name" set to local, I get 
> the following error:
> java.io.FileNotFoundException: File does not exist: 
> /root/proj/hive-trunk/build/dist/lib/hive-builtins-0.11.0-SNAPSHOT.jar 
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:772)
>  
> at 
> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:208)
>  
> at 
> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:71)
>  
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:254)
>  
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:292)
>  
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:365)
>  
> at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1218) 
> at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215) 
> at java.security.AccessController.doPrivileged(Native Method) 
> at javax.security.auth.Subject.doAs(Subject.java:396) 
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1450)
>  
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1215) 
> at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:617) 
> at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:612) 
> at java.security.AccessController.doPrivileged(Native Method) 
> at javax.security.auth.Subject.doAs(Subject.java:396) 
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1450)
>  
> at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:612) 
> at 
> org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:446) 
> at 
> org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:683) 
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>  
> at java.lang.reflect.Method.invoke(Method.java:597) 
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212) 
> Job Submission failed with exception 'java.io.FileNotFoundException(File does 
> not exist: 
> /root/proj/hive-trunk/build/dist/lib/hive-builtins-0.11.0-SNAPSHOT.jar)'



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5748) Potential null pointer dereference in ShuffleHandler#Shuffle#messageReceived()

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5748:

Labels: BB2015-05-TBR  (was: )

> Potential null pointer dereference in ShuffleHandler#Shuffle#messageReceived()
> 
>
> Key: MAPREDUCE-5748
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5748
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: 
> 0001-MAPREDUCE-5748-Potential-null-pointer-deference-in-S.patch
>
>
> Starting around line 510:
> {code}
>   ChannelFuture lastMap = null;
>   for (String mapId : mapIds) {
> ...
>   }
>   lastMap.addListener(metrics);
>   lastMap.addListener(ChannelFutureListener.CLOSE);
> {code}
> If mapIds is empty, lastMap remains null, leading to an NPE in the 
> addListener() call.
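
A minimal, self-contained illustration of the guard such a fix needs (a generic sketch, not the actual ShuffleHandler code; in the real handler the empty case would send an HTTP error back to the shuffle client instead of printing):
{code}
import java.util.Collections;
import java.util.List;

public class EmptyMapIdsGuardSketch {
  public static void main(String[] args) {
    List<String> mapIds = Collections.emptyList();   // the problematic case
    String lastMap = null;                           // stands in for the ChannelFuture
    for (String mapId : mapIds) {
      lastMap = mapId;                               // stands in for the per-map send logic
    }
    if (lastMap == null) {
      // mapIds was empty: bail out instead of calling addListener() on null
      System.out.println("no map outputs requested; return an error to the client");
      return;
    }
    System.out.println("last map output sent for " + lastMap);
  }
}
{code}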



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-3486) All jobs of all queues will be returned, whether a particular queueName is specified or not

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-3486:

Labels: BB2015-05-TBR  (was: )

> All jobs of all queues will be returned, whether a particular queueName is 
> specified or not
> ---
>
> Key: MAPREDUCE-3486
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3486
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 1.1.3, 1.3.0, 1.2.2
>Reporter: XieXianshan
>Assignee: XieXianshan
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-3486.patch
>
>
> JobTracker.getJobsFromQueue(queueName) returns all jobs of all queues known 
> to the JobTracker even though I specify a queueName. 
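
A sketch of the expected behaviour, assuming each job object exposes the queue it was submitted to (the accessor names below are assumptions from the MRv1 code base and may not match the JobTracker API exactly):
{code}
// Sketch only: keep just the jobs whose queue matches the requested name.
static List<JobInProgress> jobsForQueue(Collection<JobInProgress> allKnownJobs,
                                        String queueName) {
  List<JobInProgress> jobsInQueue = new ArrayList<JobInProgress>();
  for (JobInProgress job : allKnownJobs) {
    if (queueName.equals(job.getProfile().getQueueName())) {  // assumed accessor
      jobsInQueue.add(job);
    }
  }
  return jobsInQueue;
}
{code}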



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5704) Optimize nextJobId in JobTracker

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5704:

Labels: BB2015-05-TBR  (was: )

> Optimize nextJobId in JobTracker
> 
>
> Key: MAPREDUCE-5704
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5704
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobtracker, mrv1
>Affects Versions: 1.2.1
>Reporter: JamesLi
>Assignee: JamesLi
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5704.patch
>
>
> When the jobtracker starts, nextJobId starts with 1. If we have run 3000 jobs, 
> then restart the jobtracker and run a new job, we cannot see this new job on 
> jobtracker:5030/jobhistory.jsp unless we click the "get more results" button.
> In jobhistory_jsp.java, the array SCAN_SIZES controls the number of jobs 
> displayed on jobhistory.jsp.
> I made a small change: when the jobtracker starts, find the biggest id under 
> the history done directory; jobs will start with maxId+1, or 1 if no job files 
> can be found.
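
A rough, self-contained sketch of the described scan, assuming history file names start with job_&lt;jtStartTime&gt;_&lt;seq&gt; (the directory path is a placeholder; the real change lives inside the JobTracker):
{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NextJobIdSketch {
  // Find the largest job sequence number already used in the history "done" directory.
  static int nextJobId(FileSystem fs, Path historyDoneDir) throws IOException {
    int maxSeq = 0;
    for (FileStatus st : fs.listStatus(historyDoneDir)) {
      String[] parts = st.getPath().getName().split("_");
      if (parts.length >= 3 && "job".equals(parts[0])) {
        try {
          maxSeq = Math.max(maxSeq, Integer.parseInt(parts[2]));
        } catch (NumberFormatException ignored) {
          // not a job history file; skip it
        }
      }
    }
    return maxSeq + 1;   // 1 when the directory holds no job files
  }

  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    Path done = new Path(args.length > 0 ? args[0] : "/tmp/history/done"); // placeholder
    System.out.println("next job id: " + nextJobId(done.getFileSystem(conf), done));
  }
}
{code}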



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5907) Improve getSplits() performance for fs implementations that can utilize performance gains from recursive listing

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5907:

Labels: BB2015-05-TBR  (was: )

> Improve getSplits() performance for fs implementations that can utilize 
> performance gains from recursive listing
> 
>
> Key: MAPREDUCE-5907
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5907
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 2.4.0
>Reporter: Sumit Kumar
>Assignee: Sumit Kumar
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5907-2.patch, MAPREDUCE-5907-3.patch, 
> MAPREDUCE-5907.patch
>
>
> FileInputFormat (both mapreduce and mapred implementations) uses recursive 
> listing while calculating splits. However, it does this level by level: to 
> discover files in /foo/bar, it lists /foo/bar first to get the immediate 
> children, then makes the same call on each of those children to discover 
> their immediate children, and so on. This doesn't scale well for object-store 
> based fs implementations like s3 and swift, because every listStatus call 
> ends up being a webservice call to the backend. When a large number of files 
> are considered for input, this makes the getSplits() call slow. 
> This patch adds a new set of recursive list APIs that give fs implementations 
> an opportunity to optimize. The behavior remains the same for other 
> implementations (a default implementation is provided, so they don't have to 
> implement anything new). However, for object-store based fs implementations 
> it provides a simple change: pass the recursive flag as true (as shown in the 
> patch) to improve listing performance.
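
The existing FileSystem#listFiles(Path, boolean) API illustrates the idea of a single recursive listing call; a small standalone sketch of collecting input files that way (this is not the patch's new API, and the input path is a placeholder):
{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class RecursiveListingSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    Path input = new Path("s3n://bucket/input");      // placeholder path
    FileSystem fs = input.getFileSystem(conf);

    // One recursive listing call; an object-store fs can answer this with far
    // fewer round trips than a level-by-level listStatus() walk.
    List<LocatedFileStatus> files = new ArrayList<LocatedFileStatus>();
    RemoteIterator<LocatedFileStatus> it = fs.listFiles(input, true);
    while (it.hasNext()) {
      LocatedFileStatus status = it.next();
      if (status.isFile()) {
        files.add(status);
      }
    }
    System.out.println("Found " + files.size() + " input files");
  }
}
{code}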



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4502) Node-level aggregation with combining the result of maps

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-4502:

Labels: BB2015-05-TBR  (was: )

> Node-level aggregation with combining the result of maps
> 
>
> Key: MAPREDUCE-4502
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4502
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster, mrv2
>Affects Versions: 3.0.0
>Reporter: Tsuyoshi Ozawa
>Assignee: Tsuyoshi Ozawa
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-4502.1.patch, MAPREDUCE-4502.10.patch, 
> MAPREDUCE-4502.2.patch, MAPREDUCE-4502.3.patch, MAPREDUCE-4502.4.patch, 
> MAPREDUCE-4502.5.patch, MAPREDUCE-4502.6.patch, MAPREDUCE-4502.7.patch, 
> MAPREDUCE-4502.8.patch, MAPREDUCE-4502.8.patch, MAPREDUCE-4502.9.patch, 
> MAPREDUCE-4502.9.patch, MAPREDUCE-4525-pof.diff, design_v2.pdf, 
> design_v3.pdf, speculative_draft.pdf
>
>
> The shuffle cost is expensive in Hadoop in spite of the existence of the 
> combiner, because the scope of combining is limited to a single MapTask. 
> To solve this problem, a good approach is to aggregate the results of maps 
> per node/rack by launching a combiner.
> This JIRA is to implement the multi-level aggregation infrastructure, 
> including combining per container (MAPREDUCE-3902 is related) and 
> coordinating containers by the application master, without breaking the 
> fault tolerance of jobs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5611) CombineFileInputFormat only requests a single location per split when more could be optimal

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5611:

Labels: BB2015-05-TBR  (was: )

> CombineFileInputFormat only requests a single location per split when more 
> could be optimal
> ---
>
> Key: MAPREDUCE-5611
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5611
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Chandra Prakash Bhagtani
>Assignee: Chandra Prakash Bhagtani
>  Labels: BB2015-05-TBR
> Attachments: CombineFileInputFormat-trunk.patch
>
>
> I have come across an issue with CombineFileInputFormat. Actually I ran a 
> hive query on approx 1.2 GB data with CombineHiveInputFormat which internally 
> uses CombineFileInputFormat. My cluster size is 9 datanodes and 
> max.split.size is 256 MB
> When I ran this query with replication factor 9, hive consistently creates 
> all 6 rack-local tasks and with replication factor 3 it creates 5 rack-local 
> and 1 data local tasks. 
>  When replication factor is 9 (equal to cluster size), all the tasks should 
> be data-local as each datanode contains all the replicas of the input data, 
> but that is not happening i.e all the tasks are rack-local. 
> When I dug into the CombineFileInputFormat.java code in the getMoreSplits 
> method, I found the issue with the following snippet (especially in the case 
> of a higher replication factor):
> {code:title=CombineFileInputFormat.java|borderStyle=solid}
> for (Iterator<Map.Entry<String, List<OneBlockInfo>>> iter = nodeToBlocks.entrySet().iterator();
>  iter.hasNext();) {
>   Map.Entry<String, List<OneBlockInfo>> one = iter.next();
>   nodes.add(one.getKey());
>   List<OneBlockInfo> blocksInNode = one.getValue();
>   // for each block, copy it into validBlocks. Delete it from
>   // blockToNodes so that the same block does not appear in
>   // two different splits.
>   for (OneBlockInfo oneblock : blocksInNode) {
> if (blockToNodes.containsKey(oneblock)) {
>   validBlocks.add(oneblock);
>   blockToNodes.remove(oneblock);
>   curSplitSize += oneblock.length;
>   // if the accumulated split size exceeds the maximum, then
>   // create this split.
>   if (maxSize != 0 && curSplitSize >= maxSize) {
> // create an input split and add it to the splits array
> addCreatedSplit(splits, nodes, validBlocks);
> curSplitSize = 0;
> validBlocks.clear();
>   }
> }
>   }
> {code}
> First node in the map nodeToBlocks has all the replicas of input file, so the 
> above code creates 6 splits all with only one location. Now if JT doesn't 
> schedule these tasks on that node, all the tasks will be rack-local, even 
> though all the other datanodes have all the other replicas.
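
A sketch of the general idea rather than the actual patch: when a split is created, record the union of hosts that hold the split's blocks instead of only the node currently being iterated. blockHosts below is a hypothetical lookup of each block's replica hosts, captured before the block is removed from blockToNodes:
{code}
// Sketch only: gather every host that stores a replica of the blocks in this split.
Set<String> splitHosts = new HashSet<String>();
for (OneBlockInfo block : validBlocks) {
  String[] hosts = blockHosts.get(block);          // hypothetical replica-host lookup
  if (hosts != null) {
    Collections.addAll(splitHosts, hosts);
  }
}
// hand all candidate hosts to the split instead of the single current node
addCreatedSplit(splits, new ArrayList<String>(splitHosts), validBlocks);
{code}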



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5621) mr-jobhistory-daemon.sh doesn't have to execute mkdir and chown all the time

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5621:

Labels: BB2015-05-TBR  (was: )

> mr-jobhistory-daemon.sh doesn't have to execute mkdir and chown all the time
> 
>
> Key: MAPREDUCE-5621
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5621
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Affects Versions: 3.0.0
>Reporter: Shinichi Yamashita
>Assignee: Shinichi Yamashita
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5621.patch
>
>
> mr-jobhistory-daemon.sh executes mkdir and chown commands to prepare the log 
> directory.
> These are always executed, whether or not the directory already exists. In 
> addition, they are executed not only when starting the daemon but also when 
> stopping it.
> An "if" guard like the one in hadoop-daemon.sh and yarn-daemon.sh should be 
> added to control this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4980) Parallel test execution of hadoop-mapreduce-client-core

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-4980:

Labels: BB2015-05-TBR  (was: )

> Parallel test execution of hadoop-mapreduce-client-core
> ---
>
> Key: MAPREDUCE-4980
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4980
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Affects Versions: 3.0.0
>Reporter: Tsuyoshi Ozawa
>Assignee: Andrey Klochkov
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-4980--n3.patch, MAPREDUCE-4980--n4.patch, 
> MAPREDUCE-4980--n5.patch, MAPREDUCE-4980--n6.patch, MAPREDUCE-4980--n7.patch, 
> MAPREDUCE-4980--n7.patch, MAPREDUCE-4980--n8.patch, MAPREDUCE-4980.1.patch, 
> MAPREDUCE-4980.patch
>
>
> The Maven Surefire plugin supports a parallel testing feature. By using it, 
> the tests can be run faster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4330) TaskAttemptCompletedEventTransition invalidates previously successful attempt without checking if the newly completed attempt is successful

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-4330:

Labels: BB2015-05-TBR  (was: )

> TaskAttemptCompletedEventTransition invalidates previously successful attempt 
> without checking if the newly completed attempt is successful
> ---
>
> Key: MAPREDUCE-4330
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4330
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.23.1
>Reporter: Bikas Saha
>Assignee: Omkar Vinit Joshi
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-4330-20130415.1.patch, 
> MAPREDUCE-4330-20130415.patch, MAPREDUCE-4330-21032013.1.patch, 
> MAPREDUCE-4330-21032013.patch
>
>
> The previously completed attempt is removed from 
> successAttemptCompletionEventNoMap and marked OBSOLETE.
> After that, if the newly completed attempt is successful then it is added to 
> the successAttemptCompletionEventNoMap. 
> This seems wrong because the newly completed attempt could have failed, in 
> which case there is no need to invalidate the successful attempt.
> One error case would be when a speculative attempt completes with 
> killed/failed after the successful version has completed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4273) Make CombineFileInputFormat split result JDK independent

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-4273:

Labels: BB2015-05-TBR  (was: )

> Make CombineFileInputFormat split result JDK independent
> 
>
> Key: MAPREDUCE-4273
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4273
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 1.0.3
>Reporter: Luke Lu
>Assignee: Yu Gao
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-4273-branch1-v2.patch, 
> mapreduce-4273-branch-1.patch, mapreduce-4273-branch-2.patch, 
> mapreduce-4273.patch
>
>
> The split result of CombineFileInputFormat depends on the iteration order of 
> the nodeToBlocks and rackToBlocks hash maps, which makes the result dependent 
> on the HashMap implementation and hence on the JDK.
> This is manifested as TestCombineFileInputFormat failures on alternative JDKs.
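
One generic way to make such results independent of the JDK's HashMap is to give the maps a defined iteration order, e.g. a sorted map keyed by node/rack name; a minimal illustration (not the actual patch):
{code}
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DeterministicOrderSketch {
  public static void main(String[] args) {
    // A TreeMap iterates its keys in sorted order on every JDK, so logic that
    // walks nodeToBlocks produces the same split layout regardless of which
    // HashMap implementation the runtime ships.
    Map<String, List<String>> nodeToBlocks = new TreeMap<String, List<String>>();
    nodeToBlocks.put("rack1/host2", Arrays.asList("blk_2"));
    nodeToBlocks.put("rack1/host1", Arrays.asList("blk_1"));
    for (Map.Entry<String, List<String>> e : nodeToBlocks.entrySet()) {
      System.out.println(e.getKey() + " -> " + e.getValue());
    }
  }
}
{code}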



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5377) JobID is not displayed truly by "hadoop job -history" command

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5377:

Labels: BB2015-05-TBR newbie  (was: newbie)

> JobID is not displayed truly by "hadoop job -history" command
> -
>
> Key: MAPREDUCE-5377
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5377
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv1
>Affects Versions: 1.2.0
>Reporter: Shinichi Yamashita
>Assignee: Shinichi Yamashita
>Priority: Minor
>  Labels: BB2015-05-TBR, newbie
> Attachments: MAPREDUCE-5377.patch
>
>
> The JobID output by the "hadoop job -history" command is incorrect.
> {quote}
> [hadoop@hadoop hadoop]$ hadoop job -history terasort
> Hadoop job: 0001_1374260789919_hadoop
> =
> Job tracker host name: job
> job tracker start time: Tue May 18 15:39:51 PDT 1976
> User: hadoop
> JobName: TeraSort
> JobConf: 
> hdfs://hadoop:8020/hadoop/mapred/staging/hadoop/.staging/job_201307191206_0001/job.xml
> Submitted At: 19-7-2013 12:06:29
> Launched At: 19-7-2013 12:06:30 (0sec)
> Finished At: 19-7-2013 12:06:44 (14sec)
> Status: SUCCESS
> {quote}
> In this example, it should show "job_201307191206_0001" at "Hadoop job:", but 
> it shows "0001_1374260789919_hadoop". In addition, "Job tracker host name" and 
> "job tracker start time" are invalid.
> This problem can be solved by fixing the setting of jobId in HistoryViewer(). 
> In addition, the JobTracker information in HistoryViewer should be fixed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5403) MR changes to accommodate yarn.application.classpath being moved to the server-side

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5403:

Labels: BB2015-05-TBR  (was: )

> MR changes to accommodate yarn.application.classpath being moved to the 
> server-side
> ---
>
> Key: MAPREDUCE-5403
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5403
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 2.0.5-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5403-1.patch, MAPREDUCE-5403-2.patch, 
> MAPREDUCE-5403.patch
>
>
> yarn.application.classpath is a confusing property because it is used by 
> MapReduce and not YARN, and MapReduce already has 
> mapreduce.application.classpath, which provides the same functionality.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5365) Set mapreduce.job.classloader to true by default

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5365:

Labels: BB2015-05-TBR  (was: )

> Set mapreduce.job.classloader to true by default
> 
>
> Key: MAPREDUCE-5365
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5365
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.0.5-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5365.patch
>
>
> MAPREDUCE-1700 introduced the mapreduce.job.classloader option, which uses a 
> custom classloader to separate system classes from user classes.  It seems 
> like there are only rare cases when a user would not want this on, and that 
> it should be enabled by default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-3807) JobTracker needs fix similar to HDFS-94

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-3807:

Labels: BB2015-05-TBR newbie  (was: newbie)

> JobTracker needs fix similar to HDFS-94
> ---
>
> Key: MAPREDUCE-3807
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3807
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Harsh J
>  Labels: BB2015-05-TBR, newbie
> Attachments: MAPREDUCE-3807.patch
>
>
> 1.0 JobTracker's jobtracker.jsp page currently shows:
> {code}
> Cluster Summary (Heap Size is <%= 
> StringUtils.byteDesc(Runtime.getRuntime().totalMemory()) %>/<%= 
> StringUtils.byteDesc(Runtime.getRuntime().maxMemory()) %>)
> {code}
> It could use the same improvement as HDFS-94 to reflect live heap usage more 
> accurately.
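
For reference, a tiny standalone example of the HDFS-94 style of reporting, which shows used heap (committed minus free) against the maximum rather than committed heap alone:
{code}
public class HeapUsageSketch {
  public static void main(String[] args) {
    Runtime rt = Runtime.getRuntime();
    // totalMemory() is only the currently committed heap; HDFS-94 style reporting
    // shows used heap (committed minus free) against maxMemory() instead.
    long used = rt.totalMemory() - rt.freeMemory();
    long committed = rt.totalMemory();
    long max = rt.maxMemory();
    System.out.printf("Heap used %d MB / committed %d MB / max %d MB%n",
        used >> 20, committed >> 20, max >> 20);
  }
}
{code}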



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5188) error when verify FileType of RS_SOURCE in getCompanionBlocks in BlockPlacementPolicyRaid.java

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5188:

Labels: BB2015-05-TBR contrib/raid  (was: contrib/raid)

> error when verify FileType of RS_SOURCE in getCompanionBlocks  in 
> BlockPlacementPolicyRaid.java
> ---
>
> Key: MAPREDUCE-5188
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5188
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/raid
>Affects Versions: 2.0.2-alpha
>Reporter: junjin
>Assignee: junjin
>Priority: Critical
>  Labels: BB2015-05-TBR, contrib/raid
> Fix For: 2.0.2-alpha
>
> Attachments: MAPREDUCE-5188.patch
>
>
> There is an error when verifying the FileType of RS_SOURCE in 
> getCompanionBlocks in BlockPlacementPolicyRaid.java.
> xorParityLength on line #379 needs to be changed to rsParityLength, since it 
> is used for verifying the RS_SOURCE type.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4346) Adding a refined version of JobTracker.getAllJobs() and exposing through the JobClient

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-4346:

Labels: BB2015-05-TBR  (was: )

> Adding a refined version of JobTracker.getAllJobs() and exposing through the 
> JobClient
> --
>
> Key: MAPREDUCE-4346
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4346
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv1
>Reporter: Ahmed Radwan
>Assignee: Ahmed Radwan
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-4346.patch, MAPREDUCE-4346_rev2.patch, 
> MAPREDUCE-4346_rev3.patch, MAPREDUCE-4346_rev4.patch
>
>
> The current implementation for JobTracker.getAllJobs() returns all submitted 
> jobs in any state, in addition to retired jobs. This list can be long and 
> represents an unneeded overhead especially in the case of clients only 
> interested in jobs in specific state(s). 
> It is beneficial to include a refined version where only jobs having specific 
> statuses are returned and retired jobs are optional to include. 
> I'll be uploading an initial patch momentarily.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5150) Backport 2009 terasort (MAPREDUCE-639) to branch-1

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5150:

Labels: BB2015-05-TBR  (was: )

> Backport 2009 terasort (MAPREDUCE-639) to branch-1
> --
>
> Key: MAPREDUCE-5150
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5150
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: examples
>Affects Versions: 1.2.0
>Reporter: Gera Shegalov
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5150-branch-1.patch
>
>
> Users evaluate the performance of Hadoop clusters using different benchmarks 
> such as TeraSort. However, the terasort version in branch-1 is outdated. It 
> works on a teragen dataset that cannot exceed 4 billion unique keys, and it 
> does not have the fast non-sampling partitioner SimplePartitioner either.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-3936) Clients should not enforce counter limits

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-3936:

Labels: BB2015-05-TBR  (was: )

> Clients should not enforce counter limits 
> --
>
> Key: MAPREDUCE-3936
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3936
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv1
>Reporter: Tom White
>Assignee: Tom White
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-3936.patch, MAPREDUCE-3936.patch
>
>
> The code for enforcing counter limits (from MAPREDUCE-1943) creates a static 
> JobConf instance to load the limits, which may throw an exception if the 
> client limit is set to be lower than the limit on the cluster (perhaps 
> because the cluster limit was raised from the default).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4469) Resource calculation in child tasks is CPU-heavy

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-4469:

Labels: BB2015-05-TBR  (was: )

> Resource calculation in child tasks is CPU-heavy
> 
>
> Key: MAPREDUCE-4469
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4469
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: performance, task
>Affects Versions: 1.0.3
>Reporter: Todd Lipcon
>Assignee: Ahmed Radwan
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-4469.patch, MAPREDUCE-4469_rev2.patch, 
> MAPREDUCE-4469_rev3.patch, MAPREDUCE-4469_rev4.patch, 
> MAPREDUCE-4469_rev5.patch
>
>
> In doing some benchmarking on a hadoop-1 derived codebase, I noticed that 
> each of the child tasks was doing a ton of syscalls. Upon stracing, I noticed 
> that it's spending a lot of time looping through all the files in /proc to 
> calculate resource usage.
> As a test, I added a flag to disable use of the ResourceCalculatorPlugin 
> within the tasks. On a CPU-bound 500G-sort workload, this improved total job 
> runtime by about 10% (map slot-seconds by 14%, reduce slot seconds by 8%)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4271) Make TestCapacityScheduler more robust with non-Sun JDK

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-4271:

Labels: BB2015-05-TBR alt-jdk capacity  (was: alt-jdk capacity)

> Make TestCapacityScheduler more robust with non-Sun JDK
> ---
>
> Key: MAPREDUCE-4271
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4271
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: capacity-sched
>Affects Versions: 1.0.3
>Reporter: Luke Lu
>Assignee: Yu Gao
>  Labels: BB2015-05-TBR, alt-jdk, capacity
> Attachments: MAPREDUCE-4271-branch1-v2.patch, 
> mapreduce-4271-branch-1.patch, test-afterepatch.result, 
> test-beforepatch.result, test-patch.result
>
>
> The capacity scheduler queue is initialized with a HashMap, the values of 
> which are later added to a list (a queue for assigning tasks). 
> TestCapacityScheduler depends on the order of the list hence not portable 
> across JDKs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-1290) DBOutputFormat does not support rewriteBatchedStatements when using MySQL jdbc drivers

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-1290:

Labels: BB2015-05-TBR DBOutoutFormat patch  (was: DBOutoutFormat patch)

> DBOutputFormat does not support rewriteBatchedStatements when using MySQL 
> jdbc drivers
> --
>
> Key: MAPREDUCE-1290
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1290
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.20.1
>Reporter: Joe Crobak
>  Labels: BB2015-05-TBR, DBOutoutFormat, patch
> Attachments: MAPREDUCE-1290.patch, MapReduce-1290-trunk.patch
>
>
> The DBOutputFormat adds a semi-colon to the end of the INSERT statement that 
> it uses to save fields to the database.  Semicolons are typically used in 
> command line programs but are not needed when using the JDBC API.  In this 
> case, the stray semi-colon breaks rewriteBatchedStatement support. See: 
> http://forums.mysql.com/read.php?39,271526,271526#msg-271526 for an example.
> In my use case, rewriteBatchedStatement is very useful because it increases 
> the speed of inserts and reduces memory consumption.
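
A minimal sketch of building the INSERT statement without the trailing semicolon (table and field names are placeholders; this is not the actual DBOutputFormat code):
{code}
public class InsertQuerySketch {
  // Builds "INSERT INTO t (a,b) VALUES (?,?)" with no trailing ';',
  // which is what the JDBC API (and MySQL's rewriteBatchedStatements) expects.
  static String constructQuery(String table, String[] fieldNames) {
    StringBuilder sb = new StringBuilder("INSERT INTO ").append(table).append(" (");
    for (int i = 0; i < fieldNames.length; i++) {
      sb.append(fieldNames[i]);
      if (i != fieldNames.length - 1) sb.append(",");
    }
    sb.append(") VALUES (");
    for (int i = 0; i < fieldNames.length; i++) {
      sb.append("?");
      if (i != fieldNames.length - 1) sb.append(",");
    }
    sb.append(")");           // note: no ';' appended here
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(constructQuery("mytable", new String[] {"a", "b"}));
  }
}
{code}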



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4293) Rumen TraceBuilder gets NPE some times

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-4293:

Labels: BB2015-05-TBR  (was: )

> Rumen TraceBuilder gets NPE some times
> --
>
> Key: MAPREDUCE-4293
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4293
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tools/rumen
>Reporter: Ravi Gummadi
>Assignee: Ravi Gummadi
>  Labels: BB2015-05-TBR
> Attachments: 4293.patch
>
>
> Rumen TraceBuilder's JobBuilder.processTaskFailedEvent throws NPE if 
> failedDueToAttempt is not available in history.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4882) Error in estimating the length of the output file in Spill Phase

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-4882:

Labels: BB2015-05-TBR patch  (was: patch)

> Error in estimating the length of the output file in Spill Phase
> 
>
> Key: MAPREDUCE-4882
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4882
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.20.2, 1.0.3
> Environment: Any Environment
>Reporter: Lijie Xu
>Assignee: Jerry Chen
>  Labels: BB2015-05-TBR, patch
> Attachments: MAPREDUCE-4882.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The sortAndSpill() method in MapTask.java has an error in estimating the 
> length of the output file. 
> The "long size" should be "(bufvoid - bufstart) + bufend" not "(bufvoid - 
> bufend) + bufstart" when "bufend < bufstart".
> Here is the original code in MapTask.java.
>  private void sortAndSpill() throws IOException, ClassNotFoundException,
>InterruptedException {
>   //approximate the length of the output file to be the length of the
>   //buffer + header lengths for the partitions
>   long size = (bufend >= bufstart
>   ? bufend - bufstart
>   : (bufvoid - bufend) + bufstart) +
>   partitions * APPROX_HEADER_LENGTH;
>   FSDataOutputStream out = null;
> --
> I ran a test with "TeraSort". A snippet from the mapper's log is as follows:
> MapTask: Spilling map output: record full = true
> MapTask: bufstart = 157286200; bufend = 10485460; bufvoid = 199229440
> MapTask: kvstart = 262142; kvend = 131069; length = 655360
> MapTask: Finished spill 3
> In this case, Spill Bytes should be (199229440 - 157286200) + 10485460 = 
> 52428700 (52 MB), because the number of spilled records is 524287 and each 
> record costs 100 B.
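
The corrected estimate as a small standalone method, checked against the numbers from the log above (partitions and header length are set to 0 here to isolate the buffer term):
{code}
public class SpillSizeSketch {
  // Estimated spill size: a contiguous region when bufend >= bufstart; otherwise
  // the buffer wraps and the data spans (bufvoid - bufstart) + bufend.
  static long estimate(long bufstart, long bufend, long bufvoid,
                       int partitions, int approxHeaderLength) {
    long data = (bufend >= bufstart)
        ? bufend - bufstart
        : (bufvoid - bufstart) + bufend;
    return data + (long) partitions * approxHeaderLength;
  }

  public static void main(String[] args) {
    // Numbers from the log above: 524287 spilled records * 100 B ~= 52 MB of data.
    System.out.println(estimate(157286200L, 10485460L, 199229440L, 0, 0)); // 52428700
  }
}
{code}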



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-3881) building fail under Windows

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-3881:

Labels: BB2015-05-TBR  (was: )

> building fail under Windows
> ---
>
> Key: MAPREDUCE-3881
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3881
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: build
> Environment: D:\os\hadoopcommon>mvn --version
> Apache Maven 3.0.4 (r1232337; 2012-01-17 16:44:56+0800)
> Maven home: C:\portable\maven\bin\..
> Java version: 1.7.0_02, vendor: Oracle Corporation
> Java home: C:\Program Files (x86)\Java\jdk1.7.0_02\jre
> Default locale: zh_CN, platform encoding: GBK
> OS name: "windows 7", version: "6.1", arch: "x86", family: "windows"
>Reporter: Changming Sun
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: pom.xml.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> hadoop-mapreduce-project\hadoop-yarn\hadoop-yarn-common\pom.xml is not 
> portable.
> <execution>
>   <id>generate-version</id>
>   <phase>generate-sources</phase>
>   <configuration>
>     <executable>scripts/saveVersion.sh</executable>
>     <arguments>
>       <argument>${project.version}</argument>
>       <argument>${project.build.directory}</argument>
>     </arguments>
>   </configuration>
>   <goals>
>     <goal>exec</goal>
>   </goals>
> </execution>
> When I built it under Windows, I got such an error:
> [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2:exec 
> (generate-version) on project hadoop-yarn-common: Command execution failed. 
> Cannot run program "scripts\saveVersion.sh" (in directory 
> "D:\os\hadoopcommon\hadoop-mapreduce-project\hadoop-yarn\hadoop-yarn-common"): 
> CreateProcess error=2, ? -> [Help 1]
> we should modify it like this: (copied from 
> hadoop-common-project\hadoop-common\pom.xml)
> <configuration>
>   <target>
>     <mkdir dir="${project.build.directory}/generated-sources/java"/>
>     <exec executable="sh">
>       <arg line="${basedir}/dev-support/saveVersion.sh 
>            ${project.version} ${project.build.directory}/generated-sources/java"/>
>     </exec>
>   </target>
> </configuration>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4998) backport MAPREDUCE-3376: Old mapred API combiner uses NULL reporter to branch-1

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-4998:

Labels: BB2015-05-TBR  (was: )

> backport MAPREDUCE-3376: Old mapred API combiner uses NULL reporter to 
> branch-1
> ---
>
> Key: MAPREDUCE-4998
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4998
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv1
>Reporter: Jim Donofrio
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-4998-branch-1.patch
>
>
> http://s.apache.org/eI9
> backport MAPREDUCE-3376: Old mapred API combiner uses NULL reporter to 
> branch-1



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4473) tasktracker rank on machines.jsp?type=active

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-4473:

Labels: BB2015-05-TBR tasktracker  (was: tasktracker)

> tasktracker rank on machines.jsp?type=active
> 
>
> Key: MAPREDUCE-4473
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4473
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: tasktracker
>Affects Versions: 0.20.2, 0.21.0, 0.22.0, 0.23.0, 0.23.1, 1.0.0, 1.0.1, 
> 1.0.2, 1.0.3
>Reporter: jian fan
>Priority: Minor
>  Labels: BB2015-05-TBR, tasktracker
> Attachments: MAPREDUCE-4473.patch
>
>
> Sometimes we need a simple way to judge which tasktracker is down from the 
> machines.jsp?type=active page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4917) multiple BlockFixer should be supported in order to improve scalability and reduce too much work on single BlockFixer

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-4917:

Labels: BB2015-05-TBR patch  (was: patch)

> multiple BlockFixer should be supported in order to improve scalability and 
> reduce too much work on single BlockFixer
> -
>
> Key: MAPREDUCE-4917
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4917
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: contrib/raid
>Affects Versions: 0.22.0
>Reporter: Jun Jin
>Assignee: Jun Jin
>  Labels: BB2015-05-TBR, patch
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-4917.1.patch, MAPREDUCE-4917.2.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current implementation can only run a single BlockFixer, since the fsck 
> (in RaidDFSUtil.getCorruptFiles) only checks the whole DFS file system. 
> Multiple BlockFixers would all do the same thing and try to fix the same file 
> if more than one BlockFixer were launched.
> The change/fix will mainly be in BlockFixer.java and 
> RaidDFSUtil.getCorruptFile(), to enable fsck to check the different paths 
> defined in separate Raid.xml files for a single RaidNode/BlockFixer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-695) MiniMRCluster while shutting down should not wait for currently running jobs to finish

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-695:
---
Labels: BB2015-05-TBR  (was: )

> MiniMRCluster while shutting down should not wait for currently running jobs 
> to finish
> --
>
> Key: MAPREDUCE-695
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-695
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.0.3
>Reporter: Sreekanth Ramakrishnan
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: mapreduce-695.patch
>
>
> Currently in {{org.apache.hadoop.mapred.MiniMRCluster.shutdown()}} we do a 
> {{waitTaskTrackers()}} which can cause {{MiniMRCluster}} to hang indefinitely 
> when used in conjunction with Controlled jobs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4261) MRAppMaster throws NPE while stopping RMContainerAllocator service

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-4261:

Labels: BB2015-05-TBR  (was: )

> MRAppMaster throws NPE while stopping RMContainerAllocator service
> --
>
> Key: MAPREDUCE-4261
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4261
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mr-am, mrv2
>Affects Versions: 2.0.0-alpha, 3.0.0, 2.0.1-alpha, 2.0.2-alpha
>Reporter: Devaraj K
>Assignee: Devaraj K
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-4261.patch
>
>
> {code:xml}
> 2012-05-16 18:55:54,222 INFO [Thread-1] 
> org.apache.hadoop.yarn.service.CompositeService: Error stopping 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerLauncherRouter
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerLauncherRouter.stop(MRAppMaster.java:716)
>   at 
> org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99)
>   at 
> org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$MRAppMasterShutdownHook.run(MRAppMaster.java:1036)
>   at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
> 2012-05-16 18:55:54,222 INFO [Thread-1] 
> org.apache.hadoop.yarn.service.CompositeService: Error stopping 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getStat(RMContainerAllocator.java:521)
>   at 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.stop(RMContainerAllocator.java:227)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.stop(MRAppMaster.java:668)
>   at 
> org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99)
>   at 
> org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$MRAppMasterShutdownHook.run(MRAppMaster.java:1036)
>   at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-2058) FairScheduler:NullPointerException in web interface when JobTracker not initialized

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-2058:

Labels: BB2015-05-TBR  (was: )

> FairScheduler:NullPointerException in web interface when JobTracker not 
> initialized
> ---
>
> Key: MAPREDUCE-2058
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2058
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/fair-share
>Affects Versions: 0.22.0, 1.0.4
>Reporter: Dan Adkins
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-2058-branch-1.patch, MAPREDUCE-2058.patch
>
>
> When I contact the jobtracker web interface prior to the job tracker being 
> fully initialized (say, if hdfs is still in safe mode), I get the following 
> error:
> 10/09/09 18:06:02 ERROR mortbay.log: /jobtracker.jsp
> java.lang.NullPointerException
> at 
> org.apache.hadoop.mapred.FairScheduler.getJobs(FairScheduler.java:909)
> at 
> org.apache.hadoop.mapred.JobTracker.getJobsFromQueue(JobTracker.java:4357)
> at 
> org.apache.hadoop.mapred.JobTracker.getQueueInfoArray(JobTracker.java:4334)
> at 
> org.apache.hadoop.mapred.JobTracker.getRootQueues(JobTracker.java:4295)
> at 
> org.apache.hadoop.mapred.jobtracker_jsp.generateSummaryTable(jobtracker_jsp.java:44)
> at 
> org.apache.hadoop.mapred.jobtracker_jsp._jspService(jobtracker_jsp.java:176)
> at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at 
> org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
> at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1124)
> at 
> org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:857)
> at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1115)
> at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:361)
> at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
> at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> at 
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
> at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> at org.mortbay.jetty.Server.handle(Server.java:324)
> at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)   
>  at 
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
> at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
> at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
> at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
> at 
> org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
> at 
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
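
A sketch of the kind of guard that avoids the NPE (the pool/manager names are placeholders, not necessarily the actual FairScheduler fields): return an empty collection while the scheduler is not yet fully initialized.
{code}
// Sketch only: tolerate a not-yet-initialized scheduler instead of throwing NPE.
public Collection<JobInProgress> getJobs(String queueName) {
  if (poolMgr == null) {                       // poolMgr: placeholder for internal state
    return Collections.emptyList();            // JobTracker still starting up
  }
  Pool pool = poolMgr.getPool(queueName);      // Pool/getPool are assumed names
  return pool == null ? Collections.<JobInProgress>emptyList() : pool.getJobs();
}
{code}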



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4639) CombineFileInputFormat#getSplits should throw IOException when input paths contain a directory

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-4639:

Labels: BB2015-05-TBR  (was: )

> CombineFileInputFormat#getSplits should throw IOException when input paths 
> contain a directory
> --
>
> Key: MAPREDUCE-4639
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4639
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Reporter: Jim Donofrio
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-4639.patch
>
>
> FileInputFormat#getSplits throws an IOException when the input paths contain 
> a directory. CombineFileInputFormat should do the same; otherwise the job 
> will not fail until the record reader is initialized, when FileSystem#open 
> will say that the directory does not exist.
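
A small standalone sketch of the kind of early check described, using the FileSystem API (the input path is a placeholder; this is not the actual getSplits code):
{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DirectoryCheckSketch {
  // Fail at split-calculation time, as FileInputFormat#getSplits does, instead
  // of failing later when the record reader opens the path.
  static void checkNotDirectory(FileSystem fs, Path p) throws IOException {
    FileStatus status = fs.getFileStatus(p);
    if (status.isDirectory()) {
      throw new IOException("Not a file: " + p);
    }
  }

  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    Path input = new Path(args.length > 0 ? args[0] : "/tmp/example");  // placeholder
    checkNotDirectory(input.getFileSystem(conf), input);
    System.out.println(input + " is a regular file");
  }
}
{code}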



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-2393) No total min share limitation of all pools

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-2393:

Labels: BB2015-05-TBR fair scheduler  (was: fair scheduler)

> No total min share limitation of all pools
> --
>
> Key: MAPREDUCE-2393
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2393
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/fair-share
>Affects Versions: 0.21.0
>Reporter: Denny Ye
>  Labels: BB2015-05-TBR, fair, scheduler
> Attachments: MAPREDUCE-2393.patch
>
>
> Hi, there is no limit on the total min share of all pools relative to the 
> cluster's total shares. A user can define an arbitrary amount of min share 
> for each pool. There is such a description in , but no enforcing code. 
> This may be critical for slot distribution. One pool can hold all cluster 
> slots to meet a min share that is much greater than the cluster's total slots.
> If that happens, the min shares should be scaled down proportionally.
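
A small standalone sketch of proportional scaling, assuming min shares and capacity are measured in slots (generic arithmetic, not the fair-scheduler code itself):
{code}
import java.util.HashMap;
import java.util.Map;

public class MinShareScalingSketch {
  // If the sum of configured min shares exceeds the cluster capacity, scale each
  // pool's min share down proportionally so the total never exceeds the capacity.
  static Map<String, Integer> scale(Map<String, Integer> minShares, int totalSlots) {
    long sum = 0;
    for (int share : minShares.values()) {
      sum += share;
    }
    Map<String, Integer> result = new HashMap<String, Integer>();
    if (sum <= totalSlots) {
      result.putAll(minShares);
      return result;
    }
    for (Map.Entry<String, Integer> e : minShares.entrySet()) {
      result.put(e.getKey(), (int) (e.getValue() * (double) totalSlots / sum));
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, Integer> minShares = new HashMap<String, Integer>();
    minShares.put("poolA", 300);
    minShares.put("poolB", 100);
    // With 200 total slots, poolA is scaled to 150 and poolB to 50.
    System.out.println(scale(minShares, 200));
  }
}
{code}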



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4308) Remove excessive split log messages

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-4308:

Labels: BB2015-05-TBR  (was: )

> Remove excessive split log messages
> ---
>
> Key: MAPREDUCE-4308
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4308
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobtracker
>Affects Versions: 1.0.3
>Reporter: Kihwal Lee
>  Labels: BB2015-05-TBR
> Attachments: mapreduce-4308-branch-1.patch
>
>
> Job tracker currently prints out information on every split.
> {noformat}
> 2012-05-20 00:06:01,985 INFO org.apache.hadoop.mapred.JobInProgress: 
> tip:task_201205100740_1745_m_00 has split on node:/192.168.0.1
> /my.totally.madeup.host.com
> {noformat}
> I looked at one cluster and these messages were taking up more than 30% of 
> the JT log. If jobs have a large number of maps, it can be worse. I think it 
> is reasonable to lower the log level of the statement from INFO to DEBUG.
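
A minimal sketch of the level change using commons-logging, as the JobTracker does (the class and message shape here are illustrative only):
{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class SplitLoggingSketch {
  private static final Log LOG = LogFactory.getLog(SplitLoggingSketch.class);

  static void logSplit(String tip, String node) {
    // Per-split messages go to DEBUG; they only appear when explicitly enabled,
    // so they no longer dominate the JobTracker log at the default INFO level.
    if (LOG.isDebugEnabled()) {
      LOG.debug("tip:" + tip + " has split on node:" + node);
    }
  }

  public static void main(String[] args) {
    logSplit("task_201205100740_1745_m_000000", "/192.168.0.1/my.totally.madeup.host.com");
  }
}
{code}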



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5097) Job.addArchiveToClassPath is ignored when running job with LocalJobRunner

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5097:

Labels: BB2015-05-TBR  (was: )

> Job.addArchiveToClassPath is ignored when running job with LocalJobRunner
> -
>
> Key: MAPREDUCE-5097
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5097
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Alex Baranau
>Assignee: Alex Baranau
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5097-ugly-test.patch, MAPREDUCE-5097.patch
>
>
> Using external dependency jar in mr job. Adding it to the job classpath via 
> Job.addArchiveToClassPath(...) doesn't work when running with LocalJobRunner 
> (i.e. in unit test). This makes it harder to unit-test such jobs (with 
> third-party runtime dependencies).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4136) Hadoop streaming might succeed even through reducer fails

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-4136:

Labels: BB2015-05-TBR  (was: )

> Hadoop streaming might succeed even through reducer fails
> -
>
> Key: MAPREDUCE-4136
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4136
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/streaming
>Affects Versions: 0.20.205.0
>Reporter: Wouter de Bie
>  Labels: BB2015-05-TBR
> Attachments: mapreduce-4136.patch
>
>
> Hadoop streaming can succeed even though the reducer has failed. This 
> happens when Hadoop calls {{PipeReducer.close()}} but in the meantime the 
> reducer has failed and the process has died. When {{clientOut_.flush()}} 
> throws an {{IOException}} in {{PipeMapRed.mapRedFinish()}}, this exception is 
> caught but only logged. The exit status of the child process is never checked 
> and the task is marked as successful.
> I've attached a patch that seems to fix it for us.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-3882) fix some compile warnings of hadoop-mapreduce-examples

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-3882:

Labels: BB2015-05-TBR  (was: )

> fix some compile warnings of hadoop-mapreduce-examples
> --
>
> Key: MAPREDUCE-3882
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3882
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
> Environment: Windows 7
>Reporter: Changming Sun
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: mapreduce-3882.patch
>
>   Original Estimate: 2m
>  Remaining Estimate: 2m
>
> fix some compile warnings of hadoop-mapreduce-examples



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4482) Backport MR sort plugin(MAPREDUCE-2454) to Hadoop 1.2

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-4482:

Labels: BB2015-05-TBR  (was: )

> Backport MR sort plugin(MAPREDUCE-2454) to Hadoop 1.2
> -
>
> Key: MAPREDUCE-4482
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4482
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: mrv1
>Affects Versions: 1.2.0
>Reporter: Mariappan Asokan
>Assignee: Mariappan Asokan
>  Labels: BB2015-05-TBR
> Attachments: HadoopSortPlugin.pdf, 
> mapreduce-4482-release-1.1.0-rc4.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4506) EofException / 'connection reset by peer' while copying map output

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-4506:

Labels: BB2015-05-TBR  (was: )

> EofException / 'connection reset by peer' while copying map output 
> ---
>
> Key: MAPREDUCE-4506
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4506
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 1.0.3
> Environment: Ubuntu Linux 12.04 LTS, 64-bit, Java 6 update 33
>Reporter: Piotr Kołaczkowski
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: RamManager.patch, ReduceTask.patch
>
>
> When running complex mapreduce jobs with many mappers and reducers (e.g. 8 
> mappers, 8 reducers on a 8 core machine), sometimes the following exceptions 
> pop up in the logs during the shuffle phase:
> {noformat}
> WARN [570516323@qtp-2060060479-164] 2012-07-19 02:50:21,229 TaskTracker.java 
> (line 3894) getMapOutput(attempt_201207161621_0217_m_71_0,0) failed :
> org.mortbay.jetty.EofException
> at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:787)
> at 
> org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:568)
> at 
> org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:1005)
> at 
> org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:648)
> at 
> org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:579)
> at 
> org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3872)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at 
> org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
> at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1166)
> at 
> org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:835)
> at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
> at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
> at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
> at 
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
> at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> at org.mortbay.jetty.Server.handle(Server.java:326)
> at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
> at 
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:923)
> at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:547)
> at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
> at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
> at 
> org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
> at 
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> Caused by: java.io.IOException: Connection reset by peer
> at sun.nio.ch.FileDispatcher.write0(Native Method)
> at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
> at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:72)
> at sun.nio.ch.IOUtil.write(IOUtil.java:43)
> at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
> at org.mortbay.io.nio.ChannelEndPoint.flush(ChannelEndPoint.java:169)
> at 
> org.mortbay.io.nio.SelectChannelEndPoint.flush(SelectChannelEndPoint.java:221)
> at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:721)
> {noformat}
> At first this looks like a network problem, but it turns out that Hadoop's 
> shuffleInMemory sometimes deliberately closes map-output-copy connections, 
> only to reopen them a few milliseconds later, because free memory is 
> temporarily unavailable. The sending side does not expect this, so an 
> exception is thrown. This also wastes resources on the sender side, which 
> does more work than necessary serving the repeated requests.
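> For illustration only (this sketch is not part of the report or the attached
> patches): one way to avoid dropping in-flight connections is to reserve the
> shuffle memory before the copier opens the HTTP connection, blocking until it
> is free. MemoryBudget and the sizes below are hypothetical, not Hadoop APIs.
> {code:java}
> import java.util.concurrent.Semaphore;
>
> /** Reserve shuffle memory up front so a transfer is never aborted mid-flight. */
> public class MemoryBudget {
>   private final Semaphore bytes;
>
>   public MemoryBudget(int maxBytes) {
>     this.bytes = new Semaphore(maxBytes);
>   }
>
>   /** Block until enough memory is available, then reserve it. */
>   public void reserve(int size) throws InterruptedException {
>     bytes.acquire(size);
>   }
>
>   /** Return the memory once the map output has been consumed. */
>   public void release(int size) {
>     bytes.release(size);
>   }
>
>   public static void main(String[] args) throws InterruptedException {
>     MemoryBudget budget = new MemoryBudget(64 * 1024 * 1024);
>     int outputSize = 8 * 1024 * 1024;
>     budget.reserve(outputSize);  // wait here instead of closing a connection
>     try {
>       // open the map-output connection and read outputSize bytes ...
>     } finally {
>       budget.release(outputSize);
>     }
>   }
> }
> {code}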



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4956) The Additional JH Info Should Be Exposed

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-4956:

Labels: BB2015-05-TBR  (was: )

> The Additional JH Info Should Be Exposed
> 
>
> Key: MAPREDUCE-4956
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4956
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-4956_1.patch, MAPREDUCE-4956_2.patch, 
> MAPREDUCE-4956_3.patch
>
>
> In MAPREDUCE-4838, additional info was added to the job history. This info 
> would be useful to expose, at least via the UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-3876) vertica query, sql command not properly ended

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-3876:

Labels: BB2015-05-TBR hadoop newbie patch  (was: hadoop newbie patch)

> vertica query, sql command not properly ended
> -
>
> Key: MAPREDUCE-3876
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3876
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.0.0
> Environment: Red Hat 5.5
> Oracle 11
>Reporter: Joseph Doss
>  Labels: BB2015-05-TBR, hadoop, newbie, patch
> Attachments: HADOOP-oracleDriver-src.patch
>
>
> When running a test script, a Java IOException is thrown.
> This test works on hadoop-0.20.0 but not on hadoop-1.0.0.
> Fri Feb 17 11:36:40 EST 2012
> Running processes with name syncGL.sh: 0
> LIB_JARS: 
> /home/hadoop/verticasync/lib/vertica_4.1.14_jdk_5.jar,/home/hadoop/verticasync/lib/mail.jar,/home/hadoop/verticasync/lib/jdbc14.jar
> VERTICA_SYNC_JAR: /home/hadoop/verticasync/lib/vertica-sync.jar
> PROPERTIES_FILE: 
> /home/hadoop/verticasync/config/ssp-vertica-sync-gl.properties
> Starting Vertica data sync - GL - process
> Warning: $HADOOP_HOME is deprecated.
> 12/02/17 11:36:43 INFO mapred.JobClient: Running job: job_201202171122_0001
> 12/02/17 11:36:44 INFO mapred.JobClient:  map 0% reduce 0%
> 12/02/17 11:36:56 INFO mapred.JobClient: Task Id : 
> attempt_201202171122_0001_m_00_0, Status : FAILED
> java.io.IOException: ORA-00933: SQL command not properly ended
>   at 
> org.apache.hadoop.mapred.lib.db.DBInputFormat.getRecordReader(DBInputFormat.java:289)
>   at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:197)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
>   at org.apache.hadoop.mapred.Child.main(Child.java:249)
> 12/02/17 11:36:57 INFO mapred.JobClient: Task Id : 
> attempt_201202171122_0001_m_01_0, Status : FAILED
> java.io.IOException: ORA-00933: SQL command not properly ended
>   at 
> org.apache.hadoop.mapred.lib.db.DBInputFormat.getRecordReader(DBInputFormat.java:289)
>   at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:197)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
>   at org.apache.hadoop.mapred.Child.main(Child.java:249)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6342) Make POM project names consistent

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-6342:

Labels: BB2015-05-TBR  (was: )

> Make POM project names consistent
> -
>
> Key: MAPREDUCE-6342
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6342
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: build
>Reporter: Rohith
>Assignee: Rohith
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-6342.patch
>
>
> This tracks the MapReduce-side changes for making POM project names consistent.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5799) add default value of MR_AM_ADMIN_USER_ENV

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5799:

Labels: BB2015-05-TBR  (was: )

> add default value of MR_AM_ADMIN_USER_ENV
> -
>
> Key: MAPREDUCE-5799
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5799
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Liyin Liang
>Assignee: Rajesh Kartha
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5799-1.diff, MAPREDUCE-5799.002.patch, 
> MAPREDUCE-5799.diff
>
>
> Submit a 1 map + 1 reduce sleep job with the following config:
> {code}
> <property>
>   <name>mapreduce.map.output.compress</name>
>   <value>true</value>
> </property>
> <property>
>   <name>mapreduce.map.output.compress.codec</name>
>   <value>org.apache.hadoop.io.compress.SnappyCodec</value>
> </property>
> <property>
>   <name>mapreduce.job.ubertask.enable</name>
>   <value>true</value>
> </property>
> {code}
> The LinuxContainerExecutor is enabled on the NodeManager.
> The job fails with the following error:
> {code}
> 2014-03-18 21:28:20,153 FATAL [uber-SubtaskRunner] 
> org.apache.hadoop.mapred.LocalContainerLauncher: Error running local 
> (uberized) 'child' : java.lang.UnsatisfiedLinkError: 
> org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
> at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native 
> Method)
> at 
> org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
> at 
> org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:132)
> at 
> org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148)
> at 
> org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:163)
> at org.apache.hadoop.mapred.IFile$Writer.<init>(IFile.java:115)
> at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1583)
> at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1462)
> at 
> org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:700)
> at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1990)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:774)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
> at 
> org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.runSubtask(LocalContainerLauncher.java:317)
> at 
> org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.run(LocalContainerLauncher.java:232)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> When creating a ContainerLaunchContext for a task in 
> TaskAttemptImpl.createCommonContainerLaunchContext(), the 
> DEFAULT_MAPRED_ADMIN_USER_ENV, which is 
> "LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native", is added to the environment. 
> But when creating a ContainerLaunchContext for the MR AppMaster in 
> YARNRunner.createApplicationSubmissionContext(), no default environment is 
> set, so the uber-mode job fails to find the native library.
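> For illustration, and assuming the standard property name
> yarn.app.mapreduce.am.admin.user.env, a client-side workaround is to give the
> AM the same native-library path explicitly when submitting the job (this is a
> sketch, not the attached patch):
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.mapreduce.Job;
>
> public class UberSnappyWorkaround {
>   public static void main(String[] args) throws Exception {
>     Configuration conf = new Configuration();
>     conf.setBoolean("mapreduce.job.ubertask.enable", true);
>     conf.setBoolean("mapreduce.map.output.compress", true);
>     conf.set("mapreduce.map.output.compress.codec",
>         "org.apache.hadoop.io.compress.SnappyCodec");
>     // Give the MR AM the same native-lib path that tasks get by default.
>     conf.set("yarn.app.mapreduce.am.admin.user.env",
>         "LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native");
>     Job job = Job.getInstance(conf, "uber-snappy-sleep");
>     // ... set mapper/reducer/input/output as usual, then job.submit();
>   }
> }
> {code}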



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6350) JobHistory doesn't support fully-functional search

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-6350:

Labels: BB2015-05-TBR  (was: )

> JobHistory doesn't support fully-functional search
> --
>
> Key: MAPREDUCE-6350
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6350
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Reporter: Siqi Li
>Assignee: Siqi Li
>Priority: Critical
>  Labels: BB2015-05-TBR
> Attachments: YARN-1614.v1.patch, YARN-1614.v2.patch
>
>
> The job history server only shows the first 50 characters of job names in 
> the web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6284) Add a 'task attempt state' to MapReduce Application Master REST API

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-6284:

Labels: BB2015-05-TBR  (was: )

> Add a 'task attempt state' to MapReduce Application Master REST API
> ---
>
> Key: MAPREDUCE-6284
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6284
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Ryu Kobayashi
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-6284.1.patch, MAPREDUCE-6284.1.patch, 
> MAPREDUCE-6284.2.patch, MAPREDUCE-6284.3.patch, MAPREDUCE-6284.3.patch
>
>
> It would be useful to expose a 'task attempt state' endpoint, similar to the 
> existing 'App state' REST API:
> GET http://<proxy http address:port>/proxy/<application_id>/ws/v1/mapreduce/jobs/<job_id>/tasks/<task_id>/attempts/<attempt_id>/state
> PUT http://<proxy http address:port>/proxy/<application_id>/ws/v1/mapreduce/jobs/<job_id>/tasks/<task_id>/attempts/<attempt_id>/state
>  
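> As an illustration of how a client might call the proposed endpoint (the URL
> layout above is the proposal, not an existing API; host and ids below are
> made up), a minimal GET using plain java.net:
> {code:java}
> import java.io.BufferedReader;
> import java.io.InputStreamReader;
> import java.net.HttpURLConnection;
> import java.net.URL;
>
> public class TaskAttemptStateClient {
>   public static void main(String[] args) throws Exception {
>     // Hypothetical values; the endpoint exists only once this JIRA is done.
>     String url = "http://proxyhost:8088/proxy/application_1423000000000_0001"
>         + "/ws/v1/mapreduce/jobs/job_1423000000000_0001"
>         + "/tasks/task_1423000000000_0001_m_000000"
>         + "/attempts/attempt_1423000000000_0001_m_000000_0/state";
>     HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
>     conn.setRequestMethod("GET");
>     conn.setRequestProperty("Accept", "application/json");
>     try (BufferedReader in = new BufferedReader(
>         new InputStreamReader(conn.getInputStream()))) {
>       System.out.println(in.readLine());   // e.g. {"state":"RUNNING"}
>     }
>   }
> }
> {code}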



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6338) MR AppMaster does not honor ephemeral port range

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-6338:

Labels: BB2015-05-TBR  (was: )

> MR AppMaster does not honor ephemeral port range
> 
>
> Key: MAPREDUCE-6338
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6338
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mr-am, mrv2
>Affects Versions: 2.6.0
>Reporter: Frank Nguyen
>Assignee: Frank Nguyen
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-6338.002.patch
>
>
> The MR AppMaster should only use port ranges defined in the 
> yarn.app.mapreduce.am.job.client.port-range property.  On initial startup of 
> the MRAppMaster, it does use the port range defined in the property.  
> However, it also opens up a listener on a random ephemeral port.  This is not 
> the Jetty listener.  It is another listener opened by the MRAppMaster via 
> another thread and is recognized by the RM.  Other nodes will try to 
> communicate with it via that random port. With a firewall in place, the MR 
> job fails because the random port is not open. This has forced users to open 
> the entire OS ephemeral port range just to get MR jobs to run.
> This is related to MAPREDUCE-4079.
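> For reference, a minimal sketch of how the port range is configured today
> (the range below is an example); the report is that a second listener ignores
> it:
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.mapreduce.Job;
>
> public class PortRangeExample {
>   public static void main(String[] args) throws Exception {
>     Configuration conf = new Configuration();
>     // Ports the MRAppMaster's client service is supposed to bind within.
>     conf.set("yarn.app.mapreduce.am.job.client.port-range", "50100-50200");
>     Job job = Job.getInstance(conf, "port-range-demo");
>     // A second listener still binds a random ephemeral port, so a firewall
>     // that only allows 50100-50200 breaks the job.
>   }
> }
> {code}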



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6332) Add more required API's to MergeManager interface

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-6332:

Labels: BB2015-05-TBR  (was: )

> Add more required API's to MergeManager interface 
> --
>
> Key: MAPREDUCE-6332
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6332
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Affects Versions: 2.5.0, 2.6.0, 2.7.0
>Reporter: Rohith
>Assignee: Rohith
>  Labels: BB2015-05-TBR
> Attachments: 0001-MAPREDUCE-6332.patch, 0002-MAPREDUCE-6332.patch
>
>
> MR lets the user plug in a custom ShuffleConsumerPlugin via 
> *mapreduce.job.reduce.shuffle.consumer.plugin.class*. A user who supplies 
> such a plugin is often also interested in implementing their own 
> MergeManagerImpl.
> Currently, however, the user is forced to use the MR-provided 
> MergeManagerImpl rather than a custom one when using a custom shuffle 
> consumer plugin class. There should be well-defined APIs in MergeManager 
> that any implementation can use, so that a custom implementation takes 
> little extra effort.
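> For context, a minimal sketch of how the existing plugin point is wired up;
> com.example.shuffle.CustomShuffle is a hypothetical class that would also
> want to supply its own merge manager, which is what this JIRA asks to make
> practical:
> {code:java}
> import org.apache.hadoop.mapred.JobConf;
>
> public class PluginWiring {
>   public static void main(String[] args) {
>     JobConf conf = new JobConf();
>     // Hypothetical class; it must implement
>     // org.apache.hadoop.mapred.ShuffleConsumerPlugin.
>     conf.set("mapreduce.job.reduce.shuffle.consumer.plugin.class",
>         "com.example.shuffle.CustomShuffle");
>     // Today such a plugin is effectively forced to reuse the MR-provided
>     // MergeManagerImpl; richer MergeManager APIs would let it bring its
>     // own merge implementation as well.
>   }
> }
> {code}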



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5465) Container killed before hprof dumps profile.out

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5465:

Labels: BB2015-05-TBR  (was: )

> Container killed before hprof dumps profile.out
> ---
>
> Key: MAPREDUCE-5465
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5465
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mr-am, mrv2
>Reporter: Radim Kolar
>Assignee: Ming Ma
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5465-2.patch, MAPREDUCE-5465-3.patch, 
> MAPREDUCE-5465-4.patch, MAPREDUCE-5465-5.patch, MAPREDUCE-5465-6.patch, 
> MAPREDUCE-5465-7.patch, MAPREDUCE-5465-8.patch, MAPREDUCE-5465-9.patch, 
> MAPREDUCE-5465.patch
>
>
> If profiling is enabled for a mapper or reducer, hprof dumps profile.out at 
> process exit. The dump happens after the task has signaled to the AM that 
> its work is finished.
> The AM kills the container of a finished task without waiting for hprof to 
> finish its dump. If hprof is producing larger output (for example with 
> depth=4, while depth=3 works), it cannot finish the dump before being 
> killed, which makes the entire dump unusable because the CPU and heap stats 
> are missing.
> There needs to be a better delay before the container is killed when 
> profiling is enabled.
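> For context, a sketch of how profiling is typically switched on from the
> client (the values are examples); the race described above happens after the
> profiled task reports completion:
> {code:java}
> import org.apache.hadoop.mapred.JobConf;
>
> public class ProfilingSetup {
>   public static void main(String[] args) {
>     JobConf conf = new JobConf();
>     conf.setProfileEnabled(true);
>     // hprof writes profile.out at JVM exit, after the task reports success.
>     conf.setProfileParams("-agentlib:hprof=cpu=samples,heap=sites,depth=4,"
>         + "force=n,thread=y,verbose=n,file=%s");
>     conf.setProfileTaskRange(true, "0-1");  // profile the first two map tasks
>     // With deep stacks (depth=4) the dump can outlive the grace period
>     // before the AM kills the container, which is the problem reported here.
>   }
> }
> {code}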



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5733) Define and use a constant for property "textinputformat.record.delimiter"

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-5733:

Labels: BB2015-05-TBR  (was: )

> Define and use a constant for property "textinputformat.record.delimiter"
> -
>
> Key: MAPREDUCE-5733
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5733
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Gelesh
>Assignee: Gelesh
>Priority: Trivial
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5733.patch, MAPREDUCE-5733_2.patch
>
>   Original Estimate: 10m
>  Remaining Estimate: 10m
>
> (Configuration) conf.set("textinputformat.record.delimiter", "myDelimiter") 
> is prone to typos. Let's define the property name as a static String 
> constant in some class to minimise such errors. This would also let IDEs 
> such as Eclipse suggest the constant.
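> A minimal sketch of the proposed change (the constant's name and host class
> are placeholders; the patch decides where it finally lives):
> {code:java}
> import org.apache.hadoop.conf.Configuration;
>
> public class DelimiterConstantDemo {
>   /** Proposed constant; final name/location is up to the patch. */
>   public static final String RECORD_DELIMITER = "textinputformat.record.delimiter";
>
>   public static void main(String[] args) {
>     Configuration conf = new Configuration();
>     // No chance of a typo in the key, and IDEs can auto-complete it.
>     conf.set(RECORD_DELIMITER, "myDelimiter");
>     System.out.println(conf.get(RECORD_DELIMITER));
>   }
> }
> {code}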



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6316) Task Attempt List entries should link to the task overview

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-6316:

Labels: BB2015-05-TBR  (was: )

> Task Attempt List entries should link to the task overview
> --
>
> Key: MAPREDUCE-6316
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6316
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Siqi Li
>Assignee: Siqi Li
>  Labels: BB2015-05-TBR
> Attachments: AM attempt page.png, AM task page.png, All Attempts 
> page.png, MAPREDUCE-6316.v1.patch, MAPREDUCE-6316.v2.patch, 
> MAPREDUCE-6316.v3.patch, Task Overview page.png
>
>
> The typical workflow is to click on the list of failed attempts, and then 
> look at the counters or at the list of attempts of just one task. If the 
> task-id portion of each task attempt id linked back to the task overview, we 
> would not have to go through the list of tasks to search for that task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6305) AM/Task log page should be able to link back to the job

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-6305:

Labels: BB2015-05-TBR  (was: )

> AM/Task log page should be able to link back to the job
> ---
>
> Key: MAPREDUCE-6305
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6305
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Siqi Li
>Assignee: Siqi Li
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-6305.v1.patch, MAPREDUCE-6305.v2.patch, 
> MAPREDUCE-6305.v3.patch, MAPREDUCE-6305.v4.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6336) Enable v2 FileOutputCommitter by default

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-6336:

Labels: BB2015-05-TBR  (was: )

> Enable v2 FileOutputCommitter by default
> 
>
> Key: MAPREDUCE-6336
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6336
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Affects Versions: 2.7.0
>Reporter: Gera Shegalov
>Assignee: Siqi Li
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-6336.v1.patch
>
>
> This JIRA proposes enabling the new FileOutputCommitter behavior from 
> MAPREDUCE-4815 by default in trunk, and potentially in branch-2.
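> For reference, a hedged example of opting in to the MAPREDUCE-4815 behavior
> per job today via the algorithm-version property (this JIRA proposes flipping
> the default):
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.mapreduce.Job;
>
> public class CommitterV2OptIn {
>   public static void main(String[] args) throws Exception {
>     Configuration conf = new Configuration();
>     // Property introduced by MAPREDUCE-4815; 1 is the current default.
>     conf.setInt("mapreduce.fileoutputcommitter.algorithm.version", 2);
>     Job job = Job.getInstance(conf, "committer-v2-demo");
>     // ... configure and submit as usual.
>   }
> }
> {code}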



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6269) improve JobConf to add option to not share Credentials between jobs.

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-6269:

Labels: BB2015-05-TBR  (was: )

> improve JobConf to add option to not share Credentials between jobs.
> 
>
> Key: MAPREDUCE-6269
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6269
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client
>Reporter: zhihai xu
>Assignee: zhihai xu
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-6269.000.patch
>
>
> Improve JobConf by adding a constructor that avoids sharing Credentials 
> between jobs. By default the Credentials would still be shared, to keep 
> backward compatibility; a new constructor with an extra parameter would 
> decide whether to share them. Some issues reported in Cascading are due to 
> corrupted credentials, see
> https://github.com/Cascading/cascading/commit/45b33bb864172486ac43782a4d13329312d01c0e
> If we add this support in JobConf, it will benefit all job clients.
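> To make the sharing concrete, a small sketch (the opt-out constructor in the
> comment is the proposal, not an existing API):
> {code:java}
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapred.JobConf;
>
> public class CredentialsSharingDemo {
>   public static void main(String[] args) {
>     JobConf parent = new JobConf();
>     parent.getCredentials().addSecretKey(new Text("app.secret"),
>         "s3cr3t".getBytes());
>
>     // Today: the copy constructor aliases the same Credentials object, so a
>     // job that corrupts the credentials corrupts them for every other job.
>     JobConf child = new JobConf(parent);
>     System.out.println(
>         child.getCredentials() == parent.getCredentials());  // true
>
>     // Proposed (hypothetical) opt-out:
>     // JobConf isolated = new JobConf(parent, /* shareCredentials = */ false);
>   }
> }
> {code}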



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6241) Native compilation fails for Checksum.cc due to an incompatibility of assembler register constraint for PowerPC

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-6241:

Labels: BB2015-05-TBR features  (was: features)

> Native compilation fails for Checksum.cc due to an  incompatibility of 
> assembler register constraint for PowerPC
> 
>
> Key: MAPREDUCE-6241
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6241
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.0.0, 2.6.0
> Environment: Debian/Jessie, kernel 3.18.5,  ppc64 GNU/Linux
> gcc (Debian 4.9.1-19)
> protobuf 2.6.1
> OpenJDK Runtime Environment (IcedTea 2.5.3) (7u71-2.5.3-2)
> OpenJDK Zero VM (build 24.65-b04, interpreted mode)
> source was cloned (and updated) from Apache-Hadoop's git repository 
>Reporter: Stephan Drescher
>Assignee: Binglin Chang
>Priority: Minor
>  Labels: BB2015-05-TBR, features
> Attachments: MAPREDUCE-6241.001.patch, MAPREDUCE-6241.002.patch
>
>
> Issue when using assembly code for performance optimization on the PowerPC 
> platform (compiled for 32-bit):
> mvn compile -Pnative -DskipTests
> [exec] /usr/bin/c++   -Dnativetask_EXPORTS -m32  -DSIMPLE_MEMCPY 
> -fno-strict-aliasing -Wall -Wno-sign-compare -g -O2 -DNDEBUG -fPIC 
> -I/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/target/native/javah
>  
> -I/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src
>  
> -I/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/util
>  
> -I/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/lib
>  
> -I/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/test
>  
> -I/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src
>  
> -I/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/target/native
>  -I/home/hadoop/Java/java7/include -I/home/hadoop/Java/java7/include/linux 
> -isystem 
> /home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/gtest/include
> -o CMakeFiles/nativetask.dir/main/native/src/util/Checksum.cc.o -c 
> /home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/util/Checksum.cc
>  [exec] CMakeFiles/nativetask.dir/build.make:744: recipe for target 
> 'CMakeFiles/nativetask.dir/main/native/src/util/Checksum.cc.o' failed
>  [exec] make[2]: Leaving directory 
> '/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/target/native'
>  [exec] CMakeFiles/Makefile2:95: recipe for target 
> 'CMakeFiles/nativetask.dir/all' failed
>  [exec] make[1]: Leaving directory 
> '/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/target/native'
>  [exec] Makefile:76: recipe for target 'all' failed
>  [exec] 
> /home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/util/Checksum.cc:
>  In function ‘void NativeTask::init_cpu_support_flag()’:
> /home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/util/Checksum.cc:611:14:
>  error: impossible register constraint in ‘asm’
> -->
> "popl %%ebx" : "=a" (eax), [ebx] "=r"(ebx), "=c"(ecx), "=d"(edx) : "a" 
> (eax_in) : "cc");
> <--



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-6246:

Labels: BB2015-05-TBR DB2 mapreduce  (was: DB2 mapreduce)

> DBOutputFormat.java appending extra semicolon to query which is incompatible 
> with DB2
> -
>
> Key: MAPREDUCE-6246
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv1, mrv2
>Affects Versions: 2.4.1
> Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x
> Platform: xSeries, pSeries
> Browser: Firefox, IE
> Security Settings: No Security, Flat file, LDAP, PAM
> File System: HDFS, GPFS FPO
>Reporter: ramtin
>Assignee: ramtin
>  Labels: BB2015-05-TBR, DB2, mapreduce
> Attachments: MAPREDUCE-6246.002.patch, MAPREDUCE-6246.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> DBOutputFormat is used for writing the output of MapReduce jobs to a 
> database. When used with the DB2 JDBC driver it fails with the following 
> error:
> com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, 
> SQLSTATE=42601, SQLERRMC=;;,COUNT) VALUES (?,?);END-OF-STATEMENT, 
> DRIVER=4.16.53 at com.ibm.db2.jcc.am.fd.a(fd.java:739) at 
> com.ibm.db2.jcc.am.fd.a(fd.java:60) at com.ibm.db2.jcc.am.fd.a(fd.java:127)
> The constructQuery method in DBOutputFormat generates the "INSERT INTO" 
> statement with a semicolon (";") at the end.
> The semicolon is the ANSI SQL-92 statement terminator, but that feature is 
> disabled (OFF) by default in IBM DB2. It can be turned ON with the -t option 
> (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2).
> However, some products are already built on top of the default setting 
> (OFF), so turning this feature ON makes them error-prone.
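> One possible client-side workaround while a fix is pending (a sketch, not the
> attached patch): subclass DBOutputFormat and strip the trailing semicolon
> from the generated statement.
> {code:java}
> import org.apache.hadoop.io.NullWritable;
> import org.apache.hadoop.mapreduce.lib.db.DBOutputFormat;
> import org.apache.hadoop.mapreduce.lib.db.DBWritable;
>
> /** Same INSERT statement as the parent class, minus the trailing ';'. */
> public class Db2OutputFormat<K extends DBWritable>
>     extends DBOutputFormat<K, NullWritable> {
>
>   @Override
>   public String constructQuery(String table, String[] fieldNames) {
>     String query = super.constructQuery(table, fieldNames);
>     return query.endsWith(";")
>         ? query.substring(0, query.length() - 1)
>         : query;
>   }
> }
> {code}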



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4683) We need to fix our build to create/distribute hadoop-mapreduce-client-core-tests.jar

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-4683:

Labels: BB2015-05-TBR  (was: )

> We need to fix our build to create/distribute 
> hadoop-mapreduce-client-core-tests.jar
> 
>
> Key: MAPREDUCE-4683
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4683
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: build
>Reporter: Arun C Murthy
>Assignee: Akira AJISAKA
>Priority: Critical
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-4683.patch
>
>
> We need to fix our build to create/distribute 
> hadoop-mapreduce-client-core-tests.jar; this is needed before MAPREDUCE-4253.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6068) Illegal progress value warnings in map tasks

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-6068:

Labels: BB2015-05-TBR  (was: )

> Illegal progress value warnings in map tasks
> 
>
> Key: MAPREDUCE-6068
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6068
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2, task
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Binglin Chang
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-6068.002.patch, MAPREDUCE-6068.v1.patch
>
>
> When running a terasort on latest trunk, I see the following in my task logs:
> {code}
> 2014-09-02 17:42:28,437 INFO [main] org.apache.hadoop.mapred.MapTask: Map 
> output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
> 2014-09-02 17:42:42,238 WARN [main] org.apache.hadoop.util.Progress: Illegal 
> progress value found, progress is larger than 1. Progress will be changed to 1
> 2014-09-02 17:42:42,238 WARN [main] org.apache.hadoop.util.Progress: Illegal 
> progress value found, progress is larger than 1. Progress will be changed to 1
> 2014-09-02 17:42:42,241 INFO [main] org.apache.hadoop.mapred.MapTask: 
> Starting flush of map output
> {code}
> We should eliminate these warnings.
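> The warning appears to come from the map-side progress calculation briefly
> exceeding 1.0; a hedged sketch of the kind of clamping that would silence it
> (illustrative only, not the attached patch):
> {code:java}
> public class ProgressClamp {
>   /** Clamp a possibly-overshooting progress fraction into [0, 1]. */
>   static float clamp(float progress) {
>     if (Float.isNaN(progress) || progress < 0.0f) {
>       return 0.0f;
>     }
>     return Math.min(progress, 1.0f);
>   }
>
>   public static void main(String[] args) {
>     // Rounding in the sort phase can briefly report e.g. 1.0000001.
>     System.out.println(clamp(1.0000001f));  // prints 1.0
>   }
> }
> {code}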



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6315) Implement retrieval of logs for crashed MR-AM via jhist in the staging directory

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-6315:

Labels: BB2015-05-TBR  (was: )

> Implement retrieval of logs for crashed MR-AM via jhist in the staging 
> directory
> 
>
> Key: MAPREDUCE-6315
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6315
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client, mr-am
>Affects Versions: 2.7.0
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
>Priority: Critical
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-6315.001.patch
>
>
> When all AM attempts crash, there is no record of them in the JHS, so there 
> is no easy way to get the logs. This JIRA automates the procedure by using 
> the jhist file in the staging directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6298) Job#toString throws an exception when not in state RUNNING

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-6298:

Labels: BB2015-05-TBR  (was: )

> Job#toString throws an exception when not in state RUNNING
> --
>
> Key: MAPREDUCE-6298
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6298
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Lars Francke
>Assignee: Lars Francke
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-6298.1.patch
>
>
> Job#toString calls {{ensureState(JobState.RUNNING);}} as the very first 
> thing. That method throws an exception when the job is not running, which is 
> not nice.
> One thing this breaks is using Job in the Scala (e.g. Spark) REPL, since the 
> REPL calls toString after every invocation, and that fails every time.
> I'll attach a patch that checks the state: if it's RUNNING it prints the 
> original message, otherwise it prints something else.
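> A standalone sketch that mirrors the guard the patch describes (illustrative,
> not the actual patch; the real change lives inside Job#toString):
> {code:java}
> public class SafeToStringDemo {
>   enum JobState { DEFINE, RUNNING }
>
>   static String describe(JobState state, String jobId) {
>     // Only the RUNNING branch prints the full status line; other states
>     // return a simple summary instead of throwing.
>     if (state != JobState.RUNNING) {
>       return "Job " + jobId + " (state: " + state + ", not yet running)";
>     }
>     return "Job " + jobId + " RUNNING ...";  // full status in the real Job
>   }
>
>   public static void main(String[] args) {
>     System.out.println(describe(JobState.DEFINE, "job_local_0001"));
>   }
> }
> {code}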



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6320) Configuration of retrieved Job via Cluster is not properly set-up

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated MAPREDUCE-6320:

Labels: BB2015-05-TBR  (was: )

> Configuration of retrieved Job via Cluster is not properly set-up
> -
>
> Key: MAPREDUCE-6320
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6320
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Jens Rabe
>Assignee: Jens Rabe
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-6320.001.patch, MAPREDUCE-6320.002.patch, 
> MAPREDUCE-6320.003.patch
>
>
> When getting a Job via the Cluster API, it is not correctly configured.
> To reproduce this:
> # Submit a MR job, and set some arbitrary parameter to its configuration
> {code:java}
> job.getConfiguration().set("foo", "bar");
> job.setJobName("foo-bug-demo");
> {code}
> # Get the job in a client:
> {code:java}
> final Cluster c = new Cluster(conf);
> final JobStatus[] statuses = c.getAllJobStatuses();
> final JobStatus s = ... // get the status for the job named foo-bug-demo
> final Job j = c.getJob(s.getJobId());
> final Configuration conf = job.getConfiguration();
> {code}
> # Get its "foo" entry
> {code:java}
> final String s = conf.get("foo");
> {code}
> # Expected: s is "bar"; But: s is null.
> The reason is that the job's configuration is stored on HDFS (the 
> Configuration has a resource with an *hdfs://* URL), and in *loadResource* 
> that URL is changed to a path on the local file system 
> (hdfs://host.domain:port/tmp/hadoop-yarn/... becomes /tmp/hadoop-yarn/...), 
> which does not exist, so the configuration is not populated.
> The bug happens in the *Cluster* class, where *JobConfs* are created from 
> *status.getJobFile()*. A quick fix would be to copy this job file to a 
> temporary file in the local file system and populate the JobConf from this 
> file.
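> To make the suggested quick fix concrete, a hedged sketch that copies the job
> file out of HDFS and populates the configuration from the local copy (class
> and method names are illustrative):
> {code:java}
> import java.io.File;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public class JobConfFromStaging {
>   public static Configuration loadJobConf(Configuration clusterConf,
>       String jobFileOnHdfs) throws Exception {
>     File local = File.createTempFile("job", ".xml");
>     local.deleteOnExit();
>     FileSystem fs = FileSystem.get(clusterConf);
>     // Pull job.xml out of the staging directory onto local disk first...
>     fs.copyToLocalFile(new Path(jobFileOnHdfs),
>         new Path(local.getAbsolutePath()));
>     Configuration jobConf = new Configuration(false);
>     // ...so loadResource reads a path that actually exists locally.
>     jobConf.addResource(new Path(local.getAbsolutePath()));
>     return jobConf;
>   }
> }
> {code}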



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

