[jira] [Updated] (MAPREDUCE-6027) mr jobs with relative paths can fail
[ https://issues.apache.org/jira/browse/MAPREDUCE-6027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6027:
Labels: BB2015-05-TBR (was: )

> mr jobs with relative paths can fail
> ------------------------------------
>
> Key: MAPREDUCE-6027
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6027
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: job submission
> Reporter: Wing Yew Poon
> Assignee: Wing Yew Poon
> Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-6027.patch
>
> I built hadoop from branch-2 and tried to run terasort as follows:
> {noformat}
> wypoon$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-SNAPSHOT.jar terasort sort-input sort-output
> 14/08/07 08:57:55 INFO terasort.TeraSort: starting
> 2014-08-07 08:57:56.229 java[36572:1903] Unable to load realm info from SCDynamicStore
> 14/08/07 08:57:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 14/08/07 08:57:57 INFO input.FileInputFormat: Total input paths to process : 2
> Spent 156ms computing base-splits.
> Spent 2ms computing TeraScheduler splits.
> Computing input splits took 159ms
> Sampling 2 splits of 2
> Making 1 from 10 sampled records
> Computing parititions took 626ms
> Spent 789ms computing partitions.
> 14/08/07 08:57:57 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
> 14/08/07 08:57:58 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/wypoon/.staging/job_1407426900134_0001
> java.lang.IllegalArgumentException: Can not create a Path from an empty URI
>     at org.apache.hadoop.fs.Path.checkPathArg(Path.java:140)
>     at org.apache.hadoop.fs.Path.<init>(Path.java:192)
>     at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
>     at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.checkPermissionOfOther(ClientDistributedCacheManager.java:275)
>     at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.ancestorsHaveExecutePermissions(ClientDistributedCacheManager.java:256)
>     at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.isPublic(ClientDistributedCacheManager.java:243)
>     at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineCacheVisibilities(ClientDistributedCacheManager.java:162)
>     at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:58)
>     at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:265)
>     at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
>     at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389)
>     at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
>     at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>     at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
>     at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
>     at org.apache.hadoop.examples.terasort.TeraSort.run(TeraSort.java:316)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at org.apache.hadoop.examples.terasort.TeraSort.main(TeraSort.java:325)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
>     at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
>     at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> {noformat}
> If I use absolute paths for the input and output directories, the job runs fine.
> This breakage is due to HADOOP-10876.
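Until the regression from HADOOP-10876 is fixed, a client can work around it by absolutizing paths before submission. A minimal plain-Java sketch of that workaround (this is deliberately not Hadoop's `Path` API; the method name and working-directory handling are illustrative):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class QualifyPath {
    // Resolve a possibly-relative path against a known working directory
    // before handing it to the job, so the submitter never sees a bare
    // relative URI. Absolute paths pass through unchanged.
    public static String qualify(String workingDir, String p) {
        Path candidate = Paths.get(p);
        if (candidate.isAbsolute()) {
            return candidate.toString();
        }
        return Paths.get(workingDir).resolve(candidate).normalize().toString();
    }

    public static void main(String[] args) {
        // e.g. the terasort arguments from the report above
        System.out.println(qualify("/user/wypoon", "sort-input"));
        System.out.println(qualify("/user/wypoon", "sort-output"));
    }
}
```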
[jira] [Updated] (MAPREDUCE-5876) SequenceFileRecordReader NPE if close() is called before initialize()
[ https://issues.apache.org/jira/browse/MAPREDUCE-5876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5876:
Labels: BB2015-05-TBR (was: )

> SequenceFileRecordReader NPE if close() is called before initialize()
> ---------------------------------------------------------------------
>
> Key: MAPREDUCE-5876
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5876
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: client
> Affects Versions: 2.3.0, 2.4.0
> Reporter: Reinis Vicups
> Assignee: Tsuyoshi Ozawa
> Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5876.1.patch
>
> org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader extends org.apache.hadoop.mapreduce.RecordReader, which in turn implements java.io.Closeable.
> According to the Java spec, java.io.Closeable#close() has to be idempotent (http://docs.oracle.com/javase/7/docs/api/java/io/Closeable.html), which it is not here.
> An NPE is thrown if the close() method is invoked without previously calling the initialize() method. This happens because the underlying SequenceFile.Reader field "in" is still null.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
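The fix amounts to a null guard in close(). A standalone sketch of the idempotent pattern (class and member names are illustrative stand-ins, not the actual SequenceFileRecordReader code):

```java
public class SafeRecordReader implements java.io.Closeable {
    private AutoCloseable in; // stays null until initialize() runs

    public void initialize() {
        // stand-in for opening the underlying SequenceFile.Reader
        in = () -> { };
    }

    @Override
    public void close() {
        // The null guard makes close() safe to call before initialize()
        // and idempotent on repeated calls, as java.io.Closeable requires.
        if (in != null) {
            try {
                in.close();
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
            in = null;
        }
    }

    public boolean isOpen() { return in != null; }
}
```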
[jira] [Updated] (MAPREDUCE-6003) Resource Estimator suggests huge map output in some cases
[ https://issues.apache.org/jira/browse/MAPREDUCE-6003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6003:
Labels: BB2015-05-TBR (was: )

> Resource Estimator suggests huge map output in some cases
> ---------------------------------------------------------
>
> Key: MAPREDUCE-6003
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6003
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: jobtracker
> Affects Versions: 1.2.1
> Reporter: Chengbing Liu
> Assignee: Chengbing Liu
> Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-6003-branch-1.2.patch
>
> In some cases, ResourceEstimator can return a far too large map output estimation. This happens when the input size is not correctly calculated.
> A typical case is joining two Hive tables (one in HDFS and the other in HBase). The maps that process the HBase table finish first, and they report an input length of 0 due to TableInputFormat. Then, for a map that processes the HDFS table, the estimated output size is very large because of the wrong input size, making it impossible to assign the map task.
> There are two possible solutions to this problem:
> (1) Make the input size correct for each case, e.g. HBase, etc.
> (2) Use another algorithm to estimate the map output, or at least make it closer to reality.
> I prefer the second way, since the first would require every possibility to be taken care of, which is not easy for some inputs such as URIs.
> In my opinion, we could make a second estimation which is independent of the input size:
> estimationB = (completedMapOutputSize / completedMaps) * totalMaps * 10
> Here, multiplying by 10 makes the estimation more conservative, so that the task is less likely to be assigned somewhere without enough space.
> The former estimation goes like this:
> estimationA = (inputSize * completedMapOutputSize * 2.0) / completedMapInputSize
> My suggestion is to take the minimum of the two estimations:
> estimation = min(estimationA, estimationB)
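The proposal above can be transcribed as a toy sketch (not the JobTracker's actual ResourceEstimator code; method names and the degenerate-case handling are illustrative):

```java
public class MapOutputEstimator {
    // estimationA: the existing heuristic, which degenerates when the
    // completed maps reported (near-)zero input, as with TableInputFormat.
    public static long estimationA(long inputSize, long completedMapOutputSize,
                                   long completedMapInputSize) {
        if (completedMapInputSize <= 0) {
            return Long.MAX_VALUE; // no usable input-size signal
        }
        return (long) ((inputSize * completedMapOutputSize * 2.0) / completedMapInputSize);
    }

    // estimationB: independent of input size, padded by a factor of 10
    // to stay conservative.
    public static long estimationB(long completedMapOutputSize, int completedMaps,
                                   int totalMaps) {
        if (completedMaps == 0) {
            return Long.MAX_VALUE; // nothing completed yet
        }
        return (completedMapOutputSize / completedMaps) * totalMaps * 10L;
    }

    // Proposed combination: take the smaller of the two estimates.
    public static long estimate(long inputSize, long completedMapInputSize,
                                long completedMapOutputSize, int completedMaps,
                                int totalMaps) {
        return Math.min(
                estimationA(inputSize, completedMapOutputSize, completedMapInputSize),
                estimationB(completedMapOutputSize, completedMaps, totalMaps));
    }
}
```

In the HBase join scenario, estimationA explodes (or degenerates) while estimationB stays proportional to the observed per-map output, so the minimum keeps the assignment feasible.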
[jira] [Updated] (MAPREDUCE-3182) loadgen ignores -m command line when writing random data
[ https://issues.apache.org/jira/browse/MAPREDUCE-3182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-3182: Labels: BB2015-05-TBR (was: ) > loadgen ignores -m command line when writing random data > > > Key: MAPREDUCE-3182 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-3182 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2, test >Affects Versions: 0.23.0, 2.3.0 >Reporter: Jonathan Eagles >Assignee: Chen He > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-3182.patch > > > If no input directories are specified, loadgen goes into a special mode where > random data is generated and written. In that mode, setting the number of > mappers (-m command line option) is overridden by a calculation. Instead, it > should take into consideration the user specified number of mappers and fall > back to the calculation. In addition, update the documentation as well to > match the new behavior in the code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5871) Estimate Job Endtime
[ https://issues.apache.org/jira/browse/MAPREDUCE-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5871: Labels: BB2015-05-TBR (was: ) > Estimate Job Endtime > > > Key: MAPREDUCE-5871 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5871 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Maysam Yabandeh >Assignee: Maysam Yabandeh > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-5871.patch > > > YARN-1969 adds a new earliest-endtime-first policy to the fair scheduler. As > a prerequisite step, the AppMaster should estimate its end time and send it > to the RM via the heartbeat. This jira focuses on how the AppMaster performs > this estimation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6023) Fix SuppressWarnings from "unchecked" to "rawtypes" in O.A.H.mapreduce.lib.input.TaggedInputSplit
[ https://issues.apache.org/jira/browse/MAPREDUCE-6023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6023: Labels: BB2015-05-TBR newbie (was: newbie) > Fix SuppressWarnings from "unchecked" to "rawtypes" in > O.A.H.mapreduce.lib.input.TaggedInputSplit > - > > Key: MAPREDUCE-6023 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6023 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Junping Du >Assignee: Abhilash Srimat Tirumala Pallerlamudi >Priority: Minor > Labels: BB2015-05-TBR, newbie > Attachments: MAPREDUCE-6023.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-3517) "map.input.path" is null at the first split when using CombineFileInputFormat
[ https://issues.apache.org/jira/browse/MAPREDUCE-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-3517:
Labels: BB2015-05-TBR (was: )

> "map.input.path" is null at the first split when using CombineFileInputFormat
> -----------------------------------------------------------------------------
>
> Key: MAPREDUCE-3517
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3517
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: task
> Affects Versions: 0.20.203.0
> Reporter: wanbin
> Labels: BB2015-05-TBR
> Attachments: CombineFileRecordReader.diff, MAPREDUCE-3517.02.patch
>
> "map.input.path" is null at the first split when using CombineFileInputFormat. In the runNewMapper function, mapContext is used instead of taskContext, which is where "map.input.path" is set, so we need to set "map.input.path" on mapContext as well.
[jira] [Updated] (MAPREDUCE-5883) "Total megabyte-seconds" in job counters is slightly misleading
[ https://issues.apache.org/jira/browse/MAPREDUCE-5883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5883: Labels: BB2015-05-TBR (was: ) > "Total megabyte-seconds" in job counters is slightly misleading > --- > > Key: MAPREDUCE-5883 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5883 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 3.0.0, 2.4.0 >Reporter: Nathan Roberts >Assignee: Nathan Roberts >Priority: Minor > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-5883.patch > > > The following counters are in milliseconds so "megabyte-seconds" might be > better stated as "megabyte-milliseconds" > MB_MILLIS_MAPS.name= Total megabyte-seconds taken by all map > tasks > MB_MILLIS_REDUCES.name=Total megabyte-seconds taken by all reduce > tasks > VCORES_MILLIS_MAPS.name= Total vcore-seconds taken by all map tasks > VCORES_MILLIS_REDUCES.name=Total vcore-seconds taken by all reduce > tasks -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5845) TestShuffleHandler failing intermittently on windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5845: Labels: BB2015-05-TBR (was: ) > TestShuffleHandler failing intermittently on windows > > > Key: MAPREDUCE-5845 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5845 > Project: Hadoop Map/Reduce > Issue Type: Test >Reporter: Varun Vasudev >Assignee: Varun Vasudev > Labels: BB2015-05-TBR > Attachments: apache-mapreduce-5845.0.patch > > > TestShuffleHandler fails intermittently on Windows - specifically, > testClientClosesConnection. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5225) SplitSampler in mapreduce.lib should use a SPLIT_STEP to jump around splits
[ https://issues.apache.org/jira/browse/MAPREDUCE-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5225: Labels: BB2015-05-TBR (was: ) > SplitSampler in mapreduce.lib should use a SPLIT_STEP to jump around splits > --- > > Key: MAPREDUCE-5225 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5225 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-5225.1.patch > > > Now, SplitSampler only samples the first maxSplitsSampled splits, caused by > MAPREDUCE-1820. However, jumping around all splits is in general preferable > than the first N splits. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
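The change being asked for can be sketched as stepping across the split list rather than taking a prefix. This is a simplified sketch of the index selection only, not the actual SplitSampler code (names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class StepSampler {
    // Choose sample indices spread across all splits: stepping by
    // numSplits / toSample visits the whole input, instead of only
    // sampling the first maxSplitsSampled splits.
    public static List<Integer> sampleIndices(int numSplits, int maxSplitsSampled) {
        int toSample = Math.min(maxSplitsSampled, numSplits);
        int step = numSplits / toSample;
        List<Integer> picked = new ArrayList<>();
        for (int i = 0; i < toSample; i++) {
            picked.add(i * step);
        }
        return picked;
    }
}
```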
[jira] [Updated] (MAPREDUCE-5700) historyServer can't show container's log when aggregation is not enabled
[ https://issues.apache.org/jira/browse/MAPREDUCE-5700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5700:
Labels: BB2015-05-TBR (was: )

> historyServer can't show container's log when aggregation is not enabled
> -------------------------------------------------------------------------
>
> Key: MAPREDUCE-5700
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5700
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 0.23.7, 2.0.4-alpha, 2.2.0
> Environment: yarn.log-aggregation-enable=false; the HistoryServer shows:
> Aggregation is not enabled. Try the nodemanager at hd13-vm1:34669
> Reporter: Hong Shen
> Assignee: Hong Shen
> Labels: BB2015-05-TBR
> Attachments: yarn-647-2.patch, yarn-647.patch
>
> When yarn.log-aggregation-enable is set to false, after an MR app completes we can't view the container's log from the HistoryServer; it shows a message like:
> Aggregation is not enabled. Try the nodemanager at hd13-vm1:34669
> We don't want to aggregate the container's logs because that puts pressure on the namenode, but sometimes we still want to look at a container's log.
> Should the HistoryServer serve the container's log even if yarn.log-aggregation-enable is set to false?
[jira] [Updated] (MAPREDUCE-5708) Duplicate String.format in getSpillFileForWrite
[ https://issues.apache.org/jira/browse/MAPREDUCE-5708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5708: Labels: BB2015-05-TBR (was: ) > Duplicate String.format in getSpillFileForWrite > --- > > Key: MAPREDUCE-5708 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5708 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Konstantin Weitz >Priority: Minor > Labels: BB2015-05-TBR > Attachments: 0001-Removed-duplicate-String.format.patch > > Original Estimate: 10m > Remaining Estimate: 10m > > The code responsible for formatting the spill file name (namely > _getSpillFileForWrite_) unnecessarily calls _String.format_ twice. This does > not only affect performance, but leads to the weird requirement that task > attempt ids cannot contain _%_ characters (because these would be interpreted > as format specifiers in the outside _String.format_ call). > I assume this was done by mistake, as it could only be useful if task attempt > ids contained _%n_. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
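The fix is simply to format once, so a '%' in the task attempt id is substituted as data rather than reinterpreted as a format specifier by a second pass. A sketch (the path template here is illustrative, not the exact one used for spill files):

```java
public class SpillName {
    // Format exactly once, passing the attempt id as an argument.
    // A '%' inside the id is then plain data, never a format specifier,
    // and the double String.format cost disappears.
    public static String spillFileName(String taskAttemptId, int spillNumber) {
        return String.format("%s/spill%d.out", taskAttemptId, spillNumber);
    }
}
```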
[jira] [Updated] (MAPREDUCE-5216) While using TextSplitter in DataDrivenDBInputformat, the lower limit (split start) always remains the same, for all splits.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5216: Labels: BB2015-05-TBR (was: ) > While using TextSplitter in DataDrivenDBInputformat, the lower limit (split > start) always remains the same, for all splits. > --- > > Key: MAPREDUCE-5216 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5216 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Gelesh > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-5216.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > While using TextSplitter in DataDrivenDBInputformat, the lower limit (split > start) always remains the same, for all splits. > ie, > Split 1 Start =A, End = M, Split 2 Start =A, End = P, Split 3 Start =A, End = > S, > instead of > Split 1 Start =A, End = M, Split 2 Start =M, End = P, Split 3 Start =P, End = > S, -- This message was sent by Atlassian JIRA (v6.3.4#6332)
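The expected behavior is chained boundaries: each split's lower bound equals the previous split's upper bound. A minimal sketch of that invariant (illustrative, not TextSplitter's actual code):

```java
import java.util.ArrayList;
import java.util.List;

public class ChainedSplits {
    // Given sorted boundary points, produce consecutive [lo, hi) ranges.
    // The lower bound of split i is the upper bound of split i-1,
    // not the global minimum repeated for every split.
    public static List<String[]> toSplits(List<String> boundaries) {
        List<String[]> splits = new ArrayList<>();
        for (int i = 0; i + 1 < boundaries.size(); i++) {
            splits.add(new String[] { boundaries.get(i), boundaries.get(i + 1) });
        }
        return splits;
    }
}
```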
[jira] [Updated] (MAPREDUCE-5577) Allow querying the JobHistoryServer by job arrival time
[ https://issues.apache.org/jira/browse/MAPREDUCE-5577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5577: Labels: BB2015-05-TBR (was: ) > Allow querying the JobHistoryServer by job arrival time > --- > > Key: MAPREDUCE-5577 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5577 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobhistoryserver >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-5577.patch > > > The JobHistoryServer REST APIs currently allow querying by job submit time > and finish time. However, jobs don't necessarily arrive in order of their > finish time, meaning that a client who wants to stay on top of all completed > jobs needs to query large time intervals to make sure they're not missing > anything. Exposing functionality to allow querying by the time a job lands > at the JobHistoryServer would allow clients to set the start of their query > interval to the time of their last query. > The arrival time of a job would be defined as the time that it lands in the > done directory and can be picked up using the last modified date on history > files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4487) Reduce job latency by removing hardcoded sleep statements
[ https://issues.apache.org/jira/browse/MAPREDUCE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4487: Labels: BB2015-05-TBR (was: ) > Reduce job latency by removing hardcoded sleep statements > - > > Key: MAPREDUCE-4487 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4487 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mrv1, mrv2, performance >Affects Versions: 1.0.3, 2.0.0-alpha >Reporter: Tom White >Assignee: Tom White > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-4487-mr2.patch, MAPREDUCE-4487.patch > > > There are a few places in MapReduce where there are hardcoded sleep > statements. By replacing them with wait/notify or similar it's possible to > reduce latency for short running jobs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5227) JobTrackerMetricsSource and QueueMetrics should standardize naming rules
[ https://issues.apache.org/jira/browse/MAPREDUCE-5227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5227: Labels: BB2015-05-TBR (was: ) > JobTrackerMetricsSource and QueueMetrics should standardize naming rules > > > Key: MAPREDUCE-5227 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5227 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mrv1 >Affects Versions: 1.1.3, 1.2.1 >Reporter: Tsuyoshi Ozawa >Assignee: Tsuyoshi Ozawa >Priority: Minor > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-5227-1.1-branch.1.patch, > MAPREDUCE-5227-branch-1.1.patch, MAPREDUCE-5227.1.patch > > > JobTrackerMetricsSource and QueueMetrics provides users with some metrics, > but its naming rules( "jobs_running", "running_maps", "running_reduces") > sometimes confuses users. It should be standardized. > One concern is backward compatibility, so one idea is to share > MetricMutableGaugeInt object from old and new property name. > e.g. to share runningMaps from "running_maps" and "maps_running". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5248) Let NNBenchWithoutMR specify the replication factor for its test
[ https://issues.apache.org/jira/browse/MAPREDUCE-5248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5248: Labels: BB2015-05-TBR (was: ) > Let NNBenchWithoutMR specify the replication factor for its test > > > Key: MAPREDUCE-5248 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5248 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: client, test >Affects Versions: 3.0.0 >Reporter: Erik Paulson >Assignee: Erik Paulson >Priority: Minor > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-5248.patch, MAPREDUCE-5248.txt > > Original Estimate: 1h > Remaining Estimate: 1h > > The NNBenchWithoutMR test creates files with a replicationFactorPerFile > hard-coded to 1. It'd be nice to be able to specify that on the commandline. > Also, it'd be great if MAPREDUCE-4750 was merged along with this fix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5264) FileAlreadyExistsException is assumed to be thrown by FileSystem#mkdirs or FileContext#mkdir in the codebase
[ https://issues.apache.org/jira/browse/MAPREDUCE-5264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5264: Labels: BB2015-05-TBR (was: ) > FileAlreadyExistsException is assumed to be thrown by FileSystem#mkdirs or > FileContext#mkdir in the codebase > > > Key: MAPREDUCE-5264 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5264 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 3.0.0, 2.1.0-beta >Reporter: Rémy SAISSY > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-5264.20130607.1.patch > > > According to https://issues.apache.org/jira/browse/HADOOP-9438, > FileSystem#mkdirs and FileContext#mkdir do not throw > FileAlreadyExistsException if the directory already exist. > Some places in the mapreduce codebase assumes FileSystem#mkdirs or > FileContext#mkdir throw FileAlreadyExistsException. > At least the following files are concerned: > - YarnChild.java > - JobHistoryEverntHandler.java > - HistoryFileManager.java > It would be good to re-review and patch this if needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4840) Delete dead code and deprecate public API related to skipping bad records
[ https://issues.apache.org/jira/browse/MAPREDUCE-4840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4840: Labels: BB2015-05-TBR (was: ) > Delete dead code and deprecate public API related to skipping bad records > - > > Key: MAPREDUCE-4840 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4840 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.0.0-alpha >Reporter: Mostafa Elhemali >Priority: Minor > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-4840.patch > > > It looks like the decision was made in MAPREDUCE-1932 to remove support for > skipping bad records rather than fix it (it doesn't work right now in trunk). > If that's the case then we should probably delete all the dead code related > to it and deprecate the public API's for it right? > Dead code I'm talking about: > 1. Task class: skipping, skipRanges, writeSkipRecs > 2. MapTask class: SkippingRecordReader inner class > 3. ReduceTask class: SkippingReduceValuesIterator inner class > 4. Tests: TestBadRecords > Public API: > 1. SkipBadRecords class -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size is added twice in the Distributed Cache directory size calculation
[ https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5969:
Labels: BB2015-05-TBR (was: )

> Private non-Archive Files' size is added twice in the Distributed Cache directory size calculation
> --------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-5969
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5969
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv1
> Reporter: zhihai xu
> Assignee: zhihai xu
> Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5969.branch1.1.patch, MAPREDUCE-5969.branch1.patch
>
> The size of private non-archive files is added twice in the Distributed Cache directory size calculation. The private non-archive file list is passed in by the "-files" command line option. The Distributed Cache directory size is used to check whether the total cache file size exceeds the cache size limitation; the default limit is 10G.
> I added logging in addCacheInfoUpdate and setSize in TrackerDistributedCacheManager.java.
> I used the following command to test:
> hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar /tmp/zxu/test_in/ /tmp/zxu/test_out
> to add two files into the distributed cache: WordCount.java and wordcount.jar. The WordCount.java file size is 2395 bytes and the wordcount.jar file size is 3865 bytes. The total should be 6260.
> The log shows these file sizes being added twice: once before download to the local node and a second time after download, so the total file count becomes 4 instead of 2:
> addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local
> addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local
> addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local
> In the code, for a private non-archive file, the first time we add the file size is in getLocalCache:
> {code}
> if (!isArchive) {
>   //for private archives, the lengths come over RPC from the
>   //JobLocalizer since the JobLocalizer is the one who expands
>   //archives and gets the total length
>   lcacheStatus.size = fileStatus.getLen();
>   LOG.info("getLocalCache:" + localizedPath + " size = " + lcacheStatus.size);
>   // Increase the size and sub directory count of the cache
>   // from baseDirSize and baseDirNumberSubDir.
>   baseDirManager.addCacheInfoUpdate(lcacheStatus);
> }
> {code}
> The second time we add the file size is in setSize:
> {code}
> synchronized (status) {
>   status.size = size;
>   baseDirManager.addCacheInfoUpdate(status);
> }
> {code}
> The fix is to not add the file size again for a private non-archive file after download (downloadCacheObject).
[jira] [Updated] (MAPREDUCE-4818) Easier identification of tasks that timeout during localization
[ https://issues.apache.org/jira/browse/MAPREDUCE-4818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4818: Labels: BB2015-05-TBR usability (was: usability) > Easier identification of tasks that timeout during localization > --- > > Key: MAPREDUCE-4818 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4818 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mr-am >Affects Versions: 0.23.3, 2.0.3-alpha >Reporter: Jason Lowe >Assignee: Siqi Li > Labels: BB2015-05-TBR, usability > Attachments: MAPREDUCE-4818.v1.patch, MAPREDUCE-4818.v2.patch, > MAPREDUCE-4818.v3.patch, MAPREDUCE-4818.v4.patch, MAPREDUCE-4818.v5.patch > > > When a task is taking too long to localize and is killed by the AM due to > task timeout, the job UI/history is not very helpful. The attempt simply > lists a diagnostic stating it was killed due to timeout, but there are no > logs for the attempt since it never actually got started. There are log > messages on the NM that show the container never made it past localization by > the time it was killed, but users often do not have access to those logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4216) Make MultipleOutputs generic to support non-file output formats
[ https://issues.apache.org/jira/browse/MAPREDUCE-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4216: Labels: BB2015-05-TBR Output (was: Output) > Make MultipleOutputs generic to support non-file output formats > --- > > Key: MAPREDUCE-4216 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4216 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mrv2 >Affects Versions: 1.0.2 >Reporter: Robbie Strickland > Labels: BB2015-05-TBR, Output > Attachments: MAPREDUCE-4216.patch > > > The current MultipleOutputs implementation is tied to FileOutputFormat in > such a way that it is not extensible to other types of output. It should be > made more generic, such as with an interface that can be implemented for > different outputs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-3115) OOM When the value for the property "mapred.map.multithreadedrunner.class" is set to MultithreadedMapper instance.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-3115: Labels: BB2015-05-TBR (was: ) > OOM When the value for the property "mapred.map.multithreadedrunner.class" is > set to MultithreadedMapper instance. > -- > > Key: MAPREDUCE-3115 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-3115 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv1 >Affects Versions: 0.23.0, 1.0.0 > Environment: NA >Reporter: Bhallamudi Venkata Siva Kamesh > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-3115.2.patch, MAPREDUCE-3115.patch > > > When we set the value for the property *mapred.map.multithreadedrunner.class* > as instance of MultithreadedMapper, using > MultithreadedMapper.setMapperClass(), it simply throws > IllegalArgumentException. > But when we set the same property, using job's conf object using > job.getConfiguration().setClass(*mapred.map.multithreadedrunner.class*, > MultithreadedMapper.class, Mapper.class), throws OOM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5203) Make AM of M/R Use NMClient
[ https://issues.apache.org/jira/browse/MAPREDUCE-5203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5203: Labels: BB2015-05-TBR (was: ) > Make AM of M/R Use NMClient > --- > > Key: MAPREDUCE-5203 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5203 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-5203.1.patch, MAPREDUCE-5203.2.patch, > MAPREDUCE-5203.3.patch, MAPREDUCE-5203.4.patch, MAPREDUCE-5203.5.patch > > > YARN-422 adds NMClient. AM of mapreduce should use it instead of using the > raw ContainerManager proxy directly. ContainerLauncherImpl needs to be > changed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-2632) Avoid calling the partitioner when the numReduceTasks is 1.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-2632:
    Labels: BB2015-05-TBR  (was: )

Key: MAPREDUCE-2632
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2632
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: tasktracker
Affects Versions: 0.23.0
Reporter: Ravi Teja Ch N V
Assignee: Ravi Teja Ch N V
Labels: BB2015-05-TBR
Attachments: MAPREDUCE-2632-1.patch, MAPREDUCE-2632.patch

We can avoid the call to the partitioner when the number of reducers is 1. This will avoid unnecessary computation by the partitioner.
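The optimization described above can be sketched in plain Java. The Partitioner interface and the assigner class below are simplified stand-ins, not the actual Hadoop MapTask code: with a single reducer every record lands in partition 0, so the user-supplied partitioner never needs to run.

```java
// Simplified stand-ins for illustration only (not Hadoop's actual classes).
interface Partitioner<K, V> {
    int getPartition(K key, V value, int numPartitions);
}

class PartitionAssigner<K, V> {
    private final Partitioner<K, V> partitioner;
    private final int numReduceTasks;

    PartitionAssigner(Partitioner<K, V> partitioner, int numReduceTasks) {
        this.partitioner = partitioner;
        this.numReduceTasks = numReduceTasks;
    }

    int partitionFor(K key, V value) {
        // With one reducer, skip the partitioner call entirely:
        // every record can only go to partition 0.
        if (numReduceTasks == 1) {
            return 0;
        }
        return partitioner.getPartition(key, value, numReduceTasks);
    }
}
```

Besides saving a virtual call per record, this also protects against buggy user partitioners that misbehave when numPartitions is 1.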
[jira] [Updated] (MAPREDUCE-5374) CombineFileRecordReader does not set "map.input.*" configuration parameters for first file read
[ https://issues.apache.org/jira/browse/MAPREDUCE-5374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-5374:
    Labels: BB2015-05-TBR  (was: )

Key: MAPREDUCE-5374
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5374
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Dave Beech
Assignee: Dave Beech
Labels: BB2015-05-TBR
Attachments: MAPREDUCE-5374.patch, MAPREDUCE-5374.patch

The CombineFileRecordReader operates on splits consisting of multiple files. Each time a new record reader is initialised for a "chunk", certain parameters are supposed to be set on the configuration object (map.input.file, map.input.start and map.input.length).

However, the first reader is initialised in a different way to subsequent ones (i.e. initialize is called by the MapTask directly rather than from inside the record reader class). Because of this, these config parameters are not set properly and are returned as null when you access them from inside a mapper.
[jira] [Updated] (MAPREDUCE-5499) Fix synchronization issues of the setters/getters of *PBImpl which take in/return lists
[ https://issues.apache.org/jira/browse/MAPREDUCE-5499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-5499:
    Labels: BB2015-05-TBR  (was: )

Key: MAPREDUCE-5499
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5499
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Xuan Gong
Labels: BB2015-05-TBR
Attachments: MAPREDUCE-5499.1.patch, MAPREDUCE-5499.2.patch

Similar to YARN-609. The following *PBImpls need to be fixed:
1. GetDiagnosticsResponsePBImpl
2. GetTaskAttemptCompletionEventsResponsePBImpl
3. GetTaskReportsResponsePBImpl
4. CounterGroupPBImpl
5. JobReportPBImpl
6. TaskReportPBImpl
[jira] [Updated] (MAPREDUCE-5392) "mapred job -history all" command throws IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/MAPREDUCE-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-5392:
    Labels: BB2015-05-TBR  (was: )

Key: MAPREDUCE-5392
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5392
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 3.0.0, 2.0.5-alpha, 2.2.0
Reporter: Shinichi Yamashita
Assignee: Shinichi Yamashita
Labels: BB2015-05-TBR
Attachments: MAPREDUCE-5392.2.patch, MAPREDUCE-5392.3.patch, MAPREDUCE-5392.4.patch, MAPREDUCE-5392.5.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch

When I use the "all" option of the "mapred job -history" command, the following exception is displayed and the command does not work:

{code}
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: -3
    at java.lang.String.substring(String.java:1875)
    at org.apache.hadoop.mapreduce.util.HostUtil.convertTrackerNameToHostName(HostUtil.java:49)
    at org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.getTaskLogsUrl(HistoryViewer.java:459)
    at org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.printAllTaskAttempts(HistoryViewer.java:235)
    at org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.print(HistoryViewer.java:117)
    at org.apache.hadoop.mapreduce.tools.CLI.viewHistory(CLI.java:472)
    at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:313)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.hadoop.mapred.JobClient.main(JobClient.java:1233)
{code}

This is because a node name recorded in the history file is not prefixed with "tracker_". The patch therefore makes it possible to read the history file even if a node name is not prefixed with "tracker_". In addition, it fixes the URL of the displayed task log.
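The defensive parsing the patch describes can be sketched as follows. This is a hypothetical helper in plain Java, not the actual HostUtil code: it strips the "tracker_" prefix only when present (instead of unconditionally calling substring, which underflows for short names) and then drops everything after the first colon.

```java
// Hypothetical stand-in for HostUtil.convertTrackerNameToHostName:
// tolerate node names recorded without the "tracker_" prefix.
class TrackerNames {
    static String toHostName(String trackerName) {
        String name = trackerName.startsWith("tracker_")
            ? trackerName.substring("tracker_".length())
            : trackerName;                       // no prefix: use the name as-is
        int colon = name.indexOf(':');
        // Keep only the host part, dropping any ":port" or address suffix.
        return colon >= 0 ? name.substring(0, colon) : name;
    }
}
```

The key point is the startsWith check: the original code computed a substring offset assuming the prefix was always there, which produces the "String index out of range: -3" seen in the trace.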
[jira] [Updated] (MAPREDUCE-4065) Add .proto files to built tarball
[ https://issues.apache.org/jira/browse/MAPREDUCE-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-4065:
    Labels: BB2015-05-TBR  (was: )

Key: MAPREDUCE-4065
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4065
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: build
Affects Versions: 0.23.2, 2.4.0
Reporter: Ralph H Castain
Assignee: Tsuyoshi Ozawa
Labels: BB2015-05-TBR
Attachments: MAPREDUCE-4065.1.patch

Please add the .proto files to the built tarball so that users can build 3rd party tools that use protocol buffers without having to do an svn checkout of the source code. Sorry I don't know more about Maven, or I would provide a patch.
[jira] [Updated] (MAPREDUCE-6040) distcp should automatically use /.reserved/raw when run by the superuser
[ https://issues.apache.org/jira/browse/MAPREDUCE-6040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-6040:
    Labels: BB2015-05-TBR  (was: )

Key: MAPREDUCE-6040
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6040
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: distcp
Affects Versions: 3.0.0
Reporter: Andrew Wang
Assignee: Charles Lamb
Labels: BB2015-05-TBR
Attachments: HDFS-6134-Distcp-cp-UseCasesTable2.pdf, MAPREDUCE-6040.001.patch, MAPREDUCE-6040.002.patch

On HDFS-6134, [~sanjay.radia] asked for distcp to automatically prepend /.reserved/raw if the distcp is being performed by the superuser and /.reserved/raw is supported by both the source and destination filesystems. This behavior only occurs if none of the src and target pathnames are /.reserved/raw. The -disablereservedraw flag can be used to disable this option.
[jira] [Updated] (MAPREDUCE-5889) Deprecate FileInputFormat.setInputPaths(Job, String) and FileInputFormat.addInputPaths(Job, String)
[ https://issues.apache.org/jira/browse/MAPREDUCE-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-5889:
    Labels: BB2015-05-TBR newbie  (was: newbie)

Key: MAPREDUCE-5889
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5889
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA
Priority: Minor
Labels: BB2015-05-TBR, newbie
Attachments: MAPREDUCE-5889.3.patch, MAPREDUCE-5889.patch, MAPREDUCE-5889.patch

{{FileInputFormat.setInputPaths(Job job, String commaSeparatedPaths)}} and {{FileInputFormat.addInputPaths(Job job, String commaSeparatedPaths)}} fail to parse commaSeparatedPaths if a comma is included in a file path (e.g. the path {{/path/file,with,comma}}). We should deprecate these methods and document that {{setInputPaths(Job job, Path... inputPaths)}} and {{addInputPaths(Job job, Path... inputPaths)}} should be used instead.
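The ambiguity behind this issue is easy to demonstrate in plain Java (this is an illustration of the general comma-splitting problem, not Hadoop's actual parser): once a file name itself contains commas, splitting the string cannot tell a separator from a character in the name.

```java
// Illustration only: naive comma splitting, as the String-based overloads imply.
class CommaSeparatedPaths {
    static String[] parse(String commaSeparatedPaths) {
        // A comma inside a file name is indistinguishable from the
        // separator between two paths, so this split is lossy.
        return commaSeparatedPaths.split(",");
    }
}
```

The {{Path...}} varargs overloads avoid the problem entirely because each path arrives as a separate object and no string-level delimiter is needed.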
[jira] [Updated] (MAPREDUCE-5929) YARNRunner.java, path for jobJarPath not set correctly
[ https://issues.apache.org/jira/browse/MAPREDUCE-5929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-5929:
    Labels: BB2015-05-TBR newbie patch  (was: newbie patch)

Key: MAPREDUCE-5929
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5929
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Chao Tian
Assignee: Rahul Palamuttam
Labels: BB2015-05-TBR, newbie, patch
Attachments: MAPREDUCE-5929.patch

In YARNRunner.java, line 357:

    Path jobJarPath = new Path(jobConf.get(MRJobConfig.JAR));

This causes the job.jar file to miss the scheme, host and port number on distributed file systems other than HDFS. If we compare line 357 with line 344, there "job.xml" is actually set as

    Path jobConfPath = new Path(jobSubmitDir, MRJobConfig.JOB_CONF_FILE);

It appears "jobSubmitDir" is missing on line 357, which causes this problem. In HDFS, the additional qualify process corrects this problem, but not on other generic distributed file systems. The proposed change is to replace line 357 with

    Path jobJarPath = new Path(jobConf.get(jobSubmitDir,MRJobConfig.JAR));
[jira] [Updated] (MAPREDUCE-6038) A boolean may be set error in the Word Count v2.0 in MapReduce Tutorial
[ https://issues.apache.org/jira/browse/MAPREDUCE-6038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-6038:
    Labels: BB2015-05-TBR  (was: )

Key: MAPREDUCE-6038
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6038
Project: Hadoop Map/Reduce
Issue Type: Bug
Environment: java version 1.8.0_11 hotspot 64-bit
Reporter: Pei Ma
Assignee: Tsuyoshi Ozawa
Priority: Minor
Labels: BB2015-05-TBR
Attachments: MAPREDUCE-6038.1.patch

As a beginner, when I learned the basics of MR, I found that I couldn't run WordCount2 using the command "bin/hadoop jar wc.jar WordCount2 /user/joe/wordcount/input /user/joe/wordcount/output" from the tutorial. The VM threw a NullPointerException at line 47. At line 45, the default value returned by "conf.getBoolean" is true. That is to say, when "wordcount.skip.patterns" is not set, WordCount2 still continues to execute getCacheFiles, and patternsURIs gets a null value. When the "-skip" option isn't used, "wordcount.skip.patterns" is never set, so a NullPointerException comes out.

In short, the block after the if-statement at line 45 shouldn't be executed when the "-skip" option is absent from the command. Line 45 should probably be "if (conf.getBoolean("wordcount.skip.patterns", false)) {", i.e. just change the boolean default.
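The effect of the default value can be shown with a minimal stand-in for the configuration class (this MiniConf is an illustration, not Hadoop's Configuration): with a default of true the skip-patterns branch is taken even though the property was never set, which is exactly the state in which getCacheFiles() has nothing to return.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for Configuration.getBoolean (illustration only):
// the default is returned whenever the key was never set.
class MiniConf {
    private final Map<String, String> props = new HashMap<>();

    void setBoolean(String key, boolean value) {
        props.put(key, Boolean.toString(value));
    }

    boolean getBoolean(String key, boolean defaultValue) {
        String v = props.get(key);
        return v == null ? defaultValue : Boolean.parseBoolean(v);
    }
}
```

With the tutorial's default of true, an unset "wordcount.skip.patterns" still enters the branch; the proposed default of false makes the branch run only when -skip explicitly set the property.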
[jira] [Updated] (MAPREDUCE-5817) mappers get rescheduled on node transition even after all reducers are completed
[ https://issues.apache.org/jira/browse/MAPREDUCE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-5817:
    Labels: BB2015-05-TBR  (was: )

Key: MAPREDUCE-5817
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5817
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: applicationmaster
Affects Versions: 2.3.0
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Labels: BB2015-05-TBR
Attachments: mapreduce-5817.patch

We're seeing a behavior where a job runs long after all reducers were already finished. We found that the job was rescheduling and running a number of mappers beyond the point of reducer completion. In one situation, the job ran for some 9 more hours after all reducers completed!

This happens because whenever a node transition (to an unusable state) comes into the app master, it reschedules all mappers that already ran on that node in all cases. Therefore, any node transition has the potential to extend the job's duration. Once this window opens, another node transition can prolong it, and in theory this can happen indefinitely. If there is some instability in the pool (unhealthy, etc.) for a duration, then any big job is severely vulnerable to this problem.

If all reducers have been completed, JobImpl.actOnUnusableNode() should not reschedule mapper tasks. Once all reducers are complete, the mapper outputs are no longer needed, and there is no point in rescheduling mapper tasks as their output would not be consumed anyway.
[jira] [Updated] (MAPREDUCE-5490) MapReduce doesn't set the environment variable for children processes
[ https://issues.apache.org/jira/browse/MAPREDUCE-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-5490:
    Labels: BB2015-05-TBR  (was: )

Key: MAPREDUCE-5490
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5490
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 1.2.1
Reporter: Owen O'Malley
Assignee: Owen O'Malley
Labels: BB2015-05-TBR
Attachments: MAPREDUCE-5490.patch, mr-5490.patch, mr-5490.patch

Currently, MapReduce uses the command line argument to pass the classpath to the child. This breaks if the process forks a child that needs the same classpath. Such a case happens in Hive when it uses map-side joins. I propose that we make MapReduce in branch-1 use the CLASSPATH environment variable like YARN does.
[jira] [Updated] (MAPREDUCE-6020) Too many threads blocking on the global JobTracker lock from getJobCounters, optimize getJobCounters to release global JobTracker lock before access the per job count
[ https://issues.apache.org/jira/browse/MAPREDUCE-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-6020:
    Labels: BB2015-05-TBR  (was: )

Key: MAPREDUCE-6020
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6020
Project: Hadoop Map/Reduce
Issue Type: Improvement
Affects Versions: 0.23.10
Reporter: zhihai xu
Assignee: zhihai xu
Labels: BB2015-05-TBR
Attachments: MAPREDUCE-6020.branch1.patch

Too many threads block on the global JobTracker lock in getJobCounters. Many JobClients may call getJobCounters on the JobTracker at the same time, and the current code holds the JobTracker lock for the whole call, blocking every thread that wants to read counters from a JobInProgress. It is better to release the JobTracker lock before accessing the per-job counters in JobInProgress (job.getCounters(counters)), so that all threads can read their own job's counters in parallel.
[jira] [Updated] (MAPREDUCE-5981) Log levels of certain MR logs can be changed to DEBUG
[ https://issues.apache.org/jira/browse/MAPREDUCE-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-5981:
    Labels: BB2015-05-TBR  (was: )

Key: MAPREDUCE-5981
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5981
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Varun Saxena
Assignee: Varun Saxena
Labels: BB2015-05-TBR
Attachments: MAPREDUCE-5981.patch

The following MapReduce logs can be changed to DEBUG log level.

1. In org.apache.hadoop.mapreduce.task.reduce.Fetcher#copyFromHost (Fetcher.java:313), the second log below need not be at INFO level. It can be moved to DEBUG, as a WARN log is printed anyway if verifyReply fails.

    SecureShuffleUtils.verifyReply(replyHash, encHash, shuffleSecretKey);
    LOG.info("for url="+msgToEncode+" sent hash and received reply");

2. Thread-related info need not be printed in logs at INFO level. The two logs below can be moved to DEBUG:

a) In org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl#getHost (ShuffleSchedulerImpl.java:381):

    LOG.info("Assigning " + host + " with " + host.getNumKnownMapOutputs() +
        " to " + Thread.currentThread().getName());

b) In org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl#getMapsForHost (ShuffleSchedulerImpl.java:411):

    LOG.info("assigned " + includedMaps + " of " + totalSize + " to " +
        host + " to " + Thread.currentThread().getName());
[jira] [Updated] (MAPREDUCE-5362) clean up POM dependencies
[ https://issues.apache.org/jira/browse/MAPREDUCE-5362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-5362:
    Labels: BB2015-05-TBR  (was: )

Key: MAPREDUCE-5362
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5362
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: build
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
Labels: BB2015-05-TBR
Attachments: MAPREDUCE-5362.patch, mr-5362-0.patch

Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in the IntelliJ IDE. We should normalize the leaf modules as in common, hdfs and tools, where all dependencies are defined in each leaf module and the intermediate 'pom' modules do not define any dependencies.
[jira] [Updated] (MAPREDUCE-6030) In mr-jobhistory-daemon.sh, some env variables are not affected by mapred-env.sh
[ https://issues.apache.org/jira/browse/MAPREDUCE-6030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-6030:
    Labels: BB2015-05-TBR  (was: )

Key: MAPREDUCE-6030
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6030
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: jobhistoryserver
Affects Versions: 2.4.1
Reporter: Youngjoon Kim
Assignee: Youngjoon Kim
Priority: Minor
Labels: BB2015-05-TBR
Attachments: MAPREDUCE-6030.patch

In mr-jobhistory-daemon.sh, some env variables are exported before sourcing mapred-env.sh, so these variables don't pick up values defined in mapred-env.sh.
[jira] [Updated] (MAPREDUCE-4711) Append time elapsed since job-start-time for finished tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-4711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-4711:
    Labels: BB2015-05-TBR  (was: )

Key: MAPREDUCE-4711
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4711
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: jobhistoryserver
Affects Versions: 0.23.3
Reporter: Ravi Prakash
Labels: BB2015-05-TBR
Attachments: MAPREDUCE-4711.branch-0.23.patch

In 0.20.x/1.x, the analyze job link gave this information:

bq. The last Map task task_ finished at (relative to the Job launch time): 5/10 20:23:10 (1hrs, 27mins, 54sec)

In 0.23, the time it took for the last task to finish needs to be calculated mentally. I believe we should print it next to the finish time.
[jira] [Updated] (MAPREDUCE-4957) Throw FileNotFoundException when running in single node and "mapreduce.framework.name" is local
[ https://issues.apache.org/jira/browse/MAPREDUCE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-4957:
    Labels: BB2015-05-TBR  (was: )

Key: MAPREDUCE-4957
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4957
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Yi Liu
Assignee: Yi Liu
Priority: Minor
Labels: BB2015-05-TBR
Attachments: MAPREDUCE-4957.patch, MAPREDUCE-4957.patch

When running on a single node with "mapreduce.framework.name" set to local, the following error occurs:

{code}
java.io.FileNotFoundException: File does not exist: /root/proj/hive-trunk/build/dist/lib/hive-builtins-0.11.0-SNAPSHOT.jar
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:772)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:208)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:71)
    at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:254)
    at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:292)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:365)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1218)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1450)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1215)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:617)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:612)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1450)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:612)
    at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:446)
    at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:683)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
{code}

Job submission failed with exception 'java.io.FileNotFoundException(File does not exist: /root/proj/hive-trunk/build/dist/lib/hive-builtins-0.11.0-SNAPSHOT.jar)'
[jira] [Updated] (MAPREDUCE-5748) Potential null pointer dereference in ShuffleHandler#Shuffle#messageReceived()
[ https://issues.apache.org/jira/browse/MAPREDUCE-5748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-5748:
    Labels: BB2015-05-TBR  (was: )

Key: MAPREDUCE-5748
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5748
Project: Hadoop Map/Reduce
Issue Type: Bug
Reporter: Ted Yu
Priority: Minor
Labels: BB2015-05-TBR
Attachments: 0001-MAPREDUCE-5748-Potential-null-pointer-deference-in-S.patch

Starting around line 510:

{code}
ChannelFuture lastMap = null;
for (String mapId : mapIds) {
  ...
}
lastMap.addListener(metrics);
lastMap.addListener(ChannelFutureListener.CLOSE);
{code}

If mapIds is empty, lastMap remains null, leading to an NPE in the addListener() call.
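The straightforward guard can be sketched as follows. ChannelFuture is replaced by a placeholder interface and the per-mapId send loop is reduced to assignment, so this is an illustration of the null-check pattern, not the actual Netty/ShuffleHandler code.

```java
import java.util.List;

// Illustration of the null guard (placeholder types, not Netty's API).
class LastMapGuard {
    interface Future {
        void addListener(Object listener);
    }

    // Returns true when listeners were attached; false when the list was empty.
    static boolean finish(List<Future> futures) {
        Future lastMap = null;
        for (Future f : futures) {
            lastMap = f;            // stands in for the per-mapId send loop
        }
        if (lastMap == null) {      // mapIds was empty: nothing to close
            return false;
        }
        lastMap.addListener("metrics");
        lastMap.addListener("CLOSE");
        return true;
    }
}
```

The only behavioral change is that an empty request no longer dereferences null; whether the channel should instead be closed explicitly in that case is a design decision for the real patch.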
[jira] [Updated] (MAPREDUCE-3486) All jobs of all queues will be returned, whether a particular queueName is specified or not
[ https://issues.apache.org/jira/browse/MAPREDUCE-3486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-3486:
    Labels: BB2015-05-TBR  (was: )

Key: MAPREDUCE-3486
URL: https://issues.apache.org/jira/browse/MAPREDUCE-3486
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: jobtracker
Affects Versions: 1.1.3, 1.3.0, 1.2.2
Reporter: XieXianshan
Assignee: XieXianshan
Priority: Minor
Labels: BB2015-05-TBR
Attachments: MAPREDUCE-3486.patch

JobTracker.getJobsFromQueue(queueName) returns all jobs of all queues on the JobTracker, even though a particular queueName is specified.
[jira] [Updated] (MAPREDUCE-5704) Optimize nextJobId in JobTracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-5704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-5704:
    Labels: BB2015-05-TBR  (was: )

Key: MAPREDUCE-5704
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5704
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: jobtracker, mrv1
Affects Versions: 1.2.1
Reporter: JamesLi
Assignee: JamesLi
Labels: BB2015-05-TBR
Attachments: MAPREDUCE-5704.patch

When the JobTracker starts, nextJobId starts at 1. If we have run 3000 jobs, then restart the JobTracker and run a new job, the new job is not visible on jobtracker:5030/jobhistory.jsp unless the "get more results" button is clicked. In jobhistory_jsp.java, the array SCAN_SIZES controls the number of jobs displayed on jobhistory.jsp.

I made a small change: when the JobTracker starts, it finds the biggest id under the history done directory, and jobs start at maxId+1, or at 1 if no job files are found.
[jira] [Updated] (MAPREDUCE-5907) Improve getSplits() performance for fs implementations that can utilize performance gains from recursive listing
[ https://issues.apache.org/jira/browse/MAPREDUCE-5907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-5907:
    Labels: BB2015-05-TBR  (was: )

Key: MAPREDUCE-5907
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5907
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: client
Affects Versions: 2.4.0
Reporter: Sumit Kumar
Assignee: Sumit Kumar
Labels: BB2015-05-TBR
Attachments: MAPREDUCE-5907-2.patch, MAPREDUCE-5907-3.patch, MAPREDUCE-5907.patch

FileInputFormat (both mapreduce and mapred implementations) uses recursive listing while calculating splits. It does this listing level by level: to discover files in /foo/bar it lists /foo/bar first to get the immediate children, then makes the same call on each immediate child of /foo/bar to discover their children, and so on. This doesn't scale well for object-store-based fs implementations like s3 and swift, because every listStatus call ends up being a webservice call to the backend. When a large number of files is considered for input, this makes the getSplits() call slow.

This patch adds a new set of recursive list APIs that give fs implementations the opportunity to optimize. The behavior remains the same for other implementations (a default implementation is provided, so other filesystems don't have to implement anything new). For object-store-based fs implementations, however, a simple change to pass the recursive flag as true (as shown in the patch) improves listing performance.
[jira] [Updated] (MAPREDUCE-4502) Node-level aggregation with combining the result of maps
[ https://issues.apache.org/jira/browse/MAPREDUCE-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-4502:
    Labels: BB2015-05-TBR  (was: )

Key: MAPREDUCE-4502
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4502
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: applicationmaster, mrv2
Affects Versions: 3.0.0
Reporter: Tsuyoshi Ozawa
Assignee: Tsuyoshi Ozawa
Labels: BB2015-05-TBR
Attachments: MAPREDUCE-4502.1.patch, MAPREDUCE-4502.10.patch, MAPREDUCE-4502.2.patch, MAPREDUCE-4502.3.patch, MAPREDUCE-4502.4.patch, MAPREDUCE-4502.5.patch, MAPREDUCE-4502.6.patch, MAPREDUCE-4502.7.patch, MAPREDUCE-4502.8.patch, MAPREDUCE-4502.8.patch, MAPREDUCE-4502.9.patch, MAPREDUCE-4502.9.patch, MAPREDUCE-4525-pof.diff, design_v2.pdf, design_v3.pdf, speculative_draft.pdf

Shuffle is expensive in Hadoop despite the existence of the combiner, because the scope of combining is limited to a single MapTask. To solve this problem, it is worthwhile to aggregate the results of maps per node/rack by launching a combiner at that level.

This JIRA implements the multi-level aggregation infrastructure, including combining per container (MAPREDUCE-3902 is related) and coordinating containers by the application master without breaking fault tolerance of jobs.
[jira] [Updated] (MAPREDUCE-5611) CombineFileInputFormat only requests a single location per split when more could be optimal
[ https://issues.apache.org/jira/browse/MAPREDUCE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5611: Labels: BB2015-05-TBR (was: ) > CombineFileInputFormat only requests a single location per split when more > could be optimal > --- > > Key: MAPREDUCE-5611 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5611 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 1.2.1 >Reporter: Chandra Prakash Bhagtani >Assignee: Chandra Prakash Bhagtani > Labels: BB2015-05-TBR > Attachments: CombineFileInputFormat-trunk.patch > > > I have come across an issue with CombineFileInputFormat. Actually I ran a > hive query on approx 1.2 GB data with CombineHiveInputFormat which internally > uses CombineFileInputFormat. My cluster size is 9 datanodes and > max.split.size is 256 MB > When I ran this query with replication factor 9, hive consistently creates > all 6 rack-local tasks and with replication factor 3 it creates 5 rack-local > and 1 data local tasks. > When replication factor is 9 (equal to cluster size), all the tasks should > be data-local as each datanode contains all the replicas of the input data, > but that is not happening i.e all the tasks are rack-local. > When I dug into CombineFileInputFormat.java code in getMoreSplits method, I > found the issue with the following snippet (specially in case of higher > replication factor) > {code:title=CombineFileInputFormat.java|borderStyle=solid} > for (Iterator List>> iter = nodeToBlocks.entrySet().iterator(); > iter.hasNext();) { >Map.Entry> one = iter.next(); > nodes.add(one.getKey()); > List blocksInNode = one.getValue(); > // for each block, copy it into validBlocks. Delete it from > // blockToNodes so that the same block does not appear in > // two different splits. 
> for (OneBlockInfo oneblock : blocksInNode) { > if (blockToNodes.containsKey(oneblock)) { > validBlocks.add(oneblock); > blockToNodes.remove(oneblock); > curSplitSize += oneblock.length; > // if the accumulated split size exceeds the maximum, then > // create this split. > if (maxSize != 0 && curSplitSize >= maxSize) { > // create an input split and add it to the splits array > addCreatedSplit(splits, nodes, validBlocks); > curSplitSize = 0; > validBlocks.clear(); > } > } > } > {code} > The first node in the map nodeToBlocks has all the replicas of the input file, so the > above code creates 6 splits, all with only one location. Now if the JT doesn't > schedule these tasks on that node, all the tasks will be rack-local, even > though all the other datanodes have the other replicas. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5621) mr-jobhistory-daemon.sh doesn't have to execute mkdir and chown all the time
[ https://issues.apache.org/jira/browse/MAPREDUCE-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5621: Labels: BB2015-05-TBR (was: ) > mr-jobhistory-daemon.sh doesn't have to execute mkdir and chown all the time > > > Key: MAPREDUCE-5621 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5621 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobhistoryserver >Affects Versions: 3.0.0 >Reporter: Shinichi Yamashita >Assignee: Shinichi Yamashita >Priority: Minor > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-5621.patch > > > mr-jobhistory-daemon.sh executes the mkdir and chown commands to prepare the log > output directory. > These commands are always executed, whether or not the directory already exists. In addition, they are > executed not only when starting the daemon but also when stopping it. > An "if" guard like the ones in hadoop-daemon.sh and yarn-daemon.sh should be added to control this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4980) Parallel test execution of hadoop-mapreduce-client-core
[ https://issues.apache.org/jira/browse/MAPREDUCE-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4980: Labels: BB2015-05-TBR (was: ) > Parallel test execution of hadoop-mapreduce-client-core > --- > > Key: MAPREDUCE-4980 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4980 > Project: Hadoop Map/Reduce > Issue Type: Test > Components: test >Affects Versions: 3.0.0 >Reporter: Tsuyoshi Ozawa >Assignee: Andrey Klochkov > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-4980--n3.patch, MAPREDUCE-4980--n4.patch, > MAPREDUCE-4980--n5.patch, MAPREDUCE-4980--n6.patch, MAPREDUCE-4980--n7.patch, > MAPREDUCE-4980--n7.patch, MAPREDUCE-4980--n8.patch, MAPREDUCE-4980.1.patch, > MAPREDUCE-4980.patch > > > The Maven Surefire plugin supports a parallel testing feature. By using it, the > tests can run faster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4330) TaskAttemptCompletedEventTransition invalidates previously successful attempt without checking if the newly completed attempt is successful
[ https://issues.apache.org/jira/browse/MAPREDUCE-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4330: Labels: BB2015-05-TBR (was: ) > TaskAttemptCompletedEventTransition invalidates previously successful attempt > without checking if the newly completed attempt is successful > --- > > Key: MAPREDUCE-4330 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4330 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 0.23.1 >Reporter: Bikas Saha >Assignee: Omkar Vinit Joshi > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-4330-20130415.1.patch, > MAPREDUCE-4330-20130415.patch, MAPREDUCE-4330-21032013.1.patch, > MAPREDUCE-4330-21032013.patch > > > The previously completed attempt is removed from > successAttemptCompletionEventNoMap and marked OBSOLETE. > After that, if the newly completed attempt is successful then it is added to > the successAttemptCompletionEventNoMap. > This seems wrong because the newly completed attempt could be failed and thus > there is no need to invalidate the successful attempt. > One error case would be when a speculative attempt completes with > killed/failed after the successful version has completed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4273) Make CombineFileInputFormat split result JDK independent
[ https://issues.apache.org/jira/browse/MAPREDUCE-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4273: Labels: BB2015-05-TBR (was: ) > Make CombineFileInputFormat split result JDK independent > > > Key: MAPREDUCE-4273 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4273 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client >Affects Versions: 1.0.3 >Reporter: Luke Lu >Assignee: Yu Gao > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-4273-branch1-v2.patch, > mapreduce-4273-branch-1.patch, mapreduce-4273-branch-2.patch, > mapreduce-4273.patch > > > The split result of CombineFileInputFormat depends on the iteration order of > the nodeToBlocks and rackToBlocks hash maps, which makes the result depend on the HashMap > implementation and hence on the JDK. > This is manifested as TestCombineFileInputFormat failures on alternative JDKs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
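The JDK-dependence described above can be sketched outside Hadoop. A minimal illustration (not the actual MAPREDUCE-4273 patch; class and method names here are made up) of making iteration order deterministic by sorting keys instead of relying on HashMap order:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DeterministicIteration {
    // HashMap.keySet() order is unspecified and varies across JDK
    // implementations; copying into a TreeMap yields a stable, sorted
    // order that is the same on every JDK.
    public static List<String> sortedKeys(Map<String, List<String>> nodeToBlocks) {
        return new ArrayList<>(new TreeMap<>(nodeToBlocks).keySet());
    }

    public static void main(String[] args) {
        Map<String, List<String>> nodeToBlocks = new HashMap<>();
        nodeToBlocks.put("rack2/node2", new ArrayList<>());
        nodeToBlocks.put("rack1/node1", new ArrayList<>());
        // Prints the keys in lexicographic order regardless of JDK.
        System.out.println(sortedKeys(nodeToBlocks));
    }
}
```

Any fixed ordering works; the point is that split assignment must not inherit the hash map's unspecified iteration order.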
[jira] [Updated] (MAPREDUCE-5377) JobID is not displayed truly by "hadoop job -history" command
[ https://issues.apache.org/jira/browse/MAPREDUCE-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5377: Labels: BB2015-05-TBR newbie (was: newbie) > JobID is not displayed truly by "hadoop job -history" command > - > > Key: MAPREDUCE-5377 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5377 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv1 >Affects Versions: 1.2.0 >Reporter: Shinichi Yamashita >Assignee: Shinichi Yamashita >Priority: Minor > Labels: BB2015-05-TBR, newbie > Attachments: MAPREDUCE-5377.patch > > > The JobID output by the "hadoop job -history" command is a wrong string. > {quote} > [hadoop@hadoop hadoop]$ hadoop job -history terasort > Hadoop job: 0001_1374260789919_hadoop > = > Job tracker host name: job > job tracker start time: Tue May 18 15:39:51 PDT 1976 > User: hadoop > JobName: TeraSort > JobConf: > hdfs://hadoop:8020/hadoop/mapred/staging/hadoop/.staging/job_201307191206_0001/job.xml > Submitted At: 19-7-2013 12:06:29 > Launched At: 19-7-2013 12:06:30 (0sec) > Finished At: 19-7-2013 12:06:44 (14sec) > Status: SUCCESS > {quote} > In this example, it should show "job_201307191206_0001" at "Hadoop job:", but > shows "0001_1374260789919_hadoop". In addition, "Job tracker host name" and > "job tracker start time" are invalid. > This problem can be solved by fixing the setting of jobId in HistoryViewer(). In > addition, the JobTracker information in HistoryViewer should be fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5403) MR changes to accommodate yarn.application.classpath being moved to the server-side
[ https://issues.apache.org/jira/browse/MAPREDUCE-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5403: Labels: BB2015-05-TBR (was: ) > MR changes to accommodate yarn.application.classpath being moved to the > server-side > --- > > Key: MAPREDUCE-5403 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5403 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: client >Affects Versions: 2.0.5-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-5403-1.patch, MAPREDUCE-5403-2.patch, > MAPREDUCE-5403.patch > > > yarn.application.classpath is a confusing property because it is used by > MapReduce and not YARN, and MapReduce already has > mapreduce.application.classpath, which provides the same functionality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5365) Set mapreduce.job.classloader to true by default
[ https://issues.apache.org/jira/browse/MAPREDUCE-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5365: Labels: BB2015-05-TBR (was: ) > Set mapreduce.job.classloader to true by default > > > Key: MAPREDUCE-5365 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5365 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 2.0.5-alpha >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-5365.patch > > > MAPREDUCE-1700 introduced the mapreduce.job.classloader option, which uses a > custom classloader to separate system classes from user classes. It seems > like there are only rare cases when a user would not want this on, and that > it should be enabled by default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-3807) JobTracker needs fix similar to HDFS-94
[ https://issues.apache.org/jira/browse/MAPREDUCE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-3807: Labels: BB2015-05-TBR newbie (was: newbie) > JobTracker needs fix similar to HDFS-94 > --- > > Key: MAPREDUCE-3807 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-3807 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 1.0.0 >Reporter: Harsh J > Labels: BB2015-05-TBR, newbie > Attachments: MAPREDUCE-3807.patch > > > 1.0 JobTracker's jobtracker.jsp page currently shows: > {code} > Cluster Summary (Heap Size is <%= > StringUtils.byteDesc(Runtime.getRuntime().totalMemory()) %>/<%= > StringUtils.byteDesc(Runtime.getRuntime().maxMemory()) %>) > {code} > It could use an improvement same as HDFS-94 to reflect live heap usage more > accurately. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
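The HDFS-94-style improvement mentioned above amounts to reporting live heap usage rather than only the committed/max sizes from `Runtime`. A hedged sketch of the arithmetic (not the actual jobtracker.jsp change; the class and method names are illustrative):

```java
public class HeapSummary {
    // Live used heap: committed (totalMemory) minus currently free bytes.
    // This is what "reflect live heap usage more accurately" boils down to.
    public static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // A one-line summary in the spirit of the proposed UI fix.
    public static String summary() {
        Runtime rt = Runtime.getRuntime();
        return usedHeapBytes() + " used / " + rt.totalMemory()
                + " committed / " + rt.maxMemory() + " max";
    }

    public static void main(String[] args) {
        System.out.println(summary());
    }
}
```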
[jira] [Updated] (MAPREDUCE-5188) error when verify FileType of RS_SOURCE in getCompanionBlocks in BlockPlacementPolicyRaid.java
[ https://issues.apache.org/jira/browse/MAPREDUCE-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5188: Labels: BB2015-05-TBR contrib/raid (was: contrib/raid) > error when verify FileType of RS_SOURCE in getCompanionBlocks in > BlockPlacementPolicyRaid.java > --- > > Key: MAPREDUCE-5188 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5188 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/raid >Affects Versions: 2.0.2-alpha >Reporter: junjin >Assignee: junjin >Priority: Critical > Labels: BB2015-05-TBR, contrib/raid > Fix For: 2.0.2-alpha > > Attachments: MAPREDUCE-5188.patch > > > There is an error when verifying the FileType of RS_SOURCE in getCompanionBlocks in > BlockPlacementPolicyRaid.java. > xorParityLength in line #379 needs to be changed to rsParityLength, since it is for > verifying the RS_SOURCE type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4346) Adding a refined version of JobTracker.getAllJobs() and exposing through the JobClient
[ https://issues.apache.org/jira/browse/MAPREDUCE-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4346: Labels: BB2015-05-TBR (was: ) > Adding a refined version of JobTracker.getAllJobs() and exposing through the > JobClient > -- > > Key: MAPREDUCE-4346 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4346 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mrv1 >Reporter: Ahmed Radwan >Assignee: Ahmed Radwan > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-4346.patch, MAPREDUCE-4346_rev2.patch, > MAPREDUCE-4346_rev3.patch, MAPREDUCE-4346_rev4.patch > > > The current implementation for JobTracker.getAllJobs() returns all submitted > jobs in any state, in addition to retired jobs. This list can be long and > represents an unneeded overhead especially in the case of clients only > interested in jobs in specific state(s). > It is beneficial to include a refined version where only jobs having specific > statuses are returned and retired jobs are optional to include. > I'll be uploading an initial patch momentarily. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5150) Backport 2009 terasort (MAPREDUCE-639) to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5150: Labels: BB2015-05-TBR (was: ) > Backport 2009 terasort (MAPREDUCE-639) to branch-1 > -- > > Key: MAPREDUCE-5150 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5150 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: examples >Affects Versions: 1.2.0 >Reporter: Gera Shegalov >Priority: Minor > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-5150-branch-1.patch > > > Users evaluate the performance of Hadoop clusters using different benchmarks such > as TeraSort. However, the terasort version in branch-1 is outdated. It works on > a teragen dataset that cannot exceed 4 billion unique keys, and it does not have > the fast non-sampling partitioner SimplePartitioner either. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-3936) Clients should not enforce counter limits
[ https://issues.apache.org/jira/browse/MAPREDUCE-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-3936: Labels: BB2015-05-TBR (was: ) > Clients should not enforce counter limits > -- > > Key: MAPREDUCE-3936 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-3936 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mrv1 >Reporter: Tom White >Assignee: Tom White > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-3936.patch, MAPREDUCE-3936.patch > > > The code for enforcing counter limits (from MAPREDUCE-1943) creates a static > JobConf instance to load the limits, which may throw an exception if the > client limit is set to be lower than the limit on the cluster (perhaps > because the cluster limit was raised from the default). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4469) Resource calculation in child tasks is CPU-heavy
[ https://issues.apache.org/jira/browse/MAPREDUCE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4469: Labels: BB2015-05-TBR (was: ) > Resource calculation in child tasks is CPU-heavy > > > Key: MAPREDUCE-4469 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4469 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: performance, task >Affects Versions: 1.0.3 >Reporter: Todd Lipcon >Assignee: Ahmed Radwan > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-4469.patch, MAPREDUCE-4469_rev2.patch, > MAPREDUCE-4469_rev3.patch, MAPREDUCE-4469_rev4.patch, > MAPREDUCE-4469_rev5.patch > > > In doing some benchmarking on a hadoop-1 derived codebase, I noticed that > each of the child tasks was doing a ton of syscalls. Upon stracing, I noticed > that it's spending a lot of time looping through all the files in /proc to > calculate resource usage. > As a test, I added a flag to disable use of the ResourceCalculatorPlugin > within the tasks. On a CPU-bound 500G-sort workload, this improved total job > runtime by about 10% (map slot-seconds by 14%, reduce slot seconds by 8%) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4271) Make TestCapacityScheduler more robust with non-Sun JDK
[ https://issues.apache.org/jira/browse/MAPREDUCE-4271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4271: Labels: BB2015-05-TBR alt-jdk capacity (was: alt-jdk capacity) > Make TestCapacityScheduler more robust with non-Sun JDK > --- > > Key: MAPREDUCE-4271 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4271 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: capacity-sched >Affects Versions: 1.0.3 >Reporter: Luke Lu >Assignee: Yu Gao > Labels: BB2015-05-TBR, alt-jdk, capacity > Attachments: MAPREDUCE-4271-branch1-v2.patch, > mapreduce-4271-branch-1.patch, test-afterepatch.result, > test-beforepatch.result, test-patch.result > > > The capacity scheduler queue is initialized with a HashMap, the values of > which are later added to a list (a queue for assigning tasks). > TestCapacityScheduler depends on the order of the list and is hence not portable > across JDKs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-1290) DBOutputFormat does not support rewriteBatchedStatements when using MySQL jdbc drivers
[ https://issues.apache.org/jira/browse/MAPREDUCE-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-1290: Labels: BB2015-05-TBR DBOutoutFormat patch (was: DBOutoutFormat patch) > DBOutputFormat does not support rewriteBatchedStatements when using MySQL > jdbc drivers > -- > > Key: MAPREDUCE-1290 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1290 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 0.20.1 >Reporter: Joe Crobak > Labels: BB2015-05-TBR, DBOutoutFormat, patch > Attachments: MAPREDUCE-1290.patch, MapReduce-1290-trunk.patch > > > The DBOutputFormat adds a semi-colon to the end of the INSERT statement that > it uses to save fields to the database. Semicolons are typically used in > command line programs but are not needed when using the JDBC API. In this > case, the stray semi-colon breaks rewriteBatchedStatement support. See: > http://forums.mysql.com/read.php?39,271526,271526#msg-271526 for an example. > In my use case, rewriteBatchedStatement is very useful because it increases > the speed of inserts and reduces memory consumption. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
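The fix described above is simply to omit the trailing semicolon when building the INSERT statement, since JDBC statements do not need one and the stray semicolon defeats the driver's rewriteBatchedStatements optimization. A standalone sketch of such a query builder (illustrative names, not Hadoop's actual DBOutputFormat code):

```java
public class InsertQueryBuilder {
    // Build a parameterized INSERT for the given table and columns.
    // Note: no trailing ';' -- the JDBC API does not require one, and
    // a stray semicolon breaks MySQL's rewriteBatchedStatements support.
    public static String constructQuery(String table, String[] fieldNames) {
        StringBuilder sb = new StringBuilder("INSERT INTO ").append(table);
        sb.append(" (").append(String.join(",", fieldNames)).append(")");
        sb.append(" VALUES (");
        for (int i = 0; i < fieldNames.length; i++) {
            sb.append(i == 0 ? "?" : ",?");
        }
        sb.append(")");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(constructQuery("employees", new String[]{"id", "name"}));
    }
}
```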
[jira] [Updated] (MAPREDUCE-4293) Rumen TraceBuilder gets NPE some times
[ https://issues.apache.org/jira/browse/MAPREDUCE-4293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4293: Labels: BB2015-05-TBR (was: ) > Rumen TraceBuilder gets NPE some times > -- > > Key: MAPREDUCE-4293 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4293 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tools/rumen >Reporter: Ravi Gummadi >Assignee: Ravi Gummadi > Labels: BB2015-05-TBR > Attachments: 4293.patch > > > Rumen TraceBuilder's JobBuilder.processTaskFailedEvent throws NPE if > failedDueToAttempt is not available in history. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4882) Error in estimating the length of the output file in Spill Phase
[ https://issues.apache.org/jira/browse/MAPREDUCE-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4882: Labels: BB2015-05-TBR patch (was: patch) > Error in estimating the length of the output file in Spill Phase > > > Key: MAPREDUCE-4882 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4882 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 0.20.2, 1.0.3 > Environment: Any Environment >Reporter: Lijie Xu >Assignee: Jerry Chen > Labels: BB2015-05-TBR, patch > Attachments: MAPREDUCE-4882.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > The sortAndSpill() method in MapTask.java has an error in estimating the > length of the output file. > The "long size" should be "(bufvoid - bufstart) + bufend" not "(bufvoid - > bufend) + bufstart" when "bufend < bufstart". > Here is the original code in MapTask.java. > private void sortAndSpill() throws IOException, ClassNotFoundException, >InterruptedException { > //approximate the length of the output file to be the length of the > //buffer + header lengths for the partitions > long size = (bufend >= bufstart > ? bufend - bufstart > : (bufvoid - bufend) + bufstart) + > partitions * APPROX_HEADER_LENGTH; > FSDataOutputStream out = null; > -- > I ran a test on "TeraSort". A snippet from a mapper's log is as follows: > MapTask: Spilling map output: record full = true > MapTask: bufstart = 157286200; bufend = 10485460; bufvoid = 199229440 > MapTask: kvstart = 262142; kvend = 131069; length = 655360 > MapTask: Finished spill 3 > In this occasion, Spill Bytes should be (199229440 - 157286200) + 10485460 = > 52428700 (52 MB) because the number of spilled records is 524287 and each > record costs 100B. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
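The corrected estimate from the report can be checked with a few lines of standalone Java (a sketch of the arithmetic, not the MapTask code itself). The live region of the circular buffer wraps around when bufend < bufstart, so the size is the tail from bufstart to bufvoid plus the head up to bufend:

```java
public class SpillSizeEstimate {
    // Bytes of live data in a circular buffer of capacity bufvoid.
    // When bufend has wrapped below bufstart, the live region is
    // (bufvoid - bufstart) + bufend -- the report's corrected formula,
    // not (bufvoid - bufend) + bufstart as in the original code.
    public static long spillBytes(long bufstart, long bufend, long bufvoid) {
        return bufend >= bufstart
                ? bufend - bufstart
                : (bufvoid - bufstart) + bufend;
    }

    public static void main(String[] args) {
        // Values from the TeraSort mapper log quoted above:
        // 524287 records * ~100 B each = ~52 MB.
        System.out.println(spillBytes(157286200L, 10485460L, 199229440L));
    }
}
```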
[jira] [Updated] (MAPREDUCE-3881) building fail under Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-3881: Labels: BB2015-05-TBR (was: ) > building fail under Windows > --- > > Key: MAPREDUCE-3881 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-3881 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: build > Environment: D:\os\hadoopcommon>mvn --version > Apache Maven 3.0.4 (r1232337; 2012-01-17 16:44:56+0800) > Maven home: C:\portable\maven\bin\.. > Java version: 1.7.0_02, vendor: Oracle Corporation > Java home: C:\Program Files (x86)\Java\jdk1.7.0_02\jre > Default locale: zh_CN, platform encoding: GBK > OS name: "windows 7", version: "6.1", arch: "x86", family: "windows" >Reporter: Changming Sun >Priority: Minor > Labels: BB2015-05-TBR > Attachments: pom.xml.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > hadoop-mapreduce-project\hadoop-yarn\hadoop-yarn-common\pom.xml is not > portable:
{code}
<execution>
  <id>generate-version</id>
  <phase>generate-sources</phase>
  <configuration>
    <executable>scripts/saveVersion.sh</executable>
    <arguments>
      <argument>${project.version}</argument>
      <argument>${project.build.directory}</argument>
    </arguments>
  </configuration>
  <goals>
    <goal>exec</goal>
  </goals>
</execution>
{code}
> When I built it under Windows, I got such an error: > [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2:exec (generate-version) on project hadoop-yarn-common: Command execution failed. Cannot run program "scripts\saveVersion.sh" (in directory "D:\os\hadoopcommon\hadoop-mapreduce-project\hadoop-yarn\hadoop-yarn-common"): CreateProcess error=2, > ? -> [Help 1] > We should modify it like this (copied from hadoop-common-project\hadoop-common\pom.xml):
{code}
<mkdir dir="${project.build.directory}/generated-sources/java"/>
<exec executable="sh">
  <arg line="${basedir}/dev-support/saveVersion.sh ${project.version} ${project.build.directory}/generated-sources/java"/>
</exec>
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4998) backport MAPREDUCE-3376: Old mapred API combiner uses NULL reporter to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-4998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4998: Labels: BB2015-05-TBR (was: ) > backport MAPREDUCE-3376: Old mapred API combiner uses NULL reporter to > branch-1 > --- > > Key: MAPREDUCE-4998 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4998 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv1 >Reporter: Jim Donofrio >Priority: Minor > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-4998-branch-1.patch > > > http://s.apache.org/eI9 > backport MAPREDUCE-3376: Old mapred API combiner uses NULL reporter to > branch-1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4473) tasktracker rank on machines.jsp?type=active
[ https://issues.apache.org/jira/browse/MAPREDUCE-4473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4473: Labels: BB2015-05-TBR tasktracker (was: tasktracker) > tasktracker rank on machines.jsp?type=active > > > Key: MAPREDUCE-4473 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4473 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: tasktracker >Affects Versions: 0.20.2, 0.21.0, 0.22.0, 0.23.0, 0.23.1, 1.0.0, 1.0.1, > 1.0.2, 1.0.3 >Reporter: jian fan >Priority: Minor > Labels: BB2015-05-TBR, tasktracker > Attachments: MAPREDUCE-4473.patch > > > Sometimes we need a simple way to judge which tasktracker is down from the > machines.jsp?type=active page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4917) multiple BlockFixer should be supported in order to improve scalability and reduce too much work on single BlockFixer
[ https://issues.apache.org/jira/browse/MAPREDUCE-4917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4917: Labels: BB2015-05-TBR patch (was: patch) > multiple BlockFixer should be supported in order to improve scalability and > reduce too much work on single BlockFixer > - > > Key: MAPREDUCE-4917 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4917 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: contrib/raid >Affects Versions: 0.22.0 >Reporter: Jun Jin >Assignee: Jun Jin > Labels: BB2015-05-TBR, patch > Fix For: 0.22.0 > > Attachments: MAPREDUCE-4917.1.patch, MAPREDUCE-4917.2.patch > > Original Estimate: 672h > Remaining Estimate: 672h > > The current implementation can only run a single BlockFixer, since the fsck (in > RaidDFSUtil.getCorruptFiles) only checks the whole DFS file system. If multiple > BlockFixers were launched, they would all do the same work and try to fix the same files. > The change/fix will be mainly in BlockFixer.java and > RaidDFSUtil.getCorruptFile(), to enable fsck to check the different paths > defined in separate Raid.xml files for a single RaidNode/BlockFixer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-695) MiniMRCluster while shutting down should not wait for currently running jobs to finish
[ https://issues.apache.org/jira/browse/MAPREDUCE-695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-695: --- Labels: BB2015-05-TBR (was: ) > MiniMRCluster while shutting down should not wait for currently running jobs > to finish > -- > > Key: MAPREDUCE-695 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-695 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Affects Versions: 1.0.3 >Reporter: Sreekanth Ramakrishnan >Priority: Minor > Labels: BB2015-05-TBR > Attachments: mapreduce-695.patch > > > Currently in {{org.apache.hadoop.mapred.MiniMRCluster.shutdown()}} we do a > {{waitTaskTrackers()}} which can cause {{MiniMRCluster}} to hang indefinitely > when used in conjunction with Controlled jobs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4261) MRAppMaster throws NPE while stopping RMContainerAllocator service
[ https://issues.apache.org/jira/browse/MAPREDUCE-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4261: Labels: BB2015-05-TBR (was: ) > MRAppMaster throws NPE while stopping RMContainerAllocator service > -- > > Key: MAPREDUCE-4261 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4261 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mr-am, mrv2 >Affects Versions: 2.0.0-alpha, 3.0.0, 2.0.1-alpha, 2.0.2-alpha >Reporter: Devaraj K >Assignee: Devaraj K > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-4261.patch > > > {code:xml} > 2012-05-16 18:55:54,222 INFO [Thread-1] > org.apache.hadoop.yarn.service.CompositeService: Error stopping > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerLauncherRouter > java.lang.NullPointerException > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerLauncherRouter.stop(MRAppMaster.java:716) > at > org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99) > at > org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$MRAppMasterShutdownHook.run(MRAppMaster.java:1036) > at > org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54) > 2012-05-16 18:55:54,222 INFO [Thread-1] > org.apache.hadoop.yarn.service.CompositeService: Error stopping > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter > java.lang.NullPointerException > at > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getStat(RMContainerAllocator.java:521) > at > org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.stop(RMContainerAllocator.java:227) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.stop(MRAppMaster.java:668) > at > org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99) > at > org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89) > at > 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$MRAppMasterShutdownHook.run(MRAppMaster.java:1036) > at > org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-2058) FairScheduler:NullPointerException in web interface when JobTracker not initialized
[ https://issues.apache.org/jira/browse/MAPREDUCE-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-2058: Labels: BB2015-05-TBR (was: ) > FairScheduler:NullPointerException in web interface when JobTracker not > initialized > --- > > Key: MAPREDUCE-2058 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2058 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/fair-share >Affects Versions: 0.22.0, 1.0.4 >Reporter: Dan Adkins > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-2058-branch-1.patch, MAPREDUCE-2058.patch > > > When I contact the jobtracker web interface prior to the job tracker being > fully initialized (say, if hdfs is still in safe mode), I get the following > error: > 10/09/09 18:06:02 ERROR mortbay.log: /jobtracker.jsp > java.lang.NullPointerException > at > org.apache.hadoop.mapred.FairScheduler.getJobs(FairScheduler.java:909) > at > org.apache.hadoop.mapred.JobTracker.getJobsFromQueue(JobTracker.java:4357) > at > org.apache.hadoop.mapred.JobTracker.getQueueInfoArray(JobTracker.java:4334) > at > org.apache.hadoop.mapred.JobTracker.getRootQueues(JobTracker.java:4295) > at > org.apache.hadoop.mapred.jobtracker_jsp.generateSummaryTable(jobtracker_jsp.java:44) > at > org.apache.hadoop.mapred.jobtracker_jsp._jspService(jobtracker_jsp.java:176) > at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) > at > org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1124) > at > org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:857) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1115) > at > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:361) > at > 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) > at > org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) > at > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) > at > org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417) > at > org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) > at > org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) > at org.mortbay.jetty.Server.handle(Server.java:324) > at > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534) > at > org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864) > at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533) > at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207) > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403) > at > org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409) > at > org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4639) CombineFileInputFormat#getSplits should throw IOException when input paths contain a directory
[ https://issues.apache.org/jira/browse/MAPREDUCE-4639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4639: Labels: BB2015-05-TBR (was: ) > CombineFileInputFormat#getSplits should throw IOException when input paths > contain a directory > -- > > Key: MAPREDUCE-4639 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4639 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client >Reporter: Jim Donofrio >Priority: Minor > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-4639.patch > > > FileInputFormat#getSplits throws an IOException when the input paths contain > a directory. CombineFileInputFormat should do the same; otherwise the job will > not fail until the record reader is initialized, when FileSystem#open will say > that the directory does not exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
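The proposed fail-fast check can be sketched as follows. This is a minimal, self-contained illustration using `java.nio.file` rather than the actual Hadoop `FileInputFormat`/`CombineFileInputFormat` code; the class and method names are assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Sketch of the validation FileInputFormat already performs and that
// CombineFileInputFormat#getSplits is proposed to add: reject directories
// at split-computation time instead of failing later in the record reader.
class InputPathCheck {
    /** Throws IOException if any of the given input paths is a directory. */
    static void validateInputFiles(List<Path> inputPaths) throws IOException {
        for (Path p : inputPaths) {
            if (Files.isDirectory(p)) {
                // Same style of message FileInputFormat uses for this case.
                throw new IOException("Not a file: " + p);
            }
        }
    }
}
```

With this check in `getSplits`, the job fails at submission with a clear message instead of surfacing an obscure "does not exist" error from `FileSystem#open` once tasks are already running.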
[jira] [Updated] (MAPREDUCE-2393) No total min share limitation of all pools
[ https://issues.apache.org/jira/browse/MAPREDUCE-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-2393: Labels: BB2015-05-TBR fair scheduler (was: fair scheduler) > No total min share limitation of all pools > -- > > Key: MAPREDUCE-2393 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2393 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/fair-share >Affects Versions: 0.21.0 >Reporter: Denny Ye > Labels: BB2015-05-TBR, fair, scheduler > Attachments: MAPREDUCE-2393.patch > > > hi, there is no limitation tying the min shares of all pools to the cluster's total > shares. A user can define an arbitrary amount of min share for each pool. It has > such a description, but no enforcing code. > It may be critical for slot distribution. One pool can hold all cluster slots to > meet a min share that is far greater than the cluster's total slots. > If that case happens, we should scale the min shares down proportionally. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
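The proportional scale-down the reporter suggests can be sketched like this. It is an illustrative stand-alone version, not the actual FairScheduler code; the class and method names are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the proposed fix: when the configured min shares of all pools
// add up to more than the cluster's total slots, scale each pool's min
// share down in proportion to its configured value.
class MinShareScaler {
    static Map<String, Integer> scaleMinShares(Map<String, Integer> minShares, int totalSlots) {
        int demand = minShares.values().stream().mapToInt(Integer::intValue).sum();
        if (demand <= totalSlots) {
            return minShares; // within cluster capacity, nothing to do
        }
        Map<String, Integer> scaled = new HashMap<>();
        for (Map.Entry<String, Integer> e : minShares.entrySet()) {
            // each pool keeps the same fraction of real capacity, rounded down
            scaled.put(e.getKey(), e.getValue() * totalSlots / demand);
        }
        return scaled;
    }
}
```

For example, pools configured with min shares of 300 and 100 on a 100-slot cluster would be scaled to 75 and 25 instead of one pool claiming every slot.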
[jira] [Updated] (MAPREDUCE-4308) Remove excessive split log messages
[ https://issues.apache.org/jira/browse/MAPREDUCE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4308: Labels: BB2015-05-TBR (was: ) > Remove excessive split log messages > --- > > Key: MAPREDUCE-4308 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4308 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobtracker >Affects Versions: 1.0.3 >Reporter: Kihwal Lee > Labels: BB2015-05-TBR > Attachments: mapreduce-4308-branch-1.patch > > > Job tracker currently prints out information on every split. > {noformat} > 2012-05-20 00:06:01,985 INFO org.apache.hadoop.mapred.JobInProgress: > tip:task_201205100740_1745_m_00 has split on node:/192.168.0.1 > /my.totally.madeup.host.com > {noformat} > I looked at one cluster and these messages were taking up more than 30% of > the JT log. If jobs have large number of maps, it can be worse. I think it is > reasonable to lower the log level of the statement from INFO to DEBUG. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5097) Job.addArchiveToClassPath is ignored when running job with LocalJobRunner
[ https://issues.apache.org/jira/browse/MAPREDUCE-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5097: Labels: BB2015-05-TBR (was: ) > Job.addArchiveToClassPath is ignored when running job with LocalJobRunner > - > > Key: MAPREDUCE-5097 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5097 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Reporter: Alex Baranau >Assignee: Alex Baranau >Priority: Minor > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-5097-ugly-test.patch, MAPREDUCE-5097.patch > > > Using external dependency jar in mr job. Adding it to the job classpath via > Job.addArchiveToClassPath(...) doesn't work when running with LocalJobRunner > (i.e. in unit test). This makes it harder to unit-test such jobs (with > third-party runtime dependencies). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4136) Hadoop streaming might succeed even though reducer fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-4136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4136: Labels: BB2015-05-TBR (was: ) > Hadoop streaming might succeed even though reducer fails > - > > Key: MAPREDUCE-4136 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4136 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.20.205.0 >Reporter: Wouter de Bie > Labels: BB2015-05-TBR > Attachments: mapreduce-4136.patch > > > Hadoop streaming can report success even though the reducer has failed. This > happens when Hadoop calls {{PipeReducer.close()}} but in the meantime the > reducer has failed and the process has died. When {{clientOut_.flush()}} > throws an {{IOException}} in {{PipeMapRed.mapRedFinish()}}, this exception is > caught but only logged. The exit status of the child process is never checked, > and the task is marked as successful. > I've attached a patch that seems to fix it for us. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
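The shape of the fix, checking the child's exit status instead of only logging the flush failure, can be sketched generically with `java.lang.Process`. This is an illustration of the idea, not the actual `PipeMapRed` code or the attached patch.

```java
import java.io.IOException;

// Sketch: after flushing/closing the pipe to the streaming child process,
// wait for it and propagate a nonzero exit status, rather than swallowing
// the IOException and letting the task be marked successful.
class PipeFinish {
    static void finish(Process child) throws IOException, InterruptedException {
        try {
            child.getOutputStream().flush();
            child.getOutputStream().close();
        } catch (IOException e) {
            // The child may already be dead; don't decide success here.
            // The exit-status check below is authoritative.
        }
        int exit = child.waitFor();
        if (exit != 0) {
            throw new IOException("subprocess exited with code " + exit);
        }
    }
}
```

With this pattern a reducer that dies mid-run always surfaces as a task failure, even when the only visible symptom was a broken pipe on flush.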
[jira] [Updated] (MAPREDUCE-3882) fix some compile warnings of hadoop-mapreduce-examples
[ https://issues.apache.org/jira/browse/MAPREDUCE-3882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-3882: Labels: BB2015-05-TBR (was: ) > fix some compile warnings of hadoop-mapreduce-examples > -- > > Key: MAPREDUCE-3882 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-3882 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Environment: Windows 7 >Reporter: Changming Sun >Priority: Minor > Labels: BB2015-05-TBR > Attachments: mapreduce-3882.patch > > Original Estimate: 2m > Remaining Estimate: 2m > > fix some compile warnings of hadoop-mapreduce-examples -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4482) Backport MR sort plugin(MAPREDUCE-2454) to Hadoop 1.2
[ https://issues.apache.org/jira/browse/MAPREDUCE-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4482: Labels: BB2015-05-TBR (was: ) > Backport MR sort plugin(MAPREDUCE-2454) to Hadoop 1.2 > - > > Key: MAPREDUCE-4482 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4482 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: mrv1 >Affects Versions: 1.2.0 >Reporter: Mariappan Asokan >Assignee: Mariappan Asokan > Labels: BB2015-05-TBR > Attachments: HadoopSortPlugin.pdf, > mapreduce-4482-release-1.1.0-rc4.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4506) EofException / 'connection reset by peer' while copying map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-4506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4506: Labels: BB2015-05-TBR (was: ) > EofException / 'connection reset by peer' while copying map output > --- > > Key: MAPREDUCE-4506 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4506 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 1.0.3 > Environment: Ubuntu Linux 12.04 LTS, 64-bit, Java 6 update 33 >Reporter: Piotr Kołaczkowski >Priority: Minor > Labels: BB2015-05-TBR > Attachments: RamManager.patch, ReduceTask.patch > > > When running complex mapreduce jobs with many mappers and reducers (e.g. 8 > mappers, 8 reducers on a 8 core machine), sometimes the following exceptions > pop up in the logs during the shuffle phase: > {noformat} > WARN [570516323@qtp-2060060479-164] 2012-07-19 02:50:21,229 TaskTracker.java > (line 3894) getMapOutput(attempt_201207161621_0217_m_71_0,0) failed : > org.mortbay.jetty.EofException > at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:787) > at > org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:568) > at > org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:1005) > at > org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:648) > at > org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:579) > at > org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3872) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) > at > org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1166) > at > org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:835) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157) > at > 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388) > at > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) > at > org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) > at > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765) > at > org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418) > at > org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) > at > org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) > at org.mortbay.jetty.Server.handle(Server.java:326) > at > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) > at > org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:923) > at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:547) > at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) > at > org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409) > at > org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) > Caused by: java.io.IOException: Connection reset by peer > at sun.nio.ch.FileDispatcher.write0(Native Method) > at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29) > at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:72) > at sun.nio.ch.IOUtil.write(IOUtil.java:43) > at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334) > at org.mortbay.io.nio.ChannelEndPoint.flush(ChannelEndPoint.java:169) > at > org.mortbay.io.nio.SelectChannelEndPoint.flush(SelectChannelEndPoint.java:221) > at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:721) > {noformat} > The problem looks like some network problems at first, however it turns out > that hadoop shuffleInMemory sometimes deliberately closes map-output-copy > connections just to reopen them a few milliseconds later, because of > 
temporary unavailability of free memory. Because the sending side does not > expect this, an exception is thrown. Additionally this leads to wasting > resources on the sender side, which does more work than required serving > additional requests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4956) The Additional JH Info Should Be Exposed
[ https://issues.apache.org/jira/browse/MAPREDUCE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-4956: Labels: BB2015-05-TBR (was: ) > The Additional JH Info Should Be Exposed > > > Key: MAPREDUCE-4956 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4956 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Zhijie Shen >Assignee: Zhijie Shen > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-4956_1.patch, MAPREDUCE-4956_2.patch, > MAPREDUCE-4956_3.patch > > > In MAPREDUCE-4838, additional info was added to JH. This info is > worth exposing, at least via the UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-3876) vertica query, sql command not properly ended
[ https://issues.apache.org/jira/browse/MAPREDUCE-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-3876: Labels: BB2015-05-TBR hadoop newbie patch (was: hadoop newbie patch) > vertica query, sql command not properly ended > - > > Key: MAPREDUCE-3876 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-3876 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Affects Versions: 1.0.0 > Environment: Red Hat 5.5 > Oracle 11 >Reporter: Joseph Doss > Labels: BB2015-05-TBR, hadoop, newbie, patch > Attachments: HADOOP-oracleDriver-src.patch > > > When running a test script, we're getting a java IO exception thrown. > This test works on hadoop-0.20.0 but not on hadoop-1.0.0. > Fri Feb 17 11:36:40 EST 2012 > Running processes with name syncGL.sh: 0 > LIB_JARS: > /home/hadoop/verticasync/lib/vertica_4.1.14_jdk_5.jar,/home/hadoop/verticasync/lib/mail.jar,/home/hadoop/verticasync/lib/jdbc14.jar > VERTICA_SYNC_JAR: /home/hadoop/verticasync/lib/vertica-sync.jar > PROPERTIES_FILE: > /home/hadoop/verticasync/config/ssp-vertica-sync-gl.properties > Starting Vertica data sync - GL - process > Warning: $HADOOP_HOME is deprecated. 
> 12/02/17 11:36:43 INFO mapred.JobClient: Running job: job_201202171122_0001 > 12/02/17 11:36:44 INFO mapred.JobClient: map 0% reduce 0% > 12/02/17 11:36:56 INFO mapred.JobClient: Task Id : > attempt_201202171122_0001_m_00_0, Status : FAILED > java.io.IOException: ORA-00933: SQL command not properly ended > at > org.apache.hadoop.mapred.lib.db.DBInputFormat.getRecordReader(DBInputFormat.java:289) > at > org.apache.hadoop.mapred.MapTask$TrackedRecordReader.(MapTask.java:197) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372) > at org.apache.hadoop.mapred.Child$4.run(Child.java:255) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083) > at org.apache.hadoop.mapred.Child.main(Child.java:249) > 12/02/17 11:36:57 INFO mapred.JobClient: Task Id : > attempt_201202171122_0001_m_01_0, Status : FAILED > java.io.IOException: ORA-00933: SQL command not properly ended > at > org.apache.hadoop.mapred.lib.db.DBInputFormat.getRecordReader(DBInputFormat.java:289) > at > org.apache.hadoop.mapred.MapTask$TrackedRecordReader.(MapTask.java:197) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372) > at org.apache.hadoop.mapred.Child$4.run(Child.java:255) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083) > at org.apache.hadoop.mapred.Child.main(Child.java:249) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6342) Make POM project names consistent
[ https://issues.apache.org/jira/browse/MAPREDUCE-6342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6342: Labels: BB2015-05-TBR (was: ) > Make POM project names consistent > - > > Key: MAPREDUCE-6342 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6342 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: build >Reporter: Rohith >Assignee: Rohith >Priority: Minor > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-6342.patch > > > This is to track the MR changes for making POM project names consistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5799) add default value of MR_AM_ADMIN_USER_ENV
[ https://issues.apache.org/jira/browse/MAPREDUCE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5799: Labels: BB2015-05-TBR (was: ) > add default value of MR_AM_ADMIN_USER_ENV > - > > Key: MAPREDUCE-5799 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5799 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 2.3.0 >Reporter: Liyin Liang >Assignee: Rajesh Kartha > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-5799-1.diff, MAPREDUCE-5799.002.patch, > MAPREDUCE-5799.diff > > > Submit a 1 map + 1 reduce sleep job with the following config: > {code} > > mapreduce.map.output.compress > true > > > mapreduce.map.output.compress.codec > org.apache.hadoop.io.compress.SnappyCodec > > > mapreduce.job.ubertask.enable > true > > {code} > And the LinuxContainerExecutor is enable on NodeManager. > This job will fail with the following error: > {code} > 2014-03-18 21:28:20,153 FATAL [uber-SubtaskRunner] > org.apache.hadoop.mapred.LocalContainerLauncher: Error running local > (uberized) 'child' : java.lang.UnsatisfiedLinkError: > org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z > at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native > Method) > at > org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63) > at > org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:132) > at > org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148) > at > org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:163) > at org.apache.hadoop.mapred.IFile$Writer.(IFile.java:115) > at > org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1583) > at > org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1462) > at > org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:700) > at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1990) > at 
org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:774) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340) > at > org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.runSubtask(LocalContainerLauncher.java:317) > at > org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.run(LocalContainerLauncher.java:232) > at java.lang.Thread.run(Thread.java:662) > {code} > When creating a ContainerLaunchContext for a task in > TaskAttemptImpl.createCommonContainerLaunchContext(), the > DEFAULT_MAPRED_ADMIN_USER_ENV, which is > "LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native", is added to the environment. > Whereas when creating a ContainerLaunchContext for the MR AppMaster in > YARNRunner.createApplicationSubmissionContext(), there is no default > environment, so the uber-mode job fails to find the native lib. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6350) JobHistory doesn't support fully-functional search
[ https://issues.apache.org/jira/browse/MAPREDUCE-6350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6350: Labels: BB2015-05-TBR (was: ) > JobHistory doesn't support fully-functional search > -- > > Key: MAPREDUCE-6350 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6350 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobhistoryserver >Reporter: Siqi Li >Assignee: Siqi Li >Priority: Critical > Labels: BB2015-05-TBR > Attachments: YARN-1614.v1.patch, YARN-1614.v2.patch > > > job history server will only output the first 50 characters of the job names > in webUI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6284) Add a 'task attempt state' to MapReduce Application Master REST API
[ https://issues.apache.org/jira/browse/MAPREDUCE-6284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6284: Labels: BB2015-05-TBR (was: ) > Add a 'task attempt state' to MapReduce Application Master REST API > --- > > Key: MAPREDUCE-6284 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6284 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Ryu Kobayashi >Priority: Minor > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-6284.1.patch, MAPREDUCE-6284.1.patch, > MAPREDUCE-6284.2.patch, MAPREDUCE-6284.3.patch, MAPREDUCE-6284.3.patch > > > This proposes a 'task attempt state' resource in the REST API, similar to the existing 'App state' resource: > GET http://<proxy>/proxy/<application_id>/ws/v1/mapreduce/jobs/<job_id>/tasks/<task_id>/attempts/<attempt_id>/state > PUT http://<proxy>/proxy/<application_id>/ws/v1/mapreduce/jobs/<job_id>/tasks/<task_id>/attempts/<attempt_id>/state > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6338) MR AppMaster does not honor ephemeral port range
[ https://issues.apache.org/jira/browse/MAPREDUCE-6338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6338: Labels: BB2015-05-TBR (was: ) > MR AppMaster does not honor ephemeral port range > > > Key: MAPREDUCE-6338 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6338 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mr-am, mrv2 >Affects Versions: 2.6.0 >Reporter: Frank Nguyen >Assignee: Frank Nguyen > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-6338.002.patch > > > The MR AppMaster should only use port ranges defined in the > yarn.app.mapreduce.am.job.client.port-range property. On initial startup of > the MRAppMaster, it does use the port range defined in the property. > However, it also opens up a listener on a random ephemeral port. This is not > the Jetty listener. It is another listener opened by the MRAppMaster via > another thread and is recognized by the RM. Other nodes will try to > communicate with it via that random port. With firewall settings on, the MR > job will fail because the random port is not opened. This problem has forced > others to open all OS ephemeral ports just to get MR jobs to run. > This is related to MAPREDUCE-4079 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6332) Add more required API's to MergeManager interface
[ https://issues.apache.org/jira/browse/MAPREDUCE-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6332: Labels: BB2015-05-TBR (was: ) > Add more required API's to MergeManager interface > -- > > Key: MAPREDUCE-6332 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6332 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Affects Versions: 2.5.0, 2.6.0, 2.7.0 >Reporter: Rohith >Assignee: Rohith > Labels: BB2015-05-TBR > Attachments: 0001-MAPREDUCE-6332.patch, 0002-MAPREDUCE-6332.patch > > > MR lets the user plug in a custom ShuffleConsumerPlugin using > *mapreduce.job.reduce.shuffle.consumer.plugin.class*. A user who plugs in a > ShuffleConsumerPlugin this way is also likely interested in > implementing their own MergeManagerImpl. > But currently the user is forced to use the MR-provided MergeManagerImpl instead of > a custom MergeManagerImpl when using the shuffle consumer plugin class. > There should be well-defined APIs in MergeManager that any > implementation can use, so a custom implementation takes little extra effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5465) Container killed before hprof dumps profile.out
[ https://issues.apache.org/jira/browse/MAPREDUCE-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5465: Labels: BB2015-05-TBR (was: ) > Container killed before hprof dumps profile.out > --- > > Key: MAPREDUCE-5465 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5465 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mr-am, mrv2 >Reporter: Radim Kolar >Assignee: Ming Ma > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-5465-2.patch, MAPREDUCE-5465-3.patch, > MAPREDUCE-5465-4.patch, MAPREDUCE-5465-5.patch, MAPREDUCE-5465-6.patch, > MAPREDUCE-5465-7.patch, MAPREDUCE-5465-8.patch, MAPREDUCE-5465-9.patch, > MAPREDUCE-5465.patch > > > If profiling is enabled for a mapper or reducer, hprof dumps > profile.out at process exit. The dump happens after the task has signaled to the AM that its work > is finished. > The AM kills the container of the finished task without waiting for hprof to finish its > dumps. If hprof is dumping larger outputs (such as with depth=4, while depth=3 > works), it cannot finish the dump before being killed, making the entire > dump unusable because CPU and heap stats are missing. > There needs to be a better delay before the container is killed when profiling is > enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
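The requested grace period could take a shape like the following sketch: before killing a finished task's container, poll briefly for the profiler output. This is purely illustrative; the file-based signal, names, and timeout are assumptions, not the actual MR AppMaster implementation or the attached patches.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: give hprof a bounded grace period to write its output before the
// container is killed, instead of killing immediately on task completion.
class ProfileGrace {
    /** Returns true if profileOut appeared within timeoutMillis. */
    static boolean awaitProfileDump(Path profileOut, long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (Files.exists(profileOut)) {
                return true; // profile output is present; proceed with the kill
            }
            Thread.sleep(50); // poll rather than killing the container at once
        }
        return false; // grace period expired; kill anyway
    }
}
```

A real implementation would also need to confirm the dump is complete (existence alone is not enough once hprof starts writing), but the bounded wait is the core of the proposal.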
[jira] [Updated] (MAPREDUCE-5733) Define and use a constant for property "textinputformat.record.delimiter"
[ https://issues.apache.org/jira/browse/MAPREDUCE-5733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-5733: Labels: BB2015-05-TBR (was: ) > Define and use a constant for property "textinputformat.record.delimiter" > - > > Key: MAPREDUCE-5733 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5733 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Gelesh >Assignee: Gelesh >Priority: Trivial > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-5733.patch, MAPREDUCE-5733_2.patch > > Original Estimate: 10m > Remaining Estimate: 10m > > (Configuration) conf.set("textinputformat.record.delimiter","myDelimiter") > is prone to typos. Let's have it as a static String in some class, to > minimise such errors. This would also let IDEs like Eclipse suggest the > String. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
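The proposed constant is a one-liner; the sketch below shows the idea. The class and field names are illustrative, not necessarily where the patch places them.

```java
// Sketch of the proposed constant for the TextInputFormat delimiter key.
class TextInputFormatKeys {
    /** Configuration key for the record delimiter used by TextInputFormat. */
    static final String RECORD_DELIMITER = "textinputformat.record.delimiter";
}
```

Callers then write `conf.set(TextInputFormatKeys.RECORD_DELIMITER, "myDelimiter")` instead of repeating the raw string, so a typo becomes a compile-time error and the IDE can autocomplete the key.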
[jira] [Updated] (MAPREDUCE-6316) Task Attempt List entries should link to the task overview
[ https://issues.apache.org/jira/browse/MAPREDUCE-6316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6316: Labels: BB2015-05-TBR (was: ) > Task Attempt List entries should link to the task overview > -- > > Key: MAPREDUCE-6316 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6316 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Siqi Li >Assignee: Siqi Li > Labels: BB2015-05-TBR > Attachments: AM attempt page.png, AM task page.png, All Attempts > page.png, MAPREDUCE-6316.v1.patch, MAPREDUCE-6316.v2.patch, > MAPREDUCE-6316.v3.patch, Task Overview page.png > > > The typical workflow is to click on the list of failed attempts. Then you want to > look at the counters, or at the list of attempts of just one task in general. If > the task-id portion of each task attempt id in the list linked back to the task, > we would not have to go through the list of tasks to search for the task. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6305) AM/Task log page should be able to link back to the job
[ https://issues.apache.org/jira/browse/MAPREDUCE-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6305: Labels: BB2015-05-TBR (was: ) > AM/Task log page should be able to link back to the job > --- > > Key: MAPREDUCE-6305 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6305 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Siqi Li >Assignee: Siqi Li > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-6305.v1.patch, MAPREDUCE-6305.v2.patch, > MAPREDUCE-6305.v3.patch, MAPREDUCE-6305.v4.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6336) Enable v2 FileOutputCommitter by default
[ https://issues.apache.org/jira/browse/MAPREDUCE-6336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6336: Labels: BB2015-05-TBR (was: ) > Enable v2 FileOutputCommitter by default > > > Key: MAPREDUCE-6336 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6336 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: mrv2 >Affects Versions: 2.7.0 >Reporter: Gera Shegalov >Assignee: Siqi Li > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-6336.v1.patch > > > This JIRA is to propose making new FileOutputCommitter behavior from > MAPREDUCE-4815 enabled by default in trunk, and potentially in branch-2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6269) improve JobConf to add option to not share Credentials between jobs.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6269: Labels: BB2015-05-TBR (was: ) > improve JobConf to add option to not share Credentials between jobs. > > > Key: MAPREDUCE-6269 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6269 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: client >Reporter: zhihai xu >Assignee: zhihai xu > Labels: BB2015-05-TBR > Attachments: MAPREDUCE-6269.000.patch > > > Improve JobConf by adding a constructor that avoids sharing Credentials between jobs. > By default the Credentials will be shared to keep backward compatibility. > We can add a new constructor with a new parameter to decide whether to share > Credentials. Some issues reported in Cascading are due to corrupted credentials; > see > https://github.com/Cascading/cascading/commit/45b33bb864172486ac43782a4d13329312d01c0e > If we add this support in JobConf, it will benefit all job clients. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6241) Native compilation fails for Checksum.cc due to an incompatibility of assembler register constraint for PowerPC
[ https://issues.apache.org/jira/browse/MAPREDUCE-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated MAPREDUCE-6241: Labels: BB2015-05-TBR features (was: features) > Native compilation fails for Checksum.cc due to an incompatibility of > assembler register constraint for PowerPC > > > Key: MAPREDUCE-6241 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6241 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: build >Affects Versions: 3.0.0, 2.6.0 > Environment: Debian/Jessie, kernel 3.18.5, ppc64 GNU/Linux > gcc (Debian 4.9.1-19) > protobuf 2.6.1 > OpenJDK Runtime Environment (IcedTea 2.5.3) (7u71-2.5.3-2) > OpenJDK Zero VM (build 24.65-b04, interpreted mode) > source was cloned (and updated) from Apache-Hadoop's git repository >Reporter: Stephan Drescher >Assignee: Binglin Chang >Priority: Minor > Labels: BB2015-05-TBR, features > Attachments: MAPREDUCE-6241.001.patch, MAPREDUCE-6241.002.patch > > > Issue when using assembler code for performance optimization on the powerpc > platform (compiled for 32bit) > mvn compile -Pnative -DskipTests > [exec] /usr/bin/c++ -Dnativetask_EXPORTS -m32 -DSIMPLE_MEMCPY > -fno-strict-aliasing -Wall -Wno-sign-compare -g -O2 -DNDEBUG -fPIC > -I/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/target/native/javah > > -I/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src > > -I/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/util > > -I/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/lib > > -I/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/test > > 
-I/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src > > -I/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/target/native > -I/home/hadoop/Java/java7/include -I/home/hadoop/Java/java7/include/linux > -isystem > /home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/gtest/include > -o CMakeFiles/nativetask.dir/main/native/src/util/Checksum.cc.o -c > /home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/util/Checksum.cc > [exec] CMakeFiles/nativetask.dir/build.make:744: recipe for target > 'CMakeFiles/nativetask.dir/main/native/src/util/Checksum.cc.o' failed > [exec] make[2]: Leaving directory > '/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/target/native' > [exec] CMakeFiles/Makefile2:95: recipe for target > 'CMakeFiles/nativetask.dir/all' failed > [exec] make[1]: Leaving directory > '/home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/target/native' > [exec] Makefile:76: recipe for target 'all' failed > [exec] > /home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/util/Checksum.cc: > In function ‘void NativeTask::init_cpu_support_flag()’: > /home/hadoop/Developer/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/util/Checksum.cc:611:14: > error: impossible register constraint in ‘asm’ > --> > "popl %%ebx" : "=a" (eax), [ebx] "=r"(ebx), "=c"(ecx), "=d"(edx) : "a" > (eax_in) : "cc"); > <-- -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6246) DBOutputFormat.java appending extra semicolon to query which is incompatible with DB2
[ https://issues.apache.org/jira/browse/MAPREDUCE-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-6246:
    Labels: BB2015-05-TBR DB2 mapreduce  (was: DB2 mapreduce)

> DBOutputFormat.java appending extra semicolon to query which is incompatible
> with DB2
>
>                 Key: MAPREDUCE-6246
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6246
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1, mrv2
>    Affects Versions: 2.4.1
>        Environment: OS: RHEL 5.x, RHEL 6.x, SLES 11.x
>                     Platform: xSeries, pSeries
>                     Browser: Firefox, IE
>                     Security Settings: No Security, Flat file, LDAP, PAM
>                     File System: HDFS, GPFS FPO
>           Reporter: ramtin
>           Assignee: ramtin
>             Labels: BB2015-05-TBR, DB2, mapreduce
>        Attachments: MAPREDUCE-6246.002.patch, MAPREDUCE-6246.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> DBOutputFormat is used for writing the output of MapReduce jobs to a database, and when used with DB2 JDBC drivers it fails with the following error:
> {noformat}
> com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, SQLSTATE=42601, SQLERRMC=;;,COUNT) VALUES (?,?);END-OF-STATEMENT, DRIVER=4.16.53
>     at com.ibm.db2.jcc.am.fd.a(fd.java:739)
>     at com.ibm.db2.jcc.am.fd.a(fd.java:60)
>     at com.ibm.db2.jcc.am.fd.a(fd.java:127)
> {noformat}
> The DBOutputFormat class has a constructQuery method that generates an "INSERT INTO" statement with a semicolon (";") at the end.
> The semicolon is the ANSI SQL-92 standard character for a statement terminator, but this feature is disabled (OFF) by default in IBM DB2, although it can be turned ON for DB2 with the -t option
> (http://www-01.ibm.com/support/knowledgecenter/SSEPGG_9.7.0/com.ibm.db2.luw.admin.cmd.doc/doc/r0010410.html?cp=SSEPGG_9.7.0%2F3-6-2-0-2).
> However, some products are already built on top of the default setting (OFF), so turning this feature ON would make them error-prone.
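The fix the report implies can be sketched as follows. This is a hypothetical, self-contained rendering of an INSERT builder in the shape of DBOutputFormat's constructQuery, simplified (no update/condition handling) and not the actual Hadoop source; the point is only that the statement ends without the ";" terminator that DB2 rejects by default.

```java
// Hypothetical sketch of an INSERT builder like DBOutputFormat#constructQuery,
// simplified, and WITHOUT the trailing ";" that DB2 rejects by default.
public class QuerySketch {
    public static String constructQuery(String table, String[] fieldNames) {
        StringBuilder query = new StringBuilder("INSERT INTO ").append(table);
        query.append(" (");
        for (int i = 0; i < fieldNames.length; i++) {
            query.append(fieldNames[i]);
            if (i != fieldNames.length - 1) {
                query.append(",");
            }
        }
        query.append(") VALUES (");
        for (int i = 0; i < fieldNames.length; i++) {
            query.append("?");
            if (i != fieldNames.length - 1) {
                query.append(",");
            }
        }
        // No trailing semicolon: the JDBC driver and server handle statement
        // termination, so the text ends at the closing parenthesis.
        query.append(")");
        return query.toString();
    }
}
```

For example, constructQuery("ACCESS", new String[]{"URL", "COUNT"}) produces INSERT INTO ACCESS (URL,COUNT) VALUES (?,?) with no terminator, which both DB2 and terminator-tolerant databases accept.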
[jira] [Updated] (MAPREDUCE-4683) We need to fix our build to create/distribute hadoop-mapreduce-client-core-tests.jar
[ https://issues.apache.org/jira/browse/MAPREDUCE-4683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-4683:
    Labels: BB2015-05-TBR  (was: )

> We need to fix our build to create/distribute
> hadoop-mapreduce-client-core-tests.jar
>
>                 Key: MAPREDUCE-4683
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4683
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: build
>           Reporter: Arun C Murthy
>           Assignee: Akira AJISAKA
>           Priority: Critical
>             Labels: BB2015-05-TBR
>        Attachments: MAPREDUCE-4683.patch
>
> We need to fix our build to create/distribute hadoop-mapreduce-client-core-tests.jar; we need this before MAPREDUCE-4253.
[jira] [Updated] (MAPREDUCE-6068) Illegal progress value warnings in map tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-6068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-6068:
    Labels: BB2015-05-TBR  (was: )

> Illegal progress value warnings in map tasks
>
>                 Key: MAPREDUCE-6068
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6068
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2, task
>    Affects Versions: 3.0.0
>           Reporter: Todd Lipcon
>           Assignee: Binglin Chang
>             Labels: BB2015-05-TBR
>        Attachments: MAPREDUCE-6068.002.patch, MAPREDUCE-6068.v1.patch
>
> When running a terasort on latest trunk, I see the following in my task logs:
> {code}
> 2014-09-02 17:42:28,437 INFO [main] org.apache.hadoop.mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
> 2014-09-02 17:42:42,238 WARN [main] org.apache.hadoop.util.Progress: Illegal progress value found, progress is larger than 1. Progress will be changed to 1
> 2014-09-02 17:42:42,238 WARN [main] org.apache.hadoop.util.Progress: Illegal progress value found, progress is larger than 1. Progress will be changed to 1
> 2014-09-02 17:42:42,241 INFO [main] org.apache.hadoop.mapred.MapTask: Starting flush of map output
> {code}
> We should eliminate these warnings.
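The warning means some component is reporting a progress value above 1.0, which the framework then caps. A minimal sketch of that kind of normalization is below; ProgressClamp is an illustrative stand-in, not the actual org.apache.hadoop.util.Progress code, and the NaN handling is an assumption about reasonable behavior rather than something stated in the report.

```java
// Illustrative clamp of a task progress value into [0, 1]. Hadoop logs the
// "Illegal progress value" warning before applying a correction of this
// kind; this class is a stand-in for that logic, not the Hadoop source.
public class ProgressClamp {
    public static float clamp(float progress) {
        if (Float.isNaN(progress) || progress < 0.0f) {
            return 0.0f; // treat NaN and negative values as "no progress" (assumed policy)
        }
        if (progress > 1.0f) {
            return 1.0f; // "progress is larger than 1. Progress will be changed to 1"
        }
        return progress;
    }
}
```

The real fix for the JIRA is upstream of the clamp: whatever reports a value above 1 (here, the map-side sort during terasort) should compute its fraction correctly so the clamp never fires.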
[jira] [Updated] (MAPREDUCE-6315) Implement retrieval of logs for crashed MR-AM via jhist in the staging directory
[ https://issues.apache.org/jira/browse/MAPREDUCE-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-6315:
    Labels: BB2015-05-TBR  (was: )

> Implement retrieval of logs for crashed MR-AM via jhist in the staging
> directory
>
>                 Key: MAPREDUCE-6315
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6315
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: client, mr-am
>    Affects Versions: 2.7.0
>           Reporter: Gera Shegalov
>           Assignee: Gera Shegalov
>           Priority: Critical
>             Labels: BB2015-05-TBR
>        Attachments: MAPREDUCE-6315.001.patch
>
> When all AM attempts crash, there is no record of them in the JHS (JobHistoryServer), and thus no easy way to get the logs. This JIRA automates the procedure by utilizing the jhist file in the staging directory.
[jira] [Updated] (MAPREDUCE-6298) Job#toString throws an exception when not in state RUNNING
[ https://issues.apache.org/jira/browse/MAPREDUCE-6298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-6298:
    Labels: BB2015-05-TBR  (was: )

> Job#toString throws an exception when not in state RUNNING
>
>                 Key: MAPREDUCE-6298
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6298
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>           Reporter: Lars Francke
>           Assignee: Lars Francke
>           Priority: Minor
>             Labels: BB2015-05-TBR
>        Attachments: MAPREDUCE-6298.1.patch
>
> Job#toString calls {{ensureState(JobState.RUNNING);}} as the very first thing, and that method throws an exception when the job is not running, which is not nice.
> One thing this breaks is usage of Job in the Scala (e.g. Spark) REPL, because the REPL calls toString after every invocation, and that fails every time.
> I'll attach a patch that checks the state: if it's RUNNING, the original message is printed, and if not, something else is printed instead.
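The patch idea described above (branching on state instead of calling ensureState) can be sketched like this. JobState and the fields here are simplified stand-ins for the real org.apache.hadoop.mapreduce.Job internals, so treat it as an illustration of the approach, not the attached patch.

```java
// Hedged sketch of a state-aware toString(): print full details only when
// the job is RUNNING, and a reduced message otherwise, instead of letting
// ensureState(JobState.RUNNING) throw from inside toString(). The enum and
// fields are simplified stand-ins for the real Job class.
public class JobToStringSketch {
    public enum JobState { DEFINE, RUNNING }

    private final JobState state;
    private final String jobName;

    public JobToStringSketch(JobState state, String jobName) {
        this.state = state;
        this.jobName = jobName;
    }

    @Override
    public String toString() {
        // REPLs (e.g. the Scala/Spark shell) call toString() after every
        // expression, so this method must never throw.
        if (state != JobState.RUNNING) {
            return "Job[name=" + jobName + ", state=" + state + "]";
        }
        return "Job[name=" + jobName + ", state=RUNNING]";
    }
}
```

With this shape, a job that is still in its initial state renders as a short summary instead of raising IllegalStateException, which is exactly the REPL-friendliness the report asks for.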
[jira] [Updated] (MAPREDUCE-6320) Configuration of retrieved Job via Cluster is not properly set-up
[ https://issues.apache.org/jira/browse/MAPREDUCE-6320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer updated MAPREDUCE-6320:
    Labels: BB2015-05-TBR  (was: )

> Configuration of retrieved Job via Cluster is not properly set-up
>
>                 Key: MAPREDUCE-6320
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6320
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>           Reporter: Jens Rabe
>           Assignee: Jens Rabe
>             Labels: BB2015-05-TBR
>        Attachments: MAPREDUCE-6320.001.patch, MAPREDUCE-6320.002.patch, MAPREDUCE-6320.003.patch
>
> When getting a Job via the Cluster API, it is not correctly configured.
> To reproduce this:
> # Submit an MR job, and set some arbitrary parameter in its configuration:
> {code:java}
> job.getConfiguration().set("foo", "bar");
> job.setJobName("foo-bug-demo");
> {code}
> # Get the job in a client:
> {code:java}
> final Cluster c = new Cluster(conf);
> final JobStatus[] statuses = c.getAllJobStatuses();
> final JobStatus s = ... // get the status for the job named foo-bug-demo
> final Job j = c.getJob(s.getJobId());
> final Configuration jobConf = j.getConfiguration();
> {code}
> # Get its "foo" entry:
> {code:java}
> final String value = jobConf.get("foo");
> {code}
> # Expected: value is "bar". But: value is null.
> The reason is that the job's configuration is stored on HDFS (the Configuration has a resource with a *hdfs://* URL), and in *loadResource* it is changed to a path on the local file system (hdfs://host.domain:port/tmp/hadoop-yarn/... is changed to /tmp/hadoop-yarn/...), which does not exist, so the configuration is not populated.
> The bug happens in the *Cluster* class, where *JobConfs* are created from *status.getJobFile()*. A quick fix would be to copy this job file to a temporary file on the local file system and populate the JobConf from that file.