[jira] [Commented] (MAPREDUCE-6321) Map tasks take a lot of time to start up
[ https://issues.apache.org/jira/browse/MAPREDUCE-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524211#comment-14524211 ]

Ray Chiang commented on MAPREDUCE-6321:
---

I'd suggest running again with the fix from YARN-2990 and seeing if the times go down. Release 2.7.0 should have the fix.

Map tasks take a lot of time to start up

Key: MAPREDUCE-6321
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6321
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv2
Affects Versions: 2.6.0
Reporter: Rajat Jain
Priority: Critical
Labels: performance

I have noticed repeatedly that map tasks take a long time to start up on YARN clusters. This is not the scheduling part; this is after the container holding the map task has actually been launched. Take, for example, the sample log from a mapper of a Pi job I launched. The command I used was:

{code}
hadoop jar /usr/lib/hadoop/share/hadoop/mapreduce/hadoop*mapreduce*examples*jar pi 10 100
{code}

Below is the log from one of the mappers, which took 14 seconds to complete. As the log shows, most of that time is spent during startup. Most mappers take anywhere between 7 and 15 seconds to start up, and I have seen this behavior consistently across MapReduce jobs. It really hurts the performance of short-running mappers. I run a hadoop2/YARN cluster of 4-5 m1.xlarge nodes, with the mapper memory always set to 2048m.

Log:

{code}
2015-04-18 06:48:34,081 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2015-04-18 06:48:34,637 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2015-04-18 06:48:34,637 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
2015-04-18 06:48:34,690 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2015-04-18 06:48:34,690 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1429338752209_0059, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@5d48e5d6)
2015-04-18 06:48:35,391 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2015-04-18 06:48:36,656 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /media/ephemeral3/yarn/local/usercache/rjain/appcache/application_1429338752209_0059,/media/ephemeral1/yarn/local/usercache/rjain/appcache/application_1429338752209_0059,/media/ephemeral2/yarn/local/usercache/rjain/appcache/application_1429338752209_0059,/media/ephemeral0/yarn/local/usercache/rjain/appcache/application_1429338752209_0059
2015-04-18 06:48:36,706 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
2015-04-18 06:48:37,387 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
2015-04-18 06:48:39,388 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2015-04-18 06:48:39,448 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS
2015-04-18 06:48:41,060 INFO [main] org.apache.hadoop.fs.s3native.NativeS3FileSystem: setting Progress to org.apache.hadoop.mapred.Task$TaskReporter@601211d0 comment setting up progress from Task
2015-04-18 06:48:41,098 INFO [main] org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2015-04-18 06:48:41,585 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: hdfs://ec2-54-211-109-245.compute-1.amazonaws.com:9000/user/rjain/QuasiMonteCarlo_1429339685772_504558444/in/part4:0+118
2015-04-18 06:48:43,926 INFO [main] org.apache.hadoop.mapred.MapTask: (EQUATOR) 0 kvi 234881020(939524080)
2015-04-18 06:48:43,927 INFO [main] org.apache.hadoop.mapred.MapTask: mapreduce.task.io.sort.mb: 896
2015-04-18 06:48:43,927 INFO [main] org.apache.hadoop.mapred.MapTask: soft limit at 657666880
2015-04-18 06:48:43,927 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufvoid = 939524096
2015-04-18 06:48:43,927 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 234881020; length = 58720256
2015-04-18 06:48:43,946 INFO [main] org.apache.hadoop.mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2015-04-18 06:48:44,022 INFO [main] org.apache.hadoop.mapred.MapTask: Starting flush of map output
2015-04-18
{code}
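The 7-15 second figure can be read straight off timestamps like the ones in the log above. A small sketch that computes the gap between the first metrics line and the "Processing split" line; the helper class below is illustrative, not part of Hadoop:

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class StartupGap {
    // Log4j timestamps in these task logs look like "2015-04-18 06:48:34,081".
    static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    // Whole seconds elapsed between two log timestamps.
    static long secondsBetween(String first, String second) {
        LocalDateTime a = LocalDateTime.parse(first, FMT);
        LocalDateTime b = LocalDateTime.parse(second, FMT);
        return Duration.between(a, b).getSeconds();
    }

    public static void main(String[] args) {
        // From the log above: metrics-system start to split processing.
        System.out.println(secondsBetween("2015-04-18 06:48:34,081",
                                          "2015-04-18 06:48:41,585"));
    }
}
```

Applied to the sample log, roughly 7 of the 14 seconds elapse before the task even begins reading its split.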
[jira] [Commented] (MAPREDUCE-6321) Map tasks take a lot of time to start up
[ https://issues.apache.org/jira/browse/MAPREDUCE-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524212#comment-14524212 ]

Ray Chiang commented on MAPREDUCE-6321:
---

Oh, assuming you're running FairScheduler.
[jira] [Commented] (MAPREDUCE-6321) Map tasks take a lot of time to start up
[ https://issues.apache.org/jira/browse/MAPREDUCE-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524254#comment-14524254 ]

Ray Chiang commented on MAPREDUCE-6321:
---

Task startup time includes scheduling determination delays, which is what YARN-2990 fixes. Localization and JVM startup are usually a noticeable chunk of the remaining time.
[jira] [Commented] (MAPREDUCE-5649) Reduce cannot use more than 2G memory for the final merge
[ https://issues.apache.org/jira/browse/MAPREDUCE-5649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524514#comment-14524514 ]

Hadoop QA commented on MAPREDUCE-5649:
--

(x) *{color:red}-1 overall{color}*

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 14m 46s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. |
| {color:green}+1{color} | javac | 7m 29s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 29s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 54s | There were no new checkstyle issues. |
| {color:red}-1{color} | whitespace | 0m 1s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 1m 15s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | mapreduce tests | 1m 36s | Tests passed in hadoop-mapreduce-client-core. |
| | | 38m 4s | |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12729903/MAPREDUCE-5649.002.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f1a152c |
| whitespace | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5487/artifact/patchprocess/whitespace.txt |
| hadoop-mapreduce-client-core test log | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5487/artifact/patchprocess/testrun_hadoop-mapreduce-client-core.txt |
| Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5487/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf902.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5487/console |

This message was automatically generated.

Reduce cannot use more than 2G memory for the final merge

Key: MAPREDUCE-5649
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5649
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Reporter: stanley shi
Assignee: Gera Shegalov
Attachments: MAPREDUCE-5649.001.patch, MAPREDUCE-5649.002.patch

In org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.java, in the finalMerge method:

{code}
int maxInMemReduce = (int)Math.min(
    Runtime.getRuntime().maxMemory() * maxRedPer, Integer.MAX_VALUE);
{code}

This means that no matter how much memory the user has, the reducer will not retain more than 2G of data in memory before the reduce phase starts.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
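The 2G ceiling comes directly from the int cast in the snippet above: casting the byte budget to int clamps it at Integer.MAX_VALUE (about 2 GiB), regardless of how large Runtime.maxMemory() is. A minimal sketch reproducing the expression outside Hadoop:

```java
public class MergeCap {
    // Mirrors the expression in MergeManagerImpl.finalMerge: the byte
    // budget is computed as a double, clamped to Integer.MAX_VALUE, and
    // narrowed to int, so it can never exceed ~2 GiB.
    static int maxInMemReduce(long maxMemory, double maxRedPer) {
        return (int) Math.min(maxMemory * maxRedPer, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        long sixtyFourGiB = 64L << 30;
        // Even with a 64 GiB heap and maxRedPer = 1.0,
        // the in-memory budget is capped at 2^31 - 1 bytes.
        System.out.println(maxInMemReduce(sixtyFourGiB, 1.0));
    }
}
```

Using a long for the budget instead of an int would let the cap scale with the heap, which is the direction the attached patches take.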
[jira] [Commented] (MAPREDUCE-5377) JobID is not displayed truly by hadoop job -history command
[ https://issues.apache.org/jira/browse/MAPREDUCE-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524641#comment-14524641 ]

Hadoop QA commented on MAPREDUCE-5377:
--

(x) *{color:red}-1 overall{color}*

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12591055/MAPREDUCE-5377.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f1a152c |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5520/console |

This message was automatically generated.

JobID is not displayed truly by hadoop job -history command

Key: MAPREDUCE-5377
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5377
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv1
Affects Versions: 1.2.0
Reporter: Shinichi Yamashita
Assignee: Shinichi Yamashita
Priority: Minor
Labels: newbie
Attachments: MAPREDUCE-5377.patch

The JobID output by the hadoop job -history command is a wrong string.

{quote}
[hadoop@hadoop hadoop]$ hadoop job -history terasort
Hadoop job: 0001_1374260789919_hadoop
=
Job tracker host name: job
job tracker start time: Tue May 18 15:39:51 PDT 1976
User: hadoop
JobName: TeraSort
JobConf: hdfs://hadoop:8020/hadoop/mapred/staging/hadoop/.staging/job_201307191206_0001/job.xml
Submitted At: 19-7-2013 12:06:29
Launched At: 19-7-2013 12:06:30 (0sec)
Finished At: 19-7-2013 12:06:44 (14sec)
Status: SUCCESS
{quote}

In this example, it should show job_201307191206_0001 after "Hadoop job:", but it shows 0001_1374260789919_hadoop. In addition, the job tracker host name and job tracker start time are invalid. This problem can be solved by fixing the setting of jobId in HistoryViewer(). The JobTracker information in HistoryViewer should also be fixed.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5403) MR changes to accommodate yarn.application.classpath being moved to the server-side
[ https://issues.apache.org/jira/browse/MAPREDUCE-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524643#comment-14524643 ]

Hadoop QA commented on MAPREDUCE-5403:
--

(x) *{color:red}-1 overall{color}*

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12594253/MAPREDUCE-5403-2.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle site |
| git revision | trunk / f1a152c |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5521/console |

This message was automatically generated.

MR changes to accommodate yarn.application.classpath being moved to the server-side

Key: MAPREDUCE-5403
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5403
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: client
Affects Versions: 2.0.5-alpha
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Attachments: MAPREDUCE-5403-1.patch, MAPREDUCE-5403-2.patch, MAPREDUCE-5403.patch

yarn.application.classpath is a confusing property because it is used by MapReduce and not YARN, and MapReduce already has mapreduce.application.classpath, which provides the same functionality.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-4065) Add .proto files to built tarball
[ https://issues.apache.org/jira/browse/MAPREDUCE-4065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524851#comment-14524851 ]

Hadoop QA commented on MAPREDUCE-4065:
--

(x) *{color:red}-1 overall{color}*

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12650714/MAPREDUCE-4065.1.patch |
| Optional Tests | javadoc javac unit |
| git revision | trunk / f1a152c |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5554/console |

This message was automatically generated.

Add .proto files to built tarball

Key: MAPREDUCE-4065
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4065
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: build
Affects Versions: 0.23.2, 2.4.0
Reporter: Ralph H Castain
Assignee: Tsuyoshi Ozawa
Attachments: MAPREDUCE-4065.1.patch

Please add the .proto files to the built tarball so that users can build 3rd party tools that use protocol buffers without having to do an svn checkout of the source code. Sorry I don't know more about Maven, or I would provide a patch.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5817) mappers get rescheduled on node transition even after all reducers are completed
[ https://issues.apache.org/jira/browse/MAPREDUCE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524854#comment-14524854 ]

Hadoop QA commented on MAPREDUCE-5817:
--

(x) *{color:red}-1 overall{color}*

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12638107/mapreduce-5817.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f1a152c |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5557/console |

This message was automatically generated.

mappers get rescheduled on node transition even after all reducers are completed

Key: MAPREDUCE-5817
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5817
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: applicationmaster
Affects Versions: 2.3.0
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Attachments: mapreduce-5817.patch

We're seeing a behavior where a job keeps running long after all reducers have finished. We found that the job was rescheduling and running a number of mappers beyond the point of reducer completion. In one case, the job ran for some 9 more hours after all reducers completed! This happens because whenever a node transition (to an unusable state) comes into the app master, it reschedules all mappers that already ran on that node, in all cases. Therefore, any node transition has the potential to extend the job's running time. Once this window opens, another node transition can prolong it, and in theory this can happen indefinitely. If there is some instability in the pool (unhealthy nodes, etc.) for a duration, any big job is severely vulnerable to this problem.

If all reducers have completed, JobImpl.actOnUnusableNode() should not reschedule mapper tasks: the mapper outputs are no longer needed, and rescheduled mappers would produce output that nothing consumes.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
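The guard the report proposes for JobImpl.actOnUnusableNode() can be sketched as a simple early return. The class and field names below are illustrative stand-ins, not the real MRAppMaster types or the actual patch:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch: skip rescheduling succeeded mappers from an
// unusable node once every reducer has completed, since nothing will
// fetch their map outputs again.
public class UnusableNodeGuard {
    final int completedReduces;
    final int totalReduces;
    final List<String> rescheduled = new ArrayList<>();

    UnusableNodeGuard(int completedReduces, int totalReduces) {
        this.completedReduces = completedReduces;
        this.totalReduces = totalReduces;
    }

    void actOnUnusableNode(List<String> succeededMappersOnNode) {
        // All reducers done: rescheduling these mappers is pure waste.
        if (completedReduces >= totalReduces) {
            return;
        }
        rescheduled.addAll(succeededMappersOnNode);
    }
}
```

With the guard in place, node churn after the last reducer finishes can no longer extend the job.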
[jira] [Commented] (MAPREDUCE-5362) clean up POM dependencies
[ https://issues.apache.org/jira/browse/MAPREDUCE-5362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524853#comment-14524853 ]

Hadoop QA commented on MAPREDUCE-5362:
--

(x) *{color:red}-1 overall{color}*

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12640169/mr-5362-0.patch |
| Optional Tests | javadoc javac unit |
| git revision | trunk / f1a152c |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5556/console |

This message was automatically generated.

clean up POM dependencies

Key: MAPREDUCE-5362
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5362
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: build
Affects Versions: 2.1.0-beta
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
Attachments: MAPREDUCE-5362.patch, mr-5362-0.patch

Intermediate 'pom' modules define dependencies that are inherited by the leaf modules. This is causing issues in the IntelliJ IDE. We should normalize the leaf modules as in common, hdfs, and tools, where all dependencies are defined in each leaf module and the intermediate 'pom' modules do not define any dependencies.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5981) Log levels of certain MR logs can be changed to DEBUG
[ https://issues.apache.org/jira/browse/MAPREDUCE-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524864#comment-14524864 ]

Hadoop QA commented on MAPREDUCE-5981:
--

(x) *{color:red}-1 overall{color}*

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12656504/MAPREDUCE-5981.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f1a152c |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5558/console |

This message was automatically generated.

Log levels of certain MR logs can be changed to DEBUG

Key: MAPREDUCE-5981
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5981
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Varun Saxena
Assignee: Varun Saxena
Attachments: MAPREDUCE-5981.patch

The following MapReduce logs can be changed to the DEBUG log level.

1. In org.apache.hadoop.mapreduce.task.reduce.Fetcher#copyFromHost (Fetcher.java:313), the second log does not need to be at INFO level. It can be moved to DEBUG, since a WARN log is printed anyway if verifyReply fails:

{code}
SecureShuffleUtils.verifyReply(replyHash, encHash, shuffleSecretKey);
LOG.info("for url=" + msgToEncode + " sent hash and received reply");
{code}

2. Thread-related information need not be printed in logs at INFO level. The two logs below can be moved to DEBUG.

a) In org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl#getHost (ShuffleSchedulerImpl.java:381):

{code}
LOG.info("Assigning " + host + " with " + host.getNumKnownMapOutputs() + " to " + Thread.currentThread().getName());
{code}

b) In org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.getMapsForHost (ShuffleSchedulerImpl.java:411):

{code}
LOG.info("assigned " + includedMaps + " of " + totalSize + " to " + host + " to " + Thread.currentThread().getName());
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
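The pattern behind these changes is the usual level guard around per-fetch chatter. A minimal sketch using java.util.logging for illustration; Hadoop itself uses commons-logging, where the equivalent calls are LOG.isDebugEnabled() and LOG.debug():

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LogLevelDemo {
    static final Logger LOG = Logger.getLogger("shuffle");

    // Demoting per-assignment messages from INFO to DEBUG (FINE here)
    // means they cost nothing unless the operator turns the level up,
    // and the guard also skips the string concatenation itself.
    static void logAssignment(String host, int mapCount) {
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("Assigning " + host + " with " + mapCount
                     + " to " + Thread.currentThread().getName());
        }
    }
}
```

At the default INFO level the message is suppressed; raising the logger to FINE re-enables it without a code change.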
[jira] [Updated] (MAPREDUCE-6345) Documentation fix for when CRLA is enabled for MRAppMaster logs
[ https://issues.apache.org/jira/browse/MAPREDUCE-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gera Shegalov updated MAPREDUCE-6345:
-

Resolution: Fixed
Fix Version/s: 2.8.0
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)

Thanks [~ragarwal] for the contribution! Committed to trunk and branch-2.

Documentation fix for when CRLA is enabled for MRAppMaster logs

Key: MAPREDUCE-6345
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6345
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: documentation
Affects Versions: 2.4.0, 2.5.0, 2.6.0, 2.7.0
Reporter: Rohit Agarwal
Assignee: Rohit Agarwal
Priority: Trivial
Fix For: 2.8.0
Attachments: MAPREDUCE-6345.patch

CRLA is enabled for the ApplicationMaster when both yarn.app.mapreduce.am.container.log.limit.kb (not mapreduce.task.userlog.limit.kb) and yarn.app.mapreduce.am.container.log.backups are greater than zero. This was changed in MAPREDUCE-5773.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-4346) Adding a refined version of JobTracker.getAllJobs() and exposing through the JobClient
[ https://issues.apache.org/jira/browse/MAPREDUCE-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524614#comment-14524614 ]

Hadoop QA commented on MAPREDUCE-4346:
--

(x) *{color:red}-1 overall{color}*

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12533607/MAPREDUCE-4346_rev4.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f1a152c |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5512/console |

This message was automatically generated.

Adding a refined version of JobTracker.getAllJobs() and exposing through the JobClient

Key: MAPREDUCE-4346
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4346
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv1
Reporter: Ahmed Radwan
Assignee: Ahmed Radwan
Attachments: MAPREDUCE-4346.patch, MAPREDUCE-4346_rev2.patch, MAPREDUCE-4346_rev3.patch, MAPREDUCE-4346_rev4.patch

The current implementation of JobTracker.getAllJobs() returns all submitted jobs in any state, in addition to retired jobs. This list can be long and represents unneeded overhead, especially for clients only interested in jobs in specific state(s). It is beneficial to include a refined version where only jobs with specific statuses are returned and retired jobs are optional to include. I'll be uploading an initial patch momentarily.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
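The proposed refinement can be sketched as a status filter with retired jobs opt-in. JobState and Job below are illustrative stand-ins, not the real JobTracker/JobStatus API:

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;

public class JobFilter {
    enum JobState { PREP, RUNNING, SUCCEEDED, FAILED, KILLED, RETIRED }

    static class Job {
        final String id;
        final JobState state;
        Job(String id, JobState state) { this.id = id; this.state = state; }
    }

    // Return only jobs in the requested states; retired jobs are included
    // only when the caller explicitly asks for them.
    static List<Job> getJobs(List<Job> all, EnumSet<JobState> wanted,
                             boolean includeRetired) {
        List<Job> out = new ArrayList<>();
        for (Job j : all) {
            if (j.state == JobState.RETIRED) {
                if (includeRetired) out.add(j);
            } else if (wanted.contains(j.state)) {
                out.add(j);
            }
        }
        return out;
    }
}
```

Filtering on the server side keeps the RPC payload proportional to the jobs the client actually cares about.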
[jira] [Commented] (MAPREDUCE-4271) Make TestCapacityScheduler more robust with non-Sun JDK
[ https://issues.apache.org/jira/browse/MAPREDUCE-4271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524626#comment-14524626 ]

Hadoop QA commented on MAPREDUCE-4271:
--

(x) *{color:red}-1 overall{color}*

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12567098/MAPREDUCE-4271-branch1-v2.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f1a152c |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5514/console |

This message was automatically generated.

Make TestCapacityScheduler more robust with non-Sun JDK

Key: MAPREDUCE-4271
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4271
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: capacity-sched
Affects Versions: 1.0.3
Reporter: Luke Lu
Assignee: Yu Gao
Labels: alt-jdk, capacity
Attachments: MAPREDUCE-4271-branch1-v2.patch, mapreduce-4271-branch-1.patch, test-afterepatch.result, test-beforepatch.result, test-patch.result

The capacity scheduler queue is initialized with a HashMap, the values of which are later added to a list (a queue for assigning tasks). TestCapacityScheduler depends on the order of the list and hence is not portable across JDKs.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
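The portability problem is that HashMap makes no iteration-order guarantee, so a list built from its values can differ between JDK implementations. A sketch of the deterministic alternative, using a sorted map (queue names and weights below are illustrative):

```java
import java.util.Map;
import java.util.TreeMap;

public class QueueOrder {
    // Whatever map backs the scheduler's queue list determines the order
    // tasks are assigned in a test. A TreeMap iterates in sorted key
    // order on every JDK; a HashMap promises nothing.
    static String firstQueue(Map<String, Integer> queues) {
        return queues.keySet().iterator().next();
    }

    public static void main(String[] args) {
        Map<String, Integer> queues = new TreeMap<>();
        queues.put("default", 50);
        queues.put("batch", 30);
        queues.put("adhoc", 20);
        System.out.println(firstQueue(queues));  // adhoc
    }
}
```

A LinkedHashMap (insertion order) works equally well when the test should see queues in configuration order rather than sorted order.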
[jira] [Commented] (MAPREDUCE-5188) error when verify FileType of RS_SOURCE in getCompanionBlocks in BlockPlacementPolicyRaid.java
[ https://issues.apache.org/jira/browse/MAPREDUCE-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524616#comment-14524616 ] Hadoop QA commented on MAPREDUCE-5188: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12580811/MAPREDUCE-5188.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5513/console | This message was automatically generated. error when verify FileType of RS_SOURCE in getCompanionBlocks in BlockPlacementPolicyRaid.java --- Key: MAPREDUCE-5188 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5188 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/raid Affects Versions: 2.0.2-alpha Reporter: junjin Assignee: junjin Priority: Critical Labels: contrib/raid Fix For: 2.0.2-alpha Attachments: MAPREDUCE-5188.patch There is an error when verifying the FileType of RS_SOURCE in getCompanionBlocks in BlockPlacementPolicyRaid.java. We need to change xorParityLength on line #379 to rsParityLength, since it is used to verify the RS_SOURCE type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6030) In mr-jobhistory-daemon.sh, some env variables are not affected by mapred-env.sh
[ https://issues.apache.org/jira/browse/MAPREDUCE-6030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524874#comment-14524874 ] Hadoop QA commented on MAPREDUCE-6030: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12660804/MAPREDUCE-6030.patch | | Optional Tests | shellcheck | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5560/console | This message was automatically generated. In mr-jobhistory-daemon.sh, some env variables are not affected by mapred-env.sh Key: MAPREDUCE-6030 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6030 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Affects Versions: 2.4.1 Reporter: Youngjoon Kim Assignee: Youngjoon Kim Priority: Minor Attachments: MAPREDUCE-6030.patch In mr-jobhistory-daemon.sh, some env variables are exported before sourcing mapred-env.sh, so these variables don't use values defined in mapred-env.sh. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6020) Too many threads blocking on the global JobTracker lock from getJobCounters, optimize getJobCounters to release global JobTracker lock before access the per job cou
[ https://issues.apache.org/jira/browse/MAPREDUCE-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524871#comment-14524871 ] Hadoop QA commented on MAPREDUCE-6020: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12659032/MAPREDUCE-6020.branch1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5559/console | This message was automatically generated. Too many threads blocking on the global JobTracker lock from getJobCounters, optimize getJobCounters to release global JobTracker lock before access the per job counter in JobInProgress - Key: MAPREDUCE-6020 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6020 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.23.10 Reporter: zhihai xu Assignee: zhihai xu Attachments: MAPREDUCE-6020.branch1.patch Too many threads block on the global JobTracker lock in getJobCounters; optimize getJobCounters to release the global JobTracker lock before accessing the per-job counters in JobInProgress. Many JobClients may call getJobCounters on the JobTracker at the same time, and the current code locks the JobTracker, blocking all threads that fetch counters from JobInProgress. It is better to release the JobTracker lock while fetching counters from JobInProgress (job.getCounters(counters)), so that all threads can run in parallel when accessing their own job's counters. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
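The lock-narrowing idea in this report can be sketched as follows. The class and method names are invented for illustration and do not match the actual MAPREDUCE-6020 patch; the point is only the pattern of holding the global lock for the lookup and doing the per-job work outside it.

```java
import java.util.HashMap;
import java.util.Map;

public class LockNarrowingDemo {
    private final Map<String, long[]> jobs = new HashMap<>();

    void addJob(String id, long[] counters) {
        synchronized (this) { jobs.put(id, counters); }
    }

    // Before: the whole method ran under the global lock, serializing all
    // callers. After: hold the global lock only long enough to look up the
    // per-job object, then sum its counters outside the global lock, so
    // callers reading different jobs proceed in parallel. (This assumes
    // the per-job object is safe to read under its own lock.)
    long totalCounters(String id) {
        long[] perJob;
        synchronized (this) {          // global lock: lookup only
            perJob = jobs.get(id);
        }
        if (perJob == null) return 0;
        long sum = 0;
        synchronized (perJob) {        // per-job lock: the expensive part
            for (long c : perJob) sum += c;
        }
        return sum;
    }

    public static void main(String[] args) {
        LockNarrowingDemo jt = new LockNarrowingDemo();
        jt.addJob("job_1", new long[]{1, 2, 3});
        System.out.println(jt.totalCounters("job_1")); // 6
    }
}
```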
[jira] [Commented] (MAPREDUCE-6321) Map tasks take a lot of time to start up
[ https://issues.apache.org/jira/browse/MAPREDUCE-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524221#comment-14524221 ] Rajat Jain commented on MAPREDUCE-6321: --- Yes, we run FairScheduler. However, this is not related to FairScheduler since this slowness is during map task startup. Map tasks take a lot of time to start up Key: MAPREDUCE-6321 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6321 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 2.6.0 Reporter: Rajat Jain Priority: Critical Labels: performance I have noticed repeatedly that the map tasks take a lot of time to startup on YARN clusters. This is not the scheduling part, this is after the actual container is launched containing the Map task. Take for example, the sample log from a mapper of a Pi job that I launched. The command I used to launch the Pi job was: {code} hadoop jar /usr/lib/hadoop/share/hadoop/mapreduce/hadoop*mapreduce*examples*jar pi 10 100 {code} This is the sample job from one of the mappers which took 14 seconds to complete. If you notice from the logs, most of the time taken by this job is during the start up. I notice that the most mappers take anywhere between 7 to 15 seconds during start up and have seen this behavior consistent across mapreduce jobs. This really affects the performance of short running mappers. I run a hadoop2 / yarn cluster on a 4-5 node m1.xlarge cluster, and the mapper memory is always specified as 2048m and so on. Log: {code} 2015-04-18 06:48:34,081 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties 2015-04-18 06:48:34,637 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s). 
2015-04-18 06:48:34,637 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started 2015-04-18 06:48:34,690 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens: 2015-04-18 06:48:34,690 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1429338752209_0059, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@5d48e5d6) 2015-04-18 06:48:35,391 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now. 2015-04-18 06:48:36,656 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /media/ephemeral3/yarn/local/usercache/rjain/appcache/application_1429338752209_0059,/media/ephemeral1/yarn/local/usercache/rjain/appcache/application_1429338752209_0059,/media/ephemeral2/yarn/local/usercache/rjain/appcache/application_1429338752209_0059,/media/ephemeral0/yarn/local/usercache/rjain/appcache/application_1429338752209_0059 2015-04-18 06:48:36,706 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS 2015-04-18 06:48:37,387 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS 2015-04-18 06:48:39,388 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id 2015-04-18 06:48:39,448 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: fs.default.name is deprecated. 
Instead, use fs.defaultFS 2015-04-18 06:48:41,060 INFO [main] org.apache.hadoop.fs.s3native.NativeS3FileSystem: setting Progress to org.apache.hadoop.mapred.Task$TaskReporter@601211d0 comment setting up progress from Task 2015-04-18 06:48:41,098 INFO [main] org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ] 2015-04-18 06:48:41,585 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: hdfs://ec2-54-211-109-245.compute-1.amazonaws.com:9000/user/rjain/QuasiMonteCarlo_1429339685772_504558444/in/part4:0+118 2015-04-18 06:48:43,926 INFO [main] org.apache.hadoop.mapred.MapTask: (EQUATOR) 0 kvi 234881020(939524080) 2015-04-18 06:48:43,927 INFO [main] org.apache.hadoop.mapred.MapTask: mapreduce.task.io.sort.mb: 896 2015-04-18 06:48:43,927 INFO [main] org.apache.hadoop.mapred.MapTask: soft limit at 657666880 2015-04-18 06:48:43,927 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufvoid = 939524096 2015-04-18 06:48:43,927 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 234881020; length = 58720256 2015-04-18 06:48:43,946 INFO [main] org.apache.hadoop.mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer 2015-04-18 06:48:44,022 INFO [main] org.apache.hadoop.mapred.MapTask: Starting flush of map output 2015-04-18
[jira] [Commented] (MAPREDUCE-5097) Job.addArchiveToClassPath is ignored when running job with LocalJobRunner
[ https://issues.apache.org/jira/browse/MAPREDUCE-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524608#comment-14524608 ] Hadoop QA commented on MAPREDUCE-5097: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12575117/MAPREDUCE-5097-ugly-test.patch | | Optional Tests | javac unit findbugs checkstyle javadoc | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5510/console | This message was automatically generated. Job.addArchiveToClassPath is ignored when running job with LocalJobRunner - Key: MAPREDUCE-5097 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5097 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Reporter: Alex Baranau Assignee: Alex Baranau Priority: Minor Attachments: MAPREDUCE-5097-ugly-test.patch, MAPREDUCE-5097.patch Using external dependency jar in mr job. Adding it to the job classpath via Job.addArchiveToClassPath(...) doesn't work when running with LocalJobRunner (i.e. in unit test). This makes it harder to unit-test such jobs (with third-party runtime dependencies). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-4330) TaskAttemptCompletedEventTransition invalidates previously successful attempt without checking if the newly completed attempt is successful
[ https://issues.apache.org/jira/browse/MAPREDUCE-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524612#comment-14524612 ] Hadoop QA commented on MAPREDUCE-4330: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12578792/MAPREDUCE-4330-20130415.1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5511/console | This message was automatically generated. TaskAttemptCompletedEventTransition invalidates previously successful attempt without checking if the newly completed attempt is successful --- Key: MAPREDUCE-4330 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4330 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.23.1 Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Attachments: MAPREDUCE-4330-20130415.1.patch, MAPREDUCE-4330-20130415.patch, MAPREDUCE-4330-21032013.1.patch, MAPREDUCE-4330-21032013.patch The previously completed attempt is removed from successAttemptCompletionEventNoMap and marked OBSOLETE. After that, if the newly completed attempt is successful then it is added to the successAttemptCompletionEventNoMap. This seems wrong because the newly completed attempt could be failed and thus there is no need to invalidate the successful attempt. One error case would be when a speculative attempt completes with killed/failed after the successful version has completed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5611) CombineFileInputFormat only requests a single location per split when more could be optimal
[ https://issues.apache.org/jira/browse/MAPREDUCE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524808#comment-14524808 ] Hadoop QA commented on MAPREDUCE-5611: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12613866/CombineFileInputFormat-trunk.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5545/console | This message was automatically generated. CombineFileInputFormat only requests a single location per split when more could be optimal --- Key: MAPREDUCE-5611 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5611 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.2.1 Reporter: Chandra Prakash Bhagtani Assignee: Chandra Prakash Bhagtani Attachments: CombineFileInputFormat-trunk.patch I have come across an issue with CombineFileInputFormat. Actually I ran a hive query on approx 1.2 GB data with CombineHiveInputFormat which internally uses CombineFileInputFormat. My cluster size is 9 datanodes and max.split.size is 256 MB When I ran this query with replication factor 9, hive consistently creates all 6 rack-local tasks and with replication factor 3 it creates 5 rack-local and 1 data local tasks. When replication factor is 9 (equal to cluster size), all the tasks should be data-local as each datanode contains all the replicas of the input data, but that is not happening i.e all the tasks are rack-local. 
When I dug into the CombineFileInputFormat.java code in the getMoreSplits method, I found the issue in the following snippet (especially in the case of a higher replication factor):
{code:title=CombineFileInputFormat.java|borderStyle=solid}
for (Iterator<Map.Entry<String, List<OneBlockInfo>>> iter =
         nodeToBlocks.entrySet().iterator(); iter.hasNext();) {
  Map.Entry<String, List<OneBlockInfo>> one = iter.next();
  nodes.add(one.getKey());
  List<OneBlockInfo> blocksInNode = one.getValue();
  // for each block, copy it into validBlocks. Delete it from
  // blockToNodes so that the same block does not appear in
  // two different splits.
  for (OneBlockInfo oneblock : blocksInNode) {
    if (blockToNodes.containsKey(oneblock)) {
      validBlocks.add(oneblock);
      blockToNodes.remove(oneblock);
      curSplitSize += oneblock.length;
      // if the accumulated split size exceeds the maximum, then
      // create this split.
      if (maxSize != 0 && curSplitSize >= maxSize) {
        // create an input split and add it to the splits array
        addCreatedSplit(splits, nodes, validBlocks);
        curSplitSize = 0;
        validBlocks.clear();
      }
    }
  }
}
{code}
The first node in the nodeToBlocks map has all the replicas of the input file, so the above code creates 6 splits, each with only one location. Now if the JT doesn't schedule these tasks on that node, all the tasks will be rack-local, even though all the other datanodes also hold replicas. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
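To see why recording only one node per split hurts locality when the replication factor equals the cluster size, here is a toy sketch (not Hadoop code; all names are illustrative) that instead derives a split's candidate locations as the set of nodes hosting every block in the split:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SplitLocationsDemo {
    // Returns the nodes that host every block in the split. Any of these
    // nodes could run the task data-locally, so reporting all of them
    // (rather than just the node the split was built from) gives the
    // scheduler more data-local placement choices.
    static Set<String> splitLocations(List<Set<String>> blockHosts) {
        Set<String> common = new HashSet<>(blockHosts.get(0));
        for (Set<String> hosts : blockHosts) common.retainAll(hosts);
        return common;
    }

    public static void main(String[] args) {
        Set<String> allNodes = Set.of("node1", "node2", "node3");
        // Replication factor == cluster size: every node holds every block,
        // so every node is a valid data-local location for the split.
        List<Set<String>> blocks = List.of(allNodes, allNodes, allNodes);
        System.out.println(splitLocations(blocks)); // contains all three nodes
    }
}
```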
[jira] [Commented] (MAPREDUCE-5621) mr-jobhistory-daemon.sh doesn't have to execute mkdir and chown all the time
[ https://issues.apache.org/jira/browse/MAPREDUCE-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524798#comment-14524798 ] Hadoop QA commented on MAPREDUCE-5621: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12613541/MAPREDUCE-5621.patch | | Optional Tests | shellcheck | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5544/console | This message was automatically generated. mr-jobhistory-daemon.sh doesn't have to execute mkdir and chown all the time Key: MAPREDUCE-5621 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5621 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Affects Versions: 3.0.0 Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Priority: Minor Attachments: MAPREDUCE-5621.patch mr-jobhistory-daemon.sh executes the mkdir and chown commands to set up the log directory. These are always executed, whether or not the directory already exists. In addition, they run not only when starting the daemon but also when stopping it. The script should add an if-guard, like hadoop-daemon.sh and yarn-daemon.sh do, to control this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-3486) All jobs of all queues will be returned, whether a particular queueName is specified or not
[ https://issues.apache.org/jira/browse/MAPREDUCE-3486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524823#comment-14524823 ] Hadoop QA commented on MAPREDUCE-3486: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12505621/MAPREDUCE-3486.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5547/console | This message was automatically generated. All jobs of all queues will be returned, whether a particular queueName is specified or not --- Key: MAPREDUCE-3486 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3486 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 1.1.3, 1.3.0, 1.2.2 Reporter: XieXianshan Assignee: XieXianshan Priority: Minor Attachments: MAPREDUCE-3486.patch JobTracker.getJobsFromQueue(queueName) returns all jobs of all queues on the JobTracker, even though a queueName is specified. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6350) JobHistory doesn't support fully-functional search
[ https://issues.apache.org/jira/browse/MAPREDUCE-6350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524240#comment-14524240 ] Hadoop QA commented on MAPREDUCE-6350: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 29s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 36s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 36s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 1m 5s | The applied patch generated 3 new checkstyle issues (total was 15, now 17). | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 58s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | mapreduce tests | 9m 20s | Tests passed in hadoop-mapreduce-client-app. | | {color:green}+1{color} | mapreduce tests | 0m 45s | Tests passed in hadoop-mapreduce-client-common. 
| | | | 47m 22s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12623739/YARN-1614.v2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / d3d019c | | checkstyle | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5486/artifact/patchprocess/diffcheckstylehadoop-mapreduce-client-common.txt | | whitespace | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5486/artifact/patchprocess/whitespace.txt | | hadoop-mapreduce-client-app test log | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5486/artifact/patchprocess/testrun_hadoop-mapreduce-client-app.txt | | hadoop-mapreduce-client-common test log | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5486/artifact/patchprocess/testrun_hadoop-mapreduce-client-common.txt | | Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5486/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5486/console | This message was automatically generated. JobHistory doesn't support fully-functional search -- Key: MAPREDUCE-6350 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6350 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Reporter: Siqi Li Assignee: Siqi Li Priority: Critical Attachments: YARN-1614.v1.patch, YARN-1614.v2.patch job history server will only output the first 50 characters of the job names in webUI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-2393) No total min share limitation of all pools
[ https://issues.apache.org/jira/browse/MAPREDUCE-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524589#comment-14524589 ] Hadoop QA commented on MAPREDUCE-2393: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12490803/MAPREDUCE-2393.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5503/console | This message was automatically generated. No total min share limitation of all pools -- Key: MAPREDUCE-2393 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2393 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/fair-share Affects Versions: 0.21.0 Reporter: Denny Ye Labels: fair, scheduler Attachments: MAPREDUCE-2393.patch Hi, there is no limit on the total min share of all pools relative to the cluster's total slots. A user can define an arbitrary amount of min share for each pool. The fair scheduler design document describes such a limit, but there is no corresponding code. This may be critical for slot distribution: one pool can hold all cluster slots to meet a min share far greater than the cluster's total slots. If that case happens, we should scale the min shares down proportionally. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
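The proportional scale-down the reporter suggests could look like the following sketch. The class and method names are invented for illustration; the actual fair scheduler patch may differ (e.g. in rounding and in how leftover slots are redistributed).

```java
import java.util.HashMap;
import java.util.Map;

public class MinShareScaling {
    // If the configured min shares exceed the cluster's total slots, scale
    // each pool's min share down proportionally so the sum fits the cluster.
    // Integer division truncates, so the scaled sum never exceeds totalSlots.
    static Map<String, Integer> scale(Map<String, Integer> minShares, int totalSlots) {
        long sum = 0;
        for (int s : minShares.values()) sum += s;
        if (sum <= totalSlots) return new HashMap<>(minShares); // nothing to do
        Map<String, Integer> scaled = new HashMap<>();
        for (Map.Entry<String, Integer> e : minShares.entrySet()) {
            scaled.put(e.getKey(), (int) ((long) e.getValue() * totalSlots / sum));
        }
        return scaled;
    }

    public static void main(String[] args) {
        // Pools ask for 400 slots total on a 100-slot cluster.
        Map<String, Integer> pools = Map.of("a", 300, "b", 100);
        System.out.println(scale(pools, 100)); // a=75, b=25
    }
}
```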
[jira] [Commented] (MAPREDUCE-4261) MRAppMaster throws NPE while stopping RMContainerAllocator service
[ https://issues.apache.org/jira/browse/MAPREDUCE-4261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524597#comment-14524597 ] Hadoop QA commented on MAPREDUCE-4261: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12528092/MAPREDUCE-4261.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5504/console | This message was automatically generated. MRAppMaster throws NPE while stopping RMContainerAllocator service -- Key: MAPREDUCE-4261 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4261 Project: Hadoop Map/Reduce Issue Type: Bug Components: mr-am, mrv2 Affects Versions: 2.0.0-alpha, 3.0.0, 2.0.1-alpha, 2.0.2-alpha Reporter: Devaraj K Assignee: Devaraj K Attachments: MAPREDUCE-4261.patch {code:xml} 2012-05-16 18:55:54,222 INFO [Thread-1] org.apache.hadoop.yarn.service.CompositeService: Error stopping org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerLauncherRouter java.lang.NullPointerException at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerLauncherRouter.stop(MRAppMaster.java:716) at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99) at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$MRAppMasterShutdownHook.run(MRAppMaster.java:1036) at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54) 2012-05-16 18:55:54,222 INFO [Thread-1] org.apache.hadoop.yarn.service.CompositeService: Error stopping org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter java.lang.NullPointerException at 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.getStat(RMContainerAllocator.java:521) at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.stop(RMContainerAllocator.java:227) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.stop(MRAppMaster.java:668) at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99) at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$MRAppMasterShutdownHook.run(MRAppMaster.java:1036) at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-4469) Resource calculation in child tasks is CPU-heavy
[ https://issues.apache.org/jira/browse/MAPREDUCE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524602#comment-14524602 ] Hadoop QA commented on MAPREDUCE-4469: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12565910/MAPREDUCE-4469_rev5.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5506/console | This message was automatically generated. Resource calculation in child tasks is CPU-heavy Key: MAPREDUCE-4469 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4469 Project: Hadoop Map/Reduce Issue Type: Bug Components: performance, task Affects Versions: 1.0.3 Reporter: Todd Lipcon Assignee: Ahmed Radwan Attachments: MAPREDUCE-4469.patch, MAPREDUCE-4469_rev2.patch, MAPREDUCE-4469_rev3.patch, MAPREDUCE-4469_rev4.patch, MAPREDUCE-4469_rev5.patch In doing some benchmarking on a hadoop-1 derived codebase, I noticed that each of the child tasks was doing a ton of syscalls. Upon stracing, I noticed that it's spending a lot of time looping through all the files in /proc to calculate resource usage. As a test, I added a flag to disable use of the ResourceCalculatorPlugin within the tasks. On a CPU-bound 500G-sort workload, this improved total job runtime by about 10% (map slot-seconds by 14%, reduce slot seconds by 8%) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-4998) backport MAPREDUCE-3376: Old mapred API combiner uses NULL reporter to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-4998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524604#comment-14524604 ] Hadoop QA commented on MAPREDUCE-4998: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12568797/MAPREDUCE-4998-branch-1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5508/console | This message was automatically generated. backport MAPREDUCE-3376: Old mapred API combiner uses NULL reporter to branch-1 --- Key: MAPREDUCE-4998 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4998 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Reporter: Jim Donofrio Priority: Minor Attachments: MAPREDUCE-4998-branch-1.patch http://s.apache.org/eI9 backport MAPREDUCE-3376: Old mapred API combiner uses NULL reporter to branch-1 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-4882) Error in estimating the length of the output file in Spill Phase
[ https://issues.apache.org/jira/browse/MAPREDUCE-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524601#comment-14524601 ] Hadoop QA commented on MAPREDUCE-4882: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12566626/MAPREDUCE-4882.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5507/console | This message was automatically generated. Error in estimating the length of the output file in Spill Phase Key: MAPREDUCE-4882 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4882 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.2, 1.0.3 Environment: Any Environment Reporter: Lijie Xu Assignee: Jerry Chen Labels: patch Attachments: MAPREDUCE-4882.patch Original Estimate: 1h Remaining Estimate: 1h The sortAndSpill() method in MapTask.java has an error in estimating the length of the output file. The long size should be (bufvoid - bufstart) + bufend, not (bufvoid - bufend) + bufstart, when bufend < bufstart. Here is the original code in MapTask.java:
{code}
private void sortAndSpill() throws IOException, ClassNotFoundException,
                                   InterruptedException {
  // approximate the length of the output file to be the length of the
  // buffer + header lengths for the partitions
  long size = (bufend >= bufstart
      ? bufend - bufstart
      : (bufvoid - bufend) + bufstart)
      + partitions * APPROX_HEADER_LENGTH;
  FSDataOutputStream out = null;
{code}
I had a test on TeraSort.
A snippet from the mapper's log is as follows: MapTask: Spilling map output: record full = true MapTask: bufstart = 157286200; bufend = 10485460; bufvoid = 199229440 MapTask: kvstart = 262142; kvend = 131069; length = 655360 MapTask: Finished spill 3 In this case, Spill Bytes should be (199229440 - 157286200) + 10485460 = 52428700 (52 MB) because the number of spilled records is 524287 and each record costs 100B. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
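The wrap-around arithmetic in the report above can be checked directly against the log figures. A minimal sketch, with a hypothetical class name (the real logic lives inline in MapTask.sortAndSpill()):

```java
// Sketch of the spill-size estimate for MapTask's circular sort buffer.
// Variable names (bufstart, bufend, bufvoid) follow the snippet above.
// The wrapped-buffer branch uses the corrected formula from the report:
// when bufend < bufstart, spilled bytes = (bufvoid - bufstart) + bufend.
public class SpillSizeEstimate {
    static long spillBytes(long bufstart, long bufend, long bufvoid) {
        return bufend >= bufstart
                ? bufend - bufstart              // contiguous region
                : (bufvoid - bufstart) + bufend; // wrapped region (the fix)
    }

    public static void main(String[] args) {
        // Figures from the TeraSort log above:
        // 524287 records * 100 B/record = 52428700 B (~52 MB).
        System.out.println(spillBytes(157286200L, 10485460L, 199229440L)); // prints 52428700
    }
}
```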
[jira] [Commented] (MAPREDUCE-4917) multiple BlockFixer should be supported in order to improve scalability and reduce too much work on single BlockFixer
[ https://issues.apache.org/jira/browse/MAPREDUCE-4917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524598#comment-14524598 ] Hadoop QA commented on MAPREDUCE-4917: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12563471/MAPREDUCE-4917.2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5505/console | This message was automatically generated. multiple BlockFixer should be supported in order to improve scalability and reduce too much work on single BlockFixer - Key: MAPREDUCE-4917 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4917 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/raid Affects Versions: 0.22.0 Reporter: Jun Jin Assignee: Jun Jin Labels: patch Fix For: 0.22.0 Attachments: MAPREDUCE-4917.1.patch, MAPREDUCE-4917.2.patch Original Estimate: 672h Remaining Estimate: 672h The current implementation can only run a single BlockFixer, since fsck (in RaidDFSUtil.getCorruptFiles) only checks the whole DFS file system; if multiple BlockFixers were launched, they would all do the same work and try to fix the same files. The change/fix will mainly be in BlockFixer.java and RaidDFSUtil.getCorruptFile(), to enable fsck to check the different paths defined in separate Raid.xml files for a single RaidNode/BlockFixer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-4956) The Additional JH Info Should Be Exposed
[ https://issues.apache.org/jira/browse/MAPREDUCE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524607#comment-14524607 ] Hadoop QA commented on MAPREDUCE-4956: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12574452/MAPREDUCE-4956_3.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5509/console | This message was automatically generated. The Additional JH Info Should Be Exposed Key: MAPREDUCE-4956 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4956 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: MAPREDUCE-4956_1.patch, MAPREDUCE-4956_2.patch, MAPREDUCE-4956_3.patch In MAPREDUCE-4838, the additional info was added to JH. It would be useful to expose this info, at least via the UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-4980) Parallel test execution of hadoop-mapreduce-client-core
[ https://issues.apache.org/jira/browse/MAPREDUCE-4980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524728#comment-14524728 ] Hadoop QA commented on MAPREDUCE-4980: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 1s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12611165/MAPREDUCE-4980--n8.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5533/console | This message was automatically generated. Parallel test execution of hadoop-mapreduce-client-core --- Key: MAPREDUCE-4980 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4980 Project: Hadoop Map/Reduce Issue Type: Test Components: test Affects Versions: 3.0.0 Reporter: Tsuyoshi Ozawa Assignee: Andrey Klochkov Attachments: MAPREDUCE-4980--n3.patch, MAPREDUCE-4980--n4.patch, MAPREDUCE-4980--n5.patch, MAPREDUCE-4980--n6.patch, MAPREDUCE-4980--n7.patch, MAPREDUCE-4980--n7.patch, MAPREDUCE-4980--n8.patch, MAPREDUCE-4980.1.patch, MAPREDUCE-4980.patch The maven surefire plugin supports parallel test execution. By using it, the tests can run faster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
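Surefire's parallel execution is configured in the pom. A hypothetical fragment, not taken from the attached patches, with illustrative values (parameter names are from the maven-surefire-plugin documentation):

```xml
<!-- Illustrative maven-surefire-plugin configuration; values are examples,
     not the settings used by the MAPREDUCE-4980 patches. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <forkCount>1C</forkCount>        <!-- one forked JVM per CPU core -->
    <reuseForks>true</reuseForks>
    <parallel>classes</parallel>     <!-- run test classes concurrently -->
    <threadCount>4</threadCount>
  </configuration>
</plugin>
```

Whether `parallel` within a fork is safe depends on the tests sharing no mutable static state or working directories, which is typically the hard part of a change like this.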
[jira] [Commented] (MAPREDUCE-5403) MR changes to accommodate yarn.application.classpath being moved to the server-side
[ https://issues.apache.org/jira/browse/MAPREDUCE-5403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524698#comment-14524698 ] Hadoop QA commented on MAPREDUCE-5403: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12594253/MAPREDUCE-5403-2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle site | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5527/console | This message was automatically generated. MR changes to accommodate yarn.application.classpath being moved to the server-side --- Key: MAPREDUCE-5403 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5403 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: 2.0.5-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5403-1.patch, MAPREDUCE-5403-2.patch, MAPREDUCE-5403.patch yarn.application.classpath is a confusing property because it is used by MapReduce and not YARN, and MapReduce already has mapreduce.application.classpath, which provides the same functionality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5377) JobID is not displayed truly by hadoop job -history command
[ https://issues.apache.org/jira/browse/MAPREDUCE-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524709#comment-14524709 ] Hadoop QA commented on MAPREDUCE-5377: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12591055/MAPREDUCE-5377.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5531/console | This message was automatically generated. JobID is not displayed truly by hadoop job -history command - Key: MAPREDUCE-5377 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5377 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Affects Versions: 1.2.0 Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Priority: Minor Labels: newbie Attachments: MAPREDUCE-5377.patch The JobID output by the hadoop job -history command is a wrong string. {quote} [hadoop@hadoop hadoop]$ hadoop job -history terasort Hadoop job: 0001_1374260789919_hadoop = Job tracker host name: job job tracker start time: Tue May 18 15:39:51 PDT 1976 User: hadoop JobName: TeraSort JobConf: hdfs://hadoop:8020/hadoop/mapred/staging/hadoop/.staging/job_201307191206_0001/job.xml Submitted At: 19-7-2013 12:06:29 Launched At: 19-7-2013 12:06:30 (0sec) Finished At: 19-7-2013 12:06:44 (14sec) Status: SUCCESS {quote} In this example, it should show job_201307191206_0001 after Hadoop job:, but shows 0001_1374260789919_hadoop. In addition, the job tracker host name and job tracker start time are invalid. This problem can be solved by fixing how the jobId is set in HistoryViewer(). The JobTracker information in HistoryViewer should also be fixed. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5150) Backport 2009 terasort (MAPREDUCE-639) to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524707#comment-14524707 ] Hadoop QA commented on MAPREDUCE-5150: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12578622/MAPREDUCE-5150-branch-1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5530/console | This message was automatically generated. Backport 2009 terasort (MAPREDUCE-639) to branch-1 -- Key: MAPREDUCE-5150 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5150 Project: Hadoop Map/Reduce Issue Type: Improvement Components: examples Affects Versions: 1.2.0 Reporter: Gera Shegalov Priority: Minor Attachments: MAPREDUCE-5150-branch-1.patch Users evaluate the performance of Hadoop clusters using different benchmarks such as TeraSort. However, the terasort version in branch-1 is outdated. It works on a teragen dataset that cannot exceed 4 billion unique keys, and it does not have the fast non-sampling partitioner SimplePartitioner either. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-3936) Clients should not enforce counter limits
[ https://issues.apache.org/jira/browse/MAPREDUCE-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524711#comment-14524711 ] Hadoop QA commented on MAPREDUCE-3936: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12544972/MAPREDUCE-3936.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5532/console | This message was automatically generated. Clients should not enforce counter limits -- Key: MAPREDUCE-3936 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3936 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv1 Reporter: Tom White Assignee: Tom White Attachments: MAPREDUCE-3936.patch, MAPREDUCE-3936.patch The code for enforcing counter limits (from MAPREDUCE-1943) creates a static JobConf instance to load the limits, which may throw an exception if the client limit is set to be lower than the limit on the cluster (perhaps because the cluster limit was raised from the default). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5365) Set mapreduce.job.classloader to true by default
[ https://issues.apache.org/jira/browse/MAPREDUCE-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524704#comment-14524704 ] Hadoop QA commented on MAPREDUCE-5365: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12590345/MAPREDUCE-5365.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5529/console | This message was automatically generated. Set mapreduce.job.classloader to true by default Key: MAPREDUCE-5365 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5365 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.0.5-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5365.patch MAPREDUCE-1700 introduced the mapreduce.job.classloader option, which uses a custom classloader to separate system classes from user classes. It seems like there are only rare cases when a user would not want this on, and that it should be enabled by default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-4346) Adding a refined version of JobTracker.getAllJobs() and exposing through the JobClient
[ https://issues.apache.org/jira/browse/MAPREDUCE-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524703#comment-14524703 ] Hadoop QA commented on MAPREDUCE-4346: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12533607/MAPREDUCE-4346_rev4.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5528/console | This message was automatically generated. Adding a refined version of JobTracker.getAllJobs() and exposing through the JobClient -- Key: MAPREDUCE-4346 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4346 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv1 Reporter: Ahmed Radwan Assignee: Ahmed Radwan Attachments: MAPREDUCE-4346.patch, MAPREDUCE-4346_rev2.patch, MAPREDUCE-4346_rev3.patch, MAPREDUCE-4346_rev4.patch The current implementation of JobTracker.getAllJobs() returns all submitted jobs in any state, in addition to retired jobs. This list can be long and represents unneeded overhead, especially for clients only interested in jobs in specific state(s). It would be beneficial to include a refined version that returns only jobs with specific statuses, with retired jobs optionally included. I'll be uploading an initial patch momentarily. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
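The proposed refinement amounts to a filter over the full job list. A minimal sketch, where `Status` is a hypothetical stand-in for Hadoop's much richer JobStatus class (names and shape are illustrative only, not the patch's API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class JobFilter {
    // Hypothetical minimal stand-in for org.apache.hadoop.mapred.JobStatus.
    static final class Status {
        final String id; final String state; final boolean retired;
        Status(String id, String state, boolean retired) {
            this.id = id; this.state = state; this.retired = retired;
        }
    }

    // Refined getAllJobs(): keep only jobs in the requested states,
    // optionally including retired jobs.
    static List<Status> getJobs(List<Status> all, Set<String> states,
                                boolean includeRetired) {
        List<Status> out = new ArrayList<>();
        for (Status j : all) {
            if (j.retired && !includeRetired) continue;
            if (states.contains(j.state)) out.add(j);
        }
        return out;
    }
}
```

A client interested only in running jobs would then transfer a short list instead of the tracker's entire submission history.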
[jira] [Commented] (MAPREDUCE-3807) JobTracker needs fix similar to HDFS-94
[ https://issues.apache.org/jira/browse/MAPREDUCE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524693#comment-14524693 ] Hadoop QA commented on MAPREDUCE-3807: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12515105/MAPREDUCE-3807.patch | | Optional Tests | | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5526/console | This message was automatically generated. JobTracker needs fix similar to HDFS-94 --- Key: MAPREDUCE-3807 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3807 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.0.0 Reporter: Harsh J Labels: newbie Attachments: MAPREDUCE-3807.patch 1.0 JobTracker's jobtracker.jsp page currently shows: {code} <h2>Cluster Summary (Heap Size is <%= StringUtils.byteDesc(Runtime.getRuntime().totalMemory()) %>/<%= StringUtils.byteDesc(Runtime.getRuntime().maxMemory()) %>)</h2> {code} It could use the same improvement as HDFS-94 to reflect live heap usage more accurately. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
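The HDFS-94 style improvement is to report the heap actually in use, `totalMemory() - freeMemory()`, rather than the committed heap alone. A minimal sketch of that idea (class name and output format are hypothetical, not the patch's code):

```java
// Illustrative only: totalMemory() is the heap the JVM has committed,
// not what is live; subtracting freeMemory() gives actual usage.
public class HeapSummary {
    static String summary() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory(); // live heap
        long committed = rt.totalMemory();
        long max = rt.maxMemory();
        return String.format("Heap Size is %d MB / %d MB (committed %d MB)",
                used >> 20, max >> 20, committed >> 20);
    }

    public static void main(String[] args) {
        System.out.println(summary());
    }
}
```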
[jira] [Updated] (MAPREDUCE-6345) Documentation fix for when CRLA is enabled for MRAppMaster logs
[ https://issues.apache.org/jira/browse/MAPREDUCE-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gera Shegalov updated MAPREDUCE-6345: - Summary: Documentation fix for when CRLA is enabled for MRAppMaster logs (was: Documentation fix for when CRLA is enabled for MR AppMaster logs) Documentation fix for when CRLA is enabled for MRAppMaster logs --- Key: MAPREDUCE-6345 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6345 Project: Hadoop Map/Reduce Issue Type: Bug Components: documentation Affects Versions: 2.4.0, 2.5.0, 2.6.0, 2.7.0 Reporter: Rohit Agarwal Assignee: Rohit Agarwal Priority: Trivial Attachments: MAPREDUCE-6345.patch CRLA is enabled for the ApplicationMaster when both yarn.app.mapreduce.am.container.log.limit.kb (not mapreduce.task.userlog.limit.kb) and yarn.app.mapreduce.am.container.log.backups are greater than zero. This was changed in MAPREDUCE-5773. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6345) Documentation fix for when CRLA is enabled for MR AppMaster logs
[ https://issues.apache.org/jira/browse/MAPREDUCE-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gera Shegalov updated MAPREDUCE-6345: - Summary: Documentation fix for when CRLA is enabled for MR AppMaster logs (was: Documentation fix for when CRLA is enabled for MR App Master's logs) Documentation fix for when CRLA is enabled for MR AppMaster logs Key: MAPREDUCE-6345 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6345 Project: Hadoop Map/Reduce Issue Type: Bug Components: documentation Affects Versions: 2.4.0, 2.5.0, 2.6.0, 2.7.0 Reporter: Rohit Agarwal Assignee: Rohit Agarwal Priority: Trivial Attachments: MAPREDUCE-6345.patch CRLA is enabled for the ApplicationMaster when both yarn.app.mapreduce.am.container.log.limit.kb (not mapreduce.task.userlog.limit.kb) and yarn.app.mapreduce.am.container.log.backups are greater than zero. This was changed in MAPREDUCE-5773. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6345) Documentation fix for when CRLA is enabled for MRAppMaster logs
[ https://issues.apache.org/jira/browse/MAPREDUCE-6345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524453#comment-14524453 ] Hudson commented on MAPREDUCE-6345: --- SUCCESS: Integrated in Hadoop-trunk-Commit #7716 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7716/]) MAPREDUCE-6345. Documentation fix for when CRLA is enabled for MRAppMaster logs. (Rohit Agarwal via gera) (gera: rev f1a152cc0adc071277c80637ea6f5faa0bf06a1a) * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml Documentation fix for when CRLA is enabled for MRAppMaster logs --- Key: MAPREDUCE-6345 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6345 Project: Hadoop Map/Reduce Issue Type: Bug Components: documentation Affects Versions: 2.4.0, 2.5.0, 2.6.0, 2.7.0 Reporter: Rohit Agarwal Assignee: Rohit Agarwal Priority: Trivial Attachments: MAPREDUCE-6345.patch CRLA is enabled for the ApplicationMaster when both yarn.app.mapreduce.am.container.log.limit.kb (not mapreduce.task.userlog.limit.kb) and yarn.app.mapreduce.am.container.log.backups are greater than zero. This was changed in MAPREDUCE-5773. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-4136) Hadoop streaming might succeed even through reducer fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-4136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524554#comment-14524554 ] Hadoop QA commented on MAPREDUCE-4136: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12522230/mapreduce-4136.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5493/console | This message was automatically generated. Hadoop streaming might succeed even through reducer fails - Key: MAPREDUCE-4136 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4136 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming Affects Versions: 0.20.205.0 Reporter: Wouter de Bie Attachments: mapreduce-4136.patch Hadoop streaming can succeed even though the reducer has failed. This happens when Hadoop calls {{PipeReducer.close()}}, but in the meantime the reducer has failed and the process has died. When {{clientOut_.flush()}} throws an {{IOException}} in {{PipeMapRed.mapRedFinish()}}, this exception is caught but only logged. The exit status of the child process is never checked and the task is marked as successful. I've attached a patch that seems to fix it for us. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
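The missing step described in the report is checking the child's exit status instead of only logging the flush failure. A minimal sketch with hypothetical names (the real fix belongs in {{PipeMapRed.mapRedFinish()}}; the demo in main assumes a POSIX shell):

```java
import java.io.IOException;

public class PipeFinish {
    // Hypothetical helper: surface an abnormal reducer-child exit as a
    // task failure rather than silently marking the task successful.
    static void checkExit(int exitCode) throws IOException {
        if (exitCode != 0) {
            throw new IOException("reducer child exited with status " + exitCode);
        }
    }

    public static void main(String[] args) throws Exception {
        // Spawn a child that fails and verify the failure is surfaced.
        Process child = new ProcessBuilder("sh", "-c", "exit 3").start();
        child.waitFor();
        try {
            checkExit(child.exitValue());
        } catch (IOException expected) {
            System.out.println("caught: " + expected.getMessage());
        }
    }
}
```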
[jira] [Commented] (MAPREDUCE-3876) vertica query, sql command not properly ended
[ https://issues.apache.org/jira/browse/MAPREDUCE-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524547#comment-14524547 ] Hadoop QA commented on MAPREDUCE-3876: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12516880/HADOOP-oracleDriver-src.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5492/console | This message was automatically generated. vertica query, sql command not properly ended - Key: MAPREDUCE-3876 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3876 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Affects Versions: 1.0.0 Environment: Red Hat 5.5 Oracle 11 Reporter: Joseph Doss Labels: hadoop, newbie, patch Attachments: HADOOP-oracleDriver-src.patch When running a test script, we're getting a java IO exception thrown. This test works on hadoop-0.20.0 but not on hadoop-1.0.0. Fri Feb 17 11:36:40 EST 2012 Running processes with name syncGL.sh: 0 LIB_JARS: /home/hadoop/verticasync/lib/vertica_4.1.14_jdk_5.jar,/home/hadoop/verticasync/lib/mail.jar,/home/hadoop/verticasync/lib/jdbc14.jar VERTICA_SYNC_JAR: /home/hadoop/verticasync/lib/vertica-sync.jar PROPERTIES_FILE: /home/hadoop/verticasync/config/ssp-vertica-sync-gl.properties Starting Vertica data sync - GL - process Warning: $HADOOP_HOME is deprecated. 
12/02/17 11:36:43 INFO mapred.JobClient: Running job: job_201202171122_0001 12/02/17 11:36:44 INFO mapred.JobClient: map 0% reduce 0% 12/02/17 11:36:56 INFO mapred.JobClient: Task Id : attempt_201202171122_0001_m_00_0, Status : FAILED java.io.IOException: ORA-00933: SQL command not properly ended at org.apache.hadoop.mapred.lib.db.DBInputFormat.getRecordReader(DBInputFormat.java:289) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.init(MapTask.java:197) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083) at org.apache.hadoop.mapred.Child.main(Child.java:249) 12/02/17 11:36:57 INFO mapred.JobClient: Task Id : attempt_201202171122_0001_m_01_0, Status : FAILED java.io.IOException: ORA-00933: SQL command not properly ended at org.apache.hadoop.mapred.lib.db.DBInputFormat.getRecordReader(DBInputFormat.java:289) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.init(MapTask.java:197) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083) at org.apache.hadoop.mapred.Child.main(Child.java:249) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-4506) EofException / 'connection reset by peer' while copying map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-4506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524557#comment-14524557 ] Hadoop QA commented on MAPREDUCE-4506: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12538889/ReduceTask.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5494/console | This message was automatically generated. EofException / 'connection reset by peer' while copying map output --- Key: MAPREDUCE-4506 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4506 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.0.3 Environment: Ubuntu Linux 12.04 LTS, 64-bit, Java 6 update 33 Reporter: Piotr Kołaczkowski Priority: Minor Attachments: RamManager.patch, ReduceTask.patch When running complex mapreduce jobs with many mappers and reducers (e.g. 
8 mappers, 8 reducers on an 8-core machine), sometimes the following exceptions pop up in the logs during the shuffle phase: {noformat} WARN [570516323@qtp-2060060479-164] 2012-07-19 02:50:21,229 TaskTracker.java (line 3894) getMapOutput(attempt_201207161621_0217_m_71_0,0) failed : org.mortbay.jetty.EofException at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:787) at org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:568) at org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:1005) at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:648) at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:579) at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3872) at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1166) at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:835) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:923) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:547) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) Caused by: java.io.IOException: Connection reset by peer at sun.nio.ch.FileDispatcher.write0(Native Method) at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29) at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:72) at sun.nio.ch.IOUtil.write(IOUtil.java:43) at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334) at org.mortbay.io.nio.ChannelEndPoint.flush(ChannelEndPoint.java:169) at org.mortbay.io.nio.SelectChannelEndPoint.flush(SelectChannelEndPoint.java:221) at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:721) {noformat} The problem looks like some network problems at first, however it turns out that hadoop shuffleInMemory sometimes deliberately closes map-output-copy connections just to reopen them a few milliseconds later, because of temporary
[jira] [Commented] (MAPREDUCE-3882) fix some compile warnings of hadoop-mapreduce-examples
[ https://issues.apache.org/jira/browse/MAPREDUCE-3882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524546#comment-14524546 ] Hadoop QA commented on MAPREDUCE-3882: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12515084/mapreduce-3882.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5491/console | This message was automatically generated. fix some compile warnings of hadoop-mapreduce-examples -- Key: MAPREDUCE-3882 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3882 Project: Hadoop Map/Reduce Issue Type: Improvement Environment: Windows 7 Reporter: Changming Sun Priority: Minor Attachments: mapreduce-3882.patch Original Estimate: 2m Remaining Estimate: 2m fix some compile warnings of hadoop-mapreduce-examples -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-4308) Remove excessive split log messages
[ https://issues.apache.org/jira/browse/MAPREDUCE-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524565#comment-14524565 ] Hadoop QA commented on MAPREDUCE-4308: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12530811/mapreduce-4308-branch-1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5497/console | This message was automatically generated. Remove excessive split log messages --- Key: MAPREDUCE-4308 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4308 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobtracker Affects Versions: 1.0.3 Reporter: Kihwal Lee Attachments: mapreduce-4308-branch-1.patch The job tracker currently prints out information on every split. {noformat} 2012-05-20 00:06:01,985 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201205100740_1745_m_00 has split on node:/192.168.0.1 /my.totally.madeup.host.com {noformat} I looked at one cluster and these messages were taking up more than 30% of the JT log. If jobs have a large number of maps, it can be worse. I think it is reasonable to lower the log level of the statement from INFO to DEBUG. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-1290) DBOutputFormat does not support rewriteBatchedStatements when using MySQL jdbc drivers
[ https://issues.apache.org/jira/browse/MAPREDUCE-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524559#comment-14524559 ] Hadoop QA commented on MAPREDUCE-1290: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12542003/MapReduce-1290-trunk.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5495/console | This message was automatically generated. DBOutputFormat does not support rewriteBatchedStatements when using MySQL jdbc drivers -- Key: MAPREDUCE-1290 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1290 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.1 Reporter: Joe Crobak Labels: DBOutoutFormat, patch Attachments: MAPREDUCE-1290.patch, MapReduce-1290-trunk.patch The DBOutputFormat adds a semi-colon to the end of the INSERT statement that it uses to save fields to the database. Semicolons are typically used in command line programs but are not needed when using the JDBC API. In this case, the stray semi-colon breaks rewriteBatchedStatement support. See: http://forums.mysql.com/read.php?39,271526,271526#msg-271526 for an example. In my use case, rewriteBatchedStatement is very useful because it increases the speed of inserts and reduces memory consumption. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
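The fix described above amounts to building the INSERT statement without the trailing semicolon, since the JDBC API does not require one and MySQL's rewriteBatchedStatements rewriting chokes on it. A minimal plain-Java sketch, modeled loosely on DBOutputFormat#constructQuery (the class and method names here are illustrative, not Hadoop's API):

```java
public class InsertQueryBuilder {
    // Builds "INSERT INTO <table> (f1,f2,...) VALUES (?,?,...)" with no
    // trailing ';' -- dropping the semicolon is the whole fix.
    public static String constructQuery(String table, String[] fieldNames) {
        StringBuilder query = new StringBuilder("INSERT INTO ").append(table).append(" (");
        for (int i = 0; i < fieldNames.length; i++) {
            query.append(fieldNames[i]);
            if (i != fieldNames.length - 1) query.append(",");
        }
        query.append(") VALUES (");
        for (int i = 0; i < fieldNames.length; i++) {
            query.append("?");
            if (i != fieldNames.length - 1) query.append(",");
        }
        query.append(")");
        return query.toString();
    }
}
```

With the semicolon gone, Connector/J can rewrite the batched statements into a single multi-row INSERT.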
[jira] [Commented] (MAPREDUCE-4639) CombineFileInputFormat#getSplits should throw IOException when input paths contain a directory
[ https://issues.apache.org/jira/browse/MAPREDUCE-4639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524564#comment-14524564 ] Hadoop QA commented on MAPREDUCE-4639: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12544000/MAPREDUCE-4639.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5496/console | This message was automatically generated. CombineFileInputFormat#getSplits should throw IOException when input paths contain a directory -- Key: MAPREDUCE-4639 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4639 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Reporter: Jim Donofrio Priority: Minor Attachments: MAPREDUCE-4639.patch FileInputFormat#getSplits throws an IOException when the input paths contain a directory. CombineFileInputFormat should do the same; otherwise the job will not fail until the record reader is initialized, when FileSystem#open will say that the directory does not exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
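The proposed behavior is to fail fast at split computation when an input path is a directory, mirroring what FileInputFormat#getSplits already does. An illustrative plain-Java sketch; the InputPath record and validateInputs method are simplified stand-ins for Hadoop's FileStatus-based checks, not its real API:

```java
import java.io.IOException;
import java.util.List;

public class SplitCheck {
    public record InputPath(String path, boolean isDirectory) {}

    public static void validateInputs(List<InputPath> inputs) throws IOException {
        for (InputPath in : inputs) {
            if (in.isDirectory()) {
                // Throw here, at split computation time, instead of failing
                // later when the record reader opens the path and reports a
                // confusing "does not exist" style error.
                throw new IOException("Not a file: " + in.path());
            }
        }
    }
}
```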
[jira] [Commented] (MAPREDUCE-4473) tasktracker rank on machines.jsp?type=active
[ https://issues.apache.org/jira/browse/MAPREDUCE-4473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524567#comment-14524567 ] Hadoop QA commented on MAPREDUCE-4473: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12537657/MAPREDUCE-4473.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5498/console | This message was automatically generated. tasktracker rank on machines.jsp?type=active Key: MAPREDUCE-4473 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4473 Project: Hadoop Map/Reduce Issue Type: Improvement Components: tasktracker Affects Versions: 0.20.2, 0.21.0, 0.22.0, 0.23.0, 0.23.1, 1.0.0, 1.0.1, 1.0.2, 1.0.3 Reporter: jian fan Priority: Minor Labels: tasktracker Attachments: MAPREDUCE-4473.patch sometimes we need to simple judge which tasktracker is down from the page of machines.jsp?type=active -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-4293) Rumen TraceBuilder gets NPE some times
[ https://issues.apache.org/jira/browse/MAPREDUCE-4293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524545#comment-14524545 ] Hadoop QA commented on MAPREDUCE-4293: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12530340/4293.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5490/console | This message was automatically generated. Rumen TraceBuilder gets NPE some times -- Key: MAPREDUCE-4293 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4293 Project: Hadoop Map/Reduce Issue Type: Bug Components: tools/rumen Reporter: Ravi Gummadi Assignee: Ravi Gummadi Attachments: 4293.patch Rumen TraceBuilder's JobBuilder.processTaskFailedEvent throws NPE if failedDueToAttempt is not available in history. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5748) Potential null pointer dereference in ShuffleHandler#Shuffle#messageReceived()
[ https://issues.apache.org/jira/browse/MAPREDUCE-5748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524766#comment-14524766 ] Hadoop QA commented on MAPREDUCE-5748: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12635637/0001-MAPREDUCE-5748-Potential-null-pointer-deference-in-S.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5540/console | This message was automatically generated. Potential null pointer dereference in ShuffleHandler#Shuffle#messageReceived() Key: MAPREDUCE-5748 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5748 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Ted Yu Priority: Minor Attachments: 0001-MAPREDUCE-5748-Potential-null-pointer-deference-in-S.patch Starting around line 510:
{code}
ChannelFuture lastMap = null;
for (String mapId : mapIds) {
  ...
}
lastMap.addListener(metrics);
lastMap.addListener(ChannelFutureListener.CLOSE);
{code}
If mapIds is empty, lastMap would remain null, leading to NPE in the addListener() call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
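A minimal sketch of the null guard the report implies: only attach the listeners when at least one map output was actually sent. The FutureStub class below is a stand-in for Netty's ChannelFuture and the metrics listener, used here purely to isolate the control flow:

```java
import java.util.ArrayList;
import java.util.List;

public class LastMapGuard {
    public static class FutureStub {
        final List<String> listeners = new ArrayList<>();
        void addListener(String l) { listeners.add(l); }
    }

    /** Returns the future the listeners were attached to, or null if mapIds was empty. */
    public static FutureStub attachListeners(List<String> mapIds) {
        FutureStub lastMap = null;
        for (String mapId : mapIds) {
            lastMap = new FutureStub(); // stands in for sendMapOutput(...)
        }
        if (lastMap != null) {          // the guard that prevents the NPE
            lastMap.addListener("metrics");
            lastMap.addListener("CLOSE");
        }
        return lastMap;
    }
}
```

An equivalent fix is an early return (or error response) when mapIds is empty, before the loop runs at all.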
[jira] [Commented] (MAPREDUCE-5621) mr-jobhistory-daemon.sh doesn't have to execute mkdir and chown all the time
[ https://issues.apache.org/jira/browse/MAPREDUCE-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524778#comment-14524778 ] Hadoop QA commented on MAPREDUCE-5621: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12613541/MAPREDUCE-5621.patch | | Optional Tests | shellcheck | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5542/console | This message was automatically generated. mr-jobhistory-daemon.sh doesn't have to execute mkdir and chown all the time Key: MAPREDUCE-5621 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5621 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Affects Versions: 3.0.0 Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Priority: Minor Attachments: MAPREDUCE-5621.patch mr-jobhistory-daemon.sh executes mkdir and chown commands to set up the log output directory. These are always executed, whether or not the directory already exists, and they run not only when starting the daemon but also when stopping it. The script should guard them with an if-check, the way hadoop-daemon.sh and yarn-daemon.sh do. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-3876) vertica query, sql command not properly ended
[ https://issues.apache.org/jira/browse/MAPREDUCE-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524539#comment-14524539 ] Hadoop QA commented on MAPREDUCE-3876: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12516880/HADOOP-oracleDriver-src.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5489/console | This message was automatically generated. vertica query, sql command not properly ended - Key: MAPREDUCE-3876 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3876 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Affects Versions: 1.0.0 Environment: Red Hat 5.5 Oracle 11 Reporter: Joseph Doss Labels: hadoop, newbie, patch Attachments: HADOOP-oracleDriver-src.patch When running a test script, we're getting a java IO exception thrown. This test works on hadoop-0.20.0 but not on hadoop-1.0.0. Fri Feb 17 11:36:40 EST 2012 Running processes with name syncGL.sh: 0 LIB_JARS: /home/hadoop/verticasync/lib/vertica_4.1.14_jdk_5.jar,/home/hadoop/verticasync/lib/mail.jar,/home/hadoop/verticasync/lib/jdbc14.jar VERTICA_SYNC_JAR: /home/hadoop/verticasync/lib/vertica-sync.jar PROPERTIES_FILE: /home/hadoop/verticasync/config/ssp-vertica-sync-gl.properties Starting Vertica data sync - GL - process Warning: $HADOOP_HOME is deprecated. 
12/02/17 11:36:43 INFO mapred.JobClient: Running job: job_201202171122_0001 12/02/17 11:36:44 INFO mapred.JobClient: map 0% reduce 0% 12/02/17 11:36:56 INFO mapred.JobClient: Task Id : attempt_201202171122_0001_m_00_0, Status : FAILED java.io.IOException: ORA-00933: SQL command not properly ended at org.apache.hadoop.mapred.lib.db.DBInputFormat.getRecordReader(DBInputFormat.java:289) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.init(MapTask.java:197) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083) at org.apache.hadoop.mapred.Child.main(Child.java:249) 12/02/17 11:36:57 INFO mapred.JobClient: Task Id : attempt_201202171122_0001_m_01_0, Status : FAILED java.io.IOException: ORA-00933: SQL command not properly ended at org.apache.hadoop.mapred.lib.db.DBInputFormat.getRecordReader(DBInputFormat.java:289) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.init(MapTask.java:197) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083) at org.apache.hadoop.mapred.Child.main(Child.java:249) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-3882) fix some compile warnings of hadoop-mapreduce-examples
[ https://issues.apache.org/jira/browse/MAPREDUCE-3882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524537#comment-14524537 ] Hadoop QA commented on MAPREDUCE-3882: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12515084/mapreduce-3882.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5488/console | This message was automatically generated. fix some compile warnings of hadoop-mapreduce-examples -- Key: MAPREDUCE-3882 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3882 Project: Hadoop Map/Reduce Issue Type: Improvement Environment: Windows 7 Reporter: Changming Sun Priority: Minor Attachments: mapreduce-3882.patch Original Estimate: 2m Remaining Estimate: 2m fix some compile warnings of hadoop-mapreduce-examples -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-4271) Make TestCapacityScheduler more robust with non-Sun JDK
[ https://issues.apache.org/jira/browse/MAPREDUCE-4271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524663#comment-14524663 ] Hadoop QA commented on MAPREDUCE-4271: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12567098/MAPREDUCE-4271-branch1-v2.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5522/console | This message was automatically generated. Make TestCapacityScheduler more robust with non-Sun JDK --- Key: MAPREDUCE-4271 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4271 Project: Hadoop Map/Reduce Issue Type: Bug Components: capacity-sched Affects Versions: 1.0.3 Reporter: Luke Lu Assignee: Yu Gao Labels: alt-jdk, capacity Attachments: MAPREDUCE-4271-branch1-v2.patch, mapreduce-4271-branch-1.patch, test-afterepatch.result, test-beforepatch.result, test-patch.result The capacity scheduler queue is initialized with a HashMap, the values of which are later added to a list (a queue for assigning tasks). TestCapacityScheduler depends on the order of the list hence not portable across JDKs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
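The portability problem above is that HashMap iteration order is unspecified and differs across JDK implementations, so a test asserting on the order of values copied out of a HashMap only passes by accident on one vendor's JDK. A small sketch of the general remedy (a sorted map, or an explicit sort before asserting), independent of the actual scheduler code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class DeterministicQueueOrder {
    // TreeMap iterates in key order on every JDK, so a test asserting on the
    // resulting list is portable; a HashMap-backed version would not be.
    public static List<String> queueOrder(TreeMap<String, String> queues) {
        return new ArrayList<>(queues.values());
    }
}
```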
[jira] [Commented] (MAPREDUCE-5490) MapReduce doesn't set the environment variable for children processes
[ https://issues.apache.org/jira/browse/MAPREDUCE-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524759#comment-14524759 ] Hadoop QA commented on MAPREDUCE-5490: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12629589/MAPREDUCE-5490.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5539/console | This message was automatically generated. MapReduce doesn't set the environment variable for children processes - Key: MAPREDUCE-5490 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5490 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.2.1 Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: MAPREDUCE-5490.patch, mr-5490.patch, mr-5490.patch Currently, MapReduce uses the command line argument to pass the classpath to the child. This breaks if the process forks a child that needs the same classpath. Such a case happens in Hive when it uses map-side joins. I propose that we make MapReduce in branch-1 use the CLASSPATH environment variable like YARN does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5704) Optimize nextJobId in JobTracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-5704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524744#comment-14524744 ] Hadoop QA commented on MAPREDUCE-5704: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12621052/MAPREDUCE-5704.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5535/console | This message was automatically generated. Optimize nextJobId in JobTracker Key: MAPREDUCE-5704 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5704 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobtracker, mrv1 Affects Versions: 1.2.1 Reporter: JamesLi Assignee: JamesLi Attachments: MAPREDUCE-5704.patch When the JobTracker starts, nextJobId starts at 1. If we have run 3000 jobs, then restart the JobTracker and run a new job, we cannot see this new job on jobtracker:5030/jobhistory.jsp unless we click the get-more-results button. In jobhistory_jsp.java, the array SCAN_SIZES controls how many jobs are displayed on jobhistory.jsp. I made a small change: when the JobTracker starts, find the biggest id under the history done directory, and start job ids from maxId+1, or from 1 if no job files can be found. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
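The id-scan described above can be sketched in isolation: parse the numeric job id out of history file names of the form job_&lt;clusterTimestamp&gt;_&lt;id&gt;..., take the maximum, and resume from maxId+1 (or 1 when the directory holds no job files). The file-name pattern here is an assumption based on the standard job-id format; the actual patch may parse differently:

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NextJobId {
    private static final Pattern JOB_ID = Pattern.compile("job_\\d+_(\\d+)");

    // Returns maxId + 1 over all parseable history file names, or 1 when
    // nothing parses (fresh cluster or empty history done directory).
    public static int nextJobId(List<String> historyFileNames) {
        int maxId = 0;
        for (String name : historyFileNames) {
            Matcher m = JOB_ID.matcher(name);
            if (m.find()) {
                maxId = Math.max(maxId, Integer.parseInt(m.group(1)));
            }
        }
        return maxId + 1;
    }
}
```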
[jira] [Commented] (MAPREDUCE-4502) Node-level aggregation with combining the result of maps
[ https://issues.apache.org/jira/browse/MAPREDUCE-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524738#comment-14524738 ] Hadoop QA commented on MAPREDUCE-4502: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 1s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12592783/MAPREDUCE-4502.10.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5534/console | This message was automatically generated. Node-level aggregation with combining the result of maps Key: MAPREDUCE-4502 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4502 Project: Hadoop Map/Reduce Issue Type: Improvement Components: applicationmaster, mrv2 Affects Versions: 3.0.0 Reporter: Tsuyoshi Ozawa Assignee: Tsuyoshi Ozawa Attachments: MAPREDUCE-4502.1.patch, MAPREDUCE-4502.10.patch, MAPREDUCE-4502.2.patch, MAPREDUCE-4502.3.patch, MAPREDUCE-4502.4.patch, MAPREDUCE-4502.5.patch, MAPREDUCE-4502.6.patch, MAPREDUCE-4502.7.patch, MAPREDUCE-4502.8.patch, MAPREDUCE-4502.8.patch, MAPREDUCE-4502.9.patch, MAPREDUCE-4502.9.patch, MAPREDUCE-4525-pof.diff, design_v2.pdf, design_v3.pdf, speculative_draft.pdf Shuffle is expensive in Hadoop in spite of the existence of the combiner, because the scope of combining is limited to a single MapTask. To solve this problem, a good approach is to aggregate the results of maps per node/rack by launching a combiner. This JIRA is to implement the multi-level aggregation infrastructure, including combining per container (MAPREDUCE-3902 is related) and coordinating containers through the application master, without breaking fault tolerance of jobs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-4883) Reducer's Maximum Shuffle Buffer Size should be enlarged for 64bit JVM
[ https://issues.apache.org/jira/browse/MAPREDUCE-4883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524751#comment-14524751 ] Hadoop QA commented on MAPREDUCE-4883: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12566621/MAPREDUCE-4883.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5537/console | This message was automatically generated. Reducer's Maximum Shuffle Buffer Size should be enlarged for 64bit JVM -- Key: MAPREDUCE-4883 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4883 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.20.2, 1.0.3 Environment: Especially for 64bit JVM Reporter: Lijie Xu Assignee: Jerry Chen Labels: patch Attachments: MAPREDUCE-4883.patch Original Estimate: 12h Remaining Estimate: 12h In hadoop-0.20.2, hadoop-1.0.3 or other versions, reducer's shuffle buffer size cannot exceed 2048MB (i.e., Integer.MAX_VALUE). This is reasonable for 32bit JVM. But for 64bit JVM, although reducer's JVM size can be set more than 2048MB (e.g., mapred.child.java.opts=-Xmx4000m), the heap size used for shuffle buffer is at most 2048MB * maxInMemCopyUse (default 0.7) not 4000MB * maxInMemCopyUse. So the pointed piece of code in ReduceTask.java needs modification for 64bit JVM. 
{code}
private final long maxSize;
private final long maxSingleShuffleLimit;
private long size = 0;
private Object dataAvailable = new Object();
private long fullSize = 0;
private int numPendingRequests = 0;
private int numRequiredMapOutputs = 0;
private int numClosed = 0;
private boolean closed = false;

public ShuffleRamManager(Configuration conf) throws IOException {
  final float maxInMemCopyUse =
    conf.getFloat("mapred.job.shuffle.input.buffer.percent", 0.70f);
  if (maxInMemCopyUse > 1.0 || maxInMemCopyUse < 0.0) {
    throw new IOException("mapred.job.shuffle.input.buffer.percent" +
                          maxInMemCopyUse);
  }
  // Allow unit tests to fix Runtime memory
  maxSize = (int)(conf.getInt("mapred.job.reduce.total.mem.bytes",
      (int)Math.min(Runtime.getRuntime().maxMemory(), Integer.MAX_VALUE))
      * maxInMemCopyUse);
  maxSingleShuffleLimit = (long)(maxSize * MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION);
  LOG.info("ShuffleRamManager: MemoryLimit=" + maxSize +
           ", MaxSingleShuffleLimit=" + maxSingleShuffleLimit);
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-3486) All jobs of all queues will be returned, whether a particular queueName is specified or not
[ https://issues.apache.org/jira/browse/MAPREDUCE-3486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524757#comment-14524757 ] Hadoop QA commented on MAPREDUCE-3486: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12505621/MAPREDUCE-3486.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5538/console | This message was automatically generated. All jobs of all queues will be returned, whether a particular queueName is specified or not --- Key: MAPREDUCE-3486 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3486 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 1.1.3, 1.3.0, 1.2.2 Reporter: XieXianshan Assignee: XieXianshan Priority: Minor Attachments: MAPREDUCE-3486.patch JobTracker.getJobsFromQueue(queueName) will return all jobs of all queues on the JobTracker even though I specify a queueName. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
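What the report expects getJobsFromQueue to do is filter by the requested queue rather than return everything. A plain-Java sketch of that missing filter, with the job registry simplified to a jobId-to-queueName map (not the JobTracker's real data structures):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class QueueFilter {
    public static List<String> jobsFromQueue(Map<String, String> jobToQueue, String queueName) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> e : jobToQueue.entrySet()) {
            if (e.getValue().equals(queueName)) { // the filter the bug omits
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```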
[jira] [Commented] (MAPREDUCE-5611) CombineFileInputFormat only requests a single location per split when more could be optimal
[ https://issues.apache.org/jira/browse/MAPREDUCE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524745#comment-14524745 ] Hadoop QA commented on MAPREDUCE-5611: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12613866/CombineFileInputFormat-trunk.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5536/console | This message was automatically generated. CombineFileInputFormat only requests a single location per split when more could be optimal --- Key: MAPREDUCE-5611 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5611 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.2.1 Reporter: Chandra Prakash Bhagtani Assignee: Chandra Prakash Bhagtani Attachments: CombineFileInputFormat-trunk.patch I have come across an issue with CombineFileInputFormat. Actually I ran a hive query on approx 1.2 GB data with CombineHiveInputFormat, which internally uses CombineFileInputFormat. My cluster size is 9 datanodes and max.split.size is 256 MB. When I ran this query with replication factor 9, hive consistently creates all 6 tasks rack-local, and with replication factor 3 it creates 5 rack-local and 1 data-local task. When the replication factor is 9 (equal to the cluster size), all the tasks should be data-local, as each datanode contains all the replicas of the input data, but that is not happening, i.e. all the tasks are rack-local.
When I dug into the CombineFileInputFormat.java code in the getMoreSplits method, I found the issue with the following snippet (especially in the case of a higher replication factor)
{code:title=CombineFileInputFormat.java|borderStyle=solid}
for (Iterator<Map.Entry<String, List<OneBlockInfo>>> iter =
       nodeToBlocks.entrySet().iterator(); iter.hasNext();) {
  Map.Entry<String, List<OneBlockInfo>> one = iter.next();
  nodes.add(one.getKey());
  List<OneBlockInfo> blocksInNode = one.getValue();
  // for each block, copy it into validBlocks. Delete it from
  // blockToNodes so that the same block does not appear in
  // two different splits.
  for (OneBlockInfo oneblock : blocksInNode) {
    if (blockToNodes.containsKey(oneblock)) {
      validBlocks.add(oneblock);
      blockToNodes.remove(oneblock);
      curSplitSize += oneblock.length;
      // if the accumulated split size exceeds the maximum, then
      // create this split.
      if (maxSize != 0 && curSplitSize >= maxSize) {
        // create an input split and add it to the splits array
        addCreatedSplit(splits, nodes, validBlocks);
        curSplitSize = 0;
        validBlocks.clear();
      }
    }
  }
{code}
The first node in the map nodeToBlocks has all the replicas of the input file, so the above code creates 6 splits, all with only one location. Now if the JT doesn't schedule these tasks on that node, all the tasks will be rack-local, even though all the other datanodes have all the other replicas. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
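The locality gap in the snippet above is that each split is created with only the single node currently being iterated, even when every block placed into the split is replicated on other nodes. One direction a fix can take (my reading of the report, not the committed patch) is to record the set of hosts common to all blocks in the split, so the scheduler has more data-local choices. A self-contained sketch of that set computation:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SplitLocations {
    // Hosts that hold every block assigned to the split: the intersection of
    // the per-block host sets. Passing all of these to the split (instead of
    // one node) gives the scheduler more data-local placement options.
    public static Set<String> commonHosts(List<Set<String>> blockHosts) {
        Set<String> common = new HashSet<>(blockHosts.get(0));
        for (Set<String> hosts : blockHosts.subList(1, blockHosts.size())) {
            common.retainAll(hosts);
        }
        return common;
    }
}
```

With replication factor equal to the cluster size, every per-block host set contains every node, so the intersection is the whole cluster and every task can be data-local.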
[jira] [Commented] (MAPREDUCE-4957) Throw FileNotFoundException when running in single node and mapreduce.framework.name is local
[ https://issues.apache.org/jira/browse/MAPREDUCE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524769#comment-14524769 ] Hadoop QA commented on MAPREDUCE-4957: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12566460/MAPREDUCE-4957.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5541/console | This message was automatically generated. Throw FileNotFoundException when running in single node and mapreduce.framework.name is local --- Key: MAPREDUCE-4957 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4957 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Yi Liu Assignee: Yi Liu Priority: Minor Attachments: MAPREDUCE-4957.patch, MAPREDUCE-4957.patch Run in single node and mapreduce.framework.name is local, and get following error: java.io.FileNotFoundException: File does not exist: /root/proj/hive-trunk/build/dist/lib/hive-builtins-0.11.0-SNAPSHOT.jar at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:772) at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:208) at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:71) at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:254) at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:292) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:365) at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1218) at 
org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1450) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1215) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:617) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:612) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1450) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:612) at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:446) at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:683) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: /root/proj/hive-trunk/build/dist/lib/hive-builtins-0.11.0-SNAPSHOT.jar)' -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5799) add default value of MR_AM_ADMIN_USER_ENV
[ https://issues.apache.org/jira/browse/MAPREDUCE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524194#comment-14524194 ] Hadoop QA commented on MAPREDUCE-5799: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 56s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 38s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 43s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 31s | The applied patch generated 1 new checkstyle issues (total was 26, now 26). | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 41s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:red}-1{color} | mapreduce tests | 98m 52s | Tests failed in hadoop-mapreduce-client-jobclient. 
| | | | 134m 55s | | \\ \\ || Reason || Tests || | Failed unit tests | hadoop.mapred.TestMiniMRClientCluster | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12729831/MAPREDUCE-5799.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 3393461 | | checkstyle | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5485/artifact/patchprocess/diffcheckstylehadoop-mapreduce-client-jobclient.txt | | hadoop-mapreduce-client-jobclient test log | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5485/artifact/patchprocess/testrun_hadoop-mapreduce-client-jobclient.txt | | Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5485/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5485/console | This message was automatically generated. add default value of MR_AM_ADMIN_USER_ENV - Key: MAPREDUCE-5799 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5799 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.3.0 Reporter: Liyin Liang Assignee: Rajesh Kartha Attachments: MAPREDUCE-5799-1.diff, MAPREDUCE-5799.002.patch, MAPREDUCE-5799.diff Submit a 1 map + 1 reduce sleep job with the following config:
{code}
<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<property>
  <name>mapreduce.job.ubertask.enable</name>
  <value>true</value>
</property>
{code}
And the LinuxContainerExecutor is enabled on the NodeManager.
This job will fail with the following error: {code} 2014-03-18 21:28:20,153 FATAL [uber-SubtaskRunner] org.apache.hadoop.mapred.LocalContainerLauncher: Error running local (uberized) 'child' : java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method) at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63) at org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:132) at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148) at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:163) at org.apache.hadoop.mapred.IFile$Writer.init(IFile.java:115) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1583) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1462) at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:700) at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1990) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:774)
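The UnsatisfiedLinkError above means the uberized task, which runs inside the MRAppMaster JVM, cannot load the native Snappy library because the AM's environment lacks a native-library path. As a hedged sketch of the kind of workaround this issue is about (yarn.app.mapreduce.am.admin.user.env is the property MR_AM_ADMIN_USER_ENV names; the value shown is an assumption that depends on the install layout):

```xml
<!-- Sketch only: point the MR AM's admin environment at the native
     library directory so an uberized task can load libsnappy.
     Assumes HADOOP_COMMON_HOME points at the install root. -->
<property>
  <name>yarn.app.mapreduce.am.admin.user.env</name>
  <value>LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native</value>
</property>
```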
[jira] [Commented] (MAPREDUCE-6321) Map tasks take a lot of time to start up
[ https://issues.apache.org/jira/browse/MAPREDUCE-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524222#comment-14524222 ] Rajat Jain commented on MAPREDUCE-6321: --- Yes, we run FairScheduler. However, this is not related to FairScheduler, since the slowness is during map task startup. Map tasks take a lot of time to start up Key: MAPREDUCE-6321 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6321 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 2.6.0 Reporter: Rajat Jain Priority: Critical Labels: performance I have noticed repeatedly that map tasks take a lot of time to start up on YARN clusters. This is not the scheduling part; this is after the actual container holding the Map task is launched. Take, for example, the sample log from a mapper of a Pi job that I launched. The command I used to launch the Pi job was: {code} hadoop jar /usr/lib/hadoop/share/hadoop/mapreduce/hadoop*mapreduce*examples*jar pi 10 100 {code} This is the sample log from one of the mappers, which took 14 seconds to complete. As the logs show, most of the time taken by this job is during startup. I notice that most mappers take anywhere from 7 to 15 seconds during startup, and I have seen this behavior consistently across mapreduce jobs. This really hurts the performance of short-running mappers. I run a hadoop2 / yarn cluster on a 4-5 node m1.xlarge cluster, and the mapper memory is always specified as 2048m and so on. Log: {code} 2015-04-18 06:48:34,081 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties 2015-04-18 06:48:34,637 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s). 
2015-04-18 06:48:34,637 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started 2015-04-18 06:48:34,690 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens: 2015-04-18 06:48:34,690 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1429338752209_0059, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@5d48e5d6) 2015-04-18 06:48:35,391 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now. 2015-04-18 06:48:36,656 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /media/ephemeral3/yarn/local/usercache/rjain/appcache/application_1429338752209_0059,/media/ephemeral1/yarn/local/usercache/rjain/appcache/application_1429338752209_0059,/media/ephemeral2/yarn/local/usercache/rjain/appcache/application_1429338752209_0059,/media/ephemeral0/yarn/local/usercache/rjain/appcache/application_1429338752209_0059 2015-04-18 06:48:36,706 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS 2015-04-18 06:48:37,387 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS 2015-04-18 06:48:39,388 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id 2015-04-18 06:48:39,448 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: fs.default.name is deprecated. 
Instead, use fs.defaultFS 2015-04-18 06:48:41,060 INFO [main] org.apache.hadoop.fs.s3native.NativeS3FileSystem: setting Progress to org.apache.hadoop.mapred.Task$TaskReporter@601211d0 comment setting up progress from Task 2015-04-18 06:48:41,098 INFO [main] org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ] 2015-04-18 06:48:41,585 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: hdfs://ec2-54-211-109-245.compute-1.amazonaws.com:9000/user/rjain/QuasiMonteCarlo_1429339685772_504558444/in/part4:0+118 2015-04-18 06:48:43,926 INFO [main] org.apache.hadoop.mapred.MapTask: (EQUATOR) 0 kvi 234881020(939524080) 2015-04-18 06:48:43,927 INFO [main] org.apache.hadoop.mapred.MapTask: mapreduce.task.io.sort.mb: 896 2015-04-18 06:48:43,927 INFO [main] org.apache.hadoop.mapred.MapTask: soft limit at 657666880 2015-04-18 06:48:43,927 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufvoid = 939524096 2015-04-18 06:48:43,927 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 234881020; length = 58720256 2015-04-18 06:48:43,946 INFO [main] org.apache.hadoop.mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer 2015-04-18 06:48:44,022 INFO [main] org.apache.hadoop.mapred.MapTask: Starting flush of map output 2015-04-18
[jira] [Commented] (MAPREDUCE-2058) FairScheduler:NullPointerException in web interface when JobTracker not initialized
[ https://issues.apache.org/jira/browse/MAPREDUCE-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524587#comment-14524587 ] Hadoop QA commented on MAPREDUCE-2058: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12555264/MAPREDUCE-2058-branch-1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5502/console | This message was automatically generated. FairScheduler:NullPointerException in web interface when JobTracker not initialized --- Key: MAPREDUCE-2058 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2058 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/fair-share Affects Versions: 0.22.0, 1.0.4 Reporter: Dan Adkins Attachments: MAPREDUCE-2058-branch-1.patch, MAPREDUCE-2058.patch When I contact the jobtracker web interface prior to the job tracker being fully initialized (say, if hdfs is still in safe mode), I get the following error: 10/09/09 18:06:02 ERROR mortbay.log: /jobtracker.jsp java.lang.NullPointerException at org.apache.hadoop.mapred.FairScheduler.getJobs(FairScheduler.java:909) at org.apache.hadoop.mapred.JobTracker.getJobsFromQueue(JobTracker.java:4357) at org.apache.hadoop.mapred.JobTracker.getQueueInfoArray(JobTracker.java:4334) at org.apache.hadoop.mapred.JobTracker.getRootQueues(JobTracker.java:4295) at org.apache.hadoop.mapred.jobtracker_jsp.generateSummaryTable(jobtracker_jsp.java:44) at org.apache.hadoop.mapred.jobtracker_jsp._jspService(jobtracker_jsp.java:176) at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at 
org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1124) at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:857) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1115) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:361) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:324) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403) at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
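The fix pattern for this class of NPE is straightforward: return an empty collection while the scheduler's internals are still uninitialized instead of dereferencing them. A minimal sketch with hypothetical stand-in types (the real change belongs in FairScheduler.getJobs()):

```java
import java.util.Collection;
import java.util.Collections;

// Hypothetical, simplified stand-ins for the scheduler internals.
class Pool {
    Collection<String> jobs = Collections.emptyList();
}

class PoolManager {
    Pool getPool(String queueName) { return new Pool(); }
}

public class SafeGetJobs {
    // Remains null until the JobTracker has fully initialized the scheduler.
    static PoolManager poolMgr = null;

    // Guarded version of getJobs(): an uninitialized scheduler yields an
    // empty job list instead of the NPE seen from jobtracker.jsp.
    static Collection<String> getJobs(String queueName) {
        if (poolMgr == null) {
            return Collections.emptyList();
        }
        return poolMgr.getPool(queueName).jobs;
    }

    public static void main(String[] args) {
        System.out.println(getJobs("default"));  // empty before initialization
        poolMgr = new PoolManager();
        System.out.println(getJobs("default"));  // still works after init
    }
}
```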
[jira] [Commented] (MAPREDUCE-4482) Backport MR sort plugin(MAPREDUCE-2454) to Hadoop 1.2
[ https://issues.apache.org/jira/browse/MAPREDUCE-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524581#comment-14524581 ] Hadoop QA commented on MAPREDUCE-4482: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12546055/mapreduce-4482-release-1.1.0-rc4.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5500/console | This message was automatically generated. Backport MR sort plugin(MAPREDUCE-2454) to Hadoop 1.2 - Key: MAPREDUCE-4482 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4482 Project: Hadoop Map/Reduce Issue Type: New Feature Components: mrv1 Affects Versions: 1.2.0 Reporter: Mariappan Asokan Assignee: Mariappan Asokan Attachments: HadoopSortPlugin.pdf, mapreduce-4482-release-1.1.0-rc4.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-3881) building fail under Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524582#comment-14524582 ] Hadoop QA commented on MAPREDUCE-3881: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12515081/pom.xml.patch | | Optional Tests | javadoc javac unit | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5501/console | This message was automatically generated. building fail under Windows --- Key: MAPREDUCE-3881 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3881 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Environment: D:\os\hadoopcommon>mvn --version Apache Maven 3.0.4 (r1232337; 2012-01-17 16:44:56+0800) Maven home: C:\portable\maven\bin\.. Java version: 1.7.0_02, vendor: Oracle Corporation Java home: C:\Program Files (x86)\Java\jdk1.7.0_02\jre Default locale: zh_CN, platform encoding: GBK OS name: windows 7, version: 6.1, arch: x86, family: windows Reporter: Changming Sun Priority: Minor Attachments: pom.xml.patch Original Estimate: 1h Remaining Estimate: 1h hadoop-mapreduce-project\hadoop-yarn\hadoop-yarn-common\pom.xml is not portable:
{code}
<execution>
  <id>generate-version</id>
  <phase>generate-sources</phase>
  <configuration>
    <executable>scripts/saveVersion.sh</executable>
    <arguments>
      <argument>${project.version}</argument>
      <argument>${project.build.directory}</argument>
    </arguments>
  </configuration>
  <goals>
    <goal>exec</goal>
  </goals>
</execution>
{code}
When I built it under Windows, I got an error like this: [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2:exec (generate-version) on project hadoop-yarn-common: Command execution failed. 
Cannot run program scripts\saveVersion.sh (in directory D:\os\hadoopcommon\hadoop-mapreduce-project\hadoop-yarn\hadoop-yarn-common): CreateProcess error=2, ? - [Help 1] We should modify it like this (copied from hadoop-common-project\hadoop-common\pom.xml):
{code}
<configuration>
  <target>
    <mkdir dir="${project.build.directory}/generated-sources/java"/>
    <exec executable="sh">
      <arg line="${basedir}/dev-support/saveVersion.sh ${project.version} ${project.build.directory}/generated-sources/java/"/>
    </exec>
  </target>
</configuration>
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-695) MiniMRCluster while shutting down should not wait for currently running jobs to finish
[ https://issues.apache.org/jira/browse/MAPREDUCE-695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524578#comment-14524578 ] Hadoop QA commented on MAPREDUCE-695: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12412383/mapreduce-695.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5499/console | This message was automatically generated. MiniMRCluster while shutting down should not wait for currently running jobs to finish -- Key: MAPREDUCE-695 URL: https://issues.apache.org/jira/browse/MAPREDUCE-695 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Affects Versions: 1.0.3 Reporter: Sreekanth Ramakrishnan Priority: Minor Attachments: mapreduce-695.patch Currently in {{org.apache.hadoop.mapred.MiniMRCluster.shutdown()}} we do a {{waitTaskTrackers()}} which can cause {{MiniMRCluster}} to hang indefinitely when used in conjunction with Controlled jobs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-4273) Make CombineFileInputFormat split result JDK independent
[ https://issues.apache.org/jira/browse/MAPREDUCE-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524684#comment-14524684 ] Hadoop QA commented on MAPREDUCE-4273: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12567099/MAPREDUCE-4273-branch1-v2.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5525/console | This message was automatically generated. Make CombineFileInputFormat split result JDK independent Key: MAPREDUCE-4273 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4273 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 1.0.3 Reporter: Luke Lu Assignee: Yu Gao Attachments: MAPREDUCE-4273-branch1-v2.patch, mapreduce-4273-branch-1.patch, mapreduce-4273-branch-2.patch, mapreduce-4273.patch The split result of CombineFileInputFormat depends on the iteration order of nodeToBlocks and rackToBlocks hash maps, which makes the result HashMap implementation hence JDK dependent. This is manifested as TestCombineFileInputFormat failures on alternative JDKs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
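The nondeterminism described above comes from relying on HashMap iteration order, which is a JDK implementation detail. A small sketch of the underlying point, with hypothetical rack names (not the real split code): switching the iterated maps to a sorted implementation makes the traversal, and hence the split assignment, reproducible across JDKs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DeterministicIteration {
    // The keys of nodeToBlocks/rackToBlocks drive split assignment in
    // CombineFileInputFormat. Iterating a HashMap yields a JDK-dependent
    // order; a TreeMap yields the same sorted order everywhere.
    static List<String> iterationOrder(Map<String, Integer> rackToBlocks) {
        return new ArrayList<>(rackToBlocks.keySet());
    }

    public static void main(String[] args) {
        Map<String, Integer> rackToBlocks = new TreeMap<>();
        rackToBlocks.put("rack2", 5);
        rackToBlocks.put("rack0", 3);
        rackToBlocks.put("rack1", 4);
        // Deterministic on every JDK: rack0, rack1, rack2.
        System.out.println(iterationOrder(rackToBlocks));
    }
}
```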
[jira] [Commented] (MAPREDUCE-4330) TaskAttemptCompletedEventTransition invalidates previously successful attempt without checking if the newly completed attempt is successful
[ https://issues.apache.org/jira/browse/MAPREDUCE-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524674#comment-14524674 ] Hadoop QA commented on MAPREDUCE-4330: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12578792/MAPREDUCE-4330-20130415.1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5523/console | This message was automatically generated. TaskAttemptCompletedEventTransition invalidates previously successful attempt without checking if the newly completed attempt is successful --- Key: MAPREDUCE-4330 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4330 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.23.1 Reporter: Bikas Saha Assignee: Omkar Vinit Joshi Attachments: MAPREDUCE-4330-20130415.1.patch, MAPREDUCE-4330-20130415.patch, MAPREDUCE-4330-21032013.1.patch, MAPREDUCE-4330-21032013.patch The previously completed attempt is removed from successAttemptCompletionEventNoMap and marked OBSOLETE. After that, if the newly completed attempt is successful then it is added to the successAttemptCompletionEventNoMap. This seems wrong because the newly completed attempt could be failed and thus there is no need to invalidate the successful attempt. One error case would be when a speculative attempt completes with killed/failed after the successful version has completed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5188) error when verify FileType of RS_SOURCE in getCompanionBlocks in BlockPlacementPolicyRaid.java
[ https://issues.apache.org/jira/browse/MAPREDUCE-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524683#comment-14524683 ] Hadoop QA commented on MAPREDUCE-5188: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12580811/MAPREDUCE-5188.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5524/console | This message was automatically generated. error when verify FileType of RS_SOURCE in getCompanionBlocks in BlockPlacementPolicyRaid.java --- Key: MAPREDUCE-5188 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5188 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/raid Affects Versions: 2.0.2-alpha Reporter: junjin Assignee: junjin Priority: Critical Labels: contrib/raid Fix For: 2.0.2-alpha Attachments: MAPREDUCE-5188.patch error when verify FileType of RS_SOURCE in getCompanionBlocks in BlockPlacementPolicyRaid.java need change xorParityLength in line #379 to rsParityLength since it's for verifying RS_SOURCE type -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5748) Potential null pointer deference in ShuffleHandler#Shuffle#messageReceived()
[ https://issues.apache.org/jira/browse/MAPREDUCE-5748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524825#comment-14524825 ] Hadoop QA commented on MAPREDUCE-5748: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12635637/0001-MAPREDUCE-5748-Potential-null-pointer-deference-in-S.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5549/console | This message was automatically generated. Potential null pointer deference in ShuffleHandler#Shuffle#messageReceived() Key: MAPREDUCE-5748 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5748 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Ted Yu Priority: Minor Attachments: 0001-MAPREDUCE-5748-Potential-null-pointer-deference-in-S.patch Starting around line 510: {code} ChannelFuture lastMap = null; for (String mapId : mapIds) { ... } lastMap.addListener(metrics); lastMap.addListener(ChannelFutureListener.CLOSE); {code} If mapIds is empty, lastMap would remain null, leading to NPE in addListener() call. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
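A minimal sketch of the guard the report calls for, with a hypothetical stand-in for Netty's ChannelFuture: when mapIds is empty the loop never assigns lastMap, so the trailing addListener() calls must be skipped.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class LastMapGuard {
    // Hypothetical stand-in for ChannelFuture; only listener bookkeeping
    // matters for this sketch.
    static class Future {
        int listeners = 0;
        void addListener(String l) { listeners++; }
    }

    // Mirrors the loop shape from Shuffle#messageReceived(): lastMap stays
    // null when mapIds is empty, so the addListener() calls need a null
    // check.
    static Future sendMapOutputs(List<String> mapIds) {
        Future lastMap = null;
        for (String mapId : mapIds) {
            lastMap = new Future();   // stand-in for the per-map write
        }
        if (lastMap != null) {        // the missing guard
            lastMap.addListener("metrics");
            lastMap.addListener("close");
        }
        return lastMap;
    }

    public static void main(String[] args) {
        // Empty request: no NPE, nothing to close.
        System.out.println(sendMapOutputs(Collections.<String>emptyList()) == null);
        // Normal request: both listeners attached to the last write.
        System.out.println(sendMapOutputs(Arrays.asList("m1", "m2")).listeners);
    }
}
```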
[jira] [Commented] (MAPREDUCE-5907) Improve getSplits() performance for fs implementations that can utilize performance gains from recursive listing
[ https://issues.apache.org/jira/browse/MAPREDUCE-5907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524821#comment-14524821 ] Hadoop QA commented on MAPREDUCE-5907: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12648040/MAPREDUCE-5907-3.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5546/console | This message was automatically generated. Improve getSplits() performance for fs implementations that can utilize performance gains from recursive listing Key: MAPREDUCE-5907 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5907 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: 2.4.0 Reporter: Sumit Kumar Assignee: Sumit Kumar Attachments: MAPREDUCE-5907-2.patch, MAPREDUCE-5907-3.patch, MAPREDUCE-5907.patch FileInputFormat (both the mapreduce and mapred implementations) uses recursive listing while calculating splits. However, it does this level by level: to discover files under /foo/bar, it first lists /foo/bar to get the immediate children, then makes the same call on each of those children to discover their immediate children, and so on. This doesn't scale well for object-store-based fs implementations like s3 and swift, because every listStatus call ends up being a web-service call to the backend. In cases where a large number of files are considered for input, this makes the getSplits() call slow. This patch adds a new set of recursive list APIs that gives fs implementations an opportunity to optimize.
The behavior remains the same for other implementations (that is, a default implementation is provided, so other filesystems don't have to implement anything new). For object-store-based fs implementations, however, a simple change to pass the recursive flag as true (as shown in the patch) improves listing performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
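The scaling argument can be made concrete with a toy model that just counts backend requests: level-by-level listing issues one listStatus call per directory, while a store that supports server-side recursive listing (e.g. a prefix scan on s3/swift) needs a single call. This sketches the cost model only, with hypothetical paths, not the patch's actual API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ListingCalls {
    // Toy directory tree (dir -> child dirs). Each listStatus() on a
    // directory is one web-service request to the object store.
    static Map<String, List<String>> tree = new HashMap<>();

    // Level-by-level discovery, as FileInputFormat does it: one request
    // per directory encountered.
    static int levelByLevelCalls(String dir) {
        int calls = 1; // listStatus(dir)
        for (String child : tree.getOrDefault(dir, Collections.<String>emptyList())) {
            calls += levelByLevelCalls(child);
        }
        return calls;
    }

    // Server-side recursive listing: a single request regardless of depth.
    static int recursiveCalls(String dir) {
        return 1;
    }

    public static void main(String[] args) {
        tree.put("/foo", Arrays.asList("/foo/a", "/foo/b"));
        tree.put("/foo/a", Arrays.asList("/foo/a/x"));
        System.out.println(levelByLevelCalls("/foo")); // 4 requests
        System.out.println(recursiveCalls("/foo"));    // 1 request
    }
}
```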
[jira] [Commented] (MAPREDUCE-4883) Reducer's Maximum Shuffle Buffer Size should be enlarged for 64bit JVM
[ https://issues.apache.org/jira/browse/MAPREDUCE-4883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524846#comment-14524846 ] Hadoop QA commented on MAPREDUCE-4883: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12566621/MAPREDUCE-4883.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5552/console | This message was automatically generated. Reducer's Maximum Shuffle Buffer Size should be enlarged for 64bit JVM -- Key: MAPREDUCE-4883 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4883 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.20.2, 1.0.3 Environment: Especially for 64bit JVM Reporter: Lijie Xu Assignee: Jerry Chen Labels: patch Attachments: MAPREDUCE-4883.patch Original Estimate: 12h Remaining Estimate: 12h In hadoop-0.20.2, hadoop-1.0.3 or other versions, reducer's shuffle buffer size cannot exceed 2048MB (i.e., Integer.MAX_VALUE). This is reasonable for 32bit JVM. But for 64bit JVM, although reducer's JVM size can be set more than 2048MB (e.g., mapred.child.java.opts=-Xmx4000m), the heap size used for shuffle buffer is at most 2048MB * maxInMemCopyUse (default 0.7) not 4000MB * maxInMemCopyUse. So the pointed piece of code in ReduceTask.java needs modification for 64bit JVM. 
{code}
private final long maxSize;
private final long maxSingleShuffleLimit;
private long size = 0;
private Object dataAvailable = new Object();
private long fullSize = 0;
private int numPendingRequests = 0;
private int numRequiredMapOutputs = 0;
private int numClosed = 0;
private boolean closed = false;

public ShuffleRamManager(Configuration conf) throws IOException {
  final float maxInMemCopyUse =
      conf.getFloat("mapred.job.shuffle.input.buffer.percent", 0.70f);
  if (maxInMemCopyUse > 1.0 || maxInMemCopyUse < 0.0) {
    throw new IOException("mapred.job.shuffle.input.buffer.percent" +
                          maxInMemCopyUse);
  }
  // Allow unit tests to fix Runtime memory
  // --> the pointed lines: the (int) cast caps the heap at Integer.MAX_VALUE
  maxSize = (int)(conf.getInt("mapred.job.reduce.total.mem.bytes",
      (int)Math.min(Runtime.getRuntime().maxMemory(), Integer.MAX_VALUE))
      * maxInMemCopyUse);
  maxSingleShuffleLimit =
      (long)(maxSize * MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION);
  LOG.info("ShuffleRamManager: MemoryLimit=" + maxSize +
           ", MaxSingleShuffleLimit=" + maxSingleShuffleLimit);
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
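The effect of the (int) cast is easy to demonstrate in isolation. In the sketch below (simplified, hypothetical helper names), cappedMaxSize mirrors the existing arithmetic and can never exceed maxInMemCopyUse times Integer.MAX_VALUE bytes, while keeping the computation in long honors the full 64-bit heap.

```java
public class ShuffleBufferCap {
    // Mirrors the ShuffleRamManager arithmetic: the heap size is forced
    // through (int), so anything above Integer.MAX_VALUE (~2 GB) is capped
    // before the buffer-percent multiply.
    static long cappedMaxSize(long jvmMaxMemory, float maxInMemCopyUse) {
        return (long) ((int) Math.min(jvmMaxMemory, Integer.MAX_VALUE)
                       * maxInMemCopyUse);
    }

    // The 64-bit-friendly version: keep the whole computation in long.
    static long longMaxSize(long jvmMaxMemory, float maxInMemCopyUse) {
        return (long) (jvmMaxMemory * maxInMemCopyUse);
    }

    public static void main(String[] args) {
        long fourGb = 4000L * 1024 * 1024; // -Xmx4000m
        // The capped version can never use more than ~0.7 * 2 GB of the heap:
        System.out.println(cappedMaxSize(fourGb, 0.70f));
        System.out.println(longMaxSize(fourGb, 0.70f));
    }
}
```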
[jira] [Commented] (MAPREDUCE-5889) Deprecate FileInputFormat.setInputPaths(Job, String) and FileInputFormat.addInputPaths(Job, String)
[ https://issues.apache.org/jira/browse/MAPREDUCE-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524852#comment-14524852 ] Hadoop QA commented on MAPREDUCE-5889: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12647036/MAPREDUCE-5889.3.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build//console | This message was automatically generated. Deprecate FileInputFormat.setInputPaths(Job, String) and FileInputFormat.addInputPaths(Job, String) --- Key: MAPREDUCE-5889 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5889 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Labels: newbie Attachments: MAPREDUCE-5889.3.patch, MAPREDUCE-5889.patch, MAPREDUCE-5889.patch {{FileInputFormat.setInputPaths(Job job, String commaSeparatedPaths)}} and {{FileInputFormat.addInputPaths(Job job, String commaSeparatedPaths)}} fail to parse commaSeparatedPaths if a comma is included in the file path. (e.g. Path: {{/path/file,with,comma}}) We should deprecate these methods and document to use {{setInputPaths(Job job, Path... inputPaths)}} and {{addInputPaths(Job job, Path... inputPaths)}} instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
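The root cause is that the String overloads split their argument on commas before any escaping can happen, so a comma inside a single path is indistinguishable from a separator. A minimal illustration of that splitting behavior (plain String.split standing in for the real parsing):

```java
import java.util.Arrays;

public class CommaPaths {
    // What the String-based overloads effectively do (sketch): the argument
    // is split on commas, so a comma inside one path becomes a separator.
    static String[] splitCommaSeparatedPaths(String commaSeparatedPaths) {
        return commaSeparatedPaths.split(",");
    }

    public static void main(String[] args) {
        // One real path on disk, but three paths after splitting:
        String[] parts = splitCommaSeparatedPaths("/path/file,with,comma");
        System.out.println(parts.length);
        System.out.println(Arrays.toString(parts));
        // The Path-based overloads carry each path whole and avoid the
        // ambiguity: setInputPaths(job, new Path("/path/file,with,comma")).
    }
}
```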
[jira] [Commented] (MAPREDUCE-5704) Optimize nextJobId in JobTracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-5704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524824#comment-14524824 ] Hadoop QA commented on MAPREDUCE-5704: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 1s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12621052/MAPREDUCE-5704.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5548/console | This message was automatically generated. Optimize nextJobId in JobTracker Key: MAPREDUCE-5704 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5704 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobtracker, mrv1 Affects Versions: 1.2.1 Reporter: JamesLi Assignee: JamesLi Attachments: MAPREDUCE-5704.patch When the JobTracker starts, nextJobId starts at 1. If we have run 3000 jobs, then restart the JobTracker and run a new job, we cannot see the new job on jobtracker:5030/jobhistory.jsp unless we click the get more results button. In jobhistory_jsp.java, the SCAN_SIZES array controls how many jobs are displayed on jobhistory.jsp. I made a small change: when the JobTracker starts, it finds the biggest job id under the history done directory, and jobs start with maxId+1, or 1 if no job files can be found. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
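The proposed startup scan can be sketched as follows (hypothetical file-name pattern and helper; the real code would list the job-history done directory):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NextJobId {
    // Hedged sketch of the idea in the description: scan history file names
    // for the largest job sequence number and resume from maxId + 1.
    // The "job_<timestamp>_<seq>" pattern here is illustrative.
    static final Pattern JOB_ID = Pattern.compile("job_\\d+_(\\d+)");

    static int initialJobId(List<String> historyFileNames) {
        int maxId = 0;
        for (String name : historyFileNames) {
            Matcher m = JOB_ID.matcher(name);
            if (m.find()) {
                maxId = Math.max(maxId, Integer.parseInt(m.group(1)));
            }
        }
        return maxId + 1; // 1 when no job files are found
    }

    public static void main(String[] args) {
        System.out.println(initialJobId(Arrays.asList(
            "job_201401010000_0007_conf.xml", "job_201401010000_3000_user")));
        System.out.println(initialJobId(Collections.<String>emptyList()));
    }
}
```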
[jira] [Commented] (MAPREDUCE-4711) Append time elapsed since job-start-time for finished tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-4711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524850#comment-14524850 ] Hadoop QA commented on MAPREDUCE-4711: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12548038/MAPREDUCE-4711.branch-0.23.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5553/console | This message was automatically generated. Append time elapsed since job-start-time for finished tasks --- Key: MAPREDUCE-4711 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4711 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Affects Versions: 0.23.3 Reporter: Ravi Prakash Attachments: MAPREDUCE-4711.branch-0.23.patch In 0.20.x/1.x, the analyze job link gave this information bq. The last Map task task_sometask finished at (relative to the Job launch time): 5/10 20:23:10 (1hrs, 27mins, 54sec) The time it took for the last task to finish needs to be calculated mentally in 0.23. I believe we should print it next to the finish time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
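Printing the elapsed time next to the finish time is a small formatting step once both timestamps are at hand. A sketch mirroring the "1hrs, 27mins, 54sec" layout quoted above; the method name is illustrative, not the HistoryViewer code:

```java
public class ElapsedFormat {
    /** Formats milliseconds in the "1hrs, 27mins, 54sec" style of the 1.x analyze page. */
    static String formatElapsed(long millis) {
        long secs = millis / 1000;
        long hrs = secs / 3600;
        long mins = (secs % 3600) / 60;
        return hrs + "hrs, " + mins + "mins, " + (secs % 60) + "sec";
    }

    public static void main(String[] args) {
        long launch = 0L;                                     // job launch time (ms)
        long finish = (1 * 3600 + 27 * 60 + 54) * 1000L;      // last task finish (ms)
        // Printed next to the finish time, this saves the mental arithmetic.
        System.out.println(formatElapsed(finish - launch));   // 1hrs, 27mins, 54sec
    }
}
```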
[jira] [Commented] (MAPREDUCE-4957) Throw FileNotFoundException when running in single node and mapreduce.framework.name is local
[ https://issues.apache.org/jira/browse/MAPREDUCE-4957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524834#comment-14524834 ] Hadoop QA commented on MAPREDUCE-4957: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12566460/MAPREDUCE-4957.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5550/console | This message was automatically generated. Throw FileNotFoundException when running in single node and mapreduce.framework.name is local --- Key: MAPREDUCE-4957 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4957 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Yi Liu Assignee: Yi Liu Priority: Minor Attachments: MAPREDUCE-4957.patch, MAPREDUCE-4957.patch Run in single node and mapreduce.framework.name is local, and get following error: java.io.FileNotFoundException: File does not exist: /root/proj/hive-trunk/build/dist/lib/hive-builtins-0.11.0-SNAPSHOT.jar at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:772) at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:208) at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:71) at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:254) at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:292) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:365) at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1218) at 
org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1450) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1215) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:617) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:612) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1450) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:612) at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:446) at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:683) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: /root/proj/hive-trunk/build/dist/lib/hive-builtins-0.11.0-SNAPSHOT.jar)' -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5490) MapReduce doesn't set the environment variable for children processes
[ https://issues.apache.org/jira/browse/MAPREDUCE-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524843#comment-14524843 ] Hadoop QA commented on MAPREDUCE-5490: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12629589/MAPREDUCE-5490.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5551/console | This message was automatically generated. MapReduce doesn't set the environment variable for children processes - Key: MAPREDUCE-5490 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5490 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.2.1 Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: MAPREDUCE-5490.patch, mr-5490.patch, mr-5490.patch Currently, MapReduce uses the command line argument to pass the classpath to the child. This breaks if the process forks a child that needs the same classpath. Such a case happens in Hive when it uses map-side joins. I propose that we make MapReduce in branch-1 use the CLASSPATH environment variable like YARN does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
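The practical difference between the two mechanisms is inheritance: a `-cp` argument reaches only the immediate child, while an environment variable is inherited by any process that child forks in turn. A sketch with java.util.ProcessBuilder; the command and paths are illustrative:

```java
public class ClasspathEnvDemo {
    /** Builds a child-process launcher whose environment carries the classpath. */
    static ProcessBuilder withClasspath(ProcessBuilder pb, String classpath) {
        // Environment variables, unlike "-cp ..." arguments, are inherited by
        // any process the child itself forks, so a grandchild (e.g. a process
        // spawned during a Hive map-side join) still sees the same classpath.
        pb.environment().put("CLASSPATH", classpath);
        return pb;
    }

    public static void main(String[] args) {
        ProcessBuilder pb = withClasspath(
            new ProcessBuilder("some-child-command"),   // illustrative command
            "/opt/hadoop/lib/*:/opt/job/classes");      // illustrative paths
        System.out.println(pb.environment().get("CLASSPATH"));
    }
}
```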
[jira] [Updated] (MAPREDUCE-5649) Reduce cannot use more than 2G memory for the final merge
[ https://issues.apache.org/jira/browse/MAPREDUCE-5649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gera Shegalov updated MAPREDUCE-5649: - Attachment: MAPREDUCE-5649.002.patch Thanks for review, [~jlowe]! 002.patch Reduce cannot use more than 2G memory for the final merge -- Key: MAPREDUCE-5649 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5649 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Reporter: stanley shi Assignee: Gera Shegalov Attachments: MAPREDUCE-5649.001.patch, MAPREDUCE-5649.002.patch In the org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.java file, in the finalMerge method: int maxInMemReduce = (int)Math.min( Runtime.getRuntime().maxMemory() * maxRedPer, Integer.MAX_VALUE); This means that no matter how much memory the user has, the reducer will not retain more than 2G of data in memory before the reduce phase starts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
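The quoted line caps the in-memory budget via the `(int)` cast: even with a huge heap, the result saturates at Integer.MAX_VALUE (about 2 GB). Widening the computation to long removes the ceiling. A self-contained sketch of the two computations, not the MergeManagerImpl patch itself:

```java
public class MergeBudget {
    /** Mirrors the quoted computation: the (int) cast caps the budget at ~2 GB. */
    static int maxInMemReduceInt(long maxHeapBytes, float maxRedPer) {
        return (int) Math.min(maxHeapBytes * maxRedPer, Integer.MAX_VALUE);
    }

    /** Widened to long, the budget tracks the actual heap size. */
    static long maxInMemReduceLong(long maxHeapBytes, float maxRedPer) {
        return (long) (maxHeapBytes * maxRedPer);
    }

    public static void main(String[] args) {
        long heap = 8L * 1024 * 1024 * 1024; // pretend an 8 GB reducer heap
        System.out.println(maxInMemReduceInt(heap, 1.0f));  // 2147483647 (~2 GB cap)
        System.out.println(maxInMemReduceLong(heap, 1.0f)); // 8589934592 (full 8 GB)
    }
}
```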
[jira] [Commented] (MAPREDUCE-4273) Make CombineFileInputFormat split result JDK independent
[ https://issues.apache.org/jira/browse/MAPREDUCE-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524629#comment-14524629 ] Hadoop QA commented on MAPREDUCE-4273: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12567099/MAPREDUCE-4273-branch1-v2.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5516/console | This message was automatically generated. Make CombineFileInputFormat split result JDK independent Key: MAPREDUCE-4273 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4273 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 1.0.3 Reporter: Luke Lu Assignee: Yu Gao Attachments: MAPREDUCE-4273-branch1-v2.patch, mapreduce-4273-branch-1.patch, mapreduce-4273-branch-2.patch, mapreduce-4273.patch The split result of CombineFileInputFormat depends on the iteration order of nodeToBlocks and rackToBlocks hash maps, which makes the result HashMap implementation hence JDK dependent. This is manifested as TestCombineFileInputFormat failures on alternative JDKs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
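One way to make split results JDK-independent is to iterate the maps in a defined order, e.g. through a sorted view. A sketch of the idea, not the actual patch:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class DeterministicIteration {
    /** Returns a view of the map with a defined, JDK-independent key order. */
    static SortedMap<String, Integer> orderedView(Map<String, Integer> nodeToBlocks) {
        // HashMap iteration order depends on the JDK's hashing internals, so
        // splits built by walking it directly differ across JDKs.
        // A TreeMap iterates in natural key order on every JDK.
        return new TreeMap<>(nodeToBlocks);
    }

    public static void main(String[] args) {
        Map<String, Integer> nodeToBlocks = new HashMap<>();
        nodeToBlocks.put("node-b", 3);
        nodeToBlocks.put("node-a", 5);
        nodeToBlocks.put("node-c", 1);
        System.out.println(orderedView(nodeToBlocks).keySet()); // [node-a, node-b, node-c]
    }
}
```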
[jira] [Commented] (MAPREDUCE-3936) Clients should not enforce counter limits
[ https://issues.apache.org/jira/browse/MAPREDUCE-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524627#comment-14524627 ] Hadoop QA commented on MAPREDUCE-3936: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12544972/MAPREDUCE-3936.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5515/console | This message was automatically generated. Clients should not enforce counter limits -- Key: MAPREDUCE-3936 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3936 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv1 Reporter: Tom White Assignee: Tom White Attachments: MAPREDUCE-3936.patch, MAPREDUCE-3936.patch The code for enforcing counter limits (from MAPREDUCE-1943) creates a static JobConf instance to load the limits, which may throw an exception if the client limit is set to be lower than the limit on the cluster (perhaps because the cluster limit was raised from the default). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5150) Backport 2009 terasort (MAPREDUCE-639) to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524633#comment-14524633 ] Hadoop QA commented on MAPREDUCE-5150: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12578622/MAPREDUCE-5150-branch-1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5518/console | This message was automatically generated. Backport 2009 terasort (MAPREDUCE-639) to branch-1 -- Key: MAPREDUCE-5150 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5150 Project: Hadoop Map/Reduce Issue Type: Improvement Components: examples Affects Versions: 1.2.0 Reporter: Gera Shegalov Priority: Minor Attachments: MAPREDUCE-5150-branch-1.patch Users evaluate performance of Hadoop clusters using different benchmarks such as TeraSort. However, terasort version in branch-1 is outdated. It works on teragen dataset that cannot exceed 4 billion unique keys and it does not have the fast non-sampling partitioner SimplePartitioner either. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-3807) JobTracker needs fix similar to HDFS-94
[ https://issues.apache.org/jira/browse/MAPREDUCE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524630#comment-14524630 ] Hadoop QA commented on MAPREDUCE-3807: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12515105/MAPREDUCE-3807.patch | | Optional Tests | | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5517/console | This message was automatically generated. JobTracker needs fix similar to HDFS-94 --- Key: MAPREDUCE-3807 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3807 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 1.0.0 Reporter: Harsh J Labels: newbie Attachments: MAPREDUCE-3807.patch The 1.0 JobTracker's jobtracker.jsp page currently shows: {code} <h2>Cluster Summary (Heap Size is <%= StringUtils.byteDesc(Runtime.getRuntime().totalMemory()) %>/<%= StringUtils.byteDesc(Runtime.getRuntime().maxMemory()) %>)</h2> {code} It could use the same improvement as HDFS-94 to reflect live heap usage more accurately. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
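HDFS-94 switched the display to the heap actually in use (committed minus free) rather than only committed/max. The same computation, sketched as plain Java rather than the actual jobtracker.jsp code:

```java
public class LiveHeap {
    /** Bytes of heap currently in use: committed minus free, as in the HDFS-94 display. */
    static long usedHeap(long totalBytes, long freeBytes) {
        return totalBytes - freeBytes;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long used = usedHeap(rt.totalMemory(), rt.freeMemory());
        // Reporting used / committed / max is more accurate than the current
        // committed / max display, which overstates live heap usage.
        System.out.println(used + " / " + rt.totalMemory() + " / " + rt.maxMemory());
    }
}
```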
[jira] [Commented] (MAPREDUCE-5365) Set mapreduce.job.classloader to true by default
[ https://issues.apache.org/jira/browse/MAPREDUCE-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524636#comment-14524636 ] Hadoop QA commented on MAPREDUCE-5365: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12590345/MAPREDUCE-5365.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5519/console | This message was automatically generated. Set mapreduce.job.classloader to true by default Key: MAPREDUCE-5365 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5365 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.0.5-alpha Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5365.patch MAPREDUCE-1700 introduced the mapreduce.job.classloader option, which uses a custom classloader to separate system classes from user classes. It seems like there are only rare cases when a user would not want this on, and that it should be enabled by default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5907) Improve getSplits() performance for fs implementations that can utilize performance gains from recursive listing
[ https://issues.apache.org/jira/browse/MAPREDUCE-5907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524788#comment-14524788 ] Hadoop QA commented on MAPREDUCE-5907: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12648040/MAPREDUCE-5907-3.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5543/console | This message was automatically generated. Improve getSplits() performance for fs implementations that can utilize performance gains from recursive listing Key: MAPREDUCE-5907 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5907 Project: Hadoop Map/Reduce Issue Type: Improvement Components: client Affects Versions: 2.4.0 Reporter: Sumit Kumar Assignee: Sumit Kumar Attachments: MAPREDUCE-5907-2.patch, MAPREDUCE-5907-3.patch, MAPREDUCE-5907.patch FileInputFormat (both the mapreduce and mapred implementations) uses recursive listing while calculating splits, but it does so level by level: to discover files under /foo/bar, it first lists /foo/bar to get the immediate children, then makes the same call on each of those children to discover their immediate children, and so on. This doesn't scale well for object-store-based fs implementations like S3 and Swift, because every listStatus call ends up being a web-service call to the backend. In cases where a large number of files is considered for input, this makes the getSplits() call slow. This patch adds a new set of recursive list APIs that gives fs implementations an opportunity to optimize.
The behavior remains the same for other implementations (a default implementation is provided, so existing file systems don't have to implement anything new). For object-store-based fs implementations, however, passing the recursive flag as true (as shown in the patch) is a simple change that improves listing performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
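The cost difference can be seen with a toy model in which every listStatus call is one web-service round trip. This is a mock, not the Hadoop FileSystem API: level-by-level listing pays one call per directory, while a store that supports recursive (prefix) listing could answer in a single call.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ListingCost {
    // Mock object store: directory -> immediate children (dirs end with '/').
    final Map<String, List<String>> tree = new HashMap<>();
    int rpcCalls = 0;

    List<String> listStatus(String dir) {       // one web-service call
        rpcCalls++;
        return tree.getOrDefault(dir, List.of());
    }

    /** Level-by-level descent, as FileInputFormat does today. */
    List<String> listLevelByLevel(String dir) {
        List<String> files = new ArrayList<>();
        for (String child : listStatus(dir)) {
            if (child.endsWith("/")) files.addAll(listLevelByLevel(child));
            else files.add(child);
        }
        return files;
    }

    public static void main(String[] args) {
        ListingCost fs = new ListingCost();
        fs.tree.put("/foo/", List.of("/foo/bar/", "/foo/baz/"));
        fs.tree.put("/foo/bar/", List.of("/foo/bar/a.txt"));
        fs.tree.put("/foo/baz/", List.of("/foo/baz/b.txt"));

        fs.listLevelByLevel("/foo/");
        System.out.println(fs.rpcCalls + " calls level-by-level"); // 3 calls
        // A store that can list everything under the /foo/ prefix would
        // return the same two files in one call, which is exactly what a
        // recursive listing API lets the fs implementation exploit.
    }
}
```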
[jira] [Commented] (MAPREDUCE-5929) YARNRunner.java, path for jobJarPath not set correctly
[ https://issues.apache.org/jira/browse/MAPREDUCE-5929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524955#comment-14524955 ] Hadoop QA commented on MAPREDUCE-5929: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12668704/MAPREDUCE-5929.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5575/console | This message was automatically generated. YARNRunner.java, path for jobJarPath not set correctly -- Key: MAPREDUCE-5929 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5929 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.2.0 Reporter: Chao Tian Assignee: Rahul Palamuttam Labels: newbie, patch Attachments: MAPREDUCE-5929.patch In YARNRunner.java, line 357, Path jobJarPath = new Path(jobConf.get(MRJobConfig.JAR)); This causes the job.jar file to be missing the scheme, host and port number on distributed file systems other than HDFS. If we compare line 357 with line 344, job.xml is actually set there as Path jobConfPath = new Path(jobSubmitDir,MRJobConfig.JOB_CONF_FILE); It appears jobSubmitDir is missing on line 357, which causes this problem. In HDFS, an additional qualification step corrects the problem, but other generic distributed file systems do not get that correction. The proposed change is to replace line 357 with Path jobJarPath = new Path(jobConf.get(jobSubmitDir,MRJobConfig.JAR)); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
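The underlying issue is resolution against the submit directory: a bare string path carries no scheme or authority, while resolving the jar name against the fully qualified submit directory keeps both. A sketch using java.net.URI as a stand-in for Hadoop's Path (which behaves analogously); the bucket and directory names are illustrative:

```java
import java.net.URI;

public class JobJarPathDemo {
    public static void main(String[] args) {
        // Fully qualified job submit directory on some non-HDFS filesystem.
        URI jobSubmitDir = URI.create("s3a://bucket/staging/job_0001/");

        // Line-357 style: the raw config value has no scheme or authority.
        URI bare = URI.create("job.jar");
        System.out.println(bare.getScheme());  // null

        // Line-344 style: resolving against the submit dir keeps scheme,
        // host and path intact.
        URI qualified = jobSubmitDir.resolve("job.jar");
        System.out.println(qualified);         // s3a://bucket/staging/job_0001/job.jar
    }
}
```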
[jira] [Commented] (MAPREDUCE-5392) mapred job -history all command throws IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/MAPREDUCE-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524938#comment-14524938 ] Hadoop QA commented on MAPREDUCE-5392: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12669757/MAPREDUCE-5392.5.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5573/console | This message was automatically generated. mapred job -history all command throws IndexOutOfBoundsException -- Key: MAPREDUCE-5392 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5392 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 3.0.0, 2.0.5-alpha, 2.2.0 Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Attachments: MAPREDUCE-5392.2.patch, MAPREDUCE-5392.3.patch, MAPREDUCE-5392.4.patch, MAPREDUCE-5392.5.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch When I use the all option with the mapred job -history command, the following exception is thrown and the command does not work.
{code} Exception in thread main java.lang.StringIndexOutOfBoundsException: String index out of range: -3 at java.lang.String.substring(String.java:1875) at org.apache.hadoop.mapreduce.util.HostUtil.convertTrackerNameToHostName(HostUtil.java:49) at org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.getTaskLogsUrl(HistoryViewer.java:459) at org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.printAllTaskAttempts(HistoryViewer.java:235) at org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.print(HistoryViewer.java:117) at org.apache.hadoop.mapreduce.tools.CLI.viewHistory(CLI.java:472) at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:313) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.mapred.JobClient.main(JobClient.java:1233) {code} This occurs because the node name recorded in the history file is not prefixed with tracker_. The patch therefore modifies the code so the history file can be read even when the node name lacks the tracker_ prefix. In addition, it fixes the URL of the displayed task log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
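The StringIndexOutOfBoundsException comes from unconditionally stripping a "tracker_" prefix that some stored names don't have. A defensive version checks first; this is a sketch of the idea, not the actual patch, and the sample tracker name is illustrative:

```java
public class TrackerName {
    private static final String PREFIX = "tracker_";

    /**
     * Maps a stored name like "tracker_host123:localhost/127.0.0.1:45454"
     * to "host123". Stripping the prefix unconditionally underflows when the
     * stored name was never given the "tracker_" prefix, which is what
     * triggers the StringIndexOutOfBoundsException above.
     */
    static String toHostName(String trackerName) {
        String name = trackerName.startsWith(PREFIX)
            ? trackerName.substring(PREFIX.length())
            : trackerName;                      // tolerate unprefixed names
        int colon = name.indexOf(':');
        return colon < 0 ? name : name.substring(0, colon);
    }

    public static void main(String[] args) {
        System.out.println(toHostName("tracker_host123:localhost/127.0.0.1:45454")); // host123
        System.out.println(toHostName("host123"));                                   // host123
    }
}
```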
[jira] [Commented] (MAPREDUCE-4216) Make MultipleOutputs generic to support non-file output formats
[ https://issues.apache.org/jira/browse/MAPREDUCE-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525072#comment-14525072 ] Hadoop QA commented on MAPREDUCE-4216: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 35s | Pre-patch trunk compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 29s | There were no new javac warning messages. | | {color:red}-1{color} | javadoc | 9m 33s | The applied patch generated 1 additional warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 0m 46s | The applied patch generated 3 new checkstyle issues (total was 67, now 70). | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 4 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 34s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 15s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | mapreduce tests | 1m 35s | Tests passed in hadoop-mapreduce-client-core. 
| | | | 37m 46s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12525460/MAPREDUCE-4216.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | javadoc | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5581/artifact/patchprocess/diffJavadocWarnings.txt | | checkstyle | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5581/artifact/patchprocess/diffcheckstylehadoop-mapreduce-client-core.txt | | whitespace | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5581/artifact/patchprocess/whitespace.txt | | hadoop-mapreduce-client-core test log | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5581/artifact/patchprocess/testrun_hadoop-mapreduce-client-core.txt | | Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5581/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf905.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5581/console | This message was automatically generated. Make MultipleOutputs generic to support non-file output formats --- Key: MAPREDUCE-4216 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4216 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 1.0.2 Reporter: Robbie Strickland Labels: Output Attachments: MAPREDUCE-4216.patch The current MultipleOutputs implementation is tied to FileOutputFormat in such a way that it is not extensible to other types of output. It should be made more generic, such as with an interface that can be implemented for different outputs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5362) clean up POM dependencies
[ https://issues.apache.org/jira/browse/MAPREDUCE-5362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524911#comment-14524911 ] Hadoop QA commented on MAPREDUCE-5362: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12640169/mr-5362-0.patch | | Optional Tests | javadoc javac unit | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5567/console | This message was automatically generated. clean up POM dependencies - Key: MAPREDUCE-5362 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5362 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 2.1.0-beta Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: MAPREDUCE-5362.patch, mr-5362-0.patch Intermediate 'pom' modules define dependencies inherited by leaf modules. This is causing issues in intellij IDE. We should normalize the leaf modules like in common, hdfs and tools where all dependencies are defined in each leaf module and the intermediate 'pom' module do not define any dependency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6030) In mr-jobhistory-daemon.sh, some env variables are not affected by mapred-env.sh
[ https://issues.apache.org/jira/browse/MAPREDUCE-6030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524907#comment-14524907 ] Hadoop QA commented on MAPREDUCE-6030: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12660804/MAPREDUCE-6030.patch | | Optional Tests | shellcheck | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5564/console | This message was automatically generated. In mr-jobhistory-daemon.sh, some env variables are not affected by mapred-env.sh Key: MAPREDUCE-6030 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6030 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Affects Versions: 2.4.1 Reporter: Youngjoon Kim Assignee: Youngjoon Kim Priority: Minor Attachments: MAPREDUCE-6030.patch In mr-jobhistory-daemon.sh, some env variables are exported before sourcing mapred-env.sh, so these variables don't use values defined in mapred-env.sh. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6038) A boolean may be set error in the Word Count v2.0 in MapReduce Tutorial
[ https://issues.apache.org/jira/browse/MAPREDUCE-6038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524881#comment-14524881 ] Hadoop QA commented on MAPREDUCE-6038: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12665076/MAPREDUCE-6038.1.patch | | Optional Tests | site | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5561/console | This message was automatically generated. A boolean may be set error in the Word Count v2.0 in MapReduce Tutorial --- Key: MAPREDUCE-6038 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6038 Project: Hadoop Map/Reduce Issue Type: Bug Environment: java version 1.8.0_11 HotSpot 64-bit Reporter: Pei Ma Assignee: Tsuyoshi Ozawa Priority: Minor Attachments: MAPREDUCE-6038.1.patch As a beginner, while learning the basics of MR, I found that I couldn't run WordCount2 using the command bin/hadoop jar wc.jar WordCount2 /user/joe/wordcount/input /user/joe/wordcount/output from the Tutorial. The VM threw a NullPointerException at line 47. At line 45, the default value returned by conf.getBoolean is true. That is to say, even when wordcount.skip.patterns is not set, WordCount2 continues on to execute getCacheFiles, and patternsURIs gets a null value. When the -skip option is absent, wordcount.skip.patterns is never set, so a NullPointerException comes out. In short, the block after the if-statement at line 45 shouldn't be executed when the -skip option is absent from the command. Line 45 should probably read if (conf.getBoolean(wordcount.skip.patterns, false)) { : just change the boolean default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
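The fix is only the default passed to getBoolean. A minimal stand-in with Configuration mocked by a Map, since the real class lives in Hadoop:

```java
import java.util.HashMap;
import java.util.Map;

public class SkipPatternsDefault {
    // Tiny stand-in for org.apache.hadoop.conf.Configuration#getBoolean.
    static boolean getBoolean(Map<String, String> conf, String name, boolean defaultValue) {
        String v = conf.get(name);
        return v == null ? defaultValue : Boolean.parseBoolean(v);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>(); // -skip not given: key unset

        // Tutorial as published: default true, so the skip-file branch runs
        // even without -skip, and getCacheFiles() later yields null.
        System.out.println(getBoolean(conf, "wordcount.skip.patterns", true));  // true

        // Proposed fix: default false, so the branch is skipped when -skip is absent.
        System.out.println(getBoolean(conf, "wordcount.skip.patterns", false)); // false
    }
}
```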
[jira] [Commented] (MAPREDUCE-5889) Deprecate FileInputFormat.setInputPaths(Job, String) and FileInputFormat.addInputPaths(Job, String)
[ https://issues.apache.org/jira/browse/MAPREDUCE-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524885#comment-14524885 ] Hadoop QA commented on MAPREDUCE-5889: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12647036/MAPREDUCE-5889.3.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5562/console | This message was automatically generated. Deprecate FileInputFormat.setInputPaths(Job, String) and FileInputFormat.addInputPaths(Job, String) --- Key: MAPREDUCE-5889 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5889 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Akira AJISAKA Assignee: Akira AJISAKA Priority: Minor Labels: newbie Attachments: MAPREDUCE-5889.3.patch, MAPREDUCE-5889.patch, MAPREDUCE-5889.patch {{FileInputFormat.setInputPaths(Job job, String commaSeparatedPaths)}} and {{FileInputFormat.addInputPaths(Job job, String commaSeparatedPaths)}} fail to parse commaSeparatedPaths if a comma is included in the file path. (e.g. Path: {{/path/file,with,comma}}) We should deprecate these methods and document to use {{setInputPaths(Job job, Path... inputPaths)}} and {{addInputPaths(Job job, Path... inputPaths)}} instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524969#comment-14524969 ] Hadoop QA commented on MAPREDUCE-5969: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12673831/MAPREDUCE-5969.branch1.1.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5577/console | This message was automatically generated. Private non-Archive Files' size add twice in Distributed Cache directory size calculation. -- Key: MAPREDUCE-5969 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5969 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Reporter: zhihai xu Assignee: zhihai xu Attachments: MAPREDUCE-5969.branch1.1.patch, MAPREDUCE-5969.branch1.patch The size of private non-archive files is added twice in the Distributed Cache directory size calculation. The private non-archive file list is passed in via the -files command line option. The Distributed Cache directory size is used to check whether the total size of cached files exceeds the cache size limit; the default limit is 10G. I added logging in addCacheInfoUpdate and setSize in TrackerDistributedCacheManager.java and used the following command to test: hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar /tmp/zxu/test_in/ /tmp/zxu/test_out This adds two files to the distributed cache: WordCount.java and wordcount.jar. WordCount.java is 2395 bytes and wordcount.jar is 3865 bytes, so the total should be 6260.
The log shows these file sizes being added twice: once before download to the local node and a second time after download, so the total file count becomes 4 instead of 2: addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local In the code, for a private non-archive file, the first time we add the file size is in getLocalCache: {code} if (!isArchive) { //for private archives, the lengths come over RPC from the //JobLocalizer since the JobLocalizer is the one who expands //archives and gets the total length lcacheStatus.size = fileStatus.getLen(); LOG.info("getLocalCache: " + localizedPath + " size = " + lcacheStatus.size); // Increase the size and sub directory count of the cache // from baseDirSize and baseDirNumberSubDir. baseDirManager.addCacheInfoUpdate(lcacheStatus); } {code} The second time we add the file size is in setSize: {code} synchronized (status) { status.size = size; baseDirManager.addCacheInfoUpdate(status); } {code} The fix is to not add the file size for a private non-archive file after download (downloadCacheObject). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
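A stripped-down sketch of the double accounting: the class below is a hypothetical stand-in for baseDirManager, and the second and third calls use the HDFS sizes from the report (the actual log shows slightly larger localized sizes):

```java
public class CacheSizeTracker {
    // Hypothetical stand-in for baseDirManager's per-directory accounting.
    long baseDirSize = 0;
    int numSubDirs = 0;

    void addCacheInfoUpdate(long size) {
        baseDirSize += size;
        numSubDirs++;
    }

    public static void main(String[] args) {
        CacheSizeTracker t = new CacheSizeTracker();
        long wordCountJava = 2395, wordCountJar = 3865;

        // Buggy flow for private non-archive files:
        t.addCacheInfoUpdate(wordCountJava + wordCountJar); // getLocalCache, before download
        t.addCacheInfoUpdate(wordCountJava);                // setSize, after download
        t.addCacheInfoUpdate(wordCountJar);                 // setSize, after download

        // Roughly double the correct 6260, matching the inflated log totals.
        System.out.println(t.baseDirSize + " num: " + t.numSubDirs);
    }
}
```

The inflated total matters because it is compared against the 10G cache limit, so the cache appears to fill twice as fast as it really does.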
[jira] [Commented] (MAPREDUCE-4818) Easier identification of tasks that timeout during localization
[ https://issues.apache.org/jira/browse/MAPREDUCE-4818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524978#comment-14524978 ] Hadoop QA commented on MAPREDUCE-4818: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12675155/MAPREDUCE-4818.v5.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5578/console | This message was automatically generated. Easier identification of tasks that timeout during localization --- Key: MAPREDUCE-4818 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4818 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mr-am Affects Versions: 0.23.3, 2.0.3-alpha Reporter: Jason Lowe Assignee: Siqi Li Labels: usability Attachments: MAPREDUCE-4818.v1.patch, MAPREDUCE-4818.v2.patch, MAPREDUCE-4818.v3.patch, MAPREDUCE-4818.v4.patch, MAPREDUCE-4818.v5.patch When a task is taking too long to localize and is killed by the AM due to task timeout, the job UI/history is not very helpful. The attempt simply lists a diagnostic stating it was killed due to timeout, but there are no logs for the attempt since it never actually got started. There are log messages on the NM that show the container never made it past localization by the time it was killed, but users often do not have access to those logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-4711) Append time elapsed since job-start-time for finished tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-4711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524913#comment-14524913 ] Hadoop QA commented on MAPREDUCE-4711: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12548038/MAPREDUCE-4711.branch-0.23.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5569/console | This message was automatically generated. Append time elapsed since job-start-time for finished tasks --- Key: MAPREDUCE-4711 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4711 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Affects Versions: 0.23.3 Reporter: Ravi Prakash Attachments: MAPREDUCE-4711.branch-0.23.patch In 0.20.x/1.x, the analyze job link gave this information bq. The last Map task task_sometask finished at (relative to the Job launch time): 5/10 20:23:10 (1hrs, 27mins, 54sec) The time it took for the last task to finish needs to be calculated mentally in 0.23. I believe we should print it next to the finish time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
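The requested output can be sketched as simple duration formatting, assuming the elapsed time is just finish time minus job launch time. The helper name and format string are made up to mimic the 1.x example above:

```java
import java.time.Duration;

public class ElapsedSinceLaunch {
    // Hypothetical formatter for the "relative to the Job launch time"
    // suffix the 1.x analyze-job page printed, e.g. "1hrs, 27mins, 54sec".
    static String format(long launchMillis, long finishMillis) {
        Duration d = Duration.ofMillis(finishMillis - launchMillis);
        return d.toHours() + "hrs, " + (d.toMinutes() % 60) + "mins, "
                + (d.getSeconds() % 60) + "sec";
    }

    public static void main(String[] args) {
        // 1 hour, 27 minutes, 54 seconds after launch:
        System.out.println(format(0L, (1 * 3600 + 27 * 60 + 54) * 1000L));
        // -> 1hrs, 27mins, 54sec
    }
}
```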
[jira] [Commented] (MAPREDUCE-4818) Easier identification of tasks that timeout during localization
[ https://issues.apache.org/jira/browse/MAPREDUCE-4818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525015#comment-14525015 ] Hadoop QA commented on MAPREDUCE-4818: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12675155/MAPREDUCE-4818.v5.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5579/console | This message was automatically generated. Easier identification of tasks that timeout during localization --- Key: MAPREDUCE-4818 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4818 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mr-am Affects Versions: 0.23.3, 2.0.3-alpha Reporter: Jason Lowe Assignee: Siqi Li Labels: usability Attachments: MAPREDUCE-4818.v1.patch, MAPREDUCE-4818.v2.patch, MAPREDUCE-4818.v3.patch, MAPREDUCE-4818.v4.patch, MAPREDUCE-4818.v5.patch When a task is taking too long to localize and is killed by the AM due to task timeout, the job UI/history is not very helpful. The attempt simply lists a diagnostic stating it was killed due to timeout, but there are no logs for the attempt since it never actually got started. There are log messages on the NM that show the container never made it past localization by the time it was killed, but users often do not have access to those logs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5392) mapred job -history all command throws IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/MAPREDUCE-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524958#comment-14524958 ] Hadoop QA commented on MAPREDUCE-5392: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12669757/MAPREDUCE-5392.5.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5576/console | This message was automatically generated. mapred job -history all command throws IndexOutOfBoundsException -- Key: MAPREDUCE-5392 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5392 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 3.0.0, 2.0.5-alpha, 2.2.0 Reporter: Shinichi Yamashita Assignee: Shinichi Yamashita Attachments: MAPREDUCE-5392.2.patch, MAPREDUCE-5392.3.patch, MAPREDUCE-5392.4.patch, MAPREDUCE-5392.5.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch, MAPREDUCE-5392.patch When I use the all option with the mapred job -history command, the following exception is displayed and the command does not work.
{code} Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: -3 at java.lang.String.substring(String.java:1875) at org.apache.hadoop.mapreduce.util.HostUtil.convertTrackerNameToHostName(HostUtil.java:49) at org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.getTaskLogsUrl(HistoryViewer.java:459) at org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.printAllTaskAttempts(HistoryViewer.java:235) at org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.print(HistoryViewer.java:117) at org.apache.hadoop.mapreduce.tools.CLI.viewHistory(CLI.java:472) at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:313) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.mapred.JobClient.main(JobClient.java:1233) {code} This is because a node name recorded in the history file is not prefixed with tracker_. The patch therefore modifies the code so that the history file can be read even when a node name lacks the tracker_ prefix. In addition, it fixes the URL of the displayed task log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
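A defensive version of the prefix stripping can be sketched in plain Java. The method below is a hypothetical stand-in for HostUtil.convertTrackerNameToHostName, not the actual patch; it assumes tracker names normally look like tracker_<host>:<port>:

```java
public class TrackerNameUtil {
    // Strip the "tracker_" prefix only when it is actually present,
    // avoiding the StringIndexOutOfBoundsException in the report, then
    // drop any trailing ":<port>" part.
    static String convertTrackerNameToHostName(String trackerName) {
        String name = trackerName.startsWith("tracker_")
                ? trackerName.substring("tracker_".length())
                : trackerName;
        int colon = name.indexOf(':');
        return colon >= 0 ? name.substring(0, colon) : name;
    }

    public static void main(String[] args) {
        System.out.println(convertTrackerNameToHostName("tracker_host1.example.com:50060"));
        // -> host1.example.com
        System.out.println(convertTrackerNameToHostName("host1.example.com"));
        // -> host1.example.com (bare host name, as found in some history files)
    }
}
```

The unconditional substring in the original code computes a negative index when the prefix is absent, which is exactly the "String index out of range: -3" above.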
[jira] [Commented] (MAPREDUCE-6040) distcp should automatically use /.reserved/raw when run by the superuser
[ https://issues.apache.org/jira/browse/MAPREDUCE-6040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524949#comment-14524949 ] Hadoop QA commented on MAPREDUCE-6040: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12669095/MAPREDUCE-6040.002.patch | | Optional Tests | site javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5574/console | This message was automatically generated. distcp should automatically use /.reserved/raw when run by the superuser Key: MAPREDUCE-6040 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6040 Project: Hadoop Map/Reduce Issue Type: Improvement Components: distcp Affects Versions: 3.0.0 Reporter: Andrew Wang Assignee: Charles Lamb Attachments: HDFS-6134-Distcp-cp-UseCasesTable2.pdf, MAPREDUCE-6040.001.patch, MAPREDUCE-6040.002.patch On HDFS-6134, [~sanjay.radia] asked for distcp to automatically prepend /.reserved/raw if the distcp is being performed by the superuser and /.reserved/raw is supported by both the source and destination filesystems. This behavior only occurs if none of the src and target pathnames are /.reserved/raw. The -disablereservedraw flag can be used to disable this option. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
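The decision rule described above can be sketched as plain path logic. Everything below (class name, method, boolean flags) is hypothetical and only illustrates the proposed conditions, not the patch itself:

```java
public class ReservedRawPrepend {
    static final String RESERVED_RAW = "/.reserved/raw";

    // Prepend /.reserved/raw only when: the caller is the superuser, both
    // source and destination filesystems support it, and no src/target
    // path already uses it (and the feature has not been disabled).
    static String maybePrepend(String path, boolean isSuperUser,
                               boolean bothSupportRaw, boolean anyPathAlreadyRaw) {
        if (isSuperUser && bothSupportRaw && !anyPathAlreadyRaw) {
            return RESERVED_RAW + path;
        }
        return path;
    }

    public static void main(String[] args) {
        System.out.println(maybePrepend("/data/src", true, true, false));
        // -> /.reserved/raw/data/src
        System.out.println(maybePrepend("/data/src", false, true, false));
        // -> /data/src (unchanged for non-superusers)
    }
}
```

A -disablereservedraw flag, per the description, would simply force the fall-through branch.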
[jira] [Commented] (MAPREDUCE-5981) Log levels of certain MR logs can be changed to DEBUG
[ https://issues.apache.org/jira/browse/MAPREDUCE-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14524912#comment-14524912 ] Hadoop QA commented on MAPREDUCE-5981: -- \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12656504/MAPREDUCE-5981.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / f1a152c | | Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5568/console | This message was automatically generated. Log levels of certain MR logs can be changed to DEBUG - Key: MAPREDUCE-5981 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5981 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Varun Saxena Assignee: Varun Saxena Attachments: MAPREDUCE-5981.patch The following MapReduce logs can be changed to DEBUG level. 1. In org.apache.hadoop.mapreduce.task.reduce.Fetcher#copyFromHost (Fetcher.java:313), the second log is not required at INFO level. It can be moved to DEBUG, since a WARN log is printed anyway if verifyReply fails. SecureShuffleUtils.verifyReply(replyHash, encHash, shuffleSecretKey); LOG.info("for url=" + msgToEncode + " sent hash and received reply"); 2. Thread-related info need not be printed in logs at INFO level.
The below 2 logs can be moved to DEBUG: a) In org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl#getHost (ShuffleSchedulerImpl.java:381), the below log can be changed to DEBUG: LOG.info("Assigning " + host + " with " + host.getNumKnownMapOutputs() + " to " + Thread.currentThread().getName()); b) In org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.getMapsForHost (ShuffleSchedulerImpl.java:411), the below log can be changed to DEBUG: LOG.info("assigned " + includedMaps + " of " + totalSize + " to " + host + " to " + Thread.currentThread().getName()); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
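The usual demotion pattern wraps the call in a level guard so the string concatenation is skipped when the level is disabled. This sketch uses java.util.logging as a self-contained stand-in for the commons-logging API in the snippets above, with FINE playing the role of DEBUG:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class DebugLevelSketch {
    // Guard-then-log: when FINE is disabled, neither the message string
    // nor the log record is built, which is the point of the demotion.
    static void logAssignment(Logger log, String host, int mapOutputs) {
        if (log.isLoggable(Level.FINE)) {
            log.fine("Assigning " + host + " with " + mapOutputs
                    + " to " + Thread.currentThread().getName());
        }
    }

    public static void main(String[] args) {
        Logger log = Logger.getLogger(DebugLevelSketch.class.getName());
        log.setLevel(Level.INFO);       // typical production level
        logAssignment(log, "node1", 3); // suppressed: FINE is below INFO
        log.setLevel(Level.FINE);
        logAssignment(log, "node1", 3); // passes the logger's level check
    }
}
```

With commons-logging the equivalent guard is LOG.isDebugEnabled() around LOG.debug(...).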