[jira] [Commented] (MAPREDUCE-6129) Job failed due to counter out of limited in MRAppMaster
[ https://issues.apache.org/jira/browse/MAPREDUCE-6129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183022#comment-14183022 ] Min Zhou commented on MAPREDUCE-6129:
-
Hmm.. it is indeed a duplicate.

Job failed due to counter out of limited in MRAppMaster
---
Key: MAPREDUCE-6129
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6129
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: applicationmaster
Affects Versions: 3.0.0, 2.3.0, 2.5.0, 2.4.1, 2.5.1
Reporter: Min Zhou
Attachments: MAPREDUCE-6129.diff

Many jobs on our cluster use more than 120 counters, and those jobs fail with an exception like the one below:
{noformat}
2014-10-15 22:55:43,742 WARN [Socket Reader #1 for port 45673] org.apache.hadoop.ipc.Server: Unable to read call parameters for client 10.180.216.12 on connection protocol org.apache.hadoop.mapred.TaskUmbilicalProtocol for rpcKind RPC_WRITABLE
org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counters: 121 max=120
	at org.apache.hadoop.mapreduce.counters.Limits.checkCounters(Limits.java:103)
	at org.apache.hadoop.mapreduce.counters.Limits.incrCounters(Limits.java:110)
	at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.readFields(AbstractCounterGroup.java:175)
	at org.apache.hadoop.mapred.Counters$Group.readFields(Counters.java:324)
	at org.apache.hadoop.mapreduce.counters.AbstractCounters.readFields(AbstractCounters.java:314)
	at org.apache.hadoop.mapred.TaskStatus.readFields(TaskStatus.java:489)
	at org.apache.hadoop.mapred.ReduceTaskStatus.readFields(ReduceTaskStatus.java:140)
	at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:285)
	at org.apache.hadoop.ipc.WritableRpcEngine$Invocation.readFields(WritableRpcEngine.java:157)
	at org.apache.hadoop.ipc.Server$Connection.processRpcRequest(Server.java:1802)
	at org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1734)
	at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1494)
	at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:732)
	at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:606)
	at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:577)
{noformat}
The class org.apache.hadoop.mapreduce.counters.Limits loads the mapred-site.xml on the NodeManager node for its JobConf if it hasn't been initialized. If mapred-site.xml does not exist on the NodeManager node, or mapreduce.job.counters.max is not defined in that file, Limits will simply use the default value of 120. Instead, we should read the user job's configuration file rather than the config files on the NodeManager when checking counter limits. I will submit a patch later.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
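A common operational workaround for the limit itself is to raise mapreduce.job.counters.max. This is a sketch only: the value 500 is an illustrative choice, and, per the report above, the AM-side check reads the NodeManager's mapred-site.xml, so setting this only in the job's configuration may not take effect until the described fix lands:

```xml
<!-- Raise the per-job counter limit; 120 is the Hadoop default.
     The value 500 here is illustrative, not a recommendation. -->
<property>
  <name>mapreduce.job.counters.max</name>
  <value>500</value>
</property>
```

For the check to be consistent cluster-wide, the same value would need to be visible in the mapred-site.xml on the NodeManager nodes as well.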
[jira] [Created] (MAPREDUCE-6129) Job failed due to counter out of limited in MRAppMaster
Min Zhou created MAPREDUCE-6129:
---
Summary: Job failed due to counter out of limited in MRAppMaster
Key: MAPREDUCE-6129
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6129
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: applicationmaster
Reporter: Min Zhou
[jira] [Updated] (MAPREDUCE-6129) Job failed due to counter out of limited in MRAppMaster
[ https://issues.apache.org/jira/browse/MAPREDUCE-6129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou updated MAPREDUCE-6129:
Affects Version/s: 3.0.0, 2.3.0, 2.5.0, 2.4.1, 2.5.1
[jira] [Updated] (MAPREDUCE-6129) Job failed due to counter out of limited in MRAppMaster
[ https://issues.apache.org/jira/browse/MAPREDUCE-6129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou updated MAPREDUCE-6129:
Attachment: MAPREDUCE-6129.diff
[jira] [Created] (MAPREDUCE-4291) RpcServerFactoryPBImpl force yarn users to define a protocol in a entailed namespace
Min Zhou created MAPREDUCE-4291:
---
Summary: RpcServerFactoryPBImpl force yarn users to define a protocol in a entailed namespace
Key: MAPREDUCE-4291
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4291
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 0.23.1, 0.24.0
Reporter: Min Zhou

We defined a wire protocol using protobuf, with the java package name org.apache.hadoop.realtime.proto:
{code:protobuf}
option java_package = "org.apache.hadoop.realtime.proto";
{code}
This definition causes a ClassNotFoundException when starting our custom application master.
{noformat}
12/05/29 14:45:33 ERROR app.DragonAppMaster: Error starting DragonAppMaster
org.apache.hadoop.yarn.YarnException: Failed to Start org.apache.hadoop.realtime.client.app.DragonAppMaster
	at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:78)
	at org.apache.hadoop.realtime.client.app.DragonAppMaster.start(DragonAppMaster.java:155)
	at org.apache.hadoop.realtime.client.app.DragonAppMaster$1.run(DragonAppMaster.java:218)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
	at org.apache.hadoop.realtime.client.app.DragonAppMaster.initAndStartAppMaster(DragonAppMaster.java:214)
	at org.apache.hadoop.realtime.client.app.DragonAppMaster.main(DragonAppMaster.java:200)
Caused by: org.apache.hadoop.yarn.YarnException: Failed to load class: [org.apache.hadoop.yarn.proto.DragonClientProtocol$DragonClientProtocolService]
	at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:105)
	at org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:63)
	at org.apache.hadoop.realtime.client.DragonClientService.start(DragonClientService.java:134)
	at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
	... 7 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.yarn.proto.DragonClientProtocol$DragonClientProtocolService
	at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:247)
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1162)
	at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:103)
	... 10 more
{noformat}
RpcServerFactoryPBImpl hard-codes the package name and class suffix of every protocol we define, which forces yarn users to define their protocols in a fixed namespace. Below are the lines of RpcServerFactoryPBImpl.java involved in this bug:
{code:java}
private static final String PROTO_GEN_PACKAGE_NAME = "org.apache.hadoop.yarn.proto";
private static final String PROTO_GEN_CLASS_SUFFIX = "Service";
private static final String PB_IMPL_PACKAGE_SUFFIX = "impl.pb.service";
private static final String PB_IMPL_CLASS_SUFFIX = "PBServiceImpl";
//...
private String getProtoClassName(Class<?> clazz) {
  String srcClassName = getClassName(clazz);
  return PROTO_GEN_PACKAGE_NAME + "." + srcClassName + "$" + srcClassName + PROTO_GEN_CLASS_SUFFIX;
}
{code}
--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
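The mangling can be reproduced in miniature. The sketch below (hypothetical class and method names, mirroring the fixed constants quoted above) shows why a protocol generated into any other java_package, such as org.apache.hadoop.realtime.proto, can never be found:

```java
// A minimal sketch of the name mangling RpcServerFactoryPBImpl performs.
// Class and method names here are hypothetical stand-ins for illustration.
public class ProtoNameMangling {
    private static final String PROTO_GEN_PACKAGE_NAME = "org.apache.hadoop.yarn.proto";
    private static final String PROTO_GEN_CLASS_SUFFIX = "Service";

    // The package prefix is a constant, so the user's own java_package
    // never appears in the name that gets looked up via reflection.
    static String getProtoClassName(String srcClassName) {
        return PROTO_GEN_PACKAGE_NAME + "." + srcClassName
                + "$" + srcClassName + PROTO_GEN_CLASS_SUFFIX;
    }

    public static void main(String[] args) {
        // Matches the class named in the ClassNotFoundException above:
        System.out.println(getProtoClassName("DragonClientProtocol"));
        // -> org.apache.hadoop.yarn.proto.DragonClientProtocol$DragonClientProtocolService
    }
}
```

Whatever package the Dragon protocol was actually generated into, the factory asks the classloader for a name under org.apache.hadoop.yarn.proto, hence the ClassNotFoundException.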
[jira] [Commented] (MAPREDUCE-2425) Distributed simulator for stressing JobTracker and NameNode
[ https://issues.apache.org/jira/browse/MAPREDUCE-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017315#comment-13017315 ] Min Zhou commented on MAPREDUCE-2425:
-
Btw, this tool can stress RPC as well.

Distributed simulator for stressing JobTracker and NameNode
---
Key: MAPREDUCE-2425
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2425
Project: Hadoop Map/Reduce
Issue Type: New Feature
Components: benchmarks
Reporter: Min Zhou
Labels: benchmark, hadoop
Fix For: 0.22.0
Attachments: .jpg, screenshot-1.jpg

Hadoop needs a tool for stressing the JobTracker and NameNode. Mumak introduced a simulated JobTracker, whose behavior is not exactly like that of the real JobTracker. Moreover, mumak can't simulate a large cluster with a lot of jobs running on it. On the other hand, Gridmix v3 needs hundreds of physical nodes to replay job stories. You can think of this tool as a complement to mumak and Gridmix v3. We successfully used this tool to simulate a 12000-node cluster with 4 real machines. I've talked to Hong Tang and Scott Chen offline, and they suggested I contribute this tool to the hadoop community.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2425) Distributed simulator for stressing JobTracker and NameNode
[ https://issues.apache.org/jira/browse/MAPREDUCE-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou updated MAPREDUCE-2425:
Component/s: (was: benchmarks) contrib/mumak
[jira] [Created] (MAPREDUCE-2425) Distributed simulator for stressing JobTracker and NameNode
Distributed simulator for stressing JobTracker and NameNode
---
Key: MAPREDUCE-2425
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2425
Project: Hadoop Map/Reduce
Issue Type: New Feature
Components: benchmarks
Reporter: Min Zhou
Fix For: 0.22.0
[jira] [Updated] (MAPREDUCE-2425) Distributed simulator for stressing JobTracker and NameNode
[ https://issues.apache.org/jira/browse/MAPREDUCE-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou updated MAPREDUCE-2425:
Attachment: .jpg
A screenshot of this tool. We are using hadoop 0.19.1.
[jira] [Updated] (MAPREDUCE-2425) Distributed simulator for stressing JobTracker and NameNode
[ https://issues.apache.org/jira/browse/MAPREDUCE-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Zhou updated MAPREDUCE-2425:
Attachment: screenshot-1.jpg
screenshot
[jira] [Commented] (MAPREDUCE-2425) Distributed simulator for stressing JobTracker and NameNode
[ https://issues.apache.org/jira/browse/MAPREDUCE-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017273#comment-13017273 ] Min Zhou commented on MAPREDUCE-2425:
-
Amar, this simulator was developed mainly for stressing the JT and NN. It can also verify the JT's runtime behavior, as mumak does. Actually, we use v0.19.1, where rumen and mumak had not yet been introduced, so I developed this tool independently of them. I am now planning to merge my code into mumak, but before that I must do the 2 things listed below:
1. Mumak uses a simulated JT to tell the TT some information about a task attempt reproduced by rumen through the heartbeat. I prefer using the real JT when stressing it.
2. I should use the new MR API before merging into mumak.
[jira] Commented: (MAPREDUCE-279) Map-Reduce 2.0
[ https://issues.apache.org/jira/browse/MAPREDUCE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12999745#comment-12999745 ] Min Zhou commented on MAPREDUCE-279:
@Arun How does the ApplicationMaster know its resource requirements before it launches tasks? IMHO, the biggest problem with resource allocation is that we can't determine the CPU/memory/disk/network requirements until the task is running. User-defined requirements in the configuration files are always imprecise. From your words, the architecture allows end-users to implement any application-specific framework by implementing a custom ApplicationMaster. Can even common users deploy their ApplicationMaster over a cluster they have no permissions on? Can you illustrate how to achieve that?

Map-Reduce 2.0
--
Key: MAPREDUCE-279
URL: https://issues.apache.org/jira/browse/MAPREDUCE-279
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: jobtracker, tasktracker
Reporter: Arun C Murthy
Assignee: Arun C Murthy
Fix For: 0.23.0

Re-factor MapReduce into a generic resource scheduler and a per-job, user-defined component that manages the application execution.
[jira] Commented: (MAPREDUCE-1981) Improve getSplits performance by using listFiles, the new FileSystem API
[ https://issues.apache.org/jira/browse/MAPREDUCE-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974576#action_12974576 ] Min Zhou commented on MAPREDUCE-1981:
-
@Hairong Thanks for sharing; it helps greatly. We currently use 0.19.1, and after applying your patch our namenode will send a LocatedFileStatus array over the wire rather than a DirectoryListing object, which is how the first bug happened. I have another idea for shortening the client's getListing time: caching split files in the DistributedCache. We always scan the same Hive table (or HDFS directory) many times; there is no need to call the NameNode's getListing again and again if the directory hasn't changed. My idea is to call getListing once, then cache the resulting splits; subsequent job submissions reuse this cache without any getListing calls.

Improve getSplits performance by using listFiles, the new FileSystem API
--
Key: MAPREDUCE-1981
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1981
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: job submission
Reporter: Hairong Kuang
Assignee: Hairong Kuang
Fix For: 0.22.0
Attachments: mapredListFiles.patch, mapredListFiles1.patch, mapredListFiles2.patch, mapredListFiles3.patch, mapredListFiles4.patch, mapredListFiles5.patch

This jira will make FileInputFormat and CombinedFileInputForm use the new API, thus reducing the number of RPCs to the HDFS NameNode.
--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1981) Improve getSplits performance by using listFiles, the new FileSystem API
[ https://issues.apache.org/jira/browse/MAPREDUCE-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973510#action_12973510 ] Min Zhou commented on MAPREDUCE-1981:
-
After applying the patches from this issue and HDFS-202 on Hadoop v0.19.1, an exception was thrown when running nnbench:
{noformat}
Exception in thread IPC Client (47) connection to nn151/192.168.201.151:9020 from zhoumin java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.fs.LocatedFileStatus.<init>()
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:81)
	at org.apache.hadoop.io.WritableFactories.newInstance(WritableFactories.java:53)
	at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:236)
	at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:171)
	at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:219)
	at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:66)
	at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:509)
	at org.apache.hadoop.ipc.Client$Connection.run(Client.java:439)
Caused by: java.lang.NoSuchMethodException: org.apache.hadoop.fs.LocatedFileStatus.<init>()
	at java.lang.Class.getConstructor0(Class.java:2706)
	at java.lang.Class.getDeclaredConstructor(Class.java:1985)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:75)
	... 7 more
{noformat}
LocatedFileStatus is a Writable; it should implement a no-argument constructor.
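The reflection failure above can be reproduced in miniature: ReflectionUtils ultimately calls Class.getDeclaredConstructor(), which throws NoSuchMethodException when a class declares only parameterized constructors, so "<init>()" cannot be found. A minimal sketch, with hypothetical class names standing in for LocatedFileStatus:

```java
import java.lang.reflect.Constructor;

// Hypothetical stand-ins: GoodWritable has a no-arg constructor,
// BadWritable declares only a parameterized one (so no default is generated).
class GoodWritable { GoodWritable() {} }
class BadWritable { BadWritable(int field) {} }

public class NoArgCtorCheck {
    // Mimics the lookup that ReflectionUtils.newInstance performs.
    static boolean hasNoArgCtor(Class<?> c) {
        try {
            Constructor<?> ctor = c.getDeclaredConstructor(); // throws if "<init>()" is missing
            return ctor != null;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasNoArgCtor(GoodWritable.class));  // true
        System.out.println(hasNoArgCtor(BadWritable.class));   // false
    }
}
```

This is why the fix is simply to give the Writable a no-argument constructor for the deserialization path to use.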
[jira] Commented: (MAPREDUCE-1981) Improve getSplits performance by using listFiles, the new FileSystem API
[ https://issues.apache.org/jira/browse/MAPREDUCE-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973539#action_12973539 ] Min Zhou commented on MAPREDUCE-1981:
-
The lines listed below will cause a NullPointerException, because EMPTY_BLOCK_LOCS returns null when blocks.getLocatedBlocks() is called.
{noformat}
/** a default LocatedBlocks object, its content should not be changed */
private final static LocatedBlocks EMPTY_BLOCK_LOCS = new LocatedBlocks();
{noformat}
Here is an example of this exception:
{noformat}
java.io.IOException: java.lang.NullPointerException
	at org.apache.hadoop.hdfs.DFSUtil.locatedBlocks2Locations(DFSUtil.java:84)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.getListing(FSDirectory.java:731)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListing(FSNamesystem.java:2015)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.getLocatedListing(NameNode.java:494)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
{noformat}
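The pitfall behind that NullPointerException can be shown with a small self-contained sketch (hypothetical names modeled on LocatedBlocks): a shared "empty" default instance whose internal list field is never initialized, so its getter hands callers null instead of an empty list.

```java
import java.util.List;

// Hypothetical sketch of the pitfall: the default instance never
// initializes its list field, so the getter returns null.
class LocatedBlocksSketch {
    private List<String> blocks;  // stays null in the default instance
    static final LocatedBlocksSketch EMPTY_BLOCK_LOCS = new LocatedBlocksSketch();
    List<String> getLocatedBlocks() { return blocks; }
}

public class EmptyBlocksDemo {
    public static void main(String[] args) {
        // Callers like locatedBlocks2Locations that dereference the result
        // without a null check hit a NullPointerException here.
        List<String> blocks = LocatedBlocksSketch.EMPTY_BLOCK_LOCS.getLocatedBlocks();
        System.out.println(blocks == null);  // true
    }
}
```

Initializing the field to an empty list in the default constructor, or null-checking at the call site, would avoid the exception.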