[jira] [Commented] (MAPREDUCE-6129) Job failed due to counter out of limited in MRAppMaster

2014-10-24 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183022#comment-14183022
 ] 

Min Zhou commented on MAPREDUCE-6129:
-

Hmm... it is indeed a duplicate.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MAPREDUCE-6129) Job failed due to counter out of limited in MRAppMaster

2014-10-15 Thread Min Zhou (JIRA)
Min Zhou created MAPREDUCE-6129:
---

 Summary: Job failed due to counter out of limited in MRAppMaster
 Key: MAPREDUCE-6129
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6129
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster
Reporter: Min Zhou


Lots of our cluster's jobs use more than 120 counters, and those kinds of jobs 
fail with an exception like the one below:
{noformat}
2014-10-15 22:55:43,742 WARN [Socket Reader #1 for port 45673] 
org.apache.hadoop.ipc.Server: Unable to read call parameters for client 
10.180.216.12 on connection protocol 
org.apache.hadoop.mapred.TaskUmbilicalProtocol for rpcKind RPC_WRITABLE
org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counters: 
121 max=120
at 
org.apache.hadoop.mapreduce.counters.Limits.checkCounters(Limits.java:103)
at 
org.apache.hadoop.mapreduce.counters.Limits.incrCounters(Limits.java:110)
at 
org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.readFields(AbstractCounterGroup.java:175)
at org.apache.hadoop.mapred.Counters$Group.readFields(Counters.java:324)
at 
org.apache.hadoop.mapreduce.counters.AbstractCounters.readFields(AbstractCounters.java:314)
at org.apache.hadoop.mapred.TaskStatus.readFields(TaskStatus.java:489)
at 
org.apache.hadoop.mapred.ReduceTaskStatus.readFields(ReduceTaskStatus.java:140)
at 
org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:285)
at 
org.apache.hadoop.ipc.WritableRpcEngine$Invocation.readFields(WritableRpcEngine.java:157)
at 
org.apache.hadoop.ipc.Server$Connection.processRpcRequest(Server.java:1802)
at 
org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1734)
at 
org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1494)
at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:732)
at 
org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:606)
at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:577)

{noformat}

The class org.apache.hadoop.mapreduce.counters.Limits loads mapred-site.xml 
from the NodeManager node into a JobConf if it hasn't been initialized yet. 
If mapred-site.xml does not exist on the NodeManager node, or 
mapreduce.job.counters.max is not defined in that file, 
org.apache.hadoop.mapreduce.counters.Limits just uses the default value of 
120.

Instead, we should read the user job's conf file, rather than the config files 
on the NodeManager, when checking counter limits.

I will submit a patch later.
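
A minimal sketch of the direction (illustrative only, not the attached 
MAPREDUCE-6129.diff): initialize the counter limits from the submitted job's 
own configuration before any task counters are deserialized, so the user's 
mapreduce.job.counters.max wins over the node-local default.

{code:java}
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.counters.Limits;

// Illustrative sketch: Limits caches the first Configuration it is
// initialized with, so calling init() early in the AM with the job's own
// conf makes mapreduce.job.counters.max from the submitted job take effect
// instead of the NodeManager-local default of 120.
public class CounterLimitsBootstrap {
  public static void initFromJob(JobConf jobConf) {
    Limits.init(jobConf); // e.g. the user set mapreduce.job.counters.max=500
  }
}
{code}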




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6129) Job failed due to counter out of limited in MRAppMaster

2014-10-15 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou updated MAPREDUCE-6129:

Affects Version/s: 3.0.0
   2.3.0
   2.5.0
   2.4.1
   2.5.1




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6129) Job failed due to counter out of limited in MRAppMaster

2014-10-15 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou updated MAPREDUCE-6129:

Attachment: MAPREDUCE-6129.diff




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MAPREDUCE-4291) RpcServerFactoryPBImpl force yarn users to define a protocol in a entailed namespace

2012-05-29 Thread Min Zhou (JIRA)
Min Zhou created MAPREDUCE-4291:
---

 Summary: RpcServerFactoryPBImpl force yarn users to define a 
protocol in a entailed namespace
 Key: MAPREDUCE-4291
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4291
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.1, 0.24.0
Reporter: Min Zhou


We defined a wire protocol using protobuf, with its Java package name 
org.apache.hadoop.realtime.proto:
{code:protobuf}
option java_package = "org.apache.hadoop.realtime.proto";
{code}
Such a definition causes a ClassNotFoundException when starting our custom 
application master.
{noformat}
12/05/29 14:45:33 ERROR app.DragonAppMaster: Error starting DragonAppMaster
org.apache.hadoop.yarn.YarnException: Failed to Start 
org.apache.hadoop.realtime.client.app.DragonAppMaster
at 
org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:78)
at 
org.apache.hadoop.realtime.client.app.DragonAppMaster.start(DragonAppMaster.java:155)
at 
org.apache.hadoop.realtime.client.app.DragonAppMaster$1.run(DragonAppMaster.java:218)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
at 
org.apache.hadoop.realtime.client.app.DragonAppMaster.initAndStartAppMaster(DragonAppMaster.java:214)
at 
org.apache.hadoop.realtime.client.app.DragonAppMaster.main(DragonAppMaster.java:200)
Caused by: org.apache.hadoop.yarn.YarnException: Failed to load class: 
[org.apache.hadoop.yarn.proto.DragonClientProtocol$DragonClientProtocolService]
at 
org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:105)
at 
org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:63)
at 
org.apache.hadoop.realtime.client.DragonClientService.start(DragonClientService.java:134)
at 
org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
... 7 more
Caused by: java.lang.ClassNotFoundException: 
org.apache.hadoop.yarn.proto.DragonClientProtocol$DragonClientProtocolService
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at 
org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1162)
at 
org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:103)
... 10 more
{noformat}
RpcServerFactoryPBImpl hard-codes the package name and class suffix of every 
protocol we define, which forces YARN users to define their protocols in an 
entailed namespace. Below are the lines of RpcServerFactoryPBImpl.java that 
cause the bug.
{code:java}
  private static final String PROTO_GEN_PACKAGE_NAME = "org.apache.hadoop.yarn.proto";
  private static final String PROTO_GEN_CLASS_SUFFIX = "Service";
  private static final String PB_IMPL_PACKAGE_SUFFIX = "impl.pb.service";
  private static final String PB_IMPL_CLASS_SUFFIX = "PBServiceImpl";

  // ...

  private String getProtoClassName(Class<?> clazz) {
    String srcClassName = getClassName(clazz);
    return PROTO_GEN_PACKAGE_NAME + "." + srcClassName + "$" + srcClassName
        + PROTO_GEN_CLASS_SUFFIX;
  }
{code}
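
One possible direction, sketched under the assumption that a user's generated 
service class lives in a "proto" subpackage relative to the protocol 
interface's own package (illustrative only, not a committed fix): probe the 
caller's package first and fall back to the hard-coded YARN package.

{code:java}
  // Hypothetical sketch: look for the generated protobuf service class
  // relative to the protocol interface's own package before falling back
  // to the hard-coded org.apache.hadoop.yarn.proto prefix used today.
  private String getProtoClassName(Class<?> clazz) {
    String srcClassName = getClassName(clazz);
    String suffix = "." + srcClassName + "$" + srcClassName + PROTO_GEN_CLASS_SUFFIX;
    String candidate = clazz.getPackage().getName() + ".proto" + suffix;
    try {
      Class.forName(candidate);                // user-side generated class?
      return candidate;
    } catch (ClassNotFoundException e) {
      return PROTO_GEN_PACKAGE_NAME + suffix;  // built-in YARN protocols
    }
  }
{code}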


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-2425) Distributed simulator for stressing JobTracker and NameNode

2011-04-08 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017315#comment-13017315
 ] 

Min Zhou commented on MAPREDUCE-2425:
-

BTW, this tool can stress RPC as well.


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-2425) Distributed simulator for stressing JobTracker and NameNode

2011-04-08 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou updated MAPREDUCE-2425:


Component/s: (was: benchmarks)
 contrib/mumak


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (MAPREDUCE-2425) Distributed simulator for stressing JobTracker and NameNode

2011-04-07 Thread Min Zhou (JIRA)
Distributed simulator for stressing JobTracker and NameNode
---

 Key: MAPREDUCE-2425
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2425
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: benchmarks
Reporter: Min Zhou
 Fix For: 0.22.0


Hadoop needs a tool for stressing the JobTracker and NameNode. Mumak 
introduced a simulated JobTracker, whose behavior isn't exactly like that of 
the real JobTracker. Moreover, Mumak can't simulate a large cluster with a 
lot of jobs running on it. On the other hand, Gridmix v3 needs hundreds of 
physical nodes to replay job stories.

You can think of this tool as a complement to Mumak and Gridmix v3. We 
successfully used it to simulate a 12000-node cluster with 4 real machines.

I've talked to Hong Tang and Scott Chen offline; they suggested that I 
contribute this tool to the Hadoop community.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-2425) Distributed simulator for stressing JobTracker and NameNode

2011-04-07 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou updated MAPREDUCE-2425:


Attachment: .jpg

A screenshot of this tool. We are using Hadoop 0.19.1.


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-2425) Distributed simulator for stressing JobTracker and NameNode

2011-04-07 Thread Min Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Zhou updated MAPREDUCE-2425:


Attachment: screenshot-1.jpg

screenshot


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-2425) Distributed simulator for stressing JobTracker and NameNode

2011-04-07 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017273#comment-13017273
 ] 

Min Zhou commented on MAPREDUCE-2425:
-

Amar,

This simulator was developed mainly for stressing the JT and NN. It can also 
verify the JT's runtime behavior, as Mumak does. We actually use v0.19.1, 
where Rumen and Mumak had not yet been introduced, so I developed this tool 
independently of them. I am now planning to merge my code into Mumak, but 
before that I must do the two things listed below:

1. Mumak uses a simulated JT to tell the TT, through heartbeats, information 
about each task attempt reproduced by Rumen. I prefer using the real JT when 
stressing it.
2. I should use the new MR API before merging into Mumak.


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Commented: (MAPREDUCE-279) Map-Reduce 2.0

2011-02-26 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12999745#comment-12999745
 ] 

Min Zhou commented on MAPREDUCE-279:


@Arun

How does the ApplicationMaster know its resource requirements before it 
launches tasks? IMHO, the biggest problem with resource allocation is that we 
can't determine the CPU/memory/disk/network requirements until the task is 
actually running. User-defined requirements in configuration files are always 
inaccurate.

From your words, the architecture allows end users to implement any 
application-specific framework by implementing a custom ApplicationMaster. 
Can ordinary users really deploy their ApplicationMaster on a cluster where 
they have no permissions at all? Can you illustrate how to achieve this?

 Map-Reduce 2.0
 --

 Key: MAPREDUCE-279
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-279
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobtracker, tasktracker
Reporter: Arun C Murthy
Assignee: Arun C Murthy
 Fix For: 0.23.0


 Re-factor MapReduce into a generic resource scheduler and a per-job, 
 user-defined component that manages the application execution. 

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (MAPREDUCE-1981) Improve getSplits performance by using listFiles, the new FileSystem API

2010-12-23 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974576#action_12974576
 ] 

Min Zhou commented on MAPREDUCE-1981:
-

@Hairong

Thanks for sharing; it helps greatly. We currently use 0.19.1, and after 
applying your patch our NameNode sends a LocatedFileStatus array over the 
wire rather than a DirectoryListing object. That is how the first bug 
happened.

I also have an idea for shortening the client's getListing time by caching 
split files in the DistributedCache. We always scan the same Hive table (or 
HDFS directory) many times, and there is no need to call the NameNode's 
getListing again and again if the directory hasn't changed. My idea is to 
call getListing once, cache the resulting splits, and let subsequent job 
submissions reuse the cache without any getListing calls.
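
A rough sketch of the caching idea (CachedListing and listCached are 
hypothetical names, purely for illustration): key the cache by directory and 
modification time, so an unchanged directory never triggers another 
getListing RPC.

{code:java}
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical sketch of the proposed listing cache, not part of any patch.
public class ListingCache {
  private static class CachedListing {
    final long mtime;
    final FileStatus[] statuses;
    CachedListing(long mtime, FileStatus[] statuses) {
      this.mtime = mtime;
      this.statuses = statuses;
    }
  }

  private final Map<Path, CachedListing> cache =
      new HashMap<Path, CachedListing>();

  public synchronized FileStatus[] listCached(FileSystem fs, Path dir)
      throws IOException {
    // One cheap getFileStatus RPC tells us whether the directory changed.
    long mtime = fs.getFileStatus(dir).getModificationTime();
    CachedListing hit = cache.get(dir);
    if (hit != null && hit.mtime == mtime) {
      return hit.statuses;            // unchanged: skip the getListing RPC
    }
    FileStatus[] fresh = fs.listStatus(dir);
    cache.put(dir, new CachedListing(mtime, fresh));
    return fresh;
  }
}
{code}

One caveat of this sketch: a directory's modification time changes when 
entries are added or removed, but not necessarily when an existing file is 
rewritten in place, so a production cache key would likely need file lengths 
and times as well.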


 Improve getSplits performance by using listFiles, the new FileSystem API
 

 Key: MAPREDUCE-1981
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1981
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: job submission
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.22.0

 Attachments: mapredListFiles.patch, mapredListFiles1.patch, 
 mapredListFiles2.patch, mapredListFiles3.patch, mapredListFiles4.patch, 
 mapredListFiles5.patch


 This jira will make FileInputFormat and CombineFileInputFormat use the new 
 API, thus reducing the number of RPCs to the HDFS NameNode.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1981) Improve getSplits performance by using listFiles, the new FileSystem API

2010-12-21 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973510#action_12973510
 ] 

Min Zhou commented on MAPREDUCE-1981:
-

After applying the patches from this issue and HDFS-202 to Hadoop v0.19.1, an 
exception was thrown when running nnbench:

Exception in thread "IPC Client (47) connection to nn151/192.168.201.151:9020 
from zhoumin" java.lang.RuntimeException: java.lang.NoSuchMethodException: 
org.apache.hadoop.fs.LocatedFileStatus.<init>()
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:81)
at 
org.apache.hadoop.io.WritableFactories.newInstance(WritableFactories.java:53)
at 
org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:236)
at 
org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:171)
at 
org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:219)
at 
org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:66)
at 
org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:509)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:439)
Caused by: java.lang.NoSuchMethodException: 
org.apache.hadoop.fs.LocatedFileStatus.<init>()
at java.lang.Class.getConstructor0(Class.java:2706)
at java.lang.Class.getDeclaredConstructor(Class.java:1985)
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:75)
... 7 more

LocatedFileStatus is a Writable, so it should implement a no-arg constructor.
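
A minimal sketch of the shape of the fix (illustrative; the real class keeps 
its existing fields, constructors, and readFields/write methods):

{code:java}
package org.apache.hadoop.fs;

// Sketch: Hadoop instantiates Writables reflectively through
// ReflectionUtils.newInstance / WritableFactories, which requires a
// visible no-arg constructor; readFields() then populates the fields.
public class LocatedFileStatus extends FileStatus {
  public LocatedFileStatus() {
    // required for reflective instantiation during RPC deserialization
  }
  // ... existing constructors, fields, and readFields/write unchanged ...
}
{code}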


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1981) Improve getSplits performance by using listFiles, the new FileSystem API

2010-12-21 Thread Min Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12973539#action_12973539
 ] 

Min Zhou commented on MAPREDUCE-1981:
-

The lines listed below cause a NullPointerException, because EMPTY_BLOCK_LOCS 
returns null from blocks.getLocatedBlocks():
{code:java}
   /** a default LocatedBlocks object, its content should not be changed */
   private final static LocatedBlocks EMPTY_BLOCK_LOCS = new LocatedBlocks();
{code}
Here is an example of this exception:
{noformat}
java.io.IOException: java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.DFSUtil.locatedBlocks2Locations(DFSUtil.java:84)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.getListing(FSDirectory.java:731)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListing(FSNamesystem.java:2015)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.getLocatedListing(NameNode.java:494)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
{noformat}
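
A hedged sketch of the kind of guard I'd expect in 
DFSUtil.locatedBlocks2Locations (illustrative only, not an actual patch; it 
assumes the usual LocatedBlocks, LocatedBlock, and BlockLocation types):

{code:java}
  // Illustrative guard: an empty LocatedBlocks can carry a null block list,
  // so treat null the same as an empty list instead of dereferencing it.
  public static BlockLocation[] locatedBlocks2Locations(LocatedBlocks blocks) {
    if (blocks == null) {
      return new BlockLocation[0];
    }
    List<LocatedBlock> blkList = blocks.getLocatedBlocks();
    if (blkList == null || blkList.isEmpty()) {
      return new BlockLocation[0];   // avoids the NPE shown above
    }
    BlockLocation[] locations = new BlockLocation[blkList.size()];
    // ... existing conversion of each LocatedBlock to a BlockLocation ...
    return locations;
  }
{code}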


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.