[jira] [Created] (HDFS-9692) Report top 10 Namenode rpc consumers through JMX

2016-01-23 Thread Nikhil Mulley (JIRA)
Nikhil Mulley created HDFS-9692:
---

             Summary: Report top 10 Namenode rpc consumers through JMX
                 Key: HDFS-9692
                 URL: https://issues.apache.org/jira/browse/HDFS-9692
             Project: Hadoop HDFS
          Issue Type: Wish
            Reporter: Nikhil Mulley


Hi,

I think it would really help if the NameNode reported its top RPC consumers
through metrics/JMX, so that rogue clients are easy to spot when
troubleshooting.
When RPC load spikes on the NameNode and callqueuelength keeps increasing, it
becomes tedious to identify the offenders in a large cluster (>1k nodes).
Including RPC client information (src_host:src_port) in the top consumers list
would help operators. Let me know if any other information is needed to get
this feature added to the NameNode metrics system.
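
As a rough illustration of the request (not NameNode code, and not an existing
Hadoop API), the bookkeeping could look like the Python sketch below. The class
name TopRpcConsumers, its methods, and the sample hosts and ports are all
hypothetical; a real implementation would live in the NameNode's RPC server
and publish the list through its metrics system.

```python
from collections import Counter


class TopRpcConsumers:
    """Hypothetical tracker: counts RPC calls per client (src_host:src_port)."""

    def __init__(self, top_n=10):
        self.top_n = top_n          # how many consumers to report
        self._calls = Counter()     # "host:port" -> number of calls seen

    def record(self, src_host, src_port):
        """Register one RPC call from the given client."""
        self._calls[f"{src_host}:{src_port}"] += 1

    def top(self):
        """Return the heaviest callers as (client, call_count) pairs."""
        return self._calls.most_common(self.top_n)


# Made-up sample traffic, only to show the shape of the report.
tracker = TopRpcConsumers(top_n=2)
for host, port in [("dn1", 50010), ("dn2", 50011), ("dn1", 50010),
                   ("dn3", 50012), ("dn1", 50010), ("dn2", 50011)]:
    tracker.record(host, port)

print(tracker.top())  # [('dn1:50010', 3), ('dn2:50011', 2)]
```

A production version would probably also need a time window or decay so that
the list reflects the current spike rather than all calls since startup.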

Thank you,

Nikhil




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9692) Report Top Namenode rpc consumers through JMX

2016-01-23 Thread Nikhil Mulley (JIRA)

 [ https://issues.apache.org/jira/browse/HDFS-9692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nikhil Mulley updated HDFS-9692:

Summary: Report Top Namenode rpc consumers through JMX  (was: Report top 10 
Namenode rpc consumers through JMX)






[jira] [Created] (HDFS-5646) Exceptions during HDFS failover

2013-12-09 Thread Nikhil Mulley (JIRA)
Nikhil Mulley created HDFS-5646:
---

             Summary: Exceptions during HDFS failover
                 Key: HDFS-5646
                 URL: https://issues.apache.org/jira/browse/HDFS-5646
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: ha
            Reporter: Nikhil Mulley


Hi,

In our HDFS HA setup, I see the following exceptions when I try to fail back.
Automatic failover is enabled. Although the failback operation succeeds, the
exceptions and the exit status of 255 worry me, because I could not script this
if I needed to. Please let me know whether this is a known issue and easily
resolvable.
I am using Cloudera Hadoop 4.4.0, if that helps. Please let me know if I should
open this ticket in the CDH Jira instead.
Thanks.

sudo -u hdfs hdfs haadmin -failover nn2 nn1
Operation failed: Unable to become active. Service became unhealthy while trying to failover.
    at org.apache.hadoop.ha.ZKFailoverController.doGracefulFailover(ZKFailoverController.java:652)
    at org.apache.hadoop.ha.ZKFailoverController.access$400(ZKFailoverController.java:58)
    at org.apache.hadoop.ha.ZKFailoverController$3.run(ZKFailoverController.java:591)
    at org.apache.hadoop.ha.ZKFailoverController$3.run(ZKFailoverController.java:588)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.ha.ZKFailoverController.gracefulFailoverToYou(ZKFailoverController.java:588)
    at org.apache.hadoop.ha.ZKFCRpcServer.gracefulFailover(ZKFCRpcServer.java:94)
    at org.apache.hadoop.ha.protocolPB.ZKFCProtocolServerSideTranslatorPB.gracefulFailover(ZKFCProtocolServerSideTranslatorPB.java:61)
    at org.apache.hadoop.ha.proto.ZKFCProtocolProtos$ZKFCProtocolService$2.callBlockingMethod(ZKFCProtocolProtos.java:1351)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1751)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1747)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1745)
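
On the scripting concern: one possible workaround, sketched below in Python, is
to stop trusting the exit status of -failover and instead ask the target
NameNode for its state afterwards with hdfs haadmin -getServiceState. This is
only a sketch under assumptions: the helper names run_haadmin and
failover_and_verify are hypothetical, the sudo -u hdfs wrapper from the command
above is left out, and the check assumes -getServiceState prints "active" for
an active NameNode. It does not explain or fix the exception itself.

```python
import subprocess


def run_haadmin(*args):
    """Run `hdfs haadmin <args>` and return (exit_code, stripped stdout)."""
    proc = subprocess.run(["hdfs", "haadmin", *args],
                          capture_output=True, text=True)
    return proc.returncode, proc.stdout.strip()


def failover_and_verify(source, target, run=run_haadmin):
    """Request a failover, then decide success from the observed state.

    `run` is injectable so the logic can be exercised without a cluster.
    """
    # The exit code is recorded for logging but is not used for the verdict.
    exit_code, _ = run("-failover", source, target)
    # Ask the target NameNode which state it is actually in.
    _, state = run("-getServiceState", target)
    return {"exit_code": exit_code, "target_state": state,
            "ok": state == "active"}
```

With the service IDs from the report, a script would call
failover_and_verify("nn2", "nn1") and branch on the "ok" field, logging the
nonzero exit code rather than aborting on it.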



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)