[ 
https://issues.apache.org/jira/browse/HBASE-11747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611051#comment-14611051
 ] 

Mikhail Antonov commented on HBASE-11747:
-----------------------------------------

[~stack] 

bq. For #2, was looking at exporting jmx so say the Master could read cluster 
metrics instead of getting metrics recast and served on the heartbeat
Did you mean rpc, not jmx? I briefly looked at where it's actually used, and 
unless I'm missing something, we don't really use it in any heartbeats. Cluster 
status is used:

 - for subscribers on (multicast) publishing (that's the only push as far as I 
can tell?)
 - in the separate MasterRpcServices#GetClusterStatus rpc call and, accordingly, 
in the Admin interface wrapping it (which is in the log posted in the jira)
 - in REST messages

For regular heartbeats we just use the MRS#regionServerReport rpc call, which 
only pushes the RS server name and load (including region load) to the master. 
So as far as I can tell, those are already mostly decoupled. So I think the 
options (aside from bumping the size of the message) drift to something like: 
check on the server side whether the monolithic cluster status is too big (over 
a defined limit); if it is, return it with empty load and set a flag indicating 
that the message is partially constructed. That way the call does not fail at 
the transport level, and the client knows to use a separate call to request 
server/region load for the list of RSs it's interested in.

In other words, I guess I see 2 basic options:
 - bump the size of the message in this jira (trivial patch)
 - leave the current ClusterStatus format as is for compatibility, but add 
handling to return an empty LiveServerInfo list if it's coming up too big, and 
add a new rpc call to retrieve the LiveServerInfo entries for a list (range?) of 
region servers. 
Here's where RS groups would be handy. What do you think?
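
To make the second option concrete, here is a rough sketch of the control flow 
only. It is not HBase code: ClusterStatusSketch, ServerLoad, Status, the partial 
flag, buildStatus and getServerLoads are all hypothetical names, and the byte 
estimate stands in for whatever size check the server would really do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "return partial status if too big, fetch load separately".
final class ClusterStatusSketch {

    /** Stand-in for one LiveServerInfo entry; estimatedBytes is an assumed size estimate. */
    static final class ServerLoad {
        final String serverName;
        final long estimatedBytes;

        ServerLoad(String serverName, long estimatedBytes) {
            this.serverName = serverName;
            this.estimatedBytes = estimatedBytes;
        }
    }

    /** Stand-in for ClusterStatus; 'partial' is the proposed flag, not an existing field. */
    static final class Status {
        final List<ServerLoad> liveServers;
        final boolean partial;

        Status(List<ServerLoad> liveServers, boolean partial) {
            this.liveServers = liveServers;
            this.partial = partial;
        }
    }

    /** Server side: drop the load list and set the flag when the total is over the limit. */
    static Status buildStatus(Map<String, ServerLoad> all, long maxBytes) {
        long total = 0;
        for (ServerLoad load : all.values()) {
            total += load.estimatedBytes;
        }
        if (total > maxBytes) {
            return new Status(new ArrayList<ServerLoad>(), true);
        }
        return new Status(new ArrayList<ServerLoad>(all.values()), false);
    }

    /** The proposed separate call: load for only the servers the client asks about. */
    static List<ServerLoad> getServerLoads(Map<String, ServerLoad> all, List<String> names) {
        List<ServerLoad> result = new ArrayList<ServerLoad>();
        for (String name : names) {
            ServerLoad load = all.get(name);
            if (load != null) {
                result.add(load); // unknown server names are skipped
            }
        }
        return result;
    }
}
```

A client would call buildStatus first and, only if partial is set, follow up with 
getServerLoads for batches of region servers (for example one RS group at a 
time), so that no single response has to carry the load of every server.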

bq. Seems like its possible to hook up as src for D3 graphing
Hmm, that's something different, drawing metrics in the UI?

> ClusterStatus is too bulky 
> ---------------------------
>
>                 Key: HBASE-11747
>                 URL: https://issues.apache.org/jira/browse/HBASE-11747
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Virag Kothari
>         Attachments: exceptiontrace
>
>
> Following exception on 0.98 with 1M regions on cluster with 160 region servers
> {code}
> Caused by: java.io.IOException: Call to regionserverhost:port failed on local 
> exception: com.google.protobuf.InvalidProtocolBufferException: Protocol 
> message was too large.  May be malicious.  Use 
> CodedInputStream.setSizeLimit() to increase the size limit.
>       at 
> org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1482)
>       at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1454)
>       at 
> org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1654)
>       at 
> org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1712)
>       at 
> org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.getClusterStatus(MasterProtos.java:42555)
>       at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$5.getClusterStatus(HConnectionManager.java:2132)
>       at 
> org.apache.hadoop.hbase.client.HBaseAdmin$16.call(HBaseAdmin.java:2166)
>       at 
> org.apache.hadoop.hbase.client.HBaseAdmin$16.call(HBaseAdmin.java:2162)
>       at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114)
>       ... 43 more
> Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol 
> message was too large.  May be malicious.  Use 
> CodedInputStream.setSizeLimit() to increase the size limit.
>       at 
> com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
