[
https://issues.apache.org/jira/browse/HBASE-11747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611051#comment-14611051
]
Mikhail Antonov commented on HBASE-11747:
-----------------------------------------
[~stack]
bq. For #2, was looking at exporting jmx so say the Master could read cluster
metrics instead of getting metrics recast and served on the heartbeat
Did you mean rpc, not jmx? I briefly looked at where it's actually used, and
unless I'm missing something, we don't really use it in any heartbeats. Cluster
status is used:
- for subscribers on (multicast) publishing (that's the only push as far as I
can tell?)
- in the separate MasterRpcServices#GetClusterStatus rpc call and accordingly in
the Admin interface wrapping it (which is in the log posted in the jira)
- in REST messages
For regular heartbeats we just use the MRS#regionServerReport rpc call, which
only pushes the RS server name and load (including region load) to the master.
So as far as I can tell, those are already mostly decoupled. So I think the
options (aside from bumping the size of the message) drift to something like
"check on the server side whether the monolithic cluster status is too big
(over a defined limit), and in that case return it with empty load, setting a
flag indicating that the message is only partially constructed, so that it
does not fail at the transport level, and so that the client knows to use a
separate call to request server/region load for the list of RSs it's
interested in?"
In other words, I guess I see 2 basic options:
- bump the size of message in this jira (trivial patch)
- leave the current ClusterStatus format as is for compatibility, but add
handling to return an empty LiveServerInfo list if it's coming up too big, and
add a new rpc call to retrieve the LiveServerInfo for a list (range?) of region
servers.
Here's where RS groups would be handy. What do you think?
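To make the second option a bit more concrete, here is a rough sketch in plain
Java. Everything in it is hypothetical: ClusterStatusSketch, ServerLoadEntry,
StatusResponse, buildStatus, getServerLoads and the loadOmitted flag are
stand-ins I made up for illustration, not existing HBase classes or rpcs, and
the size check uses an assumed per-server byte estimate rather than real
protobuf serialized sizes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch only: none of these types or methods exist in HBase.
final class ClusterStatusSketch {

    // Stand-in for one LiveServerInfo entry (server name plus its load).
    static final class ServerLoadEntry {
        final String serverName;
        final long estimatedBytes; // assumed estimate of the serialized load size

        ServerLoadEntry(String serverName, long estimatedBytes) {
            this.serverName = serverName;
            this.estimatedBytes = estimatedBytes;
        }
    }

    // Stand-in for the ClusterStatus response, with the proposed "partial" flag.
    static final class StatusResponse {
        final List<ServerLoadEntry> liveServers;
        final boolean loadOmitted;

        StatusResponse(List<ServerLoadEntry> liveServers, boolean loadOmitted) {
            this.liveServers = liveServers;
            this.loadOmitted = loadOmitted;
        }
    }

    // Server side: return the full status if it fits under the limit,
    // otherwise return an empty load list and set the flag.
    static StatusResponse buildStatus(List<ServerLoadEntry> all, long maxBytes) {
        long total = 0;
        for (ServerLoadEntry e : all) {
            total += e.estimatedBytes;
        }
        if (total > maxBytes) {
            return new StatusResponse(Collections.<ServerLoadEntry>emptyList(), true);
        }
        return new StatusResponse(new ArrayList<>(all), false);
    }

    // The new call: load for only the region servers the client names.
    static List<ServerLoadEntry> getServerLoads(List<ServerLoadEntry> all,
                                                List<String> wanted) {
        List<ServerLoadEntry> result = new ArrayList<>();
        for (ServerLoadEntry e : all) {
            if (wanted.contains(e.serverName)) {
                result.add(e);
            }
        }
        return result;
    }
}
```

A client would check loadOmitted on the response and, if it is set, follow up
with getServerLoads for just the region servers it cares about (for example,
the members of one RS group). Old clients that ignore the flag would still get
a parseable, if incomplete, ClusterStatus.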
bq. Seems like its possible to hook up as src for D3 graphing
Hmm, that's something different, drawing metrics in the UI?
> ClusterStatus is too bulky
> ---------------------------
>
> Key: HBASE-11747
> URL: https://issues.apache.org/jira/browse/HBASE-11747
> Project: HBase
> Issue Type: Sub-task
> Reporter: Virag Kothari
> Attachments: exceptiontrace
>
>
> Following exception on 0.98 with 1M regions on cluster with 160 region servers
> {code}
> Caused by: java.io.IOException: Call to regionserverhost:port failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit.
>     at org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1482)
>     at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1454)
>     at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1654)
>     at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1712)
>     at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.getClusterStatus(MasterProtos.java:42555)
>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$5.getClusterStatus(HConnectionManager.java:2132)
>     at org.apache.hadoop.hbase.client.HBaseAdmin$16.call(HBaseAdmin.java:2166)
>     at org.apache.hadoop.hbase.client.HBaseAdmin$16.call(HBaseAdmin.java:2162)
>     at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114)
>     ... 43 more
> Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit.
>     at com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110)
> {code}