[ https://issues.apache.org/jira/browse/HBASE-12141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156857#comment-14156857 ]
Andrew Purtell commented on HBASE-12141:
----------------------------------------

See http://mail-archives.apache.org/mod_mbox/hbase-user/201410.mbox/%3C3256288.x8cyWY5ZEW%40localhost.localdomain%3E . The network configuration is "interesting".

> ClusterStatus message might exceed max datagram payload limits
> --------------------------------------------------------------
>
>                 Key: HBASE-12141
>                 URL: https://issues.apache.org/jira/browse/HBASE-12141
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.3
>            Reporter: Andrew Purtell
>
> The multicast ClusterStatusPublisher and its companion listener are using
> datagram channels without any framing. I think this is an issue because
> Netty's ProtobufDecoder expects a complete PB message to be available in the
> ChannelBuffer, yet ClusterStatus messages can be large and might exceed the
> maximum datagram payload size. As one user reported on list:
> {noformat}
> org.apache.hadoop.hbase.client.ClusterStatusListener - ERROR - Unexpected exception, continuing.
> com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had invalid wire type.
> 	at com.google.protobuf.InvalidProtocolBufferException.invalidWireType(InvalidProtocolBufferException.java:99)
> 	at com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:498)
> 	at com.google.protobuf.GeneratedMessage.parseUnknownField(GeneratedMessage.java:193)
> 	at org.apache.hadoop.hbase.protobuf.generated.ClusterStatusProtos$ClusterStatus.<init>(ClusterStatusProtos.java:7554)
> 	at org.apache.hadoop.hbase.protobuf.generated.ClusterStatusProtos$ClusterStatus.<init>(ClusterStatusProtos.java:7512)
> 	at org.apache.hadoop.hbase.protobuf.generated.ClusterStatusProtos$ClusterStatus$1.parsePartialFrom(ClusterStatusProtos.java:7689)
> 	at org.apache.hadoop.hbase.protobuf.generated.ClusterStatusProtos$ClusterStatus$1.parsePartialFrom(ClusterStatusProtos.java:7684)
> 	at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:141)
> 	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:176)
> 	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:182)
> 	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
> 	at org.jboss.netty.handler.codec.protobuf.ProtobufDecoder.decode(ProtobufDecoder.java:122)
> 	at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:66)
> 	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
> 	at org.jboss.netty.channel.socket.oio.OioDatagramWorker.process(OioDatagramWorker.java:52)
> 	at org.jboss.netty.channel.socket.oio.AbstractOioWorker.run(AbstractOioWorker.java:73)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> {noformat}
> The javadoc for ProtobufDecoder says:
> {quote}
> Decodes a received ChannelBuffer into a Google Protocol Buffers Message and MessageLite.
> Please note that this decoder must be used with a proper FrameDecoder such as
> ProtobufVarint32FrameDecoder or LengthFieldBasedFrameDecoder if you are using
> a stream-based transport such as TCP/IP.
> {quote}
> and even though we are using a datagram transport we have related issues,
> depending on what the sending and receiving OS does with overly large
> datagrams:
> - We may receive a datagram with a truncated message
> - We may get an upcall when processing one fragment of a fragmented datagram,
>   where the complete message is not available yet
> - We may not be able to send the overly large ClusterStatus in the first
>   place. Linux claims to do PMTU and return EMSGSIZE if a datagram packet
>   payload exceeds the MTU, but will send a fragmented datagram if PMTU is
>   disabled. I'm surprised we have the above report given the default is to
>   reject overly large datagram payloads, so perhaps the user is using a
>   different server OS or Netty datagram channels do their own fragmentation
>   (I haven't checked).
> In any case, the server and client pipelines are definitely not doing any
> kind of framing. This is the multicast status listener from 0.98, for example:
> {code}
> b.setPipeline(Channels.pipeline(
>     new ProtobufDecoder(ClusterStatusProtos.ClusterStatus.getDefaultInstance()),
>     new ClusterStatusHandler()));
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
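As an editorial aside (not part of the original report): the sender-side size concern discussed above can be illustrated with a small standalone sketch. The header arithmetic below uses the standard IPv4/UDP limits; the guard class and its method names are hypothetical illustrations, not HBase's actual ClusterStatusPublisher code.

```java
// Hedged sketch: a sender-side guard against oversized datagram payloads.
// The constants are standard IPv4/UDP limits; the class itself is a
// hypothetical illustration, not HBase code.
public class DatagramPayloadGuard {
    // Absolute ceiling for a UDP payload over IPv4:
    // 65535 (max IP packet) - 20 (IPv4 header) - 8 (UDP header)
    static final int MAX_UDP_PAYLOAD = 65507;

    // True if the serialized message fits in a single (possibly
    // fragmented) IPv4 datagram at all.
    static boolean fitsInSingleDatagram(byte[] serialized) {
        return serialized.length <= MAX_UDP_PAYLOAD;
    }

    // True if the serialized message fits one link-layer frame of the
    // given MTU without IP fragmentation (MTU minus 20 + 8 header bytes).
    static boolean fitsWithoutFragmentation(byte[] serialized, int mtu) {
        return serialized.length <= mtu - 28;
    }

    public static void main(String[] args) {
        byte[] small = new byte[1400];   // e.g. a small ClusterStatus
        byte[] huge = new byte[70000];   // e.g. a very large ClusterStatus
        System.out.println(fitsInSingleDatagram(small));           // true
        System.out.println(fitsWithoutFragmentation(small, 1500)); // true
        System.out.println(fitsInSingleDatagram(huge));            // false
    }
}
```

A publisher using a check like this could drop or truncate region lists before serializing, rather than handing Netty a payload the OS may reject (EMSGSIZE) or fragment.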