[ https://issues.apache.org/jira/browse/KUDU-3016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17015523#comment-17015523 ]
ASF subversion and git services commented on KUDU-3016:
-------------------------------------------------------

Commit 3a30cc22684512e3f2934cde57bc10bebf7dd156 in kudu's branch refs/heads/master from Alexey Serbin
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=3a30cc2 ]

[master] KUDU-3016 flag for chunking tablet report updates

This patch introduces a flag to chunk updates on the system tablet
generated by master while processing tablet reports.

When the flag is set to 'true' (that's the default setting), masters
chunk the updates on the system tablets which otherwise would be
oversized. When the flag is set to 'false', masters reject tablet
reports which would lead to the oversized write requests on the system
tablet. With either setting, masters avoid hitting the maximum RPC size
limit while pushing corresponding Raft updates on the system tablet to
follower masters.

A test is added to reproduce the scenario described in KUDU-3016.
In the test scenario, the average size of incoming TSHeartbeat RPC is
about 28 KB in size, while the corresponding WriteRequestPB is over
70 KB in size.
Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
Reviewed-on: http://gerrit.cloudera.org:8080/14897
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <a...@cloudera.com>


> Catalog manager: don't lump together all updates from one tablet report
> -----------------------------------------------------------------------
>
>                 Key: KUDU-3016
>                 URL: https://issues.apache.org/jira/browse/KUDU-3016
>             Project: Kudu
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 1.6.0, 1.7.0, 1.8.0, 1.7.1, 1.9.0, 1.10.0, 1.10.1,
>                      1.11.0, 1.11.1
>            Reporter: Alexey Serbin
>            Assignee: Alexey Serbin
>            Priority: Major
>              Labels: Availability, scalability
>
> With current structure of the system tablet for rows storing metadata
> information on tablets, the catalog manager can create a very large write
> operation on the system tablet when processing full tablet reports sent from
> tablet servers. At some point (depends on the {{\-\-rpc_max_message_size}}
> setting), a tablet report received from a tablet server comes through, but
> its Raft counterpart for the system tablet update doesn't because it might be
> almost two times larger. If that happens, Kudu cluster becomes almost
> non-functional because of self-perpetuating
> accepted-huge-tablet-report-but-cannot-push-Raft-update-to-follower-masters
> pattern.
> The catalog manager should not lump together updates on all tablets received
> from one tablet server:
> https://github.com/apache/kudu/blob/3175c35c7d721aef0c4c6b358cc3b422089c1ba7/src/kudu/master/catalog_manager.cc#L4268-L4274



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
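
The two behaviors described in the commit message (chunk oversized system
tablet updates, or reject the tablet report that would produce them) can be
illustrated with a minimal sketch. This is not the code from the patch: the
TabletUpdate struct, the PlanSysTabletWrites function, and the
chunking_enabled and max_batch_bytes parameters are all hypothetical names
chosen for illustration, and the real catalog manager works with protobuf
messages rather than plain structs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one per-tablet update destined for the
// system tablet. Not a Kudu type.
struct TabletUpdate {
  std::string tablet_id;
  std::size_t serialized_size;  // bytes this update adds to a write request
};

// Plans the writes for the updates produced by one tablet report.
//
// On success, fills 'batches' so that the total size of each batch is at
// most 'max_batch_bytes', and returns true.
//
// Returns false (the report would be rejected) when:
//   * a single update alone exceeds 'max_batch_bytes', or
//   * 'chunking_enabled' is false and the updates do not fit in one batch.
bool PlanSysTabletWrites(const std::vector<TabletUpdate>& updates,
                         std::size_t max_batch_bytes,
                         bool chunking_enabled,
                         std::vector<std::vector<TabletUpdate>>* batches) {
  assert(batches != nullptr);
  batches->clear();

  std::vector<TabletUpdate> current;
  std::size_t current_bytes = 0;
  for (const TabletUpdate& u : updates) {
    if (u.serialized_size > max_batch_bytes) {
      // No amount of chunking makes this update fit.
      batches->clear();
      return false;
    }
    if (current_bytes + u.serialized_size > max_batch_bytes) {
      if (!chunking_enabled) {
        // Everything would have to go into one oversized write: reject.
        batches->clear();
        return false;
      }
      // Close the current batch and start a new one.
      batches->push_back(std::move(current));
      current.clear();
      current_bytes = 0;
    }
    current.push_back(u);
    current_bytes += u.serialized_size;
  }
  if (!current.empty()) {
    batches->push_back(std::move(current));
  }
  return true;
}
```

In this sketch each batch would become a separate write on the system
tablet, so every corresponding Raft update stays under the size limit. With
chunking disabled, the function refuses the whole report up front instead of
accepting it and then failing to replicate the resulting write.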