[ 
https://issues.apache.org/jira/browse/KUDU-3016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17015523#comment-17015523
 ] 

ASF subversion and git services commented on KUDU-3016:
-------------------------------------------------------

Commit 3a30cc22684512e3f2934cde57bc10bebf7dd156 in kudu's branch 
refs/heads/master from Alexey Serbin
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=3a30cc2 ]

[master] KUDU-3016 flag for chunking tablet report updates

This patch introduces a flag to chunk the updates to the system tablet
that a master generates while processing tablet reports.  When the flag
is set to 'true' (the default), masters split updates to the system
tablet that would otherwise be oversized into smaller chunks.  When the
flag is set to 'false', masters reject tablet reports that would lead
to oversized write requests on the system tablet.  With either
setting, masters avoid hitting the maximum RPC size limit while pushing
the corresponding Raft updates on the system tablet to follower masters.

A test is added to reproduce the scenario described in KUDU-3016.
In the test scenario, the average incoming TSHeartbeat RPC is about
28 KB, while the corresponding WriteRequestPB is over 70 KB.

Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
Reviewed-on: http://gerrit.cloudera.org:8080/14897
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <a...@cloudera.com>
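
The two behaviors the commit message describes (chunk oversized updates
when the flag is 'true', reject the report when it is 'false') can be
illustrated with a minimal sketch. This is not Kudu's implementation: the
names plan_writes, max_batch_bytes, chunking_enabled and
OversizedReportError are hypothetical, each update is modeled as an opaque
byte string, and the size limit is assumed to apply to the summed payload
sizes only.

```python
from typing import List, Sequence


class OversizedReportError(Exception):
    """Raised when a report cannot be written within the size limit."""


def plan_writes(updates: Sequence[bytes],
                max_batch_bytes: int,
                chunking_enabled: bool = True) -> List[List[bytes]]:
    """Group per-tablet updates into write batches under a size limit.

    Hypothetical model: `updates` holds one serialized update per tablet,
    and a batch is acceptable if its payloads sum to <= max_batch_bytes.
    """
    total = sum(len(u) for u in updates)
    if total <= max_batch_bytes:
        # Everything fits into a single write; nothing to chunk or reject.
        return [list(updates)] if updates else []

    if not chunking_enabled:
        # Flag set to 'false': reject the whole report instead of
        # producing an oversized write.
        raise OversizedReportError(
            f"report needs {total} bytes, limit is {max_batch_bytes}")

    # Flag set to 'true': greedily pack updates into batches that each
    # stay within the limit, preserving the original order.
    batches: List[List[bytes]] = []
    current: List[bytes] = []
    current_bytes = 0
    for update in updates:
        if len(update) > max_batch_bytes:
            # A single update larger than the limit cannot be chunked here.
            raise OversizedReportError("single update exceeds the limit")
        if current and current_bytes + len(update) > max_batch_bytes:
            batches.append(current)
            current, current_bytes = [], 0
        current.append(update)
        current_bytes += len(update)
    if current:
        batches.append(current)
    return batches
```

With five 30-byte updates and a 70-byte limit, this sketch yields three
batches of two, two, and one update; with chunking disabled, the same
input is rejected instead.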


> Catalog manager: don't lump together all updates from one tablet report
> -----------------------------------------------------------------------
>
>                 Key: KUDU-3016
>                 URL: https://issues.apache.org/jira/browse/KUDU-3016
>             Project: Kudu
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 1.6.0, 1.7.0, 1.8.0, 1.7.1, 1.9.0, 1.10.0, 1.10.1, 
> 1.11.0, 1.11.1
>            Reporter: Alexey Serbin
>            Assignee: Alexey Serbin
>            Priority: Major
>              Labels: Availability, scalability
>
> With the current structure of the system tablet rows that store tablet 
> metadata, the catalog manager can create a very large write 
> operation on the system tablet when processing full tablet reports sent from 
> tablet servers.  At some point (depending on the --rpc_max_message_size 
> setting), a tablet report received from a tablet server comes through, but 
> its Raft counterpart for the system tablet update doesn't, because it might be 
> almost two times larger.  If that happens, the Kudu cluster becomes almost 
> non-functional because of a self-perpetuating 
> accepted-huge-tablet-report-but-cannot-push-Raft-update-to-follower-masters 
> pattern.
> The catalog manager should not lump together updates on all tablets received 
> from one tablet server:  
> https://github.com/apache/kudu/blob/3175c35c7d721aef0c4c6b358cc3b422089c1ba7/src/kudu/master/catalog_manager.cc#L4268-L4274



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
