[ https://issues.apache.org/jira/browse/KUDU-3016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexey Serbin updated KUDU-3016:
--------------------------------
    Fix Version/s: 1.12.0
       Resolution: Fixed
           Status: Resolved  (was: In Review)

As for possible performance regressions w.r.t. the changes introduced in KUDU-1125 and the fix for this issue: the chunk size is up to the maximum possible RPC size, so we shouldn't expect any performance regressions there.

> Catalog manager: don't lump together all updates from one tablet report
> -----------------------------------------------------------------------
>
>                 Key: KUDU-3016
>                 URL: https://issues.apache.org/jira/browse/KUDU-3016
>             Project: Kudu
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 1.6.0, 1.7.0, 1.8.0, 1.7.1, 1.9.0, 1.10.0, 1.10.1,
>                      1.11.0, 1.11.1
>            Reporter: Alexey Serbin
>            Assignee: Alexey Serbin
>            Priority: Major
>              Labels: Availability, scalability
>             Fix For: 1.12.0
>
>
> With the current structure of the system tablet for rows storing metadata
> information on tablets, the catalog manager can create a very large write
> operation on the system tablet when processing full tablet reports sent from
> tablet servers. At some point (depending on the {{--rpc_max_message_size}}
> setting), a tablet report received from a tablet server comes through, but
> its Raft counterpart for the system tablet update doesn't, because it can be
> almost twice as large. If that happens, the Kudu cluster becomes almost
> non-functional because of a self-perpetuating
> accepted-huge-tablet-report-but-cannot-push-Raft-update-to-follower-masters
> pattern.
> The catalog manager should not lump together updates on all tablets received
> from one tablet server:
> https://github.com/apache/kudu/blob/3175c35c7d721aef0c4c6b358cc3b422089c1ba7/src/kudu/master/catalog_manager.cc#L4268-L4274
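
For illustration only, the idea of not lumping all updates from one tablet report into a single write can be sketched as size-bounded batching. This is not Kudu's actual implementation (the catalog manager is written in C++); `chunk_updates` and `max_batch_bytes` are hypothetical names, and the sketch assumes each per-tablet update is already serialized and ignores any per-message framing overhead.

```python
from typing import Iterable, Iterator, List


def chunk_updates(updates: Iterable[bytes],
                  max_batch_bytes: int) -> Iterator[List[bytes]]:
    """Group serialized per-tablet updates into size-bounded batches.

    Hypothetical helper: each yielded batch has a total payload size of
    at most max_batch_bytes, so it could be sent as one bounded write.
    """
    if max_batch_bytes <= 0:
        raise ValueError("max_batch_bytes must be positive")

    batch: List[bytes] = []
    batch_bytes = 0
    for update in updates:
        size = len(update)
        if size > max_batch_bytes:
            # A single oversized update cannot fit in any batch.
            raise ValueError("single update exceeds the batch size limit")
        # Flush the current batch before this update would overflow it.
        if batch and batch_bytes + size > max_batch_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(update)
        batch_bytes += size
    if batch:
        yield batch


if __name__ == "__main__":
    # Four updates of 40, 40, 40 and 10 bytes with a 100-byte limit.
    report = [b"a" * 40, b"b" * 40, b"c" * 40, b"d" * 10]
    for i, b in enumerate(chunk_updates(report, max_batch_bytes=100)):
        print(i, len(b), sum(len(u) for u in b))
```

Because each batch is filled up to the limit before a new one is started, the number of writes grows only when a report would not fit into a single message, which is in line with the comment above that no performance regression is expected.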