[ https://issues.apache.org/jira/browse/KUDU-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Grant Henke reassigned KUDU-1755:
---------------------------------

    Assignee: Grant Henke

> Improve tablet disk space estimation
> ------------------------------------
>
>                 Key: KUDU-1755
>                 URL: https://issues.apache.org/jira/browse/KUDU-1755
>             Project: Kudu
>          Issue Type: Bug
>          Components: supportability, tablet
>    Affects Versions: 1.1.0
>            Reporter: Adar Dembo
>            Assignee: Grant Henke
>            Priority: Critical
>             Fix For: 1.3.0
>
>
> (Prompted by [this user post|http://mail-archives.apache.org/mod_mbox/kudu-user/201611.mbox/%3Ctencent_201BBF963FB5CB2D7AF99E25%40qq.com%3E])
> The on-disk size of tablets as reported by the Kudu web UI omits some minor
> as well as some major sources of space consumption. I'm listing them all here
> for posterity.
> # Bloom file and composite index file usage. According to [this gerrit|https://gerrit.sjc.cloudera.com/#/c/6070/]
> (warning: internal link), it's because we also use the rowset estimate to
> determine how much IO will be generated were we to compact that rowset, and
> bloom/composite index files aren't touched in compaction.
> # UNDO file usage. This seems like a more glaring omission, especially for
> mutation-heavy workloads like the one reported in the mailing list. But, the
> current REDO-only estimate factors into major delta compaction decision
> making by the maintenance manager, so maybe there's a good reason there too.
> # Log block manager block size rounding. The LBM rounds up Kudu blocks to the
> nearest filesystem block size to improve hole punching space reclamation. A
> side effect is that some space is lost to external fragmentation.
> # Log block manager metadata overhead. Every container has a .metadata file,
> and we don't factor that into space utilization.
> # Other files, such as the tablet superblock, WAL segments, and cmeta.
> I expect the first two items to be the largest, so we should work on
> addressing them.
> Let's decouple the UI-based estimate from the MM path so our
> reporting can be more accurate while still allowing the MM to make good
> decisions.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
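The decoupling proposed in the issue can be sketched as two separate functions over the same per-tablet file inventory. This is a minimal Python sketch under stated assumptions, not Kudu's implementation (Kudu itself is C++). The `Block`, `RowSet`, and `Tablet` types, the block category names, and the 4 KiB default filesystem block size are all hypothetical. Following the issue's description of current behavior, the maintenance-manager estimate counts only base data and REDO deltas. The reported on-disk size counts every block rounded up to the filesystem block size (item 3), plus container `.metadata` files (item 4) and other tablet files (item 5). For simplicity, non-block files are counted at their raw size.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical category names; these are illustrative, not Kudu identifiers.
# Per the issue, only these are rewritten by compaction (bloom/index are not,
# and the current delta-compaction estimate is REDO-only).
COMPACTION_INPUT_KINDS = {"base_data", "redo"}


@dataclass
class Block:
    kind: str        # e.g. "base_data", "redo", "undo", "bloom", "adhoc_index"
    size_bytes: int  # logical bytes written for this block


@dataclass
class RowSet:
    blocks: List[Block] = field(default_factory=list)


@dataclass
class Tablet:
    rowsets: List[RowSet] = field(default_factory=list)
    container_metadata_bytes: int = 0  # LBM container .metadata files
    other_file_bytes: int = 0          # superblock, WAL segments, cmeta


def round_up(size: int, fs_block_size: int) -> int:
    """Round size up to the next multiple of fs_block_size (ceiling division)."""
    return -(-size // fs_block_size) * fs_block_size


def compaction_io_estimate(rowset: RowSet) -> int:
    """MM view: only the bytes a compaction of this rowset would touch."""
    return sum(b.size_bytes for b in rowset.blocks
               if b.kind in COMPACTION_INPUT_KINDS)


def reported_on_disk_size(tablet: Tablet, fs_block_size: int = 4096) -> int:
    """UI view: every block as laid out by the LBM, plus non-block files."""
    block_bytes = sum(
        round_up(b.size_bytes, fs_block_size)
        for rs in tablet.rowsets
        for b in rs.blocks
    )
    return block_bytes + tablet.container_metadata_bytes + tablet.other_file_bytes


if __name__ == "__main__":
    rs = RowSet([
        Block("base_data", 10_000),
        Block("redo", 1_000),
        Block("undo", 50_000),
        Block("bloom", 3_000),
        Block("adhoc_index", 500),
    ])
    tablet = Tablet([rs], container_metadata_bytes=2_048, other_file_bytes=70_000)
    print(compaction_io_estimate(rs))     # 11000
    print(reported_on_disk_size(tablet))  # 149872
```

Keeping the two functions separate leaves the maintenance manager's inputs (and therefore its compaction decisions) unchanged, while the reported size grows to include UNDO, bloom, and index blocks. For each block, the difference between `round_up(size, fs_block_size)` and `size` is the rounding loss described in item 3.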