[ https://issues.apache.org/jira/browse/KUDU-2263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Grant Henke updated KUDU-2263:
------------------------------
    Target Version/s: 1.8.0  (was: 1.7.0)

> Consider removing PB descriptors from PBC header
> ------------------------------------------------
>
>                 Key: KUDU-2263
>                 URL: https://issues.apache.org/jira/browse/KUDU-2263
>             Project: Kudu
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 1.7.0
>            Reporter: Todd Lipcon
>            Priority: Major
>
> Looking at a cmeta file on disk, it seems the vast majority of the bytes are
> in the supplemental header. We currently serialize the entire descriptor set
> of the referenced file and its dependencies. This means that each cmeta file
> ends up containing even things like the definition of SchemaPB, which is not
> needed to describe the type at hand and is quite large.
>
> At a minimum, we can prune the serialized descriptors to only those that are
> transitively referenced by the PB type in the file. I think we should also
> consider doing away with this information entirely and instead allow
> 'kudu pbc dump' to take a descriptor set as external input; it's easy enough
> to generate a descriptor set from the source tree of any Kudu version using
> the protoc command line.
> One potentially major improvement: if we can get these files under 4 KB, we
> could rewrite them atomically in a single disk I/O using O_DIRECT rather than
> doing a rewrite-rename-fsync dance.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
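A minimal sketch of the "prune to transitively referenced descriptors" idea from the description, assuming a simplified model in which each message type maps to the message types its fields reference. This dictionary stands in for real protobuf descriptors, and the Example* type names are hypothetical (not Kudu's actual schema). A breadth-first walk from the root type collects the transitive closure; everything else, such as the SchemaPB definition mentioned in the description, is dropped.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical model: message type name -> message types its fields reference.
# This stands in for real protobuf descriptors; it is NOT Kudu's actual schema.
TypeGraph = Dict[str, List[str]]


def transitively_referenced(graph: TypeGraph, root: str) -> Set[str]:
    """Return root plus every message type reachable from it via field references."""
    seen = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for ref in graph.get(current, []):
            # The seen set also prevents infinite loops on recursive messages.
            if ref not in seen:
                seen.add(ref)
                queue.append(ref)
    return seen


def prune(graph: TypeGraph, root: str) -> TypeGraph:
    """Keep only the descriptors that root actually needs, dropping the rest."""
    keep = transitively_referenced(graph, root)
    return {name: refs for name, refs in graph.items() if name in keep}


if __name__ == "__main__":
    # Illustrative graph: SchemaPB is available in the file's dependencies,
    # but the root type never references it.
    graph: TypeGraph = {
        "ExampleMetadataPB": ["ExampleConfigPB"],
        "ExampleConfigPB": ["ExamplePeerPB"],
        "ExamplePeerPB": ["ExampleHostPortPB"],
        "ExampleHostPortPB": [],
        "SchemaPB": ["ColumnSchemaPB"],
        "ColumnSchemaPB": [],
    }
    pruned = prune(graph, "ExampleMetadataPB")
    # Prints: ['ExampleConfigPB', 'ExampleHostPortPB', 'ExampleMetadataPB', 'ExamplePeerPB']
    print(sorted(pruned))
```

Because the kept set is closed under references, every type listed in a pruned entry is itself present in the pruned result. A real implementation working on protobuf descriptors would likely also need to follow enum types and nested types, not just message-typed fields.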