[ https://issues.apache.org/jira/browse/KUDU-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16012887#comment-16012887 ]
Todd Lipcon commented on KUDU-2014:
-----------------------------------

Another potentially easy win is to actually start more than one
metadata-loading thread per disk. This seems to improve startup time by ~30%:

{code}
[root@vd0340 data]# echo 3 | sudo tee /proc/sys/vm/drop_caches
3
[root@vd0340 data]# time ls *metadata | xargs -P20 -n1 cat > /dev/null

real    0m29.313s
user    0m0.124s
sys     0m0.607s
[root@vd0340 data]# echo 3 | sudo tee /proc/sys/vm/drop_caches
3
[root@vd0340 data]# time ls *metadata | xargs -n1 cat > /dev/null

real    0m42.676s
user    0m0.273s
sys     0m1.153s
{code}

(I guess the improvement is just getting more queue depth to the underlying
disk.)

> Explore additional approaches to improve LBM startup time
> ----------------------------------------------------------
>
>                 Key: KUDU-2014
>                 URL: https://issues.apache.org/jira/browse/KUDU-2014
>             Project: Kudu
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 1.4.0
>            Reporter: Adar Dembo
>              Labels: data-scalability
>
> The fix for KUDU-1549 added support for deleting full log block manager
> containers with no live blocks, and for compacting container metadata to
> omit CREATE/DELETE record pairs. Both of these will help reduce the amount
> of metadata that must be read at startup. However, there's more we can do
> to help; this JIRA captures some additional ideas worth exploring (if/when
> LBM startup once again becomes intolerable):
> In [this
> gerrit|https://gerrit.cloudera.org/#/c/6826/2/src/kudu/fs/log_block_manager.cc@90],
> Todd made the case that container metadata processing is seek-dominated:
> {quote}
> Looking at a data/ dir on a cluster that has been around for quite some
> time, most of the metadata files seem to be around 400KB. Assuming
> 100MB/sec sequential throughput and a 10ms seek, it definitely seems like
> startup time would be seek-dominated (10 or 20ms of seeking, depending on
> whether various internal metadata pages are hot in cache, plus only 4ms of
> sequential read time).
> {quote}
> We theorized several ways to reduce seeking, all focused on reducing the
> number of discrete container metadata files read at startup:
> # Raise the container max data file size. This won't help on older
> versions of el6 with ext4, but will help everywhere else. It makes sense
> for the max data file size to be a function of the disk size anyway, and
> it's a pretty cheap way to extract more scalability. (A rough sizing
> sketch follows at the end of this message.)
> # Reuse container data file holes, explicitly to avoid creating so many
> containers, perhaps with a round of "defragmentation" to simplify reuse,
> or perhaps not. As a side effect, metadata file compaction becomes more
> important (and more costly).
> # Eschew one metadata file per data file altogether and maintain just one
> metadata file. Deleting "dead" containers would no longer improve metadata
> startup cost, and metadata compaction would be a lot more expensive. Block
> records themselves would be larger, because each record would now need to
> point to a particular data file, though this can be mitigated in various
> ways. A variant of this would be to do away with the 1-1 relationship
> between metadata and data files and make it m-n instead.
> # Reduce the number of extents in container metadata files via judicious
> preallocation.
> See the gerrit linked above for more details.
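
Riffing on the comment above: below is a minimal sketch, not Kudu's actual
implementation, of what starting several metadata-loading threads per data
directory could look like. The *.metadata extension matching and the
ParseContainerMetadata() helper are hypothetical stand-ins for the real
container-loading logic; the point is only that a handful of concurrent
readers keeps more I/O requests queued against the underlying disk.

{code}
// Sketch only (assumed helpers, not Kudu's real code): read every container
// metadata file under one data directory with a small pool of reader
// threads, so the disk sees more than one outstanding request at a time.
#include <filesystem>
#include <fstream>
#include <mutex>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical stand-in for replaying the CREATE/DELETE block records.
void ParseContainerMetadata(const std::string& contents) {
  // ... build the in-memory block map from the records ...
}

void LoadAllContainerMetadata(const fs::path& data_dir, int threads_per_disk) {
  // Collect the *.metadata files up front.
  std::vector<fs::path> files;
  for (const auto& entry : fs::directory_iterator(data_dir)) {
    if (entry.path().extension() == ".metadata") {
      files.push_back(entry.path());
    }
  }

  // Hand the files out to a fixed number of reader threads.
  std::mutex mu;
  size_t next = 0;
  std::vector<std::thread> readers;
  for (int i = 0; i < threads_per_disk; ++i) {
    readers.emplace_back([&] {
      while (true) {
        size_t idx;
        {
          std::lock_guard<std::mutex> l(mu);
          if (next == files.size()) return;
          idx = next++;
        }
        std::ifstream in(files[idx], std::ios::binary);
        std::ostringstream buf;
        buf << in.rdbuf();
        ParseContainerMetadata(buf.str());
      }
    });
  }
  for (auto& t : readers) t.join();
}
{code}

The right threads_per_disk presumably depends on the device and on how many
containers each data dir holds; it would want to be a flag rather than a
hard-coded constant.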
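
And for idea #1 in the quoted description, the rough sizing sketch mentioned
there: a back-of-the-envelope way to make the container max data file size a
function of the disk size. The policy, the 1% target, and the 10-100 GiB
bounds are all placeholder assumptions, not an existing Kudu flag or default.

{code}
// Hypothetical sizing policy: cap container data files at roughly 1% of the
// disk backing the data directory, clamped to a sane range, so that larger
// disks end up with fewer, larger containers (and fewer metadata files).
#include <algorithm>
#include <cstdint>

int64_t SuggestedContainerMaxSize(int64_t disk_capacity_bytes) {
  constexpr int64_t kFloorBytes   = 10LL * 1024 * 1024 * 1024;   // 10 GiB
  constexpr int64_t kCeilingBytes = 100LL * 1024 * 1024 * 1024;  // 100 GiB
  // Target on the order of ~100 containers per disk when the disk is full.
  return std::clamp(disk_capacity_bytes / 100, kFloorBytes, kCeilingBytes);
}
{code}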