Hello Adar Dembo, Kudu Jenkins, I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/6636 to look at the new patch set (#43). Change subject: KUDU-1952 Remove round-robin for block placement ...................................................................... KUDU-1952 Remove round-robin for block placement This is the first of a multi-patch patchset to mitigate the effects of single-disk failure. Throughout the code, the term "DataDir" refers to a data directory, which is often mounted on a distinct disk. Thus, "disks" and "data directories" will be used interchangeably. This patch adds a mapping from tablet to a set of disks and uses it to replace the existing round-robin placement of blocks. Tablets are mapped to a fixed number of disks (i.e. a DataDirGroup). New blocks are placed randomly in directories within each tablet's DataDirGroup. Tablet-to-group mappings are generated and stored as metadata upon tablet creation, or upon tablet replacement during a tablet copy. During group creation, disks are added to groups by randomly selecting two available directories and selecting the one with fewer tablets on it ("Power of Two Random Choices"). This avoids pigeonholing new tablets to disks with relatively few tablets, while still trending towards filling underloaded disks. Groups are maintained when restarting the server, as they are flushed with metadata, and are deleted upon tablet deletion. When loading tablet data from a previous version of Kudu, the tablet's metadata will not have a DataDirGroup. One will be generated containing all data directories, as the tablet's data may already be spread across any number of disks. As this patch only addresses block placement, it does not itself mitigate the effects of single-disk failure. Given this, and given the tradeoff between I/O and disk-failure tolerance, the default behavior will be to spread tablet data across all disks. Testing is done at the block manager level in block_manager-test and log_block_manager-test, as well as in the new data_dirs-test. The implementation of block placement is compared against a python implementation found here: https://gist.github.com/andrwng/7c24e8e26aec68c50741f92eb6f2e48d Sweeping over a few parameters shows nearly identical stddev values of the distribution of tablets across directories between implementations. https://github.com/andrwng/kudu/blob/po2c/docs/images/10_20_3_10k_cpp.png https://github.com/andrwng/kudu/blob/po2c/docs/images/10_20_3_10k_py.png https://github.com/andrwng/kudu/blob/po2c/docs/images/30_10_5_5k_cpp.png https://github.com/andrwng/kudu/blob/po2c/docs/images/30_10_5_5k_py.png https://github.com/andrwng/kudu/blob/po2c/docs/images/30_200_5_5k_cpp.png https://github.com/andrwng/kudu/blob/po2c/docs/images/30_200_5_5k_py.png A design doc can be found here: https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b --- M src/kudu/cfile/bloomfile-test-base.h M src/kudu/cfile/cfile-test-base.h M src/kudu/cfile/cfile-test.cc M src/kudu/fs/CMakeLists.txt M src/kudu/fs/block_manager-stress-test.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.h A src/kudu/fs/data_dirs-test.cc M src/kudu/fs/data_dirs.cc M src/kudu/fs/data_dirs.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/fs.proto M src/kudu/fs/fs_manager-test.cc M src/kudu/fs/fs_manager.cc M src/kudu/fs/fs_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/tablet/delta_compaction-test.cc M src/kudu/tablet/delta_compaction.cc M src/kudu/tablet/delta_compaction.h M src/kudu/tablet/delta_tracker.cc M src/kudu/tablet/deltafile-test.cc M src/kudu/tablet/deltamemstore-test.cc M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/metadata.proto M src/kudu/tablet/multi_column_writer.cc M src/kudu/tablet/multi_column_writer.h M src/kudu/tablet/tablet_bootstrap-test.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tools/kudu-tool-test.cc M src/kudu/tserver/tablet_copy_client-test.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_copy_client.h M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/util/pb_util.proto 39 files changed, 1,057 insertions(+), 201 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/6636/43 -- To view, visit http://gerrit.cloudera.org:8080/6636 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9828147f4fa5c4d7f6ed23441dca5a116b8cb11b Gerrit-PatchSet: 43 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: David Ribeiro Alves <davidral...@gmail.com> Gerrit-Reviewer: Jean-Daniel Cryans <jdcry...@apache.org> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <t...@apache.org>