Joe McDonnell has uploaded a new patch set (#14) to the change originally created by Zihao Ye. ( http://gerrit.cloudera.org:8080/19532 )
Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................

IMPALA-11904: Data cache support dumping for reloading

The data cache consists mainly of cache metadata and cache files. The cache files are located on disk and store the cached data content, while the cache metadata is held in memory and indexes into the cache files by cache key.

Before this patch, if the impalad process exits, the cache metadata is lost. After the impalad process restarts, the cache files cannot be reused even though they are still on disk, because there is no corresponding cache metadata to index them.

This patch implements dump and load functions for the data cache. When dump and load is enabled by setting 'data_cache_keep_across_restarts=true' and the impalad process is stopped by graceful shutdown (kill -SIGRTMIN $pid), the data cache collects the cache metadata and dumps it to the location where the cache directory is located. When the impalad process restarts, it tries to load the dumped files from disk to restore the original cache metadata, so that the existing cache files can be reused without refilling the cache.

The cache can be safely dumped during query execution because, before the dump starts, the data cache is set to read-only to prevent inconsistency between the metadata dump and the cache files.

Note that the dump files also use disk space. In testing, the size of the dump file was generally no more than 0.5% of the total size of the cache files.

Testing:
- Add DataCacheTest,#SetReadOnly
  Used to test whether set/revoke read-only takes effect, even when there are writes in progress.
- Add DataCacheTest,#DumpAndLoad
  Used to test whether the original cache contents can be read after a data cache dump and reload.
- Add DataCacheTest,#ChangeConfBeforeLoad
  Used to test whether the original cache contents can be read after the data cache is dumped, the configuration is changed, and the cache is then reloaded.
- Add end-to-end test in test_data_cache.py
  Performs end-to-end testing in a custom cluster, including executing queries, gracefully restarting, verifying metrics, re-executing the same query, and verifying hits/misses. This also covers changing the cache capacity before a restart, as well as restarting while queries are in progress.

Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
---
M CMakeLists.txt
M be/src/runtime/io/data-cache-test.cc
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/scheduling/executor-group.cc
M be/src/service/impala-server.cc
M be/src/util/cache/cache-internal.h
M be/src/util/cache/cache.h
M be/src/util/cache/lirs-cache.cc
M be/src/util/cache/rl-cache.cc
M tests/common/impala_cluster.py
M tests/custom_cluster/test_data_cache.py
14 files changed, 883 insertions(+), 82 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/19532/14
--
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 14
Gerrit-Owner: Zihao Ye <eyiz...@163.com>
Gerrit-Reviewer: David Rorke <dro...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Zihao Ye <eyiz...@163.com>
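
Illustrative note on the commit message above: the dump-and-load flow it describes (cache bytes on disk, an in-memory index keyed by cache key, a read-only switch before dumping, and a reload of the index after restart) can be modelled in a few lines. The following is only a toy sketch in Python, not the code in data-cache.cc; the class name ToyDataCache, the file names cache.data and cache.meta.json, and the JSON dump format are all assumptions made for illustration.

```python
import json
import os
import tempfile


class ToyDataCache:
    """Toy model: cached bytes live in a file on disk, the index lives in memory.

    Hypothetical illustration only; names and dump format are assumptions.
    """

    def __init__(self, cache_dir):
        self.data_path = os.path.join(cache_dir, "cache.data")       # assumed name
        self.dump_path = os.path.join(cache_dir, "cache.meta.json")  # assumed name
        self.index = {}        # cache key -> (offset, length) in the data file
        self.read_only = False
        # Create the data file if it is missing, without truncating old contents.
        open(self.data_path, "ab").close()

    def store(self, key, payload):
        """Append payload to the cache file; refuse while the cache is read-only."""
        if self.read_only or key in self.index:
            return False
        with open(self.data_path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)
            f.write(payload)
        self.index[key] = (offset, len(payload))
        return True

    def lookup(self, key):
        """Return the cached bytes for key, or None on a miss."""
        entry = self.index.get(key)
        if entry is None:
            return None
        offset, length = entry
        with open(self.data_path, "rb") as f:
            f.seek(offset)
            return f.read(length)

    def dump(self):
        """Freeze writes first, then persist the index next to the cache file."""
        self.read_only = True
        with open(self.dump_path, "w") as f:
            json.dump(self.index, f)

    def load(self):
        """Restore the index from a previous dump; return False if none exists."""
        if not os.path.exists(self.dump_path):
            return False
        with open(self.dump_path) as f:
            self.index = {k: tuple(v) for k, v in json.load(f).items()}
        return True


if __name__ == "__main__":
    cache_dir = tempfile.mkdtemp()
    before = ToyDataCache(cache_dir)
    before.store("file1:0", b"hello")
    before.dump()                       # stands in for the graceful shutdown
    after = ToyDataCache(cache_dir)     # stands in for the restarted process
    after.load()
    print(after.lookup("file1:0"))
```

In this toy model, setting read_only before writing the dump is what keeps the dumped index from going stale relative to the data file, which mirrors the reasoning the commit message gives for dumping safely during query execution.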