Hello David Rorke, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19532

to look at the new patch set (#7).

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................

IMPALA-11904: Data cache support dumping for reloading

The data cache consists mainly of cache metadata and cache files. The
cache files are located on disk and are responsible for storing the
cached data content, while the cache metadata is located in memory and
is responsible for indexing into the cache files by cache key. Before
this patch, if the impalad process exits, the cache metadata is lost.
After the impalad process restarts, we cannot reuse the cache files
even though they are still on disk, because there is no corresponding
cache metadata to index them.

This patch implements dump and load functions for the data cache. When
dumping is enabled by setting 'data_cache_enable_dumping' and the
impalad process is stopped by graceful shutdown
(kill -SIGRTMIN $pid), the data cache collects the cache metadata and
dumps it to the location where the cache directory is located. When
loading is enabled by setting 'data_cache_enable_loading', the impalad
process tries at startup to load the dumped files from disk to restore
the original cache metadata, so that the existing cache files can be
reused without refilling the cache.

The cache can be safely dumped during query execution, because before
the dump starts, the data cache is set to read-only to prevent
inconsistency between the metadata dump and the cache files.

Note that the dump files also use disk space. In testing, the size of
the dump file was generally no more than 0.5% of the size of all cache
files.

Testing:
- Add DataCacheTest,#SetReadOnly
  Used to test whether setting/revoking read-only takes effect, even
  when there are writes in progress.
- Add DataCacheTest,#DumpAndLoad
  Used to test whether the original cache contents can be read after a
  data cache dump and reload.

Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
---
M CMakeLists.txt
M be/src/runtime/io/data-cache-test.cc
M be/src/runtime/io/data-cache.cc
M be/src/runtime/io/data-cache.h
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/disk-io-mgr.h
M be/src/scheduling/executor-group.cc
M be/src/service/impala-server.cc
M be/src/util/cache/cache-internal.h
M be/src/util/cache/cache.h
M be/src/util/cache/lirs-cache.cc
M be/src/util/cache/rl-cache.cc
12 files changed, 650 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/19532/7
--
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 7
Gerrit-Owner: Anonymous Coward <18770832...@163.com>
Gerrit-Reviewer: Anonymous Coward <18770832...@163.com>
Gerrit-Reviewer: David Rorke <dro...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
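
Illustrative note on the dump/load flow described in the commit message:
the sketch below is not code from this patch and does not reflect
Impala's DataCache API or its on-disk dump format. ToyCacheIndex, Entry,
and the "key offset length" text format are all hypothetical names chosen
for illustration. It only shows the general idea: freeze writes, write
the in-memory index to a file, and rebuild the index from that file on
the next start.

```cpp
#include <cstdint>
#include <fstream>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical metadata entry: where a cached range lives in a cache file.
struct Entry {
  int64_t offset;
  int64_t length;
};

// Hypothetical in-memory index from cache key to Entry.
// Keys are assumed to contain no whitespace.
class ToyCacheIndex {
 public:
  // Rejects new entries while the index is read-only.
  bool Insert(const std::string& key, Entry e) {
    std::lock_guard<std::mutex> l(lock_);
    if (read_only_) return false;
    index_[key] = e;
    return true;
  }

  bool Lookup(const std::string& key, Entry* out) const {
    std::lock_guard<std::mutex> l(lock_);
    auto it = index_.find(key);
    if (it == index_.end()) return false;
    *out = it->second;
    return true;
  }

  void SetReadOnly(bool read_only) {
    std::lock_guard<std::mutex> l(lock_);
    read_only_ = read_only;
  }

  // Switches to read-only first so the dump cannot diverge from the
  // index, then writes one "key offset length" line per entry.
  // The index stays read-only afterwards (shutdown scenario).
  bool Dump(const std::string& path) {
    std::lock_guard<std::mutex> l(lock_);
    read_only_ = true;
    std::ofstream out(path, std::ios::trunc);
    if (!out) return false;
    for (const auto& kv : index_) {
      out << kv.first << ' ' << kv.second.offset << ' '
          << kv.second.length << '\n';
    }
    out.flush();
    return out.good();
  }

  // Replaces the index with the contents of a dump file.
  // Fails if the file cannot be opened or parsing stops before EOF.
  bool Load(const std::string& path) {
    std::ifstream in(path);
    if (!in) return false;
    std::unordered_map<std::string, Entry> loaded;
    std::string key;
    Entry e;
    while (in >> key >> e.offset >> e.length) loaded[key] = e;
    if (!in.eof()) return false;
    std::lock_guard<std::mutex> l(lock_);
    index_ = std::move(loaded);
    return true;
  }

 private:
  mutable std::mutex lock_;
  bool read_only_ = false;
  std::unordered_map<std::string, Entry> index_;
};
```

In this toy model, a process would call Dump() once during shutdown and
Load() once during startup; an Insert() that arrives after the dump has
started is simply rejected rather than blocked.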