Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/19532 )
Change subject: IMPALA-11904: Data cache support dumping for reloading ...................................................................... Patch Set 4: (4 comments) Thanks for putting this up, we have been thinking about implementing this for a while now! I'm still reading through this, but here are some initial comments. >From a testing point of view, I'm interested about what happens if a user >changes the data_cache parameter value. For example, they could change the >size of the cache when they restart. http://gerrit.cloudera.org:8080/#/c/19532/4/be/src/runtime/io/data-cache.cc File be/src/runtime/io/data-cache.cc: http://gerrit.cloudera.org:8080/#/c/19532/4/be/src/runtime/io/data-cache.cc@993 PS4, Line 993: std::ofstream file; : file.open(JoinPathSegments(path_, DUMP_FILE_NAME)); What happens if the file already exists? For example, if we dump to the file, then start back up, load it, and then we are shutting down again. Should we be truncating any existing file? (e.g. std::ofstream::trunc) Also, do we need to open the file in binary mode? (std::ofstream::binary) http://gerrit.cloudera.org:8080/#/c/19532/4/be/src/runtime/io/data-cache.cc@1010 PS4, Line 1010: std::ifstream file; : file.open(JoinPathSegments(path_, DUMP_FILE_NAME)); Do we need to open in binary mode? (std::ofstream::binary) http://gerrit.cloudera.org:8080/#/c/19532/4/be/src/util/cache/cache-internal.h File be/src/util/cache/cache-internal.h: http://gerrit.cloudera.org:8080/#/c/19532/4/be/src/util/cache/cache-internal.h@316 PS4, Line 316: handles.emplace_back( : reinterpret_cast<Cache::Handle*>(handle), Cache::HandleDeleter(this)); I haven't tried this, but I think we should be able to move the UniqueHandle into the vector. i.e. handles.push_back(std::move(handle)); This should also bring along the custom deleter. http://gerrit.cloudera.org:8080/#/c/19532/4/be/src/util/cache/lirs-cache.cc File be/src/util/cache/lirs-cache.cc: http://gerrit.cloudera.org:8080/#/c/19532/4/be/src/util/cache/lirs-cache.cc@1088 PS4, Line 1088: The UNPROTECTED : // entry can be in a recency list and a unprotected list at the same time, so a : // temporary list is required to remove duplicating unprotected entries when collecting. : std::list<LIRSHandle*> unprotecteds; There are two ways to avoid constructing this list: 1) When iterating the recency_list_, don't do anything for UNPROTECTED entries. Then, all the UNPROTECTED will be handled by walking the unprotected list. 2) When iterating the recency_list_, put both PROTECTED and UNPROTECTED into the handles vector. Then, when walking the unprotected list, check whether it is linked on the recency_list_ (i.e. e->recency_list_hook_.is_linked()) and only add it to the handles vector if it is not. -- To view, visit http://gerrit.cloudera.org:8080/19532 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101 Gerrit-Change-Number: 19532 Gerrit-PatchSet: 4 Gerrit-Owner: Anonymous Coward <18770832...@163.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com> Gerrit-Comment-Date: Tue, 28 Feb 2023 03:22:51 +0000 Gerrit-HasComments: Yes