Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19532 )

Change subject: IMPALA-11904: Data cache support dumping for reloading
......................................................................


Patch Set 4:

(4 comments)

Thanks for putting this up, we have been thinking about implementing this for a 
while now!

I'm still reading through this, but here are some initial comments.

>From a testing point of view, I'm interested about what happens if a user 
>changes the data_cache parameter value. For example, they could change the 
>size of the cache when they restart.

http://gerrit.cloudera.org:8080/#/c/19532/4/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/19532/4/be/src/runtime/io/data-cache.cc@993
PS4, Line 993:   std::ofstream file;
             :   file.open(JoinPathSegments(path_, DUMP_FILE_NAME));
What happens if the file already exists? For example, if we dump to the file, 
then start back up, load it, and then we are shutting down again. Should we be 
truncating any existing file? (e.g. std::ofstream::trunc)

Also, do we need to open the file in binary mode? (std::ofstream::binary)


http://gerrit.cloudera.org:8080/#/c/19532/4/be/src/runtime/io/data-cache.cc@1010
PS4, Line 1010:   std::ifstream file;
              :   file.open(JoinPathSegments(path_, DUMP_FILE_NAME));
Do we need to open in binary mode? (std::ofstream::binary)


http://gerrit.cloudera.org:8080/#/c/19532/4/be/src/util/cache/cache-internal.h
File be/src/util/cache/cache-internal.h:

http://gerrit.cloudera.org:8080/#/c/19532/4/be/src/util/cache/cache-internal.h@316
PS4, Line 316:         handles.emplace_back(
             :             reinterpret_cast<Cache::Handle*>(handle), 
Cache::HandleDeleter(this));
I haven't tried this, but I think we should be able to move the UniqueHandle 
into the vector. i.e.

handles.push_back(std::move(handle));

This should also bring along the custom deleter.


http://gerrit.cloudera.org:8080/#/c/19532/4/be/src/util/cache/lirs-cache.cc
File be/src/util/cache/lirs-cache.cc:

http://gerrit.cloudera.org:8080/#/c/19532/4/be/src/util/cache/lirs-cache.cc@1088
PS4, Line 1088:  The UNPROTECTED
              :   // entry can be in a recency list and a unprotected list at 
the same time, so a
              :   // temporary list is required to remove duplicating 
unprotected entries when collecting.
              :   std::list<LIRSHandle*> unprotecteds;
There are two ways to avoid constructing this list:

1) When iterating the recency_list_, don't do anything for UNPROTECTED entries. 
Then, all the UNPROTECTED will be handled by walking the unprotected list.

2) When iterating the recency_list_, put both PROTECTED and UNPROTECTED into 
the handles vector. Then, when walking the unprotected list, check whether it 
is linked on the recency_list_ (i.e. e->recency_list_hook_.is_linked()) and 
only add it to the handles vector if it is not.



--
To view, visit http://gerrit.cloudera.org:8080/19532
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id867f4fc7343898e4906332c3caa40eb57a03101
Gerrit-Change-Number: 19532
Gerrit-PatchSet: 4
Gerrit-Owner: Anonymous Coward <18770832...@163.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Comment-Date: Tue, 28 Feb 2023 03:22:51 +0000
Gerrit-HasComments: Yes

Reply via email to