Ye Zihao created IMPALA-11904:
---------------------------------

             Summary: Data cache should support dumping metadata for reloading
                 Key: IMPALA-11904
                 URL: https://issues.apache.org/jira/browse/IMPALA-11904
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
    Affects Versions: Impala 4.3.0
            Reporter: Ye Zihao
            Assignee: Ye Zihao


The data cache consists of cache metadata and cache files. The cache files are
located on disk and store the cached data content, while the cache metadata is
kept in memory and maps each cache key to its location in the cache files.
Currently, the cache metadata is lost when the impalad process exits. After
impalad restarts, the cache files cannot be reused even though they are still
on disk, because there is no metadata left to index them.
If the cache metadata could be dumped to disk when the process exits, it could
be reloaded into memory the next time the process starts, allowing the
existing cache files to be reused. This would help in real production
environments, where the cached data often exceeds a terabyte per process and
losing it because of a configuration change or version upgrade can take days
to recover from.
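
To make the proposal concrete, here is a minimal C++ sketch of the
dump-and-reload idea. It assumes a simplified index that maps each cache key
to a (file, offset, length) triple. The names CacheEntry, CacheIndex,
DumpMetadata and LoadMetadata are hypothetical and do not correspond to
Impala's actual data cache classes, and the host-endian binary layout is
chosen only for illustration.

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical location of one cached entry inside an on-disk cache file.
struct CacheEntry {
  std::string file;    // path of the cache file holding the data
  int64_t offset = 0;  // byte offset of the entry within that file
  int64_t length = 0;  // number of bytes stored for the entry
};

// Hypothetical in-memory metadata: cache key -> location in a cache file.
using CacheIndex = std::unordered_map<std::string, CacheEntry>;

namespace {
// Strings are written as a 64-bit length followed by the raw bytes.
void WriteString(std::ofstream& out, const std::string& s) {
  uint64_t n = s.size();
  out.write(reinterpret_cast<const char*>(&n), sizeof(n));
  out.write(s.data(), static_cast<std::streamsize>(n));
}

bool ReadString(std::ifstream& in, std::string* s) {
  uint64_t n = 0;
  if (!in.read(reinterpret_cast<char*>(&n), sizeof(n))) return false;
  s->resize(n);
  return static_cast<bool>(in.read(&(*s)[0], static_cast<std::streamsize>(n)));
}
}  // namespace

// Called on shutdown: writes every index entry to `path`.
// Returns false if the file cannot be opened or written completely.
bool DumpMetadata(const CacheIndex& index, const std::string& path) {
  std::ofstream out(path, std::ios::binary | std::ios::trunc);
  if (!out) return false;
  uint64_t count = index.size();
  out.write(reinterpret_cast<const char*>(&count), sizeof(count));
  for (const auto& [key, e] : index) {
    WriteString(out, key);
    WriteString(out, e.file);
    out.write(reinterpret_cast<const char*>(&e.offset), sizeof(e.offset));
    out.write(reinterpret_cast<const char*>(&e.length), sizeof(e.length));
  }
  out.close();  // close() sets failbit if the final flush fails
  return !out.fail();
}

// Called on startup: rebuilds the index from `path`. On any read error the
// partially loaded data is discarded and `*index` is left untouched, so the
// cache starts empty exactly as it does today.
bool LoadMetadata(const std::string& path, CacheIndex* index) {
  std::ifstream in(path, std::ios::binary);
  if (!in) return false;
  uint64_t count = 0;
  if (!in.read(reinterpret_cast<char*>(&count), sizeof(count))) return false;
  CacheIndex loaded;
  for (uint64_t i = 0; i < count; ++i) {
    std::string key;
    CacheEntry e;
    if (!ReadString(in, &key) || !ReadString(in, &e.file) ||
        !in.read(reinterpret_cast<char*>(&e.offset), sizeof(e.offset)) ||
        !in.read(reinterpret_cast<char*>(&e.length), sizeof(e.length))) {
      return false;
    }
    loaded.emplace(std::move(key), std::move(e));
  }
  *index = std::move(loaded);
  return true;
}
```

A real implementation would also need to confirm that the referenced cache
files still exist and match the dumped metadata (for example after a crash or
a change to the configured cache directories), and to reject truncated or
corrupt dump files before trusting their contents.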



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
