Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13724 )

Change subject: IMPALA-8341: [DOCS] Describe the settings for remote data 
caching
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13724/1/docs/topics/impala_data_cache.xml
File docs/topics/impala_data_cache.xml:

http://gerrit.cloudera.org:8080/#/c/13724/1/docs/topics/impala_data_cache.xml@44
PS1, Line 44: --data_cache_dir
--data_cache_dir and --data_cache_size are options specific to 
./bin/start-impala-cluster.py; they exist because that script needs to create 
extra sub-directories for the caching directory.

To specify the caching directory, the user should use the flag: 
--data_cache=<dir1>,<dir2>,<dir3>:<quota>

With the above configuration, data will be stored in <dir1>, <dir2> and <dir3>. 
The user needs to make sure those directories already exist on the local 
filesystem. In addition, the filesystem on which each directory resides must 
support hole punching; modern filesystems such as ext4 and xfs do. The cache may 
consume up to <quota> bytes in each of the directories specified, so with the 
above configuration the total cache size can be up to 3 * <quota>.
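
To make the arithmetic concrete, here is a minimal Python sketch of how a value 
in the <dir1>,<dir2>,<dir3>:<quota> form breaks down. It is not Impala's own 
parsing code: the function names and the example paths are hypothetical, and it 
assumes <quota> is written as a plain integer number of bytes. It also does not 
check for hole punching support.

```python
import os


def parse_data_cache(value):
    """Split '<dir1>,<dir2>:<quota>' into (list of dirs, per-dir quota in bytes)."""
    # Split on the last ':' so the quota is whatever follows the directory list.
    dirs_part, sep, quota_part = value.rpartition(":")
    if not sep or not dirs_part:
        raise ValueError("expected <dir1>,<dir2>,...:<quota>")
    dirs = [d.strip() for d in dirs_part.split(",") if d.strip()]
    if not dirs:
        raise ValueError("no cache directories given")
    # Assumption: the quota is a plain integer byte count.
    return dirs, int(quota_part)


def missing_dirs(dirs):
    """Return the directories that do not exist on the local filesystem."""
    return [d for d in dirs if not os.path.isdir(d)]


def max_total_cache_bytes(dirs, quota):
    """The quota applies per directory, so the upper bound is len(dirs) * quota."""
    return len(dirs) * quota


if __name__ == "__main__":
    # Hypothetical paths and a 1 GiB per-directory quota, for illustration only.
    dirs, quota = parse_data_cache(
        "/data/0/cache,/data/1/cache,/data/2/cache:1073741824")
    print(dirs, quota, max_total_cache_bytes(dirs, quota))
    print("missing:", missing_dirs(dirs))
```

Running missing_dirs() before starting the daemon is one way to catch the 
"directories must exist" requirement early; the total returned by 
max_total_cache_bytes() is the figure to budget disk space against.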

Please see 
https://github.com/apache/impala/blob/master/be/src/runtime/io/disk-io-mgr.cc#L58-L63
for the definition of the flag.



--
To view, visit http://gerrit.cloudera.org:8080/13724
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7dd958e4de109b46eaf906fe93145799af123b3f
Gerrit-Change-Number: 13724
Gerrit-PatchSet: 1
Gerrit-Owner: Alex Rodoni <arod...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Michael Ho <k...@cloudera.com>
Gerrit-Comment-Date: Tue, 25 Jun 2019 17:50:09 +0000
Gerrit-HasComments: Yes
