Todd Lipcon has posted comments on this change. (http://gerrit.cloudera.org:8080/12987)

Change subject: IMPALA-8341: Data cache for remote reads
......................................................................


Patch Set 1:

After sleeping on this patch, had a few more higher-level thoughts:

1- we should probably periodically check the free disk space on each allocated 
partition, and if little space remains, stop using that partition for the 
cache. In the future maybe we would dynamically evict entries when we're about 
to run out of space, but I think a simple "stop adding to the problem" is a 
good starting point. We should also handle ENOSPC gracefully - I'm not sure we 
do today.
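
As a rough sketch of the periodic check, something like the following could gate inserts into a partition. This is only illustrative (the function name, threshold, and error policy are not from the patch); it assumes a POSIX system with statvfs() available:

```cpp
// Sketch: check whether the filesystem holding 'path' still has at least
// 'min_free_bytes' available to unprivileged users. A cache partition could
// be skipped for new insertions whenever this returns false. Names and the
// "treat errors as full" policy are illustrative assumptions.
#include <sys/statvfs.h>

#include <cstdint>
#include <string>

bool HasEnoughFreeSpace(const std::string& path, uint64_t min_free_bytes) {
  struct statvfs vfs;
  // On failure (e.g. the path disappeared), conservatively report "full".
  if (statvfs(path.c_str(), &vfs) != 0) return false;
  // f_bavail counts blocks available to non-root; f_frsize is the block size.
  uint64_t avail = static_cast<uint64_t>(vfs.f_bavail) * vfs.f_frsize;
  return avail >= min_free_bytes;
}
```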

2- currently we hash the cache keys to pick a partition, and then each 
partition maintains its own LRU. That should work OK if the partitions are 
allocated equal sizes. In the case that they aren't, however, this will hurt 
the hit ratio. For example, consider the extreme case where one partition has 
only 10MB of capacity whereas a second partition has 1TB. If we assume that 
10MB of cache provides a very low hit rate, then half of our cache queries 
will land on the low-hit-rate partition and we'll be capped at close to a 50% 
hit rate for the combined cache.
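
To make the ceiling concrete: under hashmod across two partitions, the combined hit rate is just the average of the per-partition rates (the per-partition rates below are illustrative, not measured):

```cpp
// Back-of-envelope: hashmod over two partitions sends half the keys to each,
// so the combined hit rate is the mean of the per-partition hit rates.
// With a tiny partition near 0% and a huge one near 100%, the cache as a
// whole is capped near 50%.
double CombinedHitRate(double small_partition_hit, double large_partition_hit) {
  return 0.5 * small_partition_hit + 0.5 * large_partition_hit;
}
```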

A few different solutions come to mind: (a) instead of hashmod to pick a 
partition, we could use the hash code to place a token in a hash space sized 
based on capacity. e.g. in the above example, the first partition owns hash 
values 0-10MB and the second partition owns hash values 10MB-(1TB+10MB). Upon 
hashing a key, we mod by the total capacity and pick the partition whose range 
contains the result, so each partition is chosen with weight proportional to 
its capacity.
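
A minimal sketch of option (a), with illustrative names (not Impala's actual types): reduce the key's hash modulo the total capacity, then binary-search the cumulative capacities to find the owning partition.

```cpp
// Sketch: capacity-weighted partition selection. 'capacities' holds each
// partition's size in bytes; a key's hash is mapped to a point in
// [0, total_capacity) and the partition owning that range is returned.
// A real implementation would precompute the cumulative sums once.
#include <algorithm>
#include <cstdint>
#include <vector>

int PickPartition(const std::vector<uint64_t>& capacities, uint64_t hash) {
  std::vector<uint64_t> cumulative;
  uint64_t total = 0;
  for (uint64_t c : capacities) {
    total += c;
    cumulative.push_back(total);
  }
  uint64_t point = hash % total;
  // First partition whose cumulative capacity exceeds 'point' owns it.
  return static_cast<int>(
      std::upper_bound(cumulative.begin(), cumulative.end(), point) -
      cumulative.begin());
}
```

With capacities {10MB, 1TB}, only hash points below 10MB land on the small partition, so it receives roughly 0.001% of keys rather than 50%.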

One downside of this approach is that the _IO_ load will also end up 
apportioned relative to capacity. That may be especially bad considering 
smaller devices are likely to be faster (a well-meaning user might allocate 
/data/ssd:100GB,/data/hdd:1TB, but now they're getting 10x the IO pushed to 
their HDD instead of the SSD).

(b) Another option would be to use a single global hashtable and have the 
entries directly identify their partitions (in fact I think they already do, 
via the CacheFile pointer, right?). Eviction still needs to be per-partition, 
based on each partition's capacity, but insertion can allocate across drives 
more intelligently.
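
A deliberately simplified sketch of option (b), under assumed names (no concurrency control, no on-disk state, entry payloads elided): one global map for lookups, per-partition LRU lists for eviction, and insertion steered to whichever partition has the most free space.

```cpp
// Sketch: global index + per-partition LRU. Each map entry records which
// partition holds it; eviction only touches that partition's own LRU list.
// "Most free space" insertion is one possible placement policy; oversized
// entries are inserted regardless here, which a real cache would reject.
#include <cstdint>
#include <list>
#include <string>
#include <unordered_map>
#include <vector>

class GlobalCache {
 public:
  explicit GlobalCache(const std::vector<uint64_t>& capacities) {
    for (uint64_t c : capacities) partitions_.push_back(Partition{c});
  }

  // Place the entry in the partition with the most free space, evicting
  // from that partition's LRU tail until it fits.
  void Insert(const std::string& key, uint64_t size) {
    int best = 0;
    for (size_t i = 1; i < partitions_.size(); ++i) {
      if (Free(static_cast<int>(i)) > Free(best)) best = static_cast<int>(i);
    }
    Partition& p = partitions_[best];
    while (p.used + size > p.capacity && !p.lru.empty()) {
      const std::string& victim = p.lru.back();
      p.used -= index_[victim].size;
      index_.erase(victim);
      p.lru.pop_back();
    }
    p.lru.push_front(key);
    p.used += size;
    index_[key] = Entry{best, size, p.lru.begin()};
  }

  // All lookups go through the one global map; the entry tells us which
  // partition's LRU list to bump.
  bool Lookup(const std::string& key) {
    auto it = index_.find(key);
    if (it == index_.end()) return false;
    Partition& p = partitions_[it->second.partition];
    p.lru.splice(p.lru.begin(), p.lru, it->second.it);  // mark recently used
    return true;
  }

 private:
  struct Entry {
    int partition;
    uint64_t size;
    std::list<std::string>::iterator it;
  };
  struct Partition {
    uint64_t capacity;
    uint64_t used = 0;
    std::list<std::string> lru;  // most recently used at front
  };
  uint64_t Free(int i) const {
    return partitions_[i].capacity - partitions_[i].used;
  }
  std::vector<Partition> partitions_;
  std::unordered_map<std::string, Entry> index_;
};
```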


Last thought: since there are some tricky design decisions above, maybe we can 
simplify this by supporting just a single partition for now, and come back to 
multi-partition support later?


--
To view, visit http://gerrit.cloudera.org:8080/12987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
Gerrit-Change-Number: 12987
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: David Rorke <dro...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tmarsh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Comment-Date: Thu, 11 Apr 2019 16:20:55 +0000
Gerrit-HasComments: No
