Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/12987 )
Change subject: IMPALA-8341: Data cache for remote reads
......................................................................

Patch Set 1:

After sleeping on this patch, I had a few more higher-level thoughts:

1. We should probably periodically check disk space on the allocated partition, and if there is only a limited amount remaining, stop using that partition for cache. In the future maybe we would dynamically evict entries if we're about to run out of space, but I think a simple "stop adding to the problem" policy is a good starting point. We should also handle ENOSPC gracefully - not sure if we do today.

2. Currently we hash the cache keys to pick a partition, and then each partition is a separate LRU. That should work OK if the partitions are allocated equal sizes. In the case that they aren't, however, this will hurt the hit ratio. For example, consider the limit where one partition has only 10MB of capacity whereas a second partition has 1TB. If we assume that 10MB of cache provides a very low hit rate, then half of our cache queries will hit the low-hit-rate partition, and the combined cache will be capped at close to a 50% hit rate.

A few different solutions come to mind:

(a) Instead of hash-mod to pick a partition, we could use the hash code to place a token in a hash space sized by capacity. E.g., in the above example, the first partition owns hash values 0-10MB and the second partition owns hash values 10MB-(1TB+10MB). Upon hashing a key, we mod by the total capacity and pick the partition that owns the result, so each partition is chosen with weight proportional to its capacity. One downside of this approach is that the _IO_ load will also end up apportioned relative to capacity. That may be especially bad considering smaller devices are likely to be faster (a well-meaning user might allocate /data/ssd:100GB,/data/hdd:1TB, but would now get 10x the IO pushed to the HDD instead of the SSD).
Another option would be to use a global hash table and have the entries directly identify their partitions (in fact I think they already do via the CacheFile pointer, right?). Eviction still needs to be per-partition based on that partition's capacity, but insertion can then allocate across drives more intelligently.

Last thought: since there are some tricky design decisions above, maybe we can simplify this by supporting only a single partition for now, and come back to multi-partition support later?

--
To view, visit http://gerrit.cloudera.org:8080/12987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I734803c1c1787c858dc3ffa0a2c0e33e77b12edc
Gerrit-Change-Number: 12987
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho <k...@cloudera.com>
Gerrit-Reviewer: David Rorke <dro...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Thomas Marshall <tmarsh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Comment-Date: Thu, 11 Apr 2019 16:20:55 +0000
Gerrit-HasComments: No