[ https://issues.apache.org/jira/browse/IMPALA-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874036#comment-16874036 ]
ASF subversion and git services commented on IMPALA-8341:
---------------------------------------------------------

Commit e29b387ea10739e78075bac8170e45722d4b9940 in impala's branch refs/heads/master from Alex Rodoni
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=e29b387 ]

IMPALA-8341: [DOCS] Describe the setting for remote data caching

Change-Id: I7dd958e4de109b46eaf906fe93145799af123b3f
Reviewed-on: http://gerrit.cloudera.org:8080/13724
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Reviewed-by: Michael Ho <k...@cloudera.com>


> Data cache for remote reads
> ---------------------------
>
>                 Key: IMPALA-8341
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8341
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>   Affects Versions: Impala 3.2.0
>            Reporter: Michael Ho
>            Assignee: Michael Ho
>            Priority: Critical
>             Fix For: Impala 3.3.0
>
>
> When running in public cloud (e.g. AWS with S3) or in certain private cloud
> settings (e.g. data stored in object store), the computation and storage are
> no longer co-located. This breaks the typical pattern in which Impala query
> fragment instances are scheduled at where the data is located. In this
> setting, the network bandwidth requirement of both the nics and the top of
> rack switches will go up quite a lot as the network traffic includes the data
> fetch in addition to the shuffling exchange traffic of intermediate results.
> To mitigate the pressure on the network, one can build a storage backed cache
> at the compute nodes to cache the working set. With deterministic scan range
> scheduling, each compute node should hold non-overlapping partitions of the
> data set.
> An initial prototype of the cache was posted here:
> [https://gerrit.cloudera.org/#/c/12683/] but it probably can benefit from a
> better eviction algorithm (e.g. LRU instead of FIFO) and better locking (e.g.
> not holding the lock while doing IO).
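
The two improvements suggested at the end of the issue description (LRU instead of FIFO eviction, and not holding the lock while doing IO) can be illustrated with a small sketch. This is not Impala's implementation: the class name LRUDataCache, its methods, and the (file name, offset) key shape are hypothetical, and the sketch keeps entries in memory for brevity, whereas the issue describes a storage-backed cache.

```python
import threading
from collections import OrderedDict
from typing import Callable, Optional, Tuple

Key = Tuple[str, int]  # hypothetical key shape: (file name, byte offset)


class LRUDataCache:
    """Illustrative byte-bounded LRU cache (hypothetical, in-memory only)."""

    def __init__(self, capacity_bytes: int) -> None:
        self._capacity = capacity_bytes
        self._used = 0
        # Oldest (least recently used) entries sit at the front.
        self._entries: "OrderedDict[Key, bytes]" = OrderedDict()
        self._lock = threading.Lock()

    def lookup(self, key: Key) -> Optional[bytes]:
        with self._lock:
            data = self._entries.get(key)
            if data is not None:
                # A hit refreshes recency; FIFO would skip this step.
                self._entries.move_to_end(key)
            return data

    def store(self, key: Key, data: bytes) -> None:
        if len(data) > self._capacity:
            return  # an entry larger than the whole cache is never kept
        with self._lock:
            old = self._entries.pop(key, None)
            if old is not None:
                self._used -= len(old)
            self._entries[key] = data
            self._used += len(data)
            # Evict least recently used entries until back under capacity.
            while self._used > self._capacity:
                _, evicted = self._entries.popitem(last=False)
                self._used -= len(evicted)

    def read_through(self, key: Key, fetch: Callable[[], bytes]) -> bytes:
        data = self.lookup(key)
        if data is None:
            # The (slow) remote read runs with no lock held, so other
            # threads can keep using the cache in the meantime.
            data = fetch()
            self.store(key, data)
        return data
```

One consequence of releasing the lock around fetch() is that two threads missing on the same key may both issue the remote read; the sketch accepts that duplicate work in exchange for not serializing IO behind the cache lock.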