[ https://issues.apache.org/jira/browse/IMPALA-7265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16760373#comment-16760373 ]
Joe McDonnell commented on IMPALA-7265:
---------------------------------------

[~arodoni_cloudera] The cache_remote_file_handles parameter is merged with the default value of false. This Jira is open to track setting the default to true.

> Cache remote file handles
> -------------------------
>
>                 Key: IMPALA-7265
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7265
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 3.1.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Critical
>
> The file handle cache currently does not allow caching remote file handles.
> This means that clusters that have a lot of remote reads can suffer from
> overloading the NameNode. Impala should be able to cache remote file handles.
> There are some open questions about remote file handles and whether they
> behave differently from local file handles. In particular:
> # Is there any resource constraint on the number of remote file handles
> open? (e.g. do they maintain a network connection?)
> # Are there any semantic differences in how remote file handles behave when
> files are deleted, overwritten, or appended?
> # Are there any extra failure cases for remote file handles? (i.e. if a
> machine goes down or a remote file handle is left open for an extended period
> of time)
> The form of caching will depend on the answers, but at the very least, it
> should be possible to cache a remote file handle at the level of a query so
> that a Parquet file with multiple columns can share file handles.
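
For illustration only, the query-level sharing described at the end of the issue could be sketched roughly as below. This is not Impala's implementation (the backend is C++); the class name QueryFileHandleCache, the open_fn callback, and the (path, mtime) cache key are all assumptions made for this sketch. The idea is that every column scanner reading the same file within one query gets the same handle, so the file is opened once rather than once per column.

```python
import io
import threading


class QueryFileHandleCache:
    """Hypothetical per-query cache holding one handle per (path, mtime)."""

    def __init__(self, open_fn):
        # open_fn is an assumed callback that opens a (possibly remote) file.
        self._open_fn = open_fn
        self._handles = {}
        self._lock = threading.Lock()
        self.opens = 0  # number of real opens performed

    def get(self, path, mtime):
        # Including mtime in the key means an overwritten file gets a
        # fresh handle instead of reusing a stale one.
        key = (path, mtime)
        with self._lock:
            handle = self._handles.get(key)
            if handle is None:
                handle = self._open_fn(path)
                self._handles[key] = handle
                self.opens += 1
            return handle

    def close_all(self):
        # Called once when the query finishes.
        with self._lock:
            for handle in self._handles.values():
                handle.close()
            self._handles.clear()


if __name__ == "__main__":
    # In-memory stand-in for a remote open; the path is made up.
    cache = QueryFileHandleCache(lambda path: io.BytesIO(b"parquet bytes"))
    col_a = cache.get("/warehouse/t/part-0.parq", 100)
    col_b = cache.get("/warehouse/t/part-0.parq", 100)
    print(col_a is col_b, cache.opens)
    cache.close_all()
```

Scoping the cache to a single query sidesteps most of the open questions in the issue, since no handle outlives the query; a cache shared across queries would still need answers on resource limits, file modification semantics, and failure handling.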