[jira] [Work started] (IMPALA-8490) Impala Doc: the file handle cache now supports S3
[ https://issues.apache.org/jira/browse/IMPALA-8490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on IMPALA-8490 started by Alex Rodoni.
-------------------------------------------
> Impala Doc: the file handle cache now supports S3
> -------------------------------------------------
>
>                 Key: IMPALA-8490
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8490
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Docs
>            Reporter: Sahil Takiar
>            Assignee: Alex Rodoni
>            Priority: Major
>              Labels: future_release_doc, in_33
>
> https://impala.apache.org/docs/build/html/topics/impala_scalability.html
> states:
> {quote}
> Because this feature only involves HDFS data files, it does not apply to
> non-HDFS tables, such as Kudu or HBase tables, or tables that store their
> data on cloud services such as S3 or ADLS.
> {quote}
> This section should be updated because the file handle cache now supports S3
> files.
> We should add a section to the docs similar to what we added when support for
> remote HDFS files was added to the file handle cache:
> {quote}
> In Impala 3.2 and higher, file handle caching also applies to remote HDFS
> file handles. This is controlled by the cache_remote_file_handles flag for an
> impalad. It is recommended that you use the default value of true as this
> caching prevents your NameNode from overloading when your cluster has many
> remote HDFS reads.
> {quote}
> Like {{cache_remote_file_handles}}, the flag {{cache_s3_file_handles}} has
> been added as an impalad startup option (the flag is enabled by default).
> Unlike HDFS, though, S3 has no NameNode; the benefit instead is that caching
> eliminates a call to {{getFileStatus()}} on the target S3 file. So "prevents
> your NameNode from overloading when your cluster has many remote HDFS reads"
> should be changed to something like "avoids an unnecessary call to
> S3AFileSystem#getFileStatus(), which reduces the number of API calls made to
> S3."
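
For illustration only, here is a minimal sketch of how the two flags named in the issue might be spelled out as impalad startup options. It assumes the usual `--name=value` startup-option syntax. Both flags are described above as enabled by default, so setting them to true only makes the defaults explicit; every other startup option is omitted.

```
# Hypothetical fragment of impalad startup options; not a complete command line.
--cache_remote_file_handles=true
--cache_s3_file_handles=true
```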

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org