IMPALA-6807: [DOCS] Update the known issue for HDFS-12528

Added a new recommendation for the new setting with the fix version of HDFS, 2.10 and higher.
Change-Id: If51cb111a9ddc67be4a1cf42502a8a021486b7e4
Reviewed-on: http://gerrit.cloudera.org:8080/9929
Reviewed-by: Joe McDonnell <joemcdonn...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>

Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/aab49461
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/aab49461
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/aab49461

Branch: refs/heads/2.x
Commit: aab49461f5b7f5cab01768abdf75c710c740afed
Parents: dc1922f
Author: Alex Rodoni <arod...@cloudera.com>
Authored: Wed Apr 4 16:22:42 2018 -0700
Committer: Impala Public Jenkins <impala-public-jenk...@gerrit.cloudera.org>
Committed: Wed Apr 11 22:56:00 2018 +0000

----------------------------------------------------------------------
 docs/topics/impala_known_issues.xml | 61 +++++++++++++++++++++++---------
 1 file changed, 45 insertions(+), 16 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/impala/blob/aab49461/docs/topics/impala_known_issues.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_known_issues.xml b/docs/topics/impala_known_issues.xml
index a8a8451..a09188e 100644
--- a/docs/topics/impala_known_issues.xml
+++ b/docs/topics/impala_known_issues.xml
@@ -409,25 +409,54 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
 <title>Interaction of File Handle Cache with HDFS Appends and Short-Circuit Reads</title>
 <conbody>
 <p>
- If a data file used by Impala is being continuously appended or overwritten in place by an
- HDFS mechanism, such as <cmdname>hdfs dfs -appendToFile</cmdname>, interaction with the
- file handle caching feature in <keyword keyref="impala210_full"/> and higher could cause
- short-circuit reads to sometimes be disabled on some DataNodes. When a mismatch is detected
- between the cached file handle and a data block that was rewritten because of an append,
- short-circuit reads are turned off on the affected host for a 10-minute period.
+ If a data file used by Impala is being continuously appended or
+ overwritten in place by an HDFS mechanism, such as <cmdname>hdfs dfs
+ -appendToFile</cmdname>, interaction with the file handle caching
+ feature in <keyword keyref="impala210_full"/> and higher could cause
+ short-circuit reads to sometimes be disabled on some DataNodes. When a
+ mismatch is detected between the cached file handle and a data block
+ that was rewritten because of an append, short-circuit reads are
+ turned off on the affected host for a 10-minute period.
 </p>
 <p>
- The possibility of encountering such an issue is the reason why the file handle caching
- feature is currently turned off by default. See <xref keyref="scalability_file_handle_cache"/>
- for information about this feature and how to enable it.
+ The possibility of encountering such an issue is the reason why the
+ file handle caching feature is currently turned off by default. See
+ <xref keyref="scalability_file_handle_cache"/> for information about
+ this feature and how to enable it.
 </p>
- <p><b>Bug:</b> <xref href="https://issues.apache.org/jira/browse/HDFS-12528" scope="external" format="html">HDFS-12528</xref></p>
- <p><b>Severity:</b> High</p>
- <!-- <p><b>Resolution:</b> </p> -->
- <p><b>Workaround:</b> Verify whether your ETL process is susceptible to this issue before enabling the file handle caching feature.
- You can set the <cmdname>impalad</cmdname> configuration option <codeph>unused_file_handle_timeout_sec</codeph> to a time period
- that is shorter than the HDFS setting <codeph>dfs.client.read.shortcircuit.streams.cache.expiry.ms</codeph>. (Keep in mind that
- the HDFS setting is in milliseconds while the Impala setting is in seconds.)
+ <p>
+ <b>Bug:</b>
+ <xref href="https://issues.apache.org/jira/browse/HDFS-12528"
+ scope="external" format="html">HDFS-12528</xref>
+ </p>
+
+ <p>
+ <b>Severity:</b> High
+ </p>
+
+ <p><b>Workaround:</b> Verify whether your ETL process is susceptible to
+ this issue before enabling the file handle caching feature. You can
+ set the <cmdname>impalad</cmdname> configuration option
+ <codeph>unused_file_handle_timeout_sec</codeph> to a time period
+ that is shorter than the HDFS setting
+ <codeph>dfs.client.read.shortcircuit.streams.cache.expiry.ms</codeph>.
+ (Keep in mind that the HDFS setting is in milliseconds while the
+ Impala setting is in seconds.)
+ </p>
+
+ <p>
+ <b>Resolution:</b> Fixed in HDFS 2.10 and higher. Use the new HDFS
+ parameter <codeph>dfs.domain.socket.disable.interval.seconds</codeph>
+ to specify the amount of time that short circuit reads are disabled on
+ encountering an error. The default value is 10 minutes
+ (<codeph>600</codeph> seconds). It is recommended that you set
+ <codeph>dfs.domain.socket.disable.interval.seconds</codeph> to a
+ small value, such as <codeph>1</codeph> second, when using the file
+ handle cache. Setting <codeph>
+ dfs.domain.socket.disable.interval.seconds</codeph> to
+ <codeph>0</codeph> is not recommended as a non-zero interval
+ protects the system if there is a persistent problem with short
+ circuit reads.
 </p>
 </conbody>
 </concept>
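
For readers applying the Resolution text added in this patch, the recommended setting might look like the fragment below. This is a minimal sketch, not part of the commit: it assumes the property is set in the `hdfs-site.xml` read by the HDFS client that Impala uses, and it takes the value `1` from the recommendation in the patch. The comments are illustrative.

```xml
<!-- Sketch of an hdfs-site.xml entry (assumed location; not part of this commit). -->
<configuration>
  <property>
    <!-- How long short-circuit reads stay disabled after an error, in seconds.
         The patch text gives 600 as the default and advises against 0. -->
    <name>dfs.domain.socket.disable.interval.seconds</name>
    <value>1</value>
  </property>
</configuration>
```

On HDFS versions without the fix, the Workaround paragraph still applies. When comparing the two timeouts, convert units first: `unused_file_handle_timeout_sec` multiplied by 1000 should be smaller than `dfs.client.read.shortcircuit.streams.cache.expiry.ms`.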