IMPALA-6807: [DOCS] Update the known issue for HDFS-12528

Added a recommendation for the new HDFS setting, which is available in
the HDFS version that contains the fix, 2.10 and higher.

Change-Id: If51cb111a9ddc67be4a1cf42502a8a021486b7e4
Reviewed-on: http://gerrit.cloudera.org:8080/9929
Reviewed-by: Joe McDonnell <joemcdonn...@cloudera.com>
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/aab49461
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/aab49461
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/aab49461

Branch: refs/heads/2.x
Commit: aab49461f5b7f5cab01768abdf75c710c740afed
Parents: dc1922f
Author: Alex Rodoni <arod...@cloudera.com>
Authored: Wed Apr 4 16:22:42 2018 -0700
Committer: Impala Public Jenkins <impala-public-jenk...@gerrit.cloudera.org>
Committed: Wed Apr 11 22:56:00 2018 +0000

----------------------------------------------------------------------
 docs/topics/impala_known_issues.xml | 61 +++++++++++++++++++++++---------
 1 file changed, 45 insertions(+), 16 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/aab49461/docs/topics/impala_known_issues.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_known_issues.xml b/docs/topics/impala_known_issues.xml
index a8a8451..a09188e 100644
--- a/docs/topics/impala_known_issues.xml
+++ b/docs/topics/impala_known_issues.xml
@@ -409,25 +409,54 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
      <title>Interaction of File Handle Cache with HDFS Appends and Short-Circuit Reads</title>
       <conbody>
         <p>
-          If a data file used by Impala is being continuously appended or overwritten in place by an
-          HDFS mechanism, such as <cmdname>hdfs dfs -appendToFile</cmdname>, interaction with the
-          file handle caching feature in <keyword keyref="impala210_full"/> and higher could cause
-          short-circuit reads to sometimes be disabled on some DataNodes. When a mismatch is detected
-          between the cached file handle and a data block that was rewritten because of an append,
-          short-circuit reads are turned off on the affected host for a 10-minute period.
+          If a data file used by Impala is being continuously appended or
+          overwritten in place by an HDFS mechanism, such as <cmdname>hdfs dfs
+            -appendToFile</cmdname>, interaction with the file handle caching
+          feature in <keyword keyref="impala210_full"/> and higher could cause
+          short-circuit reads to sometimes be disabled on some DataNodes. When a
+          mismatch is detected between the cached file handle and a data block
+          that was rewritten because of an append, short-circuit reads are
+          turned off on the affected host for a 10-minute period.
         </p>
         <p>
-          The possibility of encountering such an issue is the reason why the file handle caching
-          feature is currently turned off by default. See <xref keyref="scalability_file_handle_cache"/>
-          for information about this feature and how to enable it.
+          The possibility of encountering such an issue is the reason why the
+          file handle caching feature is currently turned off by default. See
+            <xref keyref="scalability_file_handle_cache"/> for information about
+          this feature and how to enable it.
         </p>
-        <p><b>Bug:</b> <xref href="https://issues.apache.org/jira/browse/HDFS-12528" scope="external" format="html">HDFS-12528</xref></p>
-        <p><b>Severity:</b> High</p>
-        <!-- <p><b>Resolution:</b> </p> -->
-        <p><b>Workaround:</b> Verify whether your ETL process is susceptible to this issue before enabling the file handle caching feature.
-          You can set the <cmdname>impalad</cmdname> configuration option <codeph>unused_file_handle_timeout_sec</codeph> to a time period
-          that is shorter than the HDFS setting <codeph>dfs.client.read.shortcircuit.streams.cache.expiry.ms</codeph>. (Keep in mind that
-          the HDFS setting is in milliseconds while the Impala setting is in seconds.)
+        <p>
+          <b>Bug:</b>
+          <xref href="https://issues.apache.org/jira/browse/HDFS-12528"
+            scope="external" format="html">HDFS-12528</xref>
+        </p>
+
+        <p>
+          <b>Severity:</b> High
+        </p>
+
+        <p><b>Workaround:</b> Verify whether your ETL process is susceptible to
+          this issue before enabling the file handle caching feature. You can
+          set the <cmdname>impalad</cmdname> configuration option
+            <codeph>unused_file_handle_timeout_sec</codeph> to a time period
+          that is shorter than the HDFS setting
+            <codeph>dfs.client.read.shortcircuit.streams.cache.expiry.ms</codeph>.
+          (Keep in mind that the HDFS setting is in milliseconds while the
+          Impala setting is in seconds.)
+        </p>
+
+        <p>
+          <b>Resolution:</b> Fixed in HDFS 2.10 and higher. Use the new HDFS
+          parameter <codeph>dfs.domain.socket.disable.interval.seconds</codeph>
+          to specify the amount of time that short circuit reads are disabled on
+          encountering an error. The default value is 10 minutes
+            (<codeph>600</codeph> seconds). It is recommended that you set
+            <codeph>dfs.domain.socket.disable.interval.seconds</codeph> to a
+          small value, such as <codeph>1</codeph> second, when using the file
+          handle cache. Setting <codeph>
+            dfs.domain.socket.disable.interval.seconds</codeph> to
+            <codeph>0</codeph> is not recommended as a non-zero interval
+          protects the system if there is a persistent problem with short
+          circuit reads.
         </p>
       </conbody>
     </concept>
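
For illustration only, the recommendation in the Resolution paragraph
could be expressed as the hdfs-site.xml fragment below. This is a sketch
and not part of the commit: it assumes that the HDFS client configuration
read by Impala is hdfs-site.xml and that the cluster runs an HDFS version
containing the fix. The value 1 is the example value from the text above.

    <!-- Hypothetical hdfs-site.xml fragment; not part of this commit. -->
    <property>
      <name>dfs.domain.socket.disable.interval.seconds</name>
      <!-- Small non-zero interval, as recommended when the file handle
           cache is in use. The documented default is 600 seconds. -->
      <value>1</value>
    </property>

A non-zero value is kept deliberately, because the text above advises
against 0: the interval still protects the system if short-circuit reads
fail persistently.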
