http://git-wip-us.apache.org/repos/asf/impala/blob/b4ad38a9/docs/build/html/topics/impala_perf_stats.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_perf_stats.html b/docs/build/html/topics/impala_perf_stats.html index ce79b71..ac1ba3a 100644 --- a/docs/build/html/topics/impala_perf_stats.html +++ b/docs/build/html/topics/impala_perf_stats.html @@ -1,9 +1,29 @@ +<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE html - SYSTEM "about:legacy-compat"> -<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.12x"><meta name="version" content="Impala 2.12x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="perf_stats"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Table and Column Statistics</title></head><body id="perf_stats"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> +<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> +<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> + +<meta name="copyright" content="(C) Copyright 2018" /> +<meta name="DC.rights.owner" content="(C) Copyright 2018" /> +<meta name="DC.Type" content="concept" /> +<meta name="DC.Title" content="Table and Column Statistics" /> +<meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html" /> +<meta name="prodname" content="Impala" /> +<meta name="prodname" content="Impala" /> +<meta 
name="version" content="Impala 3.0.x" /> +<meta name="version" content="Impala 3.0.x" /> +<meta name="DC.Format" content="XHTML" /> +<meta name="DC.Identifier" content="perf_stats" /> +<link rel="stylesheet" type="text/css" href="../commonltr.css" /> +<title>Table and Column Statistics</title> +</head> +<body id="perf_stats"> + <h1 class="title topictitle1" id="ariaid-title1">Table and Column Statistics</h1> + <div class="body conbody"> @@ -18,16 +38,24 @@ and how to produce them and keep them up to date. </p> + <p class="p toc inpage all"></p> + </div> - <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav><article class="topic concept nested1" aria-labelledby="perf_table_stats__table_stats" id="perf_stats__perf_table_stats"> + + <div class="related-links"> +<div class="familylinks"> +<div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div> +</div> +</div><div class="topic concept nested1" aria-labelledby="perf_table_stats__table_stats" id="perf_table_stats"> <h2 class="title topictitle2" id="perf_table_stats__table_stats">Overview of Table Statistics</h2> + <div class="body conbody"> <p class="p"> @@ -41,6 +69,7 @@ because they can be calculated cheaply, as part of gathering HDFS block metadata. </p> + <p class="p"> The following example shows table stats for an unpartitioned Parquet table. The values for the number and sizes of files are always available. Initially, the number of rows is @@ -49,6 +78,7 @@ in any unknown table stats values. </p> + <pre class="pre codeblock"><code> show table stats parquet_snappy; +-------+--------+---------+--------------+-------------------+---------+-------------------+... 
@@ -78,6 +108,7 @@ show table stats parquet_snappy; optimizations by using a combination of table and column statistics. </p> + <p class="p"> To check that table statistics are available for a table, and see the details of those statistics, use the statement <code class="ph codeph">SHOW TABLE STATS @@ -85,6 +116,7 @@ show table stats parquet_snappy; details. </p> + <p class="p"> If you use the Hive-based methods of gathering statistics, see <a class="xref" href="https://cwiki.apache.org/confluence/display/Hive/StatsDev" target="_blank">the @@ -93,20 +125,25 @@ show table stats parquet_snappy; potential configuration and scalability issues with the statistics-gathering process. </p> + <p class="p"> If you run the Hive statement <code class="ph codeph">ANALYZE TABLE COMPUTE STATISTICS FOR COLUMNS</code>, Impala can only use the resulting column statistics if the table is unpartitioned. Impala cannot use Hive-generated column statistics for a partitioned table. </p> + </div> - </article> - <article class="topic concept nested1" aria-labelledby="perf_column_stats__column_stats" id="perf_stats__perf_column_stats"> + </div> + + + <div class="topic concept nested1" aria-labelledby="perf_column_stats__column_stats" id="perf_column_stats"> <h2 class="title topictitle2" id="perf_column_stats__column_stats">Overview of Column Statistics</h2> + <div class="body conbody"> <p class="p"> @@ -119,6 +156,7 @@ show table stats parquet_snappy; internally the same way as join queries.</span> </p> + <p class="p"> The following example shows column stats for an unpartitioned Parquet table. The values for the maximum and average sizes of some types are always available, because those @@ -131,6 +169,7 @@ show table stats parquet_snappy; does not use that figure for query optimization.) 
</p> + <pre class="pre codeblock"><code> show column stats parquet_snappy; +-------------+----------+------------------+--------+----------+----------+ @@ -164,7 +203,7 @@ show column stats parquet_snappy; +-------------+----------+------------------+--------+----------+-------------------+ </code></pre> - <div class="note note note_note"><span class="note__title notetitle">Note:</span> + <div class="note note"><span class="notetitle">Note:</span> <p class="p"> For column statistics to be effective in Impala, you also need to have table statistics for the applicable tables, as described in @@ -172,9 +211,11 @@ show column stats parquet_snappy; <code class="ph codeph">COMPUTE STATS</code> statement, both table and column statistics are automatically gathered at the same time, for all columns in the table. </p> + </div> - <div class="note note note_note"><span class="note__title notetitle">Note:</span> Prior to Impala 1.4.0, + + <div class="note note"><span class="notetitle">Note:</span> Prior to Impala 1.4.0, <code class="ph codeph">COMPUTE STATS</code> counted the number of <code class="ph codeph">NULL</code> values in each column and recorded that figure in the metastore database. Because Impala does not currently use the @@ -182,6 +223,7 @@ show column stats parquet_snappy; higher speeds up the <code class="ph codeph">COMPUTE STATS</code> statement by skipping this <code class="ph codeph">NULL</code> counting. </div> + <p class="p"> To check whether column statistics are available for a particular set of columns, use the <code class="ph codeph">SHOW COLUMN STATS <var class="keyword varname">table_name</var></code> statement, or check @@ -190,20 +232,25 @@ show column stats parquet_snappy; <a class="xref" href="impala_explain.html#explain">EXPLAIN Statement</a> for details. 
</p> + <p class="p"> If you run the Hive statement <code class="ph codeph">ANALYZE TABLE COMPUTE STATISTICS FOR COLUMNS</code>, Impala can only use the resulting column statistics if the table is unpartitioned. Impala cannot use Hive-generated column statistics for a partitioned table. </p> + </div> - </article> - <article class="topic concept nested1" aria-labelledby="perf_stats_partitions__stats_partitions" id="perf_stats__perf_stats_partitions"> + </div> + + + <div class="topic concept nested1" aria-labelledby="perf_stats_partitions__stats_partitions" id="perf_stats_partitions"> <h2 class="title topictitle2" id="perf_stats_partitions__stats_partitions">How Table and Column Statistics Work for Partitioned Tables</h2> + <div class="body conbody"> <p class="p"> @@ -216,6 +263,7 @@ show column stats parquet_snappy; as a result. </p> + <p class="p"> The following examples show how table and column stats work with a partitioned table. The table for this example is partitioned by year, month, and day. For simplicity, the @@ -228,6 +276,7 @@ show column stats parquet_snappy; values for non-key columns are shown as -1. </p> + <pre class="pre codeblock"><code> show partitions year_month_day; +-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+... @@ -307,14 +356,18 @@ show column stats year_month_day; Impala cannot use Hive-generated column statistics for a partitioned table. </p> + </div> - </article> - <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="perf_stats__perf_generating_stats"> + </div> + + + <div class="topic concept nested1" aria-labelledby="ariaid-title5" id="perf_generating_stats"> <h2 class="title topictitle2" id="ariaid-title5">Generating Table and Column Statistics</h2> + <div class="body conbody"> <p class="p"> @@ -324,7 +377,8 @@ show column stats year_month_day; workflows which are explained below. 
</p> - <div class="note important note_important"><span class="note__title importanttitle">Important:</span> + + <div class="note important"><span class="importanttitle">Important:</span> <p class="p"> For a particular table, use either <code class="ph codeph">COMPUTE STATS</code> or <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>, but never combine the two or @@ -333,16 +387,20 @@ show column stats year_month_day; vice versa, drop all statistics by running <code class="ph codeph">DROP STATS</code> before making the switch. </p> + </div> + </div> - <article class="topic concept nested2" aria-labelledby="ariaid-title6" id="perf_generating_stats__concept_y2f_nfl_mdb"> + + <div class="topic concept nested2" aria-labelledby="ariaid-title6" id="concept_y2f_nfl_mdb"> <h3 class="title topictitle3" id="ariaid-title6">COMPUTE STATS</h3> + <div class="body conbody"> <p class="p"> @@ -351,6 +409,7 @@ show column stats year_month_day; table. The collection process is CPU-intensive and can take a long time to complete for very large tables. </p> + <div class="p"> To speed up <code class="ph codeph">COMPUTE STATS</code> consider the following options which can be combined. @@ -363,7 +422,9 @@ show column stats year_month_day; clauses. Other columns are good candidates to exclude from COMPUTE STATS. This feature is available since Impala 2.12. </p> + </li> + <li class="li"> <p class="p"> Set the MT_DOP query option to use more threads within each participating @@ -373,16 +434,22 @@ show column stats year_month_day; COMPUTE STATS claims most CPU cycles. This feature is available since Impala 2.8. </p> + </li> + <li class="li"> <p class="p"> Consider the experimental extrapolation and sampling features (see below) to further increase the efficiency of computing stats. </p> + </li> + </ul> + </div> + <p class="p"> <code class="ph codeph">COMPUTE STATS</code> is intended to be run periodically, e.g. 
weekly, or on-demand when the contents of a table have changed @@ -395,16 +462,20 @@ show column stats year_month_day; statistics. </p> + <p class="p"> If you reload a complete new set of data for a table, but the number of rows and number of distinct values for each column is relatively unchanged from before, you do not need to recompute stats for the table. </p> + </div> - <article class="topic concept nested3" aria-labelledby="ariaid-title7" id="concept_y2f_nfl_mdb__experimental_stats_features"> + + <div class="topic concept nested3" aria-labelledby="ariaid-title7" id="experimental_stats_features"> <h4 class="title topictitle4" id="ariaid-title7">Experimental: Extrapolation and Sampling</h4> + <div class="body conbody"> <div class="p"> Impala 2.12 and higher includes two experimental features to alleviate @@ -419,20 +490,26 @@ show column stats year_month_day; the scan cardinality based on those old partitions that have stats, and the new partitions without stats are treated as having 0 rows. </p> + </li> + <li class="li"> <p class="p"> The row counts of existing partitions become stale when data is added or dropped. </p> + </li> + <li class="li"> <p class="p"> Computing stats for tables with 100,000 or more partitions might fail or be very slow due to the high cost of updating the partition metadata in the Hive Metastore. </p> + </li> + <li class="li"> <p class="p"> With transient compute resources it is important to minimize the time @@ -441,30 +518,36 @@ show column stats year_month_day; quickly collect stats that are "good enough" as opposed to spending a lot of time and resources on computing full-fidelity stats. </p> + </li> + </ul> + For very large tables, it is often wasteful or impractical to run a full COMPUTE STATS to address the scenarios above on a frequent basis.
</div> + <p class="p"> The sampling feature makes COMPUTE STATS more efficient by processing a fraction of the table data, and the extrapolation feature aims to reduce the frequency at which COMPUTE STATS needs to be re-run by estimating the row count of new and modified partitions. </p> + <p class="p"> The sampling and extrapolation features are disabled by default. They can be enabled globally or for specific tables, as follows. Set the impalad start-up configuration "--enable_stats_extrapolation" to enable the features globally. To enable them only for a specific table, set the "impala.enable.stats.extrapolation" table property to "true" for the - desired table. The tbale-level property overrides the global setting, so + desired table. The table-level property overrides the global setting, so it is also possible to enable sampling and extrapolation globally, but disable them for specific tables by setting the table property to "false". Example: ALTER TABLE test_table SET TBLPROPERTIES("impala.enable.stats.extrapolation"="true") </p> - <div class="note note note_note"><span class="note__title notetitle">Note:</span> + + <div class="note note"><span class="notetitle">Note:</span> Why are these features experimental? Due to their probabilistic nature, it is possible that these features perform pathologically poorly on tables with extreme data/file/size distributions. Since it is not feasible for us @@ -475,10 +558,13 @@ show column stats year_month_day; We rely on user feedback to guide future improvements in statistics collection.
</div> + </div> - <article class="topic concept nested4" aria-labelledby="ariaid-title8" id="experimental_stats_features__experimental_stats_extrapolation"> + + <div class="topic concept nested4" aria-labelledby="ariaid-title8" id="experimental_stats_extrapolation"> <h5 class="title topictitle5" id="ariaid-title8">Stats Extrapolation</h5> + <div class="body conbody"> <p class="p"> The main idea of stats extrapolation is to estimate the row count of new @@ -496,17 +582,22 @@ show column stats year_month_day; the scan cardinality estimation ignores per-partition row counts. It only relies on the table-level statistics and the scanned data volume. </p> + <p class="p"> The SHOW TABLE STATS and EXPLAIN commands distinguish between row counts stored in the Hive Metastore, and the row counts extrapolated based on the above process. Consult the SHOW TABLE STATS and EXPLAIN documentation for more details. </p> + </div> - </article> - <article class="topic concept nested4" aria-labelledby="ariaid-title9" id="experimental_stats_features__experimental_stats_sampling"> + </div> + + + <div class="topic concept nested4" aria-labelledby="ariaid-title9" id="experimental_stats_sampling"> <h5 class="title topictitle5" id="ariaid-title9">Sampling</h5> + <div class="body conbody"> <p class="p"> A TABLESAMPLE clause may be added to COMPUTE STATS to limit the @@ -516,12 +607,14 @@ show column stats year_month_day; sampling was used. The following example runs COMPUTE STATS over a 10 percent data sample: COMPUTE STATS test_table TABLESAMPLE SYSTEM(10) </p> + <p class="p"> We have found that a 10 percent sampling rate typically offers a good tradeoff between statistics accuracy and execution cost. A sampling rate well below 10 percent has shown poor results and is not recommended. 
</p> - <div class="note important note_important"><span class="note__title importanttitle">Important:</span> + + <div class="note important"><span class="importanttitle">Important:</span> Sampling-based techniques sacrifice result accuracy for execution efficiency, so your mileage may vary for different tables and columns depending on their data distribution. The extrapolation procedure Impala @@ -529,15 +622,21 @@ show column stats year_month_day; non-deterministic, so your results may even vary between runs of COMPUTE STATS TABLESAMPLE, even if no data has changed. </div> + </div> - </article> - </article> - </article> - <article class="topic concept nested2" aria-labelledby="ariaid-title10" id="perf_generating_stats__concept_bmk_pfl_mdb"> + </div> + + </div> + + </div> + + + <div class="topic concept nested2" aria-labelledby="ariaid-title10" id="concept_bmk_pfl_mdb"> <h3 class="title topictitle3" id="ariaid-title10">COMPUTE INCREMENTAL STATS</h3> + <div class="body conbody"> <p class="p"> @@ -548,6 +647,7 @@ show column stats year_month_day; a specialized feature for partitioned tables. </p> + <p class="p"> When you compute incremental statistics for a partitioned table, by default Impala only processes those partitions that do not yet have incremental statistics. By processing @@ -555,6 +655,7 @@ show column stats year_month_day; overhead of reprocessing the entire table each time. </p> + <p class="p"> You can also compute or drop statistics for a specified subset of partitions by including a <code class="ph codeph">PARTITION</code> clause in the @@ -562,14 +663,18 @@ show column stats year_month_day; statement.
</p> - <div class="note important note_important"><span class="note__title importanttitle">Important:</span> - <p class="p"> - For a table with a huge number of partitions and many columns, the approximately 400 bytes - of metadata per column per partition can add up to significant memory overhead, as it must - be cached on the <span class="keyword cmdname">catalogd</span> host and on every <span class="keyword cmdname">impalad</span> host - that is eligible to be a coordinator. If this metadata for all tables combined exceeds 2 GB, - you might experience service downtime. - </p> + + <div class="note important"><span class="importanttitle">Important:</span> + <p class="p"> In Impala 3.0 and lower, approximately + 400 bytes of metadata per column per partition are needed for caching. + For tables with a large number of partitions and many columns, this can add up to a + significant memory overhead, because the metadata must be cached on the + <span class="keyword cmdname">catalogd</span> host and on every + <span class="keyword cmdname">impalad</span> host that is eligible to be a coordinator. + If this metadata for all tables combined exceeds 2 GB, you might experience + service downtime. In Impala 3.1 and higher, this issue is alleviated + by improved handling of incremental stats.</p> + <p class="p"> When you run <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> on a table for the first time, the statistics are computed again from scratch regardless of whether the table already @@ -577,13 +682,16 @@ show column stats year_month_day; for scanning the entire table when running <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> for the first time on a given table.
</p> + </div> + <p class="p"> The metadata for incremental statistics is handled differently from the original style of statistics: </p> + <ul class="ul"> <li class="li"> <p class="p"> @@ -595,8 +703,10 @@ show column stats year_month_day; incremental stats by issuing a <code class="ph codeph">DROP INCREMENTAL STATS</code> before running <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>. </p> + </li> + <li class="li"> <p class="p"> The <code class="ph codeph">SHOW TABLE STATS</code> and <code class="ph codeph">SHOW PARTITIONS</code> @@ -606,8 +716,10 @@ show column stats year_month_day; indicated by a value other than <code class="ph codeph">-1</code> under the <code class="ph codeph">#Rows</code> column. Impala query planning uses either kind of statistics when available. </p> + </li> + <li class="li"> <p class="p"> <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> takes more time than <code class="ph codeph">COMPUTE @@ -618,19 +730,24 @@ show column stats year_month_day; not updated with new partitions, use the original <code class="ph codeph">COMPUTE STATS</code> syntax. </p> + </li> + <li class="li"> <p class="p"> <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> uses some memory in the - <span class="keyword cmdname">catalogd</span> process, proportional to the number of partitions and - number of columns in the applicable table. The memory overhead is approximately 400 - bytes for each column in each partition. This memory is reserved in the - <span class="keyword cmdname">catalogd</span> daemon, the <span class="keyword cmdname">statestored</span> daemon, and - in each instance of the <span class="keyword cmdname">impalad</span> daemon. - </p> + <span class="keyword cmdname">catalogd</span> process, proportional to the number + of partitions and number of columns in the applicable table. The + memory overhead is approximately 400 bytes for each column in each + partition. 
This memory is reserved in the + <span class="keyword cmdname">catalogd</span> daemon, the + <span class="keyword cmdname">statestored</span> daemon, and in each instance of + the impalad daemon. </p> + </li> + <li class="li"> <p class="p"> In cases where new files are added to an existing partition, issue a @@ -638,8 +755,10 @@ show column stats year_month_day; INCREMENTAL STATS</code> and <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> sequence for the changed partition. </p> + </li> + <li class="li"> <p class="p"> The <code class="ph codeph">DROP INCREMENTAL STATS</code> statement operates only on a single @@ -647,14 +766,18 @@ show column stats year_month_day; partitions of a table, issue a <code class="ph codeph">DROP STATS</code> statement with no <code class="ph codeph">INCREMENTAL</code> or <code class="ph codeph">PARTITION</code> clauses. </p> + </li> + </ul> + <p class="p"> The following considerations apply to incremental statistics when the structure of an existing table is changed (known as <dfn class="term">schema evolution</dfn>): </p> + <ul class="ul"> <li class="li"> <p class="p"> @@ -662,47 +785,142 @@ show column stats year_month_day; statistics remain valid and <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> does not rescan any partitions. </p> + </li> + <li class="li"> <p class="p"> If you use an <code class="ph codeph">ALTER TABLE</code> statement to add a column, Impala rescans all partitions and fills in the appropriate column-level values the next time you run <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>. </p> + </li> + <li class="li"> <p class="p"> If you use an <code class="ph codeph">ALTER TABLE</code> statement to change the data type of a column, Impala rescans all partitions and fills in the appropriate column-level values the next time you run <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>. 
</p> + </li> + <li class="li"> <p class="p"> If you use an <code class="ph codeph">ALTER TABLE</code> statement to change the file format of a table, the existing statistics remain valid and a subsequent <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> does not rescan any partitions. </p> + </li> + </ul> + <p class="p"> See <a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a> and <a class="xref" href="impala_drop_stats.html#drop_stats">DROP STATS Statement</a> for syntax details. </p> + + </div> + + <div class="topic concept nested3" aria-labelledby="ariaid-title11" id="inc_stats_size_limit_bytes"> + <h4 class="title topictitle4" id="ariaid-title11">Maximum Serialized Stats Size</h4> + + <div class="body conbody"> + <p class="p">In Impala 3.0 and lower, when executing <code class="ph codeph">COMPUTE INCREMENTAL + STATS</code> on very large tables, use the configuration setting + <code class="ph codeph">--inc_stats_size_limit_bytes</code> to prevent Impala + from running out of memory while updating table metadata. If this + limit is reached, Impala will stop loading the table and return an + error. The error serves as an indication that <code class="ph codeph">COMPUTE + INCREMENTAL STATS</code> should not be used on the particular + table. Consider splitting the table and using regular <code class="ph codeph">COMPUTE + STATS</code> if possible. </p> + + + <p class="p"> The <code class="ph codeph">--inc_stats_size_limit_bytes</code> limit is set as + a safety check, to prevent Impala from hitting the maximum limit for + the table metadata. Note that the incremental stats covered by this limit are only one part of the + entire table's metadata, all of which together must be below 2 GB. </p> + + + <p class="p"> The default value for + <code class="ph codeph">--inc_stats_size_limit_bytes</code> is 209715200 bytes (200 + MB).
</p> + + + <p class="p"> To change the <code class="ph codeph">--inc_stats_size_limit_bytes</code> value, + restart impalad and catalogd with the new value specified in bytes, + for example, 1048576000 for approximately 1 GB. See <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a> for the steps to + change the option and restart Impala daemons. </p> + + + <div class="note attention"><span class="attentiontitle">Attention:</span> The + <code class="ph codeph">--inc_stats_size_limit_bytes</code> setting should be + increased with care. A large value for the setting, such as 1 GB or + more, can cause a spike in heap usage and can even crash + Impala. </div> + + <p class="p">In Impala 3.1 and higher, Impala improved how metadata is updated + when executing <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>, + significantly reducing the need for + <code class="ph codeph">--inc_stats_size_limit_bytes</code>. </p> + + </div> + + </div> + + <div class="topic concept nested3" aria-labelledby="ariaid-title12" id="pull_incremental_statistics"> + <h4 class="title topictitle4" id="ariaid-title12">Loading Incremental Statistics from Catalogd</h4> + + <div class="body conbody"> + <p class="p"> + Starting in Impala 3.1, a new configuration setting, + <code class="ph codeph">--pull_incremental_statistics</code>, was added and set + to <code class="ph codeph">true</code> by default. When you start Impala catalogd + and impalad coordinators with this setting enabled: + </p> + + <ul class="ul"> + <li class="li"> Newly created incremental stats will be smaller, thus + reducing memory pressure on the catalogd daemon. Your users can + keep more tables and partitions in the same catalog and have lower + chances of crashing catalogd due to out-of-memory issues.
</li> + + <li class="li"> + Incremental stats will not be replicated to impalad and will be + accessed on demand from catalogd, resulting in a reduced memory + footprint of impalad. + </li> + + </ul> + + <p class="p"> + We do not recommend you change the default setting of + <code class="ph codeph">--pull_incremental_statistics</code>. + </p> + + </div> + </div> - </article> - </article> + </div> - <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="perf_stats__perf_stats_checking"> - <h2 class="title topictitle2" id="ariaid-title11">Detecting Missing Statistics</h2> + </div> + + + <div class="topic concept nested1" aria-labelledby="ariaid-title13" id="perf_stats_checking"> + + <h2 class="title topictitle2" id="ariaid-title13">Detecting Missing Statistics</h2> + <div class="body conbody"> @@ -715,12 +933,14 @@ show column stats year_month_day; the <code class="ph codeph">#Rows</code> field changes to an accurate value. </p> + <p class="p"> The following example shows a table that initially does not have any statistics. The <code class="ph codeph">SHOW TABLE STATS</code> statement displays different values for <code class="ph codeph">#Rows</code> before and after the <code class="ph codeph">COMPUTE STATS</code> operation. </p> + <pre class="pre codeblock"><code>[localhost:21000] > create table no_stats (x int); [localhost:21000] > show table stats no_stats; +-------+--------+------+--------------+--------+-------------------+ @@ -750,6 +970,7 @@ show column stats year_month_day; adding a new partition. 
</p> + <pre class="pre codeblock"><code>[localhost:21000] > create table no_stats_partitioned (x int) partitioned by (year smallint); [localhost:21000] > show table stats no_stats_partitioned; +-------+-------+--------+------+--------------+--------+-------------------+ @@ -781,7 +1002,7 @@ show column stats year_month_day; +-------+-------+--------+------+--------------+--------+-------------------+ </code></pre> - <div class="note note note_note"><span class="note__title notetitle">Note:</span> + <div class="note note"><span class="notetitle">Note:</span> Because the default <code class="ph codeph">COMPUTE STATS</code> statement creates and updates statistics for all partitions in a table, if you expect to frequently add new partitions, use the <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> syntax instead, which @@ -789,6 +1010,7 @@ show column stats year_month_day; that do not already have incremental stats. </div> + <p class="p"> If checking each individual table is impractical, due to a large number of tables or views that hide the underlying base tables, you can also check for missing statistics @@ -800,6 +1022,7 @@ show column stats year_month_day; if any tables or partitions involved in the query do not have statistics. </p> + <pre class="pre codeblock"><code>[localhost:21000] > create table no_stats (x int); [localhost:21000] > explain select count(*) from no_stats; +------------------------------------------------------------------------------------+ @@ -830,6 +1053,7 @@ show column stats year_month_day; see warnings or not for different queries against the same table: </p> + <pre class="pre codeblock"><code>-- No warning because all the partitions for the year 2012 have stats. EXPLAIN SELECT ... FROM t1 WHERE year = 2012; @@ -844,17 +1068,22 @@ EXPLAIN SELECT ... FROM t1 WHERE year BETWEEN 2006 AND 2009; <var class="keyword varname">table_name</var></code>. 
</p> + </div> - </article> - <article class="topic concept nested1" aria-labelledby="ariaid-title12" id="perf_stats__concept_s3c_4gl_mdb"> + </div> + + + <div class="topic concept nested1" aria-labelledby="ariaid-title14" id="concept_s3c_4gl_mdb"> - <h2 class="title topictitle2" id="ariaid-title12">Manually Setting Table and Column Statistics with ALTER TABLE</h2> + <h2 class="title topictitle2" id="ariaid-title14">Manually Setting Table and Column Statistics with ALTER TABLE</h2> - <article class="topic concept nested2" aria-labelledby="ariaid-title13" id="concept_s3c_4gl_mdb__concept_wpt_pgl_mdb"> - <h3 class="title topictitle3" id="ariaid-title13">Setting Table Statistics</h3> + <div class="topic concept nested2" aria-labelledby="ariaid-title15" id="concept_wpt_pgl_mdb"> + + <h3 class="title topictitle3" id="ariaid-title15">Setting Table Statistics</h3> + <div class="body conbody"> @@ -870,6 +1099,7 @@ EXPLAIN SELECT ... FROM t1 WHERE year BETWEEN 2006 AND 2009; statement: </p> + <pre class="pre codeblock"><code> -- Set total number of rows. Applies to both unpartitioned and partitioned tables. alter table <var class="keyword varname">table_name</var> set tblproperties('numRows'='<var class="keyword varname">new_value</var>', 'STATS_GENERATED_VIA_STATS_TASK'='true'); @@ -887,6 +1117,7 @@ alter table <var class="keyword varname">table_name</var> partition (<var class= for the Hive metastore.) </p> + <pre class="pre codeblock"><code>create table analysis_data stored as parquet as select * from raw_data; Inserted 1000000000 rows in 181.98s compute stats analysis_data; @@ -900,6 +1131,7 @@ alter table analysis_data set tblproperties('numRows'='1001000000', 'STATS_GENER of rows for the whole table: </p> + <pre class="pre codeblock"><code>-- If the table originally contained 1 million rows, and we add another partition with 30 thousand rows, -- change the numRows property for the partition and the overall table. 
alter table partitioned_data partition(year=2009, month=4) set tblproperties ('numRows'='30000', 'STATS_GENERATED_VIA_STATS_TASK'='true'); @@ -918,13 +1150,17 @@ alter table partitioned_data set tblproperties ('numRows'='1030000', 'STATS_GENE <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> operations. </p> + </div> - </article> - <article class="topic concept nested2" aria-labelledby="ariaid-title14" id="concept_s3c_4gl_mdb__concept_asb_vgl_mdb"> + </div> + + + <div class="topic concept nested2" aria-labelledby="ariaid-title16" id="concept_asb_vgl_mdb"> + + <h3 class="title topictitle3" id="ariaid-title16">Setting Column Statistics</h3> - <h3 class="title topictitle3" id="ariaid-title14">Setting Column Statistics</h3> <div class="body conbody"> @@ -936,6 +1172,7 @@ alter table partitioned_data set tblproperties ('numRows'='1030000', 'STATS_GENE frequently enough to keep up with data changes for a huge table. </p> + <div class="p"> You specify a case-insensitive symbolic name for the kind of statistics: <code class="ph codeph">numDVs</code>, <code class="ph codeph">numNulls</code>, <code class="ph codeph">avgSize</code>, <code class="ph codeph">maxSize</code>. @@ -963,15 +1200,20 @@ show column stats t1; </code></pre> </div> + </div> - </article> - </article> + </div> + + + </div> + + + <div class="topic concept nested1" aria-labelledby="ariaid-title17" id="perf_stats_examples"> - <article class="topic concept nested1" aria-labelledby="ariaid-title15" id="perf_stats__perf_stats_examples"> + <h2 class="title topictitle2" id="ariaid-title17">Examples of Using Table and Column Statistics with Impala</h2> - <h2 class="title topictitle2" id="ariaid-title15">Examples of Using Table and Column Statistics with Impala</h2> <div class="body conbody"> @@ -982,6 +1224,7 @@ show column stats t1; aspects of how Impala uses statistics to help optimize queries. 
</p> + <p class="p"> This example shows table and column statistics for the <code class="ph codeph">STORE</code> column used in the <a class="xref" href="http://www.tpc.org/tpcds/" target="_blank">TPC-DS @@ -995,6 +1238,7 @@ show column stats t1; <code class="ph codeph">TIMESTAMP</code>. </p> + <pre class="pre codeblock"><code>[localhost:21000] > show table stats store; +-------+--------+--------+--------+ | #Rows | #Files | Size | Format | @@ -1048,6 +1292,7 @@ Returned 29 row(s) in 0.04s</code></pre> columns: </p> + <pre class="pre codeblock"><code>[localhost:21000] > compute stats store; +------------------------------------------+ | summary | @@ -1109,6 +1354,7 @@ Returned 29 row(s) in 0.04s</code></pre> </p> + <pre class="pre codeblock"><code>localhost:21000] > describe census; +------+----------+---------+ | name | type | comment | @@ -1145,6 +1391,7 @@ Returned 2 row(s) in 0.02s</code></pre> STATS</code> statement in Impala. </p> + <pre class="pre codeblock"><code>[localhost:21000] > compute stats census; +-----------------------------------------+ | summary | @@ -1185,8 +1432,12 @@ Returned 2 row(s) in 0.02s</code></pre> performance. </p> + </div> - </article> -</article></main></body></html> + </div> + + +</body> +</html> \ No newline at end of file
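The manual ALTER TABLE procedure in impala_perf_stats.html above involves simple arithmetic. A table of 1 million rows that gains a 30 thousand row partition ends up with numRows set to 30000 on the new partition and 1030000 on the table, both marked with 'STATS_GENERATED_VIA_STATS_TASK'='true'. The following minimal Python sketch generates those two statements. It is not part of Impala. The function name numrows_statements and its parameters are hypothetical, and it assumes numeric partition key values, which are written unquoted.

```python
def numrows_statements(table, partition_spec, partition_rows, previous_total):
    """Return ALTER TABLE statements that record the row count of a newly
    added partition and the updated row count of the whole table.

    Hypothetical helper, not an Impala API. partition_spec is an ordered list
    of (column, value) pairs. Values are assumed numeric and are written
    unquoted; string partition keys would need quoting (not handled here).
    """
    if partition_rows < 0 or previous_total < 0:
        raise ValueError("row counts must be non-negative")
    # Same property pair as the documented ALTER TABLE examples.
    props = "'STATS_GENERATED_VIA_STATS_TASK'='true'"
    spec = ", ".join(f"{col}={val}" for col, val in partition_spec)
    # Table-level numRows = previous total + rows in the new partition.
    new_total = previous_total + partition_rows
    return [
        f"alter table {table} partition({spec}) set tblproperties "
        f"('numRows'='{partition_rows}', {props});",
        f"alter table {table} set tblproperties "
        f"('numRows'='{new_total}', {props});",
    ]


if __name__ == "__main__":
    # Documented scenario: 1 million existing rows plus a 30,000-row partition.
    for stmt in numrows_statements(
        "partitioned_data", [("year", 2009), ("month", 4)], 30000, 1000000
    ):
        print(stmt)
```

Run as a script, it prints the same two statements as the partitioned_data example, with 1030000 as the new table total. Keeping the partition spec as an ordered list preserves the column order of the PARTITION clause.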
http://git-wip-us.apache.org/repos/asf/impala/blob/b4ad38a9/docs/build/html/topics/impala_perf_testing.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_perf_testing.html b/docs/build/html/topics/impala_perf_testing.html index 1ecf66f..cc11e8c 100644 --- a/docs/build/html/topics/impala_perf_testing.html +++ b/docs/build/html/topics/impala_perf_testing.html @@ -1,8 +1,28 @@ +<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE html - SYSTEM "about:legacy-compat"> -<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.12x"><meta name="version" content="Impala 2.12x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="performance_testing"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Testing Impala Performance</title></head><body id="performance_testing"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> +<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> +<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> + +<meta name="copyright" content="(C) Copyright 2018" /> +<meta name="DC.rights.owner" content="(C) Copyright 2018" /> +<meta name="DC.Type" content="concept" /> +<meta name="DC.Title" content="Testing Impala Performance" /> +<meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html" /> +<meta name="prodname" content="Impala" /> +<meta name="prodname" 
content="Impala" /> +<meta name="version" content="Impala 3.0.x" /> +<meta name="version" content="Impala 3.0.x" /> +<meta name="DC.Format" content="XHTML" /> +<meta name="DC.Identifier" content="performance_testing" /> +<link rel="stylesheet" type="text/css" href="../commonltr.css" /> +<title>Testing Impala Performance</title> +</head> +<body id="performance_testing"> + <h1 class="title topictitle1" id="ariaid-title1">Testing Impala Performance</h1> + <div class="body conbody"> @@ -13,7 +33,8 @@ configuration. These procedures can be used to verify that Impala is set up correctly. </p> - <section class="section" id="performance_testing__checking_config_performance"><h2 class="title sectiontitle">Checking Impala Configuration Values</h2> + + <div class="section" id="performance_testing__checking_config_performance"><h2 class="title sectiontitle">Checking Impala Configuration Values</h2> @@ -21,21 +42,25 @@ You can inspect Impala configuration values by connecting to your Impala server using a browser. </p> + <p class="p"> <strong class="ph b">To check Impala configuration values:</strong> </p> + <ol class="ol"> <li class="li"> Use a browser to connect to one of the hosts running <code class="ph codeph">impalad</code> in your environment. Connect using an address of the form <code class="ph codeph">http://<var class="keyword varname">hostname</var>:<var class="keyword varname">port</var>/varz</code>. - <div class="note note note_note"><span class="note__title notetitle">Note:</span> + <div class="note note"><span class="notetitle">Note:</span> In the preceding example, replace <code class="ph codeph">hostname</code> and <code class="ph codeph">port</code> with the name and port of your Impala server. The default port is 25000. </div> + </li> + <li class="li"> Review the configured values. <p class="p"> @@ -43,13 +68,17 @@ would check that the value for <code class="ph codeph">dfs.datanode.hdfs-blocks-metadata.enabled</code> is <code class="ph codeph">true</code>. 
</p> + </li> + </ol> + <p class="p" id="performance_testing__p_31"> <strong class="ph b">To check data locality:</strong> </p> + <ol class="ol"> <li class="li"> Execute a query on a dataset that is available across multiple nodes. For example, for a table named @@ -57,13 +86,16 @@ <pre class="pre codeblock"><code>[impalad-host:21000] > SELECT COUNT (*) FROM MyTable</code></pre> </li> + <li class="li"> After the query completes, review the contents of the Impala logs. You should find a recent message similar to the following: <pre class="pre codeblock"><code>Total remote scan volume = 0</code></pre> </li> + </ol> + <p class="p"> The presence of remote scans may indicate <code class="ph codeph">impalad</code> is not running on the correct nodes. This can be because some DataNodes do not have <code class="ph codeph">impalad</code> running or it can be because the @@ -71,10 +103,12 @@ <code class="ph codeph">impalad</code> instances. </p> + <p class="p"> <strong class="ph b">To understand the causes of this issue:</strong> </p> + <ol class="ol"> <li class="li"> Connect to the debugging web server. By default, this server runs on port 25000. This page lists all @@ -83,22 +117,27 @@ <code class="ph codeph">impalad</code> is started on all DataNodes. </li> + <li class="li"> If you are using multi-homed hosts, ensure that the Impala daemon's hostname resolves to the interface on which <code class="ph codeph">impalad</code> is running. The hostname Impala is using is displayed when - <code class="ph codeph">impalad</code> starts. To explicitly set the hostname, use the <code class="ph codeph">--hostname</code> flag. + <code class="ph codeph">impalad</code> starts. To explicitly set the hostname, use the <code class="ph codeph">--hostname</code> flag. </li> + <li class="li"> Check that <code class="ph codeph">statestored</code> is running as expected.
Review the contents of the state store log to ensure all instances of <code class="ph codeph">impalad</code> are listed as having connected to the state store. </li> + </ol> - </section> - <section class="section" id="performance_testing__checking_config_logs"><h2 class="title sectiontitle">Reviewing Impala Logs</h2> + </div> + + + <div class="section" id="performance_testing__checking_config_logs"><h2 class="title sectiontitle">Reviewing Impala Logs</h2> @@ -111,42 +150,70 @@ <a class="xref" href="impala_logging.html#logging">Using Impala Logging</a>. Log messages and their interpretations are as follows: </p> - <table class="table"><caption></caption><colgroup><col style="width:75%"><col style="width:25%"></colgroup><thead class="thead"> + + +<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" class="table" frame="border" border="1" rules="all"><colgroup><col style="width:75%" /><col style="width:25%" /></colgroup><thead class="thead" style="text-align:left;"> <tr class="row"> - <th class="entry nocellnorowborder" id="performance_testing__checking_config_logs__entry__1"> + <th class="entry nocellnorowborder" style="vertical-align:top;" id="d141767e230"> Log Message </th> - <th class="entry nocellnorowborder" id="performance_testing__checking_config_logs__entry__2"> + + <th class="entry cell-norowborder" style="vertical-align:top;" id="d141767e233"> Interpretation </th> + </tr> - </thead><tbody class="tbody"> + + </thead> +<tbody class="tbody"> <tr class="row"> - <td class="entry nocellnorowborder" headers="performance_testing__checking_config_logs__entry__1 "> + <td class="entry nocellnorowborder" style="vertical-align:top;" headers="d141767e230 "> <div class="p"> <pre class="pre">Unknown disk id. This will negatively affect performance. 
Check your hdfs settings to enable block location metadata </pre> + </div> + </td> - <td class="entry nocellnorowborder" headers="performance_testing__checking_config_logs__entry__2 "> + + <td class="entry cell-norowborder" style="vertical-align:top;" headers="d141767e233 "> <p class="p"> Tracking block locality is not enabled. </p> + </td> + </tr> + <tr class="row"> - <td class="entry nocellnorowborder" headers="performance_testing__checking_config_logs__entry__1 "> + <td class="entry row-nocellborder" style="vertical-align:top;" headers="d141767e230 "> <div class="p"> <pre class="pre">Unable to load native-hadoop library for your platform... using builtin-java classes where applicable</pre> + </div> + </td> - <td class="entry nocellnorowborder" headers="performance_testing__checking_config_logs__entry__2 "> + + <td class="entry cellrowborder" style="vertical-align:top;" headers="d141767e233 "> <p class="p"> Native checksumming is not enabled. </p> + </td> + </tr> - </tbody></table> - </section> + + </tbody> +</table> +</div> + + </div> + </div> -<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav></article></main></body></html> \ No newline at end of file + +<div class="related-links"> +<div class="familylinks"> +<div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div> +</div> +</div></body> +</html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/impala/blob/b4ad38a9/docs/build/html/topics/impala_performance.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_performance.html b/docs/build/html/topics/impala_performance.html index 7ff8e5c..6938f8c 100644 --- a/docs/build/html/topics/impala_performance.html 
+++ b/docs/build/html/topics/impala_performance.html @@ -1,8 +1,37 @@ +<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE html - SYSTEM "about:legacy-compat"> -<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_cookbook.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_joins.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_stats.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_benchmarking.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_resources.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_runtime_filtering.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_hdfs_caching.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_testing.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_explain_plan.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_skew.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.12x"><meta name="version" content="Impala 2.12x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="performance"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Tuning Impala for Performance</title></head><body id="performance"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> +<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> +<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> + +<meta
name="copyright" content="(C) Copyright 2018" /> +<meta name="DC.rights.owner" content="(C) Copyright 2018" /> +<meta name="DC.Type" content="concept" /> +<meta name="DC.Title" content="Tuning Impala for Performance" /> +<meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_cookbook.html" /> +<meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_joins.html" /> +<meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_stats.html" /> +<meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_benchmarking.html" /> +<meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_resources.html" /> +<meta name="DC.Relation" scheme="URI" content="../topics/impala_runtime_filtering.html" /> +<meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_hdfs_caching.html" /> +<meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_testing.html" /> +<meta name="DC.Relation" scheme="URI" content="../topics/impala_explain_plan.html" /> +<meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_skew.html" /> +<meta name="prodname" content="Impala" /> +<meta name="prodname" content="Impala" /> +<meta name="version" content="Impala 3.0.x" /> +<meta name="version" content="Impala 3.0.x" /> +<meta name="DC.Format" content="XHTML" /> +<meta name="DC.Identifier" content="performance" /> +<link rel="stylesheet" type="text/css" href="../commonltr.css" /> +<title>Tuning Impala for Performance</title> +</head> +<body id="performance"> + <h1 class="title topictitle1" id="ariaid-title1">Tuning Impala for Performance</h1> + @@ -13,6 +42,7 @@ tuning, monitoring, and benchmarking Impala queries and other SQL operations. </p> + <p class="p"> This section also describes techniques for maximizing Impala scalability. Scalability is tied to performance: it means that performance remains high as the system workload increases. For example, reducing the disk I/O @@ -23,14 +53,17 @@ without running out of memory. 
</p> - <div class="note note note_note"><span class="note__title notetitle">Note:</span> + + <div class="note note"><span class="notetitle">Note:</span> <p class="p"> Before starting any performance tuning or benchmarking, make sure your system is configured with all the recommended minimum hardware requirements from <a class="xref" href="impala_prereqs.html#prereqs_hardware">Hardware Requirements</a> and software settings from <a class="xref" href="impala_config_performance.html#config_performance">Post-Installation Configuration for Impala</a>. </p> + </div> + <ul class="ul"> <li class="li"> <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a>. This technique physically divides the data based on @@ -38,6 +71,7 @@ the data in a table. </li> + <li class="li"> <a class="xref" href="impala_perf_joins.html#perf_joins">Performance Considerations for Join Queries</a>. Joins are the main class of queries that you can tune at the SQL level, as opposed to changing physical factors such as the file format or the hardware @@ -45,6 +79,7 @@ <a class="xref" href="impala_perf_stats.html#perf_table_stats">Overview of Table Statistics</a> are also important primarily for join performance. </li> + <li class="li"> <a class="xref" href="impala_perf_stats.html#perf_table_stats">Overview of Table Statistics</a> and <a class="xref" href="impala_perf_stats.html#perf_column_stats">Overview of Column Statistics</a>. Gathering table and column statistics, using the @@ -55,16 +90,19 @@ <code class="ph codeph">ANALYZE TABLE</code> statement in Hive.) </li> + <li class="li"> <a class="xref" href="impala_perf_testing.html#performance_testing">Testing Impala Performance</a>. Do some post-setup testing to ensure Impala is using optimal settings for performance, before conducting any benchmark tests. </li> + <li class="li"> <a class="xref" href="impala_perf_benchmarking.html#perf_benchmarks">Benchmarking Impala Queries</a>. 
The configuration and sample data that you use for initial experiments with Impala is often not appropriate for doing performance tests. </li> + <li class="li"> <a class="xref" href="impala_perf_resources.html#mem_limits">Controlling Impala Resource Usage</a>. The more memory Impala can utilize, the better query performance you can expect. In a cluster running other kinds of workloads as well, you must make tradeoffs @@ -72,26 +110,32 @@ Impala can use. </li> + <li class="li"> <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a>. Queries against data stored in the Amazon Simple Storage Service (S3) have different performance characteristics than when the data is stored in HDFS. </li> + </ul> + <p class="p toc"></p> + <p class="p"> A good source of tips related to scalability and performance tuning is the <a class="xref" href="http://www.slideshare.net/cloudera/the-impala-cookbook-42530186" target="_blank">Impala Cookbook</a> presentation. These slides are updated periodically as new features come out and new benchmarks are performed. 
</p> + </div> + @@ -113,4 +157,28 @@ -<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_perf_cookbook.html">Impala Performance Guidelines and Best Practices</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_perf_joins.html">Performance Considerations for Join Queries</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_perf_stats.html">Table and Column Statistics</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_perf_benchmarking.html">Benchmarking Impala Queries</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_perf_resources.html">Controlling Impala Resource Usage</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_runtime_filtering.html">Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_perf_hdfs_caching.html">Using HDFS Caching with Impala (Impala 2.1 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_perf_testing.html">Testing Impala Performance</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_explain_plan.html">Understanding Impala Query Performance - EXPLAIN Plans and Query Profiles</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_perf_skew.html">Detecting and Correcting HDFS Block Skew Conditions</a></strong><br></li></ul></nav></article></main></body></html> \ No newline at end of file +<div class="related-links"> +<ul class="ullinks"> +<li class="link ulchildlink"><strong><a href="../topics/impala_perf_cookbook.html">Impala Performance Guidelines and Best Practices</a></strong><br /> +</li> +<li class="link ulchildlink"><strong><a href="../topics/impala_perf_joins.html">Performance Considerations for
Join Queries</a></strong><br /> +</li> +<li class="link ulchildlink"><strong><a href="../topics/impala_perf_stats.html">Table and Column Statistics</a></strong><br /> +</li> +<li class="link ulchildlink"><strong><a href="../topics/impala_perf_benchmarking.html">Benchmarking Impala Queries</a></strong><br /> +</li> +<li class="link ulchildlink"><strong><a href="../topics/impala_perf_resources.html">Controlling Impala Resource Usage</a></strong><br /> +</li> +<li class="link ulchildlink"><strong><a href="../topics/impala_runtime_filtering.html">Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</a></strong><br /> +</li> +<li class="link ulchildlink"><strong><a href="../topics/impala_perf_hdfs_caching.html">Using HDFS Caching with Impala (Impala 2.1 or higher only)</a></strong><br /> +</li> +<li class="link ulchildlink"><strong><a href="../topics/impala_perf_testing.html">Testing Impala Performance</a></strong><br /> +</li> +<li class="link ulchildlink"><strong><a href="../topics/impala_explain_plan.html">Understanding Impala Query Performance - EXPLAIN Plans and Query Profiles</a></strong><br /> +</li> +<li class="link ulchildlink"><strong><a href="../topics/impala_perf_skew.html">Detecting and Correcting HDFS Block Skew Conditions</a></strong><br /> +</li> +</ul> +</div></body> +</html> \ No newline at end of file http://git-wip-us.apache.org/repos/asf/impala/blob/b4ad38a9/docs/build/html/topics/impala_planning.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_planning.html b/docs/build/html/topics/impala_planning.html index cedbd58..c769d3e 100644 --- a/docs/build/html/topics/impala_planning.html +++ b/docs/build/html/topics/impala_planning.html @@ -1,8 +1,29 @@ +<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE html - SYSTEM "about:legacy-compat"> -<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" 
content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_prereqs.html#prereqs"><meta name="DC.Relation" scheme="URI" content="../topics/impala_schema_design.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.12x"><meta name="version" content="Impala 2.12x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="planning"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Planning for Impala Deployment</title></head><body id="planning"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> +<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> +<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> + +<meta name="copyright" content="(C) Copyright 2018" /> +<meta name="DC.rights.owner" content="(C) Copyright 2018" /> +<meta name="DC.Type" content="concept" /> +<meta name="DC.Title" content="Planning for Impala Deployment" /> +<meta name="DC.Relation" scheme="URI" content="../topics/impala_prereqs.html#prereqs" /> +<meta name="DC.Relation" scheme="URI" content="../topics/impala_schema_design.html" /> +<meta name="prodname" content="Impala" /> +<meta name="prodname" content="Impala" /> +<meta name="version" content="Impala 3.0.x" /> +<meta name="version" content="Impala 3.0.x" /> +<meta name="DC.Format" content="XHTML" /> +<meta name="DC.Identifier" content="planning" /> +<link rel="stylesheet" type="text/css" href="../commonltr.css" /> +<title>Planning for Impala Deployment</title> +</head> +<body id="planning"> + <h1 class="title topictitle1" id="ariaid-title1">Planning for Impala Deployment</h1> + @@ -15,6 +36,17 @@ processes follow the best practices for Impala. 
</p> + <p class="p toc"></p> + </div> -<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_prereqs.html#prereqs">Impala Requirements</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_schema_design.html">Guidelines for Designing Impala Schemas</a></strong><br></li></ul></nav></article></main></body></html> \ No newline at end of file + +<div class="related-links"> +<ul class="ullinks"> +<li class="link ulchildlink"><strong><a href="../topics/impala_prereqs.html#prereqs">Impala Requirements</a></strong><br /> +</li> +<li class="link ulchildlink"><strong><a href="../topics/impala_schema_design.html">Guidelines for Designing Impala Schemas</a></strong><br /> +</li> +</ul> +</div></body> +</html> \ No newline at end of file
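The checks described in impala_perf_testing.html above can be automated with a small scan over log lines. These checks are the two messages in the Reviewing Impala Logs table and the Total remote scan volume line from the data-locality check. The following minimal Python sketch makes several assumptions. The helper name review_log is hypothetical. Messages are matched by substring. Any value other than 0 after "Total remote scan volume =" is treated as a sign of remote scans, because the format of non-zero values is not documented here.

```python
import re

# Log messages and interpretations from the "Reviewing Impala Logs" table.
# Substring matching is an assumption: real log lines carry extra prefixes.
KNOWN_MESSAGES = {
    "Unknown disk id. This will negatively affect performance.":
        "Tracking block locality is not enabled.",
    "Unable to load native-hadoop library for your platform":
        "Native checksumming is not enabled.",
}

# Line from the data-locality check; the format of non-zero values is assumed.
REMOTE_SCAN_RE = re.compile(r"Total remote scan volume = (.*)")


def review_log(lines):
    """Return interpretations for known messages found in an iterable of
    log lines, without duplicates, in first-seen order (hypothetical helper)."""
    findings = []

    def note(message):
        if message not in findings:
            findings.append(message)

    for line in lines:
        for fragment, meaning in KNOWN_MESSAGES.items():
            if fragment in line:
                note(meaning)
        match = REMOTE_SCAN_RE.search(line)
        if match:
            volume = match.group(1).strip()
            if volume != "0":
                note(f"Remote scans detected ({volume}); check that impalad "
                     "runs on every DataNode.")
    return findings


if __name__ == "__main__":
    demo = [
        "Unable to load native-hadoop library for your platform... "
        "using builtin-java classes where applicable",
        "Total remote scan volume = 0",
    ]
    for finding in review_log(demo):
        print(finding)  # prints: Native checksumming is not enabled.
```

To scan a real log, pass an open file object, for example review_log(open(path)). The log file location depends on your installation.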