[15/51] [partial] impala git commit: [DOCS] Impala 3.1 Docs to be published

mikeb Thu, 06 Dec 2018 15:21:29 -0800

http://git-wip-us.apache.org/repos/asf/impala/blob/b4ad38a9/docs/build/html/topics/impala_orc.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_orc.html 
b/docs/build/html/topics/impala_orc.html
new file mode 100644
index 0000000..a6776ae
--- /dev/null
+++ b/docs/build/html/topics/impala_orc.html
@@ -0,0 +1,368 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE html
+  PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" 
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd";>
+<html xmlns="http://www.w3.org/1999/xhtml"; xml:lang="en" lang="en">
+<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
+
+<meta name="copyright" content="(C) Copyright 2018" />
+<meta name="DC.rights.owner" content="(C) Copyright 2018" />
+<meta name="DC.Type" content="concept" />
+<meta name="DC.Title" content="Using the ORC File Format with Impala Tables" />
+<meta name="DC.Relation" scheme="URI" 
content="../topics/impala_file_formats.html" />
+<meta name="prodname" content="Impala" />
+<meta name="version" content="Impala 3.0.x" />
+<meta name="DC.Format" content="XHTML" />
+<meta name="DC.Identifier" content="orc" />
+<link rel="stylesheet" type="text/css" href="../commonltr.css" />
+<title>Using the ORC File Format with Impala Tables</title>
+</head>
+<body id="orc">
+
+
+  <h1 class="title topictitle1" id="ariaid-title1">Using the ORC File Format 
with Impala Tables</h1>
+
+
+
+
+  <div class="body conbody">
+
+    <p class="p">
+
+      Impala supports using ORC data files as an experimental feature since 
2.12.
+      To disable it, set --enable_orc_scanner to false when starting the 
cluster.
+    </p>
+
+
+
+<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" 
class="table" frame="border" border="1" rules="all"><caption><span 
class="tablecap"><span class="table--title-label">Table 1. </span>ORC Format 
Support in Impala</span></caption><colgroup><col style="width:10%" /><col 
style="width:10%" /><col style="width:20%" /><col style="width:30%" /><col 
style="width:30%" /></colgroup><thead class="thead" style="text-align:left;">
+          <tr class="row">
+            <th class="entry nocellnorowborder" style="vertical-align:top;" 
id="d131791e70">
+              File Type
+            </th>
+
+            <th class="entry nocellnorowborder" style="vertical-align:top;" 
id="d131791e73">
+              Format
+            </th>
+
+            <th class="entry nocellnorowborder" style="vertical-align:top;" 
id="d131791e76">
+              Compression Codecs
+            </th>
+
+            <th class="entry nocellnorowborder" style="vertical-align:top;" 
id="d131791e79">
+              Impala Can CREATE?
+            </th>
+
+            <th class="entry cell-norowborder" style="vertical-align:top;" 
id="d131791e82">
+              Impala Can INSERT?
+            </th>
+
+          </tr>
+
+        </thead>
+<tbody class="tbody">
+          <tr class="row">
+            <td class="entry row-nocellborder" style="vertical-align:top;" 
headers="d131791e70 ">
+              <a class="xref" href="impala_orc.html#orc">ORC</a>
+            </td>
+
+            <td class="entry row-nocellborder" style="vertical-align:top;" 
headers="d131791e73 ">
+              Structured
+            </td>
+
+            <td class="entry row-nocellborder" style="vertical-align:top;" 
headers="d131791e76 ">
+              gzip, Snappy, LZO, LZ4; currently gzip by default
+            </td>
+
+            <td class="entry row-nocellborder" style="vertical-align:top;" 
headers="d131791e79 ">
+              Yes, in Impala 2.12.0 and higher.
+            </td>
+
+            <td class="entry cellrowborder" style="vertical-align:top;" 
headers="d131791e82 ">
+              No. Import data by using <code class="ph codeph">LOAD 
DATA</code> on data files already in the right format, or use
+              <code class="ph codeph">INSERT</code> in Hive followed by <code 
class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> 
in Impala.
+            </td>
+
+          </tr>
+
+        </tbody>
+</table>
+</div>
+
+
+    <p class="p toc inpage"></p>
+
+  </div>
+
+
+  <div class="related-links">
+<div class="familylinks">
+<div class="parentlink"><strong>Parent topic:</strong> <a class="link" 
href="../topics/impala_file_formats.html">How Impala Works with Hadoop File 
Formats</a></div>
+</div>
+</div><div class="topic concept nested1" aria-labelledby="ariaid-title2" 
id="orc_create">
+
+    <h2 class="title topictitle2" id="ariaid-title2">Creating ORC Tables and 
Loading Data</h2>
+
+
+
+    <div class="body conbody">
+
+      <p class="p">
+        If you do not have an existing data file to use, begin by creating one 
in the appropriate format.
+      </p>
+
+
+      <p class="p">
+        <strong class="ph b">To create an ORC table:</strong>
+      </p>
+
+
+      <p class="p">
+        In the <code class="ph codeph">impala-shell</code> interpreter, issue 
a command similar to:
+      </p>
+
+
+<pre class="pre codeblock"><code>CREATE TABLE orc_table (<var class="keyword 
varname">column_specs</var>) STORED AS ORC;</code></pre>
+
+      <p class="p">
+        Because Impala can query some kinds of tables that it cannot currently 
write to, after creating tables of
+        certain file formats, you might use the Hive shell to load the data. 
See
+        <a class="xref" href="impala_file_formats.html#file_formats">How 
Impala Works with Hadoop File Formats</a> for details. After loading data into 
a table through
+        Hive or other mechanism outside of Impala, issue a <code class="ph 
codeph">REFRESH <var class="keyword varname">table_name</var></code>
+        statement the next time you connect to the Impala node, before 
querying the table, to make Impala recognize
+        the new data.
+      </p>
+
+
+      <p class="p">
+        For example, here is how you might create some ORC tables in Impala 
(by specifying the columns
+        explicitly, or cloning the structure of another table), load data 
through Hive, and query them through
+        Impala:
+      </p>
+
+
+<pre class="pre codeblock"><code>$ impala-shell -i localhost
+[localhost:21000] default&gt; CREATE TABLE orc_table (x INT) STORED AS ORC;
+[localhost:21000] default&gt; CREATE TABLE orc_clone LIKE some_other_table 
STORED AS ORC;
+[localhost:21000] default&gt; quit;
+
+$ hive
+hive&gt; INSERT INTO TABLE orc_table SELECT x FROM some_other_table;
+3 Rows loaded to orc_table
+Time taken: 4.169 seconds
+hive&gt; quit;
+
+$ impala-shell -i localhost
+[localhost:21000] default&gt; SELECT * FROM orc_table;
+Fetched 0 row(s) in 0.11s
+[localhost:21000] default&gt; -- Make Impala recognize the data loaded through 
Hive;
+[localhost:21000] default&gt; REFRESH orc_table;
+[localhost:21000] default&gt; SELECT * FROM orc_table;
++---+
+| x |
++---+
+| 1 |
+| 2 |
+| 3 |
++---+
+Fetched 3 row(s) in 0.11s</code></pre>
+
+    </div>
+
+  </div>
+
+
+  <div class="topic concept nested1" aria-labelledby="ariaid-title3" 
id="orc_compression">
+
+    <h2 class="title topictitle2" id="ariaid-title3">Enabling Compression for 
ORC Tables</h2>
+
+
+
+    <div class="body conbody">
+
+      <p class="p">
+
+        ORC tables are in zlib (Deflate in Impala) compression in default. You 
may want
+        to use Snappy or LZO compression on existing tables for different 
balance between
+        compression ratio and decompression speed. In Hive-1.1.0, the supported
+        compressions for ORC tables are NONE, ZLIB, SNAPPY and LZO.
+        For example, to enable Snappy compression, you would specify
+        the following additional settings when loading data through the Hive 
shell:
+      </p>
+
+
+<pre class="pre codeblock"><code>hive&gt; SET hive.exec.compress.output=true;
+hive&gt; SET orc.compress=SNAPPY;
+hive&gt; INSERT OVERWRITE TABLE <var class="keyword varname">new_table</var> 
SELECT * FROM <var class="keyword varname">old_table</var>;</code></pre>
+
+      <p class="p">
+        If you are converting partitioned tables, you must complete additional 
steps. In such a case, specify
+        additional settings similar to the following:
+      </p>
+
+
+<pre class="pre codeblock"><code>hive&gt; CREATE TABLE <var class="keyword 
varname">new_table</var> (<var class="keyword varname">your_cols</var>) 
PARTITIONED BY (<var class="keyword varname">partition_cols</var>) STORED AS 
<var class="keyword varname">new_format</var>;
+hive&gt; SET hive.exec.dynamic.partition.mode=nonstrict;
+hive&gt; SET hive.exec.dynamic.partition=true;
+hive&gt; INSERT OVERWRITE TABLE <var class="keyword varname">new_table</var> 
PARTITION(<var class="keyword varname">comma_separated_partition_cols</var>) 
SELECT * FROM <var class="keyword varname">old_table</var>;</code></pre>
+
+      <p class="p">
+        Remember that Hive does not require that you specify a source format 
for it. Consider the case of
+        converting a table with two partition columns called <code class="ph 
codeph">year</code> and <code class="ph codeph">month</code> to a
+        Snappy compressed ORC table. Combining the components outlined 
previously to complete this table conversion,
+        you would specify settings similar to the following:
+      </p>
+
+
+<pre class="pre codeblock"><code>hive&gt; CREATE TABLE tbl_orc (int_col INT, 
string_col STRING) STORED AS ORC;
+hive&gt; SET hive.exec.compress.output=true;
+hive&gt; SET orc.compress=SNAPPY;
+hive&gt; SET hive.exec.dynamic.partition.mode=nonstrict;
+hive&gt; SET hive.exec.dynamic.partition=true;
+hive&gt; INSERT OVERWRITE TABLE tbl_orc SELECT * FROM tbl;</code></pre>
+
+      <p class="p">
+        To complete a similar process for a table that includes partitions, 
you would specify settings similar to
+        the following:
+      </p>
+
+
+<pre class="pre codeblock"><code>hive&gt; CREATE TABLE tbl_orc (int_col INT, 
string_col STRING) PARTITIONED BY (year INT) STORED AS ORC;
+hive&gt; SET hive.exec.compress.output=true;
+hive&gt; SET orc.compress=SNAPPY;
+hive&gt; SET hive.exec.dynamic.partition.mode=nonstrict;
+hive&gt; SET hive.exec.dynamic.partition=true;
+hive&gt; INSERT OVERWRITE TABLE tbl_orc PARTITION(year) SELECT * FROM 
tbl;</code></pre>
+
+      <div class="note note"><span class="notetitle">Note:</span>
+        <p class="p">
+          The compression type is specified in the following command:
+        </p>
+
+<pre class="pre codeblock"><code>SET orc.compress=SNAPPY;</code></pre>
+        <p class="p">
+          You could elect to specify alternative codecs such as <code 
class="ph codeph">NONE, GZIP, LZO</code> here.
+        </p>
+
+      </div>
+
+    </div>
+
+  </div>
+
+
+  <div class="topic concept nested1" aria-labelledby="ariaid-title4" 
id="rcfile_performance">
+
+    <h2 class="title topictitle2" id="ariaid-title4">Query Performance for 
Impala ORC Tables</h2>
+
+
+    <div class="body conbody">
+
+      <p class="p">
+        In general, expect query performance with ORC tables to be
+        faster than with tables using text data, but slower than with
+        Parquet tables since there're bunch of optimizations for Parquet.
+        See <a class="xref" href="impala_parquet.html#parquet">Using the 
Parquet File Format with Impala Tables</a>
+        for information about using the Parquet file format for
+        high-performance analytic queries.
+      </p>
+
+
+      <p class="p">
+        In <span class="keyword">Impala 2.6</span> and higher, Impala queries 
are optimized for files stored in Amazon S3.
+        For Impala tables that use the file formats Parquet, ORC, RCFile, 
SequenceFile,
+        Avro, and uncompressed text, the setting <code class="ph 
codeph">fs.s3a.block.size</code>
+        in the <span class="ph filepath">core-site.xml</span> configuration 
file determines
+        how Impala divides the I/O work of reading the data files. This 
configuration
+        setting is specified in bytes. By default, this
+        value is 33554432 (32 MB), meaning that Impala parallelizes S3 read 
operations on the files
+        as if they were made up of 32 MB blocks. For example, if your S3 
queries primarily access
+        Parquet files written by MapReduce or Hive, increase <code class="ph 
codeph">fs.s3a.block.size</code>
+        to 134217728 (128 MB) to match the row group size of those files. If 
most S3 queries involve
+        Parquet files written by Impala, increase <code class="ph 
codeph">fs.s3a.block.size</code>
+        to 268435456 (256 MB) to match the row group size produced by Impala.
+      </p>
+
+
+    </div>
+
+  </div>
+
+
+  <div class="topic concept nested1" aria-labelledby="ariaid-title5" 
id="orc_data_types">
+
+    <h2 class="title topictitle2" id="ariaid-title5">Data Type Considerations 
for ORC Tables</h2>
+
+
+    <div class="body conbody">
+
+      <p class="p">
+        The ORC format defines a set of data types whose names differ from the 
names of the corresponding
+        Impala data types. If you are preparing ORC files using other Hadoop 
components such as Pig or
+        MapReduce, you might need to work with the type names defined by ORC. 
The following figure lists the
+        ORC-defined types and the equivalent types in Impala.
+      </p>
+
+
+      <p class="p">
+        <strong class="ph b">Primitive types:</strong>
+      </p>
+
+
+<pre class="pre codeblock"><code>BINARY -&gt; STRING
+BOOLEAN -&gt; BOOLEAN
+DOUBLE -&gt; DOUBLE
+FLOAT -&gt; FLOAT
+TINYINT -&gt; TINYINT
+SMALLINT -&gt; SMALLINT
+INT -&gt; INT
+BIGINT -&gt; BIGINT
+TIMESTAMP -&gt; TIMESTAMP
+DATE (not supported)
+</code></pre>
+
+      <p class="p">
+        <strong class="ph b">Complex types:</strong>
+      </p>
+
+      <p class="p">
+        Complex types are currently not supported on ORC. However, queries 
materializing only scalar type
+        columns are allowed:
+      </p>
+
+
+<pre class="pre codeblock"><code>$ hive
+hive&gt; CREATE TABLE orc_nested_table (id INT, a ARRAY&lt;INT&gt;) STORED AS 
ORC;
+hive&gt; INSERT INTO TABLE orc_nested_table SELECT 1, ARRAY(1,2,3);
+OK
+Time taken: 2.629 seconds
+hive&gt; quit;
+
+$ impala-shell -i localhost
+[localhost:21000] default&gt; INVALIDATE METADATA orc_nested_table;
+[localhost:21000] default&gt; SELECT 1 FROM orc_nested_table t, t.a;
+ERROR: NotImplementedException: Scan of table 't' in format 'ORC' is not 
supported because the table has a column 'a' with a complex type 
'ARRAY&lt;INT&gt;'.
+Complex types are supported for these file formats: PARQUET.
+
+[localhost:21000] default&gt; SELECT COUNT(*) FROM orc_nested_table;
++----------+
+| count(*) |
++----------+
+| 1        |
++----------+
+Fetched 1 row(s) in 0.12s
+
+[localhost:21000] default&gt; SELECT id FROM orc_nested_table;
++----+
+| id |
++----+
+| 1  |
++----+
+Fetched 1 row(s) in 0.12s
+</code></pre>
+
+    </div>
+
+  </div>
+
+</body>
+</html>
\ No newline at end of file


http://git-wip-us.apache.org/repos/asf/impala/blob/b4ad38a9/docs/build/html/topics/impala_order_by.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_order_by.html 
b/docs/build/html/topics/impala_order_by.html
index 040c6ce..e8acc8c 100644
--- a/docs/build/html/topics/impala_order_by.html
+++ b/docs/build/html/topics/impala_order_by.html
@@ -1,8 +1,28 @@
+<?xml version="1.0" encoding="UTF-8"?>
 <!DOCTYPE html
-  SYSTEM "about:legacy-compat">
-<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; 
charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) 
Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta 
name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" 
content="../topics/impala_select.html"><meta name="prodname" 
content="Impala"><meta name="prodname" content="Impala"><meta name="version" 
content="Impala 2.12x"><meta name="version" content="Impala 2.12x"><meta 
name="DC.Format" content="XHTML"><meta name="DC.Identifier" 
content="order_by"><link rel="stylesheet" type="text/css" 
href="../commonltr.css"><title>ORDER BY Clause</title></head><body 
id="order_by"><main role="main"><article role="article" 
aria-labelledby="ariaid-title1">
+  PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" 
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd";>
+<html xmlns="http://www.w3.org/1999/xhtml"; xml:lang="en" lang="en">
+<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
+
+<meta name="copyright" content="(C) Copyright 2018" />
+<meta name="DC.rights.owner" content="(C) Copyright 2018" />
+<meta name="DC.Type" content="concept" />
+<meta name="DC.Title" content="ORDER BY Clause" />
+<meta name="DC.Relation" scheme="URI" content="../topics/impala_select.html" />
+<meta name="prodname" content="Impala" />
+<meta name="prodname" content="Impala" />
+<meta name="version" content="Impala 3.0.x" />
+<meta name="version" content="Impala 3.0.x" />
+<meta name="DC.Format" content="XHTML" />
+<meta name="DC.Identifier" content="order_by" />
+<link rel="stylesheet" type="text/css" href="../commonltr.css" />
+<title>ORDER BY Clause</title>
+</head>
+<body id="order_by">
+
 
   <h1 class="title topictitle1" id="ariaid-title1">ORDER BY Clause</h1>
+
   
 
   <div class="body conbody">
@@ -12,6 +32,7 @@
       based on the values from one or more columns.
     </p>
 
+
     <p class="p">
       For distributed queries, this is a relatively expensive operation, 
because the entire result set must be
       produced and transferred to one node before the sorting can happen. This 
can require more memory capacity
@@ -23,23 +44,28 @@
       requirement on the coordinator node.
     </p>
 
-    <div class="note note note_note"><span class="note__title 
notetitle">Note:</span> 
+
+    <div class="note note"><span class="notetitle">Note:</span>
       <p class="p">
         In Impala 1.4.0 and higher, the <code class="ph codeph">LIMIT</code> 
clause is now optional (rather than required) for
         queries that use the <code class="ph codeph">ORDER BY</code> clause. 
Impala automatically uses a temporary disk work area
         to perform the sort if the sort operation would otherwise exceed the 
Impala memory limit for a particular
         DataNode.
       </p>
+
     </div>
 
+
     <p class="p">
         <strong class="ph b">Syntax:</strong>
       </p>
 
+
     <p class="p">
       The full syntax for the <code class="ph codeph">ORDER BY</code> clause 
is:
     </p>
 
+
 <pre class="pre codeblock"><code>ORDER BY <var class="keyword 
varname">col_ref</var> [, <var class="keyword varname">col_ref</var> ...] [ASC 
| DESC] [NULLS FIRST | NULLS LAST]
 
 col_ref ::= <var class="keyword varname">column_name</var> | <var 
class="keyword varname">integer_literal</var>
@@ -53,25 +79,30 @@ col_ref ::= <var class="keyword varname">column_name</var> 
| <var class="keyword
       query succeeds but the order of results is undefined.)
     </p>
 
+
     <p class="p">
       <code class="ph codeph">ORDER BY <var class="keyword 
varname">column_number</var></code> can only be used when the query explicitly 
lists
       the columns in the <code class="ph codeph">SELECT</code> list, not with 
<code class="ph codeph">SELECT *</code> queries.
     </p>
 
+
     <p class="p">
       <strong class="ph b">Ascending and descending sorts:</strong>
     </p>
 
+
     <p class="p">
       The default sort order (the same as using the <code class="ph 
codeph">ASC</code> keyword) puts the smallest values at the
       start of the result set, and the largest values at the end. Specifying 
the <code class="ph codeph">DESC</code> keyword
       reverses that order.
     </p>
 
+
     <p class="p">
       <strong class="ph b">Sort order for NULL values:</strong>
     </p>
 
+
     <p class="p">
       See <a class="xref" href="impala_literals.html#null">NULL</a> for 
details about how <code class="ph codeph">NULL</code> values are positioned
       in the sorted result set, and how to use the <code class="ph 
codeph">NULLS FIRST</code> and <code class="ph codeph">NULLS LAST</code>
@@ -80,6 +111,7 @@ col_ref ::= <var class="keyword varname">column_name</var> | 
<var class="keyword
       <code class="ph codeph">NULLS LAST</code> keywords are new in Impala 
1.2.1.)
     </p>
 
+
     <p class="p">
         Prior to Impala 1.4.0, Impala required any query including an
         <code class="ph codeph"><a class="xref" 
href="../shared/../topics/impala_order_by.html#order_by">ORDER BY</a></code> 
clause to also use a
@@ -89,11 +121,13 @@ col_ref ::= <var class="keyword varname">column_name</var> 
| <var class="keyword
         Impala automatically uses a temporary disk work area to perform the 
sort operation.
       </p>
 
+
     
     <p class="p">
         <strong class="ph b">Complex type considerations:</strong>
       </p>
 
+
     <p class="p">
       In <span class="keyword">Impala 2.3</span> and higher, the complex data 
types <code class="ph codeph">STRUCT</code>,
       <code class="ph codeph">ARRAY</code>, and <code class="ph 
codeph">MAP</code> are available. These columns cannot
@@ -104,10 +138,12 @@ col_ref ::= <var class="keyword 
varname">column_name</var> | <var class="keyword
       See <a class="xref" 
href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or 
higher only)</a> for details about Impala support for complex types.
     </p>
 
+
     <p class="p">
       The following query shows how a complex type column cannot be directly 
used in an <code class="ph codeph">ORDER BY</code> clause:
     </p>
 
+
 <pre class="pre codeblock"><code>CREATE TABLE games (id BIGINT, score ARRAY 
&lt;BIGINT&gt;) STORED AS PARQUET;
 ...use LOAD DATA to load externally created Parquet files into the table...
 SELECT id FROM games ORDER BY score DESC;
@@ -118,6 +154,7 @@ ERROR: AnalysisException: ORDER BY expression 'score' with 
complex type 'ARRAY&l
         <strong class="ph b">Examples:</strong>
       </p>
 
+
     <p class="p">
       The following query retrieves the user ID and score, only for scores 
greater than one million,
       with the highest scores for each user listed first.
@@ -126,6 +163,7 @@ ERROR: AnalysisException: ORDER BY expression 'score' with 
complex type 'ARRAY&l
       pseudocolumn that represents each array element.
     </p>
 
+
 <pre class="pre codeblock"><code>SELECT id, item FROM games, games.score
   WHERE item &gt; 1000000
 ORDER BY id, item desc;
@@ -140,6 +178,7 @@ ORDER BY id, item desc;
       and <code class="ph codeph">ITEM.VALUE</code>.
     </p>
 
+
 <pre class="pre codeblock"><code>CREATE TABLE games2 (id BIGINT, play array 
&lt; struct &lt;game_name: string, score: BIGINT, high_score: boolean&gt; &gt;) 
STORED AS PARQUET
 ...use LOAD DATA to load externally created Parquet files into the table...
 SELECT id, item.game_name, item.score FROM games2, games2.play
@@ -157,6 +196,7 @@ ORDER BY id, info.value desc;
         <strong class="ph b">Usage notes:</strong>
       </p>
 
+
     <p class="p">
       Although the <code class="ph codeph">LIMIT</code> clause is now optional 
on <code class="ph codeph">ORDER BY</code> queries, if your
       query only needs some number of rows that you can predict in advance, 
use the <code class="ph codeph">LIMIT</code> clause
@@ -165,6 +205,7 @@ ORDER BY id, info.value desc;
       coordinator node picks the 10 highest or lowest row values out of this 
small intermediate result set.
     </p>
 
+
     <p class="p">
       If an <code class="ph codeph">ORDER BY</code> clause is applied to an 
early phase of query processing, such as a subquery
       or a view definition, Impala ignores the <code class="ph codeph">ORDER 
BY</code> clause. To get ordered results from a
@@ -172,11 +213,13 @@ ORDER BY id, info.value desc;
       level.
     </p>
 
+
     <p class="p">
       <code class="ph codeph">ORDER BY</code> is often used in combination 
with <code class="ph codeph">LIMIT</code> to perform <span 
class="q">"top-N"</span>
       queries:
     </p>
 
+
 <pre class="pre codeblock"><code>SELECT user_id AS "Top 10 Visitors", 
SUM(page_views) FROM web_stats
   GROUP BY page_views, user_id
   ORDER BY SUM(page_views) DESC LIMIT 10;
@@ -188,6 +231,7 @@ ORDER BY id, info.value desc;
       queries like this against the large tables typically used with Impala:
     </p>
 
+
 <pre class="pre codeblock"><code>SELECT page_title AS "Page 1 of search 
results", page_url FROM search_content
   WHERE LOWER(page_title) LIKE '%game%')
   ORDER BY page_title LIMIT 10 OFFSET 0;
@@ -203,12 +247,14 @@ SELECT page_title AS "Page 3 of search results", page_url 
FROM search_content
         <strong class="ph b">Internal details:</strong>
       </p>
 
+
     <p class="p">
       Impala sorts the intermediate results of an <code class="ph 
codeph">ORDER BY</code> clause in memory whenever practical. In
       a cluster of N DataNodes, each node sorts roughly 1/Nth of the result 
set, the exact proportion varying
       depending on how the data matching the query is distributed in HDFS.
     </p>
 
+
     <p class="p">
       If the size of the sorted intermediate result set on any DataNode would 
cause the query to exceed the Impala
       memory limit, Impala sorts as much as practical in memory, then writes 
partially sorted data to disk. (This
@@ -221,6 +267,7 @@ SELECT page_title AS "Page 3 of search results", page_url 
FROM search_content
       the additional RAM to minimize the amount of disk I/O for sorting.
     </p>
 
+
     <p class="p">
       This external sort technique is used as appropriate on each DataNode 
(possibly including the coordinator
       node) to sort the portion of the result set that is processed on that 
node. When the sorted intermediate
@@ -229,10 +276,12 @@ SELECT page_title AS "Page 3 of search results", page_url 
FROM search_content
       coordinator node.
     </p>
 
+
     <p class="p">
       <strong class="ph b">Configuration for disk usage:</strong>
     </p>
 
+
     <p class="p">
         By default, intermediate files used during large sort, join, 
aggregation, or analytic function operations
         are stored in the directory <span class="ph 
filepath">/tmp/impala-scratch</span> . These files are removed when the
@@ -254,6 +303,7 @@ SELECT page_title AS "Page 3 of search results", page_url 
FROM search_content
 
 
 
+
     <p class="p">
         <strong class="ph b">Sorting considerations:</strong> Although you can 
specify an <code class="ph codeph">ORDER BY</code> clause in an
         <code class="ph codeph">INSERT ... SELECT</code> statement, any <code 
class="ph codeph">ORDER BY</code> clause is ignored and the
@@ -262,6 +312,7 @@ SELECT page_title AS "Page 3 of search results", page_url 
FROM search_content
         stored in sorted order is impractical.
       </p>
 
+
     <div class="p">
         An <code class="ph codeph">ORDER BY</code> clause without an 
additional <code class="ph codeph">LIMIT</code> clause is ignored in any
         view definition. If you need to sort the entire result set from a 
view, use an <code class="ph codeph">ORDER BY</code>
@@ -311,11 +362,13 @@ SELECT page_title AS "Page 3 of search results", page_url 
FROM search_content
 </code></pre>
       </div>
 
+
     <p class="p">
       With the lifting of the requirement to include a <code class="ph 
codeph">LIMIT</code> clause in every <code class="ph codeph">ORDER
       BY</code> query (in Impala 1.4 and higher):
     </p>
 
+
     <ul class="ul">
       <li class="li">
         <p class="p">
@@ -323,19 +376,13 @@ SELECT page_title AS "Page 3 of search results", page_url 
FROM search_content
           particular DataNode, as opposed to the previous possibility of an 
<span class="q">"out of memory"</span> error. Make sure
           to keep at least 1 GB free on the filesystem used for temporary 
sorting work.
         </p>
-      </li>
 
-      <li class="li">
-        <p class="p">
-          The query options
-          <a class="xref" 
href="impala_default_order_by_limit.html#default_order_by_limit">DEFAULT_ORDER_BY_LIMIT</a>
 and
-          <a class="xref" 
href="impala_abort_on_default_limit_exceeded.html#abort_on_default_limit_exceeded">ABORT_ON_DEFAULT_LIMIT_EXCEEDED</a>,
-          which formerly controlled the behavior of <code class="ph 
codeph">ORDER BY</code> queries with no limit specified, are
-          now ignored.
-        </p>
       </li>
+
+
     </ul>
 
+
     <p class="p">
         In Impala 1.2.1 and higher, all <code class="ph codeph">NULL</code> 
values come at the end of the result set for
         <code class="ph codeph">ORDER BY ... ASC</code> queries, and at the 
beginning of the result set for <code class="ph codeph">ORDER BY ...
@@ -346,6 +393,7 @@ SELECT page_title AS "Page 3 of search results", page_url 
FROM search_content
         behavior for <code class="ph codeph">NULL</code> by adding the clause 
<code class="ph codeph">NULLS FIRST</code> or <code class="ph codeph">NULLS
         LAST</code> at the end of the <code class="ph codeph">ORDER BY</code> 
clause.
       </p>
+
 <pre class="pre codeblock"><code>[localhost:21000] &gt; create table numbers 
(x int);
 [localhost:21000] &gt; insert into numbers values (1), (null), (2), (null), 
(3);
 [localhost:21000] &gt; select x from numbers order by x nulls first;
@@ -394,14 +442,23 @@ SELECT page_title AS "Page 3 of search results", page_url 
FROM search_content
         <strong class="ph b">Related information:</strong>
       </p>
 
+
     <p class="p">
       See <a class="xref" href="impala_select.html#select">SELECT 
Statement</a> for further examples of queries with the <code class="ph 
codeph">ORDER
       BY</code> clause.
     </p>
 
+
     <p class="p">
       Analytic functions use the <code class="ph codeph">ORDER BY</code> 
clause in a different context to define the sequence in
       which rows are analyzed. See <a class="xref" 
href="impala_analytic_functions.html#analytic_functions">Impala Analytic 
Functions</a> for details.
     </p>
+
   </div>
-<nav role="navigation" class="related-links"><div class="familylinks"><div 
class="parentlink"><strong>Parent topic:</strong> <a class="link" 
href="../topics/impala_select.html">SELECT 
Statement</a></div></div></nav></article></main></body></html>
\ No newline at end of file
+
+<div class="related-links">
+<div class="familylinks">
+<div class="parentlink"><strong>Parent topic:</strong> <a class="link" 
href="../topics/impala_select.html">SELECT Statement</a></div>
+</div>
+</div></body>
+</html>
\ No newline at end of file

[15/51] [partial] impala git commit: [DOCS] Impala 3.1 Docs to be published

Reply via email to