http://git-wip-us.apache.org/repos/asf/impala/blob/b4ad38a9/docs/build/html/topics/impala_orc.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_orc.html b/docs/build/html/topics/impala_orc.html new file mode 100644 index 0000000..a6776ae --- /dev/null +++ b/docs/build/html/topics/impala_orc.html @@ -0,0 +1,368 @@ +<?xml version="1.0" encoding="UTF-8"?> +<!DOCTYPE html + PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> +<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> +<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> + +<meta name="copyright" content="(C) Copyright 2018" /> +<meta name="DC.rights.owner" content="(C) Copyright 2018" /> +<meta name="DC.Type" content="concept" /> +<meta name="DC.Title" content="Using the ORC File Format with Impala Tables" /> +<meta name="DC.Relation" scheme="URI" content="../topics/impala_file_formats.html" /> +<meta name="prodname" content="Impala" /> +<meta name="version" content="Impala 3.0.x" /> +<meta name="DC.Format" content="XHTML" /> +<meta name="DC.Identifier" content="orc" /> +<link rel="stylesheet" type="text/css" href="../commonltr.css" /> +<title>Using the ORC File Format with Impala Tables</title> +</head> +<body id="orc"> + + + <h1 class="title topictitle1" id="ariaid-title1">Using the ORC File Format with Impala Tables</h1> + + + + + <div class="body conbody"> + + <p class="p"> + + Impala supports using ORC data files as an experimental feature since 2.12. + To disable it, set --enable_orc_scanner to false when starting the cluster. + </p> + + + +<div class="tablenoborder"><table cellpadding="4" cellspacing="0" summary="" class="table" frame="border" border="1" rules="all"><caption><span class="tablecap"><span class="table--title-label">Table 1. </span>ORC Format Support in Impala</span></caption><colgroup><col style="width:10%" /><col style="width:10%" /><col style="width:20%" /><col style="width:30%" /><col style="width:30%" /></colgroup><thead class="thead" style="text-align:left;"> + <tr class="row"> + <th class="entry nocellnorowborder" style="vertical-align:top;" id="d131791e70"> + File Type + </th> + + <th class="entry nocellnorowborder" style="vertical-align:top;" id="d131791e73"> + Format + </th> + + <th class="entry nocellnorowborder" style="vertical-align:top;" id="d131791e76"> + Compression Codecs + </th> + + <th class="entry nocellnorowborder" style="vertical-align:top;" id="d131791e79"> + Impala Can CREATE? + </th> + + <th class="entry cell-norowborder" style="vertical-align:top;" id="d131791e82"> + Impala Can INSERT? + </th> + + </tr> + + </thead> +<tbody class="tbody"> + <tr class="row"> + <td class="entry row-nocellborder" style="vertical-align:top;" headers="d131791e70 "> + <a class="xref" href="impala_orc.html#orc">ORC</a> + </td> + + <td class="entry row-nocellborder" style="vertical-align:top;" headers="d131791e73 "> + Structured + </td> + + <td class="entry row-nocellborder" style="vertical-align:top;" headers="d131791e76 "> + gzip, Snappy, LZO, LZ4; currently gzip by default + </td> + + <td class="entry row-nocellborder" style="vertical-align:top;" headers="d131791e79 "> + Yes, in Impala 2.12.0 and higher. + </td> + + <td class="entry cellrowborder" style="vertical-align:top;" headers="d131791e82 "> + No. Import data by using <code class="ph codeph">LOAD DATA</code> on data files already in the right format, or use + <code class="ph codeph">INSERT</code> in Hive followed by <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> in Impala. + </td> + + </tr> + + </tbody> +</table> +</div> + + + <p class="p toc inpage"></p> + + </div> + + + <div class="related-links"> +<div class="familylinks"> +<div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_file_formats.html">How Impala Works with Hadoop File Formats</a></div> +</div> +</div><div class="topic concept nested1" aria-labelledby="ariaid-title2" id="orc_create"> + + <h2 class="title topictitle2" id="ariaid-title2">Creating ORC Tables and Loading Data</h2> + + + + <div class="body conbody"> + + <p class="p"> + If you do not have an existing data file to use, begin by creating one in the appropriate format. + </p> + + + <p class="p"> + <strong class="ph b">To create an ORC table:</strong> + </p> + + + <p class="p"> + In the <code class="ph codeph">impala-shell</code> interpreter, issue a command similar to: + </p> + + +<pre class="pre codeblock"><code>CREATE TABLE orc_table (<var class="keyword varname">column_specs</var>) STORED AS ORC;</code></pre> + + <p class="p"> + Because Impala can query some kinds of tables that it cannot currently write to, after creating tables of + certain file formats, you might use the Hive shell to load the data. See + <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a> for details. After loading data into a table through + Hive or other mechanism outside of Impala, issue a <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> + statement the next time you connect to the Impala node, before querying the table, to make Impala recognize + the new data. + </p> + + + <p class="p"> + For example, here is how you might create some ORC tables in Impala (by specifying the columns + explicitly, or cloning the structure of another table), load data through Hive, and query them through + Impala: + </p> + + +<pre class="pre codeblock"><code>$ impala-shell -i localhost +[localhost:21000] default> CREATE TABLE orc_table (x INT) STORED AS ORC; +[localhost:21000] default> CREATE TABLE orc_clone LIKE some_other_table STORED AS ORC; +[localhost:21000] default> quit; + +$ hive +hive> INSERT INTO TABLE orc_table SELECT x FROM some_other_table; +3 Rows loaded to orc_table +Time taken: 4.169 seconds +hive> quit; + +$ impala-shell -i localhost +[localhost:21000] default> SELECT * FROM orc_table; +Fetched 0 row(s) in 0.11s +[localhost:21000] default> -- Make Impala recognize the data loaded through Hive; +[localhost:21000] default> REFRESH orc_table; +[localhost:21000] default> SELECT * FROM orc_table; ++---+ +| x | ++---+ +| 1 | +| 2 | +| 3 | ++---+ +Fetched 3 row(s) in 0.11s</code></pre> + + </div> + + </div> + + + <div class="topic concept nested1" aria-labelledby="ariaid-title3" id="orc_compression"> + + <h2 class="title topictitle2" id="ariaid-title3">Enabling Compression for ORC Tables</h2> + + + + <div class="body conbody"> + + <p class="p"> + + ORC tables are in zlib (Deflate in Impala) compression in default. You may want + to use Snappy or LZO compression on existing tables for different balance between + compression ratio and decompression speed. In Hive-1.1.0, the supported + compressions for ORC tables are NONE, ZLIB, SNAPPY and LZO. + For example, to enable Snappy compression, you would specify + the following additional settings when loading data through the Hive shell: + </p> + + +<pre class="pre codeblock"><code>hive> SET hive.exec.compress.output=true; +hive> SET orc.compress=SNAPPY; +hive> INSERT OVERWRITE TABLE <var class="keyword varname">new_table</var> SELECT * FROM <var class="keyword varname">old_table</var>;</code></pre> + + <p class="p"> + If you are converting partitioned tables, you must complete additional steps. In such a case, specify + additional settings similar to the following: + </p> + + +<pre class="pre codeblock"><code>hive> CREATE TABLE <var class="keyword varname">new_table</var> (<var class="keyword varname">your_cols</var>) PARTITIONED BY (<var class="keyword varname">partition_cols</var>) STORED AS <var class="keyword varname">new_format</var>; +hive> SET hive.exec.dynamic.partition.mode=nonstrict; +hive> SET hive.exec.dynamic.partition=true; +hive> INSERT OVERWRITE TABLE <var class="keyword varname">new_table</var> PARTITION(<var class="keyword varname">comma_separated_partition_cols</var>) SELECT * FROM <var class="keyword varname">old_table</var>;</code></pre> + + <p class="p"> + Remember that Hive does not require that you specify a source format for it. Consider the case of + converting a table with two partition columns called <code class="ph codeph">year</code> and <code class="ph codeph">month</code> to a + Snappy compressed ORC table. Combining the components outlined previously to complete this table conversion, + you would specify settings similar to the following: + </p> + + +<pre class="pre codeblock"><code>hive> CREATE TABLE tbl_orc (int_col INT, string_col STRING) STORED AS ORC; +hive> SET hive.exec.compress.output=true; +hive> SET orc.compress=SNAPPY; +hive> SET hive.exec.dynamic.partition.mode=nonstrict; +hive> SET hive.exec.dynamic.partition=true; +hive> INSERT OVERWRITE TABLE tbl_orc SELECT * FROM tbl;</code></pre> + + <p class="p"> + To complete a similar process for a table that includes partitions, you would specify settings similar to + the following: + </p> + + +<pre class="pre codeblock"><code>hive> CREATE TABLE tbl_orc (int_col INT, string_col STRING) PARTITIONED BY (year INT) STORED AS ORC; +hive> SET hive.exec.compress.output=true; +hive> SET orc.compress=SNAPPY; +hive> SET hive.exec.dynamic.partition.mode=nonstrict; +hive> SET hive.exec.dynamic.partition=true; +hive> INSERT OVERWRITE TABLE tbl_orc PARTITION(year) SELECT * FROM tbl;</code></pre> + + <div class="note note"><span class="notetitle">Note:</span> + <p class="p"> + The compression type is specified in the following command: + </p> + +<pre class="pre codeblock"><code>SET orc.compress=SNAPPY;</code></pre> + <p class="p"> + You could elect to specify alternative codecs such as <code class="ph codeph">NONE, GZIP, LZO</code> here. + </p> + + </div> + + </div> + + </div> + + + <div class="topic concept nested1" aria-labelledby="ariaid-title4" id="rcfile_performance"> + + <h2 class="title topictitle2" id="ariaid-title4">Query Performance for Impala ORC Tables</h2> + + + <div class="body conbody"> + + <p class="p"> + In general, expect query performance with ORC tables to be + faster than with tables using text data, but slower than with + Parquet tables since there're bunch of optimizations for Parquet. + See <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a> + for information about using the Parquet file format for + high-performance analytic queries. + </p> + + + <p class="p"> + In <span class="keyword">Impala 2.6</span> and higher, Impala queries are optimized for files stored in Amazon S3. + For Impala tables that use the file formats Parquet, ORC, RCFile, SequenceFile, + Avro, and uncompressed text, the setting <code class="ph codeph">fs.s3a.block.size</code> + in the <span class="ph filepath">core-site.xml</span> configuration file determines + how Impala divides the I/O work of reading the data files. This configuration + setting is specified in bytes. By default, this + value is 33554432 (32 MB), meaning that Impala parallelizes S3 read operations on the files + as if they were made up of 32 MB blocks. For example, if your S3 queries primarily access + Parquet files written by MapReduce or Hive, increase <code class="ph codeph">fs.s3a.block.size</code> + to 134217728 (128 MB) to match the row group size of those files. If most S3 queries involve + Parquet files written by Impala, increase <code class="ph codeph">fs.s3a.block.size</code> + to 268435456 (256 MB) to match the row group size produced by Impala. + </p> + + + </div> + + </div> + + + <div class="topic concept nested1" aria-labelledby="ariaid-title5" id="orc_data_types"> + + <h2 class="title topictitle2" id="ariaid-title5">Data Type Considerations for ORC Tables</h2> + + + <div class="body conbody"> + + <p class="p"> + The ORC format defines a set of data types whose names differ from the names of the corresponding + Impala data types. If you are preparing ORC files using other Hadoop components such as Pig or + MapReduce, you might need to work with the type names defined by ORC. The following figure lists the + ORC-defined types and the equivalent types in Impala. + </p> + + + <p class="p"> + <strong class="ph b">Primitive types:</strong> + </p> + + +<pre class="pre codeblock"><code>BINARY -> STRING +BOOLEAN -> BOOLEAN +DOUBLE -> DOUBLE +FLOAT -> FLOAT +TINYINT -> TINYINT +SMALLINT -> SMALLINT +INT -> INT +BIGINT -> BIGINT +TIMESTAMP -> TIMESTAMP +DATE (not supported) +</code></pre> + + <p class="p"> + <strong class="ph b">Complex types:</strong> + </p> + + <p class="p"> + Complex types are currently not supported on ORC. However, queries materializing only scalar type + columns are allowed: + </p> + + +<pre class="pre codeblock"><code>$ hive +hive> CREATE TABLE orc_nested_table (id INT, a ARRAY<INT>) STORED AS ORC; +hive> INSERT INTO TABLE orc_nested_table SELECT 1, ARRAY(1,2,3); +OK +Time taken: 2.629 seconds +hive> quit; + +$ impala-shell -i localhost +[localhost:21000] default> INVALIDATE METADATA orc_nested_table; +[localhost:21000] default> SELECT 1 FROM orc_nested_table t, t.a; +ERROR: NotImplementedException: Scan of table 't' in format 'ORC' is not supported because the table has a column 'a' with a complex type 'ARRAY<INT>'. +Complex types are supported for these file formats: PARQUET. + +[localhost:21000] default> SELECT COUNT(*) FROM orc_nested_table; ++----------+ +| count(*) | ++----------+ +| 1 | ++----------+ +Fetched 1 row(s) in 0.12s + +[localhost:21000] default> SELECT id FROM orc_nested_table; ++----+ +| id | ++----+ +| 1 | ++----+ +Fetched 1 row(s) in 0.12s +</code></pre> + + </div> + + </div> + +</body> +</html> \ No newline at end of file
http://git-wip-us.apache.org/repos/asf/impala/blob/b4ad38a9/docs/build/html/topics/impala_order_by.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_order_by.html b/docs/build/html/topics/impala_order_by.html index 040c6ce..e8acc8c 100644 --- a/docs/build/html/topics/impala_order_by.html +++ b/docs/build/html/topics/impala_order_by.html @@ -1,8 +1,28 @@ +<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE html - SYSTEM "about:legacy-compat"> -<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_select.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.12x"><meta name="version" content="Impala 2.12x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="order_by"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>ORDER BY Clause</title></head><body id="order_by"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> +<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> +<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> + +<meta name="copyright" content="(C) Copyright 2018" /> +<meta name="DC.rights.owner" content="(C) Copyright 2018" /> +<meta name="DC.Type" content="concept" /> +<meta name="DC.Title" content="ORDER BY Clause" /> +<meta name="DC.Relation" scheme="URI" content="../topics/impala_select.html" /> +<meta name="prodname" content="Impala" /> +<meta name="prodname" content="Impala" /> +<meta name="version" content="Impala 3.0.x" /> +<meta name="version" content="Impala 3.0.x" /> +<meta name="DC.Format" content="XHTML" /> +<meta name="DC.Identifier" content="order_by" /> +<link rel="stylesheet" type="text/css" href="../commonltr.css" /> +<title>ORDER BY Clause</title> +</head> +<body id="order_by"> + <h1 class="title topictitle1" id="ariaid-title1">ORDER BY Clause</h1> + <div class="body conbody"> @@ -12,6 +32,7 @@ based on the values from one or more columns. </p> + <p class="p"> For distributed queries, this is a relatively expensive operation, because the entire result set must be produced and transferred to one node before the sorting can happen. This can require more memory capacity @@ -23,23 +44,28 @@ requirement on the coordinator node. </p> - <div class="note note note_note"><span class="note__title notetitle">Note:</span> + + <div class="note note"><span class="notetitle">Note:</span> <p class="p"> In Impala 1.4.0 and higher, the <code class="ph codeph">LIMIT</code> clause is now optional (rather than required) for queries that use the <code class="ph codeph">ORDER BY</code> clause. Impala automatically uses a temporary disk work area to perform the sort if the sort operation would otherwise exceed the Impala memory limit for a particular DataNode. </p> + </div> + <p class="p"> <strong class="ph b">Syntax:</strong> </p> + <p class="p"> The full syntax for the <code class="ph codeph">ORDER BY</code> clause is: </p> + <pre class="pre codeblock"><code>ORDER BY <var class="keyword varname">col_ref</var> [, <var class="keyword varname">col_ref</var> ...] [ASC | DESC] [NULLS FIRST | NULLS LAST] col_ref ::= <var class="keyword varname">column_name</var> | <var class="keyword varname">integer_literal</var> @@ -53,25 +79,30 @@ col_ref ::= <var class="keyword varname">column_name</var> | <var class="keyword query succeeds but the order of results is undefined.) </p> + <p class="p"> <code class="ph codeph">ORDER BY <var class="keyword varname">column_number</var></code> can only be used when the query explicitly lists the columns in the <code class="ph codeph">SELECT</code> list, not with <code class="ph codeph">SELECT *</code> queries. </p> + <p class="p"> <strong class="ph b">Ascending and descending sorts:</strong> </p> + <p class="p"> The default sort order (the same as using the <code class="ph codeph">ASC</code> keyword) puts the smallest values at the start of the result set, and the largest values at the end. Specifying the <code class="ph codeph">DESC</code> keyword reverses that order. </p> + <p class="p"> <strong class="ph b">Sort order for NULL values:</strong> </p> + <p class="p"> See <a class="xref" href="impala_literals.html#null">NULL</a> for details about how <code class="ph codeph">NULL</code> values are positioned in the sorted result set, and how to use the <code class="ph codeph">NULLS FIRST</code> and <code class="ph codeph">NULLS LAST</code> @@ -80,6 +111,7 @@ col_ref ::= <var class="keyword varname">column_name</var> | <var class="keyword <code class="ph codeph">NULLS LAST</code> keywords are new in Impala 1.2.1.) </p> + <p class="p"> Prior to Impala 1.4.0, Impala required any query including an <code class="ph codeph"><a class="xref" href="../shared/../topics/impala_order_by.html#order_by">ORDER BY</a></code> clause to also use a @@ -89,11 +121,13 @@ col_ref ::= <var class="keyword varname">column_name</var> | <var class="keyword Impala automatically uses a temporary disk work area to perform the sort operation. </p> + <p class="p"> <strong class="ph b">Complex type considerations:</strong> </p> + <p class="p"> In <span class="keyword">Impala 2.3</span> and higher, the complex data types <code class="ph codeph">STRUCT</code>, <code class="ph codeph">ARRAY</code>, and <code class="ph codeph">MAP</code> are available. These columns cannot @@ -104,10 +138,12 @@ col_ref ::= <var class="keyword varname">column_name</var> | <var class="keyword See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about Impala support for complex types. </p> + <p class="p"> The following query shows how a complex type column cannot be directly used in an <code class="ph codeph">ORDER BY</code> clause: </p> + <pre class="pre codeblock"><code>CREATE TABLE games (id BIGINT, score ARRAY <BIGINT>) STORED AS PARQUET; ...use LOAD DATA to load externally created Parquet files into the table... SELECT id FROM games ORDER BY score DESC; @@ -118,6 +154,7 @@ ERROR: AnalysisException: ORDER BY expression 'score' with complex type 'ARRAY&l <strong class="ph b">Examples:</strong> </p> + <p class="p"> The following query retrieves the user ID and score, only for scores greater than one million, with the highest scores for each user listed first. @@ -126,6 +163,7 @@ ERROR: AnalysisException: ORDER BY expression 'score' with complex type 'ARRAY&l pseudocolumn that represents each array element. </p> + <pre class="pre codeblock"><code>SELECT id, item FROM games, games.score WHERE item > 1000000 ORDER BY id, item desc; @@ -140,6 +178,7 @@ ORDER BY id, item desc; and <code class="ph codeph">ITEM.VALUE</code>. </p> + <pre class="pre codeblock"><code>CREATE TABLE games2 (id BIGINT, play array < struct <game_name: string, score: BIGINT, high_score: boolean> >) STORED AS PARQUET ...use LOAD DATA to load externally created Parquet files into the table... SELECT id, item.game_name, item.score FROM games2, games2.play @@ -157,6 +196,7 @@ ORDER BY id, info.value desc; <strong class="ph b">Usage notes:</strong> </p> + <p class="p"> Although the <code class="ph codeph">LIMIT</code> clause is now optional on <code class="ph codeph">ORDER BY</code> queries, if your query only needs some number of rows that you can predict in advance, use the <code class="ph codeph">LIMIT</code> clause @@ -165,6 +205,7 @@ ORDER BY id, info.value desc; coordinator node picks the 10 highest or lowest row values out of this small intermediate result set. </p> + <p class="p"> If an <code class="ph codeph">ORDER BY</code> clause is applied to an early phase of query processing, such as a subquery or a view definition, Impala ignores the <code class="ph codeph">ORDER BY</code> clause. To get ordered results from a @@ -172,11 +213,13 @@ ORDER BY id, info.value desc; level. </p> + <p class="p"> <code class="ph codeph">ORDER BY</code> is often used in combination with <code class="ph codeph">LIMIT</code> to perform <span class="q">"top-N"</span> queries: </p> + <pre class="pre codeblock"><code>SELECT user_id AS "Top 10 Visitors", SUM(page_views) FROM web_stats GROUP BY page_views, user_id ORDER BY SUM(page_views) DESC LIMIT 10; @@ -188,6 +231,7 @@ ORDER BY id, info.value desc; queries like this against the large tables typically used with Impala: </p> + <pre class="pre codeblock"><code>SELECT page_title AS "Page 1 of search results", page_url FROM search_content WHERE LOWER(page_title) LIKE '%game%') ORDER BY page_title LIMIT 10 OFFSET 0; @@ -203,12 +247,14 @@ SELECT page_title AS "Page 3 of search results", page_url FROM search_content <strong class="ph b">Internal details:</strong> </p> + <p class="p"> Impala sorts the intermediate results of an <code class="ph codeph">ORDER BY</code> clause in memory whenever practical. In a cluster of N DataNodes, each node sorts roughly 1/Nth of the result set, the exact proportion varying depending on how the data matching the query is distributed in HDFS. </p> + <p class="p"> If the size of the sorted intermediate result set on any DataNode would cause the query to exceed the Impala memory limit, Impala sorts as much as practical in memory, then writes partially sorted data to disk. (This @@ -221,6 +267,7 @@ SELECT page_title AS "Page 3 of search results", page_url FROM search_content the additional RAM to minimize the amount of disk I/O for sorting. </p> + <p class="p"> This external sort technique is used as appropriate on each DataNode (possibly including the coordinator node) to sort the portion of the result set that is processed on that node. When the sorted intermediate @@ -229,10 +276,12 @@ SELECT page_title AS "Page 3 of search results", page_url FROM search_content coordinator node. </p> + <p class="p"> <strong class="ph b">Configuration for disk usage:</strong> </p> + <p class="p"> By default, intermediate files used during large sort, join, aggregation, or analytic function operations are stored in the directory <span class="ph filepath">/tmp/impala-scratch</span> . These files are removed when the @@ -254,6 +303,7 @@ SELECT page_title AS "Page 3 of search results", page_url FROM search_content + <p class="p"> <strong class="ph b">Sorting considerations:</strong> Although you can specify an <code class="ph codeph">ORDER BY</code> clause in an <code class="ph codeph">INSERT ... SELECT</code> statement, any <code class="ph codeph">ORDER BY</code> clause is ignored and the @@ -262,6 +312,7 @@ SELECT page_title AS "Page 3 of search results", page_url FROM search_content stored in sorted order is impractical. </p> + <div class="p"> An <code class="ph codeph">ORDER BY</code> clause without an additional <code class="ph codeph">LIMIT</code> clause is ignored in any view definition. If you need to sort the entire result set from a view, use an <code class="ph codeph">ORDER BY</code> @@ -311,11 +362,13 @@ SELECT page_title AS "Page 3 of search results", page_url FROM search_content </code></pre> </div> + <p class="p"> With the lifting of the requirement to include a <code class="ph codeph">LIMIT</code> clause in every <code class="ph codeph">ORDER BY</code> query (in Impala 1.4 and higher): </p> + <ul class="ul"> <li class="li"> <p class="p"> @@ -323,19 +376,13 @@ SELECT page_title AS "Page 3 of search results", page_url FROM search_content particular DataNode, as opposed to the previous possibility of an <span class="q">"out of memory"</span> error. Make sure to keep at least 1 GB free on the filesystem used for temporary sorting work. </p> - </li> - <li class="li"> - <p class="p"> - The query options - <a class="xref" href="impala_default_order_by_limit.html#default_order_by_limit">DEFAULT_ORDER_BY_LIMIT</a> and - <a class="xref" href="impala_abort_on_default_limit_exceeded.html#abort_on_default_limit_exceeded">ABORT_ON_DEFAULT_LIMIT_EXCEEDED</a>, - which formerly controlled the behavior of <code class="ph codeph">ORDER BY</code> queries with no limit specified, are - now ignored. - </p> </li> + + </ul> + <p class="p"> In Impala 1.2.1 and higher, all <code class="ph codeph">NULL</code> values come at the end of the result set for <code class="ph codeph">ORDER BY ... ASC</code> queries, and at the beginning of the result set for <code class="ph codeph">ORDER BY ... @@ -346,6 +393,7 @@ SELECT page_title AS "Page 3 of search results", page_url FROM search_content behavior for <code class="ph codeph">NULL</code> by adding the clause <code class="ph codeph">NULLS FIRST</code> or <code class="ph codeph">NULLS LAST</code> at the end of the <code class="ph codeph">ORDER BY</code> clause. </p> + <pre class="pre codeblock"><code>[localhost:21000] > create table numbers (x int); [localhost:21000] > insert into numbers values (1), (null), (2), (null), (3); [localhost:21000] > select x from numbers order by x nulls first; @@ -394,14 +442,23 @@ SELECT page_title AS "Page 3 of search results", page_url FROM search_content <strong class="ph b">Related information:</strong> </p> + <p class="p"> See <a class="xref" href="impala_select.html#select">SELECT Statement</a> for further examples of queries with the <code class="ph codeph">ORDER BY</code> clause. </p> + <p class="p"> Analytic functions use the <code class="ph codeph">ORDER BY</code> clause in a different context to define the sequence in which rows are analyzed. See <a class="xref" href="impala_analytic_functions.html#analytic_functions">Impala Analytic Functions</a> for details. </p> + </div> -<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_select.html">SELECT Statement</a></div></div></nav></article></main></body></html> \ No newline at end of file + +<div class="related-links"> +<div class="familylinks"> +<div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_select.html">SELECT Statement</a></div> +</div> +</div></body> +</html> \ No newline at end of file