http://git-wip-us.apache.org/repos/asf/impala/blob/b4ad38a9/docs/build/html/topics/impala_complex_types.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_complex_types.html b/docs/build/html/topics/impala_complex_types.html index 1920363..119508e 100644 --- a/docs/build/html/topics/impala_complex_types.html +++ b/docs/build/html/topics/impala_complex_types.html @@ -1,9 +1,29 @@ +<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE html - SYSTEM "about:legacy-compat"> -<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.12x"><meta name="version" content="Impala 2.12x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="complex_types"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Complex Types (Impala 2.3 or higher only)</title></head><body id="complex_types"><main role="main"><article role="article" aria-labelledby="complex_types__nested_types"> + PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> +<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> +<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> + +<meta name="copyright" content="(C) Copyright 2018" /> +<meta name="DC.rights.owner" content="(C) Copyright 2018" /> +<meta name="DC.Type" content="concept" /> +<meta name="DC.Title" content="Complex Types (Impala 2.3 or higher only)" /> +<meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html" /> +<meta name="prodname" content="Impala" /> 
+<meta name="prodname" content="Impala" /> +<meta name="version" content="Impala 3.0.x" /> +<meta name="version" content="Impala 3.0.x" /> +<meta name="DC.Format" content="XHTML" /> +<meta name="DC.Identifier" content="complex_types" /> +<link rel="stylesheet" type="text/css" href="../commonltr.css" /> +<title>Complex Types (Impala 2.3 or higher only)</title> +</head> +<body id="complex_types"> + <h1 class="title topictitle1" id="complex_types__nested_types">Complex Types (<span class="keyword">Impala 2.3</span> or higher only)</h1> + <div class="body conbody"> @@ -19,64 +39,85 @@ and higher. The Hive <code class="ph codeph">UNION</code> type is not currently supported. </p> + <p class="p toc inpage"></p> + <p class="p"> Once you understand the basics of complex types, refer to the individual type topics when you need to refresh your memory about syntax and examples: </p> + <ul class="ul"> <li class="li"> <a class="xref" href="impala_array.html#array">ARRAY Complex Type (Impala 2.3 or higher only)</a> </li> + <li class="li"> <a class="xref" href="impala_struct.html#struct">STRUCT Complex Type (Impala 2.3 or higher only)</a> </li> + <li class="li"> <a class="xref" href="impala_map.html#map">MAP Complex Type (Impala 2.3 or higher only)</a> </li> + </ul> + </div> - <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="complex_types__complex_types_benefits"> + + <div class="related-links"> +<div class="familylinks"> +<div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div> +</div> +</div><div class="topic concept nested1" aria-labelledby="ariaid-title2" id="complex_types_benefits"> <h2 class="title topictitle2" id="ariaid-title2">Benefits of Impala Complex Types</h2> 
+ <div class="body conbody"> <p class="p"> The reasons for using Impala complex types include the following: </p> + <ul class="ul"> <li class="li"> <p class="p"> You already have data produced by Hive or other non-Impala component that uses the complex type column names. You might need to convert the underlying data to Parquet to use it with Impala. </p> + </li> + <li class="li"> <p class="p"> Your data model originates with a non-SQL programming language or a NoSQL data management system. For example, if you are representing Python data expressed as nested lists, dictionaries, and tuples, those data structures correspond closely to Impala <code class="ph codeph">ARRAY</code>, <code class="ph codeph">MAP</code>, and <code class="ph codeph">STRUCT</code> types. </p> + </li> + <li class="li"> <p class="p"> Your analytic queries involving multiple tables could benefit from greater locality during join processing. By packing more related data items within each HDFS data block, complex types let join queries avoid the network overhead of the traditional Hadoop shuffle or broadcast join techniques. </p> + </li> + </ul> + <p class="p"> The Impala complex type support produces result sets with all scalar values, and the scalar components of complex types can be used with all SQL clauses, such as <code class="ph codeph">GROUP BY</code>, <code class="ph codeph">ORDER BY</code>, all kinds of joins, subqueries, and inline @@ -84,14 +125,18 @@ programming languages to deconstruct the underlying data structures. </p> + </div> - </article> - <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="complex_types__complex_types_overview"> + </div> + + + <div class="topic concept nested1" aria-labelledby="ariaid-title3" id="complex_types_overview"> <h2 class="title topictitle2" id="ariaid-title3">Overview of Impala Complex Types</h2> + <div class="body conbody"> <p class="p"> @@ -102,6 +147,7 @@ has a name. 
</p> + <p class="p"> The elements of an <code class="ph codeph">ARRAY</code> or <code class="ph codeph">MAP</code>, or the fields of a <code class="ph codeph">STRUCT</code>, can also be other complex types. You can construct elaborate data structures with up to 100 levels of nesting. For example, you can make an @@ -111,6 +157,7 @@ properties of these types. </p> + <p class="p"> When visualizing your data model in familiar SQL terms, you can think of each <code class="ph codeph">ARRAY</code> or <code class="ph codeph">MAP</code> as a miniature table, and each <code class="ph codeph">STRUCT</code> as a row within such a table. By default, the table represented by an @@ -120,6 +167,7 @@ </p> + <p class="p"> The <code class="ph codeph">ITEM</code> and <code class="ph codeph">VALUE</code> names are only required for the very simplest kinds of <code class="ph codeph">ARRAY</code> and <code class="ph codeph">MAP</code> columns, ones that hold only scalar values. When the elements within the <code class="ph codeph">ARRAY</code> or @@ -129,6 +177,7 @@ + <p class="p"> You write most queries that process complex type columns using familiar join syntax, even though the data for both sides of the join resides in a single table. The join notation brings together the scalar values from a row with the values from the complex type @@ -137,6 +186,7 @@ </p> + <p class="p"> Behind the scenes, Impala ensures that the processing for each row is done efficiently on a single host, without the network traffic involved in broadcast or shuffle joins. The most common type of join query for tables with complex type columns is <code class="ph codeph">INNER @@ -144,7 +194,8 @@ examples in this section use either the <code class="ph codeph">INNER JOIN</code> clause or the equivalent comma notation. 
</p> - <div class="note note note_note"><span class="note__title notetitle">Note:</span> + + <div class="note note"><span class="notetitle">Note:</span> <p class="p"> Although Impala can query complex types that are present in Parquet files, Impala currently cannot create new Parquet files containing complex types. Therefore, the discussion and examples presume that you are working with existing Parquet data produced @@ -152,20 +203,26 @@ files with complex type columns. </p> + <p class="p"> For learning purposes, you can create empty tables with complex type columns and practice query syntax, even if you do not have sample data with the required structure. </p> + </div> + </div> - </article> - <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="complex_types__complex_types_design"> + </div> + + + <div class="topic concept nested1" aria-labelledby="ariaid-title4" id="complex_types_design"> <h2 class="title topictitle2" id="ariaid-title4">Design Considerations for Complex Types</h2> + <div class="body conbody"> <p class="p"> @@ -175,14 +232,18 @@ type data using Impala SQL syntax. </p> + <p class="p toc inpage"></p> + </div> - <article class="topic concept nested2" aria-labelledby="ariaid-title5" id="complex_types_design__complex_types_vs_rdbms"> + + <div class="topic concept nested2" aria-labelledby="ariaid-title5" id="complex_types_vs_rdbms"> <h3 class="title topictitle3" id="ariaid-title5">How Complex Types Differ from Traditional Data Warehouse Schemas</h3> + <div class="body conbody"> <p class="p"> @@ -190,15 +251,18 @@ relational database management systems or data warehouses, a schema with complex types has the following differences: </p> + <ul class="ul"> <li class="li"> <p class="p"> Logically, related values can now be grouped tightly together in the same table. 
</p> + <p class="p"> In traditional data warehousing, related values were typically arranged in one of two ways: </p> + <ul class="ul"> <li class="li"> <p class="p"> @@ -207,8 +271,10 @@ expensive because the related data had to be retrieved from separate locations. (In the case of distributed Hadoop queries, the joined tables might even be transmitted between different hosts in a cluster.) </p> + </li> + <li class="li"> <p class="p"> Flattened into a single denormalized table. Although this layout eliminated some potential performance issues by removing @@ -216,8 +282,11 @@ cause performance issues in other parts of the workflow, such as longer ETL cycles or more expensive full-table scans during queries. </p> + </li> + </ul> + <p class="p"> Complex types represent a middle ground that addresses these performance and volume concerns. By physically locating related data within the same data files, complex types increase locality and reduce the expense of join queries. By associating an @@ -227,17 +296,23 @@ <code class="ph codeph">MAP</code> types lets you model familiar constructs such as fact and dimension tables from a data warehouse, and wide tables representing sparse matrixes. </p> + </li> + </ul> + </div> - </article> - <article class="topic concept nested2" aria-labelledby="ariaid-title6" id="complex_types_design__complex_types_physical"> + </div> + + + <div class="topic concept nested2" aria-labelledby="ariaid-title6" id="complex_types_physical"> <h3 class="title topictitle3" id="ariaid-title6">Physical Storage for Complex Types</h3> + <div class="body conbody"> <p class="p"> @@ -248,54 +323,69 @@ (possibly large) values of the composite columns. 
</p> + <p class="p"> Within each Parquet data file, the constituent parts of complex type columns are stored in column-oriented format: </p> + <ul class="ul"> <li class="li"> <p class="p"> Each field of a <code class="ph codeph">STRUCT</code> type is stored like a column, with all the scalar values adjacent to each other and encoded, compressed, and so on using the Parquet space-saving techniques. </p> + </li> + <li class="li"> <p class="p"> For an <code class="ph codeph">ARRAY</code> containing scalar values, all those values (represented by the <code class="ph codeph">ITEM</code> pseudocolumn) are stored adjacent to each other. </p> + </li> + <li class="li"> <p class="p"> For a <code class="ph codeph">MAP</code>, the values of the <code class="ph codeph">KEY</code> pseudocolumn are stored adjacent to each other. If the <code class="ph codeph">VALUE</code> pseudocolumn is a scalar type, its values are also stored adjacent to each other. </p> + </li> + <li class="li"> <p class="p"> If an <code class="ph codeph">ARRAY</code> element, <code class="ph codeph">STRUCT</code> field, or <code class="ph codeph">MAP</code> <code class="ph codeph">VALUE</code> part is another complex type, the column-oriented storage applies to the next level down (or the next level after that, and so on for deeply nested types) where the final elements, fields, or values are of scalar types. </p> + </li> + </ul> + <p class="p"> The numbers represented by the <code class="ph codeph">POS</code> pseudocolumn of an <code class="ph codeph">ARRAY</code> are not physically stored in the data files. They are synthesized at query time based on the order of the <code class="ph codeph">ARRAY</code> elements associated with each row. 
</p> + </div> - </article> - <article class="topic concept nested2" aria-labelledby="ariaid-title7" id="complex_types_design__complex_types_file_formats"> + </div> + + + <div class="topic concept nested2" aria-labelledby="ariaid-title7" id="complex_types_file_formats"> <h3 class="title topictitle3" id="ariaid-title7">File Format Support for Impala Complex Types</h3> + <div class="body conbody"> <p class="p"> @@ -303,15 +393,6 @@ for details about the performance benefits and physical layout of this file format. </p> - <p class="p"> - Each table, or each partition within a table, can have a separate file format, and you can change file format at the table or - partition level through an <code class="ph codeph">ALTER TABLE</code> statement. Because this flexibility makes it difficult to guarantee ahead - of time that all the data files for a table or partition are in a compatible format, Impala does not throw any errors when you - change the file format for a table or partition using <code class="ph codeph">ALTER TABLE</code>. Any errors come at runtime when Impala - actually processes a table or partition that contains nested types and is not in one of the supported formats. If a query on a - partitioned table only processes some partitions, and all those partitions are in one of the supported formats, the query - succeeds. - </p> <p class="p"> Because Impala does not parse the data structures containing nested types for unsupported formats such as text, Avro, @@ -321,41 +402,54 @@ nested type data and Impala queries on that table will generate errors. </p> - <div class="note note note_note"><span class="note__title notetitle">Note:</span> - <p class="p"> + + <p class="p"> The one exception to the preceding rule is <code class="ph codeph">COUNT(*)</code> queries on RCFile tables that include complex types. Such queries are allowed in <span class="keyword">Impala 2.6</span> and higher. 
- </p> - </div> + </p> + <p class="p"> - You can perform DDL operations (even <code class="ph codeph">CREATE TABLE</code>) for tables involving complex types in file formats other than - Parquet. The DDL support lets you set up intermediate tables in your ETL pipeline, to be populated by Hive, before the final stage - where the data resides in a Parquet table and is queryable by Impala. Also, you can have a partitioned table with complex type - columns that uses a non-Parquet format, and use <code class="ph codeph">ALTER TABLE</code> to change the file format to Parquet for individual - partitions. When you put Parquet data files into those partitions, Impala can execute queries against that data as long as the - query does not involve any of the non-Parquet partitions. + You can perform DDL operations for tables involving complex types in + most file formats other than Parquet. You cannot create tables in + Impala with complex types using text files. </p> + + <p class="p"> + You can have a partitioned table with complex type columns that uses + a non-Parquet format, and use <code class="ph codeph">ALTER TABLE</code> to change + the file format to Parquet for individual partitions. When you put + Parquet data files into those partitions, Impala can execute queries + against that data as long as the query does not involve any of the + non-Parquet partitions. + </p> + + <p class="p"> If you use the <span class="keyword cmdname">parquet-tools</span> command to examine the structure of a Parquet data file that includes complex types, you see that both <code class="ph codeph">ARRAY</code> and <code class="ph codeph">MAP</code> are represented as a <code class="ph codeph">Bag</code> in Parquet terminology, with all fields marked <code class="ph codeph">Optional</code> because Impala allows any column to be nullable. </p> + <p class="p"> Impala supports either 2-level and 3-level encoding within each Parquet data file. 
When constructing Parquet data files outside Impala, use either encoding style but do not mix 2-level and 3-level encoding within the same data file. </p> + </div> - </article> - <article class="topic concept nested2" aria-labelledby="ariaid-title8" id="complex_types_design__complex_types_vs_normalization"> + </div> + + + <div class="topic concept nested2" aria-labelledby="ariaid-title8" id="complex_types_vs_normalization"> <h3 class="title topictitle3" id="ariaid-title8">Choosing Between Complex Types and Normalized Tables</h3> + <div class="body conbody"> <p class="p"> @@ -363,6 +457,7 @@ decision. </p> + <ul class="ul"> <li class="li"> <p class="p"> @@ -370,24 +465,30 @@ between tables. Your business intelligence tools might already be optimized for dealing with this kind of multi-table scenario through join queries. </p> + </li> + <li class="li"> <p class="p"> If you are pulling data from Impala into an application written in a programming language that has data structures analogous to the complex types, such as Python or Java, complex types in Impala could simplify data interchange and improve understandability and reliability of your program logic. </p> + </li> + <li class="li"> <p class="p"> You might already be faced with existing infrastructure or receive high volumes of data that assume one layout or the other. For example, complex types are popular with web-oriented applications, for example to keep information about an online user all in one place for convenient lookup and analysis, or to deal with sparse or constantly evolving data fields. </p> + </li> + <li class="li"> <p class="p"> If some parts of the data change over time while related data remains constant, using multiple normalized tables lets you @@ -395,12 +496,15 @@ together, such as in JSON files, using complex types can save the overhead of splitting the related items across multiple tables. 
</p> + </li> + <li class="li"> <p class="p"> From a performance perspective: </p> + <ul class="ul"> <li class="li"> <p class="p"> @@ -410,8 +514,10 @@ from that column, only the data for the relevant parts of the column type hierarchy. </p> + </li> + <li class="li"> <p class="p"> Complex types avoid the possibility of expensive join queries when data from fact and dimension tables is processed in @@ -419,8 +525,10 @@ block, and therefore does not need to be transmitted across the network when joining fields that are all part of the same row. </p> + </li> + <li class="li"> <p class="p"> The tradeoff with complex types is that fewer rows fit in each data block. Whether it is better to have more data blocks @@ -430,26 +538,28 @@ size by including complex columns might produce more data blocks and thus spread the work more evenly across the cluster. See <a class="xref" href="impala_scalability.html#scalability">Scalability Considerations for Impala</a> for more on this advanced topic. </p> + </li> + </ul> - </li> - </ul> - </div> + </li> - </article> + </ul> - <article class="topic concept nested2" aria-labelledby="ariaid-title9" id="complex_types_design__complex_types_hive"> - <h3 class="title topictitle3" id="ariaid-title9">Differences Between Impala and Hive Complex Types</h3> + </div> - <div class="body conbody"> + </div> + <div class="topic concept nested2" aria-labelledby="ariaid-title9" id="complex_types_hive"> + <h3 class="title topictitle3" id="ariaid-title9">Differences Between Impala and Hive Complex Types</h3> + <div class="body conbody"> <p class="p"> Impala can query Parquet tables containing <code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code> columns @@ -458,25 +568,31 @@ </p> <p class="p"> - The syntax for specifying <code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code> types in a <code class="ph codeph">CREATE - TABLE</code> 
statement is compatible between Impala and Hive. + Impala supports a subset of the syntax that Hive supports for + specifying <code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and + <code class="ph codeph">MAP</code> types in the <code class="ph codeph">CREATE TABLE</code> + statements. </p> + <p class="p"> Because Impala <code class="ph codeph">STRUCT</code> columns include user-specified field names, you use the <code class="ph codeph">NAMED_STRUCT()</code> constructor in Hive rather than the <code class="ph codeph">STRUCT()</code> constructor when you populate an Impala <code class="ph codeph">STRUCT</code> column using a Hive <code class="ph codeph">INSERT</code> statement. </p> + <p class="p"> The Hive <code class="ph codeph">UNION</code> type is not currently supported in Impala. </p> + <p class="p"> While Impala usually aims for a high degree of compatibility with HiveQL query syntax, Impala syntax differs from Hive for queries involving complex types. The differences are intended to provide extra flexibility for queries involving these kinds of tables. </p> + <ul class="ul"> <li class="li"> Impala uses dot notation for referring to element names or elements within complex types, and join notation for @@ -484,18 +600,21 @@ VIEW</code> clause and <code class="ph codeph">EXPLODE()</code> function of HiveQL. </li> + <li class="li"> Using join notation lets you use all the kinds of join queries with complex type columns. For example, you can use a <code class="ph codeph">LEFT OUTER JOIN</code>, <code class="ph codeph">LEFT ANTI JOIN</code>, or <code class="ph codeph">LEFT SEMI JOIN</code> query to evaluate different scenarios where the complex columns do or do not contain any elements. </li> + <li class="li"> You can include references to collection types inside subqueries and inline views. 
For example, you can construct a <code class="ph codeph">FROM</code> clause where one of the <span class="q">"tables"</span> is a subquery against a complex type column, or use a subquery against a complex type column as the argument to an <code class="ph codeph">IN</code> or <code class="ph codeph">EXISTS</code> clause. </li> + <li class="li"> The Impala pseudocolumn <code class="ph codeph">POS</code> lets you retrieve the position of elements in an array along with the elements themselves, equivalent to the <code class="ph codeph">POSEXPLODE()</code> function of HiveQL. You do not use index notation to retrieve a @@ -503,39 +622,50 @@ specify which elements to return. </li> + <li class="li"> <p class="p"> Join clauses involving complex type columns do not require an <code class="ph codeph">ON</code> or <code class="ph codeph">USING</code> clause. Impala implicitly applies the join key so that the correct array entries or map elements are associated with the correct row from the table. </p> + </li> + <li class="li"> <p class="p"> Impala does not currently support the <code class="ph codeph">UNION</code> complex type. </p> + </li> + </ul> + </div> - </article> - <article class="topic concept nested2" aria-labelledby="ariaid-title10" id="complex_types_design__complex_types_limits"> + </div> + + + <div class="topic concept nested2" aria-labelledby="ariaid-title10" id="complex_types_limits"> <h3 class="title topictitle3" id="ariaid-title10">Limitations and Restrictions for Complex Types</h3> + <div class="body conbody"> <p class="p"> Complex type columns can only be used in tables or partitions with the Parquet file format. </p> + <p class="p"> Complex type columns cannot be used as partition key columns in a partitioned table. 
</p> + <p class="p"> When you use complex types with the <code class="ph codeph">ORDER BY</code>, <code class="ph codeph">GROUP BY</code>, <code class="ph codeph">HAVING</code>, or <code class="ph codeph">WHERE</code> clauses, you cannot refer to the column name by itself. Instead, you refer to the names of the scalar @@ -543,32 +673,38 @@ <code class="ph codeph">VALUE</code> pseudocolumns, or the field names from a <code class="ph codeph">STRUCT</code>. </p> + <p class="p"> The maximum depth of nesting for complex types is 100 levels. </p> + <p class="p"> The maximum length of the column definition for any complex type, including declarations for any nested types, is 4000 characters. </p> + <p class="p"> For ideal performance and scalability, use small or medium-sized collections, where all the complex columns contain at most a few hundred megabytes per row. Remember, all the columns of a row are stored in the same HDFS data block, whose size in Parquet files typically ranges from 256 MB to 1 GB. </p> + <p class="p"> Including complex type columns in a table introduces some overhead that might make queries that do not reference those columns somewhat slower than Impala queries against tables without any complex type columns. Expect at most a 2x slowdown compared to tables that do not have any complex type columns. </p> + <p class="p"> Currently, the <code class="ph codeph">COMPUTE STATS</code> statement does not collect any statistics for columns containing complex types. Impala uses heuristics to construct execution plans involving complex type columns. </p> + <p class="p"> Currently, Impala built-in functions and user-defined functions cannot accept complex types as parameters or produce them as function return values. (When the complex type values are materialized in an Impala result set, the result set contains the scalar @@ -577,6 +713,7 @@ scalar data items <em class="ph i">can</em> be used with built-in functions and UDFs as usual.) 
</p> + <p class="p"> Impala currently cannot write new data files containing complex type columns. Therefore, although the <code class="ph codeph">SELECT</code> statement works for queries @@ -586,6 +723,7 @@ ETL mechanism such as MapReduce jobs, Spark jobs, Pig, and so on. </p> + <p class="p"> Currently, Impala can query complex type columns only from Parquet tables or Parquet partitions within partitioned tables. Although you can use complex types in tables with Avro, text, and other file formats as part of your ETL pipeline, for example as @@ -595,16 +733,21 @@ <a class="xref" href="impala_complex_types.html#complex_types_file_formats">File Format Support for Impala Complex Types</a> for more details. </p> + </div> - </article> - </article> + </div> + + + </div> + - <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="complex_types__complex_types_using"> + <div class="topic concept nested1" aria-labelledby="ariaid-title11" id="complex_types_using"> <h2 class="title topictitle2" id="ariaid-title11">Using Complex Types from SQL</h2> + <div class="body conbody"> <p class="p"> @@ -614,14 +757,18 @@ number of Parquet tables, and use Hive, Spark, Pig, or other mechanism outside Impala to populate the tables with data. </p> + <p class="p toc inpage"></p> + </div> - <article class="topic concept nested2" aria-labelledby="ariaid-title12" id="complex_types_using__nested_types_ddl"> + + <div class="topic concept nested2" aria-labelledby="ariaid-title12" id="nested_types_ddl"> <h3 class="title topictitle3" id="ariaid-title12">Complex Type Syntax for DDL Statements</h3> + <div class="body conbody"> <p class="p"> @@ -629,6 +776,7 @@ statements, now includes complex types in addition to primitive types: </p> + <pre class="pre codeblock"><code> primitive_type | array_type | map_type @@ -640,31 +788,31 @@ </p> <p class="p"> - Array, struct, and map column type declarations are specified in the <code class="ph codeph">CREATE TABLE</code> statement. 
You can also add or - change the type of complex columns through the <code class="ph codeph">ALTER TABLE</code> statement. - </p> + <code class="ph codeph">Array</code>, <code class="ph codeph">struct</code>, and + <code class="ph codeph">map</code> column type declarations are specified in the + <code class="ph codeph">CREATE TABLE</code> statement. You can also add or change + the type of complex columns through the <code class="ph codeph">ALTER TABLE</code> + statement. </p> - <div class="note note note_note"><span class="note__title notetitle">Note:</span> - <p class="p"> - Currently, Impala queries allow complex types only in tables that use the Parquet format. If an Impala query encounters complex - types in a table or partition using another file format, the query returns a runtime error. - </p> + <p class="p"> Currently, Impala queries allow complex types only in tables that + use the Parquet format. If an Impala query encounters complex types in + a table or partition using another file format, the query returns a + runtime error. </p> + + <p class="p"> You can use <code class="ph codeph">ALTER TABLE ... SET FILEFORMAT PARQUET</code> + to change the file format of an existing table containing complex + types to Parquet, after which Impala can query it. Make sure to load + Parquet files into the table after changing the file format, because + the <code class="ph codeph">ALTER TABLE ... SET FILEFORMAT</code> statement does not + convert existing data to the new file format. </p> - <p class="p"> - The Impala DDL support for complex types works for all file formats, so that you can create tables using text or other - non-Parquet formats for Hive to use as staging tables in an ETL cycle that ends with the data in a Parquet table. You can also - use <code class="ph codeph">ALTER TABLE ... SET FILEFORMAT PARQUET</code> to change the file format of an existing table containing complex - types to Parquet, after which Impala can query it. 
Make sure to load Parquet files into the table after changing the file - format, because the <code class="ph codeph">ALTER TABLE ... SET FILEFORMAT</code> statement does not convert existing data to the new file - format. - </p> - </div> <p class="p"> Partitioned tables can contain complex type columns. All the partition key columns must be scalar types. </p> + <p class="p"> Because use cases for Impala complex types require that you already have Parquet data files produced outside of Impala, you can use the Impala <code class="ph codeph">CREATE TABLE LIKE PARQUET</code> syntax to produce a table with columns that match the structure of an @@ -673,13 +821,15 @@ resulting table is still text. </p> + <p class="p"> Because the complex columns are omitted from the result set of an Impala <code class="ph codeph">SELECT *</code> or <code class="ph codeph">SELECT <var class="keyword varname">col_name</var></code> query, and because Impala currently does not support writing Parquet files with complex type columns, you cannot use the <code class="ph codeph">CREATE TABLE AS SELECT</code> syntax to create a table with nested type columns. </p> - <div class="note note note_note"><span class="note__title notetitle">Note:</span> + + <div class="note note"><span class="notetitle">Note:</span> <p class="p"> Once you have a table set up with complex type columns, use the <code class="ph codeph">DESCRIBE</code> and <code class="ph codeph">SHOW CREATE TABLE</code> statements to see the correct notation with <code class="ph codeph"><</code> and <code class="ph codeph">></code> delimiters and comma and colon @@ -689,21 +839,25 @@ referring to items within the complex type columns. In the <code class="ph codeph">FROM</code> clause, you use join notation to construct table aliases for any referenced <code class="ph codeph">ARRAY</code> and <code class="ph codeph">MAP</code> columns. 
</p> + </div> + <p class="p"> For example, when defining a table that holds contact information, you might represent phone numbers differently depending on the expected layout and relationships of the data, and how well you can predict those properties in advance. </p> + <p class="p"> Here are different ways that you might represent phone numbers in a traditional relational schema, with equivalent representations using complex types. </p> - <figure class="fig fignone" id="nested_types_ddl__complex_types_phones_flat_fixed"><figcaption><span class="fig--title-label">Figure 1. </span>Traditional Relational Representation of Phone Numbers: Single Table</figcaption> + + <div class="fig fignone" id="nested_types_ddl__complex_types_phones_flat_fixed"><span class="figcap"><span class="fig--title-label">Figure 1. </span>Traditional Relational Representation of Phone Numbers: Single Table</span> @@ -714,6 +868,7 @@ corresponding column is <code class="ph codeph">NULL</code> for that row. </p> + <pre class="pre codeblock"><code> CREATE TABLE contacts_fixed_phones ( @@ -726,9 +881,10 @@ CREATE TABLE contacts_fixed_phones ) STORED AS PARQUET; </code></pre> - </figure> + </div> - <figure class="fig fignone" id="nested_types_ddl__complex_types_phones_array"><figcaption><span class="fig--title-label">Figure 2. </span>An Array of Phone Numbers</figcaption> + + <div class="fig fignone" id="nested_types_ddl__complex_types_phones_array"><span class="figcap"><span class="fig--title-label">Figure 2. </span>An Array of Phone Numbers</span> @@ -740,6 +896,7 @@ CREATE TABLE contacts_fixed_phones <code class="ph codeph">ARRAY</code> where each element is a <code class="ph codeph">STRUCT</code>.) 
</p> + <pre class="pre codeblock"><code> CREATE TABLE contacts_array_of_phones ( @@ -751,9 +908,10 @@ CREATE TABLE contacts_array_of_phones </code></pre> - </figure> + </div> + - <figure class="fig fignone" id="nested_types_ddl__complex_types_phones_map"><figcaption><span class="fig--title-label">Figure 3. </span>A Map of Phone Numbers</figcaption> + <div class="fig fignone" id="nested_types_ddl__complex_types_phones_map"><span class="figcap"><span class="fig--title-label">Figure 3. </span>A Map of Phone Numbers</span> @@ -764,6 +922,7 @@ CREATE TABLE contacts_array_of_phones <code class="ph codeph">'mobile'</code>. A query could filter the data based on the key values, or display the key values in reports. </p> + <pre class="pre codeblock"><code> CREATE TABLE contacts_unlimited_phones ( @@ -772,9 +931,10 @@ CREATE TABLE contacts_unlimited_phones </code></pre> - </figure> + </div> + - <figure class="fig fignone" id="nested_types_ddl__complex_types_phones_flat_normalized"><figcaption><span class="fig--title-label">Figure 4. </span>Traditional Relational Representation of Phone Numbers: Normalized Tables</figcaption> + <div class="fig fignone" id="nested_types_ddl__complex_types_phones_flat_normalized"><span class="figcap"><span class="fig--title-label">Figure 4. </span>Traditional Relational Representation of Phone Numbers: Normalized Tables</span> @@ -785,6 +945,7 @@ CREATE TABLE contacts_unlimited_phones number, such as whether it is a home, work, or mobile phone. </p> + <p class="p"> The flexibility of this approach comes with some drawbacks. Reconstructing all the data for a particular person requires a join query, which might require performance tuning on Hadoop because the data from each table might be transmitted from a different @@ -792,6 +953,7 @@ CREATE TABLE contacts_unlimited_phones table. </p> + <p class="p"> This example illustrates a traditional database schema to store contact info normalized across 2 tables. 
The fact table establishes the identity and basic information about person. A dimension table stores information only about phone numbers, @@ -800,6 +962,7 @@ CREATE TABLE contacts_unlimited_phones to represent all sorts of details about each phone number. </p> + <pre class="pre codeblock"><code> CREATE TABLE fact_contacts (id BIGINT, name STRING, address STRING) STORED AS PARQUET; CREATE TABLE dim_phones @@ -819,9 +982,10 @@ CREATE TABLE dim_phones STORED AS PARQUET; </code></pre> - </figure> + </div> + - <figure class="fig fignone" id="nested_types_ddl__complex_types_phones_array_struct"><figcaption><span class="fig--title-label">Figure 5. </span>Phone Numbers Represented as an Array of Structs</figcaption> + <div class="fig fignone" id="nested_types_ddl__complex_types_phones_array_struct"><span class="figcap"><span class="fig--title-label">Figure 5. </span>Phone Numbers Represented as an Array of Structs</span> @@ -834,6 +998,7 @@ STORED AS PARQUET; table from the previous example. </p> + <p class="p"> You can do all the same kinds of queries with the complex type schema as with the normalized schema from the previous example. The advantages of the complex type design are in the areas of convenience and performance. Now your backup and ETL processes @@ -842,6 +1007,7 @@ STORED AS PARQUET; single host without requiring network transmission. 
</p> + <pre class="pre codeblock"><code> CREATE TABLE contacts_detailed_phones ( @@ -862,16 +1028,20 @@ CREATE TABLE contacts_detailed_phones </code></pre> - </figure> + </div> + </div> - </article> - <article class="topic concept nested2" aria-labelledby="ariaid-title13" id="complex_types_using__complex_types_sql"> + </div> + + + <div class="topic concept nested2" aria-labelledby="ariaid-title13" id="complex_types_sql"> <h3 class="title topictitle3" id="ariaid-title13">SQL Statements that Support Complex Types</h3> + <div class="body conbody"> <p class="p"> @@ -885,6 +1055,7 @@ CREATE TABLE contacts_detailed_phones containing complex type columns into a table, and query Parquet tables containing complex types. </p> + <p class="p"> Impala currently cannot write new data files containing complex type columns. Therefore, although the <code class="ph codeph">SELECT</code> statement works for queries @@ -894,20 +1065,25 @@ CREATE TABLE contacts_detailed_phones ETL mechanism such as MapReduce jobs, Spark jobs, Pig, and so on. </p> + <p class="p toc inpage"></p> + </div> - <article class="topic concept nested3" aria-labelledby="ariaid-title14" id="complex_types_sql__complex_types_ddl"> + + <div class="topic concept nested3" aria-labelledby="ariaid-title14" id="complex_types_ddl"> <h4 class="title topictitle4" id="ariaid-title14">DDL Statements and Complex Types</h4> + <div class="body conbody"> <p class="p"> Column specifications for complex or nested types use <code class="ph codeph"><</code> and <code class="ph codeph">></code> delimiters: </p> + <pre class="pre codeblock"><code>-- What goes inside the < > for an ARRAY is a single type, either a scalar or another -- complex type (ARRAY, STRUCT, or MAP). 
CREATE TABLE array_t @@ -950,12 +1126,15 @@ STORED AS PARQUET; </div> - </article> - <article class="topic concept nested3" aria-labelledby="ariaid-title15" id="complex_types_sql__complex_types_queries"> + </div> + + + <div class="topic concept nested3" aria-labelledby="ariaid-title15" id="complex_types_queries"> <h4 class="title topictitle4" id="ariaid-title15">Queries and Complex Types</h4> + <div class="body conbody"> @@ -969,12 +1148,14 @@ STORED AS PARQUET; columns with complex types are skipped. </p> + <p class="p"> The following example shows how referring directly to a complex type column returns an error, while <code class="ph codeph">SELECT *</code> on the same table succeeds, but only retrieves the scalar columns. </p> - <div class="note note note_note"><span class="note__title notetitle">Note:</span> + + <div class="note note"><span class="notetitle">Note:</span> Many of the complex type examples refer to tables such as <code class="ph codeph">CUSTOMER</code> and <code class="ph codeph">REGION</code> adapted from the tables used in the TPC-H benchmark. @@ -984,6 +1165,7 @@ STORED AS PARQUET; + <pre class="pre codeblock"><code>SELECT c_orders FROM customer LIMIT 1; ERROR: AnalysisException: Expr 'c_orders' in select list returns a complex type 'ARRAY<STRUCT<o_orderkey:BIGINT,o_orderstatus:STRING, ... l_receiptdate:STRING,l_shipinstruct:STRING,l_shipmode:STRING,l_comment:STRING>>>>'. Only scalar types are allowed in the select list. 
@@ -1043,6 +1225,7 @@ DESC select_star_customer; + <pre class="pre codeblock"><code>SELECT id, address.city FROM customers WHERE address.zip = 94305; </code></pre> @@ -1052,6 +1235,7 @@ DESC select_star_customer; + <pre class="pre codeblock"><code>select r_name, r_nations.item.n_name from region, region.r_nations limit 7; +--------+----------------+ | r_name | item.n_name | @@ -1073,6 +1257,7 @@ DESC select_star_customer; <code class="ph codeph">MAP_FIELD.VALUE</code>, which have zero, one, or many instances for each row from the containing table. </p> + <pre class="pre codeblock"><code>DESCRIBE table_0; +---------+-----------------------+ | name | type | @@ -1114,6 +1299,7 @@ LIMIT 10; + <pre class="pre codeblock"><code>SELECT id, phone_numbers.area_code FROM contact_info_many_structs INNER JOIN contact_info_many_structs.phone_numbers phone_numbers LIMIT 3; </code></pre> @@ -1131,7 +1317,8 @@ LIMIT 10; - <div class="note note note_note"><span class="note__title notetitle">Note:</span> + + <div class="note note"><span class="notetitle">Note:</span> Many of the complex type examples refer to tables such as <code class="ph codeph">CUSTOMER</code> and <code class="ph codeph">REGION</code> adapted from the tables used in the TPC-H benchmark. @@ -1139,11 +1326,13 @@ LIMIT 10; for the table definitions. </div> + <p class="p"> For example, the following queries work equivalently. They each return customer and order data for customers that have at least one order. </p> + <pre class="pre codeblock"><code>SELECT c.c_name, o.o_orderkey FROM customer c, c.c_orders o LIMIT 5; +--------------------+------------+ | c_name | o_orderkey | @@ -1172,6 +1361,7 @@ SELECT c.c_name, o.o_orderkey FROM customer c INNER JOIN c.c_orders o LIMIT 5; <code class="ph codeph">C_ORDERS</code> array): </p> + <pre class="pre codeblock"><code>SELECT c.c_custkey, o.o_orderkey FROM customer c LEFT OUTER JOIN c.c_orders o LIMIT 5; @@ -1193,6 +1383,7 @@ LIMIT 5; information in the right-hand table.) 
</p> + <pre class="pre codeblock"><code>SELECT c.c_custkey, c.c_name FROM customer c LEFT ANTI JOIN c.c_orders o LIMIT 5; @@ -1214,12 +1405,14 @@ LIMIT 5; You can also perform correlated subqueries to examine the properties of complex type columns for each row in the result set. </p> + <p class="p"> Count the number of orders per customer. Note the correlated reference to the table alias <code class="ph codeph">C</code>. The <code class="ph codeph">COUNT(*)</code> operation applies to all the elements of the <code class="ph codeph">C_ORDERS</code> array for the corresponding row, avoiding the need for a <code class="ph codeph">GROUP BY</code> clause. </p> + <pre class="pre codeblock"><code>select c_name, howmany FROM customer c, (SELECT COUNT(*) howmany FROM c.c_orders) v limit 5; +--------------------+---------+ | c_name | howmany | @@ -1236,6 +1429,7 @@ LIMIT 5; Count the number of orders per customer, ignoring any customers that have not placed any orders: </p> + <pre class="pre codeblock"><code>SELECT c_name, howmany_orders FROM customer c, @@ -1260,6 +1454,7 @@ LIMIT 5; from each row of the <code class="ph codeph">CUSTOMERS</code> table. </p> + <pre class="pre codeblock"><code>SELECT c_name, o_orderkey, howmany_line_items FROM customer c, @@ -1284,6 +1479,7 @@ LIMIT 5; the original <code class="ph codeph">CUSTOMER</code> table, and only apply to the complex columns associated with that row. </p> + <pre class="pre codeblock"><code>SELECT c_name, howmany, average_price, most_items FROM customer c, @@ -1308,6 +1504,7 @@ LIMIT 5; another <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code>: </p> + <pre class="pre codeblock"><code>-- How many orders does each customer have? -- The type of the ARRAY column doesn't matter, this is just counting the elements. 
SELECT c_custkey, count(*) @@ -1361,14 +1558,18 @@ LIMIT 5; </div> - </article> - </article> + </div> + + + </div> + - <article class="topic concept nested2" aria-labelledby="ariaid-title16" id="complex_types_using__pseudocolumns"> + <div class="topic concept nested2" aria-labelledby="ariaid-title16" id="pseudocolumns"> <h3 class="title topictitle3" id="ariaid-title16">Pseudocolumns for ARRAY and MAP Types</h3> + <div class="body conbody"> <p class="p"> @@ -1378,6 +1579,7 @@ LIMIT 5; part of qualified column names in queries: </p> + <ul class="ul"> <li class="li"> <code class="ph codeph">ITEM</code>: The value of an array element. If the <code class="ph codeph">ARRAY</code> contains <code class="ph codeph">STRUCT</code> elements, @@ -1385,32 +1587,40 @@ LIMIT 5; <code class="ph codeph"><var class="keyword varname">array_name</var>.<var class="keyword varname">field_name</var></code>. </li> + <li class="li"> <code class="ph codeph">POS</code>: The position of an element within an array. </li> + <li class="li"> <code class="ph codeph">KEY</code>: The value forming the first part of a key-value pair in a map. It is not necessarily unique. </li> + <li class="li"> <code class="ph codeph">VALUE</code>: The data item forming the second part of a key-value pair in a map. If the <code class="ph codeph">VALUE</code> part of the <code class="ph codeph">MAP</code> element is a <code class="ph codeph">STRUCT</code>, you can refer to either <code class="ph codeph"><var class="keyword varname">map_name</var>.VALUE.<var class="keyword varname">field_name</var></code> or use the shorthand <code class="ph codeph"><var class="keyword varname">map_name</var>.<var class="keyword varname">field_name</var></code>. 
</li> + </ul> + <p class="p toc inpage"></p> + </div> - <article class="topic concept nested3" aria-labelledby="item__pos" id="pseudocolumns__item"> + + <div class="topic concept nested3" aria-labelledby="item__pos" id="item"> <h4 class="title topictitle4" id="item__pos">ITEM and POS Pseudocolumns</h4> + <div class="body conbody"> <p class="p"> @@ -1423,6 +1633,7 @@ LIMIT 5; <code class="ph codeph">SELECT</code> list, or the <code class="ph codeph">WHERE</code> or other clauses. </p> + <p class="p"> This example shows a table with two <code class="ph codeph">ARRAY</code> columns whose elements are of the scalar type <code class="ph codeph">STRING</code>. When referring to the values of the array elements in the <code class="ph codeph">SELECT</code> list, @@ -1430,6 +1641,7 @@ LIMIT 5; within the array, the individual elements have no defined names. </p> + <pre class="pre codeblock"><code>create TABLE persons_of_interest ( person_id BIGINT, @@ -1457,12 +1669,14 @@ WHERE associates.item LIKE '% MacGuffin'; <code class="ph codeph">POS</code> pseudocolumn lets you filter or reorder the result set based on the sequence of array elements. </p> + <p class="p"> The following example uses a table from a flattened version of the TPC-H schema. The <code class="ph codeph">REGION</code> table only has a few rows, such as one row for Europe and one for Asia. 
The row for each region represents all the countries in that region as an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> elements: </p> + <pre class="pre codeblock"><code>[localhost:21000] > desc region; +-------------+--------------------------------------------------------------------+ | name | type | @@ -1480,6 +1694,7 @@ WHERE associates.item LIKE '% MacGuffin'; refer to the <code class="ph codeph">POS</code> pseudocolumn in the select list: </p> + <pre class="pre codeblock"><code>[localhost:21000] > SELECT r1.r_name, r2.n_name, <strong class="ph b">r2.POS</strong> > FROM region r1 INNER JOIN r1.r_nations r2 > WHERE r1.r_name = 'ASIA'; @@ -1499,6 +1714,7 @@ WHERE associates.item LIKE '% MacGuffin'; ordering of results from the complex type column or to filter certain elements from the array: </p> + <pre class="pre codeblock"><code>[localhost:21000] > SELECT r1.r_name, r2.n_name, r2.POS > FROM region r1 INNER JOIN r1.r_nations r2 > WHERE r1.r_name = 'ASIA' @@ -1526,12 +1742,15 @@ WHERE associates.item LIKE '% MacGuffin'; </div> - </article> - <article class="topic concept nested3" aria-labelledby="key__value" id="pseudocolumns__key"> + </div> + + + <div class="topic concept nested3" aria-labelledby="key__value" id="key"> <h4 class="title topictitle4" id="key__value">KEY and VALUE Pseudocolumns</h4> + <div class="body conbody"> <p class="p"> @@ -1542,6 +1761,7 @@ WHERE associates.item LIKE '% MacGuffin'; <code class="ph codeph"><var class="keyword varname">map_column</var>.KEY</code> and <code class="ph codeph"><var class="keyword varname">map_column</var>.VALUE</code>. </p> + <p class="p"> The <code class="ph codeph">KEY</code> must always be a scalar type, such as <code class="ph codeph">STRING</code>, <code class="ph codeph">BIGINT</code>, or <code class="ph codeph">TIMESTAMP</code>. It can be <code class="ph codeph">NULL</code>. 
Values of the <code class="ph codeph">KEY</code> field are not necessarily unique @@ -1549,6 +1769,7 @@ WHERE associates.item LIKE '% MacGuffin'; clauses in the query, and loop through the result set to process all the values matching any specified keys. </p> + <p class="p"> The <code class="ph codeph">VALUE</code> can be either a scalar type or another complex type. If the <code class="ph codeph">VALUE</code> is a <code class="ph codeph">STRUCT</code>, you can construct a qualified name @@ -1559,6 +1780,7 @@ WHERE associates.item LIKE '% MacGuffin'; <code class="ph codeph"><var class="keyword varname">table_alias</var>.KEY</code> and <code class="ph codeph"><var class="keyword varname">table_alias</var>.VALUE</code> </p> + <p class="p"> The following example shows different ways to access a <code class="ph codeph">MAP</code> column using the <code class="ph codeph">KEY</code> and <code class="ph codeph">VALUE</code> pseudocolumns. The <code class="ph codeph">DETAILS</code> column has a <code class="ph codeph">STRING</code> first part with short, @@ -1569,7 +1791,8 @@ WHERE associates.item LIKE '% MacGuffin'; underlying values. </p> - <div class="note note note_note"><span class="note__title notetitle">Note:</span> + + <div class="note note"><span class="notetitle">Note:</span> If you find that the single-item nature of the <code class="ph codeph">VALUE</code> makes it difficult to model your data accurately, the solution is typically to add some nesting to the complex type. For example, to have several sets of key-value pairs, make the column an <code class="ph codeph">ARRAY</code> whose elements are <code class="ph codeph">MAP</code>. To make a set of key-value pairs that holds more @@ -1577,6 +1800,7 @@ WHERE associates.item LIKE '% MacGuffin'; or a <code class="ph codeph">STRUCT</code>. 
</div> + <pre class="pre codeblock"><code>CREATE TABLE dream_journal ( dream_id BIGINT, @@ -1609,6 +1833,7 @@ WHERE <code class="ph codeph">VALUE</code> pseudocolumn directly, you use dot notation to refer to the <code class="ph codeph">STRUCT</code> fields inside it. </p> + <pre class="pre codeblock"><code>CREATE TABLE better_dream_journal ( dream_id BIGINT, @@ -1637,16 +1862,20 @@ WHERE </div> - </article> - </article> + </div> + + + </div> + - <article class="topic concept nested2" aria-labelledby="ariaid-title19" id="complex_types_using__complex_types_etl"> + <div class="topic concept nested2" aria-labelledby="ariaid-title19" id="complex_types_etl"> <h3 class="title topictitle3" id="ariaid-title19">Loading Data Containing Complex Types</h3> + <div class="body conbody"> <p class="p"> @@ -1656,12 +1885,14 @@ WHERE files. </p> + <p class="p"> If you have created a Hive table with the Parquet file format and containing complex types, use the same table for Impala queries with no changes. If you have such a Hive table in some other format, use a Hive <code class="ph codeph">CREATE TABLE AS SELECT ... STORED AS PARQUET</code> or <code class="ph codeph">INSERT ... SELECT</code> statement to produce an equivalent Parquet table that Impala can query. </p> + <p class="p"> If you have existing Parquet data files containing complex types, located outside of any Impala or Hive table, such as data files created by Spark jobs, you can use an Impala <code class="ph codeph">CREATE TABLE ... STORED AS PARQUET</code> statement, followed by an Impala @@ -1670,6 +1901,7 @@ WHERE files. </p> + <p class="p"> Perhaps the simplest way to get started with complex type data is to take a denormalized table containing duplicated values, and use an <code class="ph codeph">INSERT ... 
SELECT</code> statement to copy the data into a Parquet table and condense the repeated values into @@ -1680,21 +1912,26 @@ WHERE match the field names from the <code class="ph codeph">CREATE TABLE</code> statement. </p> - <div class="note note note_note"><span class="note__title notetitle">Note:</span> + + <div class="note note"><span class="notetitle">Note:</span> Because Hive currently cannot construct individual rows using complex types through the <code class="ph codeph">INSERT ... VALUES</code> syntax, you prepare the data in flat form in a separate table, then copy it to the table with complex columns using <code class="ph codeph">INSERT ... SELECT</code> and the complex type constructors. See <a class="xref" href="impala_complex_types.html#complex_types_ex_hive_etl">Constructing Parquet Files with Complex Columns Using Hive</a> for examples. </div> + </div> - </article> - <article class="topic concept nested2" aria-labelledby="ariaid-title20" id="complex_types_using__complex_types_nesting"> + </div> + + + <div class="topic concept nested2" aria-labelledby="ariaid-title20" id="complex_types_nesting"> <h3 class="title topictitle3" id="ariaid-title20">Using Complex Types as Nested Types</h3> + <div class="body conbody"> <p class="p"> @@ -1705,10 +1942,12 @@ WHERE <code class="ph codeph">STRUCT</code>, elements of an <code class="ph codeph">ARRAY</code>, and keys and values of a <code class="ph codeph">MAP</code>. </p> + <p class="p"> Schemas involving complex types typically use some level of nesting for the complex type columns. </p> + <p class="p"> For example, to model a relationship like a dimension table and a fact table, you typically use an <code class="ph codeph">ARRAY</code> where each array element is a <code class="ph codeph">STRUCT</code>. 
The <code class="ph codeph">STRUCT</code> fields represent what would traditionally be columns @@ -1718,6 +1957,7 @@ WHERE + <p class="p"> Perhaps the only use case for a top-level <code class="ph codeph">STRUCT</code> would be to to allow <code class="ph codeph">STRUCT</code> fields with the same name as columns to coexist in the same table. The following example shows how a table could have a column named @@ -1726,6 +1966,7 @@ WHERE conflict. </p> + <pre class="pre codeblock"><code>CREATE TABLE struct_namespaces ( id BIGINT @@ -1746,6 +1987,7 @@ select id, s1.id, s2.id from struct_namespaces; structures where each row contains only a few data values drawn from a large set of possible choices. </p> + <p class="p"> Although you can use an <code class="ph codeph">ARRAY</code> of scalar values as the top-level column in a table, such a simple array is typically of limited use for analytic queries. The only property of the array elements, aside from the element value, is the @@ -1754,6 +1996,7 @@ select id, s1.id, s2.id from struct_namespaces; of scalar values. </p> + <p class="p"> If you are considering having multiple <code class="ph codeph">ARRAY</code> or <code class="ph codeph">MAP</code> columns, with related items under the same position in each <code class="ph codeph">ARRAY</code> or the same key in each <code class="ph codeph">MAP</code>, prefer to use a <code class="ph codeph">STRUCT</code> to @@ -1764,6 +2007,7 @@ select id, s1.id, s2.id from struct_namespaces; notation to refer to the relevant fields rather than a sequence of join clauses. </p> + <p class="p"> For example, here is a table with several complex type columns all at the top level and containing only scalar types. To retrieve every data item for the row requires a separate join for each <code class="ph codeph">ARRAY</code> or <code class="ph codeph">MAP</code> column. The fields of @@ -1772,6 +2016,7 @@ select id, s1.id, s2.id from struct_namespaces; <code class="ph codeph">FIELD2</code>. 
</p> + <pre class="pre codeblock"><code>CREATE TABLE complex_types_top_level ( id BIGINT, @@ -1825,6 +2070,7 @@ from <code class="ph codeph">STRUCT</code>. </p> + <pre class="pre codeblock"><code>CREATE TABLE nesting_demo ( user_id BIGINT, @@ -1843,6 +2089,7 @@ STORED AS PARQUET; names within each <code class="ph codeph">STRUCT</code> for easy readability: </p> + <pre class="pre codeblock"><code>DESCRIBE nesting_demo; +----------------+-----------------------------+ | name | type | @@ -1879,6 +2126,7 @@ STORED AS PARQUET; + <pre class="pre codeblock"><code>SELECT -- The lone scalar field doesn't require any dot notation or join clauses. user_id @@ -1920,14 +2168,18 @@ FROM <code class="ph codeph">MAP</code> items by running comparisons against the <code class="ph codeph">KEY</code> part in the <code class="ph codeph">WHERE</code> clause. </p> + </div> - </article> - <article class="topic concept nested2" aria-labelledby="ariaid-title21" id="complex_types_using__complex_types_views"> + </div> + + + <div class="topic concept nested2" aria-labelledby="ariaid-title21" id="complex_types_views"> <h3 class="title topictitle3" id="ariaid-title21">Accessing Complex Type Data in Flattened Form Using Views</h3> + <div class="body conbody"> <p class="p"> @@ -1941,6 +2193,7 @@ FROM + <p class="p"> For example, the variation of the TPC-H schema containing complex types has a table <code class="ph codeph">REGION</code>. This table has 5 rows, corresponding to 5 regions such as <code class="ph codeph">NORTH AMERICA</code> and <code class="ph codeph">AFRICA</code>. Each row has an @@ -1948,6 +2201,7 @@ FROM region. </p> + <pre class="pre codeblock"><code>DESCRIBE region; +-------------+-------------------------+ | name | type | @@ -1970,6 +2224,7 @@ FROM still keeping the data in a single table rather than normalizing across multiple tables. 
</p> + <p class="p"> To use this table with a JDBC or ODBC application that expected scalar columns, we could create a view that represented the result set as a set of scalar columns (three columns from the original table, plus three more from the <code class="ph codeph">STRUCT</code> fields of @@ -1980,6 +2235,7 @@ FROM + <pre class="pre codeblock"><code>CREATE VIEW region_view AS SELECT r_regionkey, @@ -1998,6 +2254,7 @@ FROM nation. </p> + <pre class="pre codeblock"><code>-- Retrieve info such as the nation name from the original R_NATIONS array elements. select n_name from region_view where r_name in ('EUROPE', 'ASIA'); +----------------+ @@ -2043,14 +2300,18 @@ SELECT r_regionkey, r_name, n_nationkey, n_name FROM region_view LIMIT 7; </div> - </article> - </article> + </div> + + + </div> + - <article class="topic concept nested1" aria-labelledby="ariaid-title22" id="complex_types__complex_types_examples"> + <div class="topic concept nested1" aria-labelledby="ariaid-title22" id="complex_types_examples"> <h2 class="title topictitle2" id="ariaid-title22">Tutorials and Examples for Complex Types</h2> + <div class="body conbody"> @@ -2059,14 +2320,18 @@ SELECT r_regionkey, r_name, n_nationkey, n_name FROM region_view LIMIT 7; The following examples illustrate the query syntax for some common use cases involving complex type columns. 
</p> + <p class="p toc inpage"></p> + </div> - <article class="topic concept nested2" aria-labelledby="ariaid-title23" id="complex_types_examples__complex_sample_schema"> + + <div class="topic concept nested2" aria-labelledby="ariaid-title23" id="complex_sample_schema"> <h3 class="title topictitle3" id="ariaid-title23">Sample Schema and Data for Experimenting with Impala Complex Types</h3> + <div class="body conbody"> @@ -2076,6 +2341,7 @@ SELECT r_regionkey, r_name, n_nationkey, n_name FROM region_view LIMIT 7; the complex type feature use these tables, adapted from the schema used for TPC-H testing: </p> + <pre class="pre codeblock"><code>SHOW TABLES; +----------+ | name | @@ -2174,6 +2440,7 @@ DESCRIBE supplier; The volume of data used in the following examples is: </p> + <pre class="pre codeblock"><code>SELECT count(*) FROM customer; +----------+ | count(*) | @@ -2206,9 +2473,11 @@ SELECT count(*) FROM supplier; </div> + - </article> + </div> + @@ -2216,10 +2485,11 @@ SELECT count(*) FROM supplier; - <article class="topic concept nested2" aria-labelledby="ariaid-title24" id="complex_types_examples__complex_types_ex_hive_etl"> + <div class="topic concept nested2" aria-labelledby="ariaid-title24" id="complex_types_ex_hive_etl"> <h3 class="title topictitle3" id="ariaid-title24">Constructing Parquet Files with Complex Columns Using Hive</h3> + <div class="body conbody"> <p class="p"> @@ -2229,10 +2499,12 @@ SELECT count(*) FROM supplier; format. </p> + <p class="p"> <strong class="ph b">Create table with <code class="ph codeph">ARRAY</code> in Impala, load data in Hive, query in Impala:</strong> </p> + <p class="p"> This example shows the cycle of creating the tables and querying the complex data in Impala, and using Hive (either the <code class="ph codeph">hive</code> shell or <code class="ph codeph">beeline</code>) for the data loading step. 
The data starts in flattened, denormalized @@ -2240,6 +2512,7 @@ SELECT count(*) FROM supplier; analytic queries on the Parquet table, using join notation to unpack the <code class="ph codeph">ARRAY</code> column. </p> + <pre class="pre codeblock"><code>/* Initial DDL and loading of flat, denormalized data happens in impala-shell */CREATE TABLE flat_array (country STRING, city STRING);INSERT INTO flat_array VALUES ('Canada', 'Toronto') , ('Canada', 'Vancouver') , ('Canada', "St. John\'s") , ('Canada', 'Saint John') , ('Canada', 'Montreal') , ('Canada', 'Halifax') @@ -2299,6 +2572,7 @@ SELECT country, city.item FROM complex_array, complex_array.city <strong class="ph b">Create table with <code class="ph codeph">STRUCT</code> and <code class="ph codeph">ARRAY</code> in Impala, load data in Hive, query in Impala:</strong> </p> + <p class="p"> This example shows the cycle of creating the tables and querying the complex data in Impala, and using Hive (either the <code class="ph codeph">hive</code> shell or <code class="ph codeph">beeline</code>) for the data loading step. 
The data starts in flattened, denormalized @@ -2309,6 +2583,7 @@ SELECT country, city.item FROM complex_array, complex_array.city + <pre class="pre codeblock"><code>/* Initial DDL and loading of flat, denormalized data happens in impala-shell */ CREATE TABLE flat_struct_array (continent STRING, country STRING, city STRING); @@ -2374,14 +2649,17 @@ SELECT t1.continent, t1.country.name, t2.item </div> - </article> + + </div> + - <article class="topic concept nested2" aria-labelledby="ariaid-title25" id="complex_types_examples__complex_denormalizing"> + <div class="topic concept nested2" aria-labelledby="ariaid-title25" id="complex_denormalizing"> <h3 class="title topictitle3" id="ariaid-title25">Flattening Normalized Tables into a Single Table with Complex Types</h3> + <div class="body conbody"> <p class="p"> @@ -2390,17 +2668,20 @@ SELECT t1.continent, t1.country.name, t2.item of rows as in the original normalized table, and put all the associated data from the other table in a single new column. </p> + <p class="p"> In this flattening scenario, you might frequently use a column that is an <code class="ph codeph">ARRAY</code> consisting of <code class="ph codeph">STRUCT</code> elements, where each field within the <code class="ph codeph">STRUCT</code> corresponds to a column name from the table that you are combining. </p> + <p class="p"> The following example shows a traditional normalized layout using two tables, and then an equivalent layout using complex types in a single table. 
</p> + <pre class="pre codeblock"><code>/* Traditional relational design */ -- This table just stores numbers, allowing us to look up details about the employee @@ -2470,24 +2751,29 @@ STORED AS PARQUET; </div> - </article> - <article class="topic concept nested2" aria-labelledby="ariaid-title26" id="complex_types_examples__complex_inference"> + </div> + + + <div class="topic concept nested2" aria-labelledby="ariaid-title26" id="complex_inference"> <h3 class="title topictitle3" id="ariaid-title26">Interchanging Complex Type Tables and Data Files with Hive and Other Components</h3> + <div class="body conbody"> <p class="p"> You can produce Parquet data files through several Hadoop components and APIs. </p> + <p class="p"> If you have a Hive-created Parquet table that includes <code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code> columns, Impala can query that same table in <span class="keyword">Impala 2.3</span> and higher, subject to the usual restriction that all other columns are of data types supported by Impala, and also that the file type of the table must be Parquet. </p> + <p class="p"> If you have a Parquet data file produced outside of Impala, Impala can automatically deduce the appropriate table structure using the syntax <code class="ph codeph">CREATE TABLE ... LIKE PARQUET '<var class="keyword varname">hdfs_path_of_parquet_file</var>'</code>. In <span class="keyword">Impala 2.3</span> @@ -2495,6 +2781,7 @@ STORED AS PARQUET; <code class="ph codeph">MAP</code> types. </p> + <pre class="pre codeblock"><code>/* In impala-shell, find the HDFS data directory of the original table. DESCRIBE FORMATTED tpch_nested_parquet.customer; ... @@ -2599,8 +2886,12 @@ describe customer_ctlp; </div> - </article> - </article> + </div> + + + </div> + -</article></main></body></html> \ No newline at end of file +</body> +</html> \ No newline at end of file
http://git-wip-us.apache.org/repos/asf/impala/blob/b4ad38a9/docs/build/html/topics/impala_components.html ---------------------------------------------------------------------- diff --git a/docs/build/html/topics/impala_components.html b/docs/build/html/topics/impala_components.html index c6ee7fb..eb68b8b 100644 --- a/docs/build/html/topics/impala_components.html +++ b/docs/build/html/topics/impala_components.html @@ -1,8 +1,34 @@ +<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE html - SYSTEM "about:legacy-compat"> -<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_concepts.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.12x"><meta name="version" content="Impala 2.12x"><meta name="version" content="Impala 2.12x"><meta name="version" content="Impala 2.12x"><meta name="version" content="Impala 2.12x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="intro_components"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Components of the Impala Server</title></head><body id="intro_components"><main role="main"><article role="article" aria-labelledby="ariaid-title1"> + PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> +<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> +<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> + +<meta name="copyright" content="(C) Copyright 2018" /> +<meta name="DC.rights.owner" content="(C) Copyright 2018" /> +<meta name="DC.Type" 
content="concept" /> +<meta name="DC.Title" content="Components of the Impala Server" /> +<meta name="DC.Relation" scheme="URI" content="../topics/impala_concepts.html" /> +<meta name="prodname" content="Impala" /> +<meta name="prodname" content="Impala" /> +<meta name="prodname" content="Impala" /> +<meta name="prodname" content="Impala" /> +<meta name="prodname" content="Impala" /> +<meta name="version" content="Impala 3.0.x" /> +<meta name="version" content="Impala 3.0.x" /> +<meta name="version" content="Impala 3.0.x" /> +<meta name="version" content="Impala 3.0.x" /> +<meta name="version" content="Impala 3.0.x" /> +<meta name="DC.Format" content="XHTML" /> +<meta name="DC.Identifier" content="intro_components" /> +<link rel="stylesheet" type="text/css" href="../commonltr.css" /> +<title>Components of the Impala Server</title> +</head> +<body id="intro_components"> + <h1 class="title topictitle1" id="ariaid-title1">Components of the Impala Server</h1> + @@ -13,13 +39,21 @@ different daemon processes that run on specific hosts within your <span class="keyword"></span> cluster. </p> + <p class="p toc inpage"></p> + </div> - <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_concepts.html">Impala Concepts and Architecture</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="intro_components__intro_impalad"> + + <div class="related-links"> +<div class="familylinks"> +<div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_concepts.html">Impala Concepts and Architecture</a></div> +</div> +</div><div class="topic concept nested1" aria-labelledby="ariaid-title2" id="intro_impalad"> <h2 class="title topictitle2" id="ariaid-title2">The Impala Daemon</h2> + <div class="body conbody"> <p class="p"> @@ -30,6 +64,7 @@ central coordinator node. 
</p> + <p class="p"> You can submit a query to the Impala daemon running on any DataNode, and that instance of the daemon serves as the <dfn class="term">coordinator node</dfn> for that query. The other nodes transmit partial results back to the @@ -39,11 +74,13 @@ submitting each query to a different Impala daemon in round-robin style, using the JDBC or ODBC interfaces. </p> + <p class="p"> The Impala daemons are in constant communication with the <dfn class="term">statestore</dfn>, to confirm which nodes are healthy and can accept new work. </p> + <p class="p"> They also receive broadcast messages from the <span class="keyword cmdname">catalogd</span> daemon (introduced in Impala 1.2) whenever any Impala node in the cluster creates, alters, or drops any type of object, or when an @@ -52,24 +89,30 @@ METADATA</code> statements that were needed to coordinate metadata across nodes prior to Impala 1.2. </p> + <p class="p"> In <span class="keyword">Impala 2.9</span> and higher, you can control which hosts act as query coordinators and which act as query executors, to improve scalability for highly concurrent workloads on large clusters. - See <a class="xref" href="impala_scalability.html">Scalability Considerations for Impala</a> for details. + See <a class="xref" href="impala_dedicated_coordinator.html">How to Configure Impala with Dedicated Coordinators</a> for details. 
</p> + <p class="p"> <strong class="ph b">Related information:</strong> <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a>, <a class="xref" href="impala_processes.html#processes">Starting Impala</a>, <a class="xref" href="impala_timeouts.html#impalad_timeout">Setting the Idle Query and Idle Session Timeouts for impalad</a>, <a class="xref" href="impala_ports.html#ports">Ports Used by Impala</a>, <a class="xref" href="impala_proxy.html#proxy">Using Impala through a Proxy for High Availability</a> </p> + </div> - </article> - <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="intro_components__intro_statestore"> + </div> + + + <div class="topic concept nested1" aria-labelledby="ariaid-title3" id="intro_statestore"> <h2 class="title topictitle2" id="ariaid-title3">The Impala Statestore</h2> + <div class="body conbody"> <p class="p"> @@ -81,14 +124,27 @@ requests to the unreachable node. </p> + <p class="p"> - Because the statestore's purpose is to help when things go wrong, it is not critical to the normal - operation of an Impala cluster. If the statestore is not running or becomes unreachable, the Impala daemons - continue running and distributing work among themselves as usual; the cluster just becomes less robust if - other Impala daemons fail while the statestore is offline. When the statestore comes back online, it re-establishes - communication with the Impala daemons and resumes its monitoring function. + Because the statestore's purpose is to help when things go wrong and + to broadcast metadata to coordinators, it is not always critical to the + normal operation of an Impala cluster. If the statestore is not running + or becomes unreachable, the Impala daemons continue running and + distributing work among themselves as usual when working with the data + known to Impala. 
The cluster just becomes less robust if other Impala + daemons fail, and metadata becomes less consistent as it changes while + the statestore is offline. When the statestore comes back online, it + re-establishes communication with the Impala daemons and resumes its + monitoring and broadcasting functions. </p> + + <p class="p"> + If you issue a DDL statement while the statestore is down, the queries + that access the new object the DDL created will fail. + </p> + + <p class="p"> Most considerations for load balancing and high availability apply to the <span class="keyword cmdname">impalad</span> daemon. The <span class="keyword cmdname">statestored</span> and <span class="keyword cmdname">catalogd</span> daemons do not have special @@ -99,22 +155,28 @@ Impala service. </p> + <p class="p"> <strong class="ph b">Related information:</strong> </p> + <p class="p"> <a class="xref" href="impala_scalability.html#statestore_scalability">Scalability Considerations for the Impala Statestore</a>, <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a>, <a class="xref" href="impala_processes.html#processes">Starting Impala</a>, <a class="xref" href="impala_timeouts.html#statestore_timeout">Increasing the Statestore Timeout</a>, <a class="xref" href="impala_ports.html#ports">Ports Used by Impala</a> </p> + </div> - </article> - <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="intro_components__intro_catalogd"> + </div> + + + <div class="topic concept nested1" aria-labelledby="ariaid-title4" id="intro_catalogd"> <h2 class="title topictitle2" id="ariaid-title4">The Impala Catalog Service</h2> + <div class="body conbody"> <p class="p"> @@ -125,6 +187,7 @@ <span class="keyword cmdname">catalogd</span> services on the same host. 
</p> + <p class="p"> The catalog service avoids the need to issue <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code> statements when the metadata changes are @@ -133,12 +196,14 @@ before executing a query there. </p> + <p class="p"> This feature touches a number of aspects of Impala: </p> + <ul class="ul" id="intro_catalogd__catalogd_xrefs"> <li class="li"> <p class="p"> @@ -146,8 +211,10 @@ <a class="xref" href="impala_processes.html#processes">Starting Impala</a>, for usage information for the <span class="keyword cmdname">catalogd</span> daemon. </p> + </li> + <li class="li"> <p class="p"> The <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code> statements are not needed @@ -159,9 +226,12 @@ <a class="xref" href="impala_invalidate_metadata.html#invalidate_metadata">INVALIDATE METADATA Statement</a> for the latest usage information for those statements. </p> + </li> + </ul> + <div class="p"> Use the <code class="ph codeph">--load_catalog_in_background</code> option to control when the metadata of a table is loaded. @@ -174,6 +244,7 @@ <code class="ph codeph">load_catalog_in_background</code> is <code class="ph codeph">false</code>. </li> + <li class="li"> If set to <code class="ph codeph">true</code>, the catalog service attempts to load metadata for a table even if no query needed that metadata. So @@ -188,16 +259,22 @@ and can lead to seemingly random long-running queries that are difficult to diagnose. </li> + <li class="li"> Impala may load metadata for tables that are possibly never used, potentially increasing catalog size and consequently memory usage for both catalog service and Impala Daemon. </li> + </ul> + </li> + </ul> + </div> + <p class="p"> Most considerations for load balancing and high availability apply to the <span class="keyword cmdname">impalad</span> daemon.
The <span class="keyword cmdname">statestored</span> and <span class="keyword cmdname">catalogd</span> daemons do not have special @@ -208,7 +285,8 @@ Impala service. </p> - <div class="note note note_note"><span class="note__title notetitle">Note:</span> + + <div class="note note"><span class="notetitle">Note:</span> <p class="p"> In Impala 1.2.4 and higher, you can specify a table name with <code class="ph codeph">INVALIDATE METADATA</code> after the table is created in Hive, allowing you to make individual tables visible to Impala without doing a full @@ -216,12 +294,18 @@ mechanism faster and more responsive, especially during Impala startup. See <a class="xref" href="../shared/../topics/impala_new_features.html#new_features_124">New Features in Impala 1.2.4</a> for details. </p> + </div> + <p class="p"> <strong class="ph b">Related information:</strong> <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a>, <a class="xref" href="impala_processes.html#processes">Starting Impala</a>, <a class="xref" href="impala_ports.html#ports">Ports Used by Impala</a> </p> + </div> - </article> -</article></main></body></html> \ No newline at end of file + + </div> + +</body> +</html> \ No newline at end of file