http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/reference/sql/SELECT.html.md.erb
----------------------------------------------------------------------
diff --git a/reference/sql/SELECT.html.md.erb b/reference/sql/SELECT.html.md.erb
new file mode 100644
index 0000000..4649bad
--- /dev/null
+++ b/reference/sql/SELECT.html.md.erb
@@ -0,0 +1,507 @@
+---
+title: SELECT
+---
+
+Retrieves rows from a table or view.
+
+## <a id="topic1__section2"></a>Synopsis
+
+``` pre
+SELECT [ALL | DISTINCT [ON (<expression> [, ...])]]
+  * | <expression> [[AS] <output_name>] [, ...]
+  [FROM <from_item> [, ...]]
+  [WHERE <condition>]
+  [GROUP BY <grouping_element> [, ...]]
+  [HAVING <condition> [, ...]]
+  [WINDOW <window_name> AS (<window_specification>)]
+  [{UNION | INTERSECT | EXCEPT} [ALL] <select>]
+  [ORDER BY <expression> [ASC | DESC | USING <operator>] [, ...]]
+  [LIMIT {<count> | ALL}]
+  [OFFSET <start>]
+```
+
+where \<grouping\_element\> can be one of:
+
+``` pre
+  ()
+  <expression>
+  ROLLUP (<expression> [,...])
+  CUBE (<expression> [,...])
+  GROUPING SETS ((<grouping_element> [, ...]))
+```
+
+where \<window\_specification\> can be:
+
+``` pre
+  [<window_name>]
+  [PARTITION BY <expression> [, ...]]
+  [ORDER BY <expression> [ASC | DESC | USING <operator>] [, ...]
+     [{RANGE | ROWS}
+          { UNBOUNDED PRECEDING
+          | <expression> PRECEDING
+          | CURRENT ROW
+          | BETWEEN <window_frame_bound> AND <window_frame_bound> }]]
+     where <window_frame_bound> can be one of:
+        UNBOUNDED PRECEDING
+        <expression> PRECEDING
+        CURRENT ROW
+        <expression> FOLLOWING
+        UNBOUNDED FOLLOWING
+```
+
+where \<from\_item\> can be one of:
+
+``` pre
+[ONLY] <table_name> [[AS] <alias> [( <column_alias> [, ...] )]]
+(<select>) [AS] <alias> [( <column_alias> [, ...] )]
+<function_name> ( [<argument> [, ...]] ) [AS] <alias>
+             [( <column_alias> [, ...]
+                | <column_definition> [, ...] )]
+<function_name> ( [<argument> [, ...]] ) AS
+              ( <column_definition> [, ...] )
+<from_item> [NATURAL] <join_type>
+            <from_item>
+          [ON <join_condition> | USING ( <join_column> [, ...] )]
+```
+
+## <a id="topic1__section3"></a>Description
+
+`SELECT` retrieves rows from zero or more tables. The general processing of 
`SELECT` is as follows:
+
+1.  All elements in the `FROM` list are computed. (Each element in the `FROM` 
list is a real or virtual table.) If more than one element is specified in the 
`FROM` list, they are cross-joined together.
+2.  If the `WHERE` clause is specified, all rows that do not satisfy the 
condition are eliminated from the output.
+3.  If the `GROUP BY` clause is specified, the output is divided into groups 
of rows that match on one or more of the defined grouping elements. If the 
`HAVING` clause is present, it eliminates groups that do not satisfy the given 
condition.
+4.  If a window expression is specified (and optional `WINDOW` clause), the 
output is organized according to the positional (row) or value-based (range) 
window frame.
+5.  `DISTINCT` eliminates duplicate rows from the result. `DISTINCT ON` 
eliminates rows that match on all the specified expressions. `ALL` (the 
default) will return all candidate rows, including duplicates.
+6.  The actual output rows are computed using the `SELECT` output expressions 
for each selected row.
+7.  Using the operators `UNION`, `INTERSECT`, and `EXCEPT`, the output of more 
than one `SELECT` statement can be combined to form a single result set. The 
`UNION` operator returns all rows that are in one or both of the result sets. 
The `INTERSECT` operator returns all rows that are strictly in both result 
sets. The `EXCEPT` operator returns the rows that are in the first result set 
but not in the second. In all three cases, duplicate rows are eliminated unless 
`ALL` is specified.
+8.  If the `ORDER BY` clause is specified, the returned rows are sorted in the 
specified order. If `ORDER BY` is not given, the rows are returned in whatever 
order the system finds fastest to produce.
+9.  If the `LIMIT` or `OFFSET` clause is specified, the `SELECT` statement 
only returns a subset of the result rows.
+
+You must have `SELECT` privilege on a table to read its values.
+
+## <a id="topic1__section4"></a>Parameters
+
+**The SELECT List**
+
+The `SELECT` list (between the key words `SELECT` and `FROM`) specifies 
expressions that form the output rows of the `SELECT` statement. The 
expressions can (and usually do) refer to columns computed in the `FROM` clause.
+
+Using the clause `[AS]` \<output\_name\>, another name can be specified for 
an output column. This name is primarily used to label the column for display. 
It can also be used to refer to the column's value in `ORDER BY` and `GROUP BY` 
clauses, but not in the `WHERE` or `HAVING` clauses; there you must write out 
the expression instead. The `AS` keyword is optional in most cases (such as 
when declaring an alias for column names, constants, function calls, and simple 
unary operator expressions). In cases where the declared alias is a reserved 
SQL keyword, the \<output\_name\> must be enclosed in double quotes to avoid 
ambiguity.
+
+An \<expression\> in the `SELECT` list can be a constant value, a column 
reference, an operator invocation, a function call, an aggregate expression, a 
window expression, a scalar subquery, and so on. There are a number of 
constructs that can be classified as an expression but do not follow any 
general syntax rules.
+
+Instead of an expression, `*` can be written in the output list as a shorthand for all the columns of the selected rows. Also, you can write `<table_name>.*` as a shorthand for the columns coming from just that table.
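+
+For example, assuming the `films` and `distributors` tables used in the [Examples](#topic1__section18) section, the following returns every column of `films` plus the distributor name:
+
+``` sql
+SELECT f.*, d.name FROM films f, distributors d WHERE f.did = d.did;
+```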
+
+**The FROM Clause**
+
+The `FROM` clause specifies one or more source tables for the `SELECT`. If 
multiple sources are specified, the result is the Cartesian product (cross 
join) of all the sources. But usually qualification conditions are added to 
restrict the returned rows to a small subset of the Cartesian product. The 
`FROM` clause can contain the following elements:
+
+<dt> \<table\_name\>  </dt>
+<dd>The name (optionally schema-qualified) of an existing table or view. If 
`ONLY` is specified, only that table is scanned. If `ONLY` is not specified, 
the table and all its descendant tables (if any) are scanned.</dd>
+
+<dt> \<alias\>  </dt>
+<dd>A substitute name for the `FROM` item containing the alias. An alias is 
used for brevity or to eliminate ambiguity for self-joins (where the same table 
is scanned multiple times). When an alias is provided, it completely hides the 
actual name of the table or function; for example given `FROM foo AS f`, the 
remainder of the `SELECT` must refer to this `FROM` item as `f` not `foo`. If 
an alias is written, a column alias list can also be written to provide 
substitute names for one or more columns of the table.</dd>
+
+<dt> \<select\>  </dt>
+<dd>A sub-`SELECT` can appear in the `FROM` clause. This acts as though its 
output were created as a temporary table for the duration of this single 
`SELECT` command. Note that the sub-`SELECT` must be surrounded by parentheses, 
and an alias must be provided for it. A `VALUES` command can also be used here. 
See "Non-standard Clauses" in the [Compatibility](#topic1__section19) section 
for limitations of using correlated sub-selects in HAWQ.</dd>
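+
+For instance, a sketch using the `films` table from the Examples section; note the parentheses around the sub-`SELECT` and the required alias:
+
+``` sql
+SELECT t.kind, t.total
+FROM (SELECT kind, sum(length) AS total FROM films GROUP BY kind) AS t
+WHERE t.total > interval '3 hours';
+```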
+
+<dt> \<function\_name\>  </dt>
+<dd>Function calls can appear in the `FROM` clause. (This is especially useful 
for functions that return result sets, but any function can be used.) This acts 
as though its output were created as a temporary table for the duration of this 
single `SELECT` command. An alias may also be used. If an alias is written, a 
column alias list can also be written to provide substitute names for one or 
more attributes of the function's composite return type. If the function has 
been defined as returning the record data type, then an alias or the key word 
`AS` must be present, followed by a column definition list in the form 
`(<column_name> <data_type> [, ... ] )`. The column definition list must match 
the actual number and types of columns returned by the function.</dd>
+
+<dt> \<join\_type\>  </dt>
+<dd>One of:
+
+-   **\[INNER\] JOIN**
+-   **LEFT \[OUTER\] JOIN**
+-   **RIGHT \[OUTER\] JOIN**
+-   **FULL \[OUTER\] JOIN**
+-   **CROSS JOIN**
+
+For the `INNER` and `OUTER` join types, a join condition must be specified, namely exactly one of `NATURAL`, `ON <join_condition>`, or `USING (<join_column> [, ...])`. See below for the meaning. For `CROSS JOIN`, none of these clauses may appear.
+
+A `JOIN` clause combines two `FROM` items. Use parentheses if necessary to 
determine the order of nesting. In the absence of parentheses, `JOIN`s nest 
left-to-right. In any case `JOIN` binds more tightly than the commas separating 
`FROM` items.
+
+`CROSS JOIN` and `INNER JOIN` produce a simple Cartesian product, the same result as you get from listing the two items at the top level of `FROM`, but restricted by the join condition (if any). `CROSS JOIN` is equivalent to `INNER JOIN ON (TRUE)`, that is, no rows are removed by qualification. These join types are just a notational convenience, since they do nothing you could not do with plain `FROM` and `WHERE`.
+
+`LEFT OUTER JOIN` returns all rows in the qualified Cartesian product (i.e., 
all combined rows that pass its join condition), plus one copy of each row in 
the left-hand table for which there was no right-hand row that passed the join 
condition. This left-hand row is extended to the full width of the joined table 
by inserting null values for the right-hand columns. Note that only the `JOIN` 
clause's own condition is considered while deciding which rows have matches. 
Outer conditions are applied afterwards.
+
+Conversely, `RIGHT OUTER JOIN` returns all the joined rows, plus one row for each unmatched right-hand row (extended with nulls on the left). This is just a notational convenience, since you could convert it to a `LEFT OUTER JOIN` by switching the left and right inputs.
+
+`FULL OUTER JOIN` returns all the joined rows, plus one row for each unmatched 
left-hand row (extended with nulls on the right), plus one row for each 
unmatched right-hand row (extended with nulls on the left).</dd>
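+
+As an illustration, assuming the `distributors` and `films` tables from the Examples section, this `LEFT OUTER JOIN` lists every distributor, with nulls in the film columns for distributors that have no matching films:
+
+``` sql
+SELECT d.name, f.title
+FROM distributors d LEFT OUTER JOIN films f ON d.did = f.did;
+```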
+
+<dt>ON \<join\_condition\>  </dt>
+<dd>\<join\_condition\> is an expression resulting in a value of type 
`boolean` (similar to a `WHERE` clause) that specifies which rows in a join are 
considered to match.</dd>
+
+<dt>USING (\<join\_column\> \[, ...\])  </dt>
+<dd>A clause of the form `USING ( a, b, ... )` is shorthand for `ON left_table.a = right_table.a AND left_table.b = right_table.b ...`. Also, `USING` implies that only one of each pair of equivalent columns will be included in the join output, not both.</dd>
+
+<dt>NATURAL  </dt>
+<dd>`NATURAL` is shorthand for a `USING` list that mentions all columns in the 
two tables that have the same names.</dd>
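+
+For example, given that both tables have a `did` column, these two joins are equivalent, except that the `USING` form emits a single `did` column:
+
+``` sql
+SELECT * FROM distributors JOIN films USING (did);
+SELECT * FROM distributors JOIN films ON distributors.did = films.did;
+```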
+
+**The WHERE Clause**
+
+The optional `WHERE` clause has the general form:
+
+``` pre
+WHERE <condition>
+```
+
+where \<condition\> is any expression that evaluates to a result of type 
`boolean`. Any row that does not satisfy this condition will be eliminated from 
the output. A row satisfies the condition if it returns true when the actual 
row values are substituted for any variable references.
+
+**The GROUP BY Clause**
+
+The optional `GROUP BY` clause has the general form:
+
+``` pre
+GROUP BY <grouping_element> [, ...]
+```
+
+where \<grouping\_element\> can be one of:
+
+``` pre
+()
+<expression>
+ROLLUP (<expression> [,...])
+CUBE (<expression> [,...])
+GROUPING SETS ((<grouping_element> [, ...]))
+```
+
+`GROUP BY` will condense into a single row all selected rows that share the same values for the grouped expressions. \<expression\> can be an input column name, or the name or ordinal number of an output column (`SELECT` list item), or an arbitrary expression formed from input-column values. In case of ambiguity, a `GROUP BY` name will be interpreted as an input-column name rather than an output column name.
+
+Aggregate functions, if any are used, are computed across all rows making up 
each group, producing a separate value for each group (whereas without `GROUP 
BY`, an aggregate produces a single value computed across all the selected 
rows). When `GROUP BY` is present, it is not valid for the `SELECT` list 
expressions to refer to ungrouped columns except within aggregate functions, 
since there would be more than one possible value to return for an ungrouped 
column.
+
+HAWQ has the following additional OLAP grouping extensions (often referred to 
as *supergroups*):
+
+<dt>ROLLUP  </dt>
+<dd>A `ROLLUP` grouping is an extension to the `GROUP BY` clause that creates 
aggregate subtotals that roll up from the most detailed level to a grand total, 
following a list of grouping columns (or expressions). `ROLLUP` takes an 
ordered list of grouping columns, calculates the standard aggregate values 
specified in the `GROUP BY` clause, then creates progressively higher-level 
subtotals, moving from right to left through the list. Finally, it creates a 
grand total. A `ROLLUP` grouping can be thought of as a series of grouping 
sets. For example:
+
+``` pre
+GROUP BY ROLLUP (a,b,c)
+```
+
+is equivalent to:
+
+``` pre
+GROUP BY GROUPING SETS( (a,b,c), (a,b), (a), () )
+```
+
+Notice that the *n* elements of a `ROLLUP` translate to *n*+1 grouping sets. 
Also, the order in which the grouping expressions are specified is significant 
in a `ROLLUP`.</dd>
+
+<dt>CUBE  </dt>
+<dd>A `CUBE` grouping is an extension to the `GROUP BY` clause that creates 
subtotals for all of the possible combinations of the given list of grouping 
columns (or expressions). In terms of multidimensional analysis, `CUBE` 
generates all the subtotals that could be calculated for a data cube with the 
specified dimensions. For example:
+
+``` pre
+GROUP BY CUBE (a,b,c)
+```
+
+is equivalent to:
+
+``` pre
+GROUP BY GROUPING SETS( (a,b,c), (a,b), (a,c), (b,c), (a),
+(b), (c), () )
+```
+
+Notice that *n* elements of a `CUBE` translate to 2<sup>n</sup> grouping sets. Consider using `CUBE` in any situation requiring cross-tabular reports. `CUBE` is typically most suitable in queries that use columns from multiple dimensions rather than columns representing different levels of a single dimension. For instance, a commonly requested cross-tabulation might need subtotals for all the combinations of month, state, and product.</dd>
+
+<dt>GROUPING SETS  </dt>
+<dd>You can selectively specify the set of groups that you want to create 
using a `GROUPING SETS` expression within a `GROUP BY` clause. This allows 
precise specification across multiple dimensions without computing a whole 
`ROLLUP` or `CUBE`. For example:
+
+``` pre
+GROUP BY GROUPING SETS( (a,c), (a,b) )
+```
+
+If you use the grouping extension clauses `ROLLUP`, `CUBE`, or `GROUPING SETS`, two challenges arise. First, how do you determine which result rows are subtotals, and what the exact level of aggregation is for a given subtotal? Relatedly, how do you differentiate between result rows that contain stored `NULL` values and the "NULL" placeholder values created by the `ROLLUP` or `CUBE`? Second, when duplicate grouping sets are specified in the `GROUP BY` clause, how do you determine which result rows are duplicates? There are two additional grouping functions you can use in the `SELECT` list to help with this:
+
+-   **grouping(\<column\> \[, ...\])** — The `grouping` function can be 
applied to one or more grouping attributes to distinguish super-aggregated rows 
from regular grouped rows. This can be helpful in distinguishing a "NULL" 
representing the set of all values in a super-aggregated row from a `NULL` 
value in a regular row. Each argument in this function produces a bit — 
either `1` or `0`, where `1` means the result row is super-aggregated, and `0` 
means the result row is from a regular grouping. The `grouping` function 
returns an integer by treating these bits as a binary number and then 
converting it to a base-10 integer.
+-   **group\_id()** — For grouping extension queries that contain duplicate 
grouping sets, the `group_id` function is used to identify duplicate rows in 
the output. All *unique* grouping set output rows will have a group\_id value 
of 0. For each duplicate grouping set detected, the `group_id` function assigns 
a group\_id number greater than 0. All output rows in a particular duplicate 
grouping set are identified by the same group\_id number.</dd>
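+
+As a sketch using the `sales` table from the Examples section, the `grouping` function below returns `0` for regular grouped rows and a nonzero value for the subtotal and grand-total rows generated by the `ROLLUP`:
+
+``` sql
+SELECT kind, distributor, sum(prc*qty),
+       grouping(kind, distributor) AS grp_level
+FROM sales
+GROUP BY ROLLUP(kind, distributor);
+```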
+
+**The WINDOW Clause**
+
+The `WINDOW` clause is used to define a window that can be used in the 
`OVER()` expression of a window function such as `rank` or `avg`. For example:
+
+``` pre
+SELECT vendor, rank() OVER (mywindow) FROM sale
+GROUP BY vendor
+WINDOW mywindow AS (ORDER BY sum(prc*qty));
+```
+
+A `WINDOW` clause has this general form:
+
+``` pre
+WINDOW <window_name> AS (<window_specification>)
+```
+
+where \<window\_specification\> can be:
+
+``` pre
+[<window_name>]
+[PARTITION BY <expression> [, ...]]
+[ORDER BY <expression> [ASC | DESC | USING <operator>] [, ...]
+    [{RANGE | ROWS}
+      { UNBOUNDED PRECEDING
+      | <expression> PRECEDING
+      | CURRENT ROW
+      | BETWEEN <window_frame_bound> AND <window_frame_bound> }]]
+             where window_frame_bound can be one of:
+               UNBOUNDED PRECEDING
+               <expression> PRECEDING
+               CURRENT ROW
+               <expression> FOLLOWING
+               UNBOUNDED FOLLOWING
+```
+
+<dt> \<window\_name\>  </dt>
+<dd>Gives a name to the window specification.</dd>
+
+<dt>PARTITION BY  </dt>
+<dd>The `PARTITION BY` clause organizes the result set into logical groups based on the unique values of the specified expression. When used with window functions, the functions are applied to each partition independently. For example, if you follow `PARTITION BY` with a column name, the result set is partitioned by the distinct values of that column. If omitted, the entire result set is considered one partition.</dd>
+
+<dt>ORDER BY  </dt>
+<dd>The `ORDER BY` clause defines how to sort the rows in each partition of 
the result set. If omitted, rows are returned in whatever order is most 
efficient and may vary.
+
+**Note:** Columns of data types that lack a coherent ordering, such as `time`, are not good candidates for use in the `ORDER BY` clause of a window specification. Time, with or without time zone, lacks a coherent ordering because addition and subtraction do not have the expected effects. For example, the following is not generally true: `x::time < x::time + '2 hour'::interval`</dd>
+
+<dt>ROWS | RANGE  </dt>
+<dd>Use either a `ROWS` or `RANGE` clause to express the bounds of the window. 
The window bound can be one, many, or all rows of a partition. You can express 
the bound of the window either in terms of a range of data values offset from 
the value in the current row (`RANGE`), or in terms of the number of rows 
offset from the current row (`ROWS`). When using the `RANGE` clause, you must 
also use an `ORDER BY` clause. This is because the calculation performed to 
produce the window requires that the values be sorted. Additionally, the `ORDER 
BY` clause cannot contain more than one expression, and the expression must 
result in either a date or a numeric value. When using the `ROWS` or `RANGE` 
clauses, if you specify only a starting row, the current row is used as the 
last row in the window.
+
+**PRECEDING** — The `PRECEDING` clause defines the first row of the window 
using the current row as a reference point. The starting row is expressed in 
terms of the number of rows preceding the current row. For example, in the case 
of `ROWS` framing, `5 PRECEDING` sets the window to start with the fifth row 
preceding the current row. In the case of `RANGE` framing, it sets the window 
to start with the first row whose ordering column value precedes that of the 
current row by 5 in the given order. If the specified order is ascending by 
date, this will be the first row within 5 days before the current row. 
`UNBOUNDED PRECEDING` sets the first row in the window to be the first row in 
the partition.
+
+**BETWEEN** — The `BETWEEN` clause defines the first and last row of the window, using the current row as a reference point. First and last rows are expressed in terms of the number of rows preceding and following the current row, respectively. For example, `BETWEEN 3 PRECEDING AND 5 FOLLOWING` sets the window to start with the third row preceding the current row, and end with the fifth row following the current row. Use `BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING` to set the first and last rows in the window to be the first and last row in the partition, respectively. This is equivalent to the default behavior if no `ROWS` or `RANGE` clause is specified.
+
+**FOLLOWING** — The `FOLLOWING` clause defines the last row of the window 
using the current row as a reference point. The last row is expressed in terms 
of the number of rows following the current row. For example, in the case of 
`ROWS` framing, `5 FOLLOWING` sets the window to end with the fifth row 
following the current row. In the case of `RANGE` framing, it sets the window 
to end with the last row whose ordering column value follows that of the 
current row by 5 in the given order. If the specified order is ascending by 
date, this will be the last row within 5 days after the current row. Use 
`UNBOUNDED FOLLOWING` to set the last row in the window to be the last row in 
the partition.
+
+If you do not specify a `ROWS` or a `RANGE` clause, the window bound starts with the first row in the partition (`UNBOUNDED PRECEDING`) and ends with the current row (`CURRENT ROW`) if `ORDER BY` is used. If an `ORDER BY` is not specified, the window starts with the first row in the partition (`UNBOUNDED PRECEDING`) and ends with the last row in the partition (`UNBOUNDED FOLLOWING`).</dd>
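+
+For example, a hypothetical running total over the `sale` table used in the `WINDOW` clause example above, assuming a date column `dt`, computed with a `ROWS` frame from the start of each partition to the current row:
+
+``` sql
+SELECT vendor, dt, prc*qty,
+       sum(prc*qty) OVER (PARTITION BY vendor ORDER BY dt
+                          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
+FROM sale;
+```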
+
+**The HAVING Clause**
+
+The optional `HAVING` clause has the general form:
+
+``` pre
+HAVING <condition>
+```
+
+where \<condition\> is the same as specified for the `WHERE` clause. `HAVING` 
eliminates group rows that do not satisfy the condition. `HAVING` is different 
from `WHERE`: `WHERE` filters individual rows before the application of `GROUP 
BY`, while `HAVING` filters group rows created by `GROUP BY`. Each column 
referenced in \<condition\> must unambiguously reference a grouping column, 
unless the reference appears within an aggregate function.
+
+The presence of `HAVING` turns a query into a grouped query even if there is 
no `GROUP BY` clause. This is the same as what happens when the query contains 
aggregate functions but no `GROUP BY` clause. All the selected rows are 
considered to form a single group, and the `SELECT` list and `HAVING` clause 
can only reference table columns from within aggregate functions. Such a query 
will emit a single row if the `HAVING` condition is true, zero rows if it is 
not true.
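+
+For example, this query over the `films` table emits a single row if the combined running time of all films exceeds 100 hours, and zero rows otherwise:
+
+``` sql
+SELECT sum(length) AS total FROM films
+HAVING sum(length) > interval '100 hours';
+```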
+
+**The UNION Clause**
+
+The `UNION` clause has this general form:
+
+``` pre
+<select_statement> UNION [ALL] <select_statement>
+```
+
+where \<select\_statement\> is any `SELECT` statement without an `ORDER BY`, 
`LIMIT`, `FOR UPDATE`, or `FOR SHARE` clause. (`ORDER BY` and `LIMIT` can be 
attached to a subquery expression if it is enclosed in parentheses. Without 
parentheses, these clauses will be taken to apply to the result of the `UNION`, 
not to its right-hand input expression.)
+
+The `UNION` operator computes the set union of the rows returned by the 
involved `SELECT` statements. A row is in the set union of two result sets if 
it appears in at least one of the result sets. The two `SELECT` statements that 
represent the direct operands of the `UNION` must produce the same number of 
columns, and corresponding columns must be of compatible data types.
+
+The result of `UNION` does not contain any duplicate rows unless the `ALL` 
option is specified. `ALL` prevents elimination of duplicates. (Therefore, 
`UNION ALL` is usually significantly quicker than `UNION`; use `ALL` when you 
can.)
+
+Multiple `UNION` operators in the same `SELECT` statement are evaluated left 
to right, unless otherwise indicated by parentheses.
+
+Currently, `FOR UPDATE` and `FOR SHARE` may not be specified either for a 
`UNION` result or for any input of a `UNION`.
+
+**The INTERSECT Clause**
+
+The `INTERSECT` clause has this general form:
+
+``` pre
+<select_statement> INTERSECT [ALL] <select_statement>
+```
+
+where \<select\_statement\> is any SELECT statement without an `ORDER BY`, 
`LIMIT`, `FOR UPDATE`, or `FOR SHARE` clause.
+
+The `INTERSECT` operator computes the set intersection of the rows returned by 
the involved `SELECT` statements. A row is in the intersection of two result 
sets if it appears in both result sets.
+
+The result of `INTERSECT` does not contain any duplicate rows unless the `ALL` 
option is specified. With `ALL`, a row that has *m* duplicates in the left 
table and *n* duplicates in the right table will appear min(*m*, *n*) times in 
the result set.
+
+Multiple `INTERSECT` operators in the same `SELECT` statement are evaluated 
left to right, unless parentheses dictate otherwise. `INTERSECT` binds more 
tightly than `UNION`. That is, `A UNION B INTERSECT C` will be read as `A UNION 
(B INTERSECT C)`.
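+
+Parentheses override this precedence; a sketch with hypothetical tables `a`, `b`, and `c`:
+
+``` sql
+-- Evaluated as a UNION (b INTERSECT c)
+SELECT * FROM a UNION SELECT * FROM b INTERSECT SELECT * FROM c;
+-- Parentheses force the UNION to be evaluated first
+(SELECT * FROM a UNION SELECT * FROM b) INTERSECT SELECT * FROM c;
+```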
+
+Currently, `FOR UPDATE` and `FOR SHARE` may not be specified either for an 
`INTERSECT` result or for any input of an `INTERSECT`.
+
+**The EXCEPT Clause**
+
+The `EXCEPT` clause has this general form:
+
+``` pre
+<select_statement> EXCEPT [ALL] <select_statement>
+```
+
+where \<select\_statement\> is any `SELECT` statement without an `ORDER BY`, 
`LIMIT`, `FOR UPDATE`, or `FOR SHARE` clause.
+
+The `EXCEPT` operator computes the set of rows that are in the result of the 
left `SELECT` statement but not in the result of the right one.
+
+The result of `EXCEPT` does not contain any duplicate rows unless the `ALL` 
option is specified. With `ALL`, a row that has *m* duplicates in the left 
table and *n* duplicates in the right table will appear max(*m-n*,0) times in 
the result set.
+
+Multiple `EXCEPT` operators in the same `SELECT` statement are evaluated left 
to right unless parentheses dictate otherwise. `EXCEPT` binds at the same level 
as `UNION`.
+
+Currently, `FOR UPDATE` and `FOR SHARE` may not be specified either for an `EXCEPT` result or for any input of an `EXCEPT`.
+
+**The ORDER BY Clause**
+
+The optional `ORDER BY` clause has this general form:
+
+``` pre
+ORDER BY <expression> [ASC | DESC | USING <operator>] [, ...]
+```
+
+where \<expression\> can be the name or ordinal number of an output column 
(`SELECT` list item), or it can be an arbitrary expression formed from 
input-column values.
+
+The `ORDER BY` clause causes the result rows to be sorted according to the 
specified expressions. If two rows are equal according to the left-most 
expression, they are compared according to the next expression and so on. If 
they are equal according to all specified expressions, they are returned in an 
implementation-dependent order.
+
+The ordinal number refers to the ordinal (left-to-right) position of the 
result column. This feature makes it possible to define an ordering on the 
basis of a column that does not have a unique name. This is never absolutely 
necessary because it is always possible to assign a name to a result column 
using the `AS` clause.
+
+It is also possible to use arbitrary expressions in the `ORDER BY` clause, 
including columns that do not appear in the `SELECT` result list. Thus the 
following statement is valid:
+
+``` pre
+SELECT name FROM distributors ORDER BY code;
+```
+
+A limitation of this feature is that an `ORDER BY` clause applying to the 
result of a `UNION`, `INTERSECT`, or `EXCEPT` clause may only specify an output 
column name or number, not an expression.
+
+If an `ORDER BY` expression is a simple name that matches both a result column 
name and an input column name, `ORDER BY` will interpret it as the result 
column name. This is the opposite of the choice that `GROUP BY` will make in 
the same situation. This inconsistency is made to be compatible with the SQL 
standard.
+
+Optionally one may add the key word `ASC` (ascending) or `DESC` (descending) 
after any expression in the `ORDER BY` clause. If not specified, `ASC` is 
assumed by default. Alternatively, a specific ordering operator name may be 
specified in the `USING` clause. `ASC` is usually equivalent to `USING <` and 
`DESC` is usually equivalent to `USING >`. (But the creator of a user-defined 
data type can define exactly what the default sort ordering is, and it might 
correspond to operators with other names.)
+
+The null value sorts higher than any other value. In other words, with 
ascending sort order, null values sort at the end, and with descending sort 
order, null values sort at the beginning.
+
+Character-string data is sorted according to the locale-specific collation 
order that was established when the HAWQ system was initialized.
+
+**The DISTINCT Clause**
+
+If `DISTINCT` is specified, all duplicate rows are removed from the result set 
(one row is kept from each group of duplicates). `ALL` specifies the opposite: 
all rows are kept. `ALL` is the default.
+
+`DISTINCT ON ( <expression> [, ...] )` keeps only the first row of each set of 
rows where the given expressions evaluate to equal. The `DISTINCT ON` 
expressions are interpreted using the same rules as for `ORDER BY`. Note that 
the 'first row' of each set is unpredictable unless `ORDER BY` is used to 
ensure that the desired row appears first. For example:
+
+``` pre
+SELECT DISTINCT ON (location) location, time, report FROM
+weather_reports ORDER BY location, time DESC;
+```
+
+retrieves the most recent weather report for each location. But if we had not used `ORDER BY` to force descending order of time values for each location, we would have gotten a report from an unpredictable time for each location.
+
+The `DISTINCT ON` expression(s) must match the left-most `ORDER BY` expression(s). The `ORDER BY` clause will normally contain additional expression(s) that determine the desired precedence of rows within each `DISTINCT ON` group.
+
+**The LIMIT Clause**
+
+The `LIMIT` clause consists of two independent sub-clauses:
+
+``` pre
+LIMIT {<count> | ALL}
+OFFSET <start>
+```
+
+where \<count\> specifies the maximum number of rows to return, while \<start\> specifies the number of rows to skip before starting to return rows. When both are specified, \<start\> rows are skipped before starting to count the \<count\> rows to be returned.
+
+When using `LIMIT`, it is a good idea to use an `ORDER BY` clause that 
constrains the result rows into a unique order. Otherwise you will get an 
unpredictable subset of the query's rows. You may be asking for the tenth 
through twentieth rows, but tenth through twentieth in what ordering? You don't 
know what ordering unless you specify `ORDER BY`.
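+
+For example, the following fetches rows eleven through twenty of the result in a well-defined order:
+
+``` sql
+SELECT * FROM films ORDER BY title LIMIT 10 OFFSET 10;
+```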
+
+The query planner takes `LIMIT` into account when generating a query plan, so 
you are very likely to get different plans (yielding different row orders) 
depending on what you use for `LIMIT` and `OFFSET`. Thus, using different 
`LIMIT/OFFSET` values to select different subsets of a query result will give 
inconsistent results unless you enforce a predictable result ordering with 
`ORDER BY`. This is not a defect; it is an inherent consequence of the fact 
that SQL does not promise to deliver the results of a query in any particular 
order unless `ORDER BY` is used to constrain the order.
+
+## <a id="topic1__section18"></a>Examples
+
+To join the table `films` with the table `distributors`:
+
+``` sql
+SELECT f.title, f.did, d.name, f.date_prod, f.kind FROM
distributors d, films f WHERE f.did = d.did;
+```
+
+To sum the column `length` of all films and group the results by `kind`:
+
+``` sql
+SELECT kind, sum(length) AS total FROM films GROUP BY kind;
+```
+
+To sum the column `length` of all films, group the results by `kind` and show 
those group totals that are less than 5 hours:
+
+``` sql
+SELECT kind, sum(length) AS total FROM films GROUP BY kind
+HAVING sum(length) < interval '5 hours';
+```
+
+Calculate the subtotals and grand totals of all sales for movie `kind` and 
`distributor`.
+
+``` sql
+SELECT kind, distributor, sum(prc*qty) FROM sales
+GROUP BY ROLLUP(kind, distributor)
+ORDER BY 1,2,3;
+```
+
+Calculate the rank of movie distributors based on total sales:
+
+``` sql
+SELECT distributor, sum(prc*qty),
+       rank() OVER (ORDER BY sum(prc*qty) DESC)
+FROM sale
+GROUP BY distributor ORDER BY 2 DESC;
+```
+
+The following two examples are identical ways of sorting the individual 
results according to the contents of the second column (`name`):
+
+``` sql
+SELECT * FROM distributors ORDER BY name;
+SELECT * FROM distributors ORDER BY 2;
+```
+
+The next example shows how to obtain the union of the tables `distributors` 
and `actors`, restricting the results to those that begin with the letter `W` 
in each table. Only distinct rows are wanted, so the key word `ALL` is omitted:
+
+``` sql
+SELECT distributors.name FROM distributors WHERE
+distributors.name LIKE 'W%' UNION SELECT actors.name FROM
+actors WHERE actors.name LIKE 'W%';
+```
+
+This example shows how to use a function in the `FROM` clause, both with and 
without a column definition list:
+
+``` sql
+CREATE FUNCTION distributors(int) RETURNS SETOF distributors
+AS $$ SELECT * FROM distributors WHERE did = $1; $$ LANGUAGE
+SQL;
+SELECT * FROM distributors(111);
+
+CREATE FUNCTION distributors_2(int) RETURNS SETOF record AS
+$$ SELECT * FROM distributors WHERE did = $1; $$ LANGUAGE
+SQL;
+SELECT * FROM distributors_2(111) AS (dist_id int, dist_name
+text);
+```
+
+## <a id="topic1__section19"></a>Compatibility
+
+The `SELECT` statement is compatible with the SQL standard, but there are some 
extensions and some missing features.
+
+**Omitted FROM Clauses**
+
+HAWQ allows you to omit the `FROM` clause. It has a straightforward use to 
compute the results of simple expressions. For example:
+
+``` sql
+SELECT 2+2;
+```
+
+Some other SQL databases cannot do this except by introducing a dummy one-row 
table from which to do the `SELECT`.
+
+Note that if a `FROM` clause is not specified, the query cannot reference any database tables. For compatibility with applications that rely on this behavior, the *add\_missing\_from* configuration parameter can be enabled.
+
+**The AS Key Word**
+
+In the SQL standard, the optional key word `AS` is just noise and can be 
omitted without affecting the meaning. The HAWQ parser requires this key word 
when renaming output columns because the type extensibility features lead to 
parsing ambiguities without it. `AS` is optional in `FROM` items, however.
+
+**Namespace Available to GROUP BY and ORDER BY**
+
+In the SQL-92 standard, an `ORDER BY` clause may only use result column names 
or numbers, while a `GROUP BY` clause may only use expressions based on input 
column names. HAWQ extends each of these clauses to allow the other choice as 
well (but it uses the standard's interpretation if there is ambiguity). HAWQ 
also allows both clauses to specify arbitrary expressions. Note that names 
appearing in an expression will always be taken as input-column names, not as 
result-column names.
+
+SQL:1999 and later use a slightly different definition which is not entirely 
upward compatible with SQL-92. In most cases, however, HAWQ will interpret an 
`ORDER BY` or `GROUP             BY` expression the same way SQL:1999 does.
+
+**Nonstandard Clauses**
+
+The clauses `DISTINCT ON`, `LIMIT`, and `OFFSET` are not defined in the SQL 
standard.
+
+**Limited Use of STABLE and VOLATILE Functions**
+
+To prevent data from becoming out-of-sync across the segments in HAWQ, any 
function classified as `STABLE` or `VOLATILE` cannot be executed at the segment 
database level if it contains SQL or modifies the database in any way.
+
+## <a id="topic1__section25"></a>See Also
+
+[EXPLAIN](EXPLAIN.html)

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/reference/sql/SET-ROLE.html.md.erb
----------------------------------------------------------------------
diff --git a/reference/sql/SET-ROLE.html.md.erb 
b/reference/sql/SET-ROLE.html.md.erb
new file mode 100644
index 0000000..63a03f6
--- /dev/null
+++ b/reference/sql/SET-ROLE.html.md.erb
@@ -0,0 +1,72 @@
+---
+title: SET ROLE
+---
+
+Sets the current role identifier of the current session.
+
+## <a id="topic1__section2"></a>Synopsis
+
+``` pre
+SET [SESSION | LOCAL] ROLE <rolename>
+SET [SESSION | LOCAL] ROLE NONE
+RESET ROLE
+```
+
+## <a id="topic1__section3"></a>Description
+
+This command sets the current role identifier of the current SQL-session 
context to be \<rolename\>. The role name may be written as either an 
identifier or a string literal. After `SET ROLE`, permissions checking for SQL 
commands is carried out as though the named role were the one that had logged 
in originally.
+
+The specified \<rolename\> must be a role that the current session user is a 
member of. If the session user is a superuser, any role can be selected.
+
+The `NONE` and `RESET` forms reset the current role identifier to be the 
current session role identifier. These forms may be executed by any user.
+
+## <a id="topic1__section4"></a>Parameters
+
+<dt>SESSION  </dt>
+<dd>Specifies that the command takes effect for the current session. This is 
the default.</dd>
+
+<dt>LOCAL  </dt>
+<dd>Specifies that the command takes effect for only the current transaction. 
After `COMMIT` or `ROLLBACK`, the session-level setting takes effect again. 
Note that `SET LOCAL` will appear to have no effect if it is executed outside 
of a transaction.</dd>
+
+<dt> \<rolename\>   </dt>
+<dd>The name of a role to use for permissions checking in this session.</dd>
+
+<dt>NONE  
+RESET  </dt>
+<dd>Reset the current role identifier to be the current session role 
identifier (that of the role used to log in).</dd>
+
+## <a id="topic1__section5"></a>Notes
+
+Using this command, it is possible to either add privileges or restrict privileges. If the session user role has the `INHERIT` attribute, then it automatically has all the privileges of every role that it could `SET ROLE` to; in this case `SET ROLE` effectively drops all the privileges assigned directly to the session user and to the other roles it is a member of, leaving only the privileges available to the named role. On the other hand, if the session user role has the `NOINHERIT` attribute, `SET ROLE` drops the privileges assigned directly to the session user and instead acquires the privileges available to the named role.
+
+In particular, when a superuser chooses to `SET ROLE` to a non-superuser role, 
she loses her superuser privileges.
+
+`SET ROLE` has effects comparable to `SET SESSION AUTHORIZATION`, but the 
privilege checks involved are quite different. Also, `SET SESSION 
AUTHORIZATION` determines which roles are allowable for later `SET ROLE` 
commands, whereas changing roles with `SET ROLE` does not change the set of 
roles allowed to a later `SET ROLE`.
+
+## <a id="topic1__section6"></a>Examples
+
+``` sql
+SELECT SESSION_USER, CURRENT_USER;
+```
+``` pre
+ session_user | current_user 
+--------------+--------------
+ peter        | peter
+```
+``` sql
+SET ROLE 'paul';
+SELECT SESSION_USER, CURRENT_USER;
+```
+``` pre
+ session_user | current_user 
+--------------+--------------
+ peter        | paul
+```
+
+## <a id="topic1__section7"></a>Compatibility
+
+HAWQ allows identifier syntax (\<rolename\>), while the SQL standard requires 
the role name to be written as a string literal. SQL does not allow this 
command during a transaction; HAWQ does not make this restriction. The 
`SESSION` and `LOCAL` modifiers are a HAWQ extension, as is the `RESET` syntax.
+
+## <a id="topic1__section8"></a>See Also
+
+[SET SESSION AUTHORIZATION](SET-SESSION-AUTHORIZATION.html)

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/reference/sql/SET-SESSION-AUTHORIZATION.html.md.erb
----------------------------------------------------------------------
diff --git a/reference/sql/SET-SESSION-AUTHORIZATION.html.md.erb 
b/reference/sql/SET-SESSION-AUTHORIZATION.html.md.erb
new file mode 100644
index 0000000..adea314
--- /dev/null
+++ b/reference/sql/SET-SESSION-AUTHORIZATION.html.md.erb
@@ -0,0 +1,66 @@
+---
+title: SET SESSION AUTHORIZATION
+---
+
+Sets the session role identifier and the current role identifier of the 
current session.
+
+## <a id="topic1__section2"></a>Synopsis
+
+``` pre
+SET [SESSION | LOCAL] SESSION AUTHORIZATION <rolename>
+SET [SESSION | LOCAL] SESSION AUTHORIZATION DEFAULT
+RESET SESSION AUTHORIZATION
+```
+
+## <a id="topic1__section3"></a>Description
+
+This command sets the session role identifier and the current role identifier 
of the current SQL-session context to \<rolename\>. The role name may be 
written as either an identifier or a string literal. Using this command, it is 
possible, for example, to temporarily become an unprivileged user and later 
switch back to being a superuser.
+
+The session role identifier is initially set to be the (possibly authenticated) role name provided by the client. The current role identifier is normally equal to the session role identifier, but may change temporarily in the context of setuid functions and similar mechanisms; it can also be changed by [SET ROLE](SET-ROLE.html). The current role identifier is relevant for permission checking.
+
+The session role identifier may be changed only if the initial session user (the authenticated user) had the superuser privilege. Otherwise, the command is accepted only if it specifies the authenticated user name.
+
+The `DEFAULT` and `RESET` forms reset the session and current user identifiers 
to be the originally authenticated user name. These forms may be executed by 
any user.
+
+## <a id="topic1__section4"></a>Parameters
+
+<dt>SESSION  </dt>
+<dd>Specifies that the command takes effect for the current session. This is 
the default.</dd>
+
+<dt>LOCAL  </dt>
+<dd>Specifies that the command takes effect for only the current transaction. 
After `COMMIT` or `ROLLBACK`, the session-level setting takes effect again. 
Note that `SET LOCAL` will appear to have no effect if it is executed outside 
of a transaction.</dd>
+
+<dt> \<rolename\>   </dt>
+<dd>The name of the role to assume.</dd>
+
+<dt>DEFAULT  
+RESET  </dt>
+<dd>Reset the session and current role identifiers to be that of the role used 
to log in.</dd>
+
+## <a id="topic1__section5"></a>Examples
+
+``` sql
+SELECT SESSION_USER, CURRENT_USER;
+```
+``` pre
+ session_user | current_user 
+--------------+--------------
+ peter        | peter
+```
+``` sql
+SET SESSION AUTHORIZATION 'paul';
+SELECT SESSION_USER, CURRENT_USER;
+```
+``` pre
+ session_user | current_user 
+--------------+--------------
+ paul         | paul
+```
+
+## <a id="topic1__section6"></a>Compatibility
+
+The SQL standard allows some other expressions to appear in place of the 
literal \<rolename\>, but these options are not important in practice. HAWQ 
allows identifier syntax (\<rolename\>), while SQL does not. SQL does not allow 
this command during a transaction; HAWQ does not make this restriction. The 
`SESSION` and `LOCAL` modifiers are a HAWQ extension, as is the `RESET` syntax.
+
+## <a id="topic1__section7"></a>See Also
+
+[SET ROLE](SET-ROLE.html)

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/reference/sql/SET.html.md.erb
----------------------------------------------------------------------
diff --git a/reference/sql/SET.html.md.erb b/reference/sql/SET.html.md.erb
new file mode 100644
index 0000000..4f4ad24
--- /dev/null
+++ b/reference/sql/SET.html.md.erb
@@ -0,0 +1,87 @@
+---
+title: SET
+---
+
+Changes the value of a HAWQ configuration parameter.
+
+## <a id="topic1__section2"></a>Synopsis
+
+``` pre
+SET [SESSION | LOCAL] <configuration_parameter> {TO | =} {<value> | '<value>' | DEFAULT}
+SET [SESSION | LOCAL] TIME ZONE {<timezone> | LOCAL | DEFAULT}
+```
+
+## <a id="topic1__section3"></a>Description
+
+The `SET` command changes server configuration parameters. Any configuration 
parameter classified as a *session* parameter can be changed on-the-fly with 
`SET`. See [About Server Configuration 
Parameters](../guc/guc_config.html#topic1). `SET` only affects the value used 
by the current session.
+
+If `SET` or `SET SESSION` is issued within a transaction that is later 
aborted, the effects of the `SET` command disappear when the transaction is 
rolled back. Once the surrounding transaction is committed, the effects will 
persist until the end of the session, unless overridden by another `SET`.
+
+The effects of `SET LOCAL` last only until the end of the current transaction, whether committed or not. A special case is `SET` followed by `SET LOCAL` within a single transaction: the `SET LOCAL` value will be seen until the end of the transaction, but afterwards (if the transaction is committed) the `SET` value will take effect.
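+
+A minimal sketch of that interplay, using `search_path` and a hypothetical schema `my_schema`:
+
+``` sql
+BEGIN;
+SET search_path TO my_schema;    -- session-level value if the transaction commits
+SET LOCAL search_path TO public; -- seen until the end of this transaction
+SHOW search_path;                -- public
+COMMIT;
+SHOW search_path;                -- my_schema
+```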
+
+## <a id="topic1__section4"></a>Parameters
+
+<dt>SESSION  </dt>
+<dd>Specifies that the command takes effect for the current session. This is 
the default.</dd>
+
+<dt>LOCAL  </dt>
+<dd>Specifies that the command takes effect for only the current transaction. 
After `COMMIT` or `ROLLBACK`, the session-level setting takes effect again. 
Note that `SET LOCAL` will appear to have no effect if it is executed outside 
of a transaction.</dd>
+
+<dt> \<configuration\_parameter\>  </dt>
+<dd>The name of a HAWQ configuration parameter. Only parameters classified as 
*session* can be changed with `SET`. See [About Server Configuration 
Parameters](../guc/guc_config.html#topic1).</dd>
+
+<dt> \<value\>  </dt>
+<dd>New value of parameter. Values can be specified as string constants, 
identifiers, numbers, or comma-separated lists of these. `DEFAULT` can be used 
to specify resetting the parameter to its default value. If specifying memory 
sizing or time units, enclose the value in single quotes.</dd>
+
+<dt>TIME ZONE  </dt>
+<dd>`SET TIME ZONE <value>` is an alias for `SET timezone TO <value>`.
+
+<dt>LOCAL  
+DEFAULT  </dt>
+<dd>Set the time zone to your local time zone (the one that the server's 
operating system defaults to).</dd>
+
+<dt> \<timezone\>  </dt>
+<dd>The \<timezone\> specification. Examples of syntactically valid values:
+
+`'PST8PDT'`
+
+`'Europe/Rome'`
+
+`-7` (time zone 7 hours west from UTC)
+
+`INTERVAL '-08:00' HOUR TO MINUTE` (time zone 8 hours west from UTC).</dd>
+</dd>
+
+## <a id="topic1__section5"></a>Examples
+
+Set the schema search path:
+
+``` sql
+SET search_path TO my_schema, public;
+```
+
+Set the style of date to traditional POSTGRES with "day before month" input 
convention:
+
+``` sql
+SET datestyle TO postgres, dmy;
+```
+
+Set the time zone for San Mateo, California (Pacific Time):
+
+``` sql
+SET TIME ZONE 'PST8PDT';
+```
+
+Set the time zone for Italy:
+
+``` sql
+SET TIME ZONE 'Europe/Rome';
+```
+
+## <a id="topic1__section6"></a>Compatibility
+
+`SET TIME ZONE` extends the syntax defined in the SQL standard. The standard 
allows only numeric time zone offsets while HAWQ allows more flexible time-zone 
specifications. All other `SET` features are HAWQ extensions.
+
+## <a id="topic1__section7"></a>See Also
+
+[RESET](RESET.html), [SHOW](SHOW.html)

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/reference/sql/SHOW.html.md.erb
----------------------------------------------------------------------
diff --git a/reference/sql/SHOW.html.md.erb b/reference/sql/SHOW.html.md.erb
new file mode 100644
index 0000000..802761b
--- /dev/null
+++ b/reference/sql/SHOW.html.md.erb
@@ -0,0 +1,47 @@
+---
+title: SHOW
+---
+
+Shows the value of a system configuration parameter.
+
+## <a id="topic1__section2"></a>Synopsis
+
+``` pre
+SHOW <configuration_parameter>
+
+SHOW ALL
+```
+
+## <a id="topic1__section3"></a>Description
+
+`SHOW` displays the current settings of HAWQ system configuration parameters. 
These parameters can be set using the `SET` statement, or by editing the 
`hawq-site.xml` configuration file of the HAWQ master. Note that some 
parameters viewable by `SHOW` are read-only — their values can be viewed but 
not set. See [About Server Configuration 
Parameters](../guc/guc_config.html#topic1).
+
+## <a id="topic1__section4"></a>Parameters
+
+<dt> \<configuration\_parameter\>   </dt>
+<dd>The name of a system configuration parameter.</dd>
+
+<dt>ALL  </dt>
+<dd>Shows the current value of all configuration parameters.</dd>
+
+## <a id="topic1__section5"></a>Examples
+
+Show the current setting of the parameter `search_path`:
+
+``` sql
+SHOW search_path;
+```
+
+Show the current setting of all parameters:
+
+``` sql
+SHOW ALL;
+```
+
+## <a id="topic1__section6"></a>Compatibility
+
+`SHOW` is a HAWQ extension.
+
+## <a id="topic1__section7"></a>See Also
+
+[SET](SET.html), [RESET](RESET.html)

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/reference/sql/TRUNCATE.html.md.erb
----------------------------------------------------------------------
diff --git a/reference/sql/TRUNCATE.html.md.erb 
b/reference/sql/TRUNCATE.html.md.erb
new file mode 100644
index 0000000..c91ae84
--- /dev/null
+++ b/reference/sql/TRUNCATE.html.md.erb
@@ -0,0 +1,52 @@
+---
+title: TRUNCATE
+---
+
+Empties a table of all rows.
+
+## <a id="topic1__section2"></a>Synopsis
+
+``` pre
+TRUNCATE [TABLE] <name> [, ...] [CASCADE | RESTRICT]
+```
+
+## <a id="topic1__section3"></a>Description
+
+`TRUNCATE` quickly removes all rows from a table or set of tables. This is most useful on large tables.
+
+## <a id="topic1__section4"></a>Parameters
+
+<dt> \<name\>   </dt>
+<dd>Required. The name (optionally schema-qualified) of a table to be 
truncated.</dd>
+
+<dt>CASCADE  </dt>
+<dd>Since this key word applies to foreign key references (which are not 
supported in HAWQ) it has no effect.</dd>
+
+<dt>RESTRICT  </dt>
+<dd>Since this key word applies to foreign key references (which are not 
supported in HAWQ) it has no effect.</dd>
+
+## <a id="topic1__section5"></a>Notes
+
+Only the owner of a table may `TRUNCATE` it. `TRUNCATE` will not perform the 
following:
+
+-   Run any user-defined `ON DELETE` triggers that might exist for the tables.
+
+    **Note:** HAWQ does not support user-defined triggers.
+
+-   Truncate any tables that inherit from the named table. Only the named 
table is truncated, not its child tables.
+
+## <a id="topic1__section6"></a>Examples
+
+Empty the table `films`:
+
+``` sql
+TRUNCATE films;
+```
+
+## <a id="topic1__section7"></a>Compatibility
+
+There is no `TRUNCATE` command in the SQL standard.
+
+## <a id="topic1__section8"></a>See Also
+
+[DROP TABLE](DROP-TABLE.html)

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/reference/sql/VACUUM.html.md.erb
----------------------------------------------------------------------
diff --git a/reference/sql/VACUUM.html.md.erb b/reference/sql/VACUUM.html.md.erb
new file mode 100644
index 0000000..2db5757
--- /dev/null
+++ b/reference/sql/VACUUM.html.md.erb
@@ -0,0 +1,92 @@
+---
+title: VACUUM
+---
+
+Garbage-collects and optionally analyzes a database.
+
+## <a id="topic1__section2"></a>Synopsis
+
+``` pre
+VACUUM [FULL] [FREEZE] [VERBOSE] [<table>]
+VACUUM [FULL] [FREEZE] [VERBOSE] ANALYZE
+              [<table> [(<column> [, ...] )]]
+```
+
+## <a id="topic1__section3"></a>Description
+
+`VACUUM` reclaims storage occupied by deleted tuples. In normal HAWQ operation, tuples that are deleted or obsoleted by an update are not physically removed from their table; they remain present on disk until a `VACUUM` is done. Therefore it is necessary to do `VACUUM` periodically, especially on frequently-updated catalog tables. `VACUUM` has no effect on a normal HAWQ table, since delete and update operations are not supported on normal HAWQ tables.
+
+With no parameter, `VACUUM` processes every table in the current database. 
With a parameter, `VACUUM` processes only that table. `VACUUM ANALYZE` performs 
a `VACUUM` and then an `ANALYZE` for each selected table. This is a handy 
combination form for routine maintenance scripts. See [ANALYZE](ANALYZE.html) 
for more details about its processing.
+
+Plain `VACUUM` (without `FULL`) simply reclaims space and makes it available 
for re-use. This form of the command can operate in parallel with normal 
reading and writing of the table, as an exclusive lock is not obtained. `VACUUM 
FULL` does more extensive processing, including moving of tuples across blocks 
to try to compact the table to the minimum number of disk blocks. This form is 
much slower and requires an exclusive lock on each table while it is being 
processed.
+
+**Outputs**
+
+When `VERBOSE` is specified, `VACUUM` emits progress messages to indicate 
which table is currently being processed. Various statistics about the tables 
are printed as well.
+
+## <a id="topic1__section5"></a>Parameters
+
+<dt>FULL  </dt>
+<dd>Selects a full vacuum, which may reclaim more space but takes much longer 
and exclusively locks the table.
+
+**Note:** A VACUUM FULL is not recommended in HAWQ. See 
[Notes](#topic1__section6).</dd>
+
+<dt>FREEZE  </dt>
+<dd>Specifying `FREEZE` is equivalent to performing `VACUUM` with the 
`vacuum_freeze_min_age` server configuration parameter set to zero. The 
`FREEZE` option is deprecated and will be removed in a future release. Set the 
parameter by using `hawq config` or by modifying your `hawq-site.xml` 
configuration file instead.</dd>
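+
+For instance, a minimal sketch of the equivalent, assuming `vacuum_freeze_min_age` may be set at the session level:
+
+``` sql
+SET vacuum_freeze_min_age TO 0;  -- same effect as the deprecated FREEZE option
+VACUUM mytable;
+```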
+
+<dt>VERBOSE  </dt>
+<dd>Prints a detailed vacuum activity report for each table.</dd>
+
+<dt>ANALYZE  </dt>
+<dd>Updates statistics used by the planner to determine the most efficient way 
to execute a query.</dd>
+
+<dt> \<table\>   </dt>
+<dd>The name (optionally schema-qualified) of a specific table to vacuum. 
Defaults to all tables in the current database.</dd>
+
+<dt> \<column\>   </dt>
+<dd>The name of a specific column to analyze. Defaults to all columns.</dd>
+
+## <a id="topic1__section6"></a>Notes
+
+`VACUUM` cannot be executed inside a transaction block.
+
+A recommended practice is to vacuum active production databases frequently (at 
least nightly), in order to remove expired rows. After adding or deleting a 
large number of rows, it may be a good idea to issue a `VACUUM ANALYZE` command 
for the affected table. This will update the system catalogs with the results 
of all recent changes, and allow the HAWQ query planner to make better choices 
in planning queries.
+
+`VACUUM` causes a substantial increase in I/O traffic, which can cause poor performance for other active sessions. Therefore, it is advisable to vacuum the database at low usage times. The autovacuum daemon, which automates the execution of `VACUUM` and `ANALYZE` commands, is currently disabled in HAWQ.
+
+Expired rows are held in what is called the *free space map*. The free space 
map must be sized large enough to cover the dead rows of all tables in your 
database. If not sized large enough, space occupied by dead rows that overflow 
the free space map cannot be reclaimed by a regular `VACUUM` command.
+
+`VACUUM FULL` will reclaim all expired row space, but is a very expensive 
operation and may take an unacceptably long time to finish on large, 
distributed HAWQ tables. If you do get into a situation where the free space 
map has overflowed, it may be more timely to recreate the table with a `CREATE 
TABLE AS` statement and drop the old table.
+
+`VACUUM FULL` is not recommended in HAWQ. It is best to size the free space 
map appropriately. The free space map is configured with the following server 
configuration parameters:
+
+-   `max_fsm_pages`
+-   `max_fsm_relations`
+
+## <a id="topic1__section7"></a>Examples
+
+Vacuum all tables in the current database:
+
+``` sql
+VACUUM;
+```
+
+Vacuum a specific table only:
+
+``` sql
+VACUUM mytable;
+```
+
+Vacuum all tables in the current database and collect statistics for the query 
planner:
+
+``` sql
+VACUUM ANALYZE;
+```
+
+## <a id="topic1__section8"></a>Compatibility
+
+There is no `VACUUM` statement in the SQL standard.
+
+## <a id="topic1__section9"></a>See Also
+
+[ANALYZE](ANALYZE.html)

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/reference/toolkit/hawq_toolkit.html.md.erb
----------------------------------------------------------------------
diff --git a/reference/toolkit/hawq_toolkit.html.md.erb 
b/reference/toolkit/hawq_toolkit.html.md.erb
new file mode 100644
index 0000000..f76963c
--- /dev/null
+++ b/reference/toolkit/hawq_toolkit.html.md.erb
@@ -0,0 +1,263 @@
+---
+title: The hawq_toolkit Administrative Schema
+---
+
+This section provides a reference on the `hawq_toolkit` administrative schema.
+
+HAWQ provides an administrative schema called `hawq_toolkit` that you can use 
to query the system catalogs, log files, and operating environment for system 
status information. The `hawq_toolkit` schema contains a number of views that 
you can access using SQL commands. The `hawq_toolkit` schema is accessible to 
all database users, although some objects may require superuser permissions.
+
+This documentation describes the most useful views in `hawq_toolkit`. You may 
notice other objects (views, functions, and external tables) within the 
`hawq_toolkit` schema that are not described in this documentation (these are 
supporting objects to the views described in this section).
+
+**Warning:** Do not change database objects in the `hawq_toolkit` schema. Do 
not create database objects in the schema. Changes to objects in the schema 
might affect the accuracy of administrative information returned by schema 
objects.
+
+## <a id="topic2"></a>Checking for Tables that Need Routine Maintenance
+
+The following views can help identify tables that need routine table 
maintenance (`VACUUM` and/or `ANALYZE`).
+
+-   [hawq\_stats\_missing](#topic4)
+
+The `VACUUM` command reclaims disk space occupied by deleted or obsolete rows. 
Because of the MVCC transaction concurrency model used in HAWQ, data rows that 
are deleted or updated still occupy physical space on disk even though they are 
not visible to any new transactions. Expired rows increase table size on disk 
and eventually slow down scans of the table.
+
+**Note:** `VACUUM FULL` is not recommended in HAWQ. See [VACUUM](../sql/VACUUM.html#topic1).
+
+The `ANALYZE` command collects column-level statistics needed by the query 
optimizer. HAWQ uses a cost-based query optimizer that relies on database 
statistics. Accurate statistics allow the query optimizer to better estimate 
selectivity and the number of rows retrieved by a query operation in order to 
choose the most efficient query plan.
+
+### <a id="topic4"></a>hawq\_stats\_missing
+
+This view shows tables that do not have statistics and therefore may require that `ANALYZE` be run on the table.
+
+<a id="topic4__ie194266"></a>
+
+<span class="tablecap">Table 1. hawq\_stats\_missing view</span>
+
+| Column    | Description |
+|-----------|-------------|
+| smischema | Schema name. |
+| smitable  | Table name. |
+| smisize   | Does this table have statistics? False if the table does not have row count and row sizing statistics recorded in the system catalog, which may indicate that the table needs to be analyzed. This will also be false if the table does not contain any rows. For example, the parent tables of partitioned tables are always empty and will always return a false result. |
+| smicols   | Number of columns in the table. |
+| smirecs   | Number of rows in the table. |
+
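+For example, the following query (using the columns documented above) lists the tables that currently lack statistics and are therefore candidates for an `ANALYZE`:
+
+``` sql
+SELECT smischema, smitable
+FROM hawq_toolkit.hawq_stats_missing;
+```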
+
+## <a id="topic16"></a>Viewing HAWQ Server Log Files
+
+Each component of a HAWQ system (master, standby master, and segments) keeps 
its own server log files. The `hawq_log_*` family of views allows you to issue 
SQL queries against the server log files to find particular entries of 
interest. The use of these views requires superuser permissions.
+
+-   [hawq\_log\_command\_timings](#topic17)
+-   [hawq\_log\_master\_concise](#topic19)
+
+### <a id="topic17"></a>hawq\_log\_command\_timings
+
+This view uses an external table to read the log files on the master and 
report the execution time of SQL commands executed in a database session. The 
use of this view requires superuser permissions.
+
+<a id="topic17__ie176169"></a>
+
+<span class="tablecap">Table 2. hawq\_log\_command\_timings view</span>
+
+| Column      | Description                                                |
+|-------------|------------------------------------------------------------|
+| logsession  | The session identifier (prefixed with "con").              |
+| logcmdcount | The command number within a session (prefixed with "cmd"). |
+| logdatabase | The name of the database.                                  |
+| loguser     | The name of the database user.                             |
+| logpid      | The process id (prefixed with "p").                        |
+| logtimemin  | The time of the first log message for this command.        |
+| logtimemax  | The time of the last log message for this command.         |
+| logduration | Statement duration from start to end time.                 |
+
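+As a usage sketch built only on the columns documented above, the following query lists the ten longest-running commands recorded in the master log:
+
+``` sql
+SELECT logsession, logcmdcount, logdatabase, logduration
+FROM hawq_toolkit.hawq_log_command_timings
+ORDER BY logduration DESC
+LIMIT 10;
+```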
+
+### <a id="topic19"></a>hawq\_log\_master\_concise
+
+This view uses an external table to read a subset of the log fields from the 
master log file. The use of this view requires superuser permissions.
+
+<a id="topic19__ie177543"></a>
+
+<span class="tablecap">Table 3. hawq\_log\_master\_concise view</span>
+
+| Column      | Description                                                |
+|-------------|------------------------------------------------------------|
+| logtime     | The timestamp of the log message.                          |
+| logdatabase | The name of the database.                                  |
+| logsession  | The session identifier (prefixed with "con").              |
+| logcmdcount | The command number within a session (prefixed with "cmd"). |
+| logseverity | The severity level for the record.                         |
+| logmessage  | Log or error message text.                                 |
+
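+For example, to review recent error entries (the `'ERROR'` literal is an assumed severity value; adjust it to the severity levels your logs actually contain):
+
+``` sql
+SELECT logtime, logdatabase, logmessage
+FROM hawq_toolkit.hawq_log_master_concise
+WHERE logseverity = 'ERROR'
+ORDER BY logtime DESC;
+```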
+
+## <a id="topic38"></a>Checking Database Object Sizes and Disk Space
+
+The `hawq_size_*` family of views can be used to determine the disk space usage for a distributed HAWQ database, schema, table, or index. The following views calculate the total size of an object across all segments.
+
+-   [hawq\_size\_of\_all\_table\_indexes](#topic39)
+-   [hawq\_size\_of\_database](#topic40)
+-   [hawq\_size\_of\_index](#topic41)
+-   [hawq\_size\_of\_partition\_and\_indexes\_disk](#topic42)
+-   [hawq\_size\_of\_schema\_disk](#topic43)
+-   [hawq\_size\_of\_table\_and\_indexes\_disk](#topic44)
+-   [hawq\_size\_of\_table\_and\_indexes\_licensing](#topic45)
+-   [hawq\_size\_of\_table\_disk](#topic46)
+-   [hawq\_size\_of\_table\_uncompressed](#topic47)
+
+The table and index sizing views list the relation by object ID (not by name). 
To check the size of a table or index by name, you must look up the relation 
name (`relname`) in the `pg_class` table. For example:
+
+``` sql
+SELECT relname AS name, sotdsize AS size, sotdtoastsize AS toast,
+       sotdadditionalsize AS other
+FROM hawq_toolkit.hawq_size_of_table_disk AS sotd, pg_class
+WHERE sotd.sotdoid = pg_class.oid ORDER BY relname;
+```
+
+### <a id="topic39"></a>hawq\_size\_of\_all\_table\_indexes
+
+This view shows the total size of all indexes for a table. This view is accessible to all users; however, non-superusers can see only the relations that they have permission to access.
+
+<a id="topic39__ie181657"></a>
+
+<span class="tablecap">Table 4. hawq\_size\_of\_all\_table\_indexes view</span>
+
+| Column          | Description                                  |
+|-----------------|----------------------------------------------|
+| soatioid        | The object ID of the table                   |
+| soatisize       | The total size of all table indexes in bytes |
+| soatischemaname | The schema name                              |
+| soatitablename  | The table name                               |
+
+
+### <a id="topic40"></a>hawq\_size\_of\_database
+
+This view shows the total size of a database. This view is accessible to all users; however, non-superusers can see only the databases that they have permission to access.
+
+<a id="topic40__ie181758"></a>
+
+<span class="tablecap">Table 5. hawq\_size\_of\_database view</span>
+
+| Column      | Description                       |
+|-------------|-----------------------------------|
+| sodddatname | The name of the database          |
+| sodddatsize | The size of the database in bytes |
+
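+For example, to list databases from largest to smallest:
+
+``` sql
+SELECT sodddatname, sodddatsize
+FROM hawq_toolkit.hawq_size_of_database
+ORDER BY sodddatsize DESC;
+```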
+
+### <a id="topic41"></a>hawq\_size\_of\_index
+
+This view shows the total size of an index. This view is accessible to all users; however, non-superusers can see only the relations that they have permission to access.
+
+<a id="topic41__ie181709"></a>
+
+<span class="tablecap">Table 6. hawq\_size\_of\_index view</span>
+
+| Column             | Description                                           |
+|--------------------|-------------------------------------------------------|
+| soioid             | The object ID of the index                            |
+| soitableoid        | The object ID of the table to which the index belongs |
+| soisize            | The size of the index in bytes                        |
+| soiindexschemaname | The name of the index schema                          |
+| soiindexname       | The name of the index                                 |
+| soitableschemaname | The name of the table schema                          |
+| soitablename       | The name of the table                                 |
+
+
+### <a id="topic42"></a>hawq\_size\_of\_partition\_and\_indexes\_disk
+
+This view shows the size on disk of partitioned child tables and their indexes. This view is accessible to all users; however, non-superusers can see only the relations that they have permission to access.
+
+<a id="topic42__ie181803"></a>
+
+<span class="tablecap">Table 7. hawq\_size\_of\_partition\_and\_indexes\_disk 
view</span>
+
+| Column                     | Description |
+|----------------------------|-------------|
+| sopaidparentoid            | The object ID of the parent table |
+| sopaidpartitionoid         | The object ID of the partition table |
+| sopaidpartitiontablesize   | The partition table size in bytes |
+| sopaidpartitionindexessize | The total size of all indexes on this partition |
+| sopaidparentschemaname     | The name of the parent schema |
+| sopaidparenttablename      | The name of the parent table |
+| sopaidpartitionschemaname  | The name of the partition schema |
+| sopaidpartitiontablename   | The name of the partition table |
+
+
+### <a id="topic43"></a>hawq\_size\_of\_schema\_disk
+
+This view shows schema sizes for the public schema and the user-created schemas in the current database. This view is accessible to all users; however, non-superusers can see only the schemas that they have permission to access.
+
+<a id="topic43__ie183105"></a>
+
+<span class="tablecap">Table 8. hawq\_size\_of\_schema\_disk view</span>
+
+| Column              | Description                                      |
+|---------------------|--------------------------------------------------|
+| sosdnsp             | The name of the schema                           |
+| sosdschematablesize | The total size of tables in the schema in bytes  |
+| sosdschemaidxsize   | The total size of indexes in the schema in bytes |
+
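+For example, to report table and index disk usage for each schema:
+
+``` sql
+SELECT sosdnsp, sosdschematablesize, sosdschemaidxsize
+FROM hawq_toolkit.hawq_size_of_schema_disk;
+```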
+
+### <a id="topic44"></a>hawq\_size\_of\_table\_and\_indexes\_disk
+
+This view shows the size on disk of tables and their indexes. This view is accessible to all users; however, non-superusers can see only the relations that they have permission to access.
+
+<a id="topic44__ie183128"></a>
+
+<span class="tablecap">Table 9. hawq\_size\_of\_table\_and\_indexes\_disk 
view</span>
+
+| Column           | Description                                |
+|------------------|--------------------------------------------|
+| sotaidoid        | The object ID of the parent table          |
+| sotaidtablesize  | The disk size of the table                 |
+| sotaididxsize    | The total size of all indexes on the table |
+| sotaidschemaname | The name of the schema                     |
+| sotaidtablename  | The name of the table                      |
+
+
+### <a id="topic45"></a>hawq\_size\_of\_table\_and\_indexes\_licensing
+
+This view shows the total size of tables and their indexes for licensing 
purposes. The use of this view requires superuser permissions.
+
+<a id="topic45__ie181949"></a>
+
+<span class="tablecap">Table 10. 
hawq\_size\_of\_table\_and\_indexes\_licensing view</span>
+
+| Column                      | Description |
+|-----------------------------|-------------|
+| sotailoid                   | The object ID of the table |
+| sotailtablesizedisk         | The total disk size of the table |
+| sotailtablesizeuncompressed | If the table is a compressed append-only table, shows the uncompressed table size in bytes. |
+| sotailindexessize           | The total size of all indexes in the table |
+| sotailschemaname            | The schema name |
+| sotailtablename             | The table name |
+
+
+### <a id="topic46"></a>hawq\_size\_of\_table\_disk
+
+This view shows the size of a table on disk. This view is accessible to all users; however, non-superusers can see only the tables that they have permission to access.
+
+<a id="topic46__ie183408"></a>
+
+<span class="tablecap">Table 11. hawq\_size\_of\_table\_disk view</span>
+
+| Column             | Description |
+|--------------------|-------------|
+| sotdoid            | The object ID of the table |
+| sotdsize           | The size of the table in bytes. The size is only the main table size. The size does not include auxiliary objects such as oversized (toast) attributes, or additional storage objects for AO tables. |
+| sotdtoastsize      | The size of the TOAST table (oversized attribute storage), if there is one. |
+| sotdadditionalsize | Reflects the segment and block directory table sizes for append-only (AO) tables. |
+| sotdschemaname     | The schema name |
+| sotdtablename      | The table name |
+
+
+### <a id="topic47"></a>hawq\_size\_of\_table\_uncompressed
+
+This view shows the uncompressed table size for append-only (AO) tables. 
Otherwise, the table size on disk is shown. The use of this view requires 
superuser permissions.
+
+<a id="topic47__ie183582"></a>
+
+<span class="tablecap">Table 12. hawq\_size\_of\_table\_uncompressed 
view</span>
+
+| Column         | Description |
+|----------------|-------------|
+| sotuoid        | The object ID of the table |
+| sotusize       | The uncompressed size of the table in bytes if it is a compressed AO table. Otherwise, the table size on disk. |
+| sotuschemaname | The schema name |
+| sotutablename  | The table name |
+
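+As an illustrative sketch that joins this view with `hawq_size_of_table_disk` (both documented above, matched on their table object ID columns), you can compare on-disk and uncompressed sizes to estimate the compression ratio of AO tables:
+
+``` sql
+SELECT u.sotuschemaname, u.sotutablename,
+       d.sotdsize AS size_on_disk,
+       u.sotusize AS size_uncompressed
+FROM hawq_toolkit.hawq_size_of_table_uncompressed u,
+     hawq_toolkit.hawq_size_of_table_disk d
+WHERE u.sotuoid = d.sotdoid;
+```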
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/7514e193/requirements/system-requirements.html.md.erb
----------------------------------------------------------------------
diff --git a/requirements/system-requirements.html.md.erb 
b/requirements/system-requirements.html.md.erb
new file mode 100644
index 0000000..bc08ef0
--- /dev/null
+++ b/requirements/system-requirements.html.md.erb
@@ -0,0 +1,197 @@
+---
+title: Apache HAWQ System Requirements
+---
+
+Follow these guidelines to configure each host machine that will run an Apache 
HAWQ or PXF service.
+
+
+## <a id="topic_d3f_vlz_g5"></a>Host Memory Configuration
+
+In order to prevent data loss or corruption in an Apache HAWQ cluster, you 
must configure the memory on each host machine so that the Linux Out-of-Memory 
\(OOM\) killer process never kills a HAWQ process due to OOM conditions. \(HAWQ 
applies its own rules to enforce memory restrictions.\)
+
+**For mission critical deployments of HAWQ, perform these steps on each host 
machine to configure memory:**
+
+1.  Set the operating system `vm.overcommit_memory` parameter to 2. With this setting, the kernel refuses memory requests that would overcommit, returning an error to the requesting process instead of letting the OOM killer terminate running processes. To set this parameter:
+    1.  Open the `/etc/sysctl.conf` file with a text editor.
+    2.  Add or change the parameter definition so that the file includes these 
lines:
+
+        ```
+        kernel.threads-max=798720
+        vm.overcommit_memory=2
+        ```
+
+    3.  Save and close the file, then execute this command to apply your 
change:
+
+        ``` shell
+        $ sysctl -p
+        ```
+
+    4.  To view the current `vm.overcommit_memory` setting, execute the 
command:
+
+        ``` shell
+        $ sysctl -a | grep overcommit_memory
+        ```
+
+    5.  To view the runtime overcommit settings, execute the command:
+
+        ``` shell
+        $ cat /proc/meminfo | grep Commit
+        ```
+
+2.  Set the Linux swap space size and the `vm.overcommit_ratio` parameter according to the available memory on each host:
+    -   Hosts with 2GB-8GB of RAM: set swap space = physical RAM and `vm.overcommit_ratio=50`.
+    -   Hosts with more than 8GB and up to 64GB of RAM: set swap space = 0.5 \* physical RAM and `vm.overcommit_ratio=50`.
+    -   Hosts with more than 64GB of RAM: set swap space = 4GB and `vm.overcommit_ratio=100`.
+
+    To set the `vm.overcommit_ratio` parameter:
+
+    1.  Open the `/etc/sysctl.conf` file with a text editor.
+    2.  Add or change the parameter definition so that the file includes the 
line:
+
+        ```
+        vm.overcommit_ratio=50
+        ```
+
+        \(Use `vm.overcommit_ratio=100` for hosts with more than 64GB RAM.\)
+    3.  Save and close the file, then execute this command to apply your 
change:
+
+        ``` shell
+        $ sysctl -p
+        ```
+
+    4.  To view the current `vm.overcommit_ratio` setting, execute the command:
+
+        ``` shell
+        $ sysctl -a | grep overcommit_ratio
+        ```
+        You can choose to use a dedicated swap partition, a swap file, or a 
combination of both. View the current swap settings using the command:
+
+        ``` shell
+        $ cat /proc/meminfo | grep Swap
+        ```
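+
+        If you opt for a swap file, a minimal sketch using standard Linux commands follows \(the `/swapfile` path and 4GB size are examples only; size the file according to the guidelines above\):
+
+        ``` shell
+        $ dd if=/dev/zero of=/swapfile bs=1M count=4096   # create a 4GB file
+        $ chmod 600 /swapfile                             # restrict permissions
+        $ mkswap /swapfile                                # format the file as swap
+        $ swapon /swapfile                                # enable it immediately
+        ```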
+3.  Ensure that all Java services that run on the machine use the `-Xmx` 
switch to allocate only their required heap.
+4.  Ensure that no other services \(such as Puppet\) or automated processes 
attempt to reset the overcommit settings on cluster hosts.
+5.  During the installation process, configure HAWQ memory by setting YARN or 
HAWQ configuration parameters, as described in [HAWQ Memory 
Configuration](#topic_uzf_flz_g5).
+
+## <a id="topic_uzf_flz_g5"></a>HAWQ Memory Configuration
+
+You must configure the memory used by HAWQ according to whether you plan to 
use YARN or HAWQ to manage system resources.
+
+After you configure the `vm.overcommit_ratio` and swap space according to 
[Host Memory Configuration](#topic_d3f_vlz_g5), the total memory available to a 
Linux host machine can be represented by the equation:
+
+```
+TOTAL_MEMORY = RAM * overcommit_ratio_percentage + SWAP
+```
+
+`TOTAL_MEMORY` comprises both HAWQ memory and `NON_HAWQ_MEMORY`, which is the 
memory used by components such as:
+
+-   Operating system
+-   DataNode
+-   NodeManager
+-   PXF
+-   All other software you run on the host machine.
+
+To configure the HAWQ memory for a given host, first determine the amount of 
`NON_HAWQ_MEMORY` that is used on the machine. Then configure HAWQ memory by 
setting the correct parameter according to whether you use the HAWQ default 
resource manager or YARN to manage resources:
+
+-   If you are using YARN for resource management, set 
`yarn.nodemanager.resource.memory-mb` to the smaller of `TOTAL_MEMORY - 
NON_HAWQ_MEMORY` or `RAM`.
+-   If you are using the HAWQ default resource manager, set 
`hawq_rm_memory_limit_perseg = RAM - NON_HAWQ_MEMORY`.
+
+You can set either parameter using Ambari when configuring YARN or when 
installing HAWQ with Ambari.
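+
+If you are not using Ambari, a minimal sketch follows \(assuming the standard `hawq config` `-c`/`-v` flags; the 57GB value is illustrative, taken from Example 2 below\). The YARN property is set in `yarn-site.xml` instead, with the value expressed in MB:
+
+``` shell
+$ hawq config -c hawq_rm_memory_limit_perseg -v 57GB   # example value only
+$ hawq restart cluster                                 # restart to apply the change
+```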
+
+### Example 1 - Large Host Machine
+
+An example large host machine uses the memory configuration:
+
+>RAM: 256GB
+>
+>SWAP: 4GB
+
+>NON\_HAWQ\_MEMORY:
+
+>> 2GB for Operating System
+
+>> 2GB for DataNode
+
+>> 2GB for NodeManager
+
+>> 1GB for PXF
+
+>overcommit\_ratio\_percentage: 1 \(`vm.overcommit_ratio` = 100\)
+
+For this machine, `TOTAL_MEMORY = 256GB * 1 + 4GB = 260GB`.
+
+If this system uses YARN for resource management, you would set `yarn.nodemanager.resource.memory-mb` to `TOTAL_MEMORY - NON_HAWQ_MEMORY` = 260GB - 7GB = 253GB \(because 253GB is smaller than the available 256GB of RAM\).
+
+If this system uses the default HAWQ resource manager, you would set `hawq_rm_memory_limit_perseg` = `RAM - NON_HAWQ_MEMORY` = 256GB - 7GB = 249GB.
+
+### Example 2 - Medium Host Machine
+
+An example medium host machine uses the memory configuration:
+
+>RAM: 64GB
+
+>SWAP: 32GB
+
+>NON\_HAWQ\_MEMORY:
+
+>>2GB for Operating System
+
+>>2GB for DataNode
+
+>>2GB for NodeManager
+
+>>1GB for PXF
+
+>overcommit\_ratio\_percentage: .5 \(`vm.overcommit_ratio` = 50\)
+
+For this machine, `TOTAL_MEMORY = 64GB * .5 + 32GB = 64GB`.
+
+If this system uses YARN for resource management, you would set `yarn.nodemanager.resource.memory-mb` to `TOTAL_MEMORY - NON_HAWQ_MEMORY` = 64GB - 7GB = 57GB \(because 57GB is smaller than the available 64GB of RAM\).
+
+If this system uses the default HAWQ resource manager, you would set `hawq_rm_memory_limit_perseg` = `RAM - NON_HAWQ_MEMORY` = 64GB - 7GB = 57GB.
+
+### Example 3 - Small Host Machine \(Not recommended for production use\)
+
+An example small machine uses the memory configuration:
+
+>RAM: 8GB
+
+>SWAP: 8GB
+
+>NON\_HAWQ\_MEMORY:
+
+>>2GB for Operating System
+
+>>2GB for DataNode
+
+>>2GB for NodeManager
+
+>>1GB for PXF
+
+>overcommit\_ratio\_percentage: .5 \(`vm.overcommit_ratio` = 50\)
+
+For this machine, `TOTAL_MEMORY = 8GB * .5 + 8GB = 12GB`.
+
+If this system uses YARN for resource management, you would set `yarn.nodemanager.resource.memory-mb` to `TOTAL_MEMORY - NON_HAWQ_MEMORY` = 12GB - 7GB = 5GB \(because 5GB is smaller than the available 8GB of RAM\).
+
+If this system uses the default HAWQ resource manager, you would set `hawq_rm_memory_limit_perseg` = `RAM - NON_HAWQ_MEMORY` = 8GB - 7GB = 1GB.
+
+## <a id="topic_bsm_hhv_2v"></a>Disk Requirements
+
+-   2GB per host for HAWQ installation. 
+-   Approximately 300MB per segment instance for metadata.
+-   Multiple large (2TB or greater) disks are recommended for HAWQ master and 
segment temporary directories. For a given query, HAWQ will use a separate temp 
directory (if available) for each virtual segment to store spill files. 
Multiple HAWQ sessions will also use separate temp directories where available 
to avoid disk contention. If you configure too few temp directories, or you 
place multiple temp directories on the same disk, you increase the risk of disk 
contention or running out of disk space when multiple virtual segments target 
the same disk. Each HAWQ segment node can have 6 virtual segments.  
+-   Appropriate free space for data: disks should have at least 30% free space 
\(no more than 70% capacity\).
+-   High-speed, local storage
+
+## <a id="topic_rdb_jhv_2v"></a>Network Requirements
+
+-   Gigabit Ethernet within the array. 10 Gigabit Ethernet is recommended for a production cluster.
+-   Dedicated, non-blocking switch.
+-   Systems with multiple NICs require NIC bonding to utilize all available 
network bandwidth.
+
+
+## <a id="port-req"></a>Port Requirements
+Individual PXF plug-ins, which you install after adding the HAWQ and PXF services, require that you install Tomcat on the host machine. Tomcat reserves ports 8005, 8080, and 8009.
+
+If you have configured Oozie JMX reporting on a host that will run a PXF plug-in, make sure that the reporting service uses a port other than 8005. This helps to prevent port conflict errors from occurring when you start the PXF service.
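+
+As an illustrative check \(assuming the `netstat` utility is available on your hosts\), you can verify that the Tomcat ports are free before starting the PXF service:
+
+``` shell
+$ netstat -tln | grep -E ':(8005|8080|8009) '   # no output means the ports are free
+```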
+
+## <a id="umask"></a>Umask Requirement
+Set the OS file system umask to 022 on all cluster hosts. This ensures that users can read the HDFS block files.
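+
+A minimal sketch for checking and persisting this setting \(appending to `/etc/profile` is a common convention; your distribution may use a different file\):
+
+``` shell
+$ umask                                  # display the current setting; expect 0022
+$ echo "umask 022" >> /etc/profile       # persist for login shells (run as root)
+```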
