http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/query/query-performance.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/query/query-performance.html.md.erb 
b/markdown/query/query-performance.html.md.erb
new file mode 100644
index 0000000..981d77b
--- /dev/null
+++ b/markdown/query/query-performance.html.md.erb
@@ -0,0 +1,155 @@
+---
+title: Query Performance
+---
+
+<span class="shortdesc">HAWQ dynamically allocates resources to queries. Query 
performance depends on several factors such as data locality, number of virtual 
segments used for the query and general cluster health.</span>
+
+-   Dynamic Partition Elimination
+
+    In HAWQ, values available only when a query runs are used to dynamically prune partitions, which improves query processing speed. Enable or disable dynamic partition elimination by setting the server configuration parameter `gp_dynamic_partition_pruning` to `ON` or `OFF`; it is `ON` by default (see the example following this list).
+
+-   Memory Optimizations
+
+    HAWQ allocates memory optimally for different operators in a query and 
frees and re-allocates memory during the stages of processing a query.
+
+-   Runaway Query Termination
+
+    HAWQ can automatically terminate the most memory-intensive queries based 
on a memory usage threshold. The threshold is set as a configurable percentage 
([runaway\_detector\_activation\_percent](../reference/guc/parameter_definitions.html#runaway_detector_activation_percent))
 of the resource quota for the segment, which is calculated by HAWQ's resource 
manager.
+
+    If the amount of virtual memory utilized by a physical segment exceeds the 
calculated threshold, then HAWQ begins terminating queries based on memory 
usage, starting with the query that is consuming the largest amount of memory. 
Queries are terminated until the percentage of utilized virtual memory is below 
the specified percentage.
+
+    To calculate the memory usage threshold for runaway queries, HAWQ uses the 
following formula:
+
+    *vmem threshold* = (*virtual memory quota calculated by resource manager* 
+ 
[hawq\_re\_memory\_overcommit\_max](../reference/guc/parameter_definitions.html#hawq_re_memory_overcommit_max))
 \* 
[runaway\_detector\_activation\_percent](../reference/guc/parameter_definitions.html#runaway_detector_activation_percent).
+
+    For example, if the HAWQ resource manager calculates a virtual memory quota of 9 GB, `hawq_re_memory_overcommit_max` is set to 1 GB, and the value of `runaway_detector_activation_percent` is 95 (95%), then HAWQ starts terminating queries when the utilized virtual memory exceeds (9 + 1) \* 0.95 = 9.5 GB.
+
+    To disable automatic query detection and termination, set the value of 
`runaway_detector_activation_percent` to 100.
+
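The following is a minimal sketch of inspecting the parameters described in the list above from a `psql` session. It assumes session-level `SET` is permitted for `gp_dynamic_partition_pruning`; `runaway_detector_activation_percent` is only displayed here because it is normally configured at the server level.
+
+``` sql
+-- Check whether dynamic partition elimination is enabled (ON by default).
+SHOW gp_dynamic_partition_pruning;
+
+-- Enable it for the current session if it was turned off.
+SET gp_dynamic_partition_pruning = ON;
+
+-- Inspect the runaway-query threshold percentage (100 disables termination).
+SHOW runaway_detector_activation_percent;
+```
+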
+## <a id="id_xkg_znj_f5"></a>How to Investigate Query Performance Issues
+
+If a query is not executing as quickly as you would expect, here is how to investigate possible causes of the slowdown:
+
+1.  Check the health of the cluster.
+    1.  Are any DataNodes, segments or nodes down?
+    2.  Are there many failed disks?
+
+2.  Check table statistics. Have the tables involved in the query been analyzed? (See the example following this list.)
+3.  Check the plan of the query and run `EXPLAIN ANALYZE` to determine the bottleneck. 
+    Sometimes there is not enough memory for some operators, such as Hash Join, or spill files are used. If an operator cannot perform all of its work in the memory allocated to it, it caches data on disk in *spill files*. A query that uses spill files runs much slower than one that does not.
+
+4.  Check data locality statistics using `EXPLAIN ANALYZE`. Alternatively, the data locality results for every query can be found in the HAWQ log. See [Data Locality Statistics](query-performance.html#topic_amk_drc_d5) for information on the statistics.
+5.  Check resource queue status. You can query the `pg_resqueue_status` view to check whether the target queue has already dispatched some resources to the queries, or whether the target queue is lacking resources. See [Checking Existing Resource Queues](../resourcemgmt/ResourceQueues.html#topic_lqy_gls_zt).
+6.  Analyze a dump of the resource manager's status to see more resource queue 
status. See [Analyzing Resource Manager 
Status](../resourcemgmt/ResourceQueues.html#topic_zrh_pkc_f5).
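+
+As a starting point for steps 2 and 5 above, the following is a minimal sketch run from `psql`. The table name `sales` is hypothetical; `pg_class` and `pg_resqueue_status` are the standard catalog table and view referenced above.
+
+``` sql
+-- Refresh optimizer statistics for a table involved in the slow query.
+ANALYZE sales;
+
+-- Confirm that statistics exist; reltuples and relpages are typically zero
+-- until the table has been analyzed.
+SELECT relname, reltuples, relpages FROM pg_class WHERE relname = 'sales';
+
+-- Check whether the target resource queue is dispatching or starved of resources.
+SELECT * FROM pg_resqueue_status;
+```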
+
+## <a id="topic_amk_drc_d5"></a>Data Locality Statistics
+
+For visibility into query performance, use `EXPLAIN ANALYZE` to obtain data locality statistics. For example:
+
+``` sql
+postgres=# CREATE TABLE test (i int);
+postgres=# INSERT INTO test VALUES(2);
+postgres=# EXPLAIN ANALYZE SELECT * FROM test;
+```
+```
+QUERY PLAN
+.......
+Data locality statistics:
+data locality ratio: 1.000; virtual segment number: 1; different host number: 
1;
+virtual segment number per host(avg/min/max): (1/1/1);
+segment size(avg/min/max): (32.000 B/32 B/32 B);
+segment size with penalty(avg/min/max): (32.000 B/32 B/32 B);
+continuity(avg/min/max): (1.000/1.000/1.000); DFS metadatacache: 7.816 ms;
+resource allocation: 0.615 ms; datalocality calculation: 0.136 ms.
+```
+
+The following table describes the metrics related to data locality. Use these 
metrics to examine issues behind a query's performance.
+
+<a id="topic_amk_drc_d5__table_q4p_25c_d5"></a>
+
+<table>
+<caption><span class="tablecap">Table 1. Data Locality 
Statistics</span></caption>
+<colgroup>
+<col width="50%" />
+<col width="50%" />
+</colgroup>
+<thead>
+<tr class="header">
+<th>Statistic</th>
+<th>Description</th>
+</tr>
+</thead>
+<tbody>
+<tr class="odd">
+<td>data locality ratio</td>
+<td><p>Indicates the total local read ratio of a query. The lower the ratio, the more remote reads occur. Because remote reads on HDFS require network I/O, the execution time of a query may increase.</p>
+<p>For hash-distributed tables, all the blocks of a file are processed by one segment, so if data on HDFS is redistributed, for example by the HDFS Balancer, the data locality ratio decreases. In this case, you can redistribute the hash-distributed table manually by using CREATE TABLE AS SELECT.</p></td>
+</tr>
+<tr class="even">
+<td>number of virtual segments</td>
+<td>Typically, the more virtual segments are used, the faster the query will 
be executed. If the virtual segment number is too small, you can check whether 
<code class="ph codeph">default_hash_table_bucket_number</code>, <code 
class="ph codeph">hawq_rm_nvseg_perquery_limit</code>, or the bucket number of 
a hash distributed table is small. See <a href="#topic_wv3_gzc_d5">Number of 
Virtual Segments</a>.</td>
+</tr>
+<tr class="odd">
+<td>different host number</td>
+<td>Indicates how many hosts are used to run this query. According to the resource allocation strategy of HAWQ, all hosts should be used when the number of virtual segments is larger than the total number of hosts. Therefore, if this metric is smaller than the total number of hosts for a big query, it often indicates that some hosts are down. In this case, run <code class="ph codeph">SELECT * FROM gp_segment_configuration</code> to check the node states first.</td>
+</tr>
+<tr class="even">
+<td>segment size and segment size with penalty</td>
+<td>“segment size” indicates the (avg/min/max) data size processed by a virtual segment. “segment size with penalty” is the segment size when remote reads are counted as “net_disk_ratio” * block size. A virtual segment that performs remote reads should process less data than a virtual segment that performs only local reads. “net_disk_ratio” can be tuned to reflect how much slower remote reads are than local reads in different network environments, while considering the workload balance between the nodes. The default value of “net_disk_ratio” is 1.01.</td>
+</tr>
+<tr class="odd">
+<td>continuity</td>
+<td>Reading an HDFS file discontinuously introduces additional seeks, which slow the table scan of a query. A low value of continuity indicates that the blocks of a file are not continuously distributed on a DataNode.</td>
+</tr>
+<tr class="even">
+<td>DFS metadatacache</td>
+<td>Indicates the metadatacache time cost for a query. In HAWQ, HDFS block information is cached in a metadatacache process. If a cache miss occurs, the metadatacache time cost may increase.</td>
+</tr>
+<tr class="odd">
+<td>resource allocation</td>
+<td>Indicates the time cost of acquiring resources from the resource 
manager.</td>
+</tr>
+<tr class="even">
+<td>datalocality calculation</td>
+<td>Indicates the time to run the algorithm that assigns HDFS blocks to 
virtual segments and calculates the data locality ratio.</td>
+</tr>
+</tbody>
+</table>
+
+## <a id="topic_wv3_gzc_d5"></a>Number of Virtual Segments
+
+To obtain the best results when querying data in HAWQ, review the best 
practices described in this topic.
+
+### <a id="virtual_seg_performance"></a>Factors Impacting Query Performance
+
+The number of virtual segments used for a query directly impacts the query's 
performance. The following factors can impact the degree of parallelism of a 
query:
+
+-   **Cost of the query**. Small queries use fewer segments and larger queries 
use more segments. Some techniques used in defining resource queues can 
influence the number of both virtual segments and general resources allocated 
to queries.
+-   **Available resources at query time**. If more resources are available in 
the resource queue, those resources will be used.
+-   **Hash table and bucket number**. If the query involves only hash-distributed tables, the query's parallelism is fixed (equal to the hash table bucket number) under the following conditions (see the example following this list):
+
+   - The bucket number (bucketnum) configured for all the hash tables is the same.
+   - The table size for random tables is no more than 1.5 times the size 
allotted for the hash tables.
+
+  Otherwise, the number of virtual segments depends on the query's cost: 
hash-distributed table queries behave like queries on randomly-distributed 
tables.
+
+-   **Query Type**: It can be difficult to calculate resource costs for queries with some user-defined functions or for queries to external tables. With these queries, the number of virtual segments is controlled by the `hawq_rm_nvseg_perquery_limit` and `hawq_rm_nvseg_perquery_perseg_limit` parameters, as well as by the ON clause and the location list of external tables. If the query has a hash result table (for example, `INSERT INTO hash_table`), the number of virtual segments must be equal to the bucket number of the resulting hash table. If the query is performed in utility mode, such as for `COPY` and `ANALYZE` operations, the virtual segment number is calculated by different policies.
+
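The following sketch illustrates the hash table and bucket number factor described above. The table and column names are hypothetical; the sketch assumes the `bucketnum` storage option and `DISTRIBUTED BY`/`DISTRIBUTED RANDOMLY` clauses available for HAWQ table DDL.
+
+``` sql
+-- A hash-distributed table with an explicit bucket number. Queries touching only
+-- such tables run with parallelism fixed to the bucket number, provided all hash
+-- tables involved use the same bucketnum.
+CREATE TABLE orders_hash (order_id int, amount numeric)
+WITH (bucketnum=8) DISTRIBUTED BY (order_id);
+
+-- A randomly distributed table; queries on it choose the number of
+-- virtual segments based on query cost instead.
+CREATE TABLE orders_rand (order_id int, amount numeric)
+DISTRIBUTED RANDOMLY;
+```
+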
+### General Guidelines
+
+The following guidelines describe the number of virtual segments used in various scenarios, provided there are sufficient resources available.
+
+-   **Random tables exist in the select list:** \#vseg (number of virtual 
segments) depends on the size of the table.
+-   **Hash tables exist in the select list:** \#vseg depends on the bucket 
number of the table.
+-   **Random and hash tables both exist in the select list:** \#vseg depends 
on the bucket number of the table, if the table size of random tables is no 
more than 1.5 times the size of hash tables. Otherwise, \#vseg depends on the 
size of the random table.
+-   **User-defined functions exist:** \#vseg depends on the 
`hawq_rm_nvseg_perquery_limit` and `hawq_rm_nvseg_perquery_perseg_limit` 
parameters.
+-   **PXF external tables exist:** \#vseg depends on the 
`default_hash_table_bucket_number` parameter.
+-   **gpfdist external tables exist:** \#vseg is at least the number of locations in the location list (see the example following this list).
+-   **The CREATE EXTERNAL TABLE command is used:** \#vseg must reflect the value given in the command's `ON` clause.
+-   **Hash tables are copied to or from files:** \#vseg depends on the bucket 
number of the hash table.
+-   **Random tables are copied to files:** \#vseg depends on the size of the 
random table.
+-   **Random tables are copied from files:** \#vseg is a fixed value; it is 6 when there are sufficient resources.
+-   **ANALYZE table:** Analyzing a nonpartitioned table will use more virtual 
segments than a partitioned table.
+-   **Relationship between hash distribution results:** \#vseg must be the 
same as the bucket number for the hash table.
+
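As an illustration of the gpfdist location-list rule above, the following sketch defines a readable external table with two gpfdist locations, so at least two virtual segments are used to read it. The host names, port, and file names are hypothetical.
+
+``` sql
+CREATE EXTERNAL TABLE ext_orders (order_id int, amount numeric)
+LOCATION ('gpfdist://etlhost1:8081/orders1.csv',
+          'gpfdist://etlhost2:8081/orders2.csv')
+FORMAT 'CSV';
+```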
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/query/query-profiling.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/query/query-profiling.html.md.erb 
b/markdown/query/query-profiling.html.md.erb
new file mode 100644
index 0000000..ea20e0a
--- /dev/null
+++ b/markdown/query/query-profiling.html.md.erb
@@ -0,0 +1,240 @@
+---
+title: Query Profiling
+---
+
+<span class="shortdesc">Examine the query plans of poorly performing queries 
to identify possible performance tuning opportunities.</span>
+
+HAWQ devises a *query plan* for each query. Choosing the right query plan to 
match the query and data structure is necessary for good performance. A query 
plan defines how HAWQ will run the query in the parallel execution environment.
+
+The query optimizer uses data statistics maintained by the database to choose 
a query plan with the lowest possible cost. Cost is measured in disk I/O, shown 
as units of disk page fetches. The goal is to minimize the total execution cost 
for the plan.
+
+View the plan for a given query with the `EXPLAIN` command. `EXPLAIN` shows 
the query optimizer's estimated cost for the query plan. For example:
+
+``` sql
+EXPLAIN SELECT * FROM names WHERE id=22;
+```
+
+`EXPLAIN ANALYZE` runs the statement in addition to displaying its plan. This 
is useful for determining how close the optimizer's estimates are to reality. 
For example:
+
+``` sql
+EXPLAIN ANALYZE SELECT * FROM names WHERE id=22;
+```
+
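The examples in this topic query a small table named `names`. A minimal setup for trying the examples yourself might look like the following; the table definition and sample rows are illustrative assumptions, not part of any shipped schema.
+
+``` sql
+CREATE TABLE names (id int, name varchar(20)) DISTRIBUTED RANDOMLY;
+INSERT INTO names VALUES (22, 'Joelle');
+INSERT INTO names VALUES (33, 'Sam');
+```
+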
+**Note:** The legacy and GPORCA query optimizers coexist in HAWQ. GPORCA is 
the default HAWQ optimizer. HAWQ uses GPORCA to generate an execution plan for 
a query when possible. The `EXPLAIN` output generated by GPORCA is different 
than the output generated by the legacy query optimizer.
+
+When the `EXPLAIN ANALYZE` command uses GPORCA, the `EXPLAIN` plan shows only the number of partitions that are eliminated. The scanned partitions are not shown. To show the names of the scanned partitions in the segment logs, set the server configuration parameter `gp_log_dynamic_partition_pruning` to `on`. This example `SET` command enables the parameter.
+
+``` sql
+SET gp_log_dynamic_partition_pruning = on;
+```
+
+For information about GPORCA, see [Querying Data](query.html#topic1).
+
+## <a id="topic40"></a>Reading EXPLAIN Output
+
+A query plan is a tree of nodes. Each node in the plan represents a single 
operation, such as a table scan, join, aggregation, or sort.
+
+Read plans from the bottom to the top: each node feeds rows into the node 
directly above it. The bottom nodes of a plan are usually table scan 
operations. If the query requires joins, aggregations, sorts, or other 
operations on the rows, there are additional nodes above the scan nodes to 
perform these operations. The topmost plan nodes are usually HAWQ motion nodes: 
redistribute, broadcast, or gather motions. These operations move rows between 
segment instances during query processing.
+
+The output of `EXPLAIN` has one line for each node in the plan tree and shows 
the basic node type and the following execution cost estimates for that plan 
node:
+
+-   **cost** —Measured in units of disk page fetches. 1.0 equals one sequential disk page read. The first estimate is the start-up cost of getting the first row and the second is the total cost of getting all rows. The total cost assumes all rows will be retrieved, which is not always true; for example, if the query uses `LIMIT`, not all rows are retrieved.
+-   **rows** —The total number of rows output by this plan node. This number 
is usually less than the number of rows processed or scanned by the plan node, 
reflecting the estimated selectivity of any `WHERE` clause conditions. Ideally, 
the estimate for the topmost node approximates the number of rows that the 
query actually returns.
+-   **width** —The total bytes of all the rows that this plan node outputs.
+
+Note the following:
+
+-   The cost of a node includes the cost of its child nodes. The topmost plan 
node has the estimated total execution cost for the plan. This is the number 
the optimizer intends to minimize.
+-   The cost reflects only the aspects of plan execution that the query 
optimizer takes into consideration. For example, the cost does not reflect time 
spent transmitting result rows to the client.
+
+### <a id="topic41"></a>EXPLAIN Example
+
+The following example describes how to read an `EXPLAIN` query plan for a 
query:
+
+``` sql
+EXPLAIN SELECT * FROM names WHERE name = 'Joelle';
+```
+
+```
+                                 QUERY PLAN
+-----------------------------------------------------------------------------
+ Gather Motion 2:1  (slice1; segments: 2)  (cost=0.00..1.01 rows=1 width=11)
+   ->  Append-only Scan on names  (cost=0.00..1.01 rows=1 width=11)
+         Filter: name::text = 'Joelle'::text
+(3 rows)
+```
+
+Read the plan from the bottom to the top. To start, the query optimizer 
sequentially scans the *names* table. Notice the `WHERE` clause is applied as a 
*filter* condition. This means the scan operation checks the condition for each 
row it scans and outputs only the rows that satisfy the condition.
+
+The results of the scan operation are passed to a *gather motion* operation. 
In HAWQ, a gather motion is when segments send rows to the master. In this 
example, we have two segment instances that send to one master instance. This 
operation is working on `slice1` of the parallel query execution plan. A query 
plan is divided into *slices* so the segments can work on portions of the query 
plan in parallel.
+
+The estimated startup cost for this plan is `00.00` (no cost), and the total cost is `1.01` disk page fetches. The optimizer estimates that this query will return one row.
+
+## <a id="topic42"></a>Reading EXPLAIN ANALYZE Output
+
+`EXPLAIN ANALYZE` plans and runs the statement. The `EXPLAIN ANALYZE` plan shows the actual execution cost along with the optimizer's estimates. This allows you to see whether the optimizer's estimates are close to reality. `EXPLAIN ANALYZE` also shows the following:
+
+-   The total runtime (in milliseconds) in which the query executed.
+-   The memory used by each slice of the query plan, as well as the memory 
reserved for the whole query statement.
+-   Statistics for the query dispatcher, including the number of executors used for the current query (total number/number of executors cached by previous queries/number of executors newly connected); dispatcher time (total dispatch time/connection establishment time/dispatch data to executor time); and timing details (max/min/avg) for dispatching data, consuming executor data, and freeing executors.
+-   Statistics about data locality. See [Data Locality 
Statistics](query-performance.html#topic_amk_drc_d5) for details about these 
statistics.
+-   The number of *workers* (segments) involved in a plan node operation. Only 
segments that return rows are counted.
+-   The Max/Last statistics are for the segment that output the maximum number 
of rows and the segment with the longest *&lt;time&gt; to end*.
+-   The segment id of the segment that produced the most rows for an operation.
+-   For relevant operations, the amount of memory (`work_mem`) used by the 
operation. If the `work_mem` was insufficient to perform the operation in 
memory, the plan shows the amount of data spilled to disk for the 
lowest-performing segment. For example:
+
+    ``` pre
+    Work_mem used: 64K bytes avg, 64K bytes max (seg0).
+    Work_mem wanted: 90K bytes avg, 90K bytes max (seg0) to lessen
+    workfile I/O affecting 2 workers.
+    ```
+**Note**
+The *work\_mem* property is not configurable. Use resource queues to manage memory use (see the example following this list). For more information on resource queues, see [Configuring Resource Management](../resourcemgmt/ConfigureResourceManagement.html) and [Working with Hierarchical Resource Queues](../resourcemgmt/ResourceQueues.html).
+
+-   The time (in milliseconds) in which the segment that produced the most 
rows retrieved the first row, and the time taken for that segment to retrieve 
all rows. The result may omit *&lt;time&gt; to first row* if it is the same as 
the *&lt;time&gt; to end*.
+
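The following is a minimal, hedged sketch of using a resource queue to give memory-intensive queries more room, as suggested in the note above. The queue name, role name, and limit values are hypothetical; verify the exact resource queue attributes against the resource management documentation linked above for your HAWQ release.
+
+``` sql
+-- Create a child queue of pg_root with a larger share of cluster memory.
+-- Attribute names follow the HAWQ resource queue documentation; confirm for your release.
+CREATE RESOURCE QUEUE memheavy_queue WITH (
+    PARENT='pg_root',
+    MEMORY_LIMIT_CLUSTER=50%,
+    CORE_LIMIT_CLUSTER=50%,
+    ACTIVE_STATEMENTS=5
+);
+
+-- Route a reporting role's queries through the new queue.
+ALTER ROLE report_user RESOURCE QUEUE memheavy_queue;
+```
+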
+### <a id="topic43"></a>EXPLAIN ANALYZE Example
+
+This example describes how to read an `EXPLAIN ANALYZE` query plan using the same query. The plan shows the actual timing and rows returned for each plan node, as well as memory and time statistics for the whole query.
+
+``` sql
+EXPLAIN ANALYZE SELECT * FROM names WHERE name = 'Joelle';
+```
+
+```
+                                 QUERY PLAN
+------------------------------------------------------------------------
+ Gather Motion 1:1  (slice1; segments: 1)  (cost=0.00..1.01 rows=1 width=7)
+   Rows out:  Avg 1.0 rows x 1 workers at destination.  
Max/Last(seg0:ip-10-0-1-16/seg0:ip-10-0-1-16) 1/1 rows with 8.713/8.713 ms to 
first row, 8.714/8.714 ms to end, start offset by 0.708/0.708 ms.
+   ->  Append-only Scan on names  (cost=0.00..1.01 rows=1 width=7)
+         Filter: name = 'Joelle'::text
+         Rows out:  Avg 1.0 rows x 1 workers.  
Max/Last(seg0:ip-10-0-1-16/seg0:ip-10-0-1-16) 1/1 rows with 7.053/7.053 ms to 
first row, 7.089/7.089 ms to end, start offset by 2.162/2.162 ms.
+ Slice statistics:
+   (slice0)    Executor memory: 159K bytes.
+   (slice1)    Executor memory: 247K bytes (seg0:ip-10-0-1-16).
+ Statement statistics:
+   Memory used: 262144K bytes
+ Dispatcher statistics:
+   executors used(total/cached/new connection): (1/1/0); dispatcher 
time(total/connection/dispatch data): (0.217 ms/0.000 ms/0.037 ms).
+   dispatch data time(max/min/avg): (0.037 ms/0.037 ms/0.037 ms); consume 
executor data time(max/min/avg): (0.015 ms/0.015 ms/0.015 ms); free executor 
time(max/min/avg): (0.000 ms/0.000 ms/0.000 ms).
+ Data locality statistics:
+   data locality ratio: 1.000; virtual segment number: 1; different host 
number: 1; virtual segment number per host(avg/min/max): (1/1/1); segment 
size(avg/min/max): (48.000 B/48 B/48 B); segment size with 
penalty(avg/min/max): (48.000 B/48 B/48 B); continuity(avg/min/max): 
(1.000/1.000/1.000); DFS metadatacache: 9.343 ms; resource allocation: 0.638 
ms; datalocality calculation: 0.144 ms.
+ Total runtime: 19.690 ms
+(16 rows)
+```
+
+Read the plan from the bottom to the top. The total elapsed time to run this 
query was *19.690* milliseconds.
+
+The *Append-only scan* operation had only one segment (*seg0*) that returned 
rows, and it returned just *1 row*. The Max/Last statistics are identical in 
this example because only one segment returned rows. It took *7.053* 
milliseconds to find the first row and *7.089* milliseconds to scan all rows. 
This result is close to the optimizer's estimate: the query optimizer estimated 
it would return one row for this query. The *gather motion* (segments sending 
data to the master) received 1 row. The total elapsed time for this operation 
was *19.690* milliseconds.
+
+## <a id="topic44"></a>Examining Query Plans to Solve Problems
+
+If a query performs poorly, examine its query plan and ask the following 
questions:
+
+-   **Do operations in the plan take an exceptionally long time?** Look for an 
operation that consumes the majority of query processing time. For example, if 
a scan on a hash table takes longer than expected, the data locality may be 
low; reloading the data can increase the data locality and speed up the query. 
Or, adjust `enable_<operator>` parameters to see if you can force the legacy 
query optimizer (planner) to choose a different plan by disabling a particular 
query plan operator for that query.
+-   **Are the optimizer's estimates close to reality?** Run `EXPLAIN ANALYZE` and see whether the number of rows the optimizer estimates is close to the number of rows the query operation actually returns. If there is a large discrepancy, collect more statistics on the relevant columns (see the example following this list).
+-   **Are selective predicates applied early in the plan?** Apply the most 
selective filters early in the plan so fewer rows move up the plan tree. If the 
query plan does not correctly estimate query predicate selectivity, collect 
more statistics on the relevant columns. You can also try reordering the 
`WHERE` clause of your SQL statement.
+-   **Does the optimizer choose the best join order?** When you have a query 
that joins multiple tables, make sure that the optimizer chooses the most 
selective join order. Joins that eliminate the largest number of rows should be 
done earlier in the plan so fewer rows move up the plan tree.
+
+    If the plan is not choosing the optimal join order, set 
`join_collapse_limit=1` and use explicit `JOIN` syntax in your SQL statement to 
force the legacy query optimizer (planner) to the specified join order. You can 
also collect more statistics on the relevant join columns.
+
+-   **Does the optimizer selectively scan partitioned tables?** If you use 
table partitioning, is the optimizer selectively scanning only the child tables 
required to satisfy the query predicates? Scans of the parent tables should 
return 0 rows since the parent tables do not contain any data. See [Verifying 
Your Partition Strategy](../ddl/ddl-partition.html#topic74) for an example of a 
query plan that shows a selective partition scan.
+-   **Does the optimizer choose hash aggregate and hash join operations where applicable?** Hash operations are typically much faster than other types of joins or aggregations. Row comparison and sorting are done in memory rather than reading/writing from disk. To enable the query optimizer to choose hash operations, there must be sufficient memory available to hold the estimated number of rows. Try increasing work memory to improve performance for a query. If possible, run an `EXPLAIN ANALYZE` for the query to show which plan operations spilled to disk, how much work memory they used, and how much memory was required to avoid spilling to disk. For example:
+
+    `Work_mem used: 23430K bytes avg, 23430K bytes max (seg0). Work_mem wanted: 33649K bytes avg, 33649K bytes max (seg0) to lessen workfile I/O affecting 2 workers.`
+
+    The "bytes wanted" message from `EXPLAIN               ANALYZE` is based 
on the amount of data written to work files and is not exact. The minimum 
`work_mem` needed can differ from the suggested value.
+
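The following sketch gathers additional column statistics and forces an explicit join order, as suggested in the bullets above. The table and column names and the statistics target value are hypothetical, and the sketch assumes the standard PostgreSQL `ALTER TABLE ... SET STATISTICS` syntax is available.
+
+``` sql
+-- Collect more detailed statistics on a column with a large estimate discrepancy,
+-- then re-analyze the table.
+ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 200;
+ANALYZE orders;
+
+-- Force the legacy planner to honor the join order written in the query.
+SET join_collapse_limit = 1;
+SELECT *
+FROM orders o
+JOIN customers c ON c.customer_id = o.customer_id
+WHERE c.region = 'EMEA';
+```
+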
+## <a id="explainplan_plpgsql"></a>Generating EXPLAIN Plan from a PL/pgSQL 
Function
+
+User-defined PL/pgSQL functions often include dynamically created queries.  
You may find it useful to generate the `EXPLAIN` plan for such queries for 
query performance optimization and tuning. 
+
+Perform the following steps to create and run a user-defined PL/pgSQL 
function.  This function displays the `EXPLAIN` plan for a simple query on a 
test database.
+
+1. Log in to the HAWQ master node as user `gpadmin` and set up the HAWQ 
environment:
+
+    ``` shell
+    $ ssh gpadmin@hawq_master
+    $ . /usr/local/hawq/greenplum_path.sh
+    ```
+
+2. Create a test database named `testdb`:
+
+    ``` shell
+    $ createdb testdb
+    ```
+   
+3. Start the PostgreSQL interactive utility, connecting to `testdb`:
+
+    ``` shell
+    $ psql -d testdb
+    ```
+
+4. Create the table `test_tbl` with a single column named `id` of type 
`integer`:
+
+    ``` sql
+    testdb=# CREATE TABLE test_tbl (id int);
+    ```
+   
+5. Add some data to the `test_tbl` table:
+
+    ``` sql
+    testdb=# INSERT INTO test_tbl SELECT generate_series(1,100);
+    ```
+   
+    This `INSERT` command adds 100 rows to `test_tbl`, incrementing the `id` 
for each row.
+   
+6. Create a PL/pgSQL function named `explain_plan_func()` by copying and 
pasting the following text at the `psql` prompt:
+
+    ``` sql
+    CREATE OR REPLACE FUNCTION explain_plan_func() RETURNS varchar AS $$
+    declare
+      a varchar;
+      b varchar;
+    begin
+      a = '';
+      for b in execute 'explain select count(*) from test_tbl group by id' loop
+        a = a || E'\n' || b;
+      end loop;
+      return a;
+    end;
+    $$
+    LANGUAGE plpgsql
+    VOLATILE;
+    ```
+
+7. Verify the `explain_plan_func()` user-defined function was created 
successfully:
+
+    ``` shell
+    testdb=# \df+
+    ```
+
+    The `\df+` command lists all user-defined functions.
+   
+8. Perform a query using the user-defined function you just created:
+
+    ``` sql
+    testdb=# SELECT explain_plan_func();
+    ```
+
+    The `EXPLAIN` plan results for the query are displayed:
+    
+    ``` pre
+                                        explain_plan_func
+    ---------------------------------------------------------------------------------------------------------
+     Gather Motion 1:1  (slice2; segments: 1)  (cost=0.00..431.04 rows=100 width=8)
+       ->  Result  (cost=0.00..431.03 rows=100 width=8)
+             ->  HashAggregate  (cost=0.00..431.03 rows=100 width=8)
+                   Group By: id
+                   ->  Redistribute Motion 1:1  (slice1; segments: 1)  (cost=0.00..431.02 rows=100 width=12)
+                         Hash Key: id
+                         ->  Result  (cost=0.00..431.01 rows=100 width=12)
+                               ->  HashAggregate  (cost=0.00..431.01 rows=100 width=12)
+                                     Group By: id
+                                     ->  Table Scan on test_tbl  (cost=0.00..431.00 rows=100 width=4)
+     Settings:  default_hash_table_bucket_number=6
+     Optimizer status: PQO version 1.627
+    (1 row)
+    ```

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/query/query.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/query/query.html.md.erb b/markdown/query/query.html.md.erb
new file mode 100644
index 0000000..9c218c7
--- /dev/null
+++ b/markdown/query/query.html.md.erb
@@ -0,0 +1,37 @@
+---
+title: Querying Data
+---
+
+This topic provides information about using SQL in HAWQ databases.
+
+You enter SQL statements called queries to view and analyze data in a database 
using the `psql` interactive SQL client and other client tools.
+
+**Note:** HAWQ queries time out after a period of 600 seconds. For this reason, long-running queries may appear to hang until results are processed or until the timeout period expires.
+
+-   **[About HAWQ Query Processing](../query/HAWQQueryProcessing.html)**
+
+    This topic provides an overview of how HAWQ processes queries. 
Understanding this process can be useful when writing and tuning queries.
+
+-   **[About GPORCA](../query/gporca/query-gporca-optimizer.html)**
+
+    In HAWQ, you can use GPORCA or the legacy query optimizer.
+
+-   **[Defining Queries](../query/defining-queries.html)**
+
+    HAWQ is based on the PostgreSQL implementation of the SQL standard. SQL 
commands are typically entered using the standard PostgreSQL interactive 
terminal `psql`, but other programs that have similar functionality can be used 
as well.
+
+-   **[Using Functions and Operators](../query/functions-operators.html)**
+
+    HAWQ evaluates functions and operators used in SQL expressions.
+
+-   **[Query Performance](../query/query-performance.html)**
+
+    HAWQ dynamically allocates resources to queries. Query performance depends on several factors, such as data locality, the number of virtual segments used for the query, and general cluster health.
+
+-   **[Query Profiling](../query/query-profiling.html)**
+
+    Examine the query plans of poorly performing queries to identify possible 
performance tuning opportunities.
+
+
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/reference/CharacterSetSupportReference.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/reference/CharacterSetSupportReference.html.md.erb 
b/markdown/reference/CharacterSetSupportReference.html.md.erb
new file mode 100644
index 0000000..8a12471
--- /dev/null
+++ b/markdown/reference/CharacterSetSupportReference.html.md.erb
@@ -0,0 +1,439 @@
+---
+title: Character Set Support Reference
+---
+
+This topic provides a reference of the character sets supported in HAWQ.
+
+The character set support in HAWQ allows you to store text in a variety of character sets, including single-byte character sets such as the ISO 8859 series and multiple-byte character sets such as EUC (Extended Unix Code), UTF-8, and Mule internal code. All supported character sets can be used transparently by clients, but a few are not supported for use within the server (that is, as a server-side encoding). The default character set is selected while initializing your HAWQ cluster using `hawq init`. It can be overridden when you create a database, so you can have multiple databases, each with a different character set.
+
+<table style="width:100%;">
+<colgroup>
+<col width="16%" />
+<col width="16%" />
+<col width="16%" />
+<col width="16%" />
+<col width="16%" />
+<col width="16%" />
+</colgroup>
+<thead>
+<tr class="header">
+<th>Name</th>
+<th>Description</th>
+<th>Language</th>
+<th>Server</th>
+<th>Bytes/Char</th>
+<th>Aliases</th>
+</tr>
+</thead>
+<tbody>
+<tr class="odd">
+<td>BIG5</td>
+<td>Big Five</td>
+<td>Traditional Chinese</td>
+<td>No</td>
+<td>1-2</td>
+<td>WIN950, Windows950</td>
+</tr>
+<tr class="even">
+<td>EUC_CN</td>
+<td>Extended UNIX Code-CN</td>
+<td>Simplified Chinese</td>
+<td>Yes</td>
+<td>1-3</td>
+<td></td>
+</tr>
+<tr class="odd">
+<td>EUC_JP</td>
+<td>Extended UNIX Code-JP</td>
+<td>Japanese</td>
+<td>Yes</td>
+<td>1-3</td>
+<td> </td>
+</tr>
+<tr class="even">
+<td>EUC_KR</td>
+<td>Extended UNIX Code-KR</td>
+<td>Korean</td>
+<td>Yes</td>
+<td>1-3</td>
+<td> </td>
+</tr>
+<tr class="odd">
+<td>EUC_TW</td>
+<td>Extended UNIX Code-TW</td>
+<td>Traditional Chinese, Taiwanese</td>
+<td>Yes</td>
+<td>1-3</td>
+<td> </td>
+</tr>
+<tr class="even">
+<td>GB18030</td>
+<td>National Standard</td>
+<td>Chinese</td>
+<td>No</td>
+<td>1-2</td>
+<td> </td>
+</tr>
+<tr class="odd">
+<td>GBK</td>
+<td>Extended National Standard</td>
+<td>Simplified Chinese</td>
+<td>No</td>
+<td>1-2</td>
+<td>WIN936,Windows936</td>
+</tr>
+<tr class="even">
+<td>ISO_8859_5</td>
+<td>ISO 8859-5, ECMA 113</td>
+<td>Latin/Cyrillic</td>
+<td>Yes</td>
+<td>1</td>
+<td> </td>
+</tr>
+<tr class="odd">
+<td>ISO_8859_6</td>
+<td>ISO 8859-6, ECMA 114</td>
+<td>Latin/Arabic</td>
+<td>Yes</td>
+<td>1</td>
+<td> </td>
+</tr>
+<tr class="even">
+<td>ISO_8859_7</td>
+<td>ISO 8859-7, ECMA 118</td>
+<td>Latin/Greek</td>
+<td>Yes</td>
+<td>1</td>
+<td> </td>
+</tr>
+<tr class="odd">
+<td>ISO_8859_8</td>
+<td>ISO 8859-8, ECMA 121</td>
+<td>Latin/Hebrew</td>
+<td>Yes</td>
+<td>1</td>
+<td> </td>
+</tr>
+<tr class="even">
+<td>JOHAB</td>
+<td>JOHA</td>
+<td>Korean (Hangul)</td>
+<td>Yes</td>
+<td>1-3</td>
+<td> </td>
+</tr>
+<tr class="odd">
+<td>KOI8</td>
+<td>KOI8-R(U)</td>
+<td>Cyrillic</td>
+<td>Yes</td>
+<td>1</td>
+<td>KOI8R</td>
+</tr>
+<tr class="even">
+<td>LATIN1</td>
+<td>ISO 8859-1, ECMA 94</td>
+<td>Western European</td>
+<td>Yes</td>
+<td>1</td>
+<td>ISO88591</td>
+</tr>
+<tr class="odd">
+<td>LATIN2</td>
+<td>ISO 8859-2, ECMA 94</td>
+<td>Central European</td>
+<td>Yes</td>
+<td>1</td>
+<td>ISO88592</td>
+</tr>
+<tr class="even">
+<td>LATIN3</td>
+<td>ISO 8859-3, ECMA 94</td>
+<td>South European</td>
+<td>Yes</td>
+<td>1</td>
+<td>ISO88593</td>
+</tr>
+<tr class="odd">
+<td>LATIN4</td>
+<td>ISO 8859-4, ECMA 94</td>
+<td>North European</td>
+<td>Yes</td>
+<td>1</td>
+<td>ISO88594</td>
+</tr>
+<tr class="even">
+<td>LATIN5</td>
+<td>ISO 8859-9, ECMA 128</td>
+<td>Turkish</td>
+<td>Yes</td>
+<td>1</td>
+<td>ISO88599</td>
+</tr>
+<tr class="odd">
+<td>LATIN6</td>
+<td>ISO 8859-10, ECMA 144</td>
+<td>Nordic</td>
+<td>Yes</td>
+<td>1</td>
+<td>ISO885910</td>
+</tr>
+<tr class="even">
+<td>LATIN7</td>
+<td>ISO 8859-13</td>
+<td>Baltic</td>
+<td>Yes</td>
+<td>1</td>
+<td>ISO885913</td>
+</tr>
+<tr class="odd">
+<td>LATIN8</td>
+<td>ISO 8859-14</td>
+<td>Celtic</td>
+<td>Yes</td>
+<td>1</td>
+<td>ISO885914</td>
+</tr>
+<tr class="even">
+<td>LATIN9</td>
+<td>ISO 8859-15</td>
+<td>LATIN1 with Euro and accents</td>
+<td>Yes</td>
+<td>1</td>
+<td>ISO885915</td>
+</tr>
+<tr class="odd">
+<td>LATIN10</td>
+<td>ISO 8859-16, ASRO SR 14111</td>
+<td>Romanian</td>
+<td>Yes</td>
+<td>1</td>
+<td>ISO885916</td>
+</tr>
+<tr class="even">
+<td>MULE_INTERNAL</td>
+<td>Mule internal code</td>
+<td>Multilingual Emacs</td>
+<td>Yes</td>
+<td>1-4</td>
+<td> </td>
+</tr>
+<tr class="odd">
+<td>SJIS</td>
+<td>Shift JIS</td>
+<td>Japanese</td>
+<td>No</td>
+<td>1-2</td>
+<td>Mskanji, ShiftJIS, WIN932, Windows932</td>
+</tr>
+<tr class="even">
+<td>SQL_ASCII</td>
+<td>unspecified</td>
+<td>any</td>
+<td>No</td>
+<td>1</td>
+<td> </td>
+</tr>
+<tr class="odd">
+<td>UHC</td>
+<td>Unified Hangul Code</td>
+<td>Korean</td>
+<td>No</td>
+<td>1-2</td>
+<td>WIN949, Windows949</td>
+</tr>
+<tr class="even">
+<td>UTF8</td>
+<td>Unicode, 8-bit </td>
+<td>all</td>
+<td>Yes</td>
+<td>1-4</td>
+<td>Unicode</td>
+</tr>
+<tr class="odd">
+<td>WIN866</td>
+<td>Windows CP866</td>
+<td>Cyrillic</td>
+<td>Yes</td>
+<td>1</td>
+<td>ALT</td>
+</tr>
+<tr class="even">
+<td>WIN874</td>
+<td>Windows CP874</td>
+<td>Thai</td>
+<td>Yes</td>
+<td>1</td>
+<td> </td>
+</tr>
+<tr class="odd">
+<td>WIN1250</td>
+<td>Windows CP1250</td>
+<td>Central European</td>
+<td>Yes</td>
+<td>1</td>
+<td> </td>
+</tr>
+<tr class="even">
+<td>WIN1251</td>
+<td>Windows CP1251</td>
+<td>Cyrillic</td>
+<td>Yes</td>
+<td>1</td>
+<td>WIN</td>
+</tr>
+<tr class="odd">
+<td>WIN1252</td>
+<td>Windows CP1252</td>
+<td>Western European</td>
+<td>Yes</td>
+<td><p>1</p></td>
+<td> </td>
+</tr>
+<tr class="even">
+<td>WIN1253</td>
+<td>Windows CP1253</td>
+<td>Greek</td>
+<td>Yes</td>
+<td>1</td>
+<td> </td>
+</tr>
+<tr class="odd">
+<td>WIN1254</td>
+<td>Windows CP1254</td>
+<td>Turkish</td>
+<td>Yes</td>
+<td>1</td>
+<td> </td>
+</tr>
+<tr class="even">
+<td>WIN1255</td>
+<td>Windows CP1255</td>
+<td>Hebrew</td>
+<td>Yes</td>
+<td>1</td>
+<td> </td>
+</tr>
+<tr class="odd">
+<td>WIN1256</td>
+<td>Windows CP1256</td>
+<td>Arabic</td>
+<td>Yes</td>
+<td>1</td>
+<td> </td>
+</tr>
+<tr class="even">
+<td>WIN1257</td>
+<td>Windows CP1257</td>
+<td>Baltic</td>
+<td>Yes</td>
+<td>1</td>
+<td> </td>
+</tr>
+<tr class="odd">
+<td>WIN1258</td>
+<td>Windows CP1258</td>
+<td>Vietnamese</td>
+<td>Yes</td>
+<td>1</td>
+<td>ABC, TCVN, TCVN5712, VSCII </td>
+</tr>
+</tbody>
+</table>
+
+**Note:**
+
+-   Not all the APIs support all the listed character sets. For example, the 
JDBC driver does not support MULE\_INTERNAL, LATIN6, LATIN8, and LATIN10.
+-   The SQL\_ASCII setting behaves considerably differently from the other settings. Byte values 0-127 are interpreted according to the ASCII standard, while byte values 128-255 are taken as uninterpreted characters. If you are working with any non-ASCII data, it is unwise to use the SQL\_ASCII setting as a client encoding. SQL\_ASCII is not supported as a server encoding.
+
+## <a id="settingthecharacterset"></a>Setting the Character Set
+
+`hawq init` defines the default character set for a HAWQ system by reading the 
setting of the ENCODING parameter in the gp\_init\_config file at 
initialization time. The default character set is UNICODE or UTF8.
+
+You can create a database with a character set other than the system-wide default. For example:
+
+``` sql
+CREATE DATABASE korean WITH ENCODING 'EUC_KR';
+```
+
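To verify which encoding each database uses, you can query the `pg_database` catalog; `pg_encoding_to_char()` is the standard PostgreSQL function for converting the stored encoding number to its name.
+
+``` sql
+SELECT datname, pg_encoding_to_char(encoding) AS encoding FROM pg_database;
+```
+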
+**Note:** Although you can specify any encoding you want for a database, it is 
unwise to choose an encoding that is not what is expected by the locale you 
have selected. The LC\_COLLATE and LC\_CTYPE settings imply a particular 
encoding, and locale-dependent operations (such as sorting) are likely to 
misinterpret data that is in an incompatible encoding.
+
+Since these locale settings are frozen by `hawq init`, the apparent flexibility to use different encodings in different databases is more theoretical than real.
+
+One way to use multiple encodings safely is to set the locale to C or POSIX 
during initialization time, thus disabling any real locale awareness.
+
+## <a id="charactersetconversionbetweenserverandclient"></a>Character Set 
Conversion Between Server and Client
+
+HAWQ supports automatic character set conversion between server and client for 
certain character set combinations. The conversion information is stored in the 
master pg\_conversion system catalog table. HAWQ comes with some predefined 
conversions or you can create a new conversion using the SQL command CREATE 
CONVERSION.
+
+| Server Character Set | Available Client Sets |
+|----------------------|-----------------------|
+| BIG5                 | not supported as a server encoding |
+| EUC\_CN              | EUC\_CN, MULE\_INTERNAL, UTF8 |
+| EUC\_JP              | EUC\_JP, MULE\_INTERNAL, SJIS, UTF8 |
+| EUC\_KR              | EUC\_KR, MULE\_INTERNAL, UTF8 |
+| EUC\_TW              | EUC\_TW, BIG5, MULE\_INTERNAL, UTF8 |
+| GB18030              | not supported as a server encoding |
+| GBK                  | not supported as a server encoding |
+| ISO\_8859\_5         | ISO\_8859\_5, KOI8, MULE\_INTERNAL, UTF8, WIN866, WIN1251 |
+| ISO\_8859\_6         | ISO\_8859\_6, UTF8 |
+| ISO\_8859\_7         | ISO\_8859\_7, UTF8 |
+| ISO\_8859\_8         | ISO\_8859\_8, UTF8 |
+| JOHAB                | JOHAB, UTF8 |
+| KOI8                 | KOI8, ISO\_8859\_5, MULE\_INTERNAL, UTF8, WIN866, WIN1251 |
+| LATIN1               | LATIN1, MULE\_INTERNAL, UTF8 |
+| LATIN2               | LATIN2, MULE\_INTERNAL, UTF8, WIN1250 |
+| LATIN3               | LATIN3, MULE\_INTERNAL, UTF8 |
+| LATIN4               | LATIN4, MULE\_INTERNAL, UTF8 |
+| LATIN5               | LATIN5, UTF8 |
+| LATIN6               | LATIN6, UTF8 |
+| LATIN7               | LATIN7, UTF8 |
+| LATIN8               | LATIN8, UTF8 |
+| LATIN9               | LATIN9, UTF8 |
+| LATIN10              | LATIN10, UTF8 |
+| MULE\_INTERNAL       | MULE\_INTERNAL, BIG5, EUC\_CN, EUC\_JP, EUC\_KR, EUC\_TW, ISO\_8859\_5, KOI8, LATIN1 to LATIN4, SJIS, WIN866, WIN1250, WIN1251 |
+| SJIS                 | not supported as a server encoding |
+| SQL\_ASCII           | not supported as a server encoding |
+| UHC                  | not supported as a server encoding |
+| UTF8                 | all supported encodings |
+| WIN866               | WIN866 |
+| WIN874               | WIN874, UTF8 |
+| WIN1250              | WIN1250, LATIN2, MULE\_INTERNAL, UTF8 |
+| WIN1251              | WIN1251, ISO\_8859\_5, KOI8, MULE\_INTERNAL, UTF8, WIN866 |
+| WIN1252              | WIN1252, UTF8 |
+| WIN1253              | WIN1253, UTF8 |
+| WIN1254              | WIN1254, UTF8 |
+| WIN1255              | WIN1255, UTF8 |
+| WIN1256              | WIN1256, UTF8 |
+| WIN1257              | WIN1257, UTF8 |
+| WIN1258              | WIN1258, UTF8 |
+
+To enable automatic character set conversion, you have to tell HAWQ the 
character set (encoding) you would like to use in the client. There are several 
ways to accomplish this: 
+
+-   Using the \\encoding command in psql, which allows you to change client 
encoding on the fly.
+-   Using SET client\_encoding TO. Setting the client encoding can be done 
with this SQL command:
+
+    ``` sql
+    SET CLIENT_ENCODING TO 'value';
+    ```
+
+    To query the current client encoding:
+
+    ``` sql
+    SHOW client_encoding;
+    ```
+
+    To return the default encoding:
+
+    ``` sql
+    RESET client_encoding;
+    ```
+
+-   Using the PGCLIENTENCODING environment variable. When PGCLIENTENCODING is 
defined in the client's environment, that client encoding is automatically 
selected when a connection to the server is made. (This can subsequently be 
overridden using any of the other methods mentioned above.)
+-   Setting the configuration parameter client\_encoding. If client\_encoding 
is set in the master `hawq-site.xml` file, that client encoding is 
automatically selected when a connection to HAWQ is made. (This can 
subsequently be overridden using any of the other methods mentioned above.)
+
+If the conversion of a particular character is not possible (suppose you chose EUC\_JP for the server and LATIN1 for the client; some Japanese characters do not have a representation in LATIN1), then an error is reported.
+
+If the client character set is defined as SQL\_ASCII, encoding conversion is 
disabled, regardless of the server’s character set. The use of SQL\_ASCII is 
unwise unless you are working with all-ASCII data. SQL\_ASCII is not supported 
as a server encoding.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/reference/HAWQDataTypes.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/reference/HAWQDataTypes.html.md.erb 
b/markdown/reference/HAWQDataTypes.html.md.erb
new file mode 100644
index 0000000..fe5cff7
--- /dev/null
+++ b/markdown/reference/HAWQDataTypes.html.md.erb
@@ -0,0 +1,139 @@
+---
+title: Data Types
+---
+
+This topic provides a reference of the data types supported in HAWQ.
+
+HAWQ has a rich set of native data types available to users. Users may also 
define new data types using the `CREATE TYPE` command. This reference shows all 
of the built-in data types. In addition to the types listed here, there are 
also some internally used data types, such as **oid** (object identifier), 
but those are not documented in this guide.
+
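As a quick illustration of `CREATE TYPE`, the following sketch defines a simple composite type; the type and field names are hypothetical.
+
+``` sql
+CREATE TYPE inventory_item AS (
+    name        text,
+    supplier_id integer,
+    price       numeric
+);
+```
+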
+The following data types are specified by SQL:
+
+-   array (*)
+-   bit
+-   bit varying
+-   boolean
+-   character varying
+-   char
+-   character
+-   date
+-   decimal
+-   double precision
+-   integer
+-   interval
+-   numeric
+-   real
+-   smallint
+-   time (with or without time zone)
+-   timestamp (with or without time zone)
+-   varchar
+
+**Note**(\*): HAWQ supports the array data type for append-only tables; 
parquet table storage does *not* support the array type. 
+
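The following sketch shows an array column on an append-only table, per the note above. The table name is hypothetical, and the `WITH (appendonly=true)` clause is included only for clarity; consult the table storage documentation for the storage options available in your release.
+
+``` sql
+CREATE TABLE exam_scores (student_id int, scores int[]) WITH (appendonly=true);
+INSERT INTO exam_scores VALUES (1, '{90, 85, 77}');
+SELECT student_id, scores[1] FROM exam_scores;
+```
+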
+Each data type has an external representation determined by its input and 
output functions. Many of the built-in types have obvious external formats. 
However, several types are unique to HAWQ, such as geometric paths, or have 
several possibilities for formats, such as the date and time types. Some of the 
input and output functions are not invertible. That is, the result of an output 
function may lose accuracy when compared to the original input.
+
+ 
+ <span class="tablecap">Table 1. HAWQ Built-in Data Types</span>
+ 
+
+| Name | Alias | Size | Range | Description |
+|------|-------|------|-------|-------------|
+| array | [ ] | variable (ignored) | multi-dimensional | any built-in or user-defined base type, enum type, or composite type |
+| bigint | int8 | 8 bytes | -9223372036854775808 to 9223372036854775807 | large range integer |
+| bigserial | serial8 | 8 bytes | 1 to 9223372036854775807 | large autoincrementing integer |
+| bit \[ (n) \] |  | n bits | bit string constant | fixed-length bit string |
+| bit varying \[ (n) \] | varbit | actual number of bits | bit string constant | variable-length bit string |
+| boolean | bool | 1 byte | true/false, t/f, yes/no, y/n, 1/0 | logical Boolean (true/false) |
+| box |  | 32 bytes | ((x1,y1),(x2,y2)) | rectangular box in the plane - not allowed in distribution key columns. |
+| bytea |  | 1 byte + binarystring | sequence of octets | variable-length binary string |
+| character \[ (n) \] | char \[ (n) \] | 1 byte + n | strings up to n characters in length | fixed-length, blank padded |
+| character varying \[ (n) \] | varchar \[ (n) \] | 1 byte + binarystring | strings up to n characters in length | variable-length with limit |
+| cidr |  | 12 or 24 bytes |  | IPv4 networks |
+| circle |  | 24 bytes | &lt;(x,y),r&gt; (center and radius) | circle in the plane - not allowed in distribution key columns. |
+| date |  | 4 bytes | 4713 BC - 294,277 AD | calendar date (year, month, day) |
+| decimal \[ (p, s) \] | numeric \[ (p,s) \] | variable | no limit | user-specified, inexact |
+| double precision | float8 float | 8 bytes | 15 decimal digits precision | variable-precision, inexact |
+| inet |  | 12 or 24 bytes |  | IPv4 hosts and networks |
+| integer | int, int4 | 4 bytes | -2147483648 to +2147483647 | usual choice for integer |
+| interval \[ (p) \] |  | 12 bytes | -178000000 years - 178000000 years | time span |
+| lseg |  | 32 bytes | ((x1,y1),(x2,y2)) | line segment in the plane - not allowed in distribution key columns. |
+| macaddr |  | 6 bytes |  | MAC addresses |
+| money |  | 4 bytes | -21474836.48 to +21474836.47 | currency amount |
+| path |  | 16+16n bytes | \[(x1,y1),...\] | geometric path in the plane - not allowed in distribution key columns. |
+| point |  | 16 bytes | (x, y) | geometric point in the plane - not allowed in distribution key columns. |
+| polygon |  | 40+16n bytes | \[(x1,y1),...\] | closed geometric path in the plane - not allowed in the distribution key columns. |
+| real | float4 | 4 bytes | 6 decimal digits precision | variable-precision, inexact |
+| serial | serial4 | 4 bytes | 1 to 2147483647 | autoincrementing integer |
+| smallint | int2 | 2 bytes | -32768 to +32767 | small range integer |
+| text |  | 1 byte + string size | strings of any length | variable unlimited length |
+| time \[ (p) \] \[ without time zone \] |  | 8 bytes | 00:00:00\[.000000\] - 24:00:00\[.000000\] | time of day only |
+| time \[ (p) \] with time zone | timetz | 12 bytes | 00:00:00+1359 - 24:00:00-1359 | time of day only, with time zone |
+| timestamp \[ (p) \] \[without time zone \] |  | 8 bytes | 4713 BC - 294,277 AD | both date and time |
+| timestamp \[ (p) \] with time zone | timestamptz | 8 bytes | 4713 BC - 294,277 AD | both date and time, with time zone |
+| xml |  | 1 byte + xml size | xml of any length | variable unlimited length |
+
+ 
+For variable-length data types (such as char, varchar, text, and xml), if the 
data is greater than or equal to 127 bytes, the storage overhead is 4 bytes 
instead of 1.
+
+**Note**: Use these documented built-in types when creating user tables.  Any 
other data types that might be visible in the source code are for internal use 
only.
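+
+A minimal sketch of a table definition that uses only documented built-in types (the table and column names are hypothetical); note that geometric types such as `point` cannot appear in the `DISTRIBUTED BY` clause:
+
+``` sql
+CREATE TABLE store_locations (
+    store_id   integer,
+    store_name text,
+    opened     date,
+    location   point
+) DISTRIBUTED BY (store_id);
+```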
+
+## <a id="timezones"></a>Time Zones
+
+Time zones, and time-zone conventions, are influenced by political decisions, 
not just earth geometry. Time zones around the world became somewhat 
standardized during the 1900s, but continue to be prone to arbitrary changes, 
particularly with respect to daylight-savings rules. HAWQ uses the 
widely-used zoneinfo time zone database for information about historical time 
zone rules. For times in the future, the assumption is that the latest known 
rules for a given time zone will continue to be observed indefinitely far into 
the future.
+
+HAWQ is compatible with the SQL standard definitions for typical usage. 
However, the SQL standard has an odd mix of date and time types and 
capabilities. Two obvious problems are:
+
+-   Although the date type cannot have an associated time zone, the time 
type can. Time zones in the real world have little meaning unless associated 
with a date as well as a time, since the offset can vary through the year with 
daylight-saving time boundaries.
+-   The default time zone is specified as a constant numeric offset from UTC. 
It is therefore impossible to adapt to daylight-saving time when doing 
date/time arithmetic across DST boundaries.
+
+To address these difficulties, use date/time types that contain both date and 
time when using time zones. Do not use the type time with time zone (although 
HAWQ supports this for legacy applications and for compliance with the SQL 
standard). HAWQ assumes your local time zone for any type containing only 
date or time.
+
+All timezone-aware dates and times are stored internally in UTC. They are 
converted to local time in the zone specified by the timezone configuration 
parameter before being displayed to the client.
+
+HAWQ allows you to specify time zones in three different forms, illustrated in the example after this list:
+
+-   A full time zone name, for example America/New\_York. HAWQ uses the 
widely-used zoneinfo time zone data for this purpose, so the same names are 
also recognized by much other software.
+-   A time zone abbreviation, for example PST. Such a specification merely 
defines a particular offset from UTC, in contrast to full time zone names which 
can imply a set of daylight savings transition-date rules as well. You cannot 
set the configuration parameters timezone or log\_timezone to a time zone 
abbreviation, but you can use abbreviations in date/time input values and with 
the AT TIME ZONE operator.
+-   In addition to the timezone names and abbreviations, HAWQ accepts 
POSIX-style time zone specifications of the form STDoffset or STDoffsetDST, 
where STD is a zone abbreviation, offset is a numeric offset in hours west 
from UTC, and DST is an optional daylight-savings zone abbreviation, assumed 
to stand for one hour ahead of the given offset. For example, if EST5EDT were 
not already a recognized zone name, it would be accepted and would be 
functionally equivalent to United States East Coast time. When a 
daylight-savings zone name is present, it is assumed to be used according to 
the same daylight-savings transition rules used in the zoneinfo time zone 
database's posixrules entry. In a standard HAWQ installation, posixrules 
is the same as US/Eastern, so that POSIX-style time zone specifications follow 
USA daylight-savings rules. If needed, you can adjust this behavior by 
replacing the posixrules file.
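+
+A minimal sketch of the three forms (the zone values shown are only examples):
+
+``` sql
+SET TIMEZONE TO 'America/New_York';    -- full zoneinfo name
+SELECT TIMESTAMP WITH TIME ZONE '2017-07-04 12:00:00-04' AT TIME ZONE 'PST';   -- abbreviation, fixed offset
+SET TIMEZONE TO 'CST6CDT';             -- POSIX-style specification
+```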
+
+In short, this is the difference between abbreviations and full names: 
abbreviations always represent a fixed offset from UTC, whereas most of the 
full names imply a local daylight-savings time rule, and so have two possible 
UTC offsets.
+
+One should be wary that the POSIX-style time zone feature can lead to silently 
accepting bogus input, since there is no check on the reasonableness of the 
zone abbreviations. For example, SET TIMEZONE TO FOOBAR0 will work, leaving 
the system effectively using a rather peculiar abbreviation for UTC. Another 
issue to keep in mind is that in POSIX time zone names, positive offsets are 
used for locations west of Greenwich. Everywhere else, PostgreSQL follows 
the ISO-8601 convention that positive timezone offsets are east of Greenwich.
+
+In all cases, timezone names are recognized case-insensitively. 
+
+Neither full names nor abbreviations are hard-wired into the server; see 
[Date and Time Configuration Files](#dateandtimeconfigurationfiles).
+
+The timezone configuration parameter can be set in the file 
`hawq-site.xml` (see the sketch after this list). There are also several special ways to set it:
+
+-   If timezone is not specified in `hawq-site.xml` or as a server 
command-line option, the server attempts to use the value of the TZ 
environment variable as the default time zone. If TZ is not defined or is not 
any of the time zone names known to PostgreSQL, the server attempts to 
determine the operating system's default time zone by checking the behavior of 
the C library function localtime(). The default time zone is selected as the 
closest match from the known time zones. 
+-   The SQL command SET TIME ZONE sets the time zone for the session. This 
is an alternative spelling of SET TIMEZONE TO with a more SQL-spec-compatible 
syntax.
+-   The PGTZ environment variable is used by libpq clients to send a SET 
TIME ZONE command to the server upon connection.
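+
+A minimal sketch of setting it in `hawq-site.xml` (assuming the property name matches the `timezone` configuration parameter):
+
+```xml
+        <property>
+                <name>timezone</name>
+                <value>America/New_York</value>
+        </property>
+```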
+
+## <a id="dateandtimeconfigurationfiles"></a>Date and Time Configuration Files
+
+Since timezone abbreviations are not well standardized, HAWQ provides a 
means to customize the set of abbreviations accepted by the server. The 
timezone\_abbreviations run-time parameter determines the active set of 
abbreviations. While this parameter can be altered by any database user, the 
possible values for it are under the control of the database administrator — 
they are in fact names of configuration files stored in 
.../share/timezonesets/ of the installation directory. By adding or altering 
files in that directory, the administrator can set local policy for timezone 
abbreviations.
+
+timezone\_abbreviations can be set to any file name found in 
.../share/timezonesets/, if the file's name is entirely alphabetic. (The 
prohibition against non-alphabetic characters in timezone\_abbreviations 
prevents reading files outside the intended directory, as well as reading 
editor backup files and other extraneous files.)
+
+A timezone abbreviation file can contain blank lines and comments beginning 
with \#. Non-comment lines must have one of these formats:
+
+``` pre
+time_zone_name offset
+time_zone_name offset D
+@INCLUDE file_name
+@OVERRIDE
+```
+
+A time\_zone\_name is just the abbreviation being defined. The offset is 
the zone's offset in seconds from UTC, positive being east from Greenwich and 
negative being west. For example, -18000 would be five hours west of Greenwich, 
or North American east coast standard time. D indicates that the zone name 
represents local daylight-savings time rather than standard time. Since all 
known time zone offsets are on 15 minute boundaries, the number of seconds has 
to be a multiple of 900.
+
+The @INCLUDE syntax allows inclusion of another file in the 
.../share/timezonesets/ directory. Inclusion can be nested, to a limited depth.
+
+The @OVERRIDE syntax indicates that subsequent entries in the file can 
override previous entries (i.e., entries obtained from included files). Without 
this, conflicting definitions of the same timezone abbreviation are considered 
an error.
+
+In an unmodified installation, the file Default contains all the 
non-conflicting time zone abbreviations for most of the world. Additional 
files Australia and India are provided for those regions: these files first 
include the Default file and then add or modify timezones as needed.
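+
+A minimal sketch of a custom abbreviation file (the file name and entries are hypothetical; offsets are in seconds and must be multiples of 900):
+
+``` pre
+# .../share/timezonesets/Custom
+@INCLUDE Default
+@OVERRIDE
+NZST  43200     # New Zealand Standard Time (UTC+12)
+NZDT  46800 D   # New Zealand Daylight Time (UTC+13)
+```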
+
+For reference purposes, a standard installation also contains files 
Africa.txt, America.txt, etc., containing information about every time zone 
abbreviation known to be in use according to the zoneinfo timezone database. 
The zone name definitions found in these files can be copied and pasted into a 
custom configuration file as needed.
+
+**Note:** These files cannot be directly referenced as 
timezone\_abbreviations settings, because of the dot embedded in their names.
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/reference/HAWQEnvironmentVariables.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/reference/HAWQEnvironmentVariables.html.md.erb 
b/markdown/reference/HAWQEnvironmentVariables.html.md.erb
new file mode 100644
index 0000000..ce21798
--- /dev/null
+++ b/markdown/reference/HAWQEnvironmentVariables.html.md.erb
@@ -0,0 +1,97 @@
+---
+title: Environment Variables
+---
+
+This topic contains a reference of the environment variables that you set for 
HAWQ.
+
+Set these in your user’s startup shell profile (such as `~/.bashrc` or 
`~/.bash_profile`), or in `/etc/profile` if you want to set them for all users.
+
+## <a id="requiredenvironmentvariables"></a>Required Environment Variables
+
+**Note:** `GPHOME`, `PATH` and `LD_LIBRARY_PATH` can be set by sourcing the 
`greenplum_path.sh` file from your HAWQ installation directory.
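+
+For example, assuming HAWQ is installed in `/usr/local/hawq`:
+
+``` pre
+source /usr/local/hawq/greenplum_path.sh
+```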
+
+### <a id="gphome"></a>GPHOME
+
+This is the installed location of your HAWQ software. For example:
+
+``` pre
+GPHOME=/usr/local/hawq  
+export GPHOME
+```
+
+### <a id="path"></a>PATH
+
+Your `PATH` environment variable should point to the location of the HAWQ bin 
directory. For example:
+
+``` pre
+PATH=$GPHOME/bin:$PATH
+export PATH 
+```
+
+### <a id="ld_library_path"></a>LD\_LIBRARY\_PATH
+
+The `LD_LIBRARY_PATH` environment variable should point to the location of the 
`HAWQ/PostgreSQL` library files. For example:
+
+``` pre
+LD_LIBRARY_PATH=$GPHOME/lib
+export LD_LIBRARY_PATH
+```
+
+## <a id="optionalenvironmentvariables"></a>Optional Environment Variables
+
+The following are optional HAWQ environment variables. You may want to add the 
connection-related environment variables to your profile for convenience, so 
that you do not have to supply as many options on the command line for client 
connections. Note that these environment variables should be set on the HAWQ 
master host only.
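+
+For example, a minimal sketch of connection defaults added to a profile on the master host (the host, database, and role names are hypothetical):
+
+``` pre
+export PGHOST=hawq.example.com
+export PGPORT=5432
+export PGDATABASE=mydb
+export PGUSER=gpadmin
+```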
+
+
+### <a id="pgappname"></a>PGAPPNAME
+
+The name of the application, typically set by an application when it connects 
to the server. This name is displayed in the activity view and in log entries. 
The `PGAPPNAME` environment variable behaves the same as the `application_name` 
connection parameter. The default value for `application_name` is `psql`. The 
name cannot be longer than 63 characters.
+
+### <a id="pgdatabase"></a>PGDATABASE
+
+The name of the default database to use when connecting.
+
+### <a id="pghost"></a>PGHOST
+
+The HAWQ master host name.
+
+### <a id="pghostaddr"></a>PGHOSTADDR
+
+The numeric IP address of the master host. This can be set instead of, or in 
addition to, `PGHOST`, to avoid DNS lookup overhead.
+
+### <a id="pgpassword"></a>PGPASSWORD
+
+The password used if the server demands password authentication. Use of this 
environment variable is not recommended for security reasons, because some 
operating systems allow non-root users to see process environment variables via 
`ps`. Instead, consider using the `~/.pgpass` file.
+
+### <a id="pgpassfile"></a>PGPASSFILE
+
+The name of the password file to use for lookups. If not set, it defaults to 
`~/.pgpass`.
+
+See The Password File under [Configuring Client 
Authentication](../clientaccess/client_auth.html).
+
+### <a id="pgoptions"></a>PGOPTIONS
+
+Sets additional configuration parameters for the HAWQ master server.
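+
+For example, a sketch that passes a session setting at connection time (the parameter shown is only an illustration):
+
+``` pre
+export PGOPTIONS='-c client_min_messages=warning'
+```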
+
+### <a id="pgport"></a>PGPORT
+
+The port number of the HAWQ server on the master host. The default port is 
5432.
+
+### <a id="pguser"></a>PGUSER
+
+The HAWQ user name used to connect.
+
+### <a id="pgdatestyle"></a>PGDATESTYLE
+
+Sets the default style of date/time representation for a session. (Equivalent 
to `SET datestyle TO....`)
+
+### <a id="pgtz"></a>PGTZ
+
+Sets the default time zone for a session. (Equivalent to `SET timezone TO....`)
+
+### <a id="pgclientencoding"></a>PGCLIENTENCODING
+
+Sets the default client character set encoding for a session. (Equivalent to 
`SET client_encoding TO....`)
+
+  
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/reference/HAWQSampleSiteConfig.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/reference/HAWQSampleSiteConfig.html.md.erb 
b/markdown/reference/HAWQSampleSiteConfig.html.md.erb
new file mode 100644
index 0000000..d4cae5a
--- /dev/null
+++ b/markdown/reference/HAWQSampleSiteConfig.html.md.erb
@@ -0,0 +1,120 @@
+---
+title: Sample hawq-site.xml Configuration File
+---
+
+```xml
+<configuration>
+        <property>
+                <name>default_hash_table_bucket_number</name>
+                <value>18</value>
+        </property>
+
+        <property>
+                <name>hawq_dfs_url</name>
+                <value>hawq.example.com:8020/hawq_default</value>
+        </property>
+
+        <property>
+                <name>hawq_global_rm_type</name>
+                <value>none</value>
+        </property>
+
+        <property>
+                <name>hawq_master_address_host</name>
+                <value>hawq.example.com</value>
+        </property>
+
+        <property>
+                <name>hawq_master_address_port</name>
+                <value>5432</value>
+        </property>
+
+        <property>
+                <name>hawq_master_directory</name>
+                <value>/data/hawq/master</value>
+        </property>
+
+        <property>
+                <name>hawq_master_temp_directory</name>
+                <value>/tmp/hawq/master</value>
+        </property>
+
+        <property>
+                <name>hawq_re_cgroup_hierarchy_name</name>
+                <value>hawq</value>
+        </property>
+
+        <property>
+                <name>hawq_re_cgroup_mount_point</name>
+                <value>/sys/fs/cgroup</value>
+        </property>
+
+        <property>
+                <name>hawq_re_cpu_enable</name>
+                <value>false</value>
+        </property>
+
+        <property>
+                <name>hawq_rm_memory_limit_perseg</name>
+                <value>64GB</value>
+        </property>
+
+        <property>
+                <name>hawq_rm_nvcore_limit_perseg</name>
+                <value>16</value>
+        </property>
+
+        <property>
+                <name>hawq_rm_nvseg_perquery_limit</name>
+                <value>512</value>
+        </property>
+
+        <property>
+                <name>hawq_rm_nvseg_perquery_perseg_limit</name>
+                <value>6</value>
+        </property>
+
+        <property>
+                <name>hawq_rm_yarn_address</name>
+                <value>rm.example.com:8050</value>
+        </property>
+
+        <property>
+                <name>hawq_rm_yarn_app_name</name>
+                <value>hawq</value>
+        </property>
+
+        <property>
+                <name>hawq_rm_yarn_queue_name</name>
+                <value>default</value>
+        </property>
+
+        <property>
+                <name>hawq_rm_yarn_scheduler_address</name>
+                <value>rm.example.com:8030</value>
+        </property>
+
+        <property>
+                <name>hawq_segment_address_port</name>
+                <value>40000</value>
+        </property>
+
+        <property>
+                <name>hawq_segment_directory</name>
+                <value>/data/hawq/segment</value>
+        </property>
+
+        <property>
+                <name>hawq_segment_temp_directory</name>
+                <value>/tmp/hawq/segment</value>
+        </property>
+
+        <property>
+                <name>hawq_standby_address_host</name>
+                <value>standbyhost.example.com</value>
+        </property>
+
+</configuration>
+```
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/reference/HAWQSiteConfig.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/reference/HAWQSiteConfig.html.md.erb 
b/markdown/reference/HAWQSiteConfig.html.md.erb
new file mode 100644
index 0000000..3d20297
--- /dev/null
+++ b/markdown/reference/HAWQSiteConfig.html.md.erb
@@ -0,0 +1,23 @@
+---
+title: Server Configuration Parameter Reference
+---
+
+This section describes the server configuration parameters (GUCs) that are 
available in HAWQ.
+
+Configuration parameters are located in `$GPHOME/etc/hawq-site.xml`. This 
configuration file resides on all HAWQ instances and is managed either by 
Ambari or by using the `hawq config` utility. On HAWQ clusters installed and 
managed by Ambari, always use the Ambari administration interface, and not 
`hawq config`, to configure HAWQ properties. Ambari will overwrite any changes 
made using `hawq config`. 
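+
+For clusters that are not managed by Ambari, a minimal sketch of viewing and changing a parameter with `hawq config` (the parameter and value are only examples; confirm the flags with `hawq config --help`):
+
+``` pre
+hawq config -s default_hash_table_bucket_number
+hawq config -c default_hash_table_bucket_number -v 24
+```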
+
+You can use the same configuration file cluster-wide across both master and 
segments.
+
+**Note:** While `postgresql.conf` still exists in HAWQ, any parameters defined 
in `hawq-site.xml` will overwrite configurations in `postgresql.conf`. For this 
reason, we recommend that you only use `hawq-site.xml` to configure your HAWQ 
cluster.
+
+**Note:** If you install and manage HAWQ using Ambari, be aware that any 
property changes to `hawq-site.xml` made using the command line could be 
overwritten by Ambari. For Ambari-managed HAWQ clusters, always use the Ambari 
administration interface to set or change HAWQ configuration properties.
+
+-   **[About Server Configuration 
Parameters](../reference/guc/guc_config.html)**
+
+-   **[Configuration Parameter 
Categories](../reference/guc/guc_category-list.html)**
+
+-   **[Configuration Parameters](../reference/guc/parameter_definitions.html)**
+
+-   **[Sample hawq-site.xml Configuration 
File](../reference/HAWQSampleSiteConfig.html)**
+
+
