http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/query/query-performance.html.md.erb ---------------------------------------------------------------------- diff --git a/markdown/query/query-performance.html.md.erb b/markdown/query/query-performance.html.md.erb new file mode 100644 index 0000000..981d77b --- /dev/null +++ b/markdown/query/query-performance.html.md.erb @@ -0,0 +1,155 @@ +--- +title: Query Performance +--- + +<span class="shortdesc">HAWQ dynamically allocates resources to queries. Query performance depends on several factors, such as data locality, the number of virtual segments used for the query, and general cluster health.</span> + +- Dynamic Partition Elimination + + In HAWQ, values available only when a query runs are used to dynamically prune partitions, which improves query processing speed. Enable or disable dynamic partition elimination by setting the server configuration parameter `gp_dynamic_partition_pruning` to `ON` or `OFF`; it is `ON` by default. + +- Memory Optimizations + + HAWQ allocates memory optimally for the different operators in a query, and frees and re-allocates memory during the stages of processing a query. + +- Runaway Query Termination + + HAWQ can automatically terminate the most memory-intensive queries based on a memory usage threshold. The threshold is set as a configurable percentage ([runaway\_detector\_activation\_percent](../reference/guc/parameter_definitions.html#runaway_detector_activation_percent)) of the resource quota for the segment, which is calculated by HAWQ's resource manager. + + If the amount of virtual memory utilized by a physical segment exceeds the calculated threshold, then HAWQ begins terminating queries based on memory usage, starting with the query that is consuming the largest amount of memory. Queries are terminated until the percentage of utilized virtual memory falls below the specified percentage. + + To calculate the memory usage threshold for runaway queries, HAWQ uses the following formula: + + *vmem threshold* = (*virtual memory quota calculated by resource manager* + [hawq\_re\_memory\_overcommit\_max](../reference/guc/parameter_definitions.html#hawq_re_memory_overcommit_max)) \* [runaway\_detector\_activation\_percent](../reference/guc/parameter_definitions.html#runaway_detector_activation_percent). + + For example, if the HAWQ resource manager calculates a virtual memory quota of 9 GB, `hawq_re_memory_overcommit_max` is set to 1 GB, and the value of `runaway_detector_activation_percent` is 95 (95%), then HAWQ starts terminating queries when the utilized virtual memory exceeds 9.5 GB. + + To disable automatic detection and termination of runaway queries, set the value of `runaway_detector_activation_percent` to 100. + +## <a id="id_xkg_znj_f5"></a>How to Investigate Query Performance Issues + +If a query is not executing as quickly as you expect, investigate the possible causes of the slowdown as follows: + +1. Check the health of the cluster. + 1. Are any DataNodes, segments, or nodes down? + 2. Are there many failed disks? + +2. Check table statistics. Have the tables involved in the query been analyzed? +3. Check the plan of the query and run `EXPLAIN ANALYZE` to determine the bottleneck. + Sometimes there is not enough memory for some operators, such as Hash Join, and spill files are used. If an operator cannot perform all of its work in the memory allocated to it, it caches data on disk in *spill files*. A query that uses spill files runs much more slowly than an equivalent query that fits in memory. + +4. Check data locality statistics using `EXPLAIN ANALYZE`.
Alternately you can check the logs. Data locality result for every query could also be found in the log of HAWQ. See [Data Locality Statistics](query-performance.html#topic_amk_drc_d5) for information on the statistics. +5. Check resource queue status. You can query view `pg_resqueue_status` to check if the target queue has already dispatched some resource to the queries, or if the target queue is lacking resources. See [Checking Existing Resource Queues](../resourcemgmt/ResourceQueues.html#topic_lqy_gls_zt). +6. Analyze a dump of the resource manager's status to see more resource queue status. See [Analyzing Resource Manager Status](../resourcemgmt/ResourceQueues.html#topic_zrh_pkc_f5). + +## <a id="topic_amk_drc_d5"></a>Data Locality Statistics + +For visibility into query performance, use the EXPLAIN ANALYZE to obtain data locality statistics. For example: + +``` sql +postgres=# CREATE TABLE test (i int); +postgres=# INSERT INTO test VALUES(2); +postgres=# EXPLAIN ANALYZE SELECT * FROM test; +``` +``` +QUERY PLAN +....... +Data locality statistics: +data locality ratio: 1.000; virtual segment number: 1; different host number: 1; +virtual segment number per host(avg/min/max): (1/1/1); +segment size(avg/min/max): (32.000 B/32 B/32 B); +segment size with penalty(avg/min/max): (32.000 B/32 B/32 B); +continuity(avg/min/max): (1.000/1.000/1.000); DFS metadatacache: 7.816 ms; +resource allocation: 0.615 ms; datalocality calculation: 0.136 ms. +``` + +The following table describes the metrics related to data locality. Use these metrics to examine issues behind a query's performance. + +<a id="topic_amk_drc_d5__table_q4p_25c_d5"></a> + +<table> +<caption><span class="tablecap">Table 1. Data Locality Statistics</span></caption> +<colgroup> +<col width="50%" /> +<col width="50%" /> +</colgroup> +<thead> +<tr class="header"> +<th>Statistic</th> +<th>Description</th> +</tr> +</thead> +<tbody> +<tr class="odd"> +<td>data locality ratio</td> +<td><p>Indicates the total local read ratio of a query. The lower the ratio, the more remote read happens. Since remote read on HDFS needs network IO, the execution time of a query may increase.</p> +<p>For hash distributed tables, all the blocks of a file will be processed by one segment, so if data on HDFS is redistributed, such as by the HDFS Balancer, the data locality ratio will be decreased. In this case, you can redistribute the hash distributed table manually by using CREATE TABLE AS SELECT.</p></td> +</tr> +<tr class="even"> +<td>number of virtual segments</td> +<td>Typically, the more virtual segments are used, the faster the query will be executed. If the virtual segment number is too small, you can check whether <code class="ph codeph">default_hash_table_bucket_number</code>, <code class="ph codeph">hawq_rm_nvseg_perquery_limit</code>, or the bucket number of a hash distributed table is small. See <a href="#topic_wv3_gzc_d5">Number of Virtual Segments</a>.</td> +</tr> +<tr class="odd"> +<td>different host number</td> +<td>Indicates how many hosts are used to run this query. All the hosts should be used when the virtual segment number is bigger than the total number of hosts according to the resource allocation strategy of HAWQ. As a result, if this metric is smaller than the total number of hosts for a big query, it often indicates that some hosts are down. 
In this case, query the <code class="ph codeph">gp_segment_configuration</code> catalog table to check the node states first.</td> +</tr> +<tr class="even"> +<td>segment size and segment size with penalty</td> +<td>"segment size" indicates the (avg/min/max) data size processed by a virtual segment. "segment size with penalty" is the segment size when remote read is calculated as <code class="ph codeph">net_disk_ratio</code> * block size. A virtual segment that performs remote reads should process less data than a virtual segment that performs only local reads. <code class="ph codeph">net_disk_ratio</code> can be tuned to reflect how much slower remote read is than local read in different network environments, while considering the workload balance between the nodes. The default value of <code class="ph codeph">net_disk_ratio</code> is 1.01.</td> +</tr> +<tr class="odd"> +<td>continuity</td> +<td>Reading an HDFS file discontinuously introduces additional seeks, which slow the table scan of a query. A low continuity value indicates that the blocks of a file are not continuously distributed on a DataNode.</td> +</tr> +<tr class="even"> +<td>DFS metadatacache</td> +<td>Indicates the metadatacache time cost for a query. In HAWQ, HDFS block information is cached in a metadatacache process. If a cache miss occurs, the metadatacache time cost may increase.</td> +</tr> +<tr class="odd"> +<td>resource allocation</td> +<td>Indicates the time cost of acquiring resources from the resource manager.</td> +</tr> +<tr class="even"> +<td>datalocality calculation</td> +<td>Indicates the time to run the algorithm that assigns HDFS blocks to virtual segments and calculates the data locality ratio.</td> +</tr> +</tbody> +</table> + +## <a id="topic_wv3_gzc_d5"></a>Number of Virtual Segments + +To obtain the best results when querying data in HAWQ, review the best practices described in this topic. + +### <a id="virtual_seg_performance"></a>Factors Impacting Query Performance + +The number of virtual segments used for a query directly impacts the query's performance. The following factors can impact the degree of parallelism of a query: + +- **Cost of the query**. Small queries use fewer segments and larger queries use more segments. Some techniques used in defining resource queues can influence the number of both virtual segments and general resources allocated to queries. +- **Available resources at query time**. If more resources are available in the resource queue, those resources will be used. +- **Hash table and bucket number**. If the query involves only hash-distributed tables, the query's parallelism is fixed (equal to the hash table bucket number) under the following conditions: + + - The bucket number (bucketnum) configured for all the hash tables is the same. + - The table size for random tables is no more than 1.5 times the size allotted for the hash tables. + + Otherwise, the number of virtual segments depends on the query's cost: hash-distributed table queries behave like queries on randomly-distributed tables. + +- **Query Type**: It can be difficult to calculate resource costs for queries with some user-defined functions or for queries to external tables. With these queries, the number of virtual segments is controlled by the `hawq_rm_nvseg_perquery_limit` and `hawq_rm_nvseg_perquery_perseg_limit` parameters, as well as by the ON clause and the location list of external tables. If the query has a hash result table (for example, `INSERT INTO hash_table`), the number of virtual segments must be equal to the bucket number of the resulting hash table.
If the query is performed in utility mode, such as for `COPY` and `ANALYZE` operations, the virtual segment number is calculated according to different policies. + +### General Guidelines + +The following guidelines expand on the number of virtual segments to use, provided there are sufficient resources available. A brief `EXPLAIN ANALYZE` sketch for checking these numbers follows the list. + +- **Random tables exist in the select list:** \#vseg (number of virtual segments) depends on the size of the table. +- **Hash tables exist in the select list:** \#vseg depends on the bucket number of the table. +- **Random and hash tables both exist in the select list:** \#vseg depends on the bucket number of the table, if the table size of random tables is no more than 1.5 times the size of hash tables. Otherwise, \#vseg depends on the size of the random table. +- **User-defined functions exist:** \#vseg depends on the `hawq_rm_nvseg_perquery_limit` and `hawq_rm_nvseg_perquery_perseg_limit` parameters. +- **PXF external tables exist:** \#vseg depends on the `default_hash_table_bucket_number` parameter. +- **gpfdist external tables exist:** \#vseg is at least the number of locations in the location list. +- **The command for CREATE EXTERNAL TABLE is used:** \#vseg must reflect the value in the command and use the `ON` clause in the command. +- **Hash tables are copied to or from files:** \#vseg depends on the bucket number of the hash table. +- **Random tables are copied to files:** \#vseg depends on the size of the random table. +- **Random tables are copied from files:** \#vseg is a fixed value; it is 6 when there are sufficient resources. +- **ANALYZE table:** Analyzing a nonpartitioned table uses more virtual segments than analyzing a partitioned table. +- **Relationship between hash distribution results:** \#vseg must be the same as the bucket number for the hash table. + +
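As a quick way to confirm these guidelines on a running cluster, the following hedged sketch creates one randomly-distributed and one hash-distributed table and reads the "virtual segment number" field that `EXPLAIN ANALYZE` reports in its data locality statistics. The table names, columns, and the bucket number of 8 are illustrative assumptions, not values taken from this reference.

``` sql
-- Illustrative tables only; names, columns, and bucketnum are assumptions.
CREATE TABLE events_random (id int, payload text) DISTRIBUTED RANDOMLY;
CREATE TABLE events_hash   (id int, payload text)
       WITH (bucketnum=8) DISTRIBUTED BY (id);

-- The "virtual segment number" value in the Data locality statistics section
-- of the output shows how many virtual segments executed each query.
EXPLAIN ANALYZE SELECT count(*) FROM events_random;  -- #vseg tracks table size
EXPLAIN ANALYZE SELECT count(*) FROM events_hash;    -- #vseg tracks bucketnum
```

Comparing the two outputs shows whether the parallelism of a hash-distributed table is pinned to its bucket number, as described in the guidelines above.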
http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/query/query-profiling.html.md.erb ---------------------------------------------------------------------- diff --git a/markdown/query/query-profiling.html.md.erb b/markdown/query/query-profiling.html.md.erb new file mode 100644 index 0000000..ea20e0a --- /dev/null +++ b/markdown/query/query-profiling.html.md.erb @@ -0,0 +1,240 @@ +--- +title: Query Profiling +--- + +<span class="shortdesc">Examine the query plans of poorly performing queries to identify possible performance tuning opportunities.</span> + +HAWQ devises a *query plan* for each query. Choosing the right query plan to match the query and data structure is necessary for good performance. A query plan defines how HAWQ will run the query in the parallel execution environment. + +The query optimizer uses data statistics maintained by the database to choose a query plan with the lowest possible cost. Cost is measured in disk I/O, shown as units of disk page fetches. The goal is to minimize the total execution cost for the plan. + +View the plan for a given query with the `EXPLAIN` command. `EXPLAIN` shows the query optimizer's estimated cost for the query plan. For example: + +``` sql +EXPLAIN SELECT * FROM names WHERE id=22; +``` + +`EXPLAIN ANALYZE` runs the statement in addition to displaying its plan. This is useful for determining how close the optimizer's estimates are to reality. For example: + +``` sql +EXPLAIN ANALYZE SELECT * FROM names WHERE id=22; +``` + +**Note:** The legacy and GPORCA query optimizers coexist in HAWQ. GPORCA is the default HAWQ optimizer. HAWQ uses GPORCA to generate an execution plan for a query when possible. The `EXPLAIN` output generated by GPORCA is different than the output generated by the legacy query optimizer. + +When the `EXPLAIN ANALYZE` command uses GPORCA, the `EXPLAIN` plan shows only the number of partitions that are being eliminated. The scanned partitions are not shown. To show name of the scanned partitions in the segment logs set the server configuration parameter `gp_log_dynamic_partition_pruning` to `on`. This example `SET` command enables the parameter. + +``` sql +SET gp_log_dynamic_partition_pruning = on; +``` + +For information about GPORCA, see [Querying Data](query.html#topic1). + +## <a id="topic40"></a>Reading EXPLAIN Output + +A query plan is a tree of nodes. Each node in the plan represents a single operation, such as a table scan, join, aggregation, or sort. + +Read plans from the bottom to the top: each node feeds rows into the node directly above it. The bottom nodes of a plan are usually table scan operations. If the query requires joins, aggregations, sorts, or other operations on the rows, there are additional nodes above the scan nodes to perform these operations. The topmost plan nodes are usually HAWQ motion nodes: redistribute, broadcast, or gather motions. These operations move rows between segment instances during query processing. + +The output of `EXPLAIN` has one line for each node in the plan tree and shows the basic node type and the following execution cost estimates for that plan node: + +- **cost** âMeasured in units of disk page fetches. 1.0 equals one sequential disk page read. The first estimate is the start-up cost of getting the first row and the second is the total cost of cost of getting all rows. The total cost assumes all rows will be retrieved, which is not always true; for example, if the query uses `LIMIT`, not all rows are retrieved. 
+- **rows** âThe total number of rows output by this plan node. This number is usually less than the number of rows processed or scanned by the plan node, reflecting the estimated selectivity of any `WHERE` clause conditions. Ideally, the estimate for the topmost node approximates the number of rows that the query actually returns. +- **width** âThe total bytes of all the rows that this plan node outputs. + +Note the following: + +- The cost of a node includes the cost of its child nodes. The topmost plan node has the estimated total execution cost for the plan. This is the number the optimizer intends to minimize. +- The cost reflects only the aspects of plan execution that the query optimizer takes into consideration. For example, the cost does not reflect time spent transmitting result rows to the client. + +### <a id="topic41"></a>EXPLAIN Example + +The following example describes how to read an `EXPLAIN` query plan for a query: + +``` sql +EXPLAIN SELECT * FROM names WHERE name = 'Joelle'; +``` + +``` + QUERY PLAN +----------------------------------------------------------------------------- + Gather Motion 2:1 (slice1; segments: 2) (cost=0.00..1.01 rows=1 width=11) + -> Append-only Scan on names (cost=0.00..1.01 rows=1 width=11) + Filter: name::text = 'Joelle'::text +(3 rows) +``` + +Read the plan from the bottom to the top. To start, the query optimizer sequentially scans the *names* table. Notice the `WHERE` clause is applied as a *filter* condition. This means the scan operation checks the condition for each row it scans and outputs only the rows that satisfy the condition. + +The results of the scan operation are passed to a *gather motion* operation. In HAWQ, a gather motion is when segments send rows to the master. In this example, we have two segment instances that send to one master instance. This operation is working on `slice1` of the parallel query execution plan. A query plan is divided into *slices* so the segments can work on portions of the query plan in parallel. + +The estimated startup cost for this plan is `00.00` (no cost) and a total cost of `1.01` disk page fetches. The optimizer estimates this query will return one row. + +## <a id="topic42"></a>Reading EXPLAIN ANALYZE Output + +`EXPLAIN ANALYZE` plans and runs the statement. The `EXPLAIN ANALYZE` plan shows the actual execution cost along with the optimizer's estimates. This allows you to see if the optimizer's estimates are close to reality. `EXPLAIN ANALYZE` also shows the following: + +- The total runtime (in milliseconds) in which the query executed. +- The memory used by each slice of the query plan, as well as the memory reserved for the whole query statement. +- Statistics for the query dispatcher, including the number of executors used for the current query (total number/number of executors cached by previous queries/number of executors newly connected), dispatcher time (total dispatch time/connection establish time/dispatch data to executor time); and some time(max/min/avg) details for dispatching data, consuming executor data, and freeing executor. +- Statistics about data locality. See [Data Locality Statistics](query-performance.html#topic_amk_drc_d5) for details about these statistics. +- The number of *workers* (segments) involved in a plan node operation. Only segments that return rows are counted. +- The Max/Last statistics are for the segment that output the maximum number of rows and the segment with the longest *<time> to end*. 
+- The segment id of the segment that produced the most rows for an operation. +- For relevant operations, the amount of memory (`work_mem`) used by the operation. If the `work_mem` was insufficient to perform the operation in memory, the plan shows the amount of data spilled to disk for the lowest-performing segment. For example: + + ``` pre + Work_mem used: 64K bytes avg, 64K bytes max (seg0). + Work_mem wanted: 90K bytes avg, 90K byes max (seg0) to lessen + workfile I/O affecting 2 workers. + ``` +**Note** +The *work\_mem* property is not configurable. Use resource queues to manage memory use. For more information on resource queues, see [Configuring Resource Management](../resourcemgmt/ConfigureResourceManagement.html) and [Working with Hierarchical Resource Queues](../resourcemgmt/ResourceQueues.html). + +- The time (in milliseconds) in which the segment that produced the most rows retrieved the first row, and the time taken for that segment to retrieve all rows. The result may omit *<time> to first row* if it is the same as the *<time> to end*. + +### <a id="topic43"></a>EXPLAIN ANALYZE Example + +This example describes how to read an `EXPLAIN ANALYZE` query plan using the same query. The `bold` parts of the plan show actual timing and rows returned for each plan node, as well as memory and time statistics for the whole query. + +``` sql +EXPLAIN ANALYZE SELECT * FROM names WHERE name = 'Joelle'; +``` + +``` + QUERY PLAN +------------------------------------------------------------------------ + Gather Motion 1:1 (slice1; segments: 1) (cost=0.00..1.01 rows=1 width=7) + Rows out: Avg 1.0 rows x 1 workers at destination. Max/Last(seg0:ip-10-0-1-16/seg0:ip-10-0-1-16) 1/1 rows with 8.713/8.713 ms to first row, 8.714/8.714 ms to end, start offset by 0.708/0.708 ms. + -> Append-only Scan on names (cost=0.00..1.01 rows=1 width=7) + Filter: name = 'Joelle'::text + Rows out: Avg 1.0 rows x 1 workers. Max/Last(seg0:ip-10-0-1-16/seg0:ip-10-0-1-16) 1/1 rows with 7.053/7.053 ms to first row, 7.089/7.089 ms to end, start offset by 2.162/2.162 ms. + Slice statistics: + (slice0) Executor memory: 159K bytes. + (slice1) Executor memory: 247K bytes (seg0:ip-10-0-1-16). + Statement statistics: + Memory used: 262144K bytes + Dispatcher statistics: + executors used(total/cached/new connection): (1/1/0); dispatcher time(total/connection/dispatch data): (0.217 ms/0.000 ms/0.037 ms). + dispatch data time(max/min/avg): (0.037 ms/0.037 ms/0.037 ms); consume executor data time(max/min/avg): (0.015 ms/0.015 ms/0.015 ms); free executor time(max/min/avg): (0.000 ms/0.000 ms/0.000 ms). + Data locality statistics: + data locality ratio: 1.000; virtual segment number: 1; different host number: 1; virtual segment number per host(avg/min/max): (1/1/1); segment size(avg/min/max): (48.000 B/48 B/48 B); segment size with penalty(avg/min/max): (48.000 B/48 B/48 B); continuity(avg/min/max): (1.000/1.000/1.000); DFS metadatacache: 9.343 ms; resource allocation: 0.638 ms; datalocality calculation: 0.144 ms. + Total runtime: 19.690 ms +(16 rows) +``` + +Read the plan from the bottom to the top. The total elapsed time to run this query was *19.690* milliseconds. + +The *Append-only scan* operation had only one segment (*seg0*) that returned rows, and it returned just *1 row*. The Max/Last statistics are identical in this example because only one segment returned rows. It took *7.053* milliseconds to find the first row and *7.089* milliseconds to scan all rows. 
This result is close to the optimizer's estimate: the query optimizer estimated it would return one row for this query. The *gather motion* (segments sending data to the master) received 1 row. The total elapsed time for this operation was *19.690* milliseconds. + +## <a id="topic44"></a>Examining Query Plans to Solve Problems + +If a query performs poorly, examine its query plan and ask the following questions: + +- **Do operations in the plan take an exceptionally long time?** Look for an operation that consumes the majority of query processing time. For example, if a scan on a hash table takes longer than expected, the data locality may be low; reloading the data can increase the data locality and speed up the query. Or, adjust `enable_<operator>` parameters to see if you can force the legacy query optimizer (planner) to choose a different plan by disabling a particular query plan operator for that query. +- **Are the optimizer's estimates close to reality?** Run `EXPLAIN ANALYZE` and see if the number of rows the optimizer estimates is close to the number of rows the query operation actually returns. If there is a large discrepancy, collect more statistics on the relevant columns. +- **Are selective predicates applied early in the plan?** Apply the most selective filters early in the plan so fewer rows move up the plan tree. If the query plan does not correctly estimate query predicate selectivity, collect more statistics on the relevant columns. You can also try reordering the `WHERE` clause of your SQL statement. +- **Does the optimizer choose the best join order?** When you have a query that joins multiple tables, make sure that the optimizer chooses the most selective join order. Joins that eliminate the largest number of rows should be done earlier in the plan so fewer rows move up the plan tree. + + If the plan is not choosing the optimal join order, set `join_collapse_limit=1` and use explicit `JOIN` syntax in your SQL statement to force the legacy query optimizer (planner) to the specified join order. You can also collect more statistics on the relevant join columns. + +- **Does the optimizer selectively scan partitioned tables?** If you use table partitioning, is the optimizer selectively scanning only the child tables required to satisfy the query predicates? Scans of the parent tables should return 0 rows since the parent tables do not contain any data. See [Verifying Your Partition Strategy](../ddl/ddl-partition.html#topic74) for an example of a query plan that shows a selective partition scan. +- **Does the optimizer choose hash aggregate and hash join operations where applicable?** Hash operations are typically much faster than other types of joins or aggregations. Row comparison and sorting is done in memory rather than reading/writing from disk. To enable the query optimizer to choose hash operations, there must be sufficient memory available to hold the estimated number of rows. Try increasing work memory to improve performance for a query. If possible, run an `EXPLAIN ANALYZE` for the query to show which plan operations spilled to disk, how much work memory they used, and how much memory was required to avoid spilling to disk. For example: + + `Work_mem used: 23430K bytes avg, 23430K bytes max (seg0). Work_mem wanted: 33649K bytes avg, 33649K bytes max (seg0) to lessen workfile I/O affecting 2 workers.` + + The "bytes wanted" message from `EXPLAIN ANALYZE` is based on the amount of data written to work files and is not exact. 
The minimum `work_mem` needed can differ from the suggested value. + +## <a id="explainplan_plpgsql"></a>Generating EXPLAIN Plan from a PL/pgSQL Function + +User-defined PL/pgSQL functions often include dynamically created queries. You may find it useful to generate the `EXPLAIN` plan for such queries for query performance optimization and tuning. + +Perform the following steps to create and run a user-defined PL/pgSQL function. This function displays the `EXPLAIN` plan for a simple query on a test database. + +1. Log in to the HAWQ master node as user `gpadmin` and set up the HAWQ environment: + + ``` shell + $ ssh gpadmin@hawq_master + $ . /usr/local/hawq/greenplum_path.sh + ``` + +2. Create a test database named `testdb`: + + ``` shell + $ createdb testdb + ``` + +3. Start the PostgreSQL interactive utility, connecting to `testdb`: + + ``` shell + $ psql -d testdb + ``` + +4. Create the table `test_tbl` with a single column named `id` of type `integer`: + + ``` sql + testdb=# CREATE TABLE test_tbl (id int); + ``` + +5. Add some data to the `test_tbl` table: + + ``` sql + testdb=# INSERT INTO test_tbl SELECT generate_series(1,100); + ``` + + This `INSERT` command adds 100 rows to `test_tbl`, incrementing the `id` for each row. + +6. Create a PL/pgSQL function named `explain_plan_func()` by copying and pasting the following text at the `psql` prompt: + + ``` sql + CREATE OR REPLACE FUNCTION explain_plan_func() RETURNS varchar as $$ + declare + + a varchar; + b varchar; + + begin + a = ''; + for b in execute 'explain select count(*) from test_tbl group by id' loop + a = a || E'\n' || b; + end loop; + return a; + end; + $$ + LANGUAGE plpgsql + VOLATILE; + ``` + +7. Verify the `explain_plan_func()` user-defined function was created successfully: + + ``` shell + testdb=# \df+ + ``` + + The `\df+` command lists all user-defined functions. + +8. Perform a query using the user-defined function you just created: + + ``` sql + testdb=# SELECT explain_plan_func(); + ``` + + The `EXPLAIN` plan results for the query are displayed: + + ``` pre + explain_plan_func +--------------------------------------------------------------------------------------------------------- + Gather Motion 1:1 (slice2; segments: 1) (cost=0.00..431.04 rows=100 width=8) + -> Result (cost=0.00..431.03 rows=100 width=8) + -> HashAggregate (cost=0.00..431.03 rows=100 width=8) + Group By: id + -> Redistribute Motion 1:1 (slice1; segments: 1) (cost=0.00..431.02 rows=100 width=12) + Hash Key: id + -> Result (cost=0.00..431.01 rows=100 width=12) + -> HashAggregate (cost=0.00..431.01 rows=100 width=12) + Group By: id + -> Table Scan on test_tbl (cost=0.00..431.00 rows=100 width=4) + Settings: default_hash_table_bucket_number=6 + Optimizer status: PQO version 1.627 +(1 row) + ``` http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/query/query.html.md.erb ---------------------------------------------------------------------- diff --git a/markdown/query/query.html.md.erb b/markdown/query/query.html.md.erb new file mode 100644 index 0000000..9c218c7 --- /dev/null +++ b/markdown/query/query.html.md.erb @@ -0,0 +1,37 @@ +--- +title: Querying Data +--- + +This topic provides information about using SQL in HAWQ databases. + +You enter SQL statements called queries to view and analyze data in a database using the `psql` interactive SQL client and other client tools. + +**Note:** HAWQ queries timeout after a period of 600 seconds. 
For this reason, long-running queries may appear to hang until results are processed or until the timeout period expires. + +- **[About HAWQ Query Processing](../query/HAWQQueryProcessing.html)** + + This topic provides an overview of how HAWQ processes queries. Understanding this process can be useful when writing and tuning queries. + +- **[About GPORCA](../query/gporca/query-gporca-optimizer.html)** + + In HAWQ, you can use GPORCA or the legacy query optimizer. + +- **[Defining Queries](../query/defining-queries.html)** + + HAWQ is based on the PostgreSQL implementation of the SQL standard. SQL commands are typically entered using the standard PostgreSQL interactive terminal `psql`, but other programs that have similar functionality can be used as well. + +- **[Using Functions and Operators](../query/functions-operators.html)** + + HAWQ evaluates functions and operators used in SQL expressions. + +- **[Query Performance](../query/query-performance.html)** + + HAWQ dynamically allocates resources to queries. Query performance depends on several factors such as data locality, number of virtual segments used for the query and general cluster health. + +- **[Query Profiling](../query/query-profiling.html)** + + Examine the query plans of poorly performing queries to identify possible performance tuning opportunities. + + + + http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/reference/CharacterSetSupportReference.html.md.erb ---------------------------------------------------------------------- diff --git a/markdown/reference/CharacterSetSupportReference.html.md.erb b/markdown/reference/CharacterSetSupportReference.html.md.erb new file mode 100644 index 0000000..8a12471 --- /dev/null +++ b/markdown/reference/CharacterSetSupportReference.html.md.erb @@ -0,0 +1,439 @@ +--- +title: Character Set Support Reference +--- + +This topic provides a referene of the character sets supported in HAWQ. + +The character set support in HAWQ allows you to store text in a variety of character sets, including single-byte character sets such as the ISO 8859 series and multiple-byte character sets such as EUC (Extended Unix Code), UTF-8, and Mule internal code. All supported character sets can be used transparently by clients, but a few are not supported for use within the server (that is, as a server-side encoding). The default character set is selected while initializing your HAWQ using `hawq init.` It can be overridden when you create a database, so you can have multiple databases each with a different character set. 
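As a minimal illustration of working with encodings from `psql`, you can check the encoding of the current database and create a new database that uses a different character set; the database name `latin1_db` below is a placeholder, not a name used elsewhere in this documentation.

``` sql
-- Show the encoding of the database you are currently connected to.
SHOW server_encoding;

-- Create a database that uses a different character set.
CREATE DATABASE latin1_db WITH ENCODING 'LATIN1';
```

A fuller example appears in [Setting the Character Set](#settingthecharacterset) below. The following table lists the supported character sets and indicates whether each can be used as a server-side encoding.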
+ +<table style="width:100%;"> +<colgroup> +<col width="16%" /> +<col width="16%" /> +<col width="16%" /> +<col width="16%" /> +<col width="16%" /> +<col width="16%" /> +</colgroup> +<thead> +<tr class="header"> +<th>Name</th> +<th>Description</th> +<th>Language</th> +<th>Server</th> +<th>Bytes/Char</th> +<th>Aliases</th> +</tr> +</thead> +<tbody> +<tr class="odd"> +<td>BIG5</td> +<td>Big Five</td> +<td>Traditional Chinese</td> +<td>No</td> +<td>1-2</td> +<td>WIN950, Windows950</td> +</tr> +<tr class="even"> +<td>EUC_CN</td> +<td>Extended UNIX Code-CN</td> +<td>Simplified Chinese</td> +<td>Yes</td> +<td>1-3</td> +<td></td> +</tr> +<tr class="odd"> +<td>EUC_JP</td> +<td>Extended UNIX Code-JP</td> +<td>Japanese</td> +<td>Yes</td> +<td>1-3</td> +<td> </td> +</tr> +<tr class="even"> +<td>EUC_KR</td> +<td>Extended UNIX Code-KR</td> +<td>Korean</td> +<td>Yes</td> +<td>1-3</td> +<td> </td> +</tr> +<tr class="odd"> +<td>EUC_TW</td> +<td>Extended UNIX Code-TW</td> +<td>Traditional Chinese, Taiwanese</td> +<td>Yes</td> +<td>1-3</td> +<td> </td> +</tr> +<tr class="even"> +<td>GB18030</td> +<td>National Standard</td> +<td>Chinese</td> +<td>No</td> +<td>1-2</td> +<td> </td> +</tr> +<tr class="odd"> +<td>GBK</td> +<td>Extended National Standard</td> +<td>Simplified Chinese</td> +<td>No</td> +<td>1-2</td> +<td>WIN936,Windows936</td> +</tr> +<tr class="even"> +<td>ISO_8859_5</td> +<td>ISO 8859-5, ECMA 113</td> +<td>Latin/Cyrillic</td> +<td>Yes</td> +<td>1</td> +<td> </td> +</tr> +<tr class="odd"> +<td>ISO_8859_6</td> +<td>ISO 8859-6, ECMA 114</td> +<td>Latin/Arabic</td> +<td>Yes</td> +<td>1</td> +<td> </td> +</tr> +<tr class="even"> +<td>ISO_8859_7</td> +<td>ISO 8859-7, ECMA 118</td> +<td>Latin/Greek</td> +<td>Yes</td> +<td>1</td> +<td> </td> +</tr> +<tr class="odd"> +<td>ISO_8859_8</td> +<td>ISO 8859-8, ECMA 121</td> +<td>Latin/Hebrew</td> +<td>Yes</td> +<td>1</td> +<td> </td> +</tr> +<tr class="even"> +<td>JOHAB</td> +<td>JOHA</td> +<td>Korean (Hangul)</td> +<td>Yes</td> +<td>1-3</td> +<td> </td> +</tr> +<tr class="odd"> +<td>KOI8</td> +<td>KOI8-R(U)</td> +<td>Cyrillic</td> +<td>Yes</td> +<td>1</td> +<td>KOI8R</td> +</tr> +<tr class="even"> +<td>LATIN1</td> +<td>ISO 8859-1, ECMA 94</td> +<td>Western European</td> +<td>Yes</td> +<td>1</td> +<td>ISO88591</td> +</tr> +<tr class="odd"> +<td>LATIN2</td> +<td>ISO 8859-2, ECMA 94</td> +<td>Central European</td> +<td>Yes</td> +<td>1</td> +<td>ISO88592</td> +</tr> +<tr class="even"> +<td>LATIN3</td> +<td>ISO 8859-3, ECMA 94</td> +<td>South European</td> +<td>Yes</td> +<td>1</td> +<td>ISO88593</td> +</tr> +<tr class="odd"> +<td>LATIN4</td> +<td>ISO 8859-4, ECMA 94</td> +<td>North European</td> +<td>Yes</td> +<td>1</td> +<td>ISO88594</td> +</tr> +<tr class="even"> +<td>LATIN5</td> +<td>ISO 8859-9, ECMA 128</td> +<td>Turkish</td> +<td>Yes</td> +<td>1</td> +<td>ISO88599</td> +</tr> +<tr class="odd"> +<td>LATIN6</td> +<td>ISO 8859-10, ECMA 144</td> +<td>Nordic</td> +<td>Yes</td> +<td>1</td> +<td>ISO885910</td> +</tr> +<tr class="even"> +<td>LATIN7</td> +<td>ISO 8859-13</td> +<td>Baltic</td> +<td>Yes</td> +<td>1</td> +<td>ISO885913</td> +</tr> +<tr class="odd"> +<td>LATIN8</td> +<td>ISO 8859-14</td> +<td>Celtic</td> +<td>Yes</td> +<td>1</td> +<td>ISO885914</td> +</tr> +<tr class="even"> +<td>LATIN9</td> +<td>ISO 8859-15</td> +<td>LATIN1 with Euro and accents</td> +<td>Yes</td> +<td>1</td> +<td>ISO885915</td> +</tr> +<tr class="odd"> +<td>LATIN10</td> +<td>ISO 8859-16, ASRO SR 14111</td> +<td>Romanian</td> +<td>Yes</td> +<td>1</td> +<td>ISO885916</td> +</tr> +<tr 
class="even"> +<td>MULE_INTERNAL</td> +<td>Mule internal code</td> +<td>Multilingual Emacs</td> +<td>Yes</td> +<td>1-4</td> +<td> </td> +</tr> +<tr class="odd"> +<td>SJIS</td> +<td>Shift JIS</td> +<td>Japanese</td> +<td>No</td> +<td>1-2</td> +<td>Mskanji, ShiftJIS, WIN932, Windows932</td> +</tr> +<tr class="even"> +<td>SQL_ASCII</td> +<td>unspecified2</td> +<td>any</td> +<td>No</td> +<td>1</td> +<td> </td> +</tr> +<tr class="odd"> +<td>UHC</td> +<td>Unified Hangul Code</td> +<td>Korean</td> +<td>No</td> +<td>1-2</td> +<td>WIN949, Windows949</td> +</tr> +<tr class="even"> +<td>UTF8</td> +<td>Unicode, 8-bit </td> +<td>all</td> +<td>Yes</td> +<td>1-4</td> +<td>Unicode</td> +</tr> +<tr class="odd"> +<td>WIN866</td> +<td>Windows CP866</td> +<td>Cyrillic</td> +<td>Yes</td> +<td>1</td> +<td>ALT</td> +</tr> +<tr class="even"> +<td>WIN874</td> +<td>Windows CP874</td> +<td>Thai</td> +<td>Yes</td> +<td>1</td> +<td> </td> +</tr> +<tr class="odd"> +<td>WIN1250</td> +<td>Windows CP1250</td> +<td>Central European</td> +<td>Yes</td> +<td>1</td> +<td> </td> +</tr> +<tr class="even"> +<td>WIN1251</td> +<td>Windows CP1251</td> +<td>Cyrillic</td> +<td>Yes</td> +<td>1</td> +<td>WIN</td> +</tr> +<tr class="odd"> +<td>WIN1252</td> +<td>Windows CP1252</td> +<td>Western European</td> +<td>Yes</td> +<td><p>1</p></td> +<td> </td> +</tr> +<tr class="even"> +<td>WIN1253</td> +<td>Windows CP1253</td> +<td>Greek</td> +<td>Yes</td> +<td>1</td> +<td> </td> +</tr> +<tr class="odd"> +<td>WIN1254</td> +<td>Windows CP1254</td> +<td>Turkish</td> +<td>Yes</td> +<td>1</td> +<td> </td> +</tr> +<tr class="even"> +<td>WIN1255</td> +<td>Windows CP1255</td> +<td>Hebrew</td> +<td>Yes</td> +<td>1</td> +<td> </td> +</tr> +<tr class="odd"> +<td>WIN1256</td> +<td>Windows CP1256</td> +<td>Arabic</td> +<td>Yes</td> +<td>1</td> +<td> </td> +</tr> +<tr class="even"> +<td>WIN1257</td> +<td>Windows CP1257</td> +<td>Baltic</td> +<td>Yes</td> +<td>1</td> +<td> </td> +</tr> +<tr class="odd"> +<td>WIN1258</td> +<td>Windows CP1258</td> +<td>Vietnamese</td> +<td>Yes</td> +<td>1</td> +<td>ABC, TCVN, TCVN5712, VSCII </td> +</tr> +</tbody> +</table> + +**Note:** + +- Not all the APIs support all the listed character sets. For example, the JDBC driver does not support MULE\_INTERNAL, LATIN6, LATIN8, and LATIN10. +- The SQLASCII setting behaves considerable differently from the other settings. Byte values 0-127 are interpreted according to the ASCII standard, while byte values 128-255 are taken as uninterpreted characters. If you are working with any nonASCII data, it is unwise to use the SQL\_ASCII setting as a client encoding. SQL\_ASCII is not supported as a server encoding. + +## <a id="settingthecharacterset"></a>Setting the Character Set + +`hawq init` defines the default character set for a HAWQ system by reading the setting of the ENCODING parameter in the gp\_init\_config file at initialization time. The default character set is UNICODE or UTF8. + +You can create a database with a different character set besides what is used as the system-wide default. For example: + +``` sql +CREATE DATABASE korean WITH ENCODING 'EUC_KR'; +``` + +**Note:** Although you can specify any encoding you want for a database, it is unwise to choose an encoding that is not what is expected by the locale you have selected. The LC\_COLLATE and LC\_CTYPE settings imply a particular encoding, and locale-dependent operations (such as sorting) are likely to misinterpret data that is in an incompatible encoding. 
+ +Since these locale settings are frozen by hawq init, the apparent flexibility to use different encodings in different databases is more theoretical than real. + +One way to use multiple encodings safely is to set the locale to C or POSIX during initialization time, thus disabling any real locale awareness. + +## <a id="charactersetconversionbetweenserverandclient"></a>Character Set Conversion Between Server and Client + +HAWQ supports automatic character set conversion between server and client for certain character set combinations. The conversion information is stored in the master pg\_conversion system catalog table. HAWQ comes with some predefined conversions or you can create a new conversion using the SQL command CREATE CONVERSION. + +| Server Character Set | Available Client Sets | +|----------------------|--------------------------------------------------------------------------------------------------------------------------------| +| BIG5 | not supported as a server encoding | +| EUC\_CN | EUC\_CN, MULE\_INTERNAL, UTF8 | +| EUC\_JP | EUC\_JP, MULE\_INTERNAL, SJIS, UTF8 | +| EUC\_KR | EUC\_KR, MULE\_INTERNAL, UTF8 | +| EUC\_TW | EUC\_TW, BIG5, MULE\_INTERNAL, UTF8 | +| GB18030 | not supported as a server encoding | +| GBK | not supported as a server encoding | +| ISO\_8859\_5 | ISO\_8859\_5, KOI8, MULE\_INTERNAL, UTF8, WIN866, WIN1251 | +| ISO\_8859\_6 | ISO\_8859\_6, UTF8 | +| ISO\_8859\_7 | ISO\_8859\_7, UTF8 | +| ISO\_8859\_8 | ISO\_8859\_8, UTF8 | +| JOHAB | JOHAB, UTF8 | +| KOI8 | KOI8, ISO\_8859\_5, MULE\_INTERNAL, UTF8, WIN866, WIN1251 | +| LATIN1 | LATIN1, MULE\_INTERNAL, UTF8 | +| LATIN2 | LATIN2, MULE\_INTERNAL, UTF8, WIN1250 | +| LATIN3 | LATIN3, MULE\_INTERNAL, UTF8 | +| LATIN4 | LATIN4, MULE\_INTERNAL, UTF8 | +| LATIN5 | LATIN5, UTF8 | +| LATIN6 | LATIN6, UTF8 | +| LATIN7 | LATIN7, UTF8 | +| LATIN8 | LATIN8, UTF8 | +| LATIN9 | LATIN9, UTF8 | +| LATIN10 | LATIN10, UTF8 | +| MULE\_INTERNAL | MULE\_INTERNAL, BIG5, EUC\_CN, EUC\_JP, EUC\_KR, EUC\_TW, ISO\_8859\_5, KOI8, LATIN1 to LATIN4, SJIS, WIN866, WIN1250, WIN1251 | +| SJIS | not supported as a server encoding | +| SQL\_ASCII | not supported as a server encoding | +| UHC | not supported as a server encoding | +| UTF8 | all supported encodings | +| WIN866 | WIN866 | +| WIN874 | WIN874, UTF8 | +| WIN1250 | WIN1250, LATIN2, MULE\_INTERNAL, UTF8 | +| WIN1251 | WIN1251, ISO\_8859\_5, KOI8, MULE\_INTERNAL, UTF8, WIN866 | +| WIN1252 | WIN1252, UTF8 | +| WIN1253 | WIN1253, UTF8 | +| WIN1254 | WIN1254, UTF8 | +| WIN1255 | WIN1255, UTF8 | +| WIN1256 | WIN1256, UTF8 | +| WIN1257 | WIN1257, UTF8 | +| WIN1258 | WIN1258, UTF8 | + +To enable automatic character set conversion, you have to tell HAWQ the character set (encoding) you would like to use in the client. There are several ways to accomplish this: + +- Using the \\encoding command in psql, which allows you to change client encoding on the fly. +- Using SET client\_encoding TO. Setting the client encoding can be done with this SQL command: + + ``` sql + SET CLIENT_ENCODING TO 'value'; + ``` + + To query the current client encoding: + + ``` sql + SHOW client_encoding; + ``` + + To return the default encoding: + + ``` sql + RESET client_encoding; + ``` + +- Using the PGCLIENTENCODING environment variable. When PGCLIENTENCODING is defined in the client's environment, that client encoding is automatically selected when a connection to the server is made. (This can subsequently be overridden using any of the other methods mentioned above.) 
- Setting the configuration parameter client\_encoding. If client\_encoding is set in the master `hawq-site.xml` file, that client encoding is automatically selected when a connection to HAWQ is made. (This can subsequently be overridden using any of the other methods mentioned above.) + +If the conversion of a particular character is not possible (suppose you chose EUC\_JP for the server and LATIN1 for the client; some Japanese characters have no representation in LATIN1), an error is reported. + +If the client character set is defined as SQL\_ASCII, encoding conversion is disabled, regardless of the server's character set. The use of SQL\_ASCII is unwise unless you are working with all-ASCII data. SQL\_ASCII is not supported as a server encoding. + + http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/reference/HAWQDataTypes.html.md.erb ---------------------------------------------------------------------- diff --git a/markdown/reference/HAWQDataTypes.html.md.erb b/markdown/reference/HAWQDataTypes.html.md.erb new file mode 100644 index 0000000..fe5cff7 --- /dev/null +++ b/markdown/reference/HAWQDataTypes.html.md.erb @@ -0,0 +1,139 @@ +--- +title: Data Types +--- + +This topic provides a reference of the data types supported in HAWQ. + +HAWQ has a rich set of native data types available to users. Users may also define new data types using the `CREATE TYPE` command. This reference shows all of the built-in data types. In addition to the types listed here, there are also some internally used data types, such as **oid** (object identifier), but those are not documented in this guide. + +The following data types are specified by SQL: + +- array (*) +- bit +- bit varying +- boolean +- character varying +- char +- character +- date +- decimal +- double precision +- integer +- interval +- numeric +- real +- smallint +- time (with or without time zone) +- timestamp (with or without time zone) +- varchar + +**Note**(\*): HAWQ supports the array data type for append-only tables; parquet table storage does *not* support the array type. + +Each data type has an external representation determined by its input and output functions. Many of the built-in types have obvious external formats. However, several types are unique to HAWQ, such as geometric paths, or have several possibilities for formats, such as the date and time types. Some of the input and output functions are not invertible; that is, the result of an output function may lose accuracy when compared to the original input. + + + <span class="tablecap">Table 1. 
HAWQ Built-in Data Types</span> +======= + + +| Name | Alias | Size | Range | Description | +|--------------------------------------------|---------------------|-----------------------|---------------------------------------------|-----------------------------------------------------------------------------------| +| array | [ ] | variable (ignored) | multi-dimensional | any built-in or user-defined base type, enum type, or composite type | +| bigint | int8 | 8 bytes | -9223372036854775808 to 9223372036854775807 | large range integer | +| bigserial | serial8 | 8 bytes | 1 to 9223372036854775807 | large autoincrementing integer | +| bit \[ (n) \] |  | n bits | bit string constant | fixed-length bit string | +| bit varying \[ (n) \] | varbit | actual number of bits | bit string constant | variable-length bit string | +| boolean | bool | 1 byte | true/false, t/f, yes/no, y/n, 1/0 | logical Boolean (true/false) | +| box |  | 32 bytes | ((x1,y1),(x2,y2)) | rectangular box in the plane - not allowed in distribution key columns. | +| bytea |  | 1 byte + binarystring | sequence of octets | variable-length binary string | +| character \[ (n) \] | char  \[ (n) \] | 1 byte + n | strings up to n characters in length | fixed-length, blank padded | +| character varying \[ (n) \] | varchar \[ (n) \] | 1 byte + binarystring | strings up to n characters in length | variable-length with limit | +| cidr |  | 12 or 24 bytes |  | IPv4 networks | +| circle |  | 24 bytes | <(x,y),r> (center and radius) | circle in the plane - not allowed in distribution key columns. | +| date |  | 4 bytes | 4713 BC - 294,277 AD |  calendar date (year, month, day) | +| decimal \[ (p, s) \] | numeric \[ (p,s) \] | variable | no limit | user-specified, inexact | +| double precision | float8 float | 8 bytes | 15 decimal digits precision | variable-precision, inexact | +| inet |  | 12 or 24 bytes |  |  IPv4 hosts and networks | +| integer | int, int4 | 4 bytes | -2147483648 to +2147483647 | usual choice for integer | +| interval \[ (p) \] |  | 12 bytes | -178000000 years - 178000000 years | time span | +| lseg |  | 32 bytes | ((x1,y1),(x2,y2)) | line segment in the plane - not allowed in distribution key columns. | +| macaddr |  | 6 bytes |  | MAC addresses | +| money |  | 4 bytes | -21474836.48 to +21474836.47 | currency amount | +| path |  | 16+16n bytes | \[(x1,y1),...\] | geometric path in the plane - not allowed in distribution key columns. | +| point |  | 16 bytes | (x, y) | geometric path in the plane - not allowed in distribution key columns. | +| polygon |  | 40+16n bytes |  \[(x1,y1),...\] | closed geometric path in the plane - not allowed in the distribution key columns. 
| +| real | float4 | 4 bytes | 6 decimal digits precision |  variable-precision, inexact | +| serial | serial4 | 4 bytes | 1 to 2147483647 | autoincrementing integer | +| smallint | int2 | 2 bytes | -32768 to +32767 | small range integer | +| text |  | 1 byte + string size | strings of any length | variable unlimited length | +| time \[ (p) \] \[ without time zone \] |  | 8 bytes | 00:00:00\[.000000\] - 24:00:00\[.000000\] | time of day only | +| time \[ (p) \] with time zone | timetz | 12 bytes | 00:00:00+1359 - 24:00:00-1359 | time of day only, with time zone | +| timestamp \[ (p) \] \[without time zone \] |  | 8 bytes | 4713 BC - 294,277 AD | both date and time | +| timestamp \[ (p) \] with time zone | timestamptz | 8 bytes | 4713 BC - 294,277 AD | both date and time, with time zone | +| xml |  | 1 byte + xml size | xml of any length | variable unlimited length | + + +For variable length data types (such as char, varchar, text, xml, etc.) if the data is greater than or equal to 127 bytes, the storage overhead is 4 bytes instead of 1. + +**Note**: Use these documented built-in types when creating user tables. Any other data types that might be visible in the source code are for internal use only. + +## <a id="timezones"></a>Time Zones + +Time zones, and time-zone conventions, are influenced by political decisions, not just earth geometry. Time zones around the world became somewhat standardized during the 1900's, but continue to be prone to arbitrary changes, particularly with respect to daylight-savings rules. HAWQ uses the widely-used zoneinfo time zone database for information about historical time zone rules. For times in the future, the assumption is that the latest known rules for a given time zone will continue to be observed indefinitely far into the future. + +HAWQ is compatible with the SQL standard definitions for typical usage. However, the SQL standard has an odd mix of date and time types and capabilities. Two obvious problems are: + +- Although the date type cannot have an associated time zone, the time type can. Time zones in the real world have little meaning unless associated with a date as well as a time, since the offset can vary through the year with daylight-saving time boundaries. +- The default time zone is specified as a constant numeric offset from UTC. It is therefore impossible to adapt to daylight-saving time when doing date/time arithmetic across DST boundaries. + +To address these difficulties, use date/time types that contain both date and time when using time zones. Do not use the type time with time zone (although HAWQ supports this for legacy applications and for compliance with the SQL standard). HAWQ assumes your local time zone for any type containing only date or time. + +All timezone-aware dates and times are stored internally in UTC. They are converted to local time in the zone specified by the timezone configuration parameter before being displayed to the client. + +HAWQ allows you to specify time zones in three different forms: + +- A full time zone name, for example America/New\_York. HAWQ uses the widely-used zoneinfo time zone data for this purpose, so the same names are also recognized by much other software. +- A time zone abbreviation, for example PST. Such a specification merely defines a particular offset from UTC, in contrast to full time zone names which can imply a set of daylight savings transition-date rules as well. 
You cannot set the configuration parameters timezone or log\_timezone to a time zone abbreviation, but you can use abbreviations in date/time input values and with the AT TIME ZONE operator. +- In addition to the timezone names and abbreviations, HAWQ accepts POSIX-style time zone specifications of the form STDoffset or STDoffsetDST, where STD is a zone abbreviation, offset is a numeric offset in hours west from UTC, and DST is an optional daylight-savings zone abbreviation, assumed to stand for one hour ahead of the given offset. For example, if EST5EDT were not already a recognized zone name, it would be accepted and would be functionally equivalent to United States East Coast time. When a daylight-savings zone name is present, it is assumed to be used according to the same daylight-savings transition rules used in the zoneinfo time zone database's posixrules entry. In a standard HAWQ installation, posixrules is the same as US/Eastern, so that POSIX-style time zone specifications follow USA daylight-savings rules. If needed, you can adjust this behavior by replacing the posixrules file. + +In short, this is the difference between abbreviations and full names: abbreviations always represent a fixed offset from UTC, whereas most of the full names imply a local daylight-savings time rule, and so have two possible UTC offsets. + +Be aware that the POSIX-style time zone feature can lead to silently accepting bogus input, since there is no check on the reasonableness of the zone abbreviations. For example, SET TIMEZONE TO FOOBAR0 will work, leaving the system effectively using a rather peculiar abbreviation for UTC. Another issue to keep in mind is that in POSIX time zone names, positive offsets are used for locations west of Greenwich. Everywhere else, PostgreSQL follows the ISO-8601 convention that positive timezone offsets are east of Greenwich. + +In all cases, timezone names are recognized case-insensitively. + +Neither full names nor abbreviations are hard-wired into the server; see [Date and Time Configuration Files](#dateandtimeconfigurationfiles). + +The timezone configuration parameter can be set in the file `hawq-site.xml`. There are also several special ways to set it: + +- If timezone is not specified in `hawq-site.xml` or as a server command-line option, the server attempts to use the value of the TZ environment variable as the default time zone. If TZ is not defined or is not any of the time zone names known to PostgreSQL, the server attempts to determine the operating system's default time zone by checking the behavior of the C library function localtime(). The default time zone is selected as the closest match from the known time zones. +- The SQL command SET TIME ZONE sets the time zone for the session. This is an alternative spelling of SET TIMEZONE TO with a more SQL-spec-compatible syntax. +- The PGTZ environment variable is used by libpq clients to send a SET TIME ZONE command to the server upon connection. + +## <a id="dateandtimeconfigurationfiles"></a>Date and Time Configuration Files + +Since timezone abbreviations are not well standardized, HAWQ provides a means to customize the set of abbreviations accepted by the server. The timezone\_abbreviations run-time parameter determines the active set of abbreviations. 
While this parameter can be altered by any database user, the possible values for it are under the control of the database administrator; they are in fact names of configuration files stored in .../share/timezonesets/ of the installation directory. By adding or altering files in that directory, the administrator can set local policy for timezone abbreviations. + +timezone\_abbreviations can be set to any file name found in .../share/timezonesets/, if the file's name is entirely alphabetic. (The prohibition against non-alphabetic characters in timezone\_abbreviations prevents reading files outside the intended directory, as well as reading editor backup files and other extraneous files.) + +A timezone abbreviation file can contain blank lines and comments beginning with \#. Non-comment lines must have one of these formats: + +
+``` pre
+time_zone_name  offset
+time_zone_name  offset  D
+@INCLUDE file_name
+@OVERRIDE
+```
+ +A time\_zone\_name is just the abbreviation being defined. The offset is the zone's offset in seconds from UTC, positive being east from Greenwich and negative being west. For example, -18000 would be five hours west of Greenwich, or North American east coast standard time. D indicates that the zone name represents local daylight-savings time rather than standard time. Since all known time zone offsets are on 15 minute boundaries, the number of seconds has to be a multiple of 900. + +The @INCLUDE syntax allows inclusion of another file in the .../share/timezonesets/ directory. Inclusion can be nested, to a limited depth. + +The @OVERRIDE syntax indicates that subsequent entries in the file can override previous entries (that is, entries obtained from included files). Without this, conflicting definitions of the same timezone abbreviation are considered an error. + +In an unmodified installation, the file Default contains all the non-conflicting time zone abbreviations for most of the world. Additional files Australia and India are provided for those regions: these files first include the Default file and then add or modify timezones as needed. + +For reference purposes, a standard installation also contains files Africa.txt, America.txt, and so on, containing information about every time zone abbreviation known to be in use according to the zoneinfo timezone database. The zone name definitions found in these files can be copied and pasted into a custom configuration file as needed. + +**Note:** These files cannot be directly referenced as timezone\_abbreviations settings, because of the dot embedded in their names. + + http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/reference/HAWQEnvironmentVariables.html.md.erb ---------------------------------------------------------------------- diff --git a/markdown/reference/HAWQEnvironmentVariables.html.md.erb b/markdown/reference/HAWQEnvironmentVariables.html.md.erb new file mode 100644 index 0000000..ce21798 --- /dev/null +++ b/markdown/reference/HAWQEnvironmentVariables.html.md.erb @@ -0,0 +1,97 @@ +--- +title: Environment Variables +--- + +This topic contains a reference of the environment variables that you set for HAWQ. + +Set these in your user's startup shell profile (such as `~/.bashrc` or `~/.bash_profile`), or in `/etc/profile` if you want to set them for all users. + +## <a id="requiredenvironmentvariables"></a>Required Environment Variables + +**Note:** `GPHOME`, `PATH`, and `LD_LIBRARY_PATH` can be set by sourcing the `greenplum_path.sh` file from your HAWQ installation directory. 
+ +### <a id="gphome"></a>GPHOME + +This is the installed location of your HAWQ software. For example: + +``` pre +GPHOME=/usr/local/hawq +export GPHOME +``` + +### <a id="path"></a>PATH + +Your `PATH` environment variable should include the location of the HAWQ `bin` directory. For example: + +``` pre +PATH=$GPHOME/bin:$PATH +export PATH +``` + +### <a id="ld_library_path"></a>LD\_LIBRARY\_PATH + +The `LD_LIBRARY_PATH` environment variable should point to the location of the HAWQ/PostgreSQL library files. For example: + +``` pre +LD_LIBRARY_PATH=$GPHOME/lib +export LD_LIBRARY_PATH +``` + +## <a id="optionalenvironmentvariables"></a>Optional Environment Variables + +The following HAWQ environment variables are optional. Adding the connection-related variables to your shell profile saves you from typing the corresponding options on the command line for every client connection; a sample profile snippet follows the variable descriptions below. Note that these environment variables should be set on the HAWQ master host only. + + +### <a id="pgappname"></a>PGAPPNAME + +The name of the application, usually set by an application when it connects to the server. This name is displayed in the activity view and in log entries. The `PGAPPNAME` environment variable behaves the same as the `application_name` connection parameter. The default value for `application_name` is `psql`. The name cannot be longer than 63 characters. + +### <a id="pgdatabase"></a>PGDATABASE + +The name of the default database to use when connecting. + +### <a id="pghost"></a>PGHOST + +The HAWQ master host name. + +### <a id="pghostaddr"></a>PGHOSTADDR + +The numeric IP address of the master host. This can be set instead of, or in addition to, `PGHOST`, to avoid DNS lookup overhead. + +### <a id="pgpassword"></a>PGPASSWORD + +The password used if the server demands password authentication. Use of this environment variable is not recommended, for security reasons (some operating systems allow non-root users to see process environment variables via ps). Instead, consider using the `~/.pgpass` file. + +### <a id="pgpassfile"></a>PGPASSFILE + +The name of the password file to use for lookups. If not set, it defaults to `~/.pgpass`. + +See The Password File under [Configuring Client Authentication](../clientaccess/client_auth.html). + +### <a id="pgoptions"></a>PGOPTIONS + +Sets additional configuration parameters for the HAWQ master server. + +### <a id="pgport"></a>PGPORT + +The port number of the HAWQ server on the master host. The default port is 5432. + +### <a id="pguser"></a>PGUSER + +The HAWQ user name used to connect. + +### <a id="pgdatestyle"></a>PGDATESTYLE + +Sets the default style of date/time representation for a session. (Equivalent to `SET datestyle TO ...`.) + +### <a id="pgtz"></a>PGTZ + +Sets the default time zone for a session. (Equivalent to `SET timezone TO ...`.) + +### <a id="pgclientencoding"></a>PGCLIENTENCODING + +Sets the default client character set encoding for a session. (Equivalent to `SET client_encoding TO ...`.)
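For example, here is a minimal profile snippet for the connection-related variables. The master host name matches the sample configuration later in this reference, the port is the default, and the database and role names are purely illustrative:

``` pre
# Connection defaults picked up by psql and other libpq clients.
export PGHOST=hawq.example.com
export PGPORT=5432
export PGDATABASE=postgres
export PGUSER=gpadmin
```

With these set, running `psql` with no options connects as the specified role to the specified database on the master.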
+ + http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/reference/HAWQSampleSiteConfig.html.md.erb ---------------------------------------------------------------------- diff --git a/markdown/reference/HAWQSampleSiteConfig.html.md.erb b/markdown/reference/HAWQSampleSiteConfig.html.md.erb new file mode 100644 index 0000000..d4cae5a --- /dev/null +++ b/markdown/reference/HAWQSampleSiteConfig.html.md.erb @@ -0,0 +1,120 @@ +--- +title: Sample hawq-site.xml Configuration File +--- + +```xml +<configuration> + <property> + <name>default_hash_table_bucket_number</name> + <value>18</value> + </property> + + <property> + <name>hawq_dfs_url</name> + <value>hawq.example.com:8020/hawq_default</value> + </property> + + <property> + <name>hawq_global_rm_type</name> + <value>none</value> + </property> + + <property> + <name>hawq_master_address_host</name> + <value>hawq.example.com</value> + </property> + + <property> + <name>hawq_master_address_port</name> + <value>5432</value> + </property> + + <property> + <name>hawq_master_directory</name> + <value>/data/hawq/master</value> + </property> + + <property> + <name>hawq_master_temp_directory</name> + <value>/tmp/hawq/master</value> + </property> + + <property> + <name>hawq_re_cgroup_hierarchy_name</name> + <value>hawq</value> + </property> + + <property> + <name>hawq_re_cgroup_mount_point</name> + <value>/sys/fs/cgroup</value> + </property> + + <property> + <name>hawq_re_cpu_enable</name> + <value>false</value> + </property> + + <property> + <name>hawq_rm_memory_limit_perseg</name> + <value>64GB</value> + </property> + + <property> + <name>hawq_rm_nvcore_limit_perseg</name> + <value>16</value> + </property> + + <property> + <name>hawq_rm_nvseg_perquery_limit</name> + <value>512</value> + </property> + + <property> + <name>hawq_rm_nvseg_perquery_perseg_limit</name> + <value>6</value> + </property> + + <property> + <name>hawq_rm_yarn_address</name> + <value>rm.example.com:8050</value> + </property> + + <property> + <name>hawq_rm_yarn_app_name</name> + <value>hawq</value> + </property> + + <property> + <name>hawq_rm_yarn_queue_name</name> + <value>default</value> + </property> + + <property> + <name>hawq_rm_yarn_scheduler_address</name> + <value>rm.example.com:8030</value> + </property> + + <property> + <name>hawq_segment_address_port</name> + <value>40000</value> + </property> + + <property> + <name>hawq_segment_directory</name> + <value>/data/hawq/segment</value> + </property> + + <property> + <name>hawq_segment_temp_directory</name> + <value>/tmp/hawq/segment</value> + </property> + + <property> + <name>hawq_standby_address_host</name> + <value>standbyhost.example.com</value> + </property> + +</configuration> +``` + + http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/reference/HAWQSiteConfig.html.md.erb ---------------------------------------------------------------------- diff --git a/markdown/reference/HAWQSiteConfig.html.md.erb b/markdown/reference/HAWQSiteConfig.html.md.erb new file mode 100644 index 0000000..3d20297 --- /dev/null +++ b/markdown/reference/HAWQSiteConfig.html.md.erb @@ -0,0 +1,23 @@ +--- +title: Server Configuration Parameter Reference +--- + +This section describes all server configuration parameters that are available in HAWQ. + +Configuration parameters are located in `$GPHOME/etc/hawq-site.xml`.
This configuration file resides on all HAWQ instances and is managed either by Ambari or by using the `hawq config` utility; a brief example of checking and changing a parameter from the command line follows the topic list below. On HAWQ clusters installed and managed by Ambari, always use the Ambari administration interface, and not `hawq config`, to configure HAWQ properties; Ambari overwrites any changes made using `hawq config`. + +The same configuration file is used cluster-wide, on the master and on all segments. + +**Note:** While `postgresql.conf` still exists in HAWQ, any parameters defined in `hawq-site.xml` override the corresponding settings in `postgresql.conf`. For this reason, we recommend that you use only `hawq-site.xml` to configure your HAWQ cluster. + +**Note:** If you install and manage HAWQ using Ambari, be aware that any property changes to `hawq-site.xml` made using the command line could be overwritten by Ambari. For Ambari-managed HAWQ clusters, always use the Ambari administration interface to set or change HAWQ configuration properties. + +- **[About Server Configuration Parameters](../reference/guc/guc_config.html)** + +- **[Configuration Parameter Categories](../reference/guc/guc_category-list.html)** + +- **[Configuration Parameters](../reference/guc/parameter_definitions.html)** + +- **[Sample hawq-site.xml Configuration File](../reference/HAWQSampleSiteConfig.html)** + +
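For example, on a cluster that is not managed by Ambari, you might confirm a parameter's active value before changing it. This is a minimal sketch: the parameter name comes from the sample file above, the `psql` call assumes a connection to the master, and the `hawq config` options shown (`-s` to show a value, `-c` with `-v` to change one) are assumptions to verify against the utility's help output on your HAWQ version:

``` pre
# Show the value the server currently uses for a parameter.
psql -d postgres -c 'SHOW default_hash_table_bucket_number;'

# Show, then change, the value recorded in hawq-site.xml
# (option syntax assumed; verify with the hawq config help output).
hawq config -s default_hash_table_bucket_number
hawq config -c default_hash_table_bucket_number -v 24
```

As the notes above caution, on Ambari-managed clusters such command-line changes can be overwritten by Ambari.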