http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/overview/TableDistributionStorage.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/overview/TableDistributionStorage.html.md.erb 
b/markdown/overview/TableDistributionStorage.html.md.erb
new file mode 100755
index 0000000..ec1d8b5
--- /dev/null
+++ b/markdown/overview/TableDistributionStorage.html.md.erb
@@ -0,0 +1,41 @@
+---
+title: Table Distribution and Storage
+---
+
+HAWQ stores all table data, except for system tables, in HDFS. When a user 
creates a table, the metadata is stored on the master's local file system and 
the table content is stored in HDFS.
+
+To simplify table data management, all of the data for a relation is 
saved under one HDFS folder.
+
+For both HAWQ table storage formats, AO \(Append-Only\) and Parquet, the data 
files are splittable, so HAWQ can assign multiple virtual segments to 
consume one data file concurrently. This increases the degree of query 
parallelism.
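+
+For illustration, the following minimal statements (the table and column 
names are hypothetical) create one table in each storage format:
+
+``` sql
+-- AO (row-oriented append-only) storage
+CREATE TABLE t_ao (id int, val text)
+    WITH (appendonly=true);
+
+-- Parquet storage
+CREATE TABLE t_parquet (id int, val text)
+    WITH (appendonly=true, orientation=parquet);
+```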
+
+## Table Distribution Policy
+
+The default table distribution policy in HAWQ is random.
+
+Randomly distributed tables have some benefits over hash-distributed tables. 
For example, after a cluster expansion, HAWQ can use the additional resources 
automatically, without redistributing the data. Redistributing a huge table is 
very expensive. Data locality for randomly distributed tables is also better 
after the underlying HDFS redistributes its data during a rebalance or after 
DataNode failures, which are common in large clusters.
+
+On the other hand, hash-distributed tables are faster than randomly 
distributed tables for some queries. For example, hash-distributed tables show 
performance benefits on some TPC-H queries. Choose the distribution policy 
that is best suited to your application's scenario.
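+
+For example, the following hypothetical statements create one randomly 
distributed table and one hash-distributed table:
+
+``` sql
+-- random distribution (the HAWQ default)
+CREATE TABLE sales_random (id int, amount float)
+    DISTRIBUTED RANDOMLY;
+
+-- hash distribution on the id column
+CREATE TABLE sales_hash (id int, amount float)
+    DISTRIBUTED BY (id);
+```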
+
+See [Choosing the Table Distribution Policy](../ddl/ddl-table.html) for more 
details.
+
+## Data Locality
+
+Data is distributed across HDFS DataNodes. Since remote read involves network 
I/O, a data locality algorithm improves the local read ratio. HAWQ considers 
three aspects when allocating data blocks to virtual segments:
+
+-   Ratio of local read
+-   Continuity of file read
+-   Data balance among virtual segments
+
+## External Data Access
+
+HAWQ can access data in external files using the HAWQ Extension Framework 
(PXF).
+PXF is an extensible framework that allows HAWQ to access data in external
+sources as readable or writable HAWQ tables. PXF has built-in connectors for
+accessing data inside HDFS files, Hive tables, and HBase tables. PXF also
+integrates with HCatalog to query Hive tables directly. See [Using PXF
+with Unmanaged Data](../pxf/HawqExtensionFrameworkPXF.html) for more
+details.
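+
+As a sketch of what this looks like in practice, the following hypothetical 
external table reads comma-delimited text files from HDFS through PXF (the 
host, port, path, and profile shown are assumptions for illustration):
+
+``` sql
+CREATE EXTERNAL TABLE ext_sales (id int, amount float)
+    LOCATION ('pxf://namenode:51200/data/sales?Profile=HdfsTextSimple')
+    FORMAT 'TEXT' (DELIMITER ',');
+```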
+
+Users can create custom PXF connectors to access other parallel data stores or
+processing engines. Connectors are Java plug-ins that use the PXF API. For more
+information see [PXF External Tables and 
API](../pxf/PXFExternalTableandAPIReference.html).

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/overview/system-overview.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/overview/system-overview.html.md.erb 
b/markdown/overview/system-overview.html.md.erb
new file mode 100644
index 0000000..9fc1c53
--- /dev/null
+++ b/markdown/overview/system-overview.html.md.erb
@@ -0,0 +1,11 @@
+---
+title: Apache HAWQ (Incubating) System Overview
+---
+* <a href="./HAWQOverview.html" class="subnav">What is HAWQ?</a>
+* <a href="./HAWQArchitecture.html" class="subnav">HAWQ Architecture</a>
+* <a href="./TableDistributionStorage.html" class="subnav">Table Distribution 
and Storage</a>
+* <a href="./ElasticSegments.html" class="subnav">Elastic Virtual Segment 
Allocation</a>
+* <a href="./ResourceManagement.html" class="subnav">Resource Management</a>
+* <a href="./HDFSCatalogCache.html" class="subnav">HDFS Catalog Cache</a>
+* <a href="./ManagementTools.html" class="subnav">Management Tools</a>
+* <a href="./RedundancyFailover.html" class="subnav">Redundancy and Fault 
Tolerance</a>

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/plext/UsingProceduralLanguages.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/plext/UsingProceduralLanguages.html.md.erb 
b/markdown/plext/UsingProceduralLanguages.html.md.erb
new file mode 100644
index 0000000..bef1b93
--- /dev/null
+++ b/markdown/plext/UsingProceduralLanguages.html.md.erb
@@ -0,0 +1,23 @@
+---
+title: Using Languages and Extensions in HAWQ
+---
+
+HAWQ supports user-defined functions that are created with the SQL and C 
built-in languages, and also supports user-defined aliases for internal 
functions.
+
+HAWQ also supports user-defined functions written in languages other than SQL 
and C. These other languages are generically called *procedural languages* 
(PLs) and are extensions to the core HAWQ functionality. HAWQ specifically 
supports the PL/Java, PL/Perl, PL/pgSQL, PL/Python, and PL/R procedural 
languages. 
+
+HAWQ additionally provides the `pgcrypto` extension for password hashing and 
data encryption.
+
+This chapter describes these languages and extensions:
+
+-   <a href="builtin_langs.html">Using HAWQ Built-In Languages</a>
+-   <a href="using_pljava.html">Using PL/Java</a>
+-   <a href="using_plperl.html">Using PL/Perl</a>
+-   <a href="using_plpgsql.html">Using PL/pgSQL</a>
+-   <a href="using_plpython.html">Using PL/Python</a>
+-   <a href="using_plr.html">Using PL/R</a>
+-   <a href="using_pgcrypto.html">Using pgcrypto</a>
+
+
+
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/plext/builtin_langs.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/plext/builtin_langs.html.md.erb 
b/markdown/plext/builtin_langs.html.md.erb
new file mode 100644
index 0000000..01891e8
--- /dev/null
+++ b/markdown/plext/builtin_langs.html.md.erb
@@ -0,0 +1,110 @@
+---
+title: Using HAWQ Built-In Languages
+---
+
+This section provides an introduction to using the HAWQ built-in languages.
+
+HAWQ supports user-defined functions created with the SQL and C built-in 
languages. HAWQ also supports user-defined aliases for internal functions.
+
+
+## <a id="enablebuiltin"></a>Enabling Built-in Language Support
+
+Support for SQL and C language user-defined functions and aliasing of internal 
functions is enabled by default for all HAWQ databases.
+
+## <a id="builtinsql"></a>Defining SQL Functions
+
+SQL functions execute an arbitrary list of SQL statements. The SQL statements 
in the body of a SQL function must be separated by semicolons. The final 
statement in a non-void-returning SQL function must be a 
[SELECT](../reference/sql/SELECT.html) that returns data of the type specified 
by the function's return type. The function returns a single row or a set of 
rows corresponding to this last SQL query.
+
+The following example creates and calls a SQL function to count the number of 
rows of the table named `orders`:
+
+``` sql
+gpadmin=# CREATE FUNCTION count_orders() RETURNS bigint AS $$
+ SELECT count(*) FROM orders;
+$$ LANGUAGE SQL;
+CREATE FUNCTION
+gpadmin=# SELECT count_orders();
+ count_orders 
+--------------
+       830513
+(1 row)
+```
+
+For additional information about creating SQL functions, refer to [Query 
Language (SQL) 
Functions](https://www.postgresql.org/docs/8.2/static/xfunc-sql.html) in the 
PostgreSQL documentation.
+
+## <a id="builtininternal"></a>Aliasing Internal Functions
+
+Many HAWQ internal functions are written in C. These functions are declared 
during initialization of the database cluster and statically linked to the HAWQ 
server. See [Built-in Functions and 
Operators](../query/functions-operators.html#topic29) for detailed information 
about HAWQ internal functions.
+
+You cannot define new internal functions, but you can create aliases for 
existing internal functions.
+
+The following example creates a new function named `all_caps` that is an alias 
for the `upper` HAWQ internal function:
+
+
+``` sql
+gpadmin=# CREATE FUNCTION all_caps (text) RETURNS text AS 'upper'
+            LANGUAGE internal STRICT;
+CREATE FUNCTION
+gpadmin=# SELECT all_caps('change me');
+ all_caps  
+-----------
+ CHANGE ME
+(1 row)
+
+```
+
+For more information about aliasing internal functions, refer to [Internal 
Functions](https://www.postgresql.org/docs/8.2/static/xfunc-internal.html) in 
the PostgreSQL documentation.
+
+## <a id="builtinc_lang"></a>Defining C Functions
+
+You must compile user-defined functions written in C into shared libraries so 
that the HAWQ server can load them on demand. This dynamic loading 
distinguishes C language functions from internal functions that are written in 
C.
+
+The [CREATE FUNCTION](../reference/sql/CREATE-FUNCTION.html) call for a 
user-defined C function must include both the name of the shared library and 
the name of the function.
+
+If an absolute path to the shared library is not provided, HAWQ attempts to 
locate the library relative to the following, in order: 
+
+1. the HAWQ PostgreSQL library directory (obtained via the `pg_config --pkglibdir` command)
+2. the `dynamic_library_path` configuration value
+3. the current working directory
+
+Example:
+
+``` c
+#include "postgres.h"
+#include "fmgr.h"
+
+#ifdef PG_MODULE_MAGIC
+PG_MODULE_MAGIC;
+#endif
+
+PG_FUNCTION_INFO_V1(double_it);
+         
+Datum
+double_it(PG_FUNCTION_ARGS)
+{
+    int32   arg = PG_GETARG_INT32(0);
+
+    PG_RETURN_INT32(arg + arg);
+}
+```
+
+If the above function is compiled into a shared object named `libdoubleit.so` 
located in `/share/libs`, you would register and invoke the function with HAWQ 
as follows:
+
+``` sql
+gpadmin=# CREATE FUNCTION double_it_c(integer) RETURNS integer
+            AS '/share/libs/libdoubleit', 'double_it'
+            LANGUAGE C STRICT;
+CREATE FUNCTION
+gpadmin=# SELECT double_it_c(27);
+ double_it 
+-----------
+        54
+(1 row)
+
+```
+
+The shared library `.so` extension may be omitted.
+
+For additional information about using the C language to create functions, 
refer to [C-Language 
Functions](https://www.postgresql.org/docs/8.2/static/xfunc-c.html) in the 
PostgreSQL documentation.
+

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/plext/using_pgcrypto.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/plext/using_pgcrypto.html.md.erb 
b/markdown/plext/using_pgcrypto.html.md.erb
new file mode 100644
index 0000000..e3e9225
--- /dev/null
+++ b/markdown/plext/using_pgcrypto.html.md.erb
@@ -0,0 +1,32 @@
+---
+title: Enabling Cryptographic Functions for PostgreSQL (pgcrypto)
+---
+
+`pgcrypto` is a package extension included in your HAWQ distribution. You must 
explicitly enable the cryptographic functions to use this extension.
+
+## <a id="pgcryptoprereq"></a>Prerequisites 
+
+
+Before you enable the `pgcrypto` software package, make sure that your HAWQ 
database is running, you have sourced `greenplum_path.sh`, and that the 
`$GPHOME` environment variable is set.
+
+## <a id="enablepgcrypto"></a>Enable pgcrypto 
+
+On every database in which you want to enable `pgcrypto`, run the following 
command:
+
+``` shell
+$ psql -d <dbname> -f $GPHOME/share/postgresql/contrib/pgcrypto.sql
+```
+       
+Replace \<dbname\> with the name of the target database.
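+
+Once enabled, the `pgcrypto` functions are available in that database. For 
example, the following illustrative queries compute a SHA-256 digest of a 
string and hash a password with a generated salt:
+
+``` sql
+SELECT digest('hello', 'sha256');
+SELECT crypt('mypassword', gen_salt('md5'));
+```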
+       
+## <a id="uninstallpgcrypto"></a>Disable pgcrypto 
+
+The `uninstall_pgcrypto.sql` script removes `pgcrypto` objects from your 
database.  On each database in which you enabled `pgcrypto` support, execute 
the following:
+
+``` shell
+$ psql -d <dbname> -f $GPHOME/share/postgresql/contrib/uninstall_pgcrypto.sql
+```
+
+Replace \<dbname\> with the name of the target database.
+       
+**Note:**  This script does not remove dependent user-created objects.

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/plext/using_pljava.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/plext/using_pljava.html.md.erb 
b/markdown/plext/using_pljava.html.md.erb
new file mode 100644
index 0000000..99b5767
--- /dev/null
+++ b/markdown/plext/using_pljava.html.md.erb
@@ -0,0 +1,709 @@
+---
+title: Using PL/Java
+---
+
+This section contains an overview of the HAWQ PL/Java language. 
+
+
+## <a id="aboutpljava"></a>About PL/Java 
+
+With the HAWQ PL/Java extension, you can write Java methods using your 
favorite Java IDE and install the JAR files that implement the methods in your 
HAWQ cluster.
+
+**Note**: If building HAWQ from source, you must specify PL/Java as a build 
option when compiling HAWQ. To use PL/Java in a HAWQ deployment, you must 
explicitly enable the PL/Java extension in all desired databases.  
+
+The HAWQ PL/Java package is based on the open source PL/Java 1.4.0. HAWQ 
PL/Java provides the following features.
+
+- Ability to execute PL/Java functions with Java 1.6 or 1.7.
+- Standardized utilities (modeled after the SQL 2003 proposal) to install and 
maintain Java code in the database.
+- Standardized mappings of parameters and results. Complex types as well as 
sets are supported.
+- An embedded, high performance, JDBC driver utilizing the internal HAWQ 
Database SPI routines.
+- Metadata support for the JDBC driver. Both `DatabaseMetaData` and 
`ResultSetMetaData` are included.
+- The ability to return a `ResultSet` from a query as an alternative to 
building a ResultSet row by row.
+- Full support for savepoints and exception handling.
+- The ability to use IN, INOUT, and OUT parameters.
+- Two separate HAWQ languages:
+       - pljava, TRUSTED PL/Java language
+       - pljavau, UNTRUSTED PL/Java language
+- Transaction and Savepoint listeners enabling code execution when a 
transaction or savepoint is committed or rolled back.
+- Integration with GNU GCJ on selected platforms.
+
+A SQL function definition references a static method in a Java class. In order 
for the function to execute, the referenced class must be available on the 
class path specified by the HAWQ server configuration parameter 
`pljava_classpath`. The PL/Java extension adds a set of functions that help 
install and maintain the Java classes. Classes are stored in normal Java 
archives (JAR files). A JAR file can optionally contain a deployment descriptor 
that in turn contains SQL commands to be executed when the JAR is deployed or 
undeployed. The functions are modeled after the standards proposed for SQL 2003.
+
+PL/Java implements a standard way of passing parameters and return values. 
Complex types and sets are passed using the standard JDBC ResultSet class.
+
+A JDBC driver is included in PL/Java. This driver calls HAWQ internal SPI 
routines. The driver is essential since it is common for functions to make 
calls back to the database to fetch data. When PL/Java functions fetch data, 
they must use the same transactional boundaries that are used by the main 
function that entered PL/Java execution context.
+
+PL/Java is optimized for performance. The Java virtual machine executes within 
the same process as the backend to minimize call overhead. PL/Java is designed 
to bring the power of Java to the database itself, so that database-intensive 
business logic can execute as close to the actual data as possible.
+
+The standard Java Native Interface (JNI) is used when bridging calls between 
the backend and the Java VM.
+
+
+## <a id="abouthawqpljava"></a>About HAWQ PL/Java 
+
+There are a few key differences between the implementation of PL/Java in 
standard PostgreSQL and HAWQ.
+
+### <a id="pljavafunctions"></a>Functions 
+
+The following functions are not supported in HAWQ. The classpath is handled 
differently in a distributed HAWQ environment than in the PostgreSQL 
environment.
+
+- sqlj.install_jar
+- sqlj.replace_jar
+- sqlj.remove_jar
+- sqlj.get_classpath
+- sqlj.set_classpath
+
+HAWQ uses the `pljava_classpath` server configuration parameter in place of 
the `sqlj.set_classpath` function.
+
+### <a id="serverconfigparams"></a>Server Configuration Parameters 
+
+PL/Java uses server configuration parameters to configure classpath, Java VM, 
and other options. Refer to the [Server Configuration Parameter 
Reference](../reference/HAWQSiteConfig.html) for general information about HAWQ 
server configuration parameters.
+
+The following server configuration parameters are used by PL/Java in HAWQ. 
These parameters replace the `pljava.*` parameters that are used in the 
standard PostgreSQL PL/Java implementation.
+
+#### pljava\_classpath
+
+A colon-separated (:) list of the JAR files containing the Java classes used 
in any PL/Java functions. The JAR files must be installed in the same locations 
on all HAWQ hosts. With the trusted PL/Java language handler, JAR file paths 
must be relative to the `$GPHOME/lib/postgresql/java/` directory. With the 
untrusted language handler (javaU language tag), paths may be relative to 
`$GPHOME/lib/postgresql/java/` or absolute.
+
+#### pljava\_statement\_cache\_size
+
+Sets the size in KB of the Most Recently Used (MRU) cache for prepared 
statements.
+
+#### pljava\_release\_lingering\_savepoints
+
+If TRUE, lingering savepoints will be released on function exit. If FALSE, 
they will be rolled back.
+
+#### pljava\_vmoptions
+
+Defines the startup options for the Java VM.
+
+### <a id="setting_serverconfigparams"></a>Setting PL/Java Configuration 
Parameters 
+
+You can set PL/Java server configuration parameters at the session level, or 
globally across your whole cluster. Your HAWQ cluster configuration must be 
reloaded after setting a server configuration value globally.
+
+#### <a id="setsrvrcfg_global"></a>Cluster Level
+
+The procedure that you use to set a PL/Java server configuration parameter 
for your whole HAWQ cluster depends upon whether you manage your cluster from 
the command line or with Ambari. If you use Ambari to manage your HAWQ 
cluster, you must ensure that you update server configuration parameters only 
via the Ambari Web UI. If you manage your HAWQ cluster from the command line, 
use the `hawq config` command line utility to set PL/Java server 
configuration parameters.
+
+The following examples add a JAR file named `myclasses.jar` to the 
`pljava_classpath` server configuration parameter for the entire HAWQ cluster.
+
+If you use Ambari to manage your HAWQ cluster:
+
+1. Set the `pljava_classpath` configuration property to include 
`myclasses.jar` via the HAWQ service **Configs > Advanced > Custom hawq-site** 
drop down. 
+2. Select **Service Actions > Restart All** to load the updated configuration.
+
+If you manage your HAWQ cluster from the command line:
+
+1.  Log in to the HAWQ master host as a HAWQ administrator and source the file 
`/usr/local/hawq/greenplum_path.sh`.
+
+    ``` shell
+    $ source /usr/local/hawq/greenplum_path.sh
+    ```
+
+1. Use the `hawq config` utility to set `pljava_classpath`:
+
+    ``` shell
+    $ hawq config -c pljava_classpath -v \'myclasses.jar\'
+    ```
+2. Reload the HAWQ configuration:
+
+    ``` shell
+    $ hawq stop cluster -u
+    ```
+
+#### <a id="setsrvrcfg_session"></a>Session Level 
+
+To set a PL/Java server configuration parameter for only the *current* 
database session, set the parameter within the `psql` subsystem. For example, 
to set `pljava_classpath`:
+       
+``` sql
+=> SET pljava_classpath='myclasses.jar';
+```
+
+
+## <a id="enablepljava"></a>Enabling and Removing PL/Java Support 
+
+The PL/Java extension must be explicitly enabled on each database in which it 
will be used.
+
+
+### <a id="pljavaprereq"></a>Prerequisites 
+
+Before you enable PL/Java:
+
+1. Ensure that you have installed a supported Java runtime environment and 
that the `$JAVA_HOME` variable is set to the same path on the master and all 
segment nodes.
+
+2. Perform the following step on all machines to set up `ldconfig` for the 
installed JDK:
+
+       ``` shell
+       $ echo "$JAVA_HOME/jre/lib/amd64/server" > /etc/ld.so.conf.d/libjdk.conf
+       $ ldconfig
+       ```
+3. Make sure that your HAWQ cluster is running, you have sourced 
`greenplum_path.sh`, and that your `$GPHOME` environment variable is set.
+
+
+### <a id="enablepljava_install"></a>Enable PL/Java and Install JAR Files 
+
+To use PL/Java:
+
+1. Enable the language for each database.
+1. Install user-created JAR files on all HAWQ hosts.
+1. Add the names of the JAR files to the HAWQ `pljava_classpath` server 
configuration parameter. This parameter value should identify a list of the 
installed JAR files.
+
+
+Perform the following steps as the `gpadmin` user:
+
+1. Enable PL/Java by running the `$GPHOME/share/postgresql/pljava/install.sql` 
SQL script in the databases that will use PL/Java. The `install.sql` script 
registers both the trusted and untrusted PL/Java languages. For example, the 
following command enables PL/Java on a database named `testdb`:
+
+       ``` shell
+       $ psql -d testdb -f $GPHOME/share/postgresql/pljava/install.sql
+       ```
+       
+       To enable the PL/Java extension in all new HAWQ databases, run the 
script on the `template1` database: 
+
+    ``` shell
+    $ psql -d template1 -f $GPHOME/share/postgresql/pljava/install.sql
+    ```
+
+    Use this option *only* if you are certain you want to enable PL/Java in 
all new databases.
+       
+2. Copy your Java archives (JAR files) to `$GPHOME/lib/postgresql/java/` on 
all HAWQ hosts. This example uses the `hawq scp` utility to copy the 
`myclasses.jar` file located in the current directory:
+
+       ``` shell
+       $ hawq scp -f hawq_hosts myclasses.jar =:$GPHOME/lib/postgresql/java/
+       ```
+       The `hawq_hosts` file contains a list of the HAWQ hosts.
+
+3. Add the JAR files to the `pljava_classpath` configuration parameter. Refer 
to [Setting PL/Java Configuration Parameters](#setting_serverconfigparams) for 
the specific procedure.
+
+4. (Optional) Your HAWQ installation includes an `examples.sql` file. This 
script contains sample PL/Java functions that you can use for testing. Run the 
commands in this file to create and run test functions that use the Java 
classes in `examples.jar`:
+
+       ``` shell
+       $ psql -f $GPHOME/share/postgresql/pljava/examples.sql
+       ```
+
+#### Configuring PL/Java VM Options
+
+PL/Java JVM options can be configured via the `pljava_vmoptions` server 
configuration parameter. For example, `pljava_vmoptions=-Xmx512M` sets the 
maximum heap size of the JVM. The default `-Xmx` value is `64M`.
+
+Refer to [Setting PL/Java Configuration 
Parameters](#setting_serverconfigparams) for the specific procedure to set 
PL/Java server configuration parameters.
+
+       
+### <a id="uninstallpljava"></a>Disable PL/Java 
+
+To disable PL/Java, you should:
+
+1. Remove PL/Java support from each database in which it was added.
+2. Uninstall the Java JAR files.
+
+#### <a id="uninstallpljavasupport"></a>Remove PL/Java Support from Databases 
+
+For a database that no longer requires the PL/Java language, remove support 
for PL/Java by running the `uninstall.sql` script as the `gpadmin` user. For 
example, the following command disables the PL/Java language in the specified 
database:
+
+``` shell
+$ psql -d <dbname> -f $GPHOME/share/postgresql/pljava/uninstall.sql
+```
+
+Replace \<dbname\> with the name of the target database.
+
+
+#### <a id="uninstallpljavapackage"></a>Uninstall the Java JAR files 
+
+When no databases have PL/Java as a registered language, remove the Java JAR 
files.
+
+If you use Ambari to manage your cluster:
+
+1. Remove the `pljava_classpath` configuration property via the HAWQ service 
**Configs > Advanced > Custom hawq-site** drop down.
+
+2. Remove the JAR files from the `$GPHOME/lib/postgresql/java/` directory of 
each HAWQ host.
+
+3. Select **Service Actions > Restart All** to restart your HAWQ cluster.
+
+
+If you manage your cluster from the command line:
+
+1.  Log in to the HAWQ master host as a HAWQ administrator and source the file 
`/usr/local/hawq/greenplum_path.sh`.
+
+    ``` shell
+    $ source /usr/local/hawq/greenplum_path.sh
+    ```
+
+1. Use the `hawq config` utility to remove `pljava_classpath`:
+
+    ``` shell
+    $ hawq config -r pljava_classpath
+    ```
+    
+2. Remove the JAR files from the `$GPHOME/lib/postgresql/java/` directory of 
each HAWQ host.
+
+3. Restart your HAWQ cluster:
+
+    ``` shell
+    $ hawq restart cluster
+    ```
+
+
+## <a id="writingpljavafunc"></a>Writing PL/Java Functions 
+
+This section provides information about writing functions with PL/Java.
+
+- [SQL Declaration](#sqldeclaration)
+- [Type Mapping](#typemapping)
+- [NULL Handling](#nullhandling)
+- [Complex Types](#complextypes)
+- [Returning Complex Types](#returningcomplextypes)
+- [Functions That Return Sets](#functionreturnsets)
+- [Returning a SETOF \<scalar type\>](#returnsetofscalar)
+- [Returning a SETOF \<complex type\>](#returnsetofcomplex)
+
+
+### <a id="sqldeclaration"></a>SQL Declaration 
+
+A Java function is declared with the name of a class and a static method on 
that class. The class will be resolved using the classpath that has been 
defined for the schema where the function is declared. If no classpath has been 
defined for that schema, the public schema is used. If no classpath is found 
there either, the class is resolved using the system classloader.
+
+The following function can be declared to access the static method 
`getProperty` on the `java.lang.System` class:
+
+```sql
+=> CREATE FUNCTION getsysprop(VARCHAR)
+     RETURNS VARCHAR
+     AS 'java.lang.System.getProperty'
+   LANGUAGE java;
+```
+
+Run the following command to return the Java `user.home` property:
+
+```sql
+=> SELECT getsysprop('user.home');
+```
+
+### <a id="typemapping"></a>Type Mapping 
+
+Scalar types are mapped in a straightforward way. This table lists the current 
mappings.
+
+***Table 1: PL/Java data type mappings***
+
+| PostgreSQL | Java |
+|------------|------|
+| bool | boolean |
+| char | byte |
+| int2 | short |
+| int4 | int |
+| int8 | long |
+| varchar | java.lang.String |
+| text | java.lang.String |
+| bytea | byte[ ] |
+| date | java.sql.Date |
+| time | java.sql.Time (stored value treated as local time) |
+| timetz | java.sql.Time |
+| timestamp | java.sql.Timestamp (stored value treated as local time) |
+| timestamptz | java.sql.Timestamp |
+| complex | java.sql.ResultSet |
+| setof complex | java.sql.ResultSet |
+
+All other types are mapped to `java.lang.String` and will utilize the standard 
textin/textout routines registered for the respective type.
+
+### <a id="nullhandling"></a>NULL Handling 
+
+The scalar types that map to Java primitives cannot be passed as NULL values. 
To pass NULL values, those types can have an alternative mapping. You enable 
this mapping by explicitly denoting it in the method reference.
+
+```sql
+=> CREATE FUNCTION trueIfEvenOrNull(integer)
+     RETURNS bool
+     AS 'foo.fee.Fum.trueIfEvenOrNull(java.lang.Integer)'
+   LANGUAGE java;
+```
+
+The Java code would be similar to this:
+
+```java
+package foo.fee;
+public class Fum
+{
+  static boolean trueIfEvenOrNull(Integer value)
+  {
+    return (value == null)
+      ? true
+      : (value.intValue() % 2) == 0;
+  }
+}
+```
+
+The following two statements both yield true:
+
+```sql
+=> SELECT trueIfEvenOrNull(NULL);
+=> SELECT trueIfEvenOrNull(4);
+```
+
+In order to return NULL values from a Java method, you use the object type 
that corresponds to the primitive (for example, you return `java.lang.Integer` 
instead of `int`). The PL/Java resolve mechanism finds the method regardless. 
Since Java cannot have different return types for methods with the same name, 
this does not introduce any ambiguity.
+
+### <a id="complextypes"></a>Complex Types 
+
+A complex type will always be passed as a read-only `java.sql.ResultSet` with 
exactly one row. The `ResultSet` is positioned on its row so a call to `next()` 
should not be made. The values of the complex type are retrieved using the 
standard getter methods of the `ResultSet`.
+
+Example:
+
+```sql
+=> CREATE TYPE complexTest
+     AS(base integer, incbase integer, ctime timestamptz);
+=> CREATE FUNCTION useComplexTest(complexTest)
+     RETURNS VARCHAR
+     AS 'foo.fee.Fum.useComplexTest'
+   IMMUTABLE LANGUAGE java;
+```
+
+In the Java class `Fum`, we add the following static method:
+
+```java
+public static String useComplexTest(ResultSet complexTest)
+throws SQLException
+{
+  int base = complexTest.getInt(1);
+  int incbase = complexTest.getInt(2);
+  Timestamp ctime = complexTest.getTimestamp(3);
+  return "Base = \"" + base +
+    "\", incbase = \"" + incbase +
+    "\", ctime = \"" + ctime + "\"";
+}
+```
+
+### <a id="returningcomplextypes"></a>Returning Complex Types 
+
+Java does not stipulate any way to create a `ResultSet`. Hence, returning a 
`ResultSet` is not an option. The SQL-2003 draft suggests that a complex return 
value should be handled as an IN/OUT parameter, and PL/Java implements a 
`ResultSet` that way. If you declare a function that returns a complex type, 
you will need to use a Java method with a boolean return type and a last 
parameter of type `java.sql.ResultSet`. The parameter will be initialized to an 
empty updateable `ResultSet` that contains exactly one row.
+
+Assume that the `complexTest` type in the previous section has been created.
+
+```sql
+=> CREATE FUNCTION createComplexTest(int, int)
+     RETURNS complexTest
+     AS 'foo.fee.Fum.createComplexTest'
+   IMMUTABLE LANGUAGE java;
+```
+
+The PL/Java method resolve will now find the following method in the `Fum` 
class:
+
+```java
+public static boolean createComplexTest(int base, int increment,
+  ResultSet receiver)
+throws SQLException
+{
+  receiver.updateInt(1, base);
+  receiver.updateInt(2, base + increment);
+  receiver.updateTimestamp(3,
+    new Timestamp(System.currentTimeMillis()));
+  return true;
+}
+```
+
+The return value denotes if the receiver should be considered as a valid tuple 
(true) or NULL (false).
+
+### <a id="functionreturnsets"></a>Functions that Return Sets 
+
+When returning a result set, do not build the entire result set before 
returning it, because building a large result set consumes a large amount of 
resources. It is better to produce one row at a time, which is what the HAWQ 
backend expects a function with a SETOF return type to do. You can return a 
SETOF scalar type, such as int, float, or varchar, or a SETOF complex 
type.
+
+### <a id="returnsetofscalar"></a>Returning a SETOF \<scalar type\> 
+
+In order to return a set of a scalar type, you need to create a Java method 
that returns something that implements the `java.util.Iterator` interface. 
Here is an example of a function that returns a SETOF varchar:
+
+```sql
+=> CREATE FUNCTION javatest.getNames()
+     RETURNS SETOF varchar
+     AS 'foo.fee.Bar.getNames'
+   IMMUTABLE LANGUAGE java;
+```
+
+This simple Java method returns an iterator:
+
+```java
+package foo.fee;
+import java.util.ArrayList;
+import java.util.Iterator;
+
+public class Bar
+{
+    public static Iterator getNames()
+    {
+        ArrayList names = new ArrayList();
+        names.add("Lisa");
+        names.add("Bob");
+        names.add("Bill");
+        names.add("Sally");
+        return names.iterator();
+    }
+}
+```
+
+### <a id="returnsetofcomplex"></a>Returning a SETOF \<complex type\> 
+
+A method returning a SETOF \<complex type\> must use either the interface 
`org.postgresql.pljava.ResultSetProvider` or 
`org.postgresql.pljava.ResultSetHandle`. The reason for having two interfaces 
is that they cater to two distinct use cases. The former is for cases where 
you want to dynamically create each row that is to be returned from the SETOF 
function. The latter is for cases where you want to return the result of an 
executed query.
+
+#### Using the ResultSetProvider Interface
+
+This interface has two methods: the boolean 
`assignRowValues(java.sql.ResultSet tupleBuilder, int rowNumber)` method and 
the `void close()` method. The HAWQ query evaluator calls `assignRowValues` 
repeatedly until it returns false or until the evaluator decides that it does 
not need any more rows. Then it calls `close()`.
+
+You can use this interface in the following way:
+
+```sql
+=> CREATE FUNCTION javatest.listComplexTests(int, int)
+     RETURNS SETOF complexTest
+     AS 'foo.fee.Fum.listComplexTests'
+   IMMUTABLE LANGUAGE java;
+```
+
+The function maps to a static java method that returns an instance that 
implements the `ResultSetProvider` interface.
+
+```java
+public class Fum implements ResultSetProvider
+{
+  private final int m_base;
+  private final int m_increment;
+  public Fum(int base, int increment)
+  {
+    m_base = base;
+    m_increment = increment;
+  }
+  public boolean assignRowValues(ResultSet receiver, int currentRow)
+  throws SQLException
+  {
+    // Stop when we reach 12 rows.
+    //
+    if(currentRow >= 12)
+      return false;
+    receiver.updateInt(1, m_base);
+    receiver.updateInt(2, m_base + m_increment * currentRow);
+    receiver.updateTimestamp(3,
+      new Timestamp(System.currentTimeMillis()));
+    return true;
+  }
+  public void close()
+  {
+    // Nothing needed in this example
+  }
+  public static ResultSetProvider listComplexTests(int base, int increment)
+  throws SQLException
+  {
+    return new Fum(base, increment);
+  }
+}
+```
+
+The `listComplexTests` method is called once. It may return NULL if no 
results are available, or an instance of the `ResultSetProvider`. Here the 
Java class `Fum` implements this interface, so it returns an instance of 
itself. The method `assignRowValues` will then be called repeatedly until it 
returns false. At that time, `close()` will be called.
+
+#### Using the ResultSetHandle Interface
+
+This interface is similar to the `ResultSetProvider` interface in that it has 
a `close()` method that will be called at the end. But instead of having the 
evaluator call a method that builds one row at a time, this interface has a 
method that returns a `ResultSet`. The query evaluator iterates over this set 
and delivers the `ResultSet` contents, one tuple at a time, to the caller 
until a call to `next()` returns false or the evaluator decides that no more 
rows are needed.
+
+Here is an example that executes a query using a statement that it obtained 
using the default connection. The SQL suitable for the deployment descriptor 
looks like this:
+
+```sql
+=> CREATE FUNCTION javatest.listSupers()
+     RETURNS SETOF pg_user
+     AS 'org.postgresql.pljava.example.Users.listSupers'
+   LANGUAGE java;
+=> CREATE FUNCTION javatest.listNonSupers()
+     RETURNS SETOF pg_user
+     AS 'org.postgresql.pljava.example.Users.listNonSupers'
+   LANGUAGE java;
+```
+
+And in the Java package `org.postgresql.pljava.example` a class `Users` is 
added:
+
+```java
+public class Users implements ResultSetHandle
+{
+  private final String m_filter;
+  private Statement m_statement;
+  public Users(String filter)
+  {
+    m_filter = filter;
+  }
+  public ResultSet getResultSet()
+  throws SQLException
+  {
+    m_statement =
+      DriverManager.getConnection("jdbc:default:connection")
+        .createStatement();
+    return m_statement.executeQuery(
+      "SELECT * FROM pg_user WHERE " + m_filter);
+  }
+
+  public void close()
+  throws SQLException
+  {
+    m_statement.close();
+  }
+
+  public static ResultSetHandle listSupers()
+  {
+    return new Users("usesuper = true");
+  }
+
+  public static ResultSetHandle listNonSupers()
+  {
+    return new Users("usesuper = false");
+  }
+}
+```
+## <a id="usingjdbc"></a>Using JDBC 
+
+PL/Java contains a JDBC driver that maps to the PostgreSQL SPI functions. A 
connection that maps to the current transaction can be obtained using the 
following statement:
+
+```java
+Connection conn = 
+  DriverManager.getConnection("jdbc:default:connection"); 
+```
+
+After obtaining a connection, you can prepare and execute statements as with 
other JDBC connections. The PL/Java JDBC driver has the following limitations:
+
+- The transaction cannot be managed in any way. Thus, you cannot use methods 
on the connection such as:
+   - `commit()`
+   - `rollback()`
+   - `setAutoCommit()`
+   - `setTransactionIsolation()`
+- Savepoints are available with some restrictions. A savepoint cannot outlive 
the function in which it was set and it must be rolled back or released by that 
same function.
+- A `ResultSet` returned from `executeQuery()` is always `FETCH_FORWARD` and 
`CONCUR_READ_ONLY`.
+- Meta-data is only available in PL/Java 1.1 or higher.
+- `CallableStatement` (for stored procedures) is not implemented.
+- The types `Clob` and `Blob` are not completely implemented. The types 
`byte[]` and `String` can be used for `bytea` and `text` 
respectively.
+
+## <a id="exceptionhandling"></a>Exception Handling 
+
+You can catch and handle an exception in the HAWQ backend just like any other 
exception. The backend `ErrorData` structure is exposed as a property in a 
class called `org.postgresql.pljava.ServerException` (derived from 
`java.sql.SQLException`) and the Java try/catch mechanism is synchronized with 
the backend mechanism.
+
+**Important:** When the backend has generated an exception, you will not be 
able to continue executing backend functions until your function has returned 
and the error has been propagated, unless you have used a savepoint. When a 
savepoint is rolled back, the exceptional condition is reset and you can 
continue your execution.
+
+## <a id="savepoints"></a>Savepoints 
+
+HAWQ savepoints are exposed using the `java.sql.Connection` interface. Two 
restrictions apply.
+
+- A savepoint must be rolled back or released in the function where it was set.
+- A savepoint must not outlive the function where it was set.
+
+## <a id="logging"></a>Logging 
+
+PL/Java uses the standard Java Logger. Hence, you can write things like:
+
+```java
+Logger.getAnonymousLogger().info("Time is " +
+  new Date(System.currentTimeMillis()));
+```
+
+At present, the logger uses a handler that maps the current state of the HAWQ 
configuration setting `log_min_messages` to a valid Logger level and that 
outputs all messages using the HAWQ backend function `elog()`.
+
+**Note:** The `log_min_messages` setting is read from the database the first 
time a PL/Java function in a session is executed. On the Java side, the setting 
does not change after the first PL/Java function execution in a specific 
session until the HAWQ session that is working with PL/Java is restarted.
+
+The following mappings apply between the Logger levels and the HAWQ backend 
levels.
+
+***Table 2: PL/Java Logging Levels Mappings***
+
+| java.util.logging.Level | HAWQ Level |
+|-------------------------|------------|
+| SEVERE | ERROR |
+| WARNING | WARNING |
+| CONFIG | LOG |
+| INFO | INFO |
+| FINE | DEBUG1 |
+| FINER | DEBUG2 |
+| FINEST | DEBUG3 |
+
+## <a id="security"></a>Security 
+
+This section describes security aspects of using PL/Java.
+
+### <a id="installation"></a>Installation 
+
+Only a database superuser can install PL/Java. The PL/Java utility functions 
are installed using `SECURITY DEFINER` so that they execute with the access 
permissions that were granted to the creator of the functions.
+
+### <a id="trustedlang"></a>Trusted Language 
+
+PL/Java is a trusted language. The trusted PL/Java language has no access to 
the file system, as stipulated by the PostgreSQL definition of a trusted 
language. Any database user can create and access functions in a trusted 
language.
+
+PL/Java also installs a language handler for the language `javau`. This 
version is not trusted and only a superuser can create new functions that use 
it. Any user can call the functions.
+
+
+## <a id="pljavaexample"></a>Example 
+
+The following simple Java example creates a JAR file that contains a single 
method and runs the method.
+
+<p class="note"><b>Note:</b> The example requires the Java SDK to compile the 
Java file.</p>
+
+The following method returns a substring.
+
+```java
+public class Example
+{
+  public static String substring(String text, int beginIndex,
+    int endIndex)
+  {
+    return text.substring(beginIndex, endIndex);
+  }
+}
+```
+
+Enter the Java code in a text file named `Example.java`.
+
+Contents of the file `manifest.txt`:
+
+```plaintext
+Manifest-Version: 1.0
+Main-Class: Example
+Specification-Title: "Example"
+Specification-Version: "1.0"
+Created-By: 1.6.0_35-b10-428-11M3811
+Build-Date: 01../2013 10:09 AM
+```
+
+Compile the Java code:
+
+```shell
+$ javac *.java
+```
+
+Create a JAR archive named `analytics.jar` that contains the class file and 
the manifest file in the JAR:
+
+```shell
+$ jar cfm analytics.jar manifest.txt *.class
+```
+
+Upload the JAR file to the HAWQ master host.
+
+Run the `hawq scp` utility to copy the jar file to the HAWQ Java directory. 
Use the `-f` option to specify the file that contains a list of the master and 
segment hosts:
+
+```shell
+$ hawq scp -f hawq_hosts analytics.jar =:/usr/local/hawq/lib/postgresql/java/
+```
+
+Add the `analytics.jar` JAR file to the `pljava_classpath` configuration 
parameter. Refer to [Setting PL/Java Configuration 
Parameters](#setting_serverconfigparams) for the specific procedure.
+
+From the `psql` subsystem, run the following command to show the installed JAR 
files:
+
+``` sql
+=> SHOW pljava_classpath;
+```
+
+The following SQL commands create a table and define a Java function to test 
the method in the JAR file:
+
+```sql
+=> CREATE TABLE temp (a varchar) DISTRIBUTED randomly; 
+=> INSERT INTO temp values ('my string'); 
+--Example function 
+=> CREATE OR REPLACE FUNCTION java_substring(varchar, int, int) 
+     RETURNS varchar AS 'Example.substring' 
+   LANGUAGE java; 
+--Example execution 
+=> SELECT java_substring(a, 1, 5) FROM temp;
+```
+
+If you add these SQL commands to a file named `mysample.sql`, you can run the 
commands from the `psql` subsystem using the `\i` meta-command:
+
+``` sql
+=> \i mysample.sql 
+```
+
+The output is similar to this:
+
+```shell
+ java_substring
+----------------
+ y st
+(1 row)
+```
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/plext/using_plperl.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/plext/using_plperl.html.md.erb 
b/markdown/plext/using_plperl.html.md.erb
new file mode 100644
index 0000000..d6ffa04
--- /dev/null
+++ b/markdown/plext/using_plperl.html.md.erb
@@ -0,0 +1,27 @@
+---
+title: Using PL/Perl
+---
+
+This section contains an overview of the HAWQ PL/Perl language extension.
+
+## <a id="enableplperl"></a>Enabling PL/Perl
+
+If PL/Perl was enabled at HAWQ build time, HAWQ installs the PL/Perl 
language extension automatically. To use PL/Perl, you must still enable it on 
each database where it will be used.
+
+On every database where you want to enable PL/Perl, connect to the database 
using the `psql` client:
+
+``` shell
+$ psql -d <dbname>
+```
+
+Replace \<dbname\> with the name of the target database.
+
+Then, run the following SQL command:
+
+``` sql
+psql# CREATE LANGUAGE plperl;
+```
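+
+After the language is registered, you can create PL/Perl functions in that 
database. The following minimal example (the function name is hypothetical) 
adds two integers:
+
+``` sql
+CREATE FUNCTION perl_add(int, int) RETURNS int AS $$
+    my ($a, $b) = @_;
+    return $a + $b;
+$$ LANGUAGE plperl;
+
+SELECT perl_add(3, 4);  -- returns 7
+```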
+
+## <a id="references"></a>References 
+
+For more information on using PL/Perl, see the PostgreSQL PL/Perl 
documentation at 
[https://www.postgresql.org/docs/8.2/static/plperl.html](https://www.postgresql.org/docs/8.2/static/plperl.html).
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/plext/using_plpgsql.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/plext/using_plpgsql.html.md.erb 
b/markdown/plext/using_plpgsql.html.md.erb
new file mode 100644
index 0000000..3661e9b
--- /dev/null
+++ b/markdown/plext/using_plpgsql.html.md.erb
@@ -0,0 +1,142 @@
+---
+title: Using PL/pgSQL in HAWQ
+---
+
+SQL is the language that most relational databases use as their query 
language. It is portable and easy to learn. But every SQL statement must be 
executed individually by the database server. 
+
+PL/pgSQL is a loadable procedural language. PL/pgSQL can do the following:
+
+-   create functions
+-   add control structures to the SQL language
+-   perform complex computations
+-   inherit all user-defined types, functions, and operators
+-   be trusted by the server
+
+You can use functions created with PL/pgSQL anywhere that you can use 
built-in functions. For example, it is possible to create complex conditional 
computation functions and later use them to define operators or use them in 
index expressions.
+
+Every SQL statement must be executed individually by the database server. Your 
client application must send each query to the database server, wait for it to 
be processed, receive and process the results, do some computation, then send 
further queries to the server. This requires interprocess communication and 
incurs network overhead if your client is on a different machine than the 
database server.
+
+With PL/pgSQL, you can group a block of computation and a series of queries 
inside the database server, thus having the power of a procedural language and 
the ease of use of SQL, but with considerable savings of client/server 
communication overhead.
+
+-   Extra round trips between client and server are eliminated
+-   Intermediate results that the client does not need do not have to be 
marshaled or transferred between server and client
+-   Multiple rounds of query parsing can be avoided
+
+This can result in a considerable performance increase as compared to an 
application that does not use stored functions.
+
+PL/pgSQL supports all the data types, operators, and functions of SQL.
+
+**Note:**  PL/pgSQL is automatically installed and registered in all HAWQ 
databases.
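+
+As a minimal illustration (the function name is hypothetical), the following 
PL/pgSQL function combines a conditional control structure with SQL 
expressions:
+
+```sql
+CREATE FUNCTION clamp_positive(v numeric) RETURNS numeric AS $$
+BEGIN
+    IF v < 0 THEN
+        RETURN 0;
+    END IF;
+    RETURN v;
+END;
+$$ LANGUAGE plpgsql;
+
+SELECT clamp_positive(-3.5);  -- returns 0
+```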
+
+## <a id="supportedargumentandresultdatatypes"></a>Supported Data Types for 
Arguments and Results 
+
+Functions written in PL/pgSQL accept as arguments any scalar or array data 
type supported by the server, and they can return a result containing this data 
type. They can also accept or return any composite type (row type) specified by 
name. It is also possible to declare a PL/pgSQL function as returning record, 
which means that the result is a row type whose columns are determined by 
specification in the calling query. See <a href="#tablefunctions" 
class="xref">Table Functions</a>.
+
+PL/pgSQL functions can be declared to accept a variable number of arguments by 
using the VARIADIC marker. This works exactly the same way as for SQL 
functions. See <a href="#sqlfunctionswithvariablenumbersofarguments" 
class="xref">SQL Functions with Variable Numbers of Arguments</a>.
+
+PL/pgSQL functions can also be declared to accept and return the polymorphic 
types anyelement, anyarray, anynonarray, and anyenum. The actual data types 
handled by a polymorphic function can vary from call to call, as discussed in 
<a href="http://www.postgresql.org/docs/8.4/static/extend-type-system.html#EXTEND-TYPES-POLYMORPHIC" class="xref">Section 34.2.5</a> of the PostgreSQL 
documentation. An example is shown in <a href="http://www.postgresql.org/docs/8.4/static/plpgsql-declarations.html#PLPGSQL-DECLARATION-ALIASES" class="xref">Section 38.3.1</a>.
+
+PL/pgSQL functions can also be declared to return a "set" (or table) of any 
data type that can be returned as a single instance. Such a function generates 
its output by executing RETURN NEXT for each desired element of the result set, 
or by using RETURN QUERY to output the result of evaluating a query.
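+
+As a sketch, a hypothetical set-returning PL/pgSQL function that uses RETURN 
NEXT might look like this:
+
+```sql
+CREATE FUNCTION even_numbers(max_val int) RETURNS SETOF int AS $$
+BEGIN
+    FOR i IN 0..max_val LOOP
+        IF i % 2 = 0 THEN
+            RETURN NEXT i;
+        END IF;
+    END LOOP;
+    RETURN;
+END;
+$$ LANGUAGE plpgsql;
+
+SELECT * FROM even_numbers(6);  -- returns 0, 2, 4, 6
+```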
+
+Finally, a PL/pgSQL function can be declared to return void if it has no 
useful return value.
+
+PL/pgSQL functions can also be declared with output parameters in place of an 
explicit specification of the return type. This does not add any fundamental 
capability to the language, but it is often convenient, especially for 
returning multiple values. The RETURNS TABLE notation can also be used in 
place of RETURNS SETOF.
+
+This topic describes the following PL/pgSQL concepts:
+
+-   [Table Functions](#tablefunctions)
+-   [SQL Functions with Variable Numbers of 
Arguments](#sqlfunctionswithvariablenumbersofarguments)
+-   [Polymorphic Types](#polymorphictypes)
+
+
+## <a id="tablefunctions"></a>Table Functions 
+
+
+Table functions are functions that produce a set of rows, made up of either 
base data types (scalar types) or composite data types (table rows). They are 
used like a table, view, or subquery in the FROM clause of a query. Columns 
returned by table functions can be included in SELECT, JOIN, or WHERE clauses 
in the same manner as a table, view, or subquery column.
+
+If a table function returns a base data type, the single result column name 
matches the function name. If the function returns a composite type, the result 
columns get the same names as the individual attributes of the type.
+
+A table function can be aliased in the FROM clause, but it also can be left 
unaliased. If a function is used in the FROM clause with no alias, the function 
name is used as the resulting table name.
+
+Some examples:
+
+```sql
+CREATE TABLE foo (fooid int, foosubid int, fooname text);
+
+CREATE FUNCTION getfoo(int) RETURNS SETOF foo AS $$
+    SELECT * FROM foo WHERE fooid = $1;
+$$ LANGUAGE SQL;
+
+SELECT * FROM getfoo(1) AS t1;
+
+SELECT * FROM foo
+    WHERE foosubid IN (
+                        SELECT foosubid
+                        FROM getfoo(foo.fooid) z
+                        WHERE z.fooid = foo.fooid
+                      );
+
+CREATE VIEW vw_getfoo AS SELECT * FROM getfoo(1);
+
+SELECT * FROM vw_getfoo;
+```
+
+In some cases, it is useful to define table functions that can return 
different column sets depending on how they are invoked. To support this, the 
table function can be declared as returning the pseudotype record. When such a 
function is used in a query, the expected row structure must be specified in 
the query itself, so that the system can know how to parse and plan the query. 
Consider this example:
+
+```sql
+SELECT *
+    FROM dblink('dbname=mydb', 'SELECT proname, prosrc FROM pg_proc')
+      AS t1(proname name, prosrc text)
+    WHERE proname LIKE 'bytea%';
+```
+
+The `dblink` function executes a remote query (see `contrib/dblink`). It is 
declared to return `record` since it might be used for any kind of query. The 
actual column set must be specified in the calling query so that the parser 
knows, for example, what `*` should expand to.
+
+
+## <a id="sqlfunctionswithvariablenumbersofarguments"></a>SQL Functions with 
Variable Numbers of Arguments 
+
+SQL functions can be declared to accept variable numbers of arguments, so long 
as all the "optional" arguments are of the same data type. The optional 
arguments will be passed to the function as an array. The function is declared 
by marking the last parameter as VARIADIC; this parameter must be declared as 
being of an array type. For example:
+
+```sql
+CREATE FUNCTION mleast(VARIADIC numeric[]) RETURNS numeric AS $$
+    SELECT min($1[i]) FROM generate_subscripts($1, 1) g(i);
+$$ LANGUAGE SQL;
+
+SELECT mleast(10, -1, 5, 4.4);
+ mleast 
+--------
+     -1
+(1 row)
+```
+
+Effectively, all the actual arguments at or beyond the VARIADIC position are 
gathered up into a one-dimensional array, as if you had written
+
+```sql
+SELECT mleast(ARRAY[10, -1, 5, 4.4]);    -- doesn't work
+```
+
+You can't actually write that, though; or at least, it will not match this 
function definition. A parameter marked VARIADIC matches one or more 
occurrences of its element type, not of its own type.
+
+Sometimes it is useful to be able to pass an already-constructed array to a 
variadic function; this is particularly handy when one variadic function wants 
to pass on its array parameter to another one. You can do that by specifying 
VARIADIC in the call:
+
+```sql
+SELECT mleast(VARIADIC ARRAY[10, -1, 5, 4.4]);
+```
+
+This prevents expansion of the function's variadic parameter into its element 
type, thereby allowing the array argument value to match normally. VARIADIC can 
only be attached to the last actual argument of a function call.
+
+
+
+## <a id="polymorphictypes"></a>Polymorphic Types 
+
+Four pseudo-types of special interest are anyelement, anyarray, anynonarray, 
and anyenum, which are collectively called *polymorphic types*. Any function 
declared using these types is said to be a *polymorphic function*. A 
polymorphic function can operate on many different data types, with the 
specific data type(s) being determined by the data types actually passed to it 
in a particular call.
+
+Polymorphic arguments and results are tied to each other and are resolved to a 
specific data type when a query calling a polymorphic function is parsed. Each 
position (either argument or return value) declared as anyelement is allowed to 
have any specific actual data type, but in any given call they must all be the 
same actual type. Each position declared as anyarray can have any array data 
type, but similarly they must all be the same type. If there are positions 
declared anyarray and others declared anyelement, the actual array type in the 
anyarray positions must be an array whose elements are the same type appearing 
in the anyelement positions. anynonarray is treated exactly the same as 
anyelement, but adds the additional constraint that the actual type must not be 
an array type. anyenum is treated exactly the same as anyelement, but adds the 
additional constraint that the actual type must be an enum type.
+
+Thus, when more than one argument position is declared with a polymorphic 
type, the net effect is that only certain combinations of actual argument types 
are allowed. For example, a function declared as equal(anyelement, anyelement) 
will take any two input values, so long as they are of the same data type.
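+
+For illustration, such a hypothetical `equal` function can be written as a 
SQL function with polymorphic arguments:
+
+```sql
+CREATE FUNCTION equal(anyelement, anyelement) RETURNS boolean AS $$
+    SELECT $1 = $2;
+$$ LANGUAGE SQL;
+
+SELECT equal(1, 2);                      -- false
+SELECT equal('abc'::text, 'abc'::text);  -- true
+SELECT equal(1, 'abc'::text);            -- ERROR: argument types differ
+```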
+
+When the return value of a function is declared as a polymorphic type, there 
must be at least one argument position that is also polymorphic, and the actual 
data type supplied as the argument determines the actual result type for that 
call. For example, if there were not already an array subscripting mechanism, 
one could define a function that implements subscripting as 
`subscript(anyarray, integer) returns anyelement`. This declaration constrains 
the actual first argument to be an array type, and allows the parser to infer 
the correct result type from the actual first argument's type. Another example 
is that a function declared as `f(anyarray) returns anyenum` will only accept 
arrays of enum types.
+
+Note that `anynonarray` and `anyenum` do not represent separate type 
variables; they are the same type as `anyelement`, just with an additional 
constraint. For example, declaring a function as `f(anyelement, anyenum)` is 
equivalent to declaring it as `f(anyenum, anyenum)`; both actual arguments 
have to be the same enum type.
+
+Variadic functions described in <a 
href="#sqlfunctionswithvariablenumbersofarguments" class="xref">SQL Functions 
with Variable Numbers of Arguments</a> can be polymorphic: this is accomplished 
by declaring its last parameter as `VARIADIC anyarray`. For purposes of 
argument matching and determining the actual result type, such a function 
behaves the same as if you had written the appropriate number of `anynonarray` 
parameters.
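+
+For example, a hypothetical polymorphic variant of the earlier `mleast` 
function:
+
+```sql
+CREATE FUNCTION anyleast(VARIADIC anyarray) RETURNS anyelement AS $$
+    SELECT min($1[i]) FROM generate_subscripts($1, 1) g(i);
+$$ LANGUAGE SQL;
+
+SELECT anyleast(10, -1, 5);           -- returns -1
+SELECT anyleast('abc'::text, 'def');  -- returns abc
+```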

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/de1e2e07/markdown/plext/using_plpython.html.md.erb
----------------------------------------------------------------------
diff --git a/markdown/plext/using_plpython.html.md.erb 
b/markdown/plext/using_plpython.html.md.erb
new file mode 100644
index 0000000..063509a
--- /dev/null
+++ b/markdown/plext/using_plpython.html.md.erb
@@ -0,0 +1,789 @@
+---
+title: Using PL/Python in HAWQ
+---
+
+This section provides an overview of the HAWQ PL/Python procedural language 
extension.
+
+## <a id="abouthawqplpython"></a>About HAWQ PL/Python 
+
+PL/Python is embedded in your HAWQ product distribution or within your HAWQ 
build if you chose to enable it as a build option. 
+
+With the HAWQ PL/Python extension, you can write user-defined functions in 
Python that take advantage of Python features and modules, enabling you to 
quickly build robust HAWQ database applications.
+
+HAWQ uses the system Python installation.
+
+### <a id="hawqlimitations"></a>HAWQ PL/Python Limitations 
+
+- HAWQ does not support PL/Python trigger functions.
+- PL/Python is available only as a HAWQ untrusted language.
+ 
+## <a id="enableplpython"></a>Enabling and Removing PL/Python Support 
+
+To use PL/Python in HAWQ, you must either install a binary version of HAWQ 
that includes PL/Python or specify PL/Python as a build option when you compile 
HAWQ from source.
+
+You must register the PL/Python language with a database before you can create 
and execute a PL/Python UDF on that database. You must be a database superuser 
to register and remove new languages in HAWQ databases.
+
+On every database to which you want to install and enable PL/Python:
+
+1. Connect to the database using the `psql` client:
+
+    ``` shell
+    gpadmin@hawq-node$ psql -d <dbname>
+    ```
+
+    Replace \<dbname\> with the name of the target database.
+
+2. Run the following SQL command to register the PL/Python procedural language:
+
+    ``` sql
+    dbname=# CREATE LANGUAGE plpythonu;
+    ```
+
+    **Note**: `plpythonu` is installed as an *untrusted* language; it offers 
no way of restricting what you can program in UDFs created with the language. 
Creating and executing PL/Python UDFs is permitted only by database superusers 
and other database users explicitly `GRANT`ed the permissions.
+
+To remove support for `plpythonu` from a database, run the following SQL 
command; you must be a database superuser to remove a registered procedural 
language:
+
+``` sql
+dbname=# DROP LANGUAGE plpythonu;
+```
+
+## <a id="developfunctions"></a>Developing Functions with PL/Python 
+
+PL/Python functions are defined using the standard SQL [CREATE 
FUNCTION](../reference/sql/CREATE-FUNCTION.html) syntax.
+
+The body of a PL/Python user-defined function is a Python script. When the function is called, its arguments are passed as elements of the array `args[]`; named arguments are also passed to the Python script as ordinary variables.
+
+PL/Python function results are returned with a `return` statement, or with a `yield` statement in the case of a set-returning function.
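+
+For example, a minimal sketch of a set-returning PL/Python function that uses 
+`yield` (the function name `mypyrange` is illustrative):
+
+``` sql
+=# CREATE FUNCTION mypyrange(n integer)
+     RETURNS SETOF integer
+   AS $$
+     # yield each value; HAWQ collects the yielded values into the result set
+     for i in range(n):
+       yield i
+   $$ LANGUAGE plpythonu;
+=# SELECT * FROM mypyrange(3);
+```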
+
+The following PL/Python function computes and returns the maximum of two 
integers:
+
+``` sql
+=# CREATE FUNCTION mypymax (a integer, b integer)
+     RETURNS integer
+   AS $$
+     if (a is None) or (b is None):
+       return None
+     if a > b:
+       return a
+     return b
+   $$ LANGUAGE plpythonu;
+```
+
+To execute the `mypymax` function:
+
+``` sql
+=# SELECT mypymax(5, 7);
+ mypymax 
+---------
+       7
+(1 row)
+```
+
+Adding the `STRICT` keyword to the function definition instructs HAWQ to return null if any input argument is null. When created as `STRICT`, the function itself need not perform null checks.
+
+The following example uses an unnamed argument, the built-in Python `max()` 
function, and the `STRICT` keyword to create a UDF named `mypymax2`:
+
+``` sql
+=# CREATE FUNCTION mypymax2 (a integer, integer)
+     RETURNS integer AS $$ 
+   return max(a, args[0]) 
+   $$ LANGUAGE plpythonu STRICT;
+=# SELECT mypymax2(5, 3);
+ mypymax2
+----------
+        5
+(1 row)
+=# SELECT mypymax2(5, null);
+ mypymax2
+----------
+       
+(1 row)
+```
+
+## <a id="example_createtbl"></a>Creating the Sample Data
+
+Perform the following steps to create, and insert data into, a simple table. 
This table will be used in later exercises.
+
+1. Create a database named `testdb`:
+
+    ``` shell
+    gpadmin@hawq-node$ createdb testdb
+    ```
+
+2. Create a table named `sales`:
+
+    ``` shell
+    gpadmin@hawq-node$ psql -d testdb
+    ```
+    ``` sql
+    testdb=> CREATE TABLE sales (id int, year int, qtr int, day int, region 
text)
+               DISTRIBUTED BY (id);
+    ```
+
+3. Insert data into the table:
+
+    ``` sql
+    testdb=> INSERT INTO sales VALUES
+     (1, 2014, 1,1, 'usa'),
+     (2, 2002, 2,2, 'europe'),
+     (3, 2014, 3,3, 'asia'),
+     (4, 2014, 4,4, 'usa'),
+     (5, 2014, 1,5, 'europe'),
+     (6, 2014, 2,6, 'asia'),
+     (7, 2002, 3,7, 'usa') ;
+    ```
+
+## <a id="pymod_intro"></a>Python Modules 
+
+A Python module is a text file containing Python statements and definitions. The file name for a module follows the `<python-module-name>.py` naming convention.
+
+Should you need to build a Python module, ensure that the appropriate software is installed on the build system, and that you are building for the correct deployment architecture (for example, 64-bit).
+
+### <a id="pymod_intro_hawq"></a>HAWQ Considerations 
+
+When installing a Python module in HAWQ, you must add the module to all 
segment nodes in the cluster. You must also add all Python modules to any new 
segment hosts when you expand your HAWQ cluster.
+
+PL/Python supports the built-in HAWQ Python module named `plpy`.  You can also 
install 3rd party Python modules.
+
+
+## <a id="modules_plpy"></a>plpy Module 
+
+The HAWQ PL/Python procedural language extension automatically imports the 
Python module `plpy`. `plpy` implements functions to execute SQL queries and 
prepare execution plans for queries.  The `plpy` module also includes functions 
to manage errors and messages.
+   
+### <a id="executepreparesql"></a>Executing and Preparing SQL Queries 
+
+Use the PL/Python `plpy` module `plpy.execute()` function to execute a SQL 
query. Use the `plpy.prepare()` function to prepare an execution plan for a 
query. Preparing the execution plan for a query is useful if you want to run 
the query from multiple Python functions.
+
+#### <a id="plpyexecute"></a>plpy.execute() 
+
+Invoking `plpy.execute()` with a query string and an optional limit argument 
runs the query, returning the result in a Python result object. This result 
object:
+
+- emulates a list or dictionary object
+- returns rows that can be accessed by row number and column name; row 
numbering starts with 0 (zero)
+- can be modified
+- includes an `nrows()` method that returns the number of rows returned by the 
query
+- includes a `status()` method that returns the `SPI_execute()` return value
+
+For example, the following Python statement, when present in a PL/Python user-defined function, executes a `SELECT * FROM my_table` query:
+
+``` python
+rv = plpy.execute("SELECT * FROM my_table", 3)
+```
+
+As instructed by the limit argument `3`, the `plpy.execute` function will 
return up to 3 rows from `my_table`. The result set is stored in the `rv` 
object.
+
+Access specific columns in the table by name. For example, if `my_table` has a 
column named `my_column`:
+
+``` python
+my_col_data = rv[i]["my_column"]
+```
+
+You specified that the function return a maximum of 3 rows in the 
`plpy.execute()` command above. As such, the index `i` used to access the 
result value `rv` must specify an integer between 0 and 2, inclusive.
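+
+Because the result object emulates a Python list, you can also iterate over 
+the returned rows. A minimal sketch, wrapped in a UDF (the function name 
+`mypyconcat` is illustrative and assumes the hypothetical `my_table` above):
+
+``` sql
+=# CREATE FUNCTION mypyconcat() RETURNS text AS $$
+     rv = plpy.execute("SELECT * FROM my_table", 3)
+     # build a comma-separated string from the my_column value of each row
+     vals = []
+     for row in rv:
+       vals.append(str(row["my_column"]))
+     return ','.join(vals)
+   $$ LANGUAGE plpythonu;
+```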
+
+##### <a id="plpyexecute_example"></a>Example: plpy.execute()
+
+Example: Use `plpy.execute()` to run a similar query on the `sales` table you 
created in an earlier section:
+
+1. Define a PL/Python UDF that executes a query to return at most 5 rows from 
the `sales` table:
+
+    ``` sql
+    =# CREATE OR REPLACE FUNCTION mypytest(a integer) 
+         RETURNS text 
+       AS $$ 
+         rv = plpy.execute("SELECT * FROM sales ORDER BY id", 5)
+         region = rv[a-1]["region"]
+         return region
+       $$ LANGUAGE plpythonu;
+    ```
+
+    When executed, this UDF returns the `region` value from the `id` 
identified by the input value `a`. Since row numbering of the result set starts 
at 0, you must access the result set with index `a - 1`. 
+    
+    Specifying the `ORDER BY id` clause in the `SELECT` statement ensures that 
subsequent invocations of `mypytest` with the same input argument will return 
identical result sets.
+
+2. Run `mypytest` with an argument identifying `id` `3`:
+
+    ```sql
+    =# SELECT mypytest(3);
+     mypytest 
+    ----------
+     asia
+    (1 row)
+    ```
+    
+    Recall that row numbering starts from 0 in a Python result set. Because `mypytest` accesses the result set with index `a - 1`, the valid input argument is an integer between 1 and 5, inclusive.
+
+    The query returns the `region` from the row with `id = 3`, `asia`.
+    
+Note: This example demonstrates some of the concepts discussed previously. It 
may not be the ideal way to return a specific column value.
+
+#### <a id="plpyprepare"></a>plpy.prepare() 
+
+The function `plpy.prepare()` prepares the execution plan for a query. 
Preparing the execution plan for a query is useful if you plan to run the query 
from multiple Python functions.
+
+You invoke `plpy.prepare()` with a query string. Also include a list of 
parameter types if you are using parameter references in the query. For 
example, the following statement in a PL/Python user-defined function returns 
the execution plan for a query:
+
+``` python
+plan = plpy.prepare("SELECT * FROM sales WHERE region = $1 ORDER BY id", [ "text" ])
+```
+
+The string `text` identifies the data type of the variable `$1`. 
+
+After preparing an execution plan, you use the function `plpy.execute()` to 
run it.  For example:
+
+``` python
+rv = plpy.execute(plan, [ "usa" ])
+```
+
+When executed, `rv` will include all rows in the `sales` table where `region = 'usa'`.
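+
+A plan can also be prepared with more than one parameter reference; supply 
+one type name per parameter. A minimal sketch (the function name 
+`mypy_salesids` is illustrative):
+
+``` sql
+=# CREATE OR REPLACE FUNCTION mypy_salesids(r text, y integer)
+     RETURNS SETOF integer
+   AS $$
+     plan = plpy.prepare("SELECT id FROM sales WHERE region = $1 AND year = $2 ORDER BY id", [ "text", "int" ])
+     rv = plpy.execute(plan, [ r, y ])
+     # yield the id of each matching row
+     for row in rv:
+       yield row["id"]
+   $$ LANGUAGE plpythonu;
+=# SELECT mypy_salesids('usa', 2014);
+ mypy_salesids 
+---------------
+             1
+             4
+(2 rows)
+```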
+
+The following sections describe how to pass data between PL/Python function calls.
+
+##### <a id="plpyprepare_dictionaries"></a>Saving Execution Plans
+
+When you prepare an execution plan using the PL/Python module, the plan is 
automatically saved. See the [Postgres Server Programming Interface 
(SPI)](http://www.postgresql.org/docs/8.2/static/spi.html) documentation for 
information about execution plans.
+
+To make effective use of saved plans across function calls, you use one of the 
Python persistent storage dictionaries, SD or GD.
+
+The global dictionary SD is available to store data between function calls. 
This variable is private static data. The global dictionary GD is public data, 
and is available to all Python functions within a session. *Use GD with care*.
+
+Each function gets its own execution environment in the Python interpreter, so 
that global data and function arguments from `myfunc1` are not available to 
`myfunc2`. The exception is the data in the GD dictionary, as mentioned 
previously.
+
+This example prepares an execution plan on the first call, saves it to the SD dictionary, and reuses the saved plan on subsequent calls:
+
+```sql
+=# CREATE FUNCTION usesavedplan() RETURNS text AS $$
+     if "s1plan" in SD:
+       plan = SD["s1plan"]
+     else:
+       # prepare the plan once and save it for later calls
+       plan = plpy.prepare("SELECT region FROM sales WHERE id=1")
+       SD["s1plan"] = plan
+     rv = plpy.execute(plan)
+     return rv[0]["region"]
+   $$ LANGUAGE plpythonu;
+=# SELECT usesavedplan();
+```
+
+##### <a id="plpyprepare_example"></a>Example: plpy.prepare()
+
+Example: Use `plpy.prepare()` and `plpy.execute()` to prepare and run an 
execution plan using the GD dictionary:
+
+1. Define a PL/Python UDF that prepares and saves an execution plan to the GD, and also returns the name of the plan:
+
+    ``` sql
+    =# CREATE OR REPLACE FUNCTION mypy_prepplan() 
+         RETURNS text 
+       AS $$ 
+         plan = plpy.prepare("SELECT * FROM sales WHERE region = $1 ORDER BY 
id", [ "text" ])
+         GD["getregionplan"] = plan
+         return "getregionplan"
+       $$ LANGUAGE plpythonu;
+    ```
+
+    This UDF, when run, will return the name (key) of the execution plan 
generated from the `plpy.prepare()` call.
+
+2. Define a PL/Python UDF to run the execution plan; this function takes the plan name and a `region` name as inputs:
+
+    ``` sql
+    =# CREATE OR REPLACE FUNCTION mypy_execplan(planname text, regionname text)
+         RETURNS integer 
+       AS $$ 
+         rv = plpy.execute(GD[planname], [ regionname ], 5)
+         year = rv[0]["year"]
+         return year
+       $$ LANGUAGE plpythonu STRICT;
+    ```
+
+    This UDF executes the `planname` plan that was previously saved to the GD. 
You will call `mypy_execplan()` with the `planname` returned from the 
`plpy.prepare()` call.
+
+3. Execute the `mypy_prepplan()` and `mypy_execplan()` UDFs, passing `region` 
`usa`:
+
+    ``` sql
+    =# SELECT mypy_execplan( mypy_prepplan(), 'usa' );
+     mypy_execplan
+    ---------------
+         2014
+    (1 row)
+    ```
+
+### <a id="pythonerrors"></a>Handling Python Errors and Messages 
+
+The `plpy` module implements the following message- and error-related 
functions, each of which takes a message string as an argument:
+
+- `plpy.debug(msg)`
+- `plpy.log(msg)`
+- `plpy.info(msg)`
+- `plpy.notice(msg)`
+- `plpy.warning(msg)`
+- `plpy.error(msg)`
+- `plpy.fatal(msg)`
+
+`plpy.error()` and `plpy.fatal()` raise a Python exception which, if uncaught, 
propagates out to the calling query, possibly aborting the current transaction 
or subtransaction. `raise plpy.ERROR(msg)` and `raise plpy.FATAL(msg)` are 
equivalent to calling `plpy.error()` and `plpy.fatal()`, respectively. Use the 
other message functions to generate messages of different priority levels.
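+
+For example, a minimal sketch of a UDF that rejects invalid input (the 
+function name `mypy_checkqtr` is illustrative):
+
+``` sql
+=# CREATE FUNCTION mypy_checkqtr(q integer) RETURNS integer AS $$
+     # abort the calling query if the quarter is out of range
+     if q < 1 or q > 4:
+       plpy.error('mypy_checkqtr: quarter must be between 1 and 4')
+     return q
+   $$ LANGUAGE plpythonu STRICT;
+```
+
+Invoking `SELECT mypy_checkqtr(5)` fails with an `ERROR` that includes the 
+message text.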
+
+Messages may be reported to the client and/or written to the HAWQ server log 
file.  The HAWQ server configuration parameters 
[`log_min_messages`](../reference/guc/parameter_definitions.html#log_min_messages)
 and 
[`client_min_messages`](../reference/guc/parameter_definitions.html#client_min_messages)
 control where messages are reported.
+
+#### <a id="plpymessages_example"></a>Example: Generating Messages
+
+In this example, you will create a PL/Python UDF that includes some debug log 
messages. You will also configure your `psql` session to enable debug-level 
client logging.
+
+1. Define a PL/Python UDF that executes a query that will return at most 5 
rows from the `sales` table. Invoke the `plpy.debug()` method to display some 
additional information:
+
+    ``` sql
+    =# CREATE OR REPLACE FUNCTION mypytest_debug(a integer) 
+         RETURNS text 
+       AS $$ 
+         plpy.debug('mypytest_debug executing query:  SELECT * FROM sales 
ORDER BY id')
+         rv = plpy.execute("SELECT * FROM sales ORDER BY id", 5)
+         plpy.debug('mypytest_debug: query returned ' + str(rv.nrows()) + ' 
rows')
+         region = rv[a]["region"]
+         return region
+       $$ LANGUAGE plpythonu;
+    ```
+
+2. Execute the `mypytest_debug()` UDF, passing the integer `2` as an argument:
+
+    ```sql
+    =# SELECT mypytest_debug(2);
+     mypytest_debug 
+    ----------------
+     asia
+    (1 row)
+    ```
+
+3. Enable `DEBUG2` level client logging:
+
+    ``` sql
+    =# SET client_min_messages=DEBUG2;
+    ```
+    
+4. Execute the `mypytest_debug()` UDF again:
+
+    ```sql
+    =# SELECT mypytest_debug(2);
+    ...
+    DEBUG2:  mypytest_debug executing query:  SELECT * FROM sales ORDER BY id
+    ...
+    DEBUG2:  mypytest_debug: query returned 5 rows
+    ...
+    ```
+
+    Debug output is very verbose; you will need to scan a lot of output to find the `mypytest_debug` messages. *Hint*: look both near the start and end of the output.
+    
+5. Turn off client-level debug logging:
+
+    ```sql
+    =# SET client_min_messages=NOTICE;
+    ```
+
+## <a id="pythonmodules-3rdparty"></a>3rd-Party Python Modules 
+
+PL/Python supports installation and use of 3rd-party Python Modules. This 
section includes examples for installing the `setuptools` and NumPy Python 
modules.
+
+**Note**: You must have superuser privileges to install Python modules to the 
system Python directories.
+
+### <a id="simpleinstall"></a>Example: Installing setuptools 
+
+In this example, you will manually install the Python `setuptools` module from 
the Python Package Index repository. `setuptools` enables you to easily 
download, build, install, upgrade, and uninstall Python packages.
+
+You will first build the module from the downloaded package, installing it on 
a single host. You will then build and install the module on all segment nodes 
in your HAWQ cluster.
+
+1. Download the `setuptools` module package from the Python Package Index 
site. For example, run this `wget` command on a HAWQ node as the `gpadmin` user:
+
+    ``` shell
+    $ ssh gpadmin@<hawq-node>
+    gpadmin@hawq-node$ . /usr/local/hawq/greenplum_path.sh
+    gpadmin@hawq-node$ mkdir plpython_pkgs
+    gpadmin@hawq-node$ cd plpython_pkgs
+    gpadmin@hawq-node$ export PLPYPKGDIR=`pwd`
+    gpadmin@hawq-node$ wget --no-check-certificate 
https://pypi.python.org/packages/source/s/setuptools/setuptools-18.4.tar.gz
+    ```
+
+2. Extract the files from the `tar.gz` package:
+
+    ``` shell
+    gpadmin@hawq-node$ tar -xzvf setuptools-18.4.tar.gz
+    ```
+
+3. Run the Python scripts to build and install the Python package; you must 
have superuser privileges to install Python modules to the system Python 
installation:
+
+    ``` shell
+    gpadmin@hawq-node$ cd setuptools-18.4
+    gpadmin@hawq-node$ python setup.py build 
+    gpadmin@hawq-node$ sudo python setup.py install
+    ```
+
+4. Run the following command to verify the module is available to Python:
+
+    ``` shell
+    gpadmin@hawq-node$ python -c "import setuptools"
+    ```
+    
+    If no error is returned, the `setuptools` module was successfully imported.
+
+5. The `setuptools` package installs the `easy_install` utility. This utility 
enables you to install Python packages from the Python Package Index 
repository. For example, this command installs the Python `pip` utility from 
the Python Package Index site:
+
+    ``` shell
+    gpadmin@hawq-node$ sudo easy_install pip
+    ```
+
+6. Copy the `setuptools` package to all HAWQ nodes in your cluster. For 
example, this command copies the `tar.gz` file from the current host to the 
host systems listed in the file `hawq-hosts`:
+
+    ``` shell
+    gpadmin@hawq-node$ cd $PLPYPKGDIR
+    gpadmin@hawq-node$ hawq scp -f hawq-hosts setuptools-18.4.tar.gz 
=:/home/gpadmin
+    ```
+
+7. Run the commands to build, install, and test the `setuptools` package you 
just copied to all hosts in your HAWQ cluster. For example:
+
+    ``` shell
+    gpadmin@hawq-node$ hawq ssh -f hawq-hosts
+    >>> mkdir plpython_pkgs
+    >>> cd plpython_pkgs
+    >>> tar -xzvf ../setuptools-18.4.tar.gz
+    >>> cd setuptools-18.4
+    >>> python setup.py build 
+    >>> sudo python setup.py install
+    >>> python -c "import setuptools"
+    >>> exit
+    ```
+
+### <a id="complexinstall"></a>Example: Installing NumPy 
+
+In this example, you will build and install the Python module NumPy. NumPy is 
a module for scientific computing with Python. For additional information about 
NumPy, refer to [http://www.numpy.org/](http://www.numpy.org/).
+
+This example assumes `yum` is installed on all HAWQ segment nodes and that the 
`gpadmin` user is a member of `sudoers` with `root` privileges on the nodes.
+
+#### <a id="complexinstall_prereq"></a>Prerequisites
+Building the NumPy package requires the following software:
+
+- OpenBLAS libraries - an open source implementation of BLAS (Basic Linear 
Algebra Subprograms)
+- Python development packages - python-devel
+- gcc compilers - gcc, gcc-gfortran, and gcc-c++
+
+Perform the following steps to set up the OpenBLAS compilation environment on 
each HAWQ node:
+
+1. Use `yum` to install gcc compilers from system repositories. The compilers 
are required on all hosts where you compile OpenBLAS.  For example:
+
+       ``` shell
+       root@hawq-node$ yum -y install gcc gcc-gfortran gcc-c++ python-devel
+       ```
+
+2. (Optional) If you cannot install the correct compiler versions with `yum`, you can download the gcc compilers, including `gfortran`, from source and build and install them manually. Refer to [Building gfortran from Source](https://gcc.gnu.org/wiki/GFortranBinaries#FromSource) for `gfortran` build and install information.
+
+3. Create a symbolic link to `g++`, naming it `gxx`:
+
+       ``` bash
+       root@hawq-node$ ln -s /usr/bin/g++ /usr/bin/gxx
+       ```
+
+4. You may also need to create symbolic links to any libraries that have 
different versions available; for example, linking `libppl_c.so.4` to 
`libppl_c.so.2`.
+
+5. You can use the `hawq scp` utility to copy files to HAWQ hosts and the 
`hawq ssh` utility to run commands on those hosts.
+
+
+#### <a id="complexinstall_downdist"></a>Obtaining Packages
+
+Perform the following steps to download and distribute the OpenBLAS and NumPy 
source packages:
+
+1. Download the OpenBLAS and NumPy source files. For example, these `wget` 
commands download `tar.gz` files into a `packages` directory in the current 
working directory:
+
+    ``` shell
+    $ ssh gpadmin@<hawq-node>
+    gpadmin@hawq-node$ wget --directory-prefix=packages 
http://github.com/xianyi/OpenBLAS/tarball/v0.2.8
+    gpadmin@hawq-node$ wget --directory-prefix=packages 
http://sourceforge.net/projects/numpy/files/NumPy/1.8.0/numpy-1.8.0.tar.gz/download
+    ```
+
+2. Distribute the software to all nodes in your HAWQ cluster. For example, if you downloaded the software to `/home/gpadmin/packages`, these commands create the `packages` directory on all nodes and copy the software to the nodes listed in the `hawq-hosts` file:
+
+    ``` shell
+    gpadmin@hawq-node$ hawq ssh -f hawq-hosts mkdir packages 
+    gpadmin@hawq-node$ hawq scp -f hawq-hosts packages/* 
=:/home/gpadmin/packages
+    ```
+
+#### <a id="buildopenblas"></a>Build and Install OpenBLAS Libraries 
+
+Before building and installing the NumPy module, you must first build and 
install the OpenBLAS libraries. This section describes how to build and install 
the libraries on a single HAWQ node.
+
+1. Extract the OpenBLAS files from the tar file:
+
+       ``` shell
+       $ ssh gpadmin@<hawq-node>
+       gpadmin@hawq-node$ cd packages
+       gpadmin@hawq-node$ tar xzf v0.2.8 -C /home/gpadmin/packages
+       gpadmin@hawq-node$ mv /home/gpadmin/packages/xianyi-OpenBLAS-9c51cdf 
/home/gpadmin/packages/OpenBLAS
+       ```
+       
+       These commands extract the OpenBLAS tar file and simplify the unpacked 
directory name.
+
+2. Compile OpenBLAS. You must set the `LIBRARY_PATH` environment variable to 
the current `$LD_LIBRARY_PATH`. For example:
+
+       ``` shell
+       gpadmin@hawq-node$ cd OpenBLAS
+       gpadmin@hawq-node$ export LIBRARY_PATH=$LD_LIBRARY_PATH
+       gpadmin@hawq-node$ make FC=gfortran USE_THREAD=0 TARGET=SANDYBRIDGE
+       ```
+       
+       Replace the `TARGET` argument with the target appropriate for your 
hardware. The `TargetList.txt` file identifies the list of supported OpenBLAS 
targets.
+       
+       Compiling OpenBLAS may take some time.
+
+3. Install the OpenBLAS libraries in `/usr/local` and then change the owner of 
the files to `gpadmin`. You must have `root` privileges. For example:
+
+       ``` shell
+       gpadmin@hawq-node$ sudo make PREFIX=/usr/local install
+       gpadmin@hawq-node$ sudo ldconfig
+       gpadmin@hawq-node$ sudo chown -R gpadmin /usr/local/lib
+       ```
+
+       The following libraries are installed to `/usr/local/lib`, along with 
symbolic links:
+
+       ``` shell
+       gpadmin@hawq-node$ ls -l /usr/local/lib
+           ...
+           libopenblas.a -> libopenblas_sandybridge-r0.2.8.a
+           libopenblas_sandybridge-r0.2.8.a
+           libopenblas_sandybridge-r0.2.8.so
+           libopenblas.so -> libopenblas_sandybridge-r0.2.8.so
+           libopenblas.so.0 -> libopenblas_sandybridge-r0.2.8.so
+           ...
+       ```
+
+4. Install the OpenBLAS libraries on all nodes in your HAWQ cluster. You can 
use the `hawq ssh` utility to similarly build and install the OpenBLAS 
libraries on each of the nodes. 
+
+    Or, you may choose to copy the OpenBLAS libraries you just built to all of 
the HAWQ cluster nodes. For example, these `hawq ssh` and `hawq scp` commands 
install prerequisite packages, and copy and install the OpenBLAS libraries on 
the hosts listed in the `hawq-hosts` file.
+
+    ``` shell
+    $ hawq ssh -f hawq-hosts -e 'sudo yum -y install gcc gcc-gfortran gcc-c++ 
python-devel'
+    $ hawq ssh -f hawq-hosts -e 'ln -s /usr/bin/g++ /usr/bin/gxx'
+    $ hawq ssh -f hawq-hosts -e 'sudo chown -R gpadmin /usr/local/lib'
+    $ hawq scp -f hawq-hosts /usr/local/lib/libopen*sandy* =:/usr/local/lib
+    ```
+    ``` shell
+    $ hawq ssh -f hawq-hosts
+    >>> cd /usr/local/lib
+    >>> ln -s libopenblas_sandybridge-r0.2.8.a libopenblas.a
+    >>> ln -s libopenblas_sandybridge-r0.2.8.so libopenblas.so
+    >>> ln -s libopenblas_sandybridge-r0.2.8.so libopenblas.so.0
+    >>> sudo ldconfig
+    ```
+
+#### <a id="buildinstallnumpy"></a>Build and Install NumPy
+
+After you have installed the OpenBLAS libraries, you can build and install the NumPy module. These steps install the NumPy module on a single host. You can use the `hawq ssh` utility to build and install the NumPy module on multiple hosts.
+
+1. Extract the NumPy module source files:
+
+       ``` shell
+       gpadmin@hawq-node$ cd /home/gpadmin/packages
+       gpadmin@hawq-node$ tar xzf numpy-1.8.0.tar.gz
+       ```
+       
+       Unpacking the `numpy-1.8.0.tar.gz` file creates a directory named 
`numpy-1.8.0` in the current directory.
+
+2. Set up the environment for building and installing NumPy:
+
+       ``` shell
+       gpadmin@hawq-node$ export BLAS=/usr/local/lib/libopenblas.a
+       gpadmin@hawq-node$ export LAPACK=/usr/local/lib/libopenblas.a
+       gpadmin@hawq-node$ export 
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib/
+       gpadmin@hawq-node$ export LIBRARY_PATH=$LD_LIBRARY_PATH
+       ```
+
+3. Build and install NumPy. (Building the NumPy package might take some time.)
+
+       ``` shell
+       gpadmin@hawq-node$ cd numpy-1.8.0
+       gpadmin@hawq-node$ python setup.py build
+       gpadmin@hawq-node$ sudo python setup.py install
+       ```
+
+       **Note:** If the NumPy module did not successfully build, the NumPy 
build process might need a `site.cfg` file that specifies the location of the 
OpenBLAS libraries. Create the `site.cfg` file in the NumPy package directory:
+
+       ``` shell
+       gpadmin@hawq-node$ touch site.cfg
+       ```
+
+       Add the following to the `site.cfg` file and run the NumPy build 
command again:
+
+       ``` pre
+       [default]
+       library_dirs = /usr/local/lib
+
+       [atlas]
+       atlas_libs = openblas
+       library_dirs = /usr/local/lib
+
+       [lapack]
+       lapack_libs = openblas
+       library_dirs = /usr/local/lib
+
+       # added for scikit-learn 
+       [openblas]
+       libraries = openblas
+       library_dirs = /usr/local/lib
+       include_dirs = /usr/local/include
+       ```
+
+4. Verify that the NumPy module is available for import by Python:
+
+       ``` shell
+       gpadmin@hawq-node$ cd $HOME
+       gpadmin@hawq-node$ python -c "import numpy"
+       ```
+       
+       If no error is returned, the NumPy module was successfully imported.
+
+5. As performed in the `setuptools` Python module installation, use the `hawq 
ssh` utility to build, install, and test the NumPy module on all HAWQ nodes.
+
+6. The environment variables that were required to build the NumPy module are 
also required in the `gpadmin` runtime environment to run Python NumPy 
functions. You can use the `echo` command to add the environment variables to 
`gpadmin`'s `.bashrc` file. For example, the following `echo` commands add the 
environment variables to the `.bashrc` file in `gpadmin`'s home directory:
+
+       ``` shell
+       $ echo -e '\n#Needed for NumPy' >> ~/.bashrc
+       $ echo -e 'export BLAS=/usr/local/lib/libopenblas.a' >> ~/.bashrc
+       $ echo -e 'export LAPACK=/usr/local/lib/libopenblas.a' >> ~/.bashrc
+       $ echo -e 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib' >> 
~/.bashrc
+       $ echo -e 'export LIBRARY_PATH=$LD_LIBRARY_PATH' >> ~/.bashrc
+       ```
+
+    You can use the `hawq ssh` utility with these `echo` commands to add the 
environment variables to the `.bashrc` file on all nodes in your HAWQ cluster.
+
+### <a id="testingpythonmodules"></a>Testing Installed Python Modules 
+
+You can create a simple PL/Python user-defined function (UDF) to validate that 
a Python module is available in HAWQ. This example tests the NumPy module.
+
+1. Create a PL/Python UDF that imports the NumPy module:
+
+    ``` shell
+    gpadmin@hawq_node$ psql -d testdb
+    ```
+    ``` sql
+    =# CREATE OR REPLACE FUNCTION test_importnumpy(x int)
+       RETURNS text
+       AS $$
+         try:
+             from numpy import *
+             return 'SUCCESS'
+         except ImportError:
+             return 'FAILURE'
+       $$ LANGUAGE plpythonu;
+    ```
+
+    The function returns SUCCESS if the module is imported, and FAILURE if an 
import error occurs.
+
+2. Create a table that loads data on each HAWQ segment instance:
+
+    ``` sql
+    =# CREATE TABLE disttbl AS (SELECT x FROM generate_series(1,50) x) DISTRIBUTED BY (x);
+    ```
+    
+    Depending upon the size of your HAWQ installation, you may need to 
generate a larger series to ensure data is distributed to all segment instances.
+
+3. Run the UDF against the distributed table; HAWQ executes the function on each segment instance where table data resides:
+
+    ``` sql
+    =# SELECT gp_segment_id, test_importnumpy(1) AS status
+         FROM disttbl
+         GROUP BY gp_segment_id, status
+         ORDER BY gp_segment_id, status;
+    ```
+
+    The `SELECT` command returns SUCCESS for each segment instance on which the UDF imported the Python module, and FAILURE for each instance where the module could not be imported.
+   
+
+#### <a id="testingpythonmodules"></a>Troubleshooting Python Module Import 
Failures
+
+Possible causes of a Python module import failure include:
+
+- A problem accessing required libraries. For the NumPy example, HAWQ might 
have a problem accessing the OpenBLAS libraries or the Python libraries on a 
segment host.
+
+       *Try*: Test importing the module on the segment host. This `hawq ssh` 
command tests importing the NumPy module on the segment host named mdw1.
+
+       ``` shell
+       gpadmin@hawq-node$ hawq ssh -h mdw1 python -c "import numpy"
+       ```
+
+- Environment variables may not be configured in the HAWQ environment. The 
Python import command may not return an error in this case.
+
+       *Try*: Ensure that the environment variables are properly set. For the 
NumPy example, ensure that the environment variables listed at the end of the 
section [Build and Install NumPy](#buildinstallnumpy) are defined in the 
`.bashrc` file for the `gpadmin` user on the master and all segment nodes.
+       
+       **Note:** The `.bashrc` file for the `gpadmin` user on the HAWQ master 
and all segment nodes must source the `greenplum_path.sh` file.
+
+       
+- HAWQ might not have been restarted after adding environment variable 
settings to the `.bashrc` file. Again, the Python import command may not return 
an error in this case.
+
+       *Try*: Ensure that you have restarted HAWQ.
+       
+       ``` shell
+       gpadmin@master$ hawq restart cluster
+       ```
+
+## <a id="dictionarygd"></a>Using the GD Dictionary to Improve PL/Python 
Performance 
+
+Importing a Python module is an expensive operation that can adversely affect performance. If a function imports the same module frequently, you can use a Python global variable to import the module on the first invocation and forgo reloading it on subsequent calls.
+
+The following PL/Python function uses the GD persistent storage dictionary to avoid re-importing the NumPy module when it is already present in the GD. The UDF includes a call to `plpy.notice()` to display a message when it imports the module.
+
+``` sql
+=# CREATE FUNCTION mypy_import2gd() RETURNS text AS $$ 
+     if 'numpy' not in GD:
+       plpy.notice('mypy_import2gd: importing module numpy')
+       import numpy
+       GD['numpy'] = numpy
+     return 'numpy'
+   $$ LANGUAGE plpythonu;
+```
+``` sql
+=# SELECT mypy_import2gd();
+NOTICE:  mypy_import2gd: importing module numpy
+CONTEXT:  PL/Python function "mypy_import2gd"
+ mypy_import2gd 
+----------------
+ numpy
+(1 row)
+```
+``` sql
+=# SELECT mypy_import2gd();
+ mypy_import2gd 
+----------------
+ numpy
+(1 row)
+```
+
+The second `SELECT` call does not include the `NOTICE` message, indicating 
that the module was obtained from the GD.
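+
+A UDF that uses the cached module follows the same pattern. A minimal sketch 
+(the function name `mypy_npsqrt` is illustrative and assumes the NumPy module 
+is installed on all segment nodes):
+
+``` sql
+=# CREATE FUNCTION mypy_npsqrt(x float8) RETURNS float8 AS $$
+     # import NumPy only if it is not already cached in the GD
+     if 'numpy' not in GD:
+       import numpy
+       GD['numpy'] = numpy
+     return GD['numpy'].sqrt(x)
+   $$ LANGUAGE plpythonu STRICT;
+=# SELECT mypy_npsqrt(2.0);
+```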
+
+## <a id="references"></a>References 
+
+This section lists references for using PL/Python.
+
+### <a id="technicalreferences"></a>Technical References 
+
+For information about PL/Python in HAWQ, see the [PL/Python - Python 
Procedural Language](http://www.postgresql.org/docs/8.2/static/plpython.html) 
PostgreSQL documentation.
+
+For information about Python Package Index (PyPI), refer to [PyPI - the Python 
Package Index](https://pypi.python.org/pypi).
+
+The following Python modules may be of interest:
+
+- [SciPy library](http://www.scipy.org/scipylib/index.html) provides 
user-friendly and efficient numerical routines including those for numerical 
integration and optimization. To download the SciPy package tar file:
+
+    ``` shell
+    hawq-node$ wget 
http://sourceforge.net/projects/scipy/files/scipy/0.10.1/scipy-0.10.1.tar.gz
+    ```
+
+- [Natural Language Toolkit](http://www.nltk.org/) (`nltk`) is a platform for 
building Python programs to work with human language data. 
+
+    The Python [`distribute`](https://pypi.python.org/pypi/distribute/0.6.21) package is required for `nltk`. The `distribute` package should be installed before installing `nltk`. To download the `distribute` package tar file:
+
+    ``` shell
+    hawq-node$ wget 
http://pypi.python.org/packages/source/d/distribute/distribute-0.6.21.tar.gz
+    ```
+
+    To download the `nltk` package tar file:
+
+    ``` shell
+    hawq-node$ wget 
http://pypi.python.org/packages/source/n/nltk/nltk-2.0.2.tar.gz#md5=6e714ff74c3398e88be084748df4e657
+    ```
+
+### <a id="usefulreading"></a>Useful Reading 
+
+For information about the Python language, see 
[http://www.python.org/](http://www.python.org/).
+
+A set of slides from a talk about how the Pivotal Data Science team uses the PyData stack in the Pivotal MPP databases and on Pivotal Cloud Foundry is available at [http://www.slideshare.net/SrivatsanRamanujam/all-thingspythonpivotal](http://www.slideshare.net/SrivatsanRamanujam/all-thingspythonpivotal).
+

