[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation
[ https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612454#comment-15612454 ]

ASF GitHub Bot commented on HAWQ-1071:
--------------------------------------

Github user kavinderd commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85376355

--- Diff: pxf/HivePXF.html.md.erb ---
@@ -2,121 +2,450 @@ title: Accessing Hive Data
 ---
-This topic describes how to access Hive data using PXF. You have several options for querying data stored in Hive. You can create external tables in PXF and then query those tables, or you can easily query Hive tables by using HAWQ and PXF's integration with HCatalog. HAWQ accesses Hive table metadata stored in HCatalog.
+Apache Hive is a distributed data warehousing infrastructure. Hive facilitates the management of large data sets and supports multiple data formats, including comma-separated value (.csv), RC, ORC, and Parquet. The PXF Hive plug-in reads data stored in Hive, as well as in HDFS and HBase.
+
+This section describes how to use PXF to access Hive data. Options for querying data stored in Hive include:
+
+- Creating an external table in PXF and querying that table
+- Querying Hive tables via PXF's integration with HCatalog

 ## Prerequisites

-Check the following before using PXF to access Hive:
+Before accessing Hive data with HAWQ and PXF, ensure that:

-- The PXF HDFS plug-in is installed on all cluster nodes.
+- The PXF HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
 - The PXF Hive plug-in is installed on all cluster nodes.
 - The Hive JAR files and conf directory are installed on all cluster nodes.
-- Test PXF on HDFS before connecting to Hive or HBase.
+- You have tested PXF on HDFS.
 - You are running the Hive Metastore service on a machine in your cluster.
 - You have set the `hive.metastore.uris` property in the `hive-site.xml` on the NameNode.
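The last prerequisite above can be satisfied with a `hive-site.xml` entry like the following sketch. The host name is a placeholder, and 9083 is the conventional Hive Metastore Thrift port, not a value taken from this document:

```xml
<configuration>
  <property>
    <!-- URI of the Hive Metastore Thrift service; host is illustrative -->
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host.example.com:9083</value>
  </property>
</configuration>
```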
+## Hive File Formats
+
+Hive supports several file formats:
+
+- TextFile - flat file with data in comma-, tab-, or space-separated value format or JSON notation
+- SequenceFile - flat file consisting of binary key/value pairs
+- RCFile - record columnar data consisting of binary key/value pairs; high row compression rate
+- ORCFile - optimized row columnar data with stripe, footer, and postscript sections; reduces data size
+- Parquet - compressed columnar data representation
+- Avro - JSON-defined, schema-based data serialization format
+
+Refer to [File Formats](https://cwiki.apache.org/confluence/display/Hive/FileFormats) for detailed information about the file formats supported by Hive.
+
+The PXF Hive plug-in supports the following profiles for accessing the Hive file formats listed above:
+
+- `Hive`
+- `HiveText`
+- `HiveRC`
+
+## Data Type Mapping
+
+### Primitive Data Types
+
+To represent Hive data in HAWQ, map data values that use a primitive data type to HAWQ columns of the same type.
+
+The following table summarizes external mapping rules for Hive primitive types.
+
+| Hive Data Type | HAWQ Data Type |
+|---|---|
+| boolean | bool |
+| int | int4 |
+| smallint | int2 |
+| tinyint | int2 |
+| bigint | int8 |
+| decimal | numeric |
+| float | float4 |
+| double | float8 |
+| string | text |
+| binary | bytea |
+| char | bpchar |
+| varchar | varchar |
+| timestamp | timestamp |
+| date | date |
+
+### Complex Data Types
+
+Hive supports complex data types including array, struct, map, and union. PXF maps each of these complex types to `text`. While HAWQ does not natively support these types, you can create HAWQ functions or application code to extract subcomponents of these complex data types.
+
+An example using complex data types is provided later in this topic.
+
+## Sample Data Set
+
+Examples used in this topic operate on a common data set.
This simple data set models a retail sales operation and includes fields with the following names and data types:
+
+- location - text
+- month - text
+- number\_of\_orders - integer
+- total\_sales - double
+
+Prepare the sample data set for use:
+
+1. Create a text file:
+
+``` shell
+$ vi /tmp/pxf_hive_datafile.txt
+```
+
+2. Add the following data to `pxf_hive_datafile.txt`; note the use of the comma (`,`) to separate the four field values:
+
+```
+Prague,Jan,101,4875.33
+```
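As an alternative to editing the file in `vi`, the sample file can be created and sanity-checked directly from the shell. This is a sketch: only the first data row quoted above is included, so append the remaining rows in the same comma-separated format.

```shell
# Create the sample data file; only the first row shown in the text is
# included here -- add further rows in the same four-field CSV format.
printf 'Prague,Jan,101,4875.33\n' > /tmp/pxf_hive_datafile.txt

# Verify that every line carries exactly four comma-separated fields.
awk -F',' 'NF != 4 { bad = 1 } END { exit bad }' /tmp/pxf_hive_datafile.txt \
  && echo "format OK"
```

The `awk` check exits nonzero if any row is malformed, which makes it easy to reuse after appending more data.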
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612400#comment-15612400 ]

ASF GitHub Bot commented on HAWQ-1107:
--------------------------------------

Github user lisakowen commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r85374612

--- Diff: pxf/HDFSFileDataPXF.html.md.erb ---
@@ -2,506 +2,449 @@ title: Accessing HDFS File Data
 ---
-## Prerequisites
+HDFS is the primary distributed storage mechanism used by Apache Hadoop applications. The PXF HDFS plug-in reads file data stored in HDFS. The plug-in supports plain delimited and comma-separated-value format text files. The HDFS plug-in also supports the Avro binary format.

-Before working with HDFS file data using HAWQ and PXF, you should perform the following operations:
+This section describes how to use PXF to access HDFS data, including how to create and query an external table from files in the HDFS data store.

-- Test PXF on HDFS before connecting to Hive or HBase.
-- Ensure that all HDFS users have read permissions to HDFS services and that write permissions have been limited to specific users.
+## Prerequisites

-## Syntax
+Before working with HDFS file data using HAWQ and PXF, ensure that:

-The syntax for creating an external HDFS file is as follows:
+- The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
+- All HDFS users have read permissions to HDFS services, and write permissions have been restricted to specific users.

-``` sql
-CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name
-( column_name data_type [, ...] | LIKE other_table )
-LOCATION ('pxf://host[:port]/path-to-data?<pxf-parameters>[&custom-option=value...]')
-FORMAT '[TEXT | CSV | CUSTOM]' (<formatting-properties>);
-```
+## HDFS File Formats

-where `<pxf-parameters>` is:
+The PXF HDFS plug-in supports reading the following file formats:

-``` pre
-[FRAGMENTER=fragmenter_class&ACCESSOR=accessor_class&RESOLVER=resolver_class]
-| PROFILE=profile-name
-```
+- Text File - comma-separated value (.csv) or delimited format plain text file
+- Avro - JSON-defined, schema-based data serialization format

-**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables.
+The PXF HDFS plug-in includes the following profiles to support the file formats listed above:

-Use an SQL `SELECT` statement to read from an HDFS READABLE table:
+- `HdfsTextSimple` - text files
+- `HdfsTextMulti` - text files with embedded line feeds
+- `Avro` - Avro files

-``` sql
-SELECT ... FROM table_name;
-```
+If you find that the pre-defined PXF HDFS profiles do not meet your needs, you may choose to create a custom HDFS profile from the existing HDFS serialization and deserialization classes. Refer to [Adding and Updating Profiles](ReadWritePXF.html#addingandupdatingprofiles) for information on creating a custom profile.

+## HDFS Shell Commands
+
+Hadoop includes command-line tools that interact directly with HDFS. These tools support typical file system operations including copying and listing files, changing file permissions, and so forth.
+
+The HDFS file system command syntax is `hdfs dfs [<options>]`. Invoked with no options, `hdfs dfs` lists the file system options supported by the tool.
+
+`hdfs dfs` options used in this topic are:
+
+| Option | Description |
+|---|---|
+| `-cat` | Display file contents. |
+| `-mkdir` | Create a directory in HDFS. |
+| `-put` | Copy a file from the local file system to HDFS. |
+
+Examples:
+
+Create a directory in HDFS:
+
+``` shell
+$ sudo -u hdfs hdfs dfs -mkdir -p /data/exampledir
+```
+
+Copy a text file to HDFS:
+
+``` shell
+$ sudo -u hdfs hdfs dfs -put /tmp/example.txt /data/exampledir/
+```

-Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table:
-
-``` sql
-INSERT INTO table_name ...;
-```
-
-To read the data in the files or to write based on the existing format, use `FORMAT`, `PROFILE`, or one of the classes.
-
-This topic describes the following:
-
-- FORMAT clause
-- Profile
-- Accessor
-- Resolver
-- Avro
-
-**Note:** For more details about the API and classes, see [PXF External Tables and API](PXFExternalTableandAPIReference.html#pxfexternaltableandapireference).
-
-### FORMAT clause
-
-Use one of the following formats to read data with any PXF connector:
-
-- `FORMAT 'TEXT'`: Use with plain delimited text files on HDFS.
-- `FORMAT 'CSV'`: Use with comma-separated value files on HDFS.
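The pieces above — staging a file with `hdfs dfs -put` and reading it through a readable external table — fit together roughly as follows. This is a sketch, not text from the documentation under review: the `namenode` host, the PXF port `51200`, all paths, and the table and column names are illustrative assumptions, and the HDFS and psql steps require a running cluster, so they appear as comments.

```shell
# 1. Stage a comma-delimited text file locally (sample rows are illustrative).
printf 'Prague,Jan,101,4875.33\nRome,Mar,87,1557.39\n' > /tmp/pxf_hdfs_example.txt

# 2. Copy it into HDFS (requires a running Hadoop cluster):
#      sudo -u hdfs hdfs dfs -mkdir -p /data/exampledir
#      sudo -u hdfs hdfs dfs -put /tmp/pxf_hdfs_example.txt /data/exampledir/

# 3. From psql, create and query a readable external table with the
#    HdfsTextSimple profile (host, port, and names are assumptions):
#      CREATE EXTERNAL TABLE pxf_hdfs_example
#        (location text, month text, num_orders int, total_sales float8)
#      LOCATION ('pxf://namenode:51200/data/exampledir/pxf_hdfs_example.txt?PROFILE=HdfsTextSimple')
#      FORMAT 'TEXT' (delimiter = E',');
#      SELECT * FROM pxf_hdfs_example;

# Local sanity check on the staged file: count well-formed rows.
awk -F',' 'NF == 4 { n++ } END { print n, "rows" }' /tmp/pxf_hdfs_example.txt
# -> 2 rows
```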
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612406#comment-15612406 ]

ASF GitHub Bot commented on HAWQ-1107:
--------------------------------------

Github user kavinderd commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r85375160

(The quoted diff hunk from pxf/HDFSFileDataPXF.html.md.erb is identical to the one shown above.)
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612373#comment-15612373 ]

ASF GitHub Bot commented on HAWQ-1107:
--------------------------------------

GitHub user lisakowen opened a pull request:

    https://github.com/apache/incubator-hawq-docs/pull/41

    HAWQ-1107 - incorporate kavinder's comments

    incorporated kavinder's comments on HDFS plug in doc restructure.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lisakowen/incubator-hawq-docs feature/pxfhdfs-enhance

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hawq-docs/pull/41.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #41

commit e16a4a46b6ab2a180e99f5fc793bbabb4f4cbfec
Author: Lisa Owen
Date:   2016-10-27T16:10:29Z

    incorporate kavinder's comments

> PXF HDFS documentation - restructure content and include more examples
> ----------------------------------------------------------------------
>
>          Key: HAWQ-1107
>          URL: https://issues.apache.org/jira/browse/HAWQ-1107
>      Project: Apache HAWQ
>   Issue Type: Improvement
>   Components: Documentation
>     Reporter: Lisa Owen
>     Assignee: David Yozie
>     Priority: Minor
>      Fix For: 2.0.1.0-incubating
>
> the current PXF HDFS documentation does not include any runnable examples. add runnable examples for all (HdfsTextSimple, HdfsTextMulti, SerialWritable, Avro) profiles. restructure the content as well.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation
[ https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612361#comment-15612361 ]

ASF GitHub Bot commented on HAWQ-1071:
--------------------------------------

Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85372086

(The quoted diff hunk from pxf/HivePXF.html.md.erb is identical to the one shown above.)
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612370#comment-15612370 ]

ASF GitHub Bot commented on HAWQ-1107:
--------------------------------------

Github user kavinderd commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r85372290

(The quoted diff hunk from pxf/HDFSFileDataPXF.html.md.erb is identical to the one shown above.)
[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation
[ https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612358#comment-15612358 ]

ASF GitHub Bot commented on HAWQ-1071:
--------------------------------------

Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85370681

--- Diff: pxf/HivePXF.html.md.erb ---
@@ -151,184 +477,120 @@ To enable HCatalog query integration in HAWQ, perform the following steps:

     postgres=# GRANT ALL ON PROTOCOL pxf TO "role";
 ```

-3. To query a Hive table with HCatalog integration, simply query HCatalog directly from HAWQ. The query syntax is:
-``` sql
-postgres=# SELECT * FROM hcatalog.hive-db-name.hive-table-name;
-```
+To query a Hive table with HCatalog integration, query HCatalog directly from HAWQ. The query syntax is:

--- End diff --

It's a bit awkward to drop out of the procedure and into free-form discussion of the various operations. I think it might be better to put the previous 3-step procedure into a new subsection like "Enabling HCatalog Integration" and then put the remaining non-procedural content into "Usage"?

> add PXF HiveText and HiveRC profile examples to the documentation
> -----------------------------------------------------------------
>
>          Key: HAWQ-1071
>          URL: https://issues.apache.org/jira/browse/HAWQ-1071
>      Project: Apache HAWQ
>   Issue Type: Improvement
>   Components: Documentation
>     Reporter: Lisa Owen
>     Assignee: David Yozie
>     Priority: Minor
>      Fix For: 2.0.1.0-incubating
>
> the current PXF Hive documentation includes an example for only the Hive profile. add examples for HiveText and HiveRC profiles.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation
[ https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612362#comment-15612362 ]

ASF GitHub Bot commented on HAWQ-1071:
--------------------------------------

Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85368842

(The quoted diff hunk from pxf/HivePXF.html.md.erb is identical to the one shown above.)
[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation
[ https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15612357#comment-15612357 ] ASF GitHub Bot commented on HAWQ-1071: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85367789 --- Diff: pxf/HivePXF.html.md.erb --- @@ -2,121 +2,450 @@ title: Accessing Hive Data --- -This topic describes how to access Hive data using PXF. You have several options for querying data stored in Hive. You can create external tables in PXF and then query those tables, or you can easily query Hive tables by using HAWQ and PXF's integration with HCatalog. HAWQ accesses Hive table metadata stored in HCatalog. +Apache Hive is a distributed data warehousing infrastructure. Hive facilitates managing large data sets supporting multiple data formats, including comma-separated value (.csv), RC, ORC, and parquet. The PXF Hive plug-in reads data stored in Hive, as well as HDFS or HBase. + +This section describes how to use PXF to access Hive data. Options for querying data stored in Hive include: + +- Creating an external table in PXF and querying that table +- Querying Hive tables via PXF's integration with HCatalog ## Prerequisites -Check the following before using PXF to access Hive: +Before accessing Hive data with HAWQ and PXF, ensure that: -- The PXF HDFS plug-in is installed on all cluster nodes. +- The PXF HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information. - The PXF Hive plug-in is installed on all cluster nodes. - The Hive JAR files and conf directory are installed on all cluster nodes. -- Test PXF on HDFS before connecting to Hive or HBase. +- You have tested PXF on HDFS. - You are running the Hive Metastore service on a machine in your cluster. - You have set the `hive.metastore.uris` property in the `hive-site.xml` on the NameNode. 
    +## Hive File Formats
    +
    +Hive supports several file formats:
    +
    +- TextFile - flat file with data in comma-, tab-, or space-separated value format or JSON notation
    +- SequenceFile - flat file consisting of binary key/value pairs
    +- RCFile - record columnar data consisting of binary key/value pairs; high row compression rate
    +- ORCFile - optimized row columnar data with stripe, footer, and postscript sections; reduces data size
    +- Parquet - compressed columnar data representation
    +- Avro - JSON-defined, schema-based data serialization format
    +
    +Refer to [File Formats](https://cwiki.apache.org/confluence/display/Hive/FileFormats) for detailed information about the file formats supported by Hive.
    +
    +The PXF Hive plug-in supports the following profiles for accessing these Hive file formats:
    +
    +- `Hive`
    +- `HiveText`
    +- `HiveRC`
    +
    +## Data Type Mapping
    +
    +### Primitive Data Types
    +
    +To represent Hive data in HAWQ, map data values that use a primitive data type to HAWQ columns of the same type.
    +
    +The following table summarizes external mapping rules for Hive primitive types.
    +
    +| Hive Data Type | HAWQ Data Type |
    +|---|---|
    +| boolean | bool |
    +| int | int4 |
    +| smallint | int2 |
    +| tinyint | int2 |
    +| bigint | int8 |
    +| decimal | numeric |
    +| float | float4 |
    +| double | float8 |
    +| string | text |
    +| binary | bytea |
    +| char | bpchar |
    +| varchar | varchar |
    +| timestamp | timestamp |
    +| date | date |
    +
    +### Complex Data Types
    +
    +Hive supports complex data types including array, struct, map, and union. PXF maps each of these complex types to `text`. While HAWQ does not natively support these types, you can create HAWQ functions or application code to extract subcomponents of these complex data types.
    +
    +An example using complex data types is provided later in this topic.
    +
    +## Sample Data Set
    +
    +Examples used in this topic operate on a common data set.
    +This simple data set models a retail sales operation and includes fields with the following names and data types:
    +
    +- location - text
    +- month - text
    +- number\_of\_orders - integer
    +- total\_sales - double
    +
    +Prepare the sample data set for use:
    +
    +1. Create a text file:
    +
    +    ```
    +    $ vi /tmp/pxf_hive_datafile.txt
    +    ```
    +
    +2. Add the following data to `pxf_hive_datafile.txt`; notice the use of the comma `,` to separate the four field values:
    +
    +    ```
    +    Prague,Jan,101,4875.33
    +    ```
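The primitive type mapping table and the sample record layout quoted above can be illustrated in code. The sketch below is illustrative only; the `hawq_column_list` and `parse_record` helpers are hypothetical names, not part of PXF or HAWQ:

```python
# Illustrative sketch only: these helpers are not part of PXF or HAWQ.
# They mirror the Hive-to-HAWQ primitive type mapping table and the
# sample data set layout described in the quoted documentation.

# Hive primitive type -> HAWQ type, per the mapping table
HIVE_TO_HAWQ = {
    "boolean": "bool", "int": "int4", "smallint": "int2", "tinyint": "int2",
    "bigint": "int8", "decimal": "numeric", "float": "float4",
    "double": "float8", "string": "text", "binary": "bytea",
    "char": "bpchar", "varchar": "varchar",
    "timestamp": "timestamp", "date": "date",
}

# Sample data set fields: (name, Hive type)
SCHEMA = [("location", "string"), ("month", "string"),
          ("number_of_orders", "int"), ("total_sales", "double")]

def hawq_column_list(schema):
    """Render a HAWQ column list for a Hive schema using the mapping table."""
    return ", ".join(f"{name} {HIVE_TO_HAWQ[htype]}" for name, htype in schema)

def parse_record(line):
    """Split one comma-delimited sample record into typed Python values."""
    casts = {"string": str, "int": int, "double": float}
    return {name: casts[htype](value)
            for (name, htype), value in zip(SCHEMA, line.strip().split(","))}

print(hawq_column_list(SCHEMA))
# location text, month text, number_of_orders int4, total_sales float8
print(parse_record("Prague,Jan,101,4875.33"))
```

The column list matches the HAWQ-side definition an external table over this data set would use under the documented mapping rules.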
[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation

[ https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15612366#comment-15612366 ]

ASF GitHub Bot commented on HAWQ-1071:
--
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85369947
[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation
[ https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15612365#comment-15612365 ]

ASF GitHub Bot commented on HAWQ-1071:
--
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85365540

    --- Diff: pxf/HivePXF.html.md.erb ---
    +## Hive File Formats
    +
    +Hive supports several file formats:
    +
    +- TextFile - flat file with data in comma-, tab-, or space-separated value format or JSON notation
    +- SequenceFile - flat file consisting of binary key/value pairs
    +- RCFile - record columnar data consisting of binary key/value pairs; high row compression rate
    +- ORCFile - optimized row columnar data with stripe, footer, and postscript sections; reduces data size
    +- Parquet - compressed columnar data representation
    +- Avro - JSON-defined, schema-based data serialization format

--- End diff --

Just a suggestion, but I think this would read better as a 2-column term/definition table. You could even make it a 3-column table to describe which PXF plug-ins are used with each format.

> add PXF HiveText and HiveRC profile examples to the documentation
> -----------------------------------------------------------------
>
>                 Key: HAWQ-1071
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1071
>             Project: Apache HAWQ
>          Issue Type: Improvement
>          Components: Documentation
>            Reporter: Lisa Owen
>            Assignee: David Yozie
>            Priority: Minor
>             Fix For: 2.0.1.0-incubating
>
> the current PXF Hive documentation includes an example for only the Hive profile. add examples for HiveText and HiveRC profiles.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation

[ https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15612363#comment-15612363 ]

ASF GitHub Bot commented on HAWQ-1071:
--
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85367290
[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation

[ https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15612367#comment-15612367 ]

ASF GitHub Bot commented on HAWQ-1071:
--
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85367943
[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation
[ https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15612359#comment-15612359 ]

ASF GitHub Bot commented on HAWQ-1071:
--
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85365959

    --- Diff: pxf/HivePXF.html.md.erb ---
    +This simple data set models a retail sales operation and includes fields with the following names and data types:
    +
    +- location - text
    +- month - text
    +- number\_of\_orders - integer
    +- total\_sales - double

--- End diff --

Also consider term/definition table here.
[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation

[ https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15612356#comment-15612356 ]

ASF GitHub Bot commented on HAWQ-1071:
--
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85368752
[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation
[ https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15612364#comment-15612364 ] ASF GitHub Bot commented on HAWQ-1071: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85366470 --- Diff: pxf/HivePXF.html.md.erb --- @@ -2,121 +2,450 @@ title: Accessing Hive Data --- -This topic describes how to access Hive data using PXF. You have several options for querying data stored in Hive. You can create external tables in PXF and then query those tables, or you can easily query Hive tables by using HAWQ and PXF's integration with HCatalog. HAWQ accesses Hive table metadata stored in HCatalog. +Apache Hive is a distributed data warehousing infrastructure. Hive facilitates managing large data sets supporting multiple data formats, including comma-separated value (.csv), RC, ORC, and parquet. The PXF Hive plug-in reads data stored in Hive, as well as HDFS or HBase. + +This section describes how to use PXF to access Hive data. Options for querying data stored in Hive include: + +- Creating an external table in PXF and querying that table +- Querying Hive tables via PXF's integration with HCatalog ## Prerequisites -Check the following before using PXF to access Hive: +Before accessing Hive data with HAWQ and PXF, ensure that: -- The PXF HDFS plug-in is installed on all cluster nodes. +- The PXF HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information. - The PXF Hive plug-in is installed on all cluster nodes. - The Hive JAR files and conf directory are installed on all cluster nodes. -- Test PXF on HDFS before connecting to Hive or HBase. +- You have tested PXF on HDFS. - You are running the Hive Metastore service on a machine in your cluster. - You have set the `hive.metastore.uris` property in the `hive-site.xml` on the NameNode. 
+## Hive File Formats + +Hive supports several file formats: + +- TextFile - flat file with data in comma-, tab-, or space-separated value format or JSON notation +- SequenceFile - flat file consisting of binary key/value pairs +- RCFile - record columnar data consisting of binary key/value pairs; high row compression rate +- ORCFile - optimized row columnar data with stripe, footer, and postscript sections; reduces data size +- Parquet - compressed columnar data representation +- Avro - JSON-defined, schema-based data serialization format + +Refer to [File Formats](https://cwiki.apache.org/confluence/display/Hive/FileFormats) for detailed information about the file formats supported by Hive. + +The PXF Hive plug-in supports the following profiles for accessing the Hive file formats listed above. These include: + +- `Hive` +- `HiveText` +- `HiveRC` + +## Data Type Mapping + +### Primitive Data Types + +To represent Hive data in HAWQ, map data values that use a primitive data type to HAWQ columns of the same type. + +The following table summarizes external mapping rules for Hive primitive types. + +| Hive Data Type | Hawq Data Type | +|---|---| +| boolean| bool | +| int | int4 | +| smallint | int2 | +| tinyint | int2 | +| bigint | int8 | +| decimal | numeric | +| float | float4 | +| double | float8 | +| string | text | +| binary | bytea | +| char | bpchar | +| varchar | varchar | +| timestamp | timestamp | +| date | date | + + +### Complex Data Types + +Hive supports complex data types including array, struct, map, and union. PXF maps each of these complex types to `text`. While HAWQ does not natively support these types, you can create HAWQ functions or application code to extract subcomponents of these complex data types. + +An example using complex data types is provided later in this topic. + + +## Sample Data Set + +Examples used in this topic will operate on a common data set. 
This simple data set models a retail sales operation and includes fields with the following names and data types: + +- location - text +- month - text +- number\_of\_orders - integer +- total\_sales - double + +Prepare the sample data set for use: + +1. First, create a text file: + +``` +$ vi /tmp/pxf_hive_datafile.txt +``` + +2. Add the following data to `pxf_hive_datafile.txt`; notice the use of the comma `,` to separate the four field values: + +``` +Prague,Jan,101,4875.33 +
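The primitive type mapping table quoted earlier in this comment lends itself to a small lookup, for example in tooling that derives HAWQ column types from a Hive schema. A minimal sketch (the function name and the `text` fallback for complex types are illustrative, not part of PXF itself):

```python
# Hive primitive type -> HAWQ type, transcribed from the mapping table above.
HIVE_TO_HAWQ = {
    "boolean": "bool",
    "int": "int4",
    "smallint": "int2",
    "tinyint": "int2",
    "bigint": "int8",
    "decimal": "numeric",
    "float": "float4",
    "double": "float8",
    "string": "text",
    "binary": "bytea",
    "char": "bpchar",
    "varchar": "varchar",
    "timestamp": "timestamp",
    "date": "date",
}

def hawq_type(hive_type: str) -> str:
    """Map a Hive primitive type to its HAWQ equivalent.

    Complex Hive types (array, struct, map, union) are surfaced by PXF
    as text, so anything outside the table falls back to "text".
    """
    return HIVE_TO_HAWQ.get(hive_type.lower(), "text")

print(hawq_type("bigint"))  # int8
print(hawq_type("struct"))  # text (complex type)
```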
[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation
[ https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15612360#comment-15612360 ] ASF GitHub Bot commented on HAWQ-1071: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85371576 --- Diff: pxf/HivePXF.html.md.erb --- @@ -151,184 +477,120 @@ To enable HCatalog query integration in HAWQ, perform the following steps: postgres=# GRANT ALL ON PROTOCOL pxf TO "role"; ``` -3. To query a Hive table with HCatalog integration, simply query HCatalog directly from HAWQ. The query syntax is: -``` sql -postgres=# SELECT * FROM hcatalog.hive-db-name.hive-table-name; -``` +To query a Hive table with HCatalog integration, query HCatalog directly from HAWQ. The query syntax is: + +``` sql +postgres=# SELECT * FROM hcatalog.hive-db-name.hive-table-name; +``` -For example: +For example: -``` sql -postgres=# SELECT * FROM hcatalog.default.sales; -``` - -4. To obtain a description of a Hive table with HCatalog integration, you can use the `psql` client interface. -- Within HAWQ, use either the `\d hcatalog.hive-db-name.hive-table-name` or `\d+ hcatalog.hive-db-name.hive-table-name` commands to describe a single table. For example, from the `psql` client interface: - -``` shell -$ psql -d postgres -postgres=# \d hcatalog.default.test - -PXF Hive Table "default.test" -Column| Type ---+ - name | text - type | text - supplier_key | int4 - full_price | float8 -``` -- Use `\d hcatalog.hive-db-name.*` to describe the whole database schema. 
For example: - -``` shell -postgres=# \d hcatalog.default.* - -PXF Hive Table "default.test" -Column| Type ---+ - type | text - name | text - supplier_key | int4 - full_price | float8 - -PXF Hive Table "default.testabc" - Column | Type -+-- - type | text - name | text -``` -- Use `\d hcatalog.*.*` to describe the whole schema: - -``` shell -postgres=# \d hcatalog.*.* - -PXF Hive Table "default.test" -Column| Type ---+ - type | text - name | text - supplier_key | int4 - full_price | float8 - -PXF Hive Table "default.testabc" - Column | Type -+-- - type | text - name | text - -PXF Hive Table "userdb.test" - Column | Type ---+-- - address | text - username | text - -``` - -**Note:** When using `\d` or `\d+` commands in the `psql` HAWQ client, `hcatalog` will not be listed as a database. If you use other `psql` compatible clients, `hcatalog` will be listed as a database with a size value of `-1` since `hcatalog` is not a real database in HAWQ. - -5. Alternatively, you can use the **pxf\_get\_item\_fields** user-defined function (UDF) to obtain Hive table descriptions from other client interfaces or third-party applications. The UDF takes a PXF profile and a table pattern string as its input parameters. - -**Note:** Currently the only supported input profile is `'Hive'`. - -For example, the following statement returns a description of a specific table. The description includes path, itemname (table), fieldname, and fieldtype. +``` sql +postgres=# SELECT * FROM hcatalog.default.sales_info; +``` + +To obtain a description of a Hive table with HCatalog integration, you can use the `psql` client interface. + +- Within HAWQ, use either the `\d hcatalog.hive-db-name.hive-table-name` or `\d+ hcatalog.hive-db-name.hive-table-name` commands to describe a single table. 
For example, from the `psql` client interface: + +``` shell +$ psql -d postgres +``` ``` sql -postgres=# select * from pxf_get_item_fields('Hive','default.test'); +postgres=# \d hcatalog.default.sales_info_rcfile; ``` - -``` pre - path | itemname |
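The `\d` output quoted above is truncated; the same table metadata can also be retrieved in plain SQL through the `pxf_get_item_fields` UDF described earlier in this comment. A sketch, assuming the `default.sales_info_rcfile` table from the example exists and that wildcard patterns are accepted as they are with `\d` (the column list follows the description in the diff: path, itemname, fieldname, fieldtype):

```sql
-- Describe one Hive table; 'Hive' is currently the only supported profile.
SELECT path, itemname, fieldname, fieldtype
  FROM pxf_get_item_fields('Hive', 'default.sales_info_rcfile');

-- A wildcard pattern describes every table in the database.
SELECT itemname, fieldname, fieldtype
  FROM pxf_get_item_fields('Hive', 'default.*');
```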
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15612342#comment-15612342 ] ASF GitHub Bot commented on HAWQ-1107: -- Github user lisakowen commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r85371514 --- Diff: pxf/HDFSFileDataPXF.html.md.erb --- @@ -2,506 +2,449 @@ title: Accessing HDFS File Data --- -## Prerequisites +HDFS is the primary distributed storage mechanism used by Apache Hadoop applications. The PXF HDFS plug-in reads file data stored in HDFS. The plug-in supports plain delimited and comma-separated-value format text files. The HDFS plug-in also supports the Avro binary format. -Before working with HDFS file data using HAWQ and PXF, you should perform the following operations: +This section describes how to use PXF to access HDFS data, including how to create and query an external table from files in the HDFS data store. -- Test PXF on HDFS before connecting to Hive or HBase. -- Ensure that all HDFS users have read permissions to HDFS services and that write permissions have been limited to specific users. +## Prerequisites -## Syntax +Before working with HDFS file data using HAWQ and PXF, ensure that: -The syntax for creating an external HDFS file is as follows: +- The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information. +- All HDFS users have read permissions to HDFS services and that write permissions have been restricted to specific users. -``` sql -CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name -( column_name data_type [, ...] 
| LIKE other_table ) -LOCATION ('pxf://host[:port]/path-to-data?<pxf-parameters>[&<custom-option>=value...]') - FORMAT '[TEXT | CSV | CUSTOM]' (<formatting-properties>); -``` +## HDFS File Formats -where `<pxf-parameters>` is: +The PXF HDFS plug-in supports reading the following file formats: -``` pre - [FRAGMENTER=fragmenter_class&ACCESSOR=accessor_class&RESOLVER=resolver_class] - | PROFILE=profile-name -``` +- Text File - comma-separated value (.csv) or delimited format plain text file +- Avro - JSON-defined, schema-based data serialization format -**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables. +The PXF HDFS plug-in includes the following profiles to support the file formats listed above: -Use an SQL `SELECT` statement to read from an HDFS READABLE table: +- `HdfsTextSimple` - text files +- `HdfsTextMulti` - text files with embedded line feeds +- `Avro` - Avro files -``` sql -SELECT ... FROM table_name; +If you find that the pre-defined PXF HDFS profiles do not meet your needs, you may choose to create a custom HDFS profile from the existing HDFS serialization and deserialization classes. Refer to [Adding and Updating Profiles](ReadWritePXF.html#addingandupdatingprofiles) for information on creating a custom profile. + +## HDFS Shell Commands +Hadoop includes command-line tools that interact directly with HDFS. These tools support typical file system operations including copying and listing files, changing file permissions, and so forth. + +The HDFS file system command syntax is `hdfs dfs [<options>]`. Invoked with no options, `hdfs dfs` lists the file system options supported by the tool. + +`hdfs dfs` options used in this topic are: + +| Option | Description | +|---|---| +| `-cat` | Display file contents. | +| `-mkdir` | Create directory in HDFS. | +| `-put` | Copy file from local file system to HDFS. 
| + +Examples: + +Create a directory in HDFS: + +``` shell +$ sudo -u hdfs hdfs dfs -mkdir -p /data/exampledir ``` -Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table: +Copy a text file to HDFS: -``` sql -INSERT INTO table_name ...; +``` shell +$ sudo -u hdfs hdfs dfs -put /tmp/example.txt /data/exampledir/ ``` -To read the data in the files or to write based on the existing format, use `FORMAT`, `PROFILE`, or one of the classes. - -This topic describes the following: - -- FORMAT clause -- Profile -- Accessor -- Resolver -- Avro - -**Note:** For more details about the API and classes, see [PXF External Tables and API](PXFExternalTableandAPIReference.html#pxfexternaltableandapireference). - -### FORMAT clause - -Use one of the following formats to read data with any PXF connector: - -- `FORMAT 'TEXT'`: Use with plain delimited text files on HDFS. -- `FORMAT 'CSV'`: Use with comma-separated value files on HDFS.
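The `CREATE EXTERNAL TABLE` syntax quoted above centers on a location string of the form `pxf://host[:port]/path-to-data?...`. As a sketch of assembling such a URI programmatically — the host name, port `51200`, and option values below are illustrative placeholders, not defaults confirmed by this excerpt:

```python
from urllib.parse import urlencode

def pxf_location(host, path, port=51200, **options):
    """Build a PXF LOCATION URI of the form
    pxf://host[:port]/path-to-data?OPTION=value[&...]."""
    query = urlencode(options)  # e.g. PROFILE=HdfsTextSimple
    return f"pxf://{host}:{port}/{path.lstrip('/')}?{query}"

loc = pxf_location("namenode", "/data/exampledir/example.txt",
                   PROFILE="HdfsTextSimple")
print(loc)
# pxf://namenode:51200/data/exampledir/example.txt?PROFILE=HdfsTextSimple
```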
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15612280#comment-15612280 ] ASF GitHub Bot commented on HAWQ-1107: -- Github user kavinderd commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r85361807 --- Diff: pxf/HDFSFileDataPXF.html.md.erb --- @@ -2,506 +2,449 @@ 
| + +Examples: + +Create a directory in HDFS: + +``` shell +$ sudo -u hdfs hdfs dfs -mkdir -p /data/exampledir --- End diff -- You don't necessarily have to run hdfs commands as `sudo -u hdfs` if the current user has the hdfs client and permissions. > PXF HDFS documentation - restructure content and include more examples > -- > > Key: HAWQ-1107 > URL: https://issues.apache.org/jira/browse/HAWQ-1107 > Project: Apache HAWQ > Issue Type: Improvement > Components: Documentation >Reporter: Lisa Owen >Assignee: David Yozie >Priority: Minor > Fix For: 2.0.1.0-incubating > > > the current PXF HDFS documentation does not include any runnable examples. > add runnable examples for all (HdfsTextSimple, HdfsTextMulti, SerialWritable, > Avro) profiles. restructure the content as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
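The reviewer's point above — that `sudo -u hdfs` is only needed when the current user lacks the HDFS client permissions — can be captured in a small wrapper. A sketch that merely prints the command it would run rather than executing it (the function name and the `HDFS_AS_SUPERUSER` flag are illustrative):

```shell
#!/bin/sh
# Print (instead of running) the hdfs dfs invocation, prefixing
# "sudo -u hdfs" only when HDFS_AS_SUPERUSER=1 is set.
hdfs_cmd() {
    if [ "${HDFS_AS_SUPERUSER:-0}" = "1" ]; then
        echo "sudo -u hdfs hdfs dfs $*"
    else
        echo "hdfs dfs $*"
    fi
}

hdfs_cmd -mkdir -p /data/exampledir
# prints: hdfs dfs -mkdir -p /data/exampledir
HDFS_AS_SUPERUSER=1 hdfs_cmd -put /tmp/example.txt /data/exampledir/
# prints: sudo -u hdfs hdfs dfs -put /tmp/example.txt /data/exampledir/
```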
[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation
[ https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15609898#comment-15609898 ] ASF GitHub Bot commented on HAWQ-1071: -- GitHub user lisakowen opened a pull request: https://github.com/apache/incubator-hawq-docs/pull/40 HAWQ-1071 - subnav changes for pxf enhancement work removed all submenus from the subnav You can merge this pull request into a Git repository by running: $ git pull https://github.com/lisakowen/incubator-hawq-docs feature/subnav-pxfhive-enhance Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hawq-docs/pull/40.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #40 commit c3f381265b2c48b89f98863888ccd00b2926880c Author: Lisa Owen Date: 2016-10-24T19:43:04Z subnav chgs for hive plugin content restructure commit 54445c6815a166e4e275455ea64221322087 Author: Lisa Owen Date: 2016-10-26T16:36:31Z remove submenu from pxf hive plugin subnav > add PXF HiveText and HiveRC profile examples to the documentation > - > > Key: HAWQ-1071 > URL: https://issues.apache.org/jira/browse/HAWQ-1071 > Project: Apache HAWQ > Issue Type: Improvement > Components: Documentation > Reporter: Lisa Owen > Assignee: David Yozie > Priority: Minor > Fix For: 2.0.1.0-incubating > > > the current PXF Hive documentation includes an example for only the Hive > profile. add examples for HiveText and HiveRC profiles. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-1071) add PXF HiveText and HiveRC profile examples to the documentation
[ https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15609892#comment-15609892 ] ASF GitHub Bot commented on HAWQ-1071: -- GitHub user lisakowen opened a pull request: https://github.com/apache/incubator-hawq-docs/pull/39 HAWQ-1071 - add examples for HiveText and HiveRC plugins added examples, restructured content, added hive command line section. You can merge this pull request into a Git repository by running: $ git pull https://github.com/lisakowen/incubator-hawq-docs feature/pxfhive-enhance Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hawq-docs/pull/39.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #39 commit 0398a62fefd3627273927f938b4d082a25bf3003 Author: Lisa OwenDate: 2016-09-26T21:37:04Z restructure PXF Hive pulug-in page; add more relevant examples commit 457d703a3f5c057e241acf985fbc35da34f6a075 Author: Lisa Owen Date: 2016-09-26T22:40:10Z PXF Hive plug-in mods commit 822d7545e746490e55507866c62dca5ea2d5349a Author: Lisa Owen Date: 2016-10-03T22:19:03Z clean up some extra whitespace commit 8c986b60b8db3edd77c10f23704cc9174c52a803 Author: Lisa Owen Date: 2016-10-11T18:37:34Z include list of hive profile names in file format section commit 150fa67857871d58ea05eb14c023215c932ab7b1 Author: Lisa Owen Date: 2016-10-11T19:03:39Z link to CREATE EXTERNAL TABLE ref page commit 5cdd8f8c35a51360fe3bfdedeff796bf1e0f31f3 Author: Lisa Owen Date: 2016-10-11T20:27:17Z sql commands all caps commit 67e8b9699c9eec64d04ce9e6048ffb385f7f3573 Author: Lisa Owen Date: 2016-10-11T20:33:35Z use <> for optional args commit 54b2c01a80d477cc093d7eb1ed2aa8c0bf762d36 Author: Lisa Owen Date: 2016-10-22T00:16:24Z fix some duplicate ids commit 284c3ec2db38e8d9020826e3bf292efad76c1819 Author: Lisa Owen Date: 2016-10-26T15:38:37Z restructure to use numbered steps commit 
2a38a0322abda804cfd4fc8aa39f142f0d83ea11 Author: Lisa Owen Date: 2016-10-26T17:20:28Z note/notice
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15609382#comment-15609382 ] ASF GitHub Bot commented on HAWQ-1107: -- Github user asfgit closed the pull request at: https://github.com/apache/incubator-hawq-docs/pull/33 > PXF HDFS documentation - restructure content and include more examples > -- > > Key: HAWQ-1107 > URL: https://issues.apache.org/jira/browse/HAWQ-1107 > Project: Apache HAWQ > Issue Type: Improvement > Components: Documentation >Reporter: Lisa Owen >Assignee: David Yozie >Priority: Minor > Fix For: 2.0.1.0-incubating > > > the current PXF HDFS documentation does not include any runnable examples. > add runnable examples for all (HdfsTextSimple, HdfsTextMulti, SerialWritable, > Avro) profiles. restructure the content as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15609383#comment-15609383 ] ASF GitHub Bot commented on HAWQ-1107: -- Github user asfgit closed the pull request at: https://github.com/apache/incubator-hawq-docs/pull/38
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15608979#comment-15608979 ] ASF GitHub Bot commented on HAWQ-1107: -- GitHub user lisakowen opened a pull request: https://github.com/apache/incubator-hawq-docs/pull/38 HAWQ-1107 - more subnav changes for HDFS plugin remove all submenus You can merge this pull request into a Git repository by running: $ git pull https://github.com/lisakowen/incubator-hawq-docs feature/subnav-pxfhdfs-enhance Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hawq-docs/pull/38.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #38
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15606649#comment-15606649 ] ASF GitHub Bot commented on HAWQ-1107: -- Github user asfgit closed the pull request at: https://github.com/apache/incubator-hawq-docs/pull/34
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15606507#comment-15606507 ] ASF GitHub Bot commented on HAWQ-1107: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r84997781 --- Diff: pxf/HDFSFileDataPXF.html.md.erb --- @@ -2,388 +2,282 @@ title: Accessing HDFS File Data --- -## Prerequisites +HDFS is the primary distributed storage mechanism used by Apache Hadoop applications. The PXF HDFS plug-in reads file data stored in HDFS. The plug-in supports plain delimited and comma-separated-value format text files. The HDFS plug-in also supports the Avro binary format. -Before working with HDFS file data using HAWQ and PXF, you should perform the following operations: +This section describes how to use PXF to access HDFS data, including how to create and query an external table from files in the HDFS data store. -- Test PXF on HDFS before connecting to Hive or HBase. -- Ensure that all HDFS users have read permissions to HDFS services and that write permissions have been limited to specific users. +## Prerequisites -## Syntax +Before working with HDFS file data using HAWQ and PXF, ensure that: -The syntax for creating an external HDFS file is as follows: +- The HDFS plug-in is installed on all cluster nodes. +- All HDFS users have read permissions to HDFS services and that write permissions have been restricted to specific users. -``` sql -CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name -( column_name data_type [, ...] 
| LIKE other_table ) -LOCATION ('pxf://host[:port]/path-to-data?<pxf-parameters>[&custom-option=value...]') - FORMAT '[TEXT | CSV | CUSTOM]' (<formatting-properties>); -``` +## HDFS File Formats -where `<pxf-parameters>` is: +The PXF HDFS plug-in supports reading the following file formats: -``` pre - [FRAGMENTER=fragmenter_class&ACCESSOR=accessor_class&RESOLVER=resolver_class] - | PROFILE=profile-name -``` +- Text File - comma-separated value (.csv) or delimited format plain text file +- Avro - JSON-defined, schema-based data serialization format -**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables. +The PXF HDFS plug-in includes the following profiles to support the file formats listed above: -Use an SQL `SELECT` statement to read from an HDFS READABLE table: +- `HdfsTextSimple` - text files +- `HdfsTextMulti` - text files with embedded line feeds +- `Avro` - Avro files -``` sql -SELECT ... FROM table_name; -``` -Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table: +## HDFS Shell Commands +Hadoop includes command-line tools that interact directly with HDFS. These tools support typical file system operations including copying and listing files, changing file permissions, etc. -``` sql -INSERT INTO table_name ...; -``` +The HDFS file system command is `hdfs dfs [<options>]`. Invoked with no options, `hdfs dfs` lists the file system options supported by the tool. --- End diff -- command -> command syntax
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15606512#comment-15606512 ] ASF GitHub Bot commented on HAWQ-1107: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r84999604 --- Diff: pxf/HDFSFileDataPXF.html.md.erb --- @@ -2,388 +2,282 @@ title: Accessing HDFS File Data --- -## Prerequisites +HDFS is the primary distributed storage mechanism used by Apache Hadoop applications. The PXF HDFS plug-in reads file data stored in HDFS. The plug-in supports plain delimited and comma-separated-value format text files. The HDFS plug-in also supports the Avro binary format. -Before working with HDFS file data using HAWQ and PXF, you should perform the following operations: +This section describes how to use PXF to access HDFS data, including how to create and query an external table from files in the HDFS data store. -- Test PXF on HDFS before connecting to Hive or HBase. -- Ensure that all HDFS users have read permissions to HDFS services and that write permissions have been limited to specific users. +## Prerequisites -## Syntax +Before working with HDFS file data using HAWQ and PXF, ensure that: -The syntax for creating an external HDFS file is as follows: +- The HDFS plug-in is installed on all cluster nodes. +- All HDFS users have read permissions to HDFS services and that write permissions have been restricted to specific users. -``` sql -CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name -( column_name data_type [, ...] 
| LIKE other_table ) -LOCATION ('pxf://host[:port]/path-to-data?[=value...]') - FORMAT '[TEXT | CSV | CUSTOM]' (); -``` +## HDFS File Formats -where `` is: +The PXF HDFS plug-in supports reading the following file formats: -``` pre - FRAGMENTER=fragmenter_class=accessor_class=resolver_class] - | PROFILE=profile-name -``` +- Text File - comma-separated value (.csv) or delimited format plain text file +- Avro - JSON-defined, schema-based data serialization format -**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables. +The PXF HDFS plug-in includes the following profiles to support the file formats listed above: -Use an SQL `SELECT` statement to read from an HDFS READABLE table: +- `HdfsTextSimple` - text files +- `HdfsTextMulti` - text files with embedded line feeds +- `Avro` - Avro files -``` sql -SELECT ... FROM table_name; -``` -Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table: +## HDFS Shell Commands +Hadoop includes command-line tools that interact directly with HDFS. These tools support typical file system operations including copying and listing files, changing file permissions, etc. -``` sql -INSERT INTO table_name ...; -``` +The HDFS file system command is `hdfs dfs []`. Invoked with no options, `hdfs dfs` lists the file system options supported by the tool. + +`hdfs dfs` options used in this section are identified in the table below: + +| Option | Description | +|---|-| +| `-cat`| Display file contents. | +| `-mkdir`| Create directory in HDFS. | +| `-put`| Copy file from local file system to HDFS. | + +### Create Data Files + +Perform the following steps to create data files used in subsequent exercises: + +1. Create an HDFS directory for PXF example data files: + +``` shell + $ sudo -u hdfs hdfs dfs -mkdir -p /data/pxf_examples +``` + +2. 
Create a delimited plain text file: + +``` shell +$ vi /tmp/pxf_hdfs_simple.txt --- End diff -- Does it make sense to change these into `echo` commands so they can just be cut/pasted? Like: $ echo 'Prague,Jan,101,4875.33 Rome,Mar,87,1557.39 Bangalore,May,317,8936.99 Beijing,Jul,411,11600.67' >> pxf_hdfs_simple.txt > PXF HDFS documentation - restructure content and include more examples > -- > > Key: HAWQ-1107 > URL: https://issues.apache.org/jira/browse/HAWQ-1107 > Project: Apache HAWQ > Issue Type: Improvement > Components: Documentation >Reporter: Lisa Owen >Assignee: David Yozie >Priority: Minor > Fix For: 2.0.1.0-incubating > > > the current PXF HDFS documentation does not include any runnable examples. > add runnable examples for all (HdfsTextSimple, HdfsTextMulti, SerialWritable, > Avro) profiles. restructure the content
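The reviewer's suggestion above — making the data-file step cut-and-pasteable rather than opening `vi` — could be sketched with `printf`, which sidesteps multi-line `echo` portability issues (the records are the ones used in the example):

``` shell
# Create /tmp/pxf_hdfs_simple.txt without an editor, one record per line.
printf '%s\n' \
    'Prague,Jan,101,4875.33' \
    'Rome,Mar,87,1557.39' \
    'Bangalore,May,317,8936.99' \
    'Beijing,Jul,411,11600.67' > /tmp/pxf_hdfs_simple.txt

# Verify: four comma-delimited records.
cat /tmp/pxf_hdfs_simple.txt
```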
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15606511#comment-15606511 ] ASF GitHub Bot commented on HAWQ-1107: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r84996425 --- Diff: pxf/HDFSFileDataPXF.html.md.erb --- @@ -2,388 +2,282 @@ title: Accessing HDFS File Data --- -## Prerequisites +HDFS is the primary distributed storage mechanism used by Apache Hadoop applications. The PXF HDFS plug-in reads file data stored in HDFS. The plug-in supports plain delimited and comma-separated-value format text files. The HDFS plug-in also supports the Avro binary format. -Before working with HDFS file data using HAWQ and PXF, you should perform the following operations: +This section describes how to use PXF to access HDFS data, including how to create and query an external table from files in the HDFS data store. -- Test PXF on HDFS before connecting to Hive or HBase. -- Ensure that all HDFS users have read permissions to HDFS services and that write permissions have been limited to specific users. +## Prerequisites -## Syntax +Before working with HDFS file data using HAWQ and PXF, ensure that: -The syntax for creating an external HDFS file is as follows: +- The HDFS plug-in is installed on all cluster nodes. --- End diff -- Add an XREF here.
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15606513#comment-15606513 ] ASF GitHub Bot commented on HAWQ-1107: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r85002565 --- Diff: pxf/HDFSFileDataPXF.html.md.erb --- @@ -2,388 +2,282 @@ title: Accessing HDFS File Data --- -## Prerequisites +HDFS is the primary distributed storage mechanism used by Apache Hadoop applications. The PXF HDFS plug-in reads file data stored in HDFS. The plug-in supports plain delimited and comma-separated-value format text files. The HDFS plug-in also supports the Avro binary format. -Before working with HDFS file data using HAWQ and PXF, you should perform the following operations: +This section describes how to use PXF to access HDFS data, including how to create and query an external table from files in the HDFS data store. -- Test PXF on HDFS before connecting to Hive or HBase. -- Ensure that all HDFS users have read permissions to HDFS services and that write permissions have been limited to specific users. +## Prerequisites -## Syntax +Before working with HDFS file data using HAWQ and PXF, ensure that: -The syntax for creating an external HDFS file is as follows: +- The HDFS plug-in is installed on all cluster nodes. +- All HDFS users have read permissions to HDFS services and that write permissions have been restricted to specific users. -``` sql -CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name -( column_name data_type [, ...] 
| LIKE other_table ) -LOCATION ('pxf://host[:port]/path-to-data?[=value...]') - FORMAT '[TEXT | CSV | CUSTOM]' (); -``` +## HDFS File Formats -where `` is: +The PXF HDFS plug-in supports reading the following file formats: -``` pre - FRAGMENTER=fragmenter_class=accessor_class=resolver_class] - | PROFILE=profile-name -``` +- Text File - comma-separated value (.csv) or delimited format plain text file +- Avro - JSON-defined, schema-based data serialization format -**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables. +The PXF HDFS plug-in includes the following profiles to support the file formats listed above: -Use an SQL `SELECT` statement to read from an HDFS READABLE table: +- `HdfsTextSimple` - text files +- `HdfsTextMulti` - text files with embedded line feeds +- `Avro` - Avro files -``` sql -SELECT ... FROM table_name; -``` -Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table: +## HDFS Shell Commands +Hadoop includes command-line tools that interact directly with HDFS. These tools support typical file system operations including copying and listing files, changing file permissions, etc. -``` sql -INSERT INTO table_name ...; -``` +The HDFS file system command is `hdfs dfs []`. Invoked with no options, `hdfs dfs` lists the file system options supported by the tool. + +`hdfs dfs` options used in this section are identified in the table below: + +| Option | Description | +|---|-| +| `-cat`| Display file contents. | +| `-mkdir`| Create directory in HDFS. | +| `-put`| Copy file from local file system to HDFS. | + +### Create Data Files + +Perform the following steps to create data files used in subsequent exercises: + +1. Create an HDFS directory for PXF example data files: + +``` shell + $ sudo -u hdfs hdfs dfs -mkdir -p /data/pxf_examples +``` + +2. Create a delimited plain text file: + +``` shell +$ vi /tmp/pxf_hdfs_simple.txt +``` + +3. 
Copy and paste the following data into `pxf_hdfs_simple.txt`: + +``` pre +Prague,Jan,101,4875.33 +Rome,Mar,87,1557.39 +Bangalore,May,317,8936.99 +Beijing,Jul,411,11600.67 +``` + +Notice the use of the comma `,` to separate the four data fields. + +4. Add the data file to HDFS: + +``` shell +$ sudo -u hdfs hdfs dfs -put /tmp/pxf_hdfs_simple.txt /data/pxf_examples/ +``` + +5. Display the contents of the `pxf_hdfs_simple.txt` file stored in HDFS: + +``` shell +$ sudo -u hdfs hdfs dfs -cat /data/pxf_examples/pxf_hdfs_simple.txt +``` + +6. Create a second delimited plain text file: + +``` shell +$ vi /tmp/pxf_hdfs_multi.txt +``` -To read the data in the files or to write based on the existing format, use `FORMAT`, `PROFILE`, or one of the classes. - -This
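Step 6 above starts a second file, `pxf_hdfs_multi.txt`, for the `HdfsTextMulti` profile — text files with embedded line feeds. A local sketch of what such a file can look like (the colon-delimited record layout is illustrative; this excerpt does not include the doc's actual multi-line sample data):

``` shell
# Each record is address:code:year, and the address field itself
# contains an embedded newline (expanded by printf's %b directive).
printf '%b\n' \
    '4627 Star Rd.\nSan Francisco, CA 94107:SFO:2017' \
    '113 Moon St.\nSan Diego, CA 92093:SD:2017' > /tmp/pxf_hdfs_multi.txt

# Four physical lines, but only two logical records.
wc -l < /tmp/pxf_hdfs_multi.txt
```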
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15606514#comment-15606514 ] ASF GitHub Bot commented on HAWQ-1107: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r85003214 --- Diff: pxf/HDFSFileDataPXF.html.md.erb --- @@ -2,388 +2,282 @@ title: Accessing HDFS File Data --- -## Prerequisites +HDFS is the primary distributed storage mechanism used by Apache Hadoop applications. The PXF HDFS plug-in reads file data stored in HDFS. The plug-in supports plain delimited and comma-separated-value format text files. The HDFS plug-in also supports the Avro binary format. -Before working with HDFS file data using HAWQ and PXF, you should perform the following operations: +This section describes how to use PXF to access HDFS data, including how to create and query an external table from files in the HDFS data store. -- Test PXF on HDFS before connecting to Hive or HBase. -- Ensure that all HDFS users have read permissions to HDFS services and that write permissions have been limited to specific users. +## Prerequisites -## Syntax +Before working with HDFS file data using HAWQ and PXF, ensure that: -The syntax for creating an external HDFS file is as follows: +- The HDFS plug-in is installed on all cluster nodes. +- All HDFS users have read permissions to HDFS services and that write permissions have been restricted to specific users. -``` sql -CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name -( column_name data_type [, ...] 
| LIKE other_table ) -LOCATION ('pxf://host[:port]/path-to-data?[=value...]') - FORMAT '[TEXT | CSV | CUSTOM]' (); -``` +## HDFS File Formats -where `` is: +The PXF HDFS plug-in supports reading the following file formats: -``` pre - FRAGMENTER=fragmenter_class=accessor_class=resolver_class] - | PROFILE=profile-name -``` +- Text File - comma-separated value (.csv) or delimited format plain text file +- Avro - JSON-defined, schema-based data serialization format -**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables. +The PXF HDFS plug-in includes the following profiles to support the file formats listed above: -Use an SQL `SELECT` statement to read from an HDFS READABLE table: +- `HdfsTextSimple` - text files +- `HdfsTextMulti` - text files with embedded line feeds +- `Avro` - Avro files -``` sql -SELECT ... FROM table_name; -``` -Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table: +## HDFS Shell Commands +Hadoop includes command-line tools that interact directly with HDFS. These tools support typical file system operations including copying and listing files, changing file permissions, etc. -``` sql -INSERT INTO table_name ...; -``` +The HDFS file system command is `hdfs dfs []`. Invoked with no options, `hdfs dfs` lists the file system options supported by the tool. + +`hdfs dfs` options used in this section are identified in the table below: + +| Option | Description | +|---|-| +| `-cat`| Display file contents. | +| `-mkdir`| Create directory in HDFS. | +| `-put`| Copy file from local file system to HDFS. | + +### Create Data Files + +Perform the following steps to create data files used in subsequent exercises: + +1. Create an HDFS directory for PXF example data files: + +``` shell + $ sudo -u hdfs hdfs dfs -mkdir -p /data/pxf_examples +``` + +2. Create a delimited plain text file: + +``` shell +$ vi /tmp/pxf_hdfs_simple.txt +``` + +3. 
Copy and paste the following data into `pxf_hdfs_simple.txt`: + +``` pre +Prague,Jan,101,4875.33 +Rome,Mar,87,1557.39 +Bangalore,May,317,8936.99 +Beijing,Jul,411,11600.67 +``` + +Notice the use of the comma `,` to separate the four data fields. + +4. Add the data file to HDFS: + +``` shell +$ sudo -u hdfs hdfs dfs -put /tmp/pxf_hdfs_simple.txt /data/pxf_examples/ +``` + +5. Display the contents of the `pxf_hdfs_simple.txt` file stored in HDFS: + +``` shell +$ sudo -u hdfs hdfs dfs -cat /data/pxf_examples/pxf_hdfs_simple.txt +``` + +6. Create a second delimited plain text file: + +``` shell +$ vi /tmp/pxf_hdfs_multi.txt +``` -To read the data in the files or to write based on the existing format, use `FORMAT`, `PROFILE`, or one of the classes. - -This
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15606515#comment-15606515 ] ASF GitHub Bot commented on HAWQ-1107: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r85003579 --- Diff: pxf/HDFSFileDataPXF.html.md.erb --- @@ -415,93 +312,101 @@ The following example uses the Avro schema shown in [Sample Avro Schema](#topic_ {"name":"street", "type":"string"}, {"name":"city", "type":"string"}] } - }, { - "name": "relationship", -"type": { -"type": "enum", -"name": "relationshipEnum", -"symbols": ["MARRIED","LOVE","FRIEND","COLLEAGUE","STRANGER","ENEMY"] -} - }, { -"name" : "md5", -"type": { -"type" : "fixed", -"name" : "md5Fixed", -"size" : 4 -} } ], "doc:" : "A basic schema for storing messages" } ``` - Sample Avro Data (JSON) +### Sample Avro Data (JSON) + +Create a text file named `pxf_hdfs_avro.txt`: + +``` shell +$ vi /tmp/pxf_hdfs_avro.txt +``` + +Enter the following data into `pxf_hdfs_avro.txt`: ``` pre -{"id":1, "username":"john","followers":["kate", "santosh"], "rank":null, "relationship": "FRIEND", "fmap": {"kate":10,"santosh":4}, -"address":{"street":"renaissance drive", "number":1,"city":"san jose"}, "md5":\u3F00\u007A\u0073\u0074} +{"id":1, "username":"john","followers":["kate", "santosh"], "relationship": "FRIEND", "fmap": {"kate":10,"santosh":4}, "address":{"number":1, "street":"renaissance drive", "city":"san jose"}} + +{"id":2, "username":"jim","followers":["john", "pam"], "relationship": "COLLEAGUE", "fmap": {"john":3,"pam":3}, "address":{"number":9, "street":"deer creek", "city":"palo alto"}} +``` + +The sample data uses a comma `,` to separate top level records and a colon `:` to separate map/key values and record field name/values. 
-{"id":2, "username":"jim","followers":["john", "pam"], "rank":3, "relationship": "COLLEAGUE", "fmap": {"john":3,"pam":3}, -"address":{"street":"deer creek", "number":9,"city":"palo alto"}, "md5":\u0010\u0021\u0003\u0004} +Convert the text file to Avro format. There are various ways to perform the conversion programmatically and via the command line. In this example, we use the [Java Avro tools](http://avro.apache.org/releases.html), and the jar file resides in the current directory: + +``` shell +$ java -jar ./avro-tools-1.8.1.jar fromjson --schema-file /tmp/avro_schema.avsc /tmp/pxf_hdfs_avro.txt > /tmp/pxf_hdfs_avro.avro ``` -To map this Avro file to an external table, the top-level primitive fields ("id" of type long and "username" of type string) are mapped to their equivalent HAWQ types (bigint and text). The remaining complex fields are mapped to text columns: +The generated Avro binary data file is written to `/tmp/pxf_hdfs_avro.avro`. Copy this file to HDFS: -``` sql -gpadmin=# CREATE EXTERNAL TABLE avro_complex - (id bigint, - username text, - followers text, - rank int, - fmap text, - address text, - relationship text, - md5 bytea) -LOCATION ('pxf://namehost:51200/tmp/avro_complex?PROFILE=Avro') -FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import'); +``` shell +$ sudo -u hdfs hdfs dfs -put /tmp/pxf_hdfs_avro.avro /data/pxf_examples/ ``` +### Querying Avro Data + +Create a queryable external table from this Avro file: -The above command uses default delimiters for separating components of the complex types. This command is equivalent to the one above, but it explicitly sets the delimiters using the Avro profile parameters: +- Map the top-level primitive fields, `id` (type long) and `username` (type string), to their equivalent HAWQ types (bigint and text). +- Map the remaining complex fields to type text. 
+- Explicitly set the record, map, and collection delimiters using the Avro profile custom options: ``` sql -gpadmin=# CREATE EXTERNAL TABLE avro_complex - (id bigint, - username text, - followers text, - rank int, - fmap text, - address text, - relationship text, - md5 bytea) -LOCATION ('pxf://localhost:51200/tmp/avro_complex?PROFILE=Avro&COLLECTION_DELIM=,&MAPKEY_DELIM=:&RECORDKEY_DELIM=:') -FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import'); +gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_avro(id bigint, username text, followers text, fmap text, relationship text, address text) +LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_avro.avro?PROFILE=Avro&COLLECTION_DELIM=,&MAPKEY_DELIM=:&RECORDKEY_DELIM=:') +
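Once complex Avro fields are mapped to text columns as described above, the delimited values can be split client-side using the comma and colon separators the sample data employs. A small illustrative sketch of pulling one key's value out of such a map field (the `fmap` value matches the sample data; the `awk` pipeline is not from the docs):

``` shell
# An 'fmap' value as it would surface in a text column.
fmap='john:3,pam:3'

# Split entries on ',' and key/value pairs on ':' to extract pam's count.
echo "$fmap" | tr ',' '\n' | awk -F: '$1 == "pam" { print $2 }'
# prints: 3
```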
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15606508#comment-15606508 ] ASF GitHub Bot commented on HAWQ-1107: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r84997631 --- Diff: pxf/HDFSFileDataPXF.html.md.erb --- @@ -2,388 +2,282 @@ title: Accessing HDFS File Data --- -## Prerequisites +HDFS is the primary distributed storage mechanism used by Apache Hadoop applications. The PXF HDFS plug-in reads file data stored in HDFS. The plug-in supports plain delimited and comma-separated-value format text files. The HDFS plug-in also supports the Avro binary format. -Before working with HDFS file data using HAWQ and PXF, you should perform the following operations: +This section describes how to use PXF to access HDFS data, including how to create and query an external table from files in the HDFS data store. -- Test PXF on HDFS before connecting to Hive or HBase. -- Ensure that all HDFS users have read permissions to HDFS services and that write permissions have been limited to specific users. -## Syntax +## Prerequisites -The syntax for creating an external HDFS file is as follows: +Before working with HDFS file data using HAWQ and PXF, ensure that: -``` sql -CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name -( column_name data_type [, ...] +- The HDFS plug-in is installed on all cluster nodes. +- All HDFS users have read permissions to HDFS services and write permissions have been restricted to specific users.
| LIKE other_table ) -LOCATION ('pxf://host[:port]/path-to-data?<pxf-parameters>[&custom-option=value...]') - FORMAT '[TEXT | CSV | CUSTOM]' (<formatting-properties>); -``` +## HDFS File Formats -where `<pxf-parameters>` is: +The PXF HDFS plug-in supports reading the following file formats: -``` pre - FRAGMENTER=fragmenter_class&ACCESSOR=accessor_class&RESOLVER=resolver_class - | PROFILE=profile-name -``` +- Text File - comma-separated value (.csv) or delimited format plain text file +- Avro - JSON-defined, schema-based data serialization format -**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables. +The PXF HDFS plug-in includes the following profiles to support the file formats listed above: -Use an SQL `SELECT` statement to read from an HDFS READABLE table: +- `HdfsTextSimple` - text files +- `HdfsTextMulti` - text files with embedded line feeds +- `Avro` - Avro files -``` sql -SELECT ... FROM table_name; -``` -Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table: +## HDFS Shell Commands +Hadoop includes command-line tools that interact directly with HDFS. These tools support typical file system operations including copying and listing files, changing file permissions, etc. --- End diff -- Change "etc." to "and so forth." > PXF HDFS documentation - restructure content and include more examples > -- > > Key: HAWQ-1107 > URL: https://issues.apache.org/jira/browse/HAWQ-1107 > Project: Apache HAWQ > Issue Type: Improvement > Components: Documentation >Reporter: Lisa Owen >Assignee: David Yozie >Priority: Minor > Fix For: 2.0.1.0-incubating > > > the current PXF HDFS documentation does not include any runnable examples. > add runnable examples for all (HdfsTextSimple, HdfsTextMulti, SerialWritable, > Avro) profiles. restructure the content as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
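The profiles listed in the hunk above (`HdfsTextSimple`, `HdfsTextMulti`, `Avro`) can be exercised with a short example. The following is an illustrative sketch, not text from the pull request: the host `namenode`, port `51200`, the HDFS path, and the table definition are assumed placeholders.

``` sql
-- Hypothetical example: column names and the HDFS file path are illustrative.
gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple(location text, month text, num_orders int, total_sales float8)
            LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
          FORMAT 'TEXT' (delimiter=E',');
gpadmin=# SELECT * FROM pxf_hdfs_textsimple;
```

For text files with embedded line feeds, the same pattern applies with `PROFILE=HdfsTextMulti` and a quoted `FORMAT 'CSV'` source file.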
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602963#comment-15602963 ] ASF GitHub Bot commented on HAWQ-1107: -- GitHub user lisakowen opened a pull request: https://github.com/apache/incubator-hawq-docs/pull/34 HAWQ-1107 - subnav chgs for pxf hdfs plugin content restructure subnav changes You can merge this pull request into a Git repository by running: $ git pull https://github.com/lisakowen/incubator-hawq-docs feature/subnav-pxfhdfs-enhance Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hawq-docs/pull/34.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #34 commit f350e41fa419e9fb661f4ccb6e8793b7d9e9a40b Author: Lisa Owen Date: 2016-10-24T19:30:37Z subna chgs for pxf hdfs plugin content restructure
[jira] [Commented] (HAWQ-1107) PXF HDFS documentation - restructure content and include more examples
[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15602949#comment-15602949 ] ASF GitHub Bot commented on HAWQ-1107: -- GitHub user lisakowen opened a pull request: https://github.com/apache/incubator-hawq-docs/pull/33 HAWQ-1107 - enhance PXF HDFS plugin documentation added more examples, restructured the content, removed SequenceWritable references. You can merge this pull request into a Git repository by running: $ git pull https://github.com/lisakowen/incubator-hawq-docs feature/pxfhdfs-enhance Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hawq-docs/pull/33.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #33 commit 9ca277927bebd9c8d79bdf4619dfaf94a695c838 Author: Lisa Owen Date: 2016-10-14T22:29:22Z start restructuring HDFS plug-in page commit 2da7a92a3e8431335a48005d55a70c9eba333e16 Author: Lisa Owen Date: 2016-10-17T23:27:23Z more content and rearranging of pxf hdfs plugin page commit 5a941a70bda0e8466b5aa5dd2885840fce14c522 Author: Lisa Owen Date: 2016-10-18T16:57:09Z more rework of hdfs plug in page commit fd029d568589f5a4e2461d92437963d97f7d3198 Author: Lisa Owen Date: 2016-10-20T19:20:21Z remove SerialWritable, use namenode for host commit 6ba64f94d5b11397c98f46eb14d5c6e48d17a6cc Author: Lisa Owen Date: 2016-10-20T21:12:43Z use more descriptive file names commit 86d13b312ea8591949b8a811973937ab60f74df9 Author: Lisa Owen Date: 2016-10-20T22:36:01Z more mods to HDFS plugin docs
[jira] [Commented] (HAWQ-1096) document the HAWQ built-in languages (SQL, C, internal)
[ https://issues.apache.org/jira/browse/HAWQ-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15589313#comment-15589313 ] ASF GitHub Bot commented on HAWQ-1096: -- Github user asfgit closed the pull request at: https://github.com/apache/incubator-hawq-docs/pull/25 > document the HAWQ built-in languages (SQL, C, internal) > --- > > Key: HAWQ-1096 > URL: https://issues.apache.org/jira/browse/HAWQ-1096 > Project: Apache HAWQ > Issue Type: Improvement > Components: Documentation >Reporter: Lisa Owen >Assignee: David Yozie >Priority: Minor > > the HAWQ docs do not discuss the built-in languages supported by HAWQ - SQL, > C and internal. add content to introduce these languages with relevant > examples and links. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-1095) enhance database driver and API documentation
[ https://issues.apache.org/jira/browse/HAWQ-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15589318#comment-15589318 ] ASF GitHub Bot commented on HAWQ-1095: -- Github user asfgit closed the pull request at: https://github.com/apache/incubator-hawq-docs/pull/23 > enhance database driver and API documentation > - > > Key: HAWQ-1095 > URL: https://issues.apache.org/jira/browse/HAWQ-1095 > Project: Apache HAWQ > Issue Type: Improvement > Components: Documentation >Reporter: Lisa Owen >Assignee: David Yozie >Priority: Minor > Fix For: 2.0.1.0-incubating > > > docs contain very brief references to JDBC/ODBC and none at all to libpq. > add more content in these areas. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-1096) document the HAWQ built-in languages (SQL, C, internal)
[ https://issues.apache.org/jira/browse/HAWQ-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15589308#comment-15589308 ] ASF GitHub Bot commented on HAWQ-1096: -- Github user asfgit closed the pull request at: https://github.com/apache/incubator-hawq-docs/pull/27
[jira] [Commented] (HAWQ-1095) enhance database driver and API documentation
[ https://issues.apache.org/jira/browse/HAWQ-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15589261#comment-15589261 ] ASF GitHub Bot commented on HAWQ-1095: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/23#discussion_r84117499 --- Diff: clientaccess/g-database-application-interfaces.html.md.erb --- @@ -1,8 +1,96 @@ --- -title: ODBC/JDBC Application Interfaces +title: HAWQ Database Drivers and APIs --- +You may want to connect your existing Business Intelligence (BI) or Analytics applications with HAWQ. The database application programming interfaces most commonly used with HAWQ are the Postgres, ODBC, and JDBC APIs. -You may want to deploy your existing Business Intelligence (BI) or Analytics applications with HAWQ. The most commonly used database application programming interfaces with HAWQ are the ODBC and JDBC APIs. +HAWQ provides the following connectivity tools for connecting to the database: + + - ODBC driver + - JDBC driver + - `libpq` - PostgreSQL C API + +## HAWQ Drivers + +ODBC and JDBC drivers for HAWQ are available as a separate download from [Pivotal Network](https://network.pivotal.io/products/pivotal-hdb). + +### ODBC Driver + +The ODBC API specifies a standard set of C interfaces for accessing database management systems. For additional information on using the ODBC API, refer to the [ODBC Programmer's Reference](https://msdn.microsoft.com/en-us/library/ms714177(v=vs.85).aspx) documentation. + +HAWQ supports the DataDirect ODBC Driver. Installation instructions for this driver are provided on the Pivotal Network driver download page. Refer to [HAWQ ODBC Driver](http://media.datadirect.com/download/docs/odbc/allodbc/#page/odbc%2Fthe-greenplum-wire-protocol-driver.html%23) for HAWQ-specific ODBC driver information. --- End diff -- Ok - thanks. I think in other cases PDFs of the actual docs are included. 
This might only be in the Windows downloads.
[jira] [Commented] (HAWQ-1096) document the HAWQ built-in languages (SQL, C, internal)
[ https://issues.apache.org/jira/browse/HAWQ-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587174#comment-15587174 ] ASF GitHub Bot commented on HAWQ-1096: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/25#discussion_r83978174 --- Diff: plext/builtin_langs.html.md.erb --- @@ -0,0 +1,110 @@ +--- +title: Using HAWQ Built-In Languages +--- + +This section provides an introduction to using the HAWQ built-in languages. + +HAWQ supports user-defined functions created with the SQL and C built-in languages. HAWQ also supports user-defined aliases for internal functions. + + +## Enabling Built-in Language Support + +Support for SQL, internal, and C language user-defined functions is enabled by default for all HAWQ databases. + +## SQL + +SQL functions execute an arbitrary list of SQL statements. The SQL statements in the body of an SQL function must be separated by semicolons. The final statement in a non-void-returning SQL function must be a [SELECT](../reference/sql/SELECT.html) that returns data of the type specified by the function's return type. The function will return a single or set of rows corresponding to this last SQL query. + +The following example creates and calls an SQL function to count the number of rows of the database named `orders`: + +``` sql +gpadmin=# CREATE FUNCTION count_orders() RETURNS bigint AS $$ + SELECT count(*) FROM orders; +$$ LANGUAGE SQL; +CREATE FUNCTION +gpadmin=# select count_orders(); + my_count +-- + 830513 +(1 row) +``` + +For additional information on creating SQL functions, refer to [Query Language (SQL) Functions](https://www.postgresql.org/docs/8.2/static/xfunc-sql.html) in the PostgreSQL documentation. + +## Internal --- End diff -- Change title to "Internal Functions"? 
[jira] [Commented] (HAWQ-1096) document the HAWQ built-in languages (SQL, C, internal)
[ https://issues.apache.org/jira/browse/HAWQ-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587169#comment-15587169 ] ASF GitHub Bot commented on HAWQ-1096: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/25#discussion_r83979056 --- Diff: plext/builtin_langs.html.md.erb --- @@ -0,0 +1,110 @@ +--- +title: Using HAWQ Built-In Languages +--- + +This section provides an introduction to using the HAWQ built-in languages. + +HAWQ supports user-defined functions created with the SQL and C built-in languages. HAWQ also supports user-defined aliases for internal functions. + + +## Enabling Built-in Language Support + +Support for SQL, internal, and C language user-defined functions is enabled by default for all HAWQ databases. + +## SQL + +SQL functions execute an arbitrary list of SQL statements. The SQL statements in the body of an SQL function must be separated by semicolons. The final statement in a non-void-returning SQL function must be a [SELECT](../reference/sql/SELECT.html) that returns data of the type specified by the function's return type. The function will return a single or set of rows corresponding to this last SQL query. + +The following example creates and calls an SQL function to count the number of rows of the database named `orders`: + +``` sql +gpadmin=# CREATE FUNCTION count_orders() RETURNS bigint AS $$ + SELECT count(*) FROM orders; +$$ LANGUAGE SQL; +CREATE FUNCTION +gpadmin=# select count_orders(); + my_count +-- + 830513 +(1 row) +``` + +For additional information on creating SQL functions, refer to [Query Language (SQL) Functions](https://www.postgresql.org/docs/8.2/static/xfunc-sql.html) in the PostgreSQL documentation. 
--- End diff -- Global edit: Change "For additional information on" to "For additional information about"
[jira] [Commented] (HAWQ-1096) document the HAWQ built-in languages (SQL, C, internal)
[ https://issues.apache.org/jira/browse/HAWQ-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587168#comment-15587168 ] ASF GitHub Bot commented on HAWQ-1096: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/25#discussion_r83977628 --- Diff: plext/builtin_langs.html.md.erb --- @@ -0,0 +1,110 @@ +--- +title: Using HAWQ Built-In Languages +--- + +This section provides an introduction to using the HAWQ built-in languages. + +HAWQ supports user-defined functions created with the SQL and C built-in languages. HAWQ also supports user-defined aliases for internal functions. + + +## Enabling Built-in Language Support + +Support for SQL, internal, and C language user-defined functions is enabled by default for all HAWQ databases. + +## SQL + +SQL functions execute an arbitrary list of SQL statements. The SQL statements in the body of an SQL function must be separated by semicolons. The final statement in a non-void-returning SQL function must be a [SELECT](../reference/sql/SELECT.html) that returns data of the type specified by the function's return type. The function will return a single or set of rows corresponding to this last SQL query. --- End diff -- Global: change "an SQL" to "a SQL" (pronounced 'sequel')
[jira] [Commented] (HAWQ-1096) document the HAWQ built-in languages (SQL, C, internal)
[ https://issues.apache.org/jira/browse/HAWQ-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587170#comment-15587170 ] ASF GitHub Bot commented on HAWQ-1096: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/25#discussion_r83978854 --- Diff: plext/builtin_langs.html.md.erb --- @@ -0,0 +1,110 @@ +--- +title: Using HAWQ Built-In Languages +--- + +This section provides an introduction to using the HAWQ built-in languages. + +HAWQ supports user-defined functions created with the SQL and C built-in languages. HAWQ also supports user-defined aliases for internal functions. + + +## Enabling Built-in Language Support + +Support for SQL, internal, and C language user-defined functions is enabled by default for all HAWQ databases. + +## SQL + +SQL functions execute an arbitrary list of SQL statements. The SQL statements in the body of an SQL function must be separated by semicolons. The final statement in a non-void-returning SQL function must be a [SELECT](../reference/sql/SELECT.html) that returns data of the type specified by the function's return type. The function will return a single or set of rows corresponding to this last SQL query. + +The following example creates and calls an SQL function to count the number of rows of the database named `orders`: + +``` sql +gpadmin=# CREATE FUNCTION count_orders() RETURNS bigint AS $$ + SELECT count(*) FROM orders; +$$ LANGUAGE SQL; +CREATE FUNCTION +gpadmin=# select count_orders(); + my_count +-- + 830513 +(1 row) +``` + +For additional information on creating SQL functions, refer to [Query Language (SQL) Functions](https://www.postgresql.org/docs/8.2/static/xfunc-sql.html) in the PostgreSQL documentation. + +## Internal + +Many HAWQ internal functions are written in C. These functions are declared during initialization of the database cluster and statically linked to the HAWQ server. 
See [Built-in Functions and Operators](../query/functions-operators.html#topic29) for detailed information on HAWQ internal functions. + +While users cannot define new internal functions, they can create aliases for existing internal functions. + +The following example creates a new function named `all_caps` that will be defined as an alias for the `upper` HAWQ internal function: + + +``` sql +gpadmin=# CREATE FUNCTION all_caps (text) RETURNS text AS 'upper' +LANGUAGE internal STRICT; +CREATE FUNCTION +gpadmin=# SELECT all_caps('change me'); + all_caps +--- + CHANGE ME +(1 row) + +``` + +For more information on aliasing internal functions, refer to [Internal Functions](https://www.postgresql.org/docs/8.2/static/xfunc-internal.html) in the PostgreSQL documentation. + +## C + +User-defined functions written in C must be compiled into shared libraries to be loaded by the HAWQ server on demand. This dynamic loading distinguishes C language functions from internal functions that are written in C. --- End diff -- Avoid passive voice here: "You must compile user-defined functions written in C into shared libraries so that the HAWQ server can load them on demand."
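The C-language workflow quoted above (compile the function into a shared library, which the HAWQ server loads on demand) ends with a SQL registration step. The following is a hedged sketch of that step, assuming a hypothetical library `/usr/local/hawq/lib/my_udfs.so` that exports a symbol `add_one`; neither name appears in the docs under review.

``` sql
-- Hypothetical library path and symbol name: compile the C source against
-- the HAWQ/PostgreSQL server headers into a shared object first, then
-- register the exported function under a SQL-callable name.
gpadmin=# CREATE FUNCTION add_one(integer) RETURNS integer
          AS '/usr/local/hawq/lib/my_udfs.so', 'add_one'
          LANGUAGE C STRICT;
```

`STRICT` tells the planner to return NULL immediately for NULL input instead of calling the C function, which simplifies the C code.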
[jira] [Commented] (HAWQ-1096) document the HAWQ built-in languages (SQL, C, internal)
[ https://issues.apache.org/jira/browse/HAWQ-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587173#comment-15587173 ] ASF GitHub Bot commented on HAWQ-1096: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/25#discussion_r83978549 --- Diff: plext/builtin_langs.html.md.erb --- @@ -0,0 +1,110 @@ +--- +title: Using HAWQ Built-In Languages +--- + +This section provides an introduction to using the HAWQ built-in languages. + +HAWQ supports user-defined functions created with the SQL and C built-in languages. HAWQ also supports user-defined aliases for internal functions. + + +## Enabling Built-in Language Support + +Support for SQL, internal, and C language user-defined functions is enabled by default for all HAWQ databases. + +## SQL + +SQL functions execute an arbitrary list of SQL statements. The SQL statements in the body of an SQL function must be separated by semicolons. The final statement in a non-void-returning SQL function must be a [SELECT](../reference/sql/SELECT.html) that returns data of the type specified by the function's return type. The function will return a single or set of rows corresponding to this last SQL query. + +The following example creates and calls an SQL function to count the number of rows of the database named `orders`: + +``` sql +gpadmin=# CREATE FUNCTION count_orders() RETURNS bigint AS $$ + SELECT count(*) FROM orders; +$$ LANGUAGE SQL; +CREATE FUNCTION +gpadmin=# select count_orders(); + my_count +-- + 830513 +(1 row) +``` + +For additional information on creating SQL functions, refer to [Query Language (SQL) Functions](https://www.postgresql.org/docs/8.2/static/xfunc-sql.html) in the PostgreSQL documentation. + +## Internal + +Many HAWQ internal functions are written in C. These functions are declared during initialization of the database cluster and statically linked to the HAWQ server. 
See [Built-in Functions and Operators](../query/functions-operators.html#topic29) for detailed information on HAWQ internal functions. + +While users cannot define new internal functions, they can create aliases for existing internal functions. + +The following example creates a new function named `all_caps` that will be defined as an alias for the `upper` HAWQ internal function: --- End diff -- Edit: change "that will be defined as an" to "that is an"
[jira] [Commented] (HAWQ-1096) document the HAWQ built-in languages (SQL, C, internal)
[ https://issues.apache.org/jira/browse/HAWQ-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587171#comment-15587171 ] ASF GitHub Bot commented on HAWQ-1096: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/25#discussion_r83978465 --- Diff: plext/builtin_langs.html.md.erb --- @@ -0,0 +1,110 @@ +--- +title: Using HAWQ Built-In Languages +--- + +This section provides an introduction to using the HAWQ built-in languages. + +HAWQ supports user-defined functions created with the SQL and C built-in languages. HAWQ also supports user-defined aliases for internal functions. + + +## Enabling Built-in Language Support + +Support for SQL, internal, and C language user-defined functions is enabled by default for all HAWQ databases. + +## SQL + +SQL functions execute an arbitrary list of SQL statements. The SQL statements in the body of an SQL function must be separated by semicolons. The final statement in a non-void-returning SQL function must be a [SELECT](../reference/sql/SELECT.html) that returns data of the type specified by the function's return type. The function will return a single or set of rows corresponding to this last SQL query. + +The following example creates and calls an SQL function to count the number of rows of the database named `orders`: + +``` sql +gpadmin=# CREATE FUNCTION count_orders() RETURNS bigint AS $$ + SELECT count(*) FROM orders; +$$ LANGUAGE SQL; +CREATE FUNCTION +gpadmin=# select count_orders(); + my_count +-- + 830513 +(1 row) +``` + +For additional information on creating SQL functions, refer to [Query Language (SQL) Functions](https://www.postgresql.org/docs/8.2/static/xfunc-sql.html) in the PostgreSQL documentation. + +## Internal + +Many HAWQ internal functions are written in C. These functions are declared during initialization of the database cluster and statically linked to the HAWQ server. 
See [Built-in Functions and Operators](../query/functions-operators.html#topic29) for detailed information on HAWQ internal functions. + +While users cannot define new internal functions, they can create aliases for existing internal functions. --- End diff -- Reword: **You** cannot define new internal functions, **but you** can create...
[jira] [Commented] (HAWQ-1096) document the HAWQ built-in languages (SQL, C, internal)
[ https://issues.apache.org/jira/browse/HAWQ-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587175#comment-15587175 ] ASF GitHub Bot commented on HAWQ-1096: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/25#discussion_r83978153 --- Diff: plext/builtin_langs.html.md.erb --- @@ -0,0 +1,110 @@ +--- +title: Using HAWQ Built-In Languages +--- + +This section provides an introduction to using the HAWQ built-in languages. + +HAWQ supports user-defined functions created with the SQL and C built-in languages. HAWQ also supports user-defined aliases for internal functions. + + +## Enabling Built-in Language Support + +Support for SQL, internal, and C language user-defined functions is enabled by default for all HAWQ databases. + +## SQL + +SQL functions execute an arbitrary list of SQL statements. The SQL statements in the body of an SQL function must be separated by semicolons. The final statement in a non-void-returning SQL function must be a [SELECT](../reference/sql/SELECT.html) that returns data of the type specified by the function's return type. The function will return a single or set of rows corresponding to this last SQL query. + +The following example creates and calls an SQL function to count the number of rows of the database named `orders`: + +``` sql +gpadmin=# CREATE FUNCTION count_orders() RETURNS bigint AS $$ + SELECT count(*) FROM orders; +$$ LANGUAGE SQL; +CREATE FUNCTION +gpadmin=# select count_orders(); + my_count +-- + 830513 +(1 row) +``` + +For additional information on creating SQL functions, refer to [Query Language (SQL) Functions](https://www.postgresql.org/docs/8.2/static/xfunc-sql.html) in the PostgreSQL documentation. + +## Internal + +Many HAWQ internal functions are written in C. These functions are declared during initialization of the database cluster and statically linked to the HAWQ server. 
See [Built-in Functions and Operators](../query/functions-operators.html#topic29) for detailed information on HAWQ internal functions. + +While users cannot define new internal functions, they can create aliases for existing internal functions. + +The following example creates a new function named `all_caps` that will be defined as an alias for the `upper` HAWQ internal function: + + +``` sql +gpadmin=# CREATE FUNCTION all_caps (text) RETURNS text AS 'upper' +LANGUAGE internal STRICT; +CREATE FUNCTION +gpadmin=# SELECT all_caps('change me'); + all_caps +--- + CHANGE ME +(1 row) + +``` + +For more information on aliasing internal functions, refer to [Internal Functions](https://www.postgresql.org/docs/8.2/static/xfunc-internal.html) in the PostgreSQL documentation. + +## C --- End diff -- This id value is the same as the previous one - should be unique. Also change header to "C Functions"?
[jira] [Commented] (HAWQ-1096) document the HAWQ built-in languages (SQL, C, internal)
[ https://issues.apache.org/jira/browse/HAWQ-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587172#comment-15587172 ] ASF GitHub Bot commented on HAWQ-1096: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/25#discussion_r83976414 --- Diff: plext/UsingProceduralLanguages.html.md.erb --- @@ -1,13 +1,16 @@ --- -title: Using Procedural Languages and Extensions in HAWQ +title: Using Languages and Extensions in HAWQ --- -HAWQ allows user-defined functions to be written in other languages besides SQL and C. These other languages are generically called *procedural languages* (PLs). +HAWQ supports user-defined functions created with the SQL and C built-in languages, including supporting user-defined aliases for internal functions. --- End diff -- This needs a bit of an edit: HAWQ supports user-defined functions **that are** created with the SQL and C built-in languages, **and also supports** user-defined aliases for internal functions.
[jira] [Commented] (HAWQ-1095) enhance database driver and API documentation
[ https://issues.apache.org/jira/browse/HAWQ-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587132#comment-15587132 ] ASF GitHub Bot commented on HAWQ-1095: -- Github user lisakowen commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/23#discussion_r83977371 --- Diff: clientaccess/g-database-application-interfaces.html.md.erb --- @@ -1,8 +1,96 @@ --- -title: ODBC/JDBC Application Interfaces +title: HAWQ Database Drivers and APIs --- +You may want to connect your existing Business Intelligence (BI) or Analytics applications with HAWQ. The database application programming interfaces most commonly used with HAWQ are the Postgres and ODBC and JDBC APIs. -You may want to deploy your existing Business Intelligence (BI) or Analytics applications with HAWQ. The most commonly used database application programming interfaces with HAWQ are the ODBC and JDBC APIs. +HAWQ provides the following connectivity tools for connecting to the database: + + - ODBC driver + - JDBC driver + - `libpq` - PostgreSQL C API + +## HAWQ Drivers + +ODBC and JDBC drivers for HAWQ are available as a separate download from Pivotal Network [Pivotal Network](https://network.pivotal.io/products/pivotal-hdb). + +### ODBC Driver + +The ODBC API specifies a standard set of C interfaces for accessing database management systems. For additional information on using the ODBC API, refer to the [ODBC Programmer's Reference](https://msdn.microsoft.com/en-us/library/ms714177(v=vs.85).aspx) documentation. + +HAWQ supports the DataDirect ODBC Driver. Installation instructions for this driver are provided on the Pivotal Network driver download page. Refer to [HAWQ ODBC Driver](http://media.datadirect.com/download/docs/odbc/allodbc/#page/odbc%2Fthe-greenplum-wire-protocol-driver.html%23) for HAWQ-specific ODBC driver information. --- End diff -- users will download the readme from pivnet. 
the link at the end of the readme points to a datadirect page from which one could navigate to the links i have included. i don't see any other docs when i untar the download package. > enhance database driver and API documentation > - > > Key: HAWQ-1095 > URL: https://issues.apache.org/jira/browse/HAWQ-1095 > Project: Apache HAWQ > Issue Type: Improvement > Components: Documentation >Reporter: Lisa Owen >Assignee: David Yozie >Priority: Minor > Fix For: 2.0.1.0-incubating > > > docs contain very brief references to JDBC/ODBC and none at all to libpq. > add more content in these areas. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-1095) enhance database driver and API documentation
[ https://issues.apache.org/jira/browse/HAWQ-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587108#comment-15587108 ] ASF GitHub Bot commented on HAWQ-1095: -- Github user lisakowen commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/23#discussion_r83976521 --- Diff: clientaccess/g-database-application-interfaces.html.md.erb --- @@ -1,8 +1,96 @@ --- -title: ODBC/JDBC Application Interfaces +title: HAWQ Database Drivers and APIs --- +You may want to connect your existing Business Intelligence (BI) or Analytics applications with HAWQ. The database application programming interfaces most commonly used with HAWQ are the Postgres and ODBC and JDBC APIs. -You may want to deploy your existing Business Intelligence (BI) or Analytics applications with HAWQ. The most commonly used database application programming interfaces with HAWQ are the ODBC and JDBC APIs. +HAWQ provides the following connectivity tools for connecting to the database: + + - ODBC driver + - JDBC driver + - `libpq` - PostgreSQL C API + +## HAWQ Drivers + +ODBC and JDBC drivers for HAWQ are available as a separate download from Pivotal Network [Pivotal Network](https://network.pivotal.io/products/pivotal-hdb). + +### ODBC Driver + +The ODBC API specifies a standard set of C interfaces for accessing database management systems. For additional information on using the ODBC API, refer to the [ODBC Programmer's Reference](https://msdn.microsoft.com/en-us/library/ms714177(v=vs.85).aspx) documentation. + +HAWQ supports the DataDirect ODBC Driver. Installation instructions for this driver are provided on the Pivotal Network driver download page. Refer to [HAWQ ODBC Driver](http://media.datadirect.com/download/docs/odbc/allodbc/#page/odbc%2Fthe-greenplum-wire-protocol-driver.html%23) for HAWQ-specific ODBC driver information. 
+ + Connection Data Source +The information required by the HAWQ ODBC driver to connect to a database is typically stored in a named data source. Depending on your platform, you may use [GUI](http://media.datadirect.com/download/docs/odbc/allodbc/index.html#page/odbc%2FData_Source_Configuration_through_a_GUI_14.html%23) or [command line](http://media.datadirect.com/download/docs/odbc/allodbc/index.html#page/odbc%2FData_Source_Configuration_in_the_UNIX_2fLinux_odbc_13.html%23) tools to create your data source definition. On Linux, ODBC data sources are typically defined in a file named `odbc.ini`. + +Commonly-specified HAWQ ODBC data source connection properties include: + +| Property Name| Value Description | +|---|-| +| Database | name of the database to which you want to connect | +| Driver | full path to the ODBC driver library file | +| HostName | HAWQ master host name | +| MaxLongVarcharSize | maximum size of columns of type long varchar | +| Password | password used to connect to the specified database | +| PortNumber | HAWQ master database port number | + +Refer to [Connection Option Descriptions](http://media.datadirect.com/download/docs/odbc/allodbc/#page/odbc%2Fgreenplum-connection-option-descriptions.html%23) for a list of ODBC connection properties supported by the HAWQ DataDirect ODBC driver. + +Example HAWQ DataDirect ODBC driver data source definition: + +``` shell +[HAWQ-201] +Driver=/usr/local/hawq_drivers/odbc/lib/ddgplm27.so +Description=DataDirect 7.1 Greenplum Wire Protocol - for HAWQ +Database=getstartdb +HostName=hdm1 +PortNumber=5432 +Password=changeme +MaxLongVarcharSize=8192 +``` + +The first line, `[HAWQ-201]`, identifies the name of the data source. + +ODBC connection properties may also be specified in a connection string identifying either a data
[jira] [Commented] (HAWQ-1095) enhance database driver and API documentation
[ https://issues.apache.org/jira/browse/HAWQ-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587103#comment-15587103 ] ASF GitHub Bot commented on HAWQ-1095: -- Github user asfgit closed the pull request at: https://github.com/apache/incubator-hawq-docs/pull/26 > enhance database driver and API documentation > - > > Key: HAWQ-1095 > URL: https://issues.apache.org/jira/browse/HAWQ-1095 > Project: Apache HAWQ > Issue Type: Improvement > Components: Documentation >Reporter: Lisa Owen >Assignee: David Yozie >Priority: Minor > Fix For: 2.0.1.0-incubating > > > docs contain very brief references to JDBC/ODBC and none at all to libpq. > add more content in these areas. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-1095) enhance database driver and API documentation
[ https://issues.apache.org/jira/browse/HAWQ-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587081#comment-15587081 ] ASF GitHub Bot commented on HAWQ-1095: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/23#discussion_r83974367 --- Diff: clientaccess/g-database-application-interfaces.html.md.erb --- @@ -1,8 +1,96 @@ --- -title: ODBC/JDBC Application Interfaces +title: HAWQ Database Drivers and APIs --- +You may want to connect your existing Business Intelligence (BI) or Analytics applications with HAWQ. The database application programming interfaces most commonly used with HAWQ are the Postgres and ODBC and JDBC APIs. -You may want to deploy your existing Business Intelligence (BI) or Analytics applications with HAWQ. The most commonly used database application programming interfaces with HAWQ are the ODBC and JDBC APIs. +HAWQ provides the following connectivity tools for connecting to the database: + + - ODBC driver + - JDBC driver + - `libpq` - PostgreSQL C API + +## HAWQ Drivers + +ODBC and JDBC drivers for HAWQ are available as a separate download from Pivotal Network [Pivotal Network](https://network.pivotal.io/products/pivotal-hdb). + +### ODBC Driver + +The ODBC API specifies a standard set of C interfaces for accessing database management systems. For additional information on using the ODBC API, refer to the [ODBC Programmer's Reference](https://msdn.microsoft.com/en-us/library/ms714177(v=vs.85).aspx) documentation. + +HAWQ supports the DataDirect ODBC Driver. Installation instructions for this driver are provided on the Pivotal Network driver download page. Refer to [HAWQ ODBC Driver](http://media.datadirect.com/download/docs/odbc/allodbc/#page/odbc%2Fthe-greenplum-wire-protocol-driver.html%23) for HAWQ-specific ODBC driver information. --- End diff -- Are you sure the datadirect link contains the same info available in the HAWQ ODBC download? 
> enhance database driver and API documentation > - > > Key: HAWQ-1095 > URL: https://issues.apache.org/jira/browse/HAWQ-1095 > Project: Apache HAWQ > Issue Type: Improvement > Components: Documentation >Reporter: Lisa Owen >Assignee: David Yozie >Priority: Minor > Fix For: 2.0.1.0-incubating > > > docs contain very brief references to JDBC/ODBC and none at all to libpq. > add more content in these areas. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-1095) enhance database driver and API documentation
[ https://issues.apache.org/jira/browse/HAWQ-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587078#comment-15587078 ] ASF GitHub Bot commented on HAWQ-1095: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/23#discussion_r83974668 --- Diff: clientaccess/g-database-application-interfaces.html.md.erb --- @@ -1,8 +1,96 @@ --- -title: ODBC/JDBC Application Interfaces +title: HAWQ Database Drivers and APIs --- +You may want to connect your existing Business Intelligence (BI) or Analytics applications with HAWQ. The database application programming interfaces most commonly used with HAWQ are the Postgres and ODBC and JDBC APIs. -You may want to deploy your existing Business Intelligence (BI) or Analytics applications with HAWQ. The most commonly used database application programming interfaces with HAWQ are the ODBC and JDBC APIs. +HAWQ provides the following connectivity tools for connecting to the database: + + - ODBC driver + - JDBC driver + - `libpq` - PostgreSQL C API + +## HAWQ Drivers + +ODBC and JDBC drivers for HAWQ are available as a separate download from Pivotal Network [Pivotal Network](https://network.pivotal.io/products/pivotal-hdb). + +### ODBC Driver + +The ODBC API specifies a standard set of C interfaces for accessing database management systems. For additional information on using the ODBC API, refer to the [ODBC Programmer's Reference](https://msdn.microsoft.com/en-us/library/ms714177(v=vs.85).aspx) documentation. + +HAWQ supports the DataDirect ODBC Driver. Installation instructions for this driver are provided on the Pivotal Network driver download page. Refer to [HAWQ ODBC Driver](http://media.datadirect.com/download/docs/odbc/allodbc/#page/odbc%2Fthe-greenplum-wire-protocol-driver.html%23) for HAWQ-specific ODBC driver information. 
+ + Connection Data Source +The information required by the HAWQ ODBC driver to connect to a database is typically stored in a named data source. Depending on your platform, you may use [GUI](http://media.datadirect.com/download/docs/odbc/allodbc/index.html#page/odbc%2FData_Source_Configuration_through_a_GUI_14.html%23) or [command line](http://media.datadirect.com/download/docs/odbc/allodbc/index.html#page/odbc%2FData_Source_Configuration_in_the_UNIX_2fLinux_odbc_13.html%23) tools to create your data source definition. On Linux, ODBC data sources are typically defined in a file named `odbc.ini`. + +Commonly-specified HAWQ ODBC data source connection properties include: + +| Property Name| Value Description | +|---|-| +| Database | name of the database to which you want to connect | +| Driver | full path to the ODBC driver library file | +| HostName | HAWQ master host name | +| MaxLongVarcharSize | maximum size of columns of type long varchar | +| Password | password used to connect to the specified database | +| PortNumber | HAWQ master database port number | + +Refer to [Connection Option Descriptions](http://media.datadirect.com/download/docs/odbc/allodbc/#page/odbc%2Fgreenplum-connection-option-descriptions.html%23) for a list of ODBC connection properties supported by the HAWQ DataDirect ODBC driver. + +Example HAWQ DataDirect ODBC driver data source definition: + +``` shell +[HAWQ-201] +Driver=/usr/local/hawq_drivers/odbc/lib/ddgplm27.so +Description=DataDirect 7.1 Greenplum Wire Protocol - for HAWQ +Database=getstartdb +HostName=hdm1 +PortNumber=5432 +Password=changeme +MaxLongVarcharSize=8192 +``` + +The first line, `[HAWQ-201]`, identifies the name of the data source. + +ODBC connection properties may also be specified in a connection string identifying either a data
[jira] [Commented] (HAWQ-1095) enhance database driver and API documentation
[ https://issues.apache.org/jira/browse/HAWQ-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587079#comment-15587079 ] ASF GitHub Bot commented on HAWQ-1095: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/23#discussion_r83974918 --- Diff: clientaccess/g-database-application-interfaces.html.md.erb --- @@ -1,8 +1,96 @@ --- -title: ODBC/JDBC Application Interfaces +title: HAWQ Database Drivers and APIs --- +You may want to connect your existing Business Intelligence (BI) or Analytics applications with HAWQ. The database application programming interfaces most commonly used with HAWQ are the Postgres and ODBC and JDBC APIs. -You may want to deploy your existing Business Intelligence (BI) or Analytics applications with HAWQ. The most commonly used database application programming interfaces with HAWQ are the ODBC and JDBC APIs. +HAWQ provides the following connectivity tools for connecting to the database: + + - ODBC driver + - JDBC driver + - `libpq` - PostgreSQL C API + +## HAWQ Drivers + +ODBC and JDBC drivers for HAWQ are available as a separate download from Pivotal Network [Pivotal Network](https://network.pivotal.io/products/pivotal-hdb). + +### ODBC Driver + +The ODBC API specifies a standard set of C interfaces for accessing database management systems. For additional information on using the ODBC API, refer to the [ODBC Programmer's Reference](https://msdn.microsoft.com/en-us/library/ms714177(v=vs.85).aspx) documentation. + +HAWQ supports the DataDirect ODBC Driver. Installation instructions for this driver are provided on the Pivotal Network driver download page. Refer to [HAWQ ODBC Driver](http://media.datadirect.com/download/docs/odbc/allodbc/#page/odbc%2Fthe-greenplum-wire-protocol-driver.html%23) for HAWQ-specific ODBC driver information. 
+ + Connection Data Source +The information required by the HAWQ ODBC driver to connect to a database is typically stored in a named data source. Depending on your platform, you may use [GUI](http://media.datadirect.com/download/docs/odbc/allodbc/index.html#page/odbc%2FData_Source_Configuration_through_a_GUI_14.html%23) or [command line](http://media.datadirect.com/download/docs/odbc/allodbc/index.html#page/odbc%2FData_Source_Configuration_in_the_UNIX_2fLinux_odbc_13.html%23) tools to create your data source definition. On Linux, ODBC data sources are typically defined in a file named `odbc.ini`. + +Commonly-specified HAWQ ODBC data source connection properties include: + +| Property Name| Value Description | +|---|-| +| Database | name of the database to which you want to connect | +| Driver | full path to the ODBC driver library file | +| HostName | HAWQ master host name | +| MaxLongVarcharSize | maximum size of columns of type long varchar | +| Password | password used to connect to the specified database | +| PortNumber | HAWQ master database port number | + +Refer to [Connection Option Descriptions](http://media.datadirect.com/download/docs/odbc/allodbc/#page/odbc%2Fgreenplum-connection-option-descriptions.html%23) for a list of ODBC connection properties supported by the HAWQ DataDirect ODBC driver. + +Example HAWQ DataDirect ODBC driver data source definition: + +``` shell +[HAWQ-201] +Driver=/usr/local/hawq_drivers/odbc/lib/ddgplm27.so +Description=DataDirect 7.1 Greenplum Wire Protocol - for HAWQ +Database=getstartdb +HostName=hdm1 +PortNumber=5432 +Password=changeme +MaxLongVarcharSize=8192 +``` + +The first line, `[HAWQ-201]`, identifies the name of the data source. + +ODBC connection properties may also be specified in a connection string identifying either a data
[jira] [Commented] (HAWQ-1095) enhance database driver and API documentation
[ https://issues.apache.org/jira/browse/HAWQ-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587077#comment-15587077 ] ASF GitHub Bot commented on HAWQ-1095: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/23#discussion_r83974424 --- Diff: clientaccess/g-database-application-interfaces.html.md.erb --- @@ -1,8 +1,96 @@ --- -title: ODBC/JDBC Application Interfaces +title: HAWQ Database Drivers and APIs --- +You may want to connect your existing Business Intelligence (BI) or Analytics applications with HAWQ. The database application programming interfaces most commonly used with HAWQ are the Postgres and ODBC and JDBC APIs. -You may want to deploy your existing Business Intelligence (BI) or Analytics applications with HAWQ. The most commonly used database application programming interfaces with HAWQ are the ODBC and JDBC APIs. +HAWQ provides the following connectivity tools for connecting to the database: + + - ODBC driver + - JDBC driver + - `libpq` - PostgreSQL C API + +## HAWQ Drivers + +ODBC and JDBC drivers for HAWQ are available as a separate download from Pivotal Network [Pivotal Network](https://network.pivotal.io/products/pivotal-hdb). + +### ODBC Driver + +The ODBC API specifies a standard set of C interfaces for accessing database management systems. For additional information on using the ODBC API, refer to the [ODBC Programmer's Reference](https://msdn.microsoft.com/en-us/library/ms714177(v=vs.85).aspx) documentation. + +HAWQ supports the DataDirect ODBC Driver. Installation instructions for this driver are provided on the Pivotal Network driver download page. Refer to [HAWQ ODBC Driver](http://media.datadirect.com/download/docs/odbc/allodbc/#page/odbc%2Fthe-greenplum-wire-protocol-driver.html%23) for HAWQ-specific ODBC driver information. 
+ + Connection Data Source +The information required by the HAWQ ODBC driver to connect to a database is typically stored in a named data source. Depending on your platform, you may use [GUI](http://media.datadirect.com/download/docs/odbc/allodbc/index.html#page/odbc%2FData_Source_Configuration_through_a_GUI_14.html%23) or [command line](http://media.datadirect.com/download/docs/odbc/allodbc/index.html#page/odbc%2FData_Source_Configuration_in_the_UNIX_2fLinux_odbc_13.html%23) tools to create your data source definition. On Linux, ODBC data sources are typically defined in a file named `odbc.ini`. + +Commonly-specified HAWQ ODBC data source connection properties include: + +| Property Name| Value Description | +|---|-| +| Database | name of the database to which you want to connect | +| Driver | full path to the ODBC driver library file | +| HostName | HAWQ master host name | +| MaxLongVarcharSize | maximum size of columns of type long varchar | +| Password | password used to connect to the specified database | +| PortNumber | HAWQ master database port number | --- End diff -- Let's initial-capitalize the second column. > enhance database driver and API documentation > - > > Key: HAWQ-1095 > URL: https://issues.apache.org/jira/browse/HAWQ-1095 > Project: Apache HAWQ > Issue Type: Improvement > Components: Documentation >Reporter: Lisa Owen >Assignee: David Yozie >Priority: Minor > Fix For: 2.0.1.0-incubating > > > docs contain very brief references to JDBC/ODBC and none at all to libpq. > add more content in these areas. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-1095) enhance database driver and API documentation
[ https://issues.apache.org/jira/browse/HAWQ-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15587080#comment-15587080 ] ASF GitHub Bot commented on HAWQ-1095: -- Github user dyozie commented on a diff in the pull request: https://github.com/apache/incubator-hawq-docs/pull/23#discussion_r83974977 --- Diff: clientaccess/g-database-application-interfaces.html.md.erb --- @@ -1,8 +1,96 @@ --- -title: ODBC/JDBC Application Interfaces +title: HAWQ Database Drivers and APIs --- +You may want to connect your existing Business Intelligence (BI) or Analytics applications with HAWQ. The database application programming interfaces most commonly used with HAWQ are the Postgres and ODBC and JDBC APIs. -You may want to deploy your existing Business Intelligence (BI) or Analytics applications with HAWQ. The most commonly used database application programming interfaces with HAWQ are the ODBC and JDBC APIs. +HAWQ provides the following connectivity tools for connecting to the database: + + - ODBC driver + - JDBC driver + - `libpq` - PostgreSQL C API + +## HAWQ Drivers + +ODBC and JDBC drivers for HAWQ are available as a separate download from Pivotal Network [Pivotal Network](https://network.pivotal.io/products/pivotal-hdb). + +### ODBC Driver + +The ODBC API specifies a standard set of C interfaces for accessing database management systems. For additional information on using the ODBC API, refer to the [ODBC Programmer's Reference](https://msdn.microsoft.com/en-us/library/ms714177(v=vs.85).aspx) documentation. + +HAWQ supports the DataDirect ODBC Driver. Installation instructions for this driver are provided on the Pivotal Network driver download page. Refer to [HAWQ ODBC Driver](http://media.datadirect.com/download/docs/odbc/allodbc/#page/odbc%2Fthe-greenplum-wire-protocol-driver.html%23) for HAWQ-specific ODBC driver information. 
+ + Connection Data Source +The information required by the HAWQ ODBC driver to connect to a database is typically stored in a named data source. Depending on your platform, you may use [GUI](http://media.datadirect.com/download/docs/odbc/allodbc/index.html#page/odbc%2FData_Source_Configuration_through_a_GUI_14.html%23) or [command line](http://media.datadirect.com/download/docs/odbc/allodbc/index.html#page/odbc%2FData_Source_Configuration_in_the_UNIX_2fLinux_odbc_13.html%23) tools to create your data source definition. On Linux, ODBC data sources are typically defined in a file named `odbc.ini`. + +Commonly-specified HAWQ ODBC data source connection properties include: + +| Property Name| Value Description | +|---|-| +| Database | name of the database to which you want to connect | +| Driver | full path to the ODBC driver library file | +| HostName | HAWQ master host name | +| MaxLongVarcharSize | maximum size of columns of type long varchar | +| Password | password used to connect to the specified database | +| PortNumber | HAWQ master database port number | + +Refer to [Connection Option Descriptions](http://media.datadirect.com/download/docs/odbc/allodbc/#page/odbc%2Fgreenplum-connection-option-descriptions.html%23) for a list of ODBC connection properties supported by the HAWQ DataDirect ODBC driver. + +Example HAWQ DataDirect ODBC driver data source definition: + +``` shell +[HAWQ-201] +Driver=/usr/local/hawq_drivers/odbc/lib/ddgplm27.so +Description=DataDirect 7.1 Greenplum Wire Protocol - for HAWQ +Database=getstartdb +HostName=hdm1 +PortNumber=5432 +Password=changeme +MaxLongVarcharSize=8192 +``` + +The first line, `[HAWQ-201]`, identifies the name of the data source. + +ODBC connection properties may also be specified in a connection string identifying either a data
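The quoted diff above ends by noting that ODBC connection properties may also be specified directly in a connection string instead of a named `odbc.ini` data source. As an illustrative sketch only (the `build_connection_string` helper is hypothetical, not part of any HAWQ or DataDirect API; the property names simply mirror the `odbc.ini` example quoted above), the same key/value pairs can be assembled into the semicolon-delimited form that ODBC drivers accept:

```python
# Sketch: assemble ODBC connection properties into a semicolon-delimited
# connection string. The property names mirror the odbc.ini example above;
# this helper is illustrative only, not part of the HAWQ or DataDirect APIs.

def build_connection_string(properties):
    """Join key=value pairs with semicolons, the form most ODBC drivers accept."""
    return ";".join(f"{key}={value}" for key, value in properties.items())

# Same properties as the [HAWQ-201] data source definition quoted above.
hawq_props = {
    "Driver": "/usr/local/hawq_drivers/odbc/lib/ddgplm27.so",
    "Database": "getstartdb",
    "HostName": "hdm1",
    "PortNumber": 5432,
    "Password": "changeme",
    "MaxLongVarcharSize": 8192,
}

print(build_connection_string(hawq_props))
```

Running the sketch prints the properties as one `key=value;key=value;...` string, the same information the named data source stores in `odbc.ini`.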
[jira] [Commented] (HAWQ-1096) document the HAWQ built-in languages (SQL, C, internal)
[ https://issues.apache.org/jira/browse/HAWQ-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586400#comment-15586400 ] ASF GitHub Bot commented on HAWQ-1096: -- GitHub user lisakowen opened a pull request: https://github.com/apache/incubator-hawq-docs/pull/27 HAWQ-1096 - add subnav entry for built-in languages add subnav for new topic You can merge this pull request into a Git repository by running: $ git pull https://github.com/lisakowen/incubator-hawq-docs feature/subnav-builtin-langs Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hawq-docs/pull/27.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #27 > document the HAWQ built-in languages (SQL, C, internal) > --- > > Key: HAWQ-1096 > URL: https://issues.apache.org/jira/browse/HAWQ-1096 > Project: Apache HAWQ > Issue Type: Improvement > Components: Documentation >Reporter: Lisa Owen >Assignee: David Yozie >Priority: Minor > > the HAWQ docs do not discuss the built-in languages supported by HAWQ - SQL, > C and internal. add content to introduce these languages with relevant > examples and links. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-1096) document the HAWQ built-in languages (SQL, C, internal)
[ https://issues.apache.org/jira/browse/HAWQ-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586041#comment-15586041 ] ASF GitHub Bot commented on HAWQ-1096: -- GitHub user lisakowen opened a pull request: https://github.com/apache/incubator-hawq-docs/pull/25 HAWQ-1096 - add content for hawq built-in languages add content for sql, c, and internal hawq built in languages You can merge this pull request into a Git repository by running: $ git pull https://github.com/lisakowen/incubator-hawq-docs feature/builtin-langs Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hawq-docs/pull/25.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #25 commit 504c662be21dc344a161b81a9c627a8f6d7861cd Author: Lisa Owen Date: 2016-10-05T21:33:36Z add file discussing hawq built-in languages commit 8e27e9093f1d27277d676386144ee895ad004f86 Author: Lisa Owen Date: 2016-10-05T21:34:36Z include built-in languages in PL lang landing page commit bd85fdbc31cb463855c2606fde48d803dccb3de2 Author: Lisa Owen Date: 2016-10-05T21:47:11Z c user-defined function example - add _c to function name to avoid confusion commit 1332870d01d2f8da2f8284ac167253d7005c6dfd Author: Lisa Owen Date: 2016-10-10T22:24:20Z builtin langs - clarify and add some links > document the HAWQ built-in languages (SQL, C, internal) > --- > > Key: HAWQ-1096 > URL: https://issues.apache.org/jira/browse/HAWQ-1096 > Project: Apache HAWQ > Issue Type: Improvement > Components: Documentation >Reporter: Lisa Owen >Assignee: David Yozie >Priority: Minor > > the HAWQ docs do not discuss the built-in languages supported by HAWQ - SQL, > C and internal. add content to introduce these languages with relevant > examples and links. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-1095) enhance database driver and API documentation
[ https://issues.apache.org/jira/browse/HAWQ-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586004#comment-15586004 ] ASF GitHub Bot commented on HAWQ-1095: -- GitHub user lisakowen opened a pull request: https://github.com/apache/incubator-hawq-docs/pull/23 HAWQ-1095 - enhance database api docs add content for jdbc, odbc, libpq You can merge this pull request into a Git repository by running: $ git pull https://github.com/lisakowen/incubator-hawq-docs feature/dbapiinfo Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hawq-docs/pull/23.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23 commit 2c0f4b19bb2baef545467c9d39f097344c6358b2 Author: Lisa Owen Date: 2016-10-04T19:25:29Z restructure db API section; add libpq and links to driver and api docs commit f066326f0241050a22a8b592fcaae3aab037c504 Author: Lisa Owen Date: 2016-10-04T20:36:41Z clarify some statements commit fbb0571df9cdb1ba05a2ba970b560cb6388b72eb Author: Lisa Owen Date: 2016-10-04T23:11:07Z hawq supports datadirect drivers commit df2aaed3aab20b9d0fffa0c62df8a23c33864065 Author: Lisa Owen Date: 2016-10-04T23:26:02Z update driver names commit 245633e69bd0017f43a5cc20e82c9a5fc23b4079 Author: Lisa Owen Date: 2016-10-05T21:56:36Z provide locations of libpq lib and include file commit 57d76d2b86014f772754ca70cab95e4c337a71a2 Author: Lisa Owen Date: 2016-10-07T16:02:22Z add jdbc connection string and example commit 70e45af7d24a6699840eec176603b4b835121bef Author: Lisa Owen Date: 2016-10-07T23:48:39Z flesh out jdbc section; add connection URL specs commit 3288da3e8ce51482e1d6e6913a237cbf5fc0bc8e Author: Lisa Owen Date: 2016-10-10T19:08:48Z db drivers and apis - flesh out odbc section > enhance database driver and API documentation > - > > Key: HAWQ-1095 > URL: https://issues.apache.org/jira/browse/HAWQ-1095 > Project: Apache HAWQ > 
Issue Type: Improvement > Components: Documentation >Reporter: Lisa Owen >Assignee: David Yozie >Priority: Minor > Fix For: 2.0.1.0-incubating > > > docs contain very brief references to JDBC/ODBC and none at all to libpq. > add more content in these areas. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-1056) "hawq check" help output and documentation updates needed
[ https://issues.apache.org/jira/browse/HAWQ-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15508041#comment-15508041 ] ASF GitHub Bot commented on HAWQ-1056: -- Github user asfgit closed the pull request at: https://github.com/apache/incubator-hawq-docs/pull/9 > "hawq check" help output and documentation updates needed > - > > Key: HAWQ-1056 > URL: https://issues.apache.org/jira/browse/HAWQ-1056 > Project: Apache HAWQ > Issue Type: Bug > Components: Command Line Tools, Documentation >Reporter: Lisa Owen >Assignee: David Yozie >Priority: Minor > Fix For: 2.0.1.0-incubating > > > help output and reference documentation for "hawq check" --hadoop option is > not clear. specifically, this option should identify the full path to the > hadoop installation. > additionally, the [-h | --host ] option appears to be missing in > both areas. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-1056) "hawq check" help output and documentation updates needed
[ https://issues.apache.org/jira/browse/HAWQ-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15494197#comment-15494197 ] ASF GitHub Bot commented on HAWQ-1056: -- GitHub user lisakowen opened a pull request: https://github.com/apache/incubator-hawq-docs/pull/9 Feature/hawqcheck hadoopopt some cleanup to documentation for "hawq check" command. fixes the documentation part of HAWQ-1056. - add -h, --host option - clarify value of --hadoop, --hadoop-home option value should be the full install path to hadoop - modify the examples to use relevant values for hadoop_home You can merge this pull request into a Git repository by running: $ git pull https://github.com/lisakowen/incubator-hawq-docs feature/hawqcheck-hadoopopt Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hawq-docs/pull/9.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #9 commit 0f642f1ce67bd570d2043174bbdaed990c7840bb Author: Lisa Owen Date: 2016-09-14T21:11:38Z clarify use of hawq check --hadoop option commit 6704cc0b7a358fafba08b3fd66a5a12b5bb97f85 Author: Lisa Owen Date: 2016-09-14T21:21:07Z hawq check --hadoop option - misc cleanup commit 016630163015e782ef630338998ef1696f5f005e Author: Lisa Owen Date: 2016-09-15T17:52:32Z hawq check - add h/host option, cleanup commit 4a617974cf04d1b1758bdfbc490116b60bdefb79 Author: Lisa Owen Date: 2016-09-15T18:38:36Z hawq check - hadoop home is optional > "hawq check" help output and documentation updates needed > - > > Key: HAWQ-1056 > URL: https://issues.apache.org/jira/browse/HAWQ-1056 > Project: Apache HAWQ > Issue Type: Bug > Components: Command Line Tools, Documentation >Reporter: Lisa Owen >Assignee: David Yozie > Fix For: 2.0.1.0-incubating > > > help output and reference documentation for "hawq check" --hadoop option is > not clear. 
specifically, this option should identify the full path to the > hadoop installation. > additionally, the [-h | --host ] option appears to be missing in > both areas. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
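The corrected usage this pull request documents can be illustrated with a short sketch; the host file name, hostname, and hadoop install path below are placeholders, not values from the PR:

```shell
# --hadoop (or --hadoop-home) takes the full install path to hadoop
hawq check -f hostfile_all --hadoop /usr/local/hadoop

# the [-h | --host] option checks a single named host instead of a host file
hawq check -h sdw1 --hadoop /usr/local/hadoop
```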
[jira] [Commented] (HAWQ-1019) clarify database application interfaces discussion
[ https://issues.apache.org/jira/browse/HAWQ-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15440281#comment-15440281 ] ASF GitHub Bot commented on HAWQ-1019: -- Github user asfgit closed the pull request at: https://github.com/apache/incubator-hawq-docs/pull/3 > clarify database application interfaces discussion > -- > > Key: HAWQ-1019 > URL: https://issues.apache.org/jira/browse/HAWQ-1019 > Project: Apache HAWQ > Issue Type: Improvement > Components: Documentation >Reporter: Lisa Owen >Assignee: Lei Chang >Priority: Minor > Fix For: 2.0.1.0-incubating > > > discussion of drivers for database application interfaces needs to be > clarified. > relevant incubator-hawq-docs file: > clientaccess/g-database-application-interfaces.html.md.erb -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-1019) clarify database application interfaces discussion
[ https://issues.apache.org/jira/browse/HAWQ-1019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15440251#comment-15440251 ] ASF GitHub Bot commented on HAWQ-1019: -- GitHub user lisakowen opened a pull request: https://github.com/apache/incubator-hawq-docs/pull/3 misc doc updates clarifying APIs updates to clarify database application interfaces. fixes HAWQ-1019 You can merge this pull request into a Git repository by running: $ git pull https://github.com/lisakowen/incubator-hawq-docs feature/dbappif-fixes Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hawq-docs/pull/3.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3 commit bec5e3e1855bed4f0f8a5ef2a41b94c3c92f17fc Author: Lisa Owen Date: 2016-08-26T23:59:00Z misc doc updates clarifying APIs
[jira] [Commented] (HAWQ-945) catalog:char/varchar test cases fail due to locale settings.
[ https://issues.apache.org/jira/browse/HAWQ-945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15389139#comment-15389139 ] ASF GitHub Bot commented on HAWQ-945: - Github user asfgit closed the pull request at: https://github.com/apache/incubator-hawq/pull/809 > catalog:char/varchar test cases fail due to locale settings. > > > Key: HAWQ-945 > URL: https://issues.apache.org/jira/browse/HAWQ-945 > Project: Apache HAWQ > Issue Type: Bug >Reporter: Paul Guo >Assignee: Paul Guo > Fix For: 2.0.1.0-incubating > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-945) catalog:char/varchar test cases fail due to locale settings.
[ https://issues.apache.org/jira/browse/HAWQ-945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15389133#comment-15389133 ] ASF GitHub Bot commented on HAWQ-945: - Github user radarwave commented on the issue: https://github.com/apache/incubator-hawq/pull/809 +1
[jira] [Commented] (HAWQ-945) catalog:char/varchar test cases fail due to locale settings.
[ https://issues.apache.org/jira/browse/HAWQ-945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15389129#comment-15389129 ] ASF GitHub Bot commented on HAWQ-945: - Github user yaoj2 commented on the issue: https://github.com/apache/incubator-hawq/pull/809 +1
[jira] [Commented] (HAWQ-945) catalog:char/varchar test cases fail due to locale settings.
[ https://issues.apache.org/jira/browse/HAWQ-945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15389125#comment-15389125 ] ASF GitHub Bot commented on HAWQ-945: - GitHub user paul-guo- opened a pull request: https://github.com/apache/incubator-hawq/pull/809 HAWQ-945. catalog:char/varchar test cases fail due to locale settings. You can merge this pull request into a Git repository by running: $ git pull https://github.com/paul-guo-/incubator-hawq test3 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hawq/pull/809.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #809 commit 1e3cdd448a30e8db62ad47e428d318f78acd1a34 Author: Paul Guo Date: 2016-07-22T07:50:37Z HAWQ-945. catalog:char/varchar test cases fail due to locale settings.
[jira] [Commented] (HAWQ-940) Kerberos Ticket Expired for LibYARN Operations
[ https://issues.apache.org/jira/browse/HAWQ-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15388921#comment-15388921 ] ASF GitHub Bot commented on HAWQ-940: - Github user linwen closed the pull request at: https://github.com/apache/incubator-hawq/pull/804
> Kerberos Ticket Expired for LibYARN Operations
> Key: HAWQ-940
> URL: https://issues.apache.org/jira/browse/HAWQ-940
> Project: Apache HAWQ
> Issue Type: Bug
> Components: libyarn
> Reporter: Lin Wen
> Assignee: Lin Wen
> Fix For: 2.0.1.0-incubating
>
> HAWQ's libhdfs3 and libyarn use the same kerberos keyfile. Whenever an hdfs operation is triggered, a function named login() is called; in the login() function, this ticket is initialized by "kinit". But for libyarn, login() is only called when the resource broker process starts. So if HAWQ starts up and there is no query for a long period (24 hours in kerberos's configuration file, krb.conf), the ticket expires and HAWQ fails to register itself in Hadoop YARN.
[jira] [Commented] (HAWQ-940) Kerberos Ticket Expired for LibYARN Operations
[ https://issues.apache.org/jira/browse/HAWQ-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15388905#comment-15388905 ] ASF GitHub Bot commented on HAWQ-940: - Github user paul-guo- commented on the issue: https://github.com/apache/incubator-hawq/pull/804 +1
[jira] [Commented] (HAWQ-932) HAWQ fails to query external table defined with "localhost" in URL
[ https://issues.apache.org/jira/browse/HAWQ-932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15388640#comment-15388640 ] ASF GitHub Bot commented on HAWQ-932: - Github user sansanichfb commented on a diff in the pull request: https://github.com/apache/incubator-hawq/pull/803#discussion_r71808630 --- Diff: src/backend/catalog/external/externalmd.c --- @@ -84,7 +84,7 @@ List *ParsePxfEntries(StringInfo json, char *profile, Oid dboid) { struct json_object *jsonItem = json_object_array_get_idx(jsonItems, i); PxfItem *pxfItem = ParsePxfItem(jsonItem, profile); - if (dboid != NULL) + if (dboid != InvalidOid) --- End diff -- Warning.
> HAWQ fails to query external table defined with "localhost" in URL
> Key: HAWQ-932
> URL: https://issues.apache.org/jira/browse/HAWQ-932
> Project: Apache HAWQ
> Issue Type: Bug
> Components: External Tables, PXF
> Reporter: Goden Yao
> Assignee: Oleksandr Diachenko
> Fix For: 2.0.1.0-incubating
>
> Originally reported by [~jpatel] when he was making a docker image based on a HAWQ 2.0.0.0-incubating dev build. Investigated by [~odiachenko].
> There is a workaround to define it with 127.0.0.1, but there is no workaround for querying tables using HCatalog integration. It used to work before.
> {code}
> template1=# CREATE EXTERNAL TABLE ext_table1 (t1 text, t2 text, num1 integer, dub1 double precision) LOCATION (E'pxf://localhost:51200/hive_small_data?PROFILE=Hive') FORMAT 'CUSTOM' (formatter='pxfwritable_import');
> CREATE EXTERNAL TABLE
> template1=# select * from ext_table1;
> ERROR: remote component error (0): (libchurl.c:898)
> {code}
> When I turned on debug mode in curl, I found this error in the logs - "* Closing connection 0".
> I found a workaround, to set the CURLOPT_RESOLVE option in curl:
> {code}
> struct curl_slist *host = NULL;
> host = curl_slist_append(NULL, "localhost:51200:127.0.0.1");
> set_curl_option(context, CURLOPT_RESOLVE, host);
> {code}
> It seems like an issue with DNS cache. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-932) HAWQ fails to query external table defined with "localhost" in URL
[ https://issues.apache.org/jira/browse/HAWQ-932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15388639#comment-15388639 ] ASF GitHub Bot commented on HAWQ-932: - Github user sansanichfb commented on a diff in the pull request: https://github.com/apache/incubator-hawq/pull/803#discussion_r71808615 --- Diff: src/backend/utils/adt/pxf_functions.c --- @@ -43,7 +44,7 @@ pxf_item_fields_enum_start(text *profile, text *pattern) char *profile_cstr = text_to_cstring(profile); char *pattern_cstr = text_to_cstring(pattern); - items = get_pxf_item_metadata(profile_cstr, pattern_cstr, NULL); + items = get_pxf_item_metadata(profile_cstr, pattern_cstr, InvalidOid); --- End diff -- One more warning.
[jira] [Commented] (HAWQ-932) HAWQ fails to query external table defined with "localhost" in URL
[ https://issues.apache.org/jira/browse/HAWQ-932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15388172#comment-15388172 ] ASF GitHub Bot commented on HAWQ-932: - Github user shivzone commented on a diff in the pull request: https://github.com/apache/incubator-hawq/pull/803#discussion_r71760909 --- Diff: src/backend/access/external/libchurl.c --- @@ -312,6 +312,14 @@ CHURL_HANDLE churl_init_upload(const char* url, CHURL_HEADERS headers) context->upload = true; clear_error_buffer(context); + /* needed to resolve pxf service address */ + struct curl_slist *resolve_hosts = NULL; + char *pxf_host_entry = (char *) palloc0(strlen(pxf_service_address) + strlen(LocalhostIpV4Entry) + 1); + strcat(pxf_host_entry, pxf_service_address); --- End diff -- Yes. I was suggesting that we add the below OPT only when pxf_service_address is not based on an IP address
[jira] [Commented] (HAWQ-932) HAWQ fails to query external table defined with "localhost" in URL
[ https://issues.apache.org/jira/browse/HAWQ-932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15388167#comment-15388167 ] ASF GitHub Bot commented on HAWQ-932: - Github user sansanichfb commented on a diff in the pull request: https://github.com/apache/incubator-hawq/pull/803#discussion_r71760410 --- Diff: src/backend/access/external/libchurl.c --- @@ -312,6 +312,14 @@ CHURL_HANDLE churl_init_upload(const char* url, CHURL_HEADERS headers) context->upload = true; clear_error_buffer(context); + /* needed to resolve pxf service address */ + struct curl_slist *resolve_hosts = NULL; + char *pxf_host_entry = (char *) palloc0(strlen(pxf_service_address) + strlen(LocalhostIpV4Entry) + 1); + strcat(pxf_host_entry, pxf_service_address); --- End diff -- For the case when the user created an external table referring to "localhost", it's needed.
[jira] [Commented] (HAWQ-932) HAWQ fails to query external table defined with "localhost" in URL
[ https://issues.apache.org/jira/browse/HAWQ-932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15388163#comment-15388163 ] ASF GitHub Bot commented on HAWQ-932: - Github user shivzone commented on a diff in the pull request: https://github.com/apache/incubator-hawq/pull/803#discussion_r71760169 --- Diff: src/backend/access/external/libchurl.c --- @@ -312,6 +312,14 @@ CHURL_HANDLE churl_init_upload(const char* url, CHURL_HEADERS headers) context->upload = true; clear_error_buffer(context); + /* needed to resolve pxf service address */ + struct curl_slist *resolve_hosts = NULL; + char *pxf_host_entry = (char *) palloc0(strlen(pxf_service_address) + strlen(LocalhostIpV4Entry) + 1); + strcat(pxf_host_entry, pxf_service_address); --- End diff -- This step might be unnecessary if pxf_service_address itself is based on the IP address
[jira] [Commented] (HAWQ-944) Numutils.c: pg_ltoa and pg_itoa functions allocate unnecessary amount of bytes
[ https://issues.apache.org/jira/browse/HAWQ-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15388000#comment-15388000 ] ASF GitHub Bot commented on HAWQ-944: - Github user kavinderd commented on the issue: https://github.com/apache/incubator-hawq/pull/808 @paul-guo- @xunzhang Please review, I think I covered all invocations of the two functions.
> Numutils.c: pg_ltoa and pg_itoa functions allocate unnecessary amount of bytes
> Key: HAWQ-944
> URL: https://issues.apache.org/jira/browse/HAWQ-944
> Project: Apache HAWQ
> Issue Type: Improvement
> Components: Core
> Reporter: Kavinder Dhaliwal
> Assignee: Kavinder Dhaliwal
> Priority: Minor
>
> The current implementations of {{pg_ltoa}} and {{pg_itoa}} allocate a 33-byte char array and set the input pointer to that array. This is far more bytes than needed to translate an int16 or int32 to a string:
> int32 -> 10 digits maximum + 1 sign character + '\0' = 12 bytes
> int16 -> 5 digits maximum + 1 sign character + '\0' = 7 bytes
> When HAWQ/Greenplum forked from Postgres, the two functions simply delegated to {{sprintf}}, so an optimization was introduced that involved the 33-byte solution. Postgres itself implemented these functions in commit https://github.com/postgres/postgres/commit/4fc115b2e981f8c63165ca86a23215380a3fda66 to require at most a 12-byte char pointer.
> This is a minor improvement that can be made to the HAWQ codebase, and it's relatively little effort to do so.
[jira] [Commented] (HAWQ-897) Add feature test for create table distribution with new test framework
[ https://issues.apache.org/jira/browse/HAWQ-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15387150#comment-15387150 ] ASF GitHub Bot commented on HAWQ-897: - Github user yaoj2 commented on the issue: https://github.com/apache/incubator-hawq/pull/807 LGTM > Add feature test for create table distribution with new test framework > -- > > Key: HAWQ-897 > URL: https://issues.apache.org/jira/browse/HAWQ-897 > Project: Apache HAWQ > Issue Type: Sub-task > Components: Tests >Reporter: Lin Wen >Assignee: Lin Wen > Fix For: 2.0.1.0-incubating > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-940) Kerberos Ticket Expired for LibYARN Operations
[ https://issues.apache.org/jira/browse/HAWQ-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15387148#comment-15387148 ] ASF GitHub Bot commented on HAWQ-940: - Github user jiny2 commented on the issue: https://github.com/apache/incubator-hawq/pull/804 LGTM +1
[jira] [Commented] (HAWQ-897) Add feature test for create table distribution with new test framework
[ https://issues.apache.org/jira/browse/HAWQ-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15387147#comment-15387147 ] ASF GitHub Bot commented on HAWQ-897: - Github user linwen commented on the issue: https://github.com/apache/incubator-hawq/pull/807 +1
[jira] [Commented] (HAWQ-934) Populate canSetTag of PlannedStmt from Query object
[ https://issues.apache.org/jira/browse/HAWQ-934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15387135#comment-15387135 ] ASF GitHub Bot commented on HAWQ-934: - Github user hsyuan commented on the issue: https://github.com/apache/incubator-hawq/pull/799 @changleicn @wengyanqing @paul-guo- Please take a look. > Populate canSetTag of PlannedStmt from Query object > --- > > Key: HAWQ-934 > URL: https://issues.apache.org/jira/browse/HAWQ-934 > Project: Apache HAWQ > Issue Type: Bug > Components: Optimizer >Reporter: Haisheng Yuan >Assignee: Venkatesh > Fix For: 2.0.1.0-incubating > > > HAWQ generated an error if a single query resulted in multiple query plans > because of rule transformation and the plans were produced by PQO. This is > because of an incorrect directive in the plan to lock the same resource more > than once. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HAWQ-897) Add feature test for create table distribution with new test framework
[ https://issues.apache.org/jira/browse/HAWQ-897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15387136#comment-15387136 ] ASF GitHub Bot commented on HAWQ-897: - GitHub user jiny2 opened a pull request: https://github.com/apache/incubator-hawq/pull/807 HAWQ-897. Add feature test for create table distribution with new test framework You can merge this pull request into a Git repository by running: $ git pull https://github.com/jiny2/incubator-hawq HAWQ-897 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hawq/pull/807.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #807 commit 5fb757d8597b6dfa3e64cda81260e4d819e1793c Author: YI JIN Date: 2016-07-21T04:28:24Z HAWQ-897. Add feature test for create table distribution with new test framework
[jira] [Commented] (HAWQ-938) Remove ivy.xml in gpopt and read orca version from header file
[ https://issues.apache.org/jira/browse/HAWQ-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15387128#comment-15387128 ] ASF GitHub Bot commented on HAWQ-938: - Github user hsyuan commented on the issue: https://github.com/apache/incubator-hawq/pull/806 Thanks, will take care of it.
> Remove ivy.xml in gpopt and read orca version from header file
> Key: HAWQ-938
> URL: https://issues.apache.org/jira/browse/HAWQ-938
> Project: Apache HAWQ
> Issue Type: Improvement
> Components: Optimizer
> Reporter: Haisheng Yuan
> Assignee: Haisheng Yuan
> Fix For: 2.0.1.0-incubating
>
> Currently, if we want to upgrade orca or gpos, we need to change the orca SHA as well as the version number in ivy.xml. The function gp_opt_version() returns a version number that is read from ivy.xml, which is not the right way. It should only be dependent on the source files of orca and gpos.
[jira] [Commented] (HAWQ-940) Kerberos Ticket Expired for LibYARN Operations
[ https://issues.apache.org/jira/browse/HAWQ-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15387080#comment-15387080 ] ASF GitHub Bot commented on HAWQ-940: - GitHub user linwen reopened a pull request: https://github.com/apache/incubator-hawq/pull/804 HAWQ-940. Fix Kerberos ticket expired for libyarn operations Please review, thanks! You can merge this pull request into a Git repository by running: $ git pull https://github.com/linwen/incubator-hawq hawq_940 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hawq/pull/804.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #804 commit 0942630825f6c84da155bd2d9aec4831f7e4d049 Author: Wen Lin Date: 2016-07-20T09:07:01Z HAWQ-940. Fix Kerberos ticket expired for libyarn operations
[jira] [Commented] (HAWQ-940) Kerberos Ticket Expired for LibYARN Operations
[ https://issues.apache.org/jira/browse/HAWQ-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15387079#comment-15387079 ] ASF GitHub Bot commented on HAWQ-940: - Github user linwen commented on the issue: https://github.com/apache/incubator-hawq/pull/804 Another way to fix this is to add the ticket check to the resource broker process loop: at every time interval, login() is called. But that fix has to keep another variable to record the last updated time, which duplicates login().
[jira] [Commented] (HAWQ-940) Kerberos Ticket Expired for LibYARN Operations
[ https://issues.apache.org/jira/browse/HAWQ-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15387063#comment-15387063 ] ASF GitHub Bot commented on HAWQ-940: - Github user linwen commented on the issue: https://github.com/apache/incubator-hawq/pull/804 After rethinking this fix, I think it's better to check ticket expiration in the resource broker process loop, so I'm closing it.
[jira] [Commented] (HAWQ-940) Kerberos Ticket Expired for LibYARN Operations
[ https://issues.apache.org/jira/browse/HAWQ-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15387064#comment-15387064 ]

ASF GitHub Bot commented on HAWQ-940:
--------------------------------------

Github user linwen closed the pull request at:

    https://github.com/apache/incubator-hawq/pull/804
[jira] [Commented] (HAWQ-943) Various issues in hawq register feature_test cases
[ https://issues.apache.org/jira/browse/HAWQ-943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15387059#comment-15387059 ]

ASF GitHub Bot commented on HAWQ-943:
--------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-hawq/pull/805

> Various issues in hawq register feature_test cases
> --------------------------------------------------
>
>                 Key: HAWQ-943
>                 URL: https://issues.apache.org/jira/browse/HAWQ-943
>             Project: Apache HAWQ
>          Issue Type: Bug
>            Reporter: Paul Guo
>            Assignee: Lei Chang
>             Fix For: 2.0.1.0-incubating
>
> 1) Do not assume the test database is postgres.
>    Use HAWQ_DB, which is defined in sql_util.h.
> 2) Use error-immune options when creating a new HDFS file or directory,
>    e.g. mkdir -p, put -f, since the nonexistence of those files/directories
>    is not guaranteed (e.g. a previous test run was terminated by Ctrl+C).
[jira] [Commented] (HAWQ-943) Various issues in hawq register feature_test cases
[ https://issues.apache.org/jira/browse/HAWQ-943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15386997#comment-15386997 ]

ASF GitHub Bot commented on HAWQ-943:
--------------------------------------

Github user ictmalili commented on the issue:

    https://github.com/apache/incubator-hawq/pull/805

    LGTM. +1
[jira] [Commented] (HAWQ-936) Add GUC for array expansion in ORCA optimizer
[ https://issues.apache.org/jira/browse/HAWQ-936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15386950#comment-15386950 ]

ASF GitHub Bot commented on HAWQ-936:
--------------------------------------

Github user changleicn commented on the issue:

    https://github.com/apache/incubator-hawq/pull/800

    LGTM

> Add GUC for array expansion in ORCA optimizer
> ---------------------------------------------
>
>                 Key: HAWQ-936
>                 URL: https://issues.apache.org/jira/browse/HAWQ-936
>             Project: Apache HAWQ
>          Issue Type: New Feature
>          Components: Optimizer
>            Reporter: Haisheng Yuan
>            Assignee: Venkatesh
>             Fix For: 2.0.1.0-incubating
>
> Consider a query with the following pattern: select * from foo where foo.a
> IN (1,2,3,...). Currently, when the number of constants in the IN subquery is
> large, the query optimization time is unacceptable. This is stopping
> customers from turning Orca on by default, since many of their queries are
> generated queries with such a pattern.
> The root cause is the expansion of the IN subquery into an expression in
> disjunctive normal form. The objective of this story is to disable this
> expansion when the number of constants in the IN list is large.
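The fix the issue describes — skip the disjunctive-normal-form expansion once the IN list grows past a configurable size — boils down to a threshold check driven by a GUC. A minimal sketch follows; the knob name, its default, and the helper function are illustrative, not the GUC HAWQ actually registers.

```c
#include <stdbool.h>

/* Illustrative GUC-like knob; the real GUC name and default differ. */
static int optimizer_array_expansion_threshold = 100;

/* Expanding "foo.a IN (c1, c2, ...)" into disjunctive normal form makes
 * optimization time blow up with the constant count, so the expansion
 * is allowed only while the list stays at or below the threshold. */
static bool should_expand_in_list(int num_constants)
{
    return num_constants <= optimizer_array_expansion_threshold;
}
```

When the check returns false, the optimizer would keep the unexpanded array comparison instead of the DNF form, trading some plan quality for acceptable optimization time.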
[jira] [Commented] (HAWQ-938) Remove ivy.xml in gpopt and read orca version from header file
[ https://issues.apache.org/jira/browse/HAWQ-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15386947#comment-15386947 ]

ASF GitHub Bot commented on HAWQ-938:
--------------------------------------

Github user changleicn commented on the issue:

    https://github.com/apache/incubator-hawq/pull/806

    @paul-guo- to review.

> Remove ivy.xml in gpopt and read orca version from header file
> ---------------------------------------------------------------
>
>                 Key: HAWQ-938
>                 URL: https://issues.apache.org/jira/browse/HAWQ-938
>             Project: Apache HAWQ
>          Issue Type: Improvement
>          Components: Optimizer
>            Reporter: Haisheng Yuan
>            Assignee: Haisheng Yuan
>             Fix For: 2.0.1.0-incubating
>
> Currently, if we want to upgrade orca or gpos, we need to change the orca SHA
> as well as the version number in ivy.xml. The function gp_opt_version()
> returns a version number that is read from ivy.xml, which is not the right
> way; it should depend only on the source files of orca and gpos.
[jira] [Commented] (HAWQ-927) Send Projection Info Data from HAWQ to PXF
[ https://issues.apache.org/jira/browse/HAWQ-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15386817#comment-15386817 ]

ASF GitHub Bot commented on HAWQ-927:
--------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-hawq/pull/796

> Send Projection Info Data from HAWQ to PXF
> -------------------------------------------
>
>                 Key: HAWQ-927
>                 URL: https://issues.apache.org/jira/browse/HAWQ-927
>             Project: Apache HAWQ
>          Issue Type: Sub-task
>          Components: External Tables, PXF
>            Reporter: Kavinder Dhaliwal
>            Assignee: Kavinder Dhaliwal
>             Fix For: backlog
>
> To achieve column projection at the level of PXF or the underlying readers,
> we need to first send this data as a header/param to PXF. Currently, PXF has
> no knowledge of whether a query requires all columns or a subset of columns.
[jira] [Commented] (HAWQ-936) Add GUC for array expansion in ORCA optimizer
[ https://issues.apache.org/jira/browse/HAWQ-936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15386816#comment-15386816 ]

ASF GitHub Bot commented on HAWQ-936:
--------------------------------------

Github user hsyuan commented on the issue:

    https://github.com/apache/incubator-hawq/pull/800

    This PR will give https://github.com/apache/incubator-hawq/pull/795 a free ride.
[jira] [Commented] (HAWQ-927) Send Projection Info Data from HAWQ to PXF
[ https://issues.apache.org/jira/browse/HAWQ-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15386809#comment-15386809 ]

ASF GitHub Bot commented on HAWQ-927:
--------------------------------------

Github user shivzone commented on the issue:

    https://github.com/apache/incubator-hawq/pull/796

    +1
[jira] [Commented] (HAWQ-938) Remove ivy.xml in gpopt and read orca version from header file
[ https://issues.apache.org/jira/browse/HAWQ-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15386634#comment-15386634 ]

ASF GitHub Bot commented on HAWQ-938:
--------------------------------------

Github user hsyuan commented on the issue:

    https://github.com/apache/incubator-hawq/pull/806

    @changleicn @yaoj2 @wengyanqing Please take a look.
[jira] [Commented] (HAWQ-938) Remove ivy.xml in gpopt and read orca version from header file
[ https://issues.apache.org/jira/browse/HAWQ-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15386632#comment-15386632 ]

ASF GitHub Bot commented on HAWQ-938:
--------------------------------------

GitHub user hsyuan opened a pull request:

    https://github.com/apache/incubator-hawq/pull/806

    HAWQ-938. Remove ivy.xml in gpopt and read orca version from header file

    The old mechanism extracted the version numbers from the Ivy config file,
    which doesn't do the right thing if you build without Ivy. Using the
    version headers is simpler, anyway. Also removed `ivy.xml` and
    `ivy-build.xml` under the `gpopt` folder.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hsyuan/incubator-hawq HAWQ-938

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hawq/pull/806.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #806

----
commit 2a2a89cc6b950d4067a8e4d8a5e79b2f7b1cf839
Author: Haisheng Yuan
Date:   2016-07-20T20:14:25Z

    HAWQ-938. Remove ivy.xml in gpopt and read orca version from header file

    The old mechanism extracted the version numbers from the Ivy config file,
    which doesn't do the right thing if you build without Ivy. Using the
    version headers is simpler, anyway.
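The approach the PR takes — derive the reported version from headers shipped with the ORCA/GPOS sources rather than from ivy.xml — can be illustrated as below. The macro names, the placeholder version values, and the function name are all assumptions made for the sketch, not the identifiers the PR actually uses.

```c
/* Version macros as they might appear in headers installed with the
 * ORCA/GPOS sources (names and values are hypothetical). Because the
 * values live in the source tree, the reported version stays correct
 * even in builds that never touch Ivy. */
#define GPORCA_VERSION_STRING "1.0"
#define GPOS_VERSION_STRING   "1.0"

/* String-literal concatenation bakes the version in at compile time,
 * so no file is read at runtime the way the ivy.xml scheme required. */
static const char *gp_opt_version_sketch(void)
{
    return "GPOPT version: " GPORCA_VERSION_STRING
           ", GPOS version: " GPOS_VERSION_STRING;
}
```

This is also why the issue calls the ivy.xml route "not the right way": the Ivy file describes what the build *fetches*, while the headers describe what was actually *compiled in*.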
[jira] [Commented] (HAWQ-927) Send Projection Info Data from HAWQ to PXF
[ https://issues.apache.org/jira/browse/HAWQ-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15386188#comment-15386188 ]

ASF GitHub Bot commented on HAWQ-927:
--------------------------------------

Github user xunzhang commented on the issue:

    https://github.com/apache/incubator-hawq/pull/796

    LGTM. Remember to rebase the commit before checking in.
[jira] [Commented] (HAWQ-927) Send Projection Info Data from HAWQ to PXF
[ https://issues.apache.org/jira/browse/HAWQ-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15386155#comment-15386155 ]

ASF GitHub Bot commented on HAWQ-927:
--------------------------------------

Github user kavinderd commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/796#discussion_r71559534

--- Diff: src/backend/access/external/pxfheaders.c ---
@@ -158,6 +166,29 @@ static void add_tuple_desc_httpheader(CHURL_HEADERS headers, Relation rel)
 	pfree(formatter.data);
 }
 
+static void add_projection_desc_httpheader(CHURL_HEADERS headers, ProjectionInfo *projInfo) {
+	int i;
+	char long_number[sizeof(int32) * 8];
+	int *varNumbers = projInfo->pi_varNumbers;
+	StringInfoData formatter;
+	initStringInfo(&formatter);
+
+	/* Convert the number of projection columns to a string */
+	pg_ltoa(list_length(projInfo->pi_targetlist), long_number);
+	churl_headers_append(headers, "X-GP-ATTRS-PROJ", long_number);
+
+	for(i = 0; i < list_length(projInfo->pi_targetlist); i++) {
--- End diff --

    Yes, it will be in another PR related to [this](https://issues.apache.org/jira/browse/HAWQ-583?jql=project%20%3D%20HAWQ%20AND%20resolution%20%3D%20Unresolved%20AND%20assignee%20%3D%20kavinderd%20ORDER%20BY%20priority%20DESC) Jira.
[jira] [Commented] (HAWQ-927) Send Projection Info Data from HAWQ to PXF
[ https://issues.apache.org/jira/browse/HAWQ-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15386147#comment-15386147 ]

ASF GitHub Bot commented on HAWQ-927:
--------------------------------------

Github user kavinderd commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/796#discussion_r71558813

--- Diff: src/backend/access/external/fileam.c ---
@@ -454,6 +454,21 @@ external_stopscan(FileScanDesc scan)
 	}
 }
 
+/*
+ * external_getnext_init - prepare ExternalSelectDesc struct before external_getnext
+ */
+ExternalSelectDesc
+external_getnext_init(PlanState *state) {
+	ExternalSelectDesc desc = (ExternalSelectDesc) palloc0(sizeof(ExternalSelectDescData));
--- End diff --

    I missed adding `pfree` for `desc`. I added it to the end of `ExternalNext()`.
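The lifecycle question raised in this review — the descriptor is palloc0'd in external_getnext_init() and must be pfree'd once the scan step is done — can be sketched with plain C stand-ins. Here calloc/free stand in for palloc0/pfree, and the struct contents and helper names are illustrative, not HAWQ's real definitions.

```c
#include <stdlib.h>

/* Illustrative stand-in for ExternalSelectDescData; the real struct
 * carries the projection info forwarded to PXF. */
typedef struct ExternalSelectDescData
{
    void *projInfo;
} ExternalSelectDescData, *ExternalSelectDesc;

/* Mirrors external_getnext_init(): palloc0 zero-fills, hence calloc. */
static ExternalSelectDesc external_getnext_init_sketch(void)
{
    return (ExternalSelectDesc) calloc(1, sizeof(ExternalSelectDescData));
}

/* Counterpart of the pfree the reviewer added at the end of
 * ExternalNext(): each descriptor allocated for a fetch must be
 * released there, or memory accumulates for the life of the query. */
static void external_getnext_done_sketch(ExternalSelectDesc desc)
{
    free(desc);
}
```

In PostgreSQL-derived code the leak would eventually be reclaimed with the memory context, but per-tuple allocations that survive until end of query are still worth pairing with an explicit pfree, which is what the review points out.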
[jira] [Commented] (HAWQ-932) HAWQ fails to query external table defined with "localhost" in URL
[ https://issues.apache.org/jira/browse/HAWQ-932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15386111#comment-15386111 ]

ASF GitHub Bot commented on HAWQ-932:
--------------------------------------

Github user kavinderd commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq/pull/803#discussion_r71555148

--- Diff: src/backend/access/external/libchurl.c ---
@@ -312,6 +312,14 @@ CHURL_HANDLE churl_init_upload(const char* url, CHURL_HEADERS headers)
 	context->upload = true;
 	clear_error_buffer(context);
 
+	/* needed to resolve pxf service address */
+	struct curl_slist *resolve_hosts = NULL;
+	char *pxf_host_entry = (char *) palloc0(strlen(pxf_service_address) + strlen(LocalhostIpV4Entry) + 1);
--- End diff --

    Is `pxf_host_entry` pfree'd?

> HAWQ fails to query external table defined with "localhost" in URL
> -------------------------------------------------------------------
>
>                 Key: HAWQ-932
>                 URL: https://issues.apache.org/jira/browse/HAWQ-932
>             Project: Apache HAWQ
>          Issue Type: Bug
>          Components: External Tables, PXF
>            Reporter: Goden Yao
>            Assignee: Oleksandr Diachenko
>             Fix For: 2.0.1.0-incubating
>
> Originally reported by [~jpatel] when he was making a docker image based on a
> HAWQ 2.0.0.0-incubating dev build. Investigated by [~odiachenko].
> There is a workaround (define the table with 127.0.0.1), but there is no
> workaround for querying tables via the HCatalog integration.
> It used to work before.
> {code}
> template1=# CREATE EXTERNAL TABLE ext_table1 (t1 text, t2 text,
> num1 integer, dub1 double precision) LOCATION
> (E'pxf://localhost:51200/hive_small_data?PROFILE=Hive') FORMAT 'CUSTOM'
> (formatter='pxfwritable_import');
> CREATE EXTERNAL TABLE
> template1=# select * from ext_table1;
> ERROR: remote component error (0): (libchurl.c:898)
> {code}
> When I turned on debug mode in curl, I found this error in the logs:
> "* Closing connection 0".
> I found a workaround: set the CURLOPT_RESOLVE option in curl:
> {code}
> struct curl_slist *host = NULL;
> host = curl_slist_append(NULL, "localhost:51200:127.0.0.1");
> set_curl_option(context, CURLOPT_RESOLVE, host);
> {code}
> It seems like an issue with the DNS cache.
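The CURLOPT_RESOLVE workaround above pins "localhost" to 127.0.0.1 with an entry of the form "host:port:address". The string handling behind the `pxf_host_entry` line in the diff can be sketched without libcurl as follows; the function name is hypothetical, and malloc stands in for palloc0.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Builds the "host:port:address" entry that CURLOPT_RESOLVE expects,
 * e.g. "localhost:51200" -> "localhost:51200:127.0.0.1", so curl uses
 * the pinned address instead of the DNS lookup that fails here.
 * The caller owns the returned buffer and must free() it. */
static char *make_resolve_entry(const char *pxf_service_address)
{
    const char *ipv4_suffix = ":127.0.0.1";
    size_t len = strlen(pxf_service_address) + strlen(ipv4_suffix) + 1;
    char *entry = (char *) malloc(len);

    if (entry == NULL)
        return NULL;
    snprintf(entry, len, "%s%s", pxf_service_address, ipv4_suffix);
    return entry;
}
```

In the real patch the entry would then be appended to a `curl_slist` and passed via `curl_easy_setopt(handle, CURLOPT_RESOLVE, list)`; the explicit-release requirement on the buffer is exactly the `pfree` question the reviewer asks in the diff comment.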