[ https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612360#comment-15612360 ]
ASF GitHub Bot commented on HAWQ-1071:
--------------------------------------

Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85371576

--- Diff: pxf/HivePXF.html.md.erb ---
@@ -151,184 +477,120 @@ To enable HCatalog query integration in HAWQ, perform the following steps:
     postgres=# GRANT ALL ON PROTOCOL pxf TO "role";
     ```
 
-3. To query a Hive table with HCatalog integration, simply query HCatalog directly from HAWQ. The query syntax is:
-    ``` sql
-    postgres=# SELECT * FROM hcatalog.hive-db-name.hive-table-name;
-    ```
+To query a Hive table with HCatalog integration, query HCatalog directly from HAWQ. The query syntax is:
+
+``` sql
+postgres=# SELECT * FROM hcatalog.hive-db-name.hive-table-name;
+```
 
-    For example:
+For example:
 
-    ``` sql
-    postgres=# SELECT * FROM hcatalog.default.sales;
-    ```
-
-4. To obtain a description of a Hive table with HCatalog integration, you can use the `psql` client interface.
-
-    Within HAWQ, use either the `\d hcatalog.hive-db-name.hive-table-name` or `\d+ hcatalog.hive-db-name.hive-table-name` commands to describe a single table. For example, from the `psql` client interface:
-
-    ``` shell
-    $ psql -d postgres
-    postgres=# \d hcatalog.default.test
-
-    PXF Hive Table "default.test"
-        Column    |  Type
-    --------------+--------
-     name         | text
-     type         | text
-     supplier_key | int4
-     full_price   | float8
-    ```
-
-    Use `\d hcatalog.hive-db-name.*` to describe the whole database schema. For example:
-
-    ``` shell
-    postgres=# \d hcatalog.default.*
-
-    PXF Hive Table "default.test"
-        Column    |  Type
-    --------------+--------
-     type         | text
-     name         | text
-     supplier_key | int4
-     full_price   | float8
-
-    PXF Hive Table "default.testabc"
-     Column | Type
-    --------+------
-     type   | text
-     name   | text
-    ```
-
-    Use `\d hcatalog.*.*` to describe the whole schema:
-
-    ``` shell
-    postgres=# \d hcatalog.*.*
-
-    PXF Hive Table "default.test"
-        Column    |  Type
-    --------------+--------
-     type         | text
-     name         | text
-     supplier_key | int4
-     full_price   | float8
-
-    PXF Hive Table "default.testabc"
-     Column | Type
-    --------+------
-     type   | text
-     name   | text
-
-    PXF Hive Table "userdb.test"
-     Column   | Type
-    ----------+------
-     address  | text
-     username | text
-
-    ```
-
-    **Note:** When using `\d` or `\d+` commands in the `psql` HAWQ client, `hcatalog` will not be listed as a database. If you use other `psql` compatible clients, `hcatalog` will be listed as a database with a size value of `-1` since `hcatalog` is not a real database in HAWQ.
-
-5. Alternatively, you can use the **pxf\_get\_item\_fields** user-defined function (UDF) to obtain Hive table descriptions from other client interfaces or third-party applications. The UDF takes a PXF profile and a table pattern string as its input parameters.
-
-    **Note:** Currently the only supported input profile is `'Hive'`.
-
-    For example, the following statement returns a description of a specific table. The description includes path, itemname (table), fieldname, and fieldtype.
+``` sql
+postgres=# SELECT * FROM hcatalog.default.sales_info;
+```
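
For comparison, the same Hive table can also be read through an explicitly defined PXF external table; HCatalog integration saves exactly this boilerplate by pulling the table definition from the Hive metastore at query time. A minimal sketch, not part of this diff: the host `namenode`, port `51200`, the table name `salesinfo_hiveprofile`, and the column list are placeholder assumptions.

``` sql
-- Hypothetical PXF external table equivalent to querying
-- hcatalog.default.sales_info; replace namenode:51200 with the host
-- and port of your PXF service.
CREATE EXTERNAL TABLE salesinfo_hiveprofile
       (location text, month text, number_of_orders int, total_sales float8)
LOCATION ('pxf://namenode:51200/default.sales_info?PROFILE=Hive')
FORMAT 'custom' (formatter='pxfwritable_import');

SELECT * FROM salesinfo_hiveprofile;
```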
+
+To obtain a description of a Hive table with HCatalog integration, you can use the `psql` client interface.
+
+- Within HAWQ, use either the `\d hcatalog.hive-db-name.hive-table-name` or `\d+ hcatalog.hive-db-name.hive-table-name` commands to describe a single table. For example, from the `psql` client interface:
+
+  ``` shell
+  $ psql -d postgres
+  ```
   ``` sql
-    postgres=# select * from pxf_get_item_fields('Hive','default.test');
+  postgres=# \d hcatalog.default.sales_info_rcfile;
   ```
-
-    ``` pre
-     path    | itemname |  fieldname   | fieldtype
-    ---------+----------+--------------+-----------
-     default | test     | name         | text
-     default | test     | type         | text
-     default | test     | supplier_key | int4
-     default | test     | full_price   | float8
-    (4 rows)
+
+  ``` shell
+  PXF Hive Table "default.sales_info_rcfile"
+        Column      |  Type
+  ------------------+--------
+   location         | text
+   month            | text
+   number_of_orders | int4
+   total_sales      | float8
   ```
+- Use `\d hcatalog.hive-db-name.*` to describe the whole database schema, i.e., all tables in `hive-db-name`.
+- Use `\d hcatalog.*.*` to describe the whole schema, i.e., all databases and tables.
 
-    The following statement returns table descriptions from the default database.
 
+When using `\d` or `\d+` commands in the `psql` HAWQ client, `hcatalog` will not be listed as a database. If you use other `psql`-compatible clients, `hcatalog` will be listed as a database with a size value of `-1` since `hcatalog` is not a real database in HAWQ.
+
+Alternatively, you can use the `pxf_get_item_fields` user-defined function (UDF) to obtain Hive table descriptions from other client interfaces or third-party applications. The UDF takes a PXF profile and a table pattern string as its input parameters. **Note:** The only supported input profile at this time is `'Hive'`.
+
+- The following statement returns a description of a specific table. The description includes path, itemname (table), fieldname, and fieldtype.
+
   ``` sql
-    postgres=# select * from pxf_get_item_fields('Hive','default.*');
+  postgres=# SELECT * FROM pxf_get_item_fields('Hive','default.sales_info_rcfile');
   ```
-
+
   ``` pre
-     path    | itemname |  fieldname   | fieldtype
-    ---------+----------+--------------+-----------
-     default | test     | name         | text
-     default | test     | type         | text
-     default | test     | supplier_key | int4
-     default | test     | full_price   | float8
-     default | testabc  | name         | text
-     default | testabc  | type         | text
-    (6 rows)
+    path   |     itemname      |    fieldname     | fieldtype
+  ---------+-------------------+------------------+-----------
+   default | sales_info_rcfile | location         | text
+   default | sales_info_rcfile | month            | text
+   default | sales_info_rcfile | number_of_orders | int4
+   default | sales_info_rcfile | total_sales      | float8
   ```
 
-    The following statement returns a description of the entire schema.
+- The following statement returns table descriptions from the default database.
+
   ``` sql
-    postgres=# select * from pxf_get_item_fields('Hive', '*.*');
+  postgres=# SELECT * FROM pxf_get_item_fields('Hive','default.*');
   ```
-
-    ``` pre
-     path    | itemname |  fieldname   | fieldtype
-    ---------+----------+--------------+-----------
-     default | test     | name         | text
-     default | test     | type         | text
-     default | test     | supplier_key | int4
-     default | test     | full_price   | float8
-     default | testabc  | name         | text
-     default | testabc  | type         | text
-     userdb  | test     | username     | text
-     userdb  | test     | address      | text
-    (8 rows)
+
+- The following statement returns a description of the entire schema.
+
+  ``` sql
+  postgres=# SELECT * FROM pxf_get_item_fields('Hive', '*.*');
   ```
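
Because `pxf_get_item_fields` is an ordinary set-returning function, its output can be projected and filtered like any table, which is convenient for third-party tools that need only part of the description. A small sketch using the `sales_info_rcfile` table from the examples above:

``` sql
-- Return just the column names and their mapped HAWQ types; the output
-- columns path, itemname, fieldname, and fieldtype are documented above.
SELECT fieldname, fieldtype
FROM pxf_get_item_fields('Hive', 'default.sales_info_rcfile')
ORDER BY fieldname;
```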
 
 ### <a id="topic_r5k_pst_25"></a>Limitations
 
 HCatalog integration has the following limitations:
 
-- HCatalog integration queries and describe commands do not support complex types; only primitive types are supported. Use PXF external tables to query complex types in Hive instead. (See [Hive Complex Types](#topic_b4v_g3n_25) for example.)
-- Even for primitive types, HCatalog metadata descriptions produced by `\d` and` \d+` are converted to HAWQ types. For example, the Hive type `tinyint` is converted to HAWQ type `int2`. (See [Data Types](../reference/HAWQDataTypes.html) for a list of data types in HAWQ.)
-- HAWQ reserves the database name "hcatalog" for system use. You cannot connect to or alter the system "hcatalog" database.
+- HCatalog integration queries and describe commands do not support complex types; only primitive types are supported. Use PXF external tables to query complex types in Hive. (See [Complex Types Example](#complex_dt_example).)
+- Even for primitive types, HCatalog metadata descriptions produced by `\d` and `\d+` are converted to HAWQ types. For example, the Hive type `tinyint` is converted to the HAWQ type `int2`. (See [Data Type Mapping](#hive_primdatatypes).)
+- HAWQ reserves the database name `hcatalog` for system use. You cannot connect to or alter the system `hcatalog` database.
 
 ## <a id="partitionfiltering"></a>Partition Filtering
 
-The PXF Hive plug-in uses the Hive partitioning feature and directory structure. This enables partition exclusion on HDFS files that contain the Hive table. To use the partition filtering feature to reduce network traffic and I/O, run a PXF query using a WHERE clause that refers to a specific partition in the partitioned Hive table.
+The PXF Hive plug-in supports the Hive partitioning feature and directory structure. This enables partition exclusion on the selected HDFS files that comprise the Hive table. To use the partition filtering feature to reduce network traffic and I/O, run a PXF query using a `WHERE` clause that refers to a specific partition in the partitioned Hive table.
 
-To take advantage of PXF partition filtering push-down, name the partition fields in the external table. These names must be the same as the names stored in the Hive table. Otherwise, PXF ignores Partition filtering and the filtering is performed on the HAWQ side, impacting performance.
+To take advantage of PXF partition filtering push-down, the Hive and PXF partition field names must be the same. Otherwise, PXF ignores partition filtering and the filtering is performed on the HAWQ side, impacting performance.
 
-**Note:** The Hive plug-in only filters on partition columns, not on other table attributes.
+**Note:** The Hive plug-in filters only on partition columns, not on other table attributes.
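
To make the push-down concrete, here is a sketch of an external table over the `sales_part` table created in the next section, followed by a query that filters only on partition columns. The host/port `namenode:51200` and the name `pxf_sales_part` are placeholders; the column list mirrors the `CREATE TABLE` statement below.

``` sql
-- Hypothetical external table mirroring sales_part; the Hive partition
-- columns (delivery_state, delivery_city) are declared like ordinary
-- columns and must keep their Hive names for push-down to apply.
CREATE EXTERNAL TABLE pxf_sales_part
       (name text, type text, supplier_key int, price double precision,
        delivery_state text, delivery_city text)
LOCATION ('pxf://namenode:51200/sales_part?PROFILE=Hive')
FORMAT 'custom' (formatter='pxfwritable_import');

-- A WHERE clause on partition columns lets PXF read only the matching
-- partition directories instead of the whole table.
SELECT * FROM pxf_sales_part
WHERE delivery_state = 'CALIFORNIA' AND delivery_city = 'Fresno';
```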
-### <a id="example2"></a>Example
+### <a id="example2"></a>Create Partitioned Hive Table
 
 Create a Hive table `sales_part` with two partition columns, `delivery_state` and `delivery_city`:
 
 ``` sql
-hive> CREATE TABLE sales_part (name STRING, type STRING, supplier_key INT, price DOUBLE)
-PARTITIONED BY (delivery_state STRING, delivery_city STRING)
-ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
+hive> CREATE TABLE sales_part (name string, type string, supplier_key int, price double)
+      PARTITIONED BY (delivery_state string, delivery_city string)
+      ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
 ```
 
 Load data into this Hive table and add some partitions:
 
 ``` sql
-hive> LOAD DATA LOCAL INPATH '/local/path/data1.txt' INTO TABLE sales_part PARTITION(delivery_state = 'CALIFORNIA', delivery_city = 'San Francisco');
-hive> LOAD DATA LOCAL INPATH '/local/path/data2.txt' INTO TABLE sales_part PARTITION(delivery_state = 'CALIFORNIA', delivery_city = 'Sacramento');
-hive> LOAD DATA LOCAL INPATH '/local/path/data3.txt' INTO TABLE sales_part PARTITION(delivery_state = 'NEVADA' , delivery_city = 'Reno');
-hive> LOAD DATA LOCAL INPATH '/local/path/data4.txt' INTO TABLE sales_part PARTITION(delivery_state = 'NEVADA' , delivery_city = 'Las Vegas');
+hive> INSERT INTO TABLE sales_part PARTITION(delivery_state = 'CALIFORNIA', delivery_city = 'Fresno') VALUES ('block', 'widget', 33, 15.17);

--- End diff --

Might be nice to add line breaks and spacing as with the previous code block, so readers don't have to scroll horizontally.

> add PXF HiveText and HiveRC profile examples to the documentation
> -----------------------------------------------------------------
>
>                 Key: HAWQ-1071
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1071
>             Project: Apache HAWQ
>          Issue Type: Improvement
>          Components: Documentation
>            Reporter: Lisa Owen
>            Assignee: David Yozie
>            Priority: Minor
>             Fix For: 2.0.1.0-incubating
>
> The current PXF Hive documentation includes an example for only the Hive profile. Add examples for the HiveText and HiveRC profiles.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)