[ 
https://issues.apache.org/jira/browse/HAWQ-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858621#comment-15858621
 ] 

ASF GitHub Bot commented on HAWQ-1304:
--------------------------------------

Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/94#discussion_r100185543
  
    --- Diff: markdown/pxf/HivePXF.html.md.erb ---
    @@ -164,9 +164,143 @@ Create a Hive table to expose our sample data set.
     
     In examples later in this section, you will access the `sales_info` Hive 
table directly via PXF. You will also insert `sales_info` data into tables of 
other Hive file format types, and use PXF to access those directly as well.
     
    +
    +## <a id="hcatalog"></a>Using PXF and HCatalog to Query Hive
    +
    +You can query Hive tables directly through HCatalog integration with HAWQ 
and PXF, regardless of the underlying file storage format. This integration 
allows HAWQ to directly use table metadata stored in HCatalog.
    +
    +HCatalog is built on top of the Hive metastore and incorporates Hive's 
DDL. This provides several advantages:
    +
    +-   You do not need to know the table schema of your Hive tables
    +-   You do not need to manually enter information about Hive table 
location or format
    +-   If Hive table metadata changes, HCatalog provides updated metadata. 
This is in contrast to the use of static external PXF tables to define Hive 
table metadata for HAWQ.
    +
    +The following diagram depicts how HAWQ integrates with HCatalog to query 
Hive tables:
    +
    +<img src="../images/hawq_hcatalog.png" id="hcatalog__image_ukw_h2v_c5" 
class="image" width="672" />
    +
    +1.  HAWQ retrieves table metadata from HCatalog using PXF.
    +2.  HAWQ creates in-memory catalog tables from the retrieved metadata. If 
a table is referenced multiple times in a transaction, HAWQ uses its in-memory 
metadata to reduce external calls to HCatalog.
    +3.  PXF queries Hive using table metadata that is stored in the HAWQ 
in-memory catalog tables. Table metadata is dropped at the end of the 
transaction.
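    +
    +For example, a minimal sketch using the `sales_info` table created earlier and the query syntax described in [Usage](#topic_j1l_y55_c5) below. Both queries run in a single transaction, so HAWQ retrieves the `sales_info` metadata from HCatalog once and serves the second query from its in-memory catalog:
    +
    +``` sql
    +postgres=# BEGIN;
    +postgres=# SELECT count(*) FROM hcatalog.default.sales_info;
    +postgres=# SELECT max(total_sales) FROM hcatalog.default.sales_info;
    +postgres=# COMMIT;
    +```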
    +
    +
    +### <a id="topic_j1l_enabling"></a>Enabling HCatalog Integration
    +
    +To enable HCatalog query integration in HAWQ, perform the following steps:
    +
    +1.  Make sure your deployment meets the requirements listed in 
[Prerequisites](#installingthepxfhiveplugin).
    +2.  If necessary, set the `pxf_service_address` global configuration 
property to the hostname or IP address and port where you have installed the 
PXF Hive plug-in. By default, the value is set to `localhost:51200`.
    +
    +    ``` sql
    +    postgres=# SET pxf_service_address TO '<hivenode>:51200';
    +    ```
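    +
    +    If you are unsure of the current value, you can display it first:
    +
    +    ``` sql
    +    postgres=# SHOW pxf_service_address;
    +    ```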
    +
    +3.  HCatalog integration internally uses the `pxf` protocol to query Hive. Grant privileges on this protocol to all roles that require access:
    +
    +    ``` sql
    +    postgres=# GRANT ALL ON PROTOCOL pxf TO <role>;
    +    ```
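    +
    +    For example, to grant access to a hypothetical role named `analyst`:
    +
    +    ``` sql
    +    postgres=# GRANT ALL ON PROTOCOL pxf TO analyst;  -- 'analyst' is a placeholder role
    +    ```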
    +
    +4.  Creating HAWQ tables with the `WITH (OIDS)` clause is not recommended. If any user tables were created with this clause, additional steps are required to enable HCatalog integration: to access a Hive table via HCatalog, a HAWQ user must have `SELECT` permission on every user table in the same schema that was created with the `WITH (OIDS)` clause.
    +
    +    1. Determine which user tables were created using the `WITH (OIDS)` 
clause:
    +
    +        ``` sql
    +        postgres=# SELECT oid, relname FROM pg_class
    +                     WHERE relhasoids = true
    +                       AND relnamespace <> (SELECT oid FROM pg_namespace 
WHERE nspname = 'pg_catalog');
    +        ```
    +
    +    2. Grant `SELECT` privilege on each returned table to every role that requires HCatalog query access (a bulk-grant sketch follows this list). For example:
    +
    +        ``` sql
    +        postgres=# GRANT SELECT ON <table-created-WITH-OIDS> TO <role>;
    +        ```
    +
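    +To apply those grants in bulk, one possible approach (a sketch; `analyst` is again a hypothetical role) is to have the catalog query from the previous step emit the `GRANT` statements for you:
    +
    +``` sql
    +postgres=# SELECT 'GRANT SELECT ON ' || quote_ident(relname) || ' TO analyst;'  -- 'analyst' is a placeholder role
    +             FROM pg_class
    +            WHERE relhasoids = true
    +              AND relnamespace <> (SELECT oid FROM pg_namespace WHERE nspname = 'pg_catalog');
    +```
    +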
    +### <a id="topic_j1l_y55_c5"></a>Usage
    +
    +To query a Hive table with HCatalog integration, query HCatalog directly 
from HAWQ. The query syntax is:
    +
    +``` sql
    +postgres=# SELECT * FROM hcatalog.hive-db-name.hive-table-name;
    +```
    +
    +For example:
    +
    +``` sql
    +postgres=# SELECT * FROM hcatalog.default.sales_info;
    +```
    +
    +To obtain a description of a Hive table with HCatalog integration, you can 
use the `psql` client interface.
    +
    +-   Within HAWQ, use either the `\d hcatalog.hive-db-name.hive-table-name` or `\d+ hcatalog.hive-db-name.hive-table-name` command to describe a single table. `\d` displays only HAWQ's interpretation of the underlying source (Hive in this case) data types, while `\d+` displays both the HAWQ interpreted and Hive source data types. For example, from the `psql` client interface:
    +
    +    ``` shell
    +    $ psql -d postgres
    +    ```
    +
    +    ``` sql
    +    postgres=# \d+ hcatalog.default.sales_info
    +    ```
    +
    +    ``` shell
    +       PXF Hive Table "default.sales_info"
    +          Column      |  Type  | Source type 
    +    ------------------+--------+-------------
    +     location         | text   | string
    +     month            | text   | string
    +     number_of_orders | int4   | int
    +     total_sales      | float8 | double
    +    ```
    +-   Use `\d hcatalog.hive-db-name.*` to describe all tables in the `hive-db-name` database.
    +-   Use `\d hcatalog.*.*` to describe all tables in all Hive databases.
    +
    +When you use the `\d` or `\d+` command in the HAWQ `psql` client, `hcatalog` is not listed as a database. Other `psql`-compatible clients may list `hcatalog` as a database with a size value of `-1`, because `hcatalog` is not a real database in HAWQ.
    +
    +Alternatively, you can use the `pxf_get_item_fields` user-defined function (UDF) to obtain Hive table descriptions from other client interfaces or third-party applications. The UDF takes a PXF profile and a table pattern string as its input parameters. **Note:** The only supported input profile at this time is `'Hive'`.
    +
    +- The following statement returns a description of a specific table. The description includes `path`, `itemname` (table), `fieldname`, `fieldtype` (the HAWQ type), and `sourcefieldtype` (the Hive type).
    +
    +    ``` sql
    +    postgres=# SELECT * FROM 
pxf_get_item_fields('Hive','default.sales_info');
    +    ```
    +
    +    ``` pre
    +      path   |  itemname  |    fieldname     | fieldtype | sourcefieldtype 
    +    ---------+------------+------------------+-----------+-----------------
    +     default | sales_info | location         | text      | string
    +     default | sales_info | month            | text      | string
    +     default | sales_info | number_of_orders | int4      | int
    +     default | sales_info | total_sales      | float8    | double
    +    ```
    +
    +- The following statement returns descriptions of all tables in the `default` database.
    +
    +    ``` sql
    +    postgres=# SELECT * FROM pxf_get_item_fields('Hive','default.*');
    +    ```
    +
    +- The following statement returns descriptions of all tables in all Hive databases.
    +
    +    ``` sql
    +    postgres=# SELECT * FROM pxf_get_item_fields('Hive', '*.*');
    +    ```
    +
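    +Because `pxf_get_item_fields` returns ordinary rows, its output can be filtered and projected like any table. For example, to list only the field names and HAWQ types for `sales_info`:
    +
    +``` sql
    +postgres=# SELECT fieldname, fieldtype
    +             FROM pxf_get_item_fields('Hive', 'default.sales_info');
    +```
    +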
    +### <a id="topic_r5k_pst_25"></a>Limitations
    +
    +HCatalog integration has the following limitations:
    +
    +-   HCatalog queries on Hive tables with complex type fields return those fields serialized as text; see the example following this list.
    +-   Even for primitive types, HCatalog metadata descriptions produced by 
`\d` are HAWQ's interpretation of the underlying Hive data types. For example, 
the Hive type `tinyint` is converted to HAWQ type `int2`. (See [Data Type 
Mapping](#hive_primdatatypes).)
    +-   HAWQ reserves the database name `hcatalog` for system use. You cannot 
connect to or alter the system `hcatalog` database.
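    +
    +As an illustration of the first limitation, assume a hypothetical Hive table `contacts` with a column defined as `phone_numbers array<string>` (this table is not part of the sample data set). The following query returns the array serialized as a single text value rather than as a native HAWQ array:
    +
    +``` sql
    +postgres=# SELECT phone_numbers FROM hcatalog.default.contacts;  -- 'contacts' is a hypothetical table
    +```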
    +
    +
    +
     ## <a id="topic_p2s_lvl_28"></a>Querying External Hive Data
     
    -The PXF Hive plug-in supports several Hive-related profiles. These include 
`Hive`, `HiveText`, and `HiveRC`.
    +In the previous section, you used HCatalog integration to query a Hive 
table. You can also create a PXF/HAWQ external table to access Hive table data. 
This Hive table access mechanism requires that you identify an appropriate Hive 
profile.
    +
    +The PXF Hive plug-in supports several Hive-related profiles. These include 
`Hive`, `HiveText`, and `HiveRC`. `HiveText` and `HiveRC` profiles are 
specifically optimized for text and RC file formats, respectively. The `Hive` 
profile is optimized for all file storage types; use the `Hive` profile when 
the underlying Hive table is composed of multiple partitions with differing 
file formats.
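    +
    +As a preview of the general shape (a sketch only; the table name and `<pxf-host>` are placeholders, and the examples that follow cover the exact syntax used with each profile), an external table over `sales_info` using the `Hive` profile might look like:
    +
    +``` sql
    +-- illustrative only: table name and <pxf-host> are placeholders
    +CREATE EXTERNAL TABLE salesinfo_hiveprofile
    +    (location text, month text, number_of_orders int, total_sales float8)
    +LOCATION ('pxf://<pxf-host>:51200/default.sales_info?PROFILE=Hive')
    +FORMAT 'CUSTOM' (formatter='pxfwritable_import');
    +```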
    --- End diff --
    
    Third sentence here should start with "The"


> documentation changes for HAWQ-1228
> -----------------------------------
>
>                 Key: HAWQ-1304
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1304
>             Project: Apache HAWQ
>          Issue Type: New Feature
>          Components: Documentation
>    Affects Versions: 2.1.0.0-incubating
>            Reporter: Lisa Owen
>            Assignee: David Yozie
>            Priority: Minor
>
> - new pxf-profiles.xml outputFormat parameter
> - hive table access via external table and hcatalog now uses optimal profile 
> for each fragment
> - others


