[ 
https://issues.apache.org/jira/browse/HAWQ-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612359#comment-15612359
 ] 

ASF GitHub Bot commented on HAWQ-1071:
--------------------------------------

Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85365959
  
    --- Diff: pxf/HivePXF.html.md.erb ---
    @@ -2,121 +2,450 @@
     title: Accessing Hive Data
     ---
     
    -This topic describes how to access Hive data using PXF. You have several 
options for querying data stored in Hive. You can create external tables in PXF 
and then query those tables, or you can easily query Hive tables by using HAWQ 
and PXF's integration with HCatalog. HAWQ accesses Hive table metadata stored 
in HCatalog.
    +Apache Hive is a distributed data warehousing infrastructure. Hive facilitates the management of large data sets and supports multiple data formats, including comma-separated value (.csv), RC, ORC, and Parquet. The PXF Hive plug-in reads data stored in Hive, as well as in HDFS and HBase.
    +
    +This section describes how to use PXF to access Hive data. Options for 
querying data stored in Hive include:
    +
    +-  Creating an external table in PXF and querying that table
    +-  Querying Hive tables via PXF's integration with HCatalog
     
     ## <a id="installingthepxfhiveplugin"></a>Prerequisites
     
    -Check the following before using PXF to access Hive:
    +Before accessing Hive data with HAWQ and PXF, ensure that:
     
    --   The PXF HDFS plug-in is installed on all cluster nodes.
    +-   The PXF HDFS plug-in is installed on all cluster nodes. See 
[Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation 
information.
     -   The PXF Hive plug-in is installed on all cluster nodes.
     -   The Hive JAR files and conf directory are installed on all cluster 
nodes.
    --   Test PXF on HDFS before connecting to Hive or HBase.
    +-   You have tested PXF on HDFS.
     -   You are running the Hive Metastore service on a machine in your 
cluster. 
     -   You have set the `hive.metastore.uris` property in the `hive-site.xml` 
on the NameNode.
     
    +## <a id="topic_p2s_lvl_25"></a>Hive File Formats
    +
    +Hive supports several file formats:
    +
    +-   TextFile - flat file with data in comma-, tab-, or space-separated 
value format or JSON notation
    +-   SequenceFile - flat file consisting of binary key/value pairs
    +-   RCFile - record columnar data consisting of binary key/value pairs; 
high row compression rate
    +-   ORCFile - optimized row columnar data with stripe, footer, and 
postscript sections; reduces data size
    +-   Parquet - compressed columnar data representation
    +-   Avro - JSON-defined, schema-based data serialization format
    +
    +Refer to [File 
Formats](https://cwiki.apache.org/confluence/display/Hive/FileFormats) for 
detailed information about the file formats supported by Hive.
    +
    +The PXF Hive plug-in supports the following profiles for accessing these Hive file formats:
    +
    +- `Hive`
    +- `HiveText`
    +- `HiveRC`
    +
    +## <a id="topic_p2s_lvl_29"></a>Data Type Mapping
    +
    +### <a id="hive_primdatatypes"></a>Primitive Data Types
    +
    +To represent Hive data in HAWQ, map data values that use a primitive data 
type to HAWQ columns of the same type.
    +
    +The following table summarizes external mapping rules for Hive primitive 
types.
    +
    +| Hive Data Type  | HAWQ Data Type |
    +|-------|---------------------------|
    +| boolean    | bool |
    +| int   | int4 |
    +| smallint   | int2 |
    +| tinyint   | int2 |
    +| bigint   | int8 |
    +| decimal  |  numeric  |
    +| float   | float4 |
    +| double   | float8 |
    +| string   | text |
    +| binary   | bytea |
    +| char   | bpchar |
    +| varchar   | varchar |
    +| timestamp   | timestamp |
    +| date   | date |
    +
    +
    +### <a id="topic_b4v_g3n_25"></a>Complex Data Types
    +
    +Hive supports complex data types including array, struct, map, and union. 
PXF maps each of these complex types to `text`.  While HAWQ does not natively 
support these types, you can create HAWQ functions or application code to 
extract subcomponents of these complex data types.
    +
    +An example using complex data types is provided later in this topic.
    +
    +
    +## <a id="hive_sampledataset"></a>Sample Data Set
    +
    +Examples used in this topic will operate on a common data set. This simple 
data set models a retail sales operation and includes fields with the following 
names and data types:
    +
    +- location - text
    +- month - text
    +- number\_of\_orders - integer
    +- total\_sales - double
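    +
    +As an illustrative sketch only, these fields could back a HAWQ external table that uses the `Hive` profile. (The PXF host name `namenode`, default port `51200`, and Hive table name `sales_info` below are assumptions for illustration, not part of the data set.)
    +
    +``` sql
    +CREATE EXTERNAL TABLE salesinfo_hiveprofile
    +       (location text, month text, number_of_orders int, total_sales float8)
    +LOCATION ('pxf://namenode:51200/default.sales_info?PROFILE=Hive')
    +FORMAT 'custom' (formatter='pxfwritable_import');
    +```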
    --- End diff --
    
    Also consider term/definition table here.


> add PXF HiveText and HiveRC profile examples to the documentation
> -----------------------------------------------------------------
>
>                 Key: HAWQ-1071
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1071
>             Project: Apache HAWQ
>          Issue Type: Improvement
>          Components: Documentation
>            Reporter: Lisa Owen
>            Assignee: David Yozie
>            Priority: Minor
>             Fix For: 2.0.1.0-incubating
>
>
> the current PXF Hive documentation includes an example for only the Hive 
> profile.  add examples for HiveText and HiveRC profiles.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
