[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606507#comment-15606507 ]
ASF GitHub Bot commented on HAWQ-1107:
--------------------------------------

Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r84997781

    --- Diff: pxf/HDFSFileDataPXF.html.md.erb ---
    @@ -2,388 +2,282 @@ title: Accessing HDFS File Data
     ---

    -## <a id="installingthepxfhdfsplugin"></a>Prerequisites
    +HDFS is the primary distributed storage mechanism used by Apache Hadoop applications. The PXF HDFS plug-in reads file data stored in HDFS. The plug-in supports plain delimited and comma-separated-value format text files. The HDFS plug-in also supports the Avro binary format.

    -Before working with HDFS file data using HAWQ and PXF, you should perform the following operations:
    +This section describes how to use PXF to access HDFS data, including how to create and query an external table from files in the HDFS data store.

    -- Test PXF on HDFS before connecting to Hive or HBase.
    -- Ensure that all HDFS users have read permissions to HDFS services and that write permissions have been limited to specific users.
    +## <a id="hdfsplugin_prereq"></a>Prerequisites

    -## <a id="syntax1"></a>Syntax
    +Before working with HDFS file data using HAWQ and PXF, ensure that:

    -The syntax for creating an external HDFS file is as follows:
    +- The HDFS plug-in is installed on all cluster nodes.
    +- All HDFS users have read permissions to HDFS services and that write permissions have been restricted to specific users.

    -``` sql
    -CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name
    -    ( column_name data_type [, ...]
    -    | LIKE other_table )
    -LOCATION ('pxf://host[:port]/path-to-data?<pxf parameters>[&custom-option=value...]')
    -  FORMAT '[TEXT | CSV | CUSTOM]' (<formatting_properties>);
    -```
    +## <a id="hdfsplugin_fileformats"></a>HDFS File Formats

    -where `<pxf parameters>` is:
    +The PXF HDFS plug-in supports reading the following file formats:

    -``` pre
    -    FRAGMENTER=fragmenter_class&ACCESSOR=accessor_class&RESOLVER=resolver_class]
    -  | PROFILE=profile-name
    -```
    +- Text File - comma-separated value (.csv) or delimited format plain text file
    +- Avro - JSON-defined, schema-based data serialization format

    -**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables.
    +The PXF HDFS plug-in includes the following profiles to support the file formats listed above:

    -Use an SQL `SELECT` statement to read from an HDFS READABLE table:
    +- `HdfsTextSimple` - text files
    +- `HdfsTextMulti` - text files with embedded line feeds
    +- `Avro` - Avro files

    -``` sql
    -SELECT ... FROM table_name;
    -```
    -Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table:
    +## <a id="hdfsplugin_cmdline"></a>HDFS Shell Commands
    +Hadoop includes command-line tools that interact directly with HDFS. These tools support typical file system operations including copying and listing files, changing file permissions, etc.

    -``` sql
    -INSERT INTO table_name ...;
    -```
    +The HDFS file system command is `hdfs dfs <options> [<file>]`. Invoked with no options, `hdfs dfs` lists the file system options supported by the tool.
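For context on the `HdfsTextSimple` profile that the restructured text introduces above, an external table definition and query might look like the following sketch. This is illustrative only: the table name, column list, NameNode host, port, and HDFS path are all hypothetical placeholders, not values from the pull request.

``` sql
-- Readable external table over a hypothetical delimited text file in HDFS,
-- using the HdfsTextSimple profile named in the restructured docs.
-- Host, port, and path below are placeholders for an actual PXF deployment.
CREATE EXTERNAL TABLE pxf_hdfs_textsimple
    (location text, month text, num_orders int, total_sales float8)
LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
FORMAT 'TEXT' (delimiter=E',');

-- Query the external table like any other HAWQ table:
SELECT * FROM pxf_hdfs_textsimple;
```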
    --- End diff --

    command -> command syntax

> PXF HDFS documentation - restructure content and include more examples
> ----------------------------------------------------------------------
>
>                 Key: HAWQ-1107
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1107
>             Project: Apache HAWQ
>          Issue Type: Improvement
>          Components: Documentation
>            Reporter: Lisa Owen
>            Assignee: David Yozie
>            Priority: Minor
>             Fix For: 2.0.1.0-incubating
>
> The current PXF HDFS documentation does not include any runnable examples.
> Add runnable examples for all profiles (HdfsTextSimple, HdfsTextMulti, SerialWritable, Avro). Restructure the content as well.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)