[ https://issues.apache.org/jira/browse/HAWQ-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15606507#comment-15606507 ]
ASF GitHub Bot commented on HAWQ-1107:
--------------------------------------

Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r84997781

    --- Diff: pxf/HDFSFileDataPXF.html.md.erb ---
    @@ -2,388 +2,282 @@ title: Accessing HDFS File Data
     ---

    -## <a id="installingthepxfhdfsplugin"></a>Prerequisites
    +HDFS is the primary distributed storage mechanism used by Apache Hadoop applications. The PXF HDFS plug-in reads file data stored in HDFS. The plug-in supports plain delimited and comma-separated-value format text files. The HDFS plug-in also supports the Avro binary format.

    -Before working with HDFS file data using HAWQ and PXF, you should perform the following operations:
    +This section describes how to use PXF to access HDFS data, including how to create and query an external table from files in the HDFS data store.

    -- Test PXF on HDFS before connecting to Hive or HBase.
    -- Ensure that all HDFS users have read permissions to HDFS services and that write permissions have been limited to specific users.
    +## <a id="hdfsplugin_prereq"></a>Prerequisites

    -## <a id="syntax1"></a>Syntax
    +Before working with HDFS file data using HAWQ and PXF, ensure that:

    -The syntax for creating an external HDFS file is as follows:
    +- The HDFS plug-in is installed on all cluster nodes.
    +- All HDFS users have read permissions to HDFS services and that write permissions have been restricted to specific users.

    -``` sql
    -CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name
    -    ( column_name data_type [, ...]
    -    | LIKE other_table )
    -LOCATION ('pxf://host[:port]/path-to-data?<pxf parameters>[&custom-option=value...]')
    -  FORMAT '[TEXT | CSV | CUSTOM]' (<formatting_properties>);
    -```
    +## <a id="hdfsplugin_fileformats"></a>HDFS File Formats

    -where `<pxf parameters>` is:
    +The PXF HDFS plug-in supports reading the following file formats:

    -``` pre
    -    FRAGMENTER=fragmenter_class&ACCESSOR=accessor_class&RESOLVER=resolver_class]
    -  | PROFILE=profile-name
    -```
    +- Text File - comma-separated value (.csv) or delimited format plain text file
    +- Avro - JSON-defined, schema-based data serialization format

    -**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables.
    +The PXF HDFS plug-in includes the following profiles to support the file formats listed above:

    -Use an SQL `SELECT` statement to read from an HDFS READABLE table:
    +- `HdfsTextSimple` - text files
    +- `HdfsTextMulti` - text files with embedded line feeds
    +- `Avro` - Avro files

    -``` sql
    -SELECT ... FROM table_name;
    -```
    -Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table:
    +## <a id="hdfsplugin_cmdline"></a>HDFS Shell Commands
    +Hadoop includes command-line tools that interact directly with HDFS. These tools support typical file system operations including copying and listing files, changing file permissions, etc.

    -``` sql
    -INSERT INTO table_name ...;
    -```
    +The HDFS file system command is `hdfs dfs <options> [<file>]`. Invoked with no options, `hdfs dfs` lists the file system options supported by the tool.
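For context on the `HdfsTextSimple` profile that the restructured text introduces above, an external table definition and query might look like the following sketch. This is illustrative only: the table name, column list, NameNode host, port, and HDFS path are all hypothetical placeholders, not values from the pull request.

``` sql
-- Readable external table over a hypothetical delimited text file in HDFS,
-- using the HdfsTextSimple profile named in the restructured docs.
-- Host, port, and path below are placeholders for an actual PXF deployment.
CREATE EXTERNAL TABLE pxf_hdfs_textsimple
    (location text, month text, num_orders int, total_sales float8)
LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
FORMAT 'TEXT' (delimiter=E',');

-- Query the external table like any other HAWQ table:
SELECT * FROM pxf_hdfs_textsimple;
```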
    --- End diff --

    command -> command syntax

> PXF HDFS documentation - restructure content and include more examples
> ----------------------------------------------------------------------
>
>                 Key: HAWQ-1107
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1107
>             Project: Apache HAWQ
>          Issue Type: Improvement
>          Components: Documentation
>            Reporter: Lisa Owen
>            Assignee: David Yozie
>            Priority: Minor
>             Fix For: 2.0.1.0-incubating
>
> The current PXF HDFS documentation does not include any runnable examples.
> Add runnable examples for all profiles (HdfsTextSimple, HdfsTextMulti, SerialWritable, Avro). Restructure the content as well.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)