[ https://issues.apache.org/jira/browse/HAWQ-1119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15623612#comment-15623612 ]
ASF GitHub Bot commented on HAWQ-1119:
--------------------------------------

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r85814125

--- Diff: pxf/HDFSWritablePXF.html.md.erb ---
@@ -0,0 +1,410 @@

---
title: Writing Data to HDFS
---

The PXF HDFS plug-in supports writable external tables using the `HdfsTextSimple` and `SequenceWritable` profiles. You might create a writable table to export data from a HAWQ internal table to HDFS.

This section describes how to use these PXF profiles to create writable external tables.

**Note**: You cannot directly query data in a HAWQ writable external table. After creating the writable external table, you must create a HAWQ readable external table that accesses the HDFS file, then query that table. You can also create a Hive table to access the HDFS file.

## <a id="pxfwrite_prereq"></a>Prerequisites

Before working with HDFS file data using HAWQ and PXF, ensure that:

- The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
- All HDFS users have read permissions to HDFS services, and write permissions are restricted to specific users.

## <a id="hdfsplugin_writeextdata"></a>Writing to PXF External Tables

The PXF HDFS plug-in supports two writable profiles: `HdfsTextSimple` and `SequenceWritable`.

Use the following syntax to create a HAWQ writable external table representing HDFS data:

``` sql
CREATE WRITABLE EXTERNAL TABLE <table_name>
    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
    ?PROFILE=HdfsTextSimple|SequenceWritable[&<custom-option>=<value>[...]]')
FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
```

HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the table below.

| Keyword | Value |
|-------|-------------------------------------|
| \<host\>[:\<port\>] | The HDFS NameNode and port. |
| \<path-to-hdfs-file\> | The path to the file in the HDFS data store. |
| PROFILE | The `PROFILE` keyword must specify one of the values `HdfsTextSimple` or `SequenceWritable`. |
| \<custom-option\> | \<custom-option\> is profile-specific. These options are discussed in the next topic. |
| FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile when \<path-to-hdfs-file\> references a plain text delimited file. The `HdfsTextSimple` '`TEXT`' `FORMAT` supports only the built-in `(delimiter=<delim>)` \<formatting-property\>. |
| FORMAT 'CSV' | Use '`CSV`' `FORMAT` with `HdfsTextSimple` when \<path-to-hdfs-file\> references a comma-separated value file. |
| FORMAT 'CUSTOM' | Use the `'CUSTOM'` `FORMAT` with the `SequenceWritable` profile. The `SequenceWritable` '`CUSTOM`' `FORMAT` supports only the built-in `(formatter='pxfwritable_export')` (write) and `(formatter='pxfwritable_import')` (read) \<formatting-properties\>. |

**Note**: When creating PXF external tables, you cannot use the `HEADER` option in your `FORMAT` specification.
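As a minimal sketch of this syntax, the statements below export rows to HDFS with the `HdfsTextSimple` profile. The NameNode host (`namenodehost`), PXF port (`51200`), HDFS path, and the table and column names are hypothetical placeholders; substitute values appropriate for your cluster.

``` sql
-- Hypothetical writable external table; host, port, path, and
-- column definitions are placeholders for your environment.
CREATE WRITABLE EXTERNAL TABLE sales_export
    (location TEXT, month TEXT, num_orders INT, total_sales FLOAT8)
LOCATION ('pxf://namenodehost:51200/data/pxf_examples/sales_export?PROFILE=HdfsTextSimple')
FORMAT 'TEXT' (delimiter=',');

-- Write data by inserting into the writable external table, either
-- literal rows or rows selected from an internal HAWQ table
-- (sales_monthly is also a placeholder):
INSERT INTO sales_export VALUES ('Frankfurt', 'Mar', 777, 3956.98);
INSERT INTO sales_export SELECT * FROM sales_monthly;
```

Because a writable external table cannot be queried directly, a companion readable external table over the same (hypothetical) HDFS path retrieves the exported rows, as the Note above describes:

``` sql
-- Readable external table over the same HDFS path:
CREATE EXTERNAL TABLE sales_export_read
    (location TEXT, month TEXT, num_orders INT, total_sales FLOAT8)
LOCATION ('pxf://namenodehost:51200/data/pxf_examples/sales_export?PROFILE=HdfsTextSimple')
FORMAT 'TEXT' (delimiter=',');

SELECT * FROM sales_export_read;
```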
## <a id="custom_options"></a>Custom Options

The `HdfsTextSimple` and `SequenceWritable` profiles support the following \<custom-options\>:

| Keyword | Value Description |
|-------|-------------------------------------|
| COMPRESSION_CODEC | The compression codec Java class name. If this option is not provided, no data compression is performed. Supported compression codecs include `org.apache.hadoop.io.compress.DefaultCodec`, `org.apache.hadoop.io.compress.BZip2Codec`, and `org.apache.hadoop.io.compress.GzipCodec` (`HdfsTextSimple` profile only). |
| COMPRESSION_TYPE | The compression type to employ; supported values are `RECORD` (the default) or `BLOCK`. |
| DATA-SCHEMA | (`SequenceWritable` profile only) The name of the writer serialization/deserialization class. The jar file in which this class resides must be in the PXF class path. This option has no default value. |
| THREAD-SAFE | Boolean value that determines whether a table query can run in multi-threaded mode. The default value is `TRUE`; requests run in multi-threaded mode. When set to `FALSE`, requests are handled in a single thread. Set `THREAD-SAFE` appropriately when performing operations that are not thread-safe (for example, compression). |

## <a id="profile_hdfstextsimple"></a>HdfsTextSimple Profile

Use the `HdfsTextSimple` profile when writing delimited data to a plain text file where each row is a single record.

Writable tables created using the `HdfsTextSimple` profile can use no, record, or block compression. When compression is used, the default, gzip, and bzip2 Hadoop compression codecs are supported:

- `org.apache.hadoop.io.compress.DefaultCodec`
- `org.apache.hadoop.io.compress.GzipCodec`
- `org.apache.hadoop.io.compress.BZip2Codec`

\<formatting-properties\> supported by the `HdfsTextSimple` profile include:

| Keyword | Value |
|-------|-------------------------------------|
| delimiter | The delimiter character to use when writing the file. The default value is a comma (`,`). |

### <a id="profile_hdfstextsimple_writing"></a>Example: Writing Using the HdfsTextSimple Profile

--- End diff --

Writing *Data* Using the HdfsTextSimple Profile

> create new documentation topic for PXF writable profiles
> --------------------------------------------------------
>
>                 Key: HAWQ-1119
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1119
>             Project: Apache HAWQ
>          Issue Type: Improvement
>          Components: Documentation
>            Reporter: Lisa Owen
>            Assignee: David Yozie
>             Fix For: 2.0.1.0-incubating
>
> certain profiles supported by the existing PXF plug-ins support writable tables. create some documentation content for these profiles.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)