GitHub user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/94#discussion_r99384731
  
    --- Diff: markdown/pxf/PXFExternalTableandAPIReference.html.md.erb ---
    @@ -232,23 +250,23 @@ public class InputData {
     
     ### <a id="fragmenter"></a>Fragmenter
     
    -**Note:** The Fragmenter Plugin reads data into HAWQ readable external 
tables. The Fragmenter Plugin cannot write data out of HAWQ into writable 
external tables.
    +**Note:** The Fragmenter class reads data into HAWQ readable external 
tables. The Fragmenter class cannot write data out of HAWQ into writable 
external tables.
     
    -The Fragmenter is responsible for passing datasource metadata back to 
HAWQ. It also returns a list of data fragments to the Accessor or Resolver. 
Each data fragment describes some part of the requested data set. It contains 
the datasource name, such as the file or table name, including the hostname 
where it is located. For example, if the source is a HDFS file, the Fragmenter 
returns a list of data fragments containing a HDFS file block. Each fragment 
includes the location of the block. If the source data is an HBase table, the 
Fragmenter returns information about table regions, including their locations.
    +The Fragmenter is responsible for passing datasource metadata back to 
HAWQ. It also returns a list of data fragments to the Accessor or Resolver. 
Each data fragment describes some part of the requested data set. It contains 
the datasource name, such as the file or table name, including the hostname 
where it is located. For example, if the source is an HDFS file, the Fragmenter 
returns a list of data fragments containing an HDFS file block. Each fragment 
includes the location of the block. If the source data is an HBase table, the 
Fragmenter returns information about table regions, including their locations.
     
     The `ANALYZE` command now retrieves advanced statistics for PXF readable 
tables by estimating the number of tuples in a table, creating a sample table 
from the external table, and running advanced statistics queries on the sample 
table in the same way statistics are collected for native HAWQ tables.
     
     The configuration parameter `pxf_enable_stat_collection` controls 
collection of advanced statistics. If `pxf_enable_stat_collection` is set to 
false, no analysis is performed on PXF tables. An additional parameter, 
`pxf_stat_max_fragments`, controls the number of fragments sampled to build a 
sample table. By default `pxf_stat_max_fragments` is set to 100, which means 
that even if there are more than 100 fragments, only this number of fragments 
will be used in `ANALYZE` to sample the data. Increasing this number will 
result in better sampling, but can also impact performance.
     
    -When a PXF table is analyzed and `pxf_enable_stat_collection` is set to 
off, or an error occurs because the table is not defined correctly, the PXF 
service is down, or `getFragmentsStats` is not implemented, a warning message 
is shown and no statistics are gathered for that table. If `ANALYZE` is running 
over all tables in the database, the next table will be processed – a failure 
processing one table does not stop the command.
    +When a PXF table is analyzed and `pxf_enable_stat_collection` is set to 
off, or an error occurs because the table is not defined correctly, the PXF 
service is down, or `getFragmentsStats()` is not implemented, a warning message 
is shown and no statistics are gathered for that table. If `ANALYZE` is running 
over all tables in the database, the next table will be processed – a failure 
processing one table does not stop the command.
     
     For a detailed explanation about HAWQ statistical data gathering, see 
`ANALYZE` in the SQL Commands Reference.
     
     **Note:**
     
     -   Depending on external table size, the time required to complete an 
ANALYZE operation can be lengthy. The boolean parameter 
`pxf_enable_stat_collection` enables statistics collection for PXF. The default 
value is `on`. Turning this parameter off (disabling PXF statistics collection) 
can help decrease the time needed for the ANALYZE operation.
    --   You can also use *pxf\_stat\_max\_fragments* to limit the number of 
fragments to be sampled by decreasing it from the default (100). However, if 
the number is too low, the sample might not be uniform and the statistics might 
be skewed.
    --   You can also implement getFragmentsStats to return an error. This will 
cause ANALYZE on a table with this Fragmenter to fail immediately, and default 
statistics values will be used for that table.
    +-   You can also use `pxf_stat_max_fragments` to limit the number of 
fragments to be sampled by decreasing it from the default (100). However, if 
the number is too low, the sample might not be uniform and the statistics might 
be skewed.
    +-   You can also implement `getFragmentsStats()` to return an error. This 
will cause ANALYZE on a table with this Fragmenter to fail immediately, and 
default statistics values will be used for that table.
    --- End diff --
    
    Change ANALYZE to `ANALYZE`. Maybe `Fragmenter` too.

