[
https://issues.apache.org/jira/browse/TAJO-736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960061#comment-13960061
]
David Chen commented on TAJO-736:
---------------------------------
Not a problem! The documentation looks great so far. I have some minor comments:
{quote}
For more details, please refer Parquet File Format.
{quote}
Should be "please refer to the Parquet File Format."
{quote}
If you are not familiar with CREATE TABLE statement, please refer Data
Definition Language Data Definition Language.
{quote}
Similar to above, add a "to" after "refer." Also, "Data Definition Language"
appears to be repeated.
{quote}
In order to specify a certain file format for your table, you need to use USING
PARQUET clause
{quote}
Add a "the" after "use."
{quote}
The below is an example statement for creating a table using PARQUET files.
WITH clause allows users to specify a set of physical properties.
{quote}
I'm not sure whether PARQUET needs to be all-caps here. Also, add a "the"
before "WITH."
{quote}
Some table file formats provide special enable/disable features and the ways to
adjust some physical parameters. WITH clause in CREATE TABLE statement allows
users to set those physical parameters.
{quote}
I think it might be better to rephrase this as: "Some table storage formats
provide parameters for enabling or disabling features and adjusting physical
parameters. The WITH clause in the CREATE TABLE statement allows users to set
those parameters."
{quote}
Now, Parquet file provides the following physical properties.
{quote}
"The Parquet storage format provides the following physical properties:"
{quote}
Larger values will improve the IO when reading but consume more memory when
writing.
{quote}
IO should be I/O.
{quote}
The compression algorithm used to compress pages. It should be one of
UNCOMPRESSED, SNAPPY, GZIP, LZO. Default is UNCOMPRESSED.
{quote}
I believe the convention in the Parquet documentation is to write compression
codec names in all lowercase (although, code-wise, ALL CAPS still works
anyway :)).
{quote}
Compatibility Issues
{quote}
Users might try to use Parquet files with nested schemas and non-scalar
types, which is currently a compatibility issue. Should we add a note that we
are working on adding support for nested schemas and non-scalar types?
--------
If you would like, I would be glad to review the documentation for the other
storage formats too. Could you create an RB review for this patch? It would
probably be easier to review on RB.
Thanks,
David
> Add table management documentation
> ----------------------------------
>
> Key: TAJO-736
> URL: https://issues.apache.org/jira/browse/TAJO-736
> Project: Tajo
> Issue Type: Sub-task
> Components: documentation
> Reporter: Hyunsik Choi
> Assignee: Hyunsik Choi
> Fix For: 0.8-incubating, 1.0-incubating
>
> Attachments: TAJO-736.patch
>
>
> Jinho and I wrote some user documentation for file formats. This patch
> contains documentation for CSV file, RCFile, and Parquet file.