[ 
https://issues.apache.org/jira/browse/TAJO-736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960061#comment-13960061
 ] 

David Chen commented on TAJO-736:
---------------------------------

Not a problem! The documentation looks great so far. I have some minor comments:

{quote}
For more details, please refer Parquet File Format.
{quote}
Should be "please refer to the Parquet File Format."

{quote}
If you are not familiar with CREATE TABLE statement, please refer Data 
Definition Language Data Definition Language.
{quote}
As above, add a "to" after "refer." Also, "Data Definition Language" 
appears to be repeated.

{quote}
In order to specify a certain file format for your table, you need to use USING 
PARQUET clause
{quote}

Add a "the" after "use."

{quote}
The below is an example statement for creating a table using PARQUET files. 
WITH clause allows users to specify a set of physical properties.
{quote}

I'm not sure whether PARQUET needs to be all-caps here. Also, add a "the" 
before "WITH."

{quote}
Some table file formats provide special enable/disable features and the ways to 
adjust some physical parameters. WITH clause in CREATE TABLE statement allows 
users to set those physical parameters.
{quote}

I think it might be better to rephrase this as: "Some table storage formats 
provide parameters for enabling or disabling features and adjusting physical 
parameters. The WITH clause in the CREATE TABLE statement allows users to set 
those parameters."

{quote}
Now, Parquet file provides the following physical properties.
{quote}

"The Parquet storage format provides the following physical properties:"

{quote}
Larger values will improve the IO when reading but consume more memory when 
writing.
{quote}

IO should be I/O.

{quote}
The compression algorithm used to compress pages. It should be one of 
UNCOMPRESSED, SNAPPY, GZIP, LZO. Default is UNCOMPRESSED.
{quote}

I believe the Parquet documentation's convention for compression codec names 
is all lowercase (even though, code-wise, ALL CAPS still works anyway :)).

{quote}
Compatibility Issues
{quote}

Users may try to use Parquet files with nested schemas and non-scalar types, 
which is currently a compatibility issue. Should we add a note that we are 
working on adding support for nested schemas and non-scalar types?

--------

If you would like, I would be glad to review the documentation for the other 
storage formats too. Could you create a Review Board (RB) request for this 
patch? It might be easier to review there.

Thanks,
David

> Add table management documentation
> ----------------------------------
>
>                 Key: TAJO-736
>                 URL: https://issues.apache.org/jira/browse/TAJO-736
>             Project: Tajo
>          Issue Type: Sub-task
>          Components: documentation
>            Reporter: Hyunsik Choi
>            Assignee: Hyunsik Choi
>             Fix For: 0.8-incubating, 1.0-incubating
>
>         Attachments: TAJO-736.patch
>
>
> Jinho and I wrote some user documentations for file formats. This patch 
> contains documentations for CSV file, RCFile, and Parquet file.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
