[ 
https://issues.apache.org/jira/browse/PIG-2143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-2143:
-----------------------------------

    Release Note: 
Documentation has been updated to reflect reality.

An optional second constructor argument is provided that allows one to 
customize advanced behaviors. A list of available options is below:

-schema Stores the schema of the relation using a hidden JSON file.
-noschema Ignores a stored schema during loading.

Schemas
If -schema is specified, a hidden ".pig_schema" file is created in the output 
directory when storing data. When loading, PigStorage (with or without 
-schema) reads this file to determine the field names and types, so the user 
does not need to provide a schema explicitly in an AS clause. Specifying 
-noschema disables this behavior. No attempt is made to merge conflicting 
schemas during loading; the first schema encountered during a file system scan 
is used.
In addition, using -schema drops a ".pig_headers" file in the output directory. 
This file simply lists the delimited aliases. This is intended to make export 
to tools that can read files with header lines easier (just cat the header to 
your data).
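
As a rough illustration of the behavior described above, the following Pig 
Latin sketch stores a relation with -schema and then loads it back with and 
without -noschema. The paths, relation names, and field names are made up for 
the example; only the PigStorage options come from this note.

```pig
-- Hypothetical input: tab-delimited lines of (name, age).
users = LOAD 'input/users.txt' USING PigStorage('\t')
        AS (name:chararray, age:int);

-- Second constructor argument '-schema' asks PigStorage to write the
-- hidden schema file alongside the data.
STORE users INTO 'output/users' USING PigStorage('\t', '-schema');

-- No AS clause: field names and types are expected to come from the
-- stored schema.
with_schema = LOAD 'output/users' USING PigStorage('\t');
DESCRIBE with_schema;

-- '-noschema' tells PigStorage to ignore the stored schema on load.
no_schema = LOAD 'output/users' USING PigStorage('\t', '-noschema');
DESCRIBE no_schema;
```

Comparing the two DESCRIBE results is a quick way to check whether the stored 
schema was picked up.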

Note that regardless of whether or not you store the schema, you always need to 
specify the correct delimiter to read your data. If you store using delimiter 
"#" and then load using the default delimiter (tab), your data will not be 
parsed correctly.
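
A minimal sketch of the delimiter caveat, again with hypothetical paths and 
relation names: the stored schema does not record the delimiter, so the load 
statement has to repeat it.

```pig
-- Hypothetical input: tab-delimited lines of (name, age).
users = LOAD 'input/users.txt' USING PigStorage('\t')
        AS (name:chararray, age:int);

-- Store with '#' as the field delimiter, keeping the schema.
STORE users INTO 'output/users_hash' USING PigStorage('#', '-schema');

-- Matching delimiter: fields are split on '#' as intended.
matched = LOAD 'output/users_hash' USING PigStorage('#');

-- Default delimiter (tab): the '#' characters are not treated as field
-- separators, so the data is not parsed correctly.
mismatched = LOAD 'output/users_hash' USING PigStorage();
```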



> Make PigStorage optionally store schema; improve docs.
> ------------------------------------------------------
>
>                 Key: PIG-2143
>                 URL: https://issues.apache.org/jira/browse/PIG-2143
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Dmitriy V. Ryaboy
>             Fix For: 0.10
>
>         Attachments: PIG-2143.2.diff, PIG-2143.3.patch, PIG-2143.4.patch, 
> PIG-2143.5.patch, PIG-2143.diff
>
>
> I'd like to propose that we allow for a greater degree of customization in 
> PigStorage.
> An incomplete list of features that we might want to add:
> - flag to tell it to overwrite existing output if it exists
> - flag to tell it to compress output using gzip|bzip|lzo (currently this can 
> be achieved by setting the directory name to end in .gz or .bz2, which is a 
> bit awkward)
> - flag to tell it to store the schema and header (perhaps by merging in 
> PigStorageSchema work?)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
