[
https://issues.apache.org/jira/browse/PIG-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020398#comment-13020398
]
Olga Natkovich commented on PIG-1822:
-------------------------------------
Please append the following subsection to the end of the Schemas section:
How Pig Handles Schema
As you can see from the above, with a few exceptions, Pig can infer the schema
of a relation up front. You can see the schema of a particular relation via
describe. Pig enforces this computed schema during the actual computation by
casting the input data to the expected data type. If the cast succeeds, the
results are returned to the user; otherwise, a warning is generated for each
record that failed to convert. Note that Pig does not know the type of the
actual data up front; it determines the type and performs the right conversion
on the fly.
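As a minimal sketch of the behavior described above, assuming a hypothetical
file named 'input' with two fields, the script below declares a schema in the
load statement and inspects it with describe. The field names and types are
illustrative only.

```pig
-- 'input' is a hypothetical file; name and age are assumed field names.
A = load 'input' as (name:chararray, age:int);
-- Print the schema Pig computed for A (the declared names and types).
describe A;
-- At run time each age value is cast to int; a record whose value
-- cannot be converted produces a warning.
dump A;
```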
Having a deterministic schema is very powerful; however, sometimes it comes at
the cost of performance. Consider the following example:
A = load 'input' as (x, y, z);
B = foreach A generate x+y;
If you run describe on B, you will see a single column of type double. This is
because Pig makes the safest choice and uses the largest numeric type when the
schema is not known. In practice, the input data may contain only integer
values; however, Pig will still cast the data to double and make sure that a
double result is returned.
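If the data is known to hold integers, one way to avoid the cast to double is
to declare the types in the load statement. The following is a sketch under
that assumption; the int types are an illustrative choice, not something the
example above specifies.

```pig
-- Assumes every field in the hypothetical 'input' file holds an integer.
A = load 'input' as (x:int, y:int, z:int);
B = foreach A generate x+y;
-- With both operands declared as int, describe should report an int column.
describe B;
```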
If the schema of a relation cannot be inferred, Pig will just use the
runtime data as is and propagate it through the pipeline.
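For illustration, a load statement with no as clause is one case where no
schema is available up front. The sketch below assumes the same hypothetical
'input' file and uses positional references, since there are no field names.

```pig
-- No schema is declared, so Pig has nothing to infer from.
A = load 'input';
-- describe reports that the schema for A is unknown.
describe A;
-- Fields are addressed by position; values flow through as loaded.
B = foreach A generate $0, $1;
dump B;
```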
> Need to document how types work in Pig
> --------------------------------------
>
> Key: PIG-1822
> URL: https://issues.apache.org/jira/browse/PIG-1822
> Project: Pig
> Issue Type: Improvement
> Components: documentation
> Reporter: Olga Natkovich
> Assignee: Olga Natkovich
> Fix For: 0.9.0
>
>
> What is static and what is dynamic.