[ 
https://issues.apache.org/jira/browse/PIG-1822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020398#comment-13020398
 ] 

Olga Natkovich commented on PIG-1822:
-------------------------------------

Please append the following subsection to the end of the Schemas section:

How Pig Handles Schema

As you can see from above, with a few exceptions, Pig can infer the schema of 
a relation up front. You can see the schema of a particular relation via 
describe. Pig enforces this computed schema during the actual computation by 
casting the input data to the expected data type. If the cast succeeds, the 
results are returned to the user; otherwise, a warning is generated for each 
record that failed to convert. Note that Pig does not know the type of the 
actual data up front; it determines the type and performs the right 
conversion on the fly.
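
As a minimal sketch of checking a computed schema, the script below declares 
types in the load statement and then asks Pig to describe the relation. The 
path 'input' and the field names and types are placeholders chosen for 
illustration, not taken from a real data set.

```pig
-- 'input', name, and age are illustrative placeholders
A = load 'input' as (name:chararray, age:int);

-- prints the schema Pig computed for A before any data is read
describe A;
```

At run time Pig casts each value of age to int, so a value that is not a 
valid integer falls under the warning case described above.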

Having a deterministic schema is very powerful; however, sometimes it comes at 
the cost of performance. Consider the following example:

A = load 'input' as (x, y, z);
B = foreach A generate x+y;

If you run describe on B, you will see a single column of type double. This is 
because Pig makes the safest choice and uses the largest numeric type when the 
schema is not known. In practice, the input data may contain integer 
values; however, Pig will cast the data to double and make sure that a double 
result is returned.
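
If the data is known to hold integers, one way to avoid the cast to double is 
to tell Pig the types, either in the load statement or with explicit casts. 
The following is a sketch under that assumption, reusing the placeholder path 
and field names from the example above.

```pig
-- Option 1: declare the types up front so Pig does not have to guess
A = load 'input' as (x:int, y:int, z:int);
B = foreach A generate x+y;
describe B;   -- the sum of two int fields is reported as int

-- Option 2: keep the untyped load and cast only the fields that are used
A2 = load 'input' as (x, y, z);
B2 = foreach A2 generate (int)x + (int)y;
describe B2;
```

The second form leaves z untyped, which may be preferable when only some of 
the fields take part in arithmetic.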

If the schema of a relation can't be inferred, Pig will just use the 
runtime data as is and propagate it through the pipeline.
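
One simple case where there is no schema to infer is a load statement with no 
as clause. This is a sketch that assumes the load function does not supply a 
schema of its own, which is the case for the default loader; 'input' is again 
a placeholder path.

```pig
-- no "as" clause, so Pig knows neither field names, types, nor field count
A = load 'input';

-- describe reports that the schema for A is unknown
describe A;
```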


> Need to document how types work in Pig
> --------------------------------------
>
>                 Key: PIG-1822
>                 URL: https://issues.apache.org/jira/browse/PIG-1822
>             Project: Pig
>          Issue Type: Improvement
>          Components: documentation
>            Reporter: Olga Natkovich
>            Assignee: Olga Natkovich
>             Fix For: 0.9.0
>
>
> What is static and what is dynamic.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
