[ 
https://issues.apache.org/jira/browse/PIG-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895082#action_12895082
 ] 

Thejas M Nair commented on PIG-1461:
------------------------------------

Documentation for UNION ONSCHEMA:

Use the ONSCHEMA keyword with union so that the union is based on the column 
names of the input relations rather than on column position. 
The statement throws an error unless the following requirements are met:
 * All inputs to the union must have a non-null schema.
 * Columns with the same name in different input schemas must have compatible 
data types. Numeric types are compatible with each other; if columns with the 
same name have different numeric types, an implicit conversion is applied. 
The bytearray type is considered compatible with all other types; a cast is 
added to convert it to the other type. Bags or tuples with different inner 
schemas are considered incompatible.
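
A short sketch of how these rules are expected to play out. The file names 
'x', 'y', 'z' and the relation names are made up for illustration, and the 
results described in the comments are what the rules above imply, not 
captured output:
{code}
-- 'a' has no declared type in A, so it defaults to bytearray
A = load 'x' as (a, b : int);
B = load 'y' as (a : chararray, c : double);

-- bytearray is compatible with chararray: a cast is added for A's 'a',
-- so 'a' is expected to be chararray in the result
U1 = union onschema A, B;
describe U1;

-- N is loaded without an 'as' clause, so its schema is null;
-- this statement is expected to fail with an error
N = load 'z';
U2 = union onschema A, N;
{code}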


Example - 
{code}
grunt> L1 = load 'f1' as (a : int, b : float);
grunt> dump L1;
(11,12.0)
(21,22.0)

grunt> L2 = load 'f2' as (a : long, c : chararray);
grunt> dump L2;
(11,a)
(12,b)
(13,c)

grunt> U = union onschema L1, L2;
grunt> describe U ;
U : {a : long, b : float, c : chararray}

grunt> dump U;
(11,12.0,)
(21,22.0,)
(11,,a)
(12,,b)
(13,,c)
{code}
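
For comparison, a rough hand-written equivalent using the plain positional 
union is sketched below. It assumes the second input is stored in a separate 
file, here called 'f2', and that casting a null constant is accepted in a 
generate clause; the relation names L1x and L2x are made up for illustration:
{code}
L1 = load 'f1' as (a : int, b : float);
L2 = load 'f2' as (a : long, c : chararray);

-- widen 'a' to long and pad the missing column 'c' with null
L1x = foreach L1 generate (long)a as a, b, (chararray)null as c;
-- pad the missing column 'b' with null
L2x = foreach L2 generate a, (float)null as b, c;

-- both inputs now have the same columns in the same positions
U = union L1x, L2x;
{code}
With ONSCHEMA, this column alignment and the casts are added automatically.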


> support union operation that merges based on column names
> ---------------------------------------------------------
>
>                 Key: PIG-1461
>                 URL: https://issues.apache.org/jira/browse/PIG-1461
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>    Affects Versions: 0.8.0
>            Reporter: Thejas M Nair
>            Assignee: Thejas M Nair
>             Fix For: 0.8.0
>
>         Attachments: PIG-1461.1.patch, PIG-1461.patch
>
>
> When the data has a schema, it often makes sense to union on the column 
> names in the schema rather than on the position of the columns. 
> The behavior of the existing union operator should remain backward compatible.
> This feature can be supported either with a new operator or by extending 
> union to support a 'using' clause. I am thinking of having a new operator 
> called either unionschema or merge. Does anybody have any other suggestions 
> for the syntax?
> example -
> L1 = load 'x' as (a,b);
> L2 = load 'y' as (b,c);
> U = unionschema L1, L2;
> describe U;
> U: {a:bytearray, b:bytearray, c:bytearray}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
