[ https://issues.apache.org/jira/browse/PIG-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895383#action_12895383 ]
Olga Natkovich commented on PIG-1461:
-------------------------------------

The patch looks good. A couple of comments:

(1) There appears to be a typo in the code that loads data for testing: w.println("5\tdef\t3\t{(2,a),(2,b)}]"); contains an extra "]" at the end.
(2) This is not related to the patch but to the documentation above. Please add a note that UNION supports 2 or more inputs.
(3) In mergeSchemasByAlias, I think it is safer to make a copy of the schema rather than just assigning it in the corner case of 1 schema.
(4) We need to add a comment about the inner bag schema to mergeFieldSchemaFirstLevelSameAlias.
(5) General comment on schema merging: we have completely different code paths for position-based vs. alias-based merge. I am worried that the two will have subtly different semantics, either now or later.

> support union operation that merges based on column names
> ---------------------------------------------------------
>
>                 Key: PIG-1461
>                 URL: https://issues.apache.org/jira/browse/PIG-1461
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>    Affects Versions: 0.8.0
>            Reporter: Thejas M Nair
>            Assignee: Thejas M Nair
>             Fix For: 0.8.0
>
>         Attachments: PIG-1461.1.patch, PIG-1461.patch
>
>
> When the data has a schema, it often makes sense to union on the column names
> in the schema rather than on the position of the columns.
> The behavior of the existing union operator should remain backward compatible.
> This feature can be supported using either a new operator or by extending
> union to support a 'using' clause. I am thinking of having a new operator
> called either unionschema or merge. Does anybody have any other suggestions
> for the syntax?
> example -
> L1 = load 'x' as (a,b);
> L2 = load 'y' as (b,c);
> U = unionschema L1, L2;
> describe U;
> U: {a:bytearray, b:bytearray, c:bytearray}
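
To make review point (3) concrete, here is a minimal sketch of an alias-based merge that copies the schema even when only one input is given. It does not use Pig's actual classes: a schema is modeled as a hypothetical alias-to-type map, the class name SchemaMergeSketch is invented for illustration, and mismatched types for the same alias are simply rejected, which is a simplifying assumption and not necessarily what the patch does.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Pig's schema classes: each schema is an
// ordered map from column alias to type name.
class SchemaMergeSketch {

    // Merge schemas by alias, keeping aliases in first-seen order.
    static Map<String, String> mergeByAlias(List<Map<String, String>> schemas) {
        if (schemas.isEmpty()) {
            throw new IllegalArgumentException("need at least one schema");
        }
        // Start from a copy of the first schema. With a single input the
        // caller still gets a new object, so later changes to the result
        // cannot leak back into the input schema.
        Map<String, String> merged = new LinkedHashMap<>(schemas.get(0));
        for (Map<String, String> schema : schemas.subList(1, schemas.size())) {
            for (Map.Entry<String, String> field : schema.entrySet()) {
                String existing = merged.putIfAbsent(field.getKey(), field.getValue());
                // Assumption for this sketch only: conflicting types are an error.
                if (existing != null && !existing.equals(field.getValue())) {
                    throw new IllegalArgumentException(
                        "type mismatch for alias " + field.getKey());
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Mirrors the example in the issue: L1 has (a,b), L2 has (b,c).
        Map<String, String> l1 = new LinkedHashMap<>();
        l1.put("a", "bytearray");
        l1.put("b", "bytearray");
        Map<String, String> l2 = new LinkedHashMap<>();
        l2.put("b", "bytearray");
        l2.put("c", "bytearray");
        System.out.println(mergeByAlias(Arrays.asList(l1, l2)));
    }
}
```

Because the single-schema case goes through the same copy as the general case, there is no separate branch that could hand back the caller's own object.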