I think we're saying the same thing.
In the UDF case, both result in the UDF getting a tuple with two fields.
In the non-UDF case, both should result in a tuple with two fields. At
the moment, generate * results in a tuple with one field, and that field
is itself a tuple with two fields. It should not; that's the bug.
Alan.
Mridul Muralidharan wrote:
Assuming a two-field schema for A, shouldn't
B = foreach A generate $0, $1;
and
B = foreach A generate *;
be the same?
This is similar to
B = foreach A generate myFunc($0, $1);
and
B = foreach A generate myFunc(*);
In both cases the UDF gets the tuple ($0, $1), not (($0, $1)) in the
second case.
Regards,
Mridul
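To make the expected semantics concrete, here is a toy Python model. It is not Pig's implementation; `project`, `call_udf` and `my_func` are hypothetical names. In this model `*` expands in place into every field of the input tuple, so it behaves exactly like listing `$0, $1` explicitly, both in the generate list and as UDF arguments.

```python
from typing import Any, Callable, Sequence, Tuple

Row = Tuple[Any, ...]

def project(row: Row, exprs: Sequence[str]) -> Row:
    """Toy model of a 'foreach ... generate' projection list.

    "$N" selects field N; "*" expands in place to every field of the row,
    so it contributes len(row) fields rather than one nested tuple.
    """
    out: list = []
    for e in exprs:
        if e == "*":
            out.extend(row)  # expand the fields, do not nest the tuple
        elif e.startswith("$"):
            out.append(row[int(e[1:])])
        else:
            raise ValueError(f"unsupported expression: {e}")
    return tuple(out)

def call_udf(udf: Callable[[Row], Any], row: Row, args: Sequence[str]) -> Any:
    """The UDF receives its arguments packed into a single input tuple."""
    return udf(project(row, args))

def my_func(t: Row) -> int:
    """Hypothetical stand-in for myFunc: reports how many fields it received."""
    return len(t)

row = ("x", "y")
print(project(row, ["$0", "$1"]))  # ('x', 'y')
print(project(row, ["*"]))         # ('x', 'y')
print(call_udf(my_func, row, ["$0", "$1"]), call_udf(my_func, row, ["*"]))  # 2 2
```

Under this model, generate * on a two-field row yields the same two-field tuple as generate $0, $1, and the UDF sees a two-field input tuple either way, which is the behavior both messages argue for.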
Alan Gates (JIRA) wrote:
Semantics of generate * have changed
------------------------------------
Key: PIG-359
URL: https://issues.apache.org/jira/browse/PIG-359
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: types_branch
Reporter: Alan Gates
Priority: Minor
Fix For: types_branch
In the main trunk, the script
A = load 'myfile';
B = foreach A generate *;
returns:
(x, y, z)
In the types branch, it returns:
((x, y, z))
The output has an extra level of tuple nesting. In the main trunk,
generate * appears to apply an implicit flatten.
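The difference between the two branches can be sketched with another toy Python model. It assumes only what the report shows: in the types branch `*` yields one field holding the whole input tuple, and the trunk output looks like that result with one level of tuple nesting flattened away. The function names are hypothetical, and the flatten here handles only tuple fields (real Pig FLATTEN also un-nests bags, which this sketch ignores).

```python
from typing import Any, Tuple

Row = Tuple[Any, ...]

def generate_star_types_branch(row: Row) -> Row:
    """Reported types-branch behavior: '*' becomes a single field whose
    value is the whole input tuple."""
    return (row,)

def flatten_one_level(row: Row) -> Row:
    """Toy flatten: each field that is itself a tuple is replaced by its
    own fields; other fields are kept. Only one level is removed."""
    out: list = []
    for field in row:
        if isinstance(field, tuple):
            out.extend(field)
        else:
            out.append(field)
    return tuple(out)

def generate_star_trunk(row: Row) -> Row:
    """Reported trunk behavior: as if flatten were applied to the result."""
    return flatten_one_level(generate_star_types_branch(row))

row = ("x", "y", "z")
print(generate_star_types_branch(row))  # (('x', 'y', 'z'),)
print(generate_star_trunk(row))         # ('x', 'y', 'z')
```

Python prints a one-element tuple with a trailing comma, so `(('x', 'y', 'z'),)` corresponds to Pig's ((x, y, z)).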