I think we're saying the same thing.
In the UDF case, both result in the UDF getting a tuple with two fields.

In the non-UDF case, both should also result in a tuple with two fields. At the moment, generate * results in a tuple with one field, which is itself a tuple with two fields. It should not; that is the bug.

Alan.

Mridul Muralidharan wrote:

Assuming a two-field schema for A, shouldn't

B = foreach A generate $0, $1;
and
B = foreach A generate *;

be the same?

This is similar to

B = foreach A generate myFunc($0, $1);
and
B = foreach A generate myFunc(*);

The UDF gets the tuple ($0, $1) in both cases, not (($0, $1)) in the second case.
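
To make the UDF case concrete, here is a rough Python model of the argument passing described above. It is only a sketch: my_func, call_with_fields and call_with_star are hypothetical names invented for illustration, and Python tuples stand in for Pig tuples. Nothing here is Pig's actual implementation.

```python
# Hypothetical stand-in for a Pig UDF: it reports how many fields
# the argument tuple it receives contains.
def my_func(args):
    return len(args)


def call_with_fields(row):
    # Models myFunc($0, $1): the two fields are packed into one argument tuple.
    return my_func((row[0], row[1]))


def call_with_star(row):
    # Models myFunc(*): the star expands to the row's fields, so the UDF
    # again sees ($0, $1) rather than (($0, $1)).
    return my_func(tuple(row))


row = ("a", "b")
print(call_with_fields(row))  # field count seen by the UDF
print(call_with_star(row))    # same field count as above
```

If the star were wrapped instead of expanded, the UDF would see a single field, which is the extra nesting under discussion.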


Regards,
Mridul




Alan Gates (JIRA) wrote:
Semantics of generate * have changed
------------------------------------

                 Key: PIG-359
                 URL: https://issues.apache.org/jira/browse/PIG-359
             Project: Pig
          Issue Type: Bug
          Components: impl
    Affects Versions: types_branch
            Reporter: Alan Gates
            Priority: Minor
             Fix For: types_branch


In the main trunk, the script

A = load 'myfile';
B = foreach A generate *;

returns:

(x, y, z)

In the types branch, it returns:

((x, y, z))

The result has an extra level of tuple nesting. In the main trunk, generate * seems to include an implicit flatten.
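
The difference between the two outputs can be sketched in Python, using tuples as a stand-in for Pig tuples. The function names generate_star_trunk and generate_star_types_branch are made up for this illustration and only mimic the shapes reported above; they say nothing about how either branch is implemented.

```python
def generate_star_trunk(row):
    # Shape reported for the main trunk: the star expands into the
    # row's own fields, as if an implicit flatten were applied.
    return tuple(row)


def generate_star_types_branch(row):
    # Shape reported for the types branch: the whole input tuple
    # becomes the single field of the output tuple.
    return (tuple(row),)


row = ("x", "y", "z")
trunk = generate_star_trunk(row)
types_branch = generate_star_types_branch(row)

print(len(trunk))         # number of top-level fields in the trunk result
print(len(types_branch))  # number of top-level fields in the types branch result
```

Flattening the single field of the types branch result gives back the trunk result, which is why the trunk behavior looks like an implicit flatten.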

