Add trailing flag to commands to prevent retention of relation name in field
names: STRIP ?
-------------------------------------------------------------------------------------------
Key: PIG-1476
URL: https://issues.apache.org/jira/browse/PIG-1476
Project: Pig
Issue Type: New Feature
Affects Versions: 0.7.0
Environment: sunny, 60% humidity with a chance of rain.
Reporter: Russell Jurney
Fix For: 0.8.0
After doing a JOIN or a GROUP/FOREACH, one often ends up with data looking like:
> DESCRIBE foo;
foo: {other_thing::f1:int, other_thing::f2:chararray, other_thing::f3: int}
What one usually wants is:
foo: {f1:int, f2:chararray, f3: int}
At this point, one is left with two choices, neither of which is very good.
Choice one:
> foo = FOREACH foo GENERATE $0 AS f1, $1 AS f2, $2 AS f3;
This is a poor choice because when one later edits the script, it is hard to
remember which position holds which field, especially after changing something
upstream. So instead one does this:
> foo = FOREACH foo GENERATE other_thing::f1 AS f1, other_thing::f2 AS f2,
> other_thing::f3 AS f3;
This is a poor choice because it is verbose and cumbersome.
One is unsure what to do, pauses and reflects that the Pig is perplexing, and
hopes for a better tomorrow. Here's what one should be able to do to avoid this
situation:
> foo = JOIN old_thing BY f1, other_thing BY f1 STRIP;
> DESCRIBE foo;
foo: {f1:int, f2:chararray, f3: int}
I think so, anyway. I leave the behavior of duplicate fields to more
enlightened beings, but I think this would be a big improvement to Pig Latin.
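To make the request concrete, here is a minimal Python sketch of what a STRIP flag might do to a schema. It is not Pig code and not a proposed implementation: the function name `strip_relation_prefix` and the (name, type) pair representation are assumptions made for illustration only. It drops everything up to the last `::` in each field name and reports which names collide afterwards, since that is the open question about duplicate fields.

```python
from collections import Counter


def strip_relation_prefix(schema):
    """Model of the proposed STRIP behavior (hypothetical helper, not a Pig API).

    schema is a list of (field_name, type_name) pairs.
    Returns (stripped_schema, duplicate_names).
    """
    # Keep only the part after the last '::'; unprefixed names pass through unchanged.
    stripped = [(name.rsplit("::", 1)[-1], type_name) for name, type_name in schema]
    # Names that appear more than once after stripping are ambiguous.
    counts = Counter(name for name, _ in stripped)
    duplicates = sorted(name for name, count in counts.items() if count > 1)
    return stripped, duplicates


if __name__ == "__main__":
    schema = [
        ("other_thing::f1", "int"),
        ("other_thing::f2", "chararray"),
        ("other_thing::f3", "int"),
    ]
    print(strip_relation_prefix(schema))

    # A join on f1 keeps the key from both inputs, so stripping collides.
    joined = [("old_thing::f1", "int"), ("other_thing::f1", "int")]
    print(strip_relation_prefix(joined))
```

The second call shows why duplicates need a decision: after a JOIN on f1, both inputs contribute an f1, so STRIP would have to raise an error, keep the prefix on the colliding fields, or drop one copy.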