[jira] Commented: (PIG-1434) Allow casting relations to scalars

Alan Gates (JIRA) Wed, 14 Jul 2010 10:53:47 -0700

    [ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888449#action_12888449
 ]


Alan Gates commented on PIG-1434:
---------------------------------

Alright, I finally understand.  I think the potential confusion for the user 
and the Pig parser is caused by the proposed way to handle multi-columned 
input.  Rather than

{code}
Y = foreach Z generate X::$1/(long) C.count, X::$2-(long) C.max;
{code}

if we instead do
{code}
Y = foreach Z generate X::$1/((tuple)C).count, X::$2 - ((tuple)C).max;
{code}

then I believe it is clear for both user and parser what is happening.

In each case C is being cast to a tuple and then fields read out of it.  C is 
not being cast to a long.  Then the feature remains basically as originally 
proposed.  The relation being cast must have one record and one field.  That 
one field can be a tuple to handle the case where the record has multiple 
fields.  But Pig will still read it as a single column which is a tuple, and 
the user will need to cast it accordingly.
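
As a minimal sketch of the multi-field case (an illustration, not taken from 
the patch), C could be built so that its one record holds one tuple-typed 
field.  The typed load schema, the alias stats, and the use of MAX and of the 
TOTUPLE builtin (assuming it is available) are assumptions made for 
illustration; the ((tuple)C) cast is the syntax proposed above, not something 
Pig is known to accept.
{code}
-- assumed typed schema, so that the aggregates are longs
A = load 'data' as (x:long, y:long, z:long);
B = group A all;
-- one record, one field: a tuple carrying both aggregates
C = foreach B generate TOTUPLE(COUNT(A), MAX(A.y)) as stats:tuple(count:long, max:long);
.....
-- proposed syntax: cast the relation to a tuple, then read fields out of it
Y = foreach Z generate X::$1/((tuple)C).count, X::$2 - ((tuple)C).max;
{code}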

This should also avoid accidental usage.  The example above written without 
the cast:
{code}
Y = foreach Z generate X::$1/C.count, X::$2 - C.max;
{code}
should still be an error because the type checker should not be able to find C 
as a tuple anywhere in its symbol table.


> Allow casting relations to scalars
> ----------------------------------
>
>                 Key: PIG-1434
>                 URL: https://issues.apache.org/jira/browse/PIG-1434
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Olga Natkovich
>            Assignee: Aniket Mokashi
>             Fix For: 0.8.0
>
>         Attachments: scalarImpl.patch
>
>
> This jira is to implement a simplified version of the functionality described 
> in https://issues.apache.org/jira/browse/PIG-801.
> The proposal is to allow casting relations to scalar types in foreach.
> Example:
> A = load 'data' as (x, y, z);
> B = group A all;
> C = foreach B generate COUNT(A);
> .....
> X = ....
> Y = foreach X generate $1/(long) C;
> Couple of additional comments:
> (1) You can only cast relations including a single value or an error will be 
> reported
> (2) Name resolution is needed since relation X might have field named C in 
> which case that field takes precedence.
> (3) Y will look for C closest to it.
> Implementation thoughts:
> The idea is to store C into a file and then convert it into scalar via a UDF. 
> I believe we already have a UDF that Ben Reed contributed for this purpose. 
> Most of the work would be to update the logical plan to
> (1) Store C
> (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
