[ https://issues.apache.org/jira/browse/PIG-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020106#comment-13020106 ]

Thejas M Nair commented on PIG-1281:
------------------------------------

{quote}
Discussed this with Daniel. Here is what needs to happen:
(1) If the type is specified at load time or through a cast, the type checker 
should detect the problem.
(2) Otherwise, the frontend needs to insert a cast to tuple and let the backend 
determine whether the actual data contains a tuple.
{quote}
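A minimal sketch of case (1), assuming the same input path as the script in the issue description: once col1 is declared as chararray at load time, group is known to be a chararray, so the type checker has enough information to reject group.$0 before any job is launched. The relation names typed, grouped and bad are hypothetical, and the exact error reported is not shown here.

{code}
-- col1 has a declared type, so it is known when the logical plan is built.
typed = LOAD '/user/viraj/mymapdata' AS (col1:chararray, col2, col3, col4);
-- Grouping by a single column: group holds the chararray key, not a tuple.
grouped = GROUP typed BY col1;
-- Projecting $0 from a chararray is invalid; with the declared type this can
-- be reported at compile time instead of as a cast error in the backend.
bad = FOREACH grouped GENERATE group.$0;
{code}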

The col.$0 syntax applies to both tuples and bags, so casting the column to a 
tuple is not right.
Ideally, Pig should determine at run time whether the input is a tuple or a bag 
and return the column(s) or bag(s) accordingly.
That needs more than a type checker change, so I will open another JIRA to test 
and fix it. (For that I need to create a LoadFunc that does not return a schema 
but does return tuple or bag objects.)
I will address case (1) in this JIRA.
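To illustrate why a blanket cast to tuple would be wrong, here is a sketch showing the same .$0 syntax applied to a tuple and to a bag, with different results. It reuses the input path from the issue description; the relation names bytwo, fromtuple and frombag are hypothetical.

{code}
inputdata = LOAD '/user/viraj/mymapdata' AS (col1, col2, col3, col4);
-- Grouping by two columns makes group a tuple (col1, col2).
bytwo = GROUP inputdata BY (col1, col2);
-- On a tuple, .$0 returns a single field: the col1 value of each group.
fromtuple = FOREACH bytwo GENERATE group.$0;
-- On a bag, .$0 returns a bag holding the first field of every tuple in it.
frombag = FOREACH bytwo GENERATE inputdata.$0;
{code}

When the type of the projected column is unknown (bytearray), neither interpretation can be ruled out at compile time, which is why that case is left to the run-time follow-up described above.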



> Detect org.apache.pig.data.DataByteArray cannot be cast to 
> org.apache.pig.data.Tuple type of errors at Compile Type during creation of 
> logical plan
> ---------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-1281
>                 URL: https://issues.apache.org/jira/browse/PIG-1281
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.6.0
>            Reporter: Viraj Bhat
>            Assignee: Thejas M Nair
>             Fix For: 0.9.0
>
>
> This is more of an enhancement request: detect simple errors at compile time, 
> while the logical plan is being created, rather than in the backend.
> I created a script containing an error that is detected in the backend as a 
> cast error, although it could be detected in the front end (group is a single 
> element rather than a tuple, so the group.$0 projection will not work).
> {code}
> inputdata = LOAD '/user/viraj/mymapdata' AS (col1, col2, col3, col4);
> projdata = FILTER inputdata BY (col1 is not null);
> groupprojdata = GROUP projdata BY col1;
> cleandata = FOREACH groupprojdata {
>                      bagproj = projdata.col1;
>                      dist_bags = DISTINCT bagproj;
>                      GENERATE group.$0 as newcol1, COUNT(dist_bags) as newcol2;
>                       };
> cleandata1 = GROUP cleandata by newcol2;
> cleandata2 = FOREACH cleandata1 { GENERATE group.$0 as finalcol1, COUNT(cleandata.newcol1) as finalcol2; };
> ordereddata = ORDER cleandata2 by finalcol2;
> STORE ordereddata INTO 'finalresult' USING PigStorage();
> {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
