[ https://issues.apache.org/jira/browse/PIG-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020106#comment-13020106 ]
Thejas M Nair commented on PIG-1281:
------------------------------------

{quote}
Discussed this with Daniel. Here is what needs to happen:
(1) If type is specified at load time or through cast, typechecker should detect the problem.
(2) Otherwise, frontend needs to insert cast to a tuple and let backend figure out if the real data contains the tuple.
{quote}

The col.$0 syntax is applicable to both tuple and bag, so casting it to Tuple is not right. I think Pig should ideally determine at run time whether the input is a Tuple or a Bag and return column(s) or bag(s) accordingly. That is more than a type checker change; I will open another jira to test/fix that. (I need (to create) a LoadFunc that does not return a schema, but also returns tuple or bag objects.)

I will address case (1) in this jira.

> Detect org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.Tuple type of errors at Compile Type during creation of logical plan
> ---------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-1281
>                 URL: https://issues.apache.org/jira/browse/PIG-1281
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.6.0
>            Reporter: Viraj Bhat
>            Assignee: Thejas M Nair
>             Fix For: 0.9.0
>
>
> This is more of an enhancement request, where we can detect simple errors at compile time, during creation of the logical plan, rather than at the backend.
> I created a script which contains an error that gets detected in the backend as a cast error, when in fact we can detect it in the front end (group is a single element, so the group.$0 projection operation will not work).
> {code}
> inputdata = LOAD '/user/viraj/mymapdata' AS (col1, col2, col3, col4);
> projdata = FILTER inputdata BY (col1 is not null);
> groupprojdata = GROUP projdata BY col1;
> cleandata = FOREACH groupprojdata {
>     bagproj = projdata.col1;
>     dist_bags = DISTINCT bagproj;
>     -- group is a single field here, so group.$0 is the erroneous projection
>     GENERATE group.$0 as newcol1, COUNT(dist_bags) as newcol2;
> };
> cleandata1 = GROUP cleandata by newcol2;
> cleandata2 = FOREACH cleandata1 { GENERATE group.$0 as finalcol1, COUNT(cleandata.newcol1) as finalcol2; };
> ordereddata = ORDER cleandata2 by finalcol2;
> store ordereddata into 'finalresult' using PigStorage();
> {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
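
The two cases discussed in the comment above can be sketched roughly as follows. This is only an illustration in plain Python, not Pig's type checker or backend: the function names check_projection and project are hypothetical, a Python tuple stands in for a Pig Tuple, a list of tuples stands in for a Bag, and bytes stands in for an untyped bytearray field. It assumes that projecting a column from a bag yields a bag of single-field tuples.

```python
from typing import Any, Optional

# Hypothetical stand-ins (assumptions, not Pig classes):
#   tuple          -> Pig Tuple
#   list of tuples -> Pig Bag
#   bytes          -> untyped bytearray field


def check_projection(declared_type: Optional[str]) -> None:
    """Case (1): front-end check when the field type is declared.

    None means no schema was given, so the decision is deferred
    to run time (case 2).
    """
    if declared_type is not None and declared_type not in ("tuple", "bag"):
        raise TypeError(
            f"cannot project a column from a field of type {declared_type}"
        )


def project(value: Any, index: int) -> Any:
    """Case (2): run-time dispatch for col.$index on untyped input."""
    if isinstance(value, tuple):
        # Tuple input: return the single column.
        return value[index]
    if isinstance(value, list):
        # Bag input: return a bag holding that column from every tuple.
        return [(t[index],) for t in value]
    raise TypeError(
        f"cannot project a column from {type(value).__name__}"
    )
```

With these stand-ins, project((1, "a"), 0) returns the column itself, project([(1, "a"), (2, "b")], 0) returns a bag of one-field tuples, and a raw bytes value is rejected at run time, while check_projection("bytearray") rejects the same mistake before any data is read.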