[ https://issues.apache.org/jira/browse/PIG-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903714#action_12903714 ]
Thejas M Nair commented on PIG-1482: ------------------------------------ Patch review comments - - Schema.java {code} public FieldSchema(String a, Schema s, byte t) throws FrontendException { alias = a; schema = s; log.debug("t: " + t + " Bag: " + DataType.BAG + " tuple: " + DataType.TUPLE); /* * The following check is removed because it may not be always true. As a matter of * fact, the condition can be produced using other constructors anyway. * if ((null != s) && !(DataType.isSchemaType(t))) { int errCode = 1020; throw new FrontendException("Only a BAG or TUPLE can have schemas. Got " + DataType.findTypeName(t), errCode, PigException.INPUT); } */ {code} I think some other code paths might be relying on this constructor for error checking. It would be safer to create another constructor with a check boolean argument {code} public FieldSchema(String a, Schema s, byte t, boolean innerTypeCheck) {code} and call that from above constructor and from FieldSchema.copyAndLink(..) - In LOStream.java.getSchema() mIsSchemaComputed is used to keep track of whether the fieldschema parents have been set. I think it will be better to use a different variable for the purpose - it will be more readable, and also not likely to break any assumptions people are likely to make about this variable that is from the LogicalOperator class. - TypeCheckingVisitor.java insertCastForUDF is called on input of udf , it seems like same logic should be used for other expressions as well (instead of insertCast(.) ). Also, insertCastForUDF(..) and insertCast(..) have only two lines different, we can share rest of the code. {code} private void insertCastForUDF(LOUserFunc udf, FieldSchema fromFS, FieldSchema toFs, ExpressionOperator predecessor){ toFs.setParent( fromFS.canonicalName, predecessor ); insertCast(udf, toFs.type, toFs, predecessor); } {code} - TypeCheckingVisitor.java In visit(LOCast), it seems like we can just pick any of the matching predecessor load functions, shouldn't we check if all the FuncSpec returned are the same ? {code} for( Map.Entry<String, LogicalOperator> entry : canonicalMap.entrySet() ) { FuncSpec loadFuncSpec = getLoadFuncSpec( entry.getValue(), entry.getKey() ); cast.setLoadFuncSpec( loadFuncSpec ); } {code} - LOProject.java the commented line can be removed - {code} +// mFieldSchema.setParent(fs.canonicalName, expressionOperator); {code} > Pig gets confused when more than one loader is involved > ------------------------------------------------------- > > Key: PIG-1482 > URL: https://issues.apache.org/jira/browse/PIG-1482 > Project: Pig > Issue Type: Bug > Affects Versions: 0.7.0 > Reporter: Ankur > Assignee: Xuefu Zhang > Fix For: 0.8.0 > > Attachments: jira-1482-final.patch, jira-1482-final.patch, > jira-1482-final.patch > > > In case of two relations being loaded using different loader, joined, grouped > and projected, pig gets confused in trying to find appropriate loader for the > requested cast. Consider the following script :- > A = LOAD 'data1' USING PigStorage() AS (s, m, l); > B = FOREACH A GENERATE s#'k1' as v1, m#'k2' as v2, l#'k3' as v3; > C = FOREACH B GENERATE v1, (v2 == 'v2' ? 1L : 0L) as v2:long, (v3 == 'v3' ? 1 > :0) as v3:int; > D = LOAD 'data2' USING TextLoader() AS (a); > E = JOIN C BY v1, D BY a USING 'replicated'; > F = GROUP E BY (v1, a); > G = FOREACH F GENERATE (chararray)group.v1, group.a; > > dump G; > This throws the error, stack trace of which is in the next comment -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.