[ https://issues.apache.org/jira/browse/PIG-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977048#action_12977048 ]
Gerrit Jansen van Vuuren commented on PIG-1717:
-----------------------------------------------

The need to have the AS clause schema (script schema) available in the LoadFunc actually comes from the AllLoader, but I had the same frustration when writing the HiveColumnarLoader.

What the AllLoader does (apart from loading LoadFuncs based on configured extensions) is provide generic support for path-name key/value partitioning; it adds these partitioning values to both the schema and the returned Tuple. Currently it tries to get the Schema from the LoadFunc. The problem arises when this LoadFunc does not return a Schema, e.g. LzoTextLoader or PigStorage: the only option left for the AllLoader is to throw an Exception, because otherwise the partitioning logic will not work as expected.

My reason for requiring this is: as you've mentioned, users have two use cases, ad hoc and pipeline. For pipeline queries I agree that it's better to use a metadata repository like Howl. But for ad hoc queries, lacking the ability to know the AS clause schema in Pig means that custom loaders can only support pipeline-type queries and cannot serve ad hoc ones.

> pig needs to call setPartitionFilter if schema is null but getPartitionKeys
> is not
> ----------------------------------------------------------------------------------
>
>                 Key: PIG-1717
>                 URL: https://issues.apache.org/jira/browse/PIG-1717
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.9.0
>            Reporter: Gerrit Jansen van Vuuren
>            Assignee: Gerrit Jansen van Vuuren
>            Priority: Minor
>             Fix For: 0.9.0
>
>         Attachments: PIG-1717.patch
>
>
> I'm writing a loader that works with Hive-style partitioning, e.g.
> /logs/type1/daydate=2010-11-01
> The loader does not know the schema upfront; this is something that the
> user adds in the script using the AS clause.
> The problem is that this user-defined schema is not available to the loader,
> so the loader cannot return any schema. The loader does, however, know what
> the partition keys are, and Pig needs some way to learn about these
> partition keys.
> Currently, if the schema is null, Pig never calls the
> LoadMetadata.getPartitionKeys method or the setPartitionFilter method.
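
To make the path-name key/value partitioning discussed in this issue concrete, here is a minimal sketch of how partition keys and values could be read from a path such as /logs/type1/daydate=2010-11-01. This is not the AllLoader's actual code and it uses no Pig APIs; the class name PartitionPathSketch and the method partitionKeyValues are hypothetical names chosen for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: not part of Pig or the AllLoader.
public class PartitionPathSketch {

    /**
     * Collects key=value segments from a path, in path order.
     * Segments without '=' (e.g. "logs", "type1") are ignored.
     */
    public static Map<String, String> partitionKeyValues(String path) {
        Map<String, String> partitions = new LinkedHashMap<>();
        for (String segment : path.split("/")) {
            int eq = segment.indexOf('=');
            // eq > 0 requires a non-empty key before the '=' sign
            if (eq > 0) {
                partitions.put(segment.substring(0, eq), segment.substring(eq + 1));
            }
        }
        return partitions;
    }

    public static void main(String[] args) {
        // prints {daydate=2010-11-01}
        System.out.println(partitionKeyValues("/logs/type1/daydate=2010-11-01"));
    }
}
```

The keys found this way (here, daydate) are what a loader can report as partition keys even when it has no schema of its own to return, which is the situation the issue describes.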