[ 
https://issues.apache.org/jira/browse/PIG-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977048#action_12977048
 ] 

Gerrit Jansen van Vuuren commented on PIG-1717:
-----------------------------------------------

The need to have the AS clause schema (script schema) available in the LoadFunc 
actually comes from the AllLoader, but I've had the same frustration when 
writing the HiveColumnarLoader. What the AllLoader does (apart from loading 
LoadFuncs based on configured extensions) is provide generic support for 
path-name key-value partitioning: it adds these partitioning values to both 
the schema and the returned Tuple. Currently it tries to get the Schema from 
the LoadFunc, but the problem arises when this LoadFunc does not return a 
Schema, e.g. LzoTextLoader or PigStorage. Then the only option left for the 
AllLoader is to throw an Exception, because otherwise the partitioning logic 
will not work as expected. 
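
To make the partitioning step concrete, here is a minimal sketch in plain Java. 
It is not the AllLoader code and does not use the Pig API: the class name 
PartitionPathSketch, its method names, and the use of plain lists in place of 
Pig's Schema and Tuple are all assumptions made for illustration. It shows 
key=value path segments being appended to both the field names and the row 
values, and why a missing schema leaves nothing sensible to extend.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Pig or the AllLoader.
public class PartitionPathSketch {

    /** Collects key=value path segments in path order. */
    public static Map<String, String> partitionsOf(String path) {
        Map<String, String> partitions = new LinkedHashMap<>();
        for (String segment : path.split("/")) {
            int eq = segment.indexOf('=');
            // Only segments shaped like key=value (non-empty key) are partitions.
            if (eq > 0) {
                partitions.put(segment.substring(0, eq), segment.substring(eq + 1));
            }
        }
        return partitions;
    }

    /** Appends the partition keys to the field names reported by the wrapped loader. */
    public static List<String> extendSchema(List<String> fieldNames,
                                            Map<String, String> partitions) {
        if (fieldNames == null) {
            // Stand-in for the case described above: the wrapped loader
            // returned no schema, so there is nothing to append the keys to.
            throw new IllegalStateException("wrapped loader returned no schema");
        }
        List<String> extended = new ArrayList<>(fieldNames);
        extended.addAll(partitions.keySet());
        return extended;
    }

    /** Appends the partition values to a row, in the same order as the keys. */
    public static List<Object> extendTuple(List<Object> values,
                                           Map<String, String> partitions) {
        List<Object> extended = new ArrayList<>(values);
        extended.addAll(partitions.values());
        return extended;
    }

    public static void main(String[] args) {
        Map<String, String> p = partitionsOf("/logs/type1/daydate=2010-11-01");
        System.out.println(extendSchema(List.of("ip", "url"), p));
        System.out.println(extendTuple(List.of("10.0.0.1", "/index.html"), p));
    }
}
```

If the script schema from the AS clause were passed to the loader, it could 
take the place of the null field list in extendSchema instead of forcing the 
exception.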

My reason for requiring this is: 
 As you've mentioned, users have two use cases: ad hoc and pipeline. For 
pipeline queries I agree that it's better to use a metadata repository like 
Howl. But for ad hoc queries, not being able to know the AS clause schema in 
Pig means that custom loaders can only support pipeline-type queries and 
cannot cater for ad hoc ones. 
 

> pig needs to call setPartitionFilter if schema is null but getPartitionKeys 
> is not
> ----------------------------------------------------------------------------------
>
>                 Key: PIG-1717
>                 URL: https://issues.apache.org/jira/browse/PIG-1717
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.9.0
>            Reporter: Gerrit Jansen van Vuuren
>            Assignee: Gerrit Jansen van Vuuren
>            Priority: Minor
>             Fix For: 0.9.0
>
>         Attachments: PIG-1717.patch
>
>
> I'm writing a loader that works with hive style partitioning e.g. 
> /logs/type1/daydate=2010-11-01
> The loader does not know the schema upfront and this is something that the 
> user adds in the script using the AS clause.
> The problem is that this user-defined schema is not available to the loader, 
> so the loader cannot return any schema. The loader does know what the 
> partition keys are, and Pig needs some way to know about these partition 
> keys. 
> Currently, if the schema is null, Pig never calls the 
> LoadMetadata.getPartitionKeys method or the setPartitionFilter method.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
