[
https://issues.apache.org/jira/browse/PIG-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977048#action_12977048
]
Gerrit Jansen van Vuuren commented on PIG-1717:
-----------------------------------------------
The need to have the AS clause schema (script schema) available in the LoadFunc
actually comes from the AllLoader, but I've had the same frustration when
writing the HiveColumnarLoader. What the AllLoader does (apart from loading
LoadFuncs based on configured extensions) is provide generic support for
path-name key-value partitioning: it adds the partitioning values to both the
schema and the returned Tuple. Currently it tries to get the Schema from the
LoadFunc, but the problem arises when this LoadFunc does not return a Schema,
e.g. LzoTextLoader or PigStorage. Then there is only one option left for the
AllLoader, and that is to throw an Exception, because otherwise the
partitioning logic will not work as expected.
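To make the path-name key-value idea concrete, here is a rough sketch of how
partition values could be pulled out of a path such as
/logs/type1/daydate=2010-11-01. This is not the AllLoader code: the class name
PartitionPathSketch and the method partitionValues are made up for
illustration, and the sketch uses plain Java collections rather than Pig's
Schema and Tuple classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Pig or the AllLoader.
final class PartitionPathSketch {

    // Collects every "key=value" path segment, preserving path order.
    static Map<String, String> partitionValues(String path) {
        Map<String, String> values = new LinkedHashMap<>();
        for (String segment : path.split("/")) {
            int eq = segment.indexOf('=');
            // Skip segments that have no key before '=' (e.g. "logs", "type1").
            if (eq > 0) {
                values.put(segment.substring(0, eq), segment.substring(eq + 1));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // prints {daydate=2010-11-01}
        System.out.println(partitionValues("/logs/type1/daydate=2010-11-01"));
    }
}
```

A loader along these lines would still need the column schema from somewhere
before it can append the partition keys to it, which is exactly where the
missing AS clause schema hurts.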
My reason for requiring this is:
As you've mentioned, users have two use cases: ad hoc and pipeline. For
pipeline queries I agree that it's better to use a metadata repository like
Howl. But for ad hoc queries, lacking the ability to know the AS clause schema
in Pig means that custom loaders can only support the pipeline type of queries
and cannot provide for the ad hoc ones.
> pig needs to call setPartitionFilter if schema is null but getPartitionKeys
> is not
> ----------------------------------------------------------------------------------
>
> Key: PIG-1717
> URL: https://issues.apache.org/jira/browse/PIG-1717
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Affects Versions: 0.9.0
> Reporter: Gerrit Jansen van Vuuren
> Assignee: Gerrit Jansen van Vuuren
> Priority: Minor
> Fix For: 0.9.0
>
> Attachments: PIG-1717.patch
>
>
> I'm writing a loader that works with Hive-style partitioning, e.g.
> /logs/type1/daydate=2010-11-01
> The loader does not know the schema up front; this is something that the
> user adds in the script using the AS clause.
> The problem is that this user-defined schema is not available to the loader,
> so the loader cannot return any schema. The loader does know what the
> partition keys are, and Pig needs some way to know about these partition
> keys.
> Currently, if the schema is null, Pig never calls the
> LoadMetaData:getPartitionKeys method or the setPartitionFilter method.