[ 
https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12851718#action_12851718
 ] 

Zheng Shao commented on HIVE-1131:
----------------------------------

> Look at the DataContainer class. That has a partition in it. And the 
> Dependency has a mapping from Partition to the dependencies. Can you explain 
> your concerns about inefficiency in more detail?

I see. So the DataContainer captures the output partition information, but we 
don't record input partition information (in BaseColumnInfo/TableAliasInfo). 
This is reasonable, since a single query can read from a large number of 
input partitions.
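
To make that shape concrete, here is a rough sketch of lineage keyed only by 
output partition. The class and method names (OutputPartitionLineage, 
addDependency, inputsFor) and the use of plain strings for partitions and 
columns are assumptions made for illustration. This is not the code in the 
patch and not Hive's actual DataContainer/Dependency API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the structure discussed above; not Hive's real classes.
final class OutputPartitionLineage {
    // output partition -> (output column -> input columns it is derived from).
    // Input columns are recorded at the table/column level only, with no input
    // partition, so the map stays small even when a query reads many partitions.
    private final Map<String, Map<String, List<String>>> deps = new HashMap<>();

    void addDependency(String outputPartition, String outputColumn, String inputColumn) {
        deps.computeIfAbsent(outputPartition, p -> new HashMap<>())
            .computeIfAbsent(outputColumn, c -> new ArrayList<>())
            .add(inputColumn);
    }

    // Returns an empty list when nothing is recorded for the given partition/column.
    List<String> inputsFor(String outputPartition, String outputColumn) {
        return deps.getOrDefault(outputPartition, Collections.emptyMap())
                   .getOrDefault(outputColumn, Collections.emptyList());
    }
}
```

A hook consuming this would look up one output partition at a time, which is 
the access pattern the Partition-to-dependencies mapping suggests.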

> For S6 actually the queryplan is the wrong place to store the lineageinfo. 
> Because of the dynamic partitioning work that Ning is doing, I have to 
> generate the partition to dependency mapping at run time. So I would rather 
> store it in a run time structure as opposed to a compile time structure. 
> SessionState fits that bill, though I think we should have another structure 
> called ExecutionCtx for this. But otherwise I think we want to store this in 
> a runtime structure.

+1 on the ExecutionCtx idea. SessionState is at the session level, and 
LineageInfo is at the query level. It will be great to put LineageInfo into 
ExecutionCtx.
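
A minimal sketch of the scoping difference, under stated assumptions: the 
names SessionStateSketch, ExecutionCtxSketch and LineageInfoSketch are 
hypothetical, and ExecutionCtx does not exist yet, so nothing here reflects 
real Hive signatures. The only point is that lineage filled in at run time 
lives in a per-query object rather than in the session.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical run-time lineage holder: output partition -> dependency description.
final class LineageInfoSketch {
    private final Map<String, String> partitionToDependency = new HashMap<>();

    void put(String partition, String dependency) {
        partitionToDependency.put(partition, dependency);
    }

    int size() {
        return partitionToDependency.size();
    }
}

// Query-scoped: one instance per executed query, discarded when the query ends.
final class ExecutionCtxSketch {
    private final LineageInfoSketch lineage = new LineageInfoSketch();

    LineageInfoSketch getLineage() {
        return lineage;
    }
}

// Session-scoped: outlives individual queries, so it only hands out fresh contexts.
final class SessionStateSketch {
    ExecutionCtxSketch newExecution() {
        return new ExecutionCtxSketch();
    }
}
```

With this split, two queries run in the same session never see each other's 
lineage, which matters once dynamic partitioning produces the mapping at run 
time.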


> Add column lineage information to the pre execution hooks
> ---------------------------------------------------------
>
>                 Key: HIVE-1131
>                 URL: https://issues.apache.org/jira/browse/HIVE-1131
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Ashish Thusoo
>            Assignee: Ashish Thusoo
>         Attachments: HIVE-1131.patch, HIVE-1131_2.patch, HIVE-1131_3.patch, 
> HIVE-1131_4.patch
>
>
> We need a mechanism to pass the lineage information of the various columns of 
> a table to a pre execution hook so that applications can use that for:
> - auditing
> - dependency checking
> and many other applications.
> The proposal is to expose this through a bunch of classes to the pre 
> execution hook interface to the clients and put in the necessary 
> transformation logic in the optimizer to generate this information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
