[ 
https://issues.apache.org/jira/browse/HIVE-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Zhang updated HIVE-1348:
-----------------------------

    Attachment: HIVE-1348.4.patch

Since Yongqiang is tied up with other tasks, I'm uploading a new patch, 
HIVE-1348.4.patch, to simplify ExecMapperContext and the logic that checks 
whether the input file has changed. 

It differs from the previous version in the following ways:

1) lastInputFile in ExecMapperContext is now modified only by resetRow(), 
which should be called exactly once for each new row by the root of the 
operator tree -- ExecMapper.map(). No other operator in the operator tree 
should change it. 

2) Removed the variable inputFileChanged from ExecMapperContext and simplified 
the function inputFileChanged() so that any operator in the operator tree can 
call it, and can call it multiple times. 

3) currentInputFile is updated only by inputFileChanged(). If that function 
is not called, the variable does not need to be updated.
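
The three points above can be sketched roughly as follows. This is an 
illustrative reconstruction, not the code in the patch: the class name 
ExecMapperContextSketch and the Supplier-based lookup are assumptions that 
stand in for the real ExecMapperContext and its JobConf lookup.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for ExecMapperContext; names other than
// lastInputFile, currentInputFile, resetRow() and inputFileChanged()
// are assumptions made for this sketch.
class ExecMapperContextSketch {
    private String lastInputFile = null;
    private String currentInputFile = null;

    // Assumption: stands in for the (expensive) JobConf lookup that
    // returns the path of the file currently being read.
    private final Supplier<String> inputFileLookup;

    ExecMapperContextSketch(Supplier<String> inputFileLookup) {
        this.inputFileLookup = inputFileLookup;
    }

    // Point 1: the only place lastInputFile is written. Intended to be
    // called once per row by the root of the operator tree.
    void resetRow() {
        lastInputFile = currentInputFile;
    }

    // Points 2 and 3: no cached boolean flag; currentInputFile is
    // refreshed only here. Repeated calls within one row give the same
    // answer because lastInputFile does not change until resetRow().
    boolean inputFileChanged() {
        currentInputFile = inputFileLookup.get();
        return lastInputFile == null
                || !lastInputFile.equals(currentInputFile);
    }
}
```

In this sketch, a row whose operators never call inputFileChanged() performs 
no lookup at all, which is the cost the issue is trying to avoid.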

> Moving inputFileChanged() from ExecMapper to where it is needed
> ---------------------------------------------------------------
>
>                 Key: HIVE-1348
>                 URL: https://issues.apache.org/jira/browse/HIVE-1348
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Ning Zhang
>            Assignee: He Yongqiang
>         Attachments: hive-1348.1.patch, hive-1348.2.patch, hive-1348.3.patch, 
> HIVE-1348.4.patch
>
>
> inputFileChanged() is only needed for bucketed sort-merge map join. It should 
> not be put in ExecMapper.map(), where every code path hits this function. 
> This function is quite expensive since a JobConf lookup is a hash table 
> lookup. 

