[ 
https://issues.apache.org/jira/browse/HIVE-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978997#action_12978997
 ] 

Ning Zhang commented on HIVE-1900:
----------------------------------

Namit, do you mean bucketized sort-merge join? In that case, don't you need to 
use a specialized InputFormat and RecordReader? If we allow mappers to get inputs 
from multiple partitions, we need to ensure that HiveInputFormat, 
CombineHiveInputFormat, and the RecordReaders are partition-aware. 
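
As a rough illustration of what "partition-aware" could mean for a reader, here 
is a minimal Python sketch. It is not Hive's RecordReader API: `read_split`, the 
shape of `combined_split`, and the sample `ds` values are all hypothetical. The 
idea is only that each record is handed to the mapper together with the 
partition it came from.

```python
def read_split(split):
    """Yield (partition_spec, record) pairs for one combined split.

    `split` is a list of (partition_spec, records) chunks: a hypothetical
    stand-in for files from several partitions packed into one mapper's input.
    """
    for partition_spec, records in split:
        for record in records:
            # Tag every record with its source partition so that downstream
            # operators never have to guess which partition a row belongs to.
            yield partition_spec, record


# Made-up input: one mapper receives chunks from two partitions.
combined_split = [
    ({"ds": "2011-01-01"}, ["a", "b"]),
    ({"ds": "2011-01-02"}, ["c"]),
]
tagged = list(read_split(combined_split))
```

With the partition attached to every record, the operators downstream can 
switch partition context whenever the tag changes.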

2) is important because we don't want to merge different partitions into one 
file. Otherwise you need a dynamic partition insert for the merge, which may 
generate multiple small files for a partition again. 

3) If TableScanOperator can take multiple partitions, the stats have to be 
gathered according to the input partition column values. Currently the 
partition column value is checked only for the first row, and all rows are 
assumed to have the same partition column value. If we allow multiple 
partitions in a mapper, we have to check the partition column values for each 
row. 
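
To make point 3 concrete, here is a minimal Python sketch of per-row stats 
gathering. It is not the actual TableScanOperator code: `gather_stats`, the 
dictionary row layout, and the sample `ds` values are assumptions for 
illustration. The row count is keyed on each row's own partition column values 
rather than on the values seen in the first row.

```python
from collections import Counter


def gather_stats(rows, partition_cols):
    """Count rows per partition by inspecting every row's partition values."""
    counts = Counter()
    for row in rows:
        # Build the key from this row, not from the first row of the input,
        # so rows from different partitions are counted separately.
        key = tuple(row[col] for col in partition_cols)
        counts[key] += 1
    return dict(counts)


# Made-up rows spanning two partitions within a single mapper.
rows = [
    {"ds": "2011-01-01", "v": 1},
    {"ds": "2011-01-01", "v": 2},
    {"ds": "2011-01-02", "v": 3},
]
stats = gather_stats(rows, ["ds"])
```

The cost is one key lookup per row. If rows arrive grouped by partition, the 
check could be reduced to detecting when the partition changes.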

> a mapper should be able to span multiple partitions
> ---------------------------------------------------
>
>                 Key: HIVE-1900
>                 URL: https://issues.apache.org/jira/browse/HIVE-1900
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: He Yongqiang
>
> Currently, a mapper only spans a single partition, which creates a problem in 
> the presence of many small partitions (which is becoming a common use case 
> at Facebook).
> If the plan is the same, a mapper should be able to span files across 
> multiple partitions.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
