[
https://issues.apache.org/jira/browse/HIVE-8700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Suhas Satish updated HIVE-8700:
-------------------------------
Attachment: HIVE-8700.3-spark.patch
> Replace ReduceSink to HashTableSink (or equi.) for small tables [Spark Branch]
> ------------------------------------------------------------------------------
>
> Key: HIVE-8700
> URL: https://issues.apache.org/jira/browse/HIVE-8700
> Project: Hive
> Issue Type: Sub-task
> Components: Spark
> Reporter: Xuefu Zhang
> Assignee: Suhas Satish
> Attachments: HIVE-8700-spark.patch, HIVE-8700.2-spark.patch,
> HIVE-8700.3-spark.patch, HIVE-8700.patch
>
>
> With HIVE-8616 enabled, the new plan has a ReduceSinkOperator for the small
> tables. For example, the following is the operator plan for the small
> table dec1, derived from the query {code}explain select /*+ MAPJOIN(dec)*/ * from
> dec join dec1 on dec.value=dec1.d;{code}
> {code}
> Map 2
>     Map Operator Tree:
>         TableScan
>           alias: dec1
>           Statistics: Num rows: 0 Data size: 107 Basic stats: PARTIAL Column stats: NONE
>           Filter Operator
>             predicate: d is not null (type: boolean)
>             Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
>             Reduce Output Operator
>               key expressions: d (type: decimal(5,2))
>               sort order: +
>               Map-reduce partition columns: d (type: decimal(5,2))
>               Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE
>               value expressions: i (type: int)
> {code}
> With the new design for broadcasting small tables, we need to replace the
> ReduceSinkOperator with a HashTableSinkOperator or an equivalent in the new plan.
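For illustration only, the replacement described above can be sketched on a toy operator tree. This is not Hive code: the Op class, the replace_reduce_sinks function, and the choice of which properties to carry over are all assumptions made for the example. In particular, dropping the sort order and partition columns is a guess at what a broadcast sink would no longer need, not something the issue specifies.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Op:
    """Toy plan node (hypothetical). `kind` stands in for an operator class name."""
    kind: str
    props: Dict[str, str] = field(default_factory=dict)
    children: List["Op"] = field(default_factory=list)


def replace_reduce_sinks(op: Op) -> int:
    """Swap every ReduceSink below `op` for a HashTableSink, in place.

    Returns the number of operators replaced. Key and value expressions are
    carried over; the remaining properties are dropped (an assumption of
    this sketch, see the lead-in).
    """
    replaced = 0
    for i, child in enumerate(op.children):
        if child.kind == "ReduceSink":
            kept = {k: v for k, v in child.props.items()
                    if k in ("key expressions", "value expressions")}
            op.children[i] = Op("HashTableSink", kept, child.children)
            replaced += 1
        # Recurse into whatever node now sits at this position.
        replaced += replace_reduce_sinks(op.children[i])
    return replaced


# Small-table branch from the plan above: TableScan -> Filter -> ReduceSink.
sink = Op("ReduceSink", {
    "key expressions": "d (type: decimal(5,2))",
    "sort order": "+",
    "Map-reduce partition columns": "d (type: decimal(5,2))",
    "value expressions": "i (type: int)",
})
small_table = Op("TableScan", {"alias": "dec1"},
                 [Op("Filter", {"predicate": "d is not null"}, [sink])])

count = replace_reduce_sinks(small_table)
new_sink = small_table.children[0].children[0]
print(count, new_sink.kind, sorted(new_sink.props))
```

The sketch rewrites the tree in place and leaves the TableScan and Filter nodes untouched, so only the terminal operator of the small-table branch changes.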
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)