[ 
https://issues.apache.org/jira/browse/HIVE-14112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated HIVE-14112:
-------------------------------
    Description: 
Two tables, _hbasetable_risk_control_defense_idx_uid_ is HBase mapped table:
{noformat}
[root@dev01 ~]# hadoop fs -du -s -h /hbase/data/tandem/hbase-table-risk-control-defense-idx-uid
3.0 G  9.0 G  /hbase/data/tandem/hbase-table-risk-control-defense-idx-uid
[root@dev01 ~]# hadoop fs -du -s -h /user/hive/warehouse/openapi_invoke_base
6.6 G  19.7 G  /user/hive/warehouse/openapi_invoke_base
{noformat}
The smaller table is 3.0 G, which is greater than both 
_hive.mapjoin.smalltable.filesize_ and 
_hive.auto.convert.join.noconditionaltask.size_. Yet when these tables are 
joined, Hive automatically converts the join to a map join:
{noformat}
hive> select count(*) from hbasetable_risk_control_defense_idx_uid t1 join openapi_invoke_base t2 on (t1.key=t2.merchantid);
Query ID = root_20160628092222_9f9d3f25-857b-412c-8a75-3d9228bd5ee5
Total jobs = 1
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Execution log at: /tmp/root/root_20160628092222_9f9d3f25-857b-412c-8a75-3d9228bd5ee5.log
2016-06-28 09:22:10     Starting to launch local task to process map join;      maximum memory = 1908932608
{noformat} 
The root cause is that Hive uses 
{/user/hive/warehouse/hbasetable_risk_control_defense_idx_uid} as the table's 
location, but that directory is empty, so Hive treats the table as small and 
automatically converts the join to a map join.
My suggestion is to set the correct location when mapping an HBase table.

  was:
Two tables, _hbasetable_risk_control_defense_idx_uid_ is HBase mapped table:
{noformat}
[root@dev01 ~]# hadoop fs -du -s -h /hbase/data/tandem/hbase-table-risk-control-defense-idx-uid
3.0 G  9.0 G  /hbase/data/tandem/hbase-table-risk-control-defense-idx-uid
[root@dev01 ~]# hadoop fs -du -s -h /user/hive/warehouse/openapi_invoke_base
6.6 G  19.7 G  /user/hive/warehouse/openapi_invoke_base
{noformat}
The smaller table is 3.0 G, which is greater than both 
_hive.mapjoin.smalltable.filesize_ and 
_hive.auto.convert.join.noconditionaltask.size_. Yet when these tables are 
joined, Hive automatically converts the join to a map join:
{noformat}
hive> select count(*) from hbasetable_risk_control_defense_idx_uid t1 join openapi_invoke_base t2 on (t1.key=t2.merchantid);
Query ID = root_20160628092222_9f9d3f25-857b-412c-8a75-3d9228bd5ee5
Total jobs = 1
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Execution log at: /tmp/root/root_20160628092222_9f9d3f25-857b-412c-8a75-3d9228bd5ee5.log
2016-06-28 09:22:10     Starting to launch local task to process map join;      maximum memory = 1908932608
{noformat} 
The root cause is that Hive uses 
_/user/hive/warehouse/hbasetable_risk_control_defense_idx_uid_ as the table's 
location, but that directory is empty, so Hive treats the table as small and 
automatically converts the join to a map join.
My suggestion is to set the correct location when mapping an HBase table.


> Join a HBase mapped big table shouldn't convert to MapJoin
> ----------------------------------------------------------
>
>                 Key: HIVE-14112
>                 URL: https://issues.apache.org/jira/browse/HIVE-14112
>             Project: Hive
>          Issue Type: Bug
>          Components: StorageHandler
>    Affects Versions: 1.2.0, 1.1.0
>            Reporter: Yuming Wang
>            Assignee: Yuming Wang
>            Priority: Minor
>         Attachments: HIVE-14112.1.patch
>
>
> Two tables, _hbasetable_risk_control_defense_idx_uid_ is HBase mapped table:
> {noformat}
> [root@dev01 ~]# hadoop fs -du -s -h /hbase/data/tandem/hbase-table-risk-control-defense-idx-uid
> 3.0 G  9.0 G  /hbase/data/tandem/hbase-table-risk-control-defense-idx-uid
> [root@dev01 ~]# hadoop fs -du -s -h /user/hive/warehouse/openapi_invoke_base
> 6.6 G  19.7 G  /user/hive/warehouse/openapi_invoke_base
> {noformat}
> The smaller table is 3.0 G, which is greater than both 
> _hive.mapjoin.smalltable.filesize_ and 
> _hive.auto.convert.join.noconditionaltask.size_. Yet when these tables are 
> joined, Hive automatically converts the join to a map join:
> {noformat}
> hive> select count(*) from hbasetable_risk_control_defense_idx_uid t1 join openapi_invoke_base t2 on (t1.key=t2.merchantid);
> Query ID = root_20160628092222_9f9d3f25-857b-412c-8a75-3d9228bd5ee5
> Total jobs = 1
> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
> Execution log at: /tmp/root/root_20160628092222_9f9d3f25-857b-412c-8a75-3d9228bd5ee5.log
> 2016-06-28 09:22:10   Starting to launch local task to process map join;      maximum memory = 1908932608
> {noformat} 
> The root cause is that Hive uses 
> {/user/hive/warehouse/hbasetable_risk_control_defense_idx_uid} as the 
> table's location, but that directory is empty, so Hive treats the table as 
> small and automatically converts the join to a map join.
> My suggestion is to set the correct location when mapping an HBase table.
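
The size comparison described in the issue can be illustrated with a small Python model. This is only a sketch of the reasoning, not Hive's planner code: the function name would_convert_to_mapjoin is hypothetical, the real conversion logic is more involved, and the 25,000,000-byte threshold is an assumed value for hive.mapjoin.smalltable.filesize that should be checked against the cluster's own configuration. The table sizes are the ones reported by hadoop fs -du in the issue.

```python
# Illustrative model only; this is NOT Hive's actual planner code.
# Assumed threshold for hive.mapjoin.smalltable.filesize (bytes); check your own config.
SMALLTABLE_FILESIZE = 25_000_000

GIB = 1024 ** 3


def would_convert_to_mapjoin(estimated_sizes, threshold=SMALLTABLE_FILESIZE):
    """Return True if the smallest estimated input size is below the threshold."""
    return min(estimated_sizes.values()) < threshold


# Sizes as the planner would see them if it measures the empty warehouse
# directory of the HBase-mapped table instead of the HBase data directory.
seen_by_planner = {
    "hbasetable_risk_control_defense_idx_uid": 0,
    "openapi_invoke_base": int(6.6 * GIB),
}

# Sizes as reported by `hadoop fs -du -s -h` in the issue.
actual = {
    "hbasetable_risk_control_defense_idx_uid": 3 * GIB,
    "openapi_invoke_base": int(6.6 * GIB),
}

print(would_convert_to_mapjoin(seen_by_planner))  # True: the empty directory looks tiny
print(would_convert_to_mapjoin(actual))           # False: 3.0 G exceeds the threshold
```

With the empty location the HBase-mapped table appears to have size 0, so the join qualifies for conversion even though the real data is far above the threshold.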



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
