[jira] [Updated] (HIVE-28583) In the case of subqueries, HIVE often incorrectly uses MAP-JOIN for large tables.

yongzhi.shao (Jira) Fri, 18 Oct 2024 20:18:04 -0700


     [ 
https://issues.apache.org/jira/browse/HIVE-28583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


yongzhi.shao updated HIVE-28583:
--------------------------------
    Description: 
Hello, team.
 
We have found that in version 4.0.1, Hive frequently misestimates table 
sizes and therefore incorrectly chooses MapJoin as an optimisation, which 
ultimately leads to problems such as 
OOM/MapJoinMemoryExhaustionError.
 
We found a typical scenario as follows:
{code:sql}
-- Dataset sizes:
--   select c1,c2,c3 from big_table_2;   50GB
--   big_table_1                         1TB

-- Case 1: uses map join and causes OOM/MapJoinMemoryExhaustionError
select
*
from 
big_table_1    t1
join
(
select
c1,c2,c3
from big_table_2
) t2  on xxxxx;

-- Case 2: uses SMJ (sort-merge join), no map join; the job succeeds
create table t2 as 
select
c1,c2,c3
from big_table_2;

select
*
from 
big_table_1  t1
join
t2  on xxxxx;
{code}
The SQL above runs normally in Hive 3.
Can anyone advise on how to deal with this kind of problem?
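
A possible way to narrow this down, as a sketch only: compare the plan of the failing query with and without automatic map-join conversion, and refresh column statistics on the subquery's source table. The properties below are standard Hive settings, but whether they avoid the failure on 4.0.1 is an assumption that has not been verified here, and the size threshold is purely illustrative. The join condition is still the xxxxx placeholder from the example above and has to be replaced before running.
{code:sql}
-- 1. Inspect the plan of the failing query. Look for "Map Join Operator"
--    and for the "Statistics: ... Data size: ..." estimate on the
--    subquery branch (placeholder join condition kept from the report).
explain
select
*
from
big_table_1 t1
join
(
select
c1,c2,c3
from big_table_2
) t2 on xxxxx;

-- 2. Refresh column statistics, in case the estimate for the projected
--    columns c1,c2,c3 is derived from missing or stale statistics.
analyze table big_table_2 compute statistics for columns;

-- 3. Session-level workaround: disable automatic map-join conversion
--    so that the optimizer falls back to a shuffle-based join.
set hive.auto.convert.join=false;

-- 4. Alternative to step 3: keep conversion enabled but lower the size
--    threshold in bytes (the value here is an illustrative assumption).
-- set hive.auto.convert.join=true;
-- set hive.auto.convert.join.noconditionaltask.size=10000000;
{code}
If the plan no longer contains a Map Join Operator after step 3 and the job succeeds, that would point to the size estimate for the subquery as the trigger rather than the join itself.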
 
 
 
 

  was:
The previous description was identical to the one above (same example SQL 
and same closing question), except that it cited version 4.0 instead of 4.0.1.


> In the case of subqueries, HIVE often incorrectly uses MAP-JOIN for large 
> tables.
> ---------------------------------------------------------------------------------
>
>                 Key: HIVE-28583
>                 URL: https://issues.apache.org/jira/browse/HIVE-28583
>             Project: Hive
>          Issue Type: Bug
>      Security Level: Public(Viewable by anyone) 
>            Reporter: yongzhi.shao
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
