[
https://issues.apache.org/jira/browse/HIVE-28583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yongzhi.shao updated HIVE-28583:
--------------------------------
Description:
Hello. Team.
Currently we found that in version 4.0.1, HIVE many many occasions will be
wrongly estimated table size, and thus incorrectly use MapJoin to optimise,
ultimately leading to a series of problems such as
OOM/MapJoinMemoryExhaustionError.
We found a typical scenario as follows:
{code:java}
---dataset size
select
c1,c2,c3
from big_table_2; 50GB
big_table_1 1TB;
----- use map join. and cause oom/MapJoinMemoryExhaustionError
select
*
from
big_table_1 t1
join
(
select
c1,c2,c3
from big_table_2
) t2 on xxxxx;
----- use smj. no map join. job success
create table t2 as
select
c1,c2,c3
from big_table_2;
select
*
from
big_table_1 t1
join
t2 on xxxxx;
{code}
The above SQL can be executed normally in HIVE3.
Can anyone guide me on how to deal with this kind of problem?
was:
Hello. Team.
Currently we found that in version 4.0, HIVE many many occasions will be
wrongly estimated table size, and thus incorrectly use MapJoin to optimise,
ultimately leading to a series of problems such as
OOM/MapJoinMemoryExhaustionError.
We found a typical scenario as follows:
{code:java}
---dataset size
select
c1,c2,c3
from big_table_2; 50GB
big_table_1 1TB;
----- use map join. and cause oom/MapJoinMemoryExhaustionError
select
*
from
big_table_1 t1
join
(
select
c1,c2,c3
from big_table_2
) t2 on xxxxx;
----- use smj. no map join. job success
create table t2 as
select
c1,c2,c3
from big_table_2;
select
*
from
big_table_1 t1
join
t2 on xxxxx;
{code}
The above SQL can be executed normally in HIVE3.
Can anyone guide me on how to deal with this kind of problem?
> In the case of subqueries, HIVE often incorrectly uses MAP-JOIN for large
> tables.
> ---------------------------------------------------------------------------------
>
> Key: HIVE-28583
> URL: https://issues.apache.org/jira/browse/HIVE-28583
> Project: Hive
> Issue Type: Bug
> Security Level: Public(Viewable by anyone)
> Reporter: yongzhi.shao
> Priority: Major
>
> Hello. Team.
>
> Currently we found that in version 4.0.1, HIVE many many occasions will be
> wrongly estimated table size, and thus incorrectly use MapJoin to optimise,
> ultimately leading to a series of problems such as
> OOM/MapJoinMemoryExhaustionError.
>
> We found a typical scenario as follows:
> {code:java}
> ---dataset size
> select
> c1,c2,c3
> from big_table_2; 50GB
>
> big_table_1 1TB;
>
>
> ----- use map join. and cause oom/MapJoinMemoryExhaustionError
> select
> *
> from
> big_table_1 t1
> join
> (
> select
> c1,c2,c3
> from big_table_2
> ) t2 on xxxxx;
>
>
>
> ----- use smj. no map join. job success
> create table t2 as
> select
> c1,c2,c3
> from big_table_2;
> select
> *
> from
> big_table_1 t1
> join
> t2 on xxxxx;
> {code}
> The above SQL can be executed normally in HIVE3.
> Can anyone guide me on how to deal with this kind of problem?
>
>
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)