[ https://issues.apache.org/jira/browse/HIVE-28700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhihua Deng updated HIVE-28700:
-------------------------------
Description:
Steps to repro:
set mapreduce.job.reduces=7;
create table ext(a int);
insert into table ext values(1),(2),(3),(3),(3),(3),(4),(5),(6),(7);
create table full_acid(a int) stored as orc
tblproperties("transactional"="true");
insert overwrite table full_acid select * from ext where a = 3;
insert into table full_acid select * from ext where a != 3 group by a;
select * from full_acid;
alter table full_acid compact 'major' and wait;
select * from full_acid;
After the major compaction, the records with "a = 3" are missing from the
full_acid table.
This issue can occur when a table is overwritten, then inserted into, and then
major-compacted. During the major compaction, the base file happens to sit in a
bucket that has no counterpart among the delta files, so the compactor skips
this base file and every record in it is lost.
was:
Steps to repro:
set mapreduce.job.reduces=7;
create table ext(a int);
insert into table ext values(1),(2),(3),(3),(3),(3),(4),(5),(6),(7);
create table full_acid(a int) stored as orc
tblproperties("transactional"="true");
insert overwrite table full_acid select * from ext where a = 3;
insert into table full_acid select * from ext where a != 3 group by a;
select * from full_acid;
alter table full_acid compact 'major' and wait;
select * from full_acid;
After the major compaction, the full_acid table misses records "a = 3";
This issue might happen on overwriting table then inserting into, following a
major compaction. During the major compaction, due to the accidental bucket on
the base file and no the same bucket found on the delta files, the compactor
will miss this base file, making all records in this file loss.
> MRCompactor may cause data loss when performing the major compaction
> --------------------------------------------------------------------
>
> Key: HIVE-28700
> URL: https://issues.apache.org/jira/browse/HIVE-28700
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 4.0.0, 4.0.1
> Reporter: Zhihua Deng
> Assignee: Zhihua Deng
> Priority: Blocker
> Labels: hive-4.1.0-must, pull-request-available
> Fix For: 4.1.0
>
>
> Steps to repro:
> set mapreduce.job.reduces=7;
> create table ext(a int);
> insert into table ext values(1),(2),(3),(3),(3),(3),(4),(5),(6),(7);
> create table full_acid(a int) stored as orc
> tblproperties("transactional"="true");
> insert overwrite table full_acid select * from ext where a = 3;
> insert into table full_acid select * from ext where a != 3 group by a;
> select * from full_acid;
> alter table full_acid compact 'major' and wait;
> select * from full_acid;
> After the major compaction, the records with "a = 3" are missing from the
> full_acid table.
> This issue can occur when a table is overwritten, then inserted into, and
> then major-compacted. During the major compaction, the base file happens to
> sit in a bucket that has no counterpart among the delta files, so the
> compactor skips this base file and every record in it is lost.
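The failure mode described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Hive's actual compactor code: the bucket ids, the dict-based "file" layout, and the function names `compact_delta_buckets_only` and `compact_all_buckets` are all hypothetical. It shows how a merge that enumerates bucket ids only from the delta files would drop a base bucket that has no matching delta bucket.

```python
from typing import Dict, Iterable, List

# Toy model (hypothetical): a base or delta directory maps bucket id -> rows.
Buckets = Dict[int, List[int]]


def _merge(bucket_ids: Iterable[int], base: Buckets, deltas: List[Buckets]) -> Buckets:
    """Build a new base containing only the given bucket ids."""
    new_base: Buckets = {}
    for bucket_id in sorted(bucket_ids):
        rows = list(base.get(bucket_id, []))
        for delta in deltas:
            rows.extend(delta.get(bucket_id, []))
        new_base[bucket_id] = rows
    return new_base


def compact_delta_buckets_only(base: Buckets, deltas: List[Buckets]) -> Buckets:
    """Flawed merge: visits only bucket ids that appear in some delta."""
    bucket_ids = set()
    for delta in deltas:
        bucket_ids.update(delta)
    return _merge(bucket_ids, base, deltas)


def compact_all_buckets(base: Buckets, deltas: List[Buckets]) -> Buckets:
    """Merge that visits the union of base and delta bucket ids."""
    bucket_ids = set(base)
    for delta in deltas:
        bucket_ids.update(delta)
    return _merge(bucket_ids, base, deltas)


def all_rows(buckets: Buckets) -> List[int]:
    """Flatten every bucket into one sorted list of rows."""
    return sorted(row for rows in buckets.values() for row in rows)


# Assumed layout: the overwrite wrote the four "a = 3" rows into bucket 3,
# and the later insert produced delta buckets for every other value.
base = {3: [3, 3, 3, 3]}
delta = {0: [7], 1: [1], 2: [2], 4: [4], 5: [5], 6: [6]}

print(all_rows(compact_delta_buckets_only(base, [delta])))  # rows with a = 3 are gone
print(all_rows(compact_all_buckets(base, [delta])))         # all ten rows survive
```

In this model the only difference between the two functions is whether the base's own bucket ids are included in the set being merged, which is the gap the description points at.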