[ https://issues.apache.org/jira/browse/HIVE-14980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eugene Koifman resolved HIVE-14980.
-----------------------------------
    Resolution: Not A Bug

fixed in HIVE-15202

> Minor compaction when triggered simultaneously on the same table/partition
> deletes data
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-14980
>                 URL: https://issues.apache.org/jira/browse/HIVE-14980
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore, Transactions
>    Affects Versions: 2.1.0
>            Reporter: Mahipal Jupalli
>            Assignee: Mahipal Jupalli
>            Priority: Critical
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> I have two tables (TABLEA, TABLEB). If I manually trigger a compaction after
> each INSERT into TABLEB from TABLEA, the compactions are picked up
> asynchronously by arbitrary metastore instances and step on each other,
> which causes data to be deleted.
>
> Example:
> TABLEA has 10k rows.
>
> insert into mj.tableb select * from mj.tablea;
> alter table mj.tableb compact 'MINOR';
> insert into mj.tableb select * from mj.tablea;
> alter table mj.tableb compact 'MINOR';
>
> Once all the compactions are complete, I should see 20k rows in TABLEB.
> Instead I see only 10k rows: only the rows inserted before the last
> compaction persist, and the older rows are gone (I believe the old delta
> files are deleted).
>
> To further confirm the bug: if I run only one compaction after the two
> inserts, I see 20k rows in TABLEB.
>
> Proposed Fix:
> I have identified the bug in the code. It requires an additional check in the
> org.apache.hadoop.hive.ql.txn.compactor.Worker class for any active
> compaction on the same table/partition. I will share the details of the fix
> once I have tested it.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
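For illustration only, the "additional check for any active compaction on the table/partition" proposed in the report could be sketched roughly as below. This is not Hive code and not the change made in HIVE-15202: the class `CompactionGuard` and its methods are hypothetical names. It also assumes a single process, whereas the report describes compactions running on several metastore instances, so a real check would have to consult state shared between them.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical guard (not part of Hive): refuses to start a compaction
 * on a table/partition while another one is still marked as active.
 */
class CompactionGuard {
    // Keys of the table/partition targets that currently have a compaction running.
    private final Set<String> active = ConcurrentHashMap.newKeySet();

    // Builds a lookup key; a null partition means the table is unpartitioned.
    private static String key(String db, String table, String partition) {
        return db + "." + table + "/" + (partition == null ? "" : partition);
    }

    /** Returns true if the caller may start compacting, false if one is already active. */
    boolean tryStart(String db, String table, String partition) {
        // Set.add is atomic here and returns false when the key is already present.
        return active.add(key(db, table, partition));
    }

    /** Marks the compaction on this target as finished, whether it succeeded or failed. */
    void finish(String db, String table, String partition) {
        active.remove(key(db, table, partition));
    }
}
```

A worker using such a guard would call `tryStart` before compacting, skip or requeue the request when it returns false, and call `finish` in a `finally` block.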