Karen Coppage created HIVE-24235:
------------------------------------

Summary: Drop and recreate table during MR compaction leaves behind base/delta directory
Key: HIVE-24235
URL: https://issues.apache.org/jira/browse/HIVE-24235
Project: Hive
Issue Type: Bug
Reporter: Karen Coppage
Assignee: Karen Coppage

If a table is dropped and recreated during MR compaction, the table directory and a base (or delta, if minor compaction) directory could be created, with or without data, while the table "does not exist".

E.g.
{code:java}
create table c (i int) stored as orc tblproperties ("NO_AUTO_COMPACTION"="true", "transactional"="true");
insert into c values (9);
insert into c values (9);
alter table c compact 'major';

While compaction job is running: {
drop table c;
create table c (i int) stored as orc tblproperties ("NO_AUTO_COMPACTION"="true", "transactional"="true");
}
{code}

The table directory should be empty, but table directory could look like this after the job is finished:
{code:java}
Oct 6 14:23 c/base_0000002_v0000101/._orc_acid_version.crc
Oct 6 14:23 c/base_0000002_v0000101/.bucket_00000.crc
Oct 6 14:23 c/base_0000002_v0000101/_orc_acid_version
Oct 6 14:23 c/base_0000002_v0000101/bucket_00000
{code}

or perhaps just:
{code:java}
Oct 6 14:23 c/base_0000002_v0000101/._orc_acid_version.crc
Oct 6 14:23 c/base_0000002_v0000101/_orc_acid_version
{code}

Insert another row and you have:
{code:java}
Oct 6 14:33 base_0000002_v0000101/
Oct 6 14:33 base_0000002_v0000101/._orc_acid_version.crc
Oct 6 14:33 base_0000002_v0000101/.bucket_00000.crc
Oct 6 14:33 base_0000002_v0000101/_orc_acid_version
Oct 6 14:33 base_0000002_v0000101/bucket_00000
Oct 6 14:35 delta_0000001_0000001_0000/._orc_acid_version.crc
Oct 6 14:35 delta_0000001_0000001_0000/.bucket_00000_0.crc
Oct 6 14:35 delta_0000001_0000001_0000/_orc_acid_version
Oct 6 14:35 delta_0000001_0000001_0000/bucket_00000_0
{code}

Selecting from the table will result in this error because the highest valid writeId for this table is 1:
{code:java}
thrift.ThriftCLIService: Error fetching results: org.apache.hive.service.cli.HiveSQLException: Unable to get the next row set
    at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:482) ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
    ...
Caused by: java.io.IOException: java.lang.RuntimeException: ORC split generation failed with exception: java.io.IOException: Not enough history available for (1,x). Oldest available base: .../warehouse/b/base_0000004_v0000092
{code}

Solution: Resolve the table again after compaction is finished; compare the id with the table id from when compaction began. If the ids do not match, abort the compaction's transaction.
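
The proposed check could be sketched roughly as follows. This is only an illustration of the compare-ids-then-abort idea, not the actual patch: `CompactionTableIdCheck`, `FakeMetastore`, `Outcome`, `decide` and `compact` are hypothetical names invented for the example and are not Hive APIs, and the in-memory map merely stands in for resolving the table through the metastore.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of the proposed check; none of these names are Hive APIs. */
final class CompactionTableIdCheck {

    /** Stand-in for the metastore: maps a table name to the id of its current incarnation. */
    static final class FakeMetastore {
        private final Map<String, Long> tableIds = new HashMap<>();
        private long nextId = 1;

        /** Creating (or recreating) a table always assigns a fresh id. */
        public void createTable(String name) {
            tableIds.put(name, nextId++);
        }

        public void dropTable(String name) {
            tableIds.remove(name);
        }

        /** Empty if the table does not currently exist. */
        public Optional<Long> resolveTableId(String name) {
            return Optional.ofNullable(tableIds.get(name));
        }
    }

    enum Outcome { COMMIT, ABORT }

    /** Commit only if the table still exists and has the id seen when compaction began. */
    static Outcome decide(long idAtStart, Optional<Long> idAfterJob) {
        boolean sameTable = idAfterJob.isPresent() && idAfterJob.get() == idAtStart;
        return sameTable ? Outcome.COMMIT : Outcome.ABORT;
    }

    /** Records the table id, runs the (long) compaction job, then resolves the table again. */
    static Outcome compact(FakeMetastore metastore, String table, Runnable job) {
        long idAtStart = metastore.resolveTableId(table)
                .orElseThrow(() -> new IllegalStateException("no such table: " + table));
        job.run(); // the table may be dropped and recreated while this runs
        return decide(idAtStart, metastore.resolveTableId(table));
    }
}
```

In this sketch, a drop and recreate of `c` while the job runs gives the table a new id, so `compact` returns `ABORT`; an undisturbed run returns `COMMIT`.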