Karen Coppage created HIVE-24235:
------------------------------------

             Summary: Drop and recreate table during MR compaction leaves 
behind base/delta directory
                 Key: HIVE-24235
                 URL: https://issues.apache.org/jira/browse/HIVE-24235
             Project: Hive
          Issue Type: Bug
            Reporter: Karen Coppage
            Assignee: Karen Coppage


If a table is dropped and recreated while an MR compaction job is running, the 
job can still create the table directory and a base directory (or a delta 
directory, for minor compaction), with or without data, even though the table 
it was compacting no longer exists.

E.g.
{code:sql}
create table c (i int) stored as orc
  tblproperties ("NO_AUTO_COMPACTION"="true", "transactional"="true");
insert into c values (9);
insert into c values (9);
alter table c compact 'major';

-- While the compaction job is running:
drop table c;
create table c (i int) stored as orc
  tblproperties ("NO_AUTO_COMPACTION"="true", "transactional"="true");
{code}
The recreated table's directory should be empty, but after the job has 
finished it could look like this:
{code:java}
Oct  6 14:23 c/base_0000002_v0000101/._orc_acid_version.crc
Oct  6 14:23 c/base_0000002_v0000101/.bucket_00000.crc
Oct  6 14:23 c/base_0000002_v0000101/_orc_acid_version
Oct  6 14:23 c/base_0000002_v0000101/bucket_00000
{code}
or, without a data file, perhaps just:
{code:java}
Oct  6 14:23 c/base_0000002_v0000101/._orc_acid_version.crc
Oct  6 14:23 c/base_0000002_v0000101/_orc_acid_version
{code}
Insert another row and you have:
{code:java}
Oct  6 14:33 base_0000002_v0000101/
Oct  6 14:33 base_0000002_v0000101/._orc_acid_version.crc
Oct  6 14:33 base_0000002_v0000101/.bucket_00000.crc
Oct  6 14:33 base_0000002_v0000101/_orc_acid_version
Oct  6 14:33 base_0000002_v0000101/bucket_00000
Oct  6 14:35 delta_0000001_0000001_0000/._orc_acid_version.crc
Oct  6 14:35 delta_0000001_0000001_0000/.bucket_00000_0.crc
Oct  6 14:35 delta_0000001_0000001_0000/_orc_acid_version
Oct  6 14:35 delta_0000001_0000001_0000/bucket_00000_0
{code}
Selecting from the table then fails with the error below, because the highest 
valid writeId for the recreated table is 1 while the leftover base directory 
was written for a higher writeId:
{code:java}
thrift.ThriftCLIService: Error fetching results: org.apache.hive.service.cli.HiveSQLException: Unable to get the next row set
        at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:482) ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
...
Caused by: java.io.IOException: java.lang.RuntimeException: ORC split generation failed with exception: java.io.IOException: Not enough history available for (1,x).  Oldest available base: .../warehouse/b/base_0000004_v0000092
{code}
Solution: Resolve the table again after compaction is finished; compare the id 
with the table id from when compaction began. If the ids do not match, abort 
the compaction's transaction.
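
The check described above could be sketched roughly as follows. This is only 
an illustration under stated assumptions: TableResolver, CompactionTxn and 
runCompaction are hypothetical names invented for this sketch and are not 
Hive's actual classes or methods. It only shows the compare-then-abort logic.

```java
final class CompactionTableIdCheck {

    /** Hypothetical stand-in for a metastore lookup; not Hive's real API. */
    interface TableResolver {
        /** Returns the id of the table currently registered under this name. */
        long resolveTableId(String tableName);
    }

    /** Hypothetical stand-in for the transaction the compaction runs in. */
    interface CompactionTxn {
        void commit();
        void abort();
    }

    /**
     * Runs the compaction job, then resolves the table a second time.
     * If the table id changed while the job ran (the table was dropped and
     * recreated), the transaction is aborted instead of committed.
     *
     * @return true if the compaction was committed, false if it was aborted
     */
    static boolean runCompaction(String tableName,
                                 TableResolver resolver,
                                 Runnable compactionJob,
                                 CompactionTxn txn) {
        // Remember which table instance the compaction started on.
        long idAtStart = resolver.resolveTableId(tableName);

        compactionJob.run();

        // Resolve again: a recreated table has the same name but a new id.
        long idAtEnd = resolver.resolveTableId(tableName);
        if (idAtStart != idAtEnd) {
            txn.abort();
            return false;
        }
        txn.commit();
        return true;
    }
}
```

In this sketch a dropped-and-recreated table is detected purely by the id 
mismatch; how the aborted compaction's output directory is then cleaned up is 
outside what the sketch covers.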



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
