[jira] [Commented] (HIVE-24235) Drop and recreate table during MR compaction leaves behind base/delta directory

Karen Coppage (Jira) Tue, 27 Jul 2021 05:34:06 -0700


    [ 
https://issues.apache.org/jira/browse/HIVE-24235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17388018#comment-17388018
 ]


Karen Coppage commented on HIVE-24235:
--------------------------------------

The fix from this ticket is useless: if the table is dropped, all related 
compactions are cleared from the metadata, so it tries to fail a compaction 
that no longer exists.

The actual fix for this issue is HIVE-25393.

> Drop and recreate table during MR compaction leaves behind base/delta 
> directory
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-24235
>                 URL: https://issues.apache.org/jira/browse/HIVE-24235
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Karen Coppage
>            Assignee: Karen Coppage
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> If a table is dropped and recreated during MR compaction, the table directory 
> and a base (or delta, if minor compaction) directory could be created, with 
> or without data, while the table "does not exist".
> E.g.
> {code:java}
> create table c (i int) stored as orc tblproperties 
> ("NO_AUTO_COMPACTION"="true", "transactional"="true");
> insert into c values (9);
> insert into c values (9);
> alter table c compact 'major';
> -- While the compaction job is running:
> drop table c;
> create table c (i int) stored as orc tblproperties 
> ("NO_AUTO_COMPACTION"="true", "transactional"="true");
> {code}
> The table directory should be empty, but it could look like this after the 
> job is finished:
> {code:java}
> Oct  6 14:23 c/base_0000002_v0000101/._orc_acid_version.crc
> Oct  6 14:23 c/base_0000002_v0000101/.bucket_00000.crc
> Oct  6 14:23 c/base_0000002_v0000101/_orc_acid_version
> Oct  6 14:23 c/base_0000002_v0000101/bucket_00000
> {code}
> or perhaps just: 
> {code:java}
> Oct  6 14:23 c/base_0000002_v0000101/._orc_acid_version.crc
> Oct  6 14:23 c/base_0000002_v0000101/_orc_acid_version
> {code}
> Insert another row and you have:
> {code:java}
> Oct  6 14:33 base_0000002_v0000101/
> Oct  6 14:33 base_0000002_v0000101/._orc_acid_version.crc
> Oct  6 14:33 base_0000002_v0000101/.bucket_00000.crc
> Oct  6 14:33 base_0000002_v0000101/_orc_acid_version
> Oct  6 14:33 base_0000002_v0000101/bucket_00000
> Oct  6 14:35 delta_0000001_0000001_0000/._orc_acid_version.crc
> Oct  6 14:35 delta_0000001_0000001_0000/.bucket_00000_0.crc
> Oct  6 14:35 delta_0000001_0000001_0000/_orc_acid_version
> Oct  6 14:35 delta_0000001_0000001_0000/bucket_00000_0
> {code}
> Selecting from the table will result in this error because the highest valid 
> writeId for this table is 1:
> {code:java}
> thrift.ThriftCLIService: Error fetching results: 
> org.apache.hive.service.cli.HiveSQLException: Unable to get the next row set
>         at 
> org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:482)
>  ~[hive-service-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
> ...
> Caused by: java.io.IOException: java.lang.RuntimeException: ORC split 
> generation failed with exception: java.io.IOException: Not enough history 
> available for (1,x).  Oldest available base: 
> .../warehouse/b/base_0000004_v0000092
> {code}
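
To make the failure above concrete, here is a minimal sketch of the check a reader effectively applies to a base directory. It is a simplification, not Hive's actual code: the class `BaseDirCheck` and its methods are hypothetical names, and the only facts taken from the description are the directory name format (`base_<writeId>_v<visibilityTxnId>`) and that the highest valid writeId of the recreated table is 1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; illustrates the writeId comparison only. */
final class BaseDirCheck {
    // Matches names such as base_0000002_v0000101 (the _v suffix is treated as optional here).
    private static final Pattern BASE = Pattern.compile("base_(\\d+)(?:_v(\\d+))?");

    /** Extracts the writeId from a base directory name. */
    static long writeIdOf(String dirName) {
        Matcher m = BASE.matcher(dirName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a base directory: " + dirName);
        }
        return Long.parseLong(m.group(1));
    }

    /** Assumption: a base is usable only if its writeId does not exceed the highest valid writeId. */
    static boolean isReadable(String dirName, long highestValidWriteId) {
        return writeIdOf(dirName) <= highestValidWriteId;
    }
}
```

With the leftover directory from the example, `BaseDirCheck.isReadable("base_0000002_v0000101", 1)` is false: the base was written with writeId 2 for the old table, while the recreated table has only allocated writeId 1.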
> Solution: Resolve the table again after compaction is finished; compare the 
> id with the table id from when compaction began. If the ids do not match, 
> abort the compaction's transaction.
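
The proposed solution can be sketched as follows. This only illustrates the id comparison and is not the committed patch: `FakeMetastore` and `CompactionGuard` are hypothetical stand-ins, and the real code would resolve the table through the metastore client instead.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in for the metastore: table name -> current table id. */
final class FakeMetastore {
    private final Map<String, Long> tableIds = new ConcurrentHashMap<>();
    private long nextId = 1;

    /** Creates (or recreates) a table and returns its fresh id. */
    synchronized long createTable(String name) {
        long id = nextId++;
        tableIds.put(name, id);
        return id;
    }

    void dropTable(String name) {
        tableIds.remove(name);
    }

    /** Returns the current id of the table, or empty if it does not exist. */
    Optional<Long> resolveTableId(String name) {
        return Optional.ofNullable(tableIds.get(name));
    }
}

final class CompactionGuard {
    /** True only if the table still exists and has the id recorded when compaction began. */
    static boolean sameTable(FakeMetastore ms, String name, long idAtStart) {
        return ms.resolveTableId(name).map(id -> id == idAtStart).orElse(false);
    }
}
```

A compactor would record the id before launching the job and call `sameTable` once the job finishes. A false result means the table was dropped, or dropped and recreated, so the compaction's transaction should be aborted instead of committed.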



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
