[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-08-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=797529=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-797529
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 03/Aug/22 07:50
Start Date: 03/Aug/22 07:50
Worklog Time Spent: 10m 
  Work Description: deniskuzZ merged PR #3457:
URL: https://github.com/apache/hive/pull/3457




Issue Time Tracking
---

Worklog Id: (was: 797529)
Time Spent: 13.5h  (was: 13h 20m)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 13.5h
>  Remaining Estimate: 0h
>
> When a CTAS query fails after writing the data but before creating the table, 
> the data remains in the directory and is currently not cleaned up by the 
> cleaner or any other mechanism. This is because the cleaner requires a table 
> corresponding to what it is cleaning. To handle this situation, we can pass 
> the relevant information directly to the cleaner so that such uncommitted 
> data is deleted.
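The idea in the description can be sketched in Java as follows. This is a minimal, hypothetical model, not Hive's API: `CleanupRequest` and `CleanupLocationResolver` are illustrative names. In the PR the location travels as the `META_TABLE_LOCATION` property of a `CompactionRequest` passed to `submitForCleanup(rqst, writeId, txnId)`; here it is simply a field. The point is that a request carrying an explicit location can still be acted on when no table exists in the metastore.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical request type; Hive's real request object is CompactionRequest.
final class CleanupRequest {
  final String dbName;
  final String tableName;
  final String location; // explicit data directory, e.g. the CTAS target; may be null
  final long writeId;    // carried along as in submitForCleanup(rqst, writeId, txnId)
  final long txnId;

  CleanupRequest(String dbName, String tableName, String location, long writeId, long txnId) {
    this.dbName = dbName;
    this.tableName = tableName;
    this.location = location;
    this.writeId = writeId;
    this.txnId = txnId;
  }
}

final class CleanupLocationResolver {
  // Picks the directory to clean: an explicit location wins; otherwise fall back
  // to the table's location as registered in the (simulated) metastore map.
  static Optional<String> resolve(CleanupRequest rqst, Map<String, String> tableLocations) {
    if (rqst.location != null) {
      return Optional.of(rqst.location);
    }
    return Optional.ofNullable(tableLocations.get(rqst.dbName + "." + rqst.tableName));
  }
}
```

With an empty metastore map (the CTAS failed before the table was created), a request that carries `/warehouse/atable` still resolves to that path, while a request without a location resolves to nothing, which is exactly the gap the issue describes.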



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-08-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=797189=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-797189
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 02/Aug/22 10:52
Start Date: 02/Aug/22 10:52
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r935414100


##
ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java:
##
@@ -3080,6 +3084,83 @@ public void testAcidOrcWritePreservesFieldNames() throws Exception {
     hiveConf.setBoolVar(HiveConf.ConfVars.HIVE_VECTORIZATION_ENABLED, true);
   }
 
+  @Test
+  public void testFailureScenariosCleanupCTAS() throws Exception {
+    boolean[] booleans = {true, false};
+    for (boolean var1 : booleans) {

Review Comment:
   Done. Nice use of bitwise operators :)
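The exchange refers to replacing the four nested boolean loops with bitwise operators. The revised test code is not quoted in this thread, so the following is only a sketch of the general technique: iterate an `int` mask from 0 to 2^n - 1 and read flag `i` from bit `i`. `FlagCombinations` is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.List;

final class FlagCombinations {
  // Enumerates all 2^n assignments of n boolean flags; bit i of the mask drives flag i.
  static List<boolean[]> all(int n) {
    List<boolean[]> result = new ArrayList<>();
    for (int mask = 0; mask < (1 << n); mask++) {
      boolean[] flags = new boolean[n];
      for (int i = 0; i < n; i++) {
        flags[i] = (mask & (1 << i)) != 0;
      }
      result.add(flags);
    }
    return result;
  }
}
```

In the test this would read `for (boolean[] f : FlagCombinations.all(4)) { failureScenarioCleanupCTAS(f[0], f[1], f[2], f[3]); }`, covering the same 16 combinations as the nested loops with a single loop.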





Issue Time Tracking
---

Worklog Id: (was: 797189)
Time Spent: 13h 20m  (was: 13h 10m)



[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-08-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=797188=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-797188
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 02/Aug/22 10:51
Start Date: 02/Aug/22 10:51
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r935413606


##
ql/src/java/org/apache/hadoop/hive/ql/Driver.java:
##
@@ -905,6 +905,11 @@ public QueryDisplay getQueryDisplay() {
 return driverContext.getQueryDisplay();
   }
 
+  @VisibleForTesting
+  DriverContext getDriverContext() {

Review Comment:
   Removed. Done.





Issue Time Tracking
---

Worklog Id: (was: 797188)
Time Spent: 13h 10m  (was: 13h)



[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-08-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=797142=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-797142
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 02/Aug/22 07:43
Start Date: 02/Aug/22 07:43
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r935221669


##
ql/src/java/org/apache/hadoop/hive/ql/Driver.java:
##
@@ -905,6 +905,11 @@ public QueryDisplay getQueryDisplay() {
 return driverContext.getQueryDisplay();
   }
 
+  @VisibleForTesting
+  DriverContext getDriverContext() {

Review Comment:
   No need to expose the context in the test. If you need the queryPlan, you can simply call:
   
   driver.getPlan()
   





Issue Time Tracking
---

Worklog Id: (was: 797142)
Time Spent: 13h  (was: 12h 50m)



[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-08-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=797141=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-797141
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 02/Aug/22 07:31
Start Date: 02/Aug/22 07:31
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r935205909


##
ql/src/java/org/apache/hadoop/hive/ql/HiveQueryLifeTimeHook.java:
##
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.CompactionRequest;
+import org.apache.hadoop.hive.metastore.api.CompactionType;
+import org.apache.hadoop.hive.metastore.txn.TxnUtils;
+import org.apache.hadoop.hive.ql.hooks.PrivateHookContext;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHook;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHookContext;
+import org.apache.hadoop.hive.ql.metadata.Hive;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.apache.hadoop.hive.ql.plan.FileSinkDesc;
+import org.apache.thrift.TException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import java.io.IOException;
+
+import static org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.IF_PURGE;
+import static org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.META_TABLE_LOCATION;
+
+public class HiveQueryLifeTimeHook implements QueryLifeTimeHook {
+
+  private static final Logger LOG = LoggerFactory.getLogger(HiveQueryLifeTimeHook.class);
+
+  @Override
+  public void beforeCompile(QueryLifeTimeHookContext ctx) {
+
+  }
+
+  @Override
+  public void afterCompile(QueryLifeTimeHookContext ctx, boolean hasError) {
+
+  }
+
+  @Override
+  public void beforeExecution(QueryLifeTimeHookContext ctx) {
+
+  }
+
+  @Override
+  public void afterExecution(QueryLifeTimeHookContext ctx, boolean hasError) {
+    if (hasError) {
+      checkAndRollbackCTAS(ctx);
+    }
+  }
+
+  private void checkAndRollbackCTAS(QueryLifeTimeHookContext ctx) {
+    HiveConf conf = ctx.getHiveConf();
+    PrivateHookContext pCtx = (PrivateHookContext) ctx.getHookContext();
+    QueryPlan queryPlan = ctx.getHookContext().getQueryPlan();
+    if (queryPlan.getAcidSinks() != null && queryPlan.getAcidSinks().size() > 0) {
+      FileSinkDesc fileSinkDesc = queryPlan.getAcidSinks().iterator().next();
+      Table table = fileSinkDesc.getTable();
+      long writeId = fileSinkDesc.getTableWriteId();
+      boolean isCTAS = ctx.getHookContext().getQueryPlan().getQueryProperties().isCTAS();
+      Path destinationPath = pCtx.getContext().getLocation();
+
+      if (destinationPath != null && table != null && isCTAS &&
+          HiveConf.getBoolVar(conf, HiveConf.ConfVars.TXN_CTAS_X_LOCK)) {
+        LOG.info("Performing cleanup as part of rollback: {}", table.getFullTableName().toString());
+        try {
+          CompactionRequest rqst = new CompactionRequest(table.getDbName(), table.getTableName(),
+              CompactionType.MAJOR);
+          rqst.setRunas(TxnUtils.findUserToRunAs(destinationPath.toString(), table.getTTable(), conf));
+          rqst.putToProperties(META_TABLE_LOCATION, destinationPath.toString());
+          rqst.putToProperties(IF_PURGE, Boolean.toString(true));
+          boolean success = Hive.get(conf).getMSC().submitForCleanup(rqst, writeId,
+              pCtx.getQueryState().getTxnManager().getCurrentTxnId());
+          if (success) {
+            LOG.info("The cleanup request has been submitted");
+          } else {
+            LOG.info("The cleanup request has not been submitted");
+          }
+        } catch (HiveException | IOException | InterruptedException | TException e) {

Review Comment:
   We shouldn't catch InterruptedException.
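Background for this comment: methods that throw `InterruptedException` typically clear the thread's interrupt status first, so catching it together with other exceptions and only logging it loses the cancellation signal. The usual remedies are to let it propagate or, when the signature cannot declare it (the hook's `afterExecution` as quoted above has no `throws` clause), to catch it separately and re-assert the interrupt. The sketch below shows the second option; `CleanupCall` and `submitBestEffort` are hypothetical names, and this is not necessarily how the PR resolved the comment.

```java
import java.io.IOException;

final class InterruptAwareCleanup {
  // Hypothetical stand-in for a blocking call such as a metastore RPC.
  interface CleanupCall {
    boolean submit() throws IOException, InterruptedException;
  }

  // Best-effort submission: ordinary I/O failures are reported as "not submitted",
  // but an interrupt is never swallowed silently.
  static boolean submitBestEffort(CleanupCall call) {
    try {
      return call.submit();
    } catch (IOException e) {
      return false;
    } catch (InterruptedException e) {
      // Restore the interrupt flag so code further up the stack can still observe the cancellation.
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```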





Issue Time Tracking
---

Worklog Id: (was: 797141)
Time Spent: 12h 

[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-08-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=797139=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-797139
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 02/Aug/22 07:25
Start Date: 02/Aug/22 07:25
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r935205909


##
ql/src/java/org/apache/hadoop/hive/ql/HiveQueryLifeTimeHook.java:
##
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.CompactionRequest;
+import org.apache.hadoop.hive.metastore.api.CompactionType;
+import org.apache.hadoop.hive.metastore.txn.TxnUtils;
+import org.apache.hadoop.hive.ql.hooks.PrivateHookContext;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHook;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHookContext;
+import org.apache.hadoop.hive.ql.metadata.Hive;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.apache.hadoop.hive.ql.plan.FileSinkDesc;
+import org.apache.thrift.TException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import java.io.IOException;
+
+import static org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.IF_PURGE;
+import static org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.META_TABLE_LOCATION;
+
+public class HiveQueryLifeTimeHook implements QueryLifeTimeHook {
+
+  private static final Logger LOG = LoggerFactory.getLogger(HiveQueryLifeTimeHook.class);
+
+  @Override
+  public void beforeCompile(QueryLifeTimeHookContext ctx) {
+
+  }
+
+  @Override
+  public void afterCompile(QueryLifeTimeHookContext ctx, boolean hasError) {
+
+  }
+
+  @Override
+  public void beforeExecution(QueryLifeTimeHookContext ctx) {
+
+  }
+
+  @Override
+  public void afterExecution(QueryLifeTimeHookContext ctx, boolean hasError) {
+    if (hasError) {
+      checkAndRollbackCTAS(ctx);
+    }
+  }
+
+  private void checkAndRollbackCTAS(QueryLifeTimeHookContext ctx) {
+    HiveConf conf = ctx.getHiveConf();
+    PrivateHookContext pCtx = (PrivateHookContext) ctx.getHookContext();
+    QueryPlan queryPlan = ctx.getHookContext().getQueryPlan();
+    if (queryPlan.getAcidSinks() != null && queryPlan.getAcidSinks().size() > 0) {
+      FileSinkDesc fileSinkDesc = queryPlan.getAcidSinks().iterator().next();
+      Table table = fileSinkDesc.getTable();
+      long writeId = fileSinkDesc.getTableWriteId();
+      boolean isCTAS = ctx.getHookContext().getQueryPlan().getQueryProperties().isCTAS();
+      Path destinationPath = pCtx.getContext().getLocation();
+
+      if (destinationPath != null && table != null && isCTAS &&
+          HiveConf.getBoolVar(conf, HiveConf.ConfVars.TXN_CTAS_X_LOCK)) {
+        LOG.info("Performing cleanup as part of rollback: {}", table.getFullTableName().toString());
+        try {
+          CompactionRequest rqst = new CompactionRequest(table.getDbName(), table.getTableName(),
+              CompactionType.MAJOR);
+          rqst.setRunas(TxnUtils.findUserToRunAs(destinationPath.toString(), table.getTTable(), conf));
+          rqst.putToProperties(META_TABLE_LOCATION, destinationPath.toString());
+          rqst.putToProperties(IF_PURGE, Boolean.toString(true));
+          boolean success = Hive.get(conf).getMSC().submitForCleanup(rqst, writeId,
+              pCtx.getQueryState().getTxnManager().getCurrentTxnId());
+          if (success) {
+            LOG.info("The cleanup request has been submitted");
+          } else {
+            LOG.info("The cleanup request has not been submitted");
+          }
+        } catch (HiveException | IOException | InterruptedException | TException e) {

Review Comment:
   We shouldn't catch InterruptedException.





Issue Time Tracking
---

Worklog Id: (was: 797139)
Time Spent: 12h 

[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-08-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=797137=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-797137
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 02/Aug/22 07:20
Start Date: 02/Aug/22 07:20
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r935201293


##
ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java:
##
@@ -3080,6 +3084,83 @@ public void testAcidOrcWritePreservesFieldNames() throws Exception {
     hiveConf.setBoolVar(HiveConf.ConfVars.HIVE_VECTORIZATION_ENABLED, true);
   }
 
+  @Test
+  public void testFailureScenariosCleanupCTAS() throws Exception {
+    boolean[] booleans = {true, false};
+    for (boolean var1 : booleans) {
+      for (boolean var2 : booleans) {
+        for (boolean var3 : booleans) {
+          for (boolean var4 : booleans) {
+            failureScenarioCleanupCTAS(var1, var2, var3, var4);
+          }
+        }
+      }
+    }
+  }
+
+  public void failureScenarioCleanupCTAS(boolean isPartitioned,
+                                         boolean isDirectInsertEnabled,
+                                         boolean isLocklessReadsEnabled,
+                                         boolean isLocationUsed) throws Exception {
+    String tableName = "atable";
+
+    //Set configurations
+    hiveConf.setBoolVar(HiveConf.ConfVars.HIVE_ACID_DIRECT_INSERT_ENABLED, isDirectInsertEnabled);
+    hiveConf.setBoolVar(HiveConf.ConfVars.HIVE_ACID_LOCKLESS_READS_ENABLED, isLocklessReadsEnabled);
+    hiveConf.setBoolVar(HiveConf.ConfVars.TXN_CTAS_X_LOCK, true);
+    hiveConf.setIntVar(HiveConf.ConfVars.HIVE_COMPACTOR_ABORTEDTXN_THRESHOLD, 0);
+
+    // Add a '1' at the end of table name for custom location.
+    String querylocation = (isLocationUsed) ? " location '" + getWarehouseDir() + "/" + tableName + "1'" : "";
+    String queryPartitions = (isPartitioned) ? " partitioned by (a)" : "";
+
+    d.run("insert into " + Table.ACIDTBL + "(a,b) values (3,4)");
+    d.run("drop table if exists " + tableName);
+    d.compileAndRespond("create table " + tableName + queryPartitions + " stored as orc" + querylocation +
+        " tblproperties ('transactional'='true') as select * from " + Table.ACIDTBL);
+    long txnId = d.getQueryState().getTxnManager().getCurrentTxnId();
+    DriverContext driverContext = d.getDriverContext();
+    traverseTasksRecursively(driverContext.getPlan().getRootTasks());

Review Comment:
   Assigning a task like this won't help us simulate the error, because the write will not happen. Instead, I have simulated the failure in DDLTask by mocking it in a similar way.
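The reasoning here is about where the injected failure sits: to reproduce the bug, the data must already be written when the failure happens, and the table must not exist yet. A plain-Java sketch of that ordering, with hypothetical `Step` and `runPipeline` names standing in for Hive tasks (the actual test mocks `DDLTask`, which is not shown in this thread):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FailureInjectionSketch {
  // Hypothetical stand-in for a Hive task; appends what it did to the log.
  interface Step {
    void run(List<String> log) throws Exception;
  }

  // Runs steps in order and stops at the first failure, recording it in the log.
  static List<String> runPipeline(List<Step> steps) {
    List<String> log = new ArrayList<>();
    for (Step step : steps) {
      try {
        step.run(log);
      } catch (Exception e) {
        log.add("failed: " + e.getMessage());
        break;
      }
    }
    return log;
  }

  // The write succeeds, then the DDL step fails: data exists but no table was created.
  static List<String> ctasWithFailingDdl() {
    Step write = log -> log.add("data written");
    Step ddl = log -> { throw new Exception("DDL failed"); };
    return runPipeline(Arrays.asList(write, ddl));
  }
}
```

Failing the write step instead would stop the pipeline before any data exists, which is why it cannot reproduce the leftover-data scenario.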





Issue Time Tracking
---

Worklog Id: (was: 797137)
Time Spent: 12.5h  (was: 12h 20m)



[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-08-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=797135=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-797135
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 02/Aug/22 07:19
Start Date: 02/Aug/22 07:19
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r935200088


##
ql/src/java/org/apache/hadoop/hive/ql/HiveQueryLifeTimeHook.java:
##
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.CompactionRequest;
+import org.apache.hadoop.hive.metastore.api.CompactionType;
+import org.apache.hadoop.hive.metastore.txn.TxnUtils;
+import org.apache.hadoop.hive.ql.hooks.PrivateHookContext;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHook;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHookContext;
+import org.apache.hadoop.hive.ql.metadata.Hive;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.apache.hadoop.hive.ql.plan.HiveOperation;
+import org.apache.thrift.TException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import java.io.IOException;
+
+import static org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.IF_PURGE;
+import static org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.META_TABLE_LOCATION;
+
+public class HiveQueryLifeTimeHook implements QueryLifeTimeHook {
+
+  private static final Logger LOG = LoggerFactory.getLogger(HiveQueryLifeTimeHook.class);
+
+  @Override
+  public void beforeCompile(QueryLifeTimeHookContext ctx) {
+
+  }
+
+  @Override
+  public void afterCompile(QueryLifeTimeHookContext ctx, boolean hasError) {
+
+  }
+
+  @Override
+  public void beforeExecution(QueryLifeTimeHookContext ctx) {
+
+  }
+
+  @Override
+  public void afterExecution(QueryLifeTimeHookContext ctx, boolean hasError) {
+    if (hasError) {
+      checkAndRollbackCTAS(ctx);
+    }
+  }
+
+  private void checkAndRollbackCTAS(QueryLifeTimeHookContext ctx) {
+    HiveConf conf = ctx.getHiveConf();
+    if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.TXN_CTAS_X_LOCK)) {
+      PrivateHookContext pCtx = (PrivateHookContext) ctx.getHookContext();
+      Table table = ctx.getHookContext().getQueryPlan().getAcidSinks().iterator().next().getTable();

Review Comment:
   Done.



##
ql/src/java/org/apache/hadoop/hive/ql/Context.java:
##
@@ -191,6 +191,8 @@ public class Context {
 
   private List> parsedTables = new ArrayList<>();
 
+  private Path destinationPath;

Review Comment:
   Done.





Issue Time Tracking
---

Worklog Id: (was: 797135)
Time Spent: 12h 10m  (was: 12h)



[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-08-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=797136=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-797136
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 02/Aug/22 07:19
Start Date: 02/Aug/22 07:19
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r935200343


##
ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java:
##
@@ -3080,6 +3084,83 @@ public void testAcidOrcWritePreservesFieldNames() throws Exception {
     hiveConf.setBoolVar(HiveConf.ConfVars.HIVE_VECTORIZATION_ENABLED, true);
   }
 
+  @Test
+  public void testFailureScenariosCleanupCTAS() throws Exception {
+    boolean[] booleans = {true, false};
+    for (boolean var1 : booleans) {
+      for (boolean var2 : booleans) {
+        for (boolean var3 : booleans) {
+          for (boolean var4 : booleans) {
+            failureScenarioCleanupCTAS(var1, var2, var3, var4);
+          }
+        }
+      }
+    }
+  }
+
+  public void failureScenarioCleanupCTAS(boolean isPartitioned,
+                                         boolean isDirectInsertEnabled,
+                                         boolean isLocklessReadsEnabled,
+                                         boolean isLocationUsed) throws Exception {
+    String tableName = "atable";
+
+    //Set configurations
+    hiveConf.setBoolVar(HiveConf.ConfVars.HIVE_ACID_DIRECT_INSERT_ENABLED, isDirectInsertEnabled);
+    hiveConf.setBoolVar(HiveConf.ConfVars.HIVE_ACID_LOCKLESS_READS_ENABLED, isLocklessReadsEnabled);
+    hiveConf.setBoolVar(HiveConf.ConfVars.TXN_CTAS_X_LOCK, true);
+    hiveConf.setIntVar(HiveConf.ConfVars.HIVE_COMPACTOR_ABORTEDTXN_THRESHOLD, 0);
+
+    // Add a '1' at the end of table name for custom location.
+    String querylocation = (isLocationUsed) ? " location '" + getWarehouseDir() + "/" + tableName + "1'" : "";
+    String queryPartitions = (isPartitioned) ? " partitioned by (a)" : "";
+
+    d.run("insert into " + Table.ACIDTBL + "(a,b) values (3,4)");
+    d.run("drop table if exists " + tableName);
+    d.compileAndRespond("create table " + tableName + queryPartitions + " stored as orc" + querylocation +
+        " tblproperties ('transactional'='true') as select * from " + Table.ACIDTBL);
+    long txnId = d.getQueryState().getTxnManager().getCurrentTxnId();
+    DriverContext driverContext = d.getDriverContext();
+    traverseTasksRecursively(driverContext.getPlan().getRootTasks());
+    int assertError = 0;
+    try {
+      d.run();
+    } catch (Exception e) {
+      assertError = 1;
+    }
+
+    runInitiator(hiveConf);

Review Comment:
   Removed. Done.





Issue Time Tracking
---

Worklog Id: (was: 797136)
Time Spent: 12h 20m  (was: 12h 10m)



[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-08-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=797134=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-797134
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 02/Aug/22 07:19
Start Date: 02/Aug/22 07:19
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r935199794


##
ql/src/java/org/apache/hadoop/hive/ql/HiveQueryLifeTimeHook.java:
##
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.CompactionRequest;
+import org.apache.hadoop.hive.metastore.api.CompactionType;
+import org.apache.hadoop.hive.metastore.txn.TxnUtils;
+import org.apache.hadoop.hive.ql.hooks.PrivateHookContext;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHook;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHookContext;
+import org.apache.hadoop.hive.ql.metadata.Hive;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.apache.hadoop.hive.ql.plan.HiveOperation;
+import org.apache.thrift.TException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import java.io.IOException;
+
+import static org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.IF_PURGE;
+import static org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.META_TABLE_LOCATION;
+
+public class HiveQueryLifeTimeHook implements QueryLifeTimeHook {
+
+  private static final Logger LOG = LoggerFactory.getLogger(HiveQueryLifeTimeHook.class);
+
+  @Override
+  public void beforeCompile(QueryLifeTimeHookContext ctx) {
+
+  }
+
+  @Override
+  public void afterCompile(QueryLifeTimeHookContext ctx, boolean hasError) {
+
+  }
+
+  @Override
+  public void beforeExecution(QueryLifeTimeHookContext ctx) {
+
+  }
+
+  @Override
+  public void afterExecution(QueryLifeTimeHookContext ctx, boolean hasError) {
+    if (hasError) {
+      checkAndRollbackCTAS(ctx);
+    }
+  }
+
+  private void checkAndRollbackCTAS(QueryLifeTimeHookContext ctx) {
+    HiveConf conf = ctx.getHiveConf();
+    if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.TXN_CTAS_X_LOCK)) {
+      PrivateHookContext pCtx = (PrivateHookContext) ctx.getHookContext();
+      Table table = ctx.getHookContext().getQueryPlan().getAcidSinks().iterator().next().getTable();

Review Comment:
   Made sure that there is at least one `AcidSink`. Done.
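Calling `iterator().next()` on an empty collection throws `NoSuchElementException`, so the "at least one `AcidSink`" guard matters. A minimal sketch of that guard, assuming the sinks can be treated as a plain `Collection`; the helper name `first` and the use of table-name strings in place of Hive's sink objects are hypothetical, not the patch's actual code:

```java
import java.util.Collection;
import java.util.List;
import java.util.Optional;

final class AcidSinkGuard {
  private AcidSinkGuard() {}

  // Returns the first element if there is one, instead of calling
  // iterator().next() blindly (which throws on an empty collection).
  static <T> Optional<T> first(Collection<T> items) {
    if (items == null || items.isEmpty()) {
      return Optional.empty();
    }
    return Optional.ofNullable(items.iterator().next());
  }

  public static void main(String[] args) {
    // Hypothetical stand-ins for the query plan's ACID sinks.
    List<String> noSinks = List.of();
    List<String> oneSink = List.of("default.atable");
    System.out.println(first(noSinks).isPresent());       // false
    System.out.println(first(oneSink).orElse("<none>"));  // default.atable
  }
}
```

Returning an `Optional` lets the hook simply skip the rollback when there is no sink, rather than failing inside an error-handling path.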




[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-08-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=797133=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-797133
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 02/Aug/22 07:18
Start Date: 02/Aug/22 07:18
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r934701343


##
ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java:
##
@@ -3080,6 +3084,83 @@ public void testAcidOrcWritePreservesFieldNames() throws Exception {
 hiveConf.setBoolVar(HiveConf.ConfVars.HIVE_VECTORIZATION_ENABLED, true);
   }
 
+  @Test
+  public void testFailureScenariosCleanupCTAS() throws Exception {
+    boolean[] booleans = {true, false};
+    for (boolean var1 : booleans) {

Review Comment:
   Could we create a parameterized test:
   
   private static Stream<Arguments> generateArgs() {
     return IntStream.range(0, 1 << 4).mapToObj(i ->
         Arguments.of((i & 1) != 0, ((i >>> 1) & 1) != 0, ((i >>> 2) & 1) != 0,
             ((i >>> 3) & 1) != 0));
   }
   
   @ParameterizedTest
   @MethodSource("generateArgs")
   public void failureScenarioCleanupCTAS(boolean isPartitioned,
                                         boolean isDirectInsertEnabled,
                                         boolean isLocklessReadsEnabled,
                                         boolean isLocationUsed) throws Exception {
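The bitmask in `generateArgs()` decodes each integer in `[0, 16)` into four booleans (bit k becomes flag k), which is the same set of 16 combinations the four nested loops produce, just in a different order. A small JUnit-free sketch that checks this equivalence; the class and method names are illustrative only:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class FlagCombinations {
  private FlagCombinations() {}

  // Bitmask approach: bit k of i becomes flag k, for i in [0, 2^n).
  static List<List<Boolean>> viaBitmask(int n) {
    return IntStream.range(0, 1 << n)
        .mapToObj(i -> IntStream.range(0, n)
            .mapToObj(k -> ((i >>> k) & 1) != 0)
            .collect(Collectors.toList()))
        .collect(Collectors.toList());
  }

  // Nested-loop approach from the original test, generalised to n flags by recursion.
  static List<List<Boolean>> viaNestedLoops(int n) {
    List<List<Boolean>> out = new ArrayList<>();
    fill(n, new ArrayList<>(), out);
    return out;
  }

  private static void fill(int remaining, List<Boolean> prefix, List<List<Boolean>> out) {
    if (remaining == 0) {
      out.add(new ArrayList<>(prefix));
      return;
    }
    for (boolean b : new boolean[] {true, false}) {
      prefix.add(b);
      fill(remaining - 1, prefix, out);
      prefix.remove(prefix.size() - 1);  // backtrack (removes by index)
    }
  }

  public static void main(String[] args) {
    List<List<Boolean>> bits = viaBitmask(4);
    System.out.println(bits.size());  // 16
    // Same combinations, different order.
    System.out.println(new HashSet<>(bits).equals(new HashSet<>(viaNestedLoops(4))));  // true
  }
}
```

Either form covers all cases; the parameterized version mainly buys per-combination reporting, so a failure names the exact flag tuple that broke.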
   





Issue Time Tracking
---

Worklog Id: (was: 797133)
Time Spent: 11h 50m  (was: 11h 40m)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 11h 50m
>  Remaining Estimate: 0h
>
> When a CTAS query fails after writing the data but before the table is created, 
> the data remains in the directory and is currently not cleaned up by the 
> cleaner or by any other mechanism. This is because the cleaner 
> requires a table corresponding to what it is cleaning. To handle this 
> situation, we can pass the relevant information directly to the cleaner so 
> that such uncommitted data is deleted.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-08-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=797132=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-797132
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 02/Aug/22 07:18
Start Date: 02/Aug/22 07:18
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r935198844


##
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:
##
@@ -7983,7 +7984,17 @@ protected boolean enableColumnStatsCollecting() {
 return true;
   }
 
-  private Path getCtasLocation(CreateTableDesc tblDesc) throws SemanticException {
+  private Path getCtasLocation(CreateTableDesc tblDesc, boolean createTableWithSuffix) throws SemanticException {
+    Path destinationPath = getCtasLocationWithoutSuffix(tblDesc);
+    if (createTableWithSuffix) {
+      long txnId = ctx.getHiveTxnManager().getCurrentTxnId();
+      String suffix = AcidUtils.getPathSuffix(txnId);
+      destinationPath = new Path(destinationPath.toString() + suffix);
+    }
+    return destinationPath;
+  }
+
+  private Path getCtasLocationWithoutSuffix(CreateTableDesc tblDesc) throws SemanticException {

Review Comment:
   Done
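The suffix logic in `getCtasLocation` appends a transaction-specific suffix to the base location only when `createTableWithSuffix` is set. A string-only sketch of that shape follows. It ASSUMES a suffix of `_v` plus a 7-digit zero-padded transaction id; the actual format returned by `AcidUtils.getPathSuffix` is not shown in this thread and may differ:

```java
import java.util.Locale;

final class CtasLocationSketch {
  private CtasLocationSketch() {}

  // ASSUMPTION: "_v" + zero-padded txn id; stands in for AcidUtils.getPathSuffix.
  static String pathSuffix(long txnId) {
    return String.format(Locale.ROOT, "_v%07d", txnId);
  }

  // Same shape as getCtasLocation: suffix appended only when requested.
  static String ctasLocation(String baseLocation, boolean createTableWithSuffix, long txnId) {
    return createTableWithSuffix ? baseLocation + pathSuffix(txnId) : baseLocation;
  }

  public static void main(String[] args) {
    System.out.println(ctasLocation("/warehouse/atable", true, 42L));   // /warehouse/atable_v0000042
    System.out.println(ctasLocation("/warehouse/atable", false, 42L));  // /warehouse/atable
  }
}
```

A per-transaction directory keeps the files written by a failed attempt separate from any later retry, so they can be removed as a unit.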



##
ql/src/java/org/apache/hadoop/hive/ql/HiveQueryLifeTimeHook.java:
##
@@ -0,0 +1,97 @@
+package org.apache.hadoop.hive.ql;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.CompactionRequest;
+import org.apache.hadoop.hive.metastore.api.CompactionType;
+import org.apache.hadoop.hive.metastore.txn.TxnUtils;
+import org.apache.hadoop.hive.ql.hooks.PrivateHookContext;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHook;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHookContext;
+import org.apache.hadoop.hive.ql.metadata.Hive;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.apache.hadoop.hive.ql.plan.HiveOperation;
+import org.apache.thrift.TException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import java.io.IOException;
+
+import static org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.IF_PURGE;
+import static org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.META_TABLE_LOCATION;
+
+public class HiveQueryLifeTimeHook implements QueryLifeTimeHook {
+
+  private static final Logger LOG = LoggerFactory.getLogger(HiveQueryLifeTimeHook.class);
+
+  @Override
+  public void beforeCompile(QueryLifeTimeHookContext ctx) {
+
+  }
+
+  @Override
+  public void afterCompile(QueryLifeTimeHookContext ctx, boolean hasError) {
+
+  }
+
+  @Override
+  public void beforeExecution(QueryLifeTimeHookContext ctx) {
+
+  }
+
+  @Override
+  public void afterExecution(QueryLifeTimeHookContext ctx, boolean hasError) {
+    if (hasError) {
+      checkAndRollbackCTAS(ctx);
+    }
+  }
+
+  private void checkAndRollbackCTAS(QueryLifeTimeHookContext ctx) {
+    HiveConf conf = ctx.getHiveConf();
+    if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.TXN_CTAS_X_LOCK)) {
+      PrivateHookContext pCtx = (PrivateHookContext) ctx.getHookContext();
+      Table table = ctx.getHookContext().getQueryPlan().getAcidSinks().iterator().next().getTable();
+      boolean isCTAS = ctx.getHookContext().getQueryState().getCommandType()

Review Comment:
   Done





Issue Time Tracking
---

Worklog Id: (was: 797132)
Time Spent: 11h 40m  (was: 11.5h)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time 

[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-08-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=796944=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-796944
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 01/Aug/22 15:55
Start Date: 01/Aug/22 15:55
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r934683229


##
ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java:
##
@@ -3080,6 +3084,83 @@ public void testAcidOrcWritePreservesFieldNames() throws Exception {
 hiveConf.setBoolVar(HiveConf.ConfVars.HIVE_VECTORIZATION_ENABLED, true);
   }
 
+  @Test
+  public void testFailureScenariosCleanupCTAS() throws Exception {
+    boolean[] booleans = {true, false};
+    for (boolean var1 : booleans) {
+      for (boolean var2 : booleans) {
+        for (boolean var3 : booleans) {
+          for (boolean var4 : booleans) {
+            failureScenarioCleanupCTAS(var1, var2, var3, var4);
+          }
+        }
+      }
+    }
+  }
+
+  public void failureScenarioCleanupCTAS(boolean isPartitioned,
+                                         boolean isDirectInsertEnabled,
+                                         boolean isLocklessReadsEnabled,
+                                         boolean isLocationUsed) throws Exception {
+    String tableName = "atable";
+
+    //Set configurations
+    hiveConf.setBoolVar(HiveConf.ConfVars.HIVE_ACID_DIRECT_INSERT_ENABLED, isDirectInsertEnabled);
+    hiveConf.setBoolVar(HiveConf.ConfVars.HIVE_ACID_LOCKLESS_READS_ENABLED, isLocklessReadsEnabled);
+    hiveConf.setBoolVar(HiveConf.ConfVars.TXN_CTAS_X_LOCK, true);
+    hiveConf.setIntVar(HiveConf.ConfVars.HIVE_COMPACTOR_ABORTEDTXN_THRESHOLD, 0);
+
+    // Add a '1' at the end of table name for custom location.
+    String querylocation = (isLocationUsed) ? " location '" + getWarehouseDir() + "/" + tableName + "1'" : "";
+    String queryPartitions = (isPartitioned) ? " partitioned by (a)" : "";
+
+    d.run("insert into " + Table.ACIDTBL + "(a,b) values (3,4)");
+    d.run("drop table if exists " + tableName);
+    d.compileAndRespond("create table " + tableName + queryPartitions + " stored as orc" + querylocation +
+        " tblproperties ('transactional'='true') as select * from " + Table.ACIDTBL);
+    long txnId = d.getQueryState().getTxnManager().getCurrentTxnId();
+    DriverContext driverContext = d.getDriverContext();
+    traverseTasksRecursively(driverContext.getPlan().getRootTasks());

Review Comment:
   could be simplified with
   
   MapRedTask mrtask = Mockito.spy(new MapRedTask());
   Mockito.doThrow(new RuntimeException()).when(mrtask).execute();
   driverContext.getPlan().setRootTasks(Lists.newArrayList(mrtask));
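The Mockito suggestion replaces the plan's root tasks with one whose `execute()` always throws, so the driver reports an error and the after-execution hook runs. The same fault-injection idea, stripped of Mockito and Hive, can be sketched with a hypothetical `Task` interface and a throwing lambda; `Task` and `runAll` are illustrative stand-ins, not Hive's `Task`/`MapRedTask` API:

```java
import java.util.List;

final class FaultInjectionSketch {
  private FaultInjectionSketch() {}

  // Hypothetical minimal task abstraction; Hive's Task classes are far richer.
  interface Task {
    void execute() throws Exception;
  }

  // Runs root tasks in order and returns true if any of them failed,
  // roughly the "hasError" signal a driver would pass to afterExecution.
  static boolean runAll(List<Task> rootTasks) {
    try {
      for (Task t : rootTasks) {
        t.execute();
      }
      return false;
    } catch (Exception e) {
      return true;
    }
  }

  public static void main(String[] args) {
    // Fault injection: a task that always throws.
    Task failing = () -> { throw new RuntimeException("injected failure"); };
    System.out.println(runAll(List.of(failing)));  // true
  }
}
```

Injecting the failure at the task level, rather than walking and patching the existing task tree, keeps the test independent of how the plan happens to be shaped.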
   





Issue Time Tracking
---

Worklog Id: (was: 796944)
Time Spent: 11.5h  (was: 11h 20m)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 11.5h
>  Remaining Estimate: 0h


[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-08-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=796909=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-796909
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 01/Aug/22 14:21
Start Date: 01/Aug/22 14:21
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r934588653


##
ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java:
##
@@ -3080,6 +3084,83 @@ public void testAcidOrcWritePreservesFieldNames() throws Exception {
 hiveConf.setBoolVar(HiveConf.ConfVars.HIVE_VECTORIZATION_ENABLED, true);
   }
 
+  @Test
+  public void testFailureScenariosCleanupCTAS() throws Exception {
+    boolean[] booleans = {true, false};
+    for (boolean var1 : booleans) {

Review Comment:
   It's always better to check all cases.





Issue Time Tracking
---

Worklog Id: (was: 796909)
Time Spent: 11h 20m  (was: 11h 10m)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 11h 20m
>  Remaining Estimate: 0h


[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-08-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=796908=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-796908
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 01/Aug/22 14:21
Start Date: 01/Aug/22 14:21
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r934588615


##
ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java:
##
@@ -3080,6 +3084,83 @@ public void testAcidOrcWritePreservesFieldNames() throws Exception {
 hiveConf.setBoolVar(HiveConf.ConfVars.HIVE_VECTORIZATION_ENABLED, true);
   }
 
+  @Test
+  public void testFailureScenariosCleanupCTAS() throws Exception {
+    boolean[] booleans = {true, false};
+    for (boolean var1 : booleans) {
+      for (boolean var2 : booleans) {
+        for (boolean var3 : booleans) {
+          for (boolean var4 : booleans) {
+            failureScenarioCleanupCTAS(var1, var2, var3, var4);
+          }
+        }
+      }
+    }
+  }
+
+  public void failureScenarioCleanupCTAS(boolean isPartitioned,
+                                         boolean isDirectInsertEnabled,
+                                         boolean isLocklessReadsEnabled,
+                                         boolean isLocationUsed) throws Exception {
+    String tableName = "atable";
+
+    //Set configurations
+    hiveConf.setBoolVar(HiveConf.ConfVars.HIVE_ACID_DIRECT_INSERT_ENABLED, isDirectInsertEnabled);
+    hiveConf.setBoolVar(HiveConf.ConfVars.HIVE_ACID_LOCKLESS_READS_ENABLED, isLocklessReadsEnabled);
+    hiveConf.setBoolVar(HiveConf.ConfVars.TXN_CTAS_X_LOCK, true);
+    hiveConf.setIntVar(HiveConf.ConfVars.HIVE_COMPACTOR_ABORTEDTXN_THRESHOLD, 0);
+
+    // Add a '1' at the end of table name for custom location.
+    String querylocation = (isLocationUsed) ? " location '" + getWarehouseDir() + "/" + tableName + "1'" : "";
+    String queryPartitions = (isPartitioned) ? " partitioned by (a)" : "";
+
+    d.run("insert into " + Table.ACIDTBL + "(a,b) values (3,4)");
+    d.run("drop table if exists " + tableName);
+    d.compileAndRespond("create table " + tableName + queryPartitions + " stored as orc" + querylocation +
+        " tblproperties ('transactional'='true') as select * from " + Table.ACIDTBL);
+    long txnId = d.getQueryState().getTxnManager().getCurrentTxnId();
+    DriverContext driverContext = d.getDriverContext();
+    traverseTasksRecursively(driverContext.getPlan().getRootTasks());
+    int assertError = 0;
+    try {
+      d.run();
+    } catch (Exception e) {
+      assertError = 1;
+    }
+
+    runInitiator(hiveConf);

Review Comment:
   Why do you need the initiator here?





Issue Time Tracking
---

Worklog Id: (was: 796908)
Time Spent: 11h 10m  (was: 11h)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h


[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-08-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=796907=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-796907
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 01/Aug/22 14:18
Start Date: 01/Aug/22 14:18
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r934585516


##
ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java:
##
@@ -3080,6 +3084,83 @@ public void testAcidOrcWritePreservesFieldNames() throws Exception {
 hiveConf.setBoolVar(HiveConf.ConfVars.HIVE_VECTORIZATION_ENABLED, true);
   }
 
+  @Test
+  public void testFailureScenariosCleanupCTAS() throws Exception {
+    boolean[] booleans = {true, false};
+    for (boolean var1 : booleans) {

Review Comment:
   do we really need to cover all 16 combinations?





Issue Time Tracking
---

Worklog Id: (was: 796907)
Time Spent: 11h  (was: 10h 50m)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 11h
>  Remaining Estimate: 0h


[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-08-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=796898=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-796898
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 01/Aug/22 14:01
Start Date: 01/Aug/22 14:01
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r934568273


##
ql/src/java/org/apache/hadoop/hive/ql/Context.java:
##
@@ -191,6 +191,8 @@ public class Context {
 
   private List> parsedTables = new ArrayList<>();
 
+  private Path destinationPath;

Review Comment:
   change this to `location`





Issue Time Tracking
---

Worklog Id: (was: 796898)
Time Spent: 10h 50m  (was: 10h 40m)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10h 50m
>  Remaining Estimate: 0h


[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-08-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=796897=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-796897
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 01/Aug/22 14:00
Start Date: 01/Aug/22 14:00
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r934566867


##
ql/src/java/org/apache/hadoop/hive/ql/HiveQueryLifeTimeHook.java:
##
@@ -0,0 +1,97 @@
+package org.apache.hadoop.hive.ql;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.CompactionRequest;
+import org.apache.hadoop.hive.metastore.api.CompactionType;
+import org.apache.hadoop.hive.metastore.txn.TxnUtils;
+import org.apache.hadoop.hive.ql.hooks.PrivateHookContext;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHook;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHookContext;
+import org.apache.hadoop.hive.ql.metadata.Hive;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.apache.hadoop.hive.ql.plan.HiveOperation;
+import org.apache.thrift.TException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import java.io.IOException;
+
+import static org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.IF_PURGE;
+import static org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.META_TABLE_LOCATION;
+
+public class HiveQueryLifeTimeHook implements QueryLifeTimeHook {
+
+  private static final Logger LOG = LoggerFactory.getLogger(HiveQueryLifeTimeHook.class);
+
+  @Override
+  public void beforeCompile(QueryLifeTimeHookContext ctx) {
+
+  }
+
+  @Override
+  public void afterCompile(QueryLifeTimeHookContext ctx, boolean hasError) {
+
+  }
+
+  @Override
+  public void beforeExecution(QueryLifeTimeHookContext ctx) {
+
+  }
+
+  @Override
+  public void afterExecution(QueryLifeTimeHookContext ctx, boolean hasError) {
+    if (hasError) {
+      checkAndRollbackCTAS(ctx);
+    }
+  }
+
+  private void checkAndRollbackCTAS(QueryLifeTimeHookContext ctx) {
+    HiveConf conf = ctx.getHiveConf();
+    if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.TXN_CTAS_X_LOCK)) {
+      PrivateHookContext pCtx = (PrivateHookContext) ctx.getHookContext();
+      Table table = ctx.getHookContext().getQueryPlan().getAcidSinks().iterator().next().getTable();

Review Comment:
   extract queryPlan into a local var for readability
   





Issue Time Tracking
---

Worklog Id: (was: 796897)
Time Spent: 10h 40m  (was: 10.5h)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10h 40m
>  Remaining Estimate: 0h


[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-08-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=796894=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-796894
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 01/Aug/22 13:58
Start Date: 01/Aug/22 13:58
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r934564806


##
ql/src/java/org/apache/hadoop/hive/ql/HiveQueryLifeTimeHook.java:
##
@@ -0,0 +1,97 @@
+package org.apache.hadoop.hive.ql;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.CompactionRequest;
+import org.apache.hadoop.hive.metastore.api.CompactionType;
+import org.apache.hadoop.hive.metastore.txn.TxnUtils;
+import org.apache.hadoop.hive.ql.hooks.PrivateHookContext;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHook;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHookContext;
+import org.apache.hadoop.hive.ql.metadata.Hive;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.apache.hadoop.hive.ql.plan.HiveOperation;
+import org.apache.thrift.TException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import java.io.IOException;
+
+import static org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.IF_PURGE;
+import static org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.META_TABLE_LOCATION;
+
+public class HiveQueryLifeTimeHook implements QueryLifeTimeHook {
+
+  private static final Logger LOG = LoggerFactory.getLogger(HiveQueryLifeTimeHook.class);
+
+  @Override
+  public void beforeCompile(QueryLifeTimeHookContext ctx) {
+
+  }
+
+  @Override
+  public void afterCompile(QueryLifeTimeHookContext ctx, boolean hasError) {
+
+  }
+
+  @Override
+  public void beforeExecution(QueryLifeTimeHookContext ctx) {
+
+  }
+
+  @Override
+  public void afterExecution(QueryLifeTimeHookContext ctx, boolean hasError) {
+    if (hasError) {
+      checkAndRollbackCTAS(ctx);
+    }
+  }
+
+  private void checkAndRollbackCTAS(QueryLifeTimeHookContext ctx) {
+    HiveConf conf = ctx.getHiveConf();
+    if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.TXN_CTAS_X_LOCK)) {

Review Comment:
   Check this only if it's a CTAS; move it under the location and operationType check.
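The suggested ordering (first confirm the query is a CTAS with a known target location, and only then consult the `TXN_CTAS_X_LOCK` flag) can be sketched as a pure predicate. The `Operation` enum below is a hypothetical stand-in for `HiveOperation`, and `shouldRollback` is an illustrative name, not the hook's real method:

```java
final class CtasRollbackGuard {
  private CtasRollbackGuard() {}

  // Hypothetical stand-in for HiveOperation.
  enum Operation { CREATETABLE_AS_SELECT, QUERY }

  static boolean shouldRollback(boolean hasError, Operation op, String location,
                                boolean ctasXLockEnabled) {
    if (!hasError) {
      return false;                 // nothing failed, nothing to undo
    }
    if (op != Operation.CREATETABLE_AS_SELECT || location == null) {
      return false;                 // not a CTAS, or no target directory known
    }
    return ctasXLockEnabled;        // config flag consulted only for CTAS
  }

  public static void main(String[] args) {
    System.out.println(shouldRollback(true, Operation.CREATETABLE_AS_SELECT, "/warehouse/atable", true));  // true
    System.out.println(shouldRollback(true, Operation.QUERY, "/warehouse/atable", true));                  // false
  }
}
```

The result is the same conjunction either way; putting the CTAS check first mainly keeps non-CTAS failures from touching CTAS-specific state (such as the first ACID sink) at all.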





Issue Time Tracking
---

Worklog Id: (was: 796894)
Time Spent: 10.5h  (was: 10h 20m)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10.5h
>  Remaining Estimate: 0h


[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-08-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=796892=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-796892
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 01/Aug/22 13:58
Start Date: 01/Aug/22 13:58
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r934564806


##
ql/src/java/org/apache/hadoop/hive/ql/HiveQueryLifeTimeHook.java:
##
@@ -0,0 +1,97 @@
+package org.apache.hadoop.hive.ql;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.CompactionRequest;
+import org.apache.hadoop.hive.metastore.api.CompactionType;
+import org.apache.hadoop.hive.metastore.txn.TxnUtils;
+import org.apache.hadoop.hive.ql.hooks.PrivateHookContext;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHook;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHookContext;
+import org.apache.hadoop.hive.ql.metadata.Hive;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.apache.hadoop.hive.ql.plan.HiveOperation;
+import org.apache.thrift.TException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import java.io.IOException;
+
+import static org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.IF_PURGE;
+import static org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.META_TABLE_LOCATION;
+
+public class HiveQueryLifeTimeHook implements QueryLifeTimeHook {
+
+  private static final Logger LOG = LoggerFactory.getLogger(HiveQueryLifeTimeHook.class);
+
+  @Override
+  public void beforeCompile(QueryLifeTimeHookContext ctx) {
+
+  }
+
+  @Override
+  public void afterCompile(QueryLifeTimeHookContext ctx, boolean hasError) {
+
+  }
+
+  @Override
+  public void beforeExecution(QueryLifeTimeHookContext ctx) {
+
+  }
+
+  @Override
+  public void afterExecution(QueryLifeTimeHookContext ctx, boolean hasError) {
+    if (hasError) {
+      checkAndRollbackCTAS(ctx);
+    }
+  }
+
+  private void checkAndRollbackCTAS(QueryLifeTimeHookContext ctx) {
+    HiveConf conf = ctx.getHiveConf();
+    if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.TXN_CTAS_X_LOCK)) {

Review Comment:
   Check this only if it's a CTAS query; move it under the location and operationType check.
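The reordering requested above can be sketched as a small, self-contained model. This is not Hive's code: `QueryInfo` and its fields are hypothetical stand-ins for what the hook reads from its context (whether the query is a CTAS, the recorded destination location, and the `TXN_CTAS_X_LOCK` setting). The query-specific CTAS and location checks run first, and the configuration is consulted only for queries that pass them.

```java
import java.util.Optional;

// Hypothetical stand-in for the information the hook reads from its context.
final class QueryInfo {
  final boolean isCtas;            // e.g. derived from the query properties' CTAS flag
  final String location;           // destination directory recorded at compile time, may be null
  final boolean ctasExclusiveLock; // stands in for the TXN_CTAS_X_LOCK setting in HiveConf

  QueryInfo(boolean isCtas, String location, boolean ctasExclusiveLock) {
    this.isCtas = isCtas;
    this.location = location;
    this.ctasExclusiveLock = ctasExclusiveLock;
  }
}

final class CtasRollbackGuard {
  private CtasRollbackGuard() {}

  /** Returns the directory to clean up, or empty if no rollback applies. */
  static Optional<String> directoryToClean(QueryInfo q, boolean hasError) {
    if (!hasError) {
      return Optional.empty(); // successful queries need no cleanup
    }
    // Query-specific checks first: only CTAS statements with a known location qualify.
    if (!q.isCtas || q.location == null || q.location.isEmpty()) {
      return Optional.empty();
    }
    // The configuration check is nested under the CTAS/location check.
    if (!q.ctasExclusiveLock) {
      return Optional.empty();
    }
    return Optional.of(q.location);
  }
}
```

With this ordering, non-CTAS failures return before any CTAS-specific setting is read.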





Issue Time Tracking
---

Worklog Id: (was: 796892)
Time Spent: 10h 20m  (was: 10h 10m)



[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-08-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=796888=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-796888
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 01/Aug/22 13:40
Start Date: 01/Aug/22 13:40
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r934530893


##
ql/src/java/org/apache/hadoop/hive/ql/Context.java:
##
@@ -191,6 +191,8 @@ public class Context {
 
   private List> parsedTables = new ArrayList<>();
 
+  private Path destinationPath;

Review Comment:
   change to `location`
   





Issue Time Tracking
---

Worklog Id: (was: 796888)
Time Spent: 10h 10m  (was: 10h)



[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-08-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=796887=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-796887
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 01/Aug/22 13:37
Start Date: 01/Aug/22 13:37
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r934530893


##
ql/src/java/org/apache/hadoop/hive/ql/Context.java:
##
@@ -191,6 +191,8 @@ public class Context {
 
   private List> parsedTables = new ArrayList<>();
 
+  private Path destinationPath;

Review Comment:
   Change to `location`. Please also check what `resDir` is; maybe it's the same.
   





Issue Time Tracking
---

Worklog Id: (was: 796887)
Time Spent: 10h  (was: 9h 50m)



[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-08-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=796886=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-796886
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 01/Aug/22 13:36
Start Date: 01/Aug/22 13:36
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r934537997


##
ql/src/java/org/apache/hadoop/hive/ql/HiveQueryLifeTimeHook.java:
##
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.CompactionRequest;
+import org.apache.hadoop.hive.metastore.api.CompactionType;
+import org.apache.hadoop.hive.metastore.txn.TxnUtils;
+import org.apache.hadoop.hive.ql.hooks.PrivateHookContext;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHook;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHookContext;
+import org.apache.hadoop.hive.ql.metadata.Hive;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.apache.hadoop.hive.ql.plan.HiveOperation;
+import org.apache.thrift.TException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import java.io.IOException;
+
+import static org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.IF_PURGE;
+import static org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.META_TABLE_LOCATION;
+
+public class HiveQueryLifeTimeHook implements QueryLifeTimeHook {
+
+  private static final Logger LOG = LoggerFactory.getLogger(HiveQueryLifeTimeHook.class);
+
+  @Override
+  public void beforeCompile(QueryLifeTimeHookContext ctx) {
+
+  }
+
+  @Override
+  public void afterCompile(QueryLifeTimeHookContext ctx, boolean hasError) {
+
+  }
+
+  @Override
+  public void beforeExecution(QueryLifeTimeHookContext ctx) {
+
+  }
+
+  @Override
+  public void afterExecution(QueryLifeTimeHookContext ctx, boolean hasError) {
+    if (hasError) {
+      checkAndRollbackCTAS(ctx);
+    }
+  }
+
+  private void checkAndRollbackCTAS(QueryLifeTimeHookContext ctx) {
+    HiveConf conf = ctx.getHiveConf();
+    if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.TXN_CTAS_X_LOCK)) {
+      PrivateHookContext pCtx = (PrivateHookContext) ctx.getHookContext();
+      Table table = ctx.getHookContext().getQueryPlan().getAcidSinks().iterator().next().getTable();

Review Comment:
   Call it after checking the location and operationType. Make sure it's not empty.
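`iterator().next()` throws `NoSuchElementException` when the collection is empty, which is why the emptiness check matters here. A minimal sketch of the safe pattern follows; `FirstSink` is a hypothetical helper, and plain strings stand in for the acid sinks returned by `getAcidSinks()`.

```java
import java.util.Collection;
import java.util.Optional;

final class FirstSink {
  private FirstSink() {}

  // Returns the first element of a possibly empty collection without risking
  // NoSuchElementException from iterator().next().
  static <T> Optional<T> first(Collection<T> sinks) {
    if (sinks == null || sinks.isEmpty()) {
      return Optional.empty();
    }
    return Optional.ofNullable(sinks.iterator().next());
  }
}
```

In the hook, a guarded call like this would replace the bare `getAcidSinks().iterator().next()`, so `getTable()` is only reached when a sink is actually present.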





Issue Time Tracking
---

Worklog Id: (was: 796886)
Time Spent: 9h 50m  (was: 9h 40m)



[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-08-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=796884=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-796884
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 01/Aug/22 13:32
Start Date: 01/Aug/22 13:32
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r934532620


##
ql/src/java/org/apache/hadoop/hive/ql/HiveQueryLifeTimeHook.java:
##
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.CompactionRequest;
+import org.apache.hadoop.hive.metastore.api.CompactionType;
+import org.apache.hadoop.hive.metastore.txn.TxnUtils;
+import org.apache.hadoop.hive.ql.hooks.PrivateHookContext;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHook;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHookContext;
+import org.apache.hadoop.hive.ql.metadata.Hive;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.apache.hadoop.hive.ql.plan.HiveOperation;
+import org.apache.thrift.TException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import java.io.IOException;
+
+import static org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.IF_PURGE;
+import static org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.META_TABLE_LOCATION;
+
+public class HiveQueryLifeTimeHook implements QueryLifeTimeHook {
+
+  private static final Logger LOG = LoggerFactory.getLogger(HiveQueryLifeTimeHook.class);
+
+  @Override
+  public void beforeCompile(QueryLifeTimeHookContext ctx) {
+
+  }
+
+  @Override
+  public void afterCompile(QueryLifeTimeHookContext ctx, boolean hasError) {
+
+  }
+
+  @Override
+  public void beforeExecution(QueryLifeTimeHookContext ctx) {
+
+  }
+
+  @Override
+  public void afterExecution(QueryLifeTimeHookContext ctx, boolean hasError) {
+    if (hasError) {
+      checkAndRollbackCTAS(ctx);
+    }
+  }
+
+  private void checkAndRollbackCTAS(QueryLifeTimeHookContext ctx) {
+    HiveConf conf = ctx.getHiveConf();
+    if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.TXN_CTAS_X_LOCK)) {
+      PrivateHookContext pCtx = (PrivateHookContext) ctx.getHookContext();

Review Comment:
   No need for a cast; `HookContext` is good enough.





Issue Time Tracking
---

Worklog Id: (was: 796884)
Time Spent: 9h 40m  (was: 9.5h)



[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-08-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=796883=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-796883
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 01/Aug/22 13:30
Start Date: 01/Aug/22 13:30
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r934530893


##
ql/src/java/org/apache/hadoop/hive/ql/Context.java:
##
@@ -191,6 +191,8 @@ public class Context {
 
   private List> parsedTables = new ArrayList<>();
 
+  private Path destinationPath;

Review Comment:
   change to `location`





Issue Time Tracking
---

Worklog Id: (was: 796883)
Time Spent: 9.5h  (was: 9h 20m)



[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-08-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=796882=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-796882
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 01/Aug/22 13:26
Start Date: 01/Aug/22 13:26
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r934526189


##
ql/src/java/org/apache/hadoop/hive/ql/HiveQueryLifeTimeHook.java:
##
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.CompactionRequest;
+import org.apache.hadoop.hive.metastore.api.CompactionType;
+import org.apache.hadoop.hive.metastore.txn.TxnUtils;
+import org.apache.hadoop.hive.ql.hooks.PrivateHookContext;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHook;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHookContext;
+import org.apache.hadoop.hive.ql.metadata.Hive;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.apache.hadoop.hive.ql.plan.HiveOperation;
+import org.apache.thrift.TException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import java.io.IOException;
+
+import static org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.IF_PURGE;
+import static org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.META_TABLE_LOCATION;
+
+public class HiveQueryLifeTimeHook implements QueryLifeTimeHook {
+
+  private static final Logger LOG = LoggerFactory.getLogger(HiveQueryLifeTimeHook.class);
+
+  @Override
+  public void beforeCompile(QueryLifeTimeHookContext ctx) {
+
+  }
+
+  @Override
+  public void afterCompile(QueryLifeTimeHookContext ctx, boolean hasError) {
+
+  }
+
+  @Override
+  public void beforeExecution(QueryLifeTimeHookContext ctx) {
+
+  }
+
+  @Override
+  public void afterExecution(QueryLifeTimeHookContext ctx, boolean hasError) {
+    if (hasError) {
+      checkAndRollbackCTAS(ctx);
+    }
+  }
+
+  private void checkAndRollbackCTAS(QueryLifeTimeHookContext ctx) {
+    HiveConf conf = ctx.getHiveConf();
+    if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.TXN_CTAS_X_LOCK)) {
+      PrivateHookContext pCtx = (PrivateHookContext) ctx.getHookContext();
+      Table table = ctx.getHookContext().getQueryPlan().getAcidSinks().iterator().next().getTable();
+      boolean isCTAS = ctx.getHookContext().getQueryState().getCommandType()

Review Comment:
   
   Use `ctx.getHookContext().getQueryPlan().getQueryProperties().isCTAS()` instead.
   





Issue Time Tracking
---

Worklog Id: (was: 796882)
Time Spent: 9h 20m  (was: 9h 10m)



[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-08-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=796879=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-796879
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 01/Aug/22 13:18
Start Date: 01/Aug/22 13:18
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r934518439


##
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:
##
@@ -7983,7 +7984,17 @@ protected boolean enableColumnStatsCollecting() {
 return true;
   }
 
-  private Path getCtasLocation(CreateTableDesc tblDesc) throws SemanticException {
+  private Path getCtasLocation(CreateTableDesc tblDesc, boolean createTableWithSuffix) throws SemanticException {
+    Path destinationPath = getCtasLocationWithoutSuffix(tblDesc);
+    if (createTableWithSuffix) {
+      long txnId = ctx.getHiveTxnManager().getCurrentTxnId();
+      String suffix = AcidUtils.getPathSuffix(txnId);
+      destinationPath = new Path(destinationPath.toString() + suffix);
+    }
+    return destinationPath;
+  }
+
+  private Path getCtasLocationWithoutSuffix(CreateTableDesc tblDesc) throws SemanticException {

Review Comment:
   Please keep it as `getCtasLocation` and call the overloaded `getCtasLocation(tblDesc, false)`.
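The suggested refactoring keeps the original single-argument `getCtasLocation` name and turns it into a thin delegate to the new overload. The sketch below models only that shape, using plain strings instead of Hadoop `Path`. `CtasLocations` is hypothetical, the `LongSupplier` stands in for `ctx.getHiveTxnManager().getCurrentTxnId()`, and the `_v%07d` suffix format is an assumption; in the PR the suffix comes from `AcidUtils.getPathSuffix(txnId)`.

```java
import java.util.Locale;
import java.util.function.LongSupplier;

final class CtasLocations {
  // Hypothetical stand-in for the transaction manager's current transaction id.
  private final LongSupplier currentTxnId;

  CtasLocations(LongSupplier currentTxnId) {
    this.currentTxnId = currentTxnId;
  }

  // Original entry point keeps its name and delegates with createTableWithSuffix = false.
  String getCtasLocation(String baseLocation) {
    return getCtasLocation(baseLocation, false);
  }

  // Overload carrying the new behaviour: optionally append a per-transaction suffix.
  String getCtasLocation(String baseLocation, boolean createTableWithSuffix) {
    if (!createTableWithSuffix) {
      return baseLocation;
    }
    return baseLocation + suffixFor(currentTxnId.getAsLong());
  }

  // Assumed suffix format, for illustration only.
  static String suffixFor(long txnId) {
    return String.format(Locale.ROOT, "_v%07d", txnId);
  }
}
```

Existing callers keep calling the one-argument form unchanged, and only the CTAS path opts into the suffix.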





Issue Time Tracking
---

Worklog Id: (was: 796879)
Time Spent: 9h 10m  (was: 9h)



[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-08-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=796866=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-796866
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 01/Aug/22 12:43
Start Date: 01/Aug/22 12:43
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r934484243


##
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java:
##
@@ -3478,6 +3478,17 @@ CompactionResponse compact2(String dbname, String tableName, String partitionNam
    */
   ShowCompactResponse showCompactions() throws TException;
 
+  /**
+   * Submit a request for performing cleanup of output directory. This is particularly
+   * useful for CTAS when the query fails after write and before creation of table.
+   * @return Status of whether the request was successfully submitted. True indicates
+   * the request was successfully submitted and false indicates failure of request submitted.
+   * @throws TException
+   */
+  boolean submitForCleanup(String dbname, String tableName, CompactionType type,

Review Comment:
   Done





Issue Time Tracking
---

Worklog Id: (was: 796866)
Time Spent: 9h  (was: 8h 50m)



[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-08-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=796836=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-796836
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 01/Aug/22 09:49
Start Date: 01/Aug/22 09:49
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r934330597


##
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java:
##
@@ -3478,6 +3478,17 @@ CompactionResponse compact2(String dbname, String tableName, String partitionNam
    */
   ShowCompactResponse showCompactions() throws TException;
 
+  /**
+   * Submit a request for performing cleanup of output directory. This is particularly
+   * useful for CTAS when the query fails after write and before creation of table.
+   * @return Status of whether the request was successfully submitted. True indicates
+   * the request was successfully submitted and false indicates failure of request submitted.
+   * @throws TException
+   */
+  boolean submitForCleanup(String dbname, String tableName, CompactionType type,

Review Comment:
   As discussed offline, this was already done. The request is created inside this function, which actually makes the request to the HMS.





Issue Time Tracking
---

Worklog Id: (was: 796836)
Time Spent: 8h 50m  (was: 8h 40m)



[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-08-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=796830=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-796830
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 01/Aug/22 09:33
Start Date: 01/Aug/22 09:33
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r934330597


##
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java:
##
@@ -3478,6 +3478,17 @@ CompactionResponse compact2(String dbname, String tableName, String partitionNam
    */
   ShowCompactResponse showCompactions() throws TException;
 
+  /**
+   * Submit a request for performing cleanup of output directory. This is particularly
+   * useful for CTAS when the query fails after write and before creation of table.
+   * @return Status of whether the request was successfully submitted. True indicates
+   * the request was successfully submitted and false indicates failure of request submitted.
+   * @throws TException
+   */
+  boolean submitForCleanup(String dbname, String tableName, CompactionType type,

Review Comment:
   As discussed offline, this was already done. The request is created inside this function, which actually makes the request to the HMS.





Issue Time Tracking
---

Worklog Id: (was: 796830)
Time Spent: 8h 40m  (was: 8.5h)



[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-08-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=796812=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-796812
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 01/Aug/22 08:29
Start Date: 01/Aug/22 08:29
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r934269159


##
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java:
##
@@ -3478,6 +3478,17 @@ CompactionResponse compact2(String dbname, String tableName, String partitionNam
    */
   ShowCompactResponse showCompactions() throws TException;
 
+  /**
+   * Submit a request for performing cleanup of output directory. This is particularly
+   * useful for CTAS when the query fails after write and before creation of table.
+   * @return Status of whether the request was successfully submitted. True indicates
+   * the request was successfully submitted and false indicates failure of request submitted.
+   * @throws TException
+   */
+  boolean submitForCleanup(String dbname, String tableName, CompactionType type,

Review Comment:
   Please use a request object, so that it can be modified in the future.
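The reviewer's point can be illustrated with a small sketch, assuming a hypothetical `CleanupRequest` type (not the class the PR ended up using). Bundling the arguments into one object lets optional fields, such as the purge flag shown here, be added later without changing the `submitForCleanup` signature that existing callers compile against.

```java
import java.util.Objects;

// Hypothetical request type for illustration; it is not the class that was
// eventually added to the metastore API.
final class CleanupRequest {
  private final String dbName;
  private final String tableName;
  private final String location; // output directory to delete
  private boolean purge;         // example of an optional field added later

  CleanupRequest(String dbName, String tableName, String location) {
    this.dbName = Objects.requireNonNull(dbName, "dbName");
    this.tableName = Objects.requireNonNull(tableName, "tableName");
    this.location = Objects.requireNonNull(location, "location");
  }

  // Optional settings use setters, so adding one later does not change
  // the constructor or the client method signature.
  CleanupRequest setPurge(boolean purge) {
    this.purge = purge;
    return this;
  }

  String getDbName() { return dbName; }
  String getTableName() { return tableName; }
  String getLocation() { return location; }
  boolean isPurge() { return purge; }
}

// The client method takes a single request object instead of a growing
// list of positional parameters.
interface CleanupClient {
  boolean submitForCleanup(CleanupRequest request);
}
```

This is the same style as the `CompactionRequest` object already imported by the hook under review.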
   





Issue Time Tracking
---

Worklog Id: (was: 796812)
Time Spent: 8.5h  (was: 8h 20m)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 8.5h
>  Remaining Estimate: 0h
>
> When a CTAS query fails after writing the data but before creating the table,
> the data remains in the directory and is currently not cleaned up by the
> cleaner or by any other mechanism. This is because the cleaner requires a
> table corresponding to what it is cleaning. To get around this situation,
> we can pass the relevant information directly to the cleaner so that such
> uncommitted data is deleted.





[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-08-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=796795=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-796795
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 01/Aug/22 07:18
Start Date: 01/Aug/22 07:18
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r934210285


##
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/create/CtasQueryLifeTimeHook.java:
##
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.ddl.table.create;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.CompactionRequest;
+import org.apache.hadoop.hive.metastore.api.CompactionType;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.txn.TxnStore;
+import org.apache.hadoop.hive.metastore.txn.TxnUtils;
+import org.apache.hadoop.hive.ql.Context;
+import org.apache.hadoop.hive.ql.hooks.PrivateHookContext;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHook;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHookContext;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import java.io.IOException;
+
+import static org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.IF_PURGE;
+import static org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.META_TABLE_LOCATION;
+
+public class CtasQueryLifeTimeHook implements QueryLifeTimeHook {
+
+  private static final Logger LOG = LoggerFactory.getLogger(CtasQueryLifeTimeHook.class);
+
+  @Override
+  public void beforeCompile(QueryLifeTimeHookContext ctx) {
+
+  }
+
+  @Override
+  public void afterCompile(QueryLifeTimeHookContext ctx, boolean hasError) {
+
+  }
+
+  @Override
+  public void beforeExecution(QueryLifeTimeHookContext ctx) {
+
+  }
+
+  @Override
+  public void afterExecution(QueryLifeTimeHookContext ctx, boolean hasError) {
+    HiveConf conf = ctx.getHiveConf();
+    PrivateHookContext privateHookContext = (PrivateHookContext) ctx.getHookContext();
+    Context context = privateHookContext.getContext();
+
+    if (hasError && HiveConf.getBoolVar(conf, HiveConf.ConfVars.TXN_CTAS_X_LOCK)) {
+      Table table = context.getDestinationTable();
+      if (table != null) {
+        LOG.info("Performing cleanup as part of rollback: {}", table.getFullTableName().toString());
+        try {
+          TxnStore txnHandler = TxnUtils.getTxnStore(conf);

Review Comment:
   Implemented an API for this in HiveMetastoreClient, `submitForCleanup`, and used it to send the cleanup request. Done.





Issue Time Tracking
---

Worklog Id: (was: 796795)
Time Spent: 8h 20m  (was: 8h 10m)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> When a CTAS query fails after writing the data but before creating the table,
> the data remains in the directory and is currently not cleaned up by the
> cleaner or by any other mechanism. This is because the cleaner requires a
> table corresponding to what it is cleaning. To get around this situation,
> we can pass the relevant information directly to the cleaner so that such
> uncommitted data is deleted.





[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-08-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=796794=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-796794
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 01/Aug/22 07:17
Start Date: 01/Aug/22 07:17
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r934209248


##
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/create/CtasQueryLifeTimeHook.java:
##
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.ddl.table.create;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.CompactionRequest;
+import org.apache.hadoop.hive.metastore.api.CompactionType;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.txn.TxnStore;
+import org.apache.hadoop.hive.metastore.txn.TxnUtils;
+import org.apache.hadoop.hive.ql.Context;
+import org.apache.hadoop.hive.ql.hooks.PrivateHookContext;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHook;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHookContext;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import java.io.IOException;
+
+import static org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.IF_PURGE;
+import static org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.META_TABLE_LOCATION;
+
+public class CtasQueryLifeTimeHook implements QueryLifeTimeHook {
+
+  private static final Logger LOG = LoggerFactory.getLogger(CtasQueryLifeTimeHook.class);
+
+  @Override
+  public void beforeCompile(QueryLifeTimeHookContext ctx) {
+
+  }
+
+  @Override
+  public void afterCompile(QueryLifeTimeHookContext ctx, boolean hasError) {
+
+  }
+
+  @Override
+  public void beforeExecution(QueryLifeTimeHookContext ctx) {
+
+  }
+
+  @Override
+  public void afterExecution(QueryLifeTimeHookContext ctx, boolean hasError) {
+    HiveConf conf = ctx.getHiveConf();

Review Comment:
   Done





Issue Time Tracking
---

Worklog Id: (was: 796794)
Time Spent: 8h 10m  (was: 8h)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> When a CTAS query fails after writing the data but before creating the table,
> the data remains in the directory and is currently not cleaned up by the
> cleaner or by any other mechanism. This is because the cleaner requires a
> table corresponding to what it is cleaning. To get around this situation,
> we can pass the relevant information directly to the cleaner so that such
> uncommitted data is deleted.





[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-08-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=796793=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-796793
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 01/Aug/22 07:17
Start Date: 01/Aug/22 07:17
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r933135808


##
ql/src/java/org/apache/hadoop/hive/ql/HookRunner.java:
##
@@ -56,6 +57,7 @@ public class HookRunner {
   HookRunner(HiveConf conf, SessionState.LogHelper console) {
     this.conf = conf;
     this.hooks = new HiveHooks(conf, console);
+    addNecessaryHooks();

Review Comment:
   `addNecessaryHooks()` will contain all the hooks that are absolutely required from now on. I thought this would be a good way to go about it, but let me know if this function is required so that I can change it.
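A hypothetical sketch of what an `addNecessaryHooks()`-style method does, as an assumption rather than the PR's actual code: configured hooks are loaded first, then built-in hooks that must always run are appended, so overriding the hook list in configuration cannot disable the CTAS cleanup. `LifeTimeHook` and `SimpleHookRunner` are illustrative names, not Hive's `HookRunner` API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical minimal hook contract for this sketch (not Hive's interface).
interface LifeTimeHook {
  String name();
}

final class SimpleHookRunner {
  private final List<LifeTimeHook> hooks = new ArrayList<>();

  SimpleHookRunner(List<LifeTimeHook> configuredHooks) {
    hooks.addAll(configuredHooks); // hooks listed in configuration
    addNecessaryHooks();           // built-ins that must always run
  }

  // Appended unconditionally, so overriding the configured hook list
  // cannot accidentally turn off CTAS cleanup.
  private void addNecessaryHooks() {
    hooks.add(() -> "ctas-cleanup");
  }

  List<LifeTimeHook> hooks() {
    return Collections.unmodifiableList(hooks);
  }
}
```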



##
ql/src/java/org/apache/hadoop/hive/ql/HookRunner.java:
##
@@ -56,6 +57,7 @@ public class HookRunner {
   HookRunner(HiveConf conf, SessionState.LogHelper console) {
     this.conf = conf;
     this.hooks = new HiveHooks(conf, console);
+    addNecessaryHooks();

Review Comment:
   Done



##
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/create/CtasQueryLifeTimeHook.java:
##
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.ddl.table.create;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.CompactionRequest;
+import org.apache.hadoop.hive.metastore.api.CompactionType;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.txn.TxnStore;
+import org.apache.hadoop.hive.metastore.txn.TxnUtils;
+import org.apache.hadoop.hive.ql.Context;
+import org.apache.hadoop.hive.ql.hooks.PrivateHookContext;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHook;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHookContext;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import java.io.IOException;
+
+import static org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.IF_PURGE;
+import static org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.META_TABLE_LOCATION;
+
+public class CtasQueryLifeTimeHook implements QueryLifeTimeHook {

Review Comment:
   Done





Issue Time Tracking
---

Worklog Id: (was: 796793)
Time Spent: 8h  (was: 7h 50m)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> When a CTAS query fails after writing the data but before creating the table,
> the data remains in the directory and is currently not cleaned up by the
> cleaner or by any other mechanism. This is because the cleaner requires a
> table corresponding to what it is cleaning. To get around this situation,
> we can pass the relevant information directly to the cleaner so that such
> uncommitted data is deleted.





[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=796363=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-796363
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 29/Jul/22 11:45
Start Date: 29/Jul/22 11:45
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r933146801


##
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/create/CtasQueryLifeTimeHook.java:
##
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.ddl.table.create;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.CompactionRequest;
+import org.apache.hadoop.hive.metastore.api.CompactionType;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.txn.TxnStore;
+import org.apache.hadoop.hive.metastore.txn.TxnUtils;
+import org.apache.hadoop.hive.ql.Context;
+import org.apache.hadoop.hive.ql.hooks.PrivateHookContext;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHook;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHookContext;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import java.io.IOException;
+
+import static org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.IF_PURGE;
+import static org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.META_TABLE_LOCATION;
+
+public class CtasQueryLifeTimeHook implements QueryLifeTimeHook {
+
+  private static final Logger LOG = LoggerFactory.getLogger(CtasQueryLifeTimeHook.class);
+
+  @Override
+  public void beforeCompile(QueryLifeTimeHookContext ctx) {
+
+  }
+
+  @Override
+  public void afterCompile(QueryLifeTimeHookContext ctx, boolean hasError) {
+
+  }
+
+  @Override
+  public void beforeExecution(QueryLifeTimeHookContext ctx) {
+
+  }
+
+  @Override
+  public void afterExecution(QueryLifeTimeHookContext ctx, boolean hasError) {
+    HiveConf conf = ctx.getHiveConf();
+    PrivateHookContext privateHookContext = (PrivateHookContext) ctx.getHookContext();
+    Context context = privateHookContext.getContext();
+
+    if (hasError && HiveConf.getBoolVar(conf, HiveConf.ConfVars.TXN_CTAS_X_LOCK)) {
+      Table table = context.getDestinationTable();
+      if (table != null) {
+        LOG.info("Performing cleanup as part of rollback: {}", table.getFullTableName().toString());
+        try {
+          TxnStore txnHandler = TxnUtils.getTxnStore(conf);

Review Comment:
   You shouldn't use TxnStore here; use the metastore client (mscClient) instead. See DbTxnManager.getMS().





Issue Time Tracking
---

Worklog Id: (was: 796363)
Time Spent: 7h 50m  (was: 7h 40m)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> When a CTAS query fails after writing the data but before creating the table,
> the data remains in the directory and is currently not cleaned up by the
> cleaner or by any other mechanism. This is because the cleaner requires a
> table corresponding to what it is cleaning. To get around this situation,
> we can pass the relevant information directly to the cleaner so that such
> uncommitted data is deleted.





[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=796362=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-796362
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 29/Jul/22 11:39
Start Date: 29/Jul/22 11:39
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r933146801


##
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/create/CtasQueryLifeTimeHook.java:
##
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.ddl.table.create;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.CompactionRequest;
+import org.apache.hadoop.hive.metastore.api.CompactionType;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.txn.TxnStore;
+import org.apache.hadoop.hive.metastore.txn.TxnUtils;
+import org.apache.hadoop.hive.ql.Context;
+import org.apache.hadoop.hive.ql.hooks.PrivateHookContext;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHook;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHookContext;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import java.io.IOException;
+
+import static org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.IF_PURGE;
+import static org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.META_TABLE_LOCATION;
+
+public class CtasQueryLifeTimeHook implements QueryLifeTimeHook {
+
+  private static final Logger LOG = LoggerFactory.getLogger(CtasQueryLifeTimeHook.class);
+
+  @Override
+  public void beforeCompile(QueryLifeTimeHookContext ctx) {
+
+  }
+
+  @Override
+  public void afterCompile(QueryLifeTimeHookContext ctx, boolean hasError) {
+
+  }
+
+  @Override
+  public void beforeExecution(QueryLifeTimeHookContext ctx) {
+
+  }
+
+  @Override
+  public void afterExecution(QueryLifeTimeHookContext ctx, boolean hasError) {
+    HiveConf conf = ctx.getHiveConf();
+    PrivateHookContext privateHookContext = (PrivateHookContext) ctx.getHookContext();
+    Context context = privateHookContext.getContext();
+
+    if (hasError && HiveConf.getBoolVar(conf, HiveConf.ConfVars.TXN_CTAS_X_LOCK)) {
+      Table table = context.getDestinationTable();
+      if (table != null) {
+        LOG.info("Performing cleanup as part of rollback: {}", table.getFullTableName().toString());
+        try {
+          TxnStore txnHandler = TxnUtils.getTxnStore(conf);

Review Comment:
   Can we move TxnStore into the constructor or an init method? Creating a new one on every call would be really expensive; maybe consider a ThreadLocal.
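To illustrate the ThreadLocal suggestion, here is a minimal sketch that assumes a hypothetical `ExpensiveStore` in place of `TxnStore`. Each thread builds the store once on first use and reuses it, rather than calling a factory inside every `afterExecution`. `CachedStoreHook` is likewise an illustrative name.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an object that is costly to construct
// (for example one that opens connections); it is not TxnStore itself.
final class ExpensiveStore {
  static final AtomicInteger CREATED = new AtomicInteger();

  ExpensiveStore() {
    CREATED.incrementAndGet(); // count constructions to make reuse observable
  }

  boolean submitCleanup(String location) {
    return location != null && !location.isEmpty();
  }
}

final class CachedStoreHook {
  // Created lazily, once per thread, and reused by later calls on that thread.
  private static final ThreadLocal<ExpensiveStore> STORE =
      ThreadLocal.withInitial(ExpensiveStore::new);

  boolean afterExecution(boolean hasError, String location) {
    if (!hasError) {
      return false; // nothing to clean, so the store is never touched
    }
    return STORE.get().submitCleanup(location);
  }

  // Pooled threads outlive queries; drop the cached instance when done.
  static void release() {
    STORE.remove();
  }
}
```

A ThreadLocal suits a pool of worker threads but keeps one instance alive per thread, hence the `release()` helper. The reviewer's other option, creating the store once in the constructor or an init method, avoids that at the cost of sharing a single instance across threads.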





Issue Time Tracking
---

Worklog Id: (was: 796362)
Time Spent: 7h 40m  (was: 7.5h)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> When a CTAS query fails after writing the data but before creating the table,
> the data remains in the directory and is currently not cleaned up by the
> cleaner or by any other mechanism. This is because the cleaner requires a
> table corresponding to what it is cleaning. To get around this situation,
> we can pass the relevant information directly to the cleaner so that such
> uncommitted data is deleted.





[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=796358=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-796358
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 29/Jul/22 11:35
Start Date: 29/Jul/22 11:35
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r933143569


##
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/create/CtasQueryLifeTimeHook.java:
##
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.ddl.table.create;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.CompactionRequest;
+import org.apache.hadoop.hive.metastore.api.CompactionType;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.txn.TxnStore;
+import org.apache.hadoop.hive.metastore.txn.TxnUtils;
+import org.apache.hadoop.hive.ql.Context;
+import org.apache.hadoop.hive.ql.hooks.PrivateHookContext;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHook;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHookContext;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import java.io.IOException;
+
+import static org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.IF_PURGE;
+import static org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.META_TABLE_LOCATION;
+
+public class CtasQueryLifeTimeHook implements QueryLifeTimeHook {
+
+  private static final Logger LOG = LoggerFactory.getLogger(CtasQueryLifeTimeHook.class);
+
+  @Override
+  public void beforeCompile(QueryLifeTimeHookContext ctx) {
+
+  }
+
+  @Override
+  public void afterCompile(QueryLifeTimeHookContext ctx, boolean hasError) {
+
+  }
+
+  @Override
+  public void beforeExecution(QueryLifeTimeHookContext ctx) {
+
+  }
+
+  @Override
+  public void afterExecution(QueryLifeTimeHookContext ctx, boolean hasError) {
+    HiveConf conf = ctx.getHiveConf();

Review Comment:
   
   if (hasError) {
     checkAndRollbackCTAS(ctx);
   }
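The suggested restructuring might look like the following sketch, with a hypothetical `SimpleHookContext` standing in for `QueryLifeTimeHookContext`. `afterExecution` only checks `hasError`, while the CTAS-specific conditions (the exclusive-lock setting and a resolved destination) move into `checkAndRollbackCTAS`. Recording the location in a list stands in for submitting a real cleanup request.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified context for this sketch; the real
// QueryLifeTimeHookContext exposes much more.
final class SimpleHookContext {
  final boolean ctasExclusiveLockEnabled; // mirrors the TXN_CTAS_X_LOCK check
  final String destinationLocation;       // null if no destination table was resolved

  SimpleHookContext(boolean ctasExclusiveLockEnabled, String destinationLocation) {
    this.ctasExclusiveLockEnabled = ctasExclusiveLockEnabled;
    this.destinationLocation = destinationLocation;
  }
}

final class RollbackHook {
  private final List<String> cleaned = new ArrayList<>();

  // The lifecycle method only decides whether rollback is needed at all.
  void afterExecution(SimpleHookContext ctx, boolean hasError) {
    if (hasError) {
      checkAndRollbackCTAS(ctx);
    }
  }

  // All CTAS-specific conditions live in one helper.
  private void checkAndRollbackCTAS(SimpleHookContext ctx) {
    if (!ctx.ctasExclusiveLockEnabled || ctx.destinationLocation == null) {
      return;
    }
    cleaned.add(ctx.destinationLocation); // stand-in for submitting a cleanup request
  }

  List<String> cleaned() {
    return cleaned;
  }
}
```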
   





Issue Time Tracking
---

Worklog Id: (was: 796358)
Time Spent: 7.5h  (was: 7h 20m)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> When a CTAS query fails after writing the data but before creating the table,
> the data remains in the directory and is currently not cleaned up by the
> cleaner or by any other mechanism. This is because the cleaner requires a
> table corresponding to what it is cleaning. To get around this situation,
> we can pass the relevant information directly to the cleaner so that such
> uncommitted data is deleted.





[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=796357=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-796357
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 29/Jul/22 11:34
Start Date: 29/Jul/22 11:34
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r933142617


##
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/create/CtasQueryLifeTimeHook.java:
##
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hive.ql.ddl.table.create;
+
+import org.apache.hadoop.hive.conf.HiveConf;
+import org.apache.hadoop.hive.metastore.api.CompactionRequest;
+import org.apache.hadoop.hive.metastore.api.CompactionType;
+import org.apache.hadoop.hive.metastore.api.MetaException;
+import org.apache.hadoop.hive.metastore.txn.TxnStore;
+import org.apache.hadoop.hive.metastore.txn.TxnUtils;
+import org.apache.hadoop.hive.ql.Context;
+import org.apache.hadoop.hive.ql.hooks.PrivateHookContext;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHook;
+import org.apache.hadoop.hive.ql.hooks.QueryLifeTimeHookContext;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import java.io.IOException;
+
+import static org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.IF_PURGE;
+import static org.apache.hadoop.hive.metastore.api.hive_metastoreConstants.META_TABLE_LOCATION;
+
+public class CtasQueryLifeTimeHook implements QueryLifeTimeHook {

Review Comment:
   Can we make it a generic one, e.g. HiveQueryLifeTimeHook?
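A generic hook could dispatch to per-operation actions instead of hard-coding CTAS. This sketch assumes hypothetical `QueryKind`, `FailureAction` and `GenericQueryLifeTimeHook` types. It shows only the dispatch pattern, not how Hive determines the query type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical query categories; a real hook would derive this from the plan.
enum QueryKind { CTAS, INSERT, SELECT }

// One rollback action per kind of operation that needs cleanup on failure.
interface FailureAction {
  boolean appliesTo(QueryKind kind);
  void onFailure(String queryId);
}

final class GenericQueryLifeTimeHook {
  private final List<FailureAction> actions = new ArrayList<>();

  GenericQueryLifeTimeHook register(FailureAction action) {
    actions.add(action);
    return this;
  }

  // On failure, run every registered action that handles this query kind.
  void afterExecution(QueryKind kind, String queryId, boolean hasError) {
    if (!hasError) {
      return;
    }
    for (FailureAction action : actions) {
      if (action.appliesTo(kind)) {
        action.onFailure(queryId);
      }
    }
  }
}
```

New cleanup cases would then be added by registering another action, leaving the hook class itself unchanged.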





Issue Time Tracking
---

Worklog Id: (was: 796357)
Time Spent: 7h 20m  (was: 7h 10m)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> When a CTAS query fails after writing the data but before creating the table,
> the data remains in the directory and is currently not cleaned up by the
> cleaner or by any other mechanism. This is because the cleaner requires a
> table corresponding to what it is cleaning. To get around this situation,
> we can pass the relevant information directly to the cleaner so that such
> uncommitted data is deleted.





[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=796355=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-796355
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 29/Jul/22 11:31
Start Date: 29/Jul/22 11:31
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r933135808


##
ql/src/java/org/apache/hadoop/hive/ql/HookRunner.java:
##
@@ -56,6 +57,7 @@ public class HookRunner {
   HookRunner(HiveConf conf, SessionState.LogHelper console) {
     this.conf = conf;
     this.hooks = new HiveHooks(conf, console);
+    addNecessaryHooks();

Review Comment:
   `addNecessaryHooks()` will contain all the hooks that are absolutely required from now on. I thought this would be a good way to go about it, but let me know if this function is required so that I can change it.





Issue Time Tracking
---

Worklog Id: (was: 796355)
Time Spent: 7h 10m  (was: 7h)



[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=796354=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-796354
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 29/Jul/22 11:29
Start Date: 29/Jul/22 11:29
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r933135808


##
ql/src/java/org/apache/hadoop/hive/ql/HookRunner.java:
##
@@ -56,6 +57,7 @@ public class HookRunner {
   HookRunner(HiveConf conf, SessionState.LogHelper console) {
     this.conf = conf;
     this.hooks = new HiveHooks(conf, console);
+    addNecessaryHooks();

Review Comment:
   `addNecessaryHooks()` will contain all the hooks that are absolutely required 
from now on. I thought this would be a good way to go about it, but let me know 
whether this function is required so that I can change it.





Issue Time Tracking
---

Worklog Id: (was: 796354)
Time Spent: 7h  (was: 6h 50m)



[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=796353=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-796353
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 29/Jul/22 11:26
Start Date: 29/Jul/22 11:26
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r933135808


##
ql/src/java/org/apache/hadoop/hive/ql/HookRunner.java:
##
@@ -56,6 +57,7 @@ public class HookRunner {
   HookRunner(HiveConf conf, SessionState.LogHelper console) {
     this.conf = conf;
     this.hooks = new HiveHooks(conf, console);
+    addNecessaryHooks();

Review Comment:
   `addNecessaryHooks()` will contain all the hooks that are absolutely required 
from now on. I thought this would be a good way to go about it, but let me know if 
I should change it and add the hook directly.





Issue Time Tracking
---

Worklog Id: (was: 796353)
Time Spent: 6h 50m  (was: 6h 40m)



[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-29 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=796352=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-796352
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 29/Jul/22 11:22
Start Date: 29/Jul/22 11:22
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r933133215


##
ql/src/java/org/apache/hadoop/hive/ql/HookRunner.java:
##
@@ -56,6 +57,7 @@ public class HookRunner {
   HookRunner(HiveConf conf, SessionState.LogHelper console) {
     this.conf = conf;
     this.hooks = new HiveHooks(conf, console);
+    addNecessaryHooks();

Review Comment:
   just add it directly

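A minimal sketch of the "add it directly" suggestion, assuming a simplified registry: the mandatory cleanup hook is appended inside the constructor instead of going through a separate helper. `Hook`, `HookRegistry` and `CtasCleanupHook` are hypothetical names used only for illustration; they are not Hive's `HiveHooks` or `HookRunner` API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical hook abstraction; not Hive's hook interfaces.
interface Hook {
  String name();
}

// Hypothetical stand-in for a hook that must always be registered.
class CtasCleanupHook implements Hook {
  @Override
  public String name() {
    return "ctas-cleanup";
  }
}

class HookRegistry {
  private final List<Hook> hooks = new ArrayList<>();

  HookRegistry(List<Hook> configuredHooks) {
    // Hooks loaded from configuration come first.
    hooks.addAll(configuredHooks);
    // The mandatory hook is added inline, with no separate helper method.
    hooks.add(new CtasCleanupHook());
  }

  List<Hook> getHooks() {
    return Collections.unmodifiableList(hooks);
  }
}
```

Inlining keeps the constructor self-describing; a dedicated helper arguably only pays off once several mandatory hooks accumulate.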




Issue Time Tracking
---

Worklog Id: (was: 796352)
Time Spent: 6h 40m  (was: 6.5h)



[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=795674=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-795674
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 27/Jul/22 14:23
Start Date: 27/Jul/22 14:23
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r931126498


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -485,6 +493,27 @@ private void clearLocksAndHB() {
     stopHeartbeat();
   }
 
+  private void cleanupOutputDir(Context ctx) throws MetaException {
+    if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.TXN_CTAS_X_LOCK)) {
+      Table destinationTable = ctx.getDestinationTable();
+      if (destinationTable != null) {
+        try {
+          CompactionRequest rqst = new CompactionRequest(
+              destinationTable.getDbName(), destinationTable.getTableName(), CompactionType.MAJOR);
+          rqst.setRunas(TxnUtils.findUserToRunAs(destinationTable.getSd().getLocation(),
+              destinationTable.getTTable(), conf));
+
+          rqst.putToProperties(META_TABLE_LOCATION, destinationTable.getSd().getLocation());
+          rqst.putToProperties(IF_PURGE, Boolean.toString(true));
+          TxnStore txnHandler = TxnUtils.getTxnStore(conf);

Review Comment:
   > btw, would it be hard to create a completionHook similar to the Iceberg one?
   
   We could create one, but it would only cover failures within query execution.
   Anything done after query execution (post-execution activities) would be outside 
its scope, which is why I decided against the hook approach.
   
   The hooks are invoked as part of the finally block here:
   
https://github.com/apache/hive/blob/b197ed86029f07696e326acb5878f86c286e9e1a/ql/src/java/org/apache/hadoop/hive/ql/Executor.java#L118
   
   Cleanup would then depend on a HiveConf setting, `hive.query.lifetime.hooks`.

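To illustrate the scope argument, here is a minimal, hypothetical model (not Hive's `Executor` or `Driver` code): hooks fire in a `finally` block around execution, so they observe execution failures, but a step performed afterwards, such as releasing locks, only runs once the hooks have already completed. `LifecycleSketch` and its event names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a query lifecycle; not Hive's Executor/Driver code.
class LifecycleSketch {
  final List<String> events = new ArrayList<>();

  private void execute(boolean failExecution) {
    try {
      events.add("execute");
      if (failExecution) {
        throw new IllegalStateException("execution failed");
      }
    } finally {
      // Lifetime hooks run here whether or not execution failed.
      events.add("hooks");
    }
  }

  void run(boolean failExecution) {
    try {
      execute(failExecution);
    } catch (IllegalStateException e) {
      events.add("error: " + e.getMessage());
    }
    // Post-execution work (for example, releasing locks) happens after the
    // hooks have already run, so a failure here is outside their scope.
    events.add("releaseLocks");
  }
}
```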




Issue Time Tracking
---

Worklog Id: (was: 795674)
Time Spent: 6.5h  (was: 6h 20m)



[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=795673=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-795673
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 27/Jul/22 14:21
Start Date: 27/Jul/22 14:21
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r931126498


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -485,6 +493,27 @@ private void clearLocksAndHB() {
     stopHeartbeat();
   }
 
+  private void cleanupOutputDir(Context ctx) throws MetaException {
+    if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.TXN_CTAS_X_LOCK)) {
+      Table destinationTable = ctx.getDestinationTable();
+      if (destinationTable != null) {
+        try {
+          CompactionRequest rqst = new CompactionRequest(
+              destinationTable.getDbName(), destinationTable.getTableName(), CompactionType.MAJOR);
+          rqst.setRunas(TxnUtils.findUserToRunAs(destinationTable.getSd().getLocation(),
+              destinationTable.getTTable(), conf));
+
+          rqst.putToProperties(META_TABLE_LOCATION, destinationTable.getSd().getLocation());
+          rqst.putToProperties(IF_PURGE, Boolean.toString(true));
+          TxnStore txnHandler = TxnUtils.getTxnStore(conf);

Review Comment:
   > btw, would it be hard to create a completionHook similar to the Iceberg one?
   
   We could create one, but it would only cover failures within query execution.
   Anything done after query execution (post-execution activities like releasing 
locks) would be outside its scope, which is why I decided against the hook 
approach.
   
   The hooks are invoked as part of the finally block here:
   
https://github.com/apache/hive/blob/b197ed86029f07696e326acb5878f86c286e9e1a/ql/src/java/org/apache/hadoop/hive/ql/Executor.java#L118
   
   Cleanup would then depend on a HiveConf setting, `hive.query.lifetime.hooks`.





Issue Time Tracking
---

Worklog Id: (was: 795673)
Time Spent: 6h 20m  (was: 6h 10m)



[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=795575=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-795575
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 27/Jul/22 08:55
Start Date: 27/Jul/22 08:55
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r930799014


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -485,6 +493,27 @@ private void clearLocksAndHB() {
     stopHeartbeat();
   }
 
+  private void cleanupOutputDir(Context ctx) throws MetaException {
+    if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.TXN_CTAS_X_LOCK)) {
+      Table destinationTable = ctx.getDestinationTable();
+      if (destinationTable != null) {
+        try {
+          CompactionRequest rqst = new CompactionRequest(
+              destinationTable.getDbName(), destinationTable.getTableName(), CompactionType.MAJOR);
+          rqst.setRunas(TxnUtils.findUserToRunAs(destinationTable.getSd().getLocation(),
+              destinationTable.getTTable(), conf));
+
+          rqst.putToProperties(META_TABLE_LOCATION, destinationTable.getSd().getLocation());
+          rqst.putToProperties(IF_PURGE, Boolean.toString(true));
+          TxnStore txnHandler = TxnUtils.getTxnStore(conf);

Review Comment:
   btw, would it be hard to create a completionHook similar to the Iceberg one?





Issue Time Tracking
---

Worklog Id: (was: 795575)
Time Spent: 6h 10m  (was: 6h)



[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=795571=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-795571
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 27/Jul/22 08:53
Start Date: 27/Jul/22 08:53
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r930796903


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -485,6 +493,27 @@ private void clearLocksAndHB() {
     stopHeartbeat();
   }
 
+  private void cleanupOutputDir(Context ctx) throws MetaException {
+    if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.TXN_CTAS_X_LOCK)) {
+      Table destinationTable = ctx.getDestinationTable();
+      if (destinationTable != null) {
+        try {
+          CompactionRequest rqst = new CompactionRequest(
+              destinationTable.getDbName(), destinationTable.getTableName(), CompactionType.MAJOR);
+          rqst.setRunas(TxnUtils.findUserToRunAs(destinationTable.getSd().getLocation(),
+              destinationTable.getTTable(), conf));
+
+          rqst.putToProperties(META_TABLE_LOCATION, destinationTable.getSd().getLocation());
+          rqst.putToProperties(IF_PURGE, Boolean.toString(true));
+          TxnStore txnHandler = TxnUtils.getTxnStore(conf);

Review Comment:
   it would be expensive to create a new txnStore every time

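One way to avoid rebuilding the store on every call is to create it lazily once and reuse it. This is a minimal sketch assuming a store instance can safely be reused across calls; `LazyStore` is a hypothetical helper, not a Hive class, and the factory would be something like `() -> TxnUtils.getTxnStore(conf)` from the diff above.

```java
import java.util.function.Supplier;

// Hypothetical helper: creates an expensive object once and reuses it.
class LazyStore<T> {
  private final Supplier<T> factory;
  private T instance;

  LazyStore(Supplier<T> factory) {
    this.factory = factory;
  }

  // synchronized so that concurrent callers still trigger only one creation.
  synchronized T get() {
    if (instance == null) {
      instance = factory.get();
    }
    return instance;
  }
}
```

A plain field initialized in the constructor would also work if eager creation is acceptable; the lazy form only pays the cost when a rollback actually needs cleanup.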




Issue Time Tracking
---

Worklog Id: (was: 795571)
Time Spent: 6h  (was: 5h 50m)



[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=795567=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-795567
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 27/Jul/22 08:51
Start Date: 27/Jul/22 08:51
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r930794617


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -598,6 +627,16 @@ public void rollbackTxn() throws LockException {
     }
   }
 
+  @Override
+  public void rollbackTxn(Context ctx) throws LockException {
+    try {
+      cleanupOutputDir(ctx);
+    } catch (TException e) {
+      throw new LockException(ErrorMsg.METASTORE_COMMUNICATION_FAILED.getMsg(), e);
+    }
+    rollbackTxn();

Review Comment:
   we should call rollback first and then try the cleanup; if the cleanup fails, 
we'll never mark the txn as aborted

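The ordering suggested above can be sketched as follows. This is a hypothetical illustration, not Hive's `HiveTxnManager` API: `TxnOps`, `rollback()`, `requestCleanup()` and `RollbackThenCleanup` are invented names. The transaction is aborted first, and only then is cleanup attempted, so a cleanup failure can no longer leave the transaction un-aborted.

```java
// Hypothetical operations; not Hive's HiveTxnManager API.
interface TxnOps {
  void rollback() throws Exception;

  void requestCleanup() throws Exception;
}

final class RollbackThenCleanup {
  private RollbackThenCleanup() {}

  static void rollbackWithCleanup(TxnOps ops) throws Exception {
    // 1. Abort the transaction first; if this throws, no cleanup is attempted.
    ops.rollback();
    try {
      // 2. Request removal of the uncommitted data.
      ops.requestCleanup();
    } catch (Exception e) {
      // The transaction is already aborted at this point; only cleanup failed.
      throw new Exception("transaction aborted, but cleanup request failed", e);
    }
  }
}
```

Rethrowing the cleanup failure is a design choice; a caller that wants cleanup to be purely best-effort could log the exception instead.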




Issue Time Tracking
---

Worklog Id: (was: 795567)
Time Spent: 5h 50m  (was: 5h 40m)



[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=795565=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-795565
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 27/Jul/22 08:48
Start Date: 27/Jul/22 08:48
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r930790946


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveTxnManager.java:
##
@@ -161,7 +161,7 @@ void replTableWriteIdState(String validWriteIdList, String dbName, String tableN
    * @throws LockException if there is no current transaction or the
    * transaction has already been committed or aborted.
    */
-  void rollbackTxn() throws LockException;
+  void rollbackTxn(Context ctx) throws LockException;

Review Comment:
   True; however, Impala could still be using the old API
   





Issue Time Tracking
---

Worklog Id: (was: 795565)
Time Spent: 5h 40m  (was: 5.5h)



[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=795137=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-795137
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 26/Jul/22 06:33
Start Date: 26/Jul/22 06:33
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r929574342


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveTxnManager.java:
##
@@ -161,7 +161,7 @@ void replTableWriteIdState(String validWriteIdList, String dbName, String tableN
    * @throws LockException if there is no current transaction or the
    * transaction has already been committed or aborted.
    */
-  void rollbackTxn() throws LockException;
+  void rollbackTxn(Context ctx) throws LockException;

Review Comment:
   Added `void rollbackTxn(Context ctx)` and retained the old signature. Done.
   
   However, most places will use `void rollbackTxn(Context ctx)` from now on.

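Keeping both signatures could look roughly like this, assuming Java 8+ default methods; whether Hive's `HiveTxnManager` actually uses a default method for this is an assumption. `TxnManagerApi`, `QueryContext` and `LegacyTxnManager` are hypothetical names. The point is that an implementation written only against the old no-argument method (the kind of external caller, such as Impala, raised earlier in the review) still compiles and works when callers switch to the new overload.

```java
// Hypothetical types for illustration; not Hive's HiveTxnManager or Context.
class QueryContext {
  final String destinationTable; // null when the query has no CTAS target

  QueryContext(String destinationTable) {
    this.destinationTable = destinationTable;
  }
}

interface TxnManagerApi {
  // Original signature, kept unchanged for existing callers and implementors.
  void rollbackTxn() throws Exception;

  // New overload. The default body ignores the context and delegates to the
  // old method, so implementations written against the old API still compile.
  default void rollbackTxn(QueryContext ctx) throws Exception {
    rollbackTxn();
  }
}

// An implementation that only knows about the original method.
class LegacyTxnManager implements TxnManagerApi {
  int rollbacks = 0;

  @Override
  public void rollbackTxn() {
    rollbacks++;
  }
}
```

An implementation that supports cleanup would override `rollbackTxn(QueryContext)` and use the context; everything else falls back to plain rollback.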




Issue Time Tracking
---

Worklog Id: (was: 795137)
Time Spent: 5.5h  (was: 5h 20m)



[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=794884=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794884
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 25/Jul/22 12:56
Start Date: 25/Jul/22 12:56
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r928844426


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveTxnManager.java:
##
@@ -161,7 +161,7 @@ void replTableWriteIdState(String validWriteIdList, String dbName, String tableN
    * @throws LockException if there is no current transaction or the
    * transaction has already been committed or aborted.
    */
-  void rollbackTxn() throws LockException;
+  void rollbackTxn(Context ctx) throws LockException;

Review Comment:
   Do you mean retaining this function:
   `void rollbackTxn() throws LockException;`
as well as creating a new function:
   `void rollbackTxn(Context ctx) throws LockException;` in the interface?





Issue Time Tracking
---

Worklog Id: (was: 794884)
Time Spent: 5h 20m  (was: 5h 10m)



[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=794878=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794878
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 25/Jul/22 12:50
Start Date: 25/Jul/22 12:50
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r928844426


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveTxnManager.java:
##
@@ -161,7 +161,7 @@ void replTableWriteIdState(String validWriteIdList, String dbName, String tableN
    * @throws LockException if there is no current transaction or the
    * transaction has already been committed or aborted.
    */
-  void rollbackTxn() throws LockException;
+  void rollbackTxn(Context ctx) throws LockException;

Review Comment:
   Do you mean retaining this function:
   `void rollbackTxn() throws LockException;`
as well as creating a new function:
   `void rollbackTxn(Context ctx) throws LockException;` in the interface?





Issue Time Tracking
---

Worklog Id: (was: 794878)
Time Spent: 5h 10m  (was: 5h)



[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=794877=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794877
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 25/Jul/22 12:48
Start Date: 25/Jul/22 12:48
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r928844426


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveTxnManager.java:
##
@@ -161,7 +161,7 @@ void replTableWriteIdState(String validWriteIdList, String dbName, String tableN
    * @throws LockException if there is no current transaction or the
    * transaction has already been committed or aborted.
    */
-  void rollbackTxn() throws LockException;
+  void rollbackTxn(Context ctx) throws LockException;

Review Comment:
   Do you mean retaining this function:
   `void rollbackTxn() throws LockException;`
as well as creating a new one:
   `void rollbackTxn(Context ctx) throws LockException;` in the interface?





Issue Time Tracking
---

Worklog Id: (was: 794877)
Time Spent: 5h  (was: 4h 50m)



[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=794876=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794876
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 25/Jul/22 12:47
Start Date: 25/Jul/22 12:47
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r928844426


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveTxnManager.java:
##
@@ -161,7 +161,7 @@ void replTableWriteIdState(String validWriteIdList, String dbName, String tableN
    * @throws LockException if there is no current transaction or the
    * transaction has already been committed or aborted.
    */
-  void rollbackTxn() throws LockException;
+  void rollbackTxn(Context ctx) throws LockException;

Review Comment:
   Do you mean retaining this function:
   `void rollbackTxn() throws LockException;`
as well as creating a new one:
   `void rollbackTxn(Context ctx) throws LockException;`?





Issue Time Tracking
---

Worklog Id: (was: 794876)
Time Spent: 4h 50m  (was: 4h 40m)



[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=794875=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794875
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 25/Jul/22 12:42
Start Date: 25/Jul/22 12:42
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r928839830


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveTxnManager.java:
##
@@ -161,7 +161,7 @@ void replTableWriteIdState(String validWriteIdList, String dbName, String tableN
* @throws LockException if there is no current transaction or the
* transaction has already been committed or aborted.
*/
-  void rollbackTxn() throws LockException;
+  void rollbackTxn(Context ctx) throws LockException;

Review Comment:
   This is an API change; we should keep the original signature.





Issue Time Tracking
---

Worklog Id: (was: 794875)
Time Spent: 4h 40m  (was: 4.5h)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=794715=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794715
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 25/Jul/22 06:04
Start Date: 25/Jul/22 06:04
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r928497915


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -485,6 +496,26 @@ private void clearLocksAndHB() {
 stopHeartbeat();
   }
 
+  private void cleanupDirForCTAS() throws LockException {

Review Comment:
   Done



##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -485,6 +496,26 @@ private void clearLocksAndHB() {
 stopHeartbeat();
   }
 
+  private void cleanupDirForCTAS() throws LockException {
+    if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.TXN_CTAS_X_LOCK)) {
+      if (destinationTable != null) {
+        try {
+          CompactionRequest rqst = new CompactionRequest(
+              destinationTable.getDbName(), destinationTable.getTableName(), CompactionType.MAJOR);
+          rqst.setRunas(TxnUtils.findUserToRunAs(destinationTable.getSd().getLocation(),
+              destinationTable.getTTable(), conf));
+
+          rqst.putToProperties(META_TABLE_LOCATION, destinationTable.getSd().getLocation());
+          rqst.putToProperties(IF_PURGE, Boolean.toString(true));
+          TxnStore txnHandler = TxnUtils.getTxnStore(conf);
+          txnHandler.submitForCleanup(rqst, destinationTable.getTTable().getWriteId(), getCurrentTxnId());
+        } catch (InterruptedException | IOException | MetaException e) {

Review Comment:
   Done





Issue Time Tracking
---

Worklog Id: (was: 794715)
Time Spent: 4.5h  (was: 4h 20m)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>


[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=794714=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794714
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 25/Jul/22 06:04
Start Date: 25/Jul/22 06:04
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r928497740


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -485,6 +480,26 @@ private void clearLocksAndHB() {
 stopHeartbeat();
   }
 
+  private void cleanupDirForCTAS() {

Review Comment:
   Done





Issue Time Tracking
---

Worklog Id: (was: 794714)
Time Spent: 4h 20m  (was: 4h 10m)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=794193=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794193
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 22/Jul/22 12:15
Start Date: 22/Jul/22 12:15
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r927592887


##
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:
##
@@ -7852,6 +7852,10 @@ protected Operator genFileSinkPlan(String dest, QB qb, Operator input)
   throw new SemanticException("Error while getting the full qualified path for the given directory: " + ex.getMessage());
 }
   }
+
+  if (!isNonNativeTable && AcidUtils.isTransactionalTable(destinationTable) && qb.isCTAS()) {

Review Comment:
   I checked with @lcspinter offline: Iceberg tables are treated as external 
tables, and Iceberg has its own cleanup implementation here:
   
https://github.com/apache/hive/blob/master/iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergQueryLifeTimeHook.java
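The SemanticAnalyzer condition under review can be read as a small decision table. This Java sketch restates it with hypothetical boolean inputs (`nonNative`, `transactional`, `ctas`) standing in for `isNonNativeTable`, `AcidUtils.isTransactionalTable(destinationTable)` and `qb.isCTAS()`; per the comment above, Iceberg tables land in the non-native branch and rely on `HiveIcebergQueryLifeTimeHook` instead.

```java
// Restates: !isNonNativeTable && AcidUtils.isTransactionalTable(destinationTable) && qb.isCTAS()
final class CtasCleanupPolicy {
  private CtasCleanupPolicy() {}

  // Returns true when an aborted CTAS should hand its output directory to the cleaner.
  // nonNative: table uses a storage handler (e.g. Iceberg, which has its own cleanup hook).
  // transactional: destination is an ACID table.
  // ctas: the statement is CREATE TABLE ... AS SELECT.
  static boolean needsCleanerOnAbort(boolean nonNative, boolean transactional, boolean ctas) {
    return !nonNative && transactional && ctas;
  }
}
```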





Issue Time Tracking
---

Worklog Id: (was: 794193)
Time Spent: 4h 10m  (was: 4h)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=794187=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794187
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 22/Jul/22 11:41
Start Date: 22/Jul/22 11:41
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r927568434


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -485,6 +480,26 @@ private void clearLocksAndHB() {
 stopHeartbeat();
   }
 
+  private void cleanupDirForCTAS() {
+    if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.TXN_CTAS_X_LOCK)) {

Review Comment:
   Yes, the cleanup is meant for the case where there is no concurrent CTAS and a 
single query fails. If exclusive locking is disabled and two CTAS queries run 
concurrently against the same location, and the first one fails, the cleanup 
triggered by the first query would purge the location the second query is 
writing to.
   
   This is the situation I want to avoid, which is why the cleanup is performed 
only when exclusive locking for CTAS is enabled.
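The race described above can be illustrated with a toy Java model in which a table location is just a set of file names. Query A writes and then fails; query B writes to the same location and succeeds. `exclusiveLock` stands in for `TXN_CTAS_X_LOCK` and `cleanupOnAbort` for A purging the location on rollback; the class and file names are hypothetical simplifications, not Hive code.

```java
import java.util.Set;
import java.util.TreeSet;

final class CtasCleanupRace {
  private CtasCleanupRace() {}

  // Toy model: one table location shared by CTAS query A (fails) and CTAS query B (succeeds).
  // exclusiveLock models TXN_CTAS_X_LOCK; cleanupOnAbort models A purging the location on rollback.
  static Set<String> simulate(boolean exclusiveLock, boolean cleanupOnAbort) {
    Set<String> location = new TreeSet<>();
    location.add("A_part0");            // A writes its output, then fails
    if (exclusiveLock) {
      // B is blocked by A's exclusive lock, so A's cleanup only ever sees A's files.
      if (cleanupOnAbort) {
        location.clear();
      }
      location.add("B_part0");          // B runs after A has released the lock
    } else {
      location.add("B_part0");          // B writes into the same location concurrently
      if (cleanupOnAbort) {
        location.clear();               // A's purge deletes B's output too
      }
    }
    return location;
  }

  // The gate from the discussion: clean up on abort only when the exclusive CTAS lock is on.
  static Set<String> simulateGated(boolean exclusiveLock) {
    return simulate(exclusiveLock, exclusiveLock);
  }
}
```

Ungated, `simulate(false, true)` leaves the location empty, deleting B's output along with A's. With the gate, disabling the lock leaves A's files behind as garbage, which is the safer failure mode.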





Issue Time Tracking
---

Worklog Id: (was: 794187)
Time Spent: 4h  (was: 3h 50m)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h
>  Remaining Estimate: 0h
>


[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=794185=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794185
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 22/Jul/22 11:39
Start Date: 22/Jul/22 11:39
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r927568434


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -485,6 +480,26 @@ private void clearLocksAndHB() {
 stopHeartbeat();
   }
 
+  private void cleanupDirForCTAS() {
+    if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.TXN_CTAS_X_LOCK)) {

Review Comment:
   Yes, the cleanup is meant for the case where there is no concurrent CTAS and a 
single query fails. If exclusive locking is disabled and two CTAS queries run 
concurrently against the same location, and the first one fails, the cleanup 
triggered by the first query would purge the location the second query is 
writing to.
   
   This is the situation I want to avoid, which is why the cleanup is performed 
only when exclusive locking for CTAS is enabled.





Issue Time Tracking
---

Worklog Id: (was: 794185)
Time Spent: 3h 50m  (was: 3h 40m)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=794160=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794160
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 22/Jul/22 10:08
Start Date: 22/Jul/22 10:08
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r927504663


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -485,6 +496,26 @@ private void clearLocksAndHB() {
 stopHeartbeat();
   }
 
+  private void cleanupDirForCTAS() throws LockException {
+    if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.TXN_CTAS_X_LOCK)) {
+      if (destinationTable != null) {
+        try {
+          CompactionRequest rqst = new CompactionRequest(
+              destinationTable.getDbName(), destinationTable.getTableName(), CompactionType.MAJOR);
+          rqst.setRunas(TxnUtils.findUserToRunAs(destinationTable.getSd().getLocation(),
+              destinationTable.getTTable(), conf));
+
+          rqst.putToProperties(META_TABLE_LOCATION, destinationTable.getSd().getLocation());
+          rqst.putToProperties(IF_PURGE, Boolean.toString(true));
+          TxnStore txnHandler = TxnUtils.getTxnStore(conf);
+          txnHandler.submitForCleanup(rqst, destinationTable.getTTable().getWriteId(), getCurrentTxnId());
+        } catch (InterruptedException | IOException | MetaException e) {

Review Comment:
   We could simply do the following:
   
   private void cleanupOutputDir() throws MetaException {
     ..
     } catch (InterruptedException | IOException e) {
       throwMetaException(e);
     }
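The suggestion above is to funnel several checked exceptions into the single type the method declares. A minimal Java sketch of that pattern, using the `LockException` direction the patch later took, is below. `MetaException`, `LockException`, `CleanupSubmitter` and `Submission` are simplified, hypothetical stand-ins. The sketch also keeps the original exception as the cause and restores the thread's interrupt flag on `InterruptedException`; the latter is a common Java convention, not something stated in this thread.

```java
import java.io.IOException;

// Hypothetical stand-in for Hive's Thrift-generated MetaException.
class MetaException extends Exception {
  MetaException(String message) {
    super(message);
  }
}

// Hypothetical stand-in for org.apache.hadoop.hive.ql.lockmgr.LockException.
class LockException extends Exception {
  LockException(String message, Throwable cause) {
    super(message, cause);
  }
}

final class CleanupSubmitter {
  private CleanupSubmitter() {}

  // Hypothetical unit of work that may fail with any of the three checked exceptions.
  interface Submission {
    void run() throws InterruptedException, IOException, MetaException;
  }

  // Exposes a single checked exception type to callers and keeps the original as the cause.
  static void submitForCleanup(Submission submission) throws LockException {
    try {
      submission.run();
    } catch (InterruptedException e) {
      // Restore the interrupt status so code further up the stack can still see it.
      Thread.currentThread().interrupt();
      throw new LockException("Interrupted while submitting cleanup of CTAS output", e);
    } catch (IOException | MetaException e) {
      throw new LockException("Not able to submit cleanup of directory written by CTAS", e);
    }
  }
}
```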
   





Issue Time Tracking
---

Worklog Id: (was: 794160)
Time Spent: 3h 40m  (was: 3.5h)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=794157=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794157
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 22/Jul/22 10:00
Start Date: 22/Jul/22 10:00
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r927498250


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -485,6 +496,26 @@ private void clearLocksAndHB() {
 stopHeartbeat();
   }
 
+  private void cleanupDirForCTAS() throws LockException {

Review Comment:
   I would make the method name more generic, like `cleanupResultDir` or 
`cleanupOutputDir`, so it can be extended in the future.





Issue Time Tracking
---

Worklog Id: (was: 794157)
Time Spent: 3.5h  (was: 3h 20m)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>


[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=794150=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794150
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 22/Jul/22 09:50
Start Date: 22/Jul/22 09:50
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r927489420


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -485,6 +480,26 @@ private void clearLocksAndHB() {
 stopHeartbeat();
   }
 
+  private void cleanupDirForCTAS() {
+    if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.TXN_CTAS_X_LOCK)) {

Review Comment:
   Shouldn't we also clean up when there is no concurrent CTAS but the user 
cancels the operation midway?
   





Issue Time Tracking
---

Worklog Id: (was: 794150)
Time Spent: 3h 20m  (was: 3h 10m)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=794148=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794148
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 22/Jul/22 09:48
Start Date: 22/Jul/22 09:48
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r927487444


##
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:
##
@@ -7852,6 +7852,10 @@ protected Operator genFileSinkPlan(String dest, QB qb, Operator input)
   throw new SemanticException("Error while getting the full qualified path for the given directory: " + ex.getMessage());
 }
   }
+
+  if (!isNonNativeTable && AcidUtils.isTransactionalTable(destinationTable) && qb.isCTAS()) {

Review Comment:
   Is there a reason why we should exclude them? Are additional Iceberg-related 
changes required?





Issue Time Tracking
---

Worklog Id: (was: 794148)
Time Spent: 3h 10m  (was: 3h)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=794147=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794147
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 22/Jul/22 09:46
Start Date: 22/Jul/22 09:46
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r927485805


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -485,6 +480,26 @@ private void clearLocksAndHB() {
 stopHeartbeat();
   }
 
+  private void cleanupDirForCTAS() {

Review Comment:
   As you mentioned, we need to add an overloaded `rollbackTxn` method to the 
`HiveTxnManager` interface that accepts a `Context ctx` parameter.





Issue Time Tracking
---

Worklog Id: (was: 794147)
Time Spent: 3h  (was: 2h 50m)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>


[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=794133=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794133
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 22/Jul/22 08:49
Start Date: 22/Jul/22 08:49
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r927437027


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -485,6 +480,26 @@ private void clearLocksAndHB() {
 stopHeartbeat();
   }
 
+  private void cleanupDirForCTAS() {
+    if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.TXN_CTAS_X_LOCK)) {
+      if (destinationTable != null) {
+        try {
+          CompactionRequest rqst = new CompactionRequest(
+              destinationTable.getDbName(), destinationTable.getTableName(), CompactionType.MAJOR);
+          rqst.setRunas(TxnUtils.findUserToRunAs(destinationTable.getSd().getLocation(),
+              destinationTable.getTTable(), conf));
+
+          rqst.putToProperties("location", destinationTable.getSd().getLocation());

Review Comment:
   Done





Issue Time Tracking
---

Worklog Id: (was: 794133)
Time Spent: 2h 50m  (was: 2h 40m)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=794108=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794108
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 22/Jul/22 08:09
Start Date: 22/Jul/22 08:09
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r927401945


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -485,6 +480,26 @@ private void clearLocksAndHB() {
 stopHeartbeat();
   }
 
+  private void cleanupDirForCTAS() {
+    if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.TXN_CTAS_X_LOCK)) {
+      if (destinationTable != null) {
+        try {
+          CompactionRequest rqst = new CompactionRequest(
+              destinationTable.getDbName(), destinationTable.getTableName(), CompactionType.MAJOR);
+          rqst.setRunas(TxnUtils.findUserToRunAs(destinationTable.getSd().getLocation(),
+              destinationTable.getTTable(), conf));
+
+          rqst.putToProperties("location", destinationTable.getSd().getLocation());
+          rqst.putToProperties("ifPurge", Boolean.toString(true));
+          TxnStore txnHandler = TxnUtils.getTxnStore(conf);
+          txnHandler.submitForCleanup(rqst, destinationTable.getTTable().getWriteId(), getCurrentTxnId());
+        } catch (InterruptedException | IOException | MetaException e) {
+          throw new RuntimeException("Not able to submit cleanup operation of directory written by CTAS");

Review Comment:
   All three exceptions can be thrown inside the `try` block. Changed it to throw 
`LockException` instead of `RuntimeException`.





Issue Time Tracking
---

Worklog Id: (was: 794108)
Time Spent: 2h 40m  (was: 2.5h)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=794105=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794105
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 22/Jul/22 08:08
Start Date: 22/Jul/22 08:08
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r927401158


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -485,6 +480,26 @@ private void clearLocksAndHB() {
 stopHeartbeat();
   }
 
+  private void cleanupDirForCTAS() {
+    if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.TXN_CTAS_X_LOCK)) {
+      if (destinationTable != null) {
+        try {
+          CompactionRequest rqst = new CompactionRequest(
+              destinationTable.getDbName(), destinationTable.getTableName(), CompactionType.MAJOR);
+          rqst.setRunas(TxnUtils.findUserToRunAs(destinationTable.getSd().getLocation(),
+              destinationTable.getTTable(), conf));
+
+          rqst.putToProperties("location", destinationTable.getSd().getLocation());
+          rqst.putToProperties("ifPurge", Boolean.toString(true));

Review Comment:
   Created a constant called `IF_PURGE`. Done.



##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -485,6 +480,26 @@ private void clearLocksAndHB() {
 stopHeartbeat();
   }
 
+  private void cleanupDirForCTAS() {
+    if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.TXN_CTAS_X_LOCK)) {
+      if (destinationTable != null) {
+        try {
+          CompactionRequest rqst = new CompactionRequest(
+              destinationTable.getDbName(), destinationTable.getTableName(), CompactionType.MAJOR);
+          rqst.setRunas(TxnUtils.findUserToRunAs(destinationTable.getSd().getLocation(),
+              destinationTable.getTTable(), conf));
+
+          rqst.putToProperties("location", destinationTable.getSd().getLocation());
+          rqst.putToProperties("ifPurge", Boolean.toString(true));

Review Comment:
   Created a constant called `IF_PURGE`. Done.
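Replacing the string literals with named constants means the code that builds the cleanup request and the code that reads it cannot drift apart. A minimal Java sketch, assuming a hypothetical `CleanupRequestSketch` in place of the Thrift `CompactionRequest`: the constant names follow the later revision of the patch (`META_TABLE_LOCATION`, `IF_PURGE`), but their values here simply mirror the earlier literals (`"location"`, `"ifPurge"`), since the real definitions are not shown in this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the Thrift CompactionRequest; only the properties map is modelled.
final class CleanupRequestSketch {
  // Named keys instead of scattered string literals (values mirror the earlier literals).
  static final String META_TABLE_LOCATION = "location";
  static final String IF_PURGE = "ifPurge";

  final String dbName;
  final String tableName;
  final Map<String, String> properties = new HashMap<>();

  CleanupRequestSketch(String dbName, String tableName) {
    this.dbName = dbName;
    this.tableName = tableName;
  }

  void putToProperties(String key, String value) {
    properties.put(key, value);
  }

  // A reader on the cleaner side looks the key up through the same constant.
  boolean isPurge() {
    return Boolean.parseBoolean(properties.getOrDefault(IF_PURGE, "false"));
  }

  // Builds the request the way the diff does: table location plus a purge flag.
  static CleanupRequestSketch forCtasOutput(String db, String table, String location) {
    CleanupRequestSketch rqst = new CleanupRequestSketch(db, table);
    rqst.putToProperties(META_TABLE_LOCATION, location);
    rqst.putToProperties(IF_PURGE, Boolean.toString(true));
    return rqst;
  }
}
```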





Issue Time Tracking
---

Worklog Id: (was: 794105)
Time Spent: 2h 20m  (was: 2h 10m)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=794107=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794107
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 22/Jul/22 08:08
Start Date: 22/Jul/22 08:08
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r927401945


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -485,6 +480,26 @@ private void clearLocksAndHB() {
 stopHeartbeat();
   }
 
+  private void cleanupDirForCTAS() {
+if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.TXN_CTAS_X_LOCK)) {
+  if (destinationTable != null) {
+try {
+  CompactionRequest rqst = new CompactionRequest(
+  destinationTable.getDbName(), destinationTable.getTableName(), CompactionType.MAJOR);
+  rqst.setRunas(TxnUtils.findUserToRunAs(destinationTable.getSd().getLocation(),
+  destinationTable.getTTable(), conf));
+
+  rqst.putToProperties("location", destinationTable.getSd().getLocation());
+  rqst.putToProperties("ifPurge", Boolean.toString(true));
+  TxnStore txnHandler = TxnUtils.getTxnStore(conf);
+  txnHandler.submitForCleanup(rqst, destinationTable.getTTable().getWriteId(), getCurrentTxnId());
+} catch (InterruptedException | IOException | MetaException e) {
+  throw new RuntimeException("Not able to submit cleanup operation of directory written by CTAS");

Review Comment:
   All three exceptions are thrown within the code. Changed it to LockException 
instead of RuntimeException.
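Wrapping the three checked exceptions in a single `LockException` might look like the sketch below. The `LockException` and `MetaException` classes here are minimal local stand-ins, not Hive's real types, and `CtasCleanupSubmitter` and `Submission` are hypothetical names. Passing the original exception as the cause keeps the root failure visible, and restoring the interrupt flag on `InterruptedException` is an added design choice that the diff does not show.

```java
import java.io.IOException;

// Minimal local stand-ins for the exception types named in the review.
class MetaException extends Exception {
  MetaException(String msg) { super(msg); }
}

class LockException extends Exception {
  LockException(String msg, Throwable cause) { super(msg, cause); }
}

final class CtasCleanupSubmitter {
  private CtasCleanupSubmitter() {}

  // Hypothetical hook for a cleanup submission that can fail in three ways.
  interface Submission {
    void run() throws InterruptedException, IOException, MetaException;
  }

  // Translates every failure into a checked LockException, keeping the cause.
  static void submit(Submission submission) throws LockException {
    String msg = "Not able to submit cleanup operation of directory written by CTAS";
    try {
      submission.run();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status for callers
      throw new LockException(msg, e);
    } catch (IOException | MetaException e) {
      throw new LockException(msg, e);
    }
  }
}
```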





Issue Time Tracking
---

Worklog Id: (was: 794107)
Time Spent: 2.5h  (was: 2h 20m)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h


[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=794104=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794104
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 22/Jul/22 08:07
Start Date: 22/Jul/22 08:07
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r927400722


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -29,19 +29,10 @@ Licensed to the Apache Software Foundation (ASF) under one
 import org.apache.hadoop.hive.metastore.IMetaStoreClient;
 import org.apache.hadoop.hive.metastore.LockComponentBuilder;
 import org.apache.hadoop.hive.metastore.LockRequestBuilder;
-import org.apache.hadoop.hive.metastore.api.LockComponent;
-import org.apache.hadoop.hive.metastore.api.LockResponse;
-import org.apache.hadoop.hive.metastore.api.LockState;
-import org.apache.hadoop.hive.metastore.api.MetaException;
-import org.apache.hadoop.hive.metastore.api.NoSuchLockException;
-import org.apache.hadoop.hive.metastore.api.NoSuchTxnException;
-import org.apache.hadoop.hive.metastore.api.TxnAbortedException;
-import org.apache.hadoop.hive.metastore.api.TxnToWriteId;
-import org.apache.hadoop.hive.metastore.api.CommitTxnRequest;
-import org.apache.hadoop.hive.metastore.api.DataOperationType;
-import org.apache.hadoop.hive.metastore.api.GetOpenTxnsResponse;
-import org.apache.hadoop.hive.metastore.api.TxnType;
+import org.apache.hadoop.hive.metastore.api.*;

Review Comment:
   Done





Issue Time Tracking
---

Worklog Id: (was: 794104)
Time Spent: 2h 10m  (was: 2h)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h


[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=794100=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794100
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 22/Jul/22 07:51
Start Date: 22/Jul/22 07:51
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r927387412


##
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:
##
@@ -7852,6 +7852,10 @@ protected Operator genFileSinkPlan(String dest, QB qb, Operator input)
   throw new SemanticException("Error while getting the full qualified path for the given directory: " + ex.getMessage());
 }
   }
+
+  if (!isNonNativeTable && AcidUtils.isTransactionalTable(destinationTable) && qb.isCTAS()) {

Review Comment:
   Yes





Issue Time Tracking
---

Worklog Id: (was: 794100)
Time Spent: 2h  (was: 1h 50m)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h


[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=794071=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794071
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 22/Jul/22 06:07
Start Date: 22/Jul/22 06:07
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r927295389


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -485,6 +480,26 @@ private void clearLocksAndHB() {
 stopHeartbeat();
   }
 
+  private void cleanupDirForCTAS() {

Review Comment:
   Currently `Context ctx` is not available during `rollbackTxn`, which is why I 
chose to store the object.
   However, I agree that passing `Context ctx` is better.
   There are multiple ways of doing this:
   The first would be to pass `Context ctx` to the `rollbackTxn` method, which 
would change the HiveTxnManager API itself (I particularly don't like this, 
since it would be a breaking change).
   
   Or we could create a new function of the same name in the `HiveTxnManager` 
interface and call it from the driver when the rollback conditions are satisfied.
   
   My idea was to avoid both and initialise the `destinationTable` in one of 
the existing APIs, but I am open to other suggestions.
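The second option (a new method on the interface) need not break existing implementers if it is added as a Java `default` method. A minimal sketch, assuming hypothetical names throughout: `TxnManagerSketch` stands in for `HiveTxnManager`, `cleanupAfterFailedCtas` is an invented hook, and a `String` location stands in for `Context ctx`.

```java
// Hypothetical, heavily simplified stand-in for a transaction-manager interface.
interface TxnManagerSketch {
  void rollbackTxn();

  // New hook with a default no-op body: existing implementations still compile,
  // so adding it is not a source-breaking API change. The String parameter
  // stands in for a richer query context.
  default boolean cleanupAfterFailedCtas(String destinationLocation) {
    return false;
  }
}

// An implementation written before the hook existed keeps working unchanged.
class LegacyTxnManager implements TxnManagerSketch {
  @Override
  public void rollbackTxn() {}
}

// A newer implementation opts in by overriding the hook.
class CtasAwareTxnManager implements TxnManagerSketch {
  String lastCleaned;

  @Override
  public void rollbackTxn() {}

  @Override
  public boolean cleanupAfterFailedCtas(String destinationLocation) {
    lastCleaned = destinationLocation; // a real version would queue a cleanup request
    return true;
  }
}
```

The driver would call the hook after a rollback; implementations that never override it simply do nothing.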





Issue Time Tracking
---

Worklog Id: (was: 794071)
Time Spent: 1h 50m  (was: 1h 40m)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h


[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=794063=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794063
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 22/Jul/22 05:47
Start Date: 22/Jul/22 05:47
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r927309557


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -485,6 +480,26 @@ private void clearLocksAndHB() {
 stopHeartbeat();
   }
 
+  private void cleanupDirForCTAS() {
+if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.TXN_CTAS_X_LOCK)) {

Review Comment:
   The idea is that the `destinationTable` object is set only when CTAS operations 
are performed 
([here](https://github.com/apache/hive/pull/3457/files#diff-d4b1a32bbbd9e283893a6b52854c7aeb3e356a1ba1add2c4107e52901ca268f9R7856)).
 So there is no inherent need to check whether the operation is CTAS.
   
   As for the exclusive lock: cleanup should only be performed when an exclusive 
lock is acquired; otherwise the cleaner could be cleaning the directory while 
concurrent CTAS operations write to the same location, causing issues.
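The guard described above can be sketched in isolation. Everything here is hypothetical: the boolean stands in for `HiveConf.ConfVars.TXN_CTAS_X_LOCK`, the nullable `destinationLocation` stands in for `destinationTable` (set only while planning a CTAS), and a list records what would be submitted to the cleaner.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the two conditions checked before CTAS cleanup.
final class CtasRollbackGuard {
  private final boolean ctasExclusiveLockEnabled; // stands in for TXN_CTAS_X_LOCK
  private String destinationLocation;             // null unless the statement is a CTAS
  final List<String> submittedForCleanup = new ArrayList<>();

  CtasRollbackGuard(boolean ctasExclusiveLockEnabled) {
    this.ctasExclusiveLockEnabled = ctasExclusiveLockEnabled;
  }

  // Called while planning a CTAS into a transactional table.
  void recordCtasDestination(String location) {
    this.destinationLocation = location;
  }

  // Called on rollback. A non-null destination already implies CTAS, and the
  // exclusive-lock setting ensures no concurrent CTAS writes to the same directory.
  void onRollback() {
    if (ctasExclusiveLockEnabled && destinationLocation != null) {
      submittedForCleanup.add(destinationLocation);
    }
  }
}
```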





Issue Time Tracking
---

Worklog Id: (was: 794063)
Time Spent: 1h 40m  (was: 1.5h)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h


[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=794058=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794058
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 22/Jul/22 05:13
Start Date: 22/Jul/22 05:13
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r927295389


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -485,6 +480,26 @@ private void clearLocksAndHB() {
 stopHeartbeat();
   }
 
+  private void cleanupDirForCTAS() {

Review Comment:
   Currently `Context ctx` is not available during `rollbackTxn`, which is why I 
chose to store the object.
   However, I agree that passing `Context ctx` is better.
   There are multiple ways of doing this:
   The first would be to pass `Context ctx` to the `rollbackTxn` method, which 
would change the HiveTxnManager API itself (I particularly don't like this, 
since it would be a breaking change).
   
   Or we could create a new function of the same name in the `HiveTxnManager` 
interface and call it from the driver when the rollback conditions are satisfied.
   
   My idea was to avoid both and initialise the destination in one of the 
existing APIs, but I am open to other suggestions.





Issue Time Tracking
---

Worklog Id: (was: 794058)
Time Spent: 1.5h  (was: 1h 20m)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h


[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=793599=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793599
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 21/Jul/22 08:33
Start Date: 21/Jul/22 08:33
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r926391951


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -485,6 +480,26 @@ private void clearLocksAndHB() {
 stopHeartbeat();
   }
 
+  private void cleanupDirForCTAS() {

Review Comment:
   could we pass a `Context ctx` here instead of initializing the destination 
table when acquiring locks? 



##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -485,6 +480,26 @@ private void clearLocksAndHB() {
 stopHeartbeat();
   }
 
+  private void cleanupDirForCTAS() {
+if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.TXN_CTAS_X_LOCK)) {

Review Comment:
   Shouldn't we check that this was a CTAS operation? Do we only need to 
clean up if an exclusive lock is enabled?



##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -485,6 +480,26 @@ private void clearLocksAndHB() {
 stopHeartbeat();
   }
 
+  private void cleanupDirForCTAS() {
+if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.TXN_CTAS_X_LOCK)) {
+  if (destinationTable != null) {
+try {
+  CompactionRequest rqst = new CompactionRequest(
+  destinationTable.getDbName(), destinationTable.getTableName(), CompactionType.MAJOR);
+  rqst.setRunas(TxnUtils.findUserToRunAs(destinationTable.getSd().getLocation(),
+  destinationTable.getTTable(), conf));
+
+  rqst.putToProperties("location", destinationTable.getSd().getLocation());

Review Comment:
   use hive_metastoreConstants.META_TABLE_LOCATION?



##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -29,19 +29,10 @@ Licensed to the Apache Software Foundation (ASF) under one
 import org.apache.hadoop.hive.metastore.IMetaStoreClient;
 import org.apache.hadoop.hive.metastore.LockComponentBuilder;
 import org.apache.hadoop.hive.metastore.LockRequestBuilder;
-import org.apache.hadoop.hive.metastore.api.LockComponent;
-import org.apache.hadoop.hive.metastore.api.LockResponse;
-import org.apache.hadoop.hive.metastore.api.LockState;
-import org.apache.hadoop.hive.metastore.api.MetaException;
-import org.apache.hadoop.hive.metastore.api.NoSuchLockException;
-import org.apache.hadoop.hive.metastore.api.NoSuchTxnException;
-import org.apache.hadoop.hive.metastore.api.TxnAbortedException;
-import org.apache.hadoop.hive.metastore.api.TxnToWriteId;
-import org.apache.hadoop.hive.metastore.api.CommitTxnRequest;
-import org.apache.hadoop.hive.metastore.api.DataOperationType;
-import org.apache.hadoop.hive.metastore.api.GetOpenTxnsResponse;
-import org.apache.hadoop.hive.metastore.api.TxnType;
+import org.apache.hadoop.hive.metastore.api.*;

Review Comment:
   please avoid wildcard imports



##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -485,6 +480,26 @@ private void clearLocksAndHB() {
 stopHeartbeat();
   }
 
+  private void cleanupDirForCTAS() {
+if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.TXN_CTAS_X_LOCK)) {
+  if (destinationTable != null) {
+try {
+  CompactionRequest rqst = new CompactionRequest(
+  destinationTable.getDbName(), destinationTable.getTableName(), CompactionType.MAJOR);
+  rqst.setRunas(TxnUtils.findUserToRunAs(destinationTable.getSd().getLocation(),
+  destinationTable.getTTable(), conf));
+
+  rqst.putToProperties("location", destinationTable.getSd().getLocation());
+  rqst.putToProperties("ifPurge", Boolean.toString(true));
+  TxnStore txnHandler = TxnUtils.getTxnStore(conf);
+  txnHandler.submitForCleanup(rqst, destinationTable.getTTable().getWriteId(), getCurrentTxnId());
+} catch (InterruptedException | IOException | MetaException e) {
+  throw new RuntimeException("Not able to submit cleanup operation of directory written by CTAS");

Review Comment:
   should we catch just InterruptedException | IOException and re-throw 
MetaException here?
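The narrower catch suggested here can be sketched as follows: only `InterruptedException` and `IOException` are wrapped, while `MetaException` is declared in the method signature and propagates unchanged. `MetaException` is a local stand-in for the metastore type, and `NarrowCatchSketch` and `Submission` are hypothetical names.

```java
import java.io.IOException;

// Minimal local stand-in for the metastore's checked MetaException.
class MetaException extends Exception {
  MetaException(String msg) { super(msg); }
}

final class NarrowCatchSketch {
  private NarrowCatchSketch() {}

  interface Submission {
    void run() throws InterruptedException, IOException, MetaException;
  }

  // Only the non-metastore failures are wrapped; MetaException is re-thrown
  // as-is so callers can still handle it as a metastore error.
  static void submit(Submission submission) throws MetaException {
    try {
      submission.run();
    } catch (InterruptedException | IOException e) {
      if (e instanceof InterruptedException) {
        Thread.currentThread().interrupt(); // keep the interrupt visible to callers
      }
      throw new RuntimeException("Not able to submit cleanup operation of directory written by CTAS", e);
    }
  }
}
```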



##
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:
##
@@ -7852,6 +7852,10 @@ protected Operator genFileSinkPlan(String dest, QB qb, Operator input)
   throw new SemanticException("Error while getting the full qualified path for the given directory: " + ex.getMessage());
 }
   }
+
+  if (!isNonNativeTable && AcidUtils.isTransactionalTable(destinationTable) && qb.isCTAS()) {

Review 

[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=793132=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793132
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 20/Jul/22 09:44
Start Date: 20/Jul/22 09:44
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r925400948


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -485,6 +480,26 @@ private void clearLocksAndHB() {
 stopHeartbeat();
   }
 
+  private void cleanupDirForCTAS() {

Review Comment:
   Thanks for the explanation!





Issue Time Tracking
---

Worklog Id: (was: 793132)
Time Spent: 1h 10m  (was: 1h)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h


[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=793110=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793110
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 20/Jul/22 08:59
Start Date: 20/Jul/22 08:59
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r925356465


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -485,6 +480,26 @@ private void clearLocksAndHB() {
 stopHeartbeat();
   }
 
+  private void cleanupDirForCTAS() {

Review Comment:
   The table is created here, as part of DDLTask:
   
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/ddl/table/create/CreateTableOperation.java#L104-L117
   
   The test case I have written sets the contents of CreateTableDesc to null 
so that table creation fails. In reality, if a failure occurs before the table 
is created, the written data stays behind, is not associated with any table, 
and is never cleaned up.





Issue Time Tracking
---

Worklog Id: (was: 793110)
Time Spent: 1h  (was: 50m)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h


[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=793108=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793108
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 20/Jul/22 08:51
Start Date: 20/Jul/22 08:51
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r925344280


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -485,6 +480,26 @@ private void clearLocksAndHB() {
 stopHeartbeat();
   }
 
+  private void cleanupDirForCTAS() {

Review Comment:
   Yes, you are right. Since this scenario (the test case I wrote) covers a 
failure that occurs after the write but before the table is created, only the 
directory needs to be deleted.
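Because no table was ever registered in the metastore, the cleanup reduces to deleting the orphaned directory. In Hive that deletion is delegated to the cleaner, as the diff shows; the sketch below only illustrates the idea on the local file system with `java.nio.file`, and `OrphanDirCleanup` and `deleteRecursively` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative only: deletes an orphaned CTAS output directory on the local file
// system. No metastore call is made, because no table was ever created.
final class OrphanDirCleanup {
  private OrphanDirCleanup() {}

  // Deletes dir and everything under it; returns false if dir did not exist.
  static boolean deleteRecursively(Path dir) throws IOException {
    if (!Files.exists(dir)) {
      return false;
    }
    List<Path> ordered;
    try (Stream<Path> paths = Files.walk(dir)) {
      // Reverse lexicographic order puts every child before its parent directory.
      ordered = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : ordered) {
      Files.delete(p);
    }
    return true;
  }
}
```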





Issue Time Tracking
---

Worklog Id: (was: 793108)
Time Spent: 50m  (was: 40m)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h


[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=793103=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793103
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 20/Jul/22 08:48
Start Date: 20/Jul/22 08:48
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r925344280


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -485,6 +480,26 @@ private void clearLocksAndHB() {
 stopHeartbeat();
   }
 
+  private void cleanupDirForCTAS() {

Review Comment:
   Yes, you are correct.





Issue Time Tracking
---

Worklog Id: (was: 793103)
Time Spent: 0.5h  (was: 20m)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h


[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=793104=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793104
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 20/Jul/22 08:48
Start Date: 20/Jul/22 08:48
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r925344280


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -485,6 +480,26 @@ private void clearLocksAndHB() {
 stopHeartbeat();
   }
 
+  private void cleanupDirForCTAS() {

Review Comment:
   Yes, you are right.





Issue Time Tracking
---

Worklog Id: (was: 793104)
Time Spent: 40m  (was: 0.5h)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h


[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=793093=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793093
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 20/Jul/22 08:35
Start Date: 20/Jul/22 08:35
Worklog Time Spent: 10m 
  Work Description: pvary commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r925331780


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -485,6 +480,26 @@ private void clearLocksAndHB() {
 stopHeartbeat();
   }
 
+  private void cleanupDirForCTAS() {

Review Comment:
   Do I understand correctly that we do not create a table for CTAS; we just 
create a `destinationTable` object that behaves like a table but is not 
created in the HMS?
   So we do not really have to drop the table, we just have to drop 
the directory?





Issue Time Tracking
---

Worklog Id: (was: 793093)
Time Spent: 20m  (was: 10m)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h


[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=793076=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793076
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 20/Jul/22 07:54
Start Date: 20/Jul/22 07:54
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya opened a new pull request, #3457:
URL: https://github.com/apache/hive/pull/3457

   …f uncommitted data
   
   
   
   ### What changes were proposed in this pull request?
   
   Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data 
when the failure happens after the write but before the table is created.
   
   
   ### Why are the changes needed?
   
   The data will lie around without cleanup and will accumulate.
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   Unit test
   




Issue Time Tracking
---

Worklog Id: (was: 793076)
Remaining Estimate: 0h
Time Spent: 10m

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h