[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=784107&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-784107 ]

ASF GitHub Bot logged work on HIVE-26244:

Author: ASF GitHub Bot
Created on: 23/Jun/22 09:55
Start Date: 23/Jun/22 09:55
Worklog Time Spent: 10m
Work Description: deniskuzZ merged PR #3307:
URL: https://github.com/apache/hive/pull/3307

Issue Time Tracking
---
Worklog Id: (was: 784107)
Time Spent: 10.5h (was: 10h 20m)

> Implementing locking for concurrent ctas
>
> Key: HIVE-26244
> URL: https://issues.apache.org/jira/browse/HIVE-26244
> Project: Hive
> Issue Type: Improvement
> Reporter: Simhadri G
> Assignee: Simhadri G
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10.5h
> Remaining Estimate: 0h

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=783767&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-783767 ]

ASF GitHub Bot logged work on HIVE-26244:

Author: ASF GitHub Bot
Created on: 22/Jun/22 08:35
Start Date: 22/Jun/22 08:35
Worklog Time Spent: 10m
Work Description: deniskuzZ commented on code in PR #3307:
URL: https://github.com/apache/hive/pull/3307#discussion_r903452007

## ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java: ##

@@ -3225,6 +3237,14 @@ public static String getPathSuffix(long txnId) {
     return (SOFT_DELETE_PATH_SUFFIX + String.format(DELTA_DIGITS, txnId));
   }
 
+  public static boolean isExclusiveCTASEnabled(Configuration conf) {
+    return HiveConf.getBoolVar(conf, ConfVars.TXN_CTAS_X_LOCK);
+  }
+
+  public static boolean isExclusiveCTASEnabled(Table t, Configuration conf) {
+    return HiveConf.getBoolVar(conf, ConfVars.TXN_CTAS_X_LOCK) && isTransactionalTable(t);

Review Comment:
If you'll be submitting new fixes, could we please rename just this method to `isTableExclusiveCTASEnabled` to avoid confusion with `isExclusiveCTASEnabled`, or just remove `isExclusiveCTASEnabled`? It's used in 1 place, so you could directly access conf there.

Issue Time Tracking
---
Worklog Id: (was: 783767)
Time Spent: 10h 20m (was: 10h 10m)
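The naming concern in the review comment above can be illustrated with a minimal, self-contained sketch. Everything here is a stand-in rather than Hive code: `Table` is a hypothetical class holding only a transactional flag, the configuration is a plain `Map`, and the key `example.txn.ctas.x.lock` is invented for illustration (the thread only names the constant `ConfVars.TXN_CTAS_X_LOCK`). The sketch assumes the rename option is taken, so the config-only check and the table-aware check no longer share a name.

```java
import java.util.Map;

class ExclusiveCtasFlags {
    // Hypothetical stand-in for a table handle; only the transactional flag matters here.
    static final class Table {
        final boolean transactional;
        Table(boolean transactional) { this.transactional = transactional; }
    }

    // Hypothetical property key, invented for this sketch.
    static final String TXN_CTAS_X_LOCK = "example.txn.ctas.x.lock";

    // Config-only check: usable at call sites where no table is available.
    static boolean isExclusiveCTASEnabled(Map<String, String> conf) {
        return Boolean.parseBoolean(conf.getOrDefault(TXN_CTAS_X_LOCK, "false"));
    }

    // Table-aware check under the name proposed in the review comment.
    static boolean isTableExclusiveCTASEnabled(Table t, Map<String, String> conf) {
        return isExclusiveCTASEnabled(conf) && t.transactional;
    }

    public static void main(String[] args) {
        Map<String, String> conf = Map.of(TXN_CTAS_X_LOCK, "true");
        System.out.println(isExclusiveCTASEnabled(conf));                        // true
        System.out.println(isTableExclusiveCTASEnabled(new Table(true), conf));  // true
        System.out.println(isTableExclusiveCTASEnabled(new Table(false), conf)); // false
    }
}
```

With distinct names, a reader of a call site can tell at a glance whether the transactional-table condition is part of the check.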
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=783765&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-783765 ]

ASF GitHub Bot logged work on HIVE-26244:

Author: ASF GitHub Bot
Created on: 22/Jun/22 08:29
Start Date: 22/Jun/22 08:29
Worklog Time Spent: 10m
Work Description: deniskuzZ commented on code in PR #3307:
URL: https://github.com/apache/hive/pull/3307#discussion_r903446307

## ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java: ##

@@ -3086,6 +3086,13 @@ Seems much cleaner if each stmt is identified as a particular HiveOperation (whi
             output.getWriteType().name()));
         break;
 
+      case CTAS:
+        if (AcidUtils.isExclusiveCTASEnabled(t, conf)) {
+          compBuilder.setExclWrite();
+          compBuilder.setOperationType(DataOperationType.NO_TXN);
+          break;
+        }

Review Comment:
I do not see a `continue` keyword

Issue Time Tracking
---
Worklog Id: (was: 783765)
Time Spent: 10h 10m (was: 10h)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=783753&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-783753 ]

ASF GitHub Bot logged work on HIVE-26244:

Author: ASF GitHub Bot
Created on: 22/Jun/22 08:03
Start Date: 22/Jun/22 08:03
Worklog Time Spent: 10m
Work Description: simhadri-g commented on code in PR #3307:
URL: https://github.com/apache/hive/pull/3307#discussion_r903420540

## ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java: ##

@@ -3086,6 +3086,13 @@ Seems much cleaner if each stmt is identified as a particular HiveOperation (whi
             output.getWriteType().name()));
         break;
 
+      case CTAS:
+        if (AcidUtils.isExclusiveCTASEnabled(t, conf)) {
+          compBuilder.setExclWrite();
+          compBuilder.setOperationType(DataOperationType.NO_TXN);
+          break;
+        }

Review Comment:
Yes, it should continue if it is not a transactional table or the config is not enabled. This was the old behavior as well.

Issue Time Tracking
---
Worklog Id: (was: 783753)
Time Spent: 10h (was: 9h 50m)
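The "continue" in the reply above appears to refer to Java's `switch` fall-through rather than the `continue` keyword: because the `break` sits inside the `if`, a CTAS that does not qualify for the exclusive lock drops into whatever case label follows. A minimal sketch of that control flow is below. The enum, the returned labels, and the `INSERT` case placed after `CTAS` are hypothetical; the thread does not show which case actually follows in `AcidUtils`.

```java
class CtasFallThrough {
    enum WriteType { CTAS, INSERT }

    // Returns an illustrative label for the path taken; the labels are not Hive lock types.
    static String pathFor(WriteType type, boolean exclusiveCtasEnabled) {
        String path = "NONE";
        switch (type) {
            case CTAS:
                if (exclusiveCtasEnabled) {
                    path = "EXCLUSIVE_PATH";
                    break; // leaves the switch only when the exclusive branch is taken
                }
                // no break here: control falls through to the next case label
            case INSERT:
                path = "LEGACY_PATH"; // hypothetical pre-existing handling
                break;
        }
        return path;
    }

    public static void main(String[] args) {
        System.out.println(pathFor(WriteType.CTAS, true));  // EXCLUSIVE_PATH
        System.out.println(pathFor(WriteType.CTAS, false)); // LEGACY_PATH
    }
}
```

An explicit comment at the fall-through point, as in the sketch, is a common way to signal that the missing `break` is intentional.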
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=783749&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-783749 ]

ASF GitHub Bot logged work on HIVE-26244:

Author: ASF GitHub Bot
Created on: 22/Jun/22 07:52
Start Date: 22/Jun/22 07:52
Worklog Time Spent: 10m
Work Description: deniskuzZ commented on code in PR #3307:
URL: https://github.com/apache/hive/pull/3307#discussion_r903406430

## ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java: ##

@@ -3086,6 +3086,13 @@ Seems much cleaner if each stmt is identified as a particular HiveOperation (whi
             output.getWriteType().name()));
         break;
 
+      case CTAS:
+        if (AcidUtils.isExclusiveCTASEnabled(t, conf)) {
+          compBuilder.setExclWrite();
+          compBuilder.setOperationType(DataOperationType.NO_TXN);
+          break;
+        }

Review Comment:
should we continue otherwise?

Issue Time Tracking
---
Worklog Id: (was: 783749)
Time Spent: 9h 50m (was: 9h 40m)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=783735&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-783735 ]

ASF GitHub Bot logged work on HIVE-26244:

Author: ASF GitHub Bot
Created on: 22/Jun/22 07:26
Start Date: 22/Jun/22 07:26
Worklog Time Spent: 10m
Work Description: deniskuzZ commented on code in PR #3307:
URL: https://github.com/apache/hive/pull/3307#discussion_r903379054

## serde/if/test/complex.thrift: ##

@@ -1,3 +1,4 @@
+

Review Comment:
nit: space

Issue Time Tracking
---
Worklog Id: (was: 783735)
Time Spent: 9h 40m (was: 9.5h)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=783140&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-783140 ]

ASF GitHub Bot logged work on HIVE-26244:

Author: ASF GitHub Bot
Created on: 20/Jun/22 22:24
Start Date: 20/Jun/22 22:24
Worklog Time Spent: 10m
Work Description: simhadri-g commented on code in PR #3307:
URL: https://github.com/apache/hive/pull/3307#discussion_r901999420

## ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java: ##

@@ -3086,6 +3086,13 @@ Seems much cleaner if each stmt is identified as a particular HiveOperation (whi
             output.getWriteType().name()));
         break;
 
+      case CTAS:
+        if (AcidUtils.isExclusiveCTASEnabled(conf) && AcidUtils.isTransactionalTable(t)) {

Review Comment:
No, since we do not have table data at the other places where the `isExclusiveCTASEnabled` method is called.

Issue Time Tracking
---
Worklog Id: (was: 783140)
Time Spent: 9.5h (was: 9h 20m)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=783128&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-783128 ]

ASF GitHub Bot logged work on HIVE-26244:

Author: ASF GitHub Bot
Created on: 20/Jun/22 21:47
Start Date: 20/Jun/22 21:47
Worklog Time Spent: 10m
Work Description: simhadri-g commented on code in PR #3307:
URL: https://github.com/apache/hive/pull/3307#discussion_r901984683

## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java: ##

@@ -3163,6 +3168,7 @@ private boolean shouldUpdateTxnComponent(long txnid, LockRequest rqst, LockCompo
       case INSERT:
       case UPDATE:
       case DELETE:
+      case CTAS:

Review Comment:
It was NO_TXN before.

Issue Time Tracking
---
Worklog Id: (was: 783128)
Time Spent: 9h 20m (was: 9h 10m)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=782901&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782901 ]

ASF GitHub Bot logged work on HIVE-26244:

Author: ASF GitHub Bot
Created on: 20/Jun/22 09:20
Start Date: 20/Jun/22 09:20
Worklog Time Spent: 10m
Work Description: deniskuzZ commented on code in PR #3307:
URL: https://github.com/apache/hive/pull/3307#discussion_r901442197

## ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java: ##

@@ -3086,6 +3086,13 @@ Seems much cleaner if each stmt is identified as a particular HiveOperation (whi
             output.getWriteType().name()));
         break;
 
+      case CTAS:
+        if (AcidUtils.isExclusiveCTASEnabled(conf) && AcidUtils.isTransactionalTable(t)) {

Review Comment:
Could we move the `AcidUtils.isTransactionalTable(t)` check inside of `isExclusiveCTASEnabled`?
`AcidUtils.isExclusiveCTASEnabled(t, conf)`

Issue Time Tracking
---
Worklog Id: (was: 782901)
Time Spent: 9h 10m (was: 9h)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=782898&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782898 ]

ASF GitHub Bot logged work on HIVE-26244:

Author: ASF GitHub Bot
Created on: 20/Jun/22 09:19
Start Date: 20/Jun/22 09:19
Worklog Time Spent: 10m
Work Description: deniskuzZ commented on code in PR #3307:
URL: https://github.com/apache/hive/pull/3307#discussion_r901442197

## ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java: ##

@@ -3086,6 +3086,13 @@ Seems much cleaner if each stmt is identified as a particular HiveOperation (whi
             output.getWriteType().name()));
         break;
 
+      case CTAS:
+        if (AcidUtils.isExclusiveCTASEnabled(conf) && AcidUtils.isTransactionalTable(t)) {

Review Comment:
Could we move the `AcidUtils.isTransactionalTable(t)` check inside of `isExclusiveCTASEnabled`?

Issue Time Tracking
---
Worklog Id: (was: 782898)
Time Spent: 9h (was: 8h 50m)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=782879&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782879 ]

ASF GitHub Bot logged work on HIVE-26244:

Author: ASF GitHub Bot
Created on: 20/Jun/22 08:51
Start Date: 20/Jun/22 08:51
Worklog Time Spent: 10m
Work Description: deniskuzZ commented on code in PR #3307:
URL: https://github.com/apache/hive/pull/3307#discussion_r901411913

## ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java: ##

@@ -3104,6 +3111,11 @@ Seems much cleaner if each stmt is identified as a particular HiveOperation (whi
     return lockComponents;
   }
 
+  public static boolean isExclusiveCTAS(List<LockComponent> lockComponents, HiveConf conf) {
+    return lockComponents.stream().anyMatch(lc -> DataOperationType.CTAS == lc.getOperationType()

Review Comment:
use `Set<WriteEntity>` outputs.getType
`isExclusiveCTAS(work.getOutputs())`

Issue Time Tracking
---
Worklog Id: (was: 782879)
Time Spent: 8h 50m (was: 8h 40m)
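The suggestion above, deciding whether a statement is an exclusive CTAS from the plan's outputs instead of from the lock components, might look roughly like the following sketch. `WriteEntity` and `WriteType` here are simplified hypothetical stand-ins, not Hive's classes, and the config lookup is reduced to a boolean parameter. The sketch also assumes that "outputs.getType" in the comment refers to the write type of each output, which the thread does not confirm.

```java
import java.util.Set;

class ExclusiveCtasCheck {
    // Hypothetical subset of write types, for illustration only.
    enum WriteType { CTAS, INSERT, DDL_NO_LOCK }

    // Hypothetical stand-in for a plan output entity.
    static final class WriteEntity {
        private final String name;
        private final WriteType writeType;

        WriteEntity(String name, WriteType writeType) {
            this.name = name;
            this.writeType = writeType;
        }

        WriteType getWriteType() { return writeType; }
    }

    // True when the feature flag is on and at least one output is written by a CTAS.
    static boolean isExclusiveCTAS(Set<WriteEntity> outputs, boolean exclusiveCtasEnabled) {
        return exclusiveCtasEnabled
            && outputs.stream().anyMatch(o -> o.getWriteType() == WriteType.CTAS);
    }

    public static void main(String[] args) {
        Set<WriteEntity> outputs = Set.of(
            new WriteEntity("db.new_table", WriteType.CTAS),
            new WriteEntity("db", WriteType.DDL_NO_LOCK));
        System.out.println(isExclusiveCTAS(outputs, true));  // true
        System.out.println(isExclusiveCTAS(outputs, false)); // false
    }
}
```

Checking the outputs directly avoids needing a dedicated CTAS operation type on the lock components just to recognise the statement.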
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=782877&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782877 ]

ASF GitHub Bot logged work on HIVE-26244:

Author: ASF GitHub Bot
Created on: 20/Jun/22 08:50
Start Date: 20/Jun/22 08:50
Worklog Time Spent: 10m
Work Description: deniskuzZ commented on code in PR #3307:
URL: https://github.com/apache/hive/pull/3307#discussion_r901411913

## ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java: ##

@@ -3104,6 +3111,11 @@ Seems much cleaner if each stmt is identified as a particular HiveOperation (whi
     return lockComponents;
   }
 
+  public static boolean isExclusiveCTAS(List<LockComponent> lockComponents, HiveConf conf) {
+    return lockComponents.stream().anyMatch(lc -> DataOperationType.CTAS == lc.getOperationType()

Review Comment:
use `Set<WriteEntity>` outputs.getType
`isExclusiveCTAS(work.getOutputs())`

## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/OperationType.java: ##

@@ -34,7 +34,8 @@ public enum OperationType {
   INSERT('i', DataOperationType.INSERT),
   UPDATE('u', DataOperationType.UPDATE),
   DELETE('d', DataOperationType.DELETE),
-  COMPACT('c', null);
+  COMPACT('c', null),
+  CTAS('t', DataOperationType.CTAS);

Review Comment:
remove this

Issue Time Tracking
---
Worklog Id: (was: 782877)
Time Spent: 8h 40m (was: 8.5h)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=782876&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782876 ]

ASF GitHub Bot logged work on HIVE-26244:

Author: ASF GitHub Bot
Created on: 20/Jun/22 08:47
Start Date: 20/Jun/22 08:47
Worklog Time Spent: 10m
Work Description: deniskuzZ commented on code in PR #3307:
URL: https://github.com/apache/hive/pull/3307#discussion_r901411252

## ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java: ##

@@ -3086,6 +3086,13 @@ Seems much cleaner if each stmt is identified as a particular HiveOperation (whi
             output.getWriteType().name()));
         break;
 
+      case CTAS:
+        if (AcidUtils.isExclusiveCTASEnabled(conf) && AcidUtils.isTransactionalTable(t)) {
+          compBuilder.setExclWrite();
+          compBuilder.setOperationType(DataOperationType.CTAS);

Review Comment:
set to `NO_TXN`

Issue Time Tracking
---
Worklog Id: (was: 782876)
Time Spent: 8.5h (was: 8h 20m)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=782875&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782875 ]

ASF GitHub Bot logged work on HIVE-26244:

Author: ASF GitHub Bot
Created on: 20/Jun/22 08:41
Start Date: 20/Jun/22 08:41
Worklog Time Spent: 10m
Work Description: deniskuzZ commented on code in PR #3307:
URL: https://github.com/apache/hive/pull/3307#discussion_r901405027

## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java: ##

@@ -310,6 +310,11 @@ abstract class TxnHandler implements TxnStore, TxnStore.MutexAPI {
       "INNER JOIN \"TXNS\" ON \"TC_TXNID\" = \"TXN_ID\" WHERE \"TXN_STATE\" = " + TxnStatus.ABORTED +
       " GROUP BY \"TC_DATABASE\", \"TC_TABLE\", \"TC_PARTITION\" HAVING COUNT(\"TXN_ID\") > ?";
 
+  private static final String EX_CTAS_ERR_MSG =

Review Comment:
rename to `EXCL_CTAS_ERR_MSG`

Issue Time Tracking
---
Worklog Id: (was: 782875)
Time Spent: 8h 20m (was: 8h 10m)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=782874&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782874 ]

ASF GitHub Bot logged work on HIVE-26244:

Author: ASF GitHub Bot
Created on: 20/Jun/22 08:40
Start Date: 20/Jun/22 08:40
Worklog Time Spent: 10m
Work Description: deniskuzZ commented on code in PR #3307:
URL: https://github.com/apache/hive/pull/3307#discussion_r901399288

## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java: ##

@@ -3163,6 +3168,7 @@ private boolean shouldUpdateTxnComponent(long txnid, LockRequest rqst, LockCompo
       case INSERT:
       case UPDATE:
       case DELETE:
+      case CTAS:

Review Comment:
Was the operationType for CTAS = 'i' before, so that we had to include it here? I think that should be removed if we didn't insert before.

Issue Time Tracking
---
Worklog Id: (was: 782874)
Time Spent: 8h 10m (was: 8h)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=782873&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782873 ]

ASF GitHub Bot logged work on HIVE-26244:

Author: ASF GitHub Bot
Created on: 20/Jun/22 08:35
Start Date: 20/Jun/22 08:35
Worklog Time Spent: 10m
Work Description: deniskuzZ commented on code in PR #3307:
URL: https://github.com/apache/hive/pull/3307#discussion_r901399288

## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java: ##

@@ -3163,6 +3168,7 @@ private boolean shouldUpdateTxnComponent(long txnid, LockRequest rqst, LockCompo
       case INSERT:
       case UPDATE:
       case DELETE:
+      case CTAS:

Review Comment:
Was the operationType for CTAS = 'i' before, so that we had to include it here?

Issue Time Tracking
---
Worklog Id: (was: 782873)
Time Spent: 8h (was: 7h 50m)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=782870&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782870 ]

ASF GitHub Bot logged work on HIVE-26244:

Author: ASF GitHub Bot
Created on: 20/Jun/22 08:33
Start Date: 20/Jun/22 08:33
Worklog Time Spent: 10m
Work Description: deniskuzZ commented on code in PR #3307:
URL: https://github.com/apache/hive/pull/3307#discussion_r901397117

## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java: ##

@@ -5280,23 +5291,24 @@ is performed on that db (e.g. show tables, created table, etc).
         LOG.debug("Failure to acquire lock({} intLockId:{} {}), blocked by ({})", JavaUtils.lockIdToString(extLockId),
             intLockId, JavaUtils.txnIdToString(txnId), blockedBy);
 
-        if (zeroWaitReadEnabled && isValidTxn(txnId)) {
+        if ((zeroWaitReadEnabled || isExclusiveCTAS) && isValidTxn(txnId)) {
           LockType lockType = LockTypeUtil.getLockTypeFromEncoding(lockChar)
-              .orElseThrow(() -> new MetaException("Unknown lock type: " + lockChar));
-
-          if (lockType == LockType.SHARED_READ) {
-            String cleanupQuery = "DELETE FROM \"HIVE_LOCKS\" WHERE \"HL_LOCK_EXT_ID\" = " + extLockId;
+            .orElseThrow(() -> new MetaException("Unknown lock type: " + lockChar));
 
-            LOG.debug("Going to execute query: <" + cleanupQuery + ">");
-            stmt.executeUpdate(cleanupQuery);
+          if (lockType == LockType.SHARED_READ || isExclusiveCTAS) {
+            if (!isExclusiveCTAS) {

Review Comment:
The delete from HIVE_LOCKS should be done for isExclusiveCTAS as well.

Issue Time Tracking
---
Worklog Id: (was: 782870)
Time Spent: 7h 50m (was: 7h 40m)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=782864&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782864 ]

ASF GitHub Bot logged work on HIVE-26244:

Author: ASF GitHub Bot
Created on: 20/Jun/22 08:24
Start Date: 20/Jun/22 08:24
Worklog Time Spent: 10m
Work Description: deniskuzZ commented on code in PR #3307:
URL: https://github.com/apache/hive/pull/3307#discussion_r901388096

## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java: ##

@@ -3163,6 +3168,7 @@ private boolean shouldUpdateTxnComponent(long txnid, LockRequest rqst, LockCompo
       case INSERT:
       case UPDATE:
       case DELETE:
+      case CTAS:

Review Comment:
Do we need a new operationType here?

Issue Time Tracking
---
Worklog Id: (was: 782864)
Time Spent: 7.5h (was: 7h 20m)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=782865&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782865 ]

ASF GitHub Bot logged work on HIVE-26244:

Author: ASF GitHub Bot
Created on: 20/Jun/22 08:24
Start Date: 20/Jun/22 08:24
Worklog Time Spent: 10m
Work Description: deniskuzZ commented on code in PR #3307:
URL: https://github.com/apache/hive/pull/3307#discussion_r901388685

## common/src/java/org/apache/hadoop/hive/conf/HiveConf.java: ##

@@ -4678,6 +4678,12 @@ public static enum ConfVars {
     HIVE_ACID_DIRECT_INSERT_ENABLED("hive.acid.direct.insert.enabled", true,
         "Enable writing the data files directly to the table's final destination instead of the staging directory." +
         "This optimization only applies on INSERT operations on ACID tables."),
+    HIVE_ACID_CHECK_FOR_CONCURRENT_CTAS_ENABLED("hive.acid.check.for.concurrent.ctas.enabled", false,

Review Comment:
unused config

Issue Time Tracking
---
Worklog Id: (was: 782865)
Time Spent: 7h 40m (was: 7.5h)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=782251&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782251 ]

ASF GitHub Bot logged work on HIVE-26244:

Author: ASF GitHub Bot
Created on: 17/Jun/22 06:37
Start Date: 17/Jun/22 06:37
Worklog Time Spent: 10m
Work Description: simhadri-g commented on code in PR #3307:
URL: https://github.com/apache/hive/pull/3307#discussion_r899807718

## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java: ##

@@ -5297,6 +5303,28 @@ is performed on that db (e.g. show tables, created table, etc).
           return response;
         }
       }
+
+      if (checkForConcurrentCtas && isValidTxn(txnId)) {
+        LockType lockType = LockTypeUtil.getLockTypeFromEncoding(lockChar)
+            .orElseThrow(() -> new MetaException("Unknown lock type: " + lockChar));
+
+        if (lockType == LockType.EXCL_WRITE && blockedBy.state == LockState.ACQUIRED) {
+
+          String deleteBlockedByTxnComp = "DELETE FROM \"TXN_COMPONENTS\" WHERE" + " \"TC_TXNID\"=" + txnId;

Review Comment:
Realized that the cleaner will take care of this. I have removed the delete query in the recent commit.

Issue Time Tracking
---
Worklog Id: (was: 782251)
Time Spent: 7h 10m (was: 7h)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=782252&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782252 ]

ASF GitHub Bot logged work on HIVE-26244:

Author: ASF GitHub Bot
Created on: 17/Jun/22 06:37
Start Date: 17/Jun/22 06:37
Worklog Time Spent: 10m
Work Description: simhadri-g commented on code in PR #3307:
URL: https://github.com/apache/hive/pull/3307#discussion_r899808001

## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java: ##

@@ -5297,6 +5303,28 @@ is performed on that db (e.g. show tables, created table, etc).
           return response;
         }
       }
+
+      if (checkForConcurrentCtas && isValidTxn(txnId)) {

Review Comment:
done

Issue Time Tracking
---
Worklog Id: (was: 782252)
Time Spent: 7h 20m (was: 7h 10m)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=781817&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781817 ]

ASF GitHub Bot logged work on HIVE-26244:

Author: ASF GitHub Bot
Created on: 15/Jun/22 19:59
Start Date: 15/Jun/22 19:59
Worklog Time Spent: 10m
Work Description: simhadri-g commented on code in PR #3307:
URL: https://github.com/apache/hive/pull/3307#discussion_r898367198

## ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java: ##

@@ -3104,6 +3112,15 @@ Seems much cleaner if each stmt is identified as a particular HiveOperation (whi
     return lockComponents;
   }
 
+  public static boolean isCTASOperation(List<LockComponent> lockComponents, HiveConf conf) {
+    boolean isCtas = false;
+    for (LockComponent lock : lockComponents) {
+      if (lock.getOperationType().name().equals(OperationType.CTAS.name()) &&
+          conf.getBoolVar(ConfVars.HIVE_ACID_CHECK_FOR_CONCURRENT_CTAS_ENABLED))

Review Comment:
Done.

Issue Time Tracking
---
Worklog Id: (was: 781817)
Time Spent: 7h (was: 6h 50m)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=781808&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781808 ]

ASF GitHub Bot logged work on HIVE-26244:

Author: ASF GitHub Bot
Created on: 15/Jun/22 19:40
Start Date: 15/Jun/22 19:40
Worklog Time Spent: 10m
Work Description: simhadri-g commented on code in PR #3307:
URL: https://github.com/apache/hive/pull/3307#discussion_r898345184

## common/src/java/org/apache/hadoop/hive/conf/HiveConf.java: ##

@@ -4678,6 +4678,9 @@ public static enum ConfVars {
     HIVE_ACID_DIRECT_INSERT_ENABLED("hive.acid.direct.insert.enabled", true,
         "Enable writing the data files directly to the table's final destination instead of the staging directory." +
         "This optimization only applies on INSERT operations on ACID tables."),
+    HIVE_ACID_CHECK_FOR_CONCURRENT_CTAS_ENABLED("hive.acid.check.for.concurrent.ctas.enabled", false,

Review Comment:
done

Issue Time Tracking
---
Worklog Id: (was: 781808)
Time Spent: 6h 50m (was: 6h 40m)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=781807&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781807 ] ASF GitHub Bot logged work on HIVE-26244: - Author: ASF GitHub Bot Created on: 15/Jun/22 19:39 Start Date: 15/Jun/22 19:39 Worklog Time Spent: 10m Work Description: simhadri-g commented on code in PR #3307: URL: https://github.com/apache/hive/pull/3307#discussion_r898344978 ## ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java: ## @@ -420,6 +420,7 @@ LockState acquireLocks(QueryPlan plan, Context ctx, String username, boolean isB } List lockComponents = AcidUtils.makeLockComponents(plan.getOutputs(), plan.getInputs(), ctx.getOperation(), conf); + rqstBuilder.setCheckForConcurrentCtas(AcidUtils.isCTASOperation(lockComponents, conf)); Review Comment: sure Issue Time Tracking --- Worklog Id: (was: 781807) Time Spent: 6h 40m (was: 6.5h) > Implementing locking for concurrent ctas > > > Key: HIVE-26244 > URL: https://issues.apache.org/jira/browse/HIVE-26244 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Major > Labels: pull-request-available > Time Spent: 6h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=781806&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781806 ] ASF GitHub Bot logged work on HIVE-26244: - Author: ASF GitHub Bot Created on: 15/Jun/22 19:39 Start Date: 15/Jun/22 19:39 Worklog Time Spent: 10m Work Description: simhadri-g commented on code in PR #3307: URL: https://github.com/apache/hive/pull/3307#discussion_r898344710 ## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java: ## @@ -5297,6 +5303,28 @@ is performed on that db (e.g. show tables, created table, etc). return response; } } + +if (checkForConcurrentCtas && isValidTxn(txnId)) { + LockType lockType = LockTypeUtil.getLockTypeFromEncoding(lockChar) + .orElseThrow(() -> new MetaException("Unknown lock type: " + lockChar)); + + if (lockType == LockType.EXCL_WRITE && blockedBy.state == LockState.ACQUIRED) { + +String deleteBlockedByTxnComp = "DELETE FROM \"TXN_COMPONENTS\" WHERE" + " \"TC_TXNID\"=" + txnId; Review Comment: No, txn abort still leaves behind entries in TXN_COMPONENTS . This happens even when an insert query is aborted. Looks like a bug. Below TXN_ID = 2 was aborted. But TXN_COMPONENTS still has entry. 
```
mysql> select * from TXNS;
+--------+-----------+---------------+--------------------+----------+-----------+----------------+---------------+---------------------+----------+
| TXN_ID | TXN_STATE | TXN_STARTED   | TXN_LAST_HEARTBEAT | TXN_USER | TXN_HOST  | TXN_AGENT_INFO | TXN_META_INFO | TXN_HEARTBEAT_COUNT | TXN_TYPE |
+--------+-----------+---------------+--------------------+----------+-----------+----------------+---------------+---------------------+----------+
|      0 | c         |             0 |                  0 |          |           | NULL           | NULL          |                NULL |     NULL |
|      1 | c         | 1655320638716 |      1655320638716 | hive     | localhost | NULL           | NULL          |                NULL |        0 |
|      2 | a         | 1655320644989 |      1655320664896 | hive     | localhost | NULL           | NULL          |                NULL |        0 |
|      3 | c         | 1655320688403 |      1655320688403 | hive     | localhost | NULL           | NULL          |                NULL |        0 |
|      4 | c         | 1655320719660 |      1655320719660 | hive     | localhost | NULL           | NULL          |                NULL |        0 |

mysql> select * from TXN_COMPONENTS;
+----------+-------------+----------+--------------+-------------------+------------+
| TC_TXNID | TC_DATABASE | TC_TABLE | TC_PARTITION | TC_OPERATION_TYPE | TC_WRITEID |
+----------+-------------+----------+--------------+-------------------+------------+
|        2 | default     | t1       | NULL         | i                 |          1 |
+----------+-------------+----------+--------------+-------------------+------------+
1 row in set (0.00 sec)
```
Issue Time Tracking --- Worklog Id: (was: 781806) Time Spent: 6.5h (was: 6h 20m) > Implementing locking for concurrent ctas > > > Key: HIVE-26244 > URL: https://issues.apache.org/jira/browse/HIVE-26244 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Major > Labels: pull-request-available > Time Spent: 6.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=781103&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781103 ] ASF GitHub Bot logged work on HIVE-26244: - Author: ASF GitHub Bot Created on: 14/Jun/22 11:26 Start Date: 14/Jun/22 11:26 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3307: URL: https://github.com/apache/hive/pull/3307#discussion_r896696959 ## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java: ## @@ -5297,6 +5303,28 @@ is performed on that db (e.g. show tables, created table, etc). return response; } } + +if (checkForConcurrentCtas && isValidTxn(txnId)) { Review Comment: embed this check with the above zeroWaitReadEnabled if ((zeroWaitReadEnabled || isExclusiveCTAS) && isValidTxn(txnId)) { if (lockType == LockType.SHARED_READ || isExclusiveCTAS) { make sure to set proper setErrorMessage in case of exclusiveCTAS Issue Time Tracking --- Worklog Id: (was: 781103) Time Spent: 6h 20m (was: 6h 10m) > Implementing locking for concurrent ctas > > > Key: HIVE-26244 > URL: https://issues.apache.org/jira/browse/HIVE-26244 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Major > Labels: pull-request-available > Time Spent: 6h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
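The combined guard proposed in the review comment above can be read as a single boolean condition. The following is a minimal sketch of that condition only, assuming stand-in types: `ZeroWaitGuardSketch`, its `LockType` enum, `shouldFailFast`, and the message strings are hypothetical and are not the Hive `TxnHandler` code. What the guarded branch actually does is not shown in this thread, so the sketch does not model it.

```java
// Stand-in sketch; none of these names are Hive classes.
final class ZeroWaitGuardSketch {

  // Hypothetical subset of lock types, mirroring the names used in the review.
  enum LockType { SHARED_READ, SHARED_WRITE, EXCL_WRITE, EXCLUSIVE }

  // Mirrors the nested condition from the comment:
  //   if ((zeroWaitReadEnabled || isExclusiveCTAS) && isValidTxn(txnId)) {
  //     if (lockType == LockType.SHARED_READ || isExclusiveCTAS) { ... }
  static boolean shouldFailFast(boolean zeroWaitReadEnabled, boolean isExclusiveCtas,
                                boolean validTxn, LockType lockType) {
    if ((zeroWaitReadEnabled || isExclusiveCtas) && validTxn) {
      return lockType == LockType.SHARED_READ || isExclusiveCtas;
    }
    return false;
  }

  // The reviewer asks for a distinct error message in the exclusive-CTAS case.
  // Both strings below are placeholders invented for this sketch.
  static String errorMessage(boolean isExclusiveCtas) {
    return isExclusiveCtas
        ? "placeholder: blocked by a concurrent CTAS"
        : "placeholder: zero-wait read could not acquire the lock";
  }

  public static void main(String[] args) {
    System.out.println(shouldFailFast(false, true, true, LockType.EXCL_WRITE));
    System.out.println(shouldFailFast(true, false, true, LockType.SHARED_WRITE));
  }
}
```

Folding both features into one guard keeps a single exit path for the "do not wait" case, at the cost of needing the message to distinguish which feature triggered it.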
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=781102&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781102 ] ASF GitHub Bot logged work on HIVE-26244: - Author: ASF GitHub Bot Created on: 14/Jun/22 11:24 Start Date: 14/Jun/22 11:24 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3307: URL: https://github.com/apache/hive/pull/3307#discussion_r896696959 ## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java: ## @@ -5297,6 +5303,28 @@ is performed on that db (e.g. show tables, created table, etc). return response; } } + +if (checkForConcurrentCtas && isValidTxn(txnId)) { Review Comment: embed this check with the above zeroWaitReadEnabled if ((zeroWaitReadEnabled || isExclusiveCTAS) && isValidTxn(txnId)) { if (lockType == LockType.SHARED_READ || isExclusiveCTAS) { Issue Time Tracking --- Worklog Id: (was: 781102) Time Spent: 6h 10m (was: 6h) > Implementing locking for concurrent ctas > > > Key: HIVE-26244 > URL: https://issues.apache.org/jira/browse/HIVE-26244 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Major > Labels: pull-request-available > Time Spent: 6h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=781101&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781101 ] ASF GitHub Bot logged work on HIVE-26244: - Author: ASF GitHub Bot Created on: 14/Jun/22 11:20 Start Date: 14/Jun/22 11:20 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3307: URL: https://github.com/apache/hive/pull/3307#discussion_r896679958 ## ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java: ## @@ -3225,6 +3242,9 @@ public static String getPathSuffix(long txnId) { return (SOFT_DELETE_PATH_SUFFIX + String.format(DELTA_DIGITS, txnId)); } + public static boolean isNoRenameCtasEnabled(Configuration conf) { Review Comment: could we please rename it to isExclusiveCTASEnabled should we check here as well if table is transactional Issue Time Tracking --- Worklog Id: (was: 781101) Time Spent: 6h (was: 5h 50m) > Implementing locking for concurrent ctas > > > Key: HIVE-26244 > URL: https://issues.apache.org/jira/browse/HIVE-26244 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Major > Labels: pull-request-available > Time Spent: 6h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=781100&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781100 ] ASF GitHub Bot logged work on HIVE-26244: - Author: ASF GitHub Bot Created on: 14/Jun/22 11:20 Start Date: 14/Jun/22 11:20 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3307: URL: https://github.com/apache/hive/pull/3307#discussion_r896679958 ## ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java: ## @@ -3225,6 +3242,9 @@ public static String getPathSuffix(long txnId) { return (SOFT_DELETE_PATH_SUFFIX + String.format(DELTA_DIGITS, txnId)); } + public static boolean isNoRenameCtasEnabled(Configuration conf) { Review Comment: could we please rename it to isExclusiveCTASEnabled should we check here if table isTransactional Issue Time Tracking --- Worklog Id: (was: 781100) Time Spent: 5h 50m (was: 5h 40m) > Implementing locking for concurrent ctas > > > Key: HIVE-26244 > URL: https://issues.apache.org/jira/browse/HIVE-26244 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Major > Labels: pull-request-available > Time Spent: 5h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=781099&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781099 ] ASF GitHub Bot logged work on HIVE-26244: - Author: ASF GitHub Bot Created on: 14/Jun/22 11:18 Start Date: 14/Jun/22 11:18 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3307: URL: https://github.com/apache/hive/pull/3307#discussion_r896684186 ## ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java: ## @@ -3104,6 +3112,15 @@ Seems much cleaner if each stmt is identified as a particular HiveOperation (whi return lockComponents; } + public static boolean isCTASOperation(List lockComponents, HiveConf conf) { +boolean isCtas = false; +for (LockComponent lock : lockComponents) { + if (lock.getOperationType().name().equals(OperationType.CTAS.name()) && + conf.getBoolVar(ConfVars.HIVE_ACID_CHECK_FOR_CONCURRENT_CTAS_ENABLED)) Review Comment: return lockComponents.stream().anyMatch(lc -> DataOperationType.CTAS == lc.getOperationType() && isExclusiveCTASEnabled(conf)) Issue Time Tracking --- Worklog Id: (was: 781099) Time Spent: 5h 40m (was: 5.5h) > Implementing locking for concurrent ctas > > > Key: HIVE-26244 > URL: https://issues.apache.org/jira/browse/HIVE-26244 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Major > Labels: pull-request-available > Time Spent: 5h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
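The reviewer's suggestion above, replacing the flag-and-loop check with `stream().anyMatch(...)`, can be sketched as follows. This is an illustration under stated assumptions, not the code from the PR: `ExclusiveCtasSketch`, its nested `DataOperationType` and `LockComponent`, and the `xLockEnabled` boolean (standing in for the `isExclusiveCTASEnabled(conf)` call) are hypothetical stand-ins, and the loop body is only a guess at the general shape of the original, which the diff excerpt truncates.

```java
import java.util.Arrays;
import java.util.List;

// Stand-in types for illustration only; these are NOT the Hive metastore classes.
final class ExclusiveCtasSketch {

  enum DataOperationType { SELECT, INSERT, UPDATE, DELETE, CTAS }

  static final class LockComponent {
    private final DataOperationType operationType;

    LockComponent(DataOperationType operationType) {
      this.operationType = operationType;
    }

    DataOperationType getOperationType() {
      return operationType;
    }
  }

  // Loop form: mutable flag, enum compared directly rather than via name().
  static boolean isExclusiveCtasLoop(List<LockComponent> lockComponents, boolean xLockEnabled) {
    boolean isCtas = false;
    for (LockComponent lc : lockComponents) {
      if (lc.getOperationType() == DataOperationType.CTAS && xLockEnabled) {
        isCtas = true;
        break;
      }
    }
    return isCtas;
  }

  // Stream form: anyMatch short-circuits on the first CTAS component.
  // The config flag is checked once up front instead of inside the lambda.
  static boolean isExclusiveCtas(List<LockComponent> lockComponents, boolean xLockEnabled) {
    return xLockEnabled
        && lockComponents.stream()
            .anyMatch(lc -> DataOperationType.CTAS == lc.getOperationType());
  }

  public static void main(String[] args) {
    List<LockComponent> components = Arrays.asList(
        new LockComponent(DataOperationType.SELECT),
        new LockComponent(DataOperationType.CTAS));
    System.out.println(isExclusiveCtas(components, true));
    System.out.println(isExclusiveCtas(components, false));
  }
}
```

Comparing enum constants with `==` avoids the string comparison through `name()` seen in the diff, and hoisting the flag out of the lambda is a design choice of this sketch; the reviewer's one-liner keeps the flag check inside the predicate, which gives the same result.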
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=781098&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781098 ] ASF GitHub Bot logged work on HIVE-26244: - Author: ASF GitHub Bot Created on: 14/Jun/22 11:17 Start Date: 14/Jun/22 11:17 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3307: URL: https://github.com/apache/hive/pull/3307#discussion_r896672973 ## ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java: ## @@ -420,6 +420,7 @@ LockState acquireLocks(QueryPlan plan, Context ctx, String username, boolean isB } List lockComponents = AcidUtils.makeLockComponents(plan.getOutputs(), plan.getInputs(), ctx.getOperation(), conf); + rqstBuilder.setCheckForConcurrentCtas(AcidUtils.isCTASOperation(lockComponents, conf)); Review Comment: could we rename it to `isExclusiveCTAS` rqstBuilder.setExclusiveCTAS(AcidUtils.isExclusiveCTAS(lockComponents, conf)) Issue Time Tracking --- Worklog Id: (was: 781098) Time Spent: 5.5h (was: 5h 20m) > Implementing locking for concurrent ctas > > > Key: HIVE-26244 > URL: https://issues.apache.org/jira/browse/HIVE-26244 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Major > Labels: pull-request-available > Time Spent: 5.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=781096&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781096 ] ASF GitHub Bot logged work on HIVE-26244: - Author: ASF GitHub Bot Created on: 14/Jun/22 11:16 Start Date: 14/Jun/22 11:16 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3307: URL: https://github.com/apache/hive/pull/3307#discussion_r896684186 ## ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java: ## @@ -3104,6 +3112,15 @@ Seems much cleaner if each stmt is identified as a particular HiveOperation (whi return lockComponents; } + public static boolean isCTASOperation(List lockComponents, HiveConf conf) { +boolean isCtas = false; +for (LockComponent lock : lockComponents) { + if (lock.getOperationType().name().equals(OperationType.CTAS.name()) && + conf.getBoolVar(ConfVars.HIVE_ACID_CHECK_FOR_CONCURRENT_CTAS_ENABLED)) Review Comment: boolean isExclusiveCTAS = lockComponents.stream().anyMatch(lc -> DataOperationType.CTAS == lc.getOperationType() && isExclusiveCTASEnabled(conf)) Issue Time Tracking --- Worklog Id: (was: 781096) Time Spent: 5h 20m (was: 5h 10m) > Implementing locking for concurrent ctas > > > Key: HIVE-26244 > URL: https://issues.apache.org/jira/browse/HIVE-26244 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Major > Labels: pull-request-available > Time Spent: 5h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=781094&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781094 ] ASF GitHub Bot logged work on HIVE-26244: - Author: ASF GitHub Bot Created on: 14/Jun/22 11:09 Start Date: 14/Jun/22 11:09 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3307: URL: https://github.com/apache/hive/pull/3307#discussion_r896684186 ## ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java: ## @@ -3104,6 +3112,15 @@ Seems much cleaner if each stmt is identified as a particular HiveOperation (whi return lockComponents; } + public static boolean isCTASOperation(List lockComponents, HiveConf conf) { +boolean isCtas = false; +for (LockComponent lock : lockComponents) { + if (lock.getOperationType().name().equals(OperationType.CTAS.name()) && + conf.getBoolVar(ConfVars.HIVE_ACID_CHECK_FOR_CONCURRENT_CTAS_ENABLED)) Review Comment: why not use isNoRenameCtasEnabled/isExclusiveCTASEnabled helper method Issue Time Tracking --- Worklog Id: (was: 781094) Time Spent: 5h 10m (was: 5h) > Implementing locking for concurrent ctas > > > Key: HIVE-26244 > URL: https://issues.apache.org/jira/browse/HIVE-26244 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Major > Labels: pull-request-available > Time Spent: 5h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=781092&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781092 ] ASF GitHub Bot logged work on HIVE-26244: - Author: ASF GitHub Bot Created on: 14/Jun/22 11:08 Start Date: 14/Jun/22 11:08 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3307: URL: https://github.com/apache/hive/pull/3307#discussion_r896683204 ## ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java: ## @@ -3104,6 +3112,15 @@ Seems much cleaner if each stmt is identified as a particular HiveOperation (whi return lockComponents; } + public static boolean isCTASOperation(List lockComponents, HiveConf conf) { Review Comment: could we rename it to isExclusiveCTAS Issue Time Tracking --- Worklog Id: (was: 781092) Time Spent: 5h (was: 4h 50m) > Implementing locking for concurrent ctas > > > Key: HIVE-26244 > URL: https://issues.apache.org/jira/browse/HIVE-26244 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Major > Labels: pull-request-available > Time Spent: 5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=781089&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781089 ] ASF GitHub Bot logged work on HIVE-26244: - Author: ASF GitHub Bot Created on: 14/Jun/22 11:03 Start Date: 14/Jun/22 11:03 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3307: URL: https://github.com/apache/hive/pull/3307#discussion_r896679958 ## ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java: ## @@ -3225,6 +3242,9 @@ public static String getPathSuffix(long txnId) { return (SOFT_DELETE_PATH_SUFFIX + String.format(DELTA_DIGITS, txnId)); } + public static boolean isNoRenameCtasEnabled(Configuration conf) { Review Comment: could we please rename it to isExclusiveCTASEnabled Issue Time Tracking --- Worklog Id: (was: 781089) Time Spent: 4h 50m (was: 4h 40m) > Implementing locking for concurrent ctas > > > Key: HIVE-26244 > URL: https://issues.apache.org/jira/browse/HIVE-26244 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Major > Labels: pull-request-available > Time Spent: 4h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=781088&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781088 ] ASF GitHub Bot logged work on HIVE-26244: - Author: ASF GitHub Bot Created on: 14/Jun/22 11:01 Start Date: 14/Jun/22 11:01 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3307: URL: https://github.com/apache/hive/pull/3307#discussion_r896678287 ## common/src/java/org/apache/hadoop/hive/conf/HiveConf.java: ## @@ -4678,6 +4678,9 @@ public static enum ConfVars { HIVE_ACID_DIRECT_INSERT_ENABLED("hive.acid.direct.insert.enabled", true, "Enable writing the data files directly to the table's final destination instead of the staging directory." + "This optimization only applies on INSERT operations on ACID tables."), + HIVE_ACID_CHECK_FOR_CONCURRENT_CTAS_ENABLED("hive.acid.check.for.concurrent.ctas.enabled", false, Review Comment: please rename to TXN_CTAS_X_LOCK("hive.txn.xlock.ctas" Issue Time Tracking --- Worklog Id: (was: 781088) Time Spent: 4h 40m (was: 4.5h) > Implementing locking for concurrent ctas > > > Key: HIVE-26244 > URL: https://issues.apache.org/jira/browse/HIVE-26244 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Major > Labels: pull-request-available > Time Spent: 4h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=781085&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781085 ] ASF GitHub Bot logged work on HIVE-26244: - Author: ASF GitHub Bot Created on: 14/Jun/22 10:55 Start Date: 14/Jun/22 10:55 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3307: URL: https://github.com/apache/hive/pull/3307#discussion_r896672973 ## ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java: ## @@ -420,6 +420,7 @@ LockState acquireLocks(QueryPlan plan, Context ctx, String username, boolean isB } List lockComponents = AcidUtils.makeLockComponents(plan.getOutputs(), plan.getInputs(), ctx.getOperation(), conf); + rqstBuilder.setCheckForConcurrentCtas(AcidUtils.isCTASOperation(lockComponents, conf)); Review Comment: could we rename it to `isExclusiveCTAS` Issue Time Tracking --- Worklog Id: (was: 781085) Time Spent: 4.5h (was: 4h 20m) > Implementing locking for concurrent ctas > > > Key: HIVE-26244 > URL: https://issues.apache.org/jira/browse/HIVE-26244 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Major > Labels: pull-request-available > Time Spent: 4.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=781081&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781081 ] ASF GitHub Bot logged work on HIVE-26244: - Author: ASF GitHub Bot Created on: 14/Jun/22 10:50 Start Date: 14/Jun/22 10:50 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3307: URL: https://github.com/apache/hive/pull/3307#discussion_r896668783 ## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java: ## @@ -5297,6 +5303,28 @@ is performed on that db (e.g. show tables, created table, etc). return response; } } + +if (checkForConcurrentCtas && isValidTxn(txnId)) { + LockType lockType = LockTypeUtil.getLockTypeFromEncoding(lockChar) + .orElseThrow(() -> new MetaException("Unknown lock type: " + lockChar)); + + if (lockType == LockType.EXCL_WRITE && blockedBy.state == LockState.ACQUIRED) { Review Comment: no need for that, this is already embedded in checkForConcurrentCtas Issue Time Tracking --- Worklog Id: (was: 781081) Time Spent: 4h 20m (was: 4h 10m) > Implementing locking for concurrent ctas > > > Key: HIVE-26244 > URL: https://issues.apache.org/jira/browse/HIVE-26244 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Major > Labels: pull-request-available > Time Spent: 4h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=781080&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781080 ] ASF GitHub Bot logged work on HIVE-26244: - Author: ASF GitHub Bot Created on: 14/Jun/22 10:48 Start Date: 14/Jun/22 10:48 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3307: URL: https://github.com/apache/hive/pull/3307#discussion_r896667206 ## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java: ## @@ -5297,6 +5303,28 @@ is performed on that db (e.g. show tables, created table, etc). return response; } } + +if (checkForConcurrentCtas && isValidTxn(txnId)) { + LockType lockType = LockTypeUtil.getLockTypeFromEncoding(lockChar) + .orElseThrow(() -> new MetaException("Unknown lock type: " + lockChar)); + + if (lockType == LockType.EXCL_WRITE && blockedBy.state == LockState.ACQUIRED) { + +String deleteBlockedByTxnComp = "DELETE FROM \"TXN_COMPONENTS\" WHERE" + " \"TC_TXNID\"=" + txnId; Review Comment: no need for txn_components cleanup, txn abort should do the trick Issue Time Tracking --- Worklog Id: (was: 781080) Time Spent: 4h 10m (was: 4h) > Implementing locking for concurrent ctas > > > Key: HIVE-26244 > URL: https://issues.apache.org/jira/browse/HIVE-26244 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Major > Labels: pull-request-available > Time Spent: 4h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=780336&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780336 ] ASF GitHub Bot logged work on HIVE-26244: - Author: ASF GitHub Bot Created on: 10/Jun/22 15:06 Start Date: 10/Jun/22 15:06 Worklog Time Spent: 10m Work Description: simhadri-g commented on code in PR #3307: URL: https://github.com/apache/hive/pull/3307#discussion_r894630680 ## common/src/java/org/apache/hadoop/hive/conf/HiveConf.java: ## @@ -4667,6 +4667,9 @@ public static enum ConfVars { HIVE_ACID_DIRECT_INSERT_ENABLED("hive.acid.direct.insert.enabled", true, "Enable writing the data files directly to the table's final destination instead of the staging directory." + "This optimization only applies on INSERT operations on ACID tables."), +HIVE_ACID_NO_RENAME_CTAS_ENABLED("hive.acid.no.rename.ctas.enabled", false, Review Comment: Changed the conf name to hive.acid.check.for.concurrent.ctas.enabled as it is better suited here. Direct CTAS patch does not introduce a new conf. It depends on hive.acid.direct.insert.enabled . Issue Time Tracking --- Worklog Id: (was: 780336) Time Spent: 4h (was: 3h 50m) > Implementing locking for concurrent ctas > > > Key: HIVE-26244 > URL: https://issues.apache.org/jira/browse/HIVE-26244 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Major > Labels: pull-request-available > Time Spent: 4h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=780333&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780333 ] ASF GitHub Bot logged work on HIVE-26244: - Author: ASF GitHub Bot Created on: 10/Jun/22 15:02 Start Date: 10/Jun/22 15:02 Worklog Time Spent: 10m Work Description: simhadri-g commented on code in PR #3307: URL: https://github.com/apache/hive/pull/3307#discussion_r894626694 ## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java: ## @@ -3087,7 +3090,24 @@ private void insertTxnComponents(long txnid, LockRequest rqst, Connection dbConn String tblName = normalizeCase(lc.getTablename()); String partName = normalizePartitionCase(lc.getPartitionname()); OperationType opType = OperationType.fromDataOperationType(lc.getOperationType()); - + if (opType.getSqlConst().equals(OperationType.CTAS.getSqlConst())) { Review Comment: Reverted to checkLocks method in the new commit. Issue Time Tracking --- Worklog Id: (was: 780333) Time Spent: 3h 40m (was: 3.5h) > Implementing locking for concurrent ctas > > > Key: HIVE-26244 > URL: https://issues.apache.org/jira/browse/HIVE-26244 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Major > Labels: pull-request-available > Time Spent: 3h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=780334&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780334 ] ASF GitHub Bot logged work on HIVE-26244: - Author: ASF GitHub Bot Created on: 10/Jun/22 15:03 Start Date: 10/Jun/22 15:03 Worklog Time Spent: 10m Work Description: simhadri-g commented on code in PR #3307: URL: https://github.com/apache/hive/pull/3307#discussion_r894627428 ## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java: ## @@ -60,6 +60,7 @@ import javax.sql.DataSource; import com.google.common.collect.ImmutableList; +import jline.internal.Log; Review Comment: Intellij auto imported this... have removed it now. Issue Time Tracking --- Worklog Id: (was: 780334) Time Spent: 3h 50m (was: 3h 40m) > Implementing locking for concurrent ctas > > > Key: HIVE-26244 > URL: https://issues.apache.org/jira/browse/HIVE-26244 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Major > Labels: pull-request-available > Time Spent: 3h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=779800&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-779800 ] ASF GitHub Bot logged work on HIVE-26244: - Author: ASF GitHub Bot Created on: 09/Jun/22 07:57 Start Date: 09/Jun/22 07:57 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3307: URL: https://github.com/apache/hive/pull/3307#discussion_r893190254 ## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java: ## @@ -3087,7 +3090,24 @@ private void insertTxnComponents(long txnid, LockRequest rqst, Connection dbConn String tblName = normalizeCase(lc.getTablename()); String partName = normalizePartitionCase(lc.getPartitionname()); OperationType opType = OperationType.fromDataOperationType(lc.getOperationType()); - + if (opType.getSqlConst().equals(OperationType.CTAS.getSqlConst())) { Review Comment: why do you even need to create locks for CTAS with this approach? that won't work as select doesn't even use any locking and we shouldn't add one. Why can't we stick to checkLock method and only check for Excl_Write and Exclusive locks in case of CTAS operation? Issue Time Tracking --- Worklog Id: (was: 779800) Time Spent: 3.5h (was: 3h 20m) > Implementing locking for concurrent ctas > > > Key: HIVE-26244 > URL: https://issues.apache.org/jira/browse/HIVE-26244 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Major > Labels: pull-request-available > Time Spent: 3.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=779799&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-779799 ] ASF GitHub Bot logged work on HIVE-26244: - Author: ASF GitHub Bot Created on: 09/Jun/22 07:51 Start Date: 09/Jun/22 07:51 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3307: URL: https://github.com/apache/hive/pull/3307#discussion_r893184493 ## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java: ## @@ -60,6 +60,7 @@ import javax.sql.DataSource; import com.google.common.collect.ImmutableList; +import jline.internal.Log; Review Comment: we should probably use `org.slf4j.Logger` as everywhere else Issue Time Tracking --- Worklog Id: (was: 779799) Time Spent: 3h 20m (was: 3h 10m) > Implementing locking for concurrent ctas > > > Key: HIVE-26244 > URL: https://issues.apache.org/jira/browse/HIVE-26244 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Major > Labels: pull-request-available > Time Spent: 3h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=779797&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-779797 ] ASF GitHub Bot logged work on HIVE-26244: - Author: ASF GitHub Bot Created on: 09/Jun/22 07:47 Start Date: 09/Jun/22 07:47 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3307: URL: https://github.com/apache/hive/pull/3307#discussion_r893180482 ## common/src/java/org/apache/hadoop/hive/conf/HiveConf.java: ## @@ -4667,6 +4667,9 @@ public static enum ConfVars { HIVE_ACID_DIRECT_INSERT_ENABLED("hive.acid.direct.insert.enabled", true, "Enable writing the data files directly to the table's final destination instead of the staging directory." + "This optimization only applies on INSERT operations on ACID tables."), +HIVE_ACID_NO_RENAME_CTAS_ENABLED("hive.acid.no.rename.ctas.enabled", false, Review Comment: you'll probably need to rebase as CTAS change is already merged Issue Time Tracking --- Worklog Id: (was: 779797) Time Spent: 3h 10m (was: 3h) > Implementing locking for concurrent ctas > > > Key: HIVE-26244 > URL: https://issues.apache.org/jira/browse/HIVE-26244 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Major > Labels: pull-request-available > Time Spent: 3h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=779352&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-779352 ] ASF GitHub Bot logged work on HIVE-26244: - Author: ASF GitHub Bot Created on: 08/Jun/22 08:27 Start Date: 08/Jun/22 08:27 Worklog Time Spent: 10m Work Description: simhadri-g commented on code in PR #3307: URL: https://github.com/apache/hive/pull/3307#discussion_r892070374 ## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java: ## @@ -5283,6 +5284,39 @@ is performed on that db (e.g. show tables, created table, etc). return response; } } + +if (isValidTxn(txnId)) { + LockType lockType = LockTypeUtil.getLockTypeFromEncoding(lockChar) + .orElseThrow(() -> new MetaException("Unknown lock type: " + lockChar)); + + if (lockType == LockType.EXCL_WRITE && blockedBy.state == LockState.ACQUIRED) { Review Comment: I was able to optimize it by moving this check to a much earlier step, right before we call checkLocks. This prevents unnecessary insert into the TXN_COMPONENTS table and saves us at least 3 to 4 subsequent queries to the metastore. Issue Time Tracking --- Worklog Id: (was: 779352) Time Spent: 3h (was: 2h 50m) > Implementing locking for concurrent ctas > > > Key: HIVE-26244 > URL: https://issues.apache.org/jira/browse/HIVE-26244 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Major > Labels: pull-request-available > Time Spent: 3h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=777328&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-777328 ] ASF GitHub Bot logged work on HIVE-26244: - Author: ASF GitHub Bot Created on: 02/Jun/22 07:33 Start Date: 02/Jun/22 07:33 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3307: URL: https://github.com/apache/hive/pull/3307#discussion_r887639411 ## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java: ## @@ -5283,6 +5284,39 @@ is performed on that db (e.g. show tables, created table, etc). return response; } } + +if (isValidTxn(txnId)) { + LockType lockType = LockTypeUtil.getLockTypeFromEncoding(lockChar) + .orElseThrow(() -> new MetaException("Unknown lock type: " + lockChar)); + + if (lockType == LockType.EXCL_WRITE && blockedBy.state == LockState.ACQUIRED) { Review Comment: I don't really like that we are adding extra overhead in checkLocks method, it's already a sensitive part performance-wise. I think we should try to optimize: if it's CTAS we know that it could only be blocked by another artificial CTAS or DROP database (EXCLUSIVE + EXCL_WRITE), so no need to run an expensive checkLock `BIG` query. Also that would mean that we can just give up and do not check against TXNS table what is the type of blocking TXN. Also, I would expect IOW to behave similarly to CTAS, currently it doesn't fail and is executed in sequential order, however, it doesn't require any extra cleanup in case of failure. So I am OK with the selected approach, but we should try to optimize if possible. 
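The optimization suggested in the review comment above (a CTAS request can only be blocked by EXCLUSIVE or EXCL_WRITE locks, so the full checkLock query may be unnecessary) could be sketched roughly as follows. This is an illustrative sketch only, not the code in TxnHandler: the `CtasLockCheck` class, its local `LockType` enum (a stand-in that merely mirrors the metastore lock type names) and the `isCtasBlocked` helper are all hypothetical.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;

// Hypothetical sketch; none of these names are taken from the Hive code base.
final class CtasLockCheck {

  // Local stand-in that mirrors the metastore lock type names.
  enum LockType { SHARED_READ, SHARED_WRITE, EXCL_WRITE, EXCLUSIVE }

  // Per the review comment, only these lock types can block a CTAS request.
  private static final EnumSet<LockType> CTAS_BLOCKERS =
      EnumSet.of(LockType.EXCL_WRITE, LockType.EXCLUSIVE);

  // Returns true if any lock already held on the target would block a CTAS.
  static boolean isCtasBlocked(List<LockType> heldLocks) {
    for (LockType held : heldLocks) {
      if (CTAS_BLOCKERS.contains(held)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    List<LockType> held = Arrays.asList(LockType.SHARED_READ, LockType.EXCL_WRITE);
    System.out.println(isCtasBlocked(held));
  }
}
```

The point of the sketch is that the blocking check for CTAS reduces to a membership test against two lock types, which is much cheaper than evaluating the full lock compatibility matrix.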
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=777325&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-777325 ] ASF GitHub Bot logged work on HIVE-26244: - Author: ASF GitHub Bot Created on: 02/Jun/22 07:28 Start Date: 02/Jun/22 07:28 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3307: URL: https://github.com/apache/hive/pull/3307#discussion_r887639411 ## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java: ## @@ -5283,6 +5284,39 @@ is performed on that db (e.g. show tables, created table, etc). return response; } } + +if (isValidTxn(txnId)) { + LockType lockType = LockTypeUtil.getLockTypeFromEncoding(lockChar) + .orElseThrow(() -> new MetaException("Unknown lock type: " + lockChar)); + + if (lockType == LockType.EXCL_WRITE && blockedBy.state == LockState.ACQUIRED) { Review Comment: I don't really like that we are adding extra overhead in checkLocks method, it's already a sensitive part performance-wise. I think we should try to optimize: if it's CTAS we know that it could only be blocked by another artificial CTAS or DROP database (EXCLUSIVE + EXCL_WRITE), so no need to run expensive checkLock `BIG` query. Also, I would expect IOW to behave similarly to CTAS, currently it doesn't fail and is executed in sequential order, however, it doesn't require any extra cleanup in case of failure. So I am OK with the selected approach, but we should try to optimize if possible. Issue Time Tracking --- Worklog Id: (was: 777325) Time Spent: 2h 40m (was: 2.5h) > Implementing locking for concurrent ctas > > > Key: HIVE-26244 > URL: https://issues.apache.org/jira/browse/HIVE-26244 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Major > Labels: pull-request-available > Time Spent: 2h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=777322&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-777322 ] ASF GitHub Bot logged work on HIVE-26244: - Author: ASF GitHub Bot Created on: 02/Jun/22 07:26 Start Date: 02/Jun/22 07:26 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3307: URL: https://github.com/apache/hive/pull/3307#discussion_r887639411 ## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java: ## @@ -5283,6 +5284,39 @@ is performed on that db (e.g. show tables, created table, etc). return response; } } + +if (isValidTxn(txnId)) { + LockType lockType = LockTypeUtil.getLockTypeFromEncoding(lockChar) + .orElseThrow(() -> new MetaException("Unknown lock type: " + lockChar)); + + if (lockType == LockType.EXCL_WRITE && blockedBy.state == LockState.ACQUIRED) { Review Comment: I don't really like that we are adding extra overhead in checkLocks method, it's already a sensitive part performance-wise. I think we should try to optimize: if it's CTAS we know that it could only be blocked by another artificial CTAS or DROP database (EXCLUSIVE + EXCL_WRITE), so no need to run expensive checkLock `BIG` query. Also, I would expect IOW to behave similarly to CTAS, currently it doesn't fail and is executed in sequential order. Issue Time Tracking --- Worklog Id: (was: 777322) Time Spent: 2.5h (was: 2h 20m) > Implementing locking for concurrent ctas > > > Key: HIVE-26244 > URL: https://issues.apache.org/jira/browse/HIVE-26244 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Major > Labels: pull-request-available > Time Spent: 2.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=777316&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-777316 ] ASF GitHub Bot logged work on HIVE-26244: - Author: ASF GitHub Bot Created on: 02/Jun/22 07:22 Start Date: 02/Jun/22 07:22 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3307: URL: https://github.com/apache/hive/pull/3307#discussion_r887645373 ## ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java: ## @@ -13910,7 +13911,13 @@ private void addDbAndTabToOutputs(String[] qualifiedTabName, TableType type, for(Map.Entry serdeMap : storageFormat.getSerdeProps().entrySet()){ t.setSerdeParam(serdeMap.getKey(), serdeMap.getValue()); } -outputs.add(new WriteEntity(t, WriteEntity.WriteType.DDL_NO_LOCK)); +if (tblProps != null && +tblProps.get(TABLE_IS_CTAS) == "true" && Review Comment: could we do `Boolean.parseBoolean(..)` instead of comparing with the string Issue Time Tracking --- Worklog Id: (was: 777316) Time Spent: 2h 20m (was: 2h 10m) > Implementing locking for concurrent ctas > > > Key: HIVE-26244 > URL: https://issues.apache.org/jira/browse/HIVE-26244 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Major > Labels: pull-request-available > Time Spent: 2h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
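The `Boolean.parseBoolean(..)` suggestion in the review comment above can be illustrated with a minimal sketch. In Java, `==` on strings compares object references, so `tblProps.get(...) == "true"` can be false even when the stored value is the text "true"; `Boolean.parseBoolean` compares content, ignores case and treats `null` as false. The class name `CtasFlagCheck` and the helper `isCtas` are hypothetical, and the value assigned to `TABLE_IS_CTAS` is assumed from the literal used in the earlier revision of the patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the code in SemanticAnalyzer.
final class CtasFlagCheck {

  // Assumed property key, taken from the literal in the earlier patch revision.
  static final String TABLE_IS_CTAS = "created_with_ctas";

  // Null-safe, content-based check of the CTAS marker property.
  static boolean isCtas(Map<String, String> tblProps) {
    return tblProps != null && Boolean.parseBoolean(tblProps.get(TABLE_IS_CTAS));
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    // A distinct String instance, as a value parsed at runtime would be.
    props.put(TABLE_IS_CTAS, new String("true"));

    // Reference comparison: the two objects differ, so this prints false.
    System.out.println(props.get(TABLE_IS_CTAS) == "true");
    // Content comparison via parseBoolean: prints true.
    System.out.println(isCtas(props));
  }
}
```

Because `Boolean.parseBoolean(null)` returns false, the helper also behaves sensibly when the property is absent.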
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=777312&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-777312 ] ASF GitHub Bot logged work on HIVE-26244: - Author: ASF GitHub Bot Created on: 02/Jun/22 07:15 Start Date: 02/Jun/22 07:15 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3307: URL: https://github.com/apache/hive/pull/3307#discussion_r887639411 ## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java: ## @@ -5283,6 +5284,39 @@ is performed on that db (e.g. show tables, created table, etc). return response; } } + +if (isValidTxn(txnId)) { + LockType lockType = LockTypeUtil.getLockTypeFromEncoding(lockChar) + .orElseThrow(() -> new MetaException("Unknown lock type: " + lockChar)); + + if (lockType == LockType.EXCL_WRITE && blockedBy.state == LockState.ACQUIRED) { Review Comment: I don't really like that we are adding extra overhead in checkLocks method, it's already a sensitive part performance-wise. I think we should try to optimize: if it's CTAS we know that it could only be blocked by another artificial CTAS or DROP database, so no need to run expensive checkLock `BIG` query. Also, I would expect IOW to behave similarly to CTAS, currently it doesn't fail and is executed in sequential order. Issue Time Tracking --- Worklog Id: (was: 777312) Time Spent: 2h 10m (was: 2h) > Implementing locking for concurrent ctas > > > Key: HIVE-26244 > URL: https://issues.apache.org/jira/browse/HIVE-26244 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Major > Labels: pull-request-available > Time Spent: 2h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=777136&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-777136 ] ASF GitHub Bot logged work on HIVE-26244: - Author: ASF GitHub Bot Created on: 01/Jun/22 19:59 Start Date: 01/Jun/22 19:59 Worklog Time Spent: 10m Work Description: simhadri-g commented on code in PR #3307: URL: https://github.com/apache/hive/pull/3307#discussion_r887249049 ## ql/src/java/org/apache/hadoop/hive/ql/ddl/table/create/CreateTableOperation.java: ## @@ -99,7 +99,8 @@ public int execute() throws HiveException { createTableNonReplaceMode(tbl); } -DDLUtils.addIfAbsentByName(new WriteEntity(tbl, WriteEntity.WriteType.DDL_NO_LOCK), context); + DDLUtils.addIfAbsentByName(new WriteEntity(tbl, WriteEntity.WriteType.DDL_NO_LOCK), context); Review Comment: I removed the code that populated outputs in CreateTableOperation (added in the previous commit) and retained it only in SemanticAnalyzer. An extra space crept in when I removed that code. I will remove the extra line and space. Issue Time Tracking --- Worklog Id: (was: 777136) Time Spent: 2h (was: 1h 50m) > Implementing locking for concurrent ctas > > > Key: HIVE-26244 > URL: https://issues.apache.org/jira/browse/HIVE-26244 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Major > Labels: pull-request-available > Time Spent: 2h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=776814&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776814 ] ASF GitHub Bot logged work on HIVE-26244: - Author: ASF GitHub Bot Created on: 01/Jun/22 11:42 Start Date: 01/Jun/22 11:42 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3307: URL: https://github.com/apache/hive/pull/3307#discussion_r886705645 ## ql/src/java/org/apache/hadoop/hive/ql/ddl/table/create/CreateTableOperation.java: ## @@ -99,7 +99,8 @@ public int execute() throws HiveException { createTableNonReplaceMode(tbl); } -DDLUtils.addIfAbsentByName(new WriteEntity(tbl, WriteEntity.WriteType.DDL_NO_LOCK), context); + DDLUtils.addIfAbsentByName(new WriteEntity(tbl, WriteEntity.WriteType.DDL_NO_LOCK), context); Review Comment: what changed here, extra space? Issue Time Tracking --- Worklog Id: (was: 776814) Time Spent: 1h 50m (was: 1h 40m) > Implementing locking for concurrent ctas > > > Key: HIVE-26244 > URL: https://issues.apache.org/jira/browse/HIVE-26244 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Major > Labels: pull-request-available > Time Spent: 1h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=773781&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-773781 ] ASF GitHub Bot logged work on HIVE-26244: - Author: ASF GitHub Bot Created on: 23/May/22 22:20 Start Date: 23/May/22 22:20 Worklog Time Spent: 10m Work Description: simhadri-g commented on code in PR #3307: URL: https://github.com/apache/hive/pull/3307#discussion_r879908488 ## ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java: ## @@ -13900,7 +13901,11 @@ private void addDbAndTabToOutputs(String[] qualifiedTabName, TableType type, for(Map.Entry serdeMap : storageFormat.getSerdeProps().entrySet()){ t.setSerdeParam(serdeMap.getKey(), serdeMap.getValue()); } -outputs.add(new WriteEntity(t, WriteEntity.WriteType.DDL_NO_LOCK)); +if (tblProps.get("created_with_ctas") == "true") { + outputs.add(new WriteEntity(t, WriteType.CTAS)); Review Comment: In CreateTableOperation, DDLUtils.addIfAbsentByName would populate the output only if it was missed in the SemanticAnalyzer stage. So I think removing this step from CreateTableOperation and retaining it only in SemanticAnalyzer would be fine. Issue Time Tracking --- Worklog Id: (was: 773781) Time Spent: 1h 40m (was: 1.5h) > Implementing locking for concurrent ctas > > > Key: HIVE-26244 > URL: https://issues.apache.org/jira/browse/HIVE-26244 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=773780&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-773780 ] ASF GitHub Bot logged work on HIVE-26244: - Author: ASF GitHub Bot Created on: 23/May/22 22:17 Start Date: 23/May/22 22:17 Worklog Time Spent: 10m Work Description: simhadri-g commented on code in PR #3307: URL: https://github.com/apache/hive/pull/3307#discussion_r879906851 ## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java: ## @@ -5283,6 +5284,39 @@ is performed on that db (e.g. show tables, created table, etc). return response; } } + +if (isValidTxn(txnId)) { + LockType lockType = LockTypeUtil.getLockTypeFromEncoding(lockChar) + .orElseThrow(() -> new MetaException("Unknown lock type: " + lockChar)); + + if (lockType == LockType.EXCL_WRITE && blockedBy.state == LockState.ACQUIRED) { Review Comment: We do not know at what stage the 1st query can abort. As it is non-deterministic, we needed to make an assumption. So when this is enabled via the conf, we will be optimistic about the outcome and assume the 1st query always succeeds. With this assumption, we can fail-early the 2nd concurrent ctas query and prevent any unnecessary move tasks and clean up that would have been associated with the 2nd query, if it was to continue until the commit stage. Also, the 2nd user will not have to wait for a long time to find out that the query failed. But when this feature is disabled, the query will run with a pessimistic assumption that the 1st query can abort. As a result, it does not fail the 2nd query until the commit stage. This will result in a lot of overhead and clean-up associated with the failed query. This may also make the user wait for a long time only to find out that the query failed which I think is not ideal. This was my thought process. Would this be fine? 
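The optimistic and pessimistic assumptions described in the comment above can be summarised as a small decision function. This is only a sketch of the reasoning in that comment, not the logic in TxnHandler: the class `CtasConflictPolicy`, the `Outcome` enum and all parameter names are hypothetical.

```java
// Hypothetical sketch of the reasoning in the comment; not Hive code.
final class CtasConflictPolicy {

  enum Outcome { PROCEED, FAIL_EARLY, FAIL_AT_COMMIT }

  // Outcome for a 2nd CTAS that targets the same table as a 1st CTAS.
  static Outcome secondCtasOutcome(boolean exclusiveCtasEnabled,
                                   boolean concurrentCtasInProgress,
                                   boolean firstCtasCommits) {
    if (!concurrentCtasInProgress) {
      return Outcome.PROCEED;            // nothing to conflict with
    }
    if (exclusiveCtasEnabled) {
      // Optimistic: assume the 1st query succeeds, so reject the 2nd at once.
      return Outcome.FAIL_EARLY;
    }
    // Pessimistic: let the 2nd query run; it only fails at commit time
    // if the 1st query did in fact commit and create the table.
    return firstCtasCommits ? Outcome.FAIL_AT_COMMIT : Outcome.PROCEED;
  }

  public static void main(String[] args) {
    System.out.println(secondCtasOutcome(true, true, false));
    System.out.println(secondCtasOutcome(false, true, true));
  }
}
```

The first call shows the cost of the optimistic choice raised by the reviewer: with the feature enabled, the 2nd query is rejected even in the case where the 1st CTAS later aborts.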
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=773775&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-773775 ] ASF GitHub Bot logged work on HIVE-26244: - Author: ASF GitHub Bot Created on: 23/May/22 22:08 Start Date: 23/May/22 22:08 Worklog Time Spent: 10m Work Description: simhadri-g commented on code in PR #3307: URL: https://github.com/apache/hive/pull/3307#discussion_r879902730 ## ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java: ## @@ -18,24 +18,13 @@ package org.apache.hadoop.hive.ql.lockmgr; import org.apache.commons.lang3.StringUtils; -import org.apache.hadoop.fs.FileStatus; -import org.apache.hadoop.fs.FileSystem; -import org.apache.hadoop.fs.Path; +import org.apache.hadoop.fs.*; import org.apache.hadoop.hive.common.JavaUtils; import org.apache.hadoop.hive.common.ValidTxnList; import org.apache.hadoop.hive.common.ValidWriteIdList; import org.apache.hadoop.hive.metastore.MetastoreTaskThread; -import org.apache.hadoop.hive.metastore.api.AddDynamicPartitions; -import org.apache.hadoop.hive.metastore.api.AllocateTableWriteIdsRequest; -import org.apache.hadoop.hive.metastore.api.AllocateTableWriteIdsResponse; -import org.apache.hadoop.hive.metastore.api.DataOperationType; -import org.apache.hadoop.hive.metastore.api.LockState; -import org.apache.hadoop.hive.metastore.api.LockType; -import org.apache.hadoop.hive.metastore.api.ShowLocksRequest; -import org.apache.hadoop.hive.metastore.api.ShowLocksResponse; -import org.apache.hadoop.hive.metastore.api.ShowLocksResponseElement; -import org.apache.hadoop.hive.metastore.api.TxnType; -import org.apache.hadoop.hive.metastore.api.CommitTxnRequest; +import org.apache.hadoop.hive.metastore.Warehouse; +import org.apache.hadoop.hive.metastore.api.*; Review Comment: This was due to Intellij's auto-import. Will remove the wild card. 
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=773774&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-773774 ] ASF GitHub Bot logged work on HIVE-26244: - Author: ASF GitHub Bot Created on: 23/May/22 22:08 Start Date: 23/May/22 22:08 Worklog Time Spent: 10m Work Description: simhadri-g commented on code in PR #3307: URL: https://github.com/apache/hive/pull/3307#discussion_r879902647 ## ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java: ## @@ -13900,7 +13901,11 @@ private void addDbAndTabToOutputs(String[] qualifiedTabName, TableType type, for(Map.Entry serdeMap : storageFormat.getSerdeProps().entrySet()){ t.setSerdeParam(serdeMap.getKey(), serdeMap.getValue()); } -outputs.add(new WriteEntity(t, WriteEntity.WriteType.DDL_NO_LOCK)); +if (tblProps.get("created_with_ctas") == "true") { Review Comment: Thanks for the review :) Yes, I will update this. Issue Time Tracking --- Worklog Id: (was: 773774) Time Spent: 1h 10m (was: 1h) > Implementing locking for concurrent ctas > > > Key: HIVE-26244 > URL: https://issues.apache.org/jira/browse/HIVE-26244 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=773310&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-773310 ] ASF GitHub Bot logged work on HIVE-26244: - Author: ASF GitHub Bot Created on: 23/May/22 07:19 Start Date: 23/May/22 07:19 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3307: URL: https://github.com/apache/hive/pull/3307#discussion_r879099718 ## ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java: ## @@ -13900,7 +13901,11 @@ private void addDbAndTabToOutputs(String[] qualifiedTabName, TableType type, for(Map.Entry serdeMap : storageFormat.getSerdeProps().entrySet()){ t.setSerdeParam(serdeMap.getKey(), serdeMap.getValue()); } -outputs.add(new WriteEntity(t, WriteEntity.WriteType.DDL_NO_LOCK)); +if (tblProps.get("created_with_ctas") == "true") { + outputs.add(new WriteEntity(t, WriteType.CTAS)); Review Comment: Why is this needed, and when? We are populating outputs in CreateTableOperation as well. Issue Time Tracking --- Worklog Id: (was: 773310) Time Spent: 1h (was: 50m) > Implementing locking for concurrent ctas > > > Key: HIVE-26244 > URL: https://issues.apache.org/jira/browse/HIVE-26244 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=773309&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-773309 ] ASF GitHub Bot logged work on HIVE-26244: - Author: ASF GitHub Bot Created on: 23/May/22 07:15 Start Date: 23/May/22 07:15 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3307: URL: https://github.com/apache/hive/pull/3307#discussion_r879096827 ## standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java: ## @@ -5283,6 +5284,39 @@ is performed on that db (e.g. show tables, created table, etc). return response; } } + +if (isValidTxn(txnId)) { + LockType lockType = LockTypeUtil.getLockTypeFromEncoding(lockChar) + .orElseThrow(() -> new MetaException("Unknown lock type: " + lockChar)); + + if (lockType == LockType.EXCL_WRITE && blockedBy.state == LockState.ACQUIRED) { Review Comment: I don't think we should give up if there is already CTAS on the same table. What happens if the 1st CTAS aborts? Issue Time Tracking --- Worklog Id: (was: 773309) Time Spent: 50m (was: 40m) > Implementing locking for concurrent ctas > > > Key: HIVE-26244 > URL: https://issues.apache.org/jira/browse/HIVE-26244 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=773304&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-773304 ] ASF GitHub Bot logged work on HIVE-26244: - Author: ASF GitHub Bot Created on: 23/May/22 07:01 Start Date: 23/May/22 07:01 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3307: URL: https://github.com/apache/hive/pull/3307#discussion_r879086482 ## ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java: ## @@ -18,24 +18,13 @@ package org.apache.hadoop.hive.ql.lockmgr; import org.apache.commons.lang3.StringUtils; -import org.apache.hadoop.fs.FileStatus; -import org.apache.hadoop.fs.FileSystem; -import org.apache.hadoop.fs.Path; +import org.apache.hadoop.fs.*; Review Comment: please remove wildcard imports ## ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java: ## @@ -18,24 +18,13 @@ package org.apache.hadoop.hive.ql.lockmgr; import org.apache.commons.lang3.StringUtils; -import org.apache.hadoop.fs.FileStatus; -import org.apache.hadoop.fs.FileSystem; -import org.apache.hadoop.fs.Path; +import org.apache.hadoop.fs.*; import org.apache.hadoop.hive.common.JavaUtils; import org.apache.hadoop.hive.common.ValidTxnList; import org.apache.hadoop.hive.common.ValidWriteIdList; import org.apache.hadoop.hive.metastore.MetastoreTaskThread; -import org.apache.hadoop.hive.metastore.api.AddDynamicPartitions; -import org.apache.hadoop.hive.metastore.api.AllocateTableWriteIdsRequest; -import org.apache.hadoop.hive.metastore.api.AllocateTableWriteIdsResponse; -import org.apache.hadoop.hive.metastore.api.DataOperationType; -import org.apache.hadoop.hive.metastore.api.LockState; -import org.apache.hadoop.hive.metastore.api.LockType; -import org.apache.hadoop.hive.metastore.api.ShowLocksRequest; -import org.apache.hadoop.hive.metastore.api.ShowLocksResponse; -import org.apache.hadoop.hive.metastore.api.ShowLocksResponseElement; -import 
org.apache.hadoop.hive.metastore.api.TxnType; -import org.apache.hadoop.hive.metastore.api.CommitTxnRequest; +import org.apache.hadoop.hive.metastore.Warehouse; +import org.apache.hadoop.hive.metastore.api.*; Review Comment: please remove wildcard imports Issue Time Tracking --- Worklog Id: (was: 773304) Time Spent: 0.5h (was: 20m) > Implementing locking for concurrent ctas > > > Key: HIVE-26244 > URL: https://issues.apache.org/jira/browse/HIVE-26244 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=773303&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-773303 ] ASF GitHub Bot logged work on HIVE-26244: - Author: ASF GitHub Bot Created on: 23/May/22 07:01 Start Date: 23/May/22 07:01 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3307: URL: https://github.com/apache/hive/pull/3307#discussion_r879086214 ## ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java: ## @@ -13900,7 +13901,11 @@ private void addDbAndTabToOutputs(String[] qualifiedTabName, TableType type, for(Map.Entry serdeMap : storageFormat.getSerdeProps().entrySet()){ t.setSerdeParam(serdeMap.getKey(), serdeMap.getValue()); } -outputs.add(new WriteEntity(t, WriteEntity.WriteType.DDL_NO_LOCK)); +if (tblProps.get("created_with_ctas") == "true") { Review Comment: is there any constant for "created_with_ctas"? Issue Time Tracking --- Worklog Id: (was: 773303) Time Spent: 20m (was: 10m) > Implementing locking for concurrent ctas > > > Key: HIVE-26244 > URL: https://issues.apache.org/jira/browse/HIVE-26244 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=773306&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-773306 ] ASF GitHub Bot logged work on HIVE-26244: - Author: ASF GitHub Bot Created on: 23/May/22 07:02 Start Date: 23/May/22 07:02 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3307: URL: https://github.com/apache/hive/pull/3307#discussion_r879087417 ## ql/src/java/org/apache/hadoop/hive/ql/ddl/table/create/CreateTableOperation.java: ## @@ -98,8 +98,11 @@ public int execute() throws HiveException { } createTableNonReplaceMode(tbl); } - -DDLUtils.addIfAbsentByName(new WriteEntity(tbl, WriteEntity.WriteType.DDL_NO_LOCK), context); +if (context.getQueryState().getCommandType().equals("CREATETABLE_AS_SELECT")) { Review Comment: should we generalize this to ACID create statements, not only CTAS? Issue Time Tracking --- Worklog Id: (was: 773306) Time Spent: 40m (was: 0.5h) > Implementing locking for concurrent ctas > > > Key: HIVE-26244 > URL: https://issues.apache.org/jira/browse/HIVE-26244 > Project: Hive > Issue Type: Improvement >Reporter: Simhadri G >Assignee: Simhadri G >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (HIVE-26244) Implementing locking for concurrent ctas
[ https://issues.apache.org/jira/browse/HIVE-26244?focusedWorklogId=772887&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-772887 ]

ASF GitHub Bot logged work on HIVE-26244:

                Author: ASF GitHub Bot
            Created on: 20/May/22 15:30
            Start Date: 20/May/22 15:30
    Worklog Time Spent: 10m
      Work Description: simhadri-g opened a new pull request, #3307:
URL: https://github.com/apache/hive/pull/3307

### What changes were proposed in this pull request?

1. Address the issue with concurrent CTAS operations that create a table with the same name.
2. Introduce a new dataOperationType for CTAS(t).
3. Change the lock taken by the CTAS operation for transactional tables from DDL_NO_LOCK to an EXCL_WRITE lock.
4. Check for entries in the TXN_COMPONENTS table when a concurrent CTAS operation (creating the same table) is blocked, and fail that operation early, since it is unnecessary.

### Why are the changes needed?

1. Currently, the CTAS operation does not acquire any lock (i.e. DDL_NO_LOCK).
2. Suppose there are two concurrent CTAS operations on the same target table. They race to determine which commits first. The query that commits first succeeds, whereas the query that has yet to commit fails with a "table already exists" exception. This causes unnecessary overhead: the data written by the failed query must be cleaned up, and a significant number of move tasks run for nothing.

With this PR, a CTAS operation on a transactional table acquires an EXCL_WRITE lock, and any other concurrent CTAS query that tries to create the same table fails early.

### Does this PR introduce _any_ user-facing change?

A new conf is introduced: hive.acid.no.rename.ctas.enabled

### How was this patch tested?

Unit tests and Q tests.

Issue Time Tracking
---
    Worklog Id:     (was: 772887)
    Remaining Estimate: 0h
    Time Spent: 10m

> Implementing locking for concurrent ctas
>
>
>                 Key: HIVE-26244
>                 URL: https://issues.apache.org/jira/browse/HIVE-26244
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Simhadri G
>            Assignee: Simhadri G
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>

--
This message was sent by Atlassian Jira (v8.20.7#820007)
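The race described in the PR text above, and the fail-early behaviour an exclusive lock gives, can be modelled in a few lines. This is only a toy sketch under stated assumptions: an in-memory map stands in for the lock state, the class and method names (`CtasLockSketch`, `tryAcquireExclusive`, `runCtas`) are hypothetical, and none of it is Hive's lock manager or its TXN_COMPONENTS check. Lock release on commit or abort is deliberately left out.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class CtasLockSketch {
    // Stand-in for lock state: target table name -> id of the transaction holding the exclusive lock.
    private final ConcurrentMap<String, Long> exclusiveLocks = new ConcurrentHashMap<>();

    // Returns true if txnId holds the exclusive lock on tableName after the call,
    // false if another transaction already holds it.
    boolean tryAcquireExclusive(String tableName, long txnId) {
        Long holder = exclusiveLocks.putIfAbsent(tableName, txnId);
        return holder == null || holder == txnId;
    }

    // Simulates a CTAS: the losing query is rejected before it writes any data.
    String runCtas(String tableName, long txnId) {
        if (!tryAcquireExclusive(tableName, txnId)) {
            return "txn " + txnId + ": failed early, " + tableName + " is being created by another CTAS";
        }
        // Only the lock holder reaches the expensive part (write data, move files, commit).
        return "txn " + txnId + ": created " + tableName;
    }

    public static void main(String[] args) {
        CtasLockSketch locks = new CtasLockSketch();
        System.out.println(locks.runCtas("db.target", 1L));
        System.out.println(locks.runCtas("db.target", 2L));
    }
}
```

In this model the second query is turned away at lock acquisition, which corresponds to the saving the PR describes: no data is written and no move tasks are scheduled for a query that would fail at commit anyway.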