[jira] [Work logged] (HIVE-26411) Fix TestReplicationMetricCollector flakiness
[ https://issues.apache.org/jira/browse/HIVE-26411?focusedWorklogId=794082&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794082 ]

ASF GitHub Bot logged work on HIVE-26411:
- Author: ASF GitHub Bot
- Created on: 22/Jul/22 06:46
- Start Date: 22/Jul/22 06:46
- Worklog Time Spent: 10m

Work Description: ghanko commented on code in PR #3458:
URL: https://github.com/apache/hive/pull/3458#discussion_r927340981

## ql/src/test/org/apache/hadoop/hive/ql/parse/repl/metric/TestReplicationMetricCollector.java ##

@@ -75,6 +84,12 @@ public void setup() throws Exception {
     MetricCollector.getInstance().init(conf);
     Mockito.when(fmd.getFailoverEventId()).thenReturn(10L);
     Mockito.when(fmd.getFilePath()).thenReturn("dummyDir");
+    disableBackgroundThreads();
   }
+
+  private void disableBackgroundThreads() {
+    PowerMockito.mockStatic(MetricSink.class);
+    Mockito.when(MetricSink.getInstance()).thenReturn(metricSinkInstance);

Review Comment: Yes, this is the main idea and I verified locally with the debugger that the background threads are not started. Precisely what happens is that whenever MetricSink.getInstance() is called, it returns the mock metricSinkInstance and even if its init() is called, it does nothing because it's an empty method provided by the mock framework.
Issue Time Tracking
---
Worklog Id: (was: 794082)
Time Spent: 0.5h (was: 20m)

> Fix TestReplicationMetricCollector flakiness
> Key: HIVE-26411
> URL: https://issues.apache.org/jira/browse/HIVE-26411
> Project: Hive
> Issue Type: Bug
> Components: Tests
> Reporter: Hankó Gergely
> Assignee: Hankó Gergely
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> TestReplicationMetricCollector tests can fail intermittently because ReplicationMetricCollector schedules a MetricSink thread that regularly consumes the MetricCollector's metricMap; if this happens at the wrong time, tests that use MetricCollector.getInstance().getMetrics() can fail.
> Example stack trace:
> {code:java}
> java.lang.AssertionError: expected:<1> but was:<0>
>     at org.junit.Assert.fail(Assert.java:88)
>     at org.junit.Assert.failNotEquals(Assert.java:743)
>     at org.junit.Assert.assertEquals(Assert.java:118)
>     at org.junit.Assert.assertEquals(Assert.java:555)
>     at org.junit.Assert.assertEquals(Assert.java:542)
>     at org.apache.hadoop.hive.ql.parse.repl.metric.TestReplicationMetricCollector.testFailoverReadyDumpMetrics(TestReplicationMetricCollector.java:227)
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
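The fix discussed above stubs the MetricSink singleton accessor so that no background consumer thread is ever started. A minimal analogous sketch of that pattern, in Python with unittest.mock rather than PowerMockito (the MetricSink/MetricCollector names here only mirror the Hive test; this is not Hive code):

```python
# Sketch of the "mock the static singleton accessor" pattern: once
# get_instance() is patched to return a MagicMock, its init() is a no-op
# and the background thread that drains the metric store never starts,
# so test assertions become deterministic.
import threading
from unittest import mock

class MetricSink:
    _instance = None

    @classmethod
    def get_instance(cls):
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def init(self, store):
        # Real implementation: start a thread that drains the metric store,
        # racing with any test that inspects the store afterwards.
        threading.Thread(target=store.clear, daemon=True).start()

def start_collection(store):
    # Production code path: obtains the sink singleton and initialises it.
    MetricSink.get_instance().init(store)

metrics = {"events": 1}
# Analogue of PowerMockito.mockStatic(MetricSink.class) +
# Mockito.when(MetricSink.getInstance()).thenReturn(metricSinkInstance):
# the accessor now returns a MagicMock whose init() does nothing.
with mock.patch.object(MetricSink, "get_instance",
                       return_value=mock.MagicMock()):
    start_collection(metrics)

# No background drain ran, so the metric is still present.
assert metrics == {"events": 1}
```

Without the patch, the daemon thread could clear the store before the assertion runs, which is exactly the intermittent failure described in the issue.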
[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
[ https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=794071&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794071 ]

ASF GitHub Bot logged work on HIVE-26414:
- Author: ASF GitHub Bot
- Created on: 22/Jul/22 06:07
- Start Date: 22/Jul/22 06:07
- Worklog Time Spent: 10m

Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r927295389

## ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java ##

@@ -485,6 +480,26 @@ private void clearLocksAndHB() {
     stopHeartbeat();
   }
+
+  private void cleanupDirForCTAS() {

Review Comment: Currently `Context ctx` is not available during `rollbackTxn`, which is why I chose to store the object. However, I agree that passing `Context ctx` is better. There are multiple ways of doing this. The first would involve passing `Context ctx` to the `rollbackTxn` method, which would change the HiveTxnManager API itself (I particularly don't like this since it would be a breaking change). Or we could create a new function in the `HiveTxnManager` interface of the same name and call it from the driver when rollback conditions are satisfied. My idea was to avoid both and initialise the `destinationTable` in one of the existing APIs, but I am open to other suggestions.

Issue Time Tracking
---
Worklog Id: (was: 794071)
Time Spent: 1h 50m (was: 1h 40m)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
> Issue Type: Improvement
> Reporter: Sourabh Badhya
> Assignee: Sourabh Badhya
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> When a CTAS query fails before creation of the table and after writing the data, the data remains in the directory and is currently not cleaned up by the cleaner or any other mechanism.
> This is because the cleaner requires a table corresponding to what it is cleaning. To handle such a situation, we can directly pass the relevant information to the cleaner so that such uncommitted data is deleted.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
[ https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=794063&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794063 ]

ASF GitHub Bot logged work on HIVE-26414:
- Author: ASF GitHub Bot
- Created on: 22/Jul/22 05:47
- Start Date: 22/Jul/22 05:47
- Worklog Time Spent: 10m

Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r927309557

## ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java ##

@@ -485,6 +480,26 @@ private void clearLocksAndHB() {
     stopHeartbeat();
   }
+
+  private void cleanupDirForCTAS() {
+    if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.TXN_CTAS_X_LOCK)) {

Review Comment: The idea is that the `destinationTable` object is set only when CTAS operations are performed ([here](https://github.com/apache/hive/pull/3457/files#diff-d4b1a32bbbd9e283893a6b52854c7aeb3e356a1ba1add2c4107e52901ca268f9R7856)). So there is no inherent need to check whether the operation is CTAS. As long as the exclusive lock is enabled, it is right to perform cleanup when an exclusive lock is acquired; otherwise we might have a situation wherein the cleaner is cleaning while concurrent CTAS operations write to the same location, causing issues.

Issue Time Tracking
---
Worklog Id: (was: 794063)
Time Spent: 1h 40m (was: 1.5h)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
> Issue Type: Improvement
> Reporter: Sourabh Badhya
> Assignee: Sourabh Badhya
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
> When a CTAS query fails before creation of the table and after writing the data, the data remains in the directory and is currently not cleaned up by the cleaner or any other mechanism. This is because the cleaner requires a table corresponding to what it is cleaning.
> To handle such a situation, we can directly pass the relevant information to the cleaner so that such uncommitted data is deleted.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
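The design being discussed is a cleaner that normally derives what to delete from table metadata, extended to also accept explicit cleanup requests for an aborted CTAS whose table was never created. A conceptual Python sketch of that idea (all names here are illustrative assumptions, not the Hive Cleaner API):

```python
# Sketch: a cleaner that accepts explicit (source, path) requests, so a
# rollback path can queue the CTAS target directory directly even though
# no table object exists for it yet.
import os
import shutil
import tempfile

class Cleaner:
    def __init__(self):
        self.requests = []  # queued (source, path) cleanup requests

    def submit(self, source, path):
        # For an aborted CTAS there is no table to resolve, so the caller
        # passes the target directory itself instead of a table reference.
        self.requests.append((source, path))

    def run(self):
        while self.requests:
            source, path = self.requests.pop()
            shutil.rmtree(path, ignore_errors=True)

# Simulate an aborted CTAS: data was written, but no table was committed.
staging = tempfile.mkdtemp(prefix="ctas_")
open(os.path.join(staging, "000000_0"), "w").close()

cleaner = Cleaner()
cleaner.submit("aborted-ctas", staging)  # rollback queues the directory
cleaner.run()
assert not os.path.exists(staging)       # uncommitted data removed
```

This also shows why the exclusive-lock condition matters: cleanup of a directory is only safe when no concurrent CTAS can be writing to the same location.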
[jira] [Work logged] (HIVE-26375) Invalid materialized view after rebuild if source table was compacted
[ https://issues.apache.org/jira/browse/HIVE-26375?focusedWorklogId=794061&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794061 ]

ASF GitHub Bot logged work on HIVE-26375:
- Author: ASF GitHub Bot
- Created on: 22/Jul/22 05:34
- Start Date: 22/Jul/22 05:34
- Worklog Time Spent: 10m

Work Description: amansinha100 commented on code in PR #3420:
URL: https://github.com/apache/hive/pull/3420#discussion_r927303861

## itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestMaterializedViewRebuild.java ##

@@ -97,7 +91,7 @@ public void testWhenMajorCompactionThenIncrementalMVRebuildIsStillAvailable() th
     txnHandler.cleanTxnToWriteIdTable();
     List result = execSelectAndDumpData("explain cbo alter materialized view " + MV1 + " rebuild", driver, "");
-    Assert.assertEquals(INCREMENTAL_REBUILD_PLAN, result);
+    Assert.assertEquals(FULL_REBUILD_PLAN, result);

Review Comment: Thanks for the explanation. Makes sense that since the entries are deleted from the COMPLETED_TXN_COMPONENTS table at compaction, there's no way to check whether only an insert operation was done in a full ACID table.

Issue Time Tracking
---
Worklog Id: (was: 794061)
Time Spent: 40m (was: 0.5h)

> Invalid materialized view after rebuild if source table was compacted
> Key: HIVE-26375
> URL: https://issues.apache.org/jira/browse/HIVE-26375
> Project: Hive
> Issue Type: Bug
> Components: Materialized views, Transactions
> Reporter: Krisztian Kasa
> Assignee: Krisztian Kasa
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
> Time Spent: 40m
> Remaining Estimate: 0h
>
> After HIVE-25656 the MV state depends on the number of rows deleted/updated in the source tables of the view. However, if one of the source tables is major compacted, the delete delta files are no longer available, and reproducing the rows that should be deleted from the MV is no longer possible.
> {code}
> create table t1(a int, b varchar(128), c float) stored as orc TBLPROPERTIES ('transactional'='true');
> insert into t1(a,b, c) values (1, 'one', 1.1), (2, 'two', 2.2), (NULL, NULL, NULL);
> create materialized view mv1 stored as orc TBLPROPERTIES ('transactional'='true') as select a,b,c from t1 where a > 0 or a is null;
> update t1 set b = 'Changed' where a = 1;
> alter table t1 compact 'major';
> alter materialized view mv1 rebuild;
> select * from mv1;
> {code}
> The select should result in
> {code}
> "1\tChanged\t1.1",
> "2\ttwo\t2.2",
> "NULL\tNULL\tNULL"
> {code}
> but was
> {code}
> "1\tone\t1.1",
> "2\ttwo\t2.2",
> "NULL\tNULL\tNULL",
> "1\tChanged\t1.1"
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
[ https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=794058&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794058 ]

ASF GitHub Bot logged work on HIVE-26414:
- Author: ASF GitHub Bot
- Created on: 22/Jul/22 05:13
- Start Date: 22/Jul/22 05:13
- Worklog Time Spent: 10m

Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r927295389

## ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java ##

@@ -485,6 +480,26 @@ private void clearLocksAndHB() {
     stopHeartbeat();
   }
+
+  private void cleanupDirForCTAS() {

Review Comment: Currently `Context ctx` is not available during `rollbackTxn`, which is why I chose to store the object. However, I agree that passing `Context ctx` is better. There are multiple ways of doing this. The first would involve passing `Context ctx` to the `rollbackTxn` method, which would change the HiveTxnManager API itself (I particularly don't like this since it would be a breaking change). Or we could create a new function in the `HiveTxnManager` interface of the same name and call it from the driver when rollback conditions are satisfied. My idea was to avoid both and initialise the destination in one of the existing APIs, but I am open to other suggestions.

Issue Time Tracking
---
Worklog Id: (was: 794058)
Time Spent: 1.5h (was: 1h 20m)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
> Issue Type: Improvement
> Reporter: Sourabh Badhya
> Assignee: Sourabh Badhya
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> When a CTAS query fails before creation of the table and after writing the data, the data remains in the directory and is currently not cleaned up by the cleaner or any other mechanism.
> This is because the cleaner requires a table corresponding to what it is cleaning. To handle such a situation, we can directly pass the relevant information to the cleaner so that such uncommitted data is deleted.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Work logged] (HIVE-26400) Provide a self-contained docker
[ https://issues.apache.org/jira/browse/HIVE-26400?focusedWorklogId=794057&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794057 ]

ASF GitHub Bot logged work on HIVE-26400:
- Author: ASF GitHub Bot
- Created on: 22/Jul/22 05:07
- Start Date: 22/Jul/22 05:07
- Worklog Time Spent: 10m

Work Description: dengzhhu653 commented on PR #3448:
URL: https://github.com/apache/hive/pull/3448#issuecomment-1192186831

The new changes add support for running with a local build of Hive:
```
sh deploy.sh --hadoop --tez
```
This command builds the image with the given Hadoop and Tez versions and the local packaging/target/apache-hive-${project.version}-bin.tar.gz built from source; a cluster with HiveServer2, Metastore and MySQL is then started. We can also build the image with a specified Hive version by appending `--hive ` to the above command. By default, the command reads the version info from the project `pom.xml`: the `project.version`, `hadoop.version` and `tez.version` properties are read as the Hive, Hadoop and Tez versions and used by `deploy.sh` to build the image. Besides, we can start a standalone HiveServer2 with an embedded Metastore only:
```
sh deploy.sh --hiveserver2
```
or just start a standalone Metastore with Derby:
```
sh deploy.sh --metastore
```

Issue Time Tracking
---
Worklog Id: (was: 794057)
Time Spent: 0.5h (was: 20m)

> Provide a self-contained docker
> Key: HIVE-26400
> URL: https://issues.apache.org/jira/browse/HIVE-26400
> Project: Hive
> Issue Type: Improvement
> Components: Build Infrastructure
> Reporter: Zhihua Deng
> Assignee: Zhihua Deng
> Priority: Major
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Work logged] (HIVE-26415) Add epoch time in the information_schema.scheduled_executions view
[ https://issues.apache.org/jira/browse/HIVE-26415?focusedWorklogId=794048&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794048 ]

ASF GitHub Bot logged work on HIVE-26415:
- Author: ASF GitHub Bot
- Created on: 22/Jul/22 04:45
- Start Date: 22/Jul/22 04:45
- Worklog Time Spent: 10m

Work Description: shreenidhiSaigaonkar opened a new pull request, #3467:
URL: https://github.com/apache/hive/pull/3467

This commit doesn't contain any secrets.

### What changes were proposed in this pull request?
- It just adds an extra column in the ```scheduled_executions``` view of ```information_schema```

### Why are the changes needed?
- Makes correlation between ```replication_metrics``` and ```scheduled_execution``` easier.

### Does this PR introduce _any_ user-facing change?
- No

### How was this patch tested?
- Manually tested by running the script through the beeline interface.

Issue Time Tracking
---
Worklog Id: (was: 794048)
Remaining Estimate: 167.5h (was: 167h 40m)
Time Spent: 0.5h (was: 20m)

> Add epoch time in the information_schema.scheduled_executions view
> Key: HIVE-26415
> URL: https://issues.apache.org/jira/browse/HIVE-26415
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 4.0.0
> Reporter: Imran
> Assignee: Shreenidhi
> Priority: Major
> Labels: pull-request-available
> Original Estimate: 168h
> Time Spent: 0.5h
> Remaining Estimate: 167.5h
>
> information_schema.scheduled_executions shows time as the system time, while replication_metrics shows time as epoch time. The only way to correlate the two is the scheduled_execution id, and looking at the times in the two tables causes some confusion. So we can add a new column in the information_schema.scheduled_executions view displaying the epoch time.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Work logged] (HIVE-26415) Add epoch time in the information_schema.scheduled_executions view
[ https://issues.apache.org/jira/browse/HIVE-26415?focusedWorklogId=794047&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794047 ]

ASF GitHub Bot logged work on HIVE-26415:
- Author: ASF GitHub Bot
- Created on: 22/Jul/22 04:42
- Start Date: 22/Jul/22 04:42
- Worklog Time Spent: 10m

Work Description: shreenidhiSaigaonkar closed pull request #3465: HIVE-26415 : Add epoch time in the information_schema.scheduled_executions view
URL: https://github.com/apache/hive/pull/3465

Issue Time Tracking
---
Worklog Id: (was: 794047)
Remaining Estimate: 167h 40m (was: 167h 50m)
Time Spent: 20m (was: 10m)

> Add epoch time in the information_schema.scheduled_executions view
> Key: HIVE-26415
> URL: https://issues.apache.org/jira/browse/HIVE-26415
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 4.0.0
> Reporter: Imran
> Assignee: Shreenidhi
> Priority: Major
> Labels: pull-request-available
> Original Estimate: 168h
> Time Spent: 20m
> Remaining Estimate: 167h 40m
>
> information_schema.scheduled_executions shows time as the system time, while replication_metrics shows time as epoch time. The only way to correlate the two is the scheduled_execution id, and looking at the times in the two tables causes some confusion. So we can add a new column in the information_schema.scheduled_executions view displaying the epoch time.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Work logged] (HIVE-26411) Fix TestReplicationMetricCollector flakiness
[ https://issues.apache.org/jira/browse/HIVE-26411?focusedWorklogId=793951&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793951 ]

ASF GitHub Bot logged work on HIVE-26411:
- Author: ASF GitHub Bot
- Created on: 21/Jul/22 22:24
- Start Date: 21/Jul/22 22:24
- Worklog Time Spent: 10m

Work Description: jfsii commented on code in PR #3458:
URL: https://github.com/apache/hive/pull/3458#discussion_r927147243

## ql/src/test/org/apache/hadoop/hive/ql/parse/repl/metric/TestReplicationMetricCollector.java ##

@@ -75,6 +84,12 @@ public void setup() throws Exception {
     MetricCollector.getInstance().init(conf);
     Mockito.when(fmd.getFailoverEventId()).thenReturn(10L);
     Mockito.when(fmd.getFilePath()).thenReturn("dummyDir");
+    disableBackgroundThreads();
   }
+
+  private void disableBackgroundThreads() {
+    PowerMockito.mockStatic(MetricSink.class);
+    Mockito.when(MetricSink.getInstance()).thenReturn(metricSinkInstance);

Review Comment: I'm new to this testing framework, but I assume my understanding is correct: this works because metricSinkInstance gets its MetricSink() constructor called because it is annotated with Mock, and since init is never called on metricSinkInstance, it never starts the executorService, correct?

Issue Time Tracking
---
Worklog Id: (was: 793951)
Time Spent: 20m (was: 10m)

> Fix TestReplicationMetricCollector flakiness
> Key: HIVE-26411
> URL: https://issues.apache.org/jira/browse/HIVE-26411
> Project: Hive
> Issue Type: Bug
> Components: Tests
> Reporter: Hankó Gergely
> Assignee: Hankó Gergely
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> TestReplicationMetricCollector tests can fail intermittently because ReplicationMetricCollector schedules a MetricSink thread that regularly consumes the MetricCollector's metricMap; if this happens at the wrong time, tests that use MetricCollector.getInstance().getMetrics() can fail.
> Example stack trace:
> {code:java}
> java.lang.AssertionError: expected:<1> but was:<0>
>     at org.junit.Assert.fail(Assert.java:88)
>     at org.junit.Assert.failNotEquals(Assert.java:743)
>     at org.junit.Assert.assertEquals(Assert.java:118)
>     at org.junit.Assert.assertEquals(Assert.java:555)
>     at org.junit.Assert.assertEquals(Assert.java:542)
>     at org.apache.hadoop.hive.ql.parse.repl.metric.TestReplicationMetricCollector.testFailoverReadyDumpMetrics(TestReplicationMetricCollector.java:227)
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Work logged] (HIVE-26419) Use a different pool for DataNucleus' secondary connection factory
[ https://issues.apache.org/jira/browse/HIVE-26419?focusedWorklogId=793887&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793887 ]

ASF GitHub Bot logged work on HIVE-26419:
- Author: ASF GitHub Bot
- Created on: 21/Jul/22 18:23
- Start Date: 21/Jul/22 18:23
- Worklog Time Spent: 10m

Work Description: hsnusonic opened a new pull request, #3466:
URL: https://github.com/apache/hive/pull/3466

…n factory

### What changes were proposed in this pull request?
Introduce another connection pool for DataNucleus' secondary connection factory.

### Why are the changes needed?
We currently use the same connection pool for the primary and secondary connection factories. This can cause connection starvation when DataNucleus tries to acquire a lock for schema validation or value generation but no connection is available in the pool.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Ran the Impala test suites with pool size = 1; all tests pass.

Issue Time Tracking
---
Worklog Id: (was: 793887)
Remaining Estimate: 0h
Time Spent: 10m

> Use a different pool for DataNucleus' secondary connection factory
> Key: HIVE-26419
> URL: https://issues.apache.org/jira/browse/HIVE-26419
> Project: Hive
> Issue Type: Bug
> Components: Standalone Metastore
> Reporter: Yu-Wen Lai
> Assignee: Yu-Wen Lai
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Quote from the DataNucleus documentation:
> {quote}The secondary connection factory is used for schema generation, and for value generation operations (unless specified to use primary).
> {quote}
> We should not use the same connection pool for DataNucleus' primary and secondary connection factories. An awful situation is that each thread holds one connection and requests another connection for value generation, but no connection is available in the pool. It will keep retrying and fail in the end.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
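The starvation mode described above can be shown with a small Python simulation (illustrative only, not DataNucleus code): a worker already holds the only pooled connection and then requests a second one from the same bounded pool, so the request can never be satisfied.

```python
# Sketch of pool starvation when primary and secondary connection
# factories share one bounded pool: the holder of the last connection
# requests a second one (e.g. for value generation) and times out.
import queue

POOL_SIZE = 1
pool = queue.Queue()
for i in range(POOL_SIZE):
    pool.put(f"conn-{i}")

def do_work():
    primary = pool.get()  # worker now holds the only connection
    try:
        # Secondary request from the SAME pool: nothing is available.
        secondary = pool.get(timeout=0.1)
        pool.put(secondary)
        return "ok"
    except queue.Empty:
        return "starved"  # keeps retrying and fails in the end
    finally:
        pool.put(primary)

assert do_work() == "starved"

# With a dedicated secondary pool (the fix proposed in HIVE-26419),
# the same access pattern succeeds:
secondary_pool = queue.Queue()
secondary_pool.put("conn2-0")
primary = pool.get()
secondary = secondary_pool.get(timeout=0.1)
assert (primary, secondary) == ("conn-0", "conn2-0")
```

With every thread following this pattern, a shared pool deadlocks exactly when pool size equals thread count, which is why the PR was validated with pool size = 1.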
[jira] [Updated] (HIVE-26419) Use a different pool for DataNucleus' secondary connection factory
[ https://issues.apache.org/jira/browse/HIVE-26419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-26419:
- Labels: pull-request-available (was: )

> Use a different pool for DataNucleus' secondary connection factory
> Key: HIVE-26419
> URL: https://issues.apache.org/jira/browse/HIVE-26419
> Project: Hive
> Issue Type: Bug
> Components: Standalone Metastore
> Reporter: Yu-Wen Lai
> Assignee: Yu-Wen Lai
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Quote from the DataNucleus documentation:
> {quote}The secondary connection factory is used for schema generation, and for value generation operations (unless specified to use primary).
> {quote}
> We should not use the same connection pool for DataNucleus' primary and secondary connection factories. An awful situation is that each thread holds one connection and requests another connection for value generation, but no connection is available in the pool. It will keep retrying and fail in the end.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Assigned] (HIVE-26419) Use a different pool for DataNucleus' secondary connection factory
[ https://issues.apache.org/jira/browse/HIVE-26419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yu-Wen Lai reassigned HIVE-26419:
-

> Use a different pool for DataNucleus' secondary connection factory
> Key: HIVE-26419
> URL: https://issues.apache.org/jira/browse/HIVE-26419
> Project: Hive
> Issue Type: Bug
> Components: Standalone Metastore
> Reporter: Yu-Wen Lai
> Assignee: Yu-Wen Lai
> Priority: Major
>
> Quote from the DataNucleus documentation:
> {quote}The secondary connection factory is used for schema generation, and for value generation operations (unless specified to use primary).
> {quote}
> We should not use the same connection pool for DataNucleus' primary and secondary connection factories. An awful situation is that each thread holds one connection and requests another connection for value generation, but no connection is available in the pool. It will keep retrying and fail in the end.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Updated] (HIVE-26415) Add epoch time in the information_schema.scheduled_executions view
[ https://issues.apache.org/jira/browse/HIVE-26415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-26415:
- Labels: pull-request-available (was: )

> Add epoch time in the information_schema.scheduled_executions view
> Key: HIVE-26415
> URL: https://issues.apache.org/jira/browse/HIVE-26415
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 4.0.0
> Reporter: Imran
> Assignee: Shreenidhi
> Priority: Major
> Labels: pull-request-available
> Original Estimate: 168h
> Time Spent: 10m
> Remaining Estimate: 167h 50m
>
> information_schema.scheduled_executions shows time as the system time, while replication_metrics shows time as epoch time. The only way to correlate the two is the scheduled_execution id, and looking at the times in the two tables causes some confusion. So we can add a new column in the information_schema.scheduled_executions view displaying the epoch time.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Work logged] (HIVE-26415) Add epoch time in the information_schema.scheduled_executions view
[ https://issues.apache.org/jira/browse/HIVE-26415?focusedWorklogId=793856&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793856 ]

ASF GitHub Bot logged work on HIVE-26415:
- Author: ASF GitHub Bot
- Created on: 21/Jul/22 17:26
- Start Date: 21/Jul/22 17:26
- Worklog Time Spent: 10m

Work Description: shreenidhiSaigaonkar opened a new pull request, #3465:
URL: https://github.com/apache/hive/pull/3465

This commit doesn't contain any secrets.

### What changes were proposed in this pull request?
- It just adds an extra column in the ```scheduled_executions``` view of ```information_schema```

### Why are the changes needed?
- Makes correlation between ```replication_metrics``` and ```scheduled_execution``` easier.

### Does this PR introduce _any_ user-facing change?
- No

### How was this patch tested?
- Manually tested by running the script through the beeline interface.

Issue Time Tracking
---
Worklog Id: (was: 793856)
Remaining Estimate: 167h 50m (was: 168h)
Time Spent: 10m

> Add epoch time in the information_schema.scheduled_executions view
> Key: HIVE-26415
> URL: https://issues.apache.org/jira/browse/HIVE-26415
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 4.0.0
> Reporter: Imran
> Assignee: Shreenidhi
> Priority: Major
> Original Estimate: 168h
> Time Spent: 10m
> Remaining Estimate: 167h 50m
>
> information_schema.scheduled_executions shows time as the system time, while replication_metrics shows time as epoch time. The only way to correlate the two is the scheduled_execution id, and looking at the times in the two tables causes some confusion. So we can add a new column in the information_schema.scheduled_executions view displaying the epoch time.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Work logged] (HIVE-24484) Upgrade Hadoop to 3.3.1 And Tez to 0.10.2
[ https://issues.apache.org/jira/browse/HIVE-24484?focusedWorklogId=793735&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793735 ]

ASF GitHub Bot logged work on HIVE-24484:
- Author: ASF GitHub Bot
- Created on: 21/Jul/22 13:32
- Start Date: 21/Jul/22 13:32
- Worklog Time Spent: 10m

Work Description: ayushtkn commented on PR #3279:
URL: https://github.com/apache/hive/pull/3279#issuecomment-1191490566

> But Hive 3.1.x version is not very old and 4.x still looks like in alpha so we may not able to upgrade. so with this PR still we have compatibility issues with Hive 3.x version. any suggestions? thanks

@sujith71955 Unfortunately, I don't have a use case for the 3.x line, but that should be doable; it would require changes across Hadoop, Hive & Tez. We did the same here as well. It is certainly doable, so if you folks have a use case, feel free to create a Jira for the 3.1.x line. I'm running busy, so I couldn't spare time to check the problem you stated above.

Issue Time Tracking
---
Worklog Id: (was: 793735)
Time Spent: 14h 13m (was: 14.05h)

> Upgrade Hadoop to 3.3.1 And Tez to 0.10.2
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
> Issue Type: Improvement
> Reporter: David Mollitor
> Assignee: Ayush Saxena
> Priority: Major
> Labels: pull-request-available
> Time Spent: 14h 13m
> Remaining Estimate: 0h

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Work started] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
[ https://issues.apache.org/jira/browse/HIVE-26414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HIVE-26414 started by Sourabh Badhya.
-

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
> Issue Type: Improvement
> Reporter: Sourabh Badhya
> Assignee: Sourabh Badhya
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> When a CTAS query fails before creation of the table and after writing the data, the data remains in the directory and is currently not cleaned up by the cleaner or any other mechanism. This is because the cleaner requires a table corresponding to what it is cleaning. To handle such a situation, we can directly pass the relevant information to the cleaner so that such uncommitted data is deleted.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (HIVE-26409) Assign NO_TXN operation type for table in global locks for scheduled queries
[ https://issues.apache.org/jira/browse/HIVE-26409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17569415#comment-17569415 ]

Sourabh Badhya commented on HIVE-26409:
---
Thanks for the review [~dkuzmenko].

> Assign NO_TXN operation type for table in global locks for scheduled queries
> Key: HIVE-26409
> URL: https://issues.apache.org/jira/browse/HIVE-26409
> Project: Hive
> Issue Type: Bug
> Reporter: Sourabh Badhya
> Assignee: Sourabh Badhya
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> The NO_TXN operation type has to be assigned while acquiring locks for the table in global locks because this will not add records to the COMPLETED_TXN_COMPONENTS table. Currently the record introduced by this lock had writeId as NULL. There was a situation which led to a large number of records in the COMPLETED_TXN_COMPONENTS table because these records had writeId as NULL and were not deleted by AcidHouseKeeperService/Cleaner, which subsequently led to OOM errors.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Resolved] (HIVE-26409) Assign NO_TXN operation type for table in global locks for scheduled queries
[ https://issues.apache.org/jira/browse/HIVE-26409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sourabh Badhya resolved HIVE-26409.
- Fix Version/s: 4.0.0-alpha-2
- Resolution: Fixed

> Assign NO_TXN operation type for table in global locks for scheduled queries
> Key: HIVE-26409
> URL: https://issues.apache.org/jira/browse/HIVE-26409
> Project: Hive
> Issue Type: Bug
> Reporter: Sourabh Badhya
> Assignee: Sourabh Badhya
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> The NO_TXN operation type has to be assigned while acquiring locks for the table in global locks because this will not add records to the COMPLETED_TXN_COMPONENTS table. Currently the record introduced by this lock had writeId as NULL. There was a situation which led to a large number of records in the COMPLETED_TXN_COMPONENTS table because these records had writeId as NULL and were not deleted by AcidHouseKeeperService/Cleaner, which subsequently led to OOM errors.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Work logged] (HIVE-26409) Assign NO_TXN operation type for table in global locks for scheduled queries
[ https://issues.apache.org/jira/browse/HIVE-26409?focusedWorklogId=793700&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793700 ] ASF GitHub Bot logged work on HIVE-26409: - Author: ASF GitHub Bot Created on: 21/Jul/22 11:55 Start Date: 21/Jul/22 11:55 Worklog Time Spent: 10m Work Description: deniskuzZ merged PR #3454: URL: https://github.com/apache/hive/pull/3454 Issue Time Tracking --- Worklog Id: (was: 793700) Time Spent: 0.5h (was: 20m)
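The HIVE-26409 description above turns on one detail: lock components tagged with the NO_TXN operation type never produce COMPLETED_TXN_COMPONENTS records on commit, so the scheduled-query global lock stops polluting that table. A minimal Python model of that filtering (the dict shape and helper name are illustrative, not the actual metastore schema or API):

```python
# Simplified model of how the metastore decides which lock components
# land in COMPLETED_TXN_COMPONENTS on commit. Operation type names mirror
# Hive's DataOperationType; everything else here is an illustrative stand-in.
TXN_OP_TYPES = {"INSERT", "UPDATE", "DELETE"}

def components_to_record(lock_components):
    """Keep only components whose operation type implies a real write;
    NO_TXN components are skipped, so they never accumulate in
    COMPLETED_TXN_COMPONENTS (and never need the cleaner to remove them)."""
    return [c for c in lock_components if c["op_type"] in TXN_OP_TYPES]

locks = [
    {"table": "scheduled_q_tbl", "op_type": "NO_TXN"},  # scheduled-query global lock
    {"table": "target_tbl", "op_type": "INSERT"},       # genuine write
]
recorded = components_to_record(locks)
assert [c["table"] for c in recorded] == ["target_tbl"]
```

Under this model the fix is simply a change of tag: the global lock still serializes scheduled queries, but its component no longer looks like a write that the cleaner must account for.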
[jira] [Commented] (HIVE-26397) Honour Iceberg sort orders when writing a table
[ https://issues.apache.org/jira/browse/HIVE-26397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17569362#comment-17569362 ] László Pintér commented on HIVE-26397: -- Merged into master. Thanks, [~szita] and [~pvary] for the review! > Honour Iceberg sort orders when writing a table > --- > > Key: HIVE-26397 > URL: https://issues.apache.org/jira/browse/HIVE-26397 > Project: Hive > Issue Type: Improvement >Reporter: László Pintér >Assignee: László Pintér >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > Iceberg specification defines sort orders. We should consider this when > writing to an Iceberg table through Hive. > See: https://iceberg.apache.org/spec/#sort-orders -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-26397) Honour Iceberg sort orders when writing a table
[ https://issues.apache.org/jira/browse/HIVE-26397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] László Pintér resolved HIVE-26397. -- Resolution: Fixed
[jira] [Work logged] (HIVE-26397) Honour Iceberg sort orders when writing a table
[ https://issues.apache.org/jira/browse/HIVE-26397?focusedWorklogId=793681&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793681 ] ASF GitHub Bot logged work on HIVE-26397: - Author: ASF GitHub Bot Created on: 21/Jul/22 11:23 Start Date: 21/Jul/22 11:23 Worklog Time Spent: 10m Work Description: lcspinter merged PR #3445: URL: https://github.com/apache/hive/pull/3445 Issue Time Tracking --- Worklog Id: (was: 793681) Time Spent: 1h 10m (was: 1h)
[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
[ https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=793599&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793599 ] ASF GitHub Bot logged work on HIVE-26414: - Author: ASF GitHub Bot Created on: 21/Jul/22 08:33 Start Date: 21/Jul/22 08:33 Worklog Time Spent: 10m Work Description: deniskuzZ commented on code in PR #3457: URL: https://github.com/apache/hive/pull/3457#discussion_r926391951 ## ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java: ## @@ -485,6 +480,26 @@ private void clearLocksAndHB() { stopHeartbeat(); } + private void cleanupDirForCTAS() { Review Comment: could we pass a `Context ctx` here instead of initializing the destination table when acquiring locks? ## ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java: ## @@ -485,6 +480,26 @@ private void clearLocksAndHB() { stopHeartbeat(); } + private void cleanupDirForCTAS() { +if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.TXN_CTAS_X_LOCK)) { Review Comment: shouldn't we check that this was a CTAS operation? do we only need to clean up if an exclusive lock is enabled? ## ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java: ## @@ -485,6 +480,26 @@ private void clearLocksAndHB() { stopHeartbeat(); } + private void cleanupDirForCTAS() { +if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.TXN_CTAS_X_LOCK)) { + if (destinationTable != null) { +try { + CompactionRequest rqst = new CompactionRequest( + destinationTable.getDbName(), destinationTable.getTableName(), CompactionType.MAJOR); + rqst.setRunas(TxnUtils.findUserToRunAs(destinationTable.getSd().getLocation(), + destinationTable.getTTable(), conf)); + + rqst.putToProperties("location", destinationTable.getSd().getLocation()); Review Comment: use hive_metastoreConstants.META_TABLE_LOCATION? 
## ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java: ## @@ -29,19 +29,10 @@ Licensed to the Apache Software Foundation (ASF) under one import org.apache.hadoop.hive.metastore.IMetaStoreClient; import org.apache.hadoop.hive.metastore.LockComponentBuilder; import org.apache.hadoop.hive.metastore.LockRequestBuilder; -import org.apache.hadoop.hive.metastore.api.LockComponent; -import org.apache.hadoop.hive.metastore.api.LockResponse; -import org.apache.hadoop.hive.metastore.api.LockState; -import org.apache.hadoop.hive.metastore.api.MetaException; -import org.apache.hadoop.hive.metastore.api.NoSuchLockException; -import org.apache.hadoop.hive.metastore.api.NoSuchTxnException; -import org.apache.hadoop.hive.metastore.api.TxnAbortedException; -import org.apache.hadoop.hive.metastore.api.TxnToWriteId; -import org.apache.hadoop.hive.metastore.api.CommitTxnRequest; -import org.apache.hadoop.hive.metastore.api.DataOperationType; -import org.apache.hadoop.hive.metastore.api.GetOpenTxnsResponse; -import org.apache.hadoop.hive.metastore.api.TxnType; +import org.apache.hadoop.hive.metastore.api.*; Review Comment: please avoid wildcard imports ## ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java: ## @@ -485,6 +480,26 @@ private void clearLocksAndHB() { stopHeartbeat(); } + private void cleanupDirForCTAS() { +if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.TXN_CTAS_X_LOCK)) { + if (destinationTable != null) { +try { + CompactionRequest rqst = new CompactionRequest( + destinationTable.getDbName(), destinationTable.getTableName(), CompactionType.MAJOR); + rqst.setRunas(TxnUtils.findUserToRunAs(destinationTable.getSd().getLocation(), + destinationTable.getTTable(), conf)); + + rqst.putToProperties("location", destinationTable.getSd().getLocation()); + rqst.putToProperties("ifPurge", Boolean.toString(true)); + TxnStore txnHandler = TxnUtils.getTxnStore(conf); + txnHandler.submitForCleanup(rqst, destinationTable.getTTable().getWriteId(), getCurrentTxnId()); 
+} catch (InterruptedException | IOException | MetaException e) { + throw new RuntimeException("Not able to submit cleanup operation of directory written by CTAS"); Review Comment: should we catch just InterruptedException | IOException and re-throw MetaException here? ## ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java: ## @@ -7852,6 +7852,10 @@ protected Operator genFileSinkPlan(String dest, QB qb, Operator input) throw new SemanticException("Error while getting the full qualified path for the given directory: " + ex.getMessage()); } } + + if (!isNonNativeTable && AcidUtils.isTransactionalTable(destinationTable) && qb.isCTAS()) { Re
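The HIVE-26414 review above discusses handing the uncommitted CTAS directory to the cleaner through the compaction queue when the transaction aborts: a MAJOR-compaction-style request carrying the directory's location and an ifPurge flag is submitted to the TxnStore. A hypothetical Python model of that abort-path decision (field names and the helper are illustrative, not the Hive API):

```python
# Model of the abort-path cleanup sketched in the PR: when the CTAS
# exclusive-lock feature is on and the aborted txn had a destination
# table, build a cleanup request pointing at the uncommitted directory.
# All names here are illustrative stand-ins, not Hive classes.
def cleanup_request_for_ctas(dest_table, ctas_x_lock_enabled):
    """Return the cleanup request dict, or None when nothing to clean."""
    if not ctas_x_lock_enabled or dest_table is None:
        return None  # feature off, or not a CTAS: leave nothing queued
    return {
        "db": dest_table["db"],
        "table": dest_table["name"],
        "type": "MAJOR",
        # location + ifPurge tell the cleaner which directory to drop
        # without moving it to trash
        "properties": {"location": dest_table["location"], "ifPurge": "true"},
    }

rqst = cleanup_request_for_ctas(
    {"db": "default", "name": "t_ctas", "location": "/warehouse/t_ctas"}, True)
assert rqst["properties"]["ifPurge"] == "true"
assert cleanup_request_for_ctas(None, True) is None
```

The reviewer's questions map directly onto the two guards in the sketch: whether the operation really was a CTAS, and whether cleanup is only needed when the exclusive lock is enabled.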
[jira] [Work logged] (HIVE-26375) Invalid materialized view after rebuild if source table was compacted
[ https://issues.apache.org/jira/browse/HIVE-26375?focusedWorklogId=793591&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793591 ] ASF GitHub Bot logged work on HIVE-26375: - Author: ASF GitHub Bot Created on: 21/Jul/22 08:20 Start Date: 21/Jul/22 08:20 Worklog Time Spent: 10m Work Description: kasakrisz commented on code in PR #3420: URL: https://github.com/apache/hive/pull/3420#discussion_r926392298 ## itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestMaterializedViewRebuild.java: ## @@ -97,7 +91,7 @@ public void testWhenMajorCompactionThenIncrementalMVRebuildIsStillAvailable() th txnHandler.cleanTxnToWriteIdTable(); List result = execSelectAndDumpData("explain cbo alter materialized view " + MV1 + " rebuild", driver, ""); -Assert.assertEquals(INCREMENTAL_REBUILD_PLAN, result); +Assert.assertEquals(FULL_REBUILD_PLAN, result); Review Comment: At MV rebuild we search `COMPLETED_TXN_COMPONENTS` for update/delete operations on the affected source tables. Records are deleted from this table at compaction, so after compaction we can no longer confirm whether any of the source tables had deletes. This is relevant because executing an incremental rebuild plan, which assumes all source tables received only inserts, when there actually were deletes leads to data corruption in the refreshed view. The second rebuild can be incremental, since the first rebuild resets the source table snapshots to fresh ones and the txn data of operations done since that first rebuild still exists in `COMPLETED_TXN_COMPONENTS`. 
Issue Time Tracking --- Worklog Id: (was: 793591) Time Spent: 0.5h (was: 20m) > Invalid materialized view after rebuild if source table was compacted > - > > Key: HIVE-26375 > URL: https://issues.apache.org/jira/browse/HIVE-26375 > Project: Hive > Issue Type: Bug > Components: Materialized views, Transactions > Reporter: Krisztian Kasa > Assignee: Krisztian Kasa > Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > After HIVE-25656 the MV state depends on the number of rows deleted/updated in > the source tables of the view. However, if one of the source tables is major > compacted, the delete delta files are no longer available and reproducing the > rows that should be deleted from the MV is no longer possible. > {code} > create table t1(a int, b varchar(128), c float) stored as orc TBLPROPERTIES > ('transactional'='true'); > insert into t1(a,b, c) values (1, 'one', 1.1), (2, 'two', 2.2), (NULL, NULL, > NULL); > create materialized view mv1 stored as orc TBLPROPERTIES > ('transactional'='true') as select a,b,c from t1 where a > 0 or a is null; > update t1 set b = 'Changed' where a = 1; > alter table t1 compact 'major'; > alter materialized view mv1 rebuild; > select * from mv1; > {code} > The select should return > {code} > "1\tChanged\t1.1", > "2\ttwo\t2.2", > "NULL\tNULL\tNULL" > {code} > but was > {code} > "1\tone\t1.1", > "2\ttwo\t2.2", > "NULL\tNULL\tNULL", > "1\tChanged\t1.1" > {code}
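The reasoning in kasakrisz's review comment can be sketched as a small decision function. This is a simplified model, not Hive code: once compaction wipes a source table's history from COMPLETED_TXN_COMPONENTS, "no records" is indistinguishable from "no deletes", so the planner must assume the worst and choose a full rebuild.

```python
# Simplified model of the rebuild-plan choice for a transactional MV.
# txn_components is a stand-in for COMPLETED_TXN_COMPONENTS rows;
# all field names are illustrative, not the metastore schema.
def choose_rebuild_plan(txn_components, source_tables, last_rebuild_txn):
    recent = [c for c in txn_components
              if c["table"] in source_tables and c["txn_id"] > last_rebuild_txn]
    if not recent:
        # History may have been emptied by compaction; we cannot prove
        # there were no deletes, so fall back to the safe full rebuild.
        return "FULL"
    if all(c["op"] == "INSERT" for c in recent):
        return "INCREMENTAL"  # provably insert-only since the snapshot
    return "FULL"  # an update/delete was recorded: incremental would corrupt

# After major compaction the history is gone: full rebuild.
assert choose_rebuild_plan([], {"t1"}, last_rebuild_txn=5) == "FULL"
# After that rebuild, fresh insert-only history permits an incremental one.
assert choose_rebuild_plan(
    [{"table": "t1", "txn_id": 6, "op": "INSERT"}], {"t1"}, 5) == "INCREMENTAL"
```

This matches the test change in the PR: the first rebuild after compaction asserts the full plan, while a later rebuild, whose history survives intact, can be incremental again.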
[jira] [Work logged] (HIVE-25621) Alter table partition compact/concatenate commands should send HivePrivilegeObjects for Authz
[ https://issues.apache.org/jira/browse/HIVE-25621?focusedWorklogId=793569&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793569 ] ASF GitHub Bot logged work on HIVE-25621: - Author: ASF GitHub Bot Created on: 21/Jul/22 07:39 Start Date: 21/Jul/22 07:39 Worklog Time Spent: 10m Work Description: dengzhhu653 commented on code in PR #2731: URL: https://github.com/apache/hive/pull/2731#discussion_r926352617 ## ql/src/java/org/apache/hadoop/hive/ql/ddl/table/storage/compact/AlterTableCompactAnalyzer.java: ## @@ -67,6 +73,17 @@ protected void analyzeCommand(TableName tableName, Map partition } AlterTableCompactDesc desc = new AlterTableCompactDesc(tableName, partitionSpec, type, isBlocking, mapProp); +Table table = getTable(tableName); +WriteEntity.WriteType writeType = null; +if (AcidUtils.isTransactionalTable(table)) { + setAcidDdlDesc(desc); + writeType = WriteType.DDL_EXCLUSIVE; Review Comment: could you please explain a little bit why we choose DDL_EXCLUSIVE for transactional tables? Does it work the same for insert-only tables? Issue Time Tracking --- Worklog Id: (was: 793569) Time Spent: 1h 10m (was: 1h) > Alter table partition compact/concatenate commands should send > HivePrivilegeObjects for Authz > - > > Key: HIVE-25621 > URL: https://issues.apache.org/jira/browse/HIVE-25621 > Project: Hive > Issue Type: Bug > Affects Versions: 4.0.0 > Reporter: Sai Hemanth Gantasala > Assignee: Sai Hemanth Gantasala > Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > # Run the following queries: > Create table temp(c0 int) partitioned by (c1 int); > Insert into temp values(1,1); > ALTER TABLE temp PARTITION (c1=1) COMPACT 'minor'; > ALTER TABLE temp PARTITION (c1=1) CONCATENATE; > Insert into temp values(1,1); > # The above compact/concatenate commands currently do not send any hive > privilege objects for authorization. Hive needs to send these objects to > prevent malicious users from performing unauthorized operations.
[jira] [Work logged] (HIVE-25621) Alter table partition compact/concatenate commands should send HivePrivilegeObjects for Authz
[ https://issues.apache.org/jira/browse/HIVE-25621?focusedWorklogId=793568&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793568 ] ASF GitHub Bot logged work on HIVE-25621: - Author: ASF GitHub Bot Created on: 21/Jul/22 07:38 Start Date: 21/Jul/22 07:38 Worklog Time Spent: 10m Work Description: saihemanth-cloudera opened a new pull request, #2731: URL: https://github.com/apache/hive/pull/2731 …erge/concatenate ### What changes were proposed in this pull request? Added HivePrivilegeObjects for alter table merge/concatenate commands. ### Why are the changes needed? To prevent malicious users from doing alter operations on table. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Local machine, remote cluster. Issue Time Tracking --- Worklog Id: (was: 793568) Time Spent: 1h (was: 50m)
[jira] [Work logged] (HIVE-25621) Alter table partition compact/concatenate commands should send HivePrivilegeObjects for Authz
[ https://issues.apache.org/jira/browse/HIVE-25621?focusedWorklogId=793567&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793567 ] ASF GitHub Bot logged work on HIVE-25621: - Author: ASF GitHub Bot Created on: 21/Jul/22 07:38 Start Date: 21/Jul/22 07:38 Worklog Time Spent: 10m Work Description: dengzhhu653 commented on code in PR #2731: URL: https://github.com/apache/hive/pull/2731#discussion_r926347947 ## ql/src/java/org/apache/hadoop/hive/ql/ddl/table/storage/compact/AlterTableCompactAnalyzer.java: ## @@ -67,6 +73,17 @@ protected void analyzeCommand(TableName tableName, Map partition } AlterTableCompactDesc desc = new AlterTableCompactDesc(tableName, partitionSpec, type, isBlocking, mapProp); +Table table = getTable(tableName); +WriteEntity.WriteType writeType = null; +if (AcidUtils.isTransactionalTable(table)) { + setAcidDdlDesc(desc); + writeType = WriteType.DDL_EXCLUSIVE; +} else { + writeType = WriteEntity.determineAlterTableWriteType(AlterTableType.COMPACT); +} +inputs.add(new ReadEntity(table)); Review Comment: should we take care of `partitionSpec` as well? ## ql/src/java/org/apache/hadoop/hive/ql/ddl/table/storage/concatenate/AlterTableConcatenateAnalyzer.java: ## @@ -95,10 +97,14 @@ protected void analyzeCommand(TableName tableName, Map partition } } - private void compactAcidTable(TableName tableName, Map partitionSpec) throws SemanticException { + private void compactAcidTable(TableName tableName, Table table, Map partitionSpec) throws SemanticException { boolean isBlocking = !HiveConf.getBoolVar(conf, ConfVars.TRANSACTIONAL_CONCATENATE_NOBLOCK, false); AlterTableCompactDesc desc = new AlterTableCompactDesc(tableName, partitionSpec, "MAJOR", isBlocking, null); +WriteEntity.WriteType writeType = WriteEntity.WriteType.DDL_EXCLUSIVE; Review Comment: should we take care of `partitionSpec` as well? 
## ql/src/java/org/apache/hadoop/hive/ql/ddl/table/storage/compact/AlterTableCompactAnalyzer.java: ## @@ -67,6 +73,17 @@ protected void analyzeCommand(TableName tableName, Map partition } AlterTableCompactDesc desc = new AlterTableCompactDesc(tableName, partitionSpec, type, isBlocking, mapProp); +Table table = getTable(tableName); +WriteEntity.WriteType writeType = null; +if (AcidUtils.isTransactionalTable(table)) { + setAcidDdlDesc(desc); + writeType = WriteType.DDL_EXCLUSIVE; Review Comment: could you please explain a little bit why we choose DDL_EXCLUSIVE for transactional tables? Does it work the same for insert-only tables? Issue Time Tracking --- Worklog Id: (was: 793567) Time Spent: 50m (was: 40m)
[jira] [Work logged] (HIVE-26400) Provide a self-contained docker
[ https://issues.apache.org/jira/browse/HIVE-26400?focusedWorklogId=793560&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793560 ] ASF GitHub Bot logged work on HIVE-26400: - Author: ASF GitHub Bot Created on: 21/Jul/22 07:21 Start Date: 21/Jul/22 07:21 Worklog Time Spent: 10m Work Description: dengzhhu653 commented on PR #3448: URL: https://github.com/apache/hive/pull/3448#issuecomment-1191134925 > Is there any scope to run it with local version of Hive/Hadoop/Tez or do we need a released version always for this? The quick answer is yes, but there are some places to modify in order to run a specific version: - change the version in docker-compose.yml https://github.com/apache/hive/blob/213208570d1efa0f7a41d5a742edd0439b99163b/dev-support/docker/docker-compose.yml#L51-L52 - change the download url in Dockerfile https://github.com/apache/hive/blob/213208570d1efa0f7a41d5a742edd0439b99163b/dev-support/docker/Dockerfile#L37-L43 I'm wondering if we can build hive from source directly, but that needs some feedback and investigation. Issue Time Tracking --- Worklog Id: (was: 793560) Time Spent: 20m (was: 10m) > Provide a self-contained docker > --- > > Key: HIVE-26400 > URL: https://issues.apache.org/jira/browse/HIVE-26400 > Project: Hive > Issue Type: Improvement > Components: Build Infrastructure > Reporter: Zhihua Deng > Assignee: Zhihua Deng > Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h >
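The version-pinning step described in the comment above could look roughly like the following docker-compose fragment. This is a hypothetical sketch: the `HIVE_VERSION` variable, service names, and keys are assumptions for illustration; the real file lives at the linked lines of dev-support/docker/docker-compose.yml.

```yaml
# Illustrative only: pin the Hive image tag in one place via a
# substitutable variable, so switching versions is a one-line change.
services:
  metastore:
    image: apache/hive:${HIVE_VERSION:-4.0.0-alpha-1}
    environment:
      SERVICE_NAME: metastore
  hiveserver2:
    image: apache/hive:${HIVE_VERSION:-4.0.0-alpha-1}
    environment:
      SERVICE_NAME: hiveserver2
    depends_on:
      - metastore
```

With a layout like this, `HIVE_VERSION=4.0.0-alpha-2 docker compose up` would select a different released image; building from a local source tree, as the comment notes, would additionally require changing the Dockerfile's download URL or adding a build stage.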