[jira] [Work logged] (HIVE-26411) Fix TestReplicationMetricCollector flakiness

2022-07-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26411?focusedWorklogId=794082&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794082
 ]

ASF GitHub Bot logged work on HIVE-26411:
-

Author: ASF GitHub Bot
Created on: 22/Jul/22 06:46
Start Date: 22/Jul/22 06:46
Worklog Time Spent: 10m 
  Work Description: ghanko commented on code in PR #3458:
URL: https://github.com/apache/hive/pull/3458#discussion_r927340981


##
ql/src/test/org/apache/hadoop/hive/ql/parse/repl/metric/TestReplicationMetricCollector.java:
##
@@ -75,6 +84,12 @@ public void setup() throws Exception {
 MetricCollector.getInstance().init(conf);
 Mockito.when(fmd.getFailoverEventId()).thenReturn(10L);
 Mockito.when(fmd.getFilePath()).thenReturn("dummyDir");
+disableBackgroundThreads();
+  }
+
+  private void disableBackgroundThreads() {
+PowerMockito.mockStatic(MetricSink.class);
+Mockito.when(MetricSink.getInstance()).thenReturn(metricSinkInstance);

Review Comment:
   Yes, this is the main idea, and I verified locally with the debugger that 
the background threads are not started. Precisely what happens is that whenever 
MetricSink.getInstance() is called, it returns the mock metricSinkInstance; 
even if its init() is called, it does nothing, because it is an empty stub 
provided by the mock framework.
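
   For reference, a minimal sketch of the pattern (the mocking calls are taken 
from the diff above; the test-class name and surrounding boilerplate are 
illustrative, not the actual TestReplicationMetricCollector source):
   ```java
   import org.junit.Before;
   import org.junit.runner.RunWith;
   import org.mockito.Mock;
   import org.mockito.Mockito;
   import org.powermock.api.mockito.PowerMockito;
   import org.powermock.core.classloader.annotations.PrepareForTest;
   import org.powermock.modules.junit4.PowerMockRunner;

   @RunWith(PowerMockRunner.class)
   @PrepareForTest(MetricSink.class) // lets PowerMock intercept the static getInstance()
   public class ExampleMetricSinkMockTest {

     @Mock
     private MetricSink metricSinkInstance; // built by the mock framework; init() is a stub

     @Before
     public void setup() {
       // From here on, every MetricSink.getInstance() call returns the mock,
       // whose init() is an empty stub, so the executorService never starts.
       PowerMockito.mockStatic(MetricSink.class);
       Mockito.when(MetricSink.getInstance()).thenReturn(metricSinkInstance);
     }
   }
   ```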





Issue Time Tracking
---

Worklog Id: (was: 794082)
Time Spent: 0.5h  (was: 20m)

> Fix TestReplicationMetricCollector flakiness
> 
>
> Key: HIVE-26411
> URL: https://issues.apache.org/jira/browse/HIVE-26411
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Reporter: Hankó Gergely
>Assignee: Hankó Gergely
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> TestReplicationMetricCollector tests can fail intermittently because 
> ReplicationMetricCollector schedules a MetricSink thread that regularly 
> consumes the MetricCollector's metricMap. If this happens at the wrong time, 
> tests that use the MetricCollector.getInstance().getMetrics() method can 
> fail.
> Example stack trace:
> {code:java}
> java.lang.AssertionError: expected:<1> but was:<0> at 
> org.junit.Assert.fail(Assert.java:88) at 
> org.junit.Assert.failNotEquals(Assert.java:743) at 
> org.junit.Assert.assertEquals(Assert.java:118) at 
> org.junit.Assert.assertEquals(Assert.java:555) at 
> org.junit.Assert.assertEquals(Assert.java:542) at 
> org.apache.hadoop.hive.ql.parse.repl.metric.TestReplicationMetricCollector.testFailoverReadyDumpMetrics(TestReplicationMetricCollector.java:227){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=794071&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794071
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 22/Jul/22 06:07
Start Date: 22/Jul/22 06:07
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r927295389


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -485,6 +480,26 @@ private void clearLocksAndHB() {
 stopHeartbeat();
   }
 
+  private void cleanupDirForCTAS() {

Review Comment:
   Currently `Context ctx` is not available during `rollbackTxn`, which is why 
I chose to store the object.
   However, I agree that passing `Context ctx` is better.
   There are multiple ways of doing this - 
   The first would be to pass `Context ctx` to the `rollbackTxn` method, which 
would change the HiveTxnManager API itself (I particularly don't like this 
since it would be a breaking change).
   
   Or we could create a new function of the same name in the `HiveTxnManager` 
interface and call it from the driver when rollback conditions are satisfied 
(a rough sketch of this option follows below).
   
   My idea was to avoid both and initialise the `destinationTable` in one of 
the existing APIs, but I am open to any other suggestions.
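
   A minimal sketch of that second option, assuming a hypothetical overload 
(only the no-argument `rollbackTxn()` exists on `HiveTxnManager` today):
   ```java
   import org.apache.hadoop.hive.ql.Context;
   import org.apache.hadoop.hive.ql.lockmgr.LockException;

   // Hypothetical extension of org.apache.hadoop.hive.ql.lockmgr.HiveTxnManager.
   public interface HiveTxnManager {
     void rollbackTxn() throws LockException; // existing method, unchanged

     // New overload the driver could call when rollback conditions are
     // satisfied; the default keeps existing implementations source-compatible.
     default void rollbackTxn(Context ctx) throws LockException {
       rollbackTxn();
     }
   }
   ```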





Issue Time Tracking
---

Worklog Id: (was: 794071)
Time Spent: 1h 50m  (was: 1h 40m)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> When a CTAS query fails after writing the data but before creating the table, 
> the data remains in the directory and is currently not cleaned up by the 
> cleaner or any other mechanism. This is because the cleaner requires a table 
> corresponding to what it is cleaning. To handle such a situation, we can 
> directly pass the relevant information to the cleaner so that such 
> uncommitted data is deleted.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=794063&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794063
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 22/Jul/22 05:47
Start Date: 22/Jul/22 05:47
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r927309557


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -485,6 +480,26 @@ private void clearLocksAndHB() {
 stopHeartbeat();
   }
 
+  private void cleanupDirForCTAS() {
+if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.TXN_CTAS_X_LOCK)) {

Review Comment:
   The idea is that the `destinationTable` object is set only when CTAS 
operations are performed 
([here](https://github.com/apache/hive/pull/3457/files#diff-d4b1a32bbbd9e283893a6b52854c7aeb3e356a1ba1add2c4107e52901ca268f9R7856)).
 So there is no inherent need to check whether the operation is CTAS.
   
   As for the exclusive lock, it is right to perform cleanup only when an 
exclusive lock is acquired; otherwise we might have a situation wherein the 
cleaner is cleaning while concurrent CTAS operations write to the same 
location, causing issues.





Issue Time Tracking
---

Worklog Id: (was: 794063)
Time Spent: 1h 40m  (was: 1.5h)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> When a CTAS query fails after writing the data but before creating the table, 
> the data remains in the directory and is currently not cleaned up by the 
> cleaner or any other mechanism. This is because the cleaner requires a table 
> corresponding to what it is cleaning. To handle such a situation, we can 
> directly pass the relevant information to the cleaner so that such 
> uncommitted data is deleted.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26375) Invalid materialized view after rebuild if source table was compacted

2022-07-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26375?focusedWorklogId=794061&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794061
 ]

ASF GitHub Bot logged work on HIVE-26375:
-

Author: ASF GitHub Bot
Created on: 22/Jul/22 05:34
Start Date: 22/Jul/22 05:34
Worklog Time Spent: 10m 
  Work Description: amansinha100 commented on code in PR #3420:
URL: https://github.com/apache/hive/pull/3420#discussion_r927303861


##
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestMaterializedViewRebuild.java:
##
@@ -97,7 +91,7 @@ public void 
testWhenMajorCompactionThenIncrementalMVRebuildIsStillAvailable() th
 txnHandler.cleanTxnToWriteIdTable();
 
 List result = execSelectAndDumpData("explain cbo alter 
materialized view " + MV1 + " rebuild", driver, "");
-Assert.assertEquals(INCREMENTAL_REBUILD_PLAN, result);
+Assert.assertEquals(FULL_REBUILD_PLAN, result);

Review Comment:
   Thanks for the explanation. Makes sense that since the entries are deleted 
from the COMPLETED_TXN_COMPONENTS table at compaction, there's no way to check 
whether only insert operations were done on a full ACID table.





Issue Time Tracking
---

Worklog Id: (was: 794061)
Time Spent: 40m  (was: 0.5h)

> Invalid materialized view after rebuild if source table was compacted
> -
>
> Key: HIVE-26375
> URL: https://issues.apache.org/jira/browse/HIVE-26375
> Project: Hive
>  Issue Type: Bug
>  Components: Materialized views, Transactions
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> After HIVE-25656 the MV state depends on the number of rows deleted/updated 
> in the source tables of the view. However, if one of the source tables is 
> major compacted, the delete delta files are no longer available and 
> reproducing the rows that should be deleted from the MV is no longer 
> possible.
> {code}
> create table t1(a int, b varchar(128), c float) stored as orc TBLPROPERTIES 
> ('transactional'='true');
> insert into t1(a,b, c) values (1, 'one', 1.1), (2, 'two', 2.2), (NULL, NULL, 
> NULL);
> create materialized view mv1 stored as orc TBLPROPERTIES 
> ('transactional'='true') as select a,b,c from t1 where a > 0 or a is null;
> update t1 set b = 'Changed' where a = 1;
> alter table t1 compact 'major';
> alter materialized view t1 rebuild;
> select * from mv1;
> {code}
> Select should return 
> {code}
>   "1\tChanged\t1.1",
>   "2\ttwo\t2.2",
>   "NULL\tNULL\tNULL"
> {code}
> but was
> {code}
>   "1\tone\t1.1",  
>   "2\ttwo\t2.2",
>   "NULL\tNULL\tNULL",
>   "1\tChanged\t1.1"
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=794058&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794058
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 22/Jul/22 05:13
Start Date: 22/Jul/22 05:13
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r927295389


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -485,6 +480,26 @@ private void clearLocksAndHB() {
 stopHeartbeat();
   }
 
+  private void cleanupDirForCTAS() {

Review Comment:
   Currently `Context ctx` is not available during `rollbackTxn`, which is why 
I chose to store the object.
   However, I agree that passing `Context ctx` is better.
   There are multiple ways of doing this - 
   The first would be to pass `Context ctx` to the `rollbackTxn` method, which 
would change the HiveTxnManager API itself (I particularly don't like this 
since it would be a breaking change).
   
   Or we could create a new function of the same name in the `HiveTxnManager` 
interface and call it from the driver when rollback conditions are satisfied.
   
   My idea was to avoid both and initialise the destination in one of the 
existing APIs, but I am open to any other suggestions.





Issue Time Tracking
---

Worklog Id: (was: 794058)
Time Spent: 1.5h  (was: 1h 20m)

> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> When a CTAS query fails before creation of table and after writing the data, 
> the data is present in the directory and not cleaned up currently by the 
> cleaner or any other mechanism currently. This is because the cleaner 
> requires a table corresponding to what its cleaning. In order surpass such a 
> situation, we can directly pass the relevant information to the cleaner so 
> that such uncommitted data is deleted.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26400) Provide a self-contained docker

2022-07-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26400?focusedWorklogId=794057&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794057
 ]

ASF GitHub Bot logged work on HIVE-26400:
-

Author: ASF GitHub Bot
Created on: 22/Jul/22 05:07
Start Date: 22/Jul/22 05:07
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on PR #3448:
URL: https://github.com/apache/hive/pull/3448#issuecomment-1192186831

   The new changes add support for running with a local version of Hive:
   ```
   sh deploy.sh --hadoop <hadoop-version> --tez <tez-version>
   ```
   This command builds the image with the given Hadoop and Tez versions and the 
local packaging/target/apache-hive-${project.version}-bin.tar.gz built from 
source; a cluster with HiveServer2, Metastore and MySQL is then started.
   
   We can also build the image with a specified Hive version by appending 
`--hive <hive-version>` to the above command.
   
   By default, the command reads the version info from the project `pom.xml`: 
the `project.version`, `hadoop.version` and `tez.version` properties are read 
as the Hive, Hadoop and Tez versions and used by `deploy.sh` to build the 
image.
   
   Besides, we can start only a standalone HiveServer2 with an embedded 
Metastore,
   ```
   sh deploy.sh --hiveserver2
   ```
   or just start a standalone Metastore with derby,
   ```
   sh deploy.sh --metastore
   ```
   




Issue Time Tracking
---

Worklog Id: (was: 794057)
Time Spent: 0.5h  (was: 20m)

> Provide a self-contained docker
> ---
>
> Key: HIVE-26400
> URL: https://issues.apache.org/jira/browse/HIVE-26400
> Project: Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26415) Add epoch time in the information_schema.scheduled_executions view

2022-07-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26415?focusedWorklogId=794048&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794048
 ]

ASF GitHub Bot logged work on HIVE-26415:
-

Author: ASF GitHub Bot
Created on: 22/Jul/22 04:45
Start Date: 22/Jul/22 04:45
Worklog Time Spent: 10m 
  Work Description: shreenidhiSaigaonkar opened a new pull request, #3467:
URL: https://github.com/apache/hive/pull/3467

   This commit doesn't contain any secrets.
   
   
   
   ### What changes were proposed in this pull request?
   - It just adds an extra column to the ```scheduled_executions``` view of 
```information_schema```
   
   
   
   ### Why are the changes needed?
   - Makes correlation between ```replication_metrics``` and 
```scheduled_execution``` easier.
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   - No
   
   
   
   ### How was this patch tested?
   - Manually tested by running the script through beeline interface.
   
   




Issue Time Tracking
---

Worklog Id: (was: 794048)
Remaining Estimate: 167.5h  (was: 167h 40m)
Time Spent: 0.5h  (was: 20m)

> Add epoch time in the information_schema.scheduled_executions view
> --
>
> Key: HIVE-26415
> URL: https://issues.apache.org/jira/browse/HIVE-26415
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Imran
>Assignee: Shreenidhi
>Priority: Major
>  Labels: pull-request-available
>   Original Estimate: 168h
>  Time Spent: 0.5h
>  Remaining Estimate: 167.5h
>
> information_schema.scheduled_executions shows time as the system time, while 
> replication_metrics shows time as epoch time.
> The only way to correlate the two is via the scheduled_execution id, and 
> looking at the times in the two tables causes some confusion. So we can add a 
> new column to the information_schema.scheduled_executions view displaying the 
> epoch time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26415) Add epoch time in the information_schema.scheduled_executions view

2022-07-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26415?focusedWorklogId=794047&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-794047
 ]

ASF GitHub Bot logged work on HIVE-26415:
-

Author: ASF GitHub Bot
Created on: 22/Jul/22 04:42
Start Date: 22/Jul/22 04:42
Worklog Time Spent: 10m 
  Work Description: shreenidhiSaigaonkar closed pull request #3465: 
HIVE-26415 : Add epoch time in the information_schema.scheduled_executions view
URL: https://github.com/apache/hive/pull/3465




Issue Time Tracking
---

Worklog Id: (was: 794047)
Remaining Estimate: 167h 40m  (was: 167h 50m)
Time Spent: 20m  (was: 10m)

> Add epoch time in the information_schema.scheduled_executions view
> --
>
> Key: HIVE-26415
> URL: https://issues.apache.org/jira/browse/HIVE-26415
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Imran
>Assignee: Shreenidhi
>Priority: Major
>  Labels: pull-request-available
>   Original Estimate: 168h
>  Time Spent: 20m
>  Remaining Estimate: 167h 40m
>
> information_schema.scheduled_executions shows time as the system time, while 
> replication_metrics shows time as epoch time.
> The only way to correlate the two is via the scheduled_execution id, and 
> looking at the times in the two tables causes some confusion. So we can add a 
> new column to the information_schema.scheduled_executions view displaying the 
> epoch time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26411) Fix TestReplicationMetricCollector flakiness

2022-07-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26411?focusedWorklogId=793951&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793951
 ]

ASF GitHub Bot logged work on HIVE-26411:
-

Author: ASF GitHub Bot
Created on: 21/Jul/22 22:24
Start Date: 21/Jul/22 22:24
Worklog Time Spent: 10m 
  Work Description: jfsii commented on code in PR #3458:
URL: https://github.com/apache/hive/pull/3458#discussion_r927147243


##
ql/src/test/org/apache/hadoop/hive/ql/parse/repl/metric/TestReplicationMetricCollector.java:
##
@@ -75,6 +84,12 @@ public void setup() throws Exception {
 MetricCollector.getInstance().init(conf);
 Mockito.when(fmd.getFailoverEventId()).thenReturn(10L);
 Mockito.when(fmd.getFilePath()).thenReturn("dummyDir");
+disableBackgroundThreads();
+  }
+
+  private void disableBackgroundThreads() {
+PowerMockito.mockStatic(MetricSink.class);
+Mockito.when(MetricSink.getInstance()).thenReturn(metricSinkInstance);

Review Comment:
   I'm new to this testing framework - but I assume my understanding is 
correct:
   This works because metricSinkInstance gets its MetricSink() constructor 
called since it is annotated with Mock, and because init() is never called on 
metricSinkInstance, it never starts the executorService, correct?





Issue Time Tracking
---

Worklog Id: (was: 793951)
Time Spent: 20m  (was: 10m)

> Fix TestReplicationMetricCollector flakiness
> 
>
> Key: HIVE-26411
> URL: https://issues.apache.org/jira/browse/HIVE-26411
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Reporter: Hankó Gergely
>Assignee: Hankó Gergely
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> TestReplicationMetricCollector tests can fail intermittently because 
> ReplicationMetricCollector schedules a MetricSink thread that regularly 
> consumes the MetricCollector's metricMap. If this happens at the wrong time, 
> tests that use the MetricCollector.getInstance().getMetrics() method can 
> fail.
> Example stack trace:
> {code:java}
> java.lang.AssertionError: expected:<1> but was:<0> at 
> org.junit.Assert.fail(Assert.java:88) at 
> org.junit.Assert.failNotEquals(Assert.java:743) at 
> org.junit.Assert.assertEquals(Assert.java:118) at 
> org.junit.Assert.assertEquals(Assert.java:555) at 
> org.junit.Assert.assertEquals(Assert.java:542) at 
> org.apache.hadoop.hive.ql.parse.repl.metric.TestReplicationMetricCollector.testFailoverReadyDumpMetrics(TestReplicationMetricCollector.java:227){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26419) Use a different pool for DataNucleus' secondary connection factory

2022-07-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26419?focusedWorklogId=793887&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793887
 ]

ASF GitHub Bot logged work on HIVE-26419:
-

Author: ASF GitHub Bot
Created on: 21/Jul/22 18:23
Start Date: 21/Jul/22 18:23
Worklog Time Spent: 10m 
  Work Description: hsnusonic opened a new pull request, #3466:
URL: https://github.com/apache/hive/pull/3466

   …n factory
   
   
   
   ### What changes were proposed in this pull request?
   
   Introduce another connection pool for DataNucleus' secondary connection 
factory
   
   ### Why are the changes needed?
   
   We currently use the same connection pool for the primary and secondary 
connection factories. This can cause connection starvation when DataNucleus 
tries to acquire a lock for schema validation or value generation but no 
connection is available in the pool.
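
   A minimal sketch of the idea, assuming DataNucleus' documented 
`datanucleus.ConnectionFactory` / `datanucleus.ConnectionFactory2` properties 
accept DataSource instances; the HikariCP wiring below is illustrative, not 
the actual Hive/ObjectStore code:
   ```java
   import java.util.HashMap;
   import java.util.Map;
   import javax.jdo.JDOHelper;
   import javax.jdo.PersistenceManagerFactory;
   import javax.sql.DataSource;
   import com.zaxxer.hikari.HikariConfig;
   import com.zaxxer.hikari.HikariDataSource;

   public class SecondaryPoolSketch {
     // Build a small pooled DataSource (HikariCP used purely for illustration).
     static DataSource newPool(String jdbcUrl, int maxSize) {
       HikariConfig cfg = new HikariConfig();
       cfg.setJdbcUrl(jdbcUrl);
       cfg.setMaximumPoolSize(maxSize);
       return new HikariDataSource(cfg);
     }

     public static void main(String[] args) {
       Map<String, Object> props = new HashMap<>();
       props.put("javax.jdo.PersistenceManagerFactoryClass",
           "org.datanucleus.api.jdo.JDOPersistenceManagerFactory");
       // The primary pool serves ordinary persistence operations...
       props.put("datanucleus.ConnectionFactory", newPool("jdbc:mysql://...", 10));
       // ...while a separate pool serves schema/value generation, so those
       // requests can no longer starve waiting on the primary pool.
       props.put("datanucleus.ConnectionFactory2", newPool("jdbc:mysql://...", 2));
       PersistenceManagerFactory pmf = JDOHelper.getPersistenceManagerFactory(props);
     }
   }
   ```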
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   Ran the Impala test suites with pool size = 1; all tests pass.




Issue Time Tracking
---

Worklog Id: (was: 793887)
Remaining Estimate: 0h
Time Spent: 10m

> Use a different pool for DataNucleus' secondary connection factory
> --
>
> Key: HIVE-26419
> URL: https://issues.apache.org/jira/browse/HIVE-26419
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Quote from DataNucleus documentation:
> {quote}The secondary connection factory is used for schema generation, and 
> for value generation operations (unless specified to use primary).
> {quote}
> We should not use the same connection pool for DataNucleus' primary and 
> secondary connection factories. An awful situation is that each thread holds 
> one connection and requests another connection for value generation, but no 
> connection is available in the pool. It will keep retrying and fail in the 
> end.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26419) Use a different pool for DataNucleus' secondary connection factory

2022-07-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26419:
--
Labels: pull-request-available  (was: )

> Use a different pool for DataNucleus' secondary connection factory
> --
>
> Key: HIVE-26419
> URL: https://issues.apache.org/jira/browse/HIVE-26419
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Quote from DataNucleus documentation:
> {quote}The secondary connection factory is used for schema generation, and 
> for value generation operations (unless specified to use primary).
> {quote}
> We should not use the same connection pool for DataNucleus' primary and 
> secondary connection factories. An awful situation is that each thread holds 
> one connection and requests another connection for value generation, but no 
> connection is available in the pool. It will keep retrying and fail in the 
> end.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-26419) Use a different pool for DataNucleus' secondary connection factory

2022-07-21 Thread Yu-Wen Lai (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu-Wen Lai reassigned HIVE-26419:
-


> Use a different pool for DataNucleus' secondary connection factory
> --
>
> Key: HIVE-26419
> URL: https://issues.apache.org/jira/browse/HIVE-26419
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Reporter: Yu-Wen Lai
>Assignee: Yu-Wen Lai
>Priority: Major
>
> Quote from DataNucleus documentation:
> {quote}The secondary connection factory is used for schema generation, and 
> for value generation operations (unless specified to use primary).
> {quote}
> We should not use the same connection pool for DataNucleus' primary and 
> secondary connection factories. An awful situation is that each thread holds 
> one connection and requests another connection for value generation, but no 
> connection is available in the pool. It will keep retrying and fail in the 
> end.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26415) Add epoch time in the information_schema.scheduled_executions view

2022-07-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-26415:
--
Labels: pull-request-available  (was: )

> Add epoch time in the information_schema.scheduled_executions view
> --
>
> Key: HIVE-26415
> URL: https://issues.apache.org/jira/browse/HIVE-26415
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Imran
>Assignee: Shreenidhi
>Priority: Major
>  Labels: pull-request-available
>   Original Estimate: 168h
>  Time Spent: 10m
>  Remaining Estimate: 167h 50m
>
> information_schema.scheduled_executions shows time as the system time, while 
> replication_metrics shows time as epoch time.
> The only way to correlate the two is via the scheduled_execution id, and 
> looking at the times in the two tables causes some confusion. So we can add a 
> new column to the information_schema.scheduled_executions view displaying the 
> epoch time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26415) Add epoch time in the information_schema.scheduled_executions view

2022-07-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26415?focusedWorklogId=793856&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793856
 ]

ASF GitHub Bot logged work on HIVE-26415:
-

Author: ASF GitHub Bot
Created on: 21/Jul/22 17:26
Start Date: 21/Jul/22 17:26
Worklog Time Spent: 10m 
  Work Description: shreenidhiSaigaonkar opened a new pull request, #3465:
URL: https://github.com/apache/hive/pull/3465

   This commit doesn't contain any secrets.
   
   
   
   ### What changes were proposed in this pull request?
   - It just adds an extra column to the ```scheduled_executions``` view of 
```information_schema```
   
   
   
   
   ### Why are the changes needed?
   - Makes correlation between ```replication_metrics``` and 
```scheduled_execution``` easier.
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   - No
   
   
   
   ### How was this patch tested?
   - Manually tested by running the script through beeline interface.
   
   




Issue Time Tracking
---

Worklog Id: (was: 793856)
Remaining Estimate: 167h 50m  (was: 168h)
Time Spent: 10m

> Add epoch time in the information_schema.scheduled_executions view
> --
>
> Key: HIVE-26415
> URL: https://issues.apache.org/jira/browse/HIVE-26415
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Imran
>Assignee: Shreenidhi
>Priority: Major
>   Original Estimate: 168h
>  Time Spent: 10m
>  Remaining Estimate: 167h 50m
>
> information_schema.scheduled_executions shows time as the system time, while 
> replication_metrics shows time as epoch time.
> The only way to correlate the two is via the scheduled_execution id, and 
> looking at the times in the two tables causes some confusion. So we can add a 
> new column to the information_schema.scheduled_executions view displaying the 
> epoch time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-24484) Upgrade Hadoop to 3.3.1 And Tez to 0.10.2

2022-07-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24484?focusedWorklogId=793735&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793735
 ]

ASF GitHub Bot logged work on HIVE-24484:
-

Author: ASF GitHub Bot
Created on: 21/Jul/22 13:32
Start Date: 21/Jul/22 13:32
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on PR #3279:
URL: https://github.com/apache/hive/pull/3279#issuecomment-1191490566

   >But the Hive 3.1.x version is not very old, and 4.x still looks to be in 
alpha, so we may not be able to upgrade. So with this PR we still have 
compatibility issues with the Hive 3.x version. Any suggestions? Thanks
   
   @sujith71955  Unfortunately, I don't have a use case for the 3.x line. It 
should be doable, but it would require changes across Hadoop, Hive & Tez; we 
did the same here as well.
   
   It is certainly doable - if you folks have a use case, feel free to create a 
Jira for the 3.1.x line. I've been busy, so I couldn't spare time to check the 
problem you folks stated above.




Issue Time Tracking
---

Worklog Id: (was: 793735)
Time Spent: 14h 13m  (was: 14.05h)

> Upgrade Hadoop to 3.3.1 And Tez to 0.10.2 
> --
>
> Key: HIVE-24484
> URL: https://issues.apache.org/jira/browse/HIVE-24484
> Project: Hive
>  Issue Type: Improvement
>Reporter: David Mollitor
>Assignee: Ayush Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 14h 13m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work started] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-21 Thread Sourabh Badhya (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-26414 started by Sourabh Badhya.
-
> Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data
> ---
>
> Key: HIVE-26414
> URL: https://issues.apache.org/jira/browse/HIVE-26414
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> When a CTAS query fails after writing the data but before creating the table, 
> the data remains in the directory and is currently not cleaned up by the 
> cleaner or any other mechanism. This is because the cleaner requires a table 
> corresponding to what it is cleaning. To handle such a situation, we can 
> directly pass the relevant information to the cleaner so that such 
> uncommitted data is deleted.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26409) Assign NO_TXN operation type for table in global locks for scheduled queries

2022-07-21 Thread Sourabh Badhya (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-26409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17569415#comment-17569415
 ] 

Sourabh Badhya commented on HIVE-26409:
---

Thanks for the review, [~dkuzmenko].

> Assign NO_TXN operation type for table in global locks for scheduled queries
> 
>
> Key: HIVE-26409
> URL: https://issues.apache.org/jira/browse/HIVE-26409
> Project: Hive
>  Issue Type: Bug
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The NO_TXN operation type has to be assigned while acquiring global locks 
> for the table because it will not add records to the COMPLETED_TXN_COMPONENTS 
> table. Currently the record introduced by this lock had writeId as NULL. 
> There was a situation which led to a large number of records in the 
> COMPLETED_TXN_COMPONENTS table because these records had writeId as NULL and 
> weren't cleaned up by AcidHouseKeeperService/Cleaner, which subsequently led 
> to OOM errors.
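
For illustration, a rough sketch of building such a lock component with NO_TXN 
(builder calls assumed from org.apache.hadoop.hive.metastore.LockComponentBuilder; 
the db/table names are placeholders, and this is not the actual patch):
{code:java}
LockComponent comp = new LockComponentBuilder()
    .setDbName("db")                     // placeholder database name
    .setTableName("some_table")          // placeholder table name
    .setShared()
    // NO_TXN marks the operation as non-transactional, so committing the
    // txn does not leave a NULL-writeId row in COMPLETED_TXN_COMPONENTS.
    .setOperationType(DataOperationType.NO_TXN)
    .build();
{code}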



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-26409) Assign NO_TXN operation type for table in global locks for scheduled queries

2022-07-21 Thread Sourabh Badhya (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sourabh Badhya resolved HIVE-26409.
---
Fix Version/s: 4.0.0-alpha-2
   Resolution: Fixed

> Assign NO_TXN operation type for table in global locks for scheduled queries
> 
>
> Key: HIVE-26409
> URL: https://issues.apache.org/jira/browse/HIVE-26409
> Project: Hive
>  Issue Type: Bug
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0-alpha-2
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The NO_TXN operation type has to be assigned while acquiring global locks 
> for the table because it will not add records to the COMPLETED_TXN_COMPONENTS 
> table. Currently the record introduced by this lock had writeId as NULL. 
> There was a situation which led to a large number of records in the 
> COMPLETED_TXN_COMPONENTS table because these records had writeId as NULL and 
> weren't cleaned up by AcidHouseKeeperService/Cleaner, which subsequently led 
> to OOM errors.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26409) Assign NO_TXN operation type for table in global locks for scheduled queries

2022-07-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26409?focusedWorklogId=793700&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793700
 ]

ASF GitHub Bot logged work on HIVE-26409:
-

Author: ASF GitHub Bot
Created on: 21/Jul/22 11:55
Start Date: 21/Jul/22 11:55
Worklog Time Spent: 10m 
  Work Description: deniskuzZ merged PR #3454:
URL: https://github.com/apache/hive/pull/3454




Issue Time Tracking
---

Worklog Id: (was: 793700)
Time Spent: 0.5h  (was: 20m)

> Assign NO_TXN operation type for table in global locks for scheduled queries
> 
>
> Key: HIVE-26409
> URL: https://issues.apache.org/jira/browse/HIVE-26409
> Project: Hive
>  Issue Type: Bug
>Reporter: Sourabh Badhya
>Assignee: Sourabh Badhya
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The NO_TXN operation type has to be assigned while acquiring global locks 
> for the table because it will not add records to the COMPLETED_TXN_COMPONENTS 
> table. Currently the record introduced by this lock had writeId as NULL. 
> There was a situation which led to a large number of records in the 
> COMPLETED_TXN_COMPONENTS table because these records had writeId as NULL and 
> weren't cleaned up by AcidHouseKeeperService/Cleaner, which subsequently led 
> to OOM errors.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-26397) Honour Iceberg sort orders when writing a table

2022-07-21 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-26397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17569362#comment-17569362
 ] 

László Pintér commented on HIVE-26397:
--

Merged into master. Thanks, [~szita] and [~pvary] for the review!

> Honour Iceberg sort orders when writing a table
> ---
>
> Key: HIVE-26397
> URL: https://issues.apache.org/jira/browse/HIVE-26397
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Iceberg specification defines sort orders. We should consider this when 
> writing to an Iceberg table through Hive.
> See: https://iceberg.apache.org/spec/#sort-orders



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-26397) Honour Iceberg sort orders when writing a table

2022-07-21 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HIVE-26397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Pintér resolved HIVE-26397.
--
Resolution: Fixed

> Honour Iceberg sort orders when writing a table
> ---
>
> Key: HIVE-26397
> URL: https://issues.apache.org/jira/browse/HIVE-26397
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Iceberg specification defines sort orders. We should consider this when 
> writing to an Iceberg table through Hive.
> See: https://iceberg.apache.org/spec/#sort-orders



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26397) Honour Iceberg sort orders when writing a table

2022-07-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26397?focusedWorklogId=793681&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793681
 ]

ASF GitHub Bot logged work on HIVE-26397:
-

Author: ASF GitHub Bot
Created on: 21/Jul/22 11:23
Start Date: 21/Jul/22 11:23
Worklog Time Spent: 10m 
  Work Description: lcspinter merged PR #3445:
URL: https://github.com/apache/hive/pull/3445




Issue Time Tracking
---

Worklog Id: (was: 793681)
Time Spent: 1h 10m  (was: 1h)

> Honour Iceberg sort orders when writing a table
> ---
>
> Key: HIVE-26397
> URL: https://issues.apache.org/jira/browse/HIVE-26397
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Pintér
>Assignee: László Pintér
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Iceberg specification defines sort orders. We should consider this when 
> writing to an Iceberg table through Hive.
> See: https://iceberg.apache.org/spec/#sort-orders



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26414) Aborted/Cancelled CTAS operations must initiate cleanup of uncommitted data

2022-07-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26414?focusedWorklogId=793599&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793599
 ]

ASF GitHub Bot logged work on HIVE-26414:
-

Author: ASF GitHub Bot
Created on: 21/Jul/22 08:33
Start Date: 21/Jul/22 08:33
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3457:
URL: https://github.com/apache/hive/pull/3457#discussion_r926391951


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -485,6 +480,26 @@ private void clearLocksAndHB() {
 stopHeartbeat();
   }
 
+  private void cleanupDirForCTAS() {

Review Comment:
   could we pass a `Context ctx` here instead of initializing the destination 
table when acquiring locks? 



##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -485,6 +480,26 @@ private void clearLocksAndHB() {
 stopHeartbeat();
   }
 
+  private void cleanupDirForCTAS() {
+if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.TXN_CTAS_X_LOCK)) {

Review Comment:
   Shouldn't we check that this was a CTAS operation? Do we only need to clean 
up if an exclusive lock is enabled?



##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -485,6 +480,26 @@ private void clearLocksAndHB() {
 stopHeartbeat();
   }
 
+  private void cleanupDirForCTAS() {
+if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.TXN_CTAS_X_LOCK)) {
+  if (destinationTable != null) {
+try {
+  CompactionRequest rqst = new CompactionRequest(
+  destinationTable.getDbName(), 
destinationTable.getTableName(), CompactionType.MAJOR);
+  
rqst.setRunas(TxnUtils.findUserToRunAs(destinationTable.getSd().getLocation(),
+  destinationTable.getTTable(), conf));
+
+  rqst.putToProperties("location", 
destinationTable.getSd().getLocation());

Review Comment:
   use hive_metastoreConstants.META_TABLE_LOCATION?



##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -29,19 +29,10 @@ Licensed to the Apache Software Foundation (ASF) under one
 import org.apache.hadoop.hive.metastore.IMetaStoreClient;
 import org.apache.hadoop.hive.metastore.LockComponentBuilder;
 import org.apache.hadoop.hive.metastore.LockRequestBuilder;
-import org.apache.hadoop.hive.metastore.api.LockComponent;
-import org.apache.hadoop.hive.metastore.api.LockResponse;
-import org.apache.hadoop.hive.metastore.api.LockState;
-import org.apache.hadoop.hive.metastore.api.MetaException;
-import org.apache.hadoop.hive.metastore.api.NoSuchLockException;
-import org.apache.hadoop.hive.metastore.api.NoSuchTxnException;
-import org.apache.hadoop.hive.metastore.api.TxnAbortedException;
-import org.apache.hadoop.hive.metastore.api.TxnToWriteId;
-import org.apache.hadoop.hive.metastore.api.CommitTxnRequest;
-import org.apache.hadoop.hive.metastore.api.DataOperationType;
-import org.apache.hadoop.hive.metastore.api.GetOpenTxnsResponse;
-import org.apache.hadoop.hive.metastore.api.TxnType;
+import org.apache.hadoop.hive.metastore.api.*;

Review Comment:
   please avoid wildcard imports



##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -485,6 +480,26 @@ private void clearLocksAndHB() {
 stopHeartbeat();
   }
 
+  private void cleanupDirForCTAS() {
+if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.TXN_CTAS_X_LOCK)) {
+  if (destinationTable != null) {
+try {
+  CompactionRequest rqst = new CompactionRequest(
+  destinationTable.getDbName(), 
destinationTable.getTableName(), CompactionType.MAJOR);
+  
rqst.setRunas(TxnUtils.findUserToRunAs(destinationTable.getSd().getLocation(),
+  destinationTable.getTTable(), conf));
+
+  rqst.putToProperties("location", 
destinationTable.getSd().getLocation());
+  rqst.putToProperties("ifPurge", Boolean.toString(true));
+  TxnStore txnHandler = TxnUtils.getTxnStore(conf);
+  txnHandler.submitForCleanup(rqst, 
destinationTable.getTTable().getWriteId(), getCurrentTxnId());
+} catch (InterruptedException | IOException | MetaException e) {
+  throw new RuntimeException("Not able to submit cleanup operation of 
directory written by CTAS");

Review Comment:
   should we catch just InterruptedException | IOException and re-throw 
MetaException here?
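
   A quick sketch of that suggestion (assuming the enclosing method can let 
MetaException propagate; the names come from the diff above):
   ```java
   try {
     TxnStore txnHandler = TxnUtils.getTxnStore(conf);
     txnHandler.submitForCleanup(rqst,
         destinationTable.getTTable().getWriteId(), getCurrentTxnId());
   } catch (InterruptedException | IOException e) {
     // Wrap only the checked interruption/IO failures...
     throw new RuntimeException(
         "Not able to submit cleanup operation of directory written by CTAS", e);
   }
   // ...and let MetaException propagate (or be re-thrown) unchanged.
   ```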



##
ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java:
##
@@ -7852,6 +7852,10 @@ protected Operator genFileSinkPlan(String dest, QB qb, 
Operator input)
   throw new SemanticException("Error while getting the full qualified 
path for the given directory: " + ex.getMessage());
 }
   }
+
+  if (!isNonNativeTable && 
AcidUtils.isTransactionalTable(destinationTable) && qb.isCTAS()) {

Re

[jira] [Work logged] (HIVE-26375) Invalid materialized view after rebuild if source table was compacted

2022-07-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26375?focusedWorklogId=793591&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793591
 ]

ASF GitHub Bot logged work on HIVE-26375:
-

Author: ASF GitHub Bot
Created on: 21/Jul/22 08:20
Start Date: 21/Jul/22 08:20
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on code in PR #3420:
URL: https://github.com/apache/hive/pull/3420#discussion_r926392298


##
itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestMaterializedViewRebuild.java:
##
@@ -97,7 +91,7 @@ public void 
testWhenMajorCompactionThenIncrementalMVRebuildIsStillAvailable() th
 txnHandler.cleanTxnToWriteIdTable();
 
 List result = execSelectAndDumpData("explain cbo alter 
materialized view " + MV1 + " rebuild", driver, "");
-Assert.assertEquals(INCREMENTAL_REBUILD_PLAN, result);
+Assert.assertEquals(FULL_REBUILD_PLAN, result);

Review Comment:
   At MV rebuild we search `COMPLETED_TXN_COMPONENTS` for update/delete 
operations on the affected source tables. Records are deleted from this table 
at compaction, so after compaction we can no longer confirm whether there were 
any deletes on any of the source tables. This is relevant because executing an 
incremental rebuild plan - which expects insert-only operations in all source 
tables - when there were in fact deletes leads to data corruption in the 
refreshed view.
   
   The second rebuild can be incremental, since the first rebuild resets the 
source tables' snapshot to a fresh one, and the txn data of operations done 
since that first rebuild still exists in `COMPLETED_TXN_COMPONENTS`.





Issue Time Tracking
---

Worklog Id: (was: 793591)
Time Spent: 0.5h  (was: 20m)

> Invalid materialized view after rebuild if source table was compacted
> -
>
> Key: HIVE-26375
> URL: https://issues.apache.org/jira/browse/HIVE-26375
> Project: Hive
>  Issue Type: Bug
>  Components: Materialized views, Transactions
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> After HIVE-25656 the MV state depends on the number of rows deleted/updated 
> in the source tables of the view. However, if one of the source tables is 
> major compacted, the delete delta files are no longer available and 
> reproducing the rows that should be deleted from the MV is no longer 
> possible.
> {code}
> create table t1(a int, b varchar(128), c float) stored as orc TBLPROPERTIES 
> ('transactional'='true');
> insert into t1(a,b, c) values (1, 'one', 1.1), (2, 'two', 2.2), (NULL, NULL, 
> NULL);
> create materialized view mv1 stored as orc TBLPROPERTIES 
> ('transactional'='true') as select a,b,c from t1 where a > 0 or a is null;
> update t1 set b = 'Changed' where a = 1;
> alter table t1 compact 'major';
> alter materialized view t1 rebuild;
> select * from mv1;
> {code}
> Select should return 
> {code}
>   "1\tChanged\t1.1",
>   "2\ttwo\t2.2",
>   "NULL\tNULL\tNULL"
> {code}
> but was
> {code}
>   "1\tone\t1.1",  
>   "2\ttwo\t2.2",
>   "NULL\tNULL\tNULL",
>   "1\tChanged\t1.1"
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-25621) Alter table partition compact/concatenate commands should send HivePrivilegeObjects for Authz

2022-07-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25621?focusedWorklogId=793569&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793569
 ]

ASF GitHub Bot logged work on HIVE-25621:
-

Author: ASF GitHub Bot
Created on: 21/Jul/22 07:39
Start Date: 21/Jul/22 07:39
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on code in PR #2731:
URL: https://github.com/apache/hive/pull/2731#discussion_r926352617


##
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/storage/compact/AlterTableCompactAnalyzer.java:
##
@@ -67,6 +73,17 @@ protected void analyzeCommand(TableName tableName, 
Map partition
 }
 
 AlterTableCompactDesc desc = new AlterTableCompactDesc(tableName, 
partitionSpec, type, isBlocking, mapProp);
+Table table = getTable(tableName);
+WriteEntity.WriteType writeType = null;
+if (AcidUtils.isTransactionalTable(table)) {
+  setAcidDdlDesc(desc);
+  writeType = WriteType.DDL_EXCLUSIVE;

Review Comment:
   Could you please explain a little why we choose DDL_EXCLUSIVE for 
transactional tables? Does it work the same for insert-only tables?





Issue Time Tracking
---

Worklog Id: (was: 793569)
Time Spent: 1h 10m  (was: 1h)

> Alter table partition compact/concatenate commands should send 
> HivePrivilegeObjects for Authz
> -
>
> Key: HIVE-25621
> URL: https://issues.apache.org/jira/browse/HIVE-25621
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> # Run the following queries 
> Create table temp(c0 int) partitioned by (c1 int);
> Insert into temp values(1,1);
> ALTER TABLE temp PARTITION (c1=1) COMPACT 'minor';
> ALTER TABLE temp PARTITION (c1=1) CONCATENATE;
> Insert into temp values(1,1);
>  # The above compact/concatenate commands are currently not sending any Hive 
> privilege objects for authorization. Hive needs to send these objects to 
> prevent malicious users from doing any operation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-25621) Alter table partition compact/concatenate commands should send HivePrivilegeObjects for Authz

2022-07-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25621?focusedWorklogId=793568&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793568
 ]

ASF GitHub Bot logged work on HIVE-25621:
-

Author: ASF GitHub Bot
Created on: 21/Jul/22 07:38
Start Date: 21/Jul/22 07:38
Worklog Time Spent: 10m 
  Work Description: saihemanth-cloudera opened a new pull request, #2731:
URL: https://github.com/apache/hive/pull/2731

   …erge/concatenate
   
   
   
   ### What changes were proposed in this pull request?
   Added HivePrivilegeObjects for alter table merge/concatenate commands.
   
   
   
   ### Why are the changes needed?
   To prevent malicious users from doing alter operations on the table.
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   
   ### How was this patch tested?
   Local machine, remote cluster.
   
   




Issue Time Tracking
---

Worklog Id: (was: 793568)
Time Spent: 1h  (was: 50m)

> Alter table partition compact/concatenate commands should send 
> HivePrivilegeObjects for Authz
> -
>
> Key: HIVE-25621
> URL: https://issues.apache.org/jira/browse/HIVE-25621
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> # Run the following queries 
> Create table temp(c0 int) partitioned by (c1 int);
> Insert into temp values(1,1);
> ALTER TABLE temp PARTITION (c1=1) COMPACT 'minor';
> ALTER TABLE temp PARTITION (c1=1) CONCATENATE;
> Insert into temp values(1,1);
>  # The above compact/concatenate commands are currently not sending any Hive 
> privilege objects for authorization. Hive needs to send these objects to 
> prevent malicious users from doing any operation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-25621) Alter table partition compact/concatenate commands should send HivePrivilegeObjects for Authz

2022-07-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25621?focusedWorklogId=793567&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793567
 ]

ASF GitHub Bot logged work on HIVE-25621:
-

Author: ASF GitHub Bot
Created on: 21/Jul/22 07:38
Start Date: 21/Jul/22 07:38
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on code in PR #2731:
URL: https://github.com/apache/hive/pull/2731#discussion_r926347947


##
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/storage/compact/AlterTableCompactAnalyzer.java:
##
@@ -67,6 +73,17 @@ protected void analyzeCommand(TableName tableName, 
Map partition
 }
 
 AlterTableCompactDesc desc = new AlterTableCompactDesc(tableName, 
partitionSpec, type, isBlocking, mapProp);
+Table table = getTable(tableName);
+WriteEntity.WriteType writeType = null;
+if (AcidUtils.isTransactionalTable(table)) {
+  setAcidDdlDesc(desc);
+  writeType = WriteType.DDL_EXCLUSIVE;
+} else {
+  writeType = 
WriteEntity.determineAlterTableWriteType(AlterTableType.COMPACT);
+}
+inputs.add(new ReadEntity(table));

Review Comment:
   should we take care of `partitionSpec` as well?



##
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/storage/concatenate/AlterTableConcatenateAnalyzer.java:
##
@@ -95,10 +97,14 @@ protected void analyzeCommand(TableName tableName, 
Map partition
 }
   }
 
-  private void compactAcidTable(TableName tableName, Map 
partitionSpec) throws SemanticException {
+  private void compactAcidTable(TableName tableName, Table table, Map partitionSpec) throws SemanticException {
 boolean isBlocking = !HiveConf.getBoolVar(conf, 
ConfVars.TRANSACTIONAL_CONCATENATE_NOBLOCK, false);
 
 AlterTableCompactDesc desc = new AlterTableCompactDesc(tableName, 
partitionSpec, "MAJOR", isBlocking, null);
+WriteEntity.WriteType writeType = WriteEntity.WriteType.DDL_EXCLUSIVE;

Review Comment:
   should we take care of `partitionSpec` as well?



##
ql/src/java/org/apache/hadoop/hive/ql/ddl/table/storage/compact/AlterTableCompactAnalyzer.java:
##
@@ -67,6 +73,17 @@ protected void analyzeCommand(TableName tableName, 
Map partition
 }
 
 AlterTableCompactDesc desc = new AlterTableCompactDesc(tableName, 
partitionSpec, type, isBlocking, mapProp);
+Table table = getTable(tableName);
+WriteEntity.WriteType writeType = null;
+if (AcidUtils.isTransactionalTable(table)) {
+  setAcidDdlDesc(desc);
+  writeType = WriteType.DDL_EXCLUSIVE;

Review Comment:
   Could you please explain a little why we choose DDL_EXCLUSIVE for 
transactional tables? Does it work the same for insert-only tables?





Issue Time Tracking
---

Worklog Id: (was: 793567)
Time Spent: 50m  (was: 40m)

> Alter table partition compact/concatenate commands should send 
> HivePrivilegeObjects for Authz
> -
>
> Key: HIVE-25621
> URL: https://issues.apache.org/jira/browse/HIVE-25621
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> # Run the following queries 
> Create table temp(c0 int) partitioned by (c1 int);
> Insert into temp values(1,1);
> ALTER TABLE temp PARTITION (c1=1) COMPACT 'minor';
> ALTER TABLE temp PARTITION (c1=1) CONCATENATE;
> Insert into temp values(1,1);
>  # The above compact/concatenate commands are currently not sending any Hive 
> privilege objects for authorization. Hive needs to send these objects to 
> prevent malicious users from doing any operation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26400) Provide a self-contained docker

2022-07-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26400?focusedWorklogId=793560&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793560
 ]

ASF GitHub Bot logged work on HIVE-26400:
-

Author: ASF GitHub Bot
Created on: 21/Jul/22 07:21
Start Date: 21/Jul/22 07:21
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on PR #3448:
URL: https://github.com/apache/hive/pull/3448#issuecomment-1191134925

   > Is there any scope to run it with local version of Hive/Hadoop/Tez or do 
we need a released version always for this?
   
   The quick answer is yes, but there are some places to modify in order to 
run a specified version:
   - change the version in docker-compose.yml
   
https://github.com/apache/hive/blob/213208570d1efa0f7a41d5a742edd0439b99163b/dev-support/docker/docker-compose.yml#L51-L52
   - change the download URL in the Dockerfile
   
https://github.com/apache/hive/blob/213208570d1efa0f7a41d5a742edd0439b99163b/dev-support/docker/Dockerfile#L37-L43
   
   I'm wondering if we can build Hive from source directly, but that needs 
some feedback and investigation.




Issue Time Tracking
---

Worklog Id: (was: 793560)
Time Spent: 20m  (was: 10m)

> Provide a self-contained docker
> ---
>
> Key: HIVE-26400
> URL: https://issues.apache.org/jira/browse/HIVE-26400
> Project: Hive
>  Issue Type: Improvement
>  Components: Build Infrastructure
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)