[jira] [Work logged] (HIVE-26704) Cleaner shouldn't be blocked by global min open txnId

2023-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26704?focusedWorklogId=848631=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-848631
 ]

ASF GitHub Bot logged work on HIVE-26704:
-

Author: ASF GitHub Bot
Created on: 02/Mar/23 11:10
Start Date: 02/Mar/23 11:10
Worklog Time Spent: 10m 
  Work Description: deniskuzZ merged PR #3576:
URL: https://github.com/apache/hive/pull/3576




Issue Time Tracking
---

Worklog Id: (was: 848631)
Time Spent: 6h 40m  (was: 6.5h)

> Cleaner shouldn't be blocked by global min open txnId
> -
>
> Key: HIVE-26704
> URL: https://issues.apache.org/jira/browse/HIVE-26704
> Project: Hive
> Issue Type: Task
> Reporter: Denys Kuzmenko
> Assignee: Denys Kuzmenko
> Priority: Major
> Labels: pull-request-available
> Time Spent: 6h 40m
> Remaining Estimate: 0h
>
> *Single transaction blocks cluster-wide Cleaner operations*
> Currently, a single long-running transaction can prevent the 
> Cleaner from cleaning up any table. This causes file buildup in tables, which can 
> cause performance penalties when listing the directories (note that 
> compaction is not blocked by this, so unnecessary data is not read, but the 
> files remain, which causes a performance penalty). 
> We can reduce the set of files protected by the open transaction if we have 
> query-table correlation data stored in the backend DB, but this change will 
> need the current method of recording that detail to be revisited. 
> The naive and somewhat backward-compatible approach is to capture the 
> minOpenWriteIds per table. It involves a non-mutation operation (as in, there 
> is no need for the HMS DB to wait for another user’s operation to record it). 
> This does spew data writes into the HMS backend DB, but this is a blind 
> insert operation that can be group-committed across many users. 
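
To make the per-table idea in the description concrete, here is a minimal sketch. It assumes a hypothetical in-memory model (`OpenTxn`, `globalMinOpenTxnId` and `minOpenWriteId` are illustrative names, not Hive classes or HMS tables) and only shows why a per-table minimum open write ID lets the Cleaner make progress on tables that a long-running transaction never touched.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not Hive's schema: each open transaction records the
// write id it was allocated in every table it touches.
class PerTableWatermarkSketch {
  static final class OpenTxn {
    final long txnId;
    final Map<String, Long> writeIdByTable = new HashMap<>();

    OpenTxn(long txnId) {
      this.txnId = txnId;
    }

    OpenTxn writes(String table, long writeId) {
      writeIdByTable.put(table, writeId);
      return this;
    }
  }

  // Global watermark: the oldest open transaction, regardless of table.
  // nextTxnId is returned when nothing is open.
  static long globalMinOpenTxnId(List<OpenTxn> open, long nextTxnId) {
    long min = nextTxnId;
    for (OpenTxn t : open) {
      min = Math.min(min, t.txnId);
    }
    return min;
  }

  // Per-table watermark: the oldest open write id in this table only, or
  // nextWriteId when no open transaction has written to the table.
  static long minOpenWriteId(List<OpenTxn> open, String table, long nextWriteId) {
    long min = nextWriteId;
    for (OpenTxn t : open) {
      Long w = t.writeIdByTable.get(table);
      if (w != null) {
        min = Math.min(min, w);
      }
    }
    return min;
  }

  public static void main(String[] args) {
    List<OpenTxn> open = new ArrayList<>();
    open.add(new OpenTxn(5).writes("db.a", 2));    // long-running, touches only db.a
    open.add(new OpenTxn(90).writes("db.b", 40));  // recent, touches only db.b
    System.out.println("global min open txn id: " + globalMinOpenTxnId(open, 100));
    System.out.println("db.b min open write id: " + minOpenWriteId(open, "db.b", 41));
  }
}
```

With the data in `main`, the global watermark is pinned at transaction 5 for every table, while `db.b` is held back only by its own open write ID 40.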



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26704) Cleaner shouldn't be blocked by global min open txnId

2023-03-02 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26704?focusedWorklogId=848630=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-848630
 ]

ASF GitHub Bot logged work on HIVE-26704:
-

Author: ASF GitHub Bot
Created on: 02/Mar/23 11:08
Start Date: 02/Mar/23 11:08
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3576:
URL: https://github.com/apache/hive/pull/3576#discussion_r1122929881


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java:
##
@@ -1535,20 +1556,14 @@ public void setHadoopJobId(String hadoopJobId, long id) {
   @Override
   @RetrySemantics.Idempotent
   public long findMinOpenTxnIdForCleaner() throws MetaException {
-    Connection dbConn = null;
     try {
-      try {
-        dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED, connPoolCompaction);
+      try (Connection dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED, connPoolCompaction)) {
         return getMinOpenTxnIdWaterMark(dbConn);
       } catch (SQLException e) {
-        LOG.error("Unable to getMinOpenTxnIdForCleaner", e);
-        rollbackDBConn(dbConn);
-        checkRetryable(e, "getMinOpenTxnForCleaner");
-        throw new MetaException("Unable to execute getMinOpenTxnIfForCleaner() " +
-            e.getMessage());
-      } finally {
-        closeDbConn(dbConn);
-      }
+        LOG.error("Unable to findMinOpenTxnIdForCleaner", e);
+        checkRetryable(e, "findMinOpenTxnIdForCleaner");
+        throw new MetaException("Unable to execute getMinOpenTxnIfForCleaner() " + e.getMessage());

Review Comment:
   fixed





Issue Time Tracking
---

Worklog Id: (was: 848630)
Time Spent: 6.5h  (was: 6h 20m)



[jira] [Work logged] (HIVE-26704) Cleaner shouldn't be blocked by global min open txnId

2023-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26704?focusedWorklogId=848522=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-848522
 ]

ASF GitHub Bot logged work on HIVE-26704:
-

Author: ASF GitHub Bot
Created on: 02/Mar/23 06:04
Start Date: 02/Mar/23 06:04
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #3576:
URL: https://github.com/apache/hive/pull/3576#issuecomment-1451346304

   Kudos, SonarCloud Quality Gate passed!

   - 0 Bugs (rating A)
   - 0 Vulnerabilities (rating A)
   - 2 Security Hotspots (rating E)
   - 17 Code Smells (rating A)
   - No Coverage information
   - No Duplication information
   
   




Issue Time Tracking
---

Worklog Id: (was: 848522)
Time Spent: 6h 20m  (was: 6h 10m)


[jira] [Work logged] (HIVE-26704) Cleaner shouldn't be blocked by global min open txnId

2023-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26704?focusedWorklogId=848419=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-848419
 ]

ASF GitHub Bot logged work on HIVE-26704:
-

Author: ASF GitHub Bot
Created on: 01/Mar/23 18:05
Start Date: 01/Mar/23 18:05
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #3576:
URL: https://github.com/apache/hive/pull/3576#issuecomment-1450622471

   Kudos, SonarCloud Quality Gate passed!

   - 0 Bugs (rating A)
   - 0 Vulnerabilities (rating A)
   - 2 Security Hotspots (rating E)
   - 17 Code Smells (rating A)
   - No Coverage information
   - No Duplication information
   
   




Issue Time Tracking
---

Worklog Id: (was: 848419)
Time Spent: 6h 10m  (was: 6h)


[jira] [Work logged] (HIVE-26704) Cleaner shouldn't be blocked by global min open txnId

2023-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26704?focusedWorklogId=848396=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-848396
 ]

ASF GitHub Bot logged work on HIVE-26704:
-

Author: ASF GitHub Bot
Created on: 01/Mar/23 16:25
Start Date: 01/Mar/23 16:25
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3576:
URL: https://github.com/apache/hive/pull/3576#discussion_r1121822320


##
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java:
##
@@ -140,41 +140,36 @@ public void run() {
                   HiveConf.ConfVars.HIVE_COMPACTOR_CLEANER_DURATION_UPDATE_INTERVAL, TimeUnit.MILLISECONDS),
               new CleanerCycleUpdater(MetricsConstants.COMPACTION_CLEANER_CYCLE_DURATION, startedAt));
         }
-
         long minOpenTxnId = txnHandler.findMinOpenTxnIdForCleaner();
-
         checkInterrupt();
 
         List<CompactionInfo> readyToClean = txnHandler.findReadyToClean(minOpenTxnId, retentionTime);
-
         checkInterrupt();
 
         if (!readyToClean.isEmpty()) {
-          long minTxnIdSeenOpen = txnHandler.findMinTxnIdSeenOpen();
-          final long cleanerWaterMark =
-              minTxnIdSeenOpen < 0 ? minOpenTxnId : Math.min(minOpenTxnId, minTxnIdSeenOpen);
-
-          LOG.info("Cleaning based on min open txn id: " + cleanerWaterMark);
           List<CompletableFuture<Void>> cleanerList = new ArrayList<>();
           // For checking which compaction can be cleaned we can use the minOpenTxnId
           // However findReadyToClean will return all records that were compacted with old version of HMS
           // where the CQ_NEXT_TXN_ID is not set. For these compactions we need to provide minTxnIdSeenOpen
           // to the clean method, to avoid cleaning up deltas needed for running queries
           // when min_history_level is finally dropped, than every HMS will commit compaction the new way
           // and minTxnIdSeenOpen can be removed and minOpenTxnId can be used instead.
-          for (CompactionInfo compactionInfo : readyToClean) {
-
+          for (CompactionInfo ci : readyToClean) {
             //Check for interruption before scheduling each compactionInfo and return if necessary
             checkInterrupt();
-
+  
             CompletableFuture<Void> asyncJob =
                 CompletableFuture.runAsync(
-                        ThrowingRunnable.unchecked(() -> clean(compactionInfo, cleanerWaterMark, metricsEnabled)),
-                        cleanerExecutor)
-                    .exceptionally(t -> {
-                      LOG.error("Error clearing {}", compactionInfo.getFullPartitionName(), t);
-                      return null;
-                    });
+                        ThrowingRunnable.unchecked(() -> {
+                          long minOpenTxn = (ci.minOpenWriteId > 0) ?
+                              ci.nextTxnId + 1 : Math.min(minOpenTxnId, txnHandler.findMinTxnIdSeenOpen());

Review Comment:
   yea, missed that, if we have minOpenWriteId, we shouldn't even call 
findMinTxnIdSeenOpen
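
The point raised in this thread can be sketched as below: skip the fallback lookup when the compaction record carries a `minOpenWriteId`, and otherwise run it at most once per Cleaner cycle. This is an illustration under assumptions, not the code merged in the PR: `memoize` and `watermarkFor` are hypothetical helpers, and the `seen < 0` guard is carried over from the lines removed in the diff above.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongSupplier;

class CleanerWatermarkSketch {
  // Wraps a supplier so the underlying lookup runs at most once, and only
  // when somebody actually asks for the value. Synchronized because cleanup
  // tasks may call it from several executor threads.
  static LongSupplier memoize(final LongSupplier delegate) {
    return new LongSupplier() {
      private boolean loaded;
      private long value;

      @Override
      public synchronized long getAsLong() {
        if (!loaded) {
          value = delegate.getAsLong();
          loaded = true;
        }
        return value;
      }
    };
  }

  // Picks the watermark for one compaction record (hypothetical helper).
  static long watermarkFor(long minOpenWriteId, long nextTxnId, long minOpenTxnId,
      LongSupplier minTxnIdSeenOpen) {
    if (minOpenWriteId > 0) {
      // Record written the new way: no fallback lookup is needed.
      return nextTxnId + 1;
    }
    long seen = minTxnIdSeenOpen.getAsLong();
    // A negative value is treated as "nothing recorded", as in the removed lines.
    return seen < 0 ? minOpenTxnId : Math.min(minOpenTxnId, seen);
  }

  public static void main(String[] args) {
    AtomicInteger lookups = new AtomicInteger();
    LongSupplier seenOpen = memoize(() -> {
      lookups.incrementAndGet(); // stands in for a metastore round trip
      return 7L;
    });
    System.out.println(watermarkFor(3, 20, 10, seenOpen)); // has minOpenWriteId: no lookup
    System.out.println(watermarkFor(0, 20, 10, seenOpen)); // legacy record: lookup runs
    System.out.println(watermarkFor(0, 30, 10, seenOpen)); // legacy record: cached value reused
    System.out.println("lookups = " + lookups.get());
  }
}
```

The memoized supplier keeps the call site inside the loop simple while still answering the reviewer's concern about repeated calls; hoisting the lookup above the loop would also work, but it would run even when every record carries a `minOpenWriteId`.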
   





Issue Time Tracking
---

Worklog Id: (was: 848396)
Time Spent: 6h  (was: 5h 50m)



[jira] [Work logged] (HIVE-26704) Cleaner shouldn't be blocked by global min open txnId

2023-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26704?focusedWorklogId=848395=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-848395
 ]

ASF GitHub Bot logged work on HIVE-26704:
-

Author: ASF GitHub Bot
Created on: 01/Mar/23 16:24
Start Date: 01/Mar/23 16:24
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3576:
URL: https://github.com/apache/hive/pull/3576#discussion_r1122005116


##
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java:
##
@@ -140,41 +140,36 @@ public void run() {
                   HiveConf.ConfVars.HIVE_COMPACTOR_CLEANER_DURATION_UPDATE_INTERVAL, TimeUnit.MILLISECONDS),
               new CleanerCycleUpdater(MetricsConstants.COMPACTION_CLEANER_CYCLE_DURATION, startedAt));
         }
-
         long minOpenTxnId = txnHandler.findMinOpenTxnIdForCleaner();
-
         checkInterrupt();
 
         List<CompactionInfo> readyToClean = txnHandler.findReadyToClean(minOpenTxnId, retentionTime);
-
         checkInterrupt();
 
         if (!readyToClean.isEmpty()) {
-          long minTxnIdSeenOpen = txnHandler.findMinTxnIdSeenOpen();
-          final long cleanerWaterMark =
-              minTxnIdSeenOpen < 0 ? minOpenTxnId : Math.min(minOpenTxnId, minTxnIdSeenOpen);
-
-          LOG.info("Cleaning based on min open txn id: " + cleanerWaterMark);
           List<CompletableFuture<Void>> cleanerList = new ArrayList<>();
           // For checking which compaction can be cleaned we can use the minOpenTxnId
           // However findReadyToClean will return all records that were compacted with old version of HMS
           // where the CQ_NEXT_TXN_ID is not set. For these compactions we need to provide minTxnIdSeenOpen
           // to the clean method, to avoid cleaning up deltas needed for running queries
           // when min_history_level is finally dropped, than every HMS will commit compaction the new way
           // and minTxnIdSeenOpen can be removed and minOpenTxnId can be used instead.
-          for (CompactionInfo compactionInfo : readyToClean) {
-
+          for (CompactionInfo ci : readyToClean) {
             //Check for interruption before scheduling each compactionInfo and return if necessary
             checkInterrupt();
-
+  
             CompletableFuture<Void> asyncJob =
                 CompletableFuture.runAsync(
-                        ThrowingRunnable.unchecked(() -> clean(compactionInfo, cleanerWaterMark, metricsEnabled)),
-                        cleanerExecutor)
-                    .exceptionally(t -> {
-                      LOG.error("Error clearing {}", compactionInfo.getFullPartitionName(), t);
-                      return null;
-                    });
+                        ThrowingRunnable.unchecked(() -> {
+                          long minOpenTxn = (ci.minOpenWriteId > 0) ?
+                              ci.nextTxnId + 1 : Math.min(minOpenTxnId, txnHandler.findMinTxnIdSeenOpen());

Review Comment:
   fixed





Issue Time Tracking
---

Worklog Id: (was: 848395)
Time Spent: 5h 50m  (was: 5h 40m)



[jira] [Work logged] (HIVE-26704) Cleaner shouldn't be blocked by global min open txnId

2023-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26704?focusedWorklogId=848352=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-848352
 ]

ASF GitHub Bot logged work on HIVE-26704:
-

Author: ASF GitHub Bot
Created on: 01/Mar/23 14:27
Start Date: 01/Mar/23 14:27
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3576:
URL: https://github.com/apache/hive/pull/3576#discussion_r1121822320


##
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java:
##
@@ -140,41 +140,36 @@ public void run() {
                   HiveConf.ConfVars.HIVE_COMPACTOR_CLEANER_DURATION_UPDATE_INTERVAL, TimeUnit.MILLISECONDS),
               new CleanerCycleUpdater(MetricsConstants.COMPACTION_CLEANER_CYCLE_DURATION, startedAt));
         }
-
         long minOpenTxnId = txnHandler.findMinOpenTxnIdForCleaner();
-
         checkInterrupt();
 
         List<CompactionInfo> readyToClean = txnHandler.findReadyToClean(minOpenTxnId, retentionTime);
-
         checkInterrupt();
 
         if (!readyToClean.isEmpty()) {
-          long minTxnIdSeenOpen = txnHandler.findMinTxnIdSeenOpen();
-          final long cleanerWaterMark =
-              minTxnIdSeenOpen < 0 ? minOpenTxnId : Math.min(minOpenTxnId, minTxnIdSeenOpen);
-
-          LOG.info("Cleaning based on min open txn id: " + cleanerWaterMark);
           List<CompletableFuture<Void>> cleanerList = new ArrayList<>();
           // For checking which compaction can be cleaned we can use the minOpenTxnId
           // However findReadyToClean will return all records that were compacted with old version of HMS
           // where the CQ_NEXT_TXN_ID is not set. For these compactions we need to provide minTxnIdSeenOpen
           // to the clean method, to avoid cleaning up deltas needed for running queries
           // when min_history_level is finally dropped, than every HMS will commit compaction the new way
           // and minTxnIdSeenOpen can be removed and minOpenTxnId can be used instead.
-          for (CompactionInfo compactionInfo : readyToClean) {
-
+          for (CompactionInfo ci : readyToClean) {
             //Check for interruption before scheduling each compactionInfo and return if necessary
             checkInterrupt();
-
+  
             CompletableFuture<Void> asyncJob =
                 CompletableFuture.runAsync(
-                        ThrowingRunnable.unchecked(() -> clean(compactionInfo, cleanerWaterMark, metricsEnabled)),
-                        cleanerExecutor)
-                    .exceptionally(t -> {
-                      LOG.error("Error clearing {}", compactionInfo.getFullPartitionName(), t);
-                      return null;
-                    });
+                        ThrowingRunnable.unchecked(() -> {
+                          long minOpenTxn = (ci.minOpenWriteId > 0) ?
+                              ci.nextTxnId + 1 : Math.min(minOpenTxnId, txnHandler.findMinTxnIdSeenOpen());

Review Comment:
   yea, missed that; if we have minOpenWriteId, we shouldn't call findMinTxnIdSeenOpen
   





Issue Time Tracking
---

Worklog Id: (was: 848352)
Time Spent: 5h 40m  (was: 5.5h)



[jira] [Work logged] (HIVE-26704) Cleaner shouldn't be blocked by global min open txnId

2023-03-01 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26704?focusedWorklogId=848274=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-848274
 ]

ASF GitHub Bot logged work on HIVE-26704:
-

Author: ASF GitHub Bot
Created on: 01/Mar/23 10:37
Start Date: 01/Mar/23 10:37
Worklog Time Spent: 10m 
  Work Description: veghlaci05 commented on code in PR #3576:
URL: https://github.com/apache/hive/pull/3576#discussion_r1121440562


##
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java:
##
@@ -140,41 +140,36 @@ public void run() {
                   HiveConf.ConfVars.HIVE_COMPACTOR_CLEANER_DURATION_UPDATE_INTERVAL, TimeUnit.MILLISECONDS),
               new CleanerCycleUpdater(MetricsConstants.COMPACTION_CLEANER_CYCLE_DURATION, startedAt));
         }
-
         long minOpenTxnId = txnHandler.findMinOpenTxnIdForCleaner();
-
         checkInterrupt();
 
         List<CompactionInfo> readyToClean = txnHandler.findReadyToClean(minOpenTxnId, retentionTime);
-
         checkInterrupt();
 
         if (!readyToClean.isEmpty()) {
-          long minTxnIdSeenOpen = txnHandler.findMinTxnIdSeenOpen();
-          final long cleanerWaterMark =
-              minTxnIdSeenOpen < 0 ? minOpenTxnId : Math.min(minOpenTxnId, minTxnIdSeenOpen);
-
-          LOG.info("Cleaning based on min open txn id: " + cleanerWaterMark);
           List<CompletableFuture<Void>> cleanerList = new ArrayList<>();
           // For checking which compaction can be cleaned we can use the minOpenTxnId
           // However findReadyToClean will return all records that were compacted with old version of HMS
           // where the CQ_NEXT_TXN_ID is not set. For these compactions we need to provide minTxnIdSeenOpen
           // to the clean method, to avoid cleaning up deltas needed for running queries
           // when min_history_level is finally dropped, than every HMS will commit compaction the new way
           // and minTxnIdSeenOpen can be removed and minOpenTxnId can be used instead.
-          for (CompactionInfo compactionInfo : readyToClean) {
-
+          for (CompactionInfo ci : readyToClean) {
             //Check for interruption before scheduling each compactionInfo and return if necessary
             checkInterrupt();
-
+  
             CompletableFuture<Void> asyncJob =
                 CompletableFuture.runAsync(
-                        ThrowingRunnable.unchecked(() -> clean(compactionInfo, cleanerWaterMark, metricsEnabled)),
-                        cleanerExecutor)
-                    .exceptionally(t -> {
-                      LOG.error("Error clearing {}", compactionInfo.getFullPartitionName(), t);
-                      return null;
-                    });
+                        ThrowingRunnable.unchecked(() -> {
+                          long minOpenTxn = (ci.minOpenWriteId > 0) ?
+                              ci.nextTxnId + 1 : Math.min(minOpenTxnId, txnHandler.findMinTxnIdSeenOpen());

Review Comment:
   txnHandler.findMinTxnIdSeenOpen() is now called inside the for loop instead of 
being called once. Is there a reason for doing this?





Issue Time Tracking
---

Worklog Id: (was: 848274)
Time Spent: 5.5h  (was: 5h 20m)



[jira] [Work logged] (HIVE-26704) Cleaner shouldn't be blocked by global min open txnId

2023-02-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26704?focusedWorklogId=847928=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-847928
 ]

ASF GitHub Bot logged work on HIVE-26704:
-

Author: ASF GitHub Bot
Created on: 28/Feb/23 04:31
Start Date: 28/Feb/23 04:31
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3576:
URL: https://github.com/apache/hive/pull/3576#discussion_r1119563491


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java:
##
@@ -1535,20 +1556,14 @@ public void setHadoopJobId(String hadoopJobId, long id) {
   @Override
   @RetrySemantics.Idempotent
   public long findMinOpenTxnIdForCleaner() throws MetaException {
-    Connection dbConn = null;
     try {
-      try {
-        dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED, connPoolCompaction);
+      try (Connection dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED, connPoolCompaction)) {
         return getMinOpenTxnIdWaterMark(dbConn);
       } catch (SQLException e) {
-        LOG.error("Unable to getMinOpenTxnIdForCleaner", e);
-        rollbackDBConn(dbConn);
-        checkRetryable(e, "getMinOpenTxnForCleaner");
-        throw new MetaException("Unable to execute getMinOpenTxnIfForCleaner() " +
-            e.getMessage());
-      } finally {
-        closeDbConn(dbConn);
-      }
+        LOG.error("Unable to findMinOpenTxnIdForCleaner", e);
+        checkRetryable(e, "findMinOpenTxnIdForCleaner");
+        throw new MetaException("Unable to execute getMinOpenTxnIfForCleaner() " + e.getMessage());

Review Comment:
   nit: Typo - `getMinOpenTxnIdForCleaner` instead of 
`getMinOpenTxnIfForCleaner`
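
For readers less familiar with the construct this hunk switches to, the following is a small standalone sketch of try-with-resources. `FakeConnection` and `find` are hypothetical stand-ins, not Hive or JDBC classes; the sketch only shows that the resource is closed automatically on both the success and the failure path, so no explicit `finally` block is needed.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: FakeConnection is a hypothetical stand-in for a JDBC Connection.
class TryWithResourcesSketch {
  static final List<String> EVENTS = new ArrayList<>();

  static final class FakeConnection implements AutoCloseable {
    long query(boolean fail) throws Exception {
      if (fail) {
        throw new Exception("query failed");
      }
      return 7L;
    }

    @Override
    public void close() {
      EVENTS.add("closed");
    }
  }

  // The connection is closed when the try block exits, whether normally or by exception.
  static long find(boolean fail) {
    try (FakeConnection conn = new FakeConnection()) {
      return conn.query(fail);
    } catch (Exception e) {
      EVENTS.add("caught: " + e.getMessage());
      throw new IllegalStateException("Unable to execute find(): " + e.getMessage(), e);
    }
  }

  public static void main(String[] args) {
    System.out.println(find(false));
    try {
      find(true);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
    System.out.println(EVENTS);
  }
}
```

Note that on the failure path the resource is closed before the `catch` block runs, so the handler cannot use it any more.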





Issue Time Tracking
---

Worklog Id: (was: 847928)
Time Spent: 5h 20m  (was: 5h 10m)

> Cleaner shouldn't be blocked by global min open txnId
> -
>
> Key: HIVE-26704
> URL: https://issues.apache.org/jira/browse/HIVE-26704
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (HIVE-26704) Cleaner shouldn't be blocked by global min open txnId

2023-02-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26704?focusedWorklogId=847904=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-847904
 ]

ASF GitHub Bot logged work on HIVE-26704:
-

Author: ASF GitHub Bot
Created on: 28/Feb/23 00:32
Start Date: 28/Feb/23 00:32
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #3576:
URL: https://github.com/apache/hive/pull/3576#issuecomment-1447340160

   Kudos, SonarCloud Quality Gate passed!

   0 Bugs (rating A)
   0 Vulnerabilities (rating A)
   2 Security Hotspots (rating E)
   18 Code Smells (rating A)

   No Coverage information
   No Duplication information




Issue Time Tracking
---

Worklog Id: (was: 847904)
Time Spent: 5h 10m  (was: 5h)

> Cleaner shouldn't be blocked by global min open txnId
> -
>
> Key: HIVE-26704
> URL: https://issues.apache.org/jira/browse/HIVE-26704
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>

[jira] [Work logged] (HIVE-26704) Cleaner shouldn't be blocked by global min open txnId

2023-02-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26704?focusedWorklogId=847846=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-847846
 ]

ASF GitHub Bot logged work on HIVE-26704:
-

Author: ASF GitHub Bot
Created on: 27/Feb/23 15:07
Start Date: 27/Feb/23 15:07
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #3576:
URL: https://github.com/apache/hive/pull/3576#issuecomment-1446500387

   Kudos, SonarCloud Quality Gate passed!

   0 Bugs (rating A)
   0 Vulnerabilities (rating A)
   2 Security Hotspots (rating E)
   18 Code Smells (rating A)

   No Coverage information
   No Duplication information




Issue Time Tracking
---

Worklog Id: (was: 847846)
Time Spent: 5h  (was: 4h 50m)

> Cleaner shouldn't be blocked by global min open txnId
> -
>
> Key: HIVE-26704
> URL: https://issues.apache.org/jira/browse/HIVE-26704
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h
>  Remaining Estimate: 0h
>

[jira] [Work logged] (HIVE-26704) Cleaner shouldn't be blocked by global min open txnId

2023-02-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26704?focusedWorklogId=847842=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-847842
 ]

ASF GitHub Bot logged work on HIVE-26704:
-

Author: ASF GitHub Bot
Created on: 27/Feb/23 14:38
Start Date: 27/Feb/23 14:38
Worklog Time Spent: 10m 
  Work Description: veghlaci05 commented on code in PR #3576:
URL: https://github.com/apache/hive/pull/3576#discussion_r1118834857


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java:
##
@@ -5894,6 +5902,63 @@ private void addTxnToMinHistoryLevel(Connection dbConn, List<Long> txnIds, long
     }
   }
 
+  @Override
+  @RetrySemantics.SafeToRetry
+  public void addWriteIdsToMinHistory(long txnid, Map<String, Long> minOpenWriteIds) throws MetaException {
+    if (!useMinHistoryWriteId) {
+      return;
+    }
+    // Need to register minimum open writeId for current transactions into MIN_HISTORY_WRITE_ID table.
+    try {
+      Connection dbConn = null;
+      try {
+        dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED);
+        try (PreparedStatement pstmt = dbConn.prepareStatement(MIN_HISTORY_WRITE_ID_INSERT_QUERY)) {
+          int writeId = 0;
+
+          for (Map.Entry<String, Long> validWriteId : minOpenWriteIds.entrySet()) {
+            String[] names = TxnUtils.getDbTableName(validWriteId.getKey());
+
+            pstmt.setLong(1, txnid);
+            pstmt.setString(2, names[0]);
+            pstmt.setString(3, names[1]);
+            pstmt.setLong(4, validWriteId.getValue());
+
+            pstmt.addBatch();
+            writeId++;
+            if (writeId % maxBatchSize == 0) {
+              LOG.debug("Executing a batch of <" + TXN_TO_WRITE_ID_INSERT_QUERY + "> queries. " +
+                "Batch size: " + maxBatchSize);
+              pstmt.executeBatch();
+            }
+          }
+          if (writeId % maxBatchSize != 0) {
+            LOG.debug("Executing a batch of <" + TXN_TO_WRITE_ID_INSERT_QUERY + "> queries. " +
+              "Batch size: " + writeId % maxBatchSize);
+            pstmt.executeBatch();
+          }
+        }

Review Comment:
   Ok, I'll create a separate ticket for it. It may get picked up eventually.
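
The batching loop in the hunk above follows a common pattern: flush every `maxBatchSize` rows, then flush whatever remains. The sketch below isolates just that counting logic without a database. `flushSizes` is a hypothetical helper written for this illustration; the list it returns stands in for the successive `executeBatch()` calls.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: models the flush points of a batched insert, with no real JDBC involved.
class BatchFlushSketch {

  // Returns the size of every flushed batch, in the order the flushes would happen.
  static List<Integer> flushSizes(int rowCount, int maxBatchSize) {
    if (maxBatchSize <= 0) {
      throw new IllegalArgumentException("maxBatchSize must be positive");
    }
    List<Integer> flushed = new ArrayList<>();
    int count = 0;
    for (int row = 0; row < rowCount; row++) {
      count++;                              // stands in for pstmt.addBatch()
      if (count % maxBatchSize == 0) {
        flushed.add(maxBatchSize);          // stands in for pstmt.executeBatch() on a full batch
      }
    }
    if (count % maxBatchSize != 0) {
      flushed.add(count % maxBatchSize);    // final partial batch
    }
    return flushed;
  }

  public static void main(String[] args) {
    System.out.println(flushSizes(5, 2));
    System.out.println(flushSizes(4, 2));
  }
}
```

The trailing check matters: without it, the last `rowCount % maxBatchSize` rows would be added to the batch but never executed.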








Issue Time Tracking
---

Worklog Id: (was: 847842)
Time Spent: 4h 50m  (was: 4h 40m)

> Cleaner shouldn't be blocked by global min open txnId
> -
>
> Key: HIVE-26704
> URL: https://issues.apache.org/jira/browse/HIVE-26704
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>

[jira] [Work logged] (HIVE-26704) Cleaner shouldn't be blocked by global min open txnId

2023-02-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26704?focusedWorklogId=847832=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-847832
 ]

ASF GitHub Bot logged work on HIVE-26704:
-

Author: ASF GitHub Bot
Created on: 27/Feb/23 14:01
Start Date: 27/Feb/23 14:01
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3576:
URL: https://github.com/apache/hive/pull/3576#discussion_r1118777607


##
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java:
##
@@ -140,41 +140,36 @@ public void run() {
                   HiveConf.ConfVars.HIVE_COMPACTOR_CLEANER_DURATION_UPDATE_INTERVAL, TimeUnit.MILLISECONDS),
               new CleanerCycleUpdater(MetricsConstants.COMPACTION_CLEANER_CYCLE_DURATION, startedAt));
           }
-
           long minOpenTxnId = txnHandler.findMinOpenTxnIdForCleaner();
-
           checkInterrupt();
 
           List<CompactionInfo> readyToClean = txnHandler.findReadyToClean(minOpenTxnId, retentionTime);
-
           checkInterrupt();
 
           if (!readyToClean.isEmpty()) {
-            long minTxnIdSeenOpen = txnHandler.findMinTxnIdSeenOpen();
-            final long cleanerWaterMark =
-                minTxnIdSeenOpen < 0 ? minOpenTxnId : Math.min(minOpenTxnId, minTxnIdSeenOpen);
-
-            LOG.info("Cleaning based on min open txn id: " + cleanerWaterMark);
             List<CompletableFuture<Void>> cleanerList = new ArrayList<>();
             // For checking which compaction can be cleaned we can use the minOpenTxnId
             // However findReadyToClean will return all records that were compacted with old version of HMS
             // where the CQ_NEXT_TXN_ID is not set. For these compactions we need to provide minTxnIdSeenOpen
             // to the clean method, to avoid cleaning up deltas needed for running queries
             // when min_history_level is finally dropped, than every HMS will commit compaction the new way
             // and minTxnIdSeenOpen can be removed and minOpenTxnId can be used instead.
-            for (CompactionInfo compactionInfo : readyToClean) {
-
+            for (CompactionInfo ci : readyToClean) {
               //Check for interruption before scheduling each compactionInfo and return if necessary
               checkInterrupt();
-
+  
               CompletableFuture<Void> asyncJob =
                   CompletableFuture.runAsync(
-                          ThrowingRunnable.unchecked(() -> clean(compactionInfo, cleanerWaterMark, metricsEnabled)),
-                          cleanerExecutor)
-                      .exceptionally(t -> {
-                        LOG.error("Error clearing {}", compactionInfo.getFullPartitionName(), t);
-                        return null;
-                      });
+                          ThrowingRunnable.unchecked(() -> {
+                            long minOpenTxnGLB = (ci.minOpenWriteId > 0) ? 

Review Comment:
   fixed



##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -395,6 +399,32 @@ private boolean allowOperationInATransaction(QueryPlan queryPlan) {
     return false;
   }
 
+  @Override
+  public void addWriteIdsToMinHistory(QueryPlan plan, ValidTxnWriteIdList txnWriteIds) {
+    if (plan.getInputs().isEmpty()) {
+      return;
+    }
+    Map<String, Long> writeIds = plan.getInputs().stream()
+      .filter(input -> !input.isDummy() && AcidUtils.isTransactionalTable(input.getTable()))
+      .map(input -> input.getTable().getFullyQualifiedName())
+      .collect(Collectors.toSet()).stream()

Review Comment:
   fixed





Issue Time Tracking
---

Worklog Id: (was: 847832)
Time Spent: 4h 40m  (was: 4.5h)

> Cleaner shouldn't be blocked by global min open txnId
> -
>
> Key: HIVE-26704
> URL: https://issues.apache.org/jira/browse/HIVE-26704
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>

[jira] [Work logged] (HIVE-26704) Cleaner shouldn't be blocked by global min open txnId

2023-02-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26704?focusedWorklogId=847831=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-847831
 ]

ASF GitHub Bot logged work on HIVE-26704:
-

Author: ASF GitHub Bot
Created on: 27/Feb/23 14:00
Start Date: 27/Feb/23 14:00
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3576:
URL: https://github.com/apache/hive/pull/3576#discussion_r1118775359


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java:
##
@@ -5894,6 +5902,63 @@ private void addTxnToMinHistoryLevel(Connection dbConn, List<Long> txnIds, long
     }
   }
 
+  @Override
+  @RetrySemantics.SafeToRetry
+  public void addWriteIdsToMinHistory(long txnid, Map<String, Long> minOpenWriteIds) throws MetaException {
+    if (!useMinHistoryWriteId) {
+      return;
+    }
+    // Need to register minimum open writeId for current transactions into MIN_HISTORY_WRITE_ID table.
+    try {
+      Connection dbConn = null;
+      try {
+        dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED);
+        try (PreparedStatement pstmt = dbConn.prepareStatement(MIN_HISTORY_WRITE_ID_INSERT_QUERY)) {
+          int writeId = 0;
+
+          for (Map.Entry<String, Long> validWriteId : minOpenWriteIds.entrySet()) {
+            String[] names = TxnUtils.getDbTableName(validWriteId.getKey());
+
+            pstmt.setLong(1, txnid);
+            pstmt.setString(2, names[0]);
+            pstmt.setString(3, names[1]);
+            pstmt.setLong(4, validWriteId.getValue());
+
+            pstmt.addBatch();
+            writeId++;
+            if (writeId % maxBatchSize == 0) {
+              LOG.debug("Executing a batch of <" + TXN_TO_WRITE_ID_INSERT_QUERY + "> queries. " +
+                "Batch size: " + maxBatchSize);
+              pstmt.executeBatch();
+            }
+          }
+          if (writeId % maxBatchSize != 0) {
+            LOG.debug("Executing a batch of <" + TXN_TO_WRITE_ID_INSERT_QUERY + "> queries. " +
+              "Batch size: " + writeId % maxBatchSize);
+            pstmt.executeBatch();
+          }
+        }

Review Comment:
   that would be a massive refactor of TxnHandler





Issue Time Tracking
---

Worklog Id: (was: 847831)
Time Spent: 4.5h  (was: 4h 20m)

> Cleaner shouldn't be blocked by global min open txnId
> -
>
> Key: HIVE-26704
> URL: https://issues.apache.org/jira/browse/HIVE-26704
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>


[jira] [Work logged] (HIVE-26704) Cleaner shouldn't be blocked by global min open txnId

2023-02-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26704?focusedWorklogId=847830=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-847830
 ]

ASF GitHub Bot logged work on HIVE-26704:
-

Author: ASF GitHub Bot
Created on: 27/Feb/23 13:54
Start Date: 27/Feb/23 13:54
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3576:
URL: https://github.com/apache/hive/pull/3576#discussion_r1118766777


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -395,6 +399,32 @@ private boolean allowOperationInATransaction(QueryPlan queryPlan) {
     return false;
   }
 
+  @Override
+  public void addWriteIdsToMinHistory(QueryPlan plan, ValidTxnWriteIdList txnWriteIds) {
+    if (plan.getInputs().isEmpty()) {
+      return;
+    }
+    Map<String, Long> writeIds = plan.getInputs().stream()
+      .filter(input -> !input.isDummy() && AcidUtils.isTransactionalTable(input.getTable()))
+      .map(input -> input.getTable().getFullyQualifiedName())
+      .collect(Collectors.toSet()).stream()
+      .collect(Collectors.toMap(Function.identity(), table -> getMinOpenWriteId(txnWriteIds, table)));
+
+    if (!writeIds.isEmpty()) {
+      try {
+        getMS().addWriteIdsToMinHistory(txnId, writeIds);
+      } catch (TException | LockException e) {
+        throw new RuntimeException(ErrorMsg.METASTORE_COMMUNICATION_FAILED.getMsg(), e);
+      }
+    }
+  }
+
+  private Long getMinOpenWriteId(ValidTxnWriteIdList txnWriteIds, String table) {

Review Comment:
   it's used in a lambda
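
As a rough illustration of the shape of this hunk (a named helper referenced from the `toMap` value lambda), here is a standalone sketch. The class, the `known` lookup map and the default value of 1 are assumptions made for the example; they are not taken from Hive. It uses `distinct()` for de-duplication, which has the same effect as the `collect(Collectors.toSet()).stream()` round trip shown above.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Illustrative only: the lookup map and the default are hypothetical, not Hive behaviour.
class MinWriteIdMapSketch {

  // Named helper so the toMap value lambda stays a one-liner.
  static Long minOpenWriteId(Map<String, Long> known, String table) {
    return known.getOrDefault(table, 1L);
  }

  static Map<String, Long> collect(List<String> tables, Map<String, Long> known) {
    return tables.stream()
        // De-duplicate first: Collectors.toMap throws IllegalStateException on duplicate keys.
        .distinct()
        .collect(Collectors.toMap(Function.identity(), table -> minOpenWriteId(known, table)));
  }

  public static void main(String[] args) {
    Map<String, Long> known = Map.of("db.t1", 5L);
    System.out.println(collect(List.of("db.t1", "db.t2", "db.t1"), known));
  }
}
```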





Issue Time Tracking
---

Worklog Id: (was: 847830)
Time Spent: 4h 20m  (was: 4h 10m)

> Cleaner shouldn't be blocked by global min open txnId
> -
>
> Key: HIVE-26704
> URL: https://issues.apache.org/jira/browse/HIVE-26704
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (HIVE-26704) Cleaner shouldn't be blocked by global min open txnId

2023-02-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26704?focusedWorklogId=847828=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-847828
 ]

ASF GitHub Bot logged work on HIVE-26704:
-

Author: ASF GitHub Bot
Created on: 27/Feb/23 13:53
Start Date: 27/Feb/23 13:53
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3576:
URL: https://github.com/apache/hive/pull/3576#discussion_r1118764958


##
ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/TestCleaner2.java:
##
@@ -23,6 +23,6 @@
 public class TestCleaner2 extends TestCleaner {
   @Override
   boolean useHive130DeltaDirName() {
-return false;
+return true;

Review Comment:
   no, but we have TestCleaner & TestCleaner2 running the same stuff





Issue Time Tracking
---

Worklog Id: (was: 847828)
Time Spent: 4h 10m  (was: 4h)

> Cleaner shouldn't be blocked by global min open txnId
> -
>
> Key: HIVE-26704
> URL: https://issues.apache.org/jira/browse/HIVE-26704
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>


[jira] [Work logged] (HIVE-26704) Cleaner shouldn't be blocked by global min open txnId

2023-02-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26704?focusedWorklogId=847827=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-847827
 ]

ASF GitHub Bot logged work on HIVE-26704:
-

Author: ASF GitHub Bot
Created on: 27/Feb/23 13:52
Start Date: 27/Feb/23 13:52
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3576:
URL: https://github.com/apache/hive/pull/3576#discussion_r1118764121


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java:
##
@@ -5894,6 +5902,63 @@ private void addTxnToMinHistoryLevel(Connection dbConn, List<Long> txnIds, long
     }
   }
 
+  @Override
+  @RetrySemantics.SafeToRetry
+  public void addWriteIdsToMinHistory(long txnid, Map<String, Long> minOpenWriteIds) throws MetaException {
+    if (!useMinHistoryWriteId) {
+      return;
+    }
+    // Need to register minimum open writeId for current transactions into MIN_HISTORY_WRITE_ID table.
+    try {
+      Connection dbConn = null;
+      try {
+        dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED);

Review Comment:
   I need it in exception handling to rollbackDBConn
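
The point made in this reply can be sketched as follows: when the handler has to roll back, the connection must be declared outside the `try` so that it is still in scope, and still open, inside `catch`. `FakeConnection` and `write` are hypothetical names used only for this illustration; the real `rollbackDBConn` and `closeDbConn` helpers are not reproduced here.

```java
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: FakeConnection is a hypothetical stand-in for a JDBC Connection.
class RollbackScopeSketch {

  static final class FakeConnection {
    private final List<String> log;
    private boolean open = true;

    FakeConnection(List<String> log) {
      this.log = log;
    }

    void insert(boolean fail) throws SQLException {
      if (fail) {
        throw new SQLException("insert failed");
      }
      log.add("insert");
    }

    void rollback() {
      if (!open) {
        throw new IllegalStateException("rollback on closed connection");
      }
      log.add("rollback");
    }

    void close() {
      open = false;
      log.add("close");
    }
  }

  // Declared before the try so that both catch and finally can reach the connection.
  static List<String> write(boolean fail) {
    List<String> log = new ArrayList<>();
    FakeConnection conn = null;
    try {
      conn = new FakeConnection(log);
      conn.insert(fail);
      log.add("commit");
    } catch (SQLException e) {
      if (conn != null) {
        conn.rollback();   // still open here; it is closed only in finally
      }
    } finally {
      if (conn != null) {
        conn.close();
      }
    }
    return log;
  }

  public static void main(String[] args) {
    System.out.println(write(false));
    System.out.println(write(true));
  }
}
```

With try-with-resources the variable would be out of scope in `catch` and the resource already closed, so this older form is the straightforward choice when a rollback is required.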





Issue Time Tracking
---

Worklog Id: (was: 847827)
Time Spent: 4h  (was: 3h 50m)

> Cleaner shouldn't be blocked by global min open txnId
> -
>
> Key: HIVE-26704
> URL: https://issues.apache.org/jira/browse/HIVE-26704
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> *Single transaction blocks cluster-wide Cleaner operations*
> Currently, a single long-running transaction can prevent the 
> Cleaner from cleaning up any tables. This causes file buildup in tables, which can 
> cause performance penalties when listing the directories (note that the 
> compaction is not blocked by this, so unnecessary data is not read, but the 
> files remain there which causes performance penalty). 
> We can reduce the protected files from the open transaction if we have 
> query-table correlation data stored in the backend DB, but this change will 
> need the current method of recording that detail to be revisited. 
> The naive and somewhat backward-compatible approach is to capture the 
> minOpenWriteIds per table. It involves a non-mutation operation (as in, there 
> is no need for the HMS DB to wait for another user’s operation to record it). 
> This does spew data writes into the HMS backend DB, but this is a blind 
> insert operation that can be group-committed across many users. 
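
As a rough illustration of the per-table idea in the description above (not the actual HMS implementation), the sketch below assumes each open transaction records one row per table it touches, holding the minimum open writeId it saw for that table. The `Row` class and `perTableBound` method are hypothetical names. The point is that a table with no recorded row is not held back by an unrelated long-running transaction.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MinOpenWriteIdSketch {
  /** Hypothetical row: one open transaction's minimum open writeId for one table. */
  static final class Row {
    final long txnId;
    final String table;
    final long minOpenWriteId;

    Row(long txnId, String table, long minOpenWriteId) {
      this.txnId = txnId;
      this.table = table;
      this.minOpenWriteId = minOpenWriteId;
    }
  }

  /**
   * Per-table bound: the smallest minOpenWriteId recorded for that table by any
   * open transaction. Tables absent from the result are not constrained at all.
   */
  static Map<String, Long> perTableBound(List<Row> rows) {
    Map<String, Long> bound = new HashMap<>();
    for (Row row : rows) {
      bound.merge(row.table, row.minOpenWriteId, Long::min);
    }
    return bound;
  }
}
```

With rows (txn 1, db.a, 5), (txn 2, db.a, 9) and (txn 2, db.b, 3), the bound for db.a is 5 and for db.b is 3, and a third table db.c has no entry, so under these assumptions neither transaction would block its cleanup.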





[jira] [Work logged] (HIVE-26704) Cleaner shouldn't be blocked by global min open txnId

2023-02-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26704?focusedWorklogId=847813&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-847813
 ]

ASF GitHub Bot logged work on HIVE-26704:
-

Author: ASF GitHub Bot
Created on: 27/Feb/23 12:26
Start Date: 27/Feb/23 12:26
Worklog Time Spent: 10m 
  Work Description: veghlaci05 commented on code in PR #3576:
URL: https://github.com/apache/hive/pull/3576#discussion_r1118536472


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java:
##
@@ -5894,6 +5902,63 @@ private void addTxnToMinHistoryLevel(Connection dbConn, List txnIds, long
     }
   }
 
+  @Override
+  @RetrySemantics.SafeToRetry
+  public void addWriteIdsToMinHistory(long txnid, Map minOpenWriteIds) throws MetaException {
+    if (!useMinHistoryWriteId) {
+      return;
+    }
+    // Need to register minimum open writeId for current transactions into MIN_HISTORY_WRITE_ID table.
+    try {
+      Connection dbConn = null;
+      try {
+        dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED);

Review Comment:
   Try with resources instead?



##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java:
##
@@ -5894,6 +5902,63 @@ private void addTxnToMinHistoryLevel(Connection dbConn, List txnIds, long
     }
   }
 
+  @Override
+  @RetrySemantics.SafeToRetry
+  public void addWriteIdsToMinHistory(long txnid, Map minOpenWriteIds) throws MetaException {
+    if (!useMinHistoryWriteId) {
+      return;
+    }
+    // Need to register minimum open writeId for current transactions into MIN_HISTORY_WRITE_ID table.
+    try {
+      Connection dbConn = null;
+      try {
+        dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED);
+        try (PreparedStatement pstmt = dbConn.prepareStatement(MIN_HISTORY_WRITE_ID_INSERT_QUERY)) {
+          int writeId = 0;
+
+          for (Map.Entry validWriteId : minOpenWriteIds.entrySet()) {
+            String[] names = TxnUtils.getDbTableName(validWriteId.getKey());
+
+            pstmt.setLong(1, txnid);
+            pstmt.setString(2, names[0]);
+            pstmt.setString(3, names[1]);
+            pstmt.setLong(4, validWriteId.getValue());
+
+            pstmt.addBatch();
+            writeId++;
+            if (writeId % maxBatchSize == 0) {
+              LOG.debug("Executing a batch of <" + TXN_TO_WRITE_ID_INSERT_QUERY + "> queries. " +
+                "Batch size: " + maxBatchSize);
+              pstmt.executeBatch();
+            }
+          }
+          if (writeId % maxBatchSize != 0) {
+            LOG.debug("Executing a batch of <" + TXN_TO_WRITE_ID_INSERT_QUERY + "> queries. " +
+              "Batch size: " + writeId % maxBatchSize);
+            pstmt.executeBatch();
+          }
+        }

Review Comment:
   Maybe a utility method that accepts a lambda doing the PreparedStatement 
set calls? I see this code pattern again and again in this class. Something like void 
executeInBatch(PreparedStatement statement, Iterable list, Consumer 
batchCreator)
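
A minimal sketch of what such a helper could look like, under stated assumptions: the class name `BatchUtil`, the `ParamSetter` callback and the explicit `maxBatchSize` parameter are hypothetical and not part of the PR. A custom functional interface is used instead of `java.util.function.Consumer` because the `PreparedStatement` setters throw the checked `SQLException`.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class BatchUtil {
  /** Hypothetical callback that binds one item to the statement; may throw SQLException. */
  @FunctionalInterface
  interface ParamSetter<T> {
    void set(PreparedStatement stmt, T item) throws SQLException;
  }

  /**
   * Adds one batch entry per item and calls executeBatch() every maxBatchSize
   * entries, plus once more for a trailing partial batch.
   * Returns the number of executeBatch() calls that were made.
   */
  static <T> int executeInBatch(PreparedStatement stmt, Iterable<T> items,
      int maxBatchSize, ParamSetter<T> setter) throws SQLException {
    if (maxBatchSize <= 0) {
      throw new IllegalArgumentException("maxBatchSize must be positive");
    }
    int pending = 0;
    int flushes = 0;
    for (T item : items) {
      setter.set(stmt, item);
      stmt.addBatch();
      pending++;
      if (pending == maxBatchSize) {
        stmt.executeBatch();
        flushes++;
        pending = 0;
      }
    }
    if (pending > 0) {
      // Flush the remainder that did not fill a whole batch.
      stmt.executeBatch();
      flushes++;
    }
    return flushes;
  }
}
```

With such a helper, the loop in the hunk above would shrink to a single call that passes `minOpenWriteIds.entrySet()` and a lambda containing the four `pstmt.set...` calls. Logging is left out of the sketch.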



##
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java:
##
@@ -140,41 +140,36 @@ public void run() {
 HiveConf.ConfVars.HIVE_COMPACTOR_CLEANER_DURATION_UPDATE_INTERVAL, TimeUnit.MILLISECONDS),
 new CleanerCycleUpdater(MetricsConstants.COMPACTION_CLEANER_CYCLE_DURATION, startedAt));
   }
-
   long minOpenTxnId = txnHandler.findMinOpenTxnIdForCleaner();
-
   checkInterrupt();
 
   List readyToClean = txnHandler.findReadyToClean(minOpenTxnId, retentionTime);
-
   checkInterrupt();
 
   if (!readyToClean.isEmpty()) {
-long minTxnIdSeenOpen = txnHandler.findMinTxnIdSeenOpen();
-final long cleanerWaterMark =
-minTxnIdSeenOpen < 0 ? minOpenTxnId : Math.min(minOpenTxnId, minTxnIdSeenOpen);
-
-LOG.info("Cleaning based on min open txn id: " + cleanerWaterMark);
 List> cleanerList = new ArrayList<>();
 // For checking which compaction can be cleaned we can use the minOpenTxnId
 // However findReadyToClean will return all records that were compacted with old version of HMS
 // where the CQ_NEXT_TXN_ID is not set. For these compactions we need to provide minTxnIdSeenOpen
 // to the clean method, to avoid cleaning up deltas needed for running queries
 // when min_history_level is finally dropped, than every HMS will commit compaction the new way
 // and minTxnIdSeenOpen can be removed and minOpenTxnId can be used instead.
-for (CompactionInfo compactionInfo : readyToClean) {
-
+for (CompactionInfo ci : readyToClean) {
   //Check for interruption before scheduling each 

[jira] [Work logged] (HIVE-26704) Cleaner shouldn't be blocked by global min open txnId

2023-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26704?focusedWorklogId=847634&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-847634
 ]

ASF GitHub Bot logged work on HIVE-26704:
-

Author: ASF GitHub Bot
Created on: 25/Feb/23 18:07
Start Date: 25/Feb/23 18:07
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #3576:
URL: https://github.com/apache/hive/pull/3576#issuecomment-1445172151

   Kudos, SonarCloud Quality Gate passed!
   
   Bugs: 0 (rating A)
   Vulnerabilities: 0 (rating A)
   Security Hotspots: 2 (rating E)
   Code Smells: 17 (rating A)
   Coverage: no information
   Duplication: no information




Issue Time Tracking
---

Worklog Id: (was: 847634)
Time Spent: 3h 40m  (was: 3.5h)

> Cleaner shouldn't be blocked by global min open txnId
> -
>
> Key: HIVE-26704
> URL: https://issues.apache.org/jira/browse/HIVE-26704
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> *Single transaction blocks cluster-wide Cleaner operations*
> Currently, a single long-running transaction can prevent the 
> Cleaner from cleaning up any tables. This causes file buildup in tables, which can 
> cause performance penalties when listing the directories (note that the 
> compaction is not blocked by this, so unnecessary data is not read, but the 
> files remain there which causes performance penalty). 
> We can reduce the protected files from the open transaction if we have 
> query-table correlation data stored in the backend DB, but this change will 
> need the current method of recording that detail to be revisited. 
> The naive and somewhat backward-compatible approach is to capture the 
> minOpenWriteIds per table. It involves a non-mutation operation (as in, there 
> is no need for the HMS DB to wait for another user’s operation to record it). 
> This does spew data writes into the HMS backend DB, but this is a blind 
> insert operation that can be 

[jira] [Work logged] (HIVE-26704) Cleaner shouldn't be blocked by global min open txnId

2023-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26704?focusedWorklogId=847630&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-847630
 ]

ASF GitHub Bot logged work on HIVE-26704:
-

Author: ASF GitHub Bot
Created on: 25/Feb/23 15:02
Start Date: 25/Feb/23 15:02
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3576:
URL: https://github.com/apache/hive/pull/3576#discussion_r1117935763


##
ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/CompactorTest.java:
##
@@ -133,6 +133,7 @@ protected final void setup(HiveConf conf) throws Exception {
     MetastoreConf.setTimeVar(conf, MetastoreConf.ConfVars.TXN_OPENTXN_TIMEOUT, 2, TimeUnit.SECONDS);
     MetastoreConf.setBoolVar(conf, MetastoreConf.ConfVars.COMPACTOR_INITIATOR_ON, true);
     MetastoreConf.setBoolVar(conf, MetastoreConf.ConfVars.COMPACTOR_CLEANER_ON, true);
+    MetastoreConf.setBoolVar(conf, MetastoreConf.ConfVars.TXN_USE_MIN_HISTORY_WRITE_ID, false);

Review Comment:
   added new test





Issue Time Tracking
---

Worklog Id: (was: 847630)
Time Spent: 3.5h  (was: 3h 20m)

> Cleaner shouldn't be blocked by global min open txnId
> -
>
> Key: HIVE-26704
> URL: https://issues.apache.org/jira/browse/HIVE-26704
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> *Single transaction blocks cluster-wide Cleaner operations*
> Currently, a single long-running transaction can prevent the 
> Cleaner from cleaning up any tables. This causes file buildup in tables, which can 
> cause performance penalties when listing the directories (note that the 
> compaction is not blocked by this, so unnecessary data is not read, but the 
> files remain there which causes performance penalty). 
> We can reduce the protected files from the open transaction if we have 
> query-table correlation data stored in the backend DB, but this change will 
> need the current method of recording that detail to be revisited. 
> The naive and somewhat backward-compatible approach is to capture the 
> minOpenWriteIds per table. It involves a non-mutation operation (as in, there 
> is no need for the HMS DB to wait for another user’s operation to record it). 
> This does spew data writes into the HMS backend DB, but this is a blind 
> insert operation that can be group-committed across many users. 





[jira] [Work logged] (HIVE-26704) Cleaner shouldn't be blocked by global min open txnId

2023-02-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26704?focusedWorklogId=847621&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-847621
 ]

ASF GitHub Bot logged work on HIVE-26704:
-

Author: ASF GitHub Bot
Created on: 25/Feb/23 13:22
Start Date: 25/Feb/23 13:22
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #3576:
URL: https://github.com/apache/hive/pull/3576#issuecomment-1445119688

   Kudos, SonarCloud Quality Gate passed!
   
   Bugs: 0 (rating A)
   Vulnerabilities: 0 (rating A)
   Security Hotspots: 2 (rating E)
   Code Smells: 16 (rating A)
   Coverage: no information
   Duplication: no information




Issue Time Tracking
---

Worklog Id: (was: 847621)
Time Spent: 3h 20m  (was: 3h 10m)

> Cleaner shouldn't be blocked by global min open txnId
> -
>
> Key: HIVE-26704
> URL: https://issues.apache.org/jira/browse/HIVE-26704
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> *Single transaction blocks cluster-wide Cleaner operations*
> Currently, a single long-running transaction can prevent the 
> Cleaner from cleaning up any tables. This causes file buildup in tables, which can 
> cause performance penalties when listing the directories (note that the 
> compaction is not blocked by this, so unnecessary data is not read, but the 
> files remain there which causes performance penalty). 
> We can reduce the protected files from the open transaction if we have 
> query-table correlation data stored in the backend DB, but this change will 
> need the current method of recording that detail to be revisited. 
> The naive and somewhat backward-compatible approach is to capture the 
> minOpenWriteIds per table. It involves a non-mutation operation (as in, there 
> is no need for the HMS DB to wait for another user’s operation to record it). 
> This does spew data writes into the HMS backend DB, but this is a blind 
> insert operation that can be 

[jira] [Work logged] (HIVE-26704) Cleaner shouldn't be blocked by global min open txnId

2023-02-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26704?focusedWorklogId=847586&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-847586
 ]

ASF GitHub Bot logged work on HIVE-26704:
-

Author: ASF GitHub Bot
Created on: 24/Feb/23 21:15
Start Date: 24/Feb/23 21:15
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #3576:
URL: https://github.com/apache/hive/pull/3576#issuecomment-1444505778

   Kudos, SonarCloud Quality Gate passed!
   
   Bugs: 0 (rating A)
   Vulnerabilities: 0 (rating A)
   Security Hotspots: 2 (rating E)
   Code Smells: 16 (rating A)
   Coverage: no information
   Duplication: no information




Issue Time Tracking
---

Worklog Id: (was: 847586)
Time Spent: 3h 10m  (was: 3h)

> Cleaner shouldn't be blocked by global min open txnId
> -
>
> Key: HIVE-26704
> URL: https://issues.apache.org/jira/browse/HIVE-26704
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> *Single transaction blocks cluster-wide Cleaner operations*
> Currently, a single long-running transaction can prevent the 
> Cleaner from cleaning up any tables. This causes file buildup in tables, which can 
> cause performance penalties when listing the directories (note that the 
> compaction is not blocked by this, so unnecessary data is not read, but the 
> files remain there which causes performance penalty). 
> We can reduce the protected files from the open transaction if we have 
> query-table correlation data stored in the backend DB, but this change will 
> need the current method of recording that detail to be revisited. 
> The naive and somewhat backward-compatible approach is to capture the 
> minOpenWriteIds per table. It involves a non-mutation operation (as in, there 
> is no need for the HMS DB to wait for another user’s operation to record it). 
> This does spew data writes into the HMS backend DB, but this is a blind 
> insert operation that can be 

[jira] [Work logged] (HIVE-26704) Cleaner shouldn't be blocked by global min open txnId

2023-02-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26704?focusedWorklogId=847551&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-847551
 ]

ASF GitHub Bot logged work on HIVE-26704:
-

Author: ASF GitHub Bot
Created on: 24/Feb/23 16:42
Start Date: 24/Feb/23 16:42
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #3576:
URL: https://github.com/apache/hive/pull/3576#issuecomment-1443987605

   Kudos, SonarCloud Quality Gate passed!
   
   Bugs: 0 (rating A)
   Vulnerabilities: 0 (rating A)
   Security Hotspots: 2 (rating E)
   Code Smells: 16 (rating A)
   Coverage: no information
   Duplication: no information




Issue Time Tracking
---

Worklog Id: (was: 847551)
Time Spent: 3h  (was: 2h 50m)

> Cleaner shouldn't be blocked by global min open txnId
> -
>
> Key: HIVE-26704
> URL: https://issues.apache.org/jira/browse/HIVE-26704
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> *Single transaction blocks cluster-wide Cleaner operations*
> Currently, a single long-running transaction can prevent the 
> Cleaner from cleaning up any tables. This causes file buildup in tables, which can 
> cause performance penalties when listing the directories (note that the 
> compaction is not blocked by this, so unnecessary data is not read, but the 
> files remain there which causes performance penalty). 
> We can reduce the protected files from the open transaction if we have 
> query-table correlation data stored in the backend DB, but this change will 
> need the current method of recording that detail to be revisited. 
> The naive and somewhat backward-compatible approach is to capture the 
> minOpenWriteIds per table. It involves a non-mutation operation (as in, there 
> is no need for the HMS DB to wait for another user’s operation to record it). 
> This does spew data writes into the HMS backend DB, but this is a blind 
> insert operation that can be 

[jira] [Work logged] (HIVE-26704) Cleaner shouldn't be blocked by global min open txnId

2023-02-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26704?focusedWorklogId=847531&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-847531
 ]

ASF GitHub Bot logged work on HIVE-26704:
-

Author: ASF GitHub Bot
Created on: 24/Feb/23 14:15
Start Date: 24/Feb/23 14:15
Worklog Time Spent: 10m 
  Work Description: deniskuzZ opened a new pull request, #3576:
URL: https://github.com/apache/hive/pull/3576

   
   
   ### What changes were proposed in this pull request?
   
   Removes a bottleneck in the Compaction process
   
   ### Why are the changes needed?
   
   Currently, a single long-running transaction can prevent the Cleaner from 
cleaning up any tables. This causes file buildup in tables, which 
can cause performance penalties when listing the directories.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   Unit tests




Issue Time Tracking
---

Worklog Id: (was: 847531)
Time Spent: 2h 50m  (was: 2h 40m)

> Cleaner shouldn't be blocked by global min open txnId
> -
>
> Key: HIVE-26704
> URL: https://issues.apache.org/jira/browse/HIVE-26704
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> *Single transaction blocks cluster-wide Cleaner operations*
> Currently, a single long-running transaction can prevent the 
> Cleaner from cleaning up any tables. This causes file buildup in tables, which can 
> cause performance penalties when listing the directories (note that the 
> compaction is not blocked by this, so unnecessary data is not read, but the 
> files remain there which causes performance penalty). 
> We can reduce the protected files from the open transaction if we have 
> query-table correlation data stored in the backend DB, but this change will 
> need the current method of recording that detail to be revisited. 
> The naive and somewhat backward-compatible approach is to capture the 
> minOpenWriteIds per table. It involves a non-mutation operation (as in, there 
> is no need for the HMS DB to wait for another user’s operation to record it). 
> This does spew data writes into the HMS backend DB, but this is a blind 
> insert operation that can be group-committed across many users. 





[jira] [Work logged] (HIVE-26704) Cleaner shouldn't be blocked by global min open txnId

2023-02-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26704?focusedWorklogId=846505&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-846505
 ]

ASF GitHub Bot logged work on HIVE-26704:
-

Author: ASF GitHub Bot
Created on: 21/Feb/23 00:22
Start Date: 21/Feb/23 00:22
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #3576: 
HIVE-26704: Cleaner shouldn't be blocked by global min open txnId
URL: https://github.com/apache/hive/pull/3576




Issue Time Tracking
---

Worklog Id: (was: 846505)
Time Spent: 2h 40m  (was: 2.5h)

> Cleaner shouldn't be blocked by global min open txnId
> -
>
> Key: HIVE-26704
> URL: https://issues.apache.org/jira/browse/HIVE-26704
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> *Single transaction blocks cluster-wide Cleaner operations*
> Currently, a single long-running transaction can prevent the 
> Cleaner from cleaning up any tables. This causes file buildup in tables, which can 
> cause performance penalties when listing the directories (note that the 
> compaction is not blocked by this, so unnecessary data is not read, but the 
> files remain there which causes performance penalty). 
> We can reduce the protected files from the open transaction if we have 
> query-table correlation data stored in the backend DB, but this change will 
> need the current method of recording that detail to be revisited. 
> The naive and somewhat backward-compatible approach is to capture the 
> minOpenWriteIds per table. It involves a non-mutation operation (as in, there 
> is no need for the HMS DB to wait for another user’s operation to record it). 
> This does spew data writes into the HMS backend DB, but this is a blind 
> insert operation that can be group-committed across many users. 
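
The per-table idea in the description above can be sketched as a toy model. This is only an illustration under stated assumptions, not Hive's implementation: the class `PerTableMinHistorySketch`, its methods, and the rule in `canClean` are hypothetical, chosen to show how a long-running transaction on one table need not hold back cleaning on another.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: class, method, and table names are hypothetical, not Hive's API.
class PerTableMinHistorySketch {

  // txnId -> (table -> min open write ID recorded when the txn started reading that table)
  private final Map<Long, Map<String, Long>> minHistory = new HashMap<>();

  // Blind insert: a transaction records its own entry without reading anyone else's.
  void record(long txnId, String table, long minOpenWriteId) {
    minHistory.computeIfAbsent(txnId, id -> new HashMap<>()).put(table, minOpenWriteId);
  }

  // Entries disappear when the transaction commits or aborts.
  void finish(long txnId) {
    minHistory.remove(txnId);
  }

  // Lowest write ID on the table that an open transaction may still depend on;
  // Long.MAX_VALUE means no open transaction has registered interest in the table.
  long minOpenWriteId(String table) {
    long min = Long.MAX_VALUE;
    for (Map<String, Long> perTable : minHistory.values()) {
      Long id = perTable.get(table);
      if (id != null) {
        min = Math.min(min, id);
      }
    }
    return min;
  }

  // Assumed rule: files made obsolete by a compaction up to highestCompactedWriteId
  // may be deleted only if no open reader of this table recorded a lower-or-equal
  // min open write ID.
  boolean canClean(String table, long highestCompactedWriteId) {
    return highestCompactedWriteId < minOpenWriteId(table);
  }

  public static void main(String[] args) {
    PerTableMinHistorySketch history = new PerTableMinHistorySketch();
    history.record(1L, "db.a", 5L); // long-running reader of db.a
    System.out.println("clean db.a up to 7: " + history.canClean("db.a", 7L));
    System.out.println("clean db.b up to 7: " + history.canClean("db.b", 7L));
    history.finish(1L);
    System.out.println("clean db.a up to 7 after finish: " + history.canClean("db.a", 7L));
  }
}
```

In this model the open reader protects only `db.a`; a global min-open-txn check would have blocked `db.b` as well.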



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work logged] (HIVE-26704) Cleaner shouldn't be blocked by global min open txnId

2023-02-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26704?focusedWorklogId=844495=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-844495
 ]

ASF GitHub Bot logged work on HIVE-26704:
-

Author: ASF GitHub Bot
Created on: 09/Feb/23 05:08
Start Date: 09/Feb/23 05:08
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3576:
URL: https://github.com/apache/hive/pull/3576#discussion_r1100986388


##
ql/src/test/org/apache/hadoop/hive/ql/txn/compactor/CompactorTest.java:
##
@@ -133,6 +133,7 @@ protected final void setup(HiveConf conf) throws Exception {
     MetastoreConf.setTimeVar(conf, MetastoreConf.ConfVars.TXN_OPENTXN_TIMEOUT, 2, TimeUnit.SECONDS);
     MetastoreConf.setBoolVar(conf, MetastoreConf.ConfVars.COMPACTOR_INITIATOR_ON, true);
     MetastoreConf.setBoolVar(conf, MetastoreConf.ConfVars.COMPACTOR_CLEANER_ON, true);
+    MetastoreConf.setBoolVar(conf, MetastoreConf.ConfVars.TXN_USE_MIN_HISTORY_WRITE_ID, false);

Review Comment:
   If we set it to false, is there any test that runs using the new logic?
   CompactorTest is extended by a lot of other test classes (TestCleaner, 
TestCleaner2, TestCleanerWithoutMinHistoryLevel etc).





Issue Time Tracking
---

Worklog Id: (was: 844495)
Time Spent: 2.5h  (was: 2h 20m)



[jira] [Work logged] (HIVE-26704) Cleaner shouldn't be blocked by global min open txnId

2023-02-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26704?focusedWorklogId=844491=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-844491
 ]

ASF GitHub Bot logged work on HIVE-26704:
-

Author: ASF GitHub Bot
Created on: 09/Feb/23 03:59
Start Date: 09/Feb/23 03:59
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3576:
URL: https://github.com/apache/hive/pull/3576#discussion_r1100958098


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -393,6 +397,32 @@ private boolean allowOperationInATransaction(QueryPlan queryPlan) {
     return false;
   }
 
+  @Override
+  public void addWriteIdsToMinHistory(QueryPlan plan, ValidTxnWriteIdList txnWriteIds) {
+    if (plan.getInputs().isEmpty()) {
+      return;
+    }
+    Map writeIds = plan.getInputs().stream()
+      .filter(input -> !input.isDummy() && AcidUtils.isTransactionalTable(input.getTable()))
+      .map(input -> input.getTable().getFullyQualifiedName())
+      .collect(Collectors.toSet()).stream()

Review Comment:
   Makes sense.





Issue Time Tracking
---

Worklog Id: (was: 844491)
Time Spent: 2h 20m  (was: 2h 10m)



[jira] [Work logged] (HIVE-26704) Cleaner shouldn't be blocked by global min open txnId

2023-02-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26704?focusedWorklogId=844490=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-844490
 ]

ASF GitHub Bot logged work on HIVE-26704:
-

Author: ASF GitHub Bot
Created on: 09/Feb/23 03:59
Start Date: 09/Feb/23 03:59
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3576:
URL: https://github.com/apache/hive/pull/3576#discussion_r1100958004


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java:
##
@@ -1577,24 +1590,20 @@ public long findMinTxnIdSeenOpen() throws MetaException {
       }
       minOpenTxn = rs.getLong(1);
       if (rs.wasNull()) {
-        minOpenTxn = -1L;
+        minOpenTxn = Long.MAX_VALUE;

Review Comment:
   Ok





Issue Time Tracking
---

Worklog Id: (was: 844490)
Time Spent: 2h 10m  (was: 2h)



[jira] [Work logged] (HIVE-26704) Cleaner shouldn't be blocked by global min open txnId

2023-02-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26704?focusedWorklogId=843775=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-843775
 ]

ASF GitHub Bot logged work on HIVE-26704:
-

Author: ASF GitHub Bot
Created on: 06/Feb/23 12:06
Start Date: 06/Feb/23 12:06
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #3576:
URL: https://github.com/apache/hive/pull/3576#issuecomment-1418976271

   Kudos, SonarCloud Quality Gate passed!
   
   0 Bugs (rating A)
   0 Vulnerabilities (rating A)
   2 Security Hotspots (rating E)
   16 Code Smells (rating A)
   
   No Coverage information
   No Duplication information
   




Issue Time Tracking
---

Worklog Id: (was: 843775)
Time Spent: 2h  (was: 1h 50m)


[jira] [Work logged] (HIVE-26704) Cleaner shouldn't be blocked by global min open txnId

2023-02-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26704?focusedWorklogId=843743=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-843743
 ]

ASF GitHub Bot logged work on HIVE-26704:
-

Author: ASF GitHub Bot
Created on: 06/Feb/23 08:31
Start Date: 06/Feb/23 08:31
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3576:
URL: https://github.com/apache/hive/pull/3576#discussion_r1097072672


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java:
##
@@ -76,6 +76,7 @@
 import java.lang.reflect.UndeclaredThrowableException;
 import java.nio.ByteBuffer;
 import java.security.PrivilegedExceptionAction;
+import java.sql.SQLException;

Review Comment:
   fixed





Issue Time Tracking
---

Worklog Id: (was: 843743)
Time Spent: 1h 50m  (was: 1h 40m)



[jira] [Work logged] (HIVE-26704) Cleaner shouldn't be blocked by global min open txnId

2023-02-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26704?focusedWorklogId=843741=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-843741
 ]

ASF GitHub Bot logged work on HIVE-26704:
-

Author: ASF GitHub Bot
Created on: 06/Feb/23 08:23
Start Date: 06/Feb/23 08:23
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3576:
URL: https://github.com/apache/hive/pull/3576#discussion_r1097065723


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java:
##
@@ -1577,24 +1590,20 @@ public long findMinTxnIdSeenOpen() throws MetaException {
       }
       minOpenTxn = rs.getLong(1);
       if (rs.wasNull()) {
-        minOpenTxn = -1L;
+        minOpenTxn = Long.MAX_VALUE;

Review Comment:
   Nope. Per the Javadoc, `wasNull` reports whether the last column read had a 
value of SQL NULL. You must first call one of the getter methods on a column to 
try to read its value, and then call `wasNull` to see if the value read was 
SQL NULL.
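
The ordering constraint described in this comment can be illustrated with a small sketch. It is not the Hive code: `singleValueResultSet` is a hypothetical in-memory stand-in for a one-row `ResultSet`, built with `java.lang.reflect.Proxy` so that no database is needed. That the fake's `wasNull()` returns false before any getter has been called is an assumption of this sketch; what a real driver returns at that point is not something to rely on.

```java
import java.lang.reflect.Proxy;
import java.sql.ResultSet;
import java.sql.SQLException;

class WasNullOrderSketch {

  // Hypothetical fake: getLong returns 0 for SQL NULL (as JDBC specifies), and
  // wasNull reflects only the most recent getter call.
  static ResultSet singleValueResultSet(Long value) {
    boolean[] lastReadWasNull = {false};
    return (ResultSet) Proxy.newProxyInstance(
        ResultSet.class.getClassLoader(),
        new Class<?>[] {ResultSet.class},
        (proxy, method, args) -> {
          switch (method.getName()) {
            case "getLong":
              lastReadWasNull[0] = (value == null);
              return value == null ? 0L : value;
            case "wasNull":
              return lastReadWasNull[0];
            default:
              throw new UnsupportedOperationException(method.getName());
          }
        });
  }

  // Pattern from the diff: read the column first, then ask whether it was SQL NULL.
  static long readThenCheck(ResultSet rs) throws SQLException {
    long minOpenTxn = rs.getLong(1);
    if (rs.wasNull()) {
      minOpenTxn = Long.MAX_VALUE;
    }
    return minOpenTxn;
  }

  // Ternary form suggested in the review: wasNull is consulted before any getter ran.
  static long checkThenRead(ResultSet rs) throws SQLException {
    return rs.wasNull() ? Long.MAX_VALUE : rs.getLong(1);
  }

  // Returns {readThenCheck(NULL), readThenCheck(42), checkThenRead(NULL)}.
  static long[] demo() {
    try {
      return new long[] {
        readThenCheck(singleValueResultSet(null)),
        readThenCheck(singleValueResultSet(42L)),
        checkThenRead(singleValueResultSet(null))
      };
    } catch (SQLException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    long[] results = demo();
    System.out.println("read-then-check on NULL: " + results[0]);
    System.out.println("read-then-check on 42:   " + results[1]);
    System.out.println("check-then-read on NULL: " + results[2]);
  }
}
```

With this fake, the check-then-read form silently turns a NULL minimum into 0 instead of `Long.MAX_VALUE`, which is why the two-step form in the diff is kept.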





Issue Time Tracking
---

Worklog Id: (was: 843741)
Time Spent: 1h 40m  (was: 1.5h)



[jira] [Work logged] (HIVE-26704) Cleaner shouldn't be blocked by global min open txnId

2023-02-06 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26704?focusedWorklogId=843739=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-843739
 ]

ASF GitHub Bot logged work on HIVE-26704:
-

Author: ASF GitHub Bot
Created on: 06/Feb/23 08:19
Start Date: 06/Feb/23 08:19
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on code in PR #3576:
URL: https://github.com/apache/hive/pull/3576#discussion_r1097062436


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -393,6 +397,32 @@ private boolean allowOperationInATransaction(QueryPlan queryPlan) {
     return false;
   }
 
+  @Override
+  public void addWriteIdsToMinHistory(QueryPlan plan, ValidTxnWriteIdList txnWriteIds) {
+    if (plan.getInputs().isEmpty()) {
+      return;
+    }
+    Map writeIds = plan.getInputs().stream()
+      .filter(input -> !input.isDummy() && AcidUtils.isTransactionalTable(input.getTable()))
+      .map(input -> input.getTable().getFullyQualifiedName())
+      .collect(Collectors.toSet()).stream()

Review Comment:
   so we don't call an expensive getMinOpenWriteId multiple times for the same 
table
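
A minimal sketch of the point made in this reply, assuming the same table can appear more than once among the inputs. `minOpenWriteId` here is a hypothetical stand-in that only counts how often it is called; it is not the real `getMinOpenWriteId`. Separately from the cost argument, the two-argument `Collectors.toMap` throws `IllegalStateException` on duplicate keys, so skipping the de-duplication step would also fail outright on repeated table names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.stream.Collectors;

class DedupBeforeToMapSketch {

  static final AtomicInteger lookups = new AtomicInteger();

  // Hypothetical stand-in for an expensive per-table lookup; the returned value is arbitrary.
  static long minOpenWriteId(String table) {
    lookups.incrementAndGet();
    return table.length();
  }

  // Shape used in the PR: collapse duplicates into a set, then build the map.
  static Map<String, Long> withSet(List<String> tables) {
    return tables.stream()
        .collect(Collectors.toSet()).stream()
        .collect(Collectors.toMap(Function.identity(), DedupBeforeToMapSketch::minOpenWriteId));
  }

  // Shape without the intermediate set: one lookup per element, duplicates included.
  static Map<String, Long> withoutSet(List<String> tables) {
    return tables.stream()
        .collect(Collectors.toMap(Function.identity(), DedupBeforeToMapSketch::minOpenWriteId));
  }

  // True if building the map without de-duplication fails on the given input.
  static boolean failsOnDuplicates(List<String> tables) {
    try {
      withoutSet(tables);
      return false;
    } catch (IllegalStateException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    List<String> tables = Arrays.asList("db.t1", "db.t1", "db.t2");
    Map<String, Long> writeIds = withSet(tables);
    System.out.println("entries: " + writeIds.size() + ", lookups: " + lookups.get());
    System.out.println("fails without set: " + failsOnDuplicates(tables));
  }
}
```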





Issue Time Tracking
---

Worklog Id: (was: 843739)
Time Spent: 1.5h  (was: 1h 20m)



[jira] [Work logged] (HIVE-26704) Cleaner shouldn't be blocked by global min open txnId

2023-01-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26704?focusedWorklogId=842614=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-842614
 ]

ASF GitHub Bot logged work on HIVE-26704:
-

Author: ASF GitHub Bot
Created on: 31/Jan/23 14:43
Start Date: 31/Jan/23 14:43
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3576:
URL: https://github.com/apache/hive/pull/3576#discussion_r1092032619


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java:
##
@@ -76,6 +76,7 @@
 import java.lang.reflect.UndeclaredThrowableException;
 import java.nio.ByteBuffer;
 import java.security.PrivilegedExceptionAction;
+import java.sql.SQLException;

Review Comment:
   nit: Unused import





Issue Time Tracking
---

Worklog Id: (was: 842614)
Time Spent: 1h 20m  (was: 1h 10m)



[jira] [Work logged] (HIVE-26704) Cleaner shouldn't be blocked by global min open txnId

2023-01-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26704?focusedWorklogId=842608=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-842608
 ]

ASF GitHub Bot logged work on HIVE-26704:
-

Author: ASF GitHub Bot
Created on: 31/Jan/23 14:32
Start Date: 31/Jan/23 14:32
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3576:
URL: https://github.com/apache/hive/pull/3576#discussion_r1092018459


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java:
##
@@ -1577,24 +1590,20 @@ public long findMinTxnIdSeenOpen() throws MetaException {
       }
       minOpenTxn = rs.getLong(1);
       if (rs.wasNull()) {
-        minOpenTxn = -1L;
+        minOpenTxn = Long.MAX_VALUE;

Review Comment:
   Can we use a ternary operator here?
   `minOpenTxn = rs.wasNull() ? Long.MAX_VALUE : rs.getLong(1)`





Issue Time Tracking
---

Worklog Id: (was: 842608)
Time Spent: 1h 10m  (was: 1h)



[jira] [Work logged] (HIVE-26704) Cleaner shouldn't be blocked by global min open txnId

2023-01-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26704?focusedWorklogId=842594=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-842594
 ]

ASF GitHub Bot logged work on HIVE-26704:
-

Author: ASF GitHub Bot
Created on: 31/Jan/23 14:11
Start Date: 31/Jan/23 14:11
Worklog Time Spent: 10m 
  Work Description: SourabhBadhya commented on code in PR #3576:
URL: https://github.com/apache/hive/pull/3576#discussion_r1091985114


##
ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java:
##
@@ -393,6 +397,32 @@ private boolean allowOperationInATransaction(QueryPlan queryPlan) {
     return false;
   }
 
+  @Override
+  public void addWriteIdsToMinHistory(QueryPlan plan, ValidTxnWriteIdList txnWriteIds) {
+    if (plan.getInputs().isEmpty()) {
+      return;
+    }
+    Map writeIds = plan.getInputs().stream()
+      .filter(input -> !input.isDummy() && AcidUtils.isTransactionalTable(input.getTable()))
+      .map(input -> input.getTable().getFullyQualifiedName())
+      .collect(Collectors.toSet()).stream()

Review Comment:
   Why do we need to collect into a set when it's going to be converted to a map 
eventually? 
   Can we write it like this: 
   `.map(input -> input.getTable().getFullyQualifiedName()).collect(Collectors.toMap(Function.identity(), table -> getMinOpenWriteId(txnWriteIds, table)));`





Issue Time Tracking
---

Worklog Id: (was: 842594)
Time Spent: 1h  (was: 50m)



[jira] [Work logged] (HIVE-26704) Cleaner shouldn't be blocked by global min open txnId

2023-01-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26704?focusedWorklogId=842512=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-842512
 ]

ASF GitHub Bot logged work on HIVE-26704:
-

Author: ASF GitHub Bot
Created on: 31/Jan/23 10:55
Start Date: 31/Jan/23 10:55
Worklog Time Spent: 10m 
  Work Description: deniskuzZ opened a new pull request, #3576:
URL: https://github.com/apache/hive/pull/3576

   
   
   ### What changes were proposed in this pull request?
   
   Removes a bottleneck in the Compaction process
   
   ### Why are the changes needed?
   
   Currently, a single long-running transaction can prevent 
the Cleaner from cleaning up any tables. This causes file buildup in tables, which 
can cause performance penalties when listing the directories.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   Unit tests




Issue Time Tracking
---

Worklog Id: (was: 842512)
Time Spent: 50m  (was: 40m)



[jira] [Work logged] (HIVE-26704) Cleaner shouldn't be blocked by global min open txnId

2023-01-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26704?focusedWorklogId=838490=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-838490
 ]

ASF GitHub Bot logged work on HIVE-26704:
-

Author: ASF GitHub Bot
Created on: 11/Jan/23 00:20
Start Date: 11/Jan/23 00:20
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #3576: 
HIVE-26704: Cleaner shouldn't be blocked by global min open txnId
URL: https://github.com/apache/hive/pull/3576




Issue Time Tracking
---

Worklog Id: (was: 838490)
Time Spent: 40m  (was: 0.5h)

> Cleaner shouldn't be blocked by global min open txnId
> -
>
> Key: HIVE-26704
> URL: https://issues.apache.org/jira/browse/HIVE-26704
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h





[jira] [Work logged] (HIVE-26704) Cleaner shouldn't be blocked by global min open txnId

2023-01-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26704?focusedWorklogId=836765=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-836765
 ]

ASF GitHub Bot logged work on HIVE-26704:
-

Author: ASF GitHub Bot
Created on: 04/Jan/23 00:19
Start Date: 04/Jan/23 00:19
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] commented on PR #3576:
URL: https://github.com/apache/hive/pull/3576#issuecomment-1370355426

   This pull request has been automatically marked as stale because it has not 
had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the d...@hive.apache.org list if the patch is in 
need of reviews.




Issue Time Tracking
---

Worklog Id: (was: 836765)
Time Spent: 0.5h  (was: 20m)

> Cleaner shouldn't be blocked by global min open txnId
> -
>
> Key: HIVE-26704
> URL: https://issues.apache.org/jira/browse/HIVE-26704
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h





[jira] [Work logged] (HIVE-26704) Cleaner shouldn't be blocked by global min open txnId

2022-11-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26704?focusedWorklogId=823441=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-823441
 ]

ASF GitHub Bot logged work on HIVE-26704:
-

Author: ASF GitHub Bot
Created on: 04/Nov/22 15:37
Start Date: 04/Nov/22 15:37
Worklog Time Spent: 10m 
  Work Description: rkirtir commented on code in PR #3576:
URL: https://github.com/apache/hive/pull/3576#discussion_r1014184230


##
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java:
##
@@ -384,29 +384,47 @@ public List findReadyToClean(long minOpenTxnWaterMark, long rete
  * By filtering on minOpenTxnWaterMark, we will only cleanup after every transaction is committed, that could see
  * the uncompacted deltas. This way the cleaner can clean up everything that was made obsolete by this compaction.
  */
-String whereClause = " WHERE \"CQ_STATE\" = '" + READY_FOR_CLEANING + "'";
-if (minOpenTxnWaterMark > 0) {
+String whereClause = " WHERE \"CQ_STATE\" = " + quoteChar(READY_FOR_CLEANING) + 

Review Comment:
   Is it possible to move this larger query to the TxnQueries class and build 
it with .append()?
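
A minimal sketch of what the suggestion might look like, assuming the clause is assembled with a StringBuilder in a separate helper. The class name CleanerQuerySketch, the method buildWhereClause, the value of READY_FOR_CLEANING, the body of quoteChar, and the CQ_NEXT_TXN_ID filter are all assumptions made for illustration. The diff does not show them, and this is not the code from the PR.

```java
// Hypothetical sketch: identifiers echo the diff above, but this is not Hive's actual code.
final class CleanerQuerySketch {

  // Assumed state marker for queue entries that are ready for cleaning.
  static final char READY_FOR_CLEANING = 'r';

  // Assumed column used for the watermark filter; the real condition is not shown in the diff.
  static final String WATERMARK_COLUMN = "CQ_NEXT_TXN_ID";

  // Wraps a single character in SQL single quotes, e.g. r becomes 'r'.
  static String quoteChar(char c) {
    return "'" + c + "'";
  }

  // Builds the WHERE clause piece by piece with append() instead of string concatenation.
  static String buildWhereClause(long minOpenTxnWaterMark) {
    StringBuilder where = new StringBuilder(" WHERE \"CQ_STATE\" = ")
        .append(quoteChar(READY_FOR_CLEANING));
    // Only filter on the watermark when one was supplied.
    if (minOpenTxnWaterMark > 0) {
      where.append(" AND \"").append(WATERMARK_COLUMN).append("\" <= ")
          .append(minOpenTxnWaterMark);
    }
    return where.toString();
  }

  public static void main(String[] args) {
    System.out.println(buildWhereClause(0));
    System.out.println(buildWhereClause(42));
  }
}
```

Keeping the query text in one helper would let the optional watermark filter be appended conditionally without repeating the base clause.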





Issue Time Tracking
---

Worklog Id: (was: 823441)
Time Spent: 20m  (was: 10m)

> Cleaner shouldn't be blocked by global min open txnId
> -
>
> Key: HIVE-26704
> URL: https://issues.apache.org/jira/browse/HIVE-26704
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h





[jira] [Work logged] (HIVE-26704) Cleaner shouldn't be blocked by global min open txnId

2022-11-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26704?focusedWorklogId=823192=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-823192
 ]

ASF GitHub Bot logged work on HIVE-26704:
-

Author: ASF GitHub Bot
Created on: 03/Nov/22 20:39
Start Date: 03/Nov/22 20:39
Worklog Time Spent: 10m 
  Work Description: sonarcloud[bot] commented on PR #3576:
URL: https://github.com/apache/hive/pull/3576#issuecomment-1302638546

   Kudos, SonarCloud Quality Gate passed!

   Bugs: 0 (rating A)
   Vulnerabilities: 0 (rating A)
   Security Hotspots: 2 (rating E)
   Code Smells: 18 (rating A)

   No Coverage information
   No Duplication information




Issue Time Tracking
---

Worklog Id: (was: 823192)
Remaining Estimate: 0h
Time Spent: 10m

> Cleaner shouldn't be blocked by global min open txnId
> -
>
> Key: HIVE-26704
> URL: https://issues.apache.org/jira/browse/HIVE-26704
> Project: Hive
>  Issue Type: Task
>Reporter: Denys Kuzmenko
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>



