[ https://issues.apache.org/jira/browse/HIVE-26704?focusedWorklogId=847831&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-847831 ]
ASF GitHub Bot logged work on HIVE-26704:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/Feb/23 14:00
            Start Date: 27/Feb/23 14:00
    Worklog Time Spent: 10m
      Work Description: deniskuzZ commented on code in PR #3576:
URL: https://github.com/apache/hive/pull/3576#discussion_r1118775359


##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java:
##########

@@ -5894,6 +5902,63 @@ private void addTxnToMinHistoryLevel(Connection dbConn, List<Long> txnIds, long
     }
   }
 
+  @Override
+  @RetrySemantics.SafeToRetry
+  public void addWriteIdsToMinHistory(long txnid, Map<String, Long> minOpenWriteIds) throws MetaException {
+    if (!useMinHistoryWriteId) {
+      return;
+    }
+    // Need to register the minimum open writeId for current transactions in the MIN_HISTORY_WRITE_ID table.
+    try {
+      Connection dbConn = null;
+      try {
+        dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED);
+        try (PreparedStatement pstmt = dbConn.prepareStatement(MIN_HISTORY_WRITE_ID_INSERT_QUERY)) {
+          int writeId = 0;
+
+          for (Map.Entry<String, Long> validWriteId : minOpenWriteIds.entrySet()) {
+            String[] names = TxnUtils.getDbTableName(validWriteId.getKey());
+
+            pstmt.setLong(1, txnid);
+            pstmt.setString(2, names[0]);
+            pstmt.setString(3, names[1]);
+            pstmt.setLong(4, validWriteId.getValue());
+
+            pstmt.addBatch();
+            writeId++;
+            if (writeId % maxBatchSize == 0) {
+              LOG.debug("Executing a batch of <" + MIN_HISTORY_WRITE_ID_INSERT_QUERY + "> queries. " +
+                  "Batch size: " + maxBatchSize);
+              pstmt.executeBatch();
+            }
+          }
+          if (writeId % maxBatchSize != 0) {
+            LOG.debug("Executing a batch of <" + MIN_HISTORY_WRITE_ID_INSERT_QUERY + "> queries. " +
+                "Batch size: " + writeId % maxBatchSize);
+            pstmt.executeBatch();
+          }
+        }

Review Comment:
   that would be a massive refactor of TxnHandler


Issue Time Tracking
-------------------

    Worklog Id:     (was: 847831)
    Time Spent: 4.5h  (was: 4h 20m)

> Cleaner shouldn't be blocked by global min open txnId
> -----------------------------------------------------
>
>                 Key: HIVE-26704
>                 URL: https://issues.apache.org/jira/browse/HIVE-26704
>             Project: Hive
>          Issue Type: Task
>            Reporter: Denys Kuzmenko
>            Assignee: Denys Kuzmenko
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> *Single transaction blocks cluster-wide Cleaner operations*
> Currently, a single long-running transaction can prevent the Cleaner from cleaning up any table. This causes a file buildup that slows directory listings (compaction itself is not blocked, so no unnecessary data is read, but the leftover files still carry a performance penalty).
> We could shrink the set of files protected by an open transaction if query-to-table correlation data were stored in the backend DB, but that change would require revisiting how this detail is currently recorded.
> The naive and somewhat backward-compatible approach is to capture the minOpenWriteIds per table. It involves a non-mutation operation (as in, there is no need for the HMS DB to wait for another user's operation to record it). This does add extra writes to the HMS backend DB, but it is a blind insert that can be group-committed across many users.



-- 
This message was sent by Atlassian Jira
(v8.20.10#820010)
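For context on the reviewed snippet: it follows the common JDBC batching pattern of flushing a full batch after every maxBatchSize additions inside the loop, then flushing the trailing partial batch once after the loop. A minimal sketch of just that counting logic, in plain Java with no JDBC or Hive dependencies (`flushSizes` is a hypothetical helper written for illustration, not Hive code):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchFlushDemo {
    // Returns the size of each flushed batch for n queued items, mirroring the
    // "flush every maxBatchSize, then flush the remainder" pattern from the diff.
    static List<Integer> flushSizes(int n, int maxBatchSize) {
        List<Integer> flushes = new ArrayList<>();
        int pending = 0;
        for (int i = 1; i <= n; i++) {
            pending++;                      // corresponds to pstmt.addBatch()
            if (i % maxBatchSize == 0) {    // full batch: execute and reset
                flushes.add(pending);
                pending = 0;
            }
        }
        if (pending > 0) {                  // trailing partial batch
            flushes.add(pending);
        }
        return flushes;
    }

    public static void main(String[] args) {
        System.out.println(flushSizes(7, 3));   // [3, 3, 1]
        System.out.println(flushSizes(6, 3));   // [3, 3] -- no empty trailing flush
        System.out.println(flushSizes(0, 3));   // []
    }
}
```

Note the guard on the final flush: when the item count is an exact multiple of maxBatchSize, the last full batch was already executed inside the loop, so no empty trailing executeBatch() is issued; this matches the `writeId % maxBatchSize != 0` check in the patch.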