veghlaci05 commented on code in PR #4313:
URL: https://github.com/apache/hive/pull/4313#discussion_r1204241304
##########
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java:
##########
@@ -472,23 +473,38 @@ public List<CompactionInfo> findReadyToClean(long minOpenTxnWaterMark, long rete
@Override
@RetrySemantics.ReadOnly
- public List<CompactionInfo> findReadyToCleanAborts(long abortedTimeThreshold, int abortedThreshold) throws MetaException {
+ public List<CompactionInfo> findReadyToCleanAborts(long abortedTimeThreshold, int abortedThreshold, long retentionTime) throws MetaException {
try {
List<CompactionInfo> readyToCleanAborts = new ArrayList<>();
try (Connection dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED, connPoolCompaction);
Statement stmt = dbConn.createStatement()) {
boolean checkAbortedTimeThreshold = abortedTimeThreshold >= 0;
- String sCheckAborted = "SELECT \"tc\".\"TC_DATABASE\", \"tc\".\"TC_TABLE\", \"tc\".\"TC_PARTITION\", " +
- " \"tc\".\"MIN_TXN_START_TIME\", \"tc\".\"ABORTED_TXN_COUNT\", \"minOpenWriteTxnId\".\"MIN_OPEN_WRITE_TXNID\" FROM " +
- " ( SELECT \"TC_DATABASE\", \"TC_TABLE\", \"TC_PARTITION\", " +
- " MIN(\"TXN_STARTED\") AS \"MIN_TXN_START_TIME\", COUNT(*) AS \"ABORTED_TXN_COUNT\" FROM \"TXNS\", \"TXN_COMPONENTS\" " +
- " WHERE \"TXN_ID\" = \"TC_TXNID\" AND \"TXN_STATE\" = " + TxnStatus.ABORTED +
- " GROUP BY \"TC_DATABASE\", \"TC_TABLE\", \"TC_PARTITION\" " +
- (checkAbortedTimeThreshold ? "" : " HAVING COUNT(*) > " + abortedThreshold) + " ) \"tc\" " +
- " LEFT JOIN ( SELECT MIN(\"TC_TXNID\") AS \"MIN_OPEN_WRITE_TXNID\", \"TC_DATABASE\", \"TC_TABLE\", \"TC_PARTITION\" FROM \"TXNS\", \"TXN_COMPONENTS\" " +
- " WHERE \"TXN_ID\" = \"TC_TXNID\" AND \"TXN_STATE\"=" + TxnStatus.OPEN + " GROUP BY \"TC_DATABASE\", \"TC_TABLE\", \"TC_PARTITION\" ) \"minOpenWriteTxnId\" " +
- " ON \"tc\".\"TC_DATABASE\" = \"minOpenWriteTxnId\".\"TC_DATABASE\" AND \"tc\".\"TC_TABLE\" = \"minOpenWriteTxnId\".\"TC_TABLE\"" +
- " AND (\"tc\".\"TC_PARTITION\" = \"minOpenWriteTxnId\".\"TC_PARTITION\" OR (\"tc\".\"TC_PARTITION\" IS NULL AND \"minOpenWriteTxnId\".\"TC_PARTITION\" IS NULL))";
+ String firstInnerQuery = "SELECT \"tc\".\"TC_DATABASE\" AS \"DB\", \"tc\".\"TC_TABLE\" AS \"TBL\", \"tc\".\"TC_PARTITION\" AS \"PART\", " +
Review Comment:
I agree that the performance penalty should not be high. @SourabhBadhya has
already described the benefits.
I think that being able to clearly and easily separate aborted-TXN-related
issues from compaction-related ones fully compensates for the extra table and
the more complex query.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]