[ https://issues.apache.org/jira/browse/HIVE-12439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196731#comment-15196731 ]
Eugene Koifman commented on HIVE-12439:
---------------------------------------

1. CompactionTxnHandler.cleanEmptyAborted() - why rewrite
    String s = "select txn_id from TXNS where " +
        "txn_id not in (select tc_txnid from TXN_COMPONENTS) and " +
        "txn_state = '" + TXN_ABORTED + "'";
The IN clause here doesn't list values, so it is not (and in fact cannot be) subject to the 1000-element or any other limit.
Also, part of your rewrite lost
    LOG.info("Removed " + rc + " empty Aborted transactions: " + txnIdBatch + " from TXNS");
This is a critical debug/support log statement: it logs the actual txn IDs that were cleared.

2. TxnHandler.openTxns()
    if (i > first) {
      valuesClause.append(", ");
    }
This will generate a query with "values,(..." if the previous "if" with METASTORE_DIRECT_SQL_MAX_ELEMENTS_VALUES_CLAUSE executes.
Also, a nit: this class has quoteString() and quoteChar() for generating SQL with string values.

3. TxnHandler.timeOutLocks() - why does this need a suffix at all? The extra parentheses seem redundant.

4. TxnHandler.abortTxns() - there seems to be a redundant set of parentheses wrapping the IN clause. Why is this necessary?

5. TestTxnUtils - I think this test is very limited. It would be better to add, in addition, some tests that actually cause the new queries to execute in a DB (Derby in practice), in particular once the limits set by the 2 new properties are exceeded. I think that would provide better test coverage.

> CompactionTxnHandler.markCleaned() and TxnHandler.openTxns() misc improvements
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-12439
>                 URL: https://issues.apache.org/jira/browse/HIVE-12439
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore, Transactions
>    Affects Versions: 1.0.0
>            Reporter: Eugene Koifman
>            Assignee: Wei Zheng
>         Attachments: HIVE-12439.1.patch
>
>
> # add a safeguard to make sure IN clause is not too large; break up by txn id
> to delete from TXN_COMPONENTS where tc_txnid in ...
> # TxnHandler.openTxns() - use 1 insert with many rows in values() clause,
> rather than 1 DB roundtrip per row



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
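
To illustrate the batching discussed in this thread (item 2 of the comment and both items of the issue description), here is a minimal Java sketch. It is not the code from HIVE-12439.1.patch. The class name BatchedSqlSketch, its method names, the limit parameters and the abbreviated TXNS column list are all assumptions made for the example. The point it shows is that the ", " separator should be keyed to the position inside the current batch, so a statement started after a limit is hit never begins with "values, (...".

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; hypothetical names, not Hive's actual TxnHandler code. */
final class BatchedSqlSketch {

  /**
   * Builds one multi-row INSERT per batch of at most maxRows rows.
   * The separator depends on the row's position inside the current batch,
   * so a statement started after a flush never begins with a comma.
   */
  static List<String> buildInserts(String prefix, List<String> rows, int maxRows) {
    if (maxRows < 1) {
      throw new IllegalArgumentException("maxRows must be >= 1");
    }
    List<String> statements = new ArrayList<>();
    StringBuilder values = new StringBuilder();
    int rowsInBatch = 0;
    for (String row : rows) {
      if (rowsInBatch > 0) {
        values.append(", ");
      }
      values.append(row);
      rowsInBatch++;
      if (rowsInBatch == maxRows) {
        // Limit reached: emit the statement and start a fresh batch.
        statements.add(prefix + " values " + values);
        values.setLength(0);
        rowsInBatch = 0;
      }
    }
    if (rowsInBatch > 0) {
      statements.add(prefix + " values " + values);
    }
    return statements;
  }

  /**
   * Splits a list of ids into several "prefix in (...)" statements,
   * each listing at most maxElements ids.
   */
  static List<String> buildInQueries(String prefix, List<Long> ids, int maxElements) {
    if (maxElements < 1) {
      throw new IllegalArgumentException("maxElements must be >= 1");
    }
    List<String> statements = new ArrayList<>();
    for (int start = 0; start < ids.size(); start += maxElements) {
      int end = Math.min(start + maxElements, ids.size());
      StringBuilder in = new StringBuilder();
      for (int i = start; i < end; i++) {
        if (i > start) {
          in.append(", ");
        }
        in.append(ids.get(i));
      }
      statements.add(prefix + " in (" + in + ")");
    }
    return statements;
  }

  public static void main(String[] args) {
    // Abbreviated, illustrative column list; the real TXNS table has more columns.
    List<String> rows = List.of("(1, 'o')", "(2, 'o')", "(3, 'o')");
    buildInserts("insert into TXNS (txn_id, txn_state)", rows, 2)
        .forEach(System.out::println);

    List<Long> ids = List.of(1L, 2L, 3L, 4L, 5L);
    buildInQueries("delete from TXN_COMPONENTS where tc_txnid", ids, 2)
        .forEach(System.out::println);
  }
}
```

Each returned statement would still have to be executed separately. Tests of the kind suggested in item 5 could run such statements against Derby with the limits set low enough to force more than one batch.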