jcamachor commented on a change in pull request #2088:
URL: https://github.com/apache/hive/pull/2088#discussion_r599016363



##########
File path: itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
##########
@@ -2328,6 +2328,20 @@ public void testCompactionDataLoadedWithInsertOverwrite() throws Exception {
     return valuesReadFromHiveDriver;
   }
 
+  /**
+   * Execute Hive CLI statement and ignore any exception thrown.
+   *
+   * @param cmd arbitrary statement to execute
+   */
+  static void executeStatementOnDriverSilently(String cmd, IDriver driver) {
+    try {
+      executeStatementOnDriver(cmd, driver);
+    }
+    catch (Exception ignore) {
+

Review comment:
       Should we log a message here, e.g. 'Exception thrown and ignored' or something in that style? It could help with debugging issues in the future.
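For illustration, here is a self-contained sketch of the log-and-ignore pattern the comment suggests. The class/method names and the use of `java.util.logging` are assumptions for the sake of a runnable example; the actual test code would reuse `executeStatementOnDriver` and the test class's existing logger.

```java
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

class SilentDriver {
    private static final Logger LOG = Logger.getLogger(SilentDriver.class.getName());

    // Run the action; log and swallow any exception so callers can continue.
    static void executeSilently(String description, Callable<?> action) {
        try {
            action.call();
        } catch (Exception e) {
            // Logging instead of silently dropping makes later debugging possible.
            LOG.log(Level.WARNING,
                    "Exception thrown and ignored while executing: " + description, e);
        }
    }

    public static void main(String[] args) {
        // The statement text and the simulated failure are illustrative only.
        executeSilently("drop table if exists tmp_t", () -> {
            throw new RuntimeException("simulated driver failure");
        });
        System.out.println("continued after ignored failure");
    }
}
```

The point is that the warning leaves a trace in the logs while the test flow is unaffected by the failure.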

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
##########
@@ -1917,7 +1917,8 @@ public HiveRelOptMaterialization getMaterializedViewForRebuild(String dbName, St
            // We will not try partial rewriting if there were update/delete operations on source tables

Review comment:
       This comment is out of date; can you update it?

##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##########
@@ -2364,91 +2364,129 @@ public Materialization getMaterializationInvalidationInfo(
 
    // We are composing a query that returns a single row if an update happened after
     // the materialization was created. Otherwise, query returns 0 rows.
+
+    // Parse validReaderWriteIdList from creation metadata
+    final ValidTxnWriteIdList validReaderWriteIdList =
+            new ValidTxnWriteIdList(creationMetadata.getValidTxnList());
+
+    // Parse validTxnList
+    final ValidReadTxnList currentValidTxnList = new ValidReadTxnList(validTxnListStr);
+    // Get the valid write id list for the tables in current state
+    final List<TableValidWriteIds> currentTblValidWriteIdsList = new ArrayList<>();
+    Connection dbConn = null;
+      for (String fullTableName : creationMetadata.getTablesUsed()) {
+        try {
+          dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED);
+          currentTblValidWriteIdsList.add(getValidWriteIdsForTable(dbConn, fullTableName, currentValidTxnList));
+        } catch (SQLException ex) {
+          String errorMsg = "Unable to query Valid writeIds of table " + fullTableName;
+          LOG.warn(errorMsg, ex);
+          throw new MetaException(errorMsg + " " + StringUtils.stringifyException(ex));
+        } finally {
+          closeDbConn(dbConn);
+        }
+      }
+    final ValidTxnWriteIdList currentValidReaderWriteIdList = TxnCommonUtils.createValidTxnWriteIdList(
+            currentValidTxnList.getHighWatermark(), currentTblValidWriteIdsList);
+
+    List<String> params = new ArrayList<>();
+    StringBuilder queryUpdateDelete = new StringBuilder();
+    StringBuilder queryCompletedCompactions = new StringBuilder();
+    StringBuilder queryCompactionQueue = new StringBuilder();
+    // compose a query that select transactions containing an update...
+    queryUpdateDelete.append("SELECT \"CTC_UPDATE_DELETE\" FROM \"COMPLETED_TXN_COMPONENTS\" WHERE \"CTC_UPDATE_DELETE\" ='Y' AND (");
+    queryCompletedCompactions.append("SELECT 1 FROM \"COMPLETED_COMPACTIONS\" WHERE (");
+    queryCompactionQueue.append("SELECT 1 FROM \"COMPACTION_QUEUE\" WHERE (");
+    int i = 0;
+    for (String fullyQualifiedName : creationMetadata.getTablesUsed()) {
+      ValidWriteIdList tblValidWriteIdList =
+              validReaderWriteIdList.getTableValidWriteIdList(fullyQualifiedName);
+      if (tblValidWriteIdList == null) {
+        LOG.warn("ValidWriteIdList for table {} not present in creation metadata, this should not happen", fullyQualifiedName);
+        return null;
+      }
+
+      // First, we check whether the low watermark has moved for any of the tables.
+      // If it has, we return true, since it is not incrementally refreshable, e.g.,
+      // one of the commits that are not available may be an update/delete.
+      ValidWriteIdList currentTblValidWriteIdList =
+              currentValidReaderWriteIdList.getTableValidWriteIdList(fullyQualifiedName);
+      if (currentTblValidWriteIdList == null) {
+        LOG.warn("Current ValidWriteIdList for table {} not present in creation metadata, this should not happen", fullyQualifiedName);
+        return null;
+      }
+      if (!Objects.equals(currentTblValidWriteIdList.getMinOpenWriteId(), tblValidWriteIdList.getMinOpenWriteId())) {
+        LOG.debug("Minimum open write id do not match for table {}", fullyQualifiedName);
+        return null;
+      }
+
+      // ...for each of the tables that are part of the materialized view,
+      // where the transaction had to be committed after the materialization was created...
+      if (i != 0) {
+        queryUpdateDelete.append("OR");
+        queryCompletedCompactions.append("OR");
+        queryCompactionQueue.append("OR");
+      }
+      String[] names = TxnUtils.getDbTableName(fullyQualifiedName);
+      assert (names.length == 2);
+      queryUpdateDelete.append(" (\"CTC_DATABASE\"=? AND \"CTC_TABLE\"=?");
+      queryCompletedCompactions.append(" (\"CC_DATABASE\"=? AND \"CC_TABLE\"=?");
+      queryCompactionQueue.append(" (\"CQ_DATABASE\"=? AND \"CQ_TABLE\"=?");
+      params.add(names[0]);
+      params.add(names[1]);
+      queryUpdateDelete.append(" AND (\"CTC_WRITEID\" > " + tblValidWriteIdList.getHighWatermark());
+      queryCompletedCompactions.append(" AND (\"CC_HIGHEST_WRITE_ID\" > " + tblValidWriteIdList.getHighWatermark());
+      queryUpdateDelete.append(tblValidWriteIdList.getInvalidWriteIds().length == 0 ? ") " :
+              " OR \"CTC_WRITEID\" IN(" + StringUtils.join(",",

Review comment:
       Is this 1000-element limit in Oracle configurable? I am wondering whether we could run into issues if it is. It is also not clear whether other RDBMSs have a similar limit.
   I guess we are using an IN clause because we assume its execution will be more performant on the RDBMSs backing HMS. However, those RDBMSs may already be optimizing such queries internally, and the variant with equality predicates would always work. Should we consider that option?
   If we decide to stick with the IN clause, I believe we could aim at emitting a single query to the RDBMS (if I understand correctly, that would be option b) and be rather conservative about the size of each sublist (e.g., 50 elements?).
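To make the sublist idea concrete, here is a hedged, self-contained sketch (not Hive's actual helper) that splits an IN list into fixed-size batches OR'ed together, so that no single IN list exceeds a per-list cap such as Oracle's 1000-element limit. The helper name, batch size, and column name are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

class InClauseBatcher {
    /**
     * Renders "(col IN (...) OR col IN (...) OR ...)" where each IN sublist
     * holds at most batchSize elements, keeping every list under the cap
     * some RDBMSs impose on expression lists.
     */
    static String buildInClause(String column, List<Long> ids, int batchSize) {
        StringJoiner or = new StringJoiner(" OR ", "(", ")");
        for (int start = 0; start < ids.size(); start += batchSize) {
            List<Long> batch = ids.subList(start, Math.min(start + batchSize, ids.size()));
            StringJoiner in = new StringJoiner(",", column + " IN (", ")");
            for (Long id : batch) {
                in.add(id.toString());
            }
            or.add(in.toString());
        }
        return or.toString();
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long i = 1; i <= 5; i++) {
            ids.add(i);
        }
        // With batchSize 2 this yields three OR'ed IN sublists in one query.
        System.out.println(buildInClause("\"CTC_WRITEID\"", ids, 2));
    }
}
```

Since all the sublists live in one WHERE clause, the whole check still goes to the RDBMS as a single query, which matches the single-query direction suggested above.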




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
