[ https://issues.apache.org/jira/browse/HIVE-24840?focusedWorklogId=570427&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-570427 ]

ASF GitHub Bot logged work on HIVE-24840:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 23/Mar/21 13:22
            Start Date: 23/Mar/21 13:22
    Worklog Time Spent: 10m 
      Work Description: kasakrisz commented on a change in pull request #2088:
URL: https://github.com/apache/hive/pull/2088#discussion_r599561646



##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
##########
@@ -2364,91 +2364,129 @@ public Materialization getMaterializationInvalidationInfo(
 
     // We are composing a query that returns a single row if an update happened after
     // the materialization was created. Otherwise, query returns 0 rows.
+
+    // Parse validReaderWriteIdList from creation metadata
+    final ValidTxnWriteIdList validReaderWriteIdList =
+            new ValidTxnWriteIdList(creationMetadata.getValidTxnList());
+
+    // Parse validTxnList
+    final ValidReadTxnList currentValidTxnList = new ValidReadTxnList(validTxnListStr);
+    // Get the valid write id list for the tables in current state
+    final List<TableValidWriteIds> currentTblValidWriteIdsList = new ArrayList<>();
+    Connection dbConn = null;
+      for (String fullTableName : creationMetadata.getTablesUsed()) {
+        try {
+          dbConn = getDbConn(Connection.TRANSACTION_READ_COMMITTED);
+          currentTblValidWriteIdsList.add(getValidWriteIdsForTable(dbConn, fullTableName, currentValidTxnList));
+        } catch (SQLException ex) {
+          String errorMsg = "Unable to query Valid writeIds of table " + fullTableName;
+          LOG.warn(errorMsg, ex);
+          throw new MetaException(errorMsg + " " + StringUtils.stringifyException(ex));
+        } finally {
+          closeDbConn(dbConn);
+        }
+      }
+    final ValidTxnWriteIdList currentValidReaderWriteIdList = TxnCommonUtils.createValidTxnWriteIdList(
+            currentValidTxnList.getHighWatermark(), currentTblValidWriteIdsList);
+
+    List<String> params = new ArrayList<>();
+    StringBuilder queryUpdateDelete = new StringBuilder();
+    StringBuilder queryCompletedCompactions = new StringBuilder();
+    StringBuilder queryCompactionQueue = new StringBuilder();
+    // compose a query that select transactions containing an update...
+    queryUpdateDelete.append("SELECT \"CTC_UPDATE_DELETE\" FROM \"COMPLETED_TXN_COMPONENTS\" WHERE \"CTC_UPDATE_DELETE\" ='Y' AND (");
+    queryCompletedCompactions.append("SELECT 1 FROM \"COMPLETED_COMPACTIONS\" WHERE (");
+    queryCompactionQueue.append("SELECT 1 FROM \"COMPACTION_QUEUE\" WHERE (");
+    int i = 0;
+    for (String fullyQualifiedName : creationMetadata.getTablesUsed()) {
+      ValidWriteIdList tblValidWriteIdList =
+              validReaderWriteIdList.getTableValidWriteIdList(fullyQualifiedName);
+      if (tblValidWriteIdList == null) {
+        LOG.warn("ValidWriteIdList for table {} not present in creation metadata, this should not happen", fullyQualifiedName);
+        return null;
+      }
+
+      // First, we check whether the low watermark has moved for any of the tables.
+      // If it has, we return true, since it is not incrementally refreshable, e.g.,
+      // one of the commits that are not available may be an update/delete.
+      ValidWriteIdList currentTblValidWriteIdList =
+              currentValidReaderWriteIdList.getTableValidWriteIdList(fullyQualifiedName);
+      if (currentTblValidWriteIdList == null) {
+        LOG.warn("Current ValidWriteIdList for table {} not present in creation metadata, this should not happen", fullyQualifiedName);
+        return null;
+      }
+      if (!Objects.equals(currentTblValidWriteIdList.getMinOpenWriteId(), tblValidWriteIdList.getMinOpenWriteId())) {
+        LOG.debug("Minimum open write id do not match for table {}", fullyQualifiedName);
+        return null;
+      }
+
+      // ...for each of the tables that are part of the materialized view,
+      // where the transaction had to be committed after the materialization was created...
+      if (i != 0) {
+        queryUpdateDelete.append("OR");
+        queryCompletedCompactions.append("OR");
+        queryCompactionQueue.append("OR");
+      }
+      String[] names = TxnUtils.getDbTableName(fullyQualifiedName);
+      assert (names.length == 2);
+      queryUpdateDelete.append(" (\"CTC_DATABASE\"=? AND \"CTC_TABLE\"=?");
+      queryCompletedCompactions.append(" (\"CC_DATABASE\"=? AND \"CC_TABLE\"=?");
+      queryCompactionQueue.append(" (\"CQ_DATABASE\"=? AND \"CQ_TABLE\"=?");
+      params.add(names[0]);
+      params.add(names[1]);
+      queryUpdateDelete.append(" AND (\"CTC_WRITEID\" > " + tblValidWriteIdList.getHighWatermark());
+      queryCompletedCompactions.append(" AND (\"CC_HIGHEST_WRITE_ID\" > " + tblValidWriteIdList.getHighWatermark());
+      queryUpdateDelete.append(tblValidWriteIdList.getInvalidWriteIds().length == 0 ? ") " :
+              " OR \"CTC_WRITEID\" IN(" + StringUtils.join(",",

Review comment:
       It seems that this 1000-element limitation is not configurable.
   
   I found that `TxnUtils.buildQueryWithINClauseStrings` is used elsewhere. That function can split the statement into a list of queries based not only on the number of expressions in the `in` operator but also on the overall size of the direct SQL query text. However, we already have multiple `in` operators here: at least two per source table.
   
   Since this issue is not introduced by this patch and all of the available solutions are complex, I recommend dealing with it in a follow-up patch: [HIVE-24925](https://issues.apache.org/jira/browse/HIVE-24925)
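
   For illustration, here is a minimal self-contained sketch of the batching idea (the class and helper names are hypothetical; this is not the actual `TxnUtils.buildQueryWithINClauseStrings` signature): split the id list into chunks of at most 1000 elements and emit one `IN (...)` group per chunk, joined with `OR`.

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class InClauseBatcher {
  // Many databases (Oracle among them) reject IN lists with more than 1000
  // elements, so instead of one big list we emit several IN (...) groups
  // joined with OR.
  private static final int MAX_IN_LIST = 1000;

  // Hypothetical helper: builds ("COL" IN (id1,...,id1000) OR "COL" IN (...))
  static String buildInPredicate(String column, List<Long> ids) {
    List<String> groups = new ArrayList<>();
    for (int start = 0; start < ids.size(); start += MAX_IN_LIST) {
      List<Long> chunk = ids.subList(start, Math.min(start + MAX_IN_LIST, ids.size()));
      groups.add("\"" + column + "\" IN (" + chunk.stream()
          .map(String::valueOf).collect(Collectors.joining(",")) + ")");
    }
    return "(" + String.join(" OR ", groups) + ")";
  }
}
{code}

   As the comment above notes, the real helper also caps the overall size of the generated query text and can split the statement into multiple queries; combined with the multiple `in` operators per source table, that is what makes a general fix complex enough to defer to HIVE-24925.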





Issue Time Tracking
-------------------

    Worklog Id:     (was: 570427)
    Time Spent: 1.5h  (was: 1h 20m)

> Materialized View incremental rebuild produces wrong result set after compaction
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-24840
>                 URL: https://issues.apache.org/jira/browse/HIVE-24840
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Critical
>              Labels: pull-request-available
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> {code}
> create table t1(a int, b varchar(128), c float) stored as orc TBLPROPERTIES ('transactional'='true');
> insert into t1(a,b, c) values (1, 'one', 1.1), (2, 'two', 2.2), (NULL, NULL, NULL);
> create materialized view mat1 stored as orc TBLPROPERTIES ('transactional'='true') as
>             select a,b,c from t1 where a > 0 or a is null;
> delete from t1 where a = 1;
> alter table t1 compact 'major';
> -- Wait until compaction finished.
> alter materialized view mat1 rebuild;
> {code}
> Expected result of the query
> {code}
> select * from mat1;
> {code}
> is
> {code}
> 2 two 2.2
> NULL NULL NULL
> {code}
> but if incremental rebuild is enabled the result is
> {code}
> 1 one 1.1
> 2 two 2.2
> NULL NULL NULL
> {code}
> Cause: Incremental rebuild queries the metastore's COMPLETED_TXN_COMPONENTS table to
> determine whether the source tables of the materialized view have had any delete or
> update transactions since the last rebuild. However, when a major compaction is
> performed on the source tables, the records related to these tables are deleted from
> COMPLETED_TXN_COMPONENTS, so the rebuild no longer sees the earlier delete.
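>
> For illustration, a simplified sketch (assuming the column names shown in the patch
> above; the real query is built in TxnHandler) of the kind of check incremental
> rebuild relies on. Major compaction removes the matching rows from
> COMPLETED_TXN_COMPONENTS, so this check can miss the earlier delete:
> {code}
> -- Returns a row if an update/delete was committed on the source table after
> -- the write id high watermark recorded when the materialization was created.
> SELECT "CTC_UPDATE_DELETE" FROM "COMPLETED_TXN_COMPONENTS"
> WHERE "CTC_UPDATE_DELETE" = 'Y'
>   AND "CTC_DATABASE" = 'default' AND "CTC_TABLE" = 't1'
>   AND "CTC_WRITEID" > 2; -- hypothetical high watermark at creation time
> {code}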


