[ 
https://issues.apache.org/jira/browse/HIVE-24519?focusedWorklogId=527908&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-527908
 ]

ASF GitHub Bot logged work on HIVE-24519:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 24/Dec/20 00:45
            Start Date: 24/Dec/20 00:45
    Worklog Time Spent: 10m 
      Work Description: jcamachor commented on a change in pull request #1772:
URL: https://github.com/apache/hive/pull/1772#discussion_r548328477



##########
File path: 
ql/src/test/results/clientpositive/llap/materialized_view_rebuild_2.q.out
##########
@@ -0,0 +1,171 @@
+PREHOOK: query: create table t1(col0 int) stored as orc TBLPROPERTIES 
('transactional'='true')
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@t1
+POSTHOOK: query: create table t1(col0 int) stored as orc TBLPROPERTIES 
('transactional'='true')
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@t1
+PREHOOK: query: insert into t1(col0) values(1)
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+PREHOOK: Output: default@t1
+POSTHOOK: query: insert into t1(col0) values(1)
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+POSTHOOK: Output: default@t1
+POSTHOOK: Lineage: t1.col0 SCRIPT []
+PREHOOK: query: create materialized view mat1 as
+select col0 from t1 where col0 = 1
+PREHOOK: type: CREATE_MATERIALIZED_VIEW
+PREHOOK: Input: default@t1
+PREHOOK: Output: database:default
+PREHOOK: Output: default@mat1
+POSTHOOK: query: create materialized view mat1 as
+select col0 from t1 where col0 = 1
+POSTHOOK: type: CREATE_MATERIALIZED_VIEW
+POSTHOOK: Input: default@t1
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@mat1
+Materialized view default.mat1 is up to date. Cancelling rebuild.
+PREHOOK: query: explain
+alter materialized view mat1 rebuild
+PREHOOK: type: ALTER_MATERIALIZED_VIEW_REBUILD
+POSTHOOK: query: explain
+alter materialized view mat1 rebuild
+POSTHOOK: type: ALTER_MATERIALIZED_VIEW_REBUILD
+STAGE DEPENDENCIES:

Review comment:
       If it is easy to do, can we avoid printing STAGE DEPENDENCIES and STAGE 
PLANS here? For example, it could be that if they are set to null instead of 
empty, they are skipped in the EXPLAIN output.
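
   To illustrate the null-vs-empty idea, here is a minimal sketch. It is not 
Hive's EXPLAIN code: `ExplainSectionSketch`, `appendSection` and `render` are 
hypothetical names, and the sketch only assumes that a printer could treat a 
null section as "omit entirely" while an empty one still gets its header.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; not Hive's EXPLAIN implementation. All names are illustrative.
final class ExplainSectionSketch {

  // Appends the header and its lines, or nothing at all when lines is null.
  static void appendSection(StringBuilder out, String header, List<String> lines) {
    if (lines == null) {
      return; // null means "no such section": the header is skipped too
    }
    out.append(header).append('\n');
    for (String line : lines) {
      out.append("  ").append(line).append('\n');
    }
  }

  // Renders both sections in order; either argument may be null.
  static String render(List<String> dependencies, List<String> plans) {
    StringBuilder out = new StringBuilder();
    appendSection(out, "STAGE DEPENDENCIES:", dependencies);
    appendSection(out, "STAGE PLANS:", plans);
    return out.toString();
  }

  public static void main(String[] args) {
    // Empty sections: both headers are still printed.
    System.out.print(render(Collections.<String>emptyList(), Collections.<String>emptyList()));
    // Null sections: nothing is printed.
    System.out.print(render(null, null));
  }
}
```

   Whether the real EXPLAIN path behaves this way would need to be checked in 
the code.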

##########
File path: 
ql/src/test/results/clientpositive/llap/materialized_view_rebuild_2.q.out
##########
@@ -0,0 +1,171 @@
+PREHOOK: query: create table t1(col0 int) stored as orc TBLPROPERTIES 
('transactional'='true')
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@t1
+POSTHOOK: query: create table t1(col0 int) stored as orc TBLPROPERTIES 
('transactional'='true')
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@t1
+PREHOOK: query: insert into t1(col0) values(1)
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+PREHOOK: Output: default@t1
+POSTHOOK: query: insert into t1(col0) values(1)
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+POSTHOOK: Output: default@t1
+POSTHOOK: Lineage: t1.col0 SCRIPT []
+PREHOOK: query: create materialized view mat1 as
+select col0 from t1 where col0 = 1
+PREHOOK: type: CREATE_MATERIALIZED_VIEW
+PREHOOK: Input: default@t1
+PREHOOK: Output: database:default
+PREHOOK: Output: default@mat1
+POSTHOOK: query: create materialized view mat1 as
+select col0 from t1 where col0 = 1
+POSTHOOK: type: CREATE_MATERIALIZED_VIEW
+POSTHOOK: Input: default@t1
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@mat1
+Materialized view default.mat1 is up to date. Cancelling rebuild.

Review comment:
       This should not be printed, since this is a CREATE MV statement. Please 
review the code; we may be missing a check that prevents this message from 
being printed.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/plan/HiveOperation.java
##########
@@ -120,6 +120,8 @@
       new Privilege[]{Privilege.DROP}),
   ALTER_MATERIALIZED_VIEW_REWRITE("ALTER_MATERIALIZED_VIEW_REWRITE", 
HiveParser.TOK_ALTER_MATERIALIZED_VIEW_REWRITE,
       new Privilege[]{Privilege.ALTER_METADATA}, null),
+  ALTER_MATERIALIZED_VIEW_REBUILD("ALTER_MATERIALIZED_VIEW_REBUILD", 
HiveParser.TOK_ALTER_MATERIALIZED_VIEW_REBUILD,

Review comment:
       Why do we need to add this operation? I saw a comment about it in 
another conversation, but it was not clear there.
   
   In the q file below, I see that this change leads to an inconsistent 
operation type when the rebuild is executed (QUERY) vs. skipped 
(ALTER_MATERIALIZED_VIEW_REBUILD). This should not happen: the operation type 
should be the same in both cases.

##########
File path: 
ql/src/test/results/clientpositive/llap/materialized_view_rebuild_2.q.out
##########
@@ -0,0 +1,171 @@
+PREHOOK: query: create table t1(col0 int) stored as orc TBLPROPERTIES 
('transactional'='true')
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@t1
+POSTHOOK: query: create table t1(col0 int) stored as orc TBLPROPERTIES 
('transactional'='true')
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@t1
+PREHOOK: query: insert into t1(col0) values(1)
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+PREHOOK: Output: default@t1
+POSTHOOK: query: insert into t1(col0) values(1)
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+POSTHOOK: Output: default@t1
+POSTHOOK: Lineage: t1.col0 SCRIPT []
+PREHOOK: query: create materialized view mat1 as
+select col0 from t1 where col0 = 1
+PREHOOK: type: CREATE_MATERIALIZED_VIEW
+PREHOOK: Input: default@t1
+PREHOOK: Output: database:default
+PREHOOK: Output: default@mat1
+POSTHOOK: query: create materialized view mat1 as
+select col0 from t1 where col0 = 1
+POSTHOOK: type: CREATE_MATERIALIZED_VIEW
+POSTHOOK: Input: default@t1
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@mat1
+Materialized view default.mat1 is up to date. Cancelling rebuild.
+PREHOOK: query: explain
+alter materialized view mat1 rebuild
+PREHOOK: type: ALTER_MATERIALIZED_VIEW_REBUILD
+POSTHOOK: query: explain
+alter materialized view mat1 rebuild
+POSTHOOK: type: ALTER_MATERIALIZED_VIEW_REBUILD
+STAGE DEPENDENCIES:
+
+STAGE PLANS:
+Materialized view default.mat1 is up to date. Cancelling rebuild.
+PREHOOK: query: alter materialized view mat1 rebuild
+PREHOOK: type: ALTER_MATERIALIZED_VIEW_REBUILD
+POSTHOOK: query: alter materialized view mat1 rebuild
+POSTHOOK: type: ALTER_MATERIALIZED_VIEW_REBUILD
+PREHOOK: query: insert into t1(col0) values(1)
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+PREHOOK: Output: default@t1
+POSTHOOK: query: insert into t1(col0) values(1)
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+POSTHOOK: Output: default@t1
+POSTHOOK: Lineage: t1.col0 SCRIPT []
+PREHOOK: query: explain
+alter materialized view mat1 rebuild
+PREHOOK: type: QUERY
+PREHOOK: Input: default@t1
+PREHOOK: Output: default@mat1
+POSTHOOK: query: explain
+alter materialized view mat1 rebuild
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@t1
+POSTHOOK: Output: default@mat1
+STAGE DEPENDENCIES:
+  Stage-1 is a root stage
+  Stage-2 depends on stages: Stage-1
+  Stage-0 depends on stages: Stage-2
+  Stage-3 depends on stages: Stage-0
+  Stage-4 depends on stages: Stage-3
+
+STAGE PLANS:
+  Stage: Stage-1
+    Tez
+#### A masked pattern was here ####
+      Edges:
+        Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
+#### A masked pattern was here ####
+      Vertices:
+        Map 1 
+            Map Operator Tree:
+                TableScan
+                  alias: t1
+                  filterExpr: ((ROW__ID.writeid > 1L) and (col0 = 1)) (type: 
boolean)
+                  Statistics: Num rows: 2 Data size: 8 Basic stats: COMPLETE 
Column stats: COMPLETE
+                  Filter Operator
+                    predicate: ((ROW__ID.writeid > 1L) and (col0 = 1)) (type: 
boolean)
+                    Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE 
Column stats: COMPLETE
+                    Select Operator
+                      expressions: 1 (type: int)
+                      outputColumnNames: _col0
+                      Statistics: Num rows: 1 Data size: 4 Basic stats: 
COMPLETE Column stats: COMPLETE
+                      File Output Operator
+                        compressed: false
+                        Statistics: Num rows: 1 Data size: 4 Basic stats: 
COMPLETE Column stats: COMPLETE
+                        table:
+                            input format: 
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
+                            output format: 
org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
+                            serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
+                            name: default.mat1
+                      Select Operator
+                        expressions: _col0 (type: int)
+                        outputColumnNames: col0
+                        Statistics: Num rows: 1 Data size: 4 Basic stats: 
COMPLETE Column stats: COMPLETE
+                        Group By Operator
+                          aggregations: min(col0), max(col0), count(1), 
count(col0), compute_bit_vector(col0, 'hll')
+                          minReductionHashAggr: 0.4
+                          mode: hash
+                          outputColumnNames: _col0, _col1, _col2, _col3, _col4
+                          Statistics: Num rows: 1 Data size: 168 Basic stats: 
COMPLETE Column stats: COMPLETE
+                          Reduce Output Operator
+                            null sort order: 
+                            sort order: 
+                            Statistics: Num rows: 1 Data size: 168 Basic 
stats: COMPLETE Column stats: COMPLETE
+                            value expressions: _col0 (type: int), _col1 (type: 
int), _col2 (type: bigint), _col3 (type: bigint), _col4 (type: binary)
+            Execution mode: llap
+            LLAP IO: may be used (ACID table)
+        Reducer 2 
+            Execution mode: llap
+            Reduce Operator Tree:
+              Group By Operator
+                aggregations: min(VALUE._col0), max(VALUE._col1), 
count(VALUE._col2), count(VALUE._col3), compute_bit_vector(VALUE._col4)
+                mode: mergepartial
+                outputColumnNames: _col0, _col1, _col2, _col3, _col4
+                Statistics: Num rows: 1 Data size: 168 Basic stats: COMPLETE 
Column stats: COMPLETE
+                Select Operator
+                  expressions: 'LONG' (type: string), UDFToLong(_col0) (type: 
bigint), UDFToLong(_col1) (type: bigint), (_col2 - _col3) (type: bigint), 
COALESCE(ndv_compute_bit_vector(_col4),0) (type: bigint), _col4 (type: binary)
+                  outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
+                  Statistics: Num rows: 1 Data size: 264 Basic stats: COMPLETE 
Column stats: COMPLETE
+                  File Output Operator
+                    compressed: false
+                    Statistics: Num rows: 1 Data size: 264 Basic stats: 
COMPLETE Column stats: COMPLETE
+                    table:
+                        input format: 
org.apache.hadoop.mapred.SequenceFileInputFormat
+                        output format: 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
+                        serde: 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
+
+  Stage: Stage-2
+    Dependency Collection
+
+  Stage: Stage-0
+    Move Operator
+      tables:
+          replace: false
+          table:
+              input format: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
+              output format: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
+              serde: org.apache.hadoop.hive.ql.io.orc.OrcSerde
+              name: default.mat1
+
+  Stage: Stage-3
+    Stats Work
+      Basic Stats Work:
+      Column Stats Desc:
+          Columns: col0
+          Column Types: int
+          Table: default.mat1
+
+  Stage: Stage-4
+    Materialized View Update
+      name: default.mat1
+      update creation metadata: true
+
+PREHOOK: query: alter materialized view mat1 rebuild
+PREHOOK: type: QUERY
+PREHOOK: Input: default@t1
+PREHOOK: Output: default@mat1
+POSTHOOK: query: alter materialized view mat1 rebuild
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@t1
+POSTHOOK: Output: default@mat1
+POSTHOOK: Lineage: mat1.col0 SIMPLE []

Review comment:
       Can we add a similar test with an aggregate MV too (incremental rebuild 
with merge)?

##########
File path: 
ql/src/java/org/apache/hadoop/hive/ql/ddl/view/materialized/alter/rebuild/AlterMaterializedViewRebuildAnalyzer.java
##########
@@ -63,6 +66,20 @@ public void analyzeInternal(ASTNode root) throws 
SemanticException {
       unparseTranslator.addTableNameTranslation(tableTree, 
SessionState.get().getCurrentDatabase());
       return;
     }
+
+    try {
+      Boolean outdated = db.isOutdatedMaterializedView(getTxnMgr(), tableName);
+      if (outdated != null && !outdated) {
+        String msg = String.format("Materialized view %s.%s is up to date. 
Cancelling rebuild.",
+                tableName.getDb(), tableName.getTable());
+        LOG.info(msg);
+        console.printInfo(msg, false);

Review comment:
       nit: `Materialized view %s.%s is up to date. Cancelling rebuild.` -> 
`Materialized view %s.%s is up to date. Skipping rebuild.`?
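
   For reference, the three-state check in the hunk above can be sketched in 
isolation, with the suggested wording. This is a standalone illustration, not 
the analyzer itself: `RebuildDecisionSketch`, `shouldSkipRebuild` and 
`skipMessage` are hypothetical names, and reading a null result as "staleness 
unknown" is an assumption.

```java
// Hypothetical sketch of the three-state check; not the Hive analyzer itself.
final class RebuildDecisionSketch {

  // outdated == null  -> assumed to mean "unknown": do not skip
  // outdated == true  -> view is stale: do not skip
  // outdated == false -> view is up to date: skip the rebuild
  static boolean shouldSkipRebuild(Boolean outdated) {
    return outdated != null && !outdated;
  }

  // Builds the message with the wording suggested in the review comment.
  static String skipMessage(String db, String table) {
    return String.format("Materialized view %s.%s is up to date. Skipping rebuild.", db, table);
  }

  public static void main(String[] args) {
    Boolean outdated = Boolean.FALSE; // stand-in for the result of the staleness check
    if (shouldSkipRebuild(outdated)) {
      System.out.println(skipMessage("default", "mat1"));
    }
  }
}
```

   Only an explicit `false` skips the rebuild; a null result falls through to 
the normal rebuild path, as in the `outdated != null && !outdated` condition.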




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 527908)
    Time Spent: 40m  (was: 0.5h)

> Optimize MV: Materialized views should not rebuild when tables are not 
> modified
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-24519
>                 URL: https://issues.apache.org/jira/browse/HIVE-24519
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Materialized views
>            Reporter: Rajesh Balamohan
>            Assignee: Krisztian Kasa
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> e.g.
> {noformat}
> create materialized view c_c_address as 
> select c_customer_sk from customer c, customer_address ca where 
> c_current_addr_sk = ca.ca_address_id;
> ALTER MATERIALIZED VIEW c_c_address REBUILD; <-- This shouldn't trigger 
> rebuild, when source tables are not modified
>  {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
