okumin commented on code in PR #6331:
URL: https://github.com/apache/hive/pull/6331#discussion_r2929819403


##########
ql/src/test/results/clientpositive/llap/lvj_stats_isolation.q.out:
##########
@@ -0,0 +1,360 @@
+PREHOOK: query: create table lvj_stats (id string, f1 string)
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@lvj_stats
+POSTHOOK: query: create table lvj_stats (id string, f1 string)
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@lvj_stats
+PREHOOK: query: insert into lvj_stats values
+  ('a','v1'), ('a','v2'), ('a','v3'),
+  ('b','v4'), ('b','v5'), ('b','v6')
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+PREHOOK: Output: default@lvj_stats
+POSTHOOK: query: insert into lvj_stats values
+  ('a','v1'), ('a','v2'), ('a','v3'),
+  ('b','v4'), ('b','v5'), ('b','v6')
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+POSTHOOK: Output: default@lvj_stats
+POSTHOOK: Lineage: lvj_stats.f1 SCRIPT []
+POSTHOOK: Lineage: lvj_stats.id SCRIPT []
+PREHOOK: query: analyze table lvj_stats compute statistics
+PREHOOK: type: QUERY
+PREHOOK: Input: default@lvj_stats
+PREHOOK: Output: default@lvj_stats
+POSTHOOK: query: analyze table lvj_stats compute statistics
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@lvj_stats
+POSTHOOK: Output: default@lvj_stats
+PREHOOK: query: analyze table lvj_stats compute statistics for columns
+PREHOOK: type: ANALYZE_TABLE
+PREHOOK: Input: default@lvj_stats
+PREHOOK: Output: default@lvj_stats
+#### A masked pattern was here ####
+POSTHOOK: query: analyze table lvj_stats compute statistics for columns
+POSTHOOK: type: ANALYZE_TABLE
+POSTHOOK: Input: default@lvj_stats
+POSTHOOK: Output: default@lvj_stats
+#### A masked pattern was here ####
+PREHOOK: query: explain
+select id, f1, count(*)
+from (select id, f1 from lvj_stats group by id, f1) sub
+lateral view posexplode(array(f1, f1)) t1 as pos1, val1
+group by id, f1
+PREHOOK: type: QUERY
+PREHOOK: Input: default@lvj_stats
+#### A masked pattern was here ####
+POSTHOOK: query: explain
+select id, f1, count(*)
+from (select id, f1 from lvj_stats group by id, f1) sub
+lateral view posexplode(array(f1, f1)) t1 as pos1, val1
+group by id, f1
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@lvj_stats
+#### A masked pattern was here ####
+STAGE DEPENDENCIES:
+  Stage-1 is a root stage
+  Stage-0 depends on stages: Stage-1
+
+STAGE PLANS:
+  Stage: Stage-1
+    Tez
+#### A masked pattern was here ####
+      Edges:
+        Reducer 2 <- Map 1 (SIMPLE_EDGE)
+        Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
+#### A masked pattern was here ####
+      Vertices:
+        Map 1 
+            Map Operator Tree:
+                TableScan
+                  alias: lvj_stats
+                  Statistics: Num rows: 6 Data size: 1026 Basic stats: COMPLETE Column stats: COMPLETE
+                  Select Operator
+                    expressions: id (type: string), f1 (type: string)
+                    outputColumnNames: id, f1
+                    Statistics: Num rows: 6 Data size: 1026 Basic stats: COMPLETE Column stats: COMPLETE
+                    Group By Operator
+                      keys: id (type: string), f1 (type: string)
+                      minReductionHashAggr: 0.4
+                      mode: hash
+                      outputColumnNames: _col0, _col1
+                      Statistics: Num rows: 6 Data size: 1026 Basic stats: COMPLETE Column stats: COMPLETE
+                      Reduce Output Operator
+                        key expressions: _col0 (type: string), _col1 (type: string)
+                        null sort order: zz
+                        sort order: ++
+                        Map-reduce partition columns: _col0 (type: string), _col1 (type: string)
+                        Statistics: Num rows: 6 Data size: 1026 Basic stats: COMPLETE Column stats: COMPLETE
+            Execution mode: vectorized, llap
+            LLAP IO: all inputs
+        Reducer 2 
+            Execution mode: llap
+            Reduce Operator Tree:
+              Group By Operator
+                keys: KEY._col0 (type: string), KEY._col1 (type: string)
+                mode: mergepartial
+                outputColumnNames: _col0, _col1
+                Statistics: Num rows: 6 Data size: 1026 Basic stats: COMPLETE Column stats: COMPLETE
+                Lateral View Forward
+                  Statistics: Num rows: 6 Data size: 1026 Basic stats: COMPLETE Column stats: COMPLETE
+                  Select Operator
+                    expressions: _col0 (type: string), _col1 (type: string)
+                    outputColumnNames: _col0, _col1
+                    Statistics: Num rows: 6 Data size: 1026 Basic stats: COMPLETE Column stats: COMPLETE
+                    Lateral View Join Operator
+                      outputColumnNames: _col0, _col1, _col2, _col3
+                      Statistics: Num rows: 6 Data size: 12546 Basic stats: COMPLETE Column stats: COMPLETE
+                      Select Operator
+                        expressions: _col0 (type: string), _col1 (type: string)
+                        outputColumnNames: _col0, _col1
+                        Statistics: Num rows: 6 Data size: 12546 Basic stats: COMPLETE Column stats: COMPLETE
+                        Group By Operator
+                          aggregations: count()
+                          keys: _col0 (type: string), _col1 (type: string)
+                          minReductionHashAggr: 0.4
+                          mode: hash
+                          outputColumnNames: _col0, _col1, _col2
+                          Statistics: Num rows: 6 Data size: 1074 Basic stats: COMPLETE Column stats: COMPLETE
+                          Reduce Output Operator
+                            key expressions: _col0 (type: string), _col1 (type: string)
+                            null sort order: zz
+                            sort order: ++
+                            Map-reduce partition columns: _col0 (type: string), _col1 (type: string)
+                            Statistics: Num rows: 6 Data size: 1074 Basic stats: COMPLETE Column stats: COMPLETE
+                            value expressions: _col2 (type: bigint)
+                  Select Operator
+                    expressions: array(_col1,_col1) (type: array<string>)
+                    outputColumnNames: _col0
+                    Statistics: Num rows: 6 Data size: 11520 Basic stats: COMPLETE Column stats: COMPLETE
+                    UDTF Operator
+                      Statistics: Num rows: 6 Data size: 11520 Basic stats: COMPLETE Column stats: COMPLETE

Review Comment:
   I could be wrong, but my feeling is that the bug is not in LateralViewJoinStatsRule 
but in UDTFStatsRule. `posexplode` accepts a single array, `_col0`, emitted 
by the SelectOperator, and emits `pos1` and `val1`. However, the UDTF operator 
retains the original column statistics for the single array column, named 
`_col0`, and something accidentally goes wrong.
   
   <img width="1246" height="246" alt="Image" src="https://github.com/user-attachments/assets/41b15c05-f650-44f8-ba33-b96eff9359ca" />
   
   I guess the current rules construct the statistics tree like this.
   
   <img width="700" height="468" alt="Image" src="https://github.com/user-attachments/assets/f8921c53-4b19-4d1c-aee9-ba84ce233791" />
   
   I'd say it needs to look like this instead?
   
   <img width="687" height="473" alt="Image" src="https://github.com/user-attachments/assets/01b2e73e-75e2-4346-b7a4-108f31799272" />
   
   I think the current logic works as expected when we fully discard the output 
from the right-hand side, i.e., the UDTF. However, if we pick up values from the 
right-hand side, something might go wrong because there is no chance to resolve 
the column stats of `pos1` or `val1`.
   
   ```
   select id, f1, pos1, count(*)
   from (select id, f1 from lvj_stats group by id, f1) sub
   lateral view posexplode(array(f1, f1)) t1 as pos1, val1
   group by id, f1, pos1;
   ```
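
   To make the concern concrete, here is a minimal sketch of the name mismatch I 
suspect. It is a toy model, not the actual UDTFStatsRule code: the class 
`UdtfStatsSketch`, its methods, and the plain name-to-size map are all 
hypothetical, and it assumes column stats are looked up by column name. Only 
the numbers (6 rows, 11520 bytes) come from the plan above.

   ```java
   import java.util.ArrayList;
   import java.util.LinkedHashMap;
   import java.util.List;
   import java.util.Map;

   public class UdtfStatsSketch {

       // Toy stand-in for column statistics: column name -> average size in bytes.
       // Models the suspected behaviour: the UDTF forwards its parent's stats unchanged.
       static Map<String, Long> forwardParentStats(Map<String, Long> parentStats) {
           return new LinkedHashMap<>(parentStats);
       }

       // Returns the output columns that have no statistics entry under their own name.
       static List<String> unresolvedColumns(List<String> outputColumns,
                                             Map<String, Long> stats) {
           List<String> missing = new ArrayList<>();
           for (String column : outputColumns) {
               if (!stats.containsKey(column)) {
                   missing.add(column);
               }
           }
           return missing;
       }

       public static void main(String[] args) {
           // From the plan above: the Select feeding the UDTF reports 6 rows, 11520 bytes.
           long rows = 6;
           long dataSize = 11520;
           Map<String, Long> selectStats = new LinkedHashMap<>();
           selectStats.put("_col0", dataSize / rows); // single array column

           Map<String, Long> udtfStats = forwardParentStats(selectStats);

           // posexplode emits pos1 and val1, but the stats are still keyed by _col0.
           System.out.println(unresolvedColumns(List.of("pos1", "val1"), udtfStats));
       }
   }
   ```

   Running `main` prints `[pos1, val1]`: in this model neither output column of 
the UDTF can be resolved, which is the situation the query above would hit when 
it groups by `pos1`.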



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

