Re: [PR] HIVE-28280: SemanticException when querying VIEW with DISTINCT clause [hive]

via GitHub Thu, 06 Nov 2025 08:03:44 -0800


zabetak commented on code in PR #6103:
URL: https://github.com/apache/hive/pull/6103#discussion_r2499419956



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelFieldTrimmer.java:
##########
@@ -102,8 +101,7 @@ public class HiveRelFieldTrimmer extends RelFieldTrimmer {
 
   private static final ThreadLocal<ColumnAccessInfo> COLUMN_ACCESS_INFO =
       new ThreadLocal<>();
-  private static final ThreadLocal<Map<HiveProject, Table>> 
VIEW_PROJECT_TO_TABLE_SCHEMA =
-      new ThreadLocal<>();
+  private static final ThreadLocal<Map<RelNode, Table>> VIEW_RELNODE_TO_TABLE 
= new ThreadLocal<>();

Review Comment:
   The `VIEW_` prefix does not indicate much inside this class so we could keep 
the name simpler: `RELNODE_TO_TABLE`.



##########
ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java:
##########
@@ -5048,6 +5047,7 @@ private RelNode genLogicalPlan(QB qb, boolean outerMostQB,
       return srcRel;
     }
 
+

Review Comment:
   redundant space



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/RelFieldTrimmer.java:
##########
@@ -1239,6 +1240,17 @@ public TrimResult trimFields(
     return result(newTableAccessRel, mapping);
   }
 
+  /**
+   * Sets columnAccessInfo object for views.
+   * 
+   * @param rel RelNode
+   * @param fieldsUsed Fields used
+   */
+  protected void setColumnAccessInfoForViews(RelNode rel, ImmutableBitSet 
fieldsUsed) {
+    // This method is overridden in child class
+    throw new UnsupportedOperationException();
+  }

Review Comment:
   It would be nice to model this in a more general way that is less Hive 
specific and if possible contribute it back to calcite as well. Eventually, we 
would like to get rid of the Hive copy of the class but this becomes more 
difficult with every new customization that we add.
   
   How about something like:
   
   ```java
   protected void preTrim(RelNode rel, ImmutableBitSet fieldsUsed, 
Set<RelDataTypeField> extraFields) {
   }
   ```
   Plus, it is better if we don't throw by default cause that breaks all 
potential consumers of this API.
   
   Can you log a CALCITE ticket with a proposal and the appropriate motivation 
and see if it gets accepted?



##########
ql/src/test/queries/clientpositive/view_top_relnode_not_project_authorization.q:
##########
@@ -0,0 +1,16 @@
+set hive.security.authorization.enabled=true;
+create table t1 (username string, id int);
+
+create view vw_t1 as select distinct username from t1 limit 5;
+explain cbo select * from vw_t1;
+select * from vw_t1; 
+
+create view vw_t2 as 

Review Comment:
   Does vw_t2 and vw_t3 have a different operator on top of the view? If not 
then we are not really adding test coverage so we should drop those test cases.



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelFieldTrimmer.java:
##########
@@ -203,6 +202,25 @@ protected RexNode handle(RexFieldAccess fieldAccess) {
     return dispatchTrimFields(input, fieldsUsedBuilder.build(), extraFields);
   }
 
+  @Override
+  protected void setColumnAccessInfoForViews(RelNode rel, ImmutableBitSet 
fieldsUsed) {
+    final ColumnAccessInfo columnAccessInfo = COLUMN_ACCESS_INFO.get();
+    final Map<RelNode, Table> relNodeToTableAndProjects = 
VIEW_RELNODE_TO_TABLE.get();
+
+    // HiveTableScans are handled separately in HiveTableScan's trimFields 
method.
+    if (!(rel instanceof HiveTableScan) &&

Review Comment:
   It would be nice to unify the logic of collecting column access information 
in a single place. However, I would prefer to do this in a follow-up ticket to 
avoid spinning off another round of reviews.



##########
ql/src/test/results/clientpositive/llap/view_top_relnode_not_project_authorization.q.out:
##########
@@ -0,0 +1,129 @@
+PREHOOK: query: create table t1 (username string, id int)
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@t1
+POSTHOOK: query: create table t1 (username string, id int)
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@t1
+PREHOOK: query: create view vw_t1 as select distinct username from t1 limit 5
+PREHOOK: type: CREATEVIEW
+PREHOOK: Input: default@t1
+PREHOOK: Output: database:default
+PREHOOK: Output: default@vw_t1
+POSTHOOK: query: create view vw_t1 as select distinct username from t1 limit 5
+POSTHOOK: type: CREATEVIEW
+POSTHOOK: Input: default@t1
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@vw_t1
+POSTHOOK: Lineage: vw_t1.username SIMPLE [(t1)t1.FieldSchema(name:username, 
type:string, comment:null), ]
+PREHOOK: query: explain cbo select * from vw_t1
+PREHOOK: type: QUERY
+PREHOOK: Input: default@t1
+PREHOOK: Input: default@vw_t1
+#### A masked pattern was here ####
+POSTHOOK: query: explain cbo select * from vw_t1
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@t1
+POSTHOOK: Input: default@vw_t1
+#### A masked pattern was here ####
+CBO PLAN:
+HiveSortLimit(fetch=[5])
+  HiveProject(username=[$0])
+    HiveAggregate(group=[{0}])
+      HiveTableScan(table=[[default, t1]], table:alias=[t1])
+
+PREHOOK: query: select * from vw_t1
+PREHOOK: type: QUERY
+PREHOOK: Input: default@t1
+PREHOOK: Input: default@vw_t1
+#### A masked pattern was here ####
+POSTHOOK: query: select * from vw_t1
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@t1
+POSTHOOK: Input: default@vw_t1
+#### A masked pattern was here ####
+PREHOOK: query: create view vw_t2 as 
+select username from (select username, id from t1 where id > 10 limit 1) x 
where username > 'a' order by id
+PREHOOK: type: CREATEVIEW
+PREHOOK: Input: default@t1
+PREHOOK: Output: database:default
+PREHOOK: Output: default@vw_t2
+POSTHOOK: query: create view vw_t2 as 
+select username from (select username, id from t1 where id > 10 limit 1) x 
where username > 'a' order by id
+POSTHOOK: type: CREATEVIEW
+POSTHOOK: Input: default@t1
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@vw_t2
+POSTHOOK: Lineage: vw_t2.username SIMPLE [(t1)t1.FieldSchema(name:username, 
type:string, comment:null), ]
+PREHOOK: query: explain cbo select * from vw_t2
+PREHOOK: type: QUERY
+PREHOOK: Input: default@t1
+PREHOOK: Input: default@vw_t2
+#### A masked pattern was here ####
+POSTHOOK: query: explain cbo select * from vw_t2
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@t1
+POSTHOOK: Input: default@vw_t2
+#### A masked pattern was here ####
+CBO PLAN:
+HiveFilter(condition=[>($0, _UTF-16LE'a')])
+  HiveProject(username=[$0])
+    HiveSortLimit(fetch=[1])
+      HiveProject(username=[$0])
+        HiveFilter(condition=[>($1, 10)])
+          HiveTableScan(table=[[default, t1]], table:alias=[t1])

Review Comment:
   I don't see the `order by id` reflected in the plan. Is this normal? Does it 
really matter that is present in the view definition?



##########
ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java:
##########
@@ -1577,7 +1577,7 @@ public class CalcitePlannerAction implements 
Frameworks.PlannerAction<RelNode> {
     private final Map<String, PrunedPartitionList>        partitionCache;
     private final Map<String, ColumnStatsList>            colStatsCache;
     private final ColumnAccessInfo columnAccessInfo;
-    private Map<HiveProject, Table> viewProjectToTableSchema;
+    private Map<RelNode, Table> relNodeToTable;

Review Comment:
   Additionally, it seems that we could make this final and do the 
initialization here.



##########
ql/src/test/queries/clientpositive/view_top_relnode_not_project_authorization.q:
##########
@@ -0,0 +1,16 @@
+set hive.security.authorization.enabled=true;
+create table t1 (username string, id int);
+
+create view vw_t1 as select distinct username from t1 limit 5;

Review Comment:
   Should we also add a test case with an aggregate as the top operator?
   Maybe something like:
   ```sql
   create view vw_t0 as select distinct username from t1 group by username;
   ```



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelFieldTrimmer.java:
##########
@@ -203,6 +202,25 @@ protected RexNode handle(RexFieldAccess fieldAccess) {
     return dispatchTrimFields(input, fieldsUsedBuilder.build(), extraFields);
   }
 
+  @Override
+  protected void setColumnAccessInfoForViews(RelNode rel, ImmutableBitSet 
fieldsUsed) {
+    final ColumnAccessInfo columnAccessInfo = COLUMN_ACCESS_INFO.get();
+    final Map<RelNode, Table> relNodeToTableAndProjects = 
VIEW_RELNODE_TO_TABLE.get();
+
+    // HiveTableScans are handled separately in HiveTableScan's trimFields 
method.
+    if (!(rel instanceof HiveTableScan) &&
+        columnAccessInfo != null &&
+        relNodeToTableAndProjects != null &&
+        relNodeToTableAndProjects.containsKey(rel)) {
+      Table table = relNodeToTableAndProjects.get(rel);
+      List<FieldSchema> tableAllCols = table.getAllCols();
+      
+      for (int i = fieldsUsed.nextSetBit(0); i >= 0; i = 
fieldsUsed.nextSetBit(i + 1)) {

Review Comment:
   It's a bit hard to understand what the for-loop does without going and 
reading the Javadoc of `ImmutableBitSet#nextSetBit`. For readability, it would 
be probably much easier to understand the following:
   ```java
   for (Integer i: fieldsUsed) {
   ```
   



##########
ql/src/test/queries/clientpositive/view_top_relnode_not_project_authorization.q:
##########
@@ -0,0 +1,16 @@
+set hive.security.authorization.enabled=true;
+create table t1 (username string, id int);
+
+create view vw_t1 as select distinct username from t1 limit 5;
+explain cbo select * from vw_t1;
+select * from vw_t1; 

Review Comment:
   The query was failing at compile time so executing it doesn't seem necessary 
especially since the table has no rows in it.



##########
ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java:
##########
@@ -1577,7 +1577,7 @@ public class CalcitePlannerAction implements 
Frameworks.PlannerAction<RelNode> {
     private final Map<String, PrunedPartitionList>        partitionCache;
     private final Map<String, ColumnStatsList>            colStatsCache;
     private final ColumnAccessInfo columnAccessInfo;
-    private Map<HiveProject, Table> viewProjectToTableSchema;
+    private Map<RelNode, Table> relNodeToTable;

Review Comment:
   There are a bunch of other mappings below and they all have the prefix 
`relToSomething` so for consistency reasons I would name this `relToTable`.



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveRelFieldTrimmer.java:
##########
@@ -203,6 +202,25 @@ protected RexNode handle(RexFieldAccess fieldAccess) {
     return dispatchTrimFields(input, fieldsUsedBuilder.build(), extraFields);
   }
 
+  @Override
+  protected void setColumnAccessInfoForViews(RelNode rel, ImmutableBitSet 
fieldsUsed) {
+    final ColumnAccessInfo columnAccessInfo = COLUMN_ACCESS_INFO.get();
+    final Map<RelNode, Table> relNodeToTableAndProjects = 
VIEW_RELNODE_TO_TABLE.get();

Review Comment:
   Not sure if the `Projects` suffix is still relevant.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] HIVE-28280: SemanticException when querying VIEW with DISTINCT clause [hive]

Reply via email to