zabetak commented on code in PR #6316:
URL: https://github.com/apache/hive/pull/6316#discussion_r2824269607


##########
ql/src/test/queries/clientpositive/distribution_key_constant_value.q:
##########
@@ -0,0 +1,7 @@
+CREATE TABLE test (col1 string, col2 string);
+
+EXPLAIN CBO
+SELECT col1 FROM test
+WHERE col2 = 'a'
+DISTRIBUTE BY col1, col2     
+SORT BY col1, col2; 

Review Comment:
   Can we drop the `SORT BY` to minimize the repro?
   ```sql
   SELECT col1, col2 FROM test
   WHERE col2 = 'a'
   DISTRIBUTE BY col1, col2
   ```
   
   



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelDistribution.java:
##########
@@ -95,9 +96,11 @@ public RelDistribution apply(TargetMapping mapping) {
       tmp.put(aMapping.source, aMapping.target);
     }
 
-    for (Integer key : keys) {
-      newKeys.add(tmp.get(key));
-    }
+    keys.stream()
+        .map(tmp::get)
+        .filter(Objects::nonNull)
+        .forEach(newKeys::add);

Review Comment:
   This change goes against the API specification of `org.apache.calcite.rel.RelDistribution#apply`:
   ```
      * <p>If mapping eliminates one of the distribution keys, the {@link Type#ANY}
      * distribution will be returned.
   ```
   At this level it is undefined what a null target means, so the transformation may not always be valid. https://issues.apache.org/jira/browse/CALCITE-3969 may contain additional insights.
   
   It's probably safer to apply a change inside the `HiveSortPullUpConstantsRule`, similar to what is done for collation.
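
To make the contract quoted above concrete, here is a minimal, self-contained sketch of key remapping that gives up as soon as one key has no target, instead of silently dropping it. It is not Hive or Calcite code: the class `DistributionKeyRemap`, the method `remapKeys`, and the use of an empty `Optional` to stand in for the `ANY` distribution are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical stand-in for distribution key remapping; not the Hive implementation. */
public class DistributionKeyRemap {

  /**
   * Remaps every key through sourceToTarget.
   * Returns Optional.empty() when any key has no target; in this sketch the
   * empty result plays the role of falling back to the ANY distribution.
   */
  public static Optional<List<Integer>> remapKeys(
      List<Integer> keys, Map<Integer, Integer> sourceToTarget) {
    List<Integer> newKeys = new ArrayList<>(keys.size());
    for (Integer key : keys) {
      Integer target = sourceToTarget.get(key);
      if (target == null) {
        // The mapping eliminated this key: do not filter it out, signal it.
        return Optional.empty();
      }
      newKeys.add(target);
    }
    return Optional.of(newKeys);
  }
}
```

The difference from the `filter(Objects::nonNull)` version in the diff is that a partially mapped key list is never returned, so a caller cannot mistake a distribution on fewer keys for the original one.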


