Re: [PR] HIVE-28917: NPE in merge statement when checking nullability of joini… [hive]

via GitHub Mon, 05 May 2025 10:29:09 -0700


zabetak commented on code in PR #5781:
URL: https://github.com/apache/hive/pull/5781#discussion_r2073771610



##########
ql/src/java/org/apache/hadoop/hive/ql/Context.java:
##########
@@ -361,8 +361,16 @@ private DestClausePrefix getMergeDestClausePrefix(ASTNode 
curNode) {
     assert insert != null && insert.getType() == HiveParser.TOK_INSERT;
     ASTNode query = (ASTNode) insert.getParent();
     assert query != null && query.getType() == HiveParser.TOK_QUERY;
-
-    int tokFromIdx = 
query.getFirstChildWithType(HiveParser.TOK_FROM).getChildIndex();
+    ASTNode from = (ASTNode) query.getFirstChildWithType(HiveParser.TOK_FROM);
+    
+    if (from == null) {
+      // We are here when TOK_FROM is missing from the AST.
+      // This can happen for merge queries with a predicate like 
`<joining_column> is null`
+      // in the matched clause.
+      return DestClausePrefix.MERGE;
+    }

Review Comment:
   I understand that the FROM clause may be missing but why do we have to 
return `DestClausePrefix.MERGE` and not one of the other constants?



##########
iceberg/iceberg-handler/src/test/queries/positive/merge_with_null_check_on_joining_col.q:
##########
@@ -0,0 +1,24 @@
+
+create table target(a int, b int, c int) stored by iceberg 
tblproperties('format-version'='2', 'write.merge.mode'='copy-on-write');
+create table source(a int, b int, c int) stored by iceberg 
tblproperties('format-version'='2', 'write.merge.mode'='copy-on-write');
+
+-- empty plan as joining column cannot be null for matched clause
+explain cbo
+merge into target as t using source as s on t.a = s.a and t.b = s.b
+when matched and t.a is null then delete;
+
+explain cbo
+merge into target as t using source as s on t.a = s.a and t.b = s.b
+when matched and t.a is null then update set b = t.b + 10;

Review Comment:
   Does it make sense to add tests where the merge statement has multiple 
branches (delete, update, insert) and only one of them is false each time?



##########
iceberg/iceberg-handler/src/test/queries/positive/merge_with_null_check_on_joining_col.q:
##########
@@ -0,0 +1,24 @@
+
+create table target(a int, b int, c int) stored by iceberg 
tblproperties('format-version'='2', 'write.merge.mode'='copy-on-write');
+create table source(a int, b int, c int) stored by iceberg 
tblproperties('format-version'='2', 'write.merge.mode'='copy-on-write');
+
+-- empty plan as joining column cannot be null for matched clause
+explain cbo
+merge into target as t using source as s on t.a = s.a and t.b = s.b
+when matched and t.a is null then delete;
+
+explain cbo
+merge into target as t using source as s on t.a = s.a and t.b = s.b
+when matched and t.a is null then update set b = t.b + 10;
+--------------------------------------------------------------------
+
+ -- non empty plans for these queries
+explain cbo
+merge into target as t using source as s on t.a = s.a and t.b = s.b
+when not matched and t.a is null then insert values (1, 2, 3);
+
+explain cbo
+merge into target as t using source as s on t.a = s.a and t.b = s.b
+when matched and t.a is null then delete
+when matched then update set b = t.b + 10
+when not matched then insert values (1, 2, 3);

Review Comment:
   I assume we already have tests with MERGE statements and non-empty branches 
so if the above queries do not add new coverage then I would suggest to remove 
them.



##########
iceberg/iceberg-handler/src/test/queries/positive/merge_with_null_check_on_joining_col.q:
##########
@@ -0,0 +1,24 @@
+
+create table target(a int, b int, c int) stored by iceberg 
tblproperties('format-version'='2', 'write.merge.mode'='copy-on-write');
+create table source(a int, b int, c int) stored by iceberg 
tblproperties('format-version'='2', 'write.merge.mode'='copy-on-write');

Review Comment:
   It seems that the problem is not specific to iceberg but affects all kinds 
of merge statements. If that's the case it may be good to write the test case 
using regular ACID tables and the more common `TestMiniLlapLocalCliDriver`.



##########
ql/src/java/org/apache/hadoop/hive/ql/Context.java:
##########
@@ -361,8 +361,16 @@ private DestClausePrefix getMergeDestClausePrefix(ASTNode 
curNode) {
     assert insert != null && insert.getType() == HiveParser.TOK_INSERT;
     ASTNode query = (ASTNode) insert.getParent();
     assert query != null && query.getType() == HiveParser.TOK_QUERY;
-
-    int tokFromIdx = 
query.getFirstChildWithType(HiveParser.TOK_FROM).getChildIndex();
+    ASTNode from = (ASTNode) query.getFirstChildWithType(HiveParser.TOK_FROM);

Review Comment:
   According to the comment a few line above the code here assumes that the AST 
has a certain shape. I am not sure to what extend other parts of the code rely 
on this structure. Have you checked if there is a more intuitive fix when going 
from CBO to AST?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] HIVE-28917: NPE in merge statement when checking nullability of joini… [hive]

Reply via email to