kasakrisz opened a new pull request, #5267:
URL: https://github.com/apache/hive/pull/5267

   ### What changes were proposed in this pull request?
   Hive rewrites merge statements into insert overwrite statements. When the target table is an Iceberg table with the table property
   ```
   'write.merge.mode'='copy-on-write'
   ```
   the rewrite of
   ```
   explain
   merge into target_ice as t using source src ON t.a = src.a
   when matched and t.a > 100 THEN DELETE
   when not matched then insert (a, b) values (src.a, concat(src.b, '-merge new 2'));
   ```
   should be
   ```
   WITH `src` AS ( SELECT * FROM
   (SELECT `PARTITION__SPEC__ID` AS `t__PARTITION__SPEC__ID`,`PARTITION__HASH` AS `t__PARTITION__HASH`,`FILE__PATH` AS `t__FILE__PATH`,`ROW__POSITION` AS `t__ROW__POSITION`,`a` AS `t__a`, `b` AS `t__b`, `c` AS `t__c` FROM `default`.`target_ice`) `t`
     FULL OUTER JOIN
     `default`.`source` `src`
     ON `t__a` = `src`.`a`
   ),
   t AS (
     select `t__PARTITION__SPEC__ID`,`t__PARTITION__HASH`,`t__FILE__PATH`,-1,`t__a`,`t__b`,`t__c` from (
       select `t__PARTITION__SPEC__ID`,`t__PARTITION__HASH`,`t__FILE__PATH`,-1,`t__a`,`t__b`,`t__c`, row_number() OVER (partition by `t__FILE__PATH`) rn from `src`
       where `t__a` = `src`.`a` AND `t__a` > 100
     ) q
     where rn=1
   )
   INSERT INTO `default`.`target_ice`
       -- insert clause
   SELECT `t__PARTITION__SPEC__ID`,`t__PARTITION__HASH`,`t__FILE__PATH`,`t__ROW__POSITION`,`src`.`a`,concat(`src`.`b`, '-merge new 2'),null
   FROM `src`
      WHERE `t__a` IS NULL
   union all
       -- delete clause
   SELECT `t__PARTITION__SPEC__ID`,`t__PARTITION__HASH`,`t__FILE__PATH`,`t__ROW__POSITION`,`t__a`,`t__b`,`t__c`
   FROM `src`  WHERE 
     ( NOT(`t__a` = `src`.`a` AND `t__a` > 100 OR `t__a` IS NULL) OR (`t__a` = `src`.`a` AND `t__a` > 100 OR `t__a` IS NULL) IS NULL )
     AND `t__FILE__PATH` IN ( select `t__FILE__PATH` from t )
   union all
   select * from t
   ```
   However, when the values clause of the `when not matched` insert branch contains a function call with more than one parameter, the expressions in the values clause become malformed.
   The values clause is treated as a single string containing all value expressions delimited by `,`. This string is passed to `CopyOnWriteMergeRewriter`, which generates the insert statement, and the value expressions are extracted from the string by splitting it on `,`:
   https://github.com/apache/hive/blob/3be197e2ca8ad819fd41c507936ef9327571f855/ql/src/java/org/apache/hadoop/hive/ql/parse/rewrite/CopyOnWriteMergeRewriter.java#L160
   In our example
   ```
   src.a, concat(src.b, '-merge new 2')
   ```
   the function call
   ```
   concat(src.b, '-merge new 2')
   ```
   also contains a `,`, so it gets split as well. We end up with the following value expressions:
   ```
   `src`.`a`
   `concat`(`src`.`b`
   '-merge new 2')
   ```
   where
   ```
   `concat`(`src`.`b`
   '-merge new 2')
   ```
   should instead have been kept together as
   ```
   concat(`src`.`b`, '-merge new 2')
   ```
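   For illustration only (this is not the code in the patch, and the class and method names below are hypothetical): a comma split that tracks parenthesis depth and single-quoted literals keeps multi-argument calls such as `concat(src.b, '-merge new 2')` in one piece. A minimal, self-contained Java sketch:
   ```
   // Illustrative sketch only, not the actual Hive fix: split a values clause on
   // top-level commas, ignoring commas nested inside parentheses or 'string' literals.
   import java.util.ArrayList;
   import java.util.List;

   public class ValuesClauseSplitSketch {

     static List<String> splitTopLevel(String valuesClause) {
       List<String> expressions = new ArrayList<>();
       StringBuilder current = new StringBuilder();
       int depth = 0;           // parenthesis nesting level
       boolean inQuote = false; // inside a single-quoted literal

       for (int i = 0; i < valuesClause.length(); i++) {
         char c = valuesClause.charAt(i);
         if (inQuote) {
           current.append(c);
           if (c == '\'') {
             inQuote = false;
           }
         } else if (c == '\'') {
           inQuote = true;
           current.append(c);
         } else if (c == '(') {
           depth++;
           current.append(c);
         } else if (c == ')') {
           depth--;
           current.append(c);
         } else if (c == ',' && depth == 0) {
           // only a top-level comma separates two value expressions
           expressions.add(current.toString().trim());
           current.setLength(0);
         } else {
           current.append(c);
         }
       }
       if (current.length() > 0) {
         expressions.add(current.toString().trim());
       }
       return expressions;
     }

     public static void main(String[] args) {
       // prints: [src.a, concat(src.b, '-merge new 2')]
       System.out.println(splitTopLevel("src.a, concat(src.b, '-merge new 2')"));
     }
   }
   ```
   Applied to the values string from the example above, such a split yields the two expected expressions instead of three fragments.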
   ### Why are the changes needed?
   Splitting the value expressions produces an invalid value expression list and hence a malformed rewritten statement.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### Is the change a dependency upgrade?
   No.
   
   ### How was this patch tested?
   ```
   mvn test -Dtest.output.overwrite -Dtest=TestIcebergCliDriver -Dqfile=merge_iceberg_copy_on_write_unpartitioned.q -pl itests/qtest-iceberg/ -Pitests,iceberg
   mvn test -Dtest.output.overwrite -Dtest=TestMiniLlapLocalCliDriver -Dqfile=insert_into_default_keyword.q,insert_into_default_keyword_2.q -pl itests/qtest -Pitests
   ```

