kasakrisz opened a new pull request, #5267:
URL: https://github.com/apache/hive/pull/5267
### What changes were proposed in this pull request?
Hive rewrites merge statements to insert overwrite statements. When the
target table is Iceberg with table property
```
'write.merge.mode'='copy-on-write'
```
The rewrite of
```
explain
merge into target_ice as t using source src ON t.a = src.a
when matched and t.a > 100 THEN DELETE
when not matched then insert (a, b) values (src.a, concat(src.b, '-merge new
2'));
```
should be
```
WITH `src` AS ( SELECT * FROM
(SELECT `PARTITION__SPEC__ID` AS `t__PARTITION__SPEC__ID`,`PARTITION__HASH`
AS `t__PARTITION__HASH`,`FILE__PATH` AS `t__FILE__PATH`,`ROW__POSITION` AS
`t__ROW__POSITION`,`a` AS `t__a`, `b` AS `t__b`, `c` AS `t__c` FROM
`default`.`target_ice`) `t`
FULL OUTER JOIN
`default`.`source` `src`
ON `t__a` = `src`.`a`
),
t AS (
select
`t__PARTITION__SPEC__ID`,`t__PARTITION__HASH`,`t__FILE__PATH`,-1,`t__a`,`t__b`,`t__c`
from (
select
`t__PARTITION__SPEC__ID`,`t__PARTITION__HASH`,`t__FILE__PATH`,-1,`t__a`,`t__b`,`t__c`,
row_number() OVER (partition by `t__FILE__PATH`) rn from `src`
where `t__a` = `src`.`a` AND `t__a` > 100
) q
where rn=1
)
INSERT INTO `default`.`target_ice`
-- insert clause
SELECT
`t__PARTITION__SPEC__ID`,`t__PARTITION__HASH`,`t__FILE__PATH`,`t__ROW__POSITION`,`src`.`a`,concat(`src`.`b`,
'-merge new 2'),null
FROM `src`
WHERE `t__a` IS NULL
union all
-- delete clause
SELECT
`t__PARTITION__SPEC__ID`,`t__PARTITION__HASH`,`t__FILE__PATH`,`t__ROW__POSITION`,`t__a`,`t__b`,`t__c`
FROM `src` WHERE
( NOT(`t__a` = `src`.`a` AND `t__a` > 100 OR `t__a` IS NULL) OR (`t__a` =
`src`.`a` AND `t__a` > 100 OR `t__a` IS NULL) IS NULL )
AND `t__FILE__PATH` IN ( select `t__FILE__PATH` from t )
union all
select * from t
```
however when the values clause in when not matched insert branch has a
function call with more than one parameter the expressions in the values clause
are malformed.
The values clause is treated as one string which contains all the value
expressions delimited by `,` and it is passed to `CopyOnWriteMergeRewriter`
which generates the insert statement. The value expressions are extracted from
the string by splitting it
https://github.com/apache/hive/blob/3be197e2ca8ad819fd41c507936ef9327571f855/ql/src/java/org/apache/hadoop/hive/ql/parse/rewrite/CopyOnWriteMergeRewriter.java#L160
In our example
```
src.a, concat(src.b, '-merge new 2')
```
the function call
```
concat(src.b, '-merge new 2')
```
also contains a `,` and it is split. So we end up with the following value
expressions
```
`src`.`a`
`concat`(`src`.`b`
'-merge new 2')
```
where
```
`concat`(`src`.`b`
'-merge new 2')
```
should remain
```
concat(`src`.`b`, '-merge new 2')
```
### Why are the changes needed?
Splitting value expressions leads to invalid value expression list hence
malformed rewritten statement.
### Does this PR introduce _any_ user-facing change?
No.
### Is the change a dependency upgrade?
No.
### How was this patch tested?
```
mvn test -Dtest.output.overwrite -Dtest=TestIcebergCliDriver
-Dqfile=merge_iceberg_copy_on_write_unpartitioned.q -pl itests/qtest-iceberg/
-Pitests,iceberg
mvn test -Dtest.output.overwrite -Dtest=TestMiniLlapLocalCliDriver
-Dqfile=insert_into_default_keyword.q,insert_into_default_keyword_2.q -pl
itests/qtest -Pitests
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]