cshuo opened a new issue, #17641:
URL: https://github.com/apache/hudi/issues/17641
### Bug Description
**What happened:**
When the ordering field is of String type and an incoming record is marked as
a delete via `_hoodie_is_deleted` = true, the delete record is always chosen
during merging, regardless of its ordering value.
**What you expected:**
A delete record whose ordering value is smaller than that of the existing
record should not be chosen during merging.
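The expected behavior can be sketched roughly as follows. This is only an illustration of ordering-based merging with a String ordering value, not Hudi's actual merger code: the `OrderingMergeSketch` class, the `Rec` type, and the `merge`/`resolve` helpers are all hypothetical names, and letting the incoming record win ties is an assumption of this sketch.

```java
import java.util.Optional;

public class OrderingMergeSketch {

    /** Hypothetical record: key, String ordering value, and a delete flag. */
    static final class Rec {
        final String key;
        final String ordering;
        final boolean deleted;

        Rec(String key, String ordering, boolean deleted) {
            this.key = key;
            this.ordering = ordering;
            this.deleted = deleted;
        }
    }

    /**
     * Picks the record with the larger ordering value. The delete flag is
     * deliberately ignored here, so a delete competes like any other record.
     * Assumption of this sketch: the incoming record wins ties.
     */
    static Rec merge(Rec existing, Rec incoming) {
        // String.compareTo is lexicographic: "102" < "103".
        return incoming.ordering.compareTo(existing.ordering) >= 0 ? incoming : existing;
    }

    /** Applies the delete only after the winner has been decided. */
    static Optional<Rec> resolve(Rec existing, Rec incoming) {
        Rec winner = merge(existing, incoming);
        return winner.deleted ? Optional.empty() : Optional.of(winner);
    }

    public static void main(String[] args) {
        Rec stored = new Rec("id2", "103", false);

        // Delete with a smaller ordering value: the stored record survives.
        Rec staleDelete = new Rec("id2", "102", true);
        System.out.println(resolve(stored, staleDelete).isPresent()); // prints "true"

        // Delete with a larger ordering value: the record is removed.
        Rec newerDelete = new Rec("id2", "104", true);
        System.out.println(resolve(stored, newerDelete).isPresent()); // prints "false"
    }
}
```

Because the comparison is lexicographic, String ordering values of different lengths (for example '9' and '10') do not compare numerically; the reproduction in this issue uses equal-length values, so that does not affect it.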
**Steps to reproduce:**
Put the following test code in `ITTestHoodieDataSource`:
```java
@Test
void testHardDelete() throws Exception {
  ExecMode execMode = ExecMode.BATCH;
  String hoodieTableDDL = "create table t1(\n"
      + " uuid varchar(20),\n"
      + " name varchar(10),\n"
      + " age int,\n"
      + " _hoodie_is_deleted boolean,\n"
      + " `partition` varchar(20),\n"
      + " ts STRING,\n"
      + " PRIMARY KEY(uuid) NOT ENFORCED\n"
      + ")\n"
      + "PARTITIONED BY (`partition`)\n"
      + "with (\n"
      + " 'connector' = 'hudi',\n"
      + " 'table.type' = 'MERGE_ON_READ',\n"
      + " 'index.type' = 'BUCKET',\n"
      + " 'path' = '" + tempFile.getAbsolutePath() + "',\n"
      + " 'read.streaming.skip_compaction' = 'false'\n"
      + ")";
  batchTableEnv.executeSql(hoodieTableDDL);

  // first commit: two live records
  String insertInto = "insert into t1 values\n"
      + "('id1','Danny',23,false,'par1', '101'),\n"
      + "('id2','Stephen',33,false,'par1', '103')";
  execInsertSql(batchTableEnv, insertInto);

  final String expected = "["
      + "+I[id1, Danny, 23, false, par1, 101], "
      + "+I[id2, Stephen, 33, false, par1, 103]]";

  // second commit: hard delete for id2 with a smaller ordering value ('102' < '103')
  insertInto = "insert into t1 values\n"
      + "('id2','Stephen',33, true,'par1', '102')";
  execInsertSql(batchTableEnv, insertInto);

  List<Row> result2 = execSelectSql(batchTableEnv, "select * from t1", execMode);
  // expected: no record is deleted, since the delete carries the smaller ordering value
  assertRowsEquals(result2, expected);
}
```
### Environment
**Hudi version:** 0.14.1, 0.15.0
**Query engine:** Flink 1.20
**Relevant configs:**
### Logs and Stack Trace
_No response_