prashantwason opened a new issue, #18691:
URL: https://github.com/apache/hudi/issues/18691
## Bug Description
**What happened:**
`CREATE INDEX IF NOT EXISTS record_index ON <table> (<record_key_col>)` throws
`HoodieMetadataIndexException: Index already exists: record_index` when the
index already exists. The `IF NOT EXISTS` clause has no effect.
**What you expected:**
With `IF NOT EXISTS`, the command should be a no-op when the index already
exists — matching standard SQL semantics and matching the behavior implied by
the `ignoreIfExists: Boolean` field on `CreateIndexCommand`.
**Steps to reproduce:**
1. Create a Hudi COW table with a record-key column (e.g. `uuid`).
2. `CREATE INDEX record_index ON tbl (uuid)` — succeeds.
3. `CREATE INDEX IF NOT EXISTS record_index ON tbl (uuid)` — throws
`HoodieMetadataIndexException: Index already exists: record_index`.
## Root cause
`CreateIndexCommand` in
`hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/IndexCommands.scala`
parses an `ignoreIfExists: Boolean` field from the SQL but never propagates it
to `HoodieSparkIndexClient.create(...)`:
```scala
} else if (indexName.equals(HoodieTableMetadataUtil.PARTITION_NAME_RECORD_INDEX)) {
  ValidationUtils.checkArgument(...)
  new HoodieSparkIndexClient(sparkSession).create(metaClient, indexName,
    HoodieTableMetadataUtil.PARTITION_NAME_RECORD_INDEX, columnsMap,
    options.asJava, table.properties.asJava)
  // ignoreIfExists is dropped here
}
```
`HoodieSparkIndexClient.create` in
`hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/index/HoodieSparkIndexClient.java`
has no `ignoreIfExists` parameter at all, and `createRecordIndex` unconditionally
throws when the index exists:
```java
String fullIndexName = PARTITION_NAME_RECORD_INDEX;
if (indexExists(metaClient, fullIndexName)) {
  throw new HoodieMetadataIndexException("Index already exists: " + userIndexName);
}
```
The same gap exists for the column_stats/bloom_filters/secondary-index branches:
none of the `HoodieSparkIndexClient(...).create(...)` call sites in
`CreateIndexCommand.run` pass through `ignoreIfExists`.
## Suggested fix
1. Add an `ignoreIfExists: boolean` parameter to
`HoodieSparkIndexClient.create(...)`.
2. Pass it through from every branch in `CreateIndexCommand.run`.
3. In `createRecordIndex` and `createExpressionOrSecondaryIndex`, return early
(instead of throwing) when the index already exists and
`ignoreIfExists == true`.
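
The control flow in step 3 can be sketched in isolation as follows. This is a simplified, self-contained model and not the actual Hudi code: `IndexClientSketch`, `IndexAlreadyExistsException`, and the in-memory `existingIndexes` set are hypothetical stand-ins for `HoodieSparkIndexClient`, `HoodieMetadataIndexException`, and the table's index metadata. The boolean return value is also an assumption made only so the two outcomes are easy to observe.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for HoodieSparkIndexClient; models only the existence check.
class IndexClientSketch {

    // Hypothetical stand-in for HoodieMetadataIndexException.
    static class IndexAlreadyExistsException extends RuntimeException {
        IndexAlreadyExistsException(String message) {
            super(message);
        }
    }

    // Assumption: an in-memory set replaces the lookup against table metadata.
    private final Set<String> existingIndexes = new HashSet<>();

    /**
     * Creates the index unless it already exists.
     *
     * @return true if the index was created, false if creation was skipped
     *         because the index exists and ignoreIfExists is set
     */
    boolean create(String indexName, boolean ignoreIfExists) {
        if (existingIndexes.contains(indexName)) {
            if (ignoreIfExists) {
                // IF NOT EXISTS was given: treat the statement as a no-op.
                return false;
            }
            throw new IndexAlreadyExistsException("Index already exists: " + indexName);
        }
        existingIndexes.add(indexName);
        return true;
    }

    public static void main(String[] args) {
        IndexClientSketch client = new IndexClientSketch();
        // Models CREATE INDEX record_index ...
        System.out.println(client.create("record_index", false));
        // Models CREATE INDEX IF NOT EXISTS record_index ...
        System.out.println(client.create("record_index", true));
    }
}
```

With this shape, the second call returns without throwing, while a repeated plain `CREATE INDEX` (modeled as `ignoreIfExists = false`) still fails with the existing error message.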
## Notes
- The expression/secondary-index path (`createExpressionOrSecondaryIndex`,
`HoodieSparkIndexClient.java:155-159`) already silently skips re-registration
when the index exists, so observable behavior between record_index and
expression/secondary indexes already differs. The fix is a good opportunity
to unify behavior across both paths.
- `DROP INDEX IF EXISTS` works correctly today: the `ignoreIfNotExists: Boolean`
field on `DropIndexCommand` IS propagated to
`HoodieSparkIndexClient.drop(metaClient, indexName, ignoreIfNotExists)`.
`CREATE INDEX IF NOT EXISTS` is the missing symmetric path.
## Environment
**Hudi version:** 1.x (verified on 1.2; affects all releases where
record_index DDL exists)
**Query engine:** Spark 3.3
**Relevant configs:** standard MDT-enabled COW table