GitHub user xushiyan edited a discussion: Use config 
`hoodie.index.scope=global|partition` to avoid having variants of the same 
index type

Hudi supports many index types for both reads and writes: simple, bloom, bucket, 
record, etc. Some also come in scope-specific variants: global ones such as 
global_simple and global_bloom, and a partitioned one, the partitioned record index.

To simplify usage, we can add a new config 
`hoodie.index.scope=global|partition` so that separate global or partitioned 
variants are no longer needed. A legacy config like 
`hoodie.index.type=global_simple` will be equivalent to:

```
hoodie.index.type=simple
hoodie.index.scope=global
```
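As a rough illustration of that equivalence, legacy values could be rewritten into a type plus a scope before the rest of the config is read. This is only a sketch: the helper name `normalize_index_config` and the mapping table are hypothetical, and the `partitioned_record_index` entry is my reading of the proposal rather than something Hudi does today.

```python
# Hypothetical table: legacy index type -> (index type, index scope).
# Only global_simple is spelled out in the proposal; the other rows are assumed.
LEGACY_INDEX_TYPES = {
    "global_simple": ("simple", "global"),
    "global_bloom": ("bloom", "global"),
    "partitioned_record_index": ("record_index", "partition"),
}


def normalize_index_config(props):
    """Return a copy of props with a legacy index type split into type + scope."""
    result = dict(props)
    legacy = result.get("hoodie.index.type", "").lower()
    if legacy in LEGACY_INDEX_TYPES:
        index_type, scope = LEGACY_INDEX_TYPES[legacy]
        result["hoodie.index.type"] = index_type
        # Keep an explicitly set scope; a conflict is left to config validation.
        result.setdefault("hoodie.index.scope", scope)
    return result
```

Non-legacy values such as `simple` pass through unchanged, so existing configs that never used a variant name would not be affected.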

When the scope is not set, its default can be inferred from the index type, so 
that each index type keeps its current default behavior.
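A minimal sketch of how that inference might look, assuming the defaults implied by the current naming (simple, bloom, and bucket default to partition scope, record_index to global). The function name `resolve_index_scope` and the defaults table are hypothetical.

```python
# Assumed defaults, derived from how the existing index names behave.
DEFAULT_SCOPE_BY_TYPE = {
    "simple": "partition",
    "bloom": "partition",
    "bucket": "partition",
    "record_index": "global",
}


def resolve_index_scope(props):
    """Return the explicit scope if set, else the default for the index type."""
    explicit = props.get("hoodie.index.scope")
    if explicit is not None:
        return explicit
    index_type = props.get("hoodie.index.type")
    if index_type not in DEFAULT_SCOPE_BY_TYPE:
        raise ValueError(f"no default scope known for index type {index_type!r}")
    return DEFAULT_SCOPE_BY_TYPE[index_type]
```

With this shape, a user who only sets `hoodie.index.type=record_index` still gets a global index, and setting the scope explicitly is needed only to deviate from the default.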

The scope value should be validated against the index type. For example, the 
bucket index currently works only at the partition level, so explicitly setting 
its scope to global will fail config validation.
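The validation could be as simple as a lookup of supported scopes per index type. The sketch below is illustrative only: `validate_index_scope` and the `SUPPORTED_SCOPES` table are hypothetical names, and apart from the bucket restriction stated above, the supported combinations are assumptions.

```python
# Assumed support matrix; only the bucket restriction comes from the proposal.
SUPPORTED_SCOPES = {
    "simple": {"global", "partition"},
    "bloom": {"global", "partition"},
    "record_index": {"global", "partition"},
    "bucket": {"partition"},
}


def validate_index_scope(index_type, scope):
    """Raise ValueError if the scope is not supported by the index type."""
    supported = SUPPORTED_SCOPES.get(index_type)
    if supported is None:
        raise ValueError(f"unknown index type: {index_type!r}")
    if scope not in supported:
        raise ValueError(
            f"index type {index_type!r} does not support scope {scope!r}; "
            f"supported: {sorted(supported)}"
        )
```

Calling `validate_index_scope("bucket", "global")` raises a `ValueError`, while `validate_index_scope("simple", "global")` passes silently.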

Although this adds one more config, it makes the index names cleaner and more 
consistent. For example, we have global_simple and global_bloom as global 
variants, but record_index is itself a global index, with 
partitioned_record_index working at the partition level. The new index scope 
config also makes it more explicit to users how the chosen index type behaves 
and how uniqueness is enforced: either across partitions or within a partition.



GitHub link: https://github.com/apache/hudi/discussions/13864
