+1000. It is also worth noting that, while correcting the historical parameters, we should establish a mechanism (perhaps Checkstyle?) to constrain the inevitable future modifications to parameters. Looking forward to a more detailed discussion in the RFC.
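Checkstyle works on Java sources, so the following is only a language-neutral sketch (in Python) of the rules such a guard could enforce on config keys, drawn from the standards proposed in the quoted thread below. The `Namespace` members, the `naming_violations` helper, and the `multi_valued` flag are all hypothetical. A regex also cannot tell that `recordkey` should be `record_key`, and the enum would have to grow for areas like `file_group`, so this is a safety net rather than a replacement for review.

```python
import re
from enum import Enum


class Namespace(Enum):
    """Function areas named in the proposal (hypothetical, incomplete list)."""
    TABLE = "table"
    WRITE = "write"
    READ = "read"
    COMPACTION = "compaction"
    CLUSTERING = "clustering"
    CLEANING = "cleaning"
    INDEXING = "indexing"
    STORAGE = "storage"
    CATALOG = "catalog"


ALLOWED_NAMESPACES = {ns.value for ns in Namespace}
# One dot-delimited segment: lowercase words joined by single underscores.
SNAKE_SEGMENT = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def naming_violations(key: str, multi_valued: bool = False) -> list:
    """Return the proposed naming rules that `key` breaks (empty list if none)."""
    parts = key.split(".")
    if len(parts) < 3 or parts[0] != "hoodie":
        return ["must look like hoodie.<function area>.<name>"]
    problems = []
    # Namespaces come from a fixed set of constants, never ad hoc words.
    if parts[1] not in ALLOWED_NAMESPACES:
        problems.append(f"unknown namespace '{parts[1]}'")
    # Every segment after "hoodie" must be snake_case.
    for part in parts[1:]:
        if not SNAKE_SEGMENT.fullmatch(part):
            problems.append(f"segment '{part}' is not snake_case")
    # Feature switches end with "enabled", not "enable".
    if parts[-1] == "enable":
        problems.append("feature switches should end with 'enabled'")
    # Crude plural heuristic: a multi-valued config's last segment should end in "s".
    if multi_valued and not parts[-1].endswith("s"):
        problems.append("multi-valued config should use a plural name")
    return problems
```

For example, `naming_violations("hoodie.datasource.write.precombine.field", multi_valued=True)` reports both the non-standard `datasource` namespace and the singular name. In a Java build, the same checks could run as a unit test over every declared config key instead of as a Checkstyle module; either way, the goal is to fail the build when a new key breaks the rules.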
Best,
zhangyue19921010

At 2025-08-01 08:36:52, "Danny Chan" wrote:
>+1, this is historical technical debt; we should fix it now that our
>notions/terminologies have largely stabilized.
>
>Best,
>Danny
>
>On Thu, Jul 31, 2025 at 18:13, Bhavani Sudha wrote:
>>
>> +1 on the idea, Shiyan. Love to see an RFC as a next step.
>>
>> Thanks,
>> Sudha
>>
>> On Thu, Jul 31, 2025 at 1:37 AM Geser Dugarov wrote:
>>
>> > Hi Shiyan!
>> >
>> > I totally support this proposal and I'm happy to help if needed.
>> >
>> > I just want to highlight the scope of this work: currently, we have 989
>> > configuration parameters. I had analyzed this earlier and have updated the
>> > list after receiving your message. You can check it here:
>> >
>> > https://docs.google.com/spreadsheets/d/1a6BZbL5EmuTbftA2dShvSa0WeSOVNV2u/edit?usp=sharing&ouid=117459384969247807552&rtpof=true&sd=true
>> >
>> > It might be a good idea to prepare a corresponding RFC to define naming
>> > standards for configuration parameters. It's also crucial that the RFC
>> > includes a clear plan for the steps of renaming, deprecation, and dropping
>> > aliases across different groups of parameters; there are simply too many
>> > to manage without a structured approach.
>> >
>> > Best regards,
>> > Geser
>> >
>> >
>> > On Thu, Jul 31, 2025 at 8:14 AM Shiyan Xu wrote:
>> >
>> > > Hi all,
>> > >
>> > > Since config names are the first thing users see when working with Hudi,
>> > > and they directly impact user and developer experience, we should pay
>> > > careful attention to keeping them standardized and easy to remember and
>> > > use. I wanted to start this thread to raise some points so we can
>> > > establish a set of standards and create a migration path.
>> > >
>> > > 1. Plural vs Singular
>> > >
>> > > If a config supports taking multiple values, its name has to be plural
>> > > where applicable. For example, since Hudi 1.1 we support multiple
>> > > ordering fields, so we should make `hoodie.datasource.write.precombine.field`
>> > > plural. To show some seriousness, treat this kind of misleading config
>> > > name (singular but supports multiple values) as a bug.
>> > >
>> > > 2. Namespaces
>> > >
>> > > Always start with `hoodie.<function area>.` as the namespace to denote
>> > > the area the config serves. For example, `hoodie.table.*` is always a
>> > > table config, `hoodie.write.*` is meant for writers to set,
>> > > `hoodie.read.*` is meant for query engines to use,
>> > > `hoodie.<compaction|clustering|cleaning|indexing>.*` always denotes table
>> > > service specific configs, `hoodie.<storage>.*` indicates configs that
>> > > control storage layer settings, and `hoodie.table.metadata.*` is specific
>> > > to the metadata table.
>> > >
>> > > Keep these namespaces a fixed set of constants (a mandatory enum for
>> > > composing config names), and do not casually change the words, like
>> > > `compaction` vs `compact`, or `cleaning` vs `clean`.
>> > >
>> > > 3. snake_case
>> > >
>> > > Use `.` to delimit functionally distinct words and `_` (snake_case) to
>> > > connect a meaningful phrase. For example:
>> > >
>> > > - `hoodie.table.recordkey.fields` should be
>> > >   `hoodie.table.record_key.fields`, as `recordkey` is not one word and
>> > >   should follow snake_case.
>> > > - `hoodie.table.keygenerator.class` should be
>> > >   `hoodie.table.key_generator.class`, for a similar reason.
>> > > - `hoodie.table.index.defs.path` should be `hoodie.table.index_defs.path`:
>> > >   "index defs" taken together means one thing, but reading the words
>> > >   separately as "index" and "defs" does not convey meaningful information
>> > >   about this config.
>> > > - `hoodie.file.group.reader.enabled` should be
>> > >   `hoodie.file_group.reader.enabled`, for a similar reason.
>> > >
>> > > 4. `hoodie.properties` only for catalog/table configs
>> > >
>> > > Only keep catalog/table configs in `hoodie.properties`; keep configs like
>> > > `hoodie.datasource.write.*` out of it, and add new table configs for
>> > > those that do not have a table config alias. For example, remove
>> > > `hoodie.datasource.write.hive_style_partitioning` and put
>> > > `hoodie.table.hive_style_partitioning` instead.
>> > >
>> > > 5. Improve naming case by case
>> > >
>> > > Some examples to consider:
>> > > - Move all `hoodie.datasource.write.*` configs to `hoodie.write.*` to
>> > >   keep things shorter.
>> > > - All feature-switching configs end with `enabled`; do not mix in
>> > >   `enable`.
>> > > - Move all meta/hive-sync related configs to `hoodie.catalog.sync.*`,
>> > >   clearly stating that they work with catalogs and that their function
>> > >   is "sync".
>> > >
>> > > 6. Standardize shorthand property names in SQL TBLPROPERTIES
>> > >
>> > > Everyone's first example of running Hudi has contained something like
>> > > this:
>> > >
>> > >   TBLPROPERTIES (
>> > >     primaryKey = 'id',
>> > >     preCombineField = 'ts'
>> > >   );
>> > >
>> > > Let's fix it:
>> > >
>> > > - "record key" is the term in Hudi, so we don't want people to have to
>> > >   remember that "primary key means record key"; also make sure the
>> > >   plural rule applies.
>> > > - "ordering field" is the newer term, so let's deprecate the term
>> > >   "pre-combine field", and make sure the plural rule applies too.
>> > > - Again, snake_case all the way, so it should look like below (omitting
>> > >   the `hoodie.table.` namespace) so people can easily associate the keys
>> > >   with their full names:
>> > >
>> > >   TBLPROPERTIES (
>> > >     record_key.fields = 'id',
>> > >     ordering.fields = 'ts'
>> > >   );
>> > >
>> > > - In cases where non-table configs need to be put in TBLPROPERTIES(), we
>> > >   can just omit `hoodie.` since we have `USING HUDI` in the SQL, so it
>> > >   should support `read.*`, `write.*`, `storage.*` sorts of shorthand keys.
>> > >
>> > > 7. Address discrepancies between Flink options and Spark options
>> > >
>> > > Do a one-time sweep of Flink configs that diverge from Spark configs, and
>> > > align them according to the standards we're making. The goals are:
>> > >
>> > > - All `hoodie.*` configs should be engine-agnostic and universally
>> > >   accepted by all engines when applicable.
>> > > - Any engine-specific config should be owned by the engine and start
>> > >   with `hudi.` (like the Trino Hudi connector does now).
>> > >
>> > >
>> > > About migration: we should start adding new config names while keeping
>> > > the old ones compatible as aliases. That means, throughout the codebase,
>> > > config variables will use the standard strings as their names, and any
>> > > user-provided config will be translated to its new name if applicable.
>> > >
>> > > We don't really want to fail writers/readers just because of old config
>> > > names, so we can keep the aliases for quite some time, but there have to
>> > > be deprecation warnings from now on, and the aliases should be dropped at
>> > > some major release (like 2.0 or 3.0). Before that, any table version
>> > > upgrade should strive to rename the configs in `hoodie.properties`
>> > > according to the standards, to evangelize the new names.
>> > >
>> > > Best,
>> > > Shiyan
>> > >
>> >
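The migration approach described in the thread (translate old names to new ones, emit deprecation warnings, keep the aliases until a major release) can be sketched as a simple alias table. This is a minimal Python illustration, not Hudi's actual config API: `DEPRECATED_ALIASES` and `normalize_configs` are hypothetical names, the entries are only the renames explicitly proposed above, and letting the new name win when a user sets both is an assumed precedence rule.

```python
import warnings

# Hypothetical alias table (old name -> standardized name), seeded with the
# renames proposed in this thread; a real table would cover every renamed key.
DEPRECATED_ALIASES = {
    "hoodie.table.recordkey.fields": "hoodie.table.record_key.fields",
    "hoodie.table.keygenerator.class": "hoodie.table.key_generator.class",
    "hoodie.table.index.defs.path": "hoodie.table.index_defs.path",
    "hoodie.file.group.reader.enabled": "hoodie.file_group.reader.enabled",
    "hoodie.datasource.write.hive_style_partitioning": "hoodie.table.hive_style_partitioning",
}


def normalize_configs(user_configs: dict) -> dict:
    """Translate deprecated keys to their new names, warning for each old key.

    Old names are accepted rather than rejected, so existing writers/readers
    keep working. If both the old and the new name are set, the new name wins
    (an assumed precedence rule).
    """
    # Start from every key that is not a deprecated alias.
    normalized = {k: v for k, v in user_configs.items() if k not in DEPRECATED_ALIASES}
    for old_key, value in user_configs.items():
        new_key = DEPRECATED_ALIASES.get(old_key)
        if new_key is None:
            continue
        warnings.warn(
            f"Config '{old_key}' is deprecated; use '{new_key}' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # setdefault keeps a value the user already provided under the new name.
        normalized.setdefault(new_key, value)
    return normalized
```

Dropping an alias at a major release would then amount to deleting its entry, and the same table could drive the `hoodie.properties` rewrite during a table version upgrade.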

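Point 6 of Shiyan's proposal implies a simple rule for resolving shorthand TBLPROPERTIES keys to full config names. A minimal sketch, assuming that keys starting with `read.`, `write.`, or `storage.` map under `hoodie.` and every other shorthand key maps under `hoodie.table.`; `expand_tblproperty_key` and `expand_tblproperties` are hypothetical helpers, not part of Hudi's SQL layer.

```python
# Shorthand prefixes that, per point 6, only drop the leading "hoodie."
NON_TABLE_PREFIXES = ("read.", "write.", "storage.")


def expand_tblproperty_key(key: str) -> str:
    """Resolve a shorthand TBLPROPERTIES key to a full config name."""
    if key.startswith("hoodie."):
        # Already a fully qualified name; leave it untouched.
        return key
    if key.startswith(NON_TABLE_PREFIXES):
        # Non-table configs: `USING HUDI` makes the "hoodie." prefix redundant.
        return "hoodie." + key
    # Everything else is treated as a table config under hoodie.table.*
    return "hoodie.table." + key


def expand_tblproperties(props: dict) -> dict:
    """Expand every key of a TBLPROPERTIES map."""
    return {expand_tblproperty_key(k): v for k, v in props.items()}
```

With this rule, `record_key.fields = 'id'` resolves to `hoodie.table.record_key.fields`. One consequence of the design: no table config name could itself begin with `read.`, `write.`, or `storage.`, or its shorthand would be ambiguous, so the reserved prefixes would need to stay disjoint from table config names.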