yihua opened a new pull request, #18867:
URL: https://github.com/apache/hudi/pull/18867
### Describe the issue this Pull Request addresses
Updates the website documentation to cover features new in Hudi **1.2.0**
(632 commits in `release-1.1.0...release-1.2.0`). Release notes and the
auto-generated config reference (`configurations.md` /
`basic_configurations.md`) are tracked separately and are not in scope here.
### Summary and Changelog
31 files under `website/docs/` changed (1457 insertions / 84 deletions),
split into 15 logical commits so each can be reviewed against its source PRs.
Highlights:
- **Critical corrections** in already-published AI pages (verified directly against source):
  - `lance_file_format.md`: fix wrong config key (`hoodie.datasource.write.base.file.format` does not exist → `hoodie.table.base.file.format`); correct the indexing claim (column stats / partition stats are auto-disabled on Lance; only bloom filters work).
  - `blob_unstructured_data.md`: fix wrong default for `hoodie.read.blob.inline.mode` (`DESCRIPTOR`, not `CONTENT`); add the required `managed:boolean` field to every BLOB `reference` struct example; add a caution about `read_blob()` under DESCRIPTOR mode.
  - `variant_type.md`: split the Flink row by version; VARIANT requires Flink 2.1+.
  - `vector_search.md`: add the Flink constraint (Flink cannot read VECTOR columns).
- **Net-new sections:** Flink Source V2 (RFC-95), RLI-based bucket indexing for Flink, append write-buffer modes (`write.buffer.type`), lookup join, Azure storage-based lock provider, pre-write validators, JsonKinesisSource, `show_timeline` procedure, two CLI commands, archival/post-commit/rollback metrics.
- **Engine matrices:** new Hudi 1.2.x row in the Flink support matrix (Flink 2.1.x is the new default build); updated Spark patch versions (3.4.3 / 3.5.5 / 4.0.2 / 4.1.1); Java 11 as the minimum build target.
- **Many new configs** documented across cleaning, clustering, compaction, metadata, sync, write, and key-generation pages.
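For reviewers who want to sanity-check the corrected keys and defaults, they can be exercised with a small pre-flight check on a plain options map. This is a hypothetical sketch, not Hudi code: the helper names are illustrative and `LANCE` is an assumed value for the format key. Only the two config keys, the `DESCRIPTOR` default, and the Flink 2.1+ requirement for VARIANT come from this PR.

```python
# Hypothetical pre-flight helpers (not part of Hudi). Only the key names,
# the DESCRIPTOR default, and the Flink 2.1+ VARIANT requirement come from
# the corrections listed in this PR.
WRONG_FORMAT_KEY = "hoodie.datasource.write.base.file.format"  # does not exist
FORMAT_KEY = "hoodie.table.base.file.format"
BLOB_MODE_KEY = "hoodie.read.blob.inline.mode"
BLOB_MODE_DEFAULT = "DESCRIPTOR"


def normalize_options(options: dict[str, str]) -> dict[str, str]:
    """Return a copy with the non-existent format key moved to the real one."""
    fixed = dict(options)
    if WRONG_FORMAT_KEY in fixed:
        value = fixed.pop(WRONG_FORMAT_KEY)
        # If the correct key is already set explicitly, keep its value.
        fixed.setdefault(FORMAT_KEY, value)
    return fixed


def effective_blob_mode(options: dict[str, str]) -> str:
    """Blob inline read mode, falling back to the documented default."""
    return options.get(BLOB_MODE_KEY, BLOB_MODE_DEFAULT)


def flink_supports_variant(version: str) -> bool:
    """VARIANT needs Flink 2.1 or newer; compares only major.minor."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (2, 1)


if __name__ == "__main__":
    # "LANCE" is an assumed value used only for illustration.
    print(normalize_options({WRONG_FORMAT_KEY: "LANCE"}))
    print(effective_blob_mode({}))
    print(flink_supports_variant("1.20.1"), flink_supports_variant("2.1.0"))
```

Comparing `(major, minor)` tuples rather than raw strings avoids treating `"1.20"` as newer than `"2.1"`.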
### Per-commit → feature PR mapping (for review)
| Commit | Pages | Source PRs |
|---|---|---|
| Document Lance file format updates | `lance_file_format.md` | #17731, #17660, #17632, #17629, #18341, #18613, #18304, #17904, #17768, #18498, #18760, #18481, #18775, #18678, #18588, #18497, #18575, #18586, #18744, #18042, #17769 |
| Document BLOB unstructured data updates | `blob_unstructured_data.md` | #18108, #18013, #18728, #18098, #18347, #18538, #18744, #18575, #18586, #18736, #18581, #18580, #18566, #18482, #18643 |
| Document VARIANT type updates | `variant_type.md` | #13743, #17833, #18036, #17751, #18274, #18564, #18674, #18483, #18511, #18510 |
| Document VECTOR / vector search updates | `vector_search.md` | #18146, #18190, #18328, #18488, #18431, #18545, #18540, #18432, #14218, #18480, #18712 |
| Document Flink Source V2 (RFC-95) and writer features | `ingestion_flink.md`, `flink_tuning.md`, `indexes.md`, `reading_tables_batch_reads.md`, `reading_tables_streaming_reads.md` | #13381, #17989, #18022, #18212, #18074, #18406, #18369, #18370, #17490, #17580, #14186, #14183, #14205, #18361, #18436, #18206, #14309, #18083, #13892, #17864, #18319, #18231, #18193, #17803, #17802, #18254, #18484, #18560, #17838, #18103, #18444, #17991, #18283, #18127, #18250, #14202, #13259, #14087 |
| Update Flink support matrix and download snippets | `flink-quick-start-guide.md` | #17574, #18567 |
| Document new cleaning options | `cleaning.md` | #18337, #18587, #18322, #17550, #17935, #17819, #18197 |
| Document new clustering plan strategies | `clustering.md` | #18251, #18174, #18302, #18191, #18172, #18409 |
| Document compaction, metadata table, and RLI updates | `compaction.md`, `metadata.md`, `metadata_indexing.md` | #18295, #17603, #18012, #18306, #18181, #17996, #18380, #18353, #14244, #14180, #14215, #17461 |
| Document Azure storage-based lock provider and concurrency | `concurrency_control.md`, `azure_hoodie.md` | #17951, #13886, #18439, #17869, #17870, #17871, #18593, #18123, #18448, #18280 |
| Document new validator frameworks | `precommit_validator.md` | #18239, #18362, #18068, #17505, #18128 |
| Document ingestion and sync updates | `hoodie_streaming_ingestion.md`, `syncing_metastore.md`, `syncing_aws_glue_data_catalog.md` | #18224, #18689, #18088, #17777, #18204, #18385, #18203, #18227, #18307, #18707, #18064 |
| Document engine support, SQL, and partitioning updates | `quick-start-guide.md`, `deployment.md`, `key_generation.md`, `sql_queries.md`, `sql_ddl.md` | #17674, #18549, #17637, #18292, #17514, #18205, #18297, #17787, #18195, #18126, #18086, #18738, #17994 |
| Document new procedures, CLI commands, and metrics | `procedures.md`, `cli.md`, `metrics.md` | #14261, #17940, #17511, #18133, #18148, #18196, #18179, #17945, #17518 |
| Document new payloads and write-side configs | `record_merger.md`, `writing_data.md` | #17928, #18413, #18421, #18379, #17495 |
### Impact
User-facing documentation only. No code or build changes.
### Risk Level
low — documentation only; every config key, default, class name, SQL/DDL
syntax, and procedure parameter was verified directly against the 1.2.0 source
before being written.
### Documentation Update
This PR _is_ the documentation update for 1.2.0 features. The auto-generated
config reference (`configurations.md` / `basic_configurations.md`) and the
1.2.0 release notes are tracked separately.
### Contributor's checklist
- [x] Read through [contributor's
guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Enough context is provided in the sections above
- [ ] Adequate tests were added if applicable — N/A (documentation-only)