yihua opened a new pull request, #18867:
URL: https://github.com/apache/hudi/pull/18867

   ### Describe the issue this Pull Request addresses
   
   Updates the website documentation to cover features new in Hudi **1.2.0** 
(632 commits in `release-1.1.0...release-1.2.0`). Release notes and the 
auto-generated config reference (`configurations.md` / 
`basic_configurations.md`) are tracked separately and are not in scope here.
   
   ### Summary and Changelog
   
   31 files under `website/docs/` changed (1457 insertions / 84 deletions), 
split into 15 logical commits so each can be reviewed against its source PRs. 
Highlights:
   
   - **Critical corrections** in already-published AI-related pages (Lance, BLOB, VARIANT, vector search), verified directly against the source:
     - `lance_file_format.md`: replace the nonexistent config key `hoodie.datasource.write.base.file.format` with `hoodie.table.base.file.format`; correct the indexing claim (column stats and partition stats are auto-disabled on Lance; only bloom filters work).
     - `blob_unstructured_data.md`: fix the default for `hoodie.read.blob.inline.mode` (it is `DESCRIPTOR`, not `CONTENT`); add the required `managed:boolean` field to every BLOB `reference` struct example; add a caution about calling `read_blob()` under `DESCRIPTOR` mode.
     - `variant_type.md`: split the Flink row by version, since VARIANT requires Flink 2.1+.
     - `vector_search.md`: add the Flink constraint (Flink cannot read VECTOR columns).
   - **Net-new sections:** Flink Source V2 (RFC-95), RLI-based bucket indexing for Flink, append write-buffer modes (`write.buffer.type`), lookup join, Azure storage-based lock provider, pre-write validators, JsonKinesisSource, the `show_timeline` procedure, two CLI commands, and archival/post-commit/rollback metrics.
   - **Engine matrices:** Hudi 1.2.x row in the Flink support matrix (Flink 2.1.x is the new default build); Spark patch versions (3.4.3 / 3.5.5 / 4.0.2 / 4.1.1); Java 11 as the minimum build target.
   - **Many new configs** documented across cleaning, clustering, compaction, 
metadata, sync, write, and key-generation pages.
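
A minimal sketch of the two corrected keys, expressed as option maps for PySpark's `DataFrameWriter.options(**...)` and `DataFrameReader.options(**...)`. This PR supplies only the key names and the `DESCRIPTOR` default. The `"LANCE"` value spelling, the helper names, and the restriction to two inline modes are assumptions, so treat the updated pages as authoritative.

```python
# Hedged sketch: option maps for PySpark DataFrameWriter/DataFrameReader.
# The hoodie.* keys are the ones corrected in this PR; the "LANCE" value
# spelling and the helper names are illustrative assumptions.

def lance_write_options(table_name: str) -> dict:
    """Write options for a Hudi table whose base files are stored as Lance."""
    return {
        "hoodie.table.name": table_name,
        # Correct key; `hoodie.datasource.write.base.file.format` does not exist.
        "hoodie.table.base.file.format": "LANCE",  # assumed value spelling
    }


def blob_read_options(inline_mode: str = "DESCRIPTOR") -> dict:
    """Read options for BLOB columns; DESCRIPTOR (not CONTENT) is the default."""
    mode = inline_mode.upper()
    # Only the two modes named in the docs are accepted here (assumption).
    if mode not in ("DESCRIPTOR", "CONTENT"):
        raise ValueError(f"unsupported hoodie.read.blob.inline.mode: {inline_mode}")
    return {"hoodie.read.blob.inline.mode": mode}
```

Usage, assuming an active `SparkSession` named `spark`, a DataFrame `df`, and a table path `base_path`: `df.write.format("hudi").options(**lance_write_options("t1")).mode("append").save(base_path)` and `spark.read.format("hudi").options(**blob_read_options("CONTENT")).load(base_path)`.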
   
   ### Per-commit → feature PR mapping (for review)
   
   | Commit | Pages | Source PRs |
   |---|---|---|
   | Document Lance file format updates | `lance_file_format.md` | #17731, #17660, #17632, #17629, #18341, #18613, #18304, #17904, #17768, #18498, #18760, #18481, #18775, #18678, #18588, #18497, #18575, #18586, #18744, #18042, #17769 |
   | Document BLOB unstructured data updates | `blob_unstructured_data.md` | #18108, #18013, #18728, #18098, #18347, #18538, #18744, #18575, #18586, #18736, #18581, #18580, #18566, #18482, #18643 |
   | Document VARIANT type updates | `variant_type.md` | #13743, #17833, #18036, #17751, #18274, #18564, #18674, #18483, #18511, #18510 |
   | Document VECTOR / vector search updates | `vector_search.md` | #18146, #18190, #18328, #18488, #18431, #18545, #18540, #18432, #14218, #18480, #18712 |
   | Document Flink Source V2 (RFC-95) and writer features | `ingestion_flink.md`, `flink_tuning.md`, `indexes.md`, `reading_tables_batch_reads.md`, `reading_tables_streaming_reads.md` | #13381, #17989, #18022, #18212, #18074, #18406, #18369, #18370, #17490, #17580, #14186, #14183, #14205, #18361, #18436, #18206, #14309, #18083, #13892, #17864, #18319, #18231, #18193, #17803, #17802, #18254, #18484, #18560, #17838, #18103, #18444, #17991, #18283, #18127, #18250, #14202, #13259, #14087 |
   | Update Flink support matrix and download snippets | `flink-quick-start-guide.md` | #17574, #18567 |
   | Document new cleaning options | `cleaning.md` | #18337, #18587, #18322, #17550, #17935, #17819, #18197 |
   | Document new clustering plan strategies | `clustering.md` | #18251, #18174, #18302, #18191, #18172, #18409 |
   | Document compaction, metadata table, and RLI updates | `compaction.md`, `metadata.md`, `metadata_indexing.md` | #18295, #17603, #18012, #18306, #18181, #17996, #18380, #18353, #14244, #14180, #14215, #17461 |
   | Document Azure storage-based lock provider and concurrency | `concurrency_control.md`, `azure_hoodie.md` | #17951, #13886, #18439, #17869, #17870, #17871, #18593, #18123, #18448, #18280 |
   | Document new validator frameworks | `precommit_validator.md` | #18239, #18362, #18068, #17505, #18128 |
   | Document ingestion and sync updates | `hoodie_streaming_ingestion.md`, `syncing_metastore.md`, `syncing_aws_glue_data_catalog.md` | #18224, #18689, #18088, #17777, #18204, #18385, #18203, #18227, #18307, #18707, #18064 |
   | Document engine support, SQL, and partitioning updates | `quick-start-guide.md`, `deployment.md`, `key_generation.md`, `sql_queries.md`, `sql_ddl.md` | #17674, #18549, #17637, #18292, #17514, #18205, #18297, #17787, #18195, #18126, #18086, #18738, #17994 |
   | Document new procedures, CLI commands, and metrics | `procedures.md`, `cli.md`, `metrics.md` | #14261, #17940, #17511, #18133, #18148, #18196, #18179, #17945, #17518 |
   | Document new payloads and write-side configs | `record_merger.md`, `writing_data.md` | #17928, #18413, #18421, #18379, #17495 |
   
   ### Impact
   
   User-facing documentation only. No code or build changes.
   
   ### Risk Level
   
   low — documentation only; every config key, default, class name, SQL/DDL 
syntax, and procedure parameter was verified directly against the 1.2.0 source 
before being written.
   
   ### Documentation Update
   
   This PR _is_ the documentation update for 1.2.0 features. The auto-generated 
config reference (`configurations.md` / `basic_configurations.md`) and the 
1.2.0 release notes are tracked separately.
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Enough context is provided in the sections above
   - [ ] Adequate tests were added if applicable — N/A (documentation-only)
   

