All, Hudi 1.0 is entering voting soon. It's a large community effort from so many people here. Thank you! Please test/provide feedback on Slack or on the dev mailing list. Watch for a separate vote email on the dev list.
*New docs are live* (they will be updated continuously over the next few days, but they have already been redone to reflect all new features and usage).
- https://hudi.apache.org/docs/next/overview (note the “next” in the URL)
- Docs will get finalized once the community ratifies the release.

*Notable changes to docs.*
- Use-cases: https://hudi.apache.org/docs/next/use_cases (lands the core use-cases while talking about the design differences that make Hudi shine for them)
- Python/Rust guide: https://hudi.apache.org/docs/next/python-rust-quick-start-guide (about Hudi in those ecosystems)
- Hudi stack page: https://hudi.apache.org/docs/next/hudi_stack (revamped to pull out the table format and clearly show what Hudi adds on top; aligned with a seminal database academic paper)
- Timeline page: https://hudi.apache.org/docs/next/timeline (explains time, and specifically how Hudi implements TrueTime, which also powers Google Cloud Spanner, Cockroach, etc.)
- Storage format versioning: https://hudi.apache.org/docs/next/storage_layouts#storage-format-versioning (explains how we are making upgrades easy, with backwards-compatible writing)
- Write operations: https://hudi.apache.org/docs/next/write_operations (clearly split into incremental and batch operations, showing the benefits to data pipelines)
- Record merger: https://hudi.apache.org/docs/next/record_merger (again, here we talk about differentiation on incremental pipelines/streaming workloads, stuff like event time ordering that no other system can do)

*Finally, the headliners & brand new, industry-first features:*
- Indexing page: https://hudi.apache.org/docs/next/indexes#multi-modal-indexing (we’re bringing secondary indexes to the lakehouse)
  - Look at the examples in https://hudi.apache.org/docs/next/quick-start-guide#indexing and https://hudi.apache.org/docs/next/sql_ddl#create-index
  - You can now build an index on any column (https://hudi.apache.org/docs/next/metadata_indexing) and accelerate any IN and = queries on those columns.
- Non-blocking concurrency control: https://hudi.apache.org/docs/next/sql_dml#non-blocking-concurrency-control-experimental (for Flink users, and in general for people unhappy with OCC or looking for best-in-class concurrency control)
- Finally, partial update encoding: https://www.youtube.com/watch?v=mEwhBdOl53o (we are seeing about an 85% reduction in data written and a 30-60% drop in write latencies)

On Mon, Dec 2, 2024 at 9:56 PM sagar sumit <cod...@apache.org> wrote:

> Hello Everyone,
>
> We are very close to cutting the RC1 for Hudi 1.0. For a preview of what's
> coming, please take a look at the updated docs. To begin with, check out
> the `Use Cases` doc [1] that throws more light on Streaming/CDC use
> cases. Then, of course my favorite docs are in `Design & Concepts`
> section such as:
>
> Apache Hudi Stack [2]
> Storage Layout [3]
> Timeline [4]
> Write Operations [5]
> Table and Query Types [6]
>
> We are actively updating the website. As the RM, I would like to invite you
> to check out the docs and try out RC1 yourself, and provide us feedback.
>
> So, gear up for an amazing Hudi 1.0!
>
> Regards,
> Sagar
>
> [1] https://hudi.apache.org/docs/next/use_cases
> [2] https://hudi.apache.org/docs/next/hudi_stack
> [3] https://hudi.apache.org/docs/next/storage_layouts
> [4] https://hudi.apache.org/docs/next/timeline
> [5] https://hudi.apache.org/docs/next/write_operations
> [6] https://hudi.apache.org/docs/next/table_types
>