The Apache Hudi team is pleased to announce the release of Apache Hudi 0.6.0.
Apache Hudi (pronounced Hoodie) stands for Hadoop Upserts Deletes and Incrementals. Apache Hudi manages storage of large analytical datasets on DFS (cloud stores, HDFS, or any Hadoop FileSystem-compatible storage) and provides the ability to query them.

This release comes 2 months after 0.5.3. It includes more than 200 resolved issues, comprising new features, performance improvements, general improvements, and bug fixes.

Hudi 0.6.0 introduces mechanisms to efficiently bootstrap large datasets into Hudi without having to copy the data (experimental feature), via both the Spark datasource writer and the DeltaStreamer tool. A new index (HoodieSimpleIndex) is added that can be faster than the bloom index when updates/deletes are spread across a large portion of the table. With this version, rollbacks are done using marker files, and supporting upgrade and downgrade infrastructure is provided to users for a smooth transition. The HoodieMultiDeltaStreamer tool (experimental feature) is added in this version to support ingesting multiple Kafka streams in a single DeltaStreamer deployment, improving the operational experience.

Bulk inserts are further improved by avoiding any dataframe-to-RDD conversions, accompanied by configurable sorting modes. While this conversion is not a bottleneck for upserts/deletes, subsequent releases will extend the optimization to other write operations. Other performance improvements include support for async compaction for Spark streaming writes.
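As a rough illustration of how the bulk insert and index features above might be switched on from the Spark datasource writer, here is a minimal Python sketch that only builds option dictionaries. The helper names, the table name and the field names are hypothetical, and the option keys are assumptions based on Hudi's write configs; please check them against the configuration reference for your version before relying on them.

```python
# Hypothetical helpers for illustration only. The option keys below are
# assumed Hudi write configs; verify them against the configuration docs.
ALLOWED_SORT_MODES = {"GLOBAL_SORT", "PARTITION_SORT", "NONE"}


def bulk_insert_options(table_name, record_key, partition_field,
                        sort_mode="GLOBAL_SORT"):
    """Build datasource write options for a bulk insert with a chosen sort mode."""
    if sort_mode not in ALLOWED_SORT_MODES:
        raise ValueError(f"unsupported sort mode: {sort_mode}")
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.operation": "bulk_insert",
        # Assumed switch for the row-based path that skips dataframe-to-RDD conversion.
        "hoodie.datasource.write.row.writer.enable": "true",
        "hoodie.bulkinsert.sort.mode": sort_mode,
    }


def simple_index_options():
    """Options selecting the simple index for upsert/delete workloads."""
    return {"hoodie.index.type": "SIMPLE"}


# Table and field names here are placeholders.
opts = bulk_insert_options("trips", "uuid", "partitionpath",
                           sort_mode="PARTITION_SORT")
```

With a SparkSession and a Hudi bundle on the classpath, a dictionary like this would typically be passed as `df.write.format("hudi").options(**opts).mode("overwrite").save(base_path)`, where `df` and `base_path` are your own dataframe and table location. Whether the sort mode applies to the row-writer path in a given release is not stated in this announcement.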
For details on how to use Hudi, please look at the quick start page located at:
https://hudi.apache.org/docs/quick-start-guide.html

If you'd like to download the source release, you can find it here:
https://github.com/apache/hudi/releases/tag/release-0.6.0

You can read more about the release (including release notes) here:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12346663

We would like to thank all contributors, the community, and the Apache Software Foundation for enabling this release and we look forward to continued collaboration. We welcome your help and feedback. For more information on how to report problems, and to get involved, visit the project website at:
http://hudi.apache.org/

Thanks to everyone involved!

- Bhavani Sudha
