Re: [ANNOUNCE] Apache Hudi 0.6.0 released

Pratyaksh Sharma Mon, 24 Aug 2020 22:18:06 -0700

Great news! :)

On Tue, Aug 25, 2020 at 10:09 AM Vinoth Chandar <[email protected]> wrote:


> - announce
>
> Folks, please keep the follow-ups to dev@ and users@.
>
>
>
> On Mon, Aug 24, 2020 at 9:26 PM vino yang <[email protected]> wrote:
>
> > Great news!
> >
> > Thanks to Bhavani Sudha for driving the release! And thanks to everyone
> > in the whole community!
> >
> > Best,
> > Vino
> >
> > On Tue, Aug 25, 2020 at 11:37 AM Bhavani Sudha <[email protected]> wrote:
> >
> > > The Apache Hudi team is pleased to announce the release of Apache Hudi
> > > 0.6.0.
> > >
> > > Apache Hudi (pronounced Hoodie) stands for Hadoop Upserts Deletes and
> > > Incrementals. Apache Hudi manages storage of large analytical datasets
> > > on DFS (Cloud stores, HDFS or any Hadoop FileSystem compatible storage)
> > > and provides the ability to query them.
> > >
> > > This release comes 2 months after 0.5.3. It includes more than 200
> > > resolved issues, comprising new features, performance improvements,
> > > general improvements, and bug fixes. Hudi 0.6.0 introduces mechanisms
> > > to efficiently bootstrap large datasets into Hudi without having to
> > > copy the data (experimental feature), via both the Spark datasource
> > > writer and the DeltaStreamer tool. A new index (HoodieSimpleIndex) is
> > > added that can be faster than the bloom index when updates/deletes are
> > > spread across a large portion of the table. With this version,
> > > rollbacks are done using marker files, and a supporting upgrade and
> > > downgrade infrastructure is provided to users for a smooth transition.
> > > The HoodieMultiDeltaStreamer tool (experimental feature) is added in
> > > this version to support ingesting multiple Kafka streams in a single
> > > DeltaStreamer deployment, improving the operational experience. Bulk
> > > inserts are further improved by avoiding any DataFrame-to-RDD
> > > conversions, accompanied by configurable sorting modes. While this
> > > DataFrame-to-RDD conversion is not a bottleneck for upserts/deletes,
> > > subsequent releases will expand this to other write operations. Other
> > > performance improvements include support for async compaction for
> > > Spark streaming writes.
> > >
> > > For details on how to use Hudi, please look at the quick start page
> > > located at:
> > > https://hudi.apache.org/docs/quick-start-guide.html
> > >
> > > If you'd like to download the source release, you can find it here:
> > > https://github.com/apache/hudi/releases/tag/release-0.6.0
> > >
> > > You can read more about the release (including release notes) here:
> > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12346663
> > >
> > > We would like to thank all contributors, the community, and the Apache
> > > Software Foundation for enabling this release and we look forward to
> > > continued collaboration. We welcome your help and feedback. For more
> > > information on how to report problems, and to get involved, visit the
> > > project website at:
> > > http://hudi.apache.org/
> > >
> > > Thanks to everyone involved!
> > > - Bhavani Sudha
> > >
> >
>
