Re: [ANNOUNCE] Apache Hudi 0.9.0 released

leesf Fri, 03 Sep 2021 20:19:50 -0700

Thanks, Udit, for driving the release. Great news!

On Sat, Sep 4, 2021 at 10:12 AM, Rubens Rodrigues <rubenssoto2...@gmail.com> wrote:


> Hello
>
> I'm from Brazil and I have been following Hudi since version 0.5.
> Congratulations to everyone; Hudi's evolution in only one year is impressive.
>
> My team and I are very happy to have chosen Hudi for our data lake.
>
> Thank you so much for this wonderful work.
>
> On Fri, Sep 3, 2021 at 22:57, Raymond Xu <xu.shiyan.raym...@gmail.com>
> wrote:
>
> > Congrats! Another awesome release.
> >
> > On Wed, Sep 1, 2021 at 11:49 AM Pratyaksh Sharma <pratyaks...@gmail.com>
> > wrote:
> >
> > > Great news! This one really feels like a major release with so many
> > > good features getting added. :)
> > >
> > > On Wed, Sep 1, 2021 at 7:19 AM Udit Mehrotra <udi...@apache.org> wrote:
> > >
> > > > The Apache Hudi team is pleased to announce the release of Apache Hudi
> > > > 0.9.0.
> > > >
> > > > This release comes almost 5 months after 0.8.0. It includes 387 resolved
> > > > issues, comprising new features as well as general improvements and
> > > > bug fixes. Here are a few quick highlights:
> > > >
> > > > *Spark SQL DML and DDL Support*
> > > > We have added experimental support for DDL/DML using Spark SQL, taking a
> > > > huge step towards making Hudi more easily accessible and operable by all
> > > > personas (non-engineers, analysts, etc.). Users can now use SQL statements
> > > > like "CREATE TABLE ... USING HUDI" and "CREATE TABLE ... AS SELECT" to
> > > > create/manage tables in catalogs like Hive, and "INSERT", "INSERT
> > > > OVERWRITE", "UPDATE", "MERGE INTO" and "DELETE" statements to manipulate
> > > > data. For more information, check out our docs here
> > > > <https://hudi.apache.org/docs/quick-start-guide> by clicking on the
> > > > SparkSQL tab.
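
The statement forms named above can be sketched in Python as follows. This is only an illustration: the table name `hudi_demo`, its columns and the `run_statements` helper are hypothetical, and the `OPTIONS` keys (`type`, `primaryKey`, `preCombineField`) are assumptions that should be checked against the quick start guide. It also assumes a SparkSession that already has the Hudi bundle and its SQL extension configured.

```python
# Hypothetical table and column names; only the statement forms
# (CREATE TABLE ... USING HUDI, INSERT, UPDATE, MERGE INTO, DELETE)
# come from the announcement.
HUDI_SQL_STATEMENTS = [
    """
    CREATE TABLE IF NOT EXISTS hudi_demo (
        id INT, name STRING, price DOUBLE, ts BIGINT
    ) USING hudi
    OPTIONS (type = 'cow', primaryKey = 'id', preCombineField = 'ts')
    """,
    "INSERT INTO hudi_demo VALUES (1, 'a1', 10.0, 1000)",
    "UPDATE hudi_demo SET price = price * 2 WHERE id = 1",
    """
    MERGE INTO hudi_demo AS target
    USING (SELECT 1 AS id, 'a1_new' AS name, 30.0 AS price, 1001 AS ts) AS source
    ON target.id = source.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
    """,
    "DELETE FROM hudi_demo WHERE id = 1",
]


def run_statements(spark, statements):
    """Run each statement in order; `spark` is any object exposing sql()."""
    for stmt in statements:
        spark.sql(stmt.strip())
```

With a real session this would be called as `run_statements(spark, HUDI_SQL_STATEMENTS)`.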
> > > >
> > > > *Query Side Improvements*
> > > > Hudi tables are now registered with Hive as Spark datasource tables,
> > > > meaning Spark SQL on these tables now uses the datasource as well, instead
> > > > of relying on the Hive fallbacks within Spark, which are
> > > > ill-maintained/cumbersome. This unlocks many optimizations such as the use
> > > > of Hudi's own FileIndex
> > > > <https://github.com/apache/hudi/blob/bf5a52e51bbeaa089995335a0a4c55884792e505/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieFileIndex.scala#L46>
> > > > implementation for optimized caching and the use of the Hudi metadata
> > > > table, for faster listing of large tables. We have also added support for
> > > > time travel query
> > > > <https://hudi.apache.org/docs/quick-start-guide#time-travel-query>, for
> > > > Spark datasource.
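
A time travel read mentioned above might look like the sketch below. The helper name `time_travel_options` is hypothetical, and the read option key `as.of.instant` with a `yyyyMMddHHmmss` instant is an assumption based on the linked quick start section, so please verify it there.

```python
from datetime import datetime


def time_travel_options(as_of):
    """Build Spark read options for a point-in-time query.

    `as_of` may be a datetime (formatted as yyyyMMddHHmmss) or a string
    that is passed through unchanged.
    """
    if isinstance(as_of, datetime):
        as_of = as_of.strftime("%Y%m%d%H%M%S")
    # Assumed option key; check the quick start guide for your Hudi version.
    return {"as.of.instant": as_of}
```

With a real session the options would be applied as `spark.read.format("hudi").options(**time_travel_options(ts)).load(base_path)`, where `ts` and `base_path` are supplied by the caller.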
> > > >
> > > > *Writer Side Improvements*
> > > > This release has several major writer side improvements. Virtual key
> > > > support has been added to avoid populating meta fields and leverage
> > > > existing fields to populate record keys and partition paths.
> > > > Bulk Insert operation using row writer is now enabled by default for
> > > > faster inserts.
> > > > Hudi's automatic cleaning of uncommitted data has been enhanced to be
> > > > performant over cloud stores. You can learn more about this new centrally
> > > > coordinated marker mechanism in this blog
> > > > <https://hudi.apache.org/blog/2021/08/18/improving-marker-mechanism/>.
> > > > Async Clustering support has been added to both DeltaStreamer and Spark
> > > > Structured Streaming Sink. More on this can be found in this blog
> > > > <https://hudi.apache.org/blog/2021/08/23/async-clustering/>.
> > > > Users can choose to drop fields used to generate partition paths.
> > > > Added support for a new write operation, "delete_partition", in Spark.
> > > > Users can leverage this to delete older partitions in bulk, in addition
> > > > to record level deletes.
> > > > Added support for Huawei Cloud Object Storage, Baidu AFS storage format
> > > > and Baidu BOS storage in Hudi.
> > > > A pre-commit validator framework
> > > > <https://github.com/apache/hudi/blob/bf5a52e51bbeaa089995335a0a4c55884792e505/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/validator/SparkPreCommitValidator.java>
> > > > has been added for the Spark engine, which can be used for DeltaStreamer
> > > > and Spark Datasource writers. Users can leverage this to add any
> > > > validations to be executed before committing writes to Hudi.
> > > > A few out-of-the-box validators are available, such as
> > > > SqlQueryEqualityPreCommitValidator
> > > > <https://github.com/apache/hudi/blob/release-0.9.0/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/validator/SqlQueryEqualityPreCommitValidator.java>,
> > > > SqlQueryInequalityPreCommitValidator
> > > > <https://github.com/apache/hudi/blob/release-0.9.0/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/validator/SqlQueryInequalityPreCommitValidator.java>
> > > > and SqlQuerySingleResultPreCommitValidator
> > > > <https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/validator/SqlQuerySingleResultPreCommitValidator.java>.
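
As a rough illustration of wiring one of the validators listed above into a Spark datasource write, the sketch below builds the write options as a plain dictionary. The helper name is hypothetical. The config keys `hoodie.precommit.validators` and `hoodie.precommit.validators.single.value.sql.queries`, the `query#expected` value format and the `<TABLE_NAME>` placeholder are assumptions taken from the Hudi configuration reference; only the validator class path comes from the link above.

```python
# Class path taken from the linked source file.
SINGLE_RESULT_VALIDATOR = (
    "org.apache.hudi.client.validator.SqlQuerySingleResultPreCommitValidator"
)


def write_options_with_validator(table_name, query, expected):
    """Return Hudi write options that run `query` before each commit.

    The commit is expected to fail when the query result differs from
    `expected` (assumed semantics of the single-result validator).
    """
    return {
        "hoodie.table.name": table_name,
        # Assumed config keys; verify against the Hudi configuration docs.
        "hoodie.precommit.validators": SINGLE_RESULT_VALIDATOR,
        "hoodie.precommit.validators.single.value.sql.queries": f"{query}#{expected}",
    }
```

For example, `write_options_with_validator("hudi_demo", "select count(*) from <TABLE_NAME> where id is null", 0)` could be merged into the usual write options and passed via `df.write.format("hudi").options(**opts)`.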
> > > >
> > > > *Flink Integration Improvements*
> > > > The Flink writer now supports propagation of CDC format for MOR tables, by
> > > > turning on the option "changelog.enabled=true".
> > > > Hudi would then persist all change flags of each record, allowing users to
> > > > do stateful computation based on these change logs.
> > > > Flink writing is now close to feature parity with Spark writing, with the
> > > > addition of write operations like "bulk_insert" and "insert_overwrite",
> > > > support for non-partitioned tables, automatic cleanup of uncommitted data,
> > > > global indexing support, Hive style partitioning and handling of partition
> > > > path updates.
> > > > Writing also supports a new log append mode, where no records are
> > > > de-duplicated and base files are directly written for each flush.
> > > > Flink readers now support streaming reads from COW/MOR tables. Deletions
> > > > are emitted by default in streaming read mode; the downstream receives the
> > > > "DELETE" message as a Hoodie record with empty payload.
> > > > Hive sync has been improved by adding support for different Hive versions
> > > > and asynchronous execution.
> > > > Flink Streamer tool now supports transformers.
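
To make the "changelog.enabled=true" option above concrete, here is a small Python sketch that renders a Flink SQL table definition as a string. The helper name, table name, path and columns are hypothetical; the `connector`, `path` and `table.type` option names are assumptions based on the Hudi Flink guide, and only `changelog.enabled` is taken from the announcement.

```python
def hudi_flink_ddl(table, path, columns, changelog=True):
    """Render a Flink SQL CREATE TABLE statement for a Hudi MOR table.

    `columns` is a list of (name, type_and_constraints) pairs.
    """
    cols = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    options = {
        "connector": "hudi",            # assumed connector identifier
        "path": path,
        "table.type": "MERGE_ON_READ",  # CDC propagation is described for MOR tables
        "changelog.enabled": str(changelog).lower(),
    }
    opts = ",\n  ".join(f"'{key}' = '{value}'" for key, value in options.items())
    return f"CREATE TABLE {table} (\n  {cols}\n) WITH (\n  {opts}\n)"
```

The resulting string could be handed to a Flink SQL client or `TableEnvironment.execute_sql` in PyFlink, for instance with `hudi_flink_ddl("t1", "file:///tmp/t1", [("id", "INT PRIMARY KEY NOT ENFORCED"), ("name", "STRING")])`.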
> > > >
> > > > *DeltaStreamer Improvements*
> > > > We have enhanced the DeltaStreamer utility with 3 new sources. JDBC
> > > > <https://github.com/apache/hudi/blob/release-0.9.0/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/JdbcSource.java>
> > > > will help with fetching data from RDBMS sources and SQLSource
> > > > <https://github.com/apache/hudi/blob/release-0.9.0/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/SqlSource.java>
> > > > will assist in backfilling use cases. S3EventsHoodieIncrSource
> > > > <https://github.com/apache/hudi/blob/release-0.9.0/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/S3EventsHoodieIncrSource.java>
> > > > and S3EventsSource
> > > > <https://github.com/apache/hudi/blob/release-0.9.0/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/S3EventsSource.java>
> > > > assist in reading data from S3 reliably and efficiently ingesting it
> > > > into Hudi. In addition, we have added support for timestamp based fetch
> > > > from Kafka and added basic auth support to schema registry.
> > > >
> > > > Please find more information about the release here:
> > > > https://hudi.apache.org/releases/release-0.9.0
> > > >
> > > > For details on how to use Hudi, please look at the quick start page
> > > > located here:
> > > > https://hudi.apache.org/docs/quick-start-guide.html
> > > >
> > > > If you'd like to download the source release, you can find it here:
> > > > https://github.com/apache/hudi/releases/tag/release-0.9.0
> > > >
> > > > You can read more about the release (including release notes) here:
> > > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12350027
> > > >
> > > > We welcome your help and feedback. For more information on how to report
> > > > problems, and to get involved, visit the project website at
> > > > https://hudi.apache.org/
> > > >
> > > > Thanks to everyone involved!
> > > >
> > > > Udit Mehrotra
> > > > (on behalf of the Hudi Community)
> > > >
> > >
> >
>
