Hello,

I'm from Brazil and I've been following Hudi since version 0.5. Congratulations
to everyone; Hudi's evolution in only one year is impressive.

My colleagues and I are very happy to have chosen Hudi for our data lake.

Thank you so much for this wonderful work.

On Fri, Sep 3, 2021 at 10:57 PM Raymond Xu <[email protected]>
wrote:

> Congrats! Another awesome release.
>
> On Wed, Sep 1, 2021 at 11:49 AM Pratyaksh Sharma <[email protected]>
> wrote:
>
> > Great news! This one really feels like a major release with so many good
> > features getting added. :)
> >
> > On Wed, Sep 1, 2021 at 7:19 AM Udit Mehrotra <[email protected]> wrote:
> >
> > > The Apache Hudi team is pleased to announce the release of Apache Hudi
> > > 0.9.0.
> > >
> > > This release comes almost 5 months after 0.8.0. It includes 387 resolved
> > > issues, comprising new features as well as general improvements and
> > > bug fixes. Here are a few quick highlights:
> > >
> > > *Spark SQL DML and DDL Support*
> > > We have added experimental support for DDL/DML using Spark SQL, taking a
> > > huge step towards making Hudi more easily accessible and operable by all
> > > personas (non-engineers, analysts, etc.). Users can now use SQL statements
> > > like "CREATE TABLE....USING HUDI" and "CREATE TABLE .. AS SELECT" to
> > > create/manage tables in catalogs like Hive, and "INSERT", "INSERT
> > > OVERWRITE", "UPDATE", "MERGE INTO" and "DELETE" statements to manipulate
> > > data. For more information, check out our docs here
> > > <https://hudi.apache.org/docs/quick-start-guide> and click on the
> > > SparkSQL tab.
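> > >
> > > To make this concrete, here is a minimal sketch in Spark (Scala); the
> > > table name, columns and local path are made up for the example, and it
> > > assumes a Spark session with the Hudi Spark bundle on the classpath:
> > >
> > >   // Create a Hudi-managed table and manipulate it purely through SQL.
> > >   spark.sql("""
> > >     CREATE TABLE hudi_trips (uuid STRING, rider STRING, fare DOUBLE, ts BIGINT)
> > >     USING HUDI
> > >     LOCATION 'file:///tmp/hudi_trips'
> > >     TBLPROPERTIES (primaryKey = 'uuid', preCombineField = 'ts')
> > >   """)
> > >   spark.sql("INSERT INTO hudi_trips VALUES ('id-1', 'rider-A', 19.10, 1)")
> > >   spark.sql("UPDATE hudi_trips SET fare = 25.0 WHERE uuid = 'id-1'")
> > >   spark.sql("DELETE FROM hudi_trips WHERE uuid = 'id-1'")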
> > >
> > > *Query Side Improvements*
> > > Hudi tables are now registered with Hive as Spark datasource tables,
> > > meaning Spark SQL on these tables now uses the datasource as well, instead
> > > of relying on the Hive fallbacks within Spark, which are
> > > ill-maintained/cumbersome. This unlocks many optimizations such as the use
> > > of Hudi's own FileIndex
> > > <https://github.com/apache/hudi/blob/bf5a52e51bbeaa089995335a0a4c55884792e505/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieFileIndex.scala#L46>
> > > implementation for optimized caching and the use of the Hudi metadata
> > > table for faster listing of large tables. We have also added support for
> > > time travel queries
> > > <https://hudi.apache.org/docs/quick-start-guide#time-travel-query> for the
> > > Spark datasource.
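> > >
> > > For instance, a time travel read via the Spark datasource looks roughly
> > > like this (a sketch; the path and instant are made up, and the
> > > "as.of.instant" option is described in the docs linked above):
> > >
> > >   // Read the table as of a past commit instant (yyyyMMddHHmmss).
> > >   val asOfDf = spark.read.format("hudi")
> > >     .option("as.of.instant", "20210901000000")
> > >     .load("file:///tmp/hudi_trips")
> > >   asOfDf.show()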
> > >
> > > *Writer Side Improvements*
> > > This release has several major writer side improvements. Virtual key
> > > support has been added to avoid populating meta fields and to leverage
> > > existing fields to populate record keys and partition paths (a sketch
> > > follows at the end of this paragraph).
> > > The Bulk Insert operation using the row writer is now enabled by default
> > > for faster inserts.
> > > Hudi's automatic cleaning of uncommitted data has been enhanced to be
> > > performant over cloud stores. You can learn more about this new centrally
> > > coordinated marker mechanism in this blog
> > > <https://hudi.apache.org/blog/2021/08/18/improving-marker-mechanism/>.
> > > Async Clustering support has been added to both DeltaStreamer and the
> > > Spark Structured Streaming sink. More on this can be found in this blog
> > > <https://hudi.apache.org/blog/2021/08/23/async-clustering/>.
> > > Users can choose to drop the fields used to generate partition paths.
> > > A new write operation, "delete_partition", is now supported in Spark;
> > > users can leverage it to delete older partitions in bulk, in addition to
> > > record-level deletes.
> > > Added support for Huawei Cloud Object Storage, BAIDU AFS storage format,
> > > and Baidu BOS storage in Hudi.
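> > >
> > > As one illustration, virtual keys boil down to disabling meta field
> > > population when the table is first created (a minimal sketch; the
> > > DataFrame "df", field names and path are made up for the example):
> > >
> > >   // Meta fields are not materialized; the record key and partition
> > >   // path are derived on the fly from existing columns.
> > >   df.write.format("hudi")
> > >     .option("hoodie.table.name", "hudi_trips")
> > >     .option("hoodie.populate.meta.fields", "false")
> > >     .option("hoodie.datasource.write.recordkey.field", "uuid")
> > >     .option("hoodie.datasource.write.partitionpath.field", "region")
> > >     .option("hoodie.datasource.write.precombine.field", "ts")
> > >     .mode("append")
> > >     .save("file:///tmp/hudi_trips")
> > >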
> > > A pre-commit validator framework
> > > <https://github.com/apache/hudi/blob/bf5a52e51bbeaa089995335a0a4c55884792e505/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/validator/SparkPreCommitValidator.java>
> > > has been added for the Spark engine, which can be used with DeltaStreamer
> > > and Spark datasource writers. Users can leverage this to add any
> > > validations to be executed before committing writes to Hudi. A few
> > > out-of-the-box validators are available, like
> > > SqlQueryEqualityPreCommitValidator
> > > <https://github.com/apache/hudi/blob/release-0.9.0/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/validator/SqlQueryEqualityPreCommitValidator.java>,
> > > SqlQueryInequalityPreCommitValidator
> > > <https://github.com/apache/hudi/blob/release-0.9.0/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/validator/SqlQueryInequalityPreCommitValidator.java>
> > > and SqlQuerySingleResultPreCommitValidator
> > > <https://github.com/apache/hudi/blob/master/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/validator/SqlQuerySingleResultPreCommitValidator.java>.
> > >
> > > *Flink Integration Improvements*
> > > The Flink writer now supports propagation of the CDC format for MOR
> > > tables by turning on the option "changelog.enabled=true". Hudi will then
> > > persist all change flags of each record, allowing users to do stateful
> > > computation based on these change logs (a sketch follows at the end of
> > > this section).
> > > Flink writing is now close to feature parity with Spark writing, with the
> > > addition of write operations like "bulk_insert" and "insert_overwrite",
> > > support for non-partitioned tables, automatic cleanup of uncommitted
> > > data, global indexing support, Hive-style partitioning and handling of
> > > partition path updates.
> > > Writing also supports a new log append mode, where no records are
> > > de-duplicated and base files are written directly for each flush.
> > > Flink readers now support streaming reads from COW/MOR tables. Deletions
> > > are emitted by default in streaming read mode; the downstream receives
> > > the "DELETE" message as a Hoodie record with an empty payload.
> > > Hive sync has been improved by adding support for different Hive versions
> > > and asynchronous execution.
> > > The Flink Streamer tool now supports transformers.
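> > >
> > > For the changelog mode, a minimal Flink Table API sketch in Scala (the
> > > table name, columns and path are made up; assumes a Flink 1.12+
> > > environment with the hudi-flink bundle on the classpath):
> > >
> > >   import org.apache.flink.table.api.{EnvironmentSettings, TableEnvironment}
> > >
> > >   val tableEnv = TableEnvironment.create(
> > >     EnvironmentSettings.newInstance().inStreamingMode().build())
> > >   // 'changelog.enabled' = 'true' persists the +I/-U/+U/-D flag of each record.
> > >   tableEnv.executeSql(
> > >     """CREATE TABLE hudi_trips (
> > >       |  uuid STRING, rider STRING, fare DOUBLE, ts BIGINT,
> > >       |  PRIMARY KEY (uuid) NOT ENFORCED
> > >       |) WITH (
> > >       |  'connector' = 'hudi',
> > >       |  'path' = 'file:///tmp/hudi_trips_flink',
> > >       |  'table.type' = 'MERGE_ON_READ',
> > >       |  'changelog.enabled' = 'true'
> > >       |)""".stripMargin)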
> > >
> > > *DeltaStreamer Improvements*
> > > We have enhanced the DeltaStreamer utility with 3 new sources. JdbcSource
> > > <https://github.com/apache/hudi/blob/release-0.9.0/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/JdbcSource.java>
> > > will help with fetching data from RDBMS sources, and SqlSource
> > > <https://github.com/apache/hudi/blob/release-0.9.0/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/SqlSource.java>
> > > will assist in backfilling use cases. S3EventsHoodieIncrSource
> > > <https://github.com/apache/hudi/blob/release-0.9.0/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/S3EventsHoodieIncrSource.java>
> > > and S3EventsSource
> > > <https://github.com/apache/hudi/blob/release-0.9.0/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/S3EventsSource.java>
> > > assist in reliably reading data from S3 and efficiently ingesting it into
> > > Hudi. In addition, we have added support for timestamp-based fetching
> > > from Kafka and basic auth support for the schema registry.
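> > >
> > > As a rough command-line sketch of pointing DeltaStreamer at the new JDBC
> > > source (paths, table name and bundle jar location are made up; the
> > > properties file would carry the hoodie.deltastreamer.jdbc.* connection
> > > settings):
> > >
> > >   spark-submit \
> > >     --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
> > >     hudi-utilities-bundle_2.12-0.9.0.jar \
> > >     --table-type COPY_ON_WRITE \
> > >     --source-class org.apache.hudi.utilities.sources.JdbcSource \
> > >     --source-ordering-field updated_at \
> > >     --target-base-path file:///tmp/hudi_trips_jdbc \
> > >     --target-table hudi_trips_jdbc \
> > >     --props file:///tmp/jdbc-source.properties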
> > >
> > > Please find more information about the release here:
> > > https://hudi.apache.org/releases/release-0.9.0
> > >
> > > For details on how to use Hudi, please look at the quick start page
> > > located here:
> > > https://hudi.apache.org/docs/quick-start-guide.html
> > >
> > > If you'd like to download the source release, you can find it here:
> > > https://github.com/apache/hudi/releases/tag/release-0.9.0
> > >
> > > You can read more about the release (including release notes) here:
> > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12350027
> > >
> > > We welcome your help and feedback. For more information on how to report
> > > problems, and to get involved, visit the project website at
> > > https://hudi.apache.org/
> > >
> > > Thanks to everyone involved!
> > >
> > > Udit Mehrotra
> > > (on behalf of the Hudi Community)
> > >
> >
>
