[hudi] branch asf-site updated: [DOCS][BLOG] 2022 Blog post (#7581)

yihua Thu, 29 Dec 2022 14:45:47 -0800

This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git



The following commit(s) were added to refs/heads/asf-site by this push:
     new 4880154bb11 [DOCS][BLOG] 2022 Blog post (#7581)
4880154bb11 is described below

commit 4880154bb1152353acbcc51b6390176e6d1e926b
Author: Kyle Weller <kywe...@gmail.com>
AuthorDate: Thu Dec 29 15:45:29 2022 -0700

    [DOCS][BLOG] 2022 Blog post (#7581)
---
 ...2022-12-29-Apache-Hudi-2022-A-Year-In-Review.md |  89 +++++++++++++++++++++
 .../assets/images/blog/Apache-Hudi-2022-Review.png | Bin 0 -> 664778 bytes
 .../assets/images/blog/Apache-Hudi-Conferences.png | Bin 0 -> 6480488 bytes
 .../blog/Apache-Hudi-Pull-Request-History.png      | Bin 0 -> 296199 bytes
 4 files changed, 89 insertions(+)

diff --git a/website/blog/2022-12-29-Apache-Hudi-2022-A-Year-In-Review.md 
b/website/blog/2022-12-29-Apache-Hudi-2022-A-Year-In-Review.md
new file mode 100644
index 00000000000..82246324766
--- /dev/null
+++ b/website/blog/2022-12-29-Apache-Hudi-2022-A-Year-In-Review.md
@@ -0,0 +1,89 @@
+---
+title: "Apache Hudi 2022 - A year in Review"
+excerpt: "2022 was the best year for Apache Hudi yet! Huge thank you to 
everyone who contributed!"
+author: Sivabalan Narayanan
+category: blog
+image: /assets/images/blog/Apache-Hudi-2022-Review.png
+tags:
+- apache hudi
+---
+
+<img src="/assets/images/blog/Apache-Hudi-2022-Review.png" alt="drawing" 
style={{width:'80%', display:'block', marginLeft:'auto', marginRight:'auto'}} />
+
+## Apache Hudi Momentum
+As we wrap up 2022 I want to take the opportunity to reflect on and highlight 
the incredible progress of the Apache Hudi 
+project and most importantly, the community. First and foremost, I want to 
thank all of the contributors who have made 
+2022 the best year for the project ever. There were [over 2,200 
PRs](https://ossinsight.io/analyze/apache/hudi#pull-requests) 
+created (+38% YoY) and over 600+ users engaged on 
[Github](https://github.com/apache/hudi/). The Apache Hudi community 
+[slack 
channel](https://join.slack.com/t/apache-hudi/shared_invite/zt-1e94d3xro-JvlNO1kSeIHJBTVfLPlI5w)
 has grown to more 
+than 2,600 users (+100% YoY growth) averaging nearly 200 messages per month! 
The most impressive stat is that with this 
+volume growth, the median response time to questions is ~3h. [Come join the 
community](https://join.slack.com/t/apache-hudi/shared_invite/zt-1e94d3xro-JvlNO1kSeIHJBTVfLPlI5w)
 
+where people are sharing and helping each other!
+
+<img src="/assets/images/blog/Apache-Hudi-Pull-Request-History.png" 
alt="drawing" style={{width:'80%', display:'block', marginLeft:'auto', 
marginRight:'auto'}} />
+
+## Key Releases in 2022
+2022 has been a year jam packed with exciting new features for Apache Hudi 
across 0.11.0 and 0.12.0 releases. In addition to new features, 
vendor/ecosystem partnerships and relationships have been strengthened across 
many in the community. [AWS continues to double 
down](https://www.onehouse.ai/blog/apache-hudi-native-aws-integrations) on 
Apache Hudi, upgrading versions in 
[EMR](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hudi.html), 
[Athena](https://docs.aws.amazon.com/athena [...]
+
+While there are too many features added in 2022 to list them all, take a look 
at some of the exciting highlights:
+
+- [Multi-Modal 
Index](https://hudi.apache.org/blog/2022/05/17/Introducing-Multi-Modal-Index-for-the-Lakehouse-in-Apache-Hudi)
 is a first-of-its-kind high-performance indexing subsystem for the Lakehouse. 
It improves metadata lookup performance by up to 100x and reduces overall query 
latency by up to 30x. Two new indices were added to the metadata table - Bloom 
filter index that enables faster upsert performance and[  column stats index 
along with Data skipping](https://hudi.apache.org/bl [...]
+- Hudi added support for [asynchronous 
indexing](https://hudi.apache.org/releases/release-0.11.0/#async-indexer) to 
assist building such indices without blocking ingestion so that regular writers 
don't need to scale up resources for such one off spikes.
+- A new type of index called Bucket Index was introduced this year. This could 
be game changing for deterministic workloads with partitioned datasets. It is 
very light-weight and allows the distribution of records to buckets using a 
hash function.
+- Filesystem based Lock Provider - This implementation avoids the need of 
external systems and leverages the abilities of underlying filesystem to 
support lock provider needed for optimistic concurrency control in case of 
multiple writers. Please check the [lock 
configuration](https://hudi.apache.org/docs/configurations#Locks-Configurations)
 for details.
+- Deltastreamer Graceful Completion - Users can now configure a post-write 
completion strategy with deltastreamer continuous mode for graceful shutdown.
+- Schema on read is supported as an experimental feature since 0.11.0, 
allowing users to leverage Spark SQL DDL  support for [evolving data 
schema](https://hudi.apache.org/docs/schema_evolution) needs(drop, rename etc). 
 Added support for a lot of [CALL 
commands](https://hudi.apache.org/docs/procedures/) to invoke an array of 
actions on Hudi tables.
+- It is now feasible to [encrypt](https://hudi.apache.org/docs/encryption/) 
your data that you store with Apache Hudi.
+- Pulsar Write Commit Callback - On new events to the Hudi table, users can 
get notified via Pulsar.
+- Flink Enhancements: We added metadata table support, async clustering, data 
skipping, and bucket index for write paths. We also extended flink support to 
versions 1.13.x, 1.14.x and[  
1.15.x](https://hudi.apache.org/releases/release-0.12.0/#bundle-updates).
+- Presto Hudi integration: In addition to the hive connector we have had for a 
long time, we added [native Presto Hudi 
connector](https://prestodb.io/docs/current/connector/hudi.html). This enables 
users to get access to advanced features of Hudi faster. Users can now leverage 
metadata table to reduce file listing cost. We also added support for accessing 
clustered datasets this year.
+- Trino Hudi integration: We also added [native Trino Hudi 
connector](https://trino.io/docs/current/connector/hudi.html) to assist in 
querying Hudi tables via Trino Engine. Users can now leverage metadata table to 
make their queries performant.
+- Performance enhancements: Many performance optimizations were landed by the 
community throughout the year to keep Hudi on par with competition or better. 
Check out this [TPC-DS 
benchmark](https://hudi.apache.org/blog/2022/06/29/Apache-Hudi-vs-Delta-Lake-transparent-tpc-ds-lakehouse-performance-benchmarks)
 comparing Hudi vs Delta Lake.
+- [Long Term 
Support](https://hudi.apache.org/releases/release-0.12.2#long-term-support): We 
start to maintain 0.12 as the Long Term Support releases for users to migrate 
to and stay for a longer duration. In lieu of that, we have made 0.12.1  and 
0.12.2 releases to assist users with stable release that comes packed with a 
lot of stability and bug fixes.
+
+## Community Events
+Apache Hudi is a global community and thankfully we live in a world today that 
empowers virtual collaboration and productivity. In addition to connecting 
virtually this year we have seen the Apache Hudi community gather at many 
events in person. Re:Invent, Data+AI Summit, Flink Forward, Alluxio Day, Data 
Council, PrestoCon, Confluent Current, DBT Coalesce, Cinco de Trino, Data 
Platform Summit, and many more.
+
+<img src="/assets/images/blog/Apache-Hudi-Conferences.png" alt="drawing" 
style={{width:'80%', display:'block', marginLeft:'auto', marginRight:'auto'}} />
+
+You don’t have to travel far to meet and collaborate with the Hudi community. 
We hold monthly virtual meetups, weekly office hours, and there are plenty of 
friendly faces on Hudi Slack who like to talk shop. Join us via Zoom for the 
next Hudi meetup!
+
+## Community Content
+A wide diversity of organizations around the globe use Apache Hudi as the 
foundation of their production data platforms. Over 800+ organizations have 
engaged with Hudi (up 60% YoY) Here are a few highlights of content written by 
the community sharing their experiences, designs, and best practices:
+
+1. [Build your Hudi data lake on 
AWS](https://aws.amazon.com/blogs/big-data/part-1-build-your-apache-hudi-data-lake-on-aws-using-amazon-emr/)
 - Suthan Phillips and Dylan Qu from AWS
+2. [Soumil Shah Hudi Youtube 
Playlist](https://www.youtube.com/playlist?list=PLL2hlSFBmWwwbMpcyMjYuRn8cN99gFSY6)
 - Soumil Shah from JobTarget
+3. [SCD-2 with Apache 
Hudi](https://medium.com/walmartglobaltech/implementation-of-scd-2-slowly-changing-dimension-with-apache-hudi-465e0eb94a5)
 - Jayasheel Kalgal from Walmart
+4. [Hudi vs Delta vs Iceberg 
comparisons](https://www.onehouse.ai/blog/apache-hudi-vs-delta-lake-vs-apache-iceberg-lakehouse-feature-comparison)
 - Kyle Weller from Onehouse
+5. [Serverless, real-time analytics 
platform](https://aws.amazon.com/blogs/big-data/how-nerdwallet-uses-aws-and-apache-hudi-to-build-a-serverless-real-time-analytics-platform/)
 - Kevin Chun from NerdWallet
+6. [DBT and Hudi to Build Open 
Lakehouse](https://hudi.apache.org/blog/2022/07/11/build-open-lakehouse-using-apache-hudi-and-dbt/)
 - Vinoth Govindarajan from Apple
+7. [TPC-DS Benchmarks Hudi vs Delta 
Lake](https://www.onehouse.ai/blog/apache-hudi-vs-delta-lake-transparent-tpc-ds-lakehouse-performance-benchmarks)
 - Alexey Kudinkin from Onehouse
+8. [Key Learnings Using Hudi building a 
Lakehouse](https://blogs.halodoc.io/key-learnings-on-using-apache-hudi-in-building-lakehouse-architecture-halodoc/)
 - Jitendra Shah from Halodoc
+9. [Growing your business with modern data 
capabilities](https://aws.amazon.com/blogs/architecture/insights-for-ctos-part-3-growing-your-business-with-modern-data-capabilities/)
 - Jonathan Hwang from Zendesk
+10. [Low-latency data lake using MSK, Flink, and 
Hudi](https://aws.amazon.com/blogs/big-data/create-a-low-latency-source-to-data-lake-pipeline-using-amazon-msk-connect-apache-flink-and-apache-hudi/)
 - Ali Alemi from AWS
+11. [Fresher data lakes on AWS 
S3](https://robinhood.engineering/author-balaji-varadarajan-e3f496815ebf) - 
Balaji Varadarajan from Robinhood
+12. [Experiences with Hudi from Uber 
meetup](https://www.youtube.com/watch?v=ZamXiT9aqs8) - Sam Guleff from Walmart 
and Vinay Patil from Disney+ Hotstar
+
+## What to look for in 2023
+Thanks to the strength of the community, Apache Hudi has a bright future for 
2023. Check out [this recording](https://youtu.be/9LPSdd-AS8E?t=2090) from our 
Re:Invent meetup where Vinoth Chandar talks about exciting new features to 
expect in 2023.
+
+0.13.0 will be the next major release, with a package of exciting new 
features. Here are a few highlights:
+
+- [Record-key-based 
index](https://cwiki.apache.org/confluence/display/HUDI/RFC-08++Record+level+indexing+mechanisms+for+Hudi+datasets)
 to speed up the lookup of records for UUID-based updates and deletes, well 
tested with 10+ TB index data for hundreds of billions of records at Uber;
+- [Consistent Hashing 
Index](https://github.com/apache/hudi/blob/master/rfc/rfc-42/rfc-42.md) with 
dynamically-sized buckets to achieve fast upsert performance with no data skew 
among file groups compared to existing [Bucket 
Index](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+29%3A+Hash+Index);
+- [New CDC 
format](https://github.com/apache/hudi/blob/master/rfc/rfc-51/rfc-51.md) with 
Debezium-like database change logs to provide before and after image and 
operation field for streaming changes from Hudi tables, friendly to engines 
like Flink;
+- [New Record Merge 
API](https://github.com/apache/hudi/blob/master/rfc/rfc-46/rfc-46.md) to 
support engine-specific record representation for more efficient writes;
+- [Early detection of 
conflicts](https://github.com/apache/hudi/blob/master/rfc/rfc-56/rfc-56.md) 
among concurrent writers to give back compute resources proactively.
+
+The long-term vision of Apache Hudi is to make streaming data lake the 
mainstream, achieving sub-minute commit SLAs with stellar query performance and 
incremental ETLs.  We plan to harden the indexing subsystem with [Table 
APIs](https://github.com/apache/hudi/pull/7080) for easy integration with query 
engines and access to Hudi metadata and indexes, [Indexing 
Functions](https://github.com/apache/hudi/pull/7235) and [a Federated Storage 
Layer](https://github.com/apache/hudi/blob/master/rf [...]
+
+Check out [Hudi Roadmap](https://hudi.apache.org/roadmap) for more to come in 
2023!
+
+If you haven't tried Apache Hudi yet, 2023 is your year! Here are a few useful 
links to help you get started:
+ 
+1. [Apache Hudi Docs](https://hudi.apache.org/docs/overview)
+2. [Hudi Slack 
Channel](https://join.slack.com/t/apache-hudi/shared_invite/zt-1e94d3xro-JvlNO1kSeIHJBTVfLPlI5w)
+3. [Hudi Weekly Office Hours](https://hudi.apache.org/community/office_hours) 
and [Monthly 
Meetup](https://hudi.apache.org/community/syncs#monthly-community-call)
+4. [Contributor Guide](https://hudi.apache.org/contribute/how-to-contribute)
+
+If you enjoyed Hudi in 2022 don't forget to give it a little star on 
[Github](https://github.com/apache/hudi/) ⭐
\ No newline at end of file
diff --git a/website/static/assets/images/blog/Apache-Hudi-2022-Review.png 
b/website/static/assets/images/blog/Apache-Hudi-2022-Review.png
new file mode 100644
index 00000000000..1d35f21e8cb
Binary files /dev/null and 
b/website/static/assets/images/blog/Apache-Hudi-2022-Review.png differ
diff --git a/website/static/assets/images/blog/Apache-Hudi-Conferences.png 
b/website/static/assets/images/blog/Apache-Hudi-Conferences.png
new file mode 100644
index 00000000000..29f5cab0598
Binary files /dev/null and 
b/website/static/assets/images/blog/Apache-Hudi-Conferences.png differ
diff --git 
a/website/static/assets/images/blog/Apache-Hudi-Pull-Request-History.png 
b/website/static/assets/images/blog/Apache-Hudi-Pull-Request-History.png
new file mode 100644
index 00000000000..f0b0a9648c1
Binary files /dev/null and 
b/website/static/assets/images/blog/Apache-Hudi-Pull-Request-History.png differ

[hudi] branch asf-site updated: [DOCS][BLOG] 2022 Blog post (#7581)

Reply via email to