This is an automated email from the ASF dual-hosted git repository. danny0405 pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
     new a702ced7f0f [DOCS] Diagram Changes for Clustering, Rollbacks, Table Types (#10510)
a702ced7f0f is described below

commit a702ced7f0f4e0e058ae0f0eaff28ec278f62fbf
Author: Dipankar Mazumdar <103004148+dipankarmazum...@users.noreply.github.com>
AuthorDate: Tue Jan 16 21:46:06 2024 -0500

    [DOCS] Diagram Changes for Clustering, Rollbacks, Table Types (#10510)

    * remaining diagrams

    * fixed issue with rollbacks page

    ---------

    Co-authored-by: Dipankar Mazumdar <dipankarmazum...@dipankars-mbp-2.home>
---
 website/docs/clustering.md | 6 +++---
 website/docs/rollbacks.md | 4 ++--
 website/docs/table_types.md | 4 ++--
 website/static/assets/images/COW_new.png | Bin 0 -> 1034864 bytes
 website/static/assets/images/MOR_new.png | Bin 0 -> 1342587 bytes
 .../assets/images/blog/clustering/clustering1_new.png | Bin 0 -> 1420549 bytes
 .../assets/images/blog/clustering/clustering2_new.png | Bin 0 -> 302821 bytes
 .../assets/images/blog/clustering/clustering_3.png | Bin 0 -> 513090 bytes
 .../assets/images/blog/rollbacks/Rollback_1.png | Bin 0 -> 311672 bytes
 .../assets/images/blog/rollbacks/rollback2_new.png | Bin 0 -> 569899 bytes
 10 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/website/docs/clustering.md b/website/docs/clustering.md
index 2feab1902ac..7749292b1cf 100644
--- a/website/docs/clustering.md
+++ b/website/docs/clustering.md
@@ -59,7 +59,7 @@ Clustering Service builds on Hudi’s MVCC based design to allow for writers to
 NOTE: Clustering can only be scheduled for tables / partitions not receiving any concurrent updates. In the future, concurrent updates use-case will be supported as well.
-![Clustering example](/assets/images/blog/clustering/example_perf_improvement.png)
+![Clustering example](/assets/images/blog/clustering/clustering1_new.png)
 _Figure: Illustrating query performance improvements by clustering_
 ## Clustering Usecases
@@ -71,7 +71,7 @@ such small files could lead to higher query latency. From our experience support
 few users who are using Hudi just for small file handling capabilities. So, you could employ clustering to batch a lot of such small files into larger ones.
-![Batching small files](/assets/images/clustering_small_files.gif)
+![Batching small files](/assets/images/blog/clustering/clustering2_new.png)
 ### Cluster by sort key
@@ -80,7 +80,7 @@ arrival time, while query predicates do not sit well with it. With clustering, y
 based on query predicates and so, your data skipping will be very efficient and your query can ignore scanning a lot of unnecessary data.
-![Batching small files](/assets/images/clustering_sort.gif)
+![Batching small files](/assets/images/blog/clustering/clustering_3.png)
 ## Clustering Strategies
diff --git a/website/docs/rollbacks.md b/website/docs/rollbacks.md
index 5a2ebf2a70b..c78b8f3b084 100644
--- a/website/docs/rollbacks.md
+++ b/website/docs/rollbacks.md
@@ -35,7 +35,7 @@ for any actions/commits that is not yet committed and that refers to partially f
 is triggered and all dirty data is cleaned up followed by cleaning up the commit instants from the timeline.
-![An example illustration of single writer rollbacks](/assets/images/blog/rollbacks/single_write_rollback.png)
+![An example illustration of single writer rollbacks](/assets/images/blog/rollbacks/Rollback_1.png)
 _Figure 1: single writer with eager rollbacks_
@@ -63,7 +63,7 @@ information whether the writer that started the commit of interest is still maki
 the commit, the heartbeat file is deleted.
 Or if the write failed midway, the last modification time of the heartbeat file is no longer updated, so other writers can deduce the failed write after a period of time elapses.
-![An example illustration of multi writer rollbacks](/assets/images/blog/rollbacks/multi_writer_rollback.png)
+![An example illustration of multi writer rollbacks](/assets/images/blog/rollbacks/rollback2_new.png)
 _Figure 2: multi-writer with lazy cleaning of failed commits_
 ## Related Resources
diff --git a/website/docs/table_types.md b/website/docs/table_types.md
index 28814d239e8..e280909a9f3 100644
--- a/website/docs/table_types.md
+++ b/website/docs/table_types.md
@@ -69,7 +69,7 @@ Following illustrates how this works conceptually, when data written into copy-o
 <figure>
-    <img className="docimage" src={require("/assets/images/hudi_cow.png").default} alt="hudi_cow.png" />
+    <img className="docimage" src={require("/assets/images/COW_new.png").default} alt="hudi_cow.png" />
 </figure>
@@ -97,7 +97,7 @@ their columnar base file, to keep the query performance in check (larger delta l
 Following illustrates how the table works, and shows two types of queries - snapshot query and read optimized query.
 <figure>
-    <img className="docimage" src={require("/assets/images/hudi_mor.png").default} alt="hudi_mor.png" />
+    <img className="docimage" src={require("/assets/images/MOR_new.png").default} alt="hudi_mor.png" />
 </figure>
 There are lot of interesting things happening in this example, which bring out the subtleties in the approach.
diff --git a/website/static/assets/images/COW_new.png b/website/static/assets/images/COW_new.png
new file mode 100644
index 00000000000..9a996e01c76
Binary files /dev/null and b/website/static/assets/images/COW_new.png differ
diff --git a/website/static/assets/images/MOR_new.png b/website/static/assets/images/MOR_new.png
new file mode 100644
index 00000000000..519e9eb6fb8
Binary files /dev/null and b/website/static/assets/images/MOR_new.png differ
diff --git a/website/static/assets/images/blog/clustering/clustering1_new.png b/website/static/assets/images/blog/clustering/clustering1_new.png
new file mode 100644
index 00000000000..6aec715ae6c
Binary files /dev/null and b/website/static/assets/images/blog/clustering/clustering1_new.png differ
diff --git a/website/static/assets/images/blog/clustering/clustering2_new.png b/website/static/assets/images/blog/clustering/clustering2_new.png
new file mode 100644
index 00000000000..5ccd84ab083
Binary files /dev/null and b/website/static/assets/images/blog/clustering/clustering2_new.png differ
diff --git a/website/static/assets/images/blog/clustering/clustering_3.png b/website/static/assets/images/blog/clustering/clustering_3.png
new file mode 100644
index 00000000000..8d1ca9275d6
Binary files /dev/null and b/website/static/assets/images/blog/clustering/clustering_3.png differ
diff --git a/website/static/assets/images/blog/rollbacks/Rollback_1.png b/website/static/assets/images/blog/rollbacks/Rollback_1.png
new file mode 100644
index 00000000000..cc3fd458c22
Binary files /dev/null and b/website/static/assets/images/blog/rollbacks/Rollback_1.png differ
diff --git a/website/static/assets/images/blog/rollbacks/rollback2_new.png b/website/static/assets/images/blog/rollbacks/rollback2_new.png
new file mode 100644
index 00000000000..f7bd86a5c0f
Binary files /dev/null and b/website/static/assets/images/blog/rollbacks/rollback2_new.png differ