This is an automated email from the ASF dual-hosted git repository.

git-site-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new efdbfba  Publishing website 2021/12/03 18:03:23 at commit 862ece1
efdbfba is described below

commit efdbfbab9e1221d92319bbc4728d12382197ac73
Author: jenkins <bui...@apache.org>
AuthorDate: Fri Dec 3 18:03:24 2021 +0000

    Publishing website 2021/12/03 18:03:23 at commit 862ece1
---
 website/generated-content/documentation/index.xml  | 39 +++++++++++++-
 .../index.html                                     |  2 +-
 .../runners/capability-matrix/index.html           |  2 +-
 .../runners/capability-matrix/index.xml            | 62 ++++++++++++++++++++++
 .../documentation/runtime/model/index.html         | 29 ++++++++--
 website/generated-content/sitemap.xml              |  2 +-
 6 files changed, 128 insertions(+), 8 deletions(-)

diff --git a/website/generated-content/documentation/index.xml b/website/generated-content/documentation/index.xml
index 7289a86..fc279a0 100644
--- a/website/generated-content/documentation/index.xml
+++ b/website/generated-content/documentation/index.xml
@@ -15395,7 +15395,9 @@ serializing the elements and broadcasting them to all the workers executing
 the &lt;code>ParDo&lt;/code>.&lt;/li>
 &lt;li>Passing elements between transforms that are running on the same worker.
 This may allow the runner to avoid serializing elements; instead, the runner
-can just pass the elements in memory.&lt;/li>
+can just pass the elements in memory. This is done as part of an
+optimization that is known as
+&lt;a href="https://beam.apache.org/documentation/glossary/#fusion">fusion&lt;/a>.&lt;/li>
 &lt;/ul>
 &lt;p>Some situations where the runner may serialize and persist elements are:&lt;/p>
 &lt;ol>
@@ -15420,6 +15422,41 @@ choose an appropriate middle-ground between persisting results after every
 element, and having to retry everything if there is a failure. For example, a
 streaming runner may prefer to process and commit small bundles, and a batch
 runner may prefer to process larger bundles.&lt;/p>
+&lt;h3 id="data-partitioning-and-inter-stage-execution">Data partitioning and inter-stage execution&lt;/h3>
+&lt;p>Partitioning and parallelization of element processing within a Beam pipeline is
+dependent on two things:&lt;/p>
+&lt;ul>
+&lt;li>Data source implementation&lt;/li>
+&lt;li>Inter-stage key parallelism&lt;/li>
+&lt;/ul>
+&lt;p>Beam pipelines read data from a source (e.g. &lt;code>KafkaIO&lt;/code>, &lt;code>BigQueryIO&lt;/code>, &lt;code>JdbcIO&lt;/code>,
+or your own source implementation). To implement a Source in Beam one must
+implement it as a Splittable &lt;code>DoFn&lt;/code>. A Splittable &lt;code>DoFn&lt;/code> provides the runner
+with interfaces to facilitate the splitting of work.&lt;/p>
+&lt;p>When running key-based operations in Beam (e.g. &lt;code>GroupByKey&lt;/code>, &lt;code>Combine&lt;/code>,
+&lt;code>Reshuffle.perKey&lt;/code>, and stateful &lt;code>DoFn&lt;/code>s), Beam runners perform serialization
+and transfer of data known as &lt;em>shuffle&lt;/em>&lt;sup>1&lt;/sup>. Shuffle allows data
+elements of the same key to be processed together.&lt;/p>
+&lt;p>The way in which runners &lt;em>shuffle&lt;/em> data may be slightly different for Batch and
+Streaming execution modes.&lt;/p>
+&lt;p>&lt;sup>1&lt;/sup>Not to be confused with the &lt;code>shuffle&lt;/code> operation in some runners.&lt;/p>
+&lt;h4 id="data-ordering-in-a-pipeline-execution">Data ordering in a pipeline execution&lt;/h4>
+&lt;p>The Beam model does not define strict guidelines regarding the order in which
+runners process elements or transport them across &lt;code>PTransforms&lt;/code>. Runners are
+free to implement data transfer semantics in different forms.&lt;/p>
+&lt;p>Some use cases exist where user pipelines may need to rely on specific ordering
+semantics in pipeline execution. The &lt;a href="/documentation/runners/capability-matrix/additional-common-features-not-yet-part-of-the-beam-model/index.html">capability matrix documents&lt;/a>
+runner behavior for &lt;strong>key-ordered delivery&lt;/strong>.&lt;/p>
+&lt;p>Consider a single Beam worker processing a series of bundles from the same Beam
+transform, and consider a &lt;code>PTransform&lt;/code> that outputs data from this Stage into a
+downstream &lt;code>PCollection&lt;/code>. Finally, consider two events &lt;em>with the same key&lt;/em>
+emitted in a certain order by this worker (within the same bundle or as part of
+different bundles).&lt;/p>
+&lt;p>We say that the Beam runner supports &lt;strong>key-ordered delivery&lt;/strong> if it guarantees
+that these two events will be observed in the same order by a PTransform that is
+immediately downstream independently of the kind of data transmission method.&lt;/p>
+&lt;p>This characteristic will hold true in runners and operations that have
+key-limited parallelism.&lt;/p>
 &lt;h2 id="parallelism">Failures and parallelism within and between transforms&lt;/h2>
 &lt;p>In this section, we discuss how elements in the input collection are processed
 in parallel, and how transforms are retried when failures occur.&lt;/p>
diff --git a/website/generated-content/documentation/runners/capability-matrix/additional-common-features-not-yet-part-of-the-beam-model/index.html b/website/generated-content/documentation/runners/capability-matrix/additional-common-features-not-yet-part-of-the-beam-model/index.html
index f3ef192..4082d17 100644
--- a/website/generated-content/documentation/runners/capability-matrix/additional-common-features-not-yet-part-of-the-beam-model/index.html
+++ b/website/generated-content/documentation/runners/capability-matrix/additional-common-features-not-yet-part-of-the-beam-model/index.html
@@ -18,7 +18,7 @@
 function addPlaceholder(){$('input:text').attr('placeholder',"What are you 
looking for?");}
 function endSearch(){var 
search=document.querySelector(".searchBar");search.classList.add("disappear");var
 icons=document.querySelector("#iconsBar");icons.classList.remove("disappear");}
 function blockScroll(){$("body").toggleClass("fixedPosition");}
-function openMenu(){addPlaceholder();blockScroll();}</script><div 
class="clearfix container-main-content"><a class=back-button 
href=/documentation/runners/capability-matrix><i class="fas 
fa-arrow-left"></i>back to collapsed details</a><h4>Additional common features 
not yet part of the Beam model</h4><div class=table-container><div 
class="table-left 
big-left"><table><tr><th></th></tr><tr><th>Drain</th></tr><tr><th>Checkpoint</th></tr></table></div><div
 class="table-right big-right"><div i [...]
+function openMenu(){addPlaceholder();blockScroll();}</script><div 
class="clearfix container-main-content"><a class=back-button 
href=/documentation/runners/capability-matrix><i class="fas 
fa-arrow-left"></i>back to collapsed details</a><h4>Additional common features 
not yet part of the Beam model</h4><div class=table-container><div 
class="table-left 
big-left"><table><tr><th></th></tr><tr><th>Drain</th></tr><tr><th>Checkpoint</th></tr><tr><th>Key-ordered
 delivery</th></tr></table></div><di [...]
 <a href=http://www.apache.org>The Apache Software Foundation</a>
 | <a href=/privacy_policy>Privacy Policy</a>
 | <a href=/feed.xml>RSS Feed</a><br><br>Apache Beam, Apache, Beam, the Beam 
logo, and the Apache feather logo are either registered trademarks or 
trademarks of The Apache Software Foundation. All other products or name brands 
are trademarks of their respective holders, including The Apache Software 
Foundation.</div></div></div></div></footer></body></html>
\ No newline at end of file
diff --git a/website/generated-content/documentation/runners/capability-matrix/index.html b/website/generated-content/documentation/runners/capability-matrix/index.html
index 057fe9a..245259b 100644
--- a/website/generated-content/documentation/runners/capability-matrix/index.html
+++ b/website/generated-content/documentation/runners/capability-matrix/index.html
@@ -18,7 +18,7 @@
 function addPlaceholder(){$('input:text').attr('placeholder',"What are you 
looking for?");}
 function endSearch(){var 
search=document.querySelector(".searchBar");search.classList.add("disappear");var
 icons=document.querySelector("#iconsBar");icons.classList.remove("disappear");}
 function blockScroll(){$("body").toggleClass("fixedPosition");}
-function openMenu(){addPlaceholder();blockScroll();}</script><div 
class="clearfix container-main-content"><div class="section-nav closed" 
data-offset-top=90 data-offset-bottom=500><span class="section-nav-back 
glyphicon glyphicon-menu-left"></span><nav><ul class=section-nav-list 
data-section-nav><li><span 
class=section-nav-list-main-title>Runners</span></li><li><a 
href=/documentation/runners/capability-matrix/>Capability Matrix</a></li><li><a 
href=/documentation/runners/direct/>Direct Ru [...]
+function openMenu(){addPlaceholder();blockScroll();}</script><div 
class="clearfix container-main-content"><div class="section-nav closed" 
data-offset-top=90 data-offset-bottom=500><span class="section-nav-back 
glyphicon glyphicon-menu-left"></span><nav><ul class=section-nav-list 
data-section-nav><li><span 
class=section-nav-list-main-title>Runners</span></li><li><a 
href=/documentation/runners/capability-matrix/>Capability Matrix</a></li><li><a 
href=/documentation/runners/direct/>Direct Ru [...]
 
<script>$('.table-headers').scroll(function(e){$('#'+this.id+'.table-center').scrollLeft($(this).scrollLeft());});$('.table-center').scroll(function(e){$('#'+this.id+'.table-headers').scrollLeft($(this).scrollLeft());});</script><div
 class=feedback><p class=update>Last updated on 2021/02/05</p><h3>Have you 
found everything you were looking for?</h3><p class=description>Was it all 
useful and clear? Is there anything that you would like to change? Let us 
know!</p><button class=load-button> [...]
 <a href=http://www.apache.org>The Apache Software Foundation</a>
 | <a href=/privacy_policy>Privacy Policy</a>
diff --git a/website/generated-content/documentation/runners/capability-matrix/index.xml b/website/generated-content/documentation/runners/capability-matrix/index.xml
index 8ac8d1b..88ede27 100644
--- a/website/generated-content/documentation/runners/capability-matrix/index.xml
+++ b/website/generated-content/documentation/runners/capability-matrix/index.xml
@@ -28,6 +28,9 @@ back to collapsed details
 &lt;tr>
 &lt;th>Checkpoint&lt;/th>
 &lt;/tr>
+&lt;tr>
+&lt;th>Key-ordered delivery&lt;/th>
+&lt;/tr>
 &lt;/table>
 &lt;/div>
 &lt;div class="table-right big-right">
@@ -155,6 +158,65 @@ Samza has a native checkpoint capability.
 &lt;br>
 &lt;/td>
 &lt;/tr>
+&lt;tr>
+&lt;td style='background-color:#f9f9f9;border-color:#d8d8d8'>
+&lt;b>
+&lt;p>Partially : &lt;/p>
+&lt;/b>
+&lt;br>
+Dataflow performs different shuffling algorithms for batch and streaming. Dataflow guarantees key-ordered delivery in streaming, though not in batch.
+&lt;/td>
+&lt;td style='background-color:#f9f9f9;border-color:#d8d8d8'>
+&lt;b>
+&lt;p>Partially : &lt;/p>
+&lt;/b>
+&lt;br>
+Flink may perform different shuffling algorithms for batch and streaming. Flink guarantees key-ordered delivery in streaming, though not in batch.
+&lt;/td>
+&lt;td style='background-color:#e1e0e0;border-color:#bcbcbc'>
+&lt;b>
+&lt;p>Unverified : &lt;/p>
+&lt;/b>
+&lt;br>
+&lt;/td>
+&lt;td style='background-color:#e1e0e0;border-color:#bcbcbc'>
+&lt;b>
+&lt;p>Unverified : &lt;/p>
+&lt;/b>
+&lt;br>
+&lt;/td>
+&lt;td style='background-color:#e1e0e0;border-color:#bcbcbc'>
+&lt;b>
+&lt;p>Unverified : &lt;/p>
+&lt;/b>
+&lt;br>
+&lt;/td>
+&lt;td style='background-color:#f9f9f9;border-color:#d8d8d8'>
+&lt;b>
+&lt;p>Partially : &lt;/p>
+&lt;/b>
+&lt;br>
+Samza may perform different shuffling algorithms for batch and streaming. Samza guarantees key-ordered delivery in streaming, though not in batch.
+&lt;/td>
+&lt;td style='background-color:#e1e0e0;border-color:#bcbcbc'>
+&lt;b>
+&lt;p>Unverified : &lt;/p>
+&lt;/b>
+&lt;br>
+&lt;/td>
+&lt;td style='background-color:#e1e0e0;border-color:#bcbcbc'>
+&lt;b>
+&lt;p>Unverified : &lt;/p>
+&lt;/b>
+&lt;br>
+&lt;/td>
+&lt;td style='background-color:#e1e0e0;border-color:#bcbcbc'>
+&lt;b>
+&lt;p>Unverified : &lt;/p>
+&lt;/b>
+&lt;br>
+&lt;/td>
+&lt;/tr>
 &lt;/table>
 &lt;/div>
 &lt;/div>
diff --git a/website/generated-content/documentation/runtime/model/index.html b/website/generated-content/documentation/runtime/model/index.html
index f612cc3..2a483c4 100644
--- a/website/generated-content/documentation/runtime/model/index.html
+++ b/website/generated-content/documentation/runtime/model/index.html
@@ -18,7 +18,7 @@
 function addPlaceholder(){$('input:text').attr('placeholder',"What are you 
looking for?");}
 function endSearch(){var 
search=document.querySelector(".searchBar");search.classList.add("disappear");var
 icons=document.querySelector("#iconsBar");icons.classList.remove("disappear");}
 function blockScroll(){$("body").toggleClass("fixedPosition");}
-function openMenu(){addPlaceholder();blockScroll();}</script><div 
class="clearfix container-main-content"><div class="section-nav closed" 
data-offset-top=90 data-offset-bottom=500><span class="section-nav-back 
glyphicon glyphicon-menu-left"></span><nav><ul class=section-nav-list 
data-section-nav><li><span 
class=section-nav-list-main-title>Documentation</span></li><li><a 
href=/documentation>Using the Documentation</a></li><li 
class=section-nav-item--collapsible><span class=section-nav-lis [...]
+function openMenu(){addPlaceholder();blockScroll();}</script><div 
class="clearfix container-main-content"><div class="section-nav closed" 
data-offset-top=90 data-offset-bottom=500><span class="section-nav-back 
glyphicon glyphicon-menu-left"></span><nav><ul class=section-nav-list 
data-section-nav><li><span 
class=section-nav-list-main-title>Documentation</span></li><li><a 
href=/documentation>Using the Documentation</a></li><li 
class=section-nav-item--collapsible><span class=section-nav-lis [...]
 may observe various effects as a result of the runner’s choices. This page
 describes these effects so you can better understand how Beam pipelines 
execute.</p><h2 id=processing-of-elements>Processing of elements</h2><p>The 
serialization and communication of elements between machines is one of the
 most expensive operations in a distributed execution of your pipeline. Avoiding
@@ -32,7 +32,9 @@ involve serializing elements and communicating them to other 
workers.</li><li>Us
 serializing the elements and broadcasting them to all the workers executing
 the <code>ParDo</code>.</li><li>Passing elements between transforms that are 
running on the same worker.
 This may allow the runner to avoid serializing elements; instead, the runner
-can just pass the elements in memory.</li></ul><p>Some situations where the 
runner may serialize and persist elements are:</p><ol><li>When used as part of 
a stateful <code>DoFn</code>, the runner may persist values to some
+can just pass the elements in memory. This is done as part of an
+optimization that is known as
+<a 
href=https://beam.apache.org/documentation/glossary/#fusion>fusion</a>.</li></ul><p>Some
 situations where the runner may serialize and persist elements 
are:</p><ol><li>When used as part of a stateful <code>DoFn</code>, the runner 
may persist values to some
 state mechanism.</li><li>When committing the results of processing, the runner 
may persist the outputs
 as a checkpoint.</li></ol><h3 id=bundling-and-persistence>Bundling and 
persistence</h3><p>Beam pipelines often focus on &ldquo;<a 
href=https://en.wikipedia.org/wiki/embarrassingly_parallel>embarassingly 
parallel</a>&rdquo;
 problems. Because of this, the APIs emphasize processing elements in parallel,
@@ -46,7 +48,26 @@ bundles is arbitrary and selected by the runner. This allows 
the runner to
 choose an appropriate middle-ground between persisting results after every
 element, and having to retry everything if there is a failure. For example, a
 streaming runner may prefer to process and commit small bundles, and a batch
-runner may prefer to process larger bundles.</p><h2 id=parallelism>Failures 
and parallelism within and between transforms</h2><p>In this section, we 
discuss how elements in the input collection are processed
+runner may prefer to process larger bundles.</p><h3 
id=data-partitioning-and-inter-stage-execution>Data partitioning and 
inter-stage execution</h3><p>Partitioning and parallelization of element 
processing within a Beam pipeline is
+dependent on two things:</p><ul><li>Data source 
implementation</li><li>Inter-stage key parallelism</li></ul><p>Beam pipelines 
read data from a source (e.g. <code>KafkaIO</code>, <code>BigQueryIO</code>, 
<code>JdbcIO</code>,
+or your own source implementation). To implement a Source in Beam one must
+implement it as a Splittable <code>DoFn</code>. A Splittable <code>DoFn</code> 
provides the runner
+with interfaces to facilitate the splitting of work.</p><p>When running 
key-based operations in Beam (e.g. <code>GroupByKey</code>, 
<code>Combine</code>,
+<code>Reshuffle.perKey</code>, and stateful <code>DoFn</code>s), Beam runners 
perform serialization
+and transfer of data known as <em>shuffle</em><sup>1</sup>. Shuffle allows data
+elements of the same key to be processed together.</p><p>The way in which 
runners <em>shuffle</em> data may be slightly different for Batch and
+Streaming execution modes.</p><p><sup>1</sup>Not to be confused with the 
<code>shuffle</code> operation in some runners.</p><h4 
id=data-ordering-in-a-pipeline-execution>Data ordering in a pipeline 
execution</h4><p>The Beam model does not define strict guidelines regarding the 
order in which
+runners process elements or transport them across <code>PTransforms</code>. 
Runners are
+free to implement data transfer semantics in different forms.</p><p>Some use 
cases exist where user pipelines may need to rely on specific ordering
+semantics in pipeline execution. The <a 
href=/documentation/runners/capability-matrix/additional-common-features-not-yet-part-of-the-beam-model/index.html>capability
 matrix documents</a>
+runner behavior for <strong>key-ordered delivery</strong>.</p><p>Consider a 
single Beam worker processing a series of bundles from the same Beam
+transform, and consider a <code>PTransform</code> that outputs data from this 
Stage into a
+downstream <code>PCollection</code>. Finally, consider two events <em>with the 
same key</em>
+emitted in a certain order by this worker (within the same bundle or as part of
+different bundles).</p><p>We say that the Beam runner supports 
<strong>key-ordered delivery</strong> if it guarantees
+that these two events will be observed in the same order by a PTransform that 
is
+immediately downstream independently of the kind of data transmission 
method.</p><p>This characteristic will hold true in runners and operations that 
have
+key-limited parallelism.</p><h2 id=parallelism>Failures and parallelism within 
and between transforms</h2><p>In this section, we discuss how elements in the 
input collection are processed
 in parallel, and how transforms are retried when failures occur.</p><h3 
id=data-parallelism>Data-parallelism within one transform</h3><p>When executing 
a single <code>ParDo</code>, a runner might divide an example input
 collection of nine elements into two bundles as shown in figure 1.</p><p><img 
src=/images/execution_model_bundling.svg alt="Bundle A contains five elements. 
Bundle B contains four elements."></p><p><em>Figure 1: A runner divides an 
input collection into two bundles.</em></p><p>When the <code>ParDo</code> 
executes, workers may process the two bundles in parallel as
 shown in figure 2.</p><p><img src=/images/execution_model_bundling_gantt.svg 
alt="Two workers process the two bundles in parallel. Worker one processes 
bundle A. Worker two processes bundle B."></p><p><em>Figure 2: Two workers 
process the two bundles in parallel.</em></p><p>Since elements cannot be split, 
the maximum parallelism for a transform depends
@@ -87,7 +108,7 @@ elements in the input bundle must be retried. These two 
<code>ParDo</code>s are
 the input bundle are retried.</em></p><p>Note that the retry does not 
necessarily have the same processing time as the
 original attempt, as shown in the diagram.</p><p>All <code>DoFns</code> that 
experience coupled failures are terminated and must be torn
 down since they aren’t following the normal <code>DoFn</code> lifecycle 
.</p><p>Executing transforms this way allows a runner to avoid persisting 
elements
-between transforms, saving on persistence costs.</p><div class=feedback><p 
class=update>Last updated on 2020/08/28</p><h3>Have you found everything you 
were looking for?</h3><p class=description>Was it all useful and clear? Is 
there anything that you would like to change? Let us know!</p><button 
class=load-button><a href="mailto:d...@beam.apache.org?subject=Beam Website 
Feedback">SEND FEEDBACK</a></button></div></div></div><footer class=footer><div 
class=footer__contained><div class=foote [...]
+between transforms, saving on persistence costs.</p><div class=feedback><p 
class=update>Last updated on 2021/12/03</p><h3>Have you found everything you 
were looking for?</h3><p class=description>Was it all useful and clear? Is 
there anything that you would like to change? Let us know!</p><button 
class=load-button><a href="mailto:d...@beam.apache.org?subject=Beam Website 
Feedback">SEND FEEDBACK</a></button></div></div></div><footer class=footer><div 
class=footer__contained><div class=foote [...]
 <a href=http://www.apache.org>The Apache Software Foundation</a>
 | <a href=/privacy_policy>Privacy Policy</a>
 | <a href=/feed.xml>RSS Feed</a><br><br>Apache Beam, Apache, Beam, the Beam 
logo, and the Apache feather logo are either registered trademarks or 
trademarks of The Apache Software Foundation. All other products or name brands 
are trademarks of their respective holders, including The Apache Software 
Foundation.</div></div></div></div></footer></body></html>
\ No newline at end of file
diff --git a/website/generated-content/sitemap.xml b/website/generated-content/sitemap.xml
index 424aa32..313115d 100644
--- a/website/generated-content/sitemap.xml
+++ b/website/generated-content/sitemap.xml
@@ -1 +1 @@
-<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/blog/beam-2.34.0/</loc><lastmod>2021-11-11T11:07:06-08:00</lastmod></url><url><loc>/categories/blog/</loc><lastmod>2021-11-11T11:07:06-08:00</lastmod></url><url><loc>/blog/</loc><lastmod>2021-11-11T11:07:06-08:00</lastmod></url><url><loc>/categories/</loc><lastmod>2021-12-01T21:32:04+03:00</lastmod></url><url><loc>/blog/g
 [...]
\ No newline at end of file
+<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/blog/beam-2.34.0/</loc><lastmod>2021-11-11T11:07:06-08:00</lastmod></url><url><loc>/categories/blog/</loc><lastmod>2021-11-11T11:07:06-08:00</lastmod></url><url><loc>/blog/</loc><lastmod>2021-11-11T11:07:06-08:00</lastmod></url><url><loc>/categories/</loc><lastmod>2021-12-01T21:32:04+03:00</lastmod></url><url><loc>/blog/g
 [...]
\ No newline at end of file
