This is an automated email from the ASF dual-hosted git repository.

git-site-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new b5fbe9e  Publishing website 2021/10/27 00:01:45 at commit 5cb634e
b5fbe9e is described below

commit b5fbe9e0dc7f19c65d536b81f83089af018a322f
Author: jenkins <bui...@apache.org>
AuthorDate: Wed Oct 27 00:01:46 2021 +0000

    Publishing website 2021/10/27 00:01:45 at commit 5cb634e
---
 .../documentation/basics/index.html                | 26 +++++++++++++--
 website/generated-content/documentation/index.xml  | 38 ++++++++++++++++++++++
 website/generated-content/sitemap.xml              |  2 +-
 3 files changed, 62 insertions(+), 4 deletions(-)

diff --git a/website/generated-content/documentation/basics/index.html b/website/generated-content/documentation/basics/index.html
index 14a033f..8dc0acd 100644
--- a/website/generated-content/documentation/basics/index.html
+++ b/website/generated-content/documentation/basics/index.html
@@ -18,7 +18,7 @@
 function addPlaceholder(){$('input:text').attr('placeholder',"What are you looking for?");}
 function endSearch(){var search=document.querySelector(".searchBar");search.classList.add("disappear");var icons=document.querySelector("#iconsBar");icons.classList.remove("disappear");}
 function blockScroll(){$("body").toggleClass("fixedPosition");}
-function openMenu(){addPlaceholder();blockScroll();}</script><div class="clearfix container-main-content"><div class="section-nav closed" data-offset-top=90 data-offset-bottom=500><span class="section-nav-back glyphicon glyphicon-menu-left"></span><nav><ul class=section-nav-list data-section-nav><li><span class=section-nav-list-main-title>Documentation</span></li><li><a href=/documentation>Using the Documentation</a></li><li class=section-nav-item--collapsible><span class=section-nav-lis [...]
+function openMenu(){addPlaceholder();blockScroll();}</script><div class="clearfix container-main-content"><div class="section-nav closed" data-offset-top=90 data-offset-bottom=500><span class="section-nav-back glyphicon glyphicon-menu-left"></span><nav><ul class=section-nav-list data-section-nav><li><span class=section-nav-list-main-title>Documentation</span></li><li><a href=/documentation>Using the Documentation</a></li><li class=section-nav-item--collapsible><span class=section-nav-lis [...] data-parallel processing pipelines. To get started with Beam, you’ll need to understand an important set of core concepts:</p><ul><li><a href=#pipeline><em>Pipeline</em></a> - A pipeline is a user-constructed graph of transformations that defines the desired data processing operations.</li><li><a href=#pcollection><em>PCollection</em></a> - A <code>PCollection</code> is a data set or data @@ -33,7 +33,10 @@ a <code>PCollection</code>. The schema for a <code>PCollection</code> defines el <code>PCollection</code> as an ordered list of named fields.</li><li><a href=/documentation/sdks/java/><em>SDK</em></a> - A language-specific library that lets pipeline authors build transforms, construct their pipelines, and submit them to a runner.</li><li><a href=#runner><em>Runner</em></a> - A runner runs a Beam pipeline using the capabilities of -your chosen data processing engine.</li></ul><p>The following sections cover these concepts in more detail and provide links to +your chosen data processing engine.</li><li><a href=#splittable-dofn><em>Splittable DoFn</em></a> - Splittable DoFns let you process +elements in a non-monolithic way. 
You can checkpoint the processing of an +element, and the runner can split the remaining work to yield additional +parallelism.</li></ul><p>The following sections cover these concepts in more detail and provide links to additional documentation.</p><h3 id=pipeline>Pipeline</h3><p>A Beam pipeline is a graph (specifically, a <a href=https://en.wikipedia.org/wiki/Directed_acyclic_graph>directed acyclic graph</a>) of all the data and computations in your data processing task. This includes @@ -182,7 +185,24 @@ Flink runner translates a Beam pipeline into a Flink job. The Direct Runner runs pipelines locally so you can test, debug, and validate that your pipeline adheres to the Apache Beam model as closely as possible.</p><p>For an up-to-date list of Beam runners and which features of the Apache Beam model they support, see the runner -<a href=/documentation/runners/capability-matrix/>capability matrix</a>.</p><p>For more information about runners, see the following pages:</p><ul><li><a href=/documentation/#choosing-a-runner>Choosing a Runner</a></li><li><a href=/documentation/runners/capability-matrix/>Beam Capability Matrix</a></li></ul><div class=feedback><p class=update>Last updated on 2021/10/25</p><h3>Have you found everything you were looking for?</h3><p class=description>Was it all useful and clear? Is there an [...] +<a href=/documentation/runners/capability-matrix/>capability matrix</a>.</p><p>For more information about runners, see the following pages:</p><ul><li><a href=/documentation/#choosing-a-runner>Choosing a Runner</a></li><li><a href=/documentation/runners/capability-matrix/>Beam Capability Matrix</a></li></ul><h3 id=splittable-dofn>Splittable DoFn</h3><p>Splittable <code>DoFn</code> (SDF) is a generalization of <code>DoFn</code> that lets you process +elements in a non-monolithic way. 
Splittable <code>DoFn</code> makes it easier to create +complex, modular I/O connectors in Beam.</p><p>A regular <code>ParDo</code> processes an entire element at a time, applying your regular +<code>DoFn</code> and waiting for the call to terminate. When you instead apply a +splittable <code>DoFn</code> to each element, the runner has the option of splitting the +element’s processing into smaller tasks. You can checkpoint the processing of an +element, and you can split the remaining work to yield additional parallelism.</p><p>For example, imagine you want to read every line from very large text files. +When you write your splittable <code>DoFn</code>, you can have separate pieces of logic to +read a segment of a file, split a segment of a file into sub-segments, and +report progress through the current segment. The runner can then invoke your +splittable <code>DoFn</code> intelligently to split up each input and read portions +separately, in parallel.</p><p>A common computation pattern has the following steps:</p><ol><li>The runner splits an incoming element before starting any processing.</li><li>The runner starts running your processing logic on each sub-element.</li><li>If the runner notices that some sub-elements are taking longer than others, +the runner splits those sub-elements further and repeats step 2.</li><li>The sub-element either finishes processing, or the user chooses to +checkpoint the sub-element and the runner repeats step 2.</li></ol><p>You can also write your splittable <code>DoFn</code> so the runner can split the unbounded +processing. For example, if you write a splittable <code>DoFn</code> to watch a set of +directories and output filenames as they arrive, you can split to subdivide the +work of different directories. 
This allows the runner to split off a hot +directory and give it additional resources.</p><p>For more information about Splittable <code>DoFn</code>, see the following pages:</p><ul><li><a href=/documentation/programming-guide/#splittable-dofns>Splittable DoFns</a></li><li><a href=/blog/splittable-do-fn-is-available/>Splittable DoFn in Apache Beam is Ready to Use</a></li></ul><div class=feedback><p class=update>Last updated on 2021/10/26</p><h3>Have you found everything you were looking for?</h3><p class=description>Was it all us [...] <a href=http://www.apache.org>The Apache Software Foundation</a> | <a href=/privacy_policy>Privacy Policy</a> | <a href=/feed.xml>RSS Feed</a><br><br>Apache Beam, Apache, Beam, the Beam logo, and the Apache feather logo are either registered trademarks or trademarks of The Apache Software Foundation. All other products or name brands are trademarks of their respective holders, including The Apache Software Foundation.</div></div></div></div></footer></body></html> \ No newline at end of file diff --git a/website/generated-content/documentation/index.xml b/website/generated-content/documentation/index.xml index 69e5ee9..3fd4cca 100644 --- a/website/generated-content/documentation/index.xml +++ b/website/generated-content/documentation/index.xml @@ -3205,6 +3205,10 @@ pipeline authors build transforms, construct their pipelines, and submit them to a runner.</li> <li><a href="#runner"><em>Runner</em></a> - A runner runs a Beam pipeline using the capabilities of your chosen data processing engine.</li> +<li><a href="#splittable-dofn"><em>Splittable DoFn</em></a> - Splittable DoFns let you process +elements in a non-monolithic way. 
You can checkpoint the processing of an +element, and the runner can split the remaining work to yield additional +parallelism.</li> </ul> <p>The following sections cover these concepts in more detail and provide links to additional documentation.</p> @@ -3472,6 +3476,40 @@ model they support, see the runner <ul> <li><a href="/documentation/#choosing-a-runner">Choosing a Runner</a></li> <li><a href="/documentation/runners/capability-matrix/">Beam Capability Matrix</a></li> +</ul> +<h3 id="splittable-dofn">Splittable DoFn</h3> +<p>Splittable <code>DoFn</code> (SDF) is a generalization of <code>DoFn</code> that lets you process +elements in a non-monolithic way. Splittable <code>DoFn</code> makes it easier to create +complex, modular I/O connectors in Beam.</p> +<p>A regular <code>ParDo</code> processes an entire element at a time, applying your regular +<code>DoFn</code> and waiting for the call to terminate. When you instead apply a +splittable <code>DoFn</code> to each element, the runner has the option of splitting the +element&rsquo;s processing into smaller tasks. You can checkpoint the processing of an +element, and you can split the remaining work to yield additional parallelism.</p> +<p>For example, imagine you want to read every line from very large text files. +When you write your splittable <code>DoFn</code>, you can have separate pieces of logic to +read a segment of a file, split a segment of a file into sub-segments, and +report progress through the current segment. 
The runner can then invoke your +splittable <code>DoFn</code> intelligently to split up each input and read portions +separately, in parallel.</p> +<p>A common computation pattern has the following steps:</p> +<ol> +<li>The runner splits an incoming element before starting any processing.</li> +<li>The runner starts running your processing logic on each sub-element.</li> +<li>If the runner notices that some sub-elements are taking longer than others, +the runner splits those sub-elements further and repeats step 2.</li> +<li>The sub-element either finishes processing, or the user chooses to +checkpoint the sub-element and the runner repeats step 2.</li> +</ol> +<p>You can also write your splittable <code>DoFn</code> so the runner can split the unbounded +processing. For example, if you write a splittable <code>DoFn</code> to watch a set of +directories and output filenames as they arrive, you can split to subdivide the +work of different directories. This allows the runner to split off a hot +directory and give it additional resources.</p> +<p>For more information about Splittable <code>DoFn</code>, see the following pages:</p> +<ul> +<li><a href="/documentation/programming-guide/#splittable-dofns">Splittable DoFns</a></li> +<li><a href="/blog/splittable-do-fn-is-available/">Splittable DoFn in Apache Beam is Ready to Use</a></li> </ul></description></item><item><title>Documentation: Beam glossary</title><link>/documentation/glossary/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/documentation/glossary/</guid><description> <!-- Licensed under the Apache License, Version 2.0 (the "License"); diff --git a/website/generated-content/sitemap.xml b/website/generated-content/sitemap.xml index 32613c6..b570578 100644 --- a/website/generated-content/sitemap.xml +++ b/website/generated-content/sitemap.xml @@ -1 +1 @@ -<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" 
xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/blog/beam-2.33.0/</loc><lastmod>2021-10-11T18:22:03-07:00</lastmod></url><url><loc>/categories/blog/</loc><lastmod>2021-10-11T18:22:03-07:00</lastmod></url><url><loc>/blog/</loc><lastmod>2021-10-11T18:22:03-07:00</lastmod></url><url><loc>/categories/</loc><lastmod>2021-10-11T18:22:03-07:00</lastmod></url><url><loc>/blog/b [...] \ No newline at end of file +<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/blog/beam-2.33.0/</loc><lastmod>2021-10-11T18:22:03-07:00</lastmod></url><url><loc>/categories/blog/</loc><lastmod>2021-10-11T18:22:03-07:00</lastmod></url><url><loc>/blog/</loc><lastmod>2021-10-11T18:22:03-07:00</lastmod></url><url><loc>/categories/</loc><lastmod>2021-10-11T18:22:03-07:00</lastmod></url><url><loc>/blog/b [...] \ No newline at end of file
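
Note on the Splittable DoFn section added by this commit: the four-step pattern it describes (initial split, process, further split, checkpoint) can be illustrated with a small, self-contained simulation. The sketch below is plain Python and does not use the Apache Beam SDK. Every name in it (OffsetRange, Tracker, initial_split, process, run, and the chunk and budget parameters) is hypothetical and chosen only for illustration. It assumes the work inside one element can be addressed as a half-open range of integer offsets, and it models a checkpoint as "stop after a fixed budget of positions and hand the remainder back to the runner".

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class OffsetRange:
    """Hypothetical half-open range [start, stop) of work within one element."""
    start: int
    stop: int


def initial_split(r: OffsetRange, chunk: int) -> List[OffsetRange]:
    """Step 1: split an element's work before any processing starts."""
    return [OffsetRange(s, min(s + chunk, r.stop))
            for s in range(r.start, r.stop, chunk)]


class Tracker:
    """Hypothetical tracker: records claimed positions and supports splitting."""

    def __init__(self, r: OffsetRange) -> None:
        self.range = r
        self.next_pos = r.start  # first position that has not been claimed

    def try_claim(self, pos: int) -> bool:
        """Claim `pos` for processing; fails once `pos` is past the range."""
        if pos >= self.range.stop:
            return False
        self.next_pos = pos + 1
        return True

    def try_split(self) -> Optional[OffsetRange]:
        """Steps 3 and 4: keep the claimed part, return the rest as residual."""
        if self.next_pos >= self.range.stop:
            return None  # everything was claimed, nothing left to hand back
        residual = OffsetRange(self.next_pos, self.range.stop)
        self.range = OffsetRange(self.range.start, self.next_pos)
        return residual


def process(r: OffsetRange, budget: int) -> Tuple[List[int], Optional[OffsetRange]]:
    """Step 2: process up to `budget` positions, then checkpoint the remainder."""
    if budget < 1:
        raise ValueError("budget must be at least 1 so that work makes progress")
    tracker = Tracker(r)
    out: List[int] = []
    pos = r.start
    while len(out) < budget and tracker.try_claim(pos):
        out.append(pos)  # stand-in for real per-position work, e.g. reading a line
        pos += 1
    return out, tracker.try_split()


def run(element: OffsetRange, chunk: int, budget: int) -> List[int]:
    """Toy runner loop: residuals go back on the queue, repeating step 2."""
    queue = initial_split(element, chunk)
    results: List[int] = []
    while queue:
        done, residual = process(queue.pop(0), budget)
        results.extend(done)
        if residual is not None:
            queue.append(residual)
    return sorted(results)
```

For example, run(OffsetRange(0, 10), chunk=4, budget=3) first splits the element into three sub-ranges, processes at most three positions from each, and re-queues the two leftover residuals until every offset from 0 through 9 has been processed exactly once. In this toy version the queue is drained sequentially; a real runner would be free to hand each sub-range or residual to a different worker, which is where the additional parallelism mentioned in the documentation comes from.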