This is an automated email from the ASF dual-hosted git repository. github-bot pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/beam.git
The following commit(s) were added to refs/heads/asf-site by this push: new 316fadd412f Publishing website 2024/03/29 05:37:43 at commit e3fee51 316fadd412f is described below commit 316fadd412fc1bb5a2d32ca7f7e38962745321ed Author: runner <runner@main-runner-zt478-b56pw> AuthorDate: Fri Mar 29 05:37:43 2024 +0000 Publishing website 2024/03/29 05:37:43 at commit e3fee51 --- website/generated-content/documentation/index.xml | 23 +++++++++-- .../documentation/programming-guide/index.html | 11 ++++- .../documentation/sdks/yaml-udf/index.html | 48 +++++++++++++++++++++- .../elementwise/enrichment-vertexai/index.html | 6 +-- website/generated-content/sitemap.xml | 2 +- 5 files changed, 80 insertions(+), 10 deletions(-) diff --git a/website/generated-content/documentation/index.xml b/website/generated-content/documentation/index.xml index 0e12a63f6a1..59bd3db239c 100644 --- a/website/generated-content/documentation/index.xml +++ b/website/generated-content/documentation/index.xml @@ -8416,6 +8416,19 @@ for instance).</p> </span></span><span class="line"><span class="cl"><span class="kr">const</span> <span class="nx">topDecile</span>: <span class="kt">PCollection</span><span class="p">&lt;</span><span class="nt">Student</span><span class="p">&gt;</span> <span class="o">=</span> <span class="nx">deciles</span><span class="p">[</span><span class="mi">9</span><span class="p">];</span></span></span></code></pre>& [...] </div> </div> +<div class='language-yaml snippet'> +<div class="notebook-skip code-snippet"> +<a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard"> +<img src="/images/copy-icon.svg"/> +</a> +<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">type</span><span class="p">:</span><span class="w"> </span><span class="l">Partition</span><span class="w"> +</span></span></span><span class="line"><span class="cl"><span class="w"></span><span class="nt">config</span><span class="p">:</span><span class="w"> +</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">by</span><span class="p">:</span><span class="w"> </span><span class="l">str(percentile // 10)</span><span class="w"> +</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">language</span><span class="p">:</span><span class="w"> </span><span class="l">python</span><span class="w"> +</span></span></span><span class="line"><span class="cl"><span class="w"> </span><span class="nt">outputs</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">&#34;0&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;1&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;2&#34;</s [...] +</div> +</div> +<p class="language-yaml">Note that in Beam YAML, <code>PCollections</code> are partitioned via string rather than integer values.</p> <h3 id="requirements-for-writing-user-code-for-beam-transforms">4.3. Requirements for writing user code for Beam transforms</h3> <p>When you build user code for a Beam transform, you should keep in mind the distributed nature of execution. For example, there might be many copies of your @@ -8759,6 +8772,10 @@ use <code>beam.ParDoN</code> which will return a <code>[]beam.PCollecti from <code>apply</code>). If you want to have multiple outputs, emit an object with distinct properties in your <code>ParDo</code> operation and follow this operation with a <code>Split</code> to break it into multiple <code>PCollection</code>s.</p> +<p class="language-yaml">In Beam YAML, one obtains multiple outputs by emitting all outputs to a single +<code>PCollection</code>, possibly with an extra field, and then using <code>Partition</code> to +split this single <code>PCollection</code> into multiple distinct <code>PCollection</code> +outputs.</p> <h4 id="output-tags">4.5.1. Tags for multiple outputs</h4> <p class="language-typescript">The <code>Split</code> PTransform will take a PCollection of elements of the form <code>{tagA?: A, tagB?: B, ...}</code> and return a object @@ -19448,9 +19465,9 @@ The following example demonstrates how to create a pipeline that use the enrichm <a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard"> <img src="/images/copy-icon.svg"/> </a> -<pre tabindex="0"><code>Row(user_id=&#39;2963&#39;, product_id=14235, sale_price=15.0, age=29.0, gender=&#39;1&#39;, state=&#39;97&#39;, country=&#39;2&#39;) -Row(user_id=&#39;21422&#39;, product_id=11203, sale_price=12.0, age=36.0, state=&#39;184&#39;, gender=&#39;1&#39;, country=&#39;5&#39;) -Row(user_id=&#39;20592&#39;, product_id=8579, sale_price=9.0, age=30.0, state=&#39;86&#39;, gender=&#39;1&#39;, country=&#39;4&#39;)</code></pre> +<pre tabindex="0"><code>Row(user_id=&#39;2963&#39;, product_id=14235, sale_price=15.0, age=12.0, state=&#39;1&#39;, gender=&#39;1&#39;, country=&#39;1&#39;) +Row(user_id=&#39;21422&#39;, product_id=11203, sale_price=12.0, age=12.0, state=&#39;0&#39;, gender=&#39;0&#39;, country=&#39;0&#39;) +Row(user_id=&#39;20592&#39;, product_id=8579, sale_price=9.0, age=12.0, state=&#39;2&#39;, gender=&#39;1&#39;, country=&#39;2&#39;)</code></pre> </div> </div> </p> diff --git a/website/generated-content/documentation/programming-guide/index.html b/website/generated-content/documentation/programming-guide/index.html index 2fd8f556e8a..81e8f467b64 100644 --- a/website/generated-content/documentation/programming-guide/index.html +++ b/website/generated-content/documentation/programming-guide/index.html @@ -1755,7 +1755,11 @@ for instance).</p><p>The following example divides a <code>PCollection</code> in </span></span><span class=line><span class=cl> <span class=mi>10</span> </span></span><span class=line><span class=cl> <span class=p>)</span> </span></span><span class=line><span class=cl><span class=p>);</span> -</span></span><span class=line><span class=cl><span class=kr>const</span> <span class=nx>topDecile</span>: <span class=kt>PCollection</span><span class=p><</span><span class=nt>Student</span><span class=p>></span> <span class=o>=</span> <span class=nx>deciles</span><span class=p>[</span><span class=mi>9</span><span class=p>];</span></span></span></code></pre></div></div></div><h3 id=requirements-for-writing-user-code-for-beam-transforms>4.3. Requirements for writing user code for B [...] +</span></span><span class=line><span class=cl><span class=kr>const</span> <span class=nx>topDecile</span>: <span class=kt>PCollection</span><span class=p><</span><span class=nt>Student</span><span class=p>></span> <span class=o>=</span> <span class=nx>deciles</span><span class=p>[</span><span class=mi>9</span><span class=p>];</span></span></span></code></pre></div></div></div><div class='language-yaml snippet'><div class="notebook-skip code-snippet"><a class=copy type=button data-b [...] +</span></span></span><span class=line><span class=cl><span class=w></span><span class=nt>config</span><span class=p>:</span><span class=w> +</span></span></span><span class=line><span class=cl><span class=w> </span><span class=nt>by</span><span class=p>:</span><span class=w> </span><span class=l>str(percentile // 10)</span><span class=w> +</span></span></span><span class=line><span class=cl><span class=w> </span><span class=nt>language</span><span class=p>:</span><span class=w> </span><span class=l>python</span><span class=w> +</span></span></span><span class=line><span class=cl><span class=w> </span><span class=nt>outputs</span><span class=p>:</span><span class=w> </span><span class=p>[</span><span class=s2>"0"</span><span class=p>,</span><span class=w> </span><span class=s2>"1"</span><span class=p>,</span><span class=w> </span><span class=s2>"2"</span><span class=p>,</span><span class=w> </span><span class=s2>"3"</span><span class=p>,</span><span class=w> </span><span class=s [...] distributed nature of execution. For example, there might be many copies of your function running on a lot of different machines in parallel, and those copies function independently, without communicating or sharing state with any of the @@ -2009,7 +2013,10 @@ function that matches the number of outputs. <code>beam.ParDo2</code> for two ou use <code>beam.ParDoN</code> which will return a <code>[]beam.PCollection</code>.</p><p class=language-typescript>While <code>ParDo</code> always produces a main output <code>PCollection</code> (as the return value from <code>apply</code>). If you want to have multiple outputs, emit an object with distinct properties in your <code>ParDo</code> operation and follow this operation with a <code>Split</code> -to break it into multiple <code>PCollection</code>s.</p><h4 id=output-tags>4.5.1. Tags for multiple outputs</h4><p class=language-typescript>The <code>Split</code> PTransform will take a PCollection of elements of the form +to break it into multiple <code>PCollection</code>s.</p><p class=language-yaml>In Beam YAML, one obtains multiple outputs by emitting all outputs to a single +<code>PCollection</code>, possibly with an extra field, and then using <code>Partition</code> to +split this single <code>PCollection</code> into multiple distinct <code>PCollection</code> +outputs.</p><h4 id=output-tags>4.5.1. Tags for multiple outputs</h4><p class=language-typescript>The <code>Split</code> PTransform will take a PCollection of elements of the form <code>{tagA?: A, tagB?: B, ...}</code> and return a object <code>{tagA: PCollection<A>, tagB: PCollection<B>, ...}</code>. The set of expected tags is passed to the operation; how multiple or diff --git a/website/generated-content/documentation/sdks/yaml-udf/index.html b/website/generated-content/documentation/sdks/yaml-udf/index.html index 1b7e8893714..11381d6a2f1 100644 --- a/website/generated-content/documentation/sdks/yaml-udf/index.html +++ b/website/generated-content/documentation/sdks/yaml-udf/index.html @@ -35,7 +35,7 @@ <img class=banner-img-mobile src=/images/banners/tour-of-beam/tour-of-beam-mobile.png alt="Start Tour of Beam"></a></div><div class=swiper-slide><a href=https://beam.apache.org/documentation/ml/overview/><img class=banner-img-desktop src=/images/banners/machine-learning/machine-learning-desktop.jpg alt="Machine Learning"> <img class=banner-img-mobile src=/images/banners/machine-learning/machine-learning-mobile.jpg alt="Machine Learning"></a></div></div><div class=swiper-pagination></div><div class=swiper-button-prev></div><div class=swiper-button-next></div></div><script src=/js/swiper-bundle.min.min.e0e8f81b0b15728d35ff73c07f42ddbb17a108d6f23df4953cb3e60df7ade675.js></script> <script src=/js/sliders/top-banners.min.afa7d0a19acf7a3b28ca369490b3d401a619562a2a4c9612577be2f66a4b9855.js></script> -<script>function showSearch(){addPlaceholder();var e,t=document.querySelector(".searchBar");t.classList.remove("disappear"),e=document.querySelector("#iconsBar"),e.classList.add("disappear")}function addPlaceholder(){$("input:text").attr("placeholder","What are you looking for?")}function endSearch(){var e,t=document.querySelector(".searchBar");t.classList.add("disappear"),e=document.querySelector("#iconsBar"),e.classList.remove("disappear")}function blockScroll(){$("body").toggleClass(" [...] +<script>function showSearch(){addPlaceholder();var e,t=document.querySelector(".searchBar");t.classList.remove("disappear"),e=document.querySelector("#iconsBar"),e.classList.add("disappear")}function addPlaceholder(){$("input:text").attr("placeholder","What are you looking for?")}function endSearch(){var e,t=document.querySelector(".searchBar");t.classList.add("disappear"),e=document.querySelector("#iconsBar"),e.classList.remove("disappear")}function blockScroll(){$("body").toggleClass(" [...] get data into the correct shape. The simplest of these is <code>MaptoFields</code> which creates records with new fields defined in terms of the input fields.</p><h2 id=field-renames>Field renames</h2><p>To rename fields one can write</p><pre tabindex=0><code>- type: MapToFields config: @@ -140,6 +140,52 @@ criteria. This can be accomplished with a <code>Filter</code> transform, e.g.</p config: language: sql keep: "col2 > 0" +</code></pre><h2 id=partitioning>Partitioning</h2><p>It can also be useful to send different elements to different places +(similar to what is done with side outputs in other SDKs). +While this can be done with a set of <code>Filter</code> operations, if every +element has a single destination it can be more natural to use a <code>Partition</code> +transform instead which sends every element to a unique output. +For example, this will send all elements where <code>col1</code> is equal to <code>"a"</code> to the +output <code>Partition.a</code>.</p><pre tabindex=0><code>- type: Partition + input: input + config: + by: col1 + outputs: ['a', 'b', 'c'] + +- type: SomeTransform + input: Partition.a + config: + param: ... + +- type: AnotherTransform + input: Partition.b + config: + param: ... +</code></pre><p>One can also specify the destination as a function, e.g.</p><pre tabindex=0><code>- type: Partition + input: input + config: + by: "'even' if col2 % 2 == 0 else 'odd'" + language: python + outputs: ['even', 'odd'] +</code></pre><p>One can optionally provide a catch-all output which will capture all elements +that are not in the named outputs (which would otherwise be an error):</p><pre tabindex=0><code>- type: Partition + input: input + config: + by: col1 + outputs: ['a', 'b', 'c'] + unknown_output: 'other' +</code></pre><p>Sometimes one wants to split a PCollection into multiple PCollections +that aren’t necessarily disjoint. To send elements to multiple (or no) outputs, +one could use an iterable column and precede the <code>Partition</code> with an <code>Explode</code>.</p><pre tabindex=0><code>- type: Explode + input: input + config: + fields: col1 + +- type: Partition + input: Explode + config: + by: col1 + outputs: ['a', 'b', 'c'] </code></pre><h2 id=types>Types</h2><p>Beam will try to infer the types involved in the mappings, but sometimes this is not possible. In these cases one can explicitly denote the expected output type, e.g.</p><pre tabindex=0><code>- type: MapToFields diff --git a/website/generated-content/documentation/transforms/python/elementwise/enrichment-vertexai/index.html b/website/generated-content/documentation/transforms/python/elementwise/enrichment-vertexai/index.html index db94e3b1463..1182222a3c4 100644 --- a/website/generated-content/documentation/transforms/python/elementwise/enrichment-vertexai/index.html +++ b/website/generated-content/documentation/transforms/python/elementwise/enrichment-vertexai/index.html @@ -64,9 +64,9 @@ The following example demonstrates how to create a pipeline that use the enrichm </span></span><span class=line><span class=cl> <span class=n>p</span> </span></span><span class=line><span class=cl> <span class=o>|</span> <span class=s2>"Create"</span> <span class=o>>></span> <span class=n>beam</span><span class=o>.</span><span class=n>Create</span><span class=p>(</span><span class=n>data</span><span class=p>)</span> </span></span><span class=line><span class=cl> <span class=o>|</span> <span class=s2>"Enrich W/ Vertex AI"</span> <span class=o>>></span> <span class=n>Enrichment</span><span class=p>(</span><span class=n>vertex_ai_handler</span><span class=p>)</span> -</span></span><span class=line><span class=cl> <span class=o>|</span> <span class=s2>"Print"</span> <span class=o>>></span> <span class=n>beam</span><span class=o>.</span><span class=n>Map</span><span class=p>(</span><span class=nb>print</span><span class=p>))</span></span></span></code></pre></div></div></div><p><p class=notebook-skip>Output:</p><div class=snippet><div class="notebook-skip code-snippet without_switcher"><a class=copy type=button data-bs-toggle=tooltip [...] -Row(user_id='21422', product_id=11203, sale_price=12.0, age=36.0, state='184', gender='1', country='5') -Row(user_id='20592', product_id=8579, sale_price=9.0, age=30.0, state='86', gender='1', country='4')</code></pre></div></div></p><h2 id=example-2-enrichment-with-vertex-ai-feature-store-legacy>Example 2: Enrichment with Vertex AI Feature Store (legacy)</h2><p>The precomputed feature values stored in Vertex AI Feature Store (Legacy) use the following format:</p><div class=table-wrapper><table><thead><tr><th style=text-align:left>entity_id</th><th style=text [...] +</span></span><span class=line><span class=cl> <span class=o>|</span> <span class=s2>"Print"</span> <span class=o>>></span> <span class=n>beam</span><span class=o>.</span><span class=n>Map</span><span class=p>(</span><span class=nb>print</span><span class=p>))</span></span></span></code></pre></div></div></div><p><p class=notebook-skip>Output:</p><div class=snippet><div class="notebook-skip code-snippet without_switcher"><a class=copy type=button data-bs-toggle=tooltip [...] +Row(user_id='21422', product_id=11203, sale_price=12.0, age=12.0, state='0', gender='0', country='0') +Row(user_id='20592', product_id=8579, sale_price=9.0, age=12.0, state='2', gender='1', country='2')</code></pre></div></div></p><h2 id=example-2-enrichment-with-vertex-ai-feature-store-legacy>Example 2: Enrichment with Vertex AI Feature Store (legacy)</h2><p>The precomputed feature values stored in Vertex AI Feature Store (Legacy) use the following format:</p><div class=table-wrapper><table><thead><tr><th style=text-align:left>entity_id</th><th style=text- [...] </span></span><span class=line><span class=cl><span class=kn>from</span> <span class=nn>apache_beam.transforms.enrichment</span> <span class=kn>import</span> <span class=n>Enrichment</span> </span></span><span class=line><span class=cl><span class=kn>from</span> <span class=nn>apache_beam.transforms.enrichment_handlers.vertex_ai_feature_store</span> \ </span></span><span class=line><span class=cl> <span class=kn>import</span> <span class=nn>VertexAIFeatureStoreLegacyEnrichmentHandler</span> diff --git a/website/generated-content/sitemap.xml b/website/generated-content/sitemap.xml index e0798bb575f..0c19eefeefe 100644 --- a/website/generated-content/sitemap.xml +++ b/website/generated-content/sitemap.xml @@ -1 +1 @@ -<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/blog/beam-2.55.0/</loc><lastmod>2024-03-28T13:02:16-07:00</lastmod></url><url><loc>/categories/blog/</loc><lastmod>2024-03-28T13:02:16-07:00</lastmod></url><url><loc>/blog/</loc><lastmod>2024-03-28T13:02:16-07:00</lastmod></url><url><loc>/categories/</loc><lastmod>2024-03-28T13:02:16-07:00</lastmod></url><url><loc>/catego [...] \ No newline at end of file +<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/blog/beam-2.55.0/</loc><lastmod>2024-03-28T17:25:00-07:00</lastmod></url><url><loc>/categories/blog/</loc><lastmod>2024-03-28T17:25:00-07:00</lastmod></url><url><loc>/blog/</loc><lastmod>2024-03-28T17:25:00-07:00</lastmod></url><url><loc>/categories/</loc><lastmod>2024-03-28T17:25:00-07:00</lastmod></url><url><loc>/catego [...] \ No newline at end of file