This is an automated email from the ASF dual-hosted git repository. pabloem pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/beam.git
The following commit(s) were added to refs/heads/master by this push: new 83bb3dd [BEAM-7700] Java transform catalog (#9129) 83bb3dd is described below commit 83bb3ddbfb103796de8dc28525fc133472988163 Author: rosetn <40248483+ros...@users.noreply.github.com> AuthorDate: Mon Jul 29 15:15:46 2019 -0700 [BEAM-7700] Java transform catalog (#9129) * Java transform catalog --- .../src/_includes/section-menu/documentation.html | 67 ++++++++- .../java/aggregation/approximatequantiles.md | 43 ++++++ .../java/aggregation/approximateunique.md | 40 ++++++ .../transforms/java/aggregation/cogroupbykey.md | 73 ++++++++++ .../transforms/java/aggregation/combine.md | 82 +++++++++++ .../java/aggregation/combinewithcontext.md | 37 +++++ .../transforms/java/aggregation/count.md | 50 +++++++ .../transforms/java/aggregation/distinct.md | 43 ++++++ .../transforms/java/aggregation/groupbykey.md | 50 +++++++ .../java/aggregation/groupintobatches.md | 42 ++++++ .../transforms/java/aggregation/latest.md | 52 +++++++ .../transforms/java/aggregation/max.md | 56 ++++++++ .../transforms/java/aggregation/mean.md | 58 ++++++++ .../transforms/java/aggregation/min.md | 42 ++++++ .../transforms/java/aggregation/sample.md | 40 ++++++ .../transforms/java/aggregation/sum.md | 51 +++++++ .../transforms/java/aggregation/top.md | 39 ++++++ .../transforms/java/element-wise/filter.md | 62 +++++++++ .../java/element-wise/flatmapelements.md | 40 ++++++ .../transforms/java/element-wise/keys.md | 43 ++++++ .../transforms/java/element-wise/kvswap.md | 43 ++++++ .../transforms/java/element-wise/mapelements.md | 63 +++++++++ .../transforms/java/element-wise/pardo.md | 152 +++++++++++++++++++++ .../transforms/java/element-wise/partition.md | 62 +++++++++ .../transforms/java/element-wise/regex.md | 36 +++++ .../transforms/java/element-wise/reify.md | 39 ++++++ .../transforms/java/element-wise/tostring.md | 37 +++++ .../transforms/java/element-wise/values.md | 44 ++++++ .../transforms/java/element-wise/withkeys.md | 55 
++++++++ .../transforms/java/element-wise/withtimestamps.md | 36 +++++ website/src/documentation/transforms/java/index.md | 81 +++++++++++ .../documentation/transforms/java/other/create.md | 40 ++++++ .../documentation/transforms/java/other/flatten.md | 67 +++++++++ .../documentation/transforms/java/other/passert.md | 61 +++++++++ .../documentation/transforms/java/other/view.md | 37 +++++ .../documentation/transforms/java/other/window.md | 40 ++++++ 36 files changed, 1900 insertions(+), 3 deletions(-) diff --git a/website/src/_includes/section-menu/documentation.html b/website/src/_includes/section-menu/documentation.html index 8c7347a..0f26c36 100644 --- a/website/src/_includes/section-menu/documentation.html +++ b/website/src/_includes/section-menu/documentation.html @@ -138,8 +138,8 @@ <li><a href="{{ site.baseurl }}/documentation/transforms/python/elementwise/regex/">Regex</a></li> <li><a href="{{ site.baseurl }}/documentation/transforms/python/elementwise/reify/">Reify</a></li> <li><a href="{{ site.baseurl }}/documentation/transforms/python/elementwise/tostring/">ToString</a></li> - <li><a href="{{ site.baseurl }}/documentation/transforms/python/elementwise/withtimestamps/">WithTimestamps</a></li> - <li><a href="{{ site.baseurl }}/documentation/transforms/python/elementwise/values/">Values</a></li> + <li><a href="{{ site.baseurl }}/documentation/transforms/python/elementwise/values/">Values</a></li> + <li><a href="{{ site.baseurl }}/documentation/transforms/python/elementwise/withtimestamps/">WithTimestamps</a></li> </ul> </li> <li class="section-nav-item--collapsible"> @@ -168,7 +168,68 @@ </li> </ul> </li> - </ul> + + <li class="section-nav-item--collapsible"> + <span class="section-nav-list-title">Java</span> + + <ul class="section-nav-list"> + <li><a href="{{ site.baseurl }}/documentation/transforms/java/overview/">Overview</a></li> + <li class="section-nav-item--collapsible"> + <span class="section-nav-list-title">Element-wise</span> + + <ul 
class="section-nav-list"> + <li><a href="{{ site.baseurl }}/documentation/transforms/java/elementwise/filter/">Filter</a></li> + <li><a href="{{ site.baseurl }}/documentation/transforms/java/elementwise/flatmapelements/">FlatMapElements</a></li> + <li><a href="{{ site.baseurl }}/documentation/transforms/java/elementwise/keys/">Keys</a></li> + <li><a href="{{ site.baseurl }}/documentation/transforms/java/elementwise/kvswap/">KvSwap</a></li> + <li><a href="{{ site.baseurl }}/documentation/transforms/java/elementwise/mapelements/">MapElements</a></li> + <li><a href="{{ site.baseurl }}/documentation/transforms/java/elementwise/pardo/">ParDo</a></li> + <li><a href="{{ site.baseurl }}/documentation/transforms/java/elementwise/partition/">Partition</a></li> + <li><a href="{{ site.baseurl }}/documentation/transforms/java/elementwise/regex/">Regex</a></li> + <li><a href="{{ site.baseurl }}/documentation/transforms/java/elementwise/reify/">Reify</a></li> + <li><a href="{{ site.baseurl }}/documentation/transforms/java/elementwise/values/">Values</a></li> + <li><a href="{{ site.baseurl }}/documentation/transforms/java/elementwise/tostring/">ToString</a></li> + <li><a href="{{ site.baseurl }}/documentation/transforms/java/elementwise/withkeys/">WithKeys</a></li> + <li><a href="{{ site.baseurl }}/documentation/transforms/java/elementwise/withtimestamps/">WithTimestamps</a></li> + </ul> + </li> + <li class="section-nav-item--collapsible"> + <span class="section-nav-list-title">Aggregation</span> + + <ul class="section-nav-list"> + <li><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/approximatequantiles/">ApproximateQuantiles</a></li> + <li><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/approximateunique/">ApproximateUnique</a></li> + <li><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/cogroupbykey/">CoGroupByKey</a></li> + <li><a href="{{ site.baseurl 
}}/documentation/transforms/java/aggregation/combine/">Combine</a></li> + <li><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/combinewithcontext/">CombineWithContext</a></li> + <li><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/count/">Count</a></li> + <li><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/distinct/">Distinct</a></li> + <li><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/groupbykey/">GroupByKey</a></li> + <li><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/groupintobatches/">GroupIntoBatches</a></li> + <li><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/latest/">Latest</a></li> + <li><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/max/">Max</a></li> + <li><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/mean/">Mean</a></li> + <li><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/min/">Min</a></li> + <li><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/sample/">Sample</a></li> + <li><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/sum/">Sum</a></li> + <li><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/top/">Top</a></li> + </ul> + </li> + <li class="section-nav-item--collapsible"> + <span class="section-nav-list-title">Other</span> + + <ul class="section-nav-list"> + <li><a href="{{ site.baseurl }}/documentation/transforms/java/other/create/">Create</a></li> + <li><a href="{{ site.baseurl }}/documentation/transforms/java/other/flatten/">Flatten</a></li> + <li><a href="{{ site.baseurl }}/documentation/transforms/java/other/passert/">PAssert</a></li> + <li><a href="{{ site.baseurl }}/documentation/transforms/java/other/view/">View</a></li> + <li><a href="{{ site.baseurl }}/documentation/transforms/java/other/window/">Window</a></li> + </ul> + </li> + </ul> + </li> + </ul> + </li> <li 
class="section-nav-item--collapsible"> diff --git a/website/src/documentation/transforms/java/aggregation/approximatequantiles.md b/website/src/documentation/transforms/java/aggregation/approximatequantiles.md new file mode 100644 index 0000000..561cca0 --- /dev/null +++ b/website/src/documentation/transforms/java/aggregation/approximatequantiles.md @@ -0,0 +1,43 @@ +--- +layout: section +title: "ApproximateQuantiles" +permalink: /documentation/transforms/java/aggregation/approximatequantiles/ +section_menu: section-menu/documentation.html +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +--> +# ApproximateQuantiles +<table align="left"> + <a target="_blank" class="button" + href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/ApproximateQuantiles.html"> + <img src="https://beam.apache.org/images/logos/sdks/java.png" width="20px" height="20px" + alt="Javadoc" /> + Javadoc + </a> +</table> +<br> +Takes a comparison function and the desired number of quantiles *n*, either +globally or per-key. Using an approximation algorithm, it returns the +minimum value, *n-2* intermediate values, and the maximum value. + +## Examples +**Example**: to compute the quartiles of a `PCollection` of integers, we +would use `ApproximateQuantiles.globally(5)`. This will produce a list +containing 5 values: the minimum value, Quartile 1 value, Quartile 2 +value, Quartile 3 value, and the maximum value. 
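To make the shape of that five-value result concrete, here is a minimal plain-Java sketch. It does not use the Beam API: it computes exact (not approximate) quantiles on an in-memory list, and the evenly spaced index rule is one assumed convention rather than the rule `ApproximateQuantiles` uses internally. The class and method names (`QuantilesSketch`, `quantiles`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Plain-Java illustration only; this is NOT the Beam ApproximateQuantiles implementation.
class QuantilesSketch {

    // Returns n values: the minimum, n-2 intermediate values, and the maximum.
    // Assumption: quantile positions are evenly spaced over the sorted input.
    static List<Integer> quantiles(List<Integer> values, int n) {
        if (values.isEmpty() || n < 2) {
            throw new IllegalArgumentException("need at least one value and n >= 2");
        }
        List<Integer> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            // Map the i-th quantile onto an index in the sorted list.
            int index = (int) Math.round((double) i * (sorted.size() - 1) / (n - 1));
            result.add(sorted.get(index));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> input = List.of(9, 1, 5, 3, 7, 2, 8, 4, 6);
        // n = 5 yields the minimum, three quartile values, and the maximum.
        System.out.println(quantiles(input, 5)); // prints [1, 3, 5, 7, 9]
    }
}
```

In a real pipeline the transform works on a distributed `PCollection` and trades exactness for bounded memory, so its output may differ slightly from this exact computation.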
+ +## Related transforms +* [ApproximateUnique]({{ site.baseurl }}/documentation/transforms/java/aggregation/approximateunique) + estimates the number of distinct elements or distinct values in key-value pairs +* [Combine]({{ site.baseurl }}/documentation/transforms/java/aggregation/combine) \ No newline at end of file diff --git a/website/src/documentation/transforms/java/aggregation/approximateunique.md b/website/src/documentation/transforms/java/aggregation/approximateunique.md new file mode 100644 index 0000000..9b3e6d0 --- /dev/null +++ b/website/src/documentation/transforms/java/aggregation/approximateunique.md @@ -0,0 +1,40 @@ +--- +layout: section +title: "ApproximateUnique" +permalink: /documentation/transforms/java/aggregation/approximateunique/ +section_menu: section-menu/documentation.html +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +--> +# ApproximateUnique +<table align="left"> + <a target="_blank" class="button" + href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/ApproximateUnique.html"> + <img src="https://beam.apache.org/images/logos/sdks/java.png" width="20px" height="20px" + alt="Javadoc" /> + Javadoc + </a> +</table> +<br> +Transforms for estimating the number of distinct elements in a collection +or the number of distinct values associated with each key in a collection +of key-value pairs. + +## Examples +See [BEAM-7703](https://issues.apache.org/jira/browse/BEAM-7703) for updates. 
+ +## Related transforms +* [Count]({{ site.baseurl }}/documentation/transforms/java/aggregation/count) + counts the number of elements within each aggregation. +* [Distinct]({{ site.baseurl }}/documentation/transforms/java/aggregation/distinct) \ No newline at end of file diff --git a/website/src/documentation/transforms/java/aggregation/cogroupbykey.md b/website/src/documentation/transforms/java/aggregation/cogroupbykey.md new file mode 100644 index 0000000..a546058 --- /dev/null +++ b/website/src/documentation/transforms/java/aggregation/cogroupbykey.md @@ -0,0 +1,73 @@ +--- +layout: section +title: "CoGroupByKey" +permalink: /documentation/transforms/java/aggregation/cogroupbykey/ +section_menu: section-menu/documentation.html +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +--> +# CoGroupByKey +<table align="left"> + <a target="_blank" class="button" + href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/join/CoGroupByKey.html"> + <img src="https://beam.apache.org/images/logos/sdks/java.png" width="20px" height="20px" + alt="Javadoc" /> + Javadoc + </a> +</table> +<br> +Aggregates all input elements by their key and allows downstream processing +to consume all values associated with the key. While `GroupByKey` performs +this operation over a single input collection and thus a single type of +input values, `CoGroupByKey` operates over multiple input collections. 
Consequently, +the result for each key is a tuple of the values associated with +that key in each input collection. + +See more information in the [Beam Programming Guide]({{ site.baseurl }}/documentation/programming-guide/#cogroupbykey). + +## Examples +**Example**: Say you have two different files with user data; one file has +names and email addresses and the other file has names and phone numbers. + +You can join those two data sets, using the username as a common key and the +other data as the associated values. After the join, you have one data set +that contains all of the information (email addresses and phone numbers) +associated with each name. + +```java +PCollection<KV<UID, Integer>> pt1 = /* ... */; +PCollection<KV<UID, String>> pt2 = /* ... */; + +final TupleTag<Integer> t1 = new TupleTag<>(); +final TupleTag<String> t2 = new TupleTag<>(); +PCollection<KV<UID, CoGbkResult>> result = + KeyedPCollectionTuple.of(t1, pt1).and(t2, pt2) + .apply(CoGroupByKey.create()); +result.apply(ParDo.of(new DoFn<KV<UID, CoGbkResult>, /* some result */>() { + @ProcessElement + public void processElement(ProcessContext c) { + KV<UID, CoGbkResult> e = c.element(); + CoGbkResult result = e.getValue(); + // Retrieve all integers associated with this key from pt1 + Iterable<Integer> allIntegers = result.getAll(t1); + // Retrieve the string associated with this key from pt2. + // Note: This will fail if multiple values had the same key in pt2. + String string = result.getOnly(t2); + ... + } +})); +``` + +## Related transforms +* [GroupByKey]({{ site.baseurl }}/documentation/transforms/java/aggregation/groupbykey) + takes one input collection. 
\ No newline at end of file diff --git a/website/src/documentation/transforms/java/aggregation/combine.md b/website/src/documentation/transforms/java/aggregation/combine.md new file mode 100644 index 0000000..3bca34b --- /dev/null +++ b/website/src/documentation/transforms/java/aggregation/combine.md @@ -0,0 +1,82 @@ +--- +layout: section +title: "Combine" +permalink: /documentation/transforms/java/aggregation/combine/ +section_menu: section-menu/documentation.html +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +--> +# Combine +<table align="left"> + <a target="_blank" class="button" + href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/Combine.html"> + <img src="https://beam.apache.org/images/logos/sdks/java.png" width="20px" height="20px" + alt="Javadoc" /> + Javadoc + </a> +</table> +<br> +A user-defined `CombineFn` may be applied to combine all elements in a +`PCollection` (global combine) or to combine all elements associated +with each key. + +While the result is similar to applying a `GroupByKey` followed by +aggregating values in each `Iterable`, there is an impact +on the code you must write as well as the performance of the pipeline. +Writing a `ParDo` that counts the number of elements in each value +would be very straightforward. However, as described in the execution +model, it would also require all values associated with each key to be +processed by a single worker. This introduces a lot of communication overhead. 
+Using a `CombineFn` requires the code to be structured as an associative and +commutative operation, but it allows partial sums to be precomputed. + +See more information in the [Beam Programming Guide]({{ site.baseurl }}/documentation/programming-guide/#combine). + +## Examples +**Example 1**: Global combine +Use the global combine to combine all of the elements in a given `PCollection` +into a single value, represented in your pipeline as a new `PCollection` containing +one element. The following example code shows how to apply the Beam-provided +sum combine function to produce a single sum value for a `PCollection` of integers. + +```java +// Sum.ofIntegers() combines the elements in the input PCollection. The resulting PCollection, called sum, +// contains one value: the sum of all the elements in the input PCollection. +PCollection<Integer> pc = ...; +PCollection<Integer> sum = pc.apply( + Combine.globally(Sum.ofIntegers())); +``` + +**Example 2**: Keyed combine +Use a keyed combine to combine all of the values associated with each key +into a single output value for each key. As with the global combine, the +function passed to a keyed combine must be associative and commutative. + +```java +// PCollection is grouped by key and the Double values associated with each key are combined into a Double. +PCollection<KV<String, Double>> salesRecords = ...; +PCollection<KV<String, Double>> totalSalesPerPerson = + salesRecords.apply(Combine.<String, Double, Double>perKey( + Sum.ofDoubles())); +// The combined value is of a different type than the original collection of values per key. PCollection has +// keys of type String and values of type Integer, and the combined value is a Double. 
+PCollection<KV<String, Integer>> playerAccuracy = ...; +PCollection<KV<String, Double>> avgAccuracyPerPlayer = + playerAccuracy.apply(Combine.<String, Integer, Double>perKey( + new MeanInts())); +``` + +## Related transforms +* [CombineWithContext]({{ site.baseurl }}/documentation/transforms/java/aggregation/combinewithcontext) +* [GroupByKey]({{ site.baseurl }}/documentation/transforms/java/aggregation/groupbykey) \ No newline at end of file diff --git a/website/src/documentation/transforms/java/aggregation/combinewithcontext.md b/website/src/documentation/transforms/java/aggregation/combinewithcontext.md new file mode 100644 index 0000000..20f29dd --- /dev/null +++ b/website/src/documentation/transforms/java/aggregation/combinewithcontext.md @@ -0,0 +1,37 @@ +--- +layout: section +title: "CombineWithContext" +permalink: /documentation/transforms/java/aggregation/combinewithcontext/ +section_menu: section-menu/documentation.html +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +--> +# CombineWithContext +<table align="left"> + <a target="_blank" class="button" + href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/CombineWithContext.html"> + <img src="https://beam.apache.org/images/logos/sdks/java.png" width="20px" height="20px" + alt="Javadoc" /> + Javadoc + </a> +</table> +<br> +A class of transforms that contains combine functions that have access to `PipelineOptions` and side inputs through `CombineWithContext.Context`. 
+ +## Examples +See [BEAM-7703](https://issues.apache.org/jira/browse/BEAM-7703) for updates. + +## Related transforms +* [Combine]({{ site.baseurl }}/documentation/transforms/java/aggregation/combine) + for combining all values associated with a key to a single result \ No newline at end of file diff --git a/website/src/documentation/transforms/java/aggregation/count.md b/website/src/documentation/transforms/java/aggregation/count.md new file mode 100644 index 0000000..17b9ff7 --- /dev/null +++ b/website/src/documentation/transforms/java/aggregation/count.md @@ -0,0 +1,50 @@ +--- +layout: section +title: "Count" +permalink: /documentation/transforms/java/aggregation/count/ +section_menu: section-menu/documentation.html +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +--> +# Count +<table align="left"> + <a target="_blank" class="button" + href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/Count.html"> + <img src="https://beam.apache.org/images/logos/sdks/java.png" width="20px" height="20px" + alt="Javadoc" /> + Javadoc + </a> +</table> +<br> +Counts the number of elements within each aggregation. The `Count` +transform has three varieties: + +* `Count.globally()` counts the number of elements in the entire + `PCollection`. The result is a collection with a single element. +* `Count.perKey()` counts how many elements are associated with each + key. It ignores the values. 
The resulting collection has one + output for every key in the input collection. +* `Count.perElement()` counts how many times each element appears + in the input collection. The output is a collection of key-value + pairs, each containing a unique element and the number of times it + appeared in the original collection. + +## Examples +See [BEAM-7703](https://issues.apache.org/jira/browse/BEAM-7703) for updates. + +## Related transforms +* [ApproximateUnique]({{ site.baseurl }}/documentation/transforms/java/aggregation/approximateunique) + estimates the number of distinct elements or distinct values in key-value pairs +* [Sum]({{ site.baseurl }}/documentation/transforms/java/aggregation/sum) computes + the sum of elements in a collection \ No newline at end of file diff --git a/website/src/documentation/transforms/java/aggregation/distinct.md b/website/src/documentation/transforms/java/aggregation/distinct.md new file mode 100644 index 0000000..08a3bcd --- /dev/null +++ b/website/src/documentation/transforms/java/aggregation/distinct.md @@ -0,0 +1,43 @@ +--- +layout: section +title: "Distinct" +permalink: /documentation/transforms/java/aggregation/distinct/ +section_menu: section-menu/documentation.html +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
+--> +# Distinct +<table align="left"> + <a target="_blank" class="button" + href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/Distinct.html"> + <img src="https://beam.apache.org/images/logos/sdks/java.png" width="20px" height="20px" + alt="Javadoc" /> + Javadoc + </a> +</table> +<br> +Produces a collection containing distinct elements of the input collection. + +On some data sets, it might be more efficient to compute an approximate +answer using `ApproximateUnique`, which also allows for determining distinct +values for each key. + +## Examples +See [BEAM-7703](https://issues.apache.org/jira/browse/BEAM-7703) for updates. + +## Related transforms +* [Count]({{ site.baseurl }}/documentation/transforms/java/aggregation/count) + counts the number of elements within each aggregation. +* [ApproximateUnique]({{ site.baseurl }}/documentation/transforms/java/aggregation/approximateunique) + estimates the number of distinct elements in a collection. \ No newline at end of file diff --git a/website/src/documentation/transforms/java/aggregation/groupbykey.md b/website/src/documentation/transforms/java/aggregation/groupbykey.md new file mode 100644 index 0000000..f73cb4f --- /dev/null +++ b/website/src/documentation/transforms/java/aggregation/groupbykey.md @@ -0,0 +1,50 @@ +--- +layout: section +title: "GroupByKey" +permalink: /documentation/transforms/java/aggregation/groupbykey/ +section_menu: section-menu/documentation.html +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+See the License for the specific language governing permissions and +limitations under the License. +--> +# GroupByKey +<table align="left"> + <a target="_blank" class="button" + href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/GroupByKey.html"> + <img src="https://beam.apache.org/images/logos/sdks/java.png" width="20px" height="20px" + alt="Javadoc" /> + Javadoc + </a> +</table> +<br> +Takes a keyed collection of elements and produces a collection where +each element consists of a key and an `Iterable` of all values +associated with that key. + +The results can be combined with windowing to subdivide each key +based on time or triggering to produce partial aggregations. Either +windowing or triggering is necessary when processing unbounded collections. + +See more information in the [Beam Programming Guide]({{ site.baseurl }}/documentation/programming-guide/#groupbykey). + +## Examples +**Example 1**: the input (a, 1), (b, 2), (a, 3) results in (a, [1, 3]), (b, [2]). + +**Example 2**: Given a collection of customer orders keyed by postal code, +you could use `GroupByKey` to get the collection of all orders in each postal code. 
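The grouping in Example 1 can be sketched in plain Java. This is only an illustration of the semantics on an in-memory list, not the Beam API: the class and method names (`GroupByKeySketch`, `groupByKey`) are hypothetical, and a real `GroupByKey` runs distributed and does not promise any particular ordering of the values within each `Iterable`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of GroupByKey semantics; NOT the Beam implementation.
class GroupByKeySketch {

    // Collects every value that shares a key into one list per key.
    static <K, V> Map<K, List<V>> groupByKey(List<Map.Entry<K, V>> input) {
        // LinkedHashMap keeps keys in first-seen order so the printed result is stable.
        Map<K, List<V>> grouped = new LinkedHashMap<>();
        for (Map.Entry<K, V> kv : input) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> input =
            List.of(Map.entry("a", 1), Map.entry("b", 2), Map.entry("a", 3));
        System.out.println(groupByKey(input)); // prints {a=[1, 3], b=[2]}
    }
}
```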
+ +## Related transforms +* [CoGroupByKey]({{ site.baseurl }}/documentation/transforms/java/aggregation/cogroupbykey) + for multiple input collections +* [Combine]({{ site.baseurl }}/documentation/transforms/java/aggregation/combine) + for combining all values associated with a key to a single result \ No newline at end of file diff --git a/website/src/documentation/transforms/java/aggregation/groupintobatches.md b/website/src/documentation/transforms/java/aggregation/groupintobatches.md new file mode 100644 index 0000000..8874648 --- /dev/null +++ b/website/src/documentation/transforms/java/aggregation/groupintobatches.md @@ -0,0 +1,42 @@ +--- +layout: section +title: "GroupIntoBatches" +permalink: /documentation/transforms/java/aggregation/groupintobatches/ +section_menu: section-menu/documentation.html +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +--> +# GroupIntoBatches +<table align="left"> + <a target="_blank" class="button" + href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/GroupIntoBatches.html"> + <img src="https://beam.apache.org/images/logos/sdks/java.png" width="20px" height="20px" + alt="Javadoc" /> + Javadoc + </a> +</table> +<br> +Batches inputs to a desired batch size. + +Batches contain only elements of a single key. Elements are buffered until +`batchSize` elements have accumulated. Then, these elements are emitted +to the output collection. 
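The per-key buffering described above can be sketched in plain Java. This is a hedged illustration on an in-memory list, not the Beam API: the names `GroupIntoBatchesSketch` and `batch` are hypothetical, and flushing partial batches at the end of the input is an assumption made here to stand in for the end of a window.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of per-key batching; NOT the Beam GroupIntoBatches implementation.
class GroupIntoBatchesSketch {

    static <K, V> List<Map.Entry<K, List<V>>> batch(List<Map.Entry<K, V>> input, int batchSize) {
        Map<K, List<V>> buffers = new LinkedHashMap<>();
        List<Map.Entry<K, List<V>>> output = new ArrayList<>();
        for (Map.Entry<K, V> kv : input) {
            // Each key has its own buffer, so a batch never mixes keys.
            List<V> buffer = buffers.computeIfAbsent(kv.getKey(), k -> new ArrayList<>());
            buffer.add(kv.getValue());
            if (buffer.size() == batchSize) {
                // The buffer is full: emit a copy as one batch and start over.
                output.add(Map.entry(kv.getKey(), new ArrayList<>(buffer)));
                buffer.clear();
            }
        }
        // Assumption: leftover partial batches are emitted once the input is exhausted.
        for (Map.Entry<K, List<V>> rest : buffers.entrySet()) {
            if (!rest.getValue().isEmpty()) {
                output.add(Map.entry(rest.getKey(), new ArrayList<>(rest.getValue())));
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> input = List.of(
            Map.entry("a", 1), Map.entry("a", 2), Map.entry("b", 3),
            Map.entry("a", 4), Map.entry("a", 5));
        System.out.println(batch(input, 2)); // prints [a=[1, 2], a=[4, 5], b=[3]]
    }
}
```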
+ +Batches contain elements from the same window, so windows are preserved. Batches might contain elements from more than one bundle. + +## Examples +See [BEAM-7703](https://issues.apache.org/jira/browse/BEAM-7703) for updates. + +## Related transforms +* [GroupByKey]({{ site.baseurl }}/documentation/transforms/java/aggregation/groupbykey) takes one input collection. \ No newline at end of file diff --git a/website/src/documentation/transforms/java/aggregation/latest.md b/website/src/documentation/transforms/java/aggregation/latest.md new file mode 100644 index 0000000..a2ed7a5 --- /dev/null +++ b/website/src/documentation/transforms/java/aggregation/latest.md @@ -0,0 +1,52 @@ +--- +layout: section +title: "Latest" +permalink: /documentation/transforms/java/aggregation/latest/ +section_menu: section-menu/documentation.html +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +--> +# Latest +<table align="left"> + <a target="_blank" class="button" + href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/Latest.html"> + <img src="https://beam.apache.org/images/logos/sdks/java.png" width="20px" height="20px" + alt="Javadoc" /> + Javadoc + </a> +</table> +<br> +A transform and `Combine.CombineFn` for computing the latest element in a collection. + +* `Latest.globally()` takes a collection of values and produces the collection + containing the single value with the latest implicit timestamp. 
+* `Latest.perKey()` takes a collection of key-value pairs, and returns the + latest value for each key, according to the implicit timestamp. + +For elements with the same timestamp, the output element is arbitrarily selected. + +## Examples +**Example**: compute the latest value for each session +```java + PCollection input = ...; + PCollection sessioned = input + .apply(Window.into(Sessions.withGapDuration(Duration.standardMinutes(5)))); + PCollection latestValues = sessioned.apply(Latest.globally()); +``` + +## Related transforms +* [Reify]({{ site.baseurl }}/documentation/transforms/java/elementwise/reify) + converts between explicit and implicit form of various Beam values +* [WithTimestamps]({{ site.baseurl }}/documentation/transforms/java/elementwise/withtimestamps) + assigns timestamps to all the elements of a collection diff --git a/website/src/documentation/transforms/java/aggregation/max.md b/website/src/documentation/transforms/java/aggregation/max.md new file mode 100644 index 0000000..941647f --- /dev/null +++ b/website/src/documentation/transforms/java/aggregation/max.md @@ -0,0 +1,56 @@ +--- +layout: section +title: "Max" +permalink: /documentation/transforms/java/aggregation/max/ +section_menu: section-menu/documentation.html +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
+--> +# Max +<table align="left"> + <a target="_blank" class="button" + href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/Max.html"> + <img src="https://beam.apache.org/images/logos/sdks/java.png" width="20px" height="20px" + alt="Javadoc" /> + Javadoc + </a> +</table> +<br> +Provides a variety of different transforms for computing the maximum +values in a collection, either globally or for each key. + +## Examples +**Example 1**: get the maximum of a `PCollection` of `Doubles`. + +```java +PCollection<Double> input = ...; +PCollection<Double> max = input.apply(Max.doublesGlobally()); +``` + +**Example 2**: calculate the maximum of the `Integers` associated +with each unique key (which is of type `String`). + +```java +PCollection<KV<String, Integer>> input = ...; +PCollection<KV<String, Integer>> maxPerKey = input + .apply(Max.integersPerKey()); +``` + +## Related transforms +* [Min]({{ site.baseurl }}/documentation/transforms/java/aggregation/min) + for computing minimum values in a collection +* [Mean]({{ site.baseurl }}/documentation/transforms/java/aggregation/mean) + for computing the arithmetic mean of the elements in a collection +* [Combine]({{ site.baseurl }}/documentation/transforms/java/aggregation/combine) + for combining all values associated with a key to a single result \ No newline at end of file diff --git a/website/src/documentation/transforms/java/aggregation/mean.md b/website/src/documentation/transforms/java/aggregation/mean.md new file mode 100644 index 0000000..36b487e --- /dev/null +++ b/website/src/documentation/transforms/java/aggregation/mean.md @@ -0,0 +1,58 @@ +--- +layout: section +title: "Mean" +permalink: /documentation/transforms/java/aggregation/mean/ +section_menu: section-menu/documentation.html +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. 
+You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +--> +# Mean +<table align="left"> + <a target="_blank" class="button" + href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/Mean.html"> + <img src="https://beam.apache.org/images/logos/sdks/java.png" width="20px" height="20px" + alt="Javadoc" /> + Javadoc + </a> +</table> +<br> +Transforms for computing the arithmetic mean of the elements in a collection, +or the mean of the values associated with each key in a collection of key-value pairs. + +* `Mean.globally()` returns a transform that produces a collection containing the mean of the input collection's elements. If the input collection is empty, the result is 0. +* `Mean.perKey()` returns a transform that produces a collection mapping each distinct key in the input collection to the mean of the values associated with that key. + +## Examples +**Example 1**: get the mean of a `PCollection` of `Longs`. + +```java +PCollection<Long> input = ...; +PCollection<Double> mean = input.apply(Mean.globally()); +``` + +**Example 2**: calculate the mean of the `Integers` associated with each unique key (which is of type `String`). 
+ +```java +PCollection<KV<String, Integer>> input = ...; +PCollection<KV<String, Double>> meanPerKey = + input.apply(Mean.perKey()); +``` + +## Related transforms +* [Max]({{ site.baseurl }}/documentation/transforms/java/aggregation/max) + for computing maximum values in a collection +* [Min]({{ site.baseurl }}/documentation/transforms/java/aggregation/min) + for computing minimum values in a collection +* [Combine]({{ site.baseurl }}/documentation/transforms/java/aggregation/combine) + for combining all values associated with a key to a single result \ No newline at end of file diff --git a/website/src/documentation/transforms/java/aggregation/min.md b/website/src/documentation/transforms/java/aggregation/min.md new file mode 100644 index 0000000..538b02f --- /dev/null +++ b/website/src/documentation/transforms/java/aggregation/min.md @@ -0,0 +1,42 @@ +--- +layout: section +title: "Min" +permalink: /documentation/transforms/java/aggregation/min/ +section_menu: section-menu/documentation.html +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
+--> +# Min +<table align="left"> + <a target="_blank" class="button" + href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/Min.html"> + <img src="https://beam.apache.org/images/logos/sdks/java.png" width="20px" height="20px" + alt="Javadoc" /> + Javadoc + </a> +</table> +<br> +Provides a variety of different transforms for computing the minimum +values in a collection, either globally or for each key. + +## Examples +See [BEAM-7703](https://issues.apache.org/jira/browse/BEAM-7703) for updates. + +## Related transforms +* [Max]({{ site.baseurl }}/documentation/transforms/java/aggregation/max) + for computing maximum values in a collection +* [Mean]({{ site.baseurl }}/documentation/transforms/java/aggregation/mean) + for computing the arithmetic mean of the elements in a collection +* [Combine]({{ site.baseurl }}/documentation/transforms/java/aggregation/combine) + for combining all values associated with a key to a single result \ No newline at end of file diff --git a/website/src/documentation/transforms/java/aggregation/sample.md b/website/src/documentation/transforms/java/aggregation/sample.md new file mode 100644 index 0000000..6f7e181 --- /dev/null +++ b/website/src/documentation/transforms/java/aggregation/sample.md @@ -0,0 +1,40 @@ +--- +layout: section +title: "Sample" +permalink: /documentation/transforms/java/aggregation/sample/ +section_menu: section-menu/documentation.html +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
+--> +# Sample +<table align="left"> + <a target="_blank" class="button" + href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/Sample.html"> + <img src="https://beam.apache.org/images/logos/sdks/java.png" width="20px" height="20px" + alt="Javadoc" /> + Javadoc + </a> +</table> +<br> +Transforms for taking samples of the elements in a collection, or +samples of the values associated with each key in a collection of key-value pairs. + +## Examples +See [BEAM-7703](https://issues.apache.org/jira/browse/BEAM-7703) for updates. + +## Related transforms +* [Top]({{ site.baseurl }}/documentation/transforms/java/aggregation/top) + finds the largest (or smallest) set of elements in a collection +* [Latest]({{ site.baseurl }}/documentation/transforms/java/aggregation/latest) + computes the latest element in a collection \ No newline at end of file diff --git a/website/src/documentation/transforms/java/aggregation/sum.md b/website/src/documentation/transforms/java/aggregation/sum.md new file mode 100644 index 0000000..ee1f282 --- /dev/null +++ b/website/src/documentation/transforms/java/aggregation/sum.md @@ -0,0 +1,51 @@ +--- +layout: section +title: "Sum" +permalink: /documentation/transforms/java/aggregation/sum/ +section_menu: section-menu/documentation.html +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
+--> +# Sum +<table align="left"> + <a target="_blank" class="button" + href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/Sum.html"> + <img src="https://beam.apache.org/images/logos/sdks/java.png" width="20px" height="20px" + alt="Javadoc" /> + Javadoc + </a> +</table> +<br> +Transforms for computing the sum of the elements in a collection, or the sum of the +values associated with each key in a collection of key-value pairs. + +## Examples +**Example 1**: get the sum of a `PCollection` of `Doubles`. + +```java +PCollection<Double> input = ...; +PCollection<Double> sum = input.apply(Sum.doublesGlobally()); +``` + +**Example 2**: calculate the sum of the `Integers` associated with each unique key (which is of type `String`). + +```java +PCollection<KV<String, Integer>> input = ...; +PCollection<KV<String, Integer>> sumPerKey = input + .apply(Sum.integersPerKey()); +``` + +## Related transforms +* [Count]({{ site.baseurl }}/documentation/transforms/java/aggregation/count) + counts the number of elements within each aggregation \ No newline at end of file diff --git a/website/src/documentation/transforms/java/aggregation/top.md b/website/src/documentation/transforms/java/aggregation/top.md new file mode 100644 index 0000000..7f060f1 --- /dev/null +++ b/website/src/documentation/transforms/java/aggregation/top.md @@ -0,0 +1,39 @@ +--- +layout: section +title: "Top" +permalink: /documentation/transforms/java/aggregation/top/ +section_menu: section-menu/documentation.html +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+See the License for the specific language governing permissions and +limitations under the License. +--> +# Top +<table align="left"> + <a target="_blank" class="button" + href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/Top.html"> + <img src="https://beam.apache.org/images/logos/sdks/java.png" width="20px" height="20px" + alt="Javadoc" /> + Javadoc + </a> +</table> +<br> +Transforms for finding the largest (or smallest) set of elements in +a collection, or the largest (or smallest) set of values associated +with each key in a collection of key-value pairs. + +## Examples +See [BEAM-7703](https://issues.apache.org/jira/browse/BEAM-7703) for updates. + +## Related transforms +* [Sample]({{ site.baseurl }}/documentation/transforms/java/aggregation/sample) + takes samples of collection \ No newline at end of file diff --git a/website/src/documentation/transforms/java/element-wise/filter.md b/website/src/documentation/transforms/java/element-wise/filter.md new file mode 100644 index 0000000..975fa09 --- /dev/null +++ b/website/src/documentation/transforms/java/element-wise/filter.md @@ -0,0 +1,62 @@ +--- +layout: section +title: "Filter" +permalink: /documentation/transforms/java/elementwise/filter/ +section_menu: section-menu/documentation.html +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
+--> +# Filter +<table align="left"> + <a target="_blank" class="button" + href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/Filter.html"> + <img src="https://beam.apache.org/images/logos/sdks/java.png" width="20px" height="20px" + alt="Javadoc" /> + Javadoc + </a> +</table> +<br> +Given a predicate, filter out all elements that don't satisfy that predicate. +May also be used to filter based on an inequality with a given value based +on the natural ordering of the element. + +## Examples +**Example 1**: Filtering with a predicate + +```java +// `pipeline` is assumed to be an existing Pipeline object. +PCollection<String> allStrings = pipeline.apply(Create.of("Hello", "world", "hi")); +PCollection<String> longStrings = allStrings + .apply(Filter.by(new SerializableFunction<String, Boolean>() { + @Override + public Boolean apply(String input) { + return input.length() > 3; + } + })); +``` +The result is a `PCollection` containing "Hello" and "world". + +**Example 2**: Filtering with an inequality + +```java +// `pipeline` is assumed to be an existing Pipeline object. +PCollection<Long> numbers = pipeline.apply(Create.of(1L, 2L, 3L, 4L, 5L)); +PCollection<Long> bigNumbers = numbers.apply(Filter.greaterThan(3L)); +PCollection<Long> smallNumbers = numbers.apply(Filter.lessThanEq(3L)); +``` +Other variants include `Filter.greaterThanEq`, `Filter.lessThan` and `Filter.equal`. + +## Related transforms +* [FlatMapElements]({{ site.baseurl }}/documentation/transforms/java/elementwise/flatmapelements) behaves the same as `Map`, but for + each input it might produce zero or more outputs. +* [ParDo]({{ site.baseurl }}/documentation/transforms/java/elementwise/pardo) is the most general element-wise mapping + operation, and includes other abilities such as multiple output collections and side-inputs. 
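The inequality variants on the Filter page compare each element against a given value using the element's natural ordering. The following is a minimal plain-Java sketch of that semantics, using `java.util.stream` rather than Beam; the class `FilterSemanticsSketch` and its helper methods are hypothetical names chosen for illustration and are not part of the Beam API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration class; it does not use or depend on Apache Beam.
class FilterSemanticsSketch {
    // Keeps the elements that are strictly greater than `value` by natural ordering.
    static <T extends Comparable<T>> List<T> greaterThan(List<T> input, T value) {
        return input.stream()
            .filter(e -> e.compareTo(value) > 0)
            .collect(Collectors.toList());
    }

    // Keeps the elements that are less than or equal to `value` by natural ordering.
    static <T extends Comparable<T>> List<T> lessThanEq(List<T> input, T value) {
        return input.stream()
            .filter(e -> e.compareTo(value) <= 0)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Long> numbers = Arrays.asList(1L, 2L, 3L, 4L, 5L);
        System.out.println(greaterThan(numbers, 3L)); // prints [4, 5]
        System.out.println(lessThanEq(numbers, 3L));  // prints [1, 2, 3]
    }
}
```

Note that the comparison value has the same type as the elements (`3L` for a list of `Long`), since the comparison is made between values of one `Comparable` type.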
\ No newline at end of file diff --git a/website/src/documentation/transforms/java/element-wise/flatmapelements.md b/website/src/documentation/transforms/java/element-wise/flatmapelements.md new file mode 100644 index 0000000..62fe930 --- /dev/null +++ b/website/src/documentation/transforms/java/element-wise/flatmapelements.md @@ -0,0 +1,40 @@ +--- +layout: section +title: "FlatMapElements" +permalink: /documentation/transforms/java/elementwise/flatmapelements/ +section_menu: section-menu/documentation.html +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +--> +# FlatMapElements +<table align="left"> + <a target="_blank" class="button" + href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/FlatMapElements.html"> + <img src="https://beam.apache.org/images/logos/sdks/java.png" width="20px" height="20px" + alt="Javadoc" /> + Javadoc + </a> +</table> +<br> +Applies a simple 1-to-many mapping function over each element in the +collection. The many elements are flattened into the resulting collection. + +## Examples +See [BEAM-7702](https://issues.apache.org/jira/browse/BEAM-7702) for updates. + +## Related transforms +* [Filter]({{ site.baseurl }}/documentation/transforms/java/elementwise/filter) is useful if the function is just + deciding whether to output an element or not. 
+* [ParDo]({{ site.baseurl }}/documentation/transforms/java/elementwise/pardo) is the most general element-wise mapping + operation, and includes other abilities such as multiple output collections and side-inputs. \ No newline at end of file diff --git a/website/src/documentation/transforms/java/element-wise/keys.md b/website/src/documentation/transforms/java/element-wise/keys.md new file mode 100644 index 0000000..b0d0738 --- /dev/null +++ b/website/src/documentation/transforms/java/element-wise/keys.md @@ -0,0 +1,43 @@ +--- +layout: section +title: "Keys" +permalink: /documentation/transforms/java/elementwise/keys/ +section_menu: section-menu/documentation.html +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +--> +# Keys +<table align="left"> + <a target="_blank" class="button" + href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/Keys.html"> + <img src="https://beam.apache.org/images/logos/sdks/java.png" width="20px" height="20px" + alt="Javadoc" /> + Javadoc + </a> +</table> +<br> +Takes a collection of key-value pairs, and returns the key of each element. + +## Examples +**Example** + +```java +PCollection<KV<String, Integer>> keyValuePairs = /* ... */; +PCollection<String> keys = keyValuePairs.apply(Keys.create()); +``` + +## Related transforms +* [KvSwap]({{ site.baseurl }}/documentation/transforms/java/elementwise/kvswap) swaps key-value pair values. 
+* [Values]({{ site.baseurl }}/documentation/transforms/java/elementwise/values) for extracting the value of each element. +* [WithKeys]({{ site.baseurl }}/documentation/transforms/java/elementwise/withkeys) for adding a key to each element. \ No newline at end of file diff --git a/website/src/documentation/transforms/java/element-wise/kvswap.md b/website/src/documentation/transforms/java/element-wise/kvswap.md new file mode 100644 index 0000000..96a423c --- /dev/null +++ b/website/src/documentation/transforms/java/element-wise/kvswap.md @@ -0,0 +1,43 @@ +--- +layout: section +title: "KvSwap" +permalink: /documentation/transforms/java/elementwise/kvswap/ +section_menu: section-menu/documentation.html +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +--> +# KvSwap +<table align="left"> + <a target="_blank" class="button" + href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/KvSwap.html"> + <img src="https://beam.apache.org/images/logos/sdks/java.png" width="20px" height="20px" + alt="Javadoc" /> + Javadoc + </a> +</table> +<br> +Takes a collection of key-value pairs and returns a collection of key-value pairs which has each key and value swapped. + +## Examples +**Example**: + +```java +PCollection<KV<String, Integer>> strIntPairs = /* ... 
*/; +PCollection<KV<Integer, String>> intStrPairs = strIntPairs.apply(KvSwap.create()); +``` + +## Related transforms +* [Keys]({{ site.baseurl }}/documentation/transforms/java/elementwise/keys) for extracting the key of each component. +* [Values]({{ site.baseurl }}/documentation/transforms/java/elementwise/values) for extracting the value of each element. +* [WithKeys]({{ site.baseurl }}/documentation/transforms/java/elementwise/withkeys) for adding a key to each element. \ No newline at end of file diff --git a/website/src/documentation/transforms/java/element-wise/mapelements.md b/website/src/documentation/transforms/java/element-wise/mapelements.md new file mode 100644 index 0000000..e916e2e --- /dev/null +++ b/website/src/documentation/transforms/java/element-wise/mapelements.md @@ -0,0 +1,63 @@ +--- +layout: section +title: "MapElements" +permalink: /documentation/transforms/java/elementwise/mapelements/ +section_menu: section-menu/documentation.html +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +--> +# MapElements +<table align="left"> + <a target="_blank" class="button" + href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/MapElements.html"> + <img src="https://beam.apache.org/images/logos/sdks/java.png" width="20px" height="20px" + alt="Javadoc" /> + Javadoc + </a> +</table> +<br> +Applies a simple 1-to-1 mapping function over each element in the collection. 
+ +## Examples +**Example 1**: providing the mapping function using a `SimpleFunction` + +```java +// `pipeline` is assumed to be an existing Pipeline object. +PCollection<String> lines = pipeline.apply(Create.of("Hello World", "Beam is fun")); +PCollection<Integer> lineLengths = lines.apply(MapElements.via( + new SimpleFunction<String, Integer>() { + @Override + public Integer apply(String line) { + return line.length(); + } + })); +``` + +**Example 2**: providing the mapping function using a `SerializableFunction`, +which allows the use of Java 8 lambdas. Due to type erasure, you need +to provide a hint indicating the desired return type. + +```java +// `pipeline` is assumed to be an existing Pipeline object. +PCollection<String> lines = pipeline.apply(Create.of("Hello World", "Beam is fun")); +PCollection<Integer> lineLengths = lines.apply(MapElements + .into(TypeDescriptors.integers()) + .via((String line) -> line.length())); +``` + +## Related transforms +* [FlatMapElements]({{ site.baseurl }}/documentation/transforms/java/elementwise/flatmapelements) behaves the same as `Map`, but for + each input it may produce zero or more outputs. +* [Filter]({{ site.baseurl }}/documentation/transforms/java/elementwise/filter) is useful if the function is just + deciding whether to output an element or not. +* [ParDo]({{ site.baseurl }}/documentation/transforms/java/elementwise/pardo) is the most general element-wise mapping + operation, and includes other abilities such as multiple output collections and side-inputs. \ No newline at end of file diff --git a/website/src/documentation/transforms/java/element-wise/pardo.md b/website/src/documentation/transforms/java/element-wise/pardo.md new file mode 100644 index 0000000..3a7b979 --- /dev/null +++ b/website/src/documentation/transforms/java/element-wise/pardo.md @@ -0,0 +1,152 @@ +--- +layout: section +title: "ParDo" +permalink: /documentation/transforms/java/elementwise/pardo/ +section_menu: section-menu/documentation.html +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. 
+You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +--> +# ParDo +<table align="left"> + <a target="_blank" class="button" + href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/ParDo.html"> + <img src="https://beam.apache.org/images/logos/sdks/java.png" width="20px" height="20px" + alt="Javadoc" /> + Javadoc + </a> +</table> +<br> +A transform for generic parallel processing. A `ParDo` transform considers each +element in the input `PCollection`, performs some processing function +(your user code) on that element, and emits zero or more elements to +an output `PCollection`. + +See more information in the [Beam Programming Guide]({{ site.baseurl }}/documentation/programming-guide/#pardo). + +## Examples +**Example 1**: Passing side inputs + +```java + // Pass side inputs to your ParDo transform by invoking .withSideInputs. + // Inside your DoFn, access the side input by using the method DoFn.ProcessContext.sideInput. + + // The input PCollection to ParDo. + PCollection<String> words = ...; + + // A PCollection of word lengths that we'll combine into a single value. + PCollection<Integer> wordLengths = ...; // Singleton PCollection + + // Create a singleton PCollectionView from wordLengths using Combine.globally and asSingletonView. + final PCollectionView<Integer> maxWordLengthCutOffView = + wordLengths.apply(Combine.globally(Max.ofIntegers()).asSingletonView()); + + + // Apply a ParDo that takes maxWordLengthCutOffView as a side input. 
+ PCollection<String> wordsBelowCutOff = + words.apply(ParDo + .of(new DoFn<String, String>() { + @ProcessElement + public void processElement(ProcessContext c) { + String word = c.element(); + // In our DoFn, access the side input. + int lengthCutOff = c.sideInput(maxWordLengthCutOffView); + if (word.length() <= lengthCutOff) { + c.output(word); + } + } + }).withSideInputs(maxWordLengthCutOffView) + ); +``` + +**Example 2**: Emitting to multiple outputs in your `DoFn` + +```java +// To emit elements to multiple output PCollections, create a TupleTag object to identify each collection +// that your ParDo produces. For example, if your ParDo produces three output PCollections (the main output +// and two additional outputs), you must create three TupleTags. The following example code shows how to +// create TupleTags for a ParDo with three output PCollections. + + // Input PCollection to our ParDo. + PCollection<String> words = ...; + + // The ParDo will filter words whose length is below a cutoff and add them to + // the main output PCollection<String>. + // If a word is above the cutoff, the ParDo will add the word length to an + // output PCollection<Integer>. + // If a word starts with the string "MARKER", the ParDo will add that word to an + // output PCollection<String>. + final int wordLengthCutOff = 10; + + // Create three TupleTags, one for each output PCollection. + // Output that contains words below the length cutoff. + final TupleTag<String> wordsBelowCutOffTag = + new TupleTag<String>(){}; + // Output that contains word lengths. + final TupleTag<Integer> wordLengthsAboveCutOffTag = + new TupleTag<Integer>(){}; + // Output that contains "MARKER" words. + final TupleTag<String> markedWordsTag = + new TupleTag<String>(){}; + +// Passing Output Tags to ParDo: +// After you specify the TupleTags for each of your ParDo outputs, pass the tags to your ParDo by invoking +// .withOutputTags. 
You pass the tag for the main output first, and then the tags for any additional outputs +// in a TupleTagList. Building on our previous example, we pass the three TupleTags for our three output +// PCollections to our ParDo. Note that all of the outputs (including the main output PCollection) are +// bundled into the returned PCollectionTuple. + + PCollectionTuple results = + words.apply(ParDo + .of(new DoFn<String, String>() { + // DoFn continues here. + ... + }) + // Specify the tag for the main output. + .withOutputTags(wordsBelowCutOffTag, + // Specify the tags for the two additional outputs as a TupleTagList. + TupleTagList.of(wordLengthsAboveCutOffTag) + .and(markedWordsTag))); +``` + +**Example 3**: Tags for multiple outputs + +```java +// Inside your ParDo's DoFn, you can emit an element to a specific output PCollection by passing in the +// appropriate TupleTag when you call ProcessContext.output. +// After your ParDo, extract the resulting output PCollections from the returned PCollectionTuple. +// Based on the previous example, this shows the DoFn emitting to the main output and two additional outputs. + + .of(new DoFn<String, String>() { + @ProcessElement + public void processElement(ProcessContext c) { + String word = c.element(); + if (word.length() <= wordLengthCutOff) { + // Emit short word to the main output. + // In this example, it is the output with tag wordsBelowCutOffTag. + c.output(word); + } else { + // Emit long word length to the output with tag wordLengthsAboveCutOffTag. + c.output(wordLengthsAboveCutOffTag, word.length()); + } + if (word.startsWith("MARKER")) { + // Emit word to the output with tag markedWordsTag. + c.output(markedWordsTag, word); + } + }})); +``` + + +## Related transforms +* [MapElements]({{ site.baseurl }}/documentation/transforms/java/elementwise/mapelements) + applies a simple 1-to-1 mapping function over each element in the collection. 
+* [Filter]({{ site.baseurl }}/documentation/transforms/java/elementwise/filter) + is useful if the function is just deciding whether to output an element or not. \ No newline at end of file diff --git a/website/src/documentation/transforms/java/element-wise/partition.md b/website/src/documentation/transforms/java/element-wise/partition.md new file mode 100644 index 0000000..583f631 --- /dev/null +++ b/website/src/documentation/transforms/java/element-wise/partition.md @@ -0,0 +1,62 @@ +--- +layout: section +title: "Partition" +permalink: /documentation/transforms/java/elementwise/partition/ +section_menu: section-menu/documentation.html +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +--> +# Partition +<table align="left"> + <a target="_blank" class="button" + href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/Partition.html"> + <img src="https://beam.apache.org/images/logos/sdks/java.png" width="20px" height="20px" + alt="Javadoc" /> + Javadoc + </a> +</table> +<br> +Separates elements in a collection into multiple output collections. The partitioning function contains the logic that determines how to separate the elements of the input collection into each resulting partition output collection. + +The number of partitions must be determined at graph construction time. You cannot determine the number of partitions in mid-pipeline. 
+ +See more information in the [Beam Programming Guide]({{ site.baseurl }}/documentation/programming-guide/#partition). + +## Examples +**Example**: dividing a `PCollection` into percentile groups + +```java +// Provide an int value with the desired number of result partitions, and a PartitionFn that represents the +// partitioning function. In this example, we define the PartitionFn in-line. Returns a PCollectionList +// containing each of the resulting partitions as individual PCollection objects. +PCollection<Student> students = ...; +// Split students up into 10 partitions, by percentile: +PCollectionList<Student> studentsByPercentile = + students.apply(Partition.of(10, new PartitionFn<Student>() { + public int partitionFor(Student student, int numPartitions) { + return student.getPercentile() // 0..99 + * numPartitions / 100; + }})); + +// You can extract each partition from the PCollectionList using the get method, as follows: +PCollection<Student> fortiethPercentile = studentsByPercentile.get(4); +``` + +## Related transforms +* [Filter]({{ site.baseurl }}/documentation/transforms/java/elementwise/filter) is useful if the function is just + deciding whether to output an element or not. +* [ParDo]({{ site.baseurl }}/documentation/transforms/java/elementwise/pardo) is the most general element-wise mapping + operation, and includes other abilities such as multiple output collections and side-inputs. +* [CoGroupByKey]({{ site.baseurl }}/documentation/transforms/java/aggregation/cogroupbykey) + performs a per-key equijoin. 
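The partitioning function in the Partition example relies on integer arithmetic to turn a percentile into a partition index. As a quick check of that arithmetic, here is a plain-Java sketch with no Beam dependency; the class `PartitionIndexSketch` is a hypothetical name for illustration, and it assumes the percentile is an `int` in the range 0..99, as the comment in the example suggests.

```java
// Hypothetical illustration class; it does not use or depend on Apache Beam.
class PartitionIndexSketch {
    // Mirrors the arithmetic of the PartitionFn in the example: maps a percentile
    // in 0..99 to a partition index in 0..numPartitions-1 using integer division.
    static int partitionFor(int percentile, int numPartitions) {
        return percentile * numPartitions / 100;
    }

    public static void main(String[] args) {
        int numPartitions = 10;
        for (int percentile : new int[] {0, 9, 10, 45, 99}) {
            // Prints: 0 -> 0, 9 -> 0, 10 -> 1, 45 -> 4, 99 -> 9 (one pair per line).
            System.out.println(percentile + " -> " + partitionFor(percentile, numPartitions));
        }
    }
}
```

With 10 partitions, index 4 receives percentiles 40 through 49, which is why the example retrieves that group with `get(4)`. The returned index must always fall between 0 and `numPartitions - 1`.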
\ No newline at end of file diff --git a/website/src/documentation/transforms/java/element-wise/regex.md b/website/src/documentation/transforms/java/element-wise/regex.md new file mode 100644 index 0000000..5fff595 --- /dev/null +++ b/website/src/documentation/transforms/java/element-wise/regex.md @@ -0,0 +1,36 @@ +--- +layout: section +title: "Regex" +permalink: /documentation/transforms/java/elementwise/regex/ +section_menu: section-menu/documentation.html +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +--> +# Regex +<table align="left"> + <a target="_blank" class="button" + href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/Regex.html"> + <img src="https://beam.apache.org/images/logos/sdks/java.png" width="20px" height="20px" + alt="Javadoc" /> + Javadoc + </a> +</table> +<br> +Provides a variety of functionality based on regular expressions. + +## Examples +See [BEAM-7702](https://issues.apache.org/jira/browse/BEAM-7702) for updates. 
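Until those examples are published, the following is a minimal sketch of the per-element behaviour of a find-style regex transform. It is written in plain Java with `java.util.regex` rather than the Beam SDK, so it runs without any pipeline setup. The names `RegexFindSketch` and `findFirst` are hypothetical and not part of Beam. The sketch assumes that each input string emits at most one output (its first match) and that non-matching strings emit nothing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the Beam SDK: models a find-style regex
// transform over an in-memory list instead of a PCollection.
class RegexFindSketch {
  // For each input string, emits the first substring that matches the regex.
  // Inputs without a match produce no output.
  static List<String> findFirst(String regex, List<String> inputs) {
    Pattern pattern = Pattern.compile(regex);
    List<String> outputs = new ArrayList<>();
    for (String input : inputs) {
      Matcher matcher = pattern.matcher(input);
      if (matcher.find()) {
        outputs.add(matcher.group());
      }
    }
    return outputs;
  }

  public static void main(String[] args) {
    List<String> lines =
        Arrays.asList("order 42 shipped", "no digits here", "7 items, 3 boxes");
    System.out.println(findFirst("[0-9]+", lines));
  }
}
```

In a pipeline, the corresponding call would be along the lines of `input.apply(Regex.find("[0-9]+"))`; check the Javadoc linked above for the exact overloads.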
+ +## Related transforms +* [MapElements]({{ site.baseurl }}/documentation/transforms/java/elementwise/mapelements) \ No newline at end of file diff --git a/website/src/documentation/transforms/java/element-wise/reify.md b/website/src/documentation/transforms/java/element-wise/reify.md new file mode 100644 index 0000000..84c3241 --- /dev/null +++ b/website/src/documentation/transforms/java/element-wise/reify.md @@ -0,0 +1,39 @@ +--- +layout: section +title: "Reify" +permalink: /documentation/transforms/java/elementwise/reify/ +section_menu: section-menu/documentation.html +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +--> +# Reify +<table align="left"> + <a target="_blank" class="button" + href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/Reify.html"> + <img src="https://beam.apache.org/images/logos/sdks/java.png" width="20px" height="20px" + alt="Javadoc" /> + Javadoc + </a> +</table> +<br> +Transforms for converting between explicit and implicit form of various Beam values. + +## Examples +See [BEAM-7702](https://issues.apache.org/jira/browse/BEAM-7702) for updates. 
+ +## Related transforms +* [WithTimestamps]({{ site.baseurl }}/documentation/transforms/java/elementwise/withtimestamps) + assigns timestamps to all the elements of a collection +* [Window]({{ site.baseurl }}/documentation/transforms/java/other/window/) divides up or + groups the elements of a collection into finite windows diff --git a/website/src/documentation/transforms/java/element-wise/tostring.md b/website/src/documentation/transforms/java/element-wise/tostring.md new file mode 100644 index 0000000..dde6ac0 --- /dev/null +++ b/website/src/documentation/transforms/java/element-wise/tostring.md @@ -0,0 +1,37 @@ +--- +layout: section +title: "ToString" +permalink: /documentation/transforms/java/elementwise/tostring/ +section_menu: section-menu/documentation.html +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +--> +# ToString +<table align="left"> + <a target="_blank" class="button" + href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/ToString.html"> + <img src="https://beam.apache.org/images/logos/sdks/java.png" width="20px" height="20px" + alt="Javadoc" /> + Javadoc + </a> +</table> +<br> +A variety of utility transforms for invoking the `toString()` method +on every element in the input collection. + +## Examples +See [BEAM-7702](https://issues.apache.org/jira/browse/BEAM-7702) for updates. 
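Until those examples are published, here is a minimal sketch of what an element-wise `toString()` conversion does. It uses plain Java collections rather than the Beam SDK; `ToStringSketch` and `elementsToString` are hypothetical names, not Beam APIs, and the input is assumed to contain no `null` elements.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the Beam SDK: models an element-wise
// toString() conversion over an in-memory list instead of a PCollection.
class ToStringSketch {
  // Calls toString() on every element and collects the results in input order.
  static <T> List<String> elementsToString(List<T> inputs) {
    return inputs.stream().map(Object::toString).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Object> mixed = Arrays.<Object>asList(1, 2.5, true);
    System.out.println(elementsToString(mixed));
  }
}
```

In a pipeline, the equivalent step would be along the lines of `input.apply(ToString.elements())`, which produces a `PCollection<String>`.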
+ +## Related transforms +* [MapElements]({{ site.baseurl }}/documentation/transforms/java/elementwise/mapelements) \ No newline at end of file diff --git a/website/src/documentation/transforms/java/element-wise/values.md b/website/src/documentation/transforms/java/element-wise/values.md new file mode 100644 index 0000000..b810db4 --- /dev/null +++ b/website/src/documentation/transforms/java/element-wise/values.md @@ -0,0 +1,44 @@ +--- +layout: section +title: "Values" +permalink: /documentation/transforms/java/elementwise/values/ +section_menu: section-menu/documentation.html +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +--> +# Values +<table align="left"> + <a target="_blank" class="button" + href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/Values.html"> + <img src="https://beam.apache.org/images/logos/sdks/java.png" width="20px" height="20px" + alt="Javadoc" /> + Javadoc + </a> +</table> +<br> +The `Values` transform takes a collection of key-value pairs, and +returns the value of each element. + +## Examples +**Example** + +```java +PCollection<KV<String, Integer>> keyValuePairs = /* ... */; +PCollection<Integer> values = keyValuePairs.apply(Values.create()); +``` + +## Related transforms +* [Keys]({{ site.baseurl }}/documentation/transforms/java/elementwise/keys) for extracting the key of each element. +* [KvSwap]({{ site.baseurl }}/documentation/transforms/java/elementwise/kvswap) swaps the key and value of each element.
+* [WithKeys]({{ site.baseurl }}/documentation/transforms/java/elementwise/withkeys) for adding a key to each element. diff --git a/website/src/documentation/transforms/java/element-wise/withkeys.md b/website/src/documentation/transforms/java/element-wise/withkeys.md new file mode 100644 index 0000000..6c6df43 --- /dev/null +++ b/website/src/documentation/transforms/java/element-wise/withkeys.md @@ -0,0 +1,55 @@ +--- +layout: section +title: "WithKeys" +permalink: /documentation/transforms/java/elementwise/withkeys/ +section_menu: section-menu/documentation.html +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +--> +# WithKeys +<table align="left"> + <a target="_blank" class="button" + href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/WithKeys.html"> + <img src="https://beam.apache.org/images/logos/sdks/java.png" width="20px" height="20px" + alt="Javadoc" /> + Javadoc + </a> +</table> +<br> +Takes a `PCollection<V>` and produces a `PCollection<KV<K, V>>` by associating +each input element with a key. + +There are two versions of `WithKeys`, depending on how the key should be determined: + +* `WithKeys.of(SerializableFunction<V, K> fn)` takes a function to + compute the key from each value. +* `WithKeys.of(K key)` associates each value with the specified key. 
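The two variants listed above can be modeled in plain Java as follows. This is only a sketch of the semantics over an in-memory list, not Beam code: `WithKeysSketch`, `withComputedKey`, and `withConstantKey` are hypothetical names, and `Map.Entry` stands in for Beam's `KV`.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper, not part of the Beam SDK: models the two WithKeys
// variants over an in-memory list, using Map.Entry in place of KV.
class WithKeysSketch {
  // Models WithKeys.of(fn): the key is computed from each value.
  static <K, V> List<Map.Entry<K, V>> withComputedKey(Function<V, K> fn, List<V> values) {
    List<Map.Entry<K, V>> keyed = new ArrayList<>();
    for (V value : values) {
      keyed.add(new SimpleEntry<>(fn.apply(value), value));
    }
    return keyed;
  }

  // Models WithKeys.of(key): every value is paired with the same key.
  static <K, V> List<Map.Entry<K, V>> withConstantKey(K key, List<V> values) {
    return WithKeysSketch.<K, V>withComputedKey(value -> key, values);
  }

  public static void main(String[] args) {
    List<String> words = Arrays.asList("Hello", "is");
    List<Map.Entry<Integer, String>> byLength = withComputedKey(String::length, words);
    List<Map.Entry<String, String>> sameKey = withConstantKey("k", words);
    System.out.println(byLength);
    System.out.println(sameKey);
  }
}
```

The constant-key variant is simply the computed-key variant with a function that ignores its input, which is why the sketch implements one in terms of the other.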
+ +## Examples +**Example** +```java +PCollection<String> words = pipeline.apply(Create.of("Hello", "World", "Beam", "is", "fun")); // assumes an existing Pipeline named pipeline +PCollection<KV<Integer, String>> lengthAndWord = + words.apply(WithKeys.of(new SerializableFunction<String, Integer>() { + @Override + public Integer apply(String s) { + return s.length(); + } + })); +``` + +## Related transforms +* [Keys]({{ site.baseurl }}/documentation/transforms/java/elementwise/keys) for extracting the key of each element. +* [Values]({{ site.baseurl }}/documentation/transforms/java/elementwise/values) for extracting the value of each element. +* [KvSwap]({{ site.baseurl }}/documentation/transforms/java/elementwise/kvswap) swaps the key and value of each element. \ No newline at end of file diff --git a/website/src/documentation/transforms/java/element-wise/withtimestamps.md b/website/src/documentation/transforms/java/element-wise/withtimestamps.md new file mode 100644 index 0000000..a2f7e41 --- /dev/null +++ b/website/src/documentation/transforms/java/element-wise/withtimestamps.md @@ -0,0 +1,36 @@ +--- +layout: section +title: "WithTimestamps" +permalink: /documentation/transforms/java/elementwise/withtimestamps/ +section_menu: section-menu/documentation.html +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License.
+--> +# WithTimestamps +<table align="left"> + <a target="_blank" class="button" + href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/WithTimestamps.html"> + <img src="https://beam.apache.org/images/logos/sdks/java.png" width="20px" height="20px" + alt="Javadoc" /> + Javadoc + </a> +</table> +<br> +Assigns timestamps to all the elements of a collection. + +## Examples +See [BEAM-7702](https://issues.apache.org/jira/browse/BEAM-7702) for updates. + +## Related transforms +* [Reify]({{ site.baseurl }}/documentation/transforms/java/elementwise/reify) \ No newline at end of file diff --git a/website/src/documentation/transforms/java/index.md b/website/src/documentation/transforms/java/index.md new file mode 100644 index 0000000..b36e305 --- /dev/null +++ b/website/src/documentation/transforms/java/index.md @@ -0,0 +1,81 @@ +--- +layout: section +title: "Java transform catalog overview" +permalink: /documentation/transforms/java/overview/ +section_menu: section-menu/documentation.html +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
+--> + +# Java transform catalog overview + +## Element-wise + +<table class="table-bordered table-striped"> + <tr><th>Transform</th><th>Description</th></tr> + <tr><td><a href="{{ site.baseurl }}/documentation/transforms/java/elementwise/filter">Filter</a></td><td>Given a predicate, filter out all elements that don't satisfy the predicate.</td></tr> + <tr><td><a href="{{ site.baseurl }}/documentation/transforms/java/elementwise/flatmapelements">FlatMapElements</a></td><td>Applies a function that returns a collection to every element in the input and + outputs all resulting elements.</td></tr> + <tr><td><a href="{{ site.baseurl }}/documentation/transforms/java/elementwise/keys">Keys</a></td><td>Extracts the key from each element in a collection of key-value pairs.</td></tr> + <tr><td><a href="{{ site.baseurl }}/documentation/transforms/java/elementwise/kvswap">KvSwap</a></td><td>Swaps the key and value of each element in a collection of key-value pairs.</td></tr> + <tr><td><a href="{{ site.baseurl }}/documentation/transforms/java/elementwise/mapelements">MapElements</a></td><td>Applies a function to every element in the input and outputs the result.</td></tr> + <tr><td><a href="{{ site.baseurl }}/documentation/transforms/java/elementwise/pardo">ParDo</a></td><td>The most-general mechanism for applying a user-defined <code>DoFn</code> to every element + in the input collection.</td></tr> + <tr><td><a href="{{ site.baseurl }}/documentation/transforms/java/elementwise/partition">Partition</a></td><td>Routes each input element to a specific output collection based on some partition + function.</td></tr> + <tr><td><a href="{{ site.baseurl }}/documentation/transforms/java/elementwise/regex">Regex</a></td><td>Filters input string elements based on a regex. 
May also transform them based on the matching groups.</td></tr> + <tr><td><a href="{{ site.baseurl }}/documentation/transforms/java/elementwise/reify">Reify</a></td><td>Transforms for converting between explicit and implicit form of various Beam values.</td></tr> + <tr><td><a href="{{ site.baseurl }}/documentation/transforms/java/elementwise/tostring">ToString</a></td><td>Transforms every element in an input collection to a string.</td></tr> + <tr><td><a href="{{ site.baseurl }}/documentation/transforms/java/elementwise/withkeys">WithKeys</a></td><td>Produces a collection containing each element from the input collection converted to a key-value pair, with a key selected by applying a function to the input element.</td></tr> + <tr><td><a href="{{ site.baseurl }}/documentation/transforms/java/elementwise/withtimestamps">WithTimestamps</a></td><td>Applies a function to determine a timestamp to each element in the output collection, + and updates the implicit timestamp associated with each input. 
Note that it is only safe to adjust timestamps forwards.</td></tr> + <tr><td><a href="{{ site.baseurl }}/documentation/transforms/java/elementwise/values">Values</a></td><td>Extracts the value from each element in a collection of key-value pairs.</td></tr> +</table> + + + +## Aggregation +<table class="table-bordered table-striped"> + <tr><th>Transform</th><th>Description</th></tr> + <tr><td><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/approximatequantiles">ApproximateQuantiles</a></td><td>Uses an approximation algorithm to estimate the data distribution within each aggregation using a specified number of quantiles.</td></tr> + <tr><td><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/approximateunique">ApproximateUnique</a></td><td>Uses an approximation algorithm to estimate the number of unique elements within each aggregation.</td></tr> + <tr><td><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/cogroupbykey/">CoGroupByKey</a></td><td>Takes several keyed collections of elements and produces a collection where each element consists of a key and all values associated with that key.</td></tr> + <tr><td><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/combine">Combine</a></td><td>Transforms to combine elements according to a provided <code>CombineFn</code>.</td></tr> + <tr><td><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/combinewithcontext">CombineWithContext</a></td><td>An extended version of Combine which allows accessing side-inputs and other context.</td></tr> + <tr><td><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/count">Count</a></td><td>Counts the number of elements within each aggregation.</td></tr> + <tr><td><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/distinct">Distinct</a></td><td>Produces a collection containing distinct elements from the input collection.</td></tr> + <tr><td><a href="{{ site.baseurl
}}/documentation/transforms/java/aggregation/groupbykey">GroupByKey</a></td><td>Takes a keyed collection of elements and produces a collection where each element + consists of a key and all values associated with that key.</td></tr> + <tr><td><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/groupintobatches">GroupIntoBatches</a></td><td>Batches values associated with keys into <code>Iterable</code> batches of some size. Each batch contains elements associated with a specific key.</td></tr> + <tr><td><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/latest">Latest</a></td><td>Selects the latest element within each aggregation according to the implicit timestamp.</td></tr> + <tr><td><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/max">Max</a></td><td>Outputs the maximum element within each aggregation.</td></tr> + <tr><td><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/mean">Mean</a></td><td>Computes the average within each aggregation.</td></tr> + <tr><td><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/min">Min</a></td><td>Outputs the minimum element within each aggregation.</td></tr> + <tr><td><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/sample">Sample</a></td><td>Randomly select some number of elements from each aggregation.</td></tr> + <tr><td><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/sum">Sum</a></td><td>Compute the sum of elements in each aggregation.</td></tr> + <tr><td><a href="{{ site.baseurl }}/documentation/transforms/java/aggregation/top">Top</a></td><td>Compute the largest element(s) in each aggregation.</td></tr> +</table> + + +## Other +<table class="table-bordered table-striped"> + <tr><th>Transform</th><th>Description</th></tr> + <tr><td><a href="{{ site.baseurl }}/documentation/transforms/java/other/create">Create</a></td><td>Creates a collection from an in-memory list.</td></tr> + <tr><td><a 
href="{{ site.baseurl }}/documentation/transforms/java/other/flatten">Flatten</a></td><td>Given multiple input collections, produces a single output collection containing + all elements from all of the input collections.</td></tr> + <tr><td><a href="{{ site.baseurl }}/documentation/transforms/java/other/passert">PAssert</a></td><td>A transform to assert the contents of a <code>PCollection</code> used as part of testing a pipeline either locally or with a runner.</td></tr> + <tr><td><a href="{{ site.baseurl }}/documentation/transforms/java/other/view">View</a></td><td>Operations for turning a collection into a view that may be used as a side-input to a <code>ParDo</code>.</td></tr> + <tr><td><a href="{{ site.baseurl }}/documentation/transforms/java/other/window">Window</a></td><td>Logically divides up or groups the elements of a collection into finite + windows according to a provided <code>WindowFn</code>.</td></tr> +</table> \ No newline at end of file diff --git a/website/src/documentation/transforms/java/other/create.md b/website/src/documentation/transforms/java/other/create.md new file mode 100644 index 0000000..d81607b --- /dev/null +++ b/website/src/documentation/transforms/java/other/create.md @@ -0,0 +1,40 @@ +--- +layout: section +title: "Create" +permalink: /documentation/transforms/java/other/create/ +section_menu: section-menu/documentation.html +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License.
+--> +# Create +<table align="left"> + <a target="_blank" class="button" + href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/Create.html"> + <img src="https://beam.apache.org/images/logos/sdks/java.png" width="20px" height="20px" + alt="Javadoc" /> + Javadoc + </a> +</table> +<br> +Creates a collection containing a specified set of elements. This is useful +for testing, as well as creating an initial input to process in parallel. +For example, a single element to execute a one-time `ParDo` or a list of filenames to be read. + + +## Examples + +See [BEAM-7704](https://issues.apache.org/jira/browse/BEAM-7704) for updates. + +## Related transforms +N/A diff --git a/website/src/documentation/transforms/java/other/flatten.md b/website/src/documentation/transforms/java/other/flatten.md new file mode 100644 index 0000000..c8c22aa --- /dev/null +++ b/website/src/documentation/transforms/java/other/flatten.md @@ -0,0 +1,67 @@ +--- +layout: section +title: "Flatten" +permalink: /documentation/transforms/java/other/flatten/ +section_menu: section-menu/documentation.html +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
+--> +# Flatten +<table align="left"> + <a target="_blank" class="button" + href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/Flatten.html"> + <img src="https://beam.apache.org/images/logos/sdks/java.png" width="20px" height="20px" + alt="Javadoc" /> + Javadoc + </a> +</table> +<br> +Merges multiple `PCollection` objects into a single logical `PCollection`. + +By default, the coder for the output `PCollection` is the same as the coder +for the first `PCollection` in the input `PCollectionList`. However, the +input `PCollection` objects can each use different coders, as long as +they all contain the same data type in your chosen language. + +When using `Flatten` to merge `PCollection` objects that have a windowing +strategy applied, all of the `PCollection` objects you want to merge must +use a compatible windowing strategy and window sizing. For example, all +the collections you're merging must use identical 5-minute fixed windows, +or identical 4-minute sliding windows starting every 30 seconds. + +If your pipeline attempts to use `Flatten` to merge `PCollection` objects +with incompatible windows, Beam generates an `IllegalStateException` error +when your pipeline is constructed. + +See more information in the [Beam Programming Guide]({{ site.baseurl }}/documentation/programming-guide/#flatten). + +## Examples +**Example**: Apply a `Flatten` transform to merge multiple `PCollection` objects + +```java +// Flatten takes a PCollectionList of PCollection objects of a given type. +// Returns a single PCollection that contains all of the elements in the PCollection objects in that list.
+PCollection<String> pc1 = pipeline.apply(Create.of("Hello")); // assumes an existing Pipeline named pipeline +PCollection<String> pc2 = pipeline.apply(Create.of("World", "Beam")); +PCollection<String> pc3 = pipeline.apply(Create.of("Is", "Fun")); +PCollectionList<String> collections = PCollectionList.of(pc1).and(pc2).and(pc3); + +PCollection<String> merged = collections.apply(Flatten.<String>pCollections()); +``` +The resulting collection now has all the elements: "Hello", "World", +"Beam", "Is", and "Fun". + +## Related transforms +* [ParDo]({{ site.baseurl }}/documentation/transforms/java/elementwise/pardo) +* [Partition]({{ site.baseurl }}/documentation/transforms/java/elementwise/partition) \ No newline at end of file diff --git a/website/src/documentation/transforms/java/other/passert.md b/website/src/documentation/transforms/java/other/passert.md new file mode 100644 index 0000000..85d4201 --- /dev/null +++ b/website/src/documentation/transforms/java/other/passert.md @@ -0,0 +1,61 @@ +--- +layout: section +title: "PAssert" +permalink: /documentation/transforms/java/other/passert/ +section_menu: section-menu/documentation.html +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License.
+--> +# PAssert +<table align="left"> + <a target="_blank" class="button" + href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/PAssert.html"> + <img src="https://beam.apache.org/images/logos/sdks/java.png" width="20px" height="20px" + alt="Javadoc" /> + Javadoc + </a> +</table> +<br> +`PAssert` is a class included in the Beam Java SDK that makes +assertions on the contents of a `PCollection`. You can use `PAssert` to verify +that a `PCollection` contains a specific set of expected elements. + +## Examples +For a given `PCollection`, you can use `PAssert` to verify the contents as follows: +```java +PCollection<String> output = ...; + +// Check whether a PCollection contains some elements in any order. +PAssert.that(output) +.containsInAnyOrder( + "elem1", + "elem3", + "elem2"); +``` + +Any code that uses `PAssert` must link in `JUnit` and `Hamcrest`. +If you're using Maven, you can link in `Hamcrest` by adding the +following dependency to your project's pom.xml file: + +```xml +<dependency> + <groupId>org.hamcrest</groupId> + <artifactId>hamcrest-all</artifactId> + <version>1.3</version> + <scope>test</scope> +</dependency> +``` + +## Related transforms +* TestStream \ No newline at end of file diff --git a/website/src/documentation/transforms/java/other/view.md b/website/src/documentation/transforms/java/other/view.md new file mode 100644 index 0000000..fe30b99 --- /dev/null +++ b/website/src/documentation/transforms/java/other/view.md @@ -0,0 +1,37 @@ +--- +layout: section +title: "View" +permalink: /documentation/transforms/java/other/view/ +section_menu: section-menu/documentation.html +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License.
+You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +--> +# View +<table align="left"> + <a target="_blank" class="button" + href="https://beam.apache.org/releases/javadoc/2.13.0/index.html?org/apache/beam/sdk/transforms/View.html"> + <img src="https://beam.apache.org/images/logos/sdks/java.png" width="20px" height="20px" + alt="Javadoc" /> + Javadoc + </a> +</table> +<br> +Operations for turning a collection into a view that may be used as a side-input to a `ParDo`. + +## Examples +See [BEAM-7704](https://issues.apache.org/jira/browse/BEAM-7704) for updates. + +## Related transforms +* [ParDo]({{ site.baseurl }}/documentation/transforms/java/elementwise/pardo) +* [CombineWithContext]({{ site.baseurl }}/documentation/transforms/java/aggregation/combinewithcontext) diff --git a/website/src/documentation/transforms/java/other/window.md b/website/src/documentation/transforms/java/other/window.md new file mode 100644 index 0000000..038f18c --- /dev/null +++ b/website/src/documentation/transforms/java/other/window.md @@ -0,0 +1,40 @@ +--- +layout: section +title: "Window" +permalink: /documentation/transforms/java/other/window/ +section_menu: section-menu/documentation.html +--- +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and +limitations under the License. +--> +# Window +<table align="left"> + <a target="_blank" class="button" + href="https://beam.apache.org/releases/javadoc/current/index.html?org/apache/beam/sdk/transforms/windowing/Window.html"> + <img src="https://beam.apache.org/images/logos/sdks/java.png" width="20px" height="20px" + alt="Javadoc" /> + Javadoc + </a> +</table> +<br> +Logically divides up or groups the elements of a collection into finite +windows according to a function. + +## Examples +See [BEAM-7704](https://issues.apache.org/jira/browse/BEAM-7704) for updates. + +## Related transforms +* [Reify]({{ site.baseurl }}/documentation/transforms/java/elementwise/reify) + converts between explicit and implicit form of various Beam values. +* [WithTimestamps]({{ site.baseurl }}/documentation/transforms/java/elementwise/withtimestamps) + applies a function to determine a timestamp to each element in the output collection. \ No newline at end of file