tysonjh commented on a change in pull request #13456:
URL: https://github.com/apache/beam/pull/13456#discussion_r539452514
##########
File path: website/www/site/content/en/blog/splittable-do-fn-is-available.md
##########
@@ -0,0 +1,88 @@
+---
+title: "Splittable DoFn in Apache Beam is Ready to Use"
+date: 2020-12-16 00:00:01 -0800
+categories:
+  - blog
+aliases:
+  - /blog/2020/12/16/splittable-do-fn-is-available.html
+authors:
+  - boyuanzz
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+In 2017, [Splittable DoFn Blog Post](https://beam.apache.org/blog/splittable-do-fn/) proposed
+to build [Splittable DoFn](https://s.apache.org/splittable-do-fn) APIs as the new recommended way of
+building I/O connectors. Splittable DoFn is a generalization of `DoFn` that gives it the core
+capabilities of `Source` while retaining `DoFn`'s syntax, flexibility, modularity, and ease of
+coding. Thus, it becomes much easier to develop complex I/O connectors with simpler and reusable
+code.
+
+Splittable DoFn has three advantages over the existing `UnboundedSource` and `BoundedSource`:
+* Splittable DoFn provides a unified set of APIs to handle both unbounded and bounded cases.
+* Splittable DoFn enables reading from source descriptors dynamically.
+  - Taking Kafka IO as an example, within `UnboundedSource`/`BoundedSource` API, you must specify
+    the topic and partition you want to read from during pipeline construction time. There is no way
+    for `UnboundedSource`/`BoundedSource` to accept topics and partitions as inputs during execution
+    time. But it's native to Splittable DoFn.
+* Splittable DoFn fits in as any node on a pipeline freely with the ability of splitting.
+  - `UnboundedSource`/`BoundedSource` has to be the root node of the pipeline to gain performance
+    benefits from splitting strategies, which limits many real-world usages. This is no longer a limit
+    for a Splittable DoFn.
+
+As Splittable DoFn is now ready to use with all the mentioned improvements, it is the recommended
+way to build the new I/O connectors.Try out building your own Splittable DoFn by following the

Review comment:
```suggestion
way to build the new I/O connectors. Try out building your own Splittable DoFn by following the
```

##########
File path: website/www/site/content/en/blog/splittable-do-fn-is-available.md
##########
@@ -0,0 +1,88 @@
+As Splittable DoFn is now ready to use with all the mentioned improvements, it is the recommended
+way to build the new I/O connectors.Try out building your own Splittable DoFn by following the
+[programming guide](https://beam.apache.org/documentation/programming-guide/#splittable-dofns). We
+have provided tones of common utility classes such as common types of `RestrictionTracker` and

Review comment:
```suggestion
have provided tonnes of common utility classes such as common types of `RestrictionTracker` and
```

##########
File path: website/www/site/content/en/blog/splittable-do-fn-is-available.md
##########
@@ -0,0 +1,88 @@
+`WatermarkEstimator` in Beam SDK, which will help you onboard easily. As for the existing I/O
+connectors, we have wrapped `UnboundedSource` and `BoundedSource` implementations into Splittable
+DoFns, yet we still encourage developers to convert `UnboundedSource`/`BoundedSource` into actual
+Splittable DoFn implementation to gain more performance benefits.
+
+Many thanks to every contributor who brought this highly expected design into the data processing
+world. We are really excited to see that users benefit from Splittable DoFn.
+
+At the end, hope you enjoy exploring some real-world Splittable DoFn examples.

Review comment:
```suggestion
Below are some real-world Splittable DoFn examples for you to explore.
```

##########
File path: website/www/site/content/en/blog/splittable-do-fn-is-available.md
##########
@@ -0,0 +1,88 @@
+In 2017, [Splittable DoFn Blog Post](https://beam.apache.org/blog/splittable-do-fn/) proposed
+to build [Splittable DoFn](https://s.apache.org/splittable-do-fn) APIs as the new recommended way of
+building I/O connectors. Splittable DoFn is a generalization of `DoFn` that gives it the core
+capabilities of `Source` while retaining `DoFn`'s syntax, flexibility, modularity, and ease of
+coding. Thus, it becomes much easier to develop complex I/O connectors with simpler and reusable
+code.

Review comment:
+1

##########
File path: website/www/site/content/en/blog/splittable-do-fn-is-available.md
##########
@@ -0,0 +1,88 @@
+DoFns, yet we still encourage developers to convert `UnboundedSource`/`BoundedSource` into actual
+Splittable DoFn implementation to gain more performance benefits.
+
+Many thanks to every contributor who brought this highly expected design into the data processing

Review comment:
```suggestion
Many thanks to every contributor who brought this highly anticipated design into the data processing
```
I think this is what you mean?
