rszper commented on code in PR #30842:
URL: https://github.com/apache/beam/pull/30842#discussion_r1550546970

##########
website/www/site/content/en/documentation/sdks/yaml.md:
##########
@@ -23,80 +23,132 @@ title: "Apache Beam YAML API"
 
 # Beam YAML API
 
-While Beam provides powerful APIs for authoring sophisticated data
-processing pipelines, it often still has too high a barrier for
-getting started and authoring simple pipelines. Even setting up the
-environment, installing the dependencies, and setting up the project
-can be an overwhelming amount of boilerplate for some (though
-https://beam.apache.org/blog/beam-starter-projects/ has gone a long
-way in making this easier).
-
-Here we provide a simple declarative syntax for describing pipelines
-that does not require coding experience or learning how to use an
-SDK—any text editor will do.
-Some installation may be required to actually *execute* a pipeline, but
-we envision various services (such as Dataflow) to accept yaml pipelines
-directly obviating the need for even that in the future.
-We also anticipate the ability to generate code directly from these
-higher-level yaml descriptions, should one want to graduate to a full
-Beam SDK (and possibly the other direction as well as far as possible).
-
-Though we intend this syntax to be easily authored (and read) directly by
-humans, this may also prove a useful intermediate representation for
-tools to use as well, either as output (e.g. a pipeline authoring GUI)
-or consumption (e.g. a lineage analysis tool) and expect it to be more
-easily manipulated and semantically meaningful than the Beam protos
-themselves (which concern themselves more with execution).
-
-It should be noted that everything here is still under development, but any
-features already included are considered stable. Feedback is welcome at [email protected].
-
-## Running pipelines
-
-The Beam yaml parser is currently included as part of the Apache Beam Python SDK.
-This can be installed (e.g. within a virtual environment) as
+Beam YAML is a declarative syntax for describing Apache Beam pipelines by using
+YAML files. You can use Beam YAML to author and run a Beam pipeline without
+writing any code.
+
+## Overview
+
+Beam provides a powerful model for creating sophisticated data processing
+pipelines. However, getting started with Beam programming can be challenging
+because it requires writing code in one of the supported Beam SDK languages.
+You need to understand the APIs, set up a project, manage dependencies, and
+perform other programming tasks.
+
+Beam YAML makes it easier to get started with creating Beam pipelines. Instead

Review Comment:
   If possible, I would find a way to make this the first paragraph in the overview so that we start by listing the benefits of Beam YAML instead of the challenges of normal Beam. That might require some rewriting, though, so just a suggestion. Feel free to ignore.
