This is an automated email from the ASF dual-hosted git repository.

tvalentyn pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git
The following commit(s) were added to refs/heads/master by this push:
     new d38c2d77e7f  BEAM-13582 Fixing broken links in the documentation (#17300)
d38c2d77e7f is described below

commit d38c2d77e7f23711f8964028bc7210fc087136d4
Author: rszper <98840847+rsz...@users.noreply.github.com>
AuthorDate: Thu Apr 21 04:36:05 2022 -0700

    BEAM-13582 Fixing broken links in the documentation (#17300)
---
 CHANGES.md                                                     |  2 +-
 website/www/site/content/en/blog/beam-2.29.0.md                |  2 +-
 website/www/site/content/en/blog/beam-a-look-back.md           |  2 +-
 website/www/site/content/en/blog/beam-summit-digital-2020.md   |  4 ++--
 website/www/site/content/en/blog/beam-summit-europe-2019.md    |  2 +-
 .../site/content/en/blog/review-input-streaming-connectors.md  | 10 +++++-----
 website/www/site/content/en/contribute/become-a-committer.md   |  4 ++--
 website/www/site/content/en/contribute/release-guide.md        |  2 +-
 .../content/en/documentation/io/built-in/google-bigquery.md    |  2 +-
 website/www/site/content/en/documentation/io/testing.md        |  4 ++--
 website/www/site/content/en/documentation/programming-guide.md |  2 +-
 .../content/en/documentation/resources/learning-resources.md   |  3 +--
 website/www/site/content/en/documentation/runners/direct.md    |  2 +-
 website/www/site/content/en/documentation/runners/jstorm.md    |  4 +---
 .../www/site/content/en/documentation/runtime/environments.md  |  2 +-
 15 files changed, 22 insertions(+), 25 deletions(-)

diff --git a/CHANGES.md b/CHANGES.md
index 8168402c859..8e70a27a1aa 100644
--- a/CHANGES.md
+++ b/CHANGES.md
@@ -441,7 +441,7 @@

 ## New Features / Improvements

 * DataFrame API now supports pandas 1.2.x ([BEAM-11531](https://issues.apache.org/jira/browse/BEAM-11531)).
-* Multiple DataFrame API bugfixes ([BEAM-12071](https://issues.apache/jira/browse/BEAM-12071), [BEAM-11929](https://issues.apache/jira/browse/BEAM-11929))
+* Multiple DataFrame API bugfixes ([BEAM-12071](https://issues.apache.org/jira/browse/BEAM-12071), [BEAM-11929](https://issues.apache.org/jira/browse/BEAM-11929))

 ## Breaking Changes

diff --git a/website/www/site/content/en/blog/beam-2.29.0.md b/website/www/site/content/en/blog/beam-2.29.0.md
index e2b3e1c9694..4bf577d77b1 100644
--- a/website/www/site/content/en/blog/beam-2.29.0.md
+++ b/website/www/site/content/en/blog/beam-2.29.0.md
@@ -42,7 +42,7 @@ For more information on changes in 2.29.0, check out the [detailed release notes

 ### New Features / Improvements

 * DataFrame API now supports pandas 1.2.x ([BEAM-11531](https://issues.apache.org/jira/browse/BEAM-11531)).
-* Multiple DataFrame API bugfixes ([BEAM-12071](https://issues.apache/jira/browse/BEAM-12071), [BEAM-11929](https://issues.apache/jira/browse/BEAM-11929))
+* Multiple DataFrame API bugfixes ([BEAM-12071](https://issues.apache.org/jira/browse/BEAM-12071), [BEAM-11929](https://issues.apache.org/jira/browse/BEAM-11929))
 * DDL supported in SQL transforms ([BEAM-11850](https://issues.apache.org/jira/browse/BEAM-11850))
 * Upgrade Flink runner to Flink version 1.12.2 ([BEAM-11941](https://issues.apache.org/jira/browse/BEAM-11941))

diff --git a/website/www/site/content/en/blog/beam-a-look-back.md b/website/www/site/content/en/blog/beam-a-look-back.md
index 87d1bc217ee..221272e94d1 100644
--- a/website/www/site/content/en/blog/beam-a-look-back.md
+++ b/website/www/site/content/en/blog/beam-a-look-back.md
@@ -61,7 +61,7 @@ new and updated runners were developed:
   - Apache Spark 2.x update
   - [IBM Streams runner](https://www.ibm.com/blogs/bluemix/2017/10/streaming-analytics-updates-ibm-streams-runner-apache-beam-2-0/)
   - MapReduce runner
-  - [JStorm runner](http://jstorm.io/)
+  - [JStorm runner](https://github.com/alibaba/jstorm)

 In addition to runners, Beam
 added new IO connectors, some notable ones being the Cassandra, MQTT, AMQP, HBase/HCatalog, JDBC, Solr, Tika, Redis, and

diff --git a/website/www/site/content/en/blog/beam-summit-digital-2020.md b/website/www/site/content/en/blog/beam-summit-digital-2020.md
index b5e5333f6c7..903ee5a4410 100644
--- a/website/www/site/content/en/blog/beam-summit-digital-2020.md
+++ b/website/www/site/content/en/blog/beam-summit-digital-2020.md
@@ -45,8 +45,8 @@ As all things Beam, this is a community effort. The door is open for participati

 1. Submit a proposal to talk. Please check out the **[Call for Papers](https://sessionize.com/beam-digital-summit-2020/)** and submit a talk. The deadline for submissions is _June 15th_!
 2. Register to join as an attendee. Registration is now open at the **[registration page](https://crowdcast.io/e/beamsummit)**. Registration is free!
-3. Consider sponsoring the event. If your company is interested in engaging with members of the community please check out our [sponsoring prospectus](https://drive.google.com/open?id=1EbijvZKpkWwWyMryLY9sJfyZzZk1k44v).
-4. Help us get the word out. Please make sure to let your colleagues and friends in the data engineering field (and beyond!) know about the Beam Summit.
+<!--- 3. Consider sponsoring the event. If your company is interested in engaging with members of the community please check out our sponsoring prospectus.--->
+3. Help us get the word out. Please make sure to let your colleagues and friends in the data engineering field (and beyond!) know about the Beam Summit.
 ## Follow up and more information

diff --git a/website/www/site/content/en/blog/beam-summit-europe-2019.md b/website/www/site/content/en/blog/beam-summit-europe-2019.md
index 0d303b202c0..7f5351f9efd 100644
--- a/website/www/site/content/en/blog/beam-summit-europe-2019.md
+++ b/website/www/site/content/en/blog/beam-summit-europe-2019.md
@@ -56,7 +56,7 @@ Keep an eye out for a meetup in [Paris](https://www.meetup.com/Paris-Apache-Beam

 If you are interested in starting your own meetup, feel free [to reach out](https://beam.apache.org/community/contact-us)! Good places to start include our Slack channel, the dev and user mailing lists, or the Apache Beam Twitter.

-Even if you can’t travel to these meetups, you can stay informed on the happenings of the community. The talks and sessions from previous conferences and meetups are archived on the [Apache Beam YouTube channel](https://www.youtube.com/c/ApacheBeamYT). If you want your session added to the channel, don’t hesitate to get in touch! And in case you want to attend the next Beam event in style, you can also order your swag on the [Beam swag store](https://store-beam.myshopify.com)
+Even if you can’t travel to these meetups, you can stay informed on the happenings of the community. The talks and sessions from previous conferences and meetups are archived on the [Apache Beam YouTube channel](https://www.youtube.com/c/ApacheBeamYT). If you want your session added to the channel, don’t hesitate to get in touch!
 ## Summits

 The first summit of the year will be held in Berlin:

diff --git a/website/www/site/content/en/blog/review-input-streaming-connectors.md b/website/www/site/content/en/blog/review-input-streaming-connectors.md
index 7c4f7a912c7..9cafe96747f 100644
--- a/website/www/site/content/en/blog/review-input-streaming-connectors.md
+++ b/website/www/site/content/en/blog/review-input-streaming-connectors.md
@@ -127,7 +127,7 @@ and <a href="https://spark.apache.org/docs/latest/api/java/org/apache/spark/stre

 Beam has an official [Python SDK](/documentation/sdks/python/) that currently supports a subset of the streaming features available in the Java SDK. Active development is underway to bridge the gap between the featuresets in the two SDKs. Currently for Python, the [Direct Runner](/documentation/runners/direct/) and [Dataflow Runner](/documentation/runners/dataflow/) are supported, and [several streaming options](/documentation/sdks/python-streaming/) were introduced in beta in [version 2 [...]

-Spark also has a Python SDK called [PySpark](https://spark.apache.org/docs/latest/api/python/pyspark.html). As mentioned earlier, Scala code compiles to a bytecode that is executed by the JVM. PySpark uses [Py4J](https://www.py4j.org/), a library that enables Python programs to interact with the JVM and therefore access Java libraries, interact with Java objects, and register callbacks from Java. This allows PySpark to access native Spark objects like RDDs. Spark Structured Streaming sup [...]
+Spark also has a Python SDK called [PySpark](https://spark.apache.org/docs/latest/api/python/index.html). As mentioned earlier, Scala code compiles to a bytecode that is executed by the JVM. PySpark uses [Py4J](https://www.py4j.org/), a library that enables Python programs to interact with the JVM and therefore access Java libraries, interact with Java objects, and register callbacks from Java. This allows PySpark to access native Spark objects like RDDs. Spark Structured Streaming suppo [...]

 Below are the main streaming input connectors for available for Beam and Spark DStreams in Python:

@@ -149,7 +149,7 @@ Below are the main streaming input connectors for available for Beam and Spark D
   </td>
   <td><a href="https://beam.apache.org/releases/pydoc/{{< param release_latest >}}/apache_beam.io.textio.html">io.textio</a>
   </td>
-  <td><a href="https://spark.apache.org/docs/latest/api/python/pyspark.streaming.html#pyspark.streaming.StreamingContext.textFileStream">textFileStream</a>
+  <td><a href="https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.streaming.StreamingContext.textFileStream.html">textFileStream</a>
   </td>
 </tr>
 <tr>
@@ -158,7 +158,7 @@ Below are the main streaming input connectors for available for Beam and Spark D
   <td><a href="https://beam.apache.org/releases/pydoc/{{< param release_latest >}}/apache_beam.io.hadoopfilesystem.html">io.hadoopfilesystem</a>
   </td>
   <td><a href="https://spark.apache.org/docs/latest/api/java/org/apache/spark/SparkContext.html#hadoopConfiguration--">hadoopConfiguration</a> (Access through <code>sc._jsc</code> with Py4J)
-and <a href="https://spark.apache.org/docs/latest/api/python/pyspark.streaming.html#pyspark.streaming.StreamingContext.textFileStream">textFileStream</a>
+and <a href="https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.streaming.StreamingContext.textFileStream.html">textFileStream</a>
   </td>
 </tr>
 <tr>
@@ -184,7 +184,7 @@ and <a href="https://spark.apache.org/docs/latest/api/python/pyspark.streaming.h
   </td>
   <td>N/A
   </td>
-  <td><a href="https://spark.apache.org/docs/latest/api/python/pyspark.streaming.html#pyspark.streaming.kafka.KafkaUtils">KafkaUtils</a>
+  <td><a href="https://spark.apache.org/docs/2.4.8/api/python/pyspark.streaming.html#pyspark.streaming.kafka.KafkaUtils">KafkaUtils</a>
   </td>
 </tr>
 <tr>
@@ -192,7 +192,7 @@ and <a href="https://spark.apache.org/docs/latest/api/python/pyspark.streaming.h
   </td>
   <td>N/A
   </td>
-  <td><a href="https://spark.apache.org/docs/latest/api/python/pyspark.streaming.html#module-pyspark.streaming.kinesis">KinesisUtils</a>
+  <td><a href="https://spark.apache.org/docs/2.4.8/api/python/pyspark.streaming.html#pyspark.streaming.kinesis.KinesisUtils">KinesisUtils</a>
   </td>
 </tr>
 <tr>

diff --git a/website/www/site/content/en/contribute/become-a-committer.md b/website/www/site/content/en/contribute/become-a-committer.md
index 126010fb7b6..e7a78560430 100644
--- a/website/www/site/content/en/contribute/become-a-committer.md
+++ b/website/www/site/content/en/contribute/become-a-committer.md
@@ -39,8 +39,8 @@ makes someone a committer via nomination, discussion, and then majority vote. We
 use data from as many sources as possible to inform our reasoning. Here are
 some examples:

- - [dev@ archives](https://lists.apache.org/list.html?d...@beam.apache.org) and [statistics](https://lists.apache.org/trends.html?d...@beam.apache.org)
- - [user@ archives](https://lists.apache.org/list.html?u...@beam.apache.org) and [statistics](https://lists.apache.org/trends.html?u...@beam.apache.org)
+ - [dev@ archives](https://lists.apache.org/list.html?d...@beam.apache.org)
+ - [user@ archives](https://lists.apache.org/list.html?u...@beam.apache.org)
 - [`apache-beam` StackOverflow tag](https://stackoverflow.com/questions/tagged/apache-beam)
 - Git metrics for [Beam](https://github.com/apache/beam/graphs/contributors)
 - Code reviews given and received on

diff --git a/website/www/site/content/en/contribute/release-guide.md b/website/www/site/content/en/contribute/release-guide.md
index d47766638e0..1570306aba9 100644
--- a/website/www/site/content/en/contribute/release-guide.md
+++ b/website/www/site/content/en/contribute/release-guide.md
@@ -584,7 +584,7 @@ See the source of the script for more details, or to run commands manually in ca
 1. Select repository `orgapachebeam-NNNN`.
 1. Click the Close button.
 1. When prompted for a description, enter “Apache Beam, version X, release candidate Y”.
-1. Review all staged artifacts on https://repository.apache.org/content/repositories/orgapachebeam-NNNN/.
+1. Review all staged artifacts on `https://repository.apache.org/content/repositories/orgapachebeam-NNNN/`.
    They should contain all relevant parts for each module, including `pom.xml`, jar, test jar, javadoc, etc.
    Artifact names should follow [the existing format](https://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.beam%22)
    in which artifact name mirrors directory structure, e.g., `beam-sdks-java-io-kafka`.
    Carefully review any new artifacts.

diff --git a/website/www/site/content/en/documentation/io/built-in/google-bigquery.md b/website/www/site/content/en/documentation/io/built-in/google-bigquery.md
index c957f5e5ed6..1759016f706 100644
--- a/website/www/site/content/en/documentation/io/built-in/google-bigquery.md
+++ b/website/www/site/content/en/documentation/io/built-in/google-bigquery.md
@@ -92,7 +92,7 @@ a string, or use a
 [TableReference](https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/java/latest/index.html?com/google/api/services/bigquery/model/TableReference.html)
 </span>
 <span class="language-py">
-  [TableReference](https://github.com/googleapis/google-cloud-python/blob/master/bigquery/google/cloud/bigquery/table.py#L153)
+  [TableReference](https://beam.apache.org/releases/pydoc/current/apache_beam.io.gcp.bigquery.html#table-references)
 </span>
 object.
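[Editor's aside: the BigQuery hunk above points Python users at Beam's own `TableReference` docs. As background, Beam's Python BigQuery connector also accepts a plain table spec string of the form `[PROJECT:]DATASET.TABLE` instead of a `TableReference` object. A rough, stdlib-only sketch of how such a spec splits into parts — illustrative only, not Beam's actual parser:]

```python
import re

def parse_table_spec(spec):
    """Split a BigQuery table spec like 'PROJECT:DATASET.TABLE' or
    'DATASET.TABLE' into (project, dataset, table).

    Simplified for illustration; real specs allow a richer character set.
    """
    m = re.match(
        r"^(?:(?P<project>[^:.]+):)?(?P<dataset>[^:.]+)\.(?P<table>[^:.]+)$",
        spec,
    )
    if not m:
        raise ValueError(f"not a valid table spec: {spec!r}")
    return m.group("project"), m.group("dataset"), m.group("table")
```

For example, `parse_table_spec("clouddataflow-readonly:samples.weather_stations")` yields the project, dataset, and table components separately, with the project set to `None` when the spec omits it.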
diff --git a/website/www/site/content/en/documentation/io/testing.md b/website/www/site/content/en/documentation/io/testing.md
index 0f23bddd6e3..d7a17adf398 100644
--- a/website/www/site/content/en/documentation/io/testing.md
+++ b/website/www/site/content/en/documentation/io/testing.md
@@ -172,7 +172,7 @@ Example usage on Cloud Dataflow runner:

 Example usage on HDFS filesystem and Direct runner:

-NOTE: Below setup will only work when /etc/hosts file contains entries with hadoop namenode and hadoop datanodes external IPs. Please see explanation in: [Small Cluster config file](https://github.com/apache/beam/blob/master/.test-infra/kubernetes/hadoop/SmallITCluster/pkb-config.yml) and [Large Cluster config file](https://github.com/apache/beam/blob/master/.test-infra/kubernetes/hadoop/LargeITCluster/pkb-config.yml).
+NOTE: Below setup will only work when /etc/hosts file contains entries with hadoop namenode and hadoop datanodes external IPs. Please see explanation in: [Small Cluster config file](https://github.com/apache/beam/blob/master/.test-infra/kubernetes/hadoop/SmallITCluster/hdfs-single-datanode-cluster.yml) and [Large Cluster config file](https://github.com/apache/beam/blob/master/.test-infra/kubernetes/hadoop/LargeITCluster/hdfs-multi-datanode-cluster.yml).

 ```
 export HADOOP_USER_NAME=root
@@ -334,7 +334,7 @@ If you modified/added new Jenkins job definitions in your Pull Request, run the

 As mentioned before, we measure the performance of IOITs by gathering test execution times from Jenkins jobs that run periodically. The consequent results are stored in a database (BigQuery), therefore we can display them in a form of plots.
-The dashboard gathering all the results is available here: [Performance Testing Dashboard](https://s.apache.org/io-test-dashboards)
+The dashboard gathering all the results is available here: [Performance Testing Dashboard](http://metrics.beam.apache.org/d/1/getting-started?orgId=1&viewPanel=123125)

 ### Implementing Integration Tests {#implementing-integration-tests}

diff --git a/website/www/site/content/en/documentation/programming-guide.md b/website/www/site/content/en/documentation/programming-guide.md
index d2f06417e8e..998be6fcaeb 100644
--- a/website/www/site/content/en/documentation/programming-guide.md
+++ b/website/www/site/content/en/documentation/programming-guide.md
@@ -3971,7 +3971,7 @@ Standard Go types like `int`, `int64` `float64`, `[]byte`, and `string` and more
 Structs and pointers to structs default using Beam Schema Row encoding. However,
 users can build and register custom coders with `beam.RegisterCoder`. You can
 find available Coder functions in the
-[coder](https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam/core/graph/coders)
+[coder](https://pkg.go.dev/github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder)
 package.
 {{< /paragraph >}}

diff --git a/website/www/site/content/en/documentation/resources/learning-resources.md b/website/www/site/content/en/documentation/resources/learning-resources.md
index 8da964e0206..8e7cedd75f1 100644
--- a/website/www/site/content/en/documentation/resources/learning-resources.md
+++ b/website/www/site/content/en/documentation/resources/learning-resources.md
@@ -97,8 +97,7 @@ If you have additional material that you would like to see here, please let us k

 ### Python

-* **[Python Qwik Start](https://qwiklabs.com/focuses/1100?locale=en&parent=catalog)** (30m) - Run a word count pipeline on the Dataflow runner.
-* **[NDVI from Landsat Images](https://qwiklabs.com/focuses/1849?locale=en&parent=catalog)** (45m) - Process Landsat satellite data in a distributed environment to compute the [Normalized Difference Vegetation Index](https://en.wikipedia.org/wiki/Normalized_difference_vegetation_index) (NDVI).
+* **[Python Qwik Start](https://www.qwiklabs.com/focuses/1098?parent=catalog)** (30m) - Run a word count pipeline on the Dataflow runner.
 * **[Simulate historic flights](https://qwiklabs.com/focuses/1159?locale=en&parent=catalog)** (60m) - Simulate real-time historic internal flights in the United States and store the resulting simulated data in BigQuery.

 ## Beam Katas {#beam-katas}

diff --git a/website/www/site/content/en/documentation/runners/direct.md b/website/www/site/content/en/documentation/runners/direct.md
index 1249aa9a286..24acdf0bce3 100644
--- a/website/www/site/content/en/documentation/runners/direct.md
+++ b/website/www/site/content/en/documentation/runners/direct.md
@@ -36,7 +36,7 @@ Here are some resources with information about how to test your pipelines.
   <li class="language-java">The <a href="/get-started/wordcount-example/#testing-your-pipeline-with-asserts">Apache Beam WordCount Walkthrough</a> contains an example of logging and testing a pipeline with <a href="https://beam.apache.org/releases/javadoc/{{< param release_latest >}}/index.html?org/apache/beam/sdk/testing/PAssert.html">PAssert</a>.</li>
   <!-- Python specific links -->
-  <li class="language-py">The <a href="/get-started/wordcount-example/#testing-your-pipeline-with-asserts">Apache Beam WordCount Walkthrough</a> contains an example of logging and testing a pipeline with <a href="https://beam.apache.org/releases/pydoc/{{< param release_latest >}}/apache_beam.testing.util.html#apache_beam.testing.util.assert_that">assert_that</a>.</li>
+  <li class="language-py">The <a href="/get-started/wordcount-example/#testing-your-pipeline-with-asserts">Apache Beam WordCount Walkthrough</a> contains an example of logging and testing a pipeline with <code>assert_that</code>.</li>
 </ul>

 ## Direct Runner prerequisites and setup

diff --git a/website/www/site/content/en/documentation/runners/jstorm.md b/website/www/site/content/en/documentation/runners/jstorm.md
index cbf477d6127..6cbf00a8aa9 100644
--- a/website/www/site/content/en/documentation/runners/jstorm.md
+++ b/website/www/site/content/en/documentation/runners/jstorm.md
@@ -16,7 +16,7 @@ limitations under the License.
 -->
 # Using the JStorm Runner

-The JStorm Runner can be used to execute Beam pipelines using [JStorm](http://jstorm.io/), while providing:
+The JStorm Runner can be used to execute Beam pipelines using [JStorm](https://github.com/alibaba/jstorm), while providing:

 * High throughput and low latency.
 * At-least-once and exactly-once fault tolerance.
@@ -52,8 +52,6 @@ When you submit a topology with argument `"--external-libs beam"`, JStorm will l
 jstorm jar WordCount.jar org.apache.beam.examples.WordCount --external-libs beam --runner=org.apache.beam.runners.jstorm.JStormRunner
 ```

-To learn about deploying a JStorm cluster, please refer to [JStorm cluster deploy](http://jstorm.io/QuickStart/Deploy/index.html)
-
 ## Pipeline options for the JStorm Runner

 When executing your pipeline with the JStorm Runner, you should consider the following pipeline options.

diff --git a/website/www/site/content/en/documentation/runtime/environments.md b/website/www/site/content/en/documentation/runtime/environments.md
index aff23e2d3f4..2243bbe636b 100644
--- a/website/www/site/content/en/documentation/runtime/environments.md
+++ b/website/www/site/content/en/documentation/runtime/environments.md
@@ -102,7 +102,7 @@ This method requires building image artifacts from Beam source. For additional i
    git checkout origin/release-$BEAM_SDK_VERSION
    ```

-2. Customize the `Dockerfile` for a given language, typically `sdks/<language>/container/Dockerfile` directory (e.g. the [Dockerfile for Python](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile). If you're adding dependencies from [PyPI](https://pypi.org/), use [`base_image_requirements.txt`](https://github.com/apache/beam/blob/master/sdks/python/container/base_image_requirements.txt) instead.
+2. Customize the `Dockerfile` for a given language, typically `sdks/<language>/container/Dockerfile` directory (e.g. the [Dockerfile for Python](https://github.com/apache/beam/blob/master/sdks/python/container/Dockerfile).

3. Return to the root Beam directory and run the Gradle `docker` target for your image.
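[Editor's aside: the whole commit repairs malformed URLs such as `issues.apache` where `issues.apache.org` was meant. A small stdlib-only sketch of the kind of check that catches such truncated link hosts in Markdown — the function and host list are hypothetical examples, not Beam's release tooling:]

```python
import re
from urllib.parse import urlparse

# Matches Markdown inline links: [text](http...)
LINK_RE = re.compile(r"\[[^\]]*\]\((https?://[^)\s]+)\)")

def find_truncated_hosts(markdown, known_hosts):
    """Flag links whose host looks like a known host with its TLD cut off,
    e.g. 'issues.apache' instead of 'issues.apache.org'."""
    suspects = []
    for url in LINK_RE.findall(markdown):
        host = urlparse(url).netloc
        if host in known_hosts:
            continue
        # A known host beginning with '<host>.' suggests a truncated URL.
        if any(good.startswith(host + ".") for good in known_hosts):
            suspects.append(url)
    return suspects
```

Run against the first hunk in this commit, a check like this would flag the two `issues.apache/jira/...` links while leaving the correct `issues.apache.org` ones alone.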