This is an automated email from the ASF dual-hosted git repository.

mergebot-role pushed a commit to branch mergebot
in repository https://gitbox.apache.org/repos/asf/beam-site.git
commit 9da560eef802eac7c7fb29dedc71387328776b3b
Author: Stephen Sisk <s...@google.com>
AuthorDate: Wed Jul 19 16:28:32 2017 -0700

    Updated IO IT docs based on PR feedback
---
 src/documentation/io/testing.md | 49 ++++++++++++++++++++---------------------
 1 file changed, 24 insertions(+), 25 deletions(-)

diff --git a/src/documentation/io/testing.md b/src/documentation/io/testing.md
index 26ebc55..6281b5f 100644
--- a/src/documentation/io/testing.md
+++ b/src/documentation/io/testing.md
@@ -111,6 +111,7 @@ If your I/O transform allows batching of reads/writes, you must force the batchi
 
 ## I/O Transform Integration Tests {#i-o-transform-integration-tests}
 
+> We do not currently have examples of Python I/O integration tests or integration tests for unbounded or eventually consistent data stores. We would welcome contributions in these areas - please contact the Beam dev@ mailing list for more information.
 
 ### Goals {#it-goals}
 
@@ -126,7 +127,7 @@ In order to test I/O transforms in real world conditions, you must connect to a
 
 The Beam community hosts the data stores used for integration tests in Kubernetes. In order for an integration test to be run in Beam's continuous integration environment, it must have Kubernetes scripts that set up an instance of the data store.
 
-However, when working locally, there is no requirement to use Kubernetes. All of the test infrastructure allows passing in connection info, so developers can use their preferred hosting infrastructure for local development.
+However, when working locally, there is no requirement to use Kubernetes. All of the test infrastructure allows you to pass in connection info, so developers can use their preferred hosting infrastructure for local development.
 
 ### Running integration tests {#running-integration-tests}
 
@@ -136,18 +137,18 @@ The high level steps for running an integration test are:
 1. Run the test, passing it connection info from the just created data store
 1. Clean up the data store
 
-Since setting up data stores and running the tests involves a number of steps, and we wish to time these tests when running performance benchmarks, we use PerfKit Benchmarker (PKB) to manage the process end to end. With a single command, you can go from an empty Kubernetes cluster to a running integration test.
+Since setting up data stores and running the tests involves a number of steps, and we wish to time these tests when running performance benchmarks, we use PerfKit Benchmarker to manage the process end to end. With a single command, you can go from an empty Kubernetes cluster to a running integration test.
 
-However, **PerfKit Benchmarker is not required for running integration tests**. Therefore, we have listed the steps for both using PerfKit, and manually running the tests below.
+However, **PerfKit Benchmarker is not required for running integration tests**. Therefore, we have listed the steps for both using PerfKit Benchmarker, and manually running the tests below.
 
 #### Using PerfKit Benchmarker {#using-perfkit-benchmarker}
 
 Prerequisites:
-1. [Install PerfKit](https://github.com/GoogleCloudPlatform/PerfKitBenchmarker)
+1. [Install PerfKit Benchmarker](https://github.com/GoogleCloudPlatform/PerfKitBenchmarker)
 1. Have a running Kubernetes cluster you can connect to locally using kubectl
 
-You won't need to invoke PerfKit directly. Run mvn verify in the directory of the I/O module you'd like to test, with the parameter io-it-suite.
+You won't need to invoke PerfKit Benchmarker directly. Run mvn verify in the directory of the I/O module you'd like to test, with the parameter io-it-suite.
 
 Example run with the direct runner:
 ```
@@ -179,13 +180,13 @@ Parameter descriptions:
  <tr>
   <td>-Dio-it-suite
   </td>
-  <td>Invokes the call to PerfKit.
+  <td>Invokes the call to PerfKit Benchmarker.
   </td>
  </tr>
  <tr>
   <td>-Dio-it-suite-local
   </td>
-  <td>Modifies the call to PerfKit so that it exposes the postgres service via LoadBalancer, making it available to users not on the immediate network of the kubernetes cluster. This is useful if you are running on a remote kubernetes cluster.
+  <td>Modifies the call to PerfKit Benchmarker so that it exposes the postgres service via LoadBalancer, making it available to users not on the immediate network of the kubernetes cluster. This is useful if you are running on a remote kubernetes cluster.
   </td>
  </tr>
  <tr>
@@ -243,7 +244,7 @@ If you're using Kubernetes, make sure you can connect to your cluster locally us
 There are three components necessary to implement an integration test:
 * **Test code**: the code that does the actual testing: interacting with the I/O transform, reading and writing data, and verifying the data.
 * **Kubernetes scripts**: a Kubernetes script that sets up the data store that will be used by the test code.
-* **Integrate with PerfKit Benchmarker using io-it-suite**: this allows users to easily invoke perfkit, creating the Kubernetes resources and running the test code.
+* **Integrate with PerfKit Benchmarker using io-it-suite**: this allows users to easily invoke PerfKit Benchmarker, creating the Kubernetes resources and running the test code.
 
 These three pieces are discussed in detail below.
 
@@ -266,8 +267,6 @@ These are the conventions used by integration testing code:
 
 An end to end example of these principles can be found in [JdbcIOIT](https://github.com/ssisk/beam/blob/jdbc-it-perf/sdks/java/io/jdbc/src/test/java/org/apache/beam/sdk/io/jdbc/JdbcIOIT.java).
 
-If you'd like to implement Python I/O integration tests or integration tests for unbounded or eventually consistent data stores, please contact the Beam dev@ mailing list for more information.
-
 
 #### Kubernetes scripts {#kubernetes-scripts}
 
@@ -296,9 +295,9 @@ Guidelines for creating a Beam data store Kubernetes script:
 
 #### Integrate with PerfKit Benchmarker {#integrate-with-perfkit-benchmarker}
 
-To allow developers to easily invoke your I/O integration test, perform the following steps:
-1. Create a PerfKit benchmark configuration file for the data store. Each pipeline option needed by the integration test should have a configuration entry. See [Defining the benchmark configuration file](#defining-the-benchmark-configuration-file) for information about what to include.
-1. Modify the [Per-I/O mvn pom configuration](#per-i-o-mvn-pom-configuration).
+To allow developers to easily invoke your I/O integration test, you must perform these two steps. The following sections describe each step in more detail.
+1. Create a PerfKit Benchmarker benchmark configuration file for the data store. Each pipeline option needed by the integration test should have a configuration entry.
+1. Modify the per-I/O Maven pom configuration so that PerfKit Benchmarker can be invoked from Maven.
 
 The goal is that a checked in config has defaults such that other developers can run the test without changing the configuration.
 
@@ -397,7 +396,7 @@ and may contain the following elements:
  <tr>
   <td>dynamic_pipeline_options
   </td>
-  <td>The set of mvn pipeline options that PerfKit will determine at runtime.
+  <td>The set of mvn pipeline options that PerfKit Benchmarker will determine at runtime.
   </td>
  </tr>
  <tr>
@@ -425,15 +424,15 @@ and may contain the following elements:
 
 #### Per-I/O mvn pom configuration {#per-i-o-mvn-pom-configuration}
 
-Each I/O is responsible for adding a section to its pom with a profile that invokes PerfKit with the proper parameters during the verify phase. Below are the set of PerfKit parameters and how to configure them.
+Each I/O is responsible for adding a section to its pom with a profile that invokes PerfKit Benchmarker with the proper parameters during the verify phase. Below is the set of PerfKit Benchmarker parameters and how to configure them.
 
-The [JdbcIO pom](https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/pom.xml) has an example of how to put these options together into a profile and invoke Python+PerfKit with them.
+The [JdbcIO pom](https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/pom.xml) has an example of how to put these options together into a profile and invoke Python+PerfKit Benchmarker with them.
 
 <table class="table">
  <thead>
   <tr>
-   <td><strong>PerfKit Parameter</strong>
+   <td><strong>PerfKit Benchmarker Parameter</strong>
    </td>
    <td><strong>Description</strong>
    </td>
@@ -445,7 +444,7 @@ The [JdbcIO pom](https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/po
  <tr>
   <td>benchmarks
   </td>
-  <td>Defines the PerfKit benchmark to run. This is same for all I/O integration tests.
+  <td>Defines the PerfKit Benchmarker benchmark to run. This is the same for all I/O integration tests.
   </td>
   <td>beam_integration_benchmark
   </td>
@@ -453,7 +452,7 @@ The [JdbcIO pom](https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/po
  <tr>
   <td>beam_location
   </td>
-  <td>The location where PerfKit can find the Beam repository.
+  <td>The location where PerfKit Benchmarker can find the Beam repository.
   </td>
   <td>${beamRootProjectDir} - this is a variable you'll need to define for each maven pom. See example pom for an example.
   </td>
@@ -469,7 +468,7 @@ The [JdbcIO pom](https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/po
  <tr>
   <td>beam_sdk
   </td>
-  <td>Whether PerfKit will run the Beam SDK for Java or Python.
+  <td>Whether PerfKit Benchmarker will run the Beam SDK for Java or Python.
   </td>
   <td>java
   </td>
@@ -493,7 +492,7 @@ The [JdbcIO pom](https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/po
  <tr>
   <td>beam_it_module
   </td>
-  <td>The path to the pom that contains the test (needed for invoking the test with PerfKit).
+  <td>The path to the pom that contains the test (needed for invoking the test with PerfKit Benchmarker).
   </td>
   <td>sdks/java/io/jdbc
   </td>
@@ -517,7 +516,7 @@ The [JdbcIO pom](https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/po
  <tr>
   <td>kubeconfig
   </td>
-  <td>The standard PerfKit parameter `kubeconfig`, which specifies where the Kubernetes config file lives.
+  <td>The standard PerfKit Benchmarker parameter `kubeconfig`, which specifies where the Kubernetes config file lives.
   </td>
   <td>Always use ${kubeconfig}
   </td>
@@ -525,7 +524,7 @@ The [JdbcIO pom](https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/po
  <tr>
   <td>kubectl
   </td>
-  <td>The standard PerfKit parameter `kubectl`, which specifies where the kubectl binary lives.
+  <td>The standard PerfKit Benchmarker parameter `kubectl`, which specifies where the kubectl binary lives.
   </td>
   <td>Always use ${kubectl}
   </td>
@@ -542,7 +541,7 @@ The [JdbcIO pom](https://github.com/apache/beam/blob/master/sdks/java/io/jdbc/po
 </table>
 
-There is also a set of Maven properties which are useful when invoking PerfKit. These properties are configured in the I/O parent pom, and some are only available when the io-it-suite profile is active in Maven.
+There is also a set of Maven properties which are useful when invoking PerfKit Benchmarker. These properties are configured in the I/O parent pom, and some are only available when the io-it-suite profile is active in Maven.
 
 #### Small Scale and Large Scale Integration Tests {#small-scale-and-large-scale-integration-tests}
 
@@ -561,7 +560,7 @@ You can do this by:
 1. Creating two Kubernetes scripts: one for a small instance of the data store, and one for a large instance.
 1. Having your test take a pipeline option that decides whether to generate a small or large amount of test data (where small and large are sizes appropriate to your data store)
 
-An example of this is `HadoopInputFormatIO`'s tests.
+An example of this is [HadoopInputFormatIO](https://github.com/apache/beam/tree/master/sdks/java/io/hadoop/input-format)'s tests.
 
 <!--
 # Next steps
-- 
To stop receiving notification emails like this one, please contact
"commits@beam.apache.org" <commits@beam.apache.org>.
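Editor's note: the patched docs say each I/O module adds a pom profile that invokes Python+PerfKit Benchmarker during the verify phase with the parameters in the table above. A minimal sketch of such a profile, assuming exec-maven-plugin and a hypothetical `${pkbLocation}` property pointing at PerfKit Benchmarker's entry script (the real wiring lives in the JdbcIO pom the commit links to):

```xml
<!-- Hedged sketch only: plugin choice, ${pkbLocation}, and argument layout are
     assumptions; the parameter names/values come from the table in the diff. -->
<profile>
  <id>io-it-suite</id>
  <build>
    <plugins>
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>exec-maven-plugin</artifactId>
        <executions>
          <execution>
            <phase>verify</phase>
            <goals><goal>exec</goal></goals>
            <configuration>
              <executable>python</executable>
              <arguments>
                <argument>${pkbLocation}</argument>
                <argument>--benchmarks=beam_integration_benchmark</argument>
                <argument>--beam_location=${beamRootProjectDir}</argument>
                <argument>--beam_it_module=sdks/java/io/jdbc</argument>
                <argument>--beam_sdk=java</argument>
                <argument>--kubeconfig=${kubeconfig}</argument>
                <argument>--kubectl=${kubectl}</argument>
              </arguments>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</profile>
```

Binding the execution to the verify phase is what makes `mvn verify -Dio-it-suite` run the benchmark, as described earlier in the patched docs.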
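Editor's note: the manual (non-PerfKit-Benchmarker) path in the patched docs boils down to the three high level steps listed there. A sketch of that session; the `postgres.yml` filename, test class invocation, and pipeline option names are illustrative assumptions, not taken from the commit:

```shell
# 1. Set up the data store from its Kubernetes script (filename is hypothetical).
kubectl create -f postgres.yml

# 2. Run the test, passing it connection info from the just-created data store.
#    The system property and pipeline option names below are illustrative.
mvn verify -Dit.test=JdbcIOIT \
  -DbeamTestPipelineOptions='["--postgresServerName=<service-ip>"]'

# 3. Clean up the data store.
kubectl delete -f postgres.yml
```

Because the test infrastructure accepts connection info as pipeline options, the same commands work against any locally hosted instance of the data store, not just Kubernetes.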