Hey Beam community,
We are glad to announce several changes that our team has been working on to
implement new solutions and enhance existing ones. The main focus was to
improve the Beam infrastructure by increasing test coverage, adding a
reporting mechanism, and improving code analysis:
* Load and Stress Tests for streaming cases
We've laid a foundation for Stress Tests that can be used to write new tests
and improve existing ones. Stress Tests were introduced for the following IOs:
    * BigQueryIO
    * BigTableIO
    * KafkaIO
    * PubSubIO
    * SpannerIO
We've also implemented a new Load Test for PubSubIO.
The intention behind the Stress Tests is to measure how a write operation
behaves under load and to use the results to define potential SLAs for IOs. We
wrote a document describing all the experiments conducted during the
implementation; these experiments helped us identify some bugs related to
missing records. The document contains a set of prerequisites, links to the PRs
and write jobs.
For more information about the experiments:
https://docs.google.com/document/d/1CVywXz7WwidIMYEp0iAMmQmmMfcExDvkGDK3dOq1bUs/edit?usp=sharing
* Training DuetAI for Dataflow
We continue to enrich the DuetAI knowledge base so that it knows even more
about Apache Beam, from basic documentation questions to generating code
examples that show how to use I/O connectors and explaining what a piece of
code provided by the user does.
The knowledge base contains 56 prompt/response pairs for documentation lookup,
11 code generation prompts and 11 code explanation prompts, covering various
I/O connectors implemented in Java and Python.
See the knowledge base:
https://github.com/apache/beam/tree/master/learning/prompts
* Beam Flaky Test Detection
We've developed a reporting mechanism that reports flaky tests when runs fail
repeatedly. Previously, there was no clear signal showing which tests were
consistently flaky. Now, the tool monitors the current statistics and creates
a GitHub issue with a link to Grafana attached. You may have noticed the open
issues titled "The <job_name> is flaky" in the daily Beam High Priority Issue
Report.
For more information on how the tool works:
https://docs.google.com/document/d/13lwRAWoE7XA2ig0TDt98pI_nVBEQQ2UYqeUPJ0rGnME/edit?usp=sharing
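As a rough illustration of the idea only (this is not the actual tool, whose
logic is described in the document linked above), a check over recent job runs
could be sketched in Python as follows. The function names, the window of 10
runs, and the threshold of 3 failures are assumptions made for this example.

```python
from typing import List

# Hypothetical defaults for illustration; the real tool's criteria may differ.
WINDOW = 10        # number of most recent runs to inspect
MIN_FAILURES = 3   # failures within the window that trigger a report


def is_flaky(run_results: List[bool], window: int = WINDOW,
             min_failures: int = MIN_FAILURES) -> bool:
    """Return True if enough of the most recent runs failed.

    run_results is ordered oldest to newest; True means the run passed.
    """
    recent = run_results[-window:]
    failures = recent.count(False)
    return failures >= min_failures


def issue_title(job_name: str) -> str:
    """Build an issue title in the format quoted in this email."""
    return f"The {job_name} is flaky"


if __name__ == "__main__":
    # Made-up run history for a made-up job name.
    history = [True, True, False, True, False, True, True, False, True, True]
    if is_flaky(history):
        print(issue_title("example_job"))
```

A real implementation would read run statistics from the CI system and open
the issue through the GitHub API; both steps are left out of this sketch.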
* Beam Code Coverage Analysis
There were some gaps in Python code coverage and no coverage analysis for Java.
To address this, we fixed configuration issues in the JaCoCo plugin so that it
generates the .xml files used to display statistics in the Codecov report, and
we adjusted the configuration for Python.
For more details:
https://docs.google.com/document/d/1186dvd1t774EydPW0T31ynmwYxjqXUo9bh9nCO17rO0/edit?usp=sharing
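To give a sense of what such an .xml file carries, here is a minimal Python
sketch that reads the report-level LINE counter from a JaCoCo-style XML report
and computes line coverage. The sample report and its numbers are made up for
illustration, and the helper name is an assumption; this is not how Codecov
itself processes the files.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written sample in the shape of a JaCoCo XML report.
# The numbers are invented for illustration, not real Beam results.
SAMPLE_REPORT = """<report name="sample">
  <counter type="INSTRUCTION" missed="40" covered="160"/>
  <counter type="LINE" missed="10" covered="30"/>
</report>"""


def line_coverage(report_xml: str) -> float:
    """Return the report-level line coverage as a fraction in [0, 1]."""
    root = ET.fromstring(report_xml)
    # Report-level counters are direct children of <report>.
    for counter in root.findall("counter"):
        if counter.get("type") == "LINE":
            missed = int(counter.get("missed"))
            covered = int(counter.get("covered"))
            total = missed + covered
            return covered / total if total else 0.0
    raise ValueError("no LINE counter found in report")


if __name__ == "__main__":
    print(f"{line_coverage(SAMPLE_REPORT):.0%}")  # prints 75%
```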
* Beam Playground
We've added a Playground CI Nightly check to make sure that Playground examples
keep working across SDK changes. This will help ensure that the examples stay
up to date and that users can run them successfully.
I would like to take this opportunity to thank our team for these changes:
* Vlado Djerek (https://github.com/volatilemolotov)
* Vitaly Terentyev (https://github.com/Amar3tto)
* Akarys Shorabek (https://github.com/akashorabek)
* Oleg Borisevich (https://github.com/olehborysevych)
* Daria Bezkorovaina (https://github.com/dariabezkorovaina)
* Danny McCormick (https://github.com/damccorm)
* Yi Hu (https://github.com/Abacn)
* XQ Hu (https://github.com/liferoad)
Feel free to reach out to any of us if you have any questions.
Thanks,
Andrey