Beam Infrastructure - Highlights of Recent Changes

Andrey Devyatkin via dev Thu, 13 Jun 2024 08:27:06 -0700

Hey Beam community,

We are glad to announce a set of changes that our team has been working on for 
a while, covering both new solutions and enhancements to existing ones. The 
main focus was to improve the Beam infrastructure by increasing test coverage, 
adding a reporting mechanism, and strengthening code analysis:


  *   Load and Stress Tests for streaming cases
We've laid a foundation for Stress Tests that can be reused to write new tests 
and improve existing ones. Stress Tests were introduced for the following IOs:
     *   BigQueryIO
     *   BigTableIO
     *   KafkaIO
     *   PubSubIO
     *   SpannerIO
We've also implemented a new Load Test for PubSubIO.
The intention behind the Stress Tests is to measure how a write operation 
behaves under load and use the results to define potential SLAs for IOs. We 
documented all the experiments conducted during the implementation; they 
helped us identify some bugs related to missing records. The document also 
lists the prerequisites and links to the PRs and write jobs.
For more information about the experiments: 
https://docs.google.com/document/d/1CVywXz7WwidIMYEp0iAMmQmmMfcExDvkGDK3dOq1bUs/edit?usp=sharing

  *   Training DuetAI for Dataflow
We continue to enrich the knowledge base of DuetAI so that it knows even more 
about Apache Beam, from answering basic documentation questions to generating 
code examples for I/O connectors and explaining what a user-provided piece of 
code does. The knowledge base contains 56 prompt/response pairs for 
documentation lookup, 11 code generation prompts and 11 code explanation 
prompts, covering various I/O connectors implemented in Java and Python.
See the knowledge base: 
https://github.com/apache/beam/tree/master/learning/prompts
  *   Beam Flaky Test Detection
We've developed a reporting mechanism that flags flaky tests when their runs 
fail repeatedly. Previously, there was no clear signal showing which tests 
were consistently flaky. Now, the tool monitors the current statistics and 
creates a GitHub issue with a link to Grafana attached. You may have noticed 
open issues named "The <job_name> is flaky" in the daily Beam High Priority 
Issue Report.
For more information on how the tool works: 
https://docs.google.com/document/d/13lwRAWoE7XA2ig0TDt98pI_nVBEQQ2UYqeUPJ0rGnME/edit?usp=sharing
  *   Beam Code Coverage Analysis
Python code coverage had some gaps, and there was no coverage analysis for 
Java. To address this, we fixed configuration issues in the JaCoCo plugin so 
that it generates the .xml files used to display statistics in the Codecov 
report, and adjusted the configuration for Python.
For more details: 
https://docs.google.com/document/d/1186dvd1t774EydPW0T31ynmwYxjqXUo9bh9nCO17rO0/edit?usp=sharing
  *   Beam Playground
We've added a nightly Playground CI check to make sure that Playground examples 
keep working as the SDKs change. This helps ensure that the examples stay up to 
date and that users can run them successfully.
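
As a rough illustration of the flaky test reporting described above, here is a 
minimal Python sketch. It is not the actual tool: the 30% failure-rate 
threshold, the function and class names, and the dashboard URL are assumptions 
made for this example. Only the issue title format is taken from the issues 
mentioned above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Assumption for this sketch: report a job once 30% of its recent runs failed.
FAILURE_RATE_THRESHOLD = 0.3


@dataclass
class FlakyReport:
    """Hypothetical container for the text of a GitHub issue."""
    title: str
    body: str


def failure_rate(outcomes: Sequence[bool]) -> float:
    """Return the fraction of failed runs; True means the run passed."""
    if not outcomes:
        return 0.0
    failed = sum(1 for passed in outcomes if not passed)
    return failed / len(outcomes)


def build_report(
    job_name: str,
    outcomes: Sequence[bool],
    dashboard_url: str,
    threshold: float = FAILURE_RATE_THRESHOLD,
) -> Optional[FlakyReport]:
    """Return issue text if the job looks flaky, otherwise None."""
    rate = failure_rate(outcomes)
    if rate < threshold:
        return None
    return FlakyReport(
        title=f"The {job_name} is flaky",
        body=(
            f"Failure rate over the last {len(outcomes)} runs: {rate:.0%}\n"
            f"Dashboard: {dashboard_url}"
        ),
    )


if __name__ == "__main__":
    # Hypothetical job name and placeholder dashboard URL.
    report = build_report(
        "PreCommit Java",
        [True, False, False, True],
        "https://example.invalid/grafana",
    )
    if report is not None:
        print(report.title)
        print(report.body)
```

A real implementation would read run statistics from the CI system and open 
the issue through the GitHub API; both steps are left out here to keep the 
sketch self-contained.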

Taking this opportunity, I would like to thank our team for these changes:

  *   Vlado Djerek <https://github.com/volatilemolotov>
  *   Vitaly Terentyev <https://github.com/Amar3tto>
  *   Akarys Shorabek <https://github.com/akashorabek>
  *   Oleg Borisevich <https://github.com/olehborysevych>
  *   Daria Bezkorovaina <https://github.com/dariabezkorovaina>
  *   Danny McCormick <https://github.com/damccorm>
  *   Yi Hu <https://github.com/Abacn>
  *   XQ Hu <https://github.com/liferoad>

Feel free to reach out to any of us if you have any questions.


Thanks,
Andrey
