Hi all,

I'm moving the discussion to the dev list from
https://github.com/apache/beam/pull/8919. I think Perfkit Benchmarker
should be removed from all our tests.

Problems that we face currently:

   1. Changes to Gradle tasks/build configuration in the Beam codebase have
   to be reflected in Perfkit code. This requires PRs to Perfkit, which can
   take a long time to merge, and the tests sometimes break because of this
   (no change in Perfkit yet + change already there in Beam =
   incompatibility). This is what happened in PR 8919 (above),
   2. It can't run on Python 3 (it depends on Python 2-only libraries such
   as functools32),
   3. It is black-box testing, which makes it hard to collect
   pipeline-related metrics,
   4. Measurement of run time is inaccurate,
   5. It offers relatively little flexibility compared with, e.g., Jenkins
   tasks in terms of setting up the testing infrastructure (runners,
   databases). For example, if we wanted to set up a Flink runner and reuse
   it in consecutive tests in one go, that would be impossible. We can
   easily do this in Jenkins.

Tests that use Perfkit:

   1.  IO integration tests,
   2.  Python performance tests,
   3.  beam_PerformanceTests_Dataflow (disabled),
   4.  beam_PerformanceTests_Spark (failing constantly - looks
   unmaintained).

From the IOIT perspective (1), only the code that sets up/tears down
Kubernetes resources is useful right now, and those parts can easily be
implemented in Jenkins/Gradle code. That would make Perfkit obsolete in
IOIT, because we already collect metrics using the Metrics API and store
them in BigQuery directly.
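
To illustrate that direct path, here is a minimal, self-contained Java
sketch (not the actual IOIT code): a DoFn records a counter through the
Beam Metrics API, the value is queried back from the PipelineResult after
the run, and a hypothetical publishResultToBigQuery() helper stands in for
the direct BigQuery write we already do.

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.metrics.Counter;
import org.apache.beam.sdk.metrics.MetricNameFilter;
import org.apache.beam.sdk.metrics.MetricQueryResults;
import org.apache.beam.sdk.metrics.MetricResult;
import org.apache.beam.sdk.metrics.Metrics;
import org.apache.beam.sdk.metrics.MetricsFilter;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

public class MetricsCollectionSketch {

  private static final String NAMESPACE = "ioit";

  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    pipeline
        .apply(Create.of("a", "b", "c"))
        .apply(ParDo.of(new DoFn<String, String>() {
          // Counter reported by the runner - no black-box scraping of logs needed.
          private final Counter elements = Metrics.counter(NAMESPACE, "elements_processed");

          @ProcessElement
          public void processElement(ProcessContext c) {
            elements.inc();
            c.output(c.element());
          }
        }));

    PipelineResult result = pipeline.run();
    result.waitUntilFinish();

    // Query the counter back from the pipeline result and hand it to the publisher.
    MetricQueryResults metrics = result.metrics().queryMetrics(
        MetricsFilter.builder()
            .addNameFilter(MetricNameFilter.named(NAMESPACE, "elements_processed"))
            .build());
    for (MetricResult<Long> counter : metrics.getCounters()) {
      publishResultToBigQuery("elements_processed", counter.getAttempted());
    }
  }

  // Hypothetical placeholder for the direct BigQuery insert mentioned above.
  private static void publishResultToBigQuery(String name, long value) {
    System.out.println(name + " = " + value);
  }
}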

As for point 2: I don't know how complex that task would be (help needed).

Regarding 3 and 4: those tests seem unmaintained - should we remove them?

Opinions?

Thank you,
Łukasz
