Thanks for summarizing this discussion and posting it to the dev list. I have been working closely on the Python performance tests, and those Perfkit problems are really painful. So +1 to removing Perfkit and also removing the tests that are no longer maintained.
For #2 (Python performance tests), there is no special setup for them. The only missing part I can see is metrics collection and data upload to shared storage (e.g. BigQuery), which the Perfkit framework provides for free. This seems common to all languages, so I'm wondering whether a shared infra is possible.

Mark

On Wed, Jul 3, 2019 at 9:36 AM Lukasz Cwik <[email protected]> wrote:

> Makes sense to me to move forward with your suggestion.
>
> On Wed, Jul 3, 2019 at 3:57 AM Łukasz Gajowy <[email protected]> wrote:
>
>>> Are there features in Perfkit that we would like to be using that we
>>> aren't?
>>
>> Besides the Kubernetes-related code I mentioned above (which, I believe,
>> can be easily replaced), I don't see any added value in having Perfkit.
>> The Kubernetes parts could be replaced with a set of fine-grained Gradle
>> tasks invoked by other high-level tasks and Jenkins job steps. There also
>> seem to be some Gradle + Kubernetes plugins out there that might prove
>> useful here (no solid research in that area).
>>
>>> Can we make the integration with Perfkit less brittle?
>>
>> There was an idea to move all of the Beam benchmark code from Perfkit
>> (beam_benchmark_helper.py
>> <https://github.com/GoogleCloudPlatform/PerfKitBenchmarker/blob/5680e174ad1799056b4b6d4a6600ef9f93fe39ad/perfkitbenchmarker/beam_benchmark_helper.py>,
>> beam_integration_benchmark.py
>> <https://github.com/GoogleCloudPlatform/PerfKitBenchmarker/blob/7cdcea2561c66baa838e3ce4d776236a248e6700/perfkitbenchmarker/linux_benchmarks/beam_integration_benchmark.py>)
>> to the Beam repository and inject it into Perfkit every time we use it.
>> However, that would require investing time and effort, and it would
>> still not solve the problems I listed above. It would also still require
>> Beam developers to know how Perfkit works, while we can avoid that and
>> use the existing tools (Gradle, Jenkins).
>>
>> Thanks!
>>
>> On Fri, Jun 28, 2019 at 5:31 PM Lukasz Cwik <[email protected]> wrote:
>>
>>> +1 for removing tests that are not maintained.
>>>
>>> Are there features in Perfkit that we would like to be using that we
>>> aren't?
>>> Can we make the integration with Perfkit less brittle?
>>>
>>> If we aren't getting much and don't plan to get much value in the short
>>> term, removal makes sense to me.
>>>
>>> On Thu, Jun 27, 2019 at 3:16 AM Łukasz Gajowy <[email protected]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> moving the discussion to the dev list:
>>>> https://github.com/apache/beam/pull/8919. I think that Perfkit
>>>> Benchmarker should be removed from all our tests.
>>>>
>>>> Problems that we face currently:
>>>>
>>>> 1. Changes to Gradle tasks/build configuration in the Beam codebase
>>>>    have to be reflected in Perfkit code. This requires PRs to Perfkit,
>>>>    which can take a while, and the tests sometimes break because of
>>>>    this (no change in Perfkit + change already there in Beam =
>>>>    incompatibility). This is what happened in PR 8919 (above),
>>>> 2. Can't run in Python 3 (depends on Python 2-only libraries such as
>>>>    functools32),
>>>> 3. Black-box testing, which makes it hard to collect pipeline-related
>>>>    metrics,
>>>> 4. Measurement of run time is inaccurate,
>>>> 5. It offers relatively little flexibility compared with e.g. Jenkins
>>>>    tasks in terms of setting up the testing infrastructure (runners,
>>>>    databases). For example, if we'd like to set up a Flink runner and
>>>>    reuse it in subsequent tests in one go, that would be impossible.
>>>>    We can easily do this in Jenkins.
>>>>
>>>> Tests that use Perfkit:
>>>>
>>>> 1. IO integration tests,
>>>> 2. Python performance tests,
>>>> 3. beam_PerformanceTests_Dataflow (disabled),
>>>> 4. beam_PerformanceTests_Spark (failing constantly - looks
>>>>    unmaintained).
>>>>
>>>> From the IOIT perspective (1), only the code that sets up/tears down
>>>> Kubernetes resources is useful right now, but these parts can be easily
>>>> implemented in Jenkins/Gradle code. That would make Perfkit obsolete in
>>>> IOIT because we already collect metrics using the Metrics API and store
>>>> them in BigQuery directly.
>>>>
>>>> As for point 2: I have no knowledge of how complex the task would be
>>>> (help needed).
>>>>
>>>> Regarding 3, 4: Those tests seem to be unmaintained - should we
>>>> remove them?
>>>>
>>>> Opinions?
>>>>
>>>> Thank you,
>>>> Łukasz
