Thanks for summarizing this discussion and posting it on the dev list. I have
been working closely on the Python performance tests, and those Perfkit
problems are really painful. So +1 to removing Perfkit, and also to removing
the tests that are no longer maintained.

For #2 (Python performance tests), there is no special setup for them. The
only missing piece I can see is metrics collection and uploading the data to
shared storage (e.g. BigQuery), which the Perfkit framework provides for
free. This seems common to all languages, so I'm wondering if a shared
infrastructure is possible.
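
For illustration, the upload step could be as small as the sketch below
(this assumes the google-cloud-bigquery client and an already-existing
table; the project/dataset/table names and the row schema are made up):

    # Hypothetical sketch: push one benchmark result row to a shared
    # BigQuery table that all language SDK test suites could write to.
    import time
    from google.cloud import bigquery

    def upload_metric(project, dataset, table, metric_name, value):
        client = bigquery.Client(project=project)
        table_ref = client.dataset(dataset).table(table)
        row = {'timestamp': time.time(), 'metric': metric_name, 'value': value}
        errors = client.insert_rows_json(client.get_table(table_ref), [row])
        if errors:
            raise RuntimeError('BigQuery insert failed: %s' % errors)

If something like this lived in a small shared module, each SDK's
performance suite could call it the same way, which is pretty much the
shared infra I mean above.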

Mark

On Wed, Jul 3, 2019 at 9:36 AM Lukasz Cwik <lc...@google.com> wrote:

> Makes sense to me to move forward with your suggestion.
>
> On Wed, Jul 3, 2019 at 3:57 AM Łukasz Gajowy <lukasz.gaj...@gmail.com>
> wrote:
>
>> Are there features in Perfkit that we would like to be using that we
>>> aren't?
>>>
>>
>> Besides the Kubernetes-related code I mentioned above (which, I believe,
>> can be easily replaced), I don't see any added value in keeping Perfkit. The
>> Kubernetes parts could be replaced with a set of fine-grained Gradle tasks
>> invoked by other high-level tasks and Jenkins job steps. There also seem
>> to be some Gradle + Kubernetes plugins out there that might prove useful
>> here (I haven't done solid research in that area).
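>>
>> To give a rough idea, such a fine-grained task could shell out to a thin
>> helper along these lines (a hypothetical sketch, not existing Beam code;
>> the file names are made up):
>>
>>     # Hypothetical helper a Gradle Exec task or Jenkins step could invoke
>>     # to stand up / tear down a data store for an IO test.
>>     import subprocess
>>
>>     def kubectl(action, config_path, kubeconfig=None):
>>         """Runs 'kubectl <action> -f <config_path>'."""
>>         cmd = ['kubectl', action, '-f', config_path]
>>         if kubeconfig:
>>             cmd += ['--kubeconfig', kubeconfig]
>>         subprocess.check_call(cmd)
>>
>>     # e.g. kubectl('apply', 'postgres.yml') before the test and
>>     #      kubectl('delete', 'postgres.yml') after it.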
>>
>>
>>> Can we make the integration with Perfkit less brittle?
>>>
>>
>> There was an idea to move all of the Beam benchmark code out of Perfkit (
>> beam_benchmark_helper.py
>> <https://github.com/GoogleCloudPlatform/PerfKitBenchmarker/blob/5680e174ad1799056b4b6d4a6600ef9f93fe39ad/perfkitbenchmarker/beam_benchmark_helper.py>
>> , beam_integration_benchmark.py
>> <https://github.com/GoogleCloudPlatform/PerfKitBenchmarker/blob/7cdcea2561c66baa838e3ce4d776236a248e6700/perfkitbenchmarker/linux_benchmarks/beam_integration_benchmark.py>)
>> into the Beam repository and inject it into Perfkit every time we use it.
>> However, that would require investing time and effort, and it would still
>> not solve the problems I listed above. It would also still require Beam
>> developers to know how Perfkit works, while we can avoid that entirely and
>> use the existing tools (Gradle, Jenkins).
>>
>> Thanks!
>>
>> On Fri, Jun 28, 2019 at 5:31 PM Lukasz Cwik <lc...@google.com> wrote:
>>
>>> +1 for removing tests that are not maintained.
>>>
>>> Are there features in Perfkit that we would like to be using that we
>>> aren't?
>>> Can we make the integration with Perfkit less brittle?
>>>
>>> If we aren't getting much and don't plan to get much value in the short
>>> term, removal makes sense to me.
>>>
>>> On Thu, Jun 27, 2019 at 3:16 AM Łukasz Gajowy <lgaj...@apache.org>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> moving the discussion to the dev list:
>>>> https://github.com/apache/beam/pull/8919. I think that Perfkit
>>>> Benchmarker should be removed from all our tests.
>>>>
>>>> Problems that we face currently:
>>>>
>>>>    1. Changes to Gradle tasks/build configuration in the Beam codebase
>>>>    have to be reflected in Perfkit code. This requires PRs to Perfkit,
>>>>    which can take a long time to merge, and the tests sometimes break in
>>>>    the meantime (no change in Perfkit + change already merged in Beam =
>>>>    incompatibility). This is what happened in PR 8919 (above),
>>>>    2. It can't run on Python 3 (it depends on Python 2-only libraries
>>>>    such as functools32),
>>>>    3. It treats the pipeline as a black box, which makes it hard to
>>>>    collect pipeline-related metrics,
>>>>    4. Measurement of run time is inaccurate,
>>>>    5. It offers relatively little flexibility compared with, e.g.,
>>>>    Jenkins tasks when it comes to setting up the testing infrastructure
>>>>    (runners, databases). For example, if we'd like to set up a Flink
>>>>    runner and reuse it in subsequent tests in one go, that would be
>>>>    impossible with Perfkit. We can easily do this in Jenkins.
>>>>
>>>> Tests that use Perfkit:
>>>>
>>>>    1. IO integration tests,
>>>>    2. Python performance tests,
>>>>    3. beam_PerformanceTests_Dataflow (disabled),
>>>>    4. beam_PerformanceTests_Spark (failing constantly; looks
>>>>    unmaintained).
>>>>
>>>> From the IOIT perspective (1), only the code that sets up/tears down
>>>> Kubernetes resources is useful right now, and those parts can be easily
>>>> implemented in Jenkins/Gradle code. That would make Perfkit obsolete for
>>>> IOIT because we already collect metrics using the Metrics API and store
>>>> them in BigQuery directly.
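>>>>
>>>> For reference, the metric collection we already do looks roughly like
>>>> the sketch below (illustrative only; the namespace and counter names are
>>>> made up, and the query at the end assumes a finished PipelineResult):
>>>>
>>>>     # Sketch: counting processed elements with the Beam Metrics API.
>>>>     import apache_beam as beam
>>>>     from apache_beam.metrics import Metrics
>>>>     from apache_beam.metrics.metric import MetricsFilter
>>>>
>>>>     class CountingDoFn(beam.DoFn):
>>>>         def __init__(self):
>>>>             # Namespace and counter name are illustrative.
>>>>             self.records = Metrics.counter('ioit', 'records_processed')
>>>>
>>>>         def process(self, element):
>>>>             self.records.inc()
>>>>             yield element
>>>>
>>>>     # After result = pipeline.run(); result.wait_until_finish():
>>>>     # metrics = result.metrics().query(
>>>>     #     MetricsFilter().with_name('records_processed'))
>>>>
>>>> The queried values are what we write to BigQuery, so Perfkit is not
>>>> involved in that path at all.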
>>>>
>>>> As for point 2: I have no knowledge of how complex the task would be
>>>> (help needed).
>>>>
>>>> Regarding 3, 4: Those tests seem to be not maintained - should we
>>>> remove them?
>>>>
>>>> Opinions?
>>>>
>>>> Thank you,
>>>> Łukasz
>>>>
