Re: Stop using Perfkit Benchmarker tool in all tests?

2019-07-18 Thread Łukasz Gajowy
Thank you all for your comments. I started working on this. Here's the
issue I created: https://issues.apache.org/jira/browse/BEAM-7772

> For #2 (Python performance tests), there is no special setup for them. The
> only missing part I can see is metrics collection and data upload to a
> shared storage (e.g. BigQuery), which is provided for free in the Perfkit
> framework. This seems common to all languages, so I'm wondering if a
> shared infra is possible.

Actually, metrics upload is something that should definitely be common to
all tests in all SDKs. We're currently investigating the possibility of
migrating the existing IOIT and load test dashboards to the existing Grafana
instance (community metrics). Community metrics use a Postgres database for
storage - if we decide to stick with it, we could expose a common interface
accepting REST (POST) requests that could be used by all SDKs.
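
To make that concrete, here is a minimal sketch of what a cross-SDK upload
client could look like. The endpoint URL and payload schema are made up -
the actual interface would have to be agreed on first:

import json
import time
import urllib.request

# Hypothetical common endpoint backed by the community metrics database.
METRICS_ENDPOINT = "http://metrics.beam.example/api/v1/results"

def upload_metrics(job_name, metrics, endpoint=METRICS_ENDPOINT):
    """POST one batch of benchmark results as JSON."""
    payload = {
        "job": job_name,
        "timestamp": time.time(),
        # e.g. {"run_time_ms": 123456, "bytes_written": 1 << 30}
        "metrics": metrics,
    }
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status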

Alternatively (and this sounds more tempting), maybe we could use
Prometheus [1] to store all metrics? It:
 - has Grafana support [2],
 - has a Pushgateway that client libraries in multiple languages (Java,
Python and Go included) can push to [3],
 - seems to be the de facto industry standard for storing and exposing
metrics.

[1] https://prometheus.io/
[2] https://prometheus.io/docs/visualization/grafana/
[3] https://prometheus.io/docs/instrumenting/pushing/
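
For illustration, pushing a single benchmark result from a Python test to a
Pushgateway takes only a few lines with the official prometheus_client
package (the gateway address, job and metric names below are made up):

from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

registry = CollectorRegistry()
run_time = Gauge(
    "beam_io_it_run_time_seconds",
    "Run time of a single IOIT execution",
    registry=registry,
)
run_time.set(123.4)

# The Pushgateway keeps the last value per job/grouping key; Prometheus
# then scrapes the gateway like any other target.
push_to_gateway("pushgateway.example:9091", job="beam_performance_tests",
                registry=registry)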

wdyt?

Thanks!

On Mon, Jul 8, 2019 at 8:11 PM Udi Meiri  wrote:

> The Python 3 incompatibility is reason enough to move off of Perfkit. (+1)
>
> On Mon, Jul 8, 2019 at 9:49 AM Mark Liu  wrote:
>
>> Thanks for summarizing this discussion and posting it to the dev list. I
>> was closely working on Python performance tests, and those Perfkit
>> problems are really painful. So +1 to removing Perfkit and also removing
>> those tests that are no longer maintained.
>>
>> For #2 (Python performance tests), there is no special setup for them.
>> The only missing part I can see is metrics collection and data upload to
>> a shared storage (e.g. BigQuery), which is provided for free in the
>> Perfkit framework. This seems common to all languages, so I'm wondering
>> if a shared infra is possible.
>>
>> Mark
>>
>> On Wed, Jul 3, 2019 at 9:36 AM Lukasz Cwik  wrote:
>>
>>> Makes sense to me to move forward with your suggestion.
>>>
>>> On Wed, Jul 3, 2019 at 3:57 AM Łukasz Gajowy 
>>> wrote:
>>>
 Are there features in Perfkit that we would like to be using that we
> aren't?
>

 Besides the Kubernetes-related code I mentioned above (which, I believe,
 can be easily replaced), I don't see any added value in keeping Perfkit.
 The Kubernetes parts could be replaced with a set of fine-grained Gradle
 tasks invoked by other high-level tasks and Jenkins job steps. There also
 seem to be some Gradle + Kubernetes plugins out there that might prove
 useful here (I haven't done solid research in that area).


> Can we make the integration with Perfkit less brittle?
>

 There was an idea to move all Beam benchmark code out of Perfkit
 (beam_benchmark_helper.py, beam_integration_benchmark.py) into the Beam
 repository and inject it into Perfkit every time we use it. However, that
 would require investing time and effort, and it would still not solve the
 problems I listed above. It would also still require Beam developers to
 know how Perfkit works, while we can avoid that and use the existing
 tools (Gradle, Jenkins).

 Thanks!

 On Fri, Jun 28, 2019 at 5:31 PM Lukasz Cwik  wrote:

> +1 for removing tests that are not maintained.
>
> Are there features in Perfkit that we would like to be using that we
> aren't?
> Can we make the integration with Perfkit less brittle?
>
> If we aren't getting much and don't plan to get much value in the
> short term, removal makes sense to me.
>
> On Thu, Jun 27, 2019 at 3:16 AM Łukasz Gajowy 
> wrote:
>
>> Hi all,
>>
>> moving the discussion to the dev list:
>> https://github.com/apache/beam/pull/8919. I think that Perfkit
>> Benchmarker should be removed from all our tests.
>>
>> Problems that we face currently:
>>
>>    1. Changes to Gradle tasks/build configuration in the Beam
>>    codebase have to be reflected in Perfkit code. This requires PRs to
>>    Perfkit, which can take a while, and the tests sometimes break as a
>>    result (no change in Perfkit + change already merged in Beam =
>>    incompatibility). This is what happened in PR 8919 (above),
>>    2. Can't run on Python 3 (depends on Python 2-only libraries like
>>    functools32),
>>    3. Black-box testing, which makes it hard to collect pipeline-related
>>    metrics,

Re: Stop using Perfkit Benchmarker tool in all tests?

2019-07-08 Thread Udi Meiri
The Python 3 incompatibility is reason enough to move off of Perfkit. (+1)

On Mon, Jul 8, 2019 at 9:49 AM Mark Liu  wrote:

> Thanks for summarizing this discussion and posting it to the dev list. I
> was closely working on Python performance tests, and those Perfkit
> problems are really painful. So +1 to removing Perfkit and also removing
> those tests that are no longer maintained.
>
> For #2 (Python performance tests), there is no special setup for them.
> The only missing part I can see is metrics collection and data upload to
> a shared storage (e.g. BigQuery), which is provided for free in the
> Perfkit framework. This seems common to all languages, so I'm wondering
> if a shared infra is possible.
>
> Mark
>
> On Wed, Jul 3, 2019 at 9:36 AM Lukasz Cwik  wrote:
>
>> Makes sense to me to move forward with your suggestion.
>>
>> On Wed, Jul 3, 2019 at 3:57 AM Łukasz Gajowy 
>> wrote:
>>
>>> Are there features in Perfkit that we would like to be using that we
 aren't?

>>>
>>> Besides the Kubernetes-related code I mentioned above (which, I believe,
>>> can be easily replaced), I don't see any added value in keeping Perfkit.
>>> The Kubernetes parts could be replaced with a set of fine-grained Gradle
>>> tasks invoked by other high-level tasks and Jenkins job steps. There also
>>> seem to be some Gradle + Kubernetes plugins out there that might prove
>>> useful here (I haven't done solid research in that area).
>>>
>>>
 Can we make the integration with Perfkit less brittle?

>>>
>>> There was an idea to move all Beam benchmark code out of Perfkit
>>> (beam_benchmark_helper.py, beam_integration_benchmark.py) into the Beam
>>> repository and inject it into Perfkit every time we use it. However, that
>>> would require investing time and effort, and it would still not solve the
>>> problems I listed above. It would also still require Beam developers to
>>> know how Perfkit works, while we can avoid that and use the existing
>>> tools (Gradle, Jenkins).
>>>
>>> Thanks!
>>>
>>> On Fri, Jun 28, 2019 at 5:31 PM Lukasz Cwik  wrote:
>>>
 +1 for removing tests that are not maintained.

 Are there features in Perfkit that we would like to be using that we
 aren't?
 Can we make the integration with Perfkit less brittle?

 If we aren't getting much and don't plan to get much value in the short
 term, removal makes sense to me.

 On Thu, Jun 27, 2019 at 3:16 AM Łukasz Gajowy 
 wrote:

> Hi all,
>
> moving the discussion to the dev list:
> https://github.com/apache/beam/pull/8919. I think that Perfkit
> Benchmarker should be removed from all our tests.
>
> Problems that we face currently:
>
>    1. Changes to Gradle tasks/build configuration in the Beam
>    codebase have to be reflected in Perfkit code. This requires PRs to
>    Perfkit, which can take a while, and the tests sometimes break as a
>    result (no change in Perfkit + change already merged in Beam =
>    incompatibility). This is what happened in PR 8919 (above),
>    2. Can't run on Python 3 (depends on Python 2-only libraries like
>    functools32),
>    3. Black-box testing, which makes it hard to collect pipeline-related
>    metrics,
>    4. Measurement of run time is inaccurate,
>    5. It offers relatively little flexibility compared with, e.g., Jenkins
>    tasks when setting up the testing infrastructure (runners, databases).
>    For example, if we'd like to set up a Flink runner and reuse it in
>    subsequent tests in one go, that would be impossible. We can easily do
>    this in Jenkins.
>
> Tests that use Perfkit:
>
>    1. IO integration tests,
>    2. Python performance tests,
>    3. beam_PerformanceTests_Dataflow (disabled),
>    4. beam_PerformanceTests_Spark (failing constantly - looks
>    unmaintained).
>
> From the IOIT perspective (1), only the code that sets up/tears down
> Kubernetes resources is useful right now, but these parts can easily be
> implemented in Jenkins/Gradle code. That would make Perfkit obsolete in
> IOIT, because we already collect metrics using the Metrics API and store
> them in BigQuery directly.
>
> As for point 2: I have no knowledge of how complex the task would be
> (help needed).
>
> Regarding 3 and 4: those tests seem to be unmaintained - should we
> remove them?
>
> Opinions?
>
> Thank you,
> Łukasz
>
>
>
>
>




Re: Stop using Perfkit Benchmarker tool in all tests?

2019-07-03 Thread Lukasz Cwik
Makes sense to me to move forward with your suggestion.

On Wed, Jul 3, 2019 at 3:57 AM Łukasz Gajowy 
wrote:

> Are there features in Perfkit that we would like to be using that we
>> aren't?
>>
>
> Besides the Kubernetes-related code I mentioned above (which, I believe,
> can be easily replaced), I don't see any added value in keeping Perfkit.
> The Kubernetes parts could be replaced with a set of fine-grained Gradle
> tasks invoked by other high-level tasks and Jenkins job steps. There also
> seem to be some Gradle + Kubernetes plugins out there that might prove
> useful here (I haven't done solid research in that area).
>
>
>> Can we make the integration with Perfkit less brittle?
>>
>
> There was an idea to move all Beam benchmark code out of Perfkit
> (beam_benchmark_helper.py, beam_integration_benchmark.py) into the Beam
> repository and inject it into Perfkit every time we use it. However, that
> would require investing time and effort, and it would still not solve the
> problems I listed above. It would also still require Beam developers to
> know how Perfkit works, while we can avoid that and use the existing
> tools (Gradle, Jenkins).
>
> Thanks!
>
> On Fri, Jun 28, 2019 at 5:31 PM Lukasz Cwik  wrote:
>
>> +1 for removing tests that are not maintained.
>>
>> Are there features in Perfkit that we would like to be using that we
>> aren't?
>> Can we make the integration with Perfkit less brittle?
>>
>> If we aren't getting much and don't plan to get much value in the short
>> term, removal makes sense to me.
>>
>> On Thu, Jun 27, 2019 at 3:16 AM Łukasz Gajowy  wrote:
>>
>>> Hi all,
>>>
>>> moving the discussion to the dev list:
>>> https://github.com/apache/beam/pull/8919. I think that Perfkit
>>> Benchmarker should be removed from all our tests.
>>>
>>> Problems that we face currently:
>>>
>>>    1. Changes to Gradle tasks/build configuration in the Beam codebase
>>>    have to be reflected in Perfkit code. This requires PRs to Perfkit,
>>>    which can take a while, and the tests sometimes break as a result (no
>>>    change in Perfkit + change already merged in Beam = incompatibility).
>>>    This is what happened in PR 8919 (above),
>>>    2. Can't run on Python 3 (depends on Python 2-only libraries like
>>>    functools32),
>>>    3. Black-box testing, which makes it hard to collect pipeline-related
>>>    metrics,
>>>    4. Measurement of run time is inaccurate,
>>>    5. It offers relatively little flexibility compared with, e.g.,
>>>    Jenkins tasks when setting up the testing infrastructure (runners,
>>>    databases). For example, if we'd like to set up a Flink runner and
>>>    reuse it in subsequent tests in one go, that would be impossible. We
>>>    can easily do this in Jenkins.
>>>
>>> Tests that use Perfkit:
>>>
>>>    1. IO integration tests,
>>>    2. Python performance tests,
>>>    3. beam_PerformanceTests_Dataflow (disabled),
>>>    4. beam_PerformanceTests_Spark (failing constantly - looks
>>>    unmaintained).
>>>
>>> From the IOIT perspective (1), only the code that sets up/tears down
>>> Kubernetes resources is useful right now, but these parts can easily be
>>> implemented in Jenkins/Gradle code. That would make Perfkit obsolete in
>>> IOIT, because we already collect metrics using the Metrics API and store
>>> them in BigQuery directly.
>>>
>>> As for point 2: I have no knowledge of how complex the task would be
>>> (help needed).
>>>
>>> Regarding 3 and 4: those tests seem to be unmaintained - should we
>>> remove them?
>>>
>>> Opinions?
>>>
>>> Thank you,
>>> Łukasz
>>>
>>>
>>>
>>>
>>>


Re: Stop using Perfkit Benchmarker tool in all tests?

2019-07-03 Thread Łukasz Gajowy
>
> Are there features in Perfkit that we would like to be using that we
> aren't?
>

Besides the Kubernetes-related code I mentioned above (which, I believe, can
be easily replaced), I don't see any added value in keeping Perfkit. The
Kubernetes parts could be replaced with a set of fine-grained Gradle tasks
invoked by other high-level tasks and Jenkins job steps. There also seem to
be some Gradle + Kubernetes plugins out there that might prove useful here
(I haven't done solid research in that area).
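
To give an idea of how small these fine-grained tasks could be, here is a
sketch (in Python, for illustration only) of what the setup/teardown steps
would shell out to - whether wrapped in Gradle Exec tasks or called from a
Jenkins step directly. The manifest path and namespace are just examples:

import subprocess

def kubectl(*args, namespace="beam-performancetests"):
    # Thin wrapper over the kubectl CLI; fails loudly on non-zero exit.
    subprocess.run(["kubectl", "--namespace", namespace, *args], check=True)

def setup_infrastructure(manifest=".test-infra/kubernetes/postgres/postgres.yml"):
    # Create the data store (or other resource) the test needs.
    kubectl("apply", "-f", manifest)

def teardown_infrastructure(manifest=".test-infra/kubernetes/postgres/postgres.yml"):
    # Delete it again once the test finishes, pass or fail.
    kubectl("delete", "-f", manifest)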


> Can we make the integration with Perfkit less brittle?
>

There was an idea to move all Beam benchmark code out of Perfkit
(beam_benchmark_helper.py, beam_integration_benchmark.py) into the Beam
repository and inject it into Perfkit every time we use it. However, that
would require investing time and effort, and it would still not solve the
problems I listed above. It would also still require Beam developers to know
how Perfkit works, while we can avoid that and use the existing tools
(Gradle, Jenkins).

Thanks!

On Fri, Jun 28, 2019 at 5:31 PM Lukasz Cwik  wrote:

> +1 for removing tests that are not maintained.
>
> Are there features in Perfkit that we would like to be using that we
> aren't?
> Can we make the integration with Perfkit less brittle?
>
> If we aren't getting much and don't plan to get much value in the short
> term, removal makes sense to me.
>
> On Thu, Jun 27, 2019 at 3:16 AM Łukasz Gajowy  wrote:
>
>> Hi all,
>>
>> moving the discussion to the dev list:
>> https://github.com/apache/beam/pull/8919. I think that Perfkit
>> Benchmarker should be removed from all our tests.
>>
>> Problems that we face currently:
>>
>>    1. Changes to Gradle tasks/build configuration in the Beam codebase
>>    have to be reflected in Perfkit code. This requires PRs to Perfkit,
>>    which can take a while, and the tests sometimes break as a result (no
>>    change in Perfkit + change already merged in Beam = incompatibility).
>>    This is what happened in PR 8919 (above),
>>    2. Can't run on Python 3 (depends on Python 2-only libraries like
>>    functools32),
>>    3. Black-box testing, which makes it hard to collect pipeline-related
>>    metrics,
>>    4. Measurement of run time is inaccurate,
>>    5. It offers relatively little flexibility compared with, e.g., Jenkins
>>    tasks when setting up the testing infrastructure (runners, databases).
>>    For example, if we'd like to set up a Flink runner and reuse it in
>>    subsequent tests in one go, that would be impossible. We can easily do
>>    this in Jenkins.
>>
>> Tests that use Perfkit:
>>
>>    1. IO integration tests,
>>    2. Python performance tests,
>>    3. beam_PerformanceTests_Dataflow (disabled),
>>    4. beam_PerformanceTests_Spark (failing constantly - looks
>>    unmaintained).
>>
>> From the IOIT perspective (1), only the code that sets up/tears down
>> Kubernetes resources is useful right now, but these parts can easily be
>> implemented in Jenkins/Gradle code. That would make Perfkit obsolete in
>> IOIT, because we already collect metrics using the Metrics API and store
>> them in BigQuery directly.
>>
>> As for point 2: I have no knowledge of how complex the task would be
>> (help needed).
>>
>> Regarding 3 and 4: those tests seem to be unmaintained - should we remove
>> them?
>>
>> Opinions?
>>
>> Thank you,
>> Łukasz
>>
>>
>>
>>
>>


Re: Stop using Perfkit Benchmarker tool in all tests?

2019-06-28 Thread Lukasz Cwik
+1 for removing tests that are not maintained.

Are there features in Perfkit that we would like to be using that we aren't?
Can we make the integration with Perfkit less brittle?

If we aren't getting much and don't plan to get much value in the short
term, removal makes sense to me.

On Thu, Jun 27, 2019 at 3:16 AM Łukasz Gajowy  wrote:

> Hi all,
>
> moving the discussion to the dev list:
> https://github.com/apache/beam/pull/8919. I think that Perfkit
> Benchmarker should be removed from all our tests.
>
> Problems that we face currently:
>
>    1. Changes to Gradle tasks/build configuration in the Beam codebase
>    have to be reflected in Perfkit code. This requires PRs to Perfkit,
>    which can take a while, and the tests sometimes break as a result (no
>    change in Perfkit + change already merged in Beam = incompatibility).
>    This is what happened in PR 8919 (above),
>    2. Can't run on Python 3 (depends on Python 2-only libraries like
>    functools32),
>    3. Black-box testing, which makes it hard to collect pipeline-related
>    metrics,
>    4. Measurement of run time is inaccurate,
>    5. It offers relatively little flexibility compared with, e.g., Jenkins
>    tasks when setting up the testing infrastructure (runners, databases).
>    For example, if we'd like to set up a Flink runner and reuse it in
>    subsequent tests in one go, that would be impossible. We can easily do
>    this in Jenkins.
>
> Tests that use Perfkit:
>
>    1. IO integration tests,
>    2. Python performance tests,
>    3. beam_PerformanceTests_Dataflow (disabled),
>    4. beam_PerformanceTests_Spark (failing constantly - looks
>    unmaintained).
>
> From the IOIT perspective (1), only the code that sets up/tears down
> Kubernetes resources is useful right now, but these parts can easily be
> implemented in Jenkins/Gradle code. That would make Perfkit obsolete in
> IOIT, because we already collect metrics using the Metrics API and store
> them in BigQuery directly.
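
For reference, a minimal sketch of the Metrics API usage mentioned above: a
Python DoFn reporting a custom counter that the test harness can query after
the run and write to BigQuery (the namespace and counter name are arbitrary):

import apache_beam as beam
from apache_beam.metrics import Metrics

class CountingDoFn(beam.DoFn):
    def __init__(self):
        # A custom counter, aggregated by the runner across all workers.
        self.records = Metrics.counter("io_it", "records_processed")

    def process(self, element):
        self.records.inc()
        yield element

# After pipeline.run() completes, the harness reads the aggregated value via
# result.metrics().query(...) and can write it to BigQuery directly.
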
>
> As for point 2: I have no knowledge of how complex the task would be (help
> needed).
>
> Regarding 3 and 4: those tests seem to be unmaintained - should we remove
> them?
>
> Opinions?
>
> Thank you,
> Łukasz
>
>
>
>
>