Thank you, Ahmet. My answers to your questions are below:

> - Could you comment on what kind of parallelization we will gain by this?
> In terms of real numbers, how would this affect build and test times?


The proposal is based on Gradle parallel execution
<https://guides.gradle.org/performance/#parallel_execution>: "you can force
Gradle to execute tasks in parallel as long as those tasks are in different
projects". In Beam, a project is declared per build.gradle file and
registered in settings.gradle
<https://github.com/apache/beam/blob/master/settings.gradle>. Tasks that
are included in a single Gradle execution will run in parallel only if they
are declared in separate build.gradle files.
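
To make that rule concrete, here is a minimal Python sketch (not part of
Beam's build). It assumes absolute Gradle task paths of the form
":project:path:taskName". The function names and the example task paths are
hypothetical, and the model deliberately ignores task dependencies and the
number of available workers.

```python
def project_of(task_path: str) -> str:
    """Return the project part of an absolute task path like ':a:b:taskName'."""
    project, _, _task = task_path.rpartition(":")
    return project or ":"  # a task like ':build' belongs to the root project


def may_run_in_parallel(task_a: str, task_b: str) -> bool:
    """Simplified rule: only tasks from different projects can overlap.

    Ignores task dependencies and worker limits.
    """
    return project_of(task_a) != project_of(task_b)


# Hypothetical task paths, for illustration only.
same_project = may_run_in_parallel(
    ":sdks:python:testPy2", ":sdks:python:testPy3")
different_projects = may_run_in_parallel(
    ":sdks:python:test-suites:dataflow:py27:integrationTest",
    ":sdks:python:test-suites:dataflow:py35:integrationTest")
print(same_project, different_projects)
```

Under this simplified model, two tasks declared in the same build.gradle
never overlap, while tasks from two different build.gradle files can.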

An example of applying parallel execution is the beam_PreCommit_Python
<https://builds.apache.org/job/beam_PreCommit_Python_Cron/> job, which runs
the :pythonPreCommit
<https://github.com/apache/beam/blob/master/build.gradle#L193> task. That
task contains tasks distributed across 4 build.gradle files. The execution
graph looks like https://scans.gradle.com/s/4frpmto6o7hto/timeline:
[image: image.png]
Without this proposal, all tasks would run sequentially, which can take ~2x
longer. If more py36 and py37 tests are added in the future, things will get
even worse.
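
As a rough illustration of where a figure like ~2x can come from, the
Python sketch below compares running every task one after another with
running each project's tasks in its own lane. The project names and
durations are made up for illustration and are not measurements from the
build scan. The model also assumes at least as many workers as projects
and no cross-project dependencies.

```python
from collections import defaultdict

# (project, minutes) pairs; all names and values are hypothetical.
tasks = [
    (":project-a", 30),
    (":project-b", 10),
    (":project-c", 10),
    (":project-d", 10),
]


def sequential_minutes(tasks):
    """Wall time when every task runs one after another."""
    return sum(minutes for _project, minutes in tasks)


def parallel_minutes(tasks):
    """Wall time when each project runs its own tasks in order
    and all projects run at the same time."""
    per_project = defaultdict(int)
    for project, minutes in tasks:
        per_project[project] += minutes
    return max(per_project.values(), default=0)


print(sequential_minutes(tasks), parallel_minutes(tasks))
```

In this model the parallel wall time is bounded by the slowest project, so
splitting a long-running build.gradle into several projects is what
shortens the run.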

> - I am guessing this will reduce complexity. Is it possible to quantify the
> improvement related to this?


The code complexity of individual functions, methods, or properties may not
change much here, since we are basically regrouping tasks without changing
their internal logic. I don't know of any tool that measures Gradle build
complexity, but I would love to try one if it exists.


> - Beyond the proposal, I am assuming you are willing to work on. Just want
> to clarify this. In either case, would you need help?


Yes, I'd love to take on the major refactoring work. At the same time, I'll
create a Jira issue for each kind of test (like the flink/portable/hdfs
tests) in sdks/python/build.gradle that should move into test-suites. Test
owners or anyone interested in this work are welcome to contribute!

Mark

On Wed, Mar 27, 2019 at 3:53 PM Ahmet Altay <al...@google.com> wrote:

> This sounds good to me. Thank you for doing this. Few questions:
> - Could you comment on what kind of parallelization we will gain by this?
> In terms of real numbers, how would this affect build and test times?
> - I am guessing this will reduce complexity. Is it possible to quantify
> the improvement related to this?
> - Beyond the proposal, I am assuming you are willing to work on. Just want
> to clarify this. In either case, would you need help?
>
> Thank you,
> Ahmet
>
> On Wed, Mar 27, 2019 at 10:19 AM Mark Liu <mark...@apache.org> wrote:
>
>> Hi Python SDK Developers,
>>
>> You may have noticed that the Gradle files changed a lot recently as
>> parallelization
>> <https://guides.gradle.org/performance/#parallel_execution> was applied to
>> Python tests and more Python versions were enabled in testing. There are
>> tricks in the build scripts, and tests have grown organically and are
>> scattered under sdks/python, which has caused friction (like the rollback
>> PR-8059 <https://github.com/apache/beam/pull/8059>).
>>
>> Thus, I created BEAM-6907
>> <https://issues.apache.org/jira/browse/BEAM-6907> and would like to
>> initiate some work to clean up and standardize the Gradle structure in the
>> Python SDK. In general, I think we want to:
>>
>> - Apply parallel execution
>> - Share common tasks
>> - Centralize test related tasks
>> - Have a clear Gradle structure for projects/tasks
>>
>> This is the Gradle directory structure I propose:
>>
>> sdks/python/
>>   build.gradle    --> hold builds, snapshot, analytic tasks
>>   test-suites/    --> all pre/post/VR test suites under here
>>     README.md
>>     dataflow/    --> grouped by runner or unit test (tox)
>>       py27/    --> grouped by py version
>>         build.gradle
>>       py35/
>>       ...
>>     direct/
>>       py27/
>>       ...
>>     flink/
>>     tox/
>>     ...
>>
>>
>> The ideas are:
>> - Only keep builds, snapshot and analytic jobs in sdks/python/build.gradle
>> - Move all test related tasks to sdks/python/test-suites/
>> - In sdks/python/test-suites, we first group by runner, unit test, or
>> other tests that don't fit those categories, and then group by py version
>> if needed.
>> - An example of ../test-suites/../py35/build.gradle is this
>> <https://github.com/apache/beam/blob/master/sdks/python/test-suites/dataflow/py3/build.gradle>.
>>
>> Please feel free to explore the existing Gradle scripts in the Python SDK
>> and share any thoughts you have on this proposal.
>>
>> Thanks!
>> Mark
>>
>
