Done!

On Mon, Jul 2, 2018 at 4:10 PM Scott Wegner <sc...@apache.org> wrote:

> Hey Rafael, looks like we need more 'INSTANCE_TEMPLATES' quota [1]. Can
> you take a look? I've filed [BEAM-4722]:
> https://issues.apache.org/jira/browse/BEAM-4722
>
> [1] https://github.com/apache/beam/pull/5861#issuecomment-401963630
>
> On Mon, Jul 2, 2018 at 11:33 AM Rafael Fernandez <rfern...@google.com>
> wrote:
>
>> OK, Scott just sent https://github.com/apache/beam/pull/5860 . Quotas
>> should not be a problem; if they are, please file a JIRA under gcp-quota.
>>
>> Cheers,
>> r
>>
>> On Mon, Jul 2, 2018 at 10:06 AM Kenneth Knowles <k...@google.com> wrote:
>>
>>> One thing that is nice when you do this is to be able to share your
>>> results. Though if all you are sharing is "they passed" then I guess we
>>> don't have to insist on evidence.
>>>
>>> Kenn
>>>
>>> On Mon, Jul 2, 2018 at 9:25 AM Scott Wegner <sc...@apache.org> wrote:
>>>
>>>> A few thoughts:
>>>>
>>>> * The Jenkins job getting backed up
>>>> is beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle_PR [1]. Since
>>>> Mikhail refactored Jenkins jobs, this only runs when explicitly requested
>>>> via "Run Dataflow ValidatesRunner", and only has 8 total runs. So this job
>>>> is idle more often than backlogged.
>>>>
>>>> * It's difficult to reason about our exact quota needs because Dataflow
>>>> jobs get launched from various Jenkins jobs that have different parallelism
>>>> configurations. If we have budget, we could enable concurrent execution of
>>>> this job and increase our quota enough to give some breathing room. If we
>>>> do this, I recommend limiting the max concurrency via
>>>> throttleConcurrentBuilds [2] to some reasonable limit.
>>>>
>>>> * This test suite is meant to be an exhaustive post-commit validation
>>>> of the Dataflow runner, and tests a lot of different aspects of a runner. It
>>>> would be more efficient to run locally only the tests affected by your
>>>> change. Note that this requires having access to a GCP project with
>>>> billing, but most Dataflow developers probably have access to this already.
>>>> The command for this is:
>>>>
>>>> ./gradlew :beam-runners-google-cloud-dataflow-java:validatesRunner \
>>>>   -PdataflowProject=myGcpProject -PdataflowTempRoot=gs://myGcsTempRoot \
>>>>   --tests "org.apache.beam.MyTestClass"
>>>>
>>>> [1]
>>>> https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle_PR/buildTimeTrend
>>>> [2]
>>>> https://jenkinsci.github.io/job-dsl-plugin/#method/javaposse.jobdsl.dsl.jobs.FreeStyleJob.throttleConcurrentBuilds
>>>>
>>>>
>>>> On Mon, Jul 2, 2018 at 8:33 AM Lukasz Cwik <lc...@google.com> wrote:
>>>>
>>>>> The validates runner test parallelism is controlled here and is
>>>>> currently set to be "unlimited":
>>>>>
>>>>> https://github.com/apache/beam/blob/fbfe6ceaea9d99cb1c8964087aafaa2bc2297a03/runners/google-cloud-dataflow-java/build.gradle#L115
>>>>>
>>>>> Each test fork is run on a different Gradle worker, so the number of
>>>>> parallel test runs is limited to the max number of workers configured,
>>>>> which is controlled here:
>>>>>
>>>>> https://github.com/apache/beam/blob/fbfe6ceaea9d99cb1c8964087aafaa2bc2297a03/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Dataflow.groovy#L50
>>>>> It is currently configured to 3 * number of CPU cores.
>>>>>
>>>>> We are already running up to 48 Dataflow jobs in parallel.
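
As a rough check of the 48 figure above, here is a minimal sketch. It assumes the 16-CPU Jenkins machines mentioned further down the thread and the "3 * number of CPU cores" worker setting; the variable names are illustrative and are not taken from the Beam build files.

```python
# Assumption: each Jenkins machine has 16 CPU cores (figure quoted later in this thread).
cpu_cores_per_machine = 16

# The Gradle max-workers setting is described as 3 * number of CPU cores.
workers_per_core = 3
max_gradle_workers = workers_per_core * cpu_cores_per_machine

# Test fork parallelism is "unlimited" and each fork runs on its own Gradle
# worker, so the worker count is the effective cap on parallel Dataflow jobs.
max_parallel_dataflow_jobs = max_gradle_workers

print(max_parallel_dataflow_jobs)  # 48
```

Under those assumptions, a single run of the suite can already launch up to 48 Dataflow jobs at once, which is why enabling concurrent runs of the Jenkins job multiplies the quota needed.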
>>>>>
>>>>>
>>>>> On Sat, Jun 30, 2018 at 9:51 AM Rafael Fernandez <rfern...@google.com>
>>>>> wrote:
>>>>>
>>>>>> - How many resources do ValidatesRunner tests use?
>>>>>> - Where are those settings?
>>>>>>
>>>>>> On Sat, Jun 30, 2018 at 9:50 AM Reuven Lax <re...@google.com> wrote:
>>>>>>
>>>>>>> The specific issue only affects Dataflow ValidatesRunner tests. We
>>>>>>> currently allow only one of these to run at a time, to control usage of
>>>>>>> Dataflow and of GCE quota. Other types of tests do not suffer from this
>>>>>>> issue.
>>>>>>>
>>>>>>> I would like to see if it's possible to increase Dataflow quota so
>>>>>>> we can run more of these in parallel. It took me 8 hours end to end to
>>>>>>> run these tests (about 6 hours for the run to be scheduled). If there
>>>>>>> was a failure, I would have had to repeat the whole process. In the
>>>>>>> worst case, this process could have taken me days. While this is not as
>>>>>>> pressing as some other issues (as most people don't need to run the
>>>>>>> Dataflow tests on every PR), fixing it would make such changes much
>>>>>>> easier to manage.
>>>>>>>
>>>>>>> Reuven
>>>>>>>
>>>>>>> On Sat, Jun 30, 2018 at 9:32 AM Rafael Fernandez <
>>>>>>> rfern...@google.com> wrote:
>>>>>>>
>>>>>>>> +Reuven Lax <re...@google.com> told me yesterday that he was
>>>>>>>> waiting for some test to be scheduled and run, and it took 6 hours or
>>>>>>>> so. I would like to help reduce these wait times by increasing
>>>>>>>> parallelism. I need help understanding the continuous minimum of what
>>>>>>>> we use. It seems the following is true:
>>>>>>>>
>>>>>>>>
>>>>>>>>    - There seem to always be 16 Jenkins machines on (16 CPUs each)
>>>>>>>>    - There seem to be three GKE machines always on (1 CPU each)
>>>>>>>>    - Most (if not all) unit tests run on 1 machine, and seem to
>>>>>>>>    run one-at-a-time <-- I think we can safely parallelize this to 20.
>>>>>>>>
>>>>>>>> With current quotas, if we parallelize to 20 concurrent unit tests,
>>>>>>>> we still have room for 80 other concurrent Dataflow jobs to execute,
>>>>>>>> with 75% of CPU capacity.
>>>>>>>>
>>>>>>>> Thoughts? Additional data?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> r
>>>>>>>>
>>>>>>>
