I don't see a difference at first glance, and no difference is expected.

We never used concurrent jobs originally, because a job took ~1 hour and
was triggered once every 6 hours. At some point, I added triggering a job
when a new commit is available, and this started triggering jobs in
parallel for each commit. That is unnecessary overhead for post-commits.
With concurrent runs disabled for post-commits, a single job is triggered
for all the commits that accumulated during execution of the previous job.
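
To illustrate the effect of disabling concurrent runs, here is a minimal
sketch. It is not the Jenkins implementation; the function name and the
sample times are made up for illustration. It assumes a single job slot:
commits that arrive while a run is in progress are picked up together by
the next run.

```python
from typing import List, Tuple


def coalesced_runs(commit_times: List[float],
                   job_duration: float) -> List[Tuple[float, List[int]]]:
    """Simulate a job with concurrency disabled.

    Returns (start_time, commit_indexes) for each run, where the indexes
    refer to the commits in chronological order.
    """
    times = sorted(commit_times)
    runs: List[Tuple[float, List[int]]] = []
    free_at = 0.0  # time at which the single job slot becomes free
    i = 0
    while i < len(times):
        # The next run starts when a commit exists and the slot is free.
        start = max(times[i], free_at)
        batch = []
        # Every commit that has arrived by then is covered by this one run.
        while i < len(times) and times[i] <= start:
            batch.append(i)
            i += 1
        runs.append((start, batch))
        free_at = start + job_duration
    return runs


# Five commits (times in minutes) with a 60-minute job produce three runs
# instead of five parallel ones.
for start, commits in coalesced_runs([0, 10, 20, 30, 70], 60):
    print(start, commits)
```

With these sample numbers, the commits at minutes 10, 20 and 30 all land
in the second run, which starts when the first one finishes.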

I believe you are talking about triggering test cases concurrently within
a single Jenkins job. That was not changed.

--Mikhail

Have feedback <http://go/migryz-feedback>?


On Mon, Aug 6, 2018 at 2:44 PM Lukasz Cwik <lc...@google.com> wrote:

> How much slower did the post commits become after removing concurrency?
>
> On Thu, Aug 2, 2018 at 2:32 PM Mikhail Gryzykhin <mig...@google.com>
> wrote:
>
>> I've disabled concurrency for the auto-triggered post-commit jobs. That
>> should reduce job scheduling considerably.
>>
>> I believe that this change should resolve the quota issue we have seen
>> this time. I'll monitor whether the problem reappears.
>>
>> --Mikhail
>>
>> Have feedback <http://go/migryz-feedback>?
>>
>>
>> On Wed, Aug 1, 2018 at 9:40 AM Pablo Estrada <pabl...@google.com> wrote:
>>
>>> It feels to me like a peak of 60 jobs per minute is pretty high. If I
>>> understand correctly, we run up to 20 dataflow jobs in parallel per test
>>> suite? Or what's the number here?
>>>
>>> It is also true that most of our tests are simple NeedsRunner tests, that
>>> test a couple elements, so the whole pipeline overhead is on startup. This
>>> may be improved by lumping tests together (though might we lose
>>> debuggability?).  Our average number of jobs is, I hope, muuuch smaller
>>> than 60 per minute...
>>>
>>> With all these considerations, I would lean more towards having a retry
>>> policy as the immediate solution.
>>> -P.
>>>
>>> On Wed, Aug 1, 2018 at 9:07 AM Andrew Pilloud <apill...@google.com>
>>> wrote:
>>>
>>>> I like 1 and 2. How do credentials get into Jenkins? Could we create a
>>>> user per Jenkins host?
>>>>
>>>> On Tue, Jul 31, 2018 at 4:33 PM Reuven Lax <re...@google.com> wrote:
>>>>
>>>>> There was also a proposal to lump multiple tests into a single
>>>>> Dataflow job instead of spinning up a separate Dataflow job for each test.
>>>>>
>>>>> On Tue, Jul 31, 2018 at 4:26 PM Mikhail Gryzykhin <mig...@google.com>
>>>>> wrote:
>>>>>
>>>>>> I synced with Rafael. Below is a summary of the discussion.
>>>>>>
>>>>>> This quota is CreateRequestsPerMinutePerUser and it has 60 requests
>>>>>> per user by default.
>>>>>>
>>>>>> I've created Jira [BEAM-5053](
>>>>>> https://issues.apache.org/jira/browse/BEAM-5053) for this.
>>>>>>
>>>>>> I see the following options we can use:
>>>>>> 1. Add retry logic. However, this limits us to 1 Dataflow job start
>>>>>> per second for the whole Jenkins. In the long run, this can also block
>>>>>> one test job if other jobs take all the slots.
>>>>>> 2. Use different users to spin up Dataflow jobs.
>>>>>> 3. Find a way to raise the quota limit on Dataflow. By default, the
>>>>>> field limits the value to 60 requests per minute.
>>>>>> 4. Long-term generic suggestion: limit the number of Dataflow jobs we
>>>>>> spin up and move tests to the form of unit or component tests.
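
As a rough illustration of option 1, here is a minimal sketch of retry
with exponential backoff and jitter. It is not Beam or Dataflow code:
QuotaExceededError, submit_with_retry and the submit callable are
hypothetical stand-ins for whatever call creates a Dataflow job and
whatever error it reports when CreateRequestsPerMinutePerUser is
exhausted.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class QuotaExceededError(Exception):
    """Hypothetical error for a job-creation request rejected by quota."""


def submit_with_retry(submit: Callable[[], T],
                      max_attempts: int = 5,
                      base_delay: float = 1.0,
                      max_delay: float = 60.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call submit(), retrying with backoff while the quota is exhausted."""
    for attempt in range(max_attempts):
        try:
            return submit()
        except QuotaExceededError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the quota error
            # Double the wait on each retry, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter keeps parallel test forks from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
    raise ValueError("max_attempts must be at least 1")
```

The sleep parameter is only there so the waits can be replaced in a
test. Retrying does not raise the ceiling: with a quota of 60 requests
per minute, all jobs together still start at most about one Dataflow job
per second on average.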
>>>>>>
>>>>>> Please share any insights or ideas you have on this.
>>>>>>
>>>>>> Regards,
>>>>>> --Mikhail
>>>>>>
>>>>>> Have feedback <http://go/migryz-feedback>?
>>>>>>
>>>>>>
>>>>>> On Tue, Jul 31, 2018 at 3:55 PM Mikhail Gryzykhin <mig...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Everyone,
>>>>>>>
>>>>>>> It seems that we hit the quota issue again:
>>>>>>> https://builds.apache.org/job/beam_PostCommit_Go_GradleBuild/553/consoleFull
>>>>>>>
>>>>>>> Can someone share information on how this was triaged last time, or
>>>>>>> guide me on possible follow-up actions?
>>>>>>>
>>>>>>> Regards,
>>>>>>> --Mikhail
>>>>>>>
>>>>>>> Have feedback <http://go/migryz-feedback>?
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jul 3, 2018 at 9:12 PM Rafael Fernandez <rfern...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Summary for all folks following this story -- and many thanks for
>>>>>>>> explaining configs to me and pointing me to files and such.
>>>>>>>>
>>>>>>>> - Scott made changes to the config and we can now run 3
>>>>>>>> ValidatesRunner.Dataflow in parallel (each run is about 2 hours)
>>>>>>>> - With the latest quota changes, we peaked at ~70% capacity in
>>>>>>>> concurrent Dataflow jobs when running those
>>>>>>>> - I've been keeping an eye on quota peaks for all resources today
>>>>>>>> and have not seen any worrisome limits overall.
>>>>>>>> - Also note there are improvements planned to the
>>>>>>>> ValidatesRunner.Dataflow test so various items get batched and the test
>>>>>>>> itself runs faster -- I believe it's on Alan's radar
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> r
>>>>>>>>
>>>>>>>> On Mon, Jul 2, 2018 at 4:23 PM Rafael Fernandez <
>>>>>>>> rfern...@google.com> wrote:
>>>>>>>>
>>>>>>>>> Done!
>>>>>>>>>
>>>>>>>>> On Mon, Jul 2, 2018 at 4:10 PM Scott Wegner <sc...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hey Rafael, looks like we need more 'INSTANCE_TEMPLATES' quota
>>>>>>>>>> [1]. Can you take a look? I've filed [BEAM-4722]:
>>>>>>>>>> https://issues.apache.org/jira/browse/BEAM-4722
>>>>>>>>>>
>>>>>>>>>> [1]
>>>>>>>>>> https://github.com/apache/beam/pull/5861#issuecomment-401963630
>>>>>>>>>>
>>>>>>>>>> On Mon, Jul 2, 2018 at 11:33 AM Rafael Fernandez <
>>>>>>>>>> rfern...@google.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> OK, Scott just sent https://github.com/apache/beam/pull/5860 .
>>>>>>>>>>> Quotas should not be a problem; if they are, please file a JIRA
>>>>>>>>>>> under gcp-quota.
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> r
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jul 2, 2018 at 10:06 AM Kenneth Knowles <k...@google.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> One thing that is nice when you do this is to be able to share
>>>>>>>>>>>> your results. Though if all you are sharing is "they passed" then
>>>>>>>>>>>> I guess we don't have to insist on evidence.
>>>>>>>>>>>>
>>>>>>>>>>>> Kenn
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jul 2, 2018 at 9:25 AM Scott Wegner <sc...@apache.org>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> A few thoughts:
>>>>>>>>>>>>>
>>>>>>>>>>>>> * The Jenkins job getting backed up is
>>>>>>>>>>>>> beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle_PR [1]. Since
>>>>>>>>>>>>> Mikhail refactored Jenkins jobs, this only runs when explicitly
>>>>>>>>>>>>> requested via "Run Dataflow ValidatesRunner", and only has 8 total
>>>>>>>>>>>>> runs. So this job is idle more often than backlogged.
>>>>>>>>>>>>>
>>>>>>>>>>>>> * It's difficult to reason about our exact quota needs because
>>>>>>>>>>>>> Dataflow jobs get launched from various Jenkins jobs that have
>>>>>>>>>>>>> different parallelism configurations. If we have budget, we could
>>>>>>>>>>>>> enable concurrent execution of this job and increase our quota
>>>>>>>>>>>>> enough to give some breathing room. If we do this, I recommend
>>>>>>>>>>>>> limiting the max concurrency via throttleConcurrentBuilds [2] to
>>>>>>>>>>>>> some reasonable limit.
>>>>>>>>>>>>>
>>>>>>>>>>>>> * This test suite is meant to be an exhaustive post-commit
>>>>>>>>>>>>> validation of Dataflow runner, and tests a lot of different
>>>>>>>>>>>>> aspects of a runner. It would be more efficient to run locally
>>>>>>>>>>>>> only the tests affected by your change. Note that this requires
>>>>>>>>>>>>> having access to a GCP project with billing, but most Dataflow
>>>>>>>>>>>>> developers probably have access to this already. The command for
>>>>>>>>>>>>> this is:
>>>>>>>>>>>>>
>>>>>>>>>>>>> ./gradlew :beam-runners-google-cloud-dataflow-java:validatesRunner \
>>>>>>>>>>>>>     -PdataflowProject=myGcpProject \
>>>>>>>>>>>>>     -PdataflowTempRoot=gs://myGcsTempRoot \
>>>>>>>>>>>>>     --tests "org.apache.beam.MyTestClass"
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1]
>>>>>>>>>>>>> https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle_PR/buildTimeTrend
>>>>>>>>>>>>> [2]
>>>>>>>>>>>>> https://jenkinsci.github.io/job-dsl-plugin/#method/javaposse.jobdsl.dsl.jobs.FreeStyleJob.throttleConcurrentBuilds
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Jul 2, 2018 at 8:33 AM Lukasz Cwik <lc...@google.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> The validates runner test parallelism is controlled here and
>>>>>>>>>>>>>> is currently set to be "unlimited":
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://github.com/apache/beam/blob/fbfe6ceaea9d99cb1c8964087aafaa2bc2297a03/runners/google-cloud-dataflow-java/build.gradle#L115
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Each test fork is run on a different gradle worker, so the
>>>>>>>>>>>>>> number of parallel test runs is limited to the max number of
>>>>>>>>>>>>>> workers configured, which is controlled here:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://github.com/apache/beam/blob/fbfe6ceaea9d99cb1c8964087aafaa2bc2297a03/.test-infra/jenkins/job_PostCommit_Java_ValidatesRunner_Dataflow.groovy#L50
>>>>>>>>>>>>>> It is currently configured to 3 * number of CPU cores.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We are already running up to 48 Dataflow jobs in parallel.
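
The 48 figure is consistent with the 16-CPU Jenkins machines mentioned
elsewhere in this thread. A small sketch of the arithmetic follows; the
function name is made up, and it assumes each test fork launches at most
one Dataflow job at a time.

```python
def max_parallel_forks(cpu_cores: int, workers_per_core: int = 3) -> int:
    """Upper bound on parallel test forks: one fork per gradle worker."""
    return workers_per_core * cpu_cores


# A 16-core Jenkins machine with 3 workers per core.
print(max_parallel_forks(16))  # 3 * 16 = 48
```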
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, Jun 30, 2018 at 9:51 AM Rafael Fernandez <
>>>>>>>>>>>>>> rfern...@google.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> - How many resources do ValidatesRunner tests use?
>>>>>>>>>>>>>>> - Where are those settings?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sat, Jun 30, 2018 at 9:50 AM Reuven Lax <re...@google.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The specific issue only affects Dataflow ValidatesRunner
>>>>>>>>>>>>>>>> tests. We currently allow only one of these to run at a time,
>>>>>>>>>>>>>>>> to control usage of Dataflow and of GCE quota. Other types of
>>>>>>>>>>>>>>>> tests do not suffer from this issue.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I would like to see if it's possible to increase Dataflow
>>>>>>>>>>>>>>>> quota so we can run more of these in parallel. It took me 8
>>>>>>>>>>>>>>>> hours end to end to run these tests (about 6 hours for the run
>>>>>>>>>>>>>>>> to be scheduled). If there was a failure, I would have had to
>>>>>>>>>>>>>>>> repeat the whole process. In the worst case, this process could
>>>>>>>>>>>>>>>> have taken me days. While this is not as pressing as some other
>>>>>>>>>>>>>>>> issues (as most people don't need to run the Dataflow tests on
>>>>>>>>>>>>>>>> every PR), fixing it would make such changes much easier to
>>>>>>>>>>>>>>>> manage.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Reuven
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sat, Jun 30, 2018 at 9:32 AM Rafael Fernandez <
>>>>>>>>>>>>>>>> rfern...@google.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> +Reuven Lax <re...@google.com> told me yesterday that he
>>>>>>>>>>>>>>>>> was waiting for some test to be scheduled and run, and it
>>>>>>>>>>>>>>>>> took 6 hours or so. I would like to help reduce these wait
>>>>>>>>>>>>>>>>> times by increasing parallelism. I need help understanding the
>>>>>>>>>>>>>>>>> continuous minimum of what we use. It seems the following is
>>>>>>>>>>>>>>>>> true:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>    - There seems to always be 16 jenkins machines on (16
>>>>>>>>>>>>>>>>>    CPUs each)
>>>>>>>>>>>>>>>>>    - There seems to be three GKE machines always on (1
>>>>>>>>>>>>>>>>>    CPU each)
>>>>>>>>>>>>>>>>>    - Most (if not all) unit tests run on 1 machine, and
>>>>>>>>>>>>>>>>>    seem to run one-at-a-time <-- I think we can safely
>>>>>>>>>>>>>>>>>    parallelize this to 20.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> With current quotas, if we parallelize to 20 concurrent
>>>>>>>>>>>>>>>>> unit tests, we still have room for 80 other concurrent
>>>>>>>>>>>>>>>>> dataflow jobs to execute, with 75% of CPU capacity.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thoughts? Additional data?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> r
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>> Got feedback? go/pabloem-feedback
>>> <https://goto.google.com/pabloem-feedback>
>>>
>>
