The specific issue only affects Dataflow ValidatesRunner tests. We
currently allow only one of these to run at a time, to control usage of
Dataflow and of GCE quota. Other types of tests do not suffer from this
issue.

I would like to see if it's possible to increase Dataflow quota so we can
run more of these in parallel. It took me 8 hours end-to-end to run these
tests, about 6 hours of which was waiting for the run to be scheduled. If
there had been a failure, I would have had to repeat the whole process. In
the worst case, this
process could have taken me days. While this is not as pressing as some
other issues (as most people don't need to run the Dataflow tests on every
PR), fixing it would make such changes much easier to manage.
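To illustrate why serializing these runs hurts so much, here is a rough
sketch of queue wait under a simple first-in, first-out model. It assumes
every queued run takes the same time and that a run starts as soon as a slot
frees up. The function name and the numbers in the example are hypothetical;
real Jenkins scheduling is messier.

```python
def estimated_wait_hours(jobs_ahead: int, hours_per_job: float, slots: int) -> float:
    """Rough FIFO estimate of how long a newly queued run waits to start.

    Hypothetical model: identical run durations, `slots` runs allowed at once.
    """
    if slots < 1:
        raise ValueError("slots must be at least 1")
    # With `slots` concurrent runs, our run must wait for this many
    # "waves" of earlier runs to finish before a slot is free.
    waves_ahead = jobs_ahead // slots
    return waves_ahead * hours_per_job


# Hypothetical example: 3 runs ahead of us, each taking about 2 hours.
print(estimated_wait_hours(3, 2.0, slots=1))  # one at a time: 6.0 hours
print(estimated_wait_hours(3, 2.0, slots=4))  # 4 parallel slots: 0.0 hours
```

Under this model the wait shrinks roughly in proportion to the number of
slots, which is the motivation for asking for more Dataflow quota.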

Reuven

On Sat, Jun 30, 2018 at 9:32 AM Rafael Fernandez <rfern...@google.com>
wrote:

> +Reuven Lax <re...@google.com> told me yesterday that he was waiting for
> some test to be scheduled and run, and it took 6 hours or so. I would like
> to help reduce these wait times by increasing parallelism. I need help
> understanding our baseline (always-on) usage. It seems the following
> is true:
>
>
>    - There always seem to be 16 Jenkins machines on (16 CPUs each)
>    - There seem to be three GKE machines always on (1 CPU each)
>    - Most (if not all) unit tests run on 1 machine and seem to run
>    one at a time <-- I think we can safely parallelize this to 20.
>
> With current quotas, if we parallelize to 20 concurrent unit tests, we
> would still have room for 80 other concurrent Dataflow jobs to execute,
> with 75% of CPU capacity.
>
> Thoughts? Additional data?
>
> Thanks,
> r
>
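The always-on baseline in the list above works out to 16 x 16 + 3 x 1 = 259
CPUs. Below is a minimal sketch of the headroom arithmetic, assuming (an
assumption, not stated in the thread) that the Jenkins and GKE machines draw
from the same CPU quota as Dataflow workers. The CPU quota and per-job CPU
count are not given in the thread, so the values in the example are
hypothetical and do not reproduce the 80-job figure.

```python
def baseline_cpus(machines: dict[str, tuple[int, int]]) -> int:
    """Sum always-on CPUs; each entry maps a pool name to (machine count, CPUs per machine)."""
    return sum(count * cpus for count, cpus in machines.values())


def job_headroom(cpu_quota: int, baseline: int, reserved_jobs: int,
                 cpus_per_job: int, target_percent: int) -> int:
    """How many extra jobs fit while keeping usage at or below target_percent of quota.

    Hypothetical model: every job uses the same number of CPUs.
    """
    # Integer arithmetic avoids floating-point rounding surprises.
    budget = cpu_quota * target_percent // 100
    budget -= baseline + reserved_jobs * cpus_per_job
    return max(0, budget // cpus_per_job)


pools = {"jenkins": (16, 16), "gke": (3, 1)}  # figures from the list above
base = baseline_cpus(pools)
print(base)  # 259

# Hypothetical quota of 1000 CPUs, 4 CPUs per job, 20 parallel test runs, 75% target.
print(job_headroom(1000, base, reserved_jobs=20, cpus_per_job=4, target_percent=75))  # 102
```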
