Re: Python Precommit duration pushing 2 hours

Robert Bradshaw Mon, 11 Nov 2019 14:16:49 -0800

On Fri, Nov 8, 2019 at 5:45 PM Ahmet Altay <[email protected]> wrote:
>
> I looked at the log but I could not figure what is causing the timeout 
> because the gradle scan links are missing. I sampled a few of the successful 
> jobs, It seems like python 3.7 and python 2 are running 3 tests in serial 
> {interactive, py37cython, py37gcp} and {docs, py27cython, py27gcp} 
> respectively. These two versions are pushing the total time because other 
> variants are now only running {cython, gcp} versions.
>
> I suggest breaking up docs, and interactive into 2 separate suites of their 
> own. docs is actually faster than interactive,just separating that out to a 
> new suite might help.
>
> Interactive was recently added (https://github.com/apache/beam/pull/9741). 
> +Ning Kang could you separate interactive to new suite?


I would ask why interactive is a separate tox configuration at all; I
don't think there's a need to run every test again with a couple of
extra dependencies (adding ~30 minutes to every presumbit). I think it
would be much more valuable to run the (presumably relatively small)
set of interactive tests in all modes.

(The other suites are to guerentee the tests specifically run
*without* installing gcp and *without* compiling with Cython.)

> On Fri, Nov 8, 2019 at 11:09 AM Robert Bradshaw <[email protected]> wrote:
>>
>> Just saw another 2-hour timeout:
>> https://builds.apache.org/job/beam_PreCommit_Python_Commit/9440/ , so
>> perhaps we're not out of the woods yet (though in general things have
>> been a lot better).
>>
>> On Tue, Nov 5, 2019 at 10:52 AM Ahmet Altay <[email protected]> wrote:
>> >
>> > GCP tests are already on separate locations. IO related tests are under 
>> > /sdks/python/apache_beam/io/gcp and Dataflow related tests are under 
>> > sdks/python/apache_beam/runners/dataflow. It should be a matter of 
>> > changing gradle files to run either one of the base tests or GCP tests 
>> > depending on the types of changes. I do not expect this to have any 
>> > material impact on the precommit times because these two test suites take 
>> > about exactly the same time to complete.
>> >
>> > #9985 is merged now. Precommit times on master branch dropped to ~1h 20 
>> > for the last 5 runs.
>> >
>> > On Tue, Nov 5, 2019 at 10:12 AM David Cavazos <[email protected]> wrote:
>> >>
>> >> +1 to moving the GCP tests outside of core. If there are issues that only 
>> >> show up on GCP tests but not in core, it might be an indication that 
>> >> there needs to be another test in core covering that, but I think that 
>> >> should be pretty rare.
>> >>
>> >> On Mon, Nov 4, 2019 at 8:33 PM Kenneth Knowles <[email protected]> wrote:
>> >>>
>> >>> +1 to moving forward with this
>> >>>
>> >>> Could we move GCP tests outside the core? Then only code changes 
>> >>> touches/affecting GCP would cause them to run in precommit. Could still 
>> >>> run them in postcommit in their own suite. If the core has reasonably 
>> >>> stable abstractions that the connectors are built on, this should not 
>> >>> change coverage much.
>> >>>
>> >>> Kenn
>> >>>
>> >>> On Mon, Nov 4, 2019 at 1:55 PM Ahmet Altay <[email protected]> wrote:
>> >>>>
>> >>>> PR for the proposed change: https://github.com/apache/beam/pull/9985
>> >>>>
>> >>>> On Mon, Nov 4, 2019 at 1:35 PM Udi Meiri <[email protected]> wrote:
>> >>>>>
>> >>>>> +1
>> >>>>>
>> >>>>> On Mon, Nov 4, 2019 at 12:09 PM Robert Bradshaw <[email protected]> 
>> >>>>> wrote:
>> >>>>>>
>> >>>>>> +1, this seems like a good step with a clear win.
>> >>>>>>
>> >>>>>> On Mon, Nov 4, 2019 at 12:06 PM Ahmet Altay <[email protected]> wrote:
>> >>>>>> >
>> >>>>>> > Python precommits are still timing out on #9925. I am guessing that 
>> >>>>>> > means this change would not be enough.
>> >>>>>> >
>> >>>>>> > I am proposing cutting down the number of test variants we run in 
>> >>>>>> > precommits. Currently for each version we ran the following 
>> >>>>>> > variants serially:
>> >>>>>> > - base: Runs all unit tests with tox
>> >>>>>> > - Cython: Installs cython and runs all unit tests as base version. 
>> >>>>>> > The original purpose was to ensure that tests pass with or without 
>> >>>>>> > cython. There is probably a huge overlap with base. (IIRC only a 
>> >>>>>> > few coders have different slow vs fast tests.)
>> >>>>>> > - GCP: Installs GCP dependencies and tests all base + additional 
>> >>>>>> > gcp specific tests. The original purpose was to ensure that GCP is 
>> >>>>>> > an optional component and all non-GCP tests still works without GCP 
>> >>>>>> > components.
>> >>>>>> >
>> >>>>>> > We can reduce the list to cython + GCP tests only. This will cover 
>> >>>>>> > the same group of tests and will check that tests pass with or 
>> >>>>>> > without cython or GCP dependencies. This could reduce the precommit 
>> >>>>>> > time by ~30 minutes.
>> >>>>>> >
>> >>>>>> > What do you think?
>> >>>>>> >
>> >>>>>> > Ahmet
>> >>>>>> >
>> >>>>>> >
>> >>>>>> > On Tue, Oct 29, 2019 at 11:15 AM Robert Bradshaw 
>> >>>>>> > <[email protected]> wrote:
>> >>>>>> >>
>> >>>>>> >> https://github.com/apache/beam/pull/9925
>> >>>>>> >>
>> >>>>>> >> On Tue, Oct 29, 2019 at 10:24 AM Udi Meiri <[email protected]> 
>> >>>>>> >> wrote:
>> >>>>>> >> >
>> >>>>>> >> > I don't have the bandwidth right now to tackle this. Feel free 
>> >>>>>> >> > to take it.
>> >>>>>> >> >
>> >>>>>> >> > On Tue, Oct 29, 2019 at 10:16 AM Robert Bradshaw 
>> >>>>>> >> > <[email protected]> wrote:
>> >>>>>> >> >>
>> >>>>>> >> >> The Python SDK does as well. These calls are coming from
>> >>>>>> >> >> to_runner_api, is_stateful_dofn, and validate_stateful_dofn 
>> >>>>>> >> >> which are
>> >>>>>> >> >> invoked once per pipene or bundle. They are, however, 
>> >>>>>> >> >> surprisingly
>> >>>>>> >> >> expensive. Even memoizing across those three calls should save a
>> >>>>>> >> >> significant amount of time. Udi, did you want to tackle this?
>> >>>>>> >> >>
>> >>>>>> >> >> Looking at the profile, Pipeline.to_runner_api() is being 
>> >>>>>> >> >> called 30
>> >>>>>> >> >> times in this test, and [Applied]PTransform.to_fn_api being 
>> >>>>>> >> >> called
>> >>>>>> >> >> 3111 times, so that in itself might be interesting to 
>> >>>>>> >> >> investigate.
>> >>>>>> >> >>
>> >>>>>> >> >> On Tue, Oct 29, 2019 at 8:26 AM Robert Burke 
>> >>>>>> >> >> <[email protected]> wrote:
>> >>>>>> >> >> >
>> >>>>>> >> >> > As does the Go SDK. Invokers are memoized and when possible 
>> >>>>>> >> >> > code is generated to avoid reflection.
>> >>>>>> >> >> >
>> >>>>>> >> >> > On Tue, Oct 29, 2019, 6:46 AM Kenneth Knowles 
>> >>>>>> >> >> > <[email protected]> wrote:
>> >>>>>> >> >> >>
>> >>>>>> >> >> >> Noting for the benefit of the thread archive in case someone 
>> >>>>>> >> >> >> goes digging and wonders if this affects other SDKs: the 
>> >>>>>> >> >> >> Java SDK memoizes DoFnSignatures and generated DoFnInvoker 
>> >>>>>> >> >> >> classes.
>> >>>>>> >> >> >>
>> >>>>>> >> >> >> Kenn
>> >>>>>> >> >> >>
>> >>>>>> >> >> >> On Mon, Oct 28, 2019 at 6:59 PM Udi Meiri <[email protected]> 
>> >>>>>> >> >> >> wrote:
>> >>>>>> >> >> >>>
>> >>>>>> >> >> >>> Re: #9283 slowing down tests, ideas for slowness:
>> >>>>>> >> >> >>> 1. I added a lot of test cases, some with locally run 
>> >>>>>> >> >> >>> pipelines.
>> >>>>>> >> >> >>> 2. The PR somehow changed how coders are selected, and now 
>> >>>>>> >> >> >>> we're using less efficient ones.
>> >>>>>> >> >> >>> 3. New dependency funcsigs is slowing things down? (py2 
>> >>>>>> >> >> >>> only)
>> >>>>>> >> >> >>>
>> >>>>>> >> >> >>> I ran "pytest -k PipelineAnalyzerTest --profile-svg" on 2.7 
>> >>>>>> >> >> >>> and 3.7 and got these cool graphs (attached).
>> >>>>>> >> >> >>> 2.7: core:294:get_function_arguments takes 56.66% of CPU 
>> >>>>>> >> >> >>> time (IIUC), gets called ~230k times
>> >>>>>> >> >> >>> 3.7: core:294:get_function_arguments 30.88%, gets called 
>> >>>>>> >> >> >>> ~200k times
>> >>>>>> >> >> >>>
>> >>>>>> >> >> >>> After memoization of get_function_args_defaults:
>> >>>>>> >> >> >>> 2.7: core:294:get_function_arguments 20.02%
>> >>>>>> >> >> >>> 3.7: core:294:get_function_arguments 8.11%
>> >>>>>> >> >> >>>
>> >>>>>> >> >> >>>
>> >>>>>> >> >> >>> On Mon, Oct 28, 2019 at 5:38 PM Pablo Estrada 
>> >>>>>> >> >> >>> <[email protected]> wrote:
>> >>>>>> >> >> >>>>
>> >>>>>> >> >> >>>> *not deciles, but 9-percentiles : )
>> >>>>>> >> >> >>>>
>> >>>>>> >> >> >>>> On Mon, Oct 28, 2019 at 5:31 PM Pablo Estrada 
>> >>>>>> >> >> >>>> <[email protected]> wrote:
>> >>>>>> >> >> >>>>>
>> >>>>>> >> >> >>>>> I've ran the tests in Python 2 (without cython), and used 
>> >>>>>> >> >> >>>>> a utility to track runtime for each test method. I found 
>> >>>>>> >> >> >>>>> some of the following things:
>> >>>>>> >> >> >>>>> - Total test methods run: 2665
>> >>>>>> >> >> >>>>> - Total test runtime: 990 seconds
>> >>>>>> >> >> >>>>> - Deciles of time spent:
>> >>>>>> >> >> >>>>>   - 1949 tests run in the first 9% of time
>> >>>>>> >> >> >>>>>   - 173 in the 9-18% rang3e
>> >>>>>> >> >> >>>>>   - 130 in the 18-27% range
>> >>>>>> >> >> >>>>>   - 95 in the 27-36% range
>> >>>>>> >> >> >>>>>   - 77
>> >>>>>> >> >> >>>>>   - 66
>> >>>>>> >> >> >>>>>   - 55
>> >>>>>> >> >> >>>>>   - 46
>> >>>>>> >> >> >>>>>   - 37
>> >>>>>> >> >> >>>>>   - 24
>> >>>>>> >> >> >>>>>   - 13 tests run in the last 9% of time. This represents 
>> >>>>>> >> >> >>>>> about 1 minute and a half.
>> >>>>>> >> >> >>>>>
>> >>>>>> >> >> >>>>> We may be able to look at the slowest X tests, and get 
>> >>>>>> >> >> >>>>> gradual improvements from there. Although it seems .. not 
>> >>>>>> >> >> >>>>> dramatic ones : )
>> >>>>>> >> >> >>>>>
>> >>>>>> >> >> >>>>> FWIW I uploaded the results here: 
>> >>>>>> >> >> >>>>> https://storage.googleapis.com/apache-beam-website-pull-requests/python-tests/nosetimes.json
>> >>>>>> >> >> >>>>>
>> >>>>>> >> >> >>>>> The slowest 13 tests were:
>> >>>>>> >> >> >>>>>
>> >>>>>> >> >> >>>>> [('apache_beam.runners.interactive.pipeline_analyzer_test.PipelineAnalyzerTest.test_basic',
>> >>>>>> >> >> >>>>>   5.253582000732422),
>> >>>>>> >> >> >>>>>  
>> >>>>>> >> >> >>>>> ('apache_beam.runners.interactive.interactive_runner_test.InteractiveRunnerTest.test_wordcount',
>> >>>>>> >> >> >>>>>   7.907713890075684),
>> >>>>>> >> >> >>>>>  
>> >>>>>> >> >> >>>>> ('apache_beam.io.gcp.bigquery_test.PipelineBasedStreamingInsertTest.test_failure_has_same_insert_ids',
>> >>>>>> >> >> >>>>>   5.237942934036255),
>> >>>>>> >> >> >>>>>  
>> >>>>>> >> >> >>>>> ('apache_beam.transforms.combiners_test.CombineTest.test_global_sample',
>> >>>>>> >> >> >>>>>   5.563946008682251),
>> >>>>>> >> >> >>>>>  
>> >>>>>> >> >> >>>>> ('apache_beam.runners.worker.sideinputs_test.EmulatedCollectionsTest.test_large_iterable_values',
>> >>>>>> >> >> >>>>>   5.680700063705444),
>> >>>>>> >> >> >>>>>  
>> >>>>>> >> >> >>>>> ('apache_beam.io.parquetio_test.TestParquet.test_sink_transform_multiple_row_group',
>> >>>>>> >> >> >>>>>   6.111238956451416),
>> >>>>>> >> >> >>>>>  
>> >>>>>> >> >> >>>>> ('apache_beam.runners.worker.statesampler_test.StateSamplerTest.test_basic_sampler',
>> >>>>>> >> >> >>>>>   6.007534980773926),
>> >>>>>> >> >> >>>>>  
>> >>>>>> >> >> >>>>> ('apache_beam.runners.interactive.interactive_runner_test.InteractiveRunnerTest.test_basic',
>> >>>>>> >> >> >>>>>   13.993916988372803),
>> >>>>>> >> >> >>>>>  
>> >>>>>> >> >> >>>>> ('apache_beam.runners.interactive.pipeline_analyzer_test.PipelineAnalyzerTest.test_read_cache_expansion',
>> >>>>>> >> >> >>>>>   6.3383049964904785),
>> >>>>>> >> >> >>>>>  
>> >>>>>> >> >> >>>>> ('apache_beam.runners.interactive.pipeline_analyzer_test.PipelineAnalyzerTest.test_word_count',
>> >>>>>> >> >> >>>>>   9.157485008239746),
>> >>>>>> >> >> >>>>>  
>> >>>>>> >> >> >>>>> ('apache_beam.runners.portability.portable_runner_test.PortableRunnerTestWithSubprocesses.test_pardo_side_and_main_outputs',
>> >>>>>> >> >> >>>>>   5.191173076629639),
>> >>>>>> >> >> >>>>>  
>> >>>>>> >> >> >>>>> ('apache_beam.io.vcfio_test.VcfSourceTest.test_pipeline_read_file_pattern_large',
>> >>>>>> >> >> >>>>>   6.2221620082855225),
>> >>>>>> >> >> >>>>>  
>> >>>>>> >> >> >>>>> ('apache_beam.io.fileio_test.WriteFilesTest.test_streaming_complex_timing',
>> >>>>>> >> >> >>>>>   7.7187910079956055)]
>> >>>>>> >> >> >>>>>
>> >>>>>> >> >> >>>>> On Mon, Oct 28, 2019 at 3:10 PM Pablo Estrada 
>> >>>>>> >> >> >>>>> <[email protected]> wrote:
>> >>>>>> >> >> >>>>>>
>> >>>>>> >> >> >>>>>> I have written https://github.com/apache/beam/pull/9910 
>> >>>>>> >> >> >>>>>> to reduce FnApiRunnerTest variations.
>> >>>>>> >> >> >>>>>> I'm not in a rush to merge, but rather happy to start a 
>> >>>>>> >> >> >>>>>> discussion.
>> >>>>>> >> >> >>>>>> I'll also try to figure out if there are other tests 
>> >>>>>> >> >> >>>>>> slowing down the suite significantly.
>> >>>>>> >> >> >>>>>> Best
>> >>>>>> >> >> >>>>>> -P.
>> >>>>>> >> >> >>>>>>
>> >>>>>> >> >> >>>>>> On Fri, Oct 25, 2019 at 7:41 PM Valentyn Tymofieiev 
>> >>>>>> >> >> >>>>>> <[email protected]> wrote:
>> >>>>>> >> >> >>>>>>>
>> >>>>>> >> >> >>>>>>> Thanks, Brian.
>> >>>>>> >> >> >>>>>>> +Udi Meiri
>> >>>>>> >> >> >>>>>>> As next step, it would be good to know whether slowdown 
>> >>>>>> >> >> >>>>>>> is caused by tests in this PR, or its effect on other 
>> >>>>>> >> >> >>>>>>> tests, and to confirm that only Python 2 codepaths were 
>> >>>>>> >> >> >>>>>>> affected.
>> >>>>>> >> >> >>>>>>>
>> >>>>>> >> >> >>>>>>> On Fri, Oct 25, 2019 at 6:35 PM Brian Hulette 
>> >>>>>> >> >> >>>>>>> <[email protected]> wrote:
>> >>>>>> >> >> >>>>>>>>
>> >>>>>> >> >> >>>>>>>> I did a bisect based on the runtime of `./gradlew 
>> >>>>>> >> >> >>>>>>>> :sdks:python:test-suites:tox:py2:testPy2Gcp` around 
>> >>>>>> >> >> >>>>>>>> the commits between 9/1 and 9/15 to see if I could 
>> >>>>>> >> >> >>>>>>>> find the source of the spike that happened around 9/6. 
>> >>>>>> >> >> >>>>>>>> It looks like it was due to PR#9283 [1]. I thought 
>> >>>>>> >> >> >>>>>>>> maybe this search would reveal some mis-guided 
>> >>>>>> >> >> >>>>>>>> configuration change, but as far as I can tell 9283 
>> >>>>>> >> >> >>>>>>>> just added a well-tested feature. I don't think 
>> >>>>>> >> >> >>>>>>>> there's anything to learn from that... I just wanted 
>> >>>>>> >> >> >>>>>>>> to circle back about it in case others are curious 
>> >>>>>> >> >> >>>>>>>> about that spike.
>> >>>>>> >> >> >>>>>>>>
>> >>>>>> >> >> >>>>>>>> I'm +1 on bumping some FnApiRunner configurations.
>> >>>>>> >> >> >>>>>>>>
>> >>>>>> >> >> >>>>>>>> Brian
>> >>>>>> >> >> >>>>>>>>
>> >>>>>> >> >> >>>>>>>> [1] https://github.com/apache/beam/pull/9283
>> >>>>>> >> >> >>>>>>>>
>> >>>>>> >> >> >>>>>>>> On Fri, Oct 25, 2019 at 4:49 PM Pablo Estrada 
>> >>>>>> >> >> >>>>>>>> <[email protected]> wrote:
>> >>>>>> >> >> >>>>>>>>>
>> >>>>>> >> >> >>>>>>>>> I think it makes sense to remove some of the extra 
>> >>>>>> >> >> >>>>>>>>> FnApiRunner configurations. Perhaps some of the 
>> >>>>>> >> >> >>>>>>>>> multiworkers and some of the grpc versions?
>> >>>>>> >> >> >>>>>>>>> Best
>> >>>>>> >> >> >>>>>>>>> -P.
>> >>>>>> >> >> >>>>>>>>>
>> >>>>>> >> >> >>>>>>>>> On Fri, Oct 25, 2019 at 12:27 PM Robert Bradshaw 
>> >>>>>> >> >> >>>>>>>>> <[email protected]> wrote:
>> >>>>>> >> >> >>>>>>>>>>
>> >>>>>> >> >> >>>>>>>>>> It looks like fn_api_runner_test.py is quite 
>> >>>>>> >> >> >>>>>>>>>> expensive, taking 10-15+
>> >>>>>> >> >> >>>>>>>>>> minutes on each version of Python. This test 
>> >>>>>> >> >> >>>>>>>>>> consists of a base class
>> >>>>>> >> >> >>>>>>>>>> that is basically a validates runner suite, and is 
>> >>>>>> >> >> >>>>>>>>>> then run in several
>> >>>>>> >> >> >>>>>>>>>> configurations, many more of which (including some 
>> >>>>>> >> >> >>>>>>>>>> expensive ones)
>> >>>>>> >> >> >>>>>>>>>> have been added lately.
>> >>>>>> >> >> >>>>>>>>>>
>> >>>>>> >> >> >>>>>>>>>> class FnApiRunnerTest(unittest.TestCase):
>> >>>>>> >> >> >>>>>>>>>> class FnApiRunnerTestWithGrpc(FnApiRunnerTest):
>> >>>>>> >> >> >>>>>>>>>> class 
>> >>>>>> >> >> >>>>>>>>>> FnApiRunnerTestWithGrpcMultiThreaded(FnApiRunnerTest):
>> >>>>>> >> >> >>>>>>>>>> class 
>> >>>>>> >> >> >>>>>>>>>> FnApiRunnerTestWithDisabledCaching(FnApiRunnerTest):
>> >>>>>> >> >> >>>>>>>>>> class 
>> >>>>>> >> >> >>>>>>>>>> FnApiRunnerTestWithMultiWorkers(FnApiRunnerTest):
>> >>>>>> >> >> >>>>>>>>>> class 
>> >>>>>> >> >> >>>>>>>>>> FnApiRunnerTestWithGrpcAndMultiWorkers(FnApiRunnerTest):
>> >>>>>> >> >> >>>>>>>>>> class 
>> >>>>>> >> >> >>>>>>>>>> FnApiRunnerTestWithBundleRepeat(FnApiRunnerTest):
>> >>>>>> >> >> >>>>>>>>>> class 
>> >>>>>> >> >> >>>>>>>>>> FnApiRunnerTestWithBundleRepeatAndMultiWorkers(FnApiRunnerTest):
>> >>>>>> >> >> >>>>>>>>>>
>> >>>>>> >> >> >>>>>>>>>> I'm not convinced we need to run all of these 
>> >>>>>> >> >> >>>>>>>>>> permutations, or at
>> >>>>>> >> >> >>>>>>>>>> least not all tests in all permutations.
>> >>>>>> >> >> >>>>>>>>>>
>> >>>>>> >> >> >>>>>>>>>> On Fri, Oct 25, 2019 at 10:57 AM Valentyn Tymofieiev
>> >>>>>> >> >> >>>>>>>>>> <[email protected]> wrote:
>> >>>>>> >> >> >>>>>>>>>> >
>> >>>>>> >> >> >>>>>>>>>> > I took another look at this and precommit ITs are 
>> >>>>>> >> >> >>>>>>>>>> > already running in parallel, albeit in the same 
>> >>>>>> >> >> >>>>>>>>>> > suite. However it appears Python precommits became 
>> >>>>>> >> >> >>>>>>>>>> > slower, especially Python 2 precommits [35 min per 
>> >>>>>> >> >> >>>>>>>>>> > suite x 3 suites], see [1]. Not sure yet what 
>> >>>>>> >> >> >>>>>>>>>> > caused the increase, but precommits used to be 
>> >>>>>> >> >> >>>>>>>>>> > faster. Perhaps we have added a slow test or a lot 
>> >>>>>> >> >> >>>>>>>>>> > of new tests.
>> >>>>>> >> >> >>>>>>>>>> >
>> >>>>>> >> >> >>>>>>>>>> > [1] 
>> >>>>>> >> >> >>>>>>>>>> > https://scans.gradle.com/s/jvcw5fpqfc64k/timeline?task=ancsbov425524
>> >>>>>> >> >> >>>>>>>>>> >
>> >>>>>> >> >> >>>>>>>>>> > On Thu, Oct 24, 2019 at 4:53 PM Ahmet Altay 
>> >>>>>> >> >> >>>>>>>>>> > <[email protected]> wrote:
>> >>>>>> >> >> >>>>>>>>>> >>
>> >>>>>> >> >> >>>>>>>>>> >> Ack. Separating precommit ITs to a different 
>> >>>>>> >> >> >>>>>>>>>> >> suite sounds good. Anyone is interested in doing 
>> >>>>>> >> >> >>>>>>>>>> >> that?
>> >>>>>> >> >> >>>>>>>>>> >>
>> >>>>>> >> >> >>>>>>>>>> >> On Thu, Oct 24, 2019 at 2:41 PM Valentyn 
>> >>>>>> >> >> >>>>>>>>>> >> Tymofieiev <[email protected]> wrote:
>> >>>>>> >> >> >>>>>>>>>> >>>
>> >>>>>> >> >> >>>>>>>>>> >>> This should not increase the queue time 
>> >>>>>> >> >> >>>>>>>>>> >>> substantially, since precommit ITs are running 
>> >>>>>> >> >> >>>>>>>>>> >>> sequentially with precommit tests, unlike 
>> >>>>>> >> >> >>>>>>>>>> >>> multiple precommit tests which run in parallel 
>> >>>>>> >> >> >>>>>>>>>> >>> to each other.
>> >>>>>> >> >> >>>>>>>>>> >>>
>> >>>>>> >> >> >>>>>>>>>> >>> The precommit ITs we run are batch and streaming 
>> >>>>>> >> >> >>>>>>>>>> >>> wordcount tests on Py2 and one Py3 version, so 
>> >>>>>> >> >> >>>>>>>>>> >>> it's not a lot of tests.
>> >>>>>> >> >> >>>>>>>>>> >>>
>> >>>>>> >> >> >>>>>>>>>> >>> On Thu, Oct 24, 2019 at 1:07 PM Ahmet Altay 
>> >>>>>> >> >> >>>>>>>>>> >>> <[email protected]> wrote:
>> >>>>>> >> >> >>>>>>>>>> >>>>
>> >>>>>> >> >> >>>>>>>>>> >>>> +1 to separating ITs from precommit. Downside 
>> >>>>>> >> >> >>>>>>>>>> >>>> would be, when Chad tried to do something 
>> >>>>>> >> >> >>>>>>>>>> >>>> similar [1] it was noted that the total time to 
>> >>>>>> >> >> >>>>>>>>>> >>>> run all precommit tests would increase and also 
>> >>>>>> >> >> >>>>>>>>>> >>>> potentially increase the queue time.
>> >>>>>> >> >> >>>>>>>>>> >>>>
>> >>>>>> >> >> >>>>>>>>>> >>>> Another alternative, we could run a smaller set 
>> >>>>>> >> >> >>>>>>>>>> >>>> of IT tests in precommits and run the whole 
>> >>>>>> >> >> >>>>>>>>>> >>>> suite as part of post commit tests.
>> >>>>>> >> >> >>>>>>>>>> >>>>
>> >>>>>> >> >> >>>>>>>>>> >>>> [1] https://github.com/apache/beam/pull/9642
>> >>>>>> >> >> >>>>>>>>>> >>>>
>> >>>>>> >> >> >>>>>>>>>> >>>> On Thu, Oct 24, 2019 at 12:15 PM Valentyn 
>> >>>>>> >> >> >>>>>>>>>> >>>> Tymofieiev <[email protected]> wrote:
>> >>>>>> >> >> >>>>>>>>>> >>>>>
>> >>>>>> >> >> >>>>>>>>>> >>>>> One improvement could be move to Precommit IT 
>> >>>>>> >> >> >>>>>>>>>> >>>>> tests into a separate suite from precommit 
>> >>>>>> >> >> >>>>>>>>>> >>>>> tests, and run it in parallel.
>> >>>>>> >> >> >>>>>>>>>> >>>>>
>> >>>>>> >> >> >>>>>>>>>> >>>>> On Thu, Oct 24, 2019 at 11:41 AM Brian Hulette 
>> >>>>>> >> >> >>>>>>>>>> >>>>> <[email protected]> wrote:
>> >>>>>> >> >> >>>>>>>>>> >>>>>>
>> >>>>>> >> >> >>>>>>>>>> >>>>>> Python Precommits are taking quite a while 
>> >>>>>> >> >> >>>>>>>>>> >>>>>> now [1]. Just visually it looks like the 
>> >>>>>> >> >> >>>>>>>>>> >>>>>> average length is 1.5h or so, but it spikes 
>> >>>>>> >> >> >>>>>>>>>> >>>>>> up to 2h. I've had several precommit runs get 
>> >>>>>> >> >> >>>>>>>>>> >>>>>> aborted due to the 2 hour limit.
>> >>>>>> >> >> >>>>>>>>>> >>>>>>
>> >>>>>> >> >> >>>>>>>>>> >>>>>> It looks like there was a spike up above 1h 
>> >>>>>> >> >> >>>>>>>>>> >>>>>> back on 9/6 and the duration has been 
>> >>>>>> >> >> >>>>>>>>>> >>>>>> steadily rising since then. Is there anything 
>> >>>>>> >> >> >>>>>>>>>> >>>>>> we can do about this?
>> >>>>>> >> >> >>>>>>>>>> >>>>>>
>> >>>>>> >> >> >>>>>>>>>> >>>>>> Brian
>> >>>>>> >> >> >>>>>>>>>> >>>>>>
>> >>>>>> >> >> >>>>>>>>>> >>>>>> [1] 
>> >>>>>> >> >> >>>>>>>>>> >>>>>> http://104.154.241.245/d/_TNndF2iz/pre-commit-test-latency?orgId=1&from=now-90d&to=now&fullscreen&panelId=4

Re: Python Precommit duration pushing 2 hours

Reply via email to