[ 
https://issues.apache.org/jira/browse/BEAM-8213?focusedWorklogId=318494&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-318494
 ]

ASF GitHub Bot logged work on BEAM-8213:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 25/Sep/19 18:06
            Start Date: 25/Sep/19 18:06
    Worklog Time Spent: 10m 
      Work Description: youngoli commented on issue #9642: [BEAM-8213] Split up 
monolithic python preCommit tests on jenkins
URL: https://github.com/apache/beam/pull/9642#issuecomment-535142584
 
 
   > > The monolithic job already runs 4 out of 5 tasks in parallel - I do not 
see where the 80% speedup will come from.
   > 
   > I think this is just a communication error. We're saying that the sum of 
the running times of all 5 jobs will be the same as or more than the old 
monolithic job, but each individual job will on average take 1/5th the time of 
the old monolithic job. We're not saying that the sum of all 5 jobs will be 
faster.
   
   What Valentyn means is that the monolithic job already runs the tests in 
parallel within one Jenkins slot. I didn't know that, but if that's the case, 
then splitting the tests wouldn't make them finish any faster; they would still 
run in parallel, just using 5 Jenkins slots instead of 1.
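   
   To make the arithmetic concrete, here is a rough back-of-the-envelope 
sketch in Python. The task names and durations are made up for illustration; 
only the shape of the comparison matters:
   
       # Hypothetical per-task durations in minutes; real numbers vary per run.
       tasks = {"py27": 40, "py35": 42, "py36": 41, "py37": 43, "lint": 5}
   
       # Monolithic job: tasks already run in parallel inside one Jenkins slot,
       # so its wall-clock time is roughly the longest task, not the sum.
       monolithic_wall_clock = max(tasks.values())
   
       # Split jobs: each task gets its own slot, so each job finishes in its
       # own task's time, but the total slot-minutes consumed stay the same
       # (or grow slightly with per-job startup overhead).
       total_slot_minutes = sum(tasks.values())
   
       print("monolithic wall clock ~", monolithic_wall_clock, "min")
       print("total slot minutes    ~", total_slot_minutes, "min")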
   
   > Increasing slots per worker may help, but there are some potentially 
heavy-weight tests, such as portable Python precommit tests that bring up 
Flink, that may cause Jenkins VMs to OOM if we run a lot of them in parallel on 
the same VM. I have heard a second-hand account that parallelizing portable 
precommit tests 4x on the same Jenkins worker caused OOMs, but have not 
verified it myself. Perhaps not an issue, but we need a reliable way to monitor 
Jenkins worker health / utilization to be confident.
   
   Is there a way to distribute tests to the workers with the lowest resource 
utilization? Or, even better, could we have resource benchmarks for our various 
test suites so we can avoid sending resource-intensive tests to workers that 
don't have enough available resources? I don't really know how Jenkins works, 
so that might be a little advanced, but it would definitely avoid that problem 
and let us increase the number of slots per worker without hitting resource 
limits.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 318494)
    Time Spent: 5.5h  (was: 5h 20m)

> Run and report python tox tasks separately within Jenkins
> ---------------------------------------------------------
>
>                 Key: BEAM-8213
>                 URL: https://issues.apache.org/jira/browse/BEAM-8213
>             Project: Beam
>          Issue Type: Improvement
>          Components: build-system
>            Reporter: Chad Dombrova
>            Priority: Major
>          Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> As a Python developer, I find that the speed and comprehensibility of the 
> Jenkins PreCommit job could be greatly improved.
> Here are some of the problems:
> - when a lint job fails, it's not reported in the test results summary, so 
> even though the job is marked as failed, I see "Test Result (no failures)", 
> which is quite confusing
> - I have to wait for over an hour to discover that lint failed, even though 
> lint takes about a minute to run on its own
> - The logs are a jumbled mess of all the different tasks running on top of 
> each other
> - The test results give no indication of which version of Python they use. I 
> click on Test results, then the test module, then the test class, and then I 
> see 4 tests named the same thing. I assume that the first is Python 2.7, the 
> second is 3.5, and so on. It takes 5 clicks plus reading the log output to 
> know which version of Python a single error pertains to, and then I need to 
> repeat that for each failure. This makes it very difficult to discover 
> problems and to deduce that they may have something to do with Python version 
> mismatches.
> I believe the solution to this is to split up the single monolithic Python 
> PreCommit job into sub-jobs (possibly using a pipeline with steps). This 
> would give us the following benefits:
> - sub-job results should become available as they finish, so, for example, 
> lint results should be available very early on
> - sub-job results will be reported separately, and there will be a job for 
> each of py2, py35, py36, and so on, so it will be clear when an error is 
> related to a particular Python version
> - sub-jobs without reports, like docs and lint, will have their own failure 
> status and logs, so when they fail it will be more obvious what went wrong.
> I'm happy to help out once I get some feedback on the desired way forward.
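For illustration only, here is a minimal Python sketch of the per-suite split 
the issue describes: each tox environment is invoked on its own, so each suite 
gets its own exit status and can be reported as a separate job. This is not 
Beam's actual build tooling, and the suite names are hypothetical:

    # Hypothetical driver: run each tox environment separately so a CI system
    # can surface one pass/fail result per suite instead of one monolithic one.
    import subprocess
    import sys

    SUITES = ["py27", "py35", "py36", "lint", "docs"]  # illustrative names

    def run_suite(name):
        """Run one tox environment and report whether it succeeded."""
        result = subprocess.run(["tox", "-e", name])
        return result.returncode == 0

    if __name__ == "__main__":
        failures = [name for name in SUITES if not run_suite(name)]
        for name in failures:
            print("FAILED:", name, file=sys.stderr)
        sys.exit(1 if failures else 0)

In a Jenkins setup, each of these invocations would instead live in its own 
job or pipeline stage; the point is only the one-suite-per-invocation mapping, 
not this particular driver.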



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
