Ismael, there's still room for that (as well as for running the benchmarks
multiple times and taking the median, as Valentyn proposed), since the jobs
fully occupy one machine anyway. The load statistics [1] show that this
worker is currently idle most of the time. The last time the jobs were
executed, they all took about 40 minutes [2]. Because they are triggered
every 6 hours, that leaves `apache-beam-jenkins-16` idle for nearly 90% of
the time. That's the state of the cron-triggered jobs listed here [2].
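
To make the idle-time arithmetic explicit, here is a rough sketch in Python.
It reads "about 40 minutes" as the total busy time per 6-hour cycle on the
worker. The function names are made up for illustration, and the headroom
figure assumes (unverified) that runtime grows linearly with the input size
or the number of repetitions.

```python
def idle_fraction(busy_minutes: float, period_hours: float) -> float:
    """Fraction of each trigger period during which the worker sits idle."""
    period_minutes = period_hours * 60
    return 1 - busy_minutes / period_minutes


def max_scale_factor(busy_minutes: float, period_hours: float) -> float:
    """How many times the busy time fits into one trigger period.

    Assumption: runtime scales linearly with the amount of work.
    """
    return (period_hours * 60) / busy_minutes


# Figures from this thread: ~40 minutes of jobs, triggered every 6 hours.
print(f"idle: {idle_fraction(40, 6):.1%}")            # idle: 88.9%
print(f"headroom: {max_scale_factor(40, 6):.0f}x")    # headroom: 9x
```

Under these assumptions the work per cycle could grow by up to about 9x
before consecutive runs would start to overlap.
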
There are also `_PR` versions of these jobs, which share the DSL config and
can be triggered with GitHub phrases: [3], [4], [5]. They are not tied to the
16th worker and are spread across the remaining workers, but that shouldn't be
an issue either. As you can see in their history, they are not triggered that
often.
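
On the idea of running the benchmarks multiple times and taking the median:
a minimal Python sketch of why that helps with small inputs. The readings
below are hypothetical, not real Nexmark numbers, and `relative_change` is
just an illustrative helper. One run with about 1s of extra time looks like
a large degradation on its own, while the median of the repeated runs stays
at the baseline.

```python
from statistics import median


def relative_change(reading: float, baseline: float) -> float:
    """Relative difference of a reading against a baseline (0.5 == +50%)."""
    return (reading - baseline) / baseline


# Hypothetical timings in seconds for one query; one run hit a ~1s pause.
runs = [2.0, 2.1, 1.9, 3.0, 2.0]
baseline = 2.0  # hypothetical previous value for the same query

worst = max(runs)
print(f"worst single run vs baseline: {relative_change(worst, baseline):+.0%}")
print(f"median of runs vs baseline:   {relative_change(median(runs), baseline):+.0%}")
```

The median ignores a single outlier, so a lone CPU/GC pause would not show
up as a regression, whereas a real slowdown affecting most runs still would.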

[1] https://ci-beam.apache.org/computer/apache-beam-jenkins-16/load-statistics?type=min
[2] https://ci-beam.apache.org/label/beam-perf/
[3] https://ci-beam.apache.org/job/beam_PostCommit_Java_Nexmark_Flink_PR
[4] https://ci-beam.apache.org/job/beam_PostCommit_Java_Nexmark_Spark_PR
[5] https://ci-beam.apache.org/job/beam_PostCommit_Java_Nexmark_Direct_PR

On Mon, Jul 27, 2020 at 6:54 PM Valentyn Tymofieiev <valen...@google.com>
wrote:

> +1, thanks, Damian!
>
> > Are Spark and Flink runners benchmarking against local clusters on the
> Jenkins VMs?
>
> I believe that's the case and yes, the load on
> local-running benchmarks seems to be rather low, especially on some queries.
> Another avenue to improve the signal stability would be to run the
> benchmarks multiple times and analyze the 50th percentile of the readings.
>
> On Mon, Jul 27, 2020 at 9:47 AM Ismaël Mejía <ieme...@gmail.com> wrote:
>
>> Great analysis Damian thanks for taking a look and fixing this. Great
>> to know it was not anything related to Beam's code.
>>
>> I wonder if we should change the input size for the open
>> source runners (currently it is 1/10 of Dataflow's, which explains the big
>> difference in time), with the goal of detecting regressions better.
>> The current size is so small that adding 1s of extra time in some runs
>> looks like a 50-60% degradation, and we cannot know if this is due to
>> some small CPU/GC pause or a real regression. I wonder, however,
>> if this will negatively impact the worker utilization.
>>
>>
>> On Mon, Jul 27, 2020 at 4:07 PM Damian Gadomski
>> <damian.gadom...@polidea.com> wrote:
>> >
>> > Hey all,
>> >
>> > I've done a few checks to pinpoint the issue and it seems that I've
>> just fixed it.
>> >
>> > I didn't know this before, but the Flink, Spark and Direct Nexmark tests
>> run on a special Jenkins worker. The `apache-beam-jenkins-16` node is
>> labeled with `beam-perf`, so only these tests can execute there. I'm not
>> sure, because the configuration on the old CI is already gone, but I guess
>> that this worker was configured to have only one executor (which I had
>> missed). That would prevent concurrent execution of the jobs and
>> improve/stabilize the timings.
>> >
>> > That's how I have now configured the node, and it seems that the timings
>> are back to the pre-migration values:
>> http://104.154.241.245/d/ahuaA_zGz/nexmark?orgId=1&from=now-90d&to=now
>> >
>> > Dataflow was not affected because it wasn't restricted to run on
>> `apache-beam-jenkins-16`.
>> >
>> > Regards,
>> > Damian
>> >
>> >
>> > On Wed, Jul 22, 2020 at 5:11 PM Kenneth Knowles <k...@apache.org>
>> wrote:
>> >>
>> >> Are Spark and Flink runners benchmarking against local clusters on the
>> Jenkins VMs? Needless to say that is not a very controlled environment (and
>> of course not realistic scale). That is probably why Dataflow was not
>> affected. Is it possible that simply the different version of the Jenkins
>> worker software and/or the instructions from the Cloudbees instance cause
>> differing load?
>> >>
>> >> Kenn
>> >>
>> >> On Tue, Jul 21, 2020 at 4:17 PM Valentyn Tymofieiev <
>> valen...@google.com> wrote:
>> >>>
>> >>> FYI it looks like the transition to new Jenkins CI is visible on
>> Nexmark performance graphs[1][2]. Are new VM nodes less performant than old
>> ones?
>> >>>
>> >>> [1] http://104.154.241.245/d/ahuaA_zGz/nexmark?orgId=1&from=1587597387737&to=1595373387737&var-processingType=batch&var-ID=All&var-runner=All
>> >>> [2]
>> https://issues.apache.org/jira/browse/BEAM-10542?focusedCommentId=17162374&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17162374
>> >>>
>> >>> On Thu, Jun 18, 2020 at 3:32 PM Tyson Hamilton <tyso...@google.com>
>> wrote:
>> >>>>
>> >>>> Currently no. We're already experiencing a backlog of builds so the
>> additional load would be a problem. I've opened two related issues that I
>> think need completion before allowing non-committers to trigger tests:
>> >>>>
>> >>>> Load sharing improvements:
>> https://issues.apache.org/jira/browse/BEAM-10281
>> >>>> Admin access (maybe not required but nice to have):
>> https://issues.apache.org/jira/browse/BEAM-10280
>> >>>>
>> >>>> I created https://issues.apache.org/jira/browse/BEAM-10282 to track
>> opening up triggering for non-committers.
>> >>>>
>> >>>> On Thu, Jun 18, 2020 at 3:30 PM Luke Cwik <lc...@google.com> wrote:
>> >>>>>
>> >>>>> Was about to ask the same question, so can non-committers trigger
>> the tests now?
>> >>>>>
>> >>>>> On Thu, Jun 18, 2020 at 11:54 AM Heejong Lee <heej...@google.com>
>> wrote:
>> >>>>>>
>> >>>>>> This is awesome. Could non-committers also trigger the test now?
>> >>>>>>
>> >>>>>> On Wed, Jun 17, 2020 at 6:12 AM Damian Gadomski <
>> damian.gadom...@polidea.com> wrote:
>> >>>>>>>
>> >>>>>>> Hello,
>> >>>>>>>
>> >>>>>>> Good news, we've just migrated to the new CI:
>> https://ci-beam.apache.org. As of now, the Beam projects at
>> builds.apache.org are disabled.
>> >>>>>>>
>> >>>>>>> If you experience any issues with the new setup please let me
>> know, either here or on ASF slack.
>> >>>>>>>
>> >>>>>>> Regards,
>> >>>>>>> Damian
>> >>>>>>>
>> >>>>>>> On Mon, Jun 15, 2020 at 10:40 PM Damian Gadomski <
>> damian.gadom...@polidea.com> wrote:
>> >>>>>>>>
>> >>>>>>>> Happy to see your positive response :)
>> >>>>>>>>
>> >>>>>>>> @Udi Meiri, Thanks for pointing that out. I've checked it and
>> indeed it needs some attention.
>> >>>>>>>>
>> >>>>>>>> Based on my research, there are two things to handle:
>> >>>>>>>>
>> >>>>>>>> - Data uploaded by the performance and load test jobs directly to
>> InfluxDB: this should be handled automatically, as the new jobs will
>> upload the same data in the same way.
>> >>>>>>>> - Data fetched via the Jenkins API by the metrics tool
>> (syncjenkins.py): here the situation is a bit more complex, as the script
>> relies on the build number (it is actually used as a time reference, and
>> the primary key in the DB is created from it). To avoid refactoring the
>> script and migrating the database to use a timestamp instead of the build
>> number, I've just "fast-forwarded" the numbers on the new
>> https://ci-beam.apache.org to follow the current numbering from the old
>> CI. Therefore, a simple replacement of the Jenkins URL in the metrics
>> scripts should do the trick to have continuous metrics data. I'll check
>> that tomorrow on my local Grafana instance.
>> >>>>>>>>
>> >>>>>>>> Please let me know if there's anything that I missed.
>> >>>>>>>>
>> >>>>>>>> Regards,
>> >>>>>>>> Damian
>> >>>>>>>>
>> >>>>>>>> On Mon, Jun 15, 2020 at 8:05 PM Alexey Romanenko <
>> aromanenko....@gmail.com> wrote:
>> >>>>>>>>>
>> >>>>>>>>> Great! Thank you for working on this and letting us know.
>> >>>>>>>>>
>> >>>>>>>>> On 12 Jun 2020, at 16:58, Damian Gadomski <
>> damian.gadom...@polidea.com> wrote:
>> >>>>>>>>>
>> >>>>>>>>> Hello,
>> >>>>>>>>>
>> >>>>>>>>> During the last few days, I was preparing for the Beam Jenkins
>> migration from builds.apache.org to ci-beam.apache.org. The new Jenkins
>> Master will be dedicated only for Beam related jobs, all Beam Committers
>> will have build configure access, and Beam PMC will have Admin (GUI) Access.
>> >>>>>>>>>
>> >>>>>>>>> We (in cooperation with Infra) are almost ready for the
>> migration itself and I want to share with you the details of our plan. We
>> are planning to start the migration next week, most likely on Tuesday. I'll
>> keep you updated on the progress. We do not expect any issues nor the
>> outage of the CI services, everything should be more or less unnoticeable.
>> Just don't be surprised that the Jenkins URL will change to
>> https://ci-beam.apache.org
>> >>>>>>>>>
>> >>>>>>>>> If you are curious, here are the steps that we are going to
>> take:
>> >>>>>>>>>
>> >>>>>>>>> 1. Create 16 new CI nodes that will be connected to the new CI.
>> We will then have simultaneously running two CI servers.
>> >>>>>>>>> 2. Verify that new builds work as expected on the new instance
>> (compare results of cron builds). (a day or two would be sufficient)
>> >>>>>>>>> 3. Move the responsibility of Phrase/PR/Commit builds to the
>> new CI, disable on the old one.
>> >>>>>>>>> 4. Modify the .test-infra/jenkins/README.md to point to the new
>> instance and replace Post-commit tests status in README.md and
>> .github/PULL_REQUEST_TEMPLATE.md
>> >>>>>>>>> 5. Disable the jobs on the old Jenkins and add a description to
>> each job with the URL to the corresponding one on the new CI.
>> >>>>>>>>> 6. Turn off VM instances of the old nodes.
>> >>>>>>>>> 7. Remove VM instances of the old nodes.
>> >>>>>>>>>
>> >>>>>>>>> In case of any questions or doubts feel free to ask :)
>> >>>>>>>>>
>> >>>>>>>>> Regards,
>> >>>>>>>>> Damian
>> >>>>>>>>>
>> >>>>>>>>>
>>
>
