+1, thanks, Damian!

> Are Spark and Flink runners benchmarking against local clusters on the
> Jenkins VMs?

I believe that's the case, and yes, the load on locally running
benchmarks seems to be rather low, especially on some queries.
Another way to improve signal stability would be to run the
benchmarks multiple times and analyze the median (50th percentile) of the
readings.
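
To make the median (50th percentile) idea concrete, here is a minimal
sketch. It is not taken from the Nexmark suite or the Beam metrics scripts:
`time_runs`, `median_reading` and the `run_once` callable are hypothetical
names, and the sample readings in the note below are made up for
illustration.

```python
import statistics
import time
from typing import Callable, List


def time_runs(run_once: Callable[[], None], repeats: int = 5) -> List[float]:
    """Run a benchmark callable `repeats` times and return wall-clock seconds.

    `run_once` is a hypothetical stand-in for launching one benchmark query.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    readings: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        readings.append(time.perf_counter() - start)
    return readings


def median_reading(readings: List[float]) -> float:
    """Return the 50th percentile of the readings.

    A single slow run (for example a GC pause) moves the median far less
    than it moves the mean.
    """
    return statistics.median(readings)
```

With made-up readings of 2.0, 2.1, 3.1, 2.0 and 2.2 seconds, the median is
2.1 s while the mean is 2.28 s, so the single 3.1 s outlier barely affects
the reported value.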

On Mon, Jul 27, 2020 at 9:47 AM Ismaël Mejía <ieme...@gmail.com> wrote:

> Great analysis, Damian. Thanks for taking a look and fixing this. Great
> to know it was not anything related to Beam's code.
>
> I wonder if we should change the input size for the open
> source runners (currently it is 1/10 of Dataflow's, which explains the big
> difference in time), with the goal of detecting regressions better.
> The current size is so small that adding 1s of extra time in some runs
> looks like a 50-60% degradation, and we cannot know if this is due to
> some small CPU/GC pause or a real regression. I wonder, however,
> if this will negatively impact the worker utilization.
>
>
> On Mon, Jul 27, 2020 at 4:07 PM Damian Gadomski
> <damian.gadom...@polidea.com> wrote:
> >
> > Hey all,
> >
> > I've done a few checks to pinpoint the issue and it seems that I've just
> > fixed it.
> >
> > I didn't know this before, but the Flink, Spark and Direct Nexmark tests
> > run on a special Jenkins worker. The `apache-beam-jenkins-16` node is
> > labeled with `beam-perf`, so only these tests can execute there. I'm not
> > sure, because the configuration on the old CI is already gone, but I guess
> > that this worker was configured to have only one executor (which I had
> > missed). That would prevent concurrent execution of the jobs and
> > improve/stabilize the timings.
> >
> > That's how I have now configured the node, and it seems that the timings
> > are back to the pre-migration values:
> > http://104.154.241.245/d/ahuaA_zGz/nexmark?orgId=1&from=now-90d&to=now
> >
> > Dataflow was not affected because it wasn't restricted to run on
> > `apache-beam-jenkins-16`.
> >
> > Regards,
> > Damian
> >
> >
> > On Wed, Jul 22, 2020 at 5:11 PM Kenneth Knowles <k...@apache.org> wrote:
> >>
> >> Are Spark and Flink runners benchmarking against local clusters on the
> >> Jenkins VMs? Needless to say that is not a very controlled environment (and
> >> of course not realistic scale). That is probably why Dataflow was not
> >> affected. Is it possible that simply the different version of the Jenkins
> >> worker software and/or the instructions from the Cloudbees instance cause
> >> differing load?
> >>
> >> Kenn
> >>
> >> On Tue, Jul 21, 2020 at 4:17 PM Valentyn Tymofieiev <
> valen...@google.com> wrote:
> >>>
> >>> FYI it looks like the transition to new Jenkins CI is visible on
> >>> Nexmark performance graphs[1][2]. Are new VM nodes less performant than old
> >>> ones?
> >>>
> >>> [1] http://104.154.241.245/d/ahuaA_zGz/nexmark?orgId=1&from=1587597387737&to=1595373387737&var-processingType=batch&var-ID=All&var-runner=All
> >>> [2] https://issues.apache.org/jira/browse/BEAM-10542?focusedCommentId=17162374&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17162374
> >>>
> >>> On Thu, Jun 18, 2020 at 3:32 PM Tyson Hamilton <tyso...@google.com>
> wrote:
> >>>>
> >>>> Currently no. We're already experiencing a backlog of builds so the
> additional load would be a problem. I've opened two related issues that I
> think need completion before allowing non-committers to trigger tests:
> >>>>
> >>>> Load sharing improvements:
> https://issues.apache.org/jira/browse/BEAM-10281
> >>>> Admin access (maybe not required but nice to have):
> https://issues.apache.org/jira/browse/BEAM-10280
> >>>>
> >>>> I created https://issues.apache.org/jira/browse/BEAM-10282 to track
> opening up triggering for non-committers.
> >>>>
> >>>> On Thu, Jun 18, 2020 at 3:30 PM Luke Cwik <lc...@google.com> wrote:
> >>>>>
> >>>>> Was about to ask the same question, so can non-committers trigger
> the tests now?
> >>>>>
> >>>>> On Thu, Jun 18, 2020 at 11:54 AM Heejong Lee <heej...@google.com>
> wrote:
> >>>>>>
> >>>>>> This is awesome. Could non-committers also trigger the test now?
> >>>>>>
> >>>>>> On Wed, Jun 17, 2020 at 6:12 AM Damian Gadomski <
> damian.gadom...@polidea.com> wrote:
> >>>>>>>
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> Good news, we've just migrated to the new CI:
> >>>>>>> https://ci-beam.apache.org. As of now, the Beam projects at
> >>>>>>> builds.apache.org are disabled.
> >>>>>>>
> >>>>>>> If you experience any issues with the new setup please let me
> know, either here or on ASF slack.
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>> Damian
> >>>>>>>
> >>>>>>> On Mon, Jun 15, 2020 at 10:40 PM Damian Gadomski <
> damian.gadom...@polidea.com> wrote:
> >>>>>>>>
> >>>>>>>> Happy to see your positive response :)
> >>>>>>>>
> >>>>>>>> @Udi Meiri, Thanks for pointing that out. I've checked it and
> indeed it needs some attention.
> >>>>>>>>
> >>>>>>>> Based on my research, there are two things to handle:
> >>>>>>>>
> >>>>>>>> - Data uploaded directly to InfluxDB by the performance and load
> >>>>>>>> test jobs. This should be handled automatically, as the new jobs will
> >>>>>>>> upload the same data in the same way.
> >>>>>>>> - Data fetched through the Jenkins API by the metrics tool
> >>>>>>>> (syncjenkins.py). Here the situation is a bit more complex, as the script
> >>>>>>>> relies on the build number (it is actually used as a time reference, and
> >>>>>>>> the primary key in the DB is created from it). To avoid refactoring the
> >>>>>>>> script and migrating the database to use a timestamp instead of the build
> >>>>>>>> number, I've just "fast-forwarded" the numbers on the new
> >>>>>>>> https://ci-beam.apache.org to follow the current numbering from the old
> >>>>>>>> CI. Therefore, simply replacing the Jenkins URL in the metrics scripts
> >>>>>>>> should do the trick to keep the metrics data continuous. I'll check that
> >>>>>>>> tomorrow on my local Grafana instance.
> >>>>>>>>
> >>>>>>>> Please let me know if there's anything that I missed.
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>> Damian
> >>>>>>>>
> >>>>>>>> On Mon, Jun 15, 2020 at 8:05 PM Alexey Romanenko <
> aromanenko....@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Great! Thank you for working on this and letting us know.
> >>>>>>>>>
> >>>>>>>>> On 12 Jun 2020, at 16:58, Damian Gadomski <
> damian.gadom...@polidea.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Hello,
> >>>>>>>>>
> >>>>>>>>> During the last few days, I was preparing for the Beam Jenkins
> migration from builds.apache.org to ci-beam.apache.org. The new Jenkins
> Master will be dedicated only for Beam related jobs, all Beam Committers
> will have build configure access, and Beam PMC will have Admin (GUI) Access.
> >>>>>>>>>
> >>>>>>>>> We (in cooperation with Infra) are almost ready for the
> >>>>>>>>> migration itself and I want to share with you the details of our plan. We
> >>>>>>>>> are planning to start the migration next week, most likely on Tuesday. I'll
> >>>>>>>>> keep you updated on the progress. We do not expect any issues or an
> >>>>>>>>> outage of the CI services; everything should be more or less unnoticeable.
> >>>>>>>>> Just don't be surprised that the Jenkins URL will change to
> >>>>>>>>> https://ci-beam.apache.org
> >>>>>>>>>
> >>>>>>>>> If you are curious, here are the steps that we are going to take:
> >>>>>>>>>
> >>>>>>>>> 1. Create 16 new CI nodes that will be connected to the new CI.
> >>>>>>>>> We will then have two CI servers running simultaneously.
> >>>>>>>>> 2. Verify that new builds work as expected on the new instance
> (compare results of cron builds). (a day or two would be sufficient)
> >>>>>>>>> 3. Move the responsibility of Phrase/PR/Commit builds to the new
> CI, disable on the old one.
> >>>>>>>>> 4. Modify the .test-infra/jenkins/README.md to point to the new
> instance and replace Post-commit tests status in README.md and
> .github/PULL_REQUEST_TEMPLATE.md
> >>>>>>>>> 5. Disable the jobs on the old Jenkins and add a description to
> each job with the URL to the corresponding one on the new CI.
> >>>>>>>>> 6. Turn off VM instances of the old nodes.
> >>>>>>>>> 7. Remove VM instances of the old nodes.
> >>>>>>>>>
> >>>>>>>>> In case of any questions or doubts feel free to ask :)
> >>>>>>>>>
> >>>>>>>>> Regards,
> >>>>>>>>> Damian
> >>>>>>>>>
> >>>>>>>>>
>
