alright, builds are looking solid except for SBT... if someone here could take a look at those failures i'd be most appreciative.
the important ones: PRB, PRB-K8s, k8s, snapshot and maven builds all green!

i'm literally gobsmacked by how smoothly this went. :)

we're all going to enjoy a mellow holiday and i'll check build statuses every now and then and see if i find anything else like this:
https://issues.apache.org/jira/browse/SPARK-33565

have a great holiday everyone! we'll start getting the new primary set up on monday, and hopefully by tuesday be fully up and running.

shane

On Wed, Nov 25, 2020 at 1:35 PM shane knapp ☠ <skn...@berkeley.edu> wrote:

> hey all, work is going quite well and smoothly for this project.
>
> today's update:
>
> we will experience significant downtime monday/tuesday as we spin up the
> new primary jenkins node. until then, we'll be building over the next few
> days so i'll have a chance to better track down and fix any system-level
> build breaks.
>
> but most importantly, i just added 3 of the 4 new ubuntu 20.04 workers to
> the pool: research-jenkins-worker-03, 04 and 06. -05 is being difficult,
> so i'm going to let it pout in the corner for a while before hitting it
> again w/the ansible cannon.
>
> shane
>
> On Tue, Nov 24, 2020 at 6:08 PM shane knapp ☠ <skn...@berkeley.edu> wrote:
>
>> all spark builds have been ported and triggered:
>>
>> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/
>>
>> not shown are the regular and k8s PRB, which are also running.
>>
>> i think i've nailed down most of the stupid PATH and JAVA_HOME issues,
>> but i'm sure we'll have some stuff to work out. i'm mostly keeping an eye
>> on the build history of research-jenkins-worker-01 and -02, as they're
>> running the latest OS + ansible (which will be moved in to the spark repo
>> asap).
>>
>> i'm still concerned about sbt failures, which includes the PRB. we'll
>> see how things go, and just focus on getting things working on ubuntu 20
>> LTS. if we need to drop the ubuntu 16 workers from the pool temporarily, i
>> would be more than happy to do that.
>> we'll lose some capacity, but it
>> looks like we have a solid template for getting these suckers redeployed so
>> turn-around should be pretty quick.
>>
>> we also need to dedicate some time to clean up/fix our plugin configs.
>> there's been a lot of change over the past three years and things like PRB
>> triggers seem flaky (it took 28m instead of 5m for this job to trigger:
>> https://github.com/apache/spark/pull/29994)
>>
>> this all being said, i'm really happy w/our progress so far and have
>> started leaning towards 'cautiously optimistic'... we'll see how things go
>> and recalibrate accordingly. i'll have a better idea of where we are
>> tomorrow and keep the list updated.
>>
>> and finally: a HUGE thanks goes out to jon for the work going on at the
>> colo this moment: rack rearrangement, cleaning up networking, fixing
>> hardware, reimaging and generally kicking ass!
>>
>> have a great holiday!
>>
>> shane
>>
>> On Tue, Nov 24, 2020 at 2:24 PM shane knapp ☠ <skn...@berkeley.edu>
>> wrote:
>>
>>> our very first ubuntu-based PRB is running:
>>>
>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131701/
>>>
>>> crossing my fingers! :)
>>>
>>> On Tue, Nov 24, 2020 at 1:30 PM shane knapp ☠ <skn...@berkeley.edu>
>>> wrote:
>>>
>>>> due to scheduling, upcoming holiday and in-the-colo work requirements,
>>>> all of the centos workers are being wiped NOW.
>>>>
>>>> this is great, as the sooner we can get started on fixing builds the
>>>> better. i'm not going anywhere over the holiday, so i'll get a good
>>>> head-start on things.
>>>>
>>>> thank you jon!
>>>>
>>>> shane
>>>>
>>>> On Tue, Nov 24, 2020 at 11:24 AM shane knapp ☠ <skn...@berkeley.edu>
>>>> wrote:
>>>>
>>>>> this is a lengthy, but important read for everyone here.
>>>>>
>>>>> in the next few days, the remaining centos machines (PRB/SBT workers
>>>>> AND primary) will have to be reimaged from centos6.9 to ubuntu 20.04LTS.
>>>>>
>>>>> this means three important things on the very near horizon:
>>>>> 1 -- the PRB and SBT tests WILL BE BROKEN (by thanksgiving)
>>>>> 2 -- jenkins itself will be down for a while as we move the jenkins
>>>>> installation to its new home.
>>>>> 3 -- those of you with accounts here will temporarily lose access
>>>>>
>>>>> regarding (1), brian (cced) will be helping me debug and fix any
>>>>> system-level bugs (python envs, missing packages, etc). jon (cced) will
>>>>> be doing the reimaging and cobbling together of hardware to keep us on
>>>>> our feet. their help is going to be invaluable to getting us back on the
>>>>> ground.
>>>>>
>>>>> we already have two ubuntu 20 workers up and building
>>>>> (research-jenkins-worker-0[1,2]), and the SparkPullRequestBuilder-K8s
>>>>> build is already green. i'll keep an eye on these workers to ensure i
>>>>> didn't miss anything.
>>>>>
>>>>> once we have a couple more ubuntu 20 machines up, i'll move the PRB
>>>>> and SBT builds there and let them fail as often as possible so we can use
>>>>> the build logs during the migration of the primary.
>>>>>
>>>>> then we shut down jenkins and move to the new primary.
>>>>>
>>>>> this will all be happening in the next week to week-and-a-half.
>>>>>
>>>>> nearish on the horizon, we need to do three things:
>>>>> 1 -- reimage the ubuntu 16 workers
>>>>> 2 -- clean up all of the breakages within the jenkins plugin
>>>>> universe. there's a lot of stacktraces everywhere after the upgrade, but
>>>>> things are still building so i'm inclined to push this out.
>>>>> 3 -- fix the PRB/SBT builds.
>>>>>
>>>>> further off, once we're stable, we (the spark community) will need to
>>>>> have an honest conversation about where the build system lives. we don't
>>>>> currently have enough resources here to manage the system in a way that it
>>>>> deserves, and i can't foresee getting the staffing for long-term support
>>>>> any time soon.
>>>>>
>>>>> however, with the ansible configs (which i plan on moving to the spark
>>>>> repo), it should be much easier to replicate the build system.
>>>>>
>>>>> by this time next year, i would like to have helped find the build
>>>>> system a new home, and sunset jenkins. over the past 11 years (i think),
>>>>> this system has built spark. it's getting a little tired and needs a well
>>>>> deserved break. :)
>>>>>
>>>>> shane
>>>>> --
>>>>> Shane Knapp
>>>>> Computer Guy / Voice of Reason
>>>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>>>> https://rise.cs.berkeley.edu