hey all, work is going quite well and smoothly for this project. today's update:
we will experience significant downtime monday/tuesday as we spin up the new primary jenkins node. until then, we'll be building over the next few days so i'll have a chance to better track down and fix any system-level build breaks.

but most importantly, i just added 3 of the 4 new ubuntu 20.04 workers to the pool: research-jenkins-worker-03, 04 and 06. -05 is being difficult, so i'm going to let it pout in the corner for a while before hitting it again w/the ansible cannon.

shane

On Tue, Nov 24, 2020 at 6:08 PM shane knapp ☠ <skn...@berkeley.edu> wrote:

> all spark builds have been ported and triggered:
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/
>
> not shown are the regular and k8s PRB, which are also running.
>
> i think i've nailed down most of the stupid PATH and JAVA_HOME issues, but
> i'm sure we'll have some stuff to work out. i'm mostly keeping an eye on
> the build history of research-jenkins-worker-01 and -02, as they're running
> the latest OS + ansible (which will be moved into the spark repo asap).
>
> i'm still concerned about sbt failures, which includes the PRB. we'll see
> how things go, and just focus on getting things working on ubuntu 20 LTS.
> if we need to drop the ubuntu 16 workers from the pool temporarily, i would
> be more than happy to do that. we'll lose some capacity, but it looks like
> we have a solid template for getting these suckers redeployed so
> turn-around should be pretty quick.
>
> we also need to dedicate some time to clean up/fix our plugin configs.
> there's been a lot of change over the past three years and things like PRB
> triggers seem flaky (it took 28m instead of 5m for this job to trigger:
> https://github.com/apache/spark/pull/29994)
>
> this all being said, i'm really happy w/our progress so far and have
> started leaning towards 'cautiously optimistic'... we'll see how things go
> and recalibrate accordingly.
> i'll have a better idea of where we are
> tomorrow and keep the list updated.
>
> and finally: a HUGE thanks goes out to jon for the work going on at the
> colo this moment: rack rearrangement, cleaning up networking, fixing
> hardware, reimaging and generally kicking ass!
>
> have a great holiday!
>
> shane
>
> On Tue, Nov 24, 2020 at 2:24 PM shane knapp ☠ <skn...@berkeley.edu> wrote:
>
>> our very first ubuntu-based PRB is running:
>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131701/
>>
>> crossing my fingers! :)
>>
>> On Tue, Nov 24, 2020 at 1:30 PM shane knapp ☠ <skn...@berkeley.edu>
>> wrote:
>>
>>> due to scheduling, upcoming holiday and in-the-colo work requirements,
>>> all of the centos workers are being wiped NOW.
>>>
>>> this is great, as the sooner we can get started on fixing builds the
>>> better. i'm not going anywhere over the holiday, so i'll get a good
>>> head-start on things.
>>>
>>> thank you jon!
>>>
>>> shane
>>>
>>> On Tue, Nov 24, 2020 at 11:24 AM shane knapp ☠ <skn...@berkeley.edu>
>>> wrote:
>>>
>>>> this is a lengthy, but important read for everyone here.
>>>>
>>>> in the next few days, the remaining centos machines (PRB/SBT workers
>>>> AND primary) will be reimaged from centos 6.9 to ubuntu 20.04 LTS.
>>>>
>>>> this means three important things on the very near horizon:
>>>> 1 -- the PRB and SBT tests WILL BE BROKEN (by thanksgiving)
>>>> 2 -- jenkins itself will be down for a while as we move the jenkins
>>>> installation to its new home.
>>>> 3 -- those of you with accounts here will temporarily lose access
>>>>
>>>> regarding (1), brian (cced) will be helping me debug and fix any
>>>> system-level bugs (python envs, missing packages, etc). jon (cced) will be
>>>> doing the reimaging and cobbling together of hardware to keep us on our
>>>> feet. their help is going to be invaluable to getting us back on the
>>>> ground.
>>>>
>>>> we already have two ubuntu 20 workers up and building
>>>> (research-jenkins-worker-0[1,2]), and the SparkPullRequestBuilder-K8s build
>>>> is already green. i'll keep an eye on these workers to ensure i didn't
>>>> miss anything.
>>>>
>>>> once we have a couple more ubuntu 20 machines up, i'll move the PRB
>>>> and SBT builds there and let them fail as often as possible so we can use
>>>> the build logs during the migration of the primary.
>>>>
>>>> then we shut down jenkins and move to the new primary.
>>>>
>>>> this will all be happening in the next week to week-and-a-half.
>>>>
>>>> nearish on the horizon, we need to do three things:
>>>> 1 -- reimage the ubuntu 16 workers
>>>> 2 -- clean up all of the breakages within the jenkins plugin universe.
>>>> there's a lot of stacktraces everywhere after the upgrade, but things are
>>>> still building so i'm inclined to push this out.
>>>> 3 -- fix the PRB/SBT builds.
>>>>
>>>> further off, once we're stable, we (the spark community) will need to
>>>> have an honest conversation about where the build system lives. we don't
>>>> currently have enough resources here to manage the system in a way that it
>>>> deserves, and i can't foresee getting the staffing for long-term support any
>>>> time soon.
>>>>
>>>> however, with the ansible configs (which i plan on moving to the spark
>>>> repo), it should be much easier to replicate the build system.
>>>>
>>>> by this time next year, i would like to have helped find the build
>>>> system a new home, and sunset jenkins. over the past 11 years (i think),
>>>> this system has built spark. it's getting a little tired and needs a well
>>>> deserved break.
>>>> :)
>>>>
>>>> shane
>>>> --
>>>> Shane Knapp
>>>> Computer Guy / Voice of Reason
>>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>>> https://rise.cs.berkeley.edu