no, 8 hours is plenty. things will speed up once the backlog of builds works through... i limited the number of PRB builds to 4 per worker, and things are looking better. let's see how we look next week.
On Fri, Jul 10, 2020 at 3:31 PM Frank Yin <ukby.1...@gmail.com> wrote:

> Can we also increase the build timeout?
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125617
> This one fails because it times out, not because of test failures.
>
> On Fri, Jul 10, 2020 at 2:16 PM Frank Yin <ukby.1...@gmail.com> wrote:
>
>> Yeah, that's what I figured -- those workers are under load. Thanks.
>>
>> On Fri, Jul 10, 2020 at 12:43 PM shane knapp ☠ <skn...@berkeley.edu>
>> wrote:
>>
>>> only 125561, 125562 and 125564 were impacted by -9.
>>>
>>> 125565 exited on signal 15 (exit status 143 - 128), i.e. SIGTERM, which
>>> means the process was terminated for unknown reasons.
>>>
>>> 125563 looks like mima failed due to a bunch of errors.
>>>
>>> i just spot checked a bunch of recent failed PRB builds from today and
>>> they all seemed to be legit.
>>>
>>> another thing that might be happening is an overload of PRB builds on
>>> the workers due to the backlog... the workers are under a LOT of load
>>> right now, and i can put some rate limiting in to see if that helps out.
>>>
>>> shane
>>>
>>> On Fri, Jul 10, 2020 at 11:31 AM Frank Yin <ukby.1...@gmail.com> wrote:
>>>
>>>> For example, build numbers 125561 through 125565 were all impacted by
>>>> kill -9.
>>>>
>>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125565/console
>>>>
>>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125564/console
>>>>
>>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125563/console
>>>>
>>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125562/console
>>>>
>>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125561/console
>>>>
>>>> On Fri, Jul 10, 2020 at 9:35 AM shane knapp ☠ <skn...@berkeley.edu>
>>>> wrote:
>>>>
>>>>> define "a lot" and provide some links to those builds, please. there
>>>>> are roughly 2000 builds per day, and i can't do more than keep a
>>>>> cursory eye on things.
>>>>>
>>>>> the infrastructure that the tests run on hasn't changed one bit on any
>>>>> of the workers, and 'kill -9' could be a timeout, flakiness caused by
>>>>> old build processes remaining on the workers after the master went
>>>>> down, or me trying to clean things up w/o a reboot. or, perhaps,
>>>>> something wrong w/the infra. :)
>>>>>
>>>>> On Fri, Jul 10, 2020 at 9:28 AM Frank Yin <ukby.1...@gmail.com> wrote:
>>>>>
>>>>>> Agreed, but I've seen a lot of builds killed by signal 9. I'm
>>>>>> assuming that's infrastructure?
>>>>>>
>>>>>> On Fri, Jul 10, 2020 at 8:19 AM shane knapp ☠ <skn...@berkeley.edu>
>>>>>> wrote:
>>>>>>
>>>>>>> yeah, i can't do much for flaky tests... just flaky infrastructure.
>>>>>>>
>>>>>>> On Fri, Jul 10, 2020 at 12:41 AM Hyukjin Kwon <gurwls...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> A couple of flaky tests can happen; that's usual. It seems to have
>>>>>>>> gotten better now, at least. I will keep monitoring the builds.
>>>>>>>>
>>>>>>>> On Fri, Jul 10, 2020 at 4:33 PM ukby1234 <ukby.1...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Looks like Jenkins still isn't stable. My PR failed two times in
>>>>>>>>> a row:
>>>>>>>>>
>>>>>>>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125565/console
>>>>>>>>>
>>>>>>>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125536/testReport
>>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Shane Knapp
>>>>>>> Computer Guy / Voice of Reason
>>>>>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>>>>>> https://rise.cs.berkeley.edu
>>>>>>>

--
Shane Knapp
Computer Guy / Voice of Reason
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu
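
For anyone reading these console logs later: the "143 - 128" arithmetic quoted in the thread follows the common shell convention that an exit status above 128 means the process died on signal (status - 128). Below is a minimal Python sketch of that decoding. It assumes the status reported by the build follows that convention, and `describe_exit_status` is a hypothetical helper name, not part of the Jenkins setup discussed here.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Describe a shell-style exit status (hypothetical helper).

    Assumes the shell convention that a status above 128 means the
    process was terminated by signal number (status - 128).
    """
    if status > 128:
        signum = status - 128
        try:
            # signal.Signals is the stdlib enum of signals known on this platform
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited with code {status}"


if __name__ == "__main__":
    # 143 is the status discussed for build 125565; 137 corresponds to kill -9
    for status in (143, 137, 1):
        print(status, "->", describe_exit_status(status))
```

On Linux this reports 143 as signal 15 (SIGTERM) and 137 as signal 9 (SIGKILL). It only names the signal; it cannot tell who sent it (a timeout, an out-of-memory kill, or a manual cleanup).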