Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-13 Thread Xiao Li
Thank you very much, Shane! Xiao On Mon, Jul 13, 2020 at 10:15 AM shane knapp ☠ wrote: > alright, the system load graphs show that we've had a generally decreasing > load since friday, and have burned through ~3k builds/day since the reboot > last week! i don't see many timeouts, and the PRB

Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-13 Thread shane knapp ☠
alright, the system load graphs show that we've had a generally decreasing load since friday, and have burned through ~3k builds/day since the reboot last week! i don't see many timeouts, and the PRB builds have been generally green for a couple of days. again, i will keep an eye on things but i

Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-10 Thread shane knapp ☠
no, 8 hours is plenty. things will speed up soon once the backlog of builds works through. i limited the number of PRB builds to 4 per worker, and things are looking better. let's see how we look next week. On Fri, Jul 10, 2020 at 3:31 PM Frank Yin wrote: > Can we also increase the build

Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-10 Thread Frank Yin
Yeah, that's what I figured -- those workers are under load. Thanks. On Fri, Jul 10, 2020 at 12:43 PM shane knapp ☠ wrote: > only 125561, 125562 and 125564 were impacted by -9. > > 125565 exited w/a code of 15 (143 - 128), which means the process was > terminated for unknown reasons. > > 125563

Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-10 Thread shane knapp ☠
only 125561, 125562 and 125564 were impacted by -9. 125565 exited w/a code of 15 (143 - 128), which means the process was terminated for unknown reasons. 125563 looks like mima failed due to a bunch of errors. i just spot checked a bunch of recent failed PRB builds from today and they all
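The "(143 - 128)" arithmetic above relies on the usual POSIX shell convention: a child terminated by signal N reports exit status 128 + N. So 143 decodes to signal 15 (SIGTERM), and a `kill -9` shows up as 137. The following is a minimal Python sketch of that decoding. It assumes the build log reports shell-style statuses, and the helper name `describe_exit_code` is hypothetical.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a shell-style exit status (hypothetical helper).

    Shells report a child killed by signal N as 128 + N,
    so 137 -> SIGKILL (9) and 143 -> SIGTERM (15).
    """
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not defined on this platform.
            name = f"unknown signal {signum}"
        return f"killed by signal {signum} ({name})"
    return f"exited with status {code}"


if __name__ == "__main__":
    for code in (0, 1, 137, 143):
        print(code, "->", describe_exit_code(code))
```

Python's own `subprocess` module reports a signal death differently, as a negative `returncode` (e.g. -15). This helper therefore only applies to statuses that have passed through a shell.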

Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-10 Thread shane knapp ☠
define "a lot" and provide some links to those builds, please. there are roughly 2000 builds per day, and i can't do more than keep a cursory eye on things. the infrastructure that the tests run on hasn't changed one bit on any of the workers, and 'kill -9' could be a timeout, flakiness caused

Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-10 Thread shane knapp ☠
yeah, i can't do much for flaky tests... just flaky infrastructure. On Fri, Jul 10, 2020 at 12:41 AM Hyukjin Kwon wrote: > A couple of flaky tests can happen; that's usual. It seems to have gotten better now, at least. I will keep monitoring the builds. > > On Fri, Jul 10, 2020 at 4:33 PM ukby1234 wrote: > >>

Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-10 Thread Hyukjin Kwon
A couple of flaky tests can happen; that's usual. It seems to have gotten better now, at least. I will keep monitoring the builds. On Fri, Jul 10, 2020 at 4:33 PM ukby1234 wrote: > Looks like Jenkins still isn't stable. My PR failed twice in a row: > >

Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-10 Thread ukby1234
Looks like Jenkins still isn't stable. My PR failed twice in a row: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125565/console https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125536/testReport

Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-09 Thread shane knapp ☠
i'm seeing green PRB builds now, so i feel that we've gotten things building again! :) On Thu, Jul 9, 2020 at 5:33 PM Hyukjin Kwon wrote: > Thank you, Shane. > > On Fri, Jul 10, 2020 at 2:35 AM shane knapp ☠ wrote: > >> and -06 is back! i'll keep an eye on things today, but suffice to >> say on

Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-09 Thread Hyukjin Kwon
Thank you, Shane. On Fri, Jul 10, 2020 at 2:35 AM shane knapp ☠ wrote: > and -06 is back! i'll keep an eye on things today, but suffice to say > on each worker i: > > 1) rebooted > 2) cleaned ~/.ivy2, ~/.m2, and other associated caches > > we should be g2g! please reply here if you continue to

Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-09 Thread shane knapp ☠
and -06 is back! i'll keep an eye on things today, but suffice to say on each worker i: 1) rebooted 2) cleaned ~/.ivy2, ~/.m2, and other associated caches we should be g2g! please reply here if you continue to see weirdness. On Thu, Jul 9, 2020 at 10:08 AM shane knapp ☠ wrote: > ok,
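The cleanup step above deletes the per-user Ivy and Maven dependency caches so that the next build resolves them again. The following is a minimal Python sketch, assuming the caches live directly under the worker's home directory. The function `clean_build_caches` is hypothetical. It covers only `~/.ivy2` and `~/.m2` because the thread does not list the "other associated caches".

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path

# Only the caches named in the thread; the other caches were not specified.
CACHE_DIRS = (".ivy2", ".m2")


def clean_build_caches(home: Path, dry_run: bool = True) -> list[Path]:
    """Delete the Ivy/Maven caches under `home` (hypothetical helper).

    Returns the cache directories that were found. With dry_run=True
    (the default) nothing is deleted, so the list can be reviewed first.
    """
    found = []
    for name in CACHE_DIRS:
        path = home / name
        if path.is_dir():
            found.append(path)
            if not dry_run:
                shutil.rmtree(path)
    return found


if __name__ == "__main__":
    # Demo against a throwaway "home" directory, never the real one.
    with tempfile.TemporaryDirectory() as tmp:
        fake_home = Path(tmp)
        (fake_home / ".ivy2" / "cache").mkdir(parents=True)
        (fake_home / ".m2" / "repository").mkdir(parents=True)
        for p in clean_build_caches(fake_home, dry_run=False):
            print("removed", p.name)
```

Defaulting to a dry run is a deliberate design choice. A mistyped `home` argument then only lists directories instead of deleting them.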

Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-09 Thread shane knapp ☠
ok, we're back up and building (just waiting for one worker, -06 to finish cleaning itself up). On Thu, Jul 9, 2020 at 9:30 AM shane knapp ☠ wrote: > this is happening now. > > On Wed, Jul 8, 2020 at 9:07 AM shane knapp ☠ wrote: > >> this will be happening tomorrow... today is Meeting Hell

Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-09 Thread Dongjoon Hyun
Thank you always, Shane! Bests, Dongjoon. On Thu, Jul 9, 2020 at 9:30 AM shane knapp ☠ wrote: > this is happening now. > > On Wed, Jul 8, 2020 at 9:07 AM shane knapp ☠ wrote: > >> this will be happening tomorrow... today is Meeting Hell Day[tm]. >> >> On Tue, Jul 7, 2020 at 1:59 PM shane

Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-09 Thread shane knapp ☠
this is happening now. On Wed, Jul 8, 2020 at 9:07 AM shane knapp ☠ wrote: > this will be happening tomorrow... today is Meeting Hell Day[tm]. > > On Tue, Jul 7, 2020 at 1:59 PM shane knapp ☠ wrote: > >> i wasn't able to get to it today, so i'm hoping to squeeze in a quick >> trip to the colo

Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-09 Thread Jungtaek Lim
As a side note, I've raised patches to address two frequently flaky tests, CliSuite [1] and HiveSessionImplSuite [2]. Hopefully this helps mitigate the situation. 1. https://github.com/apache/spark/pull/29036 2. https://github.com/apache/spark/pull/29039 On Thu, Jul 9, 2020 at 11:51 AM Hyukjin

Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-08 Thread Hyukjin Kwon
Thanks Shane! BTW, it's getting serious, e.g. https://github.com/apache/spark/pull/28969: the tests have not passed in 7 days. Hopefully restarting the machines will make the current situation better :-) Separately, I am working on a PR to run the Spark tests in GitHub Actions. We could

Re: restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-08 Thread shane knapp ☠
this will be happening tomorrow... today is Meeting Hell Day[tm]. On Tue, Jul 7, 2020 at 1:59 PM shane knapp ☠ wrote: > i wasn't able to get to it today, so i'm hoping to squeeze in a quick trip > to the colo tomorrow morning. if not, then first thing thursday. > > -- > Shane Knapp > Computer

restarting jenkins build system tomorrow (7/8) ~930am PDT

2020-07-07 Thread shane knapp ☠
i wasn't able to get to it today, so i'm hoping to squeeze in a quick trip to the colo tomorrow morning. if not, then first thing thursday. -- Shane Knapp Computer Guy / Voice of Reason UC Berkeley EECS Research / RISELab Staff Technical Lead https://rise.cs.berkeley.edu