Thank you very much, Shane!
Xiao
On Mon, Jul 13, 2020 at 10:15 AM shane knapp ☠ wrote:
alright, the system load graphs show that we've had a generally decreasing
load since friday, and have burned through ~3k builds/day since the reboot
last week! i don't see many timeouts, and the PRB builds have been
generally green for a couple of days.
again, i will keep an eye on things but i
no, 8 hours is plenty. things will speed up soon once the backlog of
builds works through. i limited the number of PRB builds to 4 per
worker, and things are looking better. let's see how we look next week.
On Fri, Jul 10, 2020 at 3:31 PM Frank Yin wrote:
> Can we also increase the build
Yeah, that's what I figured -- those workers are under load. Thanks.
On Fri, Jul 10, 2020 at 12:43 PM shane knapp ☠ wrote:
only 125561, 125562 and 125564 were impacted by -9.
125565 exited w/a code of 15 (143 - 128), which means the process was
terminated for unknown reasons.
125563 looks like mima failed due to a bunch of errors.
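(For readers unfamiliar with the convention Shane is using: POSIX shells report a child killed by a fatal signal with exit status 128 + signal number, so SIGTERM (15) shows up as 143 and SIGKILL (`kill -9`) as 137. A minimal sketch, safe to run anywhere:)

```shell
# Demonstrate the 128 + signal exit-status convention: a background
# sleep is sent SIGTERM (signal 15), and the shell reports status 143.
sleep 30 &
pid=$!
kill -TERM "$pid"
wait "$pid"
status=$?
echo "exit status: $status (signal $((status - 128)))"
# prints: exit status: 143 (signal 15)
```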
i just spot checked a bunch of recent failed PRB builds from today and they
all
define "a lot" and provide some links to those builds, please. there are
roughly 2000 builds per day, and i can't do more than keep a cursory eye on
things.
the infrastructure that the tests run on hasn't changed one bit on any of
the workers, and 'kill -9' could be a timeout, flakiness caused
yeah, i can't do much for flaky tests... just flaky infrastructure.
On Fri, Jul 10, 2020 at 12:41 AM Hyukjin Kwon wrote:
Couple of flaky tests can happen. It's usual. Seems it got better now at
least. I will keep monitoring the builds.
On Fri, Jul 10, 2020 at 4:33 PM, ukby1234 wrote:
Looks like Jenkins isn't stable still. My PR fails two times in a row:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125565/console
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125536/testReport
i'm seeing green PRB builds now, so i feel that we've gotten things
building again! :)
On Thu, Jul 9, 2020 at 5:33 PM Hyukjin Kwon wrote:
Thank you Shane.
On Fri, Jul 10, 2020 at 2:35 AM, shane knapp ☠ wrote:
and -06 is back! i'll keep an eye on things today, but suffice to say
on each worker i:
1) rebooted
2) cleaned ~/.ivy2, ~/.m2, and other associated caches
we should be g2g! please reply here if you continue to see weirdness.
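(The cache wipe in step 2 amounts to deleting the local Ivy and Maven dependency repositories on each worker. A minimal sketch of that step, run against a scratch directory rather than the real ~/.ivy2 and ~/.m2 so it is safe to try anywhere:)

```shell
# Simulate the per-worker cache cleanup: create stand-ins for the
# default Ivy and Maven cache directories in a scratch dir, then
# delete them, as the real cleanup does under $HOME.
scratch=$(mktemp -d)
mkdir -p "$scratch/.ivy2/cache" "$scratch/.m2/repository"

# the cleanup step: remove both dependency caches outright
rm -rf "$scratch/.ivy2" "$scratch/.m2"

remaining=$(ls -A "$scratch" | wc -l)
echo "entries left after cleanup: $remaining"
rmdir "$scratch"
```

(Both tools rebuild these caches on the next resolve, which is why a wipe is a safe fix for corrupted-artifact flakiness, at the cost of one slow build.)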
On Thu, Jul 9, 2020 at 10:08 AM shane knapp ☠ wrote:
ok, we're back up and building (just waiting for one worker, -06 to finish
cleaning itself up).
On Thu, Jul 9, 2020 at 9:30 AM shane knapp ☠ wrote:
Thank you always, Shane!
Bests,
Dongjoon.
On Thu, Jul 9, 2020 at 9:30 AM shane knapp ☠ wrote:
this is happening now.
On Wed, Jul 8, 2020 at 9:07 AM shane knapp ☠ wrote:
As a side note, I've raised patches for addressing two frequent flaky
tests, CliSuite [1] and HiveSessionImplSuite [2]. Hope this helps to
mitigate the situation.
1. https://github.com/apache/spark/pull/29036
2. https://github.com/apache/spark/pull/29039
On Thu, Jul 9, 2020 at 11:51 AM Hyukjin Kwon wrote:
Thanks Shane!
BTW, it's getting serious .. e.g. https://github.com/apache/spark/pull/28969.
The tests could not pass in 7 days .. Hopefully restarting the machines
will make the current situation better :-)
Separately, I am working on a PR to run the Spark tests in Github Actions.
We could
this will be happening tomorrow... today is Meeting Hell Day[tm].
On Tue, Jul 7, 2020 at 1:59 PM shane knapp ☠ wrote:
i wasn't able to get to it today, so i'm hoping to squeeze in a quick trip
to the colo tomorrow morning. if not, then first thing thursday.
--
Shane Knapp
Computer Guy / Voice of Reason
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu