only 125561, 125562 and 125564 were impacted by -9.

125565 exited w/a code of 143 (128 + 15), which means the process was
terminated by signal 15 (SIGTERM), sent for unknown reasons.
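
for reference, here is a minimal python sketch of the exit-code arithmetic
above. it assumes the usual shell convention that a status above 128 means
"killed by signal (status - 128)". the function name describe_exit_status is
made up for illustration and is not part of jenkins or the spark build
scripts.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status (0-255).

    Assumption: statuses above 128 follow the shell convention of
    128 + signal number. A process can also exit with such a value
    on its own, so treat the result as a hint, not proof.
    """
    if status == 0:
        return "success"
    if status > 128:
        signum = status - 128
        try:
            # Map the number to its symbolic name, e.g. 15 -> SIGTERM.
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = f"signal {signum}"
        return f"killed by {name} ({signum})"
    return f"exited with code {status}"


if __name__ == "__main__":
    # 143 is the status discussed for build 125565; 137 is what kill -9 yields.
    for status in (143, 137):
        print(status, "->", describe_exit_status(status))
```

under that convention 143 maps to signal 15 and 137 maps to signal 9, which
is the difference between 125565 and the builds that were hit by kill -9.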

125563 looks like mima failed due to a bunch of errors.

i just spot checked a bunch of recent failed PRB builds from today and they
all seemed to be legit.

another thing that might be happening is an overload of PRB builds on the
workers due to the backlog...  the workers are under a LOT of load right
now, and i can put some rate limiting in to see if that helps out.

shane

On Fri, Jul 10, 2020 at 11:31 AM Frank Yin <ukby.1...@gmail.com> wrote:

> Like from build number 125565 to 125561, all impacted by kill -9.
>
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125565/console
>
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125564/console
>
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125563/console
>
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125562/console
>
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125561/console
>
> On Fri, Jul 10, 2020 at 9:35 AM shane knapp ☠ <skn...@berkeley.edu> wrote:
>
>> define "a lot" and provide some links to those builds, please.  there are
>> roughly 2000 builds per day, and i can't do more than keep a cursory eye on
>> things.
>>
>> the infrastructure that the tests run on hasn't changed one bit on any of
>> the workers, and 'kill -9' could be a timeout, flakiness caused by old
>> build processes remaining on the workers after the master went down, or me
>> trying to clean things up w/o a reboot.  or, perhaps, something wrong w/the
>> infra.  :)
>>
>> On Fri, Jul 10, 2020 at 9:28 AM Frank Yin <ukby.1...@gmail.com> wrote:
>>
>>> Agreed, but I’ve seen a lot of builds killed by signal 9. I assume
>>> that's infrastructure?
>>>
>>> On Fri, Jul 10, 2020 at 8:19 AM shane knapp ☠ <skn...@berkeley.edu>
>>> wrote:
>>>
>>>> yeah, i can't do much for flaky tests...  just flaky infrastructure.
>>>>
>>>>
>>>> On Fri, Jul 10, 2020 at 12:41 AM Hyukjin Kwon <gurwls...@gmail.com>
>>>> wrote:
>>>>
>>>>> Couple of flaky tests can happen. It's usual. Seems it got better now
>>>>> at least. I will keep monitoring the builds.
>>>>>
>>>>> On Fri, Jul 10, 2020 at 4:33 PM ukby1234 <ukby.1...@gmail.com> wrote:
>>>>>
>>>>>> Looks like Jenkins isn't stable still. My PR fails two times in a row:
>>>>>>
>>>>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125565/console
>>>>>>
>>>>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125536/testReport
>>>>>>
>>>>
>>>> --
>>>> Shane Knapp
>>>> Computer Guy / Voice of Reason
>>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>>> https://rise.cs.berkeley.edu
>>>>
>>>
>>
>

-- 
Shane Knapp
Computer Guy / Voice of Reason
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu
