no, 8 hours is plenty.  things will speed up soon once the backlog of
builds works through....  i limited the number of PRB builds to 4 per
worker, and things are looking better.  let's see how we look next week.

On Fri, Jul 10, 2020 at 3:31 PM Frank Yin <ukby.1...@gmail.com> wrote:

> Can we also increase the build timeout?
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125617
> This one fails because it times out, not because of test failures.
>
> On Fri, Jul 10, 2020 at 2:16 PM Frank Yin <ukby.1...@gmail.com> wrote:
>
>> Yeah, that's what I figured -- those workers are under load. Thanks.
>>
>> On Fri, Jul 10, 2020 at 12:43 PM shane knapp ☠ <skn...@berkeley.edu>
>> wrote:
>>
>>> only 125561, 125562 and 125564 were impacted by -9.
>>>
>>> 125565 exited w/a code of 143 (143 - 128 = signal 15, SIGTERM), which means
>>> the process was terminated for unknown reasons.
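
For reference, the arithmetic above (143 - 128 = 15) follows the common shell convention that a process killed by signal N is reported with exit status 128 + N. A minimal sketch of that decoding in Python; the helper name `describe_exit_status` is hypothetical and not part of Jenkins or the Spark build scripts, and signal names are looked up on the local platform:

```python
import signal


def describe_exit_status(status: int) -> str:
    """Decode a shell-style exit status (hypothetical helper for illustration)."""
    if status > 128:
        # Shell convention: 128 + N means the process was killed by signal N.
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # The local platform does not define a signal with this number.
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited normally with code {status}"


print(describe_exit_status(143))  # the status reported for build 125565
print(describe_exit_status(137))  # 128 + 9, i.e. what a 'kill -9' would produce
```

Under this convention a 'kill -9' shows up as status 137, while 143 corresponds to signal 15.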
>>>
>>> 125563 looks like mima failed due to a bunch of errors.
>>>
>>> i just spot checked a bunch of recent failed PRB builds from today and
>>> they all seemed to be legit.
>>>
>>> another thing that might be happening is an overload of PRB builds on
>>> the workers due to the backlog...  the workers are under a LOT of load
>>> right now, and i can put some rate limiting in to see if that helps out.
>>>
>>> shane
>>>
>>> On Fri, Jul 10, 2020 at 11:31 AM Frank Yin <ukby.1...@gmail.com> wrote:
>>>
>>>> Like from build number 125565 to 125561, all impacted by kill -9.
>>>>
>>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125565/console
>>>>
>>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125564/console
>>>>
>>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125563/console
>>>>
>>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125562/console
>>>>
>>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125561/console
>>>>
>>>> On Fri, Jul 10, 2020 at 9:35 AM shane knapp ☠ <skn...@berkeley.edu>
>>>> wrote:
>>>>
>>>>> define "a lot" and provide some links to those builds, please.  there
>>>>> are roughly 2000 builds per day, and i can't do more than keep a cursory
>>>>> eye on things.
>>>>>
>>>>> the infrastructure that the tests run on hasn't changed one bit on any
>>>>> of the workers, and 'kill -9' could be a timeout, flakiness caused by old
>>>>> build processes remaining on the workers after the master went down, or me
>>>>> trying to clean things up w/o a reboot.  or, perhaps, something wrong
>>>>> w/the infra.  :)
>>>>>
>>>>> On Fri, Jul 10, 2020 at 9:28 AM Frank Yin <ukby.1...@gmail.com> wrote:
>>>>>
>>>>>> Agreed, but I've seen a lot of builds killed by signal 9. I'm assuming
>>>>>> that's infrastructure?
>>>>>>
>>>>>> On Fri, Jul 10, 2020 at 8:19 AM shane knapp ☠ <skn...@berkeley.edu>
>>>>>> wrote:
>>>>>>
>>>>>>> yeah, i can't do much for flaky tests...  just flaky infrastructure.
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jul 10, 2020 at 12:41 AM Hyukjin Kwon <gurwls...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> A couple of flaky tests can happen; that's normal. It seems to have
>>>>>>>> gotten better now, at least. I will keep monitoring the builds.
>>>>>>>>
>>>>>>>> On Fri, Jul 10, 2020 at 4:33 PM ukby1234 <ukby.1...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Looks like Jenkins still isn't stable. My PR has failed twice in a
>>>>>>>>> row:
>>>>>>>>>
>>>>>>>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125565/console
>>>>>>>>>
>>>>>>>>> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125536/testReport
>>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Shane Knapp
>>>>>>> Computer Guy / Voice of Reason
>>>>>>> UC Berkeley EECS Research / RISELab Staff Technical Lead
>>>>>>> https://rise.cs.berkeley.edu
>>>>>>>

-- 
Shane Knapp
Computer Guy / Voice of Reason
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu
